A Performance Analysis for Microprocessor Architectures

Proceedings of the 6th WSEAS International Conference on Applied Computer Science, Hangzhou, China, April 15-17, 2007

Nakhoon Baek, School of EECS, Kyungpook National University, Daegu 702-701, Korea, oceancru@gmail.com
Hwanyong Lee, Solution Division, HUONE Inc., Daegu 702-205, Korea, hylee@hu1.com

Abstract: In this paper, we selected three different CPU architectures for performance analysis: single-core, dual-core and hyper-threading CPUs. Four kinds of operations are executed on these architectures. After analyzing all the data, we found that the single-core and dual-core CPUs behave as usually expected: the execution times of combined operations are very close to the sum of those of the component operations. In contrast, the hyper-threading CPU shows better performance when each thread performs one specific kind of operation, rather than both kinds.

Key Words: CPU architectures, performance analysis, multi-threading

1 Introduction

The performance of microprocessors is approaching its physical limits. Large-scale computers, including supercomputers and mainframes, have already met this kind of technical limit in their CPU power. Thus, various parallel processing techniques were developed for them, including multi-threading, super-threading and hyper-threading [1]. Today, microprocessors used in conventional PCs, and even in high-end embedded systems, have improved their ability to support parallel processing techniques effectively. Commercial multi-core CPUs are already available, including the Intel Core 2 Duo, Intel Core 2 Quad, Intel Xeon, and the customized triple-core CPU for the Xbox 360 [2, 3].

Conventional programming models based on the sequential processing paradigm are not well suited to these multi-core CPUs. We need computer programs based on a parallel processing paradigm, such as multi-processing and/or multi-threading.
In this paper, we present experimental results on the execution time of some CPU-intensive operations, for a fixed amount of integer and/or floating-point operations, in order to analyze which programming architecture is more suitable for the newly introduced CPU architectures.

2 Background Works

In computer programming, a thread is a lightweight process that executes a given region of program code with a dedicated stack area [4]. In contrast to ordinary processes, threads can share memory with each other, which is one of their strengths. Compared with the conventional sequential programming paradigm, one of the strongest points of multi-threaded programming is that multiple threads can be executed simultaneously, in a parallelized manner [5].

Nowadays, multiple threads can be executed simultaneously on many computer systems. On single-processor systems, time sharing is used to execute several threads in turn. By switching the executing thread very frequently, the system creates the illusion of simultaneous execution. In fact, however, a single-processor system only alternates between threads, rather than physically executing them in parallel. Multi-processor systems and multi-core processors are capable of physically executing multiple threads at once. In other words, true multi-processing is now possible. Thus, multi-threading can be used in wider areas of programming, although in the past it was recommended only for I/O-intensive work.

Multi-threading does not always yield an overall speed-up [6]. First of all, parallel programs based on multi-threading need many more steps before they start doing something useful, in comparison with sequential programs. Thus, in the worst case, the time spent preparing and arranging the threads takes up a significant portion of the overall execution time. Of course, programmers should avoid this situation.

Hyper-threading is a recently developed technology for more efficient multi-threading on the Pentium 4 microprocessor architecture, delivered by Intel. It is officially known as HTT (hyper-threading technology). In this technology, when a processing core is active, the CPU pipelines not in use may be used by other threads, so that a single physical processor simulates two logical processors. So, we can expect two logical processors in a hyper-threading-capable CPU [7]. Figure 1 shows the conceptual diagram of the hyper-threading environment.

Figure 1: Hyper-threading architecture.

In spite of its strong points, hyper-threading also has drawbacks. Since the logical cores share the level-1 and level-2 caches, there are some security holes and some slow-downs in real-world applications [8].

Multi-core microprocessors have two or more processing cores in a single physical processor package, as shown in Figure 2. In this case, each processing core has its own resources such as caches, registers and execution units. Thus, there is no resource sharing in multi-core architectures, while hyper-threading involves some kind of resource sharing. Some multi-core processors are designed to work together with hyper-threading technology.

Figure 2: Dual-core architecture.

Although multi-core CPUs are a cost-effective way of implementing the parallel programming paradigm, they also have some drawbacks. At this time, multi-core CPUs have slower clocks than conventional single-core CPUs. Thus, current multi-core CPUs score poorly on sequentially designed computer programs, in comparison with single-core CPUs [9].
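To make the thread model described at the start of this section more concrete, the following minimal Python sketch starts several threads that all update one shared object, which separate processes could not do without explicit inter-process communication. It is only an illustration under stated assumptions: the names `shared_totals`, `add_range` and `run` are hypothetical and are not taken from the programs measured in this paper.

```python
import threading

# Shared by every thread: threads of one process see the same memory.
shared_totals = {"sum": 0}
lock = threading.Lock()


def add_range(start, stop):
    """Add the integers in [start, stop) to the shared total."""
    local = sum(range(start, stop))   # private work, local to this thread
    with lock:                        # serialize the update of shared memory
        shared_totals["sum"] += local


def run(n_threads=4, n=1000):
    """Split [0, n) into n_threads chunks and sum each chunk in its own thread."""
    shared_totals["sum"] = 0
    step = n // n_threads
    threads = []
    for i in range(n_threads):
        # The last thread also takes whatever remains after integer division.
        stop = n if i == n_threads - 1 else (i + 1) * step
        t = threading.Thread(target=add_range, args=(i * step, stop))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()                      # wait until every thread has finished
    return shared_totals["sum"]


if __name__ == "__main__":
    print(run())
```

Whether the threads actually run in parallel depends on the hardware and on the runtime. In CPython, for example, the global interpreter lock serializes bytecode execution, so this sketch illustrates memory sharing rather than a speed-up.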
At this time, dual-core and quad-core CPUs are commercially available. A customized CPU for the Xbox 360 has three cores, and some CPUs for workstation computers also have multi-core architectures.

3 Performance Analysis

In this paper, we compare several CPU architectures: single-core, dual-core and hyper-threading CPUs. For this purpose, we select four kinds of operations. Basically, we focus on arithmetic operations, to fully utilize the internal computing power of the CPUs. To test the integer operation units and the floating-point units separately, we prepare the following operations:

integer operations: consist of 1,000 integer additions, which are repeated 5,000,000 times.

floating-point operations: consist of 1,000 precision floating-point additions, repeated 5,000,000 times.

combined operations, both kinds in each thread: consist of 1,000 integer additions and 1,000 floating-point additions. When multiple threads are used, each thread is allotted the same amount of integer and floating-point operations.

combined operations, one kind per thread: consist of 1,000 integer additions and 1,000 floating-point additions. When multiple threads are used, each thread is wholly dedicated to either integer or floating-point operations, i.e., the integer and floating-point operations are separated into independent threads.

3.1 Single-core case

We use an Intel Pentium 4 1.6 GHz processor with GB memory as the testing system for the single-core case. The measured execution times for the four kinds of operations are shown in Table 1. All the operations are measured for the number of threads varying from 1 to 8. The graphical representation of these data is shown in Figure 3.

Table 1: Execution times on the single-core CPU.

threads  integer  floating-point  both kinds per thread  one kind per thread
1        1.294    2.145           3.491                  3.453
2        1.306    2.157           3.459                  3.435
3        1.316    2.173           3.499                  3.457
4        1.326    2.155           3.463                  3.457
5        1.336    2.137           3.461                  3.469
6        1.346    2.165           95                     3.477
7        1.388    2.169           45                     3.455
8        1.390    2.147           11                     3.467

Figure 3: Single-core CPU performance.

Since there is only one CPU core, the execution time is independent of the number of threads. As we can easily guess, the execution times for both kinds of combined threads are very close to the sum of those of the integer and floating-point threads. Additionally, there is no noticeable difference between the execution times of the both-kinds-per-thread and one-kind-per-thread cases.

3.2 Dual-core case

For the dual-core case, we use an Intel Core 2 E6400 2.13 GHz CPU system, with GB memory. The experiments are the same as in the single-core case. The experimental results are summarized in Table 2 and Figure 4.

Table 2: Execution times on the dual-core CPU.

threads  integer  floating-point  both kinds per thread  one kind per thread
1        0.691    1.772           2.716                  2.459
2        0.366    0.894           1.256                  1.775
3        0.403    0.941           1.294                  1.334
4        0.400    0.919           1.297                  1.331
5        0.400    0.931           1.306                  1.306
6        0.409    0.928           1.319                  1.303
7        0.412    0.925           1.319                  1.281
8        0.416    0.903           1.316                  1.294

Since we use a dual-core CPU, the execution times with 2 or more threads drop to about half of the single-threaded execution time. Similar to the single-core case, the execution times for both kinds of combined threads are very close to the sum of those of the integer and floating-point threads.
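The two combined workloads defined at the start of this section differ only in how the additions are assigned to threads. The Python sketch below illustrates that assignment. It is not the measured program: it assumes that the total amount of work is fixed and divided among the threads, that a single thread performs everything itself, and that with several threads the roles are split as evenly as possible. The function names are hypothetical, and the counts used in practice would be far larger (1,000 additions repeated 5,000,000 times in the experiments).

```python
import threading


def plan_both_kinds(n_threads, int_adds, fp_adds):
    """Give every thread an equal share of integer and floating-point additions.

    Shares are rounded down, so the counts should be divisible by n_threads.
    """
    return [(int_adds // n_threads, fp_adds // n_threads)] * n_threads


def plan_one_kind(n_threads, int_adds, fp_adds):
    """Dedicate each thread to integer-only or floating-point-only additions.

    Assumption: one thread does all the work itself; otherwise the first
    half of the threads (rounded up) is integer-only and the rest is
    floating-point-only. Shares are rounded down.
    """
    if n_threads == 1:
        return [(int_adds, fp_adds)]
    n_int = (n_threads + 1) // 2
    n_fp = n_threads - n_int
    return [(int_adds // n_int, 0)] * n_int + [(0, fp_adds // n_fp)] * n_fp


def worker(int_adds, fp_adds, out, idx):
    """Perform the additions assigned to one thread and store both totals."""
    i_acc = 0
    for _ in range(int_adds):
        i_acc += 1          # integer addition
    f_acc = 0.0
    for _ in range(fp_adds):
        f_acc += 1.0        # floating-point addition
    out[idx] = (i_acc, f_acc)


def run(plan):
    """Start one thread per plan entry, wait for all, and return their totals."""
    out = [None] * len(plan)
    threads = [threading.Thread(target=worker, args=(i, f, out, k))
               for k, (i, f) in enumerate(plan)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(plan_both_kinds(2, 1000, 1000))
    print(plan_one_kind(2, 1000, 1000))
    print(run(plan_one_kind(4, 1000, 1000)))
```

Under these assumptions, a two-thread one-kind-per-thread run leaves a single thread with all the floating-point additions, so it cannot finish sooner than that thread does. A timing study would need a runtime that executes threads in parallel, which CPython's global interpreter lock does not permit for this kind of loop.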
Figure 4: Dual-core CPU performance.

3.3 Hyper-threading case

To test the hyper-threading case, an Intel Pentium 4 2.8 GHz processor with the hyper-threading facility is used, with GB memory. The measured execution times are listed in Table 3, and the corresponding graphical representation is shown in Figure 5.

Table 3: Execution times on the hyper-threading CPU.

threads  integer  floating-point  both kinds per thread  one kind per thread
1        0.756    2.200           3.266                  2.941
2        0.747    1.125           1.844                  2.181
3        0.744    1.144           1.875                  1.816
4        0.750    1.119           1.903                  1.819
5        0.766    1.128           1.909                  1.856
6        0.791    1.141           1.925                  1.859
7        0.781    1.156           1.950                  1.853
8        0.784    1.131           1.941                  1.856

Figure 5: Hyper-threading CPU performance.

One remarkable thing in the graph is that the one-kind-per-thread case shows better performance than the both-kinds-per-thread case. We guess that the processor core is fully utilized when one thread uses the integer operation unit and another thread uses the floating-point unit.

4 Conclusion

In this paper, we selected three different CPU architectures for performance analysis: single-core, dual-core and hyper-threading CPUs. Four kinds of operations are executed on these architectures. We measured the execution times of all these operations, for the number of threads varying from 1 to 8. After analyzing all the data, we found that the single-core and dual-core CPUs behave as usually expected, i.e., the execution times of combined operations are very close to the sum of those of the component operations. In contrast, the hyper-threading CPU shows better performance when each thread performs one specific kind of operation, rather than both kinds. In conclusion, multi-threaded software for hyper-threading CPUs should be designed to avoid threads that mix both kinds of operations. More experiments and analysis are needed for more precise inferences.

Acknowledgements: This work is financially supported by the Ministry of Education and Human Resources Development (MOE), the Ministry of Commerce, Industry and Energy (MOCIE) and the Ministry of Labor (MOLAB) through the fostering project of the Industrial-Academic Cooperation Centered University.
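The comparisons drawn in Sections 3 and 4 can be re-derived from the tabulated values. The Python sketch below copies the rows for 1 to 3 threads from Tables 1 to 3 and computes two quantities: how far a combined time lies from the sum of the integer and floating-point times, and the speed-up relative to the single-threaded run. The column order is assumed to follow the order in which the four kinds of operations are listed in Section 3, and the helper names `relative_gap` and `speedup` are hypothetical.

```python
# Execution times copied from Tables 1-3 (rows for 1, 2 and 3 threads).
# Column order: integer, floating-point, both kinds per thread, one kind per thread.
TABLES = {
    "single-core": {1: (1.294, 2.145, 3.491, 3.453),
                    2: (1.306, 2.157, 3.459, 3.435),
                    3: (1.316, 2.173, 3.499, 3.457)},
    "dual-core": {1: (0.691, 1.772, 2.716, 2.459),
                  2: (0.366, 0.894, 1.256, 1.775),
                  3: (0.403, 0.941, 1.294, 1.334)},
    "hyper-threading": {1: (0.756, 2.200, 3.266, 2.941),
                        2: (0.747, 1.125, 1.844, 2.181),
                        3: (0.744, 1.144, 1.875, 1.816)},
}


def relative_gap(row, col):
    """Relative difference between a combined time and the integer + floating-point sum."""
    expected = row[0] + row[1]
    return (row[col] - expected) / expected


def speedup(table, col, n_threads):
    """Single-threaded time divided by the time measured with n_threads."""
    return table[1][col] / table[n_threads][col]


if __name__ == "__main__":
    for name, table in TABLES.items():
        row = table[3]
        print(name,
              "gap(both kinds) = %+.3f" % relative_gap(row, 2),
              "gap(one kind) = %+.3f" % relative_gap(row, 3),
              "integer speed-up at 2 threads = %.2f" % speedup(table, 0, 2))
```

With these rows, the one-kind-per-thread time on the hyper-threading CPU at 3 threads (1.816) is below the both-kinds-per-thread time (1.875), while at exactly 2 threads the order is reversed (2.181 against 1.844).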

References:

[1] J. Stokes. Introduction to multithreading, superthreading and hyperthreading, 2005. http://arstechnica.com/articles/paedia/cpu/hyperthreading.ars.
[2] J. Stokes. Inside the Xbox 360, part I: procedural synthesis and dynamic worlds, 2005. http://arstechnica.com/articles/paedia/cpu/xbox360-1.ars.
[3] J. Stokes. Inside the Xbox 360, part II: the Xenon CPU, 2005. http://arstechnica.com/articles/paedia/cpu/xbox360-2.ars.
[4] K. Wackowski and P. Gepner. Hyper-threading technology speeds clusters. In Proc. 5th Int'l Conf. on Parallel Proc. and Appl. Math., pages 17-26, 2003.
[5] G. Keren. Multi-threaded programming with POSIX threads, 2002. http://users.actcom.co.il/~choo/lupg/index.html.
[6] D. Sarkar. Cost and time-cost effectiveness of multiprocessing. IEEE Trans. Parallel Distrib. Syst., 4(6):704-712, 1993.
[7] T. Martinez and S. Parikh. Understanding dual processors, hyper-threading technology, and multi-core systems. Intel Optimizing Center, 2005. http://www.devx.com/intel/article/27399.
[8] C. Percival. Cache missing for fun and profit. In BSDCan '05, 2005.
[9] J. Handy. The Cache Memory Book. Academic Press, 1998.