Is Intel's Hyper-Threading Technology Worth the Extra Money to the Average User?


Andrew Murray
Villanova University
800 Lancaster Avenue, Villanova, PA, United States of America

ABSTRACT

In the mid-1990s, Intel Corporation adopted symmetric multiprocessing (SMP), placing more than one processor on a motherboard to increase the number of instructions that could execute simultaneously. This increased the overall performance of a system, but it was too expensive for the average user. Intel then looked into simultaneous multithreading (SMT) for a single processor. The idea was to allow one processor to execute two threads simultaneously to increase the performance of the system. Intel applied this technology to its processors under the name Hyper-Threading Technology. This paper compares SMP to SMT and shows how the two technologies are similar, yet very different. It then describes Hyper-Threading Technology in more detail and shows how hyper-threading processors compared with non-hyper-threading processors in a number of benchmarks and tests.

1. INTRODUCTION

Every year since their creation, computers have saved more time for the people who use them by completing tasks with lightning speed. Because of this, people are looking for ways to get more out of the machine. They have begun writing larger and more complex programs, executing multiple processes at the same time, and using computers to run the servers that are the backbone of the Internet. For a long time, computer and processor architects were able to keep up with the growing demand for speed, but they needed something that would put them ahead of the demand. Symmetric multiprocessing (SMP) was an answer to this problem for large-scale computer users. Using more than one processor to handle the workload definitely increased the power of the computer.
The only problem was that this type of computer was much more expensive, so it was not feasible for the average user to buy one. The average user was therefore limited to a single processor. To keep the general public happy, processor architects did all they could to increase the speed of these processors to handle the average user's workload. Doing so required adding more transistors, and as a result the processors consumed more power. This put processor architects in a dilemma: speed sells, but they needed a way to improve performance at a greater rate than transistor counts and power dissipation [3]. This is when Intel Corporation created Hyper-Threading Technology, which is based on simultaneous multithreading (SMT) [1]. The broad idea was to take a single processor and enable it to execute two separate threads simultaneously in order to improve overall performance without a significant increase in power or transistor count. However, the main question is whether this technology is better and more cost-efficient than a regular processor of the same speed without hyper-threading. A number of benchmarks have examined that very question, and the results presented later can be surprising.

2. SMP vs. SMT

Processor architects found a way to increase the overall performance of a system through parallelism. Parallelism is the basic idea of having more than one independent thread executing simultaneously in order to boost the performance of a system [7]. In the mid-1990s, Intel applied parallelism by putting more than one processor in a machine in order to execute different threads of a process simultaneously. This became known as symmetric multiprocessing (SMP).
SMP did improve the overall performance of a system because it could execute more instructions simultaneously than a single processor could. What made it so powerful was its ability to continue execution. For example, if processor A receives an instruction but must stall during its execution, processor B can receive the next instruction, or an instruction from another program, which keeps the system busy and hides the stall latency on processor A. This is a huge improvement over single-processor systems. To achieve these results, SMP systems must share the system resources and find different ways to schedule

threads for all the available processors. Programs already written for multithreaded environments fit this type of system well and can see a drastic increase in performance. However, non-multithreaded programs need to be scheduled in a way that achieves a multithreaded state. One way to achieve this is through out-of-order execution of instructions. This means that the processor or compiler takes multiple instruction sequences that were meant to be executed in a specific order and reschedules them so that they can be executed with the highest efficiency [3, 4]. This helps ensure that all the processors receive some work to do and helps increase overall performance. SMP is still used today, especially in environments such as servers that must do a lot of work in a relatively short amount of time.

The remaining problem was how to bring SMP to the home user. SMP systems are relatively expensive because there is more than one processor on the motherboard and the hardware becomes more difficult and expensive to build. This cost puts the technology out of reach for most average users. The problem was finally solved when simultaneous multithreading (SMT) was applied to processors. SMT is based on thread-level parallelism (TLP), where multiple independent execution states can occur within a larger program context [3, 7]. Intel looked into this idea as a way to gain better performance relative to transistor count and power consumption [3]. It was discovered that when TLP is exploited, the overall performance of the program increases. Intel's architects therefore decided to apply this idea by allowing the processor to handle multiple threads. They chose SMT, a very fine-grained form of hardware multithreading that allows simultaneous execution of more than one thread [1].
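As a rough illustration of thread-level parallelism, the sketch below splits one job into independent pieces and hands each piece to its own thread. The function names `checksum` and `run_in_threads` and the sample data are hypothetical and chosen only for this example. Note also that the standard CPython interpreter serializes bytecode execution, so the sketch shows the structure of a multithreaded program rather than a measurable speedup.

```python
import threading


def checksum(data):
    """A stand-in for independent work: sum of squares modulo a constant."""
    return sum(value * value for value in data) % 65521


def run_in_threads(chunks):
    """Run checksum() on every chunk, one thread per chunk.

    Each thread writes only to its own slot in `results`, so the
    threads never depend on one another (independent execution states).
    """
    results = [None] * len(chunks)

    def worker(index, chunk):
        results[index] = checksum(chunk)

    threads = [
        threading.Thread(target=worker, args=(i, chunk))
        for i, chunk in enumerate(chunks)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every independent piece has finished
    return results


if __name__ == "__main__":
    # Two independent halves of a hypothetical workload.
    print(run_in_threads([range(10), range(10, 20)]))
```

Because the two chunks share no data, a scheduler is free to place them on separate physical processors (SMP) or on separate logical processors of one physical processor (SMT).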
The main advantage of SMT is its ability to make better use of processor resources and to hide memory hierarchy latency by providing more independent work to keep the processor busy [1]. This is a similar idea to SMP, but instead of using more than one processor, everything occurs within the same processor. Intel's implementation of this technology is called Hyper-Threading.

3. Hyper-Threading

Hyper-Threading Technology makes a single physical processor appear as multiple logical processors [3, 5, 6]. To make this possible, each logical processor is given its own copy of the architecture state so that two separate threads can execute at the same time, one on each architecture state. The logical processors share a single set of physical execution resources, whereas the processors in an SMP system each have their own execution resources and share only the system resources.

Figure 1: Processors without Hyper-Threading Technology [3]

Figure 1 shows a classic two-processor SMP system during execution. Each processor has its own architecture state in order to execute separate threads simultaneously.

Figure 2: Processors with Hyper-Threading Technology [3]

Figure 2 shows a two-processor SMP system with Hyper-Threading Technology. Each processor now contains two copies of the architecture state, which means that each processor can execute two threads simultaneously. A two-processor SMP system with hyper-threading can therefore execute four threads simultaneously; in other words, it appears to have four logical processors [3]. This apparent increase in the number of processors, without that number of physical processors being present, saves a great deal of space on the motherboard and is also more cost-efficient. Hyper-threading uses these logical processors to increase the performance of the system.
It accomplishes this by switching the use of chip resources from the currently executing thread to a new thread when the currently executing thread initiates a long-latency operation [7]. Long pipeline stalls can thus be hidden, because the second logical processor takes over execution while the other logical processor is stalled.
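The latency-hiding behavior described above can be sketched with a toy cycle-count model. This is a deliberate simplification and not Intel's microarchitecture: it assumes a single shared execution unit, threads made of alternating "compute" and "stall" phases, and a switch to the other thread whenever the current one stalls. The function names and the two sample workloads are hypothetical.

```python
from collections import deque


def run_serial(threads):
    """One architecture state: threads run back to back.

    Every stall cycle leaves the execution unit idle, so the total
    is simply the sum of all compute and stall cycles.
    """
    return sum(cycles for thread in threads for _, cycles in thread)


def run_two_contexts(threads):
    """Several architecture states sharing one execution unit.

    Each cycle, every stalled thread waits one cycle of its stall,
    and the execution unit serves at most one thread that is ready
    to compute (earlier threads in the list have priority).
    """
    work = [deque([kind, n] for kind, n in t if n > 0) for t in threads]
    cycles = 0
    while any(work):
        cycles += 1
        unit_busy = False
        for queue in work:
            if not queue:
                continue
            phase = queue[0]
            if phase[0] == "stall":
                phase[1] -= 1          # stalls overlap with other work
            elif not unit_busy:
                phase[1] -= 1          # this thread owns the unit now
                unit_busy = True
            else:
                continue               # ready, but the unit is taken
            if phase[1] == 0:
                queue.popleft()
    return cycles


# Hypothetical workloads: (phase kind, number of cycles).
THREAD_A = [("compute", 4), ("stall", 6), ("compute", 4)]
THREAD_B = [("compute", 6), ("stall", 2), ("compute", 2)]

if __name__ == "__main__":
    print(run_serial([THREAD_A, THREAD_B]))        # 24 cycles
    print(run_two_contexts([THREAD_A, THREAD_B]))  # 16 cycles
```

In this model the gain comes entirely from overlapping one thread's stall with the other thread's computation. Two threads that never stall, such as two pure "compute" phases, take the same number of cycles either way.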

This apparent increase in performance should make the decision to buy a hyper-threading processor an easy one. If it can run two threads simultaneously and help hide long-latency operations, then it should be a better processor than one of the same speed without hyper-threading. The next section tests that statement with a number of benchmarks and tests run on two processors of similar speed, one of which had Hyper-Threading Technology.

4. Analysis

This section looks at six benchmarks and tests taken from three online sites that perform hardware performance benchmarking: hardwareanalysis.com, linuxhardware.org, and tomshardware.com. Each site used a number of different benchmarks intended to cover the range of user activities, such as playing games, audio/video work, and multitasking. Each used a Pentium GHz processor with Hyper-Threading Technology and disabled hyper-threading to obtain the non-hyper-threading results. The first test measures how many frames per second can be achieved while playing Quake 3, a demanding, graphics-intensive video game.

Figure 3: Quake 3 Demo [2]

Figure 3 shows the results of the Quake 3 Demo benchmark at three different resolutions. At the first two resolutions, the hyper-threading processor does slightly better than the non-hyper-threading processor. At the last resolution, however, the non-hyper-threading processor just edges out the hyper-threading processor. There is no real advantage to the hyper-threading processor here, because the results are too close for the difference to be meaningful.

Figure 4: SPECViewperf 7.0 [2]

The next benchmark, shown in Figure 4, is a professional graphics benchmark. Performance is measured in frames per second, and the hyper-threading processor does not make much of a difference. Its results are too close to those of the non-hyper-threading processor to say that either processor is better than the other.

Figure 5: BAPCo SYSMark 2002 [4]

Figure 5 shows the BAPCo SYSMark 2002 benchmark, which is based on scripted runs of several popular office and workstation applications, including Microsoft Office, Adobe Photoshop, and Adobe Premiere. One of the best features of SYSMark 2002 is that, unlike its predecessors, it runs multiple applications at once, so it is much more realistic and representative of how an actual user works at a PC [4]. The hyper-threading processor delivers about a 3% performance increase over the non-hyper-threading processor. This is not a large enough difference to make clear that one processor is better than the other.

After three benchmarks, the hyper-threading processor has done only slightly better than the non-hyper-threading processor, and there is not enough evidence to say that it is the better choice. The next three benchmarks simulate the situations in which hyper-threading should be the best choice.

Figure 6: MadOnion 3DMark2001SE [4]

MadOnion 3DMark2001SE is a single-process application that is not multithreaded. When it runs alone on the hyper-threading and non-hyper-threading processors, the results are about the same. However, when SETI@Home runs in the background, the hyper-threading processor improves performance by almost 40% compared to the other processor. SETI@Home is a processor-intensive application that requests only specific execution resources, leaving the remaining resources free for the other logical processor and thus increasing overall performance [4]. This is the first case in which resources were available to both logical processors, and the benefit of hyper-threading is clearly visible.

Figure 7: 1GB ZIP Compression [4]

The next test, in Figure 7, shows how compressing a 1 GB file fares on the two processors. Since the compression program is not multithreaded, no performance improvement is expected from the hyper-threading processor. However, when more programs execute at the same time, such as MP3 playback with a plug-in, the hyper-threading processor handles the workload much better than the non-hyper-threading processor. This is a good example of how well a hyper-threading processor handles multitasking, which is what most average users will be doing.

Figure 8: SiSoft Sandra 2002 SP1 [7]

The last benchmark, shown in Figure 8, is SiSoft Sandra 2002 SP1. It is a multimedia benchmark, and in its results a Pentium GHz hyper-threading processor beat every other processor. It even beat a Pentium GHz processor in this particular benchmark. This benchmark shows how well a hyper-threading processor handles integer MMX, floating-point SSE and SSE2, and 3DNow operations compared to the other processors.
Overall, these benchmarks and tests show that Hyper-Threading Technology performs best in a multitasking environment or when a particular application is multithreaded. This is a problem because many developers were not aware of the potential benefits of hyper-threading when writing programs such as Adobe Photoshop and Windows Media Decoder [7]. Once developers recognize this benefit and start to develop multithreaded applications, the true power of hyper-threading will be seen. Until then, the only situation in which a hyper-threaded processor is beneficial is the multitasking environment. A good area for further research is to determine how much multitasking must be done at once, with what size programs, and whether any of the programs must be multithreaded, in order to achieve the best overall performance gain from a hyper-threading processor.
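The percentage comparisons quoted in this section (about 3% for SYSMark 2002 and almost 40% for 3DMark2001SE with a background task) are relative gains over the non-hyper-threading result. The following sketch shows one way such figures could be computed. The function names are hypothetical, and the sample numbers are invented placeholders chosen to produce round percentages; they are not the scores measured by the cited sites.

```python
def percent_gain(ht_score, base_score):
    """Relative gain in percent for higher-is-better metrics.

    Suitable for frames per second or composite benchmark scores.
    """
    if base_score <= 0:
        raise ValueError("base_score must be positive")
    return 100.0 * (ht_score - base_score) / base_score


def percent_gain_from_times(ht_seconds, base_seconds):
    """Relative throughput gain in percent for lower-is-better metrics.

    Throughput is proportional to 1 / elapsed time, so the gain is
    base_seconds / ht_seconds - 1, written here as one fraction.
    """
    if ht_seconds <= 0 or base_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return 100.0 * (base_seconds - ht_seconds) / ht_seconds


if __name__ == "__main__":
    # Placeholder values only, not measured data.
    print(percent_gain(206, 200))             # 3.0
    print(percent_gain_from_times(100, 140))  # 40.0
```

Separating the two cases matters because a frames-per-second chart and an elapsed-time chart point in opposite directions: a shorter bar is the better result in the second case.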

5. CONCLUSION

Intel's Hyper-Threading Technology is a very creative way to increase the overall performance of a processor without a significant increase in transistor count or power consumption. In certain situations, hyper-threading was shown to increase the performance of a system drastically. However, these situations were very limited and depended heavily on the nature of the user's workload. The most improvement was observed only with multithreaded applications and multitasking. Single, non-multithreaded applications executed alone showed no improvement with this new technology. This is a problem because many average computer users execute only one application at a time. Only the more advanced home users who truly multitask these non-multithreaded applications would see a significant performance gain from Hyper-Threading Technology.

All of this evidence shows that Hyper-Threading Technology is worth the extra money only if the user is going to take advantage of its power. Only users who run multithreaded applications or who do a lot of multitasking should consider purchasing a processor with this additional capability. Otherwise, the extra money is simply wasted, since there is no significant performance gain outside multithreaded applications and multitasking environments.

This conclusion should change within the next couple of years as software engineers begin to create more multithreaded applications. However, until multithreaded applications become the norm in the computer industry, a processor with Hyper-Threading Technology should be purchased only if the user intends to take advantage of what this new technology has to offer.

REFERENCES

[1] J. R. Bulpin and I. A. Pratt. Multiprogramming Performance of the Pentium 4 with Hyper-Threading. In Proceedings of 31st International Symposium on Computer Architecture (ISCA-31), pages Munich, Germany, June
[2] Linux Hardware Hyper-Threading Benchmarks. Linux Hardware.com &mode=thread
[3] D. Marr et al. Hyper-Threading Technology Architecture and Microarchitecture: A Hypertext History. Intel Technology J., vol. 6, issue 1, Feb
[4] S. Sassen. Hyper-Threading on the Desktop. Hardware Analysis.com. Nov. 14,
[5] The Standard Performance Evaluation Corporation.
[6] N. Tuck and D. M. Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of the 12th International Conference on Parallel Architecture and Compilation Techniques (PACT 2003), pages IEEE Computer Society, Sept
[7] F. Volkel et al. Single CPU in Dual Operation: P GHz with Hyper-Threading Technology. Tom's Hardware Guide. Nov. 2,


More information

ILP Ends TLP Begins. ILP Limits via an Oracle

ILP Ends TLP Begins. ILP Limits via an Oracle ILP Ends TLP Begins Today s topics: Explore a perfect machine unlimited budget to see where ILP goes answer: not far enough Look to TLP & multi-threading for help everything has it s issues we ll look

More information

Multi-core Architectures. Dr. Yingwu Zhu

Multi-core Architectures. Dr. Yingwu Zhu Multi-core Architectures Dr. Yingwu Zhu Outline Parallel computing? Multi-core architectures Memory hierarchy Vs. SMT Cache coherence What is parallel computing? Using multiple processors in parallel to

More information

High-End Computing Systems

High-End Computing Systems High-End Computing Systems EE380 State-of-the-Art Lecture Hank Dietz Professor & Hardymon Chair in Networking Electrical & Computer Engineering Dept. University of Kentucky Lexington, KY 40506-0046 http://aggregate.org/hankd/

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

EECS 452 Lecture 9 TLP Thread-Level Parallelism

EECS 452 Lecture 9 TLP Thread-Level Parallelism EECS 452 Lecture 9 TLP Thread-Level Parallelism Instructor: Gokhan Memik EECS Dept., Northwestern University The lecture is adapted from slides by Iris Bahar (Brown), James Hoe (CMU), and John Shen (CMU

More information

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John Computer Performance Evaluation and Benchmarking EE 382M Dr. Lizy Kurian John Evolution of Single-Chip Transistor Count 10K- 100K Clock Frequency 0.2-2MHz Microprocessors 1970 s 1980 s 1990 s 2010s 100K-1M

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) #1 Lec # 2 Fall 2003 9-10-2003 Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing

More information

Multithreaded Processors. Department of Electrical Engineering Stanford University

Multithreaded Processors. Department of Electrical Engineering Stanford University Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread

More information

HISTORY OF MICROPROCESSORS

HISTORY OF MICROPROCESSORS HISTORY OF MICROPROCESSORS CONTENTS Introduction 4-Bit Microprocessors 8-Bit Microprocessors 16-Bit Microprocessors 1 32-Bit Microprocessors 64-Bit Microprocessors 2 INTRODUCTION Fairchild Semiconductors

More information

Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture These slides are partly from 18-742 Fall 2012, Parallel Computer Architecture, Lecture 9: Multithreading

More information

CS377P Programming for Performance Multicore Performance Multithreading

CS377P Programming for Performance Multicore Performance Multithreading CS377P Programming for Performance Multicore Performance Multithreading Sreepathi Pai UTCS October 14, 2015 Outline 1 Multiprocessor Systems 2 Programming Models for Multicore 3 Multithreading and POSIX

More information

Seminar report Hyper-Threading Submitted in partial fulfillment of the requirement for the award of degree Of Mechanical

Seminar report Hyper-Threading Submitted in partial fulfillment of the requirement for the award of degree Of Mechanical A Seminar report On Hyper-Threading Submitted in partial fulfillment of the requirement for the award of degree Of Mechanical SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org Acknowledgement

More information

Kaisen Lin and Michael Conley

Kaisen Lin and Michael Conley Kaisen Lin and Michael Conley Simultaneous Multithreading Instructions from multiple threads run simultaneously on superscalar processor More instruction fetching and register state Commercialized! DEC

More information

Simultaneous Multithreading and the Case for Chip Multiprocessing

Simultaneous Multithreading and the Case for Chip Multiprocessing Simultaneous Multithreading and the Case for Chip Multiprocessing John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 2 10 January 2019 Microprocessor Architecture

More information

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

administrivia final hour exam next Wednesday covers assembly language like hw and worksheets

administrivia final hour exam next Wednesday covers assembly language like hw and worksheets administrivia final hour exam next Wednesday covers assembly language like hw and worksheets today last worksheet start looking at more details on hardware not covered on ANY exam probably won t finish

More information

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading CS 152 Computer Architecture and Engineering Lecture 18: Multithreading Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

Three OPTIMIZING. Your System for Photoshop. Tuning for Performance

Three OPTIMIZING. Your System for Photoshop. Tuning for Performance Three OPTIMIZING Your System for Photoshop Tuning for Performance 72 Power, Speed & Automation with Adobe Photoshop This chapter goes beyond speeding up how you can work faster in Photoshop to how to make

More information

Multi-Core Microprocessor Chips: Motivation & Challenges

Multi-Core Microprocessor Chips: Motivation & Challenges Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 4

ECE 571 Advanced Microprocessor-Based Design Lecture 4 ECE 571 Advanced Microprocessor-Based Design Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Homework #1 was due Announcements Homework #2 will be posted

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Introduction Goal: connecting multiple computers to get higher performance

More information

Linux Clusters for High- Performance Computing: An Introduction

Linux Clusters for High- Performance Computing: An Introduction Linux Clusters for High- Performance Computing: An Introduction Jim Phillips, Tim Skirvin Outline Why and why not clusters? Consider your Users Application Budget Environment Hardware System Software HPC

More information

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. Munara Tolubaeva Technical Consulting Engineer 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. notices and disclaimers Intel technologies features and benefits depend

More information

Unit 4 Part A Evaluating & Purchasing a Computer. Computer Applications

Unit 4 Part A Evaluating & Purchasing a Computer. Computer Applications Unit 4 Part A Evaluating & Purchasing a Computer Computer Applications Making Informed Computer Purchasing Decisions Before Buying a Computer Speaking the language of the computer world can be tricky It

More information

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1) Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling

More information

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor 1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic

More information

Characterization of Native Signal Processing Extensions

Characterization of Native Signal Processing Extensions Characterization of Native Signal Processing Extensions Jason Law Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 jlaw@mail.utexas.edu Abstract Soon if

More information

Measurement-based Analysis of TCP/IP Processing Requirements

Measurement-based Analysis of TCP/IP Processing Requirements Measurement-based Analysis of TCP/IP Processing Requirements Srihari Makineni Ravi Iyer Communications Technology Lab Intel Corporation {srihari.makineni, ravishankar.iyer}@intel.com Abstract With the

More information

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation SAS Enterprise Miner Performance on IBM System p 570 Jan, 2008 Hsian-Fen Tsao Brian Porter Harry Seifert IBM Corporation Copyright IBM Corporation, 2008. All Rights Reserved. TABLE OF CONTENTS ABSTRACT...3

More information

Parallel Processing SIMD, Vector and GPU s cont.

Parallel Processing SIMD, Vector and GPU s cont. Parallel Processing SIMD, Vector and GPU s cont. EECS4201 Fall 2016 York University 1 Multithreading First, we start with multithreading Multithreading is used in GPU s 2 1 Thread Level Parallelism ILP

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

The Intel move from ILP into Multi-threading

The Intel move from ILP into Multi-threading The Intel move from ILP into Multi-threading Miguel Pires Departamento de Informática, Universidade do Minho Braga, Portugal migutass@hotmail.com Abstract. Multicore technology came into consumer market

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

DEMYSTIFYING INTEL IVY BRIDGE MICROARCHITECTURE

DEMYSTIFYING INTEL IVY BRIDGE MICROARCHITECTURE DEMYSTIFYING INTEL IVY BRIDGE MICROARCHITECTURE Roger Luis Uy College of Computer Studies, De La Salle University Abstract: Tick-Tock is a model introduced by Intel Corporation in 2006 to show the improvement

More information

Multi-core Programming - Introduction

Multi-core Programming - Introduction Multi-core Programming - Introduction Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Chap. 4 Multiprocessors and Thread-Level Parallelism

Chap. 4 Multiprocessors and Thread-Level Parallelism Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,

More information

Quiz for Chapter 1 Computer Abstractions and Technology

Quiz for Chapter 1 Computer Abstractions and Technology Date: Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

More information

How to Write Fast Code , spring th Lecture, Mar. 31 st

How to Write Fast Code , spring th Lecture, Mar. 31 st How to Write Fast Code 18-645, spring 2008 20 th Lecture, Mar. 31 st Instructor: Markus Püschel TAs: Srinivas Chellappa (Vas) and Frédéric de Mesmay (Fred) Introduction Parallelism: definition Carrying

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Using Intel Streaming SIMD Extensions for 3D Geometry Processing

Using Intel Streaming SIMD Extensions for 3D Geometry Processing Using Intel Streaming SIMD Extensions for 3D Geometry Processing Wan-Chun Ma, Chia-Lin Yang Dept. of Computer Science and Information Engineering National Taiwan University firebird@cmlab.csie.ntu.edu.tw,

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

General Purpose GPU Computing in Partial Wave Analysis

General Purpose GPU Computing in Partial Wave Analysis JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data

More information

Multi-core Architectures. Dr. Yingwu Zhu

Multi-core Architectures. Dr. Yingwu Zhu Multi-core Architectures Dr. Yingwu Zhu What is parallel computing? Using multiple processors in parallel to solve problems more quickly than with a single processor Examples of parallel computing A cluster

More information

DIGITALGLOBE ENHANCES PRODUCTIVITY

DIGITALGLOBE ENHANCES PRODUCTIVITY DIGITALGLOBE ENHANCES PRODUCTIVITY WITH NVIDIA GRID High-performance virtualized desktops transform daily tasks and drastically improve staff efficiency. ABOUT DIGITALGLOBE FIVE REASONS FOR NVIDIA GRID

More information

How Scalable is your SMB?

How Scalable is your SMB? How Scalable is your SMB? Mark Rabinovich Visuality Systems Ltd. What is this all about? Visuality Systems Ltd. provides SMB solutions from 1998. NQE (Embedded) is an implementation of SMB client/server

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information