The Intel move from ILP into Multi-threading


Miguel Pires
Departamento de Informática, Universidade do Minho
Braga, Portugal

Abstract. Multicore technology reached the consumer market in recent years in response to what seem to be the limits of the single-core technological paradigm. With a small increase in chip cost and some engineering development, the first implementations of multithreading delivered relevant improvements by dividing programs into threads and mixing those threads in the pipeline of a single-core processor. Core parallelism (multicore architectures) applied multithreading technology at core level, within the same chip. This paper presents an introduction to simultaneous multithreading technology and its implementations. It explains Intel's hyperthreading approach and analyses multicore techniques, with special emphasis on the Xeon processor (single and dual core) and its competitors. As single-core technology no longer seems to deliver sufficient improvement, multithreading at core level (core parallelism), sometimes combined with multithreading at logical level (such as hyperthreading) and with powerful "thread tailored" software, seems to achieve the best results.

1 Introduction

Thread-level parallelism techniques in a single processor, such as HT (hyperthreading), aim to minimise both horizontal and vertical waste in the processor pipeline by presenting an additional logical processor and mixing several threads at the same time. MC (multicore) processors extend this technique to multiple physical processors on the same chip. Nowadays some processors combine these techniques at instruction and thread (logical and physical processor) level with powerful thread-oriented software, to maximise the processor's multithreading capabilities and thereby achieve better throughput.

Section 2 of this article describes multithreading on a single core (SC), Section 3 covers multithreading on multicore processors, and Section 4 compares the HT and MC performance gains on Intel's microarchitectures and the critical factors in performance.
Finally, Section 5 presents a comparative analysis of two different approaches to MC architecture, Intel Xeon and AMD Opteron.

2 Multithreading on Single Core

Superscalar single-threaded architectures optimise the processor pipeline but do not account for important factors external to the pipeline, such as cache misses, interrupts and branch mispredictions. Figure 1 represents a superscalar, out-of-order processor with multi-level prefetching that can execute several instructions in the same clock cycle. As can be seen, some horizontal and vertical execution bubbles remain, because only one block of instructions runs at a time [1][2][12].

In fine-grain multithreading several threads are active at once, but only one thread issues instructions in any given clock cycle. A switching mechanism selects one thread per cycle, which avoids vertical waste (cycles in which nothing is issued) but offers no solution to horizontal waste (issue slots left unused within a cycle). SMT (simultaneous multithreading) runs several threads at the same time, trying to fill the pipeline vertically and horizontally as much as possible.

Figure 1 Vertical and horizontal waste of non-threaded microarchitectures [3]

As described in [1], the first known implementation of multithreading technology is called TX2. TX2 used multiple threads to support fast context switching to handle I/O functions. Since then the concept has evolved in many ways; the most significant development was a fine-grained multithreading scheme with interleaved scheduling among threads (CDC 6600). The first simulation of a multithreaded superscalar architecture appeared in 1994, and in 1995 the first realistic simulation assessment was published, coining the term simultaneous multithreading (SMT). As Figure 2 shows, experiments in this field [4] indicate that a single-core SMT processor outperforms even a parallel multiprocessor without SMT (see also Figure 3).

Intel introduced Hyperthreading technology in 2002. It is based on SMT techniques and allows two threads to execute simultaneously, in the same clock cycle, on a single processor. Execution can be divided into two threads mixed in the pipeline. Using optimised algorithms, the threads share physical resources such as caches, execution units, branch predictors, control logic and buses. APICs (Advanced Programmable Interrupt Controllers) control the state of each logical processor and are therefore duplicated, as shown in Figure 4 [5].

Figure 2 Performance comparison of SMT to superscalars, multithreaded processors and on-chip multiprocessors (instructions/cycle) [4]

SMT is thus a technique that minimises pipeline waste at multiple levels (thread and instruction level) in order to raise processor utilisation substantially. As a consequence, the number of instructions per clock cycle rises, which benefits both multiprogramming and parallel workloads. Multiscalar processors speculatively execute threads using dynamic branch prediction techniques and squash threads if control (branch) or data (memory) speculation proves incorrect. Although all of these architectures exploit multiple forms of parallelism, only SMT has the ability to dynamically share execution resources among all threads. In contrast, the others partition resources either in space or in time, thereby limiting their flexibility to adapt to available parallelism.

Figure 3 Pipeline Multiprocessor Architecture (based on [5])
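The horizontal and vertical waste discussed in this section can be made concrete with a small counting exercise. The sketch below is a toy model only: the issue width, the two per-cycle traces and the round-robin selection rule are illustrative assumptions, not data from the cited studies or a description of any real processor. It counts wasted issue slots for a single-threaded superscalar, for fine-grain multithreading and for SMT.

```python
# Toy model of issue-slot usage. ISSUE_WIDTH and TRACES are illustrative
# assumptions, not measurements of any real processor.
ISSUE_WIDTH = 4

# Instructions each thread could issue in each cycle (0 = thread is stalled).
TRACES = [
    [2, 0, 0, 3, 1, 0, 2, 4],  # thread 0
    [1, 2, 0, 0, 2, 3, 0, 1],  # thread 1
]


def waste(issued_per_cycle, width=ISSUE_WIDTH):
    """Return (vertical, horizontal) waste measured in issue slots."""
    # Vertical waste: every slot of a cycle in which nothing was issued.
    vertical = sum(width for n in issued_per_cycle if n == 0)
    # Horizontal waste: unused slots in cycles that issued something.
    horizontal = sum(width - n for n in issued_per_cycle if n > 0)
    return vertical, horizontal


def superscalar(traces, width=ISSUE_WIDTH):
    """Single-threaded execution: only thread 0 ever issues."""
    return [min(width, n) for n in traces[0]]


def fine_grained(traces, width=ISSUE_WIDTH):
    """One thread per cycle, chosen round-robin among non-stalled threads."""
    issued = []
    next_thread = 0
    for cycle in range(len(traces[0])):
        chosen = 0
        for k in range(len(traces)):
            t = (next_thread + k) % len(traces)
            if traces[t][cycle] > 0:
                chosen = min(width, traces[t][cycle])
                next_thread = (t + 1) % len(traces)
                break
        issued.append(chosen)
    return issued


def smt(traces, width=ISSUE_WIDTH):
    """All threads may fill the issue slots of the same cycle."""
    return [min(width, sum(per_cycle)) for per_cycle in zip(*traces)]


if __name__ == "__main__":
    for name, policy in [("superscalar", superscalar),
                         ("fine-grain MT", fine_grained),
                         ("SMT", smt)]:
        vertical, horizontal = waste(policy(TRACES))
        print(f"{name}: vertical={vertical}, horizontal={horizontal}")
```

With these particular traces, fine-grain multithreading removes most of the vertical waste but leaves many slots unused within active cycles, while SMT reduces both kinds of waste. Other traces would give different numbers.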

Figure 4 Dual Core and HyperThreading Intel technology (based on [5])

In a super-pipelined microarchitecture, events such as cache misses, interrupts and branch mispredictions can be costly. When one of them occurs in one thread, an HT processor can fill the pipeline with instructions from the other thread and thereby maximise the number of instructions per cycle. Intel reports that, because the logical processors share almost all physical microprocessor resources and only a few small structures were replicated, the die area cost of the implementation was less than 5% of the total area, and the clock-cycle time is not significantly different from that of the non-multithreaded design [5].

This two-logical-processor architecture has many engineering implications. HT changed many basic assumptions about single-threaded out-of-order design. To introduce multithreading, Intel therefore had to change existing algorithms and create new ones to prioritise micro-operations (micro-ops) from different logical processors. Decisions had to be made about memory sharing between the two threads and about pointer manipulation, a subject already complex enough in the x86 architecture. The increased complexity dramatically increases the validation effort. On the platform side, Intel also reviewed and optimised the chipset, BIOS, operating systems and applications [5].

HT improves overall performance under multitasking and when applications are already multithreaded. HT gains therefore depend on how well applications are fitted to take advantage of the technology, as is the case for those that exploit data-parallel execution, and most of the time this requires some engineering [6]. Intel reported that HT achieved a 15% to 27% increase in processor resource utilisation in well-optimised multimedia applications [7].

Figure 5 Hyperthreading technology performance gains on several popular multithreaded software packages [8]

Figure 6 Hyperthreading technology performance boost on multitasking workloads [8]

For years, microprocessor performance improved at a remarkable rate (following Moore's Law) on a single-processor basis. In November 2003, a group of Intel researchers announced the technological limits of microprocessor miniaturisation [16]. Given these constraints on single-core technology, the continuous demand for speed gains, the limits to exploiting instruction-level parallelism (which will not sustain the same growth) and the advances in parallel computing technology, manufacturers saw new opportunities in connecting multiple processors together [1]. As the main aim of parallelism is to maximise the use of the processor, all the acceleration techniques applied to a single core cause more activity and therefore higher temperatures in the processor. The more single-processor technology slows down in performance growth, the more attractive the multiprocessor field becomes.
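The figures Intel reported for HT in this section (a die area cost below 5% and a 15% to 27% increase in resource utilisation) can be combined into a rough cost-benefit ratio. The sketch below is a back-of-the-envelope calculation under a loud assumption: it treats the utilisation increase as if it were a throughput gain, which the source does not claim. The function name is hypothetical.

```python
def gain_per_area(perf_gain, area_overhead):
    """Relative throughput per unit of die area, compared with a baseline of 1.0.

    perf_gain and area_overhead are fractions (0.15 means 15%).
    Assumption: the reported utilisation increase is used as a stand-in
    for a throughput gain.
    """
    return (1.0 + perf_gain) / (1.0 + area_overhead)


if __name__ == "__main__":
    # 5% is the upper bound on area cost quoted from [5], so these ratios
    # are lower bounds under the stated assumption.
    for gain in (0.15, 0.27):
        print(f"gain {gain:.0%}, area +5%: ratio {gain_per_area(gain, 0.05):.3f}")
```

Under that assumption the ratio stays above 1 across the whole reported range, which is consistent with the argument that HT is a cheap way to raise utilisation.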

3 Multithreading on Multicore

Although HT makes it appear that there are two logical processors instead of one, the number of micro-instructions that can be executed in a single clock cycle (the pipeline width) remains the same. Furthermore, single-core SMT technologies have a major impact on long pipelines (as in Intel's NetBurst architecture) but can be inefficient in other microarchitectures [11].

The dissemination of SMT technology and the good performance achieved in parallel computing encouraged manufacturers to look for new opportunities in SMT applied to multiple cores, as shown in [9]. Researchers realised that SMT would lead to higher degrees of parallelism with MP products. With significant advances in microelectronics and the growing use of highly threaded software, Intel announced the MC (multicore) product line in 2005 [10].

Advances in electronics and miniaturisation made it possible to place two cores (and their cache memories) on the same chip. The result resembles having independent processors, but with much faster communication and memory access. Intel first introduced multiprocessor (MP) technology in the server market, with the high-end Pentium (Xeon). In one of the first versions of the multicore Xeon (Figure 7), each core has its own L1 (16 KB) and L2 (1 MB) cache, and 16 MB of L3 cache is shared between the cores. Thanks to 65 nm technology, it was possible to put millions of transistors on a single chip. This Xeon has a hyper-pipelined architecture with a 32-level pipeline.

The advantage of having two cores on the same chip is the possibility of true processor-level multithreading, with much faster communication between the cores (one of the problems of parallel computing) and more efficient shared memory management. In the first versions, all Xeon processors combined MC with HT technology, which means that in each core two threads can be mixed in the same pipeline and run at the same time (four threads in total for a dual core). If the workload has fewer threads than the maximum supported by the processor, preference naturally goes to execution on separate cores, because of the limitations of HT relative to MC explained above. In some MC versions HT does not exist or can be switched off, because in some computing markets HT is not efficient in the MC approach.

Processors exploit the full advantage of MC when execution is thread tuned (naturally or forced), and there are some computing markets where work tends to be naturally threaded. In such cases, for example the server market, this multithreaded execution model can be scaled as far as microelectronics (and the market) allow. One example is Intel's QuadCore, tailored to the server market, which applies the same principle to four cores (also with HT).

But given the possibility of many cores, can ordinary systems take advantage of hyperthreading? To what limit? Many authors think that systems do not yet exploit the possibilities of multithreaded execution, because this paradigm is relatively recent in the software industry, so there is much more to gain in this direction [10]. Research in this domain suggests that a very high number of threads leads to complicated and inefficient resource sharing, even on powerful processors. Some authors therefore think that the processor should decide the optimal number of threads to run [13].

Figure 7 View of Intel Xeon Dual Core Chip [16]
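The placement preference described in this section (use separate cores first, and fall back to the second HT context of a core only when every core is already busy) can be sketched as a simple assignment rule. This is a minimal illustration and not the policy of any particular operating system or processor; the function name and the (core, context) numbering are hypothetical.

```python
def place_threads(n_threads, n_cores=2, contexts_per_core=2):
    """Assign threads to (core, context) pairs, filling distinct cores first.

    Defaults model a dual-core chip with two HT contexts per core,
    i.e. at most four hardware threads.
    """
    capacity = n_cores * contexts_per_core
    if not 0 <= n_threads <= capacity:
        raise ValueError(f"expected between 0 and {capacity} threads")
    placements = []
    for i in range(n_threads):
        # Thread i goes to core (i mod n_cores); the HT context index only
        # increases after every core has received one thread.
        context, core = divmod(i, n_cores)
        placements.append((core, context))
    return placements


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "thread(s):", place_threads(n))
```

With two threads, each thread gets a core to itself and no pipeline is shared. Only the third and fourth threads have to share a core through HT.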

4 SMT Performance comparison: SC and MC

As mentioned in Section 2, the benefit of multithreading techniques depends on how well software takes advantage of the processor's multithreading capabilities. Experiments in this field [10] demonstrate that MC demands specific adjustments in compilers and other software. They suggest that, to take full advantage of MC innovations, compilers should be core-tuned (2 cores, 4 cores, etc.).

Placing the two SMT techniques side by side, a dual core running two threads is more efficient than a single core running the same two threads, because of the resource sharing within a single core. It is not always efficient to use several contexts (virtual processors) in each core, because a high number of threads may cause sharing conflicts (depending on the application), but multicore processors are certainly faster than unicore ones when applications (and mainly compilers) take advantage of multithreading. The MC gain can reach 30% when the degree of parallel execution is high, but in common applications it will stay below that value [7]. Figure 8 shows the MC effect at micro-instruction level in various scenarios (cores, threads).

Figure 8 Normalized execution time of the benchmarks on the SMT multiprocessor. The sequential execution time is used as a reference for the normalization.

5 Comparison with competitors

As usual, the best technological examples are introduced first in the high-value market, and the high-performance server market is a good example. The following comparison gives an overview of the characteristics and performance of two major competitors in multicore processors, the Intel Xeon 7140M at 3.4 GHz and the AMD Opteron 8220SE at 2.8 GHz [12].

Intel presents a dual-core architecture that is hyper-pipelined (31 stages), superscalar and hyperthreaded, with L1 (16 KB) and L2 (1 MB) caches and a 16 MB shared L3 cache. The AMD Opteron is a dual core with 64 KB of L1 and 1 MB of L2 per core, plus HyperTransport and AMD virtualization technologies. Both processors deliver high performance levels despite very distinct architectural options. The Xeon's 16 MB L3 cache is an advantage in ERP applications and databases. On the other hand, the Opteron achieves better scalability thanks to the bus between cores and memory and to its larger bandwidth.

Because of its long pipeline, the Xeon uses hyperthreading technology to optimise threads in each core. Intel simply placed two Prescott (previous series) cores on the same chip. AMD, on the other side, developed a new memory controller between the cores. This means that there is no need to communicate through the chipset, because memory is addressed directly over a dedicated bus named HyperTransport, which provides better bandwidth. Communication with the other resources also goes through HyperTransport, so there is no need to share the resources of the super I/O (IDE controller, SATA, AGP, PCI-Express, USB, etc.). HyperTransport is a high-performance, low-latency, full-duplex connection, and the same scheme can be applied to expand from dual core to quad core (Fig. 9).

Figure 9 AMD QuadCore technologies with HyperTransport [15]
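Figure 8 reports execution times normalized to the sequential run. As a minimal sketch of how such a normalization is computed, the snippet below divides each measured time by the sequential time and also derives the corresponding speedup. The configuration labels and timings are made-up placeholders, not values taken from Figure 8 or from [10].

```python
def normalized_times(sequential_time, measured):
    """Map each configuration to its execution time relative to the sequential run."""
    if sequential_time <= 0:
        raise ValueError("sequential_time must be positive")
    return {config: t / sequential_time for config, t in measured.items()}


def speedups(sequential_time, measured):
    """Speedup is the reciprocal of the normalized execution time."""
    return {config: sequential_time / t for config, t in measured.items()}


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    sequential = 8.0
    runs = {
        "1 core, 2 threads (HT)": 6.4,
        "2 cores, 2 threads": 4.0,
        "2 cores, 4 threads (HT)": 3.2,
    }
    norm = normalized_times(sequential, runs)
    fast = speedups(sequential, runs)
    for config in runs:
        print(f"{config}: normalized {norm[config]:.2f}, speedup {fast[config]:.2f}x")
```

A normalized time below 1.0 means the configuration beats the sequential run, so lower bars in a chart such as Figure 8 indicate better performance.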

6 Conclusions

Simultaneous multithreading is a processor design methodology that combines instruction-level parallelism and thread-level parallelism. The aim is to increase the gains of conventional superscalar processors, on a single processor or, more recently, on a multiprocessor-in-one-chip basis. Multithreading techniques divide execution into several independent threads. In single-core SMT technology (HT in Intel's NetBurst architecture), physical resources are simply shared through optimised thread mixing; the pipeline does not get any wider, which may cause some inefficiency at micro-instruction level. On the other hand, single-core SMT is inefficient with short pipelines like that of AMD's Opteron, because there are not many wait times in the pipeline.

More recently, building on SMT, parallel computing knowledge and recent progress in microelectronics, manufacturers moved to the multicore ("many processors in one chip") concept, which allows threads to be distributed across the available cores. Core parallelism is a model that is only at its beginning and can be improved up to the limits of electronic miniaturisation. Replicating MC technology seems to give good results and can be a key to faster microprocessor architectures. However, off-chip bandwidth does not seem to increase at the same pace, and this will certainly constrain the number of useful cores on a chip: the supply chain feeding the cores will not be fast enough to deliver all the data that the cores can process [14].

References

[1] Hennessy, J. L., Patterson, D. A.: Computer Architecture: A Quantitative Approach, Chapter 6, 3rd edition, Elsevier Science USA, 2003
[2] Lo, J., Emer, J., Levy, H., Stamm, R., Tullsen, M.: Converting Thread-Level Parallelism to Instruction-Level Parallelism via SMT, ACM, Vol. 15, No. 3, August
[3] Tullsen, D., Levy, H.: Simultaneous Multithreading: Maximizing On-Chip Parallelism, ACM Transactions on Computer Systems, 1995
[4] Eggers, S., Emer, J., Levy, H., Lo, J., Stamm, R., Tullsen, D.: SMT: A Platform for Next-Generation Processors, IEEE Micro, 1997
[5] Marr, D., Binns, F., Hill, D., Hinton, G., Koufaty, D., Miller, J., Upton, M.: Hyper-Threading Technology Architecture and Microarchitecture, Intel Technology Journal Q1, 2002
[6] Magro, W., Petersen, P., Shah, S.: Hyper-Threading Technology: Impact on Compute-Intensive Workloads, Intel Technology Journal Q1, 2002
[7] Chen, Y., Holliman, M., Debes, E., Zheltov, S., Knyazev, A., Bratanov, S., Belenov, R., Santos, I.: Media Applications on Hyper-Threading Technology, Intel Technology Journal, 2002
[8] Koufaty, D., Marr, D. T.: Hyperthreading Technology in the Netburst Microarchitecture, IEEE Computer Society, 2003
[9] Spracklen, L., Abraham, G.: Chip Multithreading: Opportunities and Challenges, IEEE, 2005
[10] Curtis-Maury, M., Ding, X., Antonopoulos, C., Nikolopoulos, D.: An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors, DCS, The College of William and Mary, 2002
[11] Hewlett-Packard Development Company: Characterizing x86 Processors for Industry-Standard Servers: AMD Opteron and Intel Xeon, Technology Brief, 2nd Edition, 2005
[12] Silva, D., Ferreira, A.: Comparação dos MultiProcessadores Intel Xeon Dual Core e AMD Opteron, IST DEI, 2006/2007
[13] Curtis-Maury, M., Wang, T., Antonopoulos, C., Nikolopoulos, D.: Integrating Multiple Forms of Multithreaded Execution on SMT Processors, College of William and Mary, 2005
[14] yperthreading/, visited on January 25, 2007
Dua, R., Lokhande, B.: A Comparative Study of SMT and CMP Multiprocessors, Princeton University, ee8365, 2006
[15] Cardoso, B., Rosa, S., Fernandes, T.: Multicore, Unicamp, 2005


More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 5)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 5) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 5) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3, 3.9, and Appendix C) Hardware

More information

HW Trends and Architectures

HW Trends and Architectures Pavel Tvrdík, Jiří Kašpar (ČVUT FIT) HW Trends and Architectures MI-POA, 2011, Lecture 1 1/29 HW Trends and Architectures prof. Ing. Pavel Tvrdík CSc. Ing. Jiří Kašpar Department of Computer Systems Faculty

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors:!

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

CO403 Advanced Microprocessors IS860 - High Performance Computing for Security. Basavaraj Talawar,

CO403 Advanced Microprocessors IS860 - High Performance Computing for Security. Basavaraj Talawar, CO403 Advanced Microprocessors IS860 - High Performance Computing for Security Basavaraj Talawar, basavaraj@nitk.edu.in Course Syllabus Technology Trends: Transistor Theory. Moore's Law. Delay, Power,

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

The Implications of Multi-core

The Implications of Multi-core The Implications of Multi- What I want to do today Given that everyone is heralding Multi- Is it really the Holy Grail? Will it cure cancer? A lot of misinformation has surfaced What multi- is and what

More information

Parallel Systems I The GPU architecture. Jan Lemeire

Parallel Systems I The GPU architecture. Jan Lemeire Parallel Systems I The GPU architecture Jan Lemeire 2012-2013 Sequential program CPU pipeline Sequential pipelined execution Instruction-level parallelism (ILP): superscalar pipeline out-of-order execution

More information

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8

More information

Page 1. Review: Dynamic Branch Prediction. Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400)

Page 1. Review: Dynamic Branch Prediction. Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400) CS252 Graduate Computer Architecture Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400) April 4, 2001 Prof. David A. Patterson Computer Science 252 Spring 2001 Lec

More information

White Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem)

White Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem) White Paper First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem) Introducing a New Dynamically and Design- Scalable Microarchitecture that Rewrites the Book On Energy Efficiency

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1 CS252 Spring 2017 Graduate Computer Architecture Lecture 14: Multithreading Part 2 Synchronization 1 Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture

More information

MICROPROCESSOR TECHNOLOGY

MICROPROCESSOR TECHNOLOGY MICROPROCESSOR TECHNOLOGY Assis. Prof. Hossam El-Din Moustafa Lecture 20 Ch.10 Intel Core Duo Processor Architecture 2-Jun-15 1 Chapter Objectives Understand the concept of dual core technology. Look inside

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 19: Multiprocessing Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CSE 502 Stony Brook University] Getting More

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction

More information

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1) Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

Inside Intel Core Microarchitecture

Inside Intel Core Microarchitecture White Paper Inside Intel Core Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Multithreading Processors and Static Optimization Review. Adapted from Bhuyan, Patterson, Eggers, probably others

Multithreading Processors and Static Optimization Review. Adapted from Bhuyan, Patterson, Eggers, probably others Multithreading Processors and Static Optimization Review Adapted from Bhuyan, Patterson, Eggers, probably others Schedule of things to do By Wednesday the 9 th at 9pm Please send a milestone report (as

More information

CS377P Programming for Performance Multicore Performance Multithreading

CS377P Programming for Performance Multicore Performance Multithreading CS377P Programming for Performance Multicore Performance Multithreading Sreepathi Pai UTCS October 14, 2015 Outline 1 Multiprocessor Systems 2 Programming Models for Multicore 3 Multithreading and POSIX

More information

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications September 2013 Navigating between ever-higher performance targets and strict limits

More information

Fundamentals of Computer Design

Fundamentals of Computer Design Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

The Power Wall. Why Aren t Modern CPUs Faster? What Happened in the Late 1990 s?

The Power Wall. Why Aren t Modern CPUs Faster? What Happened in the Late 1990 s? The Power Wall Why Aren t Modern CPUs Faster? What Happened in the Late 1990 s? Edward L. Bosworth, Ph.D. Associate Professor TSYS School of Computer Science Columbus State University Columbus, Georgia

More information

Processing Unit CS206T

Processing Unit CS206T Processing Unit CS206T Microprocessors The density of elements on processor chips continued to rise More and more elements were placed on each chip so that fewer and fewer chips were needed to construct

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012 CPU Architecture Overview Varun Sampath CIS 565 Spring 2012 Objectives Performance tricks of a modern CPU Pipelining Branch Prediction Superscalar Out-of-Order (OoO) Execution Memory Hierarchy Vector Operations

More information

EECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont

EECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont Lecture 18 Simultaneous Multithreading Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi,

More information

Multicore Hardware and Parallelism

Multicore Hardware and Parallelism Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3

More information

ECE/CS 250 Computer Architecture. Summer 2016

ECE/CS 250 Computer Architecture. Summer 2016 ECE/CS 250 Computer Architecture Summer 2016 Multicore Dan Sorin and Tyler Bletsch Duke University Multicore and Multithreaded Processors Why multicore? Thread-level parallelism Multithreaded cores Multiprocessors

More information

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous

More information

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John Computer Performance Evaluation and Benchmarking EE 382M Dr. Lizy Kurian John Evolution of Single-Chip Transistor Count 10K- 100K Clock Frequency 0.2-2MHz Microprocessors 1970 s 1980 s 1990 s 2010s 100K-1M

More information

Computer Architecture

Computer Architecture Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core

More information

Simultaneous Multithreading and the Case for Chip Multiprocessing

Simultaneous Multithreading and the Case for Chip Multiprocessing Simultaneous Multithreading and the Case for Chip Multiprocessing John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 2 10 January 2019 Microprocessor Architecture

More information

Pipelined Hash-Join on Multithreaded Architectures

Pipelined Hash-Join on Multithreaded Architectures Pipelined Hash-Join on Multithreaded Architectures Philip Garcia University of Wisconsin-Madison Madison, WI 53706 USA pcgarcia@wisc.edu Henry F. Korth Lehigh University Bethlehem, PA 805 USA hfk@lehigh.edu

More information

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University 18-742 Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core Prof. Onur Mutlu Carnegie Mellon University Research Project Project proposal due: Jan 31 Project topics Does everyone have a topic?

More information

An In-order SMT Architecture with Static Resource Partitioning for Consumer Applications

An In-order SMT Architecture with Static Resource Partitioning for Consumer Applications An In-order SMT Architecture with Static Resource Partitioning for Consumer Applications Byung In Moon, Hongil Yoon, Ilgu Yun, and Sungho Kang Yonsei University, 134 Shinchon-dong, Seodaemoon-gu, Seoul

More information

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel

More information

Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture These slides are partly from 18-742 Fall 2012, Parallel Computer Architecture, Lecture 9: Multithreading

More information

ECE 588/688 Advanced Computer Architecture II

ECE 588/688 Advanced Computer Architecture II ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Fall 2009 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2009 1 When and Where? When:

More information

CS 654 Computer Architecture Summary. Peter Kemper

CS 654 Computer Architecture Summary. Peter Kemper CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining

More information

Lecture: SMT, Cache Hierarchies. Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1)

Lecture: SMT, Cache Hierarchies. Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) Lecture: SMT, Cache Hierarchies Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized

More information

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

This Material Was All Drawn From Intel Documents

This Material Was All Drawn From Intel Documents This Material Was All Drawn From Intel Documents A ROAD MAP OF INTEL MICROPROCESSORS Hao Sun February 2001 Abstract The exponential growth of both the power and breadth of usage of the computer has made

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) #1 Lec # 2 Fall 2003 9-10-2003 Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing

More information

Parallel Programming Multicore systems

Parallel Programming Multicore systems FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have

More information

Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000

Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000 Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000 Mitesh R. Meswani and Patricia J. Teller Department of Computer Science, University

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved

More information

Intel Hyper-Threading technology

Intel Hyper-Threading technology Intel Hyper-Threading technology technology brief Abstract... 2 Introduction... 2 Hyper-Threading... 2 Need for the technology... 2 What is Hyper-Threading?... 3 Inside the technology... 3 Compatibility...

More information