The Intel move from ILP into Multi-threading
Miguel Pires
Departamento de Informática, Universidade do Minho
Braga, Portugal

Abstract. Multicore technology has reached the consumer market in recent years as a response to what seem to be the limits of the single-core technological paradigm. With a small increase in chip cost and some engineering effort, the first implementations of multithreading delivered relevant improvements by dividing programs into threads and mixing those threads in the processor pipeline of a single core. Core parallelism (multicore architectures) applies multithreading at the core level, with several cores on the same chip. This paper introduces simultaneous multithreading technology and its implementations. It explains Intel's Hyper-Threading approach and analyses multicore techniques, with special emphasis on the Xeon processor (single and dual core) and its competitors. Since single-core technology no longer seems to deliver sufficient improvement, multithreading at the core level (core parallelism), sometimes also at the logical level (as in Hyper-Threading), combined with powerful "thread-tailored" software, seems to achieve the best results.

1 Introduction

Thread-level parallelism techniques in a single processor, such as HT (Hyper-Threading), tend to minimise both horizontal and vertical waste in the processor pipeline by presenting an additional logical processor and mixing several threads at the same time. MC (multicore) processors extend this technique to several physical processors on the same chip. Some current processors combine these techniques at the instruction and thread (logical and physical processor) levels with powerful thread-oriented software to maximise a processor's multithreading capabilities and thus achieve better throughput. Section 2 describes multithreading on a single core (SC), Section 3 describes multithreading on multicore (MC), and Section 4 compares the HT and MC performance gains on Intel's microarchitectures and the critical factors in performance.
Finally, Section 5 presents a comparative analysis of two different approaches to MC architecture, the Intel Xeon and the AMD Opteron.

2 Multithreading on Single Core

Superscalar single-threaded architectures optimise the processor pipeline but do not account for important factors that are external to the pipeline, such as cache misses, interrupts and branch mispredictions. Figure 1 represents a superscalar, out-of-order processor with multi-level prefetching that can execute several instructions in the same clock cycle. As can be seen, both horizontal and vertical execution bubbles appear, because only one thread runs at a time [1][2][12]. Vertical waste is a cycle in which no instruction issues at all; horizontal waste is an unused issue slot in a cycle in which some instructions do issue [3].

Figure 1 Vertical and horizontal waste of non-threaded microarchitectures [3]

In fine-grained multithreading several threads are active, but only one thread issues instructions in any given clock cycle. A switching mechanism selects one thread per cycle, which removes vertical waste (a cycle that a stalled thread cannot use is given to another thread) but does nothing about horizontal waste, because the issue slots that the selected thread cannot fill stay empty. SMT (simultaneous multithreading) runs several threads at the same time, trying to fill the pipeline both vertically and horizontally as much as possible.

As described in [1], the first known implementation of multithreading technology is the TX-2, which used multiple threads to support fast context switching for I/O functions. Many evolutions of this concept followed, the most significant being a fine-grained multithreading scheme with interleaved scheduling among threads (CDC 6600). The first simulation of a multithreaded superscalar architecture appeared in 1994, and in 1995 the first realistic simulation assessment was published, coining the term simultaneous multithreading (SMT). As Figure 2 shows, experiments in this field [4] found that an SMT single core performs better even than an on-chip multiprocessor without SMT (see also Figure 3).

Figure 2 Performance comparison of SMT to superscalars, multithreaded processors and on-chip multiprocessors (instructions/cycle) [4]

Intel introduced Hyper-Threading technology in 2002. Based on SMT techniques, it allows two threads to issue in the same clock cycle on a single processor, so execution is divided into two threads that are mixed in the pipeline. Using optimised algorithms, the threads share physical resources such as caches, execution units, branch predictors, control logic and buses. The APICs (Advanced Programmable Interrupt Controllers) control the state of each logical processor and are therefore duplicated, as shown in Figure 4 [5].

SMT is thus an evolutionary technique that minimises pipeline waste at multiple levels (thread and instruction) to substantially raise processor utilisation. As a consequence, the number of instructions per clock cycle rises, which benefits both multiprogrammed and parallel workloads. Multiscalar processors speculatively execute threads using dynamic branch prediction techniques and squash threads if control (branch) or data (memory) speculation is incorrect. Although all of these architectures exploit multiple forms of parallelism, only SMT can dynamically share execution resources among all threads. In contrast, the others partition resources either in space or in time, which limits their flexibility to adapt to the available parallelism.

Figure 3 Pipeline Multiprocessor Architecture (based on [5])
In a super-pipelined microarchitecture, events such as cache misses, interrupts and branch mispredictions can be costly, so when one of them occurs in one thread, an HT processor can fill the pipeline with instructions from the other thread and thereby maximise the number of instructions per cycle. Intel reports that, because the logical processors share almost all physical microprocessor resources and only a few small structures were replicated, the die-area cost of the implementation was less than 5% of the total area, and the clock-cycle time is not significantly different from that of the non-multithreaded design [5].

This two-logical-processor architecture has many engineering implications. HT changed many basic assumptions of single-threaded out-of-order design, so to introduce multithreading Intel had to modify existing algorithms and create new ones to prioritise micro-operations (micro-ops) from different logical processors. Design choices also had to be made about memory sharing between the two threads and about pointer manipulation, a subject already complex enough in the x86 architecture. The increased complexity dramatically increases the validation effort. On the platform side, the chipset, BIOS, operating systems and applications were also reviewed and optimised [5].

HT improves overall performance when multitasking and when applications are already multithreaded. In this way, HT gains depend on how well applications are adapted to take advantage of the technology, as is the case for applications that exploit data-parallel execution; most of the time this requires some engineering effort [6]. Intel reported that HT achieved a 15% to 27% increase in processor resource utilisation in well-optimised multimedia applications [7].

Figure 4 Dual Core and Hyper-Threading Intel technology (based on [5])

Figure 5 Hyper-Threading technology performance gains on several popular multithreaded software packages [8]

Figure 6 Hyper-Threading technology performance boost on multitasking workloads [8]

Over a long period, microprocessor performance improved at remarkable rates on a single-processor basis, following Moore's Law. In November 2003, a group of Intel researchers announced the technological limits of microprocessor miniaturisation [16]. Because of these single-core constraints, the continuing demand for speed, the limits of exploiting instruction-level parallelism (which will not sustain the same growth) and the advances in parallel computing technology, manufacturers saw new opportunities in connecting multiple processors together [1]. Moreover, since the main aim of these accelerating techniques is to maximise processor utilisation, all of them cause more activity in a single core and therefore higher temperatures in the processor. The more single-processor performance growth slows down, the more attractive the multiprocessor field becomes.
3 Multithreading on Multicore

Although HT makes one processor appear as two logical processors, the number of micro-instructions that can be executed in a single clock cycle (the pipeline width) stays the same. Furthermore, single-core SMT has a major impact on long pipelines (as in Intel's NetBurst architecture) but can be inefficient in other microarchitectures [11]. The spread of SMT technology and the good performance achieved in parallel computing encouraged manufacturers to look for new opportunities in applying SMT to multiple cores, as shown in [9]. Researchers realised that SMT would lead to higher degrees of parallelism in multiprocessor (MP) products. With significant advances in microelectronics and growing use of heavily threaded software, Intel announced its MC (multicore) product line in 2005 [10].

Advances in electronics and miniaturisation made it possible to place two cores (and their cache memories) on the same chip. This is like having independent processors, but with much faster communication and memory access. Intel first introduced this multiprocessor (MP) technology in the high-end server market, in its Xeon line. In one of the first multicore Xeon versions (Figure 7), each core has its own L1 (16 KB) and L2 (1 MB) caches, and the 16 MB L3 cache is shared between the cores. The 65 nm process made it possible to fit this many transistors on a single chip. This Xeon has a hyper-pipelined architecture with 31 pipeline stages. In the first versions, all Xeon processors combined MC with HT technology, which means that in each core two threads can be mixed in the same pipeline and run at the same time (four threads for a dual core). If the workload has fewer threads than the maximum the processor supports, preference naturally goes to the separate cores, because of the limitations of HT compared with MC efficiency explained above. In some MC versions HT is absent or can be switched off, because in some computing markets HT is not efficient in combination with MC.

Figure 7 View of Intel Xeon Dual Core Chip [16]

Processors exploit the full advantage of MC when execution is thread-tuned (naturally or by design), and in some computing markets, such as the server market, work tends to be naturally threaded. In these cases it is possible to exploit multithreaded execution as far as microelectronics (and the market) allow. One example is Intel's quad-core processor tailored to the server market, which applies the same principle to four cores (and also HT). But with many cores available, can ordinary systems still take advantage of Hyper-Threading, and up to what limit? Many authors think that systems do not yet exploit the possibilities of multithreaded execution, because this paradigm is relatively recent in the software industry, so much more software remains to be written this way [10]. Research in this domain suggests that a very high number of threads leads to complicated and inefficient resource sharing, even in powerful processors; some authors therefore think that the processor should decide the optimal number of threads to run [13]. The advantage of having two cores on the same chip is true multiprocessor multithreading, with much faster communication between the cores (one of the classic problems of parallel computing) and more efficient shared-memory management.
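The preference for separate cores over HT contexts, when there are fewer threads than hardware contexts, can be expressed as a simple spread-first placement rule. The sketch below is illustrative only: `place_threads` and its (core, context) numbering are hypothetical and do not correspond to any operating-system scheduler API, and real schedulers also weigh cache sharing, power and load.

```python
from typing import List, Tuple


def place_threads(n_threads: int, cores: int,
                  contexts_per_core: int) -> List[Tuple[int, int]]:
    """Map software threads to (core, HT context) pairs, spread-first.

    Every core receives one thread before any core's second (HT) context
    is used, so each thread gets a whole core while cores remain free.
    """
    capacity = cores * contexts_per_core
    if not 0 <= n_threads <= capacity:
        raise ValueError(
            f"{n_threads} threads do not fit in {capacity} hardware contexts")
    # Thread i goes to core i % cores, on context i // cores of that core.
    return [(i % cores, i // cores) for i in range(n_threads)]


if __name__ == "__main__":
    # Dual core with two HT contexts per core: four hardware threads in total.
    for n in range(1, 5):
        print(n, place_threads(n, cores=2, contexts_per_core=2))
```

For a dual-core processor with HT, two threads land on separate cores, `[(0, 0), (1, 0)]`, and only a third thread starts sharing core 0 through its second context.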
4 SMT Performance Comparison: SC and MC

As noted in Section 2, the benefit of multithreading techniques depends on how well software exploits the processor's multithreading capabilities. Experiments in this field [10] show that MC processors demand specific adjustments in compilers and other software; the authors suggest that, to take full advantage of MC, compilers should be tuned to the number of cores (2 cores, 4 cores, etc.). Placing the two SMT approaches side by side, a dual core running two threads is more efficient than a single core running the same two threads, because in the single core the threads must share one set of resources. Using several contexts (virtual processors) in each core is not always efficient, because a high number of threads may cause sharing conflicts (depending on the application), but multicore processors are faster than single-core ones when applications (and, above all, compilers) take advantage of multithreading. The MC gain can reach 30% when the degree of parallel execution is high, but in common applications it will be lower [7]. Figure 8 shows the MC effect at micro-instruction level in various scenarios (cores, threads).

Figure 8 Normalized execution time of the benchmarks on the SMT multiprocessor. The sequential execution time is used as the reference for the normalization.

5 Comparison with Competitors

As usual, the best technological examples are introduced first in the high-value market, and the high-performance server market is a good example. The following comparison gives an overview of the characteristics and performance of two major competitors in multicore processors, the Intel Xeon 7140M at 3.4 GHz and the AMD Opteron 8220SE at 2.8 GHz [12]. The Intel processor is a dual-core, hyper-pipelined (31 stages), superscalar design with Hyper-Threading, a 16 KB L1 and a 1 MB L2 per core, and a 16 MB shared L3 cache. The AMD Opteron is a dual core with a 64 KB L1 and a 1 MB L2 per core, plus HyperTransport and AMD Virtualization technologies. Both processors deliver high performance despite very different architectural choices. The Xeon's 16 MB L3 cache is an advantage for ERP applications and databases. On the other hand, the Opteron scales better thanks to its direct connections between cores and memory and its larger bandwidth. Because of its long pipeline, the Xeon uses Hyper-Threading to optimise the use of each core by two threads.

Intel simply placed two Prescott (previous-series) cores on the same chip. AMD, on the other hand, developed a new memory controller integrated on the chip and shared by the cores, so memory is accessed directly instead of through the chipset, which gives better bandwidth. Communication with the other processors and I/O resources uses dedicated HyperTransport links, so the cores do not have to compete for a single shared bus to reach the super I/O devices (IDE controller, SATA, AGP, PCI Express, USB, etc.). HyperTransport is a high-performance, low-latency, full-duplex link, and the same scheme can be extended from dual core to quad core (Figure 9).

Figure 9 AMD QuadCore technologies with HyperTransport [15]
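Figure 8 reports execution times normalized to the sequential run. The arithmetic behind such a chart is sketched below with hypothetical timings (they are not the values plotted in Figure 8 or reported in [10]): each configuration's time is divided by the sequential time, so a normalized value below 1 means that configuration is faster, and the reciprocal of the normalized time is the speedup.

```python
from typing import Dict

# Hypothetical execution times in seconds (illustrative, not measurements).
TIMES: Dict[str, float] = {
    "sequential": 10.0,
    "1 core x 2 threads": 8.5,
    "2 cores x 1 thread": 6.0,
    "2 cores x 2 threads": 5.5,
}


def normalize(times: Dict[str, float],
              reference: str = "sequential") -> Dict[str, float]:
    """Divide every execution time by the reference (sequential) time."""
    base = times[reference]
    return {config: t / base for config, t in times.items()}


def speedup(times: Dict[str, float],
            reference: str = "sequential") -> Dict[str, float]:
    """Speedup is the reciprocal of the normalized execution time."""
    return {config: 1.0 / n
            for config, n in normalize(times, reference).items()}


if __name__ == "__main__":
    norm = normalize(TIMES)
    sp = speedup(TIMES)
    for config in TIMES:
        print(f"{config:20} normalized={norm[config]:.2f} "
              f"speedup={sp[config]:.2f}")
```

Normalizing against the sequential run, as the Figure 8 caption describes, makes configurations with different core and thread counts directly comparable on one axis.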
6 Conclusions

Simultaneous multithreading is a processor design methodology that combines instruction-level parallelism and thread-level parallelism. Its aim is to increase the gains of conventional superscalar processors, on a single-processor basis or, more recently, with multiple processors on one chip. Multithreading techniques divide execution into several independent threads. In single-core SMT technology (HT in Intel's NetBurst architecture), physical resources are shared through optimised thread mixing, but the pipeline does not get wider, which may cause some inefficiency at the micro-instruction level. On the other hand, single-core SMT is inefficient in shorter pipelines such as AMD's Opteron, because there are few wait times in the pipeline to fill. More recently, building on SMT and parallel-computing knowledge and on progress in microelectronics, manufacturers moved to the multicore concept (many processors on one chip), which distributes threads across the available cores. Core parallelism is a model that is only beginning and can be improved up to the limits of electronic miniaturisation. Replicating cores seems to give good results and may be a key to faster microprocessor architectures. However, off-chip bandwidth does not seem to grow at the same pace, and this will certainly constrain the number of useful cores on a chip: the memory system will not be able to supply data as fast as the cores can process it [14].

References

[1] Hennessy, J. L., Patterson, D. A.: Computer Architecture: A Quantitative Approach, Chapter 6, 3rd edition, Elsevier Science, USA, 2003
[2] Lo, J., Emer, J., Levy, H., Stamm, R., Tullsen, D. M.: Converting Thread-Level Parallelism to Instruction-Level Parallelism via Simultaneous Multithreading, ACM, Vol. 15, No. 3, August
[3] Tullsen, D., Levy, H.: Simultaneous Multithreading: Maximizing On-Chip Parallelism, ACM Transactions on Computer Systems, 1995
[4] Eggers, S., Emer, J., Levy, H., Lo, J., Stamm, R., Tullsen, D.: Simultaneous Multithreading: A Platform for Next-Generation Processors, IEEE Micro, 1997
[5] Marr, D., Binns, F., Hill, D., Hinton, G., Koufaty, D., Miller, J., Upton, M.: Hyper-Threading Technology Architecture and Microarchitecture, Intel Technology Journal, Q1 2002
[6] Magro, W., Petersen, P., Shah, S.: Hyper-Threading Technology: Impact on Compute-Intensive Workloads, Intel Technology Journal, Q1 2002
[7] Chen, Y., Holliman, M., Debes, E., Zheltov, S., Knyazev, A., Bratanov, S., Belenov, R., Santos, I.: Media Applications on Hyper-Threading Technology, Intel Technology Journal, 2002
[8] Koufaty, D., Marr, D. T.: Hyperthreading Technology in the Netburst Microarchitecture, IEEE Computer Society, 2003
[9] Spracklen, L., Abraham, S. G.: Chip Multithreading: Opportunities and Challenges, IEEE, 2005
[10] Curtis-Maury, M., Ding, X., Antonopoulos, C., Nikolopoulos, D.: An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors, DCS, The College of William and Mary, 2002
[11] Hewlett-Packard Development Company: Characterizing x86 Processors for Industry-Standard Servers: AMD Opteron and Intel Xeon, Technology Brief, 2nd edition, 2005
[12] Silva, D., Ferreira, A.: Comparação dos MultiProcessadores Intel Xeon Dual Core e AMD Opteron, IST DEI, 2006/2007
[13] Curtis-Maury, M., Wang, T., Antonopoulos, C., Nikolopoulos, D.: Integrating Multiple Forms of Multithreaded Execution on SMT Processors, College of William and Mary, 2005
[14] yperthreading/ (visited January 25, 2007); Dua, R., Lokhande, B.: A Comparative Study of SMT and CMP Multiprocessors, Princeton University, ee8365, 2006
[15] Cardoso, B., Rosa, S., Fernandes, T.: Multicore, Unicamp, 2005
Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8
More informationPage 1. Review: Dynamic Branch Prediction. Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400)
CS252 Graduate Computer Architecture Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400) April 4, 2001 Prof. David A. Patterson Computer Science 252 Spring 2001 Lec
More informationWhite Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem)
White Paper First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem) Introducing a New Dynamically and Design- Scalable Microarchitecture that Rewrites the Book On Energy Efficiency
More informationCS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1
CS252 Spring 2017 Graduate Computer Architecture Lecture 14: Multithreading Part 2 Synchronization 1 Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture
More informationMICROPROCESSOR TECHNOLOGY
MICROPROCESSOR TECHNOLOGY Assis. Prof. Hossam El-Din Moustafa Lecture 20 Ch.10 Intel Core Duo Processor Architecture 2-Jun-15 1 Chapter Objectives Understand the concept of dual core technology. Look inside
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 19: Multiprocessing Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CSE 502 Stony Brook University] Getting More
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationLecture 1: Introduction
Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline
More informationCPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor
Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction
More informationLecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)
Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling
More informationSimultaneous Multithreading (SMT)
Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue
More informationInside Intel Core Microarchitecture
White Paper Inside Intel Core Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationMultithreading Processors and Static Optimization Review. Adapted from Bhuyan, Patterson, Eggers, probably others
Multithreading Processors and Static Optimization Review Adapted from Bhuyan, Patterson, Eggers, probably others Schedule of things to do By Wednesday the 9 th at 9pm Please send a milestone report (as
More informationCS377P Programming for Performance Multicore Performance Multithreading
CS377P Programming for Performance Multicore Performance Multithreading Sreepathi Pai UTCS October 14, 2015 Outline 1 Multiprocessor Systems 2 Programming Models for Multicore 3 Multithreading and POSIX
More informationMulti-threading technology and the challenges of meeting performance and power consumption demands for mobile applications
Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications September 2013 Navigating between ever-higher performance targets and strict limits
More informationFundamentals of Computer Design
Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University
More informationCISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP
CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationThe Power Wall. Why Aren t Modern CPUs Faster? What Happened in the Late 1990 s?
The Power Wall Why Aren t Modern CPUs Faster? What Happened in the Late 1990 s? Edward L. Bosworth, Ph.D. Associate Professor TSYS School of Computer Science Columbus State University Columbus, Georgia
More informationProcessing Unit CS206T
Processing Unit CS206T Microprocessors The density of elements on processor chips continued to rise More and more elements were placed on each chip so that fewer and fewer chips were needed to construct
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationCPU Architecture Overview. Varun Sampath CIS 565 Spring 2012
CPU Architecture Overview Varun Sampath CIS 565 Spring 2012 Objectives Performance tricks of a modern CPU Pipelining Branch Prediction Superscalar Out-of-Order (OoO) Execution Memory Hierarchy Vector Operations
More informationEECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont
Lecture 18 Simultaneous Multithreading Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi,
More informationMulticore Hardware and Parallelism
Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3
More informationECE/CS 250 Computer Architecture. Summer 2016
ECE/CS 250 Computer Architecture Summer 2016 Multicore Dan Sorin and Tyler Bletsch Duke University Multicore and Multithreaded Processors Why multicore? Thread-level parallelism Multithreaded cores Multiprocessors
More informationMulticore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.
CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous
More informationComputer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John
Computer Performance Evaluation and Benchmarking EE 382M Dr. Lizy Kurian John Evolution of Single-Chip Transistor Count 10K- 100K Clock Frequency 0.2-2MHz Microprocessors 1970 s 1980 s 1990 s 2010s 100K-1M
More informationComputer Architecture
Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core
More informationSimultaneous Multithreading and the Case for Chip Multiprocessing
Simultaneous Multithreading and the Case for Chip Multiprocessing John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 2 10 January 2019 Microprocessor Architecture
More informationPipelined Hash-Join on Multithreaded Architectures
Pipelined Hash-Join on Multithreaded Architectures Philip Garcia University of Wisconsin-Madison Madison, WI 53706 USA pcgarcia@wisc.edu Henry F. Korth Lehigh University Bethlehem, PA 805 USA hfk@lehigh.edu
More informationSpring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University
18-742 Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core Prof. Onur Mutlu Carnegie Mellon University Research Project Project proposal due: Jan 31 Project topics Does everyone have a topic?
More informationAn In-order SMT Architecture with Static Resource Partitioning for Consumer Applications
An In-order SMT Architecture with Static Resource Partitioning for Consumer Applications Byung In Moon, Hongil Yoon, Ilgu Yun, and Sungho Kang Yonsei University, 134 Shinchon-dong, Seodaemoon-gu, Seoul
More informationIntroduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes
Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel
More informationComputer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture These slides are partly from 18-742 Fall 2012, Parallel Computer Architecture, Lecture 9: Multithreading
More informationECE 588/688 Advanced Computer Architecture II
ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Fall 2009 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2009 1 When and Where? When:
More informationCS 654 Computer Architecture Summary. Peter Kemper
CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining
More informationLecture: SMT, Cache Hierarchies. Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1)
Lecture: SMT, Cache Hierarchies Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized
More informationSeveral Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining
Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationThis Material Was All Drawn From Intel Documents
This Material Was All Drawn From Intel Documents A ROAD MAP OF INTEL MICROPROCESSORS Hao Sun February 2001 Abstract The exponential growth of both the power and breadth of usage of the computer has made
More informationSimultaneous Multithreading (SMT)
#1 Lec # 2 Fall 2003 9-10-2003 Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing
More informationParallel Programming Multicore systems
FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have
More informationEvaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000
Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000 Mitesh R. Meswani and Patricia J. Teller Department of Computer Science, University
More informationWilliam Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers
William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved
More informationIntel Hyper-Threading technology
Intel Hyper-Threading technology technology brief Abstract... 2 Introduction... 2 Hyper-Threading... 2 Need for the technology... 2 What is Hyper-Threading?... 3 Inside the technology... 3 Compatibility...
More information