Why Parallel Computing?

Size: px
Start display at page:

Download "Why Parallel Computing?"

Transcription

1 Why Parallel Computing? Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong

2 What is Parallel Computing? Software with multiple threads Parallel vs. concurrent Parallel computing executes multiple threads at the same time on multiple processors performance Concurrent computing can either alternate multiple threads on a single processor or execute in parallel on multiple processors convenient design SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 2

3 What is Parallel Architecture? Machines with multiple processors Intel Quad Core i7 (Nehalem) Sony Playstation 3/4 IBM Blue Gene Parallel Computing vs. Distributed Computing Google Data Center Locations Facebook Data Center SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 3

4 What is Parallel Architecture? One definition A parallel computer is a collection of processing elements that cooperate to solve large problems fast Some general issues Resource allocation Data access, communication and synchronization Performance and scalability SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 4

5 Why Study Parallel Computing? 10 years ago Some important applications demand high performance With CPU clock frequency scaling, it is not enough Parallel computing provides higher performance Today Many interesting applications demand high performance CPU clock rates are no longer increasing! Instruction-level parallelism is not increasing either! Parallel computing is the only way to achieve higher performance in the foreseeable future SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 5

6 Why Study Parallel Architecture? Role of a computer architect To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost Parallelism Provides alternative to faster clock for performance Applies at all levels of system design Bits, instructions, loops, threads, SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 6

7 Why Parallel Computing? Application demands Architectural Trends The impact of technology and power SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 7

8 Application Trends A positive feedback cycle between the two Delivered performance Applications demand for performance New Applications More Performance Example application domains Scientific computing: computational fluid dynamics, biology, meteorology, General purpose computing: video, graphics, Commercial computing: databases, SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 8

9 Speedup Goal of applications in using parallel machines Speedup (p processors) = Performance (p processors) Performance (1 processor) For a fixed problem size (input data set), performance = 1/time Speedup fixed problem (p processors) = Time (1 processor) Time (p processors) SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 9

10 Scientific Computing Demand SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong

11 Example: N-body Simulation Simulation of a dynamical system of particles N particles (location, mass, velocity) Influence of physical forces (e.g., gravity) Physics, astronomy, SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 11

12 Example: Graph Computation Google page-rank algorithm Determining the importance of a web page by counting the number and quality of likes to the page Relative importance of a web page u Repeat until stabilized Trillion pages in Web Credit: wikipedia SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 12

13 Engineering Computing Large parallel machines are mainstays in many industries Petroleum reservoir analysis Automotive crash simulation, drag analysis, combustion efficiency Aeronautics airflow analysis, engine efficiency, structural mechanics, electromagnetism Computer-aided design VLSI layout Pharmaceuticals molecular modeling Visualization all the above, entertainment, architecture Financial modeling yield and derivative analysis SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 13

14 Commercial Computing Relies on parallelism for high end Computational power determines scale of business that can be handled e.g.) databases, online-transaction processing, decision support, data mining, data warehousing... TPC benchmarks TPC-C (order entry), TPC-D (decision support) Explicit scaling criteria provided Size of enterprise scales with size of system Problem size no longer fixed as p increases, so throughput is used as a performance measure, transactions per minute (tpm) SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 14

15 Commercial Computing Hardware threads in the Top-10 TPC-C machines SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 15

16 Why Parallel Computing? Application demands Architectural Trends The impact of technology and power SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 16

17 Architectural Trends Architecture translates technology s gifts into performance and capability Four generations of architectural history Tube, transistor, IC, VLSI Greatest delineation in VLSI has been in type of parallelism exploited SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 17

18 Moore s Law Gordon Moore Co-founder of Intel co. 1965: the number of transistors/inch 2 doubles every 1.5 years Credit: shigeru23 (CC-BY-SA-3.0) SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 18

19 Evolution of Intel Microprocessors Intel 8088, core, no cache 29K transistors Intel Nehalem-EX, cores, 24MB cache 2.3B transistors Intel knight landing, 2016(?) 72 cores, 16GB DDR3 LLC(?) 7.1B transistors Figure credits: ExtremeTech, The future of microprocessors, Communications of the ACM, vol. 54, no.5 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 19

20 Transistors & Clock Frequency Credit: The future of computing performance: game over or next level?, The National Academies Press, 2011 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 20

21 Impact of Device Shrinkage What happens when the transistor size shrinks by a factor of x? Clock rate increases by x because wires are shorter Actually less than x, because of power consumption Transistors per unit area increases by x 2 Die size also tends to increase Typically another factor of ~x Raw computing power of the chip goes up by ~ x 4! Typically x 3 is devoted to either on-chip Parallelism: hidden parallelism such as ILP Locality: caches So most programs x 3 times faster, without changing them SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 21

22 Architectural Trends Greatest trend in VLSI generation is Increase in parallelism Up to 1985: bit level parallelism 4-bit 8 bit 16-bit, slows after 32 bit Adoption of 64-bit now under way, 128-bit far (not performance issue) Mid 80s to mid 90s: instruction level parallelism Pipelining and simple instruction sets + compiler advances (RISC) On-chip caches and functional units superscalar execution Greater sophistication: out of order execution, speculation, prediction To deal with control transfer and latency problems Now: thread level parallelism Multicore processors SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 22

23 Phases in VLSI Generation 100,000,000 Bit-level parallelism Instruction-level Thread-level (?) Transistors 10,000,000 1,000, ,000 4bits 8bits 16bits 32bits i80286 R10000 Pentium i80386 R2000 R3000 Multicore / manycore Heterogenous multicore GPGPU 10,000 i8086 i8080 i8008 i4004 Pipelining Superscalar Branch prediction Out-of-order execution VLIW 1, SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

24 Superscalar Execution Exploits instruction-level parallelism With N pipelines, N instructions can be issued (executed) in parallel 1. Load 2. Load 3. Add 4. Add 5. Add R1, R2 6. Store 2-way superscalar execution pipeline SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 24

25 Out-of-order Execution Exploits instruction-level parallelism With N pipelines, N instructions can be issued (executed) in parallel Reorder instructions 1. Load 2. Load 3. Add 4. Add 5. Add R1, R2 6. Store dependency 2-way superscalar execution pipeline SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 25

26 Architecture-driven Advances Credit: The Future of Microprocessors, Communications of the ACM, 2011 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 26

27 Limitation in ILP Fraction of total cycles (%) Speedup Number of instructions issued Instructions issued per cycle (Issue width) Speedup begins to saturate after issue width of 4 Assumption of ideal machine Infinite resources and fetch bandwidth, perfect branch prediction and renaming But with realistic memory Real caches and non-zero miss latencies SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 27

28 Instruction-level Parallelism Concerns Issue waste cycles Full issue slot Empty issue slot Horizontal waste = 9 slots Vertical waste = 12 slots Issue slots (4way) Contributing factors Instruction dependencies Long-latency operations within a thread Credit: D. M. Tullsen, S. J. Eggers, and H. M. Levy, Simultaneous Multithreading: Maximizing On-Chip Parallelism, ISCA 1995 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 28

29 Some Sources of Wasted Issue Slots TLB miss Instruction cache miss Data cache miss Load delays (L1 hit latency) Branch misprediction Instruction dependency Memory conflict Memory hierarchy Control flow Instruction stream SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 29

30 Simulation of 8-issue Superscalar Processor Highly underutilized processor cycles In most of SPEC 92 applications On average < 1.5 IPC (19%) Dominant waste differs by application Credit: D. M. Tullsen, S. J. Eggers, and H. M. Levy, Simultaneous Multithreading: Maximizing On-Chip Parallelism, ISCA 1995 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 30

31 Single-Thread Performance Curve Architecture -driven Technology -driven The rate of single-thread performance improvement has decreased [Computer Architectures Hennessy & Patterson] SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 31

32 Single-chip Multiprocessor (Multicore) Limited instruction-level parallelism à Exploits thread-level parallelism 30% area 6-way superscalar processor 4 2-way multiprocessor Credit: K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, A case for single-chip multiprocessor, ASPLOS, 1996 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 32

33 Why Parallel Computing? Application demands Architectural Trends The impact of technology and power SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 33

34 Desperate Cooling? SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 34

35 Cooling Power delivery, packaging, and cooling costs Increased cost of thermal packing $1 per watt for CPUs more than 35W [Tiwari, DAC98] At high-end, 1W of cooling for every 1W of power Cooling Physical solutions Heatsink, thermal paste, fan APM (advanced power management) BIOS manages clock speed Triggered by thermal diodes ACPI (Advanced Configuration and Power Interface) OS manages power for all devices Idle, nap, sleep modes SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 35

36 Power Density (W/ cm 2 ) Power Consumption Trend Source: Patrick Gelsinger, Shenkar Bokar, Intelâ Reason for power trends Smaller feature size Increased compaction High clock frequency More complicated processors Sun s Surface Rocket Nozzle Nuclear Reactor Hot Plate 8086 P Pentium Year Process Transistors Supply voltage Frequency Power nm 0.2B V 1GHz 100W 90nm 1.2B V 3GHz 175W 55nm 7.8B 0.7V 10GHz 300W 3nm 50B 0.5V 30GHz 540W source: Intel (2002) SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 36

37 Power Density Limits Serial Performance Single-core processor Dynamic power = V 2 fc V = supply voltage, f = frequency, C = capacitance Increasing frequency (f) increases supply voltage (V) à Cubic effect Multicore processor Dynamic power = V 2 fc Increasing # of cores increase capacitance (C) linearly More transistors by Moore s law à more cores for higher performance instead or, same performance at reduced power SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 37

38 Power Consumption (watts) Core2Quad 3.0GHz Core2duo 3.0GHz Core2duo 2.4GHz (mobile) Core2duo 1.2GHz (mobile) SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong Source: MIT IAP

39 Evolution of Microprocessors Chip density continues increase 2x every 2 years Clock speed is not # of cores increase Power no longer increases SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 39

40 Recent Multicore Processors 2H 15: Intel Knight s Landing 60+ cores; 4-way SMT; 16GB (on pkg) 2015: Oracle SPARC M7 32 cores; 8-way fine-grain MT; 64MB cache Fall 14: Intel Haswell 18 cores; 2-way SMT; 45MB cache June 14: IBM Power8 12 cores; 8-way SMT; 96MB cache Sept 13: SPARC M6 12 cores; 8-way fine-grain MT; 48MB cache May 12: AMD Trinity 4 CPU cores; 384 graphics cores Q2 13: Intel Knight s Corner (coprocessor) 61 cores; 2-way SMT; 16MB cache Feb 12: Blue Gene/Q cores; 4-way SMT; 32MB cache Blue Gene/Q compute chip Credit: The IBM Blue Gene/Q Compute Chip, IEEE Micro 2012 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 40

41 Simultaneous Multithreading (SMT) Permit several independent threads to issue to multiple functional units each cycle SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 41 Credit: R. Kalla, B. Sinharoy, J. Tendler, Simultaneous Multi-threading implementation in Power5, HotChips, 2003

42 Simultaneous Multithreading (SMT) SMT are easily added to superscalar architecture Two set of PCs and registers PC shares I-fetch unit Register rename maps second set of registers Credit: R. Kalla, B. Sinharoy, J. Tendler, Simultaneous Multi-threading implementation in Power5, HotChips, 2003 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 42

43 Large-scale Parallel Hardware Top 7 sites in Top500 supercomputers (Nov. 2014) 148th, 169th, 170th Korea Meteorological administration Credit: top500.org SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 43

44 Cores/Socket & Cores in Top500 fef Systems k-8k 100 8k-16k 50 16k-32k 32k-64k 0 64k-128k k- Credit: top500.org, CS267@UC Berkeley by J. Demmel SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 44

45 Blue Gene/Q Packing Hierarchy 1. Chip 16 cores 2. Module Single Chip 3. Compute Card One single chip module, 16 GB DDR3 Memory 4. Node Card 32 Compute Cards, Optical Modules, Link Chips, Torus 5b. I/O Drawer 8 I/O Cards 8 PCIe Gen2 slots 6. Rack 2 Midplanes 1, 2 or 4 I/O Drawers 7. System 20PF/s 5a. Midplane 16 Node Cards Credit: The Blue Gene/Q Compute Chip, HotChips 2011 SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 45

46 Summary: Why Parallel Computing? If you can t exploit parallelism, performance will be a zero sum game If you want to add a new feature and maintain performance, you will need to remove something else. The future is multicore (i.e. parallel) according to all major processor vendors We expect more and more cores will be delivered with each generation Starting now, everyone will need to know how to write parallel programs It is no longer a niche area: it is a mainstream. SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) 46

47 References Some slides are adopted and adapted from CS194: Parallel Programming by Prof. Katherine Yelick at UC Berkeley CS267: Applications of Parallel Computers by Prof. Jim Demmel at UC Berkeley CS258: Parallel Computer Architecture by Prof. David E. Culler at UC Berkeley COMP422: Parallel Computing by Prof. John Mellor-Crummey at Rice Univ. SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong 47

Why Parallel Computing? Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Why Parallel Computing? Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Why Parallel Computing? Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu What is Parallel Computing? Software with multiple threads Parallel vs. concurrent

More information

Parallel Computing. Parallel Computing. Hwansoo Han

Parallel Computing. Parallel Computing. Hwansoo Han Parallel Computing Parallel Computing Hwansoo Han What is Parallel Computing? Software with multiple threads Parallel vs. concurrent Parallel computing executes multiple threads at the same time on multiple

More information

Why Parallel Architecture

Why Parallel Architecture Why Parallel Architecture and Programming? Todd C. Mowry 15-418 January 11, 2011 What is Parallel Programming? Software with multiple threads? Multiple threads for: convenience: concurrent programming

More information

Parallel Computer Architecture

Parallel Computer Architecture Parallel Computer Architecture What is Parallel Architecture? A parallel computer is a collection of processing elements that cooperate to solve large problems fast Some broad issues: Resource Allocation:»

More information

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need?? Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross

More information

Advanced Processor Architecture

Advanced Processor Architecture Advanced Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong

More information

Multithreading: Exploiting Thread-Level Parallelism within a Processor

Multithreading: Exploiting Thread-Level Parallelism within a Processor Multithreading: Exploiting Thread-Level Parallelism within a Processor Instruction-Level Parallelism (ILP): What we ve seen so far Wrap-up on multiple issue machines Beyond ILP Multithreading Advanced

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading CS 152 Computer Architecture and Engineering Lecture 18: Multithreading Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

Lecture 14: Multithreading

Lecture 14: Multithreading CS 152 Computer Architecture and Engineering Lecture 14: Multithreading John Wawrzynek Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~johnw

More information

Simultaneous Multithreading and the Case for Chip Multiprocessing

Simultaneous Multithreading and the Case for Chip Multiprocessing Simultaneous Multithreading and the Case for Chip Multiprocessing John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 2 10 January 2019 Microprocessor Architecture

More information

Multicore and Parallel Processing

Multicore and Parallel Processing Multicore and Parallel Processing Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 4.10 11, 7.1 6 xkcd/619 2 Pitfall: Amdahl s Law Execution time after improvement

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing

More information

Microprocessor Trends and Implications for the Future

Microprocessor Trends and Implications for the Future Microprocessor Trends and Implications for the Future John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 4 1 September 2016 Context Last two classes: from

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

CSC 447: Parallel Programming for Multi- Core and Cluster Systems

CSC 447: Parallel Programming for Multi- Core and Cluster Systems CSC 447: Parallel Programming for Multi- Core and Cluster Systems Why Parallel Computing? Haidar M. Harmanani Spring 2017 Definitions What is parallel? Webster: An arrangement or state that permits several

More information

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University 18-742 Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core Prof. Onur Mutlu Carnegie Mellon University Research Project Project proposal due: Jan 31 Project topics Does everyone have a topic?

More information

Administration. Coursework. Prerequisites. CS 378: Programming for Performance. 4 or 5 programming projects

Administration. Coursework. Prerequisites. CS 378: Programming for Performance. 4 or 5 programming projects CS 378: Programming for Performance Administration Instructors: Keshav Pingali (Professor, CS department & ICES) 4.126 ACES Email: pingali@cs.utexas.edu TA: Hao Wu (Grad student, CS department) Email:

More information

NOW Handout Page 1 NO! Today s Goal: CS 258 Parallel Computer Architecture. What will you get out of CS258? Will it be worthwhile?

NOW Handout Page 1 NO! Today s Goal: CS 258 Parallel Computer Architecture. What will you get out of CS258? Will it be worthwhile? Today s Goal: CS 258 Parallel Computer Architecture Introduce you to Parallel Computer Architecture Answer your questions about CS 258 Provide you a sense of the trends that shape the field CS 258, Spring

More information

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John Computer Performance Evaluation and Benchmarking EE 382M Dr. Lizy Kurian John Evolution of Single-Chip Transistor Count 10K- 100K Clock Frequency 0.2-2MHz Microprocessors 1970 s 1980 s 1990 s 2010s 100K-1M

More information

CS 654 Computer Architecture Summary. Peter Kemper

CS 654 Computer Architecture Summary. Peter Kemper CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining

More information

CS 194 Parallel Programming. Why Program for Parallelism?

CS 194 Parallel Programming. Why Program for Parallelism? CS 194 Parallel Programming Why Program for Parallelism? Katherine Yelick yelick@cs.berkeley.edu http://www.cs.berkeley.edu/~yelick/cs194f07 8/29/2007 CS194 Lecure 1 What is Parallel Computing? Parallel

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Thread Level Parallelism (TLP) CS425 - Vassilis Papaefstathiou 1 Multiple Issue CPI = CPI IDEAL + Stalls STRUC + Stalls RAW + Stalls WAR + Stalls WAW + Stalls

More information

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:

More information

45-year CPU Evolution: 1 Law -2 Equations

45-year CPU Evolution: 1 Law -2 Equations 4004 8086 PowerPC 601 Pentium 4 Prescott 1971 1978 1992 45-year CPU Evolution: 1 Law -2 Equations Daniel Etiemble LRI Université Paris Sud 2004 Xeon X7560 Power9 Nvidia Pascal 2010 2017 2016 Are there

More information

CS 152 Computer Architecture and Engineering. Lecture 14: Multithreading

CS 152 Computer Architecture and Engineering. Lecture 14: Multithreading CS 152 Computer Architecture and Engineering Lecture 14: Multithreading Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

Multicore Hardware and Parallelism

Multicore Hardware and Parallelism Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3

More information

Advanced Computer Architecture (CS620)

Advanced Computer Architecture (CS620) Advanced Computer Architecture (CS620) Background: Good understanding of computer organization (eg.cs220), basic computer architecture (eg.cs221) and knowledge of probability, statistics and modeling (eg.cs433).

More information

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering CSE 141: Computer 0 Architecture Professor: Michael Taylor RF UCSD Department of Computer Science & Engineering Computer Architecture from 10,000 feet foo(int x) {.. } Class of application Physics Computer

More information

ECE 588/688 Advanced Computer Architecture II

ECE 588/688 Advanced Computer Architecture II ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Fall 2009 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2009 1 When and Where? When:

More information

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Moore s Law Moore, Cramming more components onto integrated circuits, Electronics,

More information

Parallel Processing SIMD, Vector and GPU s cont.

Parallel Processing SIMD, Vector and GPU s cont. Parallel Processing SIMD, Vector and GPU s cont. EECS4201 Fall 2016 York University 1 Multithreading First, we start with multithreading Multithreading is used in GPU s 2 1 Thread Level Parallelism ILP

More information

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Module 18: TLP on Chip: HT/SMT and CMP Lecture 39: Simultaneous Multithreading and Chip-multiprocessing TLP on Chip: HT/SMT and CMP SMT TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

Interconnection Network

Interconnection Network Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) Topics

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction

More information

Parallelism in Hardware

Parallelism in Hardware Parallelism in Hardware Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3 Moore s Law

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010 Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)

More information

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor

More information

Parallel Programming: Principle and Practice

Parallel Programming: Principle and Practice Parallel Programming: Principle and Practice INTRODUCTION Course Goals The students will get the skills to use some of the best existing parallel programming tools, and be exposed to a number of open research

More information

Parallelism, Multicore, and Synchronization

Parallelism, Multicore, and Synchronization Parallelism, Multicore, and Synchronization Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon, Bala, Bracy, McKee, and Sirer, Roth, Martin] xkcd/619 3 Big Picture: Multicore

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Administration. Prerequisites. CS 395T: Topics in Multicore Programming. Why study parallel programming? Instructors: TA:

Administration. Prerequisites. CS 395T: Topics in Multicore Programming. Why study parallel programming? Instructors: TA: CS 395T: Topics in Multicore Programming Administration Instructors: Keshav Pingali (CS,ICES) 4.126A ACES Email: pingali@cs.utexas.edu TA: Aditya Rawal Email: 83.aditya.rawal@gmail.com University of Texas,

More information

Goals of this lecture

Goals of this lecture Power Density (W/cm 2 ) Goals of this lecture Design of Parallel and High-Performance Computing Fall 2017 Lecture: Introduction Motivate you! Trends High performance computing Programming models Course

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 19: Multiprocessing Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CSE 502 Stony Brook University] Getting More

More information

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 1 Introduction Li Jiang 1 Course Details Time: Tue 8:00-9:40pm, Thu 8:00-9:40am, the first 16 weeks Location: 东下院 402 Course Website: TBA Instructor:

More information

CS/EE 6810: Computer Architecture

CS/EE 6810: Computer Architecture CS/EE 6810: Computer Architecture Class format: Most lectures on YouTube *BEFORE* class Use class time for discussions, clarifications, problem-solving, assignments 1 Introduction Background: CS 3810 or

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Multi-{Socket,,Thread} Getting More Performance Keep pushing IPC and/or frequenecy Design complexity (time to market) Cooling (cost) Power delivery (cost) Possible, but too

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1 CS252 Spring 2017 Graduate Computer Architecture Lecture 14: Multithreading Part 2 Synchronization 1 Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture

More information

Performance of computer systems

Performance of computer systems Performance of computer systems Many different factors among which: Technology Raw speed of the circuits (clock, switching time) Process technology (how many transistors on a chip) Organization What type

More information

The Power Wall. Why Aren t Modern CPUs Faster? What Happened in the Late 1990 s?

The Power Wall. Why Aren t Modern CPUs Faster? What Happened in the Late 1990 s? The Power Wall Why Aren t Modern CPUs Faster? What Happened in the Late 1990 s? Edward L. Bosworth, Ph.D. Associate Professor TSYS School of Computer Science Columbus State University Columbus, Georgia

More information

Computer Architecture Crash course

Computer Architecture Crash course Computer Architecture Crash course Frédéric Haziza Department of Computer Systems Uppsala University Summer 2008 Conclusions The multicore era is already here cost of parallelism is dropping

More information

Computer Architecture. Fall Dongkun Shin, SKKU

Computer Architecture. Fall Dongkun Shin, SKKU Computer Architecture Fall 2018 1 Syllabus Instructors: Dongkun Shin Office : Room 85470 E-mail : dongkun@skku.edu Office Hours: Wed. 15:00-17:30 or by appointment Lecture notes nyx.skku.ac.kr Courses

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Processor Architecture

Processor Architecture Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

EECS4201 Computer Architecture

EECS4201 Computer Architecture Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis These slides are based on the slides provided by the publisher. The slides will be

More information

Lecture 7 Instruction Level Parallelism (5) EEC 171 Parallel Architectures John Owens UC Davis

Lecture 7 Instruction Level Parallelism (5) EEC 171 Parallel Architectures John Owens UC Davis Lecture 7 Instruction Level Parallelism (5) EEC 171 Parallel Architectures John Owens UC Davis Credits John Owens / UC Davis 2007 2009. Thanks to many sources for slide material: Computer Organization

More information

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor 1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic

More information

Computer Architecture

Computer Architecture Informatics 3 Computer Architecture Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh (thanks to Prof. Nigel Topham) General Information Instructor

More information

Supercomputing and Mass Market Desktops

Supercomputing and Mass Market Desktops Supercomputing and Mass Market Desktops John Manferdelli Microsoft Corporation This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

More information

Advanced Instruction-Level Parallelism

Advanced Instruction-Level Parallelism Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

Exploitation of instruction level parallelism

Exploitation of instruction level parallelism Exploitation of instruction level parallelism Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Lec 25: Parallel Processors. Announcements

Lec 25: Parallel Processors. Announcements Lec 25: Parallel Processors Kavita Bala CS 340, Fall 2008 Computer Science Cornell University PA 3 out Hack n Seek Announcements The goal is to have fun with it Recitations today will talk about it Pizza

More information

Lecture 21: Parallelism ILP to Multicores. Parallel Processing 101

Lecture 21: Parallelism ILP to Multicores. Parallel Processing 101 18 447 Lecture 21: Parallelism ILP to Multicores S 10 L21 1 James C. Hoe Dept of ECE, CMU April 7, 2010 Announcements: Handouts: Lab 4 due this week Optional reading assignments below. The Microarchitecture

More information

Simultaneous Multithreading on Pentium 4

Simultaneous Multithreading on Pentium 4 Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on

More information

Chapter 1: Fundamentals of Quantitative Design and Analysis

Chapter 1: Fundamentals of Quantitative Design and Analysis 1 / 12 Chapter 1: Fundamentals of Quantitative Design and Analysis Be careful in this chapter. It contains a tremendous amount of information and data about the changes in computer architecture since the

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Transistors and Wires

Transistors and Wires Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis Part II These slides are based on the slides provided by the publisher. The slides

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

CSC 447: Parallel Programming for Multi- Core and Cluster Systems. Lectures TTh, 11:00-12:15 from January 16, 2018 until 25, 2018 Prerequisites

CSC 447: Parallel Programming for Multi- Core and Cluster Systems. Lectures TTh, 11:00-12:15 from January 16, 2018 until 25, 2018 Prerequisites CSC 447: Parallel Programming for Multi- Core and Cluster Systems Introduction and A dministrivia Haidar M. Harmanani Spring 2018 Course Introduction Lectures TTh, 11:00-12:15 from January 16, 2018 until

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Administration. Course material. Prerequisites. CS 395T: Topics in Multicore Programming. Instructors: TA: Course in computer architecture

Administration. Course material. Prerequisites. CS 395T: Topics in Multicore Programming. Instructors: TA: Course in computer architecture CS 395T: Topics in Multicore Programming Administration Instructors: Keshav Pingali (CS,ICES) 4.26A ACES Email: pingali@cs.utexas.edu TA: Xin Sui Email: xin@cs.utexas.edu University of Texas, Austin Fall

More information

TDT 4260 lecture 7 spring semester 2015

TDT 4260 lecture 7 spring semester 2015 1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding

More information

Lecture 1: CS/ECE 3810 Introduction

Lecture 1: CS/ECE 3810 Introduction Lecture 1: CS/ECE 3810 Introduction Today s topics: Why computer organization is important Logistics Modern trends 1 Why Computer Organization 2 Image credits: uber, extremetech, anandtech Why Computer

More information

Administration. Prerequisites. Website. CSE 392/CS 378: High-performance Computing: Principles and Practice

Administration. Prerequisites. Website. CSE 392/CS 378: High-performance Computing: Principles and Practice CSE 392/CS 378: High-performance Computing: Principles and Practice Administration Professors: Keshav Pingali 4.126 ACES Email: pingali@cs.utexas.edu Jim Browne Email: browne@cs.utexas.edu Robert van de

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1996 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

ECE 588/688 Advanced Computer Architecture II

ECE 588/688 Advanced Computer Architecture II ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Winter 2018 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2018 1 When and Where? When:

More information

Moore s Law. CS 6534: Tech Trends / Intro. Good Ol Days: Frequency Scaling. The Power Wall. Charles Reiss. 24 August 2016

Moore s Law. CS 6534: Tech Trends / Intro. Good Ol Days: Frequency Scaling. The Power Wall. Charles Reiss. 24 August 2016 Moore s Law CS 6534: Tech Trends / Intro Microprocessor Transistor Counts 1971-211 & Moore's Law 2,6,, 1,,, Six-Core Core i7 Six-Core Xeon 74 Dual-Core Itanium 2 AMD K1 Itanium 2 with 9MB cache POWER6

More information

CS 152, Spring 2011 Section 10

CS 152, Spring 2011 Section 10 CS 152, Spring 2011 Section 10 Christopher Celio University of California, Berkeley Agenda Stuff (Quiz 4 Prep) http://3dimensionaljigsaw.wordpress.com/2008/06/18/physics-based-games-the-new-genre/ Intel

More information

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University Advanced d Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

Hyperthreading Technology

Hyperthreading Technology Hyperthreading Technology Aleksandar Milenkovic Electrical and Computer Engineering Department University of Alabama in Huntsville milenka@ece.uah.edu www.ece.uah.edu/~milenka/ Outline What is hyperthreading?

More information

Exploring different level of parallelism Instruction-level parallelism (ILP): how many of the operations/instructions in a computer program can be performed simultaneously 1. e = a + b 2. f = c + d 3.

More information

CS 6534: Tech Trends / Intro

CS 6534: Tech Trends / Intro 1 CS 6534: Tech Trends / Intro Charles Reiss 24 August 2016 Moore s Law Microprocessor Transistor Counts 1971-2011 & Moore's Law 16-Core SPARC T3 2,600,000,000 1,000,000,000 Six-Core Core i7 Six-Core Xeon

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Vijay Nagarajan and Prof. Nigel Topham! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors

More information

ECE 486/586. Computer Architecture. Lecture # 2

ECE 486/586. Computer Architecture. Lecture # 2 ECE 486/586 Computer Architecture Lecture # 2 Spring 2015 Portland State University Recap of Last Lecture Old view of computer architecture: Instruction Set Architecture (ISA) design Real computer architecture:

More information