High-Performance Processors Design Choices


1 High-Performance Processors Design Choices Ramon Canal PD Fall

2 High-Performance Processors Design Choices
  1. Motivation
  2. Multiprocessors
  3. Multithreading
  4. VLIW

3 Outline
  - Motivation
  - Multiprocessors
      - SISD, SIMD, MIMD, and MISD
      - Memory organization
      - Communication mechanisms
  - Multithreading
  - VLIW

4 Motivation
Instruction-Level Parallelism (ILP): what we have covered so far:
  - simple pipelining
  - dynamic scheduling: scoreboarding and Tomasulo's algorithm
  - dynamic branch prediction
  - multiple-issue architectures: superscalar, VLIW
  - compiler techniques and software approaches
Bottom line: there just aren't enough instructions that can actually be executed in parallel!
  - instruction issue: limit on maximum issue count
  - branch prediction: imperfect
  - number of registers: finite
  - functional units: limited in number
  - data dependencies: hard to detect dependencies via memory

5 So, what do we do?
Key idea: increase the number of running processes
  - multiple processes at a given point in time, i.e., at the granularity of one (or a few) clock cycles
  - it is not sufficient to have multiple processes at the OS level!
Two approaches:
  - multiple CPUs, each executing a distinct process: multiprocessors, or parallel architectures
  - a single CPU executing multiple processes ("threads"): multithreading, or thread-level parallelism

6 Taxonomy of Parallel Architectures
Flynn's classification:
  - SISD: single instruction stream, single data stream
      - uniprocessor
  - SIMD: single instruction stream, multiple data streams
      - the same instruction is executed by multiple processors
      - each has its own data memory
      - e.g., multimedia processors, vector architectures
  - MISD: multiple instruction streams, single data stream
      - successive functional units operate on the same stream of data
      - rarely found in general-purpose commercial designs
      - special-purpose stream processors (digital filters, etc.)
  - MIMD: multiple instruction streams, multiple data streams
      - each processor has its own instruction and data streams
      - most popular form of parallel processing
      - single-user: high performance for one application
      - multiprogrammed: running many tasks simultaneously (e.g., servers)
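Flynn's four classes are the 2x2 combinations of single/multiple instruction streams and single/multiple data streams. A minimal sketch of that lookup follows; the function name `flynn_class` and the example stream counts are made up for illustration and do not come from the slides.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Classify a machine by how many instruction and data streams it has."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"   # single or multiple instruction streams
    d = "S" if data_streams == 1 else "M"          # single or multiple data streams
    return f"{i}I{d}D"

print(flynn_class(1, 1))   # a uniprocessor
print(flynn_class(1, 8))   # e.g. one instruction applied across 8 vector lanes
print(flynn_class(4, 4))   # e.g. four cores, each with its own program and data
```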

7 Multiprocessor: Memory Organization
Centralized, shared-memory multiprocessor:
  - usually few processors
  - share a single memory and bus
  - use large caches

8 Multiprocessor: Memory Organization
Distributed-memory multiprocessor:
  - can support large processor counts
      - cost-effective way to scale memory bandwidth
      - works well if most accesses are to the local memory node
  - requires an interconnection network
      - communication between processors becomes more complicated and slower

9 Communication Mechanisms
Shared-memory communication
  - around for a long time, so well understood and standardized
  - memory-mapped
  - ease of programming when communication patterns are complex or vary dynamically
  - better use of bandwidth when items are small
  - problem: cache coherence is harder; use snoopy and other protocols
Message-passing communication (e.g., Intel's Knights family)
  - simpler hardware because keeping caches coherent is easier
  - communication is explicit, simpler to understand
  - focuses programmer attention on communication
  - synchronization is naturally associated with communication, so there are fewer errors due to incorrect synchronization
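The contrast between the two styles can be sketched in software. This is only an analogy using Python threads, not a model of the hardware: in the shared-memory version the workers update one shared variable and need a separate lock, while in the message-passing version each worker sends its result explicitly and the receive itself provides the synchronization. The function names and the sample data are illustrative assumptions.

```python
import queue
import threading

def shared_memory_sum(chunks):
    """Workers add into one shared variable; a lock supplies the synchronization."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:                 # synchronization is separate from the data access
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def message_passing_sum(chunks):
    """Workers send partial sums as messages; receiving a message is the synchronization."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))      # explicit communication

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    total = sum(inbox.get() for _ in chunks)   # blocks until every message has arrived
    for t in threads:
        t.join()
    return total

data = [[1, 2, 3], [4, 5], [6]]
print(shared_memory_sum(data), message_passing_sum(data))
```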

10 Multiprocessor: Hybrid Organization
  - Use a distributed-memory organization at the top level
  - Each node itself may be a shared-memory multiprocessor (2-8 processors)

11 Multiprocessor: Hybrid Organization
What about Big Data? Is it a game changer?
The next slides are based on the following works:
  - M. Ferdman et al., "Clearing the Clouds", ASPLOS '12
  - P. Lotfi-Kamran et al., "Scale-Out Processors", ISCA '12
  - B. Grot et al., "Optimizing Datacenter TCO with Scale-Out Processors", IEEE Micro, 2012
The next couple of slides are from Prof. Babak Falsafi (EPFL)

12 Multiprocessors and Big Data

17 Scale-out Processors
  - Small LLC, just to capture instructions
  - More cores for higher throughput
  - Pods for a small distance to memory

18 Performance
Iso server power (20 MW)

19 Summary: Multiprocessors
  - Need to tailor the chip design to the applications
  - Big Data applications are too big for data caches; the best solution is to eliminate them
  - Big Data applications need coarse-grain parallelism (i.e., at the request level)
  - Single-thread performance is still important for other applications (e.g., computation-intensive ones)

20 Multithreading
Threads: multiple processes that share code and data (and much of their address space)
  - recently, the term has come to include processes that may run on different processors and even have disjoint address spaces, as long as they share the code
Multithreading: exploit thread-level parallelism within a processor
  - fine-grain multithreading: switch between threads on each instruction!
  - coarse-grain multithreading: switch to a different thread only if the current thread has a costly stall, e.g., switch only on a level-2 cache miss

21 Multithreading
How can we guarantee no dependencies between instructions in a pipeline?
One way is to interleave the execution of instructions from different program threads on the same pipeline.
Interleave 4 threads, T1-T4, on a non-bypassed 5-stage pipe:
  T1: LW   r1, 0(r2)
  T2: ADD  r7, r1, r4
  T3: XORI r5, r4, #12
  T4: SW   0(r7), r5
  T1: LW   r5, 12(r1)
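Why four threads are enough on this pipe can be checked with a small sketch. It assumes the usual five stages (fetch, decode/register read, execute, memory, write-back), fixed round-robin fetch, and a register file in which a value written in write-back becomes readable only in the following cycle. None of these details are stated on the slide, and the function names are illustrative.

```python
STAGES = ["F", "D", "X", "M", "W"]   # fetch, decode/register read, execute, memory, write-back

def issue_cycles(num_threads, thread, count):
    """Cycles in which `thread` fetches its first `count` instructions (fixed round-robin)."""
    return [thread + k * num_threads for k in range(count)]

def raw_hazard_free(num_threads):
    """True if back-to-back instructions of one thread can never see a stale register."""
    write_offset = STAGES.index("W")   # register written 4 cycles after fetch
    read_offset = STAGES.index("D")    # registers read 1 cycle after fetch
    for thread in range(num_threads):
        cycles = issue_cycles(num_threads, thread, 3)
        for producer, consumer in zip(cycles, cycles[1:]):
            # Assumption: the consumer's read must happen strictly after the producer's write.
            if consumer + read_offset <= producer + write_offset:
                return False
    return True

for n in (1, 2, 3, 4):
    print(n, "threads:", "no interlocks needed" if raw_hazard_free(n) else "possible RAW hazard")
```

Under these assumptions, consecutive instructions of the same thread are four cycles apart with four threads, so the earlier one has left write-back before the later one reads its registers. This holds even though T1's second `LW` depends on the `r1` loaded by its first.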

22 Simple Multithreaded Pipeline
Have to carry the thread select down the pipeline to ensure the correct state bits are read/written at each pipe stage

23 Multithreading
Fine-grain multithreading
  - switch between threads on each instruction!
  - multiple threads are executed in an interleaved manner
  - interleaving is usually round-robin
  - the CPU must be capable of switching threads on every cycle: fast, frequent switches
  - main disadvantage: slows down the execution of individual threads; that is, latency is traded off for better throughput

24 CDC 6600 Peripheral Processors (Cray, 1965)
  - First multithreaded hardware
  - 10 "virtual" I/O processors
  - fixed interleave on a simple pipeline
  - pipeline has a 100 ns cycle time
  - each processor executes one instruction every 1000 ns
  - accumulator-based instruction set to reduce processor state

25 Denelcor HEP (Burton Smith, 1982)
First commercial machine to use hardware threading in the main CPU
  - 120 threads per processor
  - 10 MHz clock rate
  - up to 8 processors
  - precursor to the Tera MTA (Multithreaded Architecture)

26 Tera MTA (Cray, 1997)
  - Up to 256 processors
  - Up to 128 active threads per processor
  - Processors and memory modules populate a sparse 3D torus interconnection fabric
  - Flat, shared main memory
      - no data cache
      - sustains one main memory access per cycle per processor
  - 260 MHz

27 Tera MTA (Cray)
  - Each processor supports 128 active hardware threads
      - 128 SSWs, 1024 target registers, 4096 general-purpose registers
  - Every cycle, one instruction from one active thread is launched into the pipeline
  - The instruction pipeline is 21 cycles long
  - At best, a single thread can issue one instruction every 21 cycles
      - clock rate is 260 MHz, so the effective single-thread issue rate is 260/21 = 12.4 MHz
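The single-thread numbers quoted for the Tera MTA and for the CDC 6600 peripheral processors follow from the same arithmetic: divide the clock rate by the number of cycles between two issues of the same thread. A small sketch of that calculation (the helper names are illustrative):

```python
def single_thread_issue_rate_mhz(clock_mhz, cycles_between_issues):
    """Effective issue rate of one thread that may issue only every N cycles."""
    return clock_mhz / cycles_between_issues

def issue_interval_ns(clock_mhz, cycles_between_issues):
    """Time between two instructions of the same thread, in nanoseconds."""
    return cycles_between_issues * 1000.0 / clock_mhz

# Tera MTA: 260 MHz clock, 21-cycle pipeline, one instruction per thread in flight.
print(round(single_thread_issue_rate_mhz(260, 21), 1), "MHz")

# CDC 6600 peripheral processors: a 100 ns cycle (10 MHz) shared by 10 virtual processors.
print(issue_interval_ns(10, 10), "ns")
```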

28 Multithreading
Coarse-grain multithreading
  - switch only if the current thread has a costly stall, e.g., a level-2 cache miss
  - can accommodate slightly costlier switches
  - less likely to slow down an individual thread: a thread is switched off only when it has a costly stall
  - main disadvantage: limited in its ability to overcome throughput losses
      - shorter stalls are ignored, and there may be plenty of those
      - issues instructions from a single thread, so every switch involves emptying and restarting the instruction pipeline
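The trade-off can be sketched with a toy cycle counter. Everything in it is an assumption made for illustration: each thread is a list giving the stall that follows each instruction, the core issues one instruction per cycle, a switch costs a fixed `switch_penalty` (the pipeline flush and refill), and stalls shorter than `threshold` are simply waited out.

```python
def run_coarse_grain(traces, threshold, switch_penalty):
    """Cycles until the last instruction issues; traces[t][i] is the stall after instruction i."""
    n = len(traces)
    pc = [0] * n          # next instruction index per thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    cycle, current = 0, 0

    def unfinished(exclude=None):
        return [t for t in range(n) if pc[t] < len(traces[t]) and t != exclude]

    while unfinished():
        if pc[current] >= len(traces[current]):
            # Current thread is done: move to the thread that becomes ready soonest.
            current = min(unfinished(), key=lambda t: ready_at[t])
            cycle += switch_penalty
        cycle = max(cycle, ready_at[current])   # wait out any stall still pending
        stall = traces[current][pc[current]]
        pc[current] += 1
        cycle += 1                              # issue one instruction
        if stall:
            ready_at[current] = cycle + stall
            others = unfinished(exclude=current)
            if stall >= threshold and others:   # costly stall: switch threads
                current = min(others, key=lambda t: ready_at[t])
                cycle += switch_penalty         # pipeline is emptied and restarted
    return cycle

traces = [[0, 20, 0], [0, 20, 0]]               # two threads, one 20-cycle miss each
print(run_coarse_grain(traces, threshold=10, switch_penalty=4))            # switch on long stalls
print(run_coarse_grain(traces, threshold=float("inf"), switch_penalty=4))  # never switch on a stall
```

With these made-up traces the switching run overlaps the two 20-cycle misses and finishes earlier than the run that waits them out one after the other. Setting `threshold` above 20 shows the disadvantage named above: the stalls are ignored and nothing is overlapped.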

29 IBM PowerPC RS64-III (Pulsar)
  - Commercial coarse-grain multithreading CPU
  - Based on PowerPC, with a quad-issue, in-order, five-stage pipeline
  - Each physical CPU supports two virtual CPUs
  - On an L2 cache miss, the pipeline is flushed and execution switches to the second thread
      - the short pipeline minimizes the flush penalty (4 cycles), which is small compared to the memory access latency
      - flushing the pipeline simplifies exception handling

30 Simultaneous Multithreading (SMT)
Key idea: exploit ILP across multiple threads!
  - share the CPU among multiple threads
  - i.e., convert thread-level parallelism into more ILP
  - exploit the following features of modern processors:
      - multiple functional units: modern processors typically have more functional units available than a single thread can utilize
      - register renaming and dynamic scheduling: multiple instructions from independent threads can co-exist and co-execute!
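A rough way to see the benefit is to count filled issue slots. The sketch below assumes an issue width of 4 and invented per-cycle counts of ready instructions for two threads; a single-threaded superscalar can fill slots only from one thread, while SMT may fill them from both in the same cycle. The names and numbers are illustrative.

```python
def issued_single_thread(ready, width):
    """Slots filled when only one thread may issue: at most `width` per cycle."""
    return sum(min(width, r) for r in ready)

def issued_smt(ready_per_thread, width):
    """Slots filled when ready instructions of all threads compete for the same slots."""
    return sum(min(width, sum(per_cycle)) for per_cycle in zip(*ready_per_thread))

WIDTH = 4
thread_a = [2, 0, 1, 3]   # ready instructions per cycle (made-up numbers)
thread_b = [1, 3, 0, 2]
total_slots = WIDTH * len(thread_a)

print("thread A alone:", issued_single_thread(thread_a, WIDTH), "of", total_slots)
print("SMT, A and B:  ", issued_smt([thread_a, thread_b], WIDTH), "of", total_slots)
```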

31 Multithreading: Illustration
  (a) A superscalar processor with no multithreading
  (b) A superscalar processor with coarse-grain multithreading
  (c) A superscalar processor with fine-grain multithreading
  (d) A superscalar processor with simultaneous multithreading (SMT)

32 From Superscalar to SMT
SMT is an out-of-order superscalar extended with hardware to support multiple executing threads

33 Simultaneous Multithreaded Processor

34 Simultaneous Multithreaded Processor
  - Add multiple contexts and fetch engines to a wide out-of-order superscalar processor [Tullsen, Eggers, Levy, University of Washington, 1995]
  - The OOO instruction window already has most of the circuitry required to schedule from multiple threads
  - Any single thread can utilize the whole machine
  - First examples: Alpha (DEC/Compaq), Pentium IV (Intel), Power 5 (IBM), Ultrasparc IV (Sun)

35 SMT: Design Challenges
  - Dealing with a large register file, needed to hold multiple contexts
  - Maintaining low overhead on the clock cycle
      - fast instruction issue: choosing what to issue
      - instruction commit: choosing what to commit
      - keeping cache conflicts within acceptable bounds
  - Power hungry!

36 Intel Pentium-4 Processor
  - Hyperthreading = SMT
  - Dual physical processors, each 2-way SMT
  - Logical processors share nearly all resources of the physical processor
      - caches, execution units, branch predictors
  - Die area overhead of hyperthreading is ~5%
  - When one logical processor is stalled, the other can make progress
      - no logical processor can use all entries in the queues when two threads are active
  - A processor running only one active software thread runs at the same speed with or without hyperthreading
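The rule that no logical processor can use all queue entries when two threads are active can be pictured as a cap on per-thread occupancy. The sketch below assumes a simple static split (half the entries each while both threads are active, the full queue when only one is); the actual policy differs from structure to structure and is not given on the slide. The class `PartitionedQueue` is hypothetical.

```python
class PartitionedQueue:
    """Toy model of a queue shared by two logical processors."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = [0, 0]            # entries held by logical processor 0 and 1
        self.active = [True, True]    # which logical processors are running a thread

    def limit(self, thread):
        # Assumed policy: split in half only while both logical processors are active.
        return self.capacity // 2 if all(self.active) else self.capacity

    def try_allocate(self, thread):
        """Take one entry for `thread`; return False if its share or the queue is full."""
        if not self.active[thread]:
            return False
        if self.used[thread] >= self.limit(thread) or sum(self.used) >= self.capacity:
            return False
        self.used[thread] += 1
        return True

both = PartitionedQueue(8)
print(sum(both.try_allocate(0) for _ in range(8)))   # entries thread 0 gets with two active threads

alone = PartitionedQueue(8)
alone.active[1] = False
print(sum(alone.try_allocate(0) for _ in range(8)))  # entries thread 0 gets when running alone
```

The second case mirrors the last bullet: with only one active thread nothing is reserved for the idle logical processor, so the thread sees the whole queue.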

37 Pentium 4 Micro-architecture
  - 400 MHz System Bus
  - Advanced Dynamic Execution
  - Rapid Execution Engine
  - Advanced Transfer Cache
  - Hyper Pipelined Technology
  - Execution Trace Cache
  - Enhanced Floating Point / Multi-Media
  - Streaming SIMD Extensions 2

38 Pentium 4 Micro-architecture
What hardware complexity do OoO and SMT incur?
  - Advanced Dynamic Execution
  - Hyper Pipelined Technology

39 Sun/Oracle Ultrasparc T5 (2013)
  - 16 cores, 3.6 GHz
  - 8 threads/core (128 threads/chip)
  - Core: 2-way OoO, 16 KB I$, 16 KB D$, 128 KB L2
  - 8 MB L3
  - 28 nm

40 IBM Power 7

41 VLIW
Very Long Instruction Word:
  - The compiler packs a fixed number of operations into a single VLIW instruction.
  - The operations within a VLIW instruction are issued and executed in parallel.
Examples:
  - High-end signal processors (TMS320C6201)
  - Intel's Itanium
  - Transmeta Crusoe, Efficeon

42 VLIW
VLIW (very long instruction word) processors use a long instruction word that contains a usually fixed number of operations that are fetched, decoded, issued, and executed synchronously.
All operations specified within a VLIW instruction must be independent of one another.
Some of the key issues of a (V)LIW processor:
  - a (very) long instruction word (up to bits per instruction),
  - each instruction consists of multiple independent parallel operations,
  - each operation requires a statically known number of cycles to complete,
  - a central controller issues a long instruction word every cycle,
  - multiple FUs are connected through a global shared register file.

43 VLIW and Superscalar
  - sequential stream of long instruction words
  - instructions are scheduled statically by the compiler
  - the number of simultaneously issued instructions is fixed at compile time
  - instruction issue is less complicated than in a superscalar processor
  - Disadvantage: VLIW processors cannot react to dynamic events, e.g., cache misses, with the same flexibility as superscalars.
  - The number of operations in a VLIW instruction word is usually fixed. Padding VLIW instructions with no-ops is needed when the full issue bandwidth cannot be used. This increases code size. More recent VLIW architectures use a denser code format that allows the no-ops to be removed.
  - VLIW is an architectural technique, whereas superscalar is a microarchitectural technique.
  - VLIW processors take advantage of spatial parallelism.
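Static packing and the cost of no-op padding can be sketched as follows. This is a simple greedy scheduler, not any particular compiler's algorithm. It assumes unit-latency operations, a dependence table that already lists every dependence (true, anti, and output), and operations given in program order so that each one appears after the operations it depends on. The operation names are invented.

```python
def pack_vliw(ops, deps, width):
    """Greedily place each operation in the earliest long instruction that can hold it."""
    bundles = []      # each bundle is one long instruction word
    placed = {}       # operation name -> index of the bundle that holds it
    for op in ops:
        # An operation must come strictly after every operation it depends on,
        # so all operations inside one bundle stay independent of one another.
        earliest = max((placed[d] + 1 for d in deps.get(op, ())), default=0)
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        if slot == len(bundles):
            bundles.append([])
        bundles[slot].append(op)
        placed[op] = slot
    # Fixed-width format: unused slots are padded with explicit no-ops.
    return [b + ["nop"] * (width - len(b)) for b in bundles]

ops = ["ld1", "ld2", "add", "mul", "st"]
deps = {"add": {"ld1", "ld2"}, "mul": {"add"}, "st": {"mul"}}
bundles = pack_vliw(ops, deps, width=3)
for b in bundles:
    print(b)
nops = sum(b.count("nop") for b in bundles)
print(f"{nops} of {3 * len(bundles)} slots are no-ops")
```

In this dependence chain only the two loads can share an instruction word, so most slots end up as padding. That is the code-size cost a denser encoding tries to remove.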

44 VLIW and Superscalar
Superscalar RISC solution
  - Based on sequential execution semantics
  - Compiler's role is limited by the instruction set architecture
  - Superscalar hardware identifies and exploits parallelism
VLIW solution
  - Based on parallel execution semantics
  - VLIW ISA enhancements support static parallelization
  - Compiler takes greater responsibility for exploiting parallelism
  - Compiler/hardware collaboration often resembles superscalar

45 VLIW and Superscalar
Advantages of pursuing VLIW architectures:
  - Make wide issue and deep latency less expensive in hardware
  - Allow processor parallelism to scale with additional VLSI density
Architect the processor to do well with in-order execution:
  - Enhance the ISA to allow static parallelization
  - Use compiler technology to parallelize the program: loop unrolling, software pipelining, ...
However, a purely static VLIW is not appropriate for general-purpose use

46 Examples
  - Intel Itanium
  - Transmeta Crusoe
  - Almost all DSPs
      - Texas Instruments
      - ST Microelectronics

47 Intel Itanium, Itanium 2

48 IA-64 Encoding Source: Intel/HP IA-64 Application ISA Guide

49 IA-64 Templates Source: Intel/HP IA-64 Application ISA Guide

50 Intel's IA-64 ISA
Intel 64-bit Architecture (IA-64) register model:
  - general-purpose registers GR0-GR127 to hold values for integer and multimedia computations
      - each register has one additional NaT (Not a Thing) bit to indicate whether the value stored is valid
  - floating-point registers FR0-FR127
      - registers f0 and f1 are read-only with values +0.0 and +1.0
  - 64 1-bit predicate registers PR0-PR63
      - the first register, p0, is read-only and always reads 1 (true)
  - 8 64-bit branch registers BR0-BR7 to specify the target addresses of indirect branches
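A toy model can make the role of the NaT bits and of the read-only predicate p0 concrete. It is a sketch under several assumptions and not a description of real IA-64 behaviour in every detail. The general registers are treated as 64 bits wide (taken from the architecture's name). A result is marked NaT whenever one of its inputs is NaT. An instruction guarded by a false predicate does nothing. Writes to p0 are silently dropped. The class and method names are invented.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1    # assumed 64-bit general registers

@dataclass
class GeneralRegister:
    value: int = 0
    nat: bool = False     # NaT bit: True means the stored value is not valid

class RegisterModel:
    def __init__(self):
        self.gr = [GeneralRegister() for _ in range(128)]   # GR0-GR127
        self.pr = [False] * 64                               # PR0-PR63
        self.pr[0] = True                                    # p0 always reads 1 (true)

    def write_pr(self, index, value):
        if index != 0:    # sketch: a write to the read-only p0 is simply ignored
            self.pr[index] = bool(value)

    def add(self, dst, src1, src2, qp=0):
        """gr[dst] = gr[src1] + gr[src2], executed only if predicate register qp is true."""
        if not self.pr[qp]:
            return
        a, b = self.gr[src1], self.gr[src2]
        # The result is invalid (NaT) if either input was invalid.
        self.gr[dst] = GeneralRegister((a.value + b.value) & MASK64, a.nat or b.nat)

regs = RegisterModel()
regs.gr[1] = GeneralRegister(5)
regs.gr[2] = GeneralRegister(7, nat=True)   # e.g. a value whose producing load was not valid
regs.add(3, 1, 1)          # valid + valid
regs.add(4, 1, 2)          # valid + NaT
regs.add(5, 1, 1, qp=9)    # guarded by p9, which is false
print(regs.gr[3], regs.gr[4], regs.gr[5])
```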

51 Transmeta Crusoe and Efficeon

52 Overview
  - HW/SW system for executing x86 code
      - VLIW processor
      - Code Morphing Software (CMS)
  - Underlying ISA and details are invisible
  - Convenient level of indirection
      - upgrades, fixes
      - freedom for changes as long as a new CMS is implemented
      - anything else?

53 VLIW CPU
  - Simple in-order, very few interlocks
  - TM5400: 7 million transistors, 7-stage pipeline
  - low power, easier (and cheaper) to design
  - TM5800: <=1GHz, 64KB L1, 512KB L MHz, V running typ mm app

54 Crusoe vs. PIII mobile (temperature)

55 VLIW CPU
  - RISC-like ISA
  - molecule (long instruction): 2 or 4 atoms (RISC-like instructions)
  - slot distribution?
  - 64 GPRs and 32 FPRs
  - dedicated registers for the x86 architectural registers
128-bit molecule: FADD | ADD | LD | BRCC
Units: floating-point unit, INT unit 1, INT unit 2, load/store unit, branch unit

56 VLIW Conclusions
  - Reduces hardware complexity at the cost of increasing compiler complexity
  - Good for DSPs
  - Not so good for GPPs (so far?)

57 Multiprocessors Conclusions
  - Conventional superscalars are reaching ILP's limits: exploit TLP or PLP
  - Already known technology
  - Multithreading
      - good for extensive use of superscalar cores
      - more efficient than MP but more complex too


TDT 4260 lecture 7 spring semester 2015

TDT 4260 lecture 7 spring semester 2015 1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information

Simultaneous Multithreading Architecture

Simultaneous Multithreading Architecture Simultaneous Multithreading Architecture Virendra Singh Indian Institute of Science Bangalore Lecture-32 SE-273: Processor Design For most apps, most execution units lie idle For an 8-way superscalar.

More information

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University 18-742 Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core Prof. Onur Mutlu Carnegie Mellon University Research Project Project proposal due: Jan 31 Project topics Does everyone have a topic?

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

EECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont

EECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont Lecture 18 Simultaneous Multithreading Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi,

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

Introducing Multi-core Computing / Hyperthreading

Introducing Multi-core Computing / Hyperthreading Introducing Multi-core Computing / Hyperthreading Clock Frequency with Time 3/9/2017 2 Why multi-core/hyperthreading? Difficult to make single-core clock frequencies even higher Deeply pipelined circuits:

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer Architecture Spring 24 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 2: More Multiprocessors Computation Taxonomy SISD SIMD MISD MIMD ILP Vectors, MM-ISAs Shared Memory

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

anced computer architecture CONTENTS AND THE TASK OF THE COMPUTER DESIGNER The Task of the Computer Designer

anced computer architecture CONTENTS AND THE TASK OF THE COMPUTER DESIGNER The Task of the Computer Designer Contents advanced anced computer architecture i FOR m.tech (jntu - hyderabad & kakinada) i year i semester (COMMON TO ECE, DECE, DECS, VLSI & EMBEDDED SYSTEMS) CONTENTS UNIT - I [CH. H. - 1] ] [FUNDAMENTALS

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov Dealing With Control Hazards Simplest solution to stall pipeline until branch is resolved and target address is calculated

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Multiple Instruction Issue. Superscalars

Multiple Instruction Issue. Superscalars Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths

More information

Processor Architecture and Interconnect

Processor Architecture and Interconnect Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University Advanced d Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

Parallelism, Multicore, and Synchronization

Parallelism, Multicore, and Synchronization Parallelism, Multicore, and Synchronization Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon, Bala, Bracy, McKee, and Sirer, Roth, Martin] xkcd/619 3 Big Picture: Multicore

More information

Multiple Issue ILP Processors. Summary of discussions

Multiple Issue ILP Processors. Summary of discussions Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware

More information

CS 1013 Advance Computer Architecture UNIT I

CS 1013 Advance Computer Architecture UNIT I CS 1013 Advance Computer Architecture UNIT I 1. What are embedded computers? List their characteristics. Embedded computers are computers that are lodged into other devices where the presence of the computer

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information

Simultaneous Multithreading on Pentium 4

Simultaneous Multithreading on Pentium 4 Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on

More information

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

Beyond ILP. Hemanth M Bharathan Balaji. Hemanth M & Bharathan Balaji

Beyond ILP. Hemanth M Bharathan Balaji. Hemanth M & Bharathan Balaji Beyond ILP Hemanth M Bharathan Balaji Multiscalar Processors Gurindar S Sohi Scott E Breach T N Vijaykumar Control Flow Graph (CFG) Each node is a basic block in graph CFG divided into a collection of

More information

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define

More information

CS 654 Computer Architecture Summary. Peter Kemper

CS 654 Computer Architecture Summary. Peter Kemper CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining

More information

Control Hazards. Branch Prediction

Control Hazards. Branch Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) #1 Lec # 2 Fall 2003 9-10-2003 Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

" # " $ % & ' ( ) * + $ " % '* + * ' "

 #  $ % & ' ( ) * + $  % '* + * ' ! )! # & ) * + * + * & *,+,- Update Instruction Address IA Instruction Fetch IF Instruction Decode ID Execute EX Memory Access ME Writeback Results WB Program Counter Instruction Register Register File

More information

Online Course Evaluation. What we will do in the last week?

Online Course Evaluation. What we will do in the last week? Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Control Hazards. Prediction

Control Hazards. Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

Chapter 4 The Processor 1. Chapter 4D. The Processor

Chapter 4 The Processor 1. Chapter 4D. The Processor Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline

More information

Hyperthreading Technology

Hyperthreading Technology Hyperthreading Technology Aleksandar Milenkovic Electrical and Computer Engineering Department University of Alabama in Huntsville milenka@ece.uah.edu www.ece.uah.edu/~milenka/ Outline What is hyperthreading?

More information

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

EEC 581 Computer Architecture. Instruction Level Parallelism (3.6 Hardware-based Speculation and 3.7 Static Scheduling/VLIW)

EEC 581 Computer Architecture. Instruction Level Parallelism (3.6 Hardware-based Speculation and 3.7 Static Scheduling/VLIW) 1 EEC 581 Computer Architecture Instruction Level Parallelism (3.6 Hardware-based Speculation and 3.7 Static Scheduling/VLIW) Chansu Yu Electrical and Computer Engineering Cleveland State University Overview

More information