Advanced Computer Architectures CC721


1 Advanced Computer Architectures CC721 Magdy Saeb Arab Academy for Science, Technology & Maritime Transport

2 Ad. Comp Arch. CC721
Deeper understanding of: computer architecture concepts; design trade-offs for cost/performance; advanced architectures; trends for the future.
Why? To match/choose hardware and software to solve a problem, to design better software (for many programmers), and to design better hardware (for a chosen few).

3 Ad. Comp Arch. CC721
Course planning: 8 lectures; 2 written exams, a project, a report, and a presentation.

4 Course Outline
Course objectives: This course gives thorough knowledge of advanced computer architecture concepts, parallel architectures, and parallel processing. The main aim is to develop the students' research skills and their knowledge of state-of-the-art architectures. The topic is strongly related to areas such as computer graphics acceleration, cryptography, coding, and hardware design.
Department home page: (here you will find many course handouts, VHDL lectures, solutions to homework problems, and sample exams)
Topics:
1. Course Overview, Computational Models
2. ILP-Processors, Instruction Set, Area, Cost
3. Pipelined Processors
4. VLIW Processors
5. Superscalar Processors
6. Code Scheduling for ILP-Processors
7. Branch Processing
8. SIMD Architectures
9. MIMD Architectures, Memory Systems
10. Dataflow Architecture
11. Processor-in-Memory Architecture

5 Texts & Grading
Text: Sima, Fountain, Kacsuk, Advanced Computer Architectures: A Design Space Approach, Addison-Wesley.
References: J. L. Hennessy, D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd Edition, Morgan Kaufmann; Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill.
Grading: Homework 10%, Project 20%, Midterm 30%, Final 40%.
Lecturer: Magdy Saeb, Ph.D.

6 Ad. Comp Arch. CC721
Main text: Advanced Computer Architectures: A Design Space Approach, by Sima, Fountain, Kacsuk.
Supplementary text: Computer Architecture: A Quantitative Approach, by Hennessy, Patterson.

7 Ad. Comp Arch. CC721
See the department home page for information on: news, lectures, sample exams, lab status, and grading.

8 Computational Models Arab Academy for Science, Technology & Maritime Transport

9 Advanced Computer Architectures: Part I
Computational Models; The Concept of Computer Architecture; Introduction to Parallel Processing.
Sima et al. introduce a design-space approach to computer architecture: design aspects are broken down into atoms, or tiny pieces.

10 Computational Models

Computational Model      Language Class                Architectural Class
Turing                   type-0 language (Assembly)    (not realized; infinite memory)
Von Neumann              imperative (C)                Von Neumann
Dataflow                 single assignment             dataflow
Applicative              functional (LISP)             reduction
Predicate-logic-based    logic programming (Prolog)    N/A
Object-oriented          object-oriented (C++)         object-oriented

11 Part I, Computational Models
Turing, type-0 language: requires infinite memory; not feasible.
Von Neumann, imperative language (C): the traditional architecture; control/memory organization; a finite state machine (FSM); multiple assignment gives side effects; sequential in nature; control statements.

12 Part I, Computational Models
Dataflow, single-assignment languages: dataflow machines.
Applicative, functional languages (Haskell/ML): reduction machines.
Object-based, object-oriented languages (C++): object-oriented computers (similar to Von Neumann, but they depend on message passing).
Predicate-logic-based, logic languages (Prolog): has not been realized in hardware.

13 Part I, Computational Models
All of these computational models can be emulated on Von Neumann machines, which are hard to beat on cost/performance!

14 Part I, Sima, The Concept of Computer Architecture
Abstract architecture deals with the functional specification, for example the programmer's model/instruction set.
Concrete architecture deals with aspects of the implementation, for example the logic design as a block diagram of functional units.

15 Part I, The Concept of Computer Architecture
DS (design-space) trees define the design space. Relations: con: consists of; pex: can be exclusively performed by; per: can be performed by.
example = con(pex(A, B), per(C, con(D, E)))
(The corresponding DS tree has leaves A, B, C, D, and E.)

16 Part I, Introduction to Parallel Processing
Processes/process trees/threads: the Process Control Block (PCB) records the resource mapping per process; threads (lightweight processes) inherit/share the process's resources.
Concurrent vs. parallel execution: concurrent means time-sliced, as on multithreaded architectures; parallel means multiple CPUs.
Parallel architectures: multiprocessors and multicomputers (clusters).

17 Part I, Introduction to Parallel Processing
Flynn's classification: SISD, SIMD, MISD, MIMD.
Types of parallelism: available parallelism is inherent in the problem; utilized parallelism is what the architecture and implementation actually exploit.
Functional parallelism comes from the problem solution (usually irregular): exploited by ILP, multithreading, MIMD.
Data parallelism comes from computations on regular structures (like vectors): exploited by SIMD.

18 Introduction to Instruction-Level Parallelism (ILP) Arab Academy for Science, Technology & Maritime Transport

19 Introduction to Instruction-Level Parallelism (ILP)
Traditional Von Neumann processors (sequential issue, sequential execution): typically non-pipelined processors.
Scalar ILP processors (sequential issue, parallel execution) exploit parallelism of instruction execution: typically processors with multiple non-pipelined EUs, and pipelined processors.
Superscalar ILP processors (parallel issue, parallel execution) exploit parallelism of instruction issue: typically VLIW and superscalar processors with multiple pipelined EUs.

20 Some Definitions
A pipelined processor achieves instruction-level parallelism by having one instruction in each stage of the pipeline.
An execution unit (EU) is a block that performs some function that helps complete an instruction. The integer ALU, Floating-Point Unit (FPU), Branch Unit (BU), and Load/Store Unit are examples of execution units.

21 Methods of achieving parallelism
There are two major methods of achieving parallelism: pipelining and replication.

22 More Definitions
A superscalar processor issues multiple instructions per clock cycle from a sequential stream, with dynamic scheduling of execution units (scheduling done in hardware).
A Very Long Instruction Word (VLIW) processor issues one very wide instruction per clock cycle; this instruction contains multiple operations. Scheduling of execution units is static (done by the compiler).
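
To make the independence requirement concrete, here is a minimal sketch in C (not from the text; the instruction fields and the 2-way width are assumptions) of the check a simple in-order superscalar might apply before issuing two consecutive instructions in the same cycle:

    /* Hypothetical 3-operand instruction: one destination, two sources. */
    typedef struct { int dest, src1, src2; } Instr;

    /* Issue a and b together only if b does not depend on a. */
    int can_dual_issue(const Instr *a, const Instr *b) {
        if (b->src1 == a->dest || b->src2 == a->dest) return 0;  /* RAW */
        if (b->dest == a->dest)                       return 0;  /* WAW */
        return 1;                     /* independent: both issue this cycle */
    }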

23 Pipelined vs. VLIW/Superscalar
(Figure: in pipelined processors, instructions flow through EU1, EU2, EU3 one after another (pipelined operation); in VLIW and superscalar processors, EU1, EU2, EU3 operate side by side (parallel operation).)
Execution units in VLIW and superscalar processors can themselves be pipelined!

24 Typical Pipeline
A load instruction passes through five stages, one per cycle:
Cycle 1: Ifetch; Cycle 2: Reg/Dec; Cycle 3: Exec; Cycle 4: Mem; Cycle 5: Wr.
Ifetch: instruction fetch; fetch the instruction from the instruction memory.
Reg/Dec: register fetch and instruction decode.
Exec: calculate the memory address.
Mem: read the data from the data memory.
Wr: write the data back to the register file.
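
A minimal sketch (illustrative code, not course material) that prints how five independent instructions overlap in this five-stage pipeline; in each cycle, instruction i occupies stage (cycle - i):

    #include <stdio.h>

    int main(void) {
        const char *stage[] = { "Ifetch", "Reg/Dec", "Exec", "Mem", "Wr" };
        const int nstages = 5, ninstr = 5;

        /* With no hazards, 5 instructions finish in 5 + 5 - 1 = 9 cycles. */
        for (int cycle = 0; cycle < ninstr + nstages - 1; cycle++) {
            printf("Cycle %d:", cycle + 1);
            for (int i = 0; i < ninstr; i++) {
                int s = cycle - i;            /* stage holding instruction i */
                if (s >= 0 && s < nstages)
                    printf("  I%d in %s", i + 1, stage[s]);
            }
            printf("\n");
        }
        return 0;
    }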

25 Execution Units can be pipelined (PowerPC 601 Example)
Branch instructions: Fetch, Issue, Decode, Execute, Predict.
Integer instructions: Fetch, Issue, Decode, Execute, Writeback.
Load/Store instructions: Fetch, Issue, Decode, Addr Gen, Cache, Writeback.

26 Execution Units can be pipelined (PowerPC 601 Example) (cont.)
FP instructions: Fetch, Issue, Decode, Execute 1, Execute 2, Writeback.

27 Data Dependencies
Data dependencies present problems for instruction-level parallelism. Types of data dependencies:
Straight-line code: Read After Write (RAW), Write After Read (WAR), Write After Write (WAW).
Loops: recurrence or inter-iteration dependencies.

28 Straight-Line Dependencies
Read After Write (RAW):
i1: load r1, a;
i2: add r2, r1, r1;
Assume a pipeline of Fetch/Decode/Execute/Mem/Writeback. When the add is in the Decode stage (which fetches r1), the load is in the Execute stage, and the true value of r1 has not been fetched yet (r1 is fetched from memory in the Mem stage). Solve this either by stalling the add until the value of r1 is ready, or by forwarding the value of r1 from the Mem stage to the Execute stage.
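
A sketch, under assumed structure and field names, of the forwarding decision described above: when the instruction currently in the Mem stage produces an operand needed in Execute, the control logic bypasses the register file:

    /* Hypothetical pipeline-register contents for the Mem stage. */
    typedef struct { int dest; int result; int writes_reg; } MemStage;

    /* Operand fetch with forwarding: prefer the in-flight result. */
    int read_operand(int reg, const int regfile[], const MemStage *mem) {
        if (mem->writes_reg && mem->dest == reg)
            return mem->result;   /* forward Mem -> Execute; no stall */
        return regfile[reg];      /* no RAW hazard: read the register file */
    }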

29 Straight-Line Dependencies (cont.): Write After Read (WAR)
i1: mul r1, r2, r3; r1 <= r2 * r3
i2: add r2, r4, r5; r2 <= r4 + r5
If instruction i2 (add) is executed before instruction i1 (mul) for some reason, then i1 (mul) could read the wrong value for r2. One reason for delaying i1 would be a stall for the r3 value being produced by a previous instruction; i2 could proceed because it has all its operands, thus causing the WAR hazard. Use register renaming to eliminate the WAR dependency: replace r2 in i2 with some other register that has not been used yet.
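
The same idea in C terms (an illustrative sketch, treating r1-r5 as variables): renaming gives i2's result a fresh location, so the two instructions no longer touch the same storage and may execute in either order:

    int r1, r2, r3, r4, r5;

    void war_pair(void) {
        r1 = r2 * r3;          /* i1 reads r2 */
        r2 = r4 + r5;          /* i2 writes r2: must not overtake i1 */
    }

    void renamed_pair(void) {
        int r2_new = r4 + r5;  /* i2 now writes a fresh "register" */
        r1 = r2 * r3;          /* i1 still reads the old r2: order is free */
        r2 = r2_new;           /* later readers see the renamed value */
    }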

30 Straight-Line Dependencies (cont.): Write After Write (WAW)
i1: mul r1, r2, r3; r1 <= r2 * r3
i2: add r1, r4, r5; r1 <= r4 + r5
If instruction i1 (mul) finishes AFTER instruction i2 (add), then register r1 would get the wrong value. Instruction i1 could finish after instruction i2 if separate execution units were used for i1 and i2. One way to solve this hazard is to simply let instruction i1 proceed normally but disable its write stage; register renaming also eliminates WAW dependencies.

31 Loop Dependencies
Recurrences:
do I = 2, n
  X(I) = A * X(I-1) + B
enddo
One way to parallelize a loop is to unroll it (create (N-2) additional copies of the loop body). Here, however, a dependency exists between the current X value and the value from the previous iteration, so loop unrolling will not give us any more parallelism. This type of data dependency cannot be solved at the implementation level; it must be addressed at the compiler level.
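
The contrast in C (an illustrative sketch; A, B, n, and the arrays are assumed):

    /* The recurrence above: iteration i needs X[i-1] from iteration i-1,
       so unrolled copies must still execute one after another. */
    void recurrence(double *X, double A, double B, int n) {
        for (int i = 1; i < n; i++)
            X[i] = A * X[i-1] + B;
    }

    /* An independent loop: iterations share no values, so unrolling
       (or vectorizing) exposes real parallelism. */
    void independent(double *Y, const double *Z, double A, double B, int n) {
        for (int i = 1; i < n; i++)
            Y[i] = A * Z[i] + B;
    }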

32 Control Dependencies
Control dependencies (i.e., branches) are a major obstacle to instruction-level parallelism.
In a pipelined machine, the branch condition is normally computed as EARLY as possible in the pipeline, in order to lessen the impact of incorrect branch prediction (taken or not taken).
Conditional branch instructions make up about 20% of general-purpose code and 5-10% of scientific code.

33 Branch Strategies
Static: always predict taken or always predict not-taken.
Dynamic: keep a history of code execution and modify predictions based on that execution history.
Multi-way: execute both branch paths and kill the incorrect path as soon as the branch condition is resolved.
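
A minimal sketch of the dynamic strategy (the table size, PC indexing, and 2-bit saturating counters are common textbook choices, assumed here rather than taken from this deck). Counter values 0-1 predict not-taken; 2-3 predict taken:

    #define BHT_SIZE 1024
    static unsigned char bht[BHT_SIZE];        /* 2-bit counters, start at 0 */

    int predict_taken(unsigned pc) {
        return bht[(pc >> 2) % BHT_SIZE] >= 2; /* index by word-aligned PC */
    }

    void train(unsigned pc, int taken) {       /* called when branch resolves */
        unsigned char *c = &bht[(pc >> 2) % BHT_SIZE];
        if (taken  && *c < 3) (*c)++;          /* saturate at strongly taken */
        if (!taken && *c > 0) (*c)--;          /* saturate at strongly not-taken */
    }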

34 Control Dependency Graph
i0: r1 = op1;
i1: r2 = op2;
i2: r3 = op3;
i3: if (r2 > r1) {
i4:   if (r3 > r1)
i5:     r4 = r3;
i6:   else r4 = r1;
i7: } else r4 = r2;
i8: r5 = r4 * r4;
(Figure: the control-dependency graph over nodes i0 through i8.)

35 Resource Dependencies
A resource dependency arises when an instruction requires a hardware resource that is being used by a previously issued instruction (also known as a structural hazard). Examples of such resources: execution units and buses (e.g., an external address/data bus).
A resource dependency can only be solved by resource duplication. The Harvard architecture, for example, has separate address/data buses for instructions and data.

36 Instruction Scheduling
Instruction scheduling is the assignment of instructions to hardware resources. Hardware resources are buses, registers, and execution units.
Static scheduling is done by the compiler or by a human; the hardware assumes that ALL hazards have been eliminated. For example, the compiler can fill a load-use delay slot by moving an independent instruction between a load and its first use. This lessens the amount of control logic needed, which hopefully raises the maximum clock speed.

37 Instruction Scheduling (cont.)
Dynamic scheduling is implemented in hardware inside the processor, and all instruction streams are legal. The control logic and hardware resources needed for dynamic scheduling can be significant. When legacy code streams must be executed, dynamic scheduling may be the only option.

38 Pipelined Processors Arab Academy for Science, Technology & Maritime Transport

39 Definitions
FX pipeline: fixed-point (integer) pipeline.
FP pipeline: floating-point pipeline.
Cycle time: the length of the clock period of the pipeline, determined by the slowest stage.
Latency (used in reference to RAW hazards): the amount of time that the result of a particular instruction takes to become available in the pipeline for a subsequent dependent instruction (measured in multiples of clock cycles).

40 RAW Dependency, Latencies
Define-use latency is the time delay after decoding and issue of an instruction until its result becomes available for a subsequent RAW-dependent instruction:
add r1, r2, r3
add r5, r1, r6   (define-use dependency)
It is usually one cycle for simple instructions.
Define-use delay of an instruction is the time a subsequent RAW-dependent instruction has to be stalled in the pipeline. It is one cycle less than the define-use latency.

41 RAW Dependency, Latencies (cont.)
If the define-use latency is 1, then the define-use delay is 0 and the pipeline is not stalled; this is the case for most simple instructions in the FX pipeline. Non-pipelined FP operations can have define-use latencies from a few cycles to tens of cycles.
Load-use dependency, load-use latency, and load-use delay refer to load instructions:
load r1, 4(r2)
add r3, r1, r2
The definitions parallel those for define-use dependency, latency, and delay. With a load-use latency of 2, for example, the dependent add stalls for one cycle unless an independent instruction is scheduled between the two.

42 More Definitions
Repetition rate R (throughput): the shortest possible time interval between subsequent independent instructions in the pipeline.
Performance potential of a pipeline: the number of independent instructions that can be executed in a unit interval of time:
P = 1 / (R * t_c)
where R is the repetition rate in clock cycles and t_c is the cycle time of the pipeline.
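
For illustration (numbers invented, not taken from the text): a pipeline with t_c = 2 ns that can accept a new independent instruction every cycle (R = 1) has P = 1 / (1 * 2 ns) = 500 million instructions per second; an operation that can only be started every R = 4 cycles on the same pipeline has a performance potential of only 125 million operations per second.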

43 Table 5.1 from the text (latency/repetition rate)
(Table 5.1 lists, for each processor (Alpha, Pentium, Pentium Pro, HP PA, SuperSparc) and for single (s) and double (d) precision, the cycle time and the latency/repetition-rate pairs of Fadd, Fmult, FDiv, and Fsqrt; see the text for the full figures.)

44 How many stages?
The more stages, the less combinational logic within a stage, and the higher the possible clock frequency. But more stages can complicate control: the DEC Alpha has 7 stages for FX instructions, and these instructions have a define-use delay of one cycle even for basic FX operations. It also becomes difficult to divide the logic evenly between stages, clock skew between stages becomes harder to manage, and returns diminish as the number of stages grows large. Superpipelining is the term used for processors with a high number of stages.

45 Dedicated Pipelines versus Multifunctional Pipelines
The trend in current high-performance CPUs is to use different logical AND physical pipelines for different instruction classes: an FX pipeline (integer), an FP pipeline (floating point), an L/S pipeline (load/store), and a B pipeline (branch). This allows more concurrency and more optimization, and silicon area is now plentiful enough to afford it.

46 Sequential Consistency
With multiple pipelines, how do we maintain sequential consistency when instructions finish at different times? With just two pipelines (FX and FP), we can lengthen the shorter pipeline, either statically or dynamically; dynamic lengthening would be used only when hazards are detected. Alternatively, we can force the pipelines to write to a special unit called a renaming buffer or reorder buffer, whose job is to maintain sequential consistency. We will look at this in detail in Chapter 7 (superscalar).
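
A minimal reorder-buffer sketch (the names, sizes, and interface are assumptions, not the text's design): results may complete out of order, but they retire to the register file strictly in program order:

    #include <stdbool.h>

    #define ROB_SIZE 16
    typedef struct { int dest; int value; bool done; } RobEntry;

    static RobEntry rob[ROB_SIZE];
    static int head = 0, count = 0;        /* head = oldest in-flight entry */

    int rob_allocate(int dest) {           /* at issue, in program order;
                                              caller stalls if count == ROB_SIZE */
        int id = (head + count++) % ROB_SIZE;
        rob[id] = (RobEntry){ dest, 0, false };
        return id;
    }

    void rob_complete(int id, int value) { /* EUs may finish in any order */
        rob[id].value = value;
        rob[id].done  = true;
    }

    void rob_retire(int regfile[]) {       /* commit only the oldest entries */
        while (count > 0 && rob[head].done) {
            regfile[rob[head].dest] = rob[head].value;
            head = (head + 1) % ROB_SIZE;
            count--;
        }
    }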

47 RISC versus CISC pipelines
Pipelines for CISC are required to handle complex memory-to-register addressing:
mov r4, (r3, r2)4   (EA is r3 + r2 + 4)
They have an extra stage for effective-address calculation (see Figures 5.40, 5.41, 5.43). Some CISC pipelines avoid a load-use delay penalty (Figs. 5.54, 5.56); RISC pipelines have a load-use penalty of at least one cycle. Determining load-use penalties when multiple pipelines are in action is instruction-sequence dependent (i.e., 1, 2, or more than 2 cycles).

48 Some other important figures in Chapter 5
Figure 5.26 illustrates the use of both clock phases for performing pipeline tasks. Figures 5.31 and 5.32 (the Pentium pipeline) show the difference between logical and physical pipelines. Figures 5.33 and 5.34 (the PowerPC) give a first look at a modern superscalar processor.

49 CC721 Computing Systems Part 3: VLIW Architecture Arab Academy for Science, Technology & Maritime Transport

50 Basic Working Principles of VLIW
Aims at speeding up computation by exploiting instruction-level parallelism.
Uses the same hardware core as superscalar processors: multiple execution units (EUs) working in parallel.
An instruction consists of multiple operations; typical word lengths range from 52 bits to 1 Kbit.
All operations in an instruction are executed in lock-step mode.
One or multiple register files for FX and FP data.
Relies on the compiler to find parallelism and to schedule dependency-free program code.

51 Basic VLIW Approach

52 Register File Structure for VLIW
What is the challenge to the register file in VLIW? The number of read/write ports: since all EUs read their operands from and write their results to the register file(s) in lock-step, every EU needs its own ports in each cycle.

53 Differences Between VLIW & Superscalar Architecture (I)

54 Differences Between VLIW & Superscalar Architecture (II)
Instruction formulation:
Superscalar: receives conventional instructions conceived for sequential processors.
VLIW: receives (very) long instruction words, each comprising a field (or opcode) for each execution unit. The instruction word length depends on (a) the number of execution units and (b) the code length needed to control each unit (opcode length, register names, ...). Typical word lengths run to hundreds of bits, much longer than a conventional machine word.
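
A hypothetical layout of one such word (the units, field widths, and 128-bit total are illustrative assumptions, not any real machine's format):

    #include <stdint.h>

    typedef struct {
        uint32_t fx_op;   /* integer ALU field: opcode + register names */
        uint32_t fp_op;   /* floating-point field                       */
        uint32_t ls_op;   /* load/store field                           */
        uint32_t br_op;   /* branch field                               */
    } VliwWord;           /* 4 x 32 = 128 bits, all issued in lock-step;
                             unused fields hold explicit NOPs           */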

55 Differences Between VLIW & Superscalar Architecture (III)
Instruction scheduling:
Superscalar: performed dynamically at run time by the hardware. Data dependency is checked and resolved in hardware, which needs a look-ahead hardware window for instruction fetch.

56 Differences Between VLIW & Superscalar Architecture (IV)
Instruction scheduling (cont'd):
VLIW: static scheduling, done at compile time by the compiler.
Advantages: reduced hardware complexity; tasks such as decoding, data-dependency detection, and instruction issue become simple; potentially higher clock rate; a higher degree of parallelism thanks to global program information.

57 Differences Between VLIW & Superscalar Architecture (V)
Instruction scheduling (cont'd):
VLIW disadvantages: higher complexity of the compiler; compiler optimization needs to consider technology-dependent parameters such as latencies and the load-use time of the cache (question: what happens to the software if the hardware is updated?); the non-deterministic nature of cache misses forces worst-case assumptions in code scheduling; and unfilled opcode fields in a (V)LIW waste memory and fetch bandwidth.

58 Development history of Proposed/Commercial VLIWs

59 Case Study of VLIW: Trace 200 Family (I)

60 Case Study of VLIW: Trace 200 Family (II)
Only two branches might be used in the Trace 7/200.

61 Code Expansion in VLIW
It has been found that code in VLIW is expanded roughly by a factor of three. In longer VLIW words, more opcode fields will be empty, which wastes bandwidth and storage space. Can you propose a solution?
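
One possible answer (a sketch, not from the text): store only the non-NOP operations together with a presence mask, and expand back to the full word at fetch/decode time. The layout reuses the hypothetical VliwWord fields sketched earlier:

    #include <stdint.h>

    typedef struct {
        uint8_t  mask;      /* bit i set => the field for unit i is present */
        uint32_t ops[4];    /* packed non-NOP fields, in unit order */
    } PackedVliw;

    /* Expand to the full 4-field word, inserting NOPs (encoded as 0 here). */
    void expand(const PackedVliw *p, uint32_t word[4]) {
        int k = 0;
        for (int i = 0; i < 4; i++)
            word[i] = (p->mask & (1u << i)) ? p->ops[k++] : 0;
    }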
