Chapter 3 & Appendix C Part B: ILP and Its Exploitation

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation

Copyright 2012, Elsevier Inc. All rights reserved.


CS359: Computer Architecture
Chapter 3 & Appendix C Part B: ILP and Its Exploitation
Yanyan Shen
Department of Computer Science and Engineering, Shanghai Jiao Tong University

Outline
3.1 Concepts and Challenges
3.2 Basic Compiler Techniques
3.3 Reducing Branch Costs with Advanced Branch Prediction
3.4 Overcoming Data Hazards with Dynamic Scheduling
3.5 Hardware-based Speculation

Background
Instruction-Level Parallelism (ILP) refers to overlapping the execution of instructions. Pipelining became a universal technique by 1985. There are two main approaches to exploiting ILP:
1) Hardware-based dynamic approaches, used in server and desktop processors
2) Compiler-based static approaches, not as successful outside of scientific applications
The goals of this chapter: 1) to look at the limitations imposed by data and control hazards; 2) to investigate ways of increasing the ability of the compiler (software) and the processor (hardware) to exploit ILP.

Pipeline CPI
When exploiting instruction-level parallelism, the goal is to minimize CPI:
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
Ideal pipeline CPI is a measure of the maximum performance attainable by the implementation. By reducing each of the terms on the right-hand side, we decrease the overall pipeline CPI, or equivalently improve IPC (instructions per clock).
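The additive CPI model above can be sketched in a few lines of Python. The stall contributions used here are hypothetical numbers chosen for illustration, not figures from the slides:

```python
# Additive pipeline-CPI model: each stall category adds its average
# stall cycles per instruction on top of the ideal CPI.
ideal_cpi = 1.0          # ideal pipeline CPI (one instruction per cycle)
structural_stalls = 0.05  # hypothetical: structural stall cycles per instruction
data_hazard_stalls = 0.30  # hypothetical: data hazard stall cycles per instruction
control_stalls = 0.15     # hypothetical: control stall cycles per instruction

pipeline_cpi = ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls
ipc = 1.0 / pipeline_cpi  # IPC is the reciprocal of CPI

print(f"pipeline CPI = {pipeline_cpi:.2f}, IPC = {ipc:.3f}")
```

Reducing any one stall term lowers the CPI and raises the IPC, which is exactly the lever each technique in this chapter pulls on.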

Major Techniques for Improving Pipeline CPI

What is Instruction-Level Parallelism?
Basic block: a straight-line code sequence with no branches in except at the entry and no branches out except at the exit. Parallelism within a basic block is limited: the average dynamic branch frequency is often between 15% and 25%, so the typical basic block holds only 3-6 instructions. Conclusion: we must optimize across branches.
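The link between branch frequency and basic-block size can be checked with a one-line model. The simplifying assumption (mine, not the slides') is that each dynamic branch ends one basic block, so the average block size is roughly the reciprocal of the branch frequency:

```python
# Rough estimate: if a fraction f of dynamic instructions are branches,
# and each branch ends a basic block, blocks average about 1/f instructions.
def avg_basic_block_size(branch_freq):
    return 1.0 / branch_freq

for f in (0.15, 0.25):
    print(f"branch frequency {f:.0%}: ~{avg_basic_block_size(f):.1f} instructions per block")
```

With the slide's 15%-25% branch frequencies this gives roughly 4 to 7 instructions per block, consistent with the 3-6 figure quoted above.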

Dependences and Hazards
Dependences: determining how one instruction depends on another is critical to determining 1) how much parallelism exists in a program and 2) how that parallelism can be exploited. Hazards: when hazards arise, the pipeline may have to be stalled to resolve them.

Three Types of Dependences
1) Data dependences (also called true data dependences): data flows between instructions
2) Name dependences (antidependence and output dependence)
3) Control dependences

Data Dependences
Dependences are a property of programs; the pipeline organization determines whether a dependence results in an actual hazard being detected and whether that hazard causes a stall. A data dependence conveys:
1) the possibility of a hazard;
2) the order in which results must be calculated;
3) an upper bound on the exploitable instruction-level parallelism.

Name Dependences
Two instructions use the same register or memory location, but there is no flow of information between them. A name dependence is not a true data dependence, but it is a problem when reordering instructions.
Antidependence: instruction j writes a register or memory location that instruction i reads; the initial ordering (i before j) must be preserved.
Output dependence: instructions i and j write the same register or memory location; their ordering must be preserved.
To resolve name dependences, use renaming techniques.

Control Dependence
A control dependence orders instruction i with respect to a branch instruction:
1) An instruction control dependent on a branch cannot be moved before the branch, so that its execution is no longer controlled by the branch.
2) An instruction not control dependent on a branch cannot be moved after the branch, so that its execution becomes controlled by the branch.

Data Hazards
Read after write (RAW): j tries to read a source before i writes it. Program order must be preserved!
Write after write (WAW): j tries to write an operand before it is written by i.
Write after read (WAR): j tries to write a destination before it is read by i, so i incorrectly gets the new value.

Outline
3.1 Concepts and Challenges
3.2 Basic Compiler Techniques
3.3 Reducing Branch Costs with Advanced Branch Prediction
3.4 Overcoming Data Hazards with Dynamic Scheduling
3.5 Hardware-based Speculation

Motivation
DIV.D F4, F0, F2
ADD.D F10, F4, F8
SUB.D F12, F6, F14
There is a RAW hazard between DIV.D and ADD.D (on F4), so ADD.D must be stalled. SUB.D does not depend on the previous instructions, so it could execute, but it is blocked behind ADD.D. There is little parallelism among these instructions if they are executed in order. Can we execute instructions out of order to get more parallelism? Yes: out-of-order execution.

Motivation
Idea: out-of-order execution. But be careful: we need both parallelism and correctness!
Traditionally: in-order issue, in-order execution.
How do we ensure correctness of execution?

Improvement Through Out-of-Order Execution (Hardware)
The ID stage is divided into two parts:
Part 1) checking for any structural hazard;
Part 2) waiting for the absence of a data hazard.
Thus: in-order issue, out-of-order execution. We keep in-order instruction issue (instructions are issued in program order), but we want an instruction to begin execution as soon as its data operands are available.

Issues with Out-of-Order Execution
List all RAW, WAR, and WAW hazards in the following sequence:
DIV.D F0, F2, F4
ADD.D F6, F0, F8
SUB.D F8, F10, F14
MUL.D F6, F10, F8
RAW: ADD.D reads F0 written by DIV.D; MUL.D reads F8 written by SUB.D.
WAR: SUB.D writes F8, which ADD.D reads.
WAW: ADD.D and MUL.D both write F6.
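The pairwise hazard check the slide asks for can be mechanized. The following Python sketch (mine, not part of the slides; it ignores timing and only compares register names) classifies the hazards in the four-instruction sequence above:

```python
# Each instruction is (opcode, destination register, source registers).
instrs = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]

def hazards(instrs):
    """Classify every RAW, WAR, and WAW hazard between instruction pairs (i before j)."""
    found = []
    for i in range(len(instrs)):
        for j in range(i + 1, len(instrs)):
            op_i, d_i, s_i = instrs[i]
            op_j, d_j, s_j = instrs[j]
            if d_i in s_j:              # j reads what i writes
                found.append(("RAW", op_i, op_j, d_i))
            if d_j in s_i:              # j writes what i reads
                found.append(("WAR", op_i, op_j, d_j))
            if d_i == d_j:              # both write the same register
                found.append(("WAW", op_i, op_j, d_i))
    return found

for kind, a, b, reg in hazards(instrs):
    print(f"{kind}: {a} -> {b} on {reg}")
```

Running it reports the two RAW hazards (on F0 and F8), the WAR hazard on F8, and the WAW hazard on F6.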

Dynamic Scheduling
Dynamic scheduling: rearrange the order of instructions to reduce stalls while maintaining data flow.
Advantages: the compiler doesn't need knowledge of the microarchitecture; it handles cases where dependences are unknown at compile time.
Disadvantages: a substantial increase in hardware complexity; it complicates exceptions.

Two Algorithms for Dynamic Scheduling
Scoreboard approach: adopted by the CDC 6600 in 1963.
Tomasulo's approach: tracks when operands are available and introduces register renaming in hardware, minimizing WAW and WAR hazards.