Advanced Computer Architecture - 06CS81

Hardware-Based Speculation: Tomasulo Algorithm and Reorder Buffer

Tomasulo idea:
1. Have reservation stations where register renaming is possible.
2. Results are forwarded directly to the reservation stations at the same time as they are sent to the final registers. This is also called short-circuiting or bypassing.

ROB:
1. Instructions are stored sequentially, with indicators saying whether each one is still speculative or has completed execution.
2. If an instruction has completed execution, is not speculative, and has reached the head of the queue, then we commit it.
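To make the commit rule concrete, here is a minimal C sketch of a ROB entry and the head-of-queue test. Every name in it (RobEntry, can_commit, the field names) is an assumption chosen for illustration, not code from these notes.

    #include <stdbool.h>

    #define N_ROB 16

    /* One reorder-buffer entry; the fields mirror the description above. */
    typedef struct {
        bool busy;        /* slot holds an in-flight instruction        */
        bool done;        /* execution has completed                    */
        bool speculative; /* still depends on an unresolved branch      */
        long value;       /* result, held from completion until commit  */
    } RobEntry;

    static RobEntry rob[N_ROB];
    static int head = 0;  /* oldest entry: the only commit candidate */

    /* Commit test from the notes: completed execution, no longer
       speculative, and at the head of the (circular) queue. */
    static bool can_commit(void) {
        const RobEntry *e = &rob[head];
        return e->busy && e->done && !e->speculative;
    }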

Speculating on Branch Outcomes

To optimally exploit ILP (instruction-level parallelism), e.g. with pipelining, Tomasulo, etc., it is critical to efficiently maintain control dependencies (i.e., branch dependencies).

Key idea: Speculate on the outcome of branches (i.e., predict) and execute instructions as if the predictions were correct. Of course, we must proceed in such a manner as to be able to recover if our speculation turns out to be wrong.

Three components of hardware-based speculation:
1. Dynamic branch prediction to pick the branch outcome.
2. Speculation, to allow instructions to execute before control dependencies are resolved, i.e., before branch outcomes become known, with the ability to undo the effects of incorrect speculation.
3. Dynamic scheduling.

Speculating with Tomasulo

Key ideas:
1. Separate execution from completion: instructions may execute speculatively, but they do not update registers or memory until they are no longer speculative.
2. Therefore, add a final step after an instruction is no longer speculative, called instruction commit, when it is allowed to make register and memory updates.
3. Allow instructions to execute and complete out of order, but force them to commit in order.
4. Add hardware called the reorder buffer (ROB), with registers to hold the result of an instruction between completion and commit.

Tomasulo's Algorithm with Speculation: Four Stages
1. Issue: get an instruction from the instruction queue. If a reservation station and a ROB slot are free (no structural hazard), control issues the instruction to the reservation station and the ROB, and sends the reservation station the operand values (or the reservation-station sources for those values) as well as the allocated ROB slot number.
2. Execute: operate on the operands (EX). When both operands are ready, execute; if not ready, watch the CDB for the result.
3. Write result: finish execution (WB). Write the result on the CDB to all awaiting units and to the ROB; mark the reservation station available.
4. Commit: update the register or memory with the ROB result. When an instruction reaches the head of the ROB and its result is present, update the register with the result (or store to memory) and remove the instruction from the ROB. If an incorrectly predicted branch reaches the head of the ROB, flush the ROB and restart execution at the correct successor of the branch.
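Continuing the C sketch above (all names remain illustrative assumptions), the issue stage of step 1 can be read as: find a free reservation station and a free ROB slot, then deliver each source operand either as a value, possibly forwarded from a completed but uncommitted ROB entry, or as the number of the ROB entry to wait for.

    #define N_RS   6
    #define N_REGS 32

    typedef struct {
        bool busy;
        long vj, vk;   /* operand values, once known                   */
        int  qj, qk;   /* else ROB entry numbers to watch (-1 = ready) */
        int  dest;     /* ROB slot allocated to this instruction       */
    } RS;

    typedef struct { bool busy; int reorder; } RegStatus;

    static RS        rs[N_RS];
    static RegStatus regstat[N_REGS]; /* which ROB entry will write each register */
    static long      regfile[N_REGS]; /* committed architectural register state   */

    /* Operand fetch at issue time (step 1). If the operand is not ready,
       *v is left unset; the value arrives later via the CDB under tag *q. */
    static void read_operand(int reg, long *v, int *q) {
        if (regstat[reg].busy) {            /* an in-flight instruction writes reg */
            int h = regstat[reg].reorder;
            if (rob[h].done) { *v = rob[h].value; *q = -1; } /* forward from ROB */
            else             { *q = h; }    /* must watch the CDB for tag h      */
        } else {
            *v = regfile[reg];              /* no pending writer: committed value */
            *q = -1;
        }
    }

At issue, the destination register's status entry is also set to the newly allocated ROB slot; that is what renames the register and removes WAR and WAW hazards.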

ROB Data Structure

ROB entry fields:
- Instruction type: branch, store, or register operation (i.e., ALU operation or load)
- State: indicates whether the instruction has completed execution and the value is ready
- Destination: where the result is to be written; a register number for a register operation (i.e., ALU operation or load), a memory address for a store (a branch has no destination result)
- Value: holds the value of the instruction result until it is time to commit

Additional reservation station field:
- Destination: the corresponding ROB entry number
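In the tables below, #X denotes the Value field of ROB entry number X. In the running C sketch (names still assumed), such a tag is resolved at the Write result stage, when the producing entry broadcasts on the CDB:

    /* Write result (step 3): ROB entry `tag` broadcasts `value` on the CDB.
       Every reservation station waiting on that tag captures the value,
       and the ROB holds it until commit. */
    static void cdb_broadcast(int tag, long value) {
        for (int i = 0; i < N_RS; i++) {
            if (!rs[i].busy) continue;
            if (rs[i].qj == tag) { rs[i].vj = value; rs[i].qj = -1; }
            if (rs[i].qk == tag) { rs[i].vk = value; rs[i].qk = -1; }
        }
        rob[tag].value = value;
        rob[tag].done  = true;
    }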

Example

1. L.D    F6, 34(R2)
2. L.D    F2, 45(R3)
3. MUL.D  F0, F2, F4
4. SUB.D  F8, F6, F2
5. DIV.D  F10, F0, F6
6. ADD.D  F6, F8, F2

Assume the following latencies: load 1 clock, add 2 clocks, multiply 10 clocks, divide 40 clocks. The contents of the reservation stations, FP registers, and ROB just before MUL.D goes to commit are shown below.

Reservation Stations

Name   Busy  Op   Vj                Vk                Qj  Qk  Dest  A
Load1  no
Load2  no
Add1   no
Add2   no
Add3   no
Mult1  yes   MUL  Mem[45+Regs[R3]]  Regs[F4]                  #3
Mult2  yes   DIV                    Mem[34+Regs[R2]]  #3      #5

At the time MUL.D is ready to commit, only the two L.D instructions have committed, though several others have completed execution. The MUL.D is at the head of the ROB; the two L.D entries are shown only for understanding. #X represents the Value field of ROB entry number X.

Floating-point registers

Field     F0   F1  F2  F3  F4  F5  F6   F7  F8   F10
Reorder#  3                        6        4    5
Busy      yes  no  no  no  no  no  yes  no  yes  yes

Reorder Buffer

Entry  Busy  Instruction         State         Destination  Value
1      no    L.D F6, 34(R2)      Commit        F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit        F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Write result  F0           #2 x Regs[F4]
4      yes   SUB.D F8, F6, F2    Write result  F8           #1 - #2
5      yes   DIV.D F10, F0, F6   Execute       F10
6      yes   ADD.D F6, F8, F2    Write result  F6           #4 + #2

Example

Loop:  L.D     F0, 0(R1)
       MUL.D   F4, F0, F2
       S.D     F4, 0(R1)
       DADDUI  R1, R1, #-8
       BNE     R1, R2, Loop

Assume the instructions in the loop have been issued twice. Assume the L.D and MUL.D from the first iteration have committed and all other instructions have completed execution. Assume the effective address for the store is computed prior to its issue. The resulting data structures are shown below.

Reorder Buffer

Entry  Busy  Instruction          State         Destination  Value
1      no    L.D F0, 0(R1)        Commit        F0           Mem[0+Regs[R1]]
2      no    MUL.D F4, F0, F2     Commit        F4           #1 x Regs[F2]
3      yes   S.D F4, 0(R1)        Write result  0+Regs[R1]   #2
4      yes   DADDUI R1, R1, #-8   Write result  R1           Regs[R1] - 8
5      yes   BNE R1, R2, Loop     Write result
6      yes   L.D F0, 0(R1)        Write result  F0           Mem[#4]
7      yes   MUL.D F4, F0, F2     Write result  F4           #6 x Regs[F2]
8      yes   S.D F4, 0(R1)        Write result  0 + #4       #7
9      yes   DADDUI R1, R1, #-8   Write result  R1           #4 - 8
10     yes   BNE R1, R2, Loop     Write result

Notes:
- If a branch is mispredicted, recovery is done by flushing the ROB of all entries that appear after the mispredicted branch; entries before the branch are allowed to continue. Fetch is restarted at the correct branch successor.
- When an instruction commits or is flushed from the ROB, the corresponding slot becomes available for subsequent instructions.

Advantages of hardware-based speculation:
- Able to disambiguate memory references.
- Better when hardware-based branch prediction is more accurate than software-based branch prediction done at compile time.
- Maintains a completely precise exception model, even for speculated instructions.
- Does not require compensation or bookkeeping code.

Main disadvantage:
- Complex and requires substantial hardware resources.
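The recovery rule in the first note might look like the following in the running C sketch; the tail pointer, the correct_pc argument, and flush_after itself are all assumptions made for illustration.

    static long pc;  /* fetch address */

    /* Flush every ROB entry allocated after the mispredicted branch at
       index `branch`; `tail` is the next free slot in the circular queue.
       Entries at or before the branch continue toward commit. A full
       implementation would also clear any register-status entries that
       name one of the squashed ROB slots. */
    static void flush_after(int branch, int tail, long correct_pc) {
        for (int i = (branch + 1) % N_ROB; i != tail; i = (i + 1) % N_ROB)
            rob[i].busy = false;   /* squash: the slot becomes reusable */
        pc = correct_pc;           /* restart fetch at the correct successor */
    }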