HARDWARE SPECULATION. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah


1 HARDWARE SPECULATION Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture

2 Overview
Announcement
- Mid-term exam: Mar. 5th
- No homework till after Mar. 14th
This lecture
- Out-of-order pipeline
  - Issue queue
  - Register renaming
  - Branch recovery
  - Speculated execution

3 Recall: Out-of-Order Execution
- Producer-consumer chains are built on the fly
- Register renaming: remove anti- and output dependences via register tags
- Limited by the number of instructions in the instruction window (ROB)
- Out-of-order issue (dispatch)
  - Broadcast tags to waiting instructions
  - Wake up ready instructions and select among them
- Out-of-order execute/complete
- In-order fetch/decode and commit

4 Out-of-Order Pipelines
Distributed reservation stations; in-order issue/dispatch.
[Figure: pipeline block diagram. Instruction memory and instruction decoder feed a reservation station in front of each unit (FU-1 ... FU-n, branch, data memory), with a register file and a re-order buffer (ROB). Stages: Fetch, Decode, Rename, Issue, Execute, Complete, Commit.]

5 Out-of-Order Pipelines
Out-of-order issue/dispatch to functional units.
[Figure: pipeline block diagram. Instruction memory and instruction decoder feed a single issue queue (IQ) in front of FU-1 ... FU-n, branch, and data memory, with a register file and a re-order buffer (ROB). Stages: Fetch, Decode, Rename, Issue, Execute, Complete, Commit.]

6 Out-of-Order Issue Queue
Two-step wakeup and select logic.
[Figure: each issue-queue entry holds Rdy-1, Reg-1, Reg-2, Rdy-2. Comparators (=) match the broadcast tag against Reg-1 and Reg-2 of every entry; the selection logic then picks the selected instructions.]
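
The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the lecture's implementation: the `IQEntry` fields, the tag name `T3`, and the oldest-first selection policy are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class IQEntry:
    age: int            # program order; smaller means older
    dest: str           # tag broadcast when this instruction completes
    src1: str
    src2: str
    rdy1: bool = False
    rdy2: bool = False

    def ready(self) -> bool:
        return self.rdy1 and self.rdy2

def wakeup(queue, tag):
    """Broadcast a completed tag; every entry compares it with both sources."""
    for e in queue:
        if e.src1 == tag:
            e.rdy1 = True
        if e.src2 == tag:
            e.rdy2 = True

def select(queue, free_fus):
    """Pick up to free_fus ready entries, oldest first, and remove them."""
    chosen = sorted((e for e in queue if e.ready()), key=lambda e: e.age)[:free_fus]
    for e in chosen:
        queue.remove(e)
    return chosen

iq = [
    IQEntry(age=0, dest="T2", src1="V1", src2="V2", rdy1=True, rdy2=True),
    IQEntry(age=1, dest="T3", src1="V1", src2="T2", rdy1=True),   # waits for T2
    IQEntry(age=2, dest="T5", src1="V1", src2="V9", rdy1=True, rdy2=True),
]
first = select(iq, free_fus=1)    # one free FU: the oldest ready entry wins
wakeup(iq, "T2")                  # T2 completes and wakes its consumer
second = select(iq, free_fus=2)
print([e.dest for e in first], [e.dest for e in second])
```

Selecting the oldest ready entry first is one simple way to keep program order among ready instructions, so that no instruction waits indefinitely.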

7 Register Renaming
- Register aliasing table (RAT) for fast lookup
- How to add entries to the RAT? Search the table for an unallocated tag!
- Where to write?
[Figure: pipeline block diagram with branch predictor omitted. Instruction memory, instruction decoder, register aliasing table (RAT), issue queue (IQ), FU-1 ... FU-n, branch, data memory, register file, re-order buffer (ROB). Stages: Fetch, Decode, Rename, Issue, Execute, Complete, Commit.]

8 Register Renaming
- Free register list for fast register renaming
- Proceed only if there is free space in the IQ, ROB, and free list
- When is it safe to free a tag?
[Figure: pipeline block diagram. Instruction memory, instruction decoder, register aliasing table (RAT), free register list, issue queue (IQ), FU-1 ... FU-n, branch, data memory, register file, re-order buffer (ROB). Stages: Fetch, Decode, Rename, Issue, Execute, Complete, Commit.]
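
A minimal sketch of the rename step described here, assuming a dictionary-based RAT and a queue-based free list. The class name `Renamer`, the tag names, and the structure sizes are hypothetical and chosen only for illustration.

```python
from collections import deque

class Renamer:
    def __init__(self, tags, rob_size, iq_size):
        self.rat = {}                    # architectural register -> pending tag
        self.free_list = deque(tags)     # unallocated tags
        self.rob, self.iq = [], []
        self.rob_size, self.iq_size = rob_size, iq_size

    def rename(self, dest, src1, src2):
        """Return the renamed instruction, or None if it must stall."""
        if (not self.free_list or len(self.rob) >= self.rob_size
                or len(self.iq) >= self.iq_size):
            return None                  # no space in IQ, ROB, or free list
        # Read source mappings before updating the destination,
        # so an instruction like R2 = R1 + R2 sees the old R2.
        s1 = self.rat.get(src1, src1)
        s2 = self.rat.get(src2, src2)
        tag = self.free_list.popleft()
        self.rat[dest] = tag
        self.rob.append((dest, tag))
        self.iq.append((tag, s1, s2))
        return (tag, s1, s2)

r = Renamer(tags=["T1", "T2"], rob_size=4, iq_size=4)
a = r.rename("R2", "R1", "R2")   # R2 = R1 + R2
b = r.rename("R3", "R1", "R2")   # R3 = R1 + R2, reads the renamed R2
c = r.rename("R5", "R1", "R9")   # stalls: the free list is empty
print(a, b, c)
```

Taking a tag from the head of a free list replaces the table search for an unallocated tag with a constant-time operation.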

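9 Example: Instruction Commit
Update value in ROB.
[Figure: RAT has pending mappings for R1 and R10; operations in flight are =V2+V3 and =Mem[10]; the result V1 is written to the ROB. Register file: R1: ??, R2: V2, R3: V3. ROB entries: + (R1), ld (R10).]

10 Example: Instruction Commit
Register file write.
[Figure: RAT has pending mappings for R1 and R10; the ALU is idle and =Mem[10] is still in flight; the pair R1,V1 is sent to the register file. Register file: R1: ??, R2: V2, R3: V3. ROB entries: + (R1), ld (R10).]

11 Example: Instruction Commit
Update tables and ROB.
[Figure: RAT has a pending mapping for R10 only; =Mem[10] is still in flight. Register file: R1: V1, R2: V2, R3: V3. ROB entries: ld (R10).]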
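
The commit steps in slides 9 to 11 can be sketched as follows. This is a simplified model under stated assumptions: ROB entries are dictionaries, the tag names and the value `V7` are hypothetical, and consumers are assumed to have captured a value when its tag was broadcast, so the tag can be recycled at commit.

```python
from collections import deque

def commit(rob, regfile, rat, free_list):
    """Retire completed instructions from the ROB head, in program order."""
    retired = []
    while rob and rob[0]["done"]:
        entry = rob.popleft()
        regfile[entry["dest"]] = entry["value"]       # register file write
        if rat.get(entry["dest"]) == entry["tag"]:    # no newer writer pending
            del rat[entry["dest"]]                    # register file is current
        free_list.append(entry["tag"])                # tag can be reused
        retired.append(entry["dest"])
    return retired

regfile = {"R1": None, "R2": "V2", "R3": "V3"}
rat = {"R1": "T1", "R10": "T2", "R3": "T3"}
free_list = deque()
rob = deque([
    {"dest": "R1", "tag": "T1", "done": True, "value": "V1"},
    {"dest": "R10", "tag": "T2", "done": False, "value": None},  # load pending
    {"dest": "R3", "tag": "T3", "done": True, "value": "V7"},
])
retired = commit(rob, regfile, rat, free_list)
print(retired, regfile["R1"], len(rob))
```

The last entry has finished executing but stays in the ROB, because the unfinished load ahead of it blocks in-order commit.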

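12 Example: Instruction Rename
Allocate entries in the ROB, IQ, and free list (FL).
[Figure: decode queue holds R2=R1+R2, R3=R1+R2, R5=R1+R9. RAT has a pending mapping for R10; the ALU is idle and =Mem[10] is in flight. Register file: R1: V1, R2: V2, R3: V3. ROB entries: ld (R10).]

13 Example: Instruction Rename
Allocate entries in the ROB, IQ, and free list (FL).
[Figure: issue queue holds T2=V1+V2; decode queue holds R3=R1+R2, R5=R1+R9. RAT: R10 pending, R2->T2. Register file: R1: V1, R2: ??, R3: V3. ROB entries: + (R2), ld (R10).]

14 Example: Instruction Rename
Allocate entries in the ROB, IQ, and free list (FL).
[Figure: issue queue holds =V1+T2 and T2=V1+V2; decode queue holds R5=R1+R9. RAT: R10 pending, R2->T2, R3 pending. Register file: R1: V1, R2: ??, R3: ??. ROB entries: + (R3), + (R2), ld (R10).]

15 Example: Instruction Rename
Instruction has to wait for free resources.
[Figure: same state as the previous slide; R5=R1+R9 remains in the decode queue.]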

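16 Example: Instruction Issue
Issue a ready instruction if a free FU exists.
[Figure: T2=V1+V2 moves to the functional unit; issue queue holds =V1+T2; decode queue holds R5=R1+R9; =Mem[10] is in flight. RAT: R10 pending, R2->T2, R3 pending. Register file: R1: V1, R2: ??, R3: ??. ROB entries: + (R3), + (R2), ld (R10).]

17 Example: Instruction Issue
Out-of-order issue is now possible.
[Figure: issue queue holds =V1+T2 and =V1+V9; T2=V1+V2 and =Mem[10] are executing. RAT: R10 pending, R2->T2, R3 pending, R5->T5. Register file: R1: V1, R2: ??, R3: ??. ROB entries: + (R5), + (R3), + (R2), ld (R10).]

18 Example: Instruction Issue
Out-of-order issue is now possible.
[Figure: same state as the previous slide, with the result pair T2,V2 being broadcast.]

19 Example: Instruction Issue
Wakeup and select.
[Figure: issue queue holds =V1+V2 and =V1+V9; the ALU is idle and =Mem[10] is in flight. RAT: R10 pending, R2->T2, R3 pending, R5->T5. Register file: R1: V1, R2: ??, R3: ??. ROB entries: + (R5), + (R3), + (R2), ld (R10).]

20 Example: Instruction Issue
Keep the program order to avoid starvation.
[Figure: =V1+V2 issues first; issue queue holds =V1+V9; =Mem[10] is in flight; result V3 is produced. RAT, register file, and ROB entries as on the previous slide.]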

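21 Example: Instruction Execute
Issue ready instructions.
[Figure: =V1+V9 executes; =Mem[10] is in flight; result V4 is produced. RAT: R10 pending, R2->T2, R3 pending, R5->T5. Register file: R1: V1, R2: ??, R3: ??. ROB entries: + (R5), + (R3), + (R2), ld (R10).]

22 Example: Instruction Execute
Update ROB.
[Figure: the ALU is idle; =Mem[10] returns V10, which is written to the ROB. RAT, register file, and ROB entries as on the previous slide.]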

23 Register Renaming Example
Where are values stored?
Decode queue:
  R1 ← R1 + R2
  R2 ← R1 - R3
  BEQ R2, R0
  R3 ← R1 ^ R2
  R1 ← R3 & R2
[Figure: issue queue, functional units, tags for R1-R3, register file R1-R3, and reorder buffer entries ROB 1 (R1), ROB 2 (R2), ROB 3 (---), ROB 4 (R3), ROB 5 (R1).]

24 Branch Recovery
How to handle branches?
[Figure: pipeline block diagram. Branch predictor, instruction memory, instruction decoder, register aliasing table (RAT), free register list, issue queue (IQ), FU-1 ... FU-n, branch, data memory, register file, re-order buffer (ROB). Stages: Fetch, Decode, Rename, Issue, Execute, Complete, Commit.]

25 Revisit Branch Prediction Problem: find the average number of stall cycles caused by branches in a pipeline, where branch misprediction penalty is 20 cycles, branch predictor accuracy is 90%, and branch target buffer hit rate is 80%. Every fifth instruction is a branch; 30% of branches are actually taken.

26 Revisit Branch Prediction (solution)
A taken branch is handled correctly only if the predictor is right and the BTB hits; a not-taken branch only needs a correct prediction.
Average misses = 1 - (0.3 × 0.9 × 0.8 + 0.7 × 0.9) = 1 - 0.846 = 0.154
Average stalls = 20 × 0.2 × 0.154 ≈ 0.6 cycles per instruction
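
The arithmetic for this problem can be checked with a short Python function. It is a sketch under one stated assumption: a taken branch counts as handled correctly only when the direction prediction is right and the BTB hits, while a not-taken branch needs only a correct direction prediction. The function and parameter names are made up for this example.

```python
def branch_stalls(penalty, accuracy, btb_hit, branch_frac, taken_frac):
    """Return (miss rate per branch, stall cycles per instruction)."""
    correct_taken = taken_frac * accuracy * btb_hit      # right direction and target
    correct_not_taken = (1 - taken_frac) * accuracy      # right direction only
    miss_rate = 1 - (correct_taken + correct_not_taken)
    return miss_rate, penalty * branch_frac * miss_rate

# 20-cycle penalty, 90% accuracy, 80% BTB hit rate,
# every fifth instruction is a branch, 30% of branches taken.
miss, stalls = branch_stalls(20, 0.9, 0.8, 1 / 5, 0.3)
print(round(miss, 3), round(stalls, 2))
```

With the numbers from the problem statement, this evaluates to a miss rate of roughly 0.15 per branch and about 0.6 stall cycles per instruction.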

27 Speculated Execution
Problem: a branch may significantly limit performance, for example when it is the consumer of a load or another long-latency instruction
Solution: speculative instruction execution
- Fetch and decode instructions speculatively
- Issue and execute speculative instructions
- Branch resolution
  - Nullify the impact of speculative instructions if mispredicted
  - Commit speculative instructions (writes to register file/memory) only if the prediction was correct

28 Branch Recovery
Squash all mispredicted entries.
Decode queue:
  R1 ← R1 + R2
  R2 ← R1 - R3
  BEQ R2, R0
  R3 ← R1 ^ R2
  R1 ← R3 & R2
[Figure: issue queue, functional units, RAT for R1-R3 with a checkpoint taken at the branch, register file R1-R3, and reorder buffer entries ROB 1 (R1), ROB 2 (R2), ROB 3 (---), ROB 4 (R3), ROB 5 (R1). The entries after the branch, ROB 4 and ROB 5, are marked speculated.]
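
A minimal sketch of checkpoint-based recovery for the instruction sequence on this slide. The `Core` class, its dictionary-based RAT, and the ROB tag names are assumptions made for illustration; a real design would also clear the issue queue and redirect fetch.

```python
class Core:
    def __init__(self):
        self.rat = {}          # architectural register -> ROB tag
        self.rob = []          # entries in program order
        self.checkpoints = {}  # ROB index of a branch -> saved RAT copy

    def rename(self, op, dest=None):
        idx = len(self.rob)
        if op == "BEQ":
            self.checkpoints[idx] = dict(self.rat)   # snapshot at the branch
        self.rob.append({"op": op, "dest": dest})
        if dest is not None:
            self.rat[dest] = f"ROB{idx + 1}"
        return idx

    def mispredict(self, branch_idx):
        """Squash everything younger than the branch and restore its RAT."""
        squashed = self.rob[branch_idx + 1:]
        del self.rob[branch_idx + 1:]
        self.rat = self.checkpoints.pop(branch_idx)
        # Checkpoints of squashed younger branches are no longer valid.
        self.checkpoints = {i: c for i, c in self.checkpoints.items()
                            if i < branch_idx}
        return squashed

core = Core()
core.rename("+", "R1")        # R1 <- R1 + R2   (ROB 1)
core.rename("-", "R2")        # R2 <- R1 - R3   (ROB 2)
br = core.rename("BEQ")       # BEQ R2, R0      (ROB 3)
core.rename("^", "R3")        # R3 <- R1 ^ R2   (ROB 4, speculated)
core.rename("&", "R1")        # R1 <- R3 & R2   (ROB 5, speculated)
squashed = core.mispredict(br)
print([e["dest"] for e in squashed], core.rat)
```

After recovery the RAT again points R1 at ROB 1 and R2 at ROB 2, as if the two speculated instructions had never been renamed.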

29 Physical Register File
Avoid copying register values multiple times.
Decode queue:
  R1 ← R1 + R2
  R2 ← R1 - R3
  BEQ R2, R0
  R3 ← R1 ^ R2
  R1 ← R3 & R2
[Figure: issue queue, functional units, RAT for R1-R3, register file R1-R3, and reorder buffer entries ROB 1 (R1), ROB 2 (R2), ROB 3 (---), ROB 4 (R3), ROB 5 (R1). An arrow labeled "copy" runs from the reorder buffer to the register file.]

30 Physical Register File
Avoid copying register values multiple times.
Note 1: only a subset of the physical register file is committed at any time.
Note 2: no need for storing values in the ROB or IQ.
Decode queue:
  R1 ← R1 + R2
  R2 ← R1 - R3
  BEQ R2, R0
  R3 ← R1 ^ R2
  R1 ← R3 & R2
Issue queue (after renaming):
  P0 ← R1 + R2
  P1 ← P0 + R3
  BEQ P1, R0
  P3 ← P0 + P1
  P4 ← P3 + P1
Front RAT: R1 -> P4, R2 -> P1, R3 -> P3
Retire RAT: R1 -> P0, R2 -> P1
Reorder buffer: ROB 1 (R1, P0), ROB 2 (R2, P1), ROB 3 (---), ROB 4 (R3, P3), ROB 5 (R1, P4)
Physical register file: P0-P5
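
The split between a front (speculative) RAT and a retire (committed) RAT can be sketched as below. This is an illustrative model, not the lecture's design: the `DoubleRat` class is hypothetical, architectural registers initially map to themselves, physical registers come from a sequential free list (so the numbering differs slightly from the figure), and releasing the previous mapping at commit is left out.

```python
from collections import deque

class DoubleRat:
    def __init__(self, arch_regs, num_phys):
        self.front = {r: r for r in arch_regs}   # speculative mapping
        self.retire = dict(self.front)           # committed mapping
        self.free = deque(f"P{i}" for i in range(num_phys))
        self.rob = deque()                       # (dest, physical register)

    def rename(self, dest, src1, src2):
        s1, s2 = self.front[src1], self.front[src2]
        phys = None
        if dest is not None:                     # branches write no register
            phys = self.free.popleft()
            self.front[dest] = phys
        self.rob.append((dest, phys))
        return phys, s1, s2

    def commit(self):
        """Retire the ROB head; only the retire RAT changes, no value is copied."""
        dest, phys = self.rob.popleft()
        if dest is not None:
            self.retire[dest] = phys

    def recover(self):
        """Call once a mispredicted branch has committed: drop younger work."""
        for _, phys in self.rob:
            if phys is not None:
                self.free.append(phys)           # squashed registers are reusable
        self.rob.clear()
        self.front = dict(self.retire)

m = DoubleRat(["R0", "R1", "R2", "R3"], num_phys=6)
m.rename("R1", "R1", "R2")      # R1 <- R1 + R2
m.rename("R2", "R1", "R3")      # R2 <- R1 - R3
m.rename(None, "R2", "R0")      # BEQ R2, R0
m.rename("R3", "R1", "R2")      # speculated
m.rename("R1", "R3", "R2")      # speculated
for _ in range(3):              # both writers and the branch retire
    m.commit()
m.recover()                     # the branch turned out to be mispredicted
print(m.front, list(m.free))
```

Because values live only in the physical register file, both commit and recovery reduce to updating mapping tables.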

31 Double RAT Architecture
What is the size of the ROB?
[Figure: pipeline block diagram. Branch predictor, instruction memory, instruction decoder, front RAT, free register list, issue queue (IQ), FU-1 ... FU-n, branch, data memory, physical register file, retire RAT, re-order buffer (ROB). Stages: Fetch, Decode, Rename, Issue, Execute, Complete, Commit.]

32 Physical Register Release
Example: when is it safe to free p30 (R1)?
  ADD R1, R2, R3
  SUB R2, R1, R3
  ADD R3, R1, R2
  SUB R1, R3, R2
  ADD R2, R1, R3
Answer: when the second instruction that writes R1 retires.
[Figure: architectural registers R1, R2 mapped onto physical registers p30-p35.]
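
The release rule can be traced on the five instructions above with a short simulation. It assumes that each ROB entry remembers the physical register previously mapped to its destination and frees it at retirement; the initial mappings `p1` to `p3` are hypothetical, chosen so that the first ADD receives p30.

```python
from collections import deque

program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R2", "R1", "R3"),
    ("ADD", "R3", "R1", "R2"),
    ("SUB", "R1", "R3", "R2"),
    ("ADD", "R2", "R1", "R3"),
]

rat = {"R1": "p1", "R2": "p2", "R3": "p3"}       # assumed initial mapping
free_list = deque(f"p{i}" for i in range(30, 36))
rob = deque()

# Rename: allocate a new physical register and remember the old mapping.
for op, dest, src1, src2 in program:
    new = free_list.popleft()
    rob.append({"inst": f"{op} {dest}", "new": new, "old": rat[dest]})
    rat[dest] = new

# Retire in program order: the old mapping of the destination becomes free,
# because no later instruction can still name it.
freed_at = {}
for n, entry in enumerate(rob, start=1):
    freed_at[entry["old"]] = n
    free_list.append(entry["old"])

print(freed_at["p30"])   # position of the instruction whose retirement frees p30
```

Here p30 is released when the fourth instruction, SUB R1, R3, R2, retires, which is the second writer of R1.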
