Lecture 16: Core Design. Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue


Gervase Watts
6 years ago

1 Lecture 16: Core Design
Today: basics of implementing a correct out-of-order (OoO) core: register renaming, commit, LSQ, issue queue

2 The Alpha Out-of-Order Implementation
[Figure: block diagram with Branch prediction and instr fetch, Instr Fetch Queue, Decode & Rename, Reorder Buffer (ROB, holding Instr 1 to Instr 6), Issue Queue (IQ), Register File P1-P64, and three ALUs. Results are written to the regfile and tags are broadcast to the IQ.]
Original code        Renamed code (in the IQ)
R1 <- R1+R2          P33 <- P1+P2
R2 <- R1+R3          P34 <- P33+P3
BEQZ R2              BEQZ P34
R3 <- R1+R2          P35 <- P33+P34
R1 <- R3+R2          P36 <- P35+P34
Speculative Reg Map: R1 -> P36, R2 -> P34
Committed Reg Map: R1 -> P1, R2 -> P2

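3 Rename
Original code:
A: lr1 <- lr2 + lr3
B: lr2 <- lr4 + lr5
C: lr6 <- lr1 + lr3
D: lr6 <- lr1 + lr2
Dependences: RAR lr3, RAW lr1, WAR lr2, WAW lr6
Schedule: A ; BC ; D
Renamed code:
A: pr7 <- pr2 + pr3
B: pr8 <- pr4 + pr5
C: pr9 <- pr7 + pr3
D: pr10 <- pr7 + pr8
Dependences: RAR pr3, RAW pr7, WAR none, WAW none
Schedule: AB ; CD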
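The renaming step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Alpha's actual logic. It assumes each logical register lrN initially maps to physical register prN, and that there are 6 logical and 10 physical registers. The function name `rename` and the tuple encoding `(dest, src1, src2)` are hypothetical.

```python
from collections import deque

def rename(instrs, num_logical=6, num_physical=10):
    """Rename (dest, src1, src2) logical-register triples to physical registers."""
    # Assumption: lrN starts out mapped to prN; the remaining registers are free.
    map_table = {f"lr{i}": f"pr{i}" for i in range(1, num_logical + 1)}
    free_list = deque(f"pr{i}" for i in range(num_logical + 1, num_physical + 1))
    renamed = []
    for dest, src1, src2 in instrs:
        # Sources are looked up before the destination mapping is overwritten.
        p1, p2 = map_table[src1], map_table[src2]
        pd = free_list.popleft()      # allocate a fresh physical register
        map_table[dest] = pd          # later readers of dest now see pd
        renamed.append((pd, p1, p2))
    return renamed

program = [
    ("lr1", "lr2", "lr3"),  # A
    ("lr2", "lr4", "lr5"),  # B
    ("lr6", "lr1", "lr3"),  # C
    ("lr6", "lr1", "lr2"),  # D
]
renamed = rename(program)
for pd, p1, p2 in renamed:
    print(f"{pd} <- {p1} + {p2}")
```

Because every destination gets a fresh physical register, the WAR on lr2 and the WAW on lr6 disappear, and only the true (RAW) dependences constrain the schedule.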

4 Commit Example
Assume a processor with 6 logical regs and 10 physical regs.
Instr  Original code         Renamed code          Map (old / new)
A      lr1 <- lr2 + lr3      pr7 <- pr2 + pr3      lr1: pr1 / pr7
B      lr2 <- lr4 + lr5      pr8 <- pr4 + pr5      lr2: pr2 / pr8
C      lr6 <- lr1 + lr3      pr9 <- pr7 + pr3      lr6: pr6 / pr9
D      lr6 <- lr1 + lr2      pr10 <- pr7 + pr8     lr6: pr9 / pr10
E      lr3 <- lr6 + lr2      pr1 <- pr10 + pr8     lr3: pr3 / pr1
F      lr4 <- lr3 + lr4      pr2 <- pr1 + pr4      lr4: pr4 / pr2
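The commit example can be replayed with a small model. This is a sketch under stated assumptions: lrN initially maps to prN, instructions commit in program order, and committing an instruction returns the old mapping of its destination to the free list. The class `RenameUnit` and its method names are hypothetical.

```python
from collections import deque

class RenameUnit:
    def __init__(self, num_logical=6, num_physical=10):
        # Assumption: lrN starts out mapped to prN.
        self.map_table = {f"lr{i}": f"pr{i}" for i in range(1, num_logical + 1)}
        self.free_list = deque(f"pr{i}" for i in range(num_logical + 1, num_physical + 1))
        self.rob = deque()  # (dest, old phys, new phys) in program order

    def rename(self, dest, src1, src2):
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        p1, p2 = self.map_table[src1], self.map_table[src2]
        old = self.map_table[dest]
        new = self.free_list.popleft()
        self.map_table[dest] = new
        self.rob.append((dest, old, new))  # remember the old mapping for commit
        return (new, p1, p2)

    def commit(self):
        # The oldest instruction commits; its old mapping can never be read again.
        _dest, old, _new = self.rob.popleft()
        self.free_list.append(old)
        return old

unit = RenameUnit()
for instr in [("lr1", "lr2", "lr3"), ("lr2", "lr4", "lr5"),
              ("lr6", "lr1", "lr3"), ("lr6", "lr1", "lr2")]:
    unit.rename(*instr)               # A-D consume pr7..pr10; free list is now empty

freed_a = unit.commit()               # A commits and releases pr1
e = unit.rename("lr3", "lr6", "lr2")  # E
freed_b = unit.commit()               # B commits and releases pr2
f = unit.rename("lr4", "lr3", "lr4")  # F
print(e, f)  # prints ('pr1', 'pr10', 'pr8') ('pr2', 'pr1', 'pr4')
```

In this model E cannot be renamed until A commits, because A's commit is what returns pr1 to the free list.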

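5 Out-of-Order Loads/Stores
[Figure: five memory instructions in program order, at least one of them a store (St), with register/address operand pairs R1 [R2], R3 [R4], R5 [R6], R7 [R8], R9 [R10]]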

6 Memory Dependence Checking
[Figure: program-order sequence of loads and stores (two marked St) with addresses 0xabcdef, 0xabcdef, 0xabcd00, 0xabc000, 0xabcd00]
The issue queue checks for register dependences and executes instructions as soon as their registers are ready.
Loads and stores also access memory, so RAW, WAW, and WAR hazards must be checked for memory as well.
Hence, first check register dependences to compute effective addresses; then check memory dependences.

7 Memory Dependence Checking
[Figure: the same sequence of loads and stores, with addresses 0xabcdef, 0xabcdef, 0xabcd00, 0xabc000, 0xabcd00]
Load and store addresses are maintained in program order in the Load/Store Queue (LSQ).
Loads can issue if they are guaranteed not to have true dependences with earlier stores.
Stores can issue only when we are ready to modify memory (we cannot recover if an earlier instr raises an exception).
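The load-issue rule above can be sketched as a conservative check over the LSQ. This is only an illustration. It assumes whole-address comparison (no access sizes or partial overlap) and no store-to-load forwarding. The names `MemOp` and `load_can_issue`, and the Ld/St pattern in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemOp:
    kind: str             # "ld" or "st"
    addr: Optional[int]   # None until the effective address has been computed

def load_can_issue(lsq, idx):
    """Return True if the load at lsq[idx] cannot depend on any earlier store.

    lsq is a list of MemOp in program order (oldest first).
    """
    load = lsq[idx]
    if load.kind != "ld":
        raise ValueError("entry is not a load")
    if load.addr is None:
        return False                      # own address not yet computed
    for op in lsq[:idx]:                  # only earlier entries matter
        if op.kind == "st" and (op.addr is None or op.addr == load.addr):
            return False                  # unknown or matching store address
    return True

# Hypothetical sequence: a load, a store with an unresolved address, a load.
lsq = [MemOp("ld", 0xabcdef), MemOp("st", None), MemOp("ld", 0xabcd00)]
first = load_can_issue(lsq, 0)     # no earlier stores
blocked = load_can_issue(lsq, 2)   # earlier store address is unknown
lsq[1].addr = 0xabc000
disjoint = load_can_issue(lsq, 2)  # store resolved to a different address
lsq[1].addr = 0xabcd00
conflict = load_can_issue(lsq, 2)  # store resolved to the same address
print(first, blocked, disjoint, conflict)  # prints True False True False
```

A real LSQ might instead forward the store's data to the matching load, or speculate past unresolved stores; the sketch only captures the "guaranteed no true dependence" condition.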

8 The Alpha Out-of-Order Implementation
[Figure: block diagram with Branch prediction and instr fetch, Instr Fetch Queue, Decode & Rename, Reorder Buffer (ROB, holding Instr 1 to Instr 7), Issue Queue (IQ), LSQ, Register File P1-P64, three ALUs plus an ALU feeding the LSQ, and the D-Cache. Results are written to the regfile and tags are broadcast to the IQ.]
Original code        Renamed code (in the IQ)
R1 <- R1+R2          P33 <- P1+P2
R2 <- R1+R3          P34 <- P33+P3
BEQZ R2              BEQZ P34
R3 <- R1+R2          P35 <- P33+P34
R1 <- R3+R2          P36 <- P35+P34
LD R4 <- 8[R3]       P37 <- 8[P35]
ST R4 -> 8[R1]       P37 -> 8[P36]
LSQ entries: P37 <- [P35 + 8]; P37 -> [P36 + 8]
Speculative Reg Map: R1 -> P36, R2 -> P34
Committed Reg Map: R1 -> P1, R2 -> P2

9 Speculative Issue
Instr I1 leaves the issue queue at the start of cycle 6; the instr then reads operands from the regfile, wires are traversed, the instruction executes, and the result is available at the end of cycle 8.
If operand availability is broadcast to the issue queue in cycle 9, the dependent instruction leaves in cycle 10.
This causes a 4-cycle gap between successive dependent instrs.
Hence, if we know that the instruction takes one cycle to execute, the operand is broadcast to the issue queue in cycle 6 and the dependent instr leaves the issue queue in cycle 7; the input operand is correctly bypassed at the FU.
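The cycle arithmetic on this slide can be written out explicitly. This is a small worked sketch using the slide's cycle numbers; the helper `dependent_issue_cycle` and the variable names are hypothetical, and it assumes a dependent leaves the issue queue one cycle after the wakeup broadcast.

```python
def dependent_issue_cycle(broadcast_cycle):
    # Assumption: a dependent instr leaves the IQ one cycle after the broadcast.
    return broadcast_cycle + 1

producer_issue = 6   # I1 leaves the issue queue
result_ready = 8     # I1's result is available at the end of this cycle

# Non-speculative wakeup: broadcast only after the result exists.
late_broadcast = result_ready + 1
late_gap = dependent_issue_cycle(late_broadcast) - producer_issue

# Speculative wakeup for a 1-cycle op: broadcast when the producer issues.
early_broadcast = producer_issue
early_gap = dependent_issue_cycle(early_broadcast) - producer_issue

print(late_gap, early_gap)  # prints 4 1
```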

10 Load Hit Speculation
The previous optimization assumes that we know the exact latency of every operation.
This is true for all ops except loads (cache hit or miss?).
Assume a hit and schedule accordingly; on a cache miss, all speculatively issued dependent instructions must be squashed and reissued.
An instruction therefore stays in the issue queue until the load hits it depends on are determined.

11 Register Rename Logic
[Figure: rename logic. Inputs: logical source regs and logical dest regs. Blocks: Map Table, Dependence Check Logic, Mux, Free Pool. Outputs: physical source regs and physical dest regs.]

12 Map Table RAM
[Figure: RAM-based map table]
Num entries = num logical regs.
Each entry holds a 7-bit physical reg id.
Shadow copies of each entry are kept (shift register).

13 Map Table CAM
[Figure: CAM-based map table]
Num entries = num phys regs.
Each entry: logical reg id (5 bits), valid (1 bit), shadow copies (1 bit).

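14 Wakeup Logic
[Figure: wakeup logic. Result tags tag1 ... tagIW are broadcast to every issue-window entry. Each entry holds rdyL, tagL, tagR, rdyR; each operand tag is compared (=) against the broadcast tags, and the comparator outputs are ORed to set the corresponding ready bit.]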
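The wakeup logic can be modeled in software as one comparison per broadcast tag per operand, ORed into the ready bit. This is a behavioral sketch, not a circuit description. The class `IQEntry` and its field names follow the figure labels but are otherwise hypothetical, and the tags come from the renamed code on slide 2.

```python
from dataclasses import dataclass

@dataclass
class IQEntry:
    tag_l: str            # physical register tag of the left operand
    tag_r: str            # physical register tag of the right operand
    rdy_l: bool = False
    rdy_r: bool = False

    def wakeup(self, broadcast_tags):
        # One comparator per broadcast tag; their outputs are ORed into the ready bit.
        self.rdy_l = self.rdy_l or any(t == self.tag_l for t in broadcast_tags)
        self.rdy_r = self.rdy_r or any(t == self.tag_r for t in broadcast_tags)

    @property
    def ready(self):
        # The entry may request issue once both operands are ready.
        return self.rdy_l and self.rdy_r

# P35 <- P33 + P34 waits for both of its source tags.
entry = IQEntry(tag_l="P33", tag_r="P34")
entry.wakeup(["P33"])
after_first = entry.ready     # only the left operand is ready
entry.wakeup(["P34"])
after_second = entry.ready    # both operands are ready
print(after_first, after_second)  # prints False True
```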

15 Selection Logic
[Figure: selection logic. Issue window entries raise req signals to arbiter cells; the signals shown are req, grant, anyreq, and enable.]
For multiple FUs, sequential selectors are needed.
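Selection for several FUs can be sketched as repeated application of a single selector, masking out each granted request before the next pass. This is a flattened behavioral model. It assumes fixed priority by position in the issue window, which is an assumption rather than something the slide states, and it does not model the arbiter-cell hierarchy. The function names are hypothetical.

```python
def select_one(req):
    """Grant the lowest-index requester; return its index, or None if no request."""
    for i, requesting in enumerate(req):
        if requesting:
            return i
    return None

def select(req, num_fus):
    """Apply one selector per FU in sequence, masking each granted request."""
    pending = list(req)
    grants = []
    for _ in range(num_fus):
        winner = select_one(pending)
        if winner is None:
            break                 # fewer ready instructions than FUs
        grants.append(winner)
        pending[winner] = False   # the next selector must not grant it again
    return grants

# Issue-window entries 1, 2, and 4 are ready; two FUs are available.
grants = select([False, True, True, False, True], num_fus=2)
print(grants)  # prints [1, 2]
```

Chaining the selectors is what makes the second grant depend on the first, which is one reason selection delay grows with the number of FUs.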

16 Structure Complexities
Critical structures: register map tables, issue queue, LSQ, register file, register bypass.
Cycle time is heavily influenced by: window size (physical register size), issue width (#FUs).
Conflict between the desire to increase IPC and clock speed.
