inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II 2005-07-21 Andy Carle CS 61C L19 Pipelining II (1)
Review: Datapath for MIPS
  [Figure: datapath with PC, instruction memory, registers (rs, rt, rd), imm, ALU, data memory, +4]
  Stages: 1. Instruction Fetch  2. Decode/Register Read  3. Execute  4. Memory  5. Write Back
  Use datapath figure to represent pipeline: IFtch, Dcd, Exec, Mem, WB (I$, Reg, ALU, D$, Reg)
Review: Problems for Computers
  Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  - Control hazards: pipelining of branches & other instructions stalls the pipeline until the hazard is resolved, leaving "bubbles" in the pipeline
  - Data hazards: instruction depends on result of prior instruction still in the pipeline (missing sock)
Review: C.f. Branch Delay vs. Load Delay
  Load Delay occurs only if necessary (dependent instructions).
  Branch Delay always happens (part of the ISA).
  Why not have Branch Delay interlocked?
  - Answer: Interlocks only work if you can detect the hazard ahead of time. By the time we detect a branch, we already need its value, hence no interlock is possible!
FYI: Historical Trivia
  First MIPS design did not interlock and stall on load-use data hazard
  Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages
  - Word play on acronym for Millions of Instructions Per Second, also called MIPS
  Load/Use: Wrong Answer!
Outline
  Pipeline Control
  Forwarding Control
  Hazard Control
Piped Proc So Far
New Representation: Regs more explicit
  [Figure: datapath with pipeline registers IF/DE, DE/EX, EX/ME, ME/WB holding IR, A, B, S, D, M]
  IF/DE.Ir = Instruction
  DE/EX.A = BusA out of Reg
  EX/ME.S = AluOut
  EX/ME.D = Bus B pass-through for sw
  ME/WB.S = AluOut pass-through
  ME/WB.M = Result from lw
New Representation: Regs more explicit
  [Figure: same datapath with pipeline registers IF/DE, DE/EX, EX/ME, ME/WB]
  What's Missing???
Pipelined Processor (almost) for slides
  Idea: Parallel Piped Control
  [Figure: datapath with a parallel control pipeline: IR, IRex, IRmem, IRwb feeding Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl; Valid and Equal signals]
Pipelined Control
  [Figure: RTL for each instruction class overlaid on the pipelined datapath]
  Fetch:  IR <- Mem[PC]; PC <- PC+4
  Decode: A <- R[rs]; B <- R[rt]
  Exec:   S <- A + B;  S <- A or ZX;  S <- A + SX;  S <- A + SX;  If Cond PC <- PC+SX
  Mem:    M <- Mem[S];  Mem[S] <- B
  WB:     R[rd] <- S;  R[rt] <- S;  R[rt] <- M
Data Stationary Control
  The Main Control generates the control signals during Reg/Dec
  - Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  - Control signals for Mem (MemWr, Branch) are used 2 cycles later
  - Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
  [Figure: stages Reg/Dec, Exec, Mem, Wr; Main Control reads the IF/ID register, and its outputs (ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr) are carried through the ID/Ex, Ex/Mem, and Mem/Wr registers, each stage consuming its own signals]
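A minimal Python sketch of the idea above, for illustration only: the control word is produced once at decode and then travels with the instruction through the pipeline registers. The function names (`main_control`, `trace_control`), the three-instruction table, and the signal values are assumptions made for this sketch, not the course's control table.

```python
def main_control(op):
    """Decode an opcode into per-stage control groups (illustrative values)."""
    table = {
        "add": {"ex": {"ExtOp": 0, "ALUSrc": 0, "RegDst": 1},
                "mem": {"MemWr": 0, "Branch": 0},
                "wr": {"MemtoReg": 0, "RegWr": 1}},
        "lw":  {"ex": {"ExtOp": 1, "ALUSrc": 1, "RegDst": 0},
                "mem": {"MemWr": 0, "Branch": 0},
                "wr": {"MemtoReg": 1, "RegWr": 1}},
        "sw":  {"ex": {"ExtOp": 1, "ALUSrc": 1, "RegDst": 0},
                "mem": {"MemWr": 1, "Branch": 0},
                "wr": {"MemtoReg": 0, "RegWr": 0}},
    }
    return dict(table[op], op=op)


def trace_control(ops):
    """Return (cycle, stage, op, signals) for every use of a control group.

    One instruction is decoded per cycle, starting at cycle 1.
    """
    id_ex = ex_mem = mem_wr = None  # the three control pipeline registers
    uses = []
    for cycle, op in enumerate(list(ops) + [None] * 3, start=1):
        # Each stage consumes the group sitting in its input register.
        if id_ex is not None:
            uses.append((cycle, "Exec", id_ex["op"], id_ex["ex"]))
        if ex_mem is not None:
            uses.append((cycle, "Mem", ex_mem["op"], ex_mem["mem"]))
        if mem_wr is not None:
            uses.append((cycle, "Wr", mem_wr["op"], mem_wr["wr"]))
        # Clock edge: control words shift one register down the pipe.
        mem_wr, ex_mem = ex_mem, id_ex
        id_ex = main_control(op) if op is not None else None
    return uses
```

For a single `lw` decoded in cycle 1, the trace shows its Exec group used in cycle 2, its Mem group in cycle 3, and its Wr group in cycle 4, matching the 1, 2, and 3 cycle delays on the slide.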
Let's Try it Out
  10   lw   r1, 36(r2)
  14   addi r2, r2, 3
  20   sub  r3, r4, r5
  24   beq  r6, r7, 100
  30   ori  r8, r9, 17
  34   add  r10, r11, r12
  100  and  r13, r14, 15
Start: Fetch 10
  IF: lw r1, 36(r2); all later stages empty (n)
Fetch 14, Decode 10
  ID: lw r1, 36(r2) in IR, decode reads rs = 2; IF: addi r2, r2, 3
Fetch 20, Decode 14, Exec 10
  EX: lw r1 (A = r2, imm = 36); ID: addi r2, r2, 3 in IR, decode reads rs = 2; IF: sub r3, r4, r5
Fetch 24, Decode 20, Exec 14, Mem 10
  MEM: lw r1 (S = r2+36); EX: addi r2 (A = r2, imm = 3); ID: sub r3, r4, r5 in IR, decode reads 4 and 5; IF: beq r6, r7, 100
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
  WB: lw r1 (M = M[r2+36]); MEM: addi r2 (S = r2+3); EX: sub r3 (A = r4, B = r5); ID: beq r6, r7, 100 in IR, decode reads 6 and 7; IF: ori r8, r9, 17
  Note Delayed Branch: always execute ori after beq
Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
  Reg file: r1 = M[r2+36]; WB: addi r2 (S = r2+3); MEM: sub r3 (S = r4-r5); EX: beq r6, r7, 100; ID: ori r8, r9, 17 in IR, decode reads 9; IF: and r13, r14, 15
????
  [Figure: pipelined datapath with parallel control pipeline (Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl)]
  Remember: the edge marker on a component means it is triggered on a clock edge.
  What is wrong here?
Double-Clocked Signals
  [Figure: same pipelined datapath]
  Some signals are double clocked!
  In general: inputs to edge-triggered components are their own pipeline regs
  Watch out for stalls and such!
Administrivia
  Proj 2 due Sunday
  HW6 due Tuesday
  Midterm 2: Friday, July 29, 11:00 - 2:00
  - Location TBD
  - If you are really so concerned about the drop deadline that this is a problem for you, talk to me about the possibility of taking the exam on Thursday
Megahertz Myth?
Outline
  Pipeline Control
  Forwarding Control
  Hazard Control
Review: Forwarding
  Fix by forwarding result as soon as we have it to where we need it:
    add $t0,$t1,$t2
    sub $t4,$t0,$t3
    and $t5,$t0,$t6
    or  $t7,$t0,$t8   *
    xor $t9,$t0,$t10
  [Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB; I$, Reg, ALU, D$, Reg) for each instruction]
  * "or" hazard solved by register hardware
Forwarding
  In general:
  - For each stage i that has reg inputs
    - For each stage j after i that has reg output
      - If i.reg == j.reg, forward j's value back to i.
      - Some exceptions ($0, invalid)
  In particular:
  - ALUinput <- (ALUResult, MemResult)
  - MemInput <- (MemResult)
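The general rule can be sketched in Python as a selection function for one ALU operand. This is a hedged illustration: the name `forward_operand` and the dictionary fields `rw`, `reg_write`, and `value` are made up for this sketch and do not come from the slides.

```python
def forward_operand(src_reg, regfile_value, ex_mem, mem_wb):
    """Pick the value for one ALU source register.

    ex_mem and mem_wb describe the instructions one and two stages ahead:
    dicts with 'rw' (destination register number), 'reg_write' (bool) and
    'value' (the result they carry), or None if the stage holds a bubble.
    """
    if src_reg == 0:
        # Exception from the slide: $0 is never forwarded.
        return regfile_value
    # Check the nearest later stage first so the newest value wins.
    for stage in (ex_mem, mem_wb):
        if stage is not None and stage["reg_write"] and stage["rw"] == src_reg:
            return stage["value"]
    # No pending write matches: the register file value is current.
    return regfile_value
```

Checking the closer stage first matters when two in-flight instructions write the same register: the younger write is the one the consumer should see.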
Pending Writes In Pipeline Registers
  [Figure: datapath (IAU, npc, I mem, Regs, alu, D mem) with op, rw, rs, rt carried in the pipeline registers]
Pending Writes In Pipeline Registers
  [Figure: same datapath; the decode stage holds the current operand registers (rs, rt), the later pipeline registers hold pending writes (op, rw)]
  hazard <= ((rs == rw_ex)  & regW_ex) OR
            ((rs == rw_mem) & regW_me) OR
            ((rs == rw_wb)  & regW_wb) OR
            ((rt == rw_ex)  & regW_ex) OR
            ((rt == rw_mem) & regW_me) OR
            ((rt == rw_wb)  & regW_wb)
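The hazard expression translates almost directly into Python. The sketch below is illustrative; the function name and the list-of-pairs encoding of the pending writes are assumptions for the example.

```python
def pending_write_hazard(rs, rt, pending):
    """Return True if rs or rt matches a register with a pending write.

    pending is a list of (rw, regw) pairs, one per later stage (EX, MEM, WB):
    rw is the destination register number, regw says whether that
    instruction actually writes a register.
    Like the slide's expression, this does not special-case register 0.
    """
    return any(regw and rw in (rs, rt) for rw, regw in pending)
```

With the example program, `sub r3, r4, r5` in decode sees no hazard while `addi r2` and `lw r1` are ahead of it, whereas an instruction reading r2 at that point would.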
Forwarding Muxes
  [Figure: datapath with forward muxes in front of the operand latches]
  Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe
  - Increase muxes to add paths from pipeline registers
  - Data Forwarding = Data Bypassing
What about memory operations?
  Tricky situation:
    MIPS: lw 0($t0)
          sw 0($t1)
    RTL:  R1 <- Mem[R2 + I];
          Mem[R3+34] <- R1
  [Figure: op, Rd, Ra, Rb fields in successive pipeline registers; A, B, D, R, T latches; path to reg file]
  Solution: Handle with bypass in memory stage!
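One way to picture the memory-stage bypass is as a mux on the store data. The Python sketch below is only an illustration under assumed names (`store_data`, and a `mem_wb` dictionary with fields `rw`, `reg_write`, `value`); the slides do not specify the exact signals.

```python
def store_data(sw_rt, sw_b_value, mem_wb):
    """Choose the data a sw writes to memory while it is in the Mem stage.

    sw_rt      : register number the store wants to write out
    sw_b_value : value of that register as read back in decode (may be stale)
    mem_wb     : dict for the instruction just ahead, with 'rw', 'reg_write'
                 and 'value' (for a lw, the word it loaded)
    """
    if mem_wb["reg_write"] and mem_wb["rw"] == sw_rt and sw_rt != 0:
        # The preceding instruction (e.g. the lw) produces this register:
        # bypass its result straight into the store data path.
        return mem_wb["value"]
    return sw_b_value
```

Because the load's data is available exactly when the store reaches the memory stage, this case needs no stall.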
Outline
  Pipeline Control
  Forwarding Control
  Hazard Control
Data Hazard: Loads (1/4)
  Forwarding works if value is available (but not written back) before it is needed. But consider:
    lw  $t0,0($t1)
    sub $t3,$t0,$t2
  [Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for lw and sub]
  Need result before it is calculated!
  Must stall use (sub) 1 cycle and then forward.
Data Hazard: Loads (2/4)
  Hardware must stall pipeline
  Called "interlock"
    lw  $t0, 0($t1)
    sub $t3,$t0,$t2
    and $t5,$t0,$t4
    or  $t7,$t0,$t6
  [Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB); sub, and, and or each pick up a one-cycle bubble behind the lw]
Data Hazard: Loads (3/4)
  Instruction slot after a load is called "load delay slot"
  If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle.
  If the compiler puts an unrelated instruction in that slot, then no stall
  Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)
Data Hazard: Loads (4/4)
  Stall is equivalent to nop
    lw  $t0, 0($t1)
    nop
    sub $t3,$t0,$t2
    and $t5,$t0,$t4
    or  $t7,$t0,$t6
  [Figure: pipeline diagram; the nop flows through every stage as a bubble]
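To make the cost concrete, here is a small Python sketch that counts load-use bubbles and total cycles for a straight-line program on a 5-stage pipeline with full forwarding. The tuple encoding `(op, dest, sources)` and both function names are assumptions for this example; branches and cache misses are ignored.

```python
def load_use_stalls(program):
    """Count one-cycle bubbles caused by using a load result right away.

    program is a list of (op, dest, sources) tuples in program order.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        # Only an instruction directly after the lw that reads its
        # destination needs the interlock; everything else forwards.
        if op == "lw" and dest != "$0" and dest in cur[2]:
            stalls += 1
    return stalls


def total_cycles(program, stages=5):
    """Cycles to drain the program: fill time plus one per instruction plus bubbles."""
    if not program:
        return 0
    return len(program) + (stages - 1) + load_use_stalls(program)


slide_program = [
    ("lw",  "$t0", ["$t1"]),
    ("sub", "$t3", ["$t0", "$t2"]),
    ("and", "$t5", ["$t0", "$t4"]),
    ("or",  "$t7", ["$t0", "$t6"]),
]
```

For the four-instruction sequence on the slide this gives one bubble and 9 cycles in total; moving an unrelated instruction into the load delay slot would bring it back to 8 for that same instruction count.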
Hazards / Stalling
  In general:
  - For each stage i that has reg inputs
    - If i's reg is being written later on in the pipe but is not ready yet
      - Stages 0 to i: Stall (turn CEs off so no change)
      - Stage i+1: Make a bubble (do nothing)
      - Stages i+2 onward: As usual
  In particular: ALUinput <- (MemResult)
Hazards / Stalling
  Alternative Approach:
  - Detect non-forwarding hazards in decode
    - Possible since our hazards are formal.
      - Not always the case.
  - Stalling then becomes:
    - Issue nop to EX stage
    - Turn off nextpc update (refetch same inst)
    - Turn off InstReg update (re-decode same inst)
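The three stall actions can be sketched as a tiny issue loop in Python. This is an illustration under assumptions: instructions are `(op, dest, sources)` tuples, the only non-forwarding hazard modeled is load-use, and the name `issue_trace` is made up for the example.

```python
def issue_trace(program):
    """Return the op sent to the EX stage on each cycle.

    On a load-use hazard detected in decode, a 'nop' is issued and the
    index pc is left unchanged, which models holding both the next-PC
    and the instruction register so the same instruction is decoded again.
    """
    trace = []
    pc = 0          # index of the instruction currently in decode
    in_ex = None    # instruction issued to EX on the previous cycle
    while pc < len(program):
        inst = program[pc]
        if in_ex is not None and in_ex[0] == "lw" and in_ex[1] in inst[2]:
            trace.append("nop")  # issue a bubble to EX
            in_ex = None         # the bubble writes nothing
            # pc is not advanced: refetch and re-decode the same instruction
        else:
            trace.append(inst[0])
            in_ex = inst
            pc += 1
    return trace
```

Note that the loop keeps no memory of having stalled: after the bubble, the load is no longer in EX, so the same check simply passes on the next cycle.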
Stall Logic
  [Figure: forwarding datapath with stall logic added]
  1. Detect nonresolving hazards.
  2a. Insert Bubble
  2b. Stall nextpc, IF/DE
Stall Logic
  Stall-on-issue is used quite a bit
  - More complex processors: many cases that stall on issue.
  - More complex processors: cases that can't be detected at decode
    - E.g. value needed from mem is not in cache: proc must stall multiple cycles
By the way
  Notice that our forwarding and stall logic is stateless!
  Big Idea: Keep it simple!
  - Option 1: Store old fetched inst in reg ("stall_temp"), keep state reg that says whether to use stall_temp or value coming off inst mem.
  - Option 2: Re-fetch old value by turning off PC update.