Review: Computer Organization
Pipelining
Chansu Yu

Laundry Example
- Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 30 minutes
- Folder takes 30 minutes
- Stasher takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: timeline from 6 PM onward; loads A, B, C, D run one after another, each using four 30-minute stages]
- Sequential laundry takes 8 hours for 4 loads
- If they learned pipelining, how long would laundry take?

Faster Laundry - Pipelining
[Figure: timeline from 6 PM onward; loads A, B, C, D overlap, each starting 30 minutes after the previous one]
- Faster laundry takes 3.5 hours for 4 loads!
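
The two totals above can be checked with a short calculation. This is a minimal sketch; the function names are illustrative, and the pipelined formula assumes every load is identical and a new load can start as soon as the slowest stage is free.

```python
def sequential_minutes(loads, stage_minutes):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(loads, stage_minutes):
    """The first load passes through every stage; each later load
    completes one slowest-stage time after the previous one."""
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


# Washer, dryer, folder, stasher: 30 minutes each (from the slide).
stages = [30, 30, 30, 30]
print(sequential_minutes(4, stages) / 60)  # hours for 4 loads, sequential
print(pipelined_minutes(4, stages) / 60)   # hours for 4 loads, pipelined
```

With four 30-minute stages and four loads, the first call gives 480 minutes (8 hours) and the second gives 210 minutes (3.5 hours), matching the slides.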

5 Stages of MIPS

Step 1. Instruction fetch (all instructions):
    IR = Memory[PC]
    PC = PC + 4
Step 2. Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3. Execution, address computation, branch/jump completion:
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step 4. Memory access or R-type completion:
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut]
    Store: Memory[ALUOut] = B
Step 5. Memory read completion:
    Load: Reg[IR[20-16]] = MDR

Single cycle vs. Multicycle
[Figure: timing of add (instruction fetch, reg. read, ALU operation, reg. write) followed by load (instruction fetch, reg. read, ALU operation, memory read, reg. write), single-cycle versus multicycle, over clock cycles]
- What are the advantages of multicycle implementation?
- What are the disadvantages of multicycle implementation?

Multicycle vs. Pipelined
[Figure: timing of add (instruction fetch, reg. read, ALU operation, reg. write) followed by load (instruction fetch, reg. read, ALU operation, memory read, reg. write), multicycle versus pipelined, over clock cycles]
- What are the advantages of pipelined implementation?
- What are the disadvantages of pipelined implementation?

Lessons from Pipelined Laundry
[Figure: pipelined laundry timeline from 6 PM onward for loads A, B, C, D]
- Pipelining doesn't help latency of a single task, it helps throughput of the entire workload
- Potential speedup = number of pipe stages
- Pipeline rate is limited by the slowest pipeline stage
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup
- Multiple tasks operate simultaneously using different resources
- Any dependencies, any conflicts???
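
The lessons about speedup can be made concrete with a small calculation. This is a sketch under simplifying assumptions (identical tasks, no hazards); the function name `speedup` is illustrative and not taken from the slides.

```python
def speedup(tasks, stage_times):
    """Ratio of sequential time to pipelined time for identical tasks."""
    sequential = tasks * sum(stage_times)
    # Fill the pipeline once, then finish one task per slowest-stage time.
    pipelined = sum(stage_times) + (tasks - 1) * max(stage_times)
    return sequential / pipelined


balanced = [30, 30, 30, 30]
unbalanced = [30, 60, 30, 30]   # hypothetical: one stage twice as slow

print(speedup(4, balanced))        # fill/drain time keeps this well below 4
print(speedup(1000, balanced))     # approaches the number of stages (4)
print(speedup(1000, unbalanced))   # limited by the slowest stage
```

For many tasks the balanced pipeline approaches a speedup of 4 (the number of stages), while the unbalanced one approaches sum/max = 150/60 = 2.5.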

Can pipelining get us into trouble?
- If any two stages use the same resource, there must be a conflict.
[Figure: five instructions overlapped in the pipeline, instructions versus time step (clock cycle)]

Hazards
- Hazard = when an instruction's stage is unable to execute during the current cycle.
- Can always resolve hazards by waiting
  - pipeline control must detect the hazard
  - take action (or delay action) to resolve hazards
[Figure: instruction #2, stage 3 unable to continue; a stall is inserted; instructions versus time step (clock cycle)]

Structural Hazards
- A needed functional unit is busy executing a previous instruction (attempt to use the same resource two different ways at the same time)
- Example: Our sample MIPS pipeline has none. What if the PC + 4 computation used the main ALU instead of a separate adder?
[Figure: pipeline with stalls inserted, instructions versus time step (clock cycle)]

Control Hazards
- While executing a previous branch, the next instruction address might not yet be known (attempt to make a decision before the condition is evaluated)
[Figure: conditional branch followed by branch target with two stalls; annotations: "Calculates PC + 4", "Computes branch target address", "Performs branch test & sets PC to target"; instructions versus time step (clock cycle)]

Data Hazards
- Needed data still being computed by previous instruction (attempt to use item before it is ready)
[Figure: pipeline timing over clock cycles for add $s3,$s,$s2; sw $s,($s3); lw $s5,($s3); add $s7,$s5,$s6, with one stall before sw, one before lw, and two before the final add]

Pipelined Approach
- Cycle time, No. stages
- Resource conflict
[Figure: loads A, B, C, D and five overlapped instructions over clock cycles]

Resource Conflicts (revisit)

Step 1. Instruction fetch (all instructions):
    IR = Memory[PC]
    PC = PC + 4
Step 2. Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3. Execution, address computation, branch/jump completion:
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step 4. Memory access or R-type completion:
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut]
    Store: Memory[ALUOut] = B
Step 5. Memory read completion:
    Load: Reg[IR[20-16]] = MDR

Conflicts marked on the table:
- ALU conflict
- Memory conflict
- Register file conflict (read or write)

Basic Pipeline
- IF: Instruction fetch
- ID: Instruction decode/register file read
- EX: Execute/address calculation
- MEM: Memory access
- WB: Write back
[Figure: single-cycle datapath divided into the five stages: PC, instruction memory, register file, sign extend, shift left 2, adders, ALU, data memory]
- Instructions and data move generally from left to right through the five stages as they complete execution, except in two cases:
  - WB stage
  - PC selection

Basic Pipeline

Step 1. Instruction fetch (all instructions):
    IR = Memory[PC]
    PC = PC + 4
Step 2. Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3. Execution, address computation, branch/jump completion:
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step 4. Memory access or R-type completion:
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut]
    Store: Memory[ALUOut] = B
Step 5. Memory read completion:
    Load: Reg[IR[20-16]] = MDR

- Why move?? ZF is available during the EX stage, anyway.
- Why do we still need 2 ALUs at the EX stage? (one for A - B and the other for PC + IR)

Pipelined Datapath
[Figure: datapath with IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers between the stages]
- Add pipeline registers to the basic pipeline in order to actually split the datapath into stages.
- The info. must be placed in a pipeline register; otherwise, it is lost when the next instruction enters that pipeline stage.
- For a store instruction: (?) => ID/EX pipeline register => EX/MEM pipeline register => (?)

Content of Pipeline Registers
- Which data should be passed through the stages? I.e., what are the contents of the pipeline registers?
- In IF/ID pipeline register: PC (32), Inst. (32)
- In ID/EX pipeline register: PC (32), Reg. 1 (32), Reg. 2 (32), Offset (32), Reg. no. 2 and 3 (10)
- In EX/MEM pipeline register: PC (32), ZF (1), ALUOut (32), Reg. 2 (32), Reg. no. (5)
- In MEM/WB pipeline register: Memory (32), ALUOut (32), Reg. no. (5)

Example
Five instructions go through the MIPS pipeline:
    lw  $10, 20($1)     (8c2a 0014)
    sub $11, $2, $3     (0043 5822)
    and $12, $4, $5     (0085 6024)
    or  $13, $6, $7     (00c7 6825)
    add $14, $8, $9     (0109 7020)
[Table: initial register contents ($pc and the general registers up to $9 = 9) and memory contents]
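
The machine codes in the example can be derived from the standard MIPS R-type and I-type field layouts. The sketch below assumes the five instructions are lw $10, 20($1); sub $11, $2, $3; and $12, $4, $5; or $13, $6, $7; add $14, $8, $9 (the usual textbook sequence); the helper names `encode_r` and `encode_i` are illustrative.

```python
def encode_r(rs, rt, rd, funct, shamt=0):
    """R-type: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


LW_OPCODE = 0x23
FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25}

program = [
    ("lw  $10, 20($1)", encode_i(LW_OPCODE, rs=1, rt=10, imm=20)),
    ("sub $11, $2, $3", encode_r(rs=2, rt=3, rd=11, funct=FUNCT["sub"])),
    ("and $12, $4, $5", encode_r(rs=4, rt=5, rd=12, funct=FUNCT["and"])),
    ("or  $13, $6, $7", encode_r(rs=6, rt=7, rd=13, funct=FUNCT["or"])),
    ("add $14, $8, $9", encode_r(rs=8, rt=9, rd=14, funct=FUNCT["add"])),
]

for text, word in program:
    print(f"{text:18s} {word:08x}")
```

The same bit fields (IR[25-21], IR[20-16], IR[15-11], IR[15-0]) are what the ID stage extracts and passes along in the pipeline registers.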

Five instructions go through the MIPS pipeline:
    lw  $10, 20($1)     (8c2a 0014)
    sub $11, $2, $3     (0043 5822)
    and $12, $4, $5     (0085 6024)
    or  $13, $6, $7     (00c7 6825)
    add $14, $8, $9     (0109 7020)
[Table: initial register contents ($pc and the general registers up to $9 = 9) and memory contents]
[Figure: pipelined datapath snapshot with add in IF, or in ID, and in EX, sub in MEM, lw in WB; signal points labelled (a) through (z) are to be filled in]
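
9 Control Signals
- 4 multiplexor selectors
- 3 write signals
- 2 ALU signals

    Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch (PCSrc)  ALUOp1  ALUOp0
    R-format     1       0       0         1         0        0         0               1       0
    lw           0       1       1         1         1        0         0               0       0
    sw           X       1       X         0         0        1         0               0       0
    beq          X       0       X         0         0        0         1               0       1

- Q1: In which stage is the control circuit?
- Q2: One stage executes "and" while another stage executes "lw". Is MemtoReg 0 or 1?

Pipeline Control
- Generate control signals all at once at the ID stage
- And pass them through the stages just like the data

                 Execution/Address Calculation       Memory access                Write-back
                 stage control lines                 stage control lines          stage control lines
    Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc      Branch  MemRead  MemWrite    RegWrite  MemtoReg
    R-format     1       1       0       0           0       0        0           1         0
    lw           0       0       0       1           0       1        0           1         1
    sw           X       0       0       1           0       0        1           0         X
    beq          X       0       1       0           1       0        0           0         X

[Figure: the control unit decodes the instruction and writes the EX, MEM and WB control fields into ID/EX; the MEM and WB fields move on to EX/MEM, and the WB field to MEM/WB]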
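
One way to picture "generate once, pass along" is to group the nine signals by the stage that consumes them. The sketch below uses the standard textbook control values (an assumption, since the slide's digits did not survive extraction); `None` stands for a don't-care (X), and the function name is illustrative.

```python
X = None  # don't care

# Control values grouped by the stage that uses them.
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":       {"EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":       {"EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
                 "WB": {"RegWrite": 0, "MemtoReg": X}},
    "beq":      {"EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
                 "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 0, "MemtoReg": X}},
}


def pipeline_register_controls(instr_class):
    """Control bits still carried by each pipeline register for one instruction."""
    c = CONTROL[instr_class]
    return {
        "ID/EX": {**c["EX"], **c["MEM"], **c["WB"]},   # all 9 signals
        "EX/MEM": {**c["MEM"], **c["WB"]},             # EX signals already used
        "MEM/WB": dict(c["WB"]),                       # only write-back signals
    }


for reg, bits in pipeline_register_controls("lw").items():
    print(reg, bits)
```

Because every instruction carries its own copy of the signals, two instructions in different stages can see different values of the same signal (for example MemtoReg) in the same cycle.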

Datapath with Control
[Figure: pipelined datapath with control: the control unit feeds the WB, M and EX fields of the ID/EX, EX/MEM and MEM/WB pipeline registers; signals shown are PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, MemtoReg; Instruction [15-0] goes to sign extend and ALU control, Instruction [20-16] and Instruction [15-11] go to the RegDst multiplexor]

Graphically Representing Pipelines
[Figure: time (in clock cycles CC 1 to CC 6) versus program execution order (in instructions); lw $10, 20($1) and sub $11, $2, $3 each pass through IM, Reg, ALU, DM, Reg, offset by one cycle]
- Can help with answering questions like:
  - how many cycles does it take to execute this code?
  - what is the ALU doing during cycle 4?
- Use this representation to help understand datapaths
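
The two questions above can be answered mechanically for an ideal pipeline. This is a minimal sketch assuming no stalls and one instruction issued per cycle; the names `total_cycles` and `stage_at` are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(num_instructions):
    """Cycles to run n instructions through an ideal 5-stage pipeline."""
    return num_instructions + len(STAGES) - 1


def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle` (1-based),
    or None if the instruction is not in the pipeline during that cycle."""
    position = cycle - 1 - instr_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


program = ["lw $10, 20($1)", "sub $11, $2, $3"]
print(total_cycles(len(program)))
for cycle in range(1, total_cycles(len(program)) + 1):
    busy = {stage_at(i, cycle): ins for i, ins in enumerate(program)
            if stage_at(i, cycle) is not None}
    print(f"CC {cycle}: {busy}")
```

For this two-instruction program the run takes 6 cycles, and in cycle 4 the ALU (EX stage) is working on the sub while the lw is in MEM.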

Data Hazards
- Needed data still being computed by previous instruction
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Assume $1 = 10, $2 = 10, $3 = 30

Data Hazards: Dependencies
- Problem with starting the next instruction before the first is finished
  - dependencies that go backward in time are data hazards
[Figure: time (in clock cycles CC 1 to CC 9) versus program execution order for sub, and, or, add, sw, each passing through IM, Reg, ALU, DM, Reg]

    Clock cycle:           CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
    Value of register $2:  10    10    10    10    10/-20  -20   -20   -20   -20

- and has a problem
- or has a problem
- add ???
- sw is OK
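
Which dependencies "go backward in time" can be found by checking how far each consumer is from its producer. The sketch below assumes no forwarding; the `split_cycle_regfile` flag is an assumption that models a register file written in the first half of a cycle and read in the second half (which is what decides the "add ???" case). It ignores intervening redefinitions of the same register, and all names are illustrative.

```python
def raw_hazards(program, split_cycle_regfile=True):
    """Return (producer, consumer, register) triples that are data hazards
    in a 5-stage pipeline without forwarding.

    program: list of (text, dest_register_or_None, source_registers).
    """
    # The producer writes in WB, 3 cycles after the next instruction's ID.
    window = 2 if split_cycle_regfile else 3
    hazards = []
    for i, (producer, dest, _) in enumerate(program):
        if dest is None or dest == 0:      # stores/branches, or $0
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer, _, sources = program[j]
            if dest in sources:
                hazards.append((producer, consumer, dest))
    return hazards


program = [
    ("sub $2, $1, $3",   2,    (1, 3)),
    ("and $12, $2, $5",  12,   (2, 5)),
    ("or $13, $6, $2",   13,   (6, 2)),
    ("add $14, $2, $2",  14,   (2, 2)),
    ("sw $15, 100($2)",  None, (15, 2)),
]

for hazard in raw_hazards(program):
    print(hazard)
```

With the split-cycle register file only and/or depend on a value that is not yet written; without it, add would also read the stale value.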

Data Hazards: Forwarding
- While the result is not written back until WB:
[Figure: sub $2,$1,$3 passes through all five stages; and $12,$2,$5 waits for two stall cycles]
- It is calculated earlier, in EX:
[Figure: sub $2,$1,$3 followed immediately by and $12,$2,$5 with no stall]
  - Actually available after the EX stage (not WB)
  - Actually needed at the EX stage (not ID)
- Add forwarding hardware to allow, e.g., EX's output (located in the EX/MEM pipeline register) to be EX's input.

Forwarding: All 2 Cases
[Figure: time (in clock cycles CC 1 to CC 9) versus program execution order for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2), with forwarding paths from the pipeline registers]

    Clock cycle:           CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
    Value of register $2:  10    10    10    10    10/-20  -20   -20   -20   -20
    Value of EX/MEM:       X     X     X     -20   X       X     X     X     X
    Value of MEM/WB:       X     X     X     X     -20     X     X     X     X

- and has a problem -> fixed
- or has a problem -> fixed
- add ??? -> OK
- sw is OK

Data Hazards (again)
- Needed data still being computed by previous instruction
    sub $, $3, $2
    and $2, $, $
    or  $3, $6, $
    add $, $8, $9
    sw  $5, ($2)
[Figure: pipelined datapath snapshot with sub in IF; signal points labelled (a) through (z)]
[Figure: pipelined datapath snapshot with and in IF and sub in ID; annotations give Rs, $Rs and Rd for sub]
[Figure: pipelined datapath snapshot with or in IF, and in ID, sub in EX; the Rs, $Rs and ALU result annotations are marked "???", with Rd shown for and and sub]
[Figure: the same snapshot (or in IF, and in ID, sub in EX) with the Rs, $Rs and ALU result annotations filled in]
[Figure: pipelined datapath snapshot with add in IF, or in ID, and in EX, sub in MEM; the ALU inputs and result are marked "???", with Rd shown for and and sub]
[Figure: the same snapshot (add in IF, or in ID, and in EX, sub in MEM) with the annotations filled in]
[Figure: pipelined datapath snapshot with sw in IF, add in ID, or in EX, and in MEM, sub in WB; the ALU input and write-back data are marked "???", with Rd shown for sub]

Forwarding: Implementation
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers]
- Additional path for forwarding?
- How to control the forwarding path?

Forwarding: Forwarding Unit
[Figure: pipelined datapath with a forwarding unit; inputs ID/EX.RegisterRs (Rs), ID/EX.RegisterRt (Rt), EX/MEM.RegisterRd and MEM/WB.RegisterRd; IF/ID.RegisterRs, IF/ID.RegisterRt and IF/ID.RegisterRd are carried into ID/EX]
- Forwarding unit: 6-input, 2-output combinational circuit
- HW#, (5)

Forwarding Control
Control logic:
    ForwardA = 10 if (EX/MEM.Rd = ID/EX.Rs)    <- get operand from EX/MEM
               01 if (MEM/WB.Rd = ID/EX.Rs)    <- get operand from MEM/WB
               00 otherwise                    <- get operand from ID/EX
    ForwardB = 10 if (EX/MEM.Rd = ID/EX.Rt)    <- get operand from EX/MEM
               01 if (MEM/WB.Rd = ID/EX.Rt)    <- get operand from MEM/WB
               00 otherwise                    <- get operand from ID/EX

Forwarding Control Circuit
    ForwardA = 10 if ((EX/MEM.Rd = ID/EX.Rs) && EX/MEM.RegWrite && (EX/MEM.Rd != 0))
               01 if ((MEM/WB.Rd = ID/EX.Rs) && MEM/WB.RegWrite && (MEM/WB.Rd != 0) && (EX/MEM.Rd != ID/EX.Rs))
               00 otherwise
    ForwardB = 10 if ((EX/MEM.Rd = ID/EX.Rt) && EX/MEM.RegWrite && (EX/MEM.Rd != 0))
               01 if ((MEM/WB.Rd = ID/EX.Rt) && MEM/WB.RegWrite && (MEM/WB.Rd != 0) && (EX/MEM.Rd != ID/EX.Rt))
               00 otherwise

Data Hazards: All Considered???
- Forwarding doesn't eliminate all hazards:
[Figure: lw $s5,($s) followed by add $s7,$s5,$s6, which still needs one stall]
- especially when we remember that memory access is really often much longer than a single cycle:
[Figure: lw $s5,($s) followed by add $s7,$s5,$s6, which needs three stalls]
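
The forwarding control conditions (the ForwardA/ForwardB selection with the RegWrite and Rd != 0 checks) can be sketched as a small combinational function. This is an illustrative model, not the slide's circuit: checking EX/MEM first gives the more recent result priority, which plays the role of the extra "EX/MEM.Rd != ID/EX.Rs" term in the MEM/WB case. Function and parameter names are assumptions.

```python
def forward_select(src, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Mux select for one ALU operand: 0b10 = EX/MEM, 0b01 = MEM/WB, 0b00 = ID/EX."""
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return 0b10   # most recent result, still in EX/MEM
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return 0b01   # older result, now in MEM/WB
    return 0b00       # no hazard: use the value read from the register file


def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_rd, ex_mem_regwrite,
                    mem_wb_rd, mem_wb_regwrite):
    """6 inputs, 2 outputs: returns (ForwardA, ForwardB)."""
    a = forward_select(id_ex_rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite)
    b = forward_select(id_ex_rt, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite)
    return a, b


# "and $12, $2, $5" in EX while "sub $2, $1, $3" is in MEM.
print(forwarding_unit(2, 5, ex_mem_rd=2, ex_mem_regwrite=True,
                      mem_wb_rd=0, mem_wb_regwrite=False))
# "or $13, $6, $2" in EX while "and" is in MEM and "sub" is in WB.
print(forwarding_unit(6, 2, ex_mem_rd=12, ex_mem_regwrite=True,
                      mem_wb_rd=2, mem_wb_regwrite=True))
```

The first call selects the EX/MEM value for operand A; the second selects the MEM/WB value for operand B.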

Data Hazards: Stalling
- Stall the pipeline by keeping an instruction in the same stage
[Figure: time (in clock cycles CC 1 to CC 10) versus program execution order for
    lw  $2, 20($1)
    and $4, $2, $5
    or  $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7
 with a bubble inserted after lw; the lw-and and lw-or dependencies are marked]
- At CC5, the MEM stage is empty!!!

Data Hazards: Stalling
Stalling detection and control
- Detected during the ID stage, when the lw instruction is in the EX stage
- The following two instructions are in the ID ("and") and IF ("or") stages, respectively
- If detected,
  - Stall the following instruction (in the ID stage, "and") so that it repeats the ID stage again => the IF/ID pipeline register should not be changed
  - Stall the second instruction (in the IF stage, "or") so that it repeats the IF stage again => the PC should not be changed

Data Hazards: Stalling
lw hazard detection:
    if (ID/EX.MemRead and ((ID/EX.Rt = IF/ID.Rs) or (ID/EX.Rt = IF/ID.Rt)))
        stall the pipeline
Control signals generated from the hazard detection unit:
- IF/IDWrite, to prevent the IF/ID register from changing
- PCWrite, to prevent the PC from changing
- MUX control, to delay forwarding the control signals (pass null signals)

Stalling: Detection Unit
- Stall by letting an instruction that won't write anything go forward
[Figure: pipelined datapath with a hazard detection unit (inputs ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt) and the forwarding unit (inputs Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd)]
- Hazard detection unit: 4-input, 3-output combinational circuit
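
The load-use check and its three outputs can be sketched as follows. This is a minimal model under stated assumptions: the output names PCWrite and IF/IDWrite follow the usual textbook naming, while `ControlMuxZero` is a hypothetical name for the mux select that replaces the control signals with zeros (the bubble).

```python
def hazard_detection_unit(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """4 inputs, 3 outputs: decide whether the instruction in ID must stall
    because the instruction in EX is a load whose result it needs."""
    stall = bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))
    return {
        "PCWrite": not stall,        # hold the PC so the IF stage repeats
        "IF/IDWrite": not stall,     # hold IF/ID so the ID stage repeats
        "ControlMuxZero": stall,     # hypothetical name: send null controls to ID/EX
    }


# "lw $2, 20($1)" in EX (Rt = 2) and "and $4, $2, $5" in ID (Rs = 2, Rt = 5).
print(hazard_detection_unit(True, 2, 2, 5))
# An R-type instruction in EX never triggers the load-use stall.
print(hazard_detection_unit(False, 2, 2, 5))
```

After the one-cycle stall, the loaded value is in MEM/WB when the dependent instruction reaches EX, so ordinary forwarding supplies it.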

Stalling: What happens in the pipeline?
[Figure: pipeline timing over CC1 to CC12 for lw $s5,($s) followed by add $s7,$s5,$s6 with stalls; annotations mark the stages that are repeated at CC7 (register and PC writes disabled) and the stages that do not execute at CC7 through CC9 (zero control signals)]

Branch (Control) Hazards
- While executing a previous branch, the next instruction address might not yet be known.
[Figure: conditional branch followed by branch target with two stalls; annotations: "Calculates PC + 4", "Computes branch target address", "Performs branch test & sets PC to target"; instructions versus time step (clock cycle)]

Branch (Control) Hazards
[Figure]

Branch Hazards
- We can stall the pipeline for every branch instruction
  - Too slow (3 instructions)
- Or, continue execution down the sequential instruction stream, assuming that the branch will not be taken ("predict branch not taken")
  - If the condition is not met, OK! (prediction is successful)
  - If the condition is met (prediction is wrong):
    - Some unwanted instructions are in the pipeline!
    - Need to flush instructions
- How do you compare the above two?
  - If branches are taken half the time, and if it costs little to discard the instructions, the second approach halves the cost of control hazards
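
The comparison can be written as an expected cost per branch. This is a sketch assuming a fixed penalty of 3 cycles (the "3 instructions" above) and a free discard of wrongly fetched instructions; the function names are illustrative.

```python
def stall_always_penalty(penalty_cycles=3):
    """Every branch waits until its outcome is known."""
    return penalty_cycles


def predict_not_taken_penalty(taken_fraction, penalty_cycles=3):
    """Only taken branches pay: the wrongly fetched instructions are flushed."""
    return taken_fraction * penalty_cycles


print(stall_always_penalty())            # cycles lost per branch
print(predict_not_taken_penalty(0.5))    # cycles lost per branch, on average
```

With branches taken half the time, the average cost drops from 3 cycles to 1.5 cycles per branch, which is the "halves the cost" claim.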

Stalling: What happens in the pipeline?
- A new control signal, IF.Flush, is introduced to flush the instruction in the IF stage
- It zeros the instruction field of the IF/ID pipeline register, which in fact can be decoded as sll $0, $0, 0
- In fact, nop = sll $0, $0, 0
[Figure: timing over CC1 to CC7 for beq $,$2, 7; add $3,$,$5; and the target of beq; "IF.Flush at CC3 will do."; the flushed instruction proceeds as a null instruction (sll $0, $0, 0) through the ID, EX, MEM and WB stages at CC3, CC4, CC5 and CC6]

Branch Hazards: Branch Delay Slots
- While determining the next instruction address, go ahead and execute the sequentially following instruction(s).
[Figure: conditional branch, branch delay instruction, branch target; annotations: "Computes branch target address", "Performs branch test & sets PC to target", "Fetches correct target"; instructions versus time step (clock cycle)]

Branch Hazards: Branch Delay Slots
- Advantage:
  - Can avoid one stall per delay slot.
- Disadvantages:
  - Makes assembly-language programming more difficult.
  - Can be difficult to find appropriate code for the slot.
  - Exposes an implementation detail that could change.
  - Later implementations without a stall must still emulate the slot.
- Most modern processors avoid them.

Branch Hazards: Branch Prediction
- Guess which instruction is next, & start executing it.
- What if the guess is wrong?: Flush the pipeline
- Simplest guesses: Always Taken or Never Taken.
- When to do prediction?
  - Static prediction: compiler
  - Dynamic prediction: processor

Dynamic Branch Prediction
- Branch prediction buffer (branch history table)
  - A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
[Figure: instruction address indexes the BPB alongside the instruction memory; output is the prediction (T or NT), available in the IF stage]

Dynamic Branch Prediction
- 1-bit predictor
[Figure: two-state diagram, "Predict taken" and "Predict not taken"; T (taken) moves to or stays in "Predict taken", NT (not taken) moves to or stays in "Predict not taken"]
- Prediction accuracy
[Figure: a loop closed by a beq]
  - loop 10 times => 1st: ?, 2nd: correct, 3rd: correct, ..., 9th: correct, 10th: incorrect
  - => 80% accuracy (because the first one is incorrect in the second execution of the same code)
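
The 80% figure can be reproduced with a tiny simulation of a 1-bit predictor. This is a sketch, assuming the loop branch is taken 9 times and then falls through on the 10th iteration; the function name and the outcome encoding (True = taken) are illustrative.

```python
def run_one_bit_predictor(outcomes, prediction):
    """Simulate a 1-bit predictor on a list of branch outcomes (True = taken).
    Returns (number of correct predictions, final predictor state)."""
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken   # 1 bit: remember only the most recent outcome
    return correct, prediction


loop_branch = [True] * 9 + [False]   # 10 iterations of the loop

# First execution, predictor initially "taken": only the loop exit is missed.
correct, state = run_one_bit_predictor(loop_branch, prediction=True)
print(correct / len(loop_branch))

# Second execution starts from the state left by the first (not taken),
# so both the first and the last iteration are mispredicted.
correct, state = run_one_bit_predictor(loop_branch, prediction=state)
print(correct / len(loop_branch))
```

The outcome of the very first prediction depends on the initial state (the "?" above); from the second execution of the loop onward the predictor is wrong twice per 10 branches.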

Exceptions
- Another form of control hazard involves exceptions.
- When an arithmetic overflow occurs during executing add $1, $2, $1
  - Transfer control to the exception routine (0x40000040)
  - This is the same as executing a branch instruction
- Necessary actions are:
  - Stop executing the current instruction and start the exception routine.
  - Following instructions already in the pipe must be wiped out (flush pipeline registers).
  - Return to the offending instruction.

Flush Control Signals
- Similar to the taken branch, we need to flush pipeline registers.
- The question is which pipeline register(s)?
  - Arithmetic overflow is detected at the end of the EX stage.
  - And thus flushing takes place at the MEM stage (at the next cycle).
  - Since three following instructions are already in the pipeline (IF, ID and EX stages), we need to flush those three instructions.
  - Therefore, we need ID.Flush and EX.Flush in addition to the IF.Flush control signal.

[Figure: pipelined datapath with exception support: IF.Flush for the instruction in the IF stage, ID.Flush for the instruction in the ID stage, EX.Flush for the instruction in the EX stage; Cause and Except PC registers; overflow (OF) output from the ALU; hazard detection unit and forwarding unit]

Challenges
- What if more than one instruction generates exceptions?
  - While add causes an overflow exception at CC5 in EX, another instruction causes an invalid opcode exception at CC5 in ID
  - It is not OK to generate all flushing signals.
- And, how does the exception service routine correctly identify the instruction that causes the exception?
  - => Imprecise exception

Precise and Imprecise Exceptions
- Precise exceptions
  - Hardware (CPU) correctly identifies the offending instruction.
  - And makes sure all prior instructions complete.
  - All instructions following it are not allowed to complete their execution and have not modified the process state.
- Imprecise exceptions
  - Hardware does not guarantee it and leaves it up to the operating system to determine which instruction caused the problem.
  - Some instructions following the offending instruction are allowed to complete their execution and modify the process state.
- Most modern CPUs support precise exceptions.