Chapter 6: Enhancing Performance with Pipelining

Cassandra Briggs
5 years ago

Pipelining

Think of using machines in laundry services. Assume 30 min. for each task (wash, dry, fold, store) and that separate tasks use separate hardware and so can be overlapped.

[Figure: tasks A, B, C plotted against time starting at 6 PM, first not pipelined, then pipelined]

Pipelined vs. Single-Cycle Instruction Execution: the Plan

Program execution order (in instructions):
    lw $1, 100($0)
    lw $2, 200($0)
    lw $3, 300($0)

[Figure: timing of the three loads. Single-cycle: each instruction occupies a full 8 ns cycle. Pipelined: a new instruction starts every 2 ns.]

Assume 2 ns for memory access and ALU operation; 1 ns for register access. Therefore the single-cycle clock is 8 ns and the pipelined clock cycle is 2 ns.

Pipelining: Keep in Mind
- Pipelining does not reduce the latency of a single task; it increases the throughput of the entire workload.
- Pipeline rate is limited by the longest stage.
- Potential speedup = number of pipe stages.
- Unbalanced lengths of pipe stages reduce speedup.
- Time to fill the pipeline and time to drain it, when there is slack in the pipeline, reduce speedup.
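The timing argument above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are made up for this example, and it assumes the 8 ns single-cycle clock and 2 ns pipelined clock quoted on the slide, a five-stage pipeline, and no stalls.

```python
def single_cycle_time(n_instr, cycle_ns=8):
    """Total time when every instruction takes one long clock cycle."""
    return n_instr * cycle_ns


def pipelined_time(n_instr, stages=5, cycle_ns=2):
    """Total time in an ideal pipeline (assumed: no stalls).

    The first instruction needs `stages` cycles to fill the pipeline;
    every later instruction completes one cycle after its predecessor.
    """
    return (stages + n_instr - 1) * cycle_ns


def speedup(n_instr):
    """Ratio of single-cycle time to pipelined time for n_instr instructions."""
    return single_cycle_time(n_instr) / pipelined_time(n_instr)


if __name__ == "__main__":
    for n in (3, 1000, 1_000_000):
        print(n, single_cycle_time(n), pipelined_time(n), round(speedup(n), 3))
```

For the three loads the totals are 24 ns versus 14 ns, well short of a 4x speedup because fill time dominates. As the instruction count grows, the ratio approaches 8/2 = 4 rather than 5, because the stages are unbalanced.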

Pipelining MIPS

What makes it hard?
- structural hazards: different instructions, at different stages, in the pipeline want to use the same hardware resource
- control hazards: the succeeding instruction, to put into the pipeline, depends on the outcome of a previous branch instruction, already in the pipeline
- data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline

Before actually building the pipelined datapath and control we first briefly examine these potential hazards individually.

Structural Hazards

Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline in the same clock cycle. E.g., suppose a single memory (not separate instruction and data memories) with one read port in the pipeline below; then there is a structural hazard between the first and fourth lw instructions.

Program execution order (in instructions):
    lw $1, 100($0)
    lw $2, 200($0)
    lw $3, 300($0)
    lw $4, 400($0)

[Figure: pipelined timing of the four loads, 2 ns per stage; hazard if single memory]

MIPS was designed to be pipelined: structural hazards are easy to avoid!

Control Hazards

Control hazard: need to make a decision based on the result of a previous instruction still executing in the pipeline.

Solution 1: Stall the pipeline.

Program execution order (in instructions):
    add $4, $5, $6
    beq $1, $2, 40
    lw $3, 300($0)

[Figure: pipeline stall; a bubble is inserted between beq and lw]

Note that the branch outcome is computed in the ID stage with added hardware (later).

Solution 2: Predict the branch outcome, e.g., predict branch-not-taken.

[Figure: prediction success: add, beq, lw proceed with no bubble. Prediction failure: the lw is undone (= flushed), bubbles fill its slot, and or $7, $8, $9 at the branch target is fetched instead.]
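The structural hazard between the first and fourth lw can be located mechanically. This is a minimal sketch under stated assumptions: a five-stage pipeline (IF, ID, EX, MEM, WB) that issues one instruction per cycle, a single memory with one port shared by instruction fetch and data access, and a hypothetical helper name `memory_conflicts`.

```python
def memory_conflicts(mnemonics):
    """Find cycles in which a data access and an instruction fetch collide.

    Assumes instruction i (0-based) is fetched in cycle i + 1 and, if it is
    a load or store, uses memory in its MEM stage three cycles later.
    Returns (cycle, index_of_memory_instruction, index_of_fetched_instruction).
    """
    conflicts = []
    for i, op in enumerate(mnemonics):
        if op in ("lw", "sw"):
            mem_cycle = i + 4          # IF at i+1, ID i+2, EX i+3, MEM i+4
            fetch_index = mem_cycle - 1  # instruction whose IF is in that cycle
            if fetch_index < len(mnemonics):
                conflicts.append((mem_cycle, i, fetch_index))
    return conflicts


if __name__ == "__main__":
    # Four back-to-back loads, as on the slide.
    print(memory_conflicts(["lw", "lw", "lw", "lw"]))
```

With four loads the only collision is in cycle 4, between the MEM stage of the first lw and the IF stage of the fourth. Separate instruction and data memories remove the conflict.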

Control Hazards

Solution 3: Delayed branch: always execute the sequentially next statement, with the branch executing after a one-instruction delay. It is the compiler's job to find a statement that can be put in the slot and that is independent of the branch outcome. MIPS does this, but it is an option in SPIM (Simulator -> Settings).

Program execution order (in instructions):
    beq $1, $2, 40
    add $4, $5, $6    (delayed branch slot)
    lw $3, 300($0)

[Figure: delayed branch; beq is followed by an add that is independent of the branch outcome]

Data Hazards

Data hazard: an instruction needs data from the result of a previous instruction still executing in the pipeline.

Solution: Forward data if possible.

[Figure: instruction pipeline diagram for add $s0, $t0, $t1 through stages IF, ID, EX, MEM, WB; shading indicates use, left = write, right = read]

Program execution order (in instructions):
    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Without forwarding the data (blue line) has to go back in time; with forwarding the data (red line) is available in time.

Data Hazards

Forwarding may not be enough, e.g., if an R-type instruction following a load uses the result of the load. This is called a load-use data hazard.

    lw $s0, 20($t1)
    sub $t2, $s0, $t3

Without a stall it is impossible to provide input to the sub instruction in time. With a one-stage stall, forwarding can get the data to the sub instruction in time.

Reordering Code to Avoid Pipeline Stall (Software Solution)

Example:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)    (data hazard)
    sw $t0, 4($t1)

Reordered code (the two sw instructions are interchanged):
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t0, 4($t1)
    sw $t2, 0($t1)
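The effect of reordering can be checked by counting load-use pairs. The sketch below assumes forwarding is available, so the only remaining stall is a load immediately followed by an instruction that reads the loaded register. The tuple encoding `(op, dest, sources)` and the function name are illustrative choices, and the register operands are the ones assumed for the example above.

```python
def count_load_use_stalls(program):
    """Count one-cycle stalls, assuming full forwarding.

    program: list of (op, dest, sources) tuples in program order.
    A stall is counted when a lw is immediately followed by an
    instruction that reads the register the lw writes.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


original = [
    ("lw", "$t0", ["$t1"]),
    ("lw", "$t2", ["$t1"]),
    ("sw", None, ["$t2", "$t1"]),  # reads $t2 right after it is loaded
    ("sw", None, ["$t0", "$t1"]),
]
reordered = [original[0], original[1], original[3], original[2]]

if __name__ == "__main__":
    print(count_load_use_stalls(original), count_load_use_stalls(reordered))
```

Swapping the two stores puts an independent instruction between the second load and its use, so the stall disappears without changing what the code computes.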

Pipelined Datapath

We now move to actually building a pipelined datapath. First recall the 5 steps in instruction execution:
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)

Review: single-cycle processor
- all 5 steps done in a single clock cycle
- dedicated hardware required for each step

What happens if we break the execution into multiple cycles, but keep the extra hardware?

Review - Single-Cycle Datapath Steps

[Figure: single-cycle datapath divided into five sections: IF Instruction Fetch, ID Instruction Decode, EX Execute/Address Calc., MEM Memory Access, WB Write Back]

Pipelined Datapath: Key Idea

What happens if we break the execution into multiple cycles, but keep the extra hardware? Answer: we may be able to start executing a new instruction at each clock cycle (pipelining), but we shall need extra registers to hold data between cycles: pipeline registers.

Pipelined Datapath

[Figure: datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, each wide enough to hold the data coming in: 64 bits, 128 bits, 97 bits, and 64 bits respectively]

Pipelined Datapath

[Figure: datapath with pipeline registers IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits), MEM/WB (64 bits), wide enough to hold the data coming in]

Only data flowing right to left may cause a hazard. Why?

Bug in the Datapath

[Figure: pipelined datapath with the write-register input highlighted]

The write register number comes from another, later instruction!

Corrected Datapath

[Figure: corrected datapath with pipeline registers IF/ID (64 bits), ID/EX (133 bits), EX/MEM (102 bits), MEM/WB (69 bits)]

The destination register number is also passed through the ID/EX, EX/MEM and MEM/WB registers, which are now wider by 5 bits.

Pipelined Example

Consider the following instruction sequence: an lw, followed by an sw, an add, and a sub.

Single-Clock-Cycle Diagrams: Clock Cycles 1 to 4

[Figures: one datapath snapshot per clock cycle, with the instruction occupying each stage labeled above it]

Clock cycle   IF    ID    EX    MEM   WB
1             lw
2             sw    lw
3             add   sw    lw
4             sub   add   sw    lw

Single-Clock-Cycle Diagrams: Clock Cycles 5 to 8

[Figures: one datapath snapshot per clock cycle, with the instruction occupying each stage labeled above it]

Clock cycle   IF    ID    EX    MEM   WB
5                   sub   add   sw    lw
6                         sub   add   sw
7                               sub   add
8                                     sub

Alternative View: Multiple-Clock-Cycle Diagram

[Figure: the lw, sw, add, sub sequence against a time axis CC 1 to CC 8; each instruction passes through IM, REG, ALU, DM, REG in consecutive cycles, starting one cycle after its predecessor]

Notes
- One significant difference in the execution of an R-type instruction between the multicycle and pipelined implementations: write-back for the R-type instruction is the 5th (the last, write-back) pipeline stage vs. the 4th stage for the multicycle implementation. Why? Think of structural hazards when writing to the register file.
- Worth repeating: the essential difference between the pipeline and multicycle implementations is the insertion of pipeline registers to decouple the 5 stages.
- The CPI of an ideal pipeline (no stalls) is 1. Why?
- The RaVi Architecture Visualization Project of Dortmund U. has pipeline simulations; see the link in our Additional Resources page.
- As we develop control for the pipeline, keep in mind that the text does not consider jump. It should not be too hard to implement!

Recall Single-Cycle Control: the Datapath

[Figure: single-cycle datapath with the main control unit and ALU control; control signals RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite]

Recall Single-Cycle ALU Control

Instruction  ALUOp  Instruction       Funct    Desired      ALU control
opcode              operation         field    ALU action   input
LW           00     load word         XXXXXX   add          010
SW           00     store word        XXXXXX   add          010
Branch eq    01     branch eq         XXXXXX   subtract     110
R-type       10     add               100000   add          010
R-type       10     subtract          100010   subtract     110
R-type       10     AND               100100   and          000
R-type       10     OR                100101   or           001
R-type       10     set on less than  101010   set on less  111

[Table: truth table for the ALU control bits, with inputs ALUOp1, ALUOp0 and funct bits F5 to F0, and the Operation bits as output]
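The ALU control table can be written as a small lookup function. This is a sketch, not the hardware: the function name is hypothetical, and the bit encodings are the standard textbook values assumed for the table above (ALUOp 00 for lw/sw, 01 for beq, 10 for R-type).

```python
# Funct field -> ALU control input, used only when ALUOp == 0b10 (R-type).
R_TYPE_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # set on less than
}


def alu_control(alu_op, funct=0):
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw / sw: add to form the address, funct ignored
        return 0b010
    if alu_op == 0b01:      # beq: subtract to compare, funct ignored
        return 0b110
    if alu_op == 0b10:      # R-type: operation selected by the funct field
        return R_TYPE_ALU_CONTROL[funct & 0b111111]
    raise ValueError("ALUOp 11 is not used in this design")


if __name__ == "__main__":
    print(format(alu_control(0b10, 0b101010), "03b"))
```

Keeping the funct decoding separate from the main opcode decoding mirrors the two-level control described on the slide: the main control unit produces only ALUOp, and the ALU control unit combines it with the funct field.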

Recall Single-Cycle Control Signals

Effect of the control bits:

RegDst
  deasserted: the destination register number for the Write register comes from the rt field (bits 20-16)
  asserted: the destination register number for the Write register comes from the rd field (bits 15-11)
RegWrite
  deasserted: none
  asserted: the register on the Write register input is written with the value on the Write data input
ALUSrc
  deasserted: the second ALU operand comes from the second register file output (Read data 2)
  asserted: the second ALU operand is the sign-extended lower 16 bits of the instruction
PCSrc
  deasserted: the PC is replaced by the output of the adder that computes the value of PC + 4
  asserted: the PC is replaced by the output of the adder that computes the branch target
MemRead
  deasserted: none
  asserted: data memory contents designated by the address input are put on the Read data output
MemWrite
  deasserted: none
  asserted: data memory contents designated by the address input are replaced by the value of the Write data input
MemtoReg
  deasserted: the value fed to the register Write data input comes from the ALU
  asserted: the value fed to the register Write data input comes from the data memory

Determining control bits:

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1

Pipeline Control

Initial design motivated by single-cycle datapath control: use the same control signals.

Observe:
- No separate write signal for the PC, as it is written every cycle
- No separate write signals for the pipeline registers, as they are written every cycle
- No separate read signal for instruction memory, as it is read every clock cycle
- No separate read signal for the register file, as it is read every clock cycle

Need to set control signals during each pipeline stage. Since control signals are associated with components active during a single pipeline stage, we can group control lines into five groups according to pipeline stage.

Pipelined Datapath with Control I

[Figure: pipelined datapath with control signals; same control signals as the single-cycle datapath]

Pipeline Control Signals

There are five stages in the pipeline:
- instruction fetch / PC increment: nothing to control, as instruction memory read and PC write are always enabled
- instruction decode / register fetch: nothing to control
- execution / address calculation
- memory access
- write back

             Execution/Address Calculation     Memory access             Write-back
             stage control lines               stage control lines       stage control lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       1       0       0         0       0        0         1         0
lw           0       0       0       1         0       1        0         1         1
sw           X       0       0       1         0       0        1         0         X
beq          X       0       1       0         1       0        0         0         X
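Grouping the control lines by stage can be sketched as a nested table from which each stage consumes its own group and passes the rest along. The values follow the standard textbook control table assumed for this design; `None` stands for a don't-care (X), and `remaining_control` is a hypothetical helper name.

```python
X = None  # don't-care

# Control values per instruction class, grouped by the stage that uses them.
CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
    "beq": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
}

STAGE_ORDER = ["EX", "MEM", "WB"]


def remaining_control(instr, stage):
    """Control groups still carried in the pipeline register feeding `stage`."""
    start = STAGE_ORDER.index(stage)
    return {s: CONTROL[instr][s] for s in STAGE_ORDER[start:]}


if __name__ == "__main__":
    print(remaining_control("lw", "MEM"))
```

The ID/EX register carries all three groups, EX/MEM carries the MEM and WB groups, and MEM/WB carries only the WB group, which is why the control portion of each pipeline register shrinks from left to right.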

Pipeline Control Implementation

Pass control signals along just like the data: extend each pipeline register to hold the needed control bits for succeeding stages.

[Figure: control unit output split into WB, M and EX fields that travel through ID/EX, EX/MEM and MEM/WB]

Note: the 6-bit funct field of the instruction, required in the EX stage to generate ALU control, can be retrieved as the 6 least significant bits of the immediate field, which is sign-extended and passed from the IF/ID register to the ID/EX register.

Pipelined Datapath with Control II

[Figure: pipelined datapath; control signals emanate from the control portions of the pipeline registers]

Pipelined Execution and Control

Instruction sequence:
    lw $10, 20($1)
    sub $11, $2, $3
    and $12, $4, $7
    or $13, $6, $7
    add $14, $8, $9

The label before<i> means the i-th instruction before lw.

[Figures: datapath snapshots for clock cycles 1 to 4]

Clock cycle  IF    ID    EX         MEM        WB
1            lw    before<1>  before<2>  before<3>  before<4>
2            sub   lw    before<1>  before<2>  before<3>
3            and   sub   lw         before<1>  before<2>
4            or    and   sub        lw         before<1>

Pipelined Execution and Control (cont.)

Instruction sequence:
    lw $10, 20($1)
    sub $11, $2, $3
    and $12, $4, $7
    or $13, $6, $7
    add $14, $8, $9

The label after<i> means the i-th instruction after add.

[Figures: datapath snapshots for clock cycles 5 to 9]

Clock cycle  IF        ID        EX        MEM       WB
5            add       or        and       sub       lw
6            after<1>  add       or        and       sub
7            after<2>  after<1>  add       or        and
8            after<3>  after<2>  after<1>  add       or
9            after<4>  after<3>  after<2>  after<1>  add

Revisiting Hazards

So far our datapath and control have ignored hazards. We shall revisit data hazards and control hazards and enhance our datapath and control to handle them in hardware.

Data Hazards and Forwarding

Problem with starting an instruction before the previous ones are finished: data dependencies that go backward in time, called data hazards.

$2 = 10 before sub; $2 = -20 after sub

Program execution order (in instructions):
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

[Figure: multiple-clock-cycle diagram, CC 1 to CC 9, showing the value of register $2 changing from 10 to -20 in CC 5 (10/-20), with dependence lines from sub to the later instructions]

Software Solution

Have the compiler guarantee that there are never any data hazards, by rearranging instructions to insert independent instructions between instructions that would otherwise have a data hazard between them or, if such rearrangement is not possible, by inserting nops.

Rearranged with independent instructions (an lw and an slt that do not involve $2):
    sub $2, $1, $3
    lw ...
    slt ...
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

or, with nops:
    sub $2, $1, $3
    nop
    nop
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

Such compiler solutions may not always be possible, and nops slow the machine down.

MIPS: nop = no operation = 00...0 (32 bits) = sll $0, $0, 0

Hardware Solution: Forwarding

Idea: use intermediate data; do not wait for the result to be finally written to the destination register. Two steps:
1. Detect data hazard
2. Forward intermediate data to resolve hazard

Pipelined Datapath with Control II (as before)

[Figure: pipelined datapath; control signals emanate from the control portions of the pipeline registers]
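The nop-insertion variant of the software solution can be sketched as a single pass over the program. The assumptions are loud ones: no forwarding hardware, a register file that writes in the first half of a cycle and reads in the second (so a consumer must be at least three instructions after its producer), the register operands assumed for the example above, and a hypothetical `(op, dest, sources)` tuple encoding.

```python
NOP = ("nop", None, [])


def insert_nops(program, min_distance=3):
    """Pad the program with nops so no data hazard remains.

    program: list of (op, dest, sources) tuples in program order.
    min_distance: required gap, in instructions, between a producer and
    a consumer of the same register (assumed 3 for this pipeline).
    """
    out = []
    for instr in program:
        _, _, sources = instr
        needed = 0
        # Examine the most recently emitted instructions, nearest first.
        window = out[max(0, len(out) - (min_distance - 1)):]
        for back, (_, prev_dest, _) in enumerate(reversed(window), start=1):
            if prev_dest is not None and prev_dest != "$0" and prev_dest in sources:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]

if __name__ == "__main__":
    print([op for op, _, _ in insert_nops(program)])
```

For the sequence above the pass emits sub, nop, nop, and, or, add, sw: two nops after the sub are enough, because every later reader of $2 is then at least three instructions behind it.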

Hazard Detection

Hazard conditions:
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

E.g., in the earlier example, the first hazard, between sub $2, $1, $3 and and $12, $2, $5, is detected when the and is in the EX stage and the sub is in the MEM stage, because EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 (1a).

Whether to forward also depends on:
- whether the instruction in the later pipeline stage (the earlier one in program order) is going to write a register; if not, there is no need to forward, even if there is a register number match as in the conditions above
- whether the destination register of that instruction is $0, in which case there is no need to forward the value ($0 is always 0 and never overwritten)

Data Forwarding

Plan: allow inputs to the ALU not just from ID/EX, but also from later pipeline registers, and use multiplexors and control signals to choose the appropriate inputs to the ALU.

Program execution order (in instructions):
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

[Figure: multiple-clock-cycle diagram, CC 1 to CC 9, showing the value of register $2 (10 until CC 5, then -20), the value of EX/MEM (-20 in CC 4) and the value of MEM/WB (-20 in CC 5)]

Dependencies between pipelines move forward in time.

Forwarding Hardware

[Figure a: no forwarding; datapath before adding forwarding hardware. Figure b: with forwarding; datapath after adding forwarding hardware, with multiplexors ForwardA and ForwardB on the ALU inputs, driven by a forwarding unit whose inputs are ID/EX Rs and Rt, EX/MEM.RegisterRd and MEM/WB.RegisterRd]

Forwarding Hardware with Control

It is called a forwarding unit, not a hazard unit, because once data is forwarded there is no hazard!

[Figure: datapath with forwarding hardware and control wires; certain details, e.g., branching hardware, are omitted to simplify the drawing]

Note: so far we have only handled forwarding to R-type instructions!
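The hazard conditions and the two extra checks translate directly into the forwarding unit's selection logic. This is a sketch under assumptions: the dictionary field names are illustrative, and the mux encodings (00 = register file value from ID/EX, 10 = EX/MEM result, 01 = MEM/WB value) follow the usual textbook convention rather than anything stated on the slide.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose the ALU input source for one operand register number.

    ex_mem, mem_wb: dicts with keys "RegWrite" (0/1) and "Rd" (register number).
    The EX/MEM match is checked first because it holds the most recent result.
    """
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg:
        return 0b10  # conditions 1a / 1b: forward from EX/MEM
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src_reg:
        return 0b01  # conditions 2a / 2b: forward from MEM/WB
    return 0b00      # no hazard: use the value read from the register file


def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the instruction in the EX stage."""
    return (forward_select(id_ex_rs, ex_mem, mem_wb),
            forward_select(id_ex_rt, ex_mem, mem_wb))


if __name__ == "__main__":
    # and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM.
    sub_in_mem = {"RegWrite": 1, "Rd": 2}
    idle = {"RegWrite": 0, "Rd": 0}
    print(forwarding_unit(2, 5, sub_in_mem, idle))
```

Checking EX/MEM before MEM/WB matters when both stages hold a write to the same register: the instruction closer to the ALU is later in program order, so its result is the one the consumer must see.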

Forwarding: Execution Example

Instruction sequence:
    sub $2, $1, $3
    and $4, $2, $5
    or $4, $4, $2
    add $9, $4, $2

[Figures: datapath snapshots with the forwarding unit for clock cycles 3 to 6]

Clock cycle  IF        ID        EX    MEM        WB
3            or        and       sub   before<1>  before<2>
4            add       or        and   sub        before<1>
5            after<1>  add       or    and        sub
6            after<2>  after<1>  add   or         and

Data Hazards and Stalls

Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register.

Program execution order (in instructions):
    lw $2, 20($1)
    and $4, $2, $5
    or $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7

[Figure: multiple-clock-cycle diagram, CC 1 to CC 9; the dependence from lw to and goes backward in time]

As even a pipeline dependency goes backward in time, forwarding will not solve the hazard. Therefore, we need a hazard detection unit to stall the pipeline after the load instruction.

Pipelined Datapath with Control II (as before)

[Figure: pipelined datapath; control signals emanate from the control portions of the pipeline registers]

Hazard Detection Logic to Stall

The hazard detection unit implements the following check to decide whether to stall:

    if ( ID/EX.MemRead                              // if the instruction in the EX stage is a load
         and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs )  // and the destination register
            or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) )  // matches either source register
                                                    // of the instruction in the ID stage, then
        stall the pipeline

Mechanics of Stalling

If the check to stall verifies, then the pipeline needs to stall only 1 clock cycle after the load, as after that the forwarding unit can resolve the dependency.

What the hardware does to stall the pipeline 1 cycle:
- it does not let the IF/ID register change (disable write!); this will cause the instruction in the ID stage to repeat, i.e., stall
- therefore, the instruction just behind, in the IF stage, must be stalled as well, so the hardware does not let the PC change (disable write!); this will cause the instruction in the IF stage to repeat, i.e., stall
- it changes all the EX, MEM and WB control fields in the ID/EX pipeline register to 0, so effectively the instruction just behind the load becomes a nop; a bubble is said to have been inserted into the pipeline

Note that we cannot turn that instruction into a nop by 0ing all the bits in the instruction itself (recall nop = 00...0, 32 bits) because it has already been decoded and the control signals generated.

Hazard Detection Unit

[Figure: datapath with forwarding hardware, the hazard detection unit and control wires; the hazard detection unit takes ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt as inputs; certain details, e.g., branching hardware, are omitted to simplify the drawing]

Stalling Resolves a Hazard

Same instruction sequence as before, for which forwarding by itself could not resolve the hazard:
    lw $2, 20($1)
    and $4, $2, $5
    or $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7

[Figure: multiple-clock-cycle diagram, CC 1 to CC 10, with a bubble inserted after lw; and, or, add and slt are each delayed by one cycle]

The hazard detection unit inserts a 1-cycle bubble in the pipeline, after which all pipeline data dependencies go forward, so the forwarding unit can handle them and there are no more data hazards.
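The stall check is a single Boolean expression over four pipeline-register fields. The sketch below states it in Python; the function and parameter names are illustrative, and register operands in the usage lines are the ones assumed for the lw / and / or example.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use check performed by the hazard detection unit.

    Stall when the instruction in EX is a load (MemRead asserted) and its
    destination register (rt) matches either source register of the
    instruction currently in ID.
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


def stall_actions(stall):
    """Signals the hardware would drive for one cycle (names are assumptions)."""
    return {
        "PCWrite": 0 if stall else 1,        # hold the PC so IF repeats
        "IF_ID_Write": 0 if stall else 1,    # hold IF/ID so ID repeats
        "zero_ID_EX_control": 1 if stall else 0,  # insert the bubble
    }


if __name__ == "__main__":
    # lw $2, 20($1) in EX; and $4, $2, $5 in ID.
    print(must_stall(1, 2, 2, 5), stall_actions(True))
```

One cycle later the load has moved to MEM and the repeated instruction sees MemRead deasserted in ID/EX (the bubble), so the check fails and the forwarding unit supplies the loaded value.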

Stalling: Execution Example

Instruction sequence:
    lw $2, 20($1)
    and $4, $2, $5
    or $4, $4, $2
    add $9, $4, $2

[Figures: datapath snapshots with the hazard detection unit and forwarding unit for clock cycles 2 to 7]

Clock cycle  IF        ID        EX         MEM        WB
2            and       lw        before<1>  before<2>  before<3>
3            or        and       lw         before<1>  before<2>
4            or        and       bubble     lw         before<1>
5            add       or        and        bubble     lw
6            after<1>  add       or         and        bubble
7            after<2>  after<1>  add        or         and

Control (or Branch) Hazards

The problem with branches in the pipeline we have so far is that the branch decision is not made till the MEM stage. So what instructions, if any, should we insert into the pipeline following the branch instruction?

Possible solution: stall the pipeline till the branch decision is known. This is not efficient; it slows the pipeline significantly!

Another solution: predict the branch outcome, e.g., always predict branch-not-taken and continue with the next sequential instructions. If the prediction is wrong, we have to flush the pipeline behind the branch: discard the instructions already fetched or decoded, and continue execution at the branch target.

Predicting Branch-not-taken: Misprediction Delay

Program execution order (in instructions), with instruction addresses:
    40 beq $1, $3, 7
    44 and $12, $2, $5
    48 or $13, $6, $2
    52 add $14, $2, $2
    72 lw $4, 50($7)

[Figure: multiple-clock-cycle diagram, CC 1 to CC 9]

The outcome of branch taken (prediction wrong) is decided only when beq is in the MEM stage, so the following three sequential instructions already in the pipeline have to be flushed and execution resumes at lw.

Optimizing the Pipeline to Reduce Branch Delay

Move the branch decision from the MEM stage (as in our current pipeline) earlier, to the ID stage:
- calculating the branch target address involves moving the branch adder from the EX stage to the ID stage; the inputs to this adder, the PC value and the immediate field, are already available in the IF/ID pipeline register
- calculating the branch decision is efficiently done, e.g., for an equality test, by XORing the respective bits, then ORing all the results and inverting, rather than using the ALU to subtract and then test for zero (where there is a carry delay)
- with the more efficient equality test we can put it in the ID stage without significantly lengthening this stage; remember that an objective of pipeline design is to keep pipeline stages balanced
- we must correspondingly make additions to the forwarding and hazard detection units, to forward to or stall the branch at the ID stage in case the branch decision depends on an earlier result

Flushing on Misprediction

Same strategy as for stalling on a load-use data hazard: zero out all the control values (or the instruction itself) in the pipeline registers for the instructions following the branch that are already in the pipeline, effectively turning them into nops so they are flushed. In the optimized pipeline, with the branch decision made in the ID stage, we have to flush only one instruction, the one in the IF stage. The branch delay penalty is then only one clock cycle.

Optimized Datapath for Branch

[Figure: datapath with the branch adder and an equality comparator (=) in the ID stage, the hazard detection unit, the forwarding unit, and the IF.Flush control line]

The IF.Flush control zeros out the instruction in the IF/ID pipeline register (the instruction which follows the branch). The branch decision is moved from the MEM stage to the ID stage. This is a simplified drawing that does not show the enhancements to the forwarding and hazard detection units.
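The fast equality test used for the branch decision in the ID stage compares the two register values bit by bit instead of subtracting. The following sketch models that logic in software; the function name and the 32-bit width parameter are assumptions for illustration.

```python
def registers_equal(a, b, width=32):
    """Equality test built from XOR, OR and one inversion.

    XOR each pair of corresponding bits (1 where they differ), OR all the
    XOR outputs together, then invert: the result is true only if no bit
    position differs. No carry chain is involved, unlike subtraction.
    """
    mask = (1 << width) - 1
    diff = (a ^ b) & mask            # per-bit XOR of the two operands
    any_difference = 0
    for i in range(width):           # OR-reduce the XOR outputs
        any_difference |= (diff >> i) & 1
    return any_difference == 0       # invert the reduced signal


def branch_penalty_cycles(decision_stage):
    """Instructions flushed on a taken branch under predict-not-taken.

    Assumes the five-stage pipeline of these slides: deciding in MEM
    flushes the three instructions behind the branch, deciding in ID
    flushes one.
    """
    return {"ID": 1, "EX": 2, "MEM": 3}[decision_stage]


if __name__ == "__main__":
    print(registers_equal(7, 7), registers_equal(7, 5), branch_penalty_cycles("ID"))
```

In hardware the OR reduction is a shallow tree of gates rather than a loop, which is why the comparison fits in the ID stage without lengthening it much.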

Pipelined Branch: Execution Example

Instruction sequence, with instruction addresses:
    36 sub $10, $4, $8
    40 beq $1, $3, 7
    44 and $12, $2, $5
    48 or $13, $2, $6
    52 add $14, $4, $2
    56 slt $15, $6, $7
    ...
    72 lw $4, 50($7)

Optimized pipeline with only one bubble as a result of the taken branch.

[Figures: datapath snapshots with the hazard detection unit, forwarding unit and IF.Flush for clock cycles 3 and 4]

Clock cycle  IF   ID            EX   MEM        WB
3            and  beq           sub  before<1>  before<2>
4            lw   bubble (nop)  beq  sub        before<1>

Superscalar Architecture

A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. Each functional unit is not a separate CPU core but an execution resource within a single CPU.

[Figures: typical pipeline; superscalar pipeline; Pentium pipeline]
