Chapter 6: Pipelining


Laurel Palmer
6 years ago

1 Chapter 6: Pipelining

2 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

3 Laundry Example: Pipelining Is Natural!
  Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
  Washer takes 30 minutes.
  Dryer takes 40 minutes.
  Folder takes 20 minutes.

4 Sequential Laundry
  [Figure: timeline from 6 PM to midnight; loads A, B, C, D run one after another in task order.]
  Sequential laundry takes 6 hours for 4 loads.
  If they learned pipelining, how long would it take?

5 Pipelined Laundry: Start ASAP
  [Figure: timeline from 6 PM to midnight; loads A, B, C, D overlap, each starting as soon as the washer is free.]
  Pipelined laundry takes 3.5 hours for 4 loads.
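
The laundry arithmetic can be checked with a short Python sketch. It assumes the washer, dryer, and folder take 30, 40, and 20 minutes and that every load passes through all three stages in order; the function names are illustrative only.

```python
def sequential_minutes(stage_minutes, loads):
    # Each load finishes completely before the next one starts.
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    # The first load takes the full latency; after that, one load
    # completes every "slowest stage" interval.
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


stages = [30, 40, 20]  # washer, dryer, folder (minutes)
print(sequential_minutes(stages, 4) / 60)  # hours for 4 loads, sequential
print(pipelined_minutes(stages, 4) / 60)   # hours for 4 loads, pipelined
```

The 40-minute dryer sets the pipeline rate, which is why the pipelined time is well above one quarter of the sequential time.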

6 Pipelining Lessons
  [Figure: pipelined laundry timeline for loads A, B, C, D.]
  Pipelining doesn't help the latency of a single task, but the throughput of the entire workload.
  Pipeline rate is limited by the slowest stage.
  Multiple tasks work at the same time using different resources.
  Potential speedup = number of pipe stages.
  Unbalanced stage lengths and the time to fill and drain the pipeline reduce speedup.
  Stall for dependences.

7 Single-Cycle, Multi-Cycle, vs. Pipeline
  [Timing diagram against Clk.
   Single-cycle implementation: Load, then Store, with wasted time in the shorter instruction's cycle.
   Multiple-cycle implementation (cycles 1 to 10): Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem), R-type (Ifetch, ...).
   Pipeline implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped by one cycle.]

8 Pipelining MIPS Execution
  [Figure: program execution order versus time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0).
   Without pipelining, each instruction (instruction fetch, register read, ALU, data access, register write) takes 8 ns and the next starts only when it finishes.
   With pipelining, each stage takes 2 ns and a new instruction starts every 2 ns.]

9 Why Pipeline? Because the Resources Are There!
  [Figure: instructions Inst 0 to Inst 4 in program order against time (clock cycles); each uses Im, Reg, ALU, Dm, Reg of the single-cycle datapath in successive cycles.]

10 Hazard
  Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.
    Structural hazards: hardware cannot support this combination of instructions; two instructions need the same resource.
    Data hazards: an instruction depends on the result of a prior instruction still in the pipeline.
    Control hazards: pipelining of branches and other instructions that change the PC.
  The common solution is to stall the pipeline until the hazard is resolved, inserting one or more bubbles in the pipeline.
  To do this, hardware or software must detect that a hazard has occurred.

11 Structural Hazards
  Structural hazards occur when two or more instructions need the same resource.
  Common methods for eliminating structural hazards are:
    Duplicate resources
    Pipeline the resource
    Reorder the instructions
  It may be too expensive to eliminate a structural hazard, in which case the pipeline should stall.
  When the pipeline stalls, no instructions are issued until the hazard has been resolved.
  What are some examples of structural hazards?

12 One Memory Port Structural Hazards (Figure 3.6, Page 2)
  [Figure: Load followed by Instr 1 to Instr 4 over cycles 1 to 7; with one memory port, the Load's data-memory access (DMem) falls in the same cycle as a later instruction's fetch (Ifetch).]

13 One Memory Port Structural Hazards (Figure 3.7, Page 3)
  [Figure: the same sequence with a stall; a row of bubbles is inserted after Instr 2, and Instr 3 is fetched one cycle later.]

14 Outline: An overview of pipelining; A pipelined datapath (6.2); Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

15 Designing a Pipelined Processor
  Examine the datapath and control diagram.
    Starting with a single- or multi-cycle datapath?
    Single- or multi-cycle control?
  Partition the datapath into stages:
    IF (instruction fetch)
    ID (instruction decode and register file read)
    EX (execution or address calculation)
    MEM (memory access)
    WB (write back)
  Associate resources with stages.
  Ensure that flows do not conflict, or figure out how to resolve them.
  Assert control in the appropriate stage.

16 Use Multicycle Execution Steps
  Instruction fetch (all instructions):
    IR = Memory[PC]; PC = PC + 4
  Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)
  Execution, address computation, branch/jump completion:
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
  Memory access or R-type completion:
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut], or Store: Memory[ALUOut] = B
  Memory read completion:
    Load: Reg[IR[20-16]] = MDR
  But, use the single-cycle datapath...

17 Split Single-cycle Datapath
  IF: instruction fetch; ID: instruction decode/register file read; EX: execute/address calculation; MEM: memory access; WB: write back
  [Figure: single-cycle datapath divided into the five stages, with the feedback paths marked: PC, instruction memory, registers, sign extend, shift left 2, adders, ALU, data memory.]

18 Pipeline Registers
  Pipeline registers (latches): IF/ID, ID/EX, EX/MEM, MEM/WB
  [Figure: the same datapath with the four pipeline registers inserted between stages.]

19 Consider load
  Load, cycles 1 to 5: Ifetch, Reg/Dec, Exec, Mem, Wr
  IF: Instruction Fetch. Fetch the instruction from the instruction memory.
  ID: Instruction Decode. Register fetch and instruction decode.
  EX: Calculate the memory address.
  MEM: Read the data from the data memory.
  WB: Write the data back to the register file.

20 Pipelining load
  [Diagram: 1st, 2nd, and 3rd lw, each Ifetch, Reg/Dec, Exec, Mem, Wr, overlapped over cycles 1 to 7.]
  The 5 functional units in the pipeline datapath are:
    Instruction memory for the Ifetch stage
    Register file's read ports (busA and busB) for the Reg/Dec stage
    ALU for the Exec stage
    Data memory for the MEM stage
    Register file's write port (busW) for the WB stage

21 IF Stage of load
  IR = mem[PC]; PC = PC + 4
  [Figure (Fig. 6.2): pipelined datapath with the IF stage active for lw (instruction fetch); IR and PC+4 are latched into IF/ID.]

22 ID Stage of load
  A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-ext(IR[15-0]) << 2) (some ops moved to the next stage)
  [Figure (Fig. 6.2): pipelined datapath with the ID stage active for lw (instruction decode).]

23 EX Stage of load
  ALUOut = A + sign-ext(IR[15-0])
  [Figure (Fig. 6.3): pipelined datapath with the EX stage active for lw (execution).]

24 MEM Stage of load
  MDR = mem[ALUOut]
  [Figure: pipelined datapath with the MEM stage active for lw (memory).]

25 WB Stage of load
  Reg[IR[20-16]] = MDR
  Who will supply this address?
  [Figure: pipelined datapath with the WB stage active for lw (write back); the write-register number comes from the IF/ID side of the pipeline.]

26 The Four Stages of R-type
  R-type, cycles 1 to 4: Ifetch, Reg/Dec, Exec, Wr
  IF: fetch the instruction from the instruction memory
  ID: registers fetch and instruction decode
  EX: ALU operates on the two register operands
  WB: write ALU output back to the register file

27 Pipelining R-type and load
  [Diagram, cycles 1 to 9: R-type, R-type, Load, R-type, R-type issued in successive cycles; R-type is Ifetch, Reg/Dec, Exec, Wr and Load is Ifetch, Reg/Dec, Exec, Mem, Wr.]
  Oops! We have a problem!
  We have a structural hazard:
    Two instructions try to write to the register file at the same time!
    Only one write port

28 Important Observation
  Each functional unit can only be used once per instruction.
  Each functional unit must be used at the same stage for all instructions:
    Load uses the register file's write port during its 5th stage (Ifetch, Reg/Dec, Exec, Mem, Wr).
    R-type uses the register file's write port during its 4th stage (Ifetch, Reg/Dec, Exec, Wr).
  Several ways to solve: forwarding, adding a pipeline bubble, making instructions the same length.

29 Solution: Delay R-type's Write
  Delay R-type's register write by one cycle:
    R-type also uses the register file's write port at stage 5.
    MEM is a NOP stage: nothing is being done.
    R-type: Ifetch, Reg/Dec, Exec, Mem, Wr
  [Diagram, cycles 1 to 9: R-type, R-type, Load, R-type, R-type, now all with 5 stages; no two instructions write the register file in the same cycle.]
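
The write-port conflict and its fix can be illustrated with a small Python sketch. It assumes one instruction issues per cycle starting at cycle 1 and that each instruction kind writes the register file in a fixed stage; the names `write_cycles` and `port_conflicts` are hypothetical.

```python
from collections import Counter


def write_cycles(kinds, write_stage):
    # Instruction k (issued in cycle k) writes in cycle k + stage - 1.
    return [issue + write_stage[kind] - 1
            for issue, kind in enumerate(kinds, start=1)]


def port_conflicts(kinds, write_stage):
    # Cycles in which more than one instruction needs the single write port.
    counts = Counter(write_cycles(kinds, write_stage))
    return sorted(cycle for cycle, n in counts.items() if n > 1)


program = ["R", "R", "lw", "R", "R"]
print(port_conflicts(program, {"R": 4, "lw": 5}))  # R-type writes in stage 4
print(port_conflicts(program, {"R": 5, "lw": 5}))  # R-type write delayed to stage 5
```

With the 4-stage R-type, the load and the R-type behind it both write in cycle 7; once every instruction writes in stage 5, the list of conflicting cycles is empty.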

30 The Four Stages of store
  Store, cycles 1 to 4: Ifetch, Reg/Dec, Exec, Mem (Wr added as a fifth stage)
  IF: fetch the instruction from the instruction memory
  ID: registers fetch and instruction decode
  EX: calculate the memory address
  MEM: write the data into the data memory
  Add an extra stage: WB: NOP

31 The Three Stages of beq
  Beq, cycles 1 to 3: Ifetch, Reg/Dec, Exec (Mem and Wr added as extra stages)
  IF: fetch the instruction from the instruction memory
  ID: registers fetch and instruction decode
  EX: compare the two register operands, select the correct branch target address, and latch it into the PC
  Add two extra stages: MEM: NOP; WB: NOP

32 Pipelined Datapath
  [Figure (Fig. 6.7): complete pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]

33 Graphically Representing Pipelines
  [Figure: program execution order (in instructions) versus time (in clock cycles), CC 1 to CC 6, for lw $10, 20($1) and sub $11, $2, $3; each instruction is drawn as IM, Reg, ALU, DM, Reg.]
  Can help with answering questions like:
    How many cycles to execute this code?
    What is the ALU doing during cycle 4?
  Helps to understand datapaths.
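
For the question of how many cycles a code sequence takes, a minimal Python sketch is given here. It assumes an ideal 5-stage pipeline that issues one instruction per cycle, with any stall cycles supplied by the caller; `pipeline_cycles` is an illustrative name.

```python
def pipeline_cycles(num_instructions, num_stages=5, stall_cycles=0):
    # The first instruction needs num_stages cycles; each later one
    # completes one cycle after its predecessor, plus any stalls.
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


print(pipeline_cycles(2))  # two-instruction lw/sub diagram
print(pipeline_cycles(5))  # a five-instruction sequence
```

The two-instruction case gives 6 cycles, matching the CC 1 to CC 6 span of the diagram.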

34 Example 1: Cycle 1
  IF: lw $10, 20($1) (instruction fetch)
  [Figure (Fig. 6.8): pipelined datapath, clock 1.]

35 Example 1: Cycle 2
  IF: sub $11, $2, $3 (instruction fetch); ID: lw $10, 20($1) (instruction decode)
  [Figure: pipelined datapath, clock 2.]

36 Example 1: Cycle 3
  ID: sub $11, $2, $3 (instruction decode); EX: lw $10, 20($1) (execution)
  [Figure (Fig. 6.8): pipelined datapath, clock 3.]

37 Example 1: Cycle 4
  EX: sub $11, $2, $3 (execution); MEM: lw $10, 20($1) (memory)
  [Figure: pipelined datapath, clock 4.]

38 Example 1: Cycle 5
  MEM: sub $11, $2, $3 (memory); WB: lw $10, 20($1) (write back)
  [Figure: pipelined datapath, clock 5.]

39 Example 1: Cycle 6
  WB: sub $11, $2, $3 (write back)
  [Figure (Fig. 6.8): pipelined datapath, clock 6.]

40 Outline: An overview of pipelining; A pipelined datapath; Pipelined control (6.3); Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

41 Pipeline Control: Control Signals
  [Figure: pipelined datapath annotated with the control signals PCSrc, Branch, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, and RegDst.]

42 Group Signals According to Stages
  Can use the control signals of the single-cycle CPU (Fig. 6.23, 6.2 <==> 5.2, 5.6)
  Execution/address calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc
  Memory access stage control lines: Branch, MemRead, MemWrite
  Write-back stage control lines: RegWrite, MemtoReg
  [Table: settings of these control lines for R-type, lw, sw, and beq.]

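43 Data Stationary Control
  Pass control signals along just like the data.
  Main control generates control signals during ID.
  [Figure: control unit in ID writing the EX, MEM, and WB control fields into ID/EX; the fields are passed on through EX/MEM and MEM/WB.]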

44 Data Stationary Control (cont.)
  Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later.
  Signals for MEM (MemWr, Branch) are used 2 cycles later.
  Signals for WB (MemtoReg, RegWr) are used 3 cycles later.
  [Figure: main control in ID, after the IF/ID register, produces ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr.
   ID/EX register holds all of them; EX/MEM register holds MemWr, Branch, MemtoReg, RegWr; MEM/WB register holds MemtoReg, RegWr.]

45 WB Stage of load
  Reg[IR[20-16]] = MDR
  Who will supply this address?
  [Figure (Fig. 6.): pipelined datapath with the WB stage active for lw (write back).]

46 Datapath with Control
  [Figure: pipelined datapath with the control unit and the EX, MEM, and WB control fields carried in ID/EX, EX/MEM, and MEM/WB; signals shown: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, MemtoReg.]

47 Let's Try It Out
  lw $10, 20($1)
  sub $11, $2, $3
  and $12, $4, $5
  or $13, $6, $7
  add $14, $8, $9

48 Example 2: Cycle 1
  IF: lw $10, 20($1); ID: before<1>; EX: before<2>; MEM: before<3>; WB: before<4>
  [Figure: datapath with control, clock 1.]

49 Example 2: Cycle 2
  IF: sub $11, $2, $3; ID: lw $10, 20($1); EX: before<1>; MEM: before<2>; WB: before<3>
  [Figure: datapath with control, clock 2.]

50 Example 2: Cycle 3
  IF: and $12, $4, $5; ID: sub $11, $2, $3; EX: lw $10, ...; MEM: before<1>; WB: before<2>
  [Figure: datapath with control, clock 3.]

51 Example 2: Cycle 4
  IF: or $13, $6, $7; ID: and $12, $4, $5; EX: sub $11, ...; MEM: lw $10, ...; WB: before<1>
  [Figure: datapath with control, clock 4.]

52 Example 2: Cycle 5
  IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $12, ...; MEM: sub $11, ...; WB: lw $10, ...
  [Figure: datapath with control, clock 5.]

53 Example 2: Cycle 6
  IF: after<1>; ID: add $14, $8, $9; EX: or $13, ...; MEM: and $12, ...; WB: sub $11, ...
  [Figure: datapath with control, clock 6.]

54 Example 2: Cycle 7
  IF: after<2>; ID: after<1>; EX: add $14, ...; MEM: or $13, ...; WB: and $12, ...
  [Figure (Fig. 6.3): datapath with control, clock 7.]

55 Example 2: Cycle 8
  IF: after<3>; ID: after<2>; EX: after<1>; MEM: add $14, ...; WB: or $13, ...
  [Figure (Fig. 6.3): datapath with control, clock 8.]

56 Example 2: Cycle 9
  IF: after<4>; ID: after<3>; EX: after<2>; MEM: after<1>; WB: add $14, ...
  [Figure: datapath with control, clock 9.]

57 Summary of Pipeline Basics
  Pipelining is a fundamental concept:
    Multiple steps using distinct resources
    Utilize the capabilities of the datapath by pipelined instruction processing
      Start the next instruction while working on the current one
      Limited by the length of the longest stage (plus fill/flush)
      Need to detect and resolve hazards
  What makes it easy in MIPS?
    All instructions are of the same length
    Just a few instruction formats
    Memory operands only in loads and stores
  What makes pipelining hard? Hazards

58 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding (6.); Data hazards and stalls (6.5); Branch hazards; Exceptions; Superscalar and dynamic pipelining

59 Pipeline Hazards
  Structural hazards: attempt to use the same resource in two different ways at the same time
    E.g., combined washer/dryer, or folder busy doing something else (watching TV)
  Data hazards: attempt to use an item before it is ready
    Instruction depends on the result of a prior instruction still in the pipeline
  Control hazards: attempt to make a decision before the condition is evaluated
    E.g., washing football uniforms and needing to see the result of the previous load to get the proper detergent level
    Branch instructions
  Can always resolve hazards by waiting:
    Pipeline control must detect the hazard
    Take action (or delay action) to resolve hazards

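60 Structural Hazard: Single Memory
  [Figure: Load followed by Instr 1 to Instr 4 against time; every instruction uses Mem for its fetch and the Load also uses Mem for its data access, so two accesses fall in the same cycle.]
  Use 2 memories: data memory and instruction memory

61 Pipeline Hazards Illustrated
  [Figure: two instructions in the IF, ID, EX, MEM, WB pipeline that need the same resource in the same cycle, labeled "Structural Hazard".]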

62 Data Hazards
  [Figure: program execution order versus time (in clock cycles), CC 1 to CC 9, with the value of register $2 shown per cycle, for:
     sub $2, $1, $3
     and $12, $2, $5
     or $13, $6, $2
     add $14, $2, $2
     sw $15, 100($2)
   The later instructions read $2 before or while sub writes it back.]

63 Types of Data Hazards
  Three types (inst. i1 followed by inst. i2):
  RAW (read after write): i2 tries to read an operand before i1 writes it.
  WAR (write after read): i2 tries to write an operand before i1 reads it.
    i1 gets the wrong operand, e.g., with autoincrement addressing.
    Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages, reads are always in stage 2, and writes are always in stage 5.
  WAW (write after write): i2 tries to write an operand before i1 writes it.
    Leaves the wrong result (i1's, not i2's); occurs only in pipelines that write in more than one stage.
    Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages and writes are always in stage 5.
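
The three categories can be expressed as a check on the register names of two instructions in program order. This is a sketch only: it reports potential dependences, and whether one becomes an actual hazard depends on the pipeline, as the slide notes. Each instruction is modeled as a hypothetical `(dest, sources)` pair.

```python
def classify_dependences(first, second):
    """Return the dependence kinds between two instructions in program order.

    Each instruction is (dest, sources); dest is None if nothing is written.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 is not None and dest1 in srcs2:
        kinds.add("RAW")  # second reads what first writes
    if dest2 is not None and dest2 in srcs1:
        kinds.add("WAR")  # second writes what first reads
    if dest1 is not None and dest1 == dest2:
        kinds.add("WAW")  # both write the same register
    return kinds


# sub $2, $1, $3 followed by and $12, $2, $5
print(classify_dependences(("$2", ("$1", "$3")), ("$12", ("$2", "$5"))))
```

In the 5-stage pipeline described here only the RAW case needs hardware or compiler attention, since reads happen in stage 2 and writes in stage 5.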

64 Pipeline Hazards Illustrated
  [Figure: pairs of instructions in the IF, ID, EX, MEM, WB pipeline illustrating a RAW (read after write) data hazard, a WAW (write after write) data hazard, and a WAR (write after read) data hazard.]

65 Handling Data Hazards
  Use simple, fixed designs:
    Eliminate WAR by always fetching operands early (ID) in the pipeline
    Eliminate WAW by doing all write backs in order (last stage, static)
    These features have a lot to do with ISA design
  Internal forwarding in the register file:
    Write in the first half of the clock and read in the second half
    The read delivers what is written, which resolves the hazard between sub and add
  Detect and resolve the remaining ones:
    Compiler inserts NOPs
    Forward
    Stall

66 Software Solution
  Have the compiler guarantee no hazards.
  Where do we insert the NOPs?
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
  Problem: this really slows us down!
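
A compiler pass that answers "where do we insert the NOPs?" might look like the Python sketch below. It assumes the register file writes in the first half of a cycle and reads in the second half, so a consumer must sit at least three slots after its producer (two intervening instructions). The tuple format `(op, dest, sources)` and the register numbers in the sample program are assumptions based on the textbook example.

```python
NOP = ("nop", None, ())


def insert_nops(program, gap=2):
    """Insert NOPs so no instruction reads a register written in the
    previous `gap` slots."""
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Walk back over the last `gap` emitted slots, nearest first.
        for back, (_, prev_dest, _) in enumerate(reversed(out[-gap:]), start=1):
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, gap - back + 1)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("sub", "$2", ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or", "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw", None, ("$15", "$2")),
]
for instr in insert_nops(program):
    print(instr)
```

For this sequence the pass places two NOPs between sub and and; the remaining instructions are already far enough from sub.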

67 Data Hazards: Insert Two NOPs
  [Figure: the same five-instruction sequence (sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2)) over CC 1 to CC 9 with the value of register $2 per cycle; two nops are inserted after sub.]

68 Data Hazards: Forwarding
  [Figure: the same sequence over CC 1 to CC 9 with the value of register $2 per cycle; dependences on $2 are drawn from sub to the following instructions.]

69 Pipeline with Forwarding
  [Figure: pipelined datapath with a forwarding unit. Inputs: ID/EX Rs and Rt (from IF/ID.RegisterRs, IF/ID.RegisterRt), EX/MEM.RegisterRd, MEM/WB.RegisterRd. Outputs: ForwardA and ForwardB, which select the ALU inputs.]

70 Detecting Data Hazards
  Hazard conditions:
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
  Two optimizations:
    Don't forward if the instruction does not write a register => check if RegWrite is asserted
    Don't forward if the destination register is $0 => check if RegisterRd = 0

71 Detecting Data Hazards (cont.)
  Hazard conditions using control signals:
    At EX stage: EX/MEM.RegWrite and (EX/MEM.Rd ≠ 0) and (EX/MEM.Rd = ID/EX.Rs)
    At MEM stage: MEM/WB.RegWrite and (MEM/WB.Rd ≠ 0) and (MEM/WB.Rd = ID/EX.Rs)
  (substitute ID/EX.Rt for ID/EX.Rs for the other two conditions)

72 Resolving Hazards: Forwarding
  Use temporary results, e.g., those in pipeline registers; don't wait for them to be written.
  [Figure: the sequence sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2) over CC 1 to CC 9, with rows for the value of register $2, the value of EX/MEM, and the value of MEM/WB; results are forwarded from the pipeline registers to the ALU inputs.]

73 Pipeline with Forwarding
  [Figure: pipelined datapath with a forwarding unit, as on slide 69.]

74 Forwarding Logic
  Forwarding: input to the ALU from any pipeline register
    Add multiplexors to the ALU input
    Control forwarding in EX => carry Rs in ID/EX
  Control signals for forwarding:
    If both WB and MEM forward, e.g., add $1,$1,$2; add $1,$1,$3; add $1,$1,$4; => let MEM forward
    EX hazard:
      if (EX/MEM.RegWrite and (EX/MEM.Rd ≠ 0) and (EX/MEM.Rd = ID/EX.Rs)) ForwardA = 10
    MEM hazard:
      if (MEM/WB.RegWrite and (MEM/WB.Rd ≠ 0) and (EX/MEM.Rd ≠ ID/EX.Rs) and (MEM/WB.Rd = ID/EX.Rs)) ForwardA = 01
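
The forwarding conditions can be written as a small Python function. This is a sketch under stated assumptions: the 2-bit select values (0b10 for the EX/MEM result, 0b01 for the MEM/WB result, 0b00 for the register file) follow the usual textbook encoding, and the parameter names are illustrative. The same function serves ForwardA (pass Rs) and ForwardB (pass Rt).

```python
def forward_select(ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd, id_ex_src):
    """Return the mux select for one ALU input (assumed encoding)."""
    # EX hazard: the instruction one ahead writes the register we read.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return 0b10
    # MEM hazard: the instruction two ahead writes it, and the newer
    # EX/MEM destination does not match (condition as on the slide).
    if (mem_wb_regwrite and mem_wb_rd != 0
            and ex_mem_rd != id_ex_src and mem_wb_rd == id_ex_src):
        return 0b01
    return 0b00  # no forwarding: use the value read in ID


# Third add of: add $1,$1,$2; add $1,$1,$3; add $1,$1,$4
print(bin(forward_select(True, 1, True, 1, 1)))
```

When both pipeline registers hold a result for the same register, the EX/MEM value wins because it is the more recent one.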

75 Example 3: Cycle 3
  IF: or $4, $4, $2; ID: and $4, $2, $5; EX: sub $2, $1, $3; MEM: before<1>; WB: before<2>
  [Figure: datapath with forwarding unit, clock 3.]

76 Example 3: Cycle 4
  IF: add $9, $4, $2; ID: or $4, $4, $2; EX: and $4, $2, $5; MEM: sub $2, ...; WB: before<1>
  [Figure (Fig. 6.): datapath with forwarding unit, clock 4.]

77 Example 3: Cycle 5
  IF: after<1>; ID: add $9, $4, $2; EX: or $4, $4, $2; MEM: and $4, ...; WB: sub $2, ...
  [Figure: datapath with forwarding unit, clock 5.]

78 Example 3: Cycle 6
  IF: after<2>; ID: after<1>; EX: add $9, $4, $2; MEM: or $4, ...; WB: and $4, ...
  [Figure: datapath with forwarding unit, clock 6.]

79 Can't Always Forward
  lw can still cause a hazard if it is followed by an instruction that reads the loaded register.
  [Figure (Fig. 6.3): program execution order for
     lw $2, 20($1)
     and $4, $2, $5
     or $8, $2, $6
     add $9, $4, $2
     slt $1, $6, $7
   The loaded value is not available in time for the and.]
  Use stalling or the compiler to resolve.

80 Stalling
  Stall the pipeline by keeping instructions in the same stage and inserting an NOP instead.
  [Figure: the same sequence with a bubble inserted after lw; and, or, and add are each delayed by one cycle.]

81 Pipeline with Stalling Unit
  Forwarding controls the ALU inputs; hazard detection controls the PC, IF/ID, and the control signals.
  [Figure: pipelined datapath with a hazard detection unit (inputs ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt; outputs PCWrite, IF/IDWrite, and a mux that zeroes the control signals) and the forwarding unit (inputs EX/MEM.RegisterRd, MEM/WB.RegisterRd, Rs, Rt).]

82 Handling Stalls
  Hazard detection unit in ID inserts a stall between a load instruction and its use:
    if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
      stall the pipeline for one cycle
    (ID/EX.MemRead = 1 indicates a load instruction)
  How to stall?
    Stall the instructions in IF and ID: do not change PC and IF/ID => the stages re-execute the instructions.
    What to move into EX: insert an NOP by changing the EX, MEM, and WB control fields of the ID/EX pipeline register to 0.
    As the control signals propagate, all control signals to EX, MEM, and WB are deasserted and no registers or memories are written.
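
The load-use test performed by the hazard detection unit can be sketched in Python as follows. The function and parameter names are illustrative; the condition itself is the one stated on the slide.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register that the load
    currently in EX is about to fetch from memory."""
    return bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))


# lw $2, 20($1) in EX (Rt = 2), and $4, $2, $5 in ID (Rs = 2, Rt = 5)
print(must_stall(True, 2, 2, 5))
# Same registers, but the instruction in EX is not a load
print(must_stall(False, 2, 2, 5))
```

When the function returns true, the hardware holds PC and IF/ID for one cycle and zeroes the control fields entering ID/EX, which creates the bubble.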

83 Example 4: Cycle 2
  IF: and $4, $2, $5; ID: lw $2, 20($1); EX: before<1>; MEM: before<2>; WB: before<3>
  [Figure: datapath with hazard detection and forwarding units, clock 2.]

84 Example 4: Cycle 3
  IF: or $4, $4, $2; ID: and $4, $2, $5; EX: lw $2, 20($1); MEM: before<1>; WB: before<2>
  [Figure: datapath with hazard detection and forwarding units, clock 3.]

85 Example 4: Cycle 4
  IF: or $4, $4, $2; ID: and $4, $2, $5; EX: bubble; MEM: lw $2, ...; WB: before<1>
  [Figure: datapath with hazard detection and forwarding units, clock 4.]

86 Example 4: Cycle 5
  IF: add $9, $4, $2; ID: or $4, $4, $2; EX: and $4, $2, $5; MEM: bubble; WB: lw $2, ...
  [Figure: datapath with hazard detection and forwarding units, clock 5.]

87 Example 4: Cycle 6
  IF: after<1>; ID: add $9, $4, $2; EX: or $4, $4, $2; MEM: and $4, ...; WB: bubble
  [Figure (Fig. 6.9): datapath with hazard detection and forwarding units, clock 6.]

88 Example 4: Cycle 7
  IF: after<2>; ID: after<1>; EX: add $9, $4, $2; MEM: or $4, ...; WB: and $4, ...
  [Figure (Fig. 6.9): datapath with hazard detection and forwarding units, clock 7.]

89 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards (6.6); Exceptions; Superscalar and dynamic pipelining

90 Feedback Path
  [Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB, highlighting the feedback path from the branch logic back to the PC.]

91 Pipeline Datapath with Control Signals
  [Figure: pipelined datapath annotated with PCSrc, Branch, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, and RegDst.]

92 Pipeline Hazards Illustrated
  [Figure: instruction pairs in the IF, ID, EX, MEM, WB pipeline illustrating a structural hazard, a RAW (read after write) data hazard, a WAW (write after write) data hazard, a WAR (write after read) data hazard, and a control hazard.]

93 Branch Hazards
  When we decide to branch, other instructions are already in the pipeline!
  [Figure: program execution order with instruction addresses:
     40 beq $1, $3, 7
     44 and $12, $2, $5
     48 or $13, $6, $2
     52 add $14, $2, $2
     72 lw $4, 50($7)]

94 Handling Branch Hazard
  Predict branch always not taken
    Need to add hardware for flushing instructions if wrong
    Branch decision made at MEM => need to flush instructions in IF/ID, ID/EX by changing control values to 0
  Reduce delay of taken branch by moving branch execution earlier in the pipeline
    Move up branch address calculation to ID
    Check branch equality at ID (using XOR) by comparing the two registers read during ID
    Branch decision made at ID => one instruction to flush
    Add a control signal, IF.Flush, to zero the instruction field of IF/ID => making the instruction an NOP
  Dynamic branch prediction
  Compiler rescheduling, delayed branch

95 Pipeline with Flushing
  [Figure (Fig. 6.): pipelined datapath with IF.Flush, the hazard detection unit, a register comparator (=) and branch adder in ID, and the forwarding unit.]

96 Example 5: Cycle 3
  IF: and $12, $2, $5; ID: beq $1, $3, 7; EX: sub $10, $4, $8; MEM: before<1>; WB: before<2>
  [Figure: datapath with flushing, clock 3; the branch target 72 is computed in ID.]

97 Example 5: Cycle 4
  IF: lw $4, 50($7); ID: bubble (nop); EX: beq $1, $3, 7; MEM: sub $10, ...; WB: before<1>
  [Figure: datapath with flushing, clock 4.]

98 Delayed Branch
  Predict-not-taken + branch decision at ID => the following instruction is always executed => branches take effect 1 cycle later
  [Figure: add, beq, misc, lw in instruction order against time (clock cycles); the instruction after beq always executes.]

99 Dynamic Branch Prediction
  Performance = ƒ(accuracy, cost of misprediction)
  Branch History Table: lower bits of the PC address index a table of 1-bit values
    Says whether or not the branch was taken last time
    No address check
  Problem: in a loop, a 1-bit BHT will cause two mispredictions (avg is 9 iterations before exit):
    End of loop case, when it exits instead of looping as before
    First time through the loop on the next time through the code, when it predicts exit instead of looping

100 1-Bit Prediction
  For each branch, keep track of what happened last time and use that outcome as the prediction.
  What are the prediction accuracies for branches 1 and 2 below?
    while (1) {
      for (i=0;i<10;i++) { branch-1 }
      for (j=0;j<20;j++) { branch-2 }
    }

101 2-Bit Prediction
  For each branch, maintain a 2-bit saturating counter:
    if the branch is taken: counter = min(3, counter+1)
    if the branch is not taken: counter = max(0, counter-1)
  If (counter >= 2), predict taken, else predict not taken.
  Advantage: a few atypical branches will not influence the prediction (a better measure of "the common case").
  Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor).
  Can be easily extended to N bits (in most processors, N=2).
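
The difference between 1-bit and 2-bit prediction on a loop branch can be seen with a short simulation. This Python sketch assumes a single branch with its own n-bit saturating counter (no sharing), a loop branch that is taken 9 times and then falls through, and an initial counter value chosen to predict taken; `mispredictions` is an illustrative name.

```python
def mispredictions(outcomes, bits, start):
    """Count wrong predictions of an n-bit saturating counter."""
    top = (1 << bits) - 1          # maximum counter value
    threshold = 1 << (bits - 1)    # predict taken at or above this
    counter, wrong = start, 0
    for taken in outcomes:
        if (counter >= threshold) != taken:
            wrong += 1
        # Saturating update, as in the rules above.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return wrong


# Three executions of a loop whose branch is taken 9 times, then exits.
loop = ([True] * 9 + [False]) * 3
print(mispredictions(loop, bits=1, start=1))
print(mispredictions(loop, bits=2, start=3))
```

After the first pass the 1-bit scheme misses twice per loop execution (the exit and the re-entry), while the 2-bit counter misses only on the exit.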

102 N-bit Branch Prediction Buffers
  When the counter is greater than or equal to one-half of its maximum value (2^n - 1), the branch is predicted as taken.
  The counter is incremented on a taken branch and decremented on an untaken branch.
  A branch buffer can be implemented as a small cache accessed during the IF stage.

103 N-bit Branch Prediction Buffers
  Use an n-bit saturating counter
  Only the loop exit causes a misprediction
  A 2-bit predictor is almost as good as any general n-bit predictor

104 Basic Branch Prediction Buffers
  a.k.a. Branch History Table (BHT): a small direct-mapped cache of T/NT bits
  [Figure: the PC indexes the BHT; T (predict taken) selects the branch target computed from the IR, and NT (predict not taken) selects the incremented PC.]

105 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

106 What about Exceptions?
  Another form of branch hazard
  How to stop the pipeline? Restart?
  Who caused the interrupt?
  Who to serve first, if there are multiple interrupts at the same time?

107 Handling Exceptions
  How to stop the pipeline? Restart?
  Suppose overflow occurs at add $1,$2,$1
    Disable writes of instructions until the trap hits, e.g., flush the following instructions using IF.Flush, ID.Flush, EX.Flush to cause multiplexors to zero the control signals (overflow exception detected at EX => flush the offending instruction)
    Force a trap instruction into IF, e.g., fetch from 40000040hex by adding 40000040hex to the PC input MUX
    Save the address of the offending instruction in EPC

108 Pipeline with Exception
  [Figure: pipelined datapath with IF.Flush, ID.Flush, and EX.Flush, the hazard detection unit, the Cause and Except PC (EPC) registers, and the forwarding unit.]

109 Who caused the exception?
  5 instructions are executing in the 5-stage pipeline. Who caused the exception?
  Need to know in which stage an exception can occur => helps determine the cause

  Stage   Problem interrupts occurring
  IF      Page fault; misaligned memory access; memory-protection violation
  ID      Undefined or illegal opcode
  EX      Arithmetic exception
  MEM     Page fault; misaligned memory access; memory error; memory-protection violation

110 When to Serve?
  Who to serve first, if there are multiple interrupts at the same time?
  Multiple interrupts: use priority hardware to choose the earliest instruction to interrupt
  External interrupts: flexible in when to interrupt

111 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

112 Instruction Level Parallelism
  How to increase the potential amount of ILP:
    Increase the depth of the pipeline to overlap more instructions: super-pipeline
    Launch multiple instructions
      Static multiple issue (decision made by the compiler before execution)
      Dynamic multiple issue (decision made during execution by the processor)

113 Different Pipelined Designs
  Pipelining: instructions pass through IF, D, E, W, one starting per cycle. Limitation: issue rate, FU stalls, FU depth.
  Super-scalar: issue multiple scalar instructions per cycle. Limitation: hazard resolution.
  VLIW (EPIC): each instruction specifies multiple scalar operations (IF, D, then several E, W in parallel); the compiler determines parallelism. Limitation: packing.

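114 Static Multiple Issue
  Use the compiler to assist with packing instructions and handling hazards
    Very Long Instruction Word (VLIW)
    Explicitly Parallel Instruction Computer (EPIC) (Intel IA-64)

115 A Static Two-issue Datapath
  [Figure (Fig. 6.5): two-issue datapath with PC, instruction memory, registers, two sign-extend units, an ALU, a separate address adder, and data memory.]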

116 Dynamic Multiple Issue
  The hardware performs the scheduling:
    Hardware tries to find instructions to execute
    Out-of-order execution is possible
    Speculative execution and dynamic branch prediction
  Superscalar

117 Superscalar: Three Primary Units
  [Figure: instruction fetch and decode unit (in-order issue); reservation stations feeding the functional units (integer, integer, floating point, load/store; out-of-order execute); commit unit (in-order commit).]

118 Simple Superscalar
  Independent INT and FP issue to separate pipelines
  [Figure: I-cache, instruction issue and bypass, INT and FP register files, operand/result busses, INT unit, load/store unit, FP add, FP mul, D-cache.]

119 Dynamic Scheduling
  All modern processors are very complicated
    DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
    PowerPC and Pentium: branch history table
  Compiler technology is important

120 Summary
  Pipelines pass control information down the pipe just as data moves down the pipe
  Forwarding/stalls are handled by local control
  Exceptions stop the pipeline
  MIPS instruction set architecture made the pipeline visible (delayed branch, delayed load)
  More performance from deeper pipelines, parallelism
