1048: Computer Organization


Warren Reeves
5 years ago

1 1048: Computer Organization. Lecture 6: Pipelining. Lecture6 - pipelining (cwli@twins.ee.nctu.edu.tw) 6-1

2 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

3 Pipelining Is Natural! Laundry example: Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold. The washer takes 30 minutes, the dryer takes 40 minutes, and the folder takes 20 minutes.

4 Sequential Laundry (timeline from 6 PM to midnight; task order A, B, C, D): Sequential laundry takes 6 hours for 4 loads. If they learned pipelining, how long would it take?

5 Pipelined Laundry: Start ASAP (timeline from 6 PM to midnight; task order A, B, C, D; segments of 30, 40, 40, 40, 40, and 20 minutes): Pipelined laundry takes 3.5 hours for 4 loads.
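
The 6-hour and 3.5-hour figures can be checked with a small simulation. This is only a sketch: it takes the stage times as 30, 40, and 20 minutes, and the function names are made up for illustration. It also assumes a load may wait between machines until the next one is free.

```python
# Stage durations in minutes, taken from the laundry example.
STAGE_MINUTES = [30, 40, 20]  # washer, dryer, folder


def sequential_minutes(loads, stages=STAGE_MINUTES):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages)


def pipelined_minutes(loads, stages=STAGE_MINUTES):
    """A load enters a stage once it has left the previous stage
    and that stage has been released by the load ahead of it."""
    stage_free_at = [0] * len(stages)
    finish = 0
    for _ in range(loads):
        t = 0
        for i, duration in enumerate(stages):
            start = max(t, stage_free_at[i])
            t = start + duration
            stage_free_at[i] = t
        finish = t
    return finish
```

With four loads the dryer (the slowest stage) sets the pace, which is why the pipelined total is 30 + 4 x 40 + 20 minutes rather than 4 x 90.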

6 Pipelining Lessons: Pipelining doesn't help the latency of a single task, but it helps the throughput of the entire workload. The pipeline rate is limited by the slowest stage. Multiple tasks work at the same time using different resources. Potential speedup = number of pipe stages. Unbalanced stage lengths and the time to fill and drain the pipeline reduce the speedup. Stall for dependences.

7 Single-Cycle, Multi-Cycle, vs. Pipeline.
Single-cycle implementation: Load occupies Cycle 1; Store occupies Cycle 2, part of which is wasted.
Multiple-cycle implementation (Cycles 1 to 10): Load = Ifetch, Reg, Exec, Mem, Wr; Store = Ifetch, Reg, Exec, Mem; R-type = Ifetch, ...
Pipeline implementation: Load = Ifetch, Reg, Exec, Mem, Wr; Store = Ifetch, Reg, Exec, Mem, Wr; R-type = Ifetch, Reg, Exec, Mem, Wr, each starting one cycle after the previous instruction.

8 Pipelining MIPS Execution (Fig. 6.3). Each instruction passes through instruction fetch, Reg, ALU, data access, Reg. Without pipelining, lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) start 8 ns apart. With pipelining, every stage takes 2 ns and the same three instructions start 2 ns apart.
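
A quick arithmetic check of the timing comparison, as a sketch only: it assumes 8 ns per unpipelined instruction, five stages of 2 ns each, and no stalls. The function names are hypothetical.

```python
def unpipelined_ns(n_instructions, instr_ns=8):
    """Each instruction runs to completion before the next one starts."""
    return n_instructions * instr_ns


def pipelined_ns(n_instructions, stages=5, stage_ns=2):
    """The first instruction needs `stages` cycles; every later one
    completes one cycle after its predecessor (no stalls assumed)."""
    if n_instructions == 0:
        return 0
    return (stages + n_instructions - 1) * stage_ns
```

For three loads this gives 24 ns against 14 ns. For long instruction streams the ratio approaches 8 / 2 = 4 rather than the stage count of 5, because the stages are not balanced.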

9 Why Pipeline? Because the resources are there! (Diagram: Inst 0 to Inst 4 in instruction order against time in clock cycles; each instruction flows through Im, Reg, ALU, Dm, Reg of the single-cycle datapath, so every unit is busy with a different instruction.)

10 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

11 Designing a Pipelined Processor: Examine the datapath and control diagram. Starting with the single- or multi-cycle datapath? Single- or multi-cycle control? Partition the datapath into stages: IF (instruction fetch), ID (instruction decode and register file read), EX (execution or address calculation), MEM (data memory access), WB (write back). Associate resources with stages. Ensure that flows do not conflict, or figure out how to resolve them. Assert control in the appropriate stage.

12 Use Multicycle Execution Steps
Instruction fetch (all instructions): IR = Memory[PC]; PC = PC + 4
Instruction decode/register fetch (all instructions): A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Execution, address computation, branch/jump completion: R-type: ALUOut = A op B. Memory reference: ALUOut = A + sign-extend(IR[15-0]). Branch: if (A == B) then PC = ALUOut. Jump: PC = PC[31-28] || (IR[25-0] << 2)
Memory access or R-type completion: R-type: Reg[IR[15-11]] = ALUOut. Load: MDR = Memory[ALUOut]. Store: Memory[ALUOut] = B
Memory read completion: Load: Reg[IR[20-16]] = MDR
But, use the single-cycle datapath...

13 Split Single-Cycle Datapath (Fig. 6.9): IF: instruction fetch; ID: instruction decode/register file read; EX: execute/address calculation; MEM: memory access; WB: write back. (Diagram: the single-cycle datapath divided into these five stages, with the feedback paths marked.)

14 Pipeline Registers: pipeline registers (latches) IF/ID, ID/EX, EX/MEM, and MEM/WB separate the stages. (Diagram: the datapath with the four pipeline registers inserted.)

15 Consider load (Cycles 1 to 5: Ifetch, Reg/Dec, Exec, Mem, Wr). IF: instruction fetch; fetch the instruction from the instruction memory. ID: instruction decode; register fetch and instruction decode. EX: calculate the memory address. MEM: read the data from the data memory. WB: write the data back to the register file.

16 Pipelining load (Cycles 1 to 7): the 1st, 2nd, and 3rd lw each go through Ifetch, Reg/Dec, Exec, Mem, Wr, each starting one cycle after the previous one. The 5 functional units in the pipeline datapath are: instruction memory for the Ifetch stage; register file's read ports (busA and busB) for the Reg/Dec stage; ALU for the Exec stage; data memory for the MEM stage; register file's write port (busW) for the WB stage.

17 IF Stage of load: IR = mem[PC]; PC = PC + 4. (Fig. 6.2: pipelined datapath with lw in instruction fetch; IR and PC+4 go into IF/ID.)

18 ID Stage of load: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-ext(IR[15-0]) << 2) (some ops moved to the next stage). (Fig. 6.2: pipelined datapath with lw in instruction decode.)

19 EX Stage of load: ALUOut = A + sign-ext(IR[15-0]). (Fig. 6.3: pipelined datapath with lw in execution.)

20 MEM Stage of load: MDR = mem[ALUOut]. (Fig. 6.: pipelined datapath with lw in memory access.)

21 WB Stage of load: Reg[IR[20-16]] = MDR. Who will supply this address? (Fig. 6.: pipelined datapath with lw in write back.)

22 The Four Stages of R-type (Cycles 1 to 4: Ifetch, Reg/Dec, Exec, Wr). IF: fetch the instruction from the instruction memory. ID: register fetch and instruction decode. EX: the ALU operates on the two register operands. WB: write the ALU output back to the register file.

23 Pipelining R-type and load (Cycles 1 to 9): the sequence R-type, R-type, Load, R-type, R-type issues one instruction per cycle; R-type uses Ifetch, Reg/Dec, Exec, Wr and Load uses Ifetch, Reg/Dec, Exec, Mem, Wr. Oops! We have a problem! We have a structural hazard: two instructions try to write to the register file at the same time, and there is only one write port.

24 Important Observation: Each functional unit can be used only once per instruction. Each functional unit must be used at the same stage for all instructions: Load uses the register file's write port during its 5th stage (Ifetch, Reg/Dec, Exec, Mem, Wr); R-type uses the register file's write port during its 4th stage (Ifetch, Reg/Dec, Exec, Wr). Several ways to solve: forwarding, adding a pipeline bubble, making instructions the same length.

25 Solution: Delay R-type's Write. Delay R-type's register write by one cycle: R-type also uses the register file's write port at stage 5, and MEM is a NOP stage in which nothing is done. R-type then also has 5 stages (Ifetch, Reg/Dec, Exec, Mem, Wr), so in the sequence R-type, R-type, Load, R-type, R-type (Cycles 1 to 9) each instruction writes the register file in a different cycle.
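
The write-port conflict and its fix can be checked mechanically. The sketch below assumes one instruction issues per cycle starting at cycle 1 and that only the WB stage uses the write port; the names are illustrative and not from the slides.

```python
from collections import defaultdict

R_TYPE_4 = ["IF", "ID", "EX", "WB"]          # R-type with four stages
R_TYPE_5 = ["IF", "ID", "EX", "MEM", "WB"]   # R-type padded with a NOP MEM stage
LOAD = ["IF", "ID", "EX", "MEM", "WB"]


def write_port_conflicts(program):
    """program: one stage list per instruction, issued one per cycle.
    Returns {cycle: [instruction indices]} for every cycle in which
    more than one instruction is in WB."""
    users = defaultdict(list)
    for index, stages in enumerate(program):
        issue_cycle = index + 1
        wb_cycle = issue_cycle + stages.index("WB")
        users[wb_cycle].append(index)
    return {cycle: who for cycle, who in users.items() if len(who) > 1}
```

For the sequence R-type, R-type, Load, R-type, R-type the four-stage variant reports a collision in cycle 7 between the load and the R-type behind it; with five-stage R-types the result is empty.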

26 The Four Stages of store (Cycles 1 to 4; Store: Ifetch, Reg/Dec, Exec, Mem, Wr). IF: fetch the instruction from the instruction memory. ID: register fetch and instruction decode. EX: calculate the memory address. MEM: write the data into the data memory. Add an extra stage: WB: NOP.

27 The Three Stages of beq (Beq: Ifetch, Reg/Dec, Exec, Mem, Wr). IF: fetch the instruction from the instruction memory. ID: register fetch and instruction decode. EX: compare the two register operands, select the correct branch target address, and latch it into the PC. Add two extra stages: MEM: NOP; WB: NOP.

28 Pipelined Datapath (Fig. 6.7: the complete pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.)

29 Graphically Representing Pipelines. (Diagram: time in clock cycles CC 1 to CC 6 against program execution order; lw $10, 20($1) and sub $11, $2, $3 each pass through IM, Reg, ALU, DM, Reg.) Can help with answering questions like: How many cycles does it take to execute this code? What is the ALU doing during cycle 4? Helps in understanding datapaths.

30 Example 1: Cycle 1 (Fig. 6.8). lw $10, 20($1): instruction fetch.

31 Example 1: Cycle 2 (Fig. 6.8). sub $11, $2, $3: instruction fetch; lw $10, 20($1): instruction decode.

32 Example 1: Cycle 3 (Fig. 6.8). sub $11, $2, $3: instruction decode; lw $10, 20($1): execution.

33 Example 1: Cycle 4 (Fig. 6.8). sub $11, $2, $3: execution; lw $10, 20($1): memory.

34 Example 1: Cycle 5 (Fig. 6.8). sub $11, $2, $3: memory; lw $10, 20($1): write back.

35 Example 1: Cycle 6 (Fig. 6.8). sub $11, $2, $3: write back.

36 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

37 Control Signals (figure: pipelined datapath with the control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg marked).

38 Group Signals According to Stages. Can use the control signals of the single-cycle CPU (Fig. 6.23, 6.2 <==> 5.2, 5.6). Execution/address calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc. Memory access stage control lines: Branch, MemRead, MemWrite. Write-back stage control lines: RegWrite, MemtoReg. (Table: the value of each control line per instruction, with X marking don't-care entries.)

39 Data Stationary Control: Pass the control signals along just like the data. Main control generates the control signals during ID. (Figure: the control unit writes the EX, M, and WB control fields into ID/EX; the M and WB fields continue into EX/MEM, and the WB field into MEM/WB.)

40 Data Stationary Control: Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later. Signals for MEM (MemWr, Branch) are used 2 cycles later. Signals for WB (MemtoReg, RegWr) are used 3 cycles later. (Diagram: main control in ID produces ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, and RegWr; the ID/EX register carries all of them; the EX/MEM register carries MemWr, Branch, MemtoReg, and RegWr; the MEM/WB register carries MemtoReg and RegWr.)
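
One way to picture data-stationary control is as a control word that is produced once in ID and then shifted through the pipeline registers, losing the fields each stage has consumed. The sketch below is illustrative only: the dictionary layout and function names are made up, and the bit values are assumed from the usual single-cycle MIPS control table.

```python
# Control fields per instruction, grouped by the stage that consumes them.
# Bit values are an assumption based on the standard MIPS control table.
CONTROL = {
    "lw":  {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
            "M":  {"Branch": 0, "MemRead": 1, "MemWrite": 0},
            "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "add": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
            "M":  {"Branch": 0, "MemRead": 0, "MemWrite": 0},
            "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "nop": {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 0},
            "M":  {"Branch": 0, "MemRead": 0, "MemWrite": 0},
            "WB": {"RegWrite": 0, "MemtoReg": 0}},
}


def empty_pipeline():
    """Pipeline registers holding only deasserted control."""
    nop = CONTROL["nop"]
    return {
        "ID/EX": nop,
        "EX/MEM": {"M": nop["M"], "WB": nop["WB"]},
        "MEM/WB": {"WB": nop["WB"]},
    }


def clock(regs, opcode_in_id):
    """One clock edge: ID writes a fresh control word into ID/EX and
    the remaining fields move one pipeline register to the right."""
    return {
        "ID/EX": CONTROL[opcode_in_id],
        "EX/MEM": {"M": regs["ID/EX"]["M"], "WB": regs["ID/EX"]["WB"]},
        "MEM/WB": {"WB": regs["EX/MEM"]["WB"]},
    }
```

After decoding a lw, its EX fields sit in ID/EX after one clock, its M fields in EX/MEM after two, and its WB fields in MEM/WB after three, matching the 1, 2, and 3 cycle delays on the slide.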

41 Datapath with Control (figure: pipelined datapath with the control unit and the EX, M, and WB control fields carried in the pipeline registers). Remember that? Who will supply this address?

42 Let's Try It Out: lw $10, 20($1); sub $11, $2, $3; and $12, $4, $5; or $13, $6, $7; add $14, $8, $9

43 Example 2: Cycle 1. IF: lw $10, 20($1); ID: before<1>; EX: before<2>; MEM: before<3>; WB: before<4>.

44 Example 2: Cycle 2. IF: sub $11, $2, $3; ID: lw $10, 20($1); EX: before<1>; MEM: before<2>; WB: before<3>.

45 Example 2: Cycle 3. IF: and $12, $4, $5; ID: sub $11, $2, $3; EX: lw $10, ...; MEM: before<1>; WB: before<2>.

46 Example 2: Cycle 4. IF: or $13, $6, $7; ID: and $12, $4, $5; EX: sub $11, ...; MEM: lw $10, ...; WB: before<1>.

47 Example 2: Cycle 5. IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $12, ...; MEM: sub $11, ...; WB: lw $10, ....

48 Example 2: Cycle 6. IF: after<1>; ID: add $14, $8, $9; EX: or $13, ...; MEM: and $12, ...; WB: sub $11, ....

49 Example 2: Cycle 7. IF: after<2>; ID: after<1>; EX: add $14, ...; MEM: or $13, ...; WB: and $12, ....

50 Example 2: Cycle 8. IF: after<3>; ID: after<2>; EX: after<1>; MEM: add $14, ...; WB: or $13, ....

51 Example 2: Cycle 9. IF: after<4>; ID: after<3>; EX: after<2>; MEM: after<1>; WB: add $14, ....
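
The cycle-by-cycle snapshots of Example 2 follow a simple rule when nothing stalls: the instruction at position i (counting from 0) is in stage s during cycle i + s + 1. A minimal sketch, with the register numbers of the five instructions taken as lw $10, sub $11, and $12, or $13, add $14, and with hypothetical names:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Return {stage: instruction or None} for a 1-based clock cycle,
    assuming one instruction enters IF per cycle and nothing stalls."""
    snapshot = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth
        snapshot[stage] = program[index] if 0 <= index < len(program) else None
    return snapshot


def total_cycles(program):
    """Cycles until the last instruction leaves WB."""
    return len(program) + len(STAGES) - 1
```

`None` stands for the before<n> and after<n> placeholders used on the slides. Five instructions need 5 + 4 = 9 cycles, which is why the example ends at Cycle 9.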

52 Summary of Pipeline Basics: Pipelining is a fundamental concept: multiple steps using distinct resources. Utilize the capabilities of the datapath by pipelined instruction processing: start the next instruction while working on the current one; limited by the length of the longest stage (plus fill/flush); need to detect and resolve hazards. What makes it easy in MIPS? All instructions are of the same length; just a few instruction formats; memory operands appear only in loads and stores. What makes pipelining hard? Pipeline hazards.

53 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding (R-type and R-type); Data hazards and stalls (load and R-type); Branch hazards; Exceptions; Superscalar and dynamic pipelining

54 Pipeline Hazards: Structural hazards: attempt to use the same resource in two different ways at the same time. Data hazards: attempt to use an item before it is ready; an instruction depends on the result of a prior instruction still in the pipeline. Control hazards: attempt to make a decision before the condition is evaluated (branch instructions). Hazards can always be resolved by waiting: pipeline control must detect the hazard and take action (or delay action) to resolve it.

55 Structural Hazard: Single Memory. (Diagram: Load followed by Instr 1 to Instr 4 over time; with a single memory, the data access of Load and the instruction fetch of a later instruction need the memory in the same cycle.) Use 2 memories: data memory and instruction memory.

56 Data Hazards (figure: time in clock cycles CC 1 to CC 9, with the value of register $2 changing in CC 5). Program execution order: sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2). Each instruction passes through IM, Reg, ALU, DM, Reg.

57 Types of Data Hazards. Three types (instruction i1 followed by instruction i2):
RAW (read after write): i2 tries to read an operand before i1 writes it.
WAR (write after read): i2 tries to write an operand before i1 reads it, so i1 gets the wrong operand (e.g., with autoincrement addressing). Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages, reads are always in stage 2, and writes are always in stage 5.
WAW (write after write): i2 tries to write an operand before i1 writes it, leaving the wrong result (i1's, not i2's); occurs only in pipelines that write in more than one stage. Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages and writes are always in stage 5.
RAR?
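
The three definitions can be stated as set intersections over the registers each instruction reads and writes. The sketch below only identifies the dependences; whether one becomes an actual hazard depends on the pipeline, as the slide notes for WAR and WAW. The dictionary layout and function name are illustrative.

```python
def classify(first, second):
    """first precedes second in program order. Each is a dict with
    'reads' and 'writes' sets of register names. Returns the set of
    dependence types between the two instructions."""
    found = set()
    if first["writes"] & second["reads"]:
        found.add("RAW")   # second reads what first writes
    if first["reads"] & second["writes"]:
        found.add("WAR")   # second overwrites what first still has to read
    if first["writes"] & second["writes"]:
        found.add("WAW")   # both write the same register
    return found


# sub $2, $1, $3 followed by and $12, $2, $5
sub_instr = {"reads": {"$1", "$3"}, "writes": {"$2"}}
and_instr = {"reads": {"$2", "$5"}, "writes": {"$12"}}
```

Two instructions that only share source registers produce an empty set, which is one way to read the "RAR?" question on the slide.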

58 Pipeline Hazards Illustrated (diagram of overlapping IF, ID, EX, MEM, WB sequences for each case): structural hazard; RAW (read after write) data hazard; WAW (write after write) data hazard; WAR (write after read) data hazard.

59 Handling Data Hazards. Use simple, fixed designs: eliminate WAR by always fetching operands early (ID) in the pipeline; eliminate WAW by doing all write backs in order (last stage, static). These features have a lot to do with ISA design. Internal forwarding in the register file: write in the first half of the clock cycle and read in the second half, so a read delivers what is written; this resolves the hazard between sub and add. Detect and resolve the remaining ones: the compiler inserts NOPs, forward, or stall.

60 Software Solution: Have the compiler guarantee no hazards. Where do we insert the NOPs? sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2). Problem: this really slows us down!

61 Data Hazards: insert two nops after sub $2, $1, $3. (Figure: time in clock cycles CC 1 to CC 9 with the value of register $2; program execution order sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2).)
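
Where the NOPs go can be worked out mechanically. The sketch below is not a real compiler pass: it assumes the 5-stage pipeline without forwarding, with the register file written in the first half of a cycle and read in the second half, so a reader must sit at least 3 instruction slots after the writer. The tuple layout and function name are hypothetical.

```python
def insert_nops(program, gap=3):
    """program: list of (text, dest_register_or_None, source_registers).
    Returns the instruction texts with "nop" entries added so that every
    reader is at least `gap` slots after the most recent writer."""
    out = []
    last_write = {}  # register -> index in `out` of its latest writer
    for text, dest, sources in program:
        needed = 0
        for reg in sources:
            if reg in last_write:
                needed = max(needed, last_write[reg] + gap - len(out))
        out.extend(["nop"] * needed)
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


HAZARD_SEQUENCE = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]
```

For the sequence on the slide this places exactly two NOPs between sub and and; the later readers of $2 are already far enough away.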

62 Data Hazards: Forwarding. (Figure: time in clock cycles CC 1 to CC 9 with the value of register $2; program execution order sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2).)

63 Pipeline with Forwarding (figure: datapath with a forwarding unit; IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are carried into ID/EX as Rs, Rt, and Rd; the forwarding unit compares Rs and Rt with EX/MEM.RegisterRd and MEM/WB.RegisterRd and drives ForwardA and ForwardB).

64 Detecting Data Hazards. Hazard conditions:
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
Two optimizations: Don't forward if the instruction does not write a register => check whether RegWrite is asserted. Don't forward if the destination register is $0 => check whether RegisterRd = 0.

65 Detecting Data Hazards (cont.). Hazard conditions using control signals: At EX stage: EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs). At MEM stage: MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (MEM/WB.RegRd = ID/EX.RegRs). (Replace ID/EX.RegRs with ID/EX.RegRt for the other two conditions.)

66 Resolving Hazards: Forwarding. Use temporary results, e.g., those in pipeline registers; don't wait for them to be written. (Figure: CC 1 to CC 9 for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2); the rows "value of EX/MEM" and "value of MEM/WB" show the result of sub in CC 4 and CC 5 respectively and X elsewhere.)

67 Pipeline with Forwarding (figure: datapath with control; IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are carried as Rs, Rt, and Rd; the forwarding unit takes EX/MEM.RegisterRd and MEM/WB.RegisterRd and drives ForwardA and ForwardB at the ALU inputs).

68 Forwarding Logic. Forwarding: input to the ALU from any pipeline register; add multiplexors to the ALU inputs. Control forwarding in EX => carry Rs in ID/EX. Control signals for forwarding: if both WB and MEM forward, e.g., add $1,$1,$2; add $1,$1,$3; add $1,$1,$4; => let MEM forward.
EX hazard: if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs)) ForwardA = 10
MEM hazard: if (MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (EX/MEM.RegRd ≠ ID/EX.RegRs) and (MEM/WB.RegRd = ID/EX.RegRs)) ForwardA = 01
(For the second operand, use ID/EX.RegRt in place of ID/EX.RegRs and ForwardB in place of ForwardA.)
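
The forwarding conditions translate almost directly into code. The sketch below is a software model, not the hardware: the dictionary fields and the function name are made up, and checking the EX hazard first stands in for the slide's (EX/MEM.RegRd ≠ ID/EX.RegRs) term. The two formulations differ only when the EX/MEM instruction names the register but does not write it.

```python
def forward_select(source_reg, ex_mem, mem_wb):
    """Return the 2-bit mux select for one ALU operand.
    source_reg: ID/EX.RegisterRs (for ForwardA) or ID/EX.RegisterRt (for ForwardB).
    ex_mem, mem_wb: dicts with 'RegWrite' (bool) and 'Rd' (register number)."""
    # EX hazard: the newest result is in EX/MEM and takes priority.
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == source_reg:
        return 0b10
    # MEM hazard: fall back to the older result in MEM/WB.
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == source_reg:
        return 0b01
    # No hazard: use the value read from the register file.
    return 0b00
```

For add $1,$1,$2; add $1,$1,$3; add $1,$1,$4 with the third add in EX, both pipeline registers hold a pending write to $1, and the function picks 10 so that the most recent value is used.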

69 Example 3: Cycle 3. IF: or $4, $4, $2; ID: and $4, $2, $5; EX: sub $2, $1, $3; MEM: before<1>; WB: before<2>.

70 Example 3: Cycle 4 (Fig. 6.). IF: add $9, $4, $2; ID: or $4, $4, $2; EX: and $4, $2, $5; MEM: sub $2, ...; WB: before<1>.

71 Example 3: Cycle 5. IF: after<1>; ID: add $9, $4, $2; EX: or $4, $4, $2; MEM: and $4, ...; WB: sub $2, ....

72 Example 3: Cycle 6. IF: after<2>; ID: after<1>; EX: add $9, $4, $2; MEM: or $4, ...; WB: and $4, ....

73 Can't Always Forward (Fig. 6.3): lw can still cause a hazard if it is followed by an instruction that reads the loaded register. Use stalling or the compiler to resolve. Program execution order: lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7.

74 Stalling: Stall the pipeline by keeping instructions in the same stage and inserting a NOP instead. (Figure: lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7, with a bubble inserted after lw; and repeats its register read and or repeats its instruction fetch.)

75 Pipeline with Stalling Unit: Forwarding controls the ALU inputs; hazard detection controls the PC, IF/ID, and the control signals. (Figure: datapath with a hazard detection unit that takes ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt and drives PCWrite and IF/IDWrite, next to the forwarding unit.)

76 Handling Stalls. The hazard detection unit in ID inserts a stall between a load instruction and its use: if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))) stall the pipeline for one cycle (ID/EX.MemRead = 1 indicates a load instruction). How to stall? Stall the instructions in IF and ID: do not change PC and IF/ID => the stages re-execute the instructions. What to move into EX: insert a NOP by changing the EX, MEM, and WB control fields of the ID/EX pipeline register to 0; as the control signals propagate, all control signals to EX, MEM, and WB are deasserted and no registers or memories are written.
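
The load-use test and its three effects (freeze the PC, freeze IF/ID, zero the control fields) fit in a few lines. This is a sketch of the logic only; the signal names PCWrite and IF/IDWrite and the dictionary layout are assumptions made for the illustration.

```python
def load_use_hazard(id_ex, if_id):
    """id_ex: dict with 'MemRead' and 'Rt' for the instruction in EX.
    if_id: dict with 'Rs' and 'Rt' for the instruction in ID."""
    return bool(id_ex["MemRead"]) and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"])


def hazard_detection_unit(id_ex, if_id):
    """Return the signals that implement a one-cycle stall."""
    stall = load_use_hazard(id_ex, if_id)
    return {
        "PCWrite": not stall,      # hold the PC so IF repeats
        "IF/IDWrite": not stall,   # hold IF/ID so ID repeats
        "ZeroControl": stall,      # put a NOP (all-zero control) into ID/EX
    }
```

With lw $2, 20($1) in EX and and $4, $2, $5 in ID the unit stalls. One cycle later the bubble in EX has MemRead = 0, so the and proceeds and picks up $2 through forwarding.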

77 Example 4: Cycle 2. IF: and $4, $2, $5; ID: lw $2, 20($1); EX: before<1>; MEM: before<2>; WB: before<3>.

78 Example 4: Cycle 3. IF: or $4, $4, $2; ID: and $4, $2, $5; EX: lw $2, 20($1); MEM: before<1>; WB: before<2>.

79 Example 4: Cycle 4. IF: or $4, $4, $2; ID: and $4, $2, $5; EX: bubble; MEM: lw $2, ...; WB: before<1>.

80 Example 4: Cycle 5. IF: add $9, $4, $2; ID: or $4, $4, $2; EX: and $4, $2, $5; MEM: bubble; WB: lw $2, ....

81 Example 4: Cycle 6 (Fig. 6.9). IF: after<1>; ID: add $9, $4, $2; EX: or $4, $4, $2; MEM: and $4, ...; WB: bubble.

82 Example 4: Cycle 7. IF: after<2>; ID: after<1>; EX: add $9, $4, $2; MEM: or $4, ...; WB: and $4, ....

83 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

84 Pipeline Datapath with Control Signals (figure: pipelined datapath with PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg marked).

85 Branch Hazards: When we decide to branch, other instructions are already in the pipeline! (Figure: program execution order with instruction addresses: 40 beq $1, $3, 7; 44 and $12, $2, $5; 48 or $13, $6, $2; 52 add $14, $2, $2; 72 lw $4, 50($7).)
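
The addresses in the branch example are tied together by the MIPS branch-target rule, target = PC + 4 + (offset << 2). The sketch below assumes the beq sits at address 40, so that the instructions behind it sit at 44, 48, and 52; the parameter `fetched_before_decision` is a hypothetical knob for how many sequential instructions enter the pipeline before the branch resolves.

```python
def branch_target(branch_pc, offset_words):
    """Target of a taken MIPS branch: the word offset is relative
    to the instruction after the branch."""
    return branch_pc + 4 + (offset_words << 2)


def wrongly_fetched(branch_pc, fetched_before_decision):
    """Addresses fetched sequentially after the branch while it is
    still being resolved; they must be discarded if it is taken."""
    return [branch_pc + 4 * i for i in range(1, fetched_before_decision + 1)]
```

With beq at 40 and offset 7 the target is 72, the address of the lw. If three instructions are fetched before the decision, they are the ones at 44, 48, and 52.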

86 Pipeline Hazards Illustrated (diagram of overlapping IF, ID, EX, MEM, WB sequences for each case): structural hazard; RAW (read after write) data hazard; WAW (write after write) data hazard; WAR (write after read) data hazard; control hazard.

87 Handling Branch Hazard. Predict branch always not taken: need to add hardware for flushing instructions if the prediction is wrong. Branch decision made at MEM => need to flush the instructions in IF/ID and ID/EX by changing the control values to 0. Reduce the delay of a taken branch by moving branch execution earlier in the pipeline: move up the branch address calculation to ID; check branch equality at ID (using XOR) by comparing the two registers read during ID. Branch decision made at ID => one instruction to flush; add a control signal, IF.Flush, to zero the instruction field of IF/ID => making the instruction a NOP. Dynamic branch prediction. Compiler rescheduling, delayed branch.

88 Pipeline with Flushing (Fig. 6.: datapath with the hazard detection unit, the forwarding unit, the IF.Flush signal into IF/ID, and the sign extend, shift left 2, and register comparator (=) used for the branch in the ID stage).

89 Example 5: Cycle 3. IF: and $12, $2, $5; ID: beq $1, $3, 7; EX: sub $10, $4, $8; MEM: before<1>; WB: before<2>. (Figure: the ID stage compares $1 and $3 and computes the branch target 72.)

90 Example 5: Cycle 4. IF: lw $4, 50($7); ID: bubble (nop); EX: beq $1, $3, 7; MEM: sub $10, ...; WB: before<1>.

91 Delayed Branch: Predict-not-taken + branch decision at ID => the following instruction is always executed => branches take effect 1 cycle later. 0 clock cycle penalty per branch instruction if an instruction can be found to put in the slot (about 50% of the time). (Figure: add, beq, misc, lw in instruction order against time in clock cycles.)

92 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

93 Handling Exceptions: How to stop the pipeline? How to restart? Suppose overflow occurs at add $1,$2,$1. Disable writes of instructions until the trap hits, e.g., flush the following instructions using IF.Flush, ID.Flush, and EX.Flush to cause multiplexors to zero the control signals (overflow exception detected at EX => flush the offending instruction). Force a trap instruction into IF, e.g., fetch from 40000040hex by adding 40000040hex to the PC input MUX. Save the address of the offending instruction in EPC.

94 Pipeline with Exception (Fig. 6.55: datapath with the hazard detection unit, the forwarding unit, the IF.Flush, ID.Flush, and EX.Flush signals, the Cause register, and the Except PC register).

95 Handling Exceptions: 5 instructions are executing in the 5-stage pipeline. Who caused the exception? Need to know in which stage an exception can occur => helps determine the cause.
Stage: problem interrupts occurring
IF: page fault; misaligned memory access; memory-protection violation
ID: undefined or illegal opcode
EX: arithmetic exception
MEM: page fault; misaligned memory access; memory error; memory-protection violation

96 Handling Exceptions: Who to serve first if multiple interrupts occur at the same time? Multiple interrupts: use priority hardware to choose the earliest instruction to interrupt. External interrupts: flexible in when to interrupt.

97 Outline: An overview of pipelining; A pipelined datapath; Pipelined control; Data hazards and forwarding; Data hazards and stalls; Branch hazards; Exceptions; Superscalar and dynamic pipelining

98 Instruction-Level Parallelism (ILP). How to increase the potential amount of ILP: Increase the depth of the pipeline to overlap more instructions (super-pipeline). Launch multiple instructions: static multiple issue (decisions made by the compiler before execution); dynamic multiple issue (decisions made during execution by the processor).

99 Different Pipelined Designs (each with a timing diagram of overlapping IF, D, E, W sequences):
Pipelining. Limitation: issue rate, FU stalls, FU depth.
Super-pipeline: issue one instruction per (fast) cycle; ALU takes multiple cycles. Limitation: clock skew, FU stalls, FU depth.
Super-scalar: issue multiple scalar instructions per cycle. Limitation: hazard resolution.
VLIW (EPIC): each instruction specifies multiple scalar operations; the compiler determines parallelism. Limitation: packing.
Vector operations: each instruction specifies a series of identical operations. Limitation: applicability.

100 Static Multiple Issue: Use the compiler to assist with packing instructions and handling hazards. Very Long Instruction Word (VLIW). Explicitly Parallel Instruction Computer (EPIC) (Intel IA-64).

101 A Static Two-Issue Datapath (Fig. 6.5: datapath with PC, instruction memory, registers, two sign-extend units, one ALU, a second ALU for address calculation, and data memory).

102 Dynamic Multiple Issue (superscalar): The hardware performs the scheduling: the hardware tries to find instructions to execute; out-of-order execution is possible; speculative execution and dynamic branch prediction.

103 Superscalar: Three Primary Units (figure): an instruction fetch and decode unit (in-order issue); reservation stations feeding the functional units (integer, integer, floating point, load/store), which execute out of order; a commit unit (in-order commit).

104 Simple Superscalar: Independent INT and FP issue to separate pipelines. (Figure: I-cache; instruction issue and bypass; INT and FP register files; operand/result busses; INT unit, load/store unit, FP add, FP mul; D-cache.)

105 Dynamic Scheduling: All modern processors are very complicated. DEC Alpha 21264: 9-stage pipeline, 6-instruction issue. PowerPC and Pentium: branch history table. Compiler technology is important.

106 Summary: Pipelines pass control information down the pipe just as data moves down the pipe. Forwarding and stalls are handled by local control. Exceptions stop the pipeline. The MIPS instruction set architecture made the pipeline visible (delayed branch, delayed load). More performance comes from deeper pipelines and parallelism.
