Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

1 CS359: Computer Architecture
Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts
Yanyan Shen
Department of Computer Science and Engineering
Shanghai Jiao Tong University

2 Outline
Introduction to Pipelining
How Pipeline is Implemented
C.3 Pipeline Hazards
Exceptions
Handling Multicycle Operations

3 Hazards
It would be nice if we could split the datapath into stages and the CPU just worked.
But things are not as simple as you might expect: there are hazards!
A hazard is a situation that prevents starting the next instruction in the next cycle.
  Structural hazard: conflict over the use of a resource at the same time
  Data hazard: data is not ready for the subsequent dependent instruction
  Control hazard: fetching the next instruction depends on the previous branch outcome

4 Structural Hazards
A structural hazard is a conflict over the use of a resource at the same time.
Suppose a MIPS CPU with a single memory:
  Load/store requires data access in the MEM stage
  Instruction fetch requires instruction access from the same memory
  Instruction fetch would have to stall for that cycle, which causes a pipeline bubble
Hence, pipelined datapaths require either separate ports to memory or separate memories for instructions and data.
[Figure: MIPS CPU and memory connected by address and data buses, with one bus pair versus two]

5 Structural Hazards (Cont.)
  Time -->
  lw     IF  ID  EX  MEM WB
  add        IF  ID  EX  MEM WB
  sub            IF  ID  EX  MEM WB
  add                IF  ID  EX  MEM WB
Either provide separate ports to access memory, or provide separate instruction and data memories.

6 Data Hazards
Data is not ready for the subsequent dependent instruction.
  add $s0,$t0,$t1    IF ID EX MEM WB
  sub $t2,$s0,$t3    IF ID EX MEM WB, delayed by two bubbles
To solve the data hazard problem, the pipeline needs to be stalled (the stall is typically referred to as a "bubble"), which penalizes performance.
A better solution? Forwarding (or bypassing).

7 Forwarding
  add $s0,$t0,$t1    IF ID EX MEM WB
  sub $t2,$s0,$t3    IF ID Bubble Bubble EX MEM WB

8 Data Hazard - Load-Use Case
Can't always avoid stalls by forwarding: can't forward backward in time!
A hardware interlock is needed to stall the pipeline.
  lw  $s0, 8($t1)    IF ID EX MEM WB
  sub $t2,$s0,$t3    IF ID Bubble EX MEM WB
This bubble can be hidden by proper instruction scheduling.

9 Code Scheduling to Avoid Stalls
Reorder code to avoid using a load result in the next instruction.
  A = B + E; // B is loaded to $t1, E is loaded to $t2
  C = B + F; // F is loaded to $t4
Original order (13 cycles):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  (stall)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  lw  $t4, 8($t0)
  (stall)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)
Reordered (11 cycles):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  lw  $t4, 8($t0)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)
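The 13-cycle and 11-cycle figures above can be reproduced with a small counting model. This is only a sketch under stated assumptions: a 5-stage pipeline with full forwarding, exactly one bubble whenever an instruction reads the register loaded by the immediately preceding lw, and no other stalls. The names `parse` and `count_cycles` are made up for this illustration.

```python
import re

def parse(line):
    """Split a MIPS line into (op, destination register, source registers)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                 # sw rt, off(base): writes no register
        return op, None, regs
    # lw rt, off(base) and R-type ops: the first register is the destination
    return op, regs[0], regs[1:]

def count_cycles(program, stages=5):
    """Cycles = instructions + pipeline fill + load-use bubbles (assumed model)."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, sources = parse(line)
        if prev_op == "lw" and prev_dest in sources:
            stalls += 1            # load result is needed one cycle too early
        prev_op, prev_dest = op, dest
    return len(program) + (stages - 1) + stalls

ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]

print(count_cycles(ORIGINAL), count_cycles(REORDERED))
```

Seven instructions need 7 + 4 cycles to drain a 5-stage pipeline; the original order adds two load-use bubbles, the reordered one adds none.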

10 Data Hazard - Forwarding
Don't wait for results to be written to the register file; use the temporary results.
[Figure: pipeline diagram over CC 1 to CC 9 showing the value of register $2, of EX/MEM, and of MEM/WB for the sequence below]
  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
OK, then do we always have to forward?
1. If the write to the register file occurs in the first half of the clock cycle and the read occurs in the second half, then?

11 Forwarding
[Figure: datapath with the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, ALU, data memory, and MUX]

12 Forwarding (from EX/MEM)
[Figure: the same datapath with MUXes at the ALU inputs, forwarding from EX/MEM]

13 Forwarding (from MEM/WB)
[Figure: the same datapath with MUXes at the ALU inputs, forwarding from MEM/WB]

14 Forwarding (operand selection)
[Figure: the same datapath with a forwarding unit controlling the ALU input MUXes]

15 Forwarding (operand propagation)
[Figure: the same datapath with Rs, Rt, and Rd carried through ID/EX; the forwarding unit receives Rs, Rt, EX/MEM Rd, and MEM/WB Rd]

16 Review: The MIPS Instruction Formats
All MIPS instructions are 32 bits long. The three instruction formats:
  R-type   op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
  I-type   op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
  J-type   op (6 bits) | target address (26 bits)
The different fields are:
  op: operation of the instruction
  rs, rt, rd: the source and destination register specifiers
  shamt: shift amount
  funct: selects the variant of the operation in the op field
  address / immediate: address offset or immediate value
  target address: target address of the jump instruction
EI209 Chapter 4A.16, CSE, SJTU, 2013
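The field widths above translate directly into shifts and masks. The following is a minimal sketch, not a full decoder: it assumes opcode 0 marks R-type and opcodes 2 and 3 (j, jal) mark J-type, treats everything else as I-type, and returns the immediate as a raw 16-bit value without sign extension. The function name `decode` is made up for this illustration.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its format fields."""
    op = (word >> 26) & 0x3F                 # top 6 bits
    if op == 0:                              # R-type: op rs rt rd shamt funct
        return {
            "format": "R", "op": op,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if op in (2, 3):                         # J-type: op target
        return {"format": "J", "op": op, "target": word & 0x3FFFFFF}
    return {                                 # I-type: op rs rt immediate
        "format": "I", "op": op,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,          # raw bits, not sign-extended
    }

# add $s0,$t0,$t1 encodes as 0x01098020 ($t0 = 8, $t1 = 9, $s0 = 16)
print(decode(0x01098020))
```

The rs and rt fields extracted here are the register numbers that the forwarding and hazard detection logic compare.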

17 Forwarding Logic Implementation

18 Forwarding
[Figure: pipelined datapath with control; IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are carried through ID/EX as Rs, Rt, and Rd; the forwarding unit receives Rs, Rt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd]
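The forwarding unit's decision can be sketched as a pair of comparisons per ALU operand. The conditions below follow the usual textbook formulation and are an assumption here, since the slide's own logic table is not reproduced in the text: forward from EX/MEM when that instruction writes the register being read, otherwise from MEM/WB, and never forward register $0. The function names and the RegWrite parameters are made up for this illustration.

```python
def select_operand(src, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Choose where one ALU operand comes from (assumed textbook conditions)."""
    # The most recent result wins, so EX/MEM is checked first.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "REGFILE"

def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return the MUX selections for the two ALU inputs."""
    args = (ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd)
    return select_operand(id_ex_rs, *args), select_operand(id_ex_rt, *args)

# sub $2,$1,$3 is in MEM while and $12,$2,$5 is in EX: rs ($2) comes from EX/MEM
print(forwarding_unit(2, 5, True, 2, False, 0))
```

Checking EX/MEM before MEM/WB matters when both stages hold a write to the same register, because the younger instruction holds the newer value.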

19 Can't Always Forward
lw (load word) can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register.
[Figure: pipeline diagram over CC 1 to CC 9 for the sequence below]
  lw  $2, 20($1)
  and $4, $2, $5
  or  $8, $2, $6
  add $9, $4, $2
  slt $1, $6, $7
Thus, we need a hazard detection unit to stall the pipeline after the load instruction.

20 Stalling
We can stall the pipeline by keeping an instruction in the same stage.
[Figure: pipeline diagram over CC 1 to CC 10 for the sequence below; and repeats its ID stage and or repeats its IF stage, which inserts a bubble]
  lw  $2, 20($1)
  and $4, $2, $5
  or  $8, $2, $6
  add $9, $4, $2
  slt $1, $6, $7

21 Hazard Detection Unit
Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt).
Stall by letting an instruction that won't write anything go forward.
[Figure: pipelined datapath with a hazard detection unit (ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, IF/IDWrite, and a MUX that selects 0 for the control signals) and a forwarding unit (Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd)]

22 Data Hazard Detection Logic
The logic to detect the need for load interlocks during the ID stage of an instruction
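The load interlock condition stated for the hazard detection unit can be written as one boolean test. This is a sketch under the assumption that a stall is implemented by holding the PC and IF/ID (PCWrite and IF/IDWrite deasserted) and zeroing the control signals entering ID/EX, as the signal names in the diagram suggest; the function name and the returned dictionary keys are made up for this illustration.

```python
def hazard_detection(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Detect a load-use hazard between the load in EX and the instruction in ID."""
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "stall": stall,
        "pc_write": not stall,        # hold the PC so the fetch is repeated
        "if_id_write": not stall,     # hold IF/ID so the decode is repeated
        "zero_control": stall,        # send a bubble (all control signals 0) to EX
    }

# lw $2, 20($1) is in EX, and $4, $2, $5 is in ID: rt of the load matches rs
print(hazard_detection(True, 2, 2, 5))
```

When the instruction in EX is not a load (MemRead is 0), forwarding covers the dependence and no stall is raised.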

23 Control Hazard
A branch determines the flow of instructions.
Fetching the next instruction depends on the branch outcome.
The pipeline can't always fetch the correct instruction: the branch instruction is still in the ID stage when the next instruction is fetched.
[Figure: pipeline diagram for the sequence below with two bubbles after the branch, marking where the taken target address is known and where the branch is resolved]
  beq $1,$2,L1
  add $1,$2,$3
  sw  $1, 4($2)
  L1: sub $1,$2,$3
Fetch the next instruction based on the comparison result.

24 Reducing Control Hazard
To reduce 2 bubbles to 1 bubble, add hardware in the ID stage to compare registers (and generate the branch condition).
[Figure: pipeline diagram for the sequence below with one bubble after the branch, marking where the taken target address is known and where the branch is resolved]
  beq $1,$2,L1
  add $1,$2,$3
  L1: sub $1,$2,$3
Fetch the instruction based on the comparison result.

25 Delayed Branch
Many CPUs adopt a technique called the delayed branch to further reduce the stall.
A delayed branch always executes the next sequential instruction; the branch takes place after that one-instruction delay.
The delay slot is the slot right after a delayed branch instruction.
  beq $1,$2,L1                     IF ID EX MEM WB
  add $1,$2,$3   (delay slot)         IF ID EX MEM WB
  L1: sub $1,$2,$3                       IF ID EX MEM WB
[Figure annotations: where the taken target address is known and where the branch is resolved]
Fetch the instruction based on the comparison result.

26 Delay Slot (Cont.)
The compiler needs to schedule a useful instruction in the delay slot, or fill it with a nop (no operation).
  // $s1 = a, $s2 = b, $s3 = c
  // $t0 = d, $t1 = f
  a = b + c;
  if (d == 0) { f = f + 1; }
  f = f + 2;
Delay slot filled with a nop:
      add  $s1, $s2, $s3
      bne  $t0, $zero, L1
      nop                      // delay slot
      addi $t1, $t1, 1
  L1: addi $t1, $t1, 2
Can we do better? Fill the delay slot with a useful and valid instruction:
      bne  $t0, $zero, L1
      add  $s1, $s2, $s3       // delay slot
      addi $t1, $t1, 1
  L1: addi $t1, $t1, 2
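Moving the add into the delay slot is valid because the branch does not read the register that the add writes. A minimal sketch of that check follows; it is a simplification that only looks at the instruction directly before the branch and only at register dependences, and the function name is made up for this illustration.

```python
def can_fill_delay_slot(candidate_dest, branch_sources):
    """True if the instruction before the branch may move into the delay slot.

    The candidate must not write a register that the branch compares,
    otherwise the branch would see the old value after the move.
    """
    return candidate_dest not in branch_sources

# add $s1,$s2,$s3 before bne $t0,$zero,L1: $s1 is not read by the branch
print(can_fill_delay_slot("$s1", ["$t0", "$zero"]))
# an instruction writing $t0 could not be moved past bne $t0,$zero,L1
print(can_fill_delay_slot("$t0", ["$t0", "$zero"]))
```

Because the delay slot always executes, an instruction taken from before the branch keeps the program's meaning on both the taken and the not-taken path.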

27 Branch Prediction
Longer pipelines (implemented in the Core 2 Duo, for example) can't readily determine the branch outcome early.
The stall penalty becomes unacceptable since branch instructions are used so frequently in programs.
Solution: branch prediction
  Predict the branch outcome in hardware
  Flush the instructions (that shouldn't have been executed) from the pipeline if the prediction turns out to be wrong
Modern processors use sophisticated branch predictors.

28 MIPS with Predict-Not-Taken
[Figure: two pipeline diagrams, one labeled "Prediction correct" and one labeled "Prediction incorrect"]
Flush the instruction that shouldn't be executed.

29 Control Hazards - Branch
When the branch condition is resolved, other instructions are already in the pipeline.
[Figure: pipeline diagram over CC 1 to CC 9 for the sequence below]
  40 beq $1, $3, 7
  44 and $12, $2, $5
  48 or  $13, $6, $2
  52 add $14, $2, $2
  72 lw  $4, 50($7)
Note that in this implementation, the branch is resolved in the MEM stage.
We are predicting branch not taken. If we are wrong (the branch is taken), flush the instructions.

30 Alleviate Branch Hazards
Reduce the penalty to 1 cycle:
  Move the branch compare to the ID stage of the pipeline
  Add an adder to calculate the branch target in the ID stage
  Add the IF.Flush signal that zeros (squashes) the instruction in the IF/ID pipeline register
[Figure: pipeline diagram of beq $1,$2,L1; add $1,$2,$3; L1: sub $1,$2,$3 with a single bubble, marking where the taken target address is known and where the branch is resolved]
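The benefit of a smaller branch penalty can be estimated with the usual CPI formula for a predict-not-taken pipeline: base CPI plus (fraction of branches) x (fraction of those taken) x (penalty in cycles). The sketch below assumes a base CPI of 1 and no other stalls; the 20% branch frequency and 50% taken rate are illustrative values, not figures from the slides.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles, base_cpi=1.0):
    """CPI with predict-not-taken: only taken branches pay the flush penalty."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles

# Illustrative workload (assumed): 20% branches, half of them taken
two_bubbles = effective_cpi(0.2, 0.5, 2)   # two-bubble branch penalty
one_bubble = effective_cpi(0.2, 0.5, 1)    # branch resolved in ID, one bubble
print(two_bubbles, one_bubble)
```

Under these assumed numbers the branch-stall component of the CPI is halved when the penalty drops from two cycles to one.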

31 Flushing Instructions
[Figure: pipelined datapath with the register comparison (=), sign extend, shift left 2, and branch target adder in the ID stage, the IF.Flush signal into IF/ID, the hazard detection unit, and the forwarding unit]

32 Flushing Instructions (cycle N)
[Figure: datapath state for the sequence below, with and $12, $2, $5 in IF and beq $1, $3, L2 in ID]
  beq $1, $3, L2
  and $12, $2, $5
  or  $13, $12, $1
  L2: lw $4, 40($7)

33 Flushing Instructions (cycle N)
[Figure: the same state as the previous slide, with L2 selected as the next PC]

34 Flushing Instructions (cycle N+1)
[Figure: datapath state one cycle later, with lw $4, 40($7) in IF, a nop in ID, and beq $1, $3, L2 in EX]

35 Outline
Introduction to Pipelining
How Pipeline is Implemented
Pipeline Hazards
C.4 Exceptions
Handling Multicycle Operations

36 Exceptions
Exceptions describe situations where the normal execution order of instructions is changed.
They may force the CPU to abort the instructions in the pipeline before they complete.
Other terms used for exceptions:
  Interrupt
  Fault

37 Types of Exceptions
I/O device request
Invoking an OS service from a user program
Tracing instruction execution
Breakpoint (programmer-requested interrupt)
Integer arithmetic overflow
FP arithmetic anomaly
Page fault (not in main memory)
Misaligned memory accesses
Memory protection violation
Using an undefined instruction
Hardware malfunctions
Power failure

38 Requirements on Exceptions
Synchronous vs asynchronous

39 Classifications

40 Stopping and Restarting Execution
The most difficult exceptions have two properties:
(1) they occur within instructions (i.e., in the middle of instruction execution, corresponding to the EX or MEM pipe stages)
(2) they must be restartable

41 Steps to Save Pipeline State
(1) Force a trap instruction into the pipeline on the next IF.
(2) Until the trap is taken, turn off all writes for the faulting instruction and for all instructions that follow in the pipeline. This can be done by placing zeros into the pipeline latches of all instructions, starting with the instruction that generates the exception, but not those that precede that instruction.
(3) After the exception-handling routine in the OS receives control, it immediately saves the PC of the faulting instruction. This value will be used to return from the exception later.
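Steps (2) and (3) above can be modelled on a list of in-flight instructions. This is only a sketch: the pipeline is represented as a Python list ordered from oldest to youngest, each latch is reduced to a PC and two write-enable bits, and the names `squash_from_fault`, `reg_write`, and `mem_write` are made up for this illustration.

```python
def squash_from_fault(latches, faulting_pc):
    """Turn off writes for the faulting instruction and everything younger.

    latches: list of dicts ordered oldest first, each with
             'pc', 'reg_write', and 'mem_write'.
    Returns the updated latches and the PC to save for the later return.
    """
    fault_index = next(i for i, latch in enumerate(latches)
                       if latch["pc"] == faulting_pc)
    updated = []
    for i, latch in enumerate(latches):
        if i < fault_index:
            updated.append(dict(latch))      # older instructions complete normally
        else:
            # zero the write enables so no architectural state changes
            updated.append({"pc": latch["pc"], "reg_write": False, "mem_write": False})
    return updated, faulting_pc

pipeline = [
    {"pc": 100, "reg_write": True, "mem_write": False},   # oldest
    {"pc": 104, "reg_write": True, "mem_write": False},   # faulting instruction
    {"pc": 108, "reg_write": False, "mem_write": True},   # youngest
]
state, saved_pc = squash_from_fault(pipeline, 104)
print(state, saved_pc)
```

Only the instruction at PC 100 keeps its write enable; the saved PC (104) is where execution resumes after the handler returns.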

42 Precise vs Imprecise Exceptions
If the pipeline can be stopped so that the instructions just before the faulting instruction are completed and those after it can be restarted from scratch, the pipeline is said to have precise exceptions.
Supporting precise exceptions is a requirement in many systems.
Any processor with demand paging or IEEE arithmetic trap handlers must make its exceptions precise.

43 Exceptions in MIPS Pipeline
Exceptions may occur in different stages of a pipeline.

44 Outline
Introduction to Pipelining
How Pipeline is Implemented
Pipeline Hazards
Exceptions
C.5 Handling Multicycle Operations

45 Supporting Multiple FP Operations
[Figure: IF and ID feed parallel execution units that rejoin at MEM and WB: the integer unit (EX), the FP multiplier (7 cycles), the FP adder (A1 to A4, 4 cycles), and a non-pipelined FP divider (24 cycles)]
Complicates bypassing or forwarding
Potential structural hazards
  Multiple (FP) instructions can complete at the same time
  RF might need to be multi-ported
Ordering issue: who gets to update the register?
Out-of-order completion/retirement: precise exception issue
Modified from Prof. Sean Lee's slide

46 Bypassing & Forwarding
  Clock cycles -->  (S = stall)
  L.D   F4,0(R2)    IF  ID  EX  MEM WB
  MUL.D F0,F4,F6        IF  ID  S   M1  M2  M3  M4  M5  M6  M7  MEM WB
  ADD.D F2,F0,F8            IF  S   ID  S   S   S   S   S   S   A1  A2  A3  A4  MEM WB
  S.D   F2,0(R2)                IF  S   S   S   S   S   S   ID  EX  S   S   S   MEM WB

47 Structural Hazards
[Figure: pipeline diagram over clock cycles for a sequence containing MUL.D F0,F4,F6, ADD.D F2,F4,F6, and L.D F2,0(R2), interleaved with other instructions, with these annotations:]
  Write to the register file in the same cycle (cc11)
  Write to the same register (WAW)
  MEM in cc10
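Whether several instructions reach write-back in the same cycle follows from their fetch cycle and execution latency. The sketch below uses the latencies given earlier (FP multiply 7 cycles, FP add 4 cycles) plus an assumed 1-cycle EX for the load, and assumes every instruction passes through IF, ID, its EX stages, MEM, and WB without stalling. The fetch cycles 1, 4, and 7 are an assumption chosen to reproduce the cc11 collision, not values read from the slide.

```python
from collections import defaultdict

# Execution-stage latency per operation (L.D value is an assumption)
EX_LATENCY = {"MUL.D": 7, "ADD.D": 4, "L.D": 1}

def wb_cycle(op, if_cycle):
    """Cycle in which the instruction writes back: IF, ID, EX stages, MEM, WB."""
    return if_cycle + 1 + EX_LATENCY[op] + 1 + 1

def write_port_conflicts(issued):
    """Map each write-back cycle shared by 2+ instructions to those instructions."""
    by_cycle = defaultdict(list)
    for op, if_cycle in issued:
        by_cycle[wb_cycle(op, if_cycle)].append(op)
    return {cycle: ops for cycle, ops in by_cycle.items() if len(ops) > 1}

# Assumed fetch cycles: MUL.D in cycle 1, ADD.D in cycle 4, L.D in cycle 7
print(write_port_conflicts([("MUL.D", 1), ("ADD.D", 4), ("L.D", 7)]))
```

With a single register file write port, a pipeline would have to detect such a collision in ID and stall the later instructions, or add write ports.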

48 Precise Exception Issue
  DIV.D F0,F2,F4       (exception!)
  ADD.D F3,F10,F8      (completed)
  SUB.D F12,F12,F14    (completed)
Precise exception: if the pipeline can (or must) be stopped
  All the instructions before the faulty (or intended) instruction must be completed
  All the instructions after it must not be completed
  Restart execution from the faulty (or intended) instruction
State must be consistent with the original program order.
Not straightforward with out-of-order completion.

49 Scalar Pipeline (Baseline)
[Figure: instruction sequence versus execution cycle; each instruction passes through IF, DE, EX, MEM, WB]
Modified from Prof. Sean Lee's slide

50 Superpipeline
Deeper pipelining is called superpipelining.
A deeper pipeline allows for higher clock rates.
[Figure: instruction sequence versus execution cycle for a superpipelined machine, with the IF, DE, EX, MEM, WB stages divided into shorter substages]
Modified from Prof. Sean Lee's slide

51 CS359: Computer Architecture
End of Part A
Questions?
