ECE473 Computer Architecture and Organization. Pipeline: Control Hazard


1 Computer Architecture and Organization. Pipeline: Control Hazard
Lecturer: Prof. Yifeng Zhu, Fall 2015
Portions of these slides are derived from: Dave Patterson, UCB
Lec 15.1

2 Pipelining Outline
- Introduction
- Defining Pipelining
- Pipelining Instructions
- Hazards
  - Structural hazards
  - Data hazards
  - Control hazards
- Performance
- Controller implementation

3 Pipeline Hazards
A hazard occurs when one instruction cannot immediately follow another.
Types of hazards:
- Structural hazards: attempt to use the same resource twice
- Control hazards: attempt to make a decision before the condition is evaluated
- Data hazards: attempt to use data before it is ready
Hazards can always be resolved by waiting.

4 Control Hazards
A control hazard occurs when we need to find the destination of a branch and cannot fetch any new instructions until we know that destination. A branch is either:
- Taken: PC <= PC + 4 + 4 × Imm (Imm is a word offset relative to PC + 4)
- Not taken: PC <= PC + 4
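
As a quick check of the two cases, here is a minimal Python sketch of next-PC selection for a MIPS-style beq. It assumes 4-byte instructions and a signed immediate counted in words and added to PC + 4, which is the convention behind the "40 beq $1, $3, 7" example in these slides; the function name next_pc is hypothetical.

```python
def next_pc(pc: int, taken: bool, imm: int) -> int:
    """Address of the instruction that follows a branch located at `pc`.

    Assumes MIPS-style encoding: 4-byte instructions and a signed immediate
    counted in words, relative to the instruction after the branch.
    """
    fall_through = pc + 4              # Not taken: PC <= PC + 4
    if taken:
        return fall_through + 4 * imm  # Taken: PC <= PC + 4 + 4 * Imm
    return fall_through


# beq at address 40 with Imm = 7
print(next_pc(40, taken=True, imm=7))   # 72
print(next_pc(40, taken=False, imm=7))  # 44
```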

5 Control Hazards: Control Hazard on Branches, Three-Stage Stall
(Figure: pipeline diagram of the sequence below; each instruction passes through Ifetch, ALU and DMem.)
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
The penalty when the branch is taken is 3 cycles!

6 Basic Pipelined Processor
In our original design, branches have a penalty of 3 cycles, because the branch outcome and target are known only at the end of the MEM stage.

7 Reducing Branch Delay
Move the following to the ID stage:
a) Branch-target address calculation
b) Branch condition decision
Reduced penalty (1 cycle) when the branch is taken!
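
The 3-cycle and 1-cycle penalties follow one pattern: if a branch is fetched in cycle 1 and resolved at the end of stage k (counting IF as stage 1), the correct instruction can be fetched only in cycle k + 1, so cycles 2 to k are wasted. The Python sketch below encodes that simplified model, assuming the branch operands are available without extra data-hazard stalls; taken_branch_penalty is a hypothetical helper.

```python
# Classic five-stage pipeline; IF is stage 1.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def taken_branch_penalty(resolve_stage: str) -> int:
    """Bubbles after a taken branch resolved at the end of `resolve_stage`.

    Simplified model: the branch is fetched in cycle 1, resolves in cycle k,
    and the target is fetched in cycle k + 1, so cycles 2..k are lost.
    """
    k = STAGES.index(resolve_stage) + 1
    return k - 1


print(taken_branch_penalty("MEM"))  # 3: original design
print(taken_branch_penalty("ID"))   # 1: branch logic moved to ID
```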

8 Reducing Branch Delay: move branch logic to ID stage
add $r4,$r5,$r6: IF ID EX MEM WB (cycles 1-5)
beq $r0,$r1,tgt: IF ID EX MEM WB (cycles 2-6)
STALL: BUBBLE BUBBLE BUBBLE BUBBLE BUBBLE (cycles 3-7)
sw $s4,200($t5): IF ID EX MEM WB (cycles 4-8)
beq writes the PC in its ID stage (cycle 3); the new PC is used by the fetch in cycle 4.

9 Control Hazard Solution #1: Stall
Stop fetching instructions until the branch result is available.

10 Control Hazard Solution #2: Branch Prediction
- Just stalling for each branch is not practical.
- Common assumption: branch not taken.
- When the assumption fails: flush three instructions.
(Fig. 6.37: program execution order versus time in clock cycles, CC 1 to CC 9, for the sequence below.)
40 beq $1, $3, 7
44 and $12, $2, $5
48 or $13, $6, $2
52 add $14, $2, $2
72 lw $4, 50($7)
If the branch is taken, the instructions at 44, 48 and 52 are flushed and execution continues at 72.

11 Static Branch Prediction
For every branch, predict whether the branch will be taken or not taken.
Predicting branch not taken:
1. Speculatively fetch and execute the in-line instructions following the branch.
2. If the prediction is incorrect, flush the speculated instructions from the pipeline.
   - Convert these instructions to NOPs by clearing the pipeline registers.
   - These instructions have not updated memory or registers at the time of the flush.
Predicting branch taken:
1. Speculatively fetch and execute the instructions at the branch target address.
2. This is useful only if the target address is known earlier than the branch outcome.
   - May require stall cycles until the target address is known.
   - Flush the pipeline if the prediction is incorrect.
   - Must ensure that flushed instructions do not update memory or registers.
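
One way to compare stalling with predict-not-taken is by their effect on CPI. This Python sketch assumes an ideal base CPI of 1, ignores every other hazard, and uses illustrative inputs: a 15% branch frequency and a 67% taken rate, taken from the SPEC statistics quoted on the Branch Behavior slide. With predict-not-taken, only taken branches pay the flush penalty.

```python
def cpi_always_stall(branch_freq: float, penalty: int) -> float:
    """Solution #1: every branch pays the full penalty."""
    return 1.0 + branch_freq * penalty


def cpi_predict_not_taken(branch_freq: float, taken_frac: float, penalty: int) -> float:
    """Solution #2: only mispredicted (taken) branches are flushed."""
    return 1.0 + branch_freq * taken_frac * penalty


for penalty in (3, 1):  # branch resolved in MEM, then in ID
    stall = cpi_always_stall(0.15, penalty)
    predict = cpi_predict_not_taken(0.15, 0.67, penalty)
    print(f"penalty={penalty}: stall {stall:.4f}, predict-not-taken {predict:.4f}")
# penalty=3: stall 1.4500, predict-not-taken 1.3015
# penalty=1: stall 1.1500, predict-not-taken 1.1005
```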

12 Flush instructions in Branch Hazard
36 sub $10, $4, $8
40 beq $1, $3, 7   # target = (40 + 4) + 7 × 4 = 72
44 and $12, $2, $5
48 or $13, $2, $…
…
72 lw $4, 50($7)

13 Control Hazard - Stall
add $r4,$r5,$r6: IF ID EX MEM WB (cycles 1-5)
beq $r0,$r1,tgt: IF ID EX MEM WB (cycles 2-6)
STALL: BUBBLE BUBBLE BUBBLE BUBBLE BUBBLE (cycles 3-7)
sw $s4,200($t5): IF ID EX MEM WB (cycles 4-8)
beq writes the PC in its ID stage (cycle 3); the new PC is used by the fetch in cycle 4.

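14 Control Hazard - Correct Prediction
add $r4,$r5,$r6: IF ID EX MEM WB
beq $r0,$r1,tgt: IF ID EX MEM WB (starts one cycle later)
tgt: sw $s4,200($t5): IF ID EX MEM WB (fetched in the next cycle, assuming the branch is taken)
Because the prediction is correct, no cycles are lost.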

15 Control Hazard - Incorrect Prediction
add $r4,$r5,$r6: IF ID EX MEM WB
beq $r0,$r1,tgt: IF ID EX MEM WB (starts one cycle later)
tgt: sw $s4,200($t5): IF, then BUBBLE BUBBLE BUBBLE BUBBLE (incorrect prediction: squashed instruction)
or $r8,$r8,$r9: IF ID EX MEM WB (the correct fall-through instruction, fetched one cycle after the squashed sw)


17 Flush instructions at IF stage in Branch Hazard
Turn the instruction in the IF stage into a nop.

18 Flush instructions at IF stage in Branch Hazard
(Figure label: zero control signals.)
Turn the instruction in the IF stage into a nop.
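
The flush can be modeled in software as zeroing the control fields carried with the squashed instruction, so that it flows through the remaining stages without writing registers or memory. This Python sketch is illustrative only: the signal names and the ControlSignals, squash and has_side_effects helpers are hypothetical, and real hardware performs the clear with a multiplexer or reset on the pipeline register.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ControlSignals:
    """Hypothetical subset of the control signals carried down the pipeline."""
    reg_write: int = 0  # write the register file in WB
    mem_read: int = 0   # read data memory in MEM
    mem_write: int = 0  # write data memory in MEM
    branch: int = 0     # instruction is a branch


def squash(ctrl: ControlSignals) -> ControlSignals:
    """Flush: zero every control signal so the instruction acts as a nop."""
    return replace(ctrl, **{f.name: 0 for f in fields(ctrl)})


def has_side_effects(ctrl: ControlSignals) -> bool:
    """True if the instruction would update registers or memory."""
    return bool(ctrl.reg_write or ctrl.mem_write)


sw_ctrl = ControlSignals(mem_write=1)     # e.g. a wrongly fetched sw
print(has_side_effects(sw_ctrl))          # True
print(has_side_effects(squash(sw_ctrl)))  # False
```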

19 Branch Behavior in Programs
Based on SPEC benchmarks on DLX:
- Branches occur with a frequency of 14% to 16% in integer programs and 3% to 12% in floating-point programs.
- About 75% of the branches are forward branches.
- 60% of forward branches are taken.
- 80% of backward branches are taken.
- 67% of all branches are taken.
Why are branches (especially backward branches) more likely to be taken than not taken?
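
The overall taken rate is a weighted average of the forward and backward rates. Combining the rounded percentages on this slide gives about 65%, close to the reported 67%; the gap presumably comes from rounding in the quoted figures. The helper name overall_taken_rate is hypothetical.

```python
def overall_taken_rate(forward_share: float, forward_taken: float, backward_taken: float) -> float:
    """Weighted average of the forward and backward taken rates."""
    backward_share = 1.0 - forward_share
    return forward_share * forward_taken + backward_share * backward_taken


rate = overall_taken_rate(forward_share=0.75, forward_taken=0.60, backward_taken=0.80)
print(f"{rate:.2f}")  # 0.65
```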

20 1-Bit Branch Prediction
Branch History Table (BHT): the lower bits of the PC address index a table of 1-bit values.
- Says whether or not the branch was taken last time.
- No address check (saves hardware, but the entry may belong to a different branch).
- If the prediction is wrong, invert the prediction bit.
- 1 = branch was last taken; 0 = branch was last not taken.
(Figure: a 1K-entry BHT in which bits a11 to a2 of the branch instruction's address form the 10-bit index; each entry holds 1 prediction bit.)
Hypothesis: the branch will do the same again.
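
A minimal Python sketch of such a table, assuming 1K entries and word-aligned instruction addresses so that bits a11 to a2 form the index; the class and method names are hypothetical. Because there is no address check, two branches whose addresses differ only above bit 11 share an entry.

```python
class OneBitBHT:
    """1-bit branch history table indexed by low-order PC bits (no tag check)."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.bits = [0] * (1 << index_bits)  # 0 = not taken last time

    def _index(self, pc: int) -> int:
        # Drop a1..a0 (always 0 for word-aligned code), keep the next index_bits bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.bits[self._index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        # Store the actual outcome; after a misprediction this inverts the bit.
        self.bits[self._index(pc)] = 1 if taken else 0


bht = OneBitBHT()
bht.update(0x0040, taken=True)
print(bht.predict(0x0040))  # True
print(bht.predict(0x1040))  # True: aliases with 0x0040
print(bht.predict(0x0044))  # False: different entry, still 0
```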

21 1-Bit Branch Prediction
Example: consider a loop branch that is taken 9 times in a row and then not taken once. What is the prediction accuracy of a 1-bit predictor for this branch, assuming only this branch ever changes its corresponding prediction bit?
Answer: 80%, because there are two mispredictions in every ten executions: one on the first iteration and one on the last iteration. Why?
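
The 80% figure can be checked with a short simulation. This Python sketch models a single prediction bit (matching the assumption that only this branch touches it), initialized to not taken as it would be after the previous loop exit, and replays the loop many times; one_bit_accuracy is a hypothetical helper.

```python
def one_bit_accuracy(outcomes: list[bool], initial: bool = False) -> float:
    """Fraction of outcomes correctly predicted by a single 1-bit entry."""
    state = initial
    correct = 0
    for taken in outcomes:
        correct += (state == taken)  # prediction = last outcome
        state = taken                # remember this outcome
    return correct / len(outcomes)


# Loop branch: taken 9 times, then not taken once; the loop runs 100 times.
loop = [True] * 9 + [False]
print(one_bit_accuracy(loop * 100))  # 0.8
```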

22 2-Bit Branch Prediction (Jim Smith, 1981)
Solution: a 2-bit scheme in which a prediction must miss twice before it is changed.
(Figure: state diagram with two "Predict Taken" states and two "Predict Not Taken" states; transitions are labeled T (taken) and NT (not taken). Green: go, taken. Red: stop, not taken.)
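
Replaying the same loop branch against a 2-bit predictor shows the benefit. This Python sketch uses a 2-bit saturating counter, which is one common realization of the 2-bit scheme (the state diagram on the slide may differ in its exact transitions); the counter starts at "weakly taken", and two_bit_accuracy is a hypothetical helper. Only the loop exit is mispredicted, so accuracy rises from 80% to 90%.

```python
def two_bit_accuracy(outcomes: list[bool], initial: int = 2) -> float:
    """Accuracy of one 2-bit saturating counter (0-1 predict NT, 2-3 predict T)."""
    counter = initial
    correct = 0
    for taken in outcomes:
        correct += ((counter >= 2) == taken)
        # Step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


loop = [True] * 9 + [False]
print(two_bit_accuracy(loop * 100))  # 0.9
```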

23 2-bit Predictor Statistics
Prediction accuracy of a 4K-entry 2-bit prediction buffer on SPEC89 benchmarks: accuracy is lower for the integer programs (gcc, espresso, eqntott, li) than for the floating-point programs.

24 2-bit Predictor Statistics
Prediction accuracy of a 4K-entry 2-bit prediction buffer vs. an infinite 2-bit buffer: increasing the buffer size beyond 4K entries does not significantly improve prediction accuracy.

25 Control Hazard Solution #3: Delayed Branches
Delayed branches: the compiler rearranges code to place an independent instruction after every branch (in the delay slot).
Before:
add $R4,$R5,$R6
beq $R1,$R2,20
lw $R3,400($R0)
After (add moved into the delay slot):
beq $R1,$R2,20
add $R4,$R5,$R6
lw $R3,400($R0)
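
The key rule of a delayed branch is that the instruction in the delay slot executes whether or not the branch is taken. This Python sketch traces straight-line code containing one branch with a single delay slot; the execution_order helper, the label "tgt" and the instruction at the target are hypothetical, since the slide gives the branch only as an offset.

```python
def execution_order(instrs: list[str], branch_at: int, taken: bool, target_at: int) -> list[str]:
    """Instructions executed around one branch that has a single delay slot."""
    order = instrs[: branch_at + 1]       # up to and including the branch
    order.append(instrs[branch_at + 1])   # delay slot: always executed
    resume = target_at if taken else branch_at + 2
    order.extend(instrs[resume:])         # target path or fall-through path
    return order


program = [
    "beq $R1,$R2,tgt",
    "add $R4,$R5,$R6",       # delay slot
    "lw $R3,400($R0)",
    "tgt: sub $R7,$R8,$R9",  # hypothetical target instruction
]
print(execution_order(program, branch_at=0, taken=True, target_at=3))
print(execution_order(program, branch_at=0, taken=False, target_at=3))
```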

26 Scheduling the Delay Slot
(Figure not reproduced.)

27 Delayed Branch
- The instruction in the branch delay slot is always executed.
- The compiler (tries to) move a useful instruction into the delay slot.
(a) From before the branch: always helpful when possible.
Before:
ADD R1, R2, R3
BEQZ R2, L1
(delay slot)
...
L1:
After:
BEQZ R2, L1
ADD R1, R2, R3
...
L1:
If the ADD instruction were ADD R2, R1, R3, the move would not be possible, because BEQZ reads R2 and the branch would then depend on the ADD's result.
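
The legality test for case (a) can be written as a one-line check. This is a simplified Python sketch assuming the candidate instruction sits immediately before the branch and that branches write no registers; can_fill_from_before is a hypothetical helper, not a compiler API.

```python
def can_fill_from_before(candidate_dest: str, branch_sources: set[str]) -> bool:
    """Can an instruction from before the branch move into the delay slot?

    Simplified rule: the branch must not read the register the candidate
    writes; otherwise moving it after the branch changes the branch outcome.
    """
    return candidate_dest not in branch_sources


print(can_fill_from_before("R1", {"R2"}))  # True:  ADD R1, R2, R3 can move
print(can_fill_from_before("R2", {"R2"}))  # False: ADD R2, R1, R3 cannot
```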

28 Delayed Branch
(b) From the target: helps when the branch is taken. May duplicate instructions.
Before:
ADD R2, R1, R3
BEQZ R2, L1
(delay slot)
...
L1: SUB R4, R5, R6
L2:
After:
ADD R2, R1, R3
BEQZ R2, L2
SUB R4, R5, R6
...
L1: SUB R4, R5, R6
L2:
Instructions between BEQZ and SUB (on the fall-through path) must not use R4.
Why is the instruction at L1 duplicated? What if R5 or R6 changed?

29 Delayed Branch
(c) From fall through: helps when the branch is not taken.
Before:
ADD R2, R1, R3
BEQZ R2, L1
(delay slot)
SUB R4, R5, R6
...
L1:
After:
ADD R2, R1, R3
BEQZ R2, L1
SUB R4, R5, R6
...
L1:
Instructions at the target (L1 and after) must not use R4 until it is set again.
Cancelling (nullifying) branch:
- The branch instruction indicates the predicted direction.
- If mispredicted, the instruction in the delay slot is cancelled.
- This gives the compiler greater flexibility to schedule instructions.
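
Cases (b) and (c) share one condition: the instruction placed in the delay slot executes on both paths, so the register it writes (R4 here) must be dead on the other path, that is, not read there before being written again. This Python sketch checks that over a straight-line path; the (destination, sources) encoding, the is_dead_on_path helper and both example paths are hypothetical, and a real compiler would use full liveness analysis.

```python
from typing import Optional


def is_dead_on_path(reg: str, path: list[tuple[Optional[str], set[str]]]) -> bool:
    """True if `reg` is written before it is read (or never read) along `path`.

    Each element of `path` is (dest, sources) for one instruction; the path is
    assumed to cover everything that could read `reg`.
    """
    for dest, sources in path:
        if reg in sources:
            return False  # read before being redefined: the old value is live
        if dest == reg:
            return True   # redefined first: the old value is dead
    return True


# Hypothetical fall-through path for case (b): it reads R4 first.
fall_through = [("R7", {"R4", "R1"})]
print(is_dead_on_path("R4", fall_through))  # False: SUB R4 may not fill the slot

# Hypothetical target path for case (c): R4 is rewritten before any read.
target_path = [("R4", {"R1", "R2"}), ("R8", {"R4"})]
print(is_dead_on_path("R4", target_path))   # True
```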

30 Delayed Branch
Limitations of delayed branches:
- The compiler may not find appropriate instructions to fill the delay slots; it then fills them with no-ops.
- It is a visible architectural feature that is likely to change with new implementations: the pipeline structure is exposed to the compiler, which needs to know how many delay slots there are.

31 Summary - Control Hazard Solutions
- Stall: stop fetching instructions until the result is available.
  - Significant performance penalty
  - Hardware required to stall
- Predict: assume an outcome and continue fetching (undo if the prediction is wrong).
  - Performance penalty only when the guess is wrong
  - Hardware required to "squash" instructions
- Delayed branch: specify in the architecture that the instruction following a branch is always executed.
  - The compiler reorders instructions into the delay slot.
  - Insert "NOP" (no-op) operations when no useful instruction can be placed there (~50% of the time).
  - This is how the original MIPS worked.
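
The ~50% figure can be turned into a rough CPI estimate for delayed branches. This Python sketch assumes a base CPI of 1, branch resolution in ID (so one delay slot per branch), and an illustrative 15% branch frequency from the range on the Branch Behavior slide; each unfilled slot holds a NOP and wastes one cycle. cpi_delayed_branch is a hypothetical helper.

```python
def cpi_delayed_branch(branch_freq: float, fill_rate: float, slots: int = 1) -> float:
    """CPI when each of `slots` delay slots is filled usefully with
    probability `fill_rate`; an unfilled slot holds a NOP and wastes a cycle."""
    return 1.0 + branch_freq * slots * (1.0 - fill_rate)


print(round(cpi_delayed_branch(0.15, fill_rate=0.5), 3))  # 1.075
```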
