Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Size: px

Start display at page:

Download "Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception"

Loreen Long
5 years ago
Views:

1 Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1

2 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add The MEM stage IF/ID ID/EX EX/MEM MEM/WB Write Shift left 2 Add Add result Branch PCSrc 4.9 Branch Control Hazard PC Address Instruction memory Instruction Read register 1 Read Read data 1 register 2 isters Read Write data 2 register Write data Instruction [15 0] 16 Sign 32 extend ALUSrc 0 M u x 1 6 ALU control Zero ALU ALU result Address Write data MemWrite Data memory MemRead Read data Memto 1 M u x 0 Instruction [20 16] Instruction [15 11] 0 M u x 1 ALUOp Dst 2

3 Control (or Branch) Hazards Branch decision is made in the MEM stage Unable to determine the following instruction in pipeline immediately Control hazard will occur Stall the pipeline utill branch decision is known not efficient, slow the pipeline significantly! if $1=$3 Branch decision is made in the MEM stage 40 beq $1, $3, and $12, $2, $5 48 or $13, $6, $2 52 add $14, $2, $2 72 lw $3, 50($7) beq $1, $3, 28 nop nop nop IM DM IM DM IM DM IM DM lw $3, 50($7) IM DM 3 cycles are wasted 3

4 Better solution 1: predict branch outcome Predict branch not taken => continue with next sequential instructions Correct prediction => no penalty and save time Incorrect prediction => have to flush the pipeline behind the branch (see next slide Case 1: Assume branch not taken and prediction is correct Pipeline is executed as normal Addr beq $1, $3, 28 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 lw $3, 50($7) 40 beq $1, $3, and $12, $2, $5 48 or $13, $6, $2 IM DM IM DM IM DM 52 add $14, $2, $2 IM DM 4

5 Better solution 1: predict branch outcome 2 Case 2: Assume Branch not taken, Incorrect Prediction (branch outcome is determined in MEM) Program execution order (in instructions) Time (in clock cycles) CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9 40 beq $1, $3, 7 IM DM Nop 44 and $12, $2, $5 Nop 48 or $13, $6, $2 Nop 52 add $14, $2, $2 IM IM IM DM DM DM Flush these instructions (Set control values to 0) 72 lw $4, 50($7) PC IM DM 5

6 How flushing instructions is done? When misprediction occurs Flush: Zero out all the control values (or the instruction itself) in pipeline registers for the instructions following the branch that are already in the pipeline Similar to the strategy as for stalling on load use data hazard (RAW) CC1 CC2 CC3 CC4 CC5 Cleared 40 beq $1, $3, 7 IM DM Nop 44 and $12, $2, $5 Nop 48 or $13, $6, $2 Nop 52 add $14, $2, $2 IM IM IM DM DM DM Zero out all the control values (or the instruction itself) in pipeline registers for the instructions 72 lw $4, 50($7) IM DM 6

7 Better Solution 2:Reducing Branch Delay If branch decision is made at MEM stage, three instructions are flushed if misprediction occurs How to reduce Brach delay =>Decide branch outcome earlier (make decision in ID stage) =>only one instruction is flushed (IF stage) IM DM Flush this instruction IM DM IM DM 7

8 Original branch decision Hardware that are moved to ID stage => Target address calculation & ister Comparator PCSrc 0 M u x 1 IF/ID ID/EX EX/MEM MEM/WB Add 4 Write Shift left 2 Add Add result Branch PC Address Instruction memory Instruction Read register 1 Read data 1 Read register 2 isters Read Write data 2 register Write data Instruction [15 0] 16 Sign 32 extend ALUSrc 0 M u x 1 6 ALU control Zero ALU ALU result Address Write data MemWrite Data memory MemRead Read data Memto 1 M u x 0 Instruction [20 16] Instruction [15 11] 0 M u x 1 ALUOp Dst 8

9 IF.Flush Optimized Datapath for Branch IF.Flush signal zeros out the instruction (which follows the branch) in the IF/ID pipeline register target address calculation ister comparator PC 4 M u x Instruction memory IF/ID Hazard detection unit Control Shift left 2 0 isters M u x = ID/EX WB M EX M u x M u x ALU EX/MEM WB M Data memory MEM/WB WB M u x Sign extend M u x Forwarding unit Branch decision is moved from the MEM stage to the ID stage simplified drawing not showing enhancements to the forwarding and hazard detection units 9

10 Reducing Branch Delay by detecting at ID stage Two changes are needed to move the branch decision to the ID stage Target address calculation calculating the branch target address in ID stage, inputs to this adder, the PC value and the immediate fields are already available in the IF/ID pipeline register) ister comparator calculating the branch decision in ID stage, for equality test, by XORing respective bits and then ORing all the results and inverting, rather than using the ALU to subtract and then test for zero (when there is a carry delay) Also modify the forwarding and hazard detection units to forward to or stall the branch at the ID stage in case the branch decision depends on an earlier result (see textbook for more details) 10

11 Example: branch taken Reducing Branch Delay 36: sub $10, $4, $8 40: beq $1, $3, 7 44: and $12, $2, $5 48: or $13, $2, $6 52: add $14, $4, $2 56: slt $15, $6, $ : lw $4, 50($7) 11

12 36 sub $10, $4, $8 40 beq $1, $3, 7 44 and $12 $2, $5 48 or $13 $2, $6 52 add $14, $4, $2 56 slt $15, $6, $7 72 lw $4, 50($7) Example: Branch is predicted not taken, but actually taken Assume $1 == $3, and predict not taken, but prediction is incorrect) 12

predicted not taken, but actually taken Assume $1 == $3, and predict not

13 36 sub $10, $4, $8 40 beq $1, $3, 7 44 and $12 $2, $5 48 or $13 $2, $6 52 add $14, $4, $2 56 slt $15, $6, $7 72 lw $4, 50($7) Example: Branch is predicted not taken, but actually taken Assume $1 == $3, and predict not taken (incorrect prediction) Optimized pipeline with only one bubble penalty 13

14 Data Hazards for Branches Are any stalls in need in the following instructions? If so, how many? Don t forget forwarding. add $1, $2, $3 add $4, $5, $6 add $6, $7, $8 beq $1, $4, target IM DM IM DM IM DM IM DM Solution: no stall, due to forwarding 14

15 Hidden Data Hazards for Branches 2 Are any stalls in need in the following instructions? If so, how many? Don t forget forwarding. lw $1, addr add $4, $5, $6 beq $1, $4, target 15

16 Data Hazards for Branches If a comparison register is a destination of immediately preceding load instruction Need 2 stall cycles lw $1, addr beq $1, $0, target How many stalls? lw $1, addr IF ID EX MEM WB nop IF ID nop ID beq $1, $0, target Solution: two stalls, even with forwarding 17 ID EX MEM WB

17 Static branch prediction (Recap) Branch Prediction Based on typical branch behavior Example: loop and if statement branches Predict backward branches taken Predict forward branches not taken taken Memory 0xf f ffffff 0x120040ac 0x Today: Dynamic branch prediction Prediction based on record recent history of each branch Hardware measures actual branch behavior Taken, Taken, Taken, Taken not taken data What is the next prediction? Taken for Not Taken? 0x c 0x x x word address 18

18 Better Solution 3: Dynamic Branch Prediction Improve prediction accuracy Based on the past history Use dynamic prediction Branch prediction buffer (aka branch history table) Indexed by recent branch instruction addresses Stores outcome (taken/not taken) To execute a branch Check table, expect the same outcome Based the table, make prediction Branch PC 1 Recent history (1 bit predictor) BHT 19

19 1 bit predictor: Example: 1 Bit Predictor When 0=> predict not taken, When 1=>predict taken, Change states when incorrectly predicted Example: assume four branches are taken, taken, taken, not taken, 1 bit predictor is initialized to 0, what is the prediction accuracy if 1 bit predictor is used Execution pattern T T T N Predictor value Predicted branch Correct or incorrect 0 N I 1 T C 1 T C 1 T I Accurac y=2/4 = 50% 20

is initialized to 0 Execution Pattern Predictor value at time of prediction Predicted branch

20 Four states: 00, 01, 10, 11 2 Bit Predictor 00,01=> predict not taken, 10, 11=> prediction taken Only change prediction on two successive mispredictions bit predictor is initialized to 0 Execution Pattern Predictor value at time of prediction Predicted branch Prediction result in steady state T T 0 1 N N I I 21 N T T T N T T N T T T T I I C C I C

21 Another Example: Consider the following loop branch that branches nine times in a row (taken), and then is not taken once. Prediction accuracy for 1 bit predictor Prediction accuracy for 2 bit predictor 1 bit predictor Execution Pattern: Predictor value at time of prediction Predicted branch Prediction result in steady state 2 bit predictor Execution Pattern: Predictor value at time of prediction Predicted branch Prediction result in steady state Loop: sll $t1, $s3, 2 add $t1, $t1, $s6 lw $t0, 0($t1) bne $t0, $s5, Exit addi $s3, $s3, 1 j Loop Exit: T T T T T T T T T N T T T T T T T T T N N T T T T T T T T T N T T T T T T T T T I C C C C C C C C I I C C C C C C C C I Prediction accuracy (on average) for many loops: 80% T T T T T T T T T N T T T T T T T T T N T N N T T T T T T T T T T T T T T T T T T T I I C C C C C C C I C C C C C C C C C I C Prediction accuracy (on average) for many loops: 90% 22

22 Delay Branch Delay branch: the instruction after branch instruction, Sequential successor1, is always executed no matter taken or not taken called branch delay slot Schedule a instruction that always run Branch instruction Sequential successor1 Branch target if taken Delay Slot Scheduling branch delay slot to reduce branch penalties From before From target From fall through 23

Delay Branch (from target or fall through) From target and from

in a 5 stage pipeline single issue But less effective in modern

ADD is not determined => can t be moved to delay slot Target DSUB

determined => can t be moved to delay slot Not taken fallthrough

23 Delay Branch (from target or fall through) From target and from fall through is used when from before is not available Effective in a 5 stage pipeline single issue But less effective in modern CPU with longer pipeline and multiple issues because Since s1 in ADD is not determined => can t be moved to delay slot Target DSUB is copied (not just moved) to delay slot Since $s1 in ADD is not determined => can t be moved to delay slot Not taken fallthrough instruction SUB is moved to delay slot Single delay slot is not enough 24

24 Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 27

25 Exceptions and Interrupts Unexpected events requiring change in flow of control Different ISAs use the terms differently Exception Arises within the CPU e.g., undefined opcode, overflow, syscall, Interrupt From an external I/O controller Dealing with them without sacrificing performance is hard 4.9 Exceptions Type of event From where? MIPS terminology I/O device request External Interrupt Invoke OS from user program Internal Exception Arithmetic overflow Internal Exception Using an undefined instruction Internal Exception Hardware malfunction Either Exception or interrupt 28

26 Handling Exceptions In MIPS, exceptions managed by a System Control Coprocessor (CP0) Save PC of offending (or interrupted) instruction In MIPS: Exception Program Counter (EPC) Save indication of the problem Cause register (Used by MIPS) We ll assume 1 bit, 0 for undefined opcode, 1 for overflow Vectored interrupts In MIPS, jump to handler at

27 Handler Actions Read cause, and transfer to relevant handler Determine action required If restartable Take corrective action use EPC to return to program Otherwise Terminate program Exception occurs Report error using EPC, cause, Save address to EPC Transfer control to OS and perform actions for exceptions Terminate the program Use EPC to restart the program Cause register Vectored interrupt 31

28 Exceptions in a Pipeline Another form of control hazard Flush later instructions Fetches new instructions Consider overflow on add in EX stage add $1, $2, $1 Prevent $1 from being clobbered Complete previous instructions Flush add and subsequent instructions Set Cause and EPC register values Transfer control to handler Similar to mispredicted branch Use much of the same hardware 32

29 Pipeline with Exceptions 33

30 Pipeline with Exceptions 2 Need to flush later instructions Flush IF stage IF.Flush : same as before Change to nop instructions Flush ID stage Add ID.Flush signal ID.Flush is Ored with stall signal from hazard detection unit Flush EX stage Add Ex.Flush to cause new MUX to zero the controls To start from location Add additional input to the PC MUX that sends to PC EPC and Cause register 34

31 Restartable exceptions Exception Properties Pipeline can flush the instruction Handler executes, then returns to the instruction Refetched and executed from scratch PC saved in EPC register Identifies causing instruction Actually PC + 4 is saved Handler must adjust 35

32 Exception Example Exception on add in 40 sub $11, $2, $4 44 and $12, $2, $5 48 or $13, $2, $6 4C add $1, $2, $1 50 slt $15, $6, $7 54 lw $16, 50($7) Handler sw $25, 1000($0) sw $26, 1004($0) 36

33 Exception Example cycle 6 37

34 Exception Example cycle 7 38

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU