ECE232: Hardware Organization and Design

Size: px

Start display at page:

Download "ECE232: Hardware Organization and Design"

Arron West
5 years ago
Views:

1 ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

2 Outline The textbook includes lots of information Focus on material covered on homeworks, lecture, and discussion Most important Read after write hazards Control hazards (branches) Today: Wrapup important pipelining issues Talk about improving control hazard performance ECE232: Pipelining Wrapup 2

3 Forwarding I n s t r. add $1, sub $4,$1,$5 O r d e r and $6,$7,$1 or $8,$1,$1 sw $4,4($1) ECE232: Pipelining Wrapup 3

4 Data Forwarding (aka Bypassing) Take the result from the point that it exists in any of the pipeline state registers and forward it to the functional unit (e.g., the ) that needs it that cycle For functional unit: the inputs can come from any pipeline register rather than just from ID/EX by add multiplexors to the inputs of the connect the result data in EX/MEM or MEM/WB to both of the EX s stage Rs and Rt mux inputs add the proper control hardware to control the new muxes Other functional units may need similar forwarding logic (e.g., the DMem) IMem Reg DMem Reg ECE232: Pipelining Wrapup 4

5 PC Datapath with Forwarding Hardware PCSrc ID/EX EX/MEM IF/ID Control 4 Instruction Memory Read Address Add Read Addr 1 Register Read Read Addr Data 2 1 File Write Addr Write Data Read Data 2 16 Sign 32 Extend Shift left 2 Add cntrl Branch Address Data Memory Write Data Read Data MEM/WB Forward Unit ECE232: Pipelining Wrapup 5

6 Overall All modern day processors use pipelining Pipelining doesn t help latency of single task, it helps throughput of entire workload Pipeline clock cycle determined/limited by slowest pipeline stage Unbalanced pipe stages cause inefficiencies The time to fill pipeline and time to drain it can impact speedup for deep pipelines and short code runs Must detect and resolve hazards ECE232: Pipelining Wrapup 6

7 Review: Pipeline Hazards Structural hazards Design pipeline to eliminate structural hazards Data hazards read after write - RAW Use data forwarding inside the pipeline For those cases that forwarding won t solve (e.g., load-use) include hazard hardware to insert stalls/bubbles Control hazards beq, bne,j,jr,jal Stall hurts performance Move decision point as early in the pipeline as possible reduces number of stalls at the cost of additional hardware Delay decision (requires compiler support) Delayed Branch Predict outcome of Branch Static prediction e.g., always not-taken Dynamic prediction prediction per branch in program ECE232: Pipelining Wrapup 7

8 Branch Instructions Cause Control Hazards I n s t r. O r d e r beq lw Inst 3 Inst 4 jr F D EX M W F D EX M W ECE232: Pipelining Wrapup 8

9 PC BEQ resolved during the MEM stage PCSrc ID/EX EX/MEM IF/ID Control 4 Instruction Memory Read Address Add Read Addr 1 Register Read Read Addr Data 2 1 File Write Addr Write Data Read Data 2 Shift left 2 Add Branch Address Data Memory Write Data Read Data MEM/WB 16 Sign 32 Extend ECE232: Pipelining Wrapup 9

10 One Way to Fix a Control Hazard I n s t r. beq stall O r d e r stall stall lw Inst 3 Fix branch hazard by waiting introduce stalls IM Reg DM ECE232: Pipelining Wrapup 10

11 Reducing Control Hazards Penalties Stalls hurts performance Deeper pipelines have higher penalties 1. Move decision point as early in the pipeline as possible reduces number of stalls at the cost of additional hardware 2. Delay decision (requires compiler support) Delayed Branch : NEXT not effective for deeper pipes - requiring more than one delay slot to be filled 3. Predict outcome of branch beq $1,$2,NEXT add $4,$3,$5 sub $7,$2,$8 ECE232: Pipelining Wrapup 11

12 Reducing branch penalty through HW design ECE232: Pipelining Wrapup 12

13 Branch Prediction Easiest - static prediction Always taken, always not taken Opcode based Displacement based (forward not taken, backward taken) Compiler directed (branch likely, branch not likely) Dynamic prediction prediction per branch in program 1 bit predictor remember last taken/not taken per branch Use a branch-history table (BHT) with 1 bit entry Use part of the PC (low-order bits) to index table Why? ECE232: Pipelining Wrapup 13 Multiple branches may share the same bit Invert the bit if prediction is wrong Branch PC BHT Predictor 0 Predictor 1 Predictor 127

14 Branch Prediction 1 bit predictor Backward branches for loops will be mispredicted twice EX: If a loop branches 9 times in a row and not taken once, what is the prediction accuracy? Misprediction at the first and last loop iteration => 80% prediction accuracy, although branch is taken 90%... TTT T N TT... N T Modern processors multiple instructions issued per cycle, more branch hazards will occur per cycle Cost of branch mispredicted goes up Pentium II 3 instructions issued per cycle, 12+ cycle misprediction penalty Huge penalty for a misfetched path following a branch ECE232: Pipelining Wrapup 14

2-bit Branch Prediction 4 states instead of 2, allowing for more information about tendencies A prediction must miss twice before it is changed Good for backward

15 2-bit Branch Prediction 4 states instead of 2, allowing for more information about tendencies A prediction must miss twice before it is changed Good for backward branches of loops 2-bit saturating counter T N Predict Taken Predict not taken T T N T N Predict Taken Predict not taken N... TTT T T TT... N T ECE232: Pipelining Wrapup 15

16 Summary Make sure you understand the impact of different pipeline stages How many delay slots for a branch? Why? How many stalls needed for a RAW hazard? Why? Why does it make a difference if a register read obtains the register value that is also written? Branch prediction Can reduce number of required branch delay slots Only requires a small amount of hardware Useful for high performance Next time: considering processor performance ECE232: Pipelining Wrapup 16

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor