Chapter Six


Brett Garrison
5 years ago

Pipelining

Improve performance by increasing instruction throughput.

[Figure: program execution order against time for three loads, lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0). Each load goes through instruction fetch, register read, ALU, data access, and register write. Without pipelining a new instruction starts every 8 ns; with pipelining a new instruction starts every 2 ns.]

Ideal speedup is the number of stages in the pipeline. Do we achieve this?
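The closing question can be checked with a little arithmetic. The sketch below uses the figure's numbers (8 ns per instruction without pipelining, five stages of 2 ns with pipelining) and assumes no stalls; the function names are made up for illustration.

```python
def nonpipelined_time_ns(n_instructions, instr_time_ns=8):
    """Each instruction runs to completion before the next one starts."""
    return n_instructions * instr_time_ns


def pipelined_time_ns(n_instructions, n_stages=5, stage_time_ns=2):
    """The first instruction needs n_stages cycles; each later one adds one cycle."""
    return (n_stages + n_instructions - 1) * stage_time_ns


def speedup(n_instructions):
    """Ratio of nonpipelined to pipelined time, assuming no hazards or stalls."""
    return nonpipelined_time_ns(n_instructions) / pipelined_time_ns(n_instructions)


if __name__ == "__main__":
    for n in (3, 1000):
        print(n, nonpipelined_time_ns(n), pipelined_time_ns(n), round(speedup(n), 2))
```

For the three loads this gives 24 ns against 14 ns. For long instruction streams the ratio approaches 8 / 2 = 4 rather than 5, because the clock period has to fit the slowest stage, so the stages are not perfectly balanced.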

Pipelining

What makes it easy?
  all instructions are the same length
  just a few instruction formats
  memory operands appear only in loads and stores

What makes it hard?
  structural hazards: suppose we had only one memory
  control hazards: need to worry about branch instructions
  data hazards: an instruction depends on a previous instruction

We'll build a simple pipeline and look at these issues.

We'll talk about modern processors and what really makes it hard:
  exception handling
  trying to improve performance with out-of-order execution, etc.

Basic Idea

  IF: Instruction fetch
  ID: Instruction decode / register file read
  EX: Execute / address calculation
  MEM: Memory access
  WB: Write back

[Figure: the single-cycle datapath (PC, instruction memory, registers, sign extend 16 to 32, ALU with Zero output, data memory) marked off into the five stages.]

What do we need to add to actually split the datapath into stages?

Pipelined Datapath

[Figure: the datapath divided by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.]

Can you find a problem even if there are no dependencies?
What instructions can we execute to manifest the problem?

Corrected Datapath

[Figure: the corrected datapath with the same pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.]

Graphically Representing Pipelines

[Figure: multiple-clock-cycle diagram, time in clock cycles CC 1 to CC 6, program execution order downward.]
    lw $10, 20($1)
    sub $11, $2, $3

Can help with answering questions like:
  how many cycles does it take to execute this code?
  what is the ALU doing during cycle 4?
  use this representation to help understand datapaths

Pipeline Control

[Figure: the pipelined datapath with its control signals labeled: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemWrite, MemRead, and MemtoReg. Instruction bits [15-0] feed the sign extend unit; bits [20-16] and [15-11] feed the RegDst multiplexor.]

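Pipeline control

We have 5 stages. What needs to be controlled in each stage?
  Instruction Fetch and PC Increment
  Instruction Decode / Register Fetch
  Execution
  Memory Stage
  Write Back

How would control be handled in an automobile plant?
  a fancy control center telling everyone what to do?
  should we use a finite state machine?

Pipeline Control

Pass control signals along just like the data.

              Execution / address calculation | Memory access            | Write-back
              stage control lines             | stage control lines      | stage control lines
Instruction   RegDst ALUOp1 ALUOp0 ALUSrc     | Branch MemRead MemWrite  | RegWrite MemtoReg
R-format        1      1      0      0        |   0       0       0      |    1        0
lw              0      0      0      1        |   0       1       0      |    1        1
sw              X      0      0      1        |   0       0       1      |    0        X
beq             X      0      1      0        |   1       0       0      |    0        X

[Figure: the control unit produces the EX, M, and WB signal groups in the decode stage; they travel through the ID/EX, EX/MEM, and MEM/WB pipeline registers, and each group is used in its own stage.]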

Datapath with Control

[Figure: the pipelined datapath with the control unit and the EX, M, and WB control fields carried in the ID/EX, EX/MEM, and MEM/WB registers.]

Dependencies

Problem with starting next instruction before first is finished
  dependencies that go backward in time are data hazards

[Figure: pipeline diagram, time in clock cycles CC 1 to CC 9. Register $2 receives its new value when sub writes back in CC 5.]
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
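The backward-in-time dependencies in the sequence above can be found mechanically. The following is a minimal sketch, not part of the slides: it assumes the register file is written in the first half of a cycle and read in the second half, so only the two instructions right behind a producer read a stale value. The register numbers other than $2 are taken as shown in the listing and the tuple encoding is made up for illustration.

```python
# Each entry: (text, destination register or None, source registers).
PROGRAM = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]


def data_hazards(program, max_distance=2):
    """Return (producer index, consumer index, register) for each hazard.

    max_distance=2 assumes a same-cycle write-then-read register file,
    so a consumer three instructions behind the producer is already safe.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None or dest == 0:  # $0 is never really written
            continue
        for j in range(i + 1, min(i + 1 + max_distance, len(program))):
            _, later_dest, later_sources = program[j]
            if dest in later_sources:
                hazards.append((i, j, dest))
            if later_dest == dest:  # a newer writer takes over from here
                break
    return hazards


if __name__ == "__main__":
    for producer, consumer, reg in data_hazards(PROGRAM):
        print(f"{PROGRAM[consumer][0]!r} needs ${reg} from {PROGRAM[producer][0]!r}")
```

Under these assumptions only and and or are flagged; add reads $2 in the same cycle in which sub writes it, and sw reads it later still.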

Software Solution

Have compiler guarantee no hazards
Where do we insert the nops?
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
Problem: this really slows us down!

Forwarding

Use temporary results, don't wait for them to be written
  register file forwarding to handle read/write to same register
  ALU forwarding

[Figure: pipeline diagram, time in clock cycles CC 1 to CC 9. The result of sub is held in EX/MEM during CC 4 and in MEM/WB during CC 5, and register $2 is updated in CC 5. The dependent instructions take the value from those pipeline registers.]
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

What if this $2 was $13?
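For the software solution, the question of where the nops go can be answered with a small scheduling pass. This is a sketch under stated assumptions rather than how any particular compiler does it: there is no forwarding between pipeline registers, the register file is written before it is read within a cycle, and a consumer must therefore sit at least three slots after its producer. The tuple encoding and the function name are hypothetical.

```python
# Each entry: (text, destination register or None, source registers).
PROGRAM = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]


def insert_nops(program, min_gap=3):
    """Pad with nops so every consumer is at least min_gap slots after its producer."""
    scheduled = []
    last_write = {}  # register number -> slot index of its most recent writer
    for text, dest, sources in program:
        earliest = len(scheduled)
        for reg in sources:
            if reg in last_write:
                earliest = max(earliest, last_write[reg] + min_gap)
        while len(scheduled) < earliest:
            scheduled.append("nop")
        if dest is not None and dest != 0:  # writes to $0 have no effect
            last_write[dest] = len(scheduled)
        scheduled.append(text)
    return scheduled


if __name__ == "__main__":
    print("\n".join(insert_nops(PROGRAM)))
```

With these assumptions the pass places two nops between sub and and, which costs two extra cycles for this sequence. Forwarding avoids that cost.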

Forwarding

[Figure: the pipelined datapath with a forwarding unit. The unit compares the Rs and Rt fields of the instruction in the EX stage with EX/MEM.RegisterRd and MEM/WB.RegisterRd, and controls multiplexors on the ALU inputs.]

Can't always forward

Load word can still cause a hazard:
  an instruction tries to read a register following a load instruction that writes to the same register.

[Figure: pipeline diagram, time in clock cycles CC 1 to CC 9.]
    lw $2, 20($1)
    and $4, $2, $5
    or $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7

Thus, we need a hazard detection unit to stall the instruction that follows the load.
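The decision the forwarding unit makes for one ALU operand can be sketched as below. The comparisons against EX/MEM.RegisterRd and MEM/WB.RegisterRd follow the figure; the RegWrite qualifiers, the check for register $0, and the returned labels are assumptions of this sketch and not taken from the slide.

```python
def forward_select(src_reg, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Pick the source for one ALU operand of the instruction in EX.

    Returns "EX/MEM", "MEM/WB", or "REGFILE" (hypothetical labels for the
    multiplexor setting). The newer result in EX/MEM takes priority.
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    return "REGFILE"


if __name__ == "__main__":
    # and $12, $2, $5 is in EX while sub $2, $1, $3 is in MEM.
    print(forward_select(2, True, 2, False, 0))
    # or $13, $6, $2 is in EX; and ($12) is in MEM, sub ($2) is in WB.
    print(forward_select(2, True, 12, True, 2))
```

The same function would be evaluated once for Rs and once for Rt. It cannot help when the producer is a load that is still in the EX stage, because the loaded value does not exist yet; that is the case the hazard detection unit handles.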

Stalling

We can stall the pipeline by keeping an instruction in the same stage

[Figure: pipeline diagram, time in clock cycles CC 1 to CC 10. A bubble is inserted behind the load, and the instructions that follow it repeat a stage.]
    lw $2, 20($1)
    and $4, $2, $5
    or $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7

Hazard Detection Unit

Stall by letting an instruction that won't write anything go forward

[Figure: the datapath with a hazard detection unit added. Its inputs are ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt; its outputs are PCWrite, IF/IDWrite, and the select of a multiplexor that zeroes the control signals. The forwarding unit is unchanged.]
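The test performed by the hazard detection unit can be written as one condition over the inputs named in the figure. This is a minimal sketch with a hypothetical function name; when it returns True, the PC and IF/ID registers would be held and the control signals entering ID/EX zeroed, which creates the bubble.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


if __name__ == "__main__":
    # lw $2, 20($1) is in EX; and $4, $2, $5 is in ID and reads $2.
    print(must_stall(True, 2, 2, 5))
    # slt $1, $6, $7 in ID does not read $2.
    print(must_stall(True, 2, 6, 7))
```

One stall cycle is enough: after it, the loaded value is in MEM/WB and can be forwarded to the ALU.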

Branch Hazards

When we decide to branch, other instructions are in the pipeline!

[Figure: pipeline diagram, time in clock cycles CC 1 to CC 9. Instruction addresses are shown to the left of each instruction.]
    40 beq $1, $3, 7
    44 and $12, $2, $5
    48 or $13, $6, $2
    52 add $14, $2, $2
    72 lw $4, 50($7)

We are predicting branch not taken
  need to add hardware for flushing instructions if we are wrong

Flushing Instructions

[Figure: the datapath with an IF.Flush signal and an equality comparator (=) on the register outputs in the decode stage, alongside the hazard detection unit and the forwarding unit.]
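The addresses in the branch example can be reproduced with a short calculation. The sketch below assumes 4-byte instructions, a beq offset counted in words relative to the instruction after the branch, and stages numbered IF = 1 through WB = 5; the function names are hypothetical.

```python
WORD_BYTES = 4


def branch_target(branch_pc, offset_words):
    """Target address of a taken branch: (PC + 4) plus the word offset."""
    return branch_pc + WORD_BYTES + offset_words * WORD_BYTES


def wrong_path_fetches(branch_pc, resolve_stage):
    """Addresses fetched under predict-not-taken before the branch is resolved.

    resolve_stage is the stage in which the decision is made
    (2 = ID, 3 = EX, 4 = MEM); each earlier stage holds one younger instruction.
    """
    return [branch_pc + WORD_BYTES * k for k in range(1, resolve_stage)]


if __name__ == "__main__":
    print(branch_target(40, 7))
    print(wrong_path_fetches(40, resolve_stage=4))
    print(wrong_path_fetches(40, resolve_stage=2))
```

With the branch decided in the memory access stage, the instructions at 44, 48, and 52 must be flushed when the branch is taken. Moving the comparison into the decode stage, as the flushing figure suggests, leaves only the instruction at 44 to flush.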

Improving Performance

Try and avoid stalls! E.g., reorder these instructions:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)

Add a "branch delay slot"
  the next instruction after a branch is always executed
  rely on compiler to fill the slot with something useful

Superscalar: start more than one instruction in the same cycle

Dynamic Scheduling

The hardware performs the scheduling
  hardware tries to find instructions to execute
  out-of-order execution is possible
  speculative execution and dynamic branch prediction

All modern processors are very complicated
  DEC Alpha 21264: 9 stage pipeline, 6 instruction issue
  PowerPC and Pentium: branch history table
  Compiler technology important

This class has given you the background you need to learn more

Video: An Overview of Intel's Pentium Processor (available from University Video Communications)
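Returning to the reordering exercise under Improving Performance, the effect of swapping the two stores can be counted with a simple load-use check. This is a sketch under assumptions: the pipeline stalls one cycle whenever an instruction reads the register loaded by the instruction immediately before it, the offsets 0 and 4 are as shown in the listing, and the tuple encoding is made up for illustration.

```python
# Each entry: (opcode, text, destination register or None, source registers).
ORIGINAL = [
    ("lw", "lw $t0, 0($t1)", "t0", ("t1",)),
    ("lw", "lw $t2, 4($t1)", "t2", ("t1",)),
    ("sw", "sw $t2, 0($t1)", None, ("t2", "t1")),
    ("sw", "sw $t0, 4($t1)", None, ("t0", "t1")),
]

# The two stores write different addresses, so swapping them keeps the result.
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[3], ORIGINAL[2]]


def load_use_stalls(program):
    """Count instructions that read a register loaded by their immediate predecessor."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, _, prev_dest, _ = previous
        _, _, _, cur_sources = current
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


if __name__ == "__main__":
    print(load_use_stalls(ORIGINAL), load_use_stalls(REORDERED))
```

In the original order the first store uses $t2 right after it is loaded, which costs one stall; after the swap each store is at least two instructions behind the load it depends on.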
