Lecture 6: Pipelining. CSCE 26 Computer Organization. Instructor: Saraju P. Mohanty, Ph.D. NOTE: The figures, text, etc. included in the slides are borrowed from various books, websites, authors' pages, and other sources for academic purposes only. The instructor does not claim any originality.
The Big Picture: Where Are We Now? We know the five classic components of a computer. We understand how the instruction set plays a key role in determining performance and the design complexity of the datapath and controller. We designed a processor, comprising a datapath and a controller, for a small set of instructions. Datapath: (1) Single-cycle implementation: too slow. (2) Multi-cycle implementation: large instructions take longer, small instructions take less time. Controller: Approach 1: FSM-based approach, a structured way to derive a circuit implementation from an FSM specification; implementation styles: (1) random logic, (2) PLA, (3) ROM. Approach 2: Microprogramming, where control is written as a program using microinstructions; flexible but slower.
Pipelining is Natural! Laundry Example. Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away. Washing takes 30 minutes. Drying takes 30 minutes. Folding takes 30 minutes. Putting away (placing the clothes into drawers) takes 30 minutes.
Sequential Laundry. [Figure: task order A, B, C, D plotted against time from 6 PM to 2 AM; each load occupies four consecutive 30-minute steps, and the loads run back to back.] Sequential laundry takes 8 hours for 4 loads. If they learned pipelining, how long would the laundry take?
Pipelined Laundry: Start Work ASAP. [Figure: task order A, B, C, D plotted against time; each load starts one 30-minute step after the previous one, giving seven steps in total.] Pipelined laundry takes 3.5 hours for 4 loads!
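The two laundry totals can be checked with a few lines of arithmetic. The sketch below assumes four steps of 30 minutes each (the value consistent with the 8-hour and 3.5-hour totals on the slides); the function names are made up for illustration.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs all stages; every later load finishes
    one stage-time after the load in front of it."""
    return (stages + loads - 1) * stage_minutes


# Assumed values: 4 loads, 4 steps (wash, dry, fold, put away), 30 minutes each.
print(sequential_minutes(4, 4, 30) / 60)  # 8.0 hours
print(pipelined_minutes(4, 4, 30) / 60)   # 3.5 hours
```

The pipelined total is (stages + loads - 1) step-times because the pipeline needs stages - 1 steps to fill before it completes one load per step.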
Pipelining Lessons. Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload. Multiple tasks operate simultaneously using different resources. Potential speedup = number of pipeline stages. The pipeline rate is limited by the slowest pipeline stage. Unbalanced lengths of pipe stages reduce speedup. Time to fill the pipeline and time to drain it reduce speedup. The pipeline must stall for dependences.
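A rough way to see several of these lessons at once is to compute the speedup for a given set of stage times. This is a minimal sketch under a simplifying assumption: the pipeline is clocked, so every stage is allotted the time of the slowest stage. The function name and the sample stage times are illustrative only.

```python
def pipeline_speedup(stage_times, n_tasks):
    """Speedup of a clocked pipeline over running the tasks one at a time.

    Assumption: every stage gets one clock period, and the period
    equals the slowest stage time.
    """
    unpipelined = n_tasks * sum(stage_times)
    period = max(stage_times)
    # (stages - 1) periods to fill, then one task completes per period.
    pipelined = (len(stage_times) + n_tasks - 1) * period
    return unpipelined / pipelined


# Balanced stages, few tasks: fill and drain time keep speedup well below 4.
print(pipeline_speedup([30, 30, 30, 30], 4))
# Balanced stages, many tasks: speedup approaches the number of stages.
print(pipeline_speedup([30, 30, 30, 30], 1_000_000))
# Unbalanced stages: the slowest stage sets the rate.
print(pipeline_speedup([30, 40, 20], 4))
```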
MIPS Case: The Five Stages of Load. A load occupies Cycle 1 through Cycle 5 with the stages Ifetch, Reg/Dec, Exec, Mem, WB. Ifetch (Instruction Fetch): fetch the instruction from the instruction memory. Reg/Dec: register fetch and instruction decode. Exec: calculate the memory address. Mem: read the data from the data memory. WB: write the data back to the register file.
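The five steps of a load can be traced functionally, one statement group per stage. This is only an illustrative sketch: the dictionary-based instruction format, the memory contents, and the register numbers are assumptions, not the MIPS binary encoding.

```python
def execute_lw(pc, instr_mem, regs, data_mem):
    """Walk one lw through the five stages and return the next PC."""
    # Ifetch: fetch the instruction from the instruction memory.
    instr = instr_mem[pc]
    # Reg/Dec: decode and read the base register.
    base = regs[instr["rs"]]
    offset = instr["offset"]
    # Exec: calculate the memory address.
    address = base + offset
    # Mem: read the data from the data memory.
    value = data_mem[address]
    # WB: write the data back to the register file.
    regs[instr["rt"]] = value
    return pc + 4


# Hypothetical state: base register 1 holds 100, data memory holds 42 at address 120.
regs = [0] * 32
regs[1] = 100
data_mem = {120: 42}
instr_mem = {0: {"op": "lw", "rs": 1, "rt": 10, "offset": 20}}

next_pc = execute_lw(0, instr_mem, regs, data_mem)
print(regs[10], next_pc)  # 42 4
```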
Conventional Pipelined Execution Representation. [Figure: time runs left to right and program flow top to bottom; each instruction passes through IFetch, Dcd, Exec, Mem, WB, and each successive instruction starts one cycle after the one before it.]
Single Cycle, Multiple Cycle, vs. Pipeline. [Figure: clock timing for three implementations. Single-cycle implementation: Load and Store each take one long cycle, and part of the Store cycle is wasted. Multiple-cycle implementation: Load, Store, and R-type run one after another over Cycle 1 to Cycle 10, each stepping through Ifetch, Reg, Exec, Mem, Wr as needed. Pipeline implementation: Load, Store, and R-type overlap, each starting one cycle after the previous instruction.] Pipelining improves performance by increasing instruction throughput.
Single Cycle vs. Pipelining. [Figure: program execution order against time for three consecutive lw instructions. Without pipelining, each lw takes 800 ps (instruction fetch, register read, ALU, data access, register write) and the next one starts only when it finishes. With pipelining, a new lw starts every 200 ps.] Without pipelining, the overall execution time is 3 × 800 ps = 2400 ps. With pipelining, counting one 200 ps cycle per instruction, it is 3 × 200 ps = 600 ps. Speedup = 2400/600 = 4. The ideal speedup is the number of stages in the pipeline. Do we achieve this?
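The speedup figure above can be reproduced, and the closing question explored, with a short calculation. The sketch assumes an 800 ps single-cycle time, a 200 ps pipeline stage time, and five stages; these values are read from the slide's partly garbled numbers, and only the ratio 24/6 = 4 is certain. It also shows what happens when the time to fill the pipeline is counted.

```python
def single_cycle_ps(n_instr, cycle_ps):
    """Every instruction takes one long clock cycle."""
    return n_instr * cycle_ps


def pipelined_steady_state_ps(n_instr, stage_ps):
    """One instruction completes per stage time (fill time ignored)."""
    return n_instr * stage_ps


def pipelined_with_fill_ps(n_instr, n_stages, stage_ps):
    """Includes the (n_stages - 1) cycles needed to fill the pipeline."""
    return (n_stages + n_instr - 1) * stage_ps


# Assumed values: 3 lw instructions, 800 ps vs 200 ps, 5 stages.
base = single_cycle_ps(3, 800)
print(base / pipelined_steady_state_ps(3, 200))   # 4.0, as on the slide
print(base / pipelined_with_fill_ps(3, 5, 200))   # lower once fill time is counted

# With many instructions the fill time is amortized and the ratio approaches 4,
# which is still below the stage count of 5 because 5 x 200 ps exceeds 800 ps.
n = 1_000_000
print(single_cycle_ps(n, 800) / pipelined_with_fill_ps(n, 5, 200))
```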
Pipelining: What Makes It Easy/Hard? What makes it easy: all instructions are the same length; there are just a few instruction formats; memory operands appear only in loads and stores. What makes it hard: structural hazards (suppose we had only one memory); data hazards (an instruction depends on a previous instruction); control hazards (need to worry about branch instructions). We'll build a simple pipeline and look at these issues.
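The "only one memory" case can be made concrete. In a five-stage pipeline with no stalls, instruction i (counting from 0) is fetched in cycle i + 1 and reaches its memory-access stage in cycle i + 4, so a load or store competes for a single shared memory with the fetch of the instruction three positions later. The sketch below lists those collisions; the opcode lists are made-up examples.

```python
def memory_conflicts(program):
    """Return (cycle, mem_instr_index, fetch_instr_index) for every cycle in
    which one instruction accesses data memory while another is being fetched.

    Assumes a 5-stage pipeline, no stalls, and one shared memory.
    """
    conflicts = []
    for i, op in enumerate(program):
        if op in ("lw", "sw"):
            mem_cycle = i + 4        # instruction i is in its memory stage here
            j = mem_cycle - 1        # instruction j is fetched in cycle j + 1
            if j < len(program):
                conflicts.append((mem_cycle, i, j))
    return conflicts


print(memory_conflicts(["lw", "add", "add", "add"]))   # [(4, 0, 3)]
print(memory_conflicts(["add", "add", "add", "add"]))  # []
```

Separate instruction and data memories remove this particular conflict, which is why the pipelined datapath on the following slides draws them as two units.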
Pipelining the Datapath: Basic Idea. The datapath is divided into five stages: IF (instruction fetch), ID (instruction decode/register file read), EX (execute/address calculation), MEM (memory access), and WB (write back). [Figure: single-cycle datapath with PC, instruction memory, register file, sign extend, shift left 2, adders, ALU, data memory, and multiplexers, partitioned into the five stages.] What do we need to add to actually split the datapath into stages?
Pipelined Datapath. [Figure: the datapath with pipeline registers inserted between stages: IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits), MEM/WB (64 bits).] Follow Fig. 2 (page-389) to Fig. 4 (page-39) to understand pipelined execution of the lw instruction. Other instructions will be similar.
Corrected Datapath. [Figure: pipelined datapath in which the write register number is carried through ID/EX, EX/MEM, and MEM/WB back to the register file.] For a load instruction, the write register number is needed in the last stage, so it must be passed along through the pipeline registers in order to be preserved.
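A tiny simulation shows why the register number has to travel with the instruction. The sketch below models the four pipeline registers as slots that hold an instruction index and compares two designs: one that reads the write register number from MEM/WB (the corrected datapath) and one that reads it from IF/ID, which by then holds a later instruction. The function name and the register numbers are hypothetical.

```python
def writeback_targets(dest_regs, carry_dest=True):
    """Return the register number written in WB for each instruction.

    dest_regs[i] is the destination register of instruction i.
    carry_dest=True : the number is read from MEM/WB (corrected datapath).
    carry_dest=False: the number is read from IF/ID (the bug being fixed).
    """
    n = len(dest_regs)
    if_id = id_ex = ex_mem = mem_wb = None  # each slot holds an instruction index
    written = []
    for cycle in range(1, n + 5):
        # WB stage: the instruction in MEM/WB writes the register file.
        if mem_wb is not None:
            source = mem_wb if carry_dest else if_id
            written.append(dest_regs[source] if source is not None else None)
        # Clock edge: every pipeline register takes the value from the stage before it.
        mem_wb, ex_mem, id_ex = ex_mem, id_ex, if_id
        if_id = cycle - 1 if cycle - 1 < n else None  # instruction fetched this cycle
    return written


print(writeback_targets([8, 9, 10, 11], carry_dest=True))   # [8, 9, 10, 11]
print(writeback_targets([8, 9, 10, 11], carry_dest=False))  # [11, None, None, None]
```

In the buggy variant the first load would write into the register named by the instruction three positions behind it, which is the error the corrected datapath removes.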
Graphically Representing Pipelines. [Figure: program execution order against time in clock cycles CC 1 to CC 6 for an lw followed by a sub; each instruction is drawn as IM, Reg, ALU, DM, Reg, shifted one cycle later than the one above.] A pipeline can be thought of as a series of datapaths shifted in time. This representation helps answer questions such as: How many cycles does it take to execute this code? What is the ALU doing during cycle 4? Use this representation to help understand datapaths.
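The two questions on this slide can also be answered programmatically, since each instruction's row is the same five stages shifted by one cycle. The following is a minimal sketch that assumes an ideal five-stage pipeline with no stalls; the helper names are made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in a 1-based cycle,
    or None if the instruction is not in the pipeline during that cycle."""
    k = cycle - 1 - instr_index
    return STAGES[k] if 0 <= k < len(STAGES) else None


def total_cycles(n_instructions):
    """Cycles needed to run n instructions with no stalls."""
    return len(STAGES) + n_instructions - 1


def instruction_in(stage, cycle, program):
    """Which instruction of the program uses the given stage in the given cycle."""
    for i, instr in enumerate(program):
        if stage_in_cycle(i, cycle) == stage:
            return instr
    return None


program = ["lw", "sub"]
print(total_cycles(len(program)))         # 6
print(instruction_in("EX", 4, program))   # sub (the ALU works on sub in cycle 4)

# Text version of the diagram, one row per instruction.
for i, instr in enumerate(program):
    row = [stage_in_cycle(i, c) or "-" for c in range(1, total_cycles(len(program)) + 1)]
    print(f"{instr:4}", " ".join(f"{s:3}" for s in row))
```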
Pipeline Control. [Figure: pipelined datapath labeled with its control signals: PCSrc, RegWrite, Branch, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, and MemtoReg, plus the ALU control unit.]
Pipeline Control. We have 5 stages. What needs to be controlled in each stage? Instruction Fetch and PC Increment; Instruction Decode/Register Fetch; Execution; Memory Stage; Write Back.
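Pipeline Control. Pass control signals along just like the data.

              EX stage control lines            MEM stage control lines      WB stage control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite    RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0           1         0
lw            0       0       0       1         0       1        0           1         1
sw            X       0       0       1         0       0        1           0         X
beq           X       0       1       0         1       0        0           0         X

[Figure: the control unit generates the EX, MEM, and WB signal groups during decode; ID/EX holds all three groups, EX/MEM holds MEM and WB, and MEM/WB holds only WB.]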
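One way to picture "passing control along" is to treat the control word as three groups and drop a group once its stage has used it. The sketch below does that with plain dictionaries. The 0/1 values are the standard textbook MIPS settings for these four instruction classes and should be checked against the slide's table; `None` stands for a don't-care (X), and the function name is made up.

```python
X = None  # don't care

CONTROL = {
    "R-format": {
        "EX":  {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB":  {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX":  {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB":  {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX":  {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB":  {"RegWrite": 0, "MemtoReg": X},
    },
    "beq": {
        "EX":  {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB":  {"RegWrite": 0, "MemtoReg": X},
    },
}


def control_in_pipeline_registers(instr_class):
    """Control groups stored in each pipeline register for one instruction.

    Each stage consumes its own group, so later registers hold fewer signals.
    """
    c = CONTROL[instr_class]
    return {
        "ID/EX":  {"EX": c["EX"], "MEM": c["MEM"], "WB": c["WB"]},
        "EX/MEM": {"MEM": c["MEM"], "WB": c["WB"]},
        "MEM/WB": {"WB": c["WB"]},
    }


for name, groups in control_in_pipeline_registers("lw").items():
    print(name, groups)
```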
Datapath with Control. [Figure: pipelined datapath with the control unit; the EX, MEM, and WB control fields travel through ID/EX, EX/MEM, and MEM/WB and drive PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg in their stages.]
Pipeline Hazards: Major Hurdles of Pipelining. Dictionary meaning of hazard: a source of danger. Structural hazards: an attempt to use the same resource in two different ways at the same time; e.g., a combined washer/dryer would be a structural hazard. Data hazards: an attempt to use an item before it is ready; an instruction depends on the result of a prior instruction still in the pipeline. Control hazards: an attempt to make a decision before the condition is evaluated; branch instructions. One solution: wait until dependencies are resolved. Pipeline control must detect the hazard and take action (or delay action) to resolve it.
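Detecting a data hazard comes down to comparing the source registers of an instruction with the destination registers of the instructions still ahead of it in the pipeline. The sketch below is one simple way to do that check in software. It assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so only the two preceding instructions can cause a hazard. The tuple format and the sample program are hypothetical.

```python
def find_data_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each read of a
    register whose producer is still in the pipeline.

    program: list of (text, dest_reg_or_None, source_regs) tuples.
    window : how many preceding instructions can still hold an unwritten
             result (2 under the assumptions stated above).
    """
    hazards = []
    for j, (_, _, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            # Register 0 is hard-wired to zero in MIPS, so it never causes a hazard.
            if dest not in (None, 0) and dest in sources:
                hazards.append((i, j, dest))
    return hazards


# Made-up instruction sequence in which register 2 is produced and then reused.
program = [
    ("sub $2, $1, $3",  2,    (1, 3)),
    ("and $12, $2, $5", 12,   (2, 5)),
    ("or  $13, $6, $2", 13,   (6, 2)),
    ("add $14, $2, $2", 14,   (2, 2)),
    ("sw  $15, 100($2)", None, (15, 2)),
]

for producer, consumer, reg in find_data_hazards(program):
    print(f"{program[consumer][0]!r} must wait for ${reg} from {program[producer][0]!r}")
```

Under these assumptions the and and or instructions are flagged, while add and sw read register 2 late enough that the value has already been written.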