ECE/CS 552: Pipelining

1 ECE/CS 552: Pipelining
Prof. Mikko Lipasti
Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith

2 Forecast
Big Picture
Datapath
Control

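3 Motivation
Time/Program = Instructions/Program (code size) x Cycles/Instruction (CPI) x Time/Cycle (cycle time)
Single-cycle implementation
  CPI = 1
  Cycle = imem + RFrd + ALU + dmem + RFwr + muxes + control
  E.g. 500 + 250 + 500 + 500 + 250 + 0 + 0 = 2000ps
  Time/program = P x 2ns, where P is the number of instructions executed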
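The single-cycle arithmetic can be checked with a short sketch. The per-component delays below are assumptions chosen to match the 2000ps total and the 500/250/500 figures used for memory, register file, and ALU elsewhere in these notes; mux and control delays are taken as 0, and the instruction count `P` is a hypothetical value.

```python
def time_per_program(instructions, cpi, cycle_ps):
    """Instructions x cycles/instruction x time/cycle, in picoseconds."""
    return instructions * cpi * cycle_ps

# Assumed component delays in ps (mux and control delay taken as 0).
delays_ps = {"imem": 500, "RFrd": 250, "ALU": 500, "dmem": 500,
             "RFwr": 250, "muxes": 0, "control": 0}

# In a single-cycle design the clock must cover the whole path.
single_cycle_ps = sum(delays_ps.values())

P = 1_000_000  # hypothetical instruction count
total_ps = time_per_program(P, cpi=1, cycle_ps=single_cycle_ps)

print("cycle time (ps):", single_cycle_ps)
print("time/program (ms):", total_ps / 1e9)
```

Changing any one entry in `delays_ps` shows why the slowest full path, not the average instruction, sets the single-cycle clock.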

4 Multicycle
Multicycle implementation:
Cycle: 1  2  3  4  5  6  7  8  9  10 11 12 13
Instr:
i      F  D  X  M  W
i+1                   F  D  X
i+2                            F  D  X  M
i+3                                        F
i+4

5 Multicycle
Multicycle implementation
  CPI = 3, 4, 5
  Cycle = max(memory, RF, ALU, mux, control) = max(500,250,500) = 500ps
  Time/prog = P x 4 x 500 = P x 2000ps = P x 2ns
Would like:
  CPI = 1 + overhead from hazards (later)
  Cycle = 500ps + overhead
  In practice, ~3x improvement
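A minimal sketch comparing the three design points, using the delays quoted above. The average multicycle CPI of 4 and the 2000ps single-cycle clock come from the notes; the helper name `ps_per_instruction` is only illustrative, and the pipelined figure ignores hazard and latch overhead.

```python
# Component delays in ps, as used in the cycle-time estimate above.
stage_ps = {"memory": 500, "RF": 250, "ALU": 500}
cycle_ps = max(stage_ps.values())  # clock is set by the slowest step

def ps_per_instruction(cpi, cycle):
    """Average time per instruction for a given CPI and cycle time."""
    return cpi * cycle

single_cycle = ps_per_instruction(1, 2000)        # one long cycle
multicycle = ps_per_instruction(4, cycle_ps)      # average CPI of 4
ideal_pipeline = ps_per_instruction(1, cycle_ps)  # no overhead assumed

print(single_cycle, multicycle, ideal_pipeline)
print("ideal speedup:", single_cycle / ideal_pipeline)
# Per-instruction time that a 3x improvement would correspond to.
print("ps/instr at 3x:", round(single_cycle / 3))
```

Under these assumptions the ideal pipeline is 4x faster than either alternative, so the "~3x in practice" figure corresponds to roughly 667ps per instruction once hazards and latch overhead are included.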

6 Big Picture
Latency = 5 cycles
Throughput = 1/5 instr/cycle
CPI = 5 cycles per instruction
Instead
  Pipelining: process instructions like a lunch buffet
  ALL microprocessors use it
  E.g. Intel Core i7, AMD Jaguar, ARM A9

7 Big Picture
Latency = 5 cycles (same)
Throughput = 1 instr/cycle
CPI = 1 cycle per instruction
CPI = cycles between instruction completions = 1
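The distinction between latency and throughput can be made concrete with a small sketch. It assumes a 5-stage pipeline with no stalls; the function names are illustrative only.

```python
def unpipelined_cycles(n_instr, depth=5):
    """Each instruction takes all `depth` cycles before the next one starts."""
    return depth * n_instr

def pipelined_cycles(n_instr, depth=5):
    """First instruction takes `depth` cycles; each later one finishes 1 cycle after."""
    return depth + n_instr - 1

for n in (1, 10, 1000):
    cpi = pipelined_cycles(n) / n
    print(n, unpipelined_cycles(n), pipelined_cycles(n), round(cpi, 3))
```

A single instruction still takes 5 cycles either way; the effective CPI only approaches 1 as the number of instructions grows and the fill time is amortized.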

8 Ideal Pipelining
[Figure: a block of combinational logic between latches (L), shown split into 1, 2, and 3 stages]
  1 stage: n gate delays, BW = ~(1/n)
  2 stages: n/2 gate delays each, BW = ~(2/n)
  3 stages: n/3 gate delays each, BW = ~(3/n)
Bandwidth increases linearly with pipeline depth
Latency increases by latch delays
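A rough model of these two claims, assuming the logic divides evenly into k stages and each stage adds a fixed latch delay. The `latch` parameter and the function name are assumptions for illustration; the slide itself ignores latch delay in the bandwidth estimate.

```python
def pipeline_timing(n, k, latch=1.0):
    """n: total gate delays of logic, k: number of stages, latch: delay per latch."""
    cycle = n / k + latch          # one stage of logic plus its latch
    return {
        "cycle": cycle,
        "bandwidth": 1.0 / cycle,  # results per gate delay
        "latency": k * cycle,      # equals n + k * latch
    }

n = 30
for k in (1, 2, 3):
    t = pipeline_timing(n, k)
    print(k, round(t["bandwidth"], 4), t["latency"])
```

With `latch=0` the bandwidth is exactly k/n, matching the slide; with a nonzero latch delay the latency grows by one latch delay per stage and bandwidth falls slightly short of linear scaling.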

9 Example: Integer Multiplier
[Source: J. Hayes, Univ. of Michigan]
16x16 combinational multiplier
ISCAS-85 C6288 standard benchmark
Tools: Synopsys DC/LSI Logic 110nm gflxp ASIC

10 Example: Integer Multiplier
Configuration   Delay    MPS           Area (FF/wiring)     Area Increase
Combinational   3.52ns   284           7535 (--/1759)       --
2 Stages        1.87ns   534 (1.9x)    8725 (1078/1870)     16%
4 Stages        1.17ns   855 (3.0x)    11276 (3388/2112)    50%
8 Stages        0.80ns   1250 (4.4x)   17127 (8938/2612)    127%
Pipeline efficiency
  2-stage: nearly double throughput; marginal area cost
  4-stage: 75% efficiency; area still reasonable
  8-stage: 55% efficiency; area more than doubles
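The efficiency figures can be reproduced from the delay column alone. This sketch assumes the delays are 3.52ns, 1.87ns, 1.17ns, and 0.80ns for 1, 2, 4, and 8 stages (the 2- and 4-stage values are read so as to agree with the listed 534 and 855 throughput figures), and it defines efficiency as speedup divided by stage count, which is an interpretation rather than something the slide states.

```python
# Delay per result in ns, keyed by number of pipeline stages.
delay_ns = {1: 3.52, 2: 1.87, 4: 1.17, 8: 0.80}

base = delay_ns[1]
speedup = {k: base / d for k, d in delay_ns.items()}
efficiency = {k: speedup[k] / k for k in delay_ns}

for k in delay_ns:
    print(f"{k} stage(s): speedup {speedup[k]:.1f}x, "
          f"efficiency {efficiency[k] * 100:.0f}%")
```

Efficiency drops with depth because the latch delay is a fixed cost per stage, while the useful logic per stage keeps shrinking.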

11 Ideal Pipelining
Cycle: 1  2  3  4  5  6  7  8  9
Instr:
i      F  D  X  M  W
i+1       F  D  X  M  W
i+2          F  D  X  M  W
i+3             F  D  X  M  W
i+4                F  D  X  M  W
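The diagram above can be generated programmatically, which makes the one-cycle offset between successive instructions explicit. This is a sketch only: the stage letters F, D, X, M, W are assumed to stand for fetch, decode, execute, memory, and writeback, and no hazards or stalls are modeled.

```python
STAGES = ["F", "D", "X", "M", "W"]

def pipeline_diagram(num_instrs, stages=STAGES):
    """Return one row per instruction; row i is shifted right by i cycles."""
    total_cycles = num_instrs + len(stages) - 1
    rows = []
    for i in range(num_instrs):
        cells = ["."] * total_cycles   # "." marks a cycle with no activity
        for s, name in enumerate(stages):
            cells[i + s] = name        # instruction i is in stage s at cycle i + s
        rows.append(cells)
    return rows

rows = pipeline_diagram(5)
for i, cells in enumerate(rows):
    label = "i" if i == 0 else f"i+{i}"
    print(f"{label:<5}" + " ".join(cells))
```

Reading any column from cycle 5 onward shows all five stages busy with different instructions, which is where the one-instruction-per-cycle throughput comes from.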

12 Pipelining Idealisms
Uniform subcomputations
  Can pipeline into stages with equal delay
Identical computations
  Can fill pipeline with identical work
Independent computations
  No relationships between work units
  No dependences, hence no pipeline hazards
Are these practical?
  No, but can get close enough to get significant speedup

13 Complications
Datapath
  Five (or more) instructions in flight
Control
  Must correspond to multiple instructions
Instructions may have data and control flow dependences
  I.e. units of work are not independent
  One may have to stall and wait for another

14 Datapath

15 Datapath

16 Control
Control
  Concurrently set by 5 different instructions
  Divide and conquer: carry IR down the pipe

17 Pipelined Datapath
Start with single-cycle datapath
Pipelined execution
  Assume each instruction has its own datapath
  But each instruction uses a different part in every cycle
  Multiplex all on to one datapath
  Latches separate cycles (like multicycle)
Ignore dependences and hazards for now
  Data
  Control

18 Pipelined Datapath
[Figure: pipelined datapath. Pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separate the stages. Components shown: PC, PC + 4 adder, instruction memory, register file (two read ports, one write port), 16-to-32-bit sign extend, shift left 2, branch target adder, ALU with Zero output, data memory, and multiplexers.]

19 Pipelined Datapath
Instruction flow
  add and load
  Write of registers
  Pass register specifiers
Any info needed by a later stage gets passed down the pipeline
  E.g. store value through EX

20 Pipelined Control
IF and ID
  None
EX
  ALUOp, ALUSrc, RegDst
MEM
  Branch, MemRead, MemWrite
WB
  MemtoReg, RegWrite

21 Datapath Control Signals
[Figure: the pipelined datapath annotated with its control signals: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, MemtoReg. Instruction bits [15-0] feed the sign extend and ALU control; bits [20-16] and [15-11] feed the RegDst multiplexer.]

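22 Pipelined Control
[Figure: the Control unit produces the EX, M, and WB signal groups, which are written into ID/EX. EX/MEM carries M and WB; MEM/WB carries WB only.]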

23 All Together
[Figure: complete pipelined datapath with the Control unit. The EX, M, and WB control groups travel through ID/EX, EX/MEM, and MEM/WB alongside the data. Signals shown: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, MemtoReg.]

24 Pipelined Control
Controlled by different instructions
Decode instructions and pass the signals down the pipe
Control sequencing is embedded in the pipeline
  No explicit FSM
  Instead, distributed FSM
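The idea of decoding once and passing signals down the pipe can be sketched as a toy model. Everything here is an assumption for illustration: the control values follow the usual textbook MIPS encoding for `add` and `lw`, only the three later pipeline registers are modeled, and the names `decode`, `keep`, and `clock` are hypothetical.

```python
# Control groups per opcode (illustrative values; only two opcodes modeled).
CONTROL = {
    "add": {"EX": {"ALUOp": "10", "ALUSrc": 0, "RegDst": 1},
            "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
            "WB": {"MemtoReg": 0, "RegWrite": 1}},
    "lw":  {"EX": {"ALUOp": "00", "ALUSrc": 1, "RegDst": 0},
            "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
            "WB": {"MemtoReg": 1, "RegWrite": 1}},
}

def decode(op):
    """ID stage: generate every control group the later stages will need."""
    return {"op": op, **CONTROL[op]}

def keep(entry, groups):
    """Copy only the groups still needed downstream (None models an empty slot)."""
    return None if entry is None else {k: entry[k] for k in ("op",) + groups}

def clock(regs, decoded_op):
    """One clock edge: each pipeline register latches from the stage before it."""
    return {
        "ID/EX": None if decoded_op is None else decode(decoded_op),
        "EX/MEM": keep(regs["ID/EX"], ("MEM", "WB")),
        "MEM/WB": keep(regs["EX/MEM"], ("WB",)),
    }

regs = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
for op in ["lw", "add", None]:   # None: nothing decoded this cycle
    regs = clock(regs, op)

for name, entry in regs.items():
    print(name, entry)
```

After three clock edges the MEM/WB register holds only the WB signals of `lw` while EX/MEM holds the MEM and WB signals of `add`, so each stage is steered by a different instruction without any central state machine.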

25 Summary
Big Picture
Datapath
Control
Next
  Program dependences
  Pipeline hazards
