Design of Digital Circuits Lecture 15: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2017


1 Design of Digital Circuits Lecture 15: Pipelining Prof. Onur Mutlu ETH Zurich Spring 2017 13 April 2017

2 Agenda for Today & Next Few Lectures
- Single-cycle Microarchitectures
- Multi-cycle and Microprogrammed Microarchitectures
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, ...
- Out-of-Order Execution
- Issues in OoO Execution: Load-Store Handling, ...

3 Readings for This Week
- H&H, Chapter 7.5 (keep reading)

4 Wrap Up Microprogramming

5 Remember: An Exercise in Microprogramming

6 Handouts
- 7 pages of Microprogrammed LC-3b design
- infk/inst-infsec/system-security-group-dam/education/ Digitaltechnik_7/lecture/lc3b-figures.pdf

7 A Simple LC-3b Control and Datapath

8 [Figure: LC-3b state machine diagram. Fetch: MAR <- PC, PC <- PC + 2 (state 18, 19); MDR <- M, repeated until R (state 33); IR <- MDR (state 35). Decode: BEN <- IR[11] & N + IR[10] & Z + IR[9] & P, then dispatch on IR[15:12] (state 32). One path per opcode (ADD, AND, XOR, TRAP, SHF, LEA, LDB, LDW, STW, STB, JSR, JMP, BR, RTI), each returning to state 18.]
NOTES
- B+off6 : Base + SEXT[offset6]
- PC+off9 : PC + SEXT[offset9]
- *OP2 may be SR2 or SEXT[imm5]
- ** [15:8] or [7:0] depending on MAR[0]

9 A Simple Datapath Can Become Very Powerful
[Figure: LC-3b datapath. Register file (SR1 OUT, SR2 OUT, DR), ALU (ALUK), shifter (SHF), PC with PCMUX, IR, condition codes N Z P, MAR, MDR, memory, and memory-mapped I/O registers (KBDR, KBSR, DDR, DSR), connected by a 16-bit bus through GatePC, GateMARMUX, GateALU, GateSHF, and GateMDR; muxes ADDR1MUX, ADDR2MUX, MARMUX, SR2MUX, INMUX; load signals LD.PC, LD.REG, LD.IR, LD.CC, LD.MAR, LD.MDR.]

10 State Machine for LDW; Microsequencer
[Figure: microsequencer logic computing the 6-bit address of the next state from COND1, COND0, BEN, R, IR[11], J[5:0], IR[15:12], and IRD. LDW passes through states 18, 33, 35, 32, 6, 25, 27.]

11 [Figure: (a) DRMUX, with IR[11:9] as an input, selecting DR; (b) SRMUX, selecting SR1 from IR[11:9] or IR[8:6]; (c) logic producing BEN from IR[11:9] and N, Z, P.]

12

13 Simple Design of the Control Structure
[Figure: the microsequencer takes R, IR[15:11], and BEN and produces a 6-bit address into the control store (2^6 entries). Each microinstruction supplies 26 control bits to the datapath and 9 bits (J, COND, IRD) back to the microsequencer.]

14 [Figure: microsequencer logic. Inputs: COND1, COND0, BEN, R, IR[11], J[5:0], IR[15:12], IRD. Output: 6-bit address of the next state.]

15 [Table: control store contents, one row per state (State 0 to State 63). Columns: IRD, Cond, J, LD.MAR, LD.MDR, LD.IR, LD.BEN, LD.REG, LD.CC, LD.PC, GatePC, GateMDR, GateALU, GateMARMUX, GateSHF, PCMUX, DRMUX, SRMUX, ADDR1MUX, ADDR2MUX, MARMUX, ALUK, MIO.EN, R.W, DATA.SIZE, LSHF1.]

16 End of the Exercise in Microprogramming

17 Variable-Latency Memory
- The ready signal (R) enables memory read/write to execute correctly
  - Example: transition from state 33 to state 35 is controlled by the R bit asserted by memory when memory data is available
- Could we have done this in a single-cycle microarchitecture?
- What did we assume about memory and registers in a single-cycle microarchitecture?
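The role of R can be sketched in a few lines of Python. This is a minimal illustration, not the LC-3b control store: the function name and the `memory_latency` parameter are hypothetical, and the state numbers (18, 33, 35, 32) are taken from the LC-3b state diagram on the earlier slides. The control stays in state 33 until memory asserts R, so the same microprogram works for any memory latency.

```python
def fetch_cycles(memory_latency):
    """Count the cycles spent in the fetch states when memory needs
    `memory_latency` cycles before it asserts the ready signal R."""
    state = 18    # MAR <- PC, PC <- PC + 2
    waited = 0    # cycles spent so far in state 33
    cycles = 0
    while state != 32:              # 32 is the decode state
        cycles += 1
        if state == 18:
            state = 33
        elif state == 33:           # MDR <- M[MAR]
            waited += 1
            ready = waited >= memory_latency    # R asserted by memory
            state = 35 if ready else 33         # loop in 33 until R
        elif state == 35:           # IR <- MDR
            state = 32
    return cycles
```

With a one-cycle memory the fetch takes 3 cycles; with a five-cycle memory it takes 7. A single-cycle design has no state to wait in, which is the point of the two questions on the slide.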

18 The Microsequencer: Advanced Questions
- What happens if the machine is interrupted?
- What if an instruction generates an exception?
- How can you implement a complex instruction using this control structure?
  - Think REP MOVS instruction in x86

19 The Power of Abstraction
- The concept of a control store of microinstructions gives the hardware designer a new abstraction: microprogramming
- The designer can translate any desired operation to a sequence of microinstructions
- All the designer needs to provide is
  - The sequence of microinstructions needed to implement the desired operation
  - The ability for the control logic to correctly sequence through the microinstructions
  - Any additional datapath elements and control signals needed (none are needed if the operation can be translated into existing control signals)

20 Let's Do Some More Microprogramming
- Implement REP MOVS in the LC-3b microarchitecture
- What changes, if any, do you make to the
  - state machine?
  - datapath?
  - control store?
  - microsequencer?
- Show all changes and microinstructions
- Extra Credit Assignment

21 x86 REP MOVS (String Copy)
REP MOVS (DEST <- SRC)
- How many instructions does this take in MIPS ISA?
- How many microinstructions does this take to add to the LC-3b microarchitecture?
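To make the two questions concrete, here is a hedged Python sketch of what a byte-granularity REP MOVS does. It is not taken from the slide: the function name and parameters are hypothetical, with `src`, `dest`, and `count` standing in for the x86 source index, destination index, and count registers, and `direction_down` standing in for the direction flag.

```python
def rep_movsb(memory, src, dest, count, direction_down=False):
    """Copy `count` bytes inside `memory` from `src` to `dest`,
    one byte per iteration, updating both pointers and the count."""
    step = -1 if direction_down else 1
    iterations = 0
    while count != 0:                # REP: repeat until the count is zero
        memory[dest] = memory[src]   # MOVS: one load and one store
        src += step
        dest += step
        count -= 1
        iterations += 1
    return src, dest, count, iterations
```

Each iteration contains a load, a store, two pointer updates, a decrement, and a loop test. A load/store ISA needs roughly one instruction for each of these per element, whereas a microprogrammed machine can hide the whole loop behind one instruction.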

22 Aside: Alignment Correction in Memory
- Unaligned accesses
- LC-3b has byte load and byte store instructions that move data not aligned at the word-address boundary
  - Convenience to the programmer/compiler
- How does the hardware ensure this works correctly?
  - Take a look at state 29 for LDB
  - States 24 and 17 for STB
  - Additional logic to handle unaligned accesses
- P&P, Revised Appendix C.5

23 Aside: Memory Mapped I/O
- Address control logic determines whether the address specified by LDW and STW refers to memory or to an I/O device
- Correspondingly enables memory or I/O devices and sets up muxes
- An instance where the final control signals of some datapath elements (e.g., MEM.EN or INMUX/2) cannot be stored in the control store
  - These signals are dependent on memory address
- P&P, Revised Appendix C.6

24 Advantages of Microprogrammed Control
- Allows a very simple design to do powerful computation by controlling the datapath (using a sequencer)
  - High-level ISA translated into microcode (sequence of u-instructions)
  - Microcode (u-code) enables a minimal datapath to emulate an ISA
  - Microinstructions can be thought of as a user-invisible ISA (u-ISA)
- Enables easy extensibility of the ISA
  - Can support a new instruction by changing the microcode
  - Can support complex instructions as a sequence of simple microinstructions (e.g., REP MOVS, INC [MEM])
- Enables update of machine behavior
  - A buggy implementation of an instruction can be fixed by changing the microcode in the field
    - Easier if datapath provides ability to do the same thing in different ways

25 Update of Machine Behavior
- The ability to update/patch microcode in the field (after a processor is shipped) enables
  - Ability to add new instructions without changing the processor!
  - Ability to fix buggy hardware implementations
- Examples
  - IBM 370 Model 145: microcode stored in main memory, can be updated after a reboot
  - IBM System z: Similar to 370/145.
    - Heller and Farrell, Millicode in an IBM zSeries processor, IBM JR&D, May/Jul 2004.
  - B1700 microcode can be updated while the processor is running
    - User-microprogrammable machine!
    - Wilner, Microprogramming environment on the Burroughs B1700, CompCon

26 Multi-Cycle vs. Single-Cycle uarch
- Advantages
- Disadvantages
- For you to fill in

27 Can We Do Better?

28 Can We Do Better?
- What limitations do you see with the multi-cycle design?
- Limited concurrency
  - Some hardware resources are idle during different phases of the instruction processing cycle
  - Fetch logic is idle when an instruction is being decoded or executed
  - Most of the datapath is idle when a memory access is happening

29 Can We Use the Idle Hardware to Improve Concurrency?
- Goal: More concurrency -> Higher instruction throughput (i.e., more work completed in one cycle)
- Idea: When an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction
  - E.g., when an instruction is being decoded, fetch the next instruction
  - E.g., when an instruction is being executed, decode another instruction
  - E.g., when an instruction is accessing data memory (ld/st), execute the next instruction
  - E.g., when an instruction is writing its result into the register file, access data memory for the next instruction

30 Pipelining

31 Pipelining: Basic Idea
- More systematically:
  - Pipeline the execution of multiple instructions
  - Analogy: Assembly line processing of instructions
- Idea:
  - Divide the instruction processing cycle into distinct stages of processing
  - Ensure there are enough hardware resources to process one instruction in each stage
  - Process a different instruction in each stage
    - Instructions consecutive in program order are processed in consecutive stages
- Benefit: Increases instruction processing throughput (1/CPI)
- Downside: Start thinking about this

32 Example: Execution of Four Independent ADDs
- Multi-cycle: 4 cycles per instruction
    F D E W
            F D E W
                    F D E W
                            F D E W
    ---> Time
- Pipelined: 4 cycles per 4 instructions (steady state)
    F D E W
      F D E W
        F D E W
          F D E W
    ---> Time
Is life always this beautiful?
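The cycle counts behind the two diagrams can be checked with a short Python sketch. The function names are illustrative, and the model assumes independent instructions and one stage per cycle, as on the slide.

```python
def multicycle_cycles(num_instructions, stages=4):
    """Each instruction runs all of its stages before the next one starts."""
    return num_instructions * stages


def pipelined_cycles(num_instructions, stages=4):
    """The first instruction takes `stages` cycles to drain through the
    pipeline; every later instruction completes one cycle after its
    predecessor."""
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1)
```

For the four ADDs this gives 16 cycles multi-cycle versus 7 cycles pipelined. The "4 cycles per 4 instructions" figure is the steady-state rate: once the pipeline is full, four more instructions cost four more cycles.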

33 The Laundry Analogy
[Figure: timeline starting at 6 PM; task order A, B, C, D]
- place one dirty load of clothes in the washer
- when the washer is finished, place the wet load in the dryer
- when the dryer is finished, take out the dry load and fold
- when folding is finished, ask your roommate (??) to put the clothes away
- steps to do a load are sequentially dependent
- no dependence between different loads
- different steps do not share resources
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

34 Pipelining Multiple Loads of Laundry
[Figure: two timelines starting at 6 PM for loads A, B, C, D]
- 4 loads of laundry in parallel
- no additional resources
- throughput increased by 4
- latency per load is the same
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

35 Pipelining Multiple Loads of Laundry: In Practice
[Figure: two timelines starting at 6 PM for loads A, B, C, D]
- the slowest step decides throughput
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

36 Pipelining Multiple Loads of Laundry: In Practice
[Figure: two timelines starting at 6 PM for loads A, B, C, D; loads alternate between two dryers]
- throughput restored (2 loads per hour) using 2 dryers
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

37 An Ideal Pipeline
- Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing cycle?

38 Ideal Pipelining
- Combinational logic (F,D,E,M,W), T ps: BW = ~(1/T)
- Two stages, T/2 ps (F,D,E) and T/2 ps (M,W): BW = ~(2/T)
- Three stages, T/3 ps (F,D), T/3 ps (E,M), and T/3 ps (M,W): BW = ~(3/T)

39 More Realistic Pipeline: Throughput
- Nonpipelined version with delay T (T ps)
    BW = 1 / (T + S), where S = latch delay
- k-stage pipelined version (T/k ps per stage)
    BW_{k-stage} = 1 / (T/k + S)
    BW_{max} = 1 / (1 gate delay + S)
Latch delay reduces throughput (switching overhead b/w stages)
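A small Python sketch shows how the latch delay limits the gain from deeper pipelining. The function name and the numbers T = 800 ps and S = 20 ps are assumptions chosen for illustration; only the formula comes from the slide.

```python
def throughput(T, S, k=1):
    """Results per picosecond for a k-stage pipeline: 1 / (T/k + S).

    T is the total combinational delay and S is the latch delay,
    both in picoseconds. k = 1 is the nonpipelined case."""
    return 1.0 / (T / k + S)


def speedup(T, S, k):
    """Throughput of the k-stage version relative to the nonpipelined one."""
    return throughput(T, S, k) / throughput(T, S, 1)
```

With T = 800 and S = 20, five stages give a speedup of 820/180, about 4.56 rather than 5, and no stage count can push the throughput past 1/S.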

40 More Realistic Pipeline: Cost
- Nonpipelined version with combinational cost G (G gates)
    Cost = G + L, where L = latch cost
- k-stage pipelined version (G/k gates per stage)
    Cost_{k-stage} = G + Lk
Latches increase hardware cost
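Putting the cost formula next to the throughput formula from the previous slide gives a rough way to reason about how deep a pipeline is worth building. The sketch below is an illustration under stated assumptions: the function names, the throughput-per-cost metric, and the numbers (T = 800 ps, S = 20 ps, G = 10000, L = 500) are all hypothetical.

```python
def cost(G, L, k=1):
    """Hardware cost of a k-stage pipeline: G + L*k."""
    return G + L * k


def throughput(T, S, k=1):
    """Results per picosecond of a k-stage pipeline: 1 / (T/k + S)."""
    return 1.0 / (T / k + S)


def best_stage_count(T, S, G, L, max_k):
    """Stage count in 1..max_k with the highest throughput per unit cost."""
    return max(range(1, max_k + 1),
               key=lambda k: throughput(T, S, k) / cost(G, L, k))
```

For the numbers above, throughput per cost peaks at 28 stages. Beyond that point each extra latch adds more cost than the shrinking T/k term adds throughput.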

41 Pipelining Instruction Processing

42 Remember: The Instruction Processing Cycle
- Fetch
- Decode
- Evaluate Address
- Fetch Operands
- Execute
- Store Result
1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)

43 Remember the Single-Cycle Uarch
[Figure: single-cycle MIPS datapath (PC, instruction memory, registers, sign extend, ALU, data memory) with control signals RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, and PCSrc. Total delay T, BW = ~(1/T).]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

44 Dividing Into Stages
- IF: Instruction fetch (200ps)
- ID: Instruction decode/register file read (100ps)
- EX: Execute/address calculation (200ps)
- MEM: Memory access (200ps)
- WB: Write back (100ps)
[Figure: single-cycle datapath divided into the five stages above; the RF write path is marked "ignore for now"]
Is this the correct partitioning? Why not 4 or 6 stages? Why not different boundaries?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

45 Pipeline Throughput
[Figure: program execution order over time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), each passing through Instruction fetch, Reg, ALU, Data access, Reg. Nonpipelined: a new instruction starts every 800ps. Pipelined: a new instruction starts every 200ps.]
5-stage speedup is 4, not 5 as predicted by the ideal model. Why?
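The gap between 4 and 5 can be reproduced with a few lines of Python. The stage latencies below (200, 100, 200, 200, 100 ps) are an assumption, taken to be the values in the P&H figure that the previous slide is based on; the function names are illustrative.

```python
# Assumed stage latencies in picoseconds.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def nonpipelined_period(stage_ps):
    """One instruction passes through every stage back to back."""
    return sum(stage_ps.values())


def pipelined_period(stage_ps):
    """All stages share one clock, so the slowest stage sets the period."""
    return max(stage_ps.values())


def speedup(stage_ps):
    return nonpipelined_period(stage_ps) / pipelined_period(stage_ps)


def idle_per_stage(stage_ps):
    """Time each stage sits idle within one pipelined clock cycle."""
    period = pipelined_period(stage_ps)
    return {stage: period - ps for stage, ps in stage_ps.items()}
```

The speedup is 800/200 = 4 because the stages are not uniform: ID and WB each finish 100 ps before the clock edge. Five perfectly balanced 160 ps stages would give the ideal factor of 5, ignoring latch delay.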

46 Enabling Pipelined Processing: Pipeline Registers
[Figure: five-stage datapath (IF, ID, EX, MEM, WB) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between stages. Register fields carry stage subscripts: PC_F; IR_D, PC_D+4; A_E, B_E, Imm_E, PC_E+4; nPC_M, Aout_M, B_M; MDR_W, Aout_W. Each stage takes T/k ps out of the total T.]
No resource is used by more than 1 stage!
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

47 Pipelined Operation Example
[Figure: a lw instruction moving through Instruction fetch, Instruction decode, Execution, Memory, and Write back in the pipelined datapath]
All instruction classes must follow the same path and timing through the pipeline stages.
Any performance impact?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

48 Pipelined Operation Example
[Figure: lw $10, 20($1) followed by sub $11, $2, $3 moving through Instruction fetch, Instruction decode, Execution, Memory, and Write back over clocks 1 to 6]
Is life always this beautiful?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

49 Illustrating Pipeline Operation: Operation View
        t0   t1   t2   t3   t4   t5
Inst0   IF   ID   EX   MEM  WB
Inst1        IF   ID   EX   MEM  WB
Inst2             IF   ID   EX   MEM  WB
Inst3                  IF   ID   EX   MEM  WB
Inst4                       IF   ID   EX   MEM  WB
steady state (full pipeline) from t4 on

50 Illustrating Pipeline Operation: Resource View
      t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10
IF    I0   I1   I2   I3   I4   I5   I6   I7   I8   I9   I10
ID         I0   I1   I2   I3   I4   I5   I6   I7   I8   I9
EX              I0   I1   I2   I3   I4   I5   I6   I7   I8
MEM                  I0   I1   I2   I3   I4   I5   I6   I7
WB                        I0   I1   I2   I3   I4   I5   I6
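The resource view follows from one rule: with no stalls, instruction i occupies stage number d (counting IF as 0) during cycle i + d. A minimal Python sketch, with illustrative function names, that generates the table:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def resource_view(num_instructions, num_cycles):
    """Map each stage to the instruction index it holds in every cycle
    (None when the stage is empty), assuming no stalls."""
    view = {}
    for depth, stage in enumerate(STAGES):
        row = []
        for t in range(num_cycles):
            i = t - depth  # instruction i enters IF in cycle i
            row.append(i if 0 <= i < num_instructions else None)
        view[stage] = row
    return view


def render(view):
    """Format the view as one text row per stage."""
    lines = []
    for stage, row in view.items():
        cells = ["I%d" % i if i is not None else "--" for i in row]
        lines.append("%-4s" % stage + " ".join("%-3s" % c for c in cells))
    return "\n".join(lines)
```

Calling `print(render(resource_view(11, 11)))` prints the table for cycles t0 to t10. The operation view is the same data indexed by instruction instead of by stage.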

51 Control Points in a Pipeline
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB and control points PCSrc, RegWrite, Branch, ALUSrc, ALUOp, RegDst, MemWrite, MemRead, and MemtoReg]
Identical set of control points as the single-cycle datapath!!
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

52 Control Signals in a Pipeline
- For a given instruction
  - same control signals as single-cycle, but
  - control signals required at different cycles, depending on stage
Option 1: decode once using the same logic as single-cycle and buffer signals until consumed
[Figure: control unit output split into EX, M, and WB groups and carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers]
Option 2: carry relevant instruction word/field down the pipeline and decode locally within each or in a previous stage
Which one is better?
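Option 1 can be sketched in Python as follows. This is an illustration only: the `CONTROL` table, its two opcodes, and the signal values are hypothetical stand-ins for a real decoder, and cycles are numbered from 0. The instruction is decoded once in ID, and each pipeline register carries only the signal groups that later stages still need.

```python
# Hypothetical decoder output, grouped by the stage that consumes it.
CONTROL = {
    "lw":  {"EX": {"RegDst": 0, "ALUSrc": 1},
            "M":  {"MemRead": 1, "MemWrite": 0},
            "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "add": {"EX": {"RegDst": 1, "ALUSrc": 0},
            "M":  {"MemRead": 0, "MemWrite": 0},
            "WB": {"RegWrite": 1, "MemtoReg": 0}},
}


def run(program):
    """Return (cycle, stage, opcode, signals) for every use of control signals."""
    if_id = id_ex = ex_mem = mem_wb = None
    trace = []
    # Four extra cycles drain the pipeline after the last fetch.
    for cycle, fetched in enumerate(list(program) + [None] * 4):
        # Each stage reads only its own group from the register feeding it.
        if id_ex:
            trace.append((cycle, "EX", id_ex["op"], id_ex["EX"]))
        if ex_mem:
            trace.append((cycle, "MEM", ex_mem["op"], ex_mem["M"]))
        if mem_wb:
            trace.append((cycle, "WB", mem_wb["op"], mem_wb["WB"]))
        # Clock edge: shift back to front, dropping groups already consumed.
        mem_wb = {"op": ex_mem["op"], "WB": ex_mem["WB"]} if ex_mem else None
        ex_mem = ({"op": id_ex["op"], "M": id_ex["M"], "WB": id_ex["WB"]}
                  if id_ex else None)
        id_ex = {"op": if_id, **CONTROL[if_id]} if if_id else None  # decode once
        if_id = fetched
    return trace
```

For the program `["lw", "add"]`, the lw's RegWrite and MemtoReg are consumed in cycle 4, four cycles after its fetch, while the add's EX signals are consumed in cycle 3. Option 2 would instead carry the opcode itself and look up `CONTROL` again in each stage, trading pipeline register bits for repeated decode logic.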

53 Pipelined Control Signals
[Figure: pipelined datapath in which the control unit's outputs are split into EX, M, and WB groups and carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers to the control points PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemWrite, MemRead, and MemtoReg]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

54 Another Example: Single-Cycle and Pipelined
[Figure (slide header: Carnegie Mellon): single-cycle datapath (top) and its pipelined version (bottom) with stages Fetch, Decode, Execute, Memory, Writeback. Pipelined signal names carry stage suffixes, e.g., PCF, InstrD, SrcAE, SrcBE, WriteDataE, WriteRegE, ALUOutM, WriteDataM, ALUOutW, ReadDataW, ResultW.]

55 Another Example: Correct Pipelined Datapath
[Figure: pipelined datapath with stages Fetch, Decode, Execute, Memory, Writeback, in which WriteReg is carried down the pipeline as WriteRegE, WriteRegM, and WriteRegW alongside the result]
- WriteReg must arrive at the same time as Result

56 Another Example: Pipelined Control
[Figure: control unit decoding Op (Instr[31:26]) and Funct (Instr[5:0]) into RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, and RegDstD; each signal is registered into the E, M, and W stages as far as it is needed]
- Same control unit as single-cycle processor
- Control delayed to proper pipeline stage

57 Remember: An Ideal Pipeline
- Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing cycle?

58 Instruction Pipeline: Not An Ideal Pipeline
- Identical operations ... NOT!
  - different instructions -> not all need the same stages
  - Forcing different instructions to go through the same pipe stages -> external fragmentation (some pipe stages idle for some instructions)
- Uniform suboperations ... NOT!
  - different pipeline stages -> not the same latency
  - Need to force each stage to be controlled by the same clock -> internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)
- Independent operations ... NOT!
  - instructions are not independent of each other
  - Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results -> pipeline stalls (pipeline is not always moving)

59 Issues in Pipeline Design
- Balancing work in pipeline stages
  - How many stages and what is done in each stage
- Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
  - Handling dependences
    - Data
    - Control
  - Handling resource contention
  - Handling long-latency (multi-cycle) operations
- Handling exceptions, interrupts
- Advanced: Improving pipeline throughput
  - Minimizing stalls

60 Causes of Pipeline Stalls
- Stall: A condition when the pipeline stops moving
- Resource contention
- Dependences (between instructions)
  - Data
  - Control
- Long-latency (multi-cycle) operations

61 Dependences and Their Types
- Also called "dependency" or, less desirably, "hazard"
- Dependences dictate ordering requirements between instructions
- Two types
  - Data dependence
  - Control dependence
- Resource contention is sometimes called resource dependence
  - However, this is not fundamental to (dictated by) program semantics, so we will treat it separately

62 Handling Resource Contention
- Happens when instructions in two pipeline stages need the same resource
- Solution 1: Eliminate the cause of contention
  - Duplicate the resource or increase its throughput
    - E.g., use separate instruction and data memories (caches)
    - E.g., use multiple ports for memory structures
- Solution 2: Detect the resource contention and stall one of the contending stages
  - Which stage do you stall?
  - Example: What if you had a single read and write port for the register file?
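The two solutions can be compared with a small Python simulation. It is a sketch under stated assumptions: a five-stage pipeline, a hypothetical single memory port shared by instruction fetch and the MEM stage, and priority given to the older instruction in MEM, so fetch is the stage that stalls. The function and variable names are illustrative.

```python
MEM_OPS = {"lw", "sw"}


def run(program, shared_port):
    """Return (total cycles, fetch stall cycles) for `program`.

    pipe[0..4] holds the instruction in IF, ID, EX, MEM, WB for the
    current cycle (None is a bubble)."""
    pipe = [None] * 5
    pc = cycles = stalls = 0
    while True:
        entering_mem = pipe[2]  # instruction that occupies MEM next cycle
        if pc < len(program) and shared_port and entering_mem in MEM_OPS:
            fetched = None      # Solution 2: detect the conflict, stall fetch
            stalls += 1
        elif pc < len(program):
            fetched = program[pc]
            pc += 1
        else:
            fetched = None
        pipe = [fetched] + pipe[:4]
        if all(op is None for op in pipe):
            return cycles, stalls
        cycles += 1
```

For `["lw", "add", "add", "add", "add"]`, separate instruction and data memories (Solution 1, `shared_port=False`) finish in 9 cycles with no stalls, while the shared port costs one fetch stall and 10 cycles.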

63 Example Resource Dependence: RegFile
- The register file can be read and written in the same cycle:
  - write takes place during the 1st half of the cycle
  - read takes place during the 2nd half of the cycle => no problem!!!
  - However, operations that involve the register file have only half a clock cycle to complete the operation!!
[Figure: pipeline diagram over time (cycles) for add $s0, $s2, $s3; and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5, each passing through IM, RF, ALU, DM, RF]
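The write-then-read ordering within a cycle can be modeled in a few lines of Python. The class and method names are hypothetical; the only behavior taken from the slide is that the write happens in the first half of the cycle and the reads in the second half.

```python
class RegisterFile:
    """Register file whose write lands before its reads in the same cycle."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def cycle(self, write=None, reads=()):
        """Simulate one clock cycle.

        write: optional (register, value) pair from the writeback stage.
        reads: register numbers requested by the decode stage."""
        if write is not None:       # first half of the cycle: write
            reg, value = write
            self.regs[reg] = value
        return [self.regs[r] for r in reads]   # second half: read
```

With this ordering, an instruction reading a register in the same cycle that an older instruction writes it back sees the new value, for example `RegisterFile().cycle(write=(16, 42), reads=(16,))` returns `[42]` (register 16 is `$s0` in MIPS). If the reads came first, the reader would get the stale value and the pipeline would have to stall for one more cycle.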



More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Spring 2003 Yale Patt, Instructor Hyesoon Kim, Onur Mutlu, Moinuddin Qureshi, Santhosh Srinath, TAs Exam 1,

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

Working on the Pipeline

Working on the Pipeline Computer Science 6C Spring 27 Working on the Pipeline Datapath Control Signals Computer Science 6C Spring 27 MemWr: write memory MemtoReg: ALU; Mem RegDst: rt ; rd RegWr: write register 4 PC Ext Imm6 Adder

More information

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

CPE 335 Computer Organization. Basic MIPS Pipelining Part I CPE 335 Computer Organization Basic MIPS Pipelining Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Pipelining

More information

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)

More information

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content 3/6/8 CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Today s Content We have looked at how to design a Data Path. 4.4, 4.5 We will design

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 4: Pipelining Prof. Onur Mutlu Carnegie Mellon University Last Time Addressing modes Other ISA-level tradeoffs Programmer vs. microarchitect Virtual memory Unaligned

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

CPE 335. Basic MIPS Architecture Part II

CPE 335. Basic MIPS Architecture Part II CPE 335 Computer Organization Basic MIPS Architecture Part II Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Architecture

More information

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction EECS150 - Digital Design Lecture 10- CPU Microarchitecture Feb 18, 2010 John Wawrzynek Spring 2010 EECS150 - Lec10-cpu Page 1 Processor Microarchitecture Introduction Microarchitecture: how to implement

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Fall 2005 Yale Patt, Instructor Aater Suleman, Linda Bigelow, Jose Joao, Veynu Narasiman, TAs Final Exam, December,

More information

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August Lecture 8: Control COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Datapath and Control Datapath The collection of state elements, computation elements,

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control ELEC 52/62 Computer Architecture and Design Spring 217 Lecture 4: Datapath and Control Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design ENGN64: Design of Computing Systems Topic 4: Single-Cycle Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

The LC-3 Instruction Set Architecture. ISA Overview Operate instructions Data Movement instructions Control Instructions LC-3 data path

The LC-3 Instruction Set Architecture. ISA Overview Operate instructions Data Movement instructions Control Instructions LC-3 data path Chapter 5 The LC-3 Instruction Set Architecture ISA Overview Operate instructions Data Movement instructions Control Instructions LC-3 data path A specific ISA: The LC-3 We have: Reviewed data encoding

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Pipeline design. Mehran Rezaei

Pipeline design. Mehran Rezaei Pipeline design Mehran Rezaei How Can We Improve the Performance? Exec Time = IC * CPI * CCT Optimization IC CPI CCT Source Level * Compiler * * ISA * * Organization * * Technology * With Pipelining We

More information

EE 457 Unit 6a. Basic Pipelining Techniques

EE 457 Unit 6a. Basic Pipelining Techniques EE 47 Unit 6a Basic Pipelining Techniques 2 Pipelining Introduction Consider a drink bottling plant Filling the bottle = 3 sec. Placing the cap = 3 sec. Labeling = 3 sec. Would you want Machine = Does

More information

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer EECS150 - Digital Design Lecture 9- CPU Microarchitecture Feb 15, 2011 John Wawrzynek Spring 2011 EECS150 - Lec09-cpu Page 1 Watson: Jeopardy-playing Computer Watson is made up of a cluster of ninety IBM

More information

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4 IC220 Set #9: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life Return to Chapter 4 Midnight Laundry Task order A B C D 6 PM 7 8 9 0 2 2 AM 2 Smarty Laundry Task order A B C D 6 PM

More information

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 15-740/18-740 Computer Architecture Lecture 7: Pipelining Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 Review of Last Lecture More ISA Tradeoffs Programmer vs. microarchitect Transactional

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE 60N, Fall 00 Yale Patt, Instructor Santhosh Srinath, Danny Lynch, TAs Exam, November 9, 00 Name: Problem (0 points):

More information

Special Microarchitecture based on a lecture by Sanjay Rajopadhye modified by Yashwant Malaiya

Special Microarchitecture based on a lecture by Sanjay Rajopadhye modified by Yashwant Malaiya Special Microarchitecture based on a lecture by Sanjay Rajopadhye modified by Yashwant Malaiya Computing Layers Problems Algorithms Language Instruction Set Architecture Microarchitecture Circuits Devices

More information

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

CS 61C: Great Ideas in Computer Architecture Control and Pipelining CS 6C: Great Ideas in Computer Architecture Control and Pipelining Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs6c/sp6 Datapath Control Signals ExtOp: zero, sign

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike

More information

Processor (II) - pipelining. Hwansoo Han

Processor (II) - pipelining. Hwansoo Han Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Computer Architecture 计算机体系结构. Lecture 2. Instruction Set Architecture 第二讲 指令集架构. Chao Li, PhD. 李超博士

Computer Architecture 计算机体系结构. Lecture 2. Instruction Set Architecture 第二讲 指令集架构. Chao Li, PhD. 李超博士 Computer Architecture 计算机体系结构 Lecture 2. Instruction Set Architecture 第二讲 指令集架构 Chao Li, PhD. 李超博士 SJTU-SE346, Spring 27 Review ENIAC (946) used decimal representation; vacuum tubes per digit; could store

More information

Computer Architecture. Lecture 6: Pipelining

Computer Architecture. Lecture 6: Pipelining Compter Architectre Lectre 6: Pipelining Dr. Ahmed Sallam Based on original slides by Prof. Onr tl Agenda for Today & Net Few Lectres Single-cycle icroarchitectres lti-cycle and icroprogrammed icroarchitectres

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Computer Design and Test Lab. Indian Institute of Science (IISc) Bangalore virendra@computer.org Advance Computer Architecture http://www.serc.iisc.ernet.in/~viren/courses/aca/aca.htm

More information

ENCM 369 Winter 2018 Lab 9 for the Week of March 19

ENCM 369 Winter 2018 Lab 9 for the Week of March 19 page 1 of 9 ENCM 369 Winter 2018 Lab 9 for the Week of March 19 Steve Norman Department of Electrical & Computer Engineering University of Calgary March 2018 Lab instructions and other documents for ENCM

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

LC-3 Instruction Processing

LC-3 Instruction Processing LC-3 Instruction Processing (Textbookʼs Chapter 4)# Next set of Slides:# Textbook Chapter 10-10.2# Instruction Processing# It is impossible to do all of an instruction in one clock cycle.# Processors break

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

Major CPU Design Steps

Major CPU Design Steps Datapath Major CPU Design Steps. Analyze instruction set operations using independent RTN ISA => RTN => datapath requirements. This provides the the required datapath components and how they are connected

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE N Spring 7 Y. N. Patt, Instructor Chirag Sakhuja, Sarbartha Banerjee, Jonathan Dahm, Arjun Teh, TAs Exam March, 7

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

LC-3 Instruction Processing. (Textbook s Chapter 4)

LC-3 Instruction Processing. (Textbook s Chapter 4) LC-3 Instruction Processing (Textbook s Chapter 4) Instruction Processing Fetch instruction from memory Decode instruction Evaluate address Fetch operands from memory Usually combine Execute operation

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

Simple Instruction Pipelining

Simple Instruction Pipelining Simple Instruction Pipelining Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Processor Performance Equation Time = Instructions * Cycles * Time Program Program Instruction

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón ICS 152 Computer Systems Architecture Prof. Juan Luis Aragón Lecture 5 and 6 Multicycle Implementation Introduction to Microprogramming Readings: Sections 5.4 and 5.5 1 Review of Last Lecture We have seen

More information

Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam

Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam Prof. Martha A. Kim December 7, 23 Name: First Last (Family) UNI (e.g., mak29) You are allowed 3 hours. You may consult your own

More information

ETH, Design of Digital Circuits, SS17 Practice Exercises III

ETH, Design of Digital Circuits, SS17 Practice Exercises III ETH, Design of Digital Circuits, SS17 Practice Exercises III Instructors: Prof. Onur Mutlu, Prof. Srdjan Capkun TAs: Jeremie Kim, Minesh Patel, Hasan Hassan, Arash Tavakkol, Der-Yeuan Yu, Francois Serre,

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath COMP33 - Computer Architecture Lecture 8 Designing a Single Cycle Datapath The Big Picture The Five Classic Components of a Computer Processor Input Control Memory Datapath Output The Big Picture: The

More information

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ... CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

Pipelining. Maurizio Palesi

Pipelining. Maurizio Palesi * Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer

More information

Lecture 6: Pipelining

Lecture 6: Pipelining Lecture 6: Pipelining i CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and other

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

Processor (I) - datapath & control. Hwansoo Han

Processor (I) - datapath & control. Hwansoo Han Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two

More information

Lecture 10: Pipelined Implementations

Lecture 10: Pipelined Implementations U 8-7 S 9 L- 8-7 Lectre : Pipelined Implementations James. Hoe ept of EE, U Febrary 23, 29 nnoncements: Project is de this week idterm graded, d reslts posted Handots: H9 Homework 3 (on lackboard) Graded

More information

ENE 334 Microprocessors

ENE 334 Microprocessors ENE 334 Microprocessors Lecture 6: Datapath and Control : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 3 th & 4 th Edition, Patterson & Hennessy, 2005/2008, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information