Computer Architecture. Lecture 6: Pipelining


1 Computer Architecture
Lecture 6: Pipelining
Dr. Ahmed Sallam
Based on original slides by Prof. Onur Mutlu

2 Agenda for Today & Next Few Lectures
- Single-cycle Microarchitectures
- Multi-cycle and Microprogrammed Microarchitectures
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, ...
- Out-of-Order Execution
- Issues in OoO Execution: Load-Store Handling, ...

3 Recap of Last Lecture
- Multi-cycle and Microprogrammed Microarchitectures
  - Benefits vs. Design Principles
  - When to Generate Control Signals
  - Microprogrammed Control: microinstruction, microsequencer, control store
  - LC-3b State Machine, Datapath, Control Structure
  - An Exercise in Microprogramming
- Variable Latency Memory, Alignment, Memory Mapped I/O, ...
- Microprogramming
  - Power of abstraction (for the HW designer)
  - Advantages of Microprogrammed Control
  - Update of Machine Behavior

4 Review: A Simple LC-3b Control and Datapath

5 Simple Design of the Control Structure
[Figure: the microsequencer takes R, IR bits, and BEN as inputs and addresses the control store; each microinstruction read from the control store includes the sequencing fields (J, COND, IRD).]

6 A Simple Datapath Can Become Very Powerful

7 [Figure C.2: A state machine for the LC-3b]
NOTES
- B+off6 : Base + SEXT[offset6]
- PC+off9 : PC + SEXT[offset9]
- *OP2 may be SR2 or SEXT[imm5]

8 Review: The Power of Abstraction
- The concept of a control store of microinstructions gives the hardware designer a new abstraction: microprogramming
- The designer can translate any desired operation to a sequence of microinstructions
- All the designer needs to provide is
  - The sequence of microinstructions needed to implement the desired operation
  - The ability for the control logic to correctly sequence through the microinstructions
  - Any additional datapath elements and control signals needed (no need if the operation can be translated into existing control signals)

9 Review: Advantages of Microprogrammed Control
- Allows a very simple design to do powerful computation by controlling the datapath (using a sequencer)
  - High-level ISA translated into microcode (sequence of microinstructions)
  - Microcode (μ-code) enables a minimal datapath to emulate an ISA
  - Microinstructions can be thought of as a user-invisible ISA (μ-ISA)
- Enables easy extensibility of the ISA
  - Can support a new instruction by changing the microcode
  - Can support complex instructions as a sequence of simple microinstructions
- Enables update of machine behavior
  - A buggy implementation of an instruction can be fixed by changing the microcode in the field

10 Multi-cycle vs. Single-cycle uArch
- Advantages
- Disadvantages
- You should be very familiar with this right now

11 Microprogrammed vs. Hardwired Control
- Advantages
- Disadvantages
- You should be very familiar with this right now

12 Can We Do Better?
- What limitations do you see with the multi-cycle design?
- Limited concurrency
  - Some hardware resources are idle during different phases of the instruction processing cycle
  - Fetch logic is idle when an instruction is being decoded or executed
  - Most of the datapath is idle when a memory access is happening

13 Can We Use the Idle Hardware to Improve Concurrency?
- Goal: More concurrency -> Higher instruction throughput (i.e., more work completed in one cycle)
- Idea: When an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction
  - E.g., when an instruction is being decoded, fetch the next instruction
  - E.g., when an instruction is being executed, decode another instruction
  - E.g., when an instruction is accessing data memory (ld/st), execute the next instruction
  - E.g., when an instruction is writing its result into the register file, access data memory for the next instruction

14 Pipelining

15 Pipelining: Basic Idea
- More systematically:
  - Idea: Pipeline the execution of multiple instructions
  - Analogy: Assembly line processing of instructions
  - Divide the instruction processing cycle into distinct stages of processing
  - Ensure there are enough hardware resources to process one instruction in each stage
  - Process a different instruction in each stage
    - Instructions consecutive in program order are processed in consecutive stages
- Benefit: Increases instruction processing throughput (1/CPI)
- Downside: Start thinking about this

16 Example: Execution of Four Independent ADDs
- Multi-cycle: 4 cycles per instruction
    F D E W F D E W F D E W F D E W
    Time ->
- Pipelined: 4 cycles per 4 instructions (steady state)
    F D E W
      F D E W
        F D E W
          F D E W
    Time ->
- Is life always this beautiful?
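
The cycle counts behind this example can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are made up for illustration, and it assumes every stage takes exactly one cycle and that no instruction ever stalls.

```python
def multi_cycle_total(num_instructions: int, num_stages: int) -> int:
    """Each instruction runs through all of its stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_total(num_instructions: int, num_stages: int) -> int:
    """The first instruction fills the pipeline; each later one finishes one cycle after it."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    # Four independent ADDs on a 4-stage (F, D, E, W) machine.
    print("multi-cycle:", multi_cycle_total(4, 4), "cycles")
    print("pipelined:  ", pipelined_total(4, 4), "cycles")
```

For the four ADDs this gives 16 cycles multi-cycle and 7 cycles pipelined. The 7 includes the cycles spent filling the pipeline; once it is full (steady state), one instruction completes every cycle.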

17 UNDERSTANDING PIPELINE

18 The Laundry Analogy
[Figure: laundry loads A, B, C, D (task order) plotted against time of day, done one after another. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]
- place one dirty load of clothes in the washer
- when the washer is finished, place the wet load in the dryer
- when the dryer is finished, take out the dry load and fold
- when folding is finished, ask your roommate (??) to put the clothes away
  - steps to do a load are sequentially dependent
  - no dependence between different loads
  - different steps do not share resources

19 Pipelining Multiple Loads of Laundry
[Figure: loads A, B, C, D overlapped in time, each starting as soon as the washer is free. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]
- 4 loads of laundry in parallel
- no additional resources
- throughput increased by 4
- latency per load is the same

20 Pipelining Multiple Loads of Laundry: In Practice
[Figure: overlapped loads A, B, C, D with steps of unequal length. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]
- the slowest step decides throughput

21 Pipelining Multiple Loads of Laundry: In Practice
[Figure: overlapped loads A, B, C, D alternating between two dryers. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]
- throughput restored (2 loads per hour) using 2 dryers

22 PERFORMING PIPELINE

23 An Ideal Pipeline
- Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing cycle?

24 Ideal Pipelining
- 1 stage: combinational logic (F,D,E,M,W), T psec; BW = ~(1/T)
- 2 stages: T/2 ps (F,D,E) | T/2 ps (M,W); BW = ~(2/T)
- 3 stages: T/3 ps (F,D) | T/3 ps (E,M) | T/3 ps (M,W); BW = ~(3/T)

25 More Realistic Pipeline: Throughput
- Nonpipelined version with delay T (T ps of combinational logic)
  - $BW = 1/(T + S)$ where S = latch delay
- k-stage pipelined version (k stages of T/k ps each)
  - $BW_{k\text{-stage}} = 1/(T/k + S)$
  - $BW_{max} = 1/(1\ \text{gate delay} + S)$
- Latch delay reduces throughput (switching overhead b/w stages)
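
To see how the latch delay S limits the benefit of deeper pipelines, the formula BW = 1/(T/k + S) can be evaluated directly. The sketch below is only an illustration: the function name and the example values T = 800 ps and S = 20 ps are assumptions, not numbers taken from this slide.

```python
def throughput(total_delay_ps: float, latch_delay_ps: float, stages: int = 1) -> float:
    """Operations per picosecond for a k-stage pipeline: 1 / (T/k + S)."""
    if stages < 1:
        raise ValueError("stages must be >= 1")
    return 1.0 / (total_delay_ps / stages + latch_delay_ps)


if __name__ == "__main__":
    T, S = 800.0, 20.0  # assumed example values, in picoseconds
    base = throughput(T, S, stages=1)
    for k in (1, 2, 5, 10, 40):
        speedup = throughput(T, S, stages=k) / base
        print(f"k={k:2d}  speedup over nonpipelined = {speedup:.2f}")
```

With these assumed numbers the speedup stays below k for every k > 1 and flattens out as T/k shrinks toward S, which is the switching overhead the slide points at.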

26 More Realistic Pipeline: Cost
- Nonpipelined version with combinational cost G (G gates)
  - $Cost = G + L$ where L = latch cost
- k-stage pipelined version (k stages of G/k gates each)
  - $Cost_{k\text{-stage}} = G + Lk$
- Latches increase hardware cost
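
The cost model is linear in the number of stages, which a short sketch makes concrete. The function name and the gate and latch counts below are hypothetical values chosen only for illustration.

```python
def pipeline_cost(gate_cost: float, latch_cost: float, stages: int = 1) -> float:
    """Hardware cost of a k-stage pipeline: G + L*k (one set of latches per stage)."""
    if stages < 1:
        raise ValueError("stages must be >= 1")
    return gate_cost + latch_cost * stages


if __name__ == "__main__":
    G, L = 10_000, 500  # assumed example costs, in gate equivalents
    for k in (1, 2, 5, 10):
        print(f"k={k:2d}  cost = {pipeline_cost(G, L, k)}")
```

The combinational logic G is paid once regardless of k; only the latch term grows as the pipeline gets deeper.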

27 Pipelining Instruction Processing

28 Remember: The Instruction Processing Cycle
- Six phases: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result
- Mapped onto five steps:
  1. Instruction fetch (IF)
  2. Instruction decode and register operand fetch (ID/RF)
  3. Execute/Evaluate memory address (EX/AG)
  4. Memory operand fetch (MEM)
  5. Store/writeback result (WB)

29 Remember the Single-Cycle uArch
[Figure: single-cycle datapath and control: PC, instruction memory, register file, sign extend, ALU and ALU control, data memory, and a control unit producing RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]
- Total delay T; BW = ~(1/T)

30 PIPELINE DATA PATH

31 Dividing Into Stages
- IF: Instruction fetch (200ps)
- ID: Instruction decode/register file read (100ps)
- EX: Execute/address calculation (200ps)
- MEM: Memory access (200ps)
- WB: Write back (100ps)
- RF write: ignore for now
[Figure: the single-cycle datapath divided into the five stages. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]

32 Instruction Pipeline Throughput
[Figure: program execution order vs. time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), each passing through instruction fetch, Reg, ALU, data access, Reg. Non-pipelined: a new instruction starts every 800ps. Pipelined: a new instruction starts every 200ps.]
- 5-stage speedup is 4, not 5 as predicted by the ideal model. Why?
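
One way to answer the question is to compute the clock period each design needs. The sketch below assumes the per-stage latencies are 200, 100, 200, 200, and 100 ps (values consistent with the 800ps and 200ps figures above) and ignores latch overhead; the names are illustrative.

```python
# Assumed per-stage latencies in picoseconds.
STAGE_LATENCY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_time(latencies: dict) -> int:
    """Without pipelining, one instruction takes the sum of all stage latencies."""
    return sum(latencies.values())


def pipelined_cycle_time(latencies: dict) -> int:
    """Every stage shares one clock, so the slowest stage sets the period."""
    return max(latencies.values())


def steady_state_speedup(latencies: dict) -> float:
    return single_cycle_time(latencies) / pipelined_cycle_time(latencies)


if __name__ == "__main__":
    print("non-pipelined period:", single_cycle_time(STAGE_LATENCY_PS), "ps")
    print("pipelined period:    ", pipelined_cycle_time(STAGE_LATENCY_PS), "ps")
    print("speedup:             ", steady_state_speedup(STAGE_LATENCY_PS))
```

Under these assumptions the speedup is 800/200 = 4: the two 100 ps stages still occupy a full 200 ps clock, so the stages are not uniform and the ideal factor of 5 is not reached.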

33 Enabling Pipelined Processing: Pipeline Registers
- Stages: IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back
- Pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB sit between the stages; each stage takes T/k ps
- No resource is used by more than 1 stage!
[Figure: pipelined datapath with pipeline registers holding per-stage values (PC, IR, register operands A and B, immediate, ALU output, memory data). Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]

34 Pipeline Performance Impact
- All instruction classes must follow the same path and timing through the pipeline stages (compare lw, add)
- Any performance impact?
[Figure: an lw moving through the pipelined datapath: instruction fetch, instruction decode, execution, memory, write back. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]

35 Pipelined Operation Example
[Figure: lw $10, 20($1) followed by sub $11, $2, $3 moving through the pipelined datapath over clocks 1 to 6. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]
- Is life always this beautiful?

36 Illustrating Pipeline Operation: Operation View
        t0   t1   t2   t3   t4   t5
Inst0   IF   ID   EX   MEM  WB
Inst1        IF   ID   EX   MEM  WB
Inst2             IF   ID   EX   MEM
Inst3                  IF   ID   EX
Inst4                       IF   ID
                                 IF
- steady state (full pipeline) from t4 onward
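
The operation view follows one rule: instruction i occupies stage number t - i at cycle t. A minimal sketch of that rule follows, assuming one instruction enters per cycle and nothing stalls; the function names are made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def operation_view(num_instructions: int, num_cycles: int) -> list:
    """rows[i][t] is the stage instruction i occupies at cycle t, or '' if none."""
    rows = []
    for i in range(num_instructions):
        row = []
        for t in range(num_cycles):
            stage_index = t - i  # instruction i enters IF at cycle i
            in_pipeline = 0 <= stage_index < len(STAGES)
            row.append(STAGES[stage_index] if in_pipeline else "")
        rows.append(row)
    return rows


def render(rows: list) -> str:
    """Format the table as plain text, one row per instruction."""
    num_cycles = len(rows[0]) if rows else 0
    lines = ["      " + "".join(f"t{t:<4}" for t in range(num_cycles))]
    for i, row in enumerate(rows):
        lines.append(f"Inst{i} " + "".join(f"{cell:<5}" for cell in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(operation_view(num_instructions=5, num_cycles=6)))
```

Reading a column of the result top to bottom shows which instruction each stage holds in that cycle; at t4 all five stages are busy, which is the steady state marked on the slide.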

37 Control Points in a Pipeline
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) annotated with the control points PCSrc, RegWrite, Branch, ALUSrc, ALUOp, RegDst, MemWrite, MemRead, MemtoReg. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]
- Identical set of control points as the single-cycle datapath!!

38 PIPELINE CONTROL SIGNALS

39 Control Signals in a Pipeline
- For a given instruction
  - same control signals as single-cycle, but
  - control signals required at different cycles, depending on stage
  - Option 1: decode once using the same logic as single-cycle and buffer signals until consumed
  - Option 2: carry relevant instruction word/field down the pipeline and decode locally within each or in a previous stage
- Which one is better?
[Figure: the Control block produces EX, M, and WB signal groups that are buffered through ID/EX, EX/MEM, and MEM/WB until the stage that consumes them.]

40 Pipelined Control Signals
[Figure: pipelined datapath with the Control block in ID and the EX, M, and WB control groups carried in the ID/EX, EX/MEM, and MEM/WB pipeline registers. Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]]

41 PIPELINE ISSUES

42 Remember: An Ideal Pipeline
- Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing cycle?

43 Instruction Pipeline: Not An Ideal Pipeline
- Identical operations ... NOT!
  - different instructions -> not all need the same stages
  - Forcing different instructions to go through the same pipe stages -> external fragmentation (some pipe stages idle for some instructions)
- Uniform suboperations ... NOT!
  - different pipeline stages -> not the same latency
  - Need to force each stage to be controlled by the same clock -> internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)
- Independent operations ... NOT!
  - instructions are not independent of each other
  - Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results -> pipeline stalls (pipeline is not always moving)

44 Issues in Pipeline Design
- Balancing work in pipeline stages
  - How many stages and what is done in each stage
- Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
  - Handling dependences
    - Data
    - Control
  - Handling resource contention
  - Handling long-latency (multi-cycle) operations
- Handling exceptions, interrupts
- Advanced: Improving pipeline throughput
  - Minimizing stalls

45 Pipeline Stalls
- Stall: A condition when the pipeline stops moving
- Resource contention
- Dependences (between instructions)
  - Data
  - Control
- Long-latency (multi-cycle) operations

46 DEPENDENCES

47 A computer program
- Following the von Neumann model, the program is a sequence of instructions
    L: mov AX, 3
       add a, b
       .
       jmp L

48 Dependences and Their Types
- Also called dependency or, less desirably, hazard
- Dependences dictate ordering requirements between instructions
- Two types
  - Data dependence
  - Control dependence
- Resource contention is sometimes called resource dependence

49 Handling Resource Contention
- Happens when instructions in two pipeline stages need the same resource
- Solution 1: Eliminate the cause of contention
  - Duplicate the resource or increase its throughput
    - E.g., use separate instruction and data memories (caches)
    - E.g., use multiple ports for memory structures
- Solution 2: Detect the resource contention and stall one of the contending stages
  - Which stage do you stall?
  - Example: What if you had a single read and write port for the register file?

50 UNDERSTANDING DEPENDENCES

51 Data Dependences
- Types of data dependences
  - Flow dependence (true data dependence, read after write)
    - Comes from the running program semantic
  - Anti dependence (write after read)
  - Output dependence (write after write)
- Which ones cause stalls in a pipelined machine?
  - For all of them, we need to ensure semantics of the program is correct
  - Flow dependences always need to be obeyed because they constitute true dependence on a value
  - Anti and output dependences exist due to limited number of architectural registers
    - They are dependence on a name, not a value

52 Data Dependence Types
- Flow dependence: Read-after-Write (RAW)
    r3 <- r1 op r2
    r5 <- r3 op r4
- Anti dependence: Write-after-Read (WAR)
    r3 <- r1 op r2
    r1 <- r4 op r5
- Output dependence: Write-after-Write (WAW)
    r3 <- r1 op r2
    r5 <- r3 op r4
    r3 <- r6 op r7
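
The three definitions reduce to comparing destination and source register names. The sketch below classifies the dependences of a later instruction on an earlier one; the `Instr` record and the function name are hypothetical, introduced only for this illustration.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str                # register written
    srcs: Tuple[str, ...]    # registers read


def dependences(earlier: Instr, later: Instr) -> set:
    """Return the dependence types that `later` has on `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")  # flow: later reads what earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")  # anti: later overwrites what earlier reads
    if later.dest == earlier.dest:
        found.add("WAW")  # output: both write the same register
    return found


if __name__ == "__main__":
    first = Instr("r3", ("r1", "r2"))                  # r3 <- r1 op r2
    print(dependences(first, Instr("r5", ("r3", "r4"))))  # r5 <- r3 op r4
    print(dependences(first, Instr("r1", ("r4", "r5"))))  # r1 <- r4 op r5
    print(dependences(first, Instr("r3", ("r6", "r7"))))  # r3 <- r6 op r7
```

Only the RAW case depends on a value flowing between the two instructions; the WAR and WAW cases are detected purely from register names, which matches the slide's point that they are dependences on a name.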

53 Example of output dependence
[Figure: the lw/sub pipelined operation example, clock by clock (instruction fetch, instruction decode, execution, memory, write back), with the sub changed so that the two instructions have an output dependence.]

54 Example of flow dependence
[Figure: the lw/sub pipelined operation example, clock by clock (instruction fetch, instruction decode, execution, memory, write back), with the sub changed so that it has a flow dependence on the lw.]

55 Control Dependence
- Question: What should the fetch PC be in the next cycle?
- Answer: The address of the next instruction
  - All instructions are control dependent on previous ones. Why?
- If the fetched instruction is a non-control-flow instruction:
  - Next Fetch PC is the address of the next-sequential instruction
  - Easy to determine if we know the size of the fetched instruction
- If the instruction that is fetched is a control-flow instruction:
  - How do we determine the next Fetch PC?
- In fact, how do we know whether or not the fetched instruction is a control-flow instruction?

56 DEPENDENCES DETECTION

57 Interlocking
- Detection of dependence between instructions in a pipelined processor to guarantee correct execution
- Software based interlocking vs. Hardware based interlocking
- MIPS acronym?
  - Microprocessor without Interlocked Pipeline Stages

58 Approaches to Dependence Detection (I)
- Scoreboarding
  - Each register in register file has a Valid bit associated with it
  - An instruction that is writing to the register resets the Valid bit
  - An instruction in Decode stage checks if all its source and destination registers are Valid
    - Yes: No need to stall -> No dependence
    - No: Stall the instruction
- Advantage:
  - Simple. 1 bit per register
- Disadvantage:
  - Need to stall for all types of dependences, not only flow dep.
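
The valid-bit scheme can be sketched as a small class. This is an illustrative model rather than a hardware description: the class and method names are assumptions, registers are identified by index, and the valid bit is set again when the writer reaches write back.

```python
class Scoreboard:
    """One Valid bit per register, as described on the slide."""

    def __init__(self, num_registers: int = 32):
        self.valid = [True] * num_registers

    def can_issue(self, srcs, dest=None) -> bool:
        """Decode-stage check: every source and the destination must be Valid."""
        regs = list(srcs)
        if dest is not None:
            regs.append(dest)
        return all(self.valid[r] for r in regs)

    def issue(self, dest=None) -> None:
        """An instruction that will write `dest` resets its Valid bit."""
        if dest is not None:
            self.valid[dest] = False

    def writeback(self, dest: int) -> None:
        """The value is now in the register file, so the register is Valid again."""
        self.valid[dest] = True


if __name__ == "__main__":
    sb = Scoreboard()
    sb.issue(dest=10)                          # e.g. a load writing register 10
    print(sb.can_issue(srcs=[2, 10], dest=11))  # reader of register 10 must stall
    print(sb.can_issue(srcs=[2, 3], dest=10))   # second writer of register 10 also stalls
    sb.writeback(10)
    print(sb.can_issue(srcs=[2, 10], dest=11))  # safe once the write has completed
```

Because the destination is checked along with the sources, a second writer to the same register stalls even though no value flows between the two instructions. That is the extra stalling the slide lists as the disadvantage of a single bit per register.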

59 Scoreboarding
[Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) used to illustrate scoreboarding.]

60 Not Stalling on Anti and Output Dependences
- What changes would you make to the scoreboard to enable this?
  - counter for writing operation, not just 0 and 1

61 Approaches to Dependence Detection (II)
- Combinational dependence check logic
  - Special logic that checks if any instruction in later stages is supposed to write to any source register of the instruction that is being decoded
  - Yes: stall the instruction/pipeline
  - No: no need to stall -> no flow dependence
- Advantage:
  - No need to stall on anti and output dependences
- Disadvantage:
  - Logic is more complex than a scoreboard
  - Logic becomes more complex as we make the pipeline deeper and wider (flash-forward: think superscalar execution)
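
The combinational check compares the decoding instruction's source registers against the destination of every instruction in a later stage. A minimal software sketch of that comparison follows; the function name and the list-of-destinations representation are assumptions made for illustration.

```python
def must_stall(decode_srcs, later_stage_dests) -> bool:
    """True if any instruction in a later stage will write a source of the
    instruction being decoded.

    later_stage_dests holds one entry per later pipeline stage (e.g. EX, MEM, WB):
    the destination register of the instruction there, or None if it writes nothing.
    """
    pending_writes = {dest for dest in later_stage_dests if dest is not None}
    return any(src in pending_writes for src in decode_srcs)


if __name__ == "__main__":
    # Instruction in EX writes register 10; MEM and WB hold non-writing instructions.
    print(must_stall(decode_srcs=[10, 2], later_stage_dests=[10, None, None]))
    # No later-stage instruction writes register 2 or 3.
    print(must_stall(decode_srcs=[2, 3], later_stage_dests=[10, None, 7]))
```

Only source registers are compared, so an instruction whose destination matches a pending write is not stalled. The number of comparisons grows with the number of sources times the number of later stages, which hints at why the logic gets more complex in deeper and wider pipelines.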

62 DATA DEPENDENCE HANDLING

63 Once You Detect the Dependence in Hardware
- What do you do afterwards?
- Observation: Dependence between two instructions is detected before the communicated data value becomes available

64 How to Handle Data Dependences
- Anti and output dependences are easier to handle
  - write to the destination in one stage and in program order
- Flow dependences are more interesting
- Five fundamental ways of handling flow dependences
  1. Detect and wait until value is available in register file
  2. Detect and forward/bypass data to dependent instruction
  3. Detect and eliminate the dependence at the software level
     - No need for the hardware to detect dependence (MIPS NOP)
  4. Do something else (same program: reorder), (different program: fine-grained multithreading)
     - No need to detect
  5. Predict the needed value(s), execute speculatively, and verify

65 Right place to eliminate dependency
- Which one of the following flow dependences lead to conflicts in the 5-stage pipeline?
    addi r - -    IF  ID  EX  MEM WB
    addi - r -        IF  ID  EX  MEM WB
    addi - r -            IF  ID  EX  MEM
    addi - r -                IF  ID  EX
    addi - r -                    IF? ID
    addi - r -                        IF

66 Safe and Unsafe Movement of Pipeline
              RAW Dependence          WAR Dependence          WAW Dependence
    stage X   j: _ <- rk (Reg Read)   j: rk <- _ (Reg Write)  j: rk <- _ (Reg Write)
              i -F-> j                i -A-> j                i -O-> j
    stage Y   i: rk <- _ (Reg Write)  i: _ <- rk (Reg Read)   i: rk <- _ (Reg Write)
- dist(i,j) <= dist(X,Y) -> Unsafe to keep j moving
- dist(i,j) > dist(X,Y) -> Safe

67 RAW Dependence Analysis Example
           R/I-Type   LW         SW         Br         J    Jr
    IF
    ID     read RF    read RF    read RF    read RF         read RF
    EX
    MEM
    WB     write RF   write RF
- Instructions I_A and I_B (where I_A comes before I_B) have RAW dependence iff
  - I_B (R/I, LW, SW, Br or JR) reads a register written by I_A (R/I or LW)
  - dist(I_A, I_B) <= dist(ID, WB) = 3
- What about WAW and WAR dependence?
- What about memory data dependence?

68 Pipeline Stall: Resolving Data Dependence
[Figure: pipeline diagram over cycles t0 to t5 for Inst h, i, j, k, l through IF, ID, ALU, MEM, WB; Inst j is held in ID, and Inst k and l are held behind it, while Inst h and i continue.]
- i: rx <- _ and j: _ <- rx start at dist(i,j)=1; bubbles are inserted ahead of j, giving dist(i,j)=2, then 3, then 4
- Stall = make the dependent instruction wait until its source data value is available
  1. stop all up-stream stages
  2. drain all down-stream stages
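
The number of bubbles follows from the distance rule dist(i,j) <= dist(ID, WB) = 3 being unsafe. The sketch below assumes a pipeline with no forwarding, where a register written in WB cannot be read by an instruction in ID during the same cycle; the function and constant names are illustrative.

```python
DIST_ID_TO_WB = 3  # ID -> EX -> MEM -> WB in the 5-stage pipeline


def bubbles_needed(dist_i_j: int, dist_read_to_write: int = DIST_ID_TO_WB) -> int:
    """Bubbles to insert so that the reader j ends up more than
    dist_read_to_write instructions behind the writer i."""
    if dist_i_j < 1:
        raise ValueError("j must come after i")
    return max(0, dist_read_to_write + 1 - dist_i_j)


if __name__ == "__main__":
    for dist in (1, 2, 3, 4, 5):
        print(f"dist(i,j)={dist}: {bubbles_needed(dist)} bubble(s)")
```

A dependent instruction immediately after its producer (distance 1) needs three bubbles, and one that is already four or more instructions behind needs none.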

69 Sample Assembly (P&H)
    for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) {... }

             addi $s1, $s0, -1
    for2tst: slti $t0, $s1, 0
             bne  $t0, $zero, exit2
             sll  $t1, $s1, 2
             add  $t2, $a0, $t1
             lw   $t3, 0($t2)
             lw   $t4, 4($t2)
             slt  $t0, $t4, $t3
             beq  $t0, $zero, exit2
             ...
             addi $s1, $s1, -1
             j    for2tst
    exit2:
- 3 stalls (marked six times on the slide)
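
The six 3-stall marks can be reproduced by walking the loop body and holding each instruction in decode until every register it reads was produced at least four cycles earlier. This is a sketch under stated assumptions: the register numbers are reconstructed from the P&H sort example, the pipeline has no forwarding, and control dependences (the branches) are ignored.

```python
MIN_SAFE_DISTANCE = 4  # a reader must decode at least 4 cycles after its producer did

# (text, destination register or None, source registers)
PROGRAM = [
    ("addi $s1, $s0, -1",      "$s1", ["$s0"]),
    ("slti $t0, $s1, 0",       "$t0", ["$s1"]),
    ("bne  $t0, $zero, exit2", None,  ["$t0", "$zero"]),
    ("sll  $t1, $s1, 2",       "$t1", ["$s1"]),
    ("add  $t2, $a0, $t1",     "$t2", ["$a0", "$t1"]),
    ("lw   $t3, 0($t2)",       "$t3", ["$t2"]),
    ("lw   $t4, 4($t2)",       "$t4", ["$t2"]),
    ("slt  $t0, $t4, $t3",     "$t0", ["$t4", "$t3"]),
    ("beq  $t0, $zero, exit2", None,  ["$t0", "$zero"]),
]


def count_stalls(program) -> list:
    """Return the number of stall cycles each instruction spends waiting in decode."""
    producer_cycle = {}  # register -> decode cycle of its most recent writer
    stalls = []
    previous_decode = 0
    for _text, dest, srcs in program:
        earliest = previous_decode + 1
        ready = max(
            (producer_cycle[s] + MIN_SAFE_DISTANCE for s in srcs if s in producer_cycle),
            default=earliest,
        )
        decode = max(earliest, ready)
        stalls.append(decode - earliest)
        if dest is not None:
            producer_cycle[dest] = decode
        previous_decode = decode
    return stalls


if __name__ == "__main__":
    for (text, _dest, _srcs), n in zip(PROGRAM, count_stalls(PROGRAM)):
        print(f"{text:<26} {n} stall(s)")
```

Under these assumptions six instructions each wait 3 cycles, one for every back-to-back flow dependence in the sequence, for 18 stall cycles across these nine instructions.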

70 Readings
- P&H Chapter
- Smith and Sohi, The Microarchitecture of Superscalar Processors, Proceedings of the IEEE, 1995
  - More advanced pipelining
  - Interrupt and exception handling
  - Out-of-order and superscalar execution concepts


More information
