Computer Architecture. Lecture 6: Pipelining
1 Computer Architecture Lecture 6: Pipelining. Dr. Ahmed Sallam. Based on original slides by Prof. Onur Mutlu.
2 Agenda for Today & Next Few Lectures: Single-cycle microarchitectures. Multi-cycle and microprogrammed microarchitectures. Pipelining. Issues in pipelining: control and data dependence handling, state maintenance and recovery, ... Out-of-order execution. Issues in OoO execution: load-store handling, ...
3 Recap of Last Lecture: Multi-cycle and microprogrammed microarchitectures. Benefits vs. design principles. When to generate control signals. Microprogrammed control: microinstruction, microsequencer, control store. LC-3b state machine, datapath, control structure. An exercise in microprogramming. Variable-latency memory, alignment, memory-mapped I/O, ... Microprogramming: power of abstraction (for the HW designer), advantages of microprogrammed control, update of machine behavior.
4 Review: A Simple LC-3b Control and Datapath
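5 Simple Design of the Control Structure. [Figure: the microsequencer takes R, IR[15:11], and BEN and produces a 6-bit control store address. Each microinstruction read from the control store supplies 26 control bits and 9 bits (J, COND, IRD) that feed back to the microsequencer.]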
6 A Simple Datapath Can Become Very Powerful
7 [Figure C.2: A state machine for the LC-3b.] Notes: B+off6: Base + SEXT[offset6]. PC+off9: PC + SEXT[offset9]. *OP2 may be SR2 or SEXT[imm5]. ** [15:8] or [7:0] depending on MAR[0].
8 Review: The Power of Abstraction. The concept of a control store of microinstructions gives the hardware designer a new abstraction: microprogramming. The designer can translate any desired operation to a sequence of microinstructions. All the designer needs to provide is: (1) the sequence of microinstructions needed to implement the desired operation; (2) the ability for the control logic to correctly sequence through the microinstructions; (3) any additional datapath elements and control signals needed (none are needed if the operation can be translated into existing control signals).
9 Review: Advantages of Microprogrammed Control. Allows a very simple design to do powerful computation by controlling the datapath (using a sequencer). High-level ISA translated into microcode (sequence of μ-instructions). Microcode (μ-code) enables a minimal datapath to emulate an ISA. Microinstructions can be thought of as a user-invisible ISA (μ-ISA). Enables easy extensibility of the ISA: can support a new instruction by changing the microcode, and can support complex instructions as a sequence of simple microinstructions. Enables update of machine behavior: a buggy implementation of an instruction can be fixed by changing the microcode in the field.
10 Multi-cycle vs. Single-Cycle uArch: Advantages. Disadvantages. You should be very familiar with this right now.
11 Microprogrammed vs. Hardwired Control: Advantages. Disadvantages. You should be very familiar with this right now.
12 Can We Do Better? What limitations do you see with the multi-cycle design? Limited concurrency: some hardware resources are idle during different phases of the instruction processing cycle. Fetch logic is idle when an instruction is being decoded or executed. Most of the datapath is idle when a memory access is happening.
13 Can We Use the Idle Hardware to Improve Concurrency? Goal: more concurrency, and so higher instruction throughput (i.e., more work completed in one cycle). Idea: when an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction. E.g., when an instruction is being decoded, fetch the next instruction. E.g., when an instruction is being executed, decode another instruction. E.g., when an instruction is accessing data memory (ld/st), execute the next instruction. E.g., when an instruction is writing its result into the register file, access data memory for the next instruction.
14 Pipelining
15 Pipelining: Basic Idea. More systematically: pipeline the execution of multiple instructions. Analogy: assembly line processing of instructions. Idea: divide the instruction processing cycle into distinct stages of processing. Ensure there are enough hardware resources to process one instruction in each stage. Process a different instruction in each stage: instructions consecutive in program order are processed in consecutive stages. Benefit: increases instruction processing throughput (1/CPI). Downside: start thinking about this.
16 Example: Execution of Four Independent ADDs. Stages: F, D, E, W.
Multi-cycle: 4 cycles per instruction.
    F D E W F D E W F D E W F D E W    (time ->)
Pipelined: 4 cycles per 4 instructions (steady state).
    F D E W
      F D E W
        F D E W
          F D E W                      (time ->)
Is life always this beautiful?
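The cycle counts on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes an ideal pipeline (one cycle per stage, no stalls); the function names are illustrative and do not come from the slides.

```python
def multicycle_cycles(n_instr, n_stages=4):
    """Each instruction occupies the whole machine for n_stages cycles."""
    return n_instr * n_stages


def pipelined_cycles(n_instr, n_stages=4):
    """The first instruction takes n_stages cycles to drain through the pipe;
    after that, one instruction completes every cycle (assumes no stalls)."""
    if n_instr == 0:
        return 0
    return n_stages + (n_instr - 1)


for n in (4, 100, 10_000):
    mc, pl = multicycle_cycles(n), pipelined_cycles(n)
    print(f"{n} instructions: multi-cycle {mc}, pipelined {pl}, ratio {mc / pl:.2f}")
```

As the instruction count grows, the ratio approaches the number of stages, which is the steady-state figure the slide quotes.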
17 UNDERSTANDING PIPELINE
18 The Laundry Analogy. Place one dirty load of clothes in the washer. When the washer is finished, place the wet load in the dryer. When the dryer is finished, take out the dry load and fold. When folding is finished, ask your roommate (??) to put the clothes away. Steps to do a load are sequentially dependent. There is no dependence between different loads. Different steps do not share resources. [Figure: loads A, B, C, D in task order, processed one after another along a time axis.] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
19 Pipelining Multiple Loads of Laundry. [Figure: loads A, B, C, D overlapped in time.] 4 loads of laundry in parallel. No additional resources. Throughput increased by 4. Latency per load is the same.
20 Pipelining Multiple Loads of Laundry: In Practice. [Figure: loads A, B, C, D overlapped in time, with one step slower than the others.] The slowest step decides throughput.
21 Pipelining Multiple Loads of Laundry: In Practice. [Figure: loads A, B, C, D overlapped in time, with two dryers A and B.] Throughput restored (2 loads per hour) using 2 dryers.
22 PERFORMING PIPELINE
23 An Ideal Pipeline. Goal: increase throughput with little increase in cost (hardware cost, in case of instruction processing). Repetition of identical operations: the same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps). Repetition of independent operations: no dependencies between repeated operations. Uniformly partitionable suboperations: processing can be evenly divided into uniform-latency suboperations (that do not share resources). Fitting examples: automobile assembly line, doing laundry. What about the instruction processing cycle?
24 Ideal Pipelining.
    1 stage:  combinational logic (F,D,E,M,W), T ps                BW ≈ 1/T
    2 stages: T/2 ps (F,D,E) | T/2 ps (M,W)                        BW ≈ 2/T
    3 stages: T/3 ps (F,D) | T/3 ps (E,M) | T/3 ps (M,W)           BW ≈ 3/T
25 More Realistic Pipeline: Throughput. Nonpipelined version with delay T: BW = 1/(T + S), where S = latch delay. k-stage pipelined version (T/k ps per stage): BW_k-stage = 1/(T/k + S). BW_max = 1/(1 gate delay + S). Latch delay reduces throughput (switching overhead between stages).
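To see how the latch delay S limits the benefit of deeper pipelines, the formula BW = 1/(T/k + S) can be evaluated for a few values of k. This is a minimal sketch: the numbers T = 800 ps and S = 20 ps are assumed for illustration and are not taken from the slides.

```python
def bandwidth(T, S, k=1):
    """Throughput of a k-stage pipeline: 1 / (T/k + S).
    T is the total combinational delay, S the latch delay (same time unit)."""
    return 1.0 / (T / k + S)


T, S = 800.0, 20.0  # assumed example values, in picoseconds
base = bandwidth(T, S)  # nonpipelined: 1 / (T + S)
for k in (1, 2, 5, 10, 40):
    print(f"k={k:2d}  speedup over nonpipelined = {bandwidth(T, S, k) / base:.2f}")
```

With S = 0 the speedup would equal k exactly; with a nonzero S the speedup falls further below k as the pipeline gets deeper, and the throughput can never exceed 1/S.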
26 More Realistic Pipeline: Cost. Nonpipelined version with combinational cost G (G gates): Cost = G + L, where L = latch cost. k-stage pipelined version (G/k gates per stage): Cost_k-stage = G + L*k. Latches increase hardware cost.
27 Pipelining Instruction Processing
28 Remember: The Instruction Processing Cycle. Six phases: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result. Mapped to five stages:
    1. Instruction fetch (IF)
    2. Instruction decode and register operand fetch (ID/RF)
    3. Execute/Evaluate memory address (EX/AG)
    4. Memory operand fetch (MEM)
    5. Store/writeback result (WB)
29 Remember the Single-Cycle Uarch. [Figure: single-cycle MIPS datapath with its control unit; the whole datapath has delay T, so BW ≈ 1/T.]
30 PIPELINE DATA PATH
31 Dividing Into Stages. IF: instruction fetch (200 ps). ID: instruction decode/register file read (100 ps). EX: execute/address calculation (200 ps). MEM: memory access (200 ps). WB: write back (100 ps). The RF write path is ignored for now. [Figure: single-cycle datapath divided into the five stages.]
32 Pipeline Throughput. [Figure: three loads, lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), shown in program execution order. Nonpipelined, each instruction (instruction fetch, Reg, ALU, data access, Reg) takes 800 ps, so a new instruction starts every 800 ps. Pipelined, every stage takes 200 ps, so a new instruction starts every 200 ps.] The 5-stage speedup is 4, not 5 as predicted by the ideal model. Why?
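The speedup of 4 can be checked numerically. The sketch below assumes the stage latencies from the P&H figure this slide is based on (200, 100, 200, 200, 100 ps); the function name is illustrative.

```python
# Assumed per-stage latencies in picoseconds (from the P&H example).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def pipeline_speedup(latencies):
    """Steady-state speedup of a pipelined design over a single-cycle one.
    The single-cycle clock must cover the sum of all stage latencies;
    the pipelined clock must cover only the slowest stage."""
    latencies = list(latencies)
    single_cycle_clock = sum(latencies)
    pipelined_clock = max(latencies)
    return single_cycle_clock / pipelined_clock


print(pipeline_speedup(STAGE_PS.values()))
```

Because the ID and WB stages need only 100 ps but are clocked at 200 ps, half of each of those cycles is wasted, which is why the speedup is 800/200 rather than 5.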
33 Enabling Pipelined Processing: Pipeline Registers. IF: instruction fetch. ID: instruction decode/register file read. EX: execute/address calculation. MEM: memory access. WB: write back. No resource is used by more than 1 stage! [Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB holding PC_F, IR_D, PC_D+4, A_E, B_E, Imm_E, PC_E+4, nPC_M, Aout_M, B_M, MDR_W, and Aout_W. The total delay T is split into stages of T/k ps each.]
34 Pipeline performance impact. All instruction classes must follow the same path and timing through the pipeline stages (compare lw, add). Any performance impact? [Figure: lw moving through the pipelined datapath: instruction fetch, instruction decode, execution, memory, write back.]
35 Pipelined Operation Example. [Figure: lw $10, 20($1) followed by sub $11, $2, $3 moving through the pipelined datapath over clocks 1 to 6; each instruction passes through instruction fetch, instruction decode, execution, memory, and write back, with sub one stage behind lw.] Is life always this beautiful?
36 Illustrating Pipeline Operation: Operation View.
            t0   t1   t2   t3   t4   t5   ...
    Inst0   IF   ID   EX   MEM  WB
    Inst1        IF   ID   EX   MEM  WB
    Inst2             IF   ID   EX   MEM  ...
    Inst3                  IF   ID   EX   ...
    Inst4                       IF   ID   ...
Steady state (full pipeline) is reached at t4: from then on every cycle has one instruction in each of IF, ID, EX, MEM, and WB.
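The operation view can be generated mechanically: instruction i enters stage s in cycle i + s. The following sketch assumes an ideal five-stage pipeline with no stalls; `pipeline_table` and `render` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_table(n_instr, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c
    (empty string if the instruction is not in the pipeline in that cycle)."""
    n_cycles = n_instr + len(stages) - 1
    rows = []
    for i in range(n_instr):
        row = [""] * n_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s at cycle i + s
        rows.append(row)
    return rows


def render(rows):
    """Format the table as plain text with a cycle header."""
    header = "      " + "".join(f"t{c:<4}" for c in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"Inst{i} " + "".join(f"{cell:<5}" for cell in row))
    return "\n".join(lines)


print(render(pipeline_table(5)))
```

Reading a single column of the output shows which instruction occupies each stage in that cycle; the first column in which all five stages are busy is the start of steady state.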
37 Control Points in a Pipeline. [Figure: pipelined datapath with control points PCSrc, RegWrite, Branch, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, and MemtoReg.] Identical set of control points as the single-cycle datapath!!
38 PIPELINE CONTROL SIGNALS
39 Control Signals in a Pipeline. For a given instruction: same control signals as single-cycle, but the control signals are required at different cycles, depending on the stage. Option 1: decode once using the same logic as single-cycle and buffer the signals until consumed. [Figure: the control unit produces EX, M, and WB signal groups that are carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers.] Option 2: carry the relevant instruction word/field down the pipeline and decode locally within each stage or in a previous stage. Which one is better?
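Option 1 can be sketched as a decode table plus pipeline registers that each drop the signal group already consumed. This is a simplified illustration, not the lecture's implementation: the `Control` class, the `DECODE` table, and `step` are hypothetical names, only two opcodes are modeled, and ALUOp is left out for brevity. The signal values follow the standard P&H single-cycle control table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    ex: dict   # signals consumed in the EX stage
    mem: dict  # signals consumed in the MEM stage
    wb: dict   # signals consumed in the WB stage


# Decode once, in ID, with the same truth table as the single-cycle design.
DECODE = {
    "lw": Control(ex={"RegDst": 0, "ALUSrc": 1},
                  mem={"Branch": 0, "MemRead": 1, "MemWrite": 0},
                  wb={"RegWrite": 1, "MemtoReg": 1}),
    "add": Control(ex={"RegDst": 1, "ALUSrc": 0},
                   mem={"Branch": 0, "MemRead": 0, "MemWrite": 0},
                   wb={"RegWrite": 1, "MemtoReg": 0}),
}


def step(regs, opcode):
    """Advance one clock. Each pipeline register keeps only the signal
    groups that later stages still need. opcode=None models a bubble."""
    id_ex = regs["ID/EX"]
    ex_mem = regs["EX/MEM"]
    return {
        "ID/EX": DECODE[opcode] if opcode else None,
        "EX/MEM": (id_ex.mem, id_ex.wb) if id_ex else None,
        "MEM/WB": ex_mem[1] if ex_mem else None,
    }


regs = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
for op in ("lw", "add", None, None):
    regs = step(regs, op)
    print(regs["MEM/WB"])
```

The trade-off against Option 2 is storage versus logic: buffering costs a few latch bits per stage, while local decoding repeats decode logic but carries only the instruction field.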
40 Pipelined Control Signals. [Figure: pipelined datapath in which the control unit's EX, M, and WB signal groups travel through the ID/EX, EX/MEM, and MEM/WB registers and drive PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg.]
41 PIPELINE ISSUES
42 Remember: An Ideal Pipeline. Goal: increase throughput with little increase in cost (hardware cost, in case of instruction processing). Repetition of identical operations: the same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps). Repetition of independent operations: no dependencies between repeated operations. Uniformly partitionable suboperations: processing can be evenly divided into uniform-latency suboperations (that do not share resources). Fitting examples: automobile assembly line, doing laundry. What about the instruction processing cycle?
43 Instruction Pipeline: Not An Ideal Pipeline. Identical operations ... NOT! Different instructions do not all need the same stages. Forcing different instructions to go through the same pipe stages causes external fragmentation (some pipe stages idle for some instructions). Uniform suboperations ... NOT! Different pipeline stages do not have the same latency. Forcing each stage to be controlled by the same clock causes internal fragmentation (some pipe stages are too fast but all take the same clock cycle time). Independent operations ... NOT! Instructions are not independent of each other. The need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results causes pipeline stalls (the pipeline is not always moving).
44 Issues in Pipeline Design. Balancing work in pipeline stages: how many stages and what is done in each stage. Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow: handling dependences (data, control), handling resource contention, handling long-latency (multi-cycle) operations. Handling exceptions, interrupts. Advanced: improving pipeline throughput, minimizing stalls.
45 Pipeline Stalls. Stall: a condition when the pipeline stops moving. Causes: resource contention; dependences between instructions (data, control); long-latency (multi-cycle) operations.
46 DEPENDENCES
47 A computer program. Following the Von Neumann model, the program is a sequence of instructions:
    L: mov AX, 3
       add a, b
       ...
       jmp L
48 Dependences and Their Types. Also called "dependency" or, less desirably, "hazard". Dependences dictate ordering requirements between instructions. Two types: data dependence and control dependence. Resource contention is sometimes called resource dependence.
49 Handling Resource Contention. Happens when instructions in two pipeline stages need the same resource. Solution 1: eliminate the cause of contention. Duplicate the resource or increase its throughput. E.g., use separate instruction and data memories (caches). E.g., use multiple ports for memory structures. Solution 2: detect the resource contention and stall one of the contending stages. Which stage do you stall? Example: what if you had a single read and write port for the register file?
50 UNDERSTANDING DEPENDENCES
51 Data Dependences. Types of data dependences: flow dependence (true data dependence, read after write), which comes from the semantics of the running program; anti dependence (write after read); output dependence (write after write). Which ones cause stalls in a pipelined machine? For all of them, we need to ensure the semantics of the program are correct. Flow dependences always need to be obeyed because they constitute true dependence on a value. Anti and output dependences exist due to the limited number of architectural registers. They are dependences on a name, not a value.
52 Data Dependence Types.
    Flow dependence (Read-after-Write, RAW):
        r3 <- r1 op r2
        r5 <- r3 op r4
    Anti dependence (Write-after-Read, WAR):
        r3 <- r1 op r2
        r1 <- r4 op r5
    Output dependence (Write-after-Write, WAW):
        r3 <- r1 op r2
        r5 <- r3 op r4
        r3 <- r6 op r7
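The three dependence types reduce to set-membership checks on destination and source registers. The sketch below uses the register names from the three examples on this slide; the `(dest, sources)` tuple encoding and the function name are assumptions made for illustration.

```python
def classify(first, second):
    """Classify the data dependences from `first` to `second`, where `first`
    comes earlier in program order. Each instruction is (dest, [sources])."""
    d1, s1 = first
    d2, s2 = second
    kinds = []
    if d1 is not None and d1 in s2:
        kinds.append("RAW")  # flow: second reads what first writes
    if d2 is not None and d2 in s1:
        kinds.append("WAR")  # anti: second writes what first reads
    if d1 is not None and d1 == d2:
        kinds.append("WAW")  # output: both write the same register
    return kinds


i1 = ("r3", ["r1", "r2"])              # r3 <- r1 op r2
print(classify(i1, ("r5", ["r3", "r4"])))  # r5 <- r3 op r4
print(classify(i1, ("r1", ["r4", "r5"])))  # r1 <- r4 op r5
print(classify(i1, ("r3", ["r6", "r7"])))  # r3 <- r6 op r7
```

Only the first check compares a written register against a read register of a later instruction, which is why only flow dependences carry an actual value between instructions.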
53 Example of output dependence. [Figure: an lw followed by a sub that writes the same destination register, stepped through the five-stage pipelined datapath (instruction fetch, instruction decode, execution, memory, write back) over clocks 1 to 6.]
54 Example of flow dependence. [Figure: an lw followed by a sub that reads the register the lw loads, stepped through the five-stage pipelined datapath (instruction fetch, instruction decode, execution, memory, write back) over clocks 1 to 6.]
55 Control Dependence. Question: What should the fetch PC be in the next cycle? Answer: The address of the next instruction. All instructions are control dependent on previous ones. Why? If the fetched instruction is a non-control-flow instruction: the next fetch PC is the address of the next-sequential instruction, which is easy to determine if we know the size of the fetched instruction. If the instruction that is fetched is a control-flow instruction: how do we determine the next fetch PC? In fact, how do we know whether or not the fetched instruction is a control-flow instruction?
56 DEPENDENCES DETECTION
57 Interlocking. Detection of dependence between instructions in a pipelined processor to guarantee correct execution. Software-based interlocking vs. hardware-based interlocking. MIPS acronym? Microprocessor without Interlocked Pipeline Stages.
58 Approaches to Dependence Detection (I): Scoreboarding. Each register in the register file has a Valid bit associated with it. An instruction that is writing to the register resets the Valid bit. An instruction in the Decode stage checks if all its source and destination registers are Valid. Yes: no need to stall (no dependence). No: stall the instruction. Advantage: simple, 1 bit per register. Disadvantage: need to stall for all types of dependences, not only flow dependences.
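A one-bit-per-register scoreboard is small enough to sketch directly. The class and method names below are hypothetical, and the register numbers in the usage lines are chosen only for illustration.

```python
class Scoreboard:
    """One Valid bit per architectural register."""

    def __init__(self, n_regs=32):
        self.valid = [True] * n_regs

    def can_issue(self, dest, srcs):
        """Decode-stage check: all source AND destination registers Valid."""
        regs = list(srcs) + ([dest] if dest is not None else [])
        return all(self.valid[r] for r in regs)

    def issue(self, dest):
        """An instruction that will write `dest` resets its Valid bit."""
        if dest is not None:
            self.valid[dest] = False

    def writeback(self, dest):
        """The Valid bit is set again once the value is written."""
        if dest is not None:
            self.valid[dest] = True


sb = Scoreboard()
sb.issue(10)                       # an in-flight load will write r10
print(sb.can_issue(11, [2, 10]))   # reads r10: flow dependence
print(sb.can_issue(10, [2, 3]))    # writes r10: output dependence
sb.writeback(10)
print(sb.can_issue(11, [2, 10]))   # r10 is Valid again
```

Because the destination register is checked with the same bit as the sources, an instruction that only overwrites r10 stalls just like one that reads it, which is the disadvantage named on the slide.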
59 Scoreboarding. [Figure: five-stage pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.]
60 Not Stalling on Anti and Output Dependences. What changes would you make to the scoreboard to enable this? Use a counter of pending write operations per register, not just 0 and 1.
61 Approaches to Dependence Detection (II): Combinational dependence check logic. Special logic that checks if any instruction in later stages is supposed to write to any source register of the instruction that is being decoded. Yes: stall the instruction/pipeline. No: no need to stall (no flow dependence). Advantage: no need to stall on anti and output dependences. Disadvantage: logic is more complex than a scoreboard, and it becomes more complex as we make the pipeline deeper and wider (flash-forward: think superscalar execution).
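The combinational check compares the source registers of the decoding instruction against the destination registers held in later stages. A minimal sketch, assuming the later stages are EX, MEM, and WB and that a destination of `None` marks an instruction that writes no register; the function name is illustrative.

```python
def must_stall(decoding_srcs, later_stage_dests):
    """True if any instruction in a later stage will write a register that
    the instruction in Decode reads (a flow dependence).
    later_stage_dests: destination register per later stage, or None."""
    srcs = set(decoding_srcs)
    return any(d is not None and d in srcs for d in later_stage_dests)


# Instruction in Decode reads r2 and r10; EX holds a load that writes r10.
print(must_stall([2, 10], [10, None, None]))
# Instruction in Decode reads r2 and r3 and WRITES r10: only sources are
# compared, so the output dependence on r10 does not cause a stall.
print(must_stall([2, 3], [10, None, None]))
```

In hardware this is one comparator per (source register, later stage) pair, so the comparator count grows with both pipeline depth and issue width.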
62 DATA DEPENDENCE HANDLING
63 Once You Detect the Dependence in Hardware. What do you do afterwards? Observation: dependence between two instructions is detected before the communicated data value becomes available.
64 How to Handle Data Dependences. Anti and output dependences are easier to handle: write to the destination in one stage and in program order. Flow dependences are more interesting. Five fundamental ways of handling flow dependences:
    1. Detect and wait until the value is available in the register file.
    2. Detect and forward/bypass data to the dependent instruction.
    3. Detect and eliminate the dependence at the software level. No need for the hardware to detect dependence (MIPS NOP).
    4. Do something else (same program: reorder; different program: fine-grained multithreading). No need to detect.
    5. Predict the needed value(s), execute speculatively, and verify.
65 Right place to eliminate dependency. Which of the following flow dependences lead to conflicts in the 5-stage pipeline? The first addi writes register r; each later addi reads r.
    addi r, -, -    IF  ID  EX  MEM WB
    addi -, r, -        IF  ID  EX  MEM WB
    addi -, r, -            IF  ID  EX  MEM
    addi -, r, -                IF  ID  EX
    addi -, r, -                    IF  ID  ?
    addi -, r, -                        IF
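66 Safe and Unsafe Movement of Pipeline. The younger instruction j is in stage X and the older instruction i is in stage Y.
    RAW dependence (flow, i F j):    stage X: j: _ <- rk (Reg Read)     stage Y: i: rk <- _ (Reg Write)
    WAR dependence (anti, i A j):    stage X: j: rk <- _ (Reg Write)    stage Y: i: _ <- rk (Reg Read)
    WAW dependence (output, i O j):  stage X: j: rk <- _ (Reg Write)    stage Y: i: rk <- _ (Reg Write)
dist(i,j) <= dist(X,Y): unsafe to keep j moving. dist(i,j) > dist(X,Y): safe.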
67 RAW Dependence Analysis Example.
            R/I-Type   LW         SW        Br        J    Jr
    IF
    ID      read RF    read RF    read RF   read RF        read RF
    EX
    MEM
    WB      write RF   write RF
Instructions I_A and I_B (where I_A comes before I_B) have a RAW dependence iff I_B (R/I, LW, SW, Br or JR) reads a register written by I_A (R/I or LW) and dist(I_A, I_B) <= dist(ID, WB) = 3. What about WAW and WAR dependence? What about memory data dependence?
68 Pipeline Stall: Resolving Data Dependence. [Figure: pipeline timing diagram (IF, ID, ALU, MEM, WB) for instructions h, i, j, k, l over cycles t0 to t5, where i writes a register that j reads. j is held in ID while bubbles are inserted behind i, so the effective dist(i,j) grows from 1 to 2 to 3 and finally to 4, at which point j can proceed; k and l are held behind j.] Stall = make the dependent instruction wait until its source data value is available: 1. stop all up-stream stages; 2. drain all down-stream stages.
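Combining the safety rule (dist(i,j) <= dist(X,Y) is unsafe) with dist(ID, WB) = 3 gives the number of bubbles needed for a given producer-consumer distance. This is a sketch under the assumption that the register file cannot be written and read in the same cycle, which is what the "<=" in the rule implies; the function name is illustrative.

```python
DIST_ID_WB = 3  # stages between register read (ID) and register write (WB)


def stalls_needed(dist_ij, dist_xy=DIST_ID_WB):
    """Bubbles to insert so that the effective distance between producer i
    and consumer j exceeds dist(X, Y), i.e. becomes at least dist_xy + 1."""
    return max(0, dist_xy + 1 - dist_ij)


for d in (1, 2, 3, 4, 5):
    print(f"dist(i,j)={d}: {stalls_needed(d)} stall cycle(s)")
```

A consumer immediately after its producer therefore waits three cycles, and a consumer four or more instructions later does not wait at all.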
69 Sample Assembly (P&H). Source loop: for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) { ... }
             addi $s1, $s0, -1
    for2tst: slti $t0, $s1, 0          (3 stalls)
             bne  $t0, $zero, exit2    (3 stalls)
             sll  $t1, $s1, 2
             add  $t2, $a0, $t1        (3 stalls)
             lw   $t3, 0($t2)          (3 stalls)
             lw   $t4, 4($t2)
             slt  $t0, $t4, $t3        (3 stalls)
             beq  $t0, $zero, exit2    (3 stalls)
             ...
             addi $s1, $s1, -1
             j    for2tst
    exit2:
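The six 3-cycle stalls can be reproduced with a small in-order stall model. The sketch below makes several assumptions that are not stated on the slide: the register numbers are restored from the P&H sort example, dependences are resolved by stalling only (no forwarding), a consumer may enter ID no earlier than dist(ID, WB) + 1 = 4 cycles after its producer, and control dependences are ignored. `count_stalls` is a hypothetical helper.

```python
def count_stalls(program, dist_id_wb=3):
    """program: list of (dest, [sources]) in program order.
    Returns the stall cycles charged to each instruction."""
    ready = {}     # register -> earliest cycle a reader may be in ID
    prev_id = -1   # ID cycle of the previous instruction
    stalls = []
    for dest, srcs in program:
        earliest = prev_id + 1  # in-order: one cycle behind the predecessor
        id_cycle = max([earliest] + [ready.get(r, 0) for r in srcs])
        stalls.append(id_cycle - earliest)
        if dest is not None:
            ready[dest] = id_cycle + dist_id_wb + 1
        prev_id = id_cycle
    return stalls


loop = [
    ("$s1", ["$s0"]),           # addi $s1, $s0, -1
    ("$t0", ["$s1"]),           # slti $t0, $s1, 0
    (None,  ["$t0", "$zero"]),  # bne  $t0, $zero, exit2
    ("$t1", ["$s1"]),           # sll  $t1, $s1, 2
    ("$t2", ["$a0", "$t1"]),    # add  $t2, $a0, $t1
    ("$t3", ["$t2"]),           # lw   $t3, 0($t2)
    ("$t4", ["$t2"]),           # lw   $t4, 4($t2)
    ("$t0", ["$t4", "$t3"]),    # slt  $t0, $t4, $t3
    (None,  ["$t0", "$zero"]),  # beq  $t0, $zero, exit2
]
per_instr = count_stalls(loop)
print(per_instr, "total:", sum(per_instr))
```

Under these assumptions, sll and the second lw do not stall because the stalls already taken by earlier instructions have pushed their producers far enough ahead.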
70 Readings. P&H Chapter. Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 1995: more advanced pipelining, interrupt and exception handling, out-of-order and superscalar execution concepts.
More informationCSE Introduction to Computer Architecture Chapter 5 The Processor: Datapath & Control
CSE-45432 Introdction to Compter Architectre Chapter 5 The Processor: Datapath & Control Dr. Izadi Data Processor Register # PC Address Registers ALU memory Register # Register # Address Data memory Data
More informationInstruction fetch. MemRead. IRWrite ALUSrcB = 01. ALUOp = 00. PCWrite. PCSource = 00. ALUSrcB = 00. R-type completion
. (Chapter 5) Fill in the vales for SrcA, SrcB, IorD, Dst and emto to complete the Finite State achine for the mlti-cycle datapath shown below. emory address comptation 2 SrcA = SrcB = Op = fetch em SrcA
More informationLecture 7. Building A Simple Processor
Lectre 7 Bilding A Simple Processor Christos Kozyrakis Stanford University http://eeclass.stanford.ed/ee8b C. Kozyrakis EE8b Lectre 7 Annoncements Upcoming deadlines Lab is de today Demo by 5pm, report
More informationComp 303 Computer Architecture A Pipelined Datapath Control. Lecture 13
Comp 33 Compter Architectre A Pipelined path Lectre 3 Pipelined path with Signals PCSrc IF/ ID ID/ EX EX / E E / Add PC 4 Address Instrction emory RegWr ra rb rw Registers bsw [5-] [2-6] [5-] bsa bsb Sign
More informationCS 251, Spring 2018, Assignment 3.0 3% of course mark
CS 25, Spring 28, Assignment 3. 3% of corse mark De onday, Jne 25th, 5:3 P. (5 points) Consider the single-cycle compter shown on page 6 of this assignment. Sppose the circit elements take the following
More informationInstruction Pipelining is the use of pipelining to allow more than one instruction to be in some stage of execution at the same time.
Pipelining Pipelining is the se of pipelining to allow more than one instrction to be in some stage of eection at the same time. Ferranti ATLAS (963): Pipelining redced the average time per instrction
More informationComputer Architecture. Lecture 5: Multi-Cycle and Microprogrammed Microarchitectures
Computer Architecture Lecture 5: Multi-Cycle and Microprogrammed Microarchitectures Dr. Ahmed Sallam Based on original slides by Prof. Onur Mutlu Agenda for Today & Next Few Lectures Single-cycle Microarchitectures
More informationLecture 13: Exceptions and Interrupts
18 447 Lectre 13: Eceptions and Interrpts S 10 L13 1 James C. Hoe Dept of ECE, CU arch 1, 2010 Annoncements: Handots: Spring break is almost here Check grades on Blackboard idterm 1 graded Handot #9: Lab
More informationCSE 141 Computer Architecture Summer Session I, Lectures 10 Advanced Topics, Memory Hierarchy and Cache. Pramod V. Argade
CSE 141 Compter Architectre Smmer Session I, 2004 Lectres 10 Advanced Topics, emory Hierarchy and Cache Pramod V. Argade CSE141: Introdction to Compter Architectre Instrctor: TA: Pramod V. Argade (p2argade@cs.csd.ed)
More informationCMSC Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining Prof. Yanjing Li University of Chicago Administrative Stuff! Lab1 due at 11:59pm today! Lab2 out " Pipeline ARM simulator "
More informationCS 251, Winter 2018, Assignment % of course mark
CS 25, Winter 28, Assignment 3.. 3% of corse mark De onday, Febrary 26th, 4:3 P Lates accepted ntil : A, Febrary 27th with a 5% penalty. IEEE 754 Floating Point ( points): (a) (4 points) Complete the following
More information4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language 345.e1
.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage to Describe and odel a Pipeline
More informationProcessor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed
Lecture 3: General Purpose Processor Design CSCE 665 Advanced VLSI Systems Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, tet etc included in slides are borrowed from various books, websites,
More informationLecture 10: Pipelined Implementations: Hazards and Resolutions. Instruction Pipeline Reality
18-447 Lecture 10: Pipelined Implementations: Hazards and Resolutions S 09 L10-1 James C. Hoe José F. Martínez Electrical and Computer Engineering Carnegie Mellon University February 15, 2010 Instruction