Computer Architecture. Lecture 6: Pipelining

1 Computer Architecture Lecture 6: Pipelining. Dr. Ahmed Sallam. Based on original slides by Prof. Onur Mutlu

2 Agenda for Today & Next Few Lectures: Single-cycle microarchitectures; Multi-cycle and microprogrammed microarchitectures; Pipelining; Issues in pipelining: control & data dependence handling, state maintenance and recovery; Out-of-order execution; Issues in OoO execution: load-store handling

3 Recap of Last Lecture: Multi-cycle and microprogrammed microarchitectures; Benefits vs. design principles; When to generate control signals; Microprogrammed control: microinstruction, microsequencer, control store; LC-3b state machine, datapath, control structure; An exercise in microprogramming; Variable-latency memory, alignment, memory-mapped I/O; Microprogramming: power of abstraction (for the HW designer); Advantages of microprogrammed control; Update of machine behavior

4 Review: A Simple LC-3b Control and Datapath

5 Simple Design of the Control Structure [figure: microsequencer with inputs R, IR[15:11], and BEN; control store; microinstruction, including the (J, COND, IRD) fields]

6 A Simple Datapath Can Become Very Powerful

7 Figure C.2: A state machine for the LC-3b [figure: state diagram covering fetch (MAR <- PC, PC <- PC + 2), decode (BEN <- IR[11] & N + IR[10] & Z + IR[9] & P), and the per-opcode states for ADD, AND, XOR, TRAP, SHF, LEA, LDB, LDW, STW, STB, JSR, JMP, BR, RTI]. Notes: B+off6 : Base + SEXT[offset6]; PC+off9 : PC + SEXT[offset9]; *OP2 may be SR2 or SEXT[imm5]; ** [15:8] or [7:0] depending on MAR[0]

8 Review: The Power of Abstraction. The concept of a control store of microinstructions enables the hardware designer with a new abstraction: microprogramming. The designer can translate any desired operation to a sequence of microinstructions. All the designer needs to provide is: the sequence of microinstructions needed to implement the desired operation; the ability for the control logic to correctly sequence through the microinstructions; any additional datapath elements and control signals needed (no need if the operation can be translated into existing control signals).

9 Review: Advantages of Microprogrammed Control. Allows a very simple design to do powerful computation by controlling the datapath (using a sequencer). High-level ISA translated into microcode (sequence of microinstructions). Microcode enables a minimal datapath to emulate an ISA. Microinstructions can be thought of as a user-invisible ISA. Enables easy extensibility of the ISA: can support a new instruction by changing the microcode; can support complex instructions as a sequence of simple microinstructions. Enables update of machine behavior: a buggy implementation of an instruction can be fixed by changing the microcode in the field.

10 Multi-cycle vs. Single-Cycle uArch: Advantages; Disadvantages. You should be very familiar with this right now

11 Microprogrammed vs. Hardwired Control: Advantages; Disadvantages. You should be very familiar with this right now

12 Can We Do Better? What limitations do you see with the multi-cycle design? Limited concurrency: some hardware resources are idle during different phases of the instruction processing cycle. Fetch logic is idle when an instruction is being decoded or executed. Most of the datapath is idle when a memory access is happening.

13 Can We Use the Idle Hardware to Improve Concurrency? Goal: More concurrency -> higher instruction throughput (i.e., more work completed in one cycle). Idea: When an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction. E.g., when an instruction is being decoded, fetch the next instruction. E.g., when an instruction is being executed, decode another instruction. E.g., when an instruction is accessing memory (ld/st), execute the next instruction. E.g., when an instruction is writing its result into the register file, access memory for the next instruction.

14 Pipelining

15 Pipelining: Basic Idea. More systematically: pipeline the execution of multiple instructions. Analogy: assembly line processing of instructions. Idea: Divide the instruction processing cycle into distinct stages of processing. Ensure there are enough hardware resources to process one instruction in each stage. Process a different instruction in each stage. Instructions consecutive in program order are processed in consecutive stages. Benefit: Increases instruction processing throughput (1/CPI). Downside: Start thinking about this

16 Example: Execution of Four Independent ADDs
Multi-cycle: 4 cycles per instruction
F D E W | F D E W | F D E W | F D E W   (Time ->)
Pipelined: 4 cycles per 4 instructions (steady state)
F D E W
  F D E W
    F D E W
      F D E W   (Time ->)
Is life always this beautiful?
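The cycle counts in this example can be checked with a small sketch. The function names are made up for illustration, and the model assumes independent instructions and a 4-stage F/D/E/W machine, as on the slide.

```python
def multicycle_cycles(num_instructions: int, stages: int = 4) -> int:
    """Multi-cycle machine: each instruction occupies the machine for all stages."""
    return num_instructions * stages


def pipelined_cycles(num_instructions: int, stages: int = 4) -> int:
    """Ideal pipeline: fill the pipe once, then finish one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1)


if __name__ == "__main__":
    # Four independent ADDs, as on the slide.
    print(multicycle_cycles(4))   # total cycles on the multi-cycle machine
    print(pipelined_cycles(4))    # total cycles including pipeline fill
    # Steady state: 4 more instructions cost only 4 more cycles.
    print(pipelined_cycles(1004) - pipelined_cycles(1000))
```

The fill time (stages - 1 cycles) is paid once, which is why the slide quotes the pipelined rate only for steady state.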

17 UNDERSTANDING PIPELINE

18 The Laundry Analogy. Place one dirty load of clothes in the washer; when the washer is finished, place the wet load in the dryer; when the dryer is finished, take out the dry load and fold; when folding is finished, ask your roommate (??) to put the clothes away. Steps to do a load are sequentially dependent; no dependence between different loads; different steps do not share resources. [figure: task order A, B, C, D against time, loads done one after another] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

19 Pipelining Multiple Loads of Laundry. 4 loads of laundry in parallel; no additional resources; throughput increased by 4; latency per load is the same. [figure: task order A, B, C, D against time, loads overlapped] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

20 Pipelining Multiple Loads of Laundry: In Practice. The slowest step decides throughput. [figure: task order A, B, C, D against time, with one step longer than the others] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

21 Pipelining Multiple Loads of Laundry: In Practice. Throughput restored (2 loads per hour) using 2 dryers. [figure: task order A, B, C, D against time, loads alternating between dryers A and B] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

22 PERFORMING PIPELINE

23 An Ideal Pipeline. Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing). Repetition of identical operations: the same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps). Repetition of independent operations: no dependencies between repeated operations. Uniformly partitionable suboperations: processing can be evenly divided into uniform-latency suboperations (that do not share resources). Fitting examples: automobile assembly line, doing laundry. What about the instruction processing cycle?

24 Ideal Pipelining
Unpipelined: combinational logic (F,D,E,M,W), T psec; BW = ~(1/T)
2 stages: T/2 ps (F,D,E) | T/2 ps (M,W); BW = ~(2/T)
3 stages: T/3 ps (F,D) | T/3 ps (E,M) | T/3 ps (M,W); BW = ~(3/T)

25 More Realistic Pipeline: Throughput. Nonpipelined version with delay T: BW = 1/(T+S), where S = latch delay. k-stage pipelined version (T/k ps per stage): BW_{k-stage} = 1/(T/k + S); BW_{max} = 1/(1 gate delay + S). Latch delay reduces throughput (switching overhead b/w stages).
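A minimal sketch of the throughput formula, to see how the latch delay S limits the benefit of deeper pipelines. The function name and the numbers in the usage lines (T = 800 ps, S = 20 ps) are assumptions chosen for illustration, not values from the slide.

```python
def throughput(T: float, S: float, k: int = 1) -> float:
    """BW of a k-stage pipeline: 1 / (T/k + S).

    T: total combinational delay, S: latch delay (same time unit).
    k = 1 gives the nonpipelined case, 1 / (T + S).
    """
    return 1.0 / (T / k + S)


if __name__ == "__main__":
    T, S = 800.0, 20.0  # hypothetical values, in ps
    base = throughput(T, S, 1)
    for k in (1, 2, 5, 10, 40):
        # Speedup over the nonpipelined version falls short of k because of S.
        print(k, round(throughput(T, S, k) / base, 2))
```

As k grows, T/k shrinks but S does not, so the speedup is bounded by (T + S)/S no matter how many stages are used.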

26 More Realistic Pipeline: Cost. Nonpipelined version with combinational cost G (G gates): Cost = G + L, where L = latch cost. k-stage pipelined version (G/k gates per stage): Cost_{k-stage} = G + Lk. Latches increase hardware cost.
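The cost formula can be sketched the same way. The function name and the example numbers (G = 1000, L = 50) are hypothetical.

```python
def pipeline_cost(G: float, L: float, k: int = 1) -> float:
    """Hardware cost of a k-stage pipeline: G + L*k.

    G: cost of the combinational logic, L: cost of one set of latches.
    k = 1 gives the nonpipelined case, G + L.
    """
    return G + L * k


if __name__ == "__main__":
    G, L = 1000.0, 50.0  # hypothetical costs, in gates
    for k in (1, 2, 5, 10):
        # Logic cost stays at G; only the latch term grows with depth.
        print(k, pipeline_cost(G, L, k))
```

The combinational logic is only partitioned, not duplicated, so the extra cost of pipelining is linear in the number of stages.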

27 Pipelining Instruction Processing

28 Remember: The Instruction Processing Cycle. 1. Instruction fetch (IF) 2. Instruction decode and register operand fetch (ID/RF) 3. Execute/Evaluate memory address (EX/AG) 4. Memory operand fetch (MEM) 5. Store/writeback result (WB). (Overlaid on the six phases: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result.)

29 Remember the Single-Cycle Uarch [figure: single-cycle datapath with control signals; total combinational delay T, BW = ~(1/T)] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

30 PIPELINE DATA PATH

31 Dividing Into Stages. IF: Instruction fetch (200ps); ID: Instruction decode/register file read (100ps); EX: Execute/address calculation (200ps); MEM: Memory access (200ps); WB: Write back (100ps). Ignore for now: RF write. [figure: single-cycle datapath divided into five stages] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

32 Pipeline Throughput [figure: three consecutive lw instructions, each passing through Instruction fetch, Reg, ALU, Data access, Reg. Nonpipelined: each instruction takes 800ps and instructions start 800ps apart. Pipelined: every stage takes 200ps and instructions start 200ps apart.] 5-stage speedup is 4, not 5 as predicted by the ideal model. Why?
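One way to see where the speedup of 4 comes from is to compute it from per-stage latencies. The sketch below assumes stage latencies of 200, 100, 200, 200, and 100 ps for IF, ID, EX, MEM, and WB; these values and the function names are assumptions for illustration.

```python
# Assumed per-stage latencies in ps (illustrative values).
STAGE_LATENCY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def nonpipelined_time(latencies) -> int:
    """One instruction passes through every stage back to back."""
    return sum(latencies)


def pipelined_clock(latencies) -> int:
    """All stages share one clock, so the slowest stage sets the period."""
    return max(latencies)


def steady_state_speedup(latencies) -> float:
    """Time per instruction without pipelining over time per instruction with it."""
    return nonpipelined_time(latencies) / pipelined_clock(latencies)


if __name__ == "__main__":
    lat = list(STAGE_LATENCY_PS.values())
    print(nonpipelined_time(lat), pipelined_clock(lat), steady_state_speedup(lat))
```

The ideal model predicts a speedup equal to the number of stages only when every stage has the same latency; here the 100 ps stages are padded to the 200 ps clock.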

33 Enabling Pipelined Processing: Pipeline Registers. Stages: IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back. No resource is used by more than 1 stage! [figure: datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between the stages; T/k ps per stage] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

34 Pipeline performance impact. All instruction classes must follow the same path and timing through the pipeline stages (compare lw, add). Any performance impact? [figure: lw moving through the pipelined datapath: instruction fetch, instruction decode, execution, memory, write back] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

35 Pipelined Operation Example [figure: lw followed by sub moving through the pipelined datapath over clocks 1 to 6: instruction fetch, instruction decode, execution, memory, write back] Is life always this beautiful? Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

36 Illustrating Pipeline Operation: Operation View
        t0   t1   t2   t3   t4   t5
Inst0   IF   ID   EX   MEM  WB
Inst1        IF   ID   EX   MEM  WB
Inst2             IF   ID   EX   MEM
Inst3                  IF   ID   EX
Inst4                       IF   ID
From t4 on: steady state (full pipeline)
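The operation view can be generated mechanically: instruction i is in stage number (t - i) at cycle t. A minimal sketch, assuming a 5-stage pipeline with no stalls; the function name is made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def operation_view(num_instructions: int, num_cycles: int):
    """Return rows[i][t]: the stage occupied by instruction i at cycle t ('' if none)."""
    rows = []
    for i in range(num_instructions):
        row = []
        for t in range(num_cycles):
            s = t - i  # instruction i enters IF at cycle i
            row.append(STAGES[s] if 0 <= s < len(STAGES) else "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(operation_view(5, 6)):
        print(f"Inst{i}  " + " ".join(f"{cell:<4}" for cell in row))
```

Reading one column of the result gives the contents of every stage in that cycle; the first column in which all five stages are occupied is where steady state begins.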

37 Control Points in a Pipeline [figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, MEM/WB registers and control points PCSrc, RegWrite, Branch, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg] Identical set of control points as the single-cycle datapath!! Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

38 PIPELINE CONTROL SIGNALS

39 Control Signals in a Pipeline. For a given instruction: same control signals as single-cycle, but control signals required at different cycles, depending on stage. Option 1: decode once using the same logic as single-cycle and buffer signals until consumed [figure: control signals for EX, MEM, and WB carried through the IF/ID, ID/EX, EX/MEM, MEM/WB registers]. Option 2: carry relevant instruction word/field down the pipeline and decode locally within each or in a previous stage. Which one is better?

40 Pipelined Control Signals [figure: pipelined datapath with the control unit in ID and the EX, MEM, and WB control signals buffered in the ID/EX, EX/MEM, and MEM/WB registers] Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

41 PIPELINE ISSUES

42 Remember: An Ideal Pipeline. Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing). Repetition of identical operations: the same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps). Repetition of independent operations: no dependencies between repeated operations. Uniformly partitionable suboperations: processing can be evenly divided into uniform-latency suboperations (that do not share resources). Fitting examples: automobile assembly line, doing laundry. What about the instruction processing cycle?

43 Instruction Pipeline: Not An Ideal Pipeline. Identical operations... NOT! Different instructions do not all need the same stages. Forcing different instructions to go through the same pipe stages -> external fragmentation (some pipe stages idle for some instructions). Uniform suboperations... NOT! Different pipeline stages do not have the same latency. Need to force each stage to be controlled by the same clock -> internal fragmentation (some pipe stages are too fast but all take the same clock cycle time). Independent operations... NOT! Instructions are not independent of each other. Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results -> pipeline stalls (pipeline is not always moving).

44 Issues in Pipeline Design. Balancing work in pipeline stages: how many stages and what is done in each stage. Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow: handling dependences (data, control); handling resource contention; handling long-latency (multi-cycle) operations. Handling exceptions, interrupts. Advanced: improving pipeline throughput, minimizing stalls.

45 Pipeline Stalls. Stall: A condition when the pipeline stops moving. Causes: resource contention; dependences between instructions (data, control); long-latency (multi-cycle) operations.

46 DEPENDENCES

47 A computer program. Following the Von Neumann model, the program is a sequence of instructions, e.g., L: Mov AX, 3 ... Add a, b ... jmp L

48 Dependences and Their Types. Also called "dependency" or, less desirably, "hazard". Dependences dictate ordering requirements between instructions. Two types: data dependence and control dependence. Resource contention is sometimes called resource dependence.

49 Handling Resource Contention. Happens when instructions in two pipeline stages need the same resource. Solution 1: Eliminate the cause of contention. Duplicate the resource or increase its throughput. E.g., use separate instruction and data memories (caches). E.g., use multiple ports for memory structures. Solution 2: Detect the resource contention and stall one of the contending stages. Which stage do you stall? Example: What if you had a single read and write port for the register file?

50 UNDERSTANDING DEPENDENCES

51 Data Dependences. Types of dependences: flow dependence (true dependence, read after write), which comes from the running program's semantics; anti dependence (write after read); output dependence (write after write). Which ones cause stalls in a pipelined machine? For all of them, we need to ensure the semantics of the program are correct. Flow dependences always need to be obeyed because they constitute true dependence on a value. Anti and output dependences exist due to the limited number of architectural registers. They are dependence on a name, not a value.

52 Data Dependence Types
Flow dependence (Read-after-Write, RAW):      r3 <- r1 op r2 ; r5 <- r3 op r4
Anti dependence (Write-after-Read, WAR):      r3 <- r1 op r2 ; r1 <- r4 op r5
Output dependence (Write-after-Write, WAW):   r3 <- r1 op r2 ; r5 <- r3 op r4 ; r3 <- r6 op r7
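The three types can be told apart by comparing the destination and source registers of an earlier and a later instruction. A minimal sketch; the `Instr` type and the `dependences` function are hypothetical helpers, not part of the lecture.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def dependences(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the register dependences of `later` on `earlier`."""
    kinds = set()
    if earlier.dest in later.srcs:
        kinds.add("RAW")  # flow: later reads what earlier writes
    if later.dest in earlier.srcs:
        kinds.add("WAR")  # anti: later overwrites what earlier reads
    if later.dest == earlier.dest:
        kinds.add("WAW")  # output: both write the same register
    return kinds


if __name__ == "__main__":
    first = Instr("r3", ("r1", "r2"))                      # r3 <- r1 op r2
    print(dependences(first, Instr("r5", ("r3", "r4"))))   # flow
    print(dependences(first, Instr("r1", ("r4", "r5"))))   # anti
    print(dependences(first, Instr("r3", ("r6", "r7"))))   # output
```

Only the RAW case carries a value from one instruction to the other; the WAR and WAW cases arise purely from reuse of a register name.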

53 Example of output dependence [figure: lw followed by sub moving through the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) over clocks 1 to 6: instruction fetch, instruction decode, execution, memory, write back]

54 Example of flow dependence [figure: lw followed by sub moving through the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) over clocks 1 to 6: instruction fetch, instruction decode, execution, memory, write back]

55 Control Dependence. Question: What should the fetch PC be in the next cycle? Answer: The address of the next instruction. All instructions are control dependent on previous ones. Why? If the fetched instruction is a non-control-flow instruction: next fetch PC is the address of the next-sequential instruction; easy to determine if we know the size of the fetched instruction. If the instruction that is fetched is a control-flow instruction: how do we determine the next fetch PC? In fact, how do we know whether or not the fetched instruction is a control-flow instruction?

56 DEPENDENCES DETECTION

57 Interlocking. Detection of dependence between instructions in a pipelined processor to guarantee correct execution. Software based interlocking vs. hardware based interlocking. MIPS acronym? Microprocessor without Interlocked Pipeline Stages

58 Approaches to Dependence Detection (I): Scoreboarding. Each register in the register file has a Valid bit associated with it. An instruction that is writing to the register resets the Valid bit. An instruction in the Decode stage checks if all its source and destination registers are Valid. Yes: no need to stall (no dependence). No: stall the instruction. Advantage: Simple, 1 bit per register. Disadvantage: Need to stall for all types of dependences, not only flow dependences.
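A minimal sketch of the valid-bit scheme described above. The class and method names are made up for illustration, and registers are identified by index.

```python
class Scoreboard:
    """One Valid bit per register; a pending write clears the bit."""

    def __init__(self, num_regs: int) -> None:
        self.valid = [True] * num_regs

    def can_issue(self, dest: int, srcs) -> bool:
        # Decode-stage check: every source AND the destination must be Valid.
        return all(self.valid[r] for r in (dest, *srcs))

    def issue(self, dest: int) -> None:
        # The instruction will write `dest`, so reset its Valid bit.
        self.valid[dest] = False

    def writeback(self, dest: int) -> None:
        # The value has reached the register file; set the bit again.
        self.valid[dest] = True


if __name__ == "__main__":
    sb = Scoreboard(8)
    sb.issue(3)                       # r3 <- r1 op r2 is in flight
    print(sb.can_issue(5, (3, 4)))    # reads r3: must stall
    print(sb.can_issue(3, (6, 7)))    # also writes r3: must stall too
    sb.writeback(3)
    print(sb.can_issue(5, (3, 4)))    # r3 is valid again
```

Because the destination is checked along with the sources, an instruction that only shares a destination with an in-flight write is stalled as well, even though no value flows between the two.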

59 Scoreboarding [figure: 5-stage pipelined datapath with IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers]

60 Not Stalling on Anti and Output Dependences. What changes would you make to the scoreboard to enable this? Use a counter of pending write operations per register, not just 0 and 1.

61 Approaches to Dependence Detection (II): Combinational dependence check logic. Special logic that checks if any instruction in later stages is supposed to write to any source register of the instruction that is being decoded. Yes: stall the instruction/pipeline. No: no need to stall (no flow dependence). Advantage: No need to stall on anti and output dependences. Disadvantage: Logic is more complex than a scoreboard, and becomes more complex as we make the pipeline deeper and wider (flash-forward: think superscalar execution).
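The check described above compares the source registers of the instruction in decode against the destination registers of the instructions in later stages. A minimal sketch; the function name and the list-of-destinations representation are assumptions for illustration.

```python
from typing import Iterable, Optional, Set


def must_stall(decode_srcs: Set[str],
               later_stage_dests: Iterable[Optional[str]]) -> bool:
    """True if any instruction in a later stage will write a source register
    of the instruction being decoded (a flow dependence).

    later_stage_dests holds one entry per later stage (e.g. EX, MEM, WB):
    the destination register of the instruction there, or None if that
    stage holds a bubble or an instruction that writes no register.
    """
    return any(d is not None and d in decode_srcs for d in later_stage_dests)


if __name__ == "__main__":
    in_flight = ["r3", None, None]             # r3 <- r1 op r2 is in EX
    print(must_stall({"r3", "r4"}, in_flight))  # r5 <- r3 op r4 reads r3
    print(must_stall({"r6", "r7"}, in_flight))  # r3 <- r6 op r7 only writes r3
```

Only sources are compared, so an instruction that shares nothing but a destination with an in-flight write is not stalled. In hardware each comparison is a comparator, so the number of comparators grows with the number of later stages and the number of instructions decoded per cycle.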

62 DATA DEPENDENCE HANDLING

63 Once You Detect the Dependence in Hardware. What do you do afterwards? Observation: Dependence between two instructions is detected before the communicated value becomes available

64 How to Handle Data Dependences. Anti and output dependences are easier to handle: write to the destination in one stage and in program order. Flow dependences are more interesting. Five fundamental ways of handling flow dependences: 1. Detect and wait until the value is available in the register file. 2. Detect and forward/bypass to the dependent instruction. 3. Detect and eliminate the dependence at the software level; no need for the hardware to detect dependence (MIPS NOP). 4. Do something else (same program: reorder; different program: fine-grained multithreading); no need to detect. 5. Predict the needed value(s), execute speculatively, and verify.

65 Right place to eliminate dependency. Which one of the following flow dependences leads to conflicts in the 5-stage pipeline?
addi r, -, -    IF  ID  EX  MEM WB
addi -, r, -        IF  ID  EX  MEM WB
addi -, r, -            IF  ID  EX  MEM
addi -, r, -                IF  ID  EX
addi -, r, -                    IF  ?ID
addi -, r, -                        IF

66 Safe and Unsafe Movement of Pipeline
           RAW Dependence           WAR Dependence           WAW Dependence
stage X    j: _ <- rk (Reg Read)    j: rk <- _ (Reg Write)   j: rk <- _ (Reg Write)
stage Y    i: rk <- _ (Reg Write)   i: _ <- rk (Reg Read)    i: rk <- _ (Reg Write)
           i ->F j (flow)           i ->A j (anti)           i ->O j (output)
dist(i,j) <= dist(X,Y) => Unsafe to keep j moving
dist(i,j) > dist(X,Y) => Safe

67 RAW Dependence Analysis Example
        R/I-Type   LW         SW        Br        J    Jr
IF
ID      read RF    read RF    read RF   read RF        read RF
EX
MEM
WB      write RF   write RF
Instructions I_A and I_B (where I_A comes before I_B) have RAW dependence iff I_B (R/I, LW, SW, Br or JR) reads a register written by I_A (R/I or LW) and dist(I_A, I_B) <= dist(ID, WB) = 3. What about WAW and WAR dependence? What about memory dependence?

68 Pipeline Stall: Resolving Data Dependence. Stall = make the dependent instruction wait until its source value is available: 1. stop all up-stream stages; 2. drain all down-stream stages. With i: rx <- _ followed by j: _ <- rx, each inserted bubble increases dist(i,j) by one, from dist(i,j)=1 up to dist(i,j)=4. [figure: pipeline timing diagram, t0 to t5, for Inst h through Inst l, with Inst j held in ID while the ALU, MEM, and WB stages drain]
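The number of bubbles follows from the distance rule: the dependent instruction j may proceed once dist(i,j) exceeds dist(ID, WB) = 3. A minimal sketch, assuming registers are read in ID, written in WB, and that a write and a read of the same register cannot be satisfied in the same cycle; the function name is made up for illustration.

```python
DIST_ID_WB = 3  # stages between register read (ID) and register write (WB)


def stall_cycles(dist_ij: int, dist_read_write: int = DIST_ID_WB) -> int:
    """Bubbles needed so that dist(i, j) grows past dist(ID, WB).

    dist_ij: how many instructions after producer i the consumer j appears
    (1 = immediately following).
    """
    if dist_ij < 1:
        raise ValueError("the consumer must come after the producer")
    return max(0, dist_read_write + 1 - dist_ij)


if __name__ == "__main__":
    for d in range(1, 6):
        print(d, stall_cycles(d))
```

A consumer immediately after its producer waits the full three cycles, and any consumer four or more instructions later never stalls, which is the property a compiler or assembler can exploit by reordering independent instructions in between.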

69 Sample Assembly (P&H)
for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) {... }
         addi $s1, $s0, -1
for2tst: slti $t0, $s1, 0
         bne  $t0, $zero, exit2
         sll  $t1, $s1, 2
         add  $t2, $a0, $t1
         lw   $t3, 0($t2)
         lw   $t4, 4($t2)
         slt  $t0, $t4, $t3
         beq  $t0, $zero, exit2
         ...
         addi $s1, $s1, -1
         j    for2tst
exit2:
(figure annotations: six flow dependences, each marked "3 stalls")

70 Readings. P&H Chapter. Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 1995: more advanced pipelining; interrupt and exception handling; out-of-order and superscalar execution concepts
