Advanced Computer Architecture Pipelining


1 Advanced Computer Architecture Pipelining Dr. Shadrokh Samavi

2 Some slides are from the instructor's resources that accompany the 6th and previous editions of the textbook. Some slides are from David Patterson, David Culler and Krste Asanovic of UC Berkeley; Israel Koren of UMass Amherst; Milos Prvulovic and Sean Lee of Georgia Tech. Sources of some slides are mentioned at the bottom of the page. Please inform me if I am missing a name in the above list.

3 What Is Pipelining? Dr. Shadrokh Samavi 3

4 Pipeline: In a computer pipeline, each step in the pipeline completes a part of an instruction. Each step is called a pipe stage.
Throughput: how often an instruction exits the pipeline.
Machine cycle: the time required to move an instruction one step down the pipeline is a processor cycle.

5 RISC-V instruction set architecture formats All instructions are 32 bits long. R: integer register-to-register operations. I: for loads and immediate operations. B: branches. J: jumps and link. S: stores. U: wide immediate instructions (LUI, AUIPC). Dr. Shadrokh Samavi 5
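As a small illustration of the format list above, the following sketch maps the 7-bit major opcode of a 32-bit instruction word to its format. It covers only the base RV32I opcodes; the table name `OPCODE_FORMAT` and the function `instruction_format` are hypothetical names chosen for this example, not part of the slides.

```python
# Map the 7-bit major opcode (bits 6..0) of a 32-bit RISC-V instruction
# to its encoding format. Only base RV32I opcodes are listed here.
OPCODE_FORMAT = {
    0b0110011: "R",  # register-to-register ALU operations (add, sub, ...)
    0b0010011: "I",  # ALU operations with an immediate (addi, ...)
    0b0000011: "I",  # loads (lw, ...)
    0b1100111: "I",  # jalr
    0b0100011: "S",  # stores (sw, ...)
    0b1100011: "B",  # conditional branches (beq, ...)
    0b1101111: "J",  # jal
    0b0110111: "U",  # lui
    0b0010111: "U",  # auipc
}


def instruction_format(word: int) -> str:
    """Return the format letter for a 32-bit instruction word."""
    opcode = word & 0x7F  # keep the low 7 bits
    return OPCODE_FORMAT.get(opcode, "unknown")


# 0x003100B3 encodes add x1, x2, x3
print(instruction_format(0x003100B3))
```

Because every instruction is 32 bits long and the opcode sits in the same position, the decode stage can pick the format before it knows anything else about the instruction.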

6 IF ID EXE MEM WB

7 IF

8 ID

9 EXE

10 MEM

11 WB

12 Pipeline Implementation Dr. Shadrokh Samavi 2

13 Pipeline Stage F/F Combinational Logic F/F Dr. Shadrokh Samavi 3

14 [Figure: instructions flowing through the IF, ID, EX, MEM, and WB stages, which are separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

15 Five-stage Pipelined Datapath: Inst. Fetch, Inst. Decode, Exec, Mem, WB

16 Example for lw instruction: Fetch (IF) [Figure: pipelined datapath with the fetch stage highlighted]

17 Example for lw instruction: Decode (ID) [Figure: pipelined datapath with the decode stage highlighted]

18 Example for lw instruction: Execution (EX) [Figure: pipelined datapath with the execution stage highlighted]

19 Example for lw instruction: Memory (MEM) [Figure: pipelined datapath with the memory stage highlighted]

20 Example for lw instruction: Write back (WB) [Figure: pipelined datapath with the write-back stage highlighted]

21 Example for sw instruction: Memory (MEM) [Figure: pipelined datapath with the memory stage highlighted]

22 Example for sw instruction: Write back (WB): do nothing [Figure: pipelined datapath with the write-back stage highlighted]

23 Corrected Datapath (for lw) [Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers]

24 Pipelining Example [Figure: pipelined datapath occupied, from IF to WB, by the instructions below]
add $14, $5, $6
lw $13, 24($1)
add $12, $3, $4
sub $11, $2, $3
lw $10, 20($1)

25 Pipeline Control [Figure: pipelined datapath annotated with the control signals PCSrc, Branch, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, and MemtoReg]

26 Pipeline control
We have 5 stages. What needs to be controlled in each stage?
Instruction Fetch and PC Increment
Instruction Decode / Register Fetch
Execution (4 lines)
» RegDst
» ALUOp[1:0]
» ALUSrc
Memory Stage (3 lines)
» Branch
» MemRead
» MemWrite
Write Back (2 lines)
» MemtoReg
» RegWrite (note that this signal is in ID stage)

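27 Pipeline Control
Extend pipeline registers to include control information (created in ID). Pass control signals along just like the data.

Execution/Address Calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc
Memory access stage control lines: Branch, MemRead, MemWrite
Write-back stage control lines: RegWrite, MemtoReg

Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       1       0       0       0       0        0         1         0
lw           0       0       0       1       0       1        0         1         1
sw           X       0       0       1       0       0        1         0         X
beq          X       0       1       0       1       0        0         0         X

[Figure: the Control unit in ID produces the WB, MEM, and EX control groups, which are carried through the ID/EX, EX/MEM, and MEM/WB registers]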
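The idea of creating control in ID and passing it along can be sketched as follows. This is a minimal model, not the slide's hardware: the dictionary layout and the function names `decode`, `leave_ex`, and `leave_mem` are hypothetical, and the signal values are assumed to be the usual textbook ones (with `None` standing for a don't-care).

```python
# Control values per instruction class, grouped by the stage that consumes them.
# None marks a don't-care (X). The values are assumed from the usual textbook table.
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw": {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
           "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
           "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw": {"EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
           "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
           "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq": {"EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
            "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
            "WB": {"RegWrite": 0, "MemtoReg": None}},
}


def decode(instruction_class):
    """ID stage: produce all three control groups for the ID/EX register."""
    return dict(CONTROL[instruction_class])


def leave_ex(id_ex):
    """EX consumes its own group; only MEM and WB move on to EX/MEM."""
    return {"MEM": id_ex["MEM"], "WB": id_ex["WB"]}


def leave_mem(ex_mem):
    """MEM consumes its group; only WB moves on to MEM/WB."""
    return {"WB": ex_mem["WB"]}


id_ex = decode("lw")
ex_mem = leave_ex(id_ex)
mem_wb = leave_mem(ex_mem)
print(mem_wb)
```

Each pipeline register carries fewer control bits than the one before it, because every stage drops the group it has just used.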

28 Datapath with Control [Figure: pipelined datapath with the Control unit in ID and the WB, MEM, and EX control fields carried in the ID/EX, EX/MEM, and MEM/WB registers]

29 Datapath with Control. IF: lw $, 8($)

30 Datapath with Control. IF: sub $, $2, $3; ID: lw $, 8($)

31 Datapath with Control. IF: and $2, $4, $5; ID: sub $, $2, $3; EX: lw $, 8($)

32 Datapath with Control. IF: or $3, $6, $7; ID: and $2, $4, $5; EX: sub $, $2, $3; MEM: lw $, 8($)

33 Datapath with Control. IF: add $4, $8, $9; ID: or $3, $6, $7; EX: and $2, $4, $5; MEM: sub $,..; WB: lw $, 8($)

34 Datapath with Control. IF: xxxx; ID: add $4, $8, $9; EX: or $3, $6, $7; MEM: and $2..; WB: sub $,..

35 Datapath with Control. IF: xxxx; ID: xxxx; EX: add $4, $8, $9; MEM: or $3,..; WB: and $2..

36 Datapath with Control. IF: xxxx; ID: xxxx; EX: xxxx; MEM: add $4,..; WB: or $3..

37 Datapath with Control. IF: xxxx; ID: xxxx; EX: xxxx; MEM: xxxx; WB: add $4..

38 Simple RISC Pipeline

                     Clock number
Instruction number   1    2    3    4    5    6    7    8    9
Instruction i        IF   ID   EX   MEM  WB
Instruction i+1           IF   ID   EX   MEM  WB
Instruction i+2                IF   ID   EX   MEM  WB
Instruction i+3                     IF   ID   EX   MEM  WB
Instruction i+4                          IF   ID   EX   MEM  WB
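The staircase pattern of the simple RISC pipeline can be generated with a few lines of code. This is only a sketch of the ideal, stall-free case; the function name `pipeline_diagram` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in clock c+1.

    Assumes an ideal pipeline: one instruction issues per clock and nothing stalls.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in clock i+1
        rows.append(row)
    return rows


for index, row in enumerate(pipeline_diagram(5)):
    print(f"i+{index}: " + " ".join(f"{cell:>4}" for cell in row))
```

Five instructions finish in 9 clocks instead of 25, which is where the throughput gain comes from.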

39 Review: Visualizing Pipelining [Figure: instructions in program order against time in clock cycles, Cycle 1 to Cycle 7, each passing through Ifetch and the later stages] Adapted from Patterson, Katz and Kubiatowicz, UCB


41 Example: Consider the unpipelined processor in the previous section. Assume that it has a 4 GHz clock (or a 0.5 ns clock cycle) and that it uses four cycles for ALU operations and branches and five cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20%, and 40%, respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.1 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?

Answer: The average instruction execution time on the unpipelined processor is
\[ \text{Average instruction execution time} = \text{Clock cycle} \times \text{Average CPI} = 0.5\ \text{ns} \times [(40\% + 20\%) \times 4 + 40\% \times 5] = 0.5\ \text{ns} \times 4.4 = 2.2\ \text{ns} \]
In the pipelined implementation, the clock must run at the speed of the slowest stage plus overhead, which will be 0.5 + 0.1 or 0.6 ns; this is the average instruction execution time. Thus, the speedup from pipelining is
\[ \text{Speedup from pipelining} = \frac{2.2\ \text{ns}}{0.6\ \text{ns}} \approx 3.7 \text{ times} \]
The 0.1 ns overhead essentially establishes a limit on the effectiveness of pipelining. If the overhead is not affected by changes in the clock cycle, Amdahl's Law tells us that the overhead limits the speedup.
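The arithmetic of this example can be checked with a short script. The numbers used below (a 0.5 ns cycle, 0.1 ns of overhead, and a 40% / 20% / 40% mix at 4, 4, and 5 cycles) are assumed readings of the example, and the function name is a hypothetical helper.

```python
def average_time_unpipelined(cycle_ns, mix):
    """Average instruction time = clock cycle x average CPI.

    mix is a list of (relative frequency, cycles) pairs.
    """
    average_cpi = sum(frequency * cycles for frequency, cycles in mix)
    return cycle_ns * average_cpi


# Assumed mix: ALU operations, branches, memory operations.
mix = [(0.40, 4), (0.20, 4), (0.40, 5)]

unpipelined_ns = average_time_unpipelined(0.5, mix)  # 0.5 ns x 4.4
pipelined_ns = 0.5 + 0.1                             # slowest stage plus overhead
speedup = unpipelined_ns / pipelined_ns

print(f"unpipelined: {unpipelined_ns:.2f} ns, pipelined: {pipelined_ns:.2f} ns")
print(f"speedup: {speedup:.2f}")
```

Setting the overhead to zero in this sketch gives a speedup of 4.4, which shows how much of the ideal gain the 0.1 ns overhead removes.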

42 Pipeline Hazards Dr. Shadrokh Samavi 42

43 Hazards: circumstances that would cause incorrect execution if the next instruction were launched.
Structural hazards: attempting to use the same hardware to do two different things at the same time.
Data hazards: an instruction depends on the result of a prior instruction still in the pipeline.
Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).

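44 Speedup from pipelining
\[ \text{Speedup from pipelining} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI unpipelined} \times \text{Clock cycle unpipelined}}{\text{CPI pipelined} \times \text{Clock cycle pipelined}} = \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}} \]
\[ \text{CPI pipelined} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction} = 1 + \text{Pipeline stall clock cycles per instruction} \]
Assuming the same clock cycle for pipelined and unpipelined:
\[ \text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per instruction}} \]
\[ \text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}} \]
When the clock cycles differ:
\[ \text{Speedup from pipelining} = \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}} = \frac{1}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}} \]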
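A minimal sketch of the equal-clock case, where speedup equals pipeline depth divided by one plus the stall cycles per instruction. The function name `pipeline_speedup` and the sample stall rates are illustrative assumptions.

```python
def pipeline_speedup(pipeline_depth, stall_cycles_per_instruction):
    """Speedup over the unpipelined machine, assuming equal clock cycles
    and an ideal pipelined CPI of 1."""
    return pipeline_depth / (1.0 + stall_cycles_per_instruction)


# Hypothetical stall rates for a 5-stage pipeline.
for stalls in (0.0, 0.25, 1.0):
    print(f"stalls/instr = {stalls:.2f} -> speedup = {pipeline_speedup(5, stalls):.2f}")
```

With no stalls the speedup equals the pipeline depth; every stall cycle per instruction adds directly to the denominator.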

45 Example: Structural Hazard [Figure: Load, Instr 1, Instr 2, Instr 3, Instr 4 in program order against clock cycles 1 to 7; the structural hazard is marked where two instructions need the same unit in the same cycle]

46 Resolving structural hazards
Definition of structural hazard: attempt to use the same hardware for two different things at the same time
Solution 1: Wait (stall)
must detect the hazard
must have a mechanism to stall
Solution 2: Use more hardware

47 Detecting and Resolving Structural Hazard [Figure: Load, Instr 1, Instr 2, Stall, Instr 3 against clock cycles 1 to 7; the stall appears as a row of bubbles that delays Instr 3 by one cycle] Adapted from Patterson, Katz and Kubiatowicz, UCB

48 Role of ISA in Structural Hazard Resolution
Simple to determine the sequence of resources used by an instruction: the opcode tells it all
Uniformity in the resource usage
Compare MIPS to IA32?
MIPS approach => all instructions flow through the same 5-stage pipeline

49 Data Hazards [Figure: the stages IF, ID/RF, EX, MEM, WB against time in clock cycles for add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11 in program order] Adapted from Patterson, Katz and Kubiatowicz, UCB

50 [Figure: program execution order (in instructions) against time (in clock cycles CC1 to CC6) for an ADD followed by a dependent LW and SW] Stores require an operand during MEM, and forwarding of that operand is shown here.

51 Pipeline w/o forwarding [Figure: five-stage datapath (Inst. Fetch, Inst. Decode, Exec, Mem, WB) with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers]

52 Forwarding (from EX/MEM) [Figure: register file, ID/EX, EX/MEM, MEM/WB, and data memory, with multiplexers (MUX) on the ALU inputs fed from EX/MEM]

53 Forwarding (from MEM/WB) [Figure: the same datapath with the ALU input multiplexers fed from MEM/WB]

54 Control unit for forwarding [Figure: a Forwarding Unit compares the ID/EX Rs and Rt fields with EX/MEM Rd and MEM/WB Rd and drives the ALU input multiplexers]

55 Forwarding comparisons
Columns: pipeline register containing source instruction; opcode of source instruction; pipeline register containing destination instruction; opcode of destination instruction; destination of the forwarded result; comparison (if equal then forward).

EX/MEM; Register-register ALU; ID/EX; Register-register ALU, ALU immediate, load, store, branch; Top ALU input; EX/MEM.IR16..20 == ID/EX.IR6..10
EX/MEM; Register-register ALU; ID/EX; Register-register ALU; Bottom ALU input; EX/MEM.IR16..20 == ID/EX.IR11..15
MEM/WB; Register-register ALU; ID/EX; Register-register ALU, ALU immediate, load, store, branch; Top ALU input; MEM/WB.IR16..20 == ID/EX.IR6..10
MEM/WB; Register-register ALU; ID/EX; Register-register ALU; Bottom ALU input; MEM/WB.IR16..20 == ID/EX.IR11..15
EX/MEM; ALU immediate; ID/EX; Register-register ALU, ALU immediate, load, store, branch; Top ALU input; EX/MEM.IR11..15 == ID/EX.IR6..10
EX/MEM; ALU immediate; ID/EX; Register-register ALU; Bottom ALU input; EX/MEM.IR11..15 == ID/EX.IR11..15
MEM/WB; ALU immediate; ID/EX; Register-register ALU, ALU immediate, load, store, branch; Top ALU input; MEM/WB.IR11..15 == ID/EX.IR6..10
MEM/WB; ALU immediate; ID/EX; Register-register ALU; Bottom ALU input; MEM/WB.IR11..15 == ID/EX.IR11..15
MEM/WB; Load; ID/EX; Register-register ALU, ALU immediate, load, store, branch; Top ALU input; MEM/WB.IR11..15 == ID/EX.IR6..10
MEM/WB; Load; ID/EX; Register-register ALU; Bottom ALU input; MEM/WB.IR11..15 == ID/EX.IR11..15
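The comparisons performed by a forwarding unit can be sketched in software. This is a simplified model under stated assumptions: register 0 is hardwired to zero and is never forwarded, each pipeline register exposes a destination register number and a write-enable flag, and the function name `forward_select` and its return labels are hypothetical.

```python
def forward_select(source_reg, ex_mem_rd, ex_mem_writes, mem_wb_rd, mem_wb_writes):
    """Choose where one ALU input should come from.

    source_reg    : register number read by the instruction now in EX
    ex_mem_rd     : destination register of the instruction in EX/MEM
    ex_mem_writes : True if that instruction writes a register
    mem_wb_rd     : destination register of the instruction in MEM/WB
    mem_wb_writes : True if that instruction writes a register
    """
    # The most recent producer wins, so EX/MEM is checked first.
    if ex_mem_writes and ex_mem_rd != 0 and ex_mem_rd == source_reg:
        return "EX/MEM"
    if mem_wb_writes and mem_wb_rd != 0 and mem_wb_rd == source_reg:
        return "MEM/WB"
    return "REGFILE"


# Both older instructions wrote r1: the newer value (EX/MEM) must be used.
print(forward_select(1, ex_mem_rd=1, ex_mem_writes=True, mem_wb_rd=1, mem_wb_writes=True))
```

The same function would be evaluated once for the top ALU input and once for the bottom one, each with its own source register.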

56 Data Hazards Classification
Resource Objects (R.O.): all addressable locations
Data Objects (D.O.): content of resource objects
D(I): Domain of instruction I = all R.O. whose D.O. affect the operation of I.
R(I): Range of instruction I = all R.O. whose D.O. are affected by the execution of I.

57 For instruction i followed by instruction j:
A RAW hazard exists on register ρ if ρ ∈ R(i) ∩ D(j)
A WAW hazard exists on register ρ if ρ ∈ R(i) ∩ R(j)
A WAR hazard exists on register ρ if ρ ∈ D(i) ∩ R(j)
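The domain and range definitions translate directly into set intersections. The sketch below assumes instruction i comes before instruction j in program order; the function name `classify_hazards` and the register names are illustrative.

```python
def classify_hazards(domain_i, range_i, domain_j, range_j):
    """Return the registers involved in each hazard type between i and j.

    domain_* : set of registers read by the instruction
    range_*  : set of registers written by the instruction
    Instruction i is assumed to precede instruction j.
    """
    return {
        "RAW": range_i & domain_j,   # j reads what i writes
        "WAW": range_i & range_j,    # both write the same register
        "WAR": domain_i & range_j,   # j writes what i reads
    }


# i: add r1, r2, r3    j: sub r4, r1, r3
hazards = classify_hazards({"r2", "r3"}, {"r1"}, {"r1", "r3"}, {"r4"})
print(hazards)
```

An empty set for a hazard type means the pair is free of that hazard; both instructions reading r3 is not a hazard at all.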

58 [Figure: three diagrams relating the domains D(I), D(J) and ranges R(I), R(J) of instructions I and J for the RAW, WAW, and WAR cases]

59 Situations that the pipeline hazard detection hardware can see by comparing the destination and sources of adjacent instructions.

Situation: No dependence
Example code sequence: LW R1,45(R2); ADD R5,R6,R7; SUB R8,R6,R7; OR R9,R6,R7
Action: No hazard possible because no dependence exists on R1 in the immediately following three instructions.

Situation: Dependence requiring stall
Example code sequence: LW R1,45(R2); ADD R5,R1,R7; SUB R8,R6,R7; OR R9,R6,R7
Action: Comparators detect the use of R1 in the ADD and stall the ADD (and SUB and OR) before the ADD begins EX.

Situation: Dependence overcome by forwarding
Example code sequence: LW R1,45(R2); ADD R5,R6,R7; SUB R8,R1,R7; OR R9,R6,R7
Action: Comparators detect use of R1 in SUB and forward the result of the load to the ALU in time for SUB to begin EX.

Situation: Dependence with accesses in order
Example code sequence: LW R1,45(R2); ADD R5,R6,R7; SUB R8,R6,R7; OR R9,R1,R7
Action: No action required because the read of R1 by OR occurs in the second half of the ID phase, while the write of the loaded data occurred in the first half.

60 Three Generic Data Hazards
Read After Write (RAW): Instr J tries to read an operand before Instr I writes it
I: add r1,r2,r3
J: sub r4,r1,r3
Caused by a Data Dependence (in compiler nomenclature). This hazard results from an actual need for communication.

61 Three Generic Data Hazards
Write After Read (WAR): Instr J writes an operand before Instr I reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an anti-dependence by compiler writers. This results from reuse of the name r1.
Can't happen in the MIPS 5 stage pipeline because:
All instructions take 5 stages, and
Reads are always in stage 2, and
Writes are always in stage 5

62 Three Generic Data Hazards
Write After Write (WAW): Instr J writes an operand before Instr I writes it.
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an output dependence by compiler writers. This also results from the reuse of name r1.
Can't happen in the MIPS 5 stage pipeline because:
All instructions take 5 stages, and
Writes are always in stage 5
Will see WAR and WAW in later, more complicated pipes

63 Forwarding to Avoid Data Hazard [Figure: pipeline timing in clock cycles for add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11, with forwarding paths from add to its dependents]

64 Data Hazard Even with Forwarding [Figure: pipeline timing in clock cycles for lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9] Adapted from Patterson, Katz and Kubiatowicz, UCB

65 Resolving this load hazard
Adding hardware? ... not
Detection?
Compilation techniques?
What is the cost of load delays?

66 Resolving the Load Data Hazard [Figure: the same lw, sub, and, or sequence with a bubble inserted after lw, which delays sub, and, and or by one cycle]
How is this different from the instruction issue stall?

67 Software Scheduling to Avoid Load Hazards
Try producing fast code for
a = b + c;
d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:
LW Rb,b
LW Rc,c
ADD Ra,Rb,Rc
SW a,Ra
LW Re,e
LW Rf,f
SUB Rd,Re,Rf
SW d,Rd

Fast code:
LW Rb,b
LW Rc,c
LW Re,e
ADD Ra,Rb,Rc
LW Rf,f
SW a,Ra
SUB Rd,Re,Rf
SW d,Rd
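The benefit of the reordering can be checked by counting load-use stalls in both sequences. This sketch assumes a one-cycle load delay with full forwarding, so a stall occurs only when an instruction reads a register loaded by the instruction immediately before it. The tuple encoding and the function name `count_load_use_stalls` are hypothetical.

```python
def count_load_use_stalls(program):
    """Count stalls caused by using a loaded value in the very next instruction.

    Each instruction is (opcode, destination register or None, source registers).
    Assumes a one-cycle load delay and forwarding for everything else.
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, sources = current
        if prev_op == "LW" and prev_dest in sources:
            stalls += 1
    return stalls


slow_code = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast_code = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

print("slow:", count_load_use_stalls(slow_code))
print("fast:", count_load_use_stalls(fast_code))
```

Under these assumptions the slow sequence stalls after each pair of loads, while the fast sequence always places an independent instruction between a load and its first use.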

68 Instruction Set Connection
What is exposed about this organizational hazard in the instruction set?
k cycle delay? Bad: CPI is not part of the ISA
k instruction slot delay: a load should not be followed by use of the value in the next k instructions
Nothing, but code can reduce run-time delays
MIPS did the transformation in the assembler

69 [Figure: fraction of loads that cause a stall, by benchmark: compress, eqntott, espresso, gcc, li, doduc, ear, hydro2d, mdljdp, su2cor]

70 Control Hazard on Branches => Three Stage Stall [Figure: pipeline timing for the sequence below]
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11

71 Example: Branch Stall Impact
If 30% branch, Stall 3 cycles => significant
Two part solution:
Determine branch taken or not sooner, AND
Compute taken branch address earlier
MIPS branch tests if register = 0 or ≠ 0
MIPS Solution:
Move Zero test to ID/RF stage
Adder to calculate new PC in ID/RF stage
1 clock cycle penalty for branch versus 3
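The impact of the branch penalty on CPI can be estimated with one line of arithmetic. The sketch assumes an ideal base CPI of 1 and a branch frequency of 30%, both illustrative values; the function name `cpi_with_branch_stalls` is hypothetical.

```python
def cpi_with_branch_stalls(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Effective CPI when every branch costs a fixed number of stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


late_resolution = cpi_with_branch_stalls(0.30, 3)   # branch resolved late: 3-cycle stall
early_resolution = cpi_with_branch_stalls(0.30, 1)  # zero test and adder moved to ID/RF

print(f"3-cycle penalty: CPI = {late_resolution:.2f}")
print(f"1-cycle penalty: CPI = {early_resolution:.2f}")
```

Under these assumptions, moving the branch decision into ID/RF removes two thirds of the branch-induced stall cycles.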

72 [Figure: the frequency of instructions that may change the PC, as a percentage of instructions executed, split into forward conditional branches, backward conditional branches, and unconditional branches, for the benchmarks compress, eqntott, espresso, gcc, li, doduc, ear, hydro2d, mdljdp, su2cor]

73 Four Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
Execute successor instructions in sequence
Squash instructions in pipeline if branch actually taken
Advantage of late pipeline state update
47% MIPS branches not taken on average
PC+4 already calculated, so use it to get next instruction
#3: Predict Branch Taken
53% MIPS branches taken on average
But haven't calculated branch target address in MIPS
» MIPS still incurs 1 cycle branch penalty
» Other machines: branch target known before outcome

74 Four Branch Hazard Alternatives
#4: Delayed Branch
Define branch to take place AFTER a following instruction
branch instruction
sequential successor 1
sequential successor 2
...
sequential successor n
...
branch target if taken
(Branch delay of length n)
1 slot delay allows proper decision and branch target address in 5 stage pipeline
MIPS uses this

75 Hardware interlock- No-op Dr. Shadrokh Samavi 75

76 Delayed Branch
Where to get instructions to fill branch delay slot?
Before branch instruction
From the target address: only valuable when branch taken
From fall through: only valuable when branch not taken
Canceling branches allow more slots to be filled
Compiler effectiveness for single branch delay slot:
Fills about 60% of branch delay slots
About 80% of instructions executed in branch delay slots useful in computation
About 50% (60% x 80%) of slots usefully filled
Delayed Branch downside: 7-8 stage pipelines, multiple instructions issued per clock (superscalar)

77 (a) From before
ADD R1, R2, R3
if R2 = 0 then
  [Delay slot]
Becomes
if R2 = 0 then
  ADD R1, R2, R3

(b) From target
SUB R4, R5, R6
...
ADD R1, R2, R3
if R1 = 0 then
  [Delay slot]
Becomes
SUB R4, R5, R6
...
ADD R1, R2, R3
if R1 = 0 then
  SUB R4, R5, R6

(c) From fall through
ADD R1, R2, R3
if R1 = 0 then
  [Delay slot]
OR R7, R8, R9
SUB R4, R5, R6
Becomes
ADD R1, R2, R3
if R1 = 0 then
  OR R7, R8, R9
SUB R4, R5, R6

78 Canceling or Nullifying Branch. In a canceling branch, the instruction includes the direction that the branch was predicted. When the branch behaves as predicted, the instruction in the branch-delay slot is simply executed as it would normally be with a delayed branch. When the branch is incorrectly predicted, the instruction in the branch-delay slot is simply turned into a no-op. Dr. Shadrokh Samavi 78

79 Delayed-branch scheduling schemes

Scheduling strategy: (a) From before
Requirements: Branch must not depend on the rescheduled instructions.
Improves performance when? Always.

Scheduling strategy: (b) From target
Requirements: Must be OK to execute rescheduled instructions if branch is not taken. May need to duplicate instructions.
Improves performance when? When branch is taken. May enlarge program if instructions are duplicated.

Scheduling strategy: (c) From fall through
Requirements: Must be OK to execute instructions if branch is taken.
Improves performance when? When branch is not taken.

80 Branch is NOT TAKEN (misprediction) Branch is TAKEN ( predicted correctly) The behavior of a predicted-taken canceling branch depends on whether the branch is taken or not. Dr. Shadrokh Samavi 8

81 Delayed and Canceling Delay Branches Dr. Shadrokh Samavi 8

82 Overall costs of a variety of branch schemes Dr. Shadrokh Samavi 82

83 For an R4000-style pipeline, it takes 3 pipeline stages before the branch target address is known and 1 more cycle to evaluate the branch condition. This leads to the following branch penalties: [Table: branch penalty in cycles for each branch scheme] Find the effective addition to the CPI arising from branches given that the relative frequencies of unconditional, conditional untaken, and conditional taken branches are 4%, 6%, and 10%, respectively.
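The calculation asked for here is a weighted sum of penalties. The penalty table itself is not reproduced in the text above, so the cycle counts in `ASSUMED_PENALTIES` are placeholders chosen for illustration, as are the 4% / 6% / 10% frequencies; substitute the values from the penalty table before relying on the numbers.

```python
# Relative frequencies of each branch kind (assumed reading of the example).
FREQUENCY = {"unconditional": 0.04, "untaken": 0.06, "taken": 0.10}

# Hypothetical penalties in cycles per scheme; replace with the real table.
ASSUMED_PENALTIES = {
    "flush pipeline":    {"unconditional": 2, "untaken": 3, "taken": 3},
    "predicted taken":   {"unconditional": 2, "untaken": 3, "taken": 2},
    "predicted untaken": {"unconditional": 2, "untaken": 0, "taken": 3},
}


def cpi_addition(penalties, frequency=FREQUENCY):
    """Extra CPI from branches = sum over branch kinds of frequency x penalty."""
    return sum(frequency[kind] * penalties[kind] for kind in frequency)


additions = {scheme: cpi_addition(p) for scheme, p in ASSUMED_PENALTIES.items()}
for scheme, extra in additions.items():
    print(f"{scheme:18s} adds {extra:.2f} to the CPI")
```

With these assumed penalties, the scheme that costs nothing on untaken branches adds the least to the CPI, even though taken branches are the most frequent kind.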

84 Control Hazards
1. Find out whether the branch is taken or not taken earlier in the pipeline.
2. Compute the taken PC (i.e., the address of the branch target) earlier.

Branch instruction      IF  ID     EX     MEM  WB
Branch successor            IF     stall  stall IF  ID  EX  MEM WB
Branch successor + 1                                IF  ID  EX  MEM WB
Branch successor + 2                                    IF  ID  EX  MEM
Branch successor + 3                                        IF  ID  EX
Branch successor + 4                                            IF  ID
Branch successor + 5                                                IF

85 Multiple Pipelines

86 Superscalar [Figure: instructions against clock cycles for a superscalar of degree 3]

87 Superpipelined [Figure: instructions against clock cycles for a superpipelined machine of degree 3]

88 Multicycle Operations

89 The MIPS pipeline with three additional unpipelined, floating-point, functional units.


91 Latency : the number of intervening cycles between an instruction that produces a result and an instruction that uses the result. The initiation or repeat interval: the number of cycles that must elapse between issuing two operations of a given type. The pipe stages are shown in the order in which they are used for any operation. The notation S+A indicates a clock cycle in which both the S and A stages are used. The notation D 28 indicates that the D stage is used 28 times in a row. Dr. Shadrokh Samavi 9

92 An FP multiply issued at clock 0 is followed by a single FP add issued between clocks 1 and 7. The second column indicates whether an instruction of the specified type stalls when it is issued n cycles later, where n is the clock cycle number in which the U stage of the second instruction occurs. The stage or stages that cause a stall are in bold. Note that this table deals with only the interaction between the multiply and one add issued between clocks 1 and 7. In this case, the add will stall if it is issued four or five cycles after the multiply; otherwise, it issues without stalling. Notice that the add will be stalled for two cycles if it issues in cycle 4 because on the next clock cycle it will still conflict with the multiply; if, however, the add issues in cycle 5, it will stall for only 1 clock cycle, because that will eliminate the conflicts.

93 A multiply issuing after an add can always proceed without stalling, because the shorter instruction clears the shared pipeline stages before the longer instruction reaches them. Dr. Shadrokh Samavi 93

94 Support Multiple FP Operations [Figure: IF and ID feed an integer unit (EX), an FP multiplier, a four-stage FP adder (A1 to A4), and a non-pipelined FP divider, all followed by MEM and WB]
Complicates bypass
Potential structural hazard
Multiple (FP) instructions can complete at the same time
RF might need to be multi-ported
Ordering issue: who gets to update the register?
Out-of-order completion/retirement: precise exception issue
Hsien-Hsin S. Lee, Georgia Institute of Technology

95 Bypass/Forwarding [Figure: clock-cycle timing, with stalls marked S, for L.D F4,0(R2); MUL.D F0,F4,F6; ADD.D F2,F0,F8; S.D F2,0(R2)]
Hsien-Hsin S. Lee, Georgia Institute of Technology

96 The pipeline timing of a set of independent FP operations. A typical FP code sequence showing the stalls arising from RAW hazards. Dr. Shadrokh Samavi 96

97 Three instructions want to perform a write back to the FP register file simultaneously, as shown in clock cycle. Dr. Shadrokh Samavi 97



More information

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts CPE 110408443 Computer Architecture Appendix A: Pipelining: Basic and Intermediate Concepts Sa ed R. Abed [Computer Engineering Department, Hashemite University] Outline Basic concept of Pipelining The

More information

Designing a Pipelined CPU

Designing a Pipelined CPU Designing a Pipelined CPU CSE 4, S2'6 Review -- Single Cycle CPU CSE 4, S2'6 Review -- ultiple Cycle CPU CSE 4, S2'6 Review -- Instruction Latencies Single-Cycle CPU Load Ifetch /Dec Exec em Wr ultiple

More information

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding

More information

Chapter 4 (Part II) Sequential Laundry

Chapter 4 (Part II) Sequential Laundry Chapter 4 (Part II) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Sequential Laundry 6 P 7 8 9 10 11 12 1 2 A T a s k O r d e r A B C D 30 30 30 30 30 30 30 30 30 30

More information

Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.SP96 1 Review: Evaluating Branch Alternatives Two part solution: Determine

More information

Appendix C: Pipelining: Basic and Intermediate Concepts

Appendix C: Pipelining: Basic and Intermediate Concepts Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)

More information

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the

More information

Pipelining: Basic and Intermediate Concepts

Pipelining: Basic and Intermediate Concepts Appendix A Pipelining: Basic and Intermediate Concepts 1 Overview Basics of fpipelining i Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 2 Unpipelined

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

Lecture 2: Processor and Pipelining 1

Lecture 2: Processor and Pipelining 1 The Simple BIG Picture! Chapter 3 Additional Slides The Processor and Pipelining CENG 6332 2 Datapath vs Control Datapath signals Control Points Controller Datapath: Storage, FU, interconnect sufficient

More information

MIPS An ISA for Pipelining

MIPS An ISA for Pipelining Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch

More information

第三章 Instruction-Level Parallelism and Its Dynamic Exploitation. 陈文智 浙江大学计算机学院 2014 年 10 月

第三章 Instruction-Level Parallelism and Its Dynamic Exploitation. 陈文智 浙江大学计算机学院 2014 年 10 月 第三章 Instruction-Level Parallelism and Its Dynamic Exploitation 陈文智 chenwz@zju.edu.cn 浙江大学计算机学院 2014 年 10 月 1 3.3 The Major Hurdle of Pipelining Pipeline Hazards 本科回顾 ------- Appendix A.2 3.3.1 Taxonomy

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction

More information

Improving Performance: Pipelining

Improving Performance: Pipelining Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute

More information

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? RISC remainder (our assumptions) What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory Pipelining the Idea Similar to assembly line in a factory Divide instruction into smaller tasks Each task is performed on subset of resources Overlap the execution of multiple instructions by completing

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

Improve performance by increasing instruction throughput

Improve performance by increasing instruction throughput Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access

More information

Review: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction

Review: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction Review: Evaluating Branch Alternatives Lecture 3: Introduction to Advanced Pipelining Two part solution: Determine branch taken or not sooner, AND Compute taken branch address earlier Pipeline speedup

More information

Lecture 6: Pipelining

Lecture 6: Pipelining Lecture 6: Pipelining i CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and other

More information

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

The Big Picture Problem Focus S re r g X r eg A d, M lt2 Sub u, Shi h ft Mac2 M l u t l 1 Mac1 Mac Performance Focus Gate Source Drain BOX

The Big Picture Problem Focus S re r g X r eg A d, M lt2 Sub u, Shi h ft Mac2 M l u t l 1 Mac1 Mac Performance Focus Gate Source Drain BOX Appendix A - Pipelining 1 The Big Picture SPEC f2() { f3(s2, &j, &i); *s2->p = 10; i = *s2->q + i; } Requirements Algorithms Prog. Lang./OS ISA f1 f2 f5 f3 s q p fp j f3 f4 i1: ld r1, b i2: ld r2,

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Pipelining. Maurizio Palesi

Pipelining. Maurizio Palesi * Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online

More information

Pipelining: Basic Concepts

Pipelining: Basic Concepts Pipelining: Basic Concepts Prof. Cristina Silvano Dipartimento di Elettronica e Informazione Politecnico di ilano email: silvano@elet.polimi.it Outline Reduced Instruction Set of IPS Processor Implementation

More information

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study Complications With Long Instructions So far, all MIPS instructions take 5 cycles But haven't talked

More information

Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed

Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed Lecture 3: General Purpose Processor Design CSCE 665 Advanced VLSI Systems Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, tet etc included in slides are borrowed from various books, websites,

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching

More information

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts CS359: Computer Architecture Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts Yanyan Shen Department of Computer Science and Engineering Shanghai Jiao Tong University Parallel

More information

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ... CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2010 Lecture 2: Review of Metrics and Pipelining 563 L02.1 Fall 2010 Review from Last Time Computer Architecture >> instruction sets Computer Architecture skill

More information

CSE 502 Graduate Computer Architecture. Lec 4-6 Performance + Instruction Pipelining Review

CSE 502 Graduate Computer Architecture. Lec 4-6 Performance + Instruction Pipelining Review CSE 502 Graduate Computer Architecture Lec 4-6 Performance + Instruction Pipelining Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S. Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write

More information

CS422 Computer Architecture

CS422 Computer Architecture CS422 Computer Architecture Spring 2004 Lecture 07, 08 Jan 2004 Bhaskaran Raman Department of CSE IIT Kanpur http://web.cse.iitk.ac.in/~cs422/index.html Recall: Data Hazards Have to be detected dynamically,

More information

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017! Advanced Topics on Heterogeneous System Architectures Pipelining! Politecnico di Milano! Seminar Room @ DEIB! 30 November, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Outline!

More information

Execution/Effective address

Execution/Effective address Pipelined RC 69 Pipelined RC Instruction Fetch IR mem[pc] NPC PC+4 Instruction Decode/Operands fetch A Regs[rs]; B regs[rt]; Imm sign extended immediate field Execution/Effective address Memory Ref ALUOutput

More information

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review CSE 502 Graduate Computer Architecture Lec 3-5 Performance + Instruction Pipelining Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from

More information

DLX Unpipelined Implementation

DLX Unpipelined Implementation LECTURE - 06 DLX Unpipelined Implementation Five cycles: IF, ID, EX, MEM, WB Branch and store instructions: 4 cycles only What is the CPI? F branch 0.12, F store 0.05 CPI0.1740.83550.174.83 Further reduction

More information

COSC4201. Prof. Mokhtar Aboelaze York University

COSC4201. Prof. Mokhtar Aboelaze York University COSC4201 Chapter 3 Multi Cycle Operations Prof. Mokhtar Aboelaze York University Based on Slides by Prof. L. Bhuyan (UCR) Prof. M. Shaaban (RTI) 1 Multicycle Operations More than one function unit, each

More information

Basic Pipelining Concepts

Basic Pipelining Concepts Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution

More information

Updated Exercises by Diana Franklin

Updated Exercises by Diana Franklin C-82 Appendix C Pipelining: Basic and Intermediate Concepts Updated Exercises by Diana Franklin C.1 [15/15/15/15/25/10/15] Use the following code fragment: Loop: LD R1,0(R2) ;load R1 from address

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Chapter Six. Dataı access. Reg. Instructionı. fetch. Dataı. Reg. access. Dataı. Reg. access. Dataı. Instructionı fetch. 2 ns 2 ns 2 ns 2 ns 2 ns

Chapter Six. Dataı access. Reg. Instructionı. fetch. Dataı. Reg. access. Dataı. Reg. access. Dataı. Instructionı fetch. 2 ns 2 ns 2 ns 2 ns 2 ns Chapter Si Pipelining Improve perfomance by increasing instruction throughput eecutionı Time lw $, ($) 2 6 8 2 6 8 access lw $2, 2($) 8 ns access lw $3, 3($) eecutionı Time lw $, ($) lw $2, 2($) 2 ns 8

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches

More information

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review CSE 502 Graduate Computer Architecture Lec 3-5 Performance + Instruction Pipelining Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from

More information

Lecture 4: Introduction to Advanced Pipelining

Lecture 4: Introduction to Advanced Pipelining Lecture 4: Introduction to Advanced Pipelining Prepared by: Professor David A. Patterson Computer Science 252, Fall 1996 Edited and presented by : Prof. Kurt Keutzer Computer Science 252, Spring 2000 KK

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1 ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Gaduate Compute Achitectue Lectue 6 - Hazads Michela Taufe http://www.cis.udel.edu/~taufe/teaching/cis662f07 Powepoint Lectue Notes fom John Hennessy and David Patteson s: Compute Achitectue,

More information

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations Principles of pipelining Pipelining Simple pipelining Structural Hazards Data Hazards Control Hazards Interrupts Multicycle operations Pipeline clocking ECE D52 Lecture Notes: Chapter 3 1 Sequential Execution

More information

Appendix C. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

Appendix C. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002, Appendix C Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Pipelining Multiple instructions are overlapped in execution Each is in a different stage Each stage is called

More information

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight Pipelining: Its Natural! Chapter 3 Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes A B C D Dryer takes 40 minutes Folder

More information

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts C-2 Appendix C Pipelining: Basic and Intermediate Concepts C.1 Introduction Many readers of this text will have covered the basics of pipelining in another text (such as our more basic text Computer Organization

More information

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

ECE154A Introduction to Computer Architecture. Homework 4 solution

ECE154A Introduction to Computer Architecture. Homework 4 solution ECE154A Introduction to Computer Architecture Homework 4 solution 4.16.1 According to Figure 4.65 on the textbook, each register located between two pipeline stages keeps data shown below. Register IF/ID

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Processor Architecture

Processor Architecture Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions

More information

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson

More information

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Appendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Appendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Appendix C Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure C.2 The pipeline can be thought of as a series of data paths shifted in time. This shows

More information

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining CS 152 Computer rchitecture and Engineering Lecture 4 Pipelining 2014-1-30 John Lazzaro (not a prof - John is always OK) T: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 otorola 68000 Next week

More information

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow? Complications with long instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study So far, all MIPS instructions take 5 cycles But haven't talked

More information

09-1 Multicycle Pipeline Operations

09-1 Multicycle Pipeline Operations 09-1 Multicycle Pipeline Operations 09-1 Material may be added to this set. Material Covered Section 3.7. Long-Latency Operations (Topics) Typical long-latency instructions: floating point Pipelined v.

More information