Advanced Computer Architecture Pipelining
1 Advanced Computer Architecture Pipelining Dr. Shadrokh Samavi
2 Some slides are from the instructor's resources which accompany the 6th and previous editions of the textbook. Some slides are from David Patterson, David Culler and Krste Asanovic of UC Berkeley; Israel Koren of UMass Amherst; Milos Prvulovic and Sean Lee of Georgia Tech. Sources of some slides are mentioned at the bottom of the page. Please inform me if I am missing a name in the above list.
3 What Is Pipelining?
4 Pipeline: In a computer pipeline, each step in the pipeline completes a part of an instruction. Each step is called a pipe stage. Throughput: how often an instruction exits the pipeline. Machine cycle: the time required to move an instruction one step down the pipeline; this is a processor cycle.
5 RISC-V instruction set architecture formats. All instructions are 32 bits long. R: integer register-to-register operations. I: loads and immediate operations. B: branches. J: jumps and link. S: stores. U: wide immediate instructions (LUI, AUIPC).
6 IF ID EXE MEM WB
7 IF
8 ID
9 EXE
10 MEM
11 WB
12 Pipeline Implementation
13 Pipeline Stage: F/F, Combinational Logic, F/F
14 [Figure: instructions INST 1 and INST 2 moving through the stages IF, ID, EX, MEM, WB, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
15 Five-stage Pipelined Datapath: Inst. Fetch, Inst. Decode, Exec, Mem, WB
16 Example for lw instruction: Instruction fetch (IF) [Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
17 Example for lw instruction: Instruction decode (ID) [Figure: pipelined datapath]
18 Example for lw instruction: Execution (EX) [Figure: pipelined datapath]
19 Example for lw instruction: Memory (MEM) [Figure: pipelined datapath]
20 Example for lw instruction: Write back (WB) [Figure: pipelined datapath]
21 Example for sw instruction: Memory (MEM) [Figure: pipelined datapath]
22 Example for sw instruction: Write back (WB): do nothing [Figure: pipelined datapath]
23 Corrected Datapath (for lw) [Figure: pipelined datapath]
24 Pipelining Example: add $4, $5, $6; lw $3, 24($); add $2, $3, $4; sub $, $2, $3; lw $, 2($) [Figure: pipelined datapath]
25 Pipeline Control [Figure: pipelined datapath annotated with the control signals PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg]
26 Pipeline control. We have 5 stages. What needs to be controlled in each stage?
» Instruction Fetch and PC Increment
» Instruction Decode / Register Fetch
» Execution Stage (4 lines): RegDst, ALUOp[1:0], ALUSrc
» Memory Stage (3 lines): Branch, MemRead, MemWrite
» Write Back (2 lines): MemtoReg, RegWrite (note that this signal is in ID stage)
27 Pipeline Control. Extend pipeline registers to include control information (created in ID). Pass control signals along just like the data.
             Execution/Address Calculation       Memory access                 Write-back
             stage control lines                 stage control lines           stage control lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc      Branch  MemRead  MemWrite     RegWrite  MemtoReg
R-format     1       1       0       0           0       0        0            1         0
lw           0       0       0       1           0       1        0            1         1
sw           X       0       0       1           0       0        1            0         X
beq          X       0       1       0           1       0        0            0         X
[Figure: Control produces the WB, M, and EX fields, which travel through the ID/EX, EX/MEM, and MEM/WB registers]
28 Datapath with Control [Figure: pipelined datapath with the control unit and the WB, M, and EX control fields carried in the ID/EX, EX/MEM, and MEM/WB registers]
29 Datapath with Control. IF: lw $, 8($)
30 Datapath with Control. IF: sub $, $2, $3; ID: lw $, 8($)
31 Datapath with Control. IF: and $2, $4, $5; ID: sub $, $2, $3; EX: lw $, 8($)
32 Datapath with Control. IF: or $3, $6, $7; ID: and $2, $4, $5; EX: sub $, $2, $3; MEM: lw $, 8($)
33 Datapath with Control. IF: add $4, $8, $9; ID: or $3, $6, $7; EX: and $2, $4, $5; MEM: sub $,..; WB: lw $, 8($)
34 Datapath with Control. IF: xxxx; ID: add $4, $8, $9; EX: or $3, $6, $7; MEM: and $2; WB: sub $,..
35 Datapath with Control. IF: xxxx; ID: xxxx; EX: add $4, $8, $9; MEM: or $3,..; WB: and $2
36 Datapath with Control. IF: xxxx; ID: xxxx; EX: xxxx; MEM: add $4,..; WB: or $3
37 Datapath with Control. IF: xxxx; ID: xxxx; EX: xxxx; MEM: xxxx; WB: add $4..
38 Simple RISC Pipeline
                     Clock number
Instruction number   1    2    3    4    5    6    7    8    9
i                    IF   ID   EX   MEM  WB
i+1                       IF   ID   EX   MEM  WB
i+2                            IF   ID   EX   MEM  WB
i+3                                 IF   ID   EX   MEM  WB
i+4                                      IF   ID   EX   MEM  WB
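The staggered timing in the Simple RISC Pipeline slide can be generated mechanically. The following is a minimal sketch, assuming an ideal five-stage pipeline with no stalls; the function names `pipeline_table` and `format_table` are illustrative and do not come from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_table(num_instructions, stages=STAGES):
    """Return one row per instruction.

    row[c] holds the stage occupied in clock c + 1, or "" if the
    instruction is not in the pipeline during that clock.
    Assumes an ideal pipeline: one instruction issued per clock, no stalls.
    """
    total_clocks = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_clocks
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s during clock i + s + 1
        rows.append(row)
    return rows


def format_table(rows):
    """Render the rows as fixed-width text with a clock-number header."""
    header = "instr  " + " ".join(f"{c:>3}" for c in range(1, len(rows[0]) + 1))
    lines = [header]
    for i, row in enumerate(rows):
        label = "i" if i == 0 else f"i+{i}"
        lines.append(f"{label:<6} " + " ".join(f"{cell:>3}" for cell in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(pipeline_table(5)))
```

With five instructions and five stages the table spans nine clocks, and from clock 5 onward one instruction completes every clock.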
39 Review: Visualizing Pipelining [Figure: instruction order vs. time in clock cycles (Cycle 1 to Cycle 7), each instruction passing through Ifetch, Reg, ALU, DMem, Reg] Adapted from Patterson, Katz and Kubiatowicz, UCB
41 Example: Consider the unpipelined processor in the previous section. Assume that it has a 2 GHz clock (or a 0.5 ns clock cycle) and that it uses four cycles for ALU operations and branches and five cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20%, and 40%, respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.1 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?
Answer: The average instruction execution time on the unpipelined processor is
\[ \text{Clock cycle} \times \text{Average CPI} = 0.5\,\text{ns} \times [(40\% + 20\%) \times 4 + 40\% \times 5] = 0.5\,\text{ns} \times 4.4 = 2.2\,\text{ns} \]
In the pipelined implementation, the clock must run at the speed of the slowest stage plus overhead, which will be 0.5 + 0.1 or 0.6 ns; this is the average instruction execution time. Thus, the speedup from pipelining is
\[ \text{Speedup} = \frac{2.2\,\text{ns}}{0.6\,\text{ns}} \approx 3.7 \]
The 0.1 ns overhead essentially establishes a limit on the effectiveness of pipelining. If the overhead is not affected by changes in the clock cycle, Amdahl's Law tells us that the overhead limits the speedup.
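The arithmetic of this example can be checked with a few lines of code. This is a sketch only: it assumes a 0.5 ns clock cycle, 0.1 ns of pipelining overhead, and an instruction mix of 40% ALU operations (4 cycles), 20% branches (4 cycles), and 40% memory operations (5 cycles). The function name `pipelining_speedup` is hypothetical.

```python
def pipelining_speedup(clock_ns, overhead_ns, mix):
    """Speedup of an ideal pipeline over the unpipelined processor.

    mix is a list of (fraction, cycles) pairs for the unpipelined processor.
    The pipelined processor is assumed to complete one instruction per
    (clock_ns + overhead_ns), ignoring any stalls.
    """
    average_cpi = sum(fraction * cycles for fraction, cycles in mix)
    unpipelined_ns = clock_ns * average_cpi   # average time per instruction
    pipelined_ns = clock_ns + overhead_ns     # slowest stage plus overhead
    return unpipelined_ns / pipelined_ns


if __name__ == "__main__":
    # Assumed inputs: 40% ALU (4 cycles), 20% branch (4 cycles), 40% memory (5 cycles)
    mix = [(0.40, 4), (0.20, 4), (0.40, 5)]
    print(round(pipelining_speedup(0.5, 0.1, mix), 2))
```

Increasing `overhead_ns` while holding everything else fixed shows how the overhead caps the achievable speedup.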
42 Pipeline Hazards
43 Hazards: circumstances that would cause incorrect execution if the next instruction were launched. Structural hazards: attempting to use the same hardware to do two different things at the same time. Data hazards: an instruction depends on the result of a prior instruction still in the pipeline. Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
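44 Speedup from pipelining:
\[ \text{Speedup from pipelining} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI unpipelined} \times \text{Clock cycle unpipelined}}{\text{CPI pipelined} \times \text{Clock cycle pipelined}} = \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}} \]
\[ \text{CPI pipelined} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction} = 1 + \text{Pipeline stall clock cycles per instruction} \]
Assuming the same clock cycle for pipelined and unpipelined, and that every unpipelined instruction takes as many cycles as the pipeline has stages:
\[ \text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per instruction}} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}} \]
Alternatively, viewing pipelining as shortening the clock cycle (CPI unpipelined = 1):
\[ \text{Speedup from pipelining} = \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}} = \frac{1}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}} \]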
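The speedup relation with stalls can be written as a small function to experiment with. This is a sketch under the same assumptions as the formula (ideal pipelined CPI of 1 plus stall cycles per instruction); the name `speedup_from_pipelining` and its parameter names are illustrative.

```python
def speedup_from_pipelining(cpi_unpipelined, stall_cycles_per_instruction,
                            clock_unpipelined=1.0, clock_pipelined=1.0):
    """Speedup = (CPI unpipelined / CPI pipelined) * (clock unpipelined / clock pipelined).

    CPI pipelined is taken as the ideal CPI of 1 plus the average number
    of stall cycles per instruction.
    """
    cpi_pipelined = 1.0 + stall_cycles_per_instruction
    return (cpi_unpipelined / cpi_pipelined) * (clock_unpipelined / clock_pipelined)


if __name__ == "__main__":
    # Same clock cycle, CPI unpipelined equal to a pipeline depth of 5
    print(speedup_from_pipelining(5, 0.0))    # no stalls: speedup equals the depth
    print(speedup_from_pipelining(5, 0.25))   # one stall every four instructions
```

With equal clock cycles the result reduces to pipeline depth divided by (1 + stall cycles per instruction), so every stall cycle directly erodes the ideal speedup.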
45 Example: Structural Hazard [Figure: pipeline timing over Cycle 1 to Cycle 7 for Load, Instr 1, Instr 2, Instr 3, Instr 4; the structural hazard is marked where two instructions need the same memory in the same cycle]
46 Resolving structural hazards. Definition of structural hazard: attempt to use the same hardware for two different things at the same time. Solution 1: Wait (stall); must detect the hazard; must have a mechanism to stall. Solution 2: Use more hardware.
47 Detecting and Resolving Structural Hazard [Figure: pipeline timing over Cycle 1 to Cycle 7 for Load, Instr 1, Instr 2, Stall, Instr 3; the stall appears as a row of bubbles that delays Instr 3 by one cycle] Adapted from Patterson, Katz and Kubiatowicz, UCB
48 Role of ISA in Structural Hazard Resolution. Simple to determine the sequence of resources used by an instruction: the opcode tells it all. Uniformity in the resource usage. Compare MIPS to IA32? MIPS approach => all instructions flow through the same 5-stage pipeline.
49 Data Hazards [Figure: instruction order vs. time in clock cycles, stages IF, ID/RF, EX, MEM, WB] add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11. Adapted from Patterson, Katz and Kubiatowicz, UCB
50 [Figure: program execution order (in instructions) vs. time (in clock cycles CC1 to CC6)] ADD R1, R2, R3; LW R4, 0(R1); SW 12(R1), R4. Stores require an operand during MEM, and forwarding of that operand is shown here.
51 Pipeline w/o forwarding: Inst. Fetch, Inst. Decode, Exec, Mem, WB [Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
52 Forwarding (from EX/MEM) [Figure: Register File, ID/EX, MUXes at the ALU inputs, ALU, EX/MEM, Data Memory, MEM/WB]
53 Forwarding (from MEM/WB) [Figure: Register File, ID/EX, MUXes at the ALU inputs, ALU, EX/MEM, Data Memory, MEM/WB]
54 Control unit for forwarding [Figure: the Forwarding Unit compares Rs and Rt from ID/EX with EX/MEM Rd and MEM/WB Rd and drives the MUXes at the ALU inputs]
55 Forwarding
Pipeline register containing source instruction | Opcode of source instruction | Pipeline register containing destination instruction | Opcode of destination instruction | Destination of the forwarded result | Comparison (if equal then forward)
EX/MEM | Register-register ALU | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | EX/MEM.IR16..20 == ID/EX.IR6..10
EX/MEM | Register-register ALU | ID/EX | Register-register ALU | Bottom ALU input | EX/MEM.IR16..20 == ID/EX.IR11..15
MEM/WB | Register-register ALU | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | MEM/WB.IR16..20 == ID/EX.IR6..10
MEM/WB | Register-register ALU | ID/EX | Register-register ALU | Bottom ALU input | MEM/WB.IR16..20 == ID/EX.IR11..15
EX/MEM | ALU immediate | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | EX/MEM.IR11..15 == ID/EX.IR6..10
EX/MEM | ALU immediate | ID/EX | Register-register ALU | Bottom ALU input | EX/MEM.IR11..15 == ID/EX.IR11..15
MEM/WB | ALU immediate | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | MEM/WB.IR11..15 == ID/EX.IR6..10
MEM/WB | ALU immediate | ID/EX | Register-register ALU | Bottom ALU input | MEM/WB.IR11..15 == ID/EX.IR11..15
MEM/WB | Load | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | MEM/WB.IR11..15 == ID/EX.IR6..10
MEM/WB | Load | ID/EX | Register-register ALU | Bottom ALU input | MEM/WB.IR11..15 == ID/EX.IR11..15
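The comparisons in the forwarding table amount to one selection per ALU input. The sketch below is a simplified model, not the hardware in the slides: it assumes each pipeline register is summarized by the destination register number of the instruction it holds, that register 0 is never forwarded, and that a load result becomes available only from MEM/WB. The names `Latch` and `forward_select` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Latch:
    """Hypothetical summary of the instruction held in a pipeline register."""
    dest: Optional[int] = None   # destination register, None if nothing is written
    is_load: bool = False        # loads have no result to forward from EX/MEM


def forward_select(src_reg: int, ex_mem: Latch, mem_wb: Latch) -> str:
    """Choose where one ALU input should come from.

    Returns "EX/MEM", "MEM/WB", or "REGFILE". The newer result (EX/MEM)
    takes priority over the older one (MEM/WB). A load in EX/MEM with a
    matching destination is assumed to have been handled by a stall in
    the hazard detection logic before this point.
    """
    if src_reg == 0:
        return "REGFILE"          # register 0 is hardwired to zero
    if ex_mem.dest == src_reg and not ex_mem.is_load:
        return "EX/MEM"
    if mem_wb.dest == src_reg:
        return "MEM/WB"
    return "REGFILE"


if __name__ == "__main__":
    # add r1,r2,r3 is in EX/MEM while sub r4,r1,r3 is in EX
    print(forward_select(1, Latch(dest=1), Latch()))
    print(forward_select(3, Latch(dest=1), Latch()))
```

The function is called once for the top ALU input (first source register) and once for the bottom input (second source register), mirroring the paired rows of the table.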
56 Data Hazards Classification. Resource Objects (R.O.): all addressable locations. Data Objects (D.O.): content of resource objects. D(I): Domain of instruction I = all R.O. whose D.O. affect the operation of I. R(I): Range of instruction I = all R.O. whose D.O. are affected by the execution of I.
57 For an instruction i that precedes an instruction j: a RAW hazard exists on register $r$ if $r \in R(i) \cap D(j)$; a WAW hazard exists on register $r$ if $r \in R(i) \cap R(j)$; a WAR hazard exists on register $r$ if $r \in D(i) \cap R(j)$.
58 [Figure: domains and ranges of instructions I and J for the three cases: RAW (R(I) overlaps D(J)), WAW (R(I) overlaps R(J)), and WAR (D(I) overlaps R(J))]
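The domain and range definitions translate directly into set intersections. A minimal sketch, assuming registers are represented as strings and that instruction i precedes instruction j; the function name `classify_hazards` is illustrative.

```python
def classify_hazards(domain_i, range_i, domain_j, range_j):
    """Return the registers involved in each kind of hazard.

    domain_* are the sets of locations an instruction reads,
    range_* are the sets of locations it writes.
    Instruction i is assumed to come before instruction j.
    """
    return {
        "RAW": range_i & domain_j,   # j reads what i writes
        "WAW": range_i & range_j,    # both write the same location
        "WAR": domain_i & range_j,   # j writes what i reads
    }


if __name__ == "__main__":
    # I: add r1,r2,r3   J: sub r4,r1,r3
    hazards = classify_hazards({"r2", "r3"}, {"r1"}, {"r1", "r3"}, {"r4"})
    print(hazards)
```

An empty set for a given key means that kind of hazard cannot arise between the two instructions.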
59 Situations that the pipeline hazard detection hardware can see by comparing the destination and sources of adjacent instructions.
Situation: No dependence
Example code sequence: LW R1,45(R2); ADD R5,R6,R7; SUB R8,R6,R7; OR R9,R6,R7
Action: No hazard possible because no dependence exists on R1 in the immediately following three instructions.
Situation: Dependence requiring stall
Example code sequence: LW R1,45(R2); ADD R5,R1,R7; SUB R8,R6,R7; OR R9,R6,R7
Action: Comparators detect the use of R1 in the ADD and stall the ADD (and SUB and OR) before the ADD begins EX.
Situation: Dependence overcome by forwarding
Example code sequence: LW R1,45(R2); ADD R5,R6,R7; SUB R8,R1,R7; OR R9,R6,R7
Action: Comparators detect use of R1 in SUB and forward the result of the load to the ALU in time for SUB to begin EX.
Situation: Dependence with accesses in order
Example code sequence: LW R1,45(R2); ADD R5,R6,R7; SUB R8,R6,R7; OR R9,R1,R7
Action: No action required because the read of R1 by OR occurs in the second half of the ID phase, while the write of the loaded data occurred in the first half.
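The four situations depend only on how far the first use of the loaded register is from the load. The following sketch encodes that rule; it assumes a five-stage pipeline with forwarding and a register file that writes in the first half of a cycle and reads in the second half. The function name `load_use_action` and its return labels are illustrative.

```python
def load_use_action(load_dest, following_sources):
    """Classify the hazard between a load and the next three instructions.

    following_sources is a list of sets: the source registers of the
    instructions that follow the load, in program order.
    """
    for distance, sources in enumerate(following_sources[:3], start=1):
        if load_dest in sources:
            if distance == 1:
                return "stall"      # value not loaded before the next instruction's EX
            if distance == 2:
                return "forward"    # loaded value forwarded to the ALU
            return "in order"       # write in first half, read in second half of the cycle
    return "no dependence"


if __name__ == "__main__":
    # LW R1,45(R2) followed by ADD R5,R1,R7; SUB R8,R6,R7; OR R9,R6,R7
    print(load_use_action("R1", [{"R1", "R7"}, {"R6", "R7"}, {"R6", "R7"}]))
```

Only the nearest use matters here, since later uses of the same register are covered once the nearest one is resolved.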
60 Three Generic Data Hazards. Read After Write (RAW): Instr J tries to read an operand before Instr I writes it. I: add r1,r2,r3; J: sub r4,r1,r3. Caused by a data dependence (in compiler nomenclature). This hazard results from an actual need for communication.
61 Three Generic Data Hazards. Write After Read (WAR): Instr J writes an operand before Instr I reads it. I: sub r4,r1,r3; J: add r1,r2,r3; K: mul r6,r1,r7. Called an anti-dependence by compiler writers. This results from reuse of the name r1. Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages, reads are always in stage 2, and writes are always in stage 5.
62 Three Generic Data Hazards. Write After Write (WAW): Instr J writes an operand before Instr I writes it. I: sub r1,r4,r3; J: add r1,r2,r3; K: mul r6,r1,r7. Called an output dependence by compiler writers. This also results from the reuse of the name r1. Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages and writes are always in stage 5. We will see WAR and WAW in later, more complicated pipes.
63 Forwarding to Avoid Data Hazard [Figure: instruction order vs. time in clock cycles] add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11
64 Data Hazard Even with Forwarding [Figure: instruction order vs. time in clock cycles] lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9. Adapted from Patterson, Katz and Kubiatowicz, UCB
65 Resolving this load hazard. Adding hardware? ... not. Detection? Compilation techniques? What is the cost of load delays?
66 Resolving the Load Data Hazard [Figure: instruction order vs. time in clock cycles; a bubble is inserted after the load so that sub, and, and or each start one cycle later] lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9. How is this different from the instruction issue stall?
67 Software Scheduling to Avoid Load Hazards. Try producing fast code for a = b + c; d = e - f; assuming a, b, c, d, e, and f in memory.
Slow code:          Fast code:
LW  Rb,b            LW  Rb,b
LW  Rc,c            LW  Rc,c
ADD Ra,Rb,Rc        LW  Re,e
SW  a,Ra            ADD Ra,Rb,Rc
LW  Re,e            LW  Rf,f
LW  Rf,f            SW  a,Ra
SUB Rd,Re,Rf        SUB Rd,Re,Rf
SW  d,Rd            SW  d,Rd
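The benefit of the reordering can be counted with a simple model. The sketch below assumes full forwarding and a one-cycle load delay, so a stall is charged only when an instruction uses the result of the load immediately before it. It deliberately ignores the case where a store's data operand could be forwarded without stalling, which does not occur in these two sequences. The tuple encoding and the name `count_load_use_stalls` are illustrative.

```python
def count_load_use_stalls(program):
    """Count load-use stalls in a list of (opcode, dest, sources) tuples.

    One stall is charged whenever an instruction reads the destination
    of the load that immediately precedes it.
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, cur_sources = current
        if prev_op == "LW" and prev_dest in cur_sources:
            stalls += 1
    return stalls


SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

if __name__ == "__main__":
    print("slow code stalls:", count_load_use_stalls(SLOW))
    print("fast code stalls:", count_load_use_stalls(FAST))
```

Under this model the slow sequence stalls after each of the loads of Rc and Rf, while the fast sequence places an independent instruction after every load.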
68 Instruction Set Connection. What is exposed about this organizational hazard in the instruction set? k cycle delay? Bad, CPI is not part of ISA. k instruction slot delay: load should not be followed by use of the value in the next k instructions. Nothing, but code can reduce run-time delays. MIPS did the transformation in the assembler.
69 [Figure: bar chart of the fraction of loads that cause a stall for the benchmarks compress, eqntott, espresso, gcc, li, doduc, ear, hydro2d, mdljdp, su2cor]
70 Control Hazard on Branches => Three Stage Stall [Figure: instruction order vs. time in clock cycles] 10: beq r1,r3,36; 14: and r2,r3,r5; 18: or r6,r1,r7; 22: add r8,r1,r9; 36: xor r10,r1,r11
71 Example: Branch Stall Impact. If 30% of instructions are branches, a 3-cycle stall is significant. Two-part solution: determine whether the branch is taken or not sooner, AND compute the taken branch address earlier. MIPS branch tests if register = 0 or ≠ 0. MIPS solution: move the Zero test to the ID/RF stage; add an adder to calculate the new PC in the ID/RF stage; 1 clock cycle penalty for branch versus 3.
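The impact of branch stalls on CPI is a one-line calculation. The sketch below assumes a base CPI of 1 and takes a branch frequency of 30% as an illustrative input; the function name `cpi_with_branch_stalls` is hypothetical.

```python
def cpi_with_branch_stalls(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Effective CPI when every branch pays a fixed stall penalty."""
    return base_cpi + branch_fraction * penalty_cycles


if __name__ == "__main__":
    three_cycle = cpi_with_branch_stalls(0.30, 3)   # branch resolved late
    one_cycle = cpi_with_branch_stalls(0.30, 1)     # branch resolved in ID/RF
    print(round(three_cycle, 2), round(one_cycle, 2))
```

Under these assumptions, moving the branch decision into the ID/RF stage reduces the stall contribution to one third of its original size.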
72 [Figure: bar chart of the frequency of instructions that may change the PC, as a percentage of instructions executed, split into forward conditional branches, backward conditional branches, and unconditional branches, for the benchmarks compress, eqntott, espresso, gcc, li, doduc, ear, hydro2d, mdljdp, su2cor]
73 Four Branch Hazard Alternatives.
#1: Stall until branch direction is clear.
#2: Predict Branch Not Taken. Execute successor instructions in sequence. Squash instructions in pipeline if branch actually taken. Advantage of late pipeline state update. 47% of MIPS branches not taken on average. PC+4 already calculated, so use it to get next instruction.
#3: Predict Branch Taken. 53% of MIPS branches taken on average. But haven't calculated branch target address in MIPS.
» MIPS still incurs 1 cycle branch penalty
» Other machines: branch target known before outcome
74 Four Branch Hazard Alternatives.
#4: Delayed Branch. Define branch to take place AFTER a following instruction: branch instruction; sequential successor 1; sequential successor 2; ...; sequential successor n; branch target if taken. Branch delay of length n. A 1-slot delay allows proper decision and branch target address in the 5-stage pipeline. MIPS uses this.
75 Hardware interlock, No-op
76 Delayed Branch. Where to get instructions to fill the branch delay slot? Before the branch instruction. From the target address: only valuable when branch taken. From fall through: only valuable when branch not taken. Canceling branches allow more slots to be filled. Compiler effectiveness for a single branch delay slot: fills about 60% of branch delay slots; about 80% of instructions executed in branch delay slots are useful in computation; about 50% (60% x 80%) of slots usefully filled. Delayed branch downside: 7-8 stage pipelines, multiple instructions issued per clock (superscalar).
77 Scheduling the branch delay slot:
(a) From before: ADD R1, R2, R3; if R2 = 0 then; [delay slot]. Becomes: if R2 = 0 then; ADD R1, R2, R3.
(b) From target: SUB R4, R5, R6 (at the branch target); ADD R1, R2, R3; if R1 = 0 then; [delay slot]. Becomes: ADD R1, R2, R3; if R1 = 0 then; SUB R4, R5, R6.
(c) From fall through: ADD R1, R2, R3; if R1 = 0 then; [delay slot]; OR R7, R8, R9; SUB R4, R5, R6. Becomes: ADD R1, R2, R3; if R1 = 0 then; OR R7, R8, R9; SUB R4, R5, R6.
78 Canceling or Nullifying Branch. In a canceling branch, the instruction includes the direction in which the branch was predicted. When the branch behaves as predicted, the instruction in the branch-delay slot is simply executed as it would normally be with a delayed branch. When the branch is incorrectly predicted, the instruction in the branch-delay slot is simply turned into a no-op.
79 Delayed-branch scheduling schemes
Scheduling strategy | Requirements | Improves performance when?
(a) From before | Branch must not depend on the rescheduled instructions. | Always.
(b) From target | Must be OK to execute rescheduled instructions if branch is not taken. May need to duplicate instructions. | When branch is taken. May enlarge program if instructions are duplicated.
(c) From fall through | Must be OK to execute instructions if branch is taken. | When branch is not taken.
80 [Figure: two cases, branch is NOT TAKEN (misprediction) and branch is TAKEN (predicted correctly)] The behavior of a predicted-taken canceling branch depends on whether the branch is taken or not.
81 Delayed and Canceling Delay Branches
82 Overall costs of a variety of branch schemes
83 For an R4000-style pipeline, it takes 3 pipeline stages before the branch target address is known and 1 more cycle to evaluate the branch condition. This leads to the following branch penalties: Find the effective addition to the CPI arising from branches given that the relative frequencies of unconditional, conditional untaken, and conditional taken branches are 4%, 6%, and 10%, respectively.
84 Control Hazards. 1. Find out whether the branch is taken or not taken earlier in the pipeline. 2. Compute the taken PC (i.e., the address of the branch target) earlier.
Clock cycle            1    2      3      4    5    6    7    8    9    10
Branch instruction     IF   ID     EX     MEM  WB
Branch successor            IF     stall  stall IF  ID   EX   MEM  WB
Branch successor + 1                                 IF   ID   EX   MEM  WB
Branch successor + 2                                      IF   ID   EX   MEM
Branch successor + 3                                           IF   ID   EX
Branch successor + 4                                                IF   ID
Branch successor + 5                                                     IF
85 Multiple Pipelines
86 Superscalar [Figure: instructions vs. clock cycles for a superscalar of degree 3]
87 Superpipelined [Figure: instructions vs. clock cycles for a superpipelined processor of degree 3]
88 Multicycle Operations
89 The MIPS pipeline with three additional unpipelined, floating-point, functional units.
91 Latency: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result. The initiation or repeat interval: the number of cycles that must elapse between issuing two operations of a given type. The pipe stages are shown in the order in which they are used for any operation. The notation S+A indicates a clock cycle in which both the S and A stages are used. The notation D^28 indicates that the D stage is used 28 times in a row.
92 An FP multiply issued at clock 0 is followed by a single FP add issued between clocks 1 and 7. The second column indicates whether an instruction of the specified type stalls when it is issued n cycles later, where n is the clock cycle number in which the U stage of the second instruction occurs. The stage or stages that cause a stall are in bold. Note that this table deals with only the interaction between the multiply and one add issued between clocks 1 and 7. In this case, the add will stall if it is issued four or five cycles after the multiply; otherwise, it issues without stalling. Notice that the add will be stalled for two cycles if it issues in cycle 4 because on the next clock cycle it will still conflict with the multiply; if, however, the add issues in cycle 5, it will stall for only 1 clock cycle, because that will eliminate the conflicts.
93 A multiply issuing after an add can always proceed without stalling, because the shorter instruction clears the shared pipeline stages before the longer instruction reaches them.
94 Support Multiple FP Operations [Figure: IF and ID feed an integer unit (EX), a pipelined FP multiplier, a four-stage pipelined FP adder, and a non-pipelined FP divider, which rejoin at MEM and WB] Complicates bypass. FP divider (non-pipelined): potential structural hazard. Multiple (FP) instructions can complete at the same time: RF might need to be multi-ported; ordering issue, who gets to update the register? Out-of-order completion/retirement: precise exception issue. Hsien-Hsin S. Lee, Georgia Institute of Technology
95 Bypass/Forwarding (S = stall; each row starts one clock cycle after the row above)
L.D   F4,0(R2)    IF ID EX MEM WB
MUL.D F0,F4,F6    IF ID S M1 M2 M3 M4 M5 M6 M7 MEM WB
ADD.D F2,F0,F8    IF S ID S S S S S S A1 A2 A3 A4 MEM WB
S.D   F2,0(R2)    IF S S S S S S ID EX S S S MEM WB
Hsien-Hsin S. Lee, Georgia Institute of Technology
96 The pipeline timing of a set of independent FP operations. A typical FP code sequence showing the stalls arising from RAW hazards.
97 Three instructions want to perform a write back to the FP register file simultaneously, as shown in clock cycle 11.
More informationPredict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch
branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the
More informationPipelining: Basic and Intermediate Concepts
Appendix A Pipelining: Basic and Intermediate Concepts 1 Overview Basics of fpipelining i Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 2 Unpipelined
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationLecture 2: Processor and Pipelining 1
The Simple BIG Picture! Chapter 3 Additional Slides The Processor and Pipelining CENG 6332 2 Datapath vs Control Datapath signals Control Points Controller Datapath: Storage, FU, interconnect sufficient
More informationMIPS An ISA for Pipelining
Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch
More information第三章 Instruction-Level Parallelism and Its Dynamic Exploitation. 陈文智 浙江大学计算机学院 2014 年 10 月
第三章 Instruction-Level Parallelism and Its Dynamic Exploitation 陈文智 chenwz@zju.edu.cn 浙江大学计算机学院 2014 年 10 月 1 3.3 The Major Hurdle of Pipelining Pipeline Hazards 本科回顾 ------- Appendix A.2 3.3.1 Taxonomy
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,
More informationECE473 Computer Architecture and Organization. Pipeline: Control Hazard
Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction
More informationImproving Performance: Pipelining
Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute
More informationWhat is Pipelining? RISC remainder (our assumptions)
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationWhat is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationUnpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory
Pipelining the Idea Similar to assembly line in a factory Divide instruction into smaller tasks Each task is performed on subset of resources Overlap the execution of multiple instructions by completing
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy
More informationImprove performance by increasing instruction throughput
Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access
More informationReview: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction
Review: Evaluating Branch Alternatives Lecture 3: Introduction to Advanced Pipelining Two part solution: Determine branch taken or not sooner, AND Compute taken branch address earlier Pipeline speedup
More informationLecture 6: Pipelining
Lecture 6: Pipelining i CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and other
More informationLecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation
Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationThe Big Picture Problem Focus S re r g X r eg A d, M lt2 Sub u, Shi h ft Mac2 M l u t l 1 Mac1 Mac Performance Focus Gate Source Drain BOX
Appendix A - Pipelining 1 The Big Picture SPEC f2() { f3(s2, &j, &i); *s2->p = 10; i = *s2->q + i; } Requirements Algorithms Prog. Lang./OS ISA f1 f2 f5 f3 s q p fp j f3 f4 i1: ld r1, b i2: ld r2,
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationPipelining. Maurizio Palesi
* Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12
Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online
More informationPipelining: Basic Concepts
Pipelining: Basic Concepts Prof. Cristina Silvano Dipartimento di Elettronica e Informazione Politecnico di ilano email: silvano@elet.polimi.it Outline Reduced Instruction Set of IPS Processor Implementation
More informationCMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions
CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study Complications With Long Instructions So far, all MIPS instructions take 5 cycles But haven't talked
More informationProcessor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed
Lecture 3: General Purpose Processor Design CSCE 665 Advanced VLSI Systems Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, tet etc included in slides are borrowed from various books, websites,
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationLecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S
Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching
More informationChapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts
CS359: Computer Architecture Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts Yanyan Shen Department of Computer Science and Engineering Shanghai Jiao Tong University Parallel
More informationPipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...
CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2010 Lecture 2: Review of Metrics and Pipelining 563 L02.1 Fall 2010 Review from Last Time Computer Architecture >> instruction sets Computer Architecture skill
More informationCSE 502 Graduate Computer Architecture. Lec 4-6 Performance + Instruction Pipelining Review
CSE 502 Graduate Computer Architecture Lec 4-6 Performance + Instruction Pipelining Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.
Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write
More informationCS422 Computer Architecture
CS422 Computer Architecture Spring 2004 Lecture 07, 08 Jan 2004 Bhaskaran Raman Department of CSE IIT Kanpur http://web.cse.iitk.ac.in/~cs422/index.html Recall: Data Hazards Have to be detected dynamically,
More informationPipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!
Advanced Topics on Heterogeneous System Architectures Pipelining! Politecnico di Milano! Seminar Room @ DEIB! 30 November, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Outline!
More informationExecution/Effective address
Pipelined RC 69 Pipelined RC Instruction Fetch IR mem[pc] NPC PC+4 Instruction Decode/Operands fetch A Regs[rs]; B regs[rt]; Imm sign extended immediate field Execution/Effective address Memory Ref ALUOutput
More informationCSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review
CSE 502 Graduate Computer Architecture Lec 3-5 Performance + Instruction Pipelining Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from
More informationDLX Unpipelined Implementation
LECTURE - 06 DLX Unpipelined Implementation Five cycles: IF, ID, EX, MEM, WB Branch and store instructions: 4 cycles only What is the CPI? F branch 0.12, F store 0.05 CPI0.1740.83550.174.83 Further reduction
More informationCOSC4201. Prof. Mokhtar Aboelaze York University
COSC4201 Chapter 3 Multi Cycle Operations Prof. Mokhtar Aboelaze York University Based on Slides by Prof. L. Bhuyan (UCR) Prof. M. Shaaban (RTI) 1 Multicycle Operations More than one function unit, each
More informationBasic Pipelining Concepts
Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution
More informationUpdated Exercises by Diana Franklin
C-82 Appendix C Pipelining: Basic and Intermediate Concepts Updated Exercises by Diana Franklin C.1 [15/15/15/15/25/10/15] Use the following code fragment: Loop: LD R1,0(R2) ;load R1 from address
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationChapter Six. Dataı access. Reg. Instructionı. fetch. Dataı. Reg. access. Dataı. Reg. access. Dataı. Instructionı fetch. 2 ns 2 ns 2 ns 2 ns 2 ns
Chapter Si Pipelining Improve perfomance by increasing instruction throughput eecutionı Time lw $, ($) 2 6 8 2 6 8 access lw $2, 2($) 8 ns access lw $3, 3($) eecutionı Time lw $, ($) lw $2, 2($) 2 ns 8
More informationCS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST
CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C
More informationInstruction Level Parallelism
Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches
More informationCSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review
CSE 502 Graduate Computer Architecture Lec 3-5 Performance + Instruction Pipelining Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from
More informationLecture 4: Introduction to Advanced Pipelining
Lecture 4: Introduction to Advanced Pipelining Prepared by: Professor David A. Patterson Computer Science 252, Fall 1996 Edited and presented by : Prof. Kurt Keutzer Computer Science 252, Spring 2000 KK
More informationECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html
More informationAdvanced issues in pipelining
Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one
More informationProcessor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Gaduate Compute Achitectue Lectue 6 - Hazads Michela Taufe http://www.cis.udel.edu/~taufe/teaching/cis662f07 Powepoint Lectue Notes fom John Hennessy and David Patteson s: Compute Achitectue,
More informationPipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations
Principles of pipelining Pipelining Simple pipelining Structural Hazards Data Hazards Control Hazards Interrupts Multicycle operations Pipeline clocking ECE D52 Lecture Notes: Chapter 3 1 Sequential Execution
More informationAppendix C. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,
Appendix C Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Pipelining Multiple instructions are overlapped in execution Each is in a different stage Each stage is called
More informationPage 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight
Pipelining: Its Natural! Chapter 3 Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes A B C D Dryer takes 40 minutes Folder
More informationC.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts
C-2 Appendix C Pipelining: Basic and Intermediate Concepts C.1 Introduction Many readers of this text will have covered the basics of pipelining in another text (such as our more basic text Computer Organization
More informationCS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not
More informationECE154A Introduction to Computer Architecture. Homework 4 solution
ECE154A Introduction to Computer Architecture Homework 4 solution 4.16.1 According to Figure 4.65 on the textbook, each register located between two pipeline stages keeps data shown below. Register IF/ID
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationProcessor Architecture
Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)
More informationThe Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.
The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions
More informationPage 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson
More informationPipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &
More information14:332:331 Pipelined Datapath
14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationAppendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1
Appendix C Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure C.2 The pipeline can be thought of as a series of data paths shifted in time. This shows
More informationCS 152 Computer Architecture and Engineering Lecture 4 Pipelining
CS 152 Computer rchitecture and Engineering Lecture 4 Pipelining 2014-1-30 John Lazzaro (not a prof - John is always OK) T: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 otorola 68000 Next week
More informationComplications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?
Complications with long instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study So far, all MIPS instructions take 5 cycles But haven't talked
More information09-1 Multicycle Pipeline Operations
09-1 Multicycle Pipeline Operations 09-1 Material may be added to this set. Material Covered Section 3.7. Long-Latency Operations (Topics) Typical long-latency instructions: floating point Pipelined v.
More information