COSC 6385 Computer Architecture - Instruction Level Parallelism (II)


Edgar Gabriel, Spring 2016

Data fields for reservation stations
- Op: operation to perform on source operands S1 and S2
- Qj, Qk: reservation stations producing the operands
- Vj, Vk: value of each operand
- A: holds information for the memory address calculation (immediate field, effective address)
- Busy: indicates that the reservation station (and its functional unit) is occupied
- Qi (register status): number of the reservation station that will produce the data to be stored in this register
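As a minimal sketch, the fields above map onto two small records. The Python names (`ReservationStation`, `RegisterStatus`, `NO_TAG`) are illustrative assumptions; following the slides, a tag of 0 means "no pending producer", so stations are assumed to be numbered from 1.

```python
from dataclasses import dataclass

NO_TAG = 0  # Qj/Qk/Qi == 0 means "no pending producer"


@dataclass
class ReservationStation:
    op: str = ""           # operation to perform on S1 and S2
    qj: int = NO_TAG       # station producing S1 (NO_TAG: value is in vj)
    qk: int = NO_TAG       # station producing S2 (NO_TAG: value is in vk)
    vj: float = 0.0        # value of S1 once available
    vk: float = 0.0        # value of S2 once available
    a: int = 0             # immediate field / effective address
    busy: bool = False     # station is occupied

    def ready(self) -> bool:
        """True when both operands have arrived and execution may start."""
        return self.busy and self.qj == NO_TAG and self.qk == NO_TAG


@dataclass
class RegisterStatus:
    qi: int = NO_TAG       # station that will write this register, if any


waiting = ReservationStation(op="ADD.D", qj=2, vk=1.5, busy=True)
print(waiting.ready())     # False: S1 still comes from station 2
```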

Detailed steps
Let's look at the details for an operation OP rd, rs, rt (e.g. ADD.D F6, F2, F0):
- Assume that the operation has been assigned to reservation station r
- RS[r] is the data structure holding all the fields for reservation station r, as described in the last lecture
- RegisterStat[rs] is the data structure holding the status of register rs (e.g. whether a reservation station will write the register)
- Regs[rs] is register rs in the register file

Detailed steps (II)
Instruction state: Issue (FP operation)
Wait until: station r is empty
Action / bookkeeping:
    if (RegisterStat[rs].Qi != 0) {
        RS[r].Qj = RegisterStat[rs].Qi;
    } else {
        RS[r].Qj = 0;
        RS[r].Vj = Regs[rs];
    }
    if (RegisterStat[rt].Qi != 0) {
        RS[r].Qk = RegisterStat[rt].Qi;
    } else {
        RS[r].Qk = 0;
        RS[r].Vk = Regs[rt];
    }
    RS[r].Busy = yes;
    RegisterStat[rd].Qi = r;
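The Issue bookkeeping above can be sketched in Python. This is a minimal illustration under assumed conventions, not the lecture's code: registers are dictionary keys, `reg_stat` maps each register to its Qi tag, stations live in a list indexed from 1 (index 0 unused), and `issue_fp` is a hypothetical helper name. Because the operands are read before RegisterStat[rd] is updated, an instruction such as ADD.D F6, F6, F0 correctly waits on the previous producer of F6.

```python
from dataclasses import dataclass


@dataclass
class RS:
    op: str = ""
    qj: int = 0        # producer tag for the first operand (0 = none)
    qk: int = 0        # producer tag for the second operand (0 = none)
    vj: float = 0.0
    vk: float = 0.0
    busy: bool = False


def issue_fp(stations, reg_stat, regs, r, op, rd, rs, rt):
    """Issue 'op rd, rs, rt' into reservation station r (assumed free)."""
    st = stations[r]
    if reg_stat[rs] != 0:          # rs is still being produced
        st.qj = reg_stat[rs]
    else:                          # rs is available in the register file
        st.qj, st.vj = 0, regs[rs]
    if reg_stat[rt] != 0:
        st.qk = reg_stat[rt]
    else:
        st.qk, st.vk = 0, regs[rt]
    st.op, st.busy = op, True
    reg_stat[rd] = r               # rd will now be written by station r


# ADD.D F6, F2, F0 while F2 is still being produced by station 2.
stations = [None] + [RS() for _ in range(3)]     # stations 1..3
regs = {"F0": 2.0, "F2": 3.0, "F6": 0.0}
reg_stat = {"F0": 0, "F2": 2, "F6": 0}
issue_fp(stations, reg_stat, regs, 1, "ADD.D", "F6", "F2", "F0")
print(stations[1], reg_stat["F6"])
```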

Detailed steps (III)
Instruction state: Execute (FP operation)
Wait until: RS[r].Qj == 0 && RS[r].Qk == 0
Action / bookkeeping: compute the result using Vj and Vk

Instruction state: Write result (FP operation)
Wait until: execution complete and CDB available
Action / bookkeeping:
    for all registers x:
        if (RegisterStat[x].Qi == r) { Regs[x] = result; RegisterStat[x].Qi = 0; }
    for all reservation stations x:
        if (RS[x].Qj == r) { RS[x].Vj = result; RS[x].Qj = 0; }
        if (RS[x].Qk == r) { RS[x].Vk = result; RS[x].Qk = 0; }
    RS[r].Busy = no;

Detailed steps (IV)
For a LOAD operation, e.g. LD rt, imm(rs):
Instruction state: Issue (Load)
Wait until: buffer r is empty
Action / bookkeeping:
    if (RegisterStat[rs].Qi != 0) {
        RS[r].Qj = RegisterStat[rs].Qi;
    } else {
        RS[r].Qj = 0;
        RS[r].Vj = Regs[rs];
    }
    RS[r].A = imm;
    RS[r].Busy = yes;
    RegisterStat[rt].Qi = r;
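A minimal Python sketch of the Write result step (loads reuse the same bookkeeping), under the same assumed conventions: dictionary-based registers, stations indexed from 1, and a hypothetical `write_result` helper. The two loops are the "for all x" broadcast over the CDB.

```python
from dataclasses import dataclass


@dataclass
class RS:
    qj: int = 0
    qk: int = 0
    vj: float = 0.0
    vk: float = 0.0
    busy: bool = False


def write_result(stations, reg_stat, regs, r, result):
    """Broadcast station r's result on the CDB ("for all x" on the slides)."""
    for x in reg_stat:                 # every register waiting for station r
        if reg_stat[x] == r:
            regs[x] = result
            reg_stat[x] = 0
    for st in stations[1:]:            # every station waiting for station r
        if st.qj == r:
            st.vj, st.qj = result, 0
        if st.qk == r:
            st.vk, st.qk = result, 0
    stations[r].busy = False           # station r is free again


stations = [None, RS(busy=True), RS(qj=1, vk=4.0, busy=True)]
reg_stat = {"F6": 1, "F8": 2}
regs = {"F6": 0.0, "F8": 0.0}
write_result(stations, reg_stat, regs, 1, 7.5)
print(regs, reg_stat, stations[2])
```

Because a register is only updated if its Qi still names station r, a register that a younger instruction has already claimed is not overwritten by the older result.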

Detailed steps (V)
Instruction state: Execute (Load, step 1)
Wait until: RS[r].Qj == 0 && r is head of load queue
Action / bookkeeping: RS[r].A = RS[r].Vj + RS[r].A;

Instruction state: Execute (Load, step 2)
Wait until: load step 1 complete
Action / bookkeeping: read from Mem[RS[r].A]

Instruction state: Write result (Load)
Wait until: execution complete and CDB available
Action / bookkeeping:
    for all registers x:
        if (RegisterStat[x].Qi == r) { Regs[x] = result; RegisterStat[x].Qi = 0; }
    for all reservation stations x:
        if (RS[x].Qj == r) { RS[x].Vj = result; RS[x].Qj = 0; }
        if (RS[x].Qk == r) { RS[x].Vk = result; RS[x].Qk = 0; }
    RS[r].Busy = no;

Hardware based speculation
- Branch prediction reduces the stalls caused directly by branches
- With dynamic branch prediction alone, instructions can be issued past a branch, but cannot be executed until the branch outcome is known
- Speculative execution extends the concept of dynamic scheduling:
  - it speculates on the outcome of the branch
  - it executes the instructions following the branch
  - it requires the ability to undo instructions in case the prediction was wrong

Hardware based speculation (II)
Extending Tomasulo's algorithm to support speculation:
- Separate the step of bypassing results among instructions from the completion of the instruction
- Add another step: Issue, Execute, Write result, Commit
- Instructions execute out of order but commit in order
- An additional set of hardware buffers holds the results of instructions that have not yet committed: the reorder buffer (ROB)

Reorder buffers
- Hold the results of instructions between the time an instruction finishes and the time it commits
- Act like additional reservation stations: the ROB can be the source of operands for other instructions
- Each ROB entry contains four fields:
  - Instruction type: branch, store or ALU operation
  - Destination: register number or memory address where the result should be written
  - Value: the result value of the instruction
  - Ready: whether the instruction has completed execution
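The ROB behaves like a circular queue: Issue allocates the entry at the tail, Write result fills in Value and sets Ready, and Commit retires only the head. A minimal Python sketch with assumed names (`ROBEntry`, `ReorderBuffer`); entry numbers 1..size play the role of the #1, #2, ... tags used in the example, and every entry that is not a branch or store is assumed to write a register.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ROBEntry:
    kind: str = ""                 # instruction type: "branch", "store" or "alu"
    dest: str = ""                 # destination register (or memory address)
    value: Optional[float] = None  # result, filled in by Write result
    ready: bool = False            # instruction has completed execution
    busy: bool = False             # entry is in use


class ReorderBuffer:
    """Circular buffer; entry numbers 1..size act as tags (#1, #2, ...)."""

    def __init__(self, size: int) -> None:
        self.entries: List[ROBEntry] = [ROBEntry() for _ in range(size)]
        self.head = 0   # index of the oldest in-flight instruction
        self.tail = 0   # index of the next free entry
        self.count = 0

    def allocate(self, kind: str, dest: str) -> Optional[int]:
        """Issue: claim the tail entry. None means the ROB is full (stall)."""
        if self.count == len(self.entries):
            return None
        self.entries[self.tail] = ROBEntry(kind, dest, busy=True)
        tag = self.tail + 1
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return tag

    def write_result(self, tag: int, value: float) -> None:
        """Write result: store the value in the ROB, not in the register file."""
        entry = self.entries[tag - 1]
        entry.value, entry.ready = value, True

    def commit(self, regs: dict) -> Optional[int]:
        """Commit: retire the head entry, in program order, once it is ready."""
        if self.count == 0 or not self.entries[self.head].ready:
            return None
        entry = self.entries[self.head]
        if entry.kind not in ("branch", "store"):
            regs[entry.dest] = entry.value   # register file updated only now
        entry.busy = False
        tag = self.head + 1
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return tag


rob = ReorderBuffer(4)
t1 = rob.allocate("alu", "F6")    # older instruction
t2 = rob.allocate("alu", "F2")    # younger instruction
rob.write_result(t2, 2.0)         # the younger one finishes first...
regs: dict = {}
print(rob.commit(regs))           # None: the head (#1) is not ready yet
rob.write_result(t1, 6.0)
print(rob.commit(regs), rob.commit(regs), regs)
```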

Four steps of execution (I)
- Issue: get an instruction from the instruction queue; issue it if a reservation station is empty and a ROB entry is available
- Execute: if the operands are available, execute. New: for a store, this step only calculates the effective address
- Write result: write the result onto the CDB; any waiting reservation stations and the ROB pick it up. The register file is not modified at this point

Four steps of execution (II)
- Commit:
  - Normal case (prediction was correct): the instruction reaches the head of the ROB; update the register file and remove the entry from the ROB
  - Store operation: the instruction reaches the head of the ROB; update the memory location
  - Incorrect prediction: when a branch instruction reaches the head of the ROB and the hardware indicates that the prediction was wrong, the ROB is flushed and execution is restarted at the correct successor of the branch
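The three Commit cases can be sketched as one function. This is a minimal Python illustration with assumed names (`Entry`, `commit`); the ROB is modeled as a list whose first element is the head, and restarting fetch at the correct branch successor is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    kind: str                   # "alu", "store" or "branch"
    dest: object = None         # register name, or memory address for a store
    value: Optional[float] = None
    ready: bool = False         # finished executing (result is in value)
    mispredicted: bool = False  # set for a branch whose prediction was wrong


def commit(rob: List[Entry], regs: dict, mem: dict) -> str:
    """One commit attempt at the head of the ROB (rob[0] is the oldest)."""
    if not rob or not rob[0].ready:
        return "stall"                 # head not finished: nothing commits
    head = rob.pop(0)
    if head.kind == "branch":
        if head.mispredicted:
            rob.clear()                # squash all younger, speculative entries
            return "flush"
        return "branch"
    if head.kind == "store":
        mem[head.dest] = head.value    # memory is written only at commit
        return "store"
    regs[head.dest] = head.value       # register file is written only at commit
    return "register"


regs, mem = {}, {}
rob = [
    Entry("alu", "F4", 1.0, ready=True),
    Entry("store", 100, 2.5, ready=True),
    Entry("branch", ready=True, mispredicted=True),
    Entry("alu", "F8", 9.0, ready=True),   # speculative: on the wrong path
]
print([commit(rob, regs, mem) for _ in range(4)])
```

The wrong-path result for F8 is computed but never reaches the register file, which is exactly what makes speculation undoable.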

The same example as for scoreboarding
    L.D   F6, 34(R2)
    L.D   F2, 45(R3)
    MUL.D F0, F2, F4
    SUB.D F8, F6, F2
    DIV.D F10, F0, F6
    ADD.D F6, F8, F2
The following slides are based on a lecture by Jelena Mirkovic, University of Delaware.
Assumptions:
- ADD and SUB take 2 clock cycles
- MULT takes 10 clock cycles
- DIV takes 40 clock cycles
- 2 Load/Store, 3 ADD and 2 Mult reservation stations

Time=1: issue first load
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Load1 | yes | Load | Regs[R2] | - | - | - | #1 | 34
Register result status (reorder #): F6 = #1

Time=1: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | yes | L.D F6, 34(R2) | Issue | F6 | -

Time=2: first load executes, second load issues
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Load1 | yes | Load | Regs[R2] | - | - | - | #1 | +34
Load2 | yes | Load | Regs[R3] | - | - | - | #2 | 45
Register result status (reorder #): F2 = #2, F6 = #1

Time=2: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | yes | L.D F6, 34(R2) | Execute | F6 | -
2 | yes | L.D F2, 45(R3) | Issue | F2 | -

Time=3: first load executes, second load executes, MUL issued
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Load1 | yes | Load | - | - | - | - | #1 | Regs[R2]+34
Load2 | yes | Load | Regs[R3] | - | - | - | #2 | +45
Mult1 | yes | Mult | - | Regs[F4] | #2 | - | #3 | -
Register result status (reorder #): F0 = #3, F2 = #2, F6 = #1

Time=3: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | yes | L.D F6, 34(R2) | Execute | F6 | -
2 | yes | L.D F2, 45(R3) | Execute | F2 | -
3 | yes | MUL.D F0, F2, F4 | Issue | F0 | -

Time=4: first load writes result, second load executes, MUL stalled, SUB issued
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Load2 | yes | Load | - | - | - | - | #2 | Regs[R3]+45
Add1 | yes | Sub | Mem[34+Regs[R2]] | - | - | #2 | #4 | -
Mult1 | yes | Mult | - | Regs[F4] | #2 | - | #3 | -
Register result status (reorder #): F0 = #3, F2 = #2, F6 = #1, F8 = #4

Time=4: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | yes | L.D F6, 34(R2) | Write result | F6 | Mem[34+Regs[R2]]
2 | yes | L.D F2, 45(R3) | Execute | F2 | -
3 | yes | MUL.D F0, F2, F4 | Stalled in issue | F0 | -
4 | yes | SUB.D F8, F6, F2 | Issue | F8 | -
(entries 5 and 6 empty)

Time=5: first load commits, second load writes result, MUL and SUB stalled, DIV issued
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Add1 | yes | Sub | Mem[34+Regs[R2]] | Mem[45+Regs[R3]] | - | - | #4 | -
Mult1 | yes | Mult | Mem[45+Regs[R3]] | Regs[F4] | - | - | #3 | -
Mult2 | yes | Div | - | Mem[34+Regs[R2]] | #3 | - | #5 | -
Register result status (reorder #): F0 = #3, F2 = #2, F8 = #4, F10 = #5

Time=5: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | yes | L.D F2, 45(R3) | Write result | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Stalled in issue | F0 | -
4 | yes | SUB.D F8, F6, F2 | Stalled in issue | F8 | -
5 | yes | DIV.D F10, F0, F6 | Issue | F10 | -
(entry 6 empty)

Time=6: second load commits, MUL executes (1/10), SUB executes (1/2), DIV stalled, ADD issued
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Add1 | yes | Sub | Mem[34+Regs[R2]] | Mem[45+Regs[R3]] | - | - | #4 | -
Add2 | yes | Add | - | Mem[45+Regs[R3]] | #4 | - | #6 | -
Mult1 | yes | Mult | Mem[45+Regs[R3]] | Regs[F4] | - | - | #3 | -
Mult2 | yes | Div | - | Mem[34+Regs[R2]] | #3 | - | #5 | -
Register result status (reorder #): F0 = #3, F6 = #6, F8 = #4, F10 = #5

Time=6: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 | -
4 | yes | SUB.D F8, F6, F2 | Execute | F8 | -
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 | -
6 | yes | ADD.D F6, F8, F2 | Issue | F6 | -

Time=7: MUL executes (2/10), SUB executes (2/2), DIV stalled, ADD stalled
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Add1 | yes | Sub | Mem[34+Regs[R2]] | Mem[45+Regs[R3]] | - | - | #4 | -
Add2 | yes | Add | - | Mem[45+Regs[R3]] | #4 | - | #6 | -
Mult1 | yes | Mult | Mem[45+Regs[R3]] | Regs[F4] | - | - | #3 | -
Mult2 | yes | Div | - | Mem[34+Regs[R2]] | #3 | - | #5 | -
Register result status (reorder #): F0 = #3, F6 = #6, F8 = #4, F10 = #5

Time=7: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 | -
4 | yes | SUB.D F8, F6, F2 | Execute | F8 | -
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 | -
6 | yes | ADD.D F6, F8, F2 | Stalled in issue | F6 | -

Time=8: MUL executes (3/10), SUB writes result, DIV stalled, ADD stalled
Reservation stations (all others empty; X denotes the result of SUB.D):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Add2 | yes | Add | X | Mem[45+Regs[R3]] | - | - | #6 | -
Mult1 | yes | Mult | Mem[45+Regs[R3]] | Regs[F4] | - | - | #3 | -
Mult2 | yes | Div | - | Mem[34+Regs[R2]] | #3 | - | #5 | -
Register result status (reorder #): F0 = #3, F6 = #6, F8 = #4, F10 = #5

Time=8: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 | -
4 | yes | SUB.D F8, F6, F2 | Write result | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 | -
6 | yes | ADD.D F6, F8, F2 | Stalled in issue | F6 | -

Time=9: MUL executes (4/10), DIV stalled, ADD executes (1/2)
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Add2 | yes | Add | X | Mem[45+Regs[R3]] | - | - | #6 | -
Mult1 | yes | Mult | Mem[45+Regs[R3]] | Regs[F4] | - | - | #3 | -
Mult2 | yes | Div | - | Mem[34+Regs[R2]] | #3 | - | #5 | -
Register result status (reorder #): F0 = #3, F6 = #6, F8 = #4, F10 = #5

Time=9: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 | -
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 | -
6 | yes | ADD.D F6, F8, F2 | Execute | F6 | -

Time=11: MUL executes (6/10), DIV stalled, ADD writes result
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Mult1 | yes | Mult | Mem[45+Regs[R3]] | Regs[F4] | - | - | #3 | -
Mult2 | yes | Div | - | Mem[34+Regs[R2]] | #3 | - | #5 | -
Register result status (reorder #): F0 = #3, F6 = #6, F8 = #4, F10 = #5

Time=11: reorder buffer (Y denotes the result of ADD.D)
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 | -
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 | -
6 | yes | ADD.D F6, F8, F2 | Write result | F6 | Y

Time=12: MUL executes (7/10), DIV stalled
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Mult1 | yes | Mult | Mem[45+Regs[R3]] | Regs[F4] | - | - | #3 | -
Mult2 | yes | Div | - | Mem[34+Regs[R2]] | #3 | - | #5 | -
Register result status (reorder #): F0 = #3, F6 = #6, F8 = #4, F10 = #5

Time=12: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 | -
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 | -
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y

Time=16: MUL writes result, DIV stalled
Reservation stations (all others empty; Z denotes the result of MUL.D):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Mult2 | yes | Div | Z | Mem[34+Regs[R2]] | - | - | #5 | -
Register result status (reorder #): F0 = #3, F6 = #6, F8 = #4, F10 = #5

Time=16: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Write result | F0 | Z
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 | -
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y

Time=17: MUL commits, DIV executes (1/40)
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Mult2 | yes | Div | Z | Mem[34+Regs[R2]] | - | - | #5 | -
Register result status (reorder #): F6 = #6, F8 = #4, F10 = #5

Time=17: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | no | MUL.D F0, F2, F4 | Commit | F0 | Z
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Execute | F10 | -
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y

Time=18: SUB commits, DIV executes (2/40)
Reservation stations (all others empty):
Name | Busy | Op | Vj | Vk | Qj | Qk | Dest | A
Mult2 | yes | Div | Z | Mem[34+Regs[R2]] | - | - | #5 | -
Register result status (reorder #): F6 = #6, F10 = #5

Time=18: reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | no | MUL.D F0, F2, F4 | Commit | F0 | Z
4 | no | SUB.D F8, F6, F2 | Commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Execute | F10 | -
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y

Time=57: DIV writes result
Time=58: DIV commits
Time=59: ADD commits, and so on
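The cycle numbers in this walkthrough can be reproduced with a small timing model. This Python sketch is an illustration under stated assumptions, not the lecture's method: one instruction issues per cycle in order; loads take 2 execute cycles (address calculation, then memory read), as the walkthrough shows; execution starts the cycle after issue or after the last operand is written, whichever is later; the result is written the cycle after execution ends; and commit happens in order, at most one per cycle, no earlier than the cycle after Write result. Reservation-station and CDB conflicts are not modeled (none occur in this example).

```python
# Execute latencies in cycles from the example's assumptions; the 2-cycle
# load (address calculation + memory read) is read off the walkthrough.
LATENCY = {"L.D": 2, "ADD.D": 2, "SUB.D": 2, "MUL.D": 10, "DIV.D": 40}

# (opcode, destination, FP source registers); integer base registers are
# assumed to be ready, so the loads list no sources.
PROGRAM = [
    ("L.D", "F6", []),
    ("L.D", "F2", []),
    ("MUL.D", "F0", ["F2", "F4"]),
    ("SUB.D", "F8", ["F6", "F2"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6", ["F8", "F2"]),
]


def timeline(program):
    """Return issue/execute/write/commit cycles for each instruction."""
    producer = {}      # register -> index of the most recent writer so far
    rows = []
    last_commit = 0
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1                      # in order, one issue per cycle
        start = issue + 1
        for reg in srcs:                   # wait for operands on the CDB
            if reg in producer:
                start = max(start, rows[producer[reg]]["write"] + 1)
        end = start + LATENCY[op] - 1
        write = end + 1
        commit = max(write + 1, last_commit + 1)   # in-order commit
        last_commit = commit
        rows.append({"op": op, "issue": issue, "exec": (start, end),
                     "write": write, "commit": commit})
        producer[dest] = i                 # later readers wait on this one
    return rows


for row in timeline(PROGRAM):
    print(row)
```

Under these assumptions the model reproduces the key events above: MUL.D writes its result at cycle 16 and commits at 17, SUB.D commits at 18, DIV.D executes during cycles 17 to 56, writes at 57 and commits at 58, and ADD.D, although it writes its result at cycle 11, cannot commit until 59.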

Multiple issue
- Takes advantage of the fact that we have multiple functional units to decrease the ideal CPI further (below 1)
- Two flavors of multiple-issue processors: superscalar and VLIW (Very Long Instruction Word)

Common name | Issue structure | Hazard detection | Scheduling | Distinguishing characteristic | Examples
Superscalar (static) | Dynamic | Hardware | Static | In-order execution | Sun UltraSPARC II/III
Superscalar (dynamic) | Dynamic | Hardware | Dynamic | Limited out-of-order execution | IBM Power2
Superscalar (speculative) | Dynamic | Hardware | Dynamic with speculation | Out-of-order execution with speculation | Pentium III/4, IBM RS64III
VLIW | Static | Software | Static | No hazards between issue packets | i860, Trimedia
EPIC | Mostly static | Mostly software | Mostly static | Dependencies marked by compiler | Itanium

Superscalar architectures
- Issue a varying number of instructions per cycle
- Either statically scheduled using compiler techniques, or dynamically scheduled (e.g. using Tomasulo's algorithm)
- Why a varying number of instructions? A statically scheduled superscalar has no out-of-order execution, so it checks for hazards at issue time: the issue logic will not issue an instruction that would cause a hazard, so fewer instructions may issue in that cycle

Some details
- The issue unit receives between one and n instructions (the issue packet) from the instruction fetch unit, with n typically 4 or 8
- The issue unit examines each instruction in the issue packet in program order
- If an instruction would cause a structural hazard or a data hazard, it is not issued
- Since checking for structural and data hazards is complex, the issue stage is implemented as a pipeline, e.g.:
  - the first stage checks for hazards within the issue packet
  - the second stage checks for hazards with already issued instructions

Dynamically scheduled superscalar MIPS
- Extend Tomasulo's algorithm to handle multiple issues per cycle
- Instructions must be issued to the reservation stations in program order to maintain program semantics
- Note: we do not cover the details of how multiple issue works for Tomasulo's algorithm
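The in-order checking of an issue packet can be sketched as follows. This Python illustration is a simplification under assumed conventions: each instruction is a hypothetical (unit, destination, sources) tuple, only structural hazards and RAW data hazards are checked (WAW checks and the pipelining of the checks are omitted), and `issue_packet` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple

# (functional unit, destination register, source registers)
Instr = Tuple[str, str, Tuple[str, ...]]


def issue_packet(packet: List[Instr], pending: Set[str],
                 free_units: Dict[str, int]) -> int:
    """Return how many instructions of the packet issue this cycle.

    Instructions are examined in program order; the first one that would
    cause a structural hazard (no free unit of its type) or a data hazard
    (a source is still being produced) stops issue, and it and all later
    instructions of the packet wait for a later cycle.
    """
    units = dict(free_units)       # copy: this is a what-if check
    not_ready = set(pending)       # results not yet available
    issued = 0
    for unit, dest, srcs in packet:
        if units.get(unit, 0) == 0:
            break                  # structural hazard
        if any(s in not_ready for s in srcs):
            break                  # data (RAW) hazard
        units[unit] -= 1
        not_ready.add(dest)        # later packet members cannot use it yet
        issued += 1
    return issued


units = {"mem": 1, "fp": 1, "int": 1}
packet = [("mem", "F0", ("R1",)),          # L.D   F0,0(R1)
          ("fp", "F4", ("F0", "F2")),      # ADD.D F4,F0,F2 needs F0
          ("int", "R1", ("R1",))]          # DADDUI is independent...
print(issue_packet(packet, set(), units))  # ...but waits behind ADD.D: 1
```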

Static scheduling
- Issue a fixed number of instructions, either as one large instruction or as a fixed instruction packet
- Parallelism among instructions has to be explicitly indicated by the instructions
- Statically scheduled by the compiler
- Example: VLIW processors

Static Multiple Issue: VLIW
- Hardware checking for dependencies in issue packets may be expensive and complex
- The compiler can examine instructions, decide which ones can be scheduled in parallel, and group them into instruction packets (VLIW), so the hardware can be simplified
- The processor has multiple functional units, and each field of the VLIW is assigned to one unit
- For example, a VLIW could contain 5 fields: one has to contain an ALU instruction or branch, two have to contain FP instructions and two have to be memory references

Slide based on a lecture by Jelena Mirkovic, University of Delaware

Example
- Assume the VLIW contains 5 fields: an ALU instruction or branch, two FP instructions and two memory references
- Ignore the branch delay slot
    Loop: L.D    F0, 0(R1)
          stall              (wait for F0 value to propagate)
          ADD.D  F4, F0, F2
          stall              (wait for FP add to be completed)
          stall              (wait for FP add to be completed)
          S.D    F4, 0(R1)
          DADDUI R1, R1, #-8
          stall              (wait for R1 value to propagate)
          BNE    R1, R2, Loop
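The stalls in this loop follow from three producer-to-consumer latencies visible in the listing: one stall between a load and a dependent FP add, two stalls between an FP add and a dependent store, and one stall between an integer add and a dependent branch. A minimal Python sketch of in-order, single-issue timing under those assumptions (the `STALLS` table and the instruction classes are illustrative names) recovers the 9 cycles per iteration.

```python
# Stall cycles between a producer class and a dependent consumer class,
# as annotated in the loop listing.
STALLS = {("load", "fpalu"): 1, ("fpalu", "store"): 2, ("intalu", "branch"): 1}

# (instruction text, class, destination register, source registers)
LOOP = [
    ("L.D F0,0(R1)", "load", "F0", ["R1"]),
    ("ADD.D F4,F0,F2", "fpalu", "F4", ["F0", "F2"]),
    ("S.D F4,0(R1)", "store", None, ["F4", "R1"]),
    ("DADDUI R1,R1,#-8", "intalu", "R1", ["R1"]),
    ("BNE R1,R2,Loop", "branch", None, ["R1", "R2"]),
]


def schedule(loop):
    """In-order single issue: each instruction goes one cycle after the
    previous one, or later if it must wait for a producer in this iteration."""
    produced = {}          # register -> (issue cycle, class) of its producer
    cycle = 0
    out = []
    for text, cls, dest, srcs in loop:
        cycle += 1
        for reg in srcs:
            if reg in produced:
                p_cycle, p_cls = produced[reg]
                cycle = max(cycle, p_cycle + 1 + STALLS.get((p_cls, cls), 0))
        out.append((cycle, text))
        if dest is not None:
            produced[dest] = (cycle, cls)
    return out


for cycle, text in schedule(LOOP):
    print(cycle, text)
```

Only 5 of the 9 cycles carry an instruction, and each of those uses just one of the five VLIW fields.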

Example: unroll the loop seven times and rearrange the instructions into VLIW packets (fields per cycle: ALU/branch, FP, FP, mem, mem)




Cycle | ALU/branch | FP | FP | mem | mem
1 | - | - | - | L.D F0,0(R1) | L.D F6,-8(R1)
2 | - | - | - | L.D F10,-16(R1) | L.D F14,-24(R1)
3 | - | ADD.D F4,F0,F2 | ADD.D F8,F6,F2 | L.D F18,-32(R1) | L.D F22,-40(R1)
4 | - | ADD.D F12,F10,F2 | ADD.D F16,F14,F2 | L.D F26,-48(R1) | -
5 | - | ADD.D F20,F18,F2 | ADD.D F24,F22,F2 | - | -
6 | - | ADD.D F28,F26,F2 | - | S.D F4,0(R1) | S.D F8,-8(R1)
7 | DADDUI R1,R1,#-56 | - | - | S.D F12,-16(R1) | S.D F16,-24(R1)
8 | - | - | - | S.D F20,24(R1) | S.D F24,16(R1)
9 | BNE R1,R2,Loop | - | - | S.D F28,8(R1) | -

The stores in cycles 8 and 9 issue after DADDUI has decremented R1 by 56, so their offsets are adjusted accordingly (e.g. S.D F20 uses 24(R1) instead of -32(R1)).

Overall: 9 cycles for 7 iterations, i.e. about 1.29 cycles per iteration. But the VLIW packets were only about half full: 23 of the 45 slots are used.
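The 9-cycle packing of the seven-times unrolled loop can be checked mechanically. The Python sketch below encodes that schedule and verifies, under the latencies assumed in the rolled loop (a minimum gap of 2 cycles from L.D to a dependent ADD.D, 3 from ADD.D to a dependent S.D, 2 from DADDUI to BNE), that no dependence is violated and that every instruction sits in a field of the right type; it then computes slot utilization. The helper names are illustrative, and base-register (R1) dependences of loads and stores are not tracked, since the adjusted store offsets already account for them.

```python
# Seven-times unrolled loop, one bundle per cycle; slots are
# (ALU/branch, FP, FP, mem, mem) and None marks an empty slot.
SCHEDULE = [
    (None, None, None, "L.D F0,0(R1)", "L.D F6,-8(R1)"),
    (None, None, None, "L.D F10,-16(R1)", "L.D F14,-24(R1)"),
    (None, "ADD.D F4,F0,F2", "ADD.D F8,F6,F2", "L.D F18,-32(R1)", "L.D F22,-40(R1)"),
    (None, "ADD.D F12,F10,F2", "ADD.D F16,F14,F2", "L.D F26,-48(R1)", None),
    (None, "ADD.D F20,F18,F2", "ADD.D F24,F22,F2", None, None),
    (None, "ADD.D F28,F26,F2", None, "S.D F4,0(R1)", "S.D F8,-8(R1)"),
    ("DADDUI R1,R1,#-56", None, None, "S.D F12,-16(R1)", "S.D F16,-24(R1)"),
    (None, None, None, "S.D F20,24(R1)", "S.D F24,16(R1)"),
    ("BNE R1,R2,Loop", None, None, "S.D F28,8(R1)", None),
]

SLOTS = ("alu", "fp", "fp", "mem", "mem")
KIND = {"L.D": "mem", "S.D": "mem", "ADD.D": "fp", "DADDUI": "alu", "BNE": "alu"}
# Minimum producer-to-consumer distance in cycles (1 + stall cycles).
MIN_GAP = {("L.D", "ADD.D"): 2, ("ADD.D", "S.D"): 3, ("DADDUI", "BNE"): 2}


def operands(ins):
    """Split an instruction into (opcode, destination, source registers)."""
    opcode, rest = ins.split(" ", 1)
    fields = rest.split(",")
    if opcode == "L.D":
        return opcode, fields[0], []          # base register R1 not tracked
    if opcode in ("ADD.D", "DADDUI"):
        return opcode, fields[0], fields[1:]
    if opcode == "S.D":
        return opcode, None, fields[:1]       # the value being stored
    return opcode, None, fields[:2]           # BNE R1,R2,Loop


def violations(schedule):
    """Return (register, producer cycle, consumer cycle) for gaps too short."""
    written = {}                              # register -> (cycle, opcode)
    bad = []
    for cycle, bundle in enumerate(schedule, start=1):
        issued = [operands(ins) for ins in bundle if ins is not None]
        for opcode, _, srcs in issued:        # all reads in a bundle happen...
            for reg in srcs:
                if reg in written:
                    p_cycle, p_op = written[reg]
                    if cycle - p_cycle < MIN_GAP.get((p_op, opcode), 1):
                        bad.append((reg, p_cycle, cycle))
        for opcode, dest, _ in issued:        # ...before the bundle's writes
            if dest is not None:
                written[dest] = (cycle, opcode)
    return bad


def slot_errors(schedule):
    """Instructions placed in a field of the wrong type."""
    return [ins for bundle in schedule for slot, ins in zip(SLOTS, bundle)
            if ins is not None and KIND[ins.split(" ", 1)[0]] != slot]


ops = sum(ins is not None for bundle in SCHEDULE for ins in bundle)
slots = len(SCHEDULE) * len(SLOTS)
print(violations(SCHEDULE), slot_errors(SCHEDULE))   # [] []
print(ops, slots, round(ops / slots, 2))             # 23 45 0.51
print(round(len(SCHEDULE) / 7, 2))                   # 1.29
```

Moving any store one cycle earlier makes `violations` report it, which is why the schedule cannot be compressed below 9 cycles with these latencies without more unrolling.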
