DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD


1 DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD
Slides by: Pedro Tomás
Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011
ADVANCED COMPUTER ARCHITECTURES / ARQUITECTURAS AVANÇADAS DE COMPUTADORES (AAC)

2 Outline
Dynamic instruction scheduling: scoreboard
- Overview
- Data structures associated with the scoreboard

3 Review of static scheduling: compiler techniques to extract parallelism

// C Code
for (i = 99; i >= 0; i--)
    A[i] = A[i] + K;

// Assembly Code
Cont: L.D    F0,0(R2)
      ADD.D  F2,F0,F1
      S.D    0(R2),F2
      DSUBI  R2,R2,#8
      BNE    R2,R1,Cont

// Straightforward implementation (with pipeline stalls)
Cont: L.D    F0,0(R2)
      Stall
      ADD.D  F2,F0,F1
      Stall
      Stall
      S.D    0(R2),F2
      DSUBI  R2,R2,#8
      BNE    R2,R1,Cont
      Stall

9 cycles per iteration

4 Review of static scheduling: compiler techniques to extract parallelism

// C Code
for (i = 99; i >= 0; i--)
    A[i] = A[i] + K;

// Simple instruction scheduling: the S.D is moved after the branch,
// with its offset adjusted to compensate for the earlier DSUBI
Cont: L.D    F0,0(R2)
      Stall
      ADD.D  F2,F0,F1
      DSUBI  R2,R2,#8
      BNE    R2,R1,Cont
      S.D    8(R2),F2

6 cycles per iteration. Speedup = 9/6 = 1.5

5 Review of static scheduling: compiler techniques to extract parallelism

// C Code
for (i = 99; i >= 0; i--)
    A[i] = A[i] + K;

// Loop unrolling (4x) and instruction scheduling
Cont: L.D    F0,0(R2)
      L.D    F2,-8(R2)
      L.D    F3,-16(R2)
      L.D    F4,-24(R2)
      ADD.D  F0,F0,F1
      ADD.D  F2,F2,F1
      ADD.D  F3,F3,F1
      ADD.D  F4,F4,F1
      S.D    0(R2),F0
      S.D    -8(R2),F2
      S.D    -16(R2),F3
      DSUBI  R2,R2,#32
      BNE    R2,R1,Cont
      S.D    8(R2),F4

14 cycles for 4 iterations = 3.5 cycles per iteration. Speedup = 9/3.5 ≈ 2.57
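The same unrolling transformation can be sketched in plain Python. This is a hedged illustration only: the function name add_k_unrolled and the unroll factor of 4 are assumptions, and in Python the benefit is just fewer loop-overhead steps, not pipeline scheduling.

```python
# Hypothetical illustration of unrolling A[i] = A[i] + K by a factor of 4.
def add_k_unrolled(A, K):
    i = len(A) - 1
    # One loop-overhead update (index decrement + branch test) now covers
    # four element updates, mirroring the single DSUBI/BNE pair covering
    # four L.D/ADD.D/S.D groups in the unrolled assembly above.
    while i >= 3:
        A[i]     += K
        A[i - 1] += K
        A[i - 2] += K
        A[i - 3] += K
        i -= 4
    while i >= 0:          # clean-up loop for leftover iterations
        A[i] += K
        i -= 1
    return A
```

A real compiler would also need such a clean-up loop whenever the trip count is not a multiple of the unroll factor.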

6 Review of static scheduling: compiler techniques to extract parallelism

// C Code
for (i = 99; i >= 0; i--)
    A[i] = A[i] + K;

// Software pipelining: the steady-state loop stores one element,
// adds the next and loads the one after that
      L.D    F0,0(R2)      ; prologue
      ADD.D  F2,F0,F1
      L.D    F0,-8(R2)
Cont: S.D    0(R2),F2
      ADD.D  F2,F0,F1
      L.D    F0,-16(R2)
      DSUBI  R2,R2,#8
      BNE    R2,R1,Cont
      S.D    0(R2),F2      ; epilogue
      ADD.D  F2,F0,F1
      S.D    -8(R2),F2

5 cycles per iteration. Speedup = 9/5 = 1.8
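The software-pipelined schedule above can be rendered in Python as a sketch (add_k_swp is a hypothetical name, and the prologue/epilogue split is an assumption): in the steady-state loop, iteration i stores the result of element i+2, adds element i+1, and loads element i, so three loop bodies are in flight at once.

```python
# Hedged Python rendering of software pipelining for A[i] = A[i] + K,
# walking indices from n-1 down to 0 as in the assembly above.
def add_k_swp(A, K):
    n = len(A)
    if n < 3:                   # too short to pipeline: plain loop
        for i in range(n):
            A[i] += K
        return A
    # Prologue: fill the pipeline (mirrors the L.D / ADD.D / L.D before Cont:)
    acc = A[n - 1] + K          # ADD.D for element n-1
    cur = A[n - 2]              # L.D for element n-2
    i = n - 3
    while i >= 0:
        A[i + 2] = acc          # S.D  : store element i+2 (two steps ahead)
        acc = cur + K           # ADD.D: add element i+1   (one step ahead)
        cur = A[i]              # L.D  : load element i
        i -= 1
    # Epilogue: drain the pipeline (mirrors the S.D / ADD.D / S.D tail)
    A[1] = acc
    A[0] = cur + K
    return A
```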

7 Review of static scheduling: compiler techniques to extract parallelism

Applying the techniques in a VLIW system (5 issue slots, loop unrolled 12x):

#  | Memory 1        | Memory 2        | FP 1             | FP 2             | Integer 1
1  | L.D F0,0(R2)    | L.D F2,-8(R2)   |                  |                  |
2  | L.D F3,-16(R2)  | L.D F4,-24(R2)  |                  |                  |
3  | L.D F5,-32(R2)  | L.D F6,-40(R2)  | ADD.D F0,F0,F1   | ADD.D F2,F2,F1   |
4  | L.D F7,-48(R2)  | L.D F8,-56(R2)  | ADD.D F3,F3,F1   | ADD.D F4,F4,F1   |
5  | L.D F9,-64(R2)  | L.D F10,-72(R2) | ADD.D F5,F5,F1   | ADD.D F6,F6,F1   |
6  | L.D F11,-80(R2) | L.D F12,-88(R2) | ADD.D F7,F7,F1   | ADD.D F8,F8,F1   |
7  | S.D 0(R2),F0    | S.D -8(R2),F2   | ADD.D F9,F9,F1   | ADD.D F10,F10,F1 |
8  | S.D -16(R2),F3  | S.D -24(R2),F4  | ADD.D F11,F11,F1 | ADD.D F12,F12,F1 |
9  | S.D -32(R2),F5  | S.D -40(R2),F6  |                  |                  |
10 | S.D -48(R2),F7  | S.D -56(R2),F8  |                  |                  | DSUBI R2,R2,#96
11 | S.D 32(R2),F9   | S.D 24(R2),F10  |                  |                  | BNE R2,R1,Cont
12 | S.D 16(R2),F11  | S.D 8(R2),F12   |                  |                  |

1.09 cycles per iteration. Speedup = 9/(12/11) = 8.25

8 Dynamic instruction scheduling
Can we do better using dynamic approaches?

9 Dynamic scheduling: general idea

Can dynamic scheduling be used to reduce the number of cycles per instruction (CPI)?
Static scheduling by the compiler changes the order of the instructions in order to reduce the number of stalls in the pipeline.
Dynamic scheduling uses the same idea: whenever a hazard forces the pipeline to stall, try to issue another instruction, e.g.,

DIV.D  F0,F1,F2
SUB.D  F15,F0,F3
ADD.D  F6,F4,F5

The SUB.D instruction must stall until the division ends, which might take dozens of cycles.
The ADD.D instruction has no dependency on the previous instructions, so it can proceed: out-of-order execution.

10 Dynamic scheduling: hazards with dynamic scheduling

In-order pipelined architectures can only generate RAW (read after write) hazards.
Out-of-order pipelined architectures generate more hazards, e.g., WAR (write after read):

IN ORDER:              OUT OF ORDER (dynamic scheduling):
DIV.D F0,F1,F2         DIV.D F0,F1,F2
SUB.D F15,F0,F3        ADD.D F3,F4,F5    <- WAR on F3 with SUB.D
ADD.D F3,F4,F5         SUB.D F15,F0,F3

Observation: as seen in static scheduling, name dependencies (i.e., WAR and WAW dependencies) can be solved by register renaming.

11 Dynamic scheduling: hazards with dynamic scheduling

In-order pipelined architectures can only generate RAW (read after write) hazards.
Out-of-order pipelined architectures generate more hazards, e.g., WAW (write after write):

IN ORDER:              OUT OF ORDER (dynamic scheduling):
DIV.D F0,F1,F2         DIV.D F0,F1,F2
SUB.D F15,F0,F3        ADD.D F15,F1,F2   <- WAW on F15 with SUB.D
S.D   0(R2),F15        SUB.D F15,F0,F3
ADD.D F15,F1,F2        S.D   0(R2),F15

Observation: as seen in static scheduling, name dependencies (i.e., WAR and WAW dependencies) can be solved by register renaming.
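The hazard definitions of these two slides can be condensed into a small checker. This is a minimal sketch under an assumed encoding: each instruction is a tuple (op, dest, src1, src2), with dest=None for stores; hazards(a, b) classifies the dependence of a later instruction b on an earlier instruction a.

```python
# Classify RAW / WAR / WAW hazards between two instructions,
# where `a` comes before `b` in program order.
def hazards(a, b):
    _, a_dst, *a_srcs = a
    _, b_dst, *b_srcs = b
    found = set()
    if a_dst is not None and a_dst in b_srcs:
        found.add("RAW")                 # b reads what a writes
    if b_dst is not None and b_dst in a_srcs:
        found.add("WAR")                 # b writes what a reads
    if a_dst is not None and a_dst == b_dst:
        found.add("WAW")                 # both write the same register
    return found

# The examples from the slides:
div = ("DIV.D", "F0", "F1", "F2")
sub = ("SUB.D", "F15", "F0", "F3")
add = ("ADD.D", "F3", "F4", "F5")
```

With these definitions, hazards(div, sub) reports the RAW on F0 and hazards(sub, add) reports the WAR on F3 that out-of-order execution exposes.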

12 Dynamic scheduling: scoreboard general overview

The scoreboard is a centralized dynamic scheduling technique that monitors each instruction waiting to be dispatched for execution.
Once it determines that all the source operands and the required functional unit are available, it dispatches the instruction so that it can be executed.
It monitors multiple instructions and dispatches the first instruction(s) to have all dependencies satisfied.
The scoreboard does not eliminate WAR and WAW hazards (it only stalls on them), since it does not implement register renaming.
Out-of-order execution can also lead to structural conflicts, which must be solved by the scoreboard.

[Figure: new instructions arrive from instruction memory through IF into an instruction queue; the oldest ready instruction in the queue is dispatched to EX/MEM and then WB]

13 Dynamic scheduling: scoreboard implementation

Divide the ID/OF stage in two parts:

ISSUE
- Instruction decoding and verification of structural and WAW hazards
- Once all structural and WAW conflicts are solved, issue the instruction to the dispatch stage
- Instruction issue is performed in-order

READ OPERANDS (dispatch)
- Wait until all data hazards are solved before reading the operands from the register file and dispatching the instruction to execution
- Instruction dispatch is performed out-of-order

[Figure: IF and ISSUE proceed in order; dispatch (DISP.), EX/MEM and WB proceed out-of-order]

14 Dynamic scheduling: scoreboard implementation

Divide the ID/OF stage in two parts, Issue and Read Operands; instructions are only executed after solving all conflicts.

Scoreboarding was introduced in the CDC 6600 mainframe. First delivered to the Lawrence Radiation Laboratory, part of the University of California at Berkeley, in 1964, the CDC 6600 was used primarily for high-energy nuclear physics research. It achieved a peak performance of 1 MFLOPS (3x faster than its fastest predecessor). The CDC 6600 had 16 separate functional units (all non-pipelined): 4x FP, 7x integer, 5x load/store.

15 Dynamic scheduling with scoreboard: Issue stage

Resolves all structural and WAW hazards. When an instruction is received from the IF stage, perform the following steps:
a) Decode the instruction
b) Verify whether any active instruction has the same destination register as the current instruction (check for WAW hazards)
c) Verify whether the target functional unit is available (check for structural hazards)
d) If no WAW or structural hazard is found, issue the instruction to the Read Operands (dispatch) stage; otherwise stall the pipeline
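Steps b) to d) amount to two table lookups. A minimal sketch, assuming the scoreboard's tables are reduced to two dictionaries (fu_busy and reg_result are hypothetical names: one flag per functional unit, and one entry per register that an active instruction will write):

```python
# Hedged sketch of the issue-stage test.
def can_issue(fu_busy, reg_result, fu_name, dest_reg):
    if fu_busy.get(fu_name, False):
        return False          # step c: target FU occupied -> structural hazard
    if dest_reg in reg_result:
        return False          # step b: an active instruction writes dest -> WAW hazard
    return True               # step d: safe to issue
```

Failing either test stalls issue, and (because issue is in-order) everything behind the stalled instruction as well.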

16 Dynamic scheduling with scoreboard: Read Operands (dispatch) stage

Resolves all RAW hazards. When an instruction is received from the Issue stage, perform the following steps:
a) Verify that the source operands are available, by checking whether any earlier issued active instruction will still write to them (check for RAW hazards)
b) If no RAW hazard is found, dispatch the instruction to the execution stage
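The RAW test in step a) can be sketched the same way. Here reg_result is an assumed dictionary mirroring the register result status table (each register an active instruction will write, mapped to the producing FU):

```python
# Hedged sketch of the dispatch-stage (read operands) test.
def can_dispatch(reg_result, src_regs):
    # RAW hazard iff some earlier issued, still-active instruction
    # will still write one of this instruction's source registers.
    return all(r not in reg_result for r in src_regs)
```

Because this check is repeated every cycle for every issued-but-not-dispatched instruction, instructions leave this stage out of order: whichever one first sees all its sources clear is dispatched first.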

17 Dynamic scheduling with scoreboard: Execute (and memory) stage

Execute the operation. When finished, inform the scoreboard that the functional unit has completed execution.
No forwarding is performed; the scoreboard must solve all hazards through the register file.

18 Dynamic scheduling with scoreboard: Write back stage

Resolves all WAR hazards before writing the results. Verify whether a WAR hazard exists by checking that:
- some preceding instruction (in-order issue) has not yet read its operands, and
- one of its operands is the register the completing instruction is trying to write, and
- the missing operand comes from another instruction
If no WAR hazard exists, allow the result to be written; otherwise delay writing the result to the destination register.
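The conditions above reduce to one membership test. The input shape is an assumption: unread_sources lists, for each earlier-issued instruction that is still active, the source registers it has not yet read (in a scoreboard, operands are read together at dispatch).

```python
# Hedged sketch of the write-back WAR test.
def can_write_back(unread_sources, dest_reg):
    # WAR hazard iff some earlier-issued instruction still has to
    # read the register this instruction wants to write.
    return all(dest_reg not in unread for unread in unread_sources)
```

For the slide's running example: while SUB.D F15,F0,F3 has not yet read its operands, a completing ADD.D that targets F3 must hold its result in the functional unit.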

19 Dynamic scheduling with scoreboard: Write back stage

Resolves all WAR hazards before writing the results (same checks as on the previous slide).

EXAMPLE:
IN ORDER:              OUT OF ORDER (dynamic scheduling):
DIV.D F0,F1,F2         DIV.D F0,F1,F2
SUB.D F15,F0,F3        ADD.D F3,F4,F5
ADD.D F3,F4,F5         SUB.D F15,F0,F3

The WAR hazard is solved by retaining the ADD.D instruction in the WB stage until the SUB.D has read the value of F3.

20 Dynamic scheduling with scoreboard: information on the scoreboard

Instruction status
- Where the instruction is: issue, read operands, execute (EX), write back (WB)

Status of each functional unit (FU)
- Busy: FU is occupied
- Op: operation being executed (e.g., +, -)
- Fi: destination register number
- Fj, Fk: source register numbers
- Qj, Qk: FUs computing the value of source registers Fj, Fk
- Rj, Rk: flags indicating whether the register value is already available

Register result status
- Indicates which functional unit will produce the value of each register
- If an active (executing) instruction writes to a register, indicate the corresponding functional unit name; otherwise leave the entry empty
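The three tables above can be sketched as Python data structures (class and field names are assumptions; the issue method only shows how Qj/Qk and Rj/Rk are filled in from the register result status when an instruction is issued):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class FUStatus:
    busy: bool = False          # Busy: FU occupied
    op: Optional[str] = None    # Op: operation being executed
    fi: Optional[str] = None    # Fi: destination register
    fj: Optional[str] = None    # Fj, Fk: source registers
    fk: Optional[str] = None
    qj: Optional[str] = None    # Qj, Qk: FU producing Fj/Fk (None if ready)
    qk: Optional[str] = None
    rj: bool = False            # Rj, Rk: source value already available?
    rk: bool = False

@dataclass
class Scoreboard:
    fu: Dict[str, FUStatus] = field(default_factory=dict)     # FU status table
    reg_result: Dict[str, str] = field(default_factory=dict)  # reg -> producing FU

    def issue(self, fu_name, op, fi, fj, fk):
        u = self.fu[fu_name]
        u.busy, u.op, u.fi, u.fj, u.fk = True, op, fi, fj, fk
        u.qj = self.reg_result.get(fj)   # pending producer of Fj, if any
        u.qk = self.reg_result.get(fk)
        u.rj = u.qj is None              # ready iff nobody will still write it
        u.rk = u.qk is None
        self.reg_result[fi] = fu_name    # register result status update
```

Issuing L.D F2,45(R3) on INT and then MUL.D F0,F2,F4 on MULT1 leaves the multiplier with Qj=INT and Rj=False, exactly the situation in the worked example that follows.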

21 Dynamic scheduling with scoreboard: example

Consider the execution of the instructions below on a processor with non-pipelined functional units:
- 1x integer ALU, with 1 cycle latency (INT)
- 1x FP multiplier, with 10 cycles latency (MULT1)
- 1x FP adder/subtractor, with 2 cycles latency (ADD)
- 1x INT/FP divider, with 40 cycles latency (DIV)

L.D   F6,34(R2)
L.D   F2,45(R3)
MUL.D F0,F2,F4
SUB.D F8,F6,F2
DIV.D F10,F0,F6
ADD.D F6,F8,F2

The load/store unit has 2 cycles latency (address calculation + memory access), with the computation of the effective address performed on the integer ALU.

22 Dynamic scheduling with scoreboard: information on the scoreboard

[Empty scoreboard for the example: instruction status rows for the six instructions above; FU status rows for INT, MULT1, ADD and DIV; register result status row (producing FU per register)]

23 Information on the scoreboard: ISSUE stage

At issue, the scoreboard:
- checks whether the target FU is available (FU status table)
- checks for WAW conflicts (register result status table)
- on issue, updates the target FU entry and the target register entry

24 Information on the scoreboard: DISPATCH stage

At dispatch, the scoreboard:
- checks whether the operands are ready (Rj/Rk flags, set at issue)
- dispatches all operations that have their data ready

25 Information on the scoreboard: WB stage

At write back, the scoreboard:
- checks for WAR hazards, by checking whether any instruction is still waiting to read a value originated from a preceding instruction
- updates the Ready (Rj/Rk) information if necessary
- clears the corresponding register entry in the register result status

26 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 1: Issue of L.D F6,34(R2) is performed, since the INT unit is not busy.
FU status (updated on rising edge): INT: Busy=Yes, Op=L.D, Fi=F6, Fj=#34, Fk=R2, Rj=Yes, Rk=Yes
Register result status (updated on rising edge): F6 -> INT

27 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 2: Since all operands are ready, the first load is dispatched for execution; the second load cannot issue until the INT unit becomes available.
FU status: INT (2 cycles left): Busy=Yes, Op=L.D, Fi=F6, Fj=#34, Fk=R2
Register result status: F6 -> INT

28 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 4: The INT unit informs the scoreboard that execution has finished.
FU status: INT (done): Busy=Yes, Op=L.D, Fi=F6, Fj=#34, Fk=R2
Register result status: F6 -> INT

29 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 5: The scoreboard checks that no WAR conflict exists and allows the first load to complete. The INT entry and the F6 register result entry are cleared at the end of the cycle (rising edge).

30 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 6: The second load can finally issue, since the INT unit is now available.
FU status (updated at the end of the cycle, rising edge): INT: Busy=Yes, Op=L.D, Fi=F2, Fj=#45, Fk=R3, Rj=Yes, Rk=Yes
Register result status: F2 -> INT

31 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 7: The second load is dispatched for execution, while the MUL.D is issued.
FU status:
- INT (2 cycles left): Busy=Yes, Op=L.D, Fi=F2, Fj=#45, Fk=R3
- MULT1: Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4, Qj=INT, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F2 -> INT

32 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 8: The SUB.D is issued; the MUL.D cannot be dispatched until operand F2 is written to the register file.
FU status:
- INT (1 cycle left): Busy=Yes, Op=L.D, Fi=F2, Fj=#45, Fk=R3
- MULT1: Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4, Qj=INT, Rj=No, Rk=Yes
- ADD: Busy=Yes, Op=SUB.D, Fi=F8, Fj=F6, Fk=F2, Qk=INT, Rj=Yes, Rk=No
Register result status: F0 -> MULT1, F2 -> INT, F8 -> ADD

33 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 9: The second load finishes execution, while the DIV.D is issued; neither the MUL.D nor the SUB.D can be dispatched, because operand F2 has not yet been written to the register file.
FU status:
- INT (done): Busy=Yes, Op=L.D, Fi=F2, Fj=#45, Fk=R3
- MULT1: Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4, Qj=INT, Rj=No, Rk=Yes
- ADD: Busy=Yes, Op=SUB.D, Fi=F8, Fj=F6, Fk=F2, Qk=INT, Rj=Yes, Rk=No
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F2 -> INT, F8 -> ADD, F10 -> DIV

34 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 10: The second load writes its result to the register file; the MUL.D and SUB.D see operand F2 become ready at the end of the cycle (rising edge), while the DIV.D keeps waiting for operand F0 (to be produced by MULT1). The ADD.D cannot issue: it generates a structural hazard, since the ADD unit is occupied.
FU status:
- MULT1: Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes (updated at end of cycle)
- ADD: Busy=Yes, Op=SUB.D, Fi=F8, Fj=F6, Fk=F2, Rj=Yes, Rk=Yes (updated at end of cycle)
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F8 -> ADD, F10 -> DIV

35 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 11: The MUL.D and SUB.D can be dispatched simultaneously; the ADD.D still cannot be issued, because the ADD unit is busy.
FU status:
- MULT1 (10 cycles left): Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4
- ADD (2 cycles left): Busy=Yes, Op=SUB.D, Fi=F8, Fj=F6, Fk=F2
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F8 -> ADD, F10 -> DIV

36 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 13: The SUB.D finishes execution.
FU status:
- MULT1 (8 cycles left): Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4
- ADD (done): Busy=Yes, Op=SUB.D, Fi=F8, Fj=F6, Fk=F2
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F8 -> ADD, F10 -> DIV

37 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 14: The SUB.D writes its result; the ADD entry and the F8 register result entry are cleared.
FU status:
- MULT1 (7 cycles left): Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F10 -> DIV

38 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 15: The ADD.D no longer faces a structural hazard and can be issued.
FU status:
- MULT1 (6 cycles left): Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4
- ADD: Busy=Yes, Op=ADD.D, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F6 -> ADD, F10 -> DIV

39 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 16: The ADD.D has all operands ready and can be dispatched.
FU status:
- MULT1 (5 cycles left): Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4
- ADD (2 cycles left): Busy=Yes, Op=ADD.D, Fi=F6, Fj=F8, Fk=F2
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F6 -> ADD, F10 -> DIV

40 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 18: The ADD.D has finished executing.
FU status:
- MULT1 (3 cycles left): Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4
- ADD (done): Busy=Yes, Op=ADD.D, Fi=F6, Fj=F8, Fk=F2
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F6 -> ADD, F10 -> DIV

41 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 19: The ADD.D cannot write its result until the DIV.D instruction has read its operands (WAR hazard on F6).
FU status:
- MULT1 (2 cycles left): Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4
- ADD (waiting to write): Busy=Yes, Op=ADD.D, Fi=F6, Fj=F8, Fk=F2
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F6 -> ADD, F10 -> DIV

42 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 21: The MUL.D instruction has finished executing.
FU status:
- MULT1 (done): Busy=Yes, Op=MUL.D, Fi=F0, Fj=F2, Fk=F4
- ADD (waiting to write): Busy=Yes, Op=ADD.D, Fi=F6, Fj=F8, Fk=F2
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Qj=MULT1, Rj=No, Rk=Yes
Register result status: F0 -> MULT1, F6 -> ADD, F10 -> DIV

43 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 22: The MUL.D writes its result to F0; the MULT1 entry is cleared and the DIV.D's F0 operand becomes ready (Rj=Yes).
FU status:
- ADD (waiting to write): Busy=Yes, Op=ADD.D, Fi=F6, Fj=F8, Fk=F2
- DIV: Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6, Rj=Yes, Rk=Yes
Register result status: F6 -> ADD, F10 -> DIV

44 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 23: The DIV.D reads its operands and is dispatched for execution.
FU status:
- ADD (waiting to write): Busy=Yes, Op=ADD.D, Fi=F6, Fj=F8, Fk=F2
- DIV (40 cycles left): Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6
Register result status: F6 -> ADD, F10 -> DIV

45 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 24: The ADD.D can write its result to the register file, since no WAR hazards remain.
FU status:
- DIV (39 cycles left): Busy=Yes, Op=DIV.D, Fi=F10, Fj=F0, Fk=F6
Register result status: F10 -> DIV

46 Dynamic scheduling with scoreboard: scoreboard update example
CYCLE 64: The DIV.D writes its result to register F10; the scoreboard is now empty.

47 Dynamic scheduling with scoreboard: scoreboard update example (summary)

Final instruction status (cycle of each stage):
L.D   F6,34(R2)   issue 1,  dispatch 2,  execute done 4,  write back 5
L.D   F2,45(R3)   issue 6,  dispatch 7,  execute done 9,  write back 10
MUL.D F0,F2,F4    issue 7,  dispatch 11, execute done 21, write back 22
SUB.D F8,F6,F2    issue 8,  dispatch 11, execute done 13, write back 14
DIV.D F10,F0,F6   issue 9,  dispatch 23, execute done 63, write back 64
ADD.D F6,F8,F2    issue 15, dispatch 16, execute done 18, write back 24

Issue is performed in-order; dispatch (operand read), execute and write back are performed out-of-order.

48 Dynamic scheduling with scoreboard: limitations

- Centralized control: status and control signals are generated in the scoreboard unit, where all the scheduling information is kept
- Forwarding techniques cannot be easily applied: the scoreboard must be informed whenever a register is read
- WAW hazards completely block the pipeline; WAR hazards delay writing results to the register file, thus delaying the dispatch of new instructions (register renaming is not applied!)
- The scoreboard is a complex structure that may require multiple updates in a single clock cycle

NEXT: a distributed dynamic scheduling technique

49 Next lesson
Dynamic techniques to extract parallelism: Tomasulo


More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

Review: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction

Review: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction Review: Evaluating Branch Alternatives Lecture 3: Introduction to Advanced Pipelining Two part solution: Determine branch taken or not sooner, AND Compute taken branch address earlier Pipeline speedup

More information

Good luck and have fun!

Good luck and have fun! Midterm Exam October 13, 2014 Name: Problem 1 2 3 4 total Points Exam rules: Time: 90 minutes. Individual test: No team work! Open book, open notes. No electronic devices, except an unprogrammed calculator.

More information

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction

More information

Chapter 3: Instruction Level Parallelism (ILP) and its exploitation. Types of dependences

Chapter 3: Instruction Level Parallelism (ILP) and its exploitation. Types of dependences Chapter 3: Instruction Level Parallelism (ILP) and its exploitation Pipeline CPI = Ideal pipeline CPI + stalls due to hazards invisible to programmer (unlike process level parallelism) ILP: overlap execution

More information

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4 PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI

More information

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction

More information

Tomasulo s Algorithm

Tomasulo s Algorithm Tomasulo s Algorithm Architecture to increase ILP Removes WAR and WAW dependencies during issue WAR and WAW Name Dependencies Artifact of using the same storage location (variable name) Can be avoided

More information

吳俊興高雄大學資訊工程學系. October Example to eleminate WAR and WAW by register renaming. Tomasulo Algorithm. A Dynamic Algorithm: Tomasulo s Algorithm

吳俊興高雄大學資訊工程學系. October Example to eleminate WAR and WAW by register renaming. Tomasulo Algorithm. A Dynamic Algorithm: Tomasulo s Algorithm EEF011 Computer Architecture 計算機結構 吳俊興高雄大學資訊工程學系 October 2004 Example to eleminate WAR and WAW by register renaming Original DIV.D ADD.D S.D SUB.D MUL.D F0, F2, F4 F6, F0, F8 F6, 0(R1) F8, F10, F14 F6,

More information

Superscalar Architectures: Part 2

Superscalar Architectures: Part 2 Superscalar Architectures: Part 2 Dynamic (Out-of-Order) Scheduling Lecture 3.2 August 23 rd, 2017 Jae W. Lee (jaewlee@snu.ac.kr) Computer Science and Engineering Seoul NaMonal University Download this

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Tomasulo

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

Lecture: Pipeline Wrap-Up and Static ILP

Lecture: Pipeline Wrap-Up and Static ILP Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) 1 Multicycle

More information

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

Chapter 3 & Appendix C Part B: ILP and Its Exploitation

Chapter 3 & Appendix C Part B: ILP and Its Exploitation CS359: Computer Architecture Chapter 3 & Appendix C Part B: ILP and Its Exploitation Yanyan Shen Department of Computer Science and Engineering Shanghai Jiao Tong University 1 Outline 3.1 Concepts and

More information

Instruction Level Parallelism (ILP)

Instruction Level Parallelism (ILP) Instruction Level Parallelism (ILP) Pipelining supports a limited sense of ILP e.g. overlapped instructions, out of order completion and issue, bypass logic, etc. Remember Pipeline CPI = Ideal Pipeline

More information

Four Steps of Speculative Tomasulo cycle 0

Four Steps of Speculative Tomasulo cycle 0 HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches

More information

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson

More information

ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST *

ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST * ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST * SAMPLE 1 Section: Simple pipeline for integer operations For all following questions we assume that: a) Pipeline contains 5 stages: IF, ID, EX,

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies

More information

CS433 Homework 3 (Chapter 3)

CS433 Homework 3 (Chapter 3) CS433 Homework 3 (Chapter 3) Assigned on 10/3/2017 Due in class on 10/17/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2018 Static Instruction Scheduling 1 Techniques to reduce stalls CPI = Ideal CPI + Structural stalls per instruction + RAW stalls per instruction + WAR stalls per

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of

More information

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar Complex Pipelining COE 501 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Diversified Pipeline Detecting

More information

Website for Students VTU NOTES QUESTION PAPERS NEWS RESULTS

Website for Students VTU NOTES QUESTION PAPERS NEWS RESULTS Advanced Computer Architecture- 06CS81 Hardware Based Speculation Tomasulu algorithm and Reorder Buffer Tomasulu idea: 1. Have reservation stations where register renaming is possible 2. Results are directly

More information

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann,

More information

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction

More information

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2. Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS Homework 2 (Chapter ) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration..

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow? Complications with long instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study So far, all MIPS instructions take 5 cycles But haven't talked

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

Compiler Optimizations. Lecture 7 Overview of Superscalar Techniques. Memory Allocation by Compilers. Compiler Structure. Register allocation

Compiler Optimizations. Lecture 7 Overview of Superscalar Techniques. Memory Allocation by Compilers. Compiler Structure. Register allocation Lecture 7 Overview of Superscalar Techniques CprE 581 Computer Systems Architecture, Fall 2013 Reading: Textbook, Ch. 3 Complexity-Effective Superscalar Processors, PhD Thesis by Subbarao Palacharla, Ch.1

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 8 Instruction-Level Parallelism Part 1

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 8 Instruction-Level Parallelism Part 1 ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 8 Instruction-Level Parallelism Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html

More information

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism Dynamic scheduling Scoreboard Technique Tomasulo Algorithm Speculation Reorder Buffer Superscalar Processors 1 Definition of ILP ILP=Potential overlap of execution among unrelated

More information

ILP: Instruction Level Parallelism

ILP: Instruction Level Parallelism ILP: Instruction Level Parallelism Tassadaq Hussain Riphah International University Barcelona Supercomputing Center Universitat Politècnica de Catalunya Introduction Introduction Pipelining become universal

More information

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor 1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic

More information

EITF20: Computer Architecture Part3.2.1: Pipeline - 3

EITF20: Computer Architecture Part3.2.1: Pipeline - 3 EITF20: Computer Architecture Part3.2.1: Pipeline - 3 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Dynamic scheduling - Tomasulo Superscalar, VLIW Speculation ILP limitations What we have done

More information

COSC 6385 Computer Architecture - Instruction Level Parallelism (II)

COSC 6385 Computer Architecture - Instruction Level Parallelism (II) COSC 6385 Computer Architecture - Instruction Level Parallelism (II) Edgar Gabriel Spring 2016 Data fields for reservation stations Op: operation to perform on source operands S1 and S2 Q j, Q k : reservation

More information

Lecture: Static ILP. Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)

Lecture: Static ILP. Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) Lecture: Static ILP Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures

More information

Lecture-13 (ROB and Multi-threading) CS422-Spring

Lecture-13 (ROB and Multi-threading) CS422-Spring Lecture-13 (ROB and Multi-threading) CS422-Spring 2018 Biswa@CSE-IITK Cycle 62 (Scoreboard) vs 57 in Tomasulo Instruction status: Read Exec Write Exec Write Instruction j k Issue Oper Comp Result Issue

More information

Updated Exercises by Diana Franklin

Updated Exercises by Diana Franklin C-82 Appendix C Pipelining: Basic and Intermediate Concepts Updated Exercises by Diana Franklin C.1 [15/15/15/15/25/10/15] Use the following code fragment: Loop: LD R1,0(R2) ;load R1 from address

More information

CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions

CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis6627 Powerpoint Lecture Notes from John Hennessy

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

Complex Pipelining. Motivation

Complex Pipelining. Motivation 6.823, L10--1 Complex Pipelining Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Motivation 6.823, L10--2 Pipelining becomes complex when we want high performance in the presence

More information

Graduate Computer Architecture. Chapter 3. Instruction Level Parallelism and Its Dynamic Exploitation

Graduate Computer Architecture. Chapter 3. Instruction Level Parallelism and Its Dynamic Exploitation Graduate Computer Architecture Chapter 3 Instruction Level Parallelism and Its Dynamic Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques Scoreboarding (Appendix A.8) Tomasulo

More information

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?) Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static

More information

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding

More information

Dynamic Scheduling. Better than static scheduling Scoreboarding: Tomasulo algorithm:

Dynamic Scheduling. Better than static scheduling Scoreboarding: Tomasulo algorithm: LECTURE - 13 Dynamic Scheduling Better than static scheduling Scoreboarding: Used by the CDC 6600 Useful only within basic block WAW and WAR stalls Tomasulo algorithm: Used in IBM 360/91 for the FP unit

More information

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study Complications With Long Instructions So far, all MIPS instructions take 5 cycles But haven't talked

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence

More information

Solutions to exercises on Instruction Level Parallelism

Solutions to exercises on Instruction Level Parallelism Solutions to exercises on Instruction Level Parallelism J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas Computer Architecture ARCOS Group Computer Science and Engineering

More information

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue and Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://wwweecsberkeleyedu/~krste

More information