Exercises (III)

1) a) Solution: let's number the instructions:

1. LD R1, 50(R2)
2. ADD R3, R1, R4
3. LD R5, 100(R3)
4. MUL R6, R5, R7
5. STORE R6, 50(R2)
6. ADD R1, R1, #100
7. SUB R2, R2, #8

b) Data dependencies:
Instruction 2 depends on instruction 1 for the value of R1
Instruction 6 depends on instruction 1 for the value of R1
Instruction 3 depends on instruction 2 for the value of R3
Instruction 4 depends on instruction 3 for the value of R5
Instruction 5 depends on instruction 4 for the value of R6

Antidependencies:
Instruction 6 is antidependent on instruction 2 for access to R1
Instruction 7 is antidependent on instruction 1 for access to R2
Instruction 7 is antidependent on instruction 5 for access to R2

Output dependencies:
Instruction 6 has an output dependence with instruction 1 for access to R1

Control dependencies: none

Timing on the 5-stage pipeline (IF ID EXE MEM WB, no forwarding; a register written in WB can be read by ID in the same cycle):

LD R1, 50(R2)       IF  ID(2)   EXE(3)   MEM(4)   WB(5)
ADD R3, R1, R4      IF  ID(5)   EXE(6)   MEM(7)   WB(8)      2 stalls waiting for R1
LD R5, 100(R3)      IF  ID(8)   EXE(9)   MEM(10)  WB(11)     2 stalls waiting for R3
MUL R6, R5, R7      IF  ID(11)  EXE(12)  MEM(13)  WB(14)     2 stalls waiting for R5
STORE R6, 50(R2)    IF  ID(14)  EXE(15)  MEM(16)  WB(17)     2 stalls waiting for R6
ADD R1, R1, #100    IF  ID(15)  EXE(16)  MEM(17)  WB(18)
SUB R2, R2, #8      IF  ID(16)  EXE(17)  MEM(18)  WB(19)

It takes 19 cycles; we stall for 8 cycles.

c) Timing on the 7-stage pipeline (IF ID ALU1 MEM1 MEM2 ALU2 WB, no forwarding):

LD R1, 50(R2)       IF  ID(2)   ALU1(3)   MEM1(4)   MEM2(5)   ALU2(6)   WB(7)
ADD R3, R1, R4      IF  ID(7)   ALU1(8)   MEM1(9)   MEM2(10)  ALU2(11)  WB(12)    4 stalls waiting for R1
LD R5, 100(R3)      IF  ID(12)  ALU1(13)  MEM1(14)  MEM2(15)  ALU2(16)  WB(17)    4 stalls waiting for R3
MUL R6, R5, R7      IF  ID(17)  ALU1(18)  MEM1(19)  MEM2(20)  ALU2(21)  WB(22)    4 stalls waiting for R5
STORE R6, 50(R2)    IF  ID(22)  ALU1(23)  MEM1(24)  MEM2(25)  ALU2(26)  WB(27)    4 stalls waiting for R6
ADD R1, R1, #100    IF  ID(23)  ALU1(24)  MEM1(25)  MEM2(26)  ALU2(27)  WB(28)
SUB R2, R2, #8      IF  ID(24)  ALU1(25)  MEM1(26)  MEM2(27)  ALU2(28)  WB(29)

It takes 29 cycles; we stall for 16 cycles.

II) Since branches are resolved in the ALU2 stage, we can start the IF of the next instruction only after that. We would like to start it right after the branch's IF stage, so the penalty is 5 cycles.
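The cycle counts above can be checked with a small timing sketch. This is a minimal model under the stated assumptions (in-order issue, no forwarding, a register written in WB readable by ID in the same cycle); the only parameter that changes between b) and c) is the number of stages between ID and WB. The instruction encoding (destination register plus source registers) is made up for this sketch.

def cycles(instrs, id_to_wb):
    # instrs: (destination register or None, list of source registers)
    # id_to_wb: stages after ID until WB (3 for IF ID EXE MEM WB, 5 for the 7-stage pipeline)
    ready, prev_id, last_wb = {}, 0, 0
    for dst, srcs in instrs:
        # ID may start once the previous instruction has left ID and all
        # source registers have been written back (same-cycle WB/ID read allowed)
        id_c = max(prev_id + 1, 2, *[ready.get(r, 0) for r in srcs])
        wb = id_c + id_to_wb
        if dst:
            ready[dst] = wb
        prev_id, last_wb = id_c, wb
    return last_wb

seq = [
    ("R1", ["R2"]),        # 1. LD    R1, 50(R2)
    ("R3", ["R1", "R4"]),  # 2. ADD   R3, R1, R4
    ("R5", ["R3"]),        # 3. LD    R5, 100(R3)
    ("R6", ["R5", "R7"]),  # 4. MUL   R6, R5, R7
    (None, ["R6", "R2"]),  # 5. STORE R6, 50(R2)
    ("R1", ["R1"]),        # 6. ADD   R1, R1, #100
    ("R2", ["R2"]),        # 7. SUB   R2, R2, #8
]
for depth, id_to_wb in [(5, 3), (7, 5)]:
    total = cycles(seq, id_to_wb)
    print(f"{depth}-stage: {total} cycles, {total - (len(seq) + depth - 1)} stalls")
# 5-stage: 19 cycles, 8 stalls
# 7-stage: 29 cycles, 16 stalls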

2) The following C code

z = 1.0
if (x != 0) {
    z = c*x + y
}

can be mapped to the following sequence of instructions, assuming that register F0 always contains zero:

1. LD.D F4, 16(R2)     /* Load x */
2. ADDI.D F6, F0, #1   /* Set z = 1 */
3. BEQ F4, F0, DONE    /* If x == 0 goto DONE */
4. LD.D F12, 22(R6)    /* Load y */
5. MUL.D F10, F4, F8   /* Calculate c * x */
6. ADD.D F6, F10, F12  /* Add c*x + y */
7. DONE: S.D F6, 0(R3) /* Store z */

a. Find all dependencies in the code segment and list them by category (data dependence, output dependence, anti-dependence and control dependence).

Data dependencies:
Instr. 5 on Instr. 1 on register F4
Instr. 5 on Instr. 3 on register F4
Instr. 6 on Instr. 4 on register F12
Instr. 6 on Instr. 5 on register F10
Instr. 7 on Instr. 6 on register F6
Instr. 7 on Instr. 2 on register F6

Output dependencies:
Instr. 2 and Instr. 6 on register F6

Antidependencies: none

Control dependencies:
Instr. 4, 5, 6 on Instr. 3

For parts b. and c. please make the following assumptions:
regular 5-stage pipeline without forwarding
dual-port memory
no branch prediction
no special hardware to speed up branch operations, i.e. the outcome of a branch is known only after the WB stage of the corresponding instruction.

b. How many cycles does this code sequence take assuming that the branch in the C code is taken (i.e. x != 0)? Give the number of stall cycles.

LD.D F4, 16(R2)       IF  ID(2)   EXE(3)   MEM(4)   WB(5)
ADDI.D F6, F0, #1     IF  ID(3)   EXE(4)   MEM(5)   WB(6)
BEQ F4, F0, DONE      IF  ID(5)   EXE(6)   MEM(7)   WB(8)       1 stall waiting for F4
LD.D F12, 22(R6)      IF(9)  ID(10)  EXE(11)  MEM(12)  WB(13)   4 stalls: fetch waits for the branch outcome (known after WB in cycle 8)
MUL.D F10, F4, F8     IF  ID(11)  EXE(12)  MEM(13)  WB(14)
ADD.D F6, F10, F12    IF  ID(14)  EXE(15)  MEM(16)  WB(17)      2 stalls waiting for F10 and F12
DONE: S.D F6, 0(R3)   IF  ID(17)  EXE(18)  MEM(19)  WB(20)      2 further stalls waiting for F6

It takes 20 cycles to execute the code sequence, with 9 stall cycles.

c. How many cycles does this code sequence take assuming that the branch in the C code is not taken (i.e. x == 0)? Give the number of stall cycles.

LD.D F4, 16(R2)       IF  ID(2)   EXE(3)   MEM(4)   WB(5)
ADDI.D F6, F0, #1     IF  ID(3)   EXE(4)   MEM(5)   WB(6)
BEQ F4, F0, DONE      IF  ID(5)   EXE(6)   MEM(7)   WB(8)       1 stall waiting for F4
LD.D F12, 22(R6)      not fetched (BEQ taken to DONE)
MUL.D F10, F4, F8     not fetched
ADD.D F6, F10, F12    not fetched
DONE: S.D F6, 0(R3)   IF(9)  ID(10)  EXE(11)  MEM(12)  WB(13)   4 stalls: fetch waits for the branch outcome

The code sequence takes 13 clock cycles, with 5 stall cycles.

d. In order to reduce the branch penalty, we would like to apply a delayed branch approach. Please indicate which instruction could be used to fill the branch delay slot.

We are looking for an instruction that is executed under any circumstances in order to fill the branch delay slot. Instruction no. 2 (ADDI.D F6, F0, #1) would be a candidate.
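The totals for parts b. and c. can be reproduced by extending the earlier timing sketch with the branch assumption above (no prediction, outcome known only after WB, so the next fetch waits for it). This is only a rough model under those assumptions; the helper name run and the instruction encoding are made up for the sketch.

def run(instrs):
    # instrs: (destination or None, source registers, is_branch)
    ready, barrier = {}, 0            # reg -> WB cycle of its producer; fetch barrier after a branch
    prev_if = prev_id = last_wb = 0
    for dst, srcs, is_branch in instrs:
        if_c = max(prev_if + 1, barrier + 1)        # cannot fetch past an unresolved branch
        id_c = max(if_c + 1, prev_id + 1, *[ready.get(r, 0) for r in srcs])
        wb = id_c + 3
        if dst:
            ready[dst] = wb
        if is_branch:
            barrier = wb              # outcome known only after WB
        prev_if, prev_id, last_wb = if_c, id_c, wb
    return last_wb, last_wb - (len(instrs) + 4)     # total cycles, stall cycles

taken_path = [                        # x != 0: the BEQ falls through, all seven instructions run
    ("F4",  ["R2"], False),           # LD.D   F4, 16(R2)
    ("F6",  ["F0"], False),           # ADDI.D F6, F0, #1
    (None,  ["F4", "F0"], True),      # BEQ    F4, F0, DONE
    ("F12", ["R6"], False),           # LD.D   F12, 22(R6)
    ("F10", ["F4", "F8"], False),     # MUL.D  F10, F4, F8
    ("F6",  ["F10", "F12"], False),   # ADD.D  F6, F10, F12
    (None,  ["F6", "R3"], False),     # S.D    F6, 0(R3)
]
not_taken_path = taken_path[:3] + taken_path[-1:]   # x == 0: BEQ jumps to DONE

print(run(taken_path))       # (20, 9)
print(run(not_taken_path))   # (13, 5)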

3) The expression z = c*x + y can be mapped to the following sequence of instructions:

1. LD.D F4, 16(R2)    /* Load x */
2. LD.D F6, 48(R2)    /* Load y */
3. MUL.D F10, F4, F8  /* Calculate c * x */
4. ADD.D F8, F10, F6  /* Add c*x to y */
5. S.D F8, 0(R3)      /* Store z */

a. Find all dependencies in the code segment and list them by category (data dependence, output dependence, anti-dependence and control dependence).

Data dependence:
Instruction 3 on instruction 1 in register F4
Instruction 4 on instruction 3 in register F10
Instruction 4 on instruction 2 in register F6
Instruction 5 on instruction 4 in register F8

Output dependence: none

Anti-dependence:
Instruction 4 on instruction 3 in register F8

Control dependence: none

b. How many cycles does this code sequence take on the regular 5-stage pipeline assuming that we have multi-port memory? Indicate the number of stall cycles.

LD.D F4, 16(R2)       IF  ID(2)   EXE(3)   MEM(4)   WB(5)
LD.D F6, 48(R2)       IF  ID(3)   EXE(4)   MEM(5)   WB(6)
MUL.D F10, F4, F8     IF  ID(5)   EXE(6)   MEM(7)   WB(8)      1 stall waiting for F4
ADD.D F8, F10, F6     IF  ID(8)   EXE(9)   MEM(10)  WB(11)     2 stalls waiting for F10
S.D F8, 0(R3)         IF  ID(11)  EXE(12)  MEM(13)  WB(14)     2 stalls waiting for F8

14 cycles, 5 stall cycles.
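The dependence lists in part a can be cross-checked mechanically: a write followed by a read of the same register (RAW) is a data dependence, a read followed by a write (WAR) an anti-dependence, and a write followed by a write (WAW) an output dependence. A minimal sketch, assuming dependences are determined purely from the registers each instruction reads and writes:

instrs = [
    ("LD.D F4, 16(R2)",   {"F4"},  {"R2"}),          # (text, registers written, registers read)
    ("LD.D F6, 48(R2)",   {"F6"},  {"R2"}),
    ("MUL.D F10, F4, F8", {"F10"}, {"F4", "F8"}),
    ("ADD.D F8, F10, F6", {"F8"},  {"F10", "F6"}),
    ("S.D F8, 0(R3)",     set(),   {"F8", "R3"}),
]
for j, (_, w_j, r_j) in enumerate(instrs):
    for i, (_, w_i, r_i) in enumerate(instrs[:j]):
        for reg in w_i & r_j:
            print(f"data (RAW): instruction {j+1} on instruction {i+1} in {reg}")
        for reg in r_i & w_j:
            print(f"anti (WAR): instruction {j+1} on instruction {i+1} in {reg}")
        for reg in w_i & w_j:
            print(f"output (WAW): instruction {j+1} on instruction {i+1} in {reg}")

Running this prints exactly the four data dependences and the single anti-dependence listed above.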

c. What is forwarding and what problem does it solve? (1 pt)

Forwarding is the technique used to pass data items directly from one pipeline stage to another. By using this technique, an instruction that requires a data item produced by a previous instruction does not have to wait until that instruction's WB stage to get the item.

d. Assume now that you have forwarding available among the required stages of the pipeline. (3 pts) How many cycles would it now take to execute the code sequence and how many stall cycles do you have this time?

LD.D F4, 16(R2)       IF  ID(2)  EXE(3)  MEM(4)  WB(5)
LD.D F6, 48(R2)       IF  ID(3)  EXE(4)  MEM(5)  WB(6)
MUL.D F10, F4, F8     IF  ID(4)  EXE(5)  MEM(6)  WB(7)
ADD.D F2, F10, F6     IF  ID(5)  EXE(6)  MEM(7)  WB(8)
S.D F2, 0(R3)         IF  ID(6)  EXE(7)  MEM(8)  WB(9)

The MUL.D needs F4 at cycle 5 for EXE; instruction 1 has the data item available after its MEM stage (cycle 4), so forwarding helps. The ADD.D requires F10 in EXE (cycle 6); F10 is determined in the EXE stage of the MUL instruction (cycle 5), so forwarding helps. The store needs F2 in MEM (cycle 8); the ADD produces it in EXE (cycle 6), so forwarding helps.

9 cycles total, 0 stall cycles.
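For part d, the timing model can be adapted to include forwarding. This is a sketch under the usual assumptions (a result can be bypassed from the end of EXE for ALU operations, or from the end of MEM for loads, into a later instruction's EXE in the following cycle; store data is not needed until MEM); the stage tags and the helper name are made up for the sketch.

def cycles_forwarded(instrs):
    off = {"EXE": 1, "MEM": 2}        # stage offsets relative to ID
    avail, prev_id, last_wb = {}, 0, 0
    for dst, produced_in, uses in instrs:
        id_c = max(prev_id + 1, 2)
        for reg, needed_in in uses:
            if reg in avail:          # operand can be forwarded one cycle after it is produced
                id_c = max(id_c, avail[reg] + 1 - off[needed_in])
        if dst:
            avail[dst] = id_c + off[produced_in]
        prev_id, last_wb = id_c, id_c + 3
    return last_wb

seq = [
    ("F4",  "MEM", [("R2", "EXE")]),                  # LD.D  F4, 16(R2)
    ("F6",  "MEM", [("R2", "EXE")]),                  # LD.D  F6, 48(R2)
    ("F10", "EXE", [("F4", "EXE"), ("F8", "EXE")]),   # MUL.D F10, F4, F8
    ("F2",  "EXE", [("F10", "EXE"), ("F6", "EXE")]),  # ADD.D F2, F10, F6
    (None,  "EXE", [("F2", "MEM"), ("R3", "EXE")]),   # S.D   F2, 0(R3)
]
print(cycles_forwarded(seq), "cycles")   # 9 cycles, i.e. no stalls (14 cycles without forwarding)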

4. Remember: R0 always contains 0

      DADD  R1, R0, R0    /* Set R1 = 0 */
      DADD  R2, R0, R0    /* Set R2 = 0 */
      DADD  R3, R0, R0    /* Set R3 = 0 */
      DADDI R4, R0, #5    /* Set R4 = 5 */
Loop: BEQ   R3, R4, Done  /* branch b1: if (R3 == R4) goto Done */
      BNEZ  R1, If2       /* branch b2: if (R1 != 0) goto If2 */
      DADDI R2, R0, #1    /* Set R2 = 1 */
      DADDI R1, R0, #2    /* Set R1 = 2 */
If2:  BNEZ  R2, End       /* branch b3: if (R2 != 0) goto End */
      DADDI R1, R0, #1    /* Set R1 = 1 */
      DADDI R2, R0, #2    /* Set R2 = 2 */
End:  DADDI R1, R1, #-1   /* R1 = R1 - 1 */
      DADDI R2, R2, #-1   /* R2 = R2 - 1 */
      DADDI R3, R3, #1    /* R3 = R3 + 1 */
      J     Loop          /* Jump to Loop */
Done:

a)
I   b1 prediction   b1 outcome   b1 new prediction
0   10 (T)          NT           00 (NT)            miss
1   00 (NT)         NT           00 (NT)
2   00 (NT)         NT           00 (NT)
3   00 (NT)         NT           00 (NT)
4   00 (NT)         NT           00 (NT)
5   00 (NT)         T            01 (NT)            miss
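The table in part a can be replayed with a small sketch of the 2-bit scheme it implies. The 10 -> 00 step in row 0 suggests a scheme where a not-taken outcome in a weakly-taken state falls straight back to the strong not-taken state; the transitions that this trace never exercises are an assumption of the sketch, not taken from the solution.

transitions = {                    # (state, outcome) -> next state
    ("00", "NT"): "00", ("00", "T"): "01",   # used by rows 1-5 above
    ("10", "NT"): "00",                      # used by row 0 above
    ("10", "T"): "11",                       # assumed; not exercised by b1's trace
    ("01", "NT"): "00", ("01", "T"): "11",   # assumed; not exercised by b1's trace
    ("11", "NT"): "10", ("11", "T"): "11",   # assumed; not exercised by b1's trace
}
predicts_taken = {"10", "11"}

state = "10"                       # initial state, as in the table: 10 (T)
outcomes = ["NT"] * 5 + ["T"]      # b1 is not taken while R3 = 0..4 and taken when R3 = 5
misses = 0
for i, outcome in enumerate(outcomes):
    miss = (state in predicts_taken) != (outcome == "T")
    misses += miss
    new_state = transitions[(state, outcome)]
    print(i, state, outcome, new_state, "miss" if miss else "")
    state = new_state
print(misses, "mispredictions")    # 2: the first and the last iteration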

b)
R1  R2  R3  R4   b1 act.   b2 pred   b2 act.   R1  R2   b3 pred   b3 act.
0   0   0   5    NT        NT/NT     NT        2   1    NT/NT     T
1   0   1   5    NT        NT/NT     T         1   0    T/NT      NT
0   1   2   5    NT        NT/T      NT        2   1    T/NT      T
1   0   3   5    NT        NT/T      T         1   0    T/NT      NT
0   1   4   5    NT        NT/T      NT        2   1    T/NT      T
1   0   5   5    T         NT/T                         T/NT

c)
R1  R2  R3  R4   b1 act.   b2 pred   b2 act.   R1  R2   b3 pred   b3 act.
0   0   0   5    NT        00/00     NT        2   1    /00       T
1   0   1   5    NT        00/00     T         1   0    /00       NT
0   1   2   5    NT        00/01     NT        2   1    /00       T
1   0   3   5    NT        00/01     T         1   0    /00       NT
0   1   4   5    NT        00/11     NT        2   1    /00       T
1   0   5   5    T         00/11                        11/00

d)
R1  R2  R3  R4   b1 pred   b1 act.   b2 pred   b2 act.   R1  R2   b3 pred   b3 act.
0   0   0   5    NT/NT     NT        NT/NT     NT        2   1    NT/NT     T
1   0   1   5    NT/NT     NT        NT/NT     T         1   0    T/NT      NT
0   1   2   5    NT/NT     NT        T/NT      NT        2   1    T/NT      T
1   0   3   5    NT/NT     NT        NT/NT     T         1   0    T/NT      NT
0   1   4   5    NT/NT     NT        T/NT      NT        2   1    T/NT      T
1   0   5   5    NT/NT     T         NT/T                         T/NT
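For parts b) to d), the register values and the actual branch outcomes (every column except the predictor states) follow directly from replaying the loop. A short sketch that prints them per iteration:

R1 = R2 = R3 = 0
R4 = 5
while True:
    entry = (R1, R2, R3, R4)              # register values at the top of the loop
    b1 = (R3 == R4)                       # BEQ R3, R4, Done
    if b1:
        print(entry, "b1 T -> Done")
        break
    b2 = (R1 != 0)                        # BNEZ R1, If2
    if not b2:
        R2, R1 = 1, 2
    b3 = (R2 != 0)                        # BNEZ R2, End
    at_b3 = (R1, R2)                      # the R1, R2 columns next to b3
    if not b3:
        R1, R2 = 1, 2
    R1, R2, R3 = R1 - 1, R2 - 1, R3 + 1   # End: R1--, R2--, R3++
    print(entry, "b1 NT", "b2", "T" if b2 else "NT", at_b3, "b3", "T" if b3 else "NT")

The prediction columns then depend only on the predictor scheme assumed in each part.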
