Architectures for Multimedia Systems - Code: 078650 (073335)
Prof. C. SILVANO
5th July 2010

SURNAME
NAME
STUDENT ID (MATRICOLA)
EMAIL

EXERCISE 1 - PIPELINE

Given the following loop expressed in a high-level language:

    do {
        VETTB[i] = VETTA[i];
        VETTC[i] = VETTA[i];
        if (VETTA[i] >= 0) {
            VETTD[i] = VETTA[i];
        }
        i++;
    } while (i != N);

The program has been compiled into the MIPS assembly code shown in the following table. Assume that registers $t6 and $t7 have been initialised to the values 0 and N respectively. The symbols VETTA, VETTB, VETTC and VETTD are predefined 16-bit constants. The processor clock frequency is 1 GHz. Consider a generic iteration of the loop executed by the MIPS processor in 5-stage pipelined mode.

a) Assuming there are NO optimisations in the pipeline:

1. Identify the RAW (Read After Write) data hazards and the control hazards.
2. Identify the number of stalls to be inserted before each instruction (or between the IF and ID stages of each instruction) to resolve the hazards.

Num.    INSTRUCTION                  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12 C13 C14 C15 C16 C17  Hazard type
stalls
        DO:  lw   $t2, VETTA($t6)    IF  ID  EX  ME  WB
             sw   $t2, VETTB($t6)        IF  ID  EX  ME  WB
             sw   $t2, VETTC($t6)            IF  ID  EX  ME  WB
             slt  $t0, $t2, $0                   IF  ID  EX  ME  WB
             bne  $t0, $0, INC                       IF  ID  EX  ME  WB
             sw   $t2, VETTD($t6)                        IF  ID  EX  ME  WB
        INC: addi $t6, $t6, 4                                IF  ID  EX  ME  WB
             bne  $t6, $t7, DO                                   IF  ID  EX  ME  WB
        END:                                                         IF  ID  EX  ME  WB

NOTE: slt $t0, $t2, $0   # if $t2 < $0 then $t0 = 1, otherwise $t0 = 0

Assuming that (VETTA[i] >= 0) in 50% of the cases:

Average Instruction Count:                        IC_AVE =
Average Number of Stalls:                         STALL_AVE =
Asymptotic CPI (N → ∞):                           CPI_AS =
Asymptotic Throughput expressed in MIPS (N → ∞):  MIPS_AS =

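The quantities requested above are linked by standard relations, sketched here in Python. This is only an illustrative helper: the function names and the sample numbers are assumptions and are not the solution of the exercise. The sketch assumes that, for N → ∞, the pipeline fill time is negligible, so each iteration costs IC_AVE + STALL_AVE clock cycles.

```python
def asymptotic_cpi(ic_ave: float, stall_ave: float) -> float:
    """Asymptotic CPI: cycles per iteration divided by instructions per iteration.

    Assumes one cycle per instruction plus the stall cycles, with the
    pipeline fill time neglected (N -> infinity).
    """
    return (ic_ave + stall_ave) / ic_ave


def throughput_mips(clock_hz: float, cpi: float) -> float:
    """Throughput in millions of instructions per second."""
    return clock_hz / (cpi * 1e6)


def speedup(cpi_reference: float, cpi_optimised: float) -> float:
    """Speedup between two pipelines running the same code at the same clock."""
    return cpi_reference / cpi_optimised


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the arithmetic.
    cpi = asymptotic_cpi(ic_ave=10.0, stall_ave=5.0)
    print("CPI_AS  =", cpi)
    print("MIPS_AS =", throughput_mips(clock_hz=1e9, cpi=cpi))
```

With a 1 GHz clock, a CPI of 2 corresponds to 500 MIPS; fewer stalls per iteration lower the CPI and raise the throughput in the same proportion.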
b) Assuming there are the following optimisations in the pipeline:

- The Register File can be read and written at the same address in the same clock cycle;
- Forwarding;
- Computation of the PC and the TARGET ADDRESS for branch & jump instructions anticipated in the ID stage.

1. Identify the RAW (Read After Write) data hazards and the control hazards.
2. Identify the number of stalls to be inserted before each instruction (or between the IF and ID stages of each instruction) to resolve the hazards.
3. Identify in the last column the forwarding path used.

Num.    INSTRUCTION                  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12 C13 C14 C15 C16 C17  Hazard type  Forwarding path
stalls
        DO:  lw   $t2, VETTA($t6)    IF  ID  EX  ME  WB
             sw   $t2, VETTB($t6)        IF  ID  EX  ME  WB
             sw   $t2, VETTC($t6)            IF  ID  EX  ME  WB
             slt  $t0, $t2, $0                   IF  ID  EX  ME  WB
             bne  $t0, $0, INC                       IF  ID  EX  ME  WB
             sw   $t2, VETTD($t6)                        IF  ID  EX  ME  WB
        INC: addi $t6, $t6, 4                                IF  ID  EX  ME  WB
             bne  $t6, $t7, DO                                   IF  ID  EX  ME  WB
        END:                                                         IF  ID  EX  ME  WB

NOTE: slt $t0, $t2, $0   # if $t2 < $0 then $t0 = 1, otherwise $t0 = 0

Assuming that (VETTA[i] >= 0) in 50% of the cases:

Average Instruction Count:                        IC_AVE =
Average Number of Stalls:                         STALL_AVE =
Asymptotic CPI (N → ∞):                           CPI_AS =
Asymptotic Throughput expressed in MIPS (N → ∞):  MIPS_AS =
Asymptotic SpeedUp with respect to the first case: SpeedUp_AS =

c) Assuming there are the previous optimisations in the pipeline, plus static branch prediction BTFNT (BACKWARD TAKEN FORWARD NOT TAKEN) with a BRANCH TARGET BUFFER:

1. Identify the RAW (Read After Write) data hazards and the control hazards.
2. Identify the number of stalls to be inserted before each instruction (or between the IF and ID stages of each instruction) to resolve the hazards.
3. Identify the static branch prediction (Taken/Not Taken) for each branch.

Num.    INSTRUCTION                  PRED. C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12 C13 C14 C15 C16 C17  Hazard type
stalls                               T/NT
        DO:  lw   $t2, VETTA($t6)    -     IF  ID  EX  ME  WB
             sw   $t2, VETTB($t6)    -         IF  ID  EX  ME  WB
             sw   $t2, VETTC($t6)    -             IF  ID  EX  ME  WB
             slt  $t0, $t2, $0       -                 IF  ID  EX  ME  WB
             bne  $t0, $0, INC                             IF  ID  EX  ME  WB
             sw   $t2, VETTD($t6)    -                         IF  ID  EX  ME  WB
        INC: addi $t6, $t6, 4        -                             IF  ID  EX  ME  WB
             bne  $t6, $t7, DO                                         IF  ID  EX  ME  WB
        END:                                                               IF  ID  EX  ME  WB

NOTE: slt $t0, $t2, $0   # if $t2 < $0 then $t0 = 1, otherwise $t0 = 0

Assuming that (VETTA[i] >= 0) in 50% of the cases:

Average Instruction Count:                        IC_AVE =
Average Number of Stalls:                         STALL_AVE =
Asymptotic CPI (N → ∞):                           CPI_AS =
Asymptotic Throughput expressed in MIPS (N → ∞):  MIPS_AS =
Asymptotic SpeedUp with respect to the first case: SpeedUp_AS =

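The BTFNT rule named above depends only on the direction of the branch, which can be sketched as follows. The function name and the addresses are hypothetical, and treating a branch whose target equals its own address as backward is an assumption of this sketch.

```python
def btfnt_predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static BTFNT prediction.

    A branch whose target is at a lower (or equal) address is a backward
    branch and is predicted taken; a branch whose target is at a higher
    address is a forward branch and is predicted not taken.
    """
    return target_pc <= branch_pc


if __name__ == "__main__":
    # Hypothetical addresses, not taken from the exercise.
    print(btfnt_predict_taken(branch_pc=0x40, target_pc=0x10))  # backward branch
    print(btfnt_predict_taken(branch_pc=0x40, target_pc=0x60))  # forward branch
```

The prediction is fixed at compile time by the code layout, so loop-closing branches are predicted taken regardless of the data being processed.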
EXERCISE 2 - SCOREBOARD

Assume the program of Exercise 1 is executed by a CPU with dynamic scheduling based on a SCOREBOARD with:

- 2 LOAD/STORE units (LDU1, LDU2) with latency 4
- 2 ALU/BR/J units (ALU1, ALU2) with latency 2
- Structural hazards checked in the ISSUE phase
- RAW hazards checked in the READ OPERANDS phase
- WAR and WAW hazards checked in the WRITE BACK phase
- Forwarding
- Static branch prediction BTFNT (BACKWARD TAKEN FORWARD NOT TAKEN) with Branch Target Buffer

a) Assuming the case (VETTA[i] >= 0), fill in the following table:

INSTRUCTION                  PRED.  ISSUE  READ      EXECUTION  WRITE  HAZARDS                  UNIT
                             T/NT          OPERANDS  COMPLETE   BACK
DO:  lw   $t2, VETTA($t6)    -      1      2         6          7                               LDU1
     sw   $t2, VETTB($t6)    -      2      7         11         12     RAW $t2 solved with FW   LDU2
     sw   $t2, VETTC($t6)    -
     slt  $t0, $t2, $0       -
     bne  $t0, $0, INC
     sw   $t2, VETTD($t6)    -
INC: addi $t6, $t6, 4        -
     bne  $t6, $t7, DO
END:

EXERCISE 3 - TOMASULO

Assume the program of Exercise 1 is executed by a CPU with dynamic scheduling based on the TOMASULO algorithm with:

- 2 RESERVATION STATIONS (RS1, RS2) + 2 LOAD/STORE UNITS (LDU1, LDU2) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 2 ALU/BR/J UNITS (ALU1, ALU2) with latency 2
- Structural hazards for RS checked in the ISSUE phase
- RAW hazards and structural hazards for FUs checked in the START EXECUTE phase
- WRITE RESULT phase writing to both RS and RF
- Forwarding
- Static branch prediction BTFNT (BACKWARD TAKEN FORWARD NOT TAKEN) with Branch Target Buffer

a) Assuming the case (VETTA[i] >= 0), fill in the following table:

INSTRUCTION                  PRED.  ISSUE  START  WRITE    HAZARDS                  RS   UNIT
                             T/NT          EXEC   RESULTS
DO:  lw   $t2, VETTA($t6)    -      1      2      6                                 RS1  LDU1
     sw   $t2, VETTB($t6)    -      2      6      10       RAW $t2 solved with FW   RS2  LDU2
     sw   $t2, VETTC($t6)    -
     slt  $t0, $t2, $0       -
     bne  $t0, $0, INC
     sw   $t2, VETTD($t6)    -
INC: addi $t6, $t6, 4        -
     bne  $t6, $t7, DO
END:

EXERCISE 4 - TOMASULO

Assume the program of Exercise 1 is executed by a CPU with dynamic scheduling based on the TOMASULO algorithm with:

- 4 RESERVATION STATIONS (RS1, RS2, RS3, RS4) + 2 LOAD/STORE UNITS (LDU1, LDU2) with latency 4
- 4 RESERVATION STATIONS (RS5, RS6, RS7, RS8) + 2 ALU/BR/J UNITS (ALU1, ALU2) with latency 2
- Structural hazards for RS checked in the ISSUE phase
- RAW hazards and structural hazards for FUs checked in the START EXECUTE phase
- WRITE RESULT phase writing to both RS and RF
- Forwarding
- Static branch prediction BTFNT (BACKWARD TAKEN FORWARD NOT TAKEN) with Branch Target Buffer

b) Assuming the case (VETTA[i] >= 0), fill in the following table:

INSTRUCTION                  PRED.  ISSUE  START  WRITE    HAZARDS                  RS   UNIT
                             T/NT          EXEC   RESULTS
DO:  lw   $t2, VETTA($t6)    -      1      2      6                                 RS1  LDU1
     sw   $t2, VETTB($t6)    -      2      6      10       RAW $t2 solved with FW   RS2  LDU2
     sw   $t2, VETTC($t6)    -
     slt  $t0, $t2, $0       -
     bne  $t0, $0, INC
     sw   $t2, VETTD($t6)    -
INC: addi $t6, $t6, 4        -
     bne  $t6, $t7, DO
END:

EXERCISE 5 - VLIW and TRACE SCHEDULING

1. Define the term basic block in the context of assembly code:

2. Consider the following pseudo-MIPS code sequence:

    ...
    load $r2, 0($r1)       /* Instruction A */
    add  $r4, $r5, $r6     /* Instruction B, $r4 is destination */
    add  $r8, $r6, $r4     /* Instruction C, $r8 is destination */
    blt  $r2, #1000, L1    /* Instruction D, branch */
    load $r3, 0($r2)       /* Instruction E */
    ...

2.1 Given that instructions A...D are not the target of any branch in the program, we can safely say that there are two basic blocks. Indicate, in the following control flow graph, the content of the two basic blocks using the labels defined for each instruction (A..E).

2.2 Let us consider a VLIW machine with only two slots. The first slot is dedicated to load/store instructions, while the second slot is dedicated to branch/ALU instructions. Ideally, the L/S functional unit is used for 2 cycles, while the ALU functional unit takes only 1 cycle. Branch instructions can be considered as occupying the branch unit for 2 cycles.

2.2.1 Draw the dependency graph of the instructions in basic block 1 (use directed arrows to indicate RAW or control dependencies among instructions):

2.2.2 Indicate the length of the critical path of basic block 1 in terms of clock cycles:

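The critical path of a dependency graph is its longest latency-weighted chain, which can be computed as in the Python sketch below. The graph used here (nodes X, Y, Z, W and their latencies) is purely hypothetical and is not the graph of basic block 1. The sketch also assumes that a dependent instruction cannot start before the full latency of every predecessor has elapsed.

```python
from functools import lru_cache


def critical_path_length(latency, deps):
    """Length in cycles of the longest dependency chain.

    latency: dict mapping each node to its latency in cycles.
    deps:    dict mapping a node to the list of nodes it depends on.
    """
    @lru_cache(maxsize=None)
    def finish(node):
        # A node starts when its slowest predecessor has finished.
        start = max((finish(p) for p in deps.get(node, ())), default=0)
        return start + latency[node]

    return max(finish(node) for node in latency)


if __name__ == "__main__":
    # Hypothetical graph: X -> Y -> Z is a chain, W is independent.
    latency = {"X": 2, "Y": 1, "Z": 2, "W": 1}
    deps = {"Y": ["X"], "Z": ["Y"]}
    print(critical_path_length(latency, deps))
```

The critical path gives a lower bound on the schedule length; resource conflicts on the available slots can only make the actual schedule longer.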
2.2.3 Fill in the resource reservation table and the VLIW instruction schedule for the instructions of basic block 1 (as always, use the instruction labels defined before):

    Resource Reservation Table              VLIW Code Schedule
    Cycle  L/S  Arith U.  Branch U.         Cycle  L/S Slot  ALU/Branch Slot
    1                                       1
    2                                       2
    3                                       3
    4                                       4
    5                                       5
    6                                       6
    7                                       7

2.2.3 How many cycles does it take to complete the sequence of instructions A...E?

2.2.4 Assume that the code sequence is the most likely sequence found in an execution trace of the program, and indicate which trace-scheduling move can be made to improve the execution time of instructions A...E:

2.2.5 Is it a safe move?

2.2.6 Compute the speedup in the completion of the code sequence obtained with the above move:

2.2.7 Indicate whether any additional architectural support is needed to enable the above move: