CENG 3531 Computer Architecture Spring

Exam 2
April 12, 2012

You have 80 minutes to complete the exam. Please write your answers clearly and legibly on this exam paper.

GRADE:
Name:
Class ID:

1. (22 pts) Circle the selected answer for T/F and multiple-choice questions, and fill in the blanks for the rest. Each part is 2 points.

a. T / F  A processor can have different CPIs for different programs.

b. T / F  In a multi-cycle implementation, the first two stages, instruction fetch and instruction decode, are the same for all instruction classes.

c. T / F  Increasing the depth of pipelining always decreases performance.

d. T / F  Pipelining improves performance by increasing throughput.

e. T / F  To pass data from an early pipeline stage to a later pipeline stage, the data must be placed in a pipeline register so that it is not lost when the next instruction enters that pipeline stage.

f. T / F  Data forwarding resolves the data hazard that occurs when an instruction tries to read a register following a load instruction that writes the same register.

g. The ideal speedup of a pipelined system with four ideal stages is ______.

h. One solution to data hazards can be ______.

i. One solution to structural hazards can be ______.

j. Which one of the following processors has the highest possible MIPS rate in ideal conditions?
   a. A single-issue processor driven by a 1 GHz clock.
   b. A 2-issue processor driven by a 500 MHz clock.
   c. A 4-issue processor driven by a 250 MHz clock.
   d. An 8-issue VLIW processor driven by a 200 MHz clock.

k. Which one of the following is NOT calculated by the ALU?
   a. Arithmetic result for arithmetic instructions
   b. Memory address for load/store instructions
   c. Branch target address
   d. Address of the next instruction
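The comparison in part (j) can be sketched numerically. This is a minimal illustration, not part of the exam. It assumes that "ideal conditions" means every issue slot completes one instruction per clock cycle, so the peak rate in MIPS is the issue width times the clock frequency in MHz. The function name `peak_mips` and the table of configurations are hypothetical names introduced for this sketch.

```python
def peak_mips(issue_width: int, clock_mhz: float) -> float:
    """Peak instruction rate in MIPS.

    Assumption: each of the issue slots completes one instruction
    every cycle, so rate = issue_width * clock frequency (in MHz).
    """
    return issue_width * clock_mhz


# The four configurations listed in part (j): (issue width, clock in MHz).
configs = {
    "a": (1, 1000),
    "b": (2, 500),
    "c": (4, 250),
    "d": (8, 200),
}

for label, (width, mhz) in configs.items():
    print(label, peak_mips(width, mhz), "MIPS")
```

Under this assumption the wider machine with the slower clock can still come out ahead, because the product of issue width and frequency is what matters, not the frequency alone.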

2. (18 pts) Answer the following questions, giving all necessary details.

a. (5 pts) Given the sequences of array references below, determine whether each sequence exhibits spatial or temporal locality.

   A[10], B[10], A[11], B[11], A[12], B[12], A[9], B[9]

   A[1], B[1], A[1000], B[1000], A[1], B[1], A[1000], B[1000]

b. (5 pts) List each memory component in the memory hierarchy. Order them from fastest to slowest and from smallest to largest.

   Memory components        Order fastest to slowest        Order smallest to largest

c. (8 pts) Name the five pipeline stages of the MIPS architecture. For each stage, explain what part of the instruction execution is performed in that stage. Give enough detail and be specific.

3. (16 pts) The datapath for the 5-stage MIPS pipeline architecture is given below. List the resources that are used during the execution of each instruction below. Ignore the MUXes. When listing, use the numbers associated with the resources.

   1  Program Counter
   2  Adder in IF stage
   3  Instruction Memory
   4  Register File
   5  Sign-extension Unit
   6  Shift-left-2 Unit
   7  Adder in EX stage
   8  ALU
   9  Data Memory

   Instruction               Resources used
   beq s4, zero, loop

4. (10 pts) The latencies of the individual stages in the five-stage MIPS architecture are given below.

   Stage      IF       ID       EX       MEM      WB
   Latency    200ps    300ps    250ps    400ps    100ps

a. What is the clock cycle time in a pipelined and a non-pipelined processor?

   Pipelined version:
   Non-pipelined version:

b. What is the total latency of a lw instruction in a pipelined and a non-pipelined processor?

   Pipelined version:
   Non-pipelined version:
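The relationships behind question 4 can be sketched as follows. This is a hedged illustration under two stated assumptions: the non-pipelined processor is a single-cycle design whose clock must cover all five stages, and pipeline-register overhead is ignored. The function names and the `STAGE_LATENCY_PS` dictionary are hypothetical names introduced for this sketch.

```python
# Stage latencies from the table in question 4, in picoseconds.
STAGE_LATENCY_PS = {"IF": 200, "ID": 300, "EX": 250, "MEM": 400, "WB": 100}


def pipelined_cycle_ps(latencies: dict) -> int:
    """Pipelined clock period: set by the slowest stage
    (assumption: no pipeline-register overhead)."""
    return max(latencies.values())


def single_cycle_ps(latencies: dict) -> int:
    """Non-pipelined clock period, assuming a single-cycle design
    in which one instruction passes through every stage per cycle."""
    return sum(latencies.values())


def pipelined_instruction_latency_ps(latencies: dict) -> int:
    """Latency of one instruction in the pipeline: it spends one
    full clock period in each stage."""
    return len(latencies) * pipelined_cycle_ps(latencies)


print("pipelined cycle:", pipelined_cycle_ps(STAGE_LATENCY_PS), "ps")
print("non-pipelined cycle:", single_cycle_ps(STAGE_LATENCY_PS), "ps")
print("lw latency, pipelined:", pipelined_instruction_latency_ps(STAGE_LATENCY_PS), "ps")
print("lw latency, non-pipelined:", single_cycle_ps(STAGE_LATENCY_PS), "ps")
```

The sketch makes the usual trade-off visible: pipelining shortens the clock period but lengthens the latency of an individual instruction, because every stage is stretched to the duration of the slowest one.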

5. (10 pts) What is the accuracy of always-taken and always-not-taken branch predictors for the repeating (T, T, NT, T, T, T, NT, NT, T, T) pattern of branch outcomes?

   always-taken:
   always-not-taken:

6. (5 pts) Given the code fragment below, schedule the code to avoid stalls within a loop iteration. Make changes to the code if necessary. Assume the classic five-stage MIPS architecture supports full forwarding.

   loop: lw   s1, 0(t1)
         add  s3, s1, s2
         sw   s3, 0(t1)
         subi t1, t1, 4
         bne  t1, t2, loop
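The accuracy calculation asked for in question 5 can be sketched as a simple count over one period of the pattern. This is a minimal illustration: since the pattern repeats, the accuracy over one period is taken as the steady-state accuracy. The name `static_accuracy` and the `PATTERN` list are hypothetical names introduced for this sketch.

```python
# One period of the repeating branch-outcome pattern from question 5.
PATTERN = ["T", "T", "NT", "T", "T", "T", "NT", "NT", "T", "T"]


def static_accuracy(outcomes: list, predict_taken: bool) -> float:
    """Fraction of branches a static predictor gets right.

    predict_taken=True models always-taken,
    predict_taken=False models always-not-taken.
    """
    hits = sum(1 for outcome in outcomes if (outcome == "T") == predict_taken)
    return hits / len(outcomes)


print("always-taken:", static_accuracy(PATTERN, predict_taken=True))
print("always-not-taken:", static_accuracy(PATTERN, predict_taken=False))
```

Because each branch is either taken or not taken, the two static accuracies always sum to 1 for any pattern.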

7. (20 pts) Show the pipeline timing diagram for one iteration of the loop using the classic five-stage MIPS architecture. For all parts, assume that register read and register write can be done in the same clock cycle and that branches are resolved in the EX stage (i.e., the branch target address will be known at the end of the EX stage and the target instruction can be fetched in the next clock cycle). There is no branch prediction mechanism employed.

   loop: lw   s1, 0(s2)
         addi s2, s2, 4
         bne  s4, zero, loop

a. (9 pts) Show the pipeline timing diagram of this instruction sequence assuming forwarding is not supported by the architecture.

   Clock cycle           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
   lw   s1, 0(s2)        F  D  X  M  W
   addi s2, s2, 4
   bne  s4, zero, loop
   lw   s1, 0(s2)

   It takes ______ clock cycles to execute one iteration (from ID of first lw to ID of next lw).

b. (9 pts) Do the same work as in part (a) assuming forwarding is fully supported by the architecture.

   Clock cycle           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
   lw   s1, 0(s2)        F  D  X  M  W
   addi s2, s2, 4
   bne  s4, zero, loop
   lw   s1, 0(s2)

   It takes ______ clock cycles to execute one iteration.

c. (2 pts) What is the speedup obtained by forwarding?
