EECS 470 Midterm Exam Fall 2014


Alexis Hancock
6 years ago

Name:          uniqname:

Rewrite and sign the honor code below:
I have neither given nor received aid on this exam nor observed anyone else doing so.

Signature:

Scores:
Page #   Points
2        /15
3        /5
4        /10
5        /10
6        /5
7        /15
8        /20
9        /20
Total    /100

NOTES:
Open book and open notes.
Calculators are allowed, but not if they have communication support (Bluetooth, cell phones, etc.).
Don't spend too much time on any one problem. You have about 120 minutes for the exam.
Be sure to show work and explain what you've done when asked to do so.

Page 1 of 10

1) Fill in the blank [10 points, -1 per blank or wrong answer]
a. We refer to hazards that can be resolved by forwarding as _read_-after-_write_ hazards.
b. The lowest CPI (assuming no hazards) that is possible in a 5-stage pipeline is _1_, while the lowest CPI in an 8-stage pipeline is _1_.
c. Results from instructions which have completed but not retired are found in the _PRF_ in a MIPS R10K design and in the _ROB_ in an Intel P6 design.
d. The ROB is flushed when the retiring instruction is a(n) _mispredicted branch_ or _exception/interrupt_.
e. A processor with a byte-addressable, 64-bit address space and a 512-byte, 16-way associative L1 cache with 64-bit lines has _59_ bits for the tag and _2_ bits for the index when using single address cache lines. (A 64-bit line is 8 bytes, so 3 offset bits; 512 B / 8 B = 64 lines, and 64 lines / 16 ways = 4 sets, so 2 index bits; 64 - 3 - 2 = 59 tag bits.)

2) Circle the correct answer(s). There may be multiple correct answers for each question. [5 points, -1 per blank or wrong answer. Minimum of 0]
a. True dependencies are also called RAW / WAW / WAR / RAR hazards.
b. Hazards which can be resolved by renaming are called RAW / WAW / WAR / RAR hazards.
c. The problem that multicore designs are intended to solve is area / power consumption / memory speed.
d. A joule is equivalent to a (W*s) / (W/s) / (W^2).

Page 2 of 10
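The field widths in question 1(e) can be checked with a short calculation. The sketch below assumes the usual tag / index / offset split for a set-associative cache; the function name `cache_fields` and its parameters are illustrative and not part of the exam.

```python
def cache_fields(addr_bits, cache_bytes, ways, line_bytes):
    """Return (tag, index, offset) bit widths for a set-associative cache.

    Assumes all sizes are powers of two and the cache is byte addressable.
    """
    offset_bits = line_bytes.bit_length() - 1      # log2(line size in bytes)
    num_sets = cache_bytes // (line_bytes * ways)  # lines per way
    index_bits = num_sets.bit_length() - 1         # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# 64-bit addresses, 512-byte cache, 16 ways, 64-bit (8-byte) lines
print(cache_fields(64, 512, 16, 8))  # (59, 2, 3)
```

With 4 sets the index needs only 2 bits, and the remaining 59 bits above the 3-bit byte offset form the tag.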

3) The following is a dependency graph of a program, where each node represents an instruction and each edge represents a data dependency:

[Figure: dependency graph, not reproduced in this copy]

Recall that ILP is the average number of instructions that can be executed in parallel. What is the ILP of the above program? Explain how you obtained the ILP number. [5 points]

Solution: 7 instructions over 3 levels -> ILP = 7/3

Page 3 of 10
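The "instructions over levels" calculation can be sketched as follows. The exam's figure is not available here, so the graph below is a hypothetical 7-node, 3-level example chosen only to match the counts in the solution; the function name `ilp` is also an assumption.

```python
from fractions import Fraction


def ilp(deps):
    """ILP = instruction count / length of the longest dependency chain.

    deps maps each instruction to the list of instructions it depends on.
    """
    level = {}

    def depth(node):
        # An instruction sits one level below its deepest producer.
        if node not in level:
            level[node] = 1 + max((depth(p) for p in deps[node]), default=0)
        return level[node]

    critical_path = max(depth(n) for n in deps)
    return Fraction(len(deps), critical_path)


# Hypothetical graph: three independent roots, three middle nodes, one sink.
graph = {
    "i1": [], "i2": [], "i3": [],
    "i4": ["i1"], "i5": ["i2", "i3"], "i6": ["i3"],
    "i7": ["i4", "i5"],
}
print(ilp(graph))  # 7/3
```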

4) A program is 20% serial and the rest can be perfectly parallelized with no additional overhead. What is the maximum speedup possible with infinite resources? Show your work. [5 points]

Solution: Speedup = 1 / 0.2 = 5

5) A computer is capable of DVFS -- dynamic voltage (V) and frequency (f) scaling. Recall that power is proportional to V^2 * f and f is proportional to V. If the computer runs at its maximum voltage of V_max it will finish a particular task in T/2 seconds.
a. What voltage should the computer run at if it is to finish the task in T seconds?
b. If the amount of energy used to perform the task in the V_max case is E, what is it in the case described in (a)? [5 points]

Solution:
a) Frequency is proportional to voltage. The task runs half as fast, so use half the voltage: V_max / 2.
b) Power is reduced to (1/2)^3 = 1/8. The task takes twice as long, so the overall energy is (1/8) * 2 = 1/4 E.

Page 4 of 10
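Both answers on this page are short enough to verify numerically. The sketch below applies Amdahl's law for question 4 and the stated proportionalities (P proportional to V^2 * f, f proportional to V) for question 5; the function names are illustrative assumptions.

```python
def amdahl_speedup(serial_fraction, processors=float("inf")):
    """Speedup when only the non-serial fraction scales with processor count."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def dvfs_scale(time_ratio):
    """Scale factors relative to running at V_max.

    time_ratio = new runtime / runtime at V_max.
    Returns (voltage, power, energy) as fractions of the V_max values.
    """
    voltage = 1.0 / time_ratio     # f is proportional to V, runtime to 1/f
    power = voltage ** 3           # P ~ V^2 * f ~ V^3
    energy = power * time_ratio    # E = P * t
    return voltage, power, energy


print(amdahl_speedup(0.2))  # 5.0
print(dvfs_scale(2.0))      # (0.5, 0.125, 0.25)
```

Running twice as long (T instead of T/2) gives half the voltage, one eighth of the power, and one quarter of the energy.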

6) Verilog:
a. Consider the following Verilog excerpt from ex_stage.v of the provided Verisimple Pipeline (project 3):

    always_comb begin
      case (func)
        ALU_MULQ: result = opa * opb;
        // other cases
      endcase
    end

Briefly (one or two sentences), describe why you should not implement multiplication in this way for your final project. [5 points]

Solution: A single-cycle multiplier's critical path is too long. Use a pipelined multiplier instead.

b. Bruce is working on a Verilog design for 470. He is finding that the design is not working as he expects after synthesis. He thinks the problem is with the following code:

    always_comb begin
      status = 0;
      count = 0;
      case (count)
        0: count = 1;
        1: begin
          if (en_a) status = 1;
          count = 2;
        end
        2: begin
          count = 0;
          if (en_b) status = 1;
        end
      endcase
    end

Describe which synthesis guideline Bruce has violated. [5 points]

Solution: Circular combinational logic. count depends on itself, so it has no steady-state value. Note that no latches are formed, since status and count are given default values.

Page 5 of 10

7) Write a module in behavioral Verilog (i.e. do not instantiate any submodules) which implements the following device. You are to keep the signal names the same as they are in the provided figures (although you may create your own internal signals, where appropriate). Your code should be reasonably efficient. Minor syntax errors will be ignored. [5 points]

[Figure: device schematic, not reproduced in this copy]

Solution:

    module thing (
      input  A, B, C, S, Clk,
      output out
    );
      logic top, bottom, next_top, next_bottom;

      // Next-state muxes: S selects parallel load (B, C) or shift (A -> top -> bottom)
      assign next_top    = S ? B : A;
      assign next_bottom = S ? C : top;
      assign out         = bottom;

      // Both registers update together on the clock edge
      always_ff @(posedge Clk) begin
        top    <= next_top;
        bottom <= next_bottom;
      end
    endmodule

Page 6 of 10
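A cycle-level model can help confirm what the two registers do. The Python sketch below mirrors the solution's assignments; it is not Verilog and makes one assumption the exam does not state, namely that both registers start at 0 (the module has no reset). The class name `Thing` and the method `clock` are illustrative.

```python
class Thing:
    """Cycle-level model of the two-register device from question 7."""

    def __init__(self):
        # Assumed initial state; the Verilog module has no reset.
        self.top = 0
        self.bottom = 0

    @property
    def out(self):
        return self.bottom

    def clock(self, A, B, C, S):
        """Apply one rising clock edge with the given inputs and return out."""
        next_top = B if S else A
        next_bottom = C if S else self.top
        # Both registers update together, like nonblocking assignments.
        self.top, self.bottom = next_top, next_bottom
        return self.out


t = Thing()
print(t.clock(A=1, B=0, C=0, S=0))  # 0: A enters top, bottom takes old top
print(t.clock(A=0, B=0, C=0, S=0))  # 1: the earlier A value reaches out
print(t.clock(A=0, B=1, C=1, S=1))  # 1: S=1 loads B and C directly
```

With S low the device behaves as a two-stage shift register on A; with S high it loads B and C in parallel.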

8) a. Consider the pipeline you were to implement for your third programming assignment, but assume that the structural hazard on memory has been removed. Branches are still resolved in the memory stage. A given program consists of 30% loads, 5% stores, 15% branches and 50% ALU operations. If 30% of the branches are not-taken and 40% of all instructions are dependent on the instruction in front of them, what is the expected CPI of the processor on this program? Show your work. [5 points]

Solution: CPI = base CPI + stalls due to data dependencies (one stall cycle for a load followed by a dependent instruction) + stalls due to branch mispredicts (3 cycles per taken branch)
CPI = 1 + 0.3*0.4 + 0.15*0.7*3 = 1.435

b. Sally has done some analysis on the pipeline and has realized that the front-end of the machine is much faster than the back-end of the machine. She proposes that merging the IF and ID stages into a single stage will make the machine faster. In other words, the new pipeline is a 4-stage pipeline with stages IFD, EX, MEM, WB, where IFD performs all the functionality of IF and ID in a single clock period. If the new frequency is 5% slower than the original, what is the expected speedup on the program specified in part (a)? Show your work. [10 points]

Solution: With one fewer front-end stage, the branch penalty drops from 3 cycles to 2.
New CPI = 1 + 0.3*0.4 + 0.15*0.7*2 = 1.33
Speedup = 1.435 / (1.33 * 1.05) ≈ 1.03

Page 7 of 10
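The CPI and speedup arithmetic can be reproduced in a few lines. This sketch assumes one stall cycle per load-use dependency and treats the 5% slower clock as a cycle time 1.05 times longer, matching the solution's formula; the function name `cpi` and its parameters are illustrative.

```python
def cpi(load_frac, dep_frac, branch_frac, taken_frac, branch_penalty,
        load_use_stall=1):
    """Base CPI of 1 plus load-use stalls plus taken-branch penalties."""
    data_stalls = load_frac * dep_frac * load_use_stall
    branch_stalls = branch_frac * taken_frac * branch_penalty
    return 1.0 + data_stalls + branch_stalls


old_cpi = cpi(0.30, 0.40, 0.15, 0.70, branch_penalty=3)  # 5-stage pipeline
new_cpi = cpi(0.30, 0.40, 0.15, 0.70, branch_penalty=2)  # merged IFD stage

# Execution time is proportional to CPI * cycle time.
speedup = old_cpi / (new_cpi * 1.05)

print(round(old_cpi, 3), round(new_cpi, 3), round(speedup, 4))
```

The shorter branch penalty outweighs the slower clock, but only by a few percent.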

9) In this problem, we are using the P6 out-of-order execution scheme.

[Tables: initial machine state. Only the column headers survive intact in this copy.]
Map Table: Arch Reg # | ROB # (-- if in ARF)
ROB (Head = 5, Tail = 6): Buffer Number | PC | Done with EX? | Dest. Arch Reg # | Value
RS: RS# | Op type | Op1 ready? | Op1 RoB/value | Op2 ready? | Op2 RoB/value | Dest ROB
ARF: Reg# | Value
Surviving cell values (row and column alignment lost): Y N/A N Y Y Y N N; 0 + Y -1 Y Y 5 Y ^ N 5 Y

Now say that the instruction in ROB #0 is a branch, which should have gone to top.

    top: R0 = R3 + R4    // A
         R2 = R4 << 2    // B
         R5 = R1 - R2    // C
         R3 = R5 ^ R3    // D

Update the state of the machine to reflect that C has dispatched, but not issued, and everything else has made as much progress as it can, given that constraint. Assume an RS entry is cleared at the beginning of execution. The PC of instruction A is 48 and each instruction is 4 bytes in size. [20 points]

Page 8 of 10
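Two pieces of bookkeeping for this problem follow directly from the text: each instruction's PC, and which earlier instruction in the block produces each source register. The sketch below computes only those; it does not model the ROB, RS, or map table, whose initial contents are not available here. The tuple encoding and the function name `annotate` are assumptions.

```python
# (label, destination register, source registers) for instructions A through D
program = [
    ("A", "R0", ("R3", "R4")),
    ("B", "R2", ("R4",)),
    ("C", "R5", ("R1", "R2")),
    ("D", "R3", ("R5", "R3")),
]


def annotate(program, base_pc=48, size=4):
    """Return (label, pc, producers) rows.

    producers maps a source register to the earlier instruction in this
    block that writes it; sources with no in-block writer are omitted.
    """
    last_writer = {}
    rows = []
    for i, (label, dest, srcs) in enumerate(program):
        producers = {s: last_writer[s] for s in srcs if s in last_writer}
        rows.append((label, base_pc + i * size, producers))
        last_writer[dest] = label
    return rows


for row in annotate(program):
    print(row)
```

C waits on B for R2, and D waits on C for R5, which is why D cannot issue while C is still sitting in a reservation station.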

10) Consider a processor implementing the R10K algorithm discussed in class. This processor has 32 architected registers, 64 ROB entries, 8 reservation stations, 1 CDB and 32-bit memory addresses (for both instruction and data memory). Assume for this example that there are no exceptions/interrupts, but the ROB does have to handle branch mispredicts as well as halting instructions. List all inputs and outputs that will be used to implement the ROB by filling in the table below. We've done two signals for you as an example. Note that there is some design freedom here, so explain your reasoning for each signal. You may not need the entire table. This is a difficult question. We recommend doing it after completing the others. [20 points]

Solution: There are multiple solutions, depending on the specific implementation. Below is a partial list of signals you should have.

Description of signal | Input/Output | Why it's needed | How many bits?
Tag in from CDB | Input | Need to mark when an instruction has finished execution and is ready to retire | >6
Halt in from dispatch | Input | Need to know if a dispatched instruction is a halt, so we can stop the program after retire | 1
Clock / reset | Input | | 1/1
Tag in dispatch | Input | | >6
Tag_old in dispatch | Input | | >6
Tag_old out | Output | Update free list | >6
Mispredict in | Input | Mark entry after Complete to flush ROB at retire | 1
Flush out | Output | Flush pipeline at mispredict | 1
Dispatch enable | Input | Set low if no valid instruction or structural hazard | 1

Page 9 of 10

ROB full | Output | Stall dispatch if necessary | 1
Target PC_in | Input | In case there's a branch mispredict | 32
Target PC_out | Output | (same as Target PC_in) | 32
Retire valid out | Output | | 1
Halt_out | Output | | 1
// Whatever signals you used for branch recovery

Page 10 of 10
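The ">6" entries in the bit-width column can be motivated with a quick calculation. The sketch below assumes the physical register file holds one register per architected register plus one per ROB entry (32 + 64 = 96); that sizing is an assumption, not something the exam states. The helper name `bits_for` is illustrative.

```python
def bits_for(n):
    """Minimum number of bits needed to encode n distinct values."""
    return max(1, (n - 1).bit_length())


arch_regs = 32
rob_entries = 64

# Assumed PRF sizing: architected registers plus one per in-flight instruction.
phys_regs = arch_regs + rob_entries

print(bits_for(phys_regs))    # 7 bits for a physical register tag
print(bits_for(rob_entries))  # 6 bits for a ROB index (head/tail pointers)
```

Under that assumption a tag needs 7 bits, consistent with the ">6" in the table, while the 32-bit PC signals follow from the stated address width.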
