CMSC411 Fall 2013 Midterm 2 Solutions



1. (12 pts) Memory hierarchy

a. (6 pts) Suppose we have a virtual memory of size 64 GB, or 2^36 bytes, where pages are 16 KB (2^14 bytes) each, and the machine has 8 GB (2^33 bytes) of physical memory. Compute the number of page table entries needed if all the pages are being used.

Power of 2   Value
2^10         K
2^20         M
2^30         G
2^40         T

# page table entries = virtual address space / page size = 2^36 / 2^14 = 2^22 = 4 M entries

b. (3 pts) Compute the size of the page table if each page table entry also requires 4 additional bits (valid, protection, dirty, use).

# physical pages = physical address space / page size = 2^33 / 2^14 = 2^19, so 19 bits are needed to represent each physical page number.
Size of each page table entry = 19 + 4 (other bits) = 23 bits.
Total size of page table = 23 bits * 2^22 entries = 1.44 * 2^26 bits

c. (3 pts) Assume the CPU has two levels of cache. If the miss rates were 5% for L1 cache, 2% for L2 cache, and 0.2% for memory, what percent of references require accessing the disk (paged virtual memory)?

5% * 2% * 0.2% = 0.05 * 0.02 * 0.002 = 0.000002 = 0.0002%

2. (9 pts) CPU Architectures

a. (3 pts) Give an example of how compiler code transformations can help improve the performance of computer architectures.

Loop transformations (interchange, fusion, tiling) can improve cache performance. Other transformations (instruction reordering, loop unrolling, register renaming) can improve ILP.

b. (3 pts) Describe advantages of long-instruction word (e.g., VLIW, EPIC, Itanium) processors over dynamically scheduled processors.

Reduces the hardware needed to schedule instructions dynamically. The compiler can move instructions further in the code when rescheduling.

c. (3 pts) Explain how reorder buffers (ROB) enable speculation in dynamically scheduled microprocessors.

ROBs store the results of instructions until they are committed. This allows instructions to be executed speculatively, since their results can be discarded if the guess turns out to be wrong.
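The arithmetic in Question 1 can be checked with a short script. This is only a sketch: the function and constant names are illustrative and do not come from the exam, and the miss rates are treated as independent fractions that multiply, as the solution does.

```python
# Sketch of the Question 1 arithmetic. All names are illustrative.
VIRTUAL_BITS = 36   # 64 GB virtual address space
PHYSICAL_BITS = 33  # 8 GB physical memory
PAGE_BITS = 14      # 16 KB pages
EXTRA_BITS = 4      # valid, protection, dirty, use


def page_table_stats(virtual_bits, physical_bits, page_bits, extra_bits):
    """Return (number of entries, bits per entry, total bits)."""
    entries = 2 ** (virtual_bits - page_bits)   # one entry per virtual page
    ppn_bits = physical_bits - page_bits        # bits to name a physical page
    entry_bits = ppn_bits + extra_bits
    return entries, entry_bits, entries * entry_bits


def disk_access_fraction(*miss_rates):
    """Fraction of references that miss at every level in turn."""
    fraction = 1.0
    for rate in miss_rates:
        fraction *= rate
    return fraction


entries, entry_bits, total_bits = page_table_stats(
    VIRTUAL_BITS, PHYSICAL_BITS, PAGE_BITS, EXTRA_BITS)
disk_fraction = disk_access_fraction(0.05, 0.02, 0.002)

print(f"entries    = 2^{entries.bit_length() - 1}")
print(f"entry size = {entry_bits} bits")
print(f"table size = {total_bits / 2**26:.2f} * 2^26 bits")
print(f"disk refs  = {disk_fraction * 100:.4f}%")
```

Changing the page size or the number of status bits in the constants shows how quickly a flat page table grows.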

3. (12 pts) Data Hazards

Instruction           Effect
LD F1, 0(Rx)          F1 <- Mem(Rx)
ADD.D F1, F2, F3      F1 <- F2 + F3
MULT.D F1, F2, F3     F1 <- F2 * F3

Consider the following sequence of instructions:

I2: ADD.D F2, F6, F6
I3: MULT.D F3, F1, F2
I4: ADD.D F4, F2, F2

a. (8 pts) List all RAW, WAR, and WAW hazards found in the code.

RAW:
I1 -> I3 for F1
I1 -> I5 for F1
I1 -> I6 for F1
I2 -> I3 for F2
I2 -> I4 for F2
I3 -> I5 for F3

WAR:
I3 -> I5 for F2
I4 -> I5 for F2

WAW:
I2 -> I5 for F2

b. (4 pts) Compilers may reorder instructions at compile time to reduce stalls. Which registers may be renamed to permit more instructions to be reordered? List both the instruction and register (e.g., I2, F1 refers to the register F1 in instruction I2).

The only WAR and WAW hazards are caused by reusing F2 in I5. To eliminate the storage-related hazards, rename either F2 in I5, or F2 where it appears in I2, I3, and I4.

Renaming Example (F2 renamed to F7 in I2, I3, I4):

I2: ADD.D F7, F6, F6
I3: MULT.D F3, F1, F7
I4: ADD.D F4, F7, F7
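Hazard classification of this kind can be sketched as a pairwise comparison of register names. The code below is an illustration, not the exam's method: the tuple encoding and the function name `find_hazards` are assumptions, and it is run only on the three instructions whose text appears in the listing above (I2 to I4). It compares every earlier/later pair and does not filter out dependences hidden by an intervening write.

```python
# Illustrative pairwise hazard finder. The (label, dest, sources)
# encoding and the function name are assumptions made for this sketch.
def find_hazards(instrs):
    """Return (raw, war, waw) lists of (earlier, later, register)."""
    raw, war, waw = [], [], []
    for j, (later, later_dest, later_srcs) in enumerate(instrs):
        for earlier, earlier_dest, earlier_srcs in instrs[:j]:
            if earlier_dest in later_srcs:      # read after write
                raw.append((earlier, later, earlier_dest))
            if later_dest in earlier_srcs:      # write after read
                war.append((earlier, later, later_dest))
            if later_dest == earlier_dest:      # write after write
                waw.append((earlier, later, later_dest))
    return raw, war, waw


# Only the instructions listed in the question text are encoded here.
program = [
    ("I2", "F2", ("F6", "F6")),   # ADD.D  F2, F6, F6
    ("I3", "F3", ("F1", "F2")),   # MULT.D F3, F1, F2
    ("I4", "F4", ("F2", "F2")),   # ADD.D  F4, F2, F2
]

raw, war, waw = find_hazards(program)
for kind, found in (("RAW", raw), ("WAR", war), ("WAW", waw)):
    for earlier, later, reg in found:
        print(f"{kind}: {earlier} -> {later} for {reg}")
```

On this subset the sketch reports the two RAW hazards on F2 and no WAR or WAW hazards, which agrees with the solution's remark that the name hazards all come from I5.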

4. (12 pts) Instruction scheduling

I2: ADD.D F2, F6, F6
I3: MULT.D F3, F1, F2
I4: ADD.D F4, F2, F2

Instruction   Latency
Memory LD     +3
ADD.D         +1
MULT.D        +2

a. (6 pts) Given the instruction latencies above, show how instructions would be scheduled (with stalls) if instructions stalled only for true (flow, RAW) dependences.

b. (6 pts) Consider a multiple-issue processor design. Show how instructions would be scheduled (with stalls) if the processor can issue and execute two instructions per cycle. Note that instructions must still be issued in order (i.e., an instruction cannot be issued until all previous instructions have been issued). Assume instructions stall only for true (flow, RAW) dependences.

If Instructions Must Be Issued In Order

Schedule for 4a                  Schedule for 4b
Cycle  Instruction               Cycle  Instruction               Instruction
1                                1                                I2: ADD.D F2, F6, F6
2      I2: ADD.D F2, F6, F6      2      stall                     stall
3      stall                     3      stall                     stall
4      stall                     4      stall                     stall
5      I3: MULT.D F3, F1, F2     5      I3: MULT.D F3, F1, F2     I4: ADD.D F4, F2, F2
6      I4: ADD.D F4, F2, F2      6      stall                     stall
7      stall                     7      stall                     stall

If Instructions May Be Issued Out Of Order

Schedule for 4a                  Schedule for 4b
Cycle  Instruction               Cycle  Instruction               Instruction
1                                1                                I2: ADD.D F2, F6, F6
2      I2: ADD.D F2, F6, F6      2      stall                     stall
3      stall                     3      I4: ADD.D F4, F2, F2      NOP
4      I4: ADD.D F4, F2, F2      4      stall                     stall
5      I3: MULT.D F3, F1, F2     5      I3: MULT.D F3, F1, F2
6                                6      stall                     stall
7      stall                     7      stall                     stall
8                                8                                NOP
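The in-order schedules can be approximated with a small issue-cycle calculator. This is a sketch under stated assumptions: I1 is taken to be a load that writes F1 (the hazard list in Question 3 shows I1 producing F1, but its full text, including its base register, is not in the listing, so it has no source operands here), an instruction with latency +n issued in cycle c makes its result usable by an instruction issuing in cycle c + n + 1, and only RAW dependences cause stalls. The names `LATENCY`, `PROGRAM`, and `schedule_in_order` are illustrative.

```python
# Sketch of in-order issue with RAW stalls only. Names are illustrative.
LATENCY = {"LD": 3, "ADD.D": 1, "MULT.D": 2}

# (label, opcode, destination, sources)
PROGRAM = [
    ("I1", "LD", "F1", []),                 # ASSUMED: a load writing F1
    ("I2", "ADD.D", "F2", ["F6", "F6"]),
    ("I3", "MULT.D", "F3", ["F1", "F2"]),
    ("I4", "ADD.D", "F4", ["F2", "F2"]),
]


def schedule_in_order(program, width):
    """Return {label: issue cycle} for an in-order machine of given width."""
    ready = {}           # register -> first cycle a consumer may issue
    issue = {}
    cycle, used = 1, 0   # current issue cycle and slots used in it
    for label, op, dest, srcs in program:
        earliest = max([ready.get(reg, 1) for reg in srcs] + [cycle])
        if earliest > cycle:        # RAW stall: wait for the operands
            cycle, used = earliest, 0
        elif used == width:         # no issue slot left in this cycle
            cycle, used = cycle + 1, 0
        issue[label] = cycle
        used += 1
        ready[dest] = cycle + LATENCY[op] + 1
    return issue


single = schedule_in_order(PROGRAM, width=1)
dual = schedule_in_order(PROGRAM, width=2)
print("single issue:", single)
print("dual issue:  ", dual)
```

Under these assumptions the single-issue result places I2, I3, and I4 in cycles 2, 5, and 6, and the dual-issue result places I3 and I4 together in cycle 5, in line with the in-order tables.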

5. (24 pts) Dynamic scheduling

I2: ADD.D F2, F6, F6
I3: MULT.D F3, F1, F2
I4: ADD.D F4, F2, F2

Instruction   Latency
Memory LD     +3
ADD.D         +1
MULT.D        +2

Consider the execution of a single-issue Tomasulo-style CPU. Assume the following:

- The CPU has 1 of each: load buffer, FP adder, FP multiplier functional unit.
- An unlimited number of reservation stations for each functional unit and load buffer.
- Functional units are not pipelined.
- No forwarding between functional units; results can only come from the CDB.
- If multiple instructions attempt to use the CDB in the same cycle, the instruction issued earliest goes first.
- Instruction execution times (latencies) are provided as +n cycles, where an instruction executes in 1+n cycles (i.e., spends 1+n cycles in the EX stage). For example, an instruction with latency +2 that begins execution at cycle 4 would finish in cycle 6 (4+2) and put its result on the CDB in cycle 7 (if the CDB is not busy).

For each of the following instructions, show the clock cycle in which it is issued and when it begins execution (i.e., enters its first EX cycle). Also show when each instruction writes the CDB. If an instruction's execution or completion stalls, list the length of the stall and provide a reason for the stall(s).

                        Cycle #
Instruction             Issue  Exec  Write CDB  Stalls
I2: ADD.D F2, F6, F6    2      3     5
I3: MULT.D F3, F1, F2   3      7     10         Stall EX 3 cycles (until cycle 7) due to RAW for I1
I4: ADD.D F4, F2, F2    4      6     8          Stall EX 1 cycle (until cycle 6) due to RAW for I2

                                                Stall EX 5 cycles (until cycle 11) due to RAW for I3

                                                Stall EX 1 cycle (until cycle 8) waiting for FP Adder
                                                Stall CDB 1 cycle (until cycle 11) due to CDB for I3
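The timing rules listed in Question 5 can be turned into a rough calculator. The sketch below is a simplification, not a full Tomasulo simulator: it walks the instructions in program order, so functional units and the CDB are granted in issue order, and it again assumes that I1 is a load writing F1 with no register operands of interest. The names `UNIT` and `tomasulo_timing` are illustrative.

```python
# Simplified Tomasulo timing sketch. Resources are granted in program
# order, which is an assumption that holds for this short sequence.
LATENCY = {"LD": 3, "ADD.D": 1, "MULT.D": 2}
UNIT = {"LD": "load", "ADD.D": "adder", "MULT.D": "multiplier"}

PROGRAM = [
    ("I1", "LD", "F1", []),                 # ASSUMED: a load writing F1
    ("I2", "ADD.D", "F2", ["F6", "F6"]),
    ("I3", "MULT.D", "F3", ["F1", "F2"]),
    ("I4", "ADD.D", "F4", ["F2", "F2"]),
]


def tomasulo_timing(program):
    """Return {label: (issue, first EX cycle, CDB write cycle)}."""
    written = {}      # register -> cycle its latest producer writes the CDB
    unit_free = {}    # unit -> first cycle it can start a new instruction
    cdb_busy = set()  # cycles in which the CDB is already taken
    rows = {}
    for issue, (label, op, dest, srcs) in enumerate(program, start=1):
        start = issue + 1                       # earliest possible EX cycle
        for reg in srcs:                        # wait for operands on the CDB
            if reg in written:
                start = max(start, written[reg] + 1)
        start = max(start, unit_free.get(UNIT[op], 1))  # unit not pipelined
        finish = start + LATENCY[op]            # last EX cycle
        write = finish + 1
        while write in cdb_busy:                # one CDB write per cycle
            write += 1
        cdb_busy.add(write)
        unit_free[UNIT[op]] = finish + 1
        written[dest] = write
        rows[label] = (issue, start, write)
    return rows


timing = tomasulo_timing(PROGRAM)
for label, (issue, start, write) in timing.items():
    print(f"{label}: issue={issue} exec={start} write CDB={write}")
```

With these assumptions, I3 begins execution in cycle 7 and I4 in cycle 6, which agrees with the stall notes for those two instructions.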

6. (32 pts) Branch prediction

For a loop containing two branches, B1 and B2 (branch actions are provided for each loop iteration), show for each loop iteration the state of the branch predictors (and branch history tables, if needed) and the predictions. Assume that all predictors are initialized to not taken, and that the correlation bits are initially set to not taken. When multiple predictors may be used, circle (or underline) the predictors used to make a prediction.

a. (8 pts) (1,2) predictor w/ global history, without branch address (standard 2-bit counter)

           Branch B1                         Branch B2
Iteration  predictor  prediction  action     predictor  prediction  action
1          0/0        NT          T          1/0        NT          T
2          1/1        NT          T          1/3        T           NT
3          1/2        NT          T          3/2        T           NT
4          3/0        T           NT         2/0        T           T
Exit       3/0

b. (8 pts) (2,1) predictor w/ global history + branch address

           Branch B1                         Branch B2
Iteration  predictor  prediction  action     predictor  prediction  action
1          0/0/0/0    NT          T          0/0/0/0    NT          T
2          1/0/0/0    NT          T          0/1/0/0    NT          NT
3          1/0/0/1    NT          T          0/1/0/0    T           NT
4          1/0/1/1    T           NT         0/0/0/0    NT          T
Exit       1/0/0/1                           1/0/0/0

c. (8 pts) (2,1) predictor w/ local history + branch address

           Branch B1                         Branch B2
Iteration  predictor  prediction  action     predictor  prediction  action
1          0/0/0/0    NT          T          0/0/0/0    NT          T
2          1/0/0/0    NT          T          1/0/0/0    NT          NT
3          1/1/0/0    NT          T          1/0/0/0    NT          NT
4          1/1/0/1    T           NT         1/0/0/0    T           T
Exit       1/1/0/0                           1/0/0/0
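The state sequence in part (a) can be traced with a short simulation. This is a sketch built on two assumptions that are not spelled out in the question: the pair of 2-bit counters is written as "counter used when the last branch was not taken / counter used when the last branch was taken", and the 2-bit scheme is the state machine in which a taken branch moves state 1 directly to state 3 and a not-taken branch moves state 2 directly to state 0. That transition table was chosen because it matches the states in the part (a) table; the names `NEXT_STATE` and `run_global_1_2` are illustrative.

```python
# Sketch of a (1,2) predictor with one bit of global history and one
# shared pair of 2-bit counters (no branch address).
# ASSUMED transition table for the 2-bit scheme: (state, taken) -> state.
NEXT_STATE = {
    (0, True): 1, (0, False): 0,
    (1, True): 3, (1, False): 0,
    (2, True): 3, (2, False): 0,
    (3, True): 3, (3, False): 2,
}


def run_global_1_2(outcomes):
    """Return ([(state before, prediction), ...], final state)."""
    counters = [0, 0]   # [after a not-taken branch, after a taken branch]
    history = 0         # outcome of the most recent branch, initially NT
    trace = []
    for taken in outcomes:
        state = f"{counters[0]}/{counters[1]}"
        prediction = "T" if counters[history] >= 2 else "NT"
        trace.append((state, prediction))
        counters[history] = NEXT_STATE[(counters[history], taken)]
        history = 1 if taken else 0
    return trace, f"{counters[0]}/{counters[1]}"


# Branch actions read from the table, interleaved as B1, B2 per iteration.
b1_actions = [True, True, True, False]
b2_actions = [True, False, False, True]
outcomes = [a for pair in zip(b1_actions, b2_actions) for a in pair]

trace, final_state = run_global_1_2(outcomes)
for i, (state, prediction) in enumerate(trace):
    print(f"iteration {i // 2 + 1} B{i % 2 + 1}: {state} predicts {prediction}")
print("exit state:", final_state)
```

Because there is no branch address, B1 and B2 share the same two counters, which is why each branch starts from the state the other one left behind.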

d. (8 pts) tournament predictor (saturating 2-bit counter)

Iteration  predictor X  predictor Y  tournament predictor  prediction  action
1          T            NT           0                     T           T
2          NT           T            0                     NT          T
3          NT           NT           1                     NT          T
4          T            NT           1                     T           T
5          T            T            0                     T           T
6          T            NT           0                     T           T
Exit                                 0
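The selector behaviour in part (d) can be sketched as follows. The update rule is an assumption that matches the table: counter values 0 and 1 select predictor X, values 2 and 3 select predictor Y, and the counter moves one step toward whichever predictor was right only when the two predictors disagree in correctness. The function name `run_tournament` is illustrative.

```python
# Sketch of a tournament selector built on a saturating 2-bit counter.
# ASSUMED convention: 0-1 choose predictor X, 2-3 choose predictor Y.
def run_tournament(x_preds, y_preds, actions, selector=0):
    """Return ([(selector before, chosen prediction), ...], final selector)."""
    rows = []
    for x, y, actual in zip(x_preds, y_preds, actions):
        chosen = x if selector < 2 else y
        rows.append((selector, chosen))
        x_ok, y_ok = (x == actual), (y == actual)
        if y_ok and not x_ok:
            selector = min(selector + 1, 3)   # move toward Y, saturate at 3
        elif x_ok and not y_ok:
            selector = max(selector - 1, 0)   # move toward X, saturate at 0
    return rows, selector


# Predictions and actions copied from the table in part (d).
x_preds = ["T", "NT", "NT", "T", "T", "T"]
y_preds = ["NT", "T", "NT", "NT", "T", "NT"]
actions = ["T", "T", "T", "T", "T", "T"]

rows, exit_selector = run_tournament(x_preds, y_preds, actions)
for i, (state, chosen) in enumerate(rows, start=1):
    print(f"iteration {i}: selector={state} prediction={chosen}")
print("exit selector:", exit_selector)
```

In iteration 3 both predictors are wrong, so the selector stays at 1, and a single correct prediction by X in iteration 4 is enough to bring it back to 0.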
