CMSC411 Fall 2013
Midterm 1

Name:

Instructions

- You have 75 minutes to take this exam.
- There are 100 points in this exam, so spend about 45 seconds per point.
- You do not need to provide a number if you can show the appropriate fraction. E.g., 1/13 is acceptable in place of .0769.
- This is a closed book exam. No notes or other aids are allowed.
- If you have a question, please raise your hand and wait for the instructor.
- Answer essay questions concisely using 1-2 sentences. Longer answers are not necessary and a penalty may be applied.
- In order to be eligible for partial credit, show all of your work and clearly indicate your answers.
- Write neatly. Credit cannot be given for illegible answers.

Problem                                        Score
1  Computer Architectures                      /12
2  Reliability, Performance, Amdahl's Law      /18
3  Basic Pipelining                            /30
4  Pipeline Hazards                            /16
5  Pipeline Performance                        /8
6  Cache Organization                          /16
   Total                                       /100
1. (12 pts) Computer architectures

   a. (3 pts) Describe one advantage of using geometric mean instead of arithmetic mean to combine results from multiple benchmark programs.

   b. (3 pts) Explain the motivation behind the introduction of pipelining in processor architectures.

   c. (3 pts) Explain why it is difficult to report exceptions precisely for pipelined architectures.

   d. (3 pts) Explain how caches exploit spatial locality to improve performance.
2. (18 pts) Reliability, performance, and Amdahl's Law

   The memory hierarchy consists of cache, memory, and disk. Suppose you are considering replacing your hard disk drive with a faster solid state disk based on flash memory. The access speeds and mean time to failure (MTTF) for different parts of the memory hierarchy are as shown in the table below:

   Component                 Access Time     MTTF
   Cache (SRAM)              10 cycles       10 years
   Memory (DRAM)             100 cycles      10 years
   Hard disk drive (HD)      10000 cycles    2 years
   Solid state disk (SSD)    1000 cycles     5 years

   a. (6 pts) What is the mean time to failure (MTTF) of the entire memory hierarchy assuming there are 2 caches, 1 memory, and 2 hard disk drives? There is no SSD.

   b. (6 pts) If 1% of memory accesses result in page misses that require a disk access, what is the average cost of a memory access (in cycles)? Assume no caches.

   c. (6 pts) For the problem above, what is the improvement in the average memory access time (in cycles) if the hard disk drive is replaced with a solid state disk (i.e., hard disk performance improved by 10x)?
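As a way to check the arithmetic in problem 2, here is a minimal Python sketch. It rests on assumptions the exam does not state: component failures are independent with constant failure rates, the hierarchy fails as soon as any one component fails (so failure rates add), and every memory access pays the DRAM latency with page misses paying the disk latency on top. The function names `system_mttf` and `average_access_time` are hypothetical. A different cost model (for example, charging only the disk latency on a miss) would give slightly different numbers.

```python
from fractions import Fraction


def system_mttf(component_mttfs):
    """MTTF of a system that fails when any component fails.

    Assumes independent components with constant failure rates,
    so the system failure rate is the sum of 1/MTTF over components.
    """
    total_rate = sum(Fraction(1, mttf) for mttf in component_mttfs)
    return 1 / total_rate


def average_access_time(memory_cycles, disk_cycles, page_miss_rate):
    """Average access cost: every access pays memory, misses also pay disk."""
    return memory_cycles + page_miss_rate * disk_cycles


# Part (a): 2 caches (10 years), 1 memory (10 years), 2 hard disks (2 years).
mttf_years = system_mttf([10, 10, 10, 2, 2])

# Parts (b) and (c): 1% of accesses go to disk, no caches.
miss_rate = Fraction(1, 100)
cost_hd = average_access_time(100, 10000, miss_rate)
cost_ssd = average_access_time(100, 1000, miss_rate)
speedup = cost_hd / cost_ssd

print("MTTF (years):", mttf_years)
print("Average access, HD (cycles):", cost_hd)
print("Average access, SSD (cycles):", cost_ssd)
print("Speedup from SSD:", speedup)
```

Using `Fraction` keeps the results exact, which matches the exam's allowance for answers given as fractions.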
3. (30 pts) Basic pipelining. Use the following code fragment:

   I1   LW    R1, 0(R2)      ; R1 <- address(0+R2)
   I2   LW    R2, 0(R1)      ; R2 <- address(0+R1)
   I3   ADDI  R3, R2, #8     ; R3 <- R2+8
   I4   MULT  R4, R1, R1     ; R4 <- R1*R1
   I5   SW    R4, 4(R3)      ; address(4+R3) <- R4

   a. (10 pts) List all RAW (read-after-write) pipeline hazards in the code, regardless of whether they cause any stalls.

   Using the classic MIPS five-stage integer pipeline, show the timing of this instruction sequence. Assume all memory accesses take 1 clock cycle, and a register may be read and written in the same clock cycle.

   b. (10 pts) Assume there is no forwarding or bypassing hardware.

         1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
   LW    IF  ID  EX  MEM WB
   LW
   ADDI
   MULT
   SW

   c. (10 pts) Assume normal forwarding and bypassing hardware.

         1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
   LW    IF  ID  EX  MEM WB
   LW
   ADDI
   MULT
   SW
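The RAW hazards asked for in part (a) of problem 3 can be enumerated mechanically. The following Python sketch is one way to do it, not a reference solution: it encodes each instruction of the fragment as a hypothetical `(label, destination, sources)` tuple and pairs every register read with the most recent earlier write of that register. It reports dependences only and does not model stalls or forwarding.

```python
def raw_dependences(program):
    """Return (producer, consumer, register) triples for RAW dependences.

    Each read is matched with the most recent earlier instruction
    that wrote the same register.
    """
    last_writer = {}  # register name -> label of the latest writer
    found = []
    for label, dest, sources in program:
        # dict.fromkeys removes duplicate sources (e.g. MULT R4, R1, R1)
        for reg in dict.fromkeys(sources):
            if reg in last_writer:
                found.append((last_writer[reg], label, reg))
        if dest is not None:
            last_writer[dest] = label
    return found


# The code fragment from problem 3. SW writes memory, not a register,
# so its destination is None and both R3 and R4 are sources.
program = [
    ("I1", "R1", ("R2",)),       # LW   R1, 0(R2)
    ("I2", "R2", ("R1",)),       # LW   R2, 0(R1)
    ("I3", "R3", ("R2",)),       # ADDI R3, R2, #8
    ("I4", "R4", ("R1", "R1")),  # MULT R4, R1, R1
    ("I5", None, ("R3", "R4")),  # SW   R4, 4(R3)
]

hazards = raw_dependences(program)
for producer, consumer, reg in hazards:
    print(f"{consumer} reads {reg} written by {producer}")
```

Note that I2 writing R2 after I1 reads it is a WAR relationship, so it does not appear in the output.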
4. (16 pts) Pipeline hazards. Consider the following MIPS floating point pipeline:

   [Figure: MIPS floating point pipeline]

   Processors implement logic to check for potential data hazards (such as RAW and WAW) and forwarding. Recall that the format of MIPS register-register instructions is rd = rs OP rt (i.e., rd is the destination, and rs/rt are the operands), and the format of register-immediate instructions (including load/store) is rt = rs OP immed (i.e., rt is the destination, and rs is the operand). Consider the following check:

      IF/ID.IR[op] = ADD.D & A2/A3.IR[op] = ADD.D & IF/ID.IR[rt] = A2/A3.IR[rd]

   a. (8 pts) Explain what the logic is checking.

   b. (8 pts) Explain whether the check is needed.
5. (8 pts) Pipeline performance. Suppose processor X executes instructions in the following 3 stages (no pipeline), where each stage could run this fast:

   Stage      Time
   IF&ID      12ns
   EX         7ns
   MEM&WB     15ns

   Compare the performance of a pipelined vs. unpipelined implementation of processor X.

6. (16 pts) Cache organization

   Suppose we have a byte addressable memory of size 4GB (2^32 bytes).

   a. (12 pts) The Intel Core i7 (Sandy Bridge) CPU has a 256 KB L2 cache (2^18 bytes, not including tag bits) and a cache block size of 64 (2^6) bytes. The L2 cache is 8-way (2^3) associative. Compute for the L2 cache the length in number of bits for the tag, index and offset fields of a 32-bit memory address (show your calculations).

   Power of 2    Value
   2^1           2
   2^2           4
   2^3           8
   2^4           16
   2^5           32
   2^6           64
   2^7           128
   2^8           256
   2^9           512
   2^10          1K
   2^20          1M
   2^30          1G

   b. (4 pts) Considering the answer to part (a), circle the bits representing the index in the following 32-bit memory address (in binary):

   1 0 0 1 1 1 0 1 1 1 0 0 0 1 1 0 1 1 1 0 1 0 0 1 1 1 0 0 0 1 0 0
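The field widths in problem 6 follow from the cache geometry: the offset is log2 of the block size, the index is log2 of the number of sets, and the tag is whatever remains of the address. The Python sketch below works through that under the assumption that the cache holds 256 KB (2^18 bytes) of data and that all sizes are powers of two. The capacity is a parameter, so a different intended size can be substituted. The helper names `cache_address_fields` and `split_address` are hypothetical.

```python
def cache_address_fields(address_bits, cache_bytes, block_bytes, ways):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache.

    Assumes cache_bytes, block_bytes and ways are all powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1       # log2(block size)
    num_sets = cache_bytes // (block_bytes * ways)   # blocks / associativity
    index_bits = num_sets.bit_length() - 1           # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(address, index_bits, offset_bits):
    """Split an address into (tag, index, offset) integer fields."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# Part (a): 32-bit addresses, assumed 256 KB data capacity,
# 64-byte blocks, 8-way set associative.
tag_bits, index_bits, offset_bits = cache_address_fields(32, 256 * 1024, 64, 8)

# Part (b): the 32-bit address from the exam, written without spaces.
address = int("10011101110001101110100111000100", 2)
tag, index, offset = split_address(address, index_bits, offset_bits)

print("tag/index/offset bits:", tag_bits, index_bits, offset_bits)
print("index field:", format(index, f"0{index_bits}b"))
```

The index bits sit immediately above the offset bits, so in part (b) they are the group just to the left of the lowest `offset_bits` bits of the address.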