This exam is open book and open notes. You have 2 hours.

Problems 1-4 refer to a proposed MIPS instruction, lwu (load word with update), which implements update addressing, an addressing mode used in the PowerPC (see p. 177). The assembly-language form of lwu, its equivalent MIPS instructions, and its register transfers are shown below:

New instruction: lwu rt, immed(rs)
Equivalent MIPS instructions: lw rt, immed(rs); addi rs, rs, immed
Register transfers:
  address <- Reg[rs] + sign_extend(immed);
  Reg[rt] <- MEM[address];
  Reg[rs] <- address;

1. Multicycle Processor Design 20 Points

Modify the multicycle processor design to efficiently implement the lwu instruction. Mark changes on the state diagram below and the datapath diagram on the next page.

State diagram (states and the control signals each one asserts):
State 0, Instruction fetch (entered at Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
State 1, Instruction decode / register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
State 3, Memory access (OP = LW): MemRead, IorD = 1
State 4, Writeback step: RegWrite, MemtoReg = 1, RegDst = 0
State 5, Memory access (OP = SW): MemWrite, IorD = 1
State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
State 9, Jump completion (OP = JMP): PCWrite, PCSource = 10
State 10, added for lwu: MemRead, IorD = 1, RegWrite, MemtoReg = 0, RegDst = 2

Solution: State 10 reads memory at ALUOut and, in the same cycle, writes ALUOut (the address computed in state 2) into rs. This uses a new RegDst = 2 input that selects the rs field. The loaded word is then written to rt by writeback state 4.
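The Python sketch below traces lwu through the modified state diagram. It is a minimal model, not the exam's official solution. It assumes that state 10 is entered from state 2 when OP = LWU and is followed by writeback state 4. The function names (`sign_extend16`, `lwu_multicycle`, `lw_addi`) are hypothetical. The sketch also checks that the result matches the lw/addi pair for a case where rt differs from rs.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def lwu_multicycle(reg, mem, pc, rs, rt, imm):
    """Execute lwu rt, imm(rs) one state at a time; return (new_pc, states visited).

    Assumed state path: 0 -> 1 -> 2 -> 10 -> 4 (hypothetical reading of the diagram).
    """
    states = []
    # State 0: instruction fetch; PC <- PC + 4 (the IR load itself is not modeled).
    pc += 4
    states.append(0)
    # State 1: decode / register fetch; A <- Reg[rs] (B is also read but lwu ignores it).
    a = reg[rs]
    states.append(1)
    # State 2: memory address computation; ALUOut <- A + sign_extend(immed).
    alu_out = a + sign_extend16(imm)
    states.append(2)
    # State 10 (new): MDR <- Mem[ALUOut] while Reg[rs] <- ALUOut
    # (IorD = 1, MemRead, RegDst = 2 selects rs, MemtoReg = 0 selects ALUOut).
    mdr = mem[alu_out]
    if rs != 0:
        reg[rs] = alu_out
    states.append(10)
    # State 4: writeback; Reg[rt] <- MDR (RegDst = 0, MemtoReg = 1).
    if rt != 0:
        reg[rt] = mdr
    states.append(4)
    return pc, states


def lw_addi(reg, mem, rs, rt, imm):
    """Reference semantics of the two-instruction sequence lw rt, imm(rs); addi rs, rs, imm."""
    if rt != 0:
        reg[rt] = mem[reg[rs] + sign_extend16(imm)]
    if rs != 0:
        reg[rs] = reg[rs] + sign_extend16(imm)


# lwu $4, 200($1) with $1 = 1000 and the word 42 stored at address 1200.
mem = {1200: 42}
reg_a = [0] * 32
reg_a[1] = 1000
reg_b = list(reg_a)
pc, states = lwu_multicycle(reg_a, mem, 0x00400000, rs=1, rt=4, imm=200)
lw_addi(reg_b, mem, rs=1, rt=4, imm=200)
print(states, reg_a[1], reg_a[4])  # [0, 1, 2, 10, 4] 1200 42
```

Because lwu passes through five states, it takes the same number of cycles as lw in this sketch. The extra register write overlaps with the memory read in state 10.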
3. Pipelined Processor Design 20 Points

Modify the pipelined processor datapath and control to implement the lwu instruction. Note that in a pipelined implementation all register updates should take place during the WB stage. To make this possible, the register file has been extended with a second write port so that it can write two registers at the same time. Mark any changes to the datapath on the diagram on the next page. In addition, show all control outputs in the table below:

Instr.  EX stage control lines            MEM stage control lines       WB stage control lines
        RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite     RegWrite  MemtoReg  RegWrite2
lwu     0       0       0       1         0       1        0            1         0         1
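The main datapath change in this problem is the second register-file write port. The hypothetical Python model below shows both WB-stage writes of one lwu happening in the same cycle. Which port carries which value depends on how the datapath is wired. The mapping used here is an assumption: port 1 (RegWrite) writes the loaded word to rt, and port 2 (RegWrite2) writes the updated address to rs. The class and method names are made up for illustration.

```python
class DualWriteRegisterFile:
    """32 x 32-bit MIPS register file with two read ports and two write ports (hypothetical model)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, r1, r2):
        """The two read ports used in the ID stage."""
        return self.regs[r1], self.regs[r2]

    def write_back(self, reg_write, dest1, data1, reg_write2, dest2, data2):
        """Apply both write ports in the same WB cycle; writes to $zero are ignored.

        If both ports are enabled and dest1 == dest2, port 1 is applied last
        (an arbitrary choice; the exam does not specify this case).
        """
        if reg_write2 and dest2 != 0:
            self.regs[dest2] = data2 & 0xFFFFFFFF
        if reg_write and dest1 != 0:
            self.regs[dest1] = data1 & 0xFFFFFFFF


# WB stage of lwu $4, 200($1) with $1 = 1000 and Mem[1200] = 42 (assumed port mapping).
rf = DualWriteRegisterFile()
rf.regs[1] = 1000
rf.write_back(reg_write=1, dest1=4, data1=42, reg_write2=1, dest2=1, data2=1200)
print(rf.read(4, 1))  # (42, 1200)
```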
4. Data Hazards & Forwarding 20 Points

The following sequence of MIPS instructions includes the new lwu instruction. Assume that this sequence is executing on the modified pipeline design from Problem 3, but that this design is altered to perform forwarding as in Fig. 6.40 on p. 484.

  lwu $4, 200($1)
  add $6, $1, $7
  sub $5, $6, $4

(a) Circle any data dependencies that exist between these instructions.

(b) Note that the lwu instruction writes the rs register as well as the rt register. Can the data dependencies on the rs register be resolved by forwarding alone, or will stalling be necessary? Why or why not?

Solution: No stalling is necessary; forwarding alone suffices. The new rs value is the address computed by the ALU in the EX stage, so, like an R-type result, it can be forwarded from the EX/MEM or MEM/WB pipeline register. Here, add uses $1 in the instruction immediately after lwu and receives it from EX/MEM.

(c) Fill in the multicycle diagram shown below to show the execution of the instruction sequence, including stalls (if any). Shade active stages and show forwarding.

[Multiple-clock-cycle pipeline diagram, time axis 0 to 14 ns: rows for lwu $4, 200($1), add $6, $1, $7, and sub $5, $6, $4, each with stages IF, ID, EX, MEM, WB, plus one spare unlabeled row. Register annotations mark $4 and $1 on the lwu row, $1 and $7 on the add row, and $6 and $4 on the sub row.]
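A small dependency checker makes the answers to (a) through (c) concrete. The sketch below is hypothetical (`Instr` and `resolve_dependencies` are made-up names). It assumes the following:

- standard five-stage timing;
- a register file that writes in the first half of a cycle and reads in the second half;
- a forwarding unit extended to compare against both of lwu's destination registers.

An ALU result can be forwarded from EX/MEM or MEM/WB, while a loaded word is available only from MEM/WB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    srcs: tuple       # registers whose values are needed in the EX stage
    alu_dests: tuple  # registers whose new value is the EX-stage ALU result
    mem_dests: tuple  # registers whose new value is read from data memory (MEM stage)


def resolve_dependencies(program):
    """Classify each read-after-write dependency, assuming no stalls were inserted earlier."""
    result = []
    for i, consumer in enumerate(program):
        for reg in consumer.srcs:
            if reg == 0:  # $zero is never forwarded
                continue
            # Search backwards for the most recent earlier writer of `reg`.
            for j in range(i - 1, -1, -1):
                producer = program[j]
                if reg not in producer.alu_dests + producer.mem_dests:
                    continue
                distance = i - j
                if distance >= 3:
                    how = "register file (write first half, read second half)"
                elif reg in producer.mem_dests and distance == 1:
                    how = "stall 1 cycle, then forward from MEM/WB"
                elif distance == 1:
                    how = "forward from EX/MEM"
                else:
                    how = "forward from MEM/WB"
                result.append((consumer.text, f"${reg}", producer.text, how))
                break
    return result


PROGRAM = [
    # lwu reads rs; rs receives the ALU result (the address), rt receives the loaded word.
    Instr("lwu $4, 200($1)", srcs=(1,), alu_dests=(1,), mem_dests=(4,)),
    Instr("add $6, $1, $7", srcs=(1, 7), alu_dests=(6,), mem_dests=()),
    Instr("sub $5, $6, $4", srcs=(6, 4), alu_dests=(5,), mem_dests=()),
]

deps = resolve_dependencies(PROGRAM)
for dep in deps:
    print(dep)
```

For this sequence the sketch finds three dependencies:

- add takes $1 from EX/MEM;
- sub takes $6 from EX/MEM;
- sub takes $4 from MEM/WB.

None of them requires a stall.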
5. Pipelined Processor Timing 10 Points

This problem refers to the pipelined datapath used in Problem 3. Assume that the pipelined datapath components have the same delay characteristics as the single-cycle components described on page 373 of the book: ALU and memory have a 2ns delay, and register file read and write each have a 1ns delay.

(a) Assume that all other components have no delay. What is the minimum clock period at which this design can operate properly?

Solution: Worst-case register-to-register delay = 2ns. The IF, EX, and MEM stages each contain one 2ns memory or ALU delay.

(b) Now assume that in addition to the delays given above, the delay of multiplexers is 0.1ns. What is the minimum clock period at which this design can operate properly? What stages limit the execution time?

Solution: Worst-case register-to-register delay = 2.1ns. This is set by the EX stage, where the ALUSrc multiplexer (0.1ns) is in series with the ALU (2ns).
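The stage-by-stage reasoning can be checked with the short Python sketch below. The per-stage paths are an assumption based on the textbook pipeline without forwarding multiplexers. Adders, control logic, and pipeline registers are taken to have zero delay, and 0.1ns is written as 100 ps so that the arithmetic stays in integers.

```python
# Component delays in picoseconds (integers avoid floating-point rounding).
MEMORY_PS = 2000
ALU_PS = 2000
REG_FILE_PS = 1000  # register file read or write


def stage_delays(mux_ps):
    """Longest path through each pipeline stage (assumed paths)."""
    return {
        "IF": MEMORY_PS,             # PC -> instruction memory -> IF/ID
        "ID": REG_FILE_PS,           # IF/ID -> register read -> ID/EX
        "EX": mux_ps + ALU_PS,       # ID/EX -> ALUSrc mux -> ALU -> EX/MEM
        "MEM": MEMORY_PS,            # EX/MEM -> data memory -> MEM/WB
        "WB": mux_ps + REG_FILE_PS,  # MEM/WB -> MemtoReg mux -> register write
    }


def min_clock_period(mux_ps):
    """Return the minimum clock period (ps) and the stage(s) that set it."""
    delays = stage_delays(mux_ps)
    worst = max(delays.values())
    return worst, [stage for stage, d in delays.items() if d == worst]


print(min_clock_period(0))    # (2000, ['IF', 'EX', 'MEM'])
print(min_clock_period(100))  # (2100, ['EX'])
```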
6. Short Answers 10 Points

Provide a short answer for each of the following questions:

(a) When would a compiler be able to use the lwu instruction to increase the speed of a program?

Solution: When a loop accesses adjacent memory locations or consecutive array elements. A single lwu can perform both the memory access and the address update that steps to the next element, whereas the original MIPS requires two instructions (lw and addi) for the same work.

(b) Why do RISC architectures use fixed-width instructions?

Solution: Fixed-width instructions are easy to decode and keep the implementation simple. Variable-width instructions save memory, but the hardware must then handle instructions of varying width.

(c) What are the steps required to add two floating-point numbers?

Solution:
1. Shift the significand of the number with the smaller exponent right until the exponents are equal.
2. Add the significands.
3. Normalize the result and adjust the exponent.
4. Round the result, renormalizing if rounding carries out of the significand.

(d) What is the motivation for out-of-order execution in dynamic pipelining?

Solution: Out-of-order execution lets later instructions begin executing even though earlier instructions have stalled, which increases performance.

(e) When does a page fault occur?

Solution: When a processor with virtual memory attempts to access instructions or data that are not currently in main memory but are instead stored on disk.
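The floating-point addition steps can be illustrated with a toy binary format. This is a minimal Python sketch, not IEEE 754 hardware. It makes the following assumptions:

- each number is a hypothetical `(significand, exponent)` pair with a 4-bit significand;
- only positive, normalized operands are handled;
- rounding is to nearest, ties to even.

Instead of shifting the smaller operand right and keeping guard, round, and sticky bits, the sketch aligns both operands to the smaller exponent. That choice is exact, so no bits are lost before rounding.

```python
def fp_add(a, b, precision=4):
    """Add two positive, normalized toy floats given as (significand, exponent).

    A pair (s, e) has value s * 2**e; normalized means 2**(precision-1) <= s < 2**precision.
    """
    (sa, ea), (sb, eb) = a, b
    if ea < eb:  # let `a` be the operand with the larger exponent
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)

    # Step 1: align exponents. Rewriting `a` in units of 2**eb is exact, so no
    # bits are lost (this stands in for the guard/round/sticky bits of hardware).
    total = sa << (ea - eb)
    exp = eb

    # Step 2: add the aligned significands.
    total += sb

    # Step 3: normalize: shift right until `precision` bits remain,
    # incrementing the exponent once per bit shifted out.
    k = max(total.bit_length() - precision, 0)
    sig, rem = divmod(total, 1 << k)
    exp += k

    # Step 4: round to nearest, ties to even, using the k bits shifted out.
    if k:
        half = 1 << (k - 1)
        if rem > half or (rem == half and sig % 2 == 1):
            sig += 1
    # Rounding can carry out (e.g. 1111 + 1 = 10000); renormalize once.
    if sig >= 1 << precision:
        sig >>= 1
        exp += 1
    return sig, exp


print(fp_add((8, 0), (9, -3)))   # 8 + 1.125 = 9.125 -> (9, 0), i.e. 9
print(fp_add((15, 0), (8, -4)))  # 15 + 0.5 = 15.5 -> tie, rounds to (8, 1), i.e. 16
```

The second call shows why a renormalization check follows rounding: rounding 1111 up produces a 5-bit significand, which must be shifted back into 4 bits.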
7. Cache Memories 20 Points

The diagram on the next page shows a cache memory design that contains 8 blocks of four words each. Note that on a cache miss all four words in a block are replaced at the same time.

(a) How many bits will there be in the Block Offset field of the address?

Solution: Two bits, to select one of the four words in a block. The diagram already shows this field. The Index field is 3 bits, to select one of the 8 blocks.

(b) How many bits will there be in the Tag field of each address?

Solution: 32-bit address - 2-bit byte offset - 2-bit block offset - 3-bit block index = 25 bits.

(c) How many bits of storage will be required for this cache memory?

Solution: (128 data bits + 25 tag bits + 1 valid bit) × 8 blocks = 154 × 8 = 1232 bits.

[Figure: direct-mapped cache with 8 blocks (Block 0 to Block 7). Each entry holds a valid bit (V), a Tag, and 128 bits of data as four 32-bit words (word slots 0 to 31, four per block). The address is divided into Tag, Index, Block Offset, and a 2-bit Byte Offset. A tag comparator (=) produces the Hit signal, and the four 32-bit word lines feed the Data output.]
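The field widths and the storage total in (a) through (c) follow from a few logarithms. This is a minimal Python sketch, assuming a direct-mapped cache with one valid bit per block and byte addressing. The helper names (`log2_exact`, `cache_geometry`, `split_address`) are hypothetical.

```python
def log2_exact(n):
    """Base-2 logarithm of a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def cache_geometry(address_bits=32, num_blocks=8, words_per_block=4, word_bits=32):
    """Field widths and total storage of a direct-mapped cache with one valid bit per block."""
    byte_offset = log2_exact(word_bits // 8)
    block_offset = log2_exact(words_per_block)
    index = log2_exact(num_blocks)
    tag = address_bits - byte_offset - block_offset - index
    data = words_per_block * word_bits
    total = num_blocks * (data + tag + 1)  # data + tag + valid bit, per block
    return {"byte_offset": byte_offset, "block_offset": block_offset,
            "index": index, "tag": tag, "data_bits_per_block": data,
            "total_bits": total}


def split_address(addr, g):
    """Split a byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & ((1 << g["byte_offset"]) - 1)
    addr >>= g["byte_offset"]
    block_offset = addr & ((1 << g["block_offset"]) - 1)
    addr >>= g["block_offset"]
    index = addr & ((1 << g["index"]) - 1)
    tag = addr >> g["index"]
    return tag, index, block_offset, byte_offset


g = cache_geometry()
print(g["tag"], g["total_bits"])  # 25 1232
print(split_address(4 * 43, g))   # word 43 -> (1, 2, 3, 0)
```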
(d) Given the word references below, mark each reference as a hit or miss and show the cache contents in the table below.

Word references: 1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 99, 6, 9, 17

[Answer tables: a Reference / Hit/Miss column for the 16 references, and a cache-contents grid with word slots 0 to 31 grouped as Block 0 to Block 7.]
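Part (d) can be checked by simulating the cache directly. This Python sketch makes three assumptions:

- the cache starts with every valid bit clear;
- the references are word addresses;
- a word maps to cache block (word // 4) mod 8, as in the figure.

The function name is hypothetical. Under these assumptions it reports 9 misses and 7 hits. At the end, Block 3 and Block 7 are still empty.

```python
def simulate_direct_mapped(refs, num_blocks=8, words_per_block=4):
    """Simulate a direct-mapped cache on word addresses, starting with all entries invalid.

    Returns the hit/miss outcome of each reference and, for each cache block,
    the memory block number it holds at the end (None if still invalid).
    """
    held = [None] * num_blocks
    outcomes = []
    for word in refs:
        block = word // words_per_block  # memory block containing the word
        index = block % num_blocks       # cache block it maps to
        # Storing the full block number is equivalent to checking valid bit + tag.
        if held[index] == block:
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            held[index] = block          # all four words are replaced together
    return outcomes, held


REFS = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 99, 6, 9, 17]
outcomes, held = simulate_direct_mapped(REFS)
for word, outcome in zip(REFS, outcomes):
    print(f"{word:3d}: {outcome}")
for index, block in enumerate(held):
    words = "empty" if block is None else f"words {4 * block}-{4 * block + 3}"
    print(f"Block {index}: {words}")
```

The misses on 43, 99, and the second reference to 9 are conflict or cold misses. They arise because words 8 to 11 and 40 to 43 share cache block 2, and words 0 to 3 and 96 to 99 share cache block 0.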