Comprehensive Exams
COMPUTER ARCHITECTURE
Spring 2006, April 3, 2006

ID Number: __________

Problem | Score
1       | /15
2       | /20
3       | /20
4       | /20
Total   | /75
Problem 1. (15 points) Logic Design

A three-input switching function is expressed as f(a, b, c) = (a + bc)(a b + c).

a) Write function f(a, b, c) in minimum sum-of-products form.
b) Write function f(a, b, c) in minimum product-of-sums form.
c) Suppose we connect the output of f to the input of a D flip-flop and connect the flip-flop's output, g, back to signal a. The result is a sequential machine with inputs (b, c) and output g. Write the state table of this sequential machine.
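As a starting point for parts (a) to (c), the Python sketch below enumerates the minterms of f and the state table of the flip-flop machine. It rests on one loudly stated assumption: the extracted expression shows `(a b + c)`, which may have lost a complement bar, so `f` here reads that factor literally as ab + c. Edit that one line if the original differs. The helper names `minterms` and `state_table` are hypothetical.

```python
from itertools import product


def f(a, b, c):
    # ASSUMPTION: reads the second factor literally as (a AND b) OR c; the
    # extracted text may have dropped a complement bar, so adjust if needed.
    return (a | (b & c)) & ((a & b) | c)


def minterms(func):
    """Minterm indices (a is the most significant bit) where func is 1."""
    return [4 * a + 2 * b + c for a, b, c in product((0, 1), repeat=3) if func(a, b, c)]


def state_table(func):
    """Part (c): the D flip-flop stores g and feeds it back as a, so the
    next state is g+ = func(g, b, c) and the output is the present state g."""
    return {(g, b, c): func(g, b, c) for g, b, c in product((0, 1), repeat=3)}


print("minterms:", minterms(f))
for (g, b, c), nxt in state_table(f).items():
    print(f"g={g} bc={b}{c} -> g+={nxt}, output={g}")
```

The minterm list can be transferred to a Karnaugh map for parts (a) and (b); the state table for part (c) follows directly from g+ = f(g, b, c).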
Problem 2. (20 points) Processor Design Comparisons

Read this problem carefully. It asks you to add a new instruction to a processor design. The functionality this new instruction implements currently has to be done by a series of other instructions (as detailed below).

You propose a new instruction, SETC, that will be used to set a background color for a cell phone screen. (NOTE: Screen colors are stored as a large two-dimensional array in memory.) Our screen is 32 x 32 pixels, and each pixel color is stored as an int (one word of data), so the array that stores the screen colors is 1024 words (32 x 32) long.

The SETC instruction takes two source registers:
- The rs register holds the background color value to assign.
- The rt register holds the starting memory address (location (0,0) on the screen, i.e., the first element of the array) at which to start writing the background color.

So, if we execute SETC $3, $4 and $4 holds 4096, we overwrite every word from address 4096 up to, but not including, 8192 (addresses 4096 through 8188, which is 1024 words) with the value in $3.

In the current processor design (where SETC does not exist as an instruction), this functionality is implemented in software, with something like:

    # $3 holds the background color value
    # $4 is already set to 4096
    # $5 is already set to 8192
    loop: SW   $3, 0($4)
          ADDI $4, $4, 4      # advance the address by 1 word (4 bytes) each iteration
          BNE  $4, $5, loop   # this loop runs 1024 times

Assume all processors below can store one 32-bit word at a time, and only one store can happen at a time. Assume that it takes 50 ns to write a word to memory.
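To make the address arithmetic concrete, here is a hypothetical Python model of the software loop. Memory is modeled as a dictionary from byte address to word value, and the color value is an arbitrary example. Because BNE sits at the bottom of the loop, the body runs before the test, so the model performs 1024 stores and the last address written is 8188; address 8192 itself is never written.

```python
WORD_BYTES = 4


def software_fill(memory, color, start, end):
    """Mimic the SW/ADDI/BNE loop: store color at start, start+4, ...,
    exiting once the address reaches end (end itself is not written)."""
    addr, iterations = start, 0
    while True:
        memory[addr] = color      # SW   $3, 0($4)
        addr += WORD_BYTES        # ADDI $4, $4, 4
        iterations += 1
        if addr == end:           # BNE  $4, $5, loop falls through when equal
            break
    return iterations


mem = {}
n = software_fill(mem, color=0x00FF00, start=4096, end=8192)  # example color
print(n, min(mem), max(mem))      # iterations, first and last address written
```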
I) This question asks about the prospect of adding the SETC instruction to a single-cycle processor.

A) If you were to add this instruction to a single-cycle processor design, what would the required cycle time be? Assume that a single-cycle design for the currently supported instruction set has a 200 ns cycle time and that a memory access takes 50 ns.

Cycle Time: __________

B) If you were asked to advise a company considering a single-cycle design that includes the SETC instruction, would you recommend that they go forward with the project? Answer yes or no and provide a few sentences of explanation.
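A minimal worked calculation for part A. The first figure is a hard lower bound: 1024 back-to-back 50 ns writes must fit inside one clock cycle. The second is an estimate under an assumption the problem does not state, namely that the existing 200 ns critical path contains exactly one 50 ns memory access, which SETC replaces with 1024 sequential writes. In a single-cycle design every instruction takes one cycle of the same length, so whatever cycle time results applies to all instructions, not only SETC.

```python
BASE_CYCLE_NS = 200   # current single-cycle design (given)
MEM_ACCESS_NS = 50    # one word write (given)
WORDS = 32 * 32       # 1024 words per screen

# Lower bound: all 1024 writes must complete within a single clock cycle.
memory_only_ns = WORDS * MEM_ACCESS_NS

# ASSUMPTION: the one 50 ns memory access on the existing critical path is
# replaced by 1024 sequential writes; the remaining 150 ns is unchanged.
estimated_cycle_ns = BASE_CYCLE_NS - MEM_ACCESS_NS + WORDS * MEM_ACCESS_NS

print(memory_only_ns, estimated_cycle_ns, estimated_cycle_ns / BASE_CYCLE_NS)
```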
II) This question asks about the prospect of adding the SETC instruction to a multicycle processor. Here is some extra information:
- The cycle time is 50 ns.
- A STORE instruction takes 3 cycles:
  - STORE Cycle 1: Fetch and decode
  - STORE Cycle 2: Compute address and read register
  - STORE Cycle 3: Write to memory
- BNE and ADDI take 2 cycles each.
- The SETC instruction takes:
  - 1 cycle to be fetched and decoded
  - 1 cycle to get ready to store the first word
  - 1 cycle to store each word (this includes the time to keep track of how many words were written and to increment the store address)

A) How long would it take to accomplish the SETC functionality in the original design (without the SETC instruction)? Give your answer in cycles.

B) Consider a multicycle implementation of the SETC instruction. How many cycles would it require?

C) If you were asked to advise a company considering a multicycle design that includes the SETC instruction, would you recommend that they go forward with the project? Answer yes or no and provide a few sentences of English explanation.
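The cycle counts for parts A and B follow directly from the figures above. This Python sketch assumes, as the comments in the software loop do, that $3, $4 and $5 are already loaded, so only the loop body (SW, ADDI, BNE, repeated 1024 times) is counted for the original design.

```python
CYCLE_NS = 50
WORDS = 1024

# Original design: SW (3) + ADDI (2) + BNE (2) cycles per loop iteration.
loop_cycles = WORDS * (3 + 2 + 2)

# SETC: fetch/decode (1) + set-up (1) + one cycle per stored word.
setc_cycles = 1 + 1 + WORDS

print(loop_cycles, setc_cycles)                        # cycles
print(loop_cycles * CYCLE_NS, setc_cycles * CYCLE_NS)  # nanoseconds
```

Both versions run at the same 50 ns cycle time here, so the comparison comes down to the cycle counts.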
III) This question asks about the prospect of adding the SETC instruction to a pipelined processor. Here is some extra information:
- Assume that the cycle time is 50 ns.
- Assume the pipeline has these 5 stages (just like the DLX pipeline developed in Patterson and Hennessy): fetch, decode, execute, memory, writeback.

A) Some further assumptions for Part A:
- Assume that you cannot change the number of pipeline stages.
- Assume that you cannot change the design of the memory module (it can store only one value per cycle).

Is it possible to implement the SETC instruction in a pipelined design? Answer yes or no and give a few English sentences of explanation.

B) If the assumptions from Part A are removed (that is, you can modify the pipeline design), would you recommend doing so? Answer yes or no and provide a few sentences of English explanation.
Problem 3. (20 points) Pipelining

Pipeline: if | id | reg | ex1 | ex2 | mem | wb

A certain processor has a pipeline as shown above. Registers are read in stage reg and written in wb. There are two execution stages, ex1 and ex2. Arithmetic operations (add, sub, etc.) complete the computation at the end of ex2, but logical operations (e.g., and, or, shift) produce their result by the end of ex1. Memory (cache) is accessed in stage mem, while the address calculation for memory operations is done in ex1 and ex2. Assume all reasonable forwarding (if you need to make other assumptions, make sure they are reasonable ones and state them clearly).

A) Show all forwarding and bubbles for the following code on this pipeline.

    add $5, $4, $1
    lw  $6, 1000($5)
    and $6, $6, $4
    add $8, $5, $6
    lw  $7, 1020($5)
    and $9, $8, 46
    add $10, $7, $9

(Answer grid column headings: IF ID R X1 X2 M WB)

B) Could this code be reordered to improve performance? If so, how many fewer stalls would result?
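A bubble count like this can be checked with a small timing model. The Python sketch below is one possible reading of "all reasonable forwarding", under assumptions the problem does not spell out: every source operand, including a load's base register, is needed at the start of ex1; logical results can be forwarded after ex1, arithmetic results after ex2, and load results after mem; and instructions advance strictly in order. The `Instr` class, `schedule` function and the candidate reordering (hoisting `lw $7, 1020($5)` to sit right after `lw $6, 1000($5)`) are hypothetical.

```python
from dataclasses import dataclass

# Stages IF ID REG EX1 EX2 MEM WB: with IF in cycle 1, the first
# instruction reaches EX1 in cycle 4.
FIRST_EX1_CYCLE = 4

# ASSUMPTION: cycles from an instruction's EX1 cycle to the cycle at whose
# end its result can be forwarded (logical: ex1, arithmetic: ex2, load: mem).
READY_OFFSET = {"logic": 0, "arith": 1, "load": 2}


@dataclass
class Instr:  # hypothetical helper type
    text: str
    kind: str    # "logic", "arith" or "load"
    dest: str
    srcs: tuple


def schedule(program):
    """Return (EX1 cycle, bubbles inserted before it) for each instruction,
    assuming in-order flow, full forwarding, and operands needed at EX1."""
    ready = {}  # register -> cycle at whose end its newest value exists
    result = []
    prev_ex1 = FIRST_EX1_CYCLE - 1
    for ins in program:
        ex1 = prev_ex1 + 1
        for reg in ins.srcs:
            if reg in ready:
                ex1 = max(ex1, ready[reg] + 1)
        result.append((ex1, ex1 - (prev_ex1 + 1)))
        ready[ins.dest] = ex1 + READY_OFFSET[ins.kind]  # after reading sources
        prev_ex1 = ex1
    return result


ORIGINAL = [
    Instr("add $5, $4, $1", "arith", "$5", ("$4", "$1")),
    Instr("lw $6, 1000($5)", "load", "$6", ("$5",)),
    Instr("and $6, $6, $4", "logic", "$6", ("$6", "$4")),
    Instr("add $8, $5, $6", "arith", "$8", ("$5", "$6")),
    Instr("lw $7, 1020($5)", "load", "$7", ("$5",)),
    Instr("and $9, $8, 46", "logic", "$9", ("$8",)),
    Instr("add $10, $7, $9", "arith", "$10", ("$7", "$9")),
]
# Candidate reordering: hoist lw $7 to sit right after lw $6.
REORDERED = [ORIGINAL[i] for i in (0, 1, 4, 2, 3, 5, 6)]

for name, prog in (("original", ORIGINAL), ("reordered", REORDERED)):
    sched = schedule(prog)
    print(f"{name}: {sum(b for _, b in sched)} bubble(s)")
    for ins, (ex1, bubbles) in zip(prog, sched):
        print(f"  {ins.text:<18} EX1 in cycle {ex1:>2}, {bubbles} bubble(s) before it")
```

Under these assumptions the model reports 4 bubbles for the original order and 3 for the reordered one. Different forwarding assumptions (for example, an add that needs one operand only in ex2) would change those counts, so the model is a check on a chosen set of assumptions rather than a definitive answer.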
Problem 4. (20 points) Caches

Assume you have a 128-byte cache with 32 bytes per block. The memory is byte addressable. All cache blocks start out invalid. Answer the following question by filling in the table below for the following address stream. The addresses, in the order they are accessed, are:

90, 160, 280, 50, 190, 90, 250, 120, 200, 60, 300, 30, 192, 96, 30, 220, 280...

Fill in the table below assuming that the cache is 2-way set associative and uses LRU replacement. For each address, fill in (a) the set the address indexes into (sets are numbered starting at 0), (b) the full block address range stored there after the access, (c) whether the access is a hit or a miss, (d) if it is a hit, whether it is a temporal (T) or spatial (S) hit, and (e) if it is a miss, which block (its address range) was evicted from the cache under LRU.

Address | Set Index | Block Addresses | Hit or Miss | If Hit, T or S | If Miss, Block Evicted
--------|-----------|-----------------|-------------|----------------|-----------------------
90      |           |                 |             |                |
160     |           |                 |             |                |
280     |           |                 |             |                |
50      |           |                 |             |                |
190     |           |                 |             |                |
90      |           |                 |             |                |
250     |           |                 |             |                |
120     |           |                 |             |                |
200     |           |                 |             |                |
60      |           |                 |             |                |
300     |           |                 |             |                |
30      |           |                 |             |                |
192     |           |                 |             |                |
96      |           |                 |             |                |
30      |           |                 |             |                |
220     |           |                 |             |                |
280     |           |                 |             |                |
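A small LRU simulator can be used to check a filled-in table. With 128 bytes and 32-byte blocks the cache holds 4 blocks, so a 2-way set-associative organization has 2 sets, and an address maps to set (address // 32) mod 2. The Python sketch below assumes a hit counts as temporal (T) when that exact byte address was already referenced since the block was last brought in, and spatial (S) otherwise; that classification rule is an assumption, since the problem does not define it. The function names are hypothetical.

```python
CACHE_BYTES = 128
BLOCK_BYTES = 32
WAYS = 2
NUM_SETS = CACHE_BYTES // (BLOCK_BYTES * WAYS)  # 4 blocks / 2 ways = 2 sets


def block_range(block):
    """Inclusive byte address range covered by a block number."""
    start = block * BLOCK_BYTES
    return (start, start + BLOCK_BYTES - 1)


def simulate(addresses):
    sets = [[] for _ in range(NUM_SETS)]  # each list runs from LRU to MRU
    touched = {}  # resident block -> byte addresses referenced since it was filled
    rows = []
    for addr in addresses:
        block = addr // BLOCK_BYTES
        index = block % NUM_SETS
        ways = sets[index]
        evicted = None
        if block in ways:
            outcome = "Hit"
            # ASSUMPTION: temporal if this exact address was seen since the fill.
            kind = "T" if addr in touched[block] else "S"
            ways.remove(block)
        else:
            outcome, kind = "Miss", None
            if len(ways) == WAYS:
                victim = ways.pop(0)  # least recently used block in this set
                del touched[victim]
                evicted = block_range(victim)
            touched[block] = set()
        ways.append(block)  # the accessed block becomes most recently used
        touched[block].add(addr)
        rows.append((addr, index, block_range(block), outcome, kind, evicted))
    return rows


STREAM = [90, 160, 280, 50, 190, 90, 250, 120, 200, 60, 300, 30, 192, 96, 30, 220, 280]

for addr, index, (lo, hi), outcome, kind, evicted in simulate(STREAM):
    gone = f"{evicted[0]}-{evicted[1]}" if evicted else "-"
    print(f"{addr:>4} | set {index} | {lo}-{hi} | {outcome:<4} | {kind or '-'} | {gone}")
```

Keeping each set as a list ordered from least to most recently used makes the LRU victim simply the first element, which is enough for a 2-way cache of this size.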