EECS 470 Midterm Exam Answer Key Fall 2004


Name:                    unique name:

Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so.

Scores:
  #           Points
  Part I        /23
  Part II       /47
    1           /10
    2           /10
    3           /12
    4           /15
  Part III      /30
  Total        /100

NOTES:
- Open book and open notes.
- Calculators are allowed, but no PDAs, portables, cell phones, etc.
- Don't spend too much time on any one problem.
- You have about 120 minutes for the exam.
- There are 9 pages including this one.
- A few questions have limits on the number of words or sentences you can use. We will only grade your answer until you hit that limit.
- Be sure to show work and explain what you've done when asked to do so.
- The last page has two answer areas for the Part III question. Clearly mark which one you want graded or we will grade the first one.

Part I: Short answer (23 points)

1. Convert the following Verilog statement into an assign statement: [4]

    reg [2:0] a, bob, alice;
    always @(*) begin
      if (a == 1) bob = 3;
      else bob = alice;
    end

Answer:

    reg [2:0] a, alice;
    wire [2:0] bob;
    assign bob = (a == 1) ? 3 : alice;

2. Consider the algorithm we have labeled as Tomasulo's 3. In homework 3 you were asked about the benefits of increasing the size of the architected register file vs. the physical register file. Now consider holding the size of the physical register file fixed and increasing the size of the architected register file. What would be the most significant CPI penalty? You are to assume that the ISA will be able to encode the same instructions as before without penalty (there was space in the ISA for larger registers to be encoded). Your answer must be one or two sentences in length. [6]

Answer: The most significant CPI penalty will come from the reduced window size, i.e., stalls due to an empty free list will occur more often because fewer physical registers are left over to be on the free list.
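The free-list argument can be made concrete with a back-of-the-envelope sketch. It assumes a merged physical register file (as in Tomasulo's 3) where every architected register must always be backed by one physical register. The function name and the register counts are hypothetical, chosen only to show the trend.

```python
def max_in_flight_destinations(physical_regs: int, architected_regs: int) -> int:
    """Upper bound on renamed destinations in flight at once with a merged
    physical register file: each architected register always holds one physical
    register, so only the rest can be free or hold speculative results."""
    if architected_regs > physical_regs:
        raise ValueError("need at least one physical register per architected register")
    return physical_regs - architected_regs


# Hypothetical sizes: 64 physical registers, architected file grown from 32 to 48.
print(max_in_flight_destinations(64, 32))  # 32
print(max_in_flight_destinations(64, 48))  # 16
```

Halving the number of spare physical registers halves how many renamed instructions can be in flight, which is why the free list runs dry sooner.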

3. Consider the pipeline of project 3 but without the structural hazards (there are two ports to memory). If 20% of the instructions are branches, branches are not-taken 60% of the time, 20% of the instructions are loads, 15% of the instructions are stores, and any given instruction has a 20% chance of being dependent on the instruction in front of it, what will be the expected CPI? Show your work. [5]

Answer:

$$\mathrm{CPI} = 1 + 3 \times (0.2 \times 0.4) + 1 \times (0.2 \times 0.2) = 1.28$$

That is, base CPI + (branch mispredict cycles) × (probability an instruction is a mispredicted branch) + (data stall cycles) × (probability an instruction is stalled by a data hazard). Branches are predicted not-taken, so the 40% that are taken mispredict; the only data stall counted is the one-cycle load-use stall, which occurs when a load (20%) is followed by an instruction that depends on it (20%).

4. Below is a list of statements. For each entry on the list indicate which of the three algorithms we discussed in class (T1, T2, T3) the statement is true for. A given statement may be true for all three algorithms, some combination of algorithms, or none. For each statement write the number(s) of the algorithms for which the statement would be true. If it is true for none of the algorithms, write None. You may assume these implementations are all superscalar. [8, -2 per wrong or blank answer, minimum of zero]

Number  Statement                                                           Answer
1       Provides precise interrupts
2       Renames to a reservation station number                             1
3       Retires instructions in-order                                       2, 3
4       Only need to look for a source operand on the CDB or one other      1, 3
        place.
5       Can speculate on branches                                           2, 3
6       Uses Reservation Stations
7       Can quickly deal with a mis-predict when the mis-predicted          2
        branch is at the front of the RoB while using only one RAT.
8       Suffers no performance impact from RAW hazards                      None
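The expected-CPI arithmetic in question 3 can be written as a small function. This assumes, as the answer does, that branches are predicted not-taken with a 3-cycle mispredict penalty and that the only data stall is a 1-cycle load-use stall; the function and parameter names are illustrative.

```python
def expected_cpi(branch_frac, taken_frac, mispredict_penalty,
                 load_frac, dependent_prob, load_use_stall):
    """Base CPI of 1 plus the expected stall cycles per instruction."""
    # Predict-not-taken: every taken branch is a mispredict.
    mispredict_cycles = mispredict_penalty * branch_frac * taken_frac
    # With forwarding, only a load followed by a dependent instruction stalls.
    load_use_cycles = load_use_stall * load_frac * dependent_prob
    return 1 + mispredict_cycles + load_use_cycles


# 20% branches, 40% of them taken, 3-cycle penalty; 20% loads, 20% dependence, 1-cycle stall.
print(round(expected_cpi(0.2, 0.4, 3, 0.2, 0.2, 1), 2))  # 1.28
```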

Part II: Not-so-short questions (47 points)

1. Consider the case of self-modifying code in the context of a processor which implements what we have called Tomasulo's 3. Assume we have a retirement RAT and we plan on addressing this hazard by nuking the processor, that is, by clearing the ROB, RS, and execution unit while copying the retirement RAT and associated physical register free list into the front-end (issue) RAT and free list. When should this nuke occur? Your answer must be less than 50 words in length. [10]

Answer: When a store writes to the address of an instruction currently in flight, there is a memory hazard. This can be solved by nuking the processor at any point before the modified instruction retires.
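A minimal sketch of the nuke. It assumes the check is made as each store retires, which is one point guaranteed to come before any younger, possibly modified, instruction retires. The `Core` class, its field names, and the 4-byte instruction size are hypothetical. Only the state the question says a nuke touches is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Core:
    """Hypothetical Tomasulo's-3-style core: only the state a nuke touches."""
    retirement_rat: Dict[str, int]
    retirement_free_list: List[int]
    frontend_rat: Dict[str, int] = field(default_factory=dict)
    frontend_free_list: List[int] = field(default_factory=list)
    rob: List[dict] = field(default_factory=list)  # in flight, oldest first; each has a "pc"
    rs: List[dict] = field(default_factory=list)
    executing: List[dict] = field(default_factory=list)

    def nuke(self) -> None:
        """Clear RoB, RS and execution units; copy retired state to the front end."""
        self.rob.clear()
        self.rs.clear()
        self.executing.clear()
        self.frontend_rat = dict(self.retirement_rat)
        self.frontend_free_list = list(self.retirement_free_list)

    def retire_store(self, addr: int, size: int = 4, instr_bytes: int = 4) -> Optional[int]:
        """Call after a store has left the RoB, so every RoB entry is younger.

        If the store overlaps any in-flight instruction, nuke and return the PC
        to refetch from (the oldest squashed instruction); otherwise return None.
        """
        if any(e["pc"] < addr + size and addr < e["pc"] + instr_bytes for e in self.rob):
            refetch_pc = self.rob[0]["pc"]
            self.nuke()
            return refetch_pc
        return None


core = Core(retirement_rat={"r1": 5}, retirement_free_list=[7, 8],
            frontend_rat={"r1": 9}, frontend_free_list=[8],
            rob=[{"pc": 0x104}, {"pc": 0x108}])
print(core.retire_store(0x108))     # 260, i.e. 0x104
print(core.frontend_rat, core.rob)  # {'r1': 5} []
```

Checking at store retirement is conservative (the nuke can happen as late as just before the modified instruction retires), but it keeps the sketch simple because the store's own write is already committed.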

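2. We have an architecture that we are told has a CPI of 1.4. Someone has come up with a way to cut the branch mispredict penalty by 2 cycles, but at a cost of increasing the clock period by 10%. We know 20% of instructions are branches, but we do not know what the prediction rate of the branch predictor is.

a) For what range of prediction rates would this change to the architecture improve overall performance? Show your work! [5]

Answer: Let $MPR$ be the mispredict rate (1 − prediction rate) and $N$ the number of instructions.

$$t_{old} = \mathrm{CPI}_{old} \cdot t_{clock} \cdot N, \qquad t_{new} = \mathrm{CPI}_{new} \cdot 1.1\,t_{clock} \cdot N, \qquad \mathrm{CPI}_{new} = 1.4 - 2 \cdot 0.2 \cdot MPR$$

We want $t_{new} < t_{old}$, i.e. $1.1\,\mathrm{CPI}_{new} < \mathrm{CPI}_{old}$:

$$1.1\,(1.4 - 2 \cdot 0.2 \cdot MPR) < 1.4$$
$$1.4 - 0.4\,MPR < \frac{1.4}{1.1} \approx 1.273$$
$$-0.4\,MPR < 1.273 - 1.4 = -0.127$$
$$MPR > \frac{-0.127}{-0.4} \approx 0.318$$

So the change improves performance when the prediction rate is below 0.682 (68.2%).

b) What would be the percentage gain/loss in performance with a prediction rate of 90%? Show your work! [5]

Answer: With a prediction rate of 90%, $MPR = 0.1$:

$$t_{new} = 1.1\,(1.4 - 2 \cdot 0.2 \cdot 0.1)\,t_{clock} \cdot N = 1.1 \cdot 1.36\,t_{clock} \cdot N = 1.496\,t_{clock} \cdot N$$
$$t_{old} = 1.4\,t_{clock} \cdot N$$
$$\frac{t_{old}}{t_{new}} = \frac{1.4}{1.496} \approx 0.936$$

The new design delivers about 93.6% of the old performance, i.e., a 6.4% performance loss.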
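A short sketch that reproduces both parts numerically. Times are per instruction in units of the old clock period, and the helper names are illustrative.

```python
def cpi_new(cpi_old, branch_frac, penalty_cut, mispredict_rate):
    # Each mispredicted branch now costs penalty_cut fewer cycles.
    return cpi_old - penalty_cut * branch_frac * mispredict_rate


def speedup(cpi_old, branch_frac, penalty_cut, clock_stretch, mispredict_rate):
    """t_old / t_new; values above 1 mean the change helps."""
    t_old = cpi_old
    t_new = cpi_new(cpi_old, branch_frac, penalty_cut, mispredict_rate) * clock_stretch
    return t_old / t_new


def break_even_mispredict_rate(cpi_old, branch_frac, penalty_cut, clock_stretch):
    # Solve clock_stretch * (cpi_old - penalty_cut * branch_frac * m) == cpi_old for m.
    return (cpi_old - cpi_old / clock_stretch) / (penalty_cut * branch_frac)


m = break_even_mispredict_rate(1.4, 0.2, 2, 1.1)
print(f"helps when prediction rate < {1 - m:.3f}")                 # helps when prediction rate < 0.682
print(f"at 90% prediction: {speedup(1.4, 0.2, 2, 1.1, 0.1):.3f}")  # at 90% prediction: 0.936
```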

3. Consider the following pseudo-assembly code:

          r2 = 0
          r4 = 0
          r5 = 0
    bob:  r3 = (r2 mod 2)            // remainder when r2 is divided by 2
          if (r3 == 0) goto next     // Branch 1
          r4 = r4 + 1
    next: r5 = r5 + 4
          r2 = mem[r5 + 0]           // Load
          if (r4 < 50000) goto bob   // Branch 2

bob has an address of 0x1000. The predictors all use the least significant bits of the PC other than the word offset. Predictors are all initialized to 0 or 00, which is not-taken and strongly not-taken respectively. You are to consider how different branch predictors will behave on this code under different circumstances.

Case 1: The data from the load will be 1 the first time, 2 the second, 3 the third, etc.
Case 2: The data from the load will be random (each instance is independent, with no bias toward even or odd numbers).
Case 3: The data is even the first three times in a row and then odd, and then even three times in a row, and then odd, etc.

You are now to consider 3 branch predictors:

Predictor 1: A PC-based predictor with 32 entries, each 1 bit.
Predictor 2: A PC-based predictor with 16 entries, each a 2-bit saturating counter.
Predictor 3: A local pattern history predictor. The BHT has 16 entries, each with 3 bits of history. The predictors are each 1 bit.

What are the expected mispredict rates for each of the following? Your answer must be correct within 0.2%. [12, -1 per wrong or blank box, min 0]

              Case 1               Case 2               Case 3
              Branch 1  Branch 2   Branch 1  Branch 2   Branch 1  Branch 2
Predictor 1   100%      0          50%       0          50%       0
Predictor 2   50%       0          50%       0          25%       0
Predictor 3   0         0          50%       6.25%      25%       25%
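The mispredict table can be sanity-checked by simulating the loop directly. This sketch assumes 4-byte instructions laid out consecutively from `bob` = 0x1000, so Branch 1 sits at 0x1004 and Branch 2 at 0x1014 and they never alias in either PC-indexed table. For Predictor 3 it assumes that all branches share one 8-entry table of 1-bit predictors indexed only by the 3-bit history. That sharing lets Branch 2's constant 111 history collide with Branch 1's, which is consistent with the nonzero Branch 2 rates for Predictor 3. Case 2 uses a seeded pseudo-random generator, and all class and function names are hypothetical.

```python
import random

BRANCH1_PC = 0x1004  # assumes 4-byte instructions starting at bob = 0x1000
BRANCH2_PC = 0x1014


class OneBitTable:
    """Predictor 1 style: PC-indexed 1-bit last-outcome predictors."""
    def __init__(self, entries):
        self.table = [False] * entries            # all start not-taken
    def _index(self, pc):
        return (pc >> 2) % len(self.table)        # drop the word offset
    def predict(self, pc):
        return self.table[self._index(pc)]
    def update(self, pc, taken):
        self.table[self._index(pc)] = taken


class TwoBitTable(OneBitTable):
    """Predictor 2 style: PC-indexed 2-bit saturating counters starting at 00."""
    def __init__(self, entries):
        self.table = [0] * entries
    def predict(self, pc):
        return self.table[self._index(pc)] >= 2
    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = min(self.table[i] + 1, 3) if taken else max(self.table[i] - 1, 0)


class LocalHistory(OneBitTable):
    """Predictor 3 style: 16-entry BHT of 3-bit histories. ASSUMPTION: all branches
    share one table of 1-bit predictors indexed by history alone."""
    def __init__(self, entries=16, history_bits=3):
        self.table = [0] * entries                # per-branch histories
        self.mask = (1 << history_bits) - 1
        self.pht = [False] * (1 << history_bits)
    def predict(self, pc):
        return self.pht[self.table[self._index(pc)]]
    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[self.table[i]] = taken
        self.table[i] = ((self.table[i] << 1) | taken) & self.mask


def run(predictor, loads, limit=50000):
    """Execute the loop; return (Branch 1, Branch 2) mispredict rates."""
    counts = {BRANCH1_PC: [0, 0], BRANCH2_PC: [0, 0]}  # [mispredicts, executions]

    def branch(pc, taken):
        counts[pc][0] += predictor.predict(pc) != taken
        counts[pc][1] += 1
        predictor.update(pc, taken)

    r2 = r4 = 0
    while True:
        taken1 = r2 % 2 == 0          # Branch 1 skips the increment when r2 is even
        branch(BRANCH1_PC, taken1)
        if not taken1:
            r4 += 1
        r2 = next(loads)              # the load
        taken2 = r4 < limit           # Branch 2 loops back to bob
        branch(BRANCH2_PC, taken2)
        if not taken2:
            return tuple(m / n for m, n in counts.values())


def case1():
    n = 1
    while True:
        yield n
        n += 1


def case2(seed=470):
    rng = random.Random(seed)
    while True:
        yield rng.getrandbits(16)


def case3():
    while True:
        yield from (2, 4, 6, 1)       # even three times, then odd


for name, make in [("Predictor 1", lambda: OneBitTable(32)),
                   ("Predictor 2", lambda: TwoBitTable(16)),
                   ("Predictor 3", lambda: LocalHistory())]:
    rates = [r for case in (case1, case2, case3) for r in run(make(), case())]
    print(name, " ".join(f"{r:7.2%}" for r in rates))
```

The printed rows should land close to the table above. The small residue comes from predictor warm-up and the final loop exit.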

4. Consider a set of code where there are two classes of instructions. One class of instruction (called simple) is not dependent on any other instruction and can execute in one cycle. The other class of instruction (called long) is also independent of all other instructions but takes 20 cycles to execute. Say you have a machine which can issue one instruction per cycle, finish execution of one instruction per cycle, and retire one instruction per cycle. This machine implements what we have called Tomasulo's 3, has an RS size of 16 and a RoB size of 64. Show your work!

a) What is the best CPI this machine could achieve if the program being run consisted of groups of 20 instructions, where the first 19 were simple and the last was long? Assume there are a large number of these groups. (So the code is 19 simple, 1 long, 19 simple, 1 long, etc.) [4]

Answer: 1.0. After the first long instruction there will always be 20 instructions in the RoB. The long instruction can execute during its stay in the RoB, so the only stall is on the very first long instruction.

b) As part a, but now the long instructions take 100 cycles and the groups are 99 simple instructions followed by a long instruction (99 simple, 1 long, 99 simple, 1 long, etc.) [5]

Answer: $1 + (100 - 63)/100 = 1.37$. After the very first long instruction causes a stall of 100 cycles, the RoB will be constantly full. Thus each later long instruction only causes a stall once it reaches the head of the RoB, which takes 63 cycles, so it holds up retirement for the remaining $100 - 63 = 37$ cycles of every 100-instruction group.

c) As part a, but now the long instructions take 100 cycles and the groups are 39 simple instructions followed by a long instruction (39 simple, 1 long, 39 simple, 1 long, etc.) [6]

Answer: $1 + (100 - 63)/(2 \times 40) \approx 1.46$. In this case the RoB will again get full. But when the long instruction at the head of the queue stalls, another long instruction will also be executing, and that instruction won't stall. So we get one 37-cycle stall per two long instructions (80 instructions).
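The steady-state reasoning in parts a to c can be checked with a small cycle-level model. This is a sketch under stated assumptions, not the course's simulator. The function name `steady_state_cpi` is hypothetical, every instruction starts executing in the cycle it issues, and the RS limit of 16 and the one-completion-per-cycle limit are ignored. The hypothetical `same_cycle_reuse` flag selects whether a RoB slot freed by retirement can be refilled in the same cycle. With it off, the model should reproduce 1.0, $1 + 37/100$ and $1 + 37/80$; with it on, each stall shrinks by one cycle (1.36 and 1.45).

```python
from collections import deque


def steady_state_cpi(simple_per_group, long_latency, rob_size=64, groups=200,
                     same_cycle_reuse=False):
    """Hypothetical model: groups of one-cycle instructions followed by one long one."""
    latencies = ([1] * simple_per_group + [long_latency]) * groups
    rob = deque()        # completion cycle of each in-flight instruction, oldest first
    retired_at = []      # retirement cycle of each instruction, in program order
    next_instr = 0
    cycle = 0
    while len(retired_at) < len(latencies):
        freed = 0
        # Retire at most one finished instruction from the head of the RoB.
        if rob and rob[0] <= cycle:
            rob.popleft()
            retired_at.append(cycle)
            freed = 1
        # Issue at most one instruction; it starts executing immediately because
        # nothing depends on anything. A slot freed this cycle counts as occupied
        # unless same_cycle_reuse is set.
        occupancy = len(rob) if same_cycle_reuse else len(rob) + freed
        if next_instr < len(latencies) and occupancy < rob_size:
            rob.append(cycle + latencies[next_instr])
            next_instr += 1
        cycle += 1
    # Measure between long-instruction retirements in the second half of the run,
    # skipping the start-up transient.
    group = simple_per_group + 1
    start = (groups // 2) * group - 1
    end = groups * group - 1
    return (retired_at[end] - retired_at[start]) / (end - start)


for simple, latency in [(19, 20), (99, 100), (39, 100)]:
    print(simple, latency, round(steady_state_cpi(simple, latency), 4))
```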

Part III: Implementation of T2 (30 points)

1. Consider the following tables that represent the state of a processor that implements what we have called Tomasulo's second algorithm:

RAT:
Arch Reg. # | RAT ROB# (-- if in ARF)

ROB:
Buffer Number | PC | Done with EX? | Dest. Arch Reg # | Value
(Done with EX? column: Y, N, N/A, N, Y)

RS:
RS# | Op type | Op1 ready? | Op1 RoB/value | Op2 ready? | Op2 RoB/value | Dest ROB
0   | ADD     | Yes        | 14            | Yes        |               |
    | Branch  | Yes        | 14            | Yes        |               |

ARF:
Reg# | Value

The branch at PC 16 has been predicted not-taken, but it is actually taken. The destination of the branch is PC 100, where the following code resides:

    R2 = R2 + R2   // A
    R1 = R2 * R1   // B
    R2 = R2 + R1   // C
    R3 = R4 - R5   // D
    R4 = R3 - R1   // E

Show the state of the above tables if instruction A has retired, instruction B has not finished executing, while C, D and E have progressed as far along as possible. Be sure to label the head and tail of the ROB. You are to place instruction A in the RoB in the same place it would have been placed had the prediction been correct. When arbitrary decisions need to be made, you are to just make them. [30]

Note: A set of blank tables is found on the following page. You should cross out the one you don't want graded. If we can't tell which one to grade, we will grade the top one. It is likely you will need to rip out your answer sheet. We will have a stapler available for you to reattach your answer.

Answer:

RAT:
Arch Reg. # | RAT ROB# (-- if in ARF)

ROB:
Buffer Number | PC | Done with EX? | Dest. Arch Reg # | Value
(Done with EX? column: N, N, Y, N)

RS:
RS# | Op type | Op1 ready? | Op1 RoB/value | Op2 ready? | Op2 RoB/value | Dest ROB
0   | Mult    | Yes        | 28            | Yes        |               |
    | Add     | Yes        | 28            | No         |               |
    | Sub     | Yes        | 5             | No         |               |

ARF:
Reg# | Value
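A minimal sketch of the Tomasulo's-2 bookkeeping behind this answer. Dispatch reads each source from the ARF or from a finished RoB entry, or else records a RoB tag. Completion broadcasts the result on the CDB, and retirement copies the head's value to the ARF. All names and the starting ARF values are hypothetical. RoB slots are numbered from 0 rather than placed after the branch as the question requires, and an RS entry is freed only when its instruction completes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# An operand is ("value", v) once ready or ("rob", tag) while waiting on the CDB.
Operand = Tuple[str, int]


@dataclass
class RobEntry:
    name: str
    dest: str                      # architected destination register
    done: bool = False
    value: Optional[int] = None


@dataclass
class RsEntry:
    name: str
    op: str
    src1: Operand
    src2: Operand
    dest_rob: int


class T2:
    """Values live in the RoB; the RAT maps a register to a RoB tag (absent = ARF)."""
    OPS = {"Add": lambda a, b: a + b, "Sub": lambda a, b: a - b, "Mult": lambda a, b: a * b}

    def __init__(self, arf: Dict[str, int]):
        self.arf = dict(arf)
        self.rat: Dict[str, int] = {}
        self.rob: List[RobEntry] = []  # list index doubles as the RoB tag
        self.head = 0
        self.rs: List[RsEntry] = []

    def _read(self, reg: str) -> Operand:
        tag = self.rat.get(reg)
        if tag is None:
            return ("value", self.arf[reg])
        entry = self.rob[tag]
        return ("value", entry.value) if entry.done else ("rob", tag)

    def dispatch(self, name: str, op: str, dest: str, src1: str, src2: str) -> None:
        # Read sources before renaming dest, so "R2 = R2 + R2" sees the old R2.
        s1, s2 = self._read(src1), self._read(src2)
        tag = len(self.rob)
        self.rob.append(RobEntry(name, dest))
        self.rat[dest] = tag
        self.rs.append(RsEntry(name, op, s1, s2, tag))

    def complete(self, name: str) -> None:
        """Execute a ready RS entry and broadcast its result on the CDB."""
        entry = next(e for e in self.rs if e.name == name)
        assert entry.src1[0] == entry.src2[0] == "value", "operands not ready"
        result = self.OPS[entry.op](entry.src1[1], entry.src2[1])
        self.rs.remove(entry)
        rob_entry = self.rob[entry.dest_rob]
        rob_entry.done, rob_entry.value = True, result
        for waiting in self.rs:
            if waiting.src1 == ("rob", entry.dest_rob):
                waiting.src1 = ("value", result)
            if waiting.src2 == ("rob", entry.dest_rob):
                waiting.src2 = ("value", result)

    def retire(self) -> bool:
        """Retire the head of the RoB if it has finished executing."""
        if self.head >= len(self.rob) or not self.rob[self.head].done:
            return False
        entry = self.rob[self.head]
        self.arf[entry.dest] = entry.value
        if self.rat.get(entry.dest) == self.head:
            del self.rat[entry.dest]   # newest value now lives in the ARF
        self.head += 1
        return True


# Hypothetical ARF contents after the mispredict has been recovered (RAT cleared).
cpu = T2({"R1": 3, "R2": 14, "R3": 0, "R4": 9, "R5": 4})
cpu.dispatch("A", "Add", "R2", "R2", "R2")
cpu.dispatch("B", "Mult", "R1", "R2", "R1")
cpu.dispatch("C", "Add", "R2", "R2", "R1")
cpu.dispatch("D", "Sub", "R3", "R4", "R5")
cpu.dispatch("E", "Sub", "R4", "R3", "R1")
cpu.complete("A")   # A broadcasts 28 to B and C
cpu.retire()        # A retires; R2 still maps to C's RoB entry
cpu.complete("D")   # D broadcasts 5 to E; B is still executing
for e in cpu.rs:
    print(e.name, e.op, e.src1, e.src2, "-> RoB", e.dest_rob)
print("RAT:", cpu.rat)
```

With these starting values, the RS rows left for B, C and E show the same ready/not-ready pattern as the Mult, Add and Sub rows in the answer above.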
