The University of Michigan - Department of EECS
EECS370 Introduction to Computer Organization
Midterm Exam 1
October 22, 2009

Name:
University of Michigan uniqname: (NOT your student ID number!)

Open book, open notes. No laptops, PDAs, cell phones, etc. (calculators are ok).

This exam has 6 sets of questions, 15 pages, and 65 points. Questions vary in difficulty; it is strongly recommended that you do not spend too much time on any one question. For questions where a box is provided, please put your final answer in the box.

    Question                          Points
    1  Short questions                /6
    2  Floating Point Arithmetic      /12
    3  Single Cycle Datapath          /12
    4  ISA Design                     /15
    5  MIPS                           /10
    6  Caller/Callee                  /10
    Total                             /65

The rules of the Honor Code of the University of Michigan - College of Engineering apply for this exam.

Honor code pledge: I have neither given nor received aid on this examination, nor have I concealed any violations of the Honor Code.

Signature: (Exams without a signed pledge will not be graded)

Page 1/15
1. Short Answer Questions [6 points]

a) [3 points] The LC2K instruction set lacks a subtract instruction. Show in LC2K assembly how to subtract an operand in register 2 from an operand in register 1, with the result of the subtraction placed in register 3.

        lw   0 4 one      reg4 = 1 (the label one marks a word holding .fill 1)
        nand 2 2 2        reg2 = ~reg2
        add  2 4 2        reg2 = ~reg2 + 1 = -reg2
        add  1 2 3        reg3 = reg1 + (-reg2)

b) [3 points] Which addressing mode does the following sequence of LC2K instructions emulate? Assume the initial value of register 0 is zero.

        lw 0 3 100
        lw 3 3 0
        lw 3 4 0

    a) Register
    b) Base + displacement
    c) Indirect
    d) Double indirect
    e) PC relative
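Both parts of Question 1 can be checked with a tiny interpreter. The following is a minimal sketch, assuming unbounded Python integers stand in for 32-bit registers and a dictionary stands in for memory; the address `ONE` and the memory contents passed to `load_chain` are hypothetical values chosen for illustration. Part (a) relies on the two's complement identity a - b = a + (~b + 1), so the constant added after the NAND must be +1.

```python
def run_lc2k(program, mem, regs):
    """Run a straight-line LC2K fragment given as (opcode, field0, field1, field2)."""
    regs = list(regs)
    for op, f0, f1, f2 in program:
        if op == "lw":        # lw regA regB offset: regB = mem[regA + offset]
            regs[f1] = mem[regs[f0] + f2]
        elif op == "add":     # add regA regB dest: dest = regA + regB
            regs[f2] = regs[f0] + regs[f1]
        elif op == "nand":    # nand regA regB dest: dest = ~(regA & regB)
            regs[f2] = ~(regs[f0] & regs[f1])
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs


ONE = 50  # hypothetical address of a word holding the constant 1


def subtract(a, b):
    """Compute reg3 = reg1 - reg2 using only lw, nand, and add."""
    program = [
        ("lw", 0, 4, ONE),    # reg4 = 1
        ("nand", 2, 2, 2),    # reg2 = ~reg2
        ("add", 2, 4, 2),     # reg2 = ~reg2 + 1 = -reg2
        ("add", 1, 2, 3),     # reg3 = reg1 - reg2
    ]
    return run_lc2k(program, {ONE: 1}, [0, a, b, 0, 0, 0, 0, 0])[3]


def load_chain(mem):
    """Run the three dependent loads from part (b) and return register 4."""
    program = [("lw", 0, 3, 100), ("lw", 3, 3, 0), ("lw", 3, 4, 0)]
    return run_lc2k(program, mem, [0] * 8)[4]


print(subtract(17, 5))
print(load_chain({100: 200, 200: 300, 300: 42}))
```

In `load_chain`, the word at address 100 is used as the address of a second word, whose contents are in turn used as the address of the data, so three dependent memory reads are needed to reach the final value.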
2. Floating Point Arithmetic [12 points]

We have constructed a new 10-bit floating point format:

    1 bit sign
    3 bit exponent with a bias of 3
    6 bit mantissa (a.k.a. significand)

All other aspects of this format are exactly the same as the standard IEEE floating point studied in class.

You must show your work on this problem to be eligible for partial credit. If you need more space, attach another sheet, but label it clearly.

a) [2 points] What is the largest number that can be represented exactly in this format? Give both its floating point and decimal representations.

Floating point:

    S  E E E  M M M M M M
    0  1 1 1  1 1 1 1 1 1

Decimal: 1.111111 * 2^4 = 11111.11 (binary) = 31.75

b) [4 points] Convert the following two floating point values to decimal:

    S  E E E  M M M M M M
    0  1 1 0  0 1 0 0 0 1

Decimal: 1.010001 * 2^3 = 1010.001 (binary) = 10.125

    S  E E E  M M M M M M
    1  0 1 0  0 1 0 0 0 0
Decimal: -1.01 * 2^-1 = -0.101 (binary) = -0.625

c) [4 points] Multiply the two floating point numbers given in part (b) and report your result in the floating point format:

Product:

    S  E E E  M M M M M M
    1  1 0 1  1 0 0 1 0 1

Sign is negative.

Exponent: 6 + 2 - 3 = 5 (biased), which corresponds to 2^2.

Mantissa multiplication:

          1.010001
        x 1.01
        ----------
           1010001
          0
        101000100
        ----------
        1.10010101

Note that the low-order two bits are truncated because of limited mantissa space. Hence, the product is -1.100101 * 2^2 = -110.0101 (binary) = -6.3125.

d) [2 points] Because of limitations on the number of bits in the mantissa, floating point calculations often lose precision. What is the absolute difference between your answer for part (c) and the exact product of the two numbers given in part (b)?

The exact product is 10.125 * -0.625 = -6.328125. The difference is 0.015625, or 1/64. This can easily be determined by examining the value of the bits truncated during the multiplication: the dropped bits 01 sit two places below the last mantissa bit, so after scaling by 2^2 they are worth 2^-6.
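The decoding and the truncating multiply in Question 2 can be reproduced with exact rational arithmetic. This is a minimal sketch, not a full IEEE implementation: the function names are hypothetical, only normalized values are handled, exponent overflow is not checked, and, following the answer key for part (a), the all-ones exponent is treated as an ordinary exponent.

```python
from fractions import Fraction

EXP_BITS, MAN_BITS, BIAS = 3, 6, 3


def decode(bits):
    """Decode a 10-character string 'SEEEMMMMMM' (normalized values only)."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:1 + EXP_BITS], 2) - BIAS
    # Restore the implicit leading 1 in front of the 6 stored mantissa bits.
    significand = Fraction((1 << MAN_BITS) | int(bits[1 + EXP_BITS:], 2), 1 << MAN_BITS)
    return sign * significand * Fraction(2) ** exponent


def multiply_truncating(x_bits, y_bits):
    """Multiply two encoded values, truncating the significand to 6 bits."""
    sign = int(x_bits[0]) ^ int(y_bits[0])
    exp = int(x_bits[1:4], 2) + int(y_bits[1:4], 2) - BIAS   # biased exponent
    mx = (1 << MAN_BITS) | int(x_bits[4:], 2)   # integer significand, 6 fraction bits
    my = (1 << MAN_BITS) | int(y_bits[4:], 2)
    prod = mx * my                               # 12 fraction bits
    if prod >> (2 * MAN_BITS + 1):               # significand product >= 2.0
        prod >>= 1                               # renormalize to 1.xxx
        exp += 1
    man = (prod >> MAN_BITS) & ((1 << MAN_BITS) - 1)  # drop the low 6 bits
    return f"{sign}{exp:03b}{man:06b}"


x, y = "0110010001", "1010010000"
product = multiply_truncating(x, y)
print(decode(x), decode(y))
print(product, decode(product))
print(abs(decode(product) - decode(x) * decode(y)))
```

Using `Fraction` keeps every intermediate value exact, so the printed difference is the truncation error itself rather than an artifact of Python's own floating point.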
3. Single-Cycle Datapath [12 points]

The figure on page 9 illustrates the single-cycle LC2K architecture discussed in class.

a) [4 points] Assume we want to have a combinational circuit (the box labeled "0?" inside the dashed circle) that takes the (32-bit) result of the ALU as input and outputs a single bit, Z, which is 1 if and only if the result of the operation done by the ALU is zero. For example, if you execute add 1 2 3 and registers 1 and 2 hold the values 5 and -5 respectively, then Z will be 1 in that cycle. Sketch the design for this circuit. You may use basic logic gates (e.g., NAND, AND, OR, Inverter) with any number of inputs.

Solution: All 32 bits of the ALU result go into an OR gate, and the output of that gate is inverted.

b) [8 points] Suppose we want to add a new instruction, lbr (loop branch), to LC2K and assign it opcode 7 (111 binary), which was previously unused. lbr has two operands: R and a 16-bit displacement, which are stored in bits 21-19 and 15-0 respectively. (Similar to I-type instructions, except bits 18-16 are not used.) lbr decrements register R by 1, and if the new value of register R is not 0, it branches to PC + 1 + displacement. Note that regardless of whether or not the branch is taken, R should be updated to contain the decremented value.

i. Modify the figure on page 9 to show what extra circuitry must be added to implement this instruction.

ii. The following table shows the contents of the control ROM. Fill in the values for line 7. If you need to add control signals in your design, add a new column for each and fill in the entries of the new column for every line of the ROM. If you include multiplexers (MUXes) with more than one select line, be sure to indicate which line is the high-order and which line is the low-order select input.
Columns C8 and C9 are used for new control signals. The contents of C0-C7 for lines 0-6 are not important to us.

              C0  C1  C2  C3  C4  C5  C6  C7  |  C8  C9
    Line 0    (not important)                 |  0   0
    Line 1    (not important)                 |  0   0
    Line 2    (not important)                 |  0   0
    Line 3    (not important)                 |  0   0
    Line 4    (not important)                 |  0   0
    Line 5    (not important)                 |  0   0
    Line 6    (not important)                 |  0   0
    Line 7    0   1   0   1   0   0   0   X   |  1   1

iii. Provide a brief explanation of the sequence of events that take place in your design to execute lbr.

Add -1 as an input to the MUX before the ALU, and add a new control signal C8 for that MUX. Add an AND gate whose inputs are the three opcode bits and the inverted output of Z. Connect the output of this gate to an OR gate along with the BEQ AND gate's output, and route the output of the OR gate to the MUX control signal that the BEQ AND gate's output was originally connected to. Connect bits 21-19 to the write MUX for the register file, and add control signal C9 for that MUX.

Assume: new inputs to a MUX go to the bottom; new control signals go to the left (and are most significant).

With the proper control signals: R is added to -1 and Z determines whether the result is 0. If it is not, and the instruction is lbr, PC + 1 + offset is sent to the PC. R - 1 is sent back to the register file.
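The behavior of the zero detector from part (a) and of lbr from part (b) can be modeled in a few lines. This is a behavioral sketch only, assuming 32-bit registers held in a Python list; the function names are hypothetical and the model says nothing about the actual wiring or control-ROM encoding.

```python
MASK32 = 0xFFFFFFFF


def zero_flag(alu_result):
    """Z = NOT(OR of all 32 ALU result bits)."""
    any_bit_set = 0
    for i in range(32):
        any_bit_set |= (alu_result >> i) & 1   # 32-input OR, one bit at a time
    return 1 - any_bit_set                     # inverter on the OR output


def lbr_step(pc, regs, r, displacement):
    """Execute one lbr: R <- R - 1, then branch if the new R is nonzero."""
    alu_result = (regs[r] + (-1)) & MASK32     # ALU adds the new -1 MUX input
    z = zero_flag(alu_result)
    regs[r] = alu_result                       # written back whether or not we branch
    return pc + 1 + displacement if z == 0 else pc + 1


# An lbr at address 5 that branches back to itself (displacement -1).
regs = [0, 3, 0, 0, 0, 0, 0, 0]
pc, executions = 5, 0
while pc == 5:
    pc = lbr_step(pc, regs, 1, -1)
    executions += 1
print(executions, pc, regs[1])
```

With register 1 starting at 3, the instruction runs three times before falling through, which is the loop-counter behavior lbr is meant to provide.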
4. ISA Design [15 points]

You are the Chief Architect at Broken Arrows, a company specializing in low-overhead processor design. The ISA for the company's flagship processor is documented in the tables below. Instructions are 11 bits long. Data and memory addresses are all 8 bits long. The design is a CISC, byte-addressable instruction set. There is one single register, the AllAlone register (called AA), and a stack to assist with all computation. There are 3 instruction formats, explained below.

R-type Instructions

    Bits 10-3    Bits 2-0
    unused       opcode

    Instruction  Opcode  Action
    Pop          001     Pop the [top] value from the stack and store it in AA, resulting in one less value on the stack.
    Push         010     Push the value of AA onto the top of stack, resulting in one more value on the stack.
    Halt         011     Halt the processor.

I-type Instructions

    Bits 10-3                      Bits 2-0
    2's complement immed (IMM)     opcode

    Instruction  Opcode  Action
    Pushi        000     Push the sign-extended immediate value onto the stack.
    Beqnz        100     Pop the [top] entry from the stack. If it is not zero, start executing at PC+1+IMM, where IMM is the sign-extended immediate value. Otherwise, execute the next instruction at PC+1.
    LoadAdd      101     Form a memory address by popping the [top] stack entry and adding it to the sign-extended immediate value IMM. Load the word at that address, add 1 to it, and push the result onto the stack.
    StoreSub     110     Form a memory address by popping the [top] stack entry and adding it to the sign-extended immediate value IMM. Pop the [top-1] stack entry, subtract one, and store it to that address in memory.
    Tadt         111     Tadt stands for Test-And-Divide-by-Two. Form a memory address by popping the [top] stack entry and adding it to the sign-extended immediate value IMM. Compare the value of AA with the value stored at that memory address. If the value in AA is less than or equal to the memory value, the value in AA is divided by 2.
Q-type Instructions

    Bits 10-9    Bits 8-3    Bits 2-0
    ALU op       unused      opcode

    Instruction  Opcode  ALUOp  Action
    StAdd        001     11     Remove the [top] value from the stack, add one and push onto the top of stack.
    StSub        001     01     Remove the [top] value from the stack, subtract one and push onto the top of stack.
    StNand       001     10     Remove the [top] value from the stack, NAND with the value stored in AA and push the result onto the top of stack.

a) [3 points] Translate the following instructions into machine code.

    Assembly      Binary            Hexadecimal
    LoadAdd -4    0111 1110 0101    0x7e5
    Tadt -20      0111 0110 0111    0x767
    StNand        0100 0000 0001    0x401

b) [4 points] The loadadd instruction uses base + displacement memory addressing mode. You are asked to design a new pseudo-instruction loaddir, which uses a direct memory addressing mode. The memory address for loaddir is specified using the 8-bit IMM field as an unsigned address.

(i) What is the range of values that the immediate field (IMM) can encode for the loadadd instruction?

Solution: -128 [min value]    127 [max value]

(ii) What is the range of values that the immediate field (IMM) can encode for the new loaddir pseudo-instruction?

Solution: 0 [min value]    255 [max value]
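The encodings in part (a) follow mechanically from the three instruction formats. The sketch below is one way to write that translation, assuming the opcode and ALUOp values from the tables; the `encode` function and its dictionaries are hypothetical helpers, not part of the exam.

```python
OPCODES = {
    "pop": 0b001, "push": 0b010, "halt": 0b011,              # R-type
    "pushi": 0b000, "beqnz": 0b100, "loadadd": 0b101,        # I-type
    "storesub": 0b110, "tadt": 0b111,
}
ALU_OPS = {"stadd": 0b11, "stsub": 0b01, "stnand": 0b10}     # Q-type, opcode 001
R_TYPE = {"pop", "push", "halt"}


def encode(mnemonic, imm=0):
    """Return the 11-bit machine word for one instruction."""
    name = mnemonic.lower()
    if name in ALU_OPS:                       # Q-type: ALUOp in bits 10-9
        return (ALU_OPS[name] << 9) | 0b001
    opcode = OPCODES[name]
    if name in R_TYPE:                        # R-type: bits 10-3 unused
        return opcode
    if not -128 <= imm <= 127:                # I-type: 8-bit signed immediate
        raise ValueError("immediate out of range")
    return ((imm & 0xFF) << 3) | opcode       # two's complement IMM in bits 10-3


for text, word in [("LoadAdd -4", encode("LoadAdd", -4)),
                   ("Tadt -20", encode("Tadt", -20)),
                   ("StNand", encode("StNand"))]:
    print(f"{text:12s} {word:011b} {word:#05x}")
```

The range check in `encode` is the signed range from part (b)(i); a loaddir encoder would instead accept 0 through 255 and skip the two's complement masking.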
c) [8 points] Suppose a number (not zero) in 2's complement form is stored at memory address 100. Write a short assembly program using the new ISA design described before to find out if the number is even or odd. If the number is even, do nothing. If the number is odd, shift it right once. Store the result back (in either case) at memory location 101.

[Note: You cannot assume any data stored in memory, unless specified. Also, you cannot use the pseudo-instruction loaddir.]

Solution:

     0  Pushi 100      // block 1
     1  LoadAdd 0
     2  StSub
     3  Pop
     4  Pushi 1        // block 2
     5  StNand
     6  Pushi 1
     7  StNand         // block 2
     8  Pop            // store one copy of value in AA
     9  StNand
    10  Pushi 100      // block 1
    11  LoadAdd 0
    12  StSub
    13  Pop            // restore the original value back in AA
    14  Beqnz 2
    15  Pushi 1
    16  Beqnz 2
    17  Pushi 100
    18  Tadt 0
    19  Push
    20  StAdd
    21  Pushi 101
    22  StoreSub 0
    23  Halt
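The solution program can be traced with a small simulator of the stack ISA. This is a minimal sketch under stated assumptions: values are unbounded Python integers rather than 8-bit words, memory is a dictionary indexed by address, and Tadt's divide-by-two is modeled as an arithmetic right shift. The `run` and `even_odd` names are hypothetical.

```python
PROGRAM = [
    ("Pushi", 100), ("LoadAdd", 0), ("StSub", 0), ("Pop", 0),      # 0-3: AA = M[100]
    ("Pushi", 1), ("StNand", 0), ("Pushi", 1), ("StNand", 0),      # 4-7
    ("Pop", 0), ("StNand", 0),                                     # 8-9: top = AA & 1
    ("Pushi", 100), ("LoadAdd", 0), ("StSub", 0), ("Pop", 0),      # 10-13: AA = M[100]
    ("Beqnz", 2), ("Pushi", 1), ("Beqnz", 2),                      # 14-16
    ("Pushi", 100), ("Tadt", 0),                                   # 17-18: halve if odd
    ("Push", 0), ("StAdd", 0), ("Pushi", 101), ("StoreSub", 0),    # 19-22: M[101] = AA
    ("Halt", 0),                                                   # 23
]


def run(program, mem, max_steps=1000):
    """Interpret the stack ISA until Halt; returns the final memory."""
    stack, aa, pc = [], 0, 0
    for _ in range(max_steps):
        op, imm = program[pc]
        next_pc = pc + 1
        if op == "Halt":
            return mem
        elif op == "Pushi":
            stack.append(imm)
        elif op == "Push":
            stack.append(aa)
        elif op == "Pop":
            aa = stack.pop()
        elif op == "Beqnz":
            if stack.pop() != 0:
                next_pc = pc + 1 + imm
        elif op == "LoadAdd":
            stack.append(mem[stack.pop() + imm] + 1)
        elif op == "StoreSub":
            addr = stack.pop() + imm
            mem[addr] = stack.pop() - 1
        elif op == "Tadt":
            addr = stack.pop() + imm
            if aa <= mem[addr]:
                aa >>= 1          # assumption: divide by 2 as arithmetic shift
        elif op == "StAdd":
            stack.append(stack.pop() + 1)
        elif op == "StSub":
            stack.append(stack.pop() - 1)
        elif op == "StNand":
            stack.append(~(stack.pop() & aa))
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc = next_pc
    raise RuntimeError("program did not halt")


def even_odd(n):
    """Run the solution with n at address 100 and return what lands at 101."""
    return run(PROGRAM, {100: n})[101]


print(even_odd(6), even_odd(7))
```

The trace shows why the program pairs LoadAdd with StSub and StAdd with StoreSub: each load adds one and each store subtracts one, so the extra instruction cancels the built-in adjustment.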
5. MIPS [10 points]

Your friend has asked you to debug his MIPS code. His professor has asked him to implement the SAXPY code. SAXPY (Scalar Alpha X Plus Y) is one of the functions in the Basic Linear Algebra Subprograms (BLAS) package, and is a common operation in computations with vector processors. SAXPY is a combination of scalar multiplication and vector addition, as defined by the algorithm below:

    //x starts at mem address = 500
    int x[10] = { 100, 122, 58, 123, 91, 110, 86, 54, 37, 42};
    int y[10] = { 120, 16, 83, 130, 71, 10, 99, 78, 32, 63};
    ...
    int a = 150; // mem address = 700

    main(void) {
        int i;
        for (i=0; i < 10; i++) {
            y[i] = a*x[i] + y[i];
        }
    }

    start:  li   $r5,0
            li   $r10,700
            lw   $r6, 0($r10)
            li   $r7,500
            li   $r1,510
            li   $r4,10
    loop:   lw   $r2,0($r7)
            mult $r2,$r6
            mflo $r2
            lw   $r3,0($r1)
            add  $r3,$r2,$r2
            sw   $r2,0($r1)
            addi $r7,$r7,1
            addi $r1,$r1,1
            addi $r5,$r5,1
            beq  $r4,$r5,loop
            halt
What is wrong with your friend's code? Write down which instruction or instructions are causing the code to fail. Explain what needs to be changed or what needs to be added.

Notes: there may be more than one problem with the above code. The li instruction is a load immediate, in which the register gets loaded with an immediate value. When the mult instruction is run, assume no overflow of the 32-bit register.

The first incorrect instruction is li $r1,510. We know that x starts at 500 and extends to 539 (10 elements of 4 bytes each, 40 bytes). Thus y should start at 540 and extend to 579, so the instruction should read:

        li   $r1,540

The second error is in the add $r3,$r2,$r2 instruction. In MIPS, the destination register is the first register specified. In this code, we are trying to add the a*x[i] term to y[i] and store the result in $r2, which gets saved to memory by the next line. However, this line saves the value of $r2 + $r2 into $r3, which is incorrect. We should change it to:

        add  $r2,$r3,$r2
    OR
        add  $r2,$r2,$r3

Two more errors exist in the addi $r7,$r7,1 and addi $r1,$r1,1 lines. These lines compute the addresses from which the next array elements are loaded. Since each int occupies 4 bytes and memory is byte-addressed, we need to increment the memory address by 4, not by 1. Thus the instructions should be:

        addi $r7,$r7,4
        addi $r1,$r1,4

The final error is the beq instruction. This should be bne, since we are comparing the loop counter to 10, which is the number of elements in the array, and the loop must continue while they differ. With beq, the program would halt after one iteration. Thus it should read:

        bne  $r4,$r5,loop
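The corrected loop can be checked against the C definition by mirroring it register by register. This is a sketch under stated assumptions: memory is a dictionary keyed by byte address with one int per 4-byte-aligned key, y is placed immediately after x at address 540 as the solution argues, and `saxpy_fixed` is a hypothetical name.

```python
X = [100, 122, 58, 123, 91, 110, 86, 54, 37, 42]
Y = [120, 16, 83, 130, 71, 10, 99, 78, 32, 63]


def make_memory():
    """Lay out x at 500, y at 540, and a at 700 (4 bytes per int)."""
    mem = {700: 150}
    for i in range(10):
        mem[500 + 4 * i] = X[i]
        mem[540 + 4 * i] = Y[i]
    return mem


def saxpy_fixed(mem):
    """Mirror of the corrected MIPS code, one statement per instruction."""
    r5 = 0                 # li   $r5,0        loop counter i
    r6 = mem[700]          # lw   $r6,0($r10)  with $r10 = 700
    r7 = 500               # li   $r7,500      address of x[0]
    r1 = 540               # li   $r1,540      address of y[0] (was 510)
    r4 = 10                # li   $r4,10       element count
    while True:
        r2 = mem[r7]       # lw   $r2,0($r7)
        r2 = r2 * r6       # mult $r2,$r6 ; mflo $r2
        r3 = mem[r1]       # lw   $r3,0($r1)
        r2 = r3 + r2       # add  $r2,$r3,$r2  (was add $r3,$r2,$r2)
        mem[r1] = r2       # sw   $r2,0($r1)
        r7 += 4            # addi $r7,$r7,4    (was 1)
        r1 += 4            # addi $r1,$r1,4    (was 1)
        r5 += 1            # addi $r5,$r5,1
        if r4 == r5:       # bne  $r4,$r5,loop falls through when equal
            break
    return mem


result = saxpy_fixed(make_memory())
print([result[540 + 4 * i] for i in range(10)])
```

Each of the four fixes maps to one commented line, so reverting any single one (for example, stepping the addresses by 1) makes the comparison with the C loop fail.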
6. Caller/Callee [10 points]

Suppose you are given the following code:

    int foo() {
        int a = 20, b = 10, c, i, j = 3;
        char *p = "Hello World\n";
        i = 0;
        bar(b);
        for (i = 0; i < 17; ++i) {
            c += a;
            b = bar(b);
        }
        c += j;
        return c;
    }

    int bar(int j) {
        int a = j;
        int b = a + 5;
        while (b > a) {
            printf("%d\n", b);
            --b;
        }
        return a;
    }

The architecture that you are using contains 3 caller-saved registers ($1, $2, and $3) and 3 callee-saved registers ($4, $5, and $6).
a) [2 points] Assume the following register assignments for bar( ): a -> $1, b -> $2. What register should be assigned to each variable in foo( ) to minimize the total number of executed save/restore instructions? (NOTE: a single save or restore instruction in code can be executed multiple times if, for example, it resides within a loop.) Fill in the table below.

    Variable    Register in foo( )
    a           $4
    b           $1
    c           $5
    i           $6
    j           $2
    p           $3

b) [2 points] How many save/restore instruction pairs are executed during a single call to foo( ) using the assignments in part a)? Note that you will need to consider bar( ) as well, since we are looking for the total number (ignore printf for the purpose of counting).

For function foo:

    a needs to be saved and restored once. (+1)
    b doesn't need to be saved or restored.
    c needs to be saved and restored once. (+1)
    i needs to be saved and restored once. (+1)
    j needs to be saved and restored once. (+1)
    p doesn't need to be saved or restored.

For function bar, which is called once outside the loop and 17 times inside the loop:

    a needs to be saved and restored 18*5 times (+90)
    b needs to be saved and restored 18*5 times (+90)

A total of 184 save/restore pairs are needed, or 368 total instructions.
c) [2 points] Now assume the following register assignments are made for bar( ): a -> $4, b -> $5. Repeat part a) for this configuration.

    Variable    Register in foo( )
    a           $4
    b           $1
    c           $5
    i           $6
    j           $2
    p           $3

No change is needed from part a). It may be tempting to re-assign callee-saved registers to unused variables like p so that bar doesn't need to preserve them, but remember that bar can potentially be called from another function that requires those registers to be preserved.

d) [2 points] Repeat part b) with the configuration given in part c).

Function foo is unchanged (4 total save/restore pairs).

For function bar, which is called once outside the loop and 17 times inside the loop:

    a needs to be saved and restored 18*1 times (+18)
    b needs to be saved and restored 18*1 times (+18)

A total of 40 save/restore pairs are needed, or 80 total instructions.

e) [2 points] What can you conclude about bar's register assignments and the effect they have on optimizing foo?

Regardless of how bar assigns its variables to registers, the optimal register assignment for foo remains the same.
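The totals in parts b) and d) come from multiplying how often each save/restore pair executes. The tally below is a sketch that follows the answer key's counting model: 4 pairs in foo in both configurations, one pair per bar variable around every printf call when bar uses caller-saved registers, and one pair per bar variable per call when it uses callee-saved registers. The function names are hypothetical.

```python
def printf_calls_in_bar(j):
    """Count iterations of bar's while loop (one printf call each)."""
    a = j
    b = a + 5
    calls = 0
    while b > a:
        calls += 1
        b -= 1
    return calls


def total_pairs(bar_uses_callee_saved):
    """Save/restore pairs executed during one call to foo."""
    bar_calls = 1 + 17                 # once before the loop, 17 times inside it
    foo_pairs = 4                      # a, c, i, and j in foo
    if bar_uses_callee_saved:
        per_variable = bar_calls       # saved once on entry, restored once on exit
    else:
        # caller-saved: saved and restored around every printf call
        per_variable = bar_calls * printf_calls_in_bar(10)
    return foo_pairs + 2 * per_variable    # bar has two variables, a and b


for label, flag in [("part b", False), ("part d", True)]:
    pairs = total_pairs(flag)
    print(label, pairs, "pairs,", 2 * pairs, "instructions")
```

The argument passed to `printf_calls_in_bar` does not matter: the loop always runs 5 times because b starts exactly 5 above a.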