Department of Electrical and Computer Engineering
University of Wisconsin-Madison
ECE 552 Introduction to Computer Architecture
Homework #2 (Suggested Solution)

1. (10 points) MIPS and C program translations

Translate the following C statements into MIPS assembly code, using a minimum number of MIPS assembly instructions.

(a) (4 points) Assume that the variables a, b, c, and d are assigned to registers $s0, $s1, $s2, and $s4 respectively.

    i)  a = -b - a
    ii) c = d + (-c - 5)

    i)  add  $s0, $s0, $s1     # a = a + b
        sub  $s0, $0, $s0      # a = 0 - (a + b)

    ii) addi $s2, $s2, 5       # c = c + 5
        sub  $s2, $s3, $s2     # c = d - (c + 5)

Grading: Each part is worth 2 pts. If the answer is correct but needs more than 2 instructions, 1 pt off. If the answer is wrong, 2 pts off. In the last line, $s4 instead of $s3 will also receive full marks.

(b) (6 points) Assume the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4 respectively, and the base addresses of the arrays A and B are in registers $s6 and $s7 respectively.

    i)  f = -g - A[4];
    ii) B[8] = A[i - j];    (assume i - j is in the range of indices of array A)

    (i)  lw   $s0, 16($s6)     # load A[4] to $s0
         add  $s0, $s0, $s1    # add g to A[4]
         sub  $s0, $0, $s0     # 0 - (g + A[4])

    (ii) sub  $t0, $s3, $s4    # t0 = i - j
         sll  $t0, $t0, 2      # t0 = 4 * (i - j), the byte offset
         add  $t0, $t0, $s6    # t0 = addr of A[i-j]
         lw   $t1, 0($t0)      # t1 = A[i-j]
         sw   $t1, 32($s7)     # B[8] = A[i-j]

Grading: 3 pts each part. If the answer is correct but uses more instructions, 1 pt off. A wrong answer is 3 pts off.

2. (10 points) MIPS assembly code to C code conversion

Assume that initially variables a, b, c, d, and e are assigned to registers $s0, $s1, $s2, $s3, and $s4 respectively. The values of a, b, c, d, and e are all multiples of 4 (can be divided evenly by 4) and greater than 0. Assume that the base addresses of arrays P and Q are in registers $s6 and $s7 respectively.

(a) (4 points) For the MIPS assembly instructions below, what are the corresponding C statements?

    sll  $s2, $s4, 1
    add  $s0, $s2, $s3
    add  $s0, $s0, $s1

The corresponding C statement is a = b + d + 2*e.

    sll  $s2, $s4, 1       # c = 2*e
    add  $s0, $s2, $s3     # a = c + d = 2e + d
    add  $s0, $s0, $s1     # a = a + b = b + d + 2e

Grading: Each mistake is 1 pt off until all 4 pts are taken. If only the answer is given and the answer is wrong, all 4 pts are off. A correct answer alone gets full marks.

(b) (6 points) For the MIPS assembly instructions below, what is the corresponding C statement? Assume part (b) is independent of part (a).

    add  $t0, $s6, $s0
    add  $t1, $s7, $s1
    lw   $s0, 0($t0)
    addi $t2, $t0, 4
    lw   $t0, 0($t2)
    add  $t0, $t0, $s0
    sw   $t0, 0($t1)

Q[b/4] = P[a/4] + P[a/4 + 1].

    add  $t0, $s6, $s0     # t0 = base addr of P + a = addr of P[a/4]
    add  $t1, $s7, $s1     # t1 = base addr of Q + b = addr of Q[b/4]
    lw   $s0, 0($t0)       # a = P[a/4]
    addi $t2, $t0, 4       # t2 = addr of P[a/4 + 1]
    lw   $t0, 0($t2)       # t0 = P[a/4 + 1]
    add  $t0, $t0, $s0     # t0 = P[a/4] + P[a/4 + 1]
    sw   $t0, 0($t1)       # save t0 to Q[b/4]

Grading: The division by 4 (/4) in the indices is essential. If the index expression is wrong, the whole answer is considered wrong and all 6 pts are off.

3. (15 points) Assume that an array is stored in the memory of a MIPS processor as follows:

    Addr   20   24   28   32   36
    Data    4    5    3    2    1

(a) (5 points, CC) Write C code to sort the data from lowest to highest, placing the lowest value in the smallest memory location shown in the table above. Assume the data shown represents the C variable called Array, which is an array of type int, and that the first number in the array shown is the first element in the array. Assume the memory is byte addressable, and each word consists of 4 bytes.

    temp  = Array[0];       // temp  = 4
    temp2 = Array[1];       // temp2 = 5
    Array[0] = Array[4];    // M[20] = 1
    Array[1] = Array[3];    // M[24] = 2
    Array[3] = temp;        // M[32] = 4
    Array[4] = temp2;       // M[36] = 5

Grading: There may be other valid solutions. If the answer is reasonable, full marks will be given.

(b) (5 points, CC) For the memory address list in the table above, write MIPS code to sort the data from lowest to highest, placing the lowest value in the smallest memory location. Use a minimum number of MIPS instructions. Assume the base address of the C variable Array is stored in register $s6.

    lw  $t0, 0($s6)      # t0 <- Array[0] = M[20] = 4
    lw  $t1, 4($s6)      # t1 <- Array[1] = M[24] = 5
    lw  $t2, 16($s6)     # t2 <- Array[4] = M[36] = 1
    sw  $t2, 0($s6)      # M[20] = Array[0] <- [t2] = 1
    lw  $t2, 12($s6)     # t2 <- Array[3] = M[32] = 2
    sw  $t2, 4($s6)      # M[24] = Array[1] <- [t2] = 2
    sw  $t0, 12($s6)     # M[32] = Array[3] <- [t0] = 4
    sw  $t1, 16($s6)     # M[36] = Array[4] <- [t1] = 5

Grading: Completion credit. If the answer is reasonable, full marks will be given.

(c) (5 points) Consider the hexadecimal number 0xABCDEF12. i) Translate this number into decimal and ii) show how the data would be arranged in memory starting from address 0 in a little-endian machine and a big-endian machine.

(i) The corresponding decimal number is 2882400018, and (ii) the memory map is as follows:

    Addr.                   0    1    2    3
    Data (little endian)   12   EF   CD   AB
    Data (big endian)      AB   CD   EF   12

Grading: (i) 2 pts, answer only. (ii) 1.5 pts each. All parts must be correct. Answers only. Addresses 0, 8, 16, 24 are accepted if bit addressing is assumed; 0, 1, 2, 3 are the actual addresses in a byte-addressable memory. Both will be given credit.

4. (20 points) MIPS Instruction Sets

(a) (12 points) Translate the assembly program into hexadecimal machine code. Then give the binary representation in the corresponding R, I, or J format according to the respective field partitions:

    R:  Opcode  rs  rt  rd  Shamt  funct
    I:  Opcode  rs  rt  immediate
    J:  Opcode  address

If a field in the instruction is unused, its value is assumed to be 0. Unless otherwise specified, numbers are in decimal representation.

    Assembly instruction
    j    200000ten
    nor  $t2, $s2, $s3
    sw   $t2, 4($s0)

    j 200000ten                                0x08030D40
        Opcode   Address
        000010   00 0000 0011 0000 1101 0100 0000

    nor $t2, $s2, $s3                          0x02535027
        Opcode   rs      rt      rd      Shamt   Funct
        000000   10010   10011   01010   00000   100111

    sw $t8, 28($s7)                            0xAEF8FFF3
        Opcode   rs      rt      Immediate
        101011   10111   11000   1111 1111 1111 0011

Grading: 4 pts each. For each instruction, each mistake costs 1 pt until all 4 pts are off.

(b) (8 points) Assume at clock cycle t, the content of PC = 0xD0004000. The content of the program memory pointed to by PC is MEM[PC] = 0x080009C4. In a single-cycle MIPS datapath, what is the content of PC in hex format at clock cycle t + 1?

0x080009C4 = 0000 1000 0000 0000 0000 1001 1100 0100. The opcode (upper 6 bits) is 000010 = 2H, so this is a jump instruction. The remaining 26 bits are offset = 00 0000 0000 0000 1001 1100 0100. Next, (PC+4)[31:28] = 0xD0004004[31:28] = 0xD. Thus the new PC content is {(PC+4)[31:28], offset, 2'b00} = 0xD0002710.

Grading: jump instruction 3 pts, jump address 5 pts.

5. (20 points)

(a) (5 points) Write a (shortest) sequence of regular MIPS instructions to realize a new pseudo-instruction push $s0. This instruction will save the content of $s0 onto the stack.

    addi $sp, $sp, -4      # grow the stack by one word (the stack grows downward)
    sw   $s0, 0($sp)       # store $s0 at the new top of the stack

Grading: If the answer is correct but has more than 2 instructions, deduct 1 pt for each extra instruction until all 5 pts are off. If the answer is wrong, all 5 pts are off.

(b) (5 points) Suppose the contents of registers $s0 and $s1 are [$s0] = 0xFFFFFFFF and [$s1] = 0x00000001. What are the values in registers $t0 and $t1 after these two instructions?

    slt  $t0, $s0, $s1     # signed comparison
    sltu $t1, $s0, $s1     # unsigned comparison

For the signed comparison, [$s0] = -1 and [$s1] = +1. Since -1 < 1, $t0 = 1. For the unsigned comparison, [$s0] = 2^32 - 1 > [$s1] = +1. Thus $t1 = 0.

Grading: $t0: 2 pts, $t1: 3 pts
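
The two comparison results in part 5(b) can be checked with a short C sketch. This is only an illustration, not part of the suggested solution: the function names slt, sltu, and check_part_5b are hypothetical helpers introduced here, and the cast to int32_t assumes the usual two's-complement conversion that common compilers perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of MIPS slt: treat both registers as signed
 * 32-bit values. The cast relies on two's-complement conversion
 * (an assumption that holds on common compilers). */
uint32_t slt(uint32_t rs, uint32_t rt) {
    return ((int32_t)rs < (int32_t)rt) ? 1u : 0u;
}

/* Hypothetical model of MIPS sltu: treat both registers as unsigned
 * 32-bit values. */
uint32_t sltu(uint32_t rs, uint32_t rt) {
    return (rs < rt) ? 1u : 0u;
}

/* Re-checks the register values from part 5(b). */
void check_part_5b(void) {
    uint32_t s0 = 0xFFFFFFFFu;  /* -1 when signed, 2^32 - 1 when unsigned */
    uint32_t s1 = 0x00000001u;

    assert(slt(s0, s1) == 1u);   /* -1 < 1, so $t0 = 1 */
    assert(sltu(s0, s1) == 0u);  /* 2^32 - 1 > 1, so $t1 = 0 */
}
```

Calling check_part_5b() from a main function should complete without tripping an assertion, which matches the answers $t0 = 1 and $t1 = 0 given above.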

(c) (5 points) Assume that initially [$t0] = 0x00000000. What is the value of $t0 after execution of the following two MIPS assembly instructions?

    ori  $t0, $t0, 0x0002
    lui  $t0, 0xAB10

[$t0] = 0xAB100000. Note that the lui instruction fills the lower 16 bits of the target register with 0x0000, wiping out the 0x0002 value placed in [$t0] by the ori instruction.

Grading: Answer only.

(d) (5 points) If the current content of PC is 0x00000000 (hex), what is the highest PC value a single MIPS jump instruction may realize (give the answer in hex format)?

The jump address is the upper 4 bits of the PC concatenated with the 26 bits from the instruction, followed by two 0s since instructions sit on word boundaries. Thus, the highest address a single MIPS jump instruction may jump to is 0x00000000 + 0x0FFFFFFC = 0x0FFFFFFC.

Grading: Answer only.

6. (10 points) Consider the following MIPS assembly code and answer the following questions.

    LOOP: addi $s2, $s2, 2
          subi $t1, $t1, 1
          bne  $t1, $0, LOOP

(a) (5 points) Assume the contents of registers $t1 and $s2 are 10 and 0 respectively prior to the execution of the above loop. What is the value of $s2 after completion of the loop?

The loop will be executed 10 times. During each iteration of the loop, $s2 is incremented by 2. Hence the end value of $s2 is 20.

Grading: executing the loop 10 times (2 pts). End value of $s2: 3 pts.

(b) (5 points) Assume the integers I and A are assigned to registers $t1 and $s2 respectively, and that I = 10 and A = 0. Write a C code routine corresponding to the above MIPS assembly code.

    do {
        A += 2;
        I = I - 1;
    } while (I > 0);

Grading: Full marks if correct.

7. (15 points) Let the latencies of each major block for the logic blocks shown in Fig. 4.11 in the textbook be as follows:

    I-Mem    Add     Mux     ALU     Regs    D-Mem    Sign-Extend   Shift-Left-2
    200 ps   70 ps   20 ps   90 ps   90 ps   250 ps   15 ps         10 ps

While answering the following parts, briefly explain the way the answers are derived.

(a) (5 points) What is the clock cycle time if the only types of instructions that need to be supported are ALU instructions (ADD, AND, etc.)?

The longest-latency path for ALU operations is through I-Mem, Regs, Mux (to select the ALU operand), ALU, and Mux (to select the value for the register write). Note that the only other path of interest is the PC-increment path through Add (PC + 4) and Mux, which is much shorter. So for the I-Mem, Regs, Mux, ALU, Mux path we have: 200 ps + 90 ps + 20 ps + 90 ps + 20 ps = 420 ps.

Grading: Answer only.

(b) (5 points) What is the clock cycle time if the only instruction needed to be supported is the load word (LW) instruction?

The longest-latency path for LW is through I-Mem, Regs, ALU, D-Mem, and Mux (to select what is written to the register). The only other interesting paths are the PC-increment path (which is much shorter) and the path that reaches the ALU through the Sign-extend unit and the Mux (to select the ALU input) instead of through Regs. However, Regs (90 ps) has a longer latency than Sign-extend plus Mux (15 ps + 20 ps = 35 ps), so for the I-Mem, Regs, ALU, D-Mem, and Mux path we have: 200 ps + 90 ps + 90 ps + 250 ps + 20 ps = 650 ps.

Grading: Answer only. For part (b), an answer of 670 ps (the same path plus one more Mux latency) will also receive full marks.

(c) (5 points) What is the clock cycle time if the instructions needed to be supported are ADD, BEQ, LW, and SW?

The answer is 650 ps, the same as in part (b), because the LW instruction has the longest critical path. The longest path for SW is shorter by one Mux latency (no write to register), and the longest path for ADD or BEQ is shorter by one D-Mem latency.

Grading: Answer only.
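
The path sums in question 7 can be reproduced with a minimal C sketch. It is an illustration only: the LAT_ constants and the function names are hypothetical, the latencies come from the table in question 7, and the SW path is taken to be the LW path without the final Mux, as the answer to part (c) describes. BEQ is left out because the solution treats its path as no longer than that of ADD.

```c
#include <assert.h>
#include <stddef.h>

/* Block latencies in picoseconds, copied from the table in question 7. */
enum {
    LAT_I_MEM = 200,
    LAT_MUX   = 20,
    LAT_ALU   = 90,
    LAT_REGS  = 90,
    LAT_D_MEM = 250
};

/* Sum the latencies of the blocks along one datapath path. */
int path_latency_ps(const int *blocks, size_t count) {
    assert(blocks != NULL);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += blocks[i];
    }
    return total;
}

/* Part (a): I-Mem, Regs, Mux, ALU, Mux. */
int alu_cycle_ps(void) {
    const int path[] = { LAT_I_MEM, LAT_REGS, LAT_MUX, LAT_ALU, LAT_MUX };
    return path_latency_ps(path, sizeof path / sizeof path[0]);
}

/* Part (b): I-Mem, Regs, ALU, D-Mem, Mux. */
int lw_cycle_ps(void) {
    const int path[] = { LAT_I_MEM, LAT_REGS, LAT_ALU, LAT_D_MEM, LAT_MUX };
    return path_latency_ps(path, sizeof path / sizeof path[0]);
}

/* SW: assumed here to be the LW path without the write-back Mux. */
int sw_path_ps(void) {
    const int path[] = { LAT_I_MEM, LAT_REGS, LAT_ALU, LAT_D_MEM };
    return path_latency_ps(path, sizeof path / sizeof path[0]);
}

/* Part (c): the clock must cover the slowest supported instruction. */
int mixed_cycle_ps(void) {
    int worst = alu_cycle_ps();
    if (lw_cycle_ps() > worst) worst = lw_cycle_ps();
    if (sw_path_ps() > worst) worst = sw_path_ps();
    return worst;
}
```

Listing each path as an array of block latencies keeps the sketch close to the way the answers above name the blocks in order, so a different path can be tried by editing one array.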