# HW2 solutions

3.10

| Pseudoinstruction | What is accomplished | Minimum sequence of MIPS instructions |
| --- | --- | --- |
| move \$t5, \$t3 | \$t5 = \$t3 | add \$t5, \$t3, \$0 |
| clear \$t5 | \$t5 = 0 | xor \$t5, \$t5, \$t5 |
| li \$t5, small | \$t5 = small | addi \$t5, \$0, small |
| li \$t5, big | \$t5 = big | lui \$t5, big[31:16]; ori \$t5, \$t5, big[15:0] |
| lw \$t5, big(\$t3) | \$t5 = mem[\$t3 + big] | lui \$t5, big[31:16]; ori \$t5, \$t5, big[15:0]; add \$t5, \$t5, \$t3; lw \$t5, 0(\$t5) |
| addi \$t5, \$t3, big | \$t5 = \$t3 + big | lui \$t5, big[31:16]; ori \$t5, \$t5, big[15:0]; add \$t5, \$t5, \$t3 |
| beq \$t5, small, L | if \$t5 = small, branch to L | addi \$at, \$0, small; beq \$t5, \$at, L |
| beq \$t5, big, L | if \$t5 = big, branch to L | lui \$at, big[31:16]; ori \$at, \$at, big[15:0]; beq \$at, \$t5, L |
| ble \$t5, \$t3, L | if \$t5 <= \$t3, branch to L | slt \$at, \$t3, \$t5; beq \$at, \$0, L |
| bge \$t5, \$t3, L | if \$t5 >= \$t3, branch to L | slt \$at, \$t5, \$t3; beq \$at, \$0, L |
| bgt \$t5, \$t3, L | if \$t5 > \$t3, branch to L | slt \$at, \$t3, \$t5; bne \$at, \$0, L |

3.25

You did this for Lab.

    sbn temp, temp, .+1    # temp = 0;
    sbn temp, b, .+1       # temp = -b;
    sbn a, temp, .+1       # a = a - (-b) = a + b;

3.30

          sbn neg_a, neg_a, .+1   # neg_a = 0;
          sbn neg_a, a, .+1       # neg_a = -a;
          sbn c, c, .+1           # c = 0;
    loop: sbn b, one, .+1         # do { b = b - 1;
          sbn c, neg_a, .+1       #      c = c - (-a) = c + a;
          sbn temp, temp, .+1     #      temp = 0;
          sbn temp, b, loop       # } while (b > 0);
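As a quick sanity check, the sbn sequences for 3.25 and 3.30 can be run through a small Python interpreter. This is only a sketch, not part of the original solution: `run_sbn`, the tuple encoding of instructions, and the memory dictionary are hypothetical names chosen for illustration. It assumes that `sbn a, b, L` means `mem[a] = mem[a] - mem[b]` followed by a branch to `L` when the result is negative, and that `.+1` simply falls through.

```python
def run_sbn(program, memory, max_steps=10_000):
    """Run a list of (dest, src, target) sbn instructions.

    Assumed semantics: mem[dest] -= mem[src]; if the result is negative,
    jump to `target` (an instruction index), otherwise fall through.
    A target of None stands for '.+1' (always fall through).
    """
    pc = 0
    steps = 0
    while pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        dest, src, target = program[pc]
        memory[dest] -= memory[src]
        if memory[dest] < 0 and target is not None:
            pc = target
        else:
            pc += 1
        steps += 1
    return memory


# 3.25: a = a + b
ADD = [
    ("temp", "temp", None),   # temp = 0
    ("temp", "b", None),      # temp = -b
    ("a", "temp", None),      # a = a - (-b)
]

# 3.30: c = a * b, assuming a > 0 and b > 0
MULTIPLY = [
    ("neg_a", "neg_a", None),  # neg_a = 0
    ("neg_a", "a", None),      # neg_a = -a
    ("c", "c", None),          # c = 0
    ("b", "one", None),        # index 3 is the 'loop' label: b = b - 1
    ("c", "neg_a", None),      # c = c + a
    ("temp", "temp", None),    # temp = 0
    ("temp", "b", 3),          # temp = -b; branch back while b > 0
]


def sbn_add(a, b):
    # temp starts with arbitrary junk to show that the sequence clears it
    return run_sbn(ADD, {"a": a, "b": b, "temp": 123})["a"]


def sbn_multiply(a, b):
    memory = {"a": a, "b": b, "c": 99, "neg_a": 7, "temp": 5, "one": 1}
    return run_sbn(MULTIPLY, memory)["c"]


print(sbn_add(5, 7))        # 12
print(sbn_multiply(3, 4))   # 12
```

With b = 0 the do-while body still runs once, so `sbn_multiply(5, 0)` returns 5 rather than 0 in this sketch.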

Note (1): The 3.30 solution does not handle b = 0. That is acceptable because the problem description says to assume that a and b are greater than 0. Perfectionist students are likely to write solutions that do work for b = 0, though, so their answers would be an instruction or two longer.

    add \$t2, \$t3, \$t4
    slt \$t2, \$t2, \$t

You need only alter the full adder for the MSB such that the Set output is the value of the full adder output XORed with the Overflow.

4.24

(a * 2^32 + b) * (c * 2^32 + d) = a * c * 2^64 + a * d * 2^32 + b * c * 2^32 + b * d

In the code below, a = \$t4, b = \$t5, c = \$t6, d = \$t7, and the 128-bit product is accumulated in \$t0 (bits 127:96), \$t1 (95:64), \$t2 (63:32), and \$t3 (31:0).

    multu \$t5, \$t7        # b * d
    mflo  \$t3             # product[31:0] = (b * d)[31:0]
    mfhi  \$t2             # product[63:32] = (b * d)[63:32]
    multu \$t4, \$t7        # a * d
    mflo  \$t8             # \$t8 = (a * d)[31:0]
    mfhi  \$t1             # product[95:64] = (a * d)[63:32]
    addu  \$t2, \$t2, \$t8   # product[63:32] += (a * d)[31:0]
    sltu  \$t8, \$t2, \$t8   # \$t8 = carry of 63:32 in last op
    addu  \$t1, \$t1, \$t8   # product[95:64] += carry
    sltu  \$t0, \$t1, \$t8   # product[127:96] = carry in 95:64
    multu \$t5, \$t6        # b * c
    mflo  \$t8             # \$t8 = (b * c)[31:0]
    addu  \$t2, \$t2, \$t8   # product[63:32] += (b * c)[31:0]
    sltu  \$t8, \$t2, \$t8   # \$t8 = carry of 63:32 in last op
    addu  \$t1, \$t1, \$t8   # product[95:64] += carry
    mfhi  \$t8             # \$t8 = (b * c)[63:32]
    addu  \$t1, \$t1, \$t8   # product[95:64] += (b * c)[63:32]
    sltu  \$t8, \$t1, \$t8   # \$t8 = carry of 95:64 in last op
    addu  \$t0, \$t0, \$t8   # product[127:96] += carry
    multu \$t4, \$t6        # a * c
    mflo  \$t8             # \$t8 = (a * c)[31:0]
    addu  \$t1, \$t1, \$t8   # product[95:64] += (a * c)[31:0]
    sltu  \$t8, \$t1, \$t8   # \$t8 = carry of 95:64 in last op
    addu  \$t0, \$t0, \$t8   # product[127:96] += carry
    mfhi  \$t8             # \$t8 = (a * c)[63:32]
    addu  \$t0, \$t0, \$t8   # product[127:96] += (a * c)[63:32]

4.52

Each CSA has a delay of 2T.

The iterative CLA-based multiplier takes: 16 layers * CLA delay = 16 * 7T = 112T

The CSA multiplier takes: 6 layers * 2T + CLA delay = 6 * 2T + 7T = 19T
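The 4.52 delay figures can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than the original derivation: it assumes 16 operand rows and that each CSA layer turns every group of three rows into two while leftover rows pass through unchanged. The function names `csa_levels` and `multiplier_delays` are hypothetical.

```python
def csa_levels(operands):
    """Count the 3:2 carry-save adder layers needed to reduce `operands` rows to 2."""
    levels = 0
    while operands > 2:
        # Every group of three rows becomes two; leftover rows pass through.
        operands = 2 * (operands // 3) + operands % 3
        levels += 1
    return levels


def multiplier_delays(operands=16, csa_delay=2, cla_delay=7):
    """Return (iterative CLA delay, CSA tree delay), both in units of T."""
    iterative = operands * cla_delay
    tree = csa_levels(operands) * csa_delay + cla_delay
    return iterative, tree


print(csa_levels(16))        # 6
print(multiplier_delays())   # (112, 19)
```

Under this model the row count goes 16, 11, 8, 6, 4, 3, 2, which is where the six layers come from.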

4.53

| (ai+1, ai, ai-1) | Two single-bit Booth steps | Combined operation |
| --- | --- | --- |
| 000 | NOP + NOP | NOP |
| 001 | NOP + multiplicand | multiplicand |
| 010 | 2 * multiplicand + (-multiplicand) | multiplicand |
| 011 | 2 * multiplicand + NOP | 2 * multiplicand |
| 100 | -(2 * multiplicand) + NOP | -(2 * multiplicand) |
| 101 | -(2 * multiplicand) + multiplicand | -multiplicand |
| 110 | NOP + (-multiplicand) | -multiplicand |
| 111 | NOP + NOP | NOP |

4.54

See the lecture notes. The basic algorithm is as follows. Take the top 4 bits of the dividend and subtract the divisor. Based on the top bit of the result, choose whether the next stage is an add (top bit was 1) or a subtract (top bit was 0). The inverted value of this top bit is also the quotient bit for that stage. The input to the next stage is simply the lower 3 bits of the subtracted (or added) result, along with the next bit of the dividend. The divisor remains the same.

Continue this until all the bits of the dividend have been used. The remainder is the final sum, unless its top bit is 1, in which case you have to add the divisor to that final sum to fix the remainder.

A5

                .ktext 0x
                sw    \$a0, save0
                sw    \$a1, save1
                mfc0  \$k0, \$13           # Move Cause into \$k0
                mfc0  \$k1, \$14           # Move EPC into \$k1
                addiu \$v0, \$zero, 0x44
                slt   \$v0, \$v0, \$k0      # Ignore interrupts
                bgtz  \$v0, _restore
                move  \$a0, \$k0           # Move Cause into \$a0
                move  \$a1, \$k1           # EPC into \$a1
                jal   print_excp         # Print exception error msg
    _restore:   lw    \$a0, save0
                lw    \$a1, save1
                lw    \$k0, -4(\$k1)       # \$k0 = previous instruction
                srl   \$k0, \$k0, 26       # \$k0 = opcode of prev instr
                ori   \$k1, \$zero, 2      # opcode of j
                beq   \$k0, \$k1, _delayslot
                ori   \$k1, \$zero, 4      # opcode of beq
                beq   \$k0, \$k1, _delayslot
                # and so on for: jr, jal, bne, bltz, bgezal, bczt...
    _done:      mfc0  \$k1, \$14           # reload EPC into \$k1
                addiu \$k1, \$k1, 4        # Do not reexecute fault instr
                jr    \$k1
                rfe                      # done in delay-slot of jr
    _delayslot: mfc0  \$k1, \$14           # reload EPC into \$k1
                addiu \$k0, \$k1, -4       # \$k0 = EPC - 4
                addiu \$k1, \$k1, 4        # \$k1 = EPC + 4
                jr    \$k0                # poke at branching instr
                j     _check             # in delay-slot of jr
    _check:     rfe
                jr    \$k1
                or    \$zero, \$zero, \$zero   # nop

                .kdata
    save0:      .word 0
    save1:      .word 0

This problem is hard. The basic idea of this solution is to do everything possible to avoid touching the instruction that caused the exception. We need a way to poke the branching instruction, that is, to execute it without executing any of the instructions around it. The procedure works by jumping to the branching instruction with jr, but putting a j in the delay slot of the jr, so that control comes back after the branching instruction executes and its regular delay slot is not executed. If it turns out that the branch is not taken (which may happen with a bne or beq), then we jump back to EPC+4. Note: this solution assumes that a j in a branch delay slot will NOT be executed if the branch is taken! Other elegant solutions will be highly appreciated.

B.6

B.10

| A | B | !A | !B | !(A+B) | !A * !B | !(A*B) | !A + !B |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |

a) F = (!x3 && !x2 && x1) || (!x3 && x2 && !x1) || (x3 && !x2 && !x1)

b) F = (!x3 && x2 && x1) || (x3 && !x2 && x1) || (x3 && x2 && !x1)

c) F = (!x3 && !x2) || (!x3 && !x1)

d) F = (x3 && !x2) || (x3 && x1)

B.14

Simply use two muxes.

B.21
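Returning to B.10, the truth table for De Morgan's identities can be generated mechanically. The snippet below is a minimal illustration, not part of the original solution; `de_morgan_rows` is a hypothetical helper name, and the column order is assumed to be A, B, !A, !B, !(A+B), !A * !B, !(A*B), !A + !B.

```python
from itertools import product


def de_morgan_rows():
    """Return one tuple per input combination, in the assumed B.10 column order."""
    rows = []
    for a, b in product((0, 1), repeat=2):
        not_a, not_b = 1 - a, 1 - b
        rows.append((
            a, b, not_a, not_b,
            1 - (a | b),      # !(A+B)
            not_a & not_b,    # !A * !B
            1 - (a & b),      # !(A*B)
            not_a | not_b,    # !A + !B
        ))
    return rows


for row in de_morgan_rows():
    print(*row)
```

In every row the fifth and sixth columns agree, as do the seventh and eighth, which is exactly what the two identities claim.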

B.22

State assignments: Left (00), Middle a (01), Right (10), Middle b (11)

| S1 | S0 | Next S1 | Next S0 |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |

Solving the K-maps for the next values of S1 and S0, you get:

S1 = XOR (S1, S0)

S0 = NOT (S0)

The outputs are associated with the state (both Middle a and Middle b output Middle).

C.1

This looks just like the PLA on page C-20, except that there are now S0 through S9. The logic is the same; it just looks a lot bigger. Each column should also be connected to only one of the state bits.
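The B.22 next-state equations can be exercised with a short simulation. This is a hedged sketch rather than the original solution: the names `next_state`, `run`, and `OUTPUT` are hypothetical, and the starting state Left (00) is an assumption made for the example.

```python
# Output for each (S1, S0) state, following the B.22 state assignments.
OUTPUT = {
    (0, 0): "Left",
    (0, 1): "Middle",   # Middle a
    (1, 0): "Right",
    (1, 1): "Middle",   # Middle b
}


def next_state(s1, s0):
    """Apply S1 = XOR(S1, S0) and S0 = NOT(S0)."""
    return s1 ^ s0, 1 - s0


def run(steps, state=(0, 0)):
    """Return the outputs seen over `steps` clock ticks, starting in `state`."""
    outputs = []
    for _ in range(steps):
        outputs.append(OUTPUT[state])
        state = next_state(*state)
    return outputs


print(run(5))   # ['Left', 'Middle', 'Right', 'Middle', 'Left']
```

Two distinct Middle states are needed because the machine must remember whether it is travelling toward Right or back toward Left.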

