Homework 3
Assigned on 02/15. Due: midnight on 02/21 (1 WEEK only!)
Problems: B.2, B.11, B.14 (hint: use multiplexors)

CSCI 402: Computer Architectures
Arithmetic for Computers (2)
Fengguang Song
Department of Computer & Information Science, IUPUI

Today's Contents
Have learned +, -, x
3.3, 3.4:
  Optimization of multipliers
  Booth's Algorithm
  How to implement division

Recall: Multiplier
P = 0; for i = 0 to 31 { P += A or Zero; P = P >> 1; }
Start (the right half of the Product register initially holds the Multiplier)
1. Test Product0 (the least-significant bit of the Product register).
   Product0 = 1: 1a. Add Multiplicand to the left half of Product & place the result in the left half.
   Product0 = 0: no addition.
2. Shift the Product register right 1 bit.
32nd repetition?
   No (< 32 repetitions): go back to step 1.
   Yes (32 repetitions): Done.
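
The flowchart above can be sketched in Python as follows. This is a minimal sketch, not the slides' own code: the function name `shift_add_multiply` and the constant `MASK32` are illustrative, and a Python integer stands in for the 64-bit Product register.

```python
MASK32 = 0xFFFFFFFF  # illustrative constant: low 32 bits


def shift_add_multiply(multiplicand, multiplier):
    """Unsigned 32 x 32 -> 64-bit multiply with one combined Product register."""
    multiplicand &= MASK32
    # Left half of Product starts at 0, right half holds the multiplier.
    product = multiplier & MASK32
    for _ in range(32):
        hi = product >> 32
        lo = product & MASK32
        if lo & 1:              # step 1: test Product0
            hi += multiplicand  # step 1a: add into the left half (may produce a carry)
        # step 2: shift the whole register right 1 bit; the carry shifts in at the top
        product = ((hi << 32) | lo) >> 1
    return product
```

For example, `shift_add_multiply(3, 5)` walks through 32 test-and-shift steps and leaves 15 in the register.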

Faster Hardware
for i = 0 to 3 { C[i] = A[i] + B[i]; }
Unroll the loop (we may use 32 adders):
C[0] = A[0] + B[0];
C[1] = A[1] + B[1];
C[2] = A[2] + B[2];
C[3] = A[3] + B[3];
Example:
        1000
      x 1110
      ------
        0000
       1000
      1000
     1000
     -------
     1110000
Every adder outputs 32 sum bits and a carry-out bit:
1) The LSB of the intermediate sum is a bit in the final product.
2) The other 31 bits and the carry-out bit are passed along to the next adder.
Q: Why is it faster?

A 2-bit Multiplier (C = B x A)
The AND gates produce the partial products.
For a 2-bit by 2-bit multiplier, we can just use two half adders to sum the partial products. In general, though, we'll need full adders.
Here C3-C0 are the product bits, not carries!
              B1    B0
        x     A1    A0
        ----------------
            A0B1  A0B0
   +  A1B1  A1B0
   ---------------------
   C3   C2    C1    C0
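
As a rough illustration of the 2-bit multiplier above, the AND gates and the two half adders can be modeled with bit operations. The function names `half_adder` and `multiply_2bit` are illustrative and not taken from the slides.

```python
def half_adder(x, y):
    """Return (sum, carry) for two 1-bit inputs."""
    return x ^ y, x & y


def multiply_2bit(b1, b0, a1, a0):
    """Return product bits (C3, C2, C1, C0) of (B1 B0) x (A1 A0)."""
    c0 = a0 & b0                                # partial product A0B0 is C0 directly
    c1, carry = half_adder(a0 & b1, a1 & b0)    # first half adder: A0B1 + A1B0
    c2, c3 = half_adder(a1 & b1, carry)         # second half adder: A1B1 + carry
    return c3, c2, c1, c0
```

For 3 x 3, `multiply_2bit(1, 1, 1, 1)` gives `(1, 0, 0, 1)`, which is 1001 = 9.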

A 4-bit Multiplier (C = B x A)
Each adder takes two inputs:
  Input 1: previous sum
  Input 2: (multiplier bit i) AND multiplicand

Faster Multiplication Hardware
Uses multiple adders in parallel (log2 32 = 5 steps)
Cost/performance tradeoff
Several adders can work in parallel
  e.g., a0+a1+a2+a3+a4+a5+a6+a7?
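
The a0+...+a7 question can be explored with a small sketch of a pairwise (tree) reduction. It assumes every addition within one level runs on its own adder at the same time, and the name `tree_sum` is illustrative.

```python
def tree_sum(values):
    """Sum a non-empty list pairwise; return (total, number of parallel steps)."""
    level = list(values)
    steps = 0
    while len(level) > 1:
        # All additions in this level are independent, so they could run in parallel.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd element passes through unchanged
            nxt.append(level[-1])
        level = nxt
        steps += 1
    return level[0], steps
```

With 8 inputs the sketch needs 3 steps instead of 7 sequential additions, and with 32 inputs it needs 5.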

4-bit example: A3 A2 A1 A0 x B3 B2 B1 B0
Partial products:
  B0 x (A3 A2 A1 A0)
  B1 x (A3 A2 A1 A0)
  B2 x (A3 A2 A1 A0)
  B3 x (A3 A2 A1 A0)
[Figure: the same four partial products feeding parallel adders, with outputs labeled P1 and P2]

Signed Multiplication?
Determine the signs of the operands and make them positive
Use the same hardware as for unsigned multiplication
Fix up the sign of the output
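
The three steps for signed multiplication can be sketched as below. Here `unsigned_multiply` is a simple stand-in for the unsigned hardware, and both function names are illustrative.

```python
def unsigned_multiply(a, b):
    """Shift-and-add multiply of two non-negative integers."""
    product = 0
    while b:
        if b & 1:
            product += a
        a <<= 1
        b >>= 1
    return product


def signed_multiply(a, b):
    negative = (a < 0) != (b < 0)                   # 1. determine the signs
    magnitude = unsigned_multiply(abs(a), abs(b))   # 2. multiply the magnitudes
    return -magnitude if negative else magnitude    # 3. fix up the sign
```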

Multiply MIPS Instruction
32 bits x 32 bits -> up to 64 bits
HI and LO special registers (instead of creating a 64-bit register)
  HI: stores the most-significant 32 bits
  LO: stores the least-significant 32 bits
  Read with mfhi and mflo
Signed vs. unsigned multiply: mult (signed int), multu (unsigned int)

MIPS Instructions
mult rs, rt
  64-bit product goes to HI and LO
mfhi rd or mflo rd
  Move from HI/LO to rd
  Users could test whether HI = 0 to see if the product overflows 32 bits
mul rd, rs, rt (pseudoinstruction)
  Least-significant 32 bits of product -> rd
  Note: mul will ignore overflow!
Note: even though mult and multu are R-type operations, they take only 2 operands.

How to Make Multiply Even Faster? (A better algorithm)
89 x 9999 = ? (calculate it by hand)
89 x 9999 = 89 x (10000 - 1) = 89 x 10000 - 89 x 1 = 890000 - 89 = 889911
Similarly, apply the idea to binary numbers -> Booth's algorithm!
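
To make the HI/LO split described for mult and multu concrete, here is a small Python model. It only imitates the register contents and is not a MIPS simulator. The Python functions `mult` and `multu` are named after the instructions for readability, and `to_signed32` is an illustrative helper.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def multu(rs, rt):
    """Unsigned multiply: return (HI, LO) as 32-bit patterns."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32


def mult(rs, rt):
    """Signed multiply: return (HI, LO) as 32-bit patterns."""
    product = (to_signed32(rs) * to_signed32(rt)) & ((1 << 64) - 1)
    return product >> 32, product & MASK32
```

In this model `multu(0xFFFFFFFF, 2)` leaves HI = 1, so the unsigned product does not fit in 32 bits, while `mult(0xFFFFFFFF, 2)` treats the first operand as -1 and leaves HI = 0xFFFFFFFF, the sign extension of LO.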

Motivation for Booth's Algorithm
Traditional way, 0010 x 0110:
       0010
     x 0110
  +    0000    nothing (0 in multiplier)
  +   0010     add (1 in multiplier)
  +  0010      add (1 in multiplier)
  + 0000       nothing (0 in multiplier)
    00001100
The ALU can get the same result in more than one way:
  0110 = 6 = -2 + 8 = -0010 + 1000
Replace a string of 1s with an initial subtract (when we first see a one), and then later an add for the bit after the last 1.
E.g., 0010 x 0110, with the multiplier recoded as 1 0 -1 0:
       0010
     x 0110
  +    0000    nothing (0 in multiplier)
  -   0010     sub (first 1 in multiplier)
  +  0000      nothing (middle of string of 1s)
  + 0010       add (prior step had last 1)
    00001100

Booth's Algorithm
[Figure: the bit string 0 1 1 1 1 0, with the rightmost 1 labeled "beginning of run", the inner 1s labeled "middle of run", and the leftmost 1 labeled "end of run"]
Current Bit  Bit to the Right  Explanation          Example     Op
1            0                 Begins run of 1s     0001111000  sub
1            1                 Middle of run of 1s  0001111000  none
0            1                 End of run of 1s     0001111000  add
0            0                 Middle of run of 0s  0001111000  none
-1 + 10000 = 01111
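
The table above amounts to a recoding of the multiplier into digits -1, 0, +1. A minimal sketch, with the illustrative names `booth_recode` and `recoded_value`, scans the bits from right to left and assumes an implicit 0 to the right of the least-significant bit.

```python
def booth_recode(multiplier, bits):
    """Return the Booth digits of a bits-wide multiplier, most-significant first."""
    digits = []
    right = 0                           # implicit bit to the right of the LSB
    for i in range(bits):
        current = (multiplier >> i) & 1
        # (current, right): 10 -> -1 (sub), 01 -> +1 (add), 00 or 11 -> 0 (none)
        digits.append(right - current)
        right = current
    return digits[::-1]


def recoded_value(digits):
    """Evaluate a most-significant-first digit list as a number."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value
```

`booth_recode(0b0110, 4)` gives `[1, 0, -1, 0]`, i.e. 8 - 2 = 6. For a pattern whose top bit is 1, the digits evaluate to the two's-complement (signed) value of the pattern.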

Booth's Algorithm in Detail

Booth's Example 1 (2 x 7) (Assuming 4-bit)
Step          Multiplicand  Product    Extra bit  What to do next?
0. Initial    0010          0000 0111  0          10 -> sub
1b. P = P - m 0010          1110 0111  0          shift P (sign ext)
              (0000 - 0010 = 0000 + 1110 = 1110)
2.            0010          1111 0011  1          11 -> nop, shift
3.            0010          1111 1001  1          11 -> nop, shift
4.            0010          1111 1100  1          01 -> add
4b. P = P + m 0010          0001 1100  1          shift
              (1111 + 0010 = 0001)
Final result  0010          0000 1110  0          14 = 2 x 7
4 iterations in total
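
The trace above can be reproduced with a small register-level sketch. It is written under assumptions not stated on the slides: the name `booth_multiply` is illustrative, operands are `bits`-wide two's-complement values, and the most negative multiplicand (-8 for 4 bits) is excluded because its negation does not fit in `bits` bits.

```python
def booth_multiply(multiplicand, multiplier, bits=4):
    """Booth multiply of two signed bits-wide values; returns a signed product."""
    mask = (1 << bits) - 1
    m = multiplicand & mask
    hi, lo, extra = 0, multiplier & mask, 0    # Product = hi:lo, plus the extra bit
    for _ in range(bits):
        pair = (lo & 1, extra)
        if pair == (1, 0):                     # beginning of a run of 1s -> sub
            hi = (hi - m) & mask
        elif pair == (0, 1):                   # end of a run of 1s -> add
            hi = (hi + m) & mask
        # Arithmetic shift right of hi:lo:extra by one bit (sign-extend hi).
        extra = lo & 1
        lo = (lo >> 1) | ((hi & 1) << (bits - 1))
        hi = (hi >> 1) | (hi & (1 << (bits - 1)))
    product = (hi << bits) | lo
    if product >> (2 * bits - 1):              # reinterpret as a signed 2*bits value
        product -= 1 << (2 * bits)
    return product
```

`booth_multiply(2, 7)` passes through the same Product values as the table (1110 0111, 1111 0011, 1111 1001, 1111 1100, 0001 1100) and ends at 0000 1110 = 14.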

Division
                   1001    quotient
divisor  1000 | 1001010    dividend
               -1000
                   10
                   101
                   1010
                  -1000
                     10    remainder
n-bit operands yield n-bit quotient and remainder
Check for 0 divisor
Long division approach (by hand)
  If divisor <= dividend bits: 1 bit in quotient, then subtract
  Otherwise (i.e., divisor > dividend bits): 0 bit in quotient, bring down next dividend bit
For computers
  Not as smart as a human
  Do the subtract, and if remainder < 0, add divisor back
Signed integer division?
  Divide using absolute values (same as for multiplication)
  Adjust sign of quotient and remainder as required

Register Setup
Remainder register R (64 bits): initially place the dividend in R
Divisor register D (64 bits): place the divisor in the left half of D
  We place the divisor in the left half so we can start subtracting from the most significant dividend bits.
Quotient register Q (32 bits)
[Figure: D = Divisor | 000...0; R = 0000...0 | Dividend]

Division Hardware Version 1
[Figure: D = Divisor | 000...0 (divisor initially in the left half); R = 0000...0 | Dividend (dividend initially in the right half)]
- Place the divisor in the left half of the Divisor register D
- Place the dividend in the right half of the Remainder register R
- So we can subtract the entire D from the entire R
- The first quotient bit = 0

Divide Algorithm Version 1
Takes n+1 steps for n-bit Quotient & Remainder.
Start: Place Dividend in Remainder
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder:
  Remainder >= 0: 2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
  Remainder < 0: 2b. Restore the original value by adding the Divisor register to the Remainder register. Shift the Quotient register to the left, setting the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
(n+1)th repetition?
  No (< n+1 repetitions): go back to step 1.
  Yes (n+1 repetitions): Done.
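
A minimal Python sketch of Divide Algorithm Version 1 follows. The name `divide_v1` and the default `n=4` are illustrative. A negative Python integer stands in for a negative two's-complement Remainder register, and the zero-divisor check is added here because the algorithm itself does not perform one.

```python
def divide_v1(dividend, divisor, n=4):
    """Restoring division of two unsigned n-bit values; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("check for 0 divisor before dividing")
    remainder = dividend          # 2n-bit R: dividend in the right half
    d = divisor << n              # 2n-bit D: divisor in the left half
    quotient = 0
    mask = (1 << n) - 1
    for _ in range(n + 1):
        remainder -= d                                # step 1: R = R - D
        if remainder >= 0:
            quotient = ((quotient << 1) | 1) & mask   # step 2a: shift in a 1
        else:
            remainder += d                            # step 2b: restore R
            quotient = (quotient << 1) & mask         #          and shift in a 0
        d >>= 1                                       # step 3: shift D right
    return quotient, remainder
```

`divide_v1(7, 2)` takes 5 iterations and returns `(3, 1)`.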

Example: 7 / 2 -> quotient 3, remainder 1 (4-bit operands; D and R are 8 bits)
Iter  Step               Quotient  Divisor    Remainder  Note
0     Initial values     0000      0010 0000  0000 0111  -D = 1110 0000
1     1: R = R - D       0000      0010 0000  1110 0111  R - D < 0
      2b: +D, sl Q, 0    0000      0010 0000  0000 0111
      3: Shr D           0000      0001 0000  0000 0111  -D = 1111 0000
2     1: R = R - D       0000      0001 0000  1111 0111
      2b: +D, sl Q, 0    0000      0001 0000  0000 0111
      3: Shr D           0000      0000 1000  0000 0111  -D = 1111 1000
3     1: R = R - D       0000      0000 1000  1111 1111
      2b: +D, sl Q, 0    0000      0000 1000  0000 0111
      3: Shr D           0000      0000 0100  0000 0111  -D = 1111 1100
4     1: R = R - D       0000      0000 0100  0000 0011
      2a: sl Q, 1        0001      0000 0100  0000 0011
      3: Shr D           0001      0000 0010  0000 0011  -D = 1111 1110
5     1: R = R - D       0001      0000 0010  0000 0001
      2a: sl Q, 1        0011      0000 0010  0000 0001
      3: Shr D           0011      0000 0001  0000 0001
4 bits -> 5 iterations: the first iteration always generates a quotient bit of 0, because the left half of R is 0 and we are effectively subtracting 0 - D.

Improved Divide: Version 2
Half of the bits in the divisor are always zero
  32 bits of the 64-bit ALU are wasted!
Instead of shifting the Divisor to the right, we can shift the Remainder to the left (they are equivalent)
The 1st-step quotient bit is always zero, so we can save 1 iteration by shifting first and then subtracting at the beginning.
We can eliminate the Quotient register by combining it with the Remainder register
  Start by shifting the Remainder left
  The consequence of combining the two registers and of the new order of operations in the loop is that the Remainder will be shifted left one extra time.
  Thus, the final correction step must shift back the remainder in the left half of the register

Optimized Divider
[Figure: 32-bit Divisor register; 32-bit ALU; 64-bit HI/LO register holding the Remainder (HI) and Quotient (LO), with shift-left and write control]
One cycle per partial-remainder subtraction
Looks a lot like a multiplier! Indeed, the same hardware can be used for both

Divide Algorithm Version 2
Example registers: Remainder = 0000 0111, Divisor = 0010
Start: Place Dividend in Remainder
0. Shift the Remainder register left 1 bit.
1. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
Test Remainder:
  Remainder >= 0: 2a. Shift the Remainder register to the left, setting the new rightmost bit to 1.
  Remainder < 0: 2b. Restore the original value by adding the Divisor register to the left half of the Remainder register. Shift the Remainder register to the left, setting the new rightmost bit to 0.
nth repetition?
  No (< n repetitions): go back to step 1.
  Yes (n repetitions; n = 4 here): Done. Shift the left half of Remainder right 1 bit.
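
Divide Algorithm Version 2 can be sketched the same way, with one combined Remainder/Quotient register. The name `divide_v2` is illustrative. The sketch relies on Python's unbounded integers to keep the bit that the last left shift pushes out of the 2n-bit register; hardware would have to preserve that bit for the final right shift, which is an assumption of this sketch rather than something stated on the slides.

```python
def divide_v2(dividend, divisor, n=4):
    """Restoring division with a combined register; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("check for 0 divisor before dividing")
    mask = (1 << n) - 1
    r = dividend & mask           # Start: dividend in the right half of Remainder
    r <<= 1                       # step 0: shift Remainder left 1 bit
    for _ in range(n):
        left = (r >> n) - divisor             # step 1: subtract from the left half
        if left >= 0:
            r = (left << n) | (r & mask)      # keep the difference
            r = (r << 1) | 1                  # step 2a: shift left, new bit = 1
        else:
            r = r << 1                        # step 2b: keep the restored value, new bit = 0
    remainder = (r >> n) >> 1     # final correction: shift the left half right 1 bit
    quotient = r & mask           # the right half holds the quotient
    return quotient, remainder
```

For 7 / 2 the register goes 0000 1110, 0001 1100, 0011 1000, 0011 0001, 0010 0011, and the final correction leaves remainder 0001 and quotient 0011.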

Example: 7 / 2 = ? with Divide Algorithm Version 2 (Divisor D = 0010, so -D = 1110)
Step         Remainder (R reg.)  Operation to do
Input        0000 0111           shl R
Iteration 1  0000 1110           sub D
Iteration 1  1110 1110           2a or 2b? (2b: restore to 0000 1110, shift left, new bit 0)
Iteration 2  0001 1100           sub D
Iteration 2  1111 1100           2a or 2b? (2b: restore to 0001 1100, shift left, new bit 0)
Iteration 3  0011 1000           sub D
Iteration 3  0001 1000           2a or 2b? (2a: shift left, new bit 1)
Iteration 4  0011 0001           sub D
Iteration 4  0001 0001           2a or 2b? (2a: shift left, new bit 1)
Final        0010 0011           shift the remainder (left half) right 1 bit -> 0001 0011
Result: remainder = 0001, quotient = 0011

MIPS Division
Also uses the special HI/LO registers for the result
  HI: stores the 32-bit Remainder
  LO: stores the 32-bit Quotient
Instructions
  div rs, rt or divu rs, rt
  No overflow checking, no divide-by-0 checking
    Software must perform checks if required
  Use mfhi, mflo to access the result
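
Optimized Divider
[Figure: 32-bit Divisor register; 32-bit ALU; 64-bit HI/LO register that starts with the Dividend and ends with the Remainder (HI) and Quotient (LO); shift left, write; control]

Optimized Multiply Hardware
[Figure: 32-bit Multiplicand register; 32-bit ALU; 64-bit HI/LO register holding the Product, with the Multiplier initially in LO; shift right; control reads the LSB]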

More About Divide Version 2
Same hardware as multiply: just let the ALU add or subtract, and let the 64-bit register shift left or shift right
HI and LO registers are combined to act as a 64-bit register for multiply and divide
Signed divide: the simplest approach is to make both operands positive, then negate the Quotient and Remainder if necessary
Complexity: what are the signs of the Quotient and Remainder?
  Note 1: the Quotient is negated if the Dividend and Divisor have different signs
  Note 2: the Dividend and Remainder must have the same sign
  e.g., -7 / 2 = -3, remainder = ?
  e.g., -7 / -2 = 3, remainder = ?
  e.g., 7 / -2 = -3, remainder = ?
  Because Dividend = Quotient x Divisor + Remainder
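
The two sign rules can be checked with a short sketch. The name `signed_divide` is illustrative, and Python's built-in `divmod` on the absolute values stands in for the unsigned divider.

```python
def signed_divide(dividend, divisor):
    """Return (quotient, remainder) following Note 1 and Note 2 above."""
    if divisor == 0:
        raise ZeroDivisionError("software must check for a zero divisor")
    quotient, remainder = divmod(abs(dividend), abs(divisor))
    if (dividend < 0) != (divisor < 0):   # Note 1: different signs -> negate quotient
        quotient = -quotient
    if dividend < 0:                      # Note 2: remainder takes the dividend's sign
        remainder = -remainder
    return quotient, remainder
```

Under these rules `signed_divide(-7, 2)` gives `(-3, -1)`, `signed_divide(-7, -2)` gives `(3, -1)`, and `signed_divide(7, -2)` gives `(-3, 1)`. In each case Dividend = Quotient x Divisor + Remainder holds.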