Overview. EECS Components and Design Techniques for Digital Systems. Lec 16 Arithmetic II (Multiplication) Computer Number Systems.

EECS 150 - Components and Design Techniques for Digital Systems
Lec 16 - Arithmetic II (Multiplication)

David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://inst.eecs.berkeley.edu/~cs15

Overview
- Review of Addition
- Overflow
- Multiplication
- Further adder optimizations for multiplication
- CLA in the large: parallel prefix

Review
- Circuit design for unsigned addition
  - Full adder per bit slice
  - Delay limited by carry propagation
    - Ripple is algorithmically slow, but wires are short
  - Carry select
    - Simple, resource-intensive
    - Excellent layout
  - Carry look-ahead
    - Excellent asymptotic behavior
    - Great at the board level, but wire length effects are significant on chip
- Digital number systems
  - How to represent negative numbers
  - Simple operations
  - Clean algorithmic properties
- 2s complement is most widely used
  - Circuit for unsigned arithmetic
  - Subtract by complement and carry in
  - Overflow when cin xor cout of sign bit is 1

Computer Number Systems
- Positional notation
  - D_{n-1} D_{n-2} ... D_0 represents D_{n-1} B^{n-1} + D_{n-2} B^{n-2} + ... + D_0 B^0, where D_i ∈ {0, ..., B-1}
- 2s complement
  - D_{n-1} D_{n-2} ... D_0 represents -D_{n-1} 2^{n-1} + D_{n-2} 2^{n-2} + ... + D_0 2^0
  - MSB has negative weight
[Figure: 4-bit 2s complement number wheel]
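The weighted-sum definition of 2s complement can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name twos_complement_value is a hypothetical helper chosen for illustration.

```python
def twos_complement_value(bits: str) -> int:
    """Evaluate a bit string (MSB first) as a 2s complement number.

    The MSB carries weight -2^(n-1); every other bit D_i carries +2^i.
    """
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)       # negative weight of the MSB
    for position, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (n - 1 - position)
    return value


if __name__ == "__main__":
    for pattern in ("0111", "1101", "1000", "11010"):
        print(pattern, twos_complement_value(pattern))
```

For example, 1101 evaluates to -8 + 4 + 0 + 1 = -3.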

2s Complement Overflow
- How can you tell an overflow occurred?
- Add two positive numbers to get a negative number, or two negative numbers to get a positive number
[Figure: two 4-bit number wheels illustrating 5 + 3 = -8 (!) and -7 - 2 = +7 (!)]

2s Comp. Overflow Detection

  A + B          A      B      Sum    Carry into sign   Carry out   Overflow?
  5 + 3          0101   0011   1000   1                 0           yes (reads as -8)
  5 + 2          0101   0010   0111   0                 0           no (7)
  (-7) + (-2)    1001   1110   0111   0                 1           yes (reads as 7)
  (-3) + (-5)    1101   1011   1000   1                 1           no (-8)

- Overflow occurs when the carry into the sign bit does not equal the carry out

2s Complement Adder/Subtractor
[Figure: 4-bit adder/subtractor. Each stage uses Sel to pass either B_i or its complement to a full adder with A_i; the Add/Subtract control also supplies the carry in; outputs are S3..S0 and Overflow]
- A - B = A + (-B) = A + B' + 1

Adders on the Xilinx Virtex
- Dedicated carry logic provides fast arithmetic carry capability for high-speed arithmetic functions.
- The Virtex-E CLB supports two separate carry chains, one per slice.
- The height of the carry chains is two bits per CLB.
- The arithmetic logic includes an XOR gate and an AND gate that allow a 2-bit full adder to be implemented within a slice.
- Cin to Cout delay = 0.1 ns, versus 0.4 ns for F to X delay.
- How do we map a 2-bit adder to one slice?
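Returning to the adder/subtractor and the overflow rule from this section, the following Python sketch models a ripple adder bit by bit and flags overflow when the carry into the sign bit differs from the carry out. It is an illustration under assumed names (add_sub and to_signed are hypothetical), not a model of the Virtex carry chain.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret the low n bits of x as a 2s complement number."""
    x &= (1 << n) - 1
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def add_sub(a: int, b: int, subtract: bool, n: int = 4):
    """Compute A + B, or A - B = A + B' + 1, on n bits.

    Returns (n-bit result pattern, overflow flag).
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if subtract:
        b ^= mask                  # complement B ...
    carry = 1 if subtract else 0   # ... and add 1 through the carry in
    result = 0
    carry_into_sign = 0
    for i in range(n):
        if i == n - 1:
            carry_into_sign = carry
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        result |= (a_i ^ b_i ^ carry) << i
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
    overflow = bool(carry_into_sign ^ carry)
    return result, overflow


if __name__ == "__main__":
    for a, b in ((5, 3), (5, 2), (-7, -2), (-3, -5)):
        bits, ovf = add_sub(a, b, subtract=False)
        print(a, "+", b, "->", format(bits, "04b"), to_signed(bits, 4), "overflow" if ovf else "ok")
```

The four calls in the main block correspond to the four cases on the overflow detection slide.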

Time / Space (resource) Trade-offs
- Carry select and CLA utilize more silicon to reduce time.
- Can we use more time to reduce silicon?
- How few FAs does it take to do addition?

Bit-serial Adder
[Figure: n-bit shift registers A and B feed a single FA, LSB first; a FF holds the carry (with reset); the sum bit s shifts into n-bit shift register R]
- A, B, and R held in shift registers. Shift right once per clock cycle.
- Reset is asserted by controller.
- Addition of 2 n-bit numbers:
  - takes n clock cycles,
  - uses 1 FF, 1 FA cell, plus registers
  - the bit streams may come from or go to other circuits, therefore the registers may be optional.
- Requires controller
  - What does the FSM look like? Implemented?
- Final carry out?

Discussion
- What is sign extension and why does it work?
- Where is addition used in the project?
- Where might you want more powerful arithmetic operations?

Announcements
- Reading: 5.8 (4 pages!)
- Digital Design in the news, from UCB:
  - UC Berkeley is among six universities to be part of the program started by IBM Corp. and Google Inc. on college campuses to promote computer-programming techniques for clusters of processors known as "clouds".
  - Cloud computing allows computers in remote data centers to run in parallel, increasing their processing power.
  - Each company will spend between $2 million and $25 million for hardware, software and services that can be used by computer-science professors and students.
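Returning to the bit-serial adder described earlier in this section: the sketch below simulates one full adder and one carry flip-flop over n clock cycles. It is an illustrative Python model with assumed names (bit_serial_add is hypothetical), and it simply returns the final carry alongside the sum rather than answering how the controller should handle it.

```python
def bit_serial_add(a: int, b: int, n: int):
    """Add two n-bit numbers one bit per clock cycle, LSB first.

    Returns (n-bit sum held in R, final carry left in the flip-flop).
    """
    carry_ff = 0          # controller asserts reset before the first cycle
    r = 0                 # result shift register R
    for _ in range(n):
        a_bit = a & 1     # LSB of shift register A
        b_bit = b & 1     # LSB of shift register B
        s = a_bit ^ b_bit ^ carry_ff
        carry_ff = (a_bit & b_bit) | (carry_ff & (a_bit ^ b_bit))
        a >>= 1           # all three registers shift right once per cycle
        b >>= 1
        r = (r >> 1) | (s << (n - 1))
    return r, carry_ff


if __name__ == "__main__":
    print(bit_serial_add(13, 11, 4))   # 13 + 11 = 24, i.e. sum bits 1000 with carry 1
```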

Basic concept of multiplication

      1101   (13)   multiplicand
    * 1011   (11)   multiplier
    ------
      1101          partial products
     1101
    0000
   1101
  --------
  10001111   (143)

- product of 2 n-bit numbers is a 2n-bit number
- sum of n n-bit partial products
- unsigned

Combinational Multiplier: accumulation of partial products

                       A3    A2    A1    A0
                       B3    B2    B1    B0
                     A3B0  A2B0  A1B0  A0B0
               A3B1  A2B1  A1B1  A0B1
         A3B2  A2B2  A1B2  A0B2
   A3B3  A2B3  A1B3  A0B3
  S7    S6    S5    S4    S3    S2    S1    S0

Array Multiplier
- Generates all n partial products simultaneously.
- Each row: n-bit adder with AND gates
[Figure: 4 x 4 array multiplier with inputs a0..a3 and b0..b3 and outputs P0..P7; each cell ANDs a_i with b_j and feeds an FA with sum in, carry in, sum out, carry out]
- What is the critical path?

Shift and Add Multiplier
[Figure: n-bit adder fed by n-bit register A and by P; P and B form n-bit shift registers]
- Cost ∝ n, T = n clock cycles.
- What is the critical path for determining the min clock period?
- Sums each partial product, one at a time.
- In binary, each partial product is a shifted version of A, or 0.
- Control Algorithm:
  1. P ← 0, A ← multiplicand, B ← multiplier
  2. If LSB of B == 1 then add A to P, else add 0
  3. Shift [P][B] right 1
  4. Repeat steps 2 and 3 n-1 times.
  5. [P][B] has product.
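The control algorithm above can be sketched in Python as follows. This is a minimal software model, not the circuit itself; the function name shift_add_multiply is hypothetical, and the sketch assumes the adder's carry out is shifted into the top of P, which the slide does not spell out. The loop body runs n times in total (once, then repeated n-1 times).

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Unsigned n-bit by n-bit multiply using the shift-and-add control steps."""
    mask = (1 << n) - 1
    p = 0                      # step 1: P <- 0
    a = multiplicand & mask    #         A <- multiplicand
    b = multiplier & mask      #         B <- multiplier
    for _ in range(n):
        # step 2: add A (or 0) to P; total is n+1 bits including carry out
        total = p + (a if b & 1 else 0)
        # step 3: shift [carry][P][B] right by one position
        b = (b >> 1) | ((total & 1) << (n - 1))
        p = total >> 1
    # step 5: the 2n-bit product sits in [P][B]
    return (p << n) | b


if __name__ == "__main__":
    print(shift_add_multiply(0b1101, 0b1011, 4))   # 13 * 11
```

As the multiplier bits are consumed from B, the low half of the product shifts into the vacated positions, so no separate 2n-bit accumulator is needed.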

Carry-save Addition
- Speeding up multiplication is a matter of speeding up the summing of the partial products.
- Carry-save addition can help.
- Carry-save addition passes (saves) the carries to the output, rather than propagating them.
- Example: sum three numbers, 3_10 = 0011, 2_10 = 0010, 3_10 = 0011

      3_10   0011
    + 2_10   0010
      c      0100 = 4_10     carry-save add
      s      0001 = 1_10
    + 3_10   0011
      c      0010 = 2_10     carry-save add
      s      0110 = 6_10
             1000 = 8_10     carry-propagate add

- In general, carry-save addition takes in 3 numbers and produces 2, whereas carry-propagate addition takes in 2 and produces 1.
- With this technique, we can avoid carry propagation until the final addition.

Carry-save Circuits
[Figure: a CSA drawn as a row of independent FAs with separate carry and sum outputs; inputs x2, x1, x0 pass through a chain of CSAs followed by a CPA]
- When adding sets of numbers, carry-save can be used on all but the final sum.
- Standard adder (carry propagate) is used for final sum.

Array Mult. using Carry-save Addition
[Figure: array multiplier with inputs a0..a3 and b0..b3; each cell ANDs a_i with b_j and feeds an FA with sum in, carry in, sum out, carry out; a fast carry-propagate adder at the bottom produces the upper product bits P4..P7, with P0..P3 taken from the rows]

Another Representation
- Building block: full adder + AND
[Figure: building block with inputs Sum In, X, Y, Cin and outputs Cout, Sum Out]
- 4 x 4 array of building blocks

          A3     A2     A1     A0
    B0    A3B0   A2B0   A1B0   A0B0
    B1    A3B1   A2B1   A1B1   A0B1
    B2    A3B2   A2B2   A1B2   A0B2
    B3    A3B3   A2B3   A1B3   A0B3

    P7 P6 P5 P4 P3 P2 P1 P0
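Returning to the carry-save addition example from this section: a carry-save adder is just a row of independent full adders, so it can be modeled with bitwise operations on whole words. The Python sketch below uses assumed names (carry_save_add and sum_with_csa are hypothetical) and Python's unbounded integers, so word width is not modeled.

```python
def carry_save_add(x: int, y: int, z: int):
    """Reduce three numbers to two: (carry word, sum word).

    Each bit position is an independent full adder; the carries are
    saved (shifted left one place) instead of being propagated.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return c, s


def sum_with_csa(values):
    """Sum a list of numbers, using carry-save adds on all but the final sum."""
    values = list(values)
    while len(values) > 2:
        c, s = carry_save_add(values[0], values[1], values[2])
        values = [c, s] + values[3:]
    return sum(values)   # the single carry-propagate add


if __name__ == "__main__":
    print(carry_save_add(3, 2, 0))   # first step of the slide example
    print(carry_save_add(4, 1, 3))   # second step
    print(sum_with_csa([3, 2, 3]))
```

The first two calls reproduce the intermediate c and s values of the worked example, where the first carry-save add combines only 3 and 2.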

Carry-save Addition
- CSA is associative and commutative. For example:
  (((X0 + X1) + X2) + X3) = ((X0 + X1) + (X2 + X3))
- A balanced tree can be used to reduce the logic delay.
[Figure: inputs x7..x0 reduced by a tree of CSAs of depth log_{3/2} N, followed by a CPA of depth log_2 N]
- This structure is the basis of the Wallace Tree Multiplier.
- Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
- Multiplier delay ∝ log_{3/2} N + log_2 N

Signed Multiplier
- Signed multiplication: remember that for 2s complement numbers the MSB has negative weight:

  X = \sum_{i=0}^{n-2} x_i 2^i - x_{n-1} 2^{n-1}

  ex: -6 = 11010_2 = 0·2^0 + 1·2^1 + 0·2^2 + 1·2^3 - 1·2^4 = 0 + 2 + 0 + 8 - 16 = -6
- Therefore for multiplication:
  a) subtract final partial product
  b) sign-extend partial products
- Modifications to shift & add circuit:
  a) adder/subtractor
  b) sign-extender on P shift register

Signed multiplication

        1101   (-3)   multiplicand
      * 1011   (-5)   multiplier
    --------
    11111101   (-3)
    1111101    (-6)
    000000     (0)
  - 11101      -(-24)
    --------
    00001111   (15)

- Note: 2s complement sign extension
- product of 2 n-bit numbers is a 2n-bit number
- sum of n n-bit partial products

Signed Array Multiplier
[Figure: array multiplier with inputs a0..a3 and b0..b3 and outputs P0..P7; the cells marked "-" subtract]
- Implicit sign extension
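Returning to the two rules for signed multiplication from this section (sign-extend the partial products, subtract the final one), the Python sketch below applies them on a 2n-bit accumulator. It is an illustrative model with assumed names (signed_multiply and to_signed are hypothetical), not a description of the array circuit.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret the low n bits of x as a 2s complement number."""
    x &= (1 << n) - 1
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def signed_multiply(a_bits: int, b_bits: int, n: int) -> int:
    """Multiply two n-bit 2s complement bit patterns; returns a signed integer."""
    width = 2 * n
    mask = (1 << width) - 1
    a = to_signed(a_bits, n)               # multiplicand value
    product = 0
    for i in range(n):
        if (b_bits >> i) & 1:
            # shifted multiplicand, sign-extended to 2n bits by the masking
            partial = (a << i) & mask
            if i == n - 1:
                # the multiplier's sign bit has negative weight: subtract
                product = (product - partial) & mask
            else:
                product = (product + partial) & mask
    return to_signed(product, width)


if __name__ == "__main__":
    print(signed_multiply(0b1101, 0b1011, 4))   # (-3) * (-5)
```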

Shift and Add Signed Multiplier
[Figure: n-bit adder fed by n-bit register A and by P; P and B form n-bit shift registers]
- Sign-extend partial product at each stage
- Final step is a subtract

Carry Look-ahead Adders
- In general, for n-bit addition the best we can achieve is delay ∝ log(n)
- How do we arrange this? (think trees)
- First, reformulate basic adder stage:

    a  b  c_i | c_{i+1}  s
    0  0  0   |   0      0     carry "kill"
    0  0  1   |   0      1
    0  1  0   |   0      1     carry "propagate"
    0  1  1   |   1      0
    1  0  0   |   0      1     carry "propagate"
    1  0  1   |   1      0
    1  1  0   |   1      0     carry "generate"
    1  1  1   |   1      1

    carry "kill":       k_i = a_i' b_i'
    carry "propagate":  p_i = a_i ⊕ b_i
    carry "generate":   g_i = a_i b_i

    c_{i+1} = g_i + p_i c_i
    s_i = p_i ⊕ c_i

Carry Look-ahead Adders
- Group propagate and generate signals:
[Figure: a block of adder stages i through i+k, with per-stage p and g, carry in c_in and carry out c_out]

    P = p_i p_{i+1} ... p_{i+k}
    G = g_{i+k} + p_{i+k} g_{i+k-1} + ... + (p_{i+1} p_{i+2} ... p_{i+k}) g_i

- P true if the group as a whole propagates a carry to c_out
- G true if the group as a whole generates a carry

    c_out = G + P c_in

- Group P and G can be generated hierarchically.

Carry Look-ahead Adders
- 9-bit example of hierarchically generated P and G signals:
[Figure: three 3-bit blocks: a (bits 0-2), b (bits 3-5), c (bits 6-8)]

    c_3 = G_a + P_a c_0
    c_6 = G_b + P_b c_3
    P = P_a P_b P_c
    G = G_c + P_c G_b + P_b P_c G_a
    c_9 = G + P c_0
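The 9-bit hierarchical example above can be checked with a short Python sketch. It is an illustration under assumptions: the names pg_bits, group_pg and cla_add_9bit are hypothetical, and inside each 3-bit block the carries are simply rippled from the block's carry in, which a real CLA block would also compute by look-ahead.

```python
def pg_bits(a: int, b: int, n: int):
    """Per-bit propagate p_i = a_i xor b_i and generate g_i = a_i b_i, LSB first."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    return p, g


def group_pg(p, g):
    """Group P and G for a block whose bits are listed LSB first."""
    P, G = 1, 0
    for p_i, g_i in zip(p, g):
        G = g_i | (p_i & G)    # builds g_{i+k} + p_{i+k} g_{i+k-1} + ...
        P = P & p_i
    return P, G


def cla_add_9bit(a: int, b: int, c0: int = 0):
    """9-bit add using three 3-bit blocks a, b, c; returns (sum, c9)."""
    n, k = 9, 3
    p, g = pg_bits(a, b, n)
    (Pa, Ga), (Pb, Gb), (Pc, Gc) = [
        group_pg(p[i:i + k], g[i:i + k]) for i in range(0, n, k)
    ]
    c3 = Ga | (Pa & c0)
    c6 = Gb | (Pb & c3)
    P = Pa & Pb & Pc
    G = Gc | (Pc & Gb) | (Pb & Pc & Ga)
    c9 = G | (P & c0)
    s = 0
    for start, c_in in ((0, c0), (3, c3), (6, c6)):
        c = c_in                       # simplification: ripple inside the block
        for i in range(start, start + k):
            s |= (p[i] ^ c) << i       # s_i = p_i xor c_i
            c = g[i] | (p[i] & c)      # c_{i+1} = g_i + p_i c_i
    return s, c9


if __name__ == "__main__":
    print(cla_add_9bit(300, 250))
```

Note that c_9 is computed directly from the top-level G and P and c_0, without waiting for c_3 or c_6.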

Parallel Prefix (generalizing CLA)
[Figure: tree of prefix operations over positions 7..0, combining adjacent spans (7:6, 5:4, 3:2, 1:0) into wider spans (7:4, 3:0) and then into the remaining prefixes]
- Compute all the prefixes: F_i = F_{i-1} op F_{i-2} op ... op F_0
- Assume associative and commutative

8-bit Carry Lookahead Adder
[Figure: eight adder cells with inputs a_i, b_i, c_i and outputs s_i, each producing p, g; a tree of P, G cells generates c_1 through c_8 from c_0]
- Per bit:
    p = a ⊕ b
    g = a b
    s = p ⊕ c_i
    c_{i+1} = g + c_i p
- Per group (combining P_a, G_a and P_b, G_b):
    P = P_a P_b
    G = G_b + G_a P_b
    C_out = G + c_in P

Summary
- 2s complement number systems
  - Algebraic and corresponding bit manipulations
  - Overflow detection
  - Significance of sign bit: -2^{n-1}
- Carry look-ahead is a form of parallel prefix
- Time / Space tradeoffs
  - Bit-serial adder
- Binary multiplication algorithm
  - Array multiplier
  - Serial multiply (with bit-parallel adder)
- Signed multiplication
  - Sign-extend multiplicand
  - Sign bit of multiplier treated as subtract
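As a closing illustration of the parallel-prefix view of carry look-ahead, the Python sketch below computes all carries with log2(n) rounds of the group combine rule P = P_a P_b, G = G_b + G_a P_b. The arrangement is one possible prefix network (Kogge-Stone style), chosen here as an assumption for simplicity and not taken from the slide figure; the names combine, prefix_carries and prefix_add are hypothetical. The regrouping relies on the combine operator being associative.

```python
def combine(hi, lo):
    """Merge (G, P) of a higher span with (G, P) of the adjacent lower span."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_carries(a: int, b: int, n: int, c_in: int = 0):
    """Return [c_0, c_1, ..., c_n] using a parallel-prefix computation."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    span = 1
    while span < n:
        # every position combines with the result 'span' places below it;
        # all combines in one round are independent of each other
        gp = [combine(gp[i], gp[i - span]) if i >= span else gp[i] for i in range(n)]
        span *= 2
    # gp[i] now holds the group (G, P) for bits i down to 0
    return [c_in] + [G | (P & c_in) for G, P in gp]


def prefix_add(a: int, b: int, n: int, c_in: int = 0):
    """n-bit add; returns (sum, carry out)."""
    carries = prefix_carries(a, b, n, c_in)
    s = 0
    for i in range(n):
        p_i = ((a >> i) ^ (b >> i)) & 1
        s |= (p_i ^ carries[i]) << i     # s_i = p_i xor c_i
    return s, carries[n]


if __name__ == "__main__":
    print(prefix_add(200, 100, 8))
```

For n = 8 the while loop runs three rounds (span 1, 2, 4), which is the log(n) depth the look-ahead discussion aims for.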