Integer Multipliers

Multipliers

A must-have circuit in most DSP applications.
A variety of multipliers exists that can be chosen based on their performance: Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, ...

[Figure: multiplier block diagram with input and output converters and reset/enable (en) signals]

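Multiplication Algorithm

$$X = X_{n-1} X_{n-2} \ldots X_1 X_0 \quad \text{(multiplicand)}$$
$$Y = Y_{n-1} Y_{n-2} \ldots Y_1 Y_0 \quad \text{(multiplier)}$$

Partial products, one row per bit $X_i$, each row shifted left by $i$ positions:

$$
\begin{array}{ll}
i = 0: & Y_{n-1}X_0 \;\; Y_{n-2}X_0 \;\; Y_{n-3}X_0 \;\ldots\; Y_1X_0 \;\; Y_0X_0 \\
i = 1: & Y_{n-1}X_1 \;\; Y_{n-2}X_1 \;\; Y_{n-3}X_1 \;\ldots\; Y_1X_1 \;\; Y_0X_1 \\
i = 2: & Y_{n-1}X_2 \;\; Y_{n-2}X_2 \;\; Y_{n-3}X_2 \;\ldots\; Y_1X_2 \;\; Y_0X_2 \\
\vdots & \\
i = n-2: & Y_{n-1}X_{n-2} \;\; Y_{n-2}X_{n-2} \;\; Y_{n-3}X_{n-2} \;\ldots\; Y_1X_{n-2} \;\; Y_0X_{n-2} \\
i = n-1: & Y_{n-1}X_{n-1} \;\; Y_{n-2}X_{n-1} \;\; Y_{n-3}X_{n-1} \;\ldots\; Y_1X_{n-1} \;\; Y_0X_{n-1}
\end{array}
$$

$$P = P_{2n-1} P_{2n-2} P_{2n-3} \ldots P_2 P_1 P_0$$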

Multiplication Algorithms

Implementation of multiplication of binary numbers boils down to how to do the additions. Consider two 8-bit numbers A and B that generate a 16-bit product: first generate the 64 partial products, then add them up.
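The two steps described above can be sketched in a few lines of Python. This is only an illustration, not a hardware description: the function names `partial_products` and `sum_partial_products` and the operand values are assumptions chosen for the example.

```python
def partial_products(a, b, n=8):
    """Return the n*n partial-product bits pp[i][j] = a_j AND b_i (unsigned)."""
    return [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)]
            for i in range(n)]


def sum_partial_products(pp):
    """Add every partial-product bit at its binary weight 2^(i+j)."""
    return sum(bit << (i + j)
               for i, row in enumerate(pp)
               for j, bit in enumerate(row))


# Hypothetical 8-bit operands, chosen only for illustration.
pp = partial_products(200, 123)
product = sum_partial_products(pp)
```

For 8-bit operands the array holds 8 * 8 = 64 bits, and their weighted sum fits in 16 bits. Every multiplier in these slides differs only in how that summation is organized.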

Multiplier Design

[Figure: block diagram with an input register (REG IN), storage, the multiplier unit (MU), an output register (REG OUT), and a control unit]

Serial Multiplier

X: x3 x2 x1 x0
Y: y3 y2 y1 y0

S_i: the ith bit of the final result
C_i: the only carry from column i
S_ij: the jth partial sum for column i
C_ij: the jth partial carry from column i

Input sequence for G:
x3 x2 x1 x0  x3 x2 x1 x0  x3 x2 x1 x0  x3 x2 x1 x0
y3 y3 y3 y3  y2 y2 y2 y2  y1 y1 y1 y1  y0 y0 y0 y0

[Figure sequence, Slides 1 to 21: step-by-step operation of the serial multiplier, built from AND gate G, gate G2 (controlled by Reset), an adder, a 1-bit register (d, q), and a serial register, driven by CLK and a divided clock. The product bits S0 to S7 accumulate in the serial register.]

Serial/Parallel Multiplier

S_i: the ith bit of the final result
C_i: the only carry from column i
S_ij: the jth partial sum for column i
C_ij: the jth partial carry from column i

[Figure sequence, Slides 1 to 8: the bits y0 y1 y2 y3 are applied in parallel while x0, x1, x2, x3 enter one per step; the adder chain forms the partial sums and carries of each column, and the product bits S0 to S7 appear at the output.]

Shift and Add Multiplier

[Figure: datapath with INPUT Ain (7 downto 0), INPUT Bin (7 downto 0), registers REGA, REGB and REGC, a MUX, an 8-bit adder, and CLOCK; outputs are Result (15 downto 8) and Result (7 downto 0).]

Synchronous Shift and Add Multiplier Controller

Multiplication process, 5 states: Idle, Init, Test, Add, and Shift&Count.
Idle: The process starts on receiving the Start signal.
Init: The multiplicand and multiplier are loaded into a load register and a shift register, respectively.
Test: The LSB of the shift register that contains the multiplier is tested to decide the next state.

Synchronous Shift and Add Multiplier Controller Design

Add: If the LSB is 1, the new partial product is added to the accumulated result, and the state machine moves to the Shift&Count state.
Shift&Count: If the LSB is 0 (or once Add has finished), the two shift registers shift their contents one bit right, and the counter counts up by one step. The state machine then returns to the Test state.
When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state.
Idle: In the Idle state, a Done signal is asserted to indicate the end of multiplication.
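The controller can be modeled in software to check the state sequence. The sketch below is a behavioral model under stated assumptions, not the circuit from the slides: the function name `shift_add_multiply`, the variable names, and the choice to let the low bit of the accumulator shift into the top of the multiplier register are illustrative.

```python
def shift_add_multiply(multiplicand, multiplier, n=8):
    """Behavioral model of the Idle/Init/Test/Add/Shift&Count controller.

    Returns (product, trace), where trace lists the states visited after Start.
    Operands are unsigned n-bit values.
    """
    mask = (1 << n) - 1
    state = "INIT"            # Idle is left as soon as Start is received
    trace = []
    m = q = acc = count = 0
    while True:
        trace.append(state)
        if state == "INIT":
            m = multiplicand & mask   # load register
            q = multiplier & mask     # shift register holding the multiplier
            acc, count = 0, 0
            state = "TEST"
        elif state == "TEST":
            state = "ADD" if q & 1 else "SHIFT_COUNT"
        elif state == "ADD":
            acc += m                  # may temporarily need n+1 bits
            state = "SHIFT_COUNT"
        elif state == "SHIFT_COUNT":
            # Shift the pair (acc, q) right by one bit and count the step.
            q = (q >> 1) | ((acc & 1) << (n - 1))
            acc >>= 1
            count += 1
            state = "IDLE" if count == n else "TEST"
        else:                         # IDLE: Done is asserted
            return (acc << n) | q, trace


product, trace = shift_add_multiply(3, 5, n=4)
```

An Add state appears in the trace only for multiplier bits that are 1, so the number of cycles depends on the data. That is the usual trade-off of this design against the fixed-latency array structures.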

n-bit Multiplier

Q0 = 1: The multiplicand is added to register A; the result is stored in register A; registers C, A, Q are shifted to the right one bit.
Q0 = 0: Registers C, A, Q are shifted to the right one bit.

[Figure: the multiplicand and register A feed an n-bit adder; the shift and add control logic issues Add and Shift Right; the shift chain is C, A(n-1) ... A0, Q(n-1) ... Q0, with the multiplier held in Q.]

Example: 4-bit Multiplier

[Figure sequence, Slides 2 to 9: contents of the multiplicand register, C, A, and Q at each step: initial values; first cycle (add, shift); second cycle (shift); third cycle (add, shift); fourth cycle (add, shift).]

4*4 Synchronous Shift and Add Multiplier Design

[Figure: layout design and floor plan of the 4*4 synchronous shift and add multiplier]

Comparison between Synchronous and Asynchronous Approaches

Example (simulated by Ovais Ahmed)

Multiplicand = 10001001 (base 2) = 89 (base 16)
Multiplier = 10101011 (base 2) = AB (base 16)
Expected result = 0101101110000011 (base 2) = 5B83 (base 16)

Array Multiplier

Regular structure based on the add-and-shift algorithm.
Addition is mainly done by the carry-save algorithm.
Sign-bit extension results in a higher capacitive load and slows down the circuit.

Addition with CLA

A = a3 a2 a1 a0, B = b3 b2 b1 b0

[Figure: the row a3 a2 a1 a0 is gated by each multiplier bit b0 to b3 in turn and accumulated through a chain of four-bit adders (C_in, C_out) to form Product (A*B).]

Array Multiplier with CSA

P_ij = A_i B_j; total of 16 gates.

[Figure: 4*4 carry-save array of full adders (F.A) with inputs P_ij, carries C_i and sums S_i, producing result bits R7 ... R0.]

Critical Path with Array Multipliers

[Figure: two of the possible paths for the ripple-carry based 4*4 multiplier, through three rows of FA FA FA HA cells]

Area = (N*N) AND gates + (N-1)N full adders
Delay = τ_HA + (2N-1) τ_FA

Wallace Tree

[Figure: dot diagram of the 5*5 partial products x_i y_j (i, j = 0 to 4) arranged by column, columns 9 down to 0]

[Figure: Array Multiplier compared with Wallace Tree]

Baugh-Wooley Algorithm

Convert negative partial products to positive representation.
No sign extension required.

$$X \cdot Y = \left(-x_{k-1}2^{k-1} + \sum_{i=0}^{k-2} x_i 2^i\right)\left(-y_{k-1}2^{k-1} + \sum_{i=0}^{k-2} y_i 2^i\right)$$

$$= x_{k-1}y_{k-1}2^{2k-2} + \sum_{i=0}^{k-2}\sum_{j=0}^{k-2} x_i y_j 2^{i+j} - 2^{k-1}\sum_{i=0}^{k-2} x_{k-1} y_i 2^i - 2^{k-1}\sum_{i=0}^{k-2} x_i y_{k-1} 2^i$$

9/25/27 Concordia VLSI Lab
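The idea of replacing the negative partial products with positive ones can be checked numerically. The sketch below uses the commonly taught "modified" Baugh-Wooley arrangement: the sign-row AND terms are complemented and the constants 2^k and 2^(2k-1) are added. This is an assumption made for illustration and not necessarily the exact cell arrangement of the slides; the function name `baugh_wooley` is also hypothetical.

```python
def baugh_wooley(a, b, k):
    """Multiply two k-bit two's complement integers (k >= 2) using only
    additions of non-negative partial products, then reinterpret the
    2k-bit sum as a signed value."""
    mask = (1 << k) - 1
    abit = [((a & mask) >> i) & 1 for i in range(k)]
    bbit = [((b & mask) >> i) & 1 for i in range(k)]

    total = 0
    # Ordinary positive partial products from the magnitude bits.
    for i in range(k - 1):
        for j in range(k - 1):
            total += (abit[i] & bbit[j]) << (i + j)
    # Product of the two sign bits has positive weight 2^(2k-2).
    total += (abit[k - 1] & bbit[k - 1]) << (2 * k - 2)
    # Sign-row terms enter complemented instead of being subtracted.
    for i in range(k - 1):
        total += (1 - (abit[k - 1] & bbit[i])) << (k - 1 + i)
        total += (1 - (abit[i] & bbit[k - 1])) << (k - 1 + i)
    # Correction constants that make the complemented form equivalent.
    total += (1 << k) + (1 << (2 * k - 1))

    total &= (1 << (2 * k)) - 1           # keep 2k bits
    if total >> (2 * k - 1):              # reinterpret as signed
        total -= 1 << (2 * k)
    return total


result = baugh_wooley(-3, 5, 5)
```

Every term added inside the function is non-negative, which is the point of the algorithm: the array needs only adders and no sign extension.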

Example of a 5-by-5 Baugh-Wooley multiplier

[Figure: the schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two's complement array multiplier; full adders (FA) combine the terms a_i b_j and the complemented sign-row terms into product bits 9 down to 0.]

A Squarer using Baugh-Wooley Algorithm

[Figure: partial-product array for a7 a6 a5 a4 a3 a2 a1 a0 * a7 a6 a5 a4 a3 a2 a1 a0, with one row a7*a_j ... a0*a_j for each j = 0 to 7.]

Example of an 8-bit squarer

[Figure: reduced partial-product array of the 8-bit squarer, with cross terms a_i a_j and single terms a_i, producing sum bits S0 to S15.]

Array Multiplier

[Figure: 32-bit by 32-bit multiplier]

Booth (Radix-4) Multiplier

Radix-4 (3-bit recoding) reduces the number of partial products to be added by half, giving a great saving in area and increased speed.

$$A = -a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + a_{n-3}2^{n-3} + \ldots + a_1 2 + a_0$$
$$B = -b_{n-1}2^{n-1} + b_{n-2}2^{n-2} + b_{n-3}2^{n-3} + \ldots + b_1 2 + b_0$$

The base-4 redundant signed-digit representation of B is

$$B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i$$

K_i is calculated by the following equation:

$$K_i = -2b_{2i+1} + b_{2i} + b_{2i-1}, \qquad i = 0, 1, 2, \ldots, (n-2)/2$$

Three bits of the multiplier B, namely b_{2i+1}, b_{2i}, b_{2i-1}, are examined and the corresponding K_i is calculated. B is always appended on the right with a zero (b_{-1} = 0), and n is always even (B is sign-extended if needed). The product AB is then obtained by adding n/2 partial products:

$$AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A$$
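The recoding equation for K_i translates directly into code. The following sketch assumes an n-bit two's complement multiplier with n even; the function names `booth_radix4_digits` and `booth_multiply` are illustrative and not taken from the slides.

```python
def booth_radix4_digits(b, n):
    """Recode an n-bit two's complement multiplier b (n even) into n/2
    digits K_i in {-2, -1, 0, 1, 2}, least significant digit first."""
    if n % 2:
        raise ValueError("n must be even; sign-extend the multiplier first")
    u = b & ((1 << n) - 1)

    def bit(k):
        # b_{-1} is the zero appended on the right of the multiplier.
        return 0 if k < 0 else (u >> k) & 1

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(n // 2)]


def booth_multiply(a, b, n):
    """Sum the n/2 partial products K_i * A * 2^(2i)."""
    return sum((k * a) << (2 * i)
               for i, k in enumerate(booth_radix4_digits(b, n)))


digits = booth_radix4_digits(29, 6)
product = booth_multiply(3, 29, 6)
```

With a 6-bit multiplier, 29 recodes to the digits +2, -1, +1 (most significant first), so only three partial products are added instead of six.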

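Booth Algorithm

Decoding of the multiplier to generate signals for hardware use. The operation OP selected by each bit triple drives the hardware signals NEG, ZERO, and TWO.

| B_{i+1} | B_i | B_{i-1} | OP |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | +1 |
| 0 | 1 | 0 | +1 |
| 0 | 1 | 1 | +2 |
| 1 | 0 | 0 | -2 |
| 1 | 0 | 1 | -1 |
| 1 | 1 | 0 | -1 |
| 1 | 1 | 1 | 0 |

Booth Algorithm

A Booth recoded multiplier examines three bits of the multiplier at a time.
It determines whether to add zero, 1, -1, 2, or -2 times the multiplicand at that rank.
The operation to be performed is based on the current two bits of the multiplier and the previous bit.

| Bits (2^1 2^0 2^-1) | Operation | M is multiplied by |
|---|---|---|
| 0 0 0 | add zero (no string) | +0 |
| 0 0 1 | add multiplicand (end of string) | +1 |
| 0 1 0 | add multiplicand (a string) | +1 |
| 0 1 1 | add twice the multiplicand (end of string) | +2 |
| 1 0 0 | subtract twice the multiplicand (beginning of string) | -2 |
| 1 0 1 | subtract the multiplicand (-2X and +X) | -1 |
| 1 1 0 | subtract the multiplicand (beginning of string) | -1 |
| 1 1 1 | subtract zero (center of string) | -0 |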

Booth Algorithm: dot notation

[Figure: dot diagram of multiplicand A and multiplier B; the multiplier bits are grouped in overlapping triples, each selecting one row of partial-product bits weighted by a power of 4; the rows sum to the product P.]

Example

The following example shows how the calculation is done. A 0 is appended to the right of the multiplier Y. After Booth decoding, each recoded digit of Y selects a multiple of the multiplicand X (such as 2X or -X); each partial product is shifted two bits relative to the previous one, and the partial products are added together.

Sign Extension

Sign extension: traditional sign-extension scheme
Segment the input operands based on the size of the embedded blocks.
Multiply the segmented inputs and extend the sign bit of each partial product.
Sum all partial products.

[Figure: segmented input operands, sign-extended partial products, final result]

Booth Algorithm: Example 1

(+3) * (+29) = (+87), with the multiplier recoded as +2, -1, +1.

Booth Algorithm: Example 2

(-3) * (+29) = (-87), with the multiplier recoded as +2, -1, +1. Notice the sign extensions and the 2's complement of the multiplicand.

Booth Algorithm: Example 3

(-3) * (-29) = (+87), with the multiplier recoded as -2, +1, -1. Notice the sign extensions and the shifted 2's complement.

Comparison of Booth and parallel (shift and add) multipliers

Template to reduce sign extensions for Booth Algorithm

Please note that each operand is 7 bit, i.e. the 7th bit is the sign bit. Negative numbers are entered in 1's complement, which is why the S bits must be added on the right-hand side of the diagram. If 2's complement is used, the S bits on the right side of the diagram can be removed.

Comparison of Template and the sign extension

[Figure: the sign template next to the full sign extension of the partial-product rows, with sign bits S0 to S4]

Example of using the template

25 * -35, with -35 as the multiplier, using 8-bit representation.

[Figure: worked partial-product array using the template; annotations: sign bit, add S S, add inverted S, add inverted sign bit, no sign bit.]

The result is a negative number. Converting it gives 512 + 256 + 64 + 32 + 8 + 2 + 1 = 875, so the product is -875.

Booth Multiplier Components

[Figure: the multiplier feeds the Booth encoder; the multiplicand and the encoder outputs feed the partial products unit, followed by the partial products adding unit, which delivers the product.]

Wallace Tree and Ripple Carry Adder Structure of 8*8 Multiplier with Pipeline

[Figure: partial products 0, 1, 2 (15 downto 0) and partial product 3 (15 downto 0) are reduced, stored in a pipeline register, and summed by a ripple carry adder; the critical path is marked.]

Hardware implementation of Booth with shift and add

[Figure: block diagram with an FSM (Start, Init, Shift, Doubleshift, Mulbegin, Mulend, Stop, Finish), shift registers, a 2's complement unit, multiplexers, a 37-bit adder, a 37-bit register, a counter, and an end check, all clocked by CLK.]

Simulation Plan

[Figure: two 32-bit signal generators drive A[31:0] and B[31:0] into both the behavioral multiplier (A * B, Result[63:0]) and the multiplier under test (My Multiplier, My_P[63:0]); a 64-bit comparator counts the failed number.]

Multipliers under test: Array Multiplier, Modified Booth Multiplier, Wallace Tree Multiplier, Modified Booth-Wallace Tree Multiplier, Twin Pipe Serial-Parallel Multiplier
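The same plan can be mirrored in software: random operands go to a reference ("behavioral") multiplication and to the design under test, and mismatches are counted. This is a minimal sketch; `count_failures` and its parameters are hypothetical names, and Python's built-in `*` stands in for the behavioral multiplier.

```python
import random


def count_failures(multiplier_under_test, trials=1000, width=32, seed=0):
    """Compare a candidate multiplier against the behavioral reference
    on random unsigned operands and return the number of mismatches."""
    rng = random.Random(seed)          # fixed seed for repeatable runs
    failures = 0
    for _ in range(trials):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        expected = a * b               # behavioral multiplier
        if multiplier_under_test(a, b) != expected:
            failures += 1
    return failures


# A correct model passes; a deliberately broken one fails every trial.
ok = count_failures(lambda a, b: a * b)
broken = count_failures(lambda a, b: a * b + 1, trials=10)
```

Random vectors rarely hit corner cases such as all-zero or all-one operands, so a real test plan would normally add directed vectors as well.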

Testing the Design

Simulation for Parallel Multipliers

[Figure: simulation waveforms for signed numbers and for unsigned numbers]

Simulation for Signed S/P Multipliers

There is a 34 ns delay between the result and the operands because of the flip-flop delay.

[Figure: FPGA after implementation, with the programmed areas shown clearly]

[Figure: another implementation of the above after pipelining; place and route has placed the design in different locations.]

[Figure: Spartacus FPGA board]

Testing the multiplication system

Comparison of Multipliers

| Metric | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier |
|---|---|---|---|---|---|---|
| Area, total CLBs (#) | 376.5 | 2649.5 | 3325.5 | 2672.5 | 49. | 2993.5 |
| Maximum delay D (ns) | 35.78 | 24.43 | 8.93 | 8.53 | 7.52 (3.36x32) | 49.33 |
| Total dynamic power P (W) | 7.52 | 6.33 | 7.46 | 6.4 | .28 | 6.24 |
| Delay-power product DP (ns W) | 268.98 | 54.64 | 4.4 | 8.76 | 3.62 | 37.58 |
| Area-power product AP (# W) | 2328.2 | 677.6 | 24793.93 | 727.79 | 39.54 | 8665.7 |
| Area-delay product AD (# ns) | .E5 | 6.47E4 | 6.3E4 | 4.95E4 | 5.27E4 | .48E5 |
| Area-delay^2 product AD^2 (# ns^2) | 3.94E6 | .58E6 | .9E6 | 9.8E5 | 5.66E6 | 7.28E6 |

Table 7. Performance comparison for two's complement multipliers. By Chen Yaoquan, M.Eng. 25

Comparison of Multipliers

| Metric | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier |
|---|---|---|---|---|---|---|
| Area, total CLBs (#) | 328.5 | 28. | 332.5 | 2845.5 | 487. | 33. |
| Maximum delay D (ns) | 37.23 | 25.33 | 8.93 | 8.33 | 7.52 | 44.5 |
| Total dynamic power P (W) | 7.57 | 6.66 | 7.32 | 6.66 | .29 | 6.26 |
| Delay-power product DP (ns W) | 28.88 | 68.77 | 38.6 | 22.3 | 3.66 | 278.53 |
| Area-power product AP (# W) | 24837.98 | 8656.4 | 2439.36 | 8959.57 | 38.89 | 8795.78 |
| Area-delay product AD (# ns) | .22E5 | 7.9E4 | 6.29E4 | 5.22E4 | 5.24E4 | .34E5 |
| Area-delay^2 product AD^2 (# ns^2) | 4.55E6 | .8E6 | .9E6 | 9.56E5 | 5.63E6 | 5.95E6 |

Table 7. Performance comparison for unsigned multipliers. By Chen Yaoquan, M.Eng. 25

Comparison of Multipliers

Change the value of set_max_delay in the script file.

[Table: area (#), power (W), and delay (ns) of the behavioral multiplier for each set_max_delay setting (ns)]

[Figure: the relation of area and delay for the behavioral multiplier, a "banana curve"; x-axis: delay (ns), y-axis: area (#)]

Comparison of Multipliers

| | Array Multiplier | Modified Booth Multiplier | Wallace-Tree Multiplier | Modified Booth-Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier | Behavioral Multiplier |
|---|---|---|---|---|---|---|
| Area | Medium | Small | Large | Small | Smallest | Medium |
| Critical delay | Medium | Fast | Very Fast | Fastest | Very Large | Large |
| Power consumption | Large | Medium | Large | Medium | Smallest | Medium |
| Complexity | Simple | Complex | More Complex | More Complex | Simple | Simplest |
| Implementation | Easy | Medium | Difficult | Difficult | Easy | Easiest |

By Chen Yaoquan, M.Eng. 25

Pipelining Simulation

Synthesis for Signed Multipliers

[Figure: synthesis results for the Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, and Behavioral multipliers]

Synthesis for Unsigned Multipliers

[Figure: synthesis results for the Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, and Behavioral multipliers]

Conclusion

Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.
Wallace Tree has the best performance, but it is hard to implement.
Booth-algorithm-based multipliers have lower area among parallel multipliers.
For behavioral multipliers, the area increases as the delay decreases.

Comparison

| Metric | Array Multiplier | Modified Booth Multiplier | Wallace Tree Multiplier | Modified Booth & Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier |
|---|---|---|---|---|---|
| Area, total CLBs (#) | 65 | 292 | 659 | 239 | 33 |
| Maximum delay D (ns) | 87.87ns | 39.4ns | .4ns | .43ns | 22.58ns (722.56ns) |
| Power consumption at highest speed P (mW) | 6.656mW (at 88ns) | 23.36mW (at 4ns) | 3.95mW (at .4ns) | 3.862mW (at .43ns) | 2.89mW (at 722.56ns) |
| Delay-power product DP (ns mW) | 328.5 | 3225.39 | 33.28 | 33.33 | 59.42 |
| Area-power product AP (# mW) | 9.397 x 10^3 | 29.89 x 10^3 | 5.346 x 10^3 | 38.238 x 10^3 | 277.837 x 10^3 |
| Area-delay product AD (# ns) | 28.868 x 10^3 | 8.8 x 10^3 | 67.79 x 10^3 | 25.67 x 10^3 | 96. x 10^3 |
| Area-delay^2 product AD^2 (# ns^2) | 4.9 x 10^6 | 25. x 10^6 | 6.97 x 10^6 | 2.747 x 10^6 | 69.438 x 10^6 |

NOTICE The rest of these slides are for extra information only and are not part of the lecture

Array Addition

[Figure: addition of 8 binary numbers using the Wallace tree principle]
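The Wallace tree principle for adding several numbers can be sketched with word-wide 3:2 counters (carry-save adders). The code below is an illustrative model for non-negative integers, not the slide's circuit; `carry_save_add` and `wallace_sum` are hypothetical names, and the grouping of rows three at a time is one simple choice among several.

```python
def carry_save_add(a, b, c):
    """3:2 counter applied to whole words: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_sum(operands):
    """Reduce non-negative operands to two rows with 3:2 counters, then
    perform one final carry-propagate addition.

    Returns (total, levels), where levels is the number of reduction stages.
    """
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3          # rows that form complete triples
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])        # leftover rows pass through
        rows = reduced
        levels += 1
    return sum(rows), levels               # final adder (e.g. a CLA in hardware)


total, levels = wallace_sum([1, 2, 3, 4, 5, 6, 7, 8])
```

No carry propagates inside the reduction stages; only the final two-row addition needs a carry-propagate adder, which is why the tree is fast.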

[Figure: block diagram of the MULT32 unit with inputs A, B, BEGIN, CLK, and RESET, a counter (COUNTER2), an inverter, a 37-bit adder (Adder37), a 37-bit register (REGISTER37) holding LAST_RESULT, and outputs RESULT, Done, and FINISH]

Baugh-Wooley two's complement multiplier

[Figure: the schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two's complement array multiplier; full adders (FA) combine the terms a_i b_j and the complemented sign-row terms into product bits 9 down to 0.]

[Figure: Baugh-Wooley partial-product array for A = a4 a3 a2 a1 a0 and B = b4 b3 b2 b1 b0, producing p9 ... p0, with worked numeric examples]

Cluster Multipliers

Divide the multiplier into smaller multipliers.

[Figure: 8-bit cluster low power multiplier. The multiplier and multiplicand are split into 4-bit halves, each pair latched (8-bit latches) into one of four 4-bit multipliers; a circuit generates the enable signals EN0 to EN3; the four 8-bit results feed a final addition stage that produces the 16-bit product.]

Cluster Multipliers

Dividing the multiplication circuit into clusters (blocks) of smaller multipliers.
Applying clock gating techniques to disable the blocks that are producing a zero result.

Features
Low power (claims 3.4 % savings)
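The clustering idea can be modeled by splitting each operand into halves and skipping any block whose result is known to be zero. In this sketch the skip stands in for clock gating; the function name `cluster_multiply`, the 8-bit width, and the returned count of active blocks are assumptions for illustration only.

```python
def cluster_multiply(a, b, width=8):
    """Multiply two unsigned width-bit numbers with four half-width blocks.

    Returns (product, active_blocks); a block with a zero operand is
    skipped, which models a clock-gated (disabled) multiplier block.
    """
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, (a >> half) & mask
    b_lo, b_hi = b & mask, (b >> half) & mask

    blocks = (
        (a_lo, b_lo, 0),
        (a_hi, b_lo, half),
        (a_lo, b_hi, half),
        (a_hi, b_hi, 2 * half),
    )
    product = 0
    active_blocks = 0
    for x, y, shift in blocks:
        if x == 0 or y == 0:
            continue                      # block disabled: result is zero
        active_blocks += 1
        product += (x * y) << shift       # final addition stage
    return product, active_blocks


product, active = cluster_multiply(0x0F, 0x03)
```

When the operands are small, as in the usage above, three of the four blocks stay idle. That idle time is where the claimed power saving would come from.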

Multiplexer-Based Array Multipliers

$$P = XY = \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=1}^{n-1} Z_j 2^j$$

$$Z_j = x_j Y_j + y_j X_j, \qquad X_j = \sum_{i=0}^{j-1} x_i 2^i, \qquad Y_j = \sum_{i=0}^{j-1} y_i 2^i$$

Multiplexer-Based Array Multipliers

Two types of cells:
Cell 1: produces the terms Z_ij 2^j and includes a full adder of the carry-save adder array.
Cell 2: produces the terms x_j y_j 2^{2j} and includes a full adder of the carry-save adder array.

Multiplexer-Based Array Multipliers: Characteristics

Faster than Modified Booth.
Unlike Booth, does not require encoding logic.
Requires approximately N^2/2 cells.
Has a zigzag shape, thus not layout-friendly.

Multiplexer-Based Array Multipliers: Improvement

More rectangular layout.
Saves up to 4 percent area without penalties.
Outperforms the modified Booth multiplier in both speed and power by 3% to 26%.

Gray-Encoded Array Multiplier

[Table: decimal values (Dec) from -8 to 7 and their hybrid (Hyb) codes]

2's complement hybrid coding:
Consecutive values differ in a single bit.
Reduces the number of transitions, and thus power (for highly correlated streams).

Gray-Encoded Array Multiplier

[Figure: an 8-bit wide 2's complement radix-4 array multiplier]

Gray-Encoded Array Multiplier: Characteristics

Uses Gray code to reduce the switching activity of the multiplier.
Saves 45.6% power compared with Modified Booth.
Uses greater area (26.4%) than Modified Booth.

Ultra-high Speed Parallel Multiplier

How is ultra-high speed achieved?
Based on the Modified Booth Algorithm and a tree structure (column compression).
Chooses efficient counters (3:2 and 5:3).
Uses the new compressor (faster 2%).
Uses the First Partial product Addition (FPA) algorithm (reducing the bits of the CLA by 50%).

Ultra-high Speed Parallel Multiplier

Divide into 3 rows or 5 rows only (most efficient).
Calculate the partial products as soon as possible.
The final CLA is only 16-bit instead of 32-bit.

[Figure: calculation process using parallel counters in the 16x16 case]

In total, the delay is reduced by about 3%.

ULLRLF Multiplier

ULLRLF stands for Upper/Lower Left-to-Right Leapfrog. It combines the following techniques:
Signal flow optimization in the [3:2] adder array for partial product reduction,
Left-to-right leapfrog (LRLF) signal flow,
Splitting of the reduction array into upper/lower parts.

ULLRLF Multiplier

1) Signal flow optimization in the [3:2] adder array
P_ij is always connected to pin A.
Sin/Cin are connected to B/C; most Sin signals are connected to C.
-- For n = 32, the delay is reduced by 3 percent.
-- Power is saved as well.

ULLRLF Multiplier

2) Left-to-Right Leapfrog (LRLF) structure
The sum signals skip over alternate rows.
-- The signal delays are better balanced.
-- Low power.

ULLRLF Multiplier

3) Upper/Lower split structure
Only n2 bits
-- The long data path is broken into parallel short paths, which saves power.
-- The delay of the partial product reduction is reduced.

ULLRLF Multiplier

ULLRLF multipliers have less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area.
With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.

[Figure: floorplan of ULLRLF (n = 32)]

Signed Array Multiplier

[Figure: 32*32-bit array multiplier for signed numbers. Each carry-save adder stage consists of 32 AND gates, full adders, a half adder, and a NOT gate; the final stage is a 32-bit carry look-ahead adder producing product bits 63 down to 0.]

Unsigned Array Multiplier

[Figure: 32*32-bit array multiplier for unsigned numbers. Each carry-save adder stage consists of 32 AND gates, full adders, and a half adder; the final stage is a 32-bit carry look-ahead adder producing product bits 63 down to 0.]

Signed Modified Booth Multiplier

[Figure: dot diagram of the 32*32-bit Booth multiplier for signed numbers, with 16 rows of partial products selected by overlapping triples of multiplier bits from LSB to MSB.]
E = the inversion of the sign bit in each row
S = the B_{i+1} bit in the three encoded bits

Signed Modified Booth Multiplier

[Figure: 32*32-bit modified Booth multiplier for signed numbers. Each stage has a Booth encoder driving 33 selectors (SEL), full adders, a half adder, and a NOT gate (INVERT); the final stage is a 64-bit carry look-ahead adder producing product bits 63 down to 0.]

Unsigned Modified Booth Multiplier

[Figure: dot diagram of the 32*32-bit Booth multiplier for unsigned numbers, with 17 rows of partial products selected by overlapping triples of multiplier bits from LSB to MSB.]
S = the B_{i+1} bit in the three encoded bits
S' = the inversion of S

Unsigned Modified Booth Multiplier

[Figure: 32*32-bit modified Booth multiplier for unsigned numbers. Each stage has a Booth encoder driving 33 selectors (SEL, SEL_EN), 32 full adders, a half adder, and a NOT gate; the final stage is a 64-bit carry look-ahead adder producing product bits 63 down to 0.]

Wallace Tree multipliers

[Figure: A[31:0] and B[31:0] form 32 partial products that are added in the Wallace tree adder; its outputs C[63:0] and S[63:0] feed a 64-bit carry look-ahead adder that produces P[63:0].]

Wallace Tree multipliers

Use 3:2 counters and 2:2 counters.
Number of levels: log(32/2) / log(3/2) ≈ 6.8 is a lower bound; the dot diagram uses 8 levels.
Irregular structure.
Fast.

[Figure: 3:2 counter (three inputs, Sum and Carry outputs) and 2:2 counter (two inputs, Sum and Carry outputs); dot diagram of the 32 partial-product rows reduced through levels 1 to 8.]
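The level count can be checked by simulating the row reduction directly. The sketch below assumes rows are grouped three at a time into 3:2 counters, with leftover rows passed to the next level unchanged; the names `wallace_levels` and `levels_lower_bound` are illustrative.

```python
import math


def wallace_levels(rows):
    """Count reduction levels needed to bring `rows` operands down to two,
    when every group of three rows is replaced by two (3:2 counters)."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


def levels_lower_bound(rows):
    """Closed-form estimate log(rows/2) / log(3/2); a lower bound because
    each level shrinks the row count by a factor of at most 3/2."""
    return math.log(rows / 2) / math.log(3 / 2)


levels_32 = wallace_levels(32)   # plain 32*32 multiplier
levels_16 = wallace_levels(16)   # 16 rows after Booth recoding
```

Under this grouping, 32 rows shrink as 22, 15, 10, 7, 5, 4, 3, 2, which gives 8 levels. Halving the rows with Booth recoding brings the count to 6.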

Wallace Tree multipliers

[Figure: 64-bit carry look-ahead adder. A carry propagate/generate unit produces P63 ... P0 and G63 ... G0; eight 8-bit CLA blocks in a 2-level hierarchy (group signals PM0 to PM7, GM0 to GM7) generate the carries C0 to C63; a 64-bit summation unit outputs C64 and S63 ... S0.]

Modified Booth-Wallace Tree Multipliers

Use 3:2 counters and 2:2 counters.
Number of levels: log(16/2) / log(3/2) ≈ 5.1 is a lower bound; the dot diagram uses 6 levels.
Irregular structure.
Fast.
Less area.

[Figure: dot matrix of the Booth-Wallace multiplier for signed numbers, rearranged and reduced through levels 1 to 6]

Twin pipe serial-parallel multipliers

[Figure: block diagram of the 32*32-bit signed twin pipe serial-parallel multiplier with serial/parallel conversion logic. Parallel-in serial-out shift registers feed the even and odd multiplier bits to the 32-bit twin pipe serial-parallel multiplier unit; serial-in parallel-out shift registers collect the even and odd product bits; control signals are Result_ready, Load/Shift, Reset, Clock, and Sign.]

Signed twin pipe serial-parallel multipliers

[Figure: 32*32-bit twin pipe serial-parallel multiplier for signed numbers. Even data bits are processed on the rising clock edge and odd data bits in a second pipe; each pipe is a chain of FA/HA cells (repeated 28 more units); a MUX merges the even and odd products; a Sign control line drives the sign-change hardware.]

Unsigned twin pipe serial-parallel multipliers

[Figure: 32*32-bit twin pipe serial-parallel multiplier for unsigned numbers, with the same even and odd pipes of FA/HA cells and the output MUX.]

The unsigned version does not need the Sign control line or the sign-change hardware.