Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3


3.1 Introduction

This chapter is organized as follows. Section 3.2 describes the existing multipliers: the array multiplier, the Baugh-Wooley multiplier and the MBE multiplier. Section 3.3 illustrates the design of the proposed array multiplier. It describes the logic needed to convert an unsigned array multiplier so that it works for both signed and unsigned numbers, and then the design of the VCA as the PPRT and the CLCSA as the CPA. Section 3.4 implements the 32 × 32-bit array multiplier using the radix-256 array multiplier. Section 3.5 presents the results and discussion, Section 3.6 describes the simulation waveforms, and the final section gives the summary.

This chapter is concerned with the design and implementation of an 8 × 8-bit array multiplier for signed and unsigned numbers. Using this 8 × 8 array multiplier, the 32 × 32 array multiplier is implemented. The array multiplier uses a 2-input AND gate as its PPG. The PPG of the MBE multiplier consists of encoder and decoder logic, which requires several logic gates to implement. The literature on the MBE [4] shows that at least 46 transistors are needed to implement its PPG in CMOS logic.

The two schemes also differ in the number of partial products they generate. The MBE multiplier generates {(n/2) + 1} partial product rows, whereas the array multiplier generates n partial product rows. For a small multiplier such as a 4 × 4-bit one, the MBE generates 3 partial products and the array multiplier generates 4. Similarly, for an 8 × 8-bit multiplier the MBE generates 5 partial products and the array multiplier generates 8. Thus a small array multiplier is comparable with the MBE multiplier. Moreover, the MBE designs in the literature [1-4] perform multiplication on 2's complement signed numbers, while the array multiplier performs multiplication on unsigned numbers. Therefore, this chapter proposes a signed and unsigned array multiplier that can multiply both signed and unsigned numbers.

3.2 Existing Multipliers

This section describes the array multiplier for unsigned numbers, the array multiplier for signed numbers, the Baugh-Wooley multiplier for signed numbers and the MBE multiplier.

3.2.1 Array Multiplier

The array multiplier uses two-input AND logic to generate the partial products. Figure 3.1 shows the partial products of an 8 × 8-bit multiplier. This array multiplier operates only on unsigned operands. Its advantages are as follows.

i. It uses only a 2-input AND gate for the PPG, which is implemented in CMOS logic with only six transistors, so the area is very small and the power consumption is low.
ii. For small sizes, its delay is comparable with that of the MBE multiplier.
iii. A radix-n multiplier can be designed from small multipliers.

The disadvantage is that the delay increases when the array multiplier is designed for wide operands.
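As a rough illustration of the AND-based partial product generation and of the row counts quoted above, the following Python sketch models an unsigned n × n array multiplier at the bit level. The function names `array_partial_products` and `row_counts` are hypothetical and are not taken from the chapter; the sketch only mirrors the behaviour described in the text, not the hardware implementation.

```python
def array_partial_products(a, b, n=8):
    """Return the n partial-product rows of an unsigned n x n array multiplier.

    Bit i of row j is (a_i AND b_j); row j carries the weight 2**j.
    """
    rows = []
    for j in range(n):
        b_j = (b >> j) & 1
        row = 0
        for i in range(n):
            a_i = (a >> i) & 1
            row |= (a_i & b_j) << i      # one 2-input AND gate per bit
        rows.append(row << j)            # row j is shifted left by j positions
    return rows


def row_counts(n):
    """Partial-product rows for (MBE, array) multipliers of width n."""
    return n // 2 + 1, n


rows = array_partial_products(0b11111111, 0b11111111)
product = sum(rows)                      # plain addition stands in for the adder array
```

Summing the rows with ordinary integer addition replaces the adder array here, so the sketch says nothing about delay or area.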

Figure 3.1: An 8 × 8-bit unsigned array multiplier (operands a7 ... a0 and b7 ... b0; row j, for j = 0 to 7, holds the terms a7bj ... a0bj shifted left by j positions; the column sums form the product bits p15 ... p0)

Figure 3.2 shows the partial products of an 8 × 8-bit signed number array multiplier. It also uses a 2-input AND gate for the PPG. This array multiplier operates only on signed operands and fails on unsigned operands. It therefore needs a control signal that modifies it to function as both a signed and an unsigned multiplier.

Figure 3.2: An 8 × 8-bit signed array multiplier (same layout as Figure 3.1, with an additional a7b0 term leading the first row and a constant 1 leading the last row; product bits p15 ... p0)

3.2.2 Baugh-Wooley Multiplier

Figure 3.3 shows an 8 × 8-bit Baugh-Wooley signed number multiplier. It uses a two-input AND gate as the PPG. For signed multiplication, the Baugh-Wooley method uses the 2's complement number system, following Hatamian's scheme. Like the array multiplier, the Baugh-Wooley multiplier generates n partial products.

Figure 3.3: An 8 × 8-bit Baugh-Wooley multiplier (rows a7bj ... a0bj for j = 0 to 7, with a constant 1 leading the first row and a constant 1 leading the last row; product bits p15 ... p0)

3.2.3 The MBE Multiplier

Figure 3.4 shows an 8 × 8-bit signed MBE multiplier. The MBE PPGs of references [1-4] use 68, 56, 62 and 46 transistors respectively, compared with the six transistors of the two-input AND gate (in CMOS logic) used by the array multiplier. Thus the MBE PPG occupies more area and consumes more power. There are 42 partial product bits, so the total number of transistors required to implement [1-4] is as follows.

Number of transistors for [1] = 42 × 68 = 2856
Number of transistors for [2] = 42 × 56 = 2352
Number of transistors for [3] = 42 × 62 = 2604

Number of transistors for [4] = 42 × 46 = 1932

The array multiplier, by contrast, requires 42 × 6 = 252 transistors, so the area saved compared with [4] is {(1932 - 252)/1932} × 100 ≈ 87%.

Figure 3.4: An 8 × 8-bit signed MBE multiplier (four partial-product rows pj8 ... pj0 for j = 0 to 3, with the leading bit p08 repeated in the first row, a constant 1 leading each of the other rows, and the correction bits n0 ... n3; product bits p15 ... p0)

From the above discussion, the following points emerge.

i. Two separate array multipliers are needed for signed and unsigned multiplication.
ii. The Baugh-Wooley multiplier operates only on signed numbers and fails on unsigned numbers.
iii. The MBE multiplier operates only on signed numbers and fails on unsigned numbers.

3.3 Design of Proposed Array Multiplier

Based on the discussion in Section 3.2, an 8 × 8-bit signed and unsigned array multiplier is proposed. To convert the unsigned multiplier into a signed multiplier, the sign-extension bits (e0 through e7) are used as shown in Figure 3.5.

Figure 3.5: An 8 × 8-bit signed unsigned array multiplier (row j, for j = 0 to 7, holds a7bj ... a0bj shifted left by j positions and is extended up to the most significant column with copies of the sign-extension bit ej; product bits p15 ... p0)

The algorithm that converts the unsigned multiplier into a signed multiplier is explained as follows.

Case 1: When both operands are positive or unsigned

When both the multiplicand (A) and the multiplier (B) are positive or unsigned, the sign-extension bits are set to 0, as given in equation (3.1).

$e_0 = 0,\ e_1 = 0,\ \ldots,\ e_7 = 0$ (3.1)

Case 2: When multiplicand is negative and multiplier is positive

In this case the negative multiplicand is represented in the 2's complement number system. Since the partial products and the final result are negative, the sign of each partial product must be extended. The sign-extension bits are therefore set to the multiplier operand bits, as given by equation (3.2).

$e_0 = b_0,\ e_1 = b_1,\ \ldots,\ e_7 = b_7$ (3.2)

Case 3: When multiplicand is positive and multiplier is negative

In this case the multiplicand (A) is positive and the multiplier (B) is negative. The operands are exchanged, because the array multiplier cannot produce the correct result for signed numbers when the multiplier is negative. Operands (A) and (B) are therefore exchanged as given by equation (3.3), and the sign-extension bits are assigned as given by equation (3.4).

$a_0 = b_0,\ a_1 = b_1,\ \ldots,\ a_7 = b_7$ (3.3)
$e_0 = b_0,\ e_1 = b_1,\ \ldots,\ e_7 = b_7$ (3.4)

Case 4: When multiplicand is negative and multiplier is negative

When both operands are negative, they are represented in the 2's complement number system. Since the product of two negative operands is positive, the 2's complemented operands are 2's complemented again to convert them into positive operands, as follows.

$e_0 = 0,\ e_1 = 0,\ \ldots,\ e_7 = 0$ (3.5)
$a = \operatorname{not}(a_7, a_6, \ldots, a_0) + 1$ (3.6)
$b = \operatorname{not}(b_7, b_6, \ldots, b_0) + 1$ (3.7)

Figure 3.6: An 8 × 8-bit novel signed unsigned array multiplier (partial products pp1 ... pp8; pp1 holds s0, a7b0, a6b0 ... a0b0; each of pp2 ... pp7 holds the sign term sj followed by a6bj ... a0bj for j = 1 to 6; pp8 holds s8, s7 and a6b7 ... a0b7; product bits p15 ... p0)
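The four cases of the sign-extension scheme of Figure 3.5 can be sketched behaviourally in Python. This is a minimal model under stated assumptions, not the chapter's circuit: the function name `signed_unsigned_multiply` and the constants are hypothetical, operands are passed as 8-bit patterns, and ordinary integer addition stands in for the adder array. Case 3 is read here as "exchange the operands, then extend each row with the bits of the new multiplier", which is one interpretation of equations (3.3) and (3.4).

```python
N = 8
WORD = (1 << N) - 1            # 0x00FF, 8-bit operand mask
PROD = (1 << 2 * N) - 1        # 0xFFFF, 16-bit product mask


def signed_unsigned_multiply(a, b, s_u):
    """Multiply two 8-bit patterns; s_u = 0 unsigned, s_u = 1 signed (2's complement)."""
    a &= WORD
    b &= WORD
    a_neg = bool(s_u) and bool(a >> (N - 1))
    b_neg = bool(s_u) and bool(b >> (N - 1))

    if a_neg and b_neg:        # Case 4: negate both operands, no sign extension
        a = (~a + 1) & WORD
        b = (~b + 1) & WORD
        extend = False
    elif b_neg:                # Case 3: exchange so the negative value is the multiplicand
        a, b = b, a
        extend = True
    else:                      # Case 2 (a negative) or Case 1 (both positive/unsigned)
        extend = a_neg

    product = 0
    for j in range(N):
        b_j = (b >> j) & 1
        row = a if b_j else 0            # AND-gate partial product row
        e_j = b_j if extend else 0       # sign-extension bit of row j
        if e_j:
            row |= PROD & ~WORD          # replicate e_j into the upper bits
        product = (product + (row << j)) & PROD
    return product
```

For example, `signed_unsigned_multiply(0x80, 0x7F, 1)` models (-128) × (+127) and `signed_unsigned_multiply(0xFF, 0xFF, 0)` models 255 × 255.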

Figure 3.6 shows the 8 partial products of the signed and unsigned array multiplier with sign logic, in contrast to the sign-extension logic of the array multiplier in Figure 3.5. The requirements of the signed and unsigned array multiplier are listed in Table 3.1.

Table 3.1: Truth table of signed unsigned array multiplier, sign_unsign (s_u)

s_u   a_{n-1}   b_{n-1}   Type of operation
0     0         0         Unsigned multiplication
1     0         0         Signed multiplication when A and B are positive
1     0         1         Signed multiplication when B is negative
1     1         0         Signed multiplication when A is negative
1     1         1         Signed multiplication when A and B are negative

This conversion is required because the array produces a wrong result when both operands are negative. From Table 3.1, the following expressions are obtained for the sign-extension logic; they are implemented as shown in Figure 3.7 and Figure 3.8.

$e_{n-1} = s\_u\,(a_{n-1}\, b_{n-1})$ (3.8)
$s_{n-1} = a_{n-1}\,(s\_u\, b_{n-1})$ (3.9)
$s_0 = s\_u\,(a_7\, b_0)$ (3.10)
$s_1 = a_7\,(s\_u\, b_1)$ (3.11)
$s_2 = a_7\,(s\_u\, b_2)$ (3.12)
$s_3 = a_7\,(s\_u\, b_3)$ (3.13)
$s_4 = a_7\,(s\_u\, b_4)$ (3.14)
$s_5 = a_7\,(s\_u\, b_5)$ (3.15)
$s_6 = a_7\,(s\_u\, b_6)$ (3.16)
$s_7 = a_7\,(s\_u\, b_7)$ (3.17)
$s_8 = a_7\,(s\_u\, b_7)$ (3.18)

Figure 3.7: Sign extend logic for Figure 3.5 (inputs a_{n-1}, s_u and b_{n-1}; output e_{n-1})

Figure 3.8: Sign extend logic for Figure 3.6 (inputs b_{n-1}, s_u and a_{n-1}; output s_{n-1})

3.3.1 Design of VCA

The PPRT converts the 8 partial products (pp1 ... pp8) of Figure 3.6 into an array of two rows. To implement the PPRT, the Vertical Column Adder (VCA) is proposed, as shown in Figure 3.9. The VCA consists of cell-1 and cell-2, shown in Figure 3.10 and Figure 3.11. Each cell-1 generates three partial product bits and adds them with a full adder (FA) to produce the sum (s_i) and the carry (c_{i+1}). Each cell-2 generates two partial product bits and adds them with a half adder (HA) to produce the sum (s_i) and the carry (c_{i+1}). The PPGR in Figure 3.9 produces three sets of sum and carry rows. Another level of 3:2 compressors then yields the final sum and carry arrays. Finally, these two rows are added by the CPA to obtain the product.

The partial products pp1, pp2 and pp3 of Figure 3.6 are shown together in Figure 3.12. These three partial products are generated in parallel and added to produce the sum and carry arrays shown in Figure 3.13. Since the logic of Figure 3.13 performs both the PPG and the PPRT operations, it is referred to as the PPGR. The partial products pp4, pp5 and pp6 of Figure 3.6 are shown together in Figure 3.14. These three partial products are generated in parallel and added to produce the sum and carry arrays shown in Figure 3.15. The partial products pp7 and pp8 of Figure 3.6 are shown together in Figure 3.16. These two partial products are generated in parallel and added to produce the sum and carry arrays shown in Figure 3.17.
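The idea of reducing partial-product rows to a sum row and a carry row can be sketched with a generic carry-save reduction. The Python below is an illustration only: `compress_3_to_2` and `reduce_to_two_rows` are hypothetical names, the rows are whole integers rather than per-column cells, and the grouping is a simple repeated 3:2 compression, not the exact cell-1/cell-2 arrangement of the VCA.

```python
def compress_3_to_2(x, y, z):
    """Bitwise 3:2 compressor: one full adder per column, no carry propagation."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (y & z) | (x & z)) << 1   # carries move one column left
    return sum_row, carry_row


def reduce_to_two_rows(rows):
    """Apply 3:2 compressors level by level until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for k in range(0, full, 3):
            next_rows.extend(compress_3_to_2(rows[k], rows[k + 1], rows[k + 2]))
        next_rows.extend(rows[full:])                # leftover rows pass through
        rows = next_rows
    return rows


# Eight unsigned partial-product rows of 255 x 255 (row j = 255 shifted left by j)
pp_rows = [255 << j for j in range(8)]
final_rows = reduce_to_two_rows(pp_rows)
```

Each compression level preserves the arithmetic sum of the rows, so adding the two final rows with a carry-propagate adder gives the product.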

Figure 3.9: Architecture of VCA for PPRT (three PPGR blocks with outputs c8 s8 ... c1 p1 p0, c16 s16 ... c9 s9 px and c22 s22 ... c16 s16 py, followed by 3:2 compressors producing c38 ... c23 and s38 ... s23)

Figure 3.10: Structure of cell-1 (inputs a_{i+2}, a_{i+1}, a_i and b_i, b_{i+1}, b_{i+2}; outputs s_i and c_{i+1})

Figure 3.11: Structure of cell-2 (inputs a_{i+1}, a_i and b_i, b_{i+1}; half adder HA; outputs s_i and c_{i+1})

Figure 3.12: The partial products pp1, pp2 and pp3 (pp1 = s0, a7b0, a6b0 ... a0b0; pp2 = s1, a6b1 ... a0b1; pp3 = s2, a6b2 ... a0b2)

Figure 3.13: The PPGR for pp1, pp2 and pp3 (outputs c8 s8 ... c2 s2, c1, p1, p0)

Figure 3.14: The partial products pp4, pp5 and pp6 (pp4 = s3, a6b3 ... a0b3; pp5 = s4, a6b4 ... a0b4; pp6 = s5, a6b5 ... a0b5)

Figure 3.15: The PPGR for pp4, pp5 and pp6 (outputs c16 s16 ... c9 s9, px)

Figure 3.16: The partial products pp7 and pp8 (pp7 = s6, a6b6 ... a0b6; pp8 = s8, s7, a6b7 ... a0b7)

Figure 3.17: The PPGR for pp7 and pp8 (outputs c22 s22 ... c16 s16, py)

The VCA consists of two levels of full adders that operate in parallel to produce the final sum and carry rows. Its building blocks are full adders, which add all the column bits in parallel. The sum and carry of the full adder are given by equations (3.19) and (3.20), where $\oplus$ denotes exclusive OR and the overbar denotes complement. In CMOS logic the full adder is implemented using only 12 transistors, as shown in Figure 3.18.

$s_i = x_{i+1} \oplus x_{i+2} \oplus c_i$ (3.19)
$c_{i+1} = (x_{i+1} \oplus x_{i+2})\,c_i + \overline{(x_{i+1} \oplus x_{i+2})}\,x_{i+1}$ (3.20)

The logic required for the SCGP is derived from equation (3.20) and is given by equations (3.21) and (3.22), where $cp_i$ is the carry propagate signal and $cg_i$ is the carry generate signal. The generate term is 1 only when both inputs are 1, so it is equivalent to $x_{i+1} x_{i+2}$.

$cp_i = x_{i+1} \oplus x_{i+2}$ (3.21)
$cg_i = \overline{(x_{i+1} \oplus x_{i+2})}\,x_{i+1}$ (3.22)

The high performance SCGP logic circuit implemented from equations (3.21) and (3.22) is shown in Figure 3.19. The SCGP logic circuit saves the extra hardware otherwise needed to generate the carry generate and propagate signals. It is used in the 8-bit CLA adder circuit of Figure 3.22. The circuits of Figure 3.18 and Figure 3.19 are designed for small area, high speed and low power consumption. Since most of the multiplier consists of full adders and SCGP circuits, these are the critical circuits for multiplier performance.

Figure 3.18: Circuit diagram of full adder (inputs x_i, y_i and c_i; outputs s_i and c_{i+1})

Figure 3.19: Circuit diagram of SCGP (inputs x_i, y_i and c_i; outputs s_i, cg_i and cp_i)
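The relation between the full adder and the propagate and generate signals can be checked with a short bit-level sketch. The names `scgp` and `full_adder` and the arguments `x`, `y`, `c_in` (standing for the two operand bits and the incoming carry) are assumptions made for illustration; the sketch uses the standard multiplexer form of the carry, in which the carry-in is passed on when the propagate signal is 1 and the generate term is used otherwise.

```python
def scgp(x, y, c_in):
    """Sum plus carry-generate and carry-propagate signals for one bit position."""
    cp = x ^ y                 # carry propagate
    cg = (1 - cp) & x          # carry generate; equals x & y
    s = cp ^ c_in              # sum bit
    return s, cg, cp


def full_adder(x, y, c_in):
    """Full adder built on the propagate/generate signals."""
    s, cg, cp = scgp(x, y, c_in)
    c_out = (cp & c_in) | cg   # pass carry-in when propagating, else generate
    return s, c_out


# Exhaustive truth table: (x, y, c_in) -> (sum, carry-out)
truth_table = {
    (x, y, c): full_adder(x, y, c)
    for x in (0, 1) for y in (0, 1) for c in (0, 1)
}
```

All eight input combinations satisfy sum + 2 × carry = x + y + c_in, which is the defining property of a full adder.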

3.3.2 Design of CLCSA

Finally, the two rows from the PPRT are added by the CPA to obtain the product of the 8 × 8-bit signed unsigned array multiplier. For this purpose the Carry Look-ahead and Carry Select Adder (CLCSA) is designed as the CPA. The CLCSA combines the carry look-ahead and carry select techniques, as shown in Figure 3.20. In this method, 8-bit CLA adders are cascaded using the carry select technique for high performance. The carry expressions for the 8-bit CLA adder are given by equations (3.23) through (3.30), which are implemented as shown in Figure 3.22. The inputs g0 through g7 are provided by the SCGP circuit of Figure 3.19. The 8-bit CLA of Figure 3.22 is implemented using 184 transistors. There are three such 8-bit CLAs, and thus 3 × 184 + 224 + 32 = 552 + 224 + 32 = 808 transistors are required to implement the 16-bit adder of Figure 3.20.

$c_1 = g_0 + p_0 c_0$ (3.23)
$c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$ (3.24)
$c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$ (3.25)
$c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$ (3.26)
$c_5 = g_4 + p_4 g_3 + p_4 p_3 g_2 + p_4 p_3 p_2 g_1 + p_4 p_3 p_2 p_1 g_0 + p_4 p_3 p_2 p_1 p_0 c_0$ (3.27)
$c_6 = g_5 + p_5 g_4 + p_5 p_4 g_3 + p_5 p_4 p_3 g_2 + p_5 p_4 p_3 p_2 g_1 + p_5 p_4 p_3 p_2 p_1 g_0 + p_5 p_4 p_3 p_2 p_1 p_0 c_0$ (3.28)
$c_7 = g_6 + p_6 g_5 + p_6 p_5 g_4 + p_6 p_5 p_4 g_3 + p_6 p_5 p_4 p_3 g_2 + p_6 p_5 p_4 p_3 p_2 g_1 + p_6 p_5 p_4 p_3 p_2 p_1 g_0 + p_6 p_5 p_4 p_3 p_2 p_1 p_0 c_0$ (3.29)
$c_8 = g_7 + p_7 g_6 + p_7 p_6 g_5 + p_7 p_6 p_5 g_4 + p_7 p_6 p_5 p_4 g_3 + p_7 p_6 p_5 p_4 p_3 g_2 + p_7 p_6 p_5 p_4 p_3 p_2 g_1 + p_7 p_6 p_5 p_4 p_3 p_2 p_1 g_0 + p_7 p_6 p_5 p_4 p_3 p_2 p_1 p_0 c_0$ (3.30)
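The carry expressions (3.23) through (3.30) all follow one pattern: carry $c_k$ is the OR of each generate signal ANDed with the propagate signals above it, plus the fully propagated $c_0$. The Python sketch below evaluates that pattern directly. The names `cla_carries` and `cla_add8` are hypothetical, and the generate and propagate bits are computed as AND and XOR of the operand bits, which is the usual assumption for a carry look-ahead adder.

```python
def cla_carries(g, p, c0):
    """Return [c1, ..., c8] from the expanded look-ahead expressions."""
    carries = []
    for k in range(1, len(g) + 1):
        c_k = 0
        prefix = 1                       # running product p[k-1] ... p[j+1]
        for j in range(k - 1, -1, -1):
            c_k |= prefix & g[j]         # term p[k-1] ... p[j+1] g[j]
            prefix &= p[j]
        c_k |= prefix & c0               # term p[k-1] ... p[0] c0
        carries.append(c_k)
    return carries


def cla_add8(x, y, c0=0):
    """8-bit addition using look-ahead carries; returns (sum, carry-out c8)."""
    g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(8)]   # generate
    p = [((x >> i) & 1) ^ ((y >> i) & 1) for i in range(8)]   # propagate
    c = [c0] + cla_carries(g, p, c0)
    total = 0
    for i in range(8):
        total |= (p[i] ^ c[i]) << i      # sum bit i = p_i XOR c_i
    return total, c[8]
```

In hardware all eight carries are produced in parallel from the g and p inputs; the loops here only enumerate the product terms.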

Figure 3.20 shows the block diagram of the CLCSA built from 8-bit CLA adders. An 8-bit CLA adder produces its carries in parallel, and each carry-select stage contains two 8-bit CLA adders, one with 0 and one with 1 as the initial carry input. If the final carry output of the previous 8-bit CLA stage is 1, the 2:1 multiplexer selects the output of the CLA adder whose initial carry is 1. If the final carry output is 0, the 2:1 multiplexer selects the output of the CLA adder whose initial carry is 0. The circuit diagram of the 2:1 multiplexer is shown in Figure 3.21. The delay of the CLCSA is given by equation (3.31).

$T_{CLCSA} = (n/2)\,t_{CLA} + (n/2)\,t_{MUX}$ (3.31)

where n is the number of CLA adder blocks, $t_{CLA}$ is the delay of each CLA adder block and $t_{MUX}$ is the delay of the 2:1 multiplexer.

Figure 3.20: Architecture of CLCSA for 8-bit multiplier (inputs x7-x0 and y7-y0 feed one 8-bit CLA adder with carry input c_in; inputs x15-x8 and y15-y8 feed two 8-bit CLA adders with carry inputs 0 and 1; the carry c8 controls eight 2:1 multiplexers; outputs p15-p8 and p7-p0)

Figure 3.21: Circuit diagram of 2:1 multiplexer
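The carry-select arrangement described above can be sketched as follows. This is a behavioural model under assumptions, not the circuit of Figure 3.20: `add8` is a hypothetical stand-in for an 8-bit CLA block (it uses integer addition), and `clcsa_add16` is a hypothetical name for the 16-bit adder built from three such blocks and a multiplexer.

```python
def add8(x, y, c_in):
    """Stand-in for one 8-bit CLA block: returns (8-bit sum, carry-out)."""
    total = (x & 0xFF) + (y & 0xFF) + c_in
    return total & 0xFF, total >> 8


def clcsa_add16(x, y, c_in=0):
    """16-bit carry-select addition from three 8-bit blocks and a 2:1 multiplexer."""
    low_sum, c8 = add8(x & 0xFF, y & 0xFF, c_in)

    # Both upper results are computed in parallel, before c8 is known.
    high0, cout0 = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, 0)
    high1, cout1 = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, 1)

    # The carry out of the lower block selects one of the two upper results.
    high_sum, c16 = (high1, cout1) if c8 else (high0, cout0)
    return (high_sum << 8) | low_sum, c16
```

Because the upper half is precomputed for both possible carries, only the multiplexer delay is added once the lower carry c8 is available.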

Figure 3.22: Circuit diagram of an 8-bit CLA adder (outputs c8 and p0 ... p7)

3.4 Design of 32 × 32-bit Array Multiplier Using Radix-256

From the discussion in the previous section it is clear that once a small multiplier, such as a 4 × 4 or 8 × 8 array multiplier, is designed with small area, low power consumption and an acceptable critical path delay, such multipliers can be used in parallel to build 16, 32 and 64-bit signed unsigned multipliers. These multipliers can serve as basic building blocks and can also be used to design parallel, pipelined and superscalar pipelined multipliers. Using the 8 × 8-bit array multiplier for signed unsigned numbers, the 32-bit multiplier for signed unsigned numbers can be designed and implemented as illustrated in this section.

To design a 32 × 32-bit multiplier from 8-bit multipliers, sixteen 8-bit multipliers must operate in parallel, as shown in Figure 3.23. Let A and B be the 32-bit operands. These operands can be decomposed as follows, where k = 8 for radix 256.

$A = A_0 + A_1 \cdot 2^{k} + A_2 \cdot 2^{2k} + A_3 \cdot 2^{3k}$
$B = B_0 + B_1 \cdot 2^{k} + B_2 \cdot 2^{2k} + B_3 \cdot 2^{3k}$
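A minimal sketch of the radix-256 decomposition, restricted to unsigned operands, is given below. The names `mul8` and `mul32_radix256` are hypothetical; `mul8` is only a stand-in for one 8 × 8-bit multiplier block, and the final accumulation uses integer addition rather than the adder rows of the actual design.

```python
def mul8(a_digit, b_digit):
    """Stand-in for one 8 x 8-bit multiplier block (16-bit result)."""
    return (a_digit & 0xFF) * (b_digit & 0xFF)


def mul32_radix256(a, b, k=8):
    """Unsigned 32 x 32-bit product assembled from sixteen 8 x 8-bit products."""
    mask = (1 << k) - 1
    a_digits = [(a >> (k * i)) & mask for i in range(4)]   # A0 ... A3
    b_digits = [(b >> (k * j)) & mask for j in range(4)]   # B0 ... B3

    product = 0
    for i, a_i in enumerate(a_digits):
        for j, b_j in enumerate(b_digits):
            # Term A_i * B_j carries the weight 2**(k * (i + j))
            product += mul8(a_i, b_j) << (k * (i + j))
    return product
```

The sixteen `mul8` calls are independent of one another, which is what allows the sixteen hardware blocks to run in parallel.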

Then the product P = A × B is computed as follows.

$P = A \times B = (A_0 + A_1 \cdot 2^{k} + A_2 \cdot 2^{2k} + A_3 \cdot 2^{3k}) \times (B_0 + B_1 \cdot 2^{k} + B_2 \cdot 2^{2k} + B_3 \cdot 2^{3k})$

Expanding this expression gives 16 product terms. The sixteen 8-bit multipliers are arranged in 4 rows as shown in Figure 3.23. The CLCSA of Section 3.3.2 is used to add the four rows of 8-bit multipliers.

Figure 3.23: The 32 × 32-bit array multiplier using sixteen 8-bit multipliers (product outputs p63-p56, p55-p48, p47-p40, p39-p32, p31-p24, p23-p16, p15-p8 and p7-p0)

3.5 Simulation Results and Discussion

The proposed 8 × 8-bit signed and unsigned array multiplier of Figure 3.6 is compared with the 8 × 8-bit signed MBE multipliers of Figure 3.4. In the array multiplier, each partial product bit is generated by a 2-input AND gate, which is implemented in CMOS logic using only 6 transistors. The MBE multipliers [1-4], in contrast, use 68, 56, 62 and 46 transistors respectively to generate one partial product bit.

The VCA, which combines the PPG and the PPRT and is referred to as the PPGR, is used to reduce the partial product array, whereas the MBE multipliers use the Carry Save Adder (CSA) scheme for the PPRT. For the final addition, the proposed multiplier uses the CLCSA method, and the MBE multipliers use the CLA/Carry Select Adder scheme.

The Microwind tool with 45 nm CMOS technology is used to obtain the simulation results, from which the critical path delay, the area and the power consumption are measured. Circuits such as the PPGR and the CLCSA are implemented using the Digital Schematic Tool. The schematic is compiled into Verilog HDL code, and the Verilog HDL code is then translated into the layout. Finally, the layout is synthesized to obtain the delay, the area and the power consumption.

The simulation results are listed in Table 3.2 and Table 3.3. Table 3.2 compares the AND-gate PPG of the array multiplier with the PPGs of the MBE multipliers. Table 3.3 compares the 8 × 8-bit signed unsigned array multiplier with the MBE multipliers. From these results it is concluded that the array multiplier improves the delay by about 25%, reduces the area by about 75% and saves about 40% of the power consumption relative to the MBE multiplier. This improvement is due to the reduction in hardware and to the implementation logic of the PPG, the VCA and the CLCSA scheme.

Table 3.2: Comparison of Array PPG with MBE PPG

References      Number of transistors   Delay (ns)   Area (µm²)   Power (µW)
Reference [4]   68                      0.033        3.60         1.00
Reference [5]   56                      0.044        3.50         0.82
Reference [6]   62                      0.051        3.90         0.80
Reference [7]   46                      0.045        3.10         0.66
Proposed        6                       0.011        0.78         0.20

Table 3.3: Comparison of Array and MBE multiplier size

References      Number of transistors   Delay (ns)   Area (µm²)   Power (µW)
Reference [1]   5220                    0.27         1372         142
Reference [2]   4252                    0.29         1302         123
Reference [3]   4412                    0.32         1131         132
Reference [4]   4146                    0.30         1168         120
Proposed        1196                    0.20         333          85

The computations behind Table 3.3 are as follows. For the proposed multiplier, the total number of transistors = PPG + PPRT + CPA = 252 + 392 + 552 = 1196. The percentage improvements in delay, area and power consumption compared to [1] are computed as follows.

Delay % = {(0.27 - 0.20)/0.27} × 100 ≈ 25.9%, quoted as 25%
Area % = {(1372 - 333)/1372} × 100 ≈ 75.7%, quoted as 75%
Power % = {(142 - 85)/142} × 100 ≈ 40.1%, quoted as 40%

For the proposed multiplier of Figure 3.5, Verilog HDL/VHDL code is written. After successful compilation, the RTL view generated is shown in Figure 3.24. The RTL view indicates that the simulated multiplier is an 8 × 8-bit multiplier with operands a and b of 8 bits each. The 16-bit product p represents the result of the multiplication. The signed-unsigned bit (s_u) indicates the type of multiplication: s_u = 0 selects unsigned multiplication and s_u = 1 selects signed multiplication.
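The percentages derived from Table 3.3 can be reproduced with a few lines of Python. The helper name `percent_saving` is hypothetical; the input values are the ones listed in Table 3.3 for reference [1] and for the proposed multiplier.

```python
def percent_saving(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return (reference - proposed) / reference * 100.0


delay_saving = percent_saving(0.27, 0.20)     # delay in ns
area_saving = percent_saving(1372, 333)       # area in square micrometres
power_saving = percent_saving(142, 85)        # power in microwatts

# Transistor budget of the proposed multiplier: PPG + PPRT + CPA
total_transistors = 252 + 392 + 552
```

Rounded to one decimal place, the three savings are 25.9%, 75.7% and 40.1%.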

Figure 3.25 shows the simulation result of the 8-bit multiplier. The waveform is interpreted as follows.

Case 1: When s_u = 0, the product of the unsigned operands is 11111111 (255) × 11111111 (255) = 1111111000000001 (65025).

Case 2: When s_u = 1, the product of the signed operands is 11111111 (-1) × 11111111 (-1) = 0000000000000001 (+1).
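The two cases can be reproduced with a small behavioural reference model. This is a sketch of the expected input/output behaviour only, not of the proposed hardware; the names `to_signed8` and `multiply8` are hypothetical, and the port names a, b, s_u and p follow the RTL view described above.

```python
def to_signed8(pattern: int) -> int:
    """Interpret an 8-bit pattern as a 2's complement value."""
    pattern &= 0xFF
    return pattern - 0x100 if pattern & 0x80 else pattern


def multiply8(a: int, b: int, s_u: int) -> int:
    """Return the 16-bit product pattern p of two 8-bit operand patterns.

    s_u = 0 treats a and b as unsigned; s_u = 1 treats them as signed.
    """
    a &= 0xFF
    b &= 0xFF
    if s_u:
        product = to_signed8(a) * to_signed8(b)
    else:
        product = a * b
    # Keep the low 16 bits, which also encodes negative products
    # in 16-bit 2's complement form.
    return product & 0xFFFF


if __name__ == "__main__":
    for s_u in (0, 1):
        p = multiply8(0b11111111, 0b11111111, s_u)
        print(f"s_u={s_u}: p={p:016b}")
```

The same operand pattern 11111111 therefore yields two different products, depending only on s_u.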

Figure 3.26 shows the simulation result when the operands are displayed in decimal. The waveform is interpreted as follows.

Case 1: When s_u = 0, the product of the unsigned operands is (255) × (255) = 65025.

Case 2: When s_u = 1, the product of the signed operands is (-1) × (-1) = +1.

Figure 3.27 shows the simulation results in the decimal number system for s_u = 0 and s_u = 1.

Case 1: When s_u = 0, the operands a = 127 and b = 127 are treated as unsigned numbers and the product is 127 × 127 = 16129. This case illustrates operands that are unsigned and positive. For operands in the range 0 to +127, the 2's complement and unsigned number systems give the same magnitude.

Case 2: When s_u = 1, the operands a = +1 and b = -1 are treated as signed numbers and the product is (+1) × (-1) = -1. This case illustrates that, in the 8-bit 2's complement number system, positive operands range from 0 to +127 and negative operands from -1 to -128.
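The operand ranges mentioned in these cases can be made concrete with a short sketch that reads every 8-bit pattern in both number systems. The helper names `as_unsigned` and `as_signed` are illustrative assumptions, not names used by the source.

```python
def as_unsigned(pattern: int) -> int:
    """Value of an 8-bit pattern read as an unsigned number (0..255)."""
    return pattern & 0xFF


def as_signed(pattern: int) -> int:
    """Value of an 8-bit pattern read as 2's complement (-128..+127)."""
    pattern &= 0xFF
    return pattern - 256 if pattern >= 128 else pattern


# Patterns whose value is identical in both number systems.
same_value = [p for p in range(256) if as_unsigned(p) == as_signed(p)]

# Range covered by each interpretation.
signed_values = [as_signed(p) for p in range(256)]

if __name__ == "__main__":
    print("identical for patterns", same_value[0], "to", same_value[-1])
    print("signed range:", min(signed_values), "to", max(signed_values))
```

Only the patterns with the most significant bit set are read differently, which is why the s_u input matters for those operands alone.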

Figure 3.28 shows the simulation results in the binary number system for s_u = 0 and s_u = 1.

Case 1: When s_u = 0, the operands a = 01111111 and b = 01111111 are treated as unsigned numbers and the product is 01111111 × 01111111 = 0011111100000001.

Case 2: When s_u = 1, the operands a = 11111111 (-1) and b = 00000001 (+1) are treated as signed numbers and the product is 11111111 × 00000001 = 1111111111111111 (-1).

Figure 3.29 shows the simulation results in the binary number system for s_u = 0 and s_u = 1.

Case 1: When s_u = 0, an unsigned operation is performed. The 8-bit operands are 10000000 (128) and 10000000 (128), and the product is 10000000 (128) × 10000000 (128) = 0100000000000000 (16384).

Case 2: When s_u = 1, a signed operation is performed. The 8-bit operands are 10000000 (-128) and 01111111 (+127), and the product is 10000000 (-128) × 01111111 (+127) = 1100000010000000 (-16256).

Figure 3.30 shows the simulation results in the decimal number system for s_u = 0 and s_u = 1. This waveform is a special case of the waveform in Figure 3.29: the product is the same for s_u = 0 and for s_u = 1.
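A short exhaustive check suggests how unusual this coincidence is among operands whose most significant bit is set. The sketch below is a behavioural model only, with the hypothetical helper `product_pattern`; it compares the 16-bit product pattern under both settings of s_u for every such operand pair.

```python
def product_pattern(a: int, b: int, s_u: int) -> int:
    """16-bit product pattern of two 8-bit patterns (s_u = 1 means signed)."""
    a &= 0xFF
    b &= 0xFF
    if s_u:
        # Reinterpret patterns with the MSB set as negative values.
        a = a - 256 if a & 0x80 else a
        b = b - 256 if b & 0x80 else b
    return (a * b) & 0xFFFF


# Operand pairs with both MSBs set whose product pattern does not
# depend on s_u.
matches = [
    (a, b)
    for a in range(128, 256)
    for b in range(128, 256)
    if product_pattern(a, b, 0) == product_pattern(a, b, 1)
]

if __name__ == "__main__":
    print(matches)
```

Under this model a single pair survives, the one in which both operands are 10000000.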

Case 1: When s_u = 0, an unsigned operation is performed. The 8-bit operands are 10000000 (128) and 10000000 (128), and the product is 10000000 (128) × 10000000 (128) = 0100000000000000 (16384), as shown in Figure 3.29.

Case 2: When s_u = 1, a signed operation is performed. The 8-bit operands are 10000000 (-128) and 10000000 (-128), and the product is 10000000 (-128) × 10000000 (-128) = 16384.

3.6 Summary

In this chapter, we have discussed various types of multipliers and proposed an 8×8 array multiplier. The proposed multiplier operates on both signed and unsigned numbers; the sign logic required for this is implemented. In the proposed array multiplier, a 2-input AND gate PPG is used along with the PPRT, and the combination is referred to as the PPGR. It performs two functions: generating the partial products and reducing the partial product rows.
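The idea of a 2-input AND gate PPG can be illustrated with a bit-level sketch for the unsigned case. It is a simplified model under stated assumptions: the reduction step is plain weighted addition rather than the PPRT structure of the proposed design, the sign logic is omitted because this section does not describe it, and the function names are hypothetical.

```python
def partial_products(a: int, b: int, width: int = 8) -> list[list[int]]:
    """Return the partial product bits pp[i][j] = a_j AND b_i."""
    return [
        [((a >> j) & 1) & ((b >> i) & 1) for j in range(width)]
        for i in range(width)
    ]


def sum_partial_products(pp: list[list[int]]) -> int:
    """Add every partial product bit with its weight 2**(i + j)."""
    return sum(
        bit << (i + j)
        for i, row in enumerate(pp)
        for j, bit in enumerate(row)
    )


if __name__ == "__main__":
    a, b = 0b01111111, 0b01111111
    pp = partial_products(a, b)
    print(f"{sum_partial_products(pp):016b}")
```

For 8-bit operands the array holds 64 partial product bits, one per AND gate in this simplified view.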

Once the high-performance 8-bit multiplier is designed, it can be used to build wider multipliers. For the proposed 32-bit array multiplier only four rows are required, whereas with MBE 16 partial products are generated and there is a long delay path in the implementation of the PPRT. With the PPGR, sign logic and CLCSA approaches, the simulation results show a reduction in the delay of 25%, a reduction in the area of 75% and a reduction in the power consumption of 40% over the MBE multiplier.
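One common way to reuse an 8-bit block in a wider multiplier is to split the operands into 8-bit slices and add the shifted block products. The sketch below shows that textbook decomposition for unsigned operands only. It is an assumption made for illustration: the text does not state how the 8-bit multiplier is reused or how the four rows arise, and `mul8` and `wide_multiply` are hypothetical names.

```python
def mul8(a: int, b: int) -> int:
    """Stand-in for an 8x8 unsigned multiplier block with a 16-bit result."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def wide_multiply(a: int, b: int, width: int = 32) -> int:
    """Unsigned width-bit multiply built only from 8x8 block products."""
    assert width % 8 == 0
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    slices = width // 8
    result = 0
    for i in range(slices):
        a_i = (a >> (8 * i)) & 0xFF          # i-th 8-bit slice of a
        for j in range(slices):
            b_j = (b >> (8 * j)) & 0xFF      # j-th 8-bit slice of b
            # Each block product is weighted by 2**(8 * (i + j)).
            result += mul8(a_i, b_j) << (8 * (i + j))
    return result


if __name__ == "__main__":
    x, y = 0xDEADBEEF, 0x12345678
    print(hex(wide_multiply(x, y)), wide_multiply(x, y) == x * y)
```

In hardware the additions of the shifted block products would themselves need a reduction stage, so this model says nothing about the delay or area of a wide design.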