An Efficient Design of Modified Booth Recoder for Fused Add-Multiply Operator

Dhanalakshmi. G, Applied Electronics, PSN College of Engineering and Technology, Tirunelveli, dhanamgovind20@gmail.com
Prof. V. Gopi, Dean of ECE Department, PSN College of Engineering and Technology, Tirunelveli, kanyavicky@gmail.com

Abstract: For computing complex arithmetic calculations in DSP, the fused Add-Multiply (FAM) operator is optimized to increase performance. The Sum to Modified Booth (S-MB) algorithm is introduced for the Add-Multiply operation. We investigate a technique that directly recodes the sum of two numbers in its Modified Booth (MB) form. The technique is structured, simple and can be easily modified in order to be applied either to signed or unsigned numbers, which comprise an odd or even number of bits. It decreases the critical path delay and reduces area and power consumption.

Index Terms: Add-Multiply operation, arithmetic circuits, Modified Booth recoding, VLSI design.

I. INTRODUCTION

Modern electronics make extensive use of Digital Signal Processing (DSP). DSP applications carry out a large number of arithmetic operations in kernels such as the Discrete Cosine Transform (DCT) and the Fast Fourier Transform (FFT). The design of arithmetic components that combine operations which share data can lead to significant performance improvements. Based on the observation that an addition can often be subsequent to a multiplication, the Multiply-Accumulate (MAC) unit is used for this operation. The MAC operation is very prevalent in many scientific and engineering applications, and it is easy to find in signal processing algorithms and matrix arithmetic.

The fast multiplication process consists of three steps: partial product generation, partial product reduction and final carry-propagating addition. To reduce the number of partial products, recoding techniques have been widely used. In this paper, we use the S-MB recoding algorithm to reduce the number of partial products.
The S-MB technique directly recodes the sum of two numbers in its MB form, which leads to a more efficient implementation of the fused Add-Multiply (FAM) unit compared to the existing ones. Conventional recoding schemes are based on complex bit-level manipulations, which are implemented by gate-level circuits. This work focuses on the efficient design of the FAM operator, targeting the optimization of the recoding scheme for the direct shaping of the MB form of the sum of two numbers (Sum to Modified Booth, S-MB). We introduce a technique which decreases the critical path delay and reduces area and power consumption. The proposed S-MB algorithm is structured, simple and can be easily modified in order to be applied either to signed or unsigned numbers, which comprise an odd or even number of bits. We explore three alternative schemes of the proposed S-MB approach using conventional and signed-bit Full Adders (FAs) and Half Adders (HAs) as building blocks. We evaluate the performance of the proposed S-MB technique by comparing its three different schemes with the state-of-the-art recoding techniques. The proposed designs deliver improvements in both area occupation and power consumption, thus outperforming the existing S-MB recoding solutions.

The rest of the paper is organized as follows. In Section II, we discuss the motivation and present the technical background for the implementation of FAM units. In Section III, the proposed S-MB recoding schemes are presented. In Section IV, both theoretical analysis and experimental evaluation are given, clearly identifying the advantages of the proposed schemes with respect to area complexity, critical delay and power dissipation. Section V concludes our work.

II. MOTIVATION AND IMPLEMENTATION

In this paper, we focus on AM units which implement the operation Z = X·(A+B). The existing design of the AM operator requires that its inputs A and B are first driven to an adder, and then the input X and the sum Y = A+B are driven to a multiplier in order to get Z.
Recent research activities in the field of arithmetic optimization have shown that the design of arithmetic components combining operations which share data can lead to significant performance improvements. Based on the observation that an addition can often be subsequent to a multiplication (e.g., in symmetric FIR filters), the Multiply-Accumulator (MAC) and Multiply-Add (MAD) units were introduced, leading to more efficient implementations of DSP algorithms compared to the conventional ones, which use only primitive resources. Several architectures have been proposed to optimize the performance of the MAC operation in terms of area occupation, critical path delay or power consumption. MAC components increase the flexibility of DSP data path synthesis, as a large set of arithmetic operations can be efficiently mapped onto them. Apart from the MAC/MAD operations, many DSP applications are based on Add-Multiply (AM) operations. The straightforward design of the AM unit, by first allocating an adder and then driving its output to the input of a multiplier, significantly increases both the area and the critical path delay of the circuit.
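As one illustration of where AM operations arise in DSP, the sketch below evaluates a symmetric FIR filter in two ways: directly, and by first adding the two input samples that share a coefficient and then multiplying the sum once. The example is ours, not taken from the evaluated designs; the function names `fir_direct` and `fir_symmetric_am` are hypothetical, and integer samples and coefficients are assumed.

```python
def fir_direct(h, x):
    """Direct-form FIR: one multiplication per tap and output sample."""
    out = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        out.append(acc)
    return out


def fir_symmetric_am(h, x):
    """Symmetric FIR (h[k] == h[N-1-k]) written with add-multiply steps."""
    n_taps = len(h)
    if any(h[k] != h[n_taps - 1 - k] for k in range(n_taps)):
        raise ValueError("coefficients must be symmetric")

    def sample(i):
        # Samples before the start of the input are taken as zero.
        return x[i] if i >= 0 else 0

    out = []
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps // 2):
            # Add first (A + B), then multiply once by the shared coefficient.
            pre_sum = sample(n - k) + sample(n - (n_taps - 1 - k))
            acc += h[k] * pre_sum
        if n_taps % 2 == 1:
            # An odd-length filter has one unpaired middle tap.
            mid = n_taps // 2
            acc += h[mid] * sample(n - mid)
        out.append(acc)
    return out
```

Each inner-loop step of `fir_symmetric_am` has the form X·(A+B), so roughly half of the multiplications of the direct form are replaced by additions that feed a multiplier.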

The proposed S-MB algorithm is structured, simple and can be easily modified in order to be applied either to signed (in 2's complement representation) or unsigned numbers, which comprise an odd or even number of bits. We explore three alternative schemes of the proposed S-MB approach using conventional and signed-bit Full Adders (FAs) and Half Adders (HAs) as building blocks. We evaluate the performance of the proposed S-MB technique by comparing its three different schemes with the state-of-the-art recoding techniques. Industrial tools for RTL synthesis and power estimation have been used to provide accurate measurements of area utilization, critical path delay and power dissipation for various bit-widths of the input numbers. We show that the adoption of the proposed recoding technique delivers optimized solutions for the FAM design, enabling the targeted operator to be timing functional (no timing violations) for a larger range of frequencies. Also, under the same timing constraints, the proposed designs deliver improvements in both area occupation and power consumption, thus outperforming the existing S-MB recoding solutions.

A. REVIEW OF THE MODIFIED BOOTH FORM

Modified Booth (MB) is a prevalent form used in multiplication. It is a redundant signed-digit radix-4 encoding technique. Its main advantage is that it halves the number of partial products in multiplication compared to any radix-2 representation. Let us consider the multiplication of the 2's complement numbers X and Y, with each number consisting of n = 2k bits. The multiplicand Y can be represented in MB form as:

$$Y = \langle y_{n-1} y_{n-2} \ldots y_1 y_0 \rangle_{2's} = -y_{2k-1} \cdot 2^{2k-1} + \sum_{i=0}^{2k-2} y_i \cdot 2^i = \sum_{j=0}^{k-1} y_j^{MB} \cdot 2^{2j}$$

$$y_j^{MB} = -2 y_{2j+1} + y_{2j} + y_{2j-1}$$

The digits $y_j^{MB} \in \{-2, -1, 0, +1, +2\}$, $0 \le j \le k-1$, correspond to the three consecutive bits $y_{2j+1}$, $y_{2j}$ and $y_{2j-1}$, with one bit overlapped and considering that $y_{-1} = 0$. Table I shows how they are formed by summarizing the MB encoding technique. Each digit is represented by three bits named $s_j$ (sign), $one_j$ and $two_j$. Using these three bits we calculate the MB digits by the following relation:

$$y_j^{MB} = (-1)^{s_j} \cdot [one_j + 2 \cdot two_j]$$

Table I. Modified Booth Encoding Table

  Binary                        MB digit    MB encoding                    Input
  y_{2j+1}  y_{2j}  y_{2j-1}    y_j^{MB}    sign s_j   x1 one_j  x2 two_j  carry
  0         0       0            0          0          0         0         0
  0         0       1           +1          0          1         0         0
  0         1       0           +1          0          1         0         0
  0         1       1           +2          0          0         1         0
  1         0       0           -2          1          0         1         1
  1         0       1           -1          1          1         0         1
  1         1       0           -1          1          1         0         1
  1         1       1            0          1          0         0         0

B. FAM IMPLEMENTATION

The FAM design is presented in Figure I. The multiplier is a parallel one based on the MB algorithm. Let us consider the product X·Y. The term Y is encoded based on the MB algorithm and multiplied with X. Both X and Y consist of n = 2k bits and are in 2's complement form.

Fig. I: Fused Add-Multiply

III. SUM TO MODIFIED BOOTH RECODING TECHNIQUE (S-MB)

A. Defining Signed-Bit Full Adders and Half Adders for Structured Signed Arithmetic

The S-MB recoding technique recodes the sum of two consecutive bits of the input A ($a_{2j}$, $a_{2j+1}$) with two consecutive bits of the input B ($b_{2j}$, $b_{2j+1}$) into one MB digit $y_j^{MB}$. Of the three bits that form an MB digit, the most significant is negatively weighted while the two least significant have positive weight. Consequently, in order to transform the two aforementioned pairs of bits into MB form, we need to use signed-bit arithmetic. For this purpose, we develop a set of bit-level signed Half Adders (HAs) and Full Adders (FAs), considering their inputs and outputs to be signed.

1) S-MB1 Recoding Scheme: The first scheme of the proposed recoding technique is referred to as S-MB1, and it is described for both even and odd bit-widths of the input numbers.
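The MB digit relation, the sign/one/two encoding of Table I and the function Z = X·(A+B) of the FAM unit can be checked with a small behavioural model. This is a minimal sketch under our own assumptions: the names `to_bits`, `mb_recode`, `mb_signals` and `fam_reference` are hypothetical, and the model forms the sum A+B with an ordinary addition before recoding it, so it reproduces only the arithmetic result, not the carry-propagation-free structure of the S-MB recoders.

```python
def to_bits(value, n):
    """n-bit 2's complement bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def mb_recode(y, n):
    """Radix-4 MB digits of an n-bit 2's complement number (n even)."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even bit-width")
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit in n bits")
    bits = to_bits(y, n)
    digits = []
    prev = 0  # y_{-1} = 0
    for j in range(n // 2):
        # y_j^MB = -2*y_{2j+1} + y_{2j} + y_{2j-1}
        digits.append(-2 * bits[2 * j + 1] + bits[2 * j] + prev)
        prev = bits[2 * j + 1]
    return digits


def mb_signals(b_hi, b_mid, b_lo):
    """(sign, one, two) for the bit triple (y_{2j+1}, y_{2j}, y_{2j-1})."""
    sign = b_hi
    one = b_mid ^ b_lo
    two = (b_hi ^ b_mid) & (1 - one)
    return sign, one, two


def fam_reference(x, a, b, n):
    """Z = X*(A+B) via MB digits of the sum; a and b are n-bit signed."""
    # The sum needs n+1 bits; round up to the next even width.
    width = n + 2 if n % 2 == 0 else n + 1
    digits = mb_recode(a + b, width)
    # Each digit selects 0, +-X or +-2X, shifted by two positions per digit.
    return sum((d * x) << (2 * j) for j, d in enumerate(digits))
```

For example, `mb_recode(6, 4)` gives the digits `[-2, 2]`, since -2 + 2·4 = 6, so a 4-bit operand produces two partial products instead of four.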

Fig. S-MB1 recoding scheme for a) even and b) odd number of bits

Both bits are extracted from the j recoding cell. A conventional FA with inputs from A and B produces the carry.

2) S-MB2 Recoding Scheme: The second approach of the proposed recoding technique, S-MB2, is described for even and odd bit-widths of the input numbers.

Fig. S-MB2 recoding scheme for a) even and b) odd number of bits

3) S-MB3 Recoding Scheme: The third scheme implementing the proposed recoding technique is S-MB3. We consider that $c_{0,1} = 0$ and $c_{0,2} = 0$. We build the digits $y_j^{MB}$ based on $s_{2j+1}$, $s_{2j}$ and $c_{2j,2}$, according to (3.8). Once more, we use a conventional FA to produce the carry $c_{2j+1}$ and the sum $s_{2j}$. The bit $c_{2j,2}$ is now the output carry of a HA* (basic operation in Table 3.2), which belongs to the (j-1) recoding cell and has the bits $a_{2j-1}$, $b_{2j-1}$ as inputs. The negatively signed bit $s_{2j+1}$ is produced by a HA** (Table 3.4), into which we drive $c_{2j+1}$ and the output sum (negatively signed) of the HA* of the j recoding cell, which has the bits $a_{2j+1}$, $b_{2j+1}$ as inputs. The carry and sum outputs of the HA** are given by the Boolean equations (3.15).

In case both A and B comprise an even number of bits, their most significant bits $a_{2k-1}$ and $b_{2k-1}$ are negatively weighted, and we use the dual implementation of the HA* (Table 3.3) in the (k-1) recoding cell. Consequently, the output sum of the HA* becomes positively weighted and the HA** that follows has to be replaced with a HA*. The most significant digits for both cases of even and odd bit-width of A and B are formed as in the S-MB2 recoding scheme. The critical path delay of S-MB3 recoding scheme is calculated as follows:

Fig. S-MB3 recoding scheme for a) even and b) odd number of bits

IV. PERFORMANCE EVALUATION

A. Theoretical Analysis

In this section, we present a theoretical analysis and comparative study in terms of area complexity and critical delay among the three recoding schemes that we described in Section III and the three existing recoding techniques.

TABLE II. Area and Delay of Various Components in the Unit Gate Model.
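The signed-bit half adders HA* and HA** used by the recoding schemes of Section III can be illustrated by deriving their truth tables from the arithmetic identity they must satisfy. The sketch below is a hedged illustration, not the cell definitions of the evaluated designs: the helper `signed_half_adder` is hypothetical, and the sign assignments passed to it (positive inputs with a negatively signed sum for HA*, one negative input for HA**) are our assumptions, chosen to agree with the description of the negatively signed output sum.

```python
from itertools import product


def signed_half_adder(wp, wq, wc, ws):
    """Truth table {(p, q): (c, s)} of a half adder with signed bits.

    wp, wq are the signs (+1 or -1) of inputs p and q; wc, ws are the
    signs of the carry (weight 2) and sum (weight 1) outputs. The table
    satisfies wp*p + wq*q == 2*wc*c + ws*s for every input pair.
    """
    table = {}
    for p, q in product((0, 1), repeat=2):
        target = wp * p + wq * q
        matches = [
            (c, s)
            for c, s in product((0, 1), repeat=2)
            if 2 * wc * c + ws * s == target
        ]
        if len(matches) != 1:
            raise ValueError("sign assignment cannot represent all inputs")
        table[(p, q)] = matches[0]
    return table


# Assumed HA*: both inputs positive, carry positive, sum negatively signed.
ha_star = signed_half_adder(+1, +1, +1, -1)
# Assumed dual HA*: all signs inverted, so the sum becomes positive.
ha_star_dual = signed_half_adder(-1, -1, -1, +1)
# Assumed HA**: p negative, q positive, carry positive, sum negative.
ha_double_star = signed_half_adder(-1, +1, +1, -1)
```

Under these assumptions the HA* reduces to c = p OR q and s = p XOR q, and the dual cell has the same logic with inverted interpretation of the signs, which is consistent with the output sum changing from negatively to positively weighted.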

B. EXPERIMENTAL EVALUATION

In this section, we compare the performance of the three proposed recoding schemes explored in Section III with the three schemes described in [12], [13], [23]. We included each of the recoding schemes in a fused Add-Multiply (FAM) operator (Fig. 1(b)) and implemented them using structural Verilog HDL for both cases of even and odd bit-width of the recoder's input numbers. The Wallace CSA tree and the fast CLA adder have been imported from the Synopsys DesignWare Library. We used Synopsys Design Compiler [18] and the Faraday 90 nm standard cell library [22] to synthesize the evaluated designs. In order to have a fair comparison among all FAM designs, they are implemented identically except for the recoding module.

Fig. 4. Area comparison for even bit-width with all values normalized to the corresponding ones of [12].

V. CONCLUSION

This paper focuses on optimizing the design of the Fused Add-Multiply (FAM) operator. We propose a structured technique for the direct recoding of the sum of two numbers to its MB form. We explore three alternative designs of the proposed S-MB recoder and compare them to the existing ones. The proposed recoding schemes, when they are incorporated in FAM designs, yield considerable performance improvements in comparison with the most efficient recoding schemes found in the literature.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their constructive comments.

REFERENCES

1. A. Amaricai, M. Vladutiu, and O. Boncalo, "Design issues and implementations for floating-point divide-add fused," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 4, pp. 295-299, Apr. 2010.
2. E. E. Swartzlander and H. H. M. Saleh, "FFT implementation with fused floating-point operations," IEEE Trans. Comput., vol. 61, no. 2, pp. 284-288, Feb. 2012.
3. Y.-H. Seo and D.-W. Kim, "A new VLSI architecture of parallel multiplier accumulator based on Radix-2 modified Booth algorithm," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 2, pp. 201-208, Feb. 2010.
4. O. Kwon, K. Nowka, and E. E. Swartzlander, "A 16-bit by 16-bit MAC design using fast 5:3 compressor cells," J. VLSI Signal Process. Syst., vol. 31, no. 2, pp. 77-89, Jun. 2002.
5. M. Daumas and D. W. Matula, "A Booth multiplier accepting both a redundant or a non redundant input with no additional delay," in Proc. IEEE Int. Conf. on Application-Specific Syst., Architectures, and Processors, 2000, pp. 205-214.
6. Z. Huang and M. D. Ercegovac, "High-performance low-power left-to-right array multiplier design," IEEE Trans. Comput., vol. 54, no. 3, pp. 272-283, Mar. 2005.
7. VLSI Design: A Circuits and Systems Perspective, 4th ed. Readington: Addison-Wesley, 2010, ch. 11.
8. S. Xydis, I. Triantafyllou, G. Economakos, and K. Pekmestzi, "Flexible datapath synthesis through arithmetically optimized operation chaining," in Proc. NASA/ESA Conf. Adaptive Hardware Syst., 2009,