(Online First) A Survey of Algorithmic and Architectural Modifications for Enhancing the Performance of Booth Multipliers


Journal of Computer Hardware Engineering (2018) Volume 1 doi: /jche.v1i2.797

R Nirmaladevi 1*, R Seshasayanan 2

1 Assistant Professor, Department of Electronics and Communication Engineering, Meenakshi Sundararajan Engineering College, Chennai, India, nirmala.sashi@gmail.com
2 Associate Professor, Department of Electronics and Communication Engineering, College of Engineering, Guindy, Anna University, Chennai

ABSTRACT
Multiplication is a crucial operation in any DSP processor, since it requires considerable hardware resources and processing time. The performance of a multiplier is quantified by its area, power and speed. The ever-increasing demand for portable devices necessitates the design of high-performance, low-power multipliers, and various high-performance algorithms and architectures have been proposed for this purpose. The number of partial products (PPs) has to be reduced to achieve low power dissipation. The Booth algorithm is one of the most popular and fastest algorithms; it reduces the number of partial product rows and thereby the power dissipation. This paper presents a survey of Booth multipliers, giving researchers a brief overview of the algorithmic and architectural modifications to multipliers that have been implemented with the Booth algorithm so far.

Keywords: Booth algorithm; Architecture; Multiplier; Partial product summation

1. Introduction
DSP applications are intensive in the fields of multimedia and communications. Since a large number of arithmetic operations, such as multiplication, addition and shifting, are involved in applications such as the DCT, DFT, FFT and FIR filtering, designing high-performance integrated circuits for portable devices is an important design challenge. Power reduction, specifically of dynamic power, is achieved either by an algorithmic approach or by architectural modification.
Speed improvements are achieved by pipelining and parallel processing, but these increase the area and computational power. There is therefore a trade-off between three constraints: power, delay and area. The ultimate aim of the design engineer is to design an efficient core, such as an adder, a multiplier (a specialized arithmetic core), a coprocessor, arithmetic software or a core library. This is achieved by meeting the following challenges: number system, specifications, constraints, algorithm, operation and implementation, each of which has its own merits and demerits. This survey is an attempt to explore the techniques that have been implemented so far to enhance the performance of multipliers and meet these requirements and challenges.

This paper is organized as follows. Section 2 discusses various multiplier architectures. Sections 3 and 4 describe the Booth algorithm and the Modified Booth algorithm. Sections 5 and 6 present algorithmic and architectural modifications, respectively. Section 7 presents an evaluation of the Booth algorithm with the Wallace tree architecture, and Section 8 presents the summary.

Copyright 2018 R Nirmaladevi et al. doi: /jche.v2i1.797 EnPress Publisher LLC. This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

2. Various Multiplier Architectures
Many researchers have tried, and are still trying, to design multipliers that offer one of the following design targets, or a combination of them in one multiplier: high speed, low power consumption, and a regular layout and hence less area. This makes them suitable for various high-speed, low-power and compact VLSI implementations. Some of these architectures are discussed in this section.

In a conventional multiplier, partial products (PPs) are first formed by multiplying the multiplicand by each bit of the multiplier. These PPs are then added together to generate the product. Multiplication thus consists of two main parts: PP generation and PP accumulation.

2.1 Shift and Add Multiplier
The first and simplest method of multiplication is the shift-and-add algorithm. In this algorithm, shifted copies of the multiplicand are added together to produce the final product, as shown in Figure 1. A row of AND gates generates a single row of partial products, and the number of partial product rows equals the bit width of the multiplier. Hence the latency of this algorithm is related to the height of the PP array. Accelerating multiplication therefore targets the following goals: speeding up PP generation, reducing the number of PPs, speeding up PP accumulation, or a combination of these. However, this method is suitable only for unsigned numbers.

Figure 1. Shift and Add Multiplication Algorithm for 8x8 multiplication

In the serial architecture, one partial product is generated per clock pulse and stored in an accumulator or intermediate register (the partial product register). When the next partial product is generated, it is shifted by one bit and added to the accumulated value. This process is repeated until all multiplier bits have been processed (Figure 2). This class of architecture occupies minimum area and hence dissipates less power. However, the delay is proportional to the number of multiplier and multiplicand bits. When area and power are of utmost importance and delay can be tolerated, the serial multiplier is suitable for low-power applications.

Figure 2. Serial multiplier architecture

In parallel multipliers, the addition of partial products is the main factor that determines the performance of the multiplier. The array multiplier is well known for its regular structure. Each partial product is generated by multiplying the multiplicand by one multiplier bit. The partial products are shifted according to their bit orders and then added. The addition can be performed with ordinary carry propagate adders; N-1 adders are required, where N is the multiplier length.

The choice of number system also affects multiplier performance. The two's complement number system is typically chosen to represent numbers in signal processing applications, since arithmetic operations are easy to perform. One of the problems with two's complement representation is sign extension, which causes the most significant bit (MSB, the sign bit) to be extended up to the product bit width. In signed-magnitude representation, by contrast, only one bit is allocated to the sign and the remaining bits to the magnitude. In this case only one bit toggles when the signal changes sign, whereas in two's complement representation several bits have to switch because of sign extension. This property minimizes the switching activity in the MSBs of the signed-magnitude representation [1]. The Booth and Modified Booth algorithms are described in the following sections.

3. Booth Algorithm
Booth [2] described a technique by which binary numbers of either sign can be multiplied by a uniform process that is independent of any prior knowledge of their signs. Negative numbers are represented in two's complement form. The multiplier is partitioned into overlapping groups of 2 bits (the current bit and the bit to its right), and each group is decoded to select a single partial product as per Table 1. The hardware architecture for the conventional Booth algorithm is given in Figure 3.

Figure 3. Hardware architecture for conventional Booth Algorithm

Multiplier Bits | Multiplier Value | Explanation | Operation
00 | 0 | String of 0s | Add 0 to the partial product and shift left by one bit
01 | +1 | End of string of 1s | Add the multiplicand to the partial product and shift left by one bit
10 | -1 | Beginning of string of 1s | Subtract the multiplicand from the partial product and shift left by one bit
11 | 0 | String of 1s | Add 0 to the partial product and shift left by one bit
Table 1. Radix-2 Booth Recoding
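To make Table 1 concrete, the following Python sketch (a behavioral illustration, not the hardware of Figure 3) applies radix-2 Booth recoding to an n-bit two's complement multiplier and, for comparison, implements the unsigned shift-and-add method of Section 2.1. The function names and the returned count of add/subtract operations are assumptions introduced here; the count shows how the number of non-zero partial products depends on the multiplier's bit pattern.

```python
def shift_and_add(a: int, b: int, n: int) -> int:
    """Unsigned shift-and-add multiplication (Section 2.1): one row per multiplier bit."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:          # AND-gate row: multiplicand or 0
            product += a << i     # shifted copy of the multiplicand
    return product


def booth_radix2(a: int, b: int, n: int) -> tuple[int, int]:
    """Radix-2 Booth multiplication of n-bit two's complement numbers (Table 1).

    Returns the product and the number of non-zero partial products
    (add or subtract operations), which depends on the bit pattern of b.
    """
    q = b & ((1 << n) - 1)        # n-bit two's complement pattern of the multiplier
    prev = 0                      # implicit 0 to the right of the LSB
    product, ops = 0, 0
    for i in range(n):
        bit = (q >> i) & 1
        if (bit, prev) == (0, 1):     # "01": end of a string of 1s -> add multiplicand
            product += a << i
            ops += 1
        elif (bit, prev) == (1, 0):   # "10": beginning of a string of 1s -> subtract
            product -= a << i
            ops += 1
        # "00" and "11": add 0, only the shift takes place
        prev = bit
    return product, ops


if __name__ == "__main__":
    print(shift_and_add(13, 11, 4))        # 143
    print(booth_radix2(-7, 5, 4))          # (-35, 4)
    print(booth_radix2(3, 0b01111110, 8))  # (378, 2): long string of 1s
    print(booth_radix2(3, 0b01010101, 8))  # (255, 8): alternating bits
```

The alternating pattern needs eight operations where shift-and-add would need four, while a long string of 1s needs only two; this is one way to see why the saving obtained by radix-2 recoding is data dependent.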

4. Modified Booth Algorithm (MBA)
A modified version of Booth's algorithm was proposed by MacSorley [3]. Here the multiplier is partitioned into overlapping groups of 3 bits, and each group is decoded to select a single partial product, as given in Table 2. In the radix-4 algorithm, the PPs are shifted by two bits in each cycle (two bits are retired per cycle); in general the number of bits retired per cycle is $e = \log_2 r$, where r is the radix, and it represents the standard shift length between successive cycles of the multiplier. Srimani [4] and Rubinfield [5] presented proofs of this algorithm. Theoretically, the MBA [6] has the advantage of reducing the number of partial products to N/2 rows regardless of the input, where N is the bit width; for signed numbers, however, the PP array extends to N/2 + 1 rows. In contrast with the conventional (radix-2) Booth algorithm, this algorithm can be generalized to any radix, at the cost of generating hard multiples (+3X, +5X, ...). These hard multiples are produced by a special kind of carry propagate adder [7]. Despite the speed improvement in the addition, the delay caused by carry propagation makes this scheme slower than a conventional one. In addition, the PP selector logic (Booth encoder, Figure 4) is an overhead in terms of area, power and delay that grows as the radix is increased.

Villeger and Oklobdzija [6] evaluated the Booth encoding method. They compared the number of gate levels with and without Booth encoding when 4:2 compressors are used. According to their evaluation, the Booth algorithm can be used in hardware or software multiplier implementations for both sign-magnitude and two's complement numbers, with no need for a correction term or a correction step. The reduction in the number of PP rows achieved by applying Booth recoding is data dependent and not predictable, which makes Booth encoding incompatible with parallel implementations of the multiplier.
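The grouping rule above generalizes to any radix $2^k$. The Python sketch below is an illustrative model with hypothetical function names (it is not taken from [3] or [7]): it recodes a two's complement multiplier into radix-$2^k$ Booth digits, so that k = 1 reproduces Table 1 and k = 2 reproduces Table 2, and it lists the odd hard multiples that a given radix requires.

```python
def booth_digits(b: int, n: int, k: int) -> list[int]:
    """Recode an n-bit two's complement multiplier b into radix-2**k Booth digits.

    Digit j examines the overlapping group of k + 1 bits b[kj+k-1 .. kj-1].
    Python's >> on negative ints is an arithmetic shift, so bits above n-1
    repeat the sign bit (sign extension to a whole number of groups).
    """
    def bit(i: int) -> int:
        return 0 if i < 0 else (b >> i) & 1   # bit -1 is the implicit 0

    digits = []
    for j in range(-(-n // k)):                # ceil(n / k) groups
        lo = k * j
        d = -(bit(lo + k - 1) << (k - 1))      # top bit of the group has negative weight
        for t in range(k - 1):
            d += bit(lo + t) << t
        d += bit(lo - 1)                       # overlap bit from the previous group
        digits.append(d)
    return digits


def booth_multiply(a: int, b: int, n: int, k: int) -> int:
    """Sum of the selected multiples d*a, each shifted by k bits per digit."""
    return sum((d * a) << (k * j) for j, d in enumerate(booth_digits(b, n, k)))


def hard_multiples(k: int) -> list[int]:
    """Odd multiples above 1 (3X, 5X, ...) that shifting alone cannot produce."""
    return list(range(3, (1 << (k - 1)) + 1, 2))


if __name__ == "__main__":
    print(booth_digits(0b01101110, 8, 2))   # [-2, 0, -1, 2]
    print(hard_multiples(2), hard_multiples(3), hard_multiples(4))  # [] [3] [3, 5, 7]
```

For an 8-bit multiplier, radix 4 yields N/2 = 4 digits in {-2, ..., +2}, so every partial product is 0, ±X or ±2X; radix 8 yields three digits in {-4, ..., +4} and already needs the hard multiple 3X.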
Multiplier Bits | Multiplier Value | Explanation | Operation
000 | 0 | String of 0s | Add 0 to the partial product and shift left by two bits
001 | +1 | End of string of 1s | Add the multiplicand to the partial product and shift left by two bits
010 | +1 | Isolated 1 in the string | Add the multiplicand to the partial product and shift left by two bits
011 | +2 | End of string of 1s | Add two times the multiplicand to the partial product and shift left by two bits
100 | -2 | Beginning of a string of 1s | Subtract two times the multiplicand from the partial product and shift left by two bits
101 | -1 | End of one string, beginning of a new string | Subtract the multiplicand from the partial product and shift left by two bits
110 | -1 | Beginning of a string of 1s | Subtract the multiplicand from the partial product and shift left by two bits
111 | 0 | String of 1s | Add 0 to the partial product and shift left by two bits
Table 2. Radix-4 Booth Recoding

Sign extension: The Modified Booth algorithm not only introduces extra delay caused by the more complex Booth encoder but also increases circuit size, because the sign extension must be propagated through the carry save array (CSA). The hard multiples (3X) cannot be obtained by simple shifting and/or complementation, which increases the complexity and delay of the adder that precomputes the odd multiples. The number of multiples increases with the radix, so there is a bound on the radix beyond which Booth encoding is no longer beneficial. Radix-8 Booth architecture is used in many high-performance processors, and higher-radix architectures are not cost effective [8].

Figure 4. Partial product selector logic (Booth Encoder)

In general, the sources of power consumption in multipliers are the recoding of multiplier bits, partial product generation, PP reduction and final addition (final carry propagation). The major portion of the multiplication time is spent in the partial product accumulation stage, which also occupies a larger area. Partial product generation can be improved by using a Booth encoder [9,10], while partial product accumulation can be accelerated using tree adders of various architectures. The literature offers many high-performance algorithms and architectures to accelerate multiplication. The following sections briefly describe the modifications carried out at the algorithmic and architectural levels to enhance Booth multiplier performance.

5. Algorithmic Modifications
Many proposals for multi-operand addition of unsigned binary numbers are available in the literature. Agrawal and Rao [11,12] suggested an algorithm for the addition of two numbers in two's complement notation, which has been extended to n signed summands with a correction factor. Because sign-bit extension leaves a two's complement number invariant, the sum S can be obtained by adding the n extended numbers conventionally and discarding all carry overflows out of the (r-1)th stage, where r is the extended word length. In this algorithm, 1 is added to each number in position k-1 of its original k-bit word. For negative numbers, this generates a carry overflow at the $2^r$ position and a string of (r-k+1) leading zeros. For non-negative numbers, the extended positions remain zero without any carry overflow. In either case, the extended positions are zero and need not participate in the summation process, so the time delay remains the same as that of processing n unsigned numbers.
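The sign handling described above can be illustrated with a short Python sketch. It is a common formulation of the same idea, assumed here rather than taken from [11,12]: adding 1 at position k-1 of a k-bit two's complement word and discarding the carry out of that word is the same as inverting its sign bit, which turns every operand into a non-negative k-bit value. The operands are then summed without any sign extension, and a single correction term (n·2^(k-1), subtracted modulo 2^r) restores the signed result. The names `bias_sign_bit` and `multi_operand_sum` are hypothetical.

```python
def bias_sign_bit(x: int, k: int) -> int:
    """Return x + 2**(k-1) as an unsigned k-bit value by inverting the sign bit.

    Equivalent to adding 1 at position k-1 of the k-bit two's complement word
    and discarding the carry out of that word.
    """
    return (x & ((1 << k) - 1)) ^ (1 << (k - 1))


def multi_operand_sum(values: list[int], k: int, r: int) -> int:
    """Add n k-bit two's complement numbers in an r-bit accumulator.

    No operand is sign-extended: only non-negative k-bit words enter the adder,
    and one correction term removes the accumulated bias n * 2**(k-1).
    Assumes r is wide enough to hold the true sum.
    """
    mask = (1 << r) - 1
    total = 0
    for x in values:
        total = (total + bias_sign_bit(x, k)) & mask   # carries beyond bit r-1 are dropped
    correction = (-len(values) << (k - 1)) & mask      # two's complement of n * 2**(k-1)
    total = (total + correction) & mask
    return total - (1 << r) if total >> (r - 1) else total  # read result as signed r-bit


if __name__ == "__main__":
    print(multi_operand_sum([-3, 5, -8, 7], k=4, r=8))   # 1
```

Because the correction term depends only on n and k, it can be precomputed, so the adder itself processes the operands exactly as it would n unsigned numbers.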
In this algorithm, the correction factor is added to the final result to obtain the correct sum; sign extension is thereby avoided, which removes its additional overhead. Madrid et al. [13] presented an improved algorithm for fixed-point multiplication of two fractional numbers, which involves a correction cycle in the final result for high-radix multiplication. Gong Guo [14] proved that this correction cycle is required only when fixed-point fractional numbers are used, and not in the integer case. Dawoud [15] proposed a new algorithm that needs a correction cycle for the multiplication of two fixed-length signed numbers using a radix greater than 2 (r > 2). This correction cycle depends on the radix r and the multiplier word length M. Higher-radix recoding has found applications in high-speed arithmetic units and single-chip multipliers: increasing the radix reduces the number of PPs, which speeds up the multiplication process. MacSorley [3], Rubinfield [5] and Vassiliadis [16] had already suggested a correction cycle for higher radices, but the general case of r was not considered in [3], [5] or [16]. The correction cycle consists of shifting the result of the standard algorithm to the right by a number of bit positions that depends on r and M. The number of correction shifts is given by

$$C = (M - 1) - T_s \qquad (1)$$

where $T_s$ is the total number of shifts performed during the execution of the algorithm.

Bewick and Flynn [7] presented a Booth 3 algorithm that generates the hard multiples (+3X, +5X, ...) in partially and fully redundant form. In the fully redundant form, an n-bit number is represented by two (n-1)-bit numbers whose sum equals the number to be represented. Hence the hard multiple 3X is represented as 2X + X, since 2X and X are easy multiples. However, this algorithm is not very efficient because of the two's complementing needed to obtain negative PPs. During two's complementing, 1 is added at the LSB to obtain a negative PP; since a number is represented by two numbers in this algorithm, two 1s must be added at the LSB. Even though this method avoids carry propagation, representing each PP as the sum of two numbers increases the complexity of partial product generation.

In [17], the authors presented a statistical approach to Booth's algorithm, in which the number of shifts for the multiples is altered by means of a state diagram realization. They achieved a shorter propagation delay than the direct Booth implementation and saved a higher percentage of cycles. However, this approach has the following overheads: (i) the multiples do not always take the same number of cycles, (ii) the state machine implementation is more complicated than the direct Booth implementation, and (iii) wider multiplexers have to be used at the input and output of the adder/subtractor.

The delay from an input to an output of a full adder depends on the 0-to-1 and 1-to-0 transitions. The path from input A or B to the sum output is equal to two XOR delays, whereas the path from Cin to the sum output is equal to one XOR delay. The authors of [18] proposed a new delay model and interconnected the fast inputs and fast outputs appropriately to minimize the critical path of the 4:2 compressor. This approach gives 3 XOR gate delays regardless of the path. The use of this 4:2 compressor reduces the critical path.
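The fast-input idea can be illustrated with a 4:2 compressor built from two full adders. In the Python sketch below, which is a logic-level model and not the circuit of [18], each full adder forms its sum as (a XOR b) XOR c, so c is the fast input with one XOR delay to the sum. Wiring the first adder's sum into that fast input of the second adder lets x4 XOR cin be computed in parallel, giving three XOR levels on the sum path instead of four. The unit-delay model and the function names are assumptions for illustration.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 counter; the sum is formed as (a ^ b) ^ c, so c is the fast input."""
    s = (a ^ b) ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """4:2 compressor: returns (sum, carry, cout); cout does not depend on cin."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(x4, cin, s1)      # late signal s1 drives the fast input
    return s, carry, cout


def sum_xor_levels(late_signal_on_fast_input: bool) -> int:
    """XOR levels on the sum path in a simple unit-delay model."""
    s1_ready = 2                            # x1 ^ x2, then ^ x3
    if late_signal_on_fast_input:
        return max(1, s1_ready) + 1         # x4 ^ cin (1 level) overlaps with s1
    return s1_ready + 2                     # (s1 ^ x4) ^ cin: both XORs wait for s1


def check_compressor() -> bool:
    """Exhaustively verify x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout)."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(check_compressor(), sum_xor_levels(True), sum_xor_levels(False))  # True 3 4
```

Because cout is computed from x1, x2 and x3 only, it never waits for cin, so a row of these compressors has no horizontal carry ripple.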
The 4:2 compressor of [18] thereby enhances the speed of parallel multipliers. In [19], the authors tried to improve the PP array even before applying any PP reduction technique. In the MBA, neg signals are needed (a 1 is added at the LSB of the complemented PP when taking the two's complement of the multiplicand), which adds one more PP row to the PP array (as shown in Figure 5) and leads to at least one more XOR delay. This is the worst case for operands of short word length, and it also makes the PP array irregular. The authors proposed a fast method of finding the two's complement that removes the extra row, achieving a 13% speed improvement and 14% power savings compared with conventional methods.

Figure 5. Conventional PP array by Modified Booth Algorithm for 8x8 multiplication

6. Architectural Modifications
The first architectural modification was introduced by Wallace [20] as the best alternative to the regular iterative array structure. Wallace introduced a tree structure with carry save adders constructed from one-bit full adders (also known as 3:2 counters) for reducing the partial products. Although this tree structure improves speed, it suffers from an irregular structure and larger interconnect delays. Dadda [21] proposed a method to minimize the number of counters in a compression tree. Weinberger [22] introduced the 4:2 counter for parallel multiplier arrays. Higher-order counters were introduced by Song and De Micheli [23], giving better results than other architectures. In a conventional Modified Booth multiplier, PPs are added one at a time in an adder array, and the result is formed in a final carry propagate adder stage, at the cost of the maximum number of gate delays. Cooper [24] described a novel parallel architecture to reduce the number of gate delays. In his architecture, the PPs are divided into parallel groups, and the outputs from each group are combined in a single full adder row that produces a sum and a carry output. This parallel implementation reduces the number of full adder delays from seven to four for a 16-bit multiplier. In contrast with the fully redundant form [7], 3X multiples are represented in partially redundant form: a series of short adders is used without carry propagation between them, which reduces hardware and is faster than a full carry propagate adder.

In [18], the authors also proposed a scheme that creates a vertical bit-slice compressor (VCS). The VCS is designed not to introduce a long delay, so that the vertical and horizontal critical paths are minimized rather than the number of full adder levels.

Figure 6. Vertical bit Slice Compressor (VCS)

In [25,26], the design of a Dynamic Range Determination (DRD) unit was highlighted for reducing switching activity. Due to sign extension in two's complement representation, an addition operation on input data with a small dynamic range may be inefficient. The authors exploit the feasibility of dynamically allocating the functional blocks of an adder to various dynamic data ranges, so that the switching activity of unused blocks is reduced. Since dynamic power dissipation is directly related to the switching activity and the switched capacitance, it is also reduced by this method.

In [27], the authors developed a new radix-4 Modified Booth encoding scheme. They designed three types of recoders in this scheme, which are realized using pass-transistor logic or other logic styles. Even though these recoders suffer from latency and power efficiency problems, they offer better performance in the higher-order bit positions; in the LSB positions, however, the scheme is very slow. In [28], Elguibaly proposed a dependence graph to describe merged multiply-accumulate (MAC) hardware based on the Modified Booth Algorithm.
The author also developed a delay model that includes multilevel gate delays, considering the input ramp and output loading. Based on this delay model, he designed a pipelined parallel MAC that is faster than other parallel MACs. In [29], the authors proposed an approach to make the PP array regular, but this approach has an overhead in area and delay. In [19], the authors designed an architecture that solves the problem mentioned in [29], achieving improvements in area, delay and power. In multiplication, the partial product summation (PPS) unit accounts for a large share of the power, area and delay, and many approaches have been proposed to decrease this overhead. One such method is that of Chen et al. [30], in which the PPs are partitioned into upper and lower parts. Conventionally, more power is consumed in the lower part of the PPS than in the upper part because of the glitches accumulated from the upper part. To minimize this effect, the two parts are added individually in parallel, using full adders (FA) and half adders (HA) together with the DRD unit.
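The dynamic-range idea behind the DRD unit of [25,26] can be modeled in a few lines of Python. This is a simplified behavioral sketch under assumed parameters (a 32-bit adder split into hypothetical 8-bit slices), not the circuit of [25,26] or [30]: it finds how many two's complement bits the operands actually need, adds only in the low slices that are required, and reconstructs the idle upper slices by sign extension.

```python
def effective_width(x: int) -> int:
    """Smallest two's complement width that can represent x (sign bit included)."""
    return (x if x >= 0 else ~x).bit_length() + 1


def gated_add(a: int, b: int, total_width: int = 32, slice_width: int = 8) -> tuple[int, int]:
    """Add a and b using only the low adder slices their dynamic range requires.

    Returns (sum, active_slices). The upper slices would only propagate copies
    of the sign bit, so in hardware they could be held idle (no switching);
    here their output is recreated by sign-extending the narrow result.
    """
    need = max(effective_width(a), effective_width(b)) + 1    # one extra bit for the carry
    active = min(-(-need // slice_width), total_width // slice_width)
    w = active * slice_width
    mask = (1 << w) - 1
    s = ((a & mask) + (b & mask)) & mask                       # narrow addition
    if s >> (w - 1):                                           # sign-extend into idle slices
        s -= 1 << w
    return s, active


if __name__ == "__main__":
    print(gated_add(5, -3))        # (2, 1): one 8-bit slice is enough
    print(gated_add(100, -3))      # (97, 2): 100 needs 8 bits, plus a carry bit
    print(gated_add(1 << 20, 7))   # (1048583, 3)
```

In this model the number of active slices tracks the operands' dynamic range, which is the quantity a DRD unit would detect in order to leave the remaining sign-extension logic idle.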

Multiplication of negative numbers involves two's complement numbers. The Modified Booth Algorithm uses a sign-bit extension scheme for both signed and unsigned numbers. In [31], the authors implemented a sign control unit that determines whether the number is signed or unsigned. This control unit is designed with a line of multiplexers that configures the PP rows, and it achieved a 0.45% improvement in silicon area overhead. Two's complement multiplication with short bit-width operands suffers from a latency overhead. Lamberti et al. [32] attempted to overcome this problem by designing an architecture that merges PP generation and PP reduction, thereby removing the extra PP row caused by signed numbers. The PP array is also reduced in its upper part by a sign extension prevention scheme [33]. In [34], the authors partitioned the partial products into two parts for reduction, since the two longest columns in the middle of the PP array contribute the maximum delay in the PPS unit. The outputs of the two parts are computed independently in parallel and then added using a high-speed hybrid final adder [35] built from a CLA and a CSLA. The BEC (Binary to Excess-1 Converter) adder provides faster performance than the carry save adder, and it consumes less area and power than the CSLA [36,37].

7. Evaluation of Booth Algorithm with Wallace Tree Architecture
To improve the overall performance of a multiplier, it is essential to address all three stages of multiplication: PP generation and reduction, PP summation, and final addition. The previous sections show that the Booth algorithm is the best choice for PP generation and reduction. Even though there are overheads due to two's complement representation (negative PPs) and sign extension, the algorithm reduces the number of PP rows to N/2 (N being the bit width). In partial product summation, however, the sign extension and correction factor of the Booth algorithm add more delay.
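The two overheads mentioned above, negative PPs and sign extension, are visible in a small behavioral model of the conventional radix-4 MBA array (compare Figure 5). In the Python sketch below, an illustrative model with hypothetical function names, each of the N/2 rows holds 0, X or 2X in N+1 bits, negative digits are formed by bit inversion, and the +1 that completes each two's complement is collected in a separate row of neg bits, giving N/2 + 1 rows; every PP row must also be sign-extended before it is added.

```python
def to_signed(v: int, w: int) -> int:
    """Interpret a w-bit pattern as a two's complement number (i.e. sign-extend it)."""
    return v - (1 << w) if (v >> (w - 1)) & 1 else v


def mba_pp_array(x: int, y: int, n: int) -> tuple[list[int], list[int]]:
    """Conventional radix-4 MBA partial products for n-bit signed x * y (n even).

    Returns the (n+1)-bit PP patterns and the neg bit of each row.
    """
    width = n + 1                          # wide enough for 2X
    mask = (1 << width) - 1
    rows, negs = [], []
    for j in range(n // 2):
        low = (y >> (2 * j - 1)) & 1 if j > 0 else 0
        d = -2 * ((y >> (2 * j + 1)) & 1) + ((y >> (2 * j)) & 1) + low  # digit in {-2..2}
        neg = 1 if d < 0 else 0
        pattern = (abs(d) * x) & mask      # 0, X or 2X
        if neg:
            pattern ^= mask                # one's complement; the +1 goes to the neg row
        rows.append(pattern)
        negs.append(neg)
    return rows, negs


def mba_product(x: int, y: int, n: int) -> int:
    """Add the sign-extended PP rows (shifted by 2j) and the extra row of neg bits."""
    rows, negs = mba_pp_array(x, y, n)
    total = sum(to_signed(p, n + 1) << (2 * j) for j, p in enumerate(rows))
    neg_row = sum(neg << (2 * j) for j, neg in enumerate(negs))
    return total + neg_row


if __name__ == "__main__":
    rows, negs = mba_pp_array(-77, 93, 8)
    print(len(rows) + 1, "rows including the neg-bit row")   # 5 for N = 8
    print(mba_product(-77, 93, 8))                           # -7161
```

Schemes such as [19] and [32] aim to remove the extra neg-bit row, and sign extension prevention [33] aims to avoid the per-row sign extension that `to_signed` performs here.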
In the Wallace tree structure, carry save adders are used to sum the PPs, decreasing the number of PP rows by a factor of 1.5 at each level [38]; hence the delay of the critical path is reduced. The availability of the partial products in parallel improves the speed of the partial product summation. We implemented the MBA with a conventional FA array and the MBA with a Wallace tree structure for comparison, using the Cadence TSMC 180nm library, and the results are tabulated in Table 3. As Table 3 shows, the Booth encoding scheme with the Wallace tree structure gives better results.

Implementation | Area (µm²) | Delay (ps) | Power (nW)
MBA with conventional FA array | | |
MBA with Wallace tree structure | | |
Table 3. Comparison Results (8 bit multiplication)
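The factor-of-1.5 reduction per level can be checked with a small behavioral model. The Python sketch below (illustrative, with hypothetical names) treats each PP row as an integer, applies 3:2 carry-save adders to groups of three rows at each level, passes leftover rows through, and finishes with one carry-propagate addition. It also counts the levels needed for a given number of rows, for example 8 rows for an unencoded 8-bit multiplier versus N/2 + 1 = 5 rows for the MBA array.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Row-wide 3:2 carry-save adder: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_reduce(rows: list[int]) -> tuple[list[int], int]:
    """Reduce rows to at most two with levels of 3:2 counters; returns (rows, levels)."""
    levels = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])            # 0, 1 or 2 leftover rows pass through
        rows = nxt
        levels += 1
    return rows, levels


def wallace_levels(height: int) -> int:
    """Number of 3:2 levels needed to reduce `height` rows to two."""
    levels = 0
    while height > 2:
        height = 2 * (height // 3) + height % 3
        levels += 1
    return levels


def wallace_multiply(x: int, y: int, n: int) -> int:
    """Unsigned n-bit multiply: AND-gate PP rows, Wallace reduction, final CPA."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    reduced, _ = wallace_reduce(rows)
    return sum(reduced)                    # final carry-propagate addition


if __name__ == "__main__":
    print(wallace_levels(8), wallace_levels(5))   # 4 3
    print(wallace_multiply(13, 11, 4))            # 143
```

In this simple model, cutting the 8 rows of an unencoded 8-bit array to the 5 rows of the MBA array saves one level of 3:2 counters.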

Figure 6. Comparison Results for Performance Parameters

8. Summary
Through this survey, we have highlighted some of the important modifications of the Modified Booth algorithm. Such modifications are employed to improve multiplier design in many signal processing and image processing applications. To summarize, the Modified Booth algorithm is used in multiplier design to reduce the delay in partial product generation, but it introduces overheads such as sign extension, a correction factor and an irregular partial product array. Hence architectural modifications are needed not only in the PP summation unit but also in the final addition to improve speed. Higher-radix Booth algorithms yield good improvements in delay; however, generation of the hard multiples is the bottleneck, and high-speed adders are employed to produce them.

Author Contributions
Nirmaladevi R is working on low-power VLSI design architectures; as part of her research, she has submitted this survey article. Dr. Seshasayanan R has submitted many research articles in the field of low-power VLSI design and reconfigurable architectures for image processing. He is actively involved in various projects funded by the Ministry of Earth Sciences and the Ministry of Defence, India. He has guided the research work of Nirmaladevi R.

Conflict of Interest
No conflict of interest was reported by the authors.

References
1. Chandrakasan AP, Allmon R, Stratakos A, et al. Design of portable systems. IEEE Custom Integrated Circuits Conference.
2. Booth AD. A signed binary multiplication technique. Quarterly J. Mechanics and Applied Math., 1951; 4.
3. MacSorley. High speed arithmetic in binary computers. Proc. IRE, 1961; 49.
4. Srimani PK. Generalized proof of Modified Booth's algorithm. Computers & Electrical Engineering, 1981; 8.
5. Rubinfield. A proof of the Modified Booth's algorithm for multiplication. IEEE Trans. Computers, 1988.
6. Villeger D, Oklobdzija VG. Analysis of Booth encoding efficiency in parallel multipliers using compressors for reduction of partial products. IEEE Transactions on Computers.
7. Bewick G, Flynn MJ. Binary multiplication using partially redundant multiples. Technical Report CSL-TR-1992.
8. Seidel PM, McFearin LD. Secondary radix recodings for higher radix multipliers. IEEE Trans. Computers, 2005; 54.
9. Parhami B. Computer Arithmetic: Algorithms and Hardware Designs. Oxford Univ. Press, New York.
10. Koren. Computer Arithmetic Algorithms, 2nd ed. Englewood Cliffs, New Jersey: Prentice Hall.
11. Agrawal DP. Arithmetic algorithms in a negative base. IEEE Transactions on Computers, 1975; 24.
12. Agrawal DP, Rao TRN. On multiple operand addition of signed binary numbers. IEEE Transactions on Computers, 1978.
13. Madrid PE, Millar B, Swartzlander EE. Modified Booth algorithm for high radix multiplication. IEEE Computer Design Conf., 1992.
14. Gong Guo, Mohammad Ashtijou. A note about the correction cycle of high radix Booth's multiplication. IEEE Transactions on Computers, 1993.

15. Dawoud DS. Modified Booth algorithm for higher radix fixed point multiplication. IEEE Transactions on Computers.
16. Vassiliadis S, Hanrahan D. A general proof for overlapped multiple-bit scanning multiplications. IEEE Trans. Computers, 1989; 38.
17. Dieffenderfer JN, Dieffenderfer JW. Statistical approach to Booth's algorithm. 1994; 37(1).
18. Oklobdzija VG, Villeger D, Liu SS. A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach. IEEE Trans. Computers, 1996; 45(3).
19. Shiann-Rong Kuang, Jiun-Ping Wang, Cang-Yuan Guo. Modified Booth multipliers with a regular partial product array. IEEE Transactions on Circuits and Systems II: Express Briefs, 2009; 56(5).
20. Wallace CS. A suggestion for a fast multiplier. IEEE Trans. Computers, 1964; 13(2).
21. Dadda L. Some schemes for parallel multipliers. Alta Frequenza, 1965; 34.
22. Weinberger A. 4:2 carry-save adder module. IBM Technical Disclosure Bulletin, 1981; 23(8).
23. Song P, De Micheli G. Circuits and architecture trade-offs for high speed multiplication. IEEE J. Solid-State Circuits, 1991; 26(9).
24. Cooper AR. Parallel architecture modified Booth multiplier. IEEE Proceedings, 1988; 135(3).
25. Robin Sheen, Sandy Wang, Oscal T C Chen, et al. Power consumption of a 2's complement adder minimised by effective dynamic data ranges. Proc. IEEE Int. Symposium on Circuits and Systems, 1999; 1.
26. Oscal T C Chen, Sandy Wang, Yi-Wen Wu. Minimisation of switching activities of partial products for designing low power multipliers. IEEE Trans. VLSI Systems, 2003; 11(3).
27. Yeh W-C, Jen C-W. High-speed Booth encoded parallel multiplier design. IEEE Trans. Computers, 2000; 49(7).
28. Elguibaly F. A fast parallel multiplier-accumulator using the Modified Booth algorithm. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 2000; 47(9).
29. Jung-Yup Kang, Jean-Luc Gaudiot. A simple high speed multiplier design. IEEE Transactions on Computers, 2006; 55(10).
30. Meng-Lin Hsia, Oscal T-C Chen. Low power multiplier optimized by partial product summation and adder cells. IEEE Trans. Computers.
31. Qingzheng Li, Guixuan Liang, Amine Bermak. A high speed 32 bit signed/unsigned pipelined multiplier. 5th IEEE International Symposium on Electronic Design, Test & Applications.
32. Fabrizio Lamberti, Andrikos, Elisardo Antelo, Paolo Montuschi. Reducing the computation time in (short bit-width) two's complement multipliers. IEEE Transactions on Computers, 2011; 60(2).
33. Ercegovac MD, Lang T. Digital Arithmetic. Morgan Kaufmann Publishers.
34. Ramkumar B, Kittur HM. Faster and energy efficient signed multipliers. Hindawi Publishing Corporation, VLSI Design, Volume
35. Stelling PF, Oklobdzija VG. Design strategies for optimal hybrid final adders in a parallel multiplier. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 1996; 14(13).
36. Ramkumar B, Kittur HM. Low power area efficient carry select adder. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2012; 20(2).
37. Ramkumar B, Kittur HM, Kannan PM. ASIC implementation of modified faster carry save adder. European Journal of Scientific Research, 2010; 42(1).
38. Asadee P. A new MAC design using high speed partial product summation tree. IEEE Transactions on Computers.
