IMPLEMENTATION OF TWIN PRECISION TECHNIQUE FOR MULTIPLICATION

SUNITH KUMAR BANDI #1, M. VINODH KUMAR *2
# ECE Department, M.V.G.R College of Engineering, Vizianagaram, Andhra Pradesh, INDIA.
1 sunithjc@gmail.com
* ECE Department, M.V.G.R College of Engineering, Vizianagaram, Andhra Pradesh, INDIA.
2 vinodh.edu@gmail.com

ABSTRACT- Multiplication is a complex arithmetic operation, which results in high power dissipation and long signal propagation delay. The bit-width of a multiplier is chosen to be at least as wide as the largest operand of the applications that are to be executed. If the operands are narrower than the multiplier, power is dissipated unnecessarily and the delay is longer than needed. The twin-precision technique adapts the multiplier to the bit-width of the operands and allows narrow-width operations to be computed in parallel, which reduces power dissipation and delay. The technique is applied to an unsigned (conventional) multiplier and to signed multipliers, namely the Baugh-Wooley and Modified-Booth multipliers. An adder is a digital circuit that performs addition; in multipliers, adders sum the partial products, and full adders are generally used. Exchanging the full adder for a carry look-ahead adder reduces the power dissipation.

Index Terms- Twin-precision, power dissipation, delay, bit-width, narrow-width, carry look-ahead adder, full adder.

I. INTRODUCTION
Embedded systems and digital signal processing algorithms typically require a large number of multiplications to be performed quickly and repetitively on a set of data. Multiplication is also an important operation in the ALU; compared with the other arithmetic operations, it consumes more power and takes more computation time.
When choosing a multiplier for a digital system, the bit-width of the multiplier must be at least as wide as the largest operand of the applications that are to be executed on that system. The multiplier is therefore often much wider than the data actually held in the operands, which leads to unnecessarily high power dissipation and unnecessarily long delay.

Narrow-width operations have been explored to save power through operand guarding. In operand guarding the most significant bits of the operands are not switched, so power is saved in the arithmetic unit when multiple narrow-width operations are computed consecutively. Power dissipation can also be reduced by gating the upper part of narrow-width operands. Narrow-width operands have further been used to increase instruction throughput, by computing several narrow-width operations in parallel on a full-width data path. Two-dimensional operand guarding has been introduced for array multipliers, which reduces power dissipation compared to a conventional array multiplier.

Achieving double throughput for a multiplier is not as straightforward as, for example, in an adder, where the carry chain can be cut at the appropriate place to achieve narrow-width additions. This paper therefore presents the twin-precision technique, which offers the same power reduction as operand guarding while also performing double-throughput multiplications. The technique is an efficient way of achieving double throughput in a multiplier with only a small delay penalty, and it is applied here to both signed and unsigned multipliers.

Fast adders are key elements in digital circuits, e.g., multipliers and digital signal processing (DSP) chips, and much effort has been focused on improving adder designs. A carry look-ahead adder is a type of adder used in digital logic.
The carry look-ahead adder calculates one or more carry bits before the sum, which reduces the time spent waiting for the more significant bits of the result.

II. CARRY LOOK AHEAD ADDER
A ripple-carry adder works in the same way as pencil-and-paper methods of addition. Starting at the rightmost (least significant) digit position, the two corresponding digits are added and a result is obtained. There may also be a carry out of this digit position. Accordingly, all digit positions other than the rightmost need to take into account the possibility of having to add an extra 1, from a carry that has come in from the next position to the right.

Carry look-ahead logic uses the concepts of generating and propagating carries. In the context of a carry look-ahead adder, it is most natural to think of generating and propagating in terms of binary addition. Carry look-ahead depends on two things:
1) Calculating, for each digit position, whether that position is going to propagate a carry if one comes in from the right.
2) Combining these calculated values so as to be able to deduce quickly whether, for each group of digits, that group is going to propagate a carry that comes in from the right.

The logic for the generate (g) and propagate (p) values is given below. Note that the equations are for a 4-bit carry look-ahead adder. The numeric index identifies the signal in the circuit, starting from 0 on the far left to 3 on the far right.

To determine whether a bit pair will generate a carry, the following logic works: $G_i = A_i \cdot B_i$.
To determine whether a bit pair will propagate a carry, the following logic works: $P_i = A_i \oplus B_i$.
The carry bits are given by $C_{i+1} = G_i + (P_i \cdot C_i)$. For example:
$C_1 = G_0 + (P_0 \cdot C_0)$
$C_2 = G_1 + (P_1 \cdot C_1)$
$C_3 = G_2 + (P_2 \cdot C_2)$
$C_4 = G_3 + (P_3 \cdot C_3)$
The sum bits are given by $S_i = P_i \oplus C_i$.

III. TWIN PRECISION FUNDAMENTALS
The twin-precision technique is first presented using an illustration of unsigned binary multiplication. Consider what happens when the precision of the operands is smaller than that of the multiplier we intend to use. In this case, the most significant bits of the operands contain only zeros, so large parts of the partial-product array consist of zeros. Further, the summation of the most significant part of the partial-product array, and the most significant bits of the final result, consist only of zeros. An illustration of an 8-bit multiplication, where the precision of the operands is four bits, is shown in Fig.1.

Fig.1 Illustration when the precision of the operands is smaller than the precision of the multiplication.

In order to be able to use the partial products in the most significant part, there has to be a way of setting their values. For this we can use the most significant bits of the operands, since these are not carrying any useful information. If we look only at the upper half of the operands, the partial products generated from these bits are the ones shown in black in Fig.2. By setting the other partial products to zero, it is possible to perform two multiplications within the same partial-product array, without changing the way the summation of the partial-product array is done.
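The idea of two multiplications sharing one partial-product array can be checked with a small software model. The following is a minimal sketch, not the authors' hardware description: the function name `twin_precision_unsigned` and the `twin` flag are illustrative assumptions, and the array is modelled as plain integer arithmetic rather than as an adder tree.

```python
def twin_precision_unsigned(x, y, n=8, twin=False):
    """Sum an n x n array of AND partial products (software model only).

    In twin mode, every partial product that combines a low-half bit of
    one operand with a high-half bit of the other is forced to zero, so
    the low n bits of the result hold x_lo * y_lo and the high n bits
    hold x_hi * y_hi.
    """
    half = n // 2
    total = 0
    for i in range(n):              # bit index into x
        for j in range(n):          # bit index into y
            pp = ((x >> i) & 1) & ((y >> j) & 1)
            same_half = (i < half) == (j < half)
            if twin and not same_half:
                pp = 0              # unwanted partial product set to zero
            total += pp << (i + j)  # column weight is unchanged in both modes
    return total


if __name__ == "__main__":
    # Full precision: one 8-bit multiplication.
    print(twin_precision_unsigned(0xA3, 0x5C))                  # 14996
    # Twin precision: 0xA * 0x5 in the upper byte, 0x3 * 0xC in the lower byte.
    print(hex(twin_precision_unsigned(0xA3, 0x5C, twin=True)))  # 0x3224
```

The summation loop is identical in both modes; only the generation of the partial products changes, which mirrors the statement that the summation of the array does not have to be modified.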
How the partial products shown in gray can be set to zero is described below.

Fig.2 Illustration of an unsigned 8-bit multiplication, using twin precision

To be able to distinguish between the two smaller multiplications, they are referred to as the multiplication in the Least Significant Part (LSP) of the partial-product array with size $N_{LSP}$, shown in white, and the multiplication in the Most Significant Part (MSP) with size $N_{MSP}$, shown in black. It is functionally possible to partition the multiplier into even more multiplications. For example, it would be possible to partition a 64-bit multiplier into four 16-bit multiplications. Given a number K of low-precision multiplications, their total size needs to be smaller than or equal to the full-precision multiplication. The condition for twin precision is given by $N \geq N_{LSP} + N_{MSP}$. Here, the two smaller multiplications have equal precision, half (N/2) the full precision (N) of the multiplier.

Setting unwanted partial products to zero is easily accomplished by changing the two-input AND gate to a three-input AND gate, where the extra input is used for a control signal. Only the AND gates of the partial products that have to be set to zero need to be changed to the three-input version. During normal operation, when a full-precision multiplication is executed, the control signal is set high, so all partial products are generated as normal and the array of adders sums them to create the final result. When the control signal is set low, the unwanted partial products become zero. Since the summations of the two sets of partial products do not overlap, there is no need to modify the array of adders. The array of adders produces the results of the two multiplications in the upper and lower parts of the final output.

IV. TWIN PRECISION BAUGH-WOOLEY IMPLEMENTATION
The Baugh-Wooley (BW) algorithm is a relatively straightforward way of performing signed multiplications. Fig.3 illustrates the algorithm for an 8-bit case, where the partial-product array has been reorganized according to the scheme of Hatamian. The creation of the reorganized partial-product array comprises three steps:
1) The most significant partial product of the first N-1 rows, and all partial products of the last row except the most significant, are negated.
2) A constant one is added to the Nth column.
3) The most significant bit (MSB) of the final result is negated.

Fig.3 Illustration of a signed 8-bit multiplication, using the Baugh-Wooley algorithm

It is not as easy to deploy the twin-precision technique onto a BW multiplication as it is for the unsigned multiplication, where only parts of the partial products need to be set to zero. To be able to compute two N/2 signed multiplications, it is necessary to make a more sophisticated modification of the partial-product array. Fig.4 illustrates an 8-bit BW multiplication, in which two 4-bit multiplications are depicted in white and black. Comparing Fig.3 with Fig.4, one can see that the only modification needed to compute the 4-bit multiplication in the MSP of the array is an extra sign bit 1 in column S12. The 4-bit multiplication in the LSP of the array needs some more modifications. In the active partial-product array of the 4-bit LSP multiplication (shown in white), the most significant partial product of all rows except the last needs to be negated. For the last row it is the opposite: all partial products except the most significant are negated. A sign bit 1 is also needed for this multiplication, this time in column S4. Finally, the MSB of each result needs to be negated to get the correct results of the two 4-bit multiplications.

Fig.4 Illustration of a twin precision multiplication, using the Baugh-Wooley algorithm

To allow the full-precision multiplication of size N to coexist with two multiplications of size N/2 in the same multiplier, it is necessary to modify the partial-product generation. For the 4-bit multiplication in the MSP of the array, all that is needed is to add a control signal that is set high when the 4-bit multiplication is to be computed and low when the full-precision multiplication is to be computed. To compute the 4-bit multiplication in the LSP of the array, certain partial products need to be negated. This is easily accomplished by changing the two-input AND gate that generates the partial product to a two-input NAND gate followed by an XOR gate. The second input of the XOR gate can then be used to invert the output of the NAND gate. When computing the 4-bit LSP multiplication, the control input to the XOR gate is set low, making it work as a buffer. When computing a full-precision multiplication, the same signal is set high, making the XOR work as an inverter. Finally, the MSB of the result needs to be negated; this can again be achieved with an XOR gate, together with an inverted version of the control signal for the XOR gates used in the partial-product generation. Setting unwanted partial products to zero can be done with three-input AND gates, as for the unsigned case.

The changes made to the conventional Baugh-Wooley multiplier to obtain the twin-precision Baugh-Wooley multiplier are:
1) At columns 4, 8 and 12, an extra sign bit 1 is added.
2) XOR gates are added at the outputs of columns 7 and 15 so that they can be inverted, as shown in Fig.4.

V. TWIN PRECISION MODIFIED BOOTH IMPLEMENTATION
The Modified-Booth (MB) algorithm takes three bits of the multiplier at a time, as shown in Fig.5. This guarantees that only half the number of partial products is generated compared to a conventional partial-product generation using two-input AND gates. With a fixed number of partial products, the MB algorithm is suitable for hardware implementation. Fig.5 shows which parts of the multiplier are encoded and used to recode the multiplicand into a row of partial products. An MB multiplier works internally with a two's complement representation of the partial products, in order to be able to multiply the encoded data with the multiplicand.

Fig.5 8-bit Modified Booth encoding.

The procedure for the Modified Booth method is:
1) Pad the LSB with one zero.
2) Pad the MSB with 2 zeros if n is even and 1 zero if n is odd.
3) Divide the multiplier into overlapping groups of 3 bits.
4) Determine the partial-product scale factor from the Modified Booth encoding table.
5) Compute the multiplicand multiples.
6) Sum the partial products obtained.

Fig.6 Modified Booth encoding table.
With the Modified Booth encoding method shown in Fig.5, each three-bit pattern obtained is looked up in the Modified Booth encoding table shown in Fig.6 to generate the four partial products shown in Fig.7. The partial products obtained are then summed to give the final result.

Fig.7 Illustration of signed 8-bit multiplication using the Modified Booth algorithm

It is possible to take the partial products from the full-precision MB multiplication and use only those that are of interest for the narrow-width MB multiplications, because all partial products are computed the same way, apart from one special case that needs to be handled. Partial-product rows 1, 2 and 4 are the same as in the Modified Booth multiplier; only row 3 differs, and it is generated using the special case.

Fig.8 Encoding scheme for two 4-bit multiplications.

The special case, shown in Fig.8, is that the third row of the pattern is changed. When operating in twin-precision mode, an extra bit 0 is appended to the X-array. This can be achieved with a two-input AND gate, where one input is the twin-precision mode signal (T) and the other is X(3). In normal mode the value of X(3) is unchanged, because the control input is 1, which is the same as in the normal method. The remaining three patterns are unchanged, so they are used for the narrow-width operations as well.

A second case to consider is that, in twin mode, partial-product row 3, generated by pattern 3, has to be left shifted twice and added to partial-product row 4 to get the correct result. In normal mode it remains the same. These are the two considerations that must be followed for the twin-precision technique.

Fig.9 Illustration of twin precision multiplier using the Modified Booth algorithm

As shown in Fig.9, two 4-bit multiplications are computed concurrently using the same 8-bit multiplier. One 4-bit multiplication, shown in white, is computed in parallel with a second 4-bit multiplication, shown in grey. The remaining partial products, shown in black, are unused, and all their bits are zero.
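The special case for the third encoding group can be illustrated with a small functional model. This sketch makes several assumptions that are not taken from the paper: the overlap bit X(3) is forced to zero by a software flag `twin` rather than by the gate-level signal T, the multiplicand halves are modelled as separate signed values instead of rows of a shared array, and all function names are hypothetical. It assumes the standard radix-4 Booth table and a bit-width that is a multiple of four.

```python
BOOTH_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def to_signed(value, n):
    """Interpret the low n bits of value as a two's complement number."""
    value &= (1 << n) - 1
    return value - (1 << n) if value >> (n - 1) else value


def booth_digits(y, n, twin=False):
    """Scale factors for multiplier pattern y; twin mode splits y in two."""
    assert n % 4 == 0, "this sketch assumes n is a multiple of four"
    half = n // 2
    padded = (y & ((1 << n) - 1)) << 1
    digits = []
    for k in range(n // 2):
        group = (padded >> (2 * k)) & 0b111
        if twin and 2 * k == half:
            group &= 0b110   # first group of the upper half sees 0, not y[half-1]
        digits.append(BOOTH_TABLE[group])
    return digits


def booth_multiply_twin(x, y, n=8, twin=False):
    """Return the 2n-bit result pattern in full-precision or twin mode."""
    digits = booth_digits(y, n, twin)
    if not twin:
        xs = to_signed(x, n)
        full = sum((d * xs) << (2 * k) for k, d in enumerate(digits))
        return full & ((1 << (2 * n)) - 1)
    half, rows = n // 2, n // 4
    x_lo = to_signed(x, half)
    x_hi = to_signed(x >> half, half)
    lo = sum((d * x_lo) << (2 * k) for k, d in enumerate(digits[:rows]))
    hi = sum((d * x_hi) << (2 * k) for k, d in enumerate(digits[rows:]))
    mask = (1 << n) - 1
    return ((hi & mask) << n) | (lo & mask)   # MSP result above LSP result


if __name__ == "__main__":
    print(booth_digits(0b01101110, 8))             # [-2, 0, -1, 2]
    print(booth_digits(0b01101110, 8, twin=True))  # [-2, 0, -2, 2]
```

In the example, only the third scale factor changes between the two modes, which is consistent with the statement that rows 1, 2 and 4 are shared and only row 3 needs special handling.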
VI. RESULTS
One of the goals of the twin-precision technique is to keep the performance degradation of the multiplier's full-precision operation to a minimum. Results are compared between multipliers implemented with two different adders, the ripple carry adder and the carry look-ahead adder. The delay, power dissipation and energy per operation of the multipliers are briefly discussed below.

A. DELAY
The delay is not greatly degraded by the introduction of the twin-precision technique. Fig.10 shows the minimum delay for the conventional and twin-precision multipliers. As can be seen, the difference in timing is not large: the conventional and twin-precision multipliers implemented with the two adders have almost the same delay. The delay of the Baugh-Wooley multiplier is decreased nearly to 2 ns by using the carry look-ahead adder. In the remaining cases, all the multipliers have almost the same delay.

Fig.10 Comparison of delay between the multipliers with the two adders.

B. POWER DISSIPATION
The power results in Fig.11 show that the multipliers implemented with the carry look-ahead adder consume less power than the multipliers with the ripple carry adder, so they are more efficient in terms of power. Among the signed multipliers, the Modified Booth multipliers have better power results than the Baugh-Wooley multiplier. The twin-precision Modified Booth multiplier dissipates less power than the conventional Modified Booth multiplier, whereas for the other multipliers, Baugh-Wooley and conventional (unsigned), the twin-precision version dissipates more power than its conventional counterpart. Using the carry look-ahead adder reduces the power dissipation by roughly 5.3% to 12.6%.

Fig.11 Comparison of power dissipation between the multipliers with the two adders.

C. ENERGY PER OPERATION
One of the reasons to choose a twin-precision implementation instead of a conventional multiplier implementation is that energy can be saved by reducing the precision of the multiplier when operating on narrow-width operands. To compute the energy per operation, we extracted delay and power values for the multipliers designed. We then computed the energy, where energy = delay * power, for each delay and power pair, and from these chose the smallest energy for each multiplier design and size.

The results in Fig.12 show that the multipliers with carry look-ahead adders have lower energy per operation than the multipliers with ripple carry adders. Since energy per operation is the product of delay and power dissipation, lower energy consumption is preferable in digital systems, so the carry look-ahead adder multipliers compare favourably with the ripple carry adder multipliers. Using carry look-ahead adders instead of ripple carry adders in the multipliers lowers the energy per operation by 5.6% to 34%. In general, the twin-precision multiplier has a higher energy per operation than a conventional implementation. The exception is the twin-precision Modified Booth multiplier, which uses less energy than the conventional Modified Booth multiplier; the other twin-precision multipliers use more energy than their conventional counterparts, irrespective of the adder used.

Fig.12 Comparison of energy per operation between the multipliers with the two adders.

VII. CONCLUSION
The presented twin-precision technique allows for flexible architectural solutions, where the variation in operand bit-width that is common in most applications can be harnessed to decrease power dissipation and to increase the throughput of multiplications. Owing to the simplicity of the implementation, only minor modifications are needed to comply with the twin-precision technique.
This makes for an efficient twin-precision implementation, capable of both signed and unsigned multiplications. Using the carry look-ahead adder instead of the ripple carry adder gives better energy per operation and reduces power. The Modified Booth multiplier gives better results than the Baugh-Wooley multiplier; a clear trend is that a Modified Booth implementation is more power efficient than a Baugh-Wooley implementation.

In terms of delay, the twin-precision implementations are somewhat slower than their conventional counterparts, while the two adders take almost the same time to give the final sum. In terms of power dissipation, the twin-precision implementations generally dissipate more power than the conventional multipliers, and multipliers with the carry look-ahead adder dissipate less power than multipliers with the ripple carry adder. The twin-precision technique generally has a higher energy per operation than the conventional multipliers, and the carry look-ahead adder has a lower energy per operation than the ripple carry adder. Overall, the carry look-ahead adder gives the best results compared to the ripple carry adder.

REFERENCES
[1] Magnus Själander and Per Larsson-Edefors, Multiplication Acceleration Through Twin Precision, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 17, No. 9, September 2009, pp. 1233-1246.
[2] Jiun-Ping Wang, Shiann-Rong Kuang, and Shish-Chang Liang, High-Accuracy Fixed-Width Modified Booth Multipliers for Lossy Applications, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[3] Magnus Själander and Per Larsson-Edefors, High-Speed and Low-Power Multipliers Using the Baugh-Wooley Algorithm and HPM Reduction Tree.
[4] Magnus Själander and Per Larsson-Edefors, The Case for HPM-Based Baugh-Wooley Multipliers.
[5] Ahmed Sayed and Hussain Al-Asaad, Survey and Evaluation of Low-Power Full-Adder Cells.
[6] Carry-Lookahead Adders, Supplement to Logic and Computer Design Fundamentals, 4th edition, Pearson Education, 2008.
[7] Jain, Switching Theory and Logic Design, Tata McGraw-Hill, 2003 Edition.