A comparative study of Floating Point Multipliers Using Ripple Carry Adder and Carry Look Ahead Adder

1 Jaidev Dalvi, 2 Shreya Mahajan, 3 Saya Mogra, 4 Akanksha Warrier, 5 Darshana Sankhe
1,2,3,4,5 Department of Electronics, D. J. Sanghvi College of Engineering, Mumbai, India

Abstract

This paper presents a comparative study of floating point multipliers that use two different adders to implement a Vedic algorithm for multiplication. Floating point numbers are represented in the IEEE-754 single precision format. The adders compared are the Ripple Carry Adder and the Carry Look Ahead Adder. The multiplication algorithm is the Urdhav Tiryakbhyam algorithm, based on ancient concepts of Vedic Mathematics. A detailed analysis of the delay in each implementation is presented. Both implementations were coded in VHDL, then simulated and synthesized using Altera Quartus Prime (version 16.0 Standard Edition).

Keywords

floating point multiplication, IEEE-754 single precision format, Urdhav Tiryakbhyam, Ripple Carry Adder, Carry Look Ahead Adder, VHDL, Altera Quartus

I. INTRODUCTION

The floating point format is crucial for representing numbers when the data spans a wide range of values. It has therefore found applications in a varied array of fields, including digital signal processing, digital image processing and embedded systems. A floating point representation is an unencoded member of a floating-point format, representing a finite number, a signed infinity, a quiet NaN, or a signalling NaN. A finite number is represented by three components: a sign, an exponent, and a significand; its numerical value is the signed product of its significand and its radix raised to the power of its exponent [2]. Multiplication and addition are the most frequently used floating point arithmetic operations.
Today, most computational functions, such as those used in image processing and signal processing, involve repeated multiplication over a large dataset, which must be performed within nanoseconds for the processing to occur in real time. Since floating point multiplication entails multiplication of the mantissas as well as addition of the exponents, accurate and speed-optimised multipliers and adders are essential for an efficient floating point multiplier. We implemented two floating point multipliers, both using a Vedic multiplication algorithm for multiplying the mantissas: one uses ripple carry adders for all the additions (exponent addition as well as additions within the multiplier), while the other uses carry look ahead adders. We have analysed the delay in each case and present our findings here. The Vedic multiplier, based on the Urdhav Tiryakbhyam algorithm, is faster than a conventional multiplier [9].

II. IEEE-754 STANDARD

Over the years, there have been many formats for floating point representation. The most significant one is defined by the IEEE 754 standard, which was adopted in 1985 and revised in 2008. It is currently used by almost all processors and coprocessors [1]. The standard covers both decimal and binary floating point representations; in this paper we consider the binary representation only.

IEEE 754 binary floating point single precision is a 32-bit representation, while the double precision format is a 64-bit representation. The 32-bit representation consists of three parts. The sign of the number is given in the first bit: 0 if the number is positive, 1 if it is negative. The next 8 bits represent the exponent to the base 2 [2]. The value stored in the exponent field is an unsigned integer E' in excess-127 format, so that 0<=E'<=255 and the signed exponent E is represented as E'=E+127. The values E'=0 and E'=255 are reserved for special cases (Table 1), so for normalized numbers E lies in the range -126 to +127.

The mantissa is represented in the last 23 bits, giving a resolution of 2^-23 in the fractional part. For a normalized number the MSB of the significand is always equal to 1, which is known as binary normalization. This bit is not part of the 23 stored bits; it is implicitly assumed [3].

The standard also defines representations for positive and negative infinity and for positive and negative zero, special values called NaNs for representing invalid results, denormal numbers for magnitudes smaller than the smallest normalized number, and four rounding modes. It defines five exceptions (underflow, overflow, divide by zero, invalid and inexact) and sets a flag when one of them occurs. An interrupt routine, either system defined or user defined, can be set for any of the exception flags.
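The field layout described above can be illustrated with a short Python sketch. It is not part of the VHDL design; the function names `decode_single` and `significand_24` are hypothetical and chosen only for this illustration. It splits a value into its sign bit, stored exponent E' and 23-bit fraction, and restores the implicit leading 1 for a normalized number.

```python
import struct

BIAS = 127  # excess-127 bias of the single precision exponent field


def decode_single(x):
    """Return (sign, stored exponent E', 23-bit fraction) of x in single precision."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    stored_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, stored_exp, fraction


def significand_24(stored_exp, fraction):
    """Prepend the implicit leading 1 of a normalized number (0 < E' < 255)."""
    if not 0 < stored_exp < 255:
        raise ValueError("not a normalized number")
    return (1 << 23) | fraction


sign, stored_exp, fraction = decode_single(123.456)
# Sign bit, true exponent E = E' - 127, and the 23 stored fraction bits.
print(sign, stored_exp - BIAS, f"{fraction:023b}")
# prints: 0 6 11101101110100101111001
```

For 123.456 the stored exponent is 133, so E = 6, and the 24-bit significand passed on to a mantissa multiplier would be the fraction with a 1 placed in front of it.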
Fig. 1 : IEEE 754 Single Precision format

Exponent (E)    Significand (N)    Value/Comments
255             Not equal to 0     Does not represent a number
255             0                  -∞ or +∞, depending on sign bit
0<E<255         Any                Normalized scale
0               0                  -0 or +0, depending on sign bit

Table 1 : IEEE 754 Format

III. FLOATING POINT MULTIPLICATION

Fig. 2 : Algorithm for floating point multiplication [10]

A. MANTISSA MULTIPLICATION

The multiplication of two floating point numbers is a stepwise process [4]. First, the two numbers are converted to the IEEE 754 standard format of representation. Then, the significands are multiplied. The 23-bit mantissa of each number is extended with a 1 at the MSB (as required by the format), giving two 24-bit numbers. These numbers are then multiplied using the Urdhav Tiryakbhyam Sutra. The algorithm follows a vertical and crosswise mechanism for the multiplication of two numbers. Derived from ancient Indian concepts of mathematics, it multiplies two numbers in a short amount of time.

In the Urdhav Tiryakbhyam algorithm, we begin with a smaller block for multiplication. Let us initially consider a 3x3 basic block. For a 3-bit operation, we first multiply the least significant bits (LSBs) of the multiplicand and the multiplier. This gives the LSB of the result. We then perform a crosswise multiplication of the two least significant bits of both the multiplier and the multiplicand. This crosswise multiplication is then performed for all three bits, followed by the two most significant bits, followed by the MSBs of the multiplicand and the multiplier. In each stage, the carry from the previous stage is added to the output. We eventually get a 6-bit result from the 3-bit numbers [3].

Fig. 3 : The 3 bit macro using the Urdhav Tiryakbhyam algorithm [3]

The 3-bit block is then incorporated into a 6-bit block, and the same process is carried out for the 6x6 crosswise multiplication, which gives a 12-bit result. The 6-bit blocks are in turn used to make the 12-bit block. We then perform a 12x12 bit crosswise multiplication to obtain the final 24-bit mantissa result [5].

    A5 A4 A3 A2 A1 A0
    B5 B4 B3 B2 B1 B0

                    C
              D X X X
              E X X X
        F X X X X X X

               RESULT

Fig. 4 : Additions performed within the multiplier at each stage

Where,
C = A2A1A0 x B2B1B0
D = A5A4A3 x B2B1B0
E = A2A1A0 x B5B4B3
F = A5A4A3 x B5B4B3

Each X marks a bit position by which the partial product is shifted left before the addition, so that RESULT = C + 8(D + E) + 64F.

B. EXPONENT ADDITION

As seen from the operation method for floating point multiplication, an important stage is the addition of exponents.
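As a minimal sketch of this stage, assuming only the excess-127 encoding E' = E + 127 from Section II: adding two stored exponents counts the bias twice, so 127 is subtracted once. The function name `product_exponent` and the range check are assumptions made for this illustration; the paper does not describe how the design handles exponent overflow or underflow.

```python
BIAS = 127  # excess-127 bias of the single precision exponent field


def product_exponent(e1_stored, e2_stored):
    """Stored exponent of the product before normalization: E1' + E2' - 127."""
    result = e1_stored + e2_stored - BIAS
    # Assumption for this sketch: a result outside 1..254 is reported as an error.
    if not 0 < result < 255:
        raise OverflowError("exponent outside the normalized range")
    return result


# Stored exponents 133 and 135 correspond to true exponents 6 and 8.
print(product_exponent(133, 135))  # prints: 141
```

The result 141 corresponds to a true exponent of 14 = 6 + 8; normalization may still increment it by 1 when the mantissa product is 2 or greater.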
The biased exponents must first be made unbiased. To get the original exponent from the biased exponent, we subtract 127. We then add the two exponents. To get a biased exponent at the output, we add 127 to the result. This can be expressed mathematically as

$$\text{Output} = (E_1 - 127) + (E_2 - 127) + 127 = E_1 + E_2 - 127$$

Exponent addition is implemented with an adder. Since it involves two or more n-bit numbers, we can use a Ripple Carry Adder or a Carry Look Ahead Adder. We perform a comparative study of the delay resulting from the choice of adder. In all summations demanded by the algorithm, and in calculating the final biased exponent, we use the same adder (Ripple Carry or Carry Look Ahead).

1. RIPPLE CARRY ADDER

A ripple carry adder is a logic circuit that ripples the carry bit through its stages. Multiple full adders are cascaded to add two N-bit numbers. Each stage, apart from the first, receives its Carry In from the Carry Out of the previous stage. The Sum and Carry Out of a stage are valid only once the Carry In of that stage has arrived. This contributes a propagation delay: there is a lapse between the application of the inputs and the occurrence of the output [1].

While this delay is considerable, the simplicity of the ripple carry adder is its advantage. Its layout is very easy to understand, and its gate delay can be calculated by inspecting the circuit. If each stage contributes a delay of X units of time, an n-bit adder has a delay of $n \cdot X$. While the full adders work in parallel, the carry must ripple from the LSB to the MSB: it takes X units for the carry out of the rightmost column to reach the adder in the column to its immediate left.

2. CARRY LOOK AHEAD ADDER

A ripple carry adder is slowed down by the propagation of the carry through each stage.
The sum and carry outputs of a stage cannot be produced until the input carry arrives. This delay is known as the carry propagation delay. Other arithmetic operations, such as multiplication or division, contain an adder segment, so this speed limitation slows down complex arithmetic operations considerably. A carry look ahead (CLA) adder solves this issue by calculating the carries beforehand [7].

There are two conditions under which a stage produces a carry: (i) both input bits are 1; (ii) one of the two input bits is 1 and the carry-in (the carry out of the previous stage) is 1. A CLA adder first calculates whether a particular digit will propagate a carry if one comes in from the previous stage. This is then evaluated for a group of digits, i.e. whether the group as a whole will propagate the carry or not. For the carry bit $C_1$,

$$C_1 = G_0 + P_0 C_0, \qquad G_0 = a_0 \cdot b_0, \qquad P_0 = a_0 \oplus b_0$$

Substituting this into the equation of the next stage gives

$$C_2 = G_1 + P_1 (G_0 + P_0 C_0)$$

Generalizing, with $G_{-1} = C_0$ denoting the carry into the adder, we get

$$C_i = G_{-1} P_0 P_1 \cdots P_{i-1} + G_0 P_1 P_2 \cdots P_{i-1} + G_1 P_2 P_3 \cdots P_{i-1} + \cdots + G_{i-2} P_{i-1} + G_{i-1} = \sum_{j=-1}^{i-1} G_j \prod_{k=j+1}^{i-1} P_k$$ [1]

A CLA adder is faster because it calculates multiple carries in parallel. Even for large operands, this form of addition stays manageable: further stages of super-groups are added when needed, and the number of gates grows at a feasible rate as the number of digits increases.

Fig. 5 : Ripple Carry adder

For the addition of two 8-bit numbers, the Ripple Carry Adder gives a delay of 9.86 ns.
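The two carry schemes described above can be contrasted in a small bit-level Python sketch. It models only the logic, not the timing, and it is not the authors' VHDL; the function names are hypothetical. In `ripple_carry_add` each stage waits for the carry of the previous one, while `cla_add` evaluates every carry directly from the generate and propagate signals using the generalized equation, with the carry-in treated as the term j = -1. In hardware each such carry is a flat two-level expression; the nested loops here are only a software rendering of it.

```python
def ripple_carry_add(a, b, n=8, c0=0):
    """Add two n-bit integers with a chain of full adders; returns (sum, carry out)."""
    carry = c0
    total = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        # Carry out of stage i becomes carry in of stage i + 1.
        carry = (ai & bi) | ((ai ^ bi) & carry)
    return total, carry


def cla_add(a, b, n=8, c0=0):
    """Add two n-bit integers using generate/propagate carries; returns (sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # G_i = a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # P_i = a_i XOR b_i
    carries = [c0]
    for i in range(1, n + 1):
        ci = 0
        # C_i = OR over j = -1 .. i-1 of G_j AND P_{j+1} ... P_{i-1}, with G_{-1} = c0.
        for j in range(-1, i):
            term = c0 if j == -1 else g[j]
            for k in range(j + 1, i):
                term &= p[k]
            ci |= term
        carries.append(ci)
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i  # S_i = P_i XOR C_i
    return total, carries[n]


print(ripple_carry_add(200, 100))  # prints: (44, 1)
print(cla_add(200, 100))           # prints: (44, 1)
```

Both functions return the same sum and carry out for every pair of 8-bit operands; the difference between the two adders lies in how the carries are produced, which is what determines the delay in hardware.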
Fig. 6 : Carry Look Ahead adder

For the addition of two 8-bit numbers, the Carry Look Ahead Adder gives a delay of 9.69 ns. As observed, it is faster than the Ripple Carry Adder; the difference in delay is 0.17 ns. Over the successive additions performed in the multiplier, this difference accumulates into a substantial reduction in delay.

C. SIGN CALCULATION

To get the output sign bit, an EXOR operation is performed between the input sign bits. This can be expressed as

S = s1 XOR s2

D. NORMALIZATION

The result obtained is normalized to give a 23-bit mantissa and a biased exponent. To do so, the result is first checked for a leading 1. If the leading 1 is found at the 46th bit, the exponent result is first incremented by 1, and the mantissa bits M[45:23] are taken as the normalized set of bits. Otherwise, the exponent result is not incremented, and M[44:22] is taken as the normalized set of bits [8].

IV. EXPERIMENTAL RESULTS

The proposed floating point multiplier has been coded in VHDL, then synthesized and simulated using Altera Quartus, with Altera Cyclone IV selected as the device.

Fig. 7 : Vedic multiplier using Ripple Carry adders

For the above output, the multiplier has been designed using a Ripple Carry Adder (RCA). All addition operations included in the Vedic algorithm, along with the addition of exponents, have been performed using only an RCA. We found that the overall delay in this implementation is 34.884 ns.

Fig. 8 : Vedic multiplier using CLA adders

We also tested the same floating point multiplier using a CLA adder. Here we found a delay of 27.3 ns. The lower delay can be attributed to the faster carry computation of the Carry Look Ahead Adder. The difference in delay between the two implementations is 7.584 ns.

The outputs shown are for a multiplication operation between the following two numbers, in both the multipliers:

input1 = 123.456
input2 = 456.789

which, in IEEE 754 single precision floating point format, correspond to

input1 = 01000010111101101110100101111001
input2 = 01000011111001000110010011111110

It was also observed that the greater the number of 1s in the values being multiplied, the greater the delay for their multiplication and hence the greater the difference between the two delays.

V. CONCLUSION

This paper presents an efficient implementation of a floating point multiplier using a Vedic algorithm. We also compared its implementation using two different adders on the basis of the delay in computing the output. We observed that the floating point multiplier based on the Urdhav Tiryakbhyam Sutra gave a faster output when implemented using a Carry Look Ahead Adder than when implemented using a Ripple Carry Adder.

REFERENCES

[1]. Al-Ashrafy, Mohamed, Ashraf Salem, and Wagdy Anis. "An efficient implementation of floating point multiplier." Electronics, Communications and Photonics Conference (SIECPC), 2011 Saudi International. IEEE, 2011.
[2]. "IEEE Standard for Floating-Point Arithmetic." IEEE Std 754-2008, pp. 1-70, Aug. 29, 2008.
[3]. Paldurai, K., and K. Hariharan. "FPGA implementation of delay optimized single precision floating point multiplier." Advanced Computing and Communication Systems, 2015 International Conference on. IEEE, 2015.
[4]. Shirazi, N., A. Walters, and P. Athanas. "Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines." Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM 95), pp. 155-162, 1995.
[5]. Arish, S., and R. K. Sharma. "Run-time reconfigurable multi-precision floating point multiplier design for high speed, low-power applications." Signal Processing and Integrated Networks (SPIN), 2015 2nd International Conference on. IEEE, 2015.
[7]. Kumar, Padala Siva, et al. "Efficient Floating Point Multiplier Implementation via Carry Save Multiplier." Middle-East Journal of Scientific Research 22.11 (2014): 1652-1657.
[8]. Ganesh, B. Sreenivasa, J. E. N. Abhilash, and G. Rajesh Kumar. "Design and Implementation of Floating Point Multiplier for Better Timing Performance." International Journal of Advanced Research in Computer Engineering & Technology 1.7 (2012).
[9]. Kumar, G. Ganesh, and V. Charishma. "Design of high speed vedic multiplier using vedic mathematics techniques." International Journal of Scientific and Research Publications 2.3 (2012): 1.
[10]. Stallings, William. Computer organization and architecture: designing for performance. Pearson Education India, 2000.