International Journal Of Global Innovations -Vol.1, Issue.II Paper Id: SP-V1-I2-221 ISSN Online:


AN EFFICIENT IMPLEMENTATION OF FLOATING POINT ALGORITHMS

#1 SEVAKULA PRASANNA - M.Tech Student, #2 PEDDI ANUDEEP - Assistant Professor, Dept of ECE, MLR INSTITUTE OF TECHNOLOGY, DUNDIGAL, HYD, T.S., INDIA.

Abstract: In computing, floating point describes a method of representing an approximation of a real number in a way that can support a wide range of values. Low power consumption and small area are among the most important criteria in the fabrication of DSP systems and other high performance systems. Optimizing the speed and area of the multiplier is a major design issue; however, area and speed are usually conflicting constraints, so improving speed mostly results in larger area. This paper presents an IEEE 754 single precision floating point multiplier using the Wallace and Dadda algorithms, targeted for Xilinx FPGAs. The multiplication speed of the Dadda and Wallace multipliers is improved by replacing the final adder with a carry look-ahead adder. The design consists of an exponent calculator, a mantissa multiplier, a sign calculator, and a normalization unit.

Keywords: Dadda algorithm, Wallace algorithm, floating point, multiplication, Spartan-6, ISE 13.1, VHDL.

I. INTRODUCTION

Single precision binary floating point is used due to its wider range over fixed point (of the same bit-width), even if at the cost of precision. Our discussion of floating point focuses almost exclusively on the IEEE floating point standard (IEEE 754) because of its rapidly increasing acceptance. Multiplying floating point numbers is a critical requirement for DSP applications. Among the ways of representing real numbers in binary, the IEEE 754 standard [1] defines two floating point formats: the binary interchange format and the decimal interchange format. This paper focuses only on the single precision normalized binary interchange format, shown in Fig. 1. It consists of a one-bit sign (S), an eight-bit exponent (E), and a twenty-three-bit fraction (M, or mantissa):

Z = (-1)^S * 2^(E - Bias) * (1.M)   (1)

where M = n22*2^-1 + n21*2^-2 + n20*2^-3 + ... + n1*2^-22 + n0*2^-23 and Bias = 127.

Figure 1. IEEE single precision floating point format.

Floating point multiplication of two numbers is done in four steps (a short code sketch of these steps is given below):
Step 1. The exponents of the two numbers are added directly, and the extra bias is subtracted from the result.
Step 2. The significands of the two numbers are multiplied using the Dadda and Wallace algorithms.
Step 3. The sign is calculated by XORing the signs of the two numbers.
Step 4. Finally, the result is normalized such that there is a 1 in the MSB of the result (the leading one).
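The following minimal Python sketch (our illustration only; the designs discussed in this paper are HDL implementations) walks through the four steps for IEEE 754 single precision operands. Rounding and underflow/overflow handling are deliberately omitted, and all names are our own:

```python
import struct

def fp32_fields(x):
    """Decompose a float into IEEE 754 single precision S, E, M fields."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fp32_multiply(a, b, bias=127):
    sa, ea, ma = fp32_fields(a)
    sb, eb, mb = fp32_fields(b)
    e = ea + eb - bias                     # Step 1: add exponents, subtract bias
    sig = (ma | 1 << 23) * (mb | 1 << 23)  # Step 2: multiply 1.M significands (48-bit product)
    s = sa ^ sb                            # Step 3: sign by XOR
    if sig & (1 << 47):                    # Step 4: normalize (leading one in the MSB)
        sig >>= 1
        e += 1
    m = (sig >> 23) & 0x7FFFFF             # truncate back to 23 fraction bits (no rounding)
    bits = (s << 31) | (e << 23) | m       # assumes 1 <= e <= 254 (no overflow/underflow check)
    return struct.unpack('>f', struct.pack('>I', bits))[0]

print(fp32_multiply(2.5, -3.25))  # -8.125
```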
Mohamed Al-Ashrafy, Ashraf Salem and Wagdy Anis presented an implementation which handles the overflow and underflow cases. Rounding is not implemented, to give more precision when the multiplier is used in a multiply-and-accumulate (MAC) unit. Their floating point multiplier supports the IEEE 754-2008 binary interchange format and simply presents the significand multiplication result as is (48 bits). In 2013, B. Jeevan et al. presented a high speed binary floating point multiplier based on the Dadda algorithm, in which the speed of the mantissa multiplication is improved by using a Dadda multiplier in place of a carry save multiplier. The design achieves high speed, with a maximum frequency of 526 MHz, compared to existing floating point multipliers. The floating point multiplier is developed to handle the underflow and overflow cases, and the significand multiplication time is reduced by using the Dadda algorithm.

DADDA MULTIPLIER

Dadda proposed a sequence of matrix heights, predetermined to give the minimum number of reduction stages. To reduce the N by N partial product matrix, the Dadda multiplier develops a sequence of matrix heights that are found by working back from the final two-row matrix. In order to realize the minimum number of reduction stages, the height of each intermediate matrix is limited to the largest integer that is no more than 1.5 times the height of its successor. Fig. 2 shows the reduction process for a Dadda multiplier, which is developed using the following recursive algorithm:
1. Let d1 = 2 and d(j+1) = floor(1.5 * dj), where dj is the matrix height for the j-th stage from the end. Find the smallest j such that at least one column of the original partial product matrix has more than dj bits.
2. In the j-th stage from the end, employ (3,2) and (2,2) counters to obtain a reduced matrix with no more than dj bits in any column.
3. Let j = j - 1 and repeat step 2 until a matrix with only two rows is generated.
This method of reduction is called a column compression technique because it attempts to compress each column. Another advantage of the Dadda multiplier is that it uses the minimum number of (3,2) counters. The intermediate matrix heights are therefore bounded from below by the sequence 2, 3, 4, 6, 9, 13, ...; a small sketch that generates this sequence is given after Figure 2.

Figure 2. Dot diagram for an 8 by 8 Dadda multiplier.
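As a brief illustration (plain Python, our own sketch rather than anything from the paper), the height sequence of step 1 can be generated directly:

```python
def dadda_heights(max_height):
    """Matrix heights 2, 3, 4, 6, 9, ... strictly below max_height."""
    d = [2]
    while d[-1] * 3 // 2 < max_height:   # d(j+1) = floor(1.5 * dj)
        d.append(d[-1] * 3 // 2)
    return d

# For a 53 x 53 bit significand multiplier the tallest column has 53 bits:
print(dadda_heights(53))  # [2, 3, 4, 6, 9, 13, 19, 28, 42]
```

The reduction stages then use these heights in reverse order (42, 28, ..., 3, 2) as the column-height targets.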

II. RELATED WORK

Floating point implementation on FPGAs has been the interest of many researchers. Oklobdzija implemented 32-bit and 64-bit leading zero detector (LZD) circuits using CMOS and ECL technology. Pavle Belanovic and Miriam Leeser implemented a reconfigurable floating point arithmetic unit using VHDL, mapped onto a Xilinx XCV1000 FPGA. K. Scott Hemmert and Keith D. Underwood implemented an open source library of highly optimized floating point units for Xilinx FPGAs. The units are fully IEEE compliant; the double precision add and multiply achieved an operating frequency of 230 MHz using a 10-stage adder pipeline and a 12-stage multiplier pipeline. In another work, a floating point adder was implemented using the Leading One Predictor (LOP) algorithm instead of the Leading One Detector (LOD) algorithm; the main function of the LOP is to predict the leading number of zeros in the addition result, working in parallel with the 2's complement adder. Dhiraj Sangwan and Mahesh K. Yadav implemented adder/subtractor and multiplication units for floating point arithmetic using VHDL. The floating point multiplication was implemented using a sequential architecture based on Booth's radix-4 recoding algorithm; for floating point addition, sequential addition would have been complex, so a combinational architecture was implemented. In another design, a double precision floating point adder/subtractor was implemented using a dynamic shifter, LOD and priority encoder, achieving an operating frequency of 353 MHz with a latency of 12 clock cycles. An IEEE-754 single precision pipelined floating point multiplier has been implemented on multiple FPGAs (four Actel A1280 devices). N. Shirazi, A. Walters and P. Athanas implemented a custom 16/18-bit three-stage pipelined floating point multiplier that does not support rounding modes. L. Louca, T. A. Cook and W. H. Johnson implemented a single precision floating point multiplier using a digit-serial multiplier on an Altera FLEX 8000; the design achieved 2.3 MFLOPS and does not support rounding modes. A parameterizable floating point multiplier was implemented using a five-stage pipeline with Handel-C software on a Xilinx FPGA, achieving 28 MFLOPS. Another floating point unit was implemented using the primitives of a Xilinx Spartan-6 FPGA, achieving an operating frequency of 100 MHz with a latency of 4 clock cycles. Mohamed Al-Ashrafy, Ashraf Salem and Wagdy Anis implemented an efficient IEEE-754 single precision floating point multiplier targeted for a Xilinx Spartan-6 FPGA. The multiplier handles the overflow and underflow cases, but rounding is not implemented; the design achieves 301 MFLOPS with a latency of three clock cycles and was verified against the Xilinx floating point multiplier core.
The double precision floating point adder/subtractor and multiplier presented here are based on the IEEE-754 binary floating point standard. We have designed a high speed double precision floating point adder/subtractor and multiplier in Verilog, ported onto a Xilinx Spartan-6 FPGA. The adder/subtractor and multiplier achieve 363.76 and 414.714 MFLOPS and occupy 660 and 648 slices, respectively. The design handles the overflow and underflow cases and the rounding mode.

III. FLOATING POINT ALGORITHM

Multiplying two numbers in floating point format is done by adding the exponents of the two numbers and subtracting the bias from the result, multiplying the significands of the two numbers, and calculating the sign by XORing the signs of the two numbers. In order to represent the multiplication result as a normalized number there should be a 1 in the MSB of the result (the leading one). The following steps are necessary to multiply two floating point numbers (a short worked example follows the list):
Step 1. Multiplying the significands, i.e., (1.M1 * 1.M2)
Step 2. Placing the decimal point in the result
Step 3. Adding the exponents, i.e., (E1 + E2 - Bias)
Step 4. Obtaining the sign, i.e., S1 xor S2
Step 5. Normalizing the result, i.e., obtaining 1 at the MSB of the result's significand
Step 6. Rounding the result to fit in the available bits
Step 7. Checking for underflow/overflow occurrence
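As a brief worked illustration (our own numeric example, using the single precision bias of 127 for compactness): let A = 2.5 = 1.25 * 2^1 (S1 = 0, E1 = 128) and B = 3.25 = 1.625 * 2^1 (S2 = 0, E2 = 128). The significand product is 1.25 * 1.625 = 2.03125, the exponent is 128 + 128 - 127 = 129, and the sign is 0 xor 0 = 0. Normalizing the significand to 1.015625 increments the exponent to 130, so the result is 1.015625 * 2^(130 - 127) = 8.125, which is indeed 2.5 * 3.25.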

3.2. Implementation of Hardware Floating Point Multiplication

The black box view of the floating point multiplier is shown in Figure 2.

Figure 2. Black box view of the floating point multiplier.

3.2.1. Sign bit calculation

Multiplying two numbers results in a negative number if exactly one of the multiplied numbers is negative. With the aid of a truth table we find that this can be obtained by XORing the signs of the two inputs.

3.2.2. Exponent addition

This unsigned adder is responsible for adding the exponent of the first input to the exponent of the second input and subtracting the bias (1023) from the addition result (i.e., A_exponent + B_exponent - Bias). The result of this stage is called the intermediate exponent. The add operation is done on 11 bits, and there is no need for a fast adder because most of the calculation time is spent in the significand multiplication process (multiplying 53 bits by 53 bits); a moderate exponent adder and a fast significand multiplier are sufficient. An 11-bit ripple carry adder is used to add the two input exponents. As shown in Figure 3, a ripple carry adder is a chain of cascaded full adders and one half adder; each full adder has three inputs (A, B, Ci) and two outputs (S, C). The carry out (C) of each adder is fed to the next full adder (i.e., each carry bit "ripples" to the next full adder). The addition process produces an 11-bit sum (S10 to S0) and a carry bit (C11). These bits are concatenated to form a 12-bit addition result (S11 to S0) from which the bias is subtracted.

Figure 3. Ripple carry adder.

3.2.3. Exponent subtraction

The bias is subtracted using an array of ripple borrow subtractors. A normal subtractor has three inputs (minuend (S), subtrahend (T), borrow in (Bi)) and two outputs (difference (R), borrow out (B)). The subtractor logic can be optimized if one of its inputs is a constant value, which is our case, where the bias is the constant 1023 (001111111111 in 12-bit binary). Table I shows the truth table for a 1-bit subtractor with the input T equal to 1, which we call a one subtractor (OS). Table II shows the truth table for a 1-bit subtractor with the input T equal to 0, which we call a zero subtractor (ZS).

Table I. One subtractor (T = 1).
S  T  Bi | Difference (R) | Borrow out (B)
0  1  0  |       1        |       1
1  1  0  |       0        |       0
0  1  1  |       0        |       1
1  1  1  |       1        |       1

Table II. Zero subtractor (T = 0).
S  T  Bi | Difference (R) | Borrow out (B)
0  0  0  |       0        |       0
1  0  0  |       1        |       0
0  0  1  |       1        |       1
1  0  1  |       0        |       0

Figure 4 shows the bias subtractor, which is a chain of 10 one subtractors (OS) followed by 2 zero subtractors (ZS); the borrow output of each subtractor is fed to the next subtractor. If an underflow occurs then Eresult < 0 and the number is out of the IEEE 754 normalized number range; in this case the output is signaled to 0 and an underflow flag is asserted. A small software model of this subtractor chain is given below.

Figure 4. Ripple borrow subtractor.
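The following Python model (our own sketch, not the paper's HDL) makes the OS/ZS cell behavior concrete; it ripples a borrow through 10 OS cells and 2 ZS cells to subtract the constant 1023 from a 12-bit intermediate value:

```python
def one_subtractor(s, b_in):
    """1-bit subtractor with constant subtrahend T = 1 (OS cell, Table I)."""
    r = s ^ 1 ^ b_in            # difference bit
    b_out = (s ^ 1) | b_in      # borrow out
    return r, b_out

def zero_subtractor(s, b_in):
    """1-bit subtractor with constant subtrahend T = 0 (ZS cell, Table II)."""
    r = s ^ b_in
    b_out = (s ^ 1) & b_in
    return r, b_out

def subtract_bias(e12):
    """Subtract 1023 from a 12-bit value: 10 OS cells then 2 ZS cells."""
    borrow, result = 0, 0
    for i in range(12):
        cell = one_subtractor if i < 10 else zero_subtractor
        r, borrow = cell((e12 >> i) & 1, borrow)
        result |= r << i
    return result, borrow       # final borrow = 1 signals Eresult < 0 (underflow)

print(subtract_bias(2051))  # (1028, 0): intermediate exponent 2051 - 1023
```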

3.3. Dadda multiplier

The multiplication process begins with the generation of all partial products in parallel using an array of AND gates. The next major steps in the design process are the partitioning of the partial products and their reduction; each of these steps is elaborated in the following subsections.

3.3.1. Partitioning the partial products

We consider two n-bit operands a(n-1)a(n-2)...a2a1a0 and b(n-1)b(n-2)...b2b1b0 for an n by n Baugh-Wooley multiplier. The partial products of the two n-bit numbers are aibj, where i and j run from 0, 1, ..., n-1; they form a matrix of n rows and 2n-1 columns, as shown in Fig. 1(a). To each partial product we assign an index as shown in Fig. 1(a); e.g., a0b0 is given the index 0, a1b0 the index 1, and so on. For convenience we rearrange the partial products as shown in Fig. 1(b). The longest column, in the middle of the partial products, contributes the maximum delay in the partial product summation tree (PPST). Therefore, in this work we split the PPST into two parts as shown in Fig. 1(c), in which part0 and part1 each consist of n columns. We then proceed to sum each column of the two parts in parallel. The summation procedure adopted in this work is described in the next subsection.

3.3.2. The Dadda-based reduction

Next, the partial products of each part are reduced to two rows using (3,2) and (2,2) counters based on the regular Dadda reduction algorithm, as shown in Fig. 2 and Fig. 3. The groupings of 3 bits and 2 bits indicate (3,2) and (2,2) counters respectively, and the different colors distinguish the columns, where s and c denote partial sum and partial carry respectively. For example, the bit positions 6 and 13 in part0 are added using a (2,2) counter to generate sum s0 and carry c0. The carry c0 goes to the next column, where it is added to the sum s1 of a (3,2) counter adding bits 7, 14 and 21; the carry c1 of that (3,2) counter is in turn added into the following column. The final two rows of each part are summed using a carry look-ahead adder (CLA) to form the partial final products, one bit per column, as indicated at the bottom of Fig. 3 and Fig. 4. The two parallel structures based on the Dadda approach are shown in Fig. 4, where HA, FA, p0, p1 and p denote a half adder ((2,2) counter), a full adder ((3,2) counter), a partial final product from part0, a partial final product from part1, and a final product bit, respectively. The numerals on the HAs and FAs indicate the positions of the partial products. The outputs of part0 and part1 are computed independently in parallel, and those values are added using a high speed hybrid final adder to obtain the final product. However, before carrying out the final addition with the proposed hybrid adder, we first carry out the final addition with the CLA for both the unpartitioned and the partitioned Dadda multipliers. This enables us to evaluate and analyze the effect of partitioning the PPST into two parts. It can be seen that for the 8-bit multiplier there is no improvement in speed, area or power, but as the word size increases, the improvement in the speed, area and power of the partitioned multipliers grows; there is a maximum improvement of 10.5% in delay for the 64-bit multiplier, with only a slight increase in area and power of 1% and 1.8% respectively. Having demonstrated the reduction in the delay of the Dadda multipliers due to the partitioning of the partial products, we now proceed to further enhance the speed of the proposed multiplier. A sketch of the column compression procedure is given below.
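To make the column compression concrete, here is a compact Python sketch (ours, illustrative only) of a full Dadda reduction to two rows using (3,2) and (2,2) counters; a (2,2) counter is used only when a column is exactly one bit over its target height:

```python
def counter_3_2(a, b, c):
    """(3,2) counter (full adder): three bits -> (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def counter_2_2(a, b):
    """(2,2) counter (half adder): two bits -> (sum, carry)."""
    return a ^ b, a & b

def dadda_reduce(columns):
    """Reduce a partial product matrix to two rows by column compression.
    columns[i] is the list of bits of weight 2**i (LSB first)."""
    heights = [2]
    while heights[-1] * 3 // 2 < max(len(c) for c in columns):
        heights.append(heights[-1] * 3 // 2)
    for target in reversed(heights):            # e.g. ..., 6, 4, 3, 2
        carries = []                            # carries rippling into column i
        for i in range(len(columns)):
            bits = columns[i] + carries
            carries = []
            while len(bits) > target:
                if len(bits) == target + 1:     # one over: a (2,2) counter suffices
                    s, c = counter_2_2(bits.pop(), bits.pop())
                else:
                    s, c = counter_3_2(bits.pop(), bits.pop(), bits.pop())
                bits.append(s)                  # sum stays in this column
                carries.append(c)               # carry moves to the next column
            columns[i] = bits
        if carries:                             # carry out past the last column
            columns.append(carries)
    return columns

# 4x4 example with all partial products equal to 1 (i.e. 15 * 15):
cols = [[1] * h for h in (1, 2, 3, 4, 3, 2, 1)]
two_rows = dadda_reduce(cols)
print(sum(sum(bits) << i for i, bits in enumerate(two_rows)))  # 225
```

The two remaining rows would then be summed by the final (CLA or hybrid) adder, exactly as described above.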
IV. UNDERFLOW/OVERFLOW DETECTION

Overflow/underflow means that the result's exponent is too large/small to be represented in the exponent field. The exponent of the result must be 11 bits in size and must lie between 1 and 2046, otherwise the value is not a normalized one. An overflow may occur while adding the two exponents, but it can be compensated when the bias is subtracted, resulting in a normal output value (normal operation). An underflow may occur while subtracting the bias to form the intermediate exponent. If the intermediate exponent < 0, it is an underflow that can never be compensated. If the intermediate exponent = 0, it is an underflow that may be compensated during normalization by adding 1 to it. Assuming that E1 and E2 are the exponents of the two numbers A and B respectively, the result's exponent is calculated by (2):

Eresult = E1 + E2 - 1023   (2)

E1 and E2 can take values from 1 to 2046, so Eresult can take values from -1021 (2 - 1023) to 3069 (4092 - 1023); but for normalized numbers Eresult can only take values from 1 to 2046. Table III shows the effect on the result's exponent and the overflow/underflow detection.

Table III. Effect on the result's exponent and overflow/underflow detection.
Eresult range          | Category          | During normalization
-1021 <= Eresult < 0   | Underflow         | Can't be compensated
Eresult = 0            | Zero              | May turn into a normalized number
1 <= Eresult <= 2046   | Normalized number | May result in overflow
2047 <= Eresult        | Overflow          | Can't be compensated

When an overflow occurs, an overflow flag signal goes high and the result turns to ±Infinity (the sign determined according to the signs of the floating point multiplier inputs). When an underflow occurs, an underflow flag signal goes high and the result turns to ±Zero (the sign again determined according to the inputs). Denormalized numbers are signaled to zero with the appropriate sign calculated from the inputs, and an underflow flag is raised.
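The Table III classification can be summarized in a few lines of Python (our own sketch of the detection logic, not the paper's HDL):

```python
def classify_exponent(e1, e2, bias=1023):
    """Classify the intermediate exponent of a double precision multiply
    (Table III). e1 and e2 are the 11-bit biased exponents, each 1..2046."""
    e_result = e1 + e2 - bias
    if e_result < 0:
        return "underflow"    # can't be compensated
    if e_result == 0:
        return "zero"         # may become normalized (+1 during normalization)
    if e_result <= 2046:
        return "normalized"   # may still overflow during normalization
    return "overflow"         # 2047 <= e_result, can't be compensated

print(classify_exponent(1, 1))        # underflow
print(classify_exponent(1024, 1024))  # normalized
```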

V. SIMULATION RESULTS

The whole multiplier (top unit) was tested against the floating point multiplier core generated by the Xilinx core generator and against the efficient floating point multiplier implementation of [1]. The Xilinx core and the multiplier of [1] were customized to have two flags indicating overflow and underflow and a maximum latency of three cycles. The Xilinx core implements the round-to-nearest rounding mode, while the multiplier of [1] does not support rounding modes. A test bench is used to generate the stimulus and apply it to the high speed double precision floating point multiplier. The double precision floating point multiplier code was checked using the Xilinx design tools targeting a Spartan-6 device. Figure 5 shows the simulation results of the high speed double precision floating point multiplier.

Figure 5. Simulation results of the high speed double precision floating point multiplier.

VI. CONCLUSION

The double precision floating point adder/subtractor and multiplier support the IEEE-754 binary interchange format, targeted on a Xilinx Spartan-6 FPGA. The designs achieve 363.76 and 414.714 MFLOPS with areas of 660 and 648 slices, respectively. The adder/subtractor design operates at a frequency 3% and 28% higher than the reference design and the Xilinx core, respectively. Compared to the single precision floating point multiplier and the Xilinx core, the multiplier design supports double precision, provides higher speed, and gives more accuracy. These designs handle overflow, underflow, the rounding mode and various exception conditions.

REFERENCES

[1] M. Al-Ashrafy, A. Salem and W. Anis, "An Efficient Implementation of Floating Point Multiplier," Saudi International Electronics, Communications and Photonics Conference (SIECPC), pp. 1-5, 2011.
[2] F. de Dinechin and B. Pasca, "Large multipliers with fewer DSP blocks," Field Programmable Logic and Applications, IEEE, Aug. 2009.
[3] IEEE 754-2008, IEEE Standard for Floating-Point Arithmetic, 2008.
[4] D. Patterson and J. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 2005.
[5] B. Lee and N. Burgess, "Parameterisable Floating-point Operations on FPGA," Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems, and Computers, 2002.
[6] A. Jaenicke and W. Luk, "Parameterized Floating-Point Arithmetic on FPGAs," Proc. of IEEE ICASSP, vol. 2, pp. 897-900, 2001.
[7] L. Louca, T. A. Cook and W. H. Johnson, "Implementation of IEEE Single Precision Floating Point Addition and Multiplication on FPGAs," Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM '96), pp. 107-116, 1996.
[8] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, Third Edition, 1996.
[9] N. Shirazi, A. Walters and P. Athanas, "Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines," Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM '95), pp. 155-162, 1995.
[10] B. Fagin and C. Renard, "Field Programmable Gate Arrays and Floating Point Arithmetic," IEEE Transactions on VLSI, vol. 2, no. 3, pp. 365-367, 1994.