
ISSN 2322-0929 Vol.02, Issue.11, December-2014, Pages:1208-1212
www.ijvdcs.org

Implementation of Area Optimized Floating Point Unit using Verilog

G.RAJA SEKHAR 1, M.SRIHARI 2
1 PG Scholar, Dept of ECE, Kakinada Institute of Engineering and Technology, Kakinada, AP, India, Email: grajasekhar98@gmail.com.
2 Assistant Professor, Dept of ECE, Kakinada Institute of Engineering and Technology, Kakinada, AP, India, Email: msrihari,es@gmail.com.

Abstract: To represent very large or very small values, a large range is required, and integer representation is no longer appropriate. Such values can be represented using the IEEE-754 standard based floating-point representation. This paper presents a high-speed implementation of a floating-point arithmetic unit that can perform addition, subtraction and multiplication on 64-bit operands following the IEEE 754 standard. All functions incorporate several changes that improve overall latency and, if pipelined, yield higher throughput. The algorithms are modeled in Verilog HDL, and the RTL code for the adder, subtractor and multiplier is synthesized using the Xilinx synthesizer. The functionality is verified using the Xilinx ISE 8.2i simulator. Finally, an optimized FPU is designed by considering both architectural and system-level issues.

Keywords: Floating Point, Verilog HDL, Xilinx ISE Simulator, Floating Point Unit.

I. INTRODUCTION
There are several ways to represent real numbers on computers. Floating-point representation, in particular the standard IEEE format, is by far the most common way of representing an approximation to real numbers in computers, because it is handled efficiently in most large computer processors. Binary fixed point is usually used in special-purpose applications on embedded processors that can only do integer arithmetic, while decimal fixed point is common in commercial applications. Fixed point places a radix point somewhere in the middle of the digits, and is equivalent to using integers that represent portions of some unit.
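The IEEE-754 bit layout the unit relies on can be inspected directly. The paper's design itself is written in Verilog; the following is only an illustrative Python sketch that unpacks a 64-bit double into its sign, exponent and mantissa fields:

```python
import struct

def decode_double(x):
    """Split a float into its IEEE-754 double-precision fields:
    1 sign bit, 11 exponent bits, 52 mantissa (fraction) bits."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # raw 64-bit pattern
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # biased exponent (bias = 1023)
    mantissa = bits & ((1 << 52) - 1)     # fraction without the hidden 1
    return sign, exponent, mantissa

# For a normalized number, value = (-1)^sign * 1.mantissa * 2^(exponent - 1023).
sign, exp, frac = decode_double(-6.5)    # -6.5 = -1.625 * 2^2
print(sign, exp - 1023, hex(frac))
```

The same three fields, with an 8-bit exponent (bias 127) and 23-bit fraction, make up the single-precision format.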
Fixed point has a fixed window of representation, which limits it from representing very large or very small numbers. Also, fixed point is prone to a loss of precision when two large numbers are divided.

Floating point solves a number of these representation problems. It employs a sort of sliding window of precision appropriate to the scale of the number, which allows it to represent numbers from 1,000,000,000,000 to 0.00000000000001 with ease. The advantage of floating-point representation over fixed-point (and integer) representation is that it can support a much wider range of values. Floating-point representation, the most common solution, basically represents a real value in scientific notation. Scientific notation represents numbers as a base number and an exponent; for example, 123.456 could be represented as 1.23456 x 10^2. Floating-point numbers are typically packed into a computer datum as the sign bit, the exponent field, and the significand (mantissa), from left to right.

In computing, floating point describes a system for representing numbers that would be too large or too small to be represented as integers. Numbers are in general represented approximately to a fixed number of significant digits and scaled using an exponent; the base for the scaling is normally 2, 10, or 16. The typical number that can be represented exactly is of the form

    significant digits x base^exponent    (1)

The term floating point refers to the fact that the radix point (decimal point or, more commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated separately in the internal representation, and floating-point representation can thus be thought of as a computer realization of scientific notation.

II. FLOATING POINT ALGORITHMS
A. Addition and Subtraction
The conventional floating-point addition algorithm consists of five stages: exponent difference, pre-alignment, addition, normalization and rounding. Given floating-point numbers X1 = (s1, e1, f1) and X2 = (s2, e2, f2), the stages for computing X1 + X2 are as follows:
1. Find the exponent difference d = e1 - e2. If e1 < e2, swap the positions of the mantissas. Set the larger exponent as the tentative exponent of the result.
2. Pre-align the mantissas by shifting the smaller mantissa right by d bits.
3. Add or subtract the mantissas to get a tentative result for the mantissa.
4. Normalization. If there are leading zeros in the

tentative result, shift the result left and decrement the exponent by the number of leading zeros. If the tentative result overflows, shift right and increment the exponent by 1 bit.
5. Round the mantissa result. If it overflows due to rounding, shift right and increment the exponent by 1 bit.

All arithmetic operations have been carried out in four separate modules: one for addition and subtraction, and one each for multiplication, division and square root, as shown in the figure. In this unit one can select the operation to be performed on the 32-bit operands by a 3-bit op-code, and the same op-code selects the output from that particular module and connects it to the final output of the unit. The ready bit goes high when the result is available to be taken from the output. The algorithm used for adding or subtracting floating-point numbers is shown in the figure.

The pre-alignment and normalization stages require large shifters. The pre-alignment stage requires a right shifter that is twice the number of mantissa bits (i.e., 48 bits for single precision, 106 bits for double precision), because the bits shifted out have to be maintained to generate the guard, round and sticky bits needed for rounding. The shifter only needs to shift right by up to 24 places for single precision or 53 places for double precision. The normalization stage requires a left shifter equal to the number of mantissa bits plus 1 (to shift in the guard bit), i.e., 25 bits for single precision and 54 bits for double precision.

B. Multiplication
In floating-point multiplication that complies with the IEEE 754 standard, the two mantissas are multiplied and the two exponents are added. To perform floating-point multiplication, a simple algorithm is realized:
1. Add the exponents and subtract 127 (the bias).
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
If done serially, it would have taken 32 clock cycles (without pre- and post-normalization) instead of the 5 clock cycles actually needed. The disadvantage is that the hardware needed for the parallel 32-bit multiplier is approximately 3 times that of the serial one.

III. DOUBLE PRECISION FLOATING POINT ADDER/SUBTRACTOR
The black-box view and block diagram of the double-precision floating-point adder/subtractor are shown in Figures 2 and 3 respectively. The input operands are separated into their sign, mantissa and exponent components. This module has inputs opa and opb of 64-bit width, while clk, enable and rst are 1 bit wide. One of the operands is applied at opa and the other at opb. The larger operand goes into mantissa_large and exponent_large; similarly, the smaller operand goes into mantissa_small and exponent_small. To determine which operand is larger, only the exponents of the two operands are compared, so in fact, if the exponents are equal, the smaller operand might populate the mantissa_large and exponent_large registers. This is not an issue, because the reason the operands are compared is to find the operand with the larger exponent, so that the mantissa of the operand with the smaller exponent can be right-shifted before performing the addition. If the exponents are equal, the mantissas are added without shifting. The inter-connection of sub-modules of the double-precision floating-point adder/subtractor is shown in Figure 4. Subtraction is similar to addition in that you need to calculate the difference in the exponents between the two operands, and then shift the mantissa of the smaller exponent to the right before subtracting. The flow chart of the double-precision floating-point adder/subtractor is shown in Figure 1.

Figure 1. Block diagram of double precision floating point adder/subtractor.

A. Double Precision Floating Point Multiplier
I implemented a double precision floating point multiplier with exceptions and rounding.
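The multiplication algorithm of Section II-B (add the exponents, subtract the bias, multiply the significands, normalize) is realized by the authors in Verilog; as an illustrative sketch only, the same steps can be written in Python over the integer fields of double-precision operands (bias 1023, 52 fraction bits), with rounding and special cases omitted:

```python
BIAS, FRAC_BITS = 1023, 52  # IEEE-754 double precision

def fp_mul(a, b):
    """Multiply two normalized doubles given as (sign, biased_exp, frac) fields.
    Illustrative sketch: no rounding, overflow or special-case handling."""
    sa, ea, fa = a
    sb, eb, fb = b
    sign = sa ^ sb                        # sign of the result
    exp = ea + eb - BIAS                  # add exponents, subtract the bias
    # Restore the hidden leading 1 before multiplying the significands.
    prod = ((1 << FRAC_BITS) | fa) * ((1 << FRAC_BITS) | fb)
    # The 106-bit product lies in [1, 4); if it is >= 2, normalize
    # with one right shift and bump the exponent.
    if prod >> (2 * FRAC_BITS + 1):
        prod >>= 1
        exp += 1
    frac = (prod >> FRAC_BITS) & ((1 << FRAC_BITS) - 1)  # truncate, no rounding
    return sign, exp, frac

# 1.5 * 2.0 = 3.0, i.e. fields (sign 0, biased exponent 1024, fraction 1<<51)
print(fp_mul((0, 1023, 1 << 51), (0, 1024, 0)))
```

In the hardware design the exponent addition, significand multiplication and sign calculation proceed in parallel rather than sequentially as in this sketch.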
Figure 6 shows the multiplier structure, which includes exponent addition, significand multiplication, and sign calculation. Figure 2 shows that the multiplier, exceptions and rounding are independent and are done in parallel.

Figure 2. Multiplier structure with rounding and exceptions.
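The five-stage addition flow of Section II-A can likewise be sketched in Python. This is a simplified illustration only, not the authors' Verilog design: it assumes positive, normalized operands of the same sign (so the leading-zero left shift never triggers) and truncates in place of the rounding stage:

```python
BIAS, FRAC_BITS = 1023, 52  # IEEE-754 double precision

def fp_add(a, b):
    """Add two positive normalized doubles given as (biased_exp, frac) fields.
    Simplified sketch: same-sign operands, truncation instead of rounding."""
    ea, fa = a
    eb, fb = b
    # Stage 1: exponent difference; keep the larger-exponent operand first.
    if ea < eb:
        (ea, fa), (eb, fb) = (eb, fb), (ea, fa)
    d = ea - eb
    # Restore the hidden leading 1 bits.
    ma = (1 << FRAC_BITS) | fa
    mb = (1 << FRAC_BITS) | fb
    # Stage 2: pre-align by shifting the smaller mantissa right by d bits.
    mb >>= d
    # Stage 3: add the mantissas.
    m = ma + mb
    e = ea
    # Stage 4: normalize; a mantissa overflow needs one right shift.
    if m >> (FRAC_BITS + 1):
        m >>= 1
        e += 1
    # Stage 5 (rounding) is omitted; truncate back to the fraction field.
    return e, m & ((1 << FRAC_BITS) - 1)

# 1.5 + 0.5 = 2.0, i.e. fields (biased exponent 1024, fraction 0)
print(fp_add((1023, 1 << 51), (1022, 0)))
```

The hardware additionally retains the bits shifted out during pre-alignment to form the guard, round and sticky bits, which is why its pre-alignment shifter is 106 bits wide for double precision.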

Figure 3. Flowchart for floating point addition/subtraction.

Figure 4. Inter-connection of sub-modules of double precision floating point adder/subtractor.

IV. RESULTS

Figure 5. RTL Schematic Diagram of Floating-Point Addition.

Figure 6. Testbench waveform of Floating-point Addition.

Figure 7. RTL Schematic diagram of Floating-point Subtraction.

Figure 8. Testbench waveform of Floating-point Subtraction.

Figure 9. RTL Schematic diagram of Floating-point Multiplication.

Figure 10. Testbench waveform of Floating-point Multiplication.

Table 1. Comparison table for FPU in terms of power, delay and memory.

Logic Utilization                                   Used     Available   Utilization
Number of slice latches                             25       13,824      1%
Number of 4 input LUTs                              1,604    13,824      11%
Logic Distribution
Number of occupied slices                           831      6,912       12%
Number of slices containing only related logic      831      831         100%
Number of slices containing unrelated logic         0        831         0%
Total number of 4 input LUTs                        1,605    13,824      11%
  Number used as logic                              1,604
Number of bonded IOBs                               296      404         73%
Number of GCLKs                                     1        4           25%
Number of GCLKIOBs                                  1        4           25%

V. CONCLUSION
An arithmetic unit has been designed to perform three arithmetic operations on floating-point numbers: addition, subtraction and multiplication. An IEEE 754 standard based floating-point representation has been used, and the unit has been coded in Verilog. The code has been synthesized for the Virtex-E FPGA (device XCV600E, package BG560, speed grade -6) using Xilinx ISE 8.2i software, and the design has been implemented and verified on the board successfully. The designed arithmetic unit operates on 64-bit operands. It can be designed for 128-bit operands to enhance precision, and it can be extended with more mathematical operations such as division, square root, and trigonometric functions.

VI. REFERENCES
[1] C. Farnum, "Compiler Support for Floating-Point Computation," Software Practice and Experience, vol. 18, pp. 701-709, July 1988.
[2] D. Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys, vol. 23, no. 1, pp. 5-48, 1991.
[3] G. Marcus, P. Hinojosa, A. Avila and J. Nolazco-Flores, "A Fully Synthesizable Single-Precision, Floating-Point Adder/Subtractor and Multiplier in VHDL for General and Educational Use," Proceedings of the Fifth IEEE International Caracas Conference on Devices, Circuits and Systems, Dominican Republic, Nov. 3-5, 2004.
[4] IEEE Computer Society, "IEEE Standard for Binary Floating-Point Arithmetic," IEEE Std 754-1985, 1985.
[5] J. Hoff, "A Full Custom High Speed Floating Point Adder," Fermi National Accelerator Lab, 1992.
[6] J. Thompson, N. Karra and M. J. Schulte, "A 64-bit Decimal Floating-Point Adder," Proceedings of the IEEE Computer Society Annual Symposium on VLSI: Emerging Trends in VLSI Systems Design (ISVLSI '04), 2004.
[7] W. Kahan, "IEEE Standard 754 for Binary Floating-Point Arithmetic," 1996.
[8] M. L. Overton, Numerical Computing with IEEE Floating Point Arithmetic, Society for Industrial and Applied Mathematics, 2001.
[9] D. Narasimban, D. Fernandes, V. K. Raj, J. Dorenbosch, M. Bowden and V. S. Kapoor, "A 100 MHz FPGA Based Floating Point Adder," Proceedings of the IEEE Custom Integrated Circuits Conference, 1993.