FPGA Implementation of Low-Area Floating Point Multiplier Using Vedic Mathematics


R. Sai Siva Teja 1, A. Madhusudhan 2
1 M.Tech Student, 2 Assistant Professor, Dept of ECE, Anurag Group of Institutions (formerly CVSR College of Engineering), Ghatkesar, R. R Dist, A.P, India

Abstract
In this paper we describe an efficient implementation of an IEEE 754 single precision floating point multiplier using Vedic mathematics. A conventional multiplication generates a large number of partial products; Vedic mathematics reduces that number, which in turn reduces the area and power of the floating point multiplier.

Keywords-- floating point; multiplication; FPGA; Nikhilam sutra; radix selection unit; Vedic mathematics.

I. INTRODUCTION
Floating point numbers are one possible way of representing real numbers in binary format; the IEEE 754 [1] standard presents two different floating point formats, the binary interchange format and the decimal interchange format. Multiplying floating point numbers is a critical requirement for DSP applications involving a large dynamic range. This paper focuses only on the single precision normalized binary interchange format.

Fig. 1 shows the IEEE 754 single precision binary format representation; it consists of a one-bit sign (S), an eight-bit exponent (E), and a twenty-three-bit fraction (M, or mantissa). An extra bit is added to the fraction to form what is called the significand (footnote 1). If the exponent is greater than 0 and smaller than 255, and there is a 1 in the MSB of the significand, then the number is said to be a normalized number; in this case the real number is represented by (1).

Figure 1. IEEE single precision floating point format

$$Z = (-1)^{S} \times 2^{(E - \mathrm{Bias})} \times (1.M) \qquad (1)$$

where $M = m_{22} 2^{-1} + m_{21} 2^{-2} + m_{20} 2^{-3} + \dots + m_{1} 2^{-22} + m_{0} 2^{-23}$ and Bias = 127.
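As an illustration of equation (1), the following Python sketch splits a single precision value into its S, E and M fields and evaluates the formula again. It is only a software aid for reading the format; the function names `decode_single` and `value_of_normalized` are hypothetical and do not appear in the paper.

```python
import struct

BIAS = 127  # single precision exponent bias, as in equation (1)


def decode_single(x):
    """Round a Python float to binary32 and return its (S, E, M) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1-bit sign S
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent E
    fraction = bits & 0x7FFFFF        # 23-bit fraction M
    return sign, exponent, fraction


def value_of_normalized(sign, exponent, fraction):
    """Evaluate equation (1); only valid for normalized numbers (0 < E < 255)."""
    if not 0 < exponent < 255:
        raise ValueError("not a normalized number")
    significand = 1 + fraction / 2**23   # hidden 1 plus the 23-bit fraction
    return (-1) ** sign * 2.0 ** (exponent - BIAS) * significand


s, e, m = decode_single(-7.5)
print(s, format(e, "08b"), format(m, "023b"))
print(value_of_normalized(s, e, m))
```

The hidden 1 is added explicitly in `value_of_normalized`, which is why the stored fraction has 23 bits while the significand has 24.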
Multiplying two numbers in floating point format is done by (1) adding the exponents of the two numbers and then subtracting the bias from the result, (2) multiplying the significands of the two numbers, and (3) calculating the sign by XORing the signs of the two numbers. In order to represent the multiplication result as a normalized number there should be a 1 in the MSB of the result (leading one).

Floating point implementation on FPGAs has been the interest of many researchers. In [2], an IEEE 754 single precision pipelined floating point multiplier was implemented on multiple FPGAs (4 Actel A1280). In [3], a custom 16/18 bit three-stage pipelined floating point multiplier that does not support rounding modes was implemented. In [4], a single precision floating point multiplier that does not support rounding modes was implemented using a digit-serial multiplier; using the Altera FLEX 8000 it achieved 2.3 MFlops. In [5], a parameterizable floating point multiplier was implemented using the software-like language Handel-C on the Xilinx XCV1000 FPGA; a five-stage pipelined multiplier achieved 28 MFlops. In [6], a latency optimized floating point unit using the primitives of the Xilinx Virtex II FPGA was implemented with a latency of 4 clock cycles. The multiplier reached a maximum clock frequency of 100 MHz.

II. FLOATING POINT MULTIPLICATION ALGORITHM
As stated in the introduction, normalized floating point numbers have the form $Z = (-1)^{S} \times 2^{(E - \mathrm{Bias})} \times (1.M)$. To multiply two floating point numbers the following is done:
1. Multiplying the significands; i.e. $(1.M_1 \times 1.M_2)$
2. Placing the binary point in the result
3. Adding the exponents; i.e. $(E_1 + E_2 - \mathrm{Bias})$
4. Obtaining the sign; i.e. $s_1 \oplus s_2$
5. Normalizing the result; i.e. obtaining a 1 at the MSB of the result's significand

6. Rounding the result to fit in the available bits
7. Checking for underflow/overflow occurrence

Footnote 1: The significand is the mantissa with an extra MSB bit. This research has been supported by Mentor Graphics.

Consider a floating point representation similar to the IEEE 754 single precision floating point format, but with a reduced number of mantissa bits (only 4) while still retaining the hidden 1 bit for normalized numbers:

A = 0 10000100 0100 = 40, B = 1 10000001 1110 = -7.5

To multiply A and B:

1. Multiply the significands:

      1.0100
    x 1.1110
    ----------
    1001011000

2. Place the binary point: 10.01011000

3. Add the exponents: 10000100 + 10000001 = 100000101
   The exponent representing each number is already shifted/biased by the bias value (127) and is not the true exponent; i.e. $E_A = E_{A\text{-true}} + \mathrm{bias}$ and $E_B = E_{B\text{-true}} + \mathrm{bias}$, and
   $$E_A + E_B = E_{A\text{-true}} + E_{B\text{-true}} + 2 \times \mathrm{bias}$$
   So we should subtract the bias from the resultant exponent, otherwise the bias will be added twice:
   100000101 - 01111111 = 10000110

4. Obtain the sign bit and put the result together: 1 10000110 10.01011000

5. Normalize the result so that there is a 1 just before the radix point (binary point). Moving the radix point one place to the left increments the exponent by 1; moving it one place to the right decrements the exponent by 1.
   1 10000110 10.01011000 (before normalizing)
   1 10000111 1.001011000 (normalized)
   The result is (without the hidden bit): 1 10000111 00101100

6. The mantissa has more than 4 bits (the available mantissa bits); rounding is needed. If we apply the truncation rounding mode then the stored value is: 1 10000111 0010.

In this paper we present a floating point multiplier in which rounding support is not implemented. Rounding support can be added as a separate unit that can be accessed by the multiplier or by a floating point adder, thus accommodating more precision if the multiplier is connected directly to an adder in a MAC unit. Fig. 2 shows the multiplier structure; exponent addition, significand multiplication, and the result's sign calculation are independent and are done in parallel. The significand multiplication is done on two 24 bit numbers and results in a 48 bit product, which we will call the intermediate product (IP). The IP is represented as (47 downto 0) and the binary point is located between bits 46 and 45 in the IP. The following sections detail each block of the floating point multiplier.

Figure 2. Floating point multiplier block diagram

III. VEDIC MATHEMATICS
Vedic Mathematics is the ancient methodology of Indian mathematics which has a unique technique of calculation based on 16 Sutras (formulae). A high speed complex multiplier design (ASIC) using Vedic Mathematics is presented in this paper. The idea for designing the multiplier and the adder-subtractor unit is adopted from the ancient Indian mathematics of the "Vedas". On account of those formulas, the partial products and sums are generated in one step, which reduces the carry propagation from LSB to MSB.
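As a software reference against which a Vedic significand multiplier could later be checked, the steps of Section II can be sketched in Python as follows. This is a minimal model under stated assumptions, not the authors' HDL: it handles only normalized operands, truncates instead of rounding (the paper's multiplier has no rounding support either), and simply raises an exception on exponent overflow or underflow. The names `fp_multiply`, `to_bits` and `from_bits` are hypothetical.

```python
import struct

BIAS = 127


def to_bits(x):
    """Bit pattern of x rounded to binary32 (helper for the example only)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def from_bits(bits):
    """Inverse of to_bits."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_multiply(a_bits, b_bits):
    """Multiply two normalized binary32 bit patterns, truncating the result."""
    sa, ea, ma = a_bits >> 31, (a_bits >> 23) & 0xFF, a_bits & 0x7FFFFF
    sb, eb, mb = b_bits >> 31, (b_bits >> 23) & 0xFF, b_bits & 0x7FFFFF
    if not (0 < ea < 255 and 0 < eb < 255):
        raise ValueError("only normalized operands are handled")

    sign = sa ^ sb                                   # sign = s1 xor s2
    exponent = ea + eb - BIAS                        # subtract the doubled bias once
    product = ((1 << 23) | ma) * ((1 << 23) | mb)    # 24 x 24 -> 48-bit IP

    if product >> 47:            # IP is in [2, 4): move the binary point left
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF            # drop hidden 1, truncate

    if exponent >= 255:
        raise OverflowError("exponent overflow")
    if exponent <= 0:
        raise ArithmeticError("exponent underflow")
    return (sign << 31) | (exponent << 23) | fraction


result = fp_multiply(to_bits(40.0), to_bits(-7.5))
print(hex(result), from_bits(result))
```

With the full 23-bit fraction the product of 40 and -7.5 is exact; the 4-bit mantissa of the worked example loses precision only because of the truncation to 4 bits.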

The gifts of ancient Indian mathematics to the world history of mathematical science are not well recognized. The contributions of the saint and mathematician in the field of number theory, 'Sri Bharati Krsna Thirthaji Maharaja', in the form of Vedic Sutras (formulas) are significant for calculations. He explored the mathematical potential of the Vedic primers and showed that mathematical operations can be carried out mentally to produce fast answers using the Sutras. In this paper we concentrate on the "Nikhilam Navatascaramam Dasatah" formula; the other formulas are beyond the scope of this paper.

IV. PROPOSED MULTIPLIER ARCHITECTURE
The mathematical expression for the proposed algorithm is shown below. Broadly, this algorithm is divided into three parts: (i) Radix Selection Unit, (ii) Exponent Determinant, and (iii) Multiplier. Consider two n bit numbers X and Y, where $k_1$ and $k_2$ are the exponents of X and Y respectively. X and Y can be represented as:

$$X = 2^{k_1} \pm Z_1 \qquad (2)$$
$$Y = 2^{k_2} \pm Z_2 \qquad (3)$$

Fast multiplication using the Nikhilam sutra normally requires the multiplicand and the multiplier to have the same base; here different bases are considered, so the product can be rewritten as

$$P = X \times Y = 2^{k_2} \times \left(X \pm Z_2 \times 2^{k_1 - k_2}\right) \pm Z_1 Z_2 \qquad (4)$$

The hardware implementation of this mathematics is shown in the Nikhilam Sutra hardware figure. The architecture can be decomposed into three main subsections: (i) Radix Selection Unit (RSU), (ii) Exponent Determinant (ED), and (iii) Array Multiplier. The RSU is required to select the proper radices corresponding to the input numbers. If the selected radix is nearer to the given number then the multiplication of the residual parts ($Z_1 \times Z_2$) is easier to compute. The subtractor blocks are required to extract the residual parts ($Z_1$ and $Z_2$). The second subsection (ED) is used to extract the powers ($k_1$ and $k_2$) of the radices, and it is followed by a subtractor that calculates the value of ($k_1 - k_2$). The third subsection, the array multiplier [10], is used to calculate the product ($Z_1 \times Z_2$).
The output of the subtractor ($k_1 - k_2$) and $Z_2$ are fed to the shifter block to calculate the value of $Z_2 \times 2^{k_1 - k_2}$. The first adder-subtractor block is used to calculate the value of $X \pm Z_2 \times 2^{k_1 - k_2}$. The output of the first adder-subtractor and the output of the second Exponent Determinant ($k_2$) are fed to the second shifter block to compute the value of $2^{k_2} \times (X \pm Z_2 \times 2^{k_1 - k_2})$. The output of the multiplier ($Z_1 \times Z_2$) and the output of the second shifter are fed to the second adder-subtractor block to compute the value of $2^{k_2} \times (X \pm Z_2 \times 2^{k_1 - k_2}) \pm Z_1 Z_2$.

Mathematical expression for RSU
Consider an n bit binary number X, which can be represented as

$$X = \sum_{i=0}^{n-1} X_i 2^{i}, \quad X_i \in \{0, 1\}.$$

Then the values of X must lie in the range $2^{n-1} \le X < 2^{n}$. The mean of this range is equal to $(2^{n-1} + 2^{n})/2 = 3 \times 2^{n-2}$.
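The arithmetic of this datapath can be checked with a short Python sketch. It assumes that the symbol printed as the base in the expressions above is 2, folds the ± signs into signed residues (so that X = 2^k1 + Z1 with Z1 possibly negative), picks the power of two nearest to each operand by comparing it with the mean of its range, and swaps the operands when k1 < k2 so that k1 - k2 is never negative. The tie-breaking choice, the operand swap and the function names are assumptions of this sketch, not details taken from the paper.

```python
def select_radix_exponent(x):
    """Return k such that 2**k is the power of two chosen as radix for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    n = x.bit_length()                       # 2**(n-1) <= x < 2**n
    # Compare x with the mean (2**(n-1) + 2**n) / 2 without using fractions.
    above_mean = 2 * x > (1 << (n - 1)) + (1 << n)
    return n if above_mean else n - 1        # assumption: a tie selects 2**(n-1)


def nikhilam_multiply(x, y):
    """Evaluate 2**k2 * (X + Z2 * 2**(k1 - k2)) + Z1 * Z2 with signed residues."""
    k1, k2 = select_radix_exponent(x), select_radix_exponent(y)
    if k1 < k2:                              # assumption: keep k1 - k2 >= 0
        x, y, k1, k2 = y, x, k2, k1
    z1 = x - (1 << k1)                       # residual part of X (subtractor)
    z2 = y - (1 << k2)                       # residual part of Y (subtractor)
    inner = x + z2 * (1 << (k1 - k2))        # first shifter + adder-subtractor
    return (inner << k2) + z1 * z2           # second shifter + small multiplier


print(nikhilam_multiply(13, 11))
```

For 13 and 11 the selected radices are 16 and 8, the residues are -3 and 3, and the only true multiplication left is the small product of the residues; everything else is shifting and adding.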

Fig. Hardware implementation of Nikhilam Sutra

The block level architecture of the RSU is shown in the RSU figure. The RSU consists of three main subsections: (i) Exponent Determinant (ED), (ii) Mean Determinant (MD), and (iii) Comparator. The n bit input X is fed to the ED block. The maximum power of X is extracted at its output, which is in turn fed to the shifter and the adder block. The second input to the shifter is the (n+1) bit representation of decimal '1'. If the maximum power of X from the ED unit is (n-1) then the output of the shifter is $2^{n-1}$. The adder unit is needed to increment the value of the maximum power of X by 1. The second shifter is needed to generate the value of $2^{n}$; here n is the incremented value taken from the adder block. The Mean Determinant unit is required to compute the mean of ($2^{n-1} + 2^{n}$). The Comparator compares the actual input with the mean value of ($2^{n-1} + 2^{n}$). If the input is greater than the mean then $2^{n}$ is selected as the required radix. If the input is less than the mean then $2^{n-1}$ is selected as the radix. The select input to the multiplexer block is taken from the output of the comparator.

Fig. Hardware implementation of RSU

Exponent Determinant
The hardware implementation of the exponent determinant is shown in Fig. 4. The integer part, or exponent, of a binary fixed point number is obtained as the maximum power of the radix. For a nonzero input, the shifting operation is executed using parallel in parallel out (PIPO) shift registers. The number of select lines of the PIPO shifter (denoted $S_1$, $S_0$ in Fig. 4) is chosen as per the binary representation of the number $(N-1)_{10}$. A 'Shift' pin is assigned in the PIPO shifter to check whether the number is to be shifted or not (to initialize the operation the 'Shift' pin is initialized to low). A decrementer [13] has been integrated in this architecture to follow the maximum power of the radix. A sequential searching procedure is implemented to search for the first '1' starting from the MSB side by using the shifting technique. For an N bit number, the value $(N-1)_{10}$ is fed to the input of the decrementer.

The decrementer is decremented based on a control signal which is generated from the searched bit; the searched bit acts as the controller of the decrementer. If the searched bit is '0' then the control signal becomes low and the decrementer starts decrementing the input value (the decrementer operates in active low logic). When the searched bit is '1' the control signal becomes high, the decrementer stops decrementing, and the shifter also stops shifting. The output of the decrementer then gives the integer part (exponent) of the number.
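The sequential search performed by the exponent determinant can be modelled in a few lines of Python. This is a behavioural sketch only: the loop stands in for the clocked PIPO shifter and the decrementer, and the function name `exponent_determinant` and its `n_bits` parameter are hypothetical.

```python
def exponent_determinant(x, n_bits):
    """Return the position of the first '1' of x, searching from the MSB side."""
    if not 0 < x < (1 << n_bits):
        raise ValueError("input must be nonzero and fit in n_bits")
    mask = (1 << n_bits) - 1
    count = n_bits - 1                     # decrementer is loaded with (N-1)
    while not (x >> (n_bits - 1)) & 1:     # searched bit is '0': keep searching
        x = (x << 1) & mask                # shifter brings the next bit to the MSB
        count -= 1                         # decrementer counts down by one
    return count                           # searched bit is '1': both units stop


print(exponent_determinant(13, 8))
```

For the 8 bit input 00001101 the loop shifts four times before a '1' reaches the MSB, so the counter ends at 7 - 4 = 3, the maximum power of two contained in 13.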

Fig. Hardware implementation of exponent determinant

V. SIMULATION RESULT ANALYSIS
Because the number of partial products is reduced, the area is reduced: the proposed design uses fewer CLB slices and flip-flops than the existing one. The simulation results of the existing and proposed systems are shown below.

Specifications       Existing system   Proposed system
No. of slices        604               356
No. of flip-flops    293               108

VI. CONCLUSION AND FUTURE WORK
From the above results the area has clearly been reduced: slice utilization falls from 604 to 356 (about 41%) and flip-flop utilization from 293 to 108 (about 63%). The work can be further improved by using high-speed adders and subtractors.

REFERENCES
[1] IEEE 754-2008, IEEE Standard for Floating-Point Arithmetic, 2008.
[2] B. Fagin and C. Renard, "Field Programmable Gate Arrays and Floating Point Arithmetic," IEEE Transactions on VLSI, vol. 2, no. 3, pp. 365-367, 1994.
[3] A. Radhika, Pavan Kumar U.C.S., and Saiprasad Goud A., "FPGA implementation of high-speed 8-bit Vedic multiplier using barrel shifter."
[4] L. Louca, T. A. Cook, and W. H. Johnson, "Implementation of IEEE Single Precision Floating Point Addition and Multiplication on FPGAs," Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM '96), pp. 107-116, 1996.
[5] A. Jaenicke and W. Luk, "Parameterized Floating-Point Arithmetic on FPGAs," Proc. of IEEE ICASSP, 2001, vol. 2, pp. 897-900.
[6] B. Lee and N. Burgess, "Parameterisable Floating-point Operations on FPGA," Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems, and Computers, 2002.
[7] DesignChecker User Guide, HDL Designer Series 2010.2a, Mentor Graphics, 2010.
[8] Precision Synthesis User's Manual, Precision RTL plus 2010a update 2, Mentor Graphics, 2010.
[9] D. Patterson and J. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 2005.