An FPGA based Implementation of Floating-point Multiplier


An FPGA based Implementation of Floating-point Multiplier

L. Rajesh, Prashant V. Joshi and Dr. S.S. Manvi

Abstract — In this paper we describe the parameterization, implementation and evaluation of floating-point multipliers for FPGAs. We have developed a method, based on the VHDL language, for producing technology-independent pipelined designs that allow compile-time parameterization of design precision and range, and optional inclusion of features such as overflow protection, gradual underflow and the rounding modes of the IEEE floating-point format. The resulting designs, implemented in a Xilinx Spartan-3 device, achieve 288 MFLOPs with IEEE single precision floating-point numbers.

Keywords — FPGAs, floating-point arithmetic, multiplication, CAD design flow.

I. INTRODUCTION

Floating-point operations are useful for computations involving large dynamic range, but they require significantly more resources than integer operations. The rapid advance in Field-Programmable Gate Array (FPGA) technology makes such devices increasingly attractive for implementing floating-point arithmetic. FPGAs offer reduced development time and costs compared to application-specific integrated circuits, and their flexibility enables field upgrades and adaptation of the hardware to run-time conditions [1].

Our main aims in designing floating-point units (FPUs) for reconfigurable hardware are: (a) to parameterize the precision and range of the floating-point format, allowing the FPUs to be optimized for specific applications, and (b) to support optional inclusion of features such as gradual underflow and rounding. An approach meeting these aims achieves an effective design tradeoff between performance and required resources.

II. FLOATING POINT FORMAT

The IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-2008) [2] is used. The single precision format is shown in Fig. 1.
(Author affiliations: L. Rajesh, PG Scholar, VLSI Design, Dept. of ECE, RITM, e-mail: Rajeshcareer24@Gmail.Com; Prashant V. Joshi, Assistant Professor, Dept. of ECE, RITM, Bangalore, e-mail: prashanth_joshi@revainstitution.org; Dr. S.S. Manvi, Dept. of ECE, RITM, Bangalore, e-mail: sunil.manvi@revainstitution.org.)

Numbers in this format are composed of the following three fields:

1-bit sign, S: A value of 1 indicates that the number is negative; a 0 indicates a positive number.

Biased exponent, e = E + bias, with bias = 127: This gives an exponent range from Emin = -126 to Emax = 127.

Mantissa, M: A twenty-three-bit fraction; a hidden leading bit is added to the fraction to form what is called the significand. If the exponent is greater than 0 and smaller than 255, and there is a 1 in the MSB of the significand, then the number is said to be normalized; in this case the real number it represents is [2]:

Value = (-1)^S * 2^(E - Bias) * (1.M)

where M = m22*2^-1 + m21*2^-2 + m20*2^-3 + ... + m1*2^-22 + m0*2^-23 and Bias = 127. Fig. 1 shows the IEEE 754 single precision binary format representation.

III. FLOATING POINT MULTIPLICATION ALGORITHM

Multiplying two numbers in floating-point format is done by (1) adding the exponents of the two numbers and then subtracting the bias from the result, (2) multiplying the significands of the two numbers, and (3) calculating the sign by XORing the signs of the two numbers. To represent the multiplication result as a normalized number there must be a 1 in the MSB of the result (the leading one). As stated in the floating-point format section, normalized floating-point numbers have the form Value = (-1)^S * 2^(E - Bias) * (1.M). To multiply two floating-point numbers the following is done:

1. Multiply the significands, i.e. 1.M1 * 1.M2
2. Place the decimal point in the result
3. Add the exponents, i.e. E1 + E2 - Bias
4. Obtain the sign, i.e. S1 xor S2
5. Normalize the result, i.e. obtain a 1 at the MSB of the result's significand
6. Round the result to fit in the available bits
7. Check for underflow/overflow occurrence

Consider a floating-point representation similar to the IEEE 754 single precision format, but with a reduced number of mantissa bits (only 4) while still retaining the hidden 1 bit for normalized numbers:

A = 0 10000100 0100 = 40, B = 1 10000001 1110 = -7.5

To multiply A and B, the steps below are followed.
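As a quick sanity check before stepping through the multiplication, the two reduced-precision operands can be decoded with a short Python sketch (illustrative only, not part of the paper's VHDL design; `decode` is a hypothetical helper whose field widths follow the reduced 4-bit-mantissa format above):

```python
def decode(bits, exp_bits=8, man_bits=4, bias=127):
    """Decode a sign/exponent/mantissa bit string of the reduced format."""
    s = int(bits[0])                    # sign bit
    e = int(bits[1:1 + exp_bits], 2)    # biased exponent field
    frac = int(bits[1 + exp_bits:], 2)  # mantissa field (hidden 1 not stored)
    # Normalized value: (-1)^S * 2^(E - bias) * 1.M
    return (-1) ** s * 2.0 ** (e - bias) * (1 + frac / 2 ** man_bits)

print(decode("0100001000100"))  # A: prints 40.0
print(decode("1100000011110"))  # B: prints -7.5
```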

1. Multiply the significands: 1.0100 * 1.1110 = 10.01011000

2. Place the decimal point: 10.01011000

3. Add the exponents: 10000100 + 10000001 = 100000101. The exponent field of each number is already shifted/biased by the bias value (127) and is not the true exponent; i.e. EA = EA-true + bias and EB = EB-true + bias, so EA + EB = EA-true + EB-true + 2*bias. We must therefore subtract the bias from the resultant exponent, otherwise the bias would be added twice: 100000101 - 01111111 = 10000110

4. Obtain the sign bit (0 xor 1 = 1) and put the result together: 1 10000110 10.01011000

5. Normalize the result so that there is a 1 just before the radix point. Moving the radix point one place to the left increments the exponent by 1; moving it one place to the right decrements the exponent by 1.
1 10000110 10.01011000 (before normalizing)
1 10000111 1.001011000 (normalized)
The result (without the hidden bit) is: 1 10000111 00101100

6. The mantissa has more than 4 bits (the available mantissa width), so rounding is needed. Applying the truncation rounding mode, the stored value is: 1 10000111 0010

Fig. 2 shows the multiplier structure; the exponent addition, significand multiplication, and result sign calculation are independent and are done in parallel. The significand multiplication is performed on two 24-bit numbers and yields a 48-bit product, which we will call the intermediate product (IP). The IP is represented as bits (47 downto 0), with the decimal point located between bits 46 and 45. The following sections detail each block of the floating-point multiplier.

Fig. 2. Floating-point multiplier block diagram

IV. HARDWARE OF THE FLOATING POINT MULTIPLIER

A. Sign Bit Calculation

Multiplying two numbers results in a negative number if exactly one of the multiplied numbers is negative. With the aid of a truth table we find that the result sign can be obtained by XORing the signs of the two inputs.
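The parallel datapath of Fig. 2 — sign, exponent, and significand paths followed by normalization — can be modeled behaviorally for the full 24-bit significands with a short Python sketch (illustrative only, not the paper's VHDL; `fp_mul` and its raw-field interface are assumptions, and rounding is simple truncation as in the worked example):

```python
def fp_mul(sign_a, exp_a, man_a, sign_b, exp_b, man_b, bias=127, man_bits=23):
    """Multiply two normalized IEEE-754-style operands given as raw fields.

    Returns the raw (sign, exponent, mantissa) fields of the truncated product.
    """
    sign = sign_a ^ sign_b                          # sign: XOR of the input signs
    exp = exp_a + exp_b - bias                      # add exponents, subtract the bias once
    sig_a = (1 << man_bits) | man_a                 # prepend the hidden 1 -> 24-bit significand
    sig_b = (1 << man_bits) | man_b
    ip = sig_a * sig_b                              # 48-bit intermediate product
    # The leading one is at bit 2*man_bits+1 or 2*man_bits; shift right once if the former.
    if ip >> (2 * man_bits + 1):
        ip >>= 1
        exp += 1
    man = (ip >> man_bits) & ((1 << man_bits) - 1)  # drop the hidden bit, truncate low bits
    return sign, exp, man

# Worked example, widened to 23 mantissa bits: A = 40, B = -7.5
print(fp_mul(0, 132, 2097152, 1, 129, 7340032))  # prints (1, 135, 1441792)
```

Decoding the returned fields as (-1)**s * 2**(e - 127) * (1 + m / 2**23) gives -300, matching the worked example above.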
B. Unsigned Adder/Subtractor (for Exponent Addition)

To reduce the delay caused by carry propagation in the ripple carry adder, we evaluate the carry-out of each stage (which is the carry-in of the next stage) concurrently with the computation of the sum bit [3, 4]. The two Boolean functions for the sum and carry are as follows [5, 6]:

SUMi = Ai ⊕ Bi ⊕ Ci
Cout = Ci+1 = Ai Bi + (Ai ⊕ Bi) Ci

Let Gi = Ai Bi be the carry-generate function and Pi = Ai ⊕ Bi be the carry-propagate function; then we can rewrite the carry function as:

Ci+1 = Gi + Pi Ci

Thus, for a 4-bit adder, we can compute the carries for all stages as shown below:

C1 = G0 + P0 C0
C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 C0
C3 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
C4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0

In general, the sum function is SUMi = Ai ⊕ Bi ⊕ Ci = Pi ⊕ Ci and the carry function is Ci+1 = Gi + Pi Ci. The carry look-ahead adder produces the carries faster thanks to their parallel generation by additional circuitry.
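The carry equations above can be checked with a small Python model (a behavioral sketch for verification only; the paper's adder is implemented in VHDL):

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition: returns (4-bit sum, carry-out)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    g = [A[i] & B[i] for i in range(4)]          # Gi = Ai·Bi  (carry generate)
    p = [A[i] ^ B[i] for i in range(4)]          # Pi = Ai⊕Bi  (carry propagate)
    c = [c0]
    for i in range(4):                           # Ci+1 = Gi + Pi·Ci (expanded in parallel in hardware)
        c.append(g[i] | (p[i] & c[i]))
    s = [p[i] ^ c[i] for i in range(4)]          # SUMi = Pi ⊕ Ci
    return sum(bit << i for i, bit in enumerate(s)), c[4]

print(cla4(9, 7))  # 9 + 7 = 16: prints (0, 1), i.e. sum field 0 with carry-out 1
```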

The 4-bit adder is the key building block for improving efficiency, so the authors propose replacing the 4-bit adder block with its carry look-ahead counterpart. Recently, a modified carry look-ahead adder (abbreviated MCLA) was proposed that is similar to the CLA (carry look-ahead adder) in basic construction [7]. The drawback of the MCLA [7] is that, despite its significant speed improvement, it incurs a significant increase in area due to the large number of NAND gates used for faster carry propagation. This problem grows when the MCLA is used to build higher-order CLAs. This is the driving force behind our proposal of a new carry look-ahead adder that modifies the structure of the MCLA to make it more economical in gate count (area) without losing its speed. The MCLA [7] uses the modified full adder (MFA) shown in Fig. 3. In the proposed carry look-ahead adder, shown in Fig. 4, we replace the 4th MFA of the MCLA with a plain full adder to reduce the area (number of gates) without sacrificing the speed improvement. It can easily be verified that fewer gates are then needed to generate the final carry, as shown in Fig. 5. For further savings in gate count, the proposed 4-bit CLA can be cascaded in series to build wider CLAs, as shown in Fig. 5.

Fig. 6. Proposed carry look-ahead nine's complementer

Figure 6 shows the proposed nine's complementer built from the proposed carry look-ahead adder. It retains the carry look-ahead properties of high speed and reduced area (inherited from the proposed CLA).

Carry Look-Ahead BCD Subtractor: With the key components designed in carry look-ahead fashion, the carry look-ahead BCD subtractor can be built by integrating them. Figure 7 shows the implementation of the carry look-ahead BCD subtractor.
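The nine's-complement subtraction that the BCD subtractor implements can be sketched digit-wise in Python (an illustrative arithmetic model with an assumed interface, not the hardware itself; in the design each digit is handled by a BCD stage built from the CLA blocks above):

```python
def bcd_sub(a, b, digits=4):
    """Compute a - b (for a >= b) via nine's complement:
    a + (10**digits - 1 - b) + 1, then drop the end-around carry."""
    nines = [9 - int(d) for d in str(b).zfill(digits)]  # nine's complement of each BCD digit
    comp = int("".join(str(d) for d in nines))
    total = a + comp + 1                                # add the complement plus one
    return total - 10 ** digits                         # drop the end-around carry

print(bcd_sub(4082, 1673))  # prints 2409
```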
Fig. 3. MFA (modified full adder)
Fig. 4. Proposed 4-bit carry look-ahead adder
Fig. 5. Cascading of the proposed CLA to design a higher-width CLA
Fig. 7. Proposed carry look-ahead BCD subtractor

C. Multiplier for Unsigned Data

Multiplication involves the generation of partial products, one for each digit of the multiplier, as in Fig. 8. These partial products are then summed to produce the final product. The multiplication of two n-bit binary integers results in a product of up to 2n bits in length [8].

Fig. 8. A schematic of the multiplier

D. Normalizer

The result of the significand multiplication (the intermediate product) must be normalized to have a leading 1 just to the left of the decimal point (i.e. at bit 46 of the intermediate product). Since the inputs are normalized numbers, the intermediate product has its leading one at bit 46 or bit 47:

1. If the leading one is at bit 46 (i.e. to the left of the decimal point), the intermediate product is already a normalized number and no shift is needed.

2. If the leading one is at bit 47, the intermediate product is shifted to the right and the exponent is incremented by 1.

The shift operation is done using combinational shift logic built from multiplexers. Fig. 9 shows the simplified logic of a normalizer with an 8-bit intermediate product input and a 6-bit intermediate exponent input.

TABLE I. NORMALIZATION EFFECT ON THE RESULT'S EXPONENT AND OVERFLOW/UNDERFLOW DETECTION

Fig. 9. Simplified normalizer logic

V. UNDERFLOW/OVERFLOW DETECTION

Overflow/underflow means that the result's exponent is too large/too small to be represented in the exponent field. The exponent of the result must be 8 bits wide and must lie between 1 and 254; otherwise the value is not a normalized one. An overflow may occur while adding the two exponents or during normalization. An overflow due to exponent addition may be compensated during subtraction of the bias, resulting in a normal output value (normal operation). An underflow may occur while subtracting the bias to form the intermediate exponent. If the intermediate exponent < 0, it is an underflow that can never be compensated; if the intermediate exponent = 0, it is an underflow that may be compensated during normalization by adding 1 to it. When an overflow occurs, an overflow flag signal goes high and the result turns to ±Infinity (the sign determined from the signs of the floating-point multiplier inputs). When an underflow occurs, an underflow flag signal goes high and the result turns to ±Zero (the sign determined likewise). Denormalized numbers are flushed to zero with the appropriate sign calculated from the inputs, and an underflow flag is raised.
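The normalization shift and the exponent classification just described can be modeled in a few lines of Python (a behavioral sketch with assumed names; denormal and zero handling are omitted for brevity):

```python
def normalize(ip, exp):
    """Normalize a 48-bit intermediate product and classify its exponent.

    Returns (shifted_ip, exp, overflow, underflow). The leading one of a
    product of two normalized 24-bit significands is at bit 46 or bit 47.
    """
    if ip >> 47:                  # leading one at bit 47: shift right, increment exponent
        ip >>= 1
        exp += 1
    # A representable normalized exponent must lie between 1 and 254.
    overflow = exp > 254          # result flushed to +/-Infinity, overflow flag raised
    underflow = exp < 1           # result flushed to +/-Zero, underflow flag raised
    return ip, exp, overflow, underflow
```

Note that an intermediate exponent of 0 is only an underflow if normalization does not increment it to 1, which the check after the shift captures.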
Assume that E1 and E2 are the exponents of the two numbers A and B respectively; the result's exponent is calculated by:

Eresult = E1 + E2 - 127

E1 and E2 can take values from 1 to 254, so Eresult ranges from -125 (2 - 127) to 381 (508 - 127); but for normalized numbers Eresult may only take values from 1 to 254. Table I summarizes the different values of Eresult and the effect of normalization on them.

VI. PIPELINING THE MULTIPLIER

To enhance the performance of the multiplier, three pipelining stages are used to divide the critical path, thus increasing the maximum operating frequency. The pipelining stages are embedded at the following locations:

1. In the middle of the significand multiplier, and in the middle of the exponent adder (before the bias subtraction).
2. After the significand multiplier, and after the exponent adder.
3. At the floating-point multiplier outputs (sign, exponent and mantissa bits).

Fig. 10 shows the pipelining stages as dotted lines. Three pipelining stages mean that the output has a latency of three clock cycles. The synthesis tool's retiming option was used so that the synthesizer could better place the pipelining registers across the critical path.

Fig. 10. Floating-point multiplier with pipelined stages

VII. COMPARISONS AND VHDL SIMULATION

First, the VHDL simulation of two multipliers is considered. VHDL code was generated for both multipliers, one using a fast carry look-ahead adder and one using a ripple carry adder. The multiplier uses 24-bit values. The worst case was applied to both multipliers, with the gate delay assumed to be 5 ns.

Fig. 11. Waveform for the carry ripple adder

Fig. 12. Waveform for the carry look-ahead adder

Under the worst case, the multiplier with the ripple carry adder takes 979.056 ns, while the multiplier with the carry look-ahead adder takes 659.292 ns. Hence the carry look-ahead adder outperforms the ripple carry adder. The whole multiplier was tested against the Xilinx floating-point multiplier core generated by Xilinx CORE Generator. The core was customized to have two flags indicating overflow and underflow and a maximum latency of three cycles; it implements the round-to-nearest rounding mode. A testbench generates the stimulus, applies it to both the implemented floating-point multiplier and the Xilinx core, and compares the results. The floating-point multiplier was also checked using DesignChecker. The design was synthesized using the Precision synthesis tool [9], targeted to an XC3S1500-5FG456 Spartan-3 device.

VIII. CONCLUSIONS AND FUTURE WORK

This paper presents an implementation of a floating-point multiplier that supports the IEEE 754-2008 binary interchange format. The multiplier does not implement rounding and presents the significand multiplication result as is (48 bits); this gives better precision if the whole 48 bits are utilized in another unit. The design has three pipelining stages and, after implementation on a Xilinx Spartan-3 FPGA, achieves 288 MFLOPs.

REFERENCES

[1] H. Styles and W. Luk, "Customising graphics applications: techniques and programming interface," Proc. IEEE Symp. on Field-Programmable Custom Computing Machines, 2000, pp. 77-87.
[2] IEEE 754-2008, IEEE Standard for Floating-Point Arithmetic, 2008.
[3] P. S. Mohanty, "Design and Implementation of Faster and Low Power Multipliers," Bachelor Thesis, National Institute of Technology, Rourkela, 2009. http://ethesis.nitrkl.ac.in/213/1/10509019_final.pdf.pdf
[4] S. Brown and Z. Vranesic, Fundamentals of Digital Logic with VHDL Design, 2nd ed., McGraw-Hill Higher Education, USA, 2005. ISBN: 0072499389.
[5] J. R. Armstrong and F. G. Gray, VHDL Design Representation and Synthesis, 2nd ed., Prentice Hall, USA, 2000. ISBN: 0-13-0216704.
[6] Z. Navabi, VHDL: Modular Design and Synthesis of Cores and Systems, 3rd ed., McGraw-Hill Professional, USA, 2007. ISBN: 9780071508926.
[7] Yu-Ting Pai and Yu-Kumg Chen, "The Fastest Carry Lookahead Adder," Proc. Second IEEE International Workshop on Electronic Design, Test and Applications (DELTA '04), pp. 434-436.
[8] W. Stallings, Computer Organization and Architecture: Designing for Performance, 7th ed., Prentice Hall, Pearson Education International, USA, 2006. ISBN: 0-13-185644-8.
[9] http://www.eng.utah.edu/~cs3710/xilinx-docs/xapp768c.pdf