IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers


International Journal of Research in Computer Science, ISSN 2249-8257, Volume 1, Issue 1 (2011), pp. 1-7. White Globe Publications, www.ijorcs.org

Geetanjali Wasson, Assistant Professor, IET Bhaddal, Ropar, Punjab. Email: geetanjlipuri@gmail.com

Abstract

In image and signal processing applications, floating-point multiplication is a major concern in calculations. Multiplying floating point data is a lengthy operation that consumes a large amount of processing time, and the bottleneck in the floating point multiplication process is the multiplication of mantissas, which needs a 53x53-bit integer multiplier for double precision floating point numbers. By improving the speed of the multiplication task, the overall speed of the system can be enhanced. In this paper, a comparison of the existing trends in mantissa multiplication is made. The Vedic and Canonic Signed Digit algorithms were compared on parameters such as speed, routing complexity, pipelining, and resources required on an FPGA. The comparison showed that the Canonic Signed Digit algorithm is better than the Vedic algorithm in terms of speed and resources required on a Spartan-3 FPGA.

Keywords: CSD (Canonic Signed Digit); FPGA (Field Programmable Gate Array); Double precision floating point number.

I. Introduction

There have been several explorations of implementing floating point arithmetic units on FPGAs to perform arithmetic operations on floating point data more rapidly than software algorithms. FPGA based floating point multiplication can be categorized as IP core based and algorithm based [1-2].
In the IP core based approach, highly optimized IP cores are employed as sub-units of the floating point unit design; in the algorithm based approach, implementation techniques are employed to obtain better performance by grading system goals such as speed, area, and throughput optimizations [2-5]. The power consumption of a multiplier is governed by the number of non-zero digits in its operands. If no technique is used to reduce the number of non-zero digits, then multiplication of two N-bit numbers produces N partial products, and N-1 adders are needed to obtain the result of the multiplication [6][10]. Different techniques are

employed to reduce the number of adder stages and to accelerate the addition of the partial products generated during the multiplication process. Some algorithms use compression techniques to reduce the number of addition stages [13]. Others accelerate the addition process by reducing the number of non-zero partial products; this is done by coding the digits of the multiplier so that it has the minimum number of non-zero digits [6-10].

II. Related Work

Image processing, digital signal processing, and communication systems all require efficient and fast floating point multiplication. The bottleneck of floating point multiplication is mantissa multiplication, which needs a 53x53-bit integer multiplier for double precision floating point numbers. There are many multiplier architectures using different multiplication techniques, for example the array multiplier, redundant binary architectures, and many more architectures using tree structures, but they suffer from larger delay [1-5]. Various algorithms also perform floating point multiplication, such as the Booth algorithm and algorithms based on ancient mathematics [11]. Ali Akoglu introduced an algorithm based on ancient Vedic mathematics rules for fast multiplication, which generates the partial products concurrently.

The rest of the paper is organized as follows. Section 3 introduces the floating point multiplication technique. Section 4 introduces the Booth algorithm. Section 5 discusses the Vedic algorithm, and Section 6 discusses the CSD algorithm and its implementation. Results are presented in Section 7, with a comparison against Ali Akoglu's Vedic algorithm [13], and Section 8 contains the conclusion and future scope.

III. Floating Point Multiplication

Floating-point representation can be thought of as a computer realization of scientific notation. Unlike fixed point numbers, in floating point numbers the radix point is not fixed [3].
The advantage of floating-point representation over fixed-point representation is that it can support a wider range of values. The floating point formats generally followed in computing practice are those which conform to the IEEE-754 standard. These standards play a vital role in ensuring numerical robustness and code compatibility among machines of different architectures [5]. According to the IEEE-754 standard, a floating point number represents the value:

V = (-1)^sign * 2^(exponent - bias) * 1.fraction [1]

An implicit bit is used before the fraction (mantissa); it is 1 for normalized numbers and 0 for denormalized numbers. The exponent bias is 2^(e-1) - 1, which comes out to 127 for the single precision exponent and 1023 for the double precision exponent. Floating point multiplication is not as straightforward as integer multiplication. It involves three operations: the exponents of the multiplier and multiplicand are added, the mantissas of the two numbers are multiplied, and the exclusive-OR of the sign bits gives the final sign bit [1-10].
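The three field-level operations above (sign XOR, exponent add minus bias, 53x53-bit mantissa product) can be sketched in software. This is a minimal illustration, not the paper's hardware design: it assumes normalized inputs, an in-range result exponent, and simple truncation instead of IEEE rounding.

```python
import struct

def fields(x: float):
    """Unpack an IEEE-754 double into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF           # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)         # 52-bit fraction
    return sign, exponent, fraction

BIAS = 1023  # 2**(11 - 1) - 1 for double precision

def fp_mul(a: float, b: float) -> float:
    """Multiply two normalized doubles by operating on their fields:
    sign = XOR, exponent = sum minus bias, mantissa = 53x53-bit product."""
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    sign = sa ^ sb
    exponent = ea + eb - BIAS
    # Prepend the implicit leading 1 to form the 53-bit mantissas.
    ma = (1 << 52) | fa
    mb = (1 << 52) | fb
    prod = ma * mb                            # up to a 106-bit product
    # The product of two mantissas in [1, 2) lies in [1, 4);
    # renormalize when it is >= 2 (bit 105 set).
    if prod >> 105:
        prod >>= 1
        exponent += 1
    fraction = (prod >> 52) & ((1 << 52) - 1) # truncate (no rounding here)
    bits = (sign << 63) | (exponent << 52) | fraction
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

print(fp_mul(1.5, 2.5))   # 3.75
print(fp_mul(3.0, 7.0))   # 21.0
```

Both test products are exactly representable, so truncation loses nothing; a full implementation would add round-to-nearest-even and handle zeros, subnormals, infinities, and NaNs.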

IV. Booth Multiplication Algorithm

In the Booth multiplication algorithm, the number of partial products to be added is reduced in order to reduce the time needed for multiplication [1-3]. Booth recoding reduces the number of adder units needed, and hence the delay, by reducing the number of non-zero bits in the multiplier [2]. In Booth recoding, a long sequence of 1s is replaced by two non-zero bits: if digits j down to k are all 1s, then

2^j + 2^(j-1) + ... + 2^(k+1) + 2^k = 2^(j+1) - 2^k

This means the sequence of additions can be replaced by a single addition of the multiplicand shifted by j+1 positions and a subtraction of the multiplicand shifted by k positions [5]. The drawback of Booth recoding is its high power consumption and thus reduced efficiency [3-8].

V. Vedic Multiplication

Let A and B be two binary numbers to be multiplied, e.g. A = 1010 and B = 1101. The partial products are calculated by windowing and vertical-and-crosswise multiplication; for example, C0 is formed from the windows 0 and 1, C1 from 10 and 10, C3 from 010 and 101, C4 from 1010 and 1011, C5 from 101 and 011, C6 from 10 and 11, and C7 from 1 and 1 (Figure 1 illustrates the windowing of C3). Vedic algorithm based architectures for floating point multiplication are power efficient and suitable for high speed applications, with reduced routing complexity [7-10],[23]. The algorithm encourages parallel computation and increases the speed of multiplication by introducing inherent pipeline stages without any added overhead. This is achieved by using adder tree structures for the addition of the partial products. The algorithm deals principally with mantissa multiplication and has three stages: i) partial product generation, ii) accumulation, and iii) final addition. Partial products are generated by windowing of the multiplier and multiplicand, as shown in Figure 1. The partial products are then accumulated, and finally added using a binary tree [5],[7],[8],[11].
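The vertical-and-crosswise scheme can be sketched numerically. This is a behavioral sketch, not the paper's architecture: it assumes "windowing" means forming, for each result position k, the column sum of all cross products a[i]*b[j] with i + j = k, which the adder tree then weights and accumulates.

```python
def urdhva_mul(a_bits, b_bits):
    """Multiply two equal-length bit vectors (LSB first) with the
    vertical-and-crosswise scheme: column k collects every cross
    product a[i]*b[j] with i + j == k; these column sums are the
    partial products that the accumulation stage reduces."""
    n = len(a_bits)
    columns = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            columns[i + j] += a_bits[i] * b_bits[j]
    # Final addition: weight each column by 2**k (the tree adder's job).
    return sum(c << k for k, c in enumerate(columns))

# The paper's operands, LSB first: A = 1010b (10), B = 1101b (13).
A = [0, 1, 0, 1]
B = [1, 0, 1, 1]
print(urdhva_mul(A, B))  # 130 == 10 * 13
```

All columns are independent, which is why the scheme pipelines well: every column sum can be generated concurrently before the reduction tree runs.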
Accumulation is done using a Wallace tree, as shown in Figure 2. The implementation of the algorithm and the number of pipeline stages added depend on the implementation of the accumulation and addition stages [8]. The speed can be increased by increasing the radix of the multiplication. The Vedic algorithm is speed efficient and far less complex than CSD, but the resources needed are

more [11], and since the number of partial products is not reduced, more adder units are needed than with CSD. After accumulation, binary tree addition is performed to obtain the final 2n-1 bit mantissa multiplication result [11].

VI. Canonic Signed Digit

Figure 2. Accumulation stage: 3:1 compression using a Wallace tree.

CSD is a modified form of the Booth algorithm. It exploits the fact that multiplication by a power of two can be obtained simply by shifting the multiplicand right or left, using some dedicated hardware and a few adders. It also exploits the fact that multiplication by a left shift is much faster than repeated additions, and that the more 1s there are in the multiplier, the slower the multiplication [1-2]. CSD is a unique signed digit representation having the smallest number of non-zero digits, with the property that no two consecutive digits are non-zero. A CSD number can have at most n/2 non-zero digits for even n; thus, multiplying two n-bit integers takes n/2 - 1 steps [1],[7-9]. The properties of CSD numbers are:

i) No two consecutive bits in a CSD number are non-zero.
ii) The CSD representation of a number contains the minimum possible number of non-zero bits, hence the name canonic.
iii) The CSD representation of a number is unique.
iv) CSD numbers contain about 33% fewer non-zero bits than two's complement numbers.
v) CSD is a minimal signed digit code; a minimal representation is a code requiring the minimum number of non-zero digits to represent a given number.

A. CSD Algorithm

In the CSD algorithm, the mantissa of the multiplier is coded so as to minimize the number of non-zero digits. The algorithm computes the recoded number starting from the least significant digit and proceeding to the left.
Let Y be the auxiliary carry, X the mantissa of the multiplier, and C the multiplier's CSD code; then equations (1a) and (1b) are used to produce the CSD code of the multiplier.

Starting from the LSB of X, the auxiliary carry variable Y0 is first set to 0, and the binary number X is then scanned two bits at a time [18]. The canonically recoded digit Ci and the next value of the auxiliary binary variable Yi+1 are generated for i = 0, 1, 2, ..., N. Yi+1 is the carry vector, so it follows the rule of binary addition: Yi+1 is 1 if there are two or three ones among Yi, Xi, Xi+1. That is,

Yi+1 = floor((Xi+1 + Xi + Yi) / 2)    (1a)
Ci = Xi + Yi - 2*Yi+1                 (1b)

Figure 3: Design flow of the CSD algorithm.

Example: take the number (13)10 = (01101)2. Using 13 = 16 - 4 + 1, the CSD representation of 13 is 10-101, where the notation (-) represents a digit of -1.

B. Methodology of the CSD algorithm

The product of two N-bit numbers is obtained in N/2 - 1 steps. Figure 3 shows the methodology of multiplication using CSD coding. The CSD multiplication unit contains a CSD vector generation unit, a two's complement generation unit, a shift-and-control logic unit, and an adder unit.

i) The multiplier and multiplicand are stored in registers.
ii) The mantissa of the multiplicand is fed to the two's complement unit to obtain the two's complement of the multiplicand.
iii) CSD coding of the mantissa of the multiplier is performed, and the CSD code is fed to the shift control logic to produce the required number of left shifts of the multiplicand or its two's complement, depending on the sign and position of each non-zero digit in the coded vector. A barrel shifter is used for shifting the multiplicand. By the CSD property that no two consecutive digits can be non-zero, N/2 partial products are generated.
iv) These N/2 partial products are fed to a binary tree adder, which provides pipelining without added overhead. Ten pipeline stages are achieved, and the output of the tree adder is the result of the mantissa multiplication.
v) The result is fed to the normalization and rounding unit; after exponent adjustment the results are packed to obtain the final result.
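The recoding rules (1a)/(1b) and the shift-add methodology above can be sketched as follows. This is a behavioral sketch of canonical recoding with integer operands, not the paper's barrel-shifter/tree-adder hardware; the helper names are my own.

```python
def csd_recode(x: int, n: int):
    """Canonical signed-digit recoding per equations (1a)/(1b):
    Y[0] = 0; scanning from the LSB, Y[i+1] = 1 iff two or three of
    (X[i+1], X[i], Y[i]) are 1, and C[i] = X[i] + Y[i] - 2*Y[i+1].
    Returns digits in {-1, 0, 1}, LSB first, one digit beyond the MSB."""
    X = [(x >> i) & 1 for i in range(n + 1)]  # pad one bit beyond bit n-1
    C, y = [], 0
    for i in range(n + 1):
        x_next = X[i + 1] if i + 1 <= n else 0
        y_next = (x_next + X[i] + y) // 2     # 1 iff two or three ones (1a)
        C.append(X[i] + y - 2 * y_next)       # recoded digit          (1b)
        y = y_next
    return C

def csd_mul(a: int, b: int, n: int) -> int:
    """Multiply by adding or subtracting the shifted multiplicand at
    each non-zero CSD digit of b; since no two consecutive digits are
    non-zero, few partial products are generated."""
    acc = 0
    for i, d in enumerate(csd_recode(b, n)):
        if d:
            acc += d * (a << i)               # +shift or -shift of a
    return acc

digits = csd_recode(13, 5)
print(digits)  # [1, 0, -1, 0, 1, 0]  ->  1 - 4 + 16 = 13, i.e. 10-101
# Property i): no two consecutive digits are non-zero.
assert all(digits[i] == 0 or digits[i + 1] == 0 for i in range(len(digits) - 1))
print(csd_mul(10, 13, 5))  # 130
```

Running the recoder on the paper's example reproduces 10-101 for 13, and only three shifted terms are added, versus the three adds plus carries of the plain binary form.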

VII. Results

Table 1 shows that our algorithm outperforms the other algorithms in terms of maximum clock frequency. Our algorithm needs 10 pipeline stages for mantissa multiplication. Table 2 compares the results of CSD with another fast algorithm for mantissa multiplication; it is observed that CSD is better than the Vedic algorithm when maximum clock frequency is the prime concern.

Table 1: Comparison of mantissa multiplication performance on a Virtex-2 device (virtex2p xc2vp50)

Mantissa multiplier | OURS    | LEE | AKOGLU | GOVINDU
Speed (MHz)         | 341.530 | 67  | 231    | 200
Pipeline stages     | 10      | 2   | 6      | 7

Table 2: Comparison of our double precision floating point multiplication with Akoglu's Vedic algorithm on a Spartan-3 device (3s1500fg676)

FP multiplication       | VEDIC   | OURS
Speed (MHz)             | 312.891 | 370.5096
Area (slices)           | 6848    | 5036
4-input LUTs            | 11895   | 8959
Pipeline stages/latency | 12      | 10

VIII. Conclusion and Future Work

This paper presents a CSD algorithm implementation for double precision binary floating point multiplication. A signed digit coding is used by which the number of partial products, and hence the number of addition levels in the adder stage, is reduced. Mantissa multiplication performance reaches 385.505 MHz on Virtex-4, 341.530 MHz on Virtex-2, and 370.096 MHz on Spartan-3. When multiplying double precision binary floating point numbers, the Canonic Signed Digit algorithm is better than the Vedic algorithm for the mantissa multiplication task: the maximum clock frequency is higher with the Canonic Signed Digit algorithm than that allowed by the Vedic algorithm; the slices needed by the CSD algorithm are 14% fewer than those required by the Vedic algorithm; and the LUTs needed by the CSD algorithm are 11% fewer than those needed by the Vedic algorithm. The flip-flops needed by the CSD implementation are, however, 9% more than those required by the Vedic algorithm. Future work involves the optimization of resources and the implementation of division and square root operations.

IX. References

[1] B.C. Jinaga, Saroja V. Siddamal, R.M. Banakar, "Design of high-speed floating point multiplier", 4th IEEE International Symposium on Electronic Design, Test & Applications, IEEE Computer Society, 2008, pp. 285-289.
[2] E.M. Saad, M. Taher, "High speed area efficient FPGA based floating point arithmetic modules", National Conference on Radio Science (NRSC 2007), March 2007, pp. 1-8. doi:10.1109/delta.2008.19
[3] Sang-Min Kim, "Low error fixed width CSD multiplier with efficient sign extension", IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, December 2003. doi:10.1109/tcsii.2003.820231
[4] "Iterative radix-8 multiplier structure based on a novel real-time CSD recoding", Conference Record of the Forty-First Asilomar Conference on Signals, Systems & Computers (ACSSC 2007), 2007, pp. 977-981.
[5] Himanshu Thapliyal, "Modified Montgomery Modular Multiplication using 4:2 Compressor and CSA Adder", Proceedings of the Third IEEE International Workshop on Electronic Design, Test and Applications (DELTA '06), Jan 2005.
[6] B. Lee, N. Burgess, "Parameterizeable Floating-point Operations on FPGA", Conference Record of the 36th Asilomar Conference on Signals, Systems and Computers, vol. 2, 3-6 Nov. 2002, pp. 1064-1068.
[7] G. Govindu, L. Zhuo, S. Choi and V. Prasanna, "Analysis of high-performance floating-point arithmetic on FPGAs", 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), Apr 26-30, 2004, pp. 2043-2050. doi:10.1109/ipdps.2004.1303135
[8] G. Lienhart, A. Kugel and R. Manner, "Using floating-point arithmetic on FPGAs to accelerate scientific N-body simulations", IEEE Symposium on Field-Programmable Custom Computing Machines, IEEE Computer Society, April 2002, pp. 182-191.
[9] Kwen-Siong Chong, "A Micro-power Low-Voltage Multiplier With Reduced Spurious Switching", IEEE Transactions on VLSI Systems, vol. 13, no. 2, Feb 2005, pp. 255-265.
[10] Mark A. Erle, "Decimal Floating-Point Multiplication Via Carry-Save Addition", 18th IEEE Symposium on Computer Arithmetic (ARITH '07), 2007. doi:10.1109/arith.2007.14
[11] Nachiket Kapre, "Optimistic Parallelization of Floating-Point Accumulation", 18th IEEE Symposium on Computer Arithmetic (ARITH '07), 2007. doi:10.1109/arith.2007.25
[12] Ali Akoglu, Sandeep K. Venishetti, "A Highly Parallel FPGA based IEEE-754 Compliant Double Precision Binary Floating Point Multiplication Algorithm", International Conference on Field Programmable Technology (FPT), Dec 2007, pp. 145-152. doi:10.1109/fpt.2007.4439243
[13] G. Govindu, L. Zhuo, "Analysis of high-performance floating point arithmetic on FPGAs", 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), Apr 2004, pp. 2043-2050.

How to cite: Geetanjali Wasson, "IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers", International Journal of Research in Computer Science, 1(1): pp. 1-7, September 2011. doi:10.7815/ijorcs.11.2011.001