Fused Floating Point Three Term Adder Using Brent-Kung Adder

Similar documents
AN IMPROVED FUSED FLOATING-POINT THREE-TERM ADDER. Mohyiddin K, Nithin Jose, Mitha Raj, Muhamed Jasim TK, Bijith PS, Mohamed Waseem P

Low Power Floating-Point Multiplier Based On Vedic Mathematics

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

A BSD ADDER REPRESENTATION USING FFT BUTTERFLY ARCHITECHTURE

Design and Implementation of IEEE-754 Decimal Floating Point Adder, Subtractor and Multiplier

A Pipelined Fused Processing Unit for DSP Applications

FFT REPRESENTATION USING FLOATING- POINT BUTTERFLY ARCHITECTURE

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

An Effective Implementation of Dual Path Fused Floating-Point Add-Subtract Unit for Reconfigurable Architectures

Improved Design of High Performance Radix-10 Multiplication Using BCD Codes

CS Computer Architecture. 1. Explain Carry Look Ahead adders in detail

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

ADDERS AND MULTIPLIERS

FLOATING POINT ADDERS AND MULTIPLIERS

High Speed Multiplication Using BCD Codes For DSP Applications

DESIGN OF HYBRID PARALLEL PREFIX ADDERS

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

University, Patiala, Punjab, India 1 2

Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator

AT Arithmetic. Integer addition

Efficient Radix-10 Multiplication Using BCD Codes

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

By, Ajinkya Karande Adarsh Yoga

High Throughput Radix-D Multiplication Using BCD

Lecture 19: Arithmetic Modules 14-1

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

1. Introduction. Raj Kishore Kumar 1, Vikram Kumar 2

DESIGN OF RADIX-8 BOOTH MULTIPLIER USING KOGGESTONE ADDER FOR HIGH SPEED ARITHMETIC APPLICATIONS

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Lecture 5. Other Adder Issues

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

Week 7: Assignment Solutions

A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design

HIGH PERFORMANCE FUSED ADD MULTIPLY OPERATOR

Measuring Improvement When Using HUB Formats to Implement Floating-Point Systems under Round-to- Nearest

EE260: Logic Design, Spring n Integer multiplication. n Booth s algorithm. n Integer division. n Restoring, non-restoring

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

AN EFFICIENT REVERSE CONVERTER DESIGN VIA PARALLEL PREFIX ADDER

Compound Adder Design Using Carry-Lookahead / Carry Select Adders

An FPGA based Implementation of Floating-point Multiplier

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018

Design of BCD Parrell Multiplier Using Redundant BCD Codes

II. MOTIVATION AND IMPLEMENTATION

Design and Implementation of Floating-Point Butterfly Architecture Based on Multi-Operand Adders

Digital Computer Arithmetic

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state

Assistant Professor, PICT, Pune, Maharashtra, India

VLSI Based Low Power FFT Implementation using Floating Point Operations

the main limitations of the work is that wiring increases with 1. INTRODUCTION

Design and Implementation of High Performance Parallel Prefix Adders

COMP2611: Computer Organization. Data Representation

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

ISSN (Online)

A Review on Optimizing Efficiency of Fixed Point Multiplication using Modified Booth s Algorithm

VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS

A High-Performance Significand BCD Adder with IEEE Decimal Rounding

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

THE DESIGN OF AN IC HALF PRECISION FLOATING POINT ARITHMETIC LOGIC UNIT

Binary Floating Point Fused Multiply Add Unit

AN EFFICIENT FLOATING-POINT MULTIPLIER DESIGN USING COMBINED BOOTH AND DADDA ALGORITHMS

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.

EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION

A Binary Floating-Point Adder with the Signed-Digit Number Arithmetic

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

VHDL implementation of 32-bit floating point unit (FPU)

A Simple Method to Improve the throughput of A Multiplier

Design of High Speed Modulo 2 n +1 Adder

International Journal Of Global Innovations -Vol.6, Issue.I Paper Id: SP-V6-I1-P14 ISSN Online:

An Efficient Hybrid Parallel Prefix Adders for Reverse Converters using QCA Technology

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

IMPLEMENTATION OF TWIN PRECISION TECHNIQUE FOR MULTIPLICATION

DESIGN AND IMPLEMENTATION OF FAST DECIMAL MULTIPLIER USING SMSD ENCODING TECHNIQUE

HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

Floating Point Numbers

Design and Implementation of a Quadruple Floating-point Fused Multiply-Add Unit

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers Implementation

High Speed Han Carlson Adder Using Modified SQRT CSLA

Srinivasasamanoj.R et al., International Journal of Wireless Communications and Network Technologies, 1(1), August-September 2012, 4-9

Integer Multiplication. Back to Arithmetic. Integer Multiplication. Example (Fig 4.25)

Foundations of Computer Systems

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

Design and Characterization of High Speed Carry Select Adder

Floating-Point Butterfly Architecture Based on Binary Signed-Digit Representation

CS 33. Data Representation (Part 3) CS33 Intro to Computer Systems VIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Floating-point representations

The ALU consists of combinational logic. Processes all data in the CPU. ALL von Neuman machines have an ALU loop.

Floating-point representations

FLOATING POINT NUMBERS

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science)

Floating Point Numbers

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Transcription:

P P IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2 Issue 9, September 205. Fused Floating Point Three Term Adder Using Brent-Kung Adder 2 Ms. Neena Aniee JohnP P and Ms. Binu ManoharP PElectronics and communication Mangalam college of EngineeringKerala, India,3T Uneenaaniee6@gmail.comU3T 2 PElectronics and communication Mangalam college of EngineeringKerala, India, Ubinu.manohar@mangalam.co.inU3T ABSTRACT A fused floating-point three-term adder which performs two additions in a single unit to achieve better performance and accuracy compared to a network of discrete design. To further improve the performance of the three-term adder, several optimization techniques are applied. This include a new exponent compare and significand alignment, dualreduction, early normalization, three-input leading zero anticipation, compound addition/rounding and pipelining. The fused floating-point three-term adder reduces the area and power consumption compared to a discrete floating-point three-term adder. Brent- Kung is a parallel prefix form of the carry look ahead adder. It takes less area compared to other parallel prefix adder.. INTRODUCTION The Floating-point operations require complex processes such as alignment, normalization and rounding, which increases the area, power consumption and latency. Therefore to further reduce the overhead, fused floating-point units have been proposed, which performs several operations in a single unit. Several algorithms and optimization techniques are applied to improve the performance of fused floating point adder. ) Proposed a new exponent compare and significand alignment. Here the three exponent differences are computed in parallel and the differences are used for the significand alignment. The largest exponent and three aligned significands are determined by the control logic. This approach reduces the latency by performing the exponent processing and significand alignment simultaneously. 2) To check significand addition is positive or negative, dual reduction is used. Two reduction trees generated both the positive and negative significand pairs and the positive significand pair is selected based on the significand comparison.as the selected significand pair produces a positive sum the complementation after the significand addition is avoided, so that the latency is reduced. 3) To reduce the significand addition early normalization is applied. The adder size can be reduced by performing the normalization earlier and the rest of bits are used for rounding, which significantly reduces the latency. 4) As the normalization is performed prior to the significand addition, the Leading Zero Anticipation (LZA) and normalization shift has been done.to re- duce the latency, a three-input LZA is used. 5) For fast rounding Compound addition is used. The compound addition gives the rounded and unrounded sums together. The round logic is used to select the correct one. 2. FUSED FLOATING-POINT THREE-TERM ADDER The traditional fused floating-point three-term adder is an initial design so that optimizations can be applied to improve the performance. Fig. shows the modified design for fused floating-point three-term adder. Here five optimizations for the fused floating-point threeterm adder are described: ) A new exponent compare and significand alignment scheme, 2) Dual-reduction to avoid the need for complementation after the significand addition, 3) Early normalization, 4) Three-input LZA, and 5) Compound addition and rounding. Fig.. A fused floating-point three-term adder 755

IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2 Issue 9, September 205. 2. Exponent Compare and Significand Alignment The three operands determine the largest exponent and aligned significands. Based on exponent difference the largest exponent is determined. The significand alignment is determined based on exponent differences. This approach requires performing the exponent subtractions, complementation and significand shift sequentially, which takes a large latency. In order to reduce the latency, a new exponent compare and significand alignment logic is proposed as shown in Fig.2. Six subtractions are performed to compute all the combinations of (exprar-exprbr, exprbr-expra R,expRbR-expRcR, exprcrexprbr, exprcr-exprar and exprar-exprcr) differences, an absolute value is selected based on the exponent comparison result, which enables skipping the complementation after the subtractions. Based on the exponent comparison results which is shown in Table I,the control logic determines the largest exponent and the aligned significands. Table I: Exponent Compare Control Logic The aligned significands become 2f + 6 bits wide including two overflow bits, and guard, round and sticky bits, where f is the number of the significand bits as shown in fig 4. Fig.2. Exponent compare and significand alignment logic The exponent differences are used for the significand shifters.the sticky logic is performed inorder to determine the guard, round and sticky bits. The first and second bits of the LSB are the guard and round bits. If one of the over shifted bit is,then the sticky bit is set to.the sticky bit can be implemented with OR trees. The remaining over shifted bit is discarded.fig. 3 shows the significand shifter and sticky logic example for the single precision. Fig.3. Significand shifter and sticky logic for the single precision Fig.4.Significand process alignment, invert, reduction, normalization, addition and rounding 756

prirp prirp prirp prirp P = IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2 Issue 9, September 205. 2.2. Invert & Dual-Reduction The sign logic generates the three effective sign bits ( sign_eff_a,sign_eff_b and sign_eff_c) based on the three sign bits and two op codes as Sign_eff_a =sign_a Sign_eff_b =sign_a (sign_b op) Sign_eff_c=sign_a (sign_c op2...(2) Where op and op2 are the first and second op codes, respectively. Based on the effective sign bits the aligned significands are inverted. 2 bits are extended to the LSB of the significands to avoid the increments after the inverters,. Table II shows the 2 bit extended LSBs based on the effective sign bits. is reduced to f+ bits and the rest of lower bits (f+7) are passed to rounding. 2.4. Three-Input LZA and Significand Comparison The three-input LZA encodes the three inputs at once.the three-input LZA consists of two parts: ) Preencoding indicator vectors and 2) Leading Zero Detection (LZD) logic for generating the leading zero count. The pre-encoder performs the bitwise operations to generate the W vector as W=A+B+C wi=ai+bi+ci, wi {0,,2,3}.. (4) Table II: 2 Bit Extended Lsbs For Complementation Inorder to eliminate the complementation after the significand addition dual-reduction is used. One reduction tree takes the three inverted significands and the other takes the reversely inverted significands. (e.g.a-b-c and A+B+C). The two 3:2 CSAs produce the two reduced significand pairs. Based on the sign of the significand sum,the positive pair is selected. For the delay-optimization, the first part of the significand addition is performed after the reduction.the addition performs a PG generator. A Kogge-Stone adder is used to perform this addition. The level 0 (for pi and gi) and level are implemented here. where ai,bi,ci are the ith bits of the three significands. The W vector can be represented by one of 0 2 3 the four elements, prirp P, prirp P, prirp Pand prirp P, indicating wi equals to 0,, 2 and 3, respectively. 0 if wi =0 P= if wi= 2 P= if wi=2 3 P= if wi=3. Using the four elements, the W vector is pre-encoded into three symbols, zi,ti and gi as zrir= prirp trir= (prirp grir= prirp 0 0 P(pP 0 3 2 3 PRi-R+pP PRi-R)+pRiRP P(pP PRi-R +pp PRi-R) 2 2 3 3 P)( pp PRi-R +pp PRi-R)+( prirp P+ prirp 2 3 2 0 PRi-R +pp PRi-R)+ prirp P( pp PRi-R+pP PRi-R) P+ prirp P)( pp 0 PRi-R+pP PRi-R) P(pP.. (5) The F vector is computed with the three symbols as f i = t i+ (g i Z i + Z i g i ) + t i+(z i Z i + g i g i ).(6) The F vector is passed to the LZD logic. The LZD produces the leading zero count which becomes the shift amount of the normalization. Fig. 5 shows the 64 bit LZD tree. Pi=ai bi Gi=aibi P ij=pi pi-.pi +(2j-) Gij=gi + gi- pi+.+gi - (2j - )pi pi-.pi - (2j- 2).(3) Where ai and bi are ith bits of the two significands and j is the level of the prefix tree adder. 2.3. Early Normalization The aligned significand of traditional fused floating-point three-term adder is 2f+ 6 bits, where f is the number of significand bits. Such large significands require a large significand addition and normalization.to reduce the overhead, early normalization is applied. As shown in Fig.4, the normalization is performed prior to the significand addition. Therefore the significand adder size Fig 5. 64 bit LZD tree for the single precision. 757

represent and are LR2R,LRRand IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2 Issue 9, September 205. 2.5. Compound Addition and Rounding The normalized significand pair is passed to the significand addition and rounding. The upper significand bits are used for the addition and the lower bits are used for rounding.the significand addition result needs to be right shifted for the post-normalization. Fig.6 shows the significand result selection and the rounding position depending on the overflows of the significand addition. OR2 Rand ORR two overflow bits; LR0Rare the LSBs if there is two, one or no overflow; G, R and S mean the guard, round and sticky bits, respectively. Since the significand sum produces the overflow of at most 2, the two overflow bits ORR OR2R mutually exclusive. To determine if the significand sum is rounded up or not the round logic requires computing the LSB, carry, guard, round and sticky bits (L, C, G, R and S). As described in Fig.6, the rounding position varies depending on the overflow. Therefore, three rounding cases (i.e., two, one, and no overflows) are separately computed. Fig.8 shows the round logic. Fig.6. Significand result selection and rounding position. The compound addition and rounding for the fused floating-point three-term adder is shown in Fig. 7. The compound addition determines the upper f bits including two overflow bits, and the rounding determines the rest of three LSBs and the round decision. The upper f + are passed to the compound addition. Based on the overflow bits two significand sums are right shifted by 2. Based on the round decision.the correct significand sum is selected. Fig.8.Round logic. The largest exponent is adjusted by subtracting the shift amount from the LZA and adding the carry-out of the significand addition as shown in Fig. 9. Fig. 9.Exponent adjust logic Using the adjusted exponent, the exceptions are detected. The three exception cases are Over flow = if exp max _exp 0 otherwise Under flow = if exp 0 0 otherwise Inexact = round _ up\ overflow \ under flow. Where round _ up is the rounding decision of the significand addition result. Fig. 7. Compound addition and rounding. 758

IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2 Issue 9, September 205. 3. MODIFICATION 4. SIMULATION 3. Kogge-Stone adder Kogge-Stone adder is a parallel prefix form carry look ahead adder. Kogge-Stone prefix adder is a fast adder design. KS adder has best performance in VLSI implementations. It shows the less delay among other architectures so Kogge Stone adder is used for wide adders. In fig 0 each vertical stage produce Propagate and Generate bits. Generate bits that produced in the last stage are XORed with the initial propagate after the input to produce the sum bits. The 2-bit Kogge- Stone adder is shown below. Floating point Numbers Fig.2.Output waveform Binary representation 8.5 0_00000_000000000000000000000 9.3 0_000000_000000000000 2.4 0_0000000_000000000000 Figure 0: 2-bit KS Adder But the major drawback of Kogge-Stone adder is it has large area. So we replace Kogge-Stone adder with Brent-Kung adder. 3.2 Brent Kung Adder Brent- Kung is a parallel prefix form of the carry look ahead adder. It takes less area to implement than the other prefix adders such as Kogge- Stone adder and it also has less wiring congestion. Instead of using a carry chain to calculate the output, the method shown in Figure is used. Result 30.2 0_00000_00000000000 Table 3: Tabulation of output 4. Tabulation Results COMPARISON OF POWER DELAY PRODUCT Using Kogge Stone adder Using Brent-Kung adder Power =520 Power = 33 Delay = 78.332 Delay = 72.835 Power delay product=40732.64 Power delay product=9687.055 Table 4: Comparison of power delay product COMPARISON OF AREA DELAY PRODUCT Using Kogge Stone adder Using Brent-Kung adder Figure : 4-bit BK Adder This will reduce the delay without compromising the power performance of the adder. Replacing the Kogge-Stone adder with the above structure of Brent-Kung adder in the part of addition we can see that there is significant reduction in area, power and delay. Area =357 Area = 024 Delay = 78.332 Delay = 72.835 Area delay Area product=06296.524 product=74583.04 Table 5: Comparison of area delay product delay 759

IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2 Issue 9, September 205. 5. CONCLUSION The fused floating point three term adderweredemonstrated. The adder were implemetated in VHDL.The main advantages of the implementation using Brent-Kung is that it reduces area,power and delay compared to that of Kogge-Stone adder. Several algorithms and optimization techniques are applied: ) A new exponent compare and significand alignment 2) Dualreduction, 3) Early normalization, 4) Three-input LZA and 5) Compound addition and rounding. 6. REFERENCES [] Jongwook Sohn and E. E. Swartzlander, Jr., A fused floating-point three-term adder, IEEE Trans. Circuits Syst., 204,pp. 2842 2850. [2] Y. Tao, G. Deyuan, F. Xiaoya, and R. Xianglong, Three-operand floating-point adder, in Pro. 2th IEEE Int. Conf. Comput. Inf. Technol., 202, pp. 92 96. [3] H. H. Saleh and E. E. Swartzlander, Jr., A floatingpoint fused dot- product unit, in Proc. IEEE Int. Conf. Computer Des., 2008, pp.427 43. [4] H. H. Saleh and E. E. Swartzlander, Jr., A floatingpoint fused add- subtract unit, in Proc. 5st IEEE Midwest Symp. Circuits Syst., 2008,pp. 59 522. [5] T.Lang and J. D. Bruguera, Floating point fused multiply add with reduced latency, IEEE Trans. Computers, vol. 53, pp. 988 003, 2004. 760