AN IMPROVED FUSED FLOATING-POINT THREE-TERM ADDER. Mohyiddin K, Nithin Jose, Mitha Raj, Muhamed Jasim TK, Bijith PS, Mohamed Waseem P

Similar documents
Fused Floating Point Three Term Adder Using Brent-Kung Adder

A Pipelined Fused Processing Unit for DSP Applications

Design and Implementation of IEEE-754 Decimal Floating Point Adder, Subtractor and Multiplier

Low Power Floating-Point Multiplier Based On Vedic Mathematics

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

Design a floating-point fused add-subtract unit using verilog

AN EFFICIENT FLOATING-POINT MULTIPLIER DESIGN USING COMBINED BOOTH AND DADDA ALGORITHMS

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FLOATING POINT ADDERS AND MULTIPLIERS

A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design

An Effective Implementation of Dual Path Fused Floating-Point Add-Subtract Unit for Reconfigurable Architectures

Design of a Floating-Point Fused Add-Subtract Unit Using Verilog

ADDERS AND MULTIPLIERS

1. Introduction. Raj Kishore Kumar 1, Vikram Kumar 2

A Floating-Point Fused Dot-Product Unit

FLOATING-POINT BUTTERFLY ARCHITECTURE BASED ON MULTI OPERAND ADDERS

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL

A BSD ADDER REPRESENTATION USING FFT BUTTERFLY ARCHITECHTURE

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

Design of High Speed Modulo 2 n +1 Adder

Neelesh Thakur, 2 Rama Pandole Dept. of Electronics and Communication Engineering Kailash narayan patidar college of science and technology, India

AT Arithmetic. Integer addition

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Assistant Professor, PICT, Pune, Maharashtra, India

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 2, Mar-Apr 2015

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

II. MOTIVATION AND IMPLEMENTATION

Reduced Latency IEEE Floating-Point Standard Adder Architectures

A Survey on Floating Point Adders

University, Patiala, Punjab, India 1 2

the main limitations of the work is that wiring increases with 1. INTRODUCTION

2 General issues in multi-operand addition

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

Lecture 19: Arithmetic Modules 14-1

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

HIGH PERFORMANCE FUSED ADD MULTIPLY OPERATOR

VLSI Based Low Power FFT Implementation using Floating Point Operations

Improved Design of High Performance Radix-10 Multiplication Using BCD Codes

Lecture 5. Other Adder Issues

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

Design and Simulation of Pipelined Double Precision Floating Point Adder/Subtractor and Multiplier Using Verilog

High Speed Han Carlson Adder Using Modified SQRT CSLA

Design and Implementation of a Quadruple Floating-point Fused Multiply-Add Unit

FFT REPRESENTATION USING FLOATING- POINT BUTTERFLY ARCHITECTURE

Efficient Design of Radix Booth Multiplier

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010

HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

Efficient Radix-10 Multiplication Using BCD Codes

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator

High Throughput Radix-D Multiplication Using BCD

Binary Floating Point Fused Multiply Add Unit

Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider

Compound Adder Design Using Carry-Lookahead / Carry Select Adders

DESIGN OF RADIX-8 BOOTH MULTIPLIER USING KOGGESTONE ADDER FOR HIGH SPEED ARITHMETIC APPLICATIONS

Parallel-Prefix Adders Implementation Using Reverse Converter Design. Department of ECE

THE DESIGN OF AN IC HALF PRECISION FLOATING POINT ARITHMETIC LOGIC UNIT

Operations On Data CHAPTER 4. (Solutions to Odd-Numbered Problems) Review Questions

Digital Computer Arithmetic

CS Computer Architecture. 1. Explain Carry Look Ahead adders in detail

High Speed Multiplication Using BCD Codes For DSP Applications

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

DESIGN AND IMPLEMENTATION OF FAST DECIMAL MULTIPLIER USING SMSD ENCODING TECHNIQUE

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA

ISSN Vol.03,Issue.05, July-2015, Pages:

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Chapter 4 Design of Function Specific Arithmetic Circuits

Analysis of Different Multiplication Algorithms & FPGA Implementation

High-Radix Implementation of IEEE Floating-Point Addition

Floating-Point Butterfly Architecture Based on Binary Signed-Digit Representation

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015

A High-Performance Significand BCD Adder with IEEE Decimal Rounding

HIGH SPEED SINGLE PRECISION FLOATING POINT UNIT IMPLEMENTATION USING VERILOG

High Speed Special Function Unit for Graphics Processing Unit

An Efficient Hybrid Parallel Prefix Adders for Reverse Converters using QCA Technology

A novel technique for fast multiplication

DESIGN AND ANALYSIS OF COMPETENT ARITHMETIC AND LOGIC UNIT FOR RISC PROCESSOR

ISSN: X Impact factor: (Volume3, Issue2) Analyzing Two-Term Dot Product of Multiplier Using Floating Point and Booth Multiplier

Improved Combined Binary/Decimal Fixed-Point Multipliers

CPE300: Digital System Architecture and Design

ANALYZING THE PERFORMANCE OF CARRY TREE ADDERS BASED ON FPGA S

Multi-Modulus Adder Implementation and its Applications

Floating Point Unit Generation and Evaluation for FPGAs

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

Implementation of Floating Point Multiplier Using Dadda Algorithm

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

A Unified Addition Structure for Moduli Set {2 n -1, 2 n,2 n +1} Based on a Novel RNS Representation

A High Speed Binary Floating Point Multiplier Using Dadda Algorithm

Non-Heuristic Optimization and Synthesis of Parallel-Prefix Adders

Implementation of Double Precision Floating Point Multiplier in VHDL

Transcription:

AN IMPROVED FUSED FLOATING-POINT THREE-TERM ADDER Mohyiddin K, Nithin Jose, Mitha Raj, Muhamed Jasim TK, Bijith PS, Mohamed Waseem P ABSTRACT A fused floating-point three term adder performs two additions in a single unit.it has better performance and accuracy compared to a network of discrete design. To improve the performance of three term adder, several optimization techniques are used. Brent kung adder is used in this architecture to achieve reduction in area. The proposed design is implemented for single precision and synthesized with a 45 nm cmos standard cell library. INTRODUCTION Addition is the most frequently used operation in many algorithms and applications. In case of the additions in series, however, a network of the two term adders loses accuracy due to the multiple roundings one after each addition. Many recent floating-point units can accommodate operations that have three inputs. Issues for the design of the fused floating-point three-term adder are (1)Complex exponent processing and significand alignment, (2)Complementation after the significand addition, (3)Large precision significand addition, (4)Massive cancellation management, and (5)Complex round processing. Those issues are addressed by investigating several optimization techniques in previous work. In this work the area is further reduced by replacing Kogge-Stone adder with Brent-Kung adder. The improved fused floating-point three-term adder will contribute to the next generation of floatingpoint arithmetic unit design. OPTIMIZATION TECHNIQUES 1. EXPONENT COMPARE AND SIGNIFICAND ALIGNMENT It is necessary to determine the largest exponent and aligned significands to handle the three operands. The traditional fused floating-point three-term adder determines the largest exponent based on the 5274 www.ijariie.com 1665

exponent differences. Then, the exponent differences are used for the significand alignment. This approach requires performing the exponent subtractions, complementation and significand shift sequentially, which takes a large latency. In order to reduce the latency, a new exponent compare and significand alignment logic is proposed. Six subtractions are performed to compute all the combinations of exponent differences. In each pair of differences, an absolute value is selected based on the exponent comparison result, which enables skipping the complementation after the subtractions. 5274 www.ijariie.com 1666

Fig.1 An improved fused floating-point three-term adder 2. INVERT AND DUAL-REDUCTION The sign logic generates the three effective sign bits ( the three sign bits and two op codes as ) based on 5274 www.ijariie.com 1667

) Where and are the first and second op codes, respectively. The aligned significands are inverted based on the effective sign bits for the subtraction. The three operand subtraction requires that up to two significands are complimented. If all three operands are negative, they are added and the sign becomes negative, that is A-B-C = -(A+B+C). In order to avoid the increments after the inverters, 2 bits are extended to the LSB of the significands that are propagated to the significand addition to handle the cases that one or two significands are inverted by effectively adding 1 or 2, respectively. 3. EARLY NORMALIZATION One of the design issues for the fused floating-point three term adder is the high precision significand addition. The traditional fused floating-point three-term adder aligns the significands to 2f+6 bits, where f is the number of significand bits. Such large significands require a large significand addition and normalization, which the biggest bottleneck of the fused floating-point three-term adder. To reduce the overhead, early normalization is applied. The normalization is performed prior to the significand addition so that the significand adder size is reduced to f+1 bits. The rest of lower bits (f+7) are passed to rounding. By normalizing the significand pair prior to the significand addition, the round position is fixed so that the significand addition and rounding can be performed in parallel, which significantly reduces the latency of the critical path. 4. THREE-INPUT LZA AND SIGNIFICAND COMPARISON Since the normalization is performed prior to the significand addition, the LZA and normalization is on the critical path. To use a traditional two-input LZA, the three significands need to be reduced to two using a 3:2 CSA, which increases the delay. The three-input LZA encodes the three inputs at once to skip the delay of the 3:2 CSA. The three-input LZA can be implemented by extending the traditional two-input LZA. Like most of the LZAs, the three-input LZA consists of two parts: 1) Pre-encoding indicator vectors and 2) Leading Zero Detection (LZD) logic for generating the leading zero count. 5. COMPOUND ADDITION AND ROUNDING The normalized significand pair is passed to the significand addition and rounding. As described above, the upper significand bits are used for the addition and the lower bits are used for the rounding. 5274 www.ijariie.com 1668

In the two-term adder, only one significand is shifted for the alignment so that there is no carry propagation for the lower part. The compound addition determines the upper f bits including possible two overflow bits, and the rounding determines the rest of three LSBs and the round decision. The upper f+1 bits are passed to the compound addition, which produces sum and sum+1 simultaneously. Two significand sums are right shifted by up to 2 bits based on the overflow bits. Then, one correct significand sum is selected based on the round decision. 6. BRENT KUNG ADDER The earlier fused floating point adders used Kogge-Stone adder. In this paper, the Kogge-Stone adder is replaced with the Brent-Kung adder inorder to reduce the adder size Brent- Kung is a parallel prefix form of the carry look ahead adder. It takes less area to implement than the other prefix adders such as Kogge- Stone adder and it also has less wiring congestion. Instead of using a carry chain to calculate the output, the method shown in Fig.2 is used. Fig.2 4-bit Brent-Kung Adder This will reduce the delay without compromising the power performance of the adder. Replacing the Kogge-Stone adder with the above structure of Brent-Kung adder in the part of addition we can see that there is significant reduction in area. 5274 www.ijariie.com 1669

RESULTS AND COMPARISON The proposed design is implemented for single precision in Verilog-HDL and synthesized with the Nangate 45 nm CMOS technology standard cell library. In order to evaluate the improvement of the proposed design, the area and delay are compared with the previous designs. Depending on the target frequency, the implementations are synthesized with different area and delay. Table 1 compares the logic utilization of Kogge-stone adder and Brent-kung adder. Even though the delay for both adder is same, the area covered is smaller for Brent-kung adder. Thus the power consumption also become lower. The proposed design has a smaller significand adder size compared to the traditional designs. Table 6.1 Logic utilization comparison Logic utilization Kogge-Stone adder Brent-Kung adder Number of slice registers 59 59 Number of slice LUTs 1028 923 Number of fully used LUT-FF pairs 59 59 Number of bonded IOBs 1658 1658 Delay 14.387 ns 14.387 CONCLUSION The improved architecture design and implementation for a fused floating-point three-term adder has been presented. There are several critical design issues for the fused floating-point three-term adder: 1) Complex exponent processing and significand alignment, 2) Complementation after the significand addition, 3) Large precision significand adder, 4) Massive cancellation management, and 5) Complex round processing. To resolve those issues, several algorithms and optimization techniques are applied. For further improvement, Kogge-stone adder is replaced with Brent-kung adder which reduces the area. 5274 www.ijariie.com 1670

REFERENCES [1] Jongwook Sohn, Earl E. Swartzlander Jr.,(2014) A Fused Floating-Point Three-Term Adder, IEEE transactions on circuits and systems: regular papers, vol. 61, no. 10, october [2] Pappu P. Potdukhe, Vishal D. Jaiswal(2015) Review of carry select adder by using brent kung adder IJARECE Volume 4, Issue 10, October. [3] J. Sohn and E. E. Swartzlander, Jr., (2013) Improved architectures for a floating-point fused dot product unit, in Proc. 21st Symp. Computer Arithmetic, pp. 41 48 [4] J. Sohn and E. E. Swartzlander, Jr.,(2012) Improved architectures for a fused floating-point add-subtract unit, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 10, pp. 2285 2291, Oct [5] A. Tenca, (2009) Multi-operand floating-point addition, in Proc. 21st Symp. Computer Arithmetic, pp. 161 168. [6] H. H. Saleh and E. E. Swartzlander, Jr., (2008) A floating-point fused addsubtract unit, in Proc. 51st IEEE Midwest Symp. Circuits Syst., pp. 519 522. [7] H. H. Saleh and E. E. Swartzlander, Jr.,(2008) A floating-point fused dotproduct unit, in Proc. IEEE Int. Conf. Computer Des., pp. 427 431. [8] T. Lang and J.D. Bruguera,(2004) Floating-point fused multiply-add with reduced latency, IEEE Trans. Computers, vol. 53, pp. 988 1003. [9] P. M. Seidel and G. Even,(2004) Delay-optimized implementation of IEEE floating-point addition, IEEE Trans. Computers, vol. 53, no. 2, pp. 97 113, Feb. [10] S. F. Oberman, H. Al-Twaijry, and M. J. Flynn, (1997) The SNAP project: Design of floating point arithmetic units, in Proc. 14th IEEE Symp.Computer Arithmetic, pp. 156 165. [11] E. Hokenek, R. K. Montoye, and P. W. Cook,( 1990) Second-generation RISC floating point with multiply-add fused, IEEE J. Solid-State Circuits, vol. 25, pp. 1207 1213, X. [12] R. K. Montoye, E. Hokenek, and S. L. Runyon, (1990) Design of the IBM RISC system/6000 floating-point execution unit, IBM J. Res. Develop., vol. 34, pp. 59 70. [13] M. P. Farmwald,(1981) On the Design of High Performance Digital Arithmetic Units, Ph.D. dissertation, Computer Science, Stanford University, Stanford, CA, USA. 5274 www.ijariie.com 1671