Design a floating-point fused add-subtract unit using verilog


Available online at www.scholarsresearchlibrary.com
Archives of Applied Science Research, 2013, 5 (3):278-282 (http://scholarsresearchlibrary.com/archive.html)
ISSN 0975-508X, CODEN (USA) AASRC9

Mayank Sharma, Aswani Sengar and Prince Nagar
ECE Department, Sharda University, Gr. Noida, U.P., India

ABSTRACT

A floating-point fused add-subtract unit is described that performs simultaneous floating-point add and subtract operations on a common pair of single-precision data in about the same time that it takes to perform a single addition with a conventional floating-point adder. The unit is placed and routed in a 45 nm process, with the aim of reducing memory and power consumption.

INTRODUCTION

Much research has been done on the floating-point fused multiply-add (FMA) unit [1]. It has several advantages in a floating-point unit design. Not only can a fused multiply-add unit reduce the latency of an application that executes a multiplication followed by an addition, but the unit may entirely replace a floating-point co-processor's floating-point adder and floating-point multiplier.

Many DSP algorithms have been rewritten to take advantage of the presence of FMA units in a given system. For example, in [4] a radix-16 FFT algorithm is presented that speeds up FFTs in systems with FMA units. High-throughput digital filter implementations are possible with the use of FMA units. FMA units have been used in embedded signal processing and graphics applications, and to perform division and argument reduction, which is why the FMA has become an integral unit of many commercial processors, such as those from IBM, HP [7] and Intel [9].

Similarly, in many algorithms in DSP and other fields, both the sum and the difference of a pair of operands are needed for subsequent processing. For example, this is required in the computation of FFT and DCT butterfly operations. In traditional floating-point hardware these operations may be performed serially, which limits the throughput. Alternatively, the add and subtract may be performed in parallel with two independent floating-point adders, which is expensive. The use of a fused add-subtract (FAS) unit accelerates the butterfly operation.

This paper investigates the implementation of a floating-point fused add-subtract unit, shown in Figure 1. It computes both the sum and the difference of a common pair of operands.
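As an illustration of the intended behavior, the following Python sketch models a fused add-subtract operation in software. It is a behavioral reference only, not the Verilog RTL of this paper: the names to_f32 and fused_add_sub are hypothetical, round-to-nearest-even is assumed, and overflow and other exceptional cases are not handled.

```python
import struct
from typing import Tuple


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fused_add_sub(a: float, b: float) -> Tuple[float, float]:
    """Return (a + b, a - b) for one pair of single-precision operands.

    Hypothetical reference model: both results are produced from the same
    operand pair, and each is rounded once to single precision.
    """
    a32, b32 = to_f32(a), to_f32(b)
    return to_f32(a32 + b32), to_f32(a32 - b32)


if __name__ == "__main__":
    # One real-valued butterfly: the sum and the difference of the same pair.
    print(fused_add_sub(1.5, 0.25))
```

In a butterfly computation the two returned values feed the two output branches, so a single call stands in for what would otherwise be one addition and one subtraction on the same operands.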

Figure 1. The Floating-Point Fused Add-Subtract Unit Concept [5].

2. Approach

There are two design approaches that can be taken with discrete floating-point adders to realize the add-subtract function. These are the parallel implementation shown in Figure 2, where two adders operate in parallel (one adding and one subtracting), and the serial implementation shown in Figure 3, where a single adder is used twice (once adding and once subtracting) with the same operands.

Figure 2. Conventional Parallel Realization of an Add-Subtract Unit [5].

Figure 3. Conventional Serial Realization of an Add-Subtract Unit [5].

In a conventional parallel implementation of the add-subtract function (such as that shown in Figure 2), two floating-point adders are used to perform the operation. This approach is fast; however, the area and power overhead is large because two floating-point add/subtract units are used.

In a conventional serial implementation of the add-subtract function (such as that shown in Figure 3), one floating-point adder/subtracter is used to perform the operation, together with a storage element that holds the addition or subtraction result. This approach is very efficient in terms of area. However, because the two operations execute serially, the time needed to obtain both results is twice the time needed by the parallel approach. The storage element also adds slightly to the area and power overhead.

3. Verilog modeling

The fused floating-point add-subtract unit was implemented in synthesizable Verilog RTL. The Verilog models were synthesized with 45 nm libraries, and the area and the critical timing paths were evaluated. The floating-point adder and the fused add-subtract unit were designed to operate on single-precision IEEE Std 754 operands.

Figure 4. Floating-Point Fused Add-Subtract Unit [5].

Simulation Results

The floating-point fused add-subtract unit architecture described above was designed in Verilog and simulated with ModelSim and the Xilinx ISE ISim simulator. The results are shown in Figures 5 and 6, where add and subtract operations are performed on two floating-point numbers.

Figure 5: Input and Output block diagram
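When checking simulation output for single-precision operands, it can help to have the expected bit patterns of both results at hand. The sketch below is an assumption-laden helper rather than part of the paper's flow: it splits a value into its IEEE 754 sign, biased exponent and fraction fields and produces hexadecimal vectors for a, b, a + b and a - b. The function names and the operand values 3.5 and 1.25 are hypothetical; they are not the stimulus used in the paper.

```python
import struct
from typing import Tuple


def f32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_fields(x: float) -> Tuple[int, int, int]:
    """Split x into (sign, biased exponent, fraction): 1, 8 and 23 bits."""
    bits = f32_bits(x)
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def expected_vectors(a: float, b: float) -> Tuple[str, ...]:
    """Hex strings for a, b, a + b and a - b, each rounded to single precision."""
    # Round the operands to single precision first, as the hardware would see them.
    a32 = struct.unpack("<f", struct.pack("<f", a))[0]
    b32 = struct.unpack("<f", struct.pack("<f", b))[0]
    values = (a32, b32, a32 + b32, a32 - b32)
    return tuple(format(f32_bits(v), "08x") for v in values)


if __name__ == "__main__":
    print(f32_fields(1.0))
    print(expected_vectors(3.5, 1.25))
```

The hex strings can be compared by eye against the operand and result buses in a waveform viewer, or pasted into a testbench as expected values.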

Figure 6: Simulation results diagram

RESULTS

The Xilinx XST tool was used to generate the netlist of the floating-point fused add-subtract unit, shown in Figure 7.

Figure 7: Netlist

4. Place and route

The floating-point adder and the fused add-subtract unit were implemented using an automatic synthesis, place and route approach. A high-performance 45 nm process was used for the implementation, with a standard cell library designed for high-speed applications.
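The figures in the table of implementation results in this section can be put in perspective with a little arithmetic. The sketch below takes the cell area, total power and critical timing path reported for the floating-point adder (FPA) and the fused add-subtract unit (FAS) and compares the FAS with the two conventional realizations. It assumes, as a simplification not stated in the paper, that the parallel realization costs exactly two FPAs and that the serial realization takes two FPA critical-path delays, ignoring the storage element. The dictionary keys and the function name are hypothetical.

```python
# Values copied from the paper's implementation table.
FPA = {"area": 11342, "power_uw": 1025, "delay_ps": 2817}
FAS = {"area": 15396, "power_uw": 1715, "delay_ps": 2813}


def compare(fpa: dict, fas: dict) -> dict:
    """Compare the FAS with parallel (2 x FPA) and serial (FPA used twice) models."""
    parallel_area = 2 * fpa["area"]        # assumption: two independent adders
    parallel_power = 2 * fpa["power_uw"]   # assumption: power simply doubles
    serial_delay = 2 * fpa["delay_ps"]     # assumption: storage element ignored
    return {
        "area_saving_vs_parallel": 1 - fas["area"] / parallel_area,
        "power_saving_vs_parallel": 1 - fas["power_uw"] / parallel_power,
        "speedup_vs_serial": serial_delay / fas["delay_ps"],
    }


if __name__ == "__main__":
    for name, value in compare(FPA, FAS).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the FAS needs roughly 32% less cell area and 16% less power than two discrete adders, while producing both results in about the delay of a single addition.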

Implementation value               FPA                          FAS
Format                             IEEE 754 Single-Precision    IEEE 754 Single-Precision
Cell area (unit inverter areas)    11342                        15396
Width                              48.9 µm                      65.1 µm
Height                             49.1 µm                      65.2 µm
Total wire length                  14742 µm                     25413 µm
Critical timing path               2817 ps                      2813 ps
Dynamic power                      890 µW                       1501 µW
Leakage power                      142 µW                       215 µW
Total power                        1025 µW                      1715 µW

CONCLUSION

The new fused unit uses the IEEE 754 single-precision format and supports all rounding modes. Implementation in a 45 nm industry-standard process, together with the simulation and synthesis results, shows that the fused primitives are faster, smaller, and use less power and energy than conventional realizations built from discrete floating-point adders, and that they provide a slightly more accurate result. Future research can focus on optimizing the fused add-subtract unit using multipath approaches. Finally, the fusing concept could be extended to other types of computation-intensive applications, where it may reduce delay, area and power consumption.

REFERENCES

[1] Rodina Samy, Hossam A. H. Fahmy, Ramy Raafat, Amira Mohamed, Tarek ElDeeb, Yasmin Farouk, IEEE, 2008, pp. 1-4.
[2] Jochen presis, M. B., IEEE, 2009, pp. 1-9.
[3] Earl E. Swartzlander Jr., FFT Implementation with Fused Floating-Point Operations, 2012, pp. 1-4.
[4] D. Takahashi, A Radix-16 FFT Algorithm Suitable for Multiply-Add Instruction Based on Goedecker Method, Proceedings 2003 International Conference on Multimedia and Expo, Vol. 2, July 2003, pp. II-845 to II-848.
[5] Hani Hasan Mustafa Saleh, Fused Floating-Point Arithmetic for DSP, Ph.D. Dissertation, The University of Texas at Austin, 2009.
[6] A. D. Robison, N-Bit Unsigned Division Via N-Bit Multiply-Add, Proceedings of the 17th IEEE Symposium on Computer Arithmetic, pp. 131-139, 2005.
[7] R.-C. Li, S. Boldo and M. Daumas, Theorems on Efficient Argument Reductions, Proceedings of the 16th IEEE Symposium on Computer Arithmetic, pp. 129-136, 2003.
[8] A. Kumar, IEEE Micro Magazine, Vol. 17, Issue 2, pp. 27-32, April 1997.
[9] B. Greer, J. Harrison, G. Henry, W. Li and P. Tang, Scientific Computing on the Itanium Processor, Proceedings of the ACM/IEEE SC2001 Conference, pp. 1-8, 2004.
[10] M. P. Farmwald, On the Design of High Performance Digital Arithmetic Units, Ph.D. Thesis, Stanford University, 1981.
[11] IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985.
[12] IEEE Standard for Floating-Point Arithmetic 754-2008, IEEE, 2008.
[13] www.wikipedia.org
[14] www.xilinx.com
[15] www.mentor.com
[16] www.cadence.com
[17] IEEE Standard for Verilog Hardware Description Language, IEEE 1364-1995.
[18] IEEE Standard for Verilog Language, IEEE 1364-2001.
[19] IEEE Standard for System Verilog: Unified Hardware Design Specification and Verification, IEEE P1800-2005.
[20] Stuart Franklin Oberman, Design Issues in High Performance Floating Point Arithmetic Units, Ph.D. Dissertation, Stanford University, 2000.
[21] M. Ercegovac and T. Lang, Digital Arithmetic, San Francisco: Morgan-Kaufmann Publishers, 2006.