Design of a Floating-Point Fused Add-Subtract Unit Using Verilog

Similar documents
Design a floating-point fused add-subtract unit using verilog

A Floating-Point Fused Dot-Product Unit

A Pipelined Fused Processing Unit for DSP Applications

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

Design and Implementation of IEEE-754 Decimal Floating Point Adder, Subtractor and Multiplier

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL

AN IMPROVED FUSED FLOATING-POINT THREE-TERM ADDER. Mohyiddin K, Nithin Jose, Mitha Raj, Muhamed Jasim TK, Bijith PS, Mohamed Waseem P

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

ISSN Vol.02, Issue.11, December-2014, Pages:

Implementation of IEEE754 Floating Point Multiplier

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

A Novel Carry-look ahead approach to an Unified BCD and Binary Adder/Subtractor

An Effective Implementation of Dual Path Fused Floating-Point Add-Subtract Unit for Reconfigurable Architectures

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Implementation of 64-Bit Pipelined Floating Point ALU Using Verilog

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

II. MOTIVATION AND IMPLEMENTATION

Chapter 5 Design and Implementation of a Unified BCD/Binary Adder/Subtractor

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Simulation of Pipelined Double Precision Floating Point Adder/Subtractor and Multiplier Using Verilog

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator

AN EFFICIENT FLOATING-POINT MULTIPLIER DESIGN USING COMBINED BOOTH AND DADDA ALGORITHMS

32-bit Signed and Unsigned Advanced Modified Booth Multiplication using Radix-4 Encoding Algorithm

Implementation of Double Precision Floating Point Multiplier in VHDL

ISSN: X Impact factor: (Volume3, Issue2) Analyzing Two-Term Dot Product of Multiplier Using Floating Point and Booth Multiplier

Simulation Results Analysis Of Basic And Modified RBSD Adder Circuits 1 Sobina Gujral, 2 Robina Gujral Bagga

Low Power Floating-Point Multiplier Based On Vedic Mathematics

Low-Power Split-Radix FFT Processors Using Radix-2 Butterfly Units

High Speed Special Function Unit for Graphics Processing Unit

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

Design and Development of Vedic Mathematics based BCD Adder

FLOATING-POINT BUTTERFLY ARCHITECTURE BASED ON MULTI OPERAND ADDERS

An Implementation of Double precision Floating point Adder & Subtractor Using Verilog

Floating-Point Butterfly Architecture Based on Binary Signed-Digit Representation

High Speed Radix 8 CORDIC Processor

Run-Time Reconfigurable multi-precision floating point multiplier design based on pipelining technique using Karatsuba-Urdhva algorithms

University, Patiala, Punjab, India 1 2

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA

A Decimal Floating Point Arithmetic Unit for Embedded System Applications using VLSI Techniques

Design and Implementation of 3-D DWT for Video Processing Applications

Improved Design of High Performance Radix-10 Multiplication Using BCD Codes

Implementation of a High Speed Binary Floating point Multiplier Using Dadda Algorithm in FPGA

Implementation of Fast and Efficient Mac Unit on FPGA

[Sahu* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

Design & Analysis of 16 bit RISC Processor Using low Power Pipelining

ARCHITECTURAL DESIGN OF 8 BIT FLOATING POINT MULTIPLICATION UNIT

REALIZATION OF MULTIPLE- OPERAND ADDER-SUBTRACTOR BASED ON VEDIC MATHEMATICS

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015

Comparison of Adders for optimized Exponent Addition circuit in IEEE754 Floating point multiplier using VHDL

MODULO 2 n + 1 MAC UNIT

Analysis of Different Multiplication Algorithms & FPGA Implementation

Study, Implementation and Survey of Different VLSI Architectures for Multipliers

Optimized Modified Booth Recorder for Efficient Design of the Operator

IMPLEMENTATION OF OPTIMIZED 128-POINT PIPELINE FFT PROCESSOR USING MIXED RADIX 4-2 FOR OFDM APPLICATIONS

Abstract. Literature Survey. Introduction. A.Radix-2/8 FFT algorithm for length qx2 m DFTs

VHDL implementation of 32-bit floating point unit (FPU)

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA

Research Article Design of A Novel 8-point Modified R2MDC with Pipelined Technique for High Speed OFDM Applications

Vendor Agnostic, High Performance, Double Precision Floating Point Division for FPGAs

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

PROJECT REPORT IMPLEMENTATION OF LOGARITHM COMPUTATION DEVICE AS PART OF VLSI TOOLS COURSE

I. Introduction. India; 2 Assistant Professor, Department of Electronics & Communication Engineering, SRIT, Jabalpur (M.P.

A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design

Implementation of Floating Point Multiplier Using Dadda Algorithm

Improved Combined Binary/Decimal Fixed-Point Multipliers

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

An FPGA Based Floating Point Arithmetic Unit Using Verilog

Chapter 3: part 3 Binary Subtraction

Implimentation of A 16-bit RISC Processor for Convolution Application

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder

Measuring Improvement When Using HUB Formats to Implement Floating-Point Systems under Round-to- Nearest

Pipelined High Speed Double Precision Floating Point Multiplier Using Dadda Algorithm Based on FPGA

A High Speed Design of 32 Bit Multiplier Using Modified CSLA

Architecture and Design of Generic IEEE-754 Based Floating Point Adder, Subtractor and Multiplier

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

VLSI Based Low Power FFT Implementation using Floating Point Operations

High speed Integrated Circuit Hardware Description Language), RTL (Register transfer level). Abstract:

High Performance Architecture for Reciprocal Function Evaluation on Virtex II FPGA

High Performance Pipelined Design for FFT Processor based on FPGA

Efficient Design of Radix Booth Multiplier

MULTIPLIERLESS HIGH PERFORMANCE FFT COMPUTATION

FPGA Matrix Multiplier

Computer Architecture, Lecture 14: Does your computer know how to add?

Indian Silicon Technologies 2013

VLSI Implementation of Fast Addition Using Quaternary Signed Digit Number System

ANALYSIS OF AN AREA EFFICIENT VLSI ARCHITECTURE FOR FLOATING POINT MULTIPLIER AND GALOIS FIELD MULTIPLIER*

Design and Implementation of a Quadruple Floating-point Fused Multiply-Add Unit

Performance of Constant Addition Using Enhanced Flagged Binary Adder

Design And Implementation Of Reversible Logic Alu With 4 Operations

An Efficient Design of Vedic Multiplier using New Encoding Scheme

Figurel. TEEE-754 double precision floating point format. Keywords- Double precision, Floating point, Multiplier,FPGA,IEEE-754.

DESIGN OF DOUBLE PRECISION FLOATING POINT MULTIPLICATION ALGORITHM WITH VECTOR SUPPORT

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Novel Design of Dual Core RISC Architecture Implementation

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Transcription:

International Journal of Electronics and Computer Science Engineering 1007 Available Online at www.ijecse.org ISSN- 2277-1956 Design of a Floating-Point Fused Add-Subtract Unit Using Verilog Mayank Sharma, Prince Nagar, Ghanshyam Kumar Singh & Ram Mohan Mehra Department of Electronics and Communication Engineering School of Engineering & Technology Sharda University, Knowledge Park-III, Greater Noida, (UP), India Email- ghanshyam.singh@sharda.ac.in Abstract- A floating-point (FP) fused add-subtract unit is presented that performs simultaneous floating-point operation of add-subtract on a common pair of single-precision data at the same time that it takes to perform in a single addition with a conventional floating-point adder. The system was placed and routed in 45nm process so that there will be less consumption of memory as well as power. Keywords Floating-point adder (FPA), Fused add-subtractor (FAS), Verilog I. INTRODUCTION The growing need of decimal arithmetic witnessed an increased attention in the recent years in many commercial applications and database systems, where the binary arithmetic is not sufficient. The floating-point fused multiply add (FMA) unit has been studied by so many researchers because it has several advantages in a floating-point unit design. The fused multiplier-add unit not only can reduce the latency of an application that executes a multiplication followed by an addition, but the unit may entirely replace a floating point co-processor s floating-point adder and floating-point multiplier [1-5]. Many DSP algorithms have been rewritten to take advantage of the presence of FMA units in a given system. For example, in a radix-16 FFT algorithm is presented that speeds up FFTs in systems with FMA units. High-throughput and digital filter implementations are possible with the use of FMA unit. FMA units were utilized in embedded signal processing and graphics applications, used to perform division, argument reduction, and this is why the FMA started to become an integral unit of many commercial processors such as IBM, HP and Intel. Similar to operation performed by a FMA in many algorithms in DSP and other fields both of the sum and difference of a pair of operands are needed for subsequent processing. For example, this is required in computation of the FFT & DCT butterfly operations. In traditional floating-point hardware these operations may be performed in a serial fashion which limits the throughput. The use of a fused add-subtract (FAS) unit accelerates the butterfly operation. Alternatively, the add and subtract may be performed in parallel with two independent floating point adders which is expensive [6-10]. This paper presents implementation of floating point add-subtract unit. II. PROPOSED APPROACH There are two design approaches that can be taken with discrete floating-point adders to realize add-subtract function. These are the parallel implementation shown in Figure 1 where two adders operate in parallel (one adding and one subtracting)and the serial implementation shown in Figure 2 where a single adder is used twice(once adding and once subtracting) with the same operands.

IJECSE,Volume2, Number 3 Mayank Sharma at el. 1008 Figure 1 Conventional Parallel Realization of an Add-Subtract Unit Figure 2 Conventional Serial Realization of an Add-Subtract Unit In a parallel conventional implementation of the fused add-subtract (such as that shown in Figure 2) two floatingpoint adders are used to perform the operation. This approach is fast, however, the area and power overhead is large because two floating- point add/subtract units are used. In a serial conventional implementation of the fused add-subtract (such as that shown in Figure3) one floating-point adder/subtractor is used to perform the operation in addition to a storage element to store the addition or subtraction result. This approach is very efficient in terms of area. However, due to the serial execution of both operations, the time needed to get both results is twice the time needed by the parallel approach. Also since a storage element is used, it adds slightly to the area and power overhead. III. VERILOG MODELLING The fused add-subtract unit, the following floating-point unit were implemented in synthesizable Verilog-RTL: Fused floating add-subtract unit

Design of a Floating-Point Fused Add-Subtract Unit Using Verilog 1009 The Verilog models were synthesized using 45 nm libraries. The area and the critical timing paths were evaluated. The floating-point adder and fused add subtract unit were designed to operate on single precision IEEE Std-754 operands [11-15]. Figure 3 Floating-Point Fused Add-Subtract Unit A. Simulation Results The Floating point fused add-subtract unit architecture which has been proposed earlier has been implemented and designed through Verilog (ModelSim) and XILINX ISE ISIM Simulator. The output results have been shown below in Figure 4 and Figure 5. Here 2 floating point number has been performed through add and subtract operations in the following figure:

IJECSE,Volume2, Number 3 Mayank Sharma at el. 1010 Figure 4: Input and Output block diagram Figure 5: Simulation results Diagram B. Synthesis Results

Design of a Floating-Point Fused Add-Subtract Unit Using Verilog 1011 Here using the Xilinx XST tool for floating point fused add-subtract the following net-list has been generated which is shown in following Figure 6: Figure 6: Net-list IV. PLACE AND ROUTE VALUE

IJECSE,Volume2, Number 3 Mayank Sharma at el. 1012 The floating-point adder (FPA) and fused add-subtract (FAS) unit were implemented using an automatic synthesize, place and route approach. A high performance 45 nm process was used for the implementation with a standard cell library designed for high speed applications [11-15]. Implementation value FPA FAS Format IEEE 754 Single-Precision Cell area 11342 15396 (unit inverter area unit) Width 48.9 µm 65.1 µm Total wire length 14742 µm 25413 µm Critical Timing path 2817ps 2813ps Dynamic power 890 µw 1501 µw Leakage power 142 µw 215 µw Height 49.1µm 65.2 µm Total power 1025 µw 1715 µw V.CONCLUSION The new fused unit uses the IEEE-754 single-precision format and supports all rounding modes.the implementation results using a 45nm industry standard process and a simulations synthesis results shows that the fused primitives are faster, smaller, use less power and energy. It also provides slightly more accurate results. The fused add-subtract primitive unit can be used to realize many other DSP algorithms, including the basic butterfly computation of the discrete cosine transform and many forms of the wavelet transform. The presented fused add-subtract design were implemented with no pipelines. The unit could be redesigned employing pipelining to achieve higher operation speeds. If proper pipeline gating were employed, then power consumption could be reduced as well. Future research can focus on optimizing the fused add-subtract unit using multipath approaches. These fusing concepts could be extended to other types of computation extensive applications and might result in delay, area and power consumption reduction V. REFERENCE [1] Akkas, A.; Schulte, M.J., "A decimal floating-point fused multiply-add unit with a novel decimal leading-zero anticipator," Application- Specific Systems, Architectures and Processors (ASAP), 2011 IEEE International Conference on, vol., no., pp.43,50, 11-14, Sept. 2011 [2] Samy, R.; Fahmy, H.A.H.; Raafat, R.; Mohamed, A.; ElDeeb, T.; Farouk, Y., "A decimal floating-point fused-multiply-add unit," Circuits and Systems (MWSCAS), 2010 53rd IEEE International Midwest Symposium on, vol., no., pp.529,532, 1-4 Aug. 2010 [3] Preiss, J.; Boersma, M.; Mueller, S.M., "Advanced Clockgating Schemes for Fused-Multiply-Add-Type Floating-Point Units," Computer Arithmetic, 2009. ARITH 2009. 19th IEEE Symposium on, vol., no., pp.48,56, 8-10 June 2009 [4] Swartzlander, E.E.; Saleh, H.H., "FFT Implementation with Fused Floating-Point Operations," Computers, IEEE Transactions on, vol.61, no.2, pp.284,288, Feb. 2012 [5] Takahashi, D., "A radix-16 FFT algorithm suitable for multiply-add instruction based on Goedecker method," Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on, vol.2, no., pp.ii,845-8 vol.2, 6-9 July 2003

Design of a Floating-Point Fused Add-Subtract Unit Using Verilog 1013 [6] A.D. Robison, N-Bit Unsigned Division Via N-Bit Multiply-Add, Proceedings of the 17th IEEE Symposium On Computer Arithmetic, pp. 131-139, 2005. [7] R.-C. Li, S. Boldo and M. Daumas, Theroems on Efficient Argument Reductions, Proceedings of the 16th IEEE Symposium on Computer Arithmetic, pp. 129-136, 2003. [8] Kumar, The HP PA-8000 RISC CPU, IEEE Micro Magazine, Vol. 17, Issue 2, pp. 27-32, April,1997 [9] Greer, J. Harrison, G. Henry, W. Li and P. Tang, Scientific Computing on the Itanium Processor, Proceedings of the ACM/IEEE SC2001 Conference,pp. 1-8, 2004. [10] M. P. Farmwald, On the Design of High Performance Digital Arithmetic Units, Ph.D. Thesis, Stanford University, 1981. [11] IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985. [12] IEEE Standard for Floating-Point Arithmetic 754-2008, IEEE, 2008. [13] IEEE Standard for Verilog Hardware Description Language, IEEE 1364-1995. [14] IEEE Standard for Verilog Language, IEEE 1364-2001. [15] IEEE Standard for System Verilog: Unified Hardware Design Specification and Verification, IEEE P1800-2005.