Implementation and Comparative Analysis between Devices for Double Precision Interval Arithmetic Radix 4 Wallace tree Multiplication

Krutika Ranjankumar Bhagwat (#1), Dr. Tejas V. Shah (*2), Prof. Deepali H. Shah (#3)
(#) Instrumentation & Control Engineering Department, L. D. College of Engineering, Ahmedabad-380015, Gujarat, India
(*) S.S College of Engineering, Bhavnagar-364060, Gujarat, India
ISSN: 2231-5381 Page 340

Abstract: This paper presents a comparative analysis between two devices for the design of a radix-4 Wallace tree multiplier that performs interval multiplication. This 64-bit multiplier uses Booth partial product selection logic and a [27,2] compressor. Interval arithmetic gives reliable results because the rounding error of a conventional floating-point multiplier is captured by the computed lower and upper bounds instead of being silently introduced. The design requires slightly more area than a conventional floating-point unit, but it gives a definite performance improvement over a software approach, because the dedicated hardware logic avoids function calls, error checking, range checking and similar overheads. The input and output registers are each 64 bits wide, and two multiplexers with control signals $t_x$ and $t_y$ are used [10].

Keywords: Double Precision, Interval Multiplication, Booth partial product selection logic.

I. INTRODUCTION
The IEEE 754 standard defines the double-precision format as 1 sign bit, 11 exponent bits and a significand precision of 53 bits, of which 52 are explicitly stored. The significand has an implicit leading integer bit of value 1 unless the stored exponent field is all zeros, and only the 52 fraction bits appear in the memory format. The total precision is therefore 53 bits, which corresponds to about 16 decimal digits, since $53 \log_{10}(2) \approx 15.955$ [4].

II. INTERVAL MULTIPLICATION
Multiplication of the intervals $x = [x_l, x_u]$ and $y = [y_l, y_u]$ is defined as:

$$Z = x \cdot y = [\min(x_l y_l,\ x_l y_u,\ x_u y_l,\ x_u y_u),\ \max(x_l y_l,\ x_l y_u,\ x_u y_l,\ x_u y_u)]$$

Fig. 1 Interval multiplier

The sign logic computes the sign of the result by taking the exclusive-OR of the sign bits of the input operands. The exponent adder performs an 11-bit addition of the two exponents and subtracts the exponent bias of 1023 [10].

III. RADIX 4 WALLACE TREE MULTIPLIER
The significand multiplier performs a 53-bit by 53-bit radix-4 Wallace tree multiplication. If the most significant bit of the 106-bit product is one, the normalization logic shifts the product right by one bit and increments the exponent. The rounding logic then rounds the product to 53 bits according to the rounding mode ($r_m$); round-to-nearest-even is used [10]. The interval multiplier shown in Fig. 1 has input and output registers, sign logic, an exponent adder, and a significand multiplier with rounding and normalization logic.
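The datapath of Sections II and III can be illustrated with a minimal Python sketch; it is not the authors' HDL, and the names `fields`, `pack`, `round_shift`, `fp_mul` and `interval_mul` are hypothetical. `fp_mul` follows the described steps: XOR of the sign bits, exponent addition minus the bias 1023, a 53 x 53-bit significand product, a one-bit normalization shift when the product's MSB is set, and rounding to 53 bits (here on the exact product, whereas hardware would use guard, round and sticky bits). As an assumption beyond the text, `interval_mul` rounds the lower endpoint toward minus infinity and the upper endpoint toward plus infinity (outward rounding), which is what makes the computed interval a guaranteed enclosure; round-to-nearest-even stays the default for ordinary products. Subnormals, overflow, infinities and NaNs are not handled.

```python
import struct

BIAS = 1023                  # IEEE 754 double-precision exponent bias
FRAC_MASK = (1 << 52) - 1    # the 52 explicitly stored fraction bits


def fields(x):
    """Split a double into (sign, biased exponent, 52-bit fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & FRAC_MASK


def pack(sign, exp, frac):
    """Reassemble sign, biased exponent and fraction into a double."""
    bits = (sign << 63) | (exp << 52) | frac
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def round_shift(value, shift, mode, sign):
    """Drop the low `shift` bits of a magnitude, rounding per `mode`."""
    q = value >> shift
    rem = value & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if mode == "nearest_even":
        if rem > half or (rem == half and q & 1):
            q += 1
    elif mode == "down":      # toward -inf: grow the magnitude of negatives
        if rem and sign:
            q += 1
    elif mode == "up":        # toward +inf: grow the magnitude of positives
        if rem and not sign:
            q += 1
    else:
        raise ValueError(mode)
    return q


def fp_mul(x, y, mode="nearest_even"):
    """Multiply two doubles step by step (hypothetical software model).

    Assumption: operands are zero or normal and the result stays in the
    normal range (no subnormal, overflow, infinity or NaN handling).
    """
    if x == 0.0 or y == 0.0:
        return x * y                          # exact signed zero
    sx, ex, fx = fields(x)
    sy, ey, fy = fields(y)
    if not (0 < ex < 0x7FF and 0 < ey < 0x7FF):
        raise ValueError("only normal operands are handled in this sketch")
    sign = sx ^ sy                            # sign logic: XOR of sign bits
    exp = ex + ey - BIAS                      # exponent adder minus the bias
    prod = ((1 << 52) | fx) * ((1 << 52) | fy)  # 53 x 53 -> 106-bit product
    shift = 52
    if prod >> 105:                           # MSB set: shift right, exponent + 1
        shift += 1
        exp += 1
    sig = round_shift(prod, shift, mode, sign)
    if sig >> 53:                             # rounding carried out to 2.0
        sig >>= 1
        exp += 1
    if not 0 < exp < 0x7FF:
        raise OverflowError("result outside the normal range")
    return pack(sign, exp, sig & FRAC_MASK)   # implicit leading 1 dropped


def interval_mul(x, y):
    """[min, max] of the four endpoint products, rounded outward."""
    (xl, xu), (yl, yu) = x, y
    pairs = [(xl, yl), (xl, yu), (xu, yl), (xu, yu)]
    lo = min(fp_mul(a, b, "down") for a, b in pairs)
    hi = max(fp_mul(a, b, "up") for a, b in pairs)
    return lo, hi
```

For example, `interval_mul((1.0, 2.0), (-3.0, 4.0))` returns `(-6.0, 8.0)` because all four endpoint products are exact; for an inexact product such as 0.1 x 0.3, the `"down"` and `"up"` results of `fp_mul` are adjacent doubles that bracket the true product.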

A. Booth multiplication
Booth multiplication is a technique that allows smaller, faster multiplication circuits by recoding the numbers being multiplied, and it can halve the number of partial products. Instead of shifting and adding for every column of the multiplier and multiplying by 1 or 0, only every second column is taken, and the multiplicand is multiplied by ±1, ±2 or 0 to obtain the same result [16]. Halving the number of partial products gives a substantial performance advantage. To Booth-recode the multiplier, its bits are considered in blocks of three, each block overlapping the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier, with the bit below the LSB taken as 0 [16].

Fig. 2 Grouping of bits from the multiplier term
Fig. 3 Booth partial product selector logic

Fig. 2 shows the grouping of bits from the multiplier term used in modified Booth encoding [16]. Each block is decoded to generate the correct partial product. Encoding the multiplier Y with the modified Booth algorithm generates five signed digits: -2, -1, 0, +1 and +2. Each encoded digit performs a certain operation on the multiplicand X, as listed in Table I. The Booth partial product selector logic is shown in Fig. 3.

TABLE I
OPERATION ON THE MULTIPLICAND X
Block | Recoded digit | Operation on X
000 | 0 | 0X
001 | +1 | +1X
010 | +1 | +1X
011 | +2 | +2X
100 | -2 | -2X
101 | -1 | -1X
110 | -1 | -1X
111 | 0 | 0X

Booth recoding reduces the number of partial products from 53 to 27. The required [27,2] Wallace tree is implemented to add the 27 partial products [16]. A [27,2] compressor is made of three [9,2] blocks and one [6,2] block (Fig. 4). A [9,2] compressor is made of three (3,2) blocks and one [6,2] block (Fig. 5). A [6,2] compressor is made of two (3,2) blocks and one [4,2] block (Fig. 6). A [4,2] compressor is made of two full adders (Fig. 7).

B. [27,2] Compressor
Since 27 is a multiple of 9, the [9,2] building block gives a very simple global routing structure for this multiplier. It would not, however, be appropriate for a 16-bit or 32-bit multiplier [13].
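The recoding of Table I and the compressor hierarchy can be checked with a small word-level Python sketch; it is not the authors' HDL, and all function names are hypothetical. `booth_digits` applies Table I to overlapping 3-bit blocks (digit = -2·y[2i+1] + y[2i] + y[2i-1]), so a 53-bit multiplier yields 27 digits in {-2, -1, 0, +1, +2}; each partial product is the digit times X shifted left by 2i bits. The 27 rows are then reduced by the [27,2], [9,2], [6,2], [4,2] and (3,2) hierarchy described above. Two simplifications are assumed: each (3,2) counter is applied to whole words at once (in hardware it is one full adder per bit column), and negative partial products are Python's unbounded two's-complement integers rather than fixed-width sign-extended rows.

```python
def booth_digits(y, n_bits):
    """Radix-4 Booth recoding of an unsigned n_bits multiplier (Table I).

    Block i is (y[2i+1], y[2i], y[2i-1]) with y[-1] = 0, so that
    y == sum(d_i * 4**i) as long as y < 2**n_bits.
    """
    digits = []
    for i in range(n_bits // 2 + 1):        # 53 bits -> 27 digits
        hi = (y >> (2 * i + 1)) & 1
        mid = (y >> (2 * i)) & 1
        lo = (y >> (2 * i - 1)) & 1 if i else 0
        digits.append(-2 * hi + mid + lo)
    return digits


def csa(a, b, c):
    """(3,2) counter applied bitwise to whole words: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return [s, carry]


def c4_2(r):   # [4,2]: two chained full-adder stages
    return csa(*csa(r[0], r[1], r[2]), r[3])


def c6_2(r):   # [6,2]: two (3,2) blocks feeding one [4,2]
    return c4_2(csa(*r[0:3]) + csa(*r[3:6]))


def c9_2(r):   # [9,2]: three (3,2) blocks feeding one [6,2]
    return c6_2(csa(*r[0:3]) + csa(*r[3:6]) + csa(*r[6:9]))


def c27_2(r):  # [27,2]: three [9,2] blocks feeding one [6,2]
    return c6_2(c9_2(r[0:9]) + c9_2(r[9:18]) + c9_2(r[18:27]))


def booth_wallace_mul(x, y, n_bits=53):
    """Unsigned 53 x 53 multiply: Booth rows, [27,2] tree, final addition."""
    digits = booth_digits(y, n_bits)
    if len(digits) != 27:
        raise ValueError("the [27,2] tree expects exactly 27 partial products")
    rows = [(d * x) << (2 * i) for i, d in enumerate(digits)]
    s, c = c27_2(rows)
    return s + c                            # carry-propagate adder after the tree
```

Because every compressor only redistributes bits between a sum word and a carry word, `s + c` equals the sum of the 27 Booth rows, which is X times Y; in hardware that last addition is the carry-propagate adder following the Wallace tree.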

Fig. 4 [27,2] compressor
Fig. 5 [9,2] compressor
Fig. 6 [6,2] compressor
Fig. 7 [4,2] compressor

TABLE II. POWER ANALYSIS
xc3s1400an-5-fgg676 | I (mA) | P (mW)
Total estimated power consumption | 0 | 97
Quiescent Vccint 1.20 V | 31 | 37
Quiescent Vccaux 3.30 V | 18 | 60
Quiescent Vcco25 2.50 V | 0 | 1

v3200efg1156-8 | I (mA) | P (mW)
Total estimated power consumption | 0 | 367
Quiescent Vccint 1.80 V | 200 | 360
Quiescent Vcco33 3.30 V | 2 | 7

Fig. 8 Area analysis (total memory usage in kilobytes, v3200efg1156-8 vs. xc3s1400an-5-fgg676)

IV. COMPARATIVE ANALYSIS
Tables II and III give a comparative analysis of the interval-arithmetic double-precision radix-4 Wallace tree multiplier implemented on the Virtex-E family (device XCV3200E, package FG1156) and on the Spartan-3A/Spartan-3AN family (device XC3S1400AN, package FGG676).

TABLE III. COMPARATIVE ANALYSIS
Device comparison for the 64-bit interval-arithmetic-based radix-4 Wallace tree multiplier

Parameter | v3200efg1156-8 | xc3s1400an-5-fgg676

AREA ANALYSIS
Number of slices | 333 | 317
Number of slice flip-flops | 143 | 117
Number of 4-input LUTs | 648 | 608
Number of bonded IOBs | 498 | 608
Number of IOs | 498 | 498
IOB flip-flops | 214 | 214
Number of GCLKs | 2 | 2
Total memory usage (XST) | 237984 kilobytes | 183188 kilobytes

SPEED ANALYSIS
Minimum period | 9.069 ns | 8.295 ns
Maximum frequency | 110.266 MHz | 120.548 MHz
Minimum input arrival time before clock | 14.289 ns | 11.534 ns
Maximum output required time after clock | 11.919 ns | 8.200 ns
Maximum combinational path delay | 19.712 ns | 13.469 ns

TIMING CONSTRAINTS
Worst-case hold slack | 1.632 ns | 1.324 ns
Best-case achievable setup | 9.948 ns | 9.265 ns
Total real time to XST completion | 16.00 s | 21.00 s

THERMAL SUMMARY
Estimated junction temperature | 30 °C | 27 °C
Ambient temperature | 25 °C | 25 °C
Case temperature | 29 °C | 26 °C
Theta J-A range | 13 °C/W | 18 °C/W

CLOCK REPORT
Fanout, le | 214 | 214
Fanout, clk | 126 | 112
Net skew, le | 0.263 ns | 0.140 ns
Net skew, clk | 0.139 ns | 0.176 ns
Max delay, le | 1.435 ns | 1.06 ns
Max delay, clk | 1.167 ns | 1.063 ns

CLOCK SIGNAL (LOAD)
clk | 143 | 117
le | 214 | 214

Fig. 9 Timing constraints analysis (worst-case hold slack and best-case achievable setup in ns, v3200efg1156-8 vs. xc3s1400an-5-fgg676)
Fig. 10 Power analysis
Fig. 11 Speed analysis (minimum period, minimum input arrival time before clock, maximum output required time after clock and maximum combinational path delay in ns, v3200efg1156-8 vs. xc3s1400an-5-fgg676)

V. CONCLUSION
Interval arithmetic provides reliability and accuracy by computing a lower and an upper bound between which the result is guaranteed to lie. A carry-lookahead structure is used for the 11-bit exponent adder, which reduces the delay. The radix-4 Wallace tree interval multiplier implemented on Virtex-E uses more logic resources (slices, slice flip-flops and 4-input LUTs) and has a longer delay than the Spartan-3A/Spartan-3AN implementation. The Virtex-E implementation has a total memory usage of 237984 kilobytes, a total estimated power of 367 mW and a maximum combinational path delay of 19.712 ns, whereas the Spartan-3A/Spartan-3AN implementation requires only 183188 kilobytes and 97 mW, with a maximum combinational path delay of 13.469 ns.

REFERENCES
[1] Josh Milthorpe and Alistair Rendell, "Learning to live with errors: A fresh look at floating-point computation," Australian National University, Computing Conference, 2005.
[2] Ruchir Gupte, "Interval arithmetic logic unit for DSP and control applications," Electrical and Computer Engineering, Raleigh, 2006.
[3] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, ISBN 81-297-0092-1, © 2003 Sun Microsystems.

[4] IEEE Standard 754 for Binary Floating-Point Arithmetic, ANSI/IEEE Standard No. 754, American National Standards Institute, Washington DC, 1985.
[5] Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2nd ed., Oxford, 2011.
[6] C. N. Marimuthu and P. Thangaraj, "Low Power High Performance Multiplier," Anna University, Tamil Nadu, India, ICGST-PDCS, Volume 8, Issue 1, December 2008.
[7] Michael J. Schulte and Earl E. Swartzlander Jr., "A Performance Comparison Study on Multiplier Designs," IEEE Transactions on Computers, May 2000.
[8] Yong Dou, S. Vassiliadis, G. K. Kuzmanov and G. N. Gaydadjiev, "64-bit Floating-Point FPGA Matrix Multiplication," National Laboratory for Computer Engineering, FPGA '05, Monterey, California, USA, February 2005.
[9] Anane Nadjia, Anane Mohamed, Bessalah Hamid, Issad Mohamed and Messaoudi Khadidja, "Hardware Algorithm for Variable Precision Multiplication on FPGA," IEEE, 2009.
[10] James E. Stine and Michael J. Schulte, "A Combined Interval and Floating Point Multiplier," Computer Architecture and Arithmetic Laboratory, Electrical Engineering and Computer Science Department, Lehigh University, Bethlehem, PA 18015.
[11] SPARC Architecture Manual.
[12] Prof. Loh, CS3220 Processor Design, "Carry-Save Addition," Spring, February 2005.
[13] "Carry Save Adder Trees in Multipliers," ECEN 6263 Advanced VLSI Design, November 3, 1999.
[14] C. N. Marimuthu and P. Thangaraj, "Low Power High Performance Multiplier," Anna University, Tamil Nadu, India.
[15] Steve Kilts, Advanced FPGA Design: Architecture, Implementation, and Optimization, Wiley-Interscience, John Wiley & Sons, ISBN 978-0-470-05437-6, © 2007 IEEE.
[16] P. Assady, "A New Multiplication Algorithm Using High-Speed Counters," Islamic Azad University, Varameen branch, Iran, EuroJournals Publishing, Inc., 2009.