PROJECT REPORT ON IMPLEMENTATION OF LOGARITHM COMPUTATION DEVICE AS PART OF VLSI TOOLS COURSE


Project Guide: Prof. Ravindra Jayanti
By: Mukund, UG3 (ECE), 200630022

Introduction

This project implements the paper "A Fast Hardware Approach for Approximate, Efficient Logarithm Computations" by Suganth Paul, Nikhil Jayakumar, and Sunil P. Khatri, published in the IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 2, February 2009. The generation of elementary functions such as log() and antilog() finds use in many areas, such as digital signal processing (DSP), 3-D computer graphics, scientific computing, artificial neural networks, logarithmic number systems (LNS), and other multimedia applications. The approach presented in the paper is well suited to field-programmable gate array (FPGA) based applications that require high accuracy at a low cost in terms of lookup table (LUT) size, such as LNS and DSP cores. The fast generation of these functions is critical to performance in many of these applications; software algorithms are often not fast enough, so dedicated hardware to compute log() and antilog() is of great value.

The number format used in the paper is similar to the IEEE 754 single-precision floating-point format, which has 32 bits: a leading sign bit, followed by an 8-bit exponent and a 23-bit mantissa. The value of a number represented in this format is

    X = ±1.m × 2^(E-127), 0 ≤ m < 1.    (1)

We use a similar number format, but allow the number of mantissa bits to vary. We also assume that the number is positive, since the logarithm of a negative number is not defined over the reals.
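For illustration, the format in (1) can be decomposed in software. This sketch (names are ours, not from the report) uses Python's standard 32-bit float packing to extract the sign, biased exponent, and fractional mantissa:

```python
import struct

def decompose(x: float):
    """Split a positive float into (sign, exponent E, mantissa m) per the
    32-bit format of (1): X = 1.m * 2^(E-127), 0 <= m < 1.
    Illustrative helper; not part of the report's Verilog."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    E = (bits >> 23) & 0xFF          # 8-bit biased exponent
    m = (bits & 0x7FFFFF) / 2**23    # 23-bit mantissa as a fraction in [0, 1)
    return sign, E, m

# Example: 1.5 = 1.1b * 2^0, so the biased exponent is 127 and m = 0.5
sign, E, m = decompose(1.5)
```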

Previous Work

One of the earliest approaches to approximating the binary logarithm of a number was given by Mitchell. In his method, the logarithm is formed by appending the mantissa of the number to its exponent. This method is extremely easy to implement but gives an absolute error as high as 0.086, i.e., only about 3.5 bits of accuracy. In recent times, most elementary functions are generated using LUTs. This approach, initially proposed by Brubaker, computes the function with a single LUT; its accuracy depends solely on the size of the LUT used. Kmetz proposes storing the error of the Mitchell approximation in the table and adding it to the mantissa of the number, improving the error by a further 3 bits at the cost of an add operation. In our approach, we follow the same scheme of storing the Mitchell error values in a table. Other methods use bigger LUTs to obtain more accurate results at good computation speed; in our approach, we try to find the optimum table size for a required accuracy. Apart from these simple methods, there are several more complex methods for generating these elementary functions. References [3] and [6] combine a LUT with a polynomial approximation to interpolate the function over many small intervals; [3] also presents a trade-off between the table size and the degree of the interpolating polynomial. The drawback of these methods is that multiplications and divisions are involved in the computation.
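Mitchell's method can be modelled in a few lines of software. This sketch (ours, for illustration only) appends the fractional mantissa to the exponent and confirms the worst-case absolute error of about 0.086:

```python
import math

def mitchell_log2(x: float) -> float:
    """Mitchell's approximation: for x = 2^E * (1+m), log2(x) ~= E + m,
    i.e. the mantissa is simply appended to the exponent."""
    E = math.floor(math.log2(x))
    m = x / 2**E - 1.0               # fractional mantissa, 0 <= m < 1
    return E + m

# Worst-case absolute error over [1, 2), peaking near m ~= 0.4427:
worst = max(abs(math.log2(x) - mitchell_log2(x))
            for x in [1 + i / 4096 for i in range(4096)])
```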

Idea Presented in the Paper

This work uses a LUT-based approach combined with linear interpolation to generate the logarithm of a number. The multiplication normally required by linear interpolation is avoided, resulting in an area and delay reduction.

A. Interpolation Approach

Mitchell's approximation is given by the following equation:

    log2(1 + m) ≈ m, 0 ≤ m < 1.    (2)

The error due to this approximation is given by

    E(m) = log2(1 + m) - m, 0 ≤ m < 1.    (3)

The error curve is sampled at 2^t points (depending on the size of the LUT required). These values are rounded according to the required word width and stored in the LUT, which is addressed by the first t bits of the mantissa portion of the number. We then investigate interpolating between the values stored in the table, using the following equation:

    log2(1 + m) ≈ m + a + (b - a) · n1 / 2^(k-t), 0 ≤ m < 1.    (4)

Here m is the mantissa, a is the error value read from the table at the address given by the first t bits, and b is the table value adjacent to a. Also, k is the total number of mantissa MSBs used for the interpolation, and n1 is the decimal value of the last (k-t) of those bits. Essentially, we find an error value from the table based on the first t bits and interpolate between this value and the next based on the remaining bits. The third term in (4) requires a multiplication. To circumvent the multiplication (which is expensive in terms of area and delay), we investigate interpolating repeatedly between any two adjacent values stored in the table.
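As a software cross-check (ours, not part of the report), equation (4) can be modelled directly, multiplication included, using the implementation's parameters t = 6 and k = 13:

```python
import math

T, K = 6, 13   # t table-address bits, k total mantissa bits used
# Sampled Mitchell error curve E(m) = log2(1+m) - m, one extra entry for b
ERR = [math.log2(1 + i / 2**T) - i / 2**T for i in range(2**T + 1)]

def log2_interp(m: float) -> float:
    """Eq. (4): log2(1+m) ~= m + a + (b - a) * n1 / 2^(k-t).
    Direct model with the multiplication still present (illustrative)."""
    idx = int(m * 2**K)              # first k mantissa bits as an integer
    addr = idx >> (K - T)            # first t bits address the LUT -> a
    n1 = idx & (2**(K - T) - 1)      # decimal value of the last k-t bits
    a, b = ERR[addr], ERR[addr + 1]
    return m + a + (b - a) * n1 / 2**(K - T)
```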

The only problem with this approach is that too many steps are involved, since all the mantissa bits are considered. As another approach, we investigate the case where only a limited number of interpolations are done. We tabulate the maximum error incurred when the previous algorithm is implemented for t, t+1, and so on up to t+8 mantissa bits, ignoring the rest; this is the same as doing different levels of interpolation from 0 to 8. In this case the LUT has 64 words, each 16 bits wide; the word width is chosen so that rounding does not reduce the accuracy. We now see that 1 or 2 interpolations are not enough to give good error performance. Reasonable accuracy is obtained with 7 or 8 bits, but this requires as many interpolations and therefore a larger delay in computing the logarithm. To obtain better accuracy, we would need to implement the multiplication of (b-a) and n1; however, multiplication is expensive in terms of area and delay. We therefore approximate the multiplication, so as to obtain good error performance together with low delay and area utilization.

B. More Effective Approach for Multiplication

In this section, we propose a more efficient way to interpolate without multiplying (b-a) by n1 in (4). The essential idea is that the multiplication is replaced by taking the antilogarithm of the sum of the logarithms. To perform this operation with a small delay, we consider the following options.

1) log2(b-a) may be approximated by either of the following two options:
a) Mitchell approximation.
b) LUT: the constants for each interval are stored in the original LUT. Recall that we stored the error values obtained from the Mitchell approximation of the log function; the log2(b-a) values are stored alongside these error values and indexed by the same address lines.
2) The antilogarithm of the resulting sum may be approximated by either of the following two options:
a) Mitchell approximation.

b) LUT: to obtain the antilogarithm by this method, a second LUT is constructed. This antilog LUT computes the antilogarithm of a number, since the multiplication of (b-a) and n1 is performed by taking the antilogarithm of the sum of their logarithms.

C. Architecture of Implementation

Fig. 1 shows the block diagram of the log() engine, which is essentially an implementation of (4). The architecture of the interpolator block is shown in Fig. 2. The exponent part of the input number trivially becomes the integer part of the logarithm, because we assume 0 ≤ m < 1 in (1). We also assume that the number we operate on is positive. The implementation is pipelined, with 12 stages. We use a 16-bit mantissa and a 64 × 16-bit LUT; the width of each table word is chosen as 16 bits so that the error due to rounding does not dominate the overall error.

[Figure 1: log() engine. The 16-bit mantissa m feeds the interpolator (13- and 16-bit paths) and an adder, producing a 17-bit result.]

[Figure 2: Interpolator architecture. The t=6 mantissa MSBs address LUTs holding a and log(b-a); the (k-t)=7 LSBs feed a leading-one detector (LOD) and a log(n1) LUT; a comparator (N < 28) selects add or subtract; a fixed-point adder, antilog LUT, adder, and final ADD/SUB produce the 16-bit interpolator output.]
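The core trick realized by this datapath, replacing the product (b-a) · n1 by the antilog of a sum of logs with each log and the antilog Mitchell-approximated, can be sketched in software (illustrative names, ours; the sign of b-a is handled separately, as by the ADD/SUB block):

```python
import math

def mitchell_log2(x: float) -> float:
    # log2(x) ~= E + m for x = 2^E * (1+m)  (Mitchell)
    E = math.floor(math.log2(x))
    return E + x / 2**E - 1.0

def mitchell_antilog2(y: float) -> float:
    # 2^y ~= 2^floor(y) * (1 + frac(y))  (inverse of Mitchell's rule)
    E = math.floor(y)
    return 2**E * (1.0 + (y - E))

def approx_mul(d: float, n: float) -> float:
    """Approximate d*n as antilog(log d + log n), avoiding a multiplier.
    d plays the role of (b-a), n the role of n1."""
    if d == 0.0 or n == 0.0:
        return 0.0
    sign = -1.0 if d < 0 else 1.0
    return sign * mitchell_antilog2(mitchell_log2(abs(d)) + mitchell_log2(n))
```

The approximation errors of the two Mitchell steps partially cancel, so the product stays within a modest relative error of the exact value.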

The width of the mantissa bits processed by each block in the architecture is shown in the diagrams. It is found that (b-a) changes sign from positive to negative at m ≈ 0.4427, which is equivalent to comparing the decimal value of the first six bits of the mantissa with 28, as shown. Hence, if the first six bits have a value greater than 28, the comparator block sends a control signal instructing the ADD/SUB block to perform a subtract operation.

The leading-one detector (LOD) block is used to find log(n1). The LOD detects the first bit with value 1, and the remaining bits are used as the mantissa portion to access the lookup table. Since 7 bits are given as input to the LOD in this example, the LOD finds the first position holding a 1, and the remaining bits (up to 6 bits wide) are used to address the LUT. The integer part of log(n1), indicated by the position of the first 1 bit, is sent directly to the three-input fixed-point adder to be added to the integer part of log(b-a). The output of the adder after the antilog stage has to be shifted right or left depending on the integer output of the fixed-point adder. There is also a factor of 2^7 in the denominator of (4); this is accounted for by a constant right shift of 7 bits at the output of the adder following the antilog LUT. In Fig. 2, the LUT for log(n1) and the LUT for a are identical and can be implemented as a dual-port RAM. Note also that each of the three LUTs shown in Fig. 2 has a 6-bit address; in each case the 6-bit address word is obtained by rounding off the value of the next LSB of the word from which the address is derived.

Notes on Implementation

The complete hardware design was modelled in Verilog using Xilinx ISE and ModelSim SE. Figure 1 was modelled as one block, and Figure 2 as a complete, separate block. The adders in the figures were implemented with the + (plus) operator; the synthesis environment then selects the most suitable adder from its libraries and substitutes it for the + operator, ensuring the best possible performance.
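The comparator threshold can be checked numerically: the error curve E(m) = log2(1+m) - m peaks at m = 1/ln 2 - 1 ≈ 0.4427, so the slope b - a changes sign there, and with 64 table samples the peak falls at index 28 (sketch and names are ours):

```python
import math

# Sampled Mitchell error curve for a 64-entry table (t = 6 address bits)
err = [math.log2(1 + i / 64) - i / 64 for i in range(65)]

# The continuous curve E(m) = log2(1+m) - m peaks where E'(m) = 0,
# i.e. at m = 1/ln(2) - 1 ~= 0.4427
m_peak = 1 / math.log(2) - 1
peak_idx = max(range(64), key=lambda i: err[i])

# The slope b - a = err[i+1] - err[i] is positive below the peak and
# negative above it, which is what the N < 28 comparator detects.
```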

Simulation Results

Please note that the binary values shown here are of the fractional part, not the integer part, so the rightmost bits contribute less to the number and may be erroneous.

1. Input (in decimal): 1.564. Log value from calculator: 0.64745 = 101001011011. Log value from simulation: 101000011001.
2. Input (in decimal): 1.334. Log value from calculator: 0.41718 = 011010101100. Log value from simulation: 011010101001.
3. Input (in decimal): 1.791. Log value from calculator: 0.84365 = 110101111111. Log value from simulation: 110010111100.

Conclusion

The simulated results and the computed results always have at least 10 bits of accuracy, which was the target of the implementation.
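The tabulated binary strings for the calculator values can be reproduced by expanding each fractional value bit by bit (helper name is ours):

```python
def frac_to_bin(f: float, bits: int) -> str:
    """Expand the fractional part f (0 <= f < 1) to `bits` binary digits
    by repeated doubling (truncated, not rounded)."""
    out = []
    for _ in range(bits):
        f *= 2
        out.append("1" if f >= 1 else "0")
        f -= int(f)
    return "".join(out)

# e.g. frac_to_bin(0.64745, 12) reproduces the first tabulated string
```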

References

1. S. Paul, N. Jayakumar, and S. P. Khatri, "A Fast Hardware Approach for Approximate, Efficient Logarithm Computations," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 2, Feb. 2009.
2. J. N. Mitchell, "Computer multiplication and division using binary logarithms," IRE Trans. Electron. Computers, vol. 11, pp. 512-517, Aug. 1962.