VLSI DESIGN OF FLOATING POINT ARITHMETIC & LOGIC UNIT


1 DHANABAL R, 2 BHARATHI V, 3 G. SRI CHANDRAKIRAN, 4 BHARATH BHUSHAN REDDY M

1 Assistant Professor (Senior Grade), VLSI Division, SENSE, VIT University
2 Assistant Professor, GGR College of Engineering, Anna University, Vellore
3,4 B.Tech Students, SENSE (ECE Department), VIT University, Vellore-632014, TN, India
B.Tech Students, SMBS (Mechanical Department), VIT University, Vellore-632014, TN, India
E-mail: 1 rdhanabal@vit.ac.in, 2 bharathiveerappan@yahoo.co.in, 3 chandar_ck@yahoo.com, 4 bharath.mallireddy@gmail.com

ABSTRACT

In most modern general-purpose computers, one or more Floating Point Units (FPUs) are integrated with the CPU; however, many embedded processors, especially older designs, do not have hardware support for floating point operations. In this paper, the design of a DSP module, a floating point ALU, is presented. The unit handles floating point data, converts data to IEEE 754 format, and performs the arithmetic operations Addition, Subtraction, Multiplication and Division. Functional simulation is verified with ModelSim; synthesis and power analysis are carried out with Quartus II.

Keywords: XOR, Full Adders, XNOR, PTL, XOR-XNOR

1. INTRODUCTION

In areas of physics, excessively small and large values are used, for example the effective mass of the electron or the Avogadro number. The range of the fixed point format is insufficient to represent these values, so a representation is needed that can express such numbers and operate on them. Because this requires the position of the binary point to be variable, floating point numbers are used.

The multiplier-accumulator (MAC) is an essential element of digital signal processing. Multiplication involves two basic operations: the generation of partial products and their accumulation [1]. The addition and multiplication of two binary numbers are the fundamental and most frequently used arithmetic operations in microprocessors, digital signal processors, and data-processing application-specific integrated circuits [1]. Since multiplication dominates the execution time of most DSP algorithms, there is a need for a high speed multiplier [1]. The Radix-4 modified Booth algorithm (MBA) reduces N bits of partial products to N/2 partial products [1]. Parallel multipliers such as the Radix-4 modified Booth multiplier carry out the computation with fewer adders and fewer iterative steps. This is an important criterion because chip fabrication and high performance systems require components with as small an area as possible [1]. The multiplier and accumulator can be adapted to various fields requiring high performance, such as signal processing [1]. Thus a MAC unit can be included in the ALU described here for fast multiplication [1].

In this paper, an FPGA implementation of a high speed FPU has been carried out using efficient addition, subtraction, multiplication and division algorithms. Section II depicts the architecture of the floating point unit and the methodology used to carry out the arithmetic operations. Section III presents the arithmetic operations, which use efficient algorithms with some modifications to improve latency. Section IV presents the simulation results obtained for an Altera FPGA. Section V presents the conclusion.

Standard IEEE 754 floating point formats:

Format             Sign (s)     Exponent (e)      Fraction or mantissa (m)   Bias
Single precision   1 bit [31]   8 bits [30-23]    23 bits [22-0]             127
Double precision   1 bit [63]   11 bits [62-52]   52 bits [51-0]             1023
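As a quick, illustrative aside (not part of the original HDL design), the field positions and the bias of 127 in the table above can be checked with a short Python sketch that unpacks a 32-bit single-precision word into its sign, exponent and mantissa fields:

    import struct

    def fp32_fields(x: float):
        """Split a Python float, viewed as an IEEE 754 single-precision
        word, into sign, biased exponent and fraction (mantissa) fields."""
        word = struct.unpack(">I", struct.pack(">f", x))[0]   # 32-bit pattern
        sign     = (word >> 31) & 0x1                         # bit  [31]
        exponent = (word >> 23) & 0xFF                        # bits [30-23], bias 127
        fraction = word & 0x7FFFFF                            # bits [22-0]
        return sign, exponent, fraction

    if __name__ == "__main__":
        s, e, f = fp32_fields(-6.5)     # -6.5 = (-1)^1 x 1.625 x 2^2
        print(s, e - 127, hex(f))       # prints: 1 2 0x500000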

NaN: NaN stands for Not A Number and is used to represent undefined or unpredictable results. When E = 255 and m != 0 the value is NaN; NaN also results from invalid operations such as 0/0, (-1)^(1/2), ±∞/±∞ and 0 × ±∞.

The value of a floating point number is given by n = (-1)^s x 1.F x 2^(e-bias). A 32-bit floating point number can therefore be represented as n = (-1)^s x 1.F x 2^(e-127), and a 64-bit floating point number as n = (-1)^s x 1.F x 2^(e-1023).

Sign bit: 0 represents a positive number and 1 represents a negative number.

Biasing the exponent: By adding a constant to the original exponent, the biased exponent is always a positive number. The constant depends on the number of bits available for the exponent. For a 32-bit number the exponent range is -126 to +127; adding 127 gives a biased range of +1 to +254 (0 and 255 are special cases).

Mantissa or significand: Most of the bits are allotted to the mantissa so that it has more precision. It consists of an implicit leading bit and the fraction bits.

Special cases: The special cases listed below are used in the design of the floating point unit.
1. If 1 <= E <= 254, then n = (-1)^s x 2^(E-127) x (1.F) is a normalized number.
2. If E = 255 and F != 0, then n = NaN.
3. If E = 255 and F = 0, then n = (-1)^s x ∞.
4. If E = 0 and F != 0, then n = (-1)^s x 2^(-126) x (0.F) is an underflow (de-normalized) value.
5. If E = 0 and F = 0, then n = (-1)^s x 0, i.e. positive or negative zero.

Normalization: In order to maximize the number of representable values, floating point numbers are typically stored in normalized form.

De-normalized: If the exponent is all 0s but the fraction is non-zero, the value is a de-normalized number, which does not have an assumed leading 1 before the binary point. It represents the number (-1)^s x 0.F x 2^(-126), where s is the sign bit and F is the fraction.

Overflow: Overflow occurs when the result of an ALU operation exceeds the maximum exponent range. It can occur not only in multiplication and division but also in addition/subtraction, when the two operands have the same sign.

Underflow: Underflow occurs when the result of an operation is too small to be represented in the exponent field. Roughly speaking, it occurs when the result of an arithmetic operation is so small that it cannot be stored in its intended destination format without suffering a rounding error larger than usual.
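The five cases above map directly onto a small decoder. The following Python sketch (an illustrative software model, not the paper's HDL) classifies a 32-bit pattern according to these rules and reconstructs its value using n = (-1)^s x 2^(E-127) x 1.F for normalized numbers and (-1)^s x 2^(-126) x 0.F for de-normalized ones:

    def classify_fp32(word: int):
        """Classify a 32-bit IEEE 754 pattern per special cases 1-5 above
        and return a (kind, value) pair."""
        s = (word >> 31) & 0x1
        e = (word >> 23) & 0xFF
        f = word & 0x7FFFFF
        sign = -1.0 if s else 1.0

        if e == 255:
            return ("NaN", None) if f != 0 else ("infinity", sign * float("inf"))
        if e == 0:
            if f == 0:
                return "zero", sign * 0.0                         # +0 or -0
            return "de-normalized", sign * (f / 2**23) * 2**-126  # 0.F x 2^-126
        return "normalized", sign * (1 + f / 2**23) * 2**(e - 127)  # 1.F x 2^(E-127)

    if __name__ == "__main__":
        print(classify_fp32(0x7F800000))   # ('infinity', inf)
        print(classify_fp32(0x00000001))   # smallest positive de-normalized value
        print(classify_fp32(0x3F800000))   # ('normalized', 1.0)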

Fig 1: Block Diagram of Floating Point Addition and Subtraction

2. FLOATING POINT ADDITION/SUBTRACTION

Using the Classic Timing Analyzer tool, it was observed that the new multiplier design using a carry look-ahead adder has less delay than the other multiplier designs. The power consumption of the floating point addition/subtraction designs was calculated with the PowerPlay Analyzer tool in Quartus II, mapping to the Cyclone II target device family. It was observed that the power consumption of all the multiplier designs is more or less the same. Thus the paper arrives at a new multiplier design with less delay and no penalty in power consumption [2].

The block diagram of the 32-bit floating point addition/subtraction is shown in Fig 1.

Steps in the floating point addition/subtraction algorithm:

Step 1: (i) Read the two floating point numbers. (ii) Subtract the exponents; if the sign bit of the difference is 1, swap the two mantissas, otherwise do not swap. (iii) The magnitude of the exponent difference is used to shift the mantissa with the lower exponent.

Step 2: (i) This step is performed by a 2:1 multiplexer. (ii) The output of the mux is exponent A if the sign bit is 0, or exponent B if the sign bit is 1; this sign bit is obtained from Step 1.

Step 3: (i) The key block of the addition/subtraction algorithm is the combinational control network. (ii) This network determines which operation has to be performed on the two mantissas. (iii) Depending on the two operand sign bits and the add/subtract control, different operations are performed for different combinations of inputs.

Step 4: After adding/subtracting the two mantissas, normalize the resulting mantissa and adjust the exponent.

Fig 2: Flow Chart Representing the Floating Point Multiplier

Floating Point Multiplication and Division:

The floating point multiplication process is much simpler than addition and subtraction.

Floating point multiplication algorithm:
i. First extract the sign, exponent and mantissa of the two operands.
ii. The sign bit of the result is obtained by XORing the sign bits of the two operands.
iii. Convert the biased exponents to unbiased notation, add the exponents, and represent the resulting exponent in biased notation again.
iv. Multiply the mantissas of the two operands; this result must be normalized, meaning the mantissa product lies in the range [1, 4).
v. The result is normalized by shifting left or right, with the exponent adjusted correspondingly.
vi. After normalization the result is rounded to the nearest value.

Rounding: If two twenty-four-bit operands are multiplied, the result is forty-eight bits wide, so the extra bits have to be discarded to bring the result back to the mantissa format. This is done by rounding. For example:

1.11011010001101010010110 | G R S

Three extra bits are considered: G (guard), R (round) and S (sticky).
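To make steps i-vi and the G/R/S rounding concrete, here is a bit-level Python sketch of a single-precision multiply (an illustrative model, assuming normalized, finite inputs; the paper's actual design is an HDL datapath, and exponent overflow/underflow is not handled here):

    def fp32_mul(a_word: int, b_word: int) -> int:
        """Multiply two IEEE 754 single-precision bit patterns (normalized,
        finite inputs assumed) and return the rounded result pattern."""
        def fields(w):
            return (w >> 31) & 1, (w >> 23) & 0xFF, w & 0x7FFFFF

        sa, ea, fa = fields(a_word)
        sb, eb, fb = fields(b_word)

        sign = sa ^ sb                              # step ii: XOR of sign bits
        exp = (ea - 127) + (eb - 127) + 127         # step iii: unbias, add, re-bias
        prod = (fa | 1 << 23) * (fb | 1 << 23)      # step iv: 24x24 -> 48-bit product

        if prod & (1 << 47):                        # step v: product in [2,4) -> shift right
            exp += 1
            shift = 24
        else:                                       # product already in [1,2)
            shift = 23
        mant = prod >> shift                        # 24-bit significand incl. leading 1
        rest = prod & ((1 << shift) - 1)            # discarded low bits
        guard  = (rest >> (shift - 1)) & 1                       # G
        rnd    = (rest >> (shift - 2)) & 1                       # R
        sticky = 1 if rest & ((1 << (shift - 2)) - 1) else 0     # S

        # step vi: round to nearest (even on ties), renormalize if the mantissa overflows
        if guard and (rnd or sticky or (mant & 1)):
            mant += 1
            if mant == 1 << 24:
                mant >>= 1
                exp += 1

        return (sign << 31) | ((exp & 0xFF) << 23) | (mant & 0x7FFFFF)

    if __name__ == "__main__":
        import struct
        to_bits  = lambda x: struct.unpack(">I", struct.pack(">f", x))[0]
        to_float = lambda w: struct.unpack(">f", struct.pack(">I", w))[0]
        print(to_float(fp32_mul(to_bits(1.5), to_bits(2.5))))   # 3.75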

Floating point division algorithm:

The non-restoring algorithm is used for division.
i. Read the operands.
ii. Compute the exponent of the result by subtracting the exponents of the two operands; the result is then re-biased.
iii. In the non-restoring algorithm, first load register A with zero, register B (referred to as M in the loop below) with the divisor, and Q with the dividend. The following loop is executed n times, where n is the number of dividend/divisor bits:
   (a) A and Q are shifted left by one bit.
   (b) If the sign bit of A is zero, A and M are subtracted, the result is stored in A, and the LSB of Q is set to 1.
   (c) If the sign bit of A is one, A and M are added, the result is stored in A, and the LSB of Q is set to 0.
After the loop, if the sign bit of A is zero the operation ends; otherwise A and M are added (final correction of the remainder).

Fig 3: Flow Chart for Non-Restoring Division
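A minimal Python model of this non-restoring loop is sketched below for unsigned n-bit integers (mantissa magnitudes only; sign and exponent handling, normalization and rounding are omitted). Following the textbook formulation of the algorithm, the quotient bit written into the LSB of Q is taken from the sign of A after the add/subtract, which is assumed to be the intent of steps (b) and (c):

    def nonrestoring_divide(dividend: int, divisor: int, n: int):
        """Non-restoring division of two n-bit unsigned integers.
        A is the partial remainder, Q holds the dividend and collects
        quotient bits, M is the divisor. Returns (quotient, remainder)."""
        A, Q, M = 0, dividend, divisor
        for _ in range(n):
            # shift the combined register A:Q left by one bit
            A = (A << 1) | ((Q >> (n - 1)) & 1)
            Q = (Q << 1) & ((1 << n) - 1)
            if A >= 0:                  # sign bit of A is 0 -> subtract M
                A -= M
            else:                       # sign bit of A is 1 -> add M
                A += M
            Q |= 1 if A >= 0 else 0     # LSB of Q records the new sign of A
        if A < 0:                       # final correction: restore the remainder
            A += M
        return Q, A

    if __name__ == "__main__":
        print(nonrestoring_divide(29, 5, 8))   # (5, 4): 29 / 5 = 5 remainder 4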

Functional analysis and synthesis results:

Addition: Synthesis results (Quartus II Web Edition, 03/24/2010, SP2):
  Met timing requirements:             Yes
  Total logic elements:                414 / 18,752 (2%)
  Total combinational functions:       413 / 18,752 (2%)
  Dedicated logic registers:           144 / 18,752 (<1%)
  Total registers:                     144
  Total pins:                          98 / 315 (31%)
  Total virtual pins:                  0
  Total memory bits:                   0 / 239,616 (0%)
  Embedded multiplier 9-bit elements:  0 / 52 (0%)
  Total PLLs:                          0 / 4 (0%)

Fig 3: Functional Verification of Adder

Subtraction: Synthesis results:
  Met timing requirements:             No
  Total logic elements:                414 / 18,752 (2%)
  Total combinational functions:       409 / 18,752 (2%)
  Dedicated logic registers:           144 / 18,752 (<1%)
  Total registers:                     144
  Total pins:                          98 / 315 (31%)
  Total virtual pins:                  0
  Total memory bits:                   0 / 239,616 (0%)
  Embedded multiplier 9-bit elements:  0 / 52 (0%)
  Total PLLs:                          0 / 4 (0%)

Fig 4: Functional Verification of Subtraction

Multiplication: Synthesis results:
  Met timing requirements:             Yes
  Total logic elements:                127 / 18,752 (<1%)
  Total combinational functions:       127 / 18,752 (<1%)
  Dedicated logic registers:           32 / 18,752 (<1%)
  Total registers:                     32
  Total pins:                          97 / 315 (31%)
  Total virtual pins:                  0
  Total memory bits:                   0 / 239,616 (0%)
  Embedded multiplier 9-bit elements:  7 / 52 (13%)
  Total PLLs:                          0 / 4 (0%)

Fig 5: Functional Verification of Multiplication

Division:

Fig 6: Functional Verification of Division in ModelSim

Power analysis (PowerPlay Analyzer):

Addition/Subtraction, Normal:
  Total thermal power dissipation:          202.18 mW
  Core dynamic thermal power dissipation:    21.48 mW
  Core static thermal power dissipation:     47.59 mW
  I/O thermal power dissipation:            133.11 mW
  Estimation confidence: Low (user provided insufficient toggle rate data)

Addition/Subtraction, Reduced:
  Total thermal power dissipation:           86.00 mW
  Core dynamic thermal power dissipation:     5.63 mW
  Core static thermal power dissipation:     47.38 mW
  I/O thermal power dissipation:             32.99 mW
  Estimation confidence: Medium (user provided moderately complete toggle rate data)

Multiplication, Normal:
  Total thermal power dissipation:          125.04 mW
  Core dynamic thermal power dissipation:    19.15 mW
  Core static thermal power dissipation:     47.45 mW
  I/O thermal power dissipation:             58.44 mW
  Estimation confidence: Low (user provided insufficient toggle rate data)

Multiplication, Reduced:
  Total thermal power dissipation:           94.45 mW
  Core dynamic thermal power dissipation:     5.83 mW
  Core static thermal power dissipation:     47.40 mW
  I/O thermal power dissipation:             41.22 mW
  Estimation confidence: Medium (user provided moderately complete toggle rate data)

3. CONCLUSION

All the floating point ALU modules were designed from a block diagram approach; synthesis, functional verification and power analysis for addition, subtraction and multiplication were performed in Altera Quartus II, and functional simulation for division was performed in ModelSim. The floating point ALU presented here is designed for normalized inputs. The ALU can be extended to perform square root, exponential and logarithmic operations, and pipelining the FPU can further increase its efficiency. To increase the accuracy of the mantissa, the unit can be implemented in the IEEE 754 double precision format. Based on Figures 1 to 6 and the functional verification of the various ALU functions in ModelSim, the proposed multiplication takes 25% less total thermal power dissipation, 88% less core dynamic thermal power dissipation and 30% less I/O thermal power dissipation. The proposed adder/subtractor takes 57% less total thermal power dissipation, 45% less core static thermal power dissipation, 73% less core dynamic thermal power dissipation and 75% less I/O thermal power dissipation, as per the results reported above by the Quartus II Web Edition (03/24/2010, SP2) tool.

REFERENCES

[1] R. Zimmermann, "Efficient VLSI Implementation of Modulo (2^n ± 1) Addition and Multiplication," Proc. 14th IEEE Symp. Computer Arithmetic, pp. 158-167, Apr. 1999.
[2] H.T. Vergos, C. Efstathiou, and D. Nikolos, "Diminished-One Modulo 2^n + 1 Adder Design," IEEE Trans. Computers, vol. 51, no. 12, pp. 1389-1399, Dec. 2002.
[3] H.T. Vergos and C. Efstathiou, "Efficient Modulo 2^n + 1 Adder Architectures," Integration, the VLSI Journal, vol. 42, no. 2, pp. 149-157, Feb. 2009.
[4] H.T. Vergos and G. Dimitrakopoulos, "On Modulo 2^n + 1 Adder Design," IEEE Trans. Computers, vol. 61, no. 2, pp. 173-186, 2012.
[5] S.-H. Lin and M.-H. Sheu, "VLSI Design of Diminished-One Modulo 2^n + 1 Adder Using Circular Carry Selection," IEEE Trans. Circuits and Systems II, vol. 55, no. 9, pp. 897-901, Sept. 2008.
[6] Dhanabal R, Bharathi V, Saira Salim, Bincy Thomas, Hyma Soman, and Sarat Kumar Sahoo, "Design of 16-Bit Low Power ALU-DBGPU," International Journal of Engineering and Technology (IJET), 2013.
[7] R. Dhanabal, V. Bharathi, Anand N, George Joseph, Suwin Sam Oommen, and Sarat Kumar Sahoo, "Comparison of Existing Multipliers and Proposal of a New Design for Optimized Performance," International Journal of Engineering and Technology (IJET), 2013.
[8] R. Dhanabal and Ushashree, "Implementation of a High Speed Single Precision Floating Point Unit Using Verilog," International Journal of Computer Applications (0975-8887), 2013.