Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter

Similar documents
HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

Vertical-Horizontal Binary Common Sub- Expression Elimination for Reconfigurable Transposed Form FIR Filter

An Efficient Constant Multiplier Architecture Based On Vertical- Horizontal Binary Common Sub-Expression Elimination Algorithm

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

MCM Based FIR Filter Architecture for High Performance

FPGA Implementation of Single Precision Floating Point Multiplier Using High Speed Compressors

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

DESIGN AND IMPLEMENTATION OF DA- BASED RECONFIGURABLE FIR DIGITAL FILTER USING VERILOGHDL

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay

IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017

Improved Design of High Performance Radix-10 Multiplication Using BCD Codes

High Speed Special Function Unit for Graphics Processing Unit

Parallel FIR Filters. Chapter 5

A HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier

High Performance and Area Efficient DSP Architecture using Dadda Multiplier

Optimized Design and Implementation of a 16-bit Iterative Logarithmic Multiplier

32-bit Signed and Unsigned Advanced Modified Booth Multiplication using Radix-4 Encoding Algorithm

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 10 /Issue 1 / JUN 2018

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

FIR Filter Architecture for Fixed and Reconfigurable Applications

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

DUE to the high computational complexity and real-time

Analysis of Different Multiplication Algorithms & FPGA Implementation

University, Patiala, Punjab, India 1 2

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

FOLDED ARCHITECTURE FOR NON CANONICAL LEAST MEAN SQUARE ADAPTIVE DIGITAL FILTER USED IN ECHO CANCELLATION

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

FPGA Based FIR Filter using Parallel Pipelined Structure

Efficient Implementation of Low Power 2-D DCT Architecture

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

A High Performance Reconfigurable Data Path Architecture For Flexible Accelerator

Digital Filter Synthesis Considering Multiple Adder Graphs for a Coefficient

Design of 2-D DWT VLSI Architecture for Image Processing

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications

Fixed Point LMS Adaptive Filter with Low Adaptation Delay

Performance Analysis of 64-Bit Carry Look Ahead Adder

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-6, Issue-8) Abstract:

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder

Batchu Jeevanarani and Thota Sreenivas Department of ECE, Sri Vasavi Engg College, Tadepalligudem, West Godavari (DT), Andhra Pradesh, India

Implementation of Double Precision Floating Point Multiplier Using Wallace Tree Multiplier

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit

Delay Optimised 16 Bit Twin Precision Baugh Wooley Multiplier

AnEfficientImplementationofDigitFIRFiltersusingMemorybasedRealization

An Efficient FPGA Implementation of the Advanced Encryption Standard (AES) Algorithm Using S-Box

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

International Journal of Innovative and Emerging Research in Engineering. e-issn: p-issn:

High Performance Architecture for Reciprocal Function Evaluation on Virtex II FPGA

Paper ID # IC In the last decade many research have been carried

Critical-Path Realization and Implementation of the LMS Adaptive Algorithm Using Verilog-HDL and Cadence-Tool

PROJECT REPORT IMPLEMENTATION OF LOGARITHM COMPUTATION DEVICE AS PART OF VLSI TOOLS COURSE

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

A Modified CORDIC Processor for Specific Angle Rotation based Applications

Efficient Radix-10 Multiplication Using BCD Codes

Design and Implementation of 3-D DWT for Video Processing Applications

High Throughput Radix-D Multiplication Using BCD

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Implementation of Reduce the Area- Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay

Fault Tolerant Parallel Filters Based On Bch Codes

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers

A Dedicated Hardware Solution for the HEVC Interpolation Unit

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Pipelined High Speed Double Precision Floating Point Multiplier Using Dadda Algorithm Based on FPGA

Design of Convolutional Codes for varying Constraint Lengths

The Efficient Implementation of Numerical Integration for FPGA Platforms

Design Optimization Techniques Evaluation for High Performance Parallel FIR Filters in FPGA

Run-Time Reconfigurable multi-precision floating point multiplier design based on pipelining technique using Karatsuba-Urdhva algorithms

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

Australian Journal of Basic and Applied Sciences

High Speed Multiplication Using BCD Codes For DSP Applications

Study, Implementation and Survey of Different VLSI Architectures for Multipliers

RADIX-4 AND RADIX-8 MULTIPLIER USING VERILOG HDL

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point

Area Delay Power Efficient Carry-Select Adder

Implimentation of A 16-bit RISC Processor for Convolution Application

FPGA Implementation of Discrete Fourier Transform Using CORDIC Algorithm

FPGA based Simulation of Clock Gated ALU Architecture with Multiplexed Logic Enable for Low Power Applications

Optimization Method for Broadband Modem FIR Filter Design using Common Subexpression Elimination

Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field

International Journal for Research in Applied Science & Engineering Technology (IJRASET) IIR filter design using CSA for DSP applications

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

Implementation of Double Precision Floating Point Multiplier in VHDL

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

Transcription:

African Journal of Basic & Applied Sciences 9 (1): 53-58, 2017 ISSN 2079-2034 IDOSI Publications, 2017 DOI: 10.5829/idosi.ajbas.2017.53.58 Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter M. Jenifer Shopana and S. Poorna Lekha Department of ECE, V V College of Engineering, Tirunelveli, Tamilnadu, India Abstract: This paper presentsan efficient multiplier architecture based Upon Vertical Horizontal Binary Common Subexpression Elimination (VHBCSE) algorithm and memory based structure for FIR filter synthesis is proposed. The key requirement of FIR filter is low complexity. In a Finite Impulse Response (FIR) filter, the multiplier plays a vital role which defines the performance of the filter. Binary Common Subexpression Elimination (BCSE) algorithm is introduced to eliminate the Common Subexpression (CS) in binary form for designing an efficient multiplier. First, horizontal BCSE algorithm is introduced to find the CSs occurring within each coefficient (multiplier) followed by vertical BCSE which uses CSs found across adjacent coefficients inorder to eliminate redundant computations. Also Look Up Table (LUT) based multiplication are suggested to reduce the size over the conventional design. As a result, the proposed multiplier offers good reduction in the number of slices and LUT usage in realizing FIR filters. The proposed scheme is coded in Verilog language and simulation is performed using Xilinx ISE 13.4. Key words: Binary Common Subexpression Elimination (BCSE) algorithm Common Subexpression Elimination (CSE) Finite Impulse Response (FIR) filter Look Up Table (LUT) INTRODUCTION The ever increasing growth in laptop and portable systems in cellular networks has intensified the research efforts in low power microelectronics. Today, there are numerous portable applications requiring low power and high throughput than ever before. For example, notebook and laptop computers, representing the fastest growing segment of the computer industry, are demanding the same computation capabilities as found in desktop machines. The developments in personal communication services (PCS s), employ complex speech compression algorithms and sophisticated radio modems in a pocket sized device. Thus, low power system design has become a significant performance goal. So the designers faced more constraints: high speed, high throughput, less silicon area and at the same time, consumes as minimal power as possible. As the scale of integration keeps growing, more and more sophisticated signal processing systems are being implemented on a VLSI chip [14]. These signal processing applications not only demand great computation capacity but also consume considerable amount of energy. While performance and area remain to be the two major design tools, power consumption has become a critical concern in today s VLSI system design. The FPGA is one of the best devices for hardware implementation of the DSP applications. The need for low-power VLSI system arises from two main forces. First, with the steady growth of operating frequency and processing capacity per chip, large currents have to be delivered and the heat due to large power consumption must be removed by proper cooling techniques. Second, battery life in portable electronic devices is limited. Low power design directly leads to prolonged operation time in these portable devices. The Finite Impulse Response (FIR) Filter is the important component for designing an efficient digital signal processing system [10]. When considered the elementary structure of an FIR filter, it is found that it is a combination of multipliers and delays, which in turn are the combination of adders. A system s performance is generally determined by the performance of the multiplier because the multiplier is generally the slowest element in Corresponding Author: M. Jenifer Shopana, Department of ECE, V V College of Engineering, Tirunelveli, Tamilnadu, India. 53

the system. Furthermore, it is generally the most area Polynomial Evaluation: Subexpression elimination can be consuming. Hence, optimizing the speed and area of the applied to polynomial evaluation to reduce the multiplier is a major design issue. Thus, the goal of the computational complexity. Suppose we are to evaluate the research is to design an efficient multiplier which could be polynomial: used to build FIR filter. + + + + (2) Subexpression Elimination: Constant multiplication can be implemented efficiently by using dedicated shift-and- If the redundancies in this calculation are not add multipliers. Subexpression elimination is a numerical considered, the evaluation of this polynomial would transformation of the constant multiplication that can lead require 22 multiplications. However, if the exponents are to efficient hardware implementations in terms of area, examined, the number of multiplications required to power and speed. Subexpression elimination is applied to evaluate this polynomial can be reduced. a set of constant multiplications with the same constant multiplicand. Subexpression elimination can be performed MATERIALS AND METHODS on constant multiplications that operate on a common variable. Essentially, subexpression elimination is the VHBCSE Algorithm: VHBCSE algorithm is a combination process of examining the shift-and-add implementations of vertical and horizontal BCSE which is used for of the constant multiplications and finding redundant designing a FIR filter. Vertical and horizontal BCSEs are operations. Once the redundancies are found, these the two types of BCSE which eliminates the Common operations can be performed once and shared among the Subexpressions (CSs) present across the adjacent constant multiplications [9]. coefficients (multipliers) and within the coefficients (multipliers) respectively in avhbcse algorithm. Vertical Multiple Constant Multiplication(MCM): Subexpression BCSE is more effective in CS elimination compared to the elimination can be applied to a set of constant multipliers horizontal BCSE. The proposed Vertical Horizontal Binary that multiply a common variable. The multiple constant Common Subexpression Elimination (VHBCSE) algorithm multiplication problem determines how subexpression is made of the following steps: elimination can be applied to the set of constant multipliers so that the number of shifts and additions Get the input of 16-bits (x[15:0]). required for implementation is minimized. One of the Store coefficients of 16-bits (h[15:0]) in LUT. attractive is its versatility and adaptability to various Partition coefficient (h[15:0]) into fixed groups of 2 forms of problem. bits each and use these groups as the select lines to the corresponding multiplexer (M7-M0) at layer1 as Linear Transformation: Subexpression elimination can shown in Fig. 1. also be applied to linear transformations. When Partition the coefficient into groups of 4 bits each subexpression elimination is applied to constant (h[15:12], h[11:8], h[7:4], h[3:0]). multiplication, the constants. The nonzero bits in the Compare h[15:12] with h[11:8] in layer 2. binary representation of a constant determine the number If match is found then skip the output of the adder of partial products that need to be generated and summed A2. in order to realize the constant multiplier [9]. Consider the Otherwise, take the output of the adder A2. general form of the linear transformation: y=t*x, where T Compare h[15:12] with h[7:4] in the layer-2. is m by n matrix, y is a length-m vector and x is a length-n If match is found then skip the output of the adder vector. Writing this in an equivalent form, we have: A3. Otherwise, compare h[11:8] with h[7:4] in the layer 2. n yi = t, 1, 2 j 1 ijxj i = (1) If match is found then skip the output of the adder = (A3). When considering linear transformations, the Otherwise, use the output of the adder (A3). subexpression elimination problem consists of 3 basic Compare h[15:12] with h[3:0] in the layer-2. st steps. The 1 step is to minimize the number of shifts and If match is found then skip the output of the adder additions required to compute the products tijx, j i=1,2. A4. 54

Fig. 1: Block Diagram of Multiplier and Final Addition Unit Otherwise compare h[11:8] with h[3:0] in the layer-2. unit at layer-1, controlled addition at layer-2, controlled If match is found then skip the output of the adder addition at layer-3, final addition at layer-4 and control (A4). logic generator. The data flow diagram of the proposed Otherwise, compare h[7:4] with h[3:0] in the layer-2. vertical-horizontal algorithm based Constant Multiplier If match is found then skip the output of the adder (CM) design is shown in Fig. 2. (A4). Functionality along with hardware architecture of Else use the output of the adder (A4). different blocks of the designed VHBCSE based multiplier Partition the coefficient into fixed group of 8-bit are explained in details below. (h[15:8], h[7:0]). Compare h[15:8] with h[7:0] in the layer 3. Partial Product Generator (PPG): In this algorithm, shift If match is found then skip the output of the adder and add based technique is used to generate the partial A6. product and summed up for producing the final Else take the output of the adder A6. multiplication result. The number of partial products is Obtain the final addition result by performing 1-bit defined by the Choice of the size of BCS. Partial right shift on the output of the adder A7. Product Generator block helps in reducing the size of Multiplication is completed. Store this result, h*x, in multiplexer. the register. Control Logic (CL) Generator: Control logic generator Hardware Architecture of VHBCSE Algorithm: Consider block takes the coefficient (H[15:0]) as its input and the length of the input as Xin and coefficient as H with groups it into one of 4-bit each (H[15:12], H[11:8], H[7:4] 16-bit and 16-bit respectively while the output is assumed and H[3:0]). Another of 8-bit each (H[15:8], H[7:0]). to be 32-bit long. First the input (Xin) is stored in the register and the coefficients are stored in the LUTs. The Multiplexer Unit: The multiplexer unit is used to select data flow diagram of the proposed vertical-horizontal the data generated from the PPG unit depending on the algorithm based on constant multiplier (CM) design coefficients binary value.this is used to reduce the consists of partial product generator block, multiplexer hardware and power consumption. 55

Fig. 2: Flow Diagram of VHBCSE algorithm Fig. 3: LUT Multiplier Based FIR Filter Controlled Addition at Layer-2: The Partial Products (PP) generated from eight groups of two-bit BCSs are added to produce the final multiplication results which have been performed in three layers. Controlled Addition at Layer-3: The four multiplexed sums (AS1, AS2, AS3 and AS4) generated from layer-2 are summed up in layer-3. In our algorithm, controlled additions are performed, instead of direct addition of these four sums as shown in Figure 3. This addition (A6) is controlled by the control signal (C7) which is generated based on 8-bit BCSE from the CS generator block. Final Addition at Layer-4: This block performs the addition operation between the two sums (AS5-AS6) produced by layer-3 to finally produce the multiplication result between the input and the coefficient. 56

Look Up Table (LUT)-Multiplier Based Filter: The memory-based structures or memory-based systems the memory elements like RAM or ROM is employed either as a section or whole of an arithmetic unit. Memory-based structures are more regular compared with the multiply-accumulate structures and have several alternative benefits. Example, larger potential for high-throughput and reduced-latency implementation, (since the memoryaccess-time is much shorter than the usual multiplicationtime) and are expected to own less dynamic power consumption due to less switching activities for memory-read operations compared to the standard multipliers. Memory-based structures are well suited for several digital signal processing (DSP) algorithms, thatinvolve multiplication with a set of coefficients. Figure 3 shows Look Up Table (LUT) multiplier based FIR filter which uses the VHBCSE algorithm in combination with the LUT approach. RESULTS AND DISCUSSION The proposed VHBCSE-LUT based multiplier is coded in Verilog language. The simulation results are obtained using Xilinx ISE 13.4. Using VHBCSE algorithm simulation is performed for 16 bit multiplication, x as its multiplicand and h as its multiplier which eliminates the common subexpression and produce the result. The Figure 4 shows the simulation result of vertical horizontal binary common subexpression algorithm based multiplier for FIR filter. The Figure 5 shows the common subexpression found during multiplication operation. From the Figure 6, the input Xin is given as 1010101010101010, initially the reset is given 1 and then reset is given 0. When the reset is given 0 the results of h0, h1, h2, h3, h4 are loaded for each clock cycle. The Figure 7, shows that there is a considerable reduction in area based on the based on the usage of slices and LUT. The device utilization of VHBCSE algorithm as shown in Figure 7, the number of slices utilized is 469, number of 4 input LUT used is 838. The device utilization of VHBCSE and LUT based multiplier for FIR filter is shown in Figure 7, the number of slices utilized is 438, number of 4 input LUT used is 786. Based on the number of slices utilized and LUTs the result can be compared. The proposed VHBCSE-LUT algorithm significantly reduces area consumption when compared to the VHBCSE algorithm. Fig. 4: Waveform of VHBCSE algorithm based multiplier Fig. 5: Waveform of common subexpression Fig. 6: Waveform of VHBCSE and LUT based multiplier for FIR filter Fig. 7: Performance analysis interms of slice and LUT utilization 57

CONCLUSION 6. Ko, H.J. and S.F. Hsiao, 2011. Design and application of faithfully rounded and truncated multipliers with The proposed VHBCSE and LUT-multiplier-based combined deletion, reduction, truncation and filter presents one new Vertical-Horizontal BCSE algorithm rounding, IEEE Transactions on Circuits Systems-II, which removes the initial common sub-expressions (CSs) Analog Digital Signal Processing, 58(5): 304-308. by applying 8 bit and 4 bit common expressions combined 7. Indranil Hatai, Indrajit Chakrabarti and Swapna with LUT method. New approaches to LUT-based-with Banerjee, 2015. An efficient constant multiplier multiplication are suggested to reduce the LUT-size over architecture based on vertical horizontal common that of conventional design. The proposed method has subexpression algorithm for reconfigurable FIR filter nearly the same memory requirementand less number of synthesis, IEEE Transactions on Circuits and slices and LUT usage than the existing methods. Systems-I: Regular Papers, 62(4): 1071-1080. 8. Dandapat, A., S. Ghosal, P. Sarkar and P. REFERENCES Mukhopadhyay, 2009. A 1.2 ns 16 16-Bit Binary Multiplier Using High Speed Compressors, 1. IndranilHatai, Indrajit Chakrabarti and S. Banarjee, International Journal of Electrical, Computer and 2015. An efficient VLSI architecture of a Systems Engineering, pp: 234-239. reconfigurable pulse-shaping FIR interpolation 9. Keshab K. Parhi, 2008. VLSI Signal Processing filter for multi-standard DUC, IEEE Transactions On Systems, John Wiley & Sons. Very Large Scale Integration (VLSI) Systems, rd 10. Morris Mano M., 2006. Digital Design, vol.3 23(6):1150-1154. Edition, Prentice Hall of India Private Limited. 2. Baugh, C.R. and B.A Wooley, 1973. A two s 11. Nazieh Botros, 2000. HDL Programming complement parallel array multiplication algorithm, fundamentals VHDL & Verilog, Da Vinci Engineering IEEE Transactions on Computer, c-22: 1045-1047. Press. 3. Meher, P.K., 2010. New approach to look-up-table 12. Wayne Wolf, 2008. Modern VLSI Design System-ondesign and memory-based realization of FIR digital rd Chip Design, vol. 3 Edition, prentice Hall. filter, IEEE Transactions on Circuits Systems-I, Regular Papers, 57(3): 592-603. 4. Wallace, C., 1965. A suggestion for a fast multiplier, IEEE Transactions on Electronics Computer, EC-13(1): 14-17. 5. Chia-Yu Yao, Hsin-Horng Chen, Tsuan-Fan Lin, Chiang-JuChien and Chun-Te Hsu, 2004. A Novel Common-Subexpression-Elimination Method for Synthesizing Fixed-Point FIR Filters, IEEE Transactions on Circuits and Systems-I: Regular Papers, 51(11): 2215-2221. 58