A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT


IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), e-ISSN: 2278-2834, p-ISSN: 2278-8735. Volume 7, Issue 2 (Jul. - Aug. 2013), PP 13-18

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

P. Sravanthi (1), CH. Srinivasa Rao (2), S. Madhava Rao (3)
(1) M.Tech student, MLE College of Engineering, S.Konda, Prakasam (Dt), A.P.
(2,3) Assoc. Professor, MLE College, Singarayakonda, Prakasam, A.P., India

Abstract: In this paper, a highly area-efficient multiplier-less FIR filter is presented. Distributed Arithmetic (DA) has been used to implement a bit-serial scheme of a general asymmetric version of an FIR filter, taking optimal advantage of the 3-input LUT-based structure of FPGAs. Implementing FIR filters on an FPGA with traditional arithmetic costs considerable hardware resources, which works against reducing the circuit scale and increasing the system speed. This paper presents the realization of area-efficient architectures using Distributed Arithmetic (DA) for the implementation of Finite Impulse Response (FIR) filters. The performance of bit-serial and bit-parallel DA, along with pipelined architectures and different quantized versions, is analyzed for FIR filter design. The Distributed Arithmetic structure is used to make more efficient use of hardware resources, while the pipeline structure is used to increase the system speed. In addition, the divided LUT method is used to decrease the required memory units. With Distributed Arithmetic, the MAC values can be precomputed and stored in a Look-Up Table (LUT) and read out according to the input data when needed. Therefore, a LUT can take the place of the MAC units so as to save hardware resources.
The simulation results indicate that FIR filters using Distributed Arithmetic work stably at high speed and save almost 50 percent of the hardware resources, which decreases the circuit scale, and that they can be applied in a variety of areas thanks to their flexibility and reliability. This method not only reduces the LUT size, but also modifies the structure of the filter to achieve high-speed performance.

Keywords: DSP, Digital Filters, FIR, FPGA, MAC, Distributed Arithmetic (DA), Divided LUT, pipeline

I. Introduction

In recent years, there has been a growing trend to implement digital signal processing functions in Field Programmable Gate Arrays (FPGAs). This calls for efficient architectures for digital signal processing functions such as FIR filters, which are widely used in video and audio signal processing, telecommunications, and other fields. Traditionally, direct implementation of a K-tap FIR filter requires K multiply-and-accumulate (MAC) blocks, which are expensive to implement in an FPGA due to logic complexity and resource usage. To resolve this issue, we first present DA, which is a multiplier-less architecture.

Implementing multipliers using the logic fabric of the FPGA is costly due to logic complexity and area usage, especially when the filter size is large. Modern FPGAs have dedicated DSP blocks that alleviate this problem; however, for very large filter sizes the challenge of reducing area and complexity remains. An alternative to computing the multiplications is to decompose the MAC operations into a series of lookup table (LUT) accesses and summations. This approach is termed distributed arithmetic (DA), a bit-serial method of computing the inner product of two vectors in a fixed number of cycles. The original DA architecture stores all the possible binary combinations of the coefficients h[k] of equation (1) in a memory or lookup table [5,7,9].
It is evident that for large values of L (the number of coefficients addressing the memory), the size of the memory containing the precomputed terms grows exponentially and becomes too large to be practical. The memory size can be reduced by dividing the single large memory ($2^L$ words) into m smaller memories, each of size $2^k$, where $L = m \times k$. The memory size can be further reduced to $2^{L-1}$ and $2^{L-2}$ by applying offset binary coding and exploiting the resulting symmetries in the contents of the memories [8,9]. This technique is based on the 2's complement binary representation of data, and the data can be precomputed and stored in a LUT.

As DA is a very efficient solution especially suited for LUT-based FPGA architectures, many researchers have put great effort into using DA to implement FIR filters in FPGAs. Patrick Longa introduced the structure of the FIR filter using the DA algorithm and the functions of each part. Sangyun Hwang analyzed the power consumption of the filter using the DA algorithm [3,7]. Heejong Yoo proposed a modified DA architecture that gradually replaces LUT requirements with multiplexer/adder pairs. But the main problem of DA is that the required LUT capacity increases exponentially with the order of the filter, given that DA implementations need $2^K$ words (K is the number of taps of the filter) [4,5,10]. And if K is a prime, the hardware resource consumption is even higher. To overcome these problems, this paper presents a hardware-efficient DA architecture.
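The memory arithmetic in the introduction can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes the address width divides evenly into the smaller memories, which the text does not require.

```python
def single_memory_words(address_bits):
    """Words needed by one undivided DA memory: 2**L."""
    return 2 ** address_bits


def divided_memory_words(address_bits, bits_per_memory):
    """Total words when the memory is split into m memories of 2**k words each (L = m * k)."""
    if address_bits % bits_per_memory != 0:
        # Assumption of this sketch: only the evenly divisible case is handled.
        raise ValueError("address_bits must be a multiple of bits_per_memory")
    m = address_bits // bits_per_memory
    return m * 2 ** bits_per_memory


if __name__ == "__main__":
    L, k = 16, 4
    print(single_memory_words(L))      # one large memory
    print(divided_memory_words(L, k))  # four small memories
```

For L = 16 and k = 4, the single memory needs 65536 words while the four divided memories need 64 words in total, at the price of extra adders to combine the partial results.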

This method not only reduces the LUT size, but also modifies the structure of the filter to achieve high-speed performance. The proposed filter has been designed and synthesized with ISE 7.1, and implemented on a 4VLX40FF668 FPGA device [2]. Our results show that the proposed DA architecture can implement FIR filters with higher speed and smaller resource usage than the previous DA architecture.

II. Design Methodology

2.1 Distributed Arithmetic FIR filter architecture

Distributed Arithmetic is one of the most well-known methods of implementing FIR filters. DA solves the computation of the inner product when the coefficients are known in advance, as is the case in FIR filters. An FIR filter of length K is described as:

$$y[n] = \sum_{k=0}^{K-1} h[k]\, x[n-k] \qquad (1)$$

where h[k] is the filter coefficient and x[k] is the input data. For convenience of analysis, $x'[k] = x[n-k]$ is used to rewrite equation (1) as:

$$y = \sum_{k=0}^{K-1} h[k]\, x'[k] \qquad (2)$$

Then we use (B+1)-bit two's complement binary numbers to represent the input data:

$$x'[k] = -2^{B}\, x_B[k] + \sum_{b=0}^{B-1} x_b[k]\, 2^{b} \qquad (3)$$

where $x_b[k]$ denotes the b-th bit of $x'[k]$, $x_b[k] \in \{0, 1\}$, and $x_B[k]$ is the sign bit. Substitution of (3) into (2) yields:

$$y = \sum_{k=0}^{K-1} h[k] \left( -2^{B}\, x_B[k] + \sum_{b=0}^{B-1} x_b[k]\, 2^{b} \right) = -2^{B} f(h[k], x_B[k]) + \sum_{b=0}^{B-1} 2^{b} f(h[k], x_b[k]) \qquad (4)$$

where

$$f(h[k], x_b[k]) = \sum_{k=0}^{K-1} h[k]\, x_b[k]$$

In equation (4), we observe that the sums of filter coefficients $f(h[k], x_b[k])$ can be pre-stored in a LUT and addressed by $x_b = [x_b[0], x_b[1], \ldots, x_b[K-1]]$. This way, the MAC blocks of the FIR filter are reduced to LUT accesses and summations. Digital filters using this arithmetic are implemented with registers, memory resources and a scaling accumulator.

The original LUT-based DA implementation of a 4-tap (K = 4) FIR filter is shown in Figure 2.1. The DA architecture includes three units: the shift register unit, the DA-LUT unit, and the adder/shifter unit.

Figure 2.1: Original LUT-based DA implementation of a 4-tap filter
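The LUT-plus-scaling-accumulator scheme of Section 2.1 can be sketched in software. The following Python is a behavioural illustration only, not the authors' hardware description: the function names are hypothetical, the coefficients are taken as integers, and each sample is assumed to be a (B+1)-bit two's complement integer whose sign bit carries weight $-2^B$.

```python
def build_da_lut(h):
    """Precompute f for every K-bit address: the sum of h[k] over the address bits that are 1."""
    K = len(h)
    return [sum(h[k] for k in range(K) if (addr >> k) & 1)
            for addr in range(2 ** K)]


def da_inner_product(h, x, B):
    """Compute sum_k h[k] * x[k] bit-serially using one LUT access per bit plane.

    x holds the delayed samples x'[k]; each must fit in B+1 bits (two's complement).
    """
    lo, hi = -(2 ** B), 2 ** B - 1
    if any(not lo <= v <= hi for v in x):
        raise ValueError("input sample does not fit in B+1 bits")
    lut = build_da_lut(h)
    mask = (1 << (B + 1)) - 1
    words = [v & mask for v in x]          # two's complement bit patterns
    acc = 0
    for b in range(B + 1):
        # The LUT address is bit b of every sample; sample k drives address bit k.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        term = lut[addr] * 2 ** b          # scaling by 2**b (a shift in hardware)
        acc += -term if b == B else term   # the sign-bit plane is subtracted
    return acc


if __name__ == "__main__":
    h = [3, -1, 4, 2]
    x = [5, -3, 7, -8]
    print(da_inner_product(h, x, B=3))
    print(sum(hk * xk for hk, xk in zip(h, x)))  # direct MAC for comparison
```

Both printed values agree, which is the point of the derivation: no multiplier is used, only B+1 table reads, shifts and additions per output sample.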

2.2 Distributed Arithmetic Architecture Without LUT

In Fig. 2.1, we can see that the lower half of the LUT (locations where $b_3 = 1$) is the same as the sum of the upper half of the LUT (locations where $b_3 = 0$) and h[3]. Hence, the LUT size can be reduced by half with an additional 2x1 multiplexer and a full adder, as shown in Figure 2.2.2(a). By repeating the same reduction procedure, we arrive at the final LUT-less DA architecture, as shown in Figure 2.2.2(b). On the other hand, the added combinational logic affects the filter performance. When the number of taps of the filter is a prime, we can use 4-input LUT units with additional multiplexers and full adders to trade off filter performance against resource usage.

Figure 2.2.2(a): Proposed DA architecture for a 4-tap filter ($2^3$-word LUT implementation of DA)

Figure 2.2.2(b): LUT-less DA architecture

III. Distributed Arithmetic With 4-Input Look-Up Table Architecture

The main advantage of FIR filters is their linear-phase response. It is only achieved with symmetrical or antisymmetrical structures which, furthermore, offer hardware reductions. As the filter order increases, the LUT size ($2^K$ words) grows exponentially. This makes the LUT-less method difficult to apply, because more and more combinational logic is needed. This led to a new architecture, which is explained below.

3.1 Four-input look-up table architecture

To contain the exponential growth of the LUT, we can divide the K-tap FIR filter into L groups of smaller M-tap filters ($K = L \times M$). Hence the LUT size is reduced to $L \cdot 2^M$ words, and then we have:

$$y = \sum_{k=0}^{LM-1} h[k]\, x'[k] = \sum_{l=0}^{L-1} \sum_{m=0}^{M-1} h[lM+m]\, x'[lM+m]$$
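Both observations above, the half-LUT symmetry of Section 2.2 and the grouping of Section 3.1, can be checked numerically. The Python below is a minimal sketch under stated assumptions: the helper names are hypothetical, coefficients are integers, and the number of taps is assumed to be a multiple of M.

```python
def build_da_lut(h):
    """Full DA table: entry addr holds the sum of h[k] for every address bit k that is 1."""
    K = len(h)
    return [sum(h[k] for k in range(K) if (addr >> k) & 1)
            for addr in range(2 ** K)]


def lower_half_matches(h):
    """Check that entries with the top address bit set equal the other half plus h[K-1]."""
    lut = build_da_lut(h)
    half = len(lut) // 2
    return all(lut[half + a] == lut[a] + h[-1] for a in range(half))


def divided_lut_lookup(h, address_bits, M=4):
    """Evaluate one bit plane with L small M-input LUTs and an adder instead of one 2**K table."""
    K = len(h)
    if K % M != 0:
        # Assumption of this sketch: K is a multiple of M.
        raise ValueError("number of taps must be a multiple of M")
    total = 0
    for l in range(K // M):
        group_lut = build_da_lut(h[l * M:(l + 1) * M])    # 2**M words per group
        group_addr = (address_bits >> (l * M)) & (2 ** M - 1)
        total += group_lut[group_addr]                     # partial sums are added
    return total


if __name__ == "__main__":
    print(lower_half_matches([3, -1, 4, 2]))
    h8 = [1, -2, 3, 4, -5, 6, 7, -8]
    full = build_da_lut(h8)
    print(all(divided_lut_lookup(h8, a) == full[a] for a in range(2 ** 8)))
```

For the 8-tap example, two 16-word tables and one adder reproduce every entry of the 256-word table; the tables are rebuilt on each call here only to keep the sketch short.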

It is best to set M = 4, because only under this condition can we take optimal advantage of the 4-input LUT structure of the FPGA, as shown in Fig. 3.1. If M = 4 cannot be obtained, for example when the number of taps is not a multiple of 4, we can use both the proposed DA-LUT unit and the original 4-input DA-LUT unit to achieve our goal.

Figure 3.1: Distributed arithmetic with 4-input look-up table (block diagram: input bits of X[0] ... X[4M-1] address the LUT groups, whose outputs are summed and fed to a +/- scaling accumulator, ACC)

IV. Simulation Results

Matlab and Modelsim are used as the simulation platforms. In Matlab, we compare the input and output waveforms to observe the performance of the designed filter, while Modelsim is used to observe the real-time implementation performance on the FPGA. To observe the performance of the designed filter, a 1.6 MHz square signal is generated as the input in Matlab, with a sampling frequency of 32 MHz. The data is converted into two's complement form and stored in a file named fir_in.txt. The data is later read as the input of the FPGA filter through a testbench in Modelsim; the output data is shown in the waveform window and also stored in a file, which can be read by Matlab for comparison and analysis.

[Simulation waveform] Original LUT-based DA implementation of a 4-tap filter

[Simulation waveform] LUT-less DA architecture for a 4-tap FIR filter
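The stimulus described in Section IV (a 1.6 MHz square wave sampled at 32 MHz and stored in two's complement form) could be generated along the following lines. The paper used Matlab; this Python sketch is only an illustration, and the 8-bit word width, the amplitude of 127, the one-binary-word-per-line file layout and all function names are assumptions that the paper does not specify.

```python
def square_wave(num_samples, f_signal=1.6e6, f_sample=32e6, amplitude=127):
    """Return integer samples of a square wave: 32 MHz / 1.6 MHz gives 20 samples per period."""
    period = round(f_sample / f_signal)
    half = period // 2
    return [amplitude if (n % period) < half else -amplitude
            for n in range(num_samples)]


def to_twos_complement(value, width=8):
    """Format a signed integer as a width-bit two's complement binary string."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    return format(value & ((1 << width) - 1), "0{}b".format(width))


def write_stimulus(path, samples, width=8):
    """Write one two's complement word per line (assumed layout for a testbench to read)."""
    with open(path, "w") as handle:
        for sample in samples:
            handle.write(to_twos_complement(sample, width) + "\n")


if __name__ == "__main__":
    samples = square_wave(40)
    write_stimulus("fir_in.txt", samples)
```

A Verilog testbench could then load such a file with `$readmemb`, although the paper does not say how its testbench reads the data.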

[Simulation waveform] 4-input look-up table architecture for high-order filters, without pipelining

[Simulation waveform] 4-input look-up table architecture, with pipelining

[Simulation waveform] Fully parallel DA architecture FIR filter, without pipelining

[Simulation waveform] Fully parallel DA architecture FIR filter, with pipelining

V. Conclusion And Future Scope

This paper presents the proposed DA architectures for high-order filters. The architectures reduce the memory usage by half at every iteration of the reduction, at the cost of a limited decrease in system frequency. We also divide high-order filters into several groups of small filters, which reduces the LUT size as well. To obtain a high-speed implementation of FIR filters, a full-parallel version of the DA architecture is adopted. We have successfully implemented a highly efficient 70-tap full-parallel DA filter, using both an original DA architecture and a modified DA architecture, on a 4VLX40FF668 FPGA device. The results show that the proposed DA architectures are hardware efficient for FPGA implementation.

The limitation of this project is that, although pipelining increases the speed, the resource usage increases to more than B times that of the bit-serial version, since pipeline registers and additional adders are needed. The number of LUTs required would also be enormous.

The future scope of this project is to improve the architecture of the Distributed Arithmetic FIR filter so that it uses the hardware resources of the latest FPGA families. In Virtex-5 and Virtex-6 family FPGAs, 6-input LUTs were introduced. Future work includes changing the architecture to use 6-input LUTs for storing coefficient sums and SRL (shift register logic) macros to implement shift operations, so that the total number of slices used is reduced.

References

[1]. Patrick Longa, Ali Miri, "Area-Efficient FIR Filter Design on FPGAs using Distributed Arithmetic", IEEE International Symposium on Signal Processing and Information Technology, pp. 248-252, 2006.
[2]. Sangyun Hwang, Gunhee Han, Sungho Kang, Jaeseok Kim, "New Distributed Arithmetic Algorithm for Low-Power FIR Filter Implementation", IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 463-466, May 2004.
[3]. Heejong Yoo, David V. Anderson, "Hardware-Efficient Distributed Arithmetic Architecture for High-Order Digital Filters", IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 5, pp. 125-128, March 2005.
[4]. Wangdian, Xingwang Zhuo, "Digital Systems Applications and Design Based on Verilog HDL", Beijing: National Defence Industry Press, 2006.
[5]. McClellan, J.H., Parks, T.W., Rabiner, L.R., "A computer program for designing optimum FIR linear phase digital filters", IEEE Trans. Audio Electroacoust., Vol. 21, No. 6, pp. 506-526, 1973. http://en.wikipedia.org/wiki/instruction_pipeline
[6]. http://cse.stanford.edu/class/sophomore-college/projects-00/risc/pipelining/index.html ; Uwe Meyer-Baese, Digital Signal Processing with FPGA [M], Beijing: Tsinghua University Press, 2006: 50-51.
[7]. Tsao Y C and Choi K, Area-Efficient Parallel FIR Digital Filter Structures for Symmetric Convolutions Based on Fast FIR Algorithm [J], IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, PP(99): 1-5.
[8]. Chao Cheng and Keshab Parhi, Low-Cost Parallel FIR Filter Structures With 2-Stage Parallelism [J], IEEE Transactions on Circuits and Systems I: Regular Papers, 2007, 54(2): 280-290.
[9]. Tearney G J and Bouma B E, Real-Time FPGA Processing for High-Speed Optical Frequency Domain Imaging [J], IEEE Transactions on Medical Imaging, 2009, 28(9): 1468-1472.