VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

Similar documents
Power and Area Efficient Implementation for Parallel FIR Filters Using FFAs and DA

R. Solomon Roach 1, N. Nirmal Singh 2.

MCM Based FIR Filter Architecture for High Performance

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

FIR Filter Architecture for Fixed and Reconfigurable Applications

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

Implementation of a Low Power Decimation Filter Using 1/3-Band IIR Filter

Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

Fault Tolerant Parallel Filters Based on ECC Codes

Fixed Point LMS Adaptive Filter with Low Adaptation Delay

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay

DUE to the high computational complexity and real-time

Implementation of Reduce the Area- Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

AnEfficientImplementationofDigitFIRFiltersusingMemorybasedRealization

RISC IMPLEMENTATION OF OPTIMAL PROGRAMMABLE DIGITAL IIR FILTER

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

An Efficient Flexible Architecture for Error Tolerant Applications

Critical-Path Realization and Implementation of the LMS Adaptive Algorithm Using Verilog-HDL and Cadence-Tool

High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications

The Efficient Implementation of Numerical Integration for FPGA Platforms

International Journal for Research in Applied Science & Engineering Technology (IJRASET) IIR filter design using CSA for DSP applications

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Modified Welch Power Spectral Density Computation with Fast Fourier Transform

Design Optimization Techniques Evaluation for High Performance Parallel FIR Filters in FPGA

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Design of Adaptive Filters Using Least P th Norm Algorithm

An Efficient Constant Multiplier Architecture Based On Vertical- Horizontal Binary Common Sub-Expression Elimination Algorithm

Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation

SCALABLE IMPLEMENTATION SCHEME FOR MULTIRATE FIR FILTERS AND ITS APPLICATION IN EFFICIENT DESIGN OF SUBBAND FILTER BANKS

Batchu Jeevanarani and Thota Sreenivas Department of ECE, Sri Vasavi Engg College, Tadepalligudem, West Godavari (DT), Andhra Pradesh, India

DESIGN AND IMPLEMENTATION BY USING BIT LEVEL TRANSFORMATION OF ADDER TREES FOR MCMS USING VERILOG

Analysis of Radix- SDF Pipeline FFT Architecture in VLSI Using Chip Scope

Digital Filter Synthesis Considering Multiple Adder Graphs for a Coefficient

VLSI Implementation of Parallel CRC Using Pipelining, Unfolding and Retiming

Low-Power FIR Digital Filters Using Residue Arithmetic

Design of 2-D DWT VLSI Architecture for Image Processing

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online):

ISSN Vol.08,Issue.12, September-2016, Pages:

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

Implementation of Two Level DWT VLSI Architecture

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Fault Tolerant Parallel Filters Based On Bch Codes

Design of Low-Delay FIR Half-Band Filters with Arbitrary Flatness and Its Application to Filter Banks

A Review on Fractional Delay FIR Digital Filters Design and Optimization Techniques

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 10 /Issue 1 / JUN 2018

Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Synthesis of DSP Systems using Data Flow Graphs for Silicon Area Reduction

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders

VHDL Implementation of Multiplierless, High Performance DWT Filter Bank

OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION

Volume 5, Issue 5 OCT 2016

INTEGER SEQUENCE WINDOW BASED RECONFIGURABLE FIR FILTERS.

Implementation of digit serial fir filter using wireless priority service(wps)

Vertical-Horizontal Binary Common Sub- Expression Elimination for Reconfigurable Transposed Form FIR Filter

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

Area-Delay-Power Efficient Carry-Select Adder

FOLDED ARCHITECTURE FOR NON CANONICAL LEAST MEAN SQUARE ADAPTIVE DIGITAL FILTER USED IN ECHO CANCELLATION

A HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS

A High-Speed FPGA Implementation of an RSD- Based ECC Processor

ARITHMETIC operations based on residue number systems

IMPLEMENTATION OF DISTRIBUTED CANNY EDGE DETECTOR ON FPGA

A Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

Efficient Implementation of Low Power 2-D DCT Architecture

Analysis of Different Multiplication Algorithms & FPGA Implementation

Design and Implementation of Adaptive FIR filter using Systolic Architecture

Implementation of Fault Tolerant Parallel Filters Using ECC Technique

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Low power Comb Decimation Filter Using Polyphase

Design & Analysis of 16 bit RISC Processor Using low Power Pipelining

Exercises in DSP Design 2016 & Exam from Exam from

Parallel FIR Filters. Chapter 5

An Efficient High Speed VLSI Architecture Based 16-Point Adaptive Split Radix-2 FFT Architecture

Area Delay Power Efficient Carry Select Adder

THE orthogonal frequency-division multiplex (OFDM)

DESIGN AND IMPLEMENTATION OF DA- BASED RECONFIGURABLE FIR DIGITAL FILTER USING VERILOGHDL

Implimentation of A 16-bit RISC Processor for Convolution Application

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

CHAPTER 3 MULTISTAGE FILTER DESIGN FOR DIGITAL UPCONVERTER AND DOWNCONVERTER

CHAPTER 4. DIGITAL DOWNCONVERTER FOR WiMAX SYSTEM

Design and Implementation of Lifting Based Two Dimensional Discrete Wavelet Transform

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

Transcription:

IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 05, 2015 ISSN (online): 2321-0613 VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2 1 M.Tech Student 2 Assistant Professor 1,2 Department of Electronics and Communication Engineering 1 Ajay Kumar Garg Engineering College, Ghaziabad, India Abstract Filter occupies a major role in digital signal processing. The design of parallel FIR filter structures using Common sub expression elimination technique requires minimum number of multipliers and low power adders. As normally multiplier consume more area and power in comparison of adder. Moreover, along with the increase in length of parallel FIR filter number of adders does not increases. By using common sub expression elimination algorithms, the adopted parallel FIR structures use the basic nature of symmetric coefficients which reduces half the number of multipliers in Sub-Filter section at the amount of additional adders in preprocessing and post processing blocks. In term of silicon area, the weight of multiplier is more than adder so exchanging multipliers with adders is advantageous. Finally the proposed parallel FIR filter structures are superior from existing one in terms of consumption of area and power. Key words: I. INTRODUCTION The finite-impulse response (FIR) filter has been and continues to be one of the fundamental processing elements in any digital signal processing (DSP) system.fir filters are used in DSP applications that range from video and image processing to wireless communications. In some applications, such as video processing, the FIR filter circuit must be able to operate at high frequencies, while in other applications, such as cellular telephony, the FIR filter circuit must be a low-power circuit, capable of operating at moderate frequencies. Parallel, or block, processing can be applied to digital FIR filters to either increase the effective throughput or reduce the power consumption of the original filter. Traditionally, the application of parallel processing to a FIR filter involves the replication of the hardware units that exist in the original filter. If the area required by the original circuit is A, then the L-parallel circuit requires an area of L x A. With the continuing trend to reduce chip size and integrate multi-chip solutions into a single chip solution, it is important to limit the silicon area required to implement a parallel FIR digital filter in a VLSI implementation. In many design situations, the hardware overhead that is incurred by parallel processing cannot be tolerated due to limitations in design area. Therefore, it is advantageous to realize parallel FIR filtering structures that consume less area than traditional parallel FIR filtering structures. There have been a few papers proposing ways to reduce the complexity of the parallel FIR filter in the past [1] [9]. In [1] [4], polyphase decomposition is mainly manipulated, where the small-sized parallel FIR filter structures are derived first and then the larger block-sized ones can be constructed by cascading or iterating small-sized parallel FIR filtering blocks. Fast FIR algorithms (FFAs) introduced in [1] [3] shows that it can implement a L-parallel filter using approximately (2L-1) sub filter blocks, each of which is of length N/L. FFA structures successfully break the constraint that the hardware implementation cost of a parallel FIR filter has a linear increase along with the block size L. It reduces the required number of multipliers to (2N- N/L) from LXN. In [5] [9], the fast linear convolution is utilized to develop the small-sized filtering structures and then a long convolution is decomposed into several short convolutions, i.e., larger block-sized filtering structures can be constructed through iterations of the small-sized filtering structures. On the other hand, parallel and pipelining processing are two techniques used in DSP applications, which can both be exploited to reduce the power consumption. Pipelining shortens the critical path by interleaving pipelining latches along the data path, at the price of increasing the number of latches and the system latency, whereas parallel processing increase the sampling rate by replicating hardware so that multiple inputs can be processed in parallel and multiple outputs are generated at the same time, at the expense of increased area. Both techniques can reduce the power consumption by lowering the supply voltage, where the sampling speed does not increase. In this paper, parallel processing in the digital FIR filter will be discussed. II. WINDOW METHOD FOR FIR FILTER DESIGN The window method is the fast, easy and robust for FIR Filter digital but generally ordinary. By the convolution theorem of Fourier transforms it can easily be understood, making it informative to study after the windows for spectrum analysis and Fourier theorems. [-N, N] We would expect to be able to change the length of Filter to the interval, for some sufficiently large N, and obtain a good FIR filter which approximates the ideal filter. This would be an example of using the window method with the rectangular window. We can manage various trade-off by choosing the window carefully, so as to maximize the filter-design quality in a given application. The functions of Window are always time limited. Always FIR filter is design by window method rather than IIR filter. By using dual of the convolution theorem, point wise multiplication in the time domain corresponds to convolution in the frequency domain. III. PARALLEL PROCESSING FOR LOW POWER The throughput of the FIR filter can be increased by using parallel processing to a FIR filter. By the use of parallel processing and by lowering the voltage which in turn reduces the amount of power consumption. Let Po =Co Vo 2 fo represent the power consumed in the original FIR filter, where Co is the effective capacitance of the traditional filter, All rights reserved by www.ijsrd.com 869

Vo is the supply voltage of the traditional filter and fo is the clock frequency of the original filter. It should be noted that fo = 1/ To, where To is the clock period of the original filter. In order to maintain the same sample rate, the clock period of a L-parallel filter must be increased to L To since L samples are produced every clock cycle. The L copies of the original filter contain in L-parallel filter each of which has an effective capacitance of Co. This means that Co is charged in time L To rather than in time To. By examining the propagation delay considerations of the original and parallel filter, the power supply reduction factor, β, can be determined. The propagation delay of the original circuit is given by Tpd= 2 (1) where k is a process dependent parameter and Vt is the device threshold voltage. It should be noted that the clock period, To, is typically equal to the maximum propagation delay, Tpd, in a structure. The propagation delay of the L-parallel filter is given by LTpd= 2 (2) From (1) and (2) the following quadratic equation is obtained L (β Vo Vt ) 2 = β(vo Vt ) 2 (3) This equation is used to solve for β. Once β is obtained, the reduced power consumption of the FIR filter can be calculated using P = β 2 (LCo) Vo 2 ( fo/l) = β 2 Co Vo 2 fo (4) As can be seen, parallel processing leads to a reduction in power consumption by a factor of β 2. IV. FAST FIR ALGORITHM let {x i } and {h i } to be the input sequence and the Nth-order impulse response of an FIR filter respectively, the output sequence y(n) and the filter transfer function H(z) can be written as (5) in Fig 1 (5) The traditional L-parallel FIR filter can be shown Fig.1: Traditional 2 parallel FIR filter implementation Two-parallel FIR filter and Three-parallel FIR filter implementation using FFA as shown by Fig.2 and Fig.3 respectively. Fig. 2: Two parallel FIR filter implementation using FFA Fig. 3: Three parallel FIR filter implementation using FFA V. COMMON SUB EXPRESSION ELIMINATION ALGORITHM In this section, we will discuss a detailed description of CSE algorithm which will be able to solve Problem B pattern selection in which with arbitrary shifts within the input matrix the elimination of patterns will be done. Then, we will describe the changes which is necessary for the algorithm that can solve Problem A as well. The algorithm must complete the following tasks. 1) Look out the existence of multiple patterns in the input matrix. 2) Select one pattern for reduction. 3) Delete the entire selected pattern. Until there are no more multiple patterns present this should be regularly repeated. The input parameter show the number of nonzero bits in the examined patterns. In the first step, complete statistics of the pattern frequencies are created and an exhaustive search for all possible multiple-bit patterns is performed.as many different patterns will occur more than one time, so some criteria should be used to select the one for reduction or elimination. We select always the sequence of pattern with the highest frequency. In the second step, all outcomes of the selected sequence of pattern are eliminated, and at the bottom of the matrix sequence of pattern is added as a new line so that it can be searched out for the multiple patterns with smaller pattern later. Last, since the elimination of a pattern must effect the total frequency statistics of the remaining patterns, the complete information is present in global frequency statistic which has to be adjusted properly to reflect the changes. After all non zero bit with multiple pattern are processed, for nonzero bit patterns the whole cycle is repeated. A detailed discussion will be further concentrated on the following problems: A) Pattern identification; B) Pattern selection; C) Frequency statistics management; D) Adaptation of the algorithm for Problem A; E) Viability of the algorithm for large tasks; F) Applicability for similar CSE tasks. VI. PROPOSED 3X3 PARALLEL FIR DIGITAL FILTER STRUCTURE For N-tap three parallel FIR Filter as shown in Fig. 4, the total amount of saved multiplier would be the number of sub filter blocks that contain symmetric coefficients times half the number of multiplications in a single sub filter blocks. All rights reserved by www.ijsrd.com 870

Compared with the existing FFA three-parallel FIR filter structure, the proposed structure leads to two more sub filter blocks, which contains symmetric coefficients. A comparison figure between the existing FFA three parallel FIR filter and proposed FIR filter is shown in Fig.5 Where the shadow blocks stand for the sub filter blocks, which contain symmetric coefficients. Therefore, for an N-tap three-parallel FIR filter, the proposed structure can save N/3 multipliers from the existing FFA structure. However, it comes with the price of the increase in amount of adders, i.e., five additional adders, in pre-processing and post processing blocks. The number of multipliers and the number of adders required for the filtering structure (5) Where r is the number of FFAs used, Li is the block size of the FFA at step-i, Mi is the number of filters that result from the application of the ith FFA and N is the length of the filter. The number of required adders that is calculated as follows: * ( ) ( ) + Fig. 4: Proposed three parallel FIR filter implementation + [ [ ] (6) Fig. 5: Comparison of sub filter blocks between the existing FFA and the proposed structure. VII. PROPOSED DESIGN SIMULATION The block Level Implementation of 3X3 proposed FIR Filter of 81-tap is done on MATLAB R2012a as shown in Figure 6.Each Block of Proposed FIR Filter contains Sub filter blocks each of which is loaded with symmetric Coefficients based on Canonical Signed Digit Structures which reduces the hardware complexity of the Filter. The various output waveform of 81-tap direct form FIR filter based on window method is shown in Fig. 7. Fig. 6: Block Level implementation of 3x3 parallel FIR filter on MATLAB All rights reserved by www.ijsrd.com 871

with the help of number of slices used in Field Programmable Gate Arrays. Also the power utilization is measured with the help of Xilinx 14.1. Power consumption is depends on area utilized. i.e., number of slices to be used. Fig. 7(a): Magnitude Response Fig. 7(b): Step Response Fig. 7(c): Phase Response Fig. 7 (a) Magnitude Response, (b) Step Response, (c) Phase Response The main advantage of direct form Symmetric FIR is that it is no longer costly and can be operate on lower data sample rates. The simulation waveform of 81-tap FIR Filter is shown in Fig 8. Fig. 8: Output waveform of 81-tap FIR digital Filter structure Here simple 16-bit FIR filter has been designed with the help of common sub expression elimination algorithm. This above results shows the proposed 3X3 parallel linear phase fir filter of order 81 with symmetric coefficient in terms of Area and Power. Area can be calculated VIII. HARDWARE COMPLEXITY ANALYSIS A Comparison between the proposed CSE (Common Sub expression Elimination Algorithm) and the existing FFA structures for odd coefficients with different length under different level of parallelism is summarized in Table I. Algo L=3 Required multiplier Reduced multiplier Reduced Adders Sub Pre/ Post Adder Increased FFA 27-38 10 81-8 48 5 110 15 CSE 27-30 10 81-26 156 5 84 15 Table 1: Performance results of FFA and CSE Algorithm IX. IMPLEMENTATION In this work, sub filters blocks are used and implementation is done using MATLAB.Figure 6 shows the block diagram of proposed approach.each of the block is loaded with sub filters structures by using CSE Algorithm, and the blocks inside the subfilter structures is loaded with filter coefficients.finally,its simulation and synthesis results is analysed using Xilinx 14.1 ISE tool.xilinx 14.1 ISE tool is used to analyse Power and Area between existing FFA and Proposed CSE structure. Comparision of Power and Area is summarized in Table II. Algorithm Power(mW) Area (in slices) FFA 129.76 280861 CSE 102 79397 Table 2: Comparision of Power and Area Analysis X. CONCLUSION For reducing the number of multiplier in FIR Filter structure CSE algorithm is designed. The overall comparison between the FFA algorithm and CSE algorithm is that, by using FFA algorithm number of multiplier increases and by using CSE algorithm number of multipliers reduces at the expense of increasing the no of adders in preprocessing and post processing blocks. Simulation results and their performance were analyzed by using ISIM simulator and Xilinx 14.1. Thus the performance of the proposed method was improved compared to the existing approach. Finally, the symmetric coefficients for the parallel FIR filter using FFA are slightly increased in comparison to CSE approach. The area and power consumed by CSE algorithm is less as compared to the FFA algorithm. REFERENCES [1] D. A. Parker and K. K. Parhi, Low-area/power parallel FIR digital filter implementations, J. VLSI All rights reserved by www.ijsrd.com 872

Signal Process. Syst., vol. 17, no. 1, pp. 75 92,Sep. 1997. [2] J. G. Chung and K. K. Parhi, Frequency-spectrumbased low-area low power parallel FIR filter design, EURASIP J. Appl. Signal Process., vol. 2002, no. 9, pp. 444 453, Jan. 2002. [3] K. K. Parhi, VLSI Digital Signal Processing systems: Design and Implementation. New York: Wiley, 1999. [4] Z.-J. Mou and P. Duhamel, Short-length FIR filters and their use in fast non recursive filtering, IEEE Trans. Signal Process., vol. 39, no. 6, pp. 1322 1332, Jun. 1991. [5] J. I. Acha, Computational structures for fast implementation of L-path and L-block digital filters, IEEE Trans. Circuits Syst., vol. 36, no. 6, pp. 805 812, Jun. 1989. [6] C. Cheng and K. K. Parhi, Hardware efficient fast parallel FIR filter structures based on iterated short convolution, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 8, pp. 1492 1500, Aug. 2004. [7] C. Cheng and K. K. Parhi, Further complexity reduction of parallel FIR filters, in Proc. IEEE ISCAS, May 2005, vol. 2, pp. 1835 1838. [8] C. Cheng and K. K. Parhi, Low-cost parallel FIR structures with 2-stage parallelism, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 2, pp. 280 290, Feb. 2007. [9] I.-S. Lin and S. K. Mitra, Overlapped block digital filtering, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process. vol. 43, no. 8, pp. 586 596, Aug. 1996. [10] Y.-C. Tsao and K. Choi, Area-efficient parallel FIR digital filter structures for symmetric convolutions based on fast FIR algorithm, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 2, pp. 366 371, Feb. 2010. [11] Yu-Chi Tsao and Ken Choi, Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filter of Odd Length Based on Fast FIR Algorithm, IEEE Transaction on Circuits and Systems-II, vol. 59.no.6.pp.371-375, 2012. [12] Y.C. LIM, Design of Discrete-Coefficient-Value Linear Phase FIR Filters with Optimum Normalized Peak Ripple Magnitude, IEEE Transaction on Circuits And Systems I,vol.37. no.12.pp.1480-1486,1990. [13] P. P. Vaidyanathan, Optimal Design of Linear-Phase FIR Digital Filters with Very Flat Passbands and Equiripple Stopbands, IEEE Transactions on Circuits and Systems, vol. 32,no.9,pp. 904-917,1985. [14] P. K. Meher, S. Chandrasekaran, and A. Amira, FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic, IEEE Trans. Signal Process., Vol. 56, no. 7, pp. 3009 3017,2008. All rights reserved by www.ijsrd.com 873