FIR Filter Architecture for Fixed and Reconfigurable Applications

Nagajyothi 1, P. Sayannna 2
1 M.Tech student, Dept. of ECE, Sudheer reddy college of Engineering & technology (w), Telangana, India
2 Associate Professor, Dept. of ECE, Sudheer reddy college of Engineering & technology (w), Telangana, India

Abstract

The main objective of this work is to design transpose form finite impulse response (FIR) filters, which are inherently pipelined and support the multiple constant multiplications (MCM) technique, resulting in savings in computation. The transpose form configuration, however, does not directly support block processing. A generalized block formulation is therefore presented for the transpose form FIR filter, and from it we derive a general multiplier-based architecture for the transpose form block filter for reconfigurable applications. A low-complexity design using the MCM scheme is also presented for the block implementation of fixed FIR filters. The proposed structure involves significantly less area delay product (ADP) and less energy per sample (EPS) than the existing block implementation of the direct-form structure for medium or large filter lengths, while for short-length filters the block implementation of the direct-form FIR structure has less ADP and less EPS than the proposed structure.

Index Terms: MCM, transpose form FIR filter, reconfigurable architecture, VLSI

I. INTRODUCTION

Rapidly developing technological areas such as mobile computing and portable multimedia and communication applications have increased the demand for low-power systems. Digital signal processing (DSP) is one of the technologies that contributes most to processing huge amounts of data at high operational speed in communication systems.
Many application systems based on DSP, especially recent next-generation communication systems, require extremely fast processing of large amounts of digital data. Most DSP applications, such as finite impulse response (FIR) filtering and the fast Fourier transform (FFT), require additions and multiplications. Many previous efforts to reduce the power consumption of FIR filters focus on optimizing the filter coefficients while maintaining a fixed filter order [3]-[5]. In those approaches, FIR filter structures are simplified to add and shift operations, and minimizing the number of additions/subtractions is one of the main aims of the research. One drawback of those approaches is that once the filter architecture is decided, the coefficients cannot be changed; therefore, those techniques are not suitable for FIR filters with programmable coefficients.

Approximate signal processing techniques [8] are also used for the design of low-power digital filters [9], [10]. In [11], the filter order varies dynamically according to the energy band of the input signal. However, the approach suffers from slow filter-order adaptation time due to the energy computations in the feedback mechanism. Previous studies in [9] show that sorting the data samples or filter coefficients before the convolution operation gives the FIR filter a desirable energy-quality characteristic. However, the overhead associated with the real-time sorting of incoming samples is too large.

Reconfigurable FIR filter architectures have previously been proposed for low-power implementations [9]-[11] or to realize various frequency responses using a single filter [12]. For low-power architectures, variable input word-length and filter taps [9], different coefficient word-lengths [8], and dynamic reduced signal representation [11] techniques are used. In those works, a large overhead is incurred to support reconfigurable schemes such as arbitrary nonzero digit assignment or programmable shift [10], [11].
IJIRT 143959 INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN TECHNOLOGY 350

The existing structure for the block FIR filter, based on the recurrence relation, is shown in Fig. 6 for the block size $L = 4$. It consists of one coefficient selection unit (CSU), one register unit (RU), $M$ inner product units (IPUs), and one pipeline adder unit (PAU). The CSU stores the coefficients of all the filters to be used for the reconfigurable application. It is implemented using $N$ ROM LUTs, such that the filter coefficients of any particular channel filter are obtained in one clock cycle, where $N$ is the filter length. The RU receives $x_k$ during the $k$th cycle and produces $L$ rows of $S_k^0$ in parallel. These $L$ rows are transmitted to the $M$ IPUs of the structure. The $M$ IPUs also receive $M$ short-weight vectors from the CSU such that, during the $k$th cycle, the $(m+1)$th IPU receives the weight vector $c_{M-m-1}$ from the CSU and the $L$ rows of $S_k^0$ from the RU. Each IPU performs the matrix-vector product of $S_k^0$ with the short-weight vector $c_m$ and computes a block of $L$ partial filter outputs ($r_k^m$). Therefore, each IPU performs $L$ inner-product computations of the $L$ rows of $S_k^0$ with a common weight vector $c_m$. Each IPU consists of $L$ $L$-point inner-product cells (IPCs). The $(l+1)$th IPC receives the $(l+1)$th row of $S_k^0$ and the coefficient vector $c_m$, and computes a partial result of the inner product $r(kL - l)$, for $0 \le l \le L - 1$. All $M$ IPUs work in parallel and produce $M$ blocks of results ($r_k^m$). These partial inner products are added in the PAU to obtain a block of $L$ filter outputs. In each cycle, the structure receives a block of $L$ inputs and produces a block of $L$ filter outputs, where the duration of each cycle is $T = T_M + T_A + T_{FA} \log_2 L$, $T_M$ is one multiplier delay, $T_A$ is one adder delay, and $T_{FA}$ is one full-adder delay.

Drawbacks: the existing structure provides only block performance, has a high delay, and occupies a large area.

II. LITERATURE SURVEY

Pramod Kumar Meher (2006) proposed a structure that involves significantly less memory and less area-delay complexity than the existing DA-based structures for circular convolution. It is also shown that the proposed systolic designs for circular convolution can be used for the computation of linear convolution.

Basant Kumar Mohanty and Pramod Kumar Meher (2015) explore the possibility of realizing block FIR filters in transpose form configuration for area-delay efficient realization of large-order FIR filters for both fixed and reconfigurable applications.

Yu Pan and Pramod Kumar Meher (2014) addressed the resource minimization problem in the scheduling of adder-tree operations for the MCM block, and presented a mixed integer programming (MIP) based algorithm for more efficient MCM-based implementation of FIR filters. Experimental results show that up to 15% reduction in area and 11.6% reduction in power (with averages of 8.46% and 5.96%, respectively) can be achieved on top of an already optimized adder/subtractor network of the MCM block.

Abbes Amira, Pramod Kumar Meher and Shrutisagar Chandrasekaran (2008) presented the design optimization of one- and two-dimensional fully pipelined computing structures for efficient implementation of FIR filters in terms of area, delay and power, using systolic decomposition of the inner-product computation based on distributed arithmetic (DA). The systolic decomposition scheme offers a flexible choice of the address length of the lookup tables (LUTs) for DA-based computation, which allows a suitable area-time trade-off. Using smaller address lengths for the DA-based computing units reduces the memory size, but increases the adder complexity and the latency.

III. PROPOSED METHOD

In several applications the coefficients of the FIR filter remain fixed, while other applications, such as the SDR channelizer, require separate FIR filters of different specifications to extract one of the desired narrowband channels from the wideband RF front end. These FIR filters need to be implemented in a reconfigurable FIR (RFIR) structure to support multistandard wireless communication. In this section, we present a structure of the block FIR filter for such reconfigurable applications, and we also discuss the implementation of the block FIR filter for fixed filters using the MCM scheme.

The proposed structure for the block FIR filter is shown in Fig. 1 for the block size $L = 4$. It consists of one coefficient selection unit (CSU), one register unit (RU), $M$ inner product units (IPUs), and one pipeline adder unit (PAU). The CSU stores the coefficients of all the filters to be used for the reconfigurable application. It is implemented using $N$ ROM LUTs, such that the filter coefficients of any particular channel filter are obtained in one clock cycle, where $N$ is the filter length. The RU receives $x_k$ during the $k$th cycle and produces $L$ rows of $S_k^0$ in parallel. These $L$ rows are transmitted to the $M$ IPUs of the proposed structure. The $M$ IPUs also receive $M$ short-weight vectors from the CSU such that, during the $k$th cycle, the $(m+1)$th IPU receives the weight vector $c_{M-m-1}$ from the CSU and the $L$ rows of $S_k^0$ from the RU. Each IPU performs the matrix-vector product of $S_k^0$ with the short-weight vector $c_m$ and computes a block of $L$ partial filter outputs ($r_k^m$). Therefore, each IPU performs $L$ inner-product computations of the $L$ rows of $S_k^0$ with a common weight vector $c_m$. In each cycle, the proposed structure receives a block of $L$ inputs and produces a block of $L$ filter outputs, where the duration of each cycle is $T = T_M + T_A + T_{FA} \log_2 L$, $T_M$ is one multiplier delay, $T_A$ is one adder delay, and $T_{FA}$ is one full-adder delay.

Fig. 1. Proposed structure for block FIR filter.
Fig. 2. (a) Internal structure of the RU for block size $L = 4$. (b) Structure of the $(m+1)$th IPU.

A. MCM-Based Implementation of Fixed-Coefficient FIR Filter

We discuss the derivation of MCM units for the transpose form block FIR filter and the design of the proposed structure for fixed filters. For fixed-coefficient implementation, the CSU is no longer required, since the structure is tailored to only one given filter. Similarly, the IPUs are not required. Instead, the multiplications are mapped to the MCM units for a low-complexity realization.
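To make the idea of mapping multiplications to an MCM unit concrete, the following is a minimal Python sketch, not the design used in this paper. It multiplies one input sample by a small set of constants using only shifts and additions/subtractions, once with each constant realized independently in canonical signed digit (CSD) form and once with a shared intermediate term. The constants 3, 13 and 15, the shared term 3x, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only: the constants 3, 13 and 15 and all function
# names are assumptions made for this example, not values from the paper.

def csd_digits(c):
    """Canonical signed digit (CSD) form of a positive integer c.

    Returns (shift, sign) pairs with sign in {+1, -1} such that
    c == sum(sign * 2**shift), with no two adjacent nonzero digits.
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            # c mod 4 == 1 gives digit +1, c mod 4 == 3 gives digit -1,
            # so the remaining value is always divisible by 4.
            sign = 2 - (c & 3)
            digits.append((shift, sign))
            c -= sign
        c >>= 1
        shift += 1
    return digits


def shift_add_multiply(x, c):
    """Multiply integer x by a positive constant c using shifts and adds/subtracts."""
    acc = 0
    for shift, sign in csd_digits(c):
        term = x << shift
        acc = acc + term if sign > 0 else acc - term
    return acc


def csd_adder_count(c):
    """Adders/subtractors needed when c is realized on its own in CSD form."""
    return len(csd_digits(c)) - 1


def mcm_block(x):
    """Multiply x by 3, 13 and 15 while sharing the subexpression 3x."""
    t3 = (x << 1) + x       # 3x, the shared term (adder 1)
    p13 = (t3 << 2) + x     # 12x + x = 13x (adder 2)
    p15 = (t3 << 2) + t3    # 12x + 3x = 15x (adder 3)
    return {3: t3, 13: p13, 15: p15}
```

With these assumed constants, realizing each product independently in CSD form takes 1 + 2 + 1 = 4 adders/subtractors, whereas the shared version uses 3. This reduction in shift-add operations is the kind of saving that common subexpression elimination in an MCM block aims for.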
In the following, we show that the proposed formulation for MCM-based implementation of the block FIR filter makes use of the symmetry in the input matrix $S_k^0$ to perform horizontal and vertical common subexpression elimination and to minimize the number of shift-add operations in the MCM blocks. The recurrence relation can be expressed as

$$Y(z) = z^{-1}\bigl(\cdots\bigl(z^{-1}\bigl(z^{-1} r_{M-1} + r_{M-2}\bigr) + r_{M-3}\bigr) + \cdots + r_1\bigr) + r_0,$$

where $R = S_k^0 C$.

Fig. 3. Proposed MCM structure.

B. Hardware and Time Complexities

The proposed structure for reconfigurable applications consists of one CSU, one RU, $M$ IPUs, and one PAU. The CSU consists of $N$ ROM units of $P$ words each, where $P$ is the number of FIR filters to be implemented by the proposed reconfigurable structure. We have excluded the complexity of the CSU from the performance comparison, since it is common to all RFIR structures. Each IPU comprises $L$ IP cells, where each IP cell involves $L$ multipliers and $(L - 1)$ adders. The RU involves $(L - 1)$ registers of $B$-bit width. The PAU involves $(M - 1)$ adders and the same number of registers, where each register has a width of $(B + B')$, with $B$ and $B'$ being the bit widths of the input sample and the filter coefficients, respectively. Therefore, the proposed structure involves $LN$ multipliers, $L(N - 1)$ adders, and $[B(N - 1) + B'(N - L)]$ flip-flops (FFs), and processes $L$ samples in every cycle, where the cycle period is $T = T_M + T_A + T_{FA} \log_2 L$.

IV. RESULTS AND ANALYSIS

The results presented establish a clear area advantage of the proposed FIR architecture over the prior architecture for typical filter parameters with comparable maximum clock rates.

Fig. 4. RTL schematic.
Fig. 5. Technology diagram.
Fig. 6. Simulation output.

V. CONCLUSION

In this paper, we have explored the possibility of realizing block FIR filters in transpose form configuration for area-delay efficient realization of both fixed and reconfigurable applications. A generalized block formulation is presented for the transpose form block FIR filter, and based on it we have derived a transpose form block filter for reconfigurable applications. We have presented a scheme to identify the MCM blocks for horizontal and vertical common subexpression elimination in the proposed block FIR filter for fixed coefficients, in order to reduce the computational complexity. Performance comparison shows that the proposed structure involves significantly less ADP and less EPS than the existing block direct-form structure for medium or large filter lengths, while for short-length filters the existing block direct-form structure has less ADP and less EPS than the proposed structure.

REFERENCES

[1] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
[2] T. Hentschel and G. Fettweis, "Software radio receivers," in CDMA Techniques for Third Generation Mobile Systems. Dordrecht, The Netherlands: Kluwer, 1999, pp. 257-283.
[3] H. Samueli, "An improved search algorithm for the design of multiplierless FIR filter with powers-of-two coefficients," IEEE Trans. Circuits Syst., vol. 36, no. 7, pp. 1044-1047, Jul. 1989.
[4] R. I. Hartley, "Subexpression sharing in filters using canonical signed digit multipliers," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 43, no. 10, pp. 677-688, Oct. 1996.
[5] O. Gustafsson, "A difference based adder graph heuristic for multiple constant multiplication problems," in Proc. IEEE Int. Symp. Circuits Syst., 2007, pp. 1097-1100.
[6] S. H. Nawab, A. V. Oppenheim, A. P. Chandrakasan, J. M. Winograd, and J. T. Ludwig, "Approximate signal processing," J. VLSI Signal Process., vol. 15, no. 1-2, pp. 177-200, Jan. 1997.
[7] J. Ludwig, H. Nawab, and A. P. Chandrakasan, "Low power digital filtering using approximate processing," IEEE J. Solid-State Circuits, vol. 31, no. 3, pp. 395-400, Mar. 1996.
[8] A. Sinha, A. Wang, and A. P. Chandrakasan, "Energy scalable system design," IEEE Trans. Very Large Scale Integr. Syst., vol. 10, no. 2, pp. 135-145, Apr. 2002.
[9] K.-H. Chen and T.-D. Chiueh, "A low-power digit-based reconfigurable FIR filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 617-621, Dec. 2006.
[10] R. Mahesh and A. P. Vinod, "New reconfigurable architectures for implementing filters with low complexity," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 29, no. 2, pp. 275-288, Feb. 2010.
[11] Z. Yu, M.-L. Yu, K. Azadet, and A. N. Wilson, Jr., "A low power FIR filter design technique using dynamic reduced signal representation," in Proc. Int. Symp. VLSI Tech., Syst., Appl., 2001, pp. 113-116.
[12] R. Mahesh and A. P. Vinod, "Coefficient decimation approach for realizing reconfigurable finite impulse response filters," in Proc. IEEE Int. Symp. Circuits Syst., 2008, pp. 81-84.

BIODATA

Author: Nagajyothi is presently pursuing her M.Tech (ES-VLSID) at Sudheer reddy college of Engineering & technology (w), Telangana, India.

Co-Author1: P. Sayannna received his M.Tech. and is presently working as Associate Professor in the Dept. of ECE, Sudheer reddy college of Engineering & technology (w), Telangana, India.