A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), e-ISSN: 2278-2834, p-ISSN: 2278-8735. Volume 7, Issue 2 (Jul. - Aug. 2013), PP 13-18

P. Sravanthi (1), CH. Srinivasa Rao (2), S. Madhava Rao (3)
(1) M.Tech student, MLE College of Engineering, S.Konda, Prakasam (Dt), A.P.
(2,3) Assoc. Professor, MLE College, Singarayakonda, Prakasam, A.P., India

Abstract: In this paper, a highly area-efficient multiplier-less FIR filter is presented. Distributed Arithmetic (DA) has been used to implement a bit-serial scheme of a general asymmetric version of an FIR filter, taking optimal advantage of the 3-input LUT-based structure of FPGAs. Implementing FIR filters on an FPGA with traditional arithmetic costs considerable hardware resources, which works against reducing circuit scale and increasing system speed. This paper presents the realization of area-efficient architectures using Distributed Arithmetic (DA) for the implementation of Finite Impulse Response (FIR) filters. The performance of bit-serial and bit-parallel DA, along with a pipelined architecture at different quantization levels, is analyzed for FIR filter design. The Distributed Arithmetic structure is used to improve resource utilization, while a pipeline structure is used to increase the system speed. In addition, the divided LUT method is used to decrease the required memory units. According to Distributed Arithmetic, we can build a Look-Up Table (LUT) that stores the MAC values and read out the values according to the input data when needed. Therefore, a LUT can take the place of the MAC units so as to save hardware resources.
The simulation results indicate that FIR filters using Distributed Arithmetic can work stably at high speed and can save almost 50 percent of the hardware resources, decreasing the circuit scale, and can be applied to a variety of areas thanks to their great flexibility and high reliability. This method not only reduces the LUT size, but also modifies the structure of the filter to achieve high-speed performance.

Keywords: DSP, Digital Filters, FIR, FPGA, MAC, Distributed Arithmetic (DA), Divided LUT, pipeline

I. Introduction

In recent years, there has been a growing trend to implement digital signal processing functions in Field Programmable Gate Arrays (FPGAs). In this sense, we need to put great effort into designing efficient architectures for digital signal processing functions such as FIR filters, which are widely used in video and audio signal processing and telecommunications. Traditionally, direct implementation of a K-tap FIR filter requires K multiply-and-accumulate (MAC) blocks, which are expensive to implement in an FPGA due to logic complexity and resource usage. To resolve this issue, we first present DA, which is a multiplier-less architecture. Implementing multipliers using the logic fabric of the FPGA is costly due to logic complexity and area usage, especially when the filter size is large. Modern FPGAs have dedicated DSP blocks that alleviate this problem; however, for very large filter sizes the challenge of reducing area and complexity still remains. An alternative to computing the multiplication is to decompose the MAC operations into a series of lookup table (LUT) accesses and summations. This approach is termed distributed arithmetic (DA), a bit-serial method of computing the inner product of two vectors in a fixed number of cycles. The original DA architecture stores all the possible binary combinations of the coefficients h[k] of equation (1) in a memory or lookup table [5,7,9].
It is evident that for large values of L (here L denotes the filter length), the size of the memory containing the precomputed terms grows exponentially and becomes too large to be practical. The memory size can be reduced by dividing the single large memory (2^L words) into m smaller memories, each of size 2^k, where L = m × k. The memory size can be further reduced to 2^(L-1) and 2^(L-2) by applying offset binary coding and exploiting the resulting symmetries found in the contents of the memories [8,9]. This technique is based on the 2's complement binary representation of data, and the data can be precomputed and stored in a LUT. As DA is a very efficient solution especially suited for LUT-based FPGA architectures, many researchers have put great effort into using DA to implement FIR filters in FPGAs. Patrick Longa introduced the structure of the FIR filter using the DA algorithm and the functions of each part. Sangyun Hwang analyzed the power consumption of the filter using the DA algorithm [3,7]. Heejong Yoo proposed a modified DA architecture that gradually replaces LUT requirements with multiplexer/adder pairs. But the main problem of DA is that the required LUT capacity increases exponentially with the order of the filter, given that DA implementations need 2^K words (K is the number of taps of the filter) [4,5,10]. And if K is a prime, the hardware resource consumption is even higher. To overcome these problems, this paper presents a hardware-efficient DA architecture.

This method not only reduces the LUT size, but also modifies the structure of the filter to achieve high-speed performance. The proposed filter has been designed and synthesized with ISE 7.1, and implemented on a 4VLX40FF668 FPGA device [2]. Our results show that the proposed DA architecture can implement FIR filters with high speed and smaller resource usage in comparison to the previous DA architecture.

II. Design Methodology

2.1 Distributed Arithmetic FIR filter architecture

Distributed Arithmetic is one of the most well-known methods of implementing FIR filters. DA solves the computation of the inner product when the coefficients are known in advance, as happens in FIR filters. An FIR filter of length K is described as:

$$y[n] = \sum_{k=0}^{K-1} h[k]\, x[n-k] \qquad (1)$$

where h[k] is the filter coefficient and x[k] is the input data. For convenience of analysis, x'[k] = x[n-k] is used to rewrite equation (1), and we have:

$$y = \sum_{k=0}^{K-1} h[k]\, x'[k] \qquad (2)$$

Then we use B-bit two's complement binary numbers to represent the input data:

$$x'[k] = -2^{B-1}\, x_{B-1}[k] + \sum_{b=0}^{B-2} x_b[k]\, 2^{b} \qquad (3)$$

where x_b[k] denotes the b-th bit of x'[k], with x_b[k] ∈ {0, 1}. Substitution of (3) into (2) yields:

$$y = -2^{B-1} f(h[k], x_{B-1}[k]) + \sum_{b=0}^{B-2} 2^{b} f(h[k], x_b[k]) \qquad (4)$$

where

$$f(h[k], x_b[k]) = \sum_{k=0}^{K-1} h[k]\, x_b[k]$$

In equation (4), we observe that the sums of filter coefficients can be pre-stored in LUTs and addressed by x_b = [x_b[0], x_b[1], ..., x_b[K-1]]. This way, the MAC blocks of the FIR filter are reduced to LUT accesses and summations. The implementation of digital filters using this arithmetic is done with registers, memory resources and a scaling accumulator.

The original LUT-based DA implementation of a 4-tap (K = 4) FIR filter is shown in Figure 2.1. The DA architecture includes three units: the shift register unit, the DA-LUT unit, and the adder/shifter unit.

Figure 2.1: Original LUT-based DA implementation of a 4-tap filter
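The LUT-plus-scaling-accumulator scheme of equation (4) can be illustrated with a small software model. This is a minimal sketch, not the authors' hardware design: the function names `build_da_lut` and `da_inner_product` are hypothetical, the inputs are assumed to be integers that fit in B-bit two's complement, and the bits are processed LSB-first with explicit weights 2^b rather than with the shift-and-add accumulator a hardware implementation would use.

```python
def build_da_lut(h):
    """Return the 2^K-word DA LUT for coefficients h.

    Entry `addr` holds the sum of h[k] over every k whose bit is set in
    `addr`, i.e. f(h[k], x_b[k]) for the bit pattern x_b encoded by addr.
    """
    K = len(h)
    return [sum(h[k] for k in range(K) if (addr >> k) & 1)
            for addr in range(1 << K)]


def da_inner_product(h, x, B):
    """Compute sum_k h[k] * x[k] by bit-serial distributed arithmetic.

    x[k] are integers assumed to lie in [-2^(B-1), 2^(B-1) - 1].
    """
    lut = build_da_lut(h)
    mask = (1 << B) - 1          # keep the B-bit two's complement pattern
    acc = 0
    for b in range(B):
        # Form the LUT address from bit b of every input sample.
        addr = 0
        for k, xk in enumerate(x):
            addr |= (((xk & mask) >> b) & 1) << k
        term = lut[addr] * (1 << b)
        # The sign bit (b = B-1) carries weight -2^(B-1), as in equation (3).
        acc += -term if b == B - 1 else term
    return acc


# Example: a 4-tap filter with 8-bit inputs, checked against direct MAC.
h = [3, -1, 4, 2]
x = [5, -7, 0, -128]
y_da = da_inner_product(h, x, B=8)
y_mac = sum(hk * xk for hk, xk in zip(h, x))
```

In this model the only multiplications are by powers of two, which correspond to shifts in hardware; every other operation is a table read or an addition.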

2.2 Distributed Arithmetic Architecture Without LUT

In Fig. 2.1, we can see that the lower half of the LUT (locations where b3 = 1) is the same as the sum of the upper half of the LUT (locations where b3 = 0) and h[3]. Hence, the LUT size can be halved at the cost of an additional 2x1 multiplexer and a full adder, as shown in Figure 2.2.2(a). By repeating the same LUT reduction procedure, we obtain the final LUT-less DA architecture, as shown in Figure 2.2.2(b). On the other hand, because combinational logic is used, the filter performance will be affected. But when the number of taps of the filter is a prime, we can use 4-input LUT units with additional multiplexers and full adders to trade off filter performance against small resource usage.

Figure 2.2.2(a): Proposed DA architecture for a 4-tap filter (2^3-word LUT implementation of DA)

Figure 2.2.2(b): LUT-less DA architectures

III. Distributed Arithmetic With 4-Input Look Up Table Architecture

The main advantage of FIR filters is their linear-phase response. It is only achieved with symmetrical or antisymmetrical structures which, furthermore, offer hardware reductions. As the filter order increases, the LUT size (2^K words) grows exponentially. This makes it difficult to implement the architecture with the LUT-less method, as more and more combinational logic is needed. This resulted in a new architecture, which is explained below.

3.1 Four input look up table architecture

We can divide the K-tap FIR filter into L groups of smaller M-tap filters (K = L × M). Hence the LUT size is reduced from 2^K words to L × 2^M words, and then we have:

$$y = \sum_{k=0}^{LM-1} h[k]\, x'[k] = \sum_{l=0}^{L-1} \sum_{m=0}^{M-1} h[lM+m]\, x'[lM+m]$$
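The effect of splitting a K-tap filter into L groups of M taps can be sketched in software as follows. This is an illustrative model under stated assumptions, not the synthesized design: the helper names `build_lut`, `lut_words` and `da_decomposed` are hypothetical, K is assumed to be a multiple of M, and the inputs are assumed to fit in B-bit two's complement.

```python
def build_lut(h):
    """2^M-word LUT: entry addr is the sum of h[m] for each set bit m."""
    return [sum(h[m] for m in range(len(h)) if (addr >> m) & 1)
            for addr in range(1 << len(h))]


def lut_words(K, M):
    """Total LUT words when K taps are split into K/M groups of M taps."""
    if K % M != 0:
        raise ValueError("this sketch assumes K is a multiple of M")
    return (K // M) * (1 << M)


def da_decomposed(h, x, B, M=4):
    """Inner product sum_k h[k]*x[k] using L = K/M small LUTs per bit."""
    K = len(h)
    if K % M != 0:
        raise ValueError("this sketch assumes K is a multiple of M")
    L = K // M
    luts = [build_lut(h[l * M:(l + 1) * M]) for l in range(L)]
    mask = (1 << B) - 1
    acc = 0
    for b in range(B):
        partial = 0
        for l in range(L):
            # Address of group l: bit b of samples x[lM] .. x[lM + M - 1].
            addr = 0
            for m in range(M):
                addr |= (((x[l * M + m] & mask) >> b) & 1) << m
            partial += luts[l][addr]      # adder tree over the L groups
        term = partial * (1 << b)
        acc += -term if b == B - 1 else term   # sign bit is subtracted
    return acc


# Example: K = 8 taps, M = 4, 8-bit inputs.
h = [1, -2, 3, -4, 5, -6, 7, -8]
x = [10, -20, 30, -40, 50, -60, 70, -80]
y = da_decomposed(h, x, B=8, M=4)
```

For K = 8 and M = 4, `lut_words(8, 4)` gives 32 words against 2^8 = 256 words for a single undivided LUT; the price is the extra additions that combine the L partial sums for each bit.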

It is best to set M = 4, because only under this condition can we take optimal advantage of the 4-input LUT structure of the FPGA, as shown in Fig. 3.1. If we cannot obtain M = 4 after the calculation, we can use both the proposed DA-LUT unit and the original 4-input DA-LUT unit to achieve our goal.

Figure 3.1: Distributed arithmetic with 4-input look up table

IV. Simulation Results

Matlab and Modelsim are used as the simulation platforms. We analyze the changes between the input wave and the output wave in Matlab to observe the performance of the designed filter, while observing the real-time implementation performance of the FPGA design in Modelsim. To observe the performance of the designed filter, a square signal of 1.6 MHz is generated as the input by Matlab, and the sampling frequency is 32 MHz. The data is converted into two's complement form and stored in the file named fir_in.txt. The data is then read as the input of the FPGA filter by the testbench in Modelsim. The output data is shown in the waveform window and also stored in a file, which can be read by Matlab for comparison and analysis.

[Simulation waveforms: original LUT-based DA implementation of a 4-tap filter; LUT-less DA architecture for a 4-tap FIR filter]

[Simulation waveforms: 4-input look up table architecture for high-order filters without pipelining; 4-input look up table architecture with pipelining; fully parallel DA architecture FIR filter without pipelining; fully parallel DA architecture FIR filter with pipelining]

V. Conclusion And Future Scope

This paper presents the proposed DA architectures for high-order filters. The architectures reduce the memory usage by half at every iteration of LUT reduction, at the cost of a limited decrease of the system frequency. We also divide the high-order filters into several groups of small filters, which reduces the LUT size as well. To obtain a high-speed implementation of FIR filters, a fully parallel version of the DA architecture is adopted. We have successfully implemented a highly efficient 70-tap fully parallel DA filter, using both an original DA architecture and a modified DA architecture, on a 4VLX40FF668 FPGA device. It shows that the proposed DA architectures are hardware efficient for FPGA implementation.

The limitation of this project is that, although pipelining increases the speed, the resource usage increases to more than B times that of the bit-serial version, since we need pipeline registers and additional adders. The number of LUTs required would also be enormous.

The future scope of this project is to improve the architecture of the Distributed Arithmetic FIR filter so that it uses the hardware resources of the latest FPGA families. In the Virtex-5 and Virtex-6 FPGA families, 6-input LUTs were introduced. Future work includes changing the architecture to use 6-input LUTs for storing coefficient sums and SRL (shift register logic) macros to implement shift operations, so that the total number of slices used is reduced.

References

[1]. Patrick Longa, Ali Miri, "Area-Efficient FIR Filter Design on FPGAs using Distributed Arithmetic", IEEE International Symposium on Signal Processing and Information Technology, pp. 248-252, 2006.
[2]. Sangyun Hwang, Gunhee Han, Sungho Kang, Jaeseok Kim, "New Distributed Arithmetic Algorithm for Low-Power FIR Filter Implementation", IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 463-466, May 2004.
[3]. Heejong Yoo, David V. Anderson, "Hardware-Efficient Distributed Arithmetic Architecture For High-order Digital Filters", IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 5, pp. 125-128, March 2005.
[4]. Wangdian, Xingwang Zhuo, "Digital Systems Applications and Design Based On Verilog HDL", Beijing: National Defence Industry Press, 2006.
[5]. McClellan, J.H., Parks, T.W., Rabiner, L.R., "A computer program for designing optimum FIR linear phase digital filters", IEEE Trans. Audio Electroacoust., Vol. 21, No. 6, pp. 506-526, 1973. http://en.wikipedia.org/wiki/instruction_pipeline
[6]. http://cse.stanford.edu/class/sophomore-college/projects-00/risc/pipelining/index.html Uwe Meyer-Baese, Digital Signal Processing with FPGA [M]. Beijing: Tsinghua University Press, 2006: 50-51.
[7]. Tsao Y C and Choi K. Area-Efficient Parallel FIR Digital Filter Structures for Symmetric Convolutions Based on Fast FIR Algorithm [J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, PP(99): 1-5.
[8]. Chao Cheng and Keshab K Parhi. Low-Cost Parallel FIR Filter Structures With 2-Stage Parallelism [J]. IEEE Transactions on Circuits and Systems I: Regular, 2007, 54(2): 280-290.
[9]. Tearney G J and Bouma B E. Real-Time FPGA Processing for High-Speed Optical Frequency Domain Imaging [J]. IEEE Transactions on Medical Imaging, 2009, 28(9): 1468-1472.