FPGA Based FIR Filter using Parallel Pipelined Structure

Rajesh Mehra, SBL Sachan
Electronics & Communication Engineering Department
National Institute of Technical Teachers Training & Research
Chandigarh, UT, India
rajeshmehra@yahoo.com, sblsachan@yahoo.co.in

Abstract: - In this paper an efficient method is presented to design and implement an FIR filter. The implementation is based on the MAC algorithm, which uses the embedded multipliers of the target FPGA for the multiply-and-accumulate operations of the FIR filter. A parallel pipelined structure is used to implement the proposed FIR filter in a bit-parallel manner, taking optimal advantage of the look-up tables and multipliers of the FPGA device. This method enhances the speed performance. The proposed FIR filter is designed and simulated with Matlab and Xilinx DSP Tools, synthesized with Xilinx Synthesis Tool (XST), and implemented on a Spartan 3E based xc3s5e FPGA device. The proposed parallel pipelined MAC algorithm based FIR filter can operate at an estimated frequency of 15.1 MHz with an initial latency of 3 clocks while consuming very few resources in terms of slices, flip flops, LUTs and multipliers, providing a cost-effective solution for signal processing applications.

Key-Words: - DSP, FIR, MAC, FPGA

1 Introduction

The demand for programmable digital products is growing day by day. Industries such as audio, video and cellular rely heavily on digital technology, and a large part of digital technology deals with digital signal processing. This field has gained increasing interest, especially with much of the world now turning to wireless technology. FPGAs are essentially arrays of uncommitted logic and signal processing resources [1]. These allow the designer to implement DSP functions using highly scalable, parallel processing techniques.
There is a constant requirement for efficient use of FPGA resources: for a given system, occupying less hardware can yield significant cost-related benefits such as reduced power consumption, area for additional application functionality, and the potential to use a smaller, cheaper FPGA. Today's consumer electronics, such as cellular phones and other multimedia and wireless devices, often require digital signal processing (DSP) algorithms for several crucial operations in order to increase speed and reduce area and power consumption. Due to a growing demand for such complex DSP applications, high performance, low-cost SoC implementations of DSP algorithms are receiving increased attention among researchers and design engineers. Although ASICs and DSP chips have been the traditional solution for high performance applications, technology and market demands are now driving a change. The most common functions performed by almost all DSP chips are FFTs, FIR filters, interpolators and decimators.

Finite impulse response (FIR) digital filters are common DSP functions and are widely used in FPGA implementations. If very high sampling rates are required, full-parallel hardware must be used [2], where every clock edge feeds a new input sample and produces a new output sample. Where a fully parallel implementation is not possible, a partly serial approach can be adopted to enhance the system performance, as presented in this paper. Such filters can be implemented on FPGAs using combinations of the general purpose logic fabric, on-board RAM and embedded arithmetic hardware. Full-parallel filters cannot share hardware over multiple clock cycles and so tend to occupy large amounts of resource. Hence, efficient implementation of such filters is important to minimize the hardware requirement. When implementing a DSP system on a platform containing dedicated arithmetic blocks, it is normal practice to utilize such blocks as far as possible in preference to any general purpose logic fabric.

ISBN: 978-96-474-33-8 311

On one hand, the high development costs and time-to-market factors associated with ASICs can be prohibitive for certain applications while, on the other hand, programmable DSP processors can be unable to meet the desired performance due to their sequential-execution architecture. In this context, embedded FPGAs offer a very attractive solution that balances high flexibility, time-to-market, cost and performance [3]. Therefore, in this paper, an FIR filter is designed and implemented on FPGA devices whose output may be expressed as:

$$y = \sum_{k=1}^{K} C_k x_k$$

where $C_1, C_2, \ldots, C_K$ are fixed coefficients and $x_1, x_2, \ldots, x_K$ are the input data words. A typical digital implementation will require K multiply-and-accumulate (MAC) operations. The new generations of FPGA not only provide an effective way of implementing high performance DSP functions but also provide the designer with an even more cost-effective solution. In this paper a 20-tap FIR filter is designed and implemented on FPGA using a parallel pipelined architecture.

2 Filter Architectures

A traditional DSP chip performs the MAC FIR filter function in a serial manner, whereas an FPGA allows designers to implement this function in parallel using the dedicated multipliers and registers available on FPGA target devices. FPGAs are completely hardware configurable; thus the designer has the flexibility to use only the resources that the system demands. Fig.1 shows different structures for implementing a four-tap FIR filter using the MAC algorithm.

Fig.1(a) shows the parallel implementation, where 4 multipliers are used to process four taps, i.e. one multiplier per tap. By using four embedded multipliers of the target FPGA, maximum speed can be achieved at the cost of more resources.

(a) Parallel Architecture

Alternatively, the serial architecture shown in Fig.1(b) can be used to conserve area and implement the filter at a lower speed, using only one multiplier, one accumulator and a register.

(b) Serial Architecture

Another option is the semi-parallel approach, also called partly serial, which is shown in Fig.1(c). It can be used to improve the system speed compared to the serial architecture where a fully parallel approach is not possible due to the limited number of multipliers on the FPGA target device.

(c) Partially Serial Architecture

Fig.1 Filter Architectures using MAC Algorithm [1]
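The trade-off between the three structures of Fig.1 can be illustrated with a small behavioural model. The following is a minimal Python sketch, not the authors' implementation: the function names, the example coefficients and the "one pass of the multipliers per clock" cost model are assumptions made for illustration only. Each function computes the same sum of products and reports how many multipliers and clock cycles it would need for one output sample.

```python
from typing import Sequence, Tuple

# Hypothetical 4-tap example values (not taken from the paper).
COEFFS = [1, 2, 3, 4]
WINDOW = [5, 6, 7, 8]   # the four most recent input samples


def fir_parallel(coeffs: Sequence[int], window: Sequence[int]) -> Tuple[int, int, int]:
    """Fully parallel: one multiplier per tap, all products formed in one clock."""
    products = [c * x for c, x in zip(coeffs, window)]
    return sum(products), len(coeffs), 1


def fir_serial(coeffs: Sequence[int], window: Sequence[int]) -> Tuple[int, int, int]:
    """Serial: a single multiplier and accumulator reused once per tap."""
    acc = 0
    cycles = 0
    for c, x in zip(coeffs, window):
        acc += c * x      # one MAC operation per clock
        cycles += 1
    return acc, 1, cycles


def fir_semi_parallel(coeffs: Sequence[int], window: Sequence[int],
                      multipliers: int) -> Tuple[int, int, int]:
    """Semi-parallel: `multipliers` taps are processed per clock, the rest in later clocks."""
    acc = 0
    cycles = 0
    for start in range(0, len(coeffs), multipliers):
        stop = start + multipliers
        acc += sum(c * x for c, x in zip(coeffs[start:stop], window[start:stop]))
        cycles += 1
    return acc, multipliers, cycles


if __name__ == "__main__":
    # Each tuple is (output sample, multipliers used, clocks per output).
    print("parallel     ", fir_parallel(COEFFS, WINDOW))
    print("serial       ", fir_serial(COEFFS, WINDOW))
    print("semi-parallel", fir_semi_parallel(COEFFS, WINDOW, multipliers=2))
```

All three functions return the same output value; only the multiplier count and the number of clocks per output differ, which is the area versus speed trade-off described above.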

3 Proposed Design Simulation

The proposed 20-tap MAC algorithm based symmetric FIR filter has been developed using the Remez method. The filter has been designed and simulated using Matlab and Xilinx DSP Tools [4]-[7], taking filter order 19 along with a fully pipelined structure to enhance the speed performance. The complete design flow is shown in Fig.2, where the first step is Matlab code development and simulation. The floating point simulated output is shown in Fig.3.

Fig. 2 Design Flow

(Figure panels: symmetric coefficients versus coefficient number; magnitude (dB) versus normalized frequency for the 20-tap FIR filter, 1-channel implementation; parallel MAC implementation error against floating point MATLAB filter() versus sample number. Annotations: Input quant = [16,15], Coeff quant = [16], Output quant = [16,12], Input/Output Delay: 1, Max Error: 8.1439e-6, Error Mean: 1.1988e-7.)

Fig. 3 Floating Point FIR Filter Response

Then the equivalent fixed point file is simulated and verified; its output is shown in Fig.4.

(Figure panels: as in Fig.3. Annotations: Input quant = [16,15], Coeff quant = [16], Output quant = [16,12], Input/Output Delay: 1, Max Error: .24348, Error Mean: .11758.)

Fig. 4 Fixed Point FIR Filter Response

4 Hardware Synthesis

In the hardware implementation, embedded multipliers and pipeline registers have been used to enhance the speed performance of the designed FIR filter. The proposed MAC based FIR filter structure is shown in Fig.5.

Fig.5 Proposed MAC FIR Filter

To observe the speed and resource utilization, RTL was generated, verified and synthesized. The proposed FIR filter has been implemented on a Spartan 3E FPGA device. The proposed fully parallel pipelined MAC based FIR filter can operate at an estimated frequency of 15.1 MHz with an initial latency of 3 clocks while consuming very few resources in terms of slices, flip flops, LUTs and multipliers, as shown in Table1.
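To make the idea of a pipelined MAC structure for a symmetric filter concrete, the following Python sketch models one possible arrangement. It is an illustration under stated assumptions, not the structure of Fig.5: the class name, the folding of symmetric taps through a pre-adder, and the choice of exactly three register stages (pre-add, multiply, sum) are all hypothetical. Folding exploits h[k] = h[N-1-k], so an N-tap symmetric filter needs N/2 multiplications per output.

```python
from collections import deque
from typing import List, Sequence


class FoldedPipelinedFir:
    """Behavioural model of a symmetric FIR with three assumed register stages."""

    def __init__(self, half_coeffs: Sequence[int]) -> None:
        self.half = list(half_coeffs)            # h[0] .. h[N/2 - 1]
        self.taps = 2 * len(self.half)           # full (even) filter length N
        self.delay_line = deque([0] * self.taps, maxlen=self.taps)
        self.presum_reg = [0] * len(self.half)   # stage 1: pre-adder outputs
        self.product_reg = [0] * len(self.half)  # stage 2: multiplier outputs
        self.output_reg = 0                      # stage 3: adder-tree output

    def clock(self, sample: int) -> int:
        """Advance one clock; stages update back to front like real registers."""
        self.output_reg = sum(self.product_reg)
        self.product_reg = [h * s for h, s in zip(self.half, self.presum_reg)]
        self.delay_line.appendleft(sample)       # oldest sample falls off the end
        line = self.delay_line
        self.presum_reg = [line[k] + line[self.taps - 1 - k]
                           for k in range(len(self.half))]
        return self.output_reg


def fir_direct(coeffs: Sequence[int], samples: Sequence[int]) -> List[int]:
    """Unpipelined reference: y[n] = sum over k of h[k] * x[n - k]."""
    return [sum(h * samples[n - k] for k, h in enumerate(coeffs) if n - k >= 0)
            for n in range(len(samples))]


if __name__ == "__main__":
    half = [1, 2]                         # hypothetical coefficients
    impulse = [1, 0, 0, 0, 0, 0, 0]
    fir = FoldedPipelinedFir(half)
    print([fir.clock(x) for x in impulse])           # pipelined output
    print(fir_direct(half + half[::-1], impulse))    # reference output
```

In this model the result for a given input sample appears on the third clock edge after it is presented, so the pipelined output is the reference output delayed by two samples. The extra registers do not change the filter response; they only shorten the combinational path between registers, which is what allows a higher clock frequency in hardware.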

Table1. Speed Performance

The resource utilization is shown in Table2. It can be observed from this table that the developed MAC based FIR filter has consumed 397 slices, 759 flip flops, 723 LUTs and 1 multipliers of the target FPGA device.

Table2. Resource Utilization

5 Conclusion

In this paper, a parallel pipelined MAC algorithm based 20-tap FIR filter has been presented to enhance speed, throughput and area efficiency by taking optimal advantage of the look-up tables and embedded multipliers of the target FPGA. The proposed filter has been designed and simulated using Matlab and Xilinx DSP tools. The synthesis of the developed design has been performed on a Spartan 3E based xc3s5e FPGA device. The results show that the proposed FIR filter can operate at an estimated frequency of 15.1 MHz with a latency of 3 clocks. The resource consumption of the developed filter is 8%, 8%, 7% and 5% in terms of slices, flip flops, LUTs and multipliers respectively on the target FPGA device, providing a cost-effective solution for DSP applications.

Acknowledgement

The authors would like to thank Dr. S. S. Pattnaik, Professor and Head, ETV Department, National Institute of Technical Teachers Training & Research, Chandigarh, India for constant help and suggestions. The authors are also thankful to Dr. M. P. Poonia, Director, National Institute of Technical Teachers Training & Research, Chandigarh, India for constant inspiration and support throughout this research work.

References:

[1] Steve Zack, Suhel Dhanani, "DSP Co-Processing in FPGAs: Embedding High Performance, Low-Cost DSP Functions," WP212 (v1.0), March 18, 2004.
[2] K.N. Macpherson and R.W. Stewart, "Area efficient FIR filters for high speed FPGA implementation," IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 6, pp. 711-720, December 2006.
[3] Shahnam Mirzaei, Anup Hosangadi, Ryan Kastner, "FPGA Implementation of High Speed FIR Filters Using Add and Shift Method," International Conference on Computer Design (ICCD), pp. 308-313, IEEE, 2006.
[4] Prithviraj Banerjee, Malay Haldar, David Zaretsky, Robert Anderson, "Overview of a compiler for synthesizing Matlab programs onto FPGAs," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 3, pp. 312-324, March 2004.
[5] Hitesh Patel, "Synthesis and Implementation strategies to accelerate design performance," WP229 (v1.0), July 6, 2005.
[6] Philippe Garrault and Brian Philofsky, "HDL coding practices to accelerate design performance," WP231 (1.1), January 6, 2006.
[7] Mathworks, "Users Guide Filter Design Toolbox-4," March 2007.

Authors

Dr. S B L Sachan received his PhD from Kanpur University, Kanpur, India. He completed his study of Video Hardware Systems at JICA, Tokyo, Japan and of Broadcast Television Systems (BTS) at BOSCH, Darmstadt, West Germany. He was awarded a UNESCO Fellowship in 1986 for Japan and in 1987 for Germany under Technician Education in India.

He is currently Professor & Head, Electronics & Communication Engineering Department at National Institute of Technical Teachers Training & Research, Chandigarh, India. Dr. S B L Sachan has more than 32 years of academic and 6 years of industrial experience. He has authored 21 research papers in reputed journals and conferences. He is co-author of the TET book written for CPSC Manila (Philippines). His areas of interest are Wireless & Mobile Communication, Advanced Digital Communication, Optical Fiber Communications and Broadcast Engineering.

Rajesh Mehra received the Bachelor of Technology degree in Electronics and Communication Engineering from National Institute of Technology, Jalandhar, India in 1994, and the Master of Engineering degree in Electronics and Communication Engineering from National Institute of Technical Teachers Training & Research, Panjab University, Chandigarh, India in 2008. He is pursuing the Doctor of Philosophy degree in Electronics and Communication Engineering at National Institute of Technical Teachers Training & Research, Panjab University, Chandigarh, India. He is an Associate Professor with the Department of Electronics & Communication Engineering, National Institute of Technical Teachers Training & Research, Ministry of Human Resource Development, Chandigarh, India. His current research and teaching interests are in Signal and Communications Processing and Very Large Scale Integration Design. He has authored more than 1 research publications including more than 5 in Journals. Mr. Mehra is a member of IEEE and ISTE.