FPGA Based FIR Filter using Parallel Pipelined Structure

Rajesh Mehra, SBL Sachan
Electronics & Communication Engineering Department
National Institute of Technical Teachers Training & Research
Chandigarh, UT, India
rajeshmehra@yahoo.com, sblsachan@yahoo.co.in

Abstract: In this paper an efficient method is presented to design and implement an FIR filter. The implementation is based on the MAC algorithm, which uses the embedded multipliers of the target FPGA for the multiply-and-accumulate operations of the FIR filter. A parallel pipelined structure is used to implement the proposed FIR filter in a bit-parallel manner, taking optimal advantage of the look-up tables and multipliers of the FPGA device. This method enhances speed performance. The proposed FIR filter is designed and simulated with Matlab and Xilinx DSP Tools, synthesized with the Xilinx Synthesis Tool (XST), and implemented on a Spartan-3E based xc3s500e FPGA device. The proposed parallel pipelined MAC-algorithm-based FIR filter can operate at an estimated frequency of 15.1 MHz with an initial latency of 3 clocks, while consuming very few resources in terms of slices, flip-flops, LUTs and multipliers, providing a cost-effective solution for signal processing applications.

Key-Words: DSP, FIR, MAC, FPGA

1 Introduction
The demand for programmable digital products is growing day by day. Industries such as audio, video and cellular communications rely heavily on digital technology, and a large part of digital technology deals with digital signal processing. This field of engineering has gained increasing interest, especially as much of the world turns to wireless technology. FPGAs are essentially arrays of uncommitted logic and signal-processing resources [1]. They allow the designer to implement DSP functions using highly scalable, parallel processing techniques.
There is a constant requirement for efficient use of FPGA resources: for a given system, occupying less hardware yields significant cost-related benefits such as reduced power consumption, area for additional application functionality, and the potential to use a smaller, cheaper FPGA. Today's consumer electronics, such as cellular phones and other multimedia and wireless devices, often require digital signal processing (DSP) algorithms for several crucial operations, and these must be implemented so as to increase speed and reduce area and power consumption. Due to the growing demand for such complex DSP applications, high-performance, low-cost SoC implementations of DSP algorithms are receiving increased attention from researchers and design engineers. Although ASICs and DSP chips have been the traditional solution for high-performance applications, technology and market demands are now shifting. Functions common to almost all DSP chips include FFTs, FIR filters, interpolators and decimators. Finite impulse response (FIR) digital filters are common DSP functions and are widely used in FPGA implementations. If very high sampling rates are required, full-parallel hardware must be used [2], where every clock edge feeds a new input sample and produces a new output sample. Where a fully parallel implementation is not possible, a partly serial approach can be adopted to enhance system performance, as presented in this paper. Such filters can be implemented on FPGAs using combinations of the general-purpose logic fabric, on-board RAM and embedded arithmetic hardware. Full-parallel filters cannot share hardware over multiple clock cycles and so tend to occupy large amounts of resources. Hence, efficient implementation of such filters is important to minimize hardware requirements. When implementing a DSP system on a platform containing dedicated arithmetic blocks, it is normal practice to use such blocks as far as possible in preference to the general-purpose logic fabric.

ISBN: 978-96-474-33-8

Fig. 1(a) shows a parallel implementation in which four multipliers process four taps, one multiplier per tap. Using four embedded multipliers of the target FPGA, maximum speed can be achieved at the cost of more resources. Alternatively, the serial architecture shown in Fig. 1(b) can be used to conserve area and implement the filter at lower speed, using only one multiplier, one accumulator and a register.

On the one hand, the high development costs and time-to-market factors associated with ASICs can be prohibitive for certain applications; on the other hand, programmable DSP processors can be unable to meet the desired performance because of their sequential-execution architecture. In this context, embedded FPGAs offer a very attractive solution that balances flexibility, time-to-market, cost and performance [3]. Therefore, in this paper, an FIR filter is designed and implemented on FPGA devices, whose output may be expressed as:

$$y = \sum_{k=1}^{K} C_k x_k = C_1 x_1 + C_2 x_2 + \cdots + C_K x_K$$

where C_1, C_2, ..., C_K are fixed coefficients and x_1, x_2, ..., x_K are the input data words. A typical digital implementation requires K multiply-and-accumulate (MAC) operations. The new generation of FPGAs not only provides an effective way of implementing high-performance DSP functions but also gives the designer an even more cost-effective solution. In this paper a 20-tap FIR filter is designed and implemented on an FPGA using a parallel pipelined architecture.
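The K multiply-and-accumulate operations described above can be sketched in software as a direct-form FIR filter: a delay line holds the K most recent input words (x_1 being the newest) and an accumulator sums one coefficient-sample product per tap. This is a minimal Python illustration, not the authors' code; the function name `fir_mac` and the coefficient values are hypothetical.

```python
from typing import List, Sequence


def fir_mac(coeffs: Sequence[float], samples: Sequence[float]) -> List[float]:
    """Direct-form FIR filter built from multiply-and-accumulate (MAC) steps.

    For each new input sample, the delay line holds x_1 .. x_K (newest first)
    and the output is y = C_1*x_1 + ... + C_K*x_K, i.e. K MAC operations.
    Samples before the start of the stream are treated as zero.
    """
    K = len(coeffs)
    delay_line = [0.0] * K
    outputs: List[float] = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the newest sample in
        acc = 0.0                            # clear the accumulator
        for c, d in zip(coeffs, delay_line):
            acc += c * d                     # one MAC per tap
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # Hypothetical 3-tap coefficients; an impulse input returns the coefficients.
    print(fir_mac([0.25, 0.5, 0.25], [1.0, 0.0, 0.0, 0.0]))
    # → [0.25, 0.5, 0.25, 0.0]
```

Feeding an impulse through the filter reproduces the coefficient sequence, which is a quick way to check that the delay line and accumulator are wired correctly.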
2 Filter Architectures
A traditional DSP chip performs the MAC FIR filter function in a serial manner, whereas an FPGA allows designers to implement this function in a parallel style using the dedicated multipliers and registers available on the target device. FPGAs are completely hardware-configurable; thus, the designer has the flexibility to use only the resources that the system demands. Fig. 1 shows different structures for implementing a four-tap FIR filter using the MAC algorithm. Another option is a semi-parallel approach, called partly serial, shown in Fig. 1(c). Where a fully parallel approach is not possible because of the limited number of multipliers on the target FPGA device, it can be used to improve speed performance compared with the serial architecture.

Fig. 1 Filter Architectures using MAC Algorithm [1]: (a) Parallel Architecture, (b) Serial Architecture, (c) Partially Serial Architecture
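The trade-off between the three structures in Fig. 1 can be modelled by counting clock cycles. Assuming M multipliers are shared among K taps, with one group of M products summed into a single accumulator per cycle and no pipelining, each output needs ceil(K/M) cycles: M = K is the parallel case, M = 1 the serial case, and anything in between is partly serial. The Python sketch below is an illustrative model under those assumptions; the function names and the coefficient values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cycles_per_output(num_taps: int, num_mults: int) -> int:
    """Clock cycles per output when num_mults multipliers serve num_taps taps."""
    return math.ceil(num_taps / num_mults)


def partly_serial_output(coeffs: Sequence[float],
                         window: Sequence[float],
                         num_mults: int) -> Tuple[float, int]:
    """Compute one FIR output y = sum(C_k * x_k) using num_mults multipliers.

    `window` holds the K most recent input words x_1 .. x_K.
    num_mults == K -> fully parallel (Fig. 1(a)): 1 cycle per output
    num_mults == 1 -> fully serial   (Fig. 1(b)): K cycles per output
    otherwise      -> partly serial  (Fig. 1(c))
    Returns (output value, clock cycles used).
    """
    if len(coeffs) != len(window):
        raise ValueError("coeffs and window must have the same length")
    if num_mults < 1:
        raise ValueError("need at least one multiplier")
    acc = 0.0   # accumulator register
    cycles = 0
    for start in range(0, len(coeffs), num_mults):
        # One clock cycle: up to num_mults taps use the multipliers in parallel
        # and their products are added into the accumulator.
        group = zip(coeffs[start:start + num_mults], window[start:start + num_mults])
        acc += sum(c * x for c, x in group)
        cycles += 1
    return acc, cycles


if __name__ == "__main__":
    coeffs = [1.0, 2.0, 3.0, 4.0]   # hypothetical 4-tap coefficients
    window = [1.0, 1.0, 1.0, 1.0]
    for m in (4, 2, 1):
        y, cyc = partly_serial_output(coeffs, window, m)
        print(f"{m} multiplier(s): y = {y}, {cyc} cycle(s) per output")
```

All three configurations produce the same output; only the cycle count changes. Under this model the attainable output sample rate is roughly the clock frequency divided by `cycles_per_output(K, M)`, which is why the fully parallel form is preferred when enough multipliers are available.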

3 Proposed Design Simulation
The proposed 20-tap MAC-algorithm-based symmetric FIR filter has been designed using the Remez method. In this work the FIR filter has been designed and simulated using Matlab and Xilinx DSP Tools [4]-[7], with a filter order of 19 and a fully pipelined structure to enhance speed performance. The complete design flow is shown in Fig. 2, where the first step is development and simulation of the project MATLAB code. The floating-point simulated output is shown in Fig. 3. The equivalent fixed-point file is then simulated and verified; its output is shown in Fig. 4.

Fig. 2 Design Flow

Fig. 3 Floating Point FIR Filter Response (panels: symmetric coefficients of the 20-tap filter vs. coefficient number; magnitude (dB) vs. normalized frequency; error of the parallel MAC implementation against the floating-point MATLAB filter() vs. sample number)

Fig. 4 Fixed Point FIR Filter Response (same panels as Fig. 3; input quant = [16,15], coeff quant = [16], output quant = [16,12])

4 Hardware Synthesis
In the hardware implementation, embedded multipliers and pipeline registers have been used to enhance the speed performance of the designed FIR filter. The proposed MAC-based FIR filter structure is shown in Fig. 5.

Fig. 5 Proposed MAC FIR Filter

To observe the speed and resource utilization, RTL was generated, verified and synthesized. The proposed FIR filter has been implemented on a Spartan-3E FPGA device. The proposed fully parallel pipelined MAC-based FIR filter can operate at an estimated frequency of 15.1 MHz with an initial latency of 3 clocks, while consuming very few resources in terms of slices, flip-flops, LUTs and multipliers, as shown in Table 1.
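How pipeline registers trade a few clocks of initial latency for one new output per clock can be illustrated with a cycle-level Python model. The partition used here, three register stages (delay-line tap registers, one product register per tap, and an output register after the adder tree), is a hypothetical choice made only for illustration; the actual register placement in Fig. 5 may differ, and in an FPGA the per-tap multipliers could be embedded multipliers or LUT-based logic. The function name and coefficients are also hypothetical.

```python
from typing import List, Sequence


def pipelined_fir(coeffs: Sequence[int], samples: Sequence[int]) -> List[int]:
    """Cycle-level model of a fully parallel FIR with three register stages.

    Stage 1: delay-line (tap) registers capture the incoming sample.
    Stage 2: one product register per tap (one multiplier per tap).
    Stage 3: output register captures the adder-tree sum of the products.
    Returns the value of the output register after each clock edge
    (one edge per input sample).
    """
    K = len(coeffs)
    taps = [0] * K
    prods = [0] * K
    out_reg = 0
    trace: List[int] = []
    for x in samples:
        # All registers load on the same clock edge, so every next value is
        # computed from the register contents *before* the edge.
        next_out = sum(prods)
        next_prods = [c * t for c, t in zip(coeffs, taps)]
        next_taps = [x] + taps[:-1]
        taps, prods, out_reg = next_taps, next_prods, next_out
        trace.append(out_reg)
    return trace


if __name__ == "__main__":
    # Hypothetical integer coefficients; impulse input.
    print(pipelined_fir([1, 2, 3], [1, 0, 0, 0, 0, 0]))
    # → [0, 0, 1, 2, 3, 0]
```

Counting the edge that captures x[n] as the first, y[n] reaches the output register on the third edge, so the impulse response 1, 2, 3 appears two entries later in the trace than in a purely combinational model. After this initial fill, the pipeline delivers one output every clock, which is what allows the shorter critical path to translate into higher throughput.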

Table 1. Speed Performance

The resource utilization is shown in Table 2. It can be observed from this table that the developed MAC-based FIR filter consumes 397 slices, 759 flip-flops, 723 LUTs and 1 multiplier of the target FPGA device.

Table 2. Resource Utilization

5 Conclusion
In this paper, a parallel pipelined MAC-algorithm-based 20-tap FIR filter has been presented to enhance speed, throughput and area efficiency by taking optimal advantage of the look-up tables and embedded multipliers of the target FPGA. The proposed filter has been designed and simulated using Matlab and Xilinx DSP tools. Synthesis of the developed design has been performed on a Spartan-3E based xc3s500e FPGA device. The results show that the proposed FIR filter can operate at an estimated frequency of 15.1 MHz with a latency of 3 clocks. The resource consumption of the developed filter is 8%, 8%, 7% and 5% of the slices, flip-flops, LUTs and multipliers, respectively, of the target FPGA device, providing a cost-effective solution for DSP applications.

Acknowledgement
The authors would like to thank Dr. S. S. Pattnaik, Professor and Head, ETV Department, National Institute of Technical Teachers Training & Research, Chandigarh, India, for constant help and suggestions. The authors are also thankful to Dr. M. P. Poonia, Director, National Institute of Technical Teachers Training & Research, Chandigarh, India, for constant inspiration and support throughout this research work.

References:
[1] Steve Zack, Suhel Dhanani, "DSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions," WP212 (v1.0), March 18, 2004.
[2] K. N. Macpherson, R. W. Stewart, "Area efficient FIR filters for high speed FPGA implementation," IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 6, pp. 711-720, December 2006.
[3] Shahnam Mirzaei, Anup Hosangadi, Ryan Kastner, "FPGA Implementation of High Speed FIR Filters Using Add and Shift Method," International Conference on Computer Design (ICCD), IEEE, pp. 308-313, 2006.
[4] Prithviraj Banerjee, Malay Haldar, David Zaretsky, Robert Anderson, "Overview of a compiler for synthesizing MATLAB programs onto FPGAs," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 3, pp. 312-324, March 2004.
[5] Hitesh Patel, "Synthesis and Implementation Strategies to Accelerate Design Performance," WP229 (v1.0), July 6, 2005.
[6] Philippe Garrault, Brian Philofsky, "HDL Coding Practices to Accelerate Design Performance," WP231 (v1.1), January 6, 2006.
[7] MathWorks, Filter Design Toolbox 4 User's Guide, March 2007.

Authors
Dr. S B L Sachan received his PhD from Kanpur University, Kanpur, India. He completed his study of Video Hardware Systems at JICA, Tokyo, Japan, and of Broadcast Television Systems (BTS) at BOSCH, Darmstadt, West Germany. He was awarded a UNESCO Fellowship in 1986 for Japan and in 1987 for Germany under Technician Education in India. He is currently Professor & Head, Electronics & Communication Engineering Department, National Institute of Technical Teachers Training & Research, Chandigarh, India. Dr. S B L Sachan has more than 32 years of academic and 6 years of industrial experience. He has authored 21 research papers in reputed journals and conferences. He is co-author of TET book writing for CPSC Manila (Philippines). His areas of interest are Wireless & Mobile Communication, Advanced Digital Communication, Optical Fiber Communications and Broadcast Engineering.

Rajesh Mehra received the Bachelor of Technology degree in Electronics and Communication Engineering from National Institute of Technology, Jalandhar, India, in 1994, and the Master of Engineering degree in Electronics and Communication Engineering from National Institute of Technical Teachers Training & Research, Panjab University, Chandigarh, India, in 2008. He is pursuing a Doctor of Philosophy degree in Electronics and Communication Engineering at National Institute of Technical Teachers Training & Research, Panjab University, Chandigarh, India. He is an Associate Professor with the Department of Electronics & Communication Engineering, National Institute of Technical Teachers Training & Research, Ministry of Human Resource Development, Chandigarh, India. His current research and teaching interests are in Signal and Communications Processing and Very Large Scale Integration Design. He has authored more than 1 research publications including more than 5 in Journals. Mr. Mehra is a member of IEEE and ISTE.