Implementation of High Speed FIR Filter using Serial and Parallel Distributed Arithmetic Algorithm

Similar documents
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

Parallel FIR Filters. Chapter 5

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

Two High Performance Adaptive Filter Implementation Schemes Using Distributed Arithmetic

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

Vertical-Horizontal Binary Common Sub- Expression Elimination for Reconfigurable Transposed Form FIR Filter

Optimized Design and Implementation of a 16-bit Iterative Logarithmic Multiplier

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

A HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

FPGA Based FIR Filter using Parallel Pipelined Structure

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication

DESIGN AND IMPLEMENTATION OF DA- BASED RECONFIGURABLE FIR DIGITAL FILTER USING VERILOGHDL

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA

CHAPTER 4. DIGITAL DOWNCONVERTER FOR WiMAX SYSTEM

Field Programmable Gate Array (FPGA)

Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter

Hardware Description of Multi-Directional Fast Sobel Edge Detection Processor by VHDL for Implementing on FPGA

COMPARISON OF DIFFERENT REALIZATION TECHNIQUES OF IIR FILTERS USING SYSTEM GENERATOR

Xilinx DSP. High Performance Signal Processing. January 1998

University, Patiala, Punjab, India 1 2

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

FPGA Implementation of High Speed FIR Filters Using Add and Shift Method

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

[Sahu* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

Fault Tolerant Parallel Filters Based On Bch Codes

TOPIC : Verilog Synthesis examples. Module 4.3 : Verilog synthesis

Low Power Complex Multiplier based FFT Processor

Comparison of pipelined IEEE-754 standard floating point multiplier with unpipelined multiplier

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

DESIGN AND IMPLEMENTATION OF ADDER ARCHITECTURES AND ANALYSIS OF PERFORMANCE METRICS

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-6, Issue-8) Abstract:

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 10 /Issue 1 / JUN 2018

IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

The Efficient Implementation of Numerical Integration for FPGA Platforms

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

Low Power Floating Point Computation Sharing Multiplier for Signal Processing Applications

Compact Clock Skew Scheme for FPGA based Wave- Pipelined Circuits

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

DSP Resources. Main features: 1 adder-subtractor, 1 multiplier, 1 add/sub/logic ALU, 1 comparator, several pipeline stages

Simulation & Synthesis of FPGA Based & Resource Efficient Matrix Coprocessor Architecture

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

INTRODUCTION TO CATAPULT C

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Power and Area Efficient Implementation for Parallel FIR Filters Using FFAs and DA

DUE to the high computational complexity and real-time

II. MOTIVATION AND IMPLEMENTATION

Programmable Logic Devices

OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL. Shruti Hathwalia* 1, Meenakshi Yadav 2

A Novel Distributed Arithmetic Multiplierless Approach for Computing Complex Inner Products

Design of Convolution Encoder and Reconfigurable Viterbi Decoder

A Low Power DDR SDRAM Controller Design P.Anup, R.Ramana Reddy

FPGAs: FAST TRACK TO DSP

INTRODUCTION TO FPGA ARCHITECTURE

Implementation of Floating Point Multiplier Using Dadda Algorithm

Design and Implementation of 3-D DWT for Video Processing Applications

Batchu Jeevanarani and Thota Sreenivas Department of ECE, Sri Vasavi Engg College, Tadepalligudem, West Godavari (DT), Andhra Pradesh, India

High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications

Field Programmable Gate Array

Programmable Logic. Any other approaches?

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Stratix II vs. Virtex-4 Performance Comparison

Digital Design Using Digilent FPGA Boards -- Verilog / Active-HDL Edition

FPGA Implementation of MIPS RISC Processor

FIR Filter Architecture for Fixed and Reconfigurable Applications

A High Speed Design of 32 Bit Multiplier Using Modified CSLA

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS

Efficient Implementation of Low Power 2-D DCT Architecture

Design & Analysis of 16 bit RISC Processor Using low Power Pipelining

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University

Hardware Realization of FIR Filter Implementation through FPGA

Reduction of Latency and Resource Usage in Bit-Level Pipelined Data Paths for FPGAs

PINE TRAINING ACADEMY

System Verification of Hardware Optimization Based on Edge Detection

Low-Power, High-Throughput and Low-Area Adaptive Fir Filter Based On Distributed Arithmetic Using FPGA

Design & Implementation of 64 bit ALU for Instruction Set Architecture & Comparison between Speed/Power Consumption on FPGA.

Australian Journal of Basic and Applied Sciences

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Efficient design and FPGA implementation of JPEG encoder

An Efficient Implementation of Floating Point Multiplier

ABSTRACT I. INTRODUCTION. 905 P a g e

DISTRIBUTED ARITHMETIC (DA) provides multiplierless

DESIGN AND IMPLEMENTATION BY USING BIT LEVEL TRANSFORMATION OF ADDER TREES FOR MCMS USING VERILOG

Implimentation of A 16-bit RISC Processor for Convolution Application

Topics. Midterm Finish Chapter 7

Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

An FPGA Based Floating Point Arithmetic Unit Using Verilog

Transcription:

Volume 25 No.7, July 211 Implementation of High Speed FIR Filter using Serial and Parallel Distriuted Arithmetic Algorithm Narendra Singh Pal Electronics &Communication, Dr.B.R.Amedkar Tecnology, Jalandhar (Punja) Harjit Pal Singh Physics Dr.B.R.Amedkar Tecnology, Jalandhar (Punja) R.K.Sarin, Electronics & Communication Dr.B.R. Amedkar Technology Jalandhar Sarajeet Singh Physics Dr.B.R. Amedkar Technology Jalandhar (Punja) ABSTRACT This paper descries the implementation of highly efficient multiplierless serial and parallel distriuted arithmetic algorithm for FIR filters. Distriuted Arithmetic (DA) had een used to implement a it-serial scheme of a general symmetric version of an FIR filter due to its high staility and linearity y taking optimal advantage of the look-up tale (LUT) ased structure of FPGAs. The performance of the itserial and it-parallel DA technique for FIR filter design is analyzed and the results are compared to the conventional FIR filter design techniques. The proposed algorithm has een synthesized with Xilinx ISE 1.1i and implemented as a target device of Spartan3E FPGA. KEWORDS Distriuted Arithmetic (DA), FIR filter, Look up tale (LUT), FPGA 1. INTRODUCTION Due to the intensive use of FIR filters in video and communication systems, high performance in speed, area and power consumption is demanded. Basically, digital filters are used to modify the characteristic of signals in time and frequency domain and have een recognized as primary digital signal processing [1]. In DSP, the design methods were mainly focused in multiplier-ased architectures to implement the multiply-and- Accumulate (MAC) locks that constitute the central piece in FIR filters and several functions [2]. The FIR digital filter is presented as: N 1 y[ n] = c[ k ] x[ n k ] k= (1) Where y[ n ] is the FIR filter output, x[ n k] is input data and c[ k ] represents the filter coefficients Eq. (1) shows that multiplier-ased filter implementations may ecome highly expensive in terms of area and speed. This issue has een partially solved with the new generation of low-cost FPGAs that have emedded DSP locks [3]. The advantages of the FPGA approach to digital filter implementation include higher sampling rates than are availale from traditional DSP chips, lower costs than an ASIC for moderate volume applications, and more flexiility than the alternate approaches. In literature, several multiplier-less schemes had een proposed. These methods can e classified in two categories according to how they manipulate the filter coefficients for the multiply operation. The first type of multiplier-less technique is the conversion-ased approach, in which the coefficients are transformed to other numeric representations whose hardware implementation or manipulation is more efficient than the traditional inary representation. Example of such techniques are the Canonic Sign Digit (CSD) method, in which coefficients are represented y a comination of powers of two in such a way that multiplication can e simply implemented with adder/sutractors and shifters [4], and the Dempster-Mcleod method, which similarly involves the representation of filter coefficients with powers of two ut in this case arranging partial results in cascade to introduce further savings in the usage of adders [5]. The second type of multiplier-less method involves use of memories (RAMs, s) or Look-Up Tales (LUTs) to store pre-computed values of coefficient operations. These memory-ased methods involve Constant Coefficient Multiplier method and the very-well known Distriuted Arithmetic method [6] as examples. Distriuted Arithmetic (DA) algorithm appeared as a very efficient solution especially suited for LUT-ased FPGA architectures. Croisier et al [7] had proposed the multiplierless architecture of DA algorithm and it is ased on an efficient partition of the function in partial terms using 2 s complement inary representation of data. The partial terms can e pre-computed and stored in LUTs. oo et al. [8] oserved that the requirement of memory/lut capacity increases exponentially with the order of the filter, given that DA implementations need 2 K words, K eing the numer of taps of the filter. The work in this paper presents the design and implementation of serial and parallel distriuted algorithm for FIR filter target as Spartan 3E FPGA. The results of the implementation experiment are analyzed in terms of parameters such as area and speed. The rief description of the distriuted algorithm is presented in Section 2. The implementation of the proposed technique for FIR filters is discussed in Section 3. The Section 4 presents the implementation results. The last section concludes the work and presents the future work. 26

Volume 25 No.7, July 211 2. DISTRIBUTED ARITHMETIC (DA) Distriuted arithmetic (DA) is an important FPGA technology. It is extensively used in computing the sum of products, N 1 y[ n ] =< c, x >= c[ n ] x[ n ] n= DA system, assumes that the variale x[ n ] is represented y- (2) B 1 (3) = x[ n] = x [ n] 2, x [ n] [,1] c[ n] If are the known coefficients of the FIR filter, then output of FIR filter in it level form is: N 1 B 1 (4) n= = y= c[ n] x [ n] 2 In distriuted arithmetic form- B 1 N 1 y = 2 f ( c[ n], x [ n]) = n= In Eq. (5) second summation term realizing as one LUT. The use of this LUT or eliminates the multipliers [9]. For signed 2 s complement numer output of FIR filter can e computed as- B 1 N 1 B y= 2 f( c[ n], x [ n]) 2 f( c[ n], x [ n]) B = n= Where B represents the total numer of its used. Fig 1 shows the Distriuted architecture for FIR filter and different with the MAC architecture. When x[ n ] <, Binary representation of the input is [1], B 1 x[ n] = x[] x[ n]2 = 1 The output in distriuted arithmetic form- N 1 B 1 N 1 y = ( c[ n] x []) ( c[ n] x [ n])2 n= = 1 n= If the numer of coefficients N is too large to implement the full word with a single LUT (Input LUT it width = numer of coefficients), then partial tales can e and add the results as shown in Fig2. If pipeline registers are also added, then this modification will not reduce the speed, ut can dramatically reduce the size of the design [8]. (5) (6) (7) (8) 2.1 Parallel Distriuted Arithmetic Architecture A asic DA architecture, for a length N th sum-of-product computation, accepts one it from each of N words. If two its per word are accepted, then the computational speed can e essentially improved. The maximum speed can e achieved with the fully pipelined word-parallel architecture as shown in Fig 3. For maximum speed, a separate (with identical content) for each it vector x [n] should e provided [11]. 3. IMPLEMENTATION To evaluate the performance of the Distriuted Arithmetic serial and parallel scheme for symmetric FIR filters are implemented and synthesized using Xilinx ISE 1.1 Target as a Spartan 3E (Xc3s1c-5vq1) FPGA device and the results are compared to conventional FIR filter. ISE design software offers a complete design suit ased programmale logic devices on Xilinx ISE. The design can e simulated and synthesized in the form of schematic or HDL entry on Xilinx ISE platform. Spartan3E FPGA can e programming directly from Xilinx ISE in configuration logic locks interconnected with switching matrix. Spartan 3E has a microlaz DSP processor of 325 MHz operating frequency, so that DSP design can e implemented for less resources, high speed and low power. The designed FIR filter is programmed in verilog HDL language [12]. The proposed design is implemented for small memory location LUT and also for large memory location LUT to analyze the performance of the proposed design for speed and area parameters. In the present work, the proposed design is analyzed through 3-tap and 16 tap DA FIR filters. 3.1 Design Procedure: Step1: Derive the filter coefficient according to specification of filter. Step2: Store the inputs value in input register Step3: Design the LUTs as shown in Fig.4, which represents all the possile sum comination of filter coefficients. Step4: Accumulate and shift the value according to partial term eginning with LSB of the input and shift it to the right to add it to the next partial result. Step5: First value must e sutracted, due to negative it of MSB. Step6: Analyze the output of filter as per specifications, otherwise go to step 1. Step7: The same procedure is applied for Parallel DA FIR from step 1 to step 6 except in Step3 where it address value is eing called for 2-its at single time, so that two times LUT is required in comparison to the serial DA. 27

Volume 25 No.7, July 211 Word Shift register Multiplier Accumulator (a) x [N-1] x[1] x[] c[n-1] c[1] c[]. Register Bit shift register Arithmetic Tale Accumulator x B[] x 1[] x [] () x B[1] x 1[1] x [1] x B[N-1]. x 1[N-1] x [N-1] LUT /- Register 2-1 x B[] x 1[] x [] Add,<=t<B Su,t=B Fig 1: (a) Conventional MAC and () DA Architecture x B[N-1] x 1[N-1] x [N-1] x B[N] x 1[N] x [N] x B[2N-1] x 1[2N-1] x [2N-1] x B[2N] x 1[2N] x [2N] x B[3N-1] x 1[3N-1] x [3N-1] /- REGISTER x B[3N] x 1[3N] x [3N] x B[4N-1] x 1[4N-1] x [4N-1] 2-1 Add, <=t<b Fig 2: DA with Tale partitioning to yield a reduced Size Su,t=B 28

Volume 25 No.7, July 211 x [] x [N-1] x 1[] x 1[N-1] x 2[] x 2[N-1] x B[] ;;; ;;; ;;; ;;; 2 1 2 2 2 B x B[N-1] Fig 3: Parallel Distriuted Arithmetic Architecture Input Signal x[n-1] x[n-1] x[n-2] x[] x[1] x[2] x[2] x[1] x[] DATA 1 c[] 1 c[1] 1 1 c[1]c[] 1 c[2] 1 1 c[2]c[] 1 1 c[2]c[1] 1 1 c[2]c[1]c[] Sign Control /- Accumulator 2-1 Fig 4: 2 3 xb LUT ased DA FIR filter 4. IMPLEMENTATION RESULTS The 3-tap and 16-tap FIR filters are implemented for performance analysis of the proposed DA algorithm. The 3-tap DA FIR filter required very small area, since only single LUT is used for implementation due to its 2 3 memory locations. The control unit clears uffers and then the input collected y the input registers during the previous clock cycles is serially injected to the circuit. These its address a value in the input LUT structure, and this partial result is accumulated and shifted y the shifter/adder unit, taking into account that the very first value must e sutracted. Numer of clock cycle depends on the numer of its in the input. The test ench simulation results for 3 tap DA ased FIR filter are shown in Fig 5(a) and Fig 5(). The RTL view of 3-tap serial and parallel DA FIR filter, which shows the flow from input to output, are presented in Fig 6(a) and Fig.6 (). Fig 5 (a): Simulation for 3-tap serial DA FIR filter 29

Volume 25 No.7, July 211 Fig 5 (): Simulation for 3-tap parallel DA FIR filter For 16-tap DA FIR filter, partitioning the input into 4 it units, reduces the single look up tale 2 16 memory locations to 4 2 4 LUTs. The Logic flow is similar as shown in Fig.4, excepting that there are 4 partial results each from one of the each asic 4 input LUT cells in this case. The addition of these partial results gives the value of output and this output is sent to accumulate where shifting for next value occurs as shown in Fig 2. The precision of n-it input, the partial results accumulated and shifted n times with new value addressed in 4 input LUT units. Finally in (n1) th clock cycle signal from control indicate latch structure to output the final result, which will e shown in every (n2) th clock cycle. In parallel distriuted arithmetic, two its send for a single word so an extra LUT is required for second it. For 16 tap FIR filter total numer of LUT required is 8. Each contains 2 4 memory address location. Fig.7 (a) and Fig.7 () show the simulation result for 16 tap DA ased FIR filters for serial and parallel DA algorithm resp. The register transfer level (RTL) views for 16 tap DA FIR filters that shows there is no multiplier use in the architecture of distriuted arithmetic FIR filters as presented in Fig 8. Fig 6 (a): RTL view of 3-tap serial DA FIR filter Fig 8 (a): RTL view of 16-tap serial DA FIR filter Fig 6 (): RTL view of 3-tap serial DA FIR filter Fig 8 (): RTL view of 16-tap serial DA FIR filter Fig 7 (a): Simulation for 16-tap serial DA FIR filter 3

Volume 25 No.7, July 211 Fig 7 (): Simulation for 16 tap parallel DA FIR filter 4.1 Discussion The implementation results of 3-tap and 16-tap FIR filter after applying the distriuted arithmetic algorithm as shown in Tale 1 and 2. The 3-tap parallel DA FIR filter take high speed and lowest power dissipation in comparison to serial DA FIR filter and conventional FIR filter as shown in Tale 1. For small tap filters, the serial DA algorithm saves 5 % of the area and cost in comparison to the conventional design techniques. The speed is approximately 2 times for serial DA and 3 times in parallel DA is achieved and very less power is consumed in comparison to simple FIR filter. Tale 1. Synthesis and simulation result for 3 tap FIR filter Tale 2. Synthesis and simulation result for 16 tap FIR filter parameter Conventional FIR Serial DA FIR Parallel DA FIR Slices 412 18 243 Flip-Flop 35 171 192 4 input LUTs 53 263 47 Delay(ns) 46.11 19.2385 15.725 Power(mW) 1.3478 12.1451 11.785 Parameter Direct FIR Filter Serial DA FIR Filter Parallel DA FIR filter Slices 38 25 26 Slice Flip-flops 32 3 44 4 input LUTs 57 47 3 Delay (ns) 13.116 6.413 4.14 Power (mw) 1.522 1.3134 1.1965 TABLE 2 shows that after partitioning the input its and increasing the numer of LUTs in 16 tap DA FIR filter, parallel DA FIR filter is area efficient in comparison to conventional FIR filter ut at slight compromise of power which is equivalent to Conventional FIR, Parallel DA FIR gives high speed aout 3 times faster than conventional FIR filter. Whereas in serial DA FIR filter saved resources area is greater than 5% and speed is 2 times in comparison to conventional FIR filter. Fig 9 and Fig 1 shows the layout of serial DA and Parallel DA FIR Filter in Cadence SOC Encounter 18nm which represent the floor-plan and core area of the digital circuit. Fig 9: Lay out for 16 tap serial DA FIR filter 31

Volume 25 No.7, July 211 Fig 1: Lay out for 16 tap Parallel DA FIR filter 4. CONCLUSION & FUTURE WORK The implementation of highly efficient serial and parallel DA algorithm was presented in this work. The results were analyzed for 3-tap and 16-tap FIR filter using partitioned input ased LUT on Xilinx 1.1i as a target of SPARTAN-3E FPGA device.the speed performance of the Parallel DA FIR Filter was superior in comparison to all other techniques. For small tap filter less area, high speed and low power consumption is achieved after applying the Serial and Parallel DA technique. In large-tap FIR filter, speed of parallel DA FIR design technique ecome 3 times faster than that of conventional FIR filter. The proposed algorithm for FIR filters is also area efficient since approximately 5% of the area is saved with this technique as compared to conventional FIR filter design. Area efficiency and high speed is achieved with parallel DA technique at very slight cost of power consumption for large tap FIR filter. Since, distriuted arithmetic FIR filters are area efficient and contained less delay, so these filters can e used in various applications such as pulse shaping FIR filter in WCDMA system, software design radio and signal processing system for high speed. In future the work to reduce the power consumption in large tap DA FIR filters could e performed. 5. REFERENCES [1] Mohamed al mahdi Eshtawie and Masurie Bin Othman, An Algorithm Proposed For FIR Filter Cofficent Representation International Journel of Mahematics and computer Sciences 28.pp24-3. [2] John G. Prokis, Manolakis, Digital Signal Processing Principles,algorithm and applications (Fourth Edition)- 28 [3] Antolin Agatep, Xilinx Spartan-II FIR Filter Solution, WP116 (v1.) April 5, 2 [4] M. amada, and A. Nishihara, High-Speed FIR Digital Filter with CSD Coefficients Implemented on FPGA, in Proceedings of IEEE Design Automation Conference, 21, pp. 7-8. [5] M.A. Soderstrand, L.G. Johnson, H. Arichanthiran, M. Hoque, and R. Elangovan, Reducing Hardware Requirement in FIR Filter Design, in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing 2, Vol. 6, pp. 3275 3278 [6] Martinez-Peiro, J. Valls, T. Sansaloni, A.P. Pascual, and E.I. Boemo, A Comparison etween Lattice, Cascade and Direct Form FIR Filter Structures y using a FPGA Bit-Serial DA Implementation, in Proceedings of IEEE International Conference on Electronics, Circuits and Systems, 1999, Vol. 1,pp. 241 244. [7] A. Croisier, D. J. Estean, M. E. Levilion, and V. Rizo, Digital Filter for PCM Encoded Signals, U.S. Patent No. 3,777,13, issued April, 1973 [8] H. oo, and D. Anderson, Hardware-Efficient Distriuted Arithmetic Architecture for High-Order Digital Filters, in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 25, Vol. 5, pp. 125 128. [9] T.Vigneswarn and P.Suarami Reddy Design of Digital FIR Filter Based on DDA algorithm Journal of Applied Science,27 [1] Stanley A. White, Application of Distriuted Arithmetic to Digital Signal Processing: A Tutorial Review IEEE Acoustic speech signal processing Magazine, July 1989 [11] Attri, S.; Sohi, B.S.; Chopra,.C.; Efficient design of application specific DSP cores using FPGAs in Proceedings of 4 th IEEE International Conference on application specific integrated circuits Oct. 21 Page(s):462 466 [12] Samir Palnitkar, Verilog HDL A guide to Digital Design and Synthesis Second Edition-27 32