FPGA Implementation of High Speed FIR Filters Using Add and Shift Method

Size: px
Start display at page:

Download "FPGA Implementation of High Speed FIR Filters Using Add and Shift Method"

Transcription

1 FPGA Implementation of High Speed FIR Filters Using Add and Shift Method Abstract We present a method for implementing high speed Finite Impulse Response (FIR) filters using just registered adders and hardwired shifts. We extensively use a modified common subexpression elimination algorithm to reduce the number of adders. We target our optimizations to Xilinx Virtex II devices where we compare our implementations with those produced by Xilinx CoregenTM using istributed Arithmetic. We observe up to 50% reduction in the number of slices and up to 75% reduction in the number of s for fully parallel implementations. We also observed up to 50% reduction in the total dynamic power consumption of the filters. Our designs perform significantly faster than the MAC filters, which use embedded multipliers. 1. Introduction FPGAs are being increasingly used for a variety of computationally intensive applications, mainly in the realm of igital Signal Processing (SP) and communications [2-8]. ue to rapid increases in the technology, current generation of FPGAs contain a very high number of Configurable Logic Blocks (CLBs), and are becoming more feasible for implementing a wide range of applications. The high non-recurring engineering (NRE) costs and long development time for ASICs are making FPGAs more attractive for application specific SP solutions. SP functions such as FIR filters and transforms are used in a number of applications such as communication and multimedia. These functions are major determinants of the performance and power consumption of the whole system. Therefore it is important to have good tools for optimizing these functions. Equation (I) represents the output of an L tap FIR filter, which is the convolution of the latest L input samples. L is the number of coefficients h(k) of the filter, and x(n) represents the input time series. y[n] = h[k] x[n-k] k= 0, 1,..., L-1 (I) The conventional tapped delay line realization of this inner product is shown in Figure 1. This implementation translates to L multiplications and L-1 additions per sample to compute the result. This can be implemented using a single Multiply Accumulate (MAC) engine, but it would require L MAC cycles, before the next input sample can be processed. Using a parallel implementation with L MACs can speed up the performance L times. A general purpose multiplier occupies a large area on FPGAs. Since all the multiplications are with constants, the full flexibility of a general purpose multiplier is not required, and the area can be vastly reduced using techniques developed for constant multiplication. Though most of the current generation FPGAs such as Virtex II TM have embedded multipliers to handle these multiplications, the number of these multipliers is typically limited. Furthermore, the size of these multipliers is limited to only 18 bits, which limits the precision of the computations for high speed requirements. The ideal implementation would involve a sharing of the Combinational Logic Blocks (CLBs) and these multipliers. In this paper, we present a technique that is better than conventional techniques for implementation on the CLBs. X [n] x h x x x L-1 h L-2 h L-3 h 1 z -1 z z -1 z -1 Figure 1. A MAC FIR filter block diagram An alternative to the above approach is istributed Arithmetic (A) which is a well known method to save resources. Using A method, the filter can be implemented either in bit serial or fully parallel mode to trade bandwidth for area utilization. Assuming coefficients c[n] are known constants, equation (I) can be rewritten as follows: y[n] = c[n] x[n] n = 0, 1,, N-1 (II) Variable x[n] can be represented by: x [n] = x b [n] 2 b b=0, 1,, B-1 (III) x b [n] [0, 1] where x b [n] is the b th bit of x[n] and B is the input width. Finally, the inner product can be rewritten as follows: y = c[n] x b [k] 2 b = c[0] (x B-1 [0]2 B-1 x B-2 [0]2 B-2 x 0 [0]2 0 ) c[1] (x B-1 [1]2 B-1 x B-2 [1]2 B-2 x 0 [1]2 0 ) c[n-1] (x B-1 [N-1]2 B-1 x B-2 [0]2 B-2 x 0 [N- 1]2 0 ) = (c[0] x B-1 [0] c[1] x B-1 [1] c[n-1] x B-1 [N- 1])2 B-1 (c[0] x B-2 [0] c[1] x B-2 [1] c[n-1] x B- 2 [N- 1])2 B-2 (c[0] x 0 [0] c[1] x 0 [1] c[n-1] x 0 [N-1])2 0 = 2 b c[n] x b [k] (IV) where n=0, 1,, N-1 and b=0, 1,, B-1 The coefficients in most of SP applications for the multiply accumulate operation are constants. The partial products are obtained by multiplying the coefficients c i by multiplying one bit of data x i at a time in AN operation. These partial products should be added and the result depend only on the outputs of the input shift registers. The AN functions and adders can be replaced by Look Up Tables (s) that gives the partial product. This is shown in Figure 2. Input sequence is fed into the shift register at the input sample rate. The serial output is x h 0 z -1 y [n]

2 presented to the RAM based shift registers (registers are not shown in Figure for simplicity) at the bit clock rate which is n1 times (n is number of bits in a data input sample) the sample rate. The RAM based shift register stores the data in a particular address. The outputs of registered s are added and loaded to the scaling accumulator from LSB to MSB and the result which is the filter output will be accumulated over the time. For an n bit input, n1 clock cycles are needed for a symmetrical filter to generate the output. In conventional MAC method with a limited number of MAC engines, as the filter length is increased, the system sample rate is decreased. This is not the case with serial A architectures since the filter sample rate is decoupled from the filter length. As the filter length is increased, the throughput is maintained but more logic resources are consumed. Though the serial A architecture is efficient by construction, its performance is limited by the fact that the next input sample can be processed only after every bit of the current input samples are processed. Each bit of the current input samples takes one clock cycle to x 0 x 1 x 2 x 3 x 4 x 5 x 6 x 7 scaling accumulator << x 0 x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 0 [i1] x 1 [i1] x 2 [i1] x 3 [i1] x 4 [i1] x 5 [i1] x 6 [i1] x 7 [i1] scaling accumulator Figure 3. A 2 bit parallel A FIR filter block diagram A popular technique for implementing the transposed form of FIR filters is the use of a multiplier block, instead of using multipliers for each constant as shown in Figure 4. The multiplications with the set of constants {h k } are replaced by an optimized set of additions and shift operations, involving computation sharing. Further optimization can be done by factorizing the expression and finding common subexpressions. << Address ata C C 0 C C 0 C 1 C 2 C 3 Figure 2. A serial A FIR filter block diagram process. Therefore, if the input bitwidth is 12, then a new input can be sampled every 12 clock cycles. The performance of the circuit can be improved by modifying the architecture to a parallel architecture which processes the data bits in groups. Figure 3 shows the block diagram of a 2 bit parallel A FIR filter. The tradeoff here is performance for area since increasing the number of bits sampled has a significant effect on resource utilization on FPGA. For instance, doubling the number of bits sampled, doubles the throughput and results in the half the number of clock cycles. This change doubles the number of s as well as the size of the scaling accumulator. The number of bits being processed can be increased to its maximum size which is the input length n. This gives the maximum throughput to the filter. For a fully parallel implementation of the A filter (PA), the number of s required would be enormous. In this work we show an alternative to the PA method for implementing high speed FIR filters that consumes significantly lesser area and power. Figure 4. Replacing constant multiplication by multiplier block The performance of this filter architecture is limited by the latency of the biggest adder and is the same as that of the PA. In this work, we present a filter architecture based on the architecture illustrated in Figure 4. We use a modified common subexpression elimination algorithm that minimizes the number of adders as well as the number of latches. We show that such a scheme produces similar performance as the PA method, with significantly reduced area. We compare the area produced by our method after place and route with the PA based implementation produced by the commercial Xilinx Coregen TM tool. In addition, we also simulate the designs and compare the dynamic power consumption. The rest of the paper is organized as follows: Section 2 presents some related work. In Section3, we describe our filter architecture. In Section 4, we present our optimization algorithm for reducing the total area of the design. In Section 5, we describe our experimental setup and present our results. Finally we conclude the paper in Section Related Work Multiplications with constants have to be performed in many signal processing and communication applications such as FIR filters, audio, video and image

3 processing. Using a general purpose multiplier for performing constant multiplication is expensive in terms of area and latency. Though embedded multipliers are very popular in modern FPGAs, the size (18x18) and the number of these multipliers is a limitation. The partial product Constant Coefficient Multiplier called KCM in [9] is a popular approach for performing constant multiplications in FPGAs. The partial products are generated by table look-ups and are summed up using fast adders. This KCM is about one quarter the size of a fully functional multiplier [9]. istributed Arithmetic has been used extensively for implementing FIR filters on FPGAs [12, 13]. In [12], both Serial and Parallel istributed Arithmetic designs are used to implement a 16-tap FIR filter. The author notes the superior performance achieved by using PA over using a traditional VLIW SP processor. The Xilinx TM CORE Generator has a highly parameterizable, optimized filter core for implementing digital FIR filters [13]. It generates synthesized core that targeting a wide range of Xilinx devices. The Serial istributed Arithmetic (SA) architecture is very area efficient for serial implementations of filters. The fully Parallel istributed Arithmetic (PA) structure can be used for fastest sample rates, but it suffers from excessive area. In this work, we compare the area of our fully parallel architecture derived from the Shift and Add method and compare the area with that produced by PA. In [14], FIR filters are implemented using the Add and Shift method. Canonical Signed igit (CS) encoding is used for the coefficients to minimize the number of additions. The authors observe that the critical path of the filter can be reduced considerably by using registered adders, at minimal increase in area. This is because of the availability of a flip-flop at the output of the. This fact is illustrated in Figure 5, which compares a non-registered and a registered output. We also use registered adders in our design, and the performance of our filters is limited by the latency of the biggest adder. In comparison with [14], we extensively use common subexpression elimination for reducing the number of adders. Furthermore, the use of subexpression elimination increases the number of latches that are required to make the filter fully parallel. Our subexpression elimination algorithm considers both the cost of the latches as well as the cost of an adder. 3. Filter architecture We base our filter architecture on the transposed form of the FIR filter as shown in Figure 1. The filter can be divided into two main parts, the multiplier block and the delay block, and is illustrated in Figure 4. In the multiplier block, the current input variable x[n] is multiplied by all the coefficients of the filter to produce the y i outputs. These y i outputs are then delayed and added in the delay block to produce the filter output y[n]. We perform all our optimizations in the multiplier block. The constant multiplications are decomposed into registered additions and hardwire shifts. The additions are performed using two input adders, which are arranged in the fastest tree structure. We use registered adders, so that the performance of the filter is only limited by the slowest adder. We use common subexpression elimination extensively, to reduce the number of adders, which leads to a reduction in the area. To synchronize all the intermediate values in the computation, we insert registers in the dataflow, wherever necessary. X 1 y 1 X 0 y 0 X y s X z -1 Logic Block 2 Logic Block 1 carry s 1 s 0 X 1 y 1 y Logic Block 2 X 0 y 0 s' 0 Logic Block 1 s' carry (a) (b) Figure 5. Registered adder at no additional cost Performing subexpression elimination can sometimes increase the number of registers substantially, and the overall area could possibly increase. Consider the two expressions F 1 and F 2 which could be part of the multiplier block. F 1 = A B C F 2 = A B C E Figure 6 shows the original unoptimized expression trees. Both the expressions have a minimum critical path of two addition cycles. These expressions require a total of six registered adders for the fastest implementation, and no extra registers are required. From the expressions we can see that the computation A B C is common to both the expressions. If we extract this subexpression, we get the structure shown in Figure 7. Since both and E need to wait for two addition cycles to be added to (A B C), we need to use two registers each for and E, such that new values for A,B,C, and E can be read in at each clock cycle. Assuming that the cost of an adder and a register with the same bitwidth are the same, the structure shown in Figure 7 occupies more area than the one shown in Figure 6. A more careful subexpression elimination algorithm would only extract the common subexpression A B (or AC or B C). The number of adders is decreased by one from the original, and no additional registers are added. This is illustrated in Figure 8. The algorithm for performing this kind of optimization is described in the next section. Figure 6. Unoptimized expression trees s' 1

4 detected by performing intersections among the set of divisors. 4.2 Optimization algorithm Figure 7. Extracting common expression (A B C) We first calculate the minimum number of registers required for our design. We calculate this by arranging the original expressions in the fastest possible tree structure, and then inserting registers. For example, for the six term expression F = A B C E F, we have the fastest tree structure with three addition steps, and we require one register to synchronize the intermediate values, such that new values for A,B,C,,E,F can be read in every clock cycle. This is illustrated in Figure 9. Figure 8. Extracting common subexpression (AB) 4. Optimization algorithm The goal of our optimization is to reduce the area of the multiplier block by reducing the number of adders and any additional registers required for the fastest implementation of the FIR filter. We first give a brief overview of the common subexpression elimination methods. A detailed description can be found in [1]. We then present the modified optimization algorithm to be used for our work. 4.1 Overview of common subexpression elimination We use a polynomial transformation of constant multiplications. Given a representation for the constant C, and the variable X, the multiplication C*X can be represented as a summation of terms denoting the decomposition of the multiplication into shifts and additions as C*X = ± i i XL (V) The terms can be either positive or negative when the constants are represented using signed digit representations such as the Canonical Signed igit (CS) representation. The exponent of L represents the magnitude of the left shift and the i s represent the digit positions of the non-zero digits of the constants. For example the multiplication 7*X = (100-1) CS *X = X<<3 X = XL 3 X, using the polynomial transformation. We use the divisors to represent all possible common subexpressions. ivisors are obtained from an expression by looking at every pair of terms in the expression and dividing the terms by the minimum exponent of L. For example in the expression F = XL 2 XL 3 XL 5, consider the pair of terms (XL 2 XL 3 ). The minimum exponent of L in the two terms is L 2. ividing by L 2, we get the divisor (X XL). From the other two pairs of terms (XL 2 XL 5 ) and (XL 3 XL 5 ), we get the divisors (X XL 3 ) and (X XL 2 ) respectively. These divisors are significant, because every common subexpression in the set of expressions can be Figure 9. Calculating registers required for fastest evaluation We first generate all the divisors for the set of expressions describing the multiplier block. We then use an iterative algorithm, where we extract the divisor that has the greatest value. To calculate the value of the divisor, we assume that the cost of a registered adder and a register is the same. We calculate the value of a divisor as the number of additions saved by extracting it minus the number of registers that have to be added. After selecting the best divisor, we rewrite the expressions using it. We then generate new divisors from the new terms that have been generated due to rewriting, and add them to the dynamic list of divisors. The iteration stops when there is no valuable divisor remaining in the set of divisors. Consider the expressions shown in Figure 6. We need six registered adders and no additional registers for the fastest evaluation of F 1 and F 2. Now consider the selection of the divisor d 1 = (AB). This divisor saves one addition and does not increase the number of registers. ivisors (A C) and (B C) also have the same value, but (AB) is selected randomly. The expressions are now rewritten as: d 1 = (A B) F 1 = d 1 C F 2 = d 1 C E After rewriting the expressions and forming new divisors, the divisor d 2 = (d 1 C) is considered. This divisor saves one adder, but introduces five additional registers, as can be seen in Figure 7. Therefore this divisor has a value of - 4. No other valuable divisors can be found and the iteration stops. We end up with the expressions shown in Figure 8.

5 ReduceArea( {P i } ) { {P i } = Set of expressions in polynomial form; {} = Set o f divisors = ϕ ; //Step 1: Creating divisors and calculating minimum number of registers required for each expression P i in {P i } { { new } = Findivisors(P i ); Update frequency statistics of divisors in {}; {} = {} { new }; P i ->MinRegisters = Calculate Minimum registers required for fastest evaluation of P i ; } //Step 2: Iterative selection and elimination of best divisor while(1) { } Find d = ivisor in {} with greatest Value; // Value = Num Additions reduced Num Registers Added; if( d == NULL) break; Rewrite affected expressions in {P i } using d; Remove divisors in {} that have become invalid; Update frequency statistics of affected divisors; { new } = Set of new divisors from new terms added by division; {} = {} { new }; } Figure 10. Optimization algorithm to reduce area 5. Experiments The goal of our experiments was to compare the number of resources consumed by our add and shift method with that produced by the cores generated by the commercial Coregen TM tool, based on istributed Arithmetic. Besides the resources, we also compared the power consumption of the two implementations, and also measured the performance. For our experiments, we considered 10 FIR filters of various sizes (6, 10, 13, 20, 28, 41, 61, 71, 119 and 152 tap filters). We targeted the Xilinx Virtex II device for our experiments. The constants were normalized to 16 digit of precision and the input samples were assumed to be 12 bits wide. For the add and shift method, we decomposed all the constant multiplications into additions and shifts and optimized the expressions using the algorithm explained in Section 4.2. We used the Xilinx Integrated Software Environment (ISE) for performing synthesis and implementation of the designs. All the designs were synthesized for maximum performance. Table 1a shows the resources utilized for the various filters and the performance in terms of Million samples per second (Msps) for the filters implemented using the add and shift method. Table 1b, shows the same numbers for the filters implemented using Xilinx Coregen, using the Parallel istributed Arithmetic (PA) method. % Reductio Table 1a. Filter Synthesis using Add Shift method Filter (# taps) Slices s FFs Figure 11 plots the reduction in the number of resources, in terms of the number of Slices, Look Up Tables (s) and the number of Flip Flops (FFs). From the results, we can observe an average reduction of 58.7% in the number of s, and about 25% reduction in the number of slices and FFs. Though our algorithm does not optimize for performance, the synthesis produces better performance in most of the cases, and for the 13 and 20 tap filters, we observe about 26% improvement in performance. Table 1b. Filter Synthesis using Coregen (PA method) Reduction in resources # of taps Figure 11. Reduction in resources Performance (Msps) Filter (# taps) Slices s FFs Performance (Msps) Slices s FFs Figure 12 compares power consumption for our add/shift method versus Coregen TM. From the results we can observe up to 50% reduction in dynamic power consumption. We did not include the quiescent power into our calculation since that value is the same for both methods. The power consumption is the result of applying the same test stimulus to both designs and measuring the power using XPower tools provided by Xilinx ISE software.

6 Power (mw ynamic Power consumption Filter (# taps) Figure 11. Power consumption Add/Shift Coregen Comparison with MAC filters using embedded multipliers Coregen TM can produce FIR filters based on the Multiply Accumulate (MAC) method, which makes use of the embedded multipliers. We implemented some of these filters using the MAC method to compare the resource usage and performance with our add and shift method. ue to tool limitations we had to do the experiments for Virtex IV device and due to extremely long synthesis time for the MAC filter, we only implemented a subset of these filters. We present the synthesis results in terms of number of slices on the Virtex IV device and the performance in Msps in Table 2. Table 2. Comparing with MAC filter on Virtex IV Add Shift MAC Filter Method filter (# taps) Slices Msps Slices Msps From the table, it can be seen that the MAC filter uses fewer number of slices compared to the add-shift method, but it also uses the embedded multipliers. The number of multipliers is equal to the number of taps of the filter. For implementing the 61-tap MAC filter, we had to use a higher capacity device, since the device we were using did not have 62 embedded multipliers. The results show that we achieve higher performance as the filter size increases. 6. Conclusion In this paper we presented a multiplierless technique, based on the add and shift method and common subexpression elimination for low area, low power and high speed implementations of FIR filters. We validated our techniques on Virtex II TM devices where we observed significant area and power reductions over traditional istributed Arithmetic based techniques. In future, we would like to modify our algorithm to make use of the limited number of embedded multipliers available on the FPGA devices. 7. References [1] A.Hosangadi, F.Fallah, and R.Kastner, "Reducing Hardware complexity by iteratively eliminating two term common subexpressions", Asia South Pacific esign Automation Conference (ASP-AC), [2] K..Underwood and K.S.Hemmert, "Closing the Gap: CPU and FPGA Trends in Sustainable Floating- Point BLAS Performance", International Symposium on Field-Programmable Custom Computing Machines, California, USA, [3] L.Zhuo and V.K.Prasanna, "Sparse Matrix-Vector Multiplication on FPGAs", International Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, [4] Y.Meng, A.P.Brown, R.A.Iltis, T.Sherwood, H.Lee, and R.Kastner, "MP Core: Algorithm and esign Techniques for Efficient Channel Estimation in Wireless Applications", esign Automation Conference (AC), Anaheim, CA, [5] B. L. Hutchings and B. E. Nelson, "Gigaop SP on FPGA", Acoustics, Speech, and Signal Processing, Proceedings. (ICASSP '01) IEEE International Conference on, [6] A.Alsolaim, J.Becker, M.Glesner, and J.Starzyk, "Architecture and Application of a ynamically Reconfigurable Hardware Array for Future Mobile Communication Systems", International Symposium on Field Programmable Custom Computing Machines (FCCM), [7] S.J.Melnikoff, S.F.uigley, and M.J.Russell, "Implementing a Simple Continuous Speech Recognition System on an FPGA", International Symposium on Field-Programmable Custom Computing Machines (FCCM), [8] T.Yokota, M.Nagafuchi, Y.Mekada, T.Yoshinaga, K.Ootsu, and T.Baba, "A Scalable FPGA-based Custom Computing Machine for Medical Image Processing", International Symposium on Field- Programmable Custom Computing Machines (FCCM), [9] K.Chapman, "Constant Coefficient Multipliers for the XC4000E," Xilinx Technical Report [10] M.J.Wirthlin, "Constant Coefficient Multiplication Using Look-Up Tables," Journal of VLSI Signal Processing, vol. 36, pp. 7-15, [11] K. Wiatr and E. Jamro, "Constant coefficient multiplication in FPGA structures", Euromicro Conference, Proceedings of the 26th, [12] G.R.Goslin, "A Guide to Using Field Programmable Gate Arrays (FPGAs) for Application-Specific igital Signal Processing Performance," Xilinx Application Note, San Jose [13] "istributed Arithmetic FIR Filter v9.0," Xilinx Product Specification [14] M. Yamada and A. Nishihara, "High-speed FIR digital filter with CS coefficients implemented on FPGA", esign Automation Conference, Proceedings of the ASP-AC Asia and South Pacific, 2001.

The Efficient Implementation of Numerical Integration for FPGA Platforms

The Efficient Implementation of Numerical Integration for FPGA Platforms Website: www.ijeee.in (ISSN: 2348-4748, Volume 2, Issue 7, July 2015) The Efficient Implementation of Numerical Integration for FPGA Platforms Hemavathi H Department of Electronics and Communication Engineering

More information

Parallel FIR Filters. Chapter 5

Parallel FIR Filters. Chapter 5 Chapter 5 Parallel FIR Filters This chapter describes the implementation of high-performance, parallel, full-precision FIR filters using the DSP48 slice in a Virtex-4 device. ecause the Virtex-4 architecture

More information

Simultaneous Optimization of Delay and Number of Operations in Multiplierless Implementation of Linear Systems

Simultaneous Optimization of Delay and Number of Operations in Multiplierless Implementation of Linear Systems Simultaneous Optimization of Delay and Number of Operations in Multiplierless Implementation of Linear Systems Abstract The problem of implementing linear systems in hardware by efficiently using shifts

More information

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN Xiaoying Li 1 Fuming Sun 2 Enhua Wu 1, 3 1 University of Macau, Macao, China 2 University of Science and Technology Beijing, Beijing, China

More information

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression Divakara.S.S, Research Scholar, J.S.S. Research Foundation, Mysore Cyril Prasanna Raj P Dean(R&D), MSEC, Bangalore Thejas

More information

Simultaneous Optimization of Delay and Number of Operations in Multiplierless Implementation of Linear Systems

Simultaneous Optimization of Delay and Number of Operations in Multiplierless Implementation of Linear Systems Simultaneous Optimization of Delay and Number of Operations in Multiplierless Implementation of Linear Systems Anup Hosangadi Farzan Fallah Ryan Kastner University of California, Fujitsu Labs of America,

More information

Two High Performance Adaptive Filter Implementation Schemes Using Distributed Arithmetic

Two High Performance Adaptive Filter Implementation Schemes Using Distributed Arithmetic Two High Performance Adaptive Filter Implementation Schemes Using istributed Arithmetic Rui Guo and Linda S. ebrunner Abstract istributed arithmetic (A) is performed to design bit-level architectures for

More information

DESIGN AND IMPLEMENTATION OF DA- BASED RECONFIGURABLE FIR DIGITAL FILTER USING VERILOGHDL

DESIGN AND IMPLEMENTATION OF DA- BASED RECONFIGURABLE FIR DIGITAL FILTER USING VERILOGHDL DESIGN AND IMPLEMENTATION OF DA- BASED RECONFIGURABLE FIR DIGITAL FILTER USING VERILOGHDL [1] J.SOUJANYA,P.G.SCHOLAR, KSHATRIYA COLLEGE OF ENGINEERING,NIZAMABAD [2] MR. DEVENDHER KANOOR,M.TECH,ASSISTANT

More information

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications , Vol 7(4S), 34 39, April 204 ISSN (Print): 0974-6846 ISSN (Online) : 0974-5645 Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications B. Vignesh *, K. P. Sridhar

More information

Implementing FIR Filters

Implementing FIR Filters Implementing FIR Filters in FLEX Devices February 199, ver. 1.01 Application Note 73 FIR Filter Architecture This section describes a conventional FIR filter design and how the design can be optimized

More information

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,

More information

Implementation of Floating Point Multiplier Using Dadda Algorithm

Implementation of Floating Point Multiplier Using Dadda Algorithm Implementation of Floating Point Multiplier Using Dadda Algorithm Abstract: Floating point multiplication is the most usefull in all the computation application like in Arithematic operation, DSP application.

More information

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design Manish Kumar *, Dr. R.Ramesh

More information

Implementation of High Speed FIR Filter using Serial and Parallel Distributed Arithmetic Algorithm

Implementation of High Speed FIR Filter using Serial and Parallel Distributed Arithmetic Algorithm Volume 25 No.7, July 211 Implementation of High Speed FIR Filter using Serial and Parallel Distriuted Arithmetic Algorithm Narendra Singh Pal Electronics &Communication, Dr.B.R.Amedkar Tecnology, Jalandhar

More information

MCM Based FIR Filter Architecture for High Performance

MCM Based FIR Filter Architecture for High Performance ISSN No: 2454-9614 MCM Based FIR Filter Architecture for High Performance R.Gopalana, A.Parameswari * Department Of Electronics and Communication Engineering, Velalar College of Engineering and Technology,

More information

A HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS

A HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS A HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS Saba Gouhar 1 G. Aruna 2 gouhar.saba@gmail.com 1 arunastefen@gmail.com 2 1 PG Scholar, Department of ECE, Shadan Women

More information

Design of 2-D DWT VLSI Architecture for Image Processing

Design of 2-D DWT VLSI Architecture for Image Processing Design of 2-D DWT VLSI Architecture for Image Processing Betsy Jose 1 1 ME VLSI Design student Sri Ramakrishna Engineering College, Coimbatore B. Sathish Kumar 2 2 Assistant Professor, ECE Sri Ramakrishna

More information

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies VLSI IMPLEMENTATION OF HIGH PERFORMANCE DISTRIBUTED ARITHMETIC (DA) BASED ADAPTIVE FILTER WITH FAST CONVERGENCE FACTOR G. PARTHIBAN 1, P.SATHIYA 2 PG Student, VLSI Design, Department of ECE, Surya Group

More information

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 05, 2015 ISSN (online): 2321-0613 VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 10 /Issue 1 / JUN 2018

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 10 /Issue 1 / JUN 2018 A HIGH-PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS S.Susmitha 1 T.Tulasi Ram 2 susmitha449@gmail.com 1 ramttr0031@gmail.com 2 1M. Tech Student, Dept of ECE, Vizag Institute

More information

CHAPTER 4. DIGITAL DOWNCONVERTER FOR WiMAX SYSTEM

CHAPTER 4. DIGITAL DOWNCONVERTER FOR WiMAX SYSTEM CHAPTER 4 IMPLEMENTATION OF DIGITAL UPCONVERTER AND DIGITAL DOWNCONVERTER FOR WiMAX SYSTEM 4.1 Introduction FPGAs provide an ideal implementation platform for developing broadband wireless systems such

More information

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE Anni Benitta.M #1 and Felcy Jeba Malar.M *2 1# Centre for excellence in VLSI Design, ECE, KCG College of Technology, Chennai, Tamilnadu

More information

Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter

Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter African Journal of Basic & Applied Sciences 9 (1): 53-58, 2017 ISSN 2079-2034 IDOSI Publications, 2017 DOI: 10.5829/idosi.ajbas.2017.53.58 Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm

More information

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA FPGA Implementation of 16-Point FFT Core Using NEDA Abhishek Mankar, Ansuman Diptisankar Das and N Prasad Abstract--NEDA is one of the techniques to implement many digital signal processing systems that

More information

An Efficient Implementation of Floating Point Multiplier

An Efficient Implementation of Floating Point Multiplier An Efficient Implementation of Floating Point Multiplier Mohamed Al-Ashrafy Mentor Graphics Mohamed_Samy@Mentor.com Ashraf Salem Mentor Graphics Ashraf_Salem@Mentor.com Wagdy Anis Communications and Electronics

More information

An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication

An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication 2018 IEEE International Conference on Consumer Electronics (ICCE) An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication Ahmet Can Mert, Ercan Kalali, Ilker Hamzaoglu Faculty

More information

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A.S. Sneka Priyaa PG Scholar Government College of Technology Coimbatore ABSTRACT The Least Mean Square Adaptive Filter is frequently

More information

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An Efficient Implementation of Double Precision Floating Point Multiplier Using Booth Algorithm Pallavi Ramteke 1, Dr. N. N. Mhala 2, Prof. P. R. Lakhe M.Tech [IV Sem], Dept. of Comm. Engg., S.D.C.E, [Selukate],

More information

Fault Tolerant Parallel Filters Based On Bch Codes

Fault Tolerant Parallel Filters Based On Bch Codes RESEARCH ARTICLE OPEN ACCESS Fault Tolerant Parallel Filters Based On Bch Codes K.Mohana Krishna 1, Mrs.A.Maria Jossy 2 1 Student, M-TECH(VLSI Design) SRM UniversityChennai, India 2 Assistant Professor

More information

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Area And Power Efficient LMS Adaptive

More information

Comparison of Adders for optimized Exponent Addition circuit in IEEE754 Floating point multiplier using VHDL

Comparison of Adders for optimized Exponent Addition circuit in IEEE754 Floating point multiplier using VHDL International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 11, Issue 07 (July 2015), PP.60-65 Comparison of Adders for optimized Exponent Addition

More information

IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC

IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC Thangamonikha.A 1, Dr.V.R.Balaji 2 1 PG Scholar, Department OF ECE, 2 Assitant Professor, Department of ECE 1, 2 Sri Krishna

More information

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code COPY RIGHT 2018IJIEMR.Personal use of this material is permitted. Permission from IJIEMR must be obtained for all other uses, in any current or future media, including reprinting/republishing this material

More information

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 7, Issue 2 (Jul. - Aug. 2013), PP 13-18 A Novel Approach of Area-Efficient FIR Filter

More information

Design Optimization Techniques Evaluation for High Performance Parallel FIR Filters in FPGA

Design Optimization Techniques Evaluation for High Performance Parallel FIR Filters in FPGA Design Optimization Techniques Evaluation for High Performance Parallel FIR Filters in FPGA Vagner S. Rosa Inst. Informatics - Univ. Fed. Rio Grande do Sul Porto Alegre, RS Brazil vsrosa@inf.ufrgs.br Eduardo

More information

Implementation and Impact of LNS MAC Units in Digital Filter Application

Implementation and Impact of LNS MAC Units in Digital Filter Application Implementation and Impact of LNS MAC Units in Digital Filter Application Hari Krishna Raja.V.S *, Christina Jesintha.R * and Harish.I * * Department of Electronics and Communication Engineering, Sri Shakthi

More information

A Dedicated Hardware Solution for the HEVC Interpolation Unit

A Dedicated Hardware Solution for the HEVC Interpolation Unit XXVII SIM - South Symposium on Microelectronics 1 A Dedicated Hardware Solution for the HEVC Interpolation Unit 1 Vladimir Afonso, 1 Marcel Moscarelli Corrêa, 1 Luciano Volcan Agostini, 2 Denis Teixeira

More information

A High Speed Binary Floating Point Multiplier Using Dadda Algorithm

A High Speed Binary Floating Point Multiplier Using Dadda Algorithm 455 A High Speed Binary Floating Point Multiplier Using Dadda Algorithm B. Jeevan, Asst. Professor, Dept. of E&IE, KITS, Warangal. jeevanbs776@gmail.com S. Narender, M.Tech (VLSI&ES), KITS, Warangal. narender.s446@gmail.com

More information

Implementation of Double Precision Floating Point Multiplier in VHDL

Implementation of Double Precision Floating Point Multiplier in VHDL ISSN (O): 2349-7084 International Journal of Computer Engineering In Research Trends Available online at: www.ijcert.org Implementation of Double Precision Floating Point Multiplier in VHDL 1 SUNKARA YAMUNA

More information

Design and Implementation of Effective Architecture for DCT with Reduced Multipliers

Design and Implementation of Effective Architecture for DCT with Reduced Multipliers Design and Implementation of Effective Architecture for DCT with Reduced Multipliers Susmitha. Remmanapudi & Panguluri Sindhura Dept. of Electronics and Communications Engineering, SVECW Bhimavaram, Andhra

More information

By, Ajinkya Karande Adarsh Yoga

By, Ajinkya Karande Adarsh Yoga By, Ajinkya Karande Adarsh Yoga Introduction Early computer designers believed saving computer time and memory were more important than programmer time. Bug in the divide algorithm used in Intel chips.

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information

High Performance Applications on Reconfigurable Clusters

High Performance Applications on Reconfigurable Clusters High Performance Applications on Reconfigurable Clusters by Zahi Samir Nakad Thesis Submitted to the Faculty of Virginia Polytechnic Institute and State University In partial fulfillment of the requirements

More information

FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri

More information

PERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS

PERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS American Journal of Applied Sciences 11 (4): 558-563, 2014 ISSN: 1546-9239 2014 Science Publication doi:10.3844/ajassp.2014.558.563 Published Online 11 (4) 2014 (http://www.thescipub.com/ajas.toc) PERFORMANCE

More information

An FPGA Implementation of the Powering Function with Single Precision Floating-Point Arithmetic

An FPGA Implementation of the Powering Function with Single Precision Floating-Point Arithmetic An FPGA Implementation of the Powering Function with Single Precision Floating-Point Arithmetic Pedro Echeverría, Marisa López-Vallejo Department of Electronic Engineering, Universidad Politécnica de Madrid

More information

DESIGN AND IMPLEMENTATION BY USING BIT LEVEL TRANSFORMATION OF ADDER TREES FOR MCMS USING VERILOG

DESIGN AND IMPLEMENTATION BY USING BIT LEVEL TRANSFORMATION OF ADDER TREES FOR MCMS USING VERILOG DESIGN AND IMPLEMENTATION BY USING BIT LEVEL TRANSFORMATION OF ADDER TREES FOR MCMS USING VERILOG 1 S. M. Kiranbabu, PG Scholar in VLSI Design, 2 K. Manjunath, Asst. Professor, ECE Department, 1 babu.ece09@gmail.com,

More information

Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation

Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation 1 S. SRUTHI, M.Tech, Vlsi System Design, 2 G. DEVENDAR M.Tech, Asst.Professor, ECE Department,

More information

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017 VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier 1 Katakam Hemalatha,(M.Tech),Email Id: hema.spark2011@gmail.com 2 Kundurthi Ravi Kumar, M.Tech,Email Id: kundurthi.ravikumar@gmail.com

More information

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Performance Analysis of CORDIC Architectures Targeted by FPGA Devices Guddeti Nagarjuna Reddy 1, R.Jayalakshmi 2, Dr.K.Umapathy

More information

Reduction of Latency and Resource Usage in Bit-Level Pipelined Data Paths for FPGAs

Reduction of Latency and Resource Usage in Bit-Level Pipelined Data Paths for FPGAs Reduction of Latency and Resource Usage in Bit-Level Pipelined Data Paths for FPGAs P. Kollig B. M. Al-Hashimi School of Engineering and Advanced echnology Staffordshire University Beaconside, Stafford

More information

Batchu Jeevanarani and Thota Sreenivas Department of ECE, Sri Vasavi Engg College, Tadepalligudem, West Godavari (DT), Andhra Pradesh, India

Batchu Jeevanarani and Thota Sreenivas Department of ECE, Sri Vasavi Engg College, Tadepalligudem, West Godavari (DT), Andhra Pradesh, India Memory-Based Realization of FIR Digital Filter by Look-Up- Table Optimization Batchu Jeevanarani and Thota Sreenivas Department of ECE, Sri Vasavi Engg College, Tadepalligudem, West Godavari (DT), Andhra

More information

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS 1 RONNIE O. SERFA JUAN, 2 CHAN SU PARK, 3 HI SEOK KIM, 4 HYEONG WOO CHA 1,2,3,4 CheongJu University E-maul: 1 engr_serfs@yahoo.com,

More information

Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field

Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field Veerraju kaki Electronics and Communication Engineering, India Abstract- In the present work, a low-complexity

More information

FAST FIR FILTERS FOR SIMD PROCESSORS WITH LIMITED MEMORY BANDWIDTH

FAST FIR FILTERS FOR SIMD PROCESSORS WITH LIMITED MEMORY BANDWIDTH Key words: Digital Signal Processing, FIR filters, SIMD processors, AltiVec. Grzegorz KRASZEWSKI Białystok Technical University Department of Electrical Engineering Wiejska

More information

Design and Analysis of Efficient Reconfigurable Wavelet Filters

Design and Analysis of Efficient Reconfigurable Wavelet Filters Design and Analysis of Efficient Reconfigurable Wavelet Filters Amit Pande and Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Ames, IA 50011 Email: {amit, zambreno}@iastate.edu

More information

TSEA44 - Design for FPGAs

TSEA44 - Design for FPGAs 2015-11-24 Now for something else... Adapting designs to FPGAs Why? Clock frequency Area Power Target FPGA architecture: Xilinx FPGAs with 4 input LUTs (such as Virtex-II) Determining the maximum frequency

More information

LogiCORE IP FIR Compiler v7.0

LogiCORE IP FIR Compiler v7.0 LogiCORE IP FIR Compiler v7.0 Product Guide for Vivado Design Suite Table of Contents IP Facts Chapter 1: Overview Feature Summary.................................................................. 5 Licensing

More information

FIR Filter Architecture for Fixed and Reconfigurable Applications

FIR Filter Architecture for Fixed and Reconfigurable Applications FIR Filter Architecture for Fixed and Reconfigurable Applications Nagajyothi 1,P.Sayannna 2 1 M.Tech student, Dept. of ECE, Sudheer reddy college of Engineering & technology (w), Telangana, India 2 Assosciate

More information

High Level Abstractions for Implementation of Software Radios

High Level Abstractions for Implementation of Software Radios High Level Abstractions for Implementation of Software Radios J. B. Evans, Ed Komp, S. G. Mathen, and G. Minden Information and Telecommunication Technology Center University of Kansas, Lawrence, KS 66044-7541

More information

Measuring Improvement When Using HUB Formats to Implement Floating-Point Systems under Round-to- Nearest

Measuring Improvement When Using HUB Formats to Implement Floating-Point Systems under Round-to- Nearest Measuring Improvement When Using HUB Formats to Implement Floating-Point Systems under Round-to- Nearest Abstract: This paper analyzes the benefits of using half-unitbiased (HUB) formats to implement floatingpoint

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

Technology-Dependent Optimization of FIR Filters based on Carry-Save Multiplier and 4:2 Compressor unit

Technology-Dependent Optimization of FIR Filters based on Carry-Save Multiplier and 4:2 Compressor unit ELECTRONICS, VOL. 20, NO. 2, ECEMBER 206 43 Technology-ependent Optimization of FIR Filters based on Carry-Save Multiplier and Compressor unit Burhan Khurshid and Roohie Naaz Abstract - This work presents

More information

OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION

OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION 1 S.Ateeb Ahmed, 2 Mr.S.Yuvaraj 1 Student, Department of Electronics and Communication/ VLSI Design SRM University, Chennai, India 2 Assistant

More information

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER. OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER. A.Anusha 1 R.Basavaraju 2 anusha201093@gmail.com 1 basava430@gmail.com 2 1 PG Scholar, VLSI, Bharath Institute of Engineering

More information

FPGA Polyphase Filter Bank Study & Implementation

FPGA Polyphase Filter Bank Study & Implementation FPGA Polyphase Filter Bank Study & Implementation Raghu Rao Matthieu Tisserand Mike Severa Prof. John Villasenor Image Communications/. Electrical Engineering Dept. UCLA 1 Introduction This document describes

More information

IMPLEMENTATION OF CONFIGURABLE FLOATING POINT MULTIPLIER

IMPLEMENTATION OF CONFIGURABLE FLOATING POINT MULTIPLIER IMPLEMENTATION OF CONFIGURABLE FLOATING POINT MULTIPLIER 1 GOKILA D, 2 Dr. MANGALAM H 1 Department of ECE, United Institute of Technology, Coimbatore, Tamil nadu, India 2 Department of ECE, Sri Krishna

More information

High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications

High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications Pallavi R. Yewale ME Student, Dept. of Electronics and Tele-communication, DYPCOE, Savitribai phule University, Pune,

More information

An FPGA Based Floating Point Arithmetic Unit Using Verilog

An FPGA Based Floating Point Arithmetic Unit Using Verilog An FPGA Based Floating Point Arithmetic Unit Using Verilog T. Ramesh 1 G. Koteshwar Rao 2 1PG Scholar, Vaagdevi College of Engineering, Telangana. 2Assistant Professor, Vaagdevi College of Engineering,

More information

High Speed Systolic Montgomery Modular Multipliers for RSA Cryptosystems

High Speed Systolic Montgomery Modular Multipliers for RSA Cryptosystems High Speed Systolic Montgomery Modular Multipliers for RSA Cryptosystems RAVI KUMAR SATZODA, CHIP-HONG CHANG and CHING-CHUEN JONG Centre for High Performance Embedded Systems Nanyang Technological University

More information

System Verification of Hardware Optimization Based on Edge Detection

System Verification of Hardware Optimization Based on Edge Detection Circuits and Systems, 2013, 4, 293-298 http://dx.doi.org/10.4236/cs.2013.43040 Published Online July 2013 (http://www.scirp.org/journal/cs) System Verification of Hardware Optimization Based on Edge Detection

More information

Comparison of pipelined IEEE-754 standard floating point multiplier with unpipelined multiplier

Comparison of pipelined IEEE-754 standard floating point multiplier with unpipelined multiplier Journal of Scientific & Industrial Research Vol. 65, November 2006, pp. 900-904 Comparison of pipelined IEEE-754 standard floating point multiplier with unpipelined multiplier Kavita Khare 1, *, R P Singh

More information

An Efficient Constant Multiplier Architecture Based On Vertical- Horizontal Binary Common Sub-Expression Elimination Algorithm

An Efficient Constant Multiplier Architecture Based On Vertical- Horizontal Binary Common Sub-Expression Elimination Algorithm Volume-6, Issue-6, November-December 2016 International Journal of Engineering and Management Research Page Number: 229-234 An Efficient Constant Multiplier Architecture Based On Vertical- Horizontal Binary

More information

A Novel Distributed Arithmetic Multiplierless Approach for Computing Complex Inner Products

A Novel Distributed Arithmetic Multiplierless Approach for Computing Complex Inner Products 606 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'5 A ovel Distributed Arithmetic Multiplierless Approach for Computing Complex Inner Products evin. Bowlyn, and azeih M. Botros. Ph.D. Candidate,

More information

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA RESEARCH ARTICLE OPEN ACCESS A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Nishi Pandey, Virendra Singh Sagar Institute of Research & Technology Bhopal Abstract Due to

More information

Xilinx DSP. High Performance Signal Processing. January 1998

Xilinx DSP. High Performance Signal Processing. January 1998 DSP High Performance Signal Processing January 1998 New High Performance DSP Alternative New advantages in FPGA technology and tools: DSP offers a new alternative to ASICs, fixed function DSP devices,

More information

Fixed Point LMS Adaptive Filter with Low Adaptation Delay

Fixed Point LMS Adaptive Filter with Low Adaptation Delay Fixed Point LMS Adaptive Filter with Low Adaptation Delay INGUDAM CHITRASEN MEITEI Electronics and Communication Engineering Vel Tech Multitech Dr RR Dr SR Engg. College Chennai, India MR. P. BALAVENKATESHWARLU

More information

FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for Large Operand Word Lengths

FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for Large Operand Word Lengths International Journal of Computer Science and Telecommunications [Volume 3, Issue 5, May 2012] 105 ISSN 2047-3338 FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for

More information

VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS

VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS I.V.VAIBHAV 1, K.V.SAICHARAN 1, B.SRAVANTHI 1, D.SRINIVASULU 2 1 Students of Department of ECE,SACET, Chirala, AP, India 2 Associate

More information

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

More information

Analysis of High-performance Floating-point Arithmetic on FPGAs

Analysis of High-performance Floating-point Arithmetic on FPGAs Analysis of High-performance Floating-point Arithmetic on FPGAs Gokul Govindu, Ling Zhuo, Seonil Choi and Viktor Prasanna Dept. of Electrical Engineering University of Southern California Los Angeles,

More information

ERROR MODELLING OF DUAL FIXED-POINT ARITHMETIC AND ITS APPLICATION IN FIELD PROGRAMMABLE LOGIC

ERROR MODELLING OF DUAL FIXED-POINT ARITHMETIC AND ITS APPLICATION IN FIELD PROGRAMMABLE LOGIC ERROR MODELLING OF DUAL FIXED-POINT ARITHMETIC AND ITS APPLICATION IN FIELD PROGRAMMABLE LOGIC Chun Te Ewe, Peter Y. K. Cheung and George A. Constantinides Department of Electrical & Electronic Engineering,

More information

CHAPTER 4 BLOOM FILTER

CHAPTER 4 BLOOM FILTER 54 CHAPTER 4 BLOOM FILTER 4.1 INTRODUCTION Bloom filter was formulated by Bloom (1970) and is used widely today for different purposes including web caching, intrusion detection, content based routing,

More information

Design and Optimized Implementation of Six-Operand Single- Precision Floating-Point Addition

Design and Optimized Implementation of Six-Operand Single- Precision Floating-Point Addition 2011 International Conference on Advancements in Information Technology With workshop of ICBMG 2011 IPCSIT vol.20 (2011) (2011) IACSIT Press, Singapore Design and Optimized Implementation of Six-Operand

More information

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Fall 2002 EECS150 - Lec06-FPGA Page 1 Outline What are FPGAs? Why use FPGAs (a short history

More information

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm 1 A.Malashri, 2 C.Paramasivam 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,

More information

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India Mapping Signal Processing Algorithms to Architecture Sumam David S Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India sumam@ieee.org Objectives At the

More information

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator M.Chitra Evangelin Christina Associate Professor Department of Electronics and Communication Engineering Francis Xavier

More information

Figurel. TEEE-754 double precision floating point format. Keywords- Double precision, Floating point, Multiplier,FPGA,IEEE-754.

Figurel. TEEE-754 double precision floating point format. Keywords- Double precision, Floating point, Multiplier,FPGA,IEEE-754. AN FPGA BASED HIGH SPEED DOUBLE PRECISION FLOATING POINT MULTIPLIER USING VERILOG N.GIRIPRASAD (1), K.MADHAVA RAO (2) VLSI System Design,Tudi Ramireddy Institute of Technology & Sciences (1) Asst.Prof.,

More information

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015 Design of Dadda Algorithm based Floating Point Multiplier A. Bhanu Swetha. PG.Scholar: M.Tech(VLSISD), Department of ECE, BVCITS, Batlapalem. E.mail:swetha.appari@gmail.com V.Ramoji, Asst.Professor, Department

More information

LOW COMPLEXITY SUBBAND ANALYSIS USING QUADRATURE MIRROR FILTERS

LOW COMPLEXITY SUBBAND ANALYSIS USING QUADRATURE MIRROR FILTERS LOW COMPLEXITY SUBBAND ANALYSIS USING QUADRATURE MIRROR FILTERS Aditya Chopra, William Reid National Instruments Austin, TX, USA Brian L. Evans The University of Texas at Austin Electrical and Computer

More information

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs? EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Outline What are FPGAs? Why use FPGAs (a short history lesson). FPGA variations Internal logic

More information

Implementation of digit serial fir filter using wireless priority service(wps)

Implementation of digit serial fir filter using wireless priority service(wps) Implementation of digit serial fir filter using wireless priority service(wps) S.Aruna Assistant professor,ece department MVSR Engineering College Nadergul,Hyderabad-501510 V.Sravanthi PG Scholar, ECE

More information

FPGA architecture and design technology

FPGA architecture and design technology CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA

More information

APPLICATION NOTE. Block Adaptive Filter. General Description of Adaptive Filters. Adaptive Filter Design Overview

APPLICATION NOTE. Block Adaptive Filter. General Description of Adaptive Filters. Adaptive Filter Design Overview APPLICATION NOTE Block Adaptive Filter XAPP 055 January 9, 1997 (Version 1.1) Application Note by Bill Allaire and Bud Fischer Summary This application note describes a specific design for implementing

More information

University, Patiala, Punjab, India 1 2

University, Patiala, Punjab, India 1 2 1102 Design and Implementation of Efficient Adder based Floating Point Multiplier LOKESH BHARDWAJ 1, SAKSHI BAJAJ 2 1 Student, M.tech, VLSI, 2 Assistant Professor,Electronics and Communication Engineering

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

Embedded Systems: Hardware Components (part I) Todor Stefanov

Embedded Systems: Hardware Components (part I) Todor Stefanov Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System

More information

Implementation Of Quadratic Rotation Decomposition Based Recursive Least Squares Algorithm

Implementation Of Quadratic Rotation Decomposition Based Recursive Least Squares Algorithm 157 Implementation Of Quadratic Rotation Decomposition Based Recursive Least Squares Algorithm Manpreet Singh 1, Sandeep Singh Gill 2 1 University College of Engineering, Punjabi University, Patiala-India

More information

Design and Implementation of 3-D DWT for Video Processing Applications

Design and Implementation of 3-D DWT for Video Processing Applications Design and Implementation of 3-D DWT for Video Processing Applications P. Mohaniah 1, P. Sathyanarayana 2, A. S. Ram Kumar Reddy 3 & A. Vijayalakshmi 4 1 E.C.E, N.B.K.R.IST, Vidyanagar, 2 E.C.E, S.V University

More information