ENERGY EFFICIENT PARAMETERIZED FFT ARCHITECTURE. Ren Chen, Hoang Le, and Viktor K. Prasanna
Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, USA
{renchen, hoangle,

ABSTRACT

Recently, there has been growing interest within the research community in improving energy efficiency. In this paper, we revisit the classic Fast Fourier Transform (FFT) for energy efficient designs on FPGAs. A parameterized FFT architecture is proposed to identify design trade-offs in achieving energy efficiency. We first perform design space exploration by varying the algorithm mapping parameters, such as the degree of vertical and horizontal parallelism, that characterize decomposition based FFT algorithms. After empirically selecting the values of the algorithm mapping parameters, we identify an energy-performance-area trade-off design by varying the architecture parameters, including the type of memory elements, the type of interconnection network, and the number of pipeline stages. The trade-offs between energy, area, and time are analyzed using two performance metrics: the Energy Area Time (EAT) composite metric and the energy efficiency (defined as the number of operations per Joule). From the experimental results, a design space is generated to demonstrate the effect of these parameters on the various performance metrics. For N-point FFT ($16 \le N \le 1024$), our designs achieve up to 28% and 38% improvement in energy efficiency and EAT, respectively, compared with a state-of-the-art design.

1. INTRODUCTION

FPGA is a promising implementation technology for computationally intensive applications such as signal, image, and network processing tasks [1, 2]. State-of-the-art FPGAs offer high operating frequency, unprecedented logic density and a host of other features.
As FPGAs are programmed specifically for the problem to be solved, they can achieve higher performance with lower power consumption than general purpose processors. The Fast Fourier Transform (FFT) is one of the most frequently used kernels for computing the Discrete Fourier Transform (DFT) in a wide variety of image and signal processing applications. Various derivative FFT algorithms have been proposed and developed. The Radix-x Cooley-Tukey algorithm is one of the most popular algorithms for hardware implementation [3, 4, 5, 6]. Most hardware solutions for Radix-x FFT are delay feedback or delay commutator architectures [4], such as the Radix-$2^2$ single-path delay feedback FFT [4] and the Radix-4 single-path delay commutator FFT [5]. By focusing on circuit level optimizations, these solutions achieved improvement in throughput, area, or power.

This work has been funded by DARPA under grant number HR

Power is a key metric in computing today. To obtain an energy efficient design for FFT, we analyze the trade-offs between energy, area, and time for fixed-point FFT on a parameterized architecture, using the Cooley-Tukey algorithm. Energy efficiency can be obtained both at the algorithm mapping level and at the architecture level [7, 8]. Optimizing at these two levels allows power to be effectively traded off against other performance parameters. For example, a design consuming 2× the power but achieving 3× the system throughput is actually 50% more energy efficient than the original design. We present the design space for the chosen architecture with respect to energy efficiency at the algorithm mapping level. An energy-performance-area trade-off design is achieved at the architecture level by empirical selection of the proposed architecture parameters.

In this paper, we make the following contributions:

1. A parameterized architecture of the Radix-4 Cooley-Tukey algorithm for FFT (Section 3.1).
2. A design space that demonstrates the effect of the parameters on the EAT and the energy efficiency metric (Section 4.3.2).
3. A demonstration of improved energy efficiency of the proposed trade-off design, obtained by identifying the energy hot spots and varying the proposed architecture parameters (Section 4.3.2).
4. Optimized designs achieving significant improvement in energy efficiency compared with a state-of-the-art design (Section 4.4).

The rest of the paper is organized as follows. Section 2 covers the background and related work. Section 3 describes the proposed parameterized architecture and its implementation on FPGA. Section 4 presents experimental results and analysis. Section 5 concludes the paper.
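The power versus throughput example in the introduction can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `relative_energy_efficiency` is hypothetical, and it assumes energy efficiency is measured as operations per Joule, which for a fixed workload equals throughput divided by power.

```python
def relative_energy_efficiency(power_ratio: float, throughput_ratio: float) -> float:
    """Energy efficiency of a new design relative to a baseline design.

    Energy efficiency = operations / energy = (operations per second) / Watt,
    so the relative efficiency is throughput_ratio / power_ratio.
    Both arguments are ratios (new design / baseline design).
    """
    if power_ratio <= 0 or throughput_ratio <= 0:
        raise ValueError("ratios must be positive")
    return throughput_ratio / power_ratio


# The example from the text: 2x the power, 3x the throughput.
gain = relative_energy_efficiency(power_ratio=2.0, throughput_ratio=3.0)
print(f"{(gain - 1.0) * 100:.0f}% more energy efficient")  # prints "50% more energy efficient"
```

The same function makes it easy to see when extra parallelism stops paying off: any change whose throughput ratio is smaller than its power ratio returns a value below 1.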
2. BACKGROUND AND RELATED WORK

2.1. Background

Given N complex numbers $x_0, \ldots, x_{N-1}$, the DFT is computed as

$X_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N}, \quad k = 0, \ldots, N-1.$

The Radix-x Cooley-Tukey FFT is a well-known decomposition based algorithm for the N-point DFT. Radix-4 FFT is employed in this paper; its description is presented in Algorithm 1. In terms of the number of real operations, the computational complexity of the N-point Radix-4 FFT is $O(N \log_4 N)$. The algorithm performs an N-point FFT in N/m (m < N) cycles using m Input/Output ports (I/Os) and $\log_4 N$ radix blocks, which are used for butterfly computations. The algorithm iteratively decomposes the entire problem into four subproblems. This feature enables us to map the algorithm by folding the FFT architecture vertically or horizontally, thus providing much freedom to implement various designs on FPGAs. We will propose our parameterized architecture in Section 3.2 based on this characteristic.

Algorithm 1 Radix-4 FFT Algorithm

  q = N/4; d = N/4;
  for p := 0 to log_4 N - 1 do
    for k := 0 to 4^p - 1 do
      l = 4kq/4^p; r = l + (q/4^p - 1);
      tw_1 = w[k]; tw_2 = w[2k]; tw_3 = w[3k];
      for i := l to r do
        t_0 = i; t_1 = i + d/4^p; t_2 = i + 2d/4^p; t_3 = i + 3d/4^p;
        do parallel
          f_{p+1}[t_0] = f_p[t_0] + f_p[t_1] + f_p[t_2] + f_p[t_3];
          f_{p+1}[t_1] = f_p[t_0] - j f_p[t_1] - f_p[t_2] + j f_p[t_3];
          f_{p+1}[t_2] = f_p[t_0] - f_p[t_1] + f_p[t_2] - f_p[t_3];
          f_{p+1}[t_3] = f_p[t_0] + j f_p[t_1] - f_p[t_2] - j f_p[t_3];
        end parallel
        do parallel
          f_{p+1}[t_0] = f_{p+1}[t_0];
          f_{p+1}[t_1] = tw_1 * f_{p+1}[t_1];
          f_{p+1}[t_2] = tw_2 * f_{p+1}[t_2];
          f_{p+1}[t_3] = tw_3 * f_{p+1}[t_3];
        end parallel
      end for
    end for
  end for

2.2. Related Work

To the best of our knowledge, there has been no previous work targeted at exploring the design space for energy efficiency of FFT at both the algorithm mapping level and the architecture level on FPGAs.
Existing work has mainly focused on optimizing the performance, power, and area of the design at the circuit level. In [9], the authors designed an energy-efficient 1024-point FFT processor. A cache-based FFT algorithm was proposed for achieving low power and high performance, and an energy-time performance metric was evaluated at different processor operating points. In [10], a high-speed and low-power FFT architecture was presented: a delay balanced pipeline architecture based on the split-radix algorithm. Algorithm trade-offs for reducing computation complexity were explored, and the architecture was evaluated in terms of area, power, and timing performance.

Based on Radix-x FFT, various pipeline FFT architectures have been proposed, such as the Radix-2 single-path delay feedback FFT [3], the Radix-4 single-path delay commutator FFT [5], the Radix-2 multi-path delay commutator FFT [6], and the Radix-$2^2$ single-path delay feedback FFT [4]. These architectures can achieve high throughput per unit area with single-path or multi-path pipelines, but energy efficiency was not explored or evaluated in these works.

In [11], a mathematical model for generating DFT soft cores was developed. This model can automatically produce an optimized design from user inputs on performance and resource constraints. The resource usage was estimated from the available parameters. However, power and performance estimation were not presented in that work. A parameterized FFT architecture for energy efficiency was presented in [7], where the optimized design was obtained by varying the chosen architecture parameters. Energy efficient design techniques such as clock gating and memory binding were also employed in that work. Beyond FPGAs, techniques for energy efficient FFT have also been presented for other platforms [12, 13]. However, it is not clear how to apply these techniques on FPGAs.
In this work, we extend the work of [7] with design space exploration for energy efficiency at different levels. The design space exploration is performed on current state-of-the-art FPGAs. By exploring energy efficiency at two levels, we obtain an energy-performance-area trade-off design for FFT.

3. ARCHITECTURE AND IMPLEMENTATIONS

3.1. Architecture building blocks

The proposed N-point FFT architecture is based on the Radix-4 Cooley-Tukey FFT algorithm. Note that the choice of the radix affects the energy efficiency of the design. Compared with the Radix-2 algorithm, Radix-4 uses fewer multiplication operations. The basic architecture consists of five building blocks (see Fig. 1): the Radix-4 block (R4), the data buffer, the data path permutation (PER), the parallel-to-serial/serial-to-parallel (PS/SP) multiplexer, and the twiddle factor computation. A complete design for a given N-point FFT can be obtained from combinations of the basic blocks.

A. Radix-4 block

In this module, 16 signed adder/subtractors are used to complete the butterfly computations. It takes four inputs and generates four outputs in parallel. Each input contains real and imaginary components. The outputs of R4 are used by the twiddle factor computation block, except in the last stage (see Fig. 1a).
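As a software illustration of what the Radix-4 block computes, the following sketch implements a 4-point butterfly using only additions, subtractions, and multiplications by ±j, and checks it inside a small recursive FFT. It is not the hardware design: the function names are hypothetical, floating point is used instead of 16-bit fixed point, and the recursion is a decimation-in-time variant in which twiddle factors are applied before the butterfly, whereas Algorithm 1 applies them after it.

```python
import cmath


def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) without general multiplications.

    Four complex add/subtracts form the partial sums and four more form the
    outputs, i.e. 16 real add/subtracts if multiplying by +/-j is a swap.
    """
    s0, s1 = a + c, a - c
    s2, s3 = b + d, b - d
    return (s0 + s2, s1 - 1j * s3, s0 - s2, s1 + 1j * s3)


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4:
        raise ValueError("length must be a power of 4")
    # Split into four interleaved sub-sequences and transform each one.
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    quarter = n // 4
    out = [0j] * n
    for k in range(quarter):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        y = radix4_butterfly(subs[0][k], w * subs[1][k],
                             w ** 2 * subs[2][k], w ** 3 * subs[3][k])
        for q in range(4):
            out[k + q * quarter] = y[q]
    return out


def dft(x):
    """Direct O(N^2) DFT used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


if __name__ == "__main__":
    data = [complex(i, -0.5 * i) for i in range(16)]
    err = max(abs(a - b) for a, b in zip(fft_radix4(data), dft(data)))
    print("max abs error vs direct DFT:", err)
```

For N = 16 the recursion uses two levels of butterflies, which corresponds to the $\log_4 N = 2$ radix stages discussed in the text.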
Fig. 1: (a) Radix block, (b) Data buffer, (c) Data path permutation (PER), (d) Parallel-to-serial/serial-to-parallel MUX (PS/SP), (e) Twiddle factor computation

Fig. 2: Data permutation in the data buffers for 16-point FFT. The four buffers hold ($X_0, X_7, X_{10}, X_{13}$), ($X_1, X_4, X_{11}, X_{14}$), ($X_2, X_5, X_8, X_{15}$), and ($X_3, X_6, X_9, X_{12}$), and their entries are output in parallel.

B. Data buffer

The data buffer consists of a dual-port RAM having N/m entries (m equals the number of I/Os). Data is written into one port and read from the other port simultaneously. The data buffers are shown in Fig. 2, where N = 16. In four cycles, 16 permuted data inputs are fed into the data buffers. In each cycle, with alternating entries, four data outputs are read in parallel. For different architectural parameters, the read and write addresses are generated with different strides. For example, in Fig. 2, four data inputs ($X_0, X_4, X_8, X_{12}$) are written in cycle 0, cycle 1, cycle 2, and cycle 3, respectively. They are then output simultaneously in cycle 4.

C. Data permutation block

Parallel input data must be permuted before being processed by the subsequent modules. Fig. 2 shows the data permutation for 16-point FFT. In the first cycle, four data inputs ($X_0, X_1, X_2, X_3$) are fed into the first entry of each data buffer without permutation. In the second cycle, another four data inputs are written into the second entry of each data buffer, permuted by one location. After four cycles, the parallel output data ($X_i, X_{i+4}, X_{i+8}, X_{(i+12) \bmod 16}$, i = 0, 1, 2, 3) are stored in different RAMs. These permutations are repeated every four cycles.

D. PS/SP module

This module multiplexes serial input data to parallel output, or parallel input data to serial output. As shown in Fig. 3a, the number of I/Os is limited to one, but the radix-4 block still operates on four data inputs in parallel; the PS/SP module is therefore employed to match the data rate both before and after the radix-4 block.

E. Twiddle factor computation

This module consists of two blocks: the twiddle factor generation block and the complex number multiplier block. The twiddle factor generation block includes several lookup tables for storing twiddle factor coefficients, where the read addresses are updated by the control signals. The size of the lookup tables increases with the problem size. The complex number multiplier block consists of three multipliers and three adder/subtractors.

3.2. Parameterized FFT Architecture

3.2.1. Algorithm Mapping Parameters

Decomposition based Radix-4 FFT offers much flexibility in mapping to various architectures. By folding the FFT architecture horizontally or vertically, the radix-4 blocks can be reused iteratively, connected in a pipeline, or replicated to process input data in parallel. Hence we use two algorithm mapping parameters that characterize the decomposition based N-point FFT algorithm in our design:

1. Horizontal Parallelism ($H_p$): determines the number of radix blocks used in one pipeline ($1 \le H_p \le \log_4 N$).
2. Vertical Parallelism ($V_p$): determines the number of inputs being computed in parallel ($1 \le V_p \le N$). $V_p$ varies with the number of data channels per pipeline ($N_c$) and the number of parallel pipelines ($N_p$), with $V_p = N_c \times N_p$.

These two parameters are chosen to create a design space. Two different architectures are presented in Fig. 3.

Fig. 3: Parameterized architectures for 16-point FFT: (a) $H_p = 1$, $V_p = 1$; (b) $H_p = 2$, $V_p = 4$

In Fig. 3a, $V_p = N_c = N_p = 1$, $H_p = 1$, and N = 16; one radix-4 block is employed and used iteratively for two stages, and one input is processed per cycle. This architecture achieves higher resource efficiency and consumes less I/O power, at the expense of throughput. In Fig. 3b, $V_p = 4$, $H_p = 2$, and N = 16; two radix-4 blocks are utilized. There is only one pipeline, with $N_c = 4$ and $N_p = 1$. All the stages are fully pipelined, and four inputs can be processed in parallel per cycle. Note that there is no feedback path. This architecture achieves high throughput by using more basic blocks and I/Os, at the cost of higher power consumption. We can also increase $V_p$ by replicating the basic pipeline. This replication allows several pipelines to work in parallel, significantly increasing the throughput at the cost of more complex interconnections.

3.2.2. Architecture Parameters

Three architecture parameters that significantly affect energy efficiency are employed in our design and applied to different components:

1. Type of memory element: BRAM or distributed RAM (dist. RAM) can be used as memory. In our design, both the data buffers and the twiddle factor lookup tables can be implemented using either memory element.
2. Type of interconnection: three different types of interconnection (see Fig. 4) are used to implement the data permutation blocks: crossbar network, complete binary tree, and dynamic network.
3. Pipeline depth: both adder/subtractors and DSP slices in the FPGA can be deeply pipelined by inserting registers, so we parameterize the arithmetic units and multipliers by pipeline depth to balance performance and resource usage.

Fig. 4: (a) Crossbar network, (b) Complete binary tree, (c) Dynamic network

According to [14], when used for large memories, BRAM consumes less power than dist. RAM. This characteristic can be utilized to make a trade-off between power and performance for various problem sizes.
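The data permutation described for Fig. 2 can be sketched in software as a rotation of the lane index by the cycle number. This is an assumption about the direction of the "one location" permutation, chosen because it reproduces the buffer contents readable from Fig. 2; the function names are hypothetical, and the address generation of the actual design is not described in the text.

```python
def write_rotated(inputs, lanes=4):
    """Write `inputs` (lanes values per cycle) into `lanes` buffers.

    In cycle c, the value arriving on lane j is stored in entry c of
    buffer (j + c) mod lanes, i.e. the lanes are rotated by one more
    position in every cycle.
    """
    cycles = len(inputs) // lanes
    buffers = [[None] * cycles for _ in range(lanes)]
    for cycle in range(cycles):
        for lane in range(lanes):
            value = inputs[cycle * lanes + lane]
            buffers[(lane + cycle) % lanes][cycle] = value
    return buffers


def read_stride(buffers, i):
    """Return X_i, X_{i+4}, X_{i+8}, X_{i+12} for a 4-lane, 16-point layout.

    Each value comes from a different buffer, so all four could be read
    in the same cycle from separate dual-port RAMs.
    """
    lanes = len(buffers)
    return [buffers[(i + cycle) % lanes][cycle] for cycle in range(len(buffers[0]))]


if __name__ == "__main__":
    buffers = write_rotated(list(range(16)))  # indices stand in for X_0..X_15
    for index, contents in enumerate(buffers):
        print(f"buffer {index}: {contents}")
    print(read_stride(buffers, 0))  # prints [0, 4, 8, 12]
```

Because the four operands of each radix-4 butterfly end up in four different buffers, no buffer has to serve two reads in one cycle, which is the property the permutation block is there to guarantee.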
As there are 2 m ($H_p$ + 1) permutation modules (m = 1 when $V_p = 4$, otherwise m = 0), using different interconnection networks can significantly affect the energy efficiency of the designs. The physical layout of the complete binary tree is similar to that of the crossbar network, but more pipeline registers can be inserted between the layers of the tree. The dynamic network can be implemented using shift registers. Among the three, the dynamic network can deliver high performance at the cost of higher power consumption; the crossbar network consumes the least resources and power but introduces long wire delays; the complete binary tree can relieve routing pressure to improve performance at the expense of more area.

4. EXPERIMENTAL RESULTS AND ANALYSIS

4.1. Experimental Setup

In this section, we present a detailed analysis of several implementation experiments performed by varying the parameters. All the designs were implemented in Verilog on a Virtex-7 FPGA (XC7VX690T, speed grade -2L) using Xilinx ISE. Inputs are 16-bit fixed-point complex numbers. The designs were verified by post place-and-route simulation, and the reported results are post place-and-route results. We used the SAIF (Switching Activity Interchange Format) file as input to Xilinx XPower Analyzer to produce accurate power dissipation estimates [14].

4.2. Performance Metrics

Two metrics for performance evaluation are considered in this paper:

1. Energy efficiency is defined as the number of operations per unit energy consumed (energy efficiency = number of operations / energy consumed by the design). For N-point FFT, energy efficiency is given by (2N log_2 N N log_2 N) / energy consumed by the design, where energy consumed by the design = time taken by the design × average power dissipation of the design. Equivalently, the energy efficiency of a design equals its power efficiency (power efficiency = number of operations per second / Watt).
2. Energy Area Time (EAT) is measured as the product of three important metrics: energy, area, and time. Energy is the energy in Joules consumed by the design for one transformation of N points. Area is the area usage of the design, taken as the larger of the number of LUTs and the number of flip-flops occupied by the entire design. The area of a design using BRAMs is taken to be equal to the area of the same design when using only dist. RAMs. Time is the latency of the N-point FFT.

4.3. Design space exploration

In this section, we first present the design space exploration performed by varying the algorithm mapping parameters. Both the dist. RAM based design and the BRAM based design are used in this experiment. The effect of the algorithm mapping parameters on energy efficiency is demonstrated using the proposed performance metrics. Next, we explore the energy-performance-area trade-off design (denoted trade-off design) by varying the architecture parameters, based on the conclusions of the design space exploration in this section.

4.3.1. Algorithm mapping level exploration

A. Horizontal Parallelism

In this experiment, we explore the energy efficiency for various degrees of horizontal parallelism, with $V_p = 4$, $N_c = 4$, and $N_p = 1$. The range of $H_p$ is $[1, \log_4 N]$. The energy efficiency for various $H_p$ is shown in Fig. 5 and Fig. 6 for the dist. RAM based and BRAM based designs, respectively.

Fig. 5: Energy efficiency (Giga operations/Joule) for various $H_p$ with varying problem size N for the dist. RAM based design

Fig. 6: Energy efficiency (Giga operations/Joule) for various $H_p$ with varying problem size N for the BRAM based design

Based on the experimental results, we make the following observations:

- For all the considered problem sizes, increasing horizontal parallelism significantly improves energy efficiency for both the dist. RAM and the BRAM based designs.
- As the problem size N increases, the energy efficiency of the dist. RAM based design declines, whereas the energy efficiency of the BRAM based design increases.
- The improvement in energy efficiency brought by increasing $H_p$ for the dist. RAM based design is sensitive to N. For example, when N = 1024, halving $H_p$ leads to only a slight decline in energy efficiency. Reducing $H_p$ to save area is therefore a feasible alternative for larger problem sizes.
- The improvement in energy efficiency brought by increasing $H_p$ for BRAM based designs is not sensitive to N. Reducing $H_p$ to save area is not a feasible approach, as it leads to a significant decline in energy efficiency.

B. Vertical Parallelism

Vertical parallelism is determined by three values: the radix (fixed at 4), $N_c$, and $N_p$. $H_p$ was set to $\log_4 N$, and $N_c$ and $N_p$ were varied for evaluation. Both dist. RAM and BRAM based designs were evaluated. The energy efficiency for various $V_p$ is shown in Fig. 7 and Fig. 8.

Fig. 7: Energy efficiency (Giga operations/Joule) for various $V_p$ with varying problem size N for the dist. RAM based design

Fig. 8: Energy efficiency (Giga operations/Joule) for various $V_p$ with varying problem size N for the BRAM based design

Based on the results, we draw the following conclusions:

- Reducing $N_c$ leads to a decline in energy efficiency.
- The BRAM based design is more scalable than the dist. RAM based design with respect to energy efficiency.
- When $N \ge 64$, the energy efficiency of dist. RAM based designs starts to decline, due to the high power consumption per access of dist. RAMs with a large number of memory entries.
- Increasing $N_c$ instead of $N_p$ is a more feasible approach to improving energy efficiency: little extra resource is needed to increase $N_c$, whereas the pipeline has to be replicated to increase $N_p$.
- Although increasing $H_p$ leads to higher power and resource consumption, it improves energy efficiency due to the higher throughput.

4.3.2. Architecture level exploration

In this section, the trade-off design is explored at the architecture level. In this experiment, we choose $V_p = 4$ and $H_p = \log_4 N$ based on the previous experimental conclusions.

A. Energy hot spots

As shown in Fig. 10a, the dominant portion of the total power is consumed by the data buffers for the 1024-point FFT. This indicates that BRAM can be utilized to improve energy efficiency for large values of N. It also suggests that I/O consumes a major share of the power for small values of N. Fig. 10b shows that the core power (total power excluding I/O power and static power) dominates for BRAM based designs. We also observe that pipeline registers are the energy hot spots among the architecture components.

Fig. 10: Percentage of power consumed by the components for the 1024-point FFT architecture: (a) dist. RAM based design (data buffer 72%), (b) BRAM based design (data buffer 28%, I/O power 26%)

B. Trade-off design

By varying the architecture parameters, a set of implementations has been evaluated in this experiment. The effects of the architecture parameters on power, performance, and area are analyzed below:

- Energy: Reducing the number of registers can significantly reduce signal power, which dominates the dynamic power. The crossbar network can be considered to increase energy efficiency.
- Performance: Using BRAM can lead to a decline in peak operating frequency. For large values of N, when using BRAMs, extra pipeline stages can be used to address the performance degradation.
- Area: The area usage of pipeline registers dominates the design area. Pipeline registers can be balanced to obtain a trade-off between area and performance.

The analysis above has been applied to obtain the trade-off design in our experiment and serves as a guide for design space exploration. We use two baseline designs for comparison with our proposed trade-off design. The architecture parameters of the designs are shown in Table 1.

Table 1: Architecture parameters of designs for comparison

                                  Design A          Trade-off design     Design C
Memory type                       Dist. RAM         Dist. RAM or BRAM    BRAM
Interconnection network type      Dynamic network   Crossbar network     Complete binary tree
Interconnection components        Registers         LUTs                 LUTs + Registers
Pipeline stages (multiplier)      5                 3
Pipeline stages (adder)           2                 2

Fig. 9: Energy efficiency (Giga operations/Joule) of the trade-off design and the baseline designs (Design A, Design C)

The comparison results for energy efficiency are shown in Fig. 9. Energy efficiency is improved by up to 27% by our proposed trade-off design, compared with the other two baseline designs.

4.4. Performance comparison

Finally, we compare our proposed trade-off design with the SPIRAL FFT IP core. The SPIRAL DFT/FFT IP Generator can automatically generate customized DFT soft IP cores in synthesizable RTL Verilog from user inputs [11]. The available parameters of the DFT core generator include transform size, data precision, etc. In this comparison, we use the dist. RAM based design for $N \le 64$ and the BRAM based design for N > 64. For the design from SPIRAL, the code of the N-point (16-bit fixed-point) FFT is automatically generated by the SPIRAL core generator. The architecture is fully streaming, and the data are presented in their natural ordering. As shown in Fig. 11, our proposed design improves energy efficiency by 8% to 28% and EAT by 23% to 38%, compared with the SPIRAL FFT IP cores.
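The two metrics used throughout this section can be computed from post place-and-route numbers as in the sketch below. It is a hedged illustration, not the authors' tooling: the class and field names are hypothetical, the operation count is passed in rather than derived from N, a single time value is used for both the energy and the latency term, and the example figures are placeholders rather than measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class DesignResult:
    name: str
    operations: float   # operations per N-point transform (supplied by the user)
    power_w: float      # average power dissipation in Watts
    time_s: float       # time for one N-point transform in seconds (assumed = latency)
    luts: int           # LUTs occupied by the design
    flip_flops: int     # flip-flops occupied by the design

    @property
    def energy_j(self) -> float:
        """Energy per transform = time taken x average power."""
        return self.time_s * self.power_w

    @property
    def energy_efficiency(self) -> float:
        """Operations per Joule."""
        return self.operations / self.energy_j

    @property
    def area(self) -> int:
        """Area taken as the larger of the LUT and flip-flop counts."""
        return max(self.luts, self.flip_flops)

    @property
    def eat(self) -> float:
        """Energy x Area x Time composite metric (lower is better)."""
        return self.energy_j * self.area * self.time_s


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not results from the paper.
    ours = DesignResult("trade-off", 1000, 2.0, 1e-6, 4000, 5000)
    base = DesignResult("baseline", 1000, 2.5, 1e-6, 6000, 5000)
    print("energy efficiency gain:", ours.energy_efficiency / base.energy_efficiency)
    print("EAT ratio (baseline / ours):", base.eat / ours.eat)
```

The last line mirrors the ratio plotted in Fig. 11, where the EAT of the reference core is divided by the EAT of the proposed design, so values above 1 favor the proposed design.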
Fig. 11: Comparison between the proposed trade-off design and the SPIRAL FFT IP cores for EAT and energy efficiency (energy efficiency in Giga operations/Joule; EAT ratio = EAT of SPIRAL IP core / EAT of our design)

5. CONCLUSION

In this work, we presented a parameterized architecture for energy efficiency using the Radix-4 Cooley-Tukey FFT algorithm. The effect of the multi-level parameters on energy efficiency was demonstrated through design space exploration. We studied the power consumption of the components for various problem sizes, and proposed our trade-off design by empirical selection of the architecture parameters. Compared with the state-of-the-art design, our optimized architectures achieve up to 28% and 38% improvement in energy efficiency and EAT, respectively. In the future, we plan to work on an accurate high-level performance model for energy efficiency estimation, which can be used to accelerate design space exploration to obtain an energy efficient design.

REFERENCES

[1] N. Shirazi, P. M. Athanas, and A. L. Abbott, "Implementation of a 2-D Fast Fourier Transform on an FPGA-Based Custom Computing Machine," in Field-Programmable Logic and Applications, 1995.
[2] D. Chen, G. Yao, C. Koc, and R. Cheung, "Low complexity and hardware-friendly spectral modular multiplication," in International Conference on Field-Programmable Technology (FPT), 2012.
[3] E. H. Wold and A. M. Despain, "Pipeline and parallel-pipeline FFT processors for VLSI implementations," IEEE Transactions on Computers, vol. C-33, no. 5.
[4] S. He and M. Torkelson, "A new approach to pipeline FFT processor," in Proceedings of IPPS '96.
[5] G. Bi and E. Jones, "A pipelined FFT processor for word-sequential data," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 12.
[6] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc., vol. 1.
[7] S. Choi, R. Scrofano, V. K. Prasanna, and J.-W. Jang, "Energy-efficient signal processing using FPGAs," in Proceedings of FPGA 2003.
[8] D. Aravind and A. Sudarsanam, "High level - Application Analysis Techniques Architectures - To Explore Design possibilities for Reduced Reconfiguration Area Overheads in FPGAs executing Compute Intensive Applications," in Proc. of IPDPS, 2005, pp. 158a-158a.
[9] B. Baas, "A low-power, high-performance, 1024-point FFT processor," IEEE Journal of Solid-State Circuits, vol. 34, no. 3.
[10] W.-C. Yeh and C.-W. Jen, "High-speed and low-power split-radix FFT," IEEE Transactions on Signal Processing, vol. 51, no. 3, 2003.
[11] P. A. Milder, M. Ahmad, J. C. Hoe, and M. Püschel, "Fast and accurate resource estimation of automatically generated custom DFT IP cores," in Proceedings of FPGA 2006.
[12] T. Sugimura, H. Yamasaki, H. Noda, O. Yamamoto, Y. Okuno, and K. Arimoto, "A high-performance and energy-efficient FFT implementation on super parallel processor (MX) for mobile multimedia applications," in International Symposium on Intelligent Signal Processing and Communications Systems, 2009.
[13] H. Kimura, H. Nakamura, S. Kimura, and N. Yoshimoto, "Numerical analysis of dynamic SNR management by controlling DSP calculation precision for energy-efficient OFDM-PON," IEEE Photonics Technology Letters, vol. 24, no. 23, 2012.
[14] XST User Guide for Virtex-6, Spartan-6, and 7 Series Devices.
More informationDESIGN OF PARALLEL PIPELINED FEED FORWARD ARCHITECTURE FOR ZERO FREQUENCY & MINIMUM COMPUTATION (ZMC) ALGORITHM OF FFT
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 199-206 Impact Journals DESIGN OF PARALLEL PIPELINED
More informationTowards Performance Modeling of 3D Memory Integrated FPGA Architectures
Towards Performance Modeling of 3D Memory Integrated FPGA Architectures Shreyas G. Singapura, Anand Panangadan and Viktor K. Prasanna University of Southern California, Los Angeles CA 90089, USA, {singapur,
More informationModified Welch Power Spectral Density Computation with Fast Fourier Transform
Modified Welch Power Spectral Density Computation with Fast Fourier Transform Sreelekha S 1, Sabi S 2 1 Department of Electronics and Communication, Sree Budha College of Engineering, Kerala, India 2 Professor,
More informationLow-Power Split-Radix FFT Processors Using Radix-2 Butterfly Units
Low-Power Split-Radix FFT Processors Using Radix-2 Butterfly Units Abstract: Split-radix fast Fourier transform (SRFFT) is an ideal candidate for the implementation of a lowpower FFT processor, because
More informationMULTIPLIERLESS HIGH PERFORMANCE FFT COMPUTATION
MULTIPLIERLESS HIGH PERFORMANCE FFT COMPUTATION Maheshwari.U 1, Josephine Sugan Priya. 2, 1 PG Student, Dept Of Communication Systems Engg, Idhaya Engg. College For Women, 2 Asst Prof, Dept Of Communication
More informationComputing the Discrete Fourier Transform on FPGA Based Systolic Arrays
Computing the Discrete Fourier Transform on FPGA Based Systolic Arrays Chris Dick School of Electronic Engineering La Trobe University Melbourne 3083, Australia Abstract Reconfigurable logic arrays allow
More informationFPGA Based Design and Simulation of 32- Point FFT Through Radix-2 DIT Algorith
FPGA Based Design and Simulation of 32- Point FFT Through Radix-2 DIT Algorith Sudhanshu Mohan Khare M.Tech (perusing), Dept. of ECE Laxmi Naraian College of Technology, Bhopal, India M. Zahid Alam Associate
More informationEnergy and Memory Efficient Mapping of Bitonic Sorting on FPGA
Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA Ren Chen, Sruja Siriyal, Viktor Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California, Los Angeles, USA
More informationResearch Article Design of A Novel 8-point Modified R2MDC with Pipelined Technique for High Speed OFDM Applications
Research Journal of Applied Sciences, Engineering and Technology 7(23): 5021-5025, 2014 DOI:10.19026/rjaset.7.895 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitted:
More informationSFF The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture
Journal of Signal Processing Systems (2018) 90:1583 1592 https://doi.org/10.1007/s11265-018-1370-y SFF The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture Carl Ingemarsson 1 Oscar Gustafsson
More informationSTUDY OF A CORDIC BASED RADIX-4 FFT PROCESSOR
STUDY OF A CORDIC BASED RADIX-4 FFT PROCESSOR 1 AJAY S. PADEKAR, 2 S. S. BELSARE 1 BVDU, College of Engineering, Pune, India 2 Department of E & TC, BVDU, College of Engineering, Pune, India E-mail: ajay.padekar@gmail.com,
More informationA Model-based Methodology for Application Specific Energy Efficient Data Path Design using FPGAs
A Model-based Methodology for Application Specific Energy Efficient Data Path Design using FPGAs Sumit Mohanty 1, Seonil Choi 1, Ju-wook Jang 2, Viktor K. Prasanna 1 1 Dept. of Electrical Engg. 2 Dept.
More informationA SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN
A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN Xiaoying Li 1 Fuming Sun 2 Enhua Wu 1, 3 1 University of Macau, Macao, China 2 University of Science and Technology Beijing, Beijing, China
More informationDESIGN METHODOLOGY. 5.1 General
87 5 FFT DESIGN METHODOLOGY 5.1 General The fast Fourier transform is used to deliver a fast approach for the processing of data in the wireless transmission. The Fast Fourier Transform is one of the methods
More informationA 4096-Point Radix-4 Memory-Based FFT Using DSP Slices
A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices Mario Garrido Gálvez, Miguel Angel Sanchez, Maria Luisa Lopez-Vallejo and Jesus Grajal Journal Article N.B.: When citing this work, cite the original
More informationFused Floating Point Arithmetic Unit for Radix 2 FFT Implementation
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 2, Ver. I (Mar. -Apr. 2016), PP 58-65 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Fused Floating Point Arithmetic
More informationFPGA Matrix Multiplier
FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri
More informationLOW-POWER SPLIT-RADIX FFT PROCESSORS
LOW-POWER SPLIT-RADIX FFT PROCESSORS Avinash 1, Manjunath Managuli 2, Suresh Babu D 3 ABSTRACT To design a split radix fast Fourier transform is an ideal person for the implementing of a low-power FFT
More informationUser Manual for FC100
Sundance Multiprocessor Technology Limited User Manual Form : QCF42 Date : 6 July 2006 Unit / Module Description: IEEE-754 Floating-point FPGA IP Core Unit / Module Number: FC100 Document Issue Number:
More informationINTRODUCTION TO FPGA ARCHITECTURE
3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)
More informationTwiddle Factor Transformation for Pipelined FFT Processing
Twiddle Factor Transformation for Pipelined FFT Processing In-Cheol Park, WonHee Son, and Ji-Hoon Kim School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea icpark@ee.kaist.ac.kr,
More informationAdaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design
International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design Manish Kumar *, Dr. R.Ramesh
More informationParallelism in Spiral
Parallelism in Spiral Franz Franchetti and the Spiral team (only part shown) Electrical and Computer Engineering Carnegie Mellon University Joint work with Yevgen Voronenko Markus Püschel This work was
More informationLow Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm
Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm 1 A.Malashri, 2 C.Paramasivam 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,
More informationVerilog for High Performance
Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes
More informationAn Area Efficient Mixed Decimation MDF Architecture for Radix. Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix Parallel FFT Reshma K J 1, Prof. Ebin M Manuel 2 1M-Tech, Dept. of ECE Engineering, Government Engineering College, Idukki, Kerala, India 2Professor,
More informationResource-efficient Acceleration of 2-Dimensional Fast Fourier Transform Computations on FPGAs
In Proceedings of the International Conference on Distributed Smart Cameras, Como, Italy, August 2009. Resource-efficient Acceleration of 2-Dimensional Fast Fourier Transform Computations on FPGAs Hojin
More informationComputer Generation of IP Cores
A I n Computer Generation of IP Cores Peter Milder (ECE, Carnegie Mellon) James Hoe (ECE, Carnegie Mellon) Markus Püschel (CS, ETH Zürich) addfxp #(16, 1) add15282(.a(a69),.b(a70),.clk(clk),.q(t45)); addfxp
More informationComparison of Adders for optimized Exponent Addition circuit in IEEE754 Floating point multiplier using VHDL
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 11, Issue 07 (July 2015), PP.60-65 Comparison of Adders for optimized Exponent Addition
More informationFPGA Implementation of Discrete Fourier Transform Using CORDIC Algorithm
AMSE JOURNALS-AMSE IIETA publication-2017-series: Advances B; Vol. 60; N 2; pp 332-337 Submitted Apr. 04, 2017; Revised Sept. 25, 2017; Accepted Sept. 30, 2017 FPGA Implementation of Discrete Fourier Transform
More informationAn Efficient High Speed VLSI Architecture Based 16-Point Adaptive Split Radix-2 FFT Architecture
IJSTE - International Journal of Science Technology & Engineering Volume 2 Issue 10 April 2016 ISSN (online): 2349-784X An Efficient High Speed VLSI Architecture Based 16-Point Adaptive Split Radix-2 FFT
More informationEfficient Self-Reconfigurable Implementations Using On-Chip Memory
10th International Conference on Field Programmable Logic and Applications, August 2000. Efficient Self-Reconfigurable Implementations Using On-Chip Memory Sameer Wadhwa and Andreas Dandalis University
More informationCore Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items
(FFT_PIPE) Product Specification Dillon Engineering, Inc. 4974 Lincoln Drive Edina, MN USA, 55436 Phone: 952.836.2413 Fax: 952.927.6514 E mail: info@dilloneng.com URL: www.dilloneng.com Core Facts Documentation
More informationImplementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics
Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics Yojana Jadhav 1, A.P. Hatkar 2 PG Student [VLSI & Embedded system], Dept. of ECE, S.V.I.T Engineering College, Chincholi,
More informationEECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Fall 2002 EECS150 - Lec06-FPGA Page 1 Outline What are FPGAs? Why use FPGAs (a short history
More informationHigh-Performance 16-Point Complex FFT Features 1 Functional Description 2 Theory of Operation
High-Performance 16-Point Complex FFT April 8, 1999 Application Note This document is (c) Xilinx, Inc. 1999. No part of this file may be modified, transmitted to any third party (other than as intended
More informationA High-Performance and Energy-efficient Architecture for Floating-point based LU Decomposition on FPGAs
A High-Performance and Energy-efficient Architecture for Floating-point based LU Decomposition on FPGAs Gokul Govindu, Seonil Choi, Viktor Prasanna Dept. of Electrical Engineering-Systems University of
More informationPyGen: A MATLAB/Simulink Based Tool for Synthesizing Parameterized and Energy Efficient Designs Using FPGAs
PyGen: A MATLAB/Simulink Based Tool for Synthesizing Parameterized and Energy Efficient Designs Using FPGAs Jingzhao Ou and Viktor K. Prasanna Department of Electrical Engineering, University of Southern
More informationA scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment
LETTER IEICE Electronics Express, Vol.11, No.2, 1 9 A scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment Ting Chen a), Hengzhu Liu, and Botao Zhang College of
More informationOutline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Outline What are FPGAs? Why use FPGAs (a short history lesson). FPGA variations Internal logic
More informationDUE to the high computational complexity and real-time
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen
More informationA Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO
2402 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016 A Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO Antony Xavier Glittas,
More informationInternational Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering
An Efficient Implementation of Double Precision Floating Point Multiplier Using Booth Algorithm Pallavi Ramteke 1, Dr. N. N. Mhala 2, Prof. P. R. Lakhe M.Tech [IV Sem], Dept. of Comm. Engg., S.D.C.E, [Selukate],
More informationHigh-Speed and Low-Power Split-Radix FFT
864 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 3, MARCH 2003 High-Speed and Low-Power Split-Radix FFT Wen-Chang Yeh and Chein-Wei Jen Abstract This paper presents a novel split-radix fast Fourier
More informationFAST FOURIER TRANSFORM (FFT) and inverse fast
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 11, NOVEMBER 2004 2005 A Dynamic Scaling FFT Processor for DVB-T Applications Yu-Wei Lin, Hsuan-Yu Liu, and Chen-Yi Lee Abstract This paper presents an
More informationHigh-throughput Online Hash Table on FPGA*
High-throughput Online Hash Table on FPGA* Da Tong, Shijie Zhou, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 989 Email: datong@usc.edu,
More informationHigh Performance Pipelined Design for FFT Processor based on FPGA
High Performance Pipelined Design for FFT Processor based on FPGA A.A. Raut 1, S. M. Kate 2 1 Sinhgad Institute of Technology, Lonavala, Pune University, India 2 Sinhgad Institute of Technology, Lonavala,
More informationA Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA
RESEARCH ARTICLE OPEN ACCESS A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Nishi Pandey, Virendra Singh Sagar Institute of Research & Technology Bhopal Abstract Due to
More informationAccelerating Equi-Join on a CPU-FPGA Heterogeneous Platform*
Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform* Ren Chen and Viktor K. Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California, Los Angeles, USA 90089 Email:
More informationDesign of a Floating-Point Fused Add-Subtract Unit Using Verilog
International Journal of Electronics and Computer Science Engineering 1007 Available Online at www.ijecse.org ISSN- 2277-1956 Design of a Floating-Point Fused Add-Subtract Unit Using Verilog Mayank Sharma,
More informationParallelized Radix-4 Scalable Montgomery Multipliers
Parallelized Radix-4 Scalable Montgomery Multipliers Nathaniel Pinckney and David Money Harris 1 1 Harvey Mudd College, 301 Platt. Blvd., Claremont, CA, USA e-mail: npinckney@hmc.edu ABSTRACT This paper
More informationA HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS
A HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS Saba Gouhar 1 G. Aruna 2 gouhar.saba@gmail.com 1 arunastefen@gmail.com 2 1 PG Scholar, Department of ECE, Shadan Women
More informationPERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS
American Journal of Applied Sciences 11 (4): 558-563, 2014 ISSN: 1546-9239 2014 Science Publication doi:10.3844/ajassp.2014.558.563 Published Online 11 (4) 2014 (http://www.thescipub.com/ajas.toc) PERFORMANCE
More informationFPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA
FPGA Implementation of 16-Point FFT Core Using NEDA Abhishek Mankar, Ansuman Diptisankar Das and N Prasad Abstract--NEDA is one of the techniques to implement many digital signal processing systems that
More informationHigh Speed Systolic Montgomery Modular Multipliers for RSA Cryptosystems
High Speed Systolic Montgomery Modular Multipliers for RSA Cryptosystems RAVI KUMAR SATZODA, CHIP-HONG CHANG and CHING-CHUEN JONG Centre for High Performance Embedded Systems Nanyang Technological University
More informationDESIGN AND IMPLEMENTATION OF DA- BASED RECONFIGURABLE FIR DIGITAL FILTER USING VERILOGHDL
DESIGN AND IMPLEMENTATION OF DA- BASED RECONFIGURABLE FIR DIGITAL FILTER USING VERILOGHDL [1] J.SOUJANYA,P.G.SCHOLAR, KSHATRIYA COLLEGE OF ENGINEERING,NIZAMABAD [2] MR. DEVENDHER KANOOR,M.TECH,ASSISTANT
More informationParallel FFT Program Optimizations on Heterogeneous Computers
Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid
More informationEnergy Efficient Adaptive Beamforming on Sensor Networks
Energy Efficient Adaptive Beamforming on Sensor Networks Viktor K. Prasanna Bhargava Gundala, Mitali Singh Dept. of EE-Systems University of Southern California email: prasanna@usc.edu http://ceng.usc.edu/~prasanna
More informationHardware Description of Multi-Directional Fast Sobel Edge Detection Processor by VHDL for Implementing on FPGA
Hardware Description of Multi-Directional Fast Sobel Edge Detection Processor by VHDL for Implementing on FPGA Arash Nosrat Faculty of Engineering Shahid Chamran University Ahvaz, Iran Yousef S. Kavian
More informationFixed Point Streaming Fft Processor For Ofdm
Fixed Point Streaming Fft Processor For Ofdm Sudhir Kumar Sa Rashmi Panda Aradhana Raju Abstract Fast Fourier Transform (FFT) processors are today one of the most important blocks in communication systems.
More informationFPGAs: THE HIGH-END ALTERNATIVE FOR DSP APPLICATIONS. By Dr. Chris Dick
THE HIGH-END ALTERNATIVE FOR D APPLICATIONS By Dr. Chris Dick Engineers have been using field programmable gate arrays (FPGAs) to build high performance D systems for several years. FPGAs are uniquely
More informationThe Efficient Implementation of Numerical Integration for FPGA Platforms
Website: www.ijeee.in (ISSN: 2348-4748, Volume 2, Issue 7, July 2015) The Efficient Implementation of Numerical Integration for FPGA Platforms Hemavathi H Department of Electronics and Communication Engineering
More informationDynamically Configurable Online Statistical Flow Feature Extractor on FPGA
Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA Da Tong, Viktor Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California Email: {datong, prasanna}@usc.edu
More informationA Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors
A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,
More information16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.
16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE. AditiPandey* Electronics & Communication,University Institute of Technology,
More informationImage Compression System on an FPGA
Image Compression System on an FPGA Group 1 Megan Fuller, Ezzeldin Hamed 6.375 Contents 1 Objective 2 2 Background 2 2.1 The DFT........................................ 3 2.2 The DCT........................................
More informationAccelerating Equi-Join on a CPU-FPGA Heterogeneous Platform
Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform Ren Chen, Viktor Prasanna Computer Engineering Technical Report Number CENG-05- Ming Hsieh Department of Electrical Engineering Systems University
More informationFPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression
FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression Divakara.S.S, Research Scholar, J.S.S. Research Foundation, Mysore Cyril Prasanna Raj P Dean(R&D), MSEC, Bangalore Thejas
More informationFAST Fourier transform (FFT) is an important signal processing
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 54, NO. 4, APRIL 2007 889 Balanced Binary-Tree Decomposition for Area-Efficient Pipelined FFT Processing Hyun-Yong Lee, Student Member,
More informationA Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter
A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A.S. Sneka Priyaa PG Scholar Government College of Technology Coimbatore ABSTRACT The Least Mean Square Adaptive Filter is frequently
More informationENERGY, AREA AND SPEED OPTIMIZED SIGNAL PROCESSING ON FPGA
ENERGY, AREA AND SPEED OPTIMIZED SIGNAL PROCESSING ON FPGA A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Technology In VLSI design and embedded system By DURGA
More informationPower Spectral Density Computation using Modified Welch Method
IJSTE - International Journal of Science Technology & Engineering Volume 2 Issue 4 October 2015 ISSN (online): 2349-784X Power Spectral Density Computation using Modified Welch Method Betsy Elina Thomas
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationVHDL for Synthesis. Course Description. Course Duration. Goals
VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes
More informationCreating Parameterized and Energy-Efficient System Generator Designs
Creating Parameterized and Energy-Efficient System Generator Designs Jingzhao Ou, Seonil Choi, Gokul Govindu, and Viktor K. Prasanna EE - Systems, University of Southern California {ouj,seonilch,govindu,prasanna}@usc.edu
More informationIEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers
International Journal of Research in Computer Science ISSN 2249-8257 Volume 1 Issue 1 (2011) pp. 1-7 White Globe Publications www.ijorcs.org IEEE-754 compliant Algorithms for Fast Multiplication of Double
More informationAREA-DELAY EFFICIENT FFT ARCHITECTURE USING PARALLEL PROCESSING AND NEW MEMORY SHARING TECHNIQUE
AREA-DELAY EFFICIENT FFT ARCHITECTURE USING PARALLEL PROCESSING AND NEW MEMORY SHARING TECHNIQUE Yousri Ouerhani, Maher Jridi, Ayman Alfalou To cite this version: Yousri Ouerhani, Maher Jridi, Ayman Alfalou.
More informationDESIGN OF AN FFT PROCESSOR
1 DESIGN OF AN FFT PROCESSOR Erik Nordhamn, Björn Sikström and Lars Wanhammar Department of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract In this paper we present a structured
More informationDesign & Analysis of 16 bit RISC Processor Using low Power Pipelining
International OPEN ACCESS Journal ISSN: 2249-6645 Of Modern Engineering Research (IJMER) Design & Analysis of 16 bit RISC Processor Using low Power Pipelining Yedla Venkanna 148R1D5710 Branch: VLSI ABSTRACT:-
More informationMultiplierless Unity-Gain SDF FFTs
Multiplierless Unity-Gain SDF FFTs Mario Garrido Gálvez, Rikard Andersson, Fahad Qureshi and Oscar Gustafsson Journal Article N.B.: When citing this work, cite the original article. 216 IEEE. Personal
More information