AN FFT PROCESSOR BASED ON 16-POINT MODULE


Weidong Li, Mark Vesterbacka and Lars Wanhammar
Electronics Systems, Dept. of EE., Linköping University, SE-581 83 Linköping, Sweden
E-mail: {weidongl, markv, larsw}@isy.liu.se, Tel.: +46 284059, Fax: +46 9282

ABSTRACT: The number of multiplications has been a key figure of merit for FFT algorithms, and it has an important impact on the total power consumption. In this paper, we present a 16-point FFT module that reduces the multiplicative complexity by using real constant multiplications. A pipeline FFT processor has been implemented with the 16-point module, and the simulation result shows that it is an attractive candidate for reducing the power consumption.

1. INTRODUCTION

FFT processors are widely used in digital signal processing. Recently, they have been applied to Orthogonal Frequency Division Multiplex (OFDM) based communication systems, like xDSL modems and wireless mobile terminals, because they implement the modulator and demodulator bank efficiently. Furthermore, low power has become a main constraint for battery-operated devices. Hence, the effective design of a low-power FFT processor is vital.

In this paper we present the design of an FFT processor which computes a 1024-point FFT including I/O within 40 µs and is part of a high bit rate mobile radio modem. Since our target application has high requirements on throughput, power, and area, an ASIC implementation is one of the most feasible implementations.

Complex multiplication is an expensive operation, both in the past [10] and now [] [2]. One method to reduce the complexity is to replace complex multiplications with less expensive real multiplications when possible. Several 16-point modules are summarized in the following section. Among them, we use the most efficient one as the basic building block for the whole FFT processor, which is described in section 3. The result is presented in section 4. Finally, the conclusions are given in section 5.

2. 16-POINT FFT MODULE

An N-point DFT can be expressed as

$$X(n) = \sum_{k=0}^{N-1} x(k)\, W_N^{nk}, \quad \text{where } W_N^{nk} = e^{-j 2\pi nk / N}.$$

In this section, we concentrate on the 16-point FFT module. There are mainly three different FFT algorithms that are suitable for VLSI implementation, i.e., radix-2, radix-4 and split-radix 2/4. For simplicity, all algorithms in this paper are based on decimation-in-frequency (DIF), which is equivalent in complexity to decimation-in-time (DIT) based algorithms. These algorithms are derived in the rest of this section.

2.1 RADIX-2

The radix-2 16-point FFT maps the indices with $k = \sum_{i=0}^{3} 2^i k_i$ and $n = \sum_{i=0}^{3} 2^i n_i$ ($k_i, n_i \in \{0, 1\}$). With the notation $(m_3, m_2, m_1, m_0) = \sum_{i=0}^{3} 2^i m_i$, the 16-point FFT can be expressed as

$$X(n_3, n_2, n_1, n_0) = \sum_{k_0=0}^{1} \sum_{k_1=0}^{1} \sum_{k_2=0}^{1} \sum_{k_3=0}^{1} x(k_3, k_2, k_1, k_0)\, W_{16}^{(8k_3 + 4k_2 + 2k_1 + k_0) n_0}\, W_{16}^{(4k_2 + 2k_1 + k_0) 2 n_1}\, W_{16}^{(2k_1 + k_0) 4 n_2}\, W_{16}^{8 k_0 n_3} \qquad (1)$$

which includes the simplification $W_N^{mN} = e^{-j 2\pi mN / N} = 1$. A 16-point FFT with the radix-2 algorithm is illustrated in Fig. 1.

Figure 1. 16-point FFT with radix-2 algorithm.

The multiplication with $W_{16}^{4} = -j$ can be done by swapping the real and imaginary parts and inverting a sign, and is therefore trivial. The number of complex multiplications is 10. The complex multiplications with $W_{16}^{2}$ can be implemented with two real multiplications, and the other non-trivial complex multiplications can be implemented with three real multiplications. The number of real multiplications is therefore 24.

2.2 RADIX-4

The 16-point FFT with the radix-4 algorithm can be derived in a similar manner. This is illustrated in Fig. 2. The number of complex multiplications is 8. With the simplification for multiplications with $W_{16}^{2}$, the total number of real multiplications is reduced to 20.

Figure 2. 16-point FFT with radix-4 algorithm.

2.3 SPLIT-RADIX FFT

The split-radix FFT algorithm combines the radix-2 and radix-4 algorithms to reduce the number of operations [11]. The odd terms of the DFT are computed with the radix-4 algorithm, while the even terms are computed with the radix-2 algorithm. A 16-point FFT with split-radix 2/4 is shown in Fig. 3.

Figure 3. Split-radix 16-point FFT.

The number of complex multiplications is 8. With simplification, the number of real multiplications is 20, which is the same as for the radix-4 algorithm.

To reduce the power consumption, the number of multiplications must be reduced. The radix-2 algorithm is less attractive because it requires more multiplications. Since the FFT processor is implemented with fixed-point arithmetic, the multiplication with $W_{16}^{2}$ can be implemented as a constant multiplication with sufficient accuracy. In our target application, this constant multiplication is realized with five shifted additions.
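The multiplication counts quoted in this section can be checked by enumerating the twiddle exponents of each flow graph. The sketch below is only an illustration: the function names are hypothetical, and it assumes the cost convention stated above (multiples of $W^{N/4}$ are free, odd multiples of $W^{N/8}$ cost two real multiplications, every other twiddle costs three).

```python
def real_mults(exp, n):
    """Assumed real-multiplication cost of one multiplication by W_n**exp (n >= 8)."""
    e = exp % n
    if e % (n // 4) == 0:   # +1, -j, -1, +j: sign change and/or swap only
        return 0
    if e % (n // 8) == 0:   # (+-1 +- j)/sqrt(2): two real multiplications
        return 2
    return 3                # general twiddle: three real multiplications


def radix2_dif_exponents(n):
    """Twiddle exponents (powers of W_n) in a radix-2 DIF flow graph."""
    exps = []
    for s in range(n.bit_length() - 1):
        blocks = 1 << s          # sub-transforms at stage s
        half = n >> (s + 1)      # butterflies per sub-transform
        exps += [j * blocks for _ in range(blocks) for j in range(half)]
    return exps


def radix4_dif_exponents(n):
    """Twiddle exponents in a radix-4 DIF flow graph (n must be a power of 4)."""
    exps = []
    size, blocks = n, 1
    while size >= 4:
        scale = n // size        # converts powers of W_size to powers of W_n
        exps += [a * b * scale
                 for _ in range(blocks)
                 for a in range(4)
                 for b in range(size // 4)]
        size //= 4
        blocks *= 4
    return exps


def count(exps, n):
    """Return (non-trivial complex multiplications, real multiplications)."""
    costs = [real_mults(e, n) for e in exps]
    return sum(c > 0 for c in costs), sum(costs)


print(count(radix2_dif_exponents(16), 16))   # (10, 24)
print(count(radix4_dif_exponents(16), 16))   # (8, 20)
```

Under the same convention, the complex-multiplication count for a 1024-point transform comes out as 3586 for radix-2 and 2732 for radix-4.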

3. PIPELINE FFT PROCESSOR

High performance, such as high throughput and continuous input/output, is required for communication systems, and the pipeline architecture is suitable for these ends. In our target application, the input data arrive in natural order. Data memory is required in a pipeline processor since the incoming data have to be rearranged according to the FFT algorithm. The data memory consumes a large portion of the power in an FFT processor with a large transform length, and is therefore also a critical factor for power consumption. In this section, we discuss the mapping of the 16-point module onto the pipeline processor.

3.1 MULTIPLICATIONS

A large portion of the total power is consumed by the computation of complex multiplications in the FFT processor. A complex multiplier consumes 72.6 mW with a supply voltage of 3.3 V at 25 MHz. A 1024-point FFT processor requires four complex multipliers, which hence consume 290 mW at 3.3 V, 25 MHz. Even with the bypass technique for trivial complex multiplications, the power consumption for the computation of complex multiplications is still larger than 20 mW. Hence the reduction of the number of complex multiplications is vital.

Using a high-radix module can reduce the number of complex multiplications outside the module. However, high-radix modules are not commonly used in implementations because of two main drawbacks: the number of complex multiplications within the module increases if the radix is larger than 4, and the routing complexity increases as well. Overcoming those drawbacks is the key to using a high-radix module, and is also the key issue in our discussion.

As is well known, adders consume much less power than multipliers of the same wordlength, because an adder has less hardware and far fewer glitches. A 2-bit Brent-Kung adder (real) consumes .5 mW at 3.3 V, 25 MHz, which is much less than a 7-bit complex multiplier (72.6 mW at 3.3 V, 25 MHz).
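Because adders are so much cheaper than multipliers, a fixed coefficient can be applied with shifts and additions alone. The following is a minimal sketch of the constant multiplication by $1/\sqrt{2}$ that section 2 says is realized with five shifted additions. The paper does not give its decomposition, so the six-term signed-digit expansion used here (and the name `mul_inv_sqrt2`) is an assumption chosen only because it needs exactly five additions or subtractions.

```python
import math

# Assumed signed-digit approximation of 1/sqrt(2):
#   1 - 2^-2 - 2^-4 + 2^-6 + 2^-8 + 2^-14
# Six non-zero terms, hence five additions/subtractions.
COEFF = 1 - 2**-2 - 2**-4 + 2**-6 + 2**-8 + 2**-14


def mul_inv_sqrt2(x: int) -> int:
    """Multiply a fixed-point integer by about 1/sqrt(2) using shifts and adds only."""
    return x - (x >> 2) - (x >> 4) + (x >> 6) + (x >> 8) + (x >> 14)


if __name__ == "__main__":
    x = 1 << 16
    print(mul_inv_sqrt2(x), x / math.sqrt(2))    # shift-add result vs. exact value
    print(abs(COEFF - 1 / math.sqrt(2)))         # coefficient error, about 1.4e-5
```

Each right shift truncates, so for inputs that are not multiples of $2^{14}$ the result can differ from the ideal product by a few least significant bits; a hardware version would typically keep guard bits or use carry-save adders.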
Therefore it is efficient to replace the complex multiplier with a constant multiplier (carry-save adders). We apply this idea to the design of the 16-point module in order to reduce the number of complex multiplications. For a 16-point FFT module, there are three types of non-trivial complex multiplications, i.e., multiplications with $W_{16}^{1}$, $W_{16}^{2}$, and $W_{16}^{3}$. The multiplications with $W_{16}^{1}$ and $W_{16}^{3}$ can share coefficients since $\cos(\pi/8) = \sin(\pi/2 - \pi/8) = \sin(3\pi/8)$ and $\sin(\pi/8) = \cos(\pi/2 - \pi/8) = \cos(3\pi/8)$. We can therefore use constant multiplication, which reduces the multiplication complexity. The implementation of the multiplication with $W_{16}^{1}$ is illustrated in Fig. 4.

Figure 4. Complex multiplication with $W_{16}^{1}$ (constant multipliers with coefficients $\cos(\pi/8) + \sin(\pi/8)$, $\cos(\pi/8) - \sin(\pi/8)$ and $\cos(\pi/8)$; inputs Re{input} and Im{input}, outputs Re{output} and Im{output}).

For the three different algorithms, the different positions of the multiplications lead to different hardware implementations. Both the radix-2 and the split-radix algorithm require three multipliers (two multipliers with $W_{16}^{1}$ and one multiplier with $W_{16}^{2}$), while the radix-4 algorithm requires only two multipliers (one multiplier with $W_{16}^{1}$ and one multiplier with $W_{16}^{2}$). Hence the 16-point FFT module with radix-4 is more efficient and is selected for our implementation. The power consumption for complex multiplication within the 16-point module is about 0 mW at 3.3 V, 25 MHz.
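The three coefficients listed for Fig. 4 are those of the usual three-multiplication form of a complex product. The sketch below shows one arrangement that is consistent with them. It assumes the convention $W_{16}^{1} = \cos(\pi/8) - j\sin(\pi/8)$, and the function names are hypothetical; the exact wiring of the original figure cannot be recovered from the text. The second function illustrates how $W_{16}^{3}$ can reuse the same three constants.

```python
import cmath
import math

C = math.cos(math.pi / 8)
S = math.sin(math.pi / 8)
C_PLUS_S = C + S     # constant applied to Re{input} in rotate_w16_1
C_MINUS_S = C - S    # constant applied to Im{input} in rotate_w16_1


def rotate_w16_1(re_in, im_in):
    """(re_in + j*im_in) * W_16^1 using three constant multiplications."""
    shared = C * (re_in + im_in)
    re_out = shared - C_MINUS_S * im_in   # = C*re_in + S*im_in
    im_out = shared - C_PLUS_S * re_in    # = C*im_in - S*re_in
    return re_out, im_out


def rotate_w16_3(re_in, im_in):
    """(re_in + j*im_in) * W_16^3, reusing the same three constants,
    since W_16^3 = sin(pi/8) - j*cos(pi/8)."""
    shared = C * (re_in + im_in)
    re_out = shared - C_MINUS_S * re_in   # = S*re_in + C*im_in
    im_out = C_PLUS_S * im_in - shared    # = S*im_in - C*re_in
    return re_out, im_out


if __name__ == "__main__":
    z = complex(0.3, -1.7)
    print(rotate_w16_1(z.real, z.imag), z * cmath.exp(-2j * math.pi * 1 / 16))
    print(rotate_w16_3(z.real, z.imag), z * cmath.exp(-2j * math.pi * 3 / 16))
```

In hardware, each of the three products would itself be a constant multiplier built from shifted additions, so no general-purpose multiplier is needed inside the module.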

By replacing the complex multiplications with constant multiplications within the 16-point module, the number of non-trivial complex multiplications can be reduced to 1776 with a 16 × 16 × 4 configuration. The total number of complex multipliers is reduced to two for the 1024-point FFT due to the use of the 16-point module. Table 1 shows the number of non-trivial complex multiplications required for a 1024-point FFT with different algorithms.

Algorithm             Radix-2   Radix-4   Split-radix   Our approach
No. of comp. mult.    3586      2732      2390          1776

Table 1: Number of non-trivial complex multiplications for 1024-point FFT.

3.2 DATA MEMORY

The data memory consumes a significant portion of the total power. It is therefore desirable to reduce the size of the data memory. In a pipeline FFT processor, the data memory of the first few stages dominates both the size and the power consumption of the total memory, so the architecture selected for those stages is important.

There are two main methods for reordering data in a pipeline FFT processor: delay-forward and feedback. The key difference is that the delay-forward method stores only the incoming data in the data memory, while the feedback method stores both the incoming data and partial results in the data memory at each stage. This is shown in Fig. 5 for a 4-point FFT.

Figure 5. Delay-forward (a) and feedback (b).

As described in [2], the efficient way to reduce the data memory size is to use the feedback method. We select single-path feedback for the data memory since it gives the minimum data memory, with N − 1 words for an N-point FFT [2].

3.3 REALIZATION OF 16-POINT MODULE

Direct use of the feedback method for the three algorithms listed in section 2 faces two main problems: large memory bandwidth and a complex interconnection scheme. Direct implementation of the 16-point module is also complicated.

Figure 6. 16-point FFT module (with constant multipliers).

The radix-4 algorithm can be decomposed into the radix-2 algorithm, as is done in [7].
Hence the mapping of the 16-point module can be done with four pipelined radix-2 butterflies. Each butterfly has its own feedback memory. The 16-point module is illustrated in Fig. 6. With this mapping, the two main drawbacks of a high-radix module are removed.
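The decomposition of a radix-4 butterfly into radix-2 butterflies can be illustrated on a single 4-point DFT. The sketch below covers only the arithmetic, not the feedback memories or the pipelining. The function names are hypothetical, and it assumes the $e^{-j 2\pi nk/N}$ convention. The only twiddle factor between the two radix-2 stages is $-j$, which is applied as a swap and a sign change.

```python
import cmath


def radix2_butterfly(a, b):
    """Radix-2 butterfly: sum and difference."""
    return a + b, a - b


def times_minus_j(z):
    """Multiply by -j by swapping real and imaginary parts and negating one."""
    return complex(z.imag, -z.real)


def dft4_via_radix2(x):
    """4-point DFT (natural-order output) built from two radix-2 stages."""
    s0, d0 = radix2_butterfly(x[0], x[2])
    s1, d1 = radix2_butterfly(x[1], x[3])
    d1 = times_minus_j(d1)               # trivial twiddle W_4^1 = -j
    X0, X2 = radix2_butterfly(s0, s1)
    X1, X3 = radix2_butterfly(d0, d1)
    return [X0, X1, X2, X3]


def dft4_direct(x):
    """Reference 4-point DFT from the definition."""
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / 4) for k in range(4))
            for n in range(4)]


if __name__ == "__main__":
    x = [1 + 2j, -3 + 0.5j, 0.25 - 1j, 2 + 2j]
    print(dft4_via_radix2(x))
    print(dft4_direct(x))
```

Applying the same idea to both radix-4 stages of the 16-point transform gives four radix-2 butterfly stages, which is consistent with the four pipelined butterflies described above.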

4. RESULTS

For the complex multipliers, the conventional radix-4 algorithm requires four complex multipliers. Each complex multiplier consumes 72.6 mW at 3.3 V, 25 MHz at full rate (simulation result). With the bypass technique, the total power consumption for the complex multipliers is about 20 mW. In our approach, there are only two complex multipliers and two constant multipliers (one consumes 0 mW at 3.3 V, 25 MHz), which consume a total power of less than 60 mW. This is a power saving of more than 20% for the computation of complex multiplications. It is less than the theoretical saving of 35% (the ratio of the numbers of complex multiplications) because of the computation of complex multiplications within the 16-point module.

The power consumption for the data memory and the butterflies is the same for both approaches. The power consumption for the data memory is estimated at 00 mW (the power consumption for memories of 28 words or more is given by the vendor, and that of smaller memories is estimated through linear approximation down to 2 words). The butterflies consume about 0 mW.

5. CONCLUSIONS

In this paper, we introduce an FFT processor based on a 16-point module. The new approach reduces the number of complex multiplications and retains the minimum size of the data memory. The simulation result shows that it can reduce the power consumption.

REFERENCES

[1] W. Li and L. Wanhammar, "A Pipeline FFT Processor," IEEE Workshop on Signal Processing Systems (SiPS), Taipei, China, Oct. 1999.
[2] J. Melander, "Design of SIC FFT Architectures," Linköping Studies in Science and Technology, Thesis No. 68, Linköping University, Sweden, 1997.
[3] L. Wanhammar, DSP Integrated Circuits, Academic Press, 1999.
[4] Z. Mou and F. Jutand, "Overturned-Stairs Adder Trees and Multiplier Design," IEEE Trans. on Computers, vol. C-41, No. 8, pp. 940-948, Aug. 1992.
[5] W. Li and L. Wanhammar, "A Complex Multiplier Using Overturned-Stairs Adder Tree," Int. Conf. on Electronic Circuits and Systems (ICECS), Sept. 1999.
[6] G. Bi and E. V. Jones, "A Pipelined FFT Processor for Word-Sequential Data," IEEE Trans. on Acoustics, Speech, and Signal Process., vol. ASSP-37, No. 12, pp. 1982-1985, Dec. 1989.
[7] S. He and M. Torkelson, "A New Approach to Pipeline FFT Processor," The 10th International Parallel Processing Symposium (IPPS), pp. 766-770, 1996.
[8] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice-Hall, 1975.
[9] A. M. Despain, "Fourier Transform Computer Using CORDIC Iterations," IEEE Trans. on Computers, vol. C-23, No. 10, pp. 993-1001, 1974.
[10] M. T. Heideman and S. Burrus, "On the Number of Multiplications Necessary to Compute a Length-2^n DFT," IEEE Trans. on Acoustics, Speech, and Signal Process., vol. ASSP-34, No. 1, pp. 91-95, 1986.
[11] P. Duhamel and H. Hollmann, "Split Radix FFT Algorithm," Electronics Letters, Vol. 20, No. 1, pp. 14-16, Jan. 1984.
[12] P. Duhamel and H. Hollmann, "Existence of a 2^n FFT algorithm with a number of multiplications lower than 2^(n+1)," Electronics Letters, Vol. 20, No. 17, pp. 690-692, Aug. 1984.