AN FFT PROCESSOR BASED ON 16-POINT MODULE

Weidong Li, Mark Vesterbacka and Lars Wanhammar
Electronics Systems, Dept. of EE., Linköping University, SE-581 83 LINKÖPING, SWEDEN
E-mail: {weidongl, markv, larsw}@isy.liu.se, Tel.: +46 284059, Fax: +46 9282

ABSTRACT: The number of multiplications has long been a key figure of merit for FFT algorithms, and it has a significant impact on the total power consumption. In this paper, we present a 16-point FFT module that reduces the multiplicative complexity by using real constant multiplications. A pipeline FFT processor has been implemented with the 16-point module, and simulation results show that it is an attractive candidate for reducing the power consumption.

1. INTRODUCTION

FFT processors are widely used in digital signal processing. Recently, they have been applied in Orthogonal Frequency Division Multiplex (OFDM) based communication systems such as xDSL modems and wireless mobile terminals, because the FFT gives an efficient implementation of the modulator and demodulator banks. At the same time, low power has become a main constraint for battery-operated devices. Hence, an effective low-power design of the FFT processor is vital.

In this paper we present the design of an FFT processor that computes a 1024-point FFT, including I/O, within 40 µs and is part of a high bit rate mobile radio modem. Since our target application has high requirements on throughput, power, and area, an ASIC implementation is one of the most feasible options.

Complex multiplication is an expensive operation, both in the past [10] and now [] [2]. One method to reduce the complexity is to replace complex multiplications with less expensive real multiplications where possible. Several 16-point modules are summarized in the following section. Among them, we use the most efficient one as the basic building block for the whole FFT processor, which is described in Section 3. Results are presented in Section 4, and conclusions are given in Section 5.

2. 16-POINT FFT MODULE

An N-point DFT can be expressed as

$$X(n) = \sum_{k=0}^{N-1} x(k)\, W_N^{nk}, \qquad \text{where } W_N^{nk} = e^{-j 2\pi nk / N}.$$

In this section, we concentrate on the 16-point FFT module. There are mainly three FFT algorithms that are suitable for VLSI implementation: radix-2, radix-4 and split-radix 2/4. For simplicity, all algorithms in this paper are based on decimation-in-frequency (DIF), which is equivalent in complexity to decimation-in-time (DIT). These algorithms are derived in the rest of this section.

2.1 RADIX-2

The radix-2 16-point FFT maps the indices with

$$k = \sum_{i=0}^{3} 2^{i} k_i \quad \text{and} \quad n = \sum_{i=0}^{3} 2^{i} n_i, \qquad k_i, n_i \in \{0, 1\}.$$

With the notation $(m_3, m_2, m_1, m_0) = \sum_{i=0}^{3} 2^{i} m_i$, the 16-point FFT can be expressed as

$$X(n_3, n_2, n_1, n_0) = \sum_{k_0=0}^{1} \sum_{k_1=0}^{1} \sum_{k_2=0}^{1} \sum_{k_3=0}^{1} x(k_3, k_2, k_1, k_0)\, W_{16}^{(8k_3 + 4k_2 + 2k_1 + k_0)\, n_0}\, W_{16}^{(4k_2 + 2k_1 + k_0)\, 2n_1}\, W_{16}^{(2k_1 + k_0)\, 4n_2}\, W_{16}^{8 k_0 n_3} \tag{1}$$

which includes the simplification $W_N^{mN} = e^{-2\pi j mN / N} = 1$. A 16-point FFT with the radix-2 algorithm is illustrated in Fig. 1.

Figure 1. 16-point FFT with radix-2 algorithm. (Signal-flow graph: inputs x(0) to x(15) in natural order, outputs X(0), X(8), X(4), X(12), ... in bit-reversed order.)

The multiplication with $W_{16}^{4} = -j$ can be done by swapping the real and imaginary parts and inverting one sign, and is therefore trivial. The number of non-trivial complex multiplications is 10. The complex multiplications with $W_{16}^{2}$ (and likewise $W_{16}^{6}$, whose real and imaginary parts also have equal magnitude) can be implemented with two real multiplications, and the other non-trivial complex multiplications can be implemented with three real multiplications. The number of real multiplications is therefore 24.

2.2 RADIX-4

The 16-point FFT with the radix-4 algorithm can be derived in a similar manner; it is illustrated in Fig. 2. The number of complex multiplications is 8. With the simplification for multiplications with $W_{16}^{2}$, the total number of real multiplications is reduced to 20.

Figure 2. 16-point FFT with radix-4 algorithm. (Signal-flow graph: inputs x(0) to x(15) in natural order, outputs X(0), X(4), X(8), X(12), X(1), ... in base-4 digit-reversed order.)

2.3 SPLIT-RADIX FFT

The split-radix FFT algorithm combines the radix-2 and radix-4 algorithms to reduce the number of operations [11]. The odd terms of the DFT are computed with the radix-4 algorithm, while the even terms are computed with the radix-2 algorithm. A 16-point FFT with split-radix 2/4 is shown in Fig. 3.

Figure 3. Split-radix 16-point FFT. (Signal-flow graph: inputs x(0) to x(15) in natural order, outputs in bit-reversed order.)

The number of complex multiplications is 8. With simplification, the number of real multiplications is 20, which is the same as for the radix-4 algorithm.

To reduce the power consumption, the number of multiplications must be reduced. The radix-2 algorithm is less attractive because it requires more multiplications. Since the FFT processor is implemented with fixed-point arithmetic, the multiplication with $W_{16}^{2}$ can be implemented as a constant multiplication with sufficient accuracy. In our target application, this constant multiplication is realized with five shifted additions.
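The multiplication counts quoted in this section can be checked with a short script. The sketch below is not taken from the paper: it enumerates the twiddle exponents of a textbook radix-2 and radix-4 DIF flow graph for N = 16, treats the factors ±1 and ±j as trivial, and charges two real multiplications for factors whose real and imaginary parts have equal magnitude and three for all others. All function names are illustrative assumptions.

```python
N = 16  # transform length of the module


def classify(exponent):
    """Classify the twiddle factor W_16^exponent by its multiplication cost."""
    e = exponent % N
    if e % (N // 4) == 0:
        return "trivial"      # +1, -j, -1, +j: no multiplier needed
    if e % (N // 8) == 0:
        return "equal_parts"  # |Re| == |Im|: two real multiplications
    return "general"          # three real multiplications


def radix2_exponents():
    """Twiddle exponents (as powers of W_16) in a radix-2 DIF flow graph."""
    exps = []
    size = N
    while size >= 2:
        stride = N // size          # W_size^q equals W_16^(q * stride)
        for _ in range(N // size):  # one pass per sub-transform
            exps += [q * stride for q in range(size // 2)]
        size //= 2
    return exps


def radix4_exponents():
    """Twiddle exponents (as powers of W_16) in a radix-4 DIF flow graph."""
    exps = []
    size = N
    while size >= 4:
        stride = N // size
        for _ in range(N // size):
            exps += [q * r * stride
                     for q in range(size // 4) for r in range(4)]
        size //= 4
    return exps


def count(exps):
    """Return (non-trivial complex multiplications, real multiplications)."""
    kinds = [classify(e) for e in exps]
    two = kinds.count("equal_parts")
    three = kinds.count("general")
    return two + three, 2 * two + 3 * three


print("radix-2:", count(radix2_exponents()))
print("radix-4:", count(radix4_exponents()))
```

Under these counting assumptions the script reports 10 complex and 24 real multiplications for radix-2, and 8 complex and 20 real multiplications for radix-4, in agreement with the figures stated above.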

3. PIPELINE FFT PROCESSOR

High performance, such as high throughput and continuous input/output, is required for communication systems, and the pipeline architecture is well suited to these requirements. In our target application, the input data arrive in natural order. A pipeline processor requires data memory, since the incoming data have to be rearranged according to the FFT algorithm. The data memory consumes a large portion of the power in an FFT processor with a large transform length, so it is also a critical factor for the power consumption. In this section, we discuss the mapping of the 16-point module onto the pipeline processor.

3.1 MULTIPLICATIONS

A large portion of the total power is consumed by the computation of complex multiplications in the FFT processor. A complex multiplier consumes 72.6 mW at a supply voltage of .V and 25 MHz. A 1024-point FFT processor requires four complex multipliers and hence consumes 290 mW@.V, 25 MHz. Even with a bypass technique for trivial complex multiplications, the power consumption for the computation of complex multiplications is still larger than 20 mW. Hence reducing the number of complex multiplications is vital.

Using a high-radix module can reduce the number of complex multiplications outside the module. However, high-radix modules are not commonly used in implementations because of two main drawbacks: the number of complex multiplications within the module increases if the radix is larger than 4, and the routing complexity increases as well. Overcoming these drawbacks is the key to using a high-radix module, and it is the main issue in the following discussion.

As is well known, adders consume much less power than multipliers of the same wordlength, because an adder has less hardware and far fewer glitches. A 2-bit Brent-Kung adder (real) consumes .5 mW@.V, 25 MHz, which is much less than a 7-bit complex multiplier (72.6 mW@.V, 25 MHz).
Therefore it is efficient to replace the complex multiplier with a constant multiplier (carry-save adders). We apply this idea to the design of the 16-point module in order to reduce the number of complex multiplications.

For a 16-point FFT module, there are three types of non-trivial complex multiplications, i.e., multiplications with $W_{16}^{1}$, $W_{16}^{2}$, and $W_{16}^{3}$. The multiplications with $W_{16}^{1}$ and $W_{16}^{3}$ can share coefficients, since

$$\cos\frac{\pi}{8} = \sin\left(\frac{\pi}{2} - \frac{\pi}{8}\right) = \sin\frac{3\pi}{8} \qquad \text{and} \qquad \sin\frac{\pi}{8} = \cos\left(\frac{\pi}{2} - \frac{\pi}{8}\right) = \cos\frac{3\pi}{8}.$$

We can therefore use constant multiplications, which reduces the multiplication complexity. The implementation of the multiplication with $W_{16}^{1}$ is illustrated in Fig. 4.

Figure 4. Complex multiplication with $W_{16}^{1}$. (Block diagram: Re{input} and Im{input} feed three constant multiplications with the coefficients $\cos\frac{\pi}{8} + \sin\frac{\pi}{8}$, $\cos\frac{\pi}{8} - \sin\frac{\pi}{8}$ and $\cos\frac{\pi}{8}$, which produce Re{output} and Im{output}.)

For the three algorithms, the different positions of the multiplications lead to different hardware implementations. Both the radix-2 and split-radix algorithms require three multipliers (two multipliers with $W_{16}^{2}$ and one multiplier with $W_{16}^{1}$), while the radix-4 algorithm requires only two multipliers (one multiplier with $W_{16}^{2}$ and one multiplier with $W_{16}^{1}$). Hence the 16-point FFT module with radix-4 is more efficient and is selected for our implementation. The power consumption for complex multiplication within the 16-point module is about 0 mW@.V, 25 MHz.
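The coefficient sharing described above can be illustrated numerically. The following sketch is an assumption-laden floating-point model, not the fixed-point carry-save hardware of the paper: it multiplies a complex sample by $W_{16}^{1}$ and by $W_{16}^{3}$ with three real multiplications each, using the same three constants in both cases, and by $W_{16}^{2}$ with two real multiplications. The function and constant names are hypothetical.

```python
import cmath
import math

# Shared real constants (the three coefficients named in Fig. 4).
C = math.cos(math.pi / 8)
S = math.sin(math.pi / 8)
C_PLUS_S = C + S
C_MINUS_S = C - S
INV_SQRT2 = 1 / math.sqrt(2)


def mul_w16_1(x):
    """x * W_16^1 = x * (C - jS), using three real multiplications."""
    a, b = x.real, x.imag
    common = C * (a + b)
    return complex(common - C_MINUS_S * b, common - C_PLUS_S * a)


def mul_w16_3(x):
    """x * W_16^3 = x * (S - jC), reusing the same three constants."""
    a, b = x.real, x.imag
    common = C * (a + b)
    return complex(common - C_MINUS_S * a, C_PLUS_S * b - common)


def mul_w16_2(x):
    """x * W_16^2 = x * (1 - j)/sqrt(2), using two real multiplications."""
    a, b = x.real, x.imag
    return complex(INV_SQRT2 * (a + b), INV_SQRT2 * (b - a))


# Compare against a direct complex multiplication for one sample value.
sample = complex(0.75, -0.25)
for power, fn in ((1, mul_w16_1), (2, mul_w16_2), (3, mul_w16_3)):
    exact = sample * cmath.exp(-2j * cmath.pi * power / 16)
    print(power, abs(fn(sample) - exact))
```

In hardware each product with a fixed constant would become a network of shifts and additions; the model only shows that one set of constants serves both $W_{16}^{1}$ and $W_{16}^{3}$.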

By replacing the complex multiplications with constant multiplications within the 16-point module, the number of non-trivial complex multiplications can be reduced to 776 with a 16 × 16 × 4 configuration. The total number of complex multipliers is reduced to two for the 1024-point FFT due to the use of the 16-point module. Table 1 shows the number of non-trivial complex multiplications required for a 1024-point FFT with different algorithms.

Algorithm             Radix-2   Radix-4   Split-radix   Our approach
No. of comp. mult.    586       272       290           776

Table 1: Number of non-trivial complex multiplications for 1024-point FFT.

3.2 DATA MEMORY

The data memory consumes a significant portion of the total power, so it is desirable to reduce its size. For the pipeline FFT processor, the data memory of the first few stages dominates both the size and the power consumption of the total memory, so the architecture selection for those stages is important.

There are two main methods to reorder data in a pipeline FFT processor: delay-forward and feedback. The key difference is that the delay-forward method stores only the incoming data in the data memory, while the feedback method stores both the incoming data and partial results in the data memory at each stage. This is shown in Fig. 5 for a 4-point FFT.

Figure 5. Delay-forward (a) and feedback (b).

As described in [2], the efficient way to reduce the data memory size is to use the feedback method. We select single-path feedback for the data memory, since it gives the minimum data memory with N words for an N-point FFT [2].

3.3 REALIZATION OF 16-POINT MODULE

Direct use of the feedback method for the three algorithms listed in Section 2 faces two main problems: large memory bandwidth and a complex interconnection scheme. Direct implementation of the 16-point module is also complicated.

Figure 6. 16-point FFT module. (Block diagram with constant multipliers.)

The radix-4 algorithm can be decomposed into the radix-2 algorithm, as is done in [7].
Hence the mapping of the 16-point module can be done with four pipelined radix-2 butterflies, each with its own feedback memory. The 16-point module is illustrated in Fig. 6. With this mapping, the two main drawbacks of the high-radix module have been removed.
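The idea of four pipelined radix-2 butterflies, each with its own feedback memory, can be modelled behaviourally. The sketch below is a simplified illustration and not the module of Fig. 6: it uses the plain radix-2 DIF twiddle factors after every stage rather than the radix-4 arrangement with constant multipliers, works in floating point, and all names are assumptions. Each stage holds a FIFO of 8, 4, 2 or 1 words; during the first half of a block it stores the incoming samples while emitting the stored differences (multiplied by the twiddle factor), and during the second half it emits sums and writes the differences back into the FIFO.

```python
import cmath
from collections import deque

N = 16


def sdf_stage(stream, delay):
    """One radix-2 butterfly with a feedback FIFO holding `delay` words."""
    size = 2 * delay
    twiddle = [cmath.exp(-2j * cmath.pi * q / size) for q in range(delay)]
    fifo = deque([0j] * delay)
    out = []
    for t, x in enumerate(stream):
        head = fifo.popleft()
        phase = t % size
        if phase < delay:
            # First half of a block: store the input, emit a stored difference.
            fifo.append(x)
            out.append(head * twiddle[phase])
        else:
            # Second half: butterfly between the stored and the incoming sample.
            fifo.append(head - x)
            out.append(head + x)
    return out


def sdf_fft16(x):
    """Feed 16 samples through four stages; the result is bit-reversed."""
    stream = list(x) + [0j] * (N - 1)  # extra zeros flush the pipeline
    for delay in (8, 4, 2, 1):
        stream = sdf_stage(stream, delay)
    return stream[N - 1:]              # total latency is 8 + 4 + 2 + 1 cycles


def bit_reverse(i, bits=4):
    return int(format(i, f"0{bits}b")[::-1], 2)


def dft(x):
    """Direct DFT used only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]


data = [complex(i, 0.5 * (i % 3)) for i in range(N)]
reference = dft(data)
result = sdf_fft16(data)
print(max(abs(result[p] - reference[bit_reverse(p)]) for p in range(N)))
```

In this model every butterfly reads and writes only its own FIFO, one word per cycle, which is the property that keeps the memory bandwidth and the interconnection simple.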

4. RESULTS

For the complex multipliers, the conventional radix-4 algorithm requires 4 complex multipliers. Each complex multiplier consumes 72.6 mW@.V, 25 MHz at full rate (simulation result). With the bypass technique, the total power consumption of the complex multipliers is about 20 mW. In our approach, there are only two complex multipliers and two constant multipliers (one consumes 0 mW@.V, 25 MHz), which together consume less than 60 mW. This is a power saving of more than 20% for the computation of complex multiplications. It is less than the theoretical saving of 5% (the ratio of the numbers of complex multiplications) because of the computation of complex multiplications within the 16-point module.

The power consumption of the data memory and of the butterflies is the same in both approaches. The power consumption of the data memory is estimated at 00 mW (the power consumption for memories of 28 words or more is given by the vendor, and that of smaller memories is estimated through linear approximation down to 2 words). The butterflies consume about 0 mW.

5. CONCLUSIONS

In this paper, we introduce an FFT processor based on a 16-point module. The new approach reduces the number of complex multiplications and retains the minimum size of the data memory. The simulation results show that it can reduce the power consumption.

REFERENCES

[1] W. Li and L. Wanhammar, A Pipeline FFT Processor, IEEE Workshop on Signal Processing Systems (SiPS), Taipei, China, Oct. 1999.
[2] J. Melander, Design of SIC FFT Architectures, Linköping Studies in Science and Technology, Thesis No. 618, Linköping University, Sweden, 1997.
[3] L. Wanhammar, DSP Integrated Circuits, Academic Press, 1999.
[4] Z. Mou and F. Jutand, Overturned-Stairs Adder Trees and Multiplier Design, IEEE Trans. on Computers, vol. C-41, No. 8, pp. 940-948, Aug. 1992.
[5] W. Li and L. Wanhammar, A Complex Multiplier Using Overturned-Stairs Adder Tree, Int. Conf. on Electronic Circuits and Systems (ICECS), Sept. 1999.
[6] G. Bi and E. V. Jones, A Pipelined FFT Processor for Word-Sequential Data, IEEE Trans. on Acoustics, Speech, and Signal Process., vol. ASSP-37, No. 12, pp. 1982-1985, Dec. 1989.
[7] S. He and M. Torkelson, A New Approach to Pipeline FFT Processor, The 10th International Parallel Processing Symposium (IPPS), pp. 766-770, 1996.
[8] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice-Hall, 1975.
[9] A. M. Despain, Fourier Transform Computer Using CORDIC Iterations, IEEE Trans. on Computers, vol. C-23, No. 10, pp. 993-1001, 1974.
[10] M. T. Heideman and S. Burrus, On the Number of Multiplications Necessary to Compute a Length-2^n DFT, IEEE Trans. on Acoustics, Speech, and Signal Process., vol. ASSP-34, No. 1, pp. 91-95, 1986.
[11] P. Duhamel and H. Hollmann, Split Radix FFT Algorithm, Electronics Letters, Vol. 20, No. 1, pp. 14-16, Jan. 1984.
[12] P. Duhamel and H. Hollmann, Existence of a 2^n FFT algorithm with a number of multiplications lower than 2^(n+1), Electronics Letters, Vol. 20, No. 17, pp. 690-692, Aug. 1984.