System-on-Chip engineering FFT/IFFTProcessor IP Core Datasheet - Released - Core:120801 Doc: 130107
Copyright reminder

Copyright © 2012 by System-on-Chip engineering S.L. All rights are reserved. Unauthorized duplication of this document, in whole or in part, by any means is prohibited without the prior written permission of SoCe S.L. Although SoCe S.L. believes that the information included in this publication is correct as of the date of publication, SoCe S.L. reserves the right to make changes at any time without notice. All information in this document is strictly confidential and may only be published by SoCe S.L. All referenced trademarks are the property of their respective owners.

Revision History

| Rev.   | Date     | Author | Description     |
|--------|----------|--------|-----------------|
| 130103 | 13/01/03 | MT     | Initial Version |
| 130103 | 13/01/03 | AC     | Revised         |
| 120107 | 13/01/07 | AA     | Re-formatted    |
Contents

Copyright reminder
1 Overview
2 Architecture
3 Core Symbol and Port Definitions
4 Operation Time Diagram
5 Preliminary Area and Frequency Characteristics
6 Detailed Example Design
List of Figures

2.1 FFT Architecture
3.1 Core Symbol
4.1 Timing diagram of the FFT/IFFTProcessor operation
6.1 Pipeline-SDF radix-$2^7$ DIF architecture for N = 8K
List of Tables

3.1 Core Signal Pinout
3.2 Parameters
5.1 Parameter values used for synthesis
5.2 Spartan-6 Family Performance and Resource Utilization
1 Overview

This Fast Fourier Transform (FFT) core, the FFT/IFFTProcessor IP core, implements radix-$2^k$ single-path delay feedback (SDF) pipeline FFT architectures, with $k > 1$. Pipeline architectures offer a good area-power-speed trade-off, and the radix-$2^k$ algorithms greatly reduce the number of twiddle-factor multiplications without complicating the radix-2 butterfly architecture. The decimation-in-frequency (DIF) method is used.

The core computes an N-point forward FFT or inverse FFT (IFFT), where $N = 2^p$ with $p > 1$. The IFFT is computed by exchanging the real and imaginary parts of the initial and final sequences. The choice of forward or inverse transform is run-time configurable.

The input data is a vector of N complex values represented as i_dbw-bit two's complement numbers, that is, i_dbw bits for each of the real and imaginary components of a data sample. The output vector uses o_dbw bits for each of the real and imaginary components of the output data. Input data is presented in natural order and output data in bit-reversed order.

When the number of different twiddle factors corresponding to a twiddle factor operator is 8 or 16, its implementation is improved with constant multipliers. In other cases, a complex multiplier and a LUT are used. For the biggest LUTs, two different optimizations can be applied:

1. A memory reduction scheme that reduces the number of twiddle factors to store from L to L/8 + 1.
2. A memoryless CORDIC.

All memory is on-chip, using either block RAM or distributed RAM.

The width of the datapath increases to accommodate the bit growth through the butterflies. The internal wordlength of the FFT affects precision, area, and power consumption, so it should not be made larger than needed. Using the same wordlength throughout the whole FFT processor gives poor performance, since the data has to be shifted down after each radix-2 butterfly to avoid overflow. The first stages in a DIF FFT contain the largest feedback delay buffers.
The shuffling of the first butterfly is achieved using a delay feedback of N/2. In the second butterfly, a delay feedback of N/4 is needed, in the third N/8, and so on. If the bitwidth is reduced at the input and then allowed to increase towards the output, the required memory, as well as the design area, decreases significantly. The wider datapath at the output has little effect on the memory size, since the last feedback delay buffers are short. In fact, the last one is only one word long.
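The memory argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes each feedback delay of length N/2^i stores one complex word per entry at the wordlength of its own butterfly, and the 12-to-18-bit growth profile is a hypothetical example, not the profile used in the core.

```python
def feedback_memory_bits(log2n, widths):
    """Total bits held by the SDF feedback delays.

    widths[i - 1] is the per-component wordlength (real or imaginary)
    assumed for the delay of butterfly i, which is N / 2**i words long.
    """
    if len(widths) != log2n:
        raise ValueError("one wordlength per butterfly is required")
    n_points = 1 << log2n
    total = 0
    for i, width in enumerate(widths, start=1):
        delay_words = n_points >> i        # N/2, N/4, ..., 1
        total += delay_words * 2 * width   # complex word: real + imaginary
    return total


log2n = 13  # N = 8K
# Hypothetical profile: 12 bits at the input, growing to 18 bits at the output.
growing = [12 + (i + 1) // 2 for i in range(log2n)]
# Same (widest) wordlength in every butterfly.
uniform = [max(growing)] * log2n

print("growing wordlength:", feedback_memory_bits(log2n, growing), "bits")
print("uniform wordlength:", feedback_memory_bits(log2n, uniform), "bits")
```

Because the first delays dominate the total (N/2 + N/4 already account for three quarters of the words), keeping the early butterflies narrow saves far more memory than the wide late butterflies cost.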
2 Architecture

The number of points of an FFT can be written as $N = 2^{n \cdot k + l}$. If $l = 0$, the FFT is composed of $n$ radix-$2^k$ stages. If $l > 0$, the FFT is composed of $n$ radix-$2^k$ stages and one extra radix-$2^l$ stage.

Figure 2.1 shows the architecture of one generic radix-$2^b$ stage of the FFT core. The BF1 blocks implement the hardware needed to perform the arithmetic operations of the radix-2 butterflies and the shuffling required for their correct operation. The BF2 blocks are similar to the BF1 blocks, except that some of the outputs are multiplied by j. The required shuffling is achieved using feedback delay buffers of length $N/2^i$, where $i$ goes from 1 to $\log_2 N$.

The value of $b$ indicates the number of radix-2 butterfly blocks in each stage. If $b$ is even, the stage is composed of $b/2$ groups of BF1-BF2 blocks, as shown in Figure 2.1(b). If $b$ is odd, the stage is composed of $(b-1)/2$ groups of BF1-BF2 blocks and one extra group with a single BF1 block, as shown in Figure 2.1(a). In the figure, one multiplier symbol represents a general complex multiplier and the other a constant complex multiplier.

When the number of different twiddle factors corresponding to a twiddle factor operator is 8 or 16, its implementation is improved with constant multipliers. In other cases, a complex multiplier and a LUT are used. If the coefficient memory reduction scheme is applied, the size of the largest LUT in the design is reduced from L to L/8 + 1. If the CORDIC algorithm is applied, no LUT is needed.
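The stage decomposition described in this section can be sketched in a few lines. The function names `stage_plan` and `delay_lengths` are hypothetical helpers introduced for illustration; they only restate the rules given in the text (n stages of b = k butterflies, one extra stage of b = l butterflies when l > 0, and BF1-BF2 groups with a trailing BF1 when b is odd).

```python
def stage_plan(log2n, k):
    """Split log2(N) = n*k + l and list the butterfly blocks of each stage."""
    if log2n < 1 or k < 1:
        raise ValueError("log2n and k must be positive")
    n, l = divmod(log2n, k)
    # n stages with b = k butterflies, plus one stage with b = l when l > 0.
    stage_sizes = [k] * n + ([l] if l > 0 else [])
    plan = []
    for b in stage_sizes:
        # b // 2 groups of BF1-BF2, plus one trailing BF1 when b is odd.
        blocks = ["BF1", "BF2"] * (b // 2) + (["BF1"] if b % 2 else [])
        plan.append((b, blocks))
    return plan


def delay_lengths(log2n):
    """Feedback delay lengths N / 2**i for i = 1 .. log2(N)."""
    return [(1 << log2n) >> i for i in range(1, log2n + 1)]


for b, blocks in stage_plan(13, 7):
    print(f"radix-2^{b} stage:", " ".join(blocks))
print("delays:", delay_lengths(13))
```

For log2(N) = 13 and k = 7 this gives n = 1 and l = 6, that is, one radix-2^7 stage followed by one radix-2^6 stage, with delays running from 4096 down to 1.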
Figure 2.1: FFT Architecture. (a) b odd: input data, Group 1 to Group (b-1)/2 of BF1-BF2 blocks, then Group (b+1)/2 with a single BF1 block, output data. (b) b even: input data, Group 1 to Group b/2 of BF1-BF2 blocks, output data. Both panels also show the twiddle-factor LUTs.
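The datasheet does not detail how the L to L/8 + 1 memory reduction is implemented. One common way to reach that table size, shown here as an assumption rather than as the core's actual scheme, is to store the cosine and sine only for the first octant (angles 0 to π/4, which is L/8 + 1 entries) and to rebuild every other twiddle factor from the octant symmetries of the unit circle. The names `build_octant_table` and `twiddle` are hypothetical.

```python
import cmath
import math


def build_octant_table(L):
    """Store (cos, sin) for angles 2*pi*i/L, i = 0 .. L/8 (L/8 + 1 entries)."""
    if L < 8 or L % 8:
        raise ValueError("L must be a positive multiple of 8")
    return [(math.cos(2 * math.pi * i / L), math.sin(2 * math.pi * i / L))
            for i in range(L // 8 + 1)]


def twiddle(m, L, table):
    """Rebuild W_L^m = exp(-2j*pi*m/L) from the first-octant table."""
    eighth = L // 8
    octant, r = divmod(m % L, eighth)
    if octant % 2 == 0:
        # Angle inside the quadrant lies in [0, pi/4): read the table directly.
        c, s = table[r]
    else:
        # Angle lies in [pi/4, pi/2): mirror about pi/4, swapping cos and sin.
        s, c = table[eighth - r]
    quadrant = octant // 2
    if quadrant == 1:      # add pi/2
        c, s = -s, c
    elif quadrant == 2:    # add pi
        c, s = -c, -s
    elif quadrant == 3:    # add 3*pi/2
        c, s = s, -c
    return complex(c, -s)  # exp(-j*theta) = cos(theta) - j*sin(theta)


L = 64
table = build_octant_table(L)
worst = max(abs(twiddle(m, L, table) - cmath.exp(-2j * math.pi * m / L))
            for m in range(L))
print(len(table), "stored entries, worst reconstruction error:", worst)
```

In hardware the sign changes and the swap reduce to multiplexers and negations driven by the three most significant bits of the twiddle index, which is why the saving in LUT depth comes at little logic cost.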
3 Core Symbol and Port Definitions

Signal names for the schematic symbol are shown in Figure 3.1 and described in Table 3.1. The FFT/IFFTProcessor is configured through the parameters described in Table 3.2.
Table 3.1: Core Signal Pinout

| Port Name  | Port Width | Direction | Description |
|------------|------------|-----------|-------------|
| CLK        | 1          | Input     | Rising-edge clock. |
| RESET_N    | 1          | Input     | Synchronous reset, active low. |
| DIN_I      | i_dbw      | Input     | Real part of the input data in two's complement. |
| DIN_Q      | i_dbw      | Input     | Imaginary part of the input data in two's complement. |
| VALID_DIN  | 1          | Input     | Input data validation signal (active high). This signal is high when valid data is presented at the input. |
| SELECT     | 1          | Input     | Control signal that selects between a forward FFT and an inverse FFT. When SELECT = 0, a forward transform is computed. When SELECT = 1, an inverse transform is computed. |
| DOUT_I     | o_dbw      | Output    | Real part of the output data in two's complement. |
| DOUT_Q     | o_dbw      | Output    | Imaginary part of the output data in two's complement. |
| VALID_DOUT | 1          | Output    | Output data validation signal (active high). This signal is high when valid data is presented at the output. |

Table 3.2: Parameters

| Name                          | Type    | Range        | Description |
|-------------------------------|---------|--------------|-------------|
| i_dbw                         | Integer | > 0          | Bitwidth of the input din_i and din_q ports. |
| o_dbw                         | Integer | > 0          | Bitwidth of the output dout_i and dout_q ports. |
| log2n                         | Integer | $\log_2 N$   | Base-2 logarithm of the number of points of the FFT (N). |
| dbw_1, dbw_2, ..., dbw_log2n  | Integer | i_dbw        | Bitwidth of the data in each butterfly of the architecture. |
| tbw                           | Integer | > 0          | Bitwidth of the real and imaginary parts of the twiddle factors. |
| k                             | Integer | > 0          | Value of k in the radix-$2^k$ algorithm. |
Figure 3.1: Core Symbol. Ports shown: CLK, RESET_N, DIN_I, DIN_Q, VALID_DIN, DOUT_I, DOUT_Q, VALID_DOUT, SELECT.
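The inverse transform selected with SELECT = 1 relies on the property stated in the overview: an IFFT can be obtained from a forward FFT by exchanging the real and imaginary parts of the input and output sequences. The sketch below checks that identity numerically with a plain O(N²) DFT as a reference model. The 1/N scaling is an assumption made so that the round trip returns the original samples; the datasheet does not state whether the core applies it.

```python
import cmath


def dft(x):
    """Reference forward DFT (O(N^2)), used only to check the identity."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def swap(z):
    """Exchange the real and imaginary parts of a complex sample."""
    return complex(z.imag, z.real)


def idft_via_swap(spectrum):
    """Inverse DFT built from the forward DFT and two real/imaginary swaps."""
    n = len(spectrum)
    y = dft([swap(v) for v in spectrum])
    return [swap(v) / n for v in y]  # 1/N scaling is an assumption


x = [1 + 2j, -3 + 0.5j, 0j, 4 - 1j, 2.5 + 0j, -1j, 0.25 + 0.75j, -2 - 2j]
recovered = idft_via_swap(dft(x))
print(max(abs(a - b) for a, b in zip(x, recovered)))
```

The identity holds because swapping the two parts of z equals j·conj(z), and conjugating the input and output of a forward DFT yields the inverse DFT up to the 1/N factor. In hardware the swap is only a rewiring of the I and Q buses, so the same datapath serves both directions.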
4 Operation Time Diagram

This section describes the timing behaviour of the FFT/IFFTProcessor. Figure 4.1 shows the timing diagram of the FFT core operation. Data enters the core through the din_i and din_q input ports while the valid_din signal is high. After an FFT delay, the data starts to appear at the dout_i and dout_q output ports while the output data validation signal valid_dout is high. The FFT delay is calculated as:

$$\mathrm{FFT}_{\mathrm{delay}} = \frac{\log_2 N}{2} + \sum_{i=1}^{\log_2 N} \left( \frac{N}{2^i} + 1 \right) \qquad (4.1)$$

where the sum represents the delays introduced by the feedback delay buffers and the $\log_2 N / 2$ term represents the delays introduced by the multipliers.
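As a worked instance of equation (4.1), the following sketch evaluates the delay for a given log2(N), reading the equation as log2(N)/2 plus the sum over i = 1 .. log2(N) of (N/2^i + 1). The function name `fft_delay` is hypothetical. The result is kept as an exact fraction because the datasheet gives no rounding rule for the multiplier term when log2(N) is odd.

```python
from fractions import Fraction


def fft_delay(log2n):
    """Evaluate equation (4.1) for N = 2**log2n."""
    if log2n < 1:
        raise ValueError("log2n must be positive")
    n_points = 1 << log2n
    # Delays of the feedback buffers: (N / 2**i + 1) for each butterfly.
    buffer_term = sum(n_points // (1 << i) + 1 for i in range(1, log2n + 1))
    # Delays attributed to the multipliers: log2(N) / 2.
    multiplier_term = Fraction(log2n, 2)
    return multiplier_term + buffer_term


print(fft_delay(4))    # N = 16
print(fft_delay(13))   # N = 8K
```

Since the buffer lengths sum to N - 1, the expression simplifies to N - 1 + (3/2)·log2(N), so for large N the latency is dominated by the N - 1 samples held in the feedback delays.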
Figure 4.1: Timing diagram of the FFT/IFFTProcessor operation. Signals shown: clk, din_i, din_q, valid_din, dout_i, dout_q, valid_dout. The FFT delay and the guard interval are marked.
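As stated in the overview, the samples delivered on dout_i and dout_q arrive in bit-reversed order. A downstream block that needs natural order can reorder them as sketched below. The helper names are hypothetical, and the sketch assumes that the sample at output position p corresponds to the frequency index obtained by reversing the log2(N) bits of p.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def to_natural_order(samples):
    """Reorder a bit-reversed output frame into natural order."""
    n = len(samples)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("frame length must be a power of two")
    natural = [None] * n
    for position, value in enumerate(samples):
        natural[bit_reverse(position, bits)] = value
    return natural


# For N = 8 the frequency indices leave the core in this order.
arrival_order = [bit_reverse(p, 3) for p in range(8)]
print(arrival_order)
print(to_natural_order(arrival_order))
```

Bit reversal is its own inverse, so the same index mapping converts in either direction.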
5 Preliminary Area and Frequency Characteristics

This section summarizes synthesis and place-and-route results of the FFT/IFFTProcessor core for Xilinx FPGAs. The analysis has been limited to the parameter values shown in Table 5.1. All results have been obtained using Xilinx ISE version 12.4. Table 5.2 shows performance and resource usage figures for the Spartan-6 LXT FPGA family. The maximum achievable frequency of the core on this FPGA family depends on the applied optimizations.
Table 5.1: Parameter values used for synthesis

| Parameter | Value |
|-----------|-------|
| i_dbw     | 12    |
| o_dbw     | 12    |
| log2n     | 13    |
| tbw       | 10    |
| k         | 7     |

Table 5.2: Spartan-6 Family Performance and Resource Utilization

| Optimization        | ROM  | N/8  | CORDIC |
|---------------------|------|------|--------|
| FFs                 | 1548 | 1545 | 1538   |
| LUTs                | 6334 | 3239 | 3937   |
| RAM16BWERs          | 11   | 13   | 11     |
| RAM8BWERs           | 6    | 7    | 6      |
| DSP48A1s            | 24   | 24   | 21     |
| Max Frequency (MHz) | 64   | 70   | 41     |
6 Detailed Example Design

This section presents an example implementation of an N = 8K-point FFT. The specifications are input and output wordlengths of 12 bits and a signal-to-quantization-noise ratio (SQNR) of 40 dB. A pipeline SDF radix-$2^7$ DIF FFT architecture has been implemented. Figure 6.1 shows the block diagram of the architecture. The wordlength of the processor is determined by SQNR simulation. The internal wordlengths have been adjusted to the minimum values that still guarantee the specified SQNR. The bitwidth of the twiddle factors has been fixed to 10 bits. The VHDL description of the module has been simulated and its results compared with those obtained using floating-point arithmetic, with a satisfactory outcome.
Figure 6.1: Pipeline-SDF radix-$2^7$ DIF architecture for N = 8K. The block diagram runs from x(n) to X[k] through 13 butterflies: BF1 BF2 BF1 BF2 BF1 BF2 BF1 with feedback delays of 4096, 2048, 1024, 512, 256, 128, and 64, followed by BF1 BF2 BF1 BF2 BF1 BF2 with feedback delays of 32, 16, 8, 4, 2, and 1. Twiddle-factor LUTs are shown alongside the butterflies.