FFT/IFFTProcessor IP Core Datasheet


System-on-Chip engineering FFT/IFFTProcessor IP Core Datasheet - Released - Core:120801 Doc: 130107


Copyright reminder

Copyright © 2012 by System-on-Chip engineering S.L. All rights are reserved. Unauthorized duplication of this document, in whole or in part, by any means is prohibited without the prior written permission of SoCe S.L. Although SoCe S.L. believes that the information included in this publication is correct as of the date of publication, SoCe S.L. reserves the right to make changes at any time without notice. All information in this document is strictly confidential and may only be published by SoCe S.L. All referenced trademarks are the property of their respective owners.

Revision History

Rev.     Date       Author  Description
130103   13/01/03   MT      Initial Version
130103   13/01/03   AC      Revised
130107   13/01/07   AA      Re-formatted

Contents
Copyright reminder
1 Overview
2 Architecture
3 Core Symbol and Port Definitions
4 Operation Time Diagram
5 Preliminary Area and Frequency Characteristics
6 Detailed Example Design

List of Figures
2.1 FFT Architecture
3.1 Core Symbol
4.1 Timing diagram of the FFT/IFFTProcessor operation
6.1 Pipeline-SDF radix-2^7 DIF architecture for N = 8K

List of Tables
3.1 Core Signal Pinout
3.2 Parameters
5.1 Parameter values used for synthesis
5.2 Spartan-6 Family Performance and Resource Utilization

1 Overview

This Fast Fourier Transform (FFT) core, the FFT/IFFTProcessor IP core, implements radix-2^k single-path delay feedback (SDF) pipeline FFT architectures, with k > 1. Pipeline architectures offer a good area-power-speed trade-off, and radix-2^k algorithms greatly reduce the number of twiddle-factor multiplications without complicating the radix-2 butterfly architecture. The decimation-in-frequency (DIF) method is used.

The core computes an N-point forward FFT or inverse FFT (IFFT), where N = 2^p with p > 1. The IFFT is computed by exchanging the real and imaginary parts of the input sequence and of the output sequence. The choice of forward or inverse transform is run-time configurable.

The input data is a vector of N complex values represented as i_dbw-bit two's complement numbers, that is, i_dbw bits for each of the real and imaginary components of a data sample. The output vector uses o_dbw bits for each of the real and imaginary components of the output data. Input data is presented in natural order and output data in bit-reversed order.

When a twiddle-factor operator has only 8 or 16 different twiddle factors, it is implemented more efficiently with constant multipliers. In other cases, a complex multiplier and a LUT are used. For the largest LUTs, two different optimizations can be applied:

1. A memory reduction scheme that reduces the number of twiddle factors to store from L to L/8 + 1.
2. A memoryless CORDIC.

All memory is on-chip, using either block RAM or distributed RAM.

The width of the datapath increases to accommodate the bit growth through the butterflies. The internal wordlength of the FFT affects its precision, area, and power consumption, so widening it beyond what is needed only costs area and power. On the other hand, keeping the same wordlength throughout the whole FFT processor performs poorly, since the data have to be shifted down after each radix-2 butterfly to avoid overflow.

The first stages of a DIF FFT contain the largest delay buffers. The shuffling for the first butterfly is achieved with a delay feedback of N/2, the second butterfly needs a delay feedback of N/4, the third N/8, and so on. If the bitwidth is reduced at the input and then allowed to grow towards the output, both the required memory and the design area decrease significantly. A datapath that is wider at the output than at the input barely increases the memory size, because the last delay buffers are short: the last one holds a single word.

SoCe Core:120801 Doc: 130107 1 / 12
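To see why a growing wordlength saves memory, the sketch below totals the feedback-buffer bits of an SDF pipeline for a given per-butterfly wordlength profile. It is a minimal illustration under a stated assumption: the buffer of butterfly i (depth N/2^i complex words) is taken to store both real and imaginary parts at widths[i-1] bits. The helper name buffer_bits and the example widths are hypothetical, not values taken from the core.

```python
def buffer_bits(n_points: int, widths: list) -> int:
    """Total feedback-buffer bits of a single-path delay feedback (SDF) pipeline.

    Assumption: the buffer of butterfly i (1-based) is N / 2**i complex words
    deep and stores real and imaginary parts at widths[i - 1] bits each.
    """
    stages = n_points.bit_length() - 1
    if n_points != 1 << stages or len(widths) != stages:
        raise ValueError("need N = 2**p and one width per butterfly")
    return sum((n_points >> i) * 2 * widths[i - 1] for i in range(1, stages + 1))


# Growing wordlength (12 -> 15 bits) versus a uniform 15-bit datapath, N = 16.
growing = buffer_bits(16, [12, 13, 14, 15])
uniform = buffer_bits(16, [15, 15, 15, 15])
print(growing, uniform)  # 382 450
```

The first buffer (N/2 words) dominates the total, so narrowing the early stages matters far more than the extra bits added near the output, where the buffers hold only a few words.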

2 Architecture

The number of points of the FFT can be written as N = 2^{nk+l}, with 0 ≤ l < k. If l = 0, the FFT is composed of n radix-2^k stages. If l > 0, the FFT is composed of n radix-2^k stages and one extra radix-2^l stage.

Figure 2.1 shows the architecture of one generic radix-2^b stage of the FFT core. The BF1 blocks implement the hardware that performs the arithmetic operations of the radix-2 butterflies, together with the shuffling they require to operate correctly. The BF2 blocks are similar to the BF1 blocks, except that some of their outputs are multiplied by j. The shuffling is achieved with delay buffers of length N/2^i, where i goes from 1 to log2 N along the pipeline.

The value of b is the number of radix-2 butterfly blocks in the stage. If b is even, the stage is composed of b/2 groups of BF1-BF2 blocks, as shown in Figure 2.1(b). If b is odd, the stage is composed of (b-1)/2 groups of BF1-BF2 blocks and one extra group with a single BF1 block, as shown in Figure 2.1(a). In Figure 2.1, general complex multipliers and constant complex multipliers are drawn with different symbols.

When a twiddle-factor operator has only 8 or 16 different twiddle factors, it is implemented more efficiently with constant multipliers. In other cases, a complex multiplier and a LUT are used. If the coefficient memory reduction scheme is applied, the size of the largest LUT in the design is reduced from L to L/8 + 1 entries. If the CORDIC algorithm is applied, no LUT is needed for the twiddle factors.
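Both twiddle-generation options can be illustrated in floating point. This is a sketch, not the core's implementation: make_octant_lut, twiddle_from_octant and cordic_twiddle are hypothetical names, and the octant-folding rule is the usual symmetry argument (any angle maps onto the first octant [0, π/4] by swapping and negating cosine and sine), which we assume is what the L/8 + 1 scheme exploits. The CORDIC loop still uses a few fixed arctangent constants, one per iteration, but no table of twiddle factors.

```python
import math


def make_octant_lut(L: int) -> list:
    """Store (cos, sin) of 2*pi*j/L for j = 0..L/8, i.e. L/8 + 1 entries."""
    if L % 8:
        raise ValueError("L must be a multiple of 8")
    return [(math.cos(2 * math.pi * j / L), math.sin(2 * math.pi * j / L))
            for j in range(L // 8 + 1)]


def twiddle_from_octant(m: int, L: int, lut: list) -> complex:
    """W_L^m = exp(-j*2*pi*m/L) rebuilt from the first-octant table."""
    q = L // 8
    octant, r = divmod(m % L, q)
    if octant % 2 == 0:
        c, s = lut[r]             # angle lies in the lower half of its quadrant
    else:
        s, c = lut[q - r]         # mirror: cos(pi/2 - b) = sin(b), sin(pi/2 - b) = cos(b)
    for _ in range(octant // 2):  # add 90 degrees per quadrant
        c, s = -s, c
    return complex(c, -s)


def cordic_twiddle(m: int, L: int, iterations: int = 16) -> complex:
    """W_L^m via CORDIC rotation mode: shift-and-add steps, no twiddle table."""
    quadrant, rem = divmod(m % L, L // 4)
    z = -2 * math.pi * rem / L    # residual angle in (-pi/2, 0]
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1 + 4.0 ** -i)
    x, y = gain, 0.0              # pre-scale so the CORDIC gain cancels out
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    for _ in range(quadrant):     # multiply by -j once per quadrant
        x, y = y, -x
    return complex(x, y)
```

For L = 64 the table holds only 9 (cos, sin) pairs instead of 64; the CORDIC version trades that memory for iterations, and its accuracy is set by the number of iterations (about 2^-15 rad of residual angle for 16).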

[Figure 2.1: FFT Architecture. (a) b odd: input data → Group 1 (BF1, BF2) → ... → Group (b-1)/2 (BF1, BF2) → Group (b+1)/2 (BF1) → output data. (b) b even: input data → Group 1 (BF1, BF2) → ... → Group b/2-1 (BF1, BF2) → Group b/2 (BF1, BF2) → output data. LUT blocks feed the twiddle multipliers along the chain.]
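The decomposition N = 2^{nk+l} and the group structure of Figure 2.1 can be sketched as a small planning function. plan_pipeline is a hypothetical helper; it assumes, as the N = 8K example of Figure 6.1 suggests, that the n radix-2^k stages come first and the extra radix-2^l stage last, and that butterfly i along the pipeline uses a buffer of N/2^i words.

```python
def plan_pipeline(n_points: int, k: int) -> list:
    """Split log2(N) = n*k + l into radix-2^k stages plus an optional radix-2^l stage.

    Returns one (b, [(block, buffer_length), ...]) entry per stage, where b is the
    number of radix-2 butterflies in the stage and butterfly i uses N / 2**i words.
    """
    total = n_points.bit_length() - 1
    if n_points != 1 << total or k <= 0:
        raise ValueError("need N = 2**p and k > 0")
    n, l = divmod(total, k)
    radices = [k] * n + ([l] if l else [])
    plan, i = [], 1
    for b in radices:
        blocks = []
        for _ in range(b // 2):   # b/2 (even b) or (b-1)/2 (odd b) BF1-BF2 groups
            blocks += [("BF1", n_points >> i), ("BF2", n_points >> (i + 1))]
            i += 2
        if b % 2:                 # odd b: one extra group with a single BF1
            blocks.append(("BF1", n_points >> i))
            i += 1
        plan.append((b, blocks))
    return plan


for b, blocks in plan_pipeline(8192, 7):
    print(f"radix-2^{b}:", " ".join(f"{name}({depth})" for name, depth in blocks))
```

For N = 8192 and k = 7 this gives one radix-2^7 stage (three BF1-BF2 groups plus a lone BF1) followed by one radix-2^6 stage, with buffer lengths running from 4096 down to 1, the values annotated in Figure 6.1.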

3 Core Symbol and Port Definitions

The signal names of the schematic symbol are shown in Figure 3.1 and described in Table 3.1. The FFT/IFFTProcessor is configured through the parameters described in Table 3.2.

Table 3.1: Core Signal Pinout

Port Name   Port Width  Direction  Description
CLK         1           Input      Rising-edge clock.
RESET_N     1           Input      Synchronous reset, active low.
DIN_I       i_dbw       Input      Real part of the input data, in two's complement.
DIN_Q       i_dbw       Input      Imaginary part of the input data, in two's complement.
VALID_DIN   1           Input      Input data validation signal (active high): high when valid data is presented at the input.
SELECT      1           Output     Control signal that indicates whether a forward FFT or an inverse FFT is performed: SELECT = 0 computes a forward transform, SELECT = 1 an inverse transform.
DOUT_I      o_dbw       Output     Real part of the output data, in two's complement.
DOUT_Q      o_dbw       Output     Imaginary part of the output data, in two's complement.
VALID_DOUT  1           Output     Output data validation signal (active high): high when valid data is presented at the output.

Table 3.2: Parameters

Name                          Type     Range     Description
i_dbw                         Integer  > 0       Bitwidth of the input ports din_i and din_q.
o_dbw                         Integer  > 0       Bitwidth of the output ports dout_i and dout_q.
log2n                         Integer  log2 N    Base-2 logarithm of the number of points of the FFT (N).
dbw_1, dbw_2, ..., dbw_log2n  Integer  ≥ i_dbw   Bitwidth of the data in each butterfly of the architecture.
tbw                           Integer  > 0       Bitwidth of the real and imaginary parts of the twiddle factors.
k                             Integer  > 0       Value of k in the radix-2^k algorithm.
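For testbench checking, the frame carried on DOUT_I/DOUT_Q for each SELECT value can be modelled in floating point. This sketch is hypothetical (bit_reverse, fft_dif_bitrev, swap_re_im and core_model are illustrative names): it uses plain radix-2 DIF butterflies rather than the core's radix-2^k factorization, which we assume yields the same transform and bit-reversed output order, and it ignores fixed-point rounding. The swap trick works because exchanging real and imaginary parts equals j·conj(z), so swap(FFT(swap(x))) is N times the inverse DFT. The datasheet does not state whether the inverse is scaled by 1/N, so the model returns the unscaled result.

```python
import cmath


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_dif_bitrev(x: list) -> list:
    """Radix-2 DIF FFT: natural-order input, bit-reversed-order output."""
    a = list(x)
    n = len(a)
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of two, N >= 2")
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for t in range(span):
                w = cmath.exp(-2j * cmath.pi * t / (2 * span))  # twiddle W_{2*span}^t
                u, v = a[start + t], a[start + t + span]
                a[start + t] = u + v
                a[start + t + span] = (u - v) * w
        span //= 2
    return a


def swap_re_im(seq: list) -> list:
    """Exchange the real and imaginary parts of every sample."""
    return [complex(z.imag, z.real) for z in seq]


def core_model(frame: list, select: int) -> list:
    """Floating-point model of one output frame for a given SELECT value.

    SELECT = 0: forward FFT. SELECT = 1: inverse transform obtained by swapping
    real/imaginary parts before and after the forward FFT (unscaled, N * IDFT).
    The result is in bit-reversed order.
    """
    if select == 0:
        return fft_dif_bitrev(frame)
    return swap_re_im(fft_dif_bitrev(swap_re_im(frame)))


# Usage: reorder the bit-reversed output into natural order, X[k] = y[bit_reverse(k, log2 N)].
frame = [complex(n, -0.5 * n) for n in range(8)]
y = core_model(frame, select=0)
natural = [y[bit_reverse(k, 3)] for k in range(8)]
```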

[Figure 3.1: Core Symbol, with ports CLK, RESET_N, DIN_I, DIN_Q, VALID_DIN, DOUT_I, DOUT_Q, VALID_DOUT and SELECT.]

4 Operation Time Diagram

This section describes the timing behaviour of the FFT/IFFTProcessor. Figure 4.1 shows the timing diagram of the core operation. Data enter the core through the din_i and din_q input ports while valid_din is high. After the FFT delay, the transformed data start to appear at the dout_i and dout_q output ports, and the output data validation signal valid_dout is high while they are present. The FFT delay is calculated as:

$$FFT_{delay} = \frac{\log_2 N}{2} + \sum_{i=1}^{\log_2 N} \left( \frac{N}{2^{i}} + 1 \right) \tag{4.1}$$

where the sum represents the delays introduced by the delay buffers, and the log2(N)/2 term the delays introduced by the multipliers.
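Eq. (4.1) can be evaluated directly. The sketch keeps the result as an exact fraction because the datasheet does not say how the log2(N)/2 term is rounded when log2 N is odd (for example N = 8192, the configuration of Table 5.1), nor in which unit the delay is counted; clock cycles are the natural assumption. Summing the geometric series also gives the closed form N - 1 + (3/2) log2 N. The function name fft_delay is illustrative.

```python
from fractions import Fraction


def fft_delay(n_points: int) -> Fraction:
    """Latency of Eq. (4.1): log2(N)/2 + sum_{i=1}^{log2 N} (N/2^i + 1)."""
    log2n = n_points.bit_length() - 1
    if n_points != 1 << log2n:
        raise ValueError("N must be a power of two")
    multipliers = Fraction(log2n, 2)  # log2(N)/2 term: multiplier delays
    buffers = sum(Fraction(n_points, 2 ** i) + 1 for i in range(1, log2n + 1))
    return multipliers + buffers


print(fft_delay(1024))  # 1038
print(fft_delay(8192))  # 16421/2
```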

[Figure 4.1: Timing diagram of the FFT/IFFTProcessor operation, showing clk, din_i, din_q, valid_din, dout_i, dout_q and valid_dout, with the FFT delay and a guard interval marked.]

5 Preliminary Area and Frequency Characteristics

This section summarizes synthesis and place-and-route results of the FFT/IFFTProcessor core for Xilinx FPGAs. The analysis has been limited to the parameter values shown in Table 5.1. All results were obtained with Xilinx ISE version 12.4. Table 5.2 shows performance and resource usage for the Spartan-6 LXT FPGA family. The maximum achievable frequency of the core in this family depends on the optimizations applied.

Table 5.1: Parameter values used for synthesis

Parameter  Value
i_dbw      12
o_dbw      12
log2n      13
tbw        10
k          7

Table 5.2: Spartan-6 Family Performance and Resource Utilization

Optimization          ROM    N/8    CORDIC
FFs                   1548   1545   1538
LUTs                  6334   3239   3937
RAM16BWERs            11     13     11
RAM8BWERs             6      7      6
DSP48A1s              24     24     21
Max Frequency (MHz)   64     70     41
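The trade-off in Table 5.2 is easier to read as relative changes. The sketch below copies three rows of the table and assumes that the ROM column is the unoptimized baseline (full twiddle LUT), which the table does not state explicitly; TABLE_5_2 and relative_change are illustrative names.

```python
# Values copied from Table 5.2 (Spartan-6 LXT, parameters of Table 5.1).
TABLE_5_2 = {
    "LUTs": {"ROM": 6334, "N/8": 3239, "CORDIC": 3937},
    "DSP48A1s": {"ROM": 24, "N/8": 24, "CORDIC": 21},
    "Max Frequency (MHz)": {"ROM": 64, "N/8": 70, "CORDIC": 41},
}


def relative_change(metric: str, option: str, baseline: str = "ROM") -> float:
    """Percentage change of `metric` for `option` relative to `baseline`."""
    base = TABLE_5_2[metric][baseline]
    return 100.0 * (TABLE_5_2[metric][option] - base) / base


for metric in TABLE_5_2:
    for option in ("N/8", "CORDIC"):
        print(f"{metric:<20} {option:<7} {relative_change(metric, option):+6.1f} %")
```

Under that assumption, the N/8 scheme cuts LUT usage by about 49% and raises the maximum frequency by about 9%, while CORDIC saves about 38% of the LUTs and 3 DSP48A1 slices at the cost of a roughly 36% lower maximum frequency.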

6 Detailed Example Design

This example presents an N = 8K-point (N = 8192) FFT implementation. The specifications are input and output wordlengths of 12 bits and a signal-to-quantization-noise ratio (SQNR) of 40 dB. A pipeline-SDF radix-2^7 DIF FFT architecture has been implemented; Figure 6.1 shows its block diagram. The wordlengths of the processor were determined by SQNR simulation: the internal wordlengths were reduced to the minimum values that still guarantee the specified SQNR. The bitwidth of the twiddle factors is fixed at 10 bits. The VHDL description of the module has been simulated and its results compared with those obtained using floating-point arithmetic, with satisfactory agreement.
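The SQNR criterion used for sizing compares the fixed-point output with a floating-point reference over the same input frame. In the sketch below, sqnr_db and quantize are hypothetical helpers: quantize only stands in for a bit-accurate model or the VHDL simulation output, its [-1, 1) full-scale convention is an assumption, and both sequences must be in the same order (the core's output is bit-reversed).

```python
import math
import random


def sqnr_db(reference: list, test: list) -> float:
    """SQNR in dB: reference power divided by the power of (reference - test)."""
    if len(reference) != len(test):
        raise ValueError("sequences must have the same length and ordering")
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - t) ** 2 for r, t in zip(reference, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def quantize(z: complex, bits: int) -> complex:
    """Round both parts to a `bits`-bit two's-complement grid, with saturation.

    Hypothetical scaling: full scale is [-1, 1), i.e. a step of 2**-(bits - 1).
    """
    scale = 1 << (bits - 1)

    def to_int(v: float) -> int:
        return max(-scale, min(scale - 1, round(v * scale)))

    return complex(to_int(z.real), to_int(z.imag)) / scale


# Stand-in experiment: quantize a random frame and measure the resulting SQNR.
random.seed(1)
frame = [complex(random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)) for _ in range(1024)]
for bits in (6, 12):
    print(bits, "bits:", round(sqnr_db(frame, [quantize(z, bits) for z in frame]), 1), "dB")
```

Each extra bit gains about 6 dB here, as expected for rounding noise. Checking the 40 dB specification of this example then amounts to testing sqnr_db(reference, simulated) >= 40 on representative frames.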

[Figure 6.1: Pipeline-SDF radix-2^7 DIF architecture for N = 8K. The input x(n) passes through 13 butterflies, BF1 BF2 BF1 BF2 BF1 BF2 BF1 followed by BF1 BF2 BF1 BF2 BF1 BF2, with feedback buffers of 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2 and 1 words, to produce the output X[k]; the twiddle-factor LUTs are labelled 128, 32, 8192 and 64.]