Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Similar documents
Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

User Manual for FC100

Documentation. Design File Formats. Constraints Files. Verification. Slices 1 IOB 2 GCLK BRAM

April 7, 2010 Data Sheet Version: v4.00

Documentation Design File Formats

An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs

An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs

7 Series FPGAs Memory Interface Solutions (v1.9)

High-Performance DDR3 SDRAM Interface in Virtex-5 Devices Author: Adrian Cosoroaba

Core Facts. Documentation. Design File Formats. Simulation Tool Used Designed for interfacing configurable (32 or 64

AN 464: DFT/IDFT Reference Design

April 6, 2010 Data Sheet Version: v2.05. Support 16 times Support provided by Xylon

DDR2 Controller Using Virtex-4 Devices Author: Tze Yi Yeoh

Enabling success from the center of technology. Interfacing FPGAs to Memory

logibayer.ucf Core Facts

Core Facts. Documentation. Encrypted VHDL Fallerovo setaliste Zagreb, Croatia. Verification. Reference Designs &

Support Triangle rendering with texturing: used for bitmap rotation, transformation or scaling

Basic Xilinx Design Capture. Objectives. After completing this module, you will be able to:

Virtex-6 FPGA Embedded Tri-Mode Ethernet MAC Wrapper v1.4

FFT/IFFTProcessor IP Core Datasheet

EECS150 - Digital Design Lecture 17 Memory 2

Virtex-5 FPGA Embedded Tri-Mode Ethernet MAC Wrapper v1.7

LogiCORE IP Initiator/Target v5 and v6 for PCI-X

Creating High-Speed Memory Interfaces with Virtex-II and Virtex-II Pro FPGAs Author: Nagesh Gupta, Maria George

Synthesizable CIO DDR RLDRAM II Controller for Virtex-II Pro FPGAs Author: Rodrigo Angel

Data Side OCM Bus v1.0 (v2.00b)

2GB DDR3 SDRAM SODIMM with SPD

ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation

High-Performance 16-Point Complex FFT Features 1 Functional Description 2 Theory of Operation

LogiCORE IP AXI Video Direct Memory Access (axi_vdma) (v3.01.a)

Block RAM. Size. Ports. Virtex-4 and older: 18Kb Virtex-5 and newer: 36Kb, can function as two 18Kb blocks

Graphics Controller Core

Single Channel HDLC Core V1.3. LogiCORE Facts. Features. General Description. Applications

4GB Unbuffered VLP DDR3 SDRAM DIMM with SPD

Virtex-5 GTP Aurora v2.8

FPGA Provides Speedy Data Compression for Hyperspectral Imagery

FPGA Matrix Multiplier

Energy Optimizations for FPGA-based 2-D FFT Architecture

LogiCORE IP AXI Video Direct Memory Access (axi_vdma) (v3.00.a)

DDR and DDR2 SDRAM Controller Compiler User Guide

32 Channel HDLC Core V1.2. Applications. LogiCORE Facts. Features. General Description. X.25 Frame Relay B-channel and D-channel

SHA Core, Xilinx Edition. Core Facts

Channel FIFO (CFIFO) (v1.00a)

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS

Fast Fourier Transform IP Core v1.0 Block Floating-Point Streaming Radix-2 Architecture. Introduction. Features. Data Sheet. IPC0002 October 2014

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com

Xilinx Memory Interface Generator (MIG) User Guide

LogiCORE IP Multiply Adder v2.0

LogiCORE IP FIFO Generator v6.1

Documentation. Implementation Xilinx ISE v10.1. Simulation

Instantiation. Verification. Simulation. Synthesis

IMM64M72D1SCS8AG (Die Revision D) 512MByte (64M x 72 Bit)

LogiCORE IP Serial RapidIO v5.6

Table 1: Example Implementation Statistics for Xilinx FPGAs

High-Performance DDR2 SDRAM Interface in Virtex-5 Devices Authors: Karthi Palanisamy and Maria George

ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point FFT Simulation

LogiCORE IP Mailbox (v1.00a)

Fibre Channel Arbitrated Loop v2.3

FFT MegaCore Function User Guide

Asynchronous FIFO V3.0. Features. Synchronization and Timing Issues. Functional Description

LogiCORE IP AXI Video Direct Memory Access v5.00.a

APPROVAL SHEET. Apacer Technology Inc. Apacer Technology Inc. CUSTOMER: 研華股份有限公司 APPROVED NO. : T0031 PCB PART NO. :

IMM64M64D1SOD16AG (Die Revision D) 512MByte (64M x 64 Bit)

QDRII SRAM Controller MegaCore Function User Guide

IMM128M72D1SOD8AG (Die Revision F) 1GByte (128M x 72 Bit)

QDRII SRAM Controller MegaCore Function User Guide

AES Core Specification. Author: Homer Hsing

EECS150 - Digital Design Lecture 16 - Memory

January 19, 2010 Product Specification Rev1.0. Core Facts. Documentation Design File Formats. Slices 1 BUFG/

Utility Reduced Logic (v1.00a)

IMM128M64D1DVD8AG (Die Revision F) 1GByte (128M x 64 Bit)

APPROVAL SHEET. Apacer Technology Inc. Apacer Technology Inc. CUSTOMER: 研華股份有限公司 APPROVED NO. : T0026 PCB PART NO. :

440GX Application Note

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Stratix vs. Virtex-II Pro FPGA Performance Analysis

IMM64M64D1DVS8AG (Die Revision D) 512MByte (64M x 64 Bit)

SDR SDRAM Controller. User Guide. 10/2012 Capital Microelectronics, Inc. Beijing, China

Discontinued IP. Distributed Memory v7.1. Functional Description. Features

Agenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs

Clock Correction Module

A Low Power DDR SDRAM Controller Design P.Anup, R.Ramana Reddy

LogiCORE IP AXI DataMover v3.00a

Virtex-II SiberBridge Author: Ratima Kataria & the SiberCore Applications Engineering Group

V8-uRISC 8-bit RISC Microprocessor AllianceCORE Facts Core Specifics VAutomation, Inc. Supported Devices/Resources Remaining I/O CLBs

COMPUTER ARCHITECTURES

H100 Series FPGA Application Accelerators

Introduction to Partial Reconfiguration Methodology

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

LogiCORE IP Fast Fourier Transform v7.1

DO-254 AXI 7 Series DDRx (Limited) 1.00a Certifiable Data Package (DAL A) General Description. Features. August 29, 2014, Rev. -

ALTDQ_DQS2 Megafunction User Guide

LogiCORE IP AXI DMA (v4.00.a)

LogiCORE IP AXI Master Lite (axi_master_lite) (v1.00a)

White Paper. Floating-Point FFT Processor (IEEE 754 Single Precision) Radix 2 Core. Introduction. Parameters & Ports

Sparse Matrix-Vector Multiplication FPGA Implementation

FPGAs for Image Processing

AXI4-Stream Infrastructure IP Suite

JEDEC Standard No. 21 -C Page Appendix E: Specific PD s for Synchronous DRAM (SDRAM).

Gate Estimate. Practical (60% util)* (1000's) Max (100% util)* (1000's)

Transcription:

(ULFFT) November 3, 2008 Product Specification Dillon Engineering, Inc. 4974 Lincoln Drive Edina, MN USA, 55436 Phone: 952.836.2413 Fax: 952.927.6514 E-mail: info@dilloneng.com URL: www.dilloneng.com Core Facts Documentation Design File Formats Constraints Files Verification Instantiation Templates Reference Designs & Provided with Core Features Application Notes Additional Items UltraLong algorithm for performing continuous Simulation Tool Used Fast Fourier Transforms (FFTs) For transform lengths that exceed on-fpga Aldec Riviera 2008.06 - memory capacity Up to 4M points using QDR SRAM Support Support Provided by Dillon Engineering, Inc. - Up to 64M points using DDR SDRAM Any-width fixed- or floating-point data Run-time selectable length Run-time selectable Forward/Inverse transform mode Continuous processing at rate up to Fmax (see Table 1). - Data rate of 200MSamples/sec in Virtex-5. Natural-order inputs and outputs Includes C/C++ bit-accurate model and data generator - Model also usable from MATLAB Includes Verilog testbench and run scripts User Guide ISE Project with EDIF/NGC netlist, Verilog source available for extra cost.ucf constraints Verilog Testbench, Test Vectors Verilog None C/C++ Model Table 1: Example Implementation Statistics for Xilinx Virtex -5 SXT-2, Single Precision Float FFT Length Fmax (MHz) External # Mem Min Mem Memory Type 1 Banks Size (ea.) Slice FF 2 Slice LUT 2 IOB 3 BUFG BRAM 4 DSP48E DCM Design 2M 200 QDRII SRAM 3 4Mx32 43,705 48,831 474 5 90 385 1 ISE 10.1.02 16M 175 DDR2 SDRAM 3 32Mx64 59,430 62,434 567 6 459 445 1 ISE 10.1.02 64M 100 DDR2 SDRAM 3 128Mx64 63,643 66,365 567 6 483 485 1 ISE 10.1.02 64M 50 DDR2 SDRAM 2 128Mx64 34,968 36,354 425 6 302 249 1 ISE 10.1.02 64M 50 DDR2 SDRAM 3 256Mx32 61,161 63,716 474 6 282 485 1 ISE 10.1.02 64M 25 DDR2 SDRAM 2 256Mx32 33,314 34,863 316 6 168 249 1 ISE 10.1.02 Notes: 1) Assuming QDRII SRAM @ 200MHz, DDR2 SDRAM @ 300MHz. 2) Actual slice count dependent on percentage of unrelated logic see Mapping Report File for details 3) Assuming all core I/Os and clocks are routed off-chip. 4) Indicates maximum BRAM usage. Substituting distributed RAM and/or built-in FIFOs reduces BRAM count. Tools November 3, 2008 2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands included herein are trademarks of Xilinx, Inc.

Figure 1: UltraLong FFT Block Diagram General Description Dillon Engineering's UltraLong FFT IP Core uses an efficient Fast Fourier Transform (FFT) algorithm to provide multimillion-point discrete transforms on data frames or continuous data streams. This structure utilizes state-of-the-art off-chip memory technology and N1- and N2-length pipelined radix-2 FFT engines with an additional rotation stage to perform N=N1xN2 transform lengths, from 1K to 64M points. The core is available with any width fixed or floating point data. The UltraLong IP Core is easily targeted to current Xilinx FPGA devices and various external memory types. Functional Description Shuffler The Shuffler blocks transpose the N-length data into N1- and N2-length row and column orders, also handling different memory access burst requirements. Memory Controllers The Memory Controller blocks write and read data to and from external memories. Memory technologies include QDR and DDR SRAMs and DRAMs supported by the Xilinx Memory Interface Generator. FFTA and B The FFT blocks use radix-2 pipelined FFT engines to perform continuous N1- and N2-length transforms. CORDIC/Rotate The CORDIC generates complex coordinate multipliers as required by the Rotate block in the UltraLong FFT algorithm. UltraLong Algorithm The N=N1xN2 UltraLong Discrete Fourier Transform algorithm follows the form: November 3, 2008 2

Applications The UltraLong FFT IP Core is useful in High Performance Embedded Computing (HPEC) applications which require continuous Digital Processing (DSP) at high sample rates and long transforms. FFT hardware acceleration or co-processing is often a goal of scientific algorithms used in High Performance Computing (HPC). End applications and markets include radar, sonar, spectral analysis, acoustics, and telecommunications. Core Modifications The IP Core is available in netlist or parameterized source code and is customized to support the following: Netlist builds for current Xilinx Virtex-5 and Virtex-4 devices. FFT length and speed depend on chip resources, speed grade, and external memory interfaces. Standard builds use 3 independent external memory banks as shown in Figure 1. Alternative builds use 2 or 1 external memory banks and share a single FFT/Rotation engine. The same maximum FFT length applies with any option, but 3 banks support higher continuous data rates compared with 2 or 1 banks. Further customized builds are available to support high-rate (to 200MSps) combined with long-length (to 64M points). These designs require DDR SDRAM for transpose storage and additional QDR SRAM for transpose burst cache. Per-transform length selectable in powers-of-2 from 2 min to 2 max points, with definable min >= 10 and min <= max <= 26. IEEE-754 floating point math operations use Xilinx Coregen floating point cores, which are built separately using Coregen. Thus all trade-offs between speed, number of pipeline stages, DSP48/Mult macro usage, single-, double- or custom-precision float, etc., can be supported. Any width fixed-point math operators can be used in lieu of floating point, with options for scaling, rounding and saturation modes, all matched bit-accurate with the C/C++-model. Contact Dillon Engineering for more details. Core I/O s The core signal I/O have not been fixed to specific device pins to provide flexibility for interfacing with user logic. Descriptions of all standard signal I/O are provided in Table 2, 2a and 2b. Table 2: Core I/O s. Direction Description CLK Input Clock Input. For QDR SRAM designs, this is the source clock for both the datapath logic and the memories. For DDR SDRAM designs, this is the datapath logic clock. CLK200 Input 200MHz reference clock used by QDR and DDR delay control. CLK0 Output (QDR SRAM designs only). This is the CLK0 output of the DCM, matched to CLK. CLK_REF Input (DDR SDRAM designs only). This is the source clock for the SDRAMs. RST_N Input Active-low asynchronous reset. Resets all control logic. November 3, 2008 3

DIR Input Transform mode select. 0 = Forward FFT, 1 = Inverse FFT. SEL[4:0] Input Transform length select. Valid range is from 5'd10 (indicating transform length of 1K) up to the maximum length supported by the build (e.g. 5'd26 for a transform length of 64M). INIT_DONE Output When active, indicates all external memories have completed the PHY init sequence. SYNC_IN Input Input sync strobe. Indicates to the core to begin processing i_data on the following clock cycle. A[63:0] Input Input data. Complex data of the form R + iq, where R is contained in bits 63:32 and Q is contained in bits 31:0, each a single-precision floating point number. SYNC_OUT Output Output sync strobe. Indicates the core is sending processed o_data beginning on the following clock cycle. X[63:0] Output Output data. Complex data of the form R + iq, where R is contained in bits 63:32 and Q is contained in bits 31:0, each a single-precision floating point number. Table 2a: Example I/O for each QDRII SRAM interface, using 2 components of 4Mx18 each. Direction Description QDRx_CQ[1:0] Input QDR read clock QDRx_CQ_N[1:0] Input QDR read clock QDRx_Q[35:0] Input QDR memory read data QDRx_C[1:0] Output QDR read source clock QDRx_C_N[1:0] Output QDR read source clock QDRx_K[1:0] Output QDR write clock QDRx_K_N[1:0] Output QDR write clock QDRx_SA[19:0] Output QDR memory address QDRx_D[35:0] Output QDR memory write data QDRx_BW_N[3:0] Output QDR memory byte enable QDRx_DOFF_N Output QDR DLL disable QDRx_R_N Output QDR read enable QDRx_W_N Output QDR write enable Table 2b: Example I/O for each DDR2 SDRAM interface, using 4 components of 256Mx8 each. Direction DDRx_CK[3:0] Output DDR clock DDRx_CK_N[3:0] Output DDR clock DDRx_A[14:0] Output DDR row/col address DDRx_BA[2:0] Output DDR bank address DDRx_RAS_N Output DDR row select DDRx_CAS_N Output DDR col select DDRx_WE_N Output DDR write enable DDRx_CS_N Output DDR chip select DDRx_CKE Output DDR clock enable DDRx_ODT[3:0] Output DDR on-die termination DDRx_DM[3:0] Output DDR data mask DDRx_DQS[3:0] InOut DDR dqs strobe DDRx_DQS_N[3:0] InOut DDR dqs strobe DDRx_DQ[31:0] InOut DDR data Description November 3, 2008 4

Critical Descriptions All data interface and internal operation of the core is synchronous to CLK. Simple SYNC strobes are used on the input and output interfaces to signal that data is valid on the following clock cycle. An active SYNC coinciding with the last data point thus indicates back-to-back transforms. A SYNC_IN strobe active while the core is already inputing data is ignored. Tying SYNC_IN active will signal the core to perform continuous transforms, and SYNC_OUT will strobe as normal to frame the output data. Figure 2: Interface Input Timing, 1K-Length Back-to-Back Transforms Figure 3: Interface Output Timing, 1K-Length Back-to-Back Transforms The DIR and SEL configuration inputs are by default selectable per-transform, but must be stable starting with SYNC_IN active and must not be changed until the transformed data has been completely emptied from the core (i.e. 2 m clocks after the corresponding SYNC_OUT). The user design must wait until INIT_DONE is active before inputing data. Core Assumptions Following SYNC_IN, the initial transform has a start-up latency dependent on the three transpose operations and the FFT pipeline latencies for the length of the transform. The core provides continuous processing at steady state, though the SYNC IN to OUT latencies may vary slightly due to internal pipeline alignment and memory interface interruptions such as refresh. Verification Methods The core is verified to be bit-accurate with the C/C++ data model under all supported lengths, modes, throughputs and data format, using a rigorous simulation suite of directed and random data. Our model development is evaluated in terms of SQNR with a double-precision floating point software FFT implementation. Dillon Engineering's FFT IP Cores have been proven over the years in many Xilinx designs. November 3, 2008 5

Ordering Information This product is available directly from Dillon Engineering, Inc. Please contact Dillon Engineering for pricing and additional information about this product using the contact information on the front page of this datasheet. Visit www.dilloneng.com/fft_ip to see all of Dillon Engineering's FFT IP offerings, including: Pipelined FFTs (single point per clock cycle, fixed or floating point) Parallel Butterfly FFTs (continuous FFTs at multiple points per clock cycle) Full Parallel FFTs (extremely fast rates, up to 25GSamples/sec) 2D FFTs (Two-dimensional transform for image processing) Mixed Radix FFTs (for non-power of 2 FFT lengths) Related Information Xilinx Programmable Logic For information on Xilinx programmable logic or development system software, contact your local Xilinx sales office, or: Xilinx, Inc. 2100 Logic Drive San Jose, CA 95124 Phone: +1 408-559-7778 Fax: +1 408-559-7114 URL: www.xilinx.com November 3, 2008 6