FPGA Solutions: Modular Architecture for Peak Performance

Real Time & Embedded Computing Conference
Houston, TX
June 17, 2004

Andy Reddig
President & CTO
andyr@tekmicro.com

Agenda

Company Overview
FPGA technology trends
- Xilinx Virtex II Pro (RocketIO ports, embedded 405 PPC)
- Hardware/software co-design toolsets
- IP cores
FPGA system-level integration
- tekconnect IP-to-IP interconnect
- tekx software environment
FPGA-based products
- PMC I/O Modules
- VITA 42 XMC I/O Modules (PCI Express, Serial RapidIO)
- 6U VME / RACE++ carriers
- 6U VITA 41 VXS carriers (PCI Express, Serial RapidIO)
FPGA system examples

TEK Microsystems At A Glance

Business Facts and Figures:
- Founded in 1981
- Privately held
- 20 employees
- $4.4M sales (2003), CAGR > 30% from 1999 to 2003
Early adopter of fabric-based technology
- First RACEway product in 1996
- Industry's first PPC/PMC carrier for RACE++ in 1999
- Broadest range of I/O solutions for RACE++ in 2001
FPGA technology for reconfigurable computing
- Over 30 I/O modules using FPGAs for customization since 1998
- Modular hardware / software / IP architecture
Leadership position in open standards development
- VITA / VSO, RapidIO, PICMG
- Draft editor for VITA 42 XMC, Co-chair PICMG XMC Express
Leveraging I/O and fabric expertise into heterogeneous signal processing technology (FPGA, FPOA, PPC)

FPGA Technology Trends
Xilinx Virtex II Pro Family

Embedded 405 PowerPC processors
- Limited usefulness for general purpose processing
- Good fit for stream management and control
Gate density up to 12.5M gates
- Tends to make designs power-limited instead of space-limited
- Power and thermal requirements driven by IP functionality
Integrated RocketIO SerDes ports
- Supports Serial RapidIO, PCI Express, other interconnects through IP changes
- Enables fabric-agnostic endpoints, processors
Active and growing ecosystem
- Hardware / software co-design toolsets
- IP cores for signal processing

FPGA vs PowerPC Toolsets

                                         PowerPC             FPGA
Components                               AltiVec, IBM 970    Xilinx V2Pro
Development Language                     C, C++              VHDL, Verilog
Source-Level Development Tools           Compilers           Synthesis, Place & Route
Optimized Signal Processing Functions    Libraries, VSIPL    Cores

FPGA Toolsets

Traditional FPGA design
- Xilinx Foundation ISE, Synplicity Synplify Pro for synthesis
- Xilinx Foundation ISE for place & route
- ModelSim for simulation
FPGA debug support
- Xilinx ChipScope logic analyzer through JTAG interface
- Hardware-in-the-loop verification tools
C-to-VHDL translation tools
Graphical high-level design tools
IP core vendors
No one toolset fits all customers / applications. Our approach is to enable and validate toolsets and offer the full range of options to users

FPGA Turnkey Solutions

Some applications do not need custom IP development
If 80% of the application workload is 512K FFTs (for example), a turnkey solution offers quick time-to-market with low development cost / risk
Tekmicro offers pre-integrated IP cores as bitstream solutions; no FPGA coding is required
Turnkey solutions have the same software API support as customer-developed IP, so an initial prototype built on a turnkey solution can be upgraded to a custom-tailored solution when available
Our focus is on integration of cores, not development of cores, with cores selected based on customer demand

tekconnect Interface

[Diagram: IP core with two ports, each carrying DATA + TAG, CONTROL and STATUS]

IP-to-IP interconnect
Used on-chip and chip-to-chip
Simple streaming interface
32 or 64 data bits
4 tag bits
- Frame marks
- Split transaction control
- Event notification
- Control / status registers
Unidirectional or bidirectional
Supports data-only or address/route/data functionality
Supports master or slave semantics
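The slide gives only the field widths of a tekconnect word (32 or 64 data bits plus 4 tag bits) and the roles the tag plays. As a rough sketch, a word can be modeled as a data field with a 4-bit tag attached. The tag constants and the choice to place the tag in the top bits are assumptions made for illustration; the actual bit assignments are not given in the presentation.

```python
# Hypothetical tag encoding; the slides give only the tag width (4 bits)
# and the roles of the tag, not the actual bit assignments.
TAG_START_OF_FRAME = 0x1
TAG_END_OF_FRAME = 0x2
TAG_EVENT = 0x4
TAG_REGISTER = 0x8


def pack_word(data, tag, data_bits=64):
    """Pack a data value and a 4-bit tag into one integer (tag in the top bits)."""
    if data_bits not in (32, 64):
        raise ValueError("data width must be 32 or 64 bits")
    if not 0 <= data < (1 << data_bits):
        raise ValueError("data does not fit in the data field")
    if not 0 <= tag < 16:
        raise ValueError("tag must fit in 4 bits")
    return (tag << data_bits) | data


def unpack_word(word, data_bits=64):
    """Split a packed word back into (data, tag)."""
    mask = (1 << data_bits) - 1
    return word & mask, word >> data_bits


word = pack_word(0xDEADBEEF, TAG_START_OF_FRAME, data_bits=32)
data, tag = unpack_word(word, data_bits=32)
```

Carrying the tag alongside every data word lets a downstream core react to frame boundaries or register accesses without a separate control channel.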

tekconnect Integration (FFT core)

[Diagram: tekconnect wrapper containing FFT, Registers, and Framing & flow control blocks]

tekconnect wrapper around off-the-shelf FFT IP core
Frame marks used to start FFT processing
Register interface used for FFT core configuration
- Static or dynamic
Abstracts interface to FFT core
Allows easy pipelining of IP cores
Supports insertion of improved cores without impacting application software or other FPGA IP
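The wrapper idea can be illustrated in software: samples are buffered until a frame mark arrives, and only then is the wrapped core invoked. This is a behavioral sketch, not the FPGA implementation. The class name `FrameTriggeredStage` and the naive DFT standing in for the FFT core are assumptions for illustration.

```python
import cmath


def dft(samples):
    """Naive O(n^2) DFT, standing in for the off-the-shelf FFT core."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


class FrameTriggeredStage:
    """Buffers (sample, end_of_frame) words and runs the core once per frame."""

    def __init__(self, core):
        self.core = core      # any callable taking a list of samples
        self.buffer = []

    def push(self, sample, end_of_frame=False):
        """Accept one word; return the core output when a frame completes."""
        self.buffer.append(sample)
        if not end_of_frame:
            return None
        frame, self.buffer = self.buffer, []
        return self.core(frame)


stage = FrameTriggeredStage(dft)
outputs = [stage.push(s, end_of_frame=(i == 3)) for i, s in enumerate([1, 0, 0, 0])]
spectrum = outputs[-1]
```

Because the stage depends only on a callable, a different core can be substituted without changing the code that pushes samples, which mirrors the slide's point about inserting improved cores.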

tekconnect Integration (Fabric Interface)

[Diagram: tekconnect wrapper containing DMA Engine, microkernel Firmware, PCI Express Core and 405 PPC blocks]

tekconnect wrapper around interface to off-chip fabric
Uses embedded 405 PPC for intelligent stream management protocol
Frame marks used to control DMA packet boundaries
Head-of-frame data optionally controls DMA packet chain selection, allowing data-driven dispatch
Abstracts interface to bus / fabric
Supports fabric-agnostic FPGA designs
- PCI, PCI-X
- RACE++
- StarFabric
- PCI Express
- Serial RapidIO
Supports migration of FPGA design to different platforms / fabrics without impacting application software or other FPGA IP
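Data-driven dispatch, as described above, can be sketched as two steps: cut the stream into frames at each frame mark, then use the first word of each frame to choose a DMA chain. The header values, chain names, and function names below are hypothetical and serve only to illustrate the mechanism.

```python
def split_frames(words):
    """Group (data, end_of_frame) words into frames at each frame mark."""
    frames, current = [], []
    for data, end_of_frame in words:
        current.append(data)
        if end_of_frame:
            frames.append(current)
            current = []
    return frames


def dispatch(frames, chains, default="default"):
    """Route each frame to a DMA chain chosen by its head-of-frame word."""
    routed = {}
    for frame in frames:
        chain = chains.get(frame[0], default)
        routed.setdefault(chain, []).append(frame)
    return routed


# Hypothetical header values and chain names, for illustration only.
chains = {0xA: "chain_fft", 0xB: "chain_raw"}
words = [(0xA, False), (1, False), (2, True), (0xB, False), (3, True), (0xC, True)]
routed = dispatch(split_frames(words), chains)
```

Frames whose header is not in the table fall through to a default chain, so unknown traffic is still delivered somewhere predictable.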

tekx Software Environment

Fabric configuration (auto-discovery when possible)
Name server for object lookup from any node
- Distributed object database (low latency)
- Static vs. dynamic object management
Global Shared Memory ("SMB") for shared data
Data transfer library ("DX") for scheduled DMA operations
Interprocessor communications primitives
- Semaphore
- Message queue (1-to-1 and N-to-1)
- Socket
Fabric agnostic: RACE++, StarFabric, Serial RapidIO, PCI Express, Advanced Switching
OS independent: VxWorks, Linux, MCOE
Uses native OS development toolchain
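The N-to-1 message queue listed among the primitives can be illustrated with standard-library threads: several producer nodes post to one mailbox that a single consumer drains. This uses Python's `queue.Queue` as a stand-in and says nothing about how tekx implements the primitive across a fabric. The node ids and message format are assumptions.

```python
import queue
import threading


def producer(node_id, mailbox, count):
    """Each producer node posts `count` messages tagged with its node id."""
    for seq in range(count):
        mailbox.put((node_id, seq))


mailbox = queue.Queue()   # single consumer endpoint shared by all producers
threads = [threading.Thread(target=producer, args=(n, mailbox, 3)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Single consumer drains the mailbox after all producers have finished.
received = []
while not mailbox.empty():
    received.append(mailbox.get())
```

Messages from different nodes may interleave in any order, but each node's own messages arrive in the order they were sent.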

tekx Software Architecture

Client-server model for intelligent stream management
Each fabric node executes a common server protocol and provides a uniform messaging interface to client nodes
Fabric nodes include:
- Traditional processor nodes
- FPGA-based adjunct processing nodes (embedded 405GP in Xilinx Platform Pro FPGAs)
- PMC / XMC based I/O processing nodes (embedded 405GP in Xilinx Platform Pro FPGA on PMC or XMC)
Architecture provides a standard interface to a wide range of I/O and processing devices accessible to heterogeneous clients
Fully interoperable I/O and FPGA solutions with Mercury MCOE-based PowerPC processing

tekx Software FPGA Drivers

tekx includes integrated driver support for tekconnect-based endpoints for PCI, PCI-X, RACE++, Serial RapidIO and PCI Express
User API calls simply request data transfer between endpoints
Endpoints can be I/O streams (i.e., PMC / XMC modules), FPGA streams or memory buffers on PowerPC processing nodes
tekx abstracts the management of the DMA controllers, using the appropriate hardware resources to push data efficiently through the fabric
Address, routing and flow control are managed under the covers
Grouped, looped and adaptive transfers are supported
Notification can use polling (spin-lock) or blocking semantics
The combination of tekx and tekconnect supports co-development of application software and customized FPGA IP that can easily be moved to future products without redesign or modification
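The two notification styles can be sketched with a transfer handle that supports both a non-blocking poll and a blocking wait. This is not the tekx API, which the slides do not show. `Transfer`, `start_transfer`, and the worker thread standing in for a DMA controller are all hypothetical.

```python
import threading


class Transfer:
    """Handle for one in-flight transfer (hypothetical; not the tekx API)."""

    def __init__(self):
        self._done = threading.Event()

    def complete(self):
        self._done.set()

    def poll(self):
        """Non-blocking check, as a spin-loop would call it."""
        return self._done.is_set()

    def wait(self, timeout=None):
        """Blocking wait; returns True once the transfer has completed."""
        return self._done.wait(timeout)


def start_transfer(source, destination):
    """Copy `source` into `destination` on a worker thread, standing in for DMA."""
    handle = Transfer()

    def run():
        destination[: len(source)] = source
        handle.complete()

    threading.Thread(target=run).start()
    return handle


src = bytearray(b"frame-0")
dst = bytearray(len(src))
handle = start_transfer(src, dst)
finished = handle.wait(timeout=5.0)   # blocking style
```

A caller that cannot afford to sleep would instead loop on `handle.poll()`, trading CPU time for lower wake-up latency.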

Standard FPGA-based Products

PMC / XMC I/O modules
- Front end / back end architecture
- 32-bit and 64-bit PCI options
- FPGA formatting / processing engines
- VITA 42 XMC modules in development
- VxWorks drivers
PowerRACE: PMC / XMC carrier boards
- Onboard RACE++ fabric
- 6U VME form factor
- Dual PowerPC processors
- Dual FPGA processing engines
- Software drivers for VxWorks and MCOE
tekx Software Environment
tekconnect FPGA IP-to-IP interconnect
Linux support under development

PMC Module

PMC / XMC Front End Interfaces

HOTLink, HOTLink II (copper, fiber)    11 models
TAXI (copper, fiber)                   6 models
Front Panel Data Port (FPDP)           2 models
Parallel ECL, PECL, LVDS, 485          10 models
Serial ECL, PECL, 485                  3 models
Channel Link (Serial LVDS)             4 models
Digital Video (125, 244, 259)          2 models
DFLEX64, FlexIO                        Customizable platform
FPDP II, Camera Link                   In development

PMC / XMC Back End Interfaces

Module       PMC64          PMC64X             XMC.2                   XMC.3
Interface    PCI 64/33      PCI 64/66          VITA 42.2 Serial RIO    VITA 42.3 PCI Express
FPGA         Altera 1K100   Xilinx VP30        Xilinx VP30             Xilinx VP30
Throughput   267 MB/s       533 MB/s           4x: 1.25 GB/s           4x: 1.0 GB/s
                                               8x: 2.5 GB/s            8x: 2.0 GB/s
Memory       1 MB           64 MB, 1.0 GB/s    256 MB, 2.0 GB/s        256 MB, 2.0 GB/s
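The throughput figures above are consistent with simple peak-rate arithmetic. The sketch below reproduces them under stated assumptions that do not appear on the slide: a 33.33 / 66.66 MHz PCI clock, 3.125 Gbaud per Serial RapidIO lane, 2.5 GT/s per PCI Express lane, and 8b/10b line coding on both serial links. The results are per-direction peak rates before protocol overhead.

```python
def parallel_bus_mb_per_s(width_bits, clock_mhz):
    """Peak rate of a parallel bus: bytes per clock times clocks per second."""
    return width_bits / 8 * clock_mhz


def serial_link_gb_per_s(lanes, gbaud_per_lane, encoding_efficiency=0.8):
    """Peak payload rate of a serial link; 8b/10b coding carries 8 data bits per 10."""
    return lanes * gbaud_per_lane * encoding_efficiency / 8


pci_64_33 = parallel_bus_mb_per_s(64, 33.33)    # about 267 MB/s
pci_64_66 = parallel_bus_mb_per_s(64, 66.66)    # about 533 MB/s
srio_4x = serial_link_gb_per_s(4, 3.125)        # about 1.25 GB/s
srio_8x = serial_link_gb_per_s(8, 3.125)        # about 2.5 GB/s
pcie_4x = serial_link_gb_per_s(4, 2.5)          # about 1.0 GB/s
pcie_8x = serial_link_gb_per_s(8, 2.5)          # about 2.0 GB/s
```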

PMC / XMC Back End (Serial RapidIO / PCI Express)

PowerRACE-3 I/O Processor Block Diagram

[Block diagram labels: Fabric I/F (x2), FPGA (100K)]

PowerRACE-3 I/O Processor

Two I/O controller nodes
- PowerPC CPU, memory, PCI bridge
- RACE++ fabric port
- PMC slot
Two Virtex II Pro (VP30) FPGA processing engines
Fully fabric enabled without using PMC slots
tekx software environment
Turnkey I/O solutions for a wide range of PMC modules
Turnkey FPGA IP solutions
FPGA Developers Kit for integration of user-developed IP

PowerRACE-3 FPGA Developers Kit

Xilinx Foundation toolset with ModelSim simulation
FPGA bitstream downloaded under software control from CPU
JTAG connector for ChipScope debug support
Minimal serial interface to CPU (can be used for UART-level debug)
FPGA IP provided for:
- Master/Slave RACE++ interface
- DDR memory interface
- QDR memory interface
- 405GP microkernel and message queue IPC
Common core-to-core interconnect using tekconnect v1.1
Sample application IP and test software provided
Streaming data IP interface supports recompilation of user IP for different PowerRACE products / future fabrics without modifications

PowerRACE-3 FPGA IP Cores
Adjunct Processing IP

Image processing:
- Non-uniformity correction
- Forward motion correction
- Convolution
- Compression / decompression
Small FFT (1K to 8K points)
- Optional (runtime) windowing
- Optional (runtime) fixed-to-float conversion
Large FFT (up to 512K points)
- Requires QDR SSRAM model
- 100 Msps throughput with full 36-bit internal precision
Digital filtering

FPGA Systems Example #1
Digital Receiver Front End

[Block diagram labels: PMC Module (Time domain processing); I/O Processor (FFT Core); PCI Interface; RACE++ Interface (x2)]

Input is Parallel ECL, 100 MB/s
Time domain processing performed in PMC FPGA
4K FFT performed on baseboard FPGA
Migration underway to move FFT to larger PMC FPGA and downstream processing into I/O Processor baseboard FPGA

FPGA Systems Example #2
Image Processing Front End

[Block diagram labels: PMC Module (Frame formatting); I/O Processor (Non-Uniformity Correction, Detection); PCI Interface; RACE++ Interface (x2)]

Input is Channel Link, 200 MB/s
Line / frame formatting performed in PMC FPGA
Non-Uniformity Correction
- 50 MB table memory in DDR
Detection
- Multi-line / multi-frame processing
- Uses other DDR page for buffering
Two VP30 FPGAs will replace six 7410 AltiVec processors
Board count reduced by 40% (5 to 3)
Next generation will use serial fabric, lowering cost further and adding capability
Reuse of FPGA IP and rapid prototyping critical to meet fast product cycle times

FPGA Systems Example #3
DF Processing Front End

[Block diagram labels: PMC Module (TD proc, formatting); I/O Processor (Windowing, 512K FFT); PCI Interface; RACE++ Interface (x2)]

Input is digital receiver data, 14 bit x 100 Msps, 200 MB/s
Time domain processing and sample formatting performed in PMC FPGA
Custom 512K FFT core
- 34 bit internal precision
- Proprietary windowing algorithm
- Uses three banks of QDR SSRAM
Replaces 12-16 AltiVec CPUs with four VP30 FPGAs
In development
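The quoted 200 MB/s input rate follows from the sample rate if each 14-bit sample is carried in a 16-bit container, which is an assumption here rather than something the slide states. The same arithmetic gives the time the core has to process one 512K-point frame, assuming one FFT point per input sample and no overlap between frames.

```python
SAMPLE_RATE_HZ = 100e6      # 100 Msps, from the slide
CONTAINER_BITS = 16         # assumption: each 14-bit sample occupies 16 bits
FFT_POINTS = 512 * 1024     # 512K-point FFT

# Input data rate with 16-bit containers, and with tightly packed 14-bit samples.
input_mb_per_s = SAMPLE_RATE_HZ * CONTAINER_BITS / 8 / 1e6
packed_mb_per_s = SAMPLE_RATE_HZ * 14 / 8 / 1e6

# Time to collect one full FFT frame, and the resulting frame rate.
frame_period_ms = FFT_POINTS / SAMPLE_RATE_HZ * 1e3
frames_per_second = SAMPLE_RATE_HZ / FFT_POINTS
```

Under these assumptions the input rate is 200 MB/s (175 MB/s if the samples were packed), and a new frame arrives roughly every 5.24 ms, so the core must sustain about 190 transforms per second to keep up.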

Summary

PowerRACE-3 is our first FPGA-based I/O processing baseboard
- Available now
- Targeted at legacy RACE++ systems for technology refresh
- Limited by RACE++ to 267 MB/s per fabric port
- FPGA processing is I/O limited for many applications
PowerFLEX-4 (3Q04) will break the throughput bottleneck
- Open standards-based solution (I/O and backplane)
- 2.5 GB/s bandwidth to/from each XMC module
- 2.5 GB/s bandwidth to/from the backplane
- Same tekconnect and tekx architecture
Our modular approach to FPGA solutions allows applications to be prototyped today using RACE++ and migrated to future switched fabric interconnects
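The I/O-limited claim can be made concrete by comparing the input rates quoted in the three system examples with the two link budgets in the summary. The figures are taken from the slides; treating them as peak rates with no protocol overhead is a simplifying assumption, so the results are rough upper bounds.

```python
RACE_PORT_MB_PER_S = 267.0       # per fabric port, from the summary slide
XMC_LINK_MB_PER_S = 2500.0       # 2.5 GB/s per XMC module, from the summary slide

# Input rates quoted in the three system examples (MB/s).
example_inputs = {"example_1": 100.0, "example_2": 200.0, "example_3": 200.0}

# Ratio of the new link budget to one RACE++ port.
speedup = XMC_LINK_MB_PER_S / RACE_PORT_MB_PER_S

# Fraction of one RACE++ port consumed by each example's input stream.
utilization = {name: rate / RACE_PORT_MB_PER_S for name, rate in example_inputs.items()}

# How many such input streams would fit in one 2.5 GB/s link.
streams_per_xmc_link = {
    name: int(XMC_LINK_MB_PER_S // rate) for name, rate in example_inputs.items()
}
```

A single 200 MB/s input already uses about three quarters of a RACE++ port, while a 2.5 GB/s link offers roughly nine times the bandwidth and room for about a dozen such streams.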