FPGA Solutions: Modular Architecture for Peak Performance
Real Time & Embedded Computing Conference, Houston, TX
June 17, 2004
Andy Reddig, President & CTO
andyr@tekmicro.com

Agenda
Company Overview
FPGA technology trends
  - Xilinx Virtex II Pro (RocketIO ports, embedded 405 PPC)
  - Hardware/software co-design toolsets
  - IP cores
FPGA system-level integration
  - tekconnect IP-to-IP interconnect
  - tekx software environment
FPGA-based products
  - PMC I/O Modules
  - VITA 42 XMC I/O Modules (PCI Express, Serial RapidIO)
  - 6U VME / RACE++ carriers
  - 6U VITA 41 VXS carriers (PCI Express, Serial RapidIO)
FPGA system examples

TEK Microsystems At A Glance
Business Facts and Figures:
  - Founded in 1981
  - Privately held
  - 20 employees
  - $4.4M sales (2003), CAGR > 30% from 1999 to 2003
Early adopter of fabric-based technology
  - First RACEway product in 1996
  - Industry's first PPC/PMC carrier for RACE++ in 1999
  - Broadest range of I/O solutions for RACE++ in 2001
FPGA technology for reconfigurable computing
  - Over 30 I/O modules using FPGAs for customization since 1998
  - Modular hardware / software / IP architecture
Leadership position in open standards development
  - VITA / VSO, RapidIO, PICMG
  - Draft editor for VITA 42 XMC, Co-chair PICMG XMC Express
Leveraging I/O and fabric expertise into heterogeneous signal processing technology (FPGA, FPOA, PPC)

FPGA Technology Trends: Xilinx Virtex II Pro Family
Embedded 405 PowerPC processors
  - Limited usefulness for general purpose processing
  - Good fit for stream management and control
Gate density up to 12.5M gates
  - Tends to make designs power-limited instead of space-limited
  - Power and thermal requirements driven by IP functionality
Integrated RocketIO SerDes ports
  - Supports Serial RapidIO, PCI Express, other interconnects through IP changes
  - Enables fabric-agnostic endpoints, processors
Active and growing ecosystem
  - Hardware / software co-design toolsets
  - IP cores for signal processing

FPGA vs PowerPC Toolsets
                                         PowerPC             FPGA
Components                               AltiVec, IBM 970    Xilinx V2Pro
Development Language                     C, C++              VHDL, Verilog
Source-Level Development Tools           Compilers           Synthesis, Place & Route
Optimized Signal Processing Functions    Libraries, VSIPL    Cores

FPGA Toolsets
Traditional FPGA design
  - Xilinx Foundation ISE, Synplicity Synplify Pro for synthesis
  - Xilinx Foundation ISE for place & route
  - ModelSim for simulation
FPGA debug support
  - Xilinx ChipScope logic analyzer through JTAG interface
  - Hardware-in-the-loop verification tools
C-to-VHDL translation tools
Graphical high-level design tools
IP core vendors
No one toolset fits all customers / applications; our approach is to enable and validate toolsets and offer the full range of options to users

FPGA Turnkey Solutions
Some applications do not need custom IP development
If 80% of the application workload is 512K FFTs (for example), a turnkey solution offers quick time-to-market with low development cost / risk
Tekmicro offers pre-integrated IP cores as bitstream solutions; no FPGA coding required
Turnkey solutions have the same software API support as customer-developed IP, so a prototype built on a turnkey solution can be upgraded to a custom-tailored solution when one becomes available
Our focus is on integration of cores, not development of cores, with cores selected based on customer demand

tekconnect Interface
[Diagram labels: IP CORE; DATA + TAG, CONTROL, STATUS (shown twice)]
IP-to-IP interconnect
Used on-chip and chip-to-chip
Simple streaming interface
32 or 64 data bits
4 tag bits
  - Frame marks
  - Split transaction control
  - Event notification
  - Control / status registers
Unidirectional or bidirectional
Supports data-only or address/route/data functionality
Supports master or slave semantics

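To make the data-plus-tag idea concrete, the following is a minimal Python sketch of a tekconnect-style stream word. The slide gives only the widths (32 or 64 data bits, 4 tag bits) and the tag functions; the bit assignments, the `StreamWord` class, and the packing order below are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical tag bit assignments. The slide lists the tag functions
# but does not give their encoding.
TAG_FRAME_MARK = 0x1
TAG_SPLIT_TXN = 0x2
TAG_EVENT = 0x4
TAG_REGISTER = 0x8


@dataclass(frozen=True)
class StreamWord:
    """One beat on a tekconnect-style link: data plus a 4-bit tag."""

    data: int
    tag: int = 0
    width: int = 32  # 32 or 64 data bits, per the slide

    def __post_init__(self):
        if self.width not in (32, 64):
            raise ValueError("data width must be 32 or 64 bits")
        if not 0 <= self.data < (1 << self.width):
            raise ValueError("data does not fit in the data width")
        if not 0 <= self.tag < 16:
            raise ValueError("tag must fit in 4 bits")

    def pack(self) -> int:
        """Pack into one integer with the tag above the data bits (assumed order)."""
        return (self.tag << self.width) | self.data

    @classmethod
    def unpack(cls, raw: int, width: int = 32) -> "StreamWord":
        """Inverse of pack() for a word of the given data width."""
        return cls(data=raw & ((1 << width) - 1), tag=raw >> width, width=width)

    @property
    def is_frame_mark(self) -> bool:
        return bool(self.tag & TAG_FRAME_MARK)
```

Because the tag travels with every data word, a downstream core can detect frame boundaries or register accesses without a separate side channel, which is what keeps the streaming interface simple.
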
tekconnect Integration (FFT core)
[Diagram labels: tekconnect wrapper; FFT; Registers; Framing & flow control]
tekconnect wrapper around off-the-shelf FFT IP core
Frame marks used to start FFT processing
Register interface used for FFT core configuration
  - Static or dynamic
Abstracts interface to FFT core
Allows easy pipelining of IP cores
Supports insertion of improved cores without impacting application software or other FPGA IP

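The wrapper behaviour described on this slide can be modelled in software. The sketch below assumes the frame mark accompanies the last sample of a frame and uses a naive DFT as a stand-in for the off-the-shelf core; the `FftWrapper` class, its `scale` register, and the method names are hypothetical and do not come from the slides.

```python
import cmath


def dft(samples):
    """Naive O(N^2) DFT standing in for the off-the-shelf FFT core."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


class FftWrapper:
    """Streaming wrapper: buffers samples and runs the core on a frame mark."""

    def __init__(self, core=dft):
        self.core = core    # swappable, like inserting an improved core
        self.scale = 1.0    # stands in for one configuration register
        self._frame = []

    def write_register(self, name, value):
        """Register interface; only the hypothetical 'scale' register exists."""
        if name != "scale":
            raise KeyError(name)
        self.scale = value

    def push(self, sample, frame_mark=False):
        """Accept one sample; return a spectrum when the frame mark arrives."""
        self._frame.append(sample)
        if not frame_mark:
            return None
        frame, self._frame = self._frame, []
        return [self.scale * v for v in self.core(frame)]
```

Callers see only `push` and `write_register`, so replacing `dft` with a faster transform changes nothing on the application side. That is the abstraction the slide claims for the hardware wrapper.
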
tekconnect Integration (Fabric Interface)
[Diagram labels: tekconnect wrapper; DMA Engine; microkernel firmware; PCI Express Core; 405 PPC]
tekconnect wrapper around interface to off-chip fabric
Uses embedded 405 PPC for intelligent stream management protocol
Frame marks used to control DMA packet boundaries
Head of frame data optionally controls DMA packet chain selection, allowing data-driven dispatch
Abstracts interface to bus / fabric
Supports fabric-agnostic FPGA designs
  - PCI, PCI-X
  - RACE++
  - StarFabric
  - PCI Express
  - Serial RapidIO
Supports migration of FPGA design to different platforms / fabrics without impacting application software or other FPGA IP

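Data-driven dispatch, as described above, can be illustrated with a small software model. The sketch below assumes that a frame mark closes the current DMA packet and that the first word of each frame is the selector for the DMA chain; the `DmaDispatcher` class and the chain names are hypothetical.

```python
from collections import defaultdict


class DmaDispatcher:
    """Routes each frame to a DMA chain chosen by its head-of-frame word."""

    def __init__(self, chains, default_chain):
        self.chains = dict(chains)            # selector value -> chain name
        self.default_chain = default_chain    # used when no selector matches
        self.delivered = defaultdict(list)    # chain name -> list of frames
        self._current = []

    def push(self, word, frame_mark=False):
        """Accept one word; a frame mark closes the current DMA packet."""
        self._current.append(word)
        if not frame_mark:
            return None
        frame, self._current = self._current, []
        chain = self.chains.get(frame[0], self.default_chain)
        self.delivered[chain].append(frame)
        return chain
```

A sender can then steer a frame to a particular destination simply by changing its first word, with no software intervention per frame.
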
tekx Software Environment
Fabric configuration (auto-discovery when possible)
Name server for object lookup from any node
  - Distributed object database (low latency)
  - Static vs. dynamic object management
Global Shared Memory (SMB) for shared data
Data transfer library (DX) for scheduled DMA operations
Interprocessor communications primitives
  - Semaphore
  - Message queue (1-to-1 and N-to-1)
  - Socket
Fabric agnostic: RACE++, StarFabric, Serial RapidIO, PCI Express, Advanced Switching
OS independent: VxWorks, Linux, MCOE
Uses native OS development toolchain

tekx Software Architecture
Client-server model for intelligent stream management
Each fabric node executes a common server protocol and provides a uniform messaging interface to client nodes
Fabric nodes include:
  - Traditional processor nodes
  - FPGA-based adjunct processing nodes (embedded 405GP in Xilinx Platform Pro FPGAs)
  - PMC / XMC-based I/O processing nodes (embedded 405GP in Xilinx Platform Pro FPGA on PMC or XMC)
Architecture provides a standard interface to a wide range of I/O and processing devices accessible to heterogeneous clients
Fully interoperable I/O and FPGA solutions with Mercury MCOE-based PowerPC processing

tekx Software FPGA Drivers
tekx includes integrated driver support for tekconnect-based endpoints for PCI, PCI-X, RACE++, Serial RapidIO and PCI Express
User API calls simply request data transfer between endpoints
Endpoints can be I/O streams (e.g. PMC / XMC modules), FPGA streams or memory buffers on PowerPC processing nodes
tekx abstracts the management of the DMA controllers, using the appropriate hardware resources to push data efficiently through the fabric
Address, routing and flow control are managed under the covers
Grouped, looped and adaptive transfers are supported
Notification can use polling (spin-lock) or blocking semantics
The combination of tekx and tekconnect supports co-development of application software and customized FPGA IP that can easily be moved to future products without redesign or modification

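The two notification styles mentioned above, polling and blocking, can be sketched with a generic transfer handle. This is not the tekx API, which the slides do not show; `request_transfer`, `Transfer`, `poll`, and `wait` are hypothetical names, and a Python thread stands in for the DMA engine.

```python
import threading


class Transfer:
    """Handle for one in-flight transfer between two endpoints."""

    def __init__(self, source, sink):
        self._done = threading.Event()
        self._worker = threading.Thread(target=self._run, args=(source, sink))
        self._worker.start()

    def _run(self, source, sink):
        sink.extend(source)   # stands in for the DMA engine moving data
        self._done.set()

    def poll(self):
        """Non-blocking completion check, for polling-style notification."""
        return self._done.is_set()

    def wait(self, timeout=None):
        """Blocking notification; returns True once the transfer completes."""
        return self._done.wait(timeout)


def request_transfer(source, sink):
    """Hypothetical user call: name the endpoints, get a handle back."""
    return Transfer(source, sink)
```

A caller that cannot afford to sleep would loop on `poll()`, while a caller with other threads to run would use `wait()`. In both cases the caller names only the endpoints and never touches addressing, routing, or flow control.
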
Standard FPGA-based Products
PMC / XMC I/O modules
  - Front end / back end architecture
  - 32-bit and 64-bit PCI options
  - FPGA formatting / processing engines
  - VITA 42 XMC modules in development
  - VxWorks drivers
PowerRACE: PMC / XMC carrier boards
  - Onboard RACE++ fabric
  - 6U VME form factor
  - Dual PowerPC processors
  - Dual FPGA processing engines
  - Software drivers for VxWorks and MCOE
tekx Software Environment
tekconnect FPGA IP-to-IP interconnect
Linux support under development

PMC Module
PMC / XMC Front End Interfaces
HOTLink, HOTLink II (copper, fiber)    11 models
TAXI (copper, fiber)                   6 models
Front Panel Data Port (FPDP)           2 models
Parallel ECL, PECL, LVDS, 485          10 models
Serial ECL, PECL, 485                  3 models
Channel Link (Serial LVDS)             4 models
Digital Video (125, 244, 259)          2 models
DFLEX64, FlexIO                        Customizable platform
FPDP II, Camera Link                   in development

PMC / XMC Back End Interfaces
Module        PMC64           PMC64X             XMC.2                   XMC.3
Interface     PCI 64/33       PCI 64/66          VITA 42.2 Serial RIO    VITA 42.3 PCI Express
FPGA          Altera 1K100    Xilinx VP30        Xilinx VP30             Xilinx VP30
Throughput    267 MB/s        533 MB/s           4x: 1.25 GB/s           4x: 1.0 GB/s
                                                 8x: 2.5 GB/s            8x: 2.0 GB/s
Memory        1 MB            64 MB, 1.0 GB/s    256 MB, 2.0 GB/s        256 MB, 2.0 GB/s

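The throughput figures in this table follow from simple peak-rate arithmetic, sketched below. The slide does not state lane rates or coding overhead; the code assumes 3.125 Gbaud per lane for Serial RapidIO, 2.5 Gbaud per lane for PCI Express, and 8b/10b line coding for both, which are values consistent with the numbers shown. The function names are illustrative.

```python
def parallel_bus_mb_s(width_bits, clock_mhz):
    """Peak rate of a parallel bus in MB/s, one transfer per clock."""
    return width_bits / 8 * clock_mhz


def serial_link_gb_s(lanes, gbaud_per_lane, coding_efficiency=0.8):
    """Peak payload rate of a serial link in GB/s, per direction.

    coding_efficiency=0.8 models 8b/10b line coding (an assumption here).
    """
    return lanes * gbaud_per_lane * coding_efficiency / 8


if __name__ == "__main__":
    print(f"PCI 64/33:         {parallel_bus_mb_s(64, 100 / 3):.0f} MB/s")
    print(f"PCI 64/66:         {parallel_bus_mb_s(64, 200 / 3):.0f} MB/s")
    for lanes in (4, 8):
        print(f"Serial RapidIO {lanes}x: {serial_link_gb_s(lanes, 3.125):.2f} GB/s")
        print(f"PCI Express {lanes}x:    {serial_link_gb_s(lanes, 2.5):.2f} GB/s")
```

These are peak figures. Sustained throughput on a real link is lower once protocol headers and flow control are accounted for.
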
PMC / XMC Back End (Serial RapidIO / PCI Express)
PowerRACE-3 I/O Processor Block Diagram
[Diagram labels: Fabric I/F (shown twice); FPGA (100K)]

PowerRACE-3 I/O Processor
Two I/O controller nodes
  - PowerPC CPU, memory, PCI bridge
  - RACE++ fabric port
  - PMC slot
Two Virtex II Pro (VP30) FPGA processing engines
Fully fabric enabled without using PMC slots
tekx software environment
Turnkey I/O solutions for a wide range of PMC modules
Turnkey FPGA IP solutions
FPGA Developers Kit for integration of user-developed IP

PowerRACE-3 FPGA Developers Kit
Xilinx Foundation toolset with ModelSim simulation
FPGA bitstream downloaded under software control from CPU
JTAG connector for ChipScope debug support
Minimal serial interface to CPU (can be used for UART-level debug)
FPGA IP provided for:
  - Master/Slave RACE++ interface
  - DDR memory interface
  - QDR memory interface
  - 405GP microkernel and message queue IPC
Common core-to-core interconnect using tekconnect v1.1
Sample application IP and test software provided
Streaming data IP interface supports recompilation of user IP for different PowerRACE products / future fabrics without modifications

PowerRACE-3 FPGA Developers Kit
PowerRACE-3 FPGA IP Cores
Adjunct Processing IP
Image processing:
  - Non-uniformity correction
  - Forward motion correction
  - Convolution
  - Compression / decompression
Small FFT (1K to 8K points)
  - Optional (runtime) windowing
  - Optional (runtime) fixed-to-float conversion
Large FFT (up to 512K points)
  - Requires QDR SSRAM model
  - 100 Msps throughput with full 36-bit internal precision
Digital filtering

FPGA Systems Example #1
Digital Receiver Front End
[Diagram labels: PMC Module (Time domain processing); I/O Processor (FFT Core); PCI Interface; RACE++ Interface (shown twice)]
Input is Parallel ECL, 100 MB/s
Time domain processing performed in PMC FPGA
4K FFT performed on baseboard FPGA
Migration underway to move FFT to larger PMC FPGA and downstream processing into I/O Processor baseboard FPGA

FPGA Systems Example #2
Image Processing Front End
[Diagram labels: PMC Module (Frame formatting); I/O Processor (Non-Uniformity Correction, Detection); PCI Interface; RACE++ Interface (shown twice)]
Input is Channel Link, 200 MB/s
Line / frame formatting performed in PMC FPGA
Non-Uniformity Correction
  - 50 MB table memory in DDR
Detection
  - Multi-line / multi-frame processing
  - Uses other DDR page for buffering
Two VP30 FPGAs will replace six 7410 AltiVec processors
Board count reduced by 40% (5 to 3)
Next generation will use serial fabric, lowering cost further and adding capability
Reuse of FPGA IP and rapid prototyping critical to meet fast product cycle times

FPGA Systems Example #3
DF Processing Front End
[Diagram labels: PMC Module (TD proc, formatting); I/O Processor (Windowing, 512K FFT); PCI Interface; RACE++ Interface (shown twice)]
Input is digital receiver data, 14 bit x 100 Msps, 200 MB/s
Time domain processing and sample formatting performed in PMC FPGA
Custom 512K FFT core
  - 34 bit internal precision
  - Proprietary windowing algorithm
  - Uses three banks of QDR SSRAM
Replaces 12-16 AltiVec CPUs with four VP30 FPGAs
In development

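The 200 MB/s input figure on this slide is consistent with each 14-bit sample being carried in a 16-bit word, which is an assumption here since the slide does not state the sample container size. The short sketch below works through that arithmetic and also estimates how long one 512K-point frame takes to arrive at 100 Msps; the function names are illustrative.

```python
def stream_rate_mb_s(sample_bits, msps, container_bits=None):
    """Data rate in MB/s for a sample stream.

    container_bits is the storage width per sample; it defaults to
    sample_bits, i.e. tightly packed samples.
    """
    if container_bits is None:
        container_bits = sample_bits
    if container_bits < sample_bits:
        raise ValueError("container is narrower than the sample")
    return container_bits / 8 * msps


def frame_time_ms(points, msps):
    """Time in milliseconds to receive one frame of `points` samples."""
    return points / (msps * 1e6) * 1e3


if __name__ == "__main__":
    print(stream_rate_mb_s(14, 100, container_bits=16), "MB/s in 16-bit words")
    print(stream_rate_mb_s(14, 100), "MB/s if tightly packed")
    print(round(frame_time_ms(512 * 1024, 100), 2), "ms per 512K-point frame")
```

At 200 MB/s the input stays below the 267 MB/s peak of a 64-bit, 33 MHz PCI interface, leaving limited headroom for anything else sharing that path.
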
Summary
PowerRACE-3 is our first FPGA-based I/O processing baseboard
  - Available now
  - Targeted at legacy RACE++ systems for technology refresh
  - Limited by RACE++ to 267 MB/s per fabric port
  - FPGA processing is I/O limited for many applications
PowerFLEX-4 (3Q04) will break the throughput bottleneck
  - Open standards-based solution (I/O and backplane)
  - 2.5 GB/s bandwidth to/from each XMC module
  - 2.5 GB/s bandwidth to/from the backplane
  - Same tekconnect and tekx architecture
Our modular approach to FPGA solutions allows applications to be prototyped today using RACE++ and migrate to future switched fabric interconnects