Peter Alfke, Xilinx, Inc. Hot Chips 20, August 2008. Virtex-5 FXT: A New FPGA Platform, plus a Look into the Future

Transcription:

Peter Alfke, Xilinx, Inc.
Hot Chips 20, August 2008
Virtex-5 FXT: A New FPGA Platform, plus a Look into the Future

FPGA Evolution
Moore's Law: double density every other year
New process technology, smaller geometry, and always a new FPGA family
Traditionally: bigger + faster + cheaper = better
Now: more functionality, more hard cores, lower cost per function, better software
But: raw speed does not come for free anymore, and leakage current is our worst enemy
HOT CHIPS 20 P.A. 2

Virtex Evolution
[Figure: timeline from 1998 to 2009 showing five generations of Virtex families by process node: Virtex (220 nm), Virtex-E (180 nm), Virtex-II (150 nm), Virtex-II Pro (130 nm), Virtex-4 (90 nm), Virtex-5 (65 nm), with a 40-nm node also marked]

Virtex-5: The Gods Were Smiling
The process was available early, and yields well
Features were aggressive, but well-defined
Circuit design was solid
Testing was well-planned and executed
Rapid introduction of 5 sub-families (Platforms)
  LX, LXT, SXT, FXT and TXT (26 devices total)
There was (and still is) no real competition
  > 90% share of market (> $50 M in first half of 2008)
18 different devices in volume production now

Virtex-5 Common Features
6-LUT + Express Fabric
36Kb Dual-Port Block RAM / FIFO with ECC
SelectIO with IDELAY/ODELAY and SerDes
25x18 DSP Slices
550 MHz Clock Management
Advanced Configuration Options
Integrated System Monitor A/D Converter
10/100/1000 Mbps Ethernet MAC
PCI-Express Endpoint Blocks
PowerPC440 Processors
GTX 6.5 Gbps Transceivers
GTP 3.75 Gbps Transceivers
Callouts on the slide: 30% higher performance, higher bandwidth, higher precision, precision + low jitter, saves logic, new capability

Virtex-5 FXT Additional Capabilities
One or two PPC440 hard microprocessor cores
  Faster and more efficient than PPC405: superscalar, larger caches, deeper instruction pipeline
  Integrated crossbar switch saves thousands of slices
5 family members, 3 in volume production by Sept. 2008

More Than Just a PPC440
5 x 2 Crossbar Connection (128-bit interfaces)
  Reduces need for external buses
  Allows two operations simultaneously
Memory Controller Interface
  Reduces latency and logic
Four 32-bit DMA Channels
Auxiliary Processing-Unit Controller to attach soft co-processors
  128-bit interface, supports pipelining
[Block diagram: Processor Block containing the PowerPC 440, APU Control, CPM, DCR, four DMA engines, and a crossbar with PLB Slv0, PLB Slv1, PLB Mstr, and Memory I/F ports]
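The claim that the crossbar "allows two operations simultaneously" can be illustrated with a toy model. This is a minimal sketch, not the Virtex-5 FXT arbitration logic: the master names, the fixed-priority policy, and the `arbitrate` function are assumptions made for illustration. Only the 5 x 2 shape and the two output ports (memory interface and PLB master) come from the slide.

```python
# Toy model of a 5 x 2 crossbar: five requesters, two target ports.
# Names and the fixed-priority policy are illustrative assumptions,
# not the actual Virtex-5 FXT arbitration scheme.

MASTERS = ["cpu", "dma0", "dma1", "dma2", "dma3"]  # hypothetical names
TARGETS = ["memory_if", "plb_master"]              # the two output ports


def arbitrate(requests):
    """Grant at most one master per target in a single cycle.

    requests maps a master name to the target it wants.
    Returns a dict mapping each granted target to the winning master.
    Masters earlier in MASTERS win ties (fixed priority).
    """
    grants = {}
    for master in MASTERS:
        target = requests.get(master)
        if target in TARGETS and target not in grants:
            grants[target] = master
    return grants


if __name__ == "__main__":
    # Two masters aiming at different ports proceed in the same cycle.
    print(arbitrate({"cpu": "memory_if", "dma0": "plb_master"}))
    # Two masters aiming at the same port: only one is granted.
    print(arbitrate({"dma1": "memory_if", "dma2": "memory_if"}))
```

The point of the sketch is that a memory access and an I/O access do not block each other, while two requests for the same port must be serialized.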

Crossbar in Action
Breakthrough performance!
[Block diagram labels: two TEMAC wrappers; Processor Block with four DMA engines, APU Control, CPM, PowerPC 440, DCR, and a crossbar with ports SPLB0, SPLB1, MPLB, and MCI; Memory Controller Interface, PPC440MC_DDR2, external DDR2 memory; I/O bus (PLB v46) with GPIO, counter/timer, RS232, I2C, interrupt controller, and custom IP]
Four built-in DMA channels provide high-speed access to memory or I/O
Separate memory and I/O buses greatly improve system performance
External masters can access memory or I/O through the crossbar

Auxiliary Processor Unit Control
Reduces PPC/APU overhead and increases data throughput
Access to soft execution units, e.g. Floating Point Coprocessor
Adds 16 user-defined instructions
128-bit load/store interfaces
Pipelines instructions
Buffers data for up to 3 instructions
Tight coupling between CPU and soft logic cores
[Block diagram: PowerPC 440 and APU Control inside the Processor Block, with interface signals Instruction (32 bits), Operand A (32 bits), Operand B (32 bits), Result (32 bits), Control, Load Data (128 bits), Store Data (128 bits)]

128-bit Floating Point Unit
Soft co-processor module, free of charge, accessible by the 440 processor instruction pipeline
Single- and double-precision, IEEE-754 compliant
Speed-up of 6 to 30 times, 200 MFLOPS sustained
[Block diagram: PowerPC 440 and APU Control in the Processor Block drive the FPU over the FCB interface (INSTRUCTION, INST_VALID, DATA (load), DATA (store), DATA_VALID, DONE); FPU blocks: decode, register addresses, pipeline interlock controller, register file, FPSCR, load, store, fused add, multiply, add, div, sqrt, cmp, abs/neg, FP to Int, Int to FP; data path labeled 128-bit (V4 = 32-bit)]
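The benefit of the wider load/store path can be made concrete with a little arithmetic. The sketch below assumes that the diagram's "128-bit (V4=32-bit)" label means the load/store data path was 32 bits wide on Virtex-4; the function name and the example payload are illustrative choices, not figures from the talk.

```python
import math


def transfers_needed(payload_bytes, bus_bits):
    """Number of bus transfers needed to move payload_bytes
    over an interface that is bus_bits wide."""
    bus_bytes = bus_bits // 8
    return math.ceil(payload_bytes / bus_bytes)


if __name__ == "__main__":
    # Example payload (an assumption): four double-precision operands.
    payload = 4 * 8  # bytes
    print("128-bit path:", transfers_needed(payload, 128), "transfers")
    print(" 32-bit path:", transfers_needed(payload, 32), "transfers")
```

Under these assumptions the 128-bit path moves the same operands in a quarter of the transfers, which is one plausible contributor to the quoted speed-up, though the slide does not break the figure down.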

Virtex-5 FXT Additional Capabilities
GTX high-performance transceivers
  Optimized for performance and low power, and for pc-board signal integrity for an open eye
5 family members, 3 in volume production by Sept. 2008

Low-Power Transceivers
[Figure: I/O speed ranges of the Virtex-5 GTX and GTP transceivers on an axis marked 155 Mbps, 622 Mbps, 3.75 Gbps, and 6.5 Gbps, with a SONET path and a PCI-Express path (Gen 1, Gen 2, Gen 3)]
Figure labels: Tx pre-emphasis, advanced Rx EQ, low latency, low power, PCI-Express PHY, PCI Express IP, easy to use

GTX Multi-Gigabit Transceiver
[Block diagram: GTX transceiver with Tx and Rx paths, each with a PMA and a PCS, connected to the FPGA fabric interface]
8 to 24 transceivers per device (40 and 48 in TXT subfamily)
Supporting data rates from 150 Mbps to 6.5 Gbps
Power dissipation less than 250 mW per channel
Programmable Tx pre-emphasis and Rx equalization
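The per-channel figures above translate into device-level numbers with simple arithmetic. The following is a minimal sketch using only the slide's values (6.5 Gbps maximum line rate, under 250 mW per channel, 8 to 48 channels); the function names are illustrative, and the results are raw line-rate upper bounds in one direction, before any encoding overhead.

```python
LINE_RATE_GBPS = 6.5        # maximum GTX line rate quoted on the slide
POWER_PER_CHANNEL_W = 0.25  # upper bound quoted on the slide (250 mW)


def aggregate_gbps(channels, line_rate_gbps=LINE_RATE_GBPS):
    """Raw one-direction line rate summed over all channels, in Gbps."""
    return channels * line_rate_gbps


def energy_per_bit_pj(power_w=POWER_PER_CHANNEL_W,
                      line_rate_gbps=LINE_RATE_GBPS):
    """Energy spent per transmitted bit, in picojoules."""
    joules_per_bit = power_w / (line_rate_gbps * 1e9)
    return joules_per_bit * 1e12


if __name__ == "__main__":
    for channels in (8, 24, 40, 48):
        print(channels, "channels:", aggregate_gbps(channels), "Gbps raw,",
              "under", channels * POWER_PER_CHANNEL_W, "W")
    print("energy per bit: under", round(energy_per_bit_pj(), 1), "pJ")
```

At the full line rate, the 250 mW bound works out to a little under 40 pJ per bit.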

GTX Transmitter
[Block diagram: Tx-PCS (FPGA fabric interface, Tx PIPE control, 8b/10b, PRBS generator, FIFO and oversampling, phase adjust) feeding Tx-PMA (polarity, parallel to serial, pre-emphasis, driver), clocked by a divider from the PMA PLL]
Programmable Tx swing (differential): 1.3 V, 1.2 V, 1.1 V, 1.0 V, 900 mV, 800 mV, 700 mV, 500 mV
Tx pre-emphasis*: 0%, 8%, 17%, 25%, 33%, 42%, 50%, 58% (nominally)
Maximum peak-to-peak jitter: 0.32 UI @ 6.5 Gbps (~50 ps); see Characterization Report
Serial and parallel loop-back capability for testing
Out of Band (OOB) signal generation
* Also referred to as de-emphasis
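The jitter figure mixes two units, so it helps to see how 0.32 UI becomes roughly 50 ps. One unit interval (UI) is the duration of one bit at the line rate. The sketch below is plain arithmetic on the slide's numbers; the function names are illustrative.

```python
def unit_interval_ps(line_rate_gbps):
    """Duration of one bit (one UI) in picoseconds."""
    return 1000.0 / line_rate_gbps


def jitter_ps(jitter_ui, line_rate_gbps):
    """Convert jitter expressed in UI to picoseconds."""
    return jitter_ui * unit_interval_ps(line_rate_gbps)


if __name__ == "__main__":
    rate = 6.5  # Gbps, from the slide
    print("1 UI    =", round(unit_interval_ps(rate), 1), "ps")
    print("0.32 UI =", round(jitter_ps(0.32, rate), 1), "ps")
```

At 6.5 Gbps one UI is about 154 ps, so 0.32 UI is about 49 ps, consistent with the slide's "~50 ps".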

GTX Receiver
[Block diagram: Rx-PMA (equalizer and Rx OOB, CDR, serial to parallel, divider from the PMA PLL) feeding Rx-PCS (oversampler, PRBS checker, polarity, comma detect and align, 8b/10b, loss of sync, elastic buffer, Rx status control, Rx PIPE control) and the FPGA fabric]
4 modes of analog receive equalization plus Decision Feedback Equalizer (DFE)
Programmable receive termination, supporting PCIe and SATA
Can track +0 / -0.5% spread-spectrum clock at 30-60 kHz rate
Receiver eye-scan capability for testing
Ability to turn off Clock-Data Recovery tracking and to control CDR phase offset

Multi-Tap Digital Equalizer
First three taps have the biggest impact
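The idea behind a decision feedback equalizer can be shown with a small noise-free simulation. This is a sketch of the general DFE technique, not the GTX circuit: the channel model, the tap values, and all function names are assumptions chosen so that the unequalized link makes errors while the equalized one does not.

```python
def channel(symbols, post_cursors):
    """Add post-cursor inter-symbol interference to a +/-1 symbol stream."""
    out = []
    for n, s in enumerate(symbols):
        isi = sum(h * symbols[n - k - 1]
                  for k, h in enumerate(post_cursors) if n - k - 1 >= 0)
        out.append(s + isi)
    return out


def dfe(received, taps):
    """Decision feedback equalizer: subtract the interference predicted
    from earlier decisions, then slice the result to +1 or -1."""
    decisions = []
    for n, r in enumerate(received):
        feedback = sum(c * decisions[n - k - 1]
                       for k, c in enumerate(taps) if n - k - 1 >= 0)
        decisions.append(1 if r - feedback >= 0 else -1)
    return decisions


def count_errors(decided, sent):
    """Number of positions where the decision differs from the sent symbol."""
    return sum(d != s for d, s in zip(decided, sent))


if __name__ == "__main__":
    sent = [-1, -1, -1, 1, 1, -1, 1, 1, 1, -1]
    isi_taps = [0.6, 0.3, 0.2]  # hypothetical channel, total ISI above 1.0
    rx = channel(sent, isi_taps)
    print("errors without DFE:", count_errors(dfe(rx, []), sent))
    print("errors with 1 tap: ", count_errors(dfe(rx, isi_taps[:1]), sent))
    print("errors with 3 taps:", count_errors(dfe(rx, isi_taps), sent))
```

In this toy channel the first tap removes the largest share of the interference, which is in the spirit of the slide's remark that the first taps matter most. Real links add noise and imperfect tap estimates, which the sketch leaves out.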

GTX: Additional Features
Handles 150 consecutive 1s or 0s without losing lock
8B/10B, 64B/66B, and 64B/66B Interlaken supported with Gearbox
Built-in Burst-Error Test support
  Pseudo Random Bit Sequence generator/checker: PRBS 2^7, 2^23, 2^31
Deterministic through-delay option with low latency
  Bypasses encoders and FIFOs
All GTX Transceivers have the same specifications and have full functionality
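A PRBS generator is a linear feedback shift register, and a short software model shows how the test patterns listed above arise. This is a minimal sketch: the feedback polynomials used here (x^7 + x^6 + 1, x^23 + x^18 + 1, x^31 + x^28 + 1) are the ones commonly used for these pattern lengths, and it is an assumption that the GTX hardware uses exactly these. The function name and seed are illustrative.

```python
def prbs_bits(order, tap, count, seed=1):
    """Generate count bits from a Fibonacci LFSR whose feedback is the
    XOR of stages `order` and `tap` (polynomial x^order + x^tap + 1)."""
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays at zero")
    bits = []
    for _ in range(count):
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        bits.append(new_bit)
    return bits


if __name__ == "__main__":
    period = 2 ** 7 - 1
    seq = prbs_bits(7, 6, 2 * period)
    print("PRBS-7 repeats after", period, "bits:",
          seq[:period] == seq[period:])
    print("first 16 bits of PRBS-23:", prbs_bits(23, 18, 16))
    print("first 16 bits of PRBS-31:", prbs_bits(31, 28, 16))
```

A checker at the receiver runs the same register, locks onto the incoming stream, and counts mismatches, which is how such a pattern doubles as a bit-error test.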

6.5 Gb/s Transmit Eye Opening
Always keep your eyes open

From Our Public Characterization Report
http://www.xilinx.com/support/documentation/virtex-5.htm#19312
[Three histograms (number of datapoints) at data rate 6.5 Gb/s, REFCLK 325 MHz: 20 MHz sinusoidal jitter tolerance, JT20 (UI); 80 MHz sinusoidal jitter tolerance, JT80 (UI); RX input sensitivity, differential (mV)]
Statistic    JT20 (UI)   JT80 (UI)   RX input sensitivity (mV)
Median       0.530       0.645       80
Average      0.528       0.634       87.8
Std. dev.    0.069       0.066       15.7
Min          0.398       0.418       44
Max          0.695       0.782       140
2%           0.398       0.470       56
98%          0.672       0.748       116

New Additions to the Family
XC5VTX150T and TX240T
Like FX130T and FX200T minus the PPC microprocessor, but with twice the number of GTX transceivers
40 and 48 GTXs respectively
Availability: ES 4Q08, Production 1Q09
When you need lots of fast transceivers

2009-2013: Moore's Law Continues

Moore's Law Continues
40/45 nm ... 32 nm ... 22 nm ... ?
Each step doubles density and thus max capacity
Each step lowers the cost per function
But:
  More complex and more expensive masks
  Harder to control leakage current
  Larger parametric spread due to smaller feature size
  IC logic density outstrips pc-board pin-count
    >6000 bumps on the flip-chip die, <1800 package pins
    Stacked die to the rescue?
Progress gets harder and more expensive
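The statement that each step doubles density follows from simple geometry if area shrinks with the square of the feature size. The sketch below applies that idealized rule to the node names on the slide; real processes do not scale every dimension uniformly, so these are rough upper bounds, and the function name is illustrative.

```python
def ideal_density_gain(old_nm, new_nm):
    """Idealized density gain when every linear dimension shrinks
    in proportion to the node name (area scales with its square)."""
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    for old, new in [(65, 45), (45, 32), (32, 22)]:
        gain = ideal_density_gain(old, new)
        print(f"{old} nm to {new} nm: about {gain:.2f}x density")
```

Each step lands close to 2x, which is why successive nodes are spaced at roughly a 0.7 ratio in feature size.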

Higher Speed is a Challenge
Raw transistor speed is difficult to improve.
Supply voltage scaling reaches its limits.
Performance is limited by the power and heat budget.
Need for multiple technologies, optimized for either performance or power, not for both simultaneously.
Multi-gigabit transceivers: conflict between highest bit rate and widest frequency range (R-C vs. L-C type of VCO)
Higher speed does not come for free
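Why the end of voltage scaling matters can be seen from the standard first-order model of CMOS dynamic power, P = a * C * V^2 * f. The talk does not state this formula; it is brought in here as background, and every numeric value in the sketch is a hypothetical placeholder rather than a Virtex-5 figure.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS dynamic power: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    base = dynamic_power_w(0.1, 1e-9, 1.0, 500e6)
    lower_v = dynamic_power_w(0.1, 1e-9, 0.9, 500e6)
    faster = dynamic_power_w(0.1, 1e-9, 1.0, 600e6)
    print("10% lower supply voltage:", round(lower_v / base, 2), "x power")
    print("20% higher clock rate:   ", round(faster / base, 2), "x power")
```

Because power falls with the square of the supply voltage, voltage reduction used to pay for higher clock rates. Once the voltage can no longer drop, any extra speed shows up directly in the power and heat budget.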

Users' Design Productivity
10x growth in FPGA capacity may challenge our users' budget for design time and cost.
Design verification is the biggest burden.
Design re-use becomes more important. More cores.
Incremental compile and block-based design.
Place-and-route times must stay acceptable.
How to manage mega-gate designs at reasonable cost

Summary
Virtex-5 is the winner, without any real competition
Xilinx strategy of sub-families proved to be very successful
Multi-gigabit transceivers are very popular
Moore's Law will give us much more logic and lower cost
Speed and power consumption pose a difficult challenge
Users want and need to improve design productivity
The pace is becoming faster and more competitive.
Xilinx will keep bringing you better chips and tools
Thank You!
