The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

Similar documents
Achieving Breakthrough Performance with Virtex-4, the World s Fastest FPGA

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

EE219A Spring 2008 Special Topics in Circuits and Signal Processing. Lecture 9. FPGA Architecture. Ranier Yap, Mohamed Ali.

International Training Workshop on FPGA Design for Scientific Instrumentation and Computing November 2013.

7-Series Architecture Overview

Field Programmable Gate Array (FPGA)

Reconfigurable Computing

INTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)

H100 Series FPGA Application Accelerators

Power Considerations in High Performance FPGAs. Abu Eghan, Principal Engineer Xilinx Inc.

Basic FPGA Architecture Xilinx, Inc. All Rights Reserved

Power Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Virtex-4 Family Overview

Field Programmable Gate Array (FPGA) Devices

7 Series FPGAs Overview

FPGA IO Resources. FPGA pin buffers can be configured for. IO columns include Xilinx SelectIO resources. Slew rate. Serializer/Deserializer (IOSERDES)

FPGA Architecture Overview. Generic FPGA Architecture (1) FPGA Architecture

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx

High-Performance Integer Factoring with Reconfigurable Devices

INTRODUCTION TO FPGA ARCHITECTURE

Introduction to FPGAs. H. Krüger Bonn University

Peter Alfke, Xilinx, Inc. Hot Chips 20, August Virtex-5 FXT A new FPGA Platform, plus a Look into the Future

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

8. Migrating Stratix II Device Resources to HardCopy II Devices

Dynamic Phase Alignment for Networking Applications Author: Tze Yi Yeoh

Spiral 2-8. Cell Layout

FPGA Provides Speedy Data Compression for Hyperspectral Imagery

Field Programmable Gate Array

The Nios II Family of Configurable Soft-core Processors

XA Artix-7 FPGAs Data Sheet: Overview

EECS 151/251A Spring 2019 Digital Design and Integrated Circuits. Instructor: John Wawrzynek. Lecture 18 EE141

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

XA Spartan-6 Automotive FPGA Family Overview

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

FPGA VHDL Design Flow AES128 Implementation

DINI Group. FPGA-based Cluster computing with Spartan-6. Mike Dini Sept 2010

Maximum Capability Artix-7 Family Kintex-7 Family Virtex-7 Family

High-Performance Memory Interfaces Made Easy

Outline of Presentation Field Programmable Gate Arrays (FPGAs(

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

Power vs. Performance: The 90 nm Inflection Point

Introduction to Partial Reconfiguration Methodology

Zynq AP SoC Family

Rich Sevcik Executive Vice President, Xilinx APAC: RS _January 05

EE178 Lecture Module 2. Eric Crabill SJSU / Xilinx Fall 2007

The DSP Primer 8. FPGA Technology. DSPprimer Home. DSPprimer Notes. August 2005, University of Strathclyde, Scotland, UK

FPGA architecture and design technology

Leso Martin, Musil Tomáš

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

Power Consumption in 65 nm FPGAs

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

HDL Coding Style Xilinx, Inc. All Rights Reserved

Introduction to Field Programmable Gate Arrays

The Xilinx XC6200 chip, the software tools and the board development tools

Topics. Midterm Finish Chapter 7

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

The Ultimate System Integration Platform. VIRTEX-5 FPGAs

TSEA44 - Design for FPGAs

Virtex -5 Product Overview

FPGA Programming Technology

Outline. Field Programmable Gate Arrays. Programming Technologies Architectures. Programming Interfaces. Historical perspective

Xilinx DSP. High Performance Signal Processing. January 1998

CS310 Embedded Computer Systems. Maeng

Topics. Midterm Finish Chapter 7

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

Programmable Logic. Any other approaches?

Built-In Self-Test of Programmable Input/Output Tiles in Virtex-5 FPGAs

Digital Integrated Circuits

FPGA Polyphase Filter Bank Study & Implementation

DSP Resources. Main features: 1 adder-subtractor, 1 multiplier, 1 add/sub/logic ALU, 1 comparator, several pipeline stages

Hardware Design with VHDL PLDs IV ECE 443

Lab 3 Sequential Logic for Synthesis. FPGA Design Flow.

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today.

Zynq-7000 All Programmable SoC Product Overview

Fibre Channel Arbitrated Loop v2.3

FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)

Stratix vs. Virtex-II Pro FPGA Performance Analysis

FPGA Implementations

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing

7 Series FPGAs Clocking Resources

Intel Stratix 10 Clocking and PLL User Guide

ALTERA FPGAs Architecture & Design

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com

FABRICATION TECHNOLOGIES

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study

Pipelining & Verilog. Sequential Divider. Verilog divider.v. Math Functions in Coregen. Lab #3 due tonight, LPSet 8 Thurs 10/11

User Manual for FC100

Section I. Cyclone II Device Family Data Sheet

Xilinx ASMBL Architecture

Programmable Logic. Simple Programmable Logic Devices

Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool

UltraScale Architecture and Product Overview

Achieving High Performance DDR3 Data Rates in Virtex-7 and Kintex-7 FPGAs

DATA SHEET. Low power and low cost CPLD. Revision: 1.0. Release date: 10/10/2016. Page 1 of 14

FPGA for Software Engineers

The Virtex FPGA and Introduction to design techniques

Introduction to Modern FPGAs

Transcription:

The Next Generation 65-nm FPGA Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 Hot Chips, 2006

Structure of the talk 65nm technology going towards 32nm Virtex-5 family Improved I/O Benchmarking Virtex-5 LUT6 fabric New Microblaze in Virtex-5 fabric Conclusion Hot Chips, 2006 slide 2

65nm Process Technology 40-nm gate length (physical poly) 1.6nm oxide thickness (16 Angstrom) ~5 atomic layers Triple-Oxide II technology 3 oxide thicknesses for optimum power and performance 1.0 Vcc core Lower dynamic power Mobility engineered transistors (strained silicon) Maximum performance at lowest AC power 65-nm Transistor Cross Section Over 1 Billion Transistors on a 23 x 23 mm Chip Hot Chips, 2006 slide 3

FPGAs Drive the Process 180 nm 150 nm 130 nm 90 nm 90nm Low cost New process technology drives down cost FPGAs can take advantage of new technology faster than ASICs and ASSPs 300mm wafers Low cost Triple Oxide Low power FPGA 2010: 32 nm, 5 Billion transistors 65 nm 45 nm 32 nm 12 layer copper, 1 volt core 1.0 Volt The cost of IC development increases. Therefore customers want to buy reconfigurable and programmable platforms, instead of developing their own. Hot Chips, 2006 slide 4

Challenges Higher leakage current and stand-by power Lower Vcc: good for power, tough for decoupling 3.3-V compatibility is getting more difficult 1 billion transistors, large chips, heat density 12-layer chip, 10-layer package, 16-layer pc-board Faster transitions, 2 V/ns and 50 ma/ns per pin, Pc-board signal integrity problems Complex chip, complex package, complex board Hot Chips, 2006 slide 5

LX Platform Overview Hot Chips, 2006

Two Generations of ASMBL (Application-Specific Modular BLock Architecture) Virtex-4 Virtex-5 Serial I/O Hot Chips, 2006 slide 7

2 nd Generation of ASMBL Easy to create sub-families LX : Logic + parallel IO LXT: Logic + serial I/O LX LXT Four Platforms SXT: DSP + serial I/O FXT: PPC + fastest serial I/O SXT FXT Hot Chips, 2006 slide 8 Many choices to optimize cost and performance

System components High-Performance 6-LUT Fabric More Configuration Options 36Kbit Dual-Port Block RAM / FIFO with ECC 25x18 Multiplier DSP Slice with Integrated ALU SelectIO with ChipSync + XCITE DCI 550 MHz Clock Management Tile DCM + PLL Hot Chips, 2006 slide 9

Virtex-5 Logic Architecture True 6-input LUTs with dual 5-input LUT option 1.4 times the value for actual logic only 1.15 times the cost in silicon area. 64-bit RAM per M-LUT about half of all LUTs 32-bit or 16-bit x 2 shift register per M-LUT RAM64 SRL32 RAM64 SRL32 RAM64 SRL32 RAM64 SRL32 LUT6 LUT6 LUT6 LUT6 Register/ Latch Register/ Latch Register/ Latch Register/ Latch Hot Chips, 2006 slide 10

Virtex-5 Virtex-4 Routing More symmetric pattern, connecting CLBs More logic reached per hop Same pattern for all outputs Fast Connect 1 Hop 2 Hops 3 Hops Hot Chips, 2006 slide 11

BRAM/FIFO 36 Kbit BRAM Integrated FIFO Logic for multi-rate designs Built-in ECC Cascadable to build larger RAM arrays Dual Port: a read and write every clock cycle Performance up to 550 MHz Hot Chips, 2006 slide 12

General Purpose I/O (Select I/O) All I/O pins are created equal Compatible with >40 different standards Vcc, output drive, input threshold, single/differential, etc Each I/O pin has dedicated circuitry for: On-chip transmission-line termination (serial or parallel) Fine timing adjustment in 75 ns steps (IDELAY + ODELAY) Serial-to-parallel converter on the input (CHIPSYNC) Parallel -to-serial converter on the output (CHIPSYNC) Clock divider, and high-speed regional clock distribution Ideal for source-synchronous I/O up to 1 Gbps Hot Chips, 2006 slide 13

75-ps Incremental Alignment CLK DATA 175-225 MHz (calibration clk) ChipSync IDELAY INC/DEC ISERDES IDELAY CNTRL FPGA Fabric State Machine Calibration clock can be internal or external 64 delay elements of ~ 70 to 89 ps each Hot Chips, 2006 slide 14

ISERDES for Incoming Data Data CLK ChipSync ISERDES n CLKDIV FPGA Fabric CLK BUFIO BUFR Clock frequency division widens internal data path n = 2, 3, 4, 5, 6, 7, 8, 10 bits Dynamic signal alignment Bit alignment, Word alignment, Clock alignment Supports Dynamic Phase Alignment (DPA) using IDELAY Hot Chips, 2006 slide 15

OSERDES for Outgoing Data ChipSync n OSERDES m FPGA Fabric CLK DCM/PMCD CLKDIV Parallel-to-Serial converter Data SERDES: 2, 3, 4, 5, 6, 7, 8, 10 bits Three-state control SERDES: 1, 2, 4 bits Hot Chips, 2006 slide 16

Virtex-5 Applications Benchmarks Hot Chips, 2006

One MPEG4 Video Decoder High Definition resolution 720 vertical video lines, progressive RAM Memory Controller MPEG 4 Decoder Copy Controller Shared Memory Parser 1 FIFO Motion Comp. Object FIFO Texture Update Object 8 FIFO Object FIFO Inverse scan, Prediction, Inverse AC DC Prediction IDCT DCT Coeff Inverse Quantisatio n / IDCT Texture/ID CT Object FIFO Hot Chips, 2006 slide 18

8 MPEG4 decoders RAM RAM Off Chip Frame Memories Eight Ports of Compressed Video In Memory Memory Controller Controller Mpeg4 Decoder Mpeg4 Decoder Mpeg4 Decoder Mpeg4 Decoder Mpeg4 Decoder Mpeg4 Decoder Mpeg4 Decoder Mpeg4 Decoder Eight Ports of De-Compressed 720p Video Out Category Virtex-4 Virtex-5 Tools XST/ISE 8.1.02i XST/ISE 8.2i Devices XC4VFX140-11 Virtex5 part Hot Chips, 2006 slide 19

8 Decoders: Resources Diff. = 6932 1634 14,809 Design Virtex-4 Virtex-5 Resources Used Used Registers 21,248 20,242 LUTs 67,523 44,148 BlockRAMs 233 233 DSP Elements 192 216 35% fewer LUTs dramatic improvements for multiplexers, memory, and misc. logic Same VHDL source code used for both designs Hot Chips, 2006 slide 20

Logic Synthesis-Driven Results Virtex-5 Virtex-4 35,000 30,000 25,000 20,000 15,000 10,000 5,000 0 <= 3 4 5 6 Hot Chips, 2006 slide 21 Number of Inputs Synthesis uses 6-input LUTs efficiently : fewer logic levels 23% increase in synthesized frequency, from 95MHz to 117MHz From 720p to 1080p video standards with little effort

Quad-Port Memory in Four LUT6 Read data Write Port: Four LUT6s share the data input and can also share a distributed write address Read Ports: Three independent read operations Write data 32 Write Port Register File 32X32 LUT 32 32 32 Read Port 32 x 32 Quad-Port RAM structure in 64 LUTs Common write address LUT Independent read address Associated data 6x density improvement over Virtex-4 Common write data LUT Independent read address Associated data LUT Independent read address Associated data Hot Chips, 2006 slide 22

Application Example: new MicroBlaze 5.0 ILMB IOPB Instruction-side bus interface Bus IF Program Counter Instruction Buffer Instruction Decode Add/Sub Shift/Logical Multiply Register File 32X32b Data-side bus interface Bus IF DLMB DOPB Better use of new LUTs 1269 LUT4s in Virtex-4, MB 4.0 1400 LUT6s in Virtex-5, MB 5.0 from 3 stage -> 5 stage pipeline new processor: from 0.92 DMips/MHz to 1.14 DMips/MHz 180MHz -> 201 MHz 166 -> 230 Dhrystone Mips Use new 6 LUT, 2 stage deeper pipe, 10% more MHz, 39% better performance Hot Chips, 2006 slide 23

Suite of Benchmarks Suite of 74 designs run against ISE8.1i Slow speedgrade (-10) Virtex-4 4 compared to slow speedgrade (-1) Virtex-5 70 60 50 Percent Improvement of Virtex-5 over Virtex-4 40 30 20 10 Avg of Designs 0 ~30% average advantage for Virtex-5 5 fabric vs. Virtex-4 - As high as 56% advantage for some designs Hot Chips, 2006 slide 24

Virtex-5: Summary Leading 65 nm technology FPGA platform Better input and output I/O on all pins New 6-input LUT logic that is 30% better Demonstrated example of video benchmark with 35% fewer LUTs and 23% increased frequency New Microblaze with 39% improved performance Expect more: new announcements on serial I/O and integrated processor technology very soon Hot Chips, 2006 slide 25

Appendix: Virtex-5 LX Hot Chips, 2006

Virtex-5 LX Platform 5VLX30 5VLX50 5VLX85 5VLX110 5VLX220 5VLX330 Logic Cells 30,720 46,080 82,944 110,592 221,184 331,776 LUT6/FFs Distributed RAM Kbits 19,200 320 28,800 480 51,840 840 69,120 1,120 138,240 2,280 207,360 3,420 Block RAM Kbits 1,152 1,728 3,456 4,608 6,912 10,368 CMTs 2 6 6 6 6 6 DSP48E Slices 32 48 48 64 128 192 Total I/O Banks 13 17 17 23 23 35 EasyPath No No Yes Yes Yes Yes Package Size IO FF324 19 220 220 220 FF676 27 440 400 440 440 440 FF1153 35 800 560 560 800 FF1760 42.5 1,200 800 800 1,200 Hot Chips, 2006 slide 27