INTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)

Bill Jason P. Tomas, Dept. of Electrical and Computer Engineering, University of Nevada Las Vegas

FIELD PROGRAMMABLE GATE ARRAYS
- A dominant implementation platform for digital designs
- An FPGA can be reconfigured to implement any digital logic function
- Partial reconfiguration allows one portion of the FPGA to keep running while another portion is being reconfigured
- FPGAs also contain analog circuitry features, including programmable slew rate and drive strength, and differential comparators on I/O pins designed to connect to differential signaling channels
- Mixed-signal FPGAs contain ADCs and DACs with analog signal-conditioning blocks, allowing them to operate as a system-on-chip (SoC)

FPGA ARCHITECTURES
- Early FPGAs: N x N array of unit cells (CLB + routing), with special routing along the center axis
- Next-generation FPGAs: M x N array of unit cells, with small block RAMs around the edges
- More recent FPGAs: added block RAM arrays, multiplier cores, and processor cores

FPGA ARCHITECTURE TRENDS
- Memories: single- and dual-port RAMs, FIFOs (first-in first-out), ECC (error-correcting codes)
- Digital signal processing: multipliers, accumulators, arithmetic logic units (ALUs)
- Hard-core embedded processors (dedicated processors): dedicated program and data memories; programmable RAM in the FPGA can also be used with the processor to provide program and data memories
- Soft-core embedded processors: synthesized from an HDL
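A FIFO of the kind listed above is typically a block of RAM plus read and write pointers that generate full and empty flags. The following is a minimal behavioral sketch, assuming a fixed power-of-two depth and the common "extra wrap bit" pointer scheme; the class and method names are hypothetical and do not correspond to any vendor primitive.

```python
class Fifo:
    """Behavioral model of a RAM-based FIFO (hypothetical, for illustration).

    Pointers count modulo 2*depth, i.e. they carry one extra wrap bit beyond
    the RAM address. Equal pointers mean empty; pointers exactly `depth`
    apart mean full.
    """

    def __init__(self, depth: int):
        assert depth > 0 and depth & (depth - 1) == 0, "depth must be a power of two"
        self.depth = depth
        self.ram = [0] * depth
        self.wr_ptr = 0  # ranges over 0 .. 2*depth - 1
        self.rd_ptr = 0

    def empty(self) -> bool:
        return self.wr_ptr == self.rd_ptr

    def full(self) -> bool:
        return (self.wr_ptr - self.rd_ptr) % (2 * self.depth) == self.depth

    def write(self, value: int) -> bool:
        """Write one word; returns False (and drops the word) when full."""
        if self.full():
            return False
        self.ram[self.wr_ptr % self.depth] = value  # low bits address the RAM
        self.wr_ptr = (self.wr_ptr + 1) % (2 * self.depth)
        return True

    def read(self):
        """Read the oldest word; returns None when empty."""
        if self.empty():
            return None
        value = self.ram[self.rd_ptr % self.depth]
        self.rd_ptr = (self.rd_ptr + 1) % (2 * self.depth)
        return value
```

With a depth of 4, four writes succeed, a fifth is rejected, and reads return the words in the order they were written.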

BASIC FPGA ARCHITECTURE
More recent FPGA architectures have small block RAM arrays (usually placed in a center column), multipliers, processor cores, DSP cores with multipliers, and I/O cells distributed along columns for ball grid array (BGA) packages.

FPGA OPERATION
The user writes the configuration memory, which defines the function of the system. This includes the connectivity between the CLBs and the I/O cells, the logic implemented in the CLBs, and the configuration of the I/O blocks. Changing the data in the configuration memory changes the function of the system. This change can be made at any time during FPGA operation (run-time reconfiguration).

CONFIGURABLE LOGIC BLOCKS (CLBS) ARCHITECTURE
CLBs consist of:
- Look-up tables (LUTs), which hold the entries of a logic function's truth table. Some FPGAs can also use LUTs to implement small random access memories (RAMs).
- Carry and control logic, which implements fast arithmetic operations (adders/subtractors). It can also be configured for additional operations (e.g., a Built-In Self-Test iterative-OR chain).
- Memory elements: configurable flip-flops (FFs)/latches with programmable clock edges, set/reset, and clock enable. These memory elements can usually be configured as shift registers.
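The carry logic can be pictured as a dedicated ripple path running beside the LUTs. The bit-level sketch below assumes a Xilinx-style arrangement: each LUT produces a propagate signal p = a XOR b, a carry MUX passes the incoming carry when p = 1 and otherwise passes a (the generate or kill case), and an XOR on the chain forms the sum bit. The function name is hypothetical.

```python
def carry_chain_add(a: int, b: int, width: int, cin: int = 0):
    """Add two unsigned `width`-bit numbers, one LUT + carry stage per bit.

    Returns (sum, carry_out). Each stage models:
      p      = a_i XOR b_i          (computed in the LUT)
      s_i    = p XOR c_i            (XOR on the carry chain)
      c_i+1  = c_i if p else a_i    (carry MUX: propagate, or generate/kill)
    """
    carry = cin & 1
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi
        result |= (p ^ carry) << i
        # When p = 0, a_i == b_i: both 1 generates a carry, both 0 kills it.
        carry = carry if p else ai
    return result, carry
```

For example, `carry_chain_add(13, 7, 4)` returns `(4, 1)`, i.e. 20. Subtraction reuses the same chain as a + (NOT b) + 1: `carry_chain_add(9, ~3 & 0xF, 4, cin=1)` returns `(6, 1)`, where a carry-out of 1 indicates no borrow.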

CONFIGURABLE LOGIC BLOCKS
A CLB can contain several slices. Xilinx Virtex-5 FPGAs (shown at right) have two slices per CLB: SLICEL (logic) and SLICEM (memory). In addition to the basic CLB architecture, the Virtex-5 contains wide-function MUXs, which can implement:
- 4:1 MUX using 1 LUT
- 8:1 MUX using 2 LUTs
- 16:1 MUX using 4 LUTs
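A 6-input LUT has room for exactly a 4:1 MUX (four data inputs plus two select inputs). Larger MUXs combine LUT outputs through the slice's dedicated 2:1 wide-function MUXs (called F7 and F8 in Xilinx terminology). The behavioral sketch below assumes that arrangement; all function names are hypothetical.

```python
def lut_mux4(data, sel):
    """4:1 MUX that fits in one 6-input LUT: 4 data inputs + 2 select bits."""
    assert len(data) == 4 and 0 <= sel < 4
    return data[sel]

def mux2(a, b, s):
    """Dedicated 2:1 wide-function MUX (F7/F8-style) joining two outputs."""
    return b if s else a

def mux8(data, sel):
    """8:1 MUX = 2 LUTs + one F7-style MUX driven by select bit 2."""
    lo = lut_mux4(data[0:4], sel & 0b11)
    hi = lut_mux4(data[4:8], sel & 0b11)
    return mux2(lo, hi, (sel >> 2) & 1)

def mux16(data, sel):
    """16:1 MUX = 4 LUTs + two F7-style MUXs + one F8-style MUX (select bit 3)."""
    lo = mux8(data[0:8], sel & 0b111)
    hi = mux8(data[8:16], sel & 0b111)
    return mux2(lo, hi, (sel >> 3) & 1)
```

The lower select bits go to every LUT in parallel, and each higher select bit only steers one level of the dedicated MUXs, which is why the LUT count doubles with each doubling of MUX width.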

LOOK-UP TABLES (2:1 MUX EXAMPLE)
The configuration memory holds the output values of the truth-table entries. The LUT input signals drive the select lines of a tree of 2:1 MUXs, which pick the truth-table value for any given combination of inputs.
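The LUT mechanism can be modeled directly: configuration memory stores one bit per truth-table row, and the inputs act as select lines of a 2:1 MUX tree. The sketch assumes the convention that input 0 is the least significant address bit; `make_init` and `lut_eval` are hypothetical helpers. Changing only the configuration bits turns the same LUT from an AND gate into an OR gate.

```python
def make_init(func, k):
    """Build the 2**k configuration bits of a k-input LUT from a Python function.

    Bit i of the result holds func(inputs) for the input combination whose
    binary address is i (input 0 = least significant address bit).
    """
    init = 0
    for addr in range(2 ** k):
        inputs = [(addr >> j) & 1 for j in range(k)]
        if func(*inputs):
            init |= 1 << addr
    return init

def lut_eval(init, inputs):
    """Evaluate a LUT as a tree of 2:1 MUXs over its configuration bits.

    The first level of MUXs is selected by input 0, the next by input 1, etc.
    """
    k = len(inputs)
    level = [(init >> i) & 1 for i in range(2 ** k)]  # leaves = configuration bits
    for sel in inputs:
        # Each 2:1 MUX picks the odd leaf when sel = 1, the even leaf when sel = 0.
        level = [level[i + 1] if sel else level[i] for i in range(0, len(level), 2)]
    return level[0]
```

In this model a 2-input LUT configured with `0b1000` behaves as AND, and the same LUT configured with `0b1110` behaves as OR.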

LUT BASED RAM
In normal LUT mode, the LUT performs a read operation. For a write operation, address decoders gated by the write enable (WE) generate clock signals to the latches. Smaller RAMs can be combined to create larger RAMs (up to 64 bits in Virtex-5).
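A behavioral sketch of LUT-based (distributed) RAM: reads are combinational (address in, data out), writes happen on a clock edge only when WE is asserted, and two 32x1 RAMs form a 64x1 RAM by using the top address bit to steer WE and to select the read output. The class names and the choice of a 32x1 building block are assumptions for illustration.

```python
class LutRam:
    """Single-bit-wide RAM built from one LUT's configuration memory."""

    def __init__(self, depth=32):
        self.bits = [0] * depth

    def read(self, addr):
        # Normal LUT mode: the address simply selects a stored bit.
        return self.bits[addr]

    def clock(self, addr, din, we):
        # Write mode: the address decoder, gated by WE, updates one cell.
        if we:
            self.bits[addr] = din & 1


class LutRam64:
    """64x1 RAM from two 32x1 LUT RAMs; address bit 5 picks the half."""

    def __init__(self):
        self.halves = [LutRam(32), LutRam(32)]

    def read(self, addr):
        return self.halves[(addr >> 5) & 1].read(addr & 31)

    def clock(self, addr, din, we):
        half = (addr >> 5) & 1
        for i, ram in enumerate(self.halves):
            # Only the selected half sees an active write enable.
            ram.clock(addr & 31, din, we and i == half)
```

Addresses 8 and 40 share the same low five bits but land in different halves, so writing one never disturbs the other.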

FPGA PROGRAMMABLE INTERCONNECTION NETWORK
The interconnect is a horizontal and vertical mesh of wire segments joined by programmable switches called programmable interconnect points (PIPs). Each PIP is implemented with a transmission gate controlled by a memory bit from the configuration memory.
- Global routing connects PLBs to I/O buffers, non-adjacent PLBs, and other embedded components.
- Local routing connects PLBs to adjacent PLBs, and PLBs to global routing (through a switch matrix).
Several types of PIPs are used:
- Cross-point: connects vertical and horizontal wire segments, allowing turns
- Break-point: connects or isolates 2 wire segments
- Decoded MUX: a group of 2^n cross-points connected to a single output, configured by n configuration bits
- Non-decoded MUX: n wire segments, each with its own configuration bit (n bits in total)
- Compound cross-point: 6 break-point PIPs (can carry two isolated signal nets)
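The difference between the two MUX-style PIPs comes down to configuration-bit count: a decoded MUX selects among 2^n wire segments with n bits, while a non-decoded MUX spends one bit per segment (one-hot). The small model below uses hypothetical helper names and, as an assumption of the sketch, rejects non-decoded settings that enable more than one segment, since that would short nets together.

```python
def decoded_mux(inputs, config_bits):
    """Decoded MUX PIP: n configuration bits choose one of 2**n inputs."""
    n = len(config_bits)
    assert len(inputs) == 2 ** n
    index = sum(bit << i for i, bit in enumerate(config_bits))  # bit 0 = LSB
    return inputs[index]

def non_decoded_mux(inputs, config_bits):
    """Non-decoded MUX PIP: one configuration bit per input (one-hot)."""
    assert len(inputs) == len(config_bits)
    enabled = [i for i, bit in enumerate(config_bits) if bit]
    if len(enabled) != 1:
        raise ValueError("exactly one PIP must be enabled to avoid shorting nets")
    return inputs[enabled[0]]

def config_bits_needed(num_inputs):
    """(decoded, non-decoded) configuration bits for a MUX with num_inputs inputs."""
    decoded = (num_inputs - 1).bit_length()  # ceil(log2(num_inputs))
    return decoded, num_inputs
```

For a 16-input routing MUX this gives 4 bits (decoded) versus 16 bits (non-decoded). The non-decoded form costs more configuration memory but avoids a decoder in the routing path.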

PROGRAMMABLE INPUT/OUTPUT CELLS
- Bidirectional buffers: programmable as inputs or outputs; tri-state control for bidirectional operation; pull-up/pull-down resistors
- FFs/latches to improve timing: setup and hold times, clock-to-out delay
- Routing resources: connections to the core of the array
- Programmable I/O voltage and current levels
- Boundary scan access
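The bidirectional buffer, tri-state control, and pull resistors interact as follows: when the output driver is enabled, the pad follows the core's output; otherwise the pad is driven externally or, if nothing drives it, settles to the pull-up or pull-down value, and floats if no pull is enabled. A minimal behavioral sketch; the function and its parameters are hypothetical, and the model deliberately ignores contention between internal and external drivers.

```python
def resolve_pad(output_enable, core_out, external=None, pull=None):
    """Return the logic value on a bidirectional I/O pad.

    output_enable: tri-state control; True drives core_out onto the pad.
    external:      value driven by the outside world, or None if undriven.
    pull:          "up", "down", or None (no pull resistor enabled).
    Returns 0, 1, or "Z" (floating). The input buffer observes this value.
    """
    if output_enable:
        return core_out & 1
    if external is not None:
        return external & 1
    if pull == "up":
        return 1
    if pull == "down":
        return 0
    return "Z"  # neither driven nor pulled
```

Enabling a pull resistor on an unused or tri-stated pin is what keeps it from returning the floating "Z" case.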

FPGA CONFIGURATION INTERFACES
- Master (serial or parallel): the FPGA retrieves its configuration from a ROM at initial power-up
- Slave (serial or parallel): the FPGA is configured by an external source (e.g., a microprocessor or another FPGA); used for dynamic partial reconfiguration
- Boundary scan: 4-wire IEEE standard serial interface used for testing; provides write and read access to the configuration memory; interfaces to the FPGA core's internal routing network

BOUNDARY SCAN CONFIGURATION
- Developed to test the interconnect between chips on a PCB
- Test Access Port (TAP) controller composed of a 16-state FSM
- Daisy-chain configuration
[Figure: Multi-FPGA emulation framework to support NoC design and verification (UNLV NSIL)]
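The 16-state TAP controller is defined by IEEE 1149.1 and advances on each TCK rising edge according to the value of TMS. Below is a table-driven Python sketch of the standard state diagram; the state names follow the standard, while the dictionary and `tap_walk` helper are illustrative. One useful property it captures: five TCK cycles with TMS = 1 reach Test-Logic-Reset from any state.

```python
# Next-state table: state -> (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def tap_walk(state, tms_bits):
    """Apply TMS values (one per TCK rising edge) and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state
```

For example, from Test-Logic-Reset the TMS sequence 0, 1, 1, 0, 0 reaches Shift-IR, where instruction bits are shifted in through TDI; the sequence 0, 1, 0, 0 reaches Shift-DR instead.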

FPGA CONFIGURATION TECHNIQUES
- Full configuration and readback: simple configuration interface; the frame address is calculated automatically inside the device; larger FPGAs have a longer download time
- Compressed configuration: requires multiple-frame write capability, in which identical frames of configuration data are written to multiple frame addresses; extends the partial reconfiguration interface capabilities; a frame address is much smaller than a frame of configuration data; reduces the download time for initial configuration, depending on the regularity of the system function and the percentage of the array that is utilized
- Partial reconfiguration and readback: changes only the portions of the configuration memory that differ from a reference design; reduces the download time for reconfiguration
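The download-time arguments can be made concrete by counting words. This sketch assumes a hypothetical device whose configuration memory is a list of equal-size frames, and a hypothetical cost model: full configuration sends every frame's data (addresses are generated internally), while a multiple-frame write sends the data once followed by one address word per destination. Real bitstream formats differ; the point is only to show why regular designs compress well and why partial reconfiguration sends fewer frames.

```python
from collections import defaultdict

def full_config_words(frames):
    """Full configuration: every frame's data is sent."""
    return sum(len(f) for f in frames)

def compressed_config_words(frames):
    """Compressed configuration: each distinct frame is sent once,
    plus one (small) frame address per destination."""
    groups = defaultdict(list)
    for addr, frame in enumerate(frames):
        groups[tuple(frame)].append(addr)
    return sum(len(data) + len(addrs) for data, addrs in groups.items())

def partial_config_frames(reference, target):
    """Partial reconfiguration: addresses of frames that differ from the reference."""
    return [addr for addr, (old, new) in enumerate(zip(reference, target)) if old != new]
```

With six identical all-zero frames and two distinct ones (4 words each), full configuration sends 32 words while the compressed scheme sends 20; changing one frame afterwards needs only that single frame address in a partial reconfiguration.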

XILINX VIRTEX-5 FPGAS
[Figure: Multi-FPGA-based emulation framework for NoC design and verification (UNLV Networking and System Integration Laboratory)]

VIRTEX-5 FPGA PLATFORMS
Five Virtex-5 platforms:
1. LX: general logic applications
2. LXT: logic with advanced serial connectivity
3. SXT: signal processing applications with advanced serial connectivity
4. TXT: high-performance systems with double-density advanced serial connectivity
5. FXT: high-performance embedded systems with advanced serial connectivity
- Over 320,000 logic cells on the largest Virtex-5
- The ExpressFabric interconnect structure and 12 levels of metal interconnect allow complex logic functions to be implemented and let PLBs reach neighboring PLBs in fewer hops than in Virtex-4
- Each PLB contains 8 LUTs and 8 configurable memory elements; the LUTs can also be configured as RAM, ROM, or shift registers
- Enhanced DSP functions built on 25 x 18-bit multipliers (which can be cascaded)
- Clock management tiles contain one PLL and two digital clock managers (DCMs), which can drive global clock buffers and filter jitter

VIRTEX-5 CLB
A single CLB in Virtex-5 consists of two slices: SLICEL (logic) and SLICEM (memory). Each CLB is connected to a switch matrix that provides access to the general (global) routing matrix. Every slice contains four LUTs, wide-function MUXs, carry logic, and configurable memory elements. SLICEM additionally supports storing data in distributed RAM and shifting data with 32-bit shift registers.
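The 32-bit shift register mode of SLICEM can be modeled behaviorally: on each clock edge with clock enable asserted, data shifts in at one end, and a 5-bit address selects which tap appears at the output, giving a variable-length delay line. The model is loosely based on Xilinx's SRLC32E primitive (addressable output plus a last-stage output for cascading), but the class below is an illustrative sketch, not that primitive's interface.

```python
class ShiftRegister32:
    """Behavioral model of an addressable 32-bit LUT shift register."""

    def __init__(self):
        self.taps = [0] * 32  # taps[0] holds the most recently shifted-in bit

    def clock(self, din, ce=True):
        """Rising clock edge: shift in din when clock enable is asserted."""
        if ce:
            self.taps = [din & 1] + self.taps[:-1]

    def q(self, addr):
        """Output selected by the 5-bit address: input delayed by addr + 1 clocks."""
        return self.taps[addr & 31]

    def q31(self):
        """Last stage, usable to cascade into the next shift register."""
        return self.taps[31]
```

Because the tap address can change at run time, one LUT provides any delay from 1 to 32 clock cycles without using the slice's flip-flops.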

SLICEL

SLICEM

FPGA DESIGN COMPARISON: VIRTEX-5, VIRTEX-6, AND SPARTAN-6
Virtex-6 CLBs have the same setup as Virtex-5 (SLICEL and SLICEM). Each Virtex-6 slice adds four storage elements, which can only be configured as edge-triggered D flip-flops. Their D inputs are driven by the LUT outputs or by the slice bypass inputs AX-DX.

FPGA DESIGN COMPARISON: VIRTEX-5, VIRTEX-6, AND SPARTAN-6
Spartan-6 CLB columns are split into two columns: one column of a new slice type, SLICEX, and one column of alternating SLICEL and SLICEM. SLICEX is a basic slice without carry logic.

BACK TO VIRTEX-5 CLB LUT
Up to 207,360 6-input LUTs, with more than 13 million configuration bits. Each LUT can be configured as two 5-input LUTs that share inputs and have separate outputs (dual-output mode). In single 6-input LUT mode, O6 is the primary output.

[Figure: 6-input LUT built from two 5-input LUTs (LUT 1 and LUT 2) with their inputs, an output MUX whose select line is A6, and the separate 5-input LUT output]
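The dual-output arrangement can be modeled with a 64-bit configuration word (INIT): O6 uses all six inputs, while O5 reads only the lower 32 bits using A1-A5. Tying A6 high makes O6 read the upper 32 bits, so the two outputs become independent 5-input functions of the shared inputs A1-A5. This mirrors the documented behavior of Xilinx's LUT6_2 primitive as we understand it; the functions below and their argument order are illustrative assumptions.

```python
def lut6_2(init, a):
    """Return (O6, O5) for a 6-input LUT with 64-bit configuration `init`.

    a = [A1, A2, A3, A4, A5, A6], with A1 the least significant address bit.
    """
    addr5 = sum(bit << i for i, bit in enumerate(a[:5]))
    addr6 = addr5 | (a[5] << 5)
    o6 = (init >> addr6) & 1
    o5 = (init >> addr5) & 1  # lower 32 bits only (LUT 1)
    return o6, o5

def pack_dual(f_o5, f_o6):
    """Pack two 5-input functions into one INIT for dual-output use (A6 = 1)."""
    init = 0
    for addr in range(32):
        inputs = [(addr >> i) & 1 for i in range(5)]
        init |= (f_o5(*inputs) & 1) << addr         # lower half drives O5
        init |= (f_o6(*inputs) & 1) << (addr + 32)  # upper half reaches O6 when A6 = 1
    return init
```

For example, packing a 2-input AND into O5 and a 3-input XOR into O6 and driving A6 = 1 yields both functions from a single physical LUT.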

LUT SCHEMATIC SIMULATION
[Figure: simulation of the LUT configured as a logical AND and as a logical OR]

VIRTEX-5 PROGRAMMABLE I/O
The I/O cells in Virtex-5 have output logic blocks (OLOGIC), input logic blocks (ILOGIC), I/O delay blocks, and a bidirectional I/O buffer. Two I/O cells are grouped to form a single I/O tile.
- OLOGIC implements registers to improve system clock-to-output timing and supports single data rate (SDR) and double data rate (DDR) transmission of data. In serializer/deserializer (SerDes) mode, it can perform parallel-to-serial conversion of output data (2 to 6 bits).
- ILOGIC implements registers to improve setup and hold times and supports SDR and DDR reception of data. In SerDes mode, it can perform serial-to-parallel conversion of input data (2 to 6 bits).
- In master/slave mode, the two I/O cells in the same I/O tile are connected via dedicated shift routing to support larger data widths.
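The parallel-to-serial and serial-to-parallel conversions can be sketched at the bit level. The model below serializes a parallel word least significant bit first and, in DDR mode, emits two bits per clock cycle (one on each edge), so a 4-bit word takes two cycles instead of four. The bit order, function names, and cycle accounting are assumptions of this sketch, not the exact Virtex-5 OSERDES/ISERDES behavior.

```python
def serialize(word, width, ddr=False):
    """Parallel-to-serial: return a list of per-clock-cycle bit groups.

    SDR sends one bit per cycle; DDR sends two (rising then falling edge).
    Bits go out least significant first (an assumption of this sketch).
    """
    bits = [(word >> i) & 1 for i in range(width)]
    per_cycle = 2 if ddr else 1
    return [bits[i:i + per_cycle] for i in range(0, width, per_cycle)]

def deserialize(cycles):
    """Serial-to-parallel: rebuild the word from the received bit groups."""
    bits = [b for group in cycles for b in group]
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `serialize(0b1011, 4)` gives `[[1], [1], [0], [1]]` (four SDR cycles), while `serialize(0b1011, 4, ddr=True)` gives `[[1, 1], [0, 1]]` (two DDR cycles).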

VIRTEX-5 PROGRAMMABLE I/O


VIRTEX-5 FPGA INTERCONNECTION NETWORK
Global routing consists of:
- Long lines: have three kinds of connections (beginning, middle, and end), with five connections into a switch matrix between the beginning and the end. They can source in all four directions of the FPGA from a switch matrix. Every direction has 10 BEGs, MIDs, and ENDs (all bidirectional), for a total of 240 wire segments per switch matrix. A long line spans 24 rows/columns of components, with a switch matrix connection at every sixth component.
- Double lines: span three columns/rows of components, with a connection to the switch matrix for each component.
- Hex lines: have three connections into a switch matrix, similar to long lines, and can source in all four directions from a switch matrix. They span six rows or columns of components.

VIRTEX-5 FPGA INTERCONNECTION NETWORK PIPs

HANDS ON DEMONSTRATION

FUTURE FPGA DEVELOPMENT
Moore's law states that the number of transistors on an integrated circuit doubles roughly every two years. How can this trend continue? 3D integrated circuits.

2D INTEGRATED CIRCUIT
[Figure: cross-section of a 2D IC showing metal layers 1 through 6 stacked above the active device layer on the Si substrate]

TRANSISTORS NO LONGER DOMINATE, METAL INTERCONNECTIONS TOOK OVER

DESIGN COSTS INCREASE AS TECHNOLOGY GETS SMALLER

IC DESIGNS DECREASE

FPGAS SEE DIMINISHING BENEFITS WITH SCALING
- About 90% of FPGA logic area is programmable interconnect
- The performance and power penalties are a direct result of this area (70% in Virtex-2)
- Interconnect must grow faster than the number of gates to keep up (Rent's rule)
[Figure: dynamic power in Virtex-2 (Shang, FPGA '02): interconnect 60%, logic 16%, clocking 14%, IOB 10%]

CROSS-TALK INCREASES AS TECHNOLOGY GETS SMALLER

3D INTEGRATED CIRCUITS
- More functionality in a smaller space: extends Moore's law
- More transistors in a package: larger designs
- Shorter interconnects: less RC delay and better chip performance
- Lower power: shorter wires reduce power consumption because they present less capacitance (and less inductance)
- Bandwidth: a large number of vertical vias between layers allows wide-bandwidth buses between functional blocks in different layers

3D INTEGRATED CIRCUIT
[Figure: cross-section of a 3D IC showing, from top to bottom, metal layers, device layer 2, metal layers, device layer 1, and the Si substrate]

Young-Su KWON (MIT) 2005

NUPGA ARCHITECTURE (ACHIEVE THE SAME DENSITIES AS AN ASIC DESIGN?)
NuPGA uses a graphite-based memory process, developed for creating reprogrammable memory elements, that is now being used to make anti-fuses for 3D FPGAs. An anti-fuse starts as an open circuit and is programmed into a low-resistance connection by applying a high voltage. Because the anti-fuses lie above the logic, the interconnect density can rival that of ASICs. The problem is that the high-voltage programming transistors take up a lot of area, negating the density boost. NuPGA claims to have solved this problem by burying the programming transistors in a 3D foundation layer beneath the FPGA circuitry.

QUESTIONS?