The Xilinx XC6200 chip, the software tools and the board development tools


What is an FPGA? A Field Programmable Gate Array: a fully programmable alternative to a customized chip, used to implement functions in hardware. Also called a Reconfigurable Processing Unit (RPU).

Reasons to use an FPGA: hardwired logic is very fast; it can interface to the outside world; it supports custom hardware and peripherals; it serves as glue logic to custom co-processors; it can perform bit-level and systolic operations that are not suited to a traditional CPU/MPU.

The Xilinx XC6200 RPU: SRAM-based FPGA; fast, unlimited reconfiguration; dynamic and partially reconfigurable logic; microprocessor interface; symmetrical, hierarchical and regular structure.

Agenda: XC6200 Architecture; Design Flows; Library Support; Applications; Reconfigurable Processing.

Typical Embedded Control Design: CPU, memory, I/O and a reconfigurable coprocessor (FPGA), with I/O to camera and sensors and to motors and actuators. What are the problems with this?

6 Problems Confronting Embedded Control Designers Today (system: CPU, memory, I/O, and a reconfigurable coprocessor (FPGA) with its I/O)
1. Reconfiguration from external memory limited to low frequency
2. Microprocessor interface consumes resources
3. Insufficient memory capacity for coprocessing algorithms
4. High frequency access to registers needed
5. Bus access to large number of internal registers requires careful design
6. Partial reconfiguration is difficult

XC6200 System Features Meet these requirements of Embedded Coprocessing: 1000x improvement in reconfiguration time from external memory; microprocessor interface built in; high capacity distributed memory permits allocation of chip resources to logic or memory; FastMAP™ assures high speed access to all internal registers; all registers accessed via built-in low-skew FastMAP™ busses; ultrafast partial reconfiguration fully supported. Up to 100,000 gates!

XC6200 Architectural Overview: Array of fine grain function cells, each with a register: high gate count for structured logic or regular arrays. Abundant, hierarchical routing resources: global/local; use cell logic or use switches. Flexible pin configuration: programmable as in, out, bidirectional, tristate; CMOS or TTL logic levels.

XC6200 Architecture (cont): High speed CPU interface for configuration and register I/O; programmable bus width (8..32 bits); direct processor read/write access to all user registers; all user registers and configuration SRAM mapped into processor address space.

XC6200 Architecture: Hierarchy of tiles, blocks and cells. [Figure: 16x16 tiles made of 4x4 blocks of function cells, surrounded by user I/Os, with the FastMAP™ processor interface (address, data, control). The number of tiles varies between devices in the family.]

XC6200 Functional Unit Design based on the fact that any function of two Boolean variables can be computed by a 2:1 MUX.
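The claim above can be checked by brute force. The following is a minimal sketch, not the device's actual cell configuration: it assumes each multiplexer port may be driven by a constant or by either variable in true or inverted form (the names `SOURCES` and `find_mux_realizations` are illustrative only).

```python
from itertools import product

def mux(sel, d0, d1):
    """2:1 multiplexer: returns d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0

# Signals assumed to be available at each mux port: the constants and
# each variable in true or inverted form (an assumption of this sketch).
SOURCES = {
    "0": lambda a, b: 0,
    "1": lambda a, b: 1,
    "a": lambda a, b: a,
    "~a": lambda a, b: 1 - a,
    "b": lambda a, b: b,
    "~b": lambda a, b: 1 - b,
}

def find_mux_realizations():
    """Map each 2-input truth table (a 4-tuple) to one mux wiring."""
    found = {}
    for sel, d0, d1 in product(("a", "b"), SOURCES, SOURCES):
        # Truth table in the order (a, b) = 00, 01, 10, 11.
        table = tuple(
            mux(SOURCES[sel](a, b), SOURCES[d0](a, b), SOURCES[d1](a, b))
            for a, b in product((0, 1), repeat=2)
        )
        found.setdefault(table, (sel, d0, d1))
    return found

realizations = find_mux_realizations()
for table, (sel, d0, d1) in sorted(realizations.items()):
    print(table, f"sel={sel} d0={d0} d1={d1}")
```

Under these assumptions the search finds a wiring for every one of the 16 possible truth tables; XOR, for example, is a mux that selects between one variable and its inverse.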

XC6200 Unit Cell: Each unit cell contains a computation unit (D-type register, 2-input logic function) and nearest neighbor interconnection, and is individually programmable from the host interface (µP). [Figure signals: CLR, CLK.]

Logical Organization: Basic Cell. From buses

Logical Organization: XC6200 Function Unit. The function unit allows: any function of 2 variables; any flavour of 2:1 mux; buffers, inverters, or constant 0s and 1s; any of the above in addition to a D-type register. 3 inputs, each from any of 8 directions; output to up to 4 directions. [Figure annotations: programmable input; redirect output.]

Logical Organization: Function Unit. [Figure 6: XC6200 Function Unit. Annotations: 2 bit programming; C = combinational, S = sequential; Y3 MUX; 2+2+1+1 bits.]

Logical Organization: Function Unit. (cont) Three basic modes

[Figure: cell logic function table. Annotations: on any 2 of these arguments; shows what is selected by the corresponding mux; 0 for fast multiplexers; sel.]

Logical Organization: Function Unit. (cont)

Physical Organization: Cells, Blocks and Tiles Bus of length 4

XC6200 Architecture: Large array of simple, configurable cells (sea of gates). Each cell: D-type register, logic function, nearest-neighbor interconnection. Cells are grouped in 4x4 blocks with regular connections to nearest neighbors.

XC6200 Architecture: 16 (4x4) neighbor-connected cells are grouped together to form a larger cellular array. Communication lanes are available between neighboring 4x4 cell blocks.

Routing Resources Example

Physical Organization: Cells, Blocks and Tiles (cont)

XC6200 Architecture: A 4x4 array of the previously shown 4x4 blocks forms a 16x16 block. Length-16 FastLANEs connect these larger arrays.

XC6200 Architecture: A 4x4 array of the 16x16 blocks forms the central 64x64 cell array. Chip-length FastLANEs connect them. The central block is surrounded by I/O pads.
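To make the cell, 4x4 block, 16x16 block nesting concrete, here is a small sketch that splits a cell coordinate into its position at each level of a 64x64 array. The (row, column) convention and the function name `locate_cell` are assumptions for illustration; this is not the device's actual address encoding.

```python
def locate_cell(row, col, array_size=64):
    """Split a cell coordinate into its position at each hierarchy level."""
    if not (0 <= row < array_size and 0 <= col < array_size):
        raise ValueError("cell outside the array")
    return {
        # Which 16x16 block of the array the cell lies in.
        "block_16x16": (row // 16, col // 16),
        # Which 4x4 block inside that 16x16 block.
        "block_4x4": ((row % 16) // 4, (col % 16) // 4),
        # Position of the cell inside its 4x4 block.
        "cell": (row % 4, col % 4),
    }

# Cell counts implied by the hierarchy: 16 per 4x4 block,
# 256 per 16x16 block, 4096 in the 64x64 array.
CELLS_PER_LEVEL = [4 * 4, 16 * 16, 64 * 64]

print(locate_cell(37, 10))
print(CELLS_PER_LEVEL)
```

A signal between two cells that share the same 4x4 block can stay on the lowest level of routing, while cells in different 16x16 blocks would need the longer lanes.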

XC6200 Routing: Each level of hierarchy has its own associated routing resources (unit cells, 4x4, 16x16, 64x64 cell blocks). Routing does not use a unit cell's resources. Switches at the edge of the blocks provide for connections between the levels of interconnect.

Routing Switches:

North and South Switches: This slide shows what is connected to routing switches

East and West Switches:

Clock Distribution: Clock connections are fixed, for speed

Clear Distribution:

Input/Output Architecture: Flexible pin configuration: programmable as in, out, bidirectional, tristate; CMOS or TTL logic levels.

Connections Between IOBs and Built-In XC6200 Control Logic: This is programmed for input only.

Array Data Sources In West IOBs:

XC6200 Device Organization: Easy interface to µP. [Figure: conceptual view (RAM interface, logic array, programmable I/O) and logic symbol with signals OE, Reset, G1, G2, GClk, GClr, CS, RdWr, A(15:0), D(31:0), I/O. Annotation: RAM on board.]

FastMAP CPU Interface: The industry's only random access configuration interface. Allows for extremely fast full or partial device configuration: you only program the bits you need. Allows direct CPU (random) access to user registers; supports coprocessing applications.

FastMAP CPU Interface (cont): Easily interfaced to most microprocessors and microcontrollers. The memory mapped architecture makes it just like designing with SRAM.

FastMAP (cont): The Map Register allows mapping of user registers on to an 8, 16, or 32 bit data bus. It allows unconstrained register placement and obviates the need for complex shift and mask operations. [Figure: map register bit pattern, data bus bits 7 to 0, user-defined register, cell array cells.]
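The idea of a map register can be sketched in software as a scatter/gather over one column of cell registers. This is only an illustration: the bit polarity (a set bit selects a row), the bit ordering (lowest selected row maps to bus bit 0) and all function names are assumptions, not details taken from the slides or the data sheet.

```python
def selected_rows(map_mask, n_rows):
    """Rows whose bit is set in the (hypothetical) map mask, lowest first."""
    return [r for r in range(n_rows) if (map_mask >> r) & 1]

def write_register(column, map_mask, value):
    """Scatter the low bits of `value` into the selected rows of `column`."""
    for bit, row in enumerate(selected_rows(map_mask, len(column))):
        column[row] = (value >> bit) & 1

def read_register(column, map_mask):
    """Gather the selected rows of `column` into a contiguous bus word."""
    value = 0
    for bit, row in enumerate(selected_rows(map_mask, len(column))):
        value |= column[row] << bit
    return value

column = [0] * 16                 # one column of 16 cell registers
mask = 0b1000_0100_0010_0001      # register bits placed in rows 0, 5, 10, 15
write_register(column, mask, 0b1011)
print(column, read_register(column, mask))
```

Without such a mapping the host would have to read the whole column and shift and mask the scattered bits itself, which is the work the slide says the Map Register obviates.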

FastMAP (cont): Wildcard Registers allow don't cares on address bits, so the same data can be written to several locations (SRAM and user registers) in one cycle. This gives fast configuration of bit-slice type designs and broadcast of data to registers without tying up valuable routing resources.
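A wildcard write can be modeled as a single write that lands on every address matching the given address in all bits not marked as don't care. The sketch below assumes a plain bit mask in which a set bit means "don't care"; the function names and the 6-bit address width are illustrative only.

```python
def wildcard_targets(address, wildcard_mask, addr_bits):
    """All addresses equal to `address` on every bit not marked don't-care."""
    free = [b for b in range(addr_bits) if (wildcard_mask >> b) & 1]
    base = address & ~wildcard_mask          # clear the don't-care bits
    targets = []
    for combo in range(1 << len(free)):      # every setting of the free bits
        addr = base
        for i, b in enumerate(free):
            if (combo >> i) & 1:
                addr |= 1 << b
        targets.append(addr)
    return sorted(targets)

def broadcast_write(memory, address, wildcard_mask, data, addr_bits=6):
    """Model one wildcard write: store `data` at every matching address."""
    for addr in wildcard_targets(address, wildcard_mask, addr_bits):
        memory[addr] = data

memory = {}
broadcast_write(memory, 0b000101, 0b110000, 0xAB)
print(sorted(memory))
```

With two don't-care bits, one write reaches four locations, which is how identical bit slices could be configured in a quarter of the write cycles.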

Partial Run-time Reconfiguration: Extend hardware to a larger (virtual) capacity through rapid reconfiguration. Derive time-varying structures that are smaller and faster than the ASIC counterpart. Make more transistors participate in a given computation.

Partial Run-time Reconfiguration. [Figure at time = 0: functions F2, F3, F4, F5, F6 loaded.]

Partial Run-time Reconfiguration. [Figure a short time later: functions F2, F4, F5, F6, F7, F8, F9 loaded.]

Reconfiguration Speed vs Traditional Technologies. [Figure: time to reconfigure (axis ticks ns, us, ms, s) against size to reconfigure (rewiring, circuit updates, block swapping, design swapping). Data labels: XC4013, XC6216, 40 ns, 200 us, 250 ms.]

XC6200 Family Members

Device                  XC6209  XC6216  XC6236  XC6264
Approx. gate count      9k      16k     36k     64k
Number of cells         2304    4096    9216    16384
Max. no. of registers   2304    4096    9216    16384
Number of IOBs          192     256     384     512
Cell rows x columns     48x48   64x64   96x96   128x128

Notes: 1. Gate counts are estimated average cases, based on LSI Logic figures; register rich designs can have a much higher equivalent gate count than stated above. 2. Not all IOBs are connected directly to pads; some pads are shared between IOBs.

Design Flows. [Figure: schematic capture, macro libraries, VHDL synthesis, delay file, hierarchical EDIF, XACTstep Series 6000, device configuration.] This shows the flow of information among tools.

Library Support: Primitive gates and functions (compatible with other Xilinx parts): AND, OR, ADD, MULT, etc. More complex macros also to be available: memory access; DSP functions (FIR, FFT, DCT); JTAG, decoders, etc.

Applications: Can be used as a regular FPGA: serial interface allows for booting from PROM. Intended to act as a hardware accelerator for microprocessors: FastMAP allows for direct microprocessor access to internal logic and fast reconfiguration of all or part of the device.

Applications (cont): Context switching and virtual hardware are realistic propositions. Typical uses might include: DSP, image processing, data paths, etc.

Reconfigurable Processing: Custom computing concept, building on fast configuration and virtual hardware. PCI based development system to be made available: can be used as a custom computer in its own right, or as an aid to system development for customers' designs.

XC6000 Software: XACT6000 software from Xilinx. Trianus/Hades design entry software for the XC6200. Velab: free VHDL elaborator for the XC6200. XC6200 Inspector.

EXAMPLE

A Multiplier for the XC6200: Structure; Math; Building Lookup Tables; Area Optimization; Mapping into an XC6200; Changing Coefficients; Performance; Summary.

Distributed Arithmetic (Multiplier). [Figure: 8-bit data, split 4 + 4, addresses two 16 x 12 LUTs whose 12-bit outputs feed a 12-bit adder.]

Math (colors relate to the previous slide). [Figure labels: Class Constant, LUT-B Input, LUT-A Input, LUT-B Output, LUT-A Output, Adder Output.]

Architecture of the Multiplier: A and B are arguments, pipelined. [Figure: M[7:0] is split into M[7:4] and M[3:0], which address the pipelined lookup tables LUT-B and LUT-A. After a pipeline register, the pipelined adder forms P[3:0] from A[3:0], P[7:4] from A[7:4] and B[3:0] (4-bit half adder), P[11:8] from A[11:8], B[7:4] and carry (4-bit full adder), and P[15:12] from B[11:8] and carry (4-bit half adder).]
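The structure of this multiplier can be modeled in a few lines. The sketch below is a behavioral model only, ignoring pipelining and timing: each 16 x 12 LUT holds the constant times a 4-bit nibble, and the LUT-B output is offset by four bits before the add. The function names are illustrative; the nibble-wise adder follows the bit ranges shown in the figure labels.

```python
def build_lut(coefficient):
    """16-entry table of coefficient * nibble; each entry fits in 12 bits."""
    return [coefficient * nibble for nibble in range(16)]

def lut_multiply(m, lut):
    """Multiply the 8-bit value m by the constant baked into `lut`."""
    a = lut[m & 0xF]           # LUT-A: partial product of M[3:0]
    b = lut[(m >> 4) & 0xF]    # LUT-B: partial product of M[7:4]
    return staged_add(a, b)

def staged_add(a, b):
    """Compute a + (b << 4) one nibble at a time, passing carries upward."""
    p0 = a & 0xF                                            # P[3:0] = A[3:0]
    s1 = ((a >> 4) & 0xF) + (b & 0xF)                       # A[7:4] + B[3:0]
    s2 = ((a >> 8) & 0xF) + ((b >> 4) & 0xF) + (s1 >> 4)    # A[11:8] + B[7:4] + carry
    s3 = ((b >> 8) & 0xF) + (s2 >> 4)                       # B[11:8] + carry
    return p0 | ((s1 & 0xF) << 4) | ((s2 & 0xF) << 8) | ((s3 & 0xF) << 12)

lut = build_lut(3)
print(lut_multiply(200, lut))
```

Because the same table serves both nibbles, changing the coefficient only means rewriting the LUT contents; the adder is untouched.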

LUTs by Muxing: The lookup table contains all pre-calculated partial products. Use a truth table to determine the mux inputs. All logic is realized as mux trees: each output bit Pi is a tree of multiplexers selected by A0, A1, A2 and A3.

All possible products for multiplying by 0011 (3):

A3 A2 A1 A0 | P7 P6 P5 P4 P3 P2 P1 P0
 0  0  0  0 |  0  0  0  0  0  0  0  0
 0  0  0  1 |  0  0  0  0  0  0  1  1
 0  0  1  0 |  0  0  0  0  0  1  1  0
 0  0  1  1 |  0  0  0  0  1  0  0  1
 0  1  0  0 |  0  0  0  0  1  1  0  0
 0  1  0  1 |  0  0  0  0  1  1  1  1
 0  1  1  0 |  0  0  0  1  0  0  1  0
 0  1  1  1 |  0  0  0  1  0  1  0  1
 1  0  0  0 |  0  0  0  1  1  0  0  0
 1  0  0  1 |  0  0  0  1  1  0  1  1
 1  0  1  0 |  0  0  0  1  1  1  1  0
 1  0  1  1 |  0  0  1  0  0  0  0  1
 1  1  0  0 |  0  0  1  0  0  1  0  0
 1  1  0  1 |  0  0  1  0  0  1  1  1
 1  1  1  0 |  0  0  1  0  1  0  1  0
 1  1  1  1 |  0  0  1  0  1  1  0  1
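A lookup built from multiplexers can be sketched as follows: the 16 truth-table entries of one output bit are the constant leaves of a tree, and A0 to A3 select one level each. This is a behavioral illustration with made-up function names, not the mapping the CAD tool produces.

```python
def mux(sel, d0, d1):
    """2:1 multiplexer: returns d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0

def mux_tree(leaves, selects):
    """Reduce 2**n constant leaves with n select bits, lowest select first."""
    level = list(leaves)
    for s in selects:
        level = [mux(s, level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def product_bit_leaves(coefficient, bit):
    """Truth-table column of output bit `bit` of coefficient * A, A = 0..15."""
    return [((coefficient * a) >> bit) & 1 for a in range(16)]

def lut_lookup(coefficient, a, width=8):
    """Evaluate every output bit through its own mux tree and reassemble."""
    selects = [(a >> i) & 1 for i in range(4)]      # A0, A1, A2, A3
    return sum(
        mux_tree(product_bit_leaves(coefficient, bit), selects) << bit
        for bit in range(width)
    )

print([lut_lookup(3, a) for a in range(16)])
```

Each unoptimized tree uses 8 + 4 + 2 + 1 = 15 two-input multiplexers per output bit, which is what the optimization on the next slide reduces.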

Optimizing the Lookup: Two mux levels can be collapsed into a single gate. The function can be determined with a truth table (the multiply-by-3 table from the previous slide is shown again). [Figure: "No optimization" shows the full mux tree on A0, A1, A2, A3 producing Px; "Optimized" shows cell functions labeled XOR (Func1), NAND (Func2), OR (Func3), BUF (Func4).] Logic synthesis is done by the CAD tool.
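One way to see the collapse: for a fixed (A3, A2), the two lowest mux levels compute some 2-input function of A1 and A0, which a single cell can implement directly. The sketch below reads that function off the truth-table column. The naming table is partial and the helper names are invented for illustration; a real tool would also share and minimize logic across bits.

```python
# Truth tables indexed by (A1, A0) = 00, 01, 10, 11.
NAMES = {
    (0, 0, 0, 0): "ZERO", (1, 1, 1, 1): "ONE",
    (0, 1, 0, 1): "BUF A0", (1, 0, 1, 0): "INV A0",
    (0, 0, 1, 1): "BUF A1", (1, 1, 0, 0): "INV A1",
    (0, 0, 0, 1): "AND", (1, 1, 1, 0): "NAND",
    (0, 1, 1, 1): "OR", (1, 0, 0, 0): "NOR",
    (0, 1, 1, 0): "XOR", (1, 0, 0, 1): "XNOR",
}

def collapsed_gates(coefficient, bit):
    """Name the 2-input gate replacing the A0/A1 mux levels for each (A3, A2)."""
    column = [((coefficient * a) >> bit) & 1 for a in range(16)]
    # Rows 4k..4k+3 share A3 and A2 and step through A1, A0.
    return [NAMES.get(tuple(column[i:i + 4]), "other") for i in range(0, 16, 4)]

for bit in range(6):
    print(f"P{bit}:", collapsed_gates(3, bit))
```

For the multiply-by-3 table, P0 collapses to a buffer of A0 and P1 to an XOR of A1 and A0 for every (A3, A2), so the upper mux levels of those two bits become redundant as well.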

Multiplier Schematic: The schematic resembles the block diagram: two LUTs (LUT-A, LUT-B) sourcing the adder. In the corresponding view in the layout editor, the LUTs are offset to line up bits for the adder. Pipeline registers are cheap: the XC6216 has 4096 flip-flops.

A Closer Look at a Lookup: Each 12-bit LUT is built from 12 one-bit LUTs. LUTs get stacked vertically.

Determining Coefficients: Schematic for a single 4-input LUT. Functions can be determined from the truth table.

A3 A2 A1 A0 | P
 0  0  0  0 | 0
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 1
 0  1  0  0 | 1
 0  1  0  1 | 0
 0  1  1  0 | 1
 0  1  1  1 | 0
 1  0  0  0 | 1
 1  0  0  1 | 0
 1  0  1  0 | 0
 1  0  1  1 | 1
 1  1  0  0 | 0
 1  1  0  1 | 1
 1  1  1  0 | 1
 1  1  1  1 | 0

Changing Coefficients: The functionality of a cell is contained in one byte, so a 32-bit access can change the function of 4 cells (Func1 to Func4) per write cycle. 96 cells need writing (4 x 24 = 96), or 24 write cycles in the worst case: 1.45 µs assuming 33 MHz.
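The arithmetic behind these figures can be reproduced as below. Note one assumption that is not stated on the slide: 24 single-clock writes at 33 MHz would take about 0.73 µs, so the sketch assumes two clock periods per write cycle, which is the value that reproduces the quoted 1.45 µs. The function and parameter names are illustrative.

```python
def coefficient_update(cells=96, bytes_per_cell=1, bus_bits=32,
                       clock_hz=33e6, clocks_per_write=2):
    """Return (write cycles, seconds) needed to rewrite `cells` cell functions.

    clocks_per_write=2 is an assumption chosen to match the 1.45 us figure.
    """
    cells_per_write = bus_bits // (8 * bytes_per_cell)   # 4 cells per 32-bit write
    writes = -(-cells // cells_per_write)                # ceiling division
    seconds = writes * clocks_per_write / clock_hz
    return writes, seconds

writes, seconds = coefficient_update()
print(writes, f"{seconds * 1e6:.2f} us")
```

Narrower bus settings scale the count directly: with an 8-bit bus the same update would need 96 write cycles under the same assumptions.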

Summary on Multiplier Design: 8x8 constant coefficient multiplier. Pipelined: 75+ MHz performance. Small grain architecture: high degree of LUT optimization. Coefficients easily changed: fast reconfiguration times. High performance per dollar.

Complete Hardware Systems

EXAMPLE: H.O.T. Works. Development system based on the Xilinx XC6200-series RPU. Includes: H.O.T. Works Configurable Computer Board; H.O.T. Works Development System Software.

H.O.T. Works Board: Interfaces with a host system (Windows 95-based PC) on the PCI bus. 2MB SRAM (memory); XC6200 (RPU); PCI controller on XC4000 (FPGA); expansion through Mezzanine connector.

Memory Access Modes


H.O.T. Works Software: Xilinx XACTStep: map, place and route for XC6200. Velab: structural VHDL elaborator. WebScope: Java-based debug tool. H.O.T. Works Development System: C++-based API for board interfacing.

Design Flow

Run-Time Programming: C++ support software is provided for low-level board interface and device configuration. The digital design is downloaded to the board at execution time. User-level routines must be written to conduct data input/output and control.

Conclusions: The Xilinx XC6200 provides a fast and inexpensive method to obtain great speedups in certain classes of algorithms. There exist tools that provide a usable development platform to go from structural VHDL to digital design, and a programmable run-time interface in C++.

Sources: Mark L. Chang <mchang@ece.nwu.edu>, Stephen Churcher, Ahmad Alsolaim