Introduction to Modern FPGAs

Similar documents
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

INTRODUCTION TO FPGA ARCHITECTURE

Review. EECS Components and Design Techniques for Digital Systems. Lec 03 Field Programmable Gate Arrays

Field Programmable Gate Array (FPGA)

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Digital Integrated Circuits

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

Programmable Logic. Any other approaches?

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

EE178 Lecture Module 2. Eric Crabill SJSU / Xilinx Fall 2007

EECS150 - Digital Design Lecture 16 Memory 1

FPGA architecture and design technology

EECS150 - Digital Design Lecture 16 - Memory

Topics. Midterm Finish Chapter 7

The Xilinx XC6200 chip, the software tools and the board development tools

Spiral 2-8. Cell Layout

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA: What? Why? Marco D. Santambrogio

Basic FPGA Architecture Xilinx, Inc. All Rights Reserved

Programmable Logic Devices

Introduction to Field Programmable Gate Arrays

Chapter 5: ASICs Vs. PLDs

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline

Review from last time. CS152 Computer Architecture and Engineering Lecture 6. Verilog (finish) Multiply, Divide, Shift

INTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)

Design Methodologies. Full-Custom Design

An Introduction to Programmable Logic

Field Programmable Gate Array

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Programmable Logic Devices UNIT II DIGITAL SYSTEM DESIGN

Design Methodologies and Tools. Full-Custom Design

FPGA How do they work?

FPGA Architecture Overview. Generic FPGA Architecture (1) FPGA Architecture

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx

FPGA Based Digital Design Using Verilog HDL

CS310 Embedded Computer Systems. Maeng

PLAs & PALs. Programmable Logic Devices (PLDs) PLAs and PALs

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I

CAD for VLSI. Debdeep Mukhopadhyay IIT Madras

Outline. Field Programmable Gate Arrays. Programming Technologies Architectures. Programming Interfaces. Historical perspective

Product Obsolete/Under Obsolescence

Xilinx DSP. High Performance Signal Processing. January 1998

Memory and Programmable Logic

LSN 6 Programmable Logic Devices

EITF35: Introduction to Structured VLSI Design

Chapter 13 Programmable Logic Device Architectures

TSEA44 - Design for FPGAs

Embedded Systems: Hardware Components (part I) Todor Stefanov

Memory and Programmable Logic

ECEN 449 Microprocessor System Design. FPGAs and Reconfigurable Computing

DIGITAL DESIGN TECHNOLOGY & TECHNIQUES

PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES

Benefits of Embedded RAM in FLEX 10K Devices

Workspace for '4-FPGA' Page 1 (row 1, column 1)

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today.

Hardware Design with VHDL PLDs IV ECE 443

FPGA Implementations

Summary. Introduction. Application Note: Virtex, Virtex-E, Spartan-IIE, Spartan-3, Virtex-II, Virtex-II Pro. XAPP152 (v2.1) September 17, 2003

Outline. EECS Components and Design Techniques for Digital Systems. Lec 11 Putting it all together Where are we now?

ECEU530. Project Presentations. ECE U530 Digital Hardware Synthesis. Rest of Semester. Memory Structures

Chapter 2. FPGA and Dynamic Reconfiguration ...

Field Program mable Gate Arrays

Virtex-II Architecture

VHX - Xilinx - FPGA Programming in VHDL

Topics. Midterm Finish Chapter 7

ECE 331 Digital System Design

The Virtex FPGA and Introduction to design techniques

VHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko

Very Large Scale Integration (VLSI)

ADVANCED FPGA BASED SYSTEM DESIGN. Dr. Tayab Din Memon Lecture 3 & 4

Programmable Logic. Simple Programmable Logic Devices

Power Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas

CMPE 415 Programmable Logic Devices Introduction

CHAPTER - 2 : DESIGN OF ARITHMETIC CIRCUITS

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

FYSE420 DIGITAL ELECTRONICS. Lecture 7

DIGITAL CIRCUIT LOGIC UNIT 9: MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES

Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool

Digital Design Methodology (Revisited) Design Methodology: Big Picture

Digital Design Methodology

ECE 485/585 Microprocessor System Design

The Nios II Family of Configurable Soft-core Processors

Design Methodologies

ECE 637 Integrated VLSI Circuits. Introduction. Introduction EE141

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

More Course Information

Altera FLEX 8000 Block Diagram

Section 6. Memory Components Chapter 5.7, 5.8 Physical Implementations Chapter 7 Programmable Processors Chapter 8

ECE 645: Lecture 1. Basic Adders and Counters. Implementation of Adders in FPGAs

Xilinx ASMBL Architecture

Digital Electronics 27. Digital System Design using PLDs

Field Programmable Gate Arrays (FPGAs)

FABRICATION TECHNOLOGIES

Outline of Presentation Field Programmable Gate Arrays (FPGAs(

EECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007

Transcription:

Introduction to Modern FPGAs Arturo Díaz Pérez Centro de Investigación y de Estudios Avanzados del IPN Departamento de Ingeniería Eléctrica Sección de Computación adiaz@cs.cinvestav.mx Outline Technology Evolution Field Programmable Gate-Arrays Applications Trends Future Processors Conclusions

Performance 1000 100 Moore s Law 10 1 Processor-DRAM Memory Gap (latency) 1980 1981 1982 1983 1984 1985 Moore s Law Less Law? 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 Time DRAM CPU 2001 2002 2003 µproc 60%/yr. (2X/1.5yr) Processor-Memory Performance Gap: (grows 50% / year) DRAM 9%/yr. (2X/10 yrs) Today Conventional Microprocessors Instructions sets Advanced memory systems (L3 caches, memory, virtual memory) Advanced Instruction Level Parallelism (pipelining, superscalar, vectors, VLIW, speculative, branch prediction) Interconnection Technology Basic parallel processing

Modern Processor Architecture Alpha 21264 500Mhz Hardware Trends In 1994 Semiconductor Industry Association predicted that by 2010 800-million-transistor processors Thousands of pins 1,000-bit bus Clock speed over 2 GHz 180 W

What will integrated circuits be at 2010? Microprocessors will have more than one billion logic transistors on a single chip Transistors and wires will have feature sizes less than a tenth of a micron The time to send a signal along an on-chip wire will become proportionally much greater than that needed for a transistor to switch Off-chip communication will become relatively slower Minimizing power dissipation and the resultant heat will be enormous, despite reduce voltage levels Hardware Trends and Physical Limits On-chip wires are becoming much slower relative to logic gates as the onchip devices shrink. It will be impossible to keep one global clock over the entire chip It will be necessary to distribute silicon area among independent but coordinated different modules

In 10 Years! Using the Silicon PE PE MMX FFT VIZ RC5 M More Cache 64-way Superscalar CISC PE PE PE PE PE PE M MPP PE PE M Vector PE Reconfigurable Logic M Reconfigurable Processor

Using the Silicon PE MMX PE FFT VIZ RC5 Reconfigurable Logic M M CISC Reconfigurable Processor FPGA: Field Programmable Gate-Array VirtexE: ~ 3.2 Millions of gates <= 804 I/O Pins ~ 832 Kb RAM

Why FPGAs? (1 / 5) By the early 1980 s most of logic circuits in typical systems were absorbed by a handful of standard large scale integrated circuit s (LSI ICs). Microprocessors, bus/io controllers, system timers,... Every system still needed random small glue logic ICs to help connect the large ICs: generating global control signals (for resets etc.) data formatting (serial to parallel, multiplexing, etc.) Systems had a few LSI components and lots of small low density SSI (small scale IC) and MSI (medium scale IC) components. Printed Circuit (PC) board with many small SSI and MSI ICs and a few LSI ICs Why FPGAs? (2 / 5) Custom ICs sometimes designed to replace glue logic: reduced complexity/manufacturing cost, improved performance But custom ICs expensive to develop, and delay introduction of product ( time to market ) because of increased design time Note: need to worry about two kinds of costs: 1. cost of development, Non-Recurring Engineering (NRE), fixed 2. cost of manufacture per unit, variable Usually tradeoff between NRE cost and manufacturing costs Therefore custom IC approach was only viable for products with very high volume (where NRE could be amortized), and not sensitive in time to market (TTM) NRE NRE Total Cost Few Medium Many Units manufactured

Why FPGAs? (3 / 5) FPGAs introduced as alternative to custom ICs for implementing glue logic: improved PC board density vs. discrete SSI/MSI components (within around 10x of custom ICs) computer aided design (CAD) tools meant circuits could be implemented quickly (no physical layout process, no mask making, no IC manufacturing), relative to Application Specific ICs (ASICs) (3-6 months for these steps for custom IC) lowers NREs (Non Recurring Engineering) shortens TTM (Time To Market) Because of Moore s law the density (gates/area) of FPGAs continued to grow through the 80 s and 90 s to the point where major data processing functions can be implemented on a single FPGA. Why FPGAs? (4 / 5) FPGAs continue to compete with custom ICs for special processing functions (and glue logic) but now try to compete with microprocessors in dedicated and embedded applications Performance advantage over microprocessors because circuits can be customized for the task at hand. Microprocessors must provide special functions in software (many cycles) MICRO: Highest NRE, SW: fastest TTM ASIC: Highest performance, worst TTM FPGA: Highest cost per chip (unit cost)

Why FPGAs? (5 / 5) As Moore s Law continues, FPGAs work for more applications as both can do more logic in 1 chip and faster Can easily be patched vs. ASICs Perfect for courses: Can change design repeatedly Low TTM yet reasonable speed Programmable Logic Device Families Source: Dataquest Standard Logic Programmable Logic Devices (PLDs) SPLDs (PALs) Gate Arrays CPLDs Logic ASIC Cell-Based ICs Acronyms SPLD = Simple Prog. Logic Device PAL = Prog. Array of Logic CPLD = Complex PLD FPGA = Field Prog. Gate Array FPGAs Full Custom ICs Common Resources Configurable Logic Blocks (CLB) Memory Look-Up Table AND-OR planes Simple gates Input / Output Blocks (IOB) Bidirectional, latches, inverters, pullup/pulldowns Interconnect or Routing Local, internal feedback, and global

What is FPGA? Field Programmable Gate Arrays Array of logic cells connected via routing channels Special I/O cells Logic cells are mainly LUT with associated registers Interconnection on SRAM basis or antifuse elements Architecting a FPGA Performance Density and capacity Ease of use In-system programmability and in-circuit reprogrammability Other FPGA features Besides primitive logic elements and programmable routing, some FPGA families add other features Embedded memory Many hardware applications need memory for data storage. Many FPGAs include blocks of RAM for this purpose Dedicated logic for carry generation, or other arithmetic functions Phase locked loops for clock synchronization, division, multiplication.

Configurable Logic Block 4 Combinational Logic 1-bit reg 4 16x1 RAM 1-bit reg 4 Combinational Logic 1-bit reg 4 16x1 RAM 1-bit reg Logic Mode Memory Mode FPGA Variations Families of FPGA s differ in: physical means of implementing user programmability, arrangement of interconnection wires, and basic functionality of logic blocks Most significant difference is in the method for providing flexible blocks and connections

Technology and Architecture Tradeoffs Antifuse elements high density non volatile not reprogrammable User Programmability Latch-based (Xilinx, Altera, ) + reconfigurable - volatile latch - relatively large die size - Note: Today 90% die is interconnect, 10% is gates Latches are used to: 1. make or break cross-point connections in interconnect 2. define function of logic blocks 3. set user options: within the logic blocks in the input/output blocks global reset/clock Configuration bit stream loaded under user control: All latches are strung together in a shift chain Programming => creating bit stream

Idealized FPGA Logic Block Logic Block latch set by configuration bit-stream INPUTS 4-LUT FF 1 0 OUTPUT 4-input "look up table" 4-input Look Up Table (4-LUT) implements combinational logic functions Register optionally stores output of LUT Latch determines whether read reg or LUT 4-LUT Implementation 16 latch latch latch latch INPUTS 16 x 1 mux OUTPUT Latches programmed as part of configuration bit-stream n-bit LUT is actually implemented as a 2 n x 1 memory: inputs choose one of 2 n memory locations. memory locations (latches) are normally loaded with values from user s configuration bit stream. Inputs to mux control are the CLB (Configurable Logic Block) inputs. Result is a general purpose logic gate. n-lut can implement any function of n inputs!

LUT as general logic gate An n-lut as a direct implementation of a function truth-table Each latch location holds value of function corresponding to one input combination Example: 2-lut INPUTS AND OR 00 0 0 01 0 1 10 0 1 11 1 1 Implements any function of 2 inputs. How many functions of n inputs? INPUTS Example: 4-lut 0000 F(0,0,0,0) 0001 F(0,0,0,1) 0010 F(0,0,1,0) 0011 F(0,0,1,1) 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 store in 1st latch store in 2nd latch Look Up Tables Combinatorial Logic is stored in 16x1 SRAM Look Up Tables (LUTs) in a CLB Look Up Table 4-bit address Example: Combinatorial Logic A B C D Z A B C D Capacity is limited by number of inputs, not complexity Choose to use each function generator as 4 input logic (LUT) or as high speed sync.dual port RAM Z WE G4 G3 G2 G1 G Func. Gen. 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 0 1 1... 1 1 0 0 0 1 1 0 1 0 1 1 1 0 0 1 1 1 1 1

More functionality for free? Given basic idea LUT built from RAM Latches connected as shift register What other functions could be provided at very little extra cost? Using CLB latches as little RAM vs. logic Using CLB latches as shift register vs. logic 1. Distributed RAM CLB LUT configurable as Distributed RAM A LUT equals 16x1 RAM Implements Single and Dual- Ports Cascade LUTs to increase RAM size Synchronous write Synchronous/Asynchronous read Accompanying flip-flops used for synchronous read LUT LUT LUT RAM32X1S D WE WCLK A0 A1 A2 A3 A4 or = = O RAM16X1S D W EWCL A0K A1 A2 A3 RAM16X2S D0 D1 WE WCLK O A O0 0A 1 1A 2A 3 or O RAM16X1D D W EWCL A0K A1 A2 A3 SPO DPRA0 DPO DPRA1 DPRA2 DPRA3

2. Shift Register Each LUT can be configured as shift register Serial in, serial out Saves resources: can use less than 16 FFs Faster: no routing Note: CAD tools determine with CLB used as LUT, RAM, or shift register, rather than up to designer IN CE CLK LUT = LUT D CE D CE D CE D CE Q Q Q Q OUT DEPTH[3:0] Design Flow 1 Design Entry in schematic, ABEL, VHDL, and/or Verilog. Vendors include Synopsys, Aldec (Xilinx Foundation), Mentor, Cadence, Viewlogic, and 35 others. Implementation includes Placement & Routing and bitstream generation using 2 Xilinx s M1 Technology. Also, analyze timing, view layout, and more. M1 Technology Download directly to the Xilinx hardware device(s) with unlimited reconfigurations*!! 3 *XC9500 has 10,000 write/erase cycles XC4000 XC4000 XC4000

How Program: FPGA Generic Design Flow Design Entry: Create your design files using: schematic editor or hardware description language (Verilog, VHDL) Design implementation on FPGA: Partition, place, and route ( PPR ) to create bit-stream file Divide into CLB-sized pieces, place into blocks, route to blocks Design verification: Use Simulator to check function, Other software determines max clock frequency. Load onto FPGA device (cable connects PC to board) check operation at full speed in real environment. Foundation Series Delivers Value & Ease of Use Complete, ready-to-use software solution Simple, easy-to-use design environment Easy-to-learn schematic, state-diagram, ABEL, VHDL, & Verilog design Synopsys FPGA Express Integration*

Example Partition, Placement, and Route n Idealized FPGA structure: n Example Schematic Circuit: n collection of gates and flipflops Circuit combinational logic must be covered by 4-input 1-output gates. Flip-flops from circuit must map to FPGA flip-flops. (Best to preserve closeness to CL to minimize wiring.) Placement in general attempts to minimize wiring. Xilinx FPGA Routing n n n 1) Fast Direct Interconnect - CLB to CLB 2) General Purpose Interconnect - Uses Switch switch matrix Matrix 3) Long Lines n n n Segmented across chip Global clocks, lowest skew 2 Tri-states per CLB for busses CLB CLB Switch Matrix CLB CLB

SINGLE Xilinx Virtex-E Routing Hierarchy Note: CAD tools do PPR, not designers LONG HEX SINGLE LONG HEX SINGLE SWITCH MATRIX INTERNAL BUSSES CARRY TRISTATE BUSSES LONG HEX CARRY Internal 3-state Bus Long lines and Global lines Buffered Hex lines (1/6 blocks) Single-length lines DIRECT CONNECTION LONG HEX SINGLE SLICE SLICE Local Feedback Direct connections CLB CARRY CARRY 24 single-length lines Route GRM signals to adjacent GRMs in 4 directions 96 buffered hex lines Route GRM (general routing matrix) signals to another GRMs six blocks away in each of the 4 directions 12 buffered Long lines Routing across top and bottom, left and right Virtex-E Configurable Logic Block (CLB) 2 logic slices / CLB, two 4-LUTs / slice => Four 4-LUTs / CLB

Virtex-E CLB Slice Structure Each slice contains two sets of the following: Four-input LUT Any 4-input logic function Or 16-bit x 1 sync RAM Or 16-bit shift register Carry & Control Fast arithmetic logic Multiplexer logic Multiplier logic Storage element Latch or flip-flop Set and reset True or inverted inputs Sync. or async. control Very fast ripple carry: (24-bit @ 100 MHz) Details of Virtex-E Slice Multiplexors to help combine CLBs into larger multiplexor

Virtex-E Dedicated Expansion Multiplexers Since 4-LUT has 4 inputs, max is 2:1 Mux (2 inputs, 1 control line) MUXF5 combines 2 LUTs to create 4x1 multiplexer Or any 5-input function (5-LUT) Or selected functions up to 9 inputs MUXF6 combines 2 slices to form 8x1 multiplexer Or any 6-input function (6-LUT) Or selected functions up to 19 inputs Dedicated muxes are faster and more space efficient CLB Slice LUT LUT Slice LUT LUT MUXF5 MUXF5 MUXF6 Xilinx Virtex-E Chip Floorplan Input / Output Blocks (IOBs) Configurable Logic Blocks (CLBs) Block RAMs (BRAMs) Delay Locked Loop (DLL)

Block RAM (Extra RAM not using LUTs) Port A Spartan -IIE True Dual -Port Block RAM Port B Block RAM Most efficient memory implementation Dedicated blocks of memory Ideal for most memory requirements Virtex-E XCV2000 has 160? blocks 4096 bits per blocks Use multiple blocks for larger memories Builds both single and true dual-port RAMs CORE Generator provides custom-sized block RAMs Quickly generates optimized RAM implementation Virtex-E Block RAM Flexible 4096-bit block Variable aspect ratio 4096 x 1 2048 x 2 1024 x 4 512 x 8 256 x 16 Increase memory depth or width by cascading blocks

Virtex-E Delay Lock Loop (DLL) Capabilities Easy clock duplication System clock distribution Cleans and reconditions incoming clock Quick and easy frequency adjustment Single crystal easily generates multiple clocks Excellent for advanced memory types De-skew incoming clock Clock De-skew Generate fast setup and hold time or fast clock-to-outs DLL: Multiplication of Clock Speed Have faster internal clock relative to external clock source Use 1 DLL for 2x multiplication Combine 2 DLLs for 4x multiplication Reduce board EMI Route low-frequency clock externally and multiply clock on-chip 66 MHz 66MHz - 2x Clock Multiplication DLL 132 MHz (Multiply by 2)

DLL: Division of Clock Speed Selectable division values 1.5, 2, 2.5, 3, 4, 5, 8, or 16 Cascade DLLs to combine functions Combine DLLs to multiply and divide to get desired speed 50/50 duty cycle correction available Shift 30 MHz 180 Phase DLL 30 MHz Used for FB 30 MHz (180 Shift) 15 MHz 30 MHz DLL (Divide by 2) (180 Shift) 60 MHz (Multiply by 2) Clock x2 and Clock 2 Clock Management Summary All digital DLL Implementation Input noise rejection 50/50 duty cycle correction Clock mirror provides system clock distribution Multiply input clock by 2x or 4x Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16 De-skew clock for fast setup, hold, or clockto-out times

Virtex-E Family of Parts What s Really In that Chip? Switch Matrix Programmable Interconnect Points, PIPs (White) Routed Wires (Blue) Direct Interconnect (Green) CLB (Red) Long Lines (Purple)

Exponential Growth in Density Logic Cells Logic Gates 1,000,000 12M 5 Million logic gates 100,000 1.2M 10,000 1,000 Year Nov. 1997- shipping world s largest FPGA, XC40125XV (10,982 logic cells, 250K System Gates) 1 Logic cell = 4-input LUT + FF 175,000 Logic cells = 2.0 M logic gates in 2001 120K 1994 1996 1998 2000 2002 12K LUT D FF Q Virtex-II Pro Feature/Product XC 2VP2 XC 2VP4 XC 2VP7 XC 2VP20 XC 2VP30 XC 2VP40 XC 2VP50 XC 2VP70 XC 2VP100 XC 2VP125 EasyPath cost reduction - - - - XCE 2VP30 XCE 2VP40 XCE 2VP50 XCE 2VP70 XCE 2VP100 XCE 2VP125 Logic Cells 3,168 6,768 11,088 20,880 30,816 43,632 53,136 74,448 99,216 125,136 Slices 1,408 3,008 4,928 9,280 13,696 19,392 23,616 33,088 44,096 55,616 BRAM (Kbits) 216 504 792 1,584 2,448 3,456 4,176 5,904 7,992 10,008 18x18 Multipliers Digital Clock Management Blocks 12 4 28 4 44 4 88 8 136 8 192 8 232 8 328 8 444 12 556 12 Config (Mbits) 1.31 3.01 4.49 8.21 11.36 15.56 19.02 25.6 33.65 42.78 PowerPC Processors 0 1 1 2 2 2 2 2 2 4 Max Available Multi-Gigabit Transceivers* 4 4 8 8 8 12* 16* 20 20* 24* Max Available User I/O* 204 348 396 564 644 804 852 996 1164 1200 1 Logic Cell = (1) 4-input LUT + (1) FF + (1) Carry Logic 1 CLB = (4) Slices http://www.xilinx.com/products/tables/fpga.htm#v2p

Issues in FPGA Technologies Complexity of Logic Element How many inputs/outputs for the logic element? Does the basic logic element contain a FF? What type? Interconnect How fast is it? Does it offer high speed paths that cross the chip? How many of these? Can I have on-chip tri-state busses? How routable is the design? If 95% of the logic elements are used, can I route the design? More routing means more routability, but less room for logic elements Issues in FPGA Technologies (cont) Macro elements Are there SRAM blocks? Is the SRAM dual ported? Is there fast adder support (i.e. fast carry chains?) Is there fast logic support (i.e. cascade chains) What other types of macro blocks are available (fast decoders? register files? ) Clock support How many global clocks can I have? Are there any on-chip Phase Logic Loops (PLLs) or Delay Locked Loops (DLLs) for clock synchronization, clock multiplication?

What about applications? Reconfigurable Computing Issues Attributes Highly regular Pipeline-able Variable bit -width Logical, integer or fixed-point operations Local interconnections Better tools There is currently no way to program RC systems in a general purpose manner. Current tools are slow and hardware oriented. Better architectures FPGA is current basis

Application Sampling Binary operations over large operands Arithmetic operations over non standard length operands Encryption, decryption and compression Sequence and string matching Sorting Physical system simulation Video and image processing Relaxation methods Neural networks implementation DSP Genetic programming Dynamic programming Applications There is a clear trend in personal computing toward multimedia-rich and mobile applications Audio and Video Compression 2D Image Processing 3D Graphics Speech and handwriting recognition media mining narrow-/broadband signal processing for communication Mobile computing

Mobile Computing Requirements for mobile services are wireless communications stability, bandwidth/cost considerations, integration into the familiar environment, application transparency, extendibility, and Security Authentication and encryption Future Mobile Processor PE MEMORY MMX AUDIO/ VIDEO MODEM Wireless Comm Advanced Encryption Standard AES Routing CODEC CISC

Conclusion: FPGAs (1) FPGAs are basically interconnect plus distributed RAM that can be programmed to act as any logical function of 4 inputs How they differ from idealized array: In addition to their use as general logic gates, LUTs can alternatively be used as general purpose RAM or shift register Each 4-LUT can become a 16x1-bit RAM array Special circuitry to speed up ripple carry in adders and counters Therefore adders assembled by the CAD tools operate much faster than adders built from gates and LUTs alone. Many more wires, including tri-state capabilities. Conclusion: FPGAs (2) CAD tools due the partitioning, routing and placement functions onto CLBs FPGAs offer compromise of performance, unit cost, time to market vs. ASICs and microprocessors plus software

Conclusions: Reconfigurable Computing The use of reconfigurable logic attached to a processor is feasible A number of ways in which the architecture of solutions to application problems might change in the future Reconfigurable hardware seems to have all the right characteristics to ensure its long-term position in the market Until arbitrary programs can be transformed into efficient hardware implementations, it will be necessary for programmers to think carefully about the hardware programs they write Hardware compilation and FPGA are key new ideas for research in computer architecture Reconfigurable hardware should be an increasingly important component in the future systems