Architecture by Xilinx, Inc. All rights reserved.

Similar documents
EE178 Lecture Module 2. Eric Crabill SJSU / Xilinx Fall 2007

Chapter 8 FPGA Basics

Basic FPGA Architecture Xilinx, Inc. All Rights Reserved

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Summary. Introduction. Application Note: Virtex, Virtex-E, Spartan-IIE, Spartan-3, Virtex-II, Virtex-II Pro. XAPP152 (v2.1) September 17, 2003

ispxpld TM 5000MX Family White Paper

EECS150 - Digital Design Lecture 16 Memory 1

EECS150 - Digital Design Lecture 16 - Memory

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

Virtex-II Architecture

ECEU530. Project Presentations. ECE U530 Digital Hardware Synthesis. Rest of Semester. Memory Structures

Programmable Logic. Simple Programmable Logic Devices

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

The Virtex FPGA and Introduction to design techniques

8. Selectable I/O Standards in Arria GX Devices

Field Programmable Gate Array (FPGA)

Topics. Midterm Finish Chapter 7

8. Migrating Stratix II Device Resources to HardCopy II Devices

ispgdx2 vs. ispgdx Architecture Comparison

Field Programmable Gate Array (FPGA) Devices

Programmable Logic. Any other approaches?

FPGA architecture and design technology

Outline. Field Programmable Gate Arrays. Programming Technologies Architectures. Programming Interfaces. Historical perspective

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

7-Series Architecture Overview

INTRODUCTION TO FPGA ARCHITECTURE

The Xilinx XC6200 chip, the software tools and the board development tools

Setup/Hold. Set Up time (t su ):

4. Selectable I/O Standards in Stratix II and Stratix II GX Devices

Topics. Midterm Finish Chapter 7

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

Product Obsolete/Under Obsolescence

Introduction to Modern FPGAs

Section I. Cyclone II Device Family Data Sheet

Section I. Cyclone II Device Family Data Sheet

Xilinx ASMBL Architecture

Interfacing RLDRAM II with Stratix II, Stratix,& Stratix GX Devices

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Axcelerator Family FPGAs

5. High-Speed Differential I/O Interfaces in Stratix Devices

International Training Workshop on FPGA Design for Scientific Instrumentation and Computing November 2013.

Hardware Design with VHDL PLDs IV ECE 443

Introduction to FPGAs. H. Krüger Bonn University

FPGA Architecture Overview. Generic FPGA Architecture (1) FPGA Architecture

Introduction to Field Programmable Gate Arrays

INTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

Version 1.6 Page 2 of 25 SMT351 User Manual

QPro XQ17V16 Military 16Mbit QML Configuration PROM

Implementing LVDS in Cyclone Devices

FastFLASH XC9500XL High-Performance CPLD Family

APEX II The Complete I/O Solution

QPro XQR17V16 Radiation Hardened 16Mbit QML Configuration PROM

4. Selectable I/O Standards in Stratix & Stratix GX Devices

Chapter 2. Cyclone II Architecture

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

Lecture 11 Memories in Xilinx FPGAs

Gate Estimate. Practical (60% util)* (1000's) Max (100% util)* (1000's)

Review: Timing. EECS Components and Design Techniques for Digital Systems. Lec 13 Storage: Regs, SRAM, ROM. Outline.

AGM CPLD AGM CPLD DATASHEET

Using High-Speed Differential I/O Interfaces

XA Spartan-6 Automotive FPGA Family Overview

Virtex -E 1.8 V Extended Memory Field Programmable Gate Arrays

FPGA Implementations

4I39 RS-422 ANYTHING I/O MANUAL

Achieving Breakthrough Performance with Virtex-4, the World s Fastest FPGA

Selecting the Correct CMOS PLD An Overview of Advanced Micro Devices CMOS PLDs

Digital Integrated Circuits

DATA SHEET AGM AG16K FPGA. Low Cost and High Performance FPGA. Revision: 1.0. Page 1 of 17

Spartan-3E FPGA Design Guide for prototyping and production environment

Description. Device XC5202 XC5204 XC5206 XC5210 XC5215. Logic Cells ,296 1,936. Max Logic Gates 3,000 6,000 10,000 16,000 23,000

Interfacing FPGAs with High Speed Memory Devices

ΔΙΑΛΕΞΗ 2: FPGA Architectures

Introduction to Partial Reconfiguration Methodology

HDL Coding Style Xilinx, Inc. All Rights Reserved

TSEA44 - Design for FPGAs

ECE 485/585 Microprocessor System Design

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline

APEX Devices APEX 20KC. High-Density Embedded Programmable Logic Devices for System-Level Integration. Featuring. All-Layer Copper.

Field Programmable Gate Array

6. I/O Features in Stratix IV Devices

Memory and Programmable Logic

Implementing Bus LVDS Interface in Cyclone III, Stratix III, and Stratix IV Devices

Typical Gate Range (Logic and RAM)*

Device XC5202 XC5204 XC5206 XC5210 XC5215. Max Logic Gates 3,000 6,000 10,000 16,000 23,000

13. Configuring Stratix & Stratix GX Devices

Enabling success from the center of technology. A Practical Guide to Configuring the Spartan-3A Family

High-Performance Memory Interfaces Made Easy

Chapter 2. FPGA and Dynamic Reconfiguration ...

Virtex -E 1.8 V Field Programmable Gate Arrays

XC1701L (3.3V), XC1701 (5.0V) and XC17512L (3.3V) Serial Configuration PROMs. Features. Description

Spiral 2-8. Cell Layout

EE260: Digital Design, Spring 2018

Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet

XC95288 In-System Programmable CPLD

XC95144 In-System Programmable CPLD

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

ALTERA FPGAs Architecture & Design

Low Power Design Techniques

Transcription:

Architecture 2002 by Xilinx, Inc. All rights reserved.

Spartan-IIE Technical Details Table of Contents Spartan-IIE Overview Logic and Routing Embedded Memory System Clock Management Interfaces Select I/O Configuration Solutions

Xilinx Your Programmable Logic Solution Features Virtex-II CPLDs Low Power Spartan-IIE FPGAs SRAM-based Feature Rich Low Cost FPGAs SRAM-based Feature Rich High Performance 10K 600K 10M Density (System Gates)

The Spartan-IIE Solution More Than Just Silicon I/O Connectivity SelectIO TM Technology Support major I/O standards Memory Resources SRL16 registers Distributed Memory Block Memory External Memory I O B I O B DLL R A M R A M...... CLB CLB R A M CLB IOB... IOB... CLB DLL R A M DLL IOB IOB DLL I O B I O B Logic & Routing Flexible logic implementation Vector Based Routing Internal 3-State bussing System Clock Management Digital Delay Lock Loops (DLLs)

Spartan-IIE Features System Clock Management System Interfaces Embedded Memory Logic & Routing Configuration

Spartan-IIE Features Logic & Routing

Logic & Routing Configurable for simple to complex logic Excellent for fast arithmetic operations Flexible for logic or distributed RAM implementations Configurable Logic Block (CLB) Predictable routing delays Core-friendly architecture Quick Place and Route times Internal 3-state bussing CLB Array

Logic Advantages Look Up Table (LUT) versatility CLB primary building block Flexible for logic or distributed RAM implementation Fast arithmetic operations Specialized Carry Logic for arithmetic operations Fast DSP functions FIR filters Configurable for simple to complex logic Allow up to 6 input functions into a one logic level

CLB Structure COUT COUT G4 G3 G2 G1 F5IN BY SR Look-Up Table O Carry & Control Logic YB Y S D CK EC R Q G4 G3 G2 G1 F5IN BY SR Look-Up Table O Carry & Control Logic YB Y S D CK EC R Q F4 F3 F2 F1 Look-Up Table O Carry & Control Logic XB X S D CK EC R Q F4 F3 F2 F1 Look-Up Table O Carry & Control Logic XB X S D CK EC R Q CIN CLK CE SLICE CIN CLK CE SLICE Each slice has 2 LUT-FF pairs with associated carry logic Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs

CLB Slice Structure Each slice contains two sets of the following: Four-input LUT Any 4-input logic function Or 16-bit x 1 sync RAM Or 16-bit shift register Carry & Control Fast arithmetic logic Multiplier logic Multiplexer logic Storage element Latch or flip-flop Set and reset True or inverted inputs Sync. or async. control

Four-Input LUT Implements combinatorial logic Any 4-input logic function Cascaded for wide-input functions Truth Table Inputs(ABCD) Output(Z) 0000 0 0001 0 0010 1 0011 0.. 1110 1 1111 1 4-input logic function LUT = A B C D Z

Dedicated Expansion Multiplexers MUXF5 combines 2 LUTs to create 4x1 multiplexer Or any 5-input function (LUT5) Or selected functions up to 9 inputs MUXF6 combines 2 slices to form 8x1 multiplexer Or any 6-input function (LUT6) Or selected functions up to 19 inputs Dedicated muxes are faster and more space efficient CLB Slice LUT LUT Slice LUT LUT MUXF5 MUXF5 MUXF6

Distributed RAM CLB LUT configurable as Distributed RAM A LUT equals 16x1 RAM Implements Single and Dual-Ports Cascade LUTs to increase RAM size Synchronous write Synchronous/Asynchronous read Accompanying flip-flops used for synchronous read LUT LUT LUT = RAM32X1S D WE WCLK A0 A1 A2 A3 A4 or = O RAM16X1S D WE WCLK A0 A1 A2 A3 RAM16X2S D0 D1 WE WCLK A0 A1 A2 A3 or O0 O1 O RAM16X1D D WE A0 A1 A2 A3 WCLK SPO DPRA0 DPO DPRA1 DPRA2 DPRA3

Shift Register Each LUT can be configured as shift register Serial in, serial out Dynamically addressable delay up to 16 cycles For programmable pipeline Cascade for greater cycle delays Use CLB flip-flops to add depth IN CE CLK LUT = LUT D CE D CE D CE Q Q Q OUT D CE Q DEPTH[3:0]

Shift Register 12 Cycles 64 Operation A Operation B 4 Cycles 8 Cycles Operation C 3 Cycles 64 3 Cycles Register-rich FPGA 9-Cycle imbalance Allows for addition of pipeline stages to increase throughput Data paths must be balanced to keep desired functionality

Shift Register 12 Cycles 64 Operation A Operation B 4 Cycles 8 Cycles Operation C Pipeline 64 3 Cycles 9 Cycles LUT as shift register Used to add pipeline stages Increase overall register count 16 bit shift register per LUT 64 bit shift register per CLB 12 Cycles Paths statically balanced

CLB Arithmetic Logic Dedicated carry logic Provides high performance for counters & arithmetic functions Discrete XOR component for single level sum completion Two separate carry chains in CLB allow for 3 operand functions LUT LUT 01 01 LUT LUT 01 01 Single-level Sum Can also be used to cascade LUTs for wide-input logic functions

3 Operand Adder Function COUT COUT B1 B0 Look-Up Table Carry & Control Logic PARTIAL1 Look-Up Table O Carry & Control Logic SUM1 A1 A0 Look-Up Table Carry & Control Logic PARTIAL0 Look-Up Table Carry & Control Logic SUM0 C1 C0 CIN SLICE0 A, B, C are two-bits wide CLB CIN SLICE1 SUM = A + B + C or PARTIAL + C, where PARTIAL = A + B Implementation First 2-operand sum A+B is performed in Slice 0 Second 2-operand sum PARTIAL + C is performed in Slice 1 Fast local feedback connection within the CLB Very small delay for on PARTIAL

Carry Logic for Wide Input Functions Higher performance Efficient resource utilization Common applications Wide input decoding Comparators HDL design entry LUT can be inferred MUXCY must be instantiated

12- Input AND Function L K J I H G F E D C B A LUT3 INIT=8000 LUT2 INIT=8000 LUT1 INIT=8000 0 1 0 1 0 1 MUXCY MUXCY MUXCY Vcc Output 4-Input AND Truth Table Inputs(ABCD) Output(Z) Output(HEX) 0000 0 0001 0 0010 0 0011 0.... 1011 0.. 1100 0 1101 0 1110 0 1111 1 Utilization 3 LUTs and 3 MUXCYs As opposed to 4 LUTs Performance 1 logic level As opposed to 2 logic levels 0 8

12- Input OR Function L K J I H G F E D C B A LUT3 INIT=0001 LUT2 INIT=0001 LUT1 INIT=0001 0 1 0 1 0 1 MUXCY MUXCY MUXCY Output Vcc Vcc Vcc 4-Input NOR Truth Table Inputs(ABCD) Output(Z) Output(HEX) 0000 1 0001 0 0010 0 0011 0.... 1011 0.. 1100 0 1101 0 1110 0 1111 0 Utilization 3 LUTs and 3 MUXCYs As opposed to 4 LUTs Performance 1 logic level As opposed to 2 logic levels 1 0

Dedicated CLB Multiplier Logic LUT A CY_MUX CO S DI CI CY_XOR MULT_AND Dedicated AND gate A x B B Dedicated AND gate Highly efficient Shift & Add implementation For a 16x16 Multiplier 30% reduction in area and one less logic level

Lower Operating Power 1.8V core supply Reduces power consumption Advanced signaling standards Smaller voltage transitions Reduces switching power DLLs reduce clock speed requirements Faster clock propagation Internal multiplication of clock Reduces power on clock nets

Logic Summary Flexible Configurable Logic Block (CLB) implementations Logic Distributed RAM Shift register CLB configurable for simple to complex logic Any 6 input function into one logic level Excellent for fast arithmetic operations Specialized carry logic for arithmetic operations Fast DSP functions FIR filters

Spartan-IIE Features Logic & Routing

Routing Core-friendly vector-based routing Provides predictable routing delays independent of IP placement Number of IP Device size Superior routing Quick Place and Route times Design to system at 100,000 gates per minute Easier rerouting Internal 3-state bussing Eliminates bus routing contention Reduced CLB usage by using 3 states instead of MUXs Increases performance by reducing logic levels CLB Array

High-Performance Routing LONG HEX SINGLE LONG HEX SINGLE General SWITCH Routing Matrix MATRIX (GRM) CARRY CARRY INTERNAL BUSSES TRISTATE BUSSES LONG HEX SINGLE Internal 3-state busses Long lines and Global lines Buffered Hex lines Single-length lines LONG HEX SINGLE SLICE Local Feedback SLICE Direct connections CLB CARRY CARRY Local routing Direct connections General Routing Matrix (GRM) Single line, Long line, Hex line Dedicated routing Internal 3-state bus Global routing Primary Clock Buffer lines, Secondary lines

Local Routing Local Routing Interconnect among LUTs, FFs, GRM CLB feedback path for connections to LUTs in same CLB Direct path between horizontally adjacent CLBs

General Purpose Routing LONG HEX SINGLE LONG HEX SINGLE SWITCH MATRIX INTERNAL BUSSES CARRY CARRY TRISTATE BUSSES LONG HEX SINGLE Internal 3-state Bus Long lines and Global lines Buffered Hex lines Single-length lines DIRECT CONNECTION LONG HEX SINGLE SLICE Local Feedback SLICE Direct connections CLB CARRY 24 single-length lines Route GRM signals to adjacent GRMs in 4 directions 96 buffered hex lines Route GRM signals to another GRMs six blocks away in each of the four directions 12 buffered Long lines Routing across top and bottom, left and right CARRY

Routing Summary Vector-based routing Predictable routing delays independent of device size and routing direction Core-friendly architecture Quick Place and Route times Design to system at 100,000 gates per minute Easier re-routing Internal 3-state bussing Eliminates bus routing contention Improves density and performance CLB Array

Spartan-IIE Features Embedded Memory

D C CL E K A0 A1 A2 A3 Shift Register LUT 16 registers, 1 LUT Compact & fast D CL K A0 A1 A2 A3 SRL16E Spartan-IIE Memory Hierarchy SRL16 Q Pipelining Buffers Q Bytes Distributed RAM Single-port Dual port Cascadable 16x1 DSP Coefficients Small FIFOs Scratch Pad Block RAMs 4Kbit blocks True dual-port Port A 4Kx1 2Kx2 1Kx4 512x8 256x16 Block RAM Cache Tag memory Large FIFOs Packet buffers Video line buffers Kilobytes Port B High-Performance External Memory Interfaces DDR I/O SSTL, HSTL, CTT SDRAM SGRAM PB SRAM DDR SRAM ZBT SRAM QDR SRAM Collaboration with memory vendors IDT, Cypress, Micron, NEC, Samsung, Toshiba... Megabytes

Distributed RAM RAM16X1S CLB LUT configurable as Distributed RAM A LUT equals 16x1 RAM Implements single and dual ports Cascade LUTs to increase RAM size Synchronous write Synchronous/Asynchronous read Accompanying flip-flops used for synchronous read LUT LUT LUT RAM32X1S D WE WCLK A0 A1 A2 A3 A4 or = = O D WE WCLK A0 A1 A2 A3 RAM16X2S D0 D1 WE WCLK A0 A1 A2 A3 O0 O1 or O RAM16X1D D WE A0 A1 A2 A3 WCLK SPO DPRA0 DPO DPRA1 DPRA2 DPRA3

SRL-16 and SRL-16E D CLK A0 A1 A2 A3 SRL16 Q IN CE CLK LUT D CE D CE Q Q 16-bit Shift Register Look-Up-Table D CE CLK A0 A1 A2 A3 SRL16E Q CLB Slice LUT Slice LUT D CE D CE Q Q OUT 16-bit Shift Register Look-Up-Table with Clock Enable LUT LUT ADDR[3:0]

Distributed RAM Dual-Port Implementation 2 LUTs equal 16x1 dual-port RAM A Port A[3:0] Uses A[3:0] address WE D Write and read WCLK B Port Uses DPA[3:0] address Read only Excellent for FIFOs, scratch pads. RAM16X1D 16 x 1 RAM 16 x 1 RAM SPO DPA[3:0] DPO

Block RAM Port A Spartan-IIE True Dual-Port Block RAM Port B Block RAM Most efficient memory implementation Dedicated blocks of memory Ideal for most memory requirements 8 to 72 memory blocks 4096 bits per blocks Use multiple blocks for larger memories Builds both single and true dual-port RAMs CORE Generator provides custom-sized block RAMs Quickly generates optimized RAM implementation

Block RAM Configurable synchronous Block RAM Single-port RAM True dual-port RAM Two independent single-port RAMs Block count increases with FPGA size Device No. of Blocks Block RAM Bits XC2S50E 8 32,768 XC2S100E 10 40,960 XC2S150E 12 49,152 XC2S200E 14 57,344 XC2S300E 16 65,536 XC2S400E 40 163,840 XC2S600E 72 294,912

Block RAM Flexible 4096-bit block Variable aspect ratio 4096 x 1 2048 x 2 1024 x 4 512 x 8 256 x 16 Increase memory depth or width by cascading blocks

Block RAM Single-Port Implementation Easy cascading of block RAMs Utilize variable aspect ratio for desired RAM size Example Desired RAM size: 1024 x 8 1024 x 4 + 1024 x 4 = 1024 x 8 CORE Generator software Efficiently cascades RAM blocks Quick custom RAM implementation DATA[7..4] DATA[3..0] 1024 X 8 RAM RAMB4_S4 WE EN RST DO[3:0] CLK ADDR[9:0] DI[3:0] RAMB4_S4 WE EN RST DO[3:0] CLK ADDR[9:0] DI[3:0] OUT[7..4] OUT[3..0]

Dual-Port Bus Flexibility RAMB4_S4_S16 WEA Port A In 1K-Bit Depth ENA RSTA CLKA DOA[3:0] Port A Out 4-Bit Width ADDRA[9:0] DIA[3:0] WEB Port B In 256-Bit Depth ENB RSTB CLKB ADDRB[7:0] DOB[15:0] Port B Out 16-Bit Width DIB[15:0] Each port can be configured with a different data bus width Provides easy data width conversion without any additional logic

Two Independent Port A In 2K-Bit Depth Single-Port RAMs VCC, ADDR[10:0] WEA ENA RSTA CLKA ADDRA[10:0] DIA[0] RAMB4_S1_S1 DOA[0] Port A Out 1-Bit Width Port B In 2K-Bit Depth GND, ADDR[10:0] WEB ENB RSTB CLKB ADDRB[10:0] DIB[0] Added advantage of True Dual-Port No wasted RAM Bits Can split a Dual-Port 4K RAM into two Single-Port 2K RAM Simultaneous independent access to each RAM DOB[0] Port B Out 1-Bit Width To access the lower RAM Tie the MSB address bit to Logic Low To access the upper RAM Tie the MSB address bit to Logic High

CAM in Block RAM Content Addressable Memory (CAM) Storage array like a RAM Functionally opposite of a RAM Quickly find the location of a particular stored value Output the address and toggle the MATCH line, if data match is found RAM CAM ADD[9:0] DATA [7:0] 1024x8 DATA[7:0] ADD [9:0] 1024x8 MATCH Used in telecommunications, networking, Ethernet, ATM switches Xilinx provides reference designs and application notes

External Memory Interface Easy access to high-speed external memory External Memory Type SelectI/O Stardard SRAM SSTL SGRAM HSTL ZBT SRAM/NoBL LVTTL QDR SRAM HSTL SDRAM LVTTL DDR SRAM SSTL2 EDO TTL FPM TTL PB TTL PC100/133 LVTTL / SSTL SelectI/O TM provides interface to most memory types

Memory Controller Designs Memory Resources Free! DRAM controller 64-bit DDR DRAM controller 16-bit DDR DRAM controller SDRAM controller SRAM controller ZBT SRAM controller QDR SRAM controller SigmaRAM controller Flash controller NOR / NAND flash controller Embedded memory CAMs, FIFOs Memory Solutions Portal www.xilinx.com/memory Download Now!

Embedded Memory Summary Fast distributed RAM Data right beside logic Memory requirements solved by Block RAM Single and True Dual-Port RAM implementations FIFO for buffering data Data width conversion Cache Register stacks CAM for high-speed parallel searches Many more Direct connection to external high-speed memory

Spartan-IIE Features System Clock Management

System Clock Management I O B I O B DLL R A M R A M CLB... CLB IOB...... IOB... CLB CLB DLL R A M R A M DLL IOB IOB DLL 4 DLLS in every device I O B I O B 100% Digital DLL Design Noise insensitive Scalable to new processes Excellent Jitter specifications +/- 100ps, <<50ps Typical No cumulative phase error Used in advanced memories Every Spartan-IIE device has 4 DLLs External clock outputs Delay Locked Loops Lower Board Costs

System Clock Management DLL1 DLL2 Mirror clock for board distribution De-skew clocks System Clocks 4 low-skew global clocks DLL3 DLL4 Convert clock to different I/O standards using SelectI/O Multiply Divide Shift Delay Lock Loops (DLLs) Lower Board Costs

Generic DLL Operation A DLL inserts delay on the clock net until the clock input rising edge is in phase with the clock feedback rising edge Requires a well-designed clock distribution network: the clock edges arrive simultaneously everywhere in the part CLKIN Delay Delay Delay Delay CLKOUT Phase Delay Control Clock Distribution Network CLKFB

DLL Capabilities Easy clock duplication System clock distribution Cleans and reconditions incoming clock Quick and easy frequency adjustment Single crystal easily generates multiple clocks Faster state machine utilizing different clock phases Excellent for advance memory types Clock De-skew De-skew incoming clock Generate fast setup and hold time or fast clock-to-outs

DLL: Clock Mirrors 100MHz Clock Mirror 100 MHz DLL 100 MHz Feedback from External Trace Input clock duplication Provides on and off-chip clocks Clock distribution across system Cleans and reconditions backplane or noisy clocks Extremely low output skew *Actual Device Measurements

Spartan-IIE DLL Example 1X Clock Mirror with 180 Output Phase (100MHz) Xilinx FPGA 1X 100 MHz Clock 180 0 DLL 100 MHz (0 Phase) 100 MHz (180 Phase) Benefit - DDR Memory Interface - Avoid external DLLs

DLL: Multiplication Use 1 DLL for 2x multiplication Combine 2 DLLs for 4x multiplication Reduce board EMI Route low-frequency clock externally and multiply clock on-chip 66 MHz 66MHz - 2x Clock Multiplication DLL 132 MHz (Multiply by 2)

DLL: Multiplication Example Reduce EMI by increasing data width and decreasing clock frequency Cross over clock domains without worries Synchronized clock edges No external drift Minimal external clock skew 16 IO 16 Data Buffer 32 Internal Logic CLK 2x DLL 1x

DLL: 2x Multiplication Implementation Requires one CLKDLL primitive CLK0 output removes skew between registers on the chip CLK2X is 2X clock output

DLL: Division Selectable division values 1.5, 2, 2.5, 3, 4, 5, 8, or 16 Cascade DLLs to combine functions 50/50 duty cycle correction available 180 Phase Shift 30 MHz DLL 30 MHz Used for FB 30 MHz (180 Shift) 15 MHz 30 MHz (Divide by 2) DLL (180 Shift) 60 MHz (Multiply by 2) Clock x2 and Clock 2

DLL: Phase Shift 180 Phase Shift Phase shifts 0, 90, 180, and 270 Increase system performance by utilizing additional clock phases 50/50 duty cycle correction available Excellent for external memory interfaces DDR and QDR RAM 100 MHz (0 Shift) DLL 100 MHz (180 Shift)

DLL: Speedup Tsu/h and Tco External Clock T clock = 0ns Internal Clock DLL D Q > OUT T c2q + T out = T co * Spartan-IIE data sheet module 3 Pin-to-Pin Parameters, LVTTL, 12 ma, Fast Slew Rate External Spec No DLL Nullify clock line delay External clock pin and internal clock are aligned Optional duty cycle correction 50/50 duty cycle correction applied when specified Low sensitivity to clock input noise Lower-cost oscillator With DLL Setup 2.0ns 1.7ns Hold 0ns -0.4ns Clock to out 4.7ns 3.1ns

Spartan-IIE DLL Example Clock-to-Out Improvement Using DLLs Output standard = SSTL-3 Class-II (OBUF_SSTL3_II) Temp=100C, Vdd=2.375V, Vcco=3.3V, Vtt=1.5V Waveforms: 1: CLKIN 2: DATA OUT (no DLL) 3: DATA OUT (DLL deskewed) Timing: w/o DLL w/ DLL r->r r->f r->r r->f 3.5n 3.8n 1.1n 1.3n Benefit - Increases Timing Budget - Allows Use of Cheaper Memories

System Clock Management Summary All digital DLL Implementation Input noise rejection 50/50 duty cycle correction Clock mirror provides system clock distribution Multiply input clock by 2x or 4x Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16 Provides 0, 90, 180, and 270 clock phase shift De-skew clock for fast setup, hold, or clock-to-out times

Spartan-IIE Features System Interfaces

Comprehensive I/O Connectivity Single ended and differential DLL IOB IOB DLL Up to 514 single-ended, 205 differential R CLB CLB... R I pairs A A O M M B 400 Mb/sec LVDS: ideal for Consumer Applications 19 I/O standards, 8 flexible I/O banks R R I A A O PCI 32/33 and 64/66 support M... CLB CLB M B Multiple package options DLL IOB IOB DLL 3IOBregisters: in, out, 3-state 8 I/O banks enable multiple simultaneous standards Voltages: 3.3V, 2.5V, 1.8V, 1.5V I O B I O B...... Chip-to-Chip Interfacing: LVDS LVPECL LVCMOS LVTTL Backplane Interfacing: AGP GTL GTL+ PCI BLVDS High-speed Memory Interfacing: CTT HSTL SSTL

Basic I/O Block Structure Three-State FF Enable Clock Set/Reset Output FF Enable D EC SR D EC SR Q Q Three-State Control Output Path Direct Input FF Enable Registered Input Q D EC SR Input Path

Programmable Output Driver Significant EMI reduction benefit Programmable driver strength Pull-up and Pull-down drivers can be individually controlled 16 different setting for each 2 slew rate settings Simultaneous Switching Output Guidelines

Post-PCB Signal Integrity Adjustment Optimizing Performance As Built Initial Design: LVTTL_F16 (Fast slew, 16 ma) Driver impedance too low Undershoot! Final Design: LVTTL_F8 (Fast slew, 8 ma) Driver impedance ~50Ω -- No Undershoot Requires a Bitstream Change Only!

System Interfaces -- SelectI/O LVDS Voltage Standards 3.3V 2.5V 1.8V 1.5V Chip-to-Chip Interfaces LVPECL LVCMOS LVTTL 19 Different Standards Supported! Backplane Interfaces AGP GTL GTL+ PCI BLVDS High-speed Memory Interfaces CTT HSTL SSTL Supports multiple voltage and signal standards simultaneously Eliminate costly bus transceivers

SelectI/O TM Standards Standard V REF V CCO Chip to Chip Interface LVTTL na 3.3 LVCMOS2 na 2.5 LVCMOS18 na 1.8 LVDS na 2.5 LVPECL na 3.3 Backplane Interface PCI 33MHz 3.3V na 3.3 PCI 66MHz 3.3V na 3.3 GTL 0.80 na GTL+ 1.00 na AGP-2X 1.32 3.3 Bus LVDS na 2.5 Memory Interface HSTL-I 0.75 1.5 HSTL-III & IV 0.90 1.5 SSTL3-I & II 1.50 3.3 SSTL2-I & II 1.25 2.5 CTT 1.50 3.3 Output Input V CCO V CCO defines output voltage Internal Reference User I/O Pin V REF V REF defines input threshold reference voltage Available as user I/O when using internal reference

I/Os Separated into 8 Banks Bank 0 Bank 1 Bank 7 Bank 6 I O B I O B DLL R A M R A M CLB... CLB IOB GCLK3 GCLK1...... IOB GCLK2... GCLK0 CLB CLB DLL R A M R A M DLL IOB IOB DLL I O B I O B Bank 2 Bank 3 Banks 2 and 3 used during configuration IOB=I/O Blocks Bank 5 Bank 4

I/O Signal Types I/O Signal Type Single-Ended Differential LVCMOS HSTL SSTL LVTTL LVDS Bus LVDS LVPECL NOTE: Only the popular IO types shown here

Single Ended I/O Traditional means of data transfer Data is carried on a single line Bigger voltage swing between logic Low and High 3.3 V Logic High Driver Data Out Data In Receiver 2 V 0.8 V 1.2V swing Logic Low Single ended data transfer LVTTL input levels

SystemI/O Single-Ended I/O Standards Summary

Differential I/O Latest means of data transfer One data bit is carried through two signal lines Voltage difference determines logic High or Low Smaller voltage swing between logic Low and High Higher performance Lower power Lower noise Driver Data Out Rt Receiver + - Data In 3.3 V 1.7 V 1.3 V 0.4V swing Differential signal data transfer LVDS Input levels

SelectI/O: Differential I/O Types LVDS (Low Voltage Differential Signal) Unidirectional data transfer Bus LVDS Bi-directional communication between 2 or more devices Can transmit and receive LVDS signals through the same pins LVPECL (Low Voltage Positive Emitter Coupled Logic) Unidirectional data transfer Popular industry standard for fast clocking

More Differential I/O Information Xilinx web site (http://www.xilinx.com/apps/xapp.htm) Application Notes XAPP230, XAPP231, XAPP232, XAPP233, XAPP237, XAPP238, XAPP243, XAPP245 National Semi. web site (http://www.national.com/appinfo/lvds) LVDS Design Guide BLVDS White Paper

System Interface Summary SelectI/O TM supports 19 IEEE/JEDEC I/O standards High speed with differential I/Os Low power, less noise External high speed memory interface Use HSTL and SSTL standards High performance backplane applications Use PCI, GTL and GTL+ standards Flexible I/O block Programmable slew rate for EMI and ground bounce control Independent input, output and programmable 3-state registers Input delay for 0 hold time

Spartan-IIE Features Configuration

Configuration Basics Configuration Data Source Simple Serial Interface System Integrated Serial High Performance Parallel Spartan-IIE device Is SRAM-based and hence volatile Needs a configuration data source Needs to be re-configured (re-programmed) upon power-up ISP Re-programmable/upgradable in the field Configuration Programming the device with design logic

Configuration Configuration data source PROM Serial/Parallel PROMs Hard disk Microprocessor memory Configuration interface Simple serial High-speed parallel JTAG or boundary scan IRL Microprocessor CPLD Software Tools Pervasive Networking IRL Programmable Silicon

JTAG Basics Also known as IEEE/ANSI standard 1149.1 Boundary scan Set of design rules that facilitate Testing Programming Debug Can be done at the chip, board, and systems level Can also have user-defined instructions Example: vendor-specific instructions: configure and verify

JTAG Basics (cont d) Rapid and automatic detection and isolation of defects due to common failures Detect opens and shorts Ensure all components on PCB are Mounted properly In right place Have proper interconnects among them Allows complete control and access to the boundary pins of a device without the need for Bed-of-nails Other test equipment

JTAG Compliant Device Includes a boundary-scan cell connected to each input, output or bi-directional pin Transparent and inactive under normal conditions Test mode Input signals captured and output signals set to affect other devices on the board

JTAG Mode Supports readback through boundary scan port Can mix any Xilinx device (FPGA, CPLD, PROM) and non-xilinx devices in the chain

JTAG Mode (Cont d) Dedicated TDI, TCK, TDO and TMS pins must operate at LVTTL V CCO for bank 2 must be at 3.3V Maximum configuration rate of 33 MHz

Xilinx Web: Configuration Solutions

Xilinx Download Cables Types MultiLINX cable Parallel cable Perfect source for prototype and debugging Supports all traditional and JTAG-based configuration methods

Cable Software Support impact software Included in Xilinx Alliance and Foundation ISE software tools

Summary System Clock Management System Interfaces Embedded Memory Logic & Routing Configuration

Spartan-IIE: A System-Level Solution Hierarchical memory support SelectRAM+ can be used to create bytes or Kbytes of internal storage and access megabytes of fast external memory System speedup and synchronization Nullify clock distribution delays - 160 MHz system performance Synthesize clocks for internal and external use Synchronize systems: create clock mirrors/nullify board delay System level integration Connect directly to existing and emerging I/O standards Vector-based interconnect Much more predictable before place and route Enhances synthesis-based flows

Spartan-IIE: A System-Level Solution IP solutions Software Based on proven timing-driven place and route technology System-level features RAM, DLLs, I/O standards Re-programmable

Reference Slides 2002 by Xilinx, Inc. All rights reserved.

SelectI/O TM I/O can be programmed for 19 signal standards Provides industry-standard IEEE/JEDEC I/O standards Single-ended and differential Allows connection to Processors, memory, bus-specific standards, mixed signal High-performance backplanes Improved power and grounds ratio to minimize ground bounce Simple entry of I/O standards in design tools

Chip-to-Chip Interface Standards 5V V CC 5V V CC 5V V CC 4.44V V OH 3.5V V IH 2.5V V T 2.4V V OH 2.4V V OH 2.0V V IH 1.5V V IL 1.5V V T 1.6V 1.5V 1.4V V IH V T V IL 0.5V ETL Enhanced transceiver logic V OL 0 GND 5-V CMOS Rail-to-Rail 5V 0.8V 0.4V V IL V OL 0 GND 5-V TTL ABT 0.4V V OL 0 GND ETL (ABTE)

Chip-to-Chip Interface Standards (Cont d) 3.3V V CC 3.3V V CC 3.3V V CC 2.4V 2.0V 0.8V 0.4V V OH V IH V IL V OL 0 GND LVTTL LVT 1.9V 1.7V 0.7V 0.4V V OH V IH V IL OL 0 GND LVCMOS2 1.8V 1.4V 1.1V 0.6V 0.4V V CC V OH V IH V IL OL 0 GND LVCMOS18 1.9V 1.7V 1.3V 1.1V V OH V IH V IL V OL 0 GND LVDS 1.5V V CC 1.1V 0.85V V OH V IH 0.65V V IL 0.4V V OL 0 GND LVPECL

Backplane Interface Standards 3.3V V CC 3.3V 3.0V V CC V OH 3.3V 3.0V V CC V OH 2.1V V OH 1.7V 1.3V 0.9V V IH V IL V OL 0 GND Bus LVDS 1.7V 1V 0.4V V IH V IL V OL 0 GND PCI 1.5V 1.1V 0.3V V IH V IL V OL 0 GND AGP-2X 0.85V 0.8V 0.4V 0 GND GTL V IH V IL V OL 1.1V 0.9V 0.6V V IH V IL V OL 0 GND GTL+

Memory Interface Standards 3.3V V CC 3.3V V CC 2.1V 1.7V V OH V IH 1.9V 1.7V V OH V IH 1.5V V CC 1.1V V OH 0.85V V IH 0.65V V IL 0.4V V OL 0 GND 1.3V 0.9V V IL V OL 0 GND 1.3V 1.1V 0 GND HSTL SSTL3/2 CTT V IL V OL

SelectI/O Input Bank Rules Each bank has a single input reference voltage (V REF ) Shared among all I/Os in the bank All I/O types in a bank must use the same reference voltage All V REF pins in a bank must be tied to the same voltage Inputs not requiring a V REF fit in the bank LVTTL, LVCMOS, LVPECL, LVDS, PCI V REF pins in a bank available as additional I/O, iff I/O type does not require V REF Otherwise, all V REF pins must be used to supply reference voltage OBUFTs with Keepers require a reference voltage and are treated as IOBUFs Input buffers with LVTTL, LVCMOS2/18, PCI33/66 supplied by V CCO

SelectI/O Output Banks Each bank has a single source voltage (V CCO ) Shared among all I/Os in that bank All I/O types in a bank must use the same voltage source All V CCO pins in a bank must be tied to the same voltage Only one VCCO voltage for smaller pin count packages TQ144, PQ208 Outputs not requiring V CCO fit in the bank GTL, GTL+ Configuration pins need special consideration Configuration pins are located on the right side of device in Banks 2 and 3 V CCO must be 3.3 volts for serial PROMs configuration

Single-Ended I/O Standards Benefits Reduced EMI compared to 3.3V TTL Low Output Voltage Swing Slow Edge Rates (dv/dt) Reduced Power Consumption Reduced Noise With External Termination Reduced reflection Ringing Cross-talk Higher Performance/Higher Bandwidth

Differential I/O Benefits I/O Connectivity Significant Cost Savings Reduced EMI Fewer pins Fewer PCB layers, fewer PCB traces (PCB area savings) Fewer/smaller connectors No external transceivers High performance per pin pair - up to 400 Mb/sec Reduced EMI due to low output voltage swing High noise immunity Reduced power consumption Spartan-IIE Supports LVDS, Bus LVDS, and LVPECL

SelectI/O: Differential I/O Differential I/O is a standard feature Supported in all devices densities, all speed grades More differential I/Os within a device Up to 240 I/O pairs Offers flexibility in board layout Flexible differential I/Os Use any I/O as input, output or bi-directional Spartan-IIE Can be driven by any standard LVDS/LVPECL driver Complies with LVDS/LVPECL receiver specs LVDS LVPECL LVDS LVDS LVPECL LVPECL LVDS LVPECL

SelectI/O: Differential I/O Configurations Point to Point One transmitter and one receiver Mostly used by LVDS/LVPECL in chipto-chip applications Multi-Drop One transmitter and multiple receivers Used by Bus LVDS/LVPECL in backplane applications Multi-point Multiple transceivers Used by Bus LVDS/LVPECL in backplane applications Point to Point Multi-drop Multi-Point

SelectI/O: LVDS & LVPECL All I/Os have LVDS/LVPECL capability Differential signal pairs can be used as Synchronous inputs or outputs Asynchronous inputs Some as asynchronous outputs Synchronous Signal comes from IOB flip-flop Asynchronous Signal comes from internal logic

What is LVDS? LVDS - Low Voltage Differential Signaling LVDS is a differential signaling interconnect technology Requires two pins per channel LVDS was first used as a interconnectivity technology in laptops and displays to alleviate EMI issues Technology is now widely used A broad spectrum of telecom and networking applications Mainstream consumer applications like digital video and displays

LVDS Benefits Higher I/O speed Lower cost Serialize multiple single-ended to differential channel signals Save I/O pins Use a smaller package Save board space Technology and process independent Easy migration path for lower supply voltages Maintain same signal levels Maintain same performance Low power Low noise Low EMI

LVDS Low Power Advantage LVDS technology saves power in several important ways Power dissipation at the terminator is ~1.2 mw RS-422 driver delivers 3 V across a termination of 100 Ω, for 90 mw power consumption... 75 times more than LVDS! Due to the current mode driver design, the frequency component of ICC is greatly reduced Compared to TTL/CMOS transceivers where the dynamic power consumption increases exponentially with frequency

LVDS Noise Immunity Advantage R OUT is clean even in cases of extreme common mode noise contamination

LVDS benefits - Low EMI Low voltage swing (~350mV) Slow edge rates compared to other technologies (1V/ns) Current mode of operation ensures low I CC spikes High noise immunity Switching noise cancels between the two lines Data is not effected by the noise External noise effects both lines, but the voltage difference stays about the same

LVDS Applications Communications and Networking Switches Repeaters Wireless base stations Data Communications Routers Hubs

LVDS Applications (cont d) Consumer Electronics Digital cameras Flat panel displays Office/Home Printers Copiers Various backplane applications

Spartan-IIE LVDS Benefits Exceptional performance Up to 400Mb/sec. per differential pair Significant Cost Savings Reduced EMI Fewer pins (smaller package) Fewer PCB layers Fewer PCB traces (PCB area savings) Fewer/smaller connectors No transceivers Quicker Time-to-market Fewer EMI issues

LVDS Driver and Receiver Driver Spartan-IIE FPGA Receiver Spartan-IIE FPGA

SelectI/O: Bus LVDS All I/Os have Bus LVDS capability Fully compatible with industry-standard Bus LVDS devices from National Semiconductor and other vendors

LVDS Benefits Reduced I/O Example Count 4 Gb/s 8 Gb/s 4 Gb/s Switch Single-ended I/O 40 Pins @ 100MHz 8 Gb/s Switch 40 Pins@ 100MHz # of Pins: 80 LVDS I/O # of Pins: 46 20 Pins @ 400Mbps 20 Pins @ 400Mbps 1 clock pair per 8 data line pairs = 6 pins 8 Gb/s Switch 1 clock pair per 8 data line pairs = 6 pins

Spartan-IIE LVDS Example Clock Distribution No LVDS-TTL Translator Equal-Length Point-to-Point LVDS PCB Clock Traces Spartan-IIE Spartan-IIE 1 1 LVDS LVDS Clock Clock Source Source LVDS LVDS Clock Clock Distributor Distributor 2 2 2 2 Spartan-IIE Spartan-IIE 2 2 Spartan-IIE Spartan-IIE n n Clock speeds of 200 MHz+ can be distributed with ease using LVDS Spartan-IIE Eliminates LVDS-to-TTL Converters -- Eliminates 2ns Delay & Skew Benefits - Higher performance, low EMI, lower cost, fewer components

Spartan-IIE LVDS Example Clock Conversion with Zero Delay Zero-Delay Local Clock Generation to Any of Spartan-IIE I/O Standards LVDS Clock DLL TTL External External RAM RAM DLL Spartan-IIE SSTL External External RAM RAM Benefits - Low EMI, lower cost, fewer components

LVPECL Benefits Higher I/O speed Board-level clock distribution Zero-delay conversion of LVPECL clocks into virtually any other I/O standard Lower cost Serialize multiple single-ended to differential channel signals Save I/O pins Use a smaller package Save board space Low power Low noise Low EMI

LVPECL Applications Backplanes High performance clocking 100 MHz and above Optical Transceiver High speed networking Mixed-signal interfacing

LVPECL Driver and Receiver Driver Spartan-IIE FPGA Receiver Spartan-IIE FPGA

LVPECL: Clock Conversion LVPECL LVPECL Clock Clock Source Source PECL-to-TTL converter DLL DLL TTL SSTL Other Other Device Device External External RAM RAM PECL-to-TTL converter Receive and convert high speed clocks with zero delay Zero-delay clock generation to any of SelectI/O Standards Eliminate costly bus translators

Configuration Methods Master serial mode Slave serial mode Slave parallel mode JTAG mode IRL Multiple devices can be daisy-chained in Master serial mode Slave serial mode JTAG mode MultiLINX JTAG Target Board

Master Serial Mode Spartan-IIE device acts like a master Generates configuration clock (CCLK) using internal oscillator PROM stores the configuration data Configuration rate selectable from 4-60 MHz -30% to +45% variance due to process dependence

Slave Serial Mode Spartan-IIE device acts like a slave An external clock source drives the CCLK pin Configuration data is stored in PROM, flash, microcontroller or microprocessor memory Maximum configuration rate of 66 MHz

Slave Parallel Mode Single or multiple Spartan-IIE devices connected in parallel

Slave Parallel Mode (cont d) Spartan-IIE device acts like a slave An external clock source drives the CCLK pin Microprocessor, Microcontroller or CPLD controls configuration Configuration data is stored in parallel PROM, flash, Microcontroller or Microprocessor memory Fastest configuration mode 8 bits per CCLK cycle 50MHz configuration rate (400 Mbit/sec) Supports Readback Bi-directional read/write port for configuration and readback

IRL and Xilinx Online Internet Reconfigurable Logic (IRL) IRL is a design methodology to create field upgradable applications Supported by products, design guidelines and reference designs Xilinx Online Xilinx program to enable, identify and promote field upgradable applications

IRL Methodology Elements 4 main elements in IRL model Host / Server Network Target to be updated Payload(s) Xilinx provides an API (PAVE) and a set of design guidelines that define how remote devices can be upgraded via a network. PAVE Host PAVE Payload Portal Server Your Network PAVE Target

PAVE Features Configures FPGAs / CPLDs IEEE 1149.1 JTAG / SelectMAP PAVE Payload upgrades PLD + system software Systems Integration Framework (SIF) within Wind River s Tornado environment PAVE source distributed and supported by Xilinx C++ / C Application PAVE Device API Platform Abstraction Layer (PAL) VxWorks RTOS BSP for WRS VxWorks RTOS Microprocessor FPGA

MultiLINX Cable Configuration and Readback support Using boundary scan (JTAG) mode Slave serial/parallel mode Supports USB interface on PC Fastest configuration Baud rate up to 12M Supports RS-232 interface on PC and UNIX Baud rate Up to 57.6K on PC Up to 38.4K on UNIX

MultiLINX Cable

Parallel Cable Configuration and Readback support Using boundary scan (JTAG) mode Supports parallel port on PC Baud rate up to 57.6K

Parallel Cable