FPGA Architecture Overview


Dr. Chris Dick, DSP Chief Architect, Wireless and Signal Processing Group, Xilinx Inc.

Generic FPGA Architecture (1)
- A generic FPGA architecture consists of an array of logic tiles.
- A tile typically consists of lookup table(s), register(s), and multipliers or multiply-accumulate (MAC) units.
- Routing resources in the channels between the logic tiles provide the connectivity between tiles, I/O, on-chip memory, and other resources.
[Figure: array of logic tiles separated by wiring channels; a vertical wiring channel is labeled]

Generic FPGA Architecture (2)
[Figure only]

Virtex-II FPGA
Brief overview of an older-generation FPGA.

Virtex-II Platform FPGA
- Active Interconnect: fully buffered; fast, predictable. A switch matrix serves each CLB, IOB, and DCM.
- Powerful CLB: four slices (S0 to S3), 8 LUTs, 128b distributed RAM, wide-input functions (32:1).
- Block RAM: 18 Kbit, true dual port, up to 3.5 Mbits per device.
- Multipliers: 18b x 18b multiplier, 200 MHz pipelined.

CLB Contains 4 Slices
- Each CLB is connected to one switch matrix, which provides access to general routing resources.
- High level of logic integration
  - Wide-input functions: 16:1 multiplexer in 1 CLB, or any 7-input function; 32:1 multiplexer in 2 CLBs (1 level of LUT)
  - Fast arithmetic functions: 2 look-ahead carry chains per CLB column
  - Addressable shift registers in LUT: 16b shift register in 1 LUT; 128b shift register in 1 CLB (dedicated shift chain)
[Figure: CLB with switch matrix, TBUFs, slices S0 (X0Y0), S1 (X0Y1), S2 (X1Y0), and S3 (X1Y1), carry chains (CIN, COUT), shift chain, and fast connects]

FPGA Logic Slice
- 1 slice = 2 LUTs + 2 registers
[Figure: slice with two LUT, carry, and flip-flop paths (D, Q, CE, PRE, CLR) and the F5 and F6 multiplexers]

Virtex-II Slice
- Each function generator (F and G) can be a 4-input lookup table, a 16-bit distributed RAM, or a 16-bit shift register.
- Each register can be a D flip-flop or a latch.
- Dedicated logic: muxes, multiplier fabric
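The lookup-table behavior of a function generator can be illustrated with a minimal Python sketch. It assumes the usual convention that the inputs form an address into 16 configuration bits; the names `lut4`, `AND4_INIT`, and `XOR4_INIT` are hypothetical and do not come from the slides.

```python
def lut4(init: int, i3: int, i2: int, i1: int, i0: int) -> int:
    """Model a 4-input LUT: the inputs address one of 16 configuration bits."""
    address = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    return (init >> address) & 1


# Hypothetical configuration values, chosen for illustration only.
AND4_INIT = 0x8000  # only address 15 (all inputs high) stores a 1
XOR4_INIT = 0x6996  # bit n stores the parity of the address n

if __name__ == "__main__":
    print(lut4(AND4_INIT, 1, 1, 1, 1))  # all inputs high
    print(lut4(XOR4_INIT, 1, 0, 0, 0))  # odd number of ones
```

The same 16 storage bits are what the slide refers to when the function generator is used as a 16-bit distributed RAM or shift register; only the way they are written changes.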

Virtex-II Slice: Detailed View
[Figure: the diagram shows half a slice]

Dedicated Carry Logic
- Carry logic implemented with LUT elements alone limits arithmetic performance; the dedicated carry logic contained in each slice solves this.
- The dedicated, much simpler structure of the MUXCY and XORCY carry components gives a very high-performance (low-delay) path from carry input to carry output.
- It also allows a 2-bit operation to be performed in one slice, although it is not obvious how this is achieved.
[Figure: two LUTs (inputs I0 to I3, output O) driving MUXCY and XORCY and two flip-flops (D, Q), with a dedicated fast connection from the previous slice and to the next slice]

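The Xilinx Full Adder
- Half_sum = A XOR B (2-input XOR function)
- Sum = Half_sum XOR Cin (XOR function)
- Cout = multiplexer function selected by Half_sum: Cout = A when Half_sum is 0, Cout = Cin when Half_sum is 1
- Fast MSB resolution
[Figure: full adder drawn as a 2-input XOR, a carry multiplexer, and a sum XOR]

Virtex-II Family
[Figure only]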
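The full-adder decomposition above can be checked with a short Python model. This is a sketch under the assumption that the LUT computes the half sum, the XORCY produces the sum, and the MUXCY selects the carry; the function names are hypothetical.

```python
def full_adder_bit(a: int, b: int, cin: int) -> tuple[int, int]:
    """One adder bit built from a LUT, an XORCY, and a MUXCY."""
    half_sum = a ^ b                 # 2-input XOR in the LUT
    s = half_sum ^ cin               # XORCY: sum output
    cout = cin if half_sum else a    # MUXCY: propagate Cin or generate from A
    return s, cout


def ripple_add(x: int, y: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Chain `width` adder bits along the carry path; returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        s, carry = full_adder_bit((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(ripple_add(13, 7, 4))  # 4-bit sum and carry out
```

When Half_sum is 0 the two operands are equal, so either one gives the carry; when it is 1 the incoming carry passes straight through, which is why the carry path needs only a multiplexer.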

CLB Multiplexers
- Slice S0 (F5, F6): MUXF6 combines slices X0Y0 and X0Y1.
- Slice S1 (F5, F7): MUXF7 combines the 2 MUXF6 outputs.
- Slice S2 (F5, F6): MUXF6 combines slices X1Y0 and X1Y1.
- Slice S3 (F5, F8): MUXF8 combines the 2 MUXF7 outputs (two CLBs).

Hierarchical Routing Resources
[Figure only]

Virtex-II Memory Hierarchy
- Distributed RAM: 16x1 elements
- True dual-port synchronous block RAM: 16K x 1, 8K x 2, 4K x 4, 2K x 9, 1K x 18, 512 x 36
- High-performance external memory interfaces: DDR SDRAM, ZBT SRAM, QDR SRAM

Unique Distributed RAM
- LUTs used as memory inside the fabric
- Flexible: can be used as RAM, ROM, or shift register
- Distributed memory with fast access time
- Cascadable with built-in CLB routing
- Capacity: RAM16 or SRL16, 16b per LUT; single-port RAM, 128b per CLB; dual-port RAM, 64b per CLB; shift register, 128b per CLB
- Applications: linear feedback shift register, distributed arithmetic, time-shared registers, small FIFO, digital delay lines

Efficient Shift Register in LUT
- The 16 latches in the LUT can be configured as a shift register.
- Maximum delay of 16 clock cycles in one LUT, up to 128 in one CLB
- Can be read asynchronously by toggling the address lines
- Efficient programmable delay for balancing pipelined designs
- Can also be used for small FIFOs or to reprogram LUTs
[Figure: chain of flip-flops (D, Q, CE) inside the LUT with IN, CE, CLK, ADDRESS, OUT, and CASCADE]

SRL16 Applications
- Pipeline compensation (different length per branch)
- FIFO, pseudorandom number generator (LFSR)
- Serial frame synchronizer
- Running-average calculator
- Pulse generator and clock divider
- Pattern generator, state machine
- Website: http://support.xilinx.com/support/techxclusives/SRL6techxclusive2.htm
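The addressable shift register can be sketched in Python as a behavioral model. It assumes that index 0 holds the most recently shifted-in bit and that the address simply selects a tap; the class `SRL16` and the helper `delay_line` are hypothetical names for illustration, not a vendor primitive model.

```python
class SRL16:
    """Behavioral model of a 16-bit shift register with an addressable tap."""

    def __init__(self) -> None:
        self.bits = [0] * 16  # index 0 is the newest bit

    def clock(self, d: int, ce: int = 1) -> None:
        """Shift one bit in on a clock edge when clock enable is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def read(self, address: int) -> int:
        """Asynchronous read: the 4-bit address picks the tap."""
        return self.bits[address & 0xF]


def delay_line(samples: list[int], address: int) -> list[int]:
    """Shift each sample in, then record the tap selected by `address`."""
    srl = SRL16()
    out = []
    for bit in samples:
        srl.clock(bit)
        out.append(srl.read(address))
    return out


if __name__ == "__main__":
    print(delay_line([1, 0, 1, 1, 0, 0], address=2))
```

Changing `address` changes the delay without touching the stored data, which is what makes the structure useful for balancing pipeline branches of different lengths.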

Time Division Multiplexing Hardware with the SRL16 (1)
- 1-channel (28,22) RS encoder: 56 slices
[Figure: RS encoder with gate, GF multipliers g0, g1, g2, ..., g(2t-1), GF adders, registers b0, b1, b2, ..., b(2t-1), message input x^(2t) a(X), parity bits, and output]

Time Division Multiplexing Hardware with the SRL16 (2)
- 16-channel (28,22) RS encoder: 9 slices
- Cost is compared against 16 copies of the 1-channel architecture (16 x 56 slices).
[Figure: the same encoder with each register b0 ... b(2t-1) replaced by an SRL16]

18 Kb True Dual-Port Block RAM

Configuration   Depth   Data bits   Parity bits
16K x 1         16K     1           0
8K x 2          8K      2           0
4K x 4          4K      4           0
2K x 9          2K      8           1
1K x 18         1K      16          2
512 x 36        512     32          4

- Each port has an independent width.
- Supports data width conversion, including parity bits
- Synchronous read and write
- 250 MHz registered performance

BRAM Data Width Conversion
- Block RAMs provide data width conversion and FIFO function in one.
[Figure: narrow data streams on either side of a wide processing data path]

Dual-Port Memory Configurations for Virtex-II
[Figure only]

Virtex-4 FPGA

Virtex-4 FPGA: Revolutionary Advance in FPGA Architecture
- ASMBL enables dial-in resource allocation: mix of logic, DSP, BRAM, I/O, MGT, DCM, PowerPC
- Enabled by flip-chip packaging technology
- I/O columns distributed throughout the device

Three Virtex-4 Platforms

Device      Logic cells  Block RAM [Kb]  DCM  SelectIO  XtremeDSP slices  PowerPC  10/100/1000 EMAC  RocketIO transceivers
XC4VLX15         13,824             864    4       320                32        -                 -                      -
XC4VLX25         24,192           1,296    8       448                48        -                 -                      -
XC4VLX40         41,472           1,728    8       640                64        -                 -                      -
XC4VLX60         59,904           2,880    8       640                64        -                 -                      -
XC4VLX80         80,640           3,600   12       768                80        -                 -                      -
XC4VLX100       110,592           4,320   12       960                96        -                 -                      -
XC4VLX160       152,064           5,184   12       960                96        -                 -                      -
XC4VLX200       200,448           6,048   12       960                96        -                 -                      -
XC4VSX25         23,040           2,304    4       320               128        -                 -                      -
XC4VSX35         34,560           3,456    8       448               192        -                 -                      -
XC4VSX55         55,296           5,760    8       640               512        -                 -                      -
XC4VFX12         12,312             648    4       320                32        1                 2                      0
XC4VFX20         19,224           1,224    4       320                32        1                 2                      8
XC4VFX40         41,904           2,592    8       448                48        2                 4                     12
XC4VFX60         56,880           4,176   12       576               128        2                 4                     16
XC4VFX100        94,896           6,768   12       768               160        2                 4                     20
XC4VFX140       142,128           9,936   20       896               192        2                 4                     24

DSP Tile
- Think of this as the main computational element.
[Figure: DSP tile with inputs A, B, and C (C is shared with the adjacent DSP), multiplier, X, Y, and Z multiplexers (each with a ZERO input), adder/subtracter with CIN and SUB producing P, registers, cascade ports BCIN/BCOUT and PCIN/PCOUT to the adjacent DSP tile, and a wire shift right by 17b]

Dynamically Reconfigurable DSP OPMODEs
- OPMODE bits 6 to 0 select the Z, Y, and X multiplexer inputs.

OpMode                                   Output
Zero                                     +/- Cin
Hold P                                   +/- (P + Cin)
A:B Select                               +/- (A:B + Cin)
Multiply                                 +/- (A * B + Cin)
C Select                                 +/- (C + Cin)
Feedback Add                             +/- (C + P + Cin)
36-Bit Adder                             +/- (A:B + C + Cin)
P Cascade Select                         PCIN +/- Cin
P Cascade Feedback Add                   PCIN +/- (P + Cin)
P Cascade Add                            PCIN +/- (A:B + Cin)
P Cascade Multiply Add                   PCIN +/- (A * B + Cin)
P Cascade Add                            PCIN +/- (C + Cin)
P Cascade Feedback Add Add               PCIN +/- (C + P + Cin)
P Cascade Add Add                        PCIN +/- (A:B + C + Cin)
Hold P                                   P +/- Cin
Double Feedback Add                      P +/- (P + Cin)
Feedback Add                             P +/- (A:B + Cin)
Multiply-Accumulate                      P +/- (A * B + Cin)
Feedback Add                             P +/- (C + Cin)
Double Feedback Add                      P +/- (C + P + Cin)
Feedback Add Add                         P +/- (A:B + C + Cin)
C Select                                 C +/- Cin
Feedback Add                             C +/- (P + Cin)
36-Bit Adder                             C +/- (A:B + Cin)
Multiply-Add                             C +/- (A * B + Cin)
17-Bit Shift P Cascade Select            Shift(PCIN) +/- Cin
17-Bit Shift P Cascade Feedback Add      Shift(PCIN) +/- (P + Cin)
17-Bit Shift P Cascade Add               Shift(PCIN) +/- (A:B + Cin)
17-Bit Shift P Cascade Multiply Add      Shift(PCIN) +/- (A * B + Cin)
17-Bit Shift P Cascade Add               Shift(PCIN) +/- (C + Cin)
17-Bit Shift P Cascade Add Add           Shift(PCIN) +/- (A:B + C + Cin)
17-Bit Shift Feedback                    Shift(P) +/- Cin
17-Bit Shift Feedback Feedback Add       Shift(P) +/- (P + Cin)
17-Bit Shift Feedback Add                Shift(P) +/- (A:B + Cin)
17-Bit Shift Feedback Multiply Add       Shift(P) +/- (A * B + Cin)
17-Bit Shift Feedback Add                Shift(P) +/- (C + Cin)

- Over 40 different modes
- Each XtremeDSP slice is individually controllable.
- The operation can change in a single clock cycle.
- Enables resource sharing for maximum utilization
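Every row of the OPMODE table has the form Z +/- (X + Y + Cin), with the multiplexers choosing what Z, X, and Y are. A minimal Python sketch of that adder stage follows. It assumes a 48-bit result that wraps in two's complement, and it simplifies the multiplier by placing the whole product on X with Y set to zero; `dsp48_alu` and `multiply_accumulate` are hypothetical names.

```python
MASK48 = (1 << 48) - 1  # the P output is modeled as a 48-bit word


def dsp48_alu(z: int, x: int, y: int, cin: int = 0, subtract: bool = False) -> int:
    """Adder/subtracter stage: P = Z +/- (X + Y + Cin), wrapped to 48 bits."""
    inner = x + y + cin
    result = z - inner if subtract else z + inner
    return result & MASK48


def multiply_accumulate(p: int, a: int, b: int) -> int:
    """Multiply-Accumulate row of the table: P +/- (A * B + Cin) with Z = P.

    Simplification: the product is driven on X and Y is zero.
    """
    return dsp48_alu(z=p, x=a * b, y=0)


if __name__ == "__main__":
    acc = 0
    for a, b in [(2, 3), (4, 5)]:
        acc = multiply_accumulate(acc, a, b)
    print(acc)
```

Because only the multiplexer selections differ between rows, switching from, say, Multiply to Multiply-Accumulate is a change of control bits rather than of hardware, which is what allows the mode to change from one clock cycle to the next.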

Combinatorial Multiplier
- P = A x B, available on P (PCOUT)
- The 36b product is sign-extended to 48b.
[Figure: DSP tile with no pipeline registers enabled; A and B feed the multiplier, and the MS and LS words of the result pass through X and Y to the adder]

Pipelined Multiplier
- P = A x B z^-3 (3-delay latency), available on P (PCOUT)
- The 36b product is sign-extended to 48b.
[Figure: the same DSP tile with the input, multiplier, and output registers enabled]

DSP: Wide Add/Sub (1)
- Wide add/sub: C +/- [A:B], where [A:B] concatenates A (MS word) and B (LS word)
- This is one option.
- Use of the C port can restrict use of the adjacent DSP, since the C port is shared.
- 2-delay latency

DSP: Wide Add/Sub (2)
- Wide add/sub: PCIN +/- [A:B]
- This is a second option for wide add/sub.
- Use of PCIN removes the coupling between DSPs that results from use of the C port.
- 2-delay latency

Conventional FIR Filter
- Standard textbook FIR filter
- A direct implementation of this graph has potential sample-rate limitations due to the long combinatorial path.

Pipelined FIR Filter
- The Virtex-4 DSP tile provides support for pipelined FIR filters.
- Pipelining ensures high-performance operation above 500 MHz.
- Pipelining registers are implemented directly in the DSP tile.

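35x18 MPY
- Slice S1: {0, A[16:0]} x B[17:0], giving P[16:0]
- Slice S2: A[34:17] x B[17:0], added to the sign-extended S1 result shifted right by 17, giving P[52:17]
- Legend: sn = slice n; register; sign extension; z^-1 = logic fabric delay

Pipelined 35x35 MPY
- Slice S1: {0, A[16:0]} x {0, B[16:0]}, giving P[16:0]
- Slice S2: A[34:17] x {0, B[16:0]}, added to the S1 result shifted right by 17
- Slice S3: {0, A[16:0]} x B[34:17], added to the S2 result, giving P[33:17]
- Slice S4: A[34:17] x B[34:17], added to the S3 result shifted right by 17, giving P[69:34]
- Fabric delays (up to z^-3) align the operands and the partial results.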
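The two-slice 35x18 multiply can be checked arithmetically with a small Python sketch. It assumes signed two's-complement operands, ignores pipeline timing, and uses the hypothetical name `multiply_35x18`; the split into a zero-padded low 17 bits and a signed high 18 bits follows the slice labels above.

```python
LOW17 = (1 << 17) - 1


def multiply_35x18(a: int, b: int) -> int:
    """35x18 signed multiply built from two 18x18 partial products."""
    assert -(1 << 34) <= a < (1 << 34), "a must fit in 35 signed bits"
    assert -(1 << 17) <= b < (1 << 17), "b must fit in 18 signed bits"

    a_low = a & LOW17   # {0, A[16:0]}: unsigned low part, zero-padded
    a_high = a >> 17    # A[34:17]: signed high part

    p1 = a_low * b                 # slice S1 product
    p_low = p1 & LOW17             # P[16:0]
    p2 = a_high * b + (p1 >> 17)   # slice S2: add S1 result shifted right by 17
    return (p2 << 17) | p_low      # P[52:17] joined with P[16:0]


if __name__ == "__main__":
    a, b = -123456789, 54321
    print(multiply_35x18(a, b) == a * b)
```

Python's `>>` on a negative integer is an arithmetic shift, which plays the role of the sign extension applied to the shifted partial result.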

Pipelined Complex 18x18 MPY
- Real part: Pr = Ar x Br - Ai x Bi (slices S3 and S4)
- Imaginary part: Pi = Ar x Bi + Ai x Br (slices S1 and S2)
- Legend: sn = slice n; register; sign extension

Pipelined Complex 18x18 MACC
- Real part Pr: slices S4, S5, and S6
- Imaginary part Pi: slices S1, S2, and S3
- Legend: sn = slice n; register; sign extension
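The four-multiplier complex product can be written out directly. The following Python sketch only checks the arithmetic and does not model the slice pipeline; `complex_multiply` is a hypothetical name.

```python
def complex_multiply(ar: int, ai: int, br: int, bi: int) -> tuple[int, int]:
    """Complex product using four real multiplies and two adds."""
    pr = ar * br - ai * bi   # real part: one slice multiplies, the next subtracts
    pi = ar * bi + ai * br   # imaginary part: one slice multiplies, the next adds
    return pr, pi


if __name__ == "__main__":
    pr, pi = complex_multiply(3, 4, 5, -2)
    print(pr, pi)
    print(complex(3, 4) * complex(5, -2))  # reference using Python complex
```

Each of the four products maps to one 18x18 multiplier, and the add or subtract uses the adder in the second slice of each pair, which is why the multiply needs four slices.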

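Pipelined Complex 35x18 MPY: real component of the complex product
- Slice S1: {0, Ai[16:0]} x Bi[17:0]
- Slice S2: {0, Ar[16:0]} x Br[17:0], giving Pr[16:0]
- Slice S3: Ar[34:17] x Br[17:0], with the lower result shifted right by 17
- Slice S4: Ai[34:17] x Bi[17:0], giving Pr[52:17]
- Legend: sn = slice n; register; sign extension; z^-1, z^-2, z^-3 = logic fabric delays

Pipelined Complex 35x18 MPY: imaginary component of the complex product
- Slice S1: {0, Ai[16:0]} x Br[17:0]
- Slice S2: {0, Ar[16:0]} x Bi[17:0], giving Pi[16:0]
- Slice S3: Ar[34:17] x Bi[17:0], with the lower result shifted right by 17
- Slice S4: Ai[34:17] x Br[17:0], giving Pi[52:17]
- Legend: sn = slice n; register; sign extension; z^-1, z^-2, z^-3 = logic fabric delays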

18-Bit Barrel Shifter
[Figure: slices S1 and S2 each multiply A[17:0] (zero-padded in one slice) by 2^n, with a sign-extended shift right by 17 between them; sn = slice n; register]

Virtex-6 FPGA

Virtex-6 FPGA: Faster Logic Fabric
- LUT6 increases logic capability: reduces the number of logic levels, reduces routing, lowers fanout.
- LUT6 with dual flip-flop pair: a second flip-flop is added, which improves heavily pipelined designs.
- Same CLB architecture for both Spartan-6 and Virtex-6

Virtex-6 CLB
- Each CLB is connected to a switch matrix.
- A CLB element contains a pair of slices.
- Slices do not have direct connections to each other.
- Each slice contains 4 six-input function generators and 8 flip-flops (FFs).
- Some slices (SLICEM) support distributed memory; others (SLICEL) do not. Distributed RAM and shift registers are available in SLICEM only.

Detailed View of Virtex-6 SLICEM
- Each LUT implements either one arbitrary function of 6 inputs (output on O6) or 2 arbitrary functions of 5 inputs (outputs on O5 and O6).
- Generating functions of 7 and 8 inputs:
  - F7AMUX can generate a function of 7 input variables by combining the outputs of LUTs A and B.
  - F7BMUX can generate a function of 7 input variables by combining the outputs of LUTs C and D.
  - F8MUX combines the outputs of F7AMUX and F7BMUX.
[Figure: SLICEM with F7AMUX, F7BMUX, and F8MUX marked]

Virtex-6 LUT: function of 6 variables
- The figures illustrate computing a function f(A(6)) of 6 input variables.
- Combinatorial output: f(A(6))
- Registered output: f(A(6)) through the flip-flop
- A configuration memory cell defines the multiplexer select.

Virtex-6 LUT: two independent functions of 5 variables
- One LUT can compute two independent functions of 5 variables.
- Combinatorial outputs: f(A(5)) and g(A(5))
- Registered outputs: f(A(5)) and g(A(5)) through the flip-flops
- A configuration memory cell defines the multiplexer select.

Virtex-6 LUT: bypass
- The AX and DX inputs can bypass the LUT and be registered.
- Registering AX and DX inserts a clock-cycle delay, for pipeline balancing or for shortening the critical path in a design.
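The two operating modes of the 6-input LUT can be modeled in a few lines of Python. This is a simplified sketch that assumes a 64-bit configuration word in which O6 reads the bit addressed by all six inputs and O5 reads the lower 32 bits addressed by the low five inputs; `make_init`, `lut6_2`, and `pack_two_lut5` are hypothetical helper names.

```python
from typing import Callable, Sequence


def make_init(func: Callable[[Sequence[int]], bool], n_inputs: int) -> int:
    """Build the configuration word for `func` by evaluating every input combination."""
    init = 0
    for address in range(1 << n_inputs):
        bits = [(address >> k) & 1 for k in range(n_inputs)]
        if func(bits):
            init |= 1 << address
    return init


def lut6_2(init: int, inputs: Sequence[int]) -> tuple[int, int]:
    """Return (O6, O5) for six input bits given in the order i0 ... i5."""
    address = sum(bit << k for k, bit in enumerate(inputs))
    o6 = (init >> address) & 1            # function of all 6 inputs
    o5 = (init >> (address & 0x1F)) & 1   # lower half, function of i0 ... i4
    return o6, o5


def pack_two_lut5(f_init: int, g_init: int) -> int:
    """Place f in the upper 32 bits (seen on O6 when i5 = 1) and g in the lower 32."""
    return (f_init << 32) | g_init


if __name__ == "__main__":
    f_init = make_init(lambda b: all(b), 5)          # 5-input AND
    g_init = make_init(lambda b: sum(b) % 2 == 1, 5)  # 5-input XOR
    init = pack_two_lut5(f_init, g_init)
    print(lut6_2(init, [1, 1, 1, 1, 1, 1]))
```

In this model the sixth input is tied high in dual-function mode so that O6 always reads the upper half, which leaves five shared inputs for f and g.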

Virtex-6 CLB: Higher Performance for Pipelined Designs
- Virtex-4 and earlier: LUT/FF pair, 4-LUT. Great general-purpose logic.
- Virtex-5: LUT/FF pair, 6-LUT. Substantial increase in LUT logic capability, which drives performance.
- Virtex-6: LUT/dual-FF pair, 6-LUT. New: a second flip-flop is added to increase utilization of heavily pipelined designs.

Virtex-6 Distributed RAM: memory configurations
- Single-port 32 x 1-bit RAM
- Dual-port 32 x 1-bit RAM
- Quad-port 32 x 2-bit RAM
- Simple dual-port 32 x 6-bit RAM
- Single-port 64 x 1-bit RAM
- Dual-port 64 x 1-bit RAM
- Quad-port 64 x 1-bit RAM
- Simple dual-port 64 x 3-bit RAM
- Single-port 128 x 1-bit RAM
- Dual-port 128 x 1-bit RAM
- Single-port 256 x 1-bit RAM
Refer to http://www.xilinx.com/support/documentation/virtex6.htm for details.

Distributed RAM Configurations
[Figures: distributed RAM RAM32X2Q; distributed RAM RAM32X6SDP]

Distributed RAM Configurations
[Figures: distributed RAM RAM64X1S; distributed RAM RAM64X1D]

LUT as Shift Register (1)
- Shift register logic (SRL) LUT configuration
- A LUT can be configured to operate as a 32-bit shift register.
[Figures: LUT configured as SRL; functional representation of a LUT configured as SRL]

LUT as Shift Register (2)
- Shift register logic (SRL) LUT configuration
- A LUT can be configured to operate as a dual 16-bit shift register.
- 2 LUTs support a 64-bit SRL.
[Figures: LUT configured as 16b dual SRL; 64b shift register configuration]

LUT as Shift Register (3)
[Figures: 96b SRL; 128b SRL]

Virtex-6 Multiplexers (1)
[Figures: four 4:1 multiplexers; two 8:1 multiplexers]

Virtex-6 Multiplexers (2)
- The dedicated F7/F8 MUX in a slice enables efficient construction of large multiplexers.
- The dedicated F7/F8 multiplexers also ensure high clock frequencies.
[Figure: 16:1 multiplexer]

High DSP Performance: DSP48E1 FPGA tile
- Cascadable multiplier-accumulator
- Nearly a TeraMAC of DSP performance
- Powerful third-generation DSP slice
  - Up to 600 MHz operation in Virtex-6
  - Up to 287 MHz operation in Spartan-6
  - New optional pre-adder
  - Familiar cascade capability for highest performance and utilization
- Highest DSP slice capacity
  - Up to 2,016 DSP slices in Virtex-6
  - Up to 2 DSP slices in Spartan-6
[Figure: cascaded DSP slices, each with inputs B, A, D, and C, a +/- pre-adder, a 25 x 18 multiplier, and output P]

Virtex-6 DSP48E1 Tile
- To first order, the DSP48E1 comprises a multiplier followed by an accumulator.
- All 3 pipeline registers (input A/B, middle M, and output P) should be enabled to achieve the maximum clock rate.
- The input pre-adder is very useful for symmetric FIR filters.
- Add/sub output: out = Z ± (X + Y + CIN)

Virtex-6 DSP48E1 Tile: simplified view
[Figure only]

Taking Advantage of Filter Symmetry with the Virtex-6/Spartan-6 DSP48E1/DSP48A1 Pre-adder
[Figure only]

Virtex-6 Base Platform
[Figure only]
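The benefit of the pre-adder for symmetric filters can be shown with a short Python comparison. This is an arithmetic sketch only: it assumes a symmetric coefficient list and a zero initial state, and the names `fir_direct` and `fir_symmetric` are hypothetical. Samples that share a coefficient are added first, so each pair needs one multiply instead of two.

```python
def _sample(samples: list[int], index: int) -> int:
    """Return samples[index], or 0 outside the recorded range (zero initial state)."""
    return samples[index] if 0 <= index < len(samples) else 0


def fir_direct(coeffs: list[int], samples: list[int]) -> list[int]:
    """Textbook FIR: one multiply per tap."""
    return [
        sum(c * _sample(samples, n - k) for k, c in enumerate(coeffs))
        for n in range(len(samples))
    ]


def fir_symmetric(coeffs: list[int], samples: list[int]) -> list[int]:
    """Symmetric FIR: pre-add the two samples sharing a coefficient, then multiply."""
    taps = len(coeffs)
    assert coeffs == coeffs[::-1], "coefficients must be symmetric"
    half = taps // 2
    out = []
    for n in range(len(samples)):
        acc = 0
        for k in range(half):
            pre_add = _sample(samples, n - k) + _sample(samples, n - (taps - 1 - k))
            acc += coeffs[k] * pre_add          # one multiply per coefficient pair
        if taps % 2:                            # odd length: unpaired center tap
            acc += coeffs[half] * _sample(samples, n - half)
        out.append(acc)
    return out


if __name__ == "__main__":
    h = [1, 2, 3, 2, 1]
    x = [4, -1, 0, 7, 2, 5, -3, 1]
    print(fir_symmetric(h, x) == fir_direct(h, x))
```

For a filter with N symmetric taps this uses about N/2 multiplies per output, which is the saving the pre-adder in front of the multiplier is meant to provide.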

Partial Reconfiguration
- Partition RP1: reconfigurable modules A and B
- Partition RP2: reconfigurable modules C and D
- Partial reconfiguration is unique to Xilinx.
- Saves static and dynamic power: "Design Green"
[Figure: configurations 1, 2, and 3 loaded into the reconfigurable partitions]