Virtex-II Architecture
  Block SelectRAM resources
  I/O Blocks (IOBs)
  Dedicated multipliers
  Programmable interconnect
  Configurable Logic Blocks (CLBs)
  Clock management (DCMs, BUFGMUXes)
  The Virtex-II core voltage is 1.5 V
Basic FPGA Architecture, 2005 Xilinx, Inc. All Rights Reserved
Slices and CLBs
  Each Virtex-II CLB contains four slices
    Local routing provides feedback between slices in the same CLB and routing to neighboring CLBs
  A switch matrix provides access to general routing resources
  [Diagram: CLB with a switch matrix, slices S0 to S3, local routing, CIN/COUT carry connections, SHIFT, and two BUFTs]
Simplified Slice Structure
  Each slice has four outputs
    Two registered outputs, two non-registered outputs
  Two BUFTs are associated with each CLB, accessible by all 16 CLB outputs
  Carry logic runs vertically, up only
    Two independent carry chains per CLB
  [Diagram: Slice 0 with two LUTs, carry logic, and two registers (D, CE, PRE, CLR, Q)]
Detailed Slice Structure
  The next few slides discuss the slice features
    LUTs
    MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
    Carry logic
    MULT_ANDs
    Sequential elements
Look-Up Tables
  Combinatorial logic is stored in Look-Up Tables (LUTs)
    Also called Function Generators (FGs)
    Capacity is limited by the number of inputs, not by the complexity
  Delay through the LUT is constant
  [Diagram: inputs A, B, C, D feed combinatorial logic that produces Z]

  A B C D | Z
  0 0 0 0 | 0
  0 0 0 1 | 0
  0 0 1 0 | 0
  0 0 1 1 | 1
  0 1 0 0 | 1
  0 1 0 1 | 1
  ...
  1 1 0 0 | 0
  1 1 0 1 | 0
  1 1 1 0 | 0
  1 1 1 1 | 1
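The point that LUT capacity depends on input count rather than logic complexity can be illustrated with a small behavioral model. This is a minimal Python sketch, not vendor code; `make_lut4`, `simple`, and `parity` are hypothetical names chosen for illustration.

```python
from itertools import product

def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input Boolean function."""
    table = [0] * 16
    for a, b, c, d in product((0, 1), repeat=4):
        table[(a << 3) | (b << 2) | (c << 1) | d] = int(bool(func(a, b, c, d)))

    def lut(a, b, c, d):
        # Evaluation is always one table lookup, whatever func looked like.
        return table[(a << 3) | (b << 2) | (c << 1) | d]

    return lut

# Two functions of very different complexity each fit in the same 16 entries.
simple = make_lut4(lambda a, b, c, d: a & b)
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Because every function reduces to the same 16-entry lookup, the model also reflects why the delay through a LUT does not depend on the function it implements.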
Connecting Look-Up Tables
  MUXF5 combines the LUTs in each slice
  MUXF6 combines slices S0 and S1
  MUXF6 combines slices S2 and S3
  MUXF7 combines the two MUXF6 outputs
  MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
  [Diagram: CLB with slice S0 (F5, F6), slice S1 (F5, F7), slice S2 (F5, F6), slice S3 (F5, F8)]
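How a MUXF5 widens the logic available in one slice can be sketched as follows: a 5-input function is split into two 4-input LUTs, and the fifth input selects between them. This is a behavioral Python illustration under that assumption; `lut4`, `muxf5`, `build_5_input`, and `majority5` are hypothetical names.

```python
def lut4(init):
    """Return a 4-input LUT whose contents are the 16-bit value init."""
    def lut(a, b, c, d):
        return (init >> ((a << 3) | (b << 2) | (c << 1) | d)) & 1
    return lut

def muxf5(i0, i1, sel):
    """2:1 multiplexer that selects between the two LUT outputs."""
    return i1 if sel else i0

def build_5_input(func):
    """Split a 5-input function into two LUT4s chosen by the fifth input."""
    inits = []
    for e in (0, 1):
        init = 0
        for idx in range(16):
            a, b, c, d = [(idx >> k) & 1 for k in (3, 2, 1, 0)]
            if func(a, b, c, d, e):
                init |= 1 << idx
        inits.append(init)
    low, high = lut4(inits[0]), lut4(inits[1])
    return lambda a, b, c, d, e: muxf5(low(a, b, c, d), high(a, b, c, d), e)

# Example: a 5-input majority function built from two LUT4s and one mux.
majority5 = build_5_input(lambda a, b, c, d, e: int(a + b + c + d + e >= 3))
```

The same selection step repeats at each level: MUXF6 chooses between two MUXF5 outputs, MUXF7 between two MUXF6 outputs, and MUXF8 between two MUXF7 outputs.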
Fast Carry Logic
  Simple, fast, and complete arithmetic logic
    Dedicated XOR gate for single-level sum completion
    Uses dedicated routing resources
    All synthesis tools can infer carry logic
  [Diagram: the first carry chain runs from CIN through slices S0 and S1, with COUT going to S0 of the next CLB; the second carry chain runs from CIN through slices S2 and S3, with COUT going to CIN of S2 of the next CLB]
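The role of the carry multiplexer and the dedicated XOR gate can be sketched with a bit-level adder model. This is a behavioral Python illustration, assuming the usual arrangement in which the LUT computes the propagate signal for each bit; `add_with_carry_chain` is a hypothetical name.

```python
def add_with_carry_chain(a, b, width, cin=0):
    """Add two unsigned values bit by bit, one carry-chain stage per bit."""
    carry = cin
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi                     # computed in the LUT
        sum_bit = propagate ^ carry             # dedicated XOR gate
        carry = carry if propagate else ai      # carry multiplexer
        total |= sum_bit << i
    return total, carry

result, carry_out = add_with_carry_chain(5, 3, width=4)
```

When the two input bits differ, the stage passes the incoming carry along; when they are equal, the outgoing carry is simply that common bit value.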
MULT_AND Gate
  Highly efficient multiply-and-add implementation
    Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
    The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
  [Diagram: LUT, MULT_AND, CY_MUX (S, DI, CI, CO), and CY_XOR computing A x B]
Flexible Sequential Elements
  Either flip-flops or latches
  Two in each slice; eight in each CLB
  Inputs come from LUTs or from an independent CLB input
  Separate set and reset controls
    Can be synchronous or asynchronous
  All controls are shared within a slice
    Control signals can be inverted locally within a slice
  [Diagram: FDRSE_1 (D, CE, S, R, Q), FDCPE (D, PRE, CE, CLR, Q), LDCPE (D, PRE, CE, G, CLR, Q)]
Shift Register LUT (SRL16CE)
  Dynamically addressable serial shift registers
    Maximum delay of 16 clock cycles per LUT (128 per CLB)
    Cascadable to other LUTs or CLBs for longer shift registers
      Dedicated connection from Q15 to the D input of the next SRL16CE
    Shift register length can be changed asynchronously by toggling address A
  [Diagram: LUT as a 16-stage shift register with D, CE, CLK, A[3:0], Q, and Q15 (cascade out)]
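The addressable-delay behavior can be sketched with a small behavioral model. This is a Python illustration only; the class mirrors the SRL16CE port names from the slide, while the method names `clock` and `q` are hypothetical.

```python
class SRL16CE:
    """Behavioral model of a 16-stage shift register with an addressable tap."""

    def __init__(self, init=0):
        # bits[0] is the most recently shifted-in value.
        self.bits = [(init >> i) & 1 for i in range(16)]

    def clock(self, d, ce=1):
        """Model one rising clock edge; data shifts only when CE is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:15]

    def q(self, addr):
        """Read the tap selected by A[3:0]; address n gives a delay of n + 1 clocks."""
        return self.bits[addr & 0xF]

    @property
    def q15(self):
        """Last stage, used as the cascade output to the next shift register."""
        return self.bits[15]

srl = SRL16CE()
srl.clock(1)            # shift in a single 1
for _ in range(3):
    srl.clock(0)        # followed by three 0s
tap = srl.q(3)          # the 1 is now visible at address 3
```

Changing the address changes the observed delay without any additional clock edge, which is the sense in which the length can be changed asynchronously.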
IOB Element
  Input path
    Two DDR registers
  Output path
    Two DDR registers
    Two 3-state enable DDR registers
  Separate clocks and clock enables for I and O
  Set and reset signals are shared
  [Diagram: IOB with input registers (ICK1, ICK2), output registers and 3-state registers (OCK1, OCK2), DDR MUXes on the output and 3-state paths, and the PAD]
Distributed SelectRAM Resources
  Uses a LUT in a slice as memory
  Synchronous write
  Asynchronous read
    Accompanying flip-flops can be used to create synchronous read
  RAM and ROM are initialized during configuration
    Data can be written to RAM after configuration
  Emulated dual-port RAM
    One read/write port
    One read-only port
  [Diagram: RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), RAM16X1D (D, WE, WCLK, A0 to A3, SPO, DPRA0 to DPRA3, DPO)]
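The combination of synchronous write and asynchronous read, plus the second read-only port, can be sketched with a behavioral model. This is a Python illustration only; the class borrows the RAM16X1D port names (SPO, DPO, DPRA) from the slide, and the method names are hypothetical.

```python
class RAM16X1D:
    """Behavioral model of a 16 x 1 distributed RAM with an extra read port."""

    def __init__(self, init=0):
        # Contents are set at configuration time from a 16-bit init value.
        self.mem = [(init >> i) & 1 for i in range(16)]

    def rising_edge(self, we, a, d):
        """Synchronous write: memory changes only on a clock edge with WE high."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Asynchronous read on the read/write port address A."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Asynchronous read on the read-only port address DPRA."""
        return self.mem[dpra & 0xF]

ram = RAM16X1D(init=0x0001)      # location 0 starts as 1
ram.rising_edge(we=1, a=5, d=1)  # write after configuration
value = ram.dpo(5)               # read back through the read-only port
```

Registering the `spo` or `dpo` value in a separate flip-flop would correspond to the synchronous-read option mentioned above.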
Block SelectRAM Resources
  Up to 3.5 Mb of RAM in 18-kb blocks
    Synchronous read and write
  True dual-port memory
    Each port has synchronous read and write capability
    Different clocks for each port
  Supports initial values
  Synchronous reset on output latches
  Supports parity bits
    One parity bit per eight data bits
  [Diagram: 18-kb block SelectRAM memory; port A signals DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA; port B signals DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB]
Dual-port Block RAM Configurations
  Configurations available on each port

  Configuration  Depth  Data Bits  Parity Bits
  16k x 1        16k    1          0
  8k x 2         8k     2          0
  4k x 4         4k     4          0
  2k x 9         2k     8          1
  1k x 18        1k     16         2
  512 x 36       512    32         4

  Independent configurations on ports A and B
    Supports data-width conversion, including parity bits
  [Diagram: 8-bit IN on Port A (8 bits), 32-bit OUT on Port B (32 bits)]
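The configuration table can be checked with a little arithmetic: depth times data width is 16 kb in every row, and the rows with parity bits use the full 18 kb. This is a minimal Python sketch; `CONFIGS` and `total_bits` are hypothetical names, and "k" is taken to mean 1024.

```python
# (depth, data bits, parity bits) for each port configuration in the table.
CONFIGS = [
    (16 * 1024, 1, 0),
    (8 * 1024, 2, 0),
    (4 * 1024, 4, 0),
    (2 * 1024, 8, 1),
    (1 * 1024, 16, 2),
    (512, 32, 4),
]

def total_bits(depth, data_bits, parity_bits):
    """Bits addressable through a port in the given configuration."""
    return depth * (data_bits + parity_bits)

for depth, data_bits, parity_bits in CONFIGS:
    kilobits = total_bits(depth, data_bits, parity_bits) // 1024
    print(f"{depth} x {data_bits + parity_bits}: {kilobits} kb")
```

The three narrow configurations have no parity bits, so they reach only 16 kb of the block; the x9, x18, and x36 configurations reach all 18 kb.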
Dedicated Multiplier Blocks
  18-bit two's-complement signed operation
  Optimized to implement multiply-and-accumulate functions
  Multipliers are physically located next to block SelectRAM memory
  [Diagram: 18 x 18 multiplier with inputs Data_a (18 bits) and Data_b (18 bits) and a 36-bit output; usable for 4 x 4, 8 x 8, 12 x 12, and 18 x 18 signed multiplication]
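The 18-bit two's-complement inputs and 36-bit output can be illustrated with a short behavioral model. This is a Python sketch, not a description of the hardware implementation; `to_signed` and `mult18x18` are hypothetical names.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mult18x18(a, b):
    """Multiply two 18-bit two's-complement operands into a 36-bit result."""
    product = to_signed(a, 18) * to_signed(b, 18)
    return product & ((1 << 36) - 1)

# -3 encoded in 18 bits, times 5, read back as a signed 36-bit value.
raw = mult18x18(-3 & 0x3FFFF, 5)
signed_result = to_signed(raw, 36)
```

Even the largest-magnitude case, the most negative 18-bit value squared, equals 2^34 and therefore fits in the signed 36-bit output.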
Xilinx Design Flow
  Plan & Budget
  Create Code/Schematic
  HDL RTL Simulation
  Synthesize to create netlist
  Functional Simulation
  Implement
    Translate
    Map
    Place & Route
  Attain Timing Closure
  Timing Simulation
  Create BIT File
Xilinx Implementation
  Once you generate a netlist, you can implement the design
  There are several outputs of implementation
    Reports
    Timing simulation netlists
    Floorplan files
    FPGA Editor files
    and more!
  [Diagram: Implement consists of Translate, Map, and Place & Route]
What is Implementation?
  More than just Place & Route
  Implementation includes many phases
    Translate: Merge multiple design files into a single netlist
    Map: Group logical symbols from the netlist (gates) into physical components (slices and IOBs)
    Place & Route: Place components onto the chip, connect the components, and extract timing data into reports
  Each phase generates files that allow you to use other Xilinx tools
    Floorplanner, FPGA Editor, XPower
Project Summary
  Design Overview
  Device Utilization
  Performance and Constraints
  Reports
Map Reports
  Map Report contents
    Command line options for the map program
    Design summary
      List of how many device resources are used
      Errors and warnings
    Removed logic summary
      List of logic that was removed due to sourceless or loadless nets
    IOB properties
      Indicates whether an I/O flip-flop is used
      List of attributes on each I/O pin
  Post-Map Static Timing Report is not covered here
Map Report Example
  Release 4.1i - Map E.30
  Xilinx Mapping Report File for Design 'top'

  Design Information
  ------------------
  Command Line   : map -p xc2v40-fg256-4 -cm area -k 4 -c 100 -tx off top.ngd
  Target Device  : x2v40
  Target Package : fg256
  Target Speed   : -4
  Mapper Version : virtex2 -- $Revision: 1.58 $
  Mapped Date    : Tue Aug 21 09:42:20 2001

  Design Summary
  --------------
Map Report Example
  Number of errors:      0
  Number of warnings:    0
  Number of Slices:                            182 out of 256   71%
  Number of Slices containing unrelated logic:   0 out of 182    0%
  Number of Slice Flip Flops:                  170 out of 512   33%
  Total Number 4 input LUTs:                   248 out of 512   48%
    Number used as LUTs:                       167
    Number used as a route-thru:                81
  Number of bonded IOBs:                        26 out of 88    29%
  Number of GCLKs:                               1 out of 16     6%
  Total equivalent gate count for design:  3,475
  Additional JTAG gate count for IOBs:     1,248
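The utilization figures in this example can be reproduced with simple arithmetic. The sketch below is Python written for illustration; `utilization` is a hypothetical helper, and the use of truncation rather than rounding is an assumption inferred from the 26 out of 88 line being reported as 29%.

```python
def utilization(used, available):
    """Percentage of a resource in use, truncated to a whole number (assumed)."""
    return used * 100 // available

# (resource, used, available) taken from the example report.
REPORT = [
    ("Slices", 182, 256),
    ("Slice Flip Flops", 170, 512),
    ("4 input LUTs", 248, 512),
    ("Bonded IOBs", 26, 88),
    ("GCLKs", 1, 16),
]

for name, used, available in REPORT:
    print(f"{name}: {used} out of {available} = {utilization(used, available)}%")

# The LUT total is the sum of LUTs used for logic and LUTs used as route-thrus.
lut_total = 167 + 81
```

Note that the 48% LUT figure includes the 81 route-thru LUTs, so only 167 LUTs are actually implementing logic.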
Place & Route Reports
  Place & Route Report contents
    Command line options for the par program
    Errors and warnings
    Device utilization summary
      Similar to the Design Summary from the Map Report
    Unrouted nets
    Timing summary
      Statistics on average routing delays
      Performance versus constraints if the design contains timing constraints
Timing Reports
  Timing Report contents (for designs with constraints)
    Command line options for the trce program
    Timing Constraints section
      Summary of each timing constraint
      Details on paths that fail to meet constraints
    Data Sheet section
      Setup/hold, clock to pad, timing between clock domains, and pad-to-pad delay information
      Organized in easy-to-read table format
    Timing Summary section
      Number of errors and Timing Score
      Constraint coverage
Timing Report Example
  Release 4.1i - Trace E.30
  Copyright (c) 1995-2001 Xilinx, Inc. All rights reserved.

  trce -e 3 -l 3 -xml top top.ncd -o top.twr top.pcf

  Design file:              top.ncd
  Physical constraint file: top.pcf
  Device,speed:             xc2v40,-4 (ADVANCED 1.85 2001-07-24)
  Report level:             error report
  --------------------------------------------------------------------------------
  WARNING:Timing - No timing constraints found, doing default enumeration.
  ================================================================================
  Timing constraint: Default period analysis
   8292 items analyzed, 0 timing errors detected.
   Minimum period is 8.852ns.
   Maximum delay is 11.830ns.
  --------------------------------------------------------------------------------
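The minimum period in this report translates directly into a maximum clock frequency, since frequency is the reciprocal of period. A minimal Python sketch of that conversion follows; `max_frequency_mhz` is a hypothetical helper name.

```python
def max_frequency_mhz(min_period_ns):
    """Convert a minimum clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / min_period_ns

# Minimum period from the example report above.
fmax = max_frequency_mhz(8.852)
print(f"Maximum clock frequency: {fmax:.2f} MHz")
```

For the 8.852 ns period reported here, this works out to roughly 113 MHz.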
Timing Report Example
  All constraints were met.

  Data Sheet report:
  -----------------
  All values displayed in nanoseconds (ns)

  Clock FiftyM_clk to Pad
  ---------------+------------+
                 | clk (edge) |
  Destination Pad|   to PAD   |
  ---------------+------------+
  EN             |   10.035(R)|
  half1          |    9.465(R)|
  half2          |    9.166(F)|
  half3          |    9.740(R)|
  half4          |    9.174(F)|
  ---------------+------------+
Without Timing Constraints
  This design had no timing constraints or pin assignments entered when it was implemented
    Note the logical structure of the placement and pins
    Xilinx recommends that you compile your design at least once without timing constraints or pin assignments
  This design has a maximum system clock frequency of 50 MHz
With Timing Constraints
  This is the same design with three global timing constraints entered with the Constraints Editor
  It has a maximum system clock frequency of 60 MHz
  Note how most of the logic is placed closer to the edge of the device, where the pins have been placed
Period Constraint
  In this example, the Period constraint optimizes all delay paths between flip-flops
  The Period constraint does NOT optimize delay paths from input pads to output pads (purely combinatorial), paths from input pads to flip-flops, or paths from flip-flops to output pads
  [Diagram: inputs ADATA, CDATA, BUS [7..0], and CLK (through a BUFG); flip-flops FLOP1 to FLOP5 with combinatorial logic between them; outputs OUT1 and OUT2]
The Period Constraint
  A synchronous element is a flip-flop, a latch, or a synchronous RAM
  The Period constraint covers paths
    Between synchronous elements that are clocked by the reference net
  Synchronous elements are grouped by the clock signal driving them
    This is called forward propagation, and it enables constraining large pieces of logic with a single constraint
Offset Constraint
  In this example, the Offset constraint optimizes delay paths from input pads to flip-flops and paths from flip-flops to output pads
  [Diagram: the same circuit, with inputs ADATA, CDATA, BUS [7..0], and CLK (through a BUFG), five flip-flops with combinatorial logic between them, and outputs OUT1 and OUT2; the input-side paths are marked Offset In and the output-side paths are marked Offset Out]
The Offset Constraint
  The Offset constraint covers paths
    From input pads to synchronous elements clocked by the reference net (Offset In)
    From synchronous elements clocked by the reference net to output pads (Offset Out)
  Note that this constraint does not cover paths
    Between synchronous elements
    From pads to pads (purely combinatorial paths)
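The coverage rules for the Period and Offset constraints can be summarized as a lookup on the kind of path endpoint. This is a minimal Python sketch for illustration; `covering_constraint` and the endpoint labels `"pad"` and `"sync"` are hypothetical names, and the model assumes all synchronous elements are clocked by the same reference net.

```python
def covering_constraint(start, end):
    """Return which constraint covers a path, or None if neither does.

    start and end are "pad" or "sync" (flip-flop, latch, or synchronous RAM).
    """
    coverage = {
        ("sync", "sync"): "PERIOD",
        ("pad", "sync"): "OFFSET IN",
        ("sync", "pad"): "OFFSET OUT",
        ("pad", "pad"): None,   # purely combinatorial: covered by neither
    }
    return coverage[(start, end)]

for path in [("sync", "sync"), ("pad", "sync"), ("sync", "pad"), ("pad", "pad")]:
    print(path, "->", covering_constraint(*path))
```

The pad-to-pad case returning `None` reflects that purely combinatorial paths fall outside both of the constraints described here.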