Programmable Logic
SMD098 Computation Structures - Programmable Logic

Simple Programmable Logic Devices

Programmable Array Logic (PAL)
AND-OR arrays are common building blocks in SPLD and CPLD architectures.
They implement two-level logic functions like: F = AB C + B + C
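As a rough illustration of the two-level AND-OR structure, here is a minimal Python sketch. The product terms in `PRODUCT_TERMS` are hypothetical and chosen only for illustration; they are not the function shown on the slide. Each term plays the role of one row of the programmable AND array, and the OR plane combines the rows.

```python
from itertools import product

# Hypothetical product terms (not taken from the slide). Each term maps an
# input name to the level it requires: True = plain literal, False = inverted.
PRODUCT_TERMS = [
    {"A": True, "B": True, "C": False},   # A AND B AND (NOT C)
    {"B": False, "D": True},              # (NOT B) AND D
]


def and_or_array(inputs, terms=PRODUCT_TERMS):
    """Evaluate a two-level AND-OR array for one input assignment."""
    # AND plane: a product term is true when every literal matches.
    and_plane = [all(inputs[name] == level for name, level in term.items())
                 for term in terms]
    # OR plane: the output is the OR of all product terms.
    return any(and_plane)


def truth_table(names=("A", "B", "C", "D")):
    """Enumerate the output for every combination of the named inputs."""
    return {bits: and_or_array(dict(zip(names, bits)))
            for bits in product((False, True), repeat=len(names))}
```

Programming a real PAL corresponds to choosing which literals appear in each product term; the OR connections are fixed.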
Simple Programmable Logic Devices
[Figure: Vantis PALV6V8 block diagram. A programmable AND array (32 x 64) feeds eight macrocells (MC0 to MC7); the CLK and OE pins double as inputs. A second panel shows the macrocell detail.]

Complex Programmable Logic Devices
CPLDs have much higher capacity than SPLDs, but the architecture is similar.
[Figure: Xilinx XC9500 architecture. JTAG port, JTAG controller and in-system programming controller; I/O blocks; FastCONNECT switch matrix; function blocks 1 to N, each with macrocells 1 to 18.]
[Figure: Function block. 36 inputs from the FastCONNECT switch matrix, programmable AND-array, product term allocators and 18 macrocells; outputs (OUT, PTOE) go to the FastCONNECT switch matrix and to the I/O blocks.]
[Figure: Macrocell. Product term allocator with additional product terms from other macrocells; product term set, clock, reset and OE; global set/reset and global clocks.]
Field Programmable Gate Arrays - Xilinx XC4000

CPLD or FPGA?
This is what Xilinx says (but I agree).

CPLD
- Non-volatile
- Wide fan-in
- Fast counters, state machines
- Combinational logic
- Small student projects, lower level courses
- Control logic

FPGA
- SRAM reconfiguration
- Excellent for computer architecture, DSP, registered designs
- ASIC-like design flow
- Great for first year to graduate work
- More common in schools
- PROM required for non-volatile operation
Virtex Architecture
- SRAM based, needs external configuration memory
- Two main configurable elements: configurable logic blocks (CLBs) and input/output blocks (IOBs)
- CLBs interconnect through a general routing matrix (GRM).
- The VersaRing interface provides additional routing resources around the periphery of the device.
- The Virtex architecture also includes the following circuits that connect to the GRM:
  - Dedicated block memories of 4096 bits each
  - Clock DLLs for clock-distribution delay compensation and clock domain control
  - 3-state buffers (BUFTs) associated with each CLB that drive dedicated segmentable horizontal routing resources
[Figure: Virtex architecture overview showing the DLLs, IOBs, VersaRing, BRAMs and CLBs.]

Virtex routing resources
A view from FPGA Editor. Blue boxes are slices (2 slices = 1 CLB). Grey lines are local interconnect. Red lines are long lines. Green lines are pin wires. Three switch boxes per CLB.
Virtex clock distribution
There are four primary global clock nets that are driven by four global buffers. If these clock nets are used, clock skew will not be a problem.
[Figure: Global clock distribution. Pads GCLKPAD0 to GCLKPAD3 and buffers GCLKBUF0 to GCLKBUF3 drive the global clock spine, global clock rows and global clock columns.]

Virtex IOB
The Virtex IOBs are configurable to support several different high-speed standards.
[Figure: Virtex IOB with pad, weak keeper, output buffer (OBUFT), input buffer (IBUF) with programmable delay, Vref input and registers.]
Virtex CLB
Xilinx definitions:
- Logic cell (LC): a 4-input LUT, carry logic and a storage element
- A slice consists of two LCs
- A CLB consists of 4.5 LCs. The 1/2 LC comes from the fact that some additional logic is available for implementing functions with more than 4 inputs.
[Figure: Virtex CLB with two slices (Slice 0 and Slice 1). Each slice holds two LUTs (inputs F1 to F4 and G1 to G4), carry and control logic, and two storage elements; the carry chain runs from CIN to COUT.]

Virtex slice - detailed view
The additional logic consists of the F5 and F6 multiplexers.
[Figure: Detailed view of a Virtex slice.]
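To see why a multiplexer behind two 4-input LUTs is enough for a function of five inputs, the sketch below splits a 5-input function into two 4-input tables and selects between them with the fifth input (a Shannon expansion). This is a simplified model under that assumption, not a description of the exact slice wiring; the helper names are invented for the illustration.

```python
def make_lut4(func):
    """Build the 16 stored bits of a 4-input LUT (address bit 0 = first input)."""
    return [int(bool(func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1)))
            for i in range(16)]


def read_lut4(table, a, b, c, d):
    """Look up one entry, using the four inputs as the address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


def build_with_f5(func5):
    """Split a 5-input function into two LUTs plus a 2:1 mux on the fifth input."""
    lut_e0 = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))  # case e = 0
    lut_e1 = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))  # case e = 1

    def evaluate(a, b, c, d, e):
        # The mux picks which LUT output reaches the slice output.
        return read_lut4(lut_e1 if e else lut_e0, a, b, c, d)

    return evaluate
```

Applying the same split once more, with a second multiplexer choosing between two such 5-input results, gives a function of six inputs.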
Virtex - LUTs
The Virtex LUTs can be configured to implement:
- A 4-input LUT, which implements any function of 4 variables
- 16x1-bit synchronous RAM
- Two LUTs in one slice can be combined to implement 16x2-bit or 32x1-bit synchronous RAM, or 16x1-bit dual-port synchronous RAM
- 16-bit shift register

LUTs
Combinatorial logic is stored in 16x1 SRAM look-up tables (LUTs) in a CLB.
Look-up table example: the inputs A, B, C and D form a 4-bit address that selects the stored output bit Z.
[Figure: combinatorial logic with inputs A, B, C, D and output Z, next to the equivalent 16-row truth table.]
Capacity is limited by the number of inputs, not by complexity.
Number of possible functions: 2^(2^4) = 2^16 = 64K
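The look-up idea can be sketched in a few lines of Python: the LUT is just a 16-entry memory whose address is formed by the inputs. The function names below are invented for this illustration, and the choice of which input is the least significant address bit is an assumption.

```python
def lut_contents(func, n_inputs=4):
    """Return the 2**n_inputs stored bits; entry i holds func at address i."""
    return [int(bool(func(*[(addr >> k) & 1 for k in range(n_inputs)])))
            for addr in range(2 ** n_inputs)]


def lut_read(contents, *inputs):
    """Use the inputs as an address (inputs[0] is the least significant bit)."""
    addr = sum(bit << k for k, bit in enumerate(inputs))
    return contents[addr]


def function_count(n_inputs):
    """Number of distinct Boolean functions of n_inputs variables."""
    return 2 ** (2 ** n_inputs)
```

For example, `lut_contents(lambda a, b, c, d: (a and b) or (c and d))` fills the 16 entries for that function, and `function_count(4)` gives the 64K figure quoted on the slide. Any other 4-input function, however complicated, needs exactly the same 16 bits.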
Virtex slice - FPGA Editor view
[Figure: a Virtex slice as shown in FPGA Editor.]

Example

    library ieee;
    use ieee.std_logic_1164.all;

    entity Example is
      port (
        A, B, C, D     : in  std_logic;   -- Inputs
        Reset, Clk, En : in  std_logic;   -- Reset, Clock, Clock enable
        Y              : out std_logic);  -- Output
    end Example;

    architecture RTL of Example is
    begin  -- RTL
      process(Clk)
      begin
        if rising_edge(Clk) then
          if Reset = '1' then             -- synchronous reset
            Y <= '0';
          elsif En = '1' then             -- clock enable
            Y <= A xor B xor C xor D;
          end if;
        end if;
      end process;
    end RTL;

How will this be implemented? How many slices?
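One way to reason about the question is to model the design at the clock-cycle level. The Python sketch below mirrors the VHDL process (synchronous reset, clock enable, 4-input XOR) and computes the 16-bit truth table of the XOR. Because the XOR has only four inputs, it fits in a single 4-input LUT; whether the reset and enable map directly onto the flip-flop's control inputs is an assumption here, not something stated on the slide.

```python
class ExampleModel:
    """Cycle-level sketch of the Example entity (names mirror the VHDL ports)."""

    def __init__(self):
        self.y = None  # register value is unknown until reset or first load

    def clock(self, a, b, c, d, reset, en):
        """Apply one rising clock edge and return the new value of Y."""
        if reset:      # synchronous reset has priority
            self.y = 0
        elif en:       # clock enable gates the update
            self.y = a ^ b ^ c ^ d
        return self.y  # otherwise Y holds its previous value


# Truth table of the 4-input XOR packed into 16 bits (bit i = output at
# address i). XOR is symmetric, so the input-to-address order does not matter.
XOR4_TABLE = sum((bin(i).count("1") & 1) << i for i in range(16))
```

The 16 bits in `XOR4_TABLE` are what a single LUT would have to store for this function.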
Example
[Figure: the implemented Example design.]

Example 2
8-bit adder with carry input and output.
How can this be implemented in a Virtex? How many slices?

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity Example2 is
      port (
        A, B : in  unsigned(7 downto 0);
        Cin  : in  std_logic;
        R    : out unsigned(7 downto 0);
        Cout : out std_logic);
    end Example2;

    architecture RTL of Example2 is
    begin  -- RTL
      process(A, B, Cin)
        variable r_tmp   : unsigned(8 downto 0);   -- one extra bit for the carry out
        variable cin_tmp : integer range 0 to 1;
      begin
        if Cin = '0' then
          cin_tmp := 0;
        else
          cin_tmp := 1;
        end if;
        r_tmp := ('0' & A) + B + cin_tmp;
        R     <= r_tmp(7 downto 0);
        Cout  <= r_tmp(8);
      end process;
    end RTL;
Example 2
Four slices. The carry chain is the highlighted (red) net. The next slide shows this slice.

Example 2
Two full adders per slice.
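The slice count follows from the structure of a ripple-carry adder. The Python sketch below chains one-bit full adders through the carry, in the same way the carry chain links the slices, and derives the slice count from the "two full adders per slice" figure on the slide. It is a behavioural model only and says nothing about timing.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a, b, cin, width=8):
    """Add two width-bit numbers by chaining full adders through the carry."""
    carry, result = cin, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry  # (R, Cout) in the terms of Example2


FULL_ADDERS_PER_SLICE = 2                        # from the slide
SLICES_FOR_8_BITS = 8 // FULL_ADDERS_PER_SLICE   # 8 bits / 2 per slice
```

`ripple_add(200, 100, 1)` returns the low eight bits of 301 together with the carry out, matching `R` and `Cout` of the VHDL entity.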
Example 3 - shift register

    library ieee;
    use ieee.std_logic_1164.all;

    entity Example3 is
      port (
        A          : in  std_logic;
        Clk, Reset : in  std_logic;
        Y, Y2      : out std_logic);
    end Example3;

    architecture RTL of Example3 is
      signal S, S2 : std_logic_vector(15 downto 0);
    begin  -- RTL

      -- Shift register with asynchronous reset
      Shift : process(Clk, Reset)
      begin
        if Reset = '1' then
          S <= (others => '0');
        elsif rising_edge(Clk) then
          S <= S(14 downto 0) & A;
        end if;
      end process;

      -- Shift register without reset
      Shift2 : process(Clk)
      begin
        if rising_edge(Clk) then
          S2 <= S2(14 downto 0) & A;
        end if;
      end process;

      Y  <= S(15);
      Y2 <= S2(15);
    end RTL;

Shift (with reset): 16 FFs, 8 slices.
[Figure: chain of 16 flip-flops s[0] to s[15], clocked by Clk and cleared by Reset, from A to Y.]
Shift2 (no reset): one SRL16, 1/2 slice.
[Figure: a single SRL16 primitive with inputs A and Clk and address inputs A0 to A3, driving Y2.]

Virtex Block RAM
Each Block RAM is a synchronous dual-ported 4096-bit RAM with independent control signals for each port. Data widths may be configured independently.
[Figure: RAMB4_S#_S# symbol. Port A: WEA, ENA, RSTA, CLKA, ADDRA[#:0], DIA[#:0], DOA[#:0]. Port B: WEB, ENB, RSTB, CLKB, ADDRB[#:0], DIB[#:0], DOB[#:0].]
You have actually already used the block RAM in one lab.

Virtex Device   # of Blocks   Total Block SelectRAM Bits
XCV50                8              32,768
XCV100              10              40,960
XCV150              12              49,152
XCV200              14              57,344
XCV300              16              65,536
XCV400              20              81,920
XCV600              24              98,304
XCV800              28             114,688
XCV1000             32             131,072
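The totals in the block RAM table are simply the number of blocks times 4096 bits, which the short Python sketch below reproduces. The `depth_for_width` helper is an added illustration of how a fixed 4096-bit block trades depth against data width; the set of widths a given port actually supports is not listed on the slide.

```python
BLOCK_RAM_BITS = 4096  # bits per block RAM, as stated on the slide

# Device name -> number of block RAMs (from the Virtex block RAM table).
BLOCKS = {
    "XCV50": 8, "XCV100": 10, "XCV150": 12, "XCV200": 14, "XCV300": 16,
    "XCV400": 20, "XCV600": 24, "XCV800": 28, "XCV1000": 32,
}


def total_block_ram_bits(device):
    """Total block RAM capacity of a device in bits."""
    return BLOCKS[device] * BLOCK_RAM_BITS


def depth_for_width(width_bits):
    """Number of words when one block is organised with the given data width."""
    return BLOCK_RAM_BITS // width_bits
```

With a 16-bit data width one block holds 256 words; with a 1-bit width it holds 4096.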
Virtex DLLs
A Delay-Locked Loop (DLL) can align internal and external clocks. This effectively eliminates on-chip clock distribution delay, which maximizes the achievable speed.
[Figure: Chip 1 and Chip 2 exchanging clock and data, each with a DLL. The DLL consists of a comparator, an error signal and an adjustable delay placed ahead of the clock distribution network.]
Virtex devices have four DLLs. The DLLs can also be used to divide or double the incoming clock frequency internally. The output of a DLL can drive the global clock routing resources, so clock skew can be eliminated.

Virtex compared to Virtex-E

Virtex
Device    System Gates   CLB Array   Logic Cells   Maximum Available I/O   Block RAM Bits   Maximum SelectRAM+ Bits
XCV50          57,906     16x24         1,728             180                  32,768             24,576
XCV100        108,904     20x30         2,700             180                  40,960             38,400
XCV150        164,674     24x36         3,888             260                  49,152             55,296
XCV200        236,666     28x42         5,292             284                  57,344             75,264
XCV300        322,970     32x48         6,912             316                  65,536             98,304
XCV400        468,252     40x60        10,800             404                  81,920            153,600
XCV600        661,111     48x72        15,552             512                  98,304            221,184
XCV800        888,439     56x84        21,168             512                 114,688            301,056
XCV1000     1,124,022     64x96        27,648             512                 131,072            393,216

Virtex-E
System gates? Logic gates? Logic cells? LG/LC = 12
Device     System Gates   Logic Gates   CLB Array   Logic Cells   Differential Pairs   User I/O   BlockRAM Bits   Distributed RAM Bits
XCV50E          71,693        20,736    16 x 24         1,728            83              176           65,536            24,576
XCV100E        128,236        32,400    20 x 30         2,700            83              196           81,920            38,400
XCV200E        306,393        63,504    28 x 42         5,292           119              284          114,688            75,264
XCV300E        411,955        82,944    32 x 48         6,912           137              316          131,072            98,304
XCV400E        569,952       129,600    40 x 60        10,800           183              404          163,840           153,600
XCV600E        985,882       186,624    48 x 72        15,552           247              512          294,912           221,184
XCV1000E     1,569,178       331,776    64 x 96        27,648           281              660          393,216           393,216
XCV1600E     2,188,742       419,904    72 x 108       34,992           344              724          589,824           497,664
XCV2000E     2,541,952       518,400    80 x 120       43,200           344              804          655,360           614,400
XCV2600E     3,263,755       685,584    92 x 138       57,132           344              804          753,664           812,544
XCV3200E     4,074,387       876,096   104 x 156       73,008           344              804          851,968         1,038,336
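Several columns in these tables can be derived from the CLB array size alone. The Python sketch below uses the 4.5 logic cells per CLB convention quoted earlier and reads the ratio on the slide as 12 logic gates per logic cell. The distributed RAM figure assumes four 16-bit LUTs per CLB (two slices with two LUTs each), which is an inference from the earlier slides rather than something the table states.

```python
LOGIC_GATES_PER_CELL = 12   # the LG/LC ratio noted on the slide
LUTS_PER_CLB = 4            # assumption: 2 slices x 2 LUTs
BITS_PER_LUT = 16           # a 4-input LUT stores 16 bits


def logic_cells(rows, cols):
    """Logic cell count using the 4.5 LCs per CLB convention."""
    return rows * cols * 9 // 2


def logic_gates(rows, cols):
    """Logic gate count derived from the logic cell count."""
    return logic_cells(rows, cols) * LOGIC_GATES_PER_CELL


def distributed_ram_bits(rows, cols):
    """Bits available when every LUT is used as RAM."""
    return rows * cols * LUTS_PER_CLB * BITS_PER_LUT
```

For the 16 x 24 array of the XCV50E this gives 1,728 logic cells, 20,736 logic gates and 24,576 distributed RAM bits. The system gate counts do not follow from such a formula, which may be the point of the question marks on the slide.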