CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1
FPGA fabric A generic island-style FPGA fabric Configurable Logic Blocks (CLB) and Programmable Switch Matrices (PSM) Bitstream configures functionality of each CLB and interconnection between logic blocks 2
CLB (Configurable Logic Block): the Xilinx case 3
Xilinx slice features: LUTs; MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUXes are shown in this diagram); carry logic; MULT_ANDs; sequential elements. [Figure: detailed structure of the Xilinx slice.] 4
Look-Up Tables (slice logic). An N-input LUT can implement any combinational Boolean function of N inputs. Coarser-grained than logic gates. Less area-efficient than a fixed logic gate (e.g. a 4-input AND gate). A very powerful concept for implementing bit-level random digital logic. Typical values: N = 4, 5, or 6. 5
Example 2-input LUT. [Figure: a LUT with inputs a and b and output out; the configuration input loads the four lookup-table entries, one per combination of a and b.] 6
Example 4-input LUT 7
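The LUT idea can be sketched in a few lines of Python. This is an illustrative software model only, not how the hardware is described to the tools: the helper names `make_lut` and `lut_read` are hypothetical, and the choice of the first input as the most significant address bit is an assumption.

```python
from itertools import product


def make_lut(func, n):
    """Build the 2**n configuration bits for an n-input Boolean function.

    Entry i holds the output for the input combination whose binary
    encoding is i (assumption: first input is the most significant bit).
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n)]


def lut_read(config, *inputs):
    """Evaluate the LUT: the inputs form an address into the config bits."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return config[address]


# A 2-input LUT configured as XOR needs 4 configuration bits.
xor_cfg = make_lut(lambda a, b: a ^ b, 2)

# A 4-input LUT configured as AND needs 16 configuration bits.
and4_cfg = make_lut(lambda a, b, c, d: a & b & c & d, 4)
```

Changing the function only changes the stored bits; the lookup structure stays the same, which is why one LUT can stand in for any gate network of N inputs.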
D Flip-Flops (slice logic). A D flip-flop at the output of the LUT can register the LUT output. It can also hold state in FSM designs or serve as a pipeline stage. 8
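As a rough illustration of a LUT feeding a D flip-flop to hold FSM state, the sketch below models a one-bit toggle machine. The class `DFlipFlop`, the table `TOGGLE_LUT`, and the function `run_toggle` are hypothetical names chosen for this example, and the clocking model (one call per rising edge) is a simplifying assumption.

```python
class DFlipFlop:
    """One-bit register: q changes only when clock() is called."""

    def __init__(self, init=0):
        self.q = init

    def clock(self, d):
        """Model a rising clock edge: capture d and expose it on q."""
        self.q = d & 1
        return self.q


# LUT contents for next_state = state XOR enable,
# addressed by (state << 1) | enable (assumed bit order).
TOGGLE_LUT = [0, 1, 1, 0]


def run_toggle(enables):
    """Feed one enable bit per clock cycle and record the state after each edge."""
    ff = DFlipFlop()
    trace = []
    for en in enables:
        d = TOGGLE_LUT[(ff.q << 1) | (en & 1)]  # combinational LUT output
        trace.append(ff.clock(d))               # registered on the clock edge
    return trace


trace = run_toggle([1, 1, 0, 1])
```

The flip-flop output is fed back into the LUT address, which is the basic pattern for state machines built from slice resources.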
Carry Logic (slice logic). Carry logic speeds up carry-based computations: additions (carry-lookahead and ripple-carry adders), parity functions, etc. Cin/Cout routing is separate from general-purpose routing, so it has fewer logic stages and is faster. 9
Carry Chains. Dedicated carry chains speed up arithmetic operations: simple, fast, and complete arithmetic logic. A dedicated XOR gate completes the sum in a single level. Carry chains use dedicated routing resources. All synthesis tools can infer carry logic. S = A xor B xor Cin; Cout = AB + Cin(A xor B). [Figure: a CLB with two carry chains. The first runs from CIN through slice S0 to slice S1, and its COUT goes to S0 of the next CLB; the second runs from CIN through slice S2 to slice S3, and its COUT goes to CIN of S2 of the next CLB.] 10
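The sum and carry equations on this slide can be checked with a small bit-level model. This is a sketch under stated assumptions: `full_adder` and `carry_chain_add` are hypothetical names, and the loop only mimics the order in which the carry ripples from bit to bit, not the timing of the dedicated routing.

```python
def full_adder(a, b, cin):
    """One bit position: S = A xor B xor Cin, Cout = AB + Cin(A xor B)."""
    p = a ^ b                    # propagate term, shared by sum and carry
    s = p ^ cin                  # sum bit from the dedicated XOR
    cout = (a & b) | (cin & p)   # carry passed to the next bit position
    return s, cout


def carry_chain_add(x, y, width=8, cin=0):
    """Add two unsigned integers bit by bit, rippling the carry upward."""
    carry = cin
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # (sum modulo 2**width, final carry out)


total, carry_out = carry_chain_add(200, 100, width=8)
```

With an 8-bit width, adding 200 and 100 overflows: the low eight bits hold 44 and the final carry out is 1.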
Multiplexer Logic. Dedicated MUXes are provided to connect slices and LUTs. MUXF5 combines the LUTs in each slice. MUXF6 combines slices S0 and S1, and likewise slices S2 and S3. MUXF7 combines the two MUXF6 outputs. MUXF8 combines the two MUXF7 outputs (from the CLB above or below). 11
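To see why a dedicated MUX is useful, the sketch below builds a 5-input function from two 4-input LUTs and a 2:1 multiplexer, in the spirit of MUXF5. It assumes 4-input LUTs and uses hypothetical names (`lut4_config`, `lut4_read`, `muxf5`, `slice_eval`); the 5-input parity function is just an example workload.

```python
from itertools import product


def lut4_config(func5, sel):
    """Configuration bits of a 4-input LUT holding func5 with its fifth input fixed to sel."""
    return [func5(a, b, c, d, sel) & 1 for a, b, c, d in product((0, 1), repeat=4)]


def lut4_read(cfg, a, b, c, d):
    """Look up one entry; input a is treated as the most significant address bit."""
    return cfg[(a << 3) | (b << 2) | (c << 1) | d]


def muxf5(in0, in1, sel):
    """2:1 multiplexer choosing between the two LUT outputs."""
    return in1 if sel else in0


def parity5(a, b, c, d, e):
    """Example 5-input function: odd parity of the inputs."""
    return a ^ b ^ c ^ d ^ e


# Each LUT stores one half of the 5-input truth table (e = 0 and e = 1).
LUT_E0 = lut4_config(parity5, 0)
LUT_E1 = lut4_config(parity5, 1)


def slice_eval(a, b, c, d, e):
    """Four inputs address both LUTs; the fifth input drives the MUX select."""
    return muxf5(lut4_read(LUT_E0, a, b, c, d), lut4_read(LUT_E1, a, b, c, d), e)
```

The same construction repeats one level up: two such 5-input results and another select input give a 6-input function, which is the role the wider MUXes play across slices.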
Programmable wiring Organized into channels. Many wires per channel. Connections between wires made at programmable interconnection points. Must choose: Channels from source to destination. Wires within the channels. Routing area typically much larger than logic area 12
Programmable interconnect. A MOS switch is controlled by a configuration bit. [Figure: a storage element (D, Q) holds the configuration bit that drives the switch.] 13
Programmable wiring paths 14
[Figure: a switchbox at the intersection of four channels.] 15
Choosing a path. [Figure: a path between two logic elements (LE).] 16
Interconnection architectures. Segmented interconnects consist of shorter wires that are connected to emulate longer wires. Hierarchical interconnects assume that most connections are local. 17
Routing problems Global routing: Which combination of channels? Local routing: Which wire in each channel? Routing metrics: Net length. Delay. 18
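One simple way to picture the routing problem is a shortest-path search over a grid of routing cells. The sketch below uses breadth-first search, a textbook maze-routing approach; it is an illustration, not the algorithm used by any particular FPGA tool. The grid encoding (1 = blocked cell) and the name `route` are assumptions made for this example.

```python
from collections import deque


def route(grid, src, dst):
    """Return a shortest list of (row, col) cells from src to dst, or None if unroutable.

    grid[r][c] == 1 marks a cell that is blocked (already used or unavailable).
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # back-pointers, also serves as the visited set
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:   # walk the back-pointers to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
path = route(GRID, (0, 0), (2, 0))
net_length = len(path) - 1  # number of hops, a simple net-length metric
```

Here the only opening in the middle row forces a detour, so the net length is 6 hops even though source and destination are two rows apart. A delay metric would additionally weight each hop by the switches and wire segments it crosses.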
I/O Fundamental selection: input, output, three-state? Additional features: Register. Voltage levels. Slew rate. 19
Configuration Must set control bits for: LE. Interconnect. I/O blocks. Usually configured off-line. Separate burn-in step (antifuse). At power-up (SRAM). 20
Configuration vs. programming. FPGA configuration: Bits stay at the device they program. A configuration bit controls a switch or a logic bit. CPU programming: Instructions are fetched from a memory. Instructions select complex operations. [Figure: the CPU fetches the instruction add r1, r2 from memory into its instruction register (IR).] 21
Reconfiguration Some FPGAs are designed for fast configuration. A few clock cycles, not thousands of clock cycles. Allows hardware to be changed on-the-fly. 22
FPGA fabric architecture questions Given limited area budget: How many logic elements? How much interconnect? How many I/O blocks? 23
Logic element questions How many inputs? How many functions? All functions of n inputs or eliminate some combinations? What inputs go to what pieces of the function? Any specialized logic? Adder, etc. What register features? 24
Interconnect questions How many wires in each channel? Uniform distribution of wiring? How should wires be segmented? How rich is interconnect between channels? How long is the average wire? How much buffering do we add to wires? 25
I/O block questions How many pins? Maximum number of pins determined by package type. Are pins programmed individually or in groups? Can all pins perform all functions? How many logic families do we support? 26
The Design Cycle for FPGAs (I) 27
The Design Cycle for FPGAs (II) 28
Mapping 29
Placement 30
Routing 31
Modern FPGA architecture. Xilinx Virtex family. Columns of on-chip SRAMs, hard IP cores (PPC 405), and DSP slices (multiply-accumulate units). 32
DSP slices. A large number of hard multipliers enables DSP applications. 33
Example Application: FIR filtering 34
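A direct-form FIR filter is a chain of multiply-accumulate (MAC) operations, which is why it maps naturally onto DSP slices. The Python sketch below is a behavioral model only; the function name `fir_filter` and the integer test data are assumptions for illustration, and a hardware design would typically compute the taps in parallel rather than in a loop.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    delay_line = [0] * len(coeffs)   # x[n], x[n-1], ..., initially zero
    outputs = []
    for x in samples:
        # Shift the new sample in; the oldest sample falls off the end.
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for c, d in zip(coeffs, delay_line):
            acc += c * d             # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


# Feeding a unit impulse returns the coefficients themselves.
impulse_response = fir_filter([1, 0, 0, 0], [1, 2, 3])
```

Each tap corresponds to one multiplier and one adder, so a filter with many taps benefits directly from a large number of hard multipliers.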
Architectural Evolution (device complexity and performance over time). Glue Logic: FPGA fabric. Block Logic: FPGA fabric, block RAM. Platform Logic: FPGA fabric, block RAM, embedded registers and multipliers, clock management, multi-standard programmable I/O. System Logic: the Platform Logic features plus an embedded microprocessor and multi-gigabit transceivers. Most recent feature set shown: the System Logic features plus embedded DSP-optimized multipliers and embedded Ethernet MACs. Other labels on the chart: Reconfigurable FPGAs, Domain-optimized System Logic, Programmable System in a Package. Years marked on the time axis: 1985, 1992, 2000, 2002, 2004, 2005. 35