FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)

Similar documents
FPGA Implementations

Outline. Field Programmable Gate Arrays. Programming Technologies Architectures. Programming Interfaces. Historical perspective

INTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)

Outline of Presentation Field Programmable Gate Arrays (FPGAs(

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

INTRODUCTION TO FPGA ARCHITECTURE

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

EITF35: Introduction to Structured VLSI Design

Field Programmable Gate Array (FPGA)

7-Series Architecture Overview

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today.

Basic FPGA Architecture Xilinx, Inc. All Rights Reserved

Digital Integrated Circuits

Topics. Midterm Finish Chapter 7

Spiral 2-8. Cell Layout

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

ECE 448 Lecture 5. FPGA Devices

FPGA architecture and design technology

EE219A Spring 2008 Special Topics in Circuits and Signal Processing. Lecture 9. FPGA Architecture. Ranier Yap, Mohamed Ali.

Very Large Scale Integration (VLSI)

Outline of Presentation

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline

Altera FLEX 8000 Block Diagram

FPGA Architecture Overview. Generic FPGA Architecture (1) FPGA Architecture

Memory and Programmable Logic

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA: What? Why? Marco D. Santambrogio

Zynq-7000 All Programmable SoC Product Overview

Programmable Logic. Any other approaches?

Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool

FPGA How do they work?

Zynq AP SoC Family

International Training Workshop on FPGA Design for Scientific Instrumentation and Computing November 2013.

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Introduction to Field Programmable Gate Arrays

Spiral 3-1. Hardware/Software Interfacing

Copyright 2017 Xilinx.

EE178 Lecture Module 2. Eric Crabill SJSU / Xilinx Fall 2007

Field Programmable Gate Array

Embedded Systems: Hardware Components (part I) Todor Stefanov

Introduction to FPGAs. H. Krüger Bonn University

What is Xilinx Design Language?

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I

XA Spartan-6 Automotive FPGA Family Overview

Introduction to Partial Reconfiguration Methodology

Atmel AT94K FPSLIC Architecture Field Programmable Gate Array

ECE 545 Lecture 12. FPGA Resources. George Mason University

Programmable Logic. Simple Programmable Logic Devices

ECE 448 Lecture 5. FPGA Devices

Memory and Programmable Logic

CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES

VHX - Xilinx - FPGA Programming in VHDL

The Xilinx XC6200 chip, the software tools and the board development tools

FPGAs: Instant Access

Zynq Architecture, PS (ARM) and PL

Field Program mable Gate Arrays

Chapter 2. FPGA and Dynamic Reconfiguration ...

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

Summary. Introduction. Application Note: Virtex, Virtex-E, Spartan-IIE, Spartan-3, Virtex-II, Virtex-II Pro. XAPP152 (v2.1) September 17, 2003

Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs

PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES

A Case Study. Jonathan Harris, and Jared Phillips Dept. of Electrical and Computer Engineering Auburn University

Reconfigurable Computing

Experiment 3. Digital Circuit Prototyping Using FPGAs

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx

Enabling success from the center of technology. A Practical Guide to Configuring the Spartan-3A Family

Field Programmable Gate Arrays (FPGAs)

System-on Solution from Altera and Xilinx

Configuration Modes and Daisy-Chains

Lecture 7. Standard ICs FPGA (Field Programmable Gate Array) VHDL (Very-high-speed integrated circuits. Hardware Description Language)

Section 6. Memory Components Chapter 5.7, 5.8 Physical Implementations Chapter 7 Programmable Processors Chapter 8

DIGITAL CIRCUIT LOGIC UNIT 9: MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES

Xilinx ASMBL Architecture

Copyright 2016 Xilinx

XA Artix-7 FPGAs Data Sheet: Overview

ECE 645: Lecture 1. Basic Adders and Counters. Implementation of Adders in FPGAs

Spartan-3E FPGA Design Guide for prototyping and production environment

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Chapter 13 Programmable Logic Device Architectures

Embedded Systems. 7. System Components

Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip

Virtex-II Architecture

Virtex-4 Family Overview

An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs

Advanced Digital Design Using FPGA. Dr. Shahrokh Abadi

Spartan-3 FPGA Family Advanced Configuration Architecture

Hardware Design with VHDL PLDs IV ECE 443

Lecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

PS2 VGA Peripheral Based Arithmetic Application Using Micro Blaze Processor

Qsys and IP Core Integration

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB.

PINE TRAINING ACADEMY

FPGA Based Digital Design Using Verilog HDL

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study

Transcription:

FIELD PROGRAMMABLE GATE ARRAYS (FPGAS) 1 Roth Text: Chapter 3 (section 3.4) Chapter 6 Nelson Text: Chapter 11

Programmable logic taxonomy Lab Device 2

Field Programmable Gate Arrays Typical Complexity = 5M 1B transistors 3

Basic FPGA Operation Writing configuration memory defines system function Input/Output Cells Logic in PLBs Connections between PLBs & I/O cells Changing configuration memory data changes system function Can change at anytime Even while system function is in operation 111001101000100010010101000101110001 01001010101010010010001000101010010 01001100100100001111000110010100010 00011001000101000100100100100010100 10101010010010010100010100101000101 001010010001001010101110101010101010 101010101011110111110000000000000011 010011111000010011100000111001001010 00000001111100100100010100111001001 010000111100011100010010101010101010 101010010100101010100100101010101010 101001001001 4

5

6

Ranges of Resources Logic Routing Specialized Cores Other FPGA Resource Small FPGA Large FPGA PLBs per FPGA 256 25,920 LUTs and flip-flops per PLB 1 8 Wire segments per PLB 45 406 PIPs per PLB 139 3,462 Bits per memory core 128 36,864 Memory cores per FPGA 16 576 DSP cores 0 512 Input/output cells 62 1,200 Configuration memory bits 42,104 79,704,832 7

Programmable ASIC logic cells Xilinx : configurable logic block (CLB) contains SRAM lookup tables (LUTs) to implement combinational logic D flip flops Multiplexers to establish paths in the CLB Actel ACT : multiplexers implement logic Altera Flex : similar to Xilinx CLB Altera MAX : PALs implement logic 8

Mux-based logic blocks in Text Figure 3.33 9

Actel ACT architecture (Fig. 5.1) (mux-based logic modules) ACT 1 logic module Pass transistor implementation 10

Virtex and Spartan 2 Array of 96 to 6,144 PLBs 4 LUTs/RAMs (4-input) 4 FF/latches 4 to 32 4K-bit dual-port RAMs Virtex II, Virtex II Pro Array of 352 to 11,204 PLBs 8 LUTs/RAMs (4-input) 8 FF/latches 12 to 444 18K-bit dual-port RAMs 12 to 444 18 18-bit multipliers 0 to 2 PowerPC processor cores Virtex 4 Array of 1,536 to 22,272 PLBs 4 LUTs/RAMs (4-input) 4 LUTs (4-input) 8 FF/latches 48 to 552 18K-bit dual-port RAMs Also operate as FIFOs 32 to 512 DSP cores include: 0 to 2 PowerPC processor cores Special cores I/O cells Routing PLBs Xilinx Spartan 3 PC PC Array of 192 to 8,320 PLBs 4 LUTs/RAMs (4-input) 4 LUTs (4-input) 8 FF/latches 4 to 104 18K-bit dual-port RAMs 4 to 104 18 18-bit multipliers 11

Xilinx 7 Series Families 12

Xilinx Artix-7 Family 13

Xilinx UltraScale Family Kintex and Virtex UltraScale and UltraScale+ 14

Xilinx: Basic CLB Architecture Look-up Table (LUT) implements truth table Memory elements: Flip-flop/latch Some - LUTs can also implement small RAMs Carry & control logic implements fast adders/subtractors Input[1:4] Control 4 LUT/ RAM Carry & Control Logic carry out Flip-flop/ Latch Output Q output clock, enable, set/reset 3 carry in 15

Combinational Logic Functions Gates are combined to create complex circuits A S Z Multiplexer example B If S = 0, Z = A If S = 1, Z = B Very common digital circuit Heavily used in S input controlled by configuration memory bit We ll see it again Truth table S A B Z 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 1 1 0 0 0 1 0 1 1 1 1 0 0 1 1 1 1 Logic symbol A B S 01 0 1 Z 16

Recall multiplexer example Configuration memory holds outputs for truth table Internal signals connect to control signals of multiplexers to select value of truth table for any given input value Look-up Tables 0 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 B A S Z 1 Multiplexer A B S 0 1 Truth table S A B Z 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 1 1 0 0 0 1 0 1 1 1 1 0 0 1 1 1 1 Z 17

Look-up Table Based RAMs Data In ck0 0 0 Normal LUT mode performs read operations Address decoder with write enable generates clock signals to latches for write operations Small RAMs but can be combined for larger RAMs In0 In1 In2 Address Decoder ck1 ck2 ck3 ck4 ck5 ck6 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 Z Write Enable ck7 1 1 In0 In1 In2 18

A Simple CLB Two 3-input LUTs Can implement any 4-input combinational logic function 1 flip-flop Programmable: Active levels Clock edge Set/reset 22 configuration memory bits 8 per LUT C0-7 S0-7 6 controls CB0-7 D2-0 D3 3 CB 5 Clock Enable Set/Reset Clock LUT C 8x1 LUT S 8x1 D2-0 LUT C 7 Smux 0 1 CB 0 CB 1 CB 2 C 6 C 5 C 4 C 3 C 2 C 1 C 0 111 110 101 100 011 010 001 000 0 1 CEmux CB 3 SRmux 0 1 CB out FF CB 4 Cout SOmux 0 Sout 1 = Configuration Memory Bit 19

Artix-7 SLICEL (1/2 shown) Four 6-input Look-Up Tables (LUTs) Any combinational logic function of up to 6 inputs SLICEM LUT can function as small RAM (16x1-bit) or shift register (up to 16-bit) Example CLB Eight D flip-flops Programmable as latches Programmable clock edge, clock enable, set/reset Extra logic Fast carry for adders MUXs for Shannon expansion And more 20

CLBs and Slices in rows/columns 21

Using lookup-table (LUT) programmable logic 22

Functions of more variables than # of LUT inputs Shannon s Expansion Theorem (partition into smaller functions): ZZ aa, bb, cc, dd, ee, ff = aa ZZ 0, bb, cc, dd, ee, ff + aa ZZ 1, bb, cc, dd, ee, ff = aazz 0 + aazz 1 23

MM = SS 1 SS 0 II 0 + SS 1 SS 0 II 1 + SS 1 SS 0 II 2 + SS 1 SS 0 II 3 M1 M2 24

25

Fig. 6-13 Simplified Spartan and Virtex Slice 26

Bi-directional buffers Input/Output Cells Programmable for input or output Tri-state control for bi-directional operation Flip-flops/latches for improved timing Set-up and hold times Clock-to-output delay Pull-up/down resistors Routing resources Connections to core of array Programmable I/O voltage & current levels to/from internal routing resources Tri-state Control Output Data Input Data Pad 27

Detailed I/O Cell 28

Interconnect Network Wire segments of varying length xn = N CLBs in length 1, 2, 4, and 6 are most common xh = half the array in length xl = length of full array Programmable Interconnect Points (PIPs) Also known as Configurable Interconnect Points (CIPs) Transmission gate connects to 2 wire segments Controlled by configuration memory bit 0 = wires disconnected 1 = wires connected config bit Wire A Wire B 29

Xilinx interconnect structures 30

PIPs Break-point PIP Connect or isolate 2 wire segments Cross-point PIP Turn corners Multiplexer PIP Directional and buffered Select 1-of-N inputs for output Decoded MUX PIP N config bits select from 2 N inputs Non-decoded MUX PIP 1 config bit per input Compound cross-point PIP Collection of 6 break-point PIPs Can route to two isolated signal nets 31

32

Spartan 3 Routing Resources switch matrix over 2,400 PIPs mostly MUX PIPs PLB consists of 4 slices x6 wire segments x2 wire segments xh & xl wire segments over 450 total wire segments in PLB 33

Fully routed design LOGICIN1 S2BEG0 LOGICOUT1 (5,7) S3 (6,7) (7,7) S1 (8,7).I1.O1.O2 Net N1 LOGICOUT2 W2END0 W2BEG0 W2END1 W2BEG1 S2END0 LOGICIN2 (5,6) (6,6) (7,6) (8,6).I1 S2 E2BEG2 Net N2 E2END2 Net N1: Site S1 output pin O1 connects to input pin I1 on site S3 Net N2: Site S1 output pin O2 connects to input pin I1 on site S2 Black dots are routing pips Predefined connections exist between switch boxes 34

ELEC 4200 Lab 0 in Spartan 6 35

Lab 0 in Spartan 6 (routing details) 36

Ex: modulo7 counter (device xc6slx25t) INTs Nets CLBs IO Pads 37

FPGA clock regions Logic Resources Logic Resources Clock Region Logic Resources Logic Resources Logic Resources Logic Resources Clock Region Logic Resources Logic Resources Center Clock Column(s) Clock Routing 38

Spartan 6 clock tree example Main vertical spine BUFG Folded vertical spine (one each in top and bottom half) CLKC tile Distribution wire in center of clock region Clock to INTs in one column 4 BUFH 7 8 INT CLEXM 1 0 39

7 Series FPGA high-level clock architecture view 40

Basic view of a clock region 41

Clock management tile (CMT) Mixed-mode clock manager (MMCM) MMCM Outputs Clock Sources MMCM Outputs = frequency divided, phase shifted, inverted Up to 24 CMTs per Series 7 device 42

Clock management tile (CMT) Phase-locked loop (PLL) PLL = frequency synthesizer using a voltage-controlled oscillator (VCO) PLL Outputs Clock Sources 43

From DCM/PLL and fabric Spartan 6 global clock sources From top edge global clock pads BUFGs Vertical spines BUFG controls from fabric Clock created by MIPS control logic Switch Box From right edge global clock pads From left edge Global clock pads From clock input pad From DCM/PLL and fabric From bottom edge global clock pads 44

Specialized hard cores RAMs single-port, dual-port, FIFOs 128 bits to 36K bits per RAM 4 to 575 RAM cores per FPGA DSPs 18x18-bit multiplier, 48-bit accumulator, etc. up to 512 per FPGA Microprocessors and/or microcontrollers Up to 2 per FPGA (hard core processor) Support soft core processors Synthesized from HDL into programmable resources Communication functions Gigabit transceivers Ethernet MAC PCE Express bus 45

FPGA Architectures 4000/Spartan NxN array of unit cells Unit cell = CLB + routing Special routing along center axes I/O cells around perimeter Virtex/Spartan-2 MxN array of unit cells Added block 4K RAMs at edges Virtex-2/Spartan-3 Block 18K RAMs in array Added 18x18 multipliers with each RAM Added PowerPCs in Virtex-2 Pro Virtex-4/Virtex-5 Added 48-bit DSP cores w/multipliers I/O cells along columns for BGA PC PC PC PC 46

Xilinx Virtex-4 Configuration memory: 4.7M to 50.8M bits of RAM PLBs: 1,536 to 22,272 4 slices per PLB 2 LUTs & 2 FFs per slice 2 slices can operate as RAMs/SRs Block RAMs: 48 to 552 18K-bit dualport RAMs Also operate as FIFOs DSP cores: 32 to 512, each includes: 18x18-bit multiplier 48-bit adder & accumulator Up to 2 PowerPC processors PC PC 47

Block RAMs 36 Kbit dual-port RAM Each port independently configurable: IK words x 36 bits 32 data bits + 4 parity bits 2K words x 18 bits 16 data bits + 2 parity bits 4K words x 9 bits 8 data bits + 1 parity bit 8K words x 4 bits (no parity) 16K words x 2 bits (no parity) 32K words x 1 bit (no parity) Each port has independently programmable clock edge, active levels for write enable, RAM enable, reset 48

49

ROM Contents 6-50

Distributed RAM 6-51

6-52

Refer to the synthesis guide for recommended HDL forms 53

DSP Blocks: Multiplier and Support Circuits 54

7 Series DSP48E1 DSP slice 25 18 two s-complement multiplier: Dynamic bypass 48-bit accumulator: Can be used as a synchronous up/down counter Power-saving pre-adder: Optimizes symmetrical filter applications and reduces DSP slice requirements 55

DSP48E1 slice details 56

Hard core Faster Fixed position Few devices Embedded Processors Soft core Slower Can be placed anywhere Applicable to many devices Virtex-4 Processors: Embedded Core Max Clock Processor Type Frequency PowerPC Hard 222 MHz 1000 250 9 Microblaze Soft 180 MHz 940 235 9 Picoblaze Soft 221 MHz 333 84 3 Picoblaze (optimized) ARM Processors in 7 Series SlicesPLBs Block RAMs Soft 233 MHz 274 69 3 MicroBlaze PowerPC MicroBlaze PicoBlaze 57

Xilinx Zynq SoC devices Zynq-7000 SoC: Dual-core ARM Cortex-A9 MPCore (up to 1GHz) Zynq UltraScale+ MPSoC: Quad-core ARM Cortex-A53 MP (up to 1.5 GHz) Dual-core ARM Cortex-R5 MPCore (up to 600MHz) GPY ARM Mali-400 MP2 (up to 667MHz) PL = Programmable Logic 58

Zynq-7000 SoC Features (1) Processing System Resources Continued next slide 59

Zynq-7000 SoC Features (2) Programmable Logic Resources 60

61

Zynq-7000 SoC Processor System 62

63

Zynq-7000 SoC Logic Fabric Series-7 CLBs, IOBs, etc. (as in Artix-7) 64

Configuration Interfaces Master FPGA retrieves its own configuration from ROM after power-up Serial or Parallel options clock CCLK CCLK CCLK Slave FPGA configured by external source (i.e., a µp) Serial or Parallel options Used for dynamic reconfiguration Can also read configuration memory contents Boundary Scan Interface 4-wire IEEE standard serial interface for testing Write and read access to configuration memory Not available in all Used for dynamic partial reconfiguration Interfaces to FPGA core Not available in all PROM with Configuration Data data out FPGA in Master Mode Connections between Boundary Scan Interface and internal routing network and PLBs (Xilinx provides 2-4 of these ports) Other configuration interfaces in some Din Dout FPGA in Slave Mode Din Dout FPGA in Slave Mode Din Dout 65

Slave configuration modes 66

Nexys4 DDR configuration options Artix-7 100T bitstream is typically 30,606,304 bits 2 1 3 1. USB-JTAG : PC connection via USB or JTAG 2. Master SPI: Program from quad mode flash memory (x1, x2, x4 width) 3. USB/SD: Program from micro SD card or USB memory stick 67

FPGA Configuration Memory PLB addressable Good for partial reconfiguration X-Y coordinates of PLB location to be written Requires tag to identify which resources will be configured Frame addressable Vertical or horizontal frame Access to all PLBs in frame Only portion of logic and routing resources accessible in a given frame Many frames to configure PLBs Major address for column, minor address for frame Hybrid, i.e.: Virtex-4 Virtex-5 Virtex-6 68

Daisy Chain Configuration EPROM Configuration Bits 69

Xilinx Configuration Interface Pins 70

Configuration Techniques Full configuration & readback Simple configuration interface Internal automatic calculation of frame address Long download time for large Partial reconfiguration & readback Only change portions of configuration memory with respect to reference design Reduces download time for reconfiguration Requires more complicated interface Command Register (CMR) Frame Length Register (FLR) Frame Address Register (FAR) Frame Data Register Input (FDRI) for download Output (FDRO) for readback (note separate access) 71

Full Configuration Example Dummy Word 0xFFFFFFFF Synchronize Word 0xAA995566 CMD Write 0x30008001 Reset CRC 0x00000007 FLR Write 0x30016001 FLR = 0x00000024 Frame length = 37 words 1,184 bits 32 bits/word COR Write 0x30012001 COR Write 0x00003FE5 IDCODE Write 0x3001C001 Device ID = 0x0140D093 (3S50) MASK Write 0x3000C001 MASK = 0x00000000 CMD Write 0x30008001 Switch CCLK 0x00000009 FAR Write 0x30002001 FAR = 0x00000000 (full config) CMD Write 0x30008001 Write CFG 0x00000001 FDRI Write 0x30004000 # words to write 0x50003555 Xilinx ASCII Bitstream Created by Bitstream I.32 Design name: s3mod7.ncd Architecture: spartan3 Part: 3s50tq144 Date: Tue Sep 04 15:50:09 2007 Bits: 439264 11111111111111111111111111111111 10101010100110010101010101100110 00110000000000001000000000000001 00000000000000000000000000000111 00110000000000010110000000000001 00000000000000000000000000100100 00110000000000010010000000000001 01000000000000000011111111100101 00110000000000011100000000000001 00000001010000001101000010010011 00110000000000001100000000000001 00000000000000000000000000000000 00110000000000001000000000000001 00000000000000000000000000001001 00110000000000000010000000000001 00000000000000000000000000000000 00110000000000001000000000000001 00000000000000000000000000000001 00110000000000000100000000000000 01010000000000000011010101010101 start of actual configuration data 72