Field Programmable Gate Arrays (FPGAs)


EECE-474 Advanced VHDL and FPGA Design
Lecture: Field Programmable Gate Arrays (FPGAs)
Cristinel Ababei, Dept. of Electrical and Computer Engr., Marquette University

Overview
- FPGA Devices
- ASIC vs. FPGA
- FPGA architecture
- FPGA Design Flow: Synthesis, Place, Route

Traditional CMOS Circuits (think of application specific integrated circuits, ASICs)
- Once fabricated, cannot be changed!

Field Programmable Gate Array (FPGA)
- Regularity = predictability
- Once fabricated, does not implement a specific circuit functionality!
- Can be (re)programmed or configured to implement any desired circuit!

ASIC vs. FPGA

ASIC (Application Specific Integrated Circuit)
- Designed all the way from behavioral description to physical layout
- Designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry

FPGA (Field Programmable Gate Array)
- No physical layout design; design ends with a bitstream used to configure a device
- Bought off the shelf and reconfigured by designers themselves

Which way to go?
- ASICs: high performance, low power, low cost in high volumes
- FPGAs: off-the-shelf, low development cost, short time to market, reconfigurability

Why FPGAs?
Custom ICs are very expensive to develop, and they delay introduction of a product to market (time to market) because of increased design time.
Note: need to worry about two kinds of costs:
1. cost of development, called non-recurring engineering (NRE)
2. cost of manufacture
A tradeoff usually exists between NRE cost and manufacturing costs.
[Figure: total costs vs. number of units manufactured (volume) for FPGAs and ASICs, with points A and B and the NRE offset marked]

Applications of FPGAs
- Implementation of random logic: easier changes at system level (one device is modified); can eliminate need for full-custom chips
- Prototyping: ensemble of gate arrays used to emulate a circuit to be manufactured; get more/better/faster debugging done than possible with simulation
- Reconfigurable hardware: one hardware block used to implement more than one function; functions must be mutually exclusive in time; can greatly reduce cost while enhancing flexibility
- Special-purpose computation engines: hardware dedicated to solving one problem (or class of problems); accelerators attached to general-purpose computers
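The NRE vs. manufacturing-cost tradeoff can be made concrete with a simple linear cost model. This is only a sketch: the symbols and the dollar figures below are hypothetical and are not taken from the lecture.

```latex
% Total cost of producing n units with technology X (X = FPGA or ASIC):
%   NRE_X = one-time development cost, c_X = manufacturing cost per unit
C_X(n) = \mathrm{NRE}_X + c_X \, n

% The two cost lines cross at the break-even volume n^*:
n^* = \frac{\mathrm{NRE}_{\mathrm{ASIC}} - \mathrm{NRE}_{\mathrm{FPGA}}}
           {c_{\mathrm{FPGA}} - c_{\mathrm{ASIC}}}

% Hypothetical numbers: NRE_ASIC = $1,000,000, NRE_FPGA = $50,000,
% c_FPGA = $50 per unit, c_ASIC = $5 per unit
n^* = \frac{1{,}000{,}000 - 50{,}000}{50 - 5} \approx 21{,}111 \text{ units}
```

Under these assumed numbers the FPGA is cheaper below roughly 21,000 units and the ASIC is cheaper above that volume, which is the crossover the cost figure illustrates.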

Applications of FPGAs
Early on, FPGAs served as glue logic and for prototyping. Now? Everywhere!
Communications, software-defined radio, digital signal processing, ASIC prototyping, computer hardware emulation, medical imaging, computer vision, automotive, speech recognition, cryptography, bioinformatics, financial, bitcoin.
https://www.altera.com/products/fpga/arria-series/arria- /applications.html
https://www.xilinx.com/applications.html
https://www.xilinx.com/about/customer-innovation/aerospace-and-defense/mars-exploration-rovers.html
HW accelerators in datacenter servers (Intel purchased Altera for about $16.7 billion).

Major FPGA Vendors
SRAM-based FPGAs
- Xilinx Inc.
- Altera Corp. (now Intel)
- Atmel
- Lattice Semiconductor
- Share about 90% of the market
Flash & antifuse FPGAs
- Actel Corp.
- QuickLogic Corp.

Xilinx FPGA Families

Old families
- XC3000, XC4000, XC5200
- Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.

High-performance families
- Virtex (220 nm)
- Virtex-E, Virtex-EM (180 nm)
- Virtex-II, Virtex-II PRO (130 nm)
- Virtex-4 (90 nm)
- Virtex-5 (65 nm)
- Virtex-6

Low-cost family
- Spartan/XL: derived from XC4000
- Spartan-II: derived from Virtex
- Spartan-IIE: derived from Virtex-E
- Spartan-3 (90 nm)
- Spartan-3E (90 nm): logic optimized
- Spartan-3A (90 nm): I/O optimized
- Spartan-3AN (90 nm): non-volatile
- Spartan-3A DSP (90 nm): DSP optimized
- Spartan-6

Zynq-7000
- Based on the Xilinx All Programmable SoC architecture; 28 nm technology node
- ARM dual-core Cortex-A9 MPCore processors
- Fixed processing system that can operate independently from the programmable logic
- Processor boots on reset like any processor-based device or ASSP
- Processor acts as system master and controls the configuration of the programmable logic, enabling full or partial reconfiguration of the programmable logic during operation
- Standard development flows provide a familiar programming environment for software developers
- Additional documentation and resources: http://www.xilinx.com/products/silicon-devices/soc/zynq-7.html

Zynq-7000 Device Family

Common to all devices
- Processor core: Dual ARM Cortex-A9 MPCore with CoreSight
- Processor extensions: NEON & single/double precision floating point for each processor
- Cache and on-chip memory: 512 KB L2 cache, 256 KB on-chip memory
- Memory interfaces: DDR3, DDR3L, DDR2, LPDDR2, 2x Quad-SPI, NAND, NOR
- Peripherals: 2x USB 2.0 (OTG), 2x Tri-mode Gigabit Ethernet, 2x SD/SDIO

Device    Logic cells    Block RAM    DSP slices
Z-7010    28K            240 KB       80
Z-7015    74K            380 KB       160
Z-7020    85K            560 KB       220
Z-7030    125K           1,060 KB     400
Z-7045    350K           2,180 KB     900
Z-7100    444K           3,020 KB     2,020

Transceiver count (devices with transceivers, in slide order): 4 (6.25 Gb/s), up to 8 (12.5 Gb/s), up to 16 (12.5 Gb/s), up to 16 (10.3125 Gb/s)

[Figure: Zynq-7000 block diagram]

[Figure: ZedBoard]

Intel Altera FPGA Families
- High & medium density FPGAs: Stratix II, Stratix, APEX II, APEX 20K, & FLEX 10K
- Low-cost FPGAs: Cyclone & ACEX 1K
- FPGAs with clock data recovery: Stratix GX & Mercury
- CPLDs: MAX 7000 & MAX 3000
- Embedded processor solutions: Nios, Excalibur
- Configuration devices: EPC

Altera: Cyclone V
- Extends the Cyclone FPGA series
- Wide spectrum of general logic applications
- Up to 301,000 logic elements (LEs)
- Additional documentation and resources: https://www.altera.com/products/fpga/cyclone-series/cyclone-v/features.html

[Figure: Cyclone V key architectural features]

Cyclone V Devices

8-input Adaptive Logic Module (ALM)

DE1-SoC Board
- $175 USD (academic)
- FPGA device: Cyclone V SoC 5CSEMA5F31C6
- Dual-core ARM Cortex-A9 (HPS)
- 85K programmable logic elements
- 4,450 Kbits embedded memory
- 6 fractional PLLs
- 2 hard memory controllers
- Built-in USB Blaster for FPGA programming
- http://www.terasic.com.tw/cgibin/page/archive.pl?language=english&categoryno=2 5&No=836&PartNo=2

[Figure: FPGA architecture, general view]
[Figure: FPGA architecture, detail]

Configurable Logic Block (CLB)
[Figure: logic block with INPUTS feeding a 4-LUT followed by an FF, and one OUTPUT; the latch is set by the configuration bit-stream]
- 4-input look-up table (LUT): implements combinational logic functions (essentially stores the truth table of the function)
- Register: optionally stores the output of the LUT
> Think of a LUT as a memory that stores the truth table of any Boolean function of 4 inputs!
> The four inputs represent the address from which to read this memory!

How do we implement LUTs? How could you build a generic Boolean logic circuit?

Memories as LUTs
- N-bit address, 1-bit memory word: 2^N words of 1-bit memory, each holding a Boolean value
- Address is the vector of Boolean input values
- Contents encode a Boolean function
- Read out the logical value (column) for the associated row
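The "memory as LUT" idea can be sketched behaviorally in VHDL. This is a minimal illustration, not vendor code: the entity name lut4, the generic INIT, and the port names are hypothetical, and a real design would normally let the synthesis tool infer LUTs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-input LUT modeled as a 16 x 1 memory.
-- INIT holds the truth table: bit i is the output for input value i.
entity lut4 is
  generic (
    INIT : std_logic_vector(15 downto 0) := x"8000"  -- default: 4-input AND
  );
  port (
    addr : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    y    : out std_logic                      -- the stored function value
  );
end entity lut4;

architecture behavioral of lut4 is
begin
  -- The inputs form the address; the addressed bit is the function value.
  y <= INIT(to_integer(unsigned(addr)));
end architecture behavioral;
```

Changing only INIT changes the implemented function; for example, x"FFFE" would give a 4-input OR, since every entry except address 0 is '1'.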

LUT as general logic gate
- An n-LUT is a direct implementation of a function truth table.
- Each latch location holds the value of the function corresponding to one input combination.

Example: 2-LUT
[Table: truth tables of AND and OR for the two INPUTS]
Can be used to implement any function of 2 inputs. How many of these are there? How many functions of n inputs?

Example: 4-LUT
INPUTS     Stored value
0 0 0 0    F(0,0,0,0)    store in 1st latch
0 0 0 1    F(0,0,0,1)    store in 2nd latch
0 0 1 0    F(0,0,1,0)
0 0 1 1    F(0,0,1,1)

LUT as general logic gate
[Figure: a circuit with inputs x1, x2, x3, x4 and output y, its truth table, and the equivalent LUT]
- Look-up tables are the primary elements for logic implementation.
- Each LUT can implement any function of 4 inputs.
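The counting question raised above can be answered from the truth table itself. The derivation below is a standard result and is added here as a worked example; it is not part of the original slides.

```latex
% A truth table over n inputs has 2^n rows, so an n-LUT needs 2^n latches.
% Each row can independently hold 0 or 1, so the number of distinct
% functions of n inputs is
N(n) = 2^{2^{n}}

% 2-LUT: 4 latches
N(2) = 2^{4} = 16

% 4-LUT: 16 latches
N(4) = 2^{16} = 65{,}536
```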

5-Input functions implemented using two LUTs
[Figure: inputs X5, X4, X3, X2, X1 and Y, two LUTs, and output OUT]

Recall: Multiplexer/Demultiplexer
- Multiplexer: route one of many inputs to a single output
- Demultiplexer: route a single input to one of many outputs
[Figure: a multiplexer and a demultiplexer with their control inputs, forming a 4x4 switch]
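One common way to build a 5-input function from two 4-input LUTs is Shannon expansion: one LUT stores the function for X5 = 0, the other for X5 = 1, and a 2:1 multiplexer controlled by X5 picks between them. The sketch below assumes that structure; the entity name func5, the generic INIT, and the port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 5-input function built from two 16-entry tables and a 2:1 mux.
-- INIT(15 downto 0)  is the truth table when x(4) = '0';
-- INIT(31 downto 16) is the truth table when x(4) = '1'.
entity func5 is
  generic (
    INIT : std_logic_vector(31 downto 0) := x"96696996"  -- 5-input XOR
  );
  port (
    x : in  std_logic_vector(4 downto 0);  -- x(4) plays the role of X5
    y : out std_logic
  );
end entity func5;

architecture dataflow of func5 is
  signal lut_lo, lut_hi : std_logic;
begin
  -- Both LUTs share the four low-order inputs.
  lut_lo <= INIT(to_integer(unsigned(x(3 downto 0))));
  lut_hi <= INIT(16 + to_integer(unsigned(x(3 downto 0))));
  -- The fifth input selects which LUT output reaches y.
  y <= lut_hi when x(4) = '1' else lut_lo;
end architecture dataflow;
```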

Multiplexers/Selectors: to implement logic
2:1 mux: Z = A' I0 + A I1
4:1 mux: Z = A'B' I0 + A'B I1 + AB' I2 + AB I3
8:1 mux: Z = A'B'C' I0 + A'B'C I1 + A'BC' I2 + A'BC I3 + AB'C' I4 + AB'C I5 + ABC' I6 + ABC I7
[Figure: 2:1 mux (inputs I0, I1; select A), 4:1 mux (inputs I0 to I3; selects A, B) and 8:1 mux (inputs I0 to I7; selects A, B, C), each with output Z]

Multiplexers as LUTs
- A 2^n:1 multiplexer implements any function of n variables, with the variables used as control inputs and the data inputs tied to 0 or 1.
- In essence, a look-up table.
Example:
F(A,B,C) = m0 + m2 + m6 + m7
         = A'B'C' + A'BC' + ABC' + ABC
         = A'B'(C') + A'B(C') + AB'(0) + AB(1)
[Figure: 8:1 MUX with data inputs 0 to 7, select inputs S2, S1, S0 driven by A, B, C, and output F]
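The example function can be written directly as an 8:1 multiplexer whose data inputs are tied to constants. This is a sketch under the assumption that A is the most significant select bit; the entity and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- F(A,B,C) = m0 + m2 + m6 + m7 as an 8:1 mux with constant data inputs.
entity f_mux8 is
  port (
    a, b, c : in  std_logic;
    f       : out std_logic
  );
end entity f_mux8;

architecture dataflow of f_mux8 is
  signal sel : std_logic_vector(2 downto 0);
begin
  sel <= a & b & c;  -- A is the most significant select bit

  with sel select
    f <= '1' when "000",   -- m0
         '0' when "001",
         '1' when "010",   -- m2
         '0' when "011",
         '0' when "100",
         '0' when "101",
         '1' when "110",   -- m6
         '1' when "111",   -- m7
         'X' when others;  -- undefined select values in simulation
end architecture dataflow;
```

The eight constants are exactly the truth-table column of F, which is why the mux behaves as a look-up table.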

Cascading Multiplexers
Large multiplexers are implemented by cascading smaller ones.
[Figure: 8:1 mux built from two 4:1 muxes (inputs I0 to I3 and I4 to I7, selects B and C) feeding a 2:1 mux (select A) with output Z]
- Control signals B and C simultaneously choose one of I0, I1, I2, I3 and one of I4, I5, I6, I7.
- Control signal A chooses which of the upper or lower mux's output to gate to Z.
[Figure: alternative implementation of the 8:1 mux, with four 2:1 muxes (select C) feeding a 4:1 mux (selects A, B) with output Z]

4-LUT Implementation
[Figure: 4 INPUTS driving the control of a 16 x 1 mux whose data inputs come from 16 latches; OUTPUT]
- Latches are programmed as part of the configuration bit-stream.
- An n-bit LUT is implemented as a 2^n x 1 memory: the inputs choose one of 2^n memory locations.
- Memory locations (latches) are normally loaded with values from the user's configuration bit-stream.
- Inputs to the mux control are the CLB inputs.
- The result is a general-purpose logic gate: an n-LUT can implement any function of n inputs!
Example:
[Figure: 4:1 mux with inputs I0 to I3, selects A and B, and output Z]
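The first cascade (two 4:1 muxes followed by a 2:1 mux) can be sketched as follows. The entity name mux8_cascade and the internal signal names are hypothetical; the data inputs I0 to I7 are modeled as one vector.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- 8:1 mux built from two 4:1 muxes (selected by B, C) and one 2:1 mux (selected by A).
entity mux8_cascade is
  port (
    i       : in  std_logic_vector(7 downto 0);  -- data inputs I0..I7
    a, b, c : in  std_logic;                     -- A is the most significant select
    z       : out std_logic
  );
end entity mux8_cascade;

architecture dataflow of mux8_cascade is
  signal bc     : std_logic_vector(1 downto 0);
  signal lo, hi : std_logic;
begin
  bc <= b & c;

  -- Lower 4:1 mux chooses among I0..I3.
  with bc select
    lo <= i(0) when "00",
          i(1) when "01",
          i(2) when "10",
          i(3) when "11",
          'X'  when others;

  -- Upper 4:1 mux chooses among I4..I7 with the same selects.
  with bc select
    hi <= i(4) when "00",
          i(5) when "01",
          i(6) when "10",
          i(7) when "11",
          'X'  when others;

  -- Final 2:1 mux: A picks the upper or lower result.
  z <= hi when a = '1' else lo;
end architecture dataflow;
```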

Example: Xilinx Virtex-E Floorplan
- Configurable Logic Blocks: 4-input function generators, buffers, flip-flop
- Input/Output Blocks: combinational, latch, and flip-flop output; sampled inputs
- Block RAM: 4096 bits each, every 12 CLB columns

Virtex-E Configurable Logic Block (CLB)
- CLB = 4 logic cells (LC) in two slices
- LC: 4-input function generator, carry logic, storage element
- 80 x 120 CLB array on 2000E
[Figure: logic cell; function generator usable as 16x1 synchronous RAM; storage element usable as FF or latch]

Details of Virtex-E Slice
[Figure: Virtex-E slice; labels: implements any two 4-input functions; 4-input function; 3-input function; registered]

Basic I/O Block (IOB) Structure
[Figure: IOB with three flip-flops, each with D, EC, SR and Q pins. Three-state path: three-state FF enable, clock, set/reset, three-state control. Output path: output FF enable. Input path: input FF enable, direct input, registered input.]

IOB Functionality
- IOB provides the interface between the package pins and the CLBs
- Each IOB can work as uni- or bi-directional I/O
- Outputs can be forced into high impedance
- Inputs and outputs can be registered (advised for high-performance I/O)
- Inputs can be delayed

[Figure: example, Virtex-E IOB detail]
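The IOB behavior listed above (bidirectional pad, high-impedance output, registered input and output) can be described in user VHDL roughly as follows; synthesis tools typically map such a description onto the IOB resources. This is a hedged sketch with hypothetical entity and port names, not the Virtex-E IOB primitive itself.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical bidirectional pad with registered output, registered
-- three-state control, and registered input.
entity iob_sketch is
  port (
    clk      : in    std_logic;
    out_data : in    std_logic;  -- value from the fabric
    out_en   : in    std_logic;  -- '1' drives the pad, '0' releases it
    in_data  : out   std_logic;  -- registered copy of the pad, to the fabric
    pad      : inout std_logic   -- package pin
  );
end entity iob_sketch;

architecture rtl of iob_sketch is
  signal out_reg : std_logic := '0';
  signal en_reg  : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      out_reg <= out_data;  -- output flip-flop
      en_reg  <= out_en;    -- three-state control flip-flop
      in_data <= pad;       -- input flip-flop
    end if;
  end process;

  -- Drive the pad only when enabled; otherwise force high impedance.
  pad <= out_reg when en_reg = '1' else 'Z';
end architecture rtl;
```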

Interconnects: Routing
Logic blocks are embedded in a sea of connection resources.
- CLB = logic block
- IOB = I/O buffer
- PSM = programmable switch matrix (switch block)
Interconnections are critical:
- Transmission gates on paths
- Flexibility: connect any LB to any other
- but much slower than connections within a logic block
- and much slower than long lines on an ASIC

[Figure: switch blocks and connection blocks]

Switch Blocks
[Figure: switch block; Control = configuration SRAM cell, which stores 0 or 1]

Connection Blocks
[Figure: connection block; connection to output of CLB, connection to input of CLB]

Example: SRAM-type FPGA Interconnection
[Figure: interconnection through a switch block (SB)]

Configuring an FPGA
- Millions of SRAM cells hold the LUT contents and interconnect routing information.
- Volatile memory: loses its configuration when board power is turned off.
- Keep the bit pattern describing the SRAM cells in non-volatile memory.
- Configuration takes ~ secs.
[Figure: programming bit file loaded through the JTAG port (configuration data in, configuration data out); legend: I/O pin/pad, SRAM cell; JTAG testing]

Typical Digital IC Design Flow vs. FPGA Design Flow
[Figure: the two design flows side by side]

FPGA Generic Design Flow
- Design entry: create your design files using a schematic editor or a hardware description language (VHDL, Verilog)
- Design implementation on FPGA: partition, place, and route to create the bit-stream file
- Design verification: use a simulator to check function; load onto the FPGA device (a cable connects the PC to the development board); check operation at full speed in the real environment

VHDL description (your source files):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    -- Top-level interface of the RC5 core
    entity RC5_core is
      port (
        clock, reset, encr_decr : in  std_logic;
        data_input  : in  std_logic_vector(31 downto 0);
        data_output : out std_logic_vector(31 downto 0);
        out_full    : in  std_logic;
        key_input   : in  std_logic_vector(31 downto 0);
        key_read    : out std_logic
      );
    end RC5_core;

Flow steps: functional simulation, synthesis, post-synthesis simulation, implementation, timing simulation, configuration, on-chip testing

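Logic Synthesis
VHDL description -> circuit netlist

    architecture MLU_DATAFLOW of MLU is
      signal A1 : STD_LOGIC;
      signal B1 : STD_LOGIC;
      signal Y1 : STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
      -- Select vector; a named signal keeps the selector type unambiguous
      signal L : STD_LOGIC_VECTOR(1 downto 0);
    begin
      -- Optional inversion of the inputs and of the output
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;

      -- The four candidate logic operations
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;

      -- 4:1 multiplexer controlled by L1 and L0
      L <= L1 & L0;
      with L select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;

[Figure: synthesized circuit netlist]

Implementation
After synthesis, the entire implementation process is performed by FPGA vendor tools.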

Translation
[Figure: synthesis produces the circuit netlist in Electronic Design Interchange Format (EDIF) and timing constraints in a Native Constraint File (NCF); the Constraint Editor produces a User Constraint File (UCF); translation combines them into a Native Generic Database (NGD) file]

Pin Assignment
[Figure: ports of top_level_design (CLOCK, CONTROL(0) to CONTROL(2), RESET, SEGMENTS(0) to SEGMENTS(6)) assigned to FPGA package pins]

Mapping
[Figure: circuit netlist mapped onto LUTs (LUT0 to LUT5) and flip-flops (FF1, FF2)]

Placement
[Figure: mapped logic placed into CLB slices of the FPGA]
[Figure: example placement (VPR tool)]

[Figure: example placement (ISE tool)]

Routing
[Figure: programmable connections in the FPGA]

[Figure: example routing (VPR tool)]
[Figure: example routing (VPR tool), zoom-in]

[Figure: Xilinx FPGA Editor]

Configuration
- Once a design is implemented, you must create a file that the FPGA can understand.
- This file is called a bitstream: a BIT file (.bit extension).
- The BIT file can be downloaded directly to the FPGA, or it can be converted into a PROM file, which stores the programming information.

Map report

    Design Summary
    --------------
    Number of errors:
    Number of warnings:
    Logic Utilization:
      Number of Slice Flip Flops:        3 out of 26,624 %
      Number of 4 input LUTs:           38 out of 26,624 %
    Logic Distribution:
      Number of occupied Slices:        33 out of 13,312 %
      Number of Slices containing only related logic:   33 out of 33 %
      Number of Slices containing unrelated logic:         out of 33 %
        *See NOTES below for an explanation of the effects of unrelated logic
    Total Number 4 input LUTs:          62 out of 26,624 %
      Number used as logic:             38
      Number used as a route-thru:      24
    Number of bonded IOBs:                 out of 22 4%
      IOB Flip Flops:                    7
    Number of GCLKs:                     1 out of 8 12%

Place & route report

    Asterisk (*) preceding a constraint indicates it was not met.
    This may be due to a setup or hold violation.

    Constraint                                  Requested  Actual    Logic levels  Absolute slack  Number of errors
    ---------------------------------------------------------------------------------------------------------------
    * TS_CLOCK = PERIOD TIMEGRP "CLOCK"         5.000ns    5.140ns   4             -0.140ns        5
      5 ns HIGH 50%
    ---------------------------------------------------------------------------------------------------------------
      TS_genHz_ClockHz = PERIOD TIMEGRP         5.000ns    4.137ns   2             0.863ns
      "genhz_clockhz" 5 ns HIGH 50%
    ---------------------------------------------------------------------------------------------------------------
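The slack column of the place & route report follows from simple arithmetic. The worked example below assumes the requested period for both constraints is 5.000 ns and that the achieved periods read 5.140 ns and 4.137 ns; several digits in the transcribed report are damaged, so treat these values as a reconstruction rather than the exact tool output.

```latex
% Slack = requested period - achieved period (negative slack = constraint not met)
\text{slack}_{\mathrm{CLOCK}} = 5.000\,\text{ns} - 5.140\,\text{ns} = -0.140\,\text{ns}

\text{slack}_{\mathrm{gen}} = 5.000\,\text{ns} - 4.137\,\text{ns} = +0.863\,\text{ns}

% The maximum clock frequency is the reciprocal of the minimum period
f_{\max} = \frac{1}{T_{\min}} = \frac{1}{5.140\,\text{ns}} \approx 194.55\,\text{MHz}
```

The negative slack on TS_CLOCK is why that constraint is flagged with an asterisk: the design would need either a 5.140 ns clock period or a shorter critical path.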

Post-layout timing report

    Clock to Setup on destination clock CLOCK
    ---------------+---------+---------+---------+---------+
                   | Src:Rise| Src:Fall| Src:Rise| Src:Fall|
    Source Clock   |Dest:Rise|Dest:Rise|Dest:Fall|Dest:Fall|
    ---------------+---------+---------+---------+---------+
    CLOCK          |    5.140|         |         |         |
    ---------------+---------+---------+---------+---------+

    Timing summary:
    ---------------
    Timing errors: 9  Score: 543
    Constraints cover 574 paths, nets, and 87 connections

    Design statistics:
      Minimum period: 5.140ns (Maximum frequency: 194.553MHz)

Summary
- FPGAs are more and more prevalent! They are here to stay!
- They offer a flexible platform for increasingly complex systems.
- Design automation tools (i.e., CAD tools) take care of the entire design process, from VHDL to the configuration bitstream file.