Xilinx ASMBL Architecture

Similar documents
Topics. Midterm Finish Chapter 7

Topics. Midterm Finish Chapter 7

ECE 645: Lecture 1. Basic Adders and Counters. Implementation of Adders in FPGAs

HDL Coding Style Xilinx, Inc. All Rights Reserved

FPGA architecture and design technology

TSEA44 - Design for FPGAs

Block RAM. Size. Ports. Virtex-4 and older: 18Kb Virtex-5 and newer: 36Kb, can function as two 18Kb blocks

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

Verilog Sequential Logic. Verilog for Synthesis Rev C (module 3 and 4)

Why Should I Learn This Language? VLSI HDL. Verilog-2

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

CSE140L: Components and Design Techniques for Digital Systems Lab

EEL 4783: HDL in Digital System Design

INTRODUCTION TO FPGA ARCHITECTURE

CSE140L: Components and Design

PINE TRAINING ACADEMY

Synthesis vs. Compilation Descriptions mapped to hardware Verilog design patterns for best synthesis. Spring 2007 Lec #8 -- HW Synthesis 1

ECE 545 Lecture 12. FPGA Resources. George Mason University

DSP Resources. Main features: 1 adder-subtractor, 1 multiplier, 1 add/sub/logic ALU, 1 comparator, several pipeline stages

Verilog for High Performance

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

Lecture 3. Behavioral Modeling Sequential Circuits. Registers Counters Finite State Machines

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

Logic Synthesis. EECS150 - Digital Design Lecture 6 - Synthesis

N-input EX-NOR gate. N-output inverter. N-input NOR gate

Review from last time. CS152 Computer Architecture and Engineering Lecture 6. Verilog (finish) Multiply, Divide, Shift

Note: Closed book no notes or other material allowed, no calculators or other electronic devices.

Verilog Fundamentals. Shubham Singh. Junior Undergrad. Electrical Engineering

FPGA for Software Engineers

Digital Design with SystemVerilog

Field Programmable Gate Array (FPGA)

EECS150 - Digital Design Lecture 10 Logic Synthesis

Verilog Tutorial. Introduction. T. A.: Hsueh-Yi Lin. 2008/3/12 VLSI Digital Signal Processing 2

EECS150 - Digital Design Lecture 10 Logic Synthesis

Virtex-II Architecture

Logic Circuits II ECE 2411 Thursday 4:45pm-7:20pm. Lecture 3

Chapter 9: Sequential Logic Modules

Lecture 12 VHDL Synthesis

FPGA: What? Why? Marco D. Santambrogio

Verilog for Synthesis Ing. Pullini Antonio

Field Programmable Gate Array

Chap 6 Introduction to HDL (d)

Introduction to Field Programmable Gate Arrays

ECE 448 Lecture 5. FPGA Devices

DESIGN AND IMPLEMENTATION OF ADDER ARCHITECTURES AND ANALYSIS OF PERFORMANCE METRICS

Writing Circuit Descriptions 8

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I

Altera FLEX 8000 Block Diagram

Synthesis Options FPGA and ASIC Technology Comparison - 1

VHDL for Synthesis. Course Description. Course Duration. Goals

EECS150 - Digital Design Lecture 20 - Finite State Machines Revisited

Verilog 1 - Fundamentals

Don t expect to be able to write and debug your code during the lab session.

Digital Integrated Circuits

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

An easy to read reference is:

Basic FPGA Architecture Xilinx, Inc. All Rights Reserved

CSCB58 - Lab 3. Prelab /3 Part I (in-lab) /2 Part II (in-lab) /2 TOTAL /8

Verilog Module 1 Introduction and Combinational Logic

FPGA Architecture Overview. Generic FPGA Architecture (1) FPGA Architecture

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

Digital Circuit Design and Language. Datapath Design. Chang, Ik Joon Kyunghee University

Verilog 1 - Fundamentals

EECS 150 Homework 7 Solutions Fall (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are:

CDA 4253 FPGA System Design Op7miza7on Techniques. Hao Zheng Comp S ci & Eng Univ of South Florida

Advanced High-level HDL Design Techniques for Programmable Logic


Introduction to Verilog HDL. Verilog 1

ECE 574: Modeling and Synthesis of Digital Systems using Verilog and VHDL. Fall 2017 Final Exam (6.00 to 8.30pm) Verilog SOLUTIONS

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

Stratix II vs. Virtex-4 Performance Comparison

ECE 551: Digital System *

MCMASTER UNIVERSITY EMBEDDED SYSTEMS

Chapter 9: Sequential Logic Modules

Product Obsolete/Under Obsolescence

The Verilog Language COMS W Prof. Stephen A. Edwards Fall 2002 Columbia University Department of Computer Science

Lecture 3: Modeling in VHDL. EE 3610 Digital Systems

In this lecture, we will go beyond the basic Verilog syntax and examine how flipflops and other clocked circuits are specified.

קורס VHDL for High Performance. VHDL

CSE 591: Advanced Hardware Design and Verification (2012 Spring) LAB #0

Modeling Synchronous Logic Circuits. Debdeep Mukhopadhyay IIT Madras

PROJECT REPORT - UART

Speaker: Kayting Adviser: Prof. An-Yeu Wu Date: 2009/11/23

Verilog introduction. Embedded and Ambient Systems Lab

Homework deadline extended to next friday

The Virtex FPGA and Introduction to design techniques

What is Xilinx Design Language?

EE260: Digital Design, Spring 2018

Digital Design (VIMIAA01) Introduction to the Verilog HDL

Designing Safe Verilog State Machines with Synplify

EECS 151/251A Spring 2019 Digital Design and Integrated Circuits. Instructor: John Wawrzynek. Lecture 18 EE141

ECE 448 Lecture 5. FPGA Devices

A Brief Introduction to Verilog Hardware Definition Language (HDL)

EECS150 - Digital Design Lecture 3 - Field Programmable Gate Arrays (FPGAs) Project platform: Xilinx ML

Readings: Storage unit. Can hold an n-bit value Composed of a group of n flip-flops. Each flip-flop stores 1 bit of information.

ENGN1640: Design of Computing Systems Topic 02: Lab Foundations

Advanced FPGA Design. Jan Pospíšil, CERN BE-BI-BP ISOTDAQ 2018, Vienna

ECE Digital Engineering Laboratory. Designing for Synthesis

CS Digital Systems Project Laboratory

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

Transcription:

FPGA Structure

Xilinx ASMBL Architecture

Design Flow Synthesis: HDL to FPGA primitives Translate: FPGA Primitives to FPGA Slice components Map: Packing of Slice components into Slices, placement of Slices on fabric Route: connecting Slices Bitstream generation: generating the required configuration bits to load into the FPGA

Configurable Logic Blocks

Slice LUTs 1 CLB = 2 Slices 1 Slice = 4 LUT Modules, 2 MUXF7, 1 MUXF8 A LUT Module contains: 1 LUT6, 1 MUXCY, 1 XORCY, 2 FFs

Slice LUTs 1 CLB = 2 Slices 1 Slice = 4 LUT Modules, 2 MUXF7, 1 MUXF8 A LUT Module contrains: 1 LUT6, 1 MUXCY, 1 XORCY, 2 FFs

Slice LUTs LUTs = Multiplexers Data Inputs from the FPGA configuration bits Selection inputs from user logic For N-input LUT, 2 N configuration bits A B A B 0 0 0 0 1 0 1 0 0 1 1 1 A B 1 0 0 0 MUX 4:1 A B

Slice LUTs Number of LUT inputs is architecture dependent: Xilinx Virtex 4 and earlier: 4-input LUTs (LUT4), 2 LUTs per Slice Xilinx Virtex 5 and later: 6-input LUTs (LUT6), 4 LUTs per Slice Xilinx Virtex 6 and 7-Series: 6-input LUTs, configurable as two shared-input 5-input LUTs Altera: 8-input LUTs (ALM), configurable as combinations of smaller LUTs

Area and Delay vs. LUT Size LUTs have constant delay, regardless of implemented function Example: LUT4 architecture (Virtex4) O = I1 & I2 & I3 & I4 => 1 LUT, 0.43 ns O = I1 & I2 & I3 & I4 & I5 & I6 => 2 LUTs, 0.93 ns As a general rule: LUT delays are small, routing delays can be large

Exercise: Multiplexers Implementation of 2:1, 4:1, 8:1 multiplexers on Virtex-4 architecture Advanced multiplexing techniques: XAPP522

Slice Multiplexer Resources A slice contains multiplexers that mux between LUT outputs (MUXFx) Virtex-4: MUXF5 Virtex-5 and later: MUXF7, MUXF8

Logic optimization using MUXFx Example: LUT4 architecture (Virtex4) O = I1 & I2 & I3 & I4 & I5 => 1 LUT, 1 MUXF5, 0.72ns Use of MUXFx reduces LUT usage and routing requirements Local (in-slice) routing -> more predictable performance

Logic optimization using MUXFx

Multiplexers vs First Index Decoders Example code: always @* if(a) Out = E F; else if(b) Out = E & F; else if(c) Out = E ^ F; else if(d) Out = ~E F; else Out = 1'bx;

Multiplexers vs First Index Decoders

Multiplexers vs. First Index Decoders The synthesis tool cannot always detect properties of the inputs of logic functions Inputs are from pins Inputs are from memories (Block or Distributed) Unconstrained version, for when we know {A,B,C,D} is one-hot: assign sel1 = ~(A B); assign sel0 = B D; always @* case({sel1,sel0}) 2'b00: Out = E F; 2'b01: Out = E & F; 2'b10: Out = E ^ F; 2'b11: Out = ~E F; endcase

Multiplexers vs. First Index Decoders

Exercise: Adder LUT6 Architecture (7-Series) Implement 4-bit adder optimally (carry-chain versus carry look-ahead)

Slice Arithmetic Resources Carry-chain adder implementation

Slice Arithmetic Resources Slice primitives: MUXCY, XORCY

Selecting Adder Type Analysis of various adder types (Design and Performance Analysis of Various Adders using Verilog; M. SaiKumar, P. Samundiswary) Ripple-carry (RCA) has hardware support in FPGA Carry-save (CSA) / Carry-look-ahead (CLA) are first choice for VLSI Assignment: compare RCA/CSA based Popcount Implement Verilog for 16b input Synthesize/Map Record delay and LUT/Slice counts

Adder Inference Synthesis tools will infer RCA Multi-input adder implementation may be controlled through parentheses: E = (A+B)+(C+D) results in tree E= (A+(B+(C+D))) results in cascade

Optimization Example Implementing an increment instruction in a processor ALU in Virtex-4 Behavioral description in ISE 14.2 yields bad results for the following code: always @* if(sel) else sum = a+b+carry_in; sum = a+4'd1+carry_in; 12 Slices, 1.5ns delay. Can we do better?

Optimization Results Using LUT3s instead of LUT2s in the adder structure, we can hardcode a multiplexer that selects between operand b and constant 1 before XOR-ing with operand a The rest of the structure is identical (MUXCY, XORCY) and built with primitive instantiation Results: 2 Slices, 1ns delay

Slice Registers Latch/FlipFlop primitives available within the Slice Available control signals: Set/Reset, Clock Enable, Clock Asynchronous/Synchronous Set/Reset Set/Reset typically refers to synchronous Clear/Preset refers to asynchronous Primitives: FD{R/S/RS/C/P}{E}

Slice Registers Synchronous Asynchronous Set always @(posedge clock) if(set) q<=1; else q<=d; always @(posedge clock or posedge set) if(set) q<=1; else q<=d; Reset always @(posedge clock) if(reset) q<=0; else q<=d; always @(posedge clock or negedge reset) if(set) q<=0; else q<=d;

Using Clock Enables Control signal priority determines implementation of the Flip-Flop primitive Example 1: Set before CE always @(posedge clk) if(set) q<=1; else if(ce) q<=d; Example 2: CE before Set always @(posedge clk) if(ce) if(set) q<=1; else q<=d;

Using Clock Enables Example 1 Results: FDSE Example 2 Results: FDE + LUT

Using Clock Enables Flip-Flop primitives are architecture dependent Example: always @(posedge clk) if(reset) q<=0; else if(set) q<=1; else if(ce) q <= d;

Using Clock Enables Spartan-3: FDRSE Virtex-6: LUT+FDR

Using Clock Enables Another example: always @(posedge clk) if(set) q<=1; else if(reset) q<=0; else if(ce) q <= d;

Using Clock Enables

Async vs Sync Resets Asynchronous resets Act immediately Are clock-independent Induce timing hazards Synchronous resets Ports can be used to minimize logic Predictable

Global Resets We use global resets to: Get the circuit into a known state at power-up Initialize simulations properly (!) Global resets have to be routed to S/R pins of the Flip-Flop FPGAs include a dedicated global set-reset net for post-configuration initialization (GSR)

Using the GSR STARTUP_VIRTEX6 #(.PROG_USR("FALSE") // Activate program event security feature ) STARTUP_VIRTEX6_inst (.CFGCLK(CFGCLK), // 1-bit Configuration main clock output.cfgmclk(cfgmclk), // 1-bit Configuration internal oscillator clock output.dinspi(dinspi), // 1-bit DIN SPI PROM access output.eos(eos), // 1-bit Active high output signal indicating the End Of Configuration..PREQ(PREQ), // 1-bit PROGRAM request to fabric output.tckspi(tckspi), // 1-bit TCK configuration pin access output.clk(clk), // 1-bit User start-up clock input.gsr(gsr), // 1-bit Global Set/Reset input (GSR cannot be used for the port name).gts(gts), // 1-bit Global 3-state input (GTS cannot be used for the port name).keyclearb(keyclearb), // 1-bit Clear AES Decrypter Key input from Battery-Backed RAM (BBRAM).PACK(PACK), // 1-bit PROGRAM acknowledge input.usrcclko(usrcclko), // 1-bit User CCLK input.usrcclkts(usrcclkts), // 1-bit User CCLK 3-state enable input.usrdoneo(usrdoneo), // 1-bit User DONE pin output control.usrdonets(usrdonets) // 1-bit User DONE 3-state enable output );

Avoid Latches! Latches are most commonly generated by unterminated control sequences Missing else statements in if clauses Missing default statements in case clauses To avoid over-constraining the logic, use x values in your else and default statements