ECE 545 Lecture 12. FPGA Embedded Resources 12/8/11. Resources. Recommended reading. Use of Embedded FPGA Resources in SHA-3 Candidates

Similar documents
Lecture 11 Memories in Xilinx FPGAs

ELE432. ADVANCED DIGITAL DESIGN HACETTEPE UNIVERSITY ROUTING and FPGA MEMORY. In part from ECE 448 FPGA and ASIC Design with VHDL

ECE 448 Lecture 13. FPGA Memories. George Mason University

ECE 699: Lecture 9. Programmable Logic Memories

ECE 545: Lecture 11. Programmable Logic Memories

ECE 545: Lecture 11. Programmable Logic Memories. Recommended reading. Memory Types. Memory Types. Memory Types specific to Xilinx FPGAs

ECE 545 Lecture 17 RAM. George Mason University

CDA 4253 FGPA System Design Xilinx FPGA Memories. Hao Zheng Comp Sci & Eng USF

Hardware Design with VHDL Design Example: BRAM ECE 443

CprE 583 Reconfigurable Computing

ECE 545 Lecture 12. FPGA Resources. George Mason University

ECEU530. Last Few Lectures. ECE U530 Digital Hardware Synthesis. What is on Quiz 2. Projects. Today:

Programmable Logic. Simple Programmable Logic Devices

The Virtex FPGA and Introduction to design techniques

EECS150 - Digital Design Lecture 16 - Memory

ECE 645: Lecture 1. Basic Adders and Counters. Implementation of Adders in FPGAs

EECS150 - Digital Design Lecture 16 Memory 1

VHDL: Modeling RAM and Register Files. Textbook Chapters: 6.6.1, 8.7, 8.8, 9.5.2, 11.2

CPE 626 Advanced VLSI Design Lecture 7: VHDL Synthesis

Spartan-6 Libraries Guide for HDL Designs. UG615 (v 14.1) April 24, 2012

Basic FPGA Architecture Xilinx, Inc. All Rights Reserved

ECE 448 Lecture 5. FPGA Devices

ECEU530. Project Presentations. ECE U530 Digital Hardware Synthesis. Rest of Semester. Memory Structures

EE178 Lecture Module 2. Eric Crabill SJSU / Xilinx Fall 2007

HDL Coding Style Xilinx, Inc. All Rights Reserved

Block RAM. Size. Ports. Virtex-4 and older: 18Kb Virtex-5 and newer: 36Kb, can function as two 18Kb blocks

ECE 448 Lecture 5. FPGA Devices

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 15 Memories

IP cores. V. Angelov

Field Programmable Gate Array

Lecture 7. Standard ICs FPGA (Field Programmable Gate Array) VHDL (Very-high-speed integrated circuits. Hardware Description Language)

Digital System Construction

ECE 545 Lecture 8. Data Flow Description of Combinational-Circuit Building Blocks. George Mason University

Chapter 8 FPGA Basics

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

Topics. Midterm Finish Chapter 7

Xilinx 7 Series FPGA Libraries Guide for HDL Designs. UG768 (v 13.4) January 18, 2012

CPE 626 Advanced VLSI Design Lecture 6: VHDL Synthesis. Register File: An Example. Register File: An Example (cont d) Aleksandar Milenkovic

Midterm Exam. Solutions

Digital Design Laboratory Lecture 2

Spartan-6 FPGA Block RAM Resources

Virtex-5 Libraries Guide for HDL Designs. UG621 (v 14.1) April 24, 2012

Virtex-II Architecture

Architecture by Xilinx, Inc. All rights reserved.

Topics. Midterm Finish Chapter 7

COVER SHEET: Total: Regrade Info: 5 (5 points) 2 (8 points) 6 (10 points) 7b (13 points) 7c (13 points) 7d (13 points)

EECS150 - Digital Design Lecture 13 - Project Description, Part 2: Memory Blocks. Project Overview

Laboratory Memory Components

VHDL And Synthesis Review

Implementing Multipliers in Xilinx Virtex II FPGAs

Vivado Design Suite 7 Series FPGA Libraries Guide. UG953 (v ) July 25, 2012

Xilinx 7 Series FPGA and Zynq-7000 All Programmable SoC Libraries Guide for HDL Designs

COVER SHEET: Total: Regrade Info: 2 (6 points) 3 (8 points) 4 (10 points) 8 (12 points) 6 (6 points) 7 (6 points) 9 (30 points) 10 (4 points)

Parameterizable LocalLink FIFO Author: Wen Ying Wei, Dai Huang

ECE 448 Lecture 3. Combinational-Circuit Building Blocks. Data Flow Modeling of Combinational Logic

Virtex-5 Libraries Guide for HDL Designs. UG621 (v 13.1) March 1, 2011

Vivado Design Suite 7 Series FPGA and Zynq-7000 All Programmable SoC Libraries Guide. UG953 (v2014.2) June 4, 2014

Spartan-6 Libraries Guide for HDL Designs. UG615 (v 13.1) March 1, 2011

INTRODUCTION TO VHDL ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás. Additional reading: - ARQUITECTURAS AVANÇADAS DE COMPUTADORES (AAC)

ECE 545 Lecture 6. Behavioral Modeling of Sequential-Circuit Building Blocks. George Mason University

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

Summary. Introduction. Application Note: Virtex, Virtex-E, Spartan-IIE, Spartan-3, Virtex-II, Virtex-II Pro. XAPP152 (v2.1) September 17, 2003

Field Programmable Gate Array (FPGA)

Design Recommendations User Guide

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

Design Guidelines for Using DSP Blocks

ECE 448 Lecture 3. Combinational-Circuit Building Blocks. Data Flow Modeling of Combinational Logic

Course Overview Revisited

Laboratory Exercise 8

Guidelines to Migrating Spartan Designs to Cyclone Designs

INTRODUCTION TO FPGA ARCHITECTURE

Asynchronous FIFO Design

Digital Systems Design

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

TSEA44 - Design for FPGAs

13. Recommended HDL Coding Styles

COVER SHEET: Total: Regrade Info: 5 (14 points) 7 (15 points) Midterm 1 Spring 2012 VERSION 1 UFID:

FPGAs: FAST TRACK TO DSP

Summary of FPGA & VHDL

Design Guidelines for Using DSP Blocks

FSM Components. FSM Description. HDL Coding Methods. Chapter 7: HDL Coding Techniques

Lecture 12 VHDL Synthesis

Timing in synchronous systems

Assignment. Last time. Last time. ECE 4514 Digital Design II. Back to the big picture. Back to the big picture

Review: Timing. EECS Components and Design Techniques for Digital Systems. Lec 13 Storage: Regs, SRAM, ROM. Outline.

Design Problem 5 Solution

LogiCORE IP Block Memory Generator v6.1

IE1204 Digital Design L7: Combinational circuits, Introduction to VHDL

MAX 10. Memory Modules

Logic Synthesis. EECS150 - Digital Design Lecture 6 - Synthesis

ECE 448 Lecture 4. Sequential-Circuit Building Blocks. Mixing Description Styles

EECS150 - Digital Design Lecture 10 Logic Synthesis

Reconfigurable Hardware Design (coursework)

LogiCORE IP Block Memory Generator v7.1

Synthesis vs. Compilation Descriptions mapped to hardware Verilog design patterns for best synthesis. Spring 2007 Lec #8 -- HW Synthesis 1

COE 405, Term 062. Design & Modeling of Digital Systems. HW# 1 Solution. Due date: Wednesday, March. 14

CARDBUS INTERFACE USER MANUAL

VHDL for Complex Designs

COVER SHEET: Total: Regrade Info: Problem#: Points. 7 (14 points) 6 (7 points) 9 (6 points) 10 (21 points) 11 (4 points)

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Transcription:

ECE 545 Lecture 12 FPGA Embedded Resources Resources FPGA Embedded Resources web page available from the course web page George Mason University 2 Recommended reading XAPP463 Using Block RAM in Spartan-3 Generation FPGAs Google search: XAPP463 XAPP464 Using Look-Up Tables as Distributed RAM in Spartan-3 Generation FPGAs Google search: XAPP464 XST User Guide, Section: Coding Techniques Google search: XST User Guide (PDF) http://www.xilinx.com/itp/xilinx4/data/docs/xst/hdlcode.html (HTML) ISE In-Depth Tutorial, Section: Creating a CORE Generator Module Google search: ISE In-Depth Tutorial Use of Embedded FPGA Resources in SHA-3 Candidates ECE 448 FPGA and ASIC Design with VHDL 3 ECE 448 FPGA and ASIC Design with VHDL Xilinx FPGA Devices Technology Low- cost High- performance 120/150 nm Virtex 2, 2 Pro 90 nm Spartan 3 Virtex 4 65 nm Virtex 5 45 nm Spartan 6 40 nm Virtex 6 Altera FPGA Devices Technology Low- cost Mid- range High- performance 130 nm Cyclone StraKx 90 nm Cyclone II StraKx II 65 nm Cyclone III Arria I StraAx III 40 nm Cyclone IV Arria II StraAx IV 1

FPGA Embedded Resources Embedded Multipliers ECE 448 FPGA and ASIC Design with VHDL ECE 448 FPGA and ASIC Design with VHDL Multipliers in Spartan 3 The Design Warrior s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com) ECE 448 FPGA and ASIC Design with VHDL 10 Number of Multipliers per Spartan 3 Device Combinational and Registered Multiplier 11 ECE 448 FPGA and ASIC Design with VHDL 12 2

Dedicated Multiplier Block Interface of a Dedicated Multiplier ECE 448 FPGA and ASIC Design with VHDL 13 ECE 448 FPGA and ASIC Design with VHDL 14 Unsigned vs. Signed Multiplication Unsigned Signed 1111 x 1111 15 x 15 1111 x 1111-1 x -1 Cyclone II 11100001 225 00000001 1 ECE 448 FPGA and ASIC Design with VHDL 15 Embedded MulKplier Block Overview Number of Embedded MulKpliers Each Cyclone II has one to three columns of embedded multipliers. Each embedded multiplier can be configured to support One 18 x 18 multiplier Two 9 x 9 multipliers 3

MulKplier Block Architecture Two MulKplier Types MulKplier Stage Signals signa and signb are used to idenkfy the signed and unsigned inputs. 3 Ways to Use Dedicated Hardware Three (3) ways to use dedicated (embedded) hardware Inference Instantiation CoreGen in Xilinx MegaWizard Plug-In Manager in Altera 22 library ieee; use ieee.std_logic_1164.all; use ieee.numeric_std.all; Inferred Multiplier entity mult18x18 is generic ( word_size : natural := 18; signed_mult : boolean := true); port ( clk : in std_logic; a : in std_logic_vector(word_size-1 downto 0); b : in std_logic_vector(word_size-1 downto 0); c : out std_logic_vector(2*word_size-1 downto 0)); end entity mult18x18; architecture infer of mult18x18 is process(clk) if rising_edge(clk) then if signed_mult then c <= std_logic_vector(signed(a) * signed(b)); else c <= std_logic_vector(unsigned(a) * unsigned(b)); end if; end if; end process; end architecture infer; Forcing a particular implementation in VHDL Synthesis tool: Xilinx XST Attribute MULT_STYLE: string; Attribute MULT_STYLE of c: signal is block; Allowed values of the attribute: block dedicated multiplier lut - LUT-based multiplier pipe_block pipelined dedicated multiplier pipe_lut pipelined LUT-based multiplier auto automatic choice by the synthesis tool 4

Instantiation for Spartan 3 FPGAs CORE Generator Xilinx XtremeDSP DSP Units Starting with Virtex 4 family, Xilinx introduced DSP48 block for high-speed DSP on FPGAs Essentially a multiply-accumulate core with many other features Now also in Spartan-3A, Spartan 6, Virtex 5, and Virtex 6 ECE 448 FPGA and ASIC Design with VHDL 28 DSP48 Slice: Virtex 4 Simplified Form of DSP48 Adder Out = (Z ± (X + Y + CIN)) 29 30 5

Choosing Inputs to DSP Adder DSP48E Slice : Virtex5 P = Adder Out = (Z ± (X + Y + CIN)) 31 32 New in Virtex 5 Compared to Virtex 4 Stratix III DSP Unit 33 Memory Types Memory RAM ROM Embedded Memories Single port Memory Dual port Memory ECE 448 FPGA and ASIC Design with VHDL With asynchronous read With synchronous read 36 6

Memory Types in Xilinx Memory Memory Types in Altera Memory Distributed (MLUT-based) Memory Block RAM-based (BRAM-based) Distributed (ALUT-based, Stratix III onwards) Small size (512) Memory block-based Medium size Large size (4K, 9K, 20K) (144K, 512K) Inferred Manually Instantiated Using Core Generator Inferred Memory Instantiated 37 Manually Using MegaWizard Plug-In Manager 38 Inference vs. Instantiation 39 40 CLB Slice COUT YB FPGA Distributed Memory G4 G3 G2 G1 F5IN BY SR F4 F3 F2 F1 Look-Up O Table Look-Up Table O Y Carry & Control Logic XB X Carry & Control Logic S D CK EC R S D CK EC R Q Q CIN CLK CE SLICE 41 42 7

Xilinx Multipurpose LUT Distributed RAM The Design Warrior s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com) CLB LUT configurable as Distributed RAM An LUT equals 16x1 RAM Cascade LUTs to increase RAM size Synchronous write Asynchronous read Can create a synchronous read by using extra flip-flops Naturally, distributed RAM read is asynchronous Two LUTs can make 32 x 1 single-port RAM 16 x 2 single-port RAM 16 x 1 dual-port RAM LUT LUT LUT = RAM32X1S D WE WCLK A0 O A1 A2 A3 A4 or = RAM16X2S D0 D1 WE WCLK O0 A0 O1 A1 A2 A3 or RAM16X1S D WE WCLK A0 O A1 A2 A3 RAM16X1D D WE WCLK A0 SPO A1 A2 A3 DPRA0 DPO DPRA1 DPRA2 DPRA3 43 44 Block RAM Port A Spartan-3 Dual-Port Block RAM Port B FPGA Block RAM Block RAM Most efficient memory implementation Dedicated blocks of memory Ideal for most memory requirements 4 to 104 memory blocks 18 kbits = 18,432 bits per block (16 k without parity bits) Use multiple blocks for larger memories Builds both single and true dual-port RAMs Synchronous write and read (different from distributed RAM) 45 46 RAM Blocks and Multipliers in Xilinx FPGAs Spartan-3E Block RAM Amounts The Design Warrior s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com) 47 48 8

Block RAM can have various configurations (port aspect ratios) Block RAM Port Aspect Ratios 0 1 0 2 0 4 8k x 2 4k x 4 16k x 1 8,191 0 2047 4,095 8+1 2k x (8+1) 16,383 0 1023 16+2 1024 x (16+2) 49 50 Single-Port Block RAM Dual-Port Block RAM DOA[w A-p A-1:0] DIA[w A-p A-1:0] DI[w-p-1:0] DO[w-p-1:0] DOA[w B-p B-1:0] DIB[w B-p B-1:0] 51 52 Dual-Port Bus Flexibility Two Independent Single-Port RAMs Port A In 1K-Bit Depth Port B In 2k-Bit Depth WEA ENA RSTA CLKA ADDRA[9:0] DIA[17:0] WEB ENB RSTB CLKB ADDRB[10:0] DIB[8:0] RAMB4_S18_S9 DOA[17:0] DOB[8:0] Port A Out 18-Bit Width Port B Out 9-Bit Width Each port can be configured with a different data bus width Provides easy data width conversion without any additional logic Port A In 8K-Bit Depth 0, ADDR[12:0] Port B In 8K-Bit Depth 1, ADDR[12:0] WEA ENA RSTA CLKA ADDRA[12:0] DIA[0] WEB ENB RSTB CLKB ADDRB[12:0] DIB[0] Added advantage of True Dual-Port No wasted RAM Bits Can split a Dual-Port 16K RAM into two Single-Port 8K RAM Simultaneous independent access to each RAM RAMB4_S1_S1 DOA[0] DOB[0] Port A Out 1-Bit Width Port B Out 1-Bit Width To access the lower RAM Tie the MSB address bit to Logic Low To access the upper RAM Tie the MSB address bit to Logic High 53 54 9

Cyclone II Memory Blocks The embedded memory structure consists of columns of M4K memory blocks that can be configured as RAM, first-in first-out (FIFO) buffers, and ROM Memory Modes The M4K memory blocks support the following modes:! Single-port RAM (RAM:1-Port)! Simple dual-port RAM (RAM: 2-Port)! True dual-port RAM (RAM:2-Port)! Tri-port RAM (RAM:3-Port)! Single-port ROM (ROM:1-Port)! Dual-port ROM (ROM:2-Port)! Single- Port ROM StraAx II TriMatrix Memory The address lines of the ROM are registered The outputs can be registered or unregistered A.mif file is used to initialize the ROM contents StraAx II TriMatrix Memory StraAx III & StraAx IV TriMatrix Memory 10

StraAx II & III ShiI- Register Memory ConfiguraAon Test Circuit Example ECE 448 FPGA and ASIC Design with VHDL test_circuit: ATHENa Example including embedded FPGA resources Generic MulAplier (1) enkty mult is generic ( vendor : integer := XILINX - - vendor : XILINX=0, ALTERA=1 mulkplier_type : integer:= MUL_DEDICATED; - - mulkplier_type : MUL_LOGIC_BASED=0, MUL_DSP_BASED=1 WIDTH : integer := 8 - - width : width (fixed width for input and output) ); port ( a : in std_logic_vector (WIDTH- 1 downto 0); b : in std_logic_vector (WIDTH- 1 downto 0); s : out std_logic_vector (WIDTH- 1 downto 0) ); end mult; Generic MulAplier (2) architecture mult of mult is xil_dsp_mult_gen : if (mulkplier_type = MUL_DEDICATED and vendor = XILINX) generate mult_xil: enkty work.mult(xilinx_dsp) generic map ( WIDTH => WIDTH ) port map (a => a, b => b, s => s ); end gen xil_logic_mult_gen : if (mulkplier_type=mul_logic_based and vendor = XILINX) generate mult_xil: enkty work.mult(xilinx_logic) generic map ( WIDTH => WIDTH ) port map (a => a, b => b, s => s ); end generate; alt_dsp_mult_gen : if (mulkplier_type=mul_dedicated and vendor = ALTERA) generate mult_alt: enkty work.mult(altera_dsp) generic map ( WIDTH => WIDTH ) port map (a => a, b => b, s => s ); end generate; alt_logic_mult_gen : if (mulkplier_type=mul_logic_based and vendor = ALTERA) generate mult_alt: enkty work.mult(altera_logic) generic map ( WIDTH => WIDTH ) port map (a => a, b => b, s => s ); end generate; end mult; 11

Generic MulAplier (3) architecture xilinx_logic of mult is signal temp1 : std_logic_vector(2*width - 1 downto 0); afribute mult_style : string ; afribute mult_style of temp1: signal is "lut ; temp1 <= STD_LOGIC_VECTOR(unsigned(a) * unsigned(b)); s <= temp1(width- 1 downto 0); end xilinx_logic; architecture xilinx_dsp of mult is signal temp2 : std_logic_vector(2*width - 1 downto 0); afribute mult_style : string ; afribute mult_style of temp2: signal is "block ; temp2 <= STD_LOGIC_VECTOR(unsigned(a) * unsigned(b)); s <= temp2(width- 1 downto 0); end xilinx_dsp; Generic MulAplier (4) architecture altera_logic of mult is signal temp : std_logic_vector(2*width - 1 downto 0); afribute multstyle : string ; afribute multstyle of altera_logic : architecture is "logic ; temp <= STD_LOGIC_VECTOR(unsigned(a) * unsigned(b)); s <= temp(width- 1 downto 0); end altera_logic; architecture altera_dsp of mult is signal temp : std_logic_vector(2*width - 1 downto 0); afribute multstyle : string ; afribute multstyle of altera_dsp : architecture is "dsp"; temp <= STD_LOGIC_VECTOR(unsigned(a) * unsigned(b)); s <= temp(width- 1 downto 0); end altera_dsp; CLB Slice COUT YB FPGA Distributed Memory G4 G3 G2 G1 F5IN BY SR F4 F3 F2 F1 Look-Up O Table Look-Up Table O Y Carry & Control Logic XB X Carry & Control Logic S D CK EC R S D CK EC R Q Q CIN CLK CE SLICE 69 70 Xilinx Multipurpose LUT Distributed RAM CLB LUT configurable as Distributed RAM An LUT equals 16x1 RAM Cascade LUTs to increase RAM size Synchronous write Asynchronous read Can create a synchronous read by using extra flip-flops Naturally, distributed RAM read is asynchronous Two LUTs can make 32 x 1 single-port RAM 16 x 2 single-port RAM 16 x 1 dual-port RAM LUT LUT LUT = RAM32X1S D WE WCLK A0 O A1 A2 A3 A4 or = RAM16X2S D0 D1 WE WCLK O0 A0 O1 A1 A2 A3 or RAM16X1S D WE WCLK A0 O A1 A2 A3 RAM16X1D D WE WCLK A0 SPO A1 A2 A3 DPRA0 DPO The Design Warrior s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com) DPRA1 DPRA2 DPRA3 71 72 12

Inference vs. Instantiation 73 74 Distributed versus Block RAM Inference Examples: 1. Distributed single-port RAM with asynchronous read Generic Inferred RAM 2. Distributed dual-port RAM with asynchronous read 3. Distributed single-port RAM with "false" synchronous read 4. Block RAM with synchronous read (no version with asynchronous read!) More excellent RAM examples from XST Coding Guidelines: http://toolbox.xilinx.com/docsan/xilinx4/data/docs/xst/hdlcode.html (Click on RAMs) 75 76 Distributed RAM with asynchronous read Distributed single-port RAM with asynchronous read LIBRARY ieee; USE ieee.std_logic_1164.all; USE ieee.std_logic_arith.all; entity raminfr is generic ( data_bits : integer := 32; -- number of bits per RAM word addr_bits : integer := 3); -- 2^addr_bits = number of words in RAM port (clk : in std_logic; we : in std_logic; a : in std_logic_vector(addr_bits-1 downto 0); di : in std_logic_vector(data_bits-1 downto 0); do : out std_logic_vector(data_bits-1 downto 0)); end raminfr; 77 78 13

Distributed single-port RAM with asynchronous read architecture behavioral of raminfr is type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (data_bits-1 downto 0); signal RAM : ram_type; process (clk) if (clk'event and clk = '1') then if (we = '1') then RAM(conv_integer(unsigned(a))) <= di; end if; end if; end process; do <= RAM(conv_integer(unsigned(a))); end behavioral; Report from Synthesis Resource Usage Report for raminfr Mapping to part: xc3s50pq208-5 Cell usage: GND 1 use RAM16X4S 8 uses I/O ports: 69 I/O primitives: 68 IBUF 36 uses OBUF 32 uses BUFGP 1 use I/O Register bits: 0 Register bits not including I/Os: 0 (0%) RAM/ROM usage summary Single Port Rams (RAM16X4S): 8 Global Clock Buffers: 1 of 8 (12%) 79 Mapping Summary: Total LUTs: 32 (2%) 80 Report from Implementation Distributed dual-port RAM with asynchronous read Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Logic Distribution: Number of occupied Slices: 16 out of 768 2% Number of Slices containing only related logic: 16 out of 16 100% Number of Slices containing unrelated logic: 0 out of 16 0% *See NOTES below for an explanation of the effects of unrelated logic Total Number of 4 input LUTs: 32 out of 1,536 2% Number used as 16x1 RAMs: 32 Number of bonded IOBs: 69 out of 124 55% Number of GCLKs: 1 out of 8 12% 81 82 Distributed dual-port RAM with asynchronous read Distributed dual-port RAM with asynchronous read library ieee; use ieee.std_logic_1164.all; use ieee.std_logic_unsigned.all; use ieee.std_logic_arith.all; entity raminfr is generic ( data_bits : integer := 32; -- number of bits per RAM word addr_bits : integer := 3); -- 2^addr_bits = number of words in RAM port (clk : in std_logic; we : in std_logic; a : in std_logic_vector(addr_bits-1 downto 0); dpra : in std_logic_vector(addr_bits-1 downto 0); di : in std_logic_vector(data_bits-1 downto 0); spo : out std_logic_vector(data_bits-1 downto 0); dpo : out std_logic_vector(data_bits-1 downto 0)); end raminfr; architecture syn of raminfr is type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (data_bits-1 downto 0); signal RAM : ram_type; process (clk) if (clk'event and clk = '1') then if (we = '1') then RAM(conv_integer(unsigned(a))) <= di; end if; end if; end process; spo <= RAM(conv_integer(unsigned(a))); dpo <= RAM(conv_integer(unsigned(dpra))); end syn; 83 84 14

Report from Synthesis Report from Implementation Resource Usage Report for raminfr Mapping to part: xc3s50pq208-5 Cell usage: GND 1 use I/O ports: 104 I/O primitives: 103 IBUF 39 uses OBUF 64 uses BUFGP 1 use I/O Register bits: 0 Register bits not including I/Os: 0 (0%) RAM/ROM usage summary Dual Port Rams (RAM16X1D): 32 Global Clock Buffers: 1 of 8 (12%) Mapping Summary: Total LUTs: 64 (4%) Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Logic Distribution: Number of occupied Slices: 32 out of 768 4% Number of Slices containing only related logic: 32 out of 32 100% Number of Slices containing unrelated logic: 0 out of 32 0% *See NOTES below for an explanation of the effects of unrelated logic Total Number of 4 input LUTs: 64 out of 1,536 4% Number used for Dual Port RAMs: 64 (Two LUTs used per Dual Port RAM) Number of bonded IOBs: 104 out of 124 83% Number of GCLKs: 1 out of 8 12% 85 86 Distributed RAM with "false" synchronous read Distributed RAM with "false" synchronous read LIBRARY ieee; USE ieee.std_logic_1164.all; USE ieee.std_logic_arith.all; USE ieee.std_logic_unsigned.all; entity raminfr is generic ( data_bits : integer := 32; -- number of bits per RAM word addr_bits : integer := 3); -- 2^addr_bits = number of words in RAM port (clk : in std_logic; we : in std_logic; a : in std_logic_vector(addr_bits-1 downto 0); di : in std_logic_vector(data_bits-1 downto 0); do : out std_logic_vector(data_bits-1 downto 0)); end raminfr; 87 88 Distributed RAM with "false" synchronous read architecture behavioral of raminfr is type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (data_bits-1 downto 0); signal RAM : ram_type; process (clk) if (clk'event and clk = '1') then if (we = '1') then RAM(conv_integer(unsigned(a))) <= di; end if; do <= RAM(conv_integer(unsigned(a))); end if; end process; end behavioral; Report from Synthesis Resource Usage Report for raminfr Mapping to part: xc3s50pq208-5 Cell usage: FD 32 uses GND 1 use RAM16X4S 8 uses I/O ports: 69 I/O primitives: 68 IBUF 36 uses OBUF 32 uses BUFGP 1 use I/O Register bits: 0 Register bits not including I/Os: 32 (2%) RAM/ROM usage summary Single Port Rams (RAM16X4S): 8 Global Clock Buffers: 1 of 8 (12%) 89 Mapping Summary: Total LUTs: 32 (2%) 90 15

Report from Implementation Block RAM with synchronous read Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Number of Slice Flip Flops: 32 out of 1,536 2% Logic Distribution: Number of occupied Slices: 16 out of 768 2% Number of Slices containing only related logic: 16 out of 16 100% Number of Slices containing unrelated logic: 0 out of 16 0% *See NOTES below for an explanation of the effects of unrelated logic Total Number of 4 input LUTs: 32 out of 1,536 2% Number used as 16x1 RAMs: 32 Number of bonded IOBs: 69 out of 124 55% Number of GCLKs: 1 out of 8 12% Total equivalent gate count for design: 4,355 91 92 Block RAM with synchronous read (write first mode) LIBRARY ieee; USE ieee.std_logic_1164.all; USE ieee.std_logic_arith.all; entity raminfr is generic ( data_bits : integer := 32; -- number of bits per RAM word addr_bits : integer := 3); -- 2^addr_bits = number of words in RAM port (clk : in std_logic; we : in std_logic; a : in std_logic_vector(addr_bits-1 downto 0); di : in std_logic_vector(data_bits-1 downto 0); do : out std_logic_vector(data_bits-1 downto 0)); end raminfr; Block RAM with synchronous read (write first mode) cont'd architecture behavioral of raminfr is type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (data_bits-1 downto 0); signal RAM : ram_type; signal read_a : std_logic_vector(addr_bits-1 downto 0); process (clk) if (clk'event and clk = '1') then if (we = '1') then RAM(conv_integer(unsigned(a))) <= di; end if; read_a <= a; end if; end process; do <= RAM(conv_integer(unsigned(read_a))); end behavioral; 93 94 Block RAM Waveforms WRITE_FIRST mode Report from Synthesis Resource Usage Report for raminfr Mapping to part: xc3s50pq208-5 Cell usage: GND 1 use RAMB16_S36 1 use VCC 1 use I/O ports: 69 I/O primitives: 68 IBUF 36 uses OBUF 32 uses BUFGP 1 use I/O Register bits: 0 Register bits not including I/Os: 0 (0%) RAM/ROM usage summary Block Rams : 1 of 4 (25%) Global Clock Buffers: 1 of 8 (12%) 95 Mapping Summary: Total LUTs: 0 (0%) 96 16

Report from Implementation Design Summary: Number of errors: 0 Number of warnings: 0 Logic Utilization: Logic Distribution: Number of Slices containing only related logic: 0 out of 0 0% Number of Slices containing unrelated logic: 0 out of 0 0% *See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs: 69 out of 124 55% Number of Block RAMs: 1 out of 4 25% Number of GCLKs: 1 out of 8 12% Generic Inferred ROM 97 98 Distributed ROM with asynchronous read Distributed ROM with asynchronous read LIBRARY ieee; USE ieee.std_logic_1164.all; USE ieee.std_logic_arith.all; USE ieee.std_logic_unsigned.all; entity rominfr is generic ( data_bits : integer := 10; -- number of bits per ROM word addr_bits : integer := 3); -- 2^addr_bits = number of words in ROM port (a : in std_logic_vector(addr_bits-1 downto 0); do : out std_logic_vector(data_bits-1 downto 0)); end rominfr; architecture behavioral of rominfr is type rom_type is array (2**addr_bits-1 downto 0) of std_logic_vector (data_bits-1 downto 0); constant ROM : rom_type := ("0000110001", "0100110100", "0100110110", "0110110000", "0000111100", "0111110101", "0100110100", "1111100111"); do <= ROM(conv_integer(unsigned(a))); end behavioral; 99 100 CORE Generator Using CORE Generator 101 17

CORE Generator FPGA specific memories (Instantiation) 104 RAM 16x1 (1) library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity RAM_16X1_DISTRIBUTED is port( CLK : in STD_LOGIC; WE : in STD_LOGIC; ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_IN : in STD_LOGIC; DATA_OUT : out STD_LOGIC ); end RAM_16X1_DISTRIBUTED; RAM 16x1 (2) architecture RAM_16X1_DISTRIBUTED_STRUCTURAL of RAM_16X1_DISTRIBUTED is -- part used by the synthesis tool, Synplify Pro, only; -- ignored during simulation attribute INIT : string; attribute INIT of RAM_16x1s_1: label is "0000 ; component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; -- note std_ulogic not std_logic A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component; 105 106 RAM 16x1 (3) RAM 16x8 (1) RAM_16x1s_1: ram16x1s generic map (INIT => X"0000") port map (O => DATA_OUT, A0 => ADDR(0), A1 => ADDR(1), A2 => ADDR(2), A3 => ADDR(3), D => DATA_IN, WCLK => CLK, WE => WE ); end RAM_16X1_DISTRIBUTED_STRUCTURAL; library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity RAM_16X8_DISTRIBUTED is port( CLK : in STD_LOGIC; WE : in STD_LOGIC; ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_IN : in STD_LOGIC_VECTOR(7 downto 0); DATA_OUT : out STD_LOGIC_VECTOR(7 downto 0) ); end RAM_16X8_DISTRIBUTED; 107 108 18

RAM 16x8 (2) architecture RAM_16X8_DISTRIBUTED_STRUCTURAL of RAM_16X8_DISTRIBUTED is -- part used by the synthesis tool, Synplify Pro, only; -- ignored during simulation attribute INIT : string; attribute INIT of RAM_16x1s_1: label is "0000"; component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component; RAM 16x8 (3) GENERATE_MEMORY: for I in 0 to 7 generate RAM_16x1_S_1: ram16x1s generic map (INIT => X"0000") port map (O => DATA_OUT(I), A0 => ADDR(0), A1 => ADDR(1), A2 => ADDR(2), A3 => ADDR(3), D => DATA_IN(I), WCLK => CLK, WE => WE ); end generate; end RAM_16X8_DISTRIBUTED_STRUCTURAL; 109 110 ROM 16x1 (1) library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity ROM_16X1_DISTRIBUTED is port( ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_OUT : out STD_LOGIC ); end ROM_16X1_DISTRIBUTED; ROM 16x1 (2) architecture ROM_16X1_DISTRIBUTED_STRUCTURAL of ROM_16X1_DISTRIBUTED is -- part used by the synthesis tool, Synplify Pro, only; -- ignored during simulation attribute INIT : string; attribute INIT of rom16x1s_1: label is "F0C1"; component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component; signal Low : std_ulogic := '0'; 111 112 ROM 16x1 (3) Block RAM library components rom16x1s_1: ram16x1s generic map (INIT => X"F0C1") port map (O=>DATA_OUT, A0=>ADDR(0), A1=>ADDR(1), A2=>ADDR(2), A3=>ADDR(3), D=>Low, WCLK=>Low, WE=>Low ); end ROM_16X1_DISTRIBUTED_STRUCTURAL; Component Data Cells Parity Cells Address Bus Data Bus Parity Bus Depth Width Depth Width RAMB16_S1 16384 1 - - (13:0) (0:0) - RAMB16_S2 8192 2 - - (12:0) (1:0) - RAMB16_S4 4096 4 - - (11:0) (3:0) - RAMB16_S9 2048 8 2048 1 (10:0) (7:0) (0:0) RAMB16_S18 1024 16 1024 2 (9:0) (15:0) (1:0) RAMB16_S36 512 32 512 4 (8:0) (31:0) (3:0) 113 114 19

Component declaration for BRAM (1) Genaral template of BRAM instantiation (1) -- Component Declaration for RAMB16_S1 -- Should be placed after architecture statement but before component RAMB16_S1 -- synthesis translate_off generic ( INIT : bit_vector := X"0"; INIT_00 : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000"; INIT_3F : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000"; SRVAL : bit_vector := X"0"; WRITE_MODE : string := "WRITE_FIRST"); -- synthesis translate_on port (DO : out STD_LOGIC_VECTOR (0 downto 0) ADDR : in STD_LOGIC_VECTOR (13 downto 0); CLK : in STD_ULOGIC; DI : in STD_LOGIC_VECTOR (0 downto 0); EN : in STD_ULOGIC; SSR : in STD_ULOGIC; WE : in STD_ULOGIC); end component; -- Component Attribute Specification for RAMB16_{S1 S2 S4} -- Should be placed after architecture declaration but before the -- Put attributes, if necessary -- Component Instantiation for RAMB16_{S1 S2 S4} -- Should be placed in architecture after the keyword RAMB16_{S1 S2 S4}_INSTANCE_NAME : RAMB16_S1 -- synthesis translate_off generic map ( INIT => bit_value, INIT_00 => vector_value, INIT_01 => vector_value,.. INIT_3F => vector_value, SRVAL=> bit_value, WRITE_MODE => user_write_mode) -- synopsys translate_on port map (DO => user_do, ADDR => user_addr, CLK => user_clk, DI => user_di, EN => user_en, SSR => user_ssr, WE => user_we); 115 116 Initializing Block RAMs 1024x16 Component declaration for BRAM (2) INIT_00 : BIT_VECTOR := X"014A0C0F09170A04076802A800260205002A01C5020A0917006A006800060040"; INIT_01 : BIT_VECTOR := X"000000000000000008000A1907070A1706070A020026014A0C0F03AA09170026"; INIT_02 : BIT_VECTOR := X"0000000000000000000000000000000000000000000000000000000000000000"; INIT_03 : BIT_VECTOR := X"0000000000000000000000000000000000000000000000000000000000000000"; INIT_3F : BIT_VECTOR := X"0000000000000000000000000000000000000000000000000000000000000000") INIT_00 ADDRESS INIT_01 ADDRESS INIT_3F ADDRESS 014A 0F 0000 1F 0000 FF 0C0F 0E 0000 1E 0000 FE DATA ADDRESS Addresses are shown in red and data corresponding to the same memory location is shown in black 0917 04 014A 14 0000 F4 006A 03 0C0F 13 0000 F3 0068 02 03AA 12 0000 F2 0006 01 0917 11 0000 F1 0040 00 0026 10 0000 F0 117 VHDL Instantiation Template for RAMB16_S9, S18 and S36 -- Component Declaration for RAMB16_{S9 S18 S36} component RAMB16_{S9 S18 S36} -- synthesis translate_off generic ( INIT : bit_vector := X"0"; INIT_00 : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000"; INIT_3E : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000"; INIT_3F : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000"; INITP_00 : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000"; INITP_07 : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000"; SRVAL : bit_vector := X"0"; WRITE_MODE : string := "WRITE_FIRST"; ); 118 Component declaration for BRAM (2) -- synthesis translate_on port (DO : out STD_LOGIC_VECTOR (0 downto 0); DOP : out STD_LOGIC_VECTOR (1 downto 0); ADDR : in STD_LOGIC_VECTOR (13 downto 0); CLK : in STD_ULOGIC; DI : in STD_LOGIC_VECTOR (0 downto 0); DIP : in STD_LOGIC_VECTOR (0 downto 0); EN : in STD_ULOGIC; SSR : in STD_ULOGIC; WE : in STD_ULOGIC); end component; 119 Genaral template of BRAM instantiation (2) -- Component Attribute Specification for RAMB16_{S9 S18 S36} -- Component Instantiation for RAMB16_{S9 S18 S36} -- Should be placed in architecture after the keyword RAMB16_{S9 S18 S36}_INSTANCE_NAME : RAMB16_S1 -- synthesis translate_off generic map ( INIT => bit_value, INIT_00 => vector_value,.......... INIT_3F => vector_value, INITP_00 => vector_value, INITP_07 => vector_value SRVAL => bit_value, WRITE_MODE => user_write_mode) -- synopsys translate_on port map (DO => user_do, DOP => user_dop, ADDR => user_addr, CLK => user_clk, DI => user_di, DIP => user_dip, EN => user_en, SSR => user_ssr, WE => user_we); 120 20

Block RAM Waveforms WRITE_FIRST mode Block RAM Waveforms READ_FIRST mode 121 122 Block RAM Waveforms NO_CHANGE mode 123 21