Memory and Programmable Logic
- Memory units allow us to store and/or retrieve information
  - Essentially look-up tables
  - Good for storing data, not for function implementation
- Programmable logic device (PLD), programmable logic array (PLA), programmable array logic (PAL), and field programmable gate array (FPGA) are devices which allow us to program logic functions
  - More efficient approach to Boolean functions
  - "Programming" refers to the procedure to configure the hardware devices
© 2018 Roberto Muscedere. Images © 2013 Pearson Education Inc.

Random Access Memory (RAM)
- RAM refers to memory in which access time to all information is constant
- Sequential Access Memory (SAM) refers to memory in which access time is based on the memory's current location, or on some other physical constraint
  - Eg. Tape Drive, Hard Drive
- RAM should really be referred to as Read/Write Memory or RWM
- Read Only Memory (ROM), covered later, is also random access memory; the term refers to its storage properties

Random Access Memory
- Regardless of storage properties, it sometimes isn't necessary to have access to all bits of storage
- D flip-flops can be interconnected such that we can access all bits in a single clock cycle; this may not be necessary, as it is quite expensive
- Choose to create a memory storage building block where access is limited to a certain number of bits per clock
  - Limits the amount of wiring needed to interconnect with this block
  - Makes the block more generic for any design

Random Access Memory
- RAM devices store information in groups of bits, called a word
  - Can only access a single word in a single clock cycle
  - A word is usually made up of groups of 8 bits (bytes)
- Refer to storage capacity in terms of the number of bytes it can store (eg. MB, GB)
  - 1K = 1000 or 10^3, 1Ki = 1024 or 2^10
  - The difference can be substantial as values get larger (eg. the difference between TB and TiB is about 10%)
  - Textbooks incorrectly state 1K = 1024
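
The gap between decimal and binary prefixes can be checked with a few lines of arithmetic. This is a small sketch; the dictionary and function names are illustrative and not part of the slides.

```python
# Decimal (SI) prefixes are powers of 10; binary (IEC) prefixes are powers of 2.
SI = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
IEC = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def excess_percent(si_prefix: str) -> float:
    """Percentage by which the binary prefix exceeds the matching decimal one."""
    binary = IEC[si_prefix + "i"]
    decimal = SI[si_prefix]
    return (binary / decimal - 1) * 100

for prefix in SI:
    print(f"1 {prefix}iB is {excess_percent(prefix):.2f}% larger than 1 {prefix}B")
```

The gap grows with each prefix step, which is why it is easy to ignore at K and hard to ignore at T.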

Random Access Memory
- Use various input/output ports to communicate with a RAM component
  - Data input/output (n-bit word)
  - Address input (k bits, 2^k total words)
  - Control lines (Read, Write)
- In general:
  - When Read is asserted, the information at the address location will be placed on the data output
  - When Write is asserted, the information at the data input will be committed to the address location specified

Random Access Memory
- Example: 1Ki x 16 RAM
  - 2^10 or 1024 addresses
  - 16 bits (b) per address, or 2 bytes (B)
  - RAM size is 16Kib (bits) or 2KiB (bytes)
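
The same sizing arithmetic works for any k (address bits) and n (word width). The helper below is a hypothetical illustration, not something defined in the slides.

```python
def ram_capacity(k: int, n: int) -> dict:
    """Capacity of a RAM with k address bits and n-bit words."""
    words = 2 ** k          # one word per address
    bits = words * n        # total storage in bits
    return {"words": words, "bits": bits, "bytes": bits // 8}

cap = ram_capacity(k=10, n=16)      # the 1Ki x 16 example
print(cap["words"], "words")
print(cap["bits"] // 1024, "Kib")
print(cap["bytes"] // 1024, "KiB")
```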

Random Access Memory (Legacy - Unclocked)
- When no changes are detected on the inputs (in a specific amount of time), the RAM generates internal clock signals to perform operations
  - Easier to interface with when flip-flops (FFs) were limited
- Writing to Legacy RAM
  - Apply all inputs (address, data); assert the Write line
  - Data is committed in a specified amount of time
- Reading from Legacy RAM
  - Apply address; assert the Read line
  - In a specified amount of time, data should be available on the output
- May also have a Memory Enable (ME) or Chip Select (CS) input to disable the memory and set the output to a high-impedance state
- May also have a unified data input/output bus with tri-states controlled by the Read/Write and/or ME/CS inputs

Random Access Memory (Modern - Clocked)
- Modern RAM is clocked
  - On the rising/falling edge all inputs are processed; no self clocking; increased performance
  - Operations are performed in a single clock; must honor timing
- Writing to Modern RAM
  - Apply all inputs (address, data, write mode)
  - On the clock edge the RAM begins committing data; complete by the next cycle
- Reading from Modern RAM
  - Apply all inputs (address, read mode)
  - Output will be available before the next clock
- Read/Write lines control enable
- No bidirectional ports; point-to-point connections

Types of RAMs
- RWM
  - Volatile (loses value when power is removed)
  - Static RAM (SRAM)
    - Uses 6 transistors (6T) to store a bit
  - Dynamic RAM (DRAM)
    - Uses 1 transistor (1T, as a capacitor) to store a bit
    - Requires a refresh (rewrite data) before the charge decays
- ROM
  - Read Only Memory (no write)
  - Non-volatile (values remain after power is removed)
  - Meets required timing for read operation

Inferring Modern SRAM in HDL
- Define Mem using an array of flip-flops: reg [n-1:0] Mem [0:(1<<k)-1]
- Coding of the always block ensures only ONE write to memory, or ONE read from memory, is done in one cycle
  - Otherwise, flip-flops are used and the design gets REALLY BIG
- Transparent memory: the current write value is carried onto the output port (ie. DataOut<=DataIn)
  - Otherwise, no-change mode; may infer flip-flops for storing the old output

    module myram (CK, Write, Address, DataIn, DataOut);
      parameter n=4, k=10;               // n = word width, k = address bits
      input CK, Write;
      input [k-1:0] Address;
      input [n-1:0] DataIn;
      output [n-1:0] DataOut;
      reg [n-1:0] DataOut;
      reg [n-1:0] Mem [0:(1<<k)-1];      // 2^k words of n bits

      always @(posedge CK) begin
        if (Write) begin
          Mem[Address] <= DataIn;        // single write port
          DataOut <= DataIn;             // transparent: written value appears on output
        end else
          DataOut <= Mem[Address];       // single read port
      end
    endmodule
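
For checking expected behaviour outside a simulator, the transparent (write-through) RAM above can be mirrored by a small software model. This is only a behavioural sketch: the class and method names are hypothetical, and it assumes the memory starts at zero, whereas the HDL contents are undefined until written.

```python
class ClockedRam:
    """Behavioural model of a single-port, write-through clocked RAM."""

    def __init__(self, n: int = 4, k: int = 10):
        self.n = n                      # word width in bits
        self.mem = [0] * (1 << k)       # 2^k words (assumed to start at 0)
        self.data_out = 0               # registered output

    def posedge(self, write: bool, address: int, data_in: int = 0) -> int:
        """Apply the inputs at one rising clock edge and return DataOut."""
        if write:
            value = data_in & ((1 << self.n) - 1)   # keep only n bits
            self.mem[address] = value
            self.data_out = value       # transparent: output follows the write
        else:
            self.data_out = self.mem[address]
        return self.data_out

ram = ClockedRam()
ram.posedge(write=True, address=5, data_in=0b1010)
print(ram.posedge(write=False, address=5))
```

Each call to posedge performs exactly one read or one write, matching the one-access-per-cycle restriction described on the slide.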

Memory Decoding
- Need circuits to hold the bits being stored
  - Bit Cells (BC)
- Also need circuits to connect inputs and outputs to the BCs
  - Address Decoders
- Use a small memory (4x4) configuration to demonstrate this
- Use a slightly modified OR gate to simplify diagrams for the array logic

Memory Decoding - Bit Cells
- Functionally, a BC is a D latch with an AND on the output
  - The figure is only a functional model; real cells are SRAM (6T) or DRAM (1T)
- When Select and Read/Write are high, the Output will be the contents of the latch, otherwise 0
- When Select is high and Read/Write is low, the input is stored in the latch

Memory Decoding - Internal Construction
- Use a decoder (2-to-4) to decode the 2 address inputs into 4 select lines
  - Each select line goes into a row of BCs
- Each column has a common data input and output
- Read/Write is globally connected
- Outputs of the rows of BCs are connected to the corresponding final OR gate
- Address decoder requires 2^k AND gates with k inputs per gate
- Functional model; built with transistors to minimize cost
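
The 4x4 organisation described above (bit cells, a 2-to-4 decoder, a column-wise OR) can be sketched as a functional model. All names here are hypothetical, and the model follows the slides' convention that Read/Write high means read and low means write.

```python
class BitCell:
    """Functional bit cell: a latch with an AND gate on its output."""

    def __init__(self):
        self.q = 0

    def step(self, select: int, read_write: int, data_in: int) -> int:
        if select and not read_write:       # selected and writing: store input
            self.q = data_in
        # Output is the latch contents only when selected and reading, else 0.
        return self.q if (select and read_write) else 0

def decode_2to4(a1: int, a0: int) -> list:
    """2-to-4 decoder: exactly one select line is 1."""
    index = (a1 << 1) | a0
    return [int(i == index) for i in range(4)]

class Memory4x4:
    def __init__(self):
        self.cells = [[BitCell() for _ in range(4)] for _ in range(4)]

    def access(self, a1, a0, read_write, data_in=(0, 0, 0, 0)):
        selects = decode_2to4(a1, a0)
        out = [0, 0, 0, 0]
        for row, select in zip(self.cells, selects):
            for col, cell in enumerate(row):
                # Each column's cell outputs are ORed together.
                out[col] |= cell.step(select, read_write, data_in[col])
        return tuple(out)

mem = Memory4x4()
mem.access(1, 0, read_write=0, data_in=(1, 0, 1, 1))   # write word at address 2
print(mem.access(1, 0, read_write=1))                  # read it back
```

Because unselected cells output 0, the OR in each column passes through only the selected row's bit.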

Memory Decoding - Coincident Decoding
- To try to minimize the address decoding circuit, we can use two k/2-input decoders with two sets of decoding lines: column and row
- Example: 1024 x 32 RAM
- The intersection of each column and row select line contains n bit cells (n x BC)
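
To see the saving, compare the AND-gate count of one k-input decoder with that of two k/2-input decoders. The sketch below assumes k is even and counts only the decoder AND gates; the function names are illustrative.

```python
def linear_decoder_cost(k: int) -> dict:
    """One k-to-2^k decoder: 2^k AND gates with k inputs each."""
    return {"and_gates": 2 ** k, "inputs_per_gate": k}

def coincident_decoder_cost(k: int) -> dict:
    """Two (k/2)-to-2^(k/2) decoders, one for rows and one for columns."""
    half = k // 2                       # assumes k is even
    return {"and_gates": 2 * 2 ** half, "inputs_per_gate": half}

k = 10                                  # 1024 words
print(linear_decoder_cost(k))
print(coincident_decoder_cost(k))
```

For k = 10 this is 1024 ten-input gates against 64 five-input gates.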

Memory Decoding - Address Multiplexing
- DRAMs use fewer transistors
  - More densely packed, so more address bits, so more pins
- Use address multiplexing to reduce the pin count
  - Requires two steps to load the address: row address strobe (RAS) and column address strobe (CAS)
  - Sequential memory access only requires CAS to be loaded on each cycle, as the row address stays the same
- CAS and RAS also allowed for user-controlled refreshing
  - Some DRAMs perform auto refresh
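
A rough illustration of address multiplexing: the k-bit address is split into two halves that share the same pins. The sketch assumes an even k, with the upper half used as the row address and the lower half as the column address; that assignment and the function names are assumptions for illustration.

```python
def split_address(address: int, k: int) -> tuple:
    """Split a k-bit address into (row, column) halves for RAS/CAS loading."""
    half = k // 2
    row = address >> half                   # upper half, loaded with RAS
    col = address & ((1 << half) - 1)       # lower half, loaded with CAS
    return row, col

def address_pins(k: int, multiplexed: bool) -> int:
    """Address pins needed (strobe pins such as RAS/CAS not counted)."""
    return k // 2 if multiplexed else k

k = 20
print(address_pins(k, multiplexed=False), address_pins(k, multiplexed=True))
print(split_address(0xABCDE, k))
print(split_address(0xABCDF, k))    # next address: same row, new column
```

Consecutive addresses usually share a row, which is why sequential access only needs a new CAS.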

Error Detection and Correction: Hamming Code
- Higher density of transistors increases the chance of bit errors
- Given n data bits, add k parity bits to detect and locate a single error
- Only use data bits to generate parity
  - p1 = XOR(d1,d2,d4,d5,d7,d9,d11)
  - p2 = XOR(d1,d3,d4,d6,d7,d10,d11)
  - p4 = XOR(d2,d3,d4,d8,d9,d10,d11)
  - p8 = XOR(d5,d6,d7,d8,d9,d10,d11)

Parity bit coverage:

    Bit position    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
    Encoded bits    p1  p2  d1  p4  d2  d3  d4  p8  d5  d6  d7  d8  d9  d10 d11 p16
    p1              X       X       X       X       X       X       X       X
    p2                  X   X           X   X           X   X           X   X
    p4                          X   X   X   X                   X   X   X   X
    p8                                      X   X   X   X   X   X   X   X
    p16                                                                         X

Error Detection and Correction: Hamming Code
- Check parity with all bits:
  - c1 = XOR(p1,d1,d2,d4,d5,d7,d9,d11)
  - c2 = XOR(p2,d1,d3,d4,d6,d7,d10,d11)
  - c4 = XOR(p4,d2,d3,d4,d8,d9,d10,d11)
  - c8 = XOR(p8,d5,d6,d7,d8,d9,d10,d11)
- C = c8c4c2c1 (combine bits)
- If C=0000, no error; otherwise C is the bit position of the error
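
The parity and check equations follow one pattern: parity bit p covers every position whose binary index has bit p set. A minimal sketch for 11 data bits in positions 1 to 15 follows; it leaves out p16, which the equations do not define, and the function names are illustrative.

```python
PARITY_POSITIONS = (1, 2, 4, 8)
# Data bits d1..d11 occupy the remaining positions 3,5,6,7,9,...,15.
DATA_POSITIONS = [p for p in range(1, 16) if p not in PARITY_POSITIONS]

def encode(data: list) -> dict:
    """Map 11 data bits (d1..d11) to a 15-bit code word {position: bit}."""
    word = dict(zip(DATA_POSITIONS, data))
    for p in PARITY_POSITIONS:
        word[p] = 0
        for pos in DATA_POSITIONS:
            if pos & p:                 # position is covered by parity bit p
                word[p] ^= word[pos]
    return word

def syndrome(word: dict) -> int:
    """Return C = c8c4c2c1 as an integer; 0 means no error detected."""
    c = 0
    for p in PARITY_POSITIONS:
        check = 0
        for pos in range(1, 16):
            if pos & p:                 # parity bit itself plus its data bits
                check ^= word[pos]
        if check:
            c |= p
    return c

def correct(word: dict) -> tuple:
    """Flip the bit named by the syndrome (single-error assumption)."""
    c = syndrome(word)
    fixed = dict(word)
    if c:
        fixed[c] ^= 1
    return fixed, c

code = encode([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0])
damaged = dict(code)
damaged[11] ^= 1                        # inject an error at position 11 (d7)
repaired, where = correct(damaged)
print(where, repaired == code)
```

This corrects any single-bit error; two simultaneous errors would produce a misleading syndrome.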

Read-Only Memory (ROM)
- Non-volatile storage of binary information
- A read-only RAM (random access memory)
  - No write lines, no data inputs; may have a ME/CE (memory/chip enable) with tri-state output bus
  - k input address bits, n output data bits
- ROMs are usually small (<1MB)
- Typically used for code for small devices or for startup code

Read-Only Memory (ROM)
- Example: 32 x 8 ROM (k=5, n=8)
- Inputs are decoded using a 5-to-32 decoder to generate an output for each memory address
- Each decoder output line MAY BE connected to one of the inputs of the OR gates
  - OR gates require 32 inputs each
- Functional model; not built using OR gates as that doesn't scale well; use tri-states instead

Read-Only Memory (ROM)
- All cross-points are programmable, depending on the type of ROM (see later)
- Example: use a table to populate the ROM
- When an output bit is:
  - 0: no connection
  - 1: a connection is made (shown by X)

ROM: Combinational Circuits
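
Since a ROM stores one output word per input combination, it can implement any combinational function as a stored truth table. The sketch below uses a hypothetical 8 x 6 ROM that outputs the square of a 3-bit input; the function and its sizes are chosen only for illustration and are not taken from the slide.

```python
K, N = 3, 6                                  # 3 address bits, 6 output bits
ROM = [a * a for a in range(2 ** K)]         # truth table: output = input squared

def rom_read(address: int) -> int:
    """Reading the ROM is just a table look-up; no logic is evaluated."""
    return ROM[address]

def connection_row(address: int) -> str:
    """Cross-points for one decoder line: X = connection (1), . = none (0)."""
    return format(ROM[address], f"0{N}b").replace("1", "X").replace("0", ".")

for address in range(2 ** K):
    print(address, connection_row(address), rom_read(address))
```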

Types of ROMs
- Creation of a 0 or 1 depends on technology
  - eg. Presence or lack of transistor(s), connected or blown fuse, etc.
1. Masked
   - Transistors placed during manufacturing to create output
   - Expensive; economical for large quantities
2. Programmable (PROM)
   - One time programmable
   - Fuses blown to create output
3. Electrically Programmable (EPROM)
   - Field used to set or clear output by higher voltage
   - Requires UV light to erase or reset (1 hour exposure)
4. Electrically Erasable (EEPROM, NOR)
   - Similar to EPROM, but uses regular voltage to program AND erase
   - Good for changing a few bytes at a time
5. Flash (NAND)
   - Similar to EEPROM, but uses larger blocks of bits (>16K)
   - Must erase the whole block (slow) before programming

Combinational Programmable Logic Devices (PLDs)
- AND-OR sum-of-products implementation
- PROM has a fixed AND array and a programmable OR array
- PAL has a programmable AND array and a fixed OR array
- PLA has programmable AND and OR arrays

Programmable Logic Array (PLA)
- Similar in concept to a PROM, except the PLA does not provide full decoding of the variables and does not generate all the minterms
- The PROM's decoder is replaced by an array of AND gates that can be programmed to generate any product term of the input variables
  - Limited number of product terms
- The product terms are then connected to OR gates to provide the sum of products for the required Boolean functions
  - Limited number of OR gates

Programmable Logic Array (PLA)
- Example:
  - F1 = AB' + AC + A'BC'
  - F2 = (AC + BC)'
- OR outputs may go to an XOR gate which can invert the output
- PLA functions can be easily described in tabular form

PLA: Example
- Inputs: 1 is a connection to the input, 0 is a connection to the complement of the input, - is no connection
- Outputs: 1 is a connection to the OR gate, - is no connection
- (T) OR output direct or true, (C) OR output complemented
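
The tabular notation can be evaluated directly. The sketch below builds a programming table for F1 = AB' + AC + A'BC' and F2 = (AC + BC)', reading F2 as a complemented (C) output. The table is reconstructed from those equations rather than copied from the slide's figure, so treat it as an assumption.

```python
# One row per product term; columns are inputs A, B, C.
# '1' = true input connected, '0' = complement connected, '-' = no connection.
PRODUCT_TERMS = ["10-",   # AB'
                 "1-1",   # AC
                 "-11",   # BC
                 "010"]   # A'BC'

# For each output: product terms wired to its OR gate, and T (true) or C (complemented).
OUTPUTS = {"F1": ([0, 1, 3], "T"),
           "F2": ([1, 2], "C")}

def term_value(pattern: str, inputs: tuple) -> bool:
    """AND of all connected literals in one product term."""
    return all(p == "-" or int(p) == x for p, x in zip(pattern, inputs))

def pla(a: int, b: int, c: int) -> dict:
    terms = [term_value(p, (a, b, c)) for p in PRODUCT_TERMS]
    result = {}
    for name, (used, mode) in OUTPUTS.items():
        value = any(terms[i] for i in used)         # programmable OR array
        result[name] = int(value != (mode == "C"))  # XOR stage inverts if C
    return result

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, pla(a, b, c))
```

Note that the AC term is generated once and shared by both outputs, which a PLA allows.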

Programmable Array Logic (PAL)
- PLD with a programmable AND array and a fixed OR array
  - Easier to program than the PLA, but not as flexible
- Figure shows the logic configuration of a typical PAL with four inputs and four outputs
  - Each input has a buffer-inverter gate
  - Each AND gate has multiple (eg. 10) programmable input connections
  - Each major section is composed of three programmable AND gates and one fixed OR gate
  - One of the outputs is connected to a buffer-inverter gate and then fed back into two inputs of the AND gates

Programmable Array Logic (PAL)
- Commercial PAL devices contain more gates
- The output terminals are sometimes driven by three-state buffers or inverters
- The Boolean functions must be simplified to fit into each section
  - A product term cannot be shared among two or more OR gates
  - The number of product terms in each section is fixed (eg. 3)
  - If the number of terms in the function is too large, it may be necessary to use two sections to implement one Boolean function (eg. use F1 to drive F2)

PAL: Example
- w(A, B, C, D) = Σ(2, 12, 13) = ABC' + A'B'CD'
- x(A, B, C, D) = Σ(7, 8, 9, 10, 11, 12, 13, 14, 15) = A + BCD
- y(A, B, C, D) = Σ(0, 2, 3, 4, 5, 6, 7, 8, 10, 11, 15) = A'B + CD + B'D'
- z(A, B, C, D) = Σ(1, 2, 8, 12, 13) = ABC' + A'B'CD' + AC'D' + A'B'C'D = w + AC'D' + A'B'C'D
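
The simplified expressions can be checked against their minterm lists by brute force over all 16 input combinations, with A as the most significant bit. In this sketch z is written as w plus two more terms, the form that fits a three-term PAL section by feeding w back as an input. The helper name is illustrative.

```python
from itertools import product

def minterms(f) -> list:
    """List the minterm numbers (A = MSB, D = LSB) for which f is true."""
    return [i for i, (a, b, c, d) in enumerate(product((0, 1), repeat=4))
            if f(a, b, c, d)]

def w(a, b, c, d):
    return (a and b and not c) or (not a and not b and c and not d)

def x(a, b, c, d):
    return a or (b and c and d)

def y(a, b, c, d):
    return (not a and b) or (c and d) or (not b and not d)

def z(a, b, c, d):
    # Three terms only: w (fed back), AC'D', A'B'C'D.
    return w(a, b, c, d) or (a and not c and not d) or \
           (not a and not b and not c and d)

for name, f in (("w", w), ("x", x), ("y", y), ("z", z)):
    print(name, minterms(f))
```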

Sequential PLDs (SPLDs)
- Include a FF in addition to the AND-OR array
- Example: a PAL with FFs is a Registered PAL
- Each section is called a macrocell

Complex PLDs (CPLDs)
- Multiple SPLDs connected together
- Switch Matrix allows all components to interconnect
- I/O blocks allow connections to external pins and can be programmed to be tri-state, input, or output
- Manufacturers take different approaches to the general architecture

Field-Programmable Gate Arrays (FPGAs)
- Consists of an array of configurable logic blocks (CLBs), a variety of local and global routing resources, and input/output blocks (IOBs)
- Programmable I/O buffers, SRAM based configuration
- Xilinx (1985) specific; others similar in design

Xilinx CLBs
- Xilinx CLBs have evolved over time
  - XC2000 has two 4-bit function generators, and a single storage element
  - XC3000 has two 5-bit function generators, and two storage elements
  - XC4000 has two 4-bit function generators which can be joined or independent, or configured as RAM; two storage elements; carry logic to improve arithmetic; better FF control
- Far newer devices are available
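
A function generator can be thought of as a small look-up table: a 4-input generator holds 16 configuration bits, one per input combination, which is also why it can double as RAM. The sketch below reads "4-bit" as four inputs and uses hypothetical helper names; it is a conceptual model, not a description of any specific Xilinx primitive.

```python
def make_lut(func, inputs: int = 4) -> int:
    """Build the configuration bits of a look-up table from a Boolean function."""
    bits = 0
    for index in range(2 ** inputs):
        # Unpack the index into input values, first argument = MSB.
        args = [(index >> (inputs - 1 - i)) & 1 for i in range(inputs)]
        if func(*args):
            bits |= 1 << index          # store a 1 at this truth-table row
    return bits

def lut_eval(bits: int, *args: int) -> int:
    """Evaluate the LUT: the inputs form an address into the stored bits."""
    index = 0
    for a in args:
        index = (index << 1) | a
    return (bits >> index) & 1

config = make_lut(lambda a, b, c, d: (a & b) ^ (c | d))
print(f"{config:016b}")
print(lut_eval(config, 1, 1, 0, 0))
```

Any function of the four inputs costs the same 16 bits, regardless of how complex its Boolean expression is.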

XC2000 CLB
[Figure: XC2000 CLB. Inputs A, B, C, D feed a combinatorial logic block with outputs F and G; one storage element (D, S, R, Q) clocked by K; CLB outputs X and Y.]
Information only, not on exam.

XC3000 CLB
[Figure: XC3000 CLB. Data in (DI); logic variables A, B, C, D, E with QX and QY fed back; combinatorial function with outputs F and G; two D flip-flops (QX, QY) fed through multiplexers selecting F, DIN or G; CLB outputs X and Y; enable clock (EC), clock (K), direct reset (RD), global reset.]
Information only, not on exam.

XC4000 CLB
[Figure: XC4000 CLB. Control inputs C1-C4 (H1, DIN, S/R, EC); logic function of G1-G4 (G'); logic function of F1-F4 (F'); logic function of F', G' and H1 (H'); S/R control; two D flip-flops with outputs XQ and YQ; combinational outputs X and Y; clock K.]
Information only, not on exam.

Interconnect Resources
- A series of interconnecting lines (single-length, double-length, long, direct) offers various connections between CLBs/IOBs, near by and far away
- A Switch Matrix is comprised of 8 programmable interconnect points (PIPs)
  - Offers a selected set of routing configurations
[Figure: six pass transistors per switch matrix interconnect point]

Additional FPGA Resources
- Modern FPGAs also include:
  - RAMs
    - Distributed RAMs (unclocked SRAMs; from CLBs)
    - Block RAMs (large clocked SRAMs)
    - Can be parallelized to create larger configurations
  - Digital Signal Processing (DSP) blocks
    - Arithmetic components (Add/Sub/Multiply/Divide/etc.)
    - Better than implementing in CLBs
  - Hard-core components
    - Full CPUs, Memory controllers, Ethernet controllers, Video controllers, High-Speed Serial Transceivers, etc.