Review: Timing. EECS Components and Design Techniques for Digital Systems. Lec 13 Storage: Regs, SRAM, ROM. Outline.

Similar documents
EECS150 - Digital Design Lecture 16 Memory 1

EECS150 - Digital Design Lecture 16 - Memory

ECE 485/585 Microprocessor System Design

EECS150 - Digital Design Lecture 17 Memory 2

Introduction to SRAM. Jasur Hanbaba

EECS 151/251A Spring 2019 Digital Design and Integrated Circuits. Instructor: John Wawrzynek. Lecture 18 EE141

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 15 Memories

Memory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM

ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems

Lecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

EECS150 - Digital Design Lecture 13 - Project Description, Part 2: Memory Blocks. Project Overview

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

Memory and Programmable Logic

Digital Integrated Circuits Lecture 13: SRAM

Introduction to CMOS VLSI Design Lecture 13: SRAM

SRAM. Introduction. Digital IC

ELCT 912: Advanced Embedded Systems

Semiconductor Memories: RAMs and ROMs

EE178 Lecture Module 2. Eric Crabill SJSU / Xilinx Fall 2007

! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

EECS 150 Homework 7 Solutions Fall (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are:

Topics. Midterm Finish Chapter 7

EECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007

Computer Organization and Assembly Language (CS-506)

FPGA RAM (C1) Young Won Lim 5/13/16

EECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

Outline. EECS Components and Design Techniques for Digital Systems. Lec 11 Putting it all together Where are we now?

ECEN 449 Microprocessor System Design. Memories

Chapter 5 Internal Memory

Homework deadline extended to next friday

Outline. Field Programmable Gate Arrays. Programming Technologies Architectures. Programming Interfaces. Historical perspective

Basic Organization Memory Cell Operation. CSCI 4717 Computer Architecture. ROM Uses. Random Access Memory. Semiconductor Memory Types

Semiconductor Memory Classification

CHAPTER 12 ARRAY SUBSYSTEMS [ ] MANJARI S. KULKARNI

Memory and Programmable Logic

ECE 2300 Digital Logic & Computer Organization

CSEE 3827: Fundamentals of Computer Systems. Storage

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 5 Memory. Introduction. Memory: basic concepts

Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 5 Memory. Outline. Introduction

8051 INTERFACING TO EXTERNAL MEMORY

Very Large Scale Integration (VLSI)

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.

William Stallings Computer Organization and Architecture 6th Edition. Chapter 5 Internal Memory

Organization. 5.1 Semiconductor Main Memory. William Stallings Computer Organization and Architecture 6th Edition

Memories. Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu.

problem maximum score 1 8pts 2 6pts 3 10pts 4 15pts 5 12pts 6 10pts 7 24pts 8 16pts 9 19pts Total 120pts

+1 (479)

Sense Amplifiers 6 T Cell. M PC is the precharge transistor whose purpose is to force the latch to operate at the unstable point.

Field Programmable Gate Array (FPGA)

Topic 21: Memory Technology

Topic 21: Memory Technology

CSE140L: Components and Design Techniques for Digital Systems Lab

Read Only Memory ROM

CSE140L: Components and Design

Computer Organization. 8th Edition. Chapter 5 Internal Memory

Introduction to Semiconductor Memory Dr. Lynn Fuller Webpage:

Microcontroller Systems. ELET 3232 Topic 11: General Memory Interfacing

! Memory Overview. ! ROM Memories. ! RAM Memory " SRAM " DRAM. ! This is done because we can build. " large, slow memories OR

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Caches and Memory Hierarchies.

CENG 4480 L09 Memory 2

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

Interface DAC to a PC. Control Word of MC1480 DAC (or DAC 808) 8255 Design Example. Engineering 4862 Microprocessors

Memory Arrays. Array Architecture. Chapter 16 Memory Circuits and Chapter 12 Array Subsystems from CMOS VLSI Design by Weste and Harris, 4 th Edition


ECSE-2610 Computer Components & Operations (COCO)

Sistemas Digitais I LESI - 2º ano

ECEU530. Project Presentations. ECE U530 Digital Hardware Synthesis. Rest of Semester. Memory Structures

Chapter 3 Semiconductor Memories. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Lecture Objectives. Introduction to Computing Chapter 0. Topics. Numbering Systems 04/09/2017

UNIT V (PROGRAMMABLE LOGIC DEVICES)

EECS150, Fall 2004, Midterm 1, Prof. Culler. Problem 1 (15 points) 1.a. Circle the gate-level circuits that DO NOT implement a Boolean AND function.

RTL Design (2) Memory Components (RAMs & ROMs)

CpE 442. Memory System

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

CS152 Computer Architecture and Engineering Lecture 16: Memory System

ECE 331 Digital System Design

ECE 485/585 Microprocessor System Design

Basic FPGA Architecture Xilinx, Inc. All Rights Reserved

Unit 6 1.Random Access Memory (RAM) Chapter 3 Combinational Logic Design 2.Programmable Logic

Computer Structure. Unit 2: Memory and programmable devices

Read and Write Cycles

Address connections Data connections Selection connections

CENG 4480 L09 Memory 3

CS311 Lecture 21: SRAM/DRAM/FLASH

FPGA Implementations

Memory Expansion. Lecture Embedded Systems

ECEN 449 Microprocessor System Design. Memories. Texas A&M University

Very Large Scale Integration (VLSI)

EECS Components and Design Techniques for Digital Systems. Lec 07 PLAs and FSMs 9/ Big Idea: boolean functions <> gates.

Internal Memory. Computer Architecture. Outline. Memory Hierarchy. Semiconductor Memory Types. Copyright 2000 N. AYDIN. All rights reserved.

Memories: Memory Technology

Section 6. Memory Components Chapter 5.7, 5.8 Physical Implementations Chapter 7 Programmable Processors Chapter 8

Synthesis vs. Compilation Descriptions mapped to hardware Verilog design patterns for best synthesis. Spring 2007 Lec #8 -- HW Synthesis 1

! Serial Access Memories. ! Multiported SRAM ! 5T SRAM ! DRAM. ! Shift registers store and delay data. ! Simple design: cascade of registers

COMP3221: Microprocessors and. and Embedded Systems. Overview. Lecture 23: Memory Systems (I)

CMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM

PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES

Transcription:

Review: Timing EECS 150 - Components and Design Techniques for Digital Systems Lec 13 Storage: Regs,, ROM David Culler Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~culler http://inst.eecs.berkeley.edu/~cs150 All gates have delays RC delay in driving the output Wires are distributed RCs Delays goes with the square of the length Source circuits determines strength Serial vs parallel Delays in combinational logic determine by Input delay Path length Delay of each gate along the path Worst case over all possible input-output paths Setup and CLK-Q determined by the two latches in flipflop Clock cycle : T cycle T CL +T setup +T clk Q + worst case skew Delays can introduce glitches in combinational logic 10-9-2007 EECS150-Fa07 Lec13-RAM 1 10-9-2007 EECS150-Fa07 Lec13-RAM 2 Outline Memory Basics Memory concepts Register Files Access Multiported Memories FIFOS ROM, EPROM, FLASH Relationship to Comb. Logic Uses: Whenever a large collection of state elements is required. data & program storage general purpose registers buffering table lookups CL implementation Types: RAM - random access memory ROM - read only memory EPROM, FLASH - electrically programmable read only memory Example RAM: Register file from microprocessor clk regid = register identifier (address of word in memory) sizeof(regid) = log2(# of reg) WE = write enable 10-9-2007 EECS150-Fa07 Lec13-RAM 3 10-9-2007 EECS150-Fa07 Lec13-RAM 4

Examples in your project Local device configuration Registry of other devices Video object store Video object pixel maps FIFOs connecting peripherals to the core Audio storage Definitions Memory Interfaces for Accessing Data Asynchronous (unclocked): A change in the address results in data appearing Synchronous (clocked): A change in address, followed by an edge on CLK results in data appearing or write operation occurring. A common arrangement is to have synchronous write operations and asynchronous read operations. Volatile: Looses its state when the power goes off. Nonvolatile: Retains it state when power goes off. 10-9-2007 EECS150-Fa07 Lec13-RAM 5 10-9-2007 EECS150-Fa07 Lec13-RAM 6 Register File Internals Regid (address) Decoding For read operations, functionally the regfile is equivalent to a 2-D array of flip-flops with tristate outputs on each MUX, but distributed Unary control with added write logic: These circuits are just functional abstractions of the actual circuits used. How do we go from "regid" to "SEL"? 10-9-2007 EECS150-Fa07 Lec13-RAM 7 The function of the address decoder is to generate a one-hot code word from the address. Binary -> unary Simplified DEMUX The output is used for row selection. Many different circuits exist for this function. A simple one is shown. Where have you seen this before? 10-9-2007 EECS150-Fa07 Lec13-RAM 8

Accessing Register Files Basic Memory Subsystem Block Diagram Read: output is a combinational function of the address input Change address, see data from a different word on the output Regardless of clock Write is synchronous If enabled, input data is written to selected word on the clock edge Often multi-ported (more on that later) clk addr addr X addr Y Address Decoder n Address Bits Memory cell m Bit Lines Word Line 2 n word lines what happens if n and/or m is very large? dout R[X] R[Y] val RAM/ROM naming convention: din val 10-9-2007 WE EECS150-Fa07 Lec13-RAM 9 32 X 8, "32 by 8" => 32 8-bit words 1M X 1, "1 meg by 1" => 1M 1-bit words 10-9-2007 EECS150-Fa07 Lec13-RAM 10 Memory Components Types: Volatile: Random Access Memory (RAM):» "static"» DRAM "dynamic" Non-volatile: Read Only Memory (ROM):» Mask ROM "mask programmable"» EPROM "electrically programmable"» EEPROM "erasable electrically programmable"» FLASH memory - similar to EEPROM with programmer integrated on chip Read Only Memory (ROM) Simplified form of memory. No write operation needed. Functional Equivalence: Connections to Vdd used to store a logic 1, connections to GND for storing logic 0. address decoder bit-cell array Full tri-state buffers are not needed at each cell point. In practice, single transistors are used to implement zero cells. Logic one s are derived through precharging or bit-line pullup transistor. 10-9-2007 EECS150-Fa07 Lec13-RAM 11 10-9-2007 EECS150-Fa07 Lec13-RAM 12

Static RAM Typical Organization: 16-word x 4-bit 6-Transistor 0 1 0 word (row select) Read: bit bit 1. Select row 2. pulls one line low and one high 3. Sense output on bit and bit Write: 1. Drive bit lines (e.g, bit=1, bit=0) 2. Select row Why does this work? When one bit-line is low, it will force output high; that will set new state 10-9-2007 EECS150-Fa07 Lec13-RAM 13 1 Din 3 Wr Driver - + Din 2 Wr Driver - + Din 1 Wr Driver - + Din 0 Wr Driver - + : : : : - Sense Amp+ - Sense Amp+ - Sense Amp+ - Sense Amp+ Word 0 Word 1 Word 15 10-9-2007 Dout 3 Dout 2 EECS150-Fa07 DoutLec13-RAM 1 Dout 0 14 Address Decoder WrEn A0 A1 A2 A3 Simplified timing diagram Logic Diagram of a Typical A N WE_L OE_L 2 N words x M bit M D Read: Valid address, then Chip Select Access Time: address good to data valid even if not visible on out Cycle Time: min between subsequent mem operations Write: Valid address and data with WE_l, then CS Address must be stable a setup time before WE and CS go low And hold time after one goes high When do you drive, sample, or Z the data bus? 10-9-2007 EECS150-Fa07 Lec13-RAM 15 Write Enable is usually active low (WE_L) Din and Dout are combined to save pins: A new control signal, Output Enable (OE_L) WE_L is asserted (Low), OE_L is unasserted (High)» D serves as the data input pin WE_L is unasserted (High), OE_L is asserted (Low)» D is the data output pin Neither WE_L and OE_L are asserted?» Chip is disconneted or chipselect (CS) + WE Never both asserted! 10-9-2007 EECS150-Fa07 Lec13-RAM 16

Example: ST microelectronics M68AW256M Administration and Announcements Thanks for feedback on survey Slow down course and lecture More examples lecture and lab lecture Mid term too long Lab lecture => Pre Lab => Design review => Execute => Check off You need to pay attention to lab lecture on what to do next while executing current lab. Do the prelab before lab. The beginning of lab is turn in design review document of current and check off of previous. Update on lab check offs 4 days of slip that you can use as needed (but cannot extend into weekend) Can accepts black box on lab to catch up» Recoup 50% if implement your own within 2 weeks All homeworks graded. On-line grades available for labs, HWs, and Mid Solution for current HW will be posted on Friday! 10-9-2007 EECS150-Fa07 Lec13-RAM 17 http://www.st.com/stonline/products/literature/ds/7996/m68aw256m.pdf 10-9-2007 EECS150-Fa07 Lec13-RAM 18 What happens when # bits gets large Inside a Tall-Thin RAM is a short-fat RAM Big slow decoder Bit lines very log Large distributed RC load Log n bit address n bits Treat output as differential signal, rather than rail-to-rail logic Sense amps on puts Can precharge both bit lines high, so cell only has to pull one low ==> Make it shorter and wider Log k bit address Log m bit address n = k x m bits Sense amps mux 1 data bit 10-9-2007 EECS150-Fa07 Lec13-RAM 19 10-9-2007 EECS150-Fa07 Lec13-RAM 20

Column MUX Example 2: 16M (2M x 8) Controls physical aspect ratio Important for physical layout and to control delay on wires. In DRAM, allows time-multiplexing of chip address pins (later) Cypress Semiconductor CY62168DV30LL-55BVXI 10-9-2007 EECS150-Fa07 Lec13-RAM 21 10-9-2007 EECS150-Fa07 Lec13-RAM 22 Typical Timing Read Series with CS A N WE_L OE_L 2 N words x M bit M D OE determines direction Hi = Write, Lo = Read Writes are dangerous! Be careful! Double signaling: OE Hi, WE Lo D A OE_L Write Timing: Data In Write Address High Z Read Timing: Data Out Junk Read Address Read Address WE_L Write Hold Time Read Access Read Access Time Time 10-9-2007 Write Setup Time EECS150-Fa07 Lec13-RAM 23 Data Out Rate determined by cycle time Data valid: max(addr + Access time, CS + Tco) Remain valid TOHA after addr changes Return to tri-state after read sequence 10-9-2007 EECS150-Fa07 Lec13-RAM 24

Example 1 continued: read Example 1 continued: write 10-9-2007 EECS150-Fa07 Lec13-RAM 25 10-9-2007 EECS150-Fa07 Lec13-RAM 26 Cascading Memory Modules (or chips) Example: assemble of 256 x 8 ROM using 256 x 4 modules: example: 1K x * ROM using 256 x 4 modules: Memory Blocks in FPGAs LUTs can double as small RAM blocks: 4-LUT is really a 16x1 memory. Normally we think of the contents being written from the configuration bit stream, but Virtex architecture (and others) allow bits of LUT to be written and read from the general interconnect structure. achieves 16x density advantage over using CLB flip-flops. Furthermore, the two LUTs within a slice can be combined to create a 16 x 2-bit or 32 x 1-bit synchronous RAM, or a 16x1-bit dual-port synchronous RAM. The Virtex-E LUT can also provide a 16-bit shift register of adjustable length. each module has tri-state outputs: 10-9-2007 EECS150-Fa07 Lec13-RAM 27 Newer FPGA families include larger onchip RAM blocks (usually dual ported): Called block selectrams in Xilinx Virtex series 4k bits each 10-9-2007 EECS150-Fa07 Lec13-RAM 28

Synchronous Verilog for Virtex LUT RAM module ram16x1(q, a, d, we, clk); output q; input d; input [3:0] a; input clk, we; reg mem [15:0]; always @(posedge clk) begin if(we) mem[a] <= d; end assign q = mem[a]; endmodule Note: synchronous write and asynchronous read. Deeper and/or wider RAMs can be specified and the synthesis tool will do the job of wiring together multiple LUTs. How does the synthesis tool choose to implement your RAM as a collection of LUTs or as block RAMs? 10-9-2007 EECS150-Fa07 Lec13-RAM 29 10-9-2007 EECS150-Fa07 Lec13-RAM 30 Virtex Block RAMs Each block SelectRAM (block RAM) WEA is a fully synchronous ENA (synchronous write and read) dualported (true dual port) 4096-bit RAM ADDRA[#:0] RSTA DOA[#:0] CLKA with independent control signals for DIA[#:0] each port. The data widths of the two ports can be configured WEB ENB independently, providing built-in RSTB DOB[#:0] bus-width conversion. CLKB ADDRB[#:0] CLKA and CLKB can be DIB[#:0] independent, providing an easy way to cross clock boundaries. Around 160 of these on the 2000E. Multiples can be combined to implement, wider or deeper memories. See chapter 8 of Synplify reference manual on how to write Verilog for implied Block RAMs. Or instead, explicitly instantiate as primitive (project checkpoint will use this method). 10-9-2007 EECS150-Fa07 Lec13-RAM 31 Multi-ported Memory Motivation: Consider CPU core register file:» 1 read or write per cycle limits processor performance.» Complicates pipelining. Difficult for different instructions to simultaneously read or write regfile.» Common arrangement in pipelined CPUs is 2 read ports and 1 write port. sel a sel b sel c data b data a Regfile data c What do we need in the project? 10-9-2007 EECS150-Fa07 Lec13-RAM 32

Dual-ported Memory Internals First-in-first-out (FIFO) Memory Add decoder, another set of read/write logic, bits lines, word lines: dec a dec b cell array address ports data ports r/w logic r/w logic Example cell: b 2 b 1 b 1 b 2 Repeat everything but crosscoupled inverters. This scheme extends up to a couple more ports, then need to add additional transistors. WL 2 WL 1 Used to implement queues. These find common use in computers and communication circuits. Generally, used for rate matching data producer and consumer: stating state after write after read Producer can perform many writes without consumer performing any reads (or vice versa). However, because of finite buffer size, on average, need equal number of reads and writes. Typical uses: interfacing I/O devices. Example network interface. Data bursts from network, then processor bursts to memory buffer (or reads one word at a time from interface). Operations not synchronized. Example: Audio output. Processor produces output samples in bursts (during process swap-in time). Audio DAC clocks it out at constant sample rate. 10-9-2007 EECS150-Fa07 Lec13-RAM 33 10-9-2007 EECS150-Fa07 Lec13-RAM 34 FIFO Interfaces D IN EMPTY RE D OUT RST WE FULL FIFO HALF FULL CLK Address pointers are used internally to keep next write position and next read position into a dual-port memory. write ptr read ptr If pointers equal after write FULL: After write or read operation, FULL and EMPTY indicate write ptr read ptr status of buffer. Used by external logic to If pointers equal after read EMPTY: control own reading from or writing to the buffer. write ptr read ptr FIFO resets to EMPTY state. HALF FULL (or other indicator of partial fullness) 10-9-2007 EECS150-Fa07 Lec13-RAM is optional. 35 FIFO Implementation Assume, dual-port memory with asynchronous read, synchronous write. Binary counter for each of read and write address. CEs controlled by WE and RE. Equal comparator to see when pointers match. Flip-flop each for FULL and EMPTY flags: WE RE equal EMPTY i FULL i Control logic with truth-table draft 0 0 0 0 0 equal determined after clock allows counter to tick 0 0 1 EMPTY i-1 FULL i-1 - when is empty and full set? 0 1 0 0 0 - could play funny games with 0 1 1 1 0 clocking See Xilinx application notes 1 0 0 0 0 Can waste a slot to simplify 1 0 1 0 1 1 1 0 0 0 1 1 1 EMPTY i-1 FULL i-1 10-9-2007 EECS150-Fa07 Lec13-RAM 36

Xilinx BlockRam Versions Our simple version has latency problems. FULL and EMPTY signals are asserted near the end of the clock period. Xilinx version solves this by predicting when full or empty will be asserted on next write/read operation. Also uses BlockRam with synchronous reads. Available on Xilinx website along with app note application note. These are linked to our page. Two versions available. Easy to modify to change the width if necessary. Will use the cc (common clock) version. 10-9-2007 EECS150-Fa07 Lec13-RAM 37 Non-volatile Memory Used to hold fixed code (ex. BIOS), tables of data (ex. FSM next state/output logic), slowly changing values that persist over power off (date/time) Mask ROM Used with logic circuits for tables etc. Contents fixed at IC fab time (truly write once!) EPROM (erasable programmable) & FLASH requires special IC process (floating gate technology) writing is slower than RAM. EPROM uses special programming system to provide special voltages and timing. reading can be made fairly fast. rewriting is very slow.» erasure is first required, EPROM - UV light exposure, EEPROM electrically erasable 10-9-2007 EECS150-Fa07 Lec13-RAM 38 FLASH Memory Electrically erasable In system programmability and erasability (no special system or voltages needed) On-chip circuitry (FSM) and voltage generators to control erasure and programming (writing) Erasure happens in variable sized "sectors" in a flash (16K - 64K Bytes) See: http://developer.intel.com/design/flash/ for product descriptions, etc. Compact flash cards are based on this type of memory. NAND flash Configuration memory, microcontrollers usually NOR flash 10-9-2007 EECS150-Fa07 Lec13-RAM 39 Relationship between Memory and CL Memory blocks can be (and often are) used to implement combinational logic functions: Examples: LUTs in FPGAs 1Mbit x 8 EPROM can implement 8 independent functions each of log 2 (1M)=20 inputs. The decoder part of a memory block can be considered a minterm generator. The cell array part of a memory block can be considered an OR function over a subset of rows. The combination gives us a way to implement logic functions directly in sum of products form. Several variations on this theme exist in a set of devices called Programmable logic devices (PLDs) 10-9-2007 EECS150-Fa07 Lec13-RAM 40

A ROM as AND/OR Logic Device PLD Summary 10-9-2007 EECS150-Fa07 Lec13-RAM 41 10-9-2007 EECS150-Fa07 Lec13-RAM 42 PLA Example PAL Example 10-9-2007 EECS150-Fa07 Lec13-RAM 43 10-9-2007 EECS150-Fa07 Lec13-RAM 44

Summary Basic RAM structure Address decoder to select row of cell array bit, ~bit lines to read & write Sense difference in each bit Column mux Read/write protocols Synchronous (reg files, fpga block ram) Asynchronous read, synchronous writes Asynchronous Multiported RAMs reg files and fifos Non-volatile memory ROM, EPROM, EEPROM, FLASH Memory as combinational logic Relationship to programmable logic 10-9-2007 EECS150-Fa07 Lec13-RAM 45