The Memory Hierarchy Part I

Chapter 6: The Memory Hierarchy, Part I
The slides of Part I are taken in large part from V. Heuring & H. Jordan, Computer Systems Design and Architecture, 1997.

Outline
Memory components:
- RAM memory cells and cell arrays
- Static RAM: more expensive, but less complex
- Tree and matrix decoders: needed for large RAM chips
- Dynamic RAM: less expensive, but needs refreshing
  - Chip organization
  - Timing
- ROM: read-only memory
Memory boards:
- Arrays of chips give more addresses and/or wider words
- 2-D and 3-D chip arrays
Memory modules:
- Large systems can benefit by partitioning memory for
  - separate access by system components
  - fast access to multiple words

Memory Hierarchy Outline (cont.)
The memory hierarchy, from fast and expensive to slow and cheap:
  Registers -> Cache -> Main Memory -> Disk
Consider two adjacent hierarchy levels: Cache and Main Memory.
- Cache: high speed, expensive (1st level on-chip, 2nd level off-chip)
- Design types: direct mapped, associative, set associative
Virtual memory makes the hierarchy to disk transparent:
- Translate the address from the CPU's logical address to the physical address where the information is actually stored.
- Memory management: how to move information back and forth.
- Multiprogramming: what to do while we wait.
- The TLB helps in speeding up the address translation process.
Memory as a subsystem: overall performance.

Memory Technology Characteristics
Level | Memory Type   | Average Access Time | Typical Size | Unit of Transfer (Block Size)
1     | Cache         | 0.5-20 ns           | 8 KB - 32 MB | Word, 16-32 bits
2     | Main Memory   | 40-200 ns           | 2 MB - 16 GB | Cache line, 8 B - 16 B
3     | Disk          | 5-10 ms             | > 100 Gb     | Page, 4 KB - 16 KB
4     | Magnetic Tape | 1-5 s               | > 200 Gb     | Record, 16 KB

Memory Performance Gap
[Figure: Processor-DRAM memory gap (latency). Performance (log scale, 1 to 1000) versus time (1980 to 2000).]
- CPU (µProc, "Moore's Law"): 60%/yr (2X/1.5 yr)
- DRAM: 9%/yr (2X/10 yrs)
- Processor-memory performance gap: grows 50%/year

Levels of the Memory Hierarchy
Level         | Capacity   | Access Time           | Cost
CPU Registers | 100s bytes | < 10s ns              |
Cache         | K bytes    | 10-50 ns              | 1-0.1 cents/bit
Main Memory   | M bytes    | 100-400 ns            | $.0001-.00001 cents/bit
Disk          | G bytes    | 10 ms (10,000,000 ns) | 10^-5 to 10^-6 cents/bit
Tape          | infinite   | sec-min               | 10^-8

Staging (transfer) unit between adjacent levels:
Between             | Unit             | Managed by     | Size
Registers and Cache | Instr. operands  | prog./compiler | 1-8 bytes
Cache and Memory    | Blocks           | cache cntl     | 8-128 bytes
Memory and Disk     | Pages            | OS             | 512-8K bytes
Disk and Tape       | Files            | user/operator  | Mbytes

Upper levels are faster; lower levels are larger.

The CPU-Memory Interface
[Figure: a CPU (MAR, m bits; MDR, w bits; register file) connected to main memory (locations 0 to 2^m - 1, s bits each) by an address bus (A0 to Am-1), a b-bit data bus, and control signals (R/W, REQUEST, COMPLETE).]
Sequence of events:
Read:
1. CPU loads MAR, issues Read, and REQUEST
2. Main memory transmits words to MDR
3. Main memory asserts COMPLETE
Write:
1. CPU loads MAR and MDR, asserts Write, and REQUEST
2. Value in MDR is written into address in MAR
3. Main memory asserts COMPLETE
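The read and write sequences can be sketched as a toy simulation. This is only an illustration under stated assumptions: the class and method names (`MainMemory`, `CPU`, `request`) are hypothetical, and the REQUEST/COMPLETE signals are modeled as a function call and a returned flag rather than as real bus timing.

```python
class MainMemory:
    """Toy main memory with 2**m words. Hypothetical model, not from the slides."""

    def __init__(self, m):
        self.cells = [0] * (2 ** m)

    def request(self, mar, read, mdr=None):
        # Calling this method stands in for asserting REQUEST.
        # The returned flag stands in for the COMPLETE signal.
        if read:
            return self.cells[mar], True
        self.cells[mar] = mdr
        return None, True


class CPU:
    def __init__(self, memory):
        self.memory = memory
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def read(self, address):
        self.mar = address                      # CPU loads MAR, issues Read and REQUEST
        data, complete = self.memory.request(self.mar, read=True)
        assert complete                         # main memory asserted COMPLETE
        self.mdr = data                         # word transmitted by memory lands in MDR
        return self.mdr

    def write(self, address, value):
        self.mar, self.mdr = address, value     # CPU loads MAR and MDR
        _, complete = self.memory.request(self.mar, read=False, mdr=self.mdr)
        assert complete                         # main memory asserted COMPLETE


cpu = CPU(MainMemory(m=4))
cpu.write(3, 0xAB)
print(hex(cpu.read(3)))
```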

The CPU-Memory Interface (cont'd.)
Additional points:
- If b < w, main memory must make w/b b-bit transfers.
- Some CPUs allow reading and writing of word sizes < w.
  Example: Intel 8088: m = 20, w = 16, s = b = 8. 8- and 16-bit values can be read and written.
- If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted.
- Some systems use separate R and W lines, and omit REQUEST.
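The w/b rule can be checked with a one-line calculation. The helper name `transfers_per_word` is hypothetical; it rounds up so that word sizes that are not a multiple of the bus width still get a whole number of transfers.

```python
def transfers_per_word(w, b):
    """Number of b-bit bus transfers needed to move one w-bit word."""
    return (w + b - 1) // b  # integer ceiling of w / b


# Intel 8088 example from the slide: w = 16, b = 8
print(transfers_per_word(16, 8))  # prints 2
```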

Memory Performance Parameters
Symbol           | Definition        | Units      | Meaning
t_a              | Access time       | time       | Time to access a memory word
t_c              | Cycle time        | time       | Time from start of access to start of next access
k                | Block size        | words      | Number of words per block
ω                | Bandwidth         | words/time | Word transmission rate
t_l              | Latency           | time       | Time to access first word of a sequence of words
t_bl = t_l + k/ω | Block access time | time       | Time to access an entire block of words
(Information is stored and moved in blocks at the cache and disk level.)
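As a worked instance of the block access time formula t_bl = t_l + k/ω, here is a small sketch. The function name and the numeric values (10 ms latency, 1024-word block, 1,000,000 words/s) are illustrative assumptions, not figures from the slides.

```python
def block_access_time(t_l, k, bandwidth):
    """t_bl = t_l + k / omega: latency plus the time to stream k words."""
    return t_l + k / bandwidth


# Assumed, disk-like numbers chosen only for illustration.
t_bl = block_access_time(t_l=10e-3, k=1024, bandwidth=1_000_000)
print(f"{t_bl * 1e3:.3f} ms")
```

With values like these the latency term dominates, which is one reason information is moved in blocks at the slower levels.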

Memories: Basic Technologies
SRAM: value is stored on a pair of inverting gates (cross-coupled gates; more later)
- very fast, but takes up more space than DRAM (4 to 6 transistors)
DRAM: value is stored as a charge on a capacitor (must be refreshed)
- very small, but slower than SRAM (factor of 5 to 10)
[Figure: DRAM cell with word line, pass transistor, capacitor, and bit line.]

Memory Cell Structure
Regardless of the technology, all RAM memory cells must provide these four functions: Select, DataIn, DataOut, and R/W.
[Figure: memory cell with Select, DataIn, DataOut, and R/W connections.]

An 8-Bit Register as a 1-D RAM Array
The entire register is selected with one select line, and uses one R/W line.
[Figure: eight memory cells sharing Select and R/W, with data lines d0 to d7.]
The data bus is bidirectional and buffered. (Why?)

A 4 x 8 2-D Memory Cell Array
- A 2-4 line decoder selects one of the four 8-bit arrays from the 2-bit address (A1, A0).
- R/W is common to all.
- Data lines d0 to d7 form a bidirectional 8-bit buffered data bus.

A 64 K x 1 Static RAM Chip
- A roughly square array fits the IC design paradigm.
- Row address: A0-A7 (8 bits) drive an 8-to-256 row decoder.
- Column address: A8-A15 (8 bits) drive a 256-to-1 mux and a 1-to-256 demux.
- The cell array is 256 x 256.
- Selecting rows separately from columns means only 256 x 2 = 512 circuit elements instead of 65536 circuit elements!
- CS, Chip Select, allows chips in arrays to be selected individually.
- Data: 1-bit input/output.
- This chip requires 21 pins including power and ground, and so will fit in a 22-pin package.
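The row/column split for this chip can be sketched as follows. The function names are hypothetical; the bit assignment (A0-A7 to the row decoder, A8-A15 to the column mux) follows the slide.

```python
def split_address(addr):
    """Split a 16-bit address into (row, column) for a 256 x 256 cell array."""
    row = addr & 0xFF          # A0-A7 go to the row decoder
    col = (addr >> 8) & 0xFF   # A8-A15 go to the column mux/demux
    return row, col


def select_lines(address_bits, two_dimensional):
    """Count the select lines needed with and without a row/column split."""
    if two_dimensional:
        half = address_bits // 2
        return 2 ** half + 2 ** (address_bits - half)
    return 2 ** address_bits


print(split_address(0x1234))      # prints (52, 18), i.e. row 0x34, column 0x12
print(select_lines(16, True))     # prints 512
print(select_lines(16, False))    # prints 65536
```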

A 16 K x 4 SRAM Chip
- Row address: A0-A7 (8 bits) drive an 8-to-256 row decoder.
- Column address: A8-A13 (6 bits) drive four 64-to-1 muxes and four 1-to-64 demuxes.
- There are four 256 x 64 cell arrays, one per data bit.
- There is little difference between this chip and the previous one, except that there are four 64-to-1 multiplexers instead of one 256-to-1 multiplexer.
- Data: 4-bit input/output.
- This chip requires 24 pins including power and ground, and so will require a 24-pin package. Package size and pin count can dominate chip cost.

Matrix and Tree Decoders
- 2-level decoders are limited in size because of gate fan-in. Most technologies limit fan-in to ~8.
- When decoders must be built with fan-in > 8, additional levels of gates are required.
- Tree and matrix decoders are two ways to design decoders with large fan-in.
[Figure: a 3-to-8 line tree decoder constructed from 2-input gates (inputs x0, x1 through a 2-4 decoder, then x2; outputs m0 to m7), and a 4-to-16 line matrix decoder constructed from 2-input gates (two 2-4 decoders on x0, x1 and x2, x3; outputs m0 to m15).]

6-Transistor Static RAM Cell
[Figure: storage cell (latch) with active loads to +5 V; dual-rail data lines b_i and NOT b_i for reading and writing; word line w_i driving the switches that control access to the cell; additional cells on the same bit lines; column select (from the column address decoder); CS; data line d_i.]
Sense/write amplifiers sense and amplify data on Read, and drive b_i and NOT b_i on Write.
Reading a value:
1) Precharge the bit lines to a value halfway between a 0 and a 1.
2) At the same time, assert the word line. This allows the latch to drive the bit lines to the value stored in the latch.

Static RAM Read Operation
[Timing diagram: memory address, Read/write, CS, and Data, with t_AA marked.]
t_AA, access time from address: the time required by the RAM array to decode the address and provide the value to the data bus.

Static RAM Write Operations
[Timing diagram: memory address, Read/write, CS, and Data, with t_w marked.]
t_w, write time: the time the data must be held valid in order to decode the address and store the value in the memory cells.

Dynamic RAM Organization
[Figure: DRAM cell with a single bit line b_i, word line w_j, a switch (transistor t_c) to control access to the cell, a storage capacitor, additional cells on the same bit line, column select (from the column address decoder), CS, R/W, and data line d_i.]
- The capacitor stores charge for a 1, no charge for a 0.
- The capacitor discharges in 4-15 ms. Refresh the capacitor by reading (sensing) the value on the bit line, amplifying it, and placing it back on the bit line, where it recharges the capacitor.
- Write: place the value on the bit line and assert the word line.
- Read: precharge the bit line, assert the word line, and sense the value on the bit line with the sense amplifier.
- Sense/write amplifiers sense and amplify data on Read, and drive the bit line on Write.
- This need to refresh the storage cells of dynamic RAM chips complicates DRAM system design.

Dynamic RAM Chip Organization
- Addresses are time-multiplexed on the address bus using RAS and CAS as strobes of rows and columns.
- CAS is normally used as the CS function.
[Figure: 1024 x 1024 cell array; address pins A0-A9; row latches and a 10-to-1024 row decoder; 1024 sense/write amplifiers and column latches; 10 column address latches with 1-to-1024 muxes and demuxes; control logic driven by RAS, CAS, and R/W; data pins d_o and d_i.]
Pin counts:
- Without address multiplexing: 27 pins including power and ground.
- With address multiplexing: 17 pins including power and ground.
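Time-multiplexing the address can be sketched as two strobed phases over the same 10 pins. This is a minimal sketch: the function name is hypothetical, and taking the row from the high-order bits is an assumption, since the slide does not say which half of the address is the row.

```python
ROW_BITS = 10
COL_BITS = 10


def multiplex_address(addr):
    """Return the two phases that present a 20-bit address on 10 address pins.

    Assumption: the high-order 10 bits are the row, the low-order 10 the column.
    """
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return [("RAS", row), ("CAS", col)]  # row strobed first, then column


for strobe, value in multiplex_address(0b1111111111_0000000001):
    print(strobe, value)
```

Sharing the pins between the two phases removes 10 address pins, which matches the drop from 27 to 17 pins quoted above.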

DRAM Read and Write Cycles
[Timing diagrams: typical DRAM Read operation and typical DRAM Write operation. Each shows the memory address (row address, then column address), RAS (t_RAS, t_Prechg), CAS, R/W, and Data. The read diagram marks the access time t_A and the cycle time t_C; the write diagram marks t_DHR (data hold from RAS) and the cycle time t_C.]
Notice that it is the bit line precharge operation that causes the difference between access time and cycle time.

DRAM Refresh and Row Access
- Refresh is usually accomplished by a RAS-only cycle. The row address is placed on the address lines and RAS is asserted. This refreshes the entire row. CAS is not asserted. The absence of a CAS phase signals the chip that a row refresh is requested, and thus no data is placed on the external data lines.
- Many chips use CAS-before-RAS to signal a refresh. The chip has an internal counter, and whenever CAS is asserted before RAS, it is a signal to refresh the row pointed to by the counter, and to increment the counter.
- Most DRAM vendors also supply one-chip DRAM controllers that encapsulate the refresh and other functions.
- Page mode, nibble mode, and static column mode allow rapid access to the entire row that has been read into the column latches.
- Video RAMs (VRAMs) clock an entire row into a shift register, where it can be rapidly read out, bit by bit, for display.
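The two refresh styles can be contrasted in a small behavioral model. The class and method names are hypothetical, and the model tracks only which row gets refreshed, not timing or cell contents.

```python
class DramRefreshModel:
    """Behavioral sketch of RAS-only and CAS-before-RAS refresh (hypothetical names)."""

    def __init__(self, rows):
        self.rows = rows
        self.counter = 0       # internal refresh counter used by CAS-before-RAS
        self.refreshed = []    # log of refreshed row numbers

    def ras_only_refresh(self, row):
        # The controller supplies the row address; CAS is never asserted.
        self.refreshed.append(row)

    def cas_before_ras_refresh(self):
        # The chip refreshes the row its own counter points to, then increments.
        self.refreshed.append(self.counter)
        self.counter = (self.counter + 1) % self.rows


dram = DramRefreshModel(rows=4)
for _ in range(5):
    dram.cas_before_ras_refresh()
print(dram.refreshed)  # prints [0, 1, 2, 3, 0]
```

With CAS-before-RAS the controller no longer has to track row addresses, since the chip's counter does it.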

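A 2-D CMOS ROM Chip
[Figure: 2-D CMOS ROM array with a pull-up to +V, an address input (00 shown) to the row decoder, CS, and an example output row of 1 0 1 0.]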

ROM Types
ROM Type            | Cost             | Programmability   | Time to Program | Time to Erase
Mask-programmed ROM | Very inexpensive | At factory only   | Weeks           | N/A
PROM                | Inexpensive      | Once, by end user | Seconds         | N/A
EPROM               | Moderate         | Many times        | Seconds         | 20 minutes
Flash EPROM         | Expensive        | Many times        | 100 µs          | 1 s, large block
EEPROM              | Very expensive   | Many times        | 100 µs          | 10 ms, byte

Memory Boards and Modules
- There is a need for memories that are larger and wider than a single chip.
- Chips can be organized into boards. Boards may not be actual, physical boards, but may consist of structured chip arrays present on the motherboard.
- A board or collection of boards makes up a memory module.
Memory modules:
- Satisfy the processor-main memory interface requirements
- May have DRAM refresh capability
- May expand the total main memory capacity
- May be interleaved to provide faster access to blocks of words

General Structure of a Memory Chip
This is a slightly different view of the memory chip than the previous one.
[Figure: m address lines into a row decoder; memory cell array; I/O multiplexer; s-bit data lines; chip selects CS; R/W.]
Multiple chip selects ease the assembly of chips into chip arrays. The combined chip select is usually provided by an external AND gate.

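Word Assembly from Narrow Chips
All chips have common CS, R/W, and Address lines.
[Figure: p chips side by side, each with Select, Address, R/W, and CS inputs and an s-bit data output; together they form a (p x s)-bit word.]
p chips expand the word size from s bits to p x s bits.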
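Word assembly can be sketched by reading the same address from every chip and concatenating the s-bit slices. The function name is hypothetical, each chip is modeled as a plain list, and placing chip 0 in the least significant position is an assumption.

```python
def read_wide_word(chips, address, s):
    """Assemble a (p * s)-bit word from p chips that are each s bits wide.

    All chips see the same address (common Address, CS, and R/W lines).
    Assumption: chip 0 supplies the least significant s bits.
    """
    mask = (1 << s) - 1
    word = 0
    for position, chip in enumerate(chips):
        word |= (chip[address] & mask) << (position * s)
    return word


# Two 4-bit-wide chips with two words each form an 8-bit-wide memory.
chips = [[0x1, 0x2], [0xA, 0xB]]
print(hex(read_wide_word(chips, 0, s=4)))  # prints 0xa1
print(hex(read_wide_word(chips, 1, s=4)))  # prints 0xb2
```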

Increasing the Number of Words by a Factor of 2^k
The additional k address bits are used to select one of 2^k chips, each of which has 2^m words.
[Figure: an (m+k)-bit address split into m bits sent to every chip and k bits sent to a k-to-2^k decoder whose outputs drive the CS inputs; the chips share an s-bit data bus.]
Word size remains at s bits.
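The chip-select decoding can be sketched as below. The function name is hypothetical, and using the high-order k bits for chip selection is an assumption made for the illustration.

```python
def decode_chip_select(address, m, k):
    """Split an (m + k)-bit address into one-hot chip selects and an m-bit offset.

    Assumption: the k high-order bits feed the k-to-2**k decoder.
    """
    chip = (address >> m) & ((1 << k) - 1)
    offset = address & ((1 << m) - 1)
    chip_selects = [index == chip for index in range(2 ** k)]  # decoder outputs
    return chip_selects, offset


selects, offset = decode_chip_select(0b10_011, m=3, k=2)
print(selects, offset)  # prints [False, False, True, False] 3
```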

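Chip Using 2 Chip Selects
[Figure: an (m+q+k)-bit address; k bits drive a horizontal decoder and q bits drive a vertical decoder, whose outputs connect to the CS1 and CS2 inputs of each chip; m bits go to every chip; R/W is common; the s-bit data lines deliver one of 2^(m+q+k) s-bit words.]
- This scheme simplifies the decoding from use of a (q+k)-bit decoder to using one q-bit and one k-bit decoder.
- Multiple chip select lines are used to replace the last level of gates in this matrix decoder scheme.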

3-Dimensional Dynamic RAM Array
[Figure: the high address bits (k_c + k_r) are split between a 2^k_c decoder, enabled by CAS, and 2^k_r decoders that gate RAS; the multiplexed m/2-bit address, RAS, CAS, and R/W go to every chip; the data width is w.]
- CAS is used to enable the top decoder in the decoder tree.
- Use one 2-D array for each bit.
- Each 2-D array is on a separate board.

A Memory Module and Its Interface
The module must provide:
- Read and Write signals.
- Ready: memory is ready to accept commands.
- Address: to be sent with the Read/Write command.
- Data: sent with Write, or available upon Read when Ready is asserted.
- Module select: needed when there is more than one module.
[Figure: bus interface with a (k+m)-bit address register (k bits to chip/board selection, m bits to the memory), a control signal generator (Module select, Read, Write, Ready), memory boards and/or chips, and a w-bit data register.]
Control signal generator:
- For SRAM, it just strobes data on Read and provides Ready on Read/Write.
- For DRAM, it also provides CAS, RAS, and R/W, multiplexes the address, generates refresh signals, and provides Ready.

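Dynamic RAM Module with Refresh Control
[Figure: a (k+m)-bit address register (k bits to chip/board selection; m bits split into two m/2-bit halves); a refresh clock and control unit with a refresh counter; a multiplexer choosing among the two address halves and the refresh counter to drive the m/2 address lines; a memory timing generator (Module select, Read, Write, Ready; Request/Grant handshake for Refresh) producing board and chip selects, RAS, CAS, and R/W; the dynamic RAM array; and a w-bit data register.]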

Two Kinds of Memory Module Organizations
Memory modules are used to allow access to more than one word simultaneously.
(a) Consecutive words in consecutive modules (interleaving): in the (j + k = m)-bit address, the k least significant bits select one of the 2^k modules (Module 0 to Module 2^k - 1) and the j most significant bits address the word within the module.
(b) Consecutive words in the same module: in the (k + j = m)-bit address, the k most significant bits select the module and the j least significant bits address the word within the module.
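The two address mappings can be compared directly. The function names are hypothetical; each returns a (module, word-within-module) pair for a given address.

```python
def interleaved(address, k):
    """Organization (a): the k least significant bits select the module."""
    module = address & ((1 << k) - 1)
    word = address >> k
    return module, word


def consecutive(address, j):
    """Organization (b): the bits above the j-bit word field select the module."""
    module = address >> j
    word = address & ((1 << j) - 1)
    return module, word


# Four consecutive addresses with 4 modules (k = 2) of 4 words each (j = 2).
print([interleaved(a, k=2)[0] for a in range(4)])  # prints [0, 1, 2, 3]
print([consecutive(a, j=2)[0] for a in range(4)])  # prints [0, 0, 0, 0]
```

With interleaving, a run of consecutive addresses touches every module once, so the accesses can overlap in time.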

Timing Advantage of Interleaving
If the time to transmit information over the bus, t_b, is less than the module cycle time, t_c, it is possible to time-multiplex information transmission to several modules.
Example: store one word of each cache line in a separate module. The main memory address is split into a Word field and a Module No. field, which places successive words in successive modules.
[Timing diagram: on the bus, "Read module 0 address", then "Write module 3 address and data", then "Module 0 data return"; meanwhile Module 0 performs its read and Module 3 performs its write, each taking t_c, while each bus transfer takes t_b.]
With interleaving of 2^k modules, and t_b < t_c / 2^k, it is possible to get a 2^k-fold increase in memory bandwidth, provided memory requests are pipelined. DMA satisfies this requirement.