CENG3420 Lecture 08: Memory Organization

CENG3420 Lecture 08: Memory Organization Bei Yu byu@cse.cuhk.edu.hk (Latest update: February 22, 2018) Spring 2018 1 / 48

Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory Conclusion 2 / 48

Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory Conclusion 3 / 48

Review: Major Components of a Computer (block diagram): the Processor (Control and Datapath) connects to Memory (Cache, Main Memory, Secondary Memory (Disk)) and to Devices (Input and Output). 3 / 48

Why Do We Need Memory? A combinational circuit always gives the same output for a given set of inputs (e.g., adders). A sequential circuit stores information, and its output depends on the stored information (e.g., a counter), so it needs a storage element. 4 / 48

Who Cares About the Memory Hierarchy? (Plot: performance vs. time, 1980-2000.) Processor performance grows about 60%/yr (2x per 1.5 years), following Moore's Law, while DRAM performance grows about 9%/yr (2x per 10 years); the resulting processor-DRAM performance gap grows about 50% per year. 5 / 48

Memory System Revisited The maximum size of memory is determined by the addressing scheme; e.g., 16-bit addresses can only address 2^16 = 65536 memory locations. Most machines are byte-addressable: each memory address refers to a byte. Most machines retrieve/store data in words. Common abbreviations: 1k = 2^10 (kilo), 1M = 2^20 (Mega), 1G = 2^30 (Giga), 1T = 2^40 (Tera). 8 / 48
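A minimal sketch of the addressing arithmetic above; the widths other than 16 bits are there only to reproduce the k/M/G/T abbreviations:

```c
#include <stdio.h>
#include <stdint.h>

/* A k-bit address can name 2^k byte locations; k = 16 reproduces the
   slide's 2^16 = 65536 example, and k = 10/20/30/40 give 1k/1M/1G/1T. */
int main(void) {
    int widths[] = {10, 16, 20, 30, 40};
    for (int i = 0; i < 5; i++) {
        uint64_t locations = 1ULL << widths[i];   /* 2^k addressable locations */
        printf("k = %2d bits -> %llu locations\n",
               widths[i], (unsigned long long)locations);
    }
    return 0;
}
```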

Simplified View Data transfer between processor and memory takes place through the MAR (memory address register) and MDR (memory data register), over a k-bit address bus, an n-bit data bus, and control lines (R/W, MFC, etc.). The memory has up to 2^k addressable locations and a word length of n bits; several addressable locations (bytes) are grouped into a word. 9 / 48

Big Picture Processor usually runs much faster than main memory: small memories are fast, large memories are slow. Use a cache memory in the processor to store data that is likely to be used. Main memory is limited: use virtual memory to increase the apparent size of physical memory by automatically moving unused sections of memory to disk. The translation between virtual and physical addresses is done by a memory management unit (MMU), to be discussed in later lectures. 10 / 48

Characteristics of the Memory Hierarchy Access time increases with distance from the processor. Transfer units grow down the hierarchy: processor to L1$ in 4-8 bytes (a word), L1$ to L2$ in 8-32 bytes (a block), L2$ to main memory in 1 to 4 blocks, and main memory to secondary memory in 1,024+ bytes (a disk sector = page). The hierarchy is inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory, which is a subset of what is in secondary memory. (The figure also shows the relative size of the memory at each level.) 11 / 48

Memory Hierarchy: Why Does it Work? Temporal Locality (locality in time): if a memory location is referenced, it will tend to be referenced again soon; keep the most recently accessed data items closer to the processor. Spatial Locality (locality in space): if a memory location is referenced, the locations with nearby addresses will tend to be referenced soon; move blocks consisting of contiguous words closer to the processor. 12 / 48

Memory Hierarchy Taking advantage of the principle of locality: present the user with as much memory as is available in the cheapest technology, and provide access at the speed offered by the fastest technology. Levels: processor (datapath, control) registers, on-chip cache, second-level cache (SRAM), main memory (DRAM), secondary storage (disk), tertiary storage (tape). Speeds range from ~1 ns at the registers, through tens and hundreds of ns for the caches and main memory (up to ~1 µs), to tens of ms for disk and tens of seconds for tape; sizes grow from hundreds of bytes at the top to megabytes, gigabytes, and terabytes further down. 13 / 48

Terminology Random Access Memory (RAM) Property: comparable access time for any memory location. Block (or line): the minimum unit of information that is present (or not) in a cache. 14 / 48

Terminology Hit Rate: the fraction of memory accesses found in a level of the memory hierarchy. Miss Rate: the fraction of memory accesses not found in a level of the memory hierarchy, i.e., 1 - (Hit Rate). Hit Time: time to access the block + time to determine hit/miss. Miss Penalty: time to replace a block in that level with the corresponding block from a lower level. Hit Time << Miss Penalty. 15 / 48
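These terms are commonly combined into the average memory access time, AMAT = hit time + miss rate x miss penalty; the formula is not stated on the slide, and the numbers below are made-up illustrative values, so treat this as a minimal sketch:

```c
#include <stdio.h>

/* AMAT = hit_time + miss_rate * miss_penalty.
   The values below are illustrative, not course data. */
int main(void) {
    double hit_time     = 1.0;    /* cycles to access this level on a hit */
    double miss_rate    = 0.05;   /* fraction of accesses that miss       */
    double miss_penalty = 100.0;  /* cycles to fetch the block from below */

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);   /* 1 + 0.05 * 100 = 6 cycles */
    return 0;
}
```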

Bandwidth vs. Latency Example: Mary acts FAST but she's always LATE; Peter is always PUNCTUAL but he is SLOW. Bandwidth: the number of bits/bytes per second when transferring a block of data steadily. Latency: the amount of time to transfer the first word of a block after issuing the access signal; usually measured in number of clock cycles or in ns/µs. 16 / 48

Question: Suppose the clock rate is 500 MHz. What is the latency and what is the bandwidth, assuming that each data item is 64 bits? (Timing diagram: Clock, Row Access Strobe, Data d0 d1 d2.) 17 / 48
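A sketch for working this question; the exact cycle counts come from the timing diagram, which is not reproduced in this transcript, so cycles_to_first_word and cycles_per_word below are placeholders to be read off the figure:

```c
#include <stdio.h>

/* Latency = cycles until the first word arrives, converted to ns at 500 MHz.
   Bandwidth = bytes transferred per unit time once data flows steadily.
   The two cycle counts are placeholders, not values from the lecture. */
int main(void) {
    double clock_mhz    = 500.0;                 /* given                         */
    double ns_per_cycle = 1000.0 / clock_mhz;    /* 2 ns per cycle                */

    int cycles_to_first_word = 5;                /* placeholder: read from figure */
    int cycles_per_word      = 1;                /* placeholder: read from figure */
    int bits_per_word        = 64;               /* given                         */

    double latency_ns = cycles_to_first_word * ns_per_cycle;
    double mbytes_per_s =
        (bits_per_word / 8.0) / (cycles_per_word * ns_per_cycle) * 1000.0;
        /* bytes per ns equals GB/s; multiplying by 1000 gives MB/s */

    printf("latency   = %.1f ns\n", latency_ns);
    printf("bandwidth = %.1f MB/s\n", mbytes_per_s);
    return 0;
}
```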

Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory Conclusion 18 / 48

Storage based on Feedback What if we add feedback to a pair of inverters? Usually drawn as a ring of cross-coupled inverters; a stable way to store one bit of information (while powered). 18 / 48

How to change the value stored? Replace each inverter with a NOR gate: the SR latch. 19 / 48

QUESTION: What is the Q value for the different R, S inputs? R=S=1: ___; S=0, R=1: ___; S=1, R=0: ___; R=S=0: ___. 20 / 48
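A minimal sketch of the NOR-based SR latch from the previous slide, useful for checking the four cases; it is an illustration for working the question, not the slide's answer key:

```c
#include <stdio.h>

/* NOR-based SR latch: Q = NOR(R, Qbar), Qbar = NOR(S, Q).
   Iterating a few times lets the cross-coupled outputs settle. */
static void settle(int S, int R, int *Q, int *Qbar) {
    for (int i = 0; i < 4; i++) {
        *Q    = !(R | *Qbar);
        *Qbar = !(S | *Q);
    }
}

int main(void) {
    int cases[4][2] = { {1, 1}, {0, 1}, {1, 0}, {0, 0} };  /* {S, R}, slide order */
    int Q = 1, Qbar = 0;                                   /* arbitrary initial state */
    for (int i = 0; i < 4; i++) {
        settle(cases[i][0], cases[i][1], &Q, &Qbar);
        printf("S=%d R=%d -> Q=%d Qbar=%d\n", cases[i][0], cases[i][1], Q, Qbar);
    }
    return 0;
}
```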

SRAM Cell At least 6 transistors (6T); used in most commercial chips. Data is stored in a pair of weak cross-coupled inverters; the cell connects to a word line and a pair of bit lines (bit, bit_b). 21 / 48

Classical SRAM Organization A row decoder, driven by the row address, selects one word (row) line of the RAM cell array; each intersection of a word line and the bit (data) lines is a 6-T SRAM cell. A column selector and I/O circuits, driven by the column address, deliver the requested data bit or word. One memory row holds a block of data, so the column address selects the requested bit or word from that block. 22 / 48

Classical SRAM Organization (latch-based memory) The array is built from gated D latches. A write address decoder, driven by the n-bit memory write address, activates one write word line, and the m-bit Memory Data In is driven onto the write bit lines under control of WE; a read address decoder, driven by the n-bit memory read address, activates one read word line, and the selected row appears on the read bit lines as the m-bit Memory Data Out. 23 / 48

DRAM Cell 1 transistor (1T) plus an extra capacitor, which requires modifications to the manufacturing process; gives higher density. Write: charge or discharge the capacitor (slow). Read: charge redistribution takes place between the bit line and the storage capacitance. 24 / 48

Classical DRAM Organization A row decoder, driven by the row address, selects one word (row) line of the RAM cell array; each intersection of a word line and a bit (data) line is a 1-T DRAM cell. A column selector and I/O circuits, driven by the column address, select the requested data bit from the row in each plane. 25 / 48

Synchronous DRAM (SDRAM) The common type used today, as it uses a clock to synchronize the operation. The refresh operation becomes transparent to the users, and all control signals needed are generated inside the chip. The initial commercial SDRAMs in the 1990s were designed for clock speeds of up to 133 MHz; today's SDRAM chips operate with clock speeds exceeding 1 GHz. Memory modules are used to hold several SDRAM chips and are the standard type used on a computer's motherboard, with sizes of 4 GB or more. 26 / 48

Double Data Rate (DDR) SDRAM Normal SDRAM transfers data only once per clock cycle; Double Data Rate (DDR) SDRAM transfers data on both clock edges. DDR-2 (4x the basic memory clock) and DDR-3 (8x the basic memory clock) are on the market; they offer increased storage capacity, lower power, and faster clock speeds. For example, DDR2 can operate at clock frequencies of 400 and 800 MHz, and can therefore transfer data at effective clock speeds of 800 and 1600 MHz. 27 / 48

Performance of SDRAM (1 Hertz = 1 cycle per second)

RAM Type                          Theoretical Maximum Bandwidth
SDRAM 100 MHz (PC100)             100 MHz x 64 bits/cycle = 800 MByte/sec
SDRAM 133 MHz (PC133)             133 MHz x 64 bits/cycle = 1064 MByte/sec
DDR SDRAM 200 MHz (PC1600)        2 x 100 MHz x 64 bits/cycle ~= 1600 MByte/sec
DDR SDRAM 266 MHz (PC2100)        2 x 133 MHz x 64 bits/cycle ~= 2100 MByte/sec
DDR SDRAM 333 MHz (PC2600)        2 x 166 MHz x 64 bits/cycle ~= 2600 MByte/sec
DDR-2 SDRAM 667 MHz (PC2-5400)    2 x 2 x 166 MHz x 64 bits/cycle ~= 5400 MByte/sec
DDR-2 SDRAM 800 MHz (PC2-6400)    2 x 2 x 200 MHz x 64 bits/cycle ~= 6400 MByte/sec

This is a bandwidth comparison; due to latencies, however, SDRAM does not perform as well as the figures shown. 28 / 48
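A small sketch of the arithmetic behind the table: peak bandwidth is the number of transfers per clock times the bus clock times the 8-byte (64-bit) bus width. The three rows chosen reproduce the PC100, PC1600, and PC2-6400 entries:

```c
#include <stdio.h>

/* Peak bandwidth = transfers_per_clock * bus_MHz * (64-bit bus / 8 bits per byte).
   Real modules deliver less than this because of latencies, as the slide notes. */
int main(void) {
    struct { const char *name; int transfers_per_clock; int bus_mhz; } parts[] = {
        { "SDRAM PC100",    1, 100 },   /* one transfer per clock     */
        { "DDR PC1600",     2, 100 },   /* transfers on both edges    */
        { "DDR-2 PC2-6400", 4, 200 },   /* 2 x 2 effective multiplier */
    };
    for (int i = 0; i < 3; i++) {
        int mbyte_per_s = parts[i].transfers_per_clock * parts[i].bus_mhz * (64 / 8);
        printf("%-16s %5d MByte/sec\n", parts[i].name, mbyte_per_s);
    }
    return 0;
}
```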

SRAM vs. DRAM Static RAM (SRAM) is capable of retaining its state as long as power is applied. SRAMs are fast and low power (current flows only when accessing the cells) but costly (each cell requires several transistors), so the capacity is small; they are used for the Level 1 and Level 2 caches inside a processor, with sizes of 3 MB or more. Dynamic RAM (DRAM) stores data as electric charge on a capacitor; the charge leaks away with time, so DRAMs must be refreshed. In return for this trouble, DRAM offers much higher density (simpler cells). 29 / 48

Memory Hierarchy Aim: to produce fast, big, and cheap memory. The hierarchy runs from the processor registers through the primary (L1) cache, secondary (L2) cache, and main memory down to magnetic-disk secondary memory; latency and size increase going down, speed and cost per bit increase going up. The L1 and L2 caches are usually SRAM, main memory is DRAM, and the scheme relies on locality of reference. 30 / 48

Mix-and-Match: Best of Both By taking advantage of the principle of locality: present the user with as much memory as is available in the cheapest technology, and provide access at the speed offered by the fastest technology. DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system (main memory). SRAM is fast but expensive and not very dense: a good choice for providing the user with FAST access time (L1 and L2 cache). 31 / 48

Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory Conclusion 32 / 48

Memory Controller A memory controller is normally used to interface between the memory and the processor. DRAMs have a slightly more complex interface, as they need refreshing and they usually have time-multiplexed signals to reduce the pin count; SRAM interfaces are simpler and may not need a memory controller. (Figure: the processor sends the Address, R/W, Request, and Clock to the memory controller, which drives the memory with the Row/Column address, RAS, CAS, R/W, CS, and Clock; RAS (CAS) = Row (Column) Address Strobe, CS = Chip Select.) 32 / 48

Memory Controller The memory controller accepts a complete address and the R/W signal from the processor and generates the RAS (Row Address Strobe) and CAS (Column Address Strobe) signals. The high-order address bits, which select a row in the cell array, are provided first under the control of the RAS signal; then the low-order address bits, which select a column, are provided on the same address pins under the control of the CAS signal. The right memory module is selected based on the address. Data lines are connected directly between the processor and the memory. SDRAM needs refresh, but the refresh overhead is less than 1 percent of the total time available to access the memory. 33 / 48
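A sketch of the row/column address split described above; the 12-bit row / 10-bit column split is an illustrative assumption, not a parameter given in the lecture:

```c
#include <stdio.h>

/* High-order bits form the row address (sent with RAS); the low-order bits are
   then sent on the same address pins as the column address (sent with CAS).
   ROW_BITS and COL_BITS are assumed values for illustration only. */
#define ROW_BITS 12
#define COL_BITS 10

int main(void) {
    unsigned int addr = 0x2A5F3u;                                    /* example address */
    unsigned int col  = addr & ((1u << COL_BITS) - 1);               /* low-order bits  */
    unsigned int row  = (addr >> COL_BITS) & ((1u << ROW_BITS) - 1); /* high-order bits */

    printf("address 0x%05X -> row 0x%03X (with RAS), column 0x%03X (with CAS)\n",
           addr, row, col);
    return 0;
}
```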

Memory Module Interleaving Processor and cache are fast, main memory is slow. Try to hide access latency by interleaving memory accesses across several memory modules. Each memory module has its own Address Buffer Register (ABR) and Data Buffer Register (DBR). Which scheme below can be better interleaved? (a) The high-order k bits of the memory address select the module and the m low-order bits give the address within the module, so consecutive words lie in the same module. (b) The low-order k bits select the module and the m high-order bits give the address within the module, so consecutive words lie in different modules. 34 / 48
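A minimal sketch of the two address mappings, with assumed field widths k = 2 (4 modules) and m = 4 (16 words per module); these sizes are illustrative, not from the slide:

```c
#include <stdio.h>

/* Scheme (a): module number = high-order k bits -> consecutive words, same module.
   Scheme (b): module number = low-order  k bits -> consecutive words, different modules. */
#define K 2   /* module-select bits (assumed)     */
#define M 4   /* address-in-module bits (assumed) */

int main(void) {
    for (unsigned int word = 0; word < 8; word++) {
        unsigned int mod_a = word >> M;                /* scheme (a) */
        unsigned int mod_b = word & ((1u << K) - 1);   /* scheme (b) */
        printf("word %u -> scheme (a): module %u, scheme (b): module %u\n",
               word, mod_a, mod_b);
    }
    return 0;
}
```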

Memory Module Interleaving Two or more compatible (ideally identical) memory modules are used. Within a memory module, several chips are used in parallel; e.g., with 8 modules and 8 chips in parallel within each module, we achieve an 8 x 8 = 64-bit memory bus. Memory interleaving can be realized in technologies such as the Dual Channel Memory Architecture. (Figure: the classical DRAM organization shown earlier.) 35 / 48

Non-Interleaving vs. Interleaving (a) Non-interleaving: the CPU with its on-chip cache connects over the bus to a single DRAM memory. (b) Interleaving: the CPU with its on-chip cache connects over the bus to four DRAM memory banks (bank 0 to bank 3). 36 / 48

Example Suppose we have a cache read miss and need to load from main memory. Assume a cache with an 8-word block, i.e., cache line size = 8 words (bytes). Assume it takes one clock to send the address to the DRAM memory and one clock to send data back; in addition, the DRAM has a 6-cycle latency for the first word, while each subsequent word in the same row takes only 4 cycles. Single memory read: 1 + 6 + 1 = 8 cycles. 37 / 48

Example: Non-Interleaving The first word needs 6 DRAM cycles (same as a single memory read); each subsequent word needs 4 DRAM cycles, with no overlap between the cache accesses (assumption: all words are in the same row). (Timing: 1, 6, 1 followed by repeated 1, 4, 1.) Non-interleaving cycle count: 1 + 1 x 6 + 7 x 4 + 1 = 36. 38 / 48

Example: Four-Module Interleaving The four modules are accessed in an overlapped fashion: each module shows the 1, 6, 1 pattern for its first word and 1, 4, 1 for its second, staggered in time. Interleaving cycle count: 1 + 6 + 1 x 8 = 15. 39 / 48
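A sketch of the cycle arithmetic from these two slides, under the slides' assumptions (1 cycle to send the address, 6-cycle DRAM latency for the first word, 4 cycles for later words in the same row, 1 cycle per data transfer, 8-word block); the same style of calculation can be adapted to the two-module question that follows:

```c
#include <stdio.h>

/* Reproduces the two totals given on the slides. */
int main(void) {
    int words = 8;

    /* Non-interleaved: 1 + 1*6 + 7*4 + 1 = 36 cycles. */
    int non_interleaved = 1 + 1 * 6 + (words - 1) * 4 + 1;

    /* Four-module interleaved: overlapped latencies, then one transfer per word:
       1 + 6 + 1*8 = 15 cycles. */
    int interleaved_4 = 1 + 6 + 1 * words;

    printf("non-interleaved:        %d cycles\n", non_interleaved);  /* 36 */
    printf("four-module interleave: %d cycles\n", interleaved_4);    /* 15 */
    return 0;
}
```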

Question: To transfer 8 bytes, what is the cycle count if we have only TWO-module interleaving? 40 / 48

Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory Conclusion 41 / 48

Major Components of a Computer (block diagram): the Processor (Control and Datapath) connects to Memory (Cache, Main Memory, Secondary Memory (Disk)) and to Devices (Input and Output). 41 / 48

Magnetic Disk Long-term, nonvolatile storage; the lowest-level memory: slow, large, inexpensive. A rotating platter coated with a magnetic surface, and a movable read/write head to access the information. (Figure labels: platter, head, track, sector, cylinder, controller + cache.) 42 / 48

Magnetic Disk (cont.) Latency: average seek time plus the rotational latency. Bandwidth: peak transfer rate of formatted data from the media (not from the cache). (Plot: bandwidth (MB/s) and latency (msec) vs. year of introduction.) In the time the bandwidth doubles, latency improves by a factor of only around 1.2. 43 / 48
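A small sketch of the latency definition above, using made-up drive parameters rather than figures from the lecture:

```c
#include <stdio.h>

/* Disk latency = average seek time + rotational latency.
   On average the head waits half a revolution for the target sector.
   The spindle speed and seek time below are assumed, illustrative values. */
int main(void) {
    double rpm         = 7200.0;   /* assumed spindle speed     */
    double avg_seek_ms = 9.0;      /* assumed average seek time */

    double rotational_ms = 0.5 * (60.0 * 1000.0 / rpm);   /* half a revolution, in ms */
    printf("latency = %.1f ms seek + %.1f ms rotation = %.1f ms\n",
           avg_seek_ms, rotational_ms, avg_seek_ms + rotational_ms);
    return 0;
}
```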

Read-Only Memory (ROM) Memory whose content is fixed and cannot be changed easily. Useful to bootstrap a computer, since RAM is volatile (its contents are lost when power is removed): a small program stored in such a memory is used to start the process of loading the OS from a hard disk into main memory. Variants: PROM/EPROM/EEPROM. 44 / 48

Flash Storage The first credible challenger to disks: nonvolatile, and 100x to 1000x faster than disks. Wear leveling is used to overcome the wear-out problem. 45 / 48

FLASH Memory Flash devices have greater density, higher capacity, and lower cost per bit. They can be read and written, and are normally used for non-volatile storage. Typical applications include cell phones, digital cameras, MP3 players, etc. 46 / 48

FLASH Cards Flash cards are made from FLASH chips. Flash cards with a standard interface are usable in a variety of products; flash cards with a USB interface are widely used as memory keys. Larger cards may hold 32 GB. A minute of music can be stored in about 1 MB of memory, hence 32 GB can hold roughly 500 hours of music. 47 / 48
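A quick check of that arithmetic (treating 1 GB as 1024 MB):

```c
#include <stdio.h>

/* 32 GB at roughly 1 MB per minute of music. */
int main(void) {
    double capacity_mb = 32.0 * 1024.0;       /* 32 GB in MB      */
    double minutes     = capacity_mb / 1.0;   /* ~1 MB per minute */
    printf("%.0f minutes, i.e. about %.0f hours of music\n",
           minutes, minutes / 60.0);          /* ~546 h, roughly the 500 hours quoted */
    return 0;
}
```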

Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory Conclusion 48 / 48

Conclusion Processor usually runs much faster than main memory. Common RAM types: SRAM, DRAM, SDRAM, DDR SDRAM. Principle of locality: temporal and spatial. Present the user with as much memory as is available in the cheapest technology, and provide access at the speed offered by the fastest technology. Memory hierarchy: Register → Cache → Main Memory → Disk → Tape. 48 / 48