Lecture 2: Memory Systems


Zebo Peng, IDA, LiTH

Outline:
- Basic components
- Memory hierarchy
- Cache memory
- Virtual memory

Many Different Technologies

[Figure slide: overview of the many different memory technologies; details not transcribed.]

Internal and External Memories

[Figure: the CPU exchanges data and control signals with the main memory, which in turn exchanges data and control signals with the secondary memory.]

Main Memory Model

[Figure: the main memory is an array of words (8, 16, 32, or 64 bits each). A memory control unit performs address selection and read/write control; the address is supplied by the MAR (in the CPU) and data passes through the MBR (in the CPU).]

Memory Characteristics

The most important characteristics of a memory are:
- speed: as fast as possible;
- size: as large as possible;
- cost: a reasonable price.
They are determined by the technology used for implementation. (Analogy: your personal library.)

Memory Access Bottleneck

[Figure: the single connection between the CPU and the memory forms a bottleneck.]

A quantitative measure of the capacity of this bottleneck is the memory bandwidth.

Memory Bandwidth

Memory bandwidth denotes the amount of data that can be accessed from a memory per second:

    M-Bandwidth = amount of data per access / memory cycle time

Ex. With a memory cycle time of 100 ns and 4 bytes (a word) per access:

    M-Bandwidth = 4 bytes / 100 ns = 40 megabytes per second.

There are two basic techniques to increase the bandwidth of a given memory:
- Reduce the memory cycle time: expensive, and limits the memory size.
- Divide the memory into several banks, each of which has its own control unit (using parallelism).

Memory Banks

[Figure: interleaved placement of program and data across four banks, each with its own control unit connected to the CPU; word i is placed in bank i mod 4, so consecutive words can be accessed in parallel.]
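As a quick check of the numbers, here is a minimal Python sketch; the 4-byte access and 100 ns cycle time are the example values above, and the 4-bank peak figure assumes perfectly interleaved, parallel accesses:

```python
# Memory bandwidth from the lecture example.
BYTES_PER_ACCESS = 4          # one word per access
CYCLE_TIME_NS = 100           # memory cycle time in nanoseconds

# M-Bandwidth = amount of data per access / memory cycle time
bandwidth = BYTES_PER_ACCESS / (CYCLE_TIME_NS * 1e-9)   # bytes per second
print(f"single bank: {bandwidth / 1e6:.0f} MB/s")       # 40 MB/s

# With N interleaved banks, up to N accesses proceed in parallel,
# so the peak bandwidth scales with the number of banks.
NUM_BANKS = 4
print(f"{NUM_BANKS} banks (peak): {NUM_BANKS * bandwidth / 1e6:.0f} MB/s")
```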

Motivation

What do we need? A memory that stores very large programs/data and works at a speed comparable to that of the CPU. The reality is:
- The larger a memory, the slower it will be.
- The faster a memory, the greater the cost per bit.
A solution: build a composite memory system which combines a small, fast memory with a large, slow memory, and which behaves, most of the time, like a large and fast memory. This two-level principle can be extended to a hierarchy of many levels.

Memory Hierarchy

[Figure: the memory hierarchy, with access time and capacity examples per level.]

- CPU registers: 0.25 ns access time, 1 KB capacity.
- Cache: 1 ns, 4 MB.
- Main memory: 8 ns, 8 GB.
- Secondary memory of direct access type (e.g., disk): 2 ms (for 4 KB), 2 TB.
- Secondary memory of archive type (e.g., tape): 5 s (for 8 KB), 100 M per tape.

As one goes down the hierarchy, the following occur:
- Decreasing cost per bit.
- Increasing capacity.
- Increasing access time.
- Decreasing frequency of access by the CPU.

Mismatch of CPU and MM Speeds

[Chart: cycle time in nanoseconds, on a logarithmic scale from 1 to 10^4, plotted over the years 1955-2015 for CPUs and main memories. Both curves fall steadily, but a speed gap of roughly one order of magnitude (i.e., ca. 10 times) remains between them.]

Cache Memory

[Figure: the CPU sends addresses to the cache; on a cache hit, the cache delivers the instructions and data directly. Otherwise the addresses go on to the main memory, which supplies the instructions and data.]

A cache is a very fast memory which is placed between the main memory and the CPU, and which is used to hold segments of program and data of the main memory.

Zebo's Cache Memory Model

[Figure: a personal library for a high-speed reader; the cache consists of storage cells plus a memory controller.]

A computer is a predictable and iterative reader. A high cache hit ratio, e.g., 96%, is achievable even with a relatively small cache (e.g., 0.1% of the memory size).

Cache Memory Features

- It is transparent to the programmers.
- Only a very small part of the program/data in the main memory has its copy in the cache (e.g., a 4 MB cache with an 8 GB main memory).
- If the CPU wants to access program/data not in the cache (called a cache miss), the relevant block of the main memory is copied into the cache.
- Memory accesses in the near future will then usually refer to the same word, or to words in its neighborhood, and will not have to involve the main memory. This property of program executions is denoted as locality of reference.

Locality of Reference

Programs access a small proportion of their address space during any short period of time.
- Temporal locality: if an item is accessed, it will tend to be accessed again soon.
- Spatial locality: if an item is accessed, items whose addresses are close by will tend to be accessed soon.
This access pattern is an intrinsic feature of the von Neumann architecture:
- Sequential instruction storage and execution.
- Loops and iterations (e.g., subroutine calls).
- Sequential data storage (e.g., arrays).
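A small, hypothetical Python loop (not from the slides) makes both kinds of locality concrete: the running sum is touched on every iteration (temporal locality), and the array elements are visited in consecutive address order (spatial locality), so after each block has been fetched, the following accesses hit in the cache:

```python
# Hypothetical illustration of locality of reference.
data = list(range(1_000_000))

total = 0
for x in data:       # spatial locality: consecutive elements, neighboring addresses
    total += x       # temporal locality: 'total' is reused on every iteration
```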

Layered Memory Performance

Average access time:

    AAT = P_hit * T_cache_access
        + (1 - P_hit) * (T_mm_access + T_cache_access) * Block_size
        + T_checking

where
- P_hit = the probability of a cache hit (the cache hit ratio);
- T_cache_access = the cache access time;
- T_mm_access = the main memory access time;
- Block_size = the number of words in a cache block; and
- T_checking = the time needed to check for cache hit or miss.

Ex. A computer has an 8 MB main memory with 100 ns access time and an 8 KB cache with 10 ns access time. With Block_size = 4, T_checking = 2.1 ns, and P_hit = 0.97, the AAT will be 25 ns (checked in the sketch after the next list).

Cache Design

The size and nature of the copied block must be carefully designed, as well as the algorithm that decides which block to remove from the cache when it is full:
- Cache block size (line size).
- Total cache size.
- Mapping function.
- Replacement method.
- Write policy.
- Number of caches: single, two-level, or three-level cache; unified vs. split cache.
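Plugging the example values above into the AAT formula (a minimal Python sketch; all numbers come from the example):

```python
# Average access time for the layered-memory example above.
P_HIT = 0.97       # cache hit ratio
T_CACHE = 10       # cache access time, ns
T_MM = 100         # main memory access time, ns
BLOCK_SIZE = 4     # words per cache block
T_CHECK = 2.1      # time to check for hit or miss, ns

# On a miss the whole block is fetched: Block_size transfers, each costing
# one main-memory access plus one cache access.
aat = P_HIT * T_CACHE + (1 - P_HIT) * (T_MM + T_CACHE) * BLOCK_SIZE + T_CHECK
print(f"AAT = {aat:.1f} ns")   # AAT = 25.0 ns
```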

Split Data and Instruction Caches?

Split caches (Harvard architectures):
+ Competition for the cache between the instruction-processing and execution units is eliminated.
+ Instruction fetch can proceed in parallel with memory access from the CPU for operands.
- One cache may be overloaded while the other is underutilized.

Unified caches:
+ Better balance of the load between instruction and data fetches, depending on the dynamics of the program execution.
+ Design and implementation are cheaper.
- Lower performance.

Direct Mapping Cache

Direct mapping: each block of the main memory is mapped into a fixed cache slot.

[Figure: main-memory blocks and the cache's storage cells with its memory controller; every memory block has exactly one cache slot it can occupy.]

Direct Mapping Cache Example

We have a 10,000-word main memory and a 100-word cache; every 10 memory cells are grouped into a block. A (decimal) memory address then consists of three fields:

    Memory address = Tag (2 digits) | Slot (1 digit) | Word (1 digit)

[Figure: the 10,000-word memory holds blocks 0000-0009, 0010-0019, ..., 9990-9999; the 100-word cache holds ten slots 00-09, 10-19, ..., 90-99. A block is placed in the slot given by its slot digit, and the two leading digits are stored as the tag.]

Direct Mapping Pros and Cons

+ Simple to implement and therefore inexpensive.
+ Very fast checking time for cache hit or miss.
- Fixed location for each block: if a program repeatedly accesses two blocks that map to the same cache slot, the cache miss rate is very high.
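A minimal sketch of this decimal direct-mapped lookup (hypothetical helper code, not from the slides):

```python
# Direct-mapped lookup for the example above:
# 10,000-word memory, 100-word cache, 10-word blocks.
NUM_SLOTS = 10                  # 100-word cache / 10 words per block
tags = [None] * NUM_SLOTS       # the tag currently stored in each slot

def split_address(addr):
    """Split a 4-digit decimal address into (tag, slot, word)."""
    word = addr % 10            # last digit: word within the block
    slot = (addr // 10) % 10    # middle digit: cache slot
    tag = addr // 100           # two leading digits: tag
    return tag, slot, word

def access(addr):
    """Return True on a cache hit; on a miss, load the block into its slot."""
    tag, slot, _ = split_address(addr)
    if tags[slot] == tag:
        return True             # hit: the slot holds this block
    tags[slot] = tag            # miss: copy the block in, replacing the old one
    return False

print(access(1234))   # False: cold miss, block 1230-1239 fills slot 3
print(access(1236))   # True: same block (spatial locality)
print(access(4534))   # False: block 4530-4539 also maps to slot 3
```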

Associative Mapping

A main memory block can be loaded into any slot of the cache. To determine whether a given block is in the cache, a mechanism is needed to simultaneously examine the tag of every slot. For the same example, the memory address now has two fields:

    Memory address = Tag (3 digits) | Word (1 digit)

[Figure: associative memory example; each slot of the 100-word cache stores a 3-digit tag that is compared in parallel against the tag field of the address.]

Fully Associative Organization

[Figure slide: fully associative cache organization; details not transcribed.]

Set Associative Organization

- The cache is divided into a number of sets (K). Each set contains a number of slots (W).
- A given block maps to any slot in a given set, i.e., block i can be in any slot of set j.
- For example, with 2 slots per set (W = 2) we have 2-way associative mapping: a given block can be in one of 2 slots.
- Direct mapping: W = 1 (no alternative).
- Fully associative mapping: K = 1 (W = the total number of slots in the cache; all mappings are possible).
- W is the most important parameter (typically 2-16).

Replacement Algorithms

With direct mapping, no replacement algorithm is needed. With associative mapping, a replacement algorithm is needed to determine which block to replace:
- First-in-first-out (FIFO).
- Least recently used (LRU): replace the block that has been in the cache longest with no reference to it.
- Least frequently used (LFU): replace the block that has experienced the fewest references.
- Random.
[Figure: each cache slot stores usage information alongside its tag to support the replacement algorithm.]
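The sketch below shows a W-way set-associative cache with LRU replacement (a hypothetical Python illustration, assuming K = 4 sets and W = 2 slots per set):

```python
# W-way set-associative cache with LRU replacement (illustrative sketch).
from collections import OrderedDict

NUM_SETS = 4   # K
WAYS = 2       # W

# One OrderedDict per set, mapping tag -> block data, ordered LRU -> MRU.
sets = [OrderedDict() for _ in range(NUM_SETS)]

def access(block):
    """Return True on a hit; on a miss, load the block, evicting the LRU slot."""
    index = block % NUM_SETS        # which set the block maps to
    tag = block // NUM_SETS
    s = sets[index]
    if tag in s:
        s.move_to_end(tag)          # hit: mark as most recently used
        return True
    if len(s) >= WAYS:
        s.popitem(last=False)       # set full: evict the least recently used
    s[tag] = None                   # load the new block
    return False

# Blocks 0, 4, and 8 all map to set 0; only two of them fit at a time.
for b in [0, 4, 0, 8, 4, 0]:
    print(b, "hit" if access(b) else "miss")
```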

Write Policy

The problem: how do we keep the cache content and the main memory content consistent, without losing too much performance?

Write through: all write operations are passed to the main memory.
- If the addressed location is currently in the cache, the cache is updated so that it is coherent with the main memory.
- For writes, the processor always slows down to main memory speed.
- Since the percentage of writes is small (ca. 15%), this scheme does not lead to a large performance reduction.

Write through with buffered write: the same as write through, but instead of slowing the processor down by writing directly to the main memory, the write address and data are stored in a high-speed write buffer, which transfers the data to the main memory while the processor continues its task.
- Higher speed, but more complex hardware.

Write back: write operations update only the cache memory, which is not kept coherent with the main memory. When a slot is replaced from the cache, its content has to be copied back to the main memory.
- Good performance (usually several writes are performed on a cache block before it is replaced), but more complex hardware is needed.

Cache coherence problems are very complex and difficult to solve in multiprocessor systems (to be discussed later)!
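A minimal write-back sketch (hypothetical illustration, one cache line): each line carries a dirty bit, and a modified line is copied back to the main memory only when it is evicted:

```python
# Write-back policy with a dirty bit (illustrative, one cache line).
line = {"tag": None, "data": None, "dirty": False}
main_memory = {}

def write(tag, value):
    """Writes update only the cache; the line is marked dirty."""
    line.update(tag=tag, data=value, dirty=True)

def evict():
    """On replacement, a dirty line must first be copied back to memory."""
    if line["dirty"]:
        main_memory[line["tag"]] = line["data"]
    line.update(tag=None, data=None, dirty=False)

write(7, "A")        # several writes hit only the fast cache...
write(7, "B")
evict()              # ...and a single write-back updates the main memory
print(main_memory)   # {7: 'B'}
```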

Cache Architecture Examples

Intel Pentium (introduced 1993):
- Two on-chip caches, one for data and one for instructions.
- Each cache: 8 KB.
- Line size: 32 bytes.
- 2-way set associative organization.

AMD Opteron 140 (introduced 2003):
- Two L1 caches, one for instructions and one for data; 64 KB each.
- 2-way associative organization.
- L2 cache: 1 MB, 16-way associative organization.

ARM Cortex-A15 (introduced 2012):
- Each core has separate L1 data and instruction caches: 64 KB (32 KB I-cache, 32 KB D-cache) per core.
- L2 cache: unified and common for all cores, up to 4 MB.

3-Level Cache Example

Intel Itanium 2 (introduced 2002):

                   L1              L2              L3
    Contents       Split D and I   Unified D + I   Unified D + I
    Size           16 KB each      256 KB          3 MB
    Line size      64 bytes        128 bytes       128 bytes
    Associativity  4-way           8-way           12-way
    Access time    1 cycle         5-7 cycles      14-17 cycles
    Store policy   Write-through   Write-back      Write-back

Motivation for Virtual Memory

- The physical main memory (RAM) is relatively limited in capacity. It may not be big enough to store all the executing programs at the same time.
- A program may need more memory than the main memory offers, but the whole program does not need to be kept in the main memory at the same time.
- Virtual memory takes advantage of the fact that, at any given instant of time, an executing program needs only a fraction of the memory that the whole program occupies.
- The basic idea: load only the pieces of each executing program that are currently needed.

Paging of Memory

- Divide programs (processes) into small blocks of equal size, called pages.
- Divide the primary memory into small blocks of equal size, called page frames.
- Allocate the required number of page frames to a program. A program does not require contiguous page frames!
- The operating system (OS) is responsible for:
  - maintaining a list of free frames;
  - using a page table to keep track of the mapping between pages and page frames.

Logical and Physical Addresses

[Figure: a logical (program) address is translated into a physical address by looking up the page table, e.g., mapping pages 0-3 to their page frames.]

Implementation of the page tables:
- In the main memory: slow, since an extra memory access is needed for every translation.
- In separate registers: fast but expensive.
- In a cache.
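A minimal sketch of the translation (hypothetical Python, assuming a 1000-word page size so that it matches the decimal addresses used on the next slide):

```python
# Logical-to-physical address translation with paging (illustrative sketch).
PAGE_SIZE = 1000   # words per page (decimal, to match the slide example)

# Page table: page number -> page frame number (None = not in main memory).
page_table = {0: 3, 1: 0, 2: None, 3: 5}

def translate(logical_addr):
    """Map a logical address to a physical address via the page table."""
    page = logical_addr // PAGE_SIZE
    offset = logical_addr % PAGE_SIZE
    frame = page_table.get(page)
    if frame is None:
        raise LookupError(f"page fault: page {page} is not in main memory")
    return frame * PAGE_SIZE + offset

print(translate(1234))     # page 1 -> frame 0, so physical address 234
try:
    translate(2345)
except LookupError as e:
    print(e)               # the OS must now load page 2 from secondary memory
```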

Objective of Virtual Memory

To give the programmer a much bigger memory space than the main memory, with the help of the operating system. The virtual memory size can be very much bigger than the main memory size.

[Figure: program (virtual) addresses 0000, 1000, 2000, 3000, ... are mapped either to main memory addresses (0000-5000) or to the secondary memory.]

Page Fault

When a virtual memory page which is not in the main memory is accessed, a page fault occurs. The page must then be loaded from the secondary memory into the main memory by the OS.

[Figure: the virtual address is split into a page number and an offset; the page map either delivers a page frame in the main memory or signals a page fault, an interrupt to the OS.]

Page Replacement

- When a page fault occurs and all page frames are occupied, one of them must be replaced.
- If the replaced page has been modified while residing in the main memory, the updated version must be written back to the secondary memory.
- We wish to replace the page which will not be accessed for the longest time in the future.
  - Problem: we do not know exactly what will happen in the future.
  - Solution: we predict the future by studying the access patterns up till now ("learn from history").

Replacement Algorithms

- FIFO (first in, first out): replace the page that has been in the main memory for the longest time.
- LRU (least recently used): replace the page that has not been accessed for the longest time.
- LFU (least frequently used): replace the page with the smallest number of accesses during the latest time period.
Random replacement (used for caches) is not used for virtual memory!
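A small simulation (hypothetical Python, assuming 3 page frames) contrasts FIFO and LRU on the same reference string:

```python
# Page-replacement simulation: FIFO vs. LRU (illustrative sketch).
from collections import deque

def count_faults(refs, frames, policy):
    """Count page faults for a reference string under FIFO or LRU."""
    resident = deque()                 # resident pages, eviction victim at the left
    faults = 0
    for page in refs:
        if page in resident:
            if policy == "LRU":        # a hit refreshes the page's position
                resident.remove(page)
                resident.append(page)
            continue
        faults += 1                    # page fault: the page must be loaded
        if len(resident) >= frames:
            resident.popleft()         # evict oldest (FIFO) or least recent (LRU)
        resident.append(page)
    return faults

refs = [1, 2, 3, 1, 4, 1, 2, 5]
print("FIFO:", count_faults(refs, 3, "FIFO"), "faults")   # 7 faults
print("LRU: ", count_faults(refs, 3, "LRU"), "faults")    # 6 faults
```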

Summary

- A memory system has to store very large programs and a lot of data, and still provide fast access. No single type of memory can provide all these needs of a computer system.
- Therefore, several different storage mechanisms are organized in a layered hierarchy. The layered structure works very well thanks to the locality-of-reference principle.
- Cache is a hardware solution to improve memory access; it is transparent to the programmers.
- Virtual memory provides a much larger address space than the available physical space, with the help of the OS (a software solution).