Memory Hierarchy. Goal: Fast, unlimited storage at a reasonable cost per bit.

Memory Hierarchy
Goal: Fast, unlimited storage at a reasonable cost per bit. Recall the von Neumann bottleneck - a single, relatively slow path between the CPU and main memory.
Fast: when you need something from memory, check the faster cache(s) first for a copy.
Unlimited storage: virtual memory - the executing program's logical address space (machine-language program, run-time stack, heap, global memory) lives completely out on disk, with main memory (DRAM) acting like a cache for the hard disk.
Cache and Virtual Memory - 1

Main Idea of a Cache - keep a copy of frequently used information as close (with respect to access time) to the processor as possible.

[Figure: the CPU with a cache between it and memory; each cache line holds a copy of a memory block; cache and memory sit on the system bus.]

Steps when the CPU generates a memory request:
1) Check the (faster) cache first.
2) If the addressed memory value is in the cache (called a hit), then there is no need to access memory.
3) If the addressed memory value is NOT in the cache (called a miss), then transfer the block of memory containing the reference to the cache. (The CPU is stalled and idle while this occurs.)
4) The cache supplies the memory value from the cache.

Effective (Average) Memory Access Time
Suppose that the hit time (i.e., the access time of the cache, t_c) is 1 ns, the cache miss penalty (loading a cache line from memory might involve multiple reads) is 50 ns, and the hit ratio is 99% (so the miss ratio is 1%).

Effective Access Time = (hit time) + (miss penalty * miss ratio)
Effective Access Time = 1 + 50 * (1 - 0.99) = 1 + 0.5 = 1.5 ns

(One way to reduce the miss penalty is to not have the cache wait for the whole block to be read from memory before supplying the accessed memory word.)
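The effective-access-time formula above can be sketched as a one-line function (the function name is illustrative, not from the notes):

```python
def effective_access_time(hit_time_ns, miss_penalty_ns, hit_ratio):
    """Average memory access time: every access pays the hit time,
    and a miss additionally pays the miss penalty."""
    miss_ratio = 1.0 - hit_ratio
    return hit_time_ns + miss_penalty_ns * miss_ratio

# The example above: 1 ns hit time, 50 ns miss penalty, 99% hit ratio.
print(f"{effective_access_time(1, 50, 0.99):.1f} ns")  # -> 1.5 ns
```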

Fortunately, programs exhibit locality of reference, which helps achieve high hit ratios:
1) spatial locality - if a (logical) memory address is referenced, nearby memory addresses will tend to be referenced soon.
2) temporal locality - if a memory address is referenced, it will tend to be referenced again soon.

[Figure: typical flow of control in a program, showing a locality of reference in the "Text" segment (e.g., the body of a for loop) and a locality of reference in the "Data" segment from the data references within the loop, plus the run-time stack and global data.]

Three Types of Cache
Cache - small, fast memory between the CPU and RAM/main memory.
Example: 12-bit address, 0.5 KB (2^9 byte) cache, 8 bytes per block/line, byte-addressable memory.
Number of cache lines = (size of cache) / (size of line) = 2^9 / 2^3 = 2^6 = 64

1) Direct-mapped - a memory block maps to a single cache line.

12-bit address: | tag (3 bits) | line # (6 bits) | offset (3 bits) |

[Figure: each memory block maps to exactly one of the 64 cache lines; blocks whose numbers differ by a multiple of 64 share a line.]
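The direct-mapped address split can be sketched as below, using the field widths from the example (8-byte lines, 64 lines); the function name is illustrative:

```python
OFFSET_BITS = 3   # 8-byte lines -> 3 offset bits
LINE_BITS = 6     # 64 lines     -> 6 line-number bits

def split_direct_mapped(addr):
    """Split a 12-bit byte address into (tag, line #, offset)
    for the direct-mapped cache in the example above."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    line = (addr >> OFFSET_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (OFFSET_BITS + LINE_BITS)
    return tag, line, offset

# Block 65 of memory (byte address 65 * 8 = 520) maps to line 65 mod 64 = 1.
print(split_direct_mapped(520))  # -> (1, 1, 0)
```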

Cache - small, fast memory between the CPU and RAM/main memory.
Example: 12-bit address, byte-addressable memory, 0.5 KB (2^9 byte) cache, 8 bytes per block/line.
Number of cache lines = (size of cache) / (size of line) = 2^9 / 2^3 = 2^6 = 64

2) Fully-associative cache - a memory block can map to any cache line.

12-bit address: | block # (9 bits) | offset (3 bits) |

Advantage: flexibility on what's in the cache.
Disadvantage: complex circuitry to compare the tags of all lines of the cache with the block # in the target address. Therefore, fully-associative caches are expensive and slower, so they are used only for small caches (say 8-64 lines).

Replacement algorithms - on a miss in a full cache, we must select a line in the cache to replace:
LRU - replace the cache line that has not been used for the longest time (needs additional bits).
Random - select a line randomly (only slightly worse than LRU).
FIFO - select the line that has been in the cache for the longest time (slightly worse than LRU).
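A toy model of a fully-associative cache with LRU replacement, tracking only block numbers (the class and method names are made up for illustration):

```python
from collections import OrderedDict

class FullyAssociativeLRU:
    """Tiny model of a fully-associative cache with LRU replacement:
    any block may occupy any line; on a miss with a full cache, the
    least-recently-used block is evicted."""
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # block# -> True; order = recency

    def access(self, block):
        if block in self.lines:             # hit: mark most recently used
            self.lines.move_to_end(block)
            return "hit"
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)  # evict the LRU block
        self.lines[block] = True
        return "miss"

cache = FullyAssociativeLRU(2)
print([cache.access(b) for b in [1, 2, 1, 3, 2]])
# -> ['miss', 'miss', 'hit', 'miss', 'miss']
```

Swapping `popitem(last=False)` for a random or oldest-insertion choice would model the Random and FIFO policies instead.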

3) Set-associative cache - a memory block can map to a small set (2, 4, or 8) of cache lines.
Common possibilities:
2-way set associative - each memory block can map to either of two lines in the cache.
4-way set associative - each memory block can map to any of four lines in the cache.

Number of sets = (number of cache lines) / (size of each set) = 2^6 / 4 = 16 = 2^4

12-bit address: | tag (5 bits) | set # (4 bits) | offset (3 bits) |

[Figure: a 4-way set-associative cache with 16 sets; each set has four ways, each way with its own tag and other bookkeeping bits.]
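The set-index computation for this 4-way example can be sketched as follows (field widths taken from the example above; the function name is illustrative):

```python
OFFSET_BITS = 3  # 8-byte lines
SET_BITS = 4     # 64 lines / 4 ways = 16 sets

def set_index(addr):
    """Set number for the 4-way set-associative example:
    drop the offset bits, keep the next 4 bits."""
    return (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)

# Blocks 5 and 21 (= 5 + 16) fall in the same set and compete for its 4 ways.
print(set_index(5 * 8), set_index(21 * 8))  # -> 5 5
```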

Typical system view of the memory hierarchy.

Virtual Memory - the programmer views memory as a large address space without concerns about the amount of physical memory or memory management. (What do the terms 32-bit and 64-bit operating system mean?)

Benefits:
1) programs can be bigger than the physical memory size, since only a portion of them may actually be in physical memory.
2) a higher degree of multiprogramming is possible, since only portions of programs are in memory.

An operating system goal, with hardware support, is to make virtual memory efficient and transparent to the user.

Memory-Management Unit (MMU) for paging:

[Figure: the CPU of running process A emits a logical address <page #, offset>; the page table for A maps page # to frame # (with a valid bit per entry); the physical address is <frame #, offset>. Physical memory frames hold interleaved pages of processes A and B.]

Note: the valid bit is called the Resident (R) bit in the textbook (Figure 9.9).
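The MMU's paging translation can be sketched as below; the page-table contents are made up for illustration, and 4 KB pages are assumed (the size these notes call typical):

```python
PAGE_SIZE = 4096  # 4 KB pages

# Hypothetical page table: page# -> (frame#, valid/resident bit)
page_table = {0: (4, 1), 1: (None, 0), 2: (1, 1)}

def translate(logical_addr):
    """MMU-style translation: split the logical address into page # and
    offset, look up the frame #, and splice the offset back on."""
    page, offset = divmod(logical_addr, PAGE_SIZE)
    frame, resident = page_table[page]
    if not resident:
        raise RuntimeError("page fault: page %d not in memory" % page)
    return frame * PAGE_SIZE + offset

print(translate(2 * PAGE_SIZE + 100))  # page 2 -> frame 1, i.e. 4096 + 100
```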

Demand paging is a common way for OSs to implement virtual memory. Demand paging (a "lazy pager") only brings a page into physical memory when it is needed. A valid bit is used in a page-table entry to indicate whether the page is in memory or only on disk. A page fault occurs when the CPU generates a logical address for a page that is not in physical memory. The MMU will cause a page-fault trap (interrupt) to the OS.

Steps for the OS's page-fault trap handler:
1) Check the page table to see if the page is valid (exists in the logical address space). If it is invalid, terminate the process; otherwise continue.
2) Find a free frame in physical memory (take one from the free-frame list, or replace a page currently in memory).
3) Schedule a disk read operation to bring the page into the free page frame. (We might first need to schedule a disk write operation to update the virtual-memory copy of a dirty page that we are replacing.)
4) Since the disk operations are soooooo slooooooow, the OS would context switch to another ready process selected from the ready queue.
5) After the disk (a DMA device) reads the page into memory, it raises an I/O completion interrupt. The OS will then update the PCB and page table for the process to indicate that the page is now in memory and the process is ready to run.
6) When the process is selected by the short-term scheduler to run, it repeats the instruction that caused the page fault. The memory reference that caused the page fault will now succeed.

Performance of Demand Paging
To achieve an acceptable performance degradation (5-10%) of our virtual memory, we must have a very low page-fault rate (the probability that a page fault will occur on a memory reference).
When does a CPU perform a memory reference?
1) Fetching instructions into the CPU to be executed.
2) Fetching operands used in an instruction (load and store instructions on RISC machines).
Example: Let p be the page-fault rate, and ma be the memory-access time.
Assume that p = 0.02, ma = 50 ns, and the time to perform a page fault is 2,200,000 ns (2.2 ms).

effective access time = (prob. of no page fault * main memory access time) + (prob. of page fault * page fault time)
 = (1 - p) * 50 ns + p * 2,200,000 ns
 = 0.98 * 50 ns + 0.02 * 2,200,000 ns
 = 44,049 ns

The program would appear to run very slowly!!!

If we only want, say, a 10% slowdown of our memory, then the page-fault rate must be much better:
55 = (1 - p) * 50 ns + p * 2,200,000 ns
55 = 50 - 50p + 2,200,000p
p = 0.0000023, or about 1 page fault in 439,990 references

Fortunately, programs exhibit locality of reference, which helps achieve low page-fault rates. Page size is typically 4 KB.
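A quick check of these demand-paging numbers (assuming p = 0.02, a 50 ns memory access, and a 2.2 ms page-fault service time, as in the example):

```python
def demand_paged_eat(p, ma_ns, fault_ns):
    """Effective access time with demand paging:
    (1 - p) * memory access time + p * page-fault service time."""
    return (1 - p) * ma_ns + p * fault_ns

print(f"{demand_paged_eat(0.02, 50, 2_200_000):,.0f} ns")  # -> 44,049 ns

# Solving 55 = 50 - 50p + 2,200,000p for the 10%-slowdown fault rate:
p = (55 - 50) / (2_200_000 - 50)
print(f"about 1 fault per {1 / p:,.0f} references")  # -> 439,990
```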

Storage of the Page Table
Issues:
1) Where is it located? If it is in memory, then each memory reference in the program results in two memory accesses: one for the page-table entry, and another to perform the desired memory access.
Solution: TLB (translation-lookaside buffer) - a small, fully-associative cache that holds page-table entries.
Ideally, when the CPU generates a memory reference, the PT entry is found in the TLB, the page is in memory, and the block with the page is in the cache, so NO memory accesses are needed. However, each CPU memory reference then involves two cache lookups, and these lookups must be done sequentially: first check the TLB to get the physical frame # used to build the physical address, then use the physical address to check the tags of the L1 cache.
Alternatively, the L1 cache can contain virtual addresses (called a virtual cache). This allows the TLB and cache accesses to be done in parallel. If the cache hits, the result of the TLB is not used. If the cache misses, then the address translation is already under way and is used by the L2 cache.

2) Ways to handle large page tables:
The page table for each process can be large, e.g., with a 32-bit address, 4 KB (2^12 byte) pages, byte-addressable memory, and 4-byte PT entries:

32-bit virtual address: | page # (20 bits) | offset (12 bits) |

That is 1M (2^20) page-table entries, or 4 MB for the whole page table with 4-byte page-table entries.

A solution:
a) two-level page table - the first level (the "directory") acts as an index into the page table, which is scattered across several pages. Consider a 32-bit example with 4 KB pages and 4-byte page-table entries:

virtual address: | directory (10 bits) | page index (10 bits) | offset (12 bits) |

[Figure: the directory entry selects one page of the page table (the first 1K entries, the second 1K entries, ...); the selected page-table entry supplies the frame #, which is combined with the offset to form the physical address.]

Problem with paging:
1) The protection unit is a page, i.e., each page-table entry can contain protection information, but the virtual address space is divided into pages along arbitrary boundaries.
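The index extraction for this two-level layout can be sketched as below (bit widths from the 32-bit example; the function name is illustrative):

```python
def split_two_level(vaddr):
    """Split a 32-bit virtual address for the two-level scheme above:
    10-bit directory index, 10-bit page-table index, 12-bit offset."""
    offset = vaddr & 0xFFF            # low 12 bits
    page_idx = (vaddr >> 12) & 0x3FF  # next 10 bits
    dir_idx = vaddr >> 22             # top 10 bits
    return dir_idx, page_idx, offset

# Virtual page 1025 = directory entry 1, page-table entry 1 within that page.
print(split_two_level(1025 * 4096 + 7))  # -> (1, 1, 7)
```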

Segmentation - divides the virtual address space in terms of meaningful program modules, which allows each to be associated with different protection. For example, a segment containing a matrix-multiplication subprogram could be shared by several programs. The programmer views memory as multiple address spaces, i.e., segments. Memory references consist of two parts: <segment #, offset within segment>.

[Figure: a program built from segments - Main, Subpgm A, Subpgm B, Global Data, Run-time Stack - each its own address space, referenced as <segment #, offset>.]

As in paging, the operating system, with hardware support, can move segments into and out of memory as needed by the program. Each process (running program) has its own segment table, similar to a page table, for performing address translations.

[Figure: the CPU of running process X emits <segment #, offset>; the segment table for X holds, per segment, a base location, a valid bit, and a length; the offset is checked against the length and added to the base to form the physical address.]

Problems with segmentation:
1) it is hard to manage memory efficiently due to external fragmentation.
2) segments can be large, so not many can be loaded into memory at one time.
Solution: combine paging with segmentation by paging each segment.
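The segment-table translation with its bounds check can be sketched as below; the segment-table contents (bases, lengths) are made up for illustration:

```python
# Hypothetical segment table: seg# -> (base, length, valid bit)
segment_table = {0: (1000, 700, 1), 1: (3000, 64, 1), 2: (None, 0, 0)}

def translate_seg(seg, offset):
    """Segmented translation: bounds-check the offset against the
    segment length, then add the segment's base address."""
    base, length, valid = segment_table[seg]
    if not valid:
        raise RuntimeError("segment fault: segment %d not in memory" % seg)
    if offset >= length:
        raise RuntimeError("protection violation: offset out of bounds")
    return base + offset

print(translate_seg(0, 25))  # -> 1025
```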

[Figure: paged segmentation for running process X. The CPU emits a logical address <segment #, page #, offset>; the segment table for X holds, per segment, a valid bit and a pointer to that segment's page table; the selected page table maps page # to frame # (with a valid bit per entry); the physical address is <frame #, offset>. Physical memory frames hold interleaved pages of several segments.]
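The combined scheme in the figure above can be sketched as follows; the per-segment page tables and the small page size are made up to keep the numbers readable:

```python
PAGE_SIZE = 1024  # hypothetical small pages for readability

# Hypothetical per-segment page tables: seg# -> {page#: frame#}
segment_page_tables = {0: {0: 3, 1: 6}, 1: {0: 4}}

def translate_paged_seg(seg, offset):
    """Paged segmentation: the segment # selects a page table,
    then the offset within the segment is translated by ordinary paging."""
    page, page_offset = divmod(offset, PAGE_SIZE)
    frame = segment_page_tables[seg][page]  # a KeyError models a fault
    return frame * PAGE_SIZE + page_offset

print(translate_paged_seg(0, 1030))  # page 1 of seg 0 -> frame 6
```

This keeps segmentation's per-module protection while letting each segment be loaded a page at a time, avoiding external fragmentation.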