Virtual Memory. Samira Khan Apr 27, 2017



Virtual Memory

Idea: give the programmer the illusion of a large address space while having a small physical memory, so that the programmer does not have to worry about managing physical memory and can assume an effectively infinite amount of it. Hardware and software cooperatively and automatically manage the physical memory space to provide the illusion. The illusion is maintained for each independent process.

Basic Mechanism: Indirection (in addressing)

The address generated by each instruction in a program is a virtual address, i.e., it is not the physical address used to address main memory. An address translation mechanism maps this virtual address to a physical address; the mechanism can be implemented in hardware and software together.

"At the heart [...] is the notion that 'address' is a concept distinct from 'physical location'." (Peter Denning)

Overview of Paging

[Figure: Process 1 and Process 2 each have their own 4 GB virtual address space divided into virtual pages; both are mapped onto a much smaller physical memory (16 MB in the figure) divided into physical page frames.]

Review: Virtual Memory & Physical Memory

[Figure: a memory-resident page table in DRAM maps virtual pages to physical pages. Each page table entry (PTE) holds a valid bit and either a physical page number or a disk address: valid entries point to virtual pages cached in physical memory (DRAM), while the others point to pages that live only in virtual memory on disk, or are null (unallocated).] A page table contains page table entries (PTEs) that map virtual pages to physical pages.

Translation

Assume: Virtual Page 7 is mapped to Physical Page 32. For an access to Virtual Page 7:

Virtual address: bits 31-12 are the VPN, bits 11-0 are the page offset.
Translated physical address: bits 27-12 are the PPN, bits 11-0 are the (unchanged) page offset.

Address Translation With a Page Table

The page table base register (PTBR; CR3 in x86) holds the physical page table address for the current process. An n-bit virtual address splits into a virtual page number (VPN, bits n-1..p) and a virtual page offset (VPO, bits p-1..0). The VPN indexes the page table; each entry holds a valid bit and a physical page number (PPN). Valid bit = 0: page not in memory (page fault). Valid bit = 1: the PPN (bits m-1..p) is concatenated with the physical page offset (PPO = VPO, bits p-1..0) to form the physical address.

Address Translation: Page Hit

1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) MMU sends physical address to cache/memory
5) Cache/memory sends data word to processor

Address Translation: Page Fault Exception

1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) Valid bit is zero, so MMU triggers page fault exception
5) Handler identifies victim page (and, if dirty, pages it out to disk)
6) Handler pages in new page and updates PTE in memory
7) Handler returns to original process, restarting the faulting instruction

Integrating VM and Cache

[Figure: the MMU first looks up the PTE address (PTEA) in the L1 cache; on a PTEA miss it fetches the PTE from memory. With the PTE in hand it forms the physical address (PA), which is again looked up in the L1 cache and, on a PA miss, in memory.] VA: virtual address, PA: physical address, PTE: page table entry, PTEA: PTE address.

Two Problems

Two problems with page tables:
Problem #1: The page table is too large.
Problem #2: The page table is stored in memory. Before every memory access, must we always fetch the PTE from slow memory? That would be a large performance penalty.

Multi-Level Page Tables

Suppose: 4 KB (2^12) page size, 48-bit address space, 8-byte PTEs. Problem: a flat table would need a 512 GB page table! 2^48 x 2^-12 x 2^3 = 2^39 bytes.

Common solution: a multi-level page table. Example: 2-level page table. Level 1 table: each PTE points to a level 2 page table (always memory resident). Level 2 table: each PTE points to a page (paged in and out like any other data).

A Two-Level Page Table Hierarchy

[Figure: level 1 PTE 0 and PTE 1 point to level 2 page tables whose entries (PTE 0 ... PTE 1023 each) map VP 0 ... VP 1023 and VP 1024 ... VP 2047: the 2K allocated VM pages for code and data. Level 1 PTEs 2-7 are null, covering the gap of 6K unallocated VM pages. Level 1 PTE 8 points to a level 2 table with 1023 null PTEs and one final PTE mapping VP 9215, the 1 allocated VM page for the stack; the preceding 1023 pages are unallocated. The remaining (1K - 9) level 1 PTEs are null.] 32-bit addresses, 4 KB pages, 4-byte PTEs.

Translating with a k-level Page Table

[Figure: the virtual address splits into VPN 1, VPN 2, ..., VPN k and the page offset VPO. The page table base register (PTBR) locates the level 1 page table; VPN 1 indexes it to find a level 2 table, VPN 2 indexes that, and so on, until VPN k indexes the level k table to yield the PPN. The physical address is the PPN concatenated with PPO = VPO.]

Translation: Flat Page Table

pte_t PAGE_TABLE[1<<20]; // 32-bit VA, 28-bit PA, 4KB page
PAGE_TABLE[7]=2;

[Figure: a 2^20-entry page table, all entries null except PTE 7, which holds PPN 2.] Virtual address: bits 31-12 are the VPN, bits 11-0 the offset. Physical address: bits 27-12 are the PPN, bits 11-0 the offset.

Translation: Two-Level Page Table

pte_t *PAGE_DIRECTORY[1<<10];
PAGE_DIRECTORY[0]=malloc((1<<10)*sizeof(pte_t));
PAGE_DIRECTORY[0][7]=2;

[Figure: a 2^10-entry page directory whose entry 0 points to a 2^10-entry page table; in that table, PTE 7 holds PPN 2 and the rest are null. All other directory entries are null. The upper half of the VPN is the directory index; the lower half is the table index.]

Two-Level Page Table (x86)

CR3: Control Register 3 (the Page Directory Base Register) stores the physical address of the page directory. Q: Why not the virtual address? (If CR3 held a virtual address, the hardware would need a translation before it could begin translating: a chicken-and-egg problem.)

Multi-Level Page Table (x86-64)

[Figure: the x86-64 multi-level page table structure; detailed below in the Core i7 case study.]

Per-Process Virtual Address Space

Each process has its own virtual address space (e.g., Process X: a text editor; Process Y: a video player). X writing to its virtual addresses does not affect the data stored at Y's virtual addresses (or any other process's); this was the entire purpose of virtual memory. Each process has its own page directory and page tables, so on a context switch, CR3's value must be updated to point to the new process's page directory.

Two Problems

Two problems with page tables:
Problem #1: The page table is too large. The page table has 1M entries; each entry is 4B (because of the 20-bit PPN), so the page table is 4MB(!!), which was very expensive in the 80s. Solution: a hierarchical page table.
Problem #2: The page table is in memory. Before every memory access, must we always fetch the PTE from slow memory? Large performance penalty.

Speeding up Translation with a TLB

Page table entries (PTEs) could be cached in L1 like any other memory word, but PTEs may be evicted by other data references, and even a PTE hit requires a small L1 delay. Solution: the Translation Lookaside Buffer (TLB), a small set-associative hardware cache in the MMU that maps virtual page numbers to physical page numbers. It contains complete page table entries for a small number of pages.

Accessing the TLB

The MMU uses the VPN portion of the virtual address to access the TLB. With T = 2^t sets, the low t bits of the VPN form the TLB index (TLBI), which selects the set; the remaining high bits of the VPN form the TLB tag (TLBT), which is matched against the tag of each line within the set. Each line holds a valid bit, a tag, and a PTE.

TLB Hit

1) Processor sends virtual address to MMU; 2) MMU sends VPN to TLB; 3) TLB returns the PTE; 4) MMU sends the physical address to cache/memory; 5) cache/memory returns the data word. A TLB hit eliminates a memory access.

TLB Miss

1) Processor sends virtual address to MMU; 2) MMU sends VPN to TLB, which misses; 3) MMU sends the PTE address to memory; 4) memory returns the PTE, which is installed in the TLB; 5) MMU sends the physical address to cache/memory; 6) cache/memory returns the data word. A TLB miss incurs an additional memory access (for the PTE). Fortunately, TLB misses are rare. Why? (Locality: most accesses fall on recently used pages.)

Simple Memory System Example

Addressing: 14-bit virtual addresses, 12-bit physical addresses, page size = 64 bytes. Virtual address: bits 13-6 are the VPN (Virtual Page Number), bits 5-0 the VPO (Virtual Page Offset). Physical address: bits 11-6 are the PPN (Physical Page Number), bits 5-0 the PPO (Physical Page Offset).

Simple Memory System TLB

16 entries, 4-way set associative, so 4 sets: the low 2 bits of the VPN form the TLBI and the high 6 bits the TLBT.

[Table: Translation Lookaside Buffer, four (Tag, PPN, Valid) triples per set]
Set 0: (03, -, 0) (09, 0D, 1) (00, -, 0) (07, 02, 1)
Set 1: (03, 2D, 1) (02, -, 0) (04, -, 0) (0A, -, 0)
Set 2: (02, -, 0) (08, -, 0) (06, -, 0) (03, -, 0)
Set 3: (07, -, 0) (03, 0D, 1) (0A, 34, 1) (02, -, 0)

Worked example: VPN = 0x0F gives TLBI = 3 and TLBT = 0x03; set 3 has a valid line with tag 03, so PPN = 0x0D.

Simple Memory System Page Table

Only showing the first 16 entries (out of 256), as (VPN: PPN, Valid):
00: 28, 1    04: -, 0    08: 13, 1    0C: -, 0
01: -, 0     05: 16, 1   09: 17, 1    0D: 2D, 1
02: 33, 1    06: -, 0    0A: 09, 1    0E: 11, 1
03: 02, 1    07: -, 0    0B: -, 0     0F: 0D, 1

For example, VPN 0x0D maps to PPN 0x2D.

Context Switches

Assume that Process X is running, and Process X's VPN 5 is mapped to PPN 10. The TLB caches this mapping: VPN 5 → PPN 10. Now assume a context switch to Process Y, whose VPN 5 is mapped to PPN 20. When Process Y tries to access VPN 5, it searches the TLB and finds an entry whose tag is 5. Hurray, it's a TLB hit! The PPN must be 10! ... Are you sure?

Context Switches (cont'd)

Approach #1: Flush the TLB. Whenever there is a context switch, flush the TLB; all TLB entries are invalidated. Example: the 80386, where updating the value of CR3 signals a context switch and automatically triggers a TLB flush.

Approach #2: Associate TLB entries with processes. All TLB entries carry an extra field in the tag that identifies the process to which they belong, so only the entries belonging to the old process need be invalidated. Example: modern x86, MIPS.

Handling TLB Misses

The TLB is small; it cannot hold all PTEs, so some translations will inevitably miss in the TLB. Memory must then be accessed to find the appropriate PTE, which is called walking the page directory/table, at a large performance penalty. Who handles TLB misses? 1. A hardware-managed TLB, or 2. a software-managed TLB.

Handling TLB Misses (cont'd)

Approach #1: Hardware-managed (e.g., x86). The hardware does the page walk, fetches the PTE, and inserts it into the TLB; if the TLB is full, the new entry replaces another entry. All of this is done transparently.

Approach #2: Software-managed (e.g., MIPS). The hardware raises an exception; the operating system does the page walk, fetches the PTE, and inserts/evicts entries in the TLB.

Handling TLB Misses (cont'd)

Hardware-managed TLB. Pro: no exceptions; the instruction just stalls. Pro: independent instructions may continue. Pro: small footprint (no extra instructions/data). Con: the page directory/table organization is etched in stone.

Software-managed TLB. Pro: the OS can design the page directory/table. Pro: more advanced TLB replacement policies. Con: flushes the pipeline. Con: performance overhead.

Address Translation and Caching

When do we do the address translation? Before or after accessing the L1 cache? In other words, is the cache virtually addressed or physically addressed? What are the issues with a virtually addressed cache? The synonym problem: two different virtual addresses can map to the same physical address, so the same physical address can be present in multiple locations in the cache, which can lead to inconsistency in data.

Homonyms and Synonyms

Homonym: the same VA can map to two different PAs. Why? The VA is used in different processes. Synonym: different VAs can map to the same PA. Why? Different pages can share the same physical frame, within or across processes; reasons include shared libraries, shared data, and copy-on-write pages within the same process. Do homonyms and synonyms create problems when we have a cache? It depends on whether the cache is virtually or physically addressed.

Cache-VM Interaction

[Figure: three organizations. Physical cache: CPU → TLB → cache → lower hierarchy (translate first). Virtual (L1) cache: CPU → cache → TLB → lower hierarchy (translate only on a miss). Virtual-physical cache: the CPU accesses the TLB and the cache in parallel.]

Virtually-Indexed Physically-Tagged

If C ≤ (page_size x associativity), the cache index bits come only from the page offset (the same in VA and PA). If both the cache and the TLB are on chip: index both arrays concurrently using VA bits, then check the (physical) cache tag against the TLB output at the end. [Figure: the page offset supplies the cache index and offset; the TLB translates the VPN to the PPN, which is compared against the tag of the indexed line to produce the cache hit signal.]

Virtually-Indexed Physically-Tagged

If C > (page_size x associativity), the cache index bits include VPN bits, so synonyms can cause problems: the same physical address can exist in two cache locations. Solutions?

Sanity Check

Core 2 Duo: 32 KB, 8-way set associative L1, page size 4 KB. Is cache size ≤ (page_size x associativity)? 2^P = 4K, so P = 12: 12 bits are needed for the page offset. 2^C = 32 KB, so C = 15: 15 bits are needed to address a byte in the cache. 2^A = 8-way, so A = 3. Increasing the associativity of the cache reduces the number of address bits needed to index into the cache: only 12 bits are needed for the cache index and offset, as tags are matched for all blocks in the same set. C ≤ P + A? 15 ≤ 12 + 3? True.

Some Solutions to the Synonym Problem

Limit the cache size to (page size x associativity), i.e., get the index from the page offset. Or, on a write to a block, search all possible indices that can contain the same physical block, and update/invalidate (used in the Alpha 21264 and MIPS R10K). Or restrict page placement in the OS so that index(VA) = index(PA), called page coloring (used in many SPARC processors).

Today

Case study: the Core i7/Linux memory system

Intel Core i7 Memory System

[Figure: processor package with 4 cores. Per core: registers, instruction fetch, MMU (address translation); L1 d-cache 32 KB, 8-way; L1 i-cache 32 KB, 8-way; L1 d-TLB 64 entries, 4-way; L1 i-TLB 128 entries, 4-way; L2 unified cache 256 KB, 8-way; L2 unified TLB 512 entries, 4-way. Shared by all cores: L3 unified cache 8 MB, 16-way; QuickPath interconnect (4 links at 25.6 GB/s each) to the other cores and the I/O bridge; DDR3 memory controller (3 x 64 bit at 10.66 GB/s, 32 GB/s total) to main memory.]

End-to-end Core i7 Address Translation

[Figure: the 48-bit virtual address (VA) splits into a 36-bit VPN and a 12-bit VPO. The VPN's low 4 bits form the TLBI and its high 32 bits the TLBT for the L1 TLB (16 sets, 4 entries/set). On a TLB miss, the MMU walks the four-level page tables starting at CR3, using four 9-bit fields VPN1-VPN4 to select a PTE at each level. The resulting 52-bit physical address (PA) has a 40-bit PPN and a 12-bit PPO; its low bits supply the 6-bit CO and 6-bit CI, and the upper 40 bits the CT, for the L1 d-cache (64 sets, 8 lines/set), backed by L2, L3, and main memory. The result is 32/64 bits wide.]

Speeding Up L1 Access

Observation: the bits that determine the cache index (CI) are identical in the virtual and physical address, because they lie within the 12-bit page offset, which translation does not change. So we can index into the cache while address translation is taking place. Generally we hit in the TLB, so the PPN bits (the CT bits) are available shortly afterward for the tag check. This is a virtually indexed, physically tagged cache, and the cache is carefully sized to make it possible.

Core i7 Level 1-3 Page Table Entries

[Format when P=1: bit 63 XD; bits 62-52 unused; bits 51-12 page table physical base address; bits 11-9 unused; then G, PS, A, CD, WT, U/S, R/W, P. When P=0, the rest of the entry is available to the OS, e.g., for the page table's location on disk.]

Each entry references a 4 KB child page table. Significant fields:
P: child page table present in physical memory (1) or not (0).
R/W: read-only or read-write access permission for all reachable pages.
U/S: user or supervisor (kernel) mode access permission for all reachable pages.
WT: write-through or write-back cache policy for the child page table.
A: reference bit (set by MMU on reads and writes, cleared by software).
PS: page size, either 4 KB or 4 MB (defined for Level 1 PTEs only).
Page table physical base address: the 40 most significant bits of the physical page table address (forces page tables to be 4 KB aligned).
XD: disable or enable instruction fetches from all pages reachable from this PTE.

Core i7 Level 4 Page Table Entries

[Format when P=1: bit 63 XD; bits 62-52 unused; bits 51-12 page physical base address; bits 11-9 unused; then G, D, A, CD, WT, U/S, R/W, P. When P=0, the rest of the entry is available to the OS, e.g., for the page's location on disk.]

Each entry references a 4 KB child page. Significant fields:
P: child page is present in memory (1) or not (0).
R/W: read-only or read-write access permission for the child page.
U/S: user or supervisor mode access.
WT: write-through or write-back cache policy for this page.
A: reference bit (set by MMU on reads and writes, cleared by software).
D: dirty bit (set by MMU on writes, cleared by software).
Page physical base address: the 40 most significant bits of the physical page address (forces pages to be 4 KB aligned).
XD: disable or enable instruction fetches from this page.

Core i7 Page Table Translation

[Figure: the 36-bit VPN splits into four 9-bit fields VPN 1-VPN 4, with a 12-bit VPO. CR3 holds the physical address of the L1 page table (the page global directory). VPN 1 selects an L1 PTE, which gives the physical address of an L2 page table (page upper directory); VPN 2 selects an L2 PTE pointing to an L3 table (page middle directory); VPN 3 selects an L3 PTE pointing to the L4 table (page table); and VPN 4 selects the L4 PTE, which supplies the 40-bit PPN of the 4 KB physical page. Each L1 entry covers a 512 GB region, each L2 entry 1 GB, each L3 entry 2 MB, and each L4 entry 4 KB. The 12-bit VPO becomes the PPO of the physical address.]
