Systems Programming and Computer Architecture (252-0061-00), Timothy Roscoe


Systems Group, Department of Computer Science, ETH Zürich. Systems Programming and Computer Architecture (252-0061-00), Timothy Roscoe, Herbstsemester 2016. Virtual Memory

8: Virtual Memory. Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2016, Timothy Roscoe.

Virtual Memory (up until now). Programs refer to virtual memory addresses (e.g., movl (%rcx),%eax): conceptually a very large array of bytes, from address 00...0 to FF...F, where each byte has its own address. It is actually implemented with a hierarchy of different memory types. The system provides an address space private to each particular process. Allocation (deciding where different program objects should be stored) is done by the compiler and run-time system; all allocation is within a single virtual address space. But why virtual memory? Why not physical memory?

8.1: Recap: Address Translation

Address translation: a level of indirection. Each process (Process 1 ... Process n) sees its own virtual memory, and a mapping translates it onto shared physical memory. Each process gets its own private memory space.

Address spaces. Linear address space: an ordered set of contiguous non-negative integer addresses {0, 1, 2, 3, ...}. Virtual address space: the set of N = 2^n virtual addresses {0, 1, 2, 3, ..., N-1}. Physical address space: the set of M = 2^m physical addresses {0, 1, 2, 3, ..., M-1}. Clean distinction between data (bytes) and their attributes (addresses): each object can now have multiple addresses. Every byte in main memory has one physical address and one (or more) virtual addresses.

System with physical addressing. The CPU sends a physical address (PA) directly to main memory (locations 0, 1, 2, ..., M-1) and gets back a data word. [Still] used in simple systems like embedded microcontrollers in devices like cars, elevators, and digital picture frames.

System with virtual addressing. The CPU issues a virtual address (VA); the MMU (on the CPU chip) translates it to a physical address (PA) that goes to main memory (locations 0, 1, 2, ..., M-1), which returns a data word. Used in all modern desktops, laptops, and workstations. One of the great ideas in computer science. MMU checks the cache.

Address translation with a page table. The virtual address is split into a virtual page number (VPN) and a virtual page offset (VPO). The page table base register (PTBR) holds the address of the page table for the current process. Each page table entry holds a valid bit and a physical page number (PPN); valid bit = 0 means the page is not in memory (page fault). The physical address is the PPN concatenated with the physical page offset (PPO), which equals the VPO.
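In C, the translation step the slide describes looks roughly like this for a single-level table (a minimal sketch with illustrative types; pte_t, PAGE_SHIFT, and translate() are not from the slides):

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12              /* assume 4 KB pages: VPO/PPO are 12 bits */

    typedef struct {
        bool     valid;                /* valid bit = 0: page not in memory */
        uint64_t ppn;                  /* physical page number */
    } pte_t;

    /* Split the VA into VPN and VPO, index the page table, and combine the
       PPN with the unchanged offset. Returns false on a page fault. */
    bool translate(const pte_t *page_table, uint64_t va, uint64_t *pa)
    {
        uint64_t vpn = va >> PAGE_SHIFT;
        uint64_t vpo = va & ((1ULL << PAGE_SHIFT) - 1);
        pte_t pte = page_table[vpn];
        if (!pte.valid)
            return false;              /* page fault: the OS must page it in */
        *pa = (pte.ppn << PAGE_SHIFT) | vpo;   /* PPO == VPO */
        return true;
    }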

8.2: Uses of virtual memory

Why virtual memory (VM)? Efficient use of limited main memory (RAM): use RAM as a cache for the parts of a virtual address space (some non-cached parts are stored on disk, some unallocated non-cached parts are stored nowhere); keep only active areas of the virtual address space in memory and transfer data back and forth as needed. Simplifies memory management for programmers: each process gets the same full, private linear address space. Isolates address spaces: one process can't interfere with another's memory because they operate in different address spaces, and a user process cannot access privileged information because different sections of address spaces have different permissions.

1: VM as a cache for more memory than you have RAM. 64-bit addresses: 16 exabytes. Physical main memory: a few gigabytes? And there are many processes.

1: VM as a tool for caching. Virtual memory: an array of N = 2^n contiguous bytes; think of the array (or at least its allocated part) as being stored on disk. Physical main memory (DRAM) = a cache for allocated virtual memory. Blocks are called pages; size = 2^p bytes. [Figure: virtual pages VP 0 ... VP 2^(n-p)-1, each unallocated, cached, or uncached, map to physical pages PP 0 ... PP 2^(m-p)-1 in DRAM or to blocks on disk/SSD.] Virtual pages (VPs) are stored on disk; physical pages (PPs) are cached in DRAM.

Memory hierarchy: Skylake. [Figure, not drawn to scale: registers; L1 i-cache and L1 d-cache (32 KB each); L2 cache (~256 KB); L3 unified cache (~8 MB); main memory; SSD. L1/L2/L3 caches use 64 B blocks. Throughput per level drops from several bytes per cycle at the registers and L1 to roughly a byte per many cycles at main memory, while latency grows from a few cycles to hundreds of cycles and then to millions of cycles at the SSD; the miss penalty grows by orders of magnitude toward the bottom of the hierarchy.]

DRAM cache organization. The DRAM cache organization is driven by the enormous miss penalty: DRAM is about 10x slower than SRAM, and disk is about 10,000x slower than DRAM (for the first byte; faster for successive bytes). Consequences: large page (block) size, typically 4-8 KB, sometimes 4 MB; fully associative (any VP can be placed in any PP), which requires a large mapping function, different from CPU caches; highly sophisticated, expensive replacement algorithms, too complicated and open-ended to be implemented in hardware; write-back rather than write-through.

Why does it work? Locality! Virtual memory works because of locality. At any point in time, programs tend to access a set of active virtual pages called the working set; programs with better temporal locality will have smaller working sets. If (working set size < main memory size): good performance for one process after the compulsory misses. If (SUM of working set sizes > main memory size): thrashing, a performance meltdown where pages are swapped (copied) in and out continuously.

Problem 2: Memory management. Each of Process 1, Process 2, Process 3, ..., Process n has its own stack, heap, .text, and .data, but physical main memory is one shared resource. What goes where?

2. VM as a tool for memory management. Key idea: each process has its own virtual address space. It can view memory as a simple linear array; the mapping function scatters addresses through physical memory, and well-chosen mappings simplify memory allocation and management. [Figure: the virtual address spaces (0 to N-1) of Process 1 and Process 2, each with pages VP 1, VP 2, ..., are translated to pages such as PP 2, PP 6, PP 8 in the physical address space (DRAM, 0 to M-1), e.g., read-only library code.]

2. VM as a tool for memory management. Memory allocation: each virtual page can be mapped to any physical page, and a virtual page can be stored in different physical pages at different times. Sharing code and data among processes: map virtual pages of both processes to the same physical page (here: PP 6, e.g., read-only library code).

3. Using VM to simplify linking and loading. Linking: each program has a similar virtual address space; code, stack, and shared libraries always start at the same addresses. Loading: execve() allocates virtual pages for the .text and .data sections, i.e., creates PTEs marked as invalid; the .text and .data sections are then copied, page by page, on demand by the virtual memory system. [Figure: the process address space, from bottom to top: unused region; read-only segment (.init, .text, .rodata) and read/write segment (.data, .bss), loaded from the executable file; run-time heap (created by malloc), whose top is marked by brk; memory-mapped region for shared libraries; user stack (created at runtime), whose top is marked by %rsp; kernel virtual memory, invisible to user code.]
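The same demand-paging machinery is what makes memory-mapped files cheap. A hedged POSIX sketch (the file path is made up): mmap() only creates page-table entries, and each page is faulted in from the file the first time it is touched.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/example.bin", O_RDONLY);   /* hypothetical input file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Creates PTEs for the whole region but loads nothing yet: each page is
           read in from the file by the VM system on first access (page fault). */
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i += 4096)   /* touch one byte per page */
            sum += p[i];
        printf("checksum of first bytes: %lu\n", sum);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }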

Problem 3: Protection. Process i and Process j both live in physical main memory and must not be able to corrupt each other's data. Problem 4: Sharing. Process i and Process j may want to share parts of physical main memory.

4. VM as a tool for memory protection. Extend PTEs with permission bits; the hardware checks these before remapping, and if they are violated the process is sent SIGSEGV (segmentation fault). Example: Process i: VP 0: SUP No, READ Yes, WRITE No, maps to PP 6; VP 1: SUP No, READ Yes, WRITE Yes, maps to PP 4; VP 2: SUP Yes, READ Yes, WRITE Yes, maps to PP 2. Process j: VP 0: SUP No, READ Yes, WRITE No, maps to PP 9; VP 1: SUP Yes, READ Yes, WRITE Yes, maps to PP 6; VP 2: SUP No, READ Yes, WRITE Yes, maps to PP 11. (Physical address space: PP 2, PP 4, PP 6, PP 8, PP 9, PP 11.)
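These permission bits are visible from user space. A small Linux/POSIX sketch (not from the slides): after mprotect() clears the write permission, the next store takes the page-fault path described above and the kernel delivers SIGSEGV.

    #include <signal.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void handler(int sig)
    {
        (void)sig;
        /* The kernel's page-fault path saw a write to a read-only page
           and delivered SIGSEGV, exactly as on the slide. */
        write(2, "caught SIGSEGV\n", 15);
        _exit(1);
    }

    int main(void)
    {
        signal(SIGSEGV, handler);

        size_t len = 4096;                         /* one page, assuming 4 KB pages */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        p[0] = 'x';                                /* fine: the page is writable */

        mprotect(p, len, PROT_READ);               /* drop the WRITE permission bit */
        p[0] = 'y';                                /* faults: handler runs, process exits */
        return 0;
    }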

Summary. Virtual memory is an important tool for: 1. Caching disk contents in RAM; 2. Performing memory management; 3. Simplifying linking and loading; 4. Protecting memory.

8.3: The address translation process

Address translation: page hit. 1) The processor sends the virtual address (VA) to the MMU. 2-3) The MMU fetches the PTE (via its address, PTEA) from the page table in cache/memory. 4) The MMU sends the physical address (PA) to cache/memory. 5) Cache/memory sends the data word to the processor.

Address translation: page fault. 1) The processor sends the virtual address to the MMU. 2-3) The MMU fetches the PTE from the page table in memory. 4) The valid bit is zero, so the MMU triggers a page fault exception. 5) The page fault handler identifies a victim page (and, if dirty, pages it out to disk). 6) The handler pages in the new page and updates the PTE in memory. 7) The handler returns to the original process, restarting the faulting instruction.

8.4: Translation lookaside buffers

Speeding up translation with a TLB. Page table entries (PTEs) are cached in L1 like any other memory words, but PTEs may be evicted by other data references, and even a PTE hit in L1 still costs an extra 1- or 2-cycle delay. Solution: the Translation Lookaside Buffer (TLB), a small hardware cache in the MMU that maps virtual page numbers to physical page numbers and contains complete page table entries for a small number of pages.

TLB hit. 1) The processor sends the virtual address to the MMU. 2) The MMU sends the VPN to the TLB. 3) The TLB returns the PTE. 4) The MMU sends the physical address to cache/memory. 5) Cache/memory returns the data word. A TLB hit eliminates a memory access.

TLB miss. 1) The processor sends the virtual address to the MMU. 2) The MMU sends the VPN to the TLB; no matching entry. 3) The MMU fetches the PTE (via PTEA) from the page table in cache/memory. 4) The PTE is installed in the TLB. 5) The MMU sends the physical address to cache/memory. 6) Cache/memory returns the data word. A TLB miss incurs an additional memory access (for the PTE). Fortunately, TLB misses are rare (we hope).
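A software model of such a set-associative TLB, as a sketch (sizes and field names are illustrative; 16 sets of 4 ways matches the Core i7 L1 d-TLB described later): the VPN is split into a set index (TLBI) and a tag (TLBT), and every way of the selected set is compared.

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_SETS 16                /* illustrative: 16 sets x 4 ways */
    #define TLB_WAYS 4

    typedef struct {
        bool     valid;
        uint64_t tag;                  /* TLBT: the VPN bits above the set index */
        uint64_t ppn;
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_SETS][TLB_WAYS];

    /* Split the VPN into TLBI (low bits) and TLBT (the rest), then search
       every way of set TLBI. Returns true on a TLB hit. */
    bool tlb_lookup(uint64_t vpn, uint64_t *ppn)
    {
        uint64_t tlbi = vpn % TLB_SETS;
        uint64_t tlbt = vpn / TLB_SETS;
        for (int way = 0; way < TLB_WAYS; way++) {
            if (tlb[tlbi][way].valid && tlb[tlbi][way].tag == tlbt) {
                *ppn = tlb[tlbi][way].ppn;
                return true;           /* hit: no page-table access needed */
            }
        }
        return false;                  /* miss: walk the page table, then refill this set */
    }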

8.5: A simple memory system example

A simple memory system. Addressing: 14-bit virtual addresses, 12-bit physical addresses, page size = 64 bytes. Virtual address bits 13..6 are the VPN (virtual page number) and bits 5..0 the VPO (virtual page offset); physical address bits 11..6 are the PPN (physical page number) and bits 5..0 the PPO (physical page offset).

Simple memory system: page table. Only the first 16 entries (out of 256) are shown. VPN 00: PPN 28, valid 1; VPN 01: invalid; VPN 02: PPN 33, valid 1; VPN 03: PPN 02, valid 1; VPN 04: invalid; VPN 05: PPN 16, valid 1; VPN 06: invalid; VPN 07: invalid; VPN 08: PPN 13, valid 1; VPN 09: PPN 17, valid 1; VPN 0A: PPN 09, valid 1; VPN 0B: invalid; VPN 0C: invalid; VPN 0D: PPN 2D, valid 1; VPN 0E: PPN 11, valid 1; VPN 0F: PPN 0D, valid 1.

Simple memory system: TLB. 16 entries, 4-way set associative (4 sets). The TLB index (TLBI) is the low 2 bits of the VPN (address bits 7..6); the TLB tag (TLBT) is the remaining 6 VPN bits (address bits 13..8). Contents as (tag, PPN, valid): Set 0: (03, -, 0), (09, 0D, 1), (00, -, 0), (07, 02, 1). Set 1: (03, 2D, 1), (02, -, 0), (04, -, 0), (0A, -, 0). Set 2: (02, -, 0), (08, -, 0), (06, -, 0), (03, -, 0). Set 3: (07, -, 0), (03, 0D, 1), (0A, 34, 1), (02, -, 0).

Simple memory system: cache. 16 lines, 4-byte block size, physically addressed, direct mapped. The cache offset (CO) is physical address bits 1..0, the cache index (CI) bits 5..2, and the cache tag (CT) bits 11..6. Contents as (Idx: tag, valid, bytes B0-B3): 0: 19, 1, 99 11 23 11; 1: 15, 0, -; 2: 1B, 1, 00 02 04 08; 3: 36, 0, -; 4: 32, 1, 43 6D 8F 09; 5: 0D, 1, 36 72 F0 1D; 6: 31, 0, -; 7: 16, 1, 11 C2 DF 03; 8: 24, 1, 3A 00 51 89; 9: 2D, 0, -; A: 2D, 1, 93 15 DA 3B; B: 0B, 0, -; C: 12, 0, -; D: 16, 1, 04 96 34 15; E: 13, 1, 83 77 1B D3; F: 14, 0, -.

Address translation example #1. Virtual address 0x03D4: VPN = 0x0F, TLBI = 3, TLBT = 0x03. TLB hit? Y. Page fault? N. PPN = 0x0D. Physical address = 0x354: CO = 0x0, CI = 0x5, CT = 0x0D. Cache hit? Y. Byte returned: 0x36.
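A quick C check of the field arithmetic in example #1, using this system's parameters (14-bit VA, 64-byte pages, 4-set 4-way TLB); the program is illustrative, not part of the slides:

    #include <stdio.h>

    int main(void)
    {
        unsigned va   = 0x03D4;        /* example #1 */
        unsigned vpo  = va & 0x3F;     /* low 6 bits: offset within a 64-byte page */
        unsigned vpn  = va >> 6;       /* remaining 8 bits: virtual page number */
        unsigned tlbi = vpn & 0x3;     /* 16-entry, 4-way TLB -> 4 sets -> 2 index bits */
        unsigned tlbt = vpn >> 2;      /* rest of the VPN is the TLB tag */
        printf("VPN=0x%02X VPO=0x%02X TLBI=%u TLBT=0x%02X\n", vpn, vpo, tlbi, tlbt);
        /* prints VPN=0x0F VPO=0x14 TLBI=3 TLBT=0x03, as worked out above */
        return 0;
    }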

Address translation example #2. Virtual address 0x0B8F: VPN = 0x2E, TLBI = 2, TLBT = 0x0B. TLB hit? N. Page fault? Y. PPN: TBD (the page must first be brought into memory); no physical address, cache access, or byte yet.

Address translation example #3. Virtual address 0x0020: VPN = 0x00, TLBI = 0, TLBT = 0x00. TLB hit? N. Page fault? N. PPN = 0x28. Physical address = 0xA20: CO = 0x0, CI = 0x8, CT = 0x28. Cache hit? N. Byte: fetched from memory.

Summary. Programmer's view of virtual memory: each process has its own private linear address space, which cannot be corrupted by other processes. System view of virtual memory: it uses memory efficiently by caching virtual memory pages (efficient only because of locality), simplifies memory management and programming, and simplifies protection by providing a convenient interpositioning point to check permissions.

8.6: Multi-level page tables

Terminology. "Virtual page" may refer to: a page-aligned region of virtual address space, and also the contents thereof (a different version of which might appear on disk). "Physical page": a page-aligned region of physical memory (RAM); "physical frame" = physical page. Alternative terminology: page = contents, frame = container; page size may differ from frame size (rarely).

Linear page table size. Given: 4 KB (2^12) page size, 48-bit address space, 8-byte PTEs. How big a page table do you need? Answer: 512 GB! 2^48 / 2^12 * 2^3 = 2^39 bytes.
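Spelled out, the arithmetic is:

    \frac{2^{48}\ \text{B of virtual address space}}{2^{12}\ \text{B/page}} = 2^{36}\ \text{PTEs},
    \qquad
    2^{36}\ \text{PTEs} \times 2^{3}\ \text{B/PTE} = 2^{39}\ \text{B} = 512\ \text{GB}.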

2-level page table hierarchy. Level 1 page table: PTE 0, PTE 1, PTE 2 (null), PTE 3 (null), PTE 4 (null), PTE 5 (null), PTE 6 (null), PTE 7 (null), PTE 8, followed by (1K - 9) null PTEs. Level 2 page tables: two tables with PTE 0 ... PTE 1023 cover the 2K allocated VM pages for code and data (VP 0 ... VP 1023 and VP 1024 ... VP 2047); then a gap of 6K unallocated VM pages needs no level 2 table at all; finally a table with 1023 null PTEs and one PTE 1023 covers 1023 unallocated pages plus the single allocated VM page for the stack (VP 9215).

Translating with a k-level page table. The n-bit virtual address is split into VPN 1, VPN 2, ..., VPN k and the p-bit VPO. VPN 1 indexes the level 1 page table, whose entry points to a level 2 page table, and so on; the level k page table entry supplies the PPN. The m-bit physical address is the PPN concatenated with the PPO (= VPO).

8.7: Case study of the Core i7 virtual memory system

A typical Core i7 processor. Processor package with 4 cores. Each core: registers, instruction fetch, MMU (address translation), L1 d-cache (32 KB, 8-way), L1 i-cache (32 KB, 8-way), L1 d-TLB (64 entries, 4-way), L1 i-TLB (128 entries, 4-way), L2 unified cache (256 KB, 8-way), L2 unified TLB (512 entries, 4-way). Shared by all cores: the QuickPath interconnect (4 links @ 25.6 GB/s, 102.4 GB/s total) to the other cores and the I/O bridge, the L3 unified cache (8 MB, 16-way), and the DDR3 memory controller (3 x 64 bit @ 10.66 GB/s, 32 GB/s total) to main memory.

Core i7 memory system* (*a bit simplified). [Figure: the 48-bit virtual address (VA) = 36-bit VPN + 12-bit VPO. The VPN is split into a 32-bit TLBT and a 4-bit TLBI to index the L1 TLB (16 sets, 4 entries/set). On a TLB miss, the page-table walk starts at CR3 and uses the four 9-bit fields VPN1-VPN4 to reach the PTE. The resulting 40-bit PPN plus the 12-bit PPO form the physical address (PA), split into 40-bit CT, 6-bit CI, and 6-bit CO for the L1 d-cache (64 sets, 8 lines/set); on an L1 miss, L2, L3, and main memory are accessed; the 32/64-bit result returns to the CPU.]

More abbreviations. Components of the virtual address (VA): TLBI: TLB index; TLBT: TLB tag; VPO: virtual page offset; VPN: virtual page number. Components of the physical address (PA): PPO: physical page offset (same as VPO); PPN: physical page number; CO: byte offset within the cache line; CI: cache index; CT: cache tag.

x86-64 paging. 48-bit virtual address: 256 terabytes (TB); not yet ready for the full 64 bits: nobody can buy that much DRAM yet, the mapping tables would be huge, and a multi-level array map may not be the right data structure. 52-bit physical address = 40 bits for the PPN; requires 64-bit table entries. Keep the older x86 4 KB page size (including for page tables): (4096 bytes per PT) / (8 bytes per PTE) = 512 entries per page.

Core i7 TLB translation. [Figure: the same Core i7 memory system diagram as above, highlighting the TLB path: the 36-bit VPN is split into a 32-bit TLBT and a 4-bit TLBI to index the L1 TLB (16 sets, 4 entries/set); on a hit the PPN comes from the TLB, on a miss from the page-table walk starting at CR3.]

Core i7 TLB. TLB entry (not all documented, so this is speculative): a 40-bit PPN, a 32-bit TLBTag, and the bits V, G, S, W, D. V: indicates a valid (1) or invalid (0) TLB entry. TLBTag: disambiguates entries cached in the same set. PPN: translation of the address indicated by index & tag. G: page is global according to the PDE/PTE. S: page is supervisor-only according to the PDE/PTE. W: page is writable according to the PDE/PTE. D: the PTE has already been marked dirty (once is enough). Structure of the data TLB: 16 sets, 4 entries/set (set 0 ... set 15).

Translating with the i7 TLB. 1. Partition the 36-bit VPN (virtual address = 36-bit VPN + 12-bit VPO) into a 32-bit TLBT and a 4-bit TLBI. 2. Is the PTE for this VPN cached in set TLBI? 3. Yes: check permissions, build the physical address (40-bit PPN + 12-bit PPO). 4. No (TLB miss or partial hit): read the PTE (and the PDE if not cached) from memory via the page-table translation and build the physical address.

Core i7 page translation. [Figure: the same diagram, highlighting the page-table walk: on a TLB miss, CR3 and the four 9-bit fields VPN1-VPN4 select entries level by level until the 40-bit PPN is found.]

x86-64 paging. The virtual address is split into four 9-bit fields VPN1, VPN2, VPN3, VPN4 and a 12-bit VPO. The base register (CR3) points to the Page Map Table, indexed by VPN1 (each PML4E covers a 512 GB region); its entry points to a Page Directory Pointer Table, indexed by VPN2 (each PDPE covers 1 GB); then a Page Directory Table, indexed by VPN3 (each PDE covers 2 MB); then a Page Table, indexed by VPN4 (each PTE covers 4 KB). The PTE supplies the 40-bit PPN, which together with the 12-bit PPO forms the physical address.
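A minimal C sketch of this four-level walk (illustrative, not the hardware or Linux code; it assumes page-table memory is identity-mapped and ignores large pages and permission bits):

    #include <stdbool.h>
    #include <stdint.h>

    #define LEVELS     4
    #define PAGE_SHIFT 12
    #define ENTRY_BITS 9                       /* 512 eight-byte entries = one 4 KB table */
    #define PTE_P      (1ULL << 0)             /* present bit */
    #define PTE_ADDR   0x000FFFFFFFFFF000ULL   /* bits 51..12: base of next table / page */

    /* Illustrative: read page-table memory, assuming it is identity-mapped. */
    static inline uint64_t *phys_to_ptr(uint64_t pa) { return (uint64_t *)(uintptr_t)pa; }

    /* Walk PML4 -> PDPT -> PD -> PT as in the diagram above.
       Returns false if any level has P = 0 (page fault). */
    bool walk(uint64_t cr3, uint64_t va, uint64_t *pa)
    {
        uint64_t table = cr3 & PTE_ADDR;
        for (int level = 0; level < LEVELS; level++) {
            int shift = PAGE_SHIFT + ENTRY_BITS * (LEVELS - 1 - level);  /* 39, 30, 21, 12 */
            uint64_t index = (va >> shift) & 0x1FF;                      /* VPN1..VPN4 */
            uint64_t entry = phys_to_ptr(table)[index];
            if (!(entry & PTE_P))
                return false;                     /* not present: page fault */
            table = entry & PTE_ADDR;             /* next table, or final PPN << 12 */
        }
        *pa = table | (va & ((1ULL << PAGE_SHIFT) - 1));  /* append the 12-bit offset */
        return true;
    }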

Page-Map Level 4 entry. Page-Directory-Pointer Base Address: the 40 most significant bits of the physical page table address (forces page tables to be 4 KB aligned). Avail: bits available for system programmers. A: accessed (set by the MMU on reads and writes, cleared by software). PCD: cache disabled or enabled. PWT: write-through or write-back cache policy for this page table. U/S: user or supervisor mode access. R/W: read-only or read-write access. P: page table is present in memory (1) or not (0).

Page Directory Pointer Entry (level 3). Page-Directory Base Address: the 40 most significant bits of the physical page table address (forces page tables to be 4 KB aligned); the Avail, A, PCD, PWT, U/S, R/W, and P bits are as in the PML4 entry above.

Page Directory Entry (level 2). Page table physical base address: the 40 most significant bits of the physical page table address (forces page tables to be 4 KB aligned); the Avail, A, PCD, PWT, U/S, R/W, and P bits are as above.

Page Table Entry (level 1). Physical page base address: the 40 most significant bits of the physical page address (forces pages to be 4 KB aligned). Avail: available for system programmers. G: global page (don't evict from the TLB on a task switch). PAT: Page-Attribute Table. D: dirty (set by the MMU on writes). A: accessed (set by the MMU on reads and writes). PCD: cache disabled or enabled. PWT: write-through or write-back cache policy for this page. U/S: user/supervisor. R/W: read/write. P: page is present in physical memory (1) or not (0).
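The flag bits above as C masks; the bit positions follow the standard x86-64 layout, but the macro names are illustrative:

    #include <stdint.h>

    /* Low flag bits of a level-1 x86-64 PTE, as described above. */
    #define PTE_P    (1ULL << 0)   /* present in physical memory */
    #define PTE_RW   (1ULL << 1)   /* read/write (0 = read-only) */
    #define PTE_US   (1ULL << 2)   /* user/supervisor */
    #define PTE_PWT  (1ULL << 3)   /* write-through cache policy */
    #define PTE_PCD  (1ULL << 4)   /* cache disabled */
    #define PTE_A    (1ULL << 5)   /* accessed (set by the MMU) */
    #define PTE_D    (1ULL << 6)   /* dirty (set by the MMU on writes) */
    #define PTE_PAT  (1ULL << 7)   /* Page-Attribute Table */
    #define PTE_G    (1ULL << 8)   /* global: keep in the TLB across task switches */

    /* Physical page base address: bits 51..12 (4 KB aligned). */
    #define PTE_BASE_MASK 0x000FFFFFFFFFF000ULL

    static inline int pte_present(uint64_t pte)  { return (pte & PTE_P)  != 0; }
    static inline int pte_writable(uint64_t pte) { return (pte & PTE_RW) != 0; }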

x86-64 paging: page tables and page present. All levels have p=1: PML4E (p=1), PDPE (p=1), PDE (p=1), PTE (p=1), and the data page is in memory. MMU action: build the physical address (PPN + PPO) and fetch the data word. OS action: none.

x86-64 paging: page tables present, page missing. PML4E (p=1), PDPE (p=1), PDE (p=1), but PTE (p=0): the data page is on disk. MMU action: raise a page fault exception. The handler receives as arguments: the %rip of the fault, the VA of the fault, and the fault type (non-present page, or page-level protection violation).

x86-64 paging: handling the fault. OS action: check that the VA is legal; read the PTE by walking the page tables; find a free physical page (swapping out its current contents if necessary); read the VP from disk into that physical page; adjust the PTE to point to the PP and set p=1; restart the faulting instruction.

x86-64 paging: page table and page both missing. The page table itself is on disk as well as the data page. MMU action: page fault. OS action: swap in the page table, restart the faulting instruction by returning from the handler, and fall into the previous case. This can require multiple disk reads.

x86-64 paging: page table missing, page present. This combination introduces a consistency issue: potentially every page-out would require an update of the on-disk page table. Linux disallows it: if a page table is swapped out, then its data pages are swapped out too.

8.8: What about the cache?

Core i7 cache access. [Figure: the same diagram, highlighting the cache path: the physical address is split into a 40-bit CT, 6-bit CI, and 6-bit CO to access the L1 d-cache (64 sets, 8 lines/set); on an L1 miss, L2, L3, and main memory are accessed.]

L1 cache access. Partition the physical address into CO (6 bits), CI (6 bits), and CT (40 bits). Use CT to determine whether the line containing the word at address PA is cached in set CI of the L1 cache (64 sets, 8 lines/set). No: check L2 (then L3, then DRAM). Yes: extract the 64-bit data word at byte offset CO and return it to the processor.

A much older (P6) memory design. [Figure: 32-bit virtual address = 20-bit VPN + 12-bit VPO; the VPN splits into a 16-bit TLBT and a 4-bit TLBI for the TLB (16 sets, 4 entries/set); on a TLB miss, a two-level page walk (PDBR, PDE, PTE) with two 10-bit fields VPN1 and VPN2 produces the PPN; the 32-bit physical address = 20-bit PPN + 12-bit PPO splits into a 20-bit CT, 7-bit CI, and 5-bit CO for the L1 cache (128 sets, 4 lines/set); 32-bit result.] Same size cache (just less associative). Why?

Speeding up L1 access. Physical address (PA): 40-bit CT, 6-bit CI, 6-bit CO; virtual address (VA): 36-bit VPN, 12-bit VPO. Observation: the bits that determine CI are identical in the virtual and physical address (address translation does not change them), so we can index into the cache while address translation is taking place. Generally we hit in the TLB, so the PPN bits (the CT bits) are available right afterwards for the tag check. This is a virtually indexed, physically tagged cache, and the cache is carefully sized to make it possible.
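The sizing argument behind the last point, spelled out for the i7's L1 d-cache:

    \text{CI} + \text{CO} = 6 + 6 = 12\ \text{bits} = \text{VPO width (4 KB pages)},
    \qquad
    2^{6}\ \text{sets} \times 64\ \text{B/line} = 4\ \text{KB} = \text{page size},
    \qquad
    \frac{32\ \text{KB}}{4\ \text{KB}} = 8\ \text{ways}.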

8.9: Large pages

x86-64 paging (recap of the 4 KB case). [Figure: the four-level walk from the base register: Page Map Table (PML4E, 512 GB region per entry), Page Directory Pointer Table (PDPE, 1 GB per entry), Page Directory Table (PDE, 2 MB per entry), Page Table (PTE, 4 KB per entry), indexed by the 9-bit fields VPN1-VPN4; the 40-bit PPN plus the 12-bit VPO/PPO forms the physical address.]

x86-64 large pages. The virtual address is split into the 9-bit fields VPN1-VPN4 and the 12-bit VPO as before, but the walk stops one level early: Page Map Table (PML4E, 512 GB region per entry), Page Directory Pointer Table (PDPE, 1 GB per entry), Page Directory Table (PDE, 2 MB per entry). With 2 MB pages, the low 21 bits of the address (VPN4 plus VPO) form the page offset (PPO), and the PDE supplies the PPN directly.

Page Directory Entry (level 2), recap: page table physical base address (the 40 most significant bits, forcing 4 KB alignment) plus the Avail, A, PCD, PWT, U/S, R/W, and P bits described earlier.

PDE for large pages. (The slide shows the PDE field layout used when the entry maps a 2 MB page directly.)

Huge pages. Same virtual address split, but the walk stops at the PDPE: Page Map Table (PML4E, 512 GB region per entry), Page Directory Pointer Table (PDPE, 1 GB per entry). With 1 GB pages, the low 30 bits of the address (VPN3 + VPN4 + VPO) form the page offset (PPO), and the PDPE supplies the PPN directly.

PDPE (level 3) for huge pages. (The slide shows the corresponding PDPE field layout.)

Large and huge pages. They terminate the page table walk early: 2 MB and 1 GB pages. They simplify address translation and are useful for programs with very large, contiguous working sets, since they reduce compulsory TLB misses. But they are hard to use: on Linux, via hugetlbfs support, or via libhugetlbfs and its {m,c,re}alloc replacements.
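On Linux, one way to get a 2 MB page without libhugetlbfs is mmap() with MAP_HUGETLB; a sketch (it fails unless huge pages have been reserved, e.g. via /proc/sys/vm/nr_hugepages):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 2 * 1024 * 1024;        /* one 2 MB large page */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {               /* no huge pages reserved, or no support */
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }
        memset(p, 0, len);                   /* the whole 2 MB is covered by one TLB entry */
        printf("got a 2 MB page at %p\n", p);
        munmap(p, len);
        return 0;
    }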

8.10: Optimizing for the TLB

Buffering: example MMM (matrix-matrix multiplication), blocked for the cache. Block size B x B; each block update is c = a * b + c. Assume blocking for the L2 cache: say 512 KB = 2^19 B = 2^16 doubles = C; 3B^2 <= C means B is roughly 150.

Buffering: example MMM. But look at one iteration of c = a * b + c with block size B = 150, and assume each matrix row is larger than 4 KB = 512 doubles. Consequence: each row of the block is used O(B) times, but with O(B^2) operations in between; each row lies on a different page, and there are more rows than TLB entries: TLB thrashing. Solution: buffering, i.e., copy the block to contiguous memory: an O(B^2) cost for O(B^3) operations.
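A sketch of the buffering (packing) idea for a B x B block of a row-major n x n matrix of doubles (function and variable names are illustrative): the O(B^2) copy puts the block on a handful of contiguous pages, so the O(B^3) multiply reuses a few TLB entries instead of touching B different rows on B different pages.

    #include <stddef.h>

    /* Copy the B x B block of 'a' that starts at (row, col) into the contiguous
       buffer 'buf' (B*B doubles). 'a' is row-major with leading dimension n, so
       each of the B source rows may sit on a different 4 KB page; this copy
       costs O(B^2) accesses. */
    static void pack_block(const double *a, size_t n, size_t row, size_t col,
                           size_t B, double *buf)
    {
        for (size_t i = 0; i < B; i++)
            for (size_t j = 0; j < B; j++)
                buf[i * B + j] = a[(row + i) * n + (col + j)];
    }

    /* c_buf += a_buf * b_buf on packed B x B blocks. All three buffers are
       contiguous, so the O(B^3) inner loops reuse a handful of TLB entries. */
    static void block_mmm(const double *a_buf, const double *b_buf,
                          double *c_buf, size_t B)
    {
        for (size_t i = 0; i < B; i++)
            for (size_t k = 0; k < B; k++)
                for (size_t j = 0; j < B; j++)
                    c_buf[i * B + j] += a_buf[i * B + k] * b_buf[k * B + j];
    }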