Memory Addressing
- Motivation: why not direct physical memory access?
- Address translation with pages
- Optimizing translation: translation lookaside buffer
- Extra benefits: sharing and protection

Memory as a contiguous array of bytes is a lie! Why?

[Figure: CPU sends a physical address (PA) directly to main memory and receives data back.]

Direct physical addressing works only for small embedded systems without processes: elevators, microwaves, radio-powered devices, ...

Problem: Capacity
64-bit addresses can address several exabytes (2^64 = 18,446,744,073,709,551,616 bytes), but main memory offers only a few gigabytes. (Actually, physical memory is smaller than that dot compared to virtual memory.)

Problem: Memory Management
Many processes (Process 1 ... Process n) each need their own stack, heap, .text, and .data. What goes where??
Also: context switches would have to swap out the entire memory contents. Isn't that expensive?

What we want: a virtual address space per process, with many processes.
Problem: Protection
How do we keep Process i from reading or writing Process j's memory?

Problem: Sharing
How can Process i and Process j share memory when they want to?

Solution: Virtual Memory (address indirection)
Each process issues virtual addresses; a virtual-to-physical mapping translates them to physical addresses in main memory.
- Private virtual address space per process.
- Single physical address space managed by OS/hardware.

Tangent: Indirection everywhere
Direct naming: "x" names a Thing directly. Indirect naming: "x" names whatever x currently maps to — so what if we move the Thing? Examples: pointers, constants, procedural abstraction, Domain Name Service (DNS), Dynamic Host Configuration Protocol (DHCP), phone numbers, call centers, snail-mail forwarding.

"Any problem in computer science can be solved by adding another level of indirection."
— David Wheeler, inventor of the subroutine (a.k.a. procedure), or Butler Lampson

Another Wheeler quote? "Compatibility means deliberately repeating other people's mistakes."
Virtual Addressing
The Memory Management Unit (MMU) on the CPU chip translates each virtual address (VA) to a physical address (PA) before it reaches main memory. Physical addresses are invisible to programs.

[Figure: CPU sends a virtual address to the MMU; the MMU emits the physical address used to index main memory, which returns data.]

Page-based Mapping
- Both address spaces are divided into fixed-size, aligned pages; page size is a power of two.
- Map virtual pages onto physical pages.
- Some virtual pages do not fit! Where are they stored?

Virtual Memory: physical memory as a cache for disk (not drawn to scale)
- The virtual address space is usually much larger than the physical address space.
- A virtual page lives in physical memory, on disk (if used), or nowhere (if not yet used).

Memory hierarchy (Intel Core Duo era, approximate)
Registers → SRAM L1 I-cache and L1 D-cache (tens of KB) → L2 unified cache (a few MB) → main memory (a few GB) → disk. Throughput shrinks and latency grows at each level: caches cost a few cycles, main memory on the order of a hundred cycles, disk millions of cycles.
- Cache miss penalty (latency): tens of times a cache hit.
- Memory miss penalty (latency): thousands of times worse — the page must come from disk.
Disk = solid-state "flash" memory or a spinning magnetic platter.
Design for a Slow Disk: Exploit Locality
Because disk is so slow, virtual memory behaves like an extreme cache:
- Fully associative: store any virtual page in any physical page.
- A large mapping function, kept in memory (with overflow on disk).
- Large page size: usually 4 KB, up to several MB.
- Sophisticated replacement policy: not just hardware; the OS participates.
- Write-back, not write-through.

Address Translation
[Figure: CPU chip sends a virtual address (VA) to the MMU — what happens in there? — and the MMU emits a physical address (PA) to main memory, which returns data.]

Page Table
An array of page table entries (PTEs) mapping each virtual page to where it is stored: a physical page number or a disk address. The page table is memory resident, managed by hardware (MMU) and the OS. Valid PTEs point to physical pages (PP) in memory; others point into swap space on disk.

Address Translation with a Page Table
- Page table base register (PTBR): base address of the current process's page table.
- A valid bit records whether the virtual page is mapped to a physical page.
- Virtual address (VA) = virtual page number (VPN) ++ virtual page offset (VPO).
- The page table maps the VPN to a physical page number (PPN); physical address (PA) = PPN ++ physical page offset (PPO), where PPO = VPO.

How many page tables are in the system? (One per process.)
Page Hit: virtual page is in physical memory
[Figure: the PTE for the requested virtual page is valid and points to a physical page (PP); other virtual pages (VP) live in swap space on disk.]

Page Fault: exceptional control flow
The process accessed a virtual address in a page that is not in physical memory.
- User code executes a movl instruction.
- Exception: page fault — control transfers to the OS exception handler.
- The handler creates/loads the page into memory, then returns to the faulting instruction: the movl is executed again!

Page Fault: 1. The needed page is not in physical memory.
[Figure: the PTE for the requested virtual page points into swap space on disk.] What now? The OS handles the fault.

Page Fault: 2. The OS evicts another page.
[Figure: a victim physical page is "paged out" to swap space, freeing a physical page.]
Page Fault: 3. The OS loads the needed page.
[Figure: the needed virtual page is "paged in" from swap space to the freed physical page.] Finally: re-execute the faulting instruction — page hit!

Terminology
- context switch: switch control between processes on the same CPU.
- page in: move pages of virtual memory from disk to physical memory.
- page out: move pages of virtual memory from physical memory to disk.
- thrash: the total working-set size of the processes is larger than physical memory, so most time is spent paging in and out instead of doing useful computation.
(I find all these terms useful when talking to other computer scientists about my brain.)

Address Translation: Page Hit
1) Processor sends virtual address to MMU (memory management unit)
2-3) MMU fetches PTE from page table in cache/memory
4) MMU sends physical address to cache/memory
5) Cache/memory sends data word to processor

Address Translation: Page Fault
1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in cache/memory
4) Valid bit is zero, so MMU triggers page fault exception
5) Handler identifies victim page (and, if dirty, pages it out to disk)
6) Handler pages in new page and updates PTE in memory
7) Handler returns to original process, restarting the faulting instruction
Translation sounds slow!
Each memory access becomes two accesses: load the PTE, then access the requested address. PTEs may be cached, but they may also be evicted, and even an L1 cache hit still costs a few cycles. What can we do to make this faster?

Translation Lookaside Buffer (TLB)
- A small hardware cache in the MMU just for page table entries.
- Modern Intel processors: on the order of 128 or 256 entries in the TLB.
- Much faster than a page table lookup in cache/memory.
- In the running for "classiest name of a thing in CS."

TLB Hit
[Figure: MMU sends the VPN to the TLB, which returns the PTE directly; the MMU then sends the PA to cache/memory, which returns data.] A TLB hit eliminates a memory access.

TLB Miss
[Figure: the TLB has no entry for the VPN, so the MMU fetches the PTE from the page table in cache/memory before forming the PA.] A TLB miss incurs an additional memory access (the PTE). Fortunately, TLB misses are rare.

Does a TLB miss require disk access?
Simple Memory System Example (small)
Addressing:
- 14-bit virtual addresses: VPN (bits 13-6) and VPO (bits 5-0).
- 12-bit physical addresses: PPN (bits 11-6) and PPO (bits 5-0).
- Page size = 64 bytes (6 offset bits).

Simple Memory System: Page Table
Only the first 16 entries are shown (out of 2^8 = 256), each mapping a VPN to a PPN with a valid bit. [Table contents lost in extraction.] What about a real address space? Read more in the book.

Simple Memory System: TLB
- 16 entries, 4-way set associative.
- The VPN splits into a TLB tag (high bits) and a TLB index (low bits). The TLB ignores the page offset. Why?

Simple Memory System: Cache
- 16 lines, 4-byte block size; physically addressed; direct mapped.
- The physical address splits into cache tag, cache index, and cache offset.
[Tables: TLB set contents and cache line contents lost in extraction.]
Address Translation Examples
Each worked example runs a virtual address (e.g., 0x0b8f) through the full pipeline:
1) Split the VA into virtual page number (VPN) and virtual page offset (VPO).
2) Split the VPN into TLB tag and TLB index. TLB hit? Page fault? If neither a hit nor a fault, read the physical page number from the page table.
3) Form the PA: physical page number concatenated with the page offset (PPO = VPO).
4) Split the PA into cache tag, cache index, and cache offset. Cache hit? Which byte?
[The slides' bit diagrams and TLB/page-table/cache lookup tables for examples #1-#3 were lost in extraction.]
[Further worked examples; the slides' lookup tables were lost in extraction.]

Easy address-space allocation
- Each process needs a private, contiguous address space.
- Fully associative mapping: pages can live anywhere in physical memory.

Easy cached access to storage
- Good temporal locality + small working set = mostly page hits.
- Great if working sets fit in physical memory, even if total data exceeds physical memory.
- But if the combined working sets of all processes exceed physical memory: thrashing — a performance meltdown where the CPU is always waiting or paging.

Full indirection quote: "Every problem in computer science can be solved by adding another level of indirection, but that usually will create another problem."
Easy protection and sharing
- Protection: all user accesses go through translation, so it is impossible to access physical memory not mapped in your virtual address space.
- Sharing: map virtual pages in separate address spaces to the same physical page (e.g., read-only library code).

Easy protection with permission bits
Extend page table entries with permission bits (READ, WRITE, EXEC). The MMU checks the permission bits on every memory access; if the access is not allowed, it raises an exception.

Summary
Programmer's view of virtual memory:
- Each process has its own private linear address space.
- It cannot be corrupted by other processes.
System view of virtual memory:
- Uses memory efficiently by caching virtual memory pages — efficient only because of locality.
- Simplifies memory management and sharing.
- Simplifies protection: easy to interpose and check permissions.

Memory System Summary
L1/L2/L3 cache:
- Purely a speed-up technique.
- "Invisible" to application programmer and OS; implemented totally in hardware.
Virtual memory: supports processes and memory management.
- Operating system (software): allocates physical memory; maintains page tables and memory metadata; handles exceptions and fills the tables used by hardware.
- Hardware: translates virtual addresses via mapping tables and enforces permissions; accelerates mapping via a translation cache (the TLB).
Memory System Summary (continued)
L1/L2/L3 cache:
- Controlled by hardware; the programmer cannot control it.
- But the programmer can write code in a way that takes advantage of it.
Virtual memory:
- Controlled by OS and hardware; the programmer cannot control the mapping to physical memory.
- But the programmer can control sharing and some protection via OS system calls.