CS 153 Design of Operating Systems, Spring 18
Lecture 6: Paging
Instructor: Chengyu Song
Slide contributions from Nael Abu-Ghazaleh, Harsha Madhyastha and Zhiyun Qian
Some slides modified from originals by Dave O'Hallaron
Recap: Address Spaces
- Linear address space: ordered set of contiguous non-negative integer addresses: {0, 1, 2, 3, ...}
- Virtual address space: set of N = 2^n virtual addresses {0, 1, 2, 3, ..., N-1}
- Physical address space: set of M = 2^m physical addresses {0, 1, 2, 3, ..., M-1}
- Clean distinction between data (bytes) and their attributes (addresses)
- Each object can now have multiple addresses
- Every byte in main memory: one physical address, one (or more) virtual addresses
CS153 Lecture 6 Paging 2
Recap: Paging
- Paging solves the external fragmentation problem by using fixed-sized units in both physical and virtual memory
(Figure: virtual memory pages 1 through N mapped onto physical memory frames.)
Segmentation and Paging
- Can combine segmentation and paging; the x86 supports segments and paging
- Use segments to manage logically related units
  » Module, procedure, stack, file, data, etc.
  » Segments vary in size, but are usually large (multiple pages)
- Use pages to partition segments into fixed-size chunks
  » Makes segments easier to manage within physical memory
  » Segments become pageable: rather than moving segments into and out of memory, just move page portions of a segment
  » Need to allocate page table entries only for those pieces of the segments that have themselves been allocated
- Tends to be complex
Managing Page Tables
- Last lecture we computed the size of the page table for a 32-bit address space w/ 4K pages to be 4MB
  » This is far too much overhead for each process
- How can we reduce this overhead?
  » Observation: only need to map the portion of the address space actually being used (tiny fraction of entire addr space)
- How do we only map what is being used?
  » Can dynamically extend page table
  » Use another level of indirection: two-level page tables
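The 4MB figure above can be checked with a few lines of arithmetic. A minimal sketch (the function name and defaults are my own):

```python
# Size of a single-level page table: one PTE per virtual page.
def page_table_size(addr_bits=32, page_size=4096, pte_size=4):
    num_pages = 2**addr_bits // page_size   # 2^32 / 2^12 = 2^20 entries
    return num_pages * pte_size             # 2^20 entries * 4 B = 4 MB

print(page_table_size())  # 4194304 bytes = 4 MB, paid per process
```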
Two-Level Page Tables
- Virtual addresses (VAs) have three parts:
  » Master page number, secondary page number, and offset
- Master page table maps VAs to secondary page table
- Secondary page table maps page number to physical page
- Offset indicates where in physical page address is located
One-Level Page Lookups
(Figure: the page number part of the virtual address indexes the page table to find the page frame; the offset then selects the byte within that frame of physical memory.)
Two-Level Page Lookups
(Figure: the master page number indexes the master page table to find a secondary page table; the secondary page number indexes that table to find the page frame; the offset selects the byte within the frame.)
Example
- How many bits in offset? 4K = 12 bits
- 4KB pages, 4 bytes/PTE
- Want master page table in one page: 4K/4 bytes = 1K entries
- Hence, 1K secondary page tables
- How many bits?
  » Master page number = 10 bits (because 1K entries)
  » Offset = 12 bits
  » Secondary page number = 32 - 10 - 12 = 10 bits
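The 10/10/12 split worked out above can be expressed as shifts and masks. A sketch, with helper names of my own choosing:

```python
# Split a 32-bit virtual address into master/secondary/offset fields.
OFFSET_BITS = 12      # 4KB pages
SECONDARY_BITS = 10   # 1K entries per secondary table
MASTER_BITS = 10      # 32 - 10 - 12

def split_va(va):
    offset = va & ((1 << OFFSET_BITS) - 1)
    secondary = (va >> OFFSET_BITS) & ((1 << SECONDARY_BITS) - 1)
    master = va >> (OFFSET_BITS + SECONDARY_BITS)
    return master, secondary, offset

print(split_va(0x00403004))  # (1, 3, 4)
```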
A Two-Level Page Table Hierarchy
(Figure: 32-bit addresses, 4KB pages, 4-byte PTEs. In the level 1 page table, PTE 0 and PTE 1 point to level 2 page tables mapping VP 0 ... VP 2047, the 2K allocated VM pages for code and data. PTEs 2-7 are null, covering a gap of 6K unallocated VM pages. PTE 8 points to a level 2 page table containing 1023 null PTEs and a final PTE mapping VP 9215, the allocated VM page for the stack. The remaining (1K - 9) level 1 PTEs are null.)
Two-level Paging
- Two-level paging reduces the memory overhead of paging
- Only need one master page table and one secondary page table when a process begins
- As the address space grows, allocate more secondary page tables and add PTEs to the master page table
- What problem remains? Hint: what about memory lookups?
Efficient Translations
- Recall that our original page table scheme doubled the latency of doing memory lookups
  » One lookup into the page table, another to fetch the data
- Now two-level page tables triple the latency!
  » Two lookups into the page tables, a third to fetch the data
  » And this assumes the page table is in memory
- How can we use paging but also have lookups cost about the same as fetching from memory?
  » Cache translations in hardware
  » Translation Lookaside Buffer (TLB)
  » TLB managed by Memory Management Unit (MMU)
TLBs
- Translation Lookaside Buffers
  » Translate virtual page #s into PTEs (not physical addrs)
  » Can be done in a single machine cycle
- TLBs implemented in hardware
  » Fully associative cache (all entries looked up in parallel)
  » Keys are virtual page numbers
  » Values are PTEs (entries from page tables)
  » With PTE + offset, can directly calculate physical address
- Why does this help?
  » Exploits locality: processes use only a handful of pages at a time
  » 16-48 entries/pages (64-192K)
  » Only need those pages to be mapped
- Hit rates are therefore very important
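The key idea, a fully associative map from VPN to PTE consulted before any page-table walk, can be sketched in a few lines (a toy model, not real hardware; the class and method names are mine):

```python
# Toy fully associative TLB keyed by VPN.
class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}             # VPN -> PTE (modeled here as just a PPN)

    def lookup(self, vpn):
        return self.entries.get(vpn)  # None models a TLB miss

    def insert(self, vpn, pte):
        if len(self.entries) >= self.capacity:
            # Evict an arbitrary entry; a real TLB uses a replacement policy.
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = pte

tlb = TLB()
tlb.insert(0x0F, 0x0D)
print(tlb.lookup(0x0F))  # 13 (0x0D): a hit, no page-table walk needed
```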
TLB Hit
(Figure: (1) CPU sends the virtual address to the MMU; (2) MMU sends the VPN to the TLB; (3) TLB returns the PTE; (4) MMU sends the physical address to cache/memory; (5) data returns to the CPU.)
A TLB hit eliminates one or more memory accesses
TLB Miss
(Figure: (1) CPU sends the virtual address to the MMU; (2) MMU sends the VPN to the TLB, which misses; (3) MMU sends the PTE address to cache/memory; (4) the PTE returns to the MMU; (5) MMU sends the physical address to cache/memory; (6) data returns to the CPU.)
A TLB miss incurs an additional memory access (the PTE). Fortunately, TLB misses are rare. Why?
Managing TLBs
- Hit rate: address translations for most instructions are handled using the TLB
  » >99% of translations, but there are misses (TLB miss)
- Who places translations into the TLB (loads the TLB)?
  » Hardware (Memory Management Unit) [x86]: knows where the page tables are in main memory; OS maintains the tables, HW accesses them directly; tables have to be in a HW-defined format (inflexible)
  » Software-loaded TLB (OS) [MIPS, Alpha, Sparc, PowerPC]: TLB faults to the OS, the OS finds the appropriate PTE and loads it into the TLB; must be fast (but still 20-200 cycles); CPU ISA has instructions for manipulating the TLB; tables can be in any format convenient for the OS (flexible)
Managing TLBs (2)
- OS ensures that the TLB and page tables are consistent
  » When it changes the protection bits of a PTE, it needs to invalidate the PTE if it is in the TLB (special hardware instruction)
- Reload TLB on a process context switch
  » Invalidate all entries
  » Why? Who does it?
- When the TLB misses and a new PTE has to be loaded, a cached PTE must be evicted
  » Choosing the PTE to evict is called the TLB replacement policy
  » Implemented in hardware, often simple, e.g., Least Recently Used (LRU)
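The LRU policy mentioned above can be sketched with an ordered map (illustrative only; a real TLB implements this, or an approximation of it, in hardware):

```python
from collections import OrderedDict

# Sketch of an LRU replacement policy for a small TLB.
class LruTLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()       # VPN -> PTE, oldest first

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, pte):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = pte

tlb = LruTLB(capacity=2)
tlb.insert(1, 'A'); tlb.insert(2, 'B')
tlb.lookup(1)          # touch VPN 1, so VPN 2 becomes least recently used
tlb.insert(3, 'C')     # evicts VPN 2
print(tlb.lookup(2))   # None: evicted
```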
Simple Memory System Example
- Addressing
  » 14-bit virtual addresses
  » 12-bit physical addresses
  » Page size = 64 bytes
- Virtual address: VPN (Virtual Page Number) = bits 13-6, VPO (Virtual Page Offset) = bits 5-0
- Physical address: PPN (Physical Page Number) = bits 11-6, PPO (Physical Page Offset) = bits 5-0
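With 64-byte pages the page offset is the low 6 bits, so extracting the fields above is a shift and a mask. A sketch (the helper names are mine):

```python
# Field extraction for the 14-bit VA / 12-bit PA example system.
PAGE_BITS = 6   # 64-byte pages

def va_fields(va):                       # 14-bit virtual address
    return va >> PAGE_BITS, va & 0x3F    # (VPN, VPO)

def pa_fields(pa):                       # 12-bit physical address
    return pa >> PAGE_BITS, pa & 0x3F    # (PPN, PPO)

print(va_fields(0x03D4))  # (15, 20) -> VPN 0x0F, VPO 0x14
```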
Simple Memory System Page Table
Only show first 16 entries (out of 256)

VPN  PPN  Valid     VPN  PPN  Valid
00   28   1         08   13   1
01   -    0         09   17   1
02   33   1         0A   09   1
03   02   1         0B   -    0
04   -    0         0C   -    0
05   16   1         0D   2D   1
06   -    0         0E   11   1
07   -    0         0F   0D   1
Simple Memory System TLB
- 16 entries, 4-way associative
- TLBT (tag) = bits 13-8 of the virtual address; TLBI (set index) = bits 7-6

Set  Tag  PPN  Valid   Tag  PPN  Valid   Tag  PPN  Valid   Tag  PPN  Valid
0    03   -    0       09   0D   1       00   -    0       07   02   1
1    03   2D   1       02   -    0       04   -    0       0A   -    0
2    02   -    0       08   -    0       06   -    0       03   -    0
3    07   -    0       03   0D   1       0A   34   1       02   -    0
Simple Memory System Cache
- 16 lines, 4-byte block size
- Physically addressed, direct mapped
- CT (tag) = bits 11-6 of the physical address; CI (index) = bits 5-2; CO (offset) = bits 1-0

Idx  Tag  Valid  B0  B1  B2  B3     Idx  Tag  Valid  B0  B1  B2  B3
0    19   1      99  11  23  11     8    24   1      3A  00  51  89
1    15   0      -   -   -   -      9    2D   0      -   -   -   -
2    1B   1      00  02  04  08     A    2D   1      93  15  DA  3B
3    36   0      -   -   -   -      B    0B   0      -   -   -   -
4    32   1      43  6D  8F  09     C    12   0      -   -   -   -
5    0D   1      36  72  F0  1D     D    16   1      04  96  34  15
6    31   0      -   -   -   -      E    13   1      83  77  1B  D3
7    16   1      11  C2  DF  03     F    14   0      -   -   -   -
Address Translation Example #1
- Virtual Address: 0x03D4
  » VPN = 0x0F, TLBI = 0x3, TLBT = 0x03
  » TLB Hit? Y; Page Fault? N; PPN = 0x0D
- Physical Address: 0x354
  » CO = 0x0, CI = 0x5, CT = 0x0D
  » Cache Hit? Y; Byte: 0x36
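The steps of this example can be replayed mechanically as bit arithmetic (a sketch; the PPN value comes from the TLB entry on the earlier slide):

```python
# Replay of Example #1 (VA 0x03D4) using the example system's parameters.
va = 0x03D4
vpn, vpo = va >> 6, va & 0x3F       # VPN 0x0F, VPO 0x14 (64-byte pages)
tlbi, tlbt = vpn & 0x3, vpn >> 2    # 4 TLB sets -> low 2 VPN bits index
ppn = 0x0D                          # from TLB set 3, tag 0x03 (a TLB hit)
pa = (ppn << 6) | vpo               # 12-bit physical address
co, ci, ct = pa & 0x3, (pa >> 2) & 0xF, pa >> 6
print(hex(pa), co, hex(ci), hex(ct))  # 0x354 0 0x5 0xd
```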
Address Translation Example #2
- Virtual Address: 0x0B8F
  » VPN = 0x2E, TLBI = 0x2, TLBT = 0x0B
  » TLB Hit? N; Page Fault? Y; PPN = TBD
- Physical Address: none yet; the access faults to the OS
Address Translation Example #3
- Virtual Address: 0x0020
  » VPN = 0x00, TLBI = 0x0, TLBT = 0x00
  » TLB Hit? N; Page Fault? N; PPN = 0x28
- Physical Address: 0xA20
  » CO = 0x0, CI = 0x8, CT = 0x28
  » Cache Hit? N; Byte: from Mem
Intel Core i7 Memory System
- Processor package, Core x4; per core: registers, instruction fetch, MMU (addr translation)
- L1 d-cache: 32 KB, 8-way; L1 i-cache: 32 KB, 8-way
- L1 d-TLB: 64 entries, 4-way; L1 i-TLB: 128 entries, 4-way
- L2 unified cache: 256 KB, 8-way; L2 unified TLB: 512 entries, 4-way
- QuickPath interconnect: 4 links @ 25.6 GB/s each, to other cores and the I/O bridge
- L3 unified cache: 8 MB, 16-way (shared by all cores)
- DDR3 memory controller: 3 x 64 bit @ 10.66 GB/s, 32 GB/s total (shared by all cores), to main memory
End-to-end Core i7 Address Translation
- Virtual address (VA): 48 bits = 36-bit VPN + 12-bit VPO
- The VPN is split into TLBT and TLBI to look up the L1 TLB (16 sets, 4 entries/set)
- On a TLB miss: CR3 points to the level 1 page table; the VPN is split into four 9-bit fields (VPN1-VPN4), each indexing one level of the 4-level page tables, yielding a PTE with the PPN
- Physical address (PA): 52 bits = 40-bit PPN + 12-bit PPO, split into CT/CI/CO (40/6/6 bits) for the L1 d-cache (64 sets, 8 lines/set)
- On an L1 miss, the access goes to L2, L3, and main memory; the 32/64-bit result returns to the CPU
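The walk above selects each page table level with a 9-bit slice of the 36-bit VPN. A sketch of the index extraction (the function name is mine):

```python
# Split a 48-bit Core i7 virtual address into the four 9-bit page-table
# indexes (VPN1..VPN4) and the 12-bit page offset.
def walk_indexes(va):
    vpo = va & 0xFFF                 # 12-bit virtual page offset
    vpn = va >> 12                   # 36-bit virtual page number
    # VPN1 is the most significant 9-bit slice, VPN4 the least significant.
    idx = [(vpn >> shift) & 0x1FF for shift in (27, 18, 9, 0)]
    return idx, vpo

va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x005
idx, vpo = walk_indexes(va)
print(idx, hex(vpo))  # [1, 2, 3, 4] 0x5
```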
Core i7 Level 1-3 Page Table Entries
- Bit layout: 63 = XD; 62-52 = unused; 51-12 = page table physical base address; 11-9 = unused; 8 = G; 7 = PS; 6 = unused; 5 = A; 4 = CD; 3 = WT; 2 = U/S; 1 = R/W; 0 = P
- P=0: remaining bits available for OS (e.g., page table location on disk)
- P=1: each entry references a 4K child page table
- P: child page table present in physical memory (1) or not (0)
- R/W: read-only or read-write access permission for all reachable pages
- U/S: user or supervisor (kernel) mode access permission for all reachable pages
- WT: write-through or write-back cache policy for the child page table
- CD: caching disabled or enabled for the child page table
- A: reference bit (set by MMU on reads and writes, cleared by software)
- PS: page size, either 4 KB or 4 MB (defined for Level 1 PTEs only)
- G: global page (don't evict from TLB on task switch)
- Page table physical base address: 40 most significant bits of the physical page table address (forces page tables to be 4KB aligned)
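Given that layout, the flag bits and base address can be pulled out with masks. A sketch (field names follow the slide; the sample PTE value is made up for illustration):

```python
# Decode the low flag bits and base address of a Core i7 level 1-3 PTE.
FLAGS = {0: 'P', 1: 'R/W', 2: 'U/S', 3: 'WT', 4: 'CD', 5: 'A', 7: 'PS', 8: 'G'}

def decode_pte(pte):
    # Bits 51-12: physical base address of the 4KB-aligned child page table.
    base = pte & ((1 << 52) - 1) & ~0xFFF
    flags = {name for bit, name in FLAGS.items() if pte >> bit & 1}
    return base, flags

base, flags = decode_pte(0x0000000012345007)   # hypothetical PTE value
print(hex(base), sorted(flags))  # 0x12345000 ['P', 'R/W', 'U/S']
```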
Summary
- Page optimizations
  » Managing page tables (space)
  » Efficient translations (TLBs) (time)
Next time
- Advanced Paging
- Preparation: read Module 2