Virtual Memory COMPSCI 386
Motivation An instruction must be in physical memory to be executed, but there may not be enough space for all ready processes. Typically the entire program is not needed: exception-handling code runs only rarely, and arrays, lists, and tables are usually allocated more memory than they use. For example, the symbol table created by the compiler may have space for over 2000 identifiers, but a typical program has only about 200. Even if the entire program is needed, it is probably not needed in its entirety at any one time.
The Basic Idea Virtual memory completely separates the logical view of memory from its physical aspects. Provides an abstract view of memory as a uniform array of practically unlimited length. Allows execution of programs too large to fit in physical memory.
The Basic Idea [figure: a virtual address space mapped onto physical frames and backing store; virtual memory is larger than physical memory]
Benefits Programs not constrained by physical memory. Increased degree of multiprogramming, leading to greater degree of CPU utilization and throughput at no cost to response time. Less I/O needed to swap entire processes in and out of memory, so processes execute more quickly.
Demand Paging Bring a page into memory only when it is needed. Less memory and I/O required. Faster response time. More user processes executing concurrently. Called lazy swapping. We use the term pager instead of swapper since we are moving individual pages, not entire processes.
Valid/Invalid Bit If set to invalid, then either the page is not part of the process's address space, or the reference is valid but the page is not currently in memory.
Page Fault Trap to the OS. Context switch to interrupt handler. Check that page reference was valid and find location of the page on disk. Select a free frame or, if not available, select an existing frame to swap out and copy it to disk.
Page Fault Issue a read from the disk to the free frame. Wait in the device queue until the read request is serviced (seek time plus rotational latency). Begin the transfer of the page into the free frame. While waiting, the CPU is allocated to another process. Context switch to the disk read interrupt handler.
Page Fault Update page table. Move to ready queue. Wait for CPU. Context switch. Go!
Effective Access Time Memory access: 200 nanoseconds. Average page fault service time: 8 milliseconds. That's 8,000,000 nanoseconds! Let p = page fault rate. EAT = (1 - p) × 200 + p × 8,000,000. If p = 0.001, then EAT ≈ 8,200 nanoseconds, a slowdown by a factor of 40! If we want just a 10% slowdown, we need a page fault rate of just 1/400,000.
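The EAT formula is easy to check numerically. A minimal sketch (the class and method names are mine; the constants are the slide's):

```java
public class Eat {
    // EAT = (1 - p) * memory access time + p * page fault service time
    static double eat(double p, double accessNs, double faultNs) {
        return (1 - p) * accessNs + p * faultNs;
    }

    public static void main(String[] args) {
        System.out.println(eat(0.001, 200, 8_000_000));         // ~8200 ns: a 40x slowdown
        System.out.println(eat(1.0 / 400_000, 200, 8_000_000)); // ~220 ns: a 10% slowdown
    }
}
```

Note how completely the fault service time dominates: at p = 0.001 the 200 ns memory access contributes almost nothing to the total.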
Copy on Write Parent and child process share pages initially. Only when one of the processes modifies a shared page is it actually copied. The fork function uses copy on write. Linux also provides vfork: the parent is suspended and the child uses the address space of the parent (dangerous) until exec is called.
Page Replacement If the page fault service routine determines that no free frame is available, a frame in use must be selected and written back to disk. The use of a dirty bit reduces the overhead of page transfers. Executables are never dirty: code pages are read-only, so they can be discarded without write-back.
Page Replacement We evaluate a page replacement algorithm by running it on a reference string. Reference strings may be randomly generated with respect to a particular probability distribution, or they may be traces from actual executions. If a reference string has consecutive references to the same page (as in 4 3 3 2 1 1 1 3 2), analysis of page replacement algorithms may be simplified by deleting consecutive repeats (4 3 2 1 3 2). Why?
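The answer to the "Why?" is that a repeated reference can never fault (the page was just brought in), so repeats add nothing to the analysis. A minimal sketch of the preprocessing step (class and method names are mine):

```java
import java.util.*;

public class RefString {
    // Collapse consecutive duplicate page references:
    // 4 3 3 2 1 1 1 3 2  becomes  4 3 2 1 3 2.
    static List<Integer> collapse(List<Integer> refs) {
        List<Integer> out = new ArrayList<>();
        for (int r : refs)
            if (out.isEmpty() || out.get(out.size() - 1) != r) out.add(r);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collapse(List.of(4, 3, 3, 2, 1, 1, 1, 3, 2))); // [4, 3, 2, 1, 3, 2]
    }
}
```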
Page Faults and Frame Count [figure: page fault count generally decreases as the number of allocated frames increases]
FIFO Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1. Frame contents after each fault (3 frames):
frame 1: 7 7 7 2 2 2 4 4 4 0 0 0 7 7 7
frame 2:   0 0 0 3 3 3 2 2 2 1 1 1 0 0
frame 3:     1 1 1 0 0 0 3 3 3 2 2 2 1
15 page faults
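The table above is easy to reproduce in code. A minimal FIFO simulator (class and method names are mine): a queue records eviction order, a set records which pages are resident.

```java
import java.util.*;

public class Fifo {
    // Count page faults for FIFO replacement with the given number of frames.
    static int faults(int[] refs, int frames) {
        Deque<Integer> queue = new ArrayDeque<>(); // pages in arrival (eviction) order
        Set<Integer> mem = new HashSet<>();        // pages currently resident
        int faults = 0;
        for (int page : refs) {
            if (mem.contains(page)) continue;      // hit: FIFO ignores re-references
            faults++;
            if (mem.size() == frames) mem.remove(queue.removeFirst()); // evict oldest
            queue.addLast(page);
            mem.add(page);
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
        System.out.println(faults(refs, 3)); // 15 faults, matching the slide
    }
}
```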
Belady's Anomaly Try the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. How many page faults, assuming 3 frames? What if we had an extra frame?
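Running a FIFO simulation on this string answers both questions, and shows the anomaly: adding a frame makes things worse. A sketch (class and method names are mine; the policy is plain FIFO):

```java
import java.util.*;

public class Belady {
    // FIFO page fault count, as in the FIFO example.
    static int faults(int[] refs, int frames) {
        Deque<Integer> queue = new ArrayDeque<>();
        Set<Integer> mem = new HashSet<>();
        int faults = 0;
        for (int page : refs) {
            if (mem.contains(page)) continue;
            faults++;
            if (mem.size() == frames) mem.remove(queue.removeFirst());
            queue.addLast(page);
            mem.add(page);
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        System.out.println(faults(refs, 3)); // 9 faults
        System.out.println(faults(refs, 4)); // 10 faults: an extra frame, yet MORE faults
    }
}
```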
Optimal Replacement OPT: replace the page that will not be requested for the longest period of time. Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1. Frame contents after each fault (3 frames):
frame 1: 7 7 7 2 2 2 2 2 7
frame 2:   0 0 0 0 4 0 0 0
frame 3:     1 1 3 3 3 1 1
OPT: 9 page faults (FIFO: 15)
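OPT requires knowing the future, so it cannot be implemented in a real pager, but it can be simulated on a recorded reference string as a lower bound to compare real algorithms against. A sketch (class and method names are mine): on each fault, evict the resident page whose next use lies furthest ahead.

```java
import java.util.*;

public class Opt {
    // Optimal (Belady's MIN) replacement: evict the page whose next use is furthest away.
    static int faults(int[] refs, int frames) {
        Set<Integer> mem = new HashSet<>();
        int faults = 0;
        for (int i = 0; i < refs.length; i++) {
            if (mem.contains(refs[i])) continue;
            faults++;
            if (mem.size() == frames) {
                int victim = -1, furthest = -1;
                for (int page : mem) {
                    int next = Integer.MAX_VALUE;      // "never used again"
                    for (int j = i + 1; j < refs.length; j++)
                        if (refs[j] == page) { next = j; break; }
                    if (next > furthest) { furthest = next; victim = page; }
                }
                mem.remove(victim);
            }
            mem.add(refs[i]);
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
        System.out.println(faults(refs, 3)); // 9 faults, matching the slide
    }
}
```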
Least Recently Used LRU: replace the page that has not been used for the longest period of time. Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1 (FIFO: 15 page faults). In what sense is LRU the same as OPT?
Stack Replacement Algorithms A priority is assigned to each page independently of the number of allocated frames. Examples: OPT, LRU, LFU Non-example: FIFO The set of pages in memory with k frames is always a subset of the pages in memory with k+1 frames. What does this buy us?
Least Recently Used Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1. Frame contents after each fault (3 frames):
frame 1: 7 7 7 2 2 4 4 4 0 1 1 1
frame 2:   0 0 0 0 0 0 3 3 3 0 0
frame 3:     1 1 3 3 2 2 2 2 2 7
LRU: 12 page faults
Implementing LRU Add a logical clock, or counter, to the CPU. Every memory reference increments the counter. The page table has an entry for the time of last access. To find the victim, the entire page table must be searched. Without special hardware support, there would be an interrupt for every memory access in order to update the counter. This would slow down processes by a factor of 10.
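A software simulation of the counter scheme makes both parts concrete: updating a timestamp per reference is cheap, but finding the victim means scanning the whole table. A sketch (class and method names are mine):

```java
import java.util.*;

public class LruCounter {
    // LRU via a logical clock: record the "time" of each resident page's last use;
    // on a fault with no free frame, linearly search for the smallest timestamp.
    static int faults(int[] refs, int frames) {
        Map<Integer, Integer> lastUse = new HashMap<>(); // resident page -> last access time
        int faults = 0;
        for (int time = 0; time < refs.length; time++) {
            int page = refs[time];
            if (!lastUse.containsKey(page)) {
                faults++;
                if (lastUse.size() == frames) {
                    int victim = -1, oldest = Integer.MAX_VALUE;
                    for (Map.Entry<Integer, Integer> e : lastUse.entrySet()) // full scan
                        if (e.getValue() < oldest) { oldest = e.getValue(); victim = e.getKey(); }
                    lastUse.remove(victim);
                }
            }
            lastUse.put(page, time); // hit or miss, refresh the timestamp
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
        System.out.println(faults(refs, 3)); // 12 faults, matching the LRU figure
    }
}
```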
Implementing LRU Doubly-linked stack of page numbers. When a page is referenced, its entry is removed from the stack and pushed on top. The update operation takes linear time, but the LRU page number can be found in constant time. (Why?) Requires hardware support.
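Java's standard library happens to contain exactly this structure: a LinkedHashMap in access order is a hash table threaded by a doubly-linked list, so the eldest entry is always the LRU page (constant-time victim lookup) and a reference relinks its node to the tail in constant time. A sketch (the class and method names are mine; the LinkedHashMap API is standard):

```java
import java.util.*;

public class LruStack {
    // Simulate the "stack" with LinkedHashMap in access order; return the final
    // resident pages from least to most recently used.
    static List<Integer> run(int[] refs, int frames) {
        LinkedHashMap<Integer, Boolean> mem = new LinkedHashMap<>(frames, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > frames; // evict the LRU page automatically
            }
        };
        for (int page : refs) mem.put(page, true); // put() counts as an access
        return new ArrayList<>(mem.keySet());      // LRU first, MRU last
    }

    public static void main(String[] args) {
        System.out.println(run(new int[]{7, 0, 1, 2, 0, 3}, 3)); // [2, 0, 3]
    }
}
```

The answer to the slide's "(Why?)": the LRU page is always at the bottom of the stack (here, the head of the list), so no search is needed.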
Approximating LRU Reference bit There is a reference bit (initially 0) for each page. When the page is referenced, the bit is set to 1. Replace a page with reference bit = 0. Additional reference bits A byte (for each page) used as a shift register records the history of the last 8 time periods. Time period ≈ 100 milliseconds (timer interrupt).
Approximating LRU Second Chance Algorithm FIFO, except each page gets a second chance before being replaced.
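The second-chance (clock) algorithm is usually drawn as a circular list with a sweeping hand. A sketch of one common formulation (class and method names are mine; I assume a newly loaded page has its reference bit set, since it was just referenced):

```java
import java.util.*;

public class SecondChance {
    // Clock / second-chance: FIFO order, but a page whose reference bit is set
    // gets the bit cleared and survives one more sweep of the hand.
    static int faults(int[] refs, int frames) {
        int[] page = new int[frames];
        boolean[] refBit = new boolean[frames];
        Arrays.fill(page, -1);
        int hand = 0, faults = 0;
        for (int r : refs) {
            boolean hit = false;
            for (int i = 0; i < frames; i++)
                if (page[i] == r) { refBit[i] = true; hit = true; break; }
            if (hit) continue;
            faults++;
            while (refBit[hand]) {          // grant second chances
                refBit[hand] = false;
                hand = (hand + 1) % frames;
            }
            page[hand] = r;                 // victim found (its bit was 0)
            refBit[hand] = true;
            hand = (hand + 1) % frames;
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
        System.out.println(faults(refs, 3)); // 14 faults: between OPT's 9 and FIFO's 15
    }
}
```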
Enhanced Second Chance Reference bit and modified bit 00 = not recently used, not modified. 01 = not recently used, modified. 10 = recently used but clean. 11 = recently used, dirty. Best page to replace? Which page should not be replaced?
Enhanced Second Chance Reference bit and modified bit 00 = not recently used, not modified. 01 = not recently used, modified. 10 = recently used but clean. 11 = recently used, dirty. Best page to replace? 00: least likely to be needed soon, and no write-back required. Which page should not be replaced? 11: probably will be needed again soon, and it must be copied to disk first.
Counting-Based Replacement LFU: replace the page with the smallest reference count. Problematic when a page is heavily used at first but not much after that. We can periodically shift the count bits to the right, forming an exponentially decaying average use count. MFU: replace the page with the greatest count, reasoning that the page with the smallest count has perhaps just been brought into memory and will soon be needed frequently. Both approaches are expensive and not particularly good.
Allocation of Frames Usually each process is allocated a fixed number of frames, perhaps in proportion to its size. Local replacement: each process selects a victim only from its own set of allocated frames. Global replacement: generally results in greater throughput and is more commonly used. But note that in this case a process has no control over its own page fault rate.
Thrashing A process does not have enough frames to support the pages in active use. It spends more time page faulting than executing.
Thrashing Example Global page replacement. OS monitors CPU utilization. If too low, degree of multiprogramming is increased. A process enters a new phase of execution and starts to thrash, taking pages from other processes. Page faulting processes queue up for the pager, the ready queue empties, and CPU utilization decreases.
Thrashing With global replacement, thrashing may cascade through the system. This effect can be limited with local replacement. But if some processes are thrashing, they will spend most of their time in the waiting queue for the pager, which increases the queue length and the effective access time for all processes.
Locality of Reference A locality is a set of pages used actively together. Loops, blocks, functions, objects,... As a typical process executes, it moves from one locality to another. The current page request is very likely to be among subsequent page requests. Crucial for caching. To maximize CPU utilization, we allocate as many frames to a process as needed to support its current locality.
Working Set The set of pages in the most recent Δ references. Typical Δ = 10,000 For practical reasons, WS must be approximated.
Working Set Δ too small: high page fault rate. Δ too large: pages for several localities. If the sum of the working sets of all processes exceeds the total number of frames, thrashing occurs. A process must be suspended. Working set approximation: keep track of all pages referenced within the last n units of time. What if a process has not been scheduled in a while?
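The working-set definition above translates directly into code: WS(t, Δ) is just the set of distinct pages among the last Δ references. A sketch (class, method, and the sample reference string are mine):

```java
import java.util.*;

public class WorkingSet {
    // WS(t, delta): the distinct pages among the last delta references ending at time t.
    static Set<Integer> ws(int[] refs, int t, int delta) {
        Set<Integer> set = new HashSet<>();
        for (int i = Math.max(0, t - delta + 1); i <= t; i++)
            set.add(refs[i]);
        return set;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 1, 3, 4, 4, 3, 4, 4, 4};
        // The process has moved from locality {1, 2} into locality {3, 4}:
        System.out.println(ws(refs, 9, 4)); // the last 4 references touch only pages 3 and 4
    }
}
```

In a real system the kernel cannot afford to record every reference, which is why the slide's reference-bit approximation is used instead.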
Page Fault Frequency If page fault rate too high, process gets extra frame. If page fault rate too low, process loses a frame.
Kernel Memory Treated differently from user memory. Often allocated from a special pool. Kernel requests memory for structures of varying size, sometimes smaller than a page. Some kernel memory must be contiguous. Some devices interact directly with main memory, bypassing the virtual memory interface.
Buddy System Binary tree for memory management; block sizes are powers of two. The buddy of each block can be found with an XOR of the block's address and its size. Advantage: adjacent free buddies can be coalesced quickly (in logarithmic time). Disadvantage: internal fragmentation, since requests are rounded up to a power of two.
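The XOR trick is worth seeing in one line: for a block of size 2^k aligned to its size, flipping bit k of the address yields the sibling block it splits from and merges with. A sketch (class and method names are mine):

```java
public class Buddy {
    // For addr aligned to size (size a power of two), the buddy is addr XOR size:
    // flipping the size bit toggles between the two halves of the parent block.
    static int buddyOf(int addr, int size) {
        return addr ^ size;
    }

    public static void main(String[] args) {
        System.out.println(buddyOf(0, 64));   // 64
        System.out.println(buddyOf(64, 64));  // 0: buddies pair up symmetrically
        System.out.println(buddyOf(128, 64)); // 192: the other half of the 128-255 block
    }
}
```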
Slab Allocation A slab is a set of contiguous pages. Each cache contains one or more slabs. There is a cache for each kernel data structure. No fragmentation. Fast memory access. First appeared in Solaris 2.4 kernel. Now used in Linux.
Other Issues Prepaging to reduce page faults at startup. Page size? Fragmentation, table size, I/O overhead, locality. Multiple page sizes? Program structure and language. Pages must sometimes be locked into memory, as when copying a file from a device into a buffer (otherwise the buffer's frame might be selected for replacement in the middle of the transfer).
Other Issues Demand paging is supposed to be transparent to compilers, but knowledge of the pager can be exploited. Suppose the page size is 128 words, so each row of the array below occupies one page. int[][] data = new int[128][128]; for (int c = 0; c < 128; c++) for (int r = 0; r < 128; r++) data[r][c] = 1; Arrays are stored in row-major order, so if the process has fewer than 128 frames, each array access may cause a page fault: 128 × 128 = 16,384 faults. A smart compiler (or programmer) would instead write data[c][r] = 1, filling each page completely before moving on, for at most 128 faults.
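The two fault counts can be checked by counting page transitions directly. A sketch (class and method names are mine; like the slide, it assumes each 128-entry row sits on its own page):

```java
public class LoopOrder {
    // Count how often the loop nest moves to a different "page", assuming each
    // row of a 128 x 128 array occupies exactly one page.
    static int pageSwitches(boolean rowOrder) {
        int n = 128, switches = 0, lastPage = -1;
        for (int outer = 0; outer < n; outer++)
            for (int inner = 0; inner < n; inner++) {
                int r = rowOrder ? outer : inner; // the row (= page) this access touches
                if (r != lastPage) { switches++; lastPage = r; }
            }
        return switches;
    }

    public static void main(String[] args) {
        System.out.println(pageSwitches(false)); // column-order walk: 16384 page changes
        System.out.println(pageSwitches(true));  // row-order walk: 128 page changes
    }
}
```

With fewer than 128 frames, every one of those page changes in the column-order walk is a potential fault; in the row-order walk there are only 128.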
Other Issues Choice of programming language? C/C++ use of pointers has the effect of scattering memory accesses and thereby increasing the page fault rate. Some studies have shown that object-oriented programs tend to have poor locality of reference.