CS 3733 Operating Systems: Topics: Virtual Memory (SGG, Chapter 09) Instructor: Dr. Dakai Zhu Department of Computer Science @ UTSA 1 Reminders Assignment 3: due March 30, 2018 (midnight)! Part I: fixed mapping with given page table content! Part II: general mapping with page replacement (LRU)! Part III: [Extra credit] flexible page table size Midterm Exam II: on Thursday, April 5, 2018! In class, 75 minutes! Closed books/notes/phones
Assignment 3: Read Binary Logical Addr. unisgned long *logicaladdr; //find file size from file stat using stat(filename, &st) logicaladdr = (unsigned long*) malloc( (size_t) st.st_size); int addrnum=0; totaladdr=st.st_size/sizeof(unsigned long); Approach #1: FILE *pfinput = fopen(filename, rb ); //check error properly while ( addrnum < totaladdr){ fread(&logicaladdr[addrnum], sizeof(unsigned long), 1, pfinput); ; addrnum++; } Approach #2: fd = open(filename, O_RDONLY); read(fd, logicaladdr, (int) st.st_size); //read all bytes together; need to check errors access logicaladdr[addrnum] for addrnum from 0 to totaladdr printf("logical=%#010lx: page# %u ->frame# %u : offset #%3u : physical=%#010lx\n", logicaddr[i], page, frame, offset, phyaddr); 3 Reviews on Memory and Paging Simple partitioning for multiprogramming Two major problems in partition-based memory management! P1: Requirement of continuous physical memory! P2: Load ALL required space into memory at beginning Paging! Effectively address the problem P1 What about the 2 nd problem P2?! What if more programs need to run simultaneously?! What if a program is larger than available memory? 2
Objectives on Virtual Memory To know the benefits of virtual memory To learn the demand paging VM technique To learn page replacement algorithms To learn allocation of frames To discuss the principle of the working set model To consider other issues affecting performance To learn memory-mapped files To study kernel memory allocation To discuss VM and page replacement examples Outline Concept of virtual memory! Allocate more memory to processes than physical MEM The demand paging VM technique! Bring in a virtual page to a physical frame when needed Classical page replacement algorithms! FIFO, Second chance, LRU (approximate), Optimal Allocation of physical frames Working set model and degree of multiprogramming Memory-mapped files and other issues of VM Kernel memory allocation Examples: VM and page replacement
How Much Memory for a Process? A process must be in physical memory for execution! Do we need all code/data together? -> NO, only a small part. What if a program does NOT fit into physical memory?! Out of luck, if no other technique helps. Observations: NOT all code or data will be needed at the same time! Error-handling code! Big arrays allocated at max size! Some data might not be needed at the same time. Can we execute/run a process if only part of its code/data is in memory? Why Virtual Memory? Allows a process NOT completely in memory to run. Users will have a very large logical address space! Execute programs larger than physical memory. Especially helpful in multi-programmed systems! More processes can be executed concurrently! Each process occupies a small portion of memory! Less I/O to load or swap user programs. Physical memory de/allocation! Keep recently used content in physical memory! Move less recently used content to disk! Movement to/from disk is handled by the OS.
Virtual (Logical) Address Space Processes use virtual addresses! Addresses are local to the process! Can be any size -> limited by the number of bits in an address (32/64). Virtual address space! The logical (or virtual) view of how a process is stored in memory! The virtual memory space. Virtual addresses! Determine the size of virtual memory! Can be much larger than physical memory. [Figure: a process's virtual address space from 0 to 2^N: Text, Heap, Stack, and auxiliary regions.] Virtual Memory Basic idea: allow the OS to allocate more memory than what is available. Files/memory shared by two or more processes:! Libraries can be shared! Processes can communicate via shared memory! fork() can be sped up. How do we get the physical address from the virtual one?
Paging and Page Systems Virtual address space! Divided into pages. Physical memory! Divided into frames. Page vs. Frame! Same-size address blocks! Unit of mapping/allocation. A page is mapped to a frame! All addresses in the same virtual page are in the same physical frame -> same offset within the page. [Figure: a 64 KB virtual address space divided into 4 KB pages, mapped onto a 32 KB physical memory of 4 KB frames; only some virtual pages are resident ("-" marks unmapped pages), e.g., virtual page 0 (0-4K) maps to frame 7.] MMU: Virtual vs. Physical Addresses [Figure: the CPU sends virtual addresses to the MMU on the CPU chip; only physical addresses appear on the bus to memory and the disk controller.] Virtual address space! Determined by instruction width! Same for all processes. Physical memory is indexed by physical addresses! Limited by bus size (# of bits)! Amount of available memory. Memory Management Unit (MMU)! Translation: virtual -> physical address! Only physical addresses leave the CPU/MMU chip. How to implement virtual memory?
Demand Paging VM Technique When to load pages into physical memory? Previously: load everything (all pages) at once!! Slow start of a program! Not all loaded parts may be utilized! Process size is limited by physical memory. Demand Paging VM technique! Load a page only when it is needed!! Less memory needed! Less I/O needed! Faster response! More users/programs can run simultaneously. Valid-Invalid Bit With each page table entry a valid-invalid bit is associated (v = in memory, i = not in memory). Initially the valid-invalid bit is set to i on all entries. During address translation, if the valid-invalid bit in the page table entry is i -> page fault. What does it mean to have a page fault?
Page Fault: What to Do? 1. Reference to a page; if the reference is invalid -> abort 2. If the page is not in memory, a page fault occurs (trap to the OS) 3. The operating system allocates an empty frame 4. Swap the page into the frame 5. Reset the page table, set the valid bit = v 6. Restart the instruction that caused the page fault Page Fault: How to Restart an Instruction? During instruction fetch! get the page and re-fetch. During operand fetch! get the page and re-fetch the instruction! (how many pages are needed depends on the architecture, e.g., add a b c). But how about a block move?! Use a temp buffer! Make sure both ends of the buffers are in memory! If a page fault occurs, restore state before re-starting.
Performance of Demand Paging Page fault rate 0 <= p <= 1.0! if p = 0, no page faults (perfect; impossible because of cold-start faults)! if p = 1, every reference is a fault (doomed; also impossible). Effective Access Time (EAT): EAT = (1 - p) x mem_time + p x page_fault_time. page_fault_time depends on several factors! Save user registers and process state, check the page reference, read from the disk (there might be a queue; the CPU can be given to another process), get the interrupt, save the other user's registers and process state, correct the page table, put this process into the ready queue... Due to the queues, page_fault_time is a random variable. An Example Memory access time = 200 nanoseconds. Average page-fault time = 8 milliseconds (disk access time). EAT = (1 - p) x 200 + p x 8,000,000 = 200 + p x 7,999,800. If one access out of 1,000 causes a page fault, then EAT = 8.2 microseconds = 8200 nanoseconds! This is a slowdown by a factor of 40!! If we want at most 10% performance degradation, we need 220 > (1 - p) x 200 + p x 8,000,000 -> p < 0.0000025, i.e., 1 page fault out of every 400,000 memory accesses.
Process Creation: Copy-on-Write Copy-on-Write (COW)! Parent and child processes initially share the same pages in memory! If either process modifies a shared page, only then is the page copied! COW allows more efficient process creation, as only modified pages are copied. Page Replacement Demand paging: get a free frame when needed! What happens if there is NO free frame?! Option 1: Terminate the user program! Option 2: Swap out some pages -> which page? Issues to consider for page replacement! Pages that are replaced might be needed again later! Algorithms to minimize the number of page faults! Other improvements, e.g., use the modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written back to disk.
Page Replacement: An Example Which frame can be used for M? Page Replacement: Basic Steps Find the location of the desired page on disk. If a free frame exists, use it. If there is no free frame, run the page replacement algorithm: 1. Select a victim frame and swap it out (use the dirty bit to swap out only modified frames) 2. Bring the desired page into the (newly) free frame 3. Update the page and frame tables. Restart the process.
Page Replacement Algorithms How to select the victim frame?! You can select any frame and page replacement will work; what about the performance?! We want an algorithm that gives the lowest page-fault rate. Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string! In our examples, we assume the process is allocated only 3 frames and the reference string shown in the figure. First-In-First-Out (FIFO) Maintain a FIFO queue! + Code used before may not be needed again! - An array used early might be used again and again. Easy to implement: update the FIFO queue when a page fault occurs. Total of 15 page faults.
Belady's Anomaly for FIFO Intuition: more frames -> fewer page faults. Belady's Anomaly: more frames -> MORE page faults! Example: Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 Optimal Algorithm Replace the page that will not be used for the longest period of time in the future. Total of 9 page faults! How do you know the future? You can't, so the optimal algorithm is not implementable in practice; it is used for measuring how well your algorithm performs.
Least Recently Used (LRU) Algorithm Use the recent past as an approximation of the future. Select the page that has not been used for a long time! NO Belady's Anomaly: more frames -> fewer page faults. Total of 12 page faults! LRU Algorithm (Cont.) Counter (logical clock) implementation! Increase the counter every time a page is referenced! Save it into a time-of-use field associated with this page's entry in the page table! When a page needs to be replaced, find the one with the smallest time-of-use value! Problems: counter overflow and linear search. Stack implementation: keep a stack of page numbers in a doubly linked list:! Page referenced: $ move it to the top $ requires 6 pointers to be changed! No search for replacement $ the least recently used page is at the bottom. Needs hardware support.
LRU Practice Problem Given the reference string of page accesses: 1 2 3 4 2 3 4 1 2 1 1 3 1 4 and a system with 3 frames, what is the final configuration of the three frames after the true LRU algorithm is applied? LRU Approximation Algorithms Reference bit! Associate a reference bit with each page, initially = 0! When the page is referenced, the hardware sets this bit to 1! Replace a page whose bit is 0 (if one exists) $ We do not know the order, however $ Additional bits can help to gain more ordering information. Second chance algorithm! FIFO with an inspection of the reference bit! If the ref bit is 0, $ replace that page (the new page's ref bit is set to 1)! If the ref bit is 1, /* give a second chance */ $ set the ref bit to 0 $ leave the page in memory $ set its arrival time to the current time $ go to the next one! Enhancement: also check the modify bit to avoid replacing modified pages. What if all bits are 1? All pages get a second chance, and the algorithm degenerates to FIFO.
Summary: Page Replacement Algorithms
FIFO (First-In, First-Out): might throw out useful pages
Second chance: big improvement over FIFO
LRU (Least Recently Used): excellent, but hard to implement exactly
OPT (Optimal): not implementable, but useful as a benchmark
Outline Concept of virtual memory! Allocate more memory to processes than physical MEM The demand paging VM technique! Bring in a virtual page to a physical frame when needed Classical page replacement algorithms! FIFO, Second chance, LRU (approximate), Optimal Allocation of physical frames Working set model and degree of multiprogramming Memory-mapped files and other issues of VM Kernel memory allocation Examples: VM and page replacement
Frame Allocation Equal allocation! Allocate the same number of frames to each process! E.g., 100 frames and 5 processes -> each gets 20. Proportional allocation: allocate according to the size of the process:
s_i = size of process p_i, S = Σ s_i, m = total number of frames
a_i = allocation for p_i = (s_i / S) × m
Example: m = 64, s_1 = 10, s_2 = 127: a_1 = 10/137 × 64 ≈ 5, a_2 = 127/137 × 64 ≈ 59
Priority allocation: proportional with priority rather than size. Global vs. Local Allocation Global replacement: a process selects a replacement frame from the set of ALL frames; one process can take a frame from another! High-priority processes can take all frames from low-priority ones (causing thrashing)! A process cannot control its own page-fault rate. Local replacement: each process selects from only its own set of allocated frames! Pro: consistent performance! Con: lower utilization of memory and less throughput.
Minimum Number of Frames for a Process Each process needs a minimum number of pages! e.g., add a b c might require 3 pages. IBM 370: 6 pages to handle the SS MOVE instruction:! the instruction is 6 bytes and might span 2 pages! 2 pages to handle the from operand! 2 pages to handle the to operand. The minimum depends on the architecture! Level of indirection. The maximum depends on available memory. What about the optimal number, to maximize CPU utilization? Thrashing If a process does not have enough pages, the page-fault rate is very high -> thrashing: the process is busy swapping pages in and out. This leads to:! low CPU utilization! the operating system thinks that it needs to increase the degree of multiprogramming! another process is added to the system! But then -> more page faults for all processes.
Locality and Thrashing Enough frames per process -> prevents thrashing! But how much is enough? Locality model! A process migrates from one locality to another (that is actually why demand paging, and caching in general, works)! Localities may overlap. When Σ (size of localities) > total memory size, thrashing occurs. Increase locality in your programs! Working-Set Model Δ = working-set window = a fixed number of recent references! Example: 10,000 instructions. WSS_i (working set of process P_i) = total number of pages referenced in the most recent Δ (varies over time)! if Δ is too small, it will not encompass the entire locality! if Δ is too large, it will encompass several localities! if Δ = ∞, it will encompass the entire program.
Keeping Track of the Working Set Approximate with an interval timer + a reference bit. Example: Δ = 10,000! Timer interrupts after every 5,000 time units! Keep 2 history bits per page: when the timer interrupts, copy each page's reference bit into its history bits and reset the reference bit to 0! If one of the bits in memory = 1 -> the page is in the working set. Why is this not completely accurate?! Improvement: 10 bits and an interrupt every 1,000 time units. Working-Set vs. Degree of Multiprogramming D = Σ WSS_i = total demand for frames. When D > m (the number of available frames) -> thrashing. Effects on multiprogramming! if D > m, then suspend one of the processes! i.e., reduce the degree of multiprogramming.
Page-Fault Frequency (PFF) Scheme The working set is a clumsy way to control thrashing; PFF is a more direct approach! High PFF -> more thrashing! Establish an acceptable page-fault rate $ If the actual rate is too low, the process loses a frame $ If the actual rate is too high, the process gains a frame. Suspend a process if its PFF is above the upper bound when there is no free frame! Other Issues in VM Main concerns: page replacement & frame allocation. Other issues! Size of page/frame! Pre-paging! TLB reach! Program structure! I/O interlock.
Other Issues: Page Size Page size selection must take into consideration:! Fragmentation (a small page size is better)! Table size (a large page size is better)! I/O overhead $ Seek $ Latency $ Transfer! Locality. Newer OSes tend to use larger and larger page sizes:! KB -> MB. Other Issues: Prepaging Pre-paging! To reduce the large number of page faults that occur at process startup! Pre-page all or some of the pages a process will need, before they are referenced. But what if pre-paged pages are NOT used?! I/O and memory were wasted. Assume s pages are pre-paged and a fraction α of them is used! Is the cost of the s × α avoided page faults > or < the cost of pre-paging s × (1 - α) unnecessary pages?! If α is near zero, pre-paging loses.
Other Issues: TLB Reach TLB Reach: the amount of memory accessible from the TLB! TLB Reach = (TLB Size) × (Page Size). Large TLB: the working set of the process is stored in the TLB! But associative memory is expensive and power hungry. Increase the page size?! Increase in fragmentation, as not all applications require a large page size. Provide multiple page sizes! This allows applications that require larger page sizes to use them without an increase in fragmentation. Other Issues: Program Structure Increase locality: separate code and data; avoid page boundaries for routines and arrays! A stack has good locality, but a hash table has bad locality! Pointers and objects may diminish locality. Example: int data[128][128]; where each row is stored in one page. Program 1: for (j = 0; j < 128; j++) for (i = 0; i < 128; i++) data[i][j] = 0;! 128 × 128 = 16,384 page faults. Program 2: for (i = 0; i < 128; i++) for (j = 0; j < 128; j++) data[i][j] = 0;! 128 page faults.
Other Issues: I/O Interlock A user's I/O might be done through the kernel (memory-to-memory copy overhead). I/O interlock: pages must sometimes be locked into memory. Consider I/O: pages that are used for copying a file from a device must be locked from being selected for eviction by a page replacement algorithm. The lock bit might be dangerous!! What if a frame stays locked due to a bug in the OS?! Some systems use the lock bit only as a hint and may ignore it! Some periodically clear it. Memory-Mapped Files Map disk blocks to frame(s) in memory! File I/O can then be treated as routine memory access, avoiding system calls like read() and write()! Data written into memory is NOT immediately written to disk! A file is initially read using demand paging:! A page-sized portion of the file is read from the file system into a physical frame! Subsequent reads/writes to/from the file are treated as ordinary memory accesses. Also allows several processes to map the same file! Allows the frames in memory to be shared.
Memory-Mapped Files (cont.) Two processes can share a memory-mapped file. User-Level Memory Mapping void *mmap(void *start, size_t len, int prot, int flags, int fd, off_t offset) Map len bytes starting at offset offset of the file specified by file descriptor fd, preferably at address start! start: may be NULL (0) to let the kernel pick an address! prot: PROT_READ, PROT_WRITE, ...! flags: MAP_ANON, MAP_PRIVATE, MAP_SHARED, ... Returns a pointer to the start of the mapped area (which may not be start).
User-Level Memory Mapping (cont.) void *mmap(void *start, size_t len, int prot, int flags, int fd, off_t offset) [Figure: len bytes of the disk file specified by file descriptor fd, starting at byte offset offset, are mapped into the process's virtual memory at address start (or an address chosen by the kernel).] Comparison: Mmap vs. File I/O Mmap implications! The number of VM regions increases! Files can be accessed both via mmap and through the filesystem. File I/O: when read() completes! All blocks in the given range were loaded: all (disk) work is done! The user's buffer holds a copy of the data: later file changes have no effect on it. When mmap() completes! The mapping of the file is complete! The virtual address space of the process has changed! Blocks (pages) are fetched only on access! There is no guarantee that the file is in memory.
Outline Concept of virtual memory! Allocate more memory to processes than physical MEM The demand paging VM technique! Bring in a virtual page to a physical frame when needed Classical page replacement algorithms! FIFO, Second chance, LRU (approximate), Optimal Allocation of physical frames Working set model and degree of multiprogramming Memory-mapped files and other issues of VM Kernel memory allocation Examples: VM and page replacement Kernel Memory Allocation The kernel needs large contiguous memory. E.g., the Linux PCB (struct task_struct)! > 1.7 KB each! Created on every fork and every thread creation (clone())! Deleted on every exit. Kernel memory allocators! Buddy system! Slab allocation.
Buddy System Allocates memory from a fixed-size segment consisting of physically contiguous frames. Memory is allocated using a power-of-2 allocator! Satisfies requests in units sized as powers of 2! A request is rounded up to the next power of 2! When a smaller allocation is needed than is available, the current chunk is split into two buddies of the next-lower power of 2 $ Continue until an appropriately sized chunk is available! When freed, buddies are combined (called coalescing). Rounding up causes fragmentation: e.g., a 33 KB request needs a 64 KB block, so nearly 50% might be wasted. Buddy Allocation Example: allocate 65 contiguous frames.! Look in the list of free 128-page-frame blocks.! If a free block exists, allocate it; else look in the next-higher-order list (here, 256-page-frame blocks).! If the first free block is in the 256-page-frame list, allocate a 128-page-frame block and put the remaining 128-page-frame block in the lower-order list.! If the first free block is in the 512-page-frame list, allocate a 128-page-frame block and split the remaining 384 page frames into 2 blocks of 256 and 128 page frames. These blocks are added to the corresponding free lists. Question: What is the worst-case internal fragmentation?
Buddy De-Allocation When blocks of page frames are released, the kernel tries to merge pairs of buddy blocks of size b into blocks of size 2b. Two blocks are buddies if:! They have equal size b.! They are located at contiguous physical addresses.! The physical address of the first page frame of the first block is aligned on a multiple of 2b × 2^12 (with b in page frames and 4 KB pages). The process repeats by attempting to merge buddies of size 2b, 4b, 8b, etc. Slab Allocator A cache consists of one or more slabs; a slab is one or more physically contiguous pages! There is a single cache for each unique kernel data structure (process descriptors, file objects, semaphores, ...)! When a cache is created, it is filled with free objects: instantiations of the data structure! An object is marked as used when a structure is stored in it! If a slab is full, the next object is allocated from an empty slab! If there are no empty slabs, a new slab is allocated. Benefits: no fragmentation, and memory requests are satisfied quickly.
Windows XP Uses demand paging with clustering; clustering brings in pages surrounding the faulting page. Processes are assigned a working set minimum & maximum.! The working set minimum is the minimum number of pages the process is guaranteed to have in memory.! A process may be assigned frames up to its working set maximum. Working set trimming! When the amount of free memory in the system falls below a threshold, automatic working set trimming is performed to restore the amount of free memory! It removes frames from processes that have pages in excess of their working set minimum. Solaris Maintains a list of free frames to assign to faulting processes. Lotsfree: threshold parameter (amount of free memory) at which to begin paging. Desfree: threshold parameter at which to increase paging. Minfree: threshold parameter at which to begin swapping. Paging is performed by the pageout process! Pageout scans pages using a modified clock algorithm. Scanrate is the rate at which pages are scanned; it ranges from slowscan to fastscan. Pageout is called more frequently depending on the amount of free memory available.