Main Memory
Outline
- Background
- Swapping
- Contiguous Memory Allocation
- Paging
- Structure of the Page Table
- Segmentation
- Example: The Intel Pentium
Background
Background
So far we have considered how to share the CPU among processes. Processes require an additional vital resource: RAM. Active processes must be kept in memory, and they must share the available physical memory.
Physical memory
- A large array of words/bytes, each identified by a unique address
- A contiguous sequence of addresses (0, ..., 2^n - 1)
How is this memory used?
- Fetch, decode, execute, write back
- Decode: possibly load further memory contents into registers
- Write back: possibly store a register into memory
- (Some architectures: direct operations on memory)
Here, however, we are concerned only with memory accesses, not with their meaning.
Basic Hardware Support
Support for (quick) memory access is definitely required
- Memory and registers are the only storage that the CPU can access directly (with appropriate machine instructions)
- Operations on registers are fast (typically one clock cycle)
- Operations on RAM are very slow (many CPU cycles); an access requires a CPU stall because the required data is not yet available
- Because memory access is so frequent, CPUs have caches (faster but also more expensive memory) between registers and RAM
Another question: how to protect concurrently executing processes from
- Overwriting each other's data and code?
- Overwriting the kernel space?
We need some hardware support!
Basic Hardware Support
- Each process gets a separate space in memory
- Determine the range of legal accesses
Two-register solution
- The base and limit registers can only be written in supervisor mode
- The base/limit check is deactivated in supervisor mode
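The base/limit check above can be sketched as follows; this is a minimal illustration, and the concrete register values are hypothetical examples.

```python
# Two-register protection scheme: an access is legal iff it falls into
# the half-open interval [base, base + limit).

def is_legal_access(address: int, base: int, limit: int) -> bool:
    """Return True if the address lies inside the process's region."""
    return base <= address < base + limit

BASE, LIMIT = 300040, 120900                  # hypothetical register contents
print(is_legal_access(300040, BASE, LIMIT))   # True  (first legal address)
print(is_legal_access(420940, BASE, LIMIT))   # False (base + limit is already out of range)
```

In hardware, an address outside the range raises a trap to the operating system instead of returning False.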
Motivation: Address Binding
On execution, each instruction must be referable by a unique address; this requires address binding. Where does such binding take place? See the steps a user program typically goes through.
Possible times for address binding
- Compile time: absolute code (example: the old MS-DOS .COM format)
- Load time: relocatable code
- Execution time: movable during execution (special hardware must be available; most general-purpose operating systems use this scheme)
Virtual and Physical Addresses
- Physical address: the true address value, which is loaded into the MAR
- Virtual address: the value used by the processor
We also speak of the physical address space and the logical address space.
- Compile-time and load-time binding: physical address space = virtual address space
- Execution-time binding: requires a mapping from the virtual address space into the physical address space, and hence hardware support: the Memory Management Unit (MMU)
A Simple MMU Scheme
A combination of relocation register and limit register
- Virtual address space: 0 to max
- Physical address space: R + 0 to R + max (R = relocation register)
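The relocation-plus-limit scheme can be sketched as below; the register values are hypothetical examples.

```python
# Sketch of the simple MMU: the virtual address is checked against the
# limit register, then the relocation register R is added to it.

def mmu_translate(virtual: int, relocation: int, limit: int) -> int:
    if not 0 <= virtual < limit:
        raise MemoryError("trap: addressing error")
    return virtual + relocation

R, LIMIT = 14000, 3000               # hypothetical register contents
print(mmu_translate(346, R, LIMIT))  # 14346
```

The user program only ever sees virtual addresses 0..max; the relocation happens transparently on every access.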
Remark: Dynamic Loading, Linking and Shared Libraries
In general: dynamic loading
- A program does not reside completely in memory; routines are loaded dynamically when called for the first time
- Advantage: routines that are never called do not occupy any memory
Library-specific: static linking
- The loader combines system language libraries into the binary program image
- Waste of memory and disk space
Library-specific: dynamic linking
- Similar to dynamic loading, but the linking step rather than the loading step is postponed until the first call, i.e. until execution time (see the figure on slide 7 again)
- Advantage: keep only one copy of the library code in memory instead of linking the library routines into each program using it (shared library)
Stubs for library references
- Locate the memory-resident library, or load it into memory if it is not present
- Replace themselves with the address of the desired routine; the next time, that particular code segment calls the library routine directly and incurs no further cost
Swapping
Swapping
- Temporarily move a process and its address space to the backing store, and bring it back to memory later for continued execution
- Enables execution of more processes than would fit into memory
- The dispatcher is responsible for swapping in the selected process and possibly swapping out another one
Swapping
Address binding at compile or load time? Then the process has to be swapped back in at the same memory location. Execution-time binding is much better: the process can be swapped into a different location.
But be careful: what about swapping a process that is waiting for I/O? An asynchronous I/O request might write into a memory region now occupied by a newly swapped-in process.
- Solutions: never swap such a process, or use I/O buffers in kernel space
Typically (consider for instance UNIX), swapping is disabled when the system load is low and only enabled when the memory usage of the running processes exceeds a threshold.
- Reason: swapping is expensive (swap time versus CPU execution time)
Contiguous Memory Allocation
Our Model for this Section
Motivation
Assumptions for now
- Each process is contained in a single contiguous section of memory
- Each process has exclusive access to this region
How to manage free and occupied space?
- Fixed partition scheme
- Contiguous memory allocation
Fixed Partition Scheme
Fixed partition scheme (simple but outdated)
- Equal-sized memory partitions
- Each process is located in one partition
(Figure: processes p1-p4 placed into five fixed partitions.)
Contiguous Memory Allocation
Variable partition scheme
- Organize free memory in a table of memory holes; initially one large hole of free memory
- On process arrival: search for a hole that is large enough, allocate only as much as required, possibly splitting the hole
- On process termination: release the block of memory, possibly merging adjacent holes
Example: processes p1, p2, p3, p4 enter; process p2 leaves again; processes p5, p6 enter; ...
Contiguous Memory Allocation
Dynamic storage allocation problem: which of the free holes to select?
- First fit: take the first hole that is large enough (next time, either continue from there or start again from the beginning)
- Best fit: take the smallest hole the process fits in
- Worst fit: take the largest hole
Simulation results: first fit and best fit are better than worst fit; first fit and best fit are comparable, but first fit is generally faster.
In general, two problems can occur with contiguous memory allocation
- Internal fragmentation
- External fragmentation
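The three selection strategies can be sketched over a table of hole sizes; the concrete hole sizes below are a made-up example.

```python
# Hole-selection strategies for the dynamic storage allocation problem.
# Each function returns the index of the chosen hole, or None if no hole fits.

def first_fit(holes, size):
    """First hole that is large enough."""
    for i, h in enumerate(holes):
        if h >= size:
            return i
    return None

def best_fit(holes, size):
    """Smallest hole that is still large enough."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return min(fits)[1] if fits else None

def worst_fit(holes, size):
    """Largest hole, provided it is large enough."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return max(fits)[1] if fits else None

holes = [100, 500, 200, 300, 600]   # hole sizes in KB (example)
print(first_fit(holes, 212))        # 1 (the 500 KB hole)
print(best_fit(holes, 212))         # 3 (the 300 KB hole)
print(worst_fit(holes, 212))        # 4 (the 600 KB hole)
```

A real allocator would additionally split the chosen hole and record the remainder in the hole table.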
Internal Fragmentation
Example: allocating 18462 bytes from a hole of 18464 bytes leaves a remaining hole of 2 bytes. Such tiny holes are useless and only increase the overhead of keeping track of all holes.
Solution: use fixed-sized blocks and allocate a sequence of such blocks.
This, however, leads to some internal fragmentation: the allocated memory may be more than needed (required memory plus unused memory).
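The internal fragmentation of block-based allocation is easy to quantify; the 512-byte block size below is an assumed example (the slide does not fix one).

```python
import math

# Bytes wasted when a request is rounded up to whole fixed-size blocks.

def internal_fragmentation(request: int, block_size: int) -> int:
    blocks = math.ceil(request / block_size)
    return blocks * block_size - request

# The slide's 18462-byte request with assumed 512-byte blocks:
print(internal_fragmentation(18462, 512))   # 482 wasted bytes (37 blocks)
```

On average, half a block per allocation is wasted this way, which is the "average: half the page size" figure quoted later for paging.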
External Fragmentation
As processes are loaded into and removed from memory, the free memory space is broken into little pieces.
Problem: there is enough total free memory, but no contiguous region of the required size exists.
(Figure, an extreme case: with P1-P5 in memory and the free space scattered, P6 does not fit; after rearranging, it does.)
Solutions for External Fragmentation
Compaction
- Not possible with compile-time or load-time address binding
- An expensive scheme
Non-contiguous address spaces
- Paging
- Segmentation
Paging
Paging
- Partition physical memory into fixed-sized blocks: frames
- Logical memory is a contiguous address space built from fixed-sized blocks: pages
- A logical address is split into a page number and a page offset
The page size has to be a power of two. Why?
Quiz: with 4 KB frames and 4-byte page table entries, how much physical memory is addressable?
Example: 32-byte memory and 4-byte pages
Quiz: to which physical addresses do the logical addresses 6, 11 and 12 map? (The page table is given in the figure.)
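The quiz can be worked through in code. The page table below is an assumption, since the slide's figure is not reproduced here; it follows the page table commonly used for this example in Silberschatz et al.

```python
PAGE_SIZE = 4
page_table = [5, 6, 1, 2]           # ASSUMED mapping: page -> frame

def translate(logical: int) -> int:
    page, offset = divmod(logical, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

for a in (6, 11, 12):
    print(a, "->", translate(a))    # with this table: 6->26, 11->7, 12->8
```

For instance, logical address 6 is offset 2 on page 1; page 1 is (assumed to be) in frame 6, so the physical address is 6 * 4 + 2 = 26.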
Properties
- A form of dynamic relocation (similar to a table of base registers, one per frame)
- No external fragmentation
- Internal fragmentation is possible (maximum: page size - 1; average: half the page size)
- The hardware requires a page table only for the currently running process
- The operating system, however, has to maintain a page table for each process:
  - Used when mapping a logical address to a physical address manually (e.g. when a user process provides a logical address in an I/O system call)
  - Used for restoring the page table to be used after a context switch
Hardware Support
Every memory access goes through the paging map, so efficiency is a major consideration.
Solution 1: dedicated registers
- Fast access
- Expensive context switch (the whole table must be reloaded into the dedicated registers)
- Useful for small tables
- Example PDP-11: 16-bit addresses, 8 KB page size -> 3-bit page number, 13-bit offset, 8 table entries
Hardware Support
Solution 2: keep the page table in memory
- Reasonable for large page tables, e.g. 32-bit addresses and 4 KB pages: 20-bit page number, 12-bit offset -> more than one million entries
- Only one register, the page-table base register (PTBR), points to the page table
- Context switch: load only one register! (State of the art)
The Problem with Solution 2
(Figure: the page number plus the PTBR locates the page table entry in memory; the frame address from that entry plus the offset yields the physical address of the data.)
Every logical memory access now requires two physical memory accesses: one for the page table entry and one for the data. Memory access is slowed down by a factor of 2!
Solution
Caching with a translation look-aside buffer (TLB)
TLB Issues
What happens on a TLB miss?
- Look up the page number in the page table in memory
- Insert the entry into the TLB for the next use
TLB full?
- The OS has to follow a replacement policy (LRU, random, ...)
- Some TLBs allow wired-down entries, which are never replaced
TLB and context switches?
- Erase the complete TLB to avoid wrong mappings
- Or use ASIDs to associate entries with processes
TLB Hit Ratio
Hit ratio: the percentage of accesses for which the page number is found in the TLB.
Effective memory access time (example)
- Memory access: 100 ns; TLB search: 20 ns; hit ratio: 80%
- Access time on a TLB hit: 20 + 100 = 120 ns
- Access time on a TLB miss: 20 + 100 + 100 = 220 ns
- Effective access time = 0.8 * 120 + 0.2 * 220 = 140 ns -> a 40 percent slowdown
With reference locality, the hit ratio exceeds 98%; here that would mean a slowdown of less than 22%.
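The effective access time computation above generalizes to any hit ratio:

```python
# Effective memory access time with a TLB, as computed on the slide.

def effective_access_time(mem_ns: float, tlb_ns: float, hit_ratio: float) -> float:
    hit = tlb_ns + mem_ns          # TLB hit: one memory access
    miss = tlb_ns + 2 * mem_ns     # TLB miss: page table access + data access
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(effective_access_time(100, 20, 0.80))   # ~140 ns (40% slowdown)
print(effective_access_time(100, 20, 0.98))   # ~122 ns (22% slowdown)
```

This ignores the cost of the TLB replacement itself and assumes the page table fits in one level; multi-level tables make a miss correspondingly more expensive.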
Protection
The page table may contain additional flags
- Read, write, execute, valid-invalid
- The first three flags limit memory use to reading, writing, or executing
- The valid-invalid bit limits memory usage to valid pages
Alternative/supplementary solution: a page-table length register (PTLR)
- The PTLR reduces the memory overhead of the page table in case a process uses only the first n pages.
Shared Pages
Shared code must be reentrant: the code does not change during execution (e.g. clear the write flag in each page table entry).
Example: 40 users, 150 KB editor code + 50 KB data per user
- Total memory required without sharing: 40 * (150 + 50) KB = 8000 KB
- Sharing the editor code: 150 KB + 40 * 50 KB = 2150 KB -> significant savings!
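The savings arithmetic, spelled out:

```python
# Memory needed for the slide's editor example, with and without sharing.
USERS, CODE_KB, DATA_KB = 40, 150, 50

without_sharing = USERS * (CODE_KB + DATA_KB)   # every user has a private code copy
with_sharing = CODE_KB + USERS * DATA_KB        # one shared copy of the code

print(without_sharing)   # 8000 KB
print(with_sharing)      # 2150 KB
```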
Hierarchical Paging
Recall: 32-bit logical addresses and 4 KB pages give 2^32 / 2^12 = 2^20 page table entries (about one million).
Reducing the page table size: a two-level paging scheme
Quiz
Consider a two-level paging scheme for 32-bit addresses
- Let the first 10 bits be used for the outer page table index
- Let the next 8 bits be used for the inner page table index
- Let the remaining 14 bits be used for the page offset
Ignoring any additional flags, what is (measured in bytes):
- the size of the outer page table?
- the size of an inner page table?
- the size of a page?
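The quiz arithmetic can be checked directly. The slide does not state an entry size, so the 4-byte entries below are an assumption.

```python
# Two-level paging quiz, assuming 4-byte table entries.
ENTRY_SIZE = 4
OUTER_BITS, INNER_BITS, OFFSET_BITS = 10, 8, 14

outer_table_bytes = 2**OUTER_BITS * ENTRY_SIZE   # 1024 entries -> 4096 B
inner_table_bytes = 2**INNER_BITS * ENTRY_SIZE   # 256 entries  -> 1024 B
page_bytes = 2**OFFSET_BITS                      # 16384 B

print(outer_table_bytes, inner_table_bytes, page_bytes)
```

Note that with a 14-bit offset, neither table is itself page-sized here; the three bit fields only have to sum to the 32-bit address width.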
2-Level Paging Scheme and 64-Bit Addresses?
With a 42-bit outer page index, a 10-bit inner page index and a 12-bit offset, the outer page table would have 2^42 entries!
Solution: n-level paging (e.g. 7-level paging)?
- Prohibitive number of memory accesses in case of a TLB miss!
We need other solutions here...
Hashed Page Tables
Example
Consider 4 KB frames. What is the physical address of (page 32, offset 17) in the depicted example? (The hash table figure is not reproduced here.)
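A hashed page table can be sketched as below. The mapping of page 32 to frame 39 and the table size are assumptions for illustration (the slide's figure suggests frame 39, but it is not reproduced here).

```python
# Minimal hashed page table: the page number hashes to a bucket, and the
# bucket's chain of (page, frame) pairs is searched for a match.

TABLE_SIZE = 16
buckets = [[] for _ in range(TABLE_SIZE)]

def insert(page: int, frame: int) -> None:
    buckets[page % TABLE_SIZE].append((page, frame))

def lookup(page: int) -> int:
    for p, f in buckets[page % TABLE_SIZE]:
        if p == page:
            return f
    raise KeyError("page fault")

insert(32, 39)                       # ASSUMED mapping from the figure
# If page 32 maps to frame 39, then with 4 KB frames the physical address
# of (32, 17) is 39 * 4096 + 17 = 159761.
print(lookup(32) * 4096 + 17)
```

The chaining makes the table size independent of the (possibly huge) virtual address space, which is the point of the scheme.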
Inverted Page Tables
Example
16-bit addresses, 8-entry inverted page table. What is the size of a page/frame? What is the physical address of (pid = 2, page = 3, offset = 19) in this example?
Inverted page table (index = frame number; entry = PID, page):
0: (2, 12)   1: (1, 14)   2: (4, 1)   3: (3, 2)   4: (2, 3)   5: (1, 4)   6: (3, 11)   7: (4, 3)
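Inverted page table translation can be sketched as below. The table contents are a reconstruction of the slide's flattened figure, and the page size follows one reading of the quiz: 16-bit physical addresses over 8 frames give 2^16 / 8 = 8192-byte frames.

```python
# Inverted page table: one entry per physical frame, holding (pid, page).
# Translation searches for the matching entry; its index is the frame number.

PAGE_SIZE = 8192   # 2^16 addresses / 8 frames (one possible reading of the quiz)
inverted = [(2, 12), (1, 14), (4, 1), (3, 2),
            (2, 3), (1, 4), (3, 11), (4, 3)]   # reconstructed from the slide

def translate(pid: int, page: int, offset: int) -> int:
    for frame, entry in enumerate(inverted):
        if entry == (pid, page):
            return frame * PAGE_SIZE + offset
    raise KeyError("page fault")

print(translate(2, 3, 19))   # (2, 3) is entry 4 -> 4 * 8192 + 19 = 32787
```

The linear search is what makes plain inverted tables slow; real systems combine them with hashing.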
Segmentation
Segmentation
Recall the simple hardware solution: a relocation register (sometimes also called base register) and a limit register
Segmentation
Extending this idea to a table of base and limit values
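Translation with a segment table can be sketched as below. The base/limit values are assumptions taken from the textbook's well-known example, since the slides' figures are not reproduced here.

```python
# Segmentation: a logical address (segment, offset) is translated via a
# table of (base, limit) pairs; an offset beyond the limit traps.

segment_table = [        # ASSUMED values (textbook-style example)
    (1400, 1000),        # segment 0: base, limit
    (6300, 400),         # segment 1
    (4300, 400),         # segment 2
    (3200, 1100),        # segment 3
    (4700, 1000),        # segment 4
]

def translate(segment: int, offset: int) -> int:
    base, limit = segment_table[segment]
    if not 0 <= offset < limit:
        raise MemoryError("trap: addressing error")
    return base + offset

print(translate(3, 22))   # 3200 + 22 = 3222
# translate(0, 1500) would trap: offset 1500 exceeds segment 0's limit of 1000
```

Unlike pages, segments have variable lengths, so the limit check against the table entry is essential.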
Segmentation Example
Quiz
Where is the address (segment 3, offset 22) mapped to?
Quiz
Where is the address (segment 0, offset 1500) mapped to?
Example: Intel Pentium
Example: Intel Pentium
This architecture supports both
- pure segmentation and
- segmentation with paging
We do not consider the whole memory management structure of the Pentium, but rather the major ideas on which it is based: the logical-to-physical address translation.
Example: Intel Pentium
Example: Intel Pentium
Two page sizes are possible (4 KB and 4 MB); the page size is determined by a flag in the page directory entry.
Summary and References
Summary
Memory management typically includes
- Checking an address and its use (e.g. writing to the address) for legality
- Mapping a logical address to a physical address
This cannot be realized efficiently in software alone. Thus, the memory management provided by an operating system is always constrained by the available hardware features (e.g. base/limit registers, translation tables, TLB).
We have considered two major techniques
- Paging
- Segmentation (and a combination of both)
The study of memory management also includes
- Fragmentation (internal, external)
- Support for relocation to solve external fragmentation
- Swapping, to allow more processes than would fit into memory
- Sharing code or data
- Protection (execute-only, read-only, read-write)
References
Silberschatz, Galvin, Gagne: Operating System Concepts, Seventh Edition, Wiley, 2005. Chapter 8: Main Memory.