Memory and multiprogramming
COMP342 Week 5. Dr Len Hamey.
Reading: TW: Tanenbaum and Woodhull, Operating Systems, Third Edition, chapter 4.
References (computer architecture): HP: Hennessy and Patterson, chapter 5. MH: Murdocca and Heuring, chapter 7.

Memory architecture
Fast memory is expensive and small; large, cheap memory is slow. The memory hierarchy uses small, fast memory to store frequently used data and large, slow memory to store lots of infrequently used data.

Memory hierarchy
From fastest (most expensive) to cheapest (slowest): registers, managed by the compiler; cache, managed by hardware and the OS; main memory, managed by the OS; hard disk, managed by the OS.

Principle of Locality
Memory accesses tend to be close together.
Locality in time: repeat access to the same memory word is likely in a short period of time. Examples: the same instruction in a loop; the same variable used many times.
Locality in memory space (spatial): access to a nearby memory word is likely in the near future. Examples: a sequence of instructions; items in an array; elements of a structure/record.
Refer MH, section 7.6; HP, section 1.6.
Cache and main memory
The cache is not used by the I/O processor, and a second processor (multiprocessor) has its own cache; both access main memory over the bus. Typical speeds: CPU and on-chip cache around 3 GHz, with an L2 cache behind them; the bus and main memory are far slower, around 400 MHz (MH fig 7-12, adapted).

Monoprogramming
Only one user program in memory at a time, e.g. MS-DOS, early Macintosh. Three arrangements (TW 4.1): OS in RAM below the user program; operating system in ROM above the user program; device drivers in ROM with the OS in RAM below the user program.

Linking
Combine the program with libraries.
Static linking: the library is linked just like program modules. The binary executable = program + libraries, so each program has its own copy of the library.
Dynamic linking: link at execution time; usually only used for libraries. Saves disk space: the program does not carry its own copy of the library. Saves memory: many processes share the same in-memory copy of the library.

Stubs
A stub is included in the executable image for each call to a library routine. When the stub is called: if the library routine is not in memory, load it; then the stub replaces itself with the address of the routine. Only one copy of the routine is needed in memory -- multiple processes can use it.
Stubs (figure): the program code calls the stub code; when execution reaches the stub, the stub is replaced by the routine address, so the program code thereafter jumps directly to the routine code.

Multiprogramming (TW 4.1.2)
Each process occupies a separate range of memory, e.g. the OS in RAM at the bottom with Prog A, Prog B and Prog C above it; after swapping, the same memory might hold Prog B, Prog E and Prog D. An instruction such as Load R1,x4A84 must refer to the correct physical location wherever the program happens to be loaded.

Multiprogramming requirements: relocation, protection, allocation and management.

Relocation: logical and physical addresses (TW 4.1.3)
The program (CPU) generates logical addresses. Hardware (the MMU) converts logical addresses to physical addresses. Memory is accessed with physical addresses.

Relocation register: physical address = logical address + relocation register. Example: logical address 1A84 with relocation register 1000 gives physical address 2A84.

Protection: base and limit registers. The MMU compares each logical address with the limit register. If it is less, the address is added to the base (relocation) register and sent to memory; if not, a trap occurs (addressing error).
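The base/limit MMU logic above can be written out as a short sketch (a simulation for illustration; the function name is invented, and the limit value in the usage below is an assumed example):

```python
def mmu_translate(logical, limit, base):
    # Protection check: the logical address must be below the limit
    # register, otherwise the hardware traps (addressing error).
    if logical >= limit:
        raise MemoryError("trap: addressing error")
    # Relocation: physical address = logical address + base register.
    return logical + base
```

With the slide's figures, logical 0x1A84 and a relocation (base) register of 0x1000 yield physical 0x2A84; any address at or beyond the limit traps instead of reaching memory.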
Limited program size
Can a program be larger than main memory? Code + data cannot fit in memory!

Overlays (TW 4.3)
The program is divided into sections. When one part is finished, it is replaced by another. The programmer must design the overlays.

Overlay example -- two-pass assembler: the symbol table, common routines and overlay driver remain resident; the pass 1 code and pass 2 code overlay each other.

Swapping (TW 4.2)
Can we have more processes than the available memory holds? A running process must be in memory; processes that are not running can be swapped out to disk. The memory manager ensures that there are always processes in memory able to run. After being swapped back in, a process may be in a different memory location. Swapping is slow compared to other context switches. A process must be completely idle to swap. Swapping part of a process is more common than swapping the entire process.

Memory allocation
Allocate portions of memory for use by the operating system and by user programs.
Contiguous allocation
Divide memory into at least two sections or partitions. The OS is usually placed in the partition which is lowest in memory, as that is where the interrupt vector is usually located. Protect the OS from other processes, and processes from each other.

Multiple-partition allocation -- fixed size
Supports multiple processes. Memory partitions are fixed sized; each partition can hold one process. The OS tracks free and used partitions and selects a process from disk to load into a free partition.
Drawbacks: limits program and data size to the size of a single partition; limits the number of processes which can be simultaneously present in memory; can result in poor utilisation of memory.

Multiple-partition allocation -- variable size
The OS tracks free/used memory areas. Loading a process: the OS looks for a large enough free memory area (hole), using first fit (fast; good utilisation), best fit (poor utilisation -- many small holes) or worst fit (poor utilisation). On process termination: combine the process's partition with any neighbouring holes.

Multiple partition example
Memory of 2560K with the operating system occupying the lowest 400K, leaving 2160K for user processes. Processes to be run: P1 (600K, 10s), P2 (1000K, 5s), P3 (300K, 20s), P4 (700K, 8s), P5 (500K, 15s). Initially P1, P2 and P3 are loaded: OS to 400K, P1 to 1000K, P2 to 2000K, P3 to 2300K, with a 260K hole at the top. When P2 terminates, P4 is loaded into its place (1000K-1700K), leaving a 300K hole. When P1 terminates, P5 is loaded into its place (400K-900K), leaving a 100K hole.
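The three placement strategies can be compared with a small sketch (illustrative only; `pick_hole` is an invented name, and the hole list below is taken from the state of the worked example after P1 and P2 have terminated):

```python
def pick_hole(holes, size, strategy="first"):
    # holes: list of (start, length) pairs in address order.
    # Returns the chosen hole as (start, length), or None if none fits.
    fits = [h for h in holes if h[1] >= size]
    if not fits:
        return None
    if strategy == "first":
        return fits[0]                        # first hole big enough
    if strategy == "best":
        return min(fits, key=lambda h: h[1])  # smallest adequate hole
    return max(fits, key=lambda h: h[1])      # worst fit: largest hole
```

For holes of 100K at 900K, 300K at 1700K and 260K at 2300K, a 250K request gets the 1700K hole under first fit (the 100K hole is too small), the 2300K hole under best fit, and the 1700K hole under worst fit.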
External fragmentation
The total size of all holes is enough to load another process, but it is broken into pieces which are too small. There could be a useless hole between every pair of processes; often one third of memory will be wasted this way.

Compaction
Bring all free memory together in one hole. This requires relocatable processes. There is a choice of algorithms: moving everything down in memory may not be the cheapest (fastest) alternative.

Compaction -- example: with the operating system in 0-400K, P5 at 400K-900K, P4 at 1000K-1700K and P3 at 2000K-2300K, the three holes (100K, 300K and 260K) total 660K. Compacting P4 down to 900K-1600K and P3 down to 1600K-1900K leaves a single 660K hole at the top of memory.

Paging (TW 4.3)
The usual form of virtual memory; non-contiguous allocation. Physical memory is divided into fixed sized blocks called page frames. Logical memory is divided into blocks of the same size as the frames, called pages. When a process is loaded, some of its pages are loaded into available page frames. (Figure: logical memory pages 0-7 mapped onto scattered physical page frames; adapted from Murdocca and Heuring, fig 7-21.)

Internal fragmentation
The process size may not fill a whole number of pages. The unused space in the last allocated page is internal fragmentation.
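Internal fragmentation is easy to quantify; the arithmetic above amounts to a ceiling division (a sketch with invented function names):

```python
def pages_needed(process_size, page_size):
    # Ceiling division: a partial last page still occupies a whole frame.
    return -(-process_size // page_size)


def internal_fragmentation(process_size, page_size):
    # Unused space in the last allocated page.
    return pages_needed(process_size, page_size) * page_size - process_size
```

For example, a 10240-byte process with 4K (4096-byte) pages needs 3 frames and wastes 2048 bytes of the last one; a process that exactly fills its pages wastes nothing.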
Page table
One per process. Records, for each page: whether it is present in memory; its disk address; its page frame number if in memory; access permission bits. A pointer to the page table is kept in the process table in the OS. (The example table in the slides shows pages with present bits, disk addresses such as 7FA2C and 6A9A8, page frame numbers and access permission bits; see also TW 4.3.)

Hardware support
Address translation (page table lookup) is done in hardware. Either registers hold a (small) page table, or a register points to the page table in memory.

Page size
Typically between 512 bytes and 8192 bytes; most common is 2K or 4K. The page size is a power of 2, so a logical address splits into a page number and an offset.

Address translation (see also TW fig 4-9)
The logical address splits into page number p and offset o; the page table maps p to frame number f; the physical address is frame f combined with offset o.

Page fault
If the page is not in main memory, a page fault results. The page is fetched from disk into a frame while the process blocks in I/O waiting status. When the process restarts, the instruction is retried and the translation retry should succeed. A page fault can result from an instruction fetch and/or a data access. The OS handles the page fault.

Paging clearly separates the user's (logical) view of memory from the actual memory. A user process views its memory as one contiguous space; the physical allocation may be scattered, random.
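The page-number/offset split and table lookup can be sketched as follows (a simulation, not real MMU code; the dictionary page table and `translate` name are invented for illustration):

```python
PAGE_SIZE = 4096  # 4K pages -- a power of 2, as the slides note


def translate(logical, page_table):
    # page_table maps page number -> frame number for pages in memory.
    # The power-of-2 page size lets divmod split the address cleanly.
    page, offset = divmod(logical, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        # No entry: the page is on disk, so a page fault occurs and the
        # OS must load it before the instruction can be retried.
        raise LookupError("page fault on page %d" % page)
    return frame * PAGE_SIZE + offset
```

A reference to logical address 4108 (page 1, offset 12) with page 1 held in frame 7 yields physical address 7*4096 + 12; a reference into an unmapped page raises the simulated fault.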
Advantages and disadvantages
A process may exceed memory size. No external fragmentation, but there may be internal fragmentation. Simple logical memory model. Unused code/data on disk may never be loaded if unused throughout. Relocation and protection come for free. Memory allocation seems simple.

Hardware support for page tables
Simplest: page table in registers. High speed registers give efficient translation. The OS dispatcher reloads these registers on a context switch, as with all other registers. Instructions to modify the contents of these registers are privileged.

Large page tables
Sometimes hundreds of thousands to millions of page table entries -- that much fast memory is too expensive. Instead, hold the page table in main memory, with a page table base register pointing to it. Problem: main memory is slow compared to registers, and two accesses are then required for each memory access -- one for the page table entry and one for the information itself.

Translation Lookaside Buffer (TW 4.3.3)
The TLB is a cache holding recently used page table entries (page number, frame number) in high speed registers, matched all at once. If the entry is not found, normal translation is used. TLBs are fast but expensive; they typically contain between 8 and 2048 entries.

Use of a TLB
On a context switch the entire TLB must be flushed (erased), so that the next process does not access the last process's memory. The percentage of translations satisfied by the TLB is called the hit ratio. On a miss, the entry is loaded (replacing another). A larger TLB gives a better hit ratio; a better replacement algorithm gives a better hit ratio.
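The effect of the hit ratio on performance can be estimated with a back-of-envelope model (a hedged sketch under assumed access times -- 100 ns memory, 1 ns TLB -- not figures from the lecture; the exact degradation depends on these assumptions):

```python
def effective_access_time(hit_ratio, mem_ns=100, tlb_ns=1, levels=1):
    # A TLB hit costs the TLB lookup plus one memory access.
    # A TLB miss additionally costs one memory access per page-table
    # level walked before the final data access.
    hit_cost = tlb_ns + mem_ns
    miss_cost = tlb_ns + levels * mem_ns + mem_ns
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost
```

Under these assumed timings, a 98% hit ratio gives an effective access time of 103 ns for one-level paging and 109 ns even for four levels -- the TLB absorbs almost all of the page-table walking cost.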
TLB (figure): the logical address (p, o) is presented to the TLB; on a TLB hit, the frame number f comes straight from the TLB; on a TLB miss, the page table supplies f; the physical address (f, o) then accesses memory.

Protection
Protection bits for each page, held in the page table (e.g. read only/read-write), are checked during translation. An invalid access causes a hardware trap.

Large logical spaces
Modern computers support a very large logical space (e.g. 2^32 to 2^64). A large space means many pages: e.g. a 2^32 space with a 4K (2^12) page size gives 2^32 / 2^12 = 2^20 pages (about 1 million). The page table is then too large, so break the table into pages and page it.

Two-level page table
The logical address splits into p1, p2 and d. p1 indexes the outer page table, which locates one page of the page table; p2 indexes that page to find frame f; (f, d) then accesses memory. For larger memories the page table can be divided into three or four levels.

Performance
If there were no TLB, multi-level paging would drastically slow memory accesses. With TLB caching the degradation is not as bad: e.g. with a 98% hit rate the degradation is roughly only 28% for a four-level paging scheme.

Inverted page table
One entry for each page frame; translation searches for the (process ID, page number) pair to find the frame. Uses less memory, but is slower. A per-process page table is still required for pages on disk.
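The two-level split of a 32-bit address with 4K pages -- 10-bit p1, 10-bit p2, 12-bit offset -- is pure bit manipulation (a sketch; the function name is invented):

```python
def split_two_level(logical):
    # 32-bit logical address -> (p1, p2, d):
    # 10-bit outer-table index, 10-bit inner-table index, 12-bit offset.
    d = logical & 0xFFF           # low 12 bits
    p2 = (logical >> 12) & 0x3FF  # next 10 bits
    p1 = (logical >> 22) & 0x3FF  # top 10 bits
    return p1, p2, d
```

For example, address 0x00401003 splits into p1 = 1, p2 = 1, d = 3: outer entry 1 locates a page of the page table, entry 1 within it gives the frame, and the frame is accessed at offset 3.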
Shared pages
Processes can share code (unchanging: pure or reentrant) while each process has its own data. The same page has a page table entry in both processes. This is difficult with an inverted page table.

Segmentation (TW 4.6)
Paging views memory as one continuous space. Users typically view a program as consisting of various modules and data structures which refer to each other by name; memory order is unimportant. Segmentation supports this view.

Segments
The logical space is a collection of segments. A segment is a piece of memory specified by a name and a length; lengths differ between segments. A logical address is a pair (segment name, offset). Compare this to paging, where a logical address is a single quantity split up by the MMU. A segment table holds the base and limit for each segment.

Segmentation (figure): the logical address (s, o) indexes the segment table; if o is less than the segment's limit, the physical address is base + o and memory is accessed; otherwise a trap occurs (addressing error).

Segment table implementation
Fast registers (small tables only), or the table in memory with a segment table base register pointing to the segment table and a segment table length register indicating the number of table entries.

Advantages of segmentation
Segmentation reflects the user's view. There are likely fewer segments than there would be pages, reducing the size of the table. Sharing of segments between processes is possible: include the shared segment in the segment table of both processes.
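The segment-table check-and-add logic mirrors the base/limit hardware (a simulation with invented names and example base/limit values):

```python
def seg_translate(seg, offset, segment_table):
    # segment_table maps segment number -> (base, limit).
    base, limit = segment_table[seg]
    if offset >= limit:
        # Offset beyond the segment's length: hardware traps.
        raise MemoryError("trap: addressing error")
    return base + offset
```

With a table placing segment 0 at base 1400 (limit 1000) and segment 1 at base 6300 (limit 400), logical address (1, 53) maps to physical 6353, while (1, 400) traps because the offset equals the limit.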
Problems with segmentation
Susceptible to external fragmentation (as with base and limit registers). The whole segment must be in memory at the one time. These problems can be solved by paging the segments.

Segmentation with paging
Used in the Intel x86 architecture, and others. The Pentium has 16K segments per process, each holding up to 1 billion 32-bit words. A logical address consists of a selector (16 bits) and an offset (32 bits). The per-process segment table is the local descriptor table (LDT); the system-wide table (for the OS etc.) is the global descriptor table (GDT).

Pentium selector (TW 4.6.2)
The first 13 bits are an index into a descriptor table; the next bit specifies whether the local or global table is used; the other two bits specify the privilege level. Half of the 16K segments a process can access are local, half global.

Segment descriptor
A 64-bit quantity containing: Base 24-31, G, D, P, DPL, S, Type, Base 16-23, Limit 16-19, Base 0-15, Limit 0-15.
G: limit counted in bytes (0) or 4K pages (1). D: 16-bit segment (0) or 32-bit segment (1). P: segment in memory (1) or not (0). DPL: privilege level. Type: segment type and protection.

Address translation
If the segment is in memory and the offset is in range: linear address = base + offset. For paging, the linear address is treated as virtual. With a maximum of 1 million pages, a 2-level page table is used: 10-bit directory index, 10-bit page table index, 12-bit offset within the page.

Address translation (figure): the 16-bit selector indexes the descriptor table to obtain the segment descriptor; its base is added to the 32-bit offset to form the linear address, which splits into directory, page and offset fields.
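The selector layout above -- 13-bit index, one local/global bit, two privilege bits -- can be decoded with a few shifts (a sketch assuming the hardware layout with the index in the top 13 bits of the 16-bit selector; the function name is invented):

```python
def decode_selector(selector):
    # 16-bit x86 selector: bits 3-15 index into the descriptor table,
    # bit 2 is the table indicator (0 = global/GDT, 1 = local/LDT),
    # bits 0-1 are the privilege level.
    index = selector >> 3
    local = (selector >> 2) & 1
    privilege = selector & 3
    return index, local, privilege
```

Selector 0x000F, for instance, decodes to index 1 in the local table at privilege level 3, while 0x0028 decodes to index 5 in the global table at privilege level 0.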
Address translation (figure): the directory field of the linear address indexes the page directory (located by the page directory base register) to find a directory entry; the page field indexes the selected page table to find a page table entry; this yields the page frame address, to which the offset is added.

Advantages of segmentation with paging
Avoids external fragmentation. Allows much larger virtual addresses (for the Pentium, 46 vs. 32 bits -- although only 32 bits of linear address in total are supported).

Question
How could we use the Pentium memory management hardware that we have described to support OS features such as multiprogramming and shared library code?

Memory management -- summary
Programs cannot simply use memory addresses directly, as they will not be loaded into memory starting at address 0. A program's logical addresses begin at 0, and these must be translated into physical addresses.

Memory management -- summary (cont.)
Methods for managing memory, and methods of translating between logical and physical addresses: paging and segmentation.