Main Memory
Electrical and Computer Engineering
Stephen Kim (dskim@iupui.edu)
ECE/IUPUI RTOS & APPS

Outline: Background, Swapping, Contiguous allocation, Paging, Segmentation, Segmentation with paging
Background
A program must be brought into memory and placed within a process for it to be run. The input queue is the collection of processes on disk that are waiting to be brought into memory to run. User programs go through several steps before being run. From the memory-unit viewpoint, the CPU generates a stream of memory addresses; it does not matter how they were generated or what they are for.

Binding of Instructions and Data to Memory
Most systems allow a user program to reside in any part of physical memory, so how can a user program know where its data are? Address binding of instructions and data to memory addresses can happen at three different stages:
- Compile time: if the memory location is known a priori, absolute code can be generated; the code must be recompiled if the starting location changes.
- Load time: relocatable code must be generated if the memory location is not known at compile time.
- Execution time: binding is delayed until run time if the process can be moved during its execution from one memory segment to another. This needs hardware support for address maps (e.g., base and limit registers).
Multistep Processing of a User Program

Logical vs. Physical Address Space
The concept of a logical address space that is bound to a separate physical address space is central to proper memory management.
- Logical address: generated by the CPU; also referred to as a virtual address.
- Physical address: the address seen by the memory unit.
Logical and physical addresses are the same in compile-time and load-time address-binding schemes; they differ in the execution-time address-binding scheme.
Memory-Management Unit (MMU)
A hardware device that maps virtual addresses to physical addresses. In the MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory. The user program deals with logical addresses; it never sees the real physical addresses.

Dynamic relocation using a relocation register
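The relocation-register mapping can be sketched in a few lines. This is an illustrative sketch, not real MMU code; the register value 14000 and the sample address are assumed for the example.

```python
# Dynamic relocation: the relocation register is added to every
# logical address before it is sent to memory.

RELOCATION_REGISTER = 14000   # assumed example value

def to_physical(logical_address):
    """Map a logical address to a physical address."""
    return RELOCATION_REGISTER + logical_address

# A logical address of 346 maps to physical address 14346.
```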
Dynamic Loading
A routine is not loaded until it is called, giving better memory-space utilization: an unused routine is never loaded. All routines are kept on disk in a relocatable load format. The main program is loaded into memory and executed. When a routine needs to call another routine, the calling routine first checks whether the other routine has been loaded. If not, the relocatable linking loader is called to load the desired routine into memory and to update the program's address table to reflect this change; then control is passed to the newly loaded routine. Dynamic loading is useful when large amounts of code are needed to handle infrequently occurring cases (e.g., error-handling routines). No special support from the operating system is required; it is implemented through program design, and it is the responsibility of the user to design the program to take advantage of this method.

Dynamic Linking
Similar to dynamic loading, but linking is postponed until execution time. A small piece of code, the stub, is used to locate the appropriate memory-resident library routine. The stub replaces itself with the address of the routine and executes the routine. All processes that use a language library execute only one copy of the library code. The operating system is needed to check whether the routine is in the process's memory address space. Dynamic linking is particularly useful for libraries: a library may be replaced by a new version, and all programs that reference the library will automatically use the new version.
Overlays
Overlays make a process larger than its allocated memory: keep in memory only those instructions and data that are needed at any given time. Overlays are implemented by the user; no special support is needed from the operating system, but the programming design of the overlay structure is complex.
Example: Pass 1 is 70 KB, Pass 2 is 80 KB, the symbol table is 20 KB, and the common routines are 30 KB. Without overlays, 200 KB is required. The overlay driver itself needs some memory, say 10 KB.

Overlays for a Two-Pass Assembler
With overlays, 140 KB is enough.
Swapping (1)
A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution. The backing store is a fast disk large enough to accommodate copies of all memory images for all users; it must provide direct access to these memory images.

Swapping (2)
The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped. For example, with a 1 MB process, a 5 MB/sec hard disk, no head seeks, and an 8 msec average latency, the total swap time (swap-out and swap-in) is 2 * (1 MB / (5 MB/sec) + 8 msec) = 416 msec. If the system uses round-robin scheduling, the time quantum must be much larger than 416 msec. Modified versions of swapping are found on many systems, e.g., UNIX, Linux, and Windows.
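The swap-time arithmetic above can be checked directly; the numbers (1 MB process, 5 MB/sec disk, 8 msec latency) are the slide's example values.

```python
# One-way swap time: transfer time plus average rotational latency.

def swap_time_ms(size_mb, rate_mb_per_s, latency_ms):
    transfer_ms = size_mb / rate_mb_per_s * 1000
    return transfer_ms + latency_ms

# Total is one swap-out plus one swap-in: 2 * (200 + 8) = 416 ms.
total_ms = 2 * swap_time_ms(1, 5, 8)
```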
Schemes of Memory Allocation
Contiguous allocation, paging, segmentation, and segmentation with paging.

Contiguous Allocation
Main memory is usually divided into two partitions: the resident operating system, usually held in low memory, and user processes, held in high memory. Why is the OS in low memory? Because of the location of the interrupt vector table.
Single-partition allocation: a relocation-register scheme is used to protect user processes from each other, and from changing operating-system code and data. The relocation register contains the value of the smallest physical address; the limit register contains the range of logical addresses, and each logical address must be less than the limit register.
CA, Hardware Support for Relocation and Limit Registers
The MMU maps each logical address dynamically: the CPU's logical address is first compared with the limit register (if it is not less, the MMU traps with an addressing error), and then the value in the relocation register is added to form the physical address sent to memory.

CA, Memory Allocation
The OS maintains a table of allocated memory blocks and available memory blocks. Initially, all memory is available, so there is one large block of available memory: one big hole. Later, holes of various sizes are scattered throughout memory. When a process arrives, it is allocated memory from a hole large enough to accommodate it; if the hole is large, it is split in two, one part allocated to the new process and the other returned to the pool of holes. When a process terminates, it releases its block, which is placed back into the pool of holes.
CA, Dynamic Storage-Allocation Problem
How to satisfy a request from the set of available holes:
- Random-fit: choose an arbitrary hole among those large enough; utilization is not good.
- First-fit: allocate the first hole in the table that is big enough. Time complexity is O(N/2), where N is the number of holes.
- Best-fit: allocate the smallest hole that is big enough; the entire list must be searched unless it is ordered by size. Produces the smallest leftover hole. Time complexity is O(N).
- Worst-fit: allocate the largest hole; the entire list must also be searched. Produces the largest leftover hole. Utilization is worse than first-fit and best-fit.
- Next-fit: like first-fit, but start the search at the hole where the last placement was made, treating the list of holes as a circular list. Time complexity is O(N).
Here utilization = occupied_memory / total_memory.

CA, Knuth's Findings
For most placement policies, including best-fit and first-fit, the number of holes is on average approximately equal to one half the number of segments: N segments produce about N/2 unusable holes. Best-fit and first-fit have about the same utilization.
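The first-fit and best-fit policies above can be sketched over a plain list of hole sizes. The function names and the sample hole list are illustrative, not from the slides.

```python
# Placement policies over a list of hole sizes (in KB).

def first_fit(holes, request):
    """Index of the first hole big enough, or None."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    """Index of the smallest hole big enough, or None."""
    fits = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(fits)[1] if fits else None

holes = [100, 500, 200, 300, 600]
# For a 212 KB request, first-fit picks the 500 KB hole (index 1),
# while best-fit picks the 300 KB hole (index 3), leaving the
# smallest leftover hole.
```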
CA, Fragmentation
External fragmentation: enough total memory space exists to satisfy a request, but there is no single hole large enough to hold it.
Internal fragmentation: to address external fragmentation, memory can be partitioned into fixed-size blocks and allocated in units of the block size rather than in bytes. The allocated memory may then be slightly larger than the requested memory; this size difference is memory internal to a partition that is not being used.

CA, Compaction
External fragmentation can be reduced by compaction: shuffle the memory contents to place all free memory together in one large block. Compaction is possible only if relocation is dynamic and done at execution time.
CA, Binary Buddy Systems, Method
The system keeps lists of holes, one list for each of the sizes 2^m, 2^(m+1), ..., 2^n, where 2^m is the smallest segment size. To place a segment of size s:
- Round s up to the next power of 2, i.e., 2^k such that 2^(k-1) < s <= 2^k.
- Look in the free list for size 2^k. If it is not empty, place the segment in the first entry and remove that entry from the list.
- If there are no holes of size 2^k, look in the list for size 2^(k+1), and so on until a hole is found. Then recursively divide that hole in two until the required size is obtained, and use it. If we use a hole of size 2^(k+l), we leave holes of sizes 2^(k+l-1), 2^(k+l-2), ..., 2^k.

CA, BDS (example)
Assume memory of size 1024 and a memory request of 68. 68 is rounded up to 128; 1024 is partitioned into 512, 256, and 2 x 128, and one of the blocks of size 128 is allocated for the request. If we then wished to place a segment of size 600, the buddy system would reject the request, even though the three neighboring holes (512, 256, and 128) add up to more than 600.
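The round-up and splitting steps can be sketched as follows; `round_up` and `split_path` are illustrative names, and the example reproduces the slide's 68-in-1024 case.

```python
# Binary buddy: round a request up to 2^k, then split a larger hole
# down to that size, leaving one hole at each intermediate size.

def round_up(s):
    """Smallest power of 2 that is >= s."""
    k = 1
    while k < s:
        k *= 2
    return k

def split_path(block, target):
    """Hole sizes left behind while splitting `block` down to `target`."""
    holes = []
    while block > target:
        block //= 2
        holes.append(block)   # one half is kept as a hole
    return holes              # the final half of size `target` is allocated

# 68 rounds up to 128; splitting 1024 leaves holes of 512, 256, and 128.
```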
CA, BDS (Recombination)
When a segment departs, the system looks to see whether its buddy (the block of the same size with the same parent in the binary tree) is vacant. If so, the two buddies are recombined, and the recombination process continues recursively up the tree until a non-vacant buddy is found. Two holes that are not buddies cannot be recombined.

CA, Fibonacci System
Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
Memory utilization: binary, about 65%; Fibonacci, about 60%.
(Figure: a Fibonacci-sized block splits into its two predecessor sizes, e.g., 89 into 55 and 34, 55 into 34 and 21, 34 into 21 and 13.)
CA, Weight System
Powers of 2 split as 2^k = 2^(k-2) + 3*2^(k-2), and the new sizes of the form 3*2^p split as 3*2^p = 2^p + 2^(p+1). The weighted scheme has twice as many available sizes for a given size range as the binary scheme, with the new entries midway between the binary scheme's entries. Average internal fragmentation per segment is about half that of the binary scheme.
(Figure: weighted splitting of a 1024 block into sizes such as 768, 512, 384, 256, 192, 128, and 64.)

CA, Fibonacci and Weighted Schemes
Both the Fibonacci and weighted schemes improve internal fragmentation, but they degrade external fragmentation so much that the average utilization is worse than for the binary scheme.
CA, Dual Buddy Systems
Keep two separate areas of memory: one of size 2^k, operating as a binary buddy system, and the other of size 3*2^p, also operating as a binary buddy system. This yields all the same sizes as the weighted scheme but reduces external fragmentation; overall utilization is better than in the binary buddy system.
(Figure: the 2^k area splits 1024 into 512, 512, then 256, 256, ...; the 3*2^p area splits 768 into 384, 384, then 192, 192, ...)

CA, Computing Mean Internal Fragmentation
Assumptions: the buddy sizes are 8, 16, 32, 64, 128, and segment sizes 10, 11, ..., 100 are all equally likely to arrive, so P(s=10) = P(s=11) = ... = P(s=100) = 1/91. Let f(s) be the internal fragmentation for segment size s. Then

Mean internal fragmentation
= Σ (s=10 to 100) P(s) f(s)
= (1/91) [ Σ (s=10 to 16) (16-s) + Σ (s=17 to 32) (32-s) + Σ (s=33 to 64) (64-s) + Σ (s=65 to 100) (128-s) ]
= (1/91) [ 6*7/2 + 15*16/2 + 31*32/2 + (63*64/2 - 27*28/2) ]
= 25
Paging
Paging allows the physical address space of a process to be noncontiguous. Divide physical memory into fixed-size blocks called frames (the size is a power of 2; why?), and divide logical memory into blocks of the same size, called pages. To run a program of size n pages, n free frames must be found, and a page table is set up to translate logical to physical addresses. Paging is usually required for virtual memory, which introduces unpredictably long delays; hence, real-time systems mostly have no virtual memory. However, the techniques used in paging are valuable in many real-time system architectures (e.g., IP routers, ATM switches).

Address Translation Scheme
An address generated by the CPU is divided into:
- Page number (p): used as an index into a page table, which contains the base address of each page in physical memory.
- Page offset (d): combined with the base address to define the physical memory address that is sent to the memory unit.
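The page-number/offset split can be sketched as follows. The 4 KB page size and the tiny page table are assumed for illustration; the power-of-two page size is exactly what makes the split a shift and a mask rather than a division.

```python
# Splitting a logical address into (page number, offset) and
# translating it through a page table.

PAGE_SIZE = 4096        # 2^12, an assumed example size
OFFSET_BITS = 12

def split(logical):
    page = logical >> OFFSET_BITS
    offset = logical & (PAGE_SIZE - 1)
    return page, offset

def translate(logical, page_table):
    page, offset = split(logical)
    return page_table[page] * PAGE_SIZE + offset

# With page 1 mapped to frame 6, logical address 4097 (page 1,
# offset 1) translates to 6 * 4096 + 1 = 24577.
```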
Address Translation Architecture

Paging Example
Paging Example
(Figure: pages 0-3 of a logical memory mapped into frames 0-7 of a physical memory.)

Internal fragmentation
With the paging scheme there is no external fragmentation, but there is some internal fragmentation. For example, assume the page size is 2 KB and the size of a memory request is random; the average internal fragmentation is then half a page, or 1 KB. A small page size is desirable to reduce internal fragmentation, but small pages increase the page-table overhead for each process. For example, with a 32-bit address and 4 bytes per page-table entry: a 2 KB page size gives an 11-bit offset, 2^21 entries, and an 8 MB page table; an 8 KB page size gives a 13-bit offset, 2^19 entries, and a 2 MB page table.
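The page-table-size arithmetic above can be reproduced directly; `page_table_bytes` is an illustrative helper for the 32-bit, 4-byte-entry case.

```python
# Page-table size for a flat (single-level) page table:
# one entry per page of the logical address space.

def page_table_bytes(address_bits, page_size, entry_bytes=4):
    offset_bits = page_size.bit_length() - 1   # page_size is 2^offset_bits
    entries = 2 ** (address_bits - offset_bits)
    return entries * entry_bytes

# 2 KB pages: 11-bit offset, 2^21 entries, 8 MB table.
# 8 KB pages: 13-bit offset, 2^19 entries, 2 MB table.
```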
Free Frames and Allocation
(Figures: the free-frame list before and after allocation.)

Implementation of Page Table
The page table is stored in main memory because of its size. The page-table base register (PTBR) points to the page table. In this scheme, every memory access requires two memory accesses: one for the page table and one for the memory itself. The two-memory-access problem can be solved by using a special fast-lookup hardware cache, called associative memory or content-addressable memory (CAM): the translation look-aside buffer (TLB).
TLB
Associative memory supports parallel search over (page #, frame #) key-value pairs, giving the address translation [A, A'].
Operation: when a page number A arrives, it is compared with all keys (page numbers) simultaneously. If found, the value (frame #) comes out. If not found, the frame # is fetched from the page table in memory, and the (page #, frame #) pair is added to the TLB. If the TLB is full, the OS selects a victim for replacement. Who will be the victim? LRU (least recently used) or random.

Paging Hardware With TLB
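The lookup-miss-evict cycle can be sketched in software with an ordered dictionary standing in for the associative memory. The capacity, the backing page table, and the LRU choice of victim are all assumptions for illustration; a real TLB does this in parallel hardware.

```python
# Software sketch of a TLB with LRU replacement.
from collections import OrderedDict

class TLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # backing page table "in memory"
        self.entries = OrderedDict()   # page# -> frame#, LRU order

    def lookup(self, page):
        if page in self.entries:                 # TLB hit
            self.entries.move_to_end(page)       # mark most recently used
            return self.entries[page]
        frame = self.page_table[page]            # miss: walk the page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # evict the LRU victim
        self.entries[page] = frame
        return frame
```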
TLB, ASID (Address Space ID)
An ASID identifies which process generated the memory address, so several processes can share a TLB, and on a context switch the OS need not flush all TLB entries.

Effective Access Time
Let the associative lookup take ε time units and assume the memory cycle time is 1 time unit. The hit ratio r is the percentage of times that a page number is found in the associative registers; the ratio is related to the number of associative registers. The effective access time is
T_effective = (1 + ε) r + (2 + ε)(1 - r) = 2 + ε - r
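The effective-access-time formula can be evaluated directly; the ε and r values below are assumed example inputs.

```python
# Effective access time with a TLB: a hit costs one memory access plus
# the lookup; a miss costs two memory accesses plus the lookup.

def effective_access_time(epsilon, hit_ratio):
    hit = 1 + epsilon       # TLB lookup + one memory access
    miss = 2 + epsilon      # TLB lookup + page table + memory access
    return hit * hit_ratio + miss * (1 - hit_ratio)
    # algebraically equal to 2 + epsilon - hit_ratio

# e.g., epsilon = 0.2, r = 0.8 gives 2 + 0.2 - 0.8 = 1.4 time units.
```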
Memory Protection
Memory protection is implemented by associating protection bits with each frame. A valid-invalid bit is attached to each entry in the page table:
- "valid" indicates that the associated page is in the process's logical address space, and is thus a legal page.
- "invalid" indicates that the page is not in the process's logical address space.

Valid (v) or Invalid (i) Bit In A Page Table
Page Table Structure
Most modern systems support a large logical memory address space (32-bit to 64-bit), so the page table itself becomes excessively large. For example, a 32-bit system with a 4 KB page size has 1M page-table entries; assuming 4 bytes per entry, that is 4 MB for the page table alone. However, in most cases user programs do not need the entire logical memory space: a 200 KB process would still carry a 4 MB page table! Solution: organize the page table, via hierarchical paging, hashed page tables, or inverted page tables.

Hierarchical Page Tables
Break up the logical address space into multiple page tables. A simple technique is a two-level page table.
Two-Level Paging Example
A logical address (on a 32-bit machine with a 4K page size) is divided into a page number consisting of 20 bits and a page offset consisting of 12 bits. Since the page table is paged, the page number is further divided into a 10-bit outer page number and a 10-bit inner page number. Thus, a logical address is as follows:

  | p1 (10 bits) | p2 (10 bits) | d (12 bits) |

where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table.

Two-Level Page-Table Scheme
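The 10/10/12 field extraction can be sketched with shifts and masks; the sample address is constructed just for the example.

```python
# Splitting a 32-bit logical address into the two-level paging fields.

def split_two_level(addr):
    p1 = addr >> 22              # index into the outer page table
    p2 = (addr >> 12) & 0x3FF    # index within an inner page table
    d = addr & 0xFFF             # offset within the page
    return p1, p2, d

# An address built from p1=3, p2=5, d=7 splits back into (3, 5, 7).
sample = (3 << 22) | (5 << 12) | 7
```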
Address-Translation Scheme
Address-translation scheme for a two-level 32-bit paging architecture.

High Level Page Table
Some systems support more than two levels of paging; e.g., SPARC supports 3-level paging, and the 32-bit MC 68030 supports 4-level paging.
Hashed Page Tables
A 64-bit system would need about a 7-level hierarchical page table. No way! That is too many memory accesses per address translation. The hashed page table handles address spaces larger than 32 bits: the virtual page number is hashed into a page table, and each table entry contains a chain of elements hashing to the same location. Virtual page numbers are compared along this chain, searching for a match; if a match is found, the corresponding physical frame is extracted.

Hashed Page Table
Inverted Page Table
Each process needs a page table, but all of them map to a single physical memory. For example, 100 processes mean 100 page tables, and taken together those page tables map the one physical memory; many of their entries simply waste memory.
Solution: the inverted page table has one entry for each real page (frame) of memory, mapping from a physical address to a logical address. Each entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns the page. This decreases the memory needed to store the page tables, but increases the time needed to search the table when a page reference occurs; a hash table is used to limit the search to one, or at most a few, page-table entries.

Inverted Page Table Architecture
Shared Pages
One copy of read-only (reentrant) code can be shared among processes (e.g., text editors, compilers, window systems); the code must be reentrant. Shared code must appear in the same location in the logical address space of all processes. This is similar to the sharing of a task's address space by its threads. Shared pages are difficult to implement on systems using an inverted page table, because each physical frame maps to only one virtual page.

Shared Pages, Example
Segmentation
A memory-management scheme that supports the user's view of memory. A program is a collection of segments, where a segment is a logical unit such as: main program, procedure, function, method, object, local variables, global variables, common block, stack, symbol table, arrays.

Segmentation Hardware
User's View of a Program

Logical View of Segmentation
(Figure: segments 1-4 in user space mapped to noncontiguous blocks of physical memory space.)
Segmentation Architecture
A logical address consists of a two-tuple <segment-number, offset>. The segment table maps these two-dimensional user-defined addresses to one-dimensional physical addresses; each table entry has:
- base: the starting physical address where the segment resides in memory.
- limit: the length of the segment.
The segment-table base register (STBR) points to the segment table's location in memory. The segment-table length register (STLR) indicates the number of segments used by a program; a segment number s is legal if s < STLR.

Example of Segmentation
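The base/limit check performed by the segmentation hardware can be sketched as follows. The table contents are invented for illustration, and the trap is modeled as a Python exception.

```python
# Segment translation: compare the offset with the segment's limit,
# then add the segment's base on success.

segment_table = [
    (1400, 1000),   # segment 0: base 1400, limit 1000 (assumed values)
    (6300, 400),    # segment 1: base 6300, limit 400
]

def translate(segment, offset):
    base, limit = segment_table[segment]
    if offset >= limit:
        raise MemoryError("trap: addressing error")
    return base + offset

# <1, 53> maps to 6300 + 53 = 6353; <1, 400> traps, since 400 is not
# less than segment 1's limit.
```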
Segmentation Architecture (Cont.)
Relocation: dynamic, by the segment table. Sharing: shared segments use the same segment number. Allocation: first-fit or best-fit, with external fragmentation.

Segmentation Architecture (Cont.)
Protection: with each entry in the segment table, associate a validation bit (0 means an illegal segment) and read/write/execute privileges. Protection bits are associated with segments, and code sharing occurs at the segment level. Since segments vary in length, memory allocation is a dynamic storage-allocation problem.
Sharing of Segments

Segmentation with Paging
Combine paging and segmentation: the Intel 386 uses segmentation with paging for memory management, with a two-level paging scheme.
Notes
Chapter 10, Virtual Memory, is very important for building an OS efficiently; many performance aspects of OSes are measured by the virtual memory system. Even so, this class will not cover the chapter, because most real-time operating systems do not support virtual memory: secondary storage is usually not a component of an RTOS or embedded system (counterexample: the iPod), and execution time cannot be predicted with virtual memory. But I strongly recommend that you read the chapter on your own.