Virtualization and the UNIX API Computer Science CS 450: Operating Systems Sean Wallace <swallac6@iit.edu>
Agenda The Process UNIX process management API Virtual Memory Dynamic memory allocation & related APIs 2
The Process 3
Definition & OS Responsibilities 4
Process = a program in execution 5
{ code (program), global data, local data (stack), dynamic data (heap), PC & other registers } 6
Programs describe what we want done, Processes realize what we want done 7
and the operating system runs processes 8
Execution = running a process Scheduling = running many processes Peripherals = I/O devices 9
To do this, the OS is constantly running in the background, keeping track of a large amount of process/system metadata. 10
{ code (program), global data, local data (stack), dynamic data (heap), PC & other registers, + OS-level metadata } 11
{ code (program), global data, local data (stack), dynamic data (heap), PC & other registers, + (e.g., pid, owner, memory/cpu usage) } 12
Logical Control Flow Process A Process B Process C Time 13
Physical Flow (1 CPU) Process A Process B Process C Time 14
Context Switches Time Process A Process B read Disk interrupt Return from read User Code Kernel Code User Code Kernel Code User Code } Context switch } Context switch 15
Context switches are external to a process's logical control flow (dictated by user program) part of key OS virtualization: exceptional control flow 16
Exceptional Control Flow 17
Logical c.f. Exception! int main() { while (1) { printf("hello World!\n"); } return 0; }? 18
Two classes of exceptions: I. Synchronous II. Asynchronous 19
I. Synchronous exceptions are caused by the currently executing instruction 20
3 subclasses of synchronous exceptions: 1. Traps 2. Faults 3. Aborts 21
1. Traps Traps are intentionally triggered by a process E.g., to invoke a system call 22
char *str = "Hello World"; int len = strlen(str); write(1, str, len); mov edx, len mov ecx, str mov ebx, 1 mov eax, 4 ; syscall #4 int 0x80 ; trap to OS 23
Return from trap (if it happens) resumes execution at the next logical instruction 24
2. Faults Faults are usually unintentional, and may be recoverable or irrecoverable E.g., segmentation fault, protection fault, page fault, divide-by-zero 25
Often, return from fault will result in retrying the faulting instruction Esp. if the handler fixes the problem 26
3. Aborts Aborts are unintentional and irrecoverable I.e., abort = program/OS termination E.g., memory ECC error 27
II. Asynchronous exceptions are caused by events external to the current instruction 28
int main() { while (1) { printf("hello World!\n"); } return 0; } Hello World! Hello World! Hello World! Hello World! ^C $ 29
Hardware initiated asynchronous exceptions are known as interrupts 30
E.g., ctrl-c, ctrl-alt-del, power switch 31
Interrupts are associated with specific processor (hardware) pins Checked after every CPU cycle Associated with interrupt handlers 32
(system) memory int. handler 0 code. 2 1 0 int # interrupt vector 33
(Typical) interrupt procedure Save context (e.g., user process) Load OS context Execute handler Load context (for which process?) Return 34
Important: after switching context to the OS (for exception handling), there is no guarantee if/when a process will be switched back in! 35
Switching context to the kernel is potentially very expensive But it's the only way to access the OS API! 36
UNIX API 37
Process Management 38
Creating Processes 39
#include <unistd.h> pid_t fork(); 40
fork traps to OS to create a new process which is (mostly) a duplicate of the calling process! 41
E.g., the new (child) process runs the same program as the creating (parent) process And starts with the same PC, The same SP, FP, regs, The same open files, etc., etc. 42
Pparent int main() { fork(); foo(); } OS 43
Pparent int main() { fork(); foo(); } Pchild int main() { fork(); foo(); } OS creates 44
Pparent int main() { fork(); foo(); } Pchild int main() { fork(); foo(); } OS 45
fork, when called, returns twice (to each process @ the next instruction) 46
typedef int pid_t; pid_t fork(); System-wide unique process identifier Child's pid (> 0) is returned in the parent Sentinel value 0 is returned in the child 47
void fork0() { int pid = fork(); if (pid == 0) { printf("hello from Child!\n"); } else { printf("hello from Parent!\n"); } } main() { fork0(); } Hello from Child! Hello from Parent! (or) Hello from Parent! Hello from Child! 48
That is, order of execution is nondeterministic Parent & child run concurrently! 49
void fork1() {
    int x = 1;
    if (fork() == 0) {
        printf("child has x = %d\n", ++x);
    } else {
        printf("parent has x = %d\n", --x);
    }
}

Parent has x = 0 Child has x = 2 50
Important: post-fork, parent & child are identical, but separate! OS allocates and maintains separate data/state Control flow can diverge 51
All terminating processes turn into zombies 52
Dead but still tracked by OS PID remains in use Exit status can be queried 53
All processes are responsible for reaping their own (immediate) children 54
pid_t wait(int *stat_loc); 55
pid_t wait(int *stat_loc); When called by a process with ≥ 1 children: Waits (if needed) for a child to terminate Reaps a zombie child (if ≥ 1 zombified children, arbitrarily pick one) Returns reaped child's PID and exit status info via pointer (if non-null) 56
wait allows us to synchronize one process with events (e.g., termination) in another 57
void fork10() { int i, stat; pid_t pid[5]; for (i = 0; i < 5; i++) { if ((pid[i] = fork()) == 0) { sleep(1); exit(100 + i); } } for (i = 0; i < 5; i++) { pid_t cpid = wait(&stat); if (WIFEXITED(stat)) { printf("child %d terminated with status %d\n", cpid, WEXITSTATUS(stat)); } } } Child 9051 terminated with status 100 Child 9055 terminated with status 104 Child 9054 terminated with status 103 Child 9053 terminated with status 102 Child 9052 terminated with status 101 58
/* explicit waiting -- i.e., for a specific child */ pid_t waitpid(pid_t pid, int *stat_loc, int options); /** Wait options **/ /* return 0 immediately if no terminated children */ #define WNOHANG 0x00000001 /* also report info about stopped children (and others) */ #define WUNTRACED 0x00000002 59
void fork11() { int i, stat; pid_t pid[5]; for (i = 0; i < 5; i++) { if ((pid[i] = fork()) == 0) { sleep(1); exit(100 + i); } } for (i = 0; i < 5; i++) { pid_t cpid = waitpid(pid[i], &stat, 0); if (WIFEXITED(stat)) { printf("child %d terminated with status %d\n", cpid, WEXITSTATUS(stat)); } } } Child 9085 terminated with status 100 Child 9086 terminated with status 101 Child 9087 terminated with status 102 Child 9088 terminated with status 103 Child 9089 terminated with status 104 60
int main(int argc, const char * argv[]) {
    int stat;
    pid_t cpid;
    if (fork() == 0) {
        printf("child pid = %d\n", getpid());
        sleep(3);
        exit(1);
    } else {
        /* use with -1 to wait on any child (with options) */
        while ((cpid = waitpid(-1, &stat, WNOHANG)) == 0) {
            sleep(1);
            printf("no terminated children!\n");
        }
        printf("reaped %d with exit status %d\n", cpid, WEXITSTATUS(stat));
    }
}

Child pid = 9108 No terminated children! No terminated children! No terminated children! Reaped 9108 with exit status 1 61
Recap: fork: create new (duplicate) process wait: reap terminated (zombie) process 62
Running new programs (within processes) 63
/* the "exec family" of syscalls */ int execl(const char *path, const char *arg,...); int execlp(const char *file, const char *arg,...); int execv(const char *path, char *const argv[]); int execvp(const char *file, char *const argv[]); 64
Execute a new program within the current process context 65
Complements fork (1 call → 2 returns): When called, exec (if successful) never returns! Starts execution of new program 66
int main(int argc, const char * argv[]) { execl("/bin/echo", "/bin/echo", "hello", "world", (void *)0); printf("done exec-ing...\n"); return 0; } $ ./a.out hello world 67
int main(int argc, const char * argv[]) { printf("about to exec!\n"); sleep(1); execl("./execer", "./execer", (void *)0); printf("done exec-ing...\n"); return 0; } $ gcc execer.c -o execer $ ./execer About to exec! About to exec! About to exec! About to exec!... 68
int main(int argc, const char * argv[]) { if (fork() == 0) { execl("/bin/ls", "/bin/ls", "-l", (void *)0); exit(0); /* in case exec fails */ } wait(NULL); printf("command completed\n"); return 0; } $ ./a.out -rwxr-xr-x 1 sean staff 9500 Oct 7 00:15 a.out Command completed 69
Interesting question: Why are fork & exec separate syscalls? /* i.e., why not: */ fork_and_exec("/bin/ls",...) 70
A1: we might really want to just create duplicates of the current process (e.g.?) 71
A2: we might want to replace the current program without creating a new process 72
A3 (more subtle): we might want to tweak a process before running a program in it 73
The Unix Family Tree 74
BIOS bootloader kernel handcrafted process 75
kernel fork & exec /etc/inittab init 76
kernel fork & exec init fork & exec Daemons e.g., sshd, httpd getty 77
kernel init login exec 78
kernel init shell (e.g., sh) exec 79
kernel init (a fork-ing party!) shell (e.g., sh) user process user process user process user process 80
kernel (or, for the GUI-inclined) init display manager (e.g., xdm) X Server (e.g., XFree86) window manager (e.g., twm) 81
window manager (e.g., twm) terminal emulator (e.g., xterm) shell (e.g., sh) user process user process user process user process 82
The Shell (aka the CLI) 83
The original operating system user interface 84
Essential function: let the user issue requests to the operating system e.g., fork/exec a program, manage processes (list/stop/term), browse/manipulate the file system 85
(a read-eval-print-loop (REPL) for the OS) 86
int i;
pid_t pid;
char buf[80], *argv[10];

while (1) {
    /* print prompt */
    printf("$ ");

    /* read command and build argv */
    fgets(buf, 80, stdin);
    for (i = 0, argv[0] = strtok(buf, " \n");
         argv[i];
         argv[++i] = strtok(NULL, " \n"));

    /* fork and run command in child */
    if ((pid = fork()) == 0) {
        if (execvp(argv[0], argv) < 0) {
            printf("command not found\n");
            exit(0);
        }
    }
}
87
The process management API (fork, wait, exec, etc.) let us tap into CPU virtualization i.e., create, manipulate, clean up concurrently running programs 88
Next key component in the Von Neumann machine: memory To be accessed by all of our processes Simultaneously With a simple, consistent API Without trampling on each other! 89
Virtual Memory 90
Motivation 91
CPU instr data results Memory I/O Again, recall the Von Neumann architecture a stored-program computer with programs and data stored in the same memory 92
CPU instr data results Memory I/O Memory is an idealized storage device that holds our programs (instructions) and data (operands) 93
CPU instr data results Memory I/O Colloquially: RAM, random access memory ~ big array of byte-accessible data 94
execlp("/bin/echo", "/bin/echo", "hello", NULL); but our code & data clearly reside on the hard drive (to start)! 95
register file instr data results Memory hard disk in reality, memory is a combination of storage systems with very different access characteristics 96
common types of memory : SRAM, DRAM, NVRAM, HDD 97
Relative Speeds

Type             Size             Access latency      Unit
Registers        8-32 words       0-1 cycles          (ns)
On-board SRAM    32-256 KB        1-3 cycles          (ns)
Off-board SRAM   256 KB - 16 MB   10 cycles           (ns)
DRAM             128 MB - 64 GB   100 cycles          (ns)
SSD              1 TB             ~10,000 cycles      (µs)
HDD              4 TB             10,000,000 cycles   (ms)
98
Numbers Every Programmer Should Know http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html 99
would like: 1. a lot of memory 2. fast access to memory 3. to not spend $$$ on memory 100
CPU registers cache (SRAM) smaller, faster costlier main memory (DRAM) local hard disk drive (HDD) larger, slower, cheaper remote storage (networked drive / cloud) an exercise in compromise: the memory hierarchy 101
idea: use the fast but scarce kind as much as possible; fall back on the slow but plentiful kind when necessary 102
registers cache (SRAM) main memory (DRAM) local hard disk drive (HDD/SSD) remote storage (networked drive / cloud) focus on DRAM ↔ HDD, SSD, etc. i.e., memory as a cache for disk 103
main goals: 1. maximize memory throughput 2. maximize memory utilization 3. provide address space consistency & memory protection to processes 104
throughput = # bytes per second - depends on access latencies (DRAM, HDD) and hit rate - improve by minimizing disk accesses 105
utilization = fraction of allocated memory that contains user data (aka payload) - reduced by storing metadata in memory - affected by alignment, block size, etc. 106
address space consistency provide a uniform view of memory to each process 107
0xffffffff 0xc0000000 Kernel virtual memory (code, data, heap, stack) User stack (created at runtime) Memory invisible to user code %esp (stack pointer) 0x40000000 Memory mapped region for shared libraries 0x08048000 0 Run-time heap (created by malloc) Read/write segment (.data,.bss) Read-only segment (.init,.text,.rodata) Unused brk Loaded from the executable file address space consistency provide a uniform view of memory to each process 108
memory protection prevent processes from directly accessing each other s address space 109
0xffffffff 0xc0000000 Kernel virtual memory (code, data, heap, stack) User stack (created at runtime) Memory invisible to user code %esp (stack pointer) 0xffffffff 0xc0000000 Kernel virtual memory (code, data, heap, stack) User stack (created at runtime) Memory invisible to user code %esp (stack pointer) 0xffffffff 0xc0000000 Kernel virtual memory (code, data, heap, stack) User stack (created at runtime) Memory invisible to user code %esp (stack pointer) 0x40000000 Memory mapped region for shared libraries 0x40000000 Memory mapped region for shared libraries 0x40000000 Memory mapped region for shared libraries Run-time heap (created by malloc) brk Run-time heap (created by malloc) brk Run-time heap (created by malloc) brk 0x08048000 0 Read/write segment (.data,.bss) Read-only segment (.init,.text,.rodata) Unused Loaded from the executable file 0x08048000 0 Read/write segment (.data,.bss) Read-only segment (.init,.text,.rodata) Unused Loaded from the executable file 0x08048000 0 Read/write segment (.data,.bss) Read-only segment (.init,.text,.rodata) Unused Loaded from the executable file P0 P1 P2 memory protection prevent processes from directly accessing each other s address space 110
i.e., every process should be provided with a managed, virtualized address space 111
memory addresses : what are they, really? 112
Main Memory CPU (note: cache not shown) N address: N data physical address: (byte) index into DRAM 113
int glob = 0xDEADBEEE; main() { fork(); glob += 1; } (gdb) set detach-on-fork off (gdb) break main Breakpoint 1 at 0x400508: file memtest.c, line 7. (gdb) run Breakpoint 1, main () at memtest.c:7 7 fork(); (gdb) next [New process 7450] 8 glob += 1; (gdb) print &glob $1 = (int *) 0x6008d4 (gdb) next 9 } (gdb) print /x glob $2 = 0xdeadbeef (gdb) inferior 2 [Switching to inferior 2 [process 7450] #0 0x000000310acac49d in libc_fork () 131 pid = ARCH_FORK (); (gdb) finish Run till exit from #0 in libc_fork () 8 glob += 1; (gdb) print /x glob $4 = 0xdeadbeee (gdb) print &glob $5 = (int *) 0x6008d4 parent child 114
Main Memory N CPU address: N data instructions executed by the CPU do not refer directly to physical addresses! 115
processes reference virtual addresses; the CPU relays virtual address requests to the memory management unit (MMU), which translates them to physical addresses 116
Main Memory MMU physical address CPU virtual address address translation unit disk address (note: cache not shown) swap space 117
the size of virtual memory is determined by the virtual address width the physical address width is determined by the amount of installed physical memory 118
e.g., given 48-bit virtual address, 8GB installed DRAM - 48 bits → 256TB virtual space - 8GB → 33-bit physical address 119
P0 2 n -1 virtual address space n virtual address P1 2 n -1 virtual address space 2 m -1 (global) physical address space n m P2 2 n -1 virtual address virtual address space physical address n virtual address 120
essential problem: map request for a virtual address → physical address and this must be FAST! (happens on every memory access) 121
both hardware/software are involved: - MMU (hw) handles simple and fast operations (e.g., table lookups) - Kernel (sw) handles complex tasks (e.g., eviction policy) 122
Virtual Memory Implementation 123
keep in mind goals: 1. maximize memory throughput 2. maximize memory utilization 3. provide address space consistency & memory protection to processes 124
1. simple relocation Main Memory CPU MMU relocation reg. N B VA: N B PA: N+B data - per-process relocation address is loaded by kernel on every context switch 125
1. simple relocation Main Memory CPU MMU limit reg. L relocation reg. N B+L B process sandbox VA: N B PA: N+B assert(0 ≤ N ≤ L) data - incorporate a limit register to provide memory protection 126
pros: - simple & fast! - provides protection 127
stack heap Virtual Memory Main Memory Main Memory stack stack data data heap heap vs. code code B data data code code B but: available memory for mapping depends on value of base address i.e., address spaces are not consistent! 128
virtual address space Main Memory L stack stack possibly unused!! code code B also: all of a process below the address limit must be loaded in memory i.e., memory may be vastly under-utilized 129
2. segmentation - partition virtual address space into multiple logical segments - individually map them onto physical memory with relocation registers 130
Segmented Virtual Address Space Main Memory MMU Segment Table (Base, Limit): 0: (B0, L0), 1: (B1, L1), 2: (B2, L2), 3: (B3, L3) Seg #3: Stack Seg #2: Heap Seg #1: Data Seg #0: Code virtual address has form seg#:offset 131
MMU Segment Table (Base, Limit): 0: (B0, L0), 1: (B1, L1), 2: (B2, L2), 3: (B3, L3) Main Memory CPU VA: seg#:offset PA: offset + B2 assert(offset ≤ L2) data 132
Segment Table (Base, Limit): 0: (B0, L0), 1: (B1, L1), 2: (B2, L2), 3: (B3, L3) - implemented as MMU registers - part of kernel-maintained, per-process metadata (aka process control block ) - re-populated on each context switch 133
pros: - still very fast - translation = register access & addition - memory protection via limits - segmented addresses improve consistency 134
what about utilization? 135
virtual address space Main Memory simple relocation: L stack stack possibly unused!! code code B virtual address space Main Memory segmentation:! stack stack better!! code code 136
virtual address space Main Memory x virtual address space 2 stack! stack x stack! code code!! code x - variable segment sizes memory fragmentation - fragmentation potentially lowers utilization - can fix through compaction, but expensive! 137
3. paging - partition virtual and physical address spaces into uniformly sized pages - only map pages onto physical memory that contain required data 138
stack physical memory heap data code - page boundaries are not aligned to segments! - instead, aligned to multiples of page size 139
stack physical memory heap data code - minimum mapping granularity = page - not all of a given segment need be mapped 140
new mapping problem: - given virtual address, decompose into virtual page number & virtual page offset - map VPN → physical page number 141
Given page size = 2^p bytes VA: virtual page number + virtual page offset (p bits) PA: physical page number + physical page offset (p bits) 142
VA: virtual page number virtual page offset address translation PA: physical page number physical page offset 143
VA: virtual page number (n bits) virtual page offset translation structure: page table valid PPN index 2^n entries if invalid, page is not mapped PA: physical page number physical page offset 144
page table entries (PTEs) typically contain additional metadata, e.g.: - dirty (modified) bit - access bits (shared or kernel-owned pages may be read-only or inaccessible) 145
e.g., 32-bit virtual address, 4KB (2^12) pages, 4-byte PTEs; - size of page table? 146
e.g., 32-bit virtual address, 4KB (2^12) pages, 4-byte PTEs; - # pages = 2^(32-12) = 2^20 = 1M - page table size = 1M × 4 bytes = 4MB 147
4MB is much too large to fit in the MMU insufficient registers and SRAM! Page table resides in main memory 148
The translation process (aka page table walk) is performed by hardware (MMU). The kernel must initially populate, then continue to manage a process's page table The kernel also populates a page table base register on context switches 149
translation: hit CPU VA: N Address Translator (part of MMU) page table walk Main Memory Page Table PA: N' data 150
translation: miss transfer control to kernel CPU VA: N VA: N (retry) page fault Address Translator (part of MMU) page table walk PA: N' Main Memory Page Table PTE update kernel Disk (swap space) data transfer data 151
kernel decides where to place page, and what to evict (if memory is full) - e.g., using LRU replacement policy 152
this system enables on-demand paging i.e., an active process need only be partly in memory (load rest from swap dynamically) 153
but if working set (of active processes) exceeds available memory, we may have swap thrashing 154
integration with caches? 155
Q: Do caches use physical or virtual address for lookups? 156
Virtual Address Based Cache Process A CPU Process B Virtual Address Space Cache Address Data L X M Y N Z Ambiguous! Virtual Address Space Z N Synonym problem M M L X 0 0 157
Physical Address Based Cache Process A CPU Process B Virtual Address Space Cache Address Data S X Q Y R Z Virtual Address Space Z N M Physical Memory Y M L 0 X X Z Y S R Q 0 158
Q: Do caches use physical or virtual address for lookups? A: Caches typically use physical addresses 159
Main Memory page table walk process page table CPU MMU Cache VA (address translation unit) PA (miss) data (hit) (update) %*@$&#!! 160
Saved by hardware: The Translation Lookaside Buffer (TLB) a cache used solely for VPN → PPN lookups 161
Main Memory page table walk process page table CPU MMU Cache VA (address translation unit) PA (miss) data (hit) (update) 162
MMU Main Memory only if TLB miss! address translation unit page table walk process page table CPU TLB Cache VA (VPN → PPN cache) PA (miss) data (hit) (update) TLB + Page Table (exercise for you: revise earlier translation diagrams!) 163
virtual address n-1 p p-1 0 virtual page number (VPN) page offset valid tag physical page number (PPN) TLB = TLB Hit physical address valid tag data byte offset Cache Cache Hit = Data 164
Not so fast! TLB mappings are process-specific → requires flush & reload on context switch - Some architectures store PID (aka virtual space ID) in TLB 165
Another problem: TLB caches a few thousand mappings vs. millions of virtual pages per process! 166
We can improve TLB hit rate by reducing the number of pages, i.e., by increasing the size of each page. 167
Compute # of pages for 32-bit memory for: 1KB, 512KB, 4MB pages 2^32 / 2^10 = 2^22 = 4M pages 2^32 / 2^19 = 2^13 = 8K pages 2^32 / 2^22 = 2^10 = 1K pages (not bad!) 168
Process A Virtual Memory Lots of wasted space! Physical Memory Process B Virtual Memory 169
Process A Virtual Memory Process B Virtual Memory Physical Memory 170
Increasing page size results in increased internal fragmentation and lower utilization 171
That is, TLB effectiveness needs to be balanced against memory utilization! 172
So, what about 64-bit systems? 2^64 = 16 exbibytes of address space (= 4 billion × 4GB) 173
Most modern implementations support a max of 2^48 (256TB) addressable space. 174
Page table size? # pages = 2^48 / 2^12 = 2^36 PTE size = 8 bytes (64 bits) PT size = 2^36 × 8 = 2^39 bytes = 512GB 175
512GB (just for the virtual memory mapping structure) (and we need one per process) 176
(We re not going to be able to fit these into memory) 177
Instead, use multi-level page tables: Split an address translation into two (or more) separate table lookups Unused parts of the table don t need to be in memory! 178
Toy memory system - 8 bit addresses - 32-byte pages 7 6 5 4 3 2 1 0 1 1 0 1 1 0 1 0 VPN page offset Page Table 7 (unmapped) 6 PPN 5 (unmapped) 4 PPN 3 (unmapped) 2 (unmapped) 1 (unmapped) 0 (unmapped) All 8 PTEs must be in memory at all times 179
Toy memory system - 8 bit addresses - 32-byte pages 7 6 5 4 3 2 1 0 1 1 0 1 1 0 1 0 page offset 3 (unmapped) 2 PPN 1 0 1 (unmapped) 0 PPN page directory all unmapped; don t need in memory! 180 3 (unmapped) 2 (unmapped) 1 (unmapped) 0 (unmapped)
Toy memory system - 8 bit addresses - 32-byte pages 7 6 5 4 3 2 1 0 1 1 0 1 1 0 1 0 page offset 1 0 3 (unmapped) 2 PPN 1 (unmapped) 0 PPN 181
Intel Architecture Memory Management http://www.intel.com/products/processor/ manuals/ (Software Developer s Manual Volume 3A) 182
Logical Address (or Far Pointer) Segment Selector Offset Linear Address Space Global Descriptor Table (GDT) Linear Address Dir Table Offset Physical Address Space Segment Descriptor Segment Lin. Addr. Page Directory Page Table Entry Page Phy. Addr. Entry Segment Base Address Page Segmentation Paging 183
Segment Registers CS SS DS ES FS GS Segment Descriptors Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Linear Address Space (or Physical Memory) Stack Code Data Data Data Data Segment Descriptors 184
Segment Registers CS SS Code- and Data-Segment Descriptors Linear Address Space (or Physical Memory) Code Not Present FFFFFFFFH DS ES Access Limit Base Address Data and Stack 0 FS GS Flat Model 185
Paging Modes 186
Linear Address 31 22 21 12 11 Directory Table Offset 12 0 4-KByte Page 10 Page Directory 10 Page Table Physical Address PDE with PS=0 20 PTE 20 32 CR3 32-Bit Paging (4KB Pages) 187
32-Bit Paging (4MB Pages) 188
IA-32e Paging (4KB Pages) 189
IA-32e Paging (1GB Pages) 190
Dynamic Memory Allocation 191
registers From: cache (SRAM) main memory (DRAM) local hard disk drive (HDD/SSD) remote storage (networked drive / cloud) The Memory Hierarchy 192
We now have: Virtual Memory 193
Now what? 194
- code, global variables, jump tables, etc. - Allocated at fork/exec - Lifetime: permanent Static Data 195
The Stack - Function activation records - Local vars, arguments, return values - Lifetime: LIFO Pages allocated as needed (up to preset stack limit) 196
Explicitly requested from the kernel - For dynamic allocation - Lifetime: arbitrary! The Heap 197
- Starts out empty - brk pointer marks top of the heap brk The Heap 198
Heap mgmt syscall: void *sbrk(intptr_t incr); /* resizes heap by incr, returns old brk value */ The Heap brk 199
void *hp = sbrk(n); N brk hp The Heap 200
Can use sbrk to allocate structures:

int **make_jagged_arr(int nrows, const int *dims) {
    int i;
    int **jarr = sbrk(sizeof(int *) * nrows);
    for (i = 0; i < nrows; i++) {
        jarr[i] = sbrk(sizeof(int) * dims[i]);
    }
    return jarr;
}
201
But, we can't free this memory!

int **make_jagged_arr(int nrows, const int *dims) {
    int i;
    int **jarr = sbrk(sizeof(int *) * nrows);
    for (i = 0; i < nrows; i++) {
        jarr[i] = sbrk(sizeof(int) * dims[i]);
    }
    return jarr;
}

void free_jagged_arr(int **jarr, int nrows) {
    int i;
    for (i = 0; i < nrows; i++) {
        free(jarr[i]);  /* undefined! jarr[i] was not malloc-ed */
    }
    free(jarr);
}
202
After the kernel allocates heap space for a process, it is up to the process to manage it! 203
Manage = tracking memory in use, tracking memory not in use, reusing unused memory 204
Job of the dynamic memory allocator Typically included as a user-level library and/or language runtime feature 205
User Process application program malloc dynamic memory allocator Heap sbrk OS Kernel RAM Disk 206
User Process application program free(p) dynamic memory allocator Heap OS Kernel RAM Disk 207
User Process application program free(p) dynamic memory allocator Heap OS Kernel RAM Disk (heap space may not be returned to the kernel!) 208
The DMA constructs a user-level abstraction (re-usable blocks of memory) on top of a kernel-level one (virtual memory) 209
The user-level implementation must make good use of the underlying infrastructure (the memory hierarchy) 210