Virtualization and the UNIX API


Virtualization and the UNIX API Computer Science CS 450: Operating Systems Sean Wallace <swallac6@iit.edu>

Agenda The Process UNIX process management API Virtual Memory Dynamic memory allocation & related APIs 2

The Process 3

Definition & OS Responsibilities 4

Process = a program in execution 5

{ code (program), global data, local data (stack), dynamic data (heap), PC & other registers } 6

Programs describe what we want done, Processes realize what we want done 7

and the operating system runs processes 8

Execution = running a process Scheduling = running many processes Peripherals = I/O devices 9

To do this, the OS is constantly running in the background, keeping track of a large amount of process/system metadata. 10

{ code (program), global data, local data (stack), dynamic data (heap), PC & other registers, + OS-level metadata } 11

{ code (program), global data, local data (stack), dynamic data (heap), PC & other registers, + (e.g., pid, owner, memory/cpu usage) } 12

Logical Control Flow Process A Process B Process C Time 13

Physical Flow (1 CPU) Process A Process B Process C Time 14

Context Switches Time Process A Process B read Disk interrupt Return from read User Code Kernel Code User Code Kernel Code User Code } Context switch } Context switch 15

Context switches are external to a process's logical control flow (which is dictated by the user program); they are part of a key OS virtualization mechanism: exceptional control flow 16

Exceptional Control Flow 17

Logical c.f. Exception! int main() { while (1) { printf("Hello World!\n"); } return 0; }? 18

Two classes of exceptions: I. Synchronous II. Asynchronous 19

I. Synchronous exceptions are caused by the currently executing instruction 20

3 subclasses of synchronous exceptions: 1. Traps 2. Faults 3. Aborts 21

1. Traps Traps are intentionally triggered by a process E.g., to invoke a system call 22

char *str = "Hello World"; int len = strlen(str); write(1, str, len); mov edx, len mov ecx, str mov ebx, 1 mov eax, 4 ; syscall #4 int 0x80 ; trap to OS 23

Return from trap (if it happens) resumes execution at the next logical instruction 24

2. Faults Faults are usually unintentional, and may be recoverable or irrecoverable E.g., segmentation fault, protection fault, page fault, divide-by-zero 25

Often, return from fault will result in retrying the faulting instruction Esp. if the handler fixes the problem 26

3. Aborts Aborts are unintentional and irrecoverable I.e., abort = program/OS termination E.g., memory ECC error 27

II. Asynchronous exceptions are caused by events external to the current instruction 28

int main() { while (1) { printf("Hello World!\n"); } return 0; } Hello World! Hello World! Hello World! Hello World! ^C $ 29

Hardware-initiated asynchronous exceptions are known as interrupts 30

E.g., ctrl-c, ctrl-alt-del, power switch 31

Interrupts are associated with specific processor (hardware) pins Checked after every CPU cycle Associated with interrupt handlers 32

(system) memory: the interrupt vector is a table, indexed by interrupt number (int #), whose entries point to interrupt handler code 33
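To make the picture concrete, here is a tiny C sketch (my own; real hardware formats differ by architecture, and the names interrupt_vector and dispatch_interrupt are just illustrations) of the idea: the interrupt vector is a table of handler addresses indexed by interrupt number.

#include <stddef.h>

/* conceptual sketch: the interrupt vector as a table of handler addresses,
   indexed by interrupt number */
typedef void (*int_handler_t)(void);

static int_handler_t interrupt_vector[256];   /* one slot per interrupt number */

void dispatch_interrupt(unsigned int num) {
    if (num < 256 && interrupt_vector[num] != NULL)
        interrupt_vector[num]();              /* jump to the registered handler */
}

On a real machine the table lives where the hardware expects it (e.g., pointed to by a dedicated register), and the lookup-and-jump happens in hardware as part of taking the interrupt.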

(Typical) interrupt procedure Save context (e.g., of the user process) Load OS context Execute handler Load context (for which process?) Return 34

Important: after switching context to the OS (for exception handling), there is no guarantee if/when a process will be switched back in! 35

Switching context to the kernel is potentially very expensive But it's the only way to the OS API! 36

UNIX API 37

Process Management 38

Creating Processes 39

#include <unistd.h> pid_t fork(void); 40

fork traps to OS to create a new process which is (mostly) a duplicate of the calling process! 41

E.g., the new (child) process runs the same program as the creating (parent) process And starts with the same PC, The same SP, FP, regs, The same open files, etc., etc. 42

Pparent int main() { fork(); foo(); } OS 43

Pparent int main() { fork(); foo(); } Pchild int main() { fork(); foo(); } OS creates 44

Pparent int main() { fork(); foo(); } Pchild int main() { fork(); foo(); } OS 45

fork, when called, returns twice (to each process @ the next instruction) 46

typedef int pid_t; pid_t fork(void); System-wide unique process identifier Child's pid (> 0) is returned in the parent Sentinel value 0 is returned in the child 47

void fork0() { int pid = fork(); if (pid == 0) { printf("Hello from Child!\n"); } else { printf("Hello from Parent!\n"); } } main() { fork0(); } Hello from Child! Hello from Parent! (or) Hello from Parent! Hello from Child! 48

That is, order of execution is nondeterministic Parent & child run concurrently! 49

void fork1() { int x = 1; if (fork() == 0) { printf("Child has x = %d\n", ++x); } else { printf("Parent has x = %d\n", --x); } } Parent has x = 0 Child has x = 2 50

Important: post-fork, parent & child are identical, but separate! OS allocates and maintains separate data/state Control flow can diverge 51

All terminating processes turn into zombies 52

Dead but still tracked by OS PID remains in use Exit status can be queried 53

All processes are responsible for reaping their own (immediate) children 54

pid_t wait(int *stat_loc); 55

pid_t wait(int *stat_loc); When called by a process with ≥ 1 children: Waits (if needed) for a child to terminate Reaps a zombie child (if ≥ 1 zombified children, arbitrarily picks one) Returns reaped child's PID and exit status info via pointer (if non-NULL) 56

wait allows us to synchronize one process with events (e.g., termination) in another 57

void fork10() { int i, stat; pid_t pid[5]; for (i = 0; i < 5; i++) { if ((pid[i] = fork()) == 0) { sleep(1); exit(100 + i); } } for (i = 0; i < 5; i++) { pid_t cpid = wait(&stat); if (WIFEXITED(stat)) { printf("child %d terminated with status %d\n", cpid, WEXITSTATUS(stat)); } } } Child 9051 terminated with status 100 Child 9055 terminated with status 104 Child 9054 terminated with status 103 Child 9053 terminated with status 102 Child 9052 terminated with status 101 58

/* explicit waiting -- i.e., for a specific child */ pid_t waitpid(pid_t pid, int *stat_loc, int options); /** Wait options **/ /* return 0 immediately if no terminated children */ #define WNOHANG 0x00000001 /* also report info about stopped children (and others) */ #define WUNTRACED 0x00000002 59

void fork11() { int i, stat; pid_t pid[5]; for (i = 0; i < 5; i++) { if ((pid[i] = fork()) == 0) { sleep(1); exit(100 + i); } } for (i = 0; i < 5; i++) { pid_t cpid = waitpid(pid[i], &stat, 0); if (WIFEXITED(stat)) { printf("child %d terminated with status %d\n", cpid, WEXITSTATUS(stat)); } } } Child 9085 terminated with status 100 Child 9086 terminated with status 101 Child 9087 terminated with status 102 Child 9088 terminated with status 103 Child 9089 terminated with status 104 60

int main(int argc, const char * argv[]) { int stat; pid_t cpid; if (fork() == 0) { printf("Child pid = %d\n", getpid()); sleep(3); exit(1); } else { /* use with -1 to wait on any child (with options) */ while ((cpid = waitpid(-1, &stat, WNOHANG)) == 0) { sleep(1); printf("No terminated children!\n"); } printf("Reaped %d with exit status %d\n", cpid, WEXITSTATUS(stat)); } } Child pid = 9108 No terminated children! No terminated children! No terminated children! Reaped 9108 with exit status 1 61

Recap: fork: create new (duplicate) process wait: reap terminated (zombie) process 62

Running new programs (within processes) 63

/* the "exec family" of syscalls */ int execl(const char *path, const char *arg,...); int execlp(const char *file, const char *arg,...); int execv(const char *path, char *const argv[]); int execvp(const char *file, char *const argv[]); 64

Execute a new program within the current process context 65

Complements fork (1 call → 2 returns): When called, exec (if successful) never returns! Starts execution of the new program 66

int main(int argc, const char * argv[]) { execl("/bin/echo", "/bin/echo", "hello", "world", (void *)0); printf("done exec-ing...\n"); return 0; } $./a.out hello world 67

int main(int argc, const char * argv[]) { printf("About to exec!\n"); sleep(1); execl("./execer", "./execer", (void *)0); printf("done exec-ing...\n"); return 0; } $ gcc execer.c -o execer $./execer About to exec! About to exec! About to exec! About to exec!... 68

int main(int argc, const char * argv[]) { if (fork() == 0) { execl("/bin/ls", "/bin/ls", "-l", (void *)0); exit(0); /* in case exec fails */ } wait(NULL); printf("Command completed\n"); return 0; } $./a.out -rwxr-xr-x 1 sean staff 9500 Oct 7 00:15 a.out Command completed 69

Interesting question: Why are fork & exec separate syscalls? /* i.e., why not: */ fork_and_exec("/bin/ls",...) 70

A1: we might really want to just create duplicates of the current process (e.g.?) 71

A2: we might want to replace the current program without creating a new process 72

A3 (more subtle): we might want to tweak a process before running a program in it 73
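To make A3 concrete, here is a minimal sketch (not from the slides; the file name out.txt is purely illustrative) of the classic fork, tweak, then exec pattern: the child redirects its standard output before exec, something a combined fork_and_exec call could not easily express.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* child: tweak the process *before* exec -- redirect stdout to a file */
        int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); exit(1); }
        dup2(fd, STDOUT_FILENO);    /* fd 1 now refers to out.txt */
        close(fd);
        execl("/bin/ls", "/bin/ls", "-l", (char *)0);
        perror("execl");            /* only reached if exec fails */
        exit(1);
    }
    wait(NULL);                     /* parent reaps the child */
    return 0;
}

This is essentially how shells implement I/O redirection and pipelines: adjust file descriptors in the child between fork and exec.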

The Unix Family Tree 74

BIOS bootloader kernel handcrafted process 75

kernel fork & exec /etc/inittab init 76

kernel fork & exec init fork & exec daemons (e.g., sshd, httpd), getty 77

kernel init login exec 78

kernel init shell (e.g., sh) exec 79

kernel init (a fork-ing party!) shell (e.g., sh) user process user process user process user process 80

kernel (or, for the GUI-inclined) init display manager (e.g., xdm) X Server (e.g., XFree86) window manager (e.g., twm) 81

window manager (e.g., twm) terminal emulator (e.g., xterm) shell (e.g., sh) user process user process user process user process 82

The Shell (aka the CLI) 83

The original operating system user interface 84

Essential function: let the user issue requests to the operating system e.g., fork/exec a program, manage processes (list/stop/term), browse/manipulate the file system 85

(a read-eval-print-loop (REPL) for the OS) 86

pid_t pid; int i; char buf[80], *argv[10]; while (1) { /* print prompt */ printf("$ "); /* read command and build argv (e.g., buf = "ls -l" becomes argv = {"ls", "-l", NULL}) */ fgets(buf, 80, stdin); for (i = 0, argv[0] = strtok(buf, " \n"); argv[i]; argv[++i] = strtok(NULL, " \n")); /* fork and run command in child */ if ((pid = fork()) == 0) { if (execvp(argv[0], argv) < 0) { printf("command not found\n"); exit(0); } } } 87
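A compilable rendering of the same loop, with one assumed addition not shown above: the parent reaps each child with waitpid so finished commands do not linger as zombies.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char buf[80];
    char *argv[10];

    while (1) {
        printf("$ ");                           /* print prompt */
        if (fgets(buf, sizeof(buf), stdin) == NULL)
            break;                              /* EOF: leave the shell */

        /* read command and build argv, e.g. "ls -l" -> {"ls", "-l", NULL} */
        int i = 0;
        argv[0] = strtok(buf, " \n");
        while (argv[i] != NULL && i < 9)
            argv[++i] = strtok(NULL, " \n");
        argv[9] = NULL;                         /* force termination */
        if (argv[0] == NULL)
            continue;                           /* empty command line */

        pid_t pid = fork();
        if (pid == 0) {                         /* child: run the command */
            if (execvp(argv[0], argv) < 0) {
                printf("command not found\n");
                exit(1);
            }
        }
        waitpid(pid, NULL, 0);                  /* parent: reap the child */
    }
    return 0;
}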

The process management API (fork, wait, exec, etc.) let us tap into CPU virtualization i.e., create, manipulate, clean up concurrently running programs 88

Next key component in the Von Neumann machine: memory To be accessed by all of our processes Simultaneously With a simple, consistent API Without trampling on each other! 89

Virtual Memory 90

Motivation 91

CPU instr data results Memory I/O Again, recall the Von Neumann architecture a stored-program computer with programs and data stored in the same memory 92

CPU instr data results Memory I/O Memory is an idealized storage device that holds our programs (instructions) and data (operands) 93

CPU instr data results Memory I/O Colloquially: RAM, random access memory ~ big array of byte-accessible data 94

execlp("/bin/echo", "/bin/echo", "hello", NULL); but our code & data clearly reside on the hard drive (to start)! 95

register file instr data results Memory hard disk in reality, memory is a combination of storage systems with very different access characteristics 96

common types of memory: SRAM, DRAM, NVRAM, HDD 97

Relative Speeds (Type: Size, Access latency):
Registers: 8-32 words, 0-1 cycles (ns)
On-board SRAM: 32-256 KB, 1-3 cycles (ns)
Off-board SRAM: 256 KB - 16 MB, 10 cycles (ns)
DRAM: 128 MB - 64 GB, 100 cycles (ns)
SSD: 1 TB, ~10,000 cycles (µs)
HDD: 4 TB, 10,000,000 cycles (ms) 98

Numbers Every Programmer Should Know http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html 99

would like: 1. a lot of memory 2. fast access to memory 3. to not spend $$$ on memory 100

CPU registers cache (SRAM) smaller, faster costlier main memory (DRAM) local hard disk drive (HDD) larger, slower, cheaper remote storage (networked drive / cloud) an exercise in compromise: the memory hierarchy 101

idea: use the fast but scarce kind as much as possible; fall back on the slow but plentiful kind when necessary 102

registers cache (SRAM) main memory (DRAM) local hard disk drive (HDD/SSD) remote storage (networked drive / cloud) focus on DRAM ↔ HDD, SSD, etc. i.e., memory as a cache for disk 103

main goals: 1. maximize memory throughput 2. maximize memory utilization 3. provide address space consistency & memory protection to processes 104

throughput = # bytes per second - depends on access latencies (DRAM, HDD) and hit rate - improve by minimizing disk accesses 105

utilization = fraction of allocated memory that contains user data (aka payload) - reduced by storing metadata in memory - affected by alignment, block size, etc. 106

address space consistency: provide a uniform view of memory to each process 107

Address space layout (high to low): 0xffffffff kernel virtual memory (code, data, heap, stack), invisible to user code; 0xc0000000 user stack (created at runtime), %esp (stack pointer); 0x40000000 memory-mapped region for shared libraries; run-time heap (created by malloc), topped by brk; read/write segment (.data, .bss); read-only segment (.init, .text, .rodata), loaded from the executable file; 0x08048000; unused down to 0. address space consistency: provide a uniform view of memory to each process 108

memory protection: prevent processes from directly accessing each other's address space 109

The same layout, replicated for each process (P0, P1, P2): every process sees its own kernel region, user stack, shared-library mapping, heap, and code/data segments at the same virtual addresses. memory protection: prevent processes from directly accessing each other's address space 110

i.e., every process should be provided with a managed, virtualized address space 111

memory addresses: what are they, really? 112

Main Memory CPU (note: cache not shown) N address: N data physical address: (byte) index into DRAM 113

int glob = 0xDEADBEEE; main() { fork(); glob += 1; } (gdb) set detach-on-fork off (gdb) break main Breakpoint 1 at 0x400508: file memtest.c, line 7. (gdb) run Breakpoint 1, main () at memtest.c:7 7 fork(); (gdb) next [New process 7450] 8 glob += 1; (gdb) print &glob $1 = (int *) 0x6008d4 (gdb) next 9 } (gdb) print /x glob $2 = 0xdeadbeef (gdb) inferior 2 [Switching to inferior 2 [process 7450] #0 0x000000310acac49d in libc_fork () 131 pid = ARCH_FORK (); (gdb) finish Run till exit from #0 in libc_fork () 8 glob += 1; (gdb) print /x glob $4 = 0xdeadbeee (gdb) print &glob $5 = (int *) 0x6008d4 parent child 114

Main Memory N CPU address: N data instructions executed by the CPU do not refer directly to physical addresses! 115

processes reference virtual addresses; the CPU relays virtual address requests to the memory management unit (MMU), which translates them to physical addresses 116

Main Memory MMU physical address CPU virtual address address translation unit disk address (note: cache not shown) swap space 117

the size of virtual memory is determined by the virtual address width; the physical address width is determined by the amount of installed physical memory 118

e.g., given a 48-bit virtual address and 8 GB of installed DRAM: 48 bits → 2^48 bytes = 256 TB of virtual space; 8 GB = 2^33 bytes → 33-bit physical addresses 119

P0, P1, P2: each process has its own virtual address space spanning 0 to 2^n - 1 (n-bit virtual addresses), all translated into a single (global) physical address space spanning 0 to 2^m - 1 (m-bit physical addresses) 120

essential problem: map a request for a virtual address → a physical address and this must be FAST! (happens on every memory access) 121

both hardware/software are involved: - MMU (hw) handles simple and fast operations (e.g., table lookups) - Kernel (sw) handles complex tasks (e.g., eviction policy) 122

Virtual Memory Implementation 123

keep in mind goals: 1. maximize memory throughput 2. maximize memory utilization 3. provide address space consistency & memory protection to processes 124

1. simple relocation: the MMU holds a per-process relocation register B; the CPU issues VA N, main memory sees PA N+B - the per-process relocation address is loaded by the kernel on every context switch 125

1. simple relocation, with protection: add a limit register L to bound the process sandbox; the CPU issues VA N, the MMU checks assert(0 ≤ N ≤ L), and main memory sees PA N+B - incorporating a limit register provides memory protection 126

pros: - simple & fast! - provides protection 127

stack heap Virtual Memory Main Memory Main Memory stack stack data data heap heap vs. code code B data data code code B but: available memory for mapping depends on value of base address i.e., address spaces are not consistent! 128

virtual address space Main Memory L stack stack possibly unused!! code code B also: all of a process below the address limit must be loaded in memory i.e., memory may be vastly under-utilized 129

2. segmentation - partition virtual address space into multiple logical segments - individually map them onto physical memory with relocation registers 130

Segmented Virtual Address Space: Seg #0: Code, Seg #1: Data, Seg #2: Heap, Seg #3: Stack; the MMU segment table maps each segment i to (Base Bi, Limit Li) in main memory; a virtual address has the form seg#:offset 131

MMU Segment Table (Base/Limit per segment: B0/L0, B1/L1, B2/L2, B3/L3); e.g., the CPU issues VA seg#2:offset, the MMU checks assert(offset ≤ L2) and produces PA = offset + B2 into main memory 132
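A tiny C sketch (mine, with illustrative names such as seg_entry and seg_translate) of what the MMU does here in hardware: look up the segment's base and limit, check the offset against the limit, and add the base.

#include <assert.h>
#include <stdint.h>

/* one segment-table entry: base and limit */
struct seg_entry { uint32_t base, limit; };

/* translate a segmented virtual address seg#:offset into a physical address */
uint32_t seg_translate(const struct seg_entry table[], uint32_t seg, uint32_t offset) {
    assert(offset <= table[seg].limit);  /* protection check */
    return table[seg].base + offset;     /* relocation */
}

On a limit violation, real hardware raises a protection fault rather than asserting.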

Segment Table (Base/Limit for segments 0-3) - implemented as MMU registers - part of kernel-maintained, per-process metadata (aka the process control block) - re-populated on each context switch 133

pros: - still very fast - translation = register access & addition - memory protection via limits - segmented addresses improve consistency 134

what about utilization? 135

simple relocation: the whole range from code up to the stack (limit L) maps contiguously into main memory, so the region between them is possibly unused!; segmentation: each segment maps separately into main memory, which is better! 136

but: - variable segment sizes → memory fragmentation (segments from different address spaces leave unusable gaps in main memory) - fragmentation potentially lowers utilization - can fix through compaction, but expensive! 137

3. paging - partition virtual and physical address spaces into uniformly sized pages - only map pages onto physical memory that contain required data 138

stack physical memory heap data code - page boundaries are not aligned to segments! - instead, aligned to multiples of the page size 139

stack physical memory heap data code - minimum mapping granularity = page - not all of a given segment need be mapped 140

new mapping problem: - given a virtual address, decompose it into virtual page number & virtual page offset - map VPN → physical page number 141

Given page size = 2^p bytes VA: virtual page number | virtual page offset (p bits) PA: physical page number | physical page offset (p bits) 142

VA: virtual page number virtual page offset address translation PA: physical page number physical page offset 143

VA: virtual page number (n bits) | virtual page offset; translation structure: the page table, indexed by VPN, with 2^n entries, each holding a valid bit and a PPN; if invalid, the page is not mapped; PA: physical page number | physical page offset 144
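As a rough sketch of the decomposition and lookup (assuming 4 KB pages, i.e., p = 12, a single-level table, and made-up names like translate), the valid-bit check and the VPN-to-PPN substitution look like this:

#include <stdint.h>

#define PAGE_SHIFT 12                          /* 4 KB pages: p = 12 */
#define PAGE_OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

struct pte { int valid; uint64_t ppn; };       /* valid bit + physical page number */

/* single-level translation: the VPN indexes the page table, the offset is copied through */
int translate(const struct pte *page_table, uint64_t va, uint64_t *pa) {
    uint64_t vpn    = va >> PAGE_SHIFT;
    uint64_t offset = va & PAGE_OFFSET_MASK;
    if (!page_table[vpn].valid)
        return -1;                             /* not mapped: page fault */
    *pa = (page_table[vpn].ppn << PAGE_SHIFT) | offset;
    return 0;
}

A return of -1 corresponds to the invalid case above: the access would raise a page fault and hand control to the kernel.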

page table entries (PTEs) typically contain additional metadata, e.g.: - dirty (modified) bit - access bits (shared or kernel-owned pages may be read-only or inaccessible) 145

e.g., 32-bit virtual address, 4 KB (2^12) pages, 4-byte PTEs; - size of page table? 146

e.g., 32-bit virtual address, 4 KB (2^12) pages, 4-byte PTEs; - # pages = 2^(32-12) = 2^20 = 1M - page table size = 1M × 4 bytes = 4 MB 147

4 MB is much too large to fit in the MMU (insufficient registers and SRAM!), so the page table resides in main memory 148

The translation process (aka page table walk) is performed by hardware (MMU). The kernel must initially populate, then continue to manage a process s page table The kernel also populates a page table base register on context switches 149

translation: hit CPU VA: N Address Translator (part of MMU) page table walk Main Memory Page Table PA: N' data 150

translation: miss transfer control to kernel CPU VA: N VA: N (retry) page fault Address Translator (part of MMU) page table walk PA: N' Main Memory Page Table PTE update kernel Disk (swap space) data transfer data 151

kernel decides where to place page, and what to evict (if memory is full) - e.g., using LRU replacement policy 152

this system enables on-demand paging i.e., an active process need only be partly in memory (load rest from swap dynamically) 153

but if working set (of active processes) exceeds available memory, we may have swap thrashing 154

integration with caches? 155

Q: Do caches use physical or virtual address for lookups? 156

Virtual Address Based Cache: Process A and Process B may use the same virtual addresses (L, M, N) for different data (X, Y, Z), making cache lookups ambiguous; they may also use different virtual addresses for the same data (the synonym problem) 157

Physical Address Based Cache: cache entries are tagged by physical addresses (S, Q, R), so each process's virtual addresses resolve through physical memory to unambiguous cache entries 158

Q: Do caches use physical or virtual address for lookups? A: Caches typically use physical addresses 159

Main Memory page table walk process page table CPU MMU Cache VA (address translation unit) PA (miss) data (hit) (update) %*@$&#!! 160

Saved by hardware: The Translation Lookaside Buffer (TLB) a cache used solely for VPN PPN lookups 161

Main Memory page table walk process page table CPU MMU Cache VA (address translation unit) PA (miss) data (hit) (update) 162

MMU Main Memory only if TLB miss! address translation unit page table walk process page table CPU TLB Cache VA (VPN PPN cache) PA (miss) data (hit) (update) TLB + Page Table (exercise for you: revise earlier translation diagrams!) 163

virtual address n-1 p p-1 0 virtual page number (VPN) page offset valid tag physical page number (PPN) TLB = TLB Hit physical address valid tag data byte offset Cache Cache Hit = Data 164

Not so fast! TLB mappings are process-specific, requiring a flush & reload on context switch - Some architectures store the PID (aka the virtual space ID) in the TLB 165

Another problem: TLB caches a few thousand mappings vs. millions of virtual pages per process! 166

We can improve TLB hit rate by reducing the number of pages by increasing the size of each page. 167

Compute # of pages for 32-bit memory for: 1 KB, 512 KB, 4 MB pages 2^32 / 2^10 = 2^22 = 4M pages 2^32 / 2^19 = 2^13 = 8K pages 2^32 / 2^22 = 2^10 = 1K pages (not bad!) 168

Process A Virtual Memory Lots of wasted space! Physical Memory Process B Virtual Memory 169

Process A Virtual Memory Process B Virtual Memory Physical Memory 170

Increasing page size results in increased internal fragmentation and lower utilization 171

That is, TLB effectiveness needs to be balanced against memory utilization! 172

So, what about 64-bit systems? 2^64 bytes = 16 exbibytes of address space (4 billion × 4 GB) 173

Most modern implementations support a max of 2^48 bytes (256 TB) of addressable space. 174

Page table size? # pages = 2^48 / 2^12 = 2^36 PTE size = 8 bytes (64 bits) PT size = 2^36 × 8 = 2^39 bytes = 512 GB 175

512GB (just for the virtual memory mapping structure) (and we need one per process) 176

(We re not going to be able to fit these into memory) 177

Instead, use multi-level page tables: Split an address translation into two (or more) separate table lookups Unused parts of the table don t need to be in memory! 178

Toy memory system - 8-bit addresses - 32-byte pages: address 11011010 splits into VPN = 110 (bits 7-5) and page offset = 11010 (bits 4-0). Single-level page table with 8 entries (7..0): a couple of entries map to PPNs, the rest are unmapped; all 8 PTEs must be in memory at all times 179

Toy memory system - 8-bit addresses - 32-byte pages, two levels: the VPN is split into a page directory index (upper bits) and a page table index (lower bits). A page directory entry selects a second-level page table; a second-level table whose entries are all unmapped doesn't need to be in memory at all! 180

Toy memory system - 8-bit addresses - 32-byte pages: two-level lookup for address 11011010; the directory entry selects a second-level page table, and that table's entry (valid bit, PPN) completes the translation 181
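A rough C sketch (mine) of the two-level walk under the toy parameters, assuming the 3-bit VPN splits into a 2-bit directory index and a 1-bit table index; a NULL directory entry stands for a second-level table that is entirely unmapped and therefore never has to be in memory.

#include <stddef.h>
#include <stdint.h>

/* toy parameters: 8-bit addresses, 32-byte pages -> 5-bit offset, 3-bit VPN,
   split (by assumption) into a 2-bit directory index and a 1-bit table index */
#define OFFSET_BITS 5
#define TABLE_BITS  1
#define DIR_BITS    2

struct pte  { int valid; uint8_t ppn; };
struct ptab { struct pte entries[1 << TABLE_BITS]; };
struct pdir { struct ptab *tables[1 << DIR_BITS]; };  /* NULL = whole table unmapped */

/* two-level walk: directory -> page table -> PPN; returns -1 on any miss */
int walk(const struct pdir *dir, uint8_t va, uint8_t *pa) {
    uint8_t offset = va & ((1 << OFFSET_BITS) - 1);
    uint8_t tindex = (va >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1);
    uint8_t dindex = va >> (OFFSET_BITS + TABLE_BITS);

    const struct ptab *pt = dir->tables[dindex];
    if (pt == NULL || !pt->entries[tindex].valid)
        return -1;                                    /* page fault */
    *pa = (uint8_t)((pt->entries[tindex].ppn << OFFSET_BITS) | offset);
    return 0;
}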

Intel Architecture Memory Management http://www.intel.com/products/processor/manuals/ (Software Developer's Manual, Volume 3A) 182

Logical Address (or Far Pointer) Segment Selector Offset Linear Address Space Global Descriptor Table (GDT) Linear Address Dir Table Offset Physical Address Space Segment Descriptor Segment Lin. Addr. Page Directory Page Table Entry Page Phy. Addr. Entry Segment Base Address Page Segmentation Paging 183

Segment Registers CS SS DS ES FS GS Segment Descriptors Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Access Limit Base Address Linear Address Space (or Physical Memory) Stack Code Data Data Data Data Segment Descriptors 184

Segment Registers CS SS Code- and Data-Segment Descriptors Linear Address Space (or Physical Memory) Code Not Present FFFFFFFFH DS ES Access Limit Base Address Data and Stack 0 FS GS Flat Model 185

Paging Modes 186

32-Bit Paging (4 KB Pages): linear address = Directory (bits 31-22) | Table (bits 21-12) | Offset (bits 11-0); CR3 points to the Page Directory, the 10-bit Directory index selects a PDE (with PS=0), the 10-bit Table index selects a PTE in the Page Table, and the 20-bit frame number from the PTE plus the 12-bit Offset give the physical address within a 4-KByte page 187

32-Bit Paging (4MB Pages) 188

IA-32e Paging (4KB Pages) 189

IA-32e Paging (1GB Pages) 190

Dynamic Memory Allocation 191

registers From: cache (SRAM) main memory (DRAM) local hard disk drive (HDD/SSD) remote storage (networked drive / cloud) The Memory Hierarchy 192

We now have: Virtual Memory 193

Now what? 194

- code, global variables, jump tables, etc. - Allocated at fork/exec - Lifetime: permanent Static Data 195

The Stack - Function activation records - Local vars, arguments, return values - Lifetime: LIFO Pages allocated as needed (up to preset stack limit) 196

Explicitly requested from the kernel - For dynamic allocation - Lifetime: arbitrary! The Heap 197

- Starts out empty - brk pointer marks top of the heap brk The Heap 198

Heap mgmt syscall: void *sbrk(intptr_t incr); /* resizes heap by incr, returns old brk value */ The Heap brk 199

void *hp = sbrk(n); N brk hp The Heap 200

Can use sbrk to allocate structures: int **make_jagged_arr(int nrows, const int *dims) { int i; int **jarr = sbrk(sizeof(int *) * nrows); for (i = 0; i < nrows; i++) { jarr[i] = sbrk(sizeof(int) * dims[i]); } return jarr; } 201

But, we can't free this memory! int **make_jagged_arr(int nrows, const int *dims) { int i; int **jarr = sbrk(sizeof(int *) * nrows); for (i = 0; i < nrows; i++) { jarr[i] = sbrk(sizeof(int) * dims[i]); } return jarr; } void free_jagged_arr(int **jarr, int nrows) { int i; for (i = 0; i < nrows; i++) { free(jarr[i]); } free(jarr); } 202

After the kernel allocates heap space for a process, it is up to the process to manage it! 203

Manage = tracking memory in use, tracking memory not in use, reusing unused memory 204

Job of the dynamic memory allocator Typically included as a user-level library and/or language runtime feature 205
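For contrast with the earlier sbrk version, here is a brief sketch (mine) of the same jagged array built on the allocator's malloc/free API, which does let the memory be returned:

#include <stdlib.h>

/* jagged array via the dynamic memory allocator: now each row can be freed */
int **make_jagged_arr(int nrows, const int *dims) {
    int **jarr = malloc(sizeof(int *) * nrows);
    if (jarr == NULL) return NULL;
    for (int i = 0; i < nrows; i++) {
        jarr[i] = malloc(sizeof(int) * dims[i]);
    }
    return jarr;
}

void free_jagged_arr(int **jarr, int nrows) {
    for (int i = 0; i < nrows; i++) {
        free(jarr[i]);        /* legal now: each row came from malloc */
    }
    free(jarr);
}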

User Process application program malloc dynamic memory allocator Heap sbrk OS Kernel RAM Disk 206

User Process application program free(p) dynamic memory allocator Heap OS Kernel RAM Disk 207

User Process application program free(p) dynamic memory allocator Heap OS Kernel RAM Disk (heap space may not be returned to the kernel!) 208

The DMA constructs a user-level abstraction (re-usable blocks of memory) on top of a kernel-level one (virtual memory) 209

The user-level implementation must make good use of the underlying infrastructure (the memory hierarchy) 210