Opera&ng Systems CMPSCI 377 Garbage Collec&on Emery Berger and Mark Corner University of Massachuse9s Amherst
Ques<ons To Come Terms: Tracing Copying Conserva&ve Parallel GC Concurrent GC Run&me & space costs Live objects Reachable objects References 2
Explicit Memory Management malloc/new allocates space for an object free/delete returns memory to system Simple, but tricky to get right Forget to free memory leak free too soon dangling pointer Double free, invalid free...
Dangling Pointers Node x = new Node ( happy ); Node ptr = x; delete x; // But I m not dead yet! Node y = new Node ( sad ); cout << ptr- >data << endl; // sad Insidious, hard- to- track down bugs
Solu<on: Garbage Collec<on Garbage collector periodically scans Looks at all objects on heap No need to free Automa&c memory management Reclaims non- reachable objects Won t reclaim objects un&l they re dead (actually somewhat later)
No More Dangling Pointers Node x = new Node ( happy ); Node ptr = x; // x s&ll live (reachable through ptr) Node y = new Node ( sad ); cout << ptr- >data << endl; // happy! So why not use GC all the time?
Because This Guy Says Not To Linus Torvalds There just aren t all GC that sucks many donkey worse brains ways through to f*** a up straw your from cache a performance behavior than standpoint. by using lots of allocations and lazy GC to manage your memory.
One side of the argument GC impairs performance Extra processing collec&on, copying Degrades cache performance Degrades page locality Increases memory needs delayed reclama&on
On the other hand No, GC enhances performance! Faster alloca&on pointer- bumping vs. freelists Improves cache performance no need for headers Beaer locality can reduce fragmenta&on, compact data structures according to use
Outline Classical GC algorithms Quan&fying GC performance A hard problem Oracular memory management GC vs. malloc/free bakeoff
Classical Algorithms Three classical algorithms Mark- sweep Reference coun&ng Semispace Tweaks Genera&onal garbage collec&on Out of scope Parallel perform GC in parallel Concurrent run GC at same &me as app Real- &me ensure bounded pause &mes 11
Mark- Sweep Start with roots Global variables, variables on stack & in registers Recursively visit every object through pointers Mark every object we find (set mark bit) Everything not marked = garbage Can then sweep heap for unmarked objects and free them 12
Mark- Sweep Example roots global 1 global 2 global 3 object 1 object 2 object 3 object 4 object 5 object 6 Ini&ally, all objects white (garbage) 13
Mark- Sweep Example roots global 1 global 2 global 3 object 1 object 2 object 3 object 4 object 5 object 6 Ini&ally, all objects white (garbage) Visit objects, following object graph 14
Mark- Sweep Example roots global 1 global 2 global 3 object 1 object 2 object 3 object 4 object 5 object 6 Ini&ally, all objects white (garbage) Visit objects, following object graph 15
Mark- Sweep Example roots global 1 global 2 global 3 object 1 object 2 object 3 object 4 object 5 object 6 Ini&ally, all objects white (garbage) Visit objects, following object graph 16
Mark- Sweep Example roots global 1 global 2 global 3 freelist object 1 object 2 object 3 object 4 object 5 object 6 Ini&ally, all objects white (garbage) Visit objects, following object graph Can sweep immediately or lazily 17
Reference Coun<ng For every object, maintain reference count = number of incoming pointers a- >ptr = x refcount(x)++ a- >ptr = y refcount(x)- - ; refcount(y)++ Reference count = 0 no more incoming pointers: garbage 18
Reference Coun<ng Example roots global 1 global 2 global 3 2 1 1 object 2 object 4 object 5 New objects: ref count = 1 19
Reference Coun<ng Example roots global 1 global 2 global 3 1 1 1 object 2 object 4 object 5 New objects: ref count = 1 Delete pointer: refcount- - 20
Reference Coun<ng Example roots global 1 global 2 global 3 1 1 1 object 2 object 4 object 5 New objects: ref count = 1 Delete pointer: refcount- - 21
Reference Coun<ng Example roots global 1 global 2 global 3 1 1 0 object 2 object 4 object 5 New objects: ref count = 1 Delete pointer: refcount- - And recursively delete pointers 22
Reference Coun<ng Example roots global 1 global 2 global 3 1 0 0 object 2 object 4 object 5 New objects: ref count = 1 Delete pointer: refcount- - And recursively delete pointers refcount == 0: put on freelist 23
Quick Ac<vity roots global 1 global 2 global 3 1 0 0 object 2 object 4 object 5 What could go wrong with ref coun&ng? How could you fix it? 24
Cycles & Reference Coun<ng roots global 1 global 2 global 3 2 1 2 object 2 object 4 object 5 Big problem: cycles 25
Reference Coun<ng Example roots global 1 global 2 global 3 2 1 1 object 2 object 4 object 5 Big problem: cycles 26
Reference Coun<ng Example roots global 1 global 2 global 3 2 1 1 object 2 object 4 object 5 Big problem: cycles Cycles lead to unreclaimable garbage Need to do periodic tracing collec&on (e.g., mark- sweep) 27
Semispace Divide heap in two semispaces: Allocate objects from from- space When from- space fills, Scan from roots through live objects Copy them into to- space When done, switch the spaces Allocate from lepover part of to- space 28
Semispace Example from- space Allocate in from- space Pointer bumping to- space 29
Semispace Example from- space Allocate in from- space Pointer bumping to- space 30
Semispace Example from- space Allocate in from- space Pointer bumping to- space 31
Semispace Example from- space Allocate in from- space Pointer bumping Copy live objects into to- space to- space 32
Semispace Example from- space to- space Allocate in from- space Pointer bumping Copy live objects into to- space Leaves forwarding pointer 33
Semispace Example from- space to- space Allocate in from- space Pointer bumping Copy live objects into to- space Leaves forwarding pointer 34
Semispace Example from- space to- space Allocate in from- space Pointer bumping Copy live objects into to- space Leaves forwarding pointer 35
Semispace Example to- space from- space Allocate in from- space Pointer bumping Copy live objects into to- space Leaves forwarding pointer Flip spaces; allocate from end 36
Genera<onal GC Op&miza&on for copying collectors Genera&onal hypothesis: most objects die young Common- case op&miza&on Allocate into nursery Small region Collect frequently Copy out survivors Key idea: keep track of pointers from mature space into nursery 37
Genera<onal GC Example mature space nursery 38
Genera<onal GC Example mature space nursery 39
Genera<onal GC Example mature space nursery 40
Genera<onal GC Example mature space Copy out survivors (via roots & mature space pointers) nursery 41
Genera<onal GC Example mature space Copy out survivors (via roots & mature space pointers) nursery 42
Genera<onal GC Example mature space Copy out survivors (via roots & mature space pointers) nursery 43
Genera<onal GC Example mature space Copy out survivors (via roots & mature space pointers) Reset alloca&on pointer & con&nue nursery 44
Conserva<ve GC Non- copying collectors for C & C++ Must iden&fy pointers Duck test : if it looks like a pointer, it s a pointer Trace through pointers, marking everything Can link with Boehm- Demers- Weiser library ( libgc ) 45
Comparing GC to Malloc Omit details of how to do this It s complicated Depends on the language Assume perfect malloc Free right when done
Geo. Mean of Execu<on Time 130% 125% GenMS GenCopy GenRC Lea w/ Reach Lea w/ Life MSExplicit w/ Reach 120% Execution Time Relative to Lea 115% 110% 105% 100% 95% 90% 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 3.75 4.00 Heap Size Relative to Collector Minimum Trades space for &me
Summary of Results Best collector equals Lea's performance Up to 10% faster on some benchmarks... but uses more memory Quickest runs require 5x or more memory GenMS at least doubles mean footprint
When to Use Garbage Collec<on Garbage collec&on fine if system has more than 3x needed RAM and no compe&&on with other processes or avoiding bugs / security more important Not so good: Limited RAM Compe&&on for physical memory Depends on RAM for performance In- memory database Search engines, etc.
The End 50
GC vs. malloc/free 51
Comparing Memory Managers Node v = malloc(sizeof(node)); v->data=malloc(sizeof(nodedata)); memcpy(v->data, old->data, sizeof(nodedata)); free(old->data); v->next = old->next; v->next->prev = v; v->prev = old->prev; v->prev->next = v; free(old); BDW Collector Using GC in C/C++ is easy:
Comparing Memory Managers Node v = malloc(sizeof(node)); v->data=malloc(sizeof(nodedata)); memcpy(v->data, old->data, sizeof(nodedata)); free(old->data); v->next = old->next; v->next->prev = v; v->prev = old->prev; v->prev->next = v; free(old); BDW Collector slide in BDW and ignore calls to free.
What About Other Garbage Collectors? Compares malloc to GC, but only conserva3ve, non- copying collectors Can t reduce fragmenta&on, reorder objects, etc. But: faster precise, copying collectors Incompa&ble with C/C++ Standard for Java
Comparing Memory Managers Node node = new Node(); node.data = new NodeData(); usenode(node); node = null;... node = new Node();... node.data = new NodeData();... Lea Allocator Adding malloc/free to Java: not so easy
Comparing Memory Managers Node node = new Node(); node.data = new NodeData(); usenode(node); node = null;... node = new Node(); free (node)? free(node.data)?... node.data = new NodeData();... Lea Allocator... need to insert frees, but where?
Oracular Memory Manager Java C malloc/free execute program here Simulator alloca3on perform actions at no cost below here Oracle Consult oracle at each alloca&on Oracle does not disrupt hardware state Simulator invokes free()
Object Life&me & Oracle Placement obj = new Object; live reachable freed can be by freed lifetimebased oracle freed by reachabilitybased oracle free(obj) free(obj) free(??) can be collected dead unreachable Oracles bracket placement of frees Life<me- based: most aggressive Reachability- based: most conserva&ve
Liveness Oracle Genera&on Java C malloc/ free execute program here alloca3on, mem access, prog. roots PowerPC Simulator perform actions at no cost below here trace file Post- process Oracle Liveness: record allocs, memory accesses
Reachability Oracle Genera&on Java C malloc/ free execute program here alloca3ons, ptr updates, prog. roots PowerPC Simulator perform actions at no cost below here trace file Merlin analysis Oracle Reachability: Illegal instruc&ons mark heap events (especially pointer assignments)
Oracular Memory Manager Java C malloc/ free execute program here PowerPC Simulator alloca3on perform actions at no cost below here oracle Run & consult oracle before each alloca&on When needed, modify instruc&on to call free Extra costs (oracle access) hidden by simulator
Execu&on Time for pseudojbb 150% GenM S 140% GenCopy GenRC Lea w/ Reach Lea w/ Life 130% 120% 110% 100% 90% 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 3.75 4.00 Heap Size Relative to Collector Minimum GC can be faster than malloc/free
Footprint at Quickest Run 800% 700% 600% 500% 400% 300% 200% 100% 0% Lea w/ Reach Lea w/ Life MMTk Kingsley GenMS GenCopy CopyMS SemiSpace MarkSweep GC uses much more memory for speed
Footprint at Quickest Run 800% 700% 7.69 7.09 600% 500% 5.10 5.66 4.84 400% 300% 200% 100% 1.00 0.63 1.38 1.61 0% Lea w/ Reach Lea w/ Life MMTk Kingsley GenMS GenCopy CopyMS SemiSpace MarkSweep GC uses much more memory for speed
Javac Paging Performance GC: poor paging performance