Tolerating Memory Leaks
  Michael D. Bond
  Kathryn S. McKinley

Bugs in Deployed Software
  Deployed software fails
    Different environment and inputs → different behaviors
    Greater complexity & reliance
  Memory leaks are a real problem
  Fixing leaks is hard

Memory Leaks in Deployed Systems
  Memory leaks are a real problem
  Managed languages do not eliminate them
    [Diagram: object sets labeled Dead, Reachable, Live]
  Slow & crash real programs
  Unacceptable for some applications
  Fixing leaks is hard
    Leaks take time to materialize
    Failure far from cause

Example: Driverless Truck
  10,000 lines of C#
  Leak: past obstacles remained reachable
  No immediate symptoms
    "This problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles."
  Quick fix: restart after 40 minutes
  Environment sensitive
    More obstacles in deployment
    Unresponsive after 28 minutes
  http://www.codeproject.com/k/showcase/ifonlywedusedntsprofiler.aspx

Uncertainty in Deployed Software
  Unknown leaks; unexpected failures
  Online leak diagnosis helps
    Too late to help failing systems
  Also tolerate leaks
    Illusion of fix
    Eliminate bad effects
      Don't slow
      Don't crash
    Preserve semantics
      Defer OOM errors

Predicting the Future
  Dead objects not used again
  Highly stale objects likely leaked
    [Chilimbi & Hauswirth 04] [Qin et al. 05] [Bond & McKinley 06]
  [Diagram: object sets labeled Reachable, Dead, Live]

Tolerating Leaks with Melt
  Move highly stale objects to disk
    Much larger than memory
    Time & space proportional to live memory
    Preserve semantics
  [Diagram: stale objects, in-use objects]

Sounds like Paging!
  Paging insufficient for managed languages
    Need object granularity
    GC's working set is all reachable objects
  Bookmarking collection [Hertz et al. 05]

Challenge #1: How does Melt identify stale objects?
  GC: for all fields a.f
      a.f |= 0x1;
  Application:
      b = a.f;
      if (b & 0x1) {
        b &= ~0x1;
        a.f = b;   [atomic]
      }
  Read barrier adds 6% to application time
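
The marking scheme on this slide can be tried out in a small simulation. The sketch below is only an illustration in Python, not Melt's implementation: references are modeled as plain integers, and `Obj`, `gc_mark_stale`, `read_ref`, and `stale_fields` are hypothetical names. It assumes that real references are word-aligned, so the low bit is free to use as a tag.

```python
STALE_BIT = 0x1  # low bit of a reference; assumed free because objects are word-aligned


class Obj:
    """Hypothetical heap object whose fields hold integer 'addresses'."""

    def __init__(self, **fields):
        self.fields = dict(fields)


def gc_mark_stale(objects):
    """GC side: set the low bit of every reference field (a.f |= 0x1)."""
    for obj in objects:
        for name, ref in obj.fields.items():
            obj.fields[name] = ref | STALE_BIT


def read_ref(obj, name):
    """Application side: read barrier run on every reference load."""
    b = obj.fields[name]
    if b & STALE_BIT:          # first load since the GC last tagged this field
        b &= ~STALE_BIT        # recover the real address
        obj.fields[name] = b   # a real VM would perform this store atomically
    return b


def stale_fields(obj):
    """Names of fields still tagged, i.e. not loaded since the last tagging."""
    return [name for name, ref in obj.fields.items() if ref & STALE_BIT]


a = Obj(f=0x1000, g=0x2000)
gc_mark_stale([a])
assert read_ref(a, "f") == 0x1000   # the barrier hides the tag from the program
print(stale_fields(a))              # ['g']
```

A reference that is still tagged at the next collection was not loaded in the meantime, which is the signal used to treat its target as a candidate stale object.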

Stale Space
  Heap nearly full → move stale objects to disk
  [Diagram: in-use space, stale space]

Challenge #2: How does Melt maintain pointers?
  [Diagram: in-use space, stale space]

Stub-Scion Pairs
  [Diagram: scion table, scion space, scion, stub, in-use space, stale space]

Scion-Referenced Object Becomes Stale
  [Diagram: scion table, scion space, stub, in-use space, stale space]

Challenge #3: What if program accesses highly stale object?
  [Diagram: scion table, scion space, stub, in-use space, stale space]
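
Since the diagrams did not survive extraction, here is one possible reading of the stub-scion idea as a Python sketch. It rests on assumptions: that a stub sits in the stale space and stands in for an in-use object, that its paired scion lives in memory where the collector can update it, and that the scion table lets several stubs share one scion per in-use object. The class and method names (`Scion`, `Stub`, `ScionTable`, `scion_for`, `retarget`) are hypothetical and not taken from Melt.

```python
class Scion:
    """In-memory slot that the collector can update when its target moves."""

    def __init__(self, target):
        self.target = target


class Stub:
    """Stand-in inside the stale space for an in-use object; never rewritten."""

    def __init__(self, scion):
        self.scion = scion


class ScionTable:
    """Maps an in-use object to its single scion so that stubs can share it."""

    def __init__(self):
        self._by_target = {}

    def scion_for(self, target):
        key = id(target)
        scion = self._by_target.get(key)
        if scion is None:               # first stale reference to this object
            scion = Scion(target)
            self._by_target[key] = scion
        return scion

    def retarget(self, old, new):
        """Collector moved `old` to `new`: fix the scion, leave stubs alone."""
        scion = self._by_target.pop(id(old))
        scion.target = new
        self._by_target[id(new)] = scion


table = ScionTable()
live = ["in-use object"]
stub1 = Stub(table.scion_for(live))
stub2 = Stub(table.scion_for(live))
assert stub1.scion is stub2.scion       # one scion per in-use object

moved = list(live)                      # pretend a copying collector moved it
table.retarget(live, moved)
assert stub2.scion.target is moved      # stubs follow without being touched
```

Under this reading, the point of the indirection is that a moving collector only updates in-memory scions and never has to visit objects that are already on disk.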

Application Accesses Stale Object
  b = a.f;
  if (b & 0x1) {
    b &= ~0x1;
    if (instalespace(b))
      b = activate(b);
    a.f = b;   [atomic]
  }
  [Diagram: scion table, scion space, scion, stub, in-use space, stale space]
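
The extended read barrier can be exercised with a toy heap. This is a sketch under stated assumptions, not Melt's code: two dictionaries stand in for the in-use space and the on-disk stale space, `STALE_BASE` is an invented address boundary used to tell the two apart, and `Heap`, `in_stale_space`, and `activate` are hypothetical names mirroring the slide's `instalespace` and `activate`. It also ignores the stub-scion bookkeeping that other references to the activated object would need.

```python
STALE_BIT = 0x1
STALE_BASE = 0x8000_0000  # assumption: addresses at or above this lie in the stale space


class Heap:
    """Toy heap: dictionaries stand in for memory and for the disk."""

    def __init__(self):
        self.in_use = {}        # address -> fields dict (in memory)
        self.stale = {}         # address -> fields dict (would be on disk)
        self.next_free = 0x1000

    def in_stale_space(self, ref):
        return ref >= STALE_BASE

    def activate(self, ref):
        """Bring a stale object back into the in-use space; return its new address."""
        fields = self.stale.pop(ref)
        new_ref = self.next_free
        self.next_free += 0x10
        self.in_use[new_ref] = fields
        return new_ref

    def read_ref(self, holder, name):
        """Read barrier from the slide, applied to field `name` of object `holder`."""
        b = self.in_use[holder][name]
        if b & STALE_BIT:
            b &= ~STALE_BIT
            if self.in_stale_space(b):
                b = self.activate(b)
            self.in_use[holder][name] = b   # a real VM would store atomically
        return b


heap = Heap()
heap.stale[STALE_BASE + 0x10] = {"value": 42}
heap.in_use[0x100] = {"f": (STALE_BASE + 0x10) | STALE_BIT}

b = heap.read_ref(0x100, "f")
assert b == 0x1000 and heap.in_use[b] == {"value": 42}
assert heap.stale == {}                     # the object left the stale space
```

Because the stale-space check sits inside the tagged-reference branch, the common case of loading an untagged reference pays only for the single bit test.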

Implementation
  Integrated into Jikes RVM 2.9.2
    Works with any tracing collector
    Evaluation uses generational copying collector
  [Diagram labels: 32-bit, 2 GB; 64-bit, 120 GB; mapping stub]

Performance Evaluation
  Methodology
    DaCapo, SPECjbb2000, SPECjvm98
    Dual-core Pentium 4
    Deterministic execution (replay)
  Results
    6% overhead (read barriers)
    Stress test: still 6% overhead
    Speedups in tight heaps (reduced GC workload)

Tolerating Leaks
  Leak                 Melt's effect
  Eclipse Diff         Tolerates until 24-hr limit (1,000X longer)
  Eclipse Copy-Paste   Tolerates until 24-hr limit (194X longer)
  JbbMod               Tolerates until 20-hr crash (19X longer)
  ListLeak             Tolerates until disk full (200X longer)
  SwapLeak             Tolerates until disk full (1,000X longer)
  MySQL                Some highly stale but in-use (74X longer)
  Delaunay Mesh        Short-running
  DualLeak             Heap growth is in-use (2X longer)
  SPECjbb2000          Heap growth is mostly in-use (2X longer)
  Mckoi Database       Thread leak: extra support needed (2X longer)

Eclipse Diff: Reachable Memory
  [Plot: reachable memory (MB, 0 to 256) vs. iteration (0 to 1000)]

Eclipse Diff: Performance
  [Plot]

Related Work
  Managed [LeakSurvivor, Tang et al. 08] [Panacea, Goldstein et al. 07, Breitgand et al. 07]
    Don't guarantee time & space proportional to live memory
  Native [Cyclic memory allocation, Nguyen & Rinard 07] [Plug, Novark et al. 08]
    Different challenges & opportunities
    Less coverage or change semantics
  Orthogonal persistence & distributed GC
    Barriers, swizzling, object faulting, stub-scion pairs

Conclusion
  Finding bugs before deployment is hard
  Online diagnosis helps developers
  Help users in meantime
    Tolerate leaks with Melt: illusion of fix
      Time & space proportional to live memory
      Preserve semantics
    Buys developers time to fix leaks
  Thank you!

Backup

Triggering Melt
  [State diagram]
  States: INACTIVE (start), MARK STALE, MOVE & MARK STALE, WAIT
  Transition labels: Expected heap fullness; Unexpected heap fullness; After marking; Heap full or nearly full; Heap not nearly full
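
The slide's arrows were lost in extraction, so the transition function below is a guess at how the states and labels fit together, written as a Python sketch. The edge assignment is an assumption: INACTIVE loops on expected fullness and leaves on unexpected fullness, MARK STALE always proceeds to WAIT after marking, and WAIT and MOVE & MARK STALE both choose their successor from whether the heap is nearly full. The thresholds `expected` and `nearly_full` are invented for illustration.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "INACTIVE"
    MARK_STALE = "MARK STALE"
    WAIT = "WAIT"
    MOVE_AND_MARK_STALE = "MOVE & MARK STALE"


def next_state(state, fullness, expected=0.5, nearly_full=0.8):
    """State after a collection that leaves the heap `fullness` (0.0 to 1.0) full.

    The thresholds are hypothetical; the edges are an assumed reading of the slide.
    """
    if state is State.INACTIVE:
        # Expected heap fullness: stay put. Unexpected: start marking.
        return State.MARK_STALE if fullness > expected else State.INACTIVE
    if state is State.MARK_STALE:
        # "After marking": wait to see whether memory actually runs short.
        return State.WAIT
    # WAIT and MOVE & MARK STALE share the same two outgoing labels.
    if fullness >= nearly_full:
        return State.MOVE_AND_MARK_STALE    # heap full or nearly full
    return State.WAIT                       # heap not nearly full


state = State.INACTIVE
for fullness in (0.3, 0.6, 0.7, 0.9, 0.95, 0.4):
    state = next_state(state, fullness)
assert state is State.WAIT
```

Read this way, objects are moved to disk only while memory pressure persists, and a program whose heap never grows unexpectedly stays in INACTIVE.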

Related Work: Tolerating Bugs
  Nondeterministic errors [Atom-Aid] [DieHard] [Grace] [Rx]
    Memory corruption: perturb layout
    Concurrency bugs: perturb scheduling
  General bugs
    Ignore failing operations [FOC]
    Need higher level, more proactive approaches

Melt's GC Overhead
  [Plot: normalized GC time (0 to 2.5) vs. minimum heap multiplier (1 to 5); series: Base, Marking (every GC), Melt (every GC)]