Greater complexity & reliance

Similar documents
I J C S I E International Science Press

Tolerating Memory Leaks

Leak Pruning. 1. Introduction. Abstract. Department of Computer Sciences The University of Texas at Austin

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Managed runtimes & garbage collection

Testing Objectives. Successful testing: discovers previously unknown errors

Self-defending software: Automatically patching errors in deployed software

Memory Leak Detection with Context Awareness

Exploiting the Behavior of Generational Garbage Collector

How to Write a Distributed Garbage Collector

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Cooperative Memory Management in Embedded Systems

Bugs in Deployed Software

Cycle Tracing. Presented by: Siddharth Tiwary

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Integrating Reliable Memory in Databases

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

Advanced Memory Management

A Serializability Violation Detector for Shared-Memory Server Programs

Testing. ECE/CS 5780/6780: Embedded System Design. Why is testing so hard? Why do testing?

Short-term Memory for Self-collecting Mutators. Martin Aigner, Andreas Haas, Christoph Kirsch, Ana Sokolova Universität Salzburg

Garbage Collection (1)

Causes of Software Failures

Robust Memory Management Schemes

Crash Consistency: FSCK and Journaling. Dongkun Shin, SKKU

BEAMJIT: An LLVM based just-in-time compiler for Erlang. Frej Drejhammar

Tolerating Hardware Device Failures in Software. Asim Kadav, Matthew J. Renzelmann, Michael M. Swift University of Wisconsin Madison

Software Error Early Detection System based on Run-time Statistical Analysis of Function Return Values

NPTEL Course Jan K. Gopinath Indian Institute of Science

Manual Allocation. CS 1622: Garbage Collection. Example 1. Memory Leaks. Example 3. Example 2 11/26/2012. Jonathan Misurda

Scalable Shared Memory Programing

Optimistic Shared Memory Dependence Tracing

First-Aid: Surviving and Preventing Memory Management Bugs during Production Runs

Fast Byte-Granularity Software Fault Isolation

Garbage Collection. Steven R. Bagley

Efficient Garbage Collection for Large Object-Oriented Databases

Testing. Unit, integration, regression, validation, system. OO Testing techniques Application of traditional techniques to OO software

Reference Object Processing in On-The-Fly Garbage Collection

Reference Counting. Reference counting: a way to know whether a record has other users

Torturing Databases for Fun and Profit

Dynamic Storage Allocation

Triage: Diagnosing Production Run Failures at the User s Site

What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded Devices

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26

SAY-Go: Towards Transparent and Seamless Storage-As-You-Go with Persistent Memory

Book keeping. Will post HW5 tonight. OK to work in pairs. Midterm review next Wednesday

Atomic In-place Updates for Non-volatile Main Memories with Kamino-Tx

Garbage Collection Algorithms. Ganesh Bikshandi

Habanero Extreme Scale Software Research Project

FS Consistency & Journaling

Samuel T. King, George W. Dunlap, and Peter M. Chen University of Michigan. Presented by: Zhiyong (Ricky) Cheng

Debugging at Scale Lindon Locks

Myths and Realities: The Performance Impact of Garbage Collection

The Google File System (GFS)

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

CSE P 501 Compilers. Memory Management and Garbage Collec<on Hal Perkins Winter UW CSE P 501 Winter 2016 W-1

1. Introduction to Concurrent Programming

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

(Lightweight) Recoverable Virtual Memory. Robert Grimm New York University

Lightweight Data Race Detection for Production Runs

IBM Integration Bus v9.0 System Administration: Course Content By Yuvaraj C Panneerselvam

Nooks. Robert Grimm New York University

Memory Allocation. Copyright : University of Illinois CS 241 Staff 1

CrashMonkey: A Framework to Systematically Test File-System Crash Consistency. Ashlie Martinez Vijay Chidambaram University of Texas at Austin

Program Calling Context. Motivation

Networked Systems and Services, Fall 2018 Chapter 4. Jussi Kangasharju Markku Kojo Lea Kutvonen

Encyclopedia of Crash Dump Analysis Patterns

2. PICTURE: Cut and paste from paper

OS-caused Long JVM Pauses - Deep Dive and Solutions

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Computer Systems A Programmer s Perspective 1 (Beta Draft)

Implementation Garbage Collection

Towards Parallel, Scalable VM Services

EECS 482 Introduction to Operating Systems

Verifying the Safety of Security-Critical Applications

DNWSH - Version: 2.3..NET Performance and Debugging Workshop

Garbage Collection Techniques

Probabilistic Calling Context

CS4500/5500 Operating Systems File Systems and Implementations

Improving the Performance of your LabVIEW Applications

Web Services - Concepts, Architecture and Applications Part 3: Asynchronous middleware

DieHard: Probabilistic Memory Safety for Unsafe Programming Languages

Java Without the Jitter

A new Mono GC. Paolo Molaro October 25, 2006

Logging File Systems

Improving Error Checking and Unsafe Optimizations using Software Speculation. Kirk Kelsey and Chen Ding University of Rochester

Surviving Software Failures

Protecting Mission-Critical Workloads with VMware Fault Tolerance W H I T E P A P E R

Verifying a Concurrent Garbage Collector using a Rely-Guarantee Methodology

Exception Codes and Module Numbers Used in Cisco Unified MeetingPlace Express

GC Assertions: Using the Garbage Collector to Check Heap Properties

Database System Architectures Parallel DBs, MapReduce, ColumnStores

Implementing Persistent Handles in Samba. Ralph Böhme, Samba Team, SerNet

[537] Journaling. Tyler Harter

Virtualization. Dr. Yingwu Zhu

Program Testing via Symbolic Execution

PERFBLOWER: Quickly Detecting Memory-Related Performance Problems via Amplification

XenClient Enterprise Release Notes

Encyclopedia of Crash Dump Analysis Patterns Second Edition

Guide to Mitigating Risk in Industrial Automation with Database

Transcription:

Tolerating Memory Leaks Michael. ond Kathryn S. McKinley ugs in eployed Software eployed software fails ifferent environment and inputs different behaviors Greater complexity & reliance 1

ugs in eployed Software eployed software fails ifferent environment and inputs different behaviors Greater complexity & reliance Memory leaks are a real problem ixing leaks is hard Memory Leaks in eployed Systems Memory leaks are a real problem Managed languages do not eliminate them 2

Memory Leaks in eployed Systems Memory leaks are a real problem Managed languages do not eliminate them ead Reachable Live Memory Leaks in eployed Systems Memory leaks are a real problem Managed languages do not eliminate them ead Reachable Live 3

Memory Leaks in eployed Systems Memory leaks are a real problem Managed languages do not eliminate them ead Reachable Live Memory Leaks in eployed Systems Memory leaks are a real problem Managed languages do not eliminate them ead Live 4

Memory Leaks in eployed Systems Memory leaks are a real problem Managed languages do not eliminate them Slow & crash real programs ead Live Memory Leaks in eployed Systems Memory leaks are a real problem Managed languages do not eliminate them Slow & crash real programs Unacceptable for some applications 5

Memory Leaks in eployed Systems Memory leaks are a real problem Managed languages do not eliminate them Slow & crash real programs Unacceptable for some applications ixing leaks is hard Leaks take time to materialize ailure far from cause xample riverless truck 10,000 lines of # Leak: past obstacles remained reachable No immediate symptoms This problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles. Quick fix : after 40 minutes, stop & reboot nvironment sensitive More obstacles in deployment: failed in 28 minutes http://www.codeproject.com/k/showcase/ifonlywedusedntsprofiler.aspx 6

xample riverless truck 10,000 lines of # Leak: past obstacles remained reachable No immediate symptoms This problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles. Quick fix : restart after 40 minutes nvironment sensitive More obstacles in deployment ailed in 28 minutes http://www.codeproject.com/k/showcase/ifonlywedusedntsprofiler.aspx xample riverless truck 10,000 lines of # Leak: past obstacles remained reachable No immediate symptoms This problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles. Quick fix : restart after 40 minutes nvironment sensitive More obstacles in deployment ailed in 28 minutes http://www.codeproject.com/k/showcase/ifonlywedusedntsprofiler.aspx 7

xample riverless truck 10,000 lines of # Leak: past obstacles remained reachable No immediate symptoms This problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles. Quick fix : restart after 40 minutes nvironment sensitive More obstacles in deployment Unresponsive after 28 minutes http://www.codeproject.com/k/showcase/ifonlywedusedntsprofiler.aspx Uncertainty in eployed Software Unknown leaks; unexpected failures Online leak diagnosis helps Too late to help failing systems 8

Uncertainty in eployed Software Unknown leaks; unexpected failures Online leak diagnosis helps Too late to help failing systems lso tolerate leaks Uncertainty in eployed Software Unknown leaks; unexpected failures Online leak diagnosis helps Too late to help failing systems lso tolerate leaks Illusion of fix 9

Uncertainty in eployed Software Unknown leaks; unexpected failures Online leak diagnosis helps Too late to help failing systems lso tolerate leaks liminate bad effects Illusion of fix on t slow on t crash Uncertainty in eployed Software Unknown leaks; unexpected failures Online leak diagnosis helps Too late to help failing systems lso tolerate leaks Illusion of fix liminate bad effects Preserve semantics on t slow on t crash efer OOM errors 10

Predicting the uture ead objects not used again Highly stale objects likely leaked Reachable ead Live Predicting the uture ead objects not used again Highly stale objects likely leaked Reachable ead [hilimbi & Hauswirth 04] [Qin et al. 05] [ond & McKinley 06] Live 11

Tolerating Leaks with Melt Move highly stale objects to disk Much larger than memory Time & space proportional to live memory Preserve semantics Stale objects In-use objects Stale objects Sounds like Paging! Stale objects In-use objects Stale objects 12

Sounds like Paging! Paging insufficient for managed languages Need object granularity G s working set is all reachable objects Sounds like Paging! Paging insufficient for managed languages Need object granularity G s working set is all reachable objects ookmarking collection [Hertz et al. 05] 13

hallenge #1: How does Melt identify stale objects? hallenge #1: How does Melt identify stale objects? G: for all fields a.f a.f = 0x1; 14

hallenge #1: How does Melt identify stale objects? G: for all fields a.f a.f = 0x1; hallenge #1: How does Melt identify stale objects? G: for all fields a.f a.f = 0x1; pplication: b = a.f; if (b & 0x1) { b &= ~0x1; a.f = b; [atomic] } 15

hallenge #1: How does Melt identify stale objects? G: for all fields a.f a.f = 0x1; pplication: b = a.f; if (b & 0x1) { b &= ~0x1; a.f = b; [atomic] } dd 6% to application time hallenge #1: How does Melt identify stale objects? G: for all fields a.f a.f = 0x1; pplication: b = a.f; if (b & 0x1) { b &= ~0x1; a.f = b; [atomic] } 16

hallenge #1: How does Melt identify stale objects? G: for all fields a.f a.f = 0x1; pplication: b = a.f; if (b & 0x1) { b &= ~0x1; a.f = b; [atomic] } Stale Space Heap nearly full move stale objects to disk in-use space stale space 17

Stale Space in-use space stale space hallenge #2 How does Melt maintain pointers? in-use space stale space 18

Stub-Scion Scion Pairs scion space scion stub in-use space stale space Stub-Scion Scion Pairs scion table scion scion space scion stub in-use space stale space 19

Stub-Scion Scion Pairs scion scion scion? table space scion stub in-use space stale space Scion-Referenced Object ecomes Stale scion table scion scion space scion stub in-use space stale space 20

Scion-Referenced Object ecomes Stale scion table scion space stub in-use space stale space hallenge #3 What if program accesses highly stale object? scion table scion space stub in-use space stale space 21

pplication ccesses Stale Object scion table scion space b = a.f; if (b & 0x1) { b &= ~0x1; if (instalespace(b)) b = activate(b); stub a.f = b; [atomic] } in-use space stale space pplication ccesses Stale Object scion table scion scion space stub scion stub in-use space stale space 22

pplication ccesses Stale Object scion table scion scion space stub scion stub in-use space stale space pplication ccesses Stale Object scion table scion scion space stub scion stub in-use space stale space 23

Implementation Integrated into Jikes RVM 2.9.2 Works with any tracing collector valuation uses generational copying collector Implementation Integrated into Jikes RVM 2.9.2 Works with any tracing collector valuation uses generational copying collector 32-bit 2 G 64-bit 120 G 24

Implementation Integrated into Jikes RVM 2.9.2 Works with any tracing collector valuation uses generational copying collector mapping stub 32-bit 2 G 64-bit 120 G Performance valuation Methodology aapo, SPjbb2000, SPjvm98 ual-core Pentium 4 eterministic execution (replay) Results 6% overhead (read barriers) Stress test: still 6% overhead Speedups in tight heaps (reduced G workload) 25

Tolerating Leaks Leak clipse iff clipse opy-paste JbbMod ListLeak SwapLeak MySQL elaunay Mesh ualleak SPjbb2000 Mckoi atabase Melt s effect Tolerates until 24-hr limit (1,000X longer) Tolerates until 24-hr limit (194X longer) Tolerates until 20-hr crash (19X longer) Tolerates until disk full (200X longer) Tolerates until disk full (1,000X longer) Some highly stale but in-use (74X longer) Short-running Heap growth is in-use (2X longer) Heap growth is mostly in-use (2X longer) Thread leak: extra support needed (2X longer) Tolerating Leaks Leak clipse iff clipse opy-paste JbbMod ListLeak SwapLeak MySQL elaunay Mesh ualleak SPjbb2000 Mckoi atabase Melt s effect Tolerates until 24-hr limit (1,000X longer) Tolerates until 24-hr limit (194X longer) Tolerates until 20-hr crash (19X longer) Tolerates until disk full (200X longer) Tolerates until disk full (1,000X longer) Some highly stale but in-use (74X longer) Short-running Heap growth is in-use (2X longer) Heap growth is mostly in-use (2X longer) Thread leak: extra support needed (2X longer) 26

Tolerating Leaks Leak clipse iff clipse opy-paste JbbMod ListLeak SwapLeak MySQL elaunay Mesh ualleak SPjbb2000 Mckoi atabase Melt s effect Tolerates until 24-hr limit (1,000X longer) Tolerates until 24-hr limit (194X longer) Tolerates until 20-hr crash (19X longer) Tolerates until disk full (200X longer) Tolerates until disk full (1,000X longer) Some highly stale but in-use (74X longer) Short-running Heap growth is in-use (2X longer) Heap growth is mostly in-use (2X longer) Thread leak: extra support needed (2X longer) Tolerating Leaks Leak clipse iff clipse opy-paste JbbMod ListLeak SwapLeak MySQL elaunay Mesh ualleak SPjbb2000 Mckoi atabase Melt s effect Tolerates until 24-hr limit (1,000X longer) Tolerates until 24-hr limit (194X longer) Tolerates until 20-hr crash (19X longer) Tolerates until disk full (200X longer) Tolerates until disk full (1,000X longer) Some highly stale but in-use (74X longer) Short-running Heap growth is in-use (2X longer) Heap growth is mostly in-use (2X longer) Thread leak: extra support needed (2X longer) 27

Tolerating Leaks Leak clipse iff clipse opy-paste JbbMod ListLeak SwapLeak MySQL elaunay Mesh ualleak SPjbb2000 Mckoi atabase Melt s effect Tolerates until 24-hr limit (1,000X longer) Tolerates until 24-hr limit (194X longer) Tolerates until 20-hr crash (19X longer) Tolerates until disk full (200X longer) Tolerates until disk full (1,000X longer) Some highly stale but in-use (74X longer) Short-running Heap growth is in-use (2X longer) Heap growth is mostly in-use (2X longer) Thread leak: extra support needed (2X longer) Tolerating Leaks Leak clipse iff clipse opy-paste JbbMod ListLeak SwapLeak MySQL elaunay Mesh ualleak SPjbb2000 Mckoi atabase Melt s effect Tolerates until 24-hr limit (1,000X longer) Tolerates until 24-hr limit (194X longer) Tolerates until 20-hr crash (19X longer) Tolerates until disk full (200X longer) Tolerates until disk full (1,000X longer) Some highly stale but in-use (74X longer) Short-running Heap growth is in-use (2X longer) Heap growth is mostly in-use (2X longer) Thread leak: extra support needed (2X longer) 28

Tolerating Leaks Leak clipse iff clipse opy-paste JbbMod ListLeak SwapLeak MySQL elaunay Mesh ualleak SPjbb2000 Mckoi atabase Melt s effect Tolerates until 24-hr limit (1,000X longer) Tolerates until 24-hr limit (194X longer) Tolerates until 20-hr crash (19X longer) Tolerates until disk full (200X longer) Tolerates until disk full (1,000X longer) Some highly stale but in-use (74X longer) Short-running Heap growth is in-use (2X longer) Heap growth is mostly in-use (2X longer) Thread leak: extra support needed (2X longer) Tolerating Leaks Leak clipse iff clipse opy-paste JbbMod ListLeak SwapLeak MySQL elaunay Mesh ualleak SPjbb2000 Mckoi atabase Melt s effect Tolerates until 24-hr limit (1,000X longer) Tolerates until 24-hr limit (194X longer) Tolerates until 20-hr crash (19X longer) Tolerates until disk full (200X longer) Tolerates until disk full (1,000X longer) Some highly stale but in-use (74X longer) Short-running Heap growth is in-use (2X longer) Heap growth is mostly in-use (2X longer) Thread leak: extra support needed (2X longer) 29

clipse iff: Reachable Memory 256 Reachable memory (M) 192 128 64 0 0 200 400 600 800 1000 Iteration clipse iff: Reachable Memory 256 Reachable memory (M) 192 128 64 0 0 200 400 600 800 1000 Iteration 30

clipse iff: Performance 31

clipse iff: Performance clipse iff: Performance 32

Related Work Managed [LeakSurvivor, Tang et al. 08] [Panacea, Goldstein et al. 07, reitgand et al. 07] on t guarantee time & space proportional to live memory Native [yclic memory allocation, Nguyen & Rinard 07] [Plug, Novark et al. 08] ifferent challenges & opportunities Less coverage or change semantics Orthogonal persistence & distributed G arriers, swizzling, object faulting, stub-scion pairs 33

onclusion inding bugs before deployment is hard onclusion inding bugs before deployment is hard Online diagnosis helps developers Help users in meantime Tolerate leaks with Melt: illusion of fix Stale objects 34

onclusion inding bugs before deployment is hard Online diagnosis helps developers Help users in meantime Tolerate leaks with Melt: illusion of fix Time & space proportional to live memory Preserve semantics onclusion inding bugs before deployment is hard Online diagnosis helps developers Help users in meantime Tolerate leaks with Melt: illusion of fix Time & space proportional to live memory Preserve semantics uys developers time to fix leaks Thank you! 35

ackup Triggering Melt xpected heap fullness Start INTIV Unexpected heap fullness MRK STL Heap full or nearly full MOV & MRK STL Heap not nearly full Heap full or nearly full fter marking WIT Heap not nearly full ack 36

onclusion inding bugs before deployment is hard Online diagnosis helps developers To help users in meantime, tolerate bugs Tolerate leaks with Melt: illusion of fix Stale objects Related Work: Tolerating ugs Nondeterministic errors [tom-id] [iehard] [Grace] [Rx] Memory corruption: perturb layout oncurrency bugs: perturb scheduling General bugs Ignore failing operations [O] Need higher level, more proactive approaches 37

Melt s G Overhead 2.5 ase Marking (every G) Melt (every G) Normalized G time 2 1.5 1 0.5 0 1 1.5 2 2.5 3 3.5 4 4.5 5 Minimum heap multiplier ack 38