Memory Leak Detection with Context Awareness

Woosup Lim
Quality Assurance Division, Altibase Ltd., Seoul, Korea

Seongsoo Park
Department of Computer Engineering, Sungkyunkwan University, Suwon, Korea

Hwansoo Han
Department of Computer Engineering, Sungkyunkwan University, Suwon, Korea

ABSTRACT

Embedded applications with long running lifetimes particularly require a high degree of reliability. Many types of weaknesses residing in software can reduce reliability, but memory leaks are a prominent source of weakness for long-running applications. As memory leaks are typically cumbersome and elusive, finding their sources demands a huge effort from programmers, even with fairly automated memory leak detection tools. Recently, dynamic detectors with light overheads have emerged. They use sampling-based techniques to reduce overheads: according to the frequencies of code executions and data accesses, the memory monitor adaptively controls the sampling periods. The accuracy of existing sampling techniques is, however, unsatisfactory in some cases. In this paper, we present a more accurate memory leak detection technique, which takes advantage of context information. Our memory leak detector, which is also based on data sampling, adopts a notion of context (or call path) to sort out dynamically allocated memory and more accurately tracks the sources of memory leaks in the source code. Our experiments with SPEC CINT2000 benchmarks show that our technique finds up to 72% more memory leaks with overheads comparable to the existing data sampling technique.

Categories and Subject Descriptors
D.3.4 [Programming Languages]: Processors - Memory management; D.2.5 [Software Engineering]: Testing and Debugging - Tracing

General Terms
Reliability, Management, Languages.

The work was done while he was a graduate student at Sungkyunkwan University.
Corresponding author: hhan@skku.edu

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. RACS '12, October 23-26, 2012, San Antonio, TX, USA. Copyright 2012 ACM /12/10...$

Keywords
Memory leak detection, context-awareness, dynamic sampling.

1. INTRODUCTION

Detecting and fixing faults in a program is difficult work for programmers, as it takes an increasingly large portion of time and demands meticulous effort during the software development cycle. To build reliable applications, programmers have to try their best to eliminate faults in the program, since even small software faults can lead to critical system failures.

Memory leaks are well-known sources of software faults. Detecting memory leaks is often very difficult. They are caused by subtle mistakes of programmers and are rarely detected by general-purpose debugging tools, such as gdb. Programmers usually find these faults by using additional tools, such as Valgrind [7], which typically incur high overheads and force developers to spend many hours testing large software products.

To detect memory leaks efficiently, researchers have investigated sampling techniques for memory leak detection. In particular, the data sampling proposed by Novark et al. [4] avoids the use of dynamic instrumentation. This considerably reduces runtime overhead, but the technique omits some information needed for memory leak detection and consequently fails to report some memory leaks. In this paper, we design a novel technique called context-aware data sampling.
While traditional data sampling places memory objects according to their allocation sites [4], we place memory objects according to their contexts (call paths of the allocation sites). With this extension, we can more accurately locate objects with potentially similar behaviors in the same page and minimize the loss of information needed for memory leak detection. To experimentally verify our technique, we implemented a memory leak detector and applied it to SPEC CINT2000 benchmarks [6]. According to our experimental results, our memory leak detector finds up to about 2.5% more memory leaks than the existing data sampling technique. In addition, we find that the size of the inactive page list has a rather large impact on accuracy: the more inactive pages we can accommodate, the more accurate the detection we can report. Thus, we add a knob to allow an unlimited number of inactive pages and find that 72% more memory leaks can be detected than with traditional data sampling, with a slight increase in execution time.

The two main contributions of this paper are:

- Our technique enhances the accuracy of the staleness information. In our allocation routine, we separately place objects according to their contexts. This is the key to increasing accuracy even though we use a sampling technique to collect data access information.
- Our technique allows programmers to find more detailed information on where the leaking objects originate. Since we maintain the call path of each allocated object, not just the allocation site, the report conveys more exactly the context where memory leaks occur.

In Section 2, we discuss the background on memory leak detection techniques. In Section 3, we present our context-aware memory leak detection technique in detail. Finally, we present our experimental results and conclude the paper.

2. BACKGROUND

2.1 Code Sampling

Bursty tracing [1] alternates between checking code and instrumented code, as shown in Figure 1. Checking code and instrumented code basically have the same functionality, but the former just counts how many times the procedure is executed, while the latter collects memory access information through instrumentation. When a procedure is entered from the outside or iterates repeatedly, it checks the previous sampling history and decides which code (checking code or instrumented code) should be executed. Since it runs the instrumented code at a predetermined sampling rate, bursty tracing shows lower overheads than always running instrumented code.

Code sampling [3] is based on bursty tracing, but extends it by determining the sampling rates according to execution frequencies. If a procedure is executed frequently enough, we can sample memory access information at a low sampling rate within the procedure. On the other hand, if the procedure is executed less frequently, we need to use a high sampling rate within the procedure.
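The relation between execution frequency and sampling rate can be sketched in C. The text does not give a formula, so the inverse-proportional rule below and the names sampling_rate, expected_samples, and target are assumptions made purely for illustration, not the rule used by code sampling [3].

```c
#include <assert.h>

/* Hypothetical rule: sample every run until the procedure has executed
 * `target` times, then scale the rate down as target/executions so that
 * the expected number of sampled runs stays near `target`. */
double sampling_rate(unsigned long executions, unsigned long target) {
    assert(target > 0);
    if (executions <= target)
        return 1.0;                       /* rarely executed: sample always */
    return (double)target / (double)executions;
}

/* Expected number of instrumented runs under that rate. */
double expected_samples(unsigned long executions, unsigned long target) {
    return (double)executions * sampling_rate(executions, target);
}
```

Under this assumed rule, a procedure executed 10,000 times with a target of 100 samples is instrumented in about 1% of its runs, while a procedure executed only 50 times is instrumented on every run, so both end up with a similar number of observations.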
By adaptively controlling the sampling rate per procedure, code sampling can achieve even sampling across all procedures regardless of their execution frequencies.

Code sampling has a potential drawback: it can report many false positives. For example, when a procedure is sampled at a 1% sampling rate, its instrumented code is executed once out of every 100 runs. If a memory object is accessed mostly in the non-instrumented runs and not in the instrumented run, code sampling will conclude that the memory object is not accessed at all. The object is then reported as a memory leak, which is actually a false positive.

[Figure 1: Code sampling and bursty tracing]

2.2 Data Sampling

Data sampling [4] can partially remedy the problem of code sampling. Data sampling does not use dynamic instrumentation, which incurs high overheads in code sampling. Instead, it uses the mprotect system call provided by the operating system and collects memory access information via the page fault mechanism. Page fault handlers certainly have some overhead, but much less than dynamic instrumentation.

One drawback of data sampling is that it detects memory leaks at a rather coarse granularity. Since it uses the page fault mechanism to detect memory accesses, all the memory objects within one page are reported as a whole, whether they are memory leaks or not. To enhance the accuracy of detection, objects with similar access patterns should be placed in the same page. Since memory allocated by the same allocation site has a similar lifetime and behavior [10], data sampling allocates objects from the same allocation site in the same page. Still, leaking objects and normal objects can be intermingled in one page. A further enhancement in data sampling is to use temporal similarity among objects: objects created around a similar time are allocated in the same page.
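The page-fault-based access detection described above can be sketched in C. This is a minimal illustration assuming Linux, where touching a PROT_NONE page raises SIGSEGV; the names tracked_page, on_fault, and demo_page_access_detected are hypothetical, and a real detector would track many pages and take more care over what the handler is allowed to call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *tracked_page;             /* the single page monitored here */
static size_t page_size;
static volatile sig_atomic_t touched;  /* set by the fault handler */

/* If the fault hit the monitored page, record the access and lift the
 * protection so that the faulting instruction can be retried. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)tracked_page;
    if (addr >= base && addr < base + page_size) {
        touched = 1;
        mprotect(tracked_page, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    signal(SIGSEGV, SIG_DFL);  /* unrelated crash: restore default action */
}

/* Protects one page, touches it, and returns 1 if the access was observed. */
int demo_page_access_detected(void) {
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    tracked_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (tracked_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    touched = 0;
    if (mprotect(tracked_page, page_size, PROT_NONE) != 0)
        return -1;

    volatile char *p = tracked_page;
    p[0] = 42;  /* faults once; the handler unprotects; the store is retried */

    int seen = (touched == 1) && (p[0] == 42);
    munmap(tracked_page, page_size);
    return seen;
}
```

Only the first access after protection pays for a fault; later accesses to the same page run at full speed until the page is protected again, which is why this approach tends to be cheaper than instrumenting every access.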
The key insight of data sampling is that leaking objects are, by definition, not reclaimed and remain unaccessed for a long period of time. As a program proceeds, leaking objects become older and older (staler and staler), meaning that the time since the last access keeps growing. To reflect this aging of leaking objects, data sampling inserts a page whose slots are all full into the aging queue. The pages in the aging queue are divided into two groups: the active list and the inactive list. When a page becomes part of the inactive list, it is protected against read/write operations and tracked for staleness information. When a page in the inactive list is accessed, a SIGSEGV signal is delivered for this page. In the page fault handler, the page is put back into the active list.

Data sampling has two drawbacks. First, distinguishing objects only by their allocation sites does not provide enough accuracy to separate objects with similar access patterns. In the SPEC CINT2000 benchmarks, there are many programs whose number of allocation sites is less than or equal to two. In such cases, the basic assumption that objects allocated by the same allocation site have similar lifetimes and behaviors can become incorrect. Second, an excessive restriction on the size of the inactive list can incur a large number of false negatives. With a limited number of inactive pages, data sampling only reports memory leaks for the objects within the inactive list. As a consequence, many objects outside the inactive list are not monitored properly. According to our experiments, the number of detected memory leaks is about 72% less than with the unlimited inactive list.

3. CONTEXT-AWARE MEMORY LEAK DETECTION

3.1 Allocation and deallocation call paths

[Figure 2: Example of allocation call paths and deallocation call paths. (a) Source of example. (b) A call graph of the example.]

In small programs, malloc and free are called directly for allocation and deallocation, but in large-scale, well-structured programs, malloc and free are called through several levels of wrapper functions, as shown in Figure 2(a). In such cases, the path where an allocation takes place and the path where a deallocation takes place exist separately. We call them the allocation call path and the deallocation call path, respectively. For example, Figure 2(b) has an allocation call path starting from Build_tbl_array to Build_table, and finally to malloc. It also has a deallocation call path starting from Free_tbl_array to Free_table, and finally to free. This pattern is particularly noticeable when a common data structure is used, as in Figure 2.

The structure of this program is more prone to memory leaks around the Free_tbl_array function than around the free function. A missing free in Free_table is highly unlikely. On the other hand, the developer of the Task1 function is more likely to omit, by mistake, the call to Free_tbl_array at the end. Task1 is the place where allocation call paths and deallocation call paths diverge for the first time in the call graph. Thus, this is the function where memory leaks are highly likely to occur.

In such a case, reporting only the allocation site, which is the very end of the allocation call path, is not enough to describe the actual memory leak situation. The allocation call paths we report give more exact information on how memory leaks occur. To achieve this in our detector, we allocate objects in different pages according to their allocation call paths. By adopting call paths, the accuracy of our memory leak detector improves, as leaking objects and normal objects tend to be separated into different pages even if they share the same allocation site.
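The effect of keying pages by call path rather than by allocation site can be sketched in C. This is an illustration under stated assumptions, not DeLeak's implementation: the call_path record, pool_for, and demo_pool_count are hypothetical names, call paths are given as function-name strings instead of being captured from the stack, and Task2 is a made-up second caller of Build_tbl_array.

```c
#include <assert.h>
#include <string.h>

#define PATH_DEPTH 3   /* maximum call-path depth kept in this sketch */
#define MAX_POOLS  64  /* arbitrary bound for this sketch */

/* Hypothetical call-path record: frames[0] is the function containing the
 * malloc call (the allocation site), frames[1] is its caller, and so on. */
typedef struct {
    const char *frames[PATH_DEPTH];
} call_path;

static call_path pool_keys[MAX_POOLS];  /* one page pool per distinct key */
static int pool_count;

/* Two paths share a pool when their first `depth` frames match. */
static int same_context(const call_path *a, const call_path *b, int depth) {
    for (int i = 0; i < depth; i++)
        if (strcmp(a->frames[i], b->frames[i]) != 0)
            return 0;
    return 1;
}

/* Returns the pool index for `p`, creating a pool on first use; -1 if full. */
int pool_for(const call_path *p, int depth) {
    assert(depth >= 1 && depth <= PATH_DEPTH);
    for (int i = 0; i < pool_count; i++)
        if (same_context(&pool_keys[i], p, depth))
            return i;
    if (pool_count == MAX_POOLS)
        return -1;
    pool_keys[pool_count] = *p;
    return pool_count++;
}

/* Counts the pools needed when two tasks allocate through the same
 * wrapper chain, comparing call paths up to `depth` frames. */
int demo_pool_count(int depth) {
    call_path via_task1 = {{"Build_table", "Build_tbl_array", "Task1"}};
    call_path via_task2 = {{"Build_table", "Build_tbl_array", "Task2"}};
    pool_count = 0;
    pool_for(&via_task1, depth);
    pool_for(&via_task2, depth);
    return pool_count;
}
```

With depth 1 (allocation site only) both tasks share one pool, so a leak in Task1 would be mixed into the same pages as objects that Task2 frees correctly. With depth 3 the two callers get separate pools, which is the separation the text relies on.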
3.2 Allocation-call-path segregated heap

According to a previous study [10], objects allocated from the same allocation site show similar lifetimes and behaviors. That means objects allocated from the same allocation site will have similar staleness information. Thus, staleness information is assumed to be maintainable and trackable at page granularity. However, as shown in the previous section, heterogeneous pages occur. A heterogeneous page is a page that contains objects with different lifetimes and behaviors and, as a consequence, different staleness information. To handle this problem, we consider contexts in the form of allocation call paths. If allocation call paths are considered in building the heap for dynamically allocated data, memory objects in the same page will have more similar lifetimes and behaviors, and the accuracy of staleness information, which is maintained on a per-page basis, will be much higher.

If we build a segregated heap where objects with the same allocation call path are allocated to the same page, we obtain two advantages.

First, more accurate staleness information is obtained. Techniques that rely on staleness to determine memory leaks implement various mechanisms to collect staleness information. Ultimately, accurate staleness information in such techniques leads to more accurate memory leak detection. Our memory allocation policy based on allocation call paths tends to allocate leaking objects and normal objects to different pages with more accuracy. Thus, the objects in a page are more homogeneous in their behaviors, and the staleness information for the page is more accurate for all objects inside the page.

Second, a more detailed report is provided. When reporting leaks, data sampling reports only the allocation site. As seen in Section 2.2, a program whose allocation sites are not diverse requires more context information than the allocation site alone.
Since our memory allocation policy differentiates call paths up to a certain level, it can provide detailed call path information without much additional overhead. In addition, when additional memory is used, we can report entire call paths. This helps programmers find the exact points in the source code where memory leaks occur.

3.3 Depths of call paths

When we separate memory objects according to allocation call paths, we need to decide on an appropriate depth of call paths that can distinguish sufficiently diverse contexts. In Figure 2, Task1 is the function where an allocation call path and a deallocation call path begin to diverge. To approximately recognize such functions in large, complex applications, we inspect source code to count the number of allocation sites where malloc is called directly. In the call graph, we assume the allocation sites are level 1 and the function call sites that invoke the functions containing level 1 sites are level 2. We keep increasing the levels of call sites in a similar way. From level 2 upward, the number of call sites is the maximum among the numbers of different call sites that call the same function at the level immediately below.

[Table 1: Numbers of allocation sites (level 1) and call sites (levels 2 to 5) for gzip, vpr, gcc, mcf, crafty, parser, gap, vortex, bzip2, and twolf; cell values missing in this copy]

Table 1 summarizes the count of call sites at each level for the SPEC CINT2000 benchmarks [6]. For parser, 76 malloc allocation sites exist, and in this benchmark 76 functions contain one allocation site each. One function out of the 76 is called from 6 different call sites, and this is the maximum, which means the other 75 functions are called from 6 or fewer call sites. By inspecting upward from level 1, we find the level where the number of call sites increases sharply and then decreases sharply again. By including this level in the call paths, we can distinguish diverse contexts if they exist. Four programs in Table 1 (vpr, parser, vortex, and twolf) display such behavior. If we set the depth of the call path to three, we can include functions with diverse contexts.

Some programs may not display such patterns. For gcc, the number of call sites increases sharply at levels 2 and 5. This implies that we may need longer call paths to distinguish enough contexts. In fact, one function at level 4 is called from 186 different call sites at level 5, and the rest of the level 4 functions are called from 2 to 11 different call sites at level 5. If we increase the depth of call paths, the space requirement of our technique increases accordingly. Thus, we set the depth to three, which at least captures the diversity at level 2 for gcc. As for gzip, mcf, crafty, gap, and bzip2, not much diversity can be found at any level.
These programs use malloc to allocate a large memory space and manage the memory within the allocated space in a regimented fashion. Apart from that, they perform few dynamic memory allocations by nature.

3.4 Tracking staleness

[Figure 3: State transition of a page]

Figure 3 shows the state transitions of a page from its creation, through the inactive and active states, to the leak state. Pages in the active state are removed from the active list and inserted into the inactive list at a predetermined interval. Pages in the inactive list are protected from accesses by using mprotect. Pages in the inactive state stay in the list and are checked to see whether they are still in use. If the program accesses one of these pages, a SIGSEGV signal is raised and, in the fault handler, the page's state is changed to active by inserting it into the active list. Meanwhile, pages in the inactive list are periodically checked for staleness. If they are not accessed for a long period of time (a predetermined staleness threshold time), these pages are determined to be in the leak state and are reported to programmers.

The original data sampling limits the number of inactive pages to keep the overhead within a certain level. We allow an unlimited number of inactive pages by periodically putting all pages into the inactive list. This causes slightly more overhead, but can detect more memory leaks. According to our experiments, the overhead was comparable and 72% more memory leaks were found than with the original data sampling.

3.5 Reporting memory leaks

It is important to report accurate information on memory leaks so that a programmer can find and correct them. As discussed previously, reporting only the allocation site of each memory leak is sometimes not helpful at all. Reporting the call path of each memory leak is much more informative. A programmer can more easily check whether memory allocation and deallocation are done correctly along the given call path.
This increases productivity in fixing memory leaks.

If the state of a page becomes leak, the objects inside that page are immediately reported as leaking objects. Additionally, at the end of program execution, all pages that contain un-freed memory objects are reported with their final states. Pages in the leak state have already been reported as leaking objects. Pages in the inactive or active state may contain leaking objects that are not reported; these are false negatives. Since we sample data accesses by page, an access to one object is interpreted as an access to all the objects in the page. Thus, an object that is never accessed may not be reported as leaking because another object in the same page is accessed.

4. EXPERIMENTS

We use SPEC CINT2000 benchmarks to evaluate the performance of DeLeak, our memory leak detection tool. We measure execution time, memory consumption, and precision, and compare with the existing data sampling technique [4]. Additionally, we analyze the impact of the unlimited size of the inactive list and explore various staleness parameters. We perform all our experiments on a 2.2GHz dual-core CPU with 3GB of DRAM. The reported numbers are taken from three runs: the median for execution times, the maximum for memory consumption, and the average for precision.

4.1 Performance of DeLeak

[Figure 4: Runtime overheads of sampling techniques]

[Figure 5: Effect of staleness threshold in DeLeak]

[Table 2: Number of removed deallocation calls for gzip, vpr, gcc, vortex, and twolf, by level; cell values missing in this copy]

We measure the execution time and the precision of both the original data sampling and DeLeak. For both, we allow the size of the inactive list to be controlled. If the size is not controlled, we add "unlimited" to the name; otherwise, "limited" is added. In the limited versions of both, the sizes are adaptively controlled depending on the estimated overhead [4]. In the current implementation, virtual compaction [4] is not applied, in order to focus purely on the impact of heap organization and the parameters of the staleness policies. The staleness threshold time is set to 1 second. This means pages in the inactive list are reported as being in the leak state if they are not accessed for 1 second. The active page inspection period is set to 0.2 second. That is, all pages in the active state are put back into the inactive list every 0.2 second and tracked for data accesses.

Figure 4 shows the execution times normalized to the original programs. For some programs, the execution times are reduced even with the data sampling overheads. Since the heap allocation is altered to have per-allocation-site pages or per-allocation-call-path pages, pool allocation effects help increase performance. Except for vortex, most programs show very little overhead. For vortex, overheads of up to 42% occur due to excessive page faults. In particular, unlimited inactive lists lead to larger overheads, since all pages in the inactive list incur page faults when accessed. However, unlimited inactive lists help increase the precision of leak detection. In the next section, we evaluate the impact of various inspection periods. We can trade off the overhead of page faults against the accuracy of memory leak detection.

Segregated heaps are organized to have pages per allocation site (data sampling) or per allocation call path (DeLeak).
Compared to the original memory allocator, some pages may not be fully utilized due to internal fragmentation. According to our measurements, vortex and twolf show 23% to 47% space overheads. For vortex, data sampling shows the bigger overhead. For twolf, DeLeak shows the bigger overhead. For the other programs, space overheads are small for both schemes. Since DeLeak disperses objects into more pages according to the allocation call paths, it shows slightly bigger overhead, but the difference is fairly small.

4.2 Staleness policy evaluation

We inject leaks into the benchmarks by randomly removing several deallocation calls at as high levels as possible, making sure the applications still run without crashing. The numbers of removed deallocation calls are summarized in Table 2. Four programs, mcf, crafty, parser, and bzip2, are excluded from this leak insertion experiment. As the appropriate depth for allocation call paths is level 1 for those programs, DeLeak is virtually the same as data sampling for them. This is why we exclude these four benchmarks.

Table 3: Comparison of detected true positives

          #leaking     unlimited         limited
          objects      D.S.    DeLeak    D.S.    DeLeak
gzip      419,724      79%     81%       14%     27%
vortex    4,622,332    92%     94%       4.7%    4.3%

The staleness threshold time can affect the memory leak decision, as pages not accessed for the threshold time are reported as leaks. Figure 5 shows the percentages of detected memory leaks (true positives) among the total actual memory leaks. The percentages of detected leaks generally decrease for long threshold times, though they depend strongly on the memory usage characteristics. As the threshold time increases, memory objects that are not accessed near the end of the program tend to be classified as non-leaking objects.

The inspection period, which is the interval for putting active pages back into the inactive list, also affects the sampling rate for memory accesses and the overhead of our scheme.
A shorter period results in more accurate detection, but with higher overhead. Figure 6(a) shows the changes in execution overhead. By increasing the inspection period, we can reduce the overhead, particularly for vortex. Figure 6(b) shows the percentage of detected memory leaks among the total actual memory leaks as the inspection period varies. In general, a shorter period should give more accurate results, but the results for gcc are different. Since we detect memory leaks in units of pages, this can still interfere with the results.

[Figure 6: Effect of active page inspection period in DeLeak. (a) Normalized execution times. (b) Percentage of true positives.]

To compare the accuracy of DeLeak with the original data sampling, we select two benchmarks, gzip and vortex. For these two, we can distinguish true positives and false positives with data sampling and DeLeak. As the other benchmarks keep too many memory objects allocated until the end of execution, it is hard to distinguish false positives in data sampling by investigating allocation sites alone. Table 3 shows the percentages of detected true positives among the actual leaking objects for data sampling (D.S.) and our proposed scheme (DeLeak). We measure the percentages with an unlimited inactive list and with a limited inactive list for both schemes. Under the same size policy for the inactive list, DeLeak detects 2.5% more memory leaks than data sampling. Since the original data sampling uses a limited inactive list, unlimited DeLeak detects 72% more memory leaks than the original, limited data sampling. The percentages of false positives are zero for gzip and nearly zero for vortex for both data sampling and DeLeak.

5. RELATED WORK

There are many techniques that detect memory leaks efficiently. Memcheck, a Valgrind-based tool, traces memory objects that are unreachable from monitored pointers and reports these as memory leaks [7]. This approach can miss memory leaks for objects that are still reachable, just as in garbage collection.

Another approach is the staleness-based technique. Stale memory objects, which are not accessed for a long time, are reported as memory leaks. Instrumentation of memory accesses is typically required for this approach. To reduce the overheads of instrumentation, sampling techniques are adopted in code sampling [3] and data sampling [4]. Our technique belongs to this category, too.

Static analysis is yet another approach to detecting memory leaks. Clouseau, a leak detection tool, detects memory leaks by finding violations of ownership constraints [5]. Escape analysis has been extended to find memory leaks [11]. By building a program graph from allocation sites to deallocation sites, memory leaks are analyzed through value flows [9].

The last approach is making software leak-tolerant: even if a program has memory leaks, it does not fail during execution. Cyclic memory allocation prevents memory usage from increasing by preallocating a cyclic buffer per allocation site [8]. Melt [2] and LeakSurvivor [12] isolate and compress stale objects, thereby minimizing the impact of memory leaks.

6. CONCLUSION

This paper presents context-aware data sampling. Our key contributions are the allocation-call-path segregated heap and an analysis of the impact of limiting the inactive list. We segregate leaking objects from normal objects more precisely by using the allocation call path instead of the allocation site. Our technique detects many more memory leaks within reasonable overheads by allowing an unlimited inactive list. DeLeak, our memory leak detector, is implemented and tested on SPEC CINT2000 benchmarks. We believe our tool can be easily deployed to many server applications and help increase their reliability.

Acknowledgement

This research was supported by Korean government (MKE & MCST) under the industry technology development grant ( , SmartTV 2.0 Software Platform) and the research grant from Korea Copyright Commission in

REFERENCES

[1] M. Arnold and B. Ryder. A framework for reducing the cost of instrumented code. In Proceedings of PLDI '01. ACM.
[2] M. D. Bond and K. S. McKinley. Tolerating memory leaks. In Proceedings of OOPSLA '08. ACM.
[3] T. M. Chilimbi and M. Hauswirth. Low-overhead memory leak detection using adaptive statistical profiling. In Proceedings of ASPLOS '04. ACM.
[4] G. Novark, E. D. Berger, and B. G. Zorn. Efficiently and precisely locating memory leaks and bloat. In Proceedings of PLDI '09. ACM.
[5] D. L. Heine and M. S. Lam. A practical flow-sensitive and context-sensitive C and C++ memory leak detector. In Proceedings of PLDI '03. ACM.
[6] J. L. Henning. SPEC CPU2000: Measuring CPU performance in the new millennium. IEEE Computer, pages 28-35, July
[7] N. Nethercote and J. Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. In Proceedings of PLDI '07. ACM.
[8] H. H. Nguyen and M. Rinard. Detecting and eliminating memory leaks using cyclic memory allocation. In Proceedings of ISMM '07. ACM.
[9] S. Cherem, L. Princehouse, and R. Rugina. Practical memory leak detection using guarded value-flow analysis. In Proceedings of PLDI '07. ACM.
[10] M. L. Seidl and B. G. Zorn. Segregating heap objects by reference behavior and lifetime. In Proceedings of ASPLOS '98. ACM.
[11] Y. Xie and A. Aiken. Context- and path-sensitive memory leak detection. In Proceedings of ESEC/FSE '05. ACM.
[12] Y. Tang, Q. Gao, and F. Qin. LeakSurvivor: Towards safely tolerating memory leaks for garbage-collected languages. In Proceedings of USENIX '08.
