Adaptive Analysis for Precise and Efficient Error Checking


Samuel Z. Guyer, Dept. of Computer Sciences, The University of Texas at Austin, Austin, TX
Calvin Lin, Dept. of Computer Sciences, The University of Texas at Austin, Austin, TX

In this paper we study the effects of analysis precision on automated error detection. We measure the impact of flow-sensitivity, context-sensitivity, and different types of pointer analysis on the accuracy of several error checking problems. We find that different problems can require vastly different levels of precision, so no fixed degree of precision is ideal for all cases. To solve this problem, we introduce an adaptive dataflow analysis algorithm that adjusts its level of precision at a fine grain to match the needs of the particular program and problem. We evaluate our algorithm on 14 C programs and find that it typically matches the accuracy of the most precise fixed-precision policy, at a small fraction of the cost.

1. INTRODUCTION

Automatic program checking is an increasingly important method of finding bugs and detecting security vulnerabilities. Recent work in this area has used traditional tools, such as data-flow analysis and extended type systems, to systematically detect specific program properties, which are described by some lightweight specification. Examples include checking for deadlock caused by double locking [6, 8], making sure file handles are open when accessed [8, 5], and tracking tainted data to detect format string vulnerabilities [18]. Existing approaches differ not in the types of errors that they find, but in the precision, scope, and scalability of their analyses. We can characterize these approaches by their precision policy, which includes their choice of flow sensitivity, context sensitivity, and the precision of their pointer analysis. Each precision policy makes a difficult tradeoff between precision and scalability.
However, no previous work has studied the efficacy of different precision policies on different program checking problems. In fact, a common way to present a new error checking algorithm is to apply it to just one or two error checking problems. This raises the question: how much analysis power is actually needed to effectively detect a particular error?

[Footnote: This work is supported by DARPA grant F C-1892, NSF CAREER Award ACI, and DARPA contract NBCHC. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Submitted to PLDI 2003, San Diego, CA. Copyright 2002 ACM X-XXXXX-XX-X/XX/XX...$5.00.]

The first goal of this paper is to understand the impact of commonly used precision policies on the cost of error detection and on the accuracy of the results. We present six different error checking problems, covering a range of complexity, and formulate them as data-flow analysis problems. Using a configurable data-flow analysis framework, we check our benchmark programs for each error under sixteen different precision policies. In total, we present results for approximately 1400 combinations of error problems, precision policies, and programs. Throughout this paper we distinguish between precision, which refers to the level of detail of the analysis, and accuracy, which measures the quality of the results. We introduce a new metric for accuracy that allows us to compare different precision policies and different error checking problems. As in previous work, the error checking problems presented here require some form of pointer analysis for both accuracy and soundness.
Unfortunately, precise pointer analysis is often more expensive than the error checking analysis itself, and pointer analysis is not amenable to many previous techniques that reduce the cost of the error analysis [5]. Thus, a subsidiary goal of this paper is to measure the effect of precise pointer analysis on error checking, independent of the error analysis itself.

Our results show that no single fixed-policy analysis is suitable for all error checking problems and all programs. Some problems, such as detecting format string vulnerabilities, are easy: we typically obtain the best accuracy even at the lowest level of precision. Other problems are harder: the more precision we use, the better the results. We are thus left with a quandary. If we opt for high precision, then the compiler wastes time over-analyzing the easy problems; if we opt for low precision, then the compiler cannot produce accurate results for the hard problems.

To address this problem, we present a new adaptive analysis algorithm that monitors accuracy and dynamically adjusts the precision of the analysis to meet the particular needs of the problem and program. The algorithm exploits a unique feature of our implementation: the ability to control precision at a very fine granularity. For example, individual procedures can be analyzed context-sensitively, and individual variables can be analyzed flow-sensitively. We show that our adaptive algorithm effectively discovers the appropriate amount of precision needed to produce accurate results. For easy problems, it adds very little precision and produces an answer quickly. For hard problems, it adds more precision and requires more time and space. Even for the hardest problems, our adaptive algorithm is less expensive than a fixed high-precision policy because it manages to find parts of the program for which precision is not needed.

This paper makes the following contributions: We present a comprehensive study of the effects of analysis
precision on the cost and accuracy of error detection, as applied to a set of real C programs. In particular, we focus on commonly used precision policies, such as flow-sensitivity and context-sensitivity, and on the transitive effects of precise pointer analysis. We find that no single fixed-precision analysis is ideal for all problems and programs. We present a new adaptive analysis algorithm that automatically adjusts precision as necessary. The algorithm starts with a very fast, imprecise analysis and adds precision only where it is needed to improve the accuracy of the results. We evaluate our adaptive analysis algorithm, showing that it can often achieve all the accuracy of the most precise analysis (flow-sensitive and context-sensitive) at a tiny fraction of the cost.

This paper is organized as follows. Section 2 presents related work. In Section 3 we describe key aspects of the compiler that we use in our study, and in Section 4 we summarize the error checking problems that we will study. Section 5 presents our empirical study of the impact of analysis precision on error checking, and Section 6 describes and evaluates our adaptive algorithm. Finally, we conclude in Section 7.

2. RELATED WORK

There is considerable prior work both in error checking and in evaluating the precision of program analysis.

2.1 Error detection

Most of the recent work on automatic error checking, including ours, builds on the notion of typestate analysis introduced by Strom and Yemini [21]. These systems differ primarily in the kind of program analysis they use to derive the typestate information, and research has focused primarily on improving the performance and accuracy of the typestate analysis engine. However, one of the major challenges in checking C programs is constructing a precise enough model of the store to support accurate error checking. Unfortunately, many of the techniques used to speed up typestate analysis do not work for pointer analysis.
Previous work has generally settled for a low-cost, fixed-policy pointer analysis that provides minimal store information without overwhelming the error checking analysis. This analysis by itself is often inadequate, requiring manual intervention to disambiguate memory locations [5].

Two recent papers have focused on using type systems to check for programming errors. Shankar et al. present a system for detecting format string vulnerabilities using type inference [18]. In this approach, two new type qualifiers, tainted and untainted, are introduced to the C language and added to the signatures of the standard C library functions. Type inference is performed by an extensible type qualifier framework, which derives a consistent assignment of these type qualifiers to string variables. Errors are reported as type conflicts. While this system is extremely efficient, it can produce a large number of false positives because the precision of the analysis is low. Initially, we believed that the inaccuracy was due to flow-insensitivity. However, our experiments show that flow-sensitivity is not necessary for the programs presented in their paper. Instead, the high false-positive rate is due to the equality-based constraint solver, which allows information to propagate the wrong way through assignments; in particular, from formal parameters back to actual parameters. In several cases the authors address this problem by manually adding context-sensitivity ("polymorphism" in type-system terminology).

Foster et al. extend Shankar's work to flow-sensitive type qualifiers, which they use to check the state of file handles and detect double-locking bugs [8]. Unfortunately, imprecision in the store model and equality-based constraints continue to hamper this approach. The authors add two features to their system to help improve accuracy. First, they introduce a new keyword, restrict, which they add to the application code to disambiguate memory locations.
Second, they add a very limited local form of path-sensitivity to handle the failure case of fopen() (when it returns a null pointer). Our experiments show that these two features are as important as, if not more important than, the flow-sensitivity itself. In addition, the system generates many spurious errors, apparently due to the lack of parametric polymorphism for store locations. However, our experiments show that context-sensitivity has very little effect on these error checking problems: procedures are rarely called with open files in one context and closed files in another.

The ESP system implements a path-sensitive variation of typestate analysis that significantly improves precision [5]. In particular, it uses a theorem prover to detect correlations between different branches, and it eliminates many paths that cannot actually occur. The implementation runs in polynomial time because it uses a dataflow analysis algorithm due to Reps et al. [16] that can efficiently summarize any dataflow problem that falls into a certain class of problems. However, this class does not include pointer analysis or constant propagation (with constant folding). Therefore, the authors add a fast flow-insensitive, context-insensitive pointer analyzer as a front-end [4]. Unfortunately, the resulting store model is not precise enough to allow verification, and the authors must manually clone two procedures in order to disambiguate memory locations. Our algorithm, while not as powerful, detects these situations and automatically makes the procedures context-sensitive.

The MC system checks for errors in operating system code using programmer-written checkers based on state machines [6]. A checker consists of a set of states and a set of syntax patterns that trigger transitions on the state machine. The compiler pushes the state machine down each path in the program and reports any error states that it encounters.
While this approach has proven quite successful in finding errors, it has limitations. In particular, since the analysis is syntax-driven, the compiler lacks deep information about the program semantics, such as dataflow dependences and pointer relationships. In fact, the system is not sound: it can produce false negatives, in which programs that have errors are reported as bug-free.

The SLAM Toolkit approach is similar to MC but is more rigorous and more powerful [2]. SLAM includes a pointer analyzer and can check programs interprocedurally. The toolkit first generates an abstraction of the program that represents its behavior only with respect to the properties of interest. It then uses a model checker to perform path-sensitive analysis on the abstracted program. However, this system still uses a fixed-policy analysis to generate the initial program abstraction, including the store model. While our analysis engine is not as powerful, we allow the error checking problems themselves to dictate the precision of the store model. The SLAM approach could be combined with our technique to improve the initial abstraction or to control analysis precision during the iterative refinement process itself.

2.2 Analysis precision

Many papers have compared the precision of different pointer analysis algorithms [7, 17, 20, 14]. The conclusions generally agree with our findings: precise pointer analysis is extremely expensive, but it does produce better results. Previous work has also measured the transitive effects of pointer analysis precision on optimization and on other analysis problems, such as program slicing [19]. Our work differs in two important ways. First, we include measurements of more expensive precision policies, such as combined flow-sensitivity and context-sensitivity. Second, we go beyond simply measuring the transitive effects; we feed them back into the analyzer to drive its precision. Interestingly, Stocks et al. [20] mention the possibility of using different precision on different parts of the program.

Demand-driven program analysis reduces the cost of analysis by computing only part of the analysis problem [15]. Demand-driven pointer analysis computes just enough information to determine the possible aliases of a given set of variables [12]. Our adaptive algorithm always computes the entire analysis problem, but increases precision only in the areas of the program that affect specific variables. We believe that our approach is complementary to the demand-driven approach: demand-driven algorithms need a specific problem, like error checking, to determine which variables are important. Our work is also related to lazy abstraction [13], which iteratively refines the abstraction of the program based on the analysis results.

3. BACKGROUND

We use the Broadway compiler [10] to perform the experiments for this research. The Broadway system integrates domain-specific library information into the compilation process. It consists of an annotation language for capturing domain-specific information and a configurable compiler that uses the annotations to perform library-specific program analysis and optimization. The annotations describe only the library routines, not the application code. A key feature of the annotation language is that it allows the user to define new dataflow analyses. We use this feature to cast error detection problems as dataflow analyses.

3.1 Analysis engine

Our analysis framework supports interprocedural, context-sensitive and flow-sensitive dataflow analysis problems.
The engine has a built-in pointer analyzer, which is based on the storage shape graph of Chase et al. [3] and includes many of the improvements described by Wilson [24]. However, we do not compute procedure summaries, because doing so significantly complicates the implementation, and because it does not provide a technique for summarizing both the pointer analysis and arbitrary user-defined analysis problems. All objects in the program, including surface variables and heap-allocated memory, are represented by a node in the storage shape graph. We also include structs, unions, and their fields as separate nodes. Heap objects are identified by their context-sensitive allocation site, which helps maintain precision even when allocation occurs in reusable wrapper functions. Each array has its own node, but all the elements of the array are represented by a single node.

The framework builds interprocedural factored use-def chains for the whole program. Every assignment to an object generates a definition that is labeled with the program location at which it occurs. In addition, each assignment generates special merge nodes at its dominance frontier, just like an SSA φ-function. Keeping the merge points separate from the code is simpler than using one of the special SSA forms developed for pointers. When we need the value of the object, we use an interprocedural dominance test to find the nearest reaching definition.

The analyzer uses the factored use-def chains to perform sparse conditional dataflow analysis. Dataflow facts about an object are associated with each definition: for pointer analysis, each definition has a points-to set; for the user-defined analyses, each definition has an associated value from the lattice. The framework also includes constant propagation and constant folding. The analyzer starts at the main function and processes the program essentially in execution order. Within a procedure it uses a traditional worklist algorithm for the basic blocks.
At a procedure call, it jumps immediately to the called procedure. We handle recursive procedures by forcing them to be context-insensitive (see Section 5). Our analyzer is sound up to the safe features of C, but we do not handle certain kinds of pointer arithmetic. For example, we allow pointers to individual structure fields, but we cannot handle code that moves a pointer between fields. However, we do handle a number of difficult features of C and the C standard library:

- We properly analyze procedures that take and manipulate variable argument lists.
- We maintain separate nodes in the storage shape graph for each instance of a struct or union and its fields. This feature improves precision for complex data structures with multiple levels of pointers.
- Our annotation language provides a systematic way to describe library-specific behavior, so we properly model several library routines with internal state, such as strtok() and getenv().

4. ERROR CHECKING PROBLEMS

In this section we describe the six error checking problems that we use in our study. We focus on these problems for a number of reasons. First, we target realistic errors that actually occur in programs and that could cause significant damage. Thus, we use the C standard library for our experiments because it is so widely used and controls access to almost all important system services. Second, we bring together a number of error checking problems that have only appeared in isolation in previous research. Finally, we compare different kinds of error checking problems to measure how difficult they are to detect and what analysis machinery is needed to detect them effectively. In particular, we experiment with both finite-state machine problems and information flow problems.

We define these problems using the Broadway annotation language [11, 9]. Each problem consists of a set of possible states that the analyzer assigns to objects in the store model.
For each relevant library routine, we specify how the routine affects the states of its arguments, including any objects accessed through pointers. During analysis, the compiler consults the annotations and applies the appropriate transfer function to the actual arguments at each library routine call site. Due to space limitations, we do not show the annotations here.

4.1 Detecting file access errors

It is an error to access a file through a file stream or descriptor that is not open. We can model this by defining two states to describe files: the Open state and the Closed state. To track this state, we annotate fopen(), fclose(), and related functions, and at each read or write to the file, we check the associated file's state. One complication is that file handles are pointers, so we have to model the structure carefully. In particular, we must associate the file state with the internal object (IOHandle) rather than with the FILE pointer. Our annotations even allow us to handle file descriptors correctly: we can analyze the use of fileno() and fdopen().

4.2 Tracking information flow

The concept of information flow is an important one in security. For example, can information from some private source, such as a local file, leak out to some external location, such as the network? Here, we use annotations to identify various kinds of data sources, such as files, sockets (Internet, local), and environment variables, and to associate them with specific variables. For example, we mark all file handles with the kinds of devices they access, and we mark data buffers according to the kinds of data they contain. We then analyze the program to see where the associated data goes. For example, can the data from a local file go out on an Internet socket? Our system allows us to analyze such information even when data are passed to various string and buffer manipulation routines, and we can handle difficult cases such as sprintf() and strtok().

4.3 Tracking untrusted data

Hostile clients can only manipulate programs through the various program inputs. We can approximate the extent of this control by tracking the input data and observing how it is used. Like the information flow analysis, this analysis labels data sources and sinks, and it associates the properties of the source with the data read from it. However, the trust analysis has two distinguishing features. First, data is only as trustworthy as its least trustworthy source. For example, if the program reads both trusted and untrusted data into a single buffer, then we consider the whole buffer to be untrusted. Second, untrusted data has a domino effect on other data sources and sinks. For example, if the file name argument to fopen() is untrusted, then we treat all data read from that file descriptor as untrusted. For our analysis we model three levels of trust: internal (trusted), locally trusted (for example, local files), and remote (untrusted).
We generate an error message when untrusted data reaches certain sensitive routines, including any file access or manipulation, or reaches any program-execution routine, such as exec().

4.4 Detecting format string vulnerabilities

A number of output functions in the C standard library, such as printf() and syslog(), take a format string argument that controls output formatting. A format string vulnerability (FSV) occurs when untrusted data ends up as part of the format string, and it is exploitable much like stack-smashing. These vulnerabilities are a serious security problem that has been the subject of many CERT advisories. We detect these vulnerabilities using a limited form of the trust analysis. Following the terminology introduced in the Perl programming language [22] and by Shankar et al. [18], we consider data to be tainted when it comes from an untrusted source. We track this data through the program to make sure that all format string arguments are untainted. Our system is the most accurate tool that we are aware of for detecting format string vulnerabilities [9].

4.5 Combination: FSV exploitability

FSV exploitability analysis combines taintedness analysis with trust analysis to determine whether an FSV can be exploited. For example, if the tainted data is still locally trusted (for example, from a local file), then the vulnerability is probably not remotely exploitable by hackers. Note that the domino effect of the trust analysis allows us to catch complex behavior, such as tricking a program into reading a different configuration file.

4.6 Combination: FTP-like behavior

FTP-like behavior analysis combines trust analysis and information flow to determine whether a remote client can read or write arbitrary files. For example, we can detect FTP "get" behavior by looking for cases where data from an untrusted file (one in which a remote client dictated the name of the file to be opened) is sent to a remote socket.
Similarly, FTP "put" behavior occurs when data from a remote source is written to a file chosen by a remote source. Our analysis properly identifies the write() calls in the FTP daemon that perform these two functions.

5. MEASURING THE EFFECTS OF PRECISION

In this section we study the impact of analysis precision on error checking. The study has several components. First, we measure the cost, in both time and memory, of several commonly used precision policies. By looking at a range of programs of different sizes we can evaluate the scalability of these policies. Second, we measure the accuracy of the analysis information produced by different levels of precision. Together with the cost information, these results allow us to quantify the tradeoff between cost and accuracy. In particular, we evaluate the transitive effects of pointer analysis precision on the error checking problems by varying their precision independently. Finally, we compare the precision requirements of different error checking problems by determining the cheapest precision policy for each problem that still yields the most accurate results.

Our results show the following:

- Different problems have very different precision requirements.
- Precise pointer analysis is important in many cases.
- Increasing precision often has a dramatic effect on the cost of analysis. In particular, the most precise analysis mode is not scalable.
- The last bits of accuracy are expensive to obtain because they often require context sensitivity.
- Flow-sensitivity and context-sensitivity conspire to drive up analysis time.

5.1 Precision policies

We focus on three aspects of precision: flow-sensitivity, context-sensitivity, and the representation of objects in memory. There are other dimensions to consider, but these three are commonly used to characterize analysis algorithms. In addition, we fix certain features of the analysis for all the experiments.
We always perform interprocedural analysis because our error checking problems almost always span multiple procedures. Our implementation does not support general path-sensitivity, but we can often identify the places where it would improve accuracy. We further distinguish the flow-sensitivity of the error analysis information, such as the state of a file handle, from the flow-sensitivity of the supporting pointer analysis. By measuring these two modes separately, we can determine the transitive effects of flow-sensitive pointer analysis on the error checking results. The significance of this effect is that many of the techniques for reducing the cost of precise dataflow analysis, such as the algorithm of Reps et al. [16], cannot be used for pointer analysis. We would like to know whether the high cost of precise pointer analysis is worth it. Our implementation supports the following precision policies, each of which can be turned on or off independently.

5.1.1 Flow-sensitivity

By default, our analyzer performs flow-sensitive analysis, which associates dataflow facts with specific program points. Our representation is sparse, so we only store information at the points where it changes: at defs. For example, we associate the state of a file with the call site that opens or closes it. In order to determine the state of an object at any other point, for example when the file is accessed, we must find the reaching definition. We implement flow-insensitive analysis by merging all updates to the dataflow information, regardless of where they occur. The dataflow information at any given point during the analysis is the meet of all the values seen so far, and we never need to find reaching definitions. In this mode, our pointer analysis is equivalent to Andersen's [1].

5.1.2 Context-sensitivity

During context-sensitive analysis, each call to a procedure is treated as a completely separate instantiation. The effect is equivalent to inlining all procedure calls. Context-sensitivity can cause an explosion in the cost of analysis, especially for procedures that are called in many different places. We implement context-insensitive analysis by instantiating a procedure only once and merging the information from all of its call sites. Since our analysis is interprocedural, we still visit all of the calling contexts. However, the analysis converges much more quickly, and we can often skip over a procedure call when no changes occur to the input values. The main drawback of this mode is that it suffers from the unrealizable paths problem [24], in which information from different call sites is merged together and then returned to all call sites.

5.1.3 Structure fields

Our system represents the fields of a structure as separate objects. We can turn this feature off to determine the effect of ignoring structure fields.

5.1.4 Conditional constant propagation

Along with the other analyses, our system performs constant propagation and constant folding.
This analysis is interprocedural and also conforms to the current setting of flow- and context-sensitivity. In addition, we can optionally use constant information to resolve branch conditions and to prune the control-flow graph. We use an interprocedural version of Wegman and Zadeck's [23] conditional constant propagation algorithm.

5.1.5 Multiple instance analysis

Heap objects, unlike regular variables, can correspond to multiple actual objects during the program execution (for example, allocation within a loop). We adopt the analysis from Chase et al. [3], which conservatively estimates which heap objects are single instance objects and which are multiple instance objects. We can apply strong updates to the single instance heap objects, which improves the accuracy of the analysis. This notion of multiplicity is equivalent to the concept of linearity used in type systems [8]. Turning this feature off reduces the burden on the analyzer, but forces all updates to heap objects to be weak updates.

5.1.6 Path-sensitive multiple instance analysis

Many library routines return a special value that indicates success or failure. For routines that allocate new objects, this special return value is often a null pointer if the routine failed. Figure 1 shows a code fragment that allocates some memory. Regardless of whether the allocation succeeds or fails, the memory is not allocated at the end of the fragment, indicating that this is a single instance. We can use a small amount of path-sensitivity to detect this case, increasing the number of single instance objects. Foster et al. [8] also recognized the importance of this feature.

    char *ptr;
    ptr = (char *) malloc(100);
    if (ptr) {
        compute_something(ptr);
        free(ptr);
    }

Figure 1: The malloc'd object is always a single instance.

5.2 Experimental setup

For our experiments, we systematically run all the combinations of error checking problems, programs, and precision policies.
For each run we measure the total amount of time and memory consumed, as well as several measures of accuracy. To save space, many of our tables and graphs summarize results using the following representative precision policies:

- Low: flow-insensitive, context-insensitive, does not distinguish structure fields
- Medium: flow-sensitive, context-insensitive, distinguishes structure fields
- High: flow-sensitive, context-sensitive, distinguishes structure fields

We turn on multiple instance analysis and conditional constant propagation for the flow-sensitive modes.

5.2.1 Measuring accuracy

Ideally, our measure of accuracy would compare the number of errors found with the actual number of errors in each program. We can do this for format string vulnerabilities by looking at the security advisories, which tell us the number and location of each instance of this particular error. Unfortunately, manually checking all the programs for the other five kinds of errors is prohibitively difficult. Therefore we present a new methodology for measuring the accuracy of the error analysis that does not require us to know the actual number of errors, and that allows us to compare the accuracy of different error detection problems and different precision modes. The key idea is to measure accuracy as the confidence level of each data-flow value at the program points where we test for errors. Our analysis framework is sound (up to the safe features of C), so the results of every precision mode are always a superset of the actual behavior of the program. Therefore, when we test for a particular error state, we can compute the accuracy as a function of the number of possible states at that point:

    Accuracy = (max - num) / (max - 1)

where max is the total number of states for the given problem, and num is the number of possible states determined by the analysis. This accuracy function produces a value between 0.0 and 1.0 that represents how confident the analyzer is about the state.
A flow value with only one possible state produces a value of 1.0 and indicates that the object must be in that state. A value of 0.0 indicates that any state is possible (lattice bottom). For example, the code fragment in Figure 2 produces error reports at both calls to fgets(). However, in the first case, the file could be either open or closed (accuracy = 0.0), while in the second case the file is definitely closed (accuracy = 1.0).

if (condition)
    f = fopen("my_file", "r");
fgets(buf, size, f);   /* Error; acc = 0.0 */
fclose(f);
fgets(buf, size, f);   /* Error; acc = 1.0 */

Figure 2: Both reads generate errors, but the accuracy level is different.

[Figure 3 omitted: a bar chart of accuracy at High and Low precision for the six problems: FSV, FSV Exploit, File state, Trust, Info flow, and FTP.]

Figure 3: Some error checking problems are harder than others.

In context-sensitive mode, we test potential errors separately in each possible calling context. This approach gives the user more information about the cause of an error: is the code always incorrect, or does the error only occur in a specific calling sequence? However, in order to compare the accuracy of context-sensitive and context-insensitive analysis, we have to account for this difference. Therefore, we first average the accuracy levels of all the calling contexts that reach the location of an error test, then average the accuracy over all those program locations.

Programs

Table 1 lists the input programs we test and provides details about their size and any known errors they contain. We choose these programs for a number of reasons. First, they are all real programs, taken from open-source projects, with all of the nuances and complexities of production software. Second, many of them are system tools or daemons that have significant security implications because they interact with remote clients and provide privileged services. Finally, we use security advisories to find versions of programs that are known to contain format string vulnerabilities. In addition, we also use subsequent versions in which the bugs are fixed, so that we can confirm their absence. We present several measures of program size, including the number of lines of source code, the number of lines of preprocessed code, and the number of procedures. The number of procedures is an important measure to consider for context-sensitive analysis.

Platform

We run all experiments on a Dell OptiPlex GX-400 with a Pentium 4 processor running at 1.7 GHz and 2 GB of main memory.
The machine runs the Linux kernel. Our system is implemented entirely in C++ and compiled with the GNU g++ compiler.

Results

Figure 3 summarizes the range of accuracy for the six error checking problems. For each problem we show the accuracy obtained at two levels of precision, averaged over all of the programs. At a coarse level, we can see that different problems reach different levels of accuracy, which confirms our belief that some problems are harder than others. For many of the problems, context-sensitivity is required to obtain the most accurate results. In particular, for four of the problems, using the most precise analysis noticeably improves the accuracy by 15 to 40 percent. The two format string vulnerability problems are already near perfect even for the Low precision mode.

[Figure 4 omitted: analysis time in seconds per program at Low, Medium, and High precision.]

Figure 4: Full context-sensitivity is orders of magnitude more expensive.

Figure 4 shows the amount of time required for the Low, Medium, and High levels of precision on our benchmark programs, averaged over all of the problems. We note that the standard deviations are low. Each point on the x-axis represents one program, as numbered in Table 1. We can see from Figures 3 and 4 that obtaining the best accuracy requires orders of magnitude more time.

[Figure 5 omitted: accuracy improvement over the Low mode for the Low, FI Ptrs, and Medium policies, each with variants "no MIA", "no fields", "Cond", and "FS MIA".]

Figure 5: Improvement in accuracy of different precision modes relative to the Low precision mode, averaged over all problems.

Figure 5 focuses on the gap between the Low precision mode and the High precision mode. It shows the relative improvement in accuracy for each of the possible precision policies over the lowest precision, averaged across all programs and problems.
We include an intermediate precision policy, labeled FI Ptrs, in which the error properties are flow-sensitive but the pointer analysis is not. Flow-sensitive pointer analysis is more expensive, but it does improve the results. We also find that ignoring structure fields seriously impairs accurate analysis. Finally, path-sensitive multiple instance analysis only helps at the higher levels of precision.

 #  Program                  Lines (C)  Preprocessed
 1  stunnel 3.8              2K         13K
 2  muh 2.05c                5K         25K
 3  muh 2.05d                5K         25K
 4  snmpd (cmu-snmp-3.4)     17K        41K
 5  crond (fcron-2.9.3)      9K         40K
 6  cfengine                 -          350K
 7  ssh (openssh 3.5p1)      38K        210K
 8  wu-ftpd                  -          64K
 9  wu-ftpd                  -          66K
10  named (BIND 4.9.4)       26K        84K
11  apache (core)            30K        67K
12  sshd (openssh 3.5p1)     50K        299K
13  make                     -          50K
14  lpd (LPRng)              38K        150K

[Columns lost in transcription: procedure counts, analysis times (Low, Med, High, and Adapt min & max), and format string vulnerabilities (known, found, false positives).]

Table 1: Properties of the input programs. The size is given as both lines of C (left column) and preprocessed lines of C (right). The analysis times for the fixed-precision policies are averages over all error checking problems for a given program; these averages have a small standard deviation. The analysis times for the adaptive algorithm vary greatly with the problem, so for that algorithm we give both minimum and maximum values. Running times marked as not completed exceeded 28K seconds. The FSV identified in apache is not an exploitable one. Our analysis time for apache uses an older and considerably slower version (sometimes an order of magnitude slower) of our adaptive algorithm; at the time of submission we did not have time to rerun the experiment.

6. ADAPTIVE ALGORITHM

We now present a new algorithm for dataflow analysis that automatically adapts its precision to improve accuracy. Our study in Section 5 demonstrates the tradeoff between precision and performance: full precision is too costly, while low precision is insufficient for many problems. In particular, the high cost of precise pointer analysis is a significant obstacle. Therefore, a unique feature of our adaptive algorithm is that the precision of the pointer analysis is driven by the needs of the error detection problems. One possible way to provide adaptivity is to choose the best fixed-precision policy for each error detection problem. However, we can do much better than this by exploiting the following observation: we only need to add precision to the parts of the program that affect the results.
For example, when tracking the state of file handles, we only need flow-sensitivity for the file handle objects themselves; precise analysis of other objects, such as data buffers, does not improve the results. Previous work has also acknowledged the benefit of judiciously adding context-sensitivity. Shankar et al. [18] substantially improve their analysis by manually adding polymorphism (context-sensitivity) to certain hotspots in the application code. Das et al. [5] manually clone two small procedures in gcc to make verification possible. Rather than view these cases as exceptions, our goal is to identify them automatically and to add flow-sensitivity or context-sensitivity as necessary. The key insight is that we know exactly where more accuracy is needed: any point in the program where the analyzer reports an error with a low confidence value. The algorithm works backward from those points to identify the sources of the information loss. The benefit of this approach is that it drastically reduces the amount of precision needed, and it practically guarantees better accuracy. We show that this algorithm finds the right amount of precision for each problem and each program. For easy problems, it adds very little precision and produces an answer quickly. For hard problems, it adds more precision and requires more time and space. However, even for the hardest problems the adaptive algorithm is not as expensive as the full-precision algorithm, because it still finds parts of the program for which increased precision is not needed.

6.1 Algorithm

Our algorithm is iterative because it drives adaptivity by examining the analysis results. It starts by analyzing the program at the lowest possible precision, and based on the results it proposes updates to the precision policy. This process repeats until no new precision is added, or until overall accuracy stops improving.
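The iteration just described can be sketched as a driver loop. The pass results below are simulated stand-ins, and all names are ours rather than the system's actual API:

```c
typedef struct {
    double accuracy;   /* averaged over all error-check points */
    int    updates;    /* number of precision additions proposed */
} PassResult;

/* Simulated passes standing in for real monitored analysis runs
 * (illustrative numbers only). */
static PassResult passes[] = {
    {0.40, 12}, {0.70, 4}, {0.90, 1}, {0.90, 1}
};
static int pass_no = 0;

static PassResult analyze_and_propose(void) { return passes[pass_no++]; }

/* Run analysis passes until no new precision is added or accuracy stops
 * improving; returns the number of passes performed. */
int adaptive_analysis(void) {
    double prev = -1.0;
    int n = 0;
    for (;;) {
        PassResult r = analyze_and_propose();
        n++;
        if (r.updates == 0 || r.accuracy <= prev)
            break;              /* converged: stop adding precision */
        prev = r.accuracy;
    }
    return n;
}
```

With the simulated data above, the loop performs three productive passes and stops on the fourth, whose accuracy fails to improve.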
The algorithm has two components: a monitor that detects and tracks loss of information during program analysis, and a controller that analyzes the information collected by the monitor. During program analysis, the monitor identifies the places where accuracy is lost and then tracks the objects that are subsequently affected. This information is captured in a dependence graph. When the program analysis is complete, the controller looks at the objects whose values are tested (as part of the error reporting) and traces them back through the dependence graph. The nodes and edges that make up this back-trace indicate which variables and procedures need more precision. Low-accuracy analysis information can occur both in the error detection analysis and in the pointer analysis. In the error detection analysis, we compute accuracy by counting the number of possible states in the flow value. Any value greater than one represents some uncertainty about the analysis. We can use the same notion of accuracy for the pointer analysis: any pointer with multiple possible targets has lost accuracy.

6.2 Program analysis monitor

Our monitor discovers assignments that destroy accuracy, and it then tracks the resulting bad information through the program. We divide these assignments into two categories: (1) destructive assignments, where the loss of accuracy is a direct consequence of the lack of precision, and (2) complicit assignments, in which bad information produced earlier is passed from one object to another. The monitor builds a dependence graph in which each node is a variable (or other store location). A node has an associated list of destructive assignments that describes places where the variable lost information. We add a directed edge for each complicit assignment, from the lhs variable to the rhs variable.
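The dependence graph can be sketched as a simple data structure; all names here are ours, not the monitor's actual implementation:

```c
#include <stdlib.h>

/* Sketch of the monitor's dependence graph. Each node is a store
 * location carrying the destructive assignments observed there;
 * complicit assignments become directed edges from the modified (lhs)
 * variable to the (rhs) variable that supplied the bad value. */
struct Destructive {
    int site;                    /* program location of the merge */
    struct Destructive *next;
};

struct Node {
    const char *name;            /* variable or heap object */
    struct Destructive *losses;  /* destructive assignments at this node */
    struct Node **deps;          /* complicit edges: lhs -> rhs */
    int n_deps;
};

/* Record a complicit assignment "lhs = ... rhs ..." as an edge lhs -> rhs. */
void add_complicit(struct Node *lhs, struct Node *rhs) {
    lhs->deps = realloc(lhs->deps, (lhs->n_deps + 1) * sizeof *lhs->deps);
    lhs->deps[lhs->n_deps++] = rhs;
}
```

Keeping destructive assignments on the node and complicit assignments as edges matches the controller's later use: edges are followed backward, and the destructive lists at the reached nodes name the merges that need more precision.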
Although we record the program locations of each assignment, the dependence graph is flow-insensitive, because we need the monitor to be significantly cheaper than the main analysis algorithm.

Destructive assignments

Destructive assignments are the source of all inaccuracy in the analysis. They occur when the analyzer is forced to merge information, either by the precision policy or by the conservative nature of the analysis. We record a destructive assignment when the result of the merge is strictly worse than any of the values before merging.
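The "strictly worse" test can be sketched over a bitmask of abstract states (our illustration, not the analyzer's code): since a merge result is always a superset of each input, it is strictly worse than a given input exactly when it differs from that input.

```c
typedef unsigned StateSet;    /* one bit per abstract state */

/* merged is the union of inputs[0..n-1]; report a destructive assignment
 * when merged is strictly worse than (i.e., differs from) every input. */
int is_destructive(StateSet merged, const StateSet *inputs, int n) {
    for (int i = 0; i < n; i++)
        if (merged == inputs[i])  /* no information lost vs. this input */
            return 0;
    return 1;
}
```

For example, merging {open} with {closed} yields {open, closed}, which differs from both inputs and is therefore destructive; merging {open, closed} with {open} yields {open, closed}, which equals the first input and is not.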

We classify these assignments according to where they occur:

Flow-insensitive assignment. When a variable is flow-insensitive, the analyzer merges all updates to the variable.

Context-insensitive parameter passing. When a procedure is context-insensitive, we merge the information from all of its call sites. For each formal parameter, the analyzer merges the values of all the possible actual parameters.

Control-flow merge. Our analyzer uses a representation similar to SSA form, which includes φ-functions to merge information from different control-flow paths.

Multiple instance objects. Like flow-insensitive variables, the analyzer merges all updates to multiple instance objects.

Figure 6 shows a small code fragment with three potential destructive assignments. The first one is a parameter-passing destructive assignment, and it occurs if the procedure is called with different input strings. The second is a flow-insensitive update that occurs when we append the contents of the input string onto the message buffer. The third is a control-flow merge that merges the two possible pointer targets of p. We associate each destructive assignment with the variable that it affects.

void my_func(char * input)        /* Destructive */
{
    char * p, msg[200];

    strcpy(msg, "Message: ");
    strcat(msg, input);           /* Destructive and complicit */

    if (cond)
        p = msg;
    else
        p = "(empty)";            /* Destructive */

    printf(p);                    /* Complicit */
}

Figure 6: Procedure my_func has three potential destructive assignments: at the formal parameter, the string concatenation, and the control-flow merge.

Complicit assignments

Complicit assignments convey bad information from variable to variable, and to other parts of the program. An assignment is complicit when it is not destructive but still results in low accuracy. We use complicit assignments to trace bad information back to its source.
To do this, we record a complicit assignment as an edge in the dependence graph from the variable that is modified to the variable that causes the inaccuracy. Complicit assignments occur in two ways:

Simple assignment. In the simplest case, a complicit assignment passes bad information from the right-hand side to the left-hand side. In these cases we add an edge from the variable on the left back to the variable on the right.

Assignment through a pointer. Since our pointer analysis allows pointer variables to have multiple targets, our analyzer has to merge the information from those targets whenever the pointer is dereferenced. However, unlike the destructive assignments, when these merges lose information it is because the pointer variable is not accurate enough. Therefore we add edges from the affected variables on the left to the pointer variable.

[Figure 7 omitted: a dependence graph over the nodes format, p, msg[], *input, and input.]

Figure 7: The dependence graph for Figure 6 captures which variables are responsible for the accuracy of the format string.

Figure 6 shows two complicit assignments. In the first, any inaccuracy present in the input string is passed on to the message buffer. In the second case, the call to printf() dereferences a pointer with two targets. If the resulting objects differ in their data-flow state, we hold the pointer p responsible. Figure 7 shows the resulting dependence graph. The format string of the printf() call depends on both the pointer p and the contents of the msg string. The value of the msg string depends on the input string, which in turn depends on the input parameter.

6.3 Controller

Once an analysis pass is complete, the controller uses the information collected by the monitor to determine where to add precision. The algorithm uses graph reachability to compute a subgraph of the dependence graph whose nodes and edges track imprecise information from the error reports back to their sources. This subgraph tends to be significantly smaller than the graph as a whole.
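The controller's subgraph computation is ordinary backward reachability over the complicit edges. A minimal sketch, using an array-based graph with names of our own choosing:

```c
#define MAX_NODES 16

/* dep[i][j] != 0 means node i's value depends on node j (a complicit
 * edge recorded by the monitor). */
static int dep[MAX_NODES][MAX_NODES];

/* Mark every node reachable backward from a low-accuracy error check;
 * the marked set is the subgraph whose destructive assignments the
 * controller will consider when adding precision. */
static void mark_sources(int node, int *marked) {
    if (marked[node]) return;
    marked[node] = 1;
    for (int j = 0; j < MAX_NODES; j++)
        if (dep[node][j])
            mark_sources(j, marked);
}
```

Starting from the node tested by the error check (for Figure 6, the printf() format string), the traversal reaches p, msg, and the input parameter, exactly the variables the next pass should analyze more precisely.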
The controller starts at the nodes that correspond to the objects directly tested by the error checks and works backwards to find the destructive assignments. However, it only considers the objects with low accuracy. We analyze the subgraph to produce a plan for the next iteration of program analysis. For each object in the subgraph, we add precision according to the type of destructive assignment and the location where it occurs:

Flow-insensitive assignment. We make the object flow-sensitive for the next pass. In addition, we make all the objects between this object and the error checks in the graph flow-sensitive, since the goal is to maintain accuracy through the whole chain of assignments.

Context-insensitive parameter passing. We make the procedure context-sensitive for the next pass. In addition, we make all of its descendants in the call graph context-sensitive. If we leave any descendants context-insensitive, then unrealizable paths destroy much of the benefit of context-sensitivity.

Control-flow merge. Ideally, we would apply some form of path-sensitivity to solve this problem. Unfortunately, our framework does not currently support path-sensitivity.

Multiple instance objects. There are several possibilities for disambiguating heap objects, but we have not yet implemented any. We could make the allocating procedure context-sensitive, which would cause it to produce more objects in the memory model.

Information stored in the dependence graph is not flow-sensitive, which can cause the controller to add more precision than is necessary. For example, some of the destructive assignments may occur


More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2017 Lecture 3a Andrew Tolmach Portland State University 1994-2017 Binding, Scope, Storage Part of being a high-level language is letting the programmer name things: variables

More information

A Sparse Algorithm for Predicated Global Value Numbering

A Sparse Algorithm for Predicated Global Value Numbering Sparse Predicated Global Value Numbering A Sparse Algorithm for Predicated Global Value Numbering Karthik Gargi Hewlett-Packard India Software Operation PLDI 02 Monday 17 June 2002 1. Introduction 2. Brute

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/22891 holds various files of this Leiden University dissertation Author: Gouw, Stijn de Title: Combining monitoring with run-time assertion checking Issue

More information

One-Slide Summary. Lecture Outline. Language Security

One-Slide Summary. Lecture Outline. Language Security Language Security Or: bringing a knife to a gun fight #1 One-Slide Summary A language s design principles and features have a strong influence on the security of programs written in that language. C s

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 2 Thomas Wies New York University Review Last week Programming Languages Overview Syntax and Semantics Grammars and Regular Expressions High-level

More information

Static Analysis methods and tools An industrial study. Pär Emanuelsson Ericsson AB and LiU Prof Ulf Nilsson LiU

Static Analysis methods and tools An industrial study. Pär Emanuelsson Ericsson AB and LiU Prof Ulf Nilsson LiU Static Analysis methods and tools An industrial study Pär Emanuelsson Ericsson AB and LiU Prof Ulf Nilsson LiU Outline Why static analysis What is it Underlying technology Some tools (Coverity, KlocWork,

More information

Module: Future of Secure Programming

Module: Future of Secure Programming Module: Future of Secure Programming Professor Trent Jaeger Penn State University Systems and Internet Infrastructure Security Laboratory (SIIS) 1 Programmer s Little Survey Problem What does program for

More information

Types and Type Inference

Types and Type Inference CS 242 2012 Types and Type Inference Notes modified from John Mitchell and Kathleen Fisher Reading: Concepts in Programming Languages, Revised Chapter 6 - handout on Web!! Outline General discussion of

More information

Cost Effective Dynamic Program Slicing

Cost Effective Dynamic Program Slicing Cost Effective Dynamic Program Slicing Xiangyu Zhang Rajiv Gupta Department of Computer Science The University of Arizona Tucson, Arizona 87 {xyzhang,gupta}@cs.arizona.edu ABSTRACT Although dynamic program

More information

Program Static Analysis. Overview

Program Static Analysis. Overview Program Static Analysis Overview Program static analysis Abstract interpretation Data flow analysis Intra-procedural Inter-procedural 2 1 What is static analysis? The analysis to understand computer software

More information

Alias Analysis. Advanced Topics. What is pointer analysis? Last Time

Alias Analysis. Advanced Topics. What is pointer analysis? Last Time Advanced Topics Last Time Experimental Methodology Today What s a managed language? Alias Analysis - dealing with pointers Focus on statically typed managed languages Method invocation resolution Alias

More information

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers These notes are not yet complete. 1 Interprocedural analysis Some analyses are not sufficiently

More information

Lectures 20, 21: Axiomatic Semantics

Lectures 20, 21: Axiomatic Semantics Lectures 20, 21: Axiomatic Semantics Polyvios Pratikakis Computer Science Department, University of Crete Type Systems and Static Analysis Based on slides by George Necula Pratikakis (CSD) Axiomatic Semantics

More information

Stephen McLaughlin. From Uncertainty to Belief: Inferring the Specification Within

Stephen McLaughlin. From Uncertainty to Belief: Inferring the Specification Within From Uncertainty to Belief: Inferring the Specification Within Overview Area: Program analysis and error checking / program specification Problem: Tools lack adequate specification. Good specifications

More information

Checking and Inferring Local Non-Aliasing. UC Berkeley UC Berkeley

Checking and Inferring Local Non-Aliasing. UC Berkeley UC Berkeley Checking and Inferring Local Non-Aliasing Alex Aiken UC Berkeley Jeffrey S. Foster UMD College Park John Kodumal Tachio Terauchi UC Berkeley UC Berkeley Introduction Aliasing: A long-standing problem Pointers

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2016 Lecture 3a Andrew Tolmach Portland State University 1994-2016 Formal Semantics Goal: rigorous and unambiguous definition in terms of a wellunderstood formalism (e.g.

More information

Lecture Notes on Common Subexpression Elimination

Lecture Notes on Common Subexpression Elimination Lecture Notes on Common Subexpression Elimination 15-411: Compiler Design Frank Pfenning Lecture 18 October 29, 2015 1 Introduction Copy propagation allows us to have optimizations with this form: l :

More information

CMPSC 497: Static Analysis

CMPSC 497: Static Analysis CMPSC 497: Static Analysis Trent Jaeger Systems and Internet Infrastructure Security (SIIS) Lab Computer Science and Engineering Department Pennsylvania State University Page 1 Our Goal In this course,

More information

AUTOMATIC VULNERABILITY DETECTION USING STATIC SOURCE CODE ANALYSIS ALEXANDER IVANOV SOTIROV A THESIS

AUTOMATIC VULNERABILITY DETECTION USING STATIC SOURCE CODE ANALYSIS ALEXANDER IVANOV SOTIROV A THESIS AUTOMATIC VULNERABILITY DETECTION USING STATIC SOURCE CODE ANALYSIS by ALEXANDER IVANOV SOTIROV A THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in the

More information

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Projects 1 Information flow analysis for mobile applications 2 2 Machine-learning-guide typestate analysis for UAF vulnerabilities 3 3 Preventing

More information

Interprocedural Analysis. CS252r Fall 2015

Interprocedural Analysis. CS252r Fall 2015 Interprocedural Analysis CS252r Fall 2015 Procedures So far looked at intraprocedural analysis: analyzing a single procedure Interprocedural analysis uses calling relationships among procedures Enables

More information

Static Analysis Basics II

Static Analysis Basics II Systems and Internet Infrastructure Security Network and Security Research Center Department of Computer Science and Engineering Pennsylvania State University, University Park PA Static Analysis Basics

More information

Static Analysis in Practice

Static Analysis in Practice in Practice 15-313: Foundations of Software Engineering Jonathan Aldrich 1 Outline: in Practice Case study: Analysis at ebay Case study: Analysis at Microsoft Analysis Results and Process Example: Standard

More information

CSE 565 Computer Security Fall 2018

CSE 565 Computer Security Fall 2018 CSE 565 Computer Security Fall 2018 Lecture 14: Software Security Department of Computer Science and Engineering University at Buffalo 1 Software Security Exploiting software vulnerabilities is paramount

More information

Counterexample Guided Abstraction Refinement in Blast

Counterexample Guided Abstraction Refinement in Blast Counterexample Guided Abstraction Refinement in Blast Reading: Checking Memory Safety with Blast 17-654/17-754 Analysis of Software Artifacts Jonathan Aldrich 1 How would you analyze this? * means something

More information

Maintaining Mutual Consistency for Cached Web Objects

Maintaining Mutual Consistency for Cached Web Objects Maintaining Mutual Consistency for Cached Web Objects Bhuvan Urgaonkar, Anoop George Ninan, Mohammad Salimullah Raunak Prashant Shenoy and Krithi Ramamritham Department of Computer Science, University

More information

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking Agenda COMP 181 Type checking October 21, 2009 Next week OOPSLA: Object-oriented Programming Systems Languages and Applications One of the top PL conferences Monday (Oct 26 th ) In-class midterm Review

More information

Combining Analyses, Combining Optimizations - Summary

Combining Analyses, Combining Optimizations - Summary Combining Analyses, Combining Optimizations - Summary 1. INTRODUCTION Cliff Click s thesis Combining Analysis, Combining Optimizations [Click and Cooper 1995] uses a structurally different intermediate

More information

Buffer Overflows Defending against arbitrary code insertion and execution

Buffer Overflows Defending against arbitrary code insertion and execution www.harmonysecurity.com info@harmonysecurity.com Buffer Overflows Defending against arbitrary code insertion and execution By Stephen Fewer Contents 1 Introduction 2 1.1 Where does the problem lie? 2 1.1.1

More information

Garbage Collection (2) Advanced Operating Systems Lecture 9

Garbage Collection (2) Advanced Operating Systems Lecture 9 Garbage Collection (2) Advanced Operating Systems Lecture 9 Lecture Outline Garbage collection Generational algorithms Incremental algorithms Real-time garbage collection Practical factors 2 Object Lifetimes

More information

QUIZ. What is wrong with this code that uses default arguments?

QUIZ. What is wrong with this code that uses default arguments? QUIZ What is wrong with this code that uses default arguments? Solution The value of the default argument should be placed in either declaration or definition, not both! QUIZ What is wrong with this code

More information

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1 CSE P 0 Compilers SSA Hal Perkins Spring 0 UW CSE P 0 Spring 0 V- Agenda Overview of SSA IR Constructing SSA graphs Sample of SSA-based optimizations Converting back from SSA form Sources: Appel ch., also

More information

Analyzing Systems. Steven M. Bellovin November 26,

Analyzing Systems. Steven M. Bellovin November 26, Analyzing Systems When presented with a system, how do you know it s secure? Often, you re called upon to analyze a system you didn t design application architects and programmers build it; security people

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I Solutions

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I Solutions Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.033 Computer Systems Engineering: Spring 2011 Quiz I Solutions There are 10 questions and 12 pages in this

More information

Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras

Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 01 Lecture - 03 From Programs to Processes Hello. In

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from

More information

On Reasoning about Finite Sets in Software Checking

On Reasoning about Finite Sets in Software Checking On Reasoning about Finite Sets in Software Model Checking Pavel Shved Institute for System Programming, RAS SYRCoSE 2 June 2010 Static Program Verification Static Verification checking programs against

More information

Optimizing for Bugs Fixed

Optimizing for Bugs Fixed Optimizing for Bugs Fixed The Design Principles behind the Clang Static Analyzer Anna Zaks, Manager of Program Analysis Team @ Apple What is This Talk About? LLVM/clang project Overview of the Clang Static

More information

Introduction to Programming Using Java (98-388)

Introduction to Programming Using Java (98-388) Introduction to Programming Using Java (98-388) Understand Java fundamentals Describe the use of main in a Java application Signature of main, why it is static; how to consume an instance of your own class;

More information

Alternatives for semantic processing

Alternatives for semantic processing Semantic Processing Copyright c 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies

More information

Language Security. Lecture 40

Language Security. Lecture 40 Language Security Lecture 40 (from notes by G. Necula) Prof. Hilfinger CS 164 Lecture 40 1 Lecture Outline Beyond compilers Looking at other issues in programming language design and tools C Arrays Exploiting

More information

A classic tool: slicing. CSE503: Software Engineering. Slicing, dicing, chopping. Basic ideas. Weiser s approach. Example

A classic tool: slicing. CSE503: Software Engineering. Slicing, dicing, chopping. Basic ideas. Weiser s approach. Example A classic tool: slicing CSE503: Software Engineering David Notkin University of Washington Computer Science & Engineering Spring 2006 Of interest by itself And for the underlying representations Originally,

More information

Splint Pre-History. Security Flaws. (A Somewhat Self-Indulgent) Splint Retrospective. (Almost) Everyone Hates Specifications.

Splint Pre-History. Security Flaws. (A Somewhat Self-Indulgent) Splint Retrospective. (Almost) Everyone Hates Specifications. (A Somewhat Self-Indulgent) Splint Retrospective Splint Pre-History Pre-history 1973: Steve Ziles algebraic specification of set 1975: John Guttag s PhD thesis: algebraic specifications for abstract datatypes

More information

Run-Time Environments/Garbage Collection

Run-Time Environments/Garbage Collection Run-Time Environments/Garbage Collection Department of Computer Science, Faculty of ICT January 5, 2014 Introduction Compilers need to be aware of the run-time environment in which their compiled programs

More information

Visualizing Type Qualifier Inference with Eclipse

Visualizing Type Qualifier Inference with Eclipse Visualizing Type Qualifier Inference with Eclipse David Greenfieldboyce Jeffrey S. Foster University of Maryland, College Park dgreenfi,jfoster @cs.umd.edu Abstract Type qualifiers are a lightweight, practical

More information

Introduction to Optimization Local Value Numbering

Introduction to Optimization Local Value Numbering COMP 506 Rice University Spring 2018 Introduction to Optimization Local Value Numbering source IR IR target code Front End Optimizer Back End code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights

More information

Verifying the Safety of Security-Critical Applications

Verifying the Safety of Security-Critical Applications Verifying the Safety of Security-Critical Applications Thomas Dillig Stanford University Thomas Dillig 1 of 31 Why Program Verification? Reliability and security of software is a huge problem. Thomas Dillig

More information

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis Synthesis of output program (back-end) Intermediate Code Generation Optimization Before and after generating machine

More information

Refinement-Based Context-Sensitive Points-To Analysis for Java

Refinement-Based Context-Sensitive Points-To Analysis for Java Refinement-Based Context-Sensitive Points-To Analysis for Java Manu Sridharan, Rastislav Bodík UC Berkeley PLDI 2006 1 What Does Refinement Buy You? Increased scalability: enable new clients Memory: orders

More information