Adaptive Analysis for Precise and Efficient Error Checking


Samuel Z. Guyer, Dept. of Computer Sciences, The University of Texas at Austin, Austin, TX
Calvin Lin, Dept. of Computer Sciences, The University of Texas at Austin, Austin, TX

In this paper we study the effects of analysis precision on automated error detection. We measure the impact of flow-sensitivity, context-sensitivity, and different types of pointer analysis on the accuracy of several error checking problems. We find that different problems can require vastly different levels of precision, so no fixed degree of precision is ideal for all cases. To solve this problem, we introduce an adaptive dataflow analysis algorithm that adjusts its level of precision at a fine grain to match the needs of the particular program and problem. We evaluate our algorithm on 14 C programs and find that it typically matches the accuracy of the most precise fixed-precision policy, at a small fraction of the cost.

1. INTRODUCTION

Automatic program checking is an increasingly important method of finding bugs and detecting security vulnerabilities. Recent work in this area has used traditional tools, such as data-flow analysis and extended type systems, to systematically detect specific program properties, which are described by some lightweight specification. Examples include checking for deadlock caused by double locking [6, 8], making sure file handles are open when accessed [8, 5], and tracking tainted data to detect format string vulnerabilities [18]. Existing approaches differ not in the types of errors that they find, but in the precision, scope, and scalability of their analyses. We can characterize these approaches by their precision policy, which includes their choice of flow sensitivity, context sensitivity, and the precision of their pointer analysis. Each precision policy makes a difficult tradeoff between precision and scalability.
However, no previous work has studied the efficacy of different precision policies on different program checking problems. In fact, a common way to present a new error checking algorithm is to apply it to just one or two error checking problems. This raises the question: how much analysis power is actually needed to effectively detect a particular error?

[Footnote: This work is supported by DARPA grant F C-1892, NSF CAREER Award ACI, and DARPA contract NBCHC. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Submitted to PLDI 2003, San Diego, CA. Copyright 2002 ACM X-XXXXX-XX-X/XX/XX...$5.00.]

The first goal of this paper is to understand the impact of commonly used precision policies on the cost of error detection and on the accuracy of the results. We present six different error checking problems, covering a range of complexity, and formulate them as data-flow analysis problems. Using a configurable data-flow analysis framework, we check our benchmark programs for each error under sixteen different precision policies. In total, we present results for approximately 1400 combinations of error problems, precision policies, and programs. Throughout this paper we distinguish between precision, which refers to the level of detail of the analysis, and accuracy, which measures the quality of the results. We introduce a new metric for accuracy that allows us to compare different precision policies and different error checking problems. As in previous work, the error checking problems presented here require some form of pointer analysis for both accuracy and soundness.
Unfortunately, precise pointer analysis is often more expensive than the error checking analysis itself, and pointer analysis is not amenable to many previous techniques that reduce the cost of the error analysis [5]. Thus, a subsidiary goal of this paper is to measure the effect of precise pointer analysis on error checking, independent of the error analysis itself.

Our results show that no single fixed-policy analysis is suitable for all error checking problems and all programs. Some problems, such as detecting format string vulnerabilities, are easy: we typically obtain the best accuracy even at the lowest level of precision. Other problems are harder: the more precision we use, the better the results. We are thus left with a quandary. If we opt for high precision, then the compiler wastes time over-analyzing the easy problems; if we opt for low precision, then the compiler cannot produce accurate results for the hard problems.

To address this problem, we present a new adaptive analysis algorithm that monitors accuracy and dynamically adjusts the precision of the analysis to meet the particular needs of the problem and program. The algorithm exploits a unique feature of our implementation: the ability to control precision at a very fine granularity. For example, individual procedures can be analyzed context-sensitively, and individual variables can be analyzed flow-sensitively. We show that our adaptive algorithm effectively discovers the appropriate amount of precision needed to produce accurate results. For easy problems, it adds very little precision and produces an answer quickly. For hard problems, it adds more precision and requires more time and space. Even for the hardest problems, our adaptive algorithm is less expensive than a fixed high-precision policy because it manages to find parts of the program for which precision is not needed.

This paper makes the following contributions: We present a comprehensive study of the effects of analysis
precision on the cost and accuracy of error detection, as applied to a set of real C programs. In particular, we focus on commonly used precision policies, such as flow-sensitivity and context-sensitivity, and on the transitive effects of precise pointer analysis. We find that no single fixed-precision analysis is ideal for all problems and programs. We present a new adaptive analysis algorithm that automatically adjusts precision as necessary. The algorithm starts with a very fast, imprecise analysis and adds precision only where it is needed to improve the accuracy of the results. We evaluate our adaptive analysis algorithm, showing that it can often achieve all the accuracy of the most precise analysis (flow-sensitive and context-sensitive) at a tiny fraction of the cost.

This paper is organized as follows. Section 2 presents related work. In Section 3 we describe key aspects of the compiler that we use in our study, and in Section 4 we summarize the error checking problems that we will study. Section 5 presents our empirical study of the impact of analysis precision on error checking, and Section 6 describes and evaluates our adaptive algorithm. Finally, we conclude in Section 7.

2. RELATED WORK

There is considerable prior work both in error checking and in evaluating the precision of program analysis.

2.1 Error detection

Most of the recent work on automatic error checking, including ours, builds on the notion of typestate analysis introduced by Strom and Yemini [21]. These systems differ primarily in the kind of program analysis they use to derive the typestate information, and research has focused primarily on improving the performance and accuracy of the typestate analysis engine. However, one of the major challenges in checking C programs is constructing a precise enough model of the store to support accurate error checking. Unfortunately, many of the techniques used to speed up typestate analysis do not work for pointer analysis.
Previous work has generally settled for a low-cost, fixed-policy pointer analysis that provides minimal store information without overwhelming the error checking analysis. This analysis by itself is often inadequate, requiring manual intervention to disambiguate memory locations [5].

Two recent papers have focused on using type systems to check for programming errors. Shankar et al. present a system for detecting format string vulnerabilities using type inference [18]. In this approach, two new type qualifiers, tainted and untainted, are introduced to the C language and added to the signatures of the standard C library functions. Type inference is performed by an extensible type qualifier framework, which derives a consistent assignment of these type qualifiers to string variables. Errors are reported as type conflicts. While this system is extremely efficient, it can produce a large number of false positives because the precision of the analysis is low. Initially, we believed that the inaccuracy was due to flow-insensitivity. However, our experiments show that flow-sensitivity is not necessary for the programs presented in their paper. Instead, the high false-positive rate is due to the equality-based constraint solver, which allows information to propagate the wrong way through assignments; in particular, from formal parameters back to actual parameters. In several cases the authors address this problem by manually adding context-sensitivity ("polymorphism" in type-system terminology).

Foster et al. extend Shankar's work to flow-sensitive type qualifiers, which they use to check the state of file handles and detect double-locking bugs [8]. Unfortunately, imprecision in the store model and equality-based constraints continue to hamper this approach. The authors add two features to their system to help improve accuracy. First, they introduce a new keyword, restrict, which they add to the application code to disambiguate memory locations.
Second, they add a very limited local form of path-sensitivity to handle the failure case of fopen() (when it returns a null pointer). Our experiments show that these two features are as important as, if not more important than, the flow-sensitivity itself. In addition, the system generates many spurious errors, apparently due to the lack of parametric polymorphism for store locations. However, our experiments show that context-sensitivity has very little effect on these error checking problems: procedures are rarely called with open files in one context and closed files in another.

The ESP system implements a path-sensitive variation of typestate analysis that significantly improves precision [5]. In particular, it uses a theorem prover to detect correlations between different branches, and it eliminates many paths that cannot actually occur. The implementation runs in polynomial time because it uses a dataflow analysis algorithm due to Reps et al. [16] that can efficiently summarize any dataflow problem that falls into a certain class of problems. However, this class does not include pointer analysis or constant propagation (with constant folding). Therefore, the authors add a fast flow-insensitive, context-insensitive pointer analyzer as a front-end [4]. Unfortunately, the resulting store model is not precise enough to allow verification, and the authors must manually clone two procedures in order to disambiguate memory locations. Our algorithm, while not as powerful, detects these situations and automatically makes the procedures context-sensitive.

The MC system checks for errors in operating system code using programmer-written checkers based on state machines [6]. A checker consists of a set of states and a set of syntax patterns that trigger transitions on the state machine. The compiler pushes the state machine down each path in the program and reports any error states that it encounters.
While this approach has proven quite successful in finding errors, it has limitations. In particular, since the analysis is syntax-driven, the compiler lacks deep information about the program semantics, such as dataflow dependences and pointer relationships. In fact, the system is not sound: it can produce false negatives, in which programs that have errors are reported as bug-free.

The SLAM Toolkit approach is similar to MC but is more rigorous and more powerful [2]. SLAM includes a pointer analyzer and can check programs interprocedurally. The toolkit first generates an abstraction of the program that represents its behavior only with respect to the properties of interest. It then uses a model checker to perform path-sensitive analysis on the abstracted program. However, this system still uses a fixed-policy analysis to generate the initial program abstraction, including the store model. While our analysis engine is not as powerful, we allow the error checking problems themselves to dictate the precision of the store model. The SLAM approach could be combined with our technique to improve the initial abstraction or to control analysis precision during the iterative refinement process itself.

2.2 Analysis precision

Many papers have compared the precision of different pointer analysis algorithms [7, 17, 20, 14]. The conclusions generally agree with our findings: precise pointer analysis is extremely expensive, but it does produce better results. Previous work has also measured the transitive effects of pointer analysis precision on optimization and on other analysis problems, such as program slicing [19]. Our work differs in two important ways. First, we include measurements of more expensive precision policies, such as combined flow-sensitivity and context-sensitivity. Second, we go beyond simply measuring the transitive effects; we feed them back into the analyzer to drive its precision. Interestingly, Stocks et al. [20] mention the possibility of using different precision on different parts of the program.

Demand-driven program analysis reduces the cost of analysis by computing only part of the analysis problem [15]. Demand-driven pointer analysis computes just enough information to determine the possible aliases of a given set of variables [12]. Our adaptive algorithm always computes the entire analysis problem, but increases precision only in the areas of the program that affect specific variables. We believe that our approach is complementary to the demand-driven approach: demand-driven algorithms need a specific problem, like error checking, to determine which variables are important. Our work is also related to lazy abstraction [13], which iteratively refines the abstraction of the program based on the analysis results.

3. BACKGROUND

We use the Broadway compiler [10] to perform the experiments for this research. The Broadway system integrates domain-specific library information into the compilation process. It consists of an annotation language for capturing domain-specific information and a configurable compiler that uses the annotations to perform library-specific program analysis and optimization. The annotations describe only the library routines, not the application code. A key feature of the annotation language is that it allows the user to define new dataflow analyses. We use this feature to cast error detection problems as dataflow analyses.

3.1 Analysis engine

Our analysis framework supports interprocedural, context-sensitive and flow-sensitive dataflow analysis problems.
The engine has a built-in pointer analyzer, which is based on the storage shape graph of Chase et al. [3] and includes many of the improvements described by Wilson [24]. However, we do not compute procedure summaries, because doing so significantly complicates the implementation, and because it does not provide a technique for summarizing both the pointer analysis and arbitrary user-defined analysis problems. All objects in the program, including surface variables and heap-allocated memory, are represented by a node in the storage shape graph. We also include structs, unions, and their fields as separate nodes. Heap objects are identified by their context-sensitive allocation site, which helps maintain precision even when allocation occurs in reusable wrapper functions. Each array has its own node, but all the elements of the array are represented by a single node.

The framework builds interprocedural factored use-def chains for the whole program. Every assignment to an object generates a definition that is labeled with the program location at which it occurs. In addition, each assignment generates special merge nodes at its dominance frontier, just like an SSA φ-function. Keeping the merge points separate from the code is simpler than using one of the special SSA forms developed for pointers. When we need the value of the object, we use an interprocedural dominance test to find the nearest reaching definition.

The analyzer uses the factored use-def chains to perform sparse conditional dataflow analysis. Dataflow facts about an object are associated with each definition: for pointer analysis, each definition has a points-to set; for the user-defined analyses, each definition has an associated value from the lattice. The framework also includes constant propagation and constant folding. The analyzer starts at the main function and processes the program essentially in execution order. Within a procedure it uses a traditional worklist algorithm for the basic blocks.
At a procedure call, it jumps immediately to the called procedure. We handle recursive procedures by forcing them to be context-insensitive (see Section 5). Our analyzer is sound up to the safe features of C, but we do not handle certain kinds of pointer arithmetic. For example, we allow pointers to individual structure fields, but we cannot handle code that moves a pointer between fields. However, we do handle a number of difficult features of C and the C standard library:

- We properly analyze procedures that take and manipulate variable argument lists.
- We maintain separate nodes in the storage shape graph for each instance of a struct or union and its fields. This feature improves precision for complex data structures with multiple levels of pointers.
- Our annotation language provides a systematic way to describe library-specific behavior, so we properly model several library routines with internal state, such as strtok() and getenv().

4. ERROR CHECKING PROBLEMS

In this section we describe the six error checking problems that we use in our study. We focus on these problems for a number of reasons. First, we target realistic errors that actually occur in programs and that could cause significant damage. Thus, we use the C standard library for our experiments because it is so widely used and controls access to almost all important system services. Second, we bring together a number of error checking problems that have only appeared in isolation in previous research. Finally, we compare different kinds of error checking problems to measure how difficult they are to detect and what analysis machinery is needed to detect them effectively. In particular, we experiment with both finite-state machine problems and information flow problems.

We define these problems using the Broadway annotation language [11, 9]. Each problem consists of a set of possible states that the analyzer assigns to objects in the store model.
For each relevant library routine, we specify how the routine affects the states of its arguments, including any objects accessed through pointers. During analysis, the compiler consults the annotations and applies the appropriate transfer function to the actual arguments at each library routine call site. Due to space limitations, we do not show the annotations here.

4.1 Detecting file access errors

It is an error to access a file through a file stream or descriptor that is not open. We can model this by defining two states to describe files: the Open state and the Closed state. To track this state, we annotate fopen(), fclose(), and related functions, and at each read or write to the file, we check the associated file's state. One complication is that file handles are pointers, so we have to model the structure carefully. In particular, we must associate the file state with the internal object (IOHandle) rather than with the FILE pointer. Our annotations even allow us to handle file descriptors correctly: we can analyze the use of fileno() and fdopen().

4.2 Tracking information flow

The concept of information flow is an important one in security. For example, can information from some private source, such as a local file, leak out to some external location, such as the network? Here, we use annotations to identify various kinds of data sources, such as files, sockets (Internet, local), and environment variables, and to associate them with specific variables. For example, we mark all file handles with the kinds of devices they access, and we mark data buffers according to the kinds of data they contain. We then analyze the program to see where the associated data goes. For example, can the data from a local file go out on an Internet socket? Our system allows us to analyze such information even when data are passed to various string and buffer manipulation routines, and we can handle difficult cases such as sprintf() and strtok().

4.3 Tracking untrusted data

Hostile clients can only manipulate programs through the various program inputs. We can approximate the extent of this control by tracking the input data and observing how it is used. Like the information flow analysis, this analysis labels data sources and sinks, and it associates the properties of the source with the data read from it. However, the trust analysis has two distinguishing features. First, data is only as trustworthy as its least trustworthy source. For example, if the program reads both trusted and untrusted data into a single buffer, then we consider the whole buffer to be untrusted. Second, untrusted data has a domino effect on other data sources and sinks. For example, if the file name argument to fopen() is untrusted, then we treat all data read from that file descriptor as untrusted. For our analysis we model three levels of trust: internal (trusted), locally trusted (for example, local files), and remote (untrusted).
We generate an error message when untrusted data reaches certain sensitive routines, including any file access or manipulation, or reaches any program-execution routine, such as exec().

4.4 Detecting format string vulnerabilities

A number of output functions in the C standard library, such as printf() and syslog(), take a format string argument that controls output formatting. A format string vulnerability (FSV) occurs when untrusted data ends up as part of the format string, and it is exploitable much like stack-smashing. These vulnerabilities are a serious security problem that has been the subject of many CERT advisories. We detect these vulnerabilities using a limited form of the trust analysis. Following the terminology introduced in the Perl programming language [22] and by Shankar et al. [18], we consider data to be tainted when it comes from an untrusted source. We track this data through the program to make sure that all format string arguments are untainted. Our system is the most accurate tool that we are aware of for detecting format string vulnerabilities [9].

4.5 Combination: FSV exploitability

FSV exploitability analysis combines taintedness analysis with trust analysis to determine whether an FSV can be exploited. For example, if the tainted data is still locally trusted (for example, from a local file), then the vulnerability is probably not remotely exploitable by hackers. Note that the domino effect of the trust analysis allows us to catch complex behavior, such as tricking a program into reading a different configuration file.

4.6 Combination: FTP-like behavior

FTP-like behavior analysis combines trust analysis and information flow to determine whether a remote client can read or write arbitrary files. For example, we can detect FTP "get" behavior by looking for cases where data from an untrusted file (one in which a remote client dictated the name of the file to be opened) is sent to a remote socket.
Similarly, FTP "put" behavior occurs when data from a remote source is written to a file chosen by a remote source. Our analysis properly identifies the write() calls in the FTP daemon that perform these two functions.

5. MEASURING THE EFFECTS OF PRECISION

In this section we study the impact of analysis precision on error checking. The study has several components. First, we measure the cost, in both time and memory, of several commonly used precision policies. By looking at a range of programs of different sizes we can evaluate the scalability of these policies. Second, we measure the accuracy of the analysis information produced by different levels of precision. Together with the cost information, these results allow us to quantify the tradeoff between cost and accuracy. In particular, we evaluate the transitive effects of pointer analysis precision on the error checking problems by varying their precision independently. Finally, we compare the precision requirements of different error checking problems by determining the cheapest precision policy for each problem that still yields the most accurate results.

Our results show the following:

- Different problems have very different precision requirements.
- Precise pointer analysis is important in many cases.
- Increasing precision often has a dramatic effect on the cost of analysis. In particular, the most precise analysis mode is not scalable.
- The last bits of accuracy are expensive to obtain because they often require context sensitivity.
- Flow-sensitivity and context-sensitivity conspire to drive up analysis time.

5.1 Precision policies

We focus on three aspects of precision: flow-sensitivity, context-sensitivity, and the representation of objects in memory. There are other dimensions to consider, but these three are commonly used to characterize analysis algorithms. In addition, we fix certain features of the analysis for all the experiments.
We always perform interprocedural analysis because our error checking problems almost always span multiple procedures. Our implementation does not support general path-sensitivity, but we can often identify the places where it would improve accuracy. We further distinguish the flow-sensitivity of the error analysis information, such as the state of a file handle, from the flow-sensitivity of the supporting pointer analysis. By measuring these two modes separately, we can determine the transitive effects of flow-sensitive pointer analysis on the error checking results. The significance of this effect is that many of the techniques for reducing the cost of precise dataflow analysis, such as the algorithm of Reps et al. [16], cannot be used for pointer analysis. We would like to know whether the high cost of precise pointer analysis is worth it. Our implementation supports the following precision policies, each of which can be turned on or off independently.

5.1.1 Flow-sensitivity

By default, our analyzer performs flow-sensitive analysis, which associates dataflow facts with specific program points. Our representation is sparse, so we only store information at the points where it changes: at defs. For example, we associate the state of a file with the call site that opens or closes it. In order to determine the state of an object at any other point, for example when the file is accessed, we must find the reaching definition. We implement flow-insensitive analysis by merging all updates to the dataflow information, regardless of where they occur. The dataflow information at any given point during the analysis is the meet of all the values seen so far, and we never need to find reaching definitions. In this mode, our pointer analysis is equivalent to Andersen's [1].

5.1.2 Context-sensitivity

During context-sensitive analysis, each call to a procedure is treated as a completely separate instantiation. The effect is equivalent to inlining all procedure calls. Context-sensitivity can cause an explosion in the cost of analysis, especially for procedures that are called in many different places. We implement context-insensitive analysis by instantiating a procedure only once and merging the information from all of its call sites. Since our analysis is interprocedural, we still visit all of the calling contexts. However, the analysis converges much more quickly, and we can often skip over a procedure call when no changes occur to the input values. The main drawback of this mode is that it suffers from the unrealizable paths problem [24], in which information from different call sites is merged together and then returned to all call sites.

5.1.3 Structure fields

Our system represents the fields of a structure as separate objects. We can turn this feature off to determine the effect of ignoring structure fields.

5.1.4 Conditional constant propagation

Along with the other analyses, our system performs constant propagation and constant folding.
This analysis is interprocedural and also conforms to the current setting of flow- and context-sensitivity. In addition, we can optionally use constant information to resolve branch conditions and to prune the control-flow graph. We use an interprocedural version of Wegman and Zadeck's [23] conditional constant propagation algorithm.

5.1.5 Multiple instance analysis

Heap objects, unlike regular variables, can correspond to multiple actual objects during the program execution (for example, allocation within a loop). We adopt the analysis from Chase et al. [3], which conservatively estimates which heap objects are single instance objects and which are multiple instance objects. We can apply strong updates to the single instance heap objects, which improves the accuracy of the analysis. This notion of multiplicity is equivalent to the concept of linearity used in type systems [8]. Turning this feature off reduces the burden on the analyzer, but forces all updates to heap objects to be weak updates.

5.1.6 Path-sensitive multiple instance analysis

Many library routines return a special value that indicates success or failure. For routines that allocate new objects, this special return value is often a null pointer if the routine failed. Figure 1 shows a code fragment that allocates some memory. Regardless of whether the allocation succeeds or fails, the memory is not allocated at the end of the fragment, indicating that this is a single instance. We can use a small amount of path-sensitivity to detect this case, increasing the number of single instance objects. Foster et al. [8] also recognized the importance of this feature.

    char *ptr;
    ptr = (char *) malloc(100);
    if (ptr) {
        compute_something(ptr);
        free(ptr);
    }

Figure 1: The malloc'd object is always a single instance.

5.2 Experimental setup

For our experiments, we systematically run all the combinations of error checking problems, programs, and precision policies.
For each run we measure the total amount of time and memory consumed, as well as several measures of accuracy. To save space, many of our tables and graphs summarize results using the following representative precision policies:

- Low: flow-insensitive, context-insensitive, does not distinguish structure fields
- Medium: flow-sensitive, context-insensitive, distinguishes structure fields
- High: flow-sensitive, context-sensitive, distinguishes structure fields

We turn on multiple instance analysis and conditional constant propagation for the flow-sensitive modes.

5.2.1 Measuring accuracy

Ideally, our measure of accuracy would compare the number of errors found with the actual number of errors in each program. We can do this for format string vulnerabilities by looking at the security advisories, which tell us the number and location of each instance of this particular error. Unfortunately, manually checking all the programs for the other five kinds of errors is prohibitively difficult. Therefore we present a new methodology for measuring the accuracy of the error analysis that does not require us to know the actual number of errors, and that allows us to compare the accuracy of different error detection problems and different precision modes. The key idea is to measure accuracy as the confidence level of each data-flow value at the program points where we test for errors. Our analysis framework is sound (up to the safe features of C), so the results of every precision mode are always a superset of the actual behavior of the program. Therefore, when we test for a particular error state, we can compute the accuracy as a function of the number of possible states at that point:

    Accuracy = (max - num) / (max - 1)

where max is the total number of states for the given problem, and num is the number of possible states determined by the analysis. This accuracy function produces a value between 0.0 and 1.0 that represents how confident the analyzer is about the state.
A flow value with only one possible state produces a value of 1.0 and indicates that the object must be in that state. A value of 0.0 indicates that any state is possible (lattice bottom). For example, the code fragment in Figure 2 produces error reports at both calls to fgets(). However, in the first case, the file could be either open or closed (accuracy = 0.0), while in the second case the file is definitely closed (accuracy = 1.0).

if (condition)
    f = fopen("my_file", "r");
fgets(buf, size, f);   /* Error; acc = 0.0 */
fclose(f);
fgets(buf, size, f);   /* Error; acc = 1.0 */

Figure 2: Both reads generate errors, but the accuracy level is different.

[Figure 3 omitted: a bar chart of accuracy at High and Low precision for the six problems: FSV, FSV Exploit, File state, Trust, Info flow, and FTP.]

Figure 3: Some error checking problems are harder than others.

In context-sensitive mode, we test potential errors separately in each possible calling context. This approach gives the user more information about the cause of an error: is the code always incorrect, or does the error only occur in a specific calling sequence? However, in order to compare the accuracy of context-sensitive and context-insensitive analysis, we have to account for this difference. Therefore, we first average the accuracy levels of all the calling contexts that reach the location of an error test, then average the accuracy over all those program locations.

Programs

Table 1 lists the input programs we test and provides details about their size and any known errors they contain. We choose these programs for a number of reasons. First, they are all real programs, taken from open-source projects, with all of the nuances and complexities of production software. Second, many of them are system tools or daemons that have significant security implications because they interact with remote clients and provide privileged services. Finally, we use security advisories to find versions of programs that are known to contain format string vulnerabilities. In addition, we also use subsequent versions in which the bugs are fixed, so that we can confirm their absence. We present several measures of program size, including the number of lines of source code, the number of lines of preprocessed code, and the number of procedures. The number of procedures is an important measure to consider for context-sensitive analysis.

Platform

We run all experiments on a Dell OptiPlex GX-400 with a Pentium 4 processor running at 1.7 GHz and 2 GB of main memory.
The machine runs the Linux kernel. Our system is implemented entirely in C++ and compiled with the GNU g++ compiler.

Results

Figure 3 summarizes the range of accuracy for the six error checking problems. For each problem we show the accuracy obtained at two levels of precision, averaged over all of the programs. At a coarse level, we can see that different problems reach different levels of accuracy, which confirms our belief that some problems are harder than others. For many of the problems, context-sensitivity is required to obtain the most accurate results. In particular, for four of the problems, using the most precise analysis noticeably improves the accuracy by 15 to 40 percent. The two format string vulnerability problems are already near perfect even for the Low precision mode.

[Figure 4 omitted: analysis time in seconds per program at Low, Medium, and High precision.]

Figure 4: Full context-sensitivity is orders of magnitude more expensive.

Figure 4 shows the amount of time required for the Low, Medium, and High levels of precision on our benchmark programs, averaged over all of the problems. We note that the standard deviations are low. Each point on the x-axis represents one program, as numbered in Table 1. We can see from Figures 3 and 4 that obtaining the best accuracy requires orders of magnitude more time.

[Figure 5 omitted: accuracy improvement over the Low mode for the Low, FI Ptrs, and Medium policies, each with variants "no MIA", "no fields", "Cond", and "FS MIA".]

Figure 5: Improvement in accuracy of different precision modes relative to the Low precision mode, averaged over all problems.

Figure 5 focuses on the gap between the Low precision mode and the High precision mode. It shows the relative improvement in accuracy for each of the possible precision policies over the lowest precision, averaged across all programs and problems.
We include an intermediate precision policy, labeled FI Ptrs, in which the error properties are flow-sensitive but the pointer analysis is not. Flow-sensitive pointer analysis is more expensive, but it does improve the results. We also find that ignoring structure fields seriously impairs accurate analysis. Finally, path-sensitive multiple instance analysis only helps at the higher levels of precision.

 #  Program                  Lines (C)  Preprocessed
 1  stunnel 3.8              2K         13K
 2  muh 2.05c                5K         25K
 3  muh 2.05d                5K         25K
 4  snmpd (cmu-snmp-3.4)     17K        41K
 5  crond (fcron-2.9.3)      9K         40K
 6  cfengine                 -          350K
 7  ssh (openssh 3.5p1)      38K        210K
 8  wu-ftpd                  -          64K
 9  wu-ftpd                  -          66K
10  named (BIND 4.9.4)       26K        84K
11  apache (core)            30K        67K
12  sshd (openssh 3.5p1)     50K        299K
13  make                     -          50K
14  lpd (LPRng)              38K        150K

[Columns lost in transcription: procedure counts, analysis times (Low, Med, High, and Adapt min & max), and format string vulnerabilities (known, found, false positives).]

Table 1: Properties of the input programs. The size is given as both lines of C (left column) and preprocessed lines of C (right). The analysis times for the fixed-precision policies are averages over all error checking problems for a given program; these averages have a small standard deviation. The analysis times for the adaptive algorithm vary greatly with the problem, so for that algorithm we give both minimum and maximum values. Running times marked as not completed exceeded 28K seconds. The FSV identified in apache is not an exploitable one. Our analysis time for apache uses an older and considerably slower version (sometimes an order of magnitude slower) of our adaptive algorithm; at the time of submission we did not have time to rerun the experiment.

6. ADAPTIVE ALGORITHM

We now present a new algorithm for dataflow analysis that automatically adapts its precision to improve accuracy. Our study in Section 5 demonstrates the tradeoff between precision and performance: full precision is too costly, while low precision is insufficient for many problems. In particular, the high cost of precise pointer analysis is a significant obstacle. Therefore, a unique feature of our adaptive algorithm is that the precision of the pointer analysis is driven by the needs of the error detection problems. One possible way to provide adaptivity is to choose the best fixed-precision policy for each error detection problem. However, we can do much better than this by exploiting the following observation: we only need to add precision to the parts of the program that affect the results.
For example, when tracking the state of file handles, we only need flow-sensitivity for the file handle objects themselves; precise analysis of other objects, such as data buffers, does not improve the results. Previous work has also acknowledged the benefit of judiciously adding context-sensitivity. Shankar et al. [18] substantially improve their analysis by manually adding polymorphism (context-sensitivity) to certain hotspots in the application code. Das et al. [5] manually clone two small procedures in gcc to make verification possible. Rather than view these cases as exceptions, our goal is to identify them automatically and to add flow-sensitivity or context-sensitivity as necessary. The key insight is that we know exactly where more accuracy is needed: any point in the program where the analyzer reports an error with a low confidence value. The algorithm works backward from those points to identify the sources of the information loss. The benefit of this approach is that it drastically reduces the amount of precision needed, and it practically guarantees better accuracy. We show that this algorithm finds the right amount of precision for each problem and each program. For easy problems, it adds very little precision and produces an answer quickly. For hard problems, it adds more precision and requires more time and space. However, even for the hardest problems the adaptive algorithm is not as expensive as the full-precision algorithm, because it still finds parts of the program for which increased precision is not needed.

6.1 Algorithm

Our algorithm is iterative because it drives adaptivity by examining the analysis results. It starts by analyzing the program at the lowest possible precision, and based on the results it proposes updates to the precision policy. This process repeats until no new precision is added, or until overall accuracy stops improving.
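The iteration just described can be sketched as a driver loop. The pass results below are simulated stand-ins, and all names are ours rather than the system's actual API:

```c
typedef struct {
    double accuracy;   /* averaged over all error-check points */
    int    updates;    /* number of precision additions proposed */
} PassResult;

/* Simulated passes standing in for real monitored analysis runs
 * (illustrative numbers only). */
static PassResult passes[] = {
    {0.40, 12}, {0.70, 4}, {0.90, 1}, {0.90, 1}
};
static int pass_no = 0;

static PassResult analyze_and_propose(void) { return passes[pass_no++]; }

/* Run analysis passes until no new precision is added or accuracy stops
 * improving; returns the number of passes performed. */
int adaptive_analysis(void) {
    double prev = -1.0;
    int n = 0;
    for (;;) {
        PassResult r = analyze_and_propose();
        n++;
        if (r.updates == 0 || r.accuracy <= prev)
            break;              /* converged: stop adding precision */
        prev = r.accuracy;
    }
    return n;
}
```

With the simulated data above, the loop performs three productive passes and stops on the fourth, whose accuracy fails to improve.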
The algorithm has two components: a monitor that detects and tracks loss of information during program analysis, and a controller that analyzes the information collected by the monitor. During program analysis, the monitor identifies the places where accuracy is lost and then tracks the objects that are subsequently affected. This information is captured in a dependence graph. When the program analysis is complete, the controller looks at the objects whose values are tested (as part of the error reporting) and traces them back through the dependence graph. The nodes and edges that make up this back-trace indicate which variables and procedures need more precision. Low-accuracy analysis information can occur both in the error detection analysis and in the pointer analysis. In the error detection analysis, we compute accuracy by counting the number of possible states in the flow value. Any value greater than one represents some uncertainty about the analysis. We can use the same notion of accuracy for the pointer analysis: any pointer with multiple possible targets has lost accuracy.

6.2 Program analysis monitor

Our monitor discovers assignments that destroy accuracy, and it then tracks the resulting bad information through the program. We divide these assignments into two categories: (1) destructive assignments, where the loss of accuracy is a direct consequence of the lack of precision, and (2) complicit assignments, in which bad information produced earlier is passed from one object to another. The monitor builds a dependence graph in which each node is a variable (or other store location). A node has an associated list of destructive assignments that describes places where the variable lost information. We add a directed edge for each complicit assignment, from the lhs variable to the rhs variable.
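The dependence graph can be sketched as a simple data structure; all names here are ours, not the monitor's actual implementation:

```c
#include <stdlib.h>

/* Sketch of the monitor's dependence graph. Each node is a store
 * location carrying the destructive assignments observed there;
 * complicit assignments become directed edges from the modified (lhs)
 * variable to the (rhs) variable that supplied the bad value. */
struct Destructive {
    int site;                    /* program location of the merge */
    struct Destructive *next;
};

struct Node {
    const char *name;            /* variable or heap object */
    struct Destructive *losses;  /* destructive assignments at this node */
    struct Node **deps;          /* complicit edges: lhs -> rhs */
    int n_deps;
};

/* Record a complicit assignment "lhs = ... rhs ..." as an edge lhs -> rhs. */
void add_complicit(struct Node *lhs, struct Node *rhs) {
    lhs->deps = realloc(lhs->deps, (lhs->n_deps + 1) * sizeof *lhs->deps);
    lhs->deps[lhs->n_deps++] = rhs;
}
```

Keeping destructive assignments on the node and complicit assignments as edges matches the controller's later use: edges are followed backward, and the destructive lists at the reached nodes name the merges that need more precision.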
Although we record the program locations of each assignment, the dependence graph is flow-insensitive, because we need the monitor to be significantly cheaper than the main analysis algorithm.

Destructive assignments

Destructive assignments are the source of all inaccuracy in the analysis. They occur when the analyzer is forced to merge information, either by the precision policy or by the conservative nature of the analysis. We record a destructive assignment when the result of the merge is strictly worse than any of the values before merging.
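The "strictly worse" test can be sketched over a bitmask of abstract states (our illustration, not the analyzer's code): since a merge result is always a superset of each input, it is strictly worse than a given input exactly when it differs from that input.

```c
typedef unsigned StateSet;    /* one bit per abstract state */

/* merged is the union of inputs[0..n-1]; report a destructive assignment
 * when merged is strictly worse than (i.e., differs from) every input. */
int is_destructive(StateSet merged, const StateSet *inputs, int n) {
    for (int i = 0; i < n; i++)
        if (merged == inputs[i])  /* no information lost vs. this input */
            return 0;
    return 1;
}
```

For example, merging {open} with {closed} yields {open, closed}, which differs from both inputs and is therefore destructive; merging {open, closed} with {open} yields {open, closed}, which equals the first input and is not.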

We classify these assignments according to where they occur:

Flow-insensitive assignment. When a variable is flow-insensitive, the analyzer merges all updates to the variable.

Context-insensitive parameter passing. When a procedure is context-insensitive, we merge the information from all of its call sites. For each formal parameter, the analyzer merges the values of all the possible actual parameters.

Control-flow merge. Our analyzer uses a representation similar to SSA form, which includes φ-functions to merge information from different control-flow paths.

Multiple instance objects. Like flow-insensitive variables, the analyzer merges all updates to multiple instance objects.

Figure 6 shows a small code fragment with three potential destructive assignments. The first one is a parameter-passing destructive assignment, and it occurs if the procedure is called with different input strings. The second is a flow-insensitive update that occurs when we append the contents of the input string onto the message buffer. The third is a control-flow merge that merges the two possible pointer targets of p. We associate each destructive assignment with the variable that it affects.

void my_func(char * input)        /* Destructive */
{
    char * p, msg[200];

    strcpy(msg, "Message: ");
    strcat(msg, input);           /* Destructive and complicit */

    if (cond)
        p = msg;
    else
        p = "(empty)";            /* Destructive */

    printf(p);                    /* Complicit */
}

Figure 6: Procedure my_func has three potential destructive assignments: at the formal parameter, the string concatenation, and the control-flow merge.

Complicit assignments

Complicit assignments convey bad information from variable to variable, and to other parts of the program. An assignment is complicit when it is not destructive but still results in low accuracy. We use complicit assignments to trace bad information back to its source.
To do this, we record a complicit assignment as an edge in the dependence graph from the variable that is modified to the variable that causes the inaccuracy. Complicit assignments occur in two ways:

Simple assignment. In the simplest case, a complicit assignment passes bad information from the right-hand side to the left-hand side. In these cases we add an edge from the variable on the left back to the variable on the right.

Assignment through a pointer. Since our pointer analysis allows pointer variables to have multiple targets, our analyzer has to merge the information from those targets whenever the pointer is dereferenced. However, unlike the destructive assignments, when these merges lose information it is because the pointer variable is not accurate enough. Therefore we add edges from the affected variables on the left to the pointer variable.

[Figure 7 omitted: a dependence graph over the nodes format, p, msg[], *input, and input.]

Figure 7: The dependence graph for Figure 6 captures which variables are responsible for the accuracy of the format string.

Figure 6 shows two complicit assignments. In the first, any inaccuracy present in the input string is passed on to the message buffer. In the second case, the call to printf() dereferences a pointer with two targets. If the resulting objects differ in their data-flow state, we hold the pointer p responsible. Figure 7 shows the resulting dependence graph. The format string of the printf() call depends on both the pointer p and the contents of the msg string. The value of the msg string depends on the input string, which in turn depends on the input parameter.

6.3 Controller

Once an analysis pass is complete, the controller uses the information collected by the monitor to determine where to add precision. The algorithm uses graph reachability to compute a subgraph of the dependence graph whose nodes and edges track imprecise information from the error reports back to their sources. This subgraph tends to be significantly smaller than the graph as a whole.
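The controller's subgraph computation is ordinary backward reachability over the complicit edges. A minimal sketch, using an array-based graph with names of our own choosing:

```c
#define MAX_NODES 16

/* dep[i][j] != 0 means node i's value depends on node j (a complicit
 * edge recorded by the monitor). */
static int dep[MAX_NODES][MAX_NODES];

/* Mark every node reachable backward from a low-accuracy error check;
 * the marked set is the subgraph whose destructive assignments the
 * controller will consider when adding precision. */
static void mark_sources(int node, int *marked) {
    if (marked[node]) return;
    marked[node] = 1;
    for (int j = 0; j < MAX_NODES; j++)
        if (dep[node][j])
            mark_sources(j, marked);
}
```

Starting from the node tested by the error check (for Figure 6, the printf() format string), the traversal reaches p, msg, and the input parameter, exactly the variables the next pass should analyze more precisely.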
The controller starts at the nodes that correspond to the objects directly tested by the error checks and works backwards to find the destructive assignments. However, it only considers the objects with low accuracy. We analyze the subgraph to produce a plan for the next iteration of program analysis. For each object in the subgraph, we add precision according to the type of destructive assignment and the location where it occurs:

Flow-insensitive assignment. We make the object flow-sensitive for the next pass. In addition, we make all the objects between this object and the error checks in the graph flow-sensitive, since the goal is to maintain accuracy through the whole chain of assignments.

Context-insensitive parameter passing. We make the procedure context-sensitive for the next pass. In addition, we make all of its descendants in the call graph context-sensitive. If we leave any descendants context-insensitive, then unrealizable paths destroy much of the benefit of context-sensitivity.

Control-flow merge. Ideally, we would apply some form of path-sensitivity to solve this problem. Unfortunately, our framework does not currently support path-sensitivity.

Multiple instance objects. There are several possibilities for disambiguating heap objects, but we have not yet implemented any. We could make the allocating procedure context-sensitive, which would cause it to produce more objects in the memory model.

Information stored in the dependence graph is not flow-sensitive, which can cause the controller to add more precision than is necessary. For example, some of the destructive assignments may occur


More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2017 Lecture 3a Andrew Tolmach Portland State University 1994-2017 Binding, Scope, Storage Part of being a high-level language is letting the programmer name things: variables

More information

A Sparse Algorithm for Predicated Global Value Numbering

A Sparse Algorithm for Predicated Global Value Numbering Sparse Predicated Global Value Numbering A Sparse Algorithm for Predicated Global Value Numbering Karthik Gargi Hewlett-Packard India Software Operation PLDI 02 Monday 17 June 2002 1. Introduction 2. Brute

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/22891 holds various files of this Leiden University dissertation Author: Gouw, Stijn de Title: Combining monitoring with run-time assertion checking Issue

More information

One-Slide Summary. Lecture Outline. Language Security

One-Slide Summary. Lecture Outline. Language Security Language Security Or: bringing a knife to a gun fight #1 One-Slide Summary A language s design principles and features have a strong influence on the security of programs written in that language. C s

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 2 Thomas Wies New York University Review Last week Programming Languages Overview Syntax and Semantics Grammars and Regular Expressions High-level

More information

Static Analysis methods and tools An industrial study. Pär Emanuelsson Ericsson AB and LiU Prof Ulf Nilsson LiU

Static Analysis methods and tools An industrial study. Pär Emanuelsson Ericsson AB and LiU Prof Ulf Nilsson LiU Static Analysis methods and tools An industrial study Pär Emanuelsson Ericsson AB and LiU Prof Ulf Nilsson LiU Outline Why static analysis What is it Underlying technology Some tools (Coverity, KlocWork,

More information

Module: Future of Secure Programming

Module: Future of Secure Programming Module: Future of Secure Programming Professor Trent Jaeger Penn State University Systems and Internet Infrastructure Security Laboratory (SIIS) 1 Programmer s Little Survey Problem What does program for

More information

Types and Type Inference

Types and Type Inference CS 242 2012 Types and Type Inference Notes modified from John Mitchell and Kathleen Fisher Reading: Concepts in Programming Languages, Revised Chapter 6 - handout on Web!! Outline General discussion of

More information

Cost Effective Dynamic Program Slicing

Cost Effective Dynamic Program Slicing Cost Effective Dynamic Program Slicing Xiangyu Zhang Rajiv Gupta Department of Computer Science The University of Arizona Tucson, Arizona 87 {xyzhang,gupta}@cs.arizona.edu ABSTRACT Although dynamic program

More information

Program Static Analysis. Overview

Program Static Analysis. Overview Program Static Analysis Overview Program static analysis Abstract interpretation Data flow analysis Intra-procedural Inter-procedural 2 1 What is static analysis? The analysis to understand computer software

More information

Alias Analysis. Advanced Topics. What is pointer analysis? Last Time

Alias Analysis. Advanced Topics. What is pointer analysis? Last Time Advanced Topics Last Time Experimental Methodology Today What s a managed language? Alias Analysis - dealing with pointers Focus on statically typed managed languages Method invocation resolution Alias

More information

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers These notes are not yet complete. 1 Interprocedural analysis Some analyses are not sufficiently

More information

Lectures 20, 21: Axiomatic Semantics

Lectures 20, 21: Axiomatic Semantics Lectures 20, 21: Axiomatic Semantics Polyvios Pratikakis Computer Science Department, University of Crete Type Systems and Static Analysis Based on slides by George Necula Pratikakis (CSD) Axiomatic Semantics

More information

Stephen McLaughlin. From Uncertainty to Belief: Inferring the Specification Within

Stephen McLaughlin. From Uncertainty to Belief: Inferring the Specification Within From Uncertainty to Belief: Inferring the Specification Within Overview Area: Program analysis and error checking / program specification Problem: Tools lack adequate specification. Good specifications

More information

Checking and Inferring Local Non-Aliasing. UC Berkeley UC Berkeley

Checking and Inferring Local Non-Aliasing. UC Berkeley UC Berkeley Checking and Inferring Local Non-Aliasing Alex Aiken UC Berkeley Jeffrey S. Foster UMD College Park John Kodumal Tachio Terauchi UC Berkeley UC Berkeley Introduction Aliasing: A long-standing problem Pointers

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2016 Lecture 3a Andrew Tolmach Portland State University 1994-2016 Formal Semantics Goal: rigorous and unambiguous definition in terms of a wellunderstood formalism (e.g.

More information

Lecture Notes on Common Subexpression Elimination

Lecture Notes on Common Subexpression Elimination Lecture Notes on Common Subexpression Elimination 15-411: Compiler Design Frank Pfenning Lecture 18 October 29, 2015 1 Introduction Copy propagation allows us to have optimizations with this form: l :

More information

CMPSC 497: Static Analysis

CMPSC 497: Static Analysis CMPSC 497: Static Analysis Trent Jaeger Systems and Internet Infrastructure Security (SIIS) Lab Computer Science and Engineering Department Pennsylvania State University Page 1 Our Goal In this course,

More information

AUTOMATIC VULNERABILITY DETECTION USING STATIC SOURCE CODE ANALYSIS ALEXANDER IVANOV SOTIROV A THESIS

AUTOMATIC VULNERABILITY DETECTION USING STATIC SOURCE CODE ANALYSIS ALEXANDER IVANOV SOTIROV A THESIS AUTOMATIC VULNERABILITY DETECTION USING STATIC SOURCE CODE ANALYSIS by ALEXANDER IVANOV SOTIROV A THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in the

More information

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Projects 1 Information flow analysis for mobile applications 2 2 Machine-learning-guide typestate analysis for UAF vulnerabilities 3 3 Preventing

More information

Interprocedural Analysis. CS252r Fall 2015

Interprocedural Analysis. CS252r Fall 2015 Interprocedural Analysis CS252r Fall 2015 Procedures So far looked at intraprocedural analysis: analyzing a single procedure Interprocedural analysis uses calling relationships among procedures Enables

More information

Static Analysis Basics II

Static Analysis Basics II Systems and Internet Infrastructure Security Network and Security Research Center Department of Computer Science and Engineering Pennsylvania State University, University Park PA Static Analysis Basics

More information

Static Analysis in Practice

Static Analysis in Practice in Practice 15-313: Foundations of Software Engineering Jonathan Aldrich 1 Outline: in Practice Case study: Analysis at ebay Case study: Analysis at Microsoft Analysis Results and Process Example: Standard

More information

CSE 565 Computer Security Fall 2018

CSE 565 Computer Security Fall 2018 CSE 565 Computer Security Fall 2018 Lecture 14: Software Security Department of Computer Science and Engineering University at Buffalo 1 Software Security Exploiting software vulnerabilities is paramount

More information

Counterexample Guided Abstraction Refinement in Blast

Counterexample Guided Abstraction Refinement in Blast Counterexample Guided Abstraction Refinement in Blast Reading: Checking Memory Safety with Blast 17-654/17-754 Analysis of Software Artifacts Jonathan Aldrich 1 How would you analyze this? * means something

More information

Maintaining Mutual Consistency for Cached Web Objects

Maintaining Mutual Consistency for Cached Web Objects Maintaining Mutual Consistency for Cached Web Objects Bhuvan Urgaonkar, Anoop George Ninan, Mohammad Salimullah Raunak Prashant Shenoy and Krithi Ramamritham Department of Computer Science, University

More information

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking Agenda COMP 181 Type checking October 21, 2009 Next week OOPSLA: Object-oriented Programming Systems Languages and Applications One of the top PL conferences Monday (Oct 26 th ) In-class midterm Review

More information

Combining Analyses, Combining Optimizations - Summary

Combining Analyses, Combining Optimizations - Summary Combining Analyses, Combining Optimizations - Summary 1. INTRODUCTION Cliff Click s thesis Combining Analysis, Combining Optimizations [Click and Cooper 1995] uses a structurally different intermediate

More information

Buffer Overflows Defending against arbitrary code insertion and execution

Buffer Overflows Defending against arbitrary code insertion and execution www.harmonysecurity.com info@harmonysecurity.com Buffer Overflows Defending against arbitrary code insertion and execution By Stephen Fewer Contents 1 Introduction 2 1.1 Where does the problem lie? 2 1.1.1

More information

Garbage Collection (2) Advanced Operating Systems Lecture 9

Garbage Collection (2) Advanced Operating Systems Lecture 9 Garbage Collection (2) Advanced Operating Systems Lecture 9 Lecture Outline Garbage collection Generational algorithms Incremental algorithms Real-time garbage collection Practical factors 2 Object Lifetimes

More information

QUIZ. What is wrong with this code that uses default arguments?

QUIZ. What is wrong with this code that uses default arguments? QUIZ What is wrong with this code that uses default arguments? Solution The value of the default argument should be placed in either declaration or definition, not both! QUIZ What is wrong with this code

More information

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1 CSE P 0 Compilers SSA Hal Perkins Spring 0 UW CSE P 0 Spring 0 V- Agenda Overview of SSA IR Constructing SSA graphs Sample of SSA-based optimizations Converting back from SSA form Sources: Appel ch., also

More information

Analyzing Systems. Steven M. Bellovin November 26,

Analyzing Systems. Steven M. Bellovin November 26, Analyzing Systems When presented with a system, how do you know it s secure? Often, you re called upon to analyze a system you didn t design application architects and programmers build it; security people

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I Solutions

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I Solutions Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.033 Computer Systems Engineering: Spring 2011 Quiz I Solutions There are 10 questions and 12 pages in this

More information

Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras

Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 01 Lecture - 03 From Programs to Processes Hello. In

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from

More information

On Reasoning about Finite Sets in Software Checking

On Reasoning about Finite Sets in Software Checking On Reasoning about Finite Sets in Software Model Checking Pavel Shved Institute for System Programming, RAS SYRCoSE 2 June 2010 Static Program Verification Static Verification checking programs against

More information

Optimizing for Bugs Fixed

Optimizing for Bugs Fixed Optimizing for Bugs Fixed The Design Principles behind the Clang Static Analyzer Anna Zaks, Manager of Program Analysis Team @ Apple What is This Talk About? LLVM/clang project Overview of the Clang Static

More information

Introduction to Programming Using Java (98-388)

Introduction to Programming Using Java (98-388) Introduction to Programming Using Java (98-388) Understand Java fundamentals Describe the use of main in a Java application Signature of main, why it is static; how to consume an instance of your own class;

More information

Alternatives for semantic processing

Alternatives for semantic processing Semantic Processing Copyright c 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies

More information

Language Security. Lecture 40

Language Security. Lecture 40 Language Security Lecture 40 (from notes by G. Necula) Prof. Hilfinger CS 164 Lecture 40 1 Lecture Outline Beyond compilers Looking at other issues in programming language design and tools C Arrays Exploiting

More information

A classic tool: slicing. CSE503: Software Engineering. Slicing, dicing, chopping. Basic ideas. Weiser s approach. Example

A classic tool: slicing. CSE503: Software Engineering. Slicing, dicing, chopping. Basic ideas. Weiser s approach. Example A classic tool: slicing CSE503: Software Engineering David Notkin University of Washington Computer Science & Engineering Spring 2006 Of interest by itself And for the underlying representations Originally,

More information

Splint Pre-History. Security Flaws. (A Somewhat Self-Indulgent) Splint Retrospective. (Almost) Everyone Hates Specifications.

Splint Pre-History. Security Flaws. (A Somewhat Self-Indulgent) Splint Retrospective. (Almost) Everyone Hates Specifications. (A Somewhat Self-Indulgent) Splint Retrospective Splint Pre-History Pre-history 1973: Steve Ziles algebraic specification of set 1975: John Guttag s PhD thesis: algebraic specifications for abstract datatypes

More information

Run-Time Environments/Garbage Collection

Run-Time Environments/Garbage Collection Run-Time Environments/Garbage Collection Department of Computer Science, Faculty of ICT January 5, 2014 Introduction Compilers need to be aware of the run-time environment in which their compiled programs

More information

Visualizing Type Qualifier Inference with Eclipse

Visualizing Type Qualifier Inference with Eclipse Visualizing Type Qualifier Inference with Eclipse David Greenfieldboyce Jeffrey S. Foster University of Maryland, College Park dgreenfi,jfoster @cs.umd.edu Abstract Type qualifiers are a lightweight, practical

More information

Introduction to Optimization Local Value Numbering

Introduction to Optimization Local Value Numbering COMP 506 Rice University Spring 2018 Introduction to Optimization Local Value Numbering source IR IR target code Front End Optimizer Back End code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights

More information

Verifying the Safety of Security-Critical Applications

Verifying the Safety of Security-Critical Applications Verifying the Safety of Security-Critical Applications Thomas Dillig Stanford University Thomas Dillig 1 of 31 Why Program Verification? Reliability and security of software is a huge problem. Thomas Dillig

More information

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis Synthesis of output program (back-end) Intermediate Code Generation Optimization Before and after generating machine

More information

Refinement-Based Context-Sensitive Points-To Analysis for Java

Refinement-Based Context-Sensitive Points-To Analysis for Java Refinement-Based Context-Sensitive Points-To Analysis for Java Manu Sridharan, Rastislav Bodík UC Berkeley PLDI 2006 1 What Does Refinement Buy You? Increased scalability: enable new clients Memory: orders

More information