Program Sifting: Select Property-related Functions for Language-based Static Analysis


2009 16th Asia-Pacific Software Engineering Conference

YU Kai, WANG Cong, CHEN Yin-li and LIN Meng-xiang
State Key Laboratory of Software Development Environment, Beihang University, Beijing, P.R. China

Abstract

Recent studies have demonstrated that language-based static analysis is capable of finding hundreds of bugs in complex real systems. Such static analysis allows users to specify properties in a specification language on demand. Paths in control flow graphs are explored exhaustively against user-defined properties. To avoid the potential path explosion problem, many techniques, such as summaries, have been used in practice. In this paper, we investigate how to simplify programs under check by utilizing user-specified properties. From our observations, most functions under check are irrelevant to the given properties, and checking those functions is time-consuming. A program sifting approach is proposed to select the functions related to the properties. Sifters are derived automatically from user-specified properties. Functions matched or affected by sifters are preserved while the others are safely removed. We implemented a tool, SIFT, and carried out experiments. Results show that SIFT simplifies the programs under check considerably at small cost: in our experiments, 85% of the functions in the checked files are sifted out and 89% of the analysis time is saved on average.

Keywords: program sifting; language-based static analysis; error detection; program simplification

I. INTRODUCTION

How to improve the quality of software by reducing the number of bugs remains a pressing challenge to the research community. Static analysis has the power to detect errors, such as illegal memory allocation [1] and locking errors [2], without executing programs. However, a static analyzer is generally designed for a specific set of errors, which limits its ability to find other kinds of defects.
Recent years have seen many advances in checking programs with respect to user-defined properties [3][4][5]. Such approaches, named language-based static analysis, allow users to specify properties (known as rules or checkers) in a specification language on demand. For example, programmers can write more than fifty different kinds of checkers in metal [6] to detect hundreds of bugs in complex real systems. An analysis engine implementing language-based static analysis checks all reachable control paths in control flow graphs (CFGs). In each control path, source code is analyzed in execution order to identify actions that are relevant to a user-defined checker. Potential errors are reported when code actions of interest are found. To make the analysis applicable to large software with millions of lines of code, many techniques have been developed, such as summaries [1] and false path pruning [4].

However, the information in user-specified checkers is rarely used to reduce the cost of analysis. From our observations, most functions under check are irrelevant to the given properties. Analyzing them is unnecessary and time-consuming. The issue we address here is how to simplify the program under check for language-based static analysis. Program simplification should be complete and sound. Completeness guarantees that it will not miss real errors that would be found without simplification. Soundness means that it will not introduce new false alarms. Formally, they are stated as the following two requests:

Request 1. Let $R(P_C)$ stand for the real errors reported for a program $P$ against a checker $C$. A program simplification is complete if and only if, for any given $C$, $R(P_C) = R(P'_C)$. Here $P'$ is the residual program of $P$ after simplification.

Request 2. Let $F(P_C)$ stand for the false alarms (incorrect error reports) reported for a program $P$ against a checker $C$.
A program simplification is sound if and only if, for any given $C$, $F(P'_C) \subseteq F(P_C)$. Here $P'$ is the residual program of $P$ after simplification.

In this paper, we propose a program sifting approach to extract the program parts related to given checkers. It works as a sieve that separates program parts unrelated to checkers from property-related functions. It includes three main steps, as illustrated in Figure 1: generating sifters from a given checker, filtering programs under check by sifters, and constructing residual programs. Functions matched by sifters are referred to as seed functions. Functions affected by seed functions in the call graph constitute residual programs. Residual programs preserve the functions related to the given checker while irrelevant functions are removed.

Figure 1. A program sifting approach

Contributions of this paper are as follows:

1) We propose a program sifting approach for language-based static analysis to reduce the cost of analysis. The information in user-defined properties is used to filter out functions irrelevant to given checkers.

2) We prove the completeness and soundness of our program sifting approach with respect to the formalized requests for program simplification.

3) We have implemented a tool, SIFT, and carried out experiments. Results show that SIFT simplifies checked programs considerably at small cost. In our experiments, 85% of the functions in the checked files are sifted out while 89% of the analysis time is saved on average.

The rest of this paper is laid out as follows. Section II gives the background of language-based static analysis. Section III explains our approach in detail. Section IV shows the experimental results. Related work is given in Section V and Section VI concludes the paper.

II. BACKGROUND

Our checkers are written in a language similar to metal. A checker is essentially a finite state machine (FSM). For example, the NULL checker in Figure 2 describes temporal properties of dynamic memory allocation. The initial state is START, where the analysis starts, while the error state WRONG indicates the occurrence of an error. Each transition is a triple which consists of a source state, a target state and a pattern.

    initial state START
    state ALLOCATED
    state CHECKED
    error state WRONG
    pattern allocation %OR{
        %X = malloc(%);
        %X = dev_alloc_skb(%);
        %X = kmalloc(%,%);
        %X = scsi_register(%,%)
    };
    pattern dereference %OR{
        *%X = %;
        % = %X->%;
        __constant_c_memset(%X,%,%);
        % = __constant_c_memset(%X,%,%)
    };
    pattern check %OR{ %X == 0; %X == NULL };
    transition START ALLOCATED allocation;
    transition ALLOCATED WRONG dereference;
    transition ALLOCATED CHECKED check;
    transition CHECKED START dereference;

Figure 2. A simplified NULL checker

     1: char *foo(int size)
     2: {
     3:     char *result;
     4:
     5:     if(size > 0)
     6:         result = malloc(size);
     7:     if(size == 1)
     8:         return NULL;
     9:     *result = 0;
    10:     return result;
    11: }

Figure 3. An example function
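To make the state-machine reading of Figure 2 concrete, the following minimal Python sketch replays one FSM instance along a single control path. It is an illustration under simplifying assumptions, not the analysis engine's implementation: real pattern matching is replaced by a pre-computed list of pattern names, only one tracked variable is considered, and the names `TRANSITIONS`, `ERROR_STATES` and `run_path` are hypothetical.

```python
# Transitions of the NULL checker in Figure 2, keyed by
# (source state, pattern name) and mapping to the target state.
TRANSITIONS = {
    ("START", "allocation"): "ALLOCATED",
    ("ALLOCATED", "dereference"): "WRONG",
    ("ALLOCATED", "check"): "CHECKED",
    ("CHECKED", "dereference"): "START",
}
ERROR_STATES = {"WRONG"}


def run_path(actions, state="START"):
    """Replay the pattern names matched along one control path.

    `actions` is assumed to be the ordered list of pattern names that the
    statements on the path match (the matching itself is not modelled).
    Returns the list of states visited, stopping at the first error state.
    """
    trace = [state]
    for action in actions:
        # A statement that triggers no transition leaves the state unchanged.
        state = TRANSITIONS.get((state, action), state)
        trace.append(state)
        if state in ERROR_STATES:
            break
    return trace


if __name__ == "__main__":
    # Path 5, 6, 7, 9, 10 of Figure 3: line 6 allocates, line 9 dereferences.
    print(run_path(["allocation", "dereference"]))
    # A path that tests the pointer before dereferencing it.
    print(run_path(["allocation", "check", "dereference"]))
```

For path 5, 6, 7, 9, 10 of Figure 3, line 6 matches `allocation` and line 9 matches `dereference`, so the first call ends in WRONG. In the second call the pointer is checked before it is dereferenced, and the instance returns to START without reaching an error state.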
During static analysis, patterns are used to match variables, expressions or argument lists in order to identify actions in the source code. The state changes to the target state only if an FSM instance is in the source state and the following statement matches the pattern. Reaching an error state indicates a potential bug in the program under check.

Consider the piece of code in Figure 3, edited from [1]. The function allocates a memory block and initializes it. There are three errors in this function: a memory leak (path 5, 6, 7, 8), an invalid pointer dereference (path 5, 7, 9, 10) and a NULL pointer dereference (path 5, 6, 7, 9, 10). To detect violations against the NULL checker, all control flow paths (Figure 4 (a)) through the function foo() have to be traversed. For example, for path 5, 6, 7, 9, 10, since the pattern allocation (%X = malloc(%);) matches the statement in line 6, an FSM instance M is created according to transition START ALLOCATED allocation and moves to state ALLOCATED. Subsequently, M reaches state WRONG since the dereference of the pointer result in line 9 is identified by the pattern dereference (*%X = %;). An error is then reported, which reveals a potential NULL pointer dereference.

III. PROGRAM SIFTING

A. Methodology Overview

In practice, the number of control flow paths through a function is often quite large, especially in the presence of function calls. Traversing all these paths leads to path explosion. Therefore, summaries [1][4] are usually combined with static analysis to decrease the number of paths explored. However, from our observations, quite a few functions are irrelevant to the checked properties, and traversing paths in them is unnecessary and time-consuming. Removing those irrelevant functions before static analysis can improve efficiency.

A program simplification approach with respect to user-defined properties is proposed in this paper. Intuitively, functions under analysis that contain statements relevant to properties should be preserved. During the analysis against a given checker, an error state can only be reached through transitions that start from some initial state. Functions containing statements that trigger a transition starting from an initial state are referred to as seed functions, while those containing statements that trigger other transitions are called trigger functions. To identify seed functions and trigger functions, the list of statements in each function is scanned instead of all control paths. Statements of interest are recognized as shown in Figure 4 (b). After the identification, a trigger function is analyzed only when it is directly or indirectly invoked by a seed function.

Figure 4. CFG and the list of statements of the example in Figure 3

Figure 5. An example call graph

For ease of understanding and clarity, we formalize our approach as follows:

Definition 1. A checker is an FSM $C = \langle Q, \Sigma, \delta, Q_0, F \rangle$ which consists of a set $Q$ of states, a set $\Sigma$ of patterns, a set of transitions $\delta \subseteq Q \times Q \times \Sigma$, a set of initial states $Q_0 \subseteq Q$ and a set of error states $F \subseteq Q$. A transition starting from an initial state is referred to as an initial transition $t^i_C$, while any other transition is a succeeding transition, denoted $t^s_C$.

In the NULL checker shown in Figure 2, the initial transition is transition START ALLOCATED allocation, and the succeeding transitions are transition ALLOCATED WRONG dereference, transition ALLOCATED CHECKED check and transition CHECKED START dereference.

Definition 2. Given a checker $C = \langle Q, \Sigma, \delta, Q_0, F \rangle$, a sifter $\sigma_C$ is a pattern in $\Sigma$ which triggers an initial transition $t^i_C$.

Definition 3. A seed function $f^s_{P_C}$ is a function in program $P$ which contains at least one statement that is matched by a sifter $\sigma_C$.

Definition 4. A trigger function $f^t_{P_C}$ is a function in program $P$ which contains at least one statement that triggers a succeeding transition $t^s_C$.
Definition 5. A relevant callee $f^r_{P_C}$ is a function in the call graph $G_P$ of program $P$ such that:
1) $f^r_{P_C}$ is invoked directly or indirectly by a seed function $f^s_{P_C}$.
2) $f^r_{P_C}$ is a trigger function $f^t_{P_C}$ or an ancestor of a trigger function $f^t_{P_C}$ in $G_P$.
It is worth noting that a seed function is trivially a relevant callee, since it can be viewed as invoked by itself.

Definition 6. For a given checker $C$, the residual program $P'$ of $P$ after simplification consists of all relevant callees $f^r_{P_C}$.

For example, we use the NULL checker (Figure 2) to sift a contrived program whose call graph is shown in Figure 5. Suppose memory allocation only occurs in function b(), and allocated pointers are dereferenced in functions c() and f(). According to the definitions above, function b() is a seed function and the residual program consists of the relevant callees b(), d() and f(). The succeeding analysis will start from b() and report potential bugs in f() if any exist. Function d() is a relevant callee since it is an ancestor of the trigger function f(). Function c() is not a relevant callee since it is not called directly or indirectly by a seed function. Function e() is not a relevant callee since it is neither a trigger function nor an ancestor of a trigger function.

B. Algorithm

The sifting algorithm is described in Figure 6. The input of SIFT is a given checker, a list of functions and the corresponding call graphs. The outputs are the seed functions and relevant callees, which are used in the succeeding analysis. The algorithm performs three steps as follows.

    procedure SIFT(checker, funcs, callGraph)
    begin
        // Step 1: [Generate sifters]
        (sifters, patternSet) ← GenSifters(checker)
        // Step 2: [Sift out seed functions]
        (seeds, triggers) ← Siftout(funcs, sifters, patternSet)
        // Step 3: [Compute relevant callees]
        relevants ← seeds
        if seeds ≠ ∅
            for each func ∈ seeds
                GenCallees(func, triggers, relevants, callGraph)
    end

Figure 6. Algorithm of SIFT

    function GenSifters(checker)
    begin
        (initialState, patternSet, transitionSet) ← parse checker
        names ← ∅
        for each transition t ∈ transitionSet
            if t starts from initialState
                extract patternName in t
                names ← names ∪ {patternName}
        sifters ← ∅
        for each pattern p ∈ patternSet
            if name of p is in names
                sifters ← sifters ∪ {p}
    end

Figure 7. Function GenSifters()

    function Siftout(funcs, sifters, patternSet)
    begin
        seeds ← ∅
        triggers ← ∅
        for each func ∈ funcs
            if ∃ a statement s in func such that s matches a pattern p ∈ sifters
                seeds ← seeds ∪ {func}
            else if ∃ a statement s in func such that s matches a pattern p ∈ patternSet − sifters
                triggers ← triggers ∪ {func}
    end

Figure 8. Function Siftout()

    function GenCallees(func, triggers, relevants, callGraph)
    begin
        if func has no successor in callGraph
            return
        for each successor succ of func in callGraph
            if succ ∈ triggers OR succ is a trigger's ancestor
                if succ ∉ relevants
                    relevants ← relevants ∪ {succ}
                    GenCallees(succ, triggers, relevants, callGraph)
    end

Figure 9. Function GenCallees()

The first step is to generate sifters (Figure 7). As mentioned above, sifters are patterns which trigger transitions starting from initial states. The names of those patterns are first extracted from the transitions of a given checker and then used to select the sifters from the set of patterns. For the NULL checker in Figure 2, the generated sifter is the pattern allocation.

The second step is to obtain seed functions and trigger functions (Figure 8), which are used for the computation of relevant callees. The parameters are funcs, sifters and patternSet, which represent the list of functions, the set of sifters and the set of patterns in a checker, respectively. All statements in a function are scanned.
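The procedures of Figures 6 to 9 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the SIFT implementation: statement matching is assumed to be done beforehand, so each function is given as the set of pattern names its statements match; the call graph edges are hypothetical, chosen only to agree with the textual description of the Figure 5 example; and all identifiers (`gen_sifters`, `sift_out`, `reaches_trigger`, `gen_callees`, `sift`) are names introduced here. Unlike the else-if in Figure 8, the sketch classifies functions directly by Definitions 3 and 4, so a function may be both a seed and a trigger function.

```python
def gen_sifters(initial_state, transitions):
    """Step 1: pattern names that trigger a transition from the initial state."""
    pattern_set = {pattern for _, _, pattern in transitions}
    sifters = {pattern for source, _, pattern in transitions
               if source == initial_state}
    return sifters, pattern_set


def sift_out(funcs, sifters, pattern_set):
    """Step 2: classify functions by the pattern names their statements match."""
    seeds = {name for name, matched in funcs.items() if matched & sifters}
    triggers = {name for name, matched in funcs.items()
                if matched & (pattern_set - sifters)}
    return seeds, triggers


def reaches_trigger(func, triggers, call_graph, seen=None):
    """True if func is a trigger function or an ancestor of one."""
    seen = set() if seen is None else seen
    if func in triggers:
        return True
    seen.add(func)  # guards against cycles in the call graph
    return any(succ not in seen
               and reaches_trigger(succ, triggers, call_graph, seen)
               for succ in call_graph.get(func, ()))


def gen_callees(func, triggers, relevants, call_graph):
    """Step 3: add relevant successors of func, then recurse into them."""
    for succ in call_graph.get(func, ()):
        if succ not in relevants and reaches_trigger(succ, triggers, call_graph):
            relevants.add(succ)
            gen_callees(succ, triggers, relevants, call_graph)


def sift(initial_state, transitions, funcs, call_graph):
    sifters, pattern_set = gen_sifters(initial_state, transitions)
    seeds, triggers = sift_out(funcs, sifters, pattern_set)
    relevants = set(seeds)
    for seed in seeds:
        gen_callees(seed, triggers, relevants, call_graph)
    return seeds, relevants


# NULL checker of Figure 2 as (source, target, pattern name) triples.
NULL_TRANSITIONS = [
    ("START", "ALLOCATED", "allocation"),
    ("ALLOCATED", "WRONG", "dereference"),
    ("ALLOCATED", "CHECKED", "check"),
    ("CHECKED", "START", "dereference"),
]
# Pattern names matched per function, following the example in the text.
FUNCS = {"a": set(), "b": {"allocation"}, "c": {"dereference"},
         "d": set(), "e": set(), "f": {"dereference"}}
# Hypothetical edges (caller -> callees); Figure 5 itself is not reproduced.
CALL_GRAPH = {"a": ["b", "c"], "b": ["d", "e"], "d": ["f"]}

if __name__ == "__main__":
    seeds, relevants = sift("START", NULL_TRANSITIONS, FUNCS, CALL_GRAPH)
    print(sorted(seeds), sorted(relevants))
```

With these inputs, b() is the only seed function and the residual program consists of b(), d() and f(), in agreement with the example above. Recomputing reachability for every successor keeps the sketch short; precomputing the set of trigger ancestors once would avoid the repeated traversals.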
Functions containing statements matched by patterns are identified as seed functions or trigger functions.

The third step is to compute relevant callees (Figure 9). Relevant callees are computed by a graph-reachability algorithm on the call graph. The traversal starts from a vertex representing a seed function in the call graph and visits, in the forward direction, all the successors that can be reached from it. A successor is identified as a relevant callee only if it is a trigger function or an ancestor of a trigger function.

The time complexity of SIFT mainly depends on the cost of computing relevant callees. The running time of the algorithm is therefore bounded by $O(m(m + n))$, where $m$ and $n$ represent the number of vertices and edges in the call graph, respectively.

C. Completeness and soundness

To establish completeness and soundness, we first introduce a lemma.

Lemma. For a given checker $C$, after program sifting, the program $P$ and the residual program $P'$ report the same set of errors, i.e. $R(P_C) = R(P'_C)$ and $F(P_C) = F(P'_C)$.

Proof. Here we only give a proof of $R(P_C) = R(P'_C)$. The proof of $F(P_C) = F(P'_C)$ is similar.

1) Consider any real error $re \in R(P_C)$. Suppose that the transitions leading to $re$ are $t^0_{re}, t^1_{re}, \ldots, t^n_{re}$, and each $t^s_{re}$ ($\forall s, 0 \le s \le n$) occurs in function $f^s_{re}$. Each $f^s_{re}$ ($\forall s, 0 \le s \le n$) is either a seed function or a trigger function. By Definition 5 and Definition 6, the seed function $f^0_{re}$ is in $P'$. According to the correspondence between transitions and functions, any trigger function $f^s_{re}$ ($\forall s, 1 \le s \le n$) is invoked directly or indirectly by the seed function $f^0_{re}$. Then $f^s_{re}$ is a relevant callee according to Definition 5, i.e. $f^s_{re}$ is in $P'$ according to Definition 6. Thus each transition $t^s_{re}$ ($\forall s, 0 \le s \le n$) is triggered in $P'$, and the real error $re$ is reached in $P'$ and reported in $R(P'_C)$. So $R(P_C) \subseteq R(P'_C)$.

2) Consider any real error $re \in R(P'_C)$. Suppose that the transitions leading to $re$ are $t^0_{re}, t^1_{re}, \ldots, t^n_{re}$, and each $t^s_{re}$ ($\forall s, 0 \le s \le n$) occurs in function $f^s_{re}$. Since $P' \subseteq P$, it is immediate that each function $f^s_{re}$ ($\forall s, 0 \le s \le n$) is in $P$. Thus each transition $t^s_{re}$ is triggered in $P$, and the real error $re$ is reached in $P$ and reported in $R(P_C)$. So $R(P'_C) \subseteq R(P_C)$.

By 1) and 2), we conclude that $R(P_C) = R(P'_C)$.

THEOREM 1. SIFT is complete, i.e. the set of real errors reported for a program $P$ against a checker $C$ is the same as the set of real errors reported for the residual program $P'$ after program sifting.

Proof. It follows immediately from the Lemma, since $R(P_C) = R(P'_C)$.

The completeness of SIFT means that, for a given checker, all the real bugs identified by checking the original program can also be recognized by checking the simplified program. This is because SIFT identifies and preserves all the transitions in the checker. It is worth noting that SIFT is only a simplification approach; whether the real errors in source code can be found depends on the nature of the analysis engine and the expressiveness of the checkers.

THEOREM 2. SIFT is sound; in particular, the set of false alarms reported for the residual program $P'$ of a program $P$ after program sifting is the same as the set of false alarms reported for $P$ against a given checker $C$.

Proof. It follows immediately from the Lemma, since $F(P_C) = F(P'_C)$.

IV. EXPERIMENTAL EVALUATION

A. Setup

SIFT has been implemented on top of a static analysis engine, AXEC (An eXtensible static Checker for C), which was developed in our previous work in the spirit of the analysis engine in the MC tool [4]. To clarify ambiguous constructs and permit easy analysis, source code is preprocessed into standard C code by CIL [7]. AXEC parses the preprocessed source code and transforms it into ASTs and CFGs as the intermediate representation. Three of the metal checkers (Table I) have been rewritten and used in our experiments. These checkers are employed to check 61 files from two directories in the Linux kernel. Our evaluation is based on the files preprocessed by CIL, which are presented in Table II. The number of bugs includes false alarms. Experiments were executed on a 2.66 GHz Intel Core 2 Duo machine with 2GB of memory running Windows XP. For simplicity, call graphs are constructed only for functions in the same file.

Table I. THREE CHECKERS

    Checker   Description                                                    Num(Patterns)   Type
    NULL      Check potential dereferences of null pointers                  50              Variable-specific
    FREE      Check uses of freed pointers                                   6               Variable-specific
    BLOCK     Detect calls to blocking functions with interrupts disabled    7               Global

Table II. EXPERIMENTAL FILES

    Directory   Description      Files   Processed functions   Processed LOC   NULL bugs   FREE bugs   BLOCK bugs
    drivers     device drivers   52      1686                  256722          85          1           2
    fs          file systems     9       170                   33309           15          0           0

B. Results and Discussion

Effectiveness. We first investigate the effectiveness of program sifting. For each checker, Figure 10 presents the filtration percentage, i.e. the ratio of the number of removed functions to the total number of functions. The numeric percentages are presented in Table III, which also shows the number of functions, seed functions and relevant callees. Overall, the average filtration percentage is about 85%. Two observations about program sifting can be made from the table. First, only 5.37% and 14.62% of all functions are seed functions and relevant callees on average. The low percentages confirm the previous observation that only a few functions are related to given properties. Second, the filtration percentage varies across checkers. For example, no seed function is generated when the BLOCK checker is applied to the files in fs, while 142 seed functions are produced when the files in drivers are checked against the NULL checker. In the first case, no function will be checked after program sifting.

Figure 10. Function filtration percentage

Figure 11. Time saved percentage

Table III. FUNCTION FILTRATION ("..." marks a value missing from this copy)

                    N(func)   N(seed)   N(relevant)   N(seed)/N(func)   N(relevant)/N(func)   Filtration percentage
    NULL-drivers    ...       ...       ...           ...               31.08%                68.92%
    NULL-fs         ...       ...       ...           ...               28.82%                71.18%
    FREE-drivers    ...       ...       ...           ...               19.45%                80.55%
    FREE-fs         ...       ...       ...           ...               3.53%                 96.47%
    BLOCK-drivers   ...       ...       ...           ...               4.80%                 95.20%
    BLOCK-fs        ...       ...       ...           ...               0.00%                 ...
    Average         /         /         /             5.37%             14.62%                85.38%

Table IV. TIME SAVED ("..." marks a value missing from this copy)

                    T(Orig)(sec)   T(Sift)(sec)   T(Res)(sec)   (T(Sift)+T(Res))/T(Orig)   Time saved
    NULL-drivers    ...            ...            ...           ...                        87.98%
    NULL-fs         ...            ...            ...           ...                        65.48%
    FREE-drivers    ...            ...            ...           ...                        89.21%
    FREE-fs         ...            ...            ...           ...                        98.69%
    BLOCK-drivers   ...            ...            ...           ...                        96.02%
    BLOCK-fs        ...            ...            ...           ...                        98.64%
    Average         /              /              /             10.66%                     89.34%

Efficiency. To validate the efficiency of SIFT, we consider the running time of the analysis. The first column, T(Orig), in Table IV indicates the running time spent on checking without sifting.
The columns T(Sift) and T(Res) represent the time spent on sifting and on checking the residual programs, respectively. The sum of T(Sift) and T(Res) is the total time spent on checking with sifting. The ratio of (T(Sift)+T(Res)) to T(Orig) and the percentage of time saved are also given. The results are shown visually in Figure 11, where it can be seen that at least 65% of the time is saved. Another observation is that the NULL checker needs more time than the others. The main reason is that the NULL checker contains more patterns, which are matched by a recursive algorithm in AXEC.

C. Other Points of Discussion

Number of functions and time saved. We further investigate the relation between the number of functions and the time saved. Figure 12 shows the percentage of time saved for files of different sizes against the three checkers. In Table V, the tested files are divided into ten groups according to the number of functions, shown in column N(func). The results in Figure 12 and Table V indicate that SIFT performs better for files that include more functions. In our experiments, the analysis is expected to take less than 40% of the time if a file has more than twenty functions. However, it does not pay to sift files containing fewer than ten functions: although SIFT still filters functions, simplifying such small files is not worthwhile.

Table V. RELATION BETWEEN NUMBER OF FUNCTIONS AND TIME SAVED ("..." marks a value missing from this copy)

    N(func)   N(file)   NULL   FREE     BLOCK
    ...       ...       ...    ...      ...
    ...       ...       ...    39.77%   45.42%
    ...       ...       ...    78.62%   78.08%
    ...       ...       ...    79.15%   88.86%
    ...       ...       ...    64.60%   89.62%
    ...       ...       ...    95.72%   95.10%
    ...       ...       ...    81.14%   96.67%
    ...       ...       ...    85.16%   93.15%
    ...       ...       ...    85.81%   91.37%
    ...       ...       ...    85.20%   98.70%

Figure 12. The percentage of time saved (y-axis) with respect to different numbers of functions (x-axis) for the NULL, FREE and BLOCK checkers. (In this chart, line connection points are included only as a visual aid; strictly speaking, no intermediate values exist.)

Seed functions and relevant callees. The distributions of seed functions and relevant callees, shown in Figures 13 and 14, clearly demonstrate that only a few functions are relevant to given properties. For files checked by the NULL checker and the FREE checker, both distributions are reasonable and concentrated, with a few extremes. The results for the BLOCK checker are somewhat unusual because few seed functions were matched and few relevant callees were generated, as shown in Table III. In summary, only two seed functions and no more than five relevant callees are generated per file.

Figure 13. The number of seed functions per file for each checker (NULL, FREE, BLOCK)

Figure 14. The number of relevant callees per file for each checker (NULL, FREE, BLOCK)

D. Threats to Validity

Like any empirical study, this study has limitations that must be considered when interpreting its results. The first threat lies in the selection of test subjects. MC [6] and mygcc [8] checked files in the Linux kernel and reported hundreds of bugs.
To validate the completeness and soundness of SIFT, 61 of those files were chosen at random and used in our experiments, and three checkers were rewritten. Since our results are obtained from experiments on these files, absolute performance measures (e.g., 85% of functions are sifted out while 89% of the analysis time is saved on average) do not readily generalize to arbitrary programs. On the other hand, these files have been checked by other static analysis tools, hence the results are credible.

The results in this study are also subject to threats related to call graphs. SIFT assumes that call graphs are available. Extracted call graphs may vary with differing treatments of macros, function pointers, input formats, etc., as reported in [9]. In the presence of function pointers, static analysis tools usually use points-to information to construct call graphs. Our experiments were carried out with AXEC, which has not been combined with a points-to analysis. Therefore, our experiments did not deal with function calls through pointers. However, SIFT is capable of coping with function pointers, although the filtration percentages may vary.

V. RELATED WORK

In this section, we compare our work with research on simplification strategies used in earlier static analysis tools and with program simplification techniques for static analysis.

A. Simplification strategies in language-based static analysis tools

PREfix [1] is an error detection tool that traces execution paths. It selects representative paths to analyze. Path summaries are combined to compute function summaries. Inter-procedural analysis begins with the leaf functions and proceeds bottom-up. Unlike PREfix, instead of traversing paths, our approach first scans the list of statements and sifts functions starting from seed functions in call graphs.

MC [4] is designed to find as many errors as possible and uses an extension language to write checkers. To make the analysis efficient, block summaries and function summaries are computed.
False paths are pruned by a simple congruence closure algorithm. Our program sifting approach can be viewed as a pre-processing filter for such tools. Irrelevant paths are pruned implicitly by function filtration.

mygcc [8] is a compiler-integrated approach that extends a compiler through a declarative language. Properties are expressed as restrictive regular path queries. It first looks for all the nodes containing the initial transition in the CFG, which is similar to the first and second steps of our approach. However, [8] does not discuss inter-procedural analysis, while we investigate how to generate relevant callees from call graphs to construct a residual program.

ESP [3] is a path-sensitive tool for checking temporal safety properties. Its insight is that most branches in a program are not relevant to the given properties, which is similar to ours. ESP symbolically evaluates the program and merges states when necessary. It is context-sensitive through the use of summary edges. In contrast, SIFT uses seed functions in the third step to filter out irrelevant functions. Paths in irrelevant functions are neglected implicitly.

B. Program simplification for static analysis

Program slicing [10][11] is a technique to simplify programs by focusing on selected aspects of semantics. A program slice consists of the parts of a program that potentially affect the values computed at a certain point of interest. Such a point of interest is referred to as a slicing criterion. There are three differences between program slicing and program sifting. First, program slicing is variable-related and slicing criteria are generated from variables of interest, while program sifting is property-related and sifters are extracted from checkers. Second, slicing removes statements while sifting filters out functions. Last but not least, program slicing is mainly used in testing, debugging and reengineering, whereas program sifting is proposed to simplify programs before language-based static analysis.

Jiang and Su [12] propose an approach that utilizes user execution profiles to simplify programs for analysis. Information from statistical debugging is used to preserve the erroneous behaviors of original programs. The approach can be viewed as a complement to static analysis tools, while our work is a pre-processing filter.
Scholz et al. [13] study user-input dependence analysis as a preliminary step to static analysis tools. The intermediate representation adopted is aSSA (augmented Static Single Assignment), which is exploited to reduce the analysis to a simplified graph reachability problem.

VI. CONCLUSION

In this paper, we present a program sifting approach for language-based static analysis. It can be used as a pre-processing filter for the succeeding analysis. Our approach is able to handle a variety of user-defined checkers. We prove its completeness and soundness. Its effectiveness and efficiency are validated by experiments.

ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for their valuable feedback. This effort is sponsored by the State 863 High-Tech Program No. 2007AA and No. 2007AA01Z146.

REFERENCES

[1] W. Bush, J. Pincus, and D. Sielaff, "A static analyzer for finding dynamic programming errors," Software: Practice and Experience, vol. 30, no. 7.
[2] Y. Xie and A. Aiken, "Scalable error detection using boolean satisfiability," in POPL '05: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2005.
[3] M. Das, S. Lerner, and M. Seigle, "ESP: path-sensitive program verification in polynomial time," in PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, 2002.
[4] S. Hallem, B. Chelf, Y. Xie, and D. Engler, "A system and language for building system-specific, static analyses," in PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, 2002.
[5] N. Volanschi, "A portable compiler-integrated approach to permanent checking," in ASE '06: Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering. IEEE Computer Society, 2006.
[6] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, "An empirical study of operating systems errors," in SOSP '01: Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, 2001.
[7] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer, "CIL: Intermediate language and tools for analysis and transformation of C programs," in International Conference on Compiler Construction, 2002.
[8] N. Volanschi, "Condate: a proto-language at the confluence between checking and compiling," in PPDP '06: Proceedings of the 8th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming, 2006.
[9] G. C. Murphy, D. Notkin, and E. S.-C. Lan, "An empirical study of static call graph extractors," in ICSE '96: Proceedings of the 18th International Conference on Software Engineering. IEEE Computer Society, 1996.
[10] M. Weiser, "Program slicing," IEEE Transactions on Software Engineering, vol. SE-10(4), July.
[11] D. Binkley and M. Harman, "A large-scale empirical study of forward and backward static slice size and context sensitivity," in ICSM '03: Proceedings of the International Conference on Software Maintenance. IEEE Computer Society, 2003, p. 44.
[12] L. Jiang and Z. Su, "Profile-guided program simplification for effective testing and analysis," in SIGSOFT '08/FSE-16: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2008.
[13] B. Scholz, C. Zhang, and C. Cifuentes, "User-input dependence analysis via graph reachability," in Proceedings of the Eighth IEEE Working Conference on Source Code Analysis and Manipulation.
