Program Static Analysis. Overview

Similar documents
4/24/18. Overview. Program Static Analysis. Has anyone done static analysis? What is static analysis? Why static analysis?

Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013

Data Flow Analysis. Agenda CS738: Advanced Compiler Optimizations. 3-address Code Format. Assumptions

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

CSE 501 Midterm Exam: Sketch of Some Plausible Solutions Winter 1997

Flow Analysis. Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM

Foundations of Dataflow Analysis

Hoare Logic: Proving Programs Correct

Advanced Compiler Construction

Abstract Interpretation

Lecture Notes: Hoare Logic

6. Hoare Logic and Weakest Preconditions

20b -Advanced-DFA. J. L. Peterson, "Petri Nets," Computing Surveys, 9 (3), September 1977, pp

Page # 20b -Advanced-DFA. Reading assignment. State Propagation. GEN and KILL sets. Data Flow Analysis

Interprocedural Analysis. CS252r Fall 2015

Dataflow analysis (ctd.)

Data Flow Analysis. CSCE Lecture 9-02/15/2018

Applications of Program analysis in Model-Based Design

Having a BLAST with SLAM

Reasoning about programs

Last time. Reasoning about programs. Coming up. Project Final Presentations. This Thursday, Nov 30: 4 th in-class exercise

Main Goal. Language-independent program verification framework. Derive program properties from operational semantics

Data Flow Analysis. Program Analysis

Proofs from Tests. Nels E. Beckman, Aditya V. Nori, Sriram K. Rajamani, Robert J. Simmons, SaiDeep Tetali, Aditya V. Thakur

Dependence and Data Flow Models. (c) 2007 Mauro Pezzè & Michal Young Ch 6, slide 1

Static Analysis and Dataflow Analysis

Formal Methods. CITS5501 Software Testing and Quality Assurance

No model may be available. Software Abstractions. Recap on Model Checking. Model Checking for SW Verif. More on the big picture. Abst -> MC -> Refine

Why Data Flow Models? Dependence and Data Flow Models. Learning objectives. Def-Use Pairs (1) Models from Chapter 5 emphasized control

Lecture 6. Abstract Interpretation

A Gentle Introduction to Program Analysis

Compiler Design. Fall Data-Flow Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Data-flow Analysis - Part 2

An Annotated Language

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

Why Global Dataflow Analysis?

Thursday, December 23, The attack model: Static Program Analysis

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis

Reasoning About Imperative Programs. COS 441 Slides 10

Recap. Juan Pablo Galeotti,Alessandra Gorla, Software Engineering Chair Computer Science Saarland University, Germany

Advanced Programming Methods. Introduction in program analysis

Dependence and Data Flow Models. (c) 2007 Mauro Pezzè & Michal Young Ch 6, slide 1

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

Data-flow Analysis. Y.N. Srikant. Department of Computer Science and Automation Indian Institute of Science Bangalore

Static Analysis. Systems and Internet Infrastructure Security

A Context-Sensitive Memory Model for Verification of C/C++ Programs

COMP 507: Computer-Aided Program Design

Compiler Optimization and Code Generation

Having a BLAST with SLAM

Lecture 20 Pointer Analysis

Overview. Verification with Functions and Pointers. IMP with assertions and assumptions. Proof rules for Assert and Assume. IMP+: IMP with functions

Proof Carrying Code(PCC)

Lecture 14 Pointer Analysis

Softwaretechnik. Program verification. Albert-Ludwigs-Universität Freiburg. June 28, Softwaretechnik June 28, / 24

We can express this in dataflow equations using gen and kill sets, where the sets are now sets of expressions.

CO444H. Ben Livshits. Datalog Pointer analysis

Lecture 5. Data Flow Analysis

Automatic Software Verification

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

We can express this in dataflow equations using gen and kill sets, where the sets are now sets of expressions.

Static Program Analysis Part 9 pointer analysis. Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Axiomatic Semantics. Automated Deduction - George Necula - Lecture 2 1

Program verification. Generalities about software Verification Model Checking. September 20, 2016

Dataflow Analysis. Xiaokang Qiu Purdue University. October 17, 2018 ECE 468

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

Program Verification (6EC version only)

CS711 Advanced Programming Languages Pointer Analysis Overview and Flow-Sensitive Analysis

Hoare Logic. COMP2600 Formal Methods for Software Engineering. Rajeev Goré

Control Flow Analysis. Reading & Topics. Optimization Overview CS2210. Muchnick: chapter 7

Alias Analysis & Points-to Analysis. Hwansoo Han

VS 3 : SMT Solvers for Program Verification

Having a BLAST with SLAM

Proofs from Tests. Sriram K. Rajamani. Nels E. Beckman. Aditya V. Nori. Robert J. Simmons.

Symbolic Execution and Verification Conditions. Review of Verification Conditions. Survey from Homework 4. Time spent: 7.1 hrs (mean), 6 hrs (median)

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Homework Assignment #1 Sample Solutions

Semantics. There is no single widely acceptable notation or formalism for describing semantics Operational Semantics

Lecture 6 Foundations of Data Flow Analysis

Lecture 16 Pointer Analysis

In Our Last Exciting Episode

Static program checking and verification

Lecture 6 Foundations of Data Flow Analysis

Chapter 3 (part 3) Describing Syntax and Semantics

Data-Flow Analysis Foundations

Program Analysis and Verification

Softwaretechnik. Program verification. Software Engineering Albert-Ludwigs-University Freiburg. June 30, 2011

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Critical Analysis of Computer Science Methodology: Theory

Static Analysis. Xiangyu Zhang

Axiomatic Rules. Lecture 18: Axiomatic Semantics & Type Safety. Correctness using Axioms & Rules. Axiomatic Rules. Steps in Proof

More Dataflow Analysis

Review: Hoare Logic Rules

Compiler Construction 2010/2011 Loop Optimizations

CSE 307: Principles of Programming Languages

CS553 Lecture Generalizing Data-flow Analysis 3

Compilers. Optimization. Yannis Smaragdakis, U. Athens

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.1

Program Verification. Aarti Gupta

Lecture Notes: Pointer Analysis

Transcription:

Program Static Analysis Overview Program static analysis Abstract interpretation Data flow analysis Intra-procedural Inter-procedural 2 1

What is static analysis? The analysis to understand computer software without executing programs Simple coding style Empty statement, EqualsHashcode Complex property of the program the program s implementation matches its specification Given program P and specification S, does P satisfy S? Can be conducted on source code or object code 3 Has anyone done static analysis? Code review 4 2

Why static analysis? Program comprehension Is this value a constant? Bug finding Is a file closed on every path after all its access? Program optimization Constant propagation 5 An Informal Introduction to Abstract Interpretation Patrick Cousot[2] Modified by Na Meng 3

Semantics & Safety The concrete semantics of a program formalizes (is a mathematical model of) the set of all its possible executions in all possible execution environments Safety: No possible execution in any possible execution environment can reach an erroneous state 7 Undecidability The concrete semantics of a program is undecidable Given an arbitrary program, can you prove that it halts or not on any possible input? Turing proved no algorithm can exist that always correctly decides whether, for a given arbitrary program and its input, the program halts when run with that input 8 4

Abstract Semantics A sound approximation (superset) of the concrete semantics It covers all possible concrete cases If the abstract semantics is proved to be safe, then so is the concrete semantics Abstract interpretation abstract semantics + proof of safe properties 9 Why is Testing/Debugging insufficient? Only consider a subset of the possible executions No correctness proof No guarantee of full coverage of concrete semantics 10 5

Static Analysis Techniques Model checking Theorem proving Data flow analysis 11 Model Checking The abstract semantics is modeled as a finite state machine of the program execution The model can be manually defined or automatically computed Each state is enumerated exhaustively to automatically check whether this model meets a given specification 12 6

An Example [3] import java.util.random; public class Rand { public static void main (String[] args) { Random random = new Random(42);// (1) int a = random.nextint(2); // (2) } } System.out.println("a=" + a);...... int b = random.nextint(3); // (3) System.out.println(" b=" + b); int c = a/(b+a - 2); // (4) System.out.println(" c=" + c); Is there any Divide-by-Zero error? 13 Model Checking b=0 3 start a=0 a=1 1 2 b=1 b=2 b=0 b=2 b=1 4 5 6 7 8 1 Random random = new Random() 2 int a = random.nextint(2) 3 int b = random.nextint(3) 4 int c = a/(b+a-2) 9 10 11 12 13 c=0 c=0/0 c=-1 c=1/0 c=1 What if there is any loop? 14 7

Limitations of Model Checking There can be too many states to enumerate Abstract model creation puts burden on programmers The model may be wrong If verification fails, is the problem in the model or the program? 15 An axiomatic approach [4] Add auxiliary specifications to the program to decompose the verification task into a set of local verification tasks Verify each local verification problem 16 8

Limitations Auxiliary spec is burden on programmers Auxiliary spec might be incorrect If verification fails, is the problem with the auxiliary specification or the program? 17 Theorem Proving Program Specification VC generation Semantics Validity Meets spec/found Bug Provability (theorem proving) Theorem in a logic Soundness If the theorem is valid then the program meets specification If the theorem is provable then it is valid 18 9

From programs to theorems Verification condition generation From theorems to proofs Theorem provers 19 Verification Condition Generation State predicates/assertions: Boolean functions on program states E.g., x = 8, x < y, true, false You can deduce verification condition predicates from known predicates at a given program location 20 10

Hoare Triples [6] For any predicates P and Q and program S, precondi5on {P} S {Q} postcondi5on says that if S is started in a state satisfying P, then it terminates in Q E.g., {true} x := 12 {x = 12}, {x < 40} x := x+1 {x 40} 21 Precise Triples If {P} S {Q R} holds, then do {P} S {Q} and {P} S {R} hold? Strongest postcondition The most precise postcondition (Q R), which implies any postcondition satisfied by the final state of any execution x of S E.g., {true} x := 12 {x = 12} vs. {true} x:=12 {x > 0}, which postcondition is stronger? 22 11

Precise Triples If {P} S {R} or {Q} S {R} hold, then does {P Q} S {R} hold? Weakest preconditions The most general precondition {P Q}, is the weakest precondition on the initial state ensuring that execution of S terminates in a final state satisfying R. E.g., {x=13} x = x+3 {x >13} vs. {x>10} x = x+3 {x >13}, which precondition is weaker? 23 Example: Does the program satisfy the specification? Specification requires true (precondition) ensures c = a b (postcondition) Program bool or(bool a, bool b) { if (a) c := true; else c := b; return c } 24 12

Theorem Proving Step 1 Given the post condition, infer the weakest precondition of the program Step 2 Verify that if the given precondition can infer the weakest precondition If so, the program satisfies the specification Otherwise, it does not 25 Weakest Precondition Rules WP(x := E, B) = B[E/x] WP(s1; s2, B) = WP(s1, WP(s2, B)) WP(if E then s1 else s2, B) = (E => WP(s1, B)) ( E => WP(s2, B)) WP(assert E, B) = E B What is the WP of our example program? WP(S) = (a=>true=a b) ( a=>b=a b) 26 13

Conjecture to be proved: true=> (a=>true=a b) ( a=>b=a b) 27 Data Flow Analysis [5] Peter Lee Modified by Na Meng 14

Data Flow Analysis A technique to gather information about the possible set of values calculated at various points in a computer program 29 How to do data flow analysis? Set up data-flow equations for each node of the control flow graph out b = trans(in b ) in b = join p predb (out p ) Solve the equation set iteratively, until reaching a fixpoint: all in-states do not change for i 1 to N initialize node i while (sets are still changing) for i 1 to N recompute sets at node i 30 15

Work List Iterative Algorithm for i 1 to N initialize node i add node i to worklist while (worklist is not empty) remove a node n from worklist calculate out-state based on in-state if out-state is different from the original value worklist = worklist U succ(n) 31 Directions of Data Flow Analysis Forward Calculate output-states based on inputstates Backward Calculate input-states based on outputstates 32 16

An Example [7] What variable definitions reach the current program point? 1: int N = input() 2: initialize array A[N + 1] 3: call check(n) 4: int I = 1 5: while (I < N) { 6: A(I) = A(I) + I 7: I = I + 1 } entry 1 2 3 4 5 6 8 8: print A(N) 33 7 exit Reaching Definition A definition at program point d reaches program point u if there is a controlflow path from d to u that does not contain a definition of the same variable as d 34 17

Reaching Definition Equations Forward analysis out b = gen b (in b kill b ) in b = p predb (out p ) gen b : variable definitions generated by b kill b : definitions killed at b by redefinitions of the variable(s) initialization: in = {} 35 Using Reaching Definition Constant propagation int x = 5; int y = 7; int z = x + y; int w = x y; Detection of uninitialized variables int x; if ( ) x = 1; a = x; 36 18

Using Reaching Definition Loop-invariant Code Motion Consider an expression inside a loop. If all reaching definitions are outside of the loop, then move the expression out of the loop 37 Revisit the Example[7] What variable definitions are or will be actually used? entry 1: int N = input() 2 2: initialize array A[N + 1] 3: call check(n) 4: int I = 1 3 4 5: while (I < N) { 6: A(I) = A(I) + I 5 7: I = I + 1 } 6 8 8: print A(N) 38 1 7 exit 19

Live-out Variables A variable v is live-out of statement n if v is used along some control path starting at n. Otherwise, we say that v is dead What variables definitions are actually used? 39 Liveness Analysis Equations Backward analysis in b = out b kill b gen b out b = p succb (in p ) gen b : variables used by b kill b : if v is defined without using v, all its prior definitions are killed initialization: out = {} 40 20

Using Liveness Analysis Dead code elimination Suppose we have a statement defining a variable, whose value is not used, then the definition can be removed without any side effect 41 Available Expression An expression e is available at statement n if it is computed along every path from entry node to n, and no variable used in e gets redefined between e s computation and n 42 21

1: c = a + b 2: d = a * c 3: e = d * d 4: i = 1 5: f[i] = a + b 6: c = c * 2 7: if c > d goto 10 8: g[i] = d * d 9: goto 11 10: g[i] = a * c 11: i = i + 1 12: if i <= 10 goto 5 entry What are the available expressions at 5? What direction is the analysis? How to define gen b and kill b? BB3:8-9 BB1:1-4 BB2:5-7 BB5:11-12 BB4:10 What are the equations for in b and out b? exit 43 Using Available Expressions Common-subexpression elimination 44 22

Our analyses so far backward forward union Reaching definitions Live variables intersection Available expressions 45 Questions Does work list iterative algorithm always terminate? 46 23

Lattices A lattice L is a (possibly infinite) set of values, along with and operations x, y L, unique w and z such that x y = w and x y = z x, y L, x y = y x and and x y = y x x, y, z L,(x y) z = x (y z) (x y) z = x (y z) such that and, Τ L, x L, x = x Τ = Τ 47 Monotonic Functions The join and meet operators induce a partial order on the lattice elements x y if and only if x y = x reflexive, anti-symmetric, transitive For a lattice L, a function f: L->L is monotonic if for all x, y L x y f (x) f (y) or x y f (x) f (y) 48 24

Reaching definition is monotonic out b = gen b (in b kill b ) Proof (for single-variable single-block programs) by contradiction: Suppose in b = {1},out b = {0}, where 1 means there is a variable definition, 0 means no definition, then gen b = {0}, kill b = {1}. However, kill b = {1} only if the block b has a redefinition of the variable, which means gen b = {1} 49 Therefore, after limited number of iterations (N* (E+1) at worst case), every definition is propagated to every node Therefore, we can find a fixpoint p, such that f(p) = p 50 25

In dataflow analysis, we require that all flow functions be monotone and have only finite-length effective chains Ingredients of a Dataflow Analysis Flow direction Transfer function Meet operator (Join function) Dataflow information Set of definitions, variables, and expressions initialization How about concrete data values? 52 26

1: c = 3 2: d = 5 3: e = c + d 4: f = input() 5: if (f>0) 6: e = 0 7: g = d + e Constant Propagation What is the value of variables at 4 and 7? How do you define the data flow analysis? 53 Constant-propagation Analysis For a single-variable program Direction: forward Transfer function: out b = gen b (in b kill b ) Dataflow value: elements in CP-lattice Meet operator(join function): CP-lattice Initialization: means uninitialized variable means not a constant -2-1 0 1 2 54 27

Inter-procedural Analysis [8] Stephen Chong Imported by Na Meng Procedures So far we have looked at intraprocedural analysis: analyzing a single procedure Inter-procedural analysis uses calling relationships among procedures Connect intra-procedural analysis results via call edges Enable more precise analysis information 56 28

Inter-procedural CFG void main() { x = 7; r = p(x); x = r; z = p(x+10); } int p(int a) { int y; if (a < 9) y = 0; else y = 1; return y; Entry main x = 7 call p(x) r = return p(x) x = r call p(x+10) z = return p(x+10) Exit main Entry p a < 9 y = 0 y = 1 return y; Exit p } 57 Imprecision Dataflow facts from one call site can taint results at other call sites Is z a constant? 58 29

Inlining Make a copy of the callee s CFG at each call site Entry p Entry main x := 7 Call p(x) r := Return p(x) x := r Call p(x+10) z := return p(x+10) Exit main a < 9 y := 0 y := 1 return y; Exit p Entry p a < 9 y := 0 y := 1 return y; Exit p 59 Exponential Size Increase How about recursive function calls? p(int n) { p(n - 1); } The exponential increase makes analysis infeasible 60 30

Context Sensitivity Make a finite number of copies Use context information to determine when to share a copy Different decisions achieve different tradeoffs between precision and scalability Common choice: approximation call stack 61 An Example Context insensitivity main Context sensitive, 1-stack depth main Context sensitive, 2-stack depth main b() e() b() e() b() e() c() f() c() f() c() f() c() f() c() f() d() g() d() g() d() g() d g d g d g d g 62 31

Procedure Summaries In practice, people don t construct a single global CFG and then perform dataflow Instead, construct and use procedure summaries Summarize effect of callees on callers E.g., is there any side effect on callers? Summarize effect of callers on callees E.g., is any parameter constant? 63 Other Contexts Object/pointer sensitivity What is the type of a given object and what are the corresponding possible method targets? What is the value of a given object s field? 64 32

Pointer Analysis What is the points-to set of p? int x = 3; int y = 0; int* p = unknown()? &x : & y; Alias analysis Decide whether separate memory references point to the same area of memory Can be used interchangeably with pointer analysis (points-to analysis) 65 Flow Sensitivity Flow insensitive analysis Perform analysis without caring about the statement execution order E.g., analysis of c1;c2 will be the same as c2;c1 Address-taken, Steensgaard, Anderson Flow sensitive analysis Observes the statement execution order 66 33

An Example 1: a = &b 2: b = &c 3: f = &d 4: d = &e 5: a = f 1 2 a -> b -> c 4 3 f -> d -> e After 5, both *a and *f point to d 67 Address Taken Assume that variables whose addresses are taken may be referenced by all pointers Address-taken variables: b, c, d, e A single alias pointer set: {a, b, f, d} 1: a = &b 2: b = &c 3: f = &d 4: d = &e 5: a = f 1 2 a -> b -> c 4 3 f -> d -> e 68 34

Steensgaard Constraints p = &x: x pts-to(p) p = q: pts-to(p) = pts-to(q) p = *q a pts-to(q), pts-to(p)=pts-to(a) *p = q b pts-to(p), pts-to(b)=pts-to(q) 1: a = &b Points-to set: pts(a) = pts(f) 2: b = &c ={b, d} 1 2 3: f = &d a -> b -> c 4: d = &e 4 3 f -> d -> e 5: a = f -> -> 69 Andersen Subset Constraints p = &x: x pts-to(p) p = q: pts-to(q) pts-to(p) p = *q a pts-to(q), pts-to(a) pts-to(a) *p = q b pts-to(p), pts-to(q) pts-to(b) 1: a = &b Points-to set: pts(a)={b, d}, pts(f)={d} 2: b = &c 1 2 3: f = &d a -> b -> c 4 3 4: d = &e f -> d -> e 5: a = f -> 70 35

Flow-sensitive Pointer Analysis in b = p predb (out p ) x = y: strong update kill clear pts(x) gen add pts(y) to pts(x) *x = y: out b = gen b (in b kill b ) If x definitely points to a single concrete memory location z, pts(z) = y (strong update) If x may point to multiple locations, then week update by adding y to pts of all locations 71 Reference [1] Static program analysis, https://en.wikipedia.org/wiki/static_program_analysis [2] Patrick Cousot, A Tutorial on Abstract Interpretation, http://homepage.cs.uiowa.edu/~tinelli/classes/seminar/cousot--a %20Tutorial%20on%20AI.pdf [3] Software Model Checking Example, http:// javapathfinder.sourceforge.net/sw_model_checking.html [4] Automated Theorem Proving, https://courses.cs.washington.edu/courses/cse599f/06sp/ lectures/atp1.ppt [5] Peter Lee, Classical Dataflow Optimizations, http://www.cs.cmu.edu/afs/cs/academic/class/15745-s06/web/ handouts/04.pdf [6] K. Rustan M. Leino, Hoare-style program verification, http://research.microsoft.com/en-us/um/people/leino/papers/ cse503-leino-lecture0.ppt. 72 36

Reference [7] Kathryn S. McKinley, Data Flow Analysis and Optimizations, http://www.cs.utexas.edu/users/ mckinley/380c/lecs/03.pdf [8] Stephen Chong, Interprocedural Analysis, http://www.seas.harvard.edu/courses/ cs252/2011sp/slides/lec05- Interprocedural.pdf 73 37