Program Analysis and Verification

Similar documents
Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013

Iterative Program Analysis Abstract Interpretation

TVLA: A SYSTEM FOR GENERATING ABSTRACT INTERPRETERS*

Interprocedural Analysis. CS252r Fall 2015

Lecture 6. Abstract Interpretation

CS 6110 S14 Lecture 38 Abstract Interpretation 30 April 2014

Dataflow analysis (ctd.)

Algebraic Program Analysis

A Gentle Introduction to Program Analysis

Program Static Analysis. Overview

Automatic Software Verification

An Introduction to Heap Analysis. Pietro Ferrara. Chair of Programming Methodology ETH Zurich, Switzerland

Static Program Analysis

Advanced Programming Methods. Introduction in program analysis

Static Analysis and Dataflow Analysis

4/24/18. Overview. Program Static Analysis. Has anyone done static analysis? What is static analysis? Why static analysis?

Static Program Analysis

CS422 - Programming Language Design

Foundations of Dataflow Analysis

EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization

Formal Syntax and Semantics of Programming Languages

do such opportunities arise? (i) calling library code (ii) Where programs. object-oriented Propagating information across procedure boundaries is usef

Thursday, December 23, The attack model: Static Program Analysis

Lecture Notes: Widening Operators and Collecting Semantics

Static Analysis by A. I. of Embedded Critical Software

Modular Verification with Shared Abstractions

Static Analysis. Systems and Internet Infrastructure Security

Lecture 5. Data Flow Analysis

Flow Analysis. Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM

Widening Operator. Fixpoint Approximation with Widening. A widening operator 2 L ˆ L 7``! L is such that: Correctness: - 8x; y 2 L : (y) v (x y)

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

ait: WORST-CASE EXECUTION TIME PREDICTION BY STATIC PROGRAM ANALYSIS

Lecture 5: The Halting Problem. Michael Beeson

CS 242. Fundamentals. Reading: See last slide

Outline. Information flow analyis Data Race analyis. 1 Introduction. 2 A Certified Lightweight Array Bound Checker

The Apron Library. Bertrand Jeannet and Antoine Miné. CAV 09 conference 02/07/2009 INRIA, CNRS/ENS

HW/SW Codesign. WCET Analysis

Static Program Analysis

Backward Analysis via Over-Approximate Abstraction and Under-Approximate Subtraction

PROGRAM ANALYSIS & SYNTHESIS

Program Analysis: Lecture 02 Page 1 of 32

Compiler Design. Fall Data-Flow Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

3.7 Denotational Semantics

Abstract Interpretation Continued

A Note on Karr s Algorithm

Foundations. Yu Zhang. Acknowledgement: modified from Stanford CS242

Theory of Programming Languages COMP360

Distributed Systems Programming (F21DS1) Formal Verification

Advanced Compiler Construction

Static Program Analysis Part 7 interprocedural analysis

Abstract debugging of higher-order imperative languages

(Refer Slide Time: 1:27)

Principles of Program Analysis: A Sampler of Approaches

1KOd17RMoURxjn2 CSE 20 DISCRETE MATH Fall

Reasoning About Imperative Programs. COS 441 Slides 10

Time Stamps for Fixed-Point Approximation

Homework Assignment #1 Sample Solutions

Fundamental Algorithms for System Modeling, Analysis, and Optimization

Automatic Assume/Guarantee Reasoning for Heap-Manipulating Programs (Ongoing Work)

CS510 Assignment #2 (Due March 7th in class)

Lecture Notes for Chapter 2: Getting Started

Sendmail crackaddr - Static Analysis strikes back

axiomatic semantics involving logical rules for deriving relations between preconditions and postconditions.

20b -Advanced-DFA. J. L. Peterson, "Petri Nets," Computing Surveys, 9 (3), September 1977, pp

Page # 20b -Advanced-DFA. Reading assignment. State Propagation. GEN and KILL sets. Data Flow Analysis

Static Program Analysis CS701

EDAA40 At home exercises 1

Optimizing Compilers. Vineeth Kashyap Department of Computer Science, UCSB. SIAM Algorithms Seminar, 2014

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

No model may be available. Software Abstractions. Recap on Model Checking. Model Checking for SW Verif. More on the big picture. Abst -> MC -> Refine

Formal Syntax and Semantics of Programming Languages

Introduction to Denotational Semantics. Brutus Is An Honorable Man. Class Likes/Dislikes Survey. Dueling Semantics

G Programming Languages - Fall 2012

Putting Static Analysis to Work for Verification

Static Analysis and Bugfinding

Outline and Reading. Analysis of Algorithms 1

Lecture 5 Sorting Arrays

CS302 Data Structures using C++

Hoare triples. Floyd-Hoare Logic, Separation Logic

Lecture 4. More on Data Flow: Constant Propagation, Speed, Loops

II (Sorting and) Order Statistics

Establishing Local Temporal Heap Safety Properties with Applications to Compile-Time Memory Management

Model checking pushdown systems

Static Program Analysis Part 8 control flow analysis

Data Structure and Algorithm Midterm Reference Solution TA

DATA STRUCTURES AND ALGORITHMS

Abstract Interpretation

Goals of Program Optimization (1 of 2)

Control Flow Analysis. Reading & Topics. Optimization Overview CS2210. Muchnick: chapter 7

9/19/12. Why Study Discrete Math? What is discrete? Sets (Rosen, Chapter 2) can be described by discrete math TOPICS

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Lecture Notes on Binary Search Trees

Overview of Dataflow Languages. Waheed Ahmad

LECTURE 9 Data Structures: A systematic way of organizing and accessing data. --No single data structure works well for ALL purposes.

Data-Flow Based Detection of Loop Bounds

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Formal Semantics of Programming Languages

Lecture 15 Notes Binary Search Trees

CHENNAI MATHEMATICAL INSTITUTE M.Sc. / Ph.D. Programme in Computer Science

Transcription:

Program Analysis and Verification 0368-4479 Noam Rinetzky Lecture 12: Interprocedural Analysis + Numerical Analysis Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav 1

Procedural program void main() { int x; x = p(7); x = p(9); } int p(int a) { } return a + 1; 2

Effect of procedures foo() bar() call bar() The effect of calling a procedure is the effect of executing its body 3

Interprocedural Analysis foo() bar() call bar() goal: compute the abstract effect of calling a procedure 4

Naïve solutions Inilining Call/Return as goto

Guiding light Exploit stack regime Precision Efficiency 6

Stack regime P() { R(); } R(){ } Q() { R(); } R P 7

Interprocedural Valid Paths f 1 f 2 call q ret f k-1 f k enter q f 4 f 3 ( ) f k-2 exit q f k-3 f 5 IVP: all paths with matching calls and returns And prefixes 8

Join Over All Paths (JOP) i start n fk o... o f1 L L JOP[v] = {[[e 1, e 2,,e n ]]( ) (e 1,, e n ) paths(v)} JOP LFP Sometimes JOP = LFP precise up to symbolic execution Distributive problem 9

CFL-Graph reachability Special cases of functional analysis Finite distributive lattices Provides more efficient analysis algorithms Reduce the interprocedural analysis problem to finding context free reachability 10

IDFS / IDE IDFS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL 95 IDE Interprocedural Distributive Environment Precise interprocedural dataflow analysis with applications to constant propagation. Reps, Horowitz, and Sagiv, FASE 95, TCS 96 More general solutions exist 11

Possibly Uninitialized Variables {} Start {w,x,y} x = 3 {w,y} if... {w,y} y = x {w} w = 8 y = w {w,y} {} printf(y) {w,y} {w,y} 12

IFDS Problems Finite subset distributive Lattice L = (D) is is Transfer functions are distributive Efficient solution through formulation as CFL reachability 13

Encoding Transfer Functions Enumerate all input space and output space Represent functions as graphs with 2(D+1) nodes Special symbol 0 denotes empty sets (sometimes denoted ) Example: D = { a, b, c } f(s) = (S {a}) U {b} 0 a b c 207 0 a b c 14

Efficiently Representing Functions Let f:2 D 2 D be a distributive function Then: f(x) = f( ) ( { f({z}) z X }) f(x) = f( ) ( { f({z}) \ f( ) z X }) 15

Representing Dataflow Functions Identity Function a b c Constant Function a b c 16

Representing Dataflow Functions Gen/Kill Function a b c Non- Gen/Kill Function a b c 17

x y start p(a,b) a b start main x = 3 if... b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p 18

Composing Dataflow Functions a b c a b c 19

( x y start p(a,b) a b start main x = 3 if... p(x,y) Might b y be uninitialized here? b = a p(a,b) return from p return from p YES! printf(y) printf(b) NO! exit main ) exit p ] 20

The Tabulation Algorithm Worklist algorithm, start from entry of main Keep track of Path edges: matched paren paths from procedure entry Summary edges: matched paren call-return paths At each instruction Propagate facts using transfer functions; extend path edges At each call Propagate to procedure entry, start with an empty path If a summary for that entry exits, use it At each exit Store paths from corresponding call points as summary paths When a new summary is added, propagate to the return node 21

Interprocedural Dataflow Analysis via CFL-Reachability Graph: Exploded control-flow graph L: L(unbalLeft) unballeft = valid Fact d holds at n iff there is an L(unbalLeft)-path from 22

Asymptotic Running Time CFL-reachability Exploded control-flow graph: ND nodes Running time: O(N 3 D 3 ) Exploded control-flow graph Special structure Running time: O(ED 3 ) Typically: E N, hence O(ED 3 ) O(ND 3 ) Gen/kill problems: O(ED) 23

IDE Goes beyond IFDS problems Can handle unbounded domains Requires special form of the domain Can be much more efficient than IFDS 24

Example Linear Constant Propagation Consider the constant propagation lattice The value of every variable y at the program exit can be represented by: y = {(a x x + b x ) x Var * } c a x,c Z {, } b x Z Supports efficient composition and functional join [z := a * y + b] What about [z:=x+y]? 25

Linear constant propagation Point-wise representation of environment transformers 26

IDE Analysis Point-wise representation closed under composition CFL-Reachability on the exploded graph Compose functions 27

28

29

Costs O(ED 3 ) Class of value transformers F L L id F Finite height Representation scheme with (efficient) Application Composition Join Equality Storage 30

Conclusion Handling functions is crucial for abstract interpretation Virtual functions and exceptions complicate things But scalability is an issue Small call strings Small functional domains Demand analysis 31

Challenges in Interprocedural Analysis Respect call-return mechanism Handling recursion Local variables Parameter passing mechanisms The called procedure is not always known The source code of the called procedure is not always available 32

A trivial treatment of procedure Analyze a single procedure After every call continue with conservative information Global variables and local variables which may be modified by the call have unknown values Can be easily implemented Procedures can be written in different languages Procedure inline can help 33

Disadvantages of the trivial solution Modular (object oriented and functional) programming encourages small frequently called procedures Almost all information is lost 34

Bibliography Textbook 2.5 Patrick Cousot & Radhia Cousot. Static determination of dynamic properties of recursive procedures In IFIP Conference on Formal Description of Programming Concepts, E.J. Neuhold, (Ed.), pages 237-277, St-Andrews, N.B., Canada, 1977. North-Holland Publishing Company (1978). Two Approaches to interprocedural analysis by Micha Sharir and Amir Pnueli IDFS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL 95 IDE Interprocedural Distributive Environment Precise interprocedural dataflow analysis with applications to constant propagation. Sagiv, Reps, Horowitz, and TCS 96 35

A Semantics for Procedure Local Heaps and its Abstractions Noam Rinetzky Tel Aviv University Jörg Bauer Universität des Saarlandes Thomas Reps University of Wisconsin Mooly Sagiv Tel Aviv University Reinhard Wilhelm Universität des Saarlandes

Motivation Interprocedural shape analysis Conservative static pointer analysis Heap intensive programs Imperative programs with procedures Recursive data structures Challenge Destructive update Localized effect of procedures

Main idea Local heaps x y g x call p(x); y g t t

Main idea Local heaps Cutpoints x y x call p(x); y g g t t

Numerical Analysis

Abstract Interpretation [Cousot 77] Mathematical foundation of static analysis 41

Widening/Narrowing 42

How can we prove this automatically? RelProd(CP, VE) 43

Intervals domain One of the simplest numerical domains Maintain for each variable x an interval [L,H] L is either an integer of - H is either an integer of + A (non-relational) numeric domain 44

Intervals lattice for variable x [-,+ ]... [-,-1] [-,-1] [-,0] [0,+ ] [1,+ ] [2,+ ]... [-20,10] [-10,10]... [-2,-1] [-1,0] [0,1] [1,2] [2,3]...... [-2,-2] [-1,-1] [0,0] [1,1] [2,2]... 45

Intervals lattice for variable x D int [x] = { (L,H) L -,Z and H Z,+ and L H} =[-,+ ] =? [1,2] [3,4]? [1,4] [1,3]? [1,3] [1,4]? [1,3] [-,+ ]? What is the lattice height? 46

Intervals lattice for variable x D int [x] = { (L,H) L -,Z and H Z,+ and L H} =[-,+ ] =? [1,2] [3,4] no [1,4] [1,3] no [1,3] [1,4] yes [1,3] [-,+ ] yes What is the lattice height? Infinite 47

Joining/meeting intervals [a,b] [c,d] =? [1,1] [2,2] =? [1,1] [2, + ] =? [a,b] [c,d] =? [1,2] [3,4] =? [1,4] [3,4] =? [1,1] [1,+ ] =? Check that indeed x y if and only if x y=y 48

Joining/meeting intervals [a,b] [c,d] = [min(a,c), max(b,d)] [1,1] [2,2] = [1,2] [1,1] [2,+ ] = [1,+ ] [a,b] [c,d] = [max(a,c), min(b,d)] if a proper interval and otherwise [1,2] [3,4] = [1,4] [3,4] = [3,4] [1,1] [1,+ ] = [1,1] Check that indeed x y if and only if x y=y 49

Interval domain for programs D int [x] = { (L,H) L -,Z and H Z,+ and L H} For a program with variables Var={x 1,,x k } D int [Var] =? 50

Interval domain for programs D int [x] = { (L,H) L -,Z and H Z,+ and L H} For a program with variables Var={x 1,,x k } D int [Var] = D int [x 1 ] D int [x k ] How can we represent it in terms of formulas? 51

Interval domain for programs D int [x] = { (L,H) L -,Z and H Z,+ and L H} For a program with variables Var={x 1,,x k } D int [Var] = D int [x 1 ] D int [x k ] How can we represent it in terms of formulas? Two types of factoids x c and x c Example: S = {x 9, y 5, y 10} Helper operations c + + = + remove(s, x) = S without any x-constraints lb(s, x) = 52

Assignment transformers x := c # S =? x := y # S =? x := y+c # S =? x := y+z # S =? x := y*c # S =? x := y*z # S =? 53

Assignment transformers x := c # S = remove(s,x) {x c, x c} x := y # S = remove(s,x) {x lb(s,y), x ub(s,y)} x := y+c # S = remove(s,x) {x lb(s,y)+c, x ub(s,y)+c} x := y+z # S = remove(s,x) {x lb(s,y)+lb(s,z), x ub(s,y)+ub(s,z)} x := y*c # S = remove(s,x) if c>0 {x lb(s,y)*c, x ub(s,y)*c} else {x ub(s,y)*-c, x lb(s,y)*-c} x := y*z # S = remove(s,x)? 54

assume transformers assume x=c # S =? assume x<c # S =? assume x=y # S =? assume x c # S =? 55

assume transformers assume x=c # S = S {x c, x c} assume x<c # S = S {x c-1} assume x=y # S = S {x lb(s,y), x ub(s,y)} assume x c # S =? 56

assume transformers assume x=c # S = S {x c, x c} assume x<c # S = S {x c-1} assume x=y # S = S {x lb(s,y), x ub(s,y)} assume x c # S = (S {x c-1}) (S {x c+1}) 57

Effect of function f on lattice elements L = (D,,,,, ) f : D D monotone Fix(f) = { d f(d) = d } Red(f) = { d f(d) d } Ext(f) = { d d f(d) } Theorem [Tarski 1955] lfp(f) = Fix(f) = Red(f) Fix(f) gfp(f) = Fix(f) = Ext(f) Fix(f) Fix(f) Ext(f) Red(f) f n ( ) gfp lfp f n ( ) 58

Effect of function f on lattice elements L = (D,,,,, ) f : D D monotone Fix(f) = { d f(d) = d } Red(f) = { d f(d) d } Ext(f) = { d d f(d) } Theorem [Tarski 1955] lfp(f) = Fix(f) = Red(f) Fix(f) gfp(f) = Fix(f) = Ext(f) Fix(f) Fix(f) Ext(f) Red(f) f n ( ) gfp lfp f n ( ) 59

Continuity and ACC condition Let L = (D,,, ) be a complete partial order Every ascending chain has an upper bound A function f is continuous if for every increasing chain Y D*, f( Y) = { f(y) y Y } L satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: d 0 d 1 d n = d n+1 = 60

Fixed-point theorem [Kleene] Let L = (D,,, ) be a complete partial order and a continuous function f: D D then lfp(f) = n N f n ( ) 61

Resulting algorithm Kleene s fixed point theorem gives a constructive method for computing the lfp Mathematical definition lfp(f) = n N f n ( ) Algorithm d := while f(d) d do d := d f(d) return d lfp fn ( ) f 2 ( ) f( ) 62

Chaotic iteration Input: A cpo L = (D,,, ) satisfying ACC L n = L L L A monotone function f : D n D n A system of equations { X[i] f(x) 1 i n } Output: lfp(f) A worklist-based algorithm for i:=1 to n do X[i] := WL = {1,,n} while WL do j := pop WL // choose index non-deterministically N := F[i](X) if N X[i] then X[i] := N add all the indexes that directly depend on i to WL (X[j] depends on X[i] if F[j] contains X[i]) return X 63

Concrete semantics equations R[0] R[2] R[3] R[4] R[1] R[5] R[6] R[0] = {x Z} R[1] = x:=7 R[2] = R[1] R[4] R[3] = R[2] {s s(x) < 1000} R[4] = x:=x+1 R[3] R[5] = R[2] {s s(x) 1000} R[6] = R[5] {s s(x) 1001} 64

Abstract semantics equations R[0] R[2] R[3] R[4] R[1] R[5] R[6] R[0] = ({x Z}) R[1] = x:=7 # R[2] = R[1] R[4] R[3] = R[2] ({s s(x) < 1000}) R[4] = x:=x+1 # R[3] R[5] = R[2] ({s s(x) 1000}) R[6] = R[5] ({s s(x) 1001}) R[5] ({s s(x) 999}) 65

Abstract semantics equations R[0] R[2] R[3] R[4] R[1] R[5] R[6] R[0] = R[1] = [7,7] R[2] = R[1] R[4] R[3] = R[2] [-,999] R[4] = R[3] + [1,1] R[5] = R[2] [1000,+ ] R[6] = R[5] [999,+ ] R[5] [1001,+ ] 66

Too many iterations to converge 67

How many iterations for this one? 68

Widening Introduce a new binary operator to ensure termination A kind of extrapolation Enables static analysis to use infinite height lattices Dynamically adapts to given program Tricky to design Precision less predictable then with finiteheight domains (widening non-monotone) 69

Formal definition For all elements d 1 d 2 d 1 d 2 For all ascending chains d 0 d 1 d 2 the following sequence eventually stabilizes y 0 = d 0 y i+1 = y i d i+1 For a monotone function f : D D define x 0 = x i+1 = x i f(x i ) Theorem: There exits k such that x k+1 = x k x k Red(f) = { d d D and f(d) d } 70

Analysis with finite-height lattice Red(f) A Fix(f) f #n = lpf(f # ) f #3 f #2 f # 71

Analysis with widening A Red(f) f #2 f #3 Fix(f) lpf(f # ) f #3 f #2 f # 72

Widening for Intervals Analysis [c, d] = [c, d] [a, b] [c, d] = [ if a c then a else -, if b d then b else 73

Semantic equations with widening R[0] R[2] R[3] R[4] R[1] R[5] R[6] R[0] = R[1] = [7,7] R[2] = R[1] R[4] R[2.1] = R[2.1] R[2] R[3] = R[2.1] [-,999] R[4] = R[3] + [1,1] R[5] = R[2] [1001,+ ] R[6] = R[5] [999,+ ] R[5] [1001,+ ] 74

Non monotonicity of widening [0,1] [0,2] =? [0,2] [0,2] =?

Non monotonicity of widening [0,1] [0,2] = [0, ] [0,2] [0,2] = [0,2]

Analysis results with widening Did we prove it? 78

Analysis with narrowing Red(f) A f #2 f #3 Fix(f) lpf(f # ) f #3 f #2 f # 79

Formal definition of narrowing Improves the result of widening y x y (x y) x For all decreasing chains x 0 x 1 the following sequence is finite y 0 = x 0 y i+1 = y i x i+1 For a monotone function f: D D and x k Red(f) = { d d D and f(d) d } define y 0 = x y i+1 = y i f(y i ) Theorem: There exits k such that y k+1 =y k y k Red(f) = { d d D and f(d) d }

Narrowing for Interval Analysis [a, b] = [a, b] [a, b] [c, d] = [ if a = - then c else a, if b = then d else b ]

Semantic equations with narrowing R[0] R[2] R[3] R[4] R[1] R[5] R[6] R[0] = R[1] = [7,7] R[2] = R[1] R[4] R[2.1] = R[2.1] R[2] R[3] = R[2.1] [-,999] R[4] = R[3]+[1,1] R[5] = R[2] # [1000,+ ] R[6] = R[5] [999,+ ] R[5] [1001,+ ] 82

Analysis with widening/narrowing Two phases Phase 1: analyze with widening until converging Phase 2: use values to analyze with narrowing Phase 1: R[0] = R[1] = [7,7] R[2] = R[1] R[4] R[2.1] = R[2.1] R[2] R[3] = R[2.1] [-,999] R[4] = R[3] + [1,1] R[5] = R[2] [1001,+ ] R[6] = R[5] [999,+ ] R[5] [1001,+ ] Phase 2: R[0] = R[1] = [7,7] R[2] = R[1] R[4] R[2.1] = R[2.1] R[2] R[3] = R[2.1] [-,999] R[4] = R[3]+[1,1] R[5] = R[2] # [1000,+ ] R[6] = R[5] [999,+ ] R[5] [1001,+ ] 83

Analysis with widening/narrowing 84

Analysis results widening/narrowing Precise invariant 85

Project 1-2 Students in a group 3-4: Bigger projects Theoretical + Practical Your choice of topic Contact me in 3 weeks Submission 15/Sep Code + Examples Document 15 minutes presentation 87

Past projects JavaScript Dominator Analysis Attribute Analysis for JavaScript Simple Pointer Analysis for C Adding program counters to Past Abstraction (abstraction of finite state machines.) Verification of Asynchronous programs Verifying SDNs using TVLA Verifying independent accesses to arrays in GO 88

Past projects Detecting index out of bound errors in C programs Lattice-Based Semantics for Combinatorial Models Evolution Verifying sorting programs Cross-array sorting (array of arrays) use for storage systems version management Verifying LTL formulae over TVLA structures Worst-case memory consumption 89

Past projects Automatic loop parallelization via dependency tracking Handling asynchronous calls

Default Project Pick a framework LLVM ( C ) : http://llvm.org/ Soot ( Java ) : https://sable.github.io/soot/ Analysis: Refined pointer analysis Invent numerical domain