APA Interprocedural Dataflow Analysis

Similar documents
Interprocedural Analysis. CS252r Fall 2015

CS 6110 S14 Lecture 38 Abstract Interpretation 30 April 2014

SOFTWARE ENGINEERING DESIGN I

CS 125 Section #4 RAMs and TMs 9/27/16

Advanced Compiler Construction

Note that in this definition, n + m denotes the syntactic expression with three symbols n, +, and m, not to the number that is the sum of n and m.

CSE 501 Midterm Exam: Sketch of Some Plausible Solutions Winter 1997

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.1

Static Program Analysis Part 7 interprocedural analysis

Lecture 2: SML Basics

Static Program Analysis

Static Analysis. Systems and Internet Infrastructure Security

Induction and Semantics in Dafny

CSCC24 Functional Programming Scheme Part 2

axiomatic semantics involving logical rules for deriving relations between preconditions and postconditions.

Overview. A normal-order language. Strictness. Recursion. Infinite data structures. Direct denotational semantics. Transition semantics

Lecture Notes: Control Flow Analysis for Functional Languages

Overview. CS389L: Automated Logical Reasoning. Lecture 6: First Order Logic Syntax and Semantics. Constants in First-Order Logic.

Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013

Lecture 6: Arithmetic and Threshold Circuits

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

1.3. Conditional expressions To express case distinctions like

3.7 Denotational Semantics

Static Analysis and Dataflow Analysis

Application: Programming Language Semantics

Lecture 3: Recursion; Structural Induction

Compilers and computer architecture: A realistic compiler to MIPS

Lecture 5: Declarative Programming. The Declarative Kernel Language Machine. September 12th, 2011

Principles of Program Analysis: A Sampler of Approaches

Announcements. Scope, Function Calls and Storage Management. Block Structured Languages. Topics. Examples. Simplified Machine Model.

Foundations of AI. 9. Predicate Logic. Syntax and Semantics, Normal Forms, Herbrand Expansion, Resolution

Mosig M1 - PLSCD Written exam

Thursday, December 23, The attack model: Static Program Analysis

The PCAT Programming Language Reference Manual

6.852: Distributed Algorithms Fall, Class 21

Program verification. Generalities about software Verification Model Checking. September 20, 2016

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

Data Flow Analysis using Program Graphs

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

This book is licensed under a Creative Commons Attribution 3.0 License

Implementing Continuations

Lecture 7: Primitive Recursion is Turing Computable. Michael Beeson

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

Principles of Program Analysis: Data Flow Analysis

Foundations of Dataflow Analysis

Lecture Notes: Widening Operators and Collecting Semantics

Part I Logic programming paradigm

PROGRAM ANALYSIS & SYNTHESIS

Lecture 5: Predicate Calculus. ffl Predicate Logic ffl The Language ffl Semantics: Structures

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

G Programming Languages - Fall 2012

142

Static Program Analysis Part 9 pointer analysis. Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Lecture 6: The Declarative Kernel Language Machine. September 13th, 2011

This is already grossly inconvenient in present formalisms. Why do we want to make this convenient? GENERAL GOALS

Semantics via Syntax. f (4) = if define f (x) =2 x + 55.

The Logical Design of the Tokeniser

Going beyond propositional logic

Exercise 1 ( = 18 points)

6.001 Notes: Section 6.1

Program Static Analysis. Overview

Java byte code verification

The SPL Programming Language Reference Manual

Goals: Define the syntax of a simple imperative language Define a semantics using natural deduction 1

Compiler Construction Lecture 05 A Simple Stack Machine. Lent Term, Lecturer: Timothy G. Griffin. Computer Laboratory University of Cambridge

Learning is Change in Knowledge: Knowledge-based Security for Dynamic Policies

COMPUTABILITY THEORY AND RECURSIVELY ENUMERABLE SETS

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

Computer Science Lab Exercise 1

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

6.001 Notes: Section 15.1

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Module 02 Lecture - 45 Memoization

CS450 - Structure of Higher Level Languages

Functional programming with Common Lisp

CS422 - Programming Language Design

Semantics of programming languages

Programming Language Concepts, cs2104 Lecture 01 ( )

More About Recursive Data Types

Program Syntax; Operational Semantics

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

F. Brief review of classical predicate logic (PC)

Compiler construction

Compilers and computer architecture From strings to ASTs (2): context free grammars

CS153: Compilers Lecture 17: Control Flow Graph and Data Flow Analysis

Part VI. Imperative Functional Programming

(Refer Slide Time: 1:27)

Interprocedural Analysis and Abstract Interpretation. cs6463 1

Virtual Machine Where we are at: Part I: Stack Arithmetic. Motivation. Compilation models. direct compilation:... 2-tier compilation:

Chapter 3: Propositional Languages

Lecture 2: Analyzing Algorithms: The 2-d Maxima Problem

Writing an Interpreter Thoughts on Assignment 6

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

An Interactive Desk Calculator. Project P2 of. Common Lisp: An Interactive Approach. Stuart C. Shapiro. Department of Computer Science

Department of Computer Science COMP The Programming Competency Test

Chapter 3: Lexing and Parsing

Lecture #13: Type Inference and Unification. Typing In the Language ML. Type Inference. Doing Type Inference

6. Hoare Logic and Weakest Preconditions

FUNCTIONS. The Anatomy of a Function Definition. In its most basic form, a function definition looks like this: function square(x) { return x * x; }

CMPSCI 250: Introduction to Computation. Lecture #7: Quantifiers and Languages 6 February 2012

Flow Analysis. Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM

Transcription:

APA Interprocedural Dataflow Analysis Jurriaan Hage e-mail: J.Hage@uu.nl homepage: http://www.cs.uu.nl/people/jur/ Department of Information and Computing Sciences, Universiteit Utrecht May 1, 2014

The While language with procedures 2

Procedural programming Any sensible programming language supports procedures or functions in some form. The main complications that will arise are: How do we propagate analysis information into and out of procedures? A procedure can be jumped to from arbitrarily many locations. Do we join the results over all possible callers? How do we know where to return? What if we blindly propagate a single analysis result to all return locations? We focus on forward analysis. 3

Adding procedures to While Extend the While-language with procedures A program takes the form: begin D S end D is a sequence of procedure declarations: proc p(val x, res y) is ln S end lx x and y are formal parameters and local to p A procedure call is a statement: [call p(a,z)] lc l r a is passed by-value and can be any arithmetic expression z is call-by-result: it can only be used to pass the result back 4

Information about programs New block types: is, end and call (...) Entry and exit labels attached to is and end Call and return labels attached to call Add new kind of flow: (l c ; l n ) for procedure call/entry (lx ; l r ) for procedure exit/return Assume all programs are statically correct: only calls to existing procedures, all labels and procedure names unique. 5

An example program begin proc fib(val z, u, res v) is 1 if [z<3] 2 then [v := 1] 3 else ([call fib(z-2,0,u)] 4 5 ; [call fib(z-1,0,v)] 6 7 ; [v := v+u] 11 ) end 8 ; [call fib(x,0,y)] 9 10 end Syntax more restrictive than examples imply. Mimicking local variables: add by-value parameters (like u) Variables x and y have global scope The scope of u, v, z is limited to the body of fib. 6

The flow graph [call fib(x,0,y)] 9 10 is 1 [z<3] 2 yes [v := 1] 3 no [call fib(z-2,0,u)] 4 5 [call fib(z-1,0,v)] 6 7 end 8 [v := u+v] 11 7

Meet over all valid paths: MVP Generalize the utopian MOP and MOP solutions to the more precise MVP and MVP. Later we consider how to adapt monotone frameworks. Paths up to l: vpath (l) = {[l 1,..., l n 1 ] n 1, l n = l, [l 1,..., l n ] a valid path} MVP (l) = {f l (ι) l vpath (l)} Similarly for the closed case, MVP (l). But what is a valid path? 8

Unbalance and poisoning begin proc neg(val z, res u) is 1 [u := -z] 2 ) end 3 ; [call neg(-1,p)] 5 6 ; [call neg(1,n)] 7 8 end 9 Suppose we treat (5; 1) like (5, 1)? Suppose we want to track the signs of all variables. Poisoning: information about the first call to neg also flows to the second call. Reasonable? path and path do not always pair call labels correctly with the label of the return. Valid paths, on the other hand, are balanced. [5, 1, 2, 3, 8] is not valid, but [5, 1, 2, 3, 6] is.

Use valid paths and context instead Issues when defining valid paths Consider only balanced executions. During analysis we only consider finite prefixes of these, including finite prefixes of infinite ones. begin proc infinite(val n, res x) is 2 [call infinite(0,x)] 3 4; end 5 ; [call infinite(0,x)] 1 6 end Context can be used to enforce balance: it can simulate/abstract behaviour of a call stack. The amount of context determines complexity and precision. 10

Interprocedural flows The previous slides motivate a need to distinguish interprocedural and intraprocedural flow. For the fibonacci program: flow(s ) = {(1, 2), (2, 3), (3, 8), (2, 4), (4; 1), (8; 5), (5, 6), (6; 1), (8; 7), (7, 11), (11, 8), (9; 1), (8; 10)} Interprocedural: inter-flow(s ) = {(9, 1, 8, 10), (4, 1, 8, 5), (6, 1, 8, 7)} 4-tuples of call and corresponding return information. (9, 1, 8, 5) / inter-flow(s ) init(s ) = 9 and final(s ) = {10} Backward variants exist: flow R and inter-flow R 11

The flow graph again [call fib(x,0,y)] 9 10 is 1 [z<3] 2 yes [v := 1] 3 no [call fib(z-2,0,u)] 4 5 [call fib(z-1,0,v)] 6 7 end 8 [v := u+v] 11 12

Intermediate summary Changes to the programming language have now been made. syntax, scoping rules, MOP is generalized to MVP Now come the changes to the monotone framework reuse as much as possible of intraprocedural monotone framework, transfer functions for the new statements, distinguish between certain execution paths via context. 13

Embellished Monotone Frameworks 14

Towards embellished monotone frameworks 15 From monotone framework to embellished monotone framework. We proceed by example. Define a monotone framework for Detection Of Signs Analysis. Specify the form of transfer functions for calls, entries, exits and returns. Change it to include context so that data flows along balanced paths, by lifting the original transfer functions so that they include context, and making sure that procedure call and return imply a context change. Context can be anything, but we choose contexts that help us to analyze along valid paths.

Detection of Sign Analysis Let (L, F, F, E, ι, λl.f l ) be an instance of a monotone framework for Detection of Sign Analysis (Exercise 2.15) Detection of Signs gives for each program point what signs each variable may have at that program point. Beware: my notation differs from that in NNH. 16

Detection of Sign Analysis - the lattice The complete lattice L consists of sets of functions More precisely: elements of P(Var S) with S = {, 0, +} Each function describes a set of executions leading to a certain program point. Example: {g, h} L with g(x) = g(y) = +, and h(x) = + and h(y) = In other words, there are executions where x and y are both positive and there are executions where x is positive and y is negative. 17

Detection of Sign Analysis example Assume Var = {x, y}, g(x) = + and g(y) = +, and h(x) = + and h(y) =. Consider the effect of [x := x+y] l on g: the function g which maps both x and y to + (so g = g ) The effect of [x := x+y] l on h is map y to, but x to, 0 or + the result is described by three functions, h, h 0 and h +, defined as h i (y) = and h i (x) = i (for all i). The set {g, h} is thus mapped to {g, h, h 0, h + }. 18

Relational vs. independent Recall: the set {g, h} was mapped to {g, h, h 0, h + }. g tells us y can be mapped to +, the h i that y maps to. The h i tell us that x can map to any one of the {0,, +}. Analysis is relational: we store combinations of x and y. To save on resources, merge the functions to a set of signs for each variable: x has signs {0,, +} and y has {+, } Thereby becoming an independent attributes analysis. This value also represents the previously known to be impossible [x, y +] and [x 0, y +]. The independent attribute analysis is really weaker, but also less resource consuming. 19

Interpreting expressions A s : AExp (Var S) P(S) gives all possible signs of an expression, when given a sign for each variable. A s x+y [x +, y +] = {+} A s x+y [x +, y ] = {0, +, } 20

Transfer functions Transfer function for [x := a] l maps sets of functions to sets of functions: f l (Y ) = {φ l (σ) σ Y } where Y L and φ l (σ) = {σ[x s] s A s a (σ)} Functions may split up : φ l ([x +, y ]) = {[x, y ], [x 0, y ], [x +, y ]} Finally f l (Y ) collects everything: {[x +, y +], [x, y ], 21 [x 0, y ], [x +, y ]}

Adding context to the lattice Add context to get an embellished monotone framework ( L, F, F, E, ι, λl. f l ) The complete lattice L becomes L: P(Var S) becomes P(Var S) Omit context by taking a one element set. For each δ we may have a different value in L. δ serves as an index. L is a complete lattice (page 398 of NNH). In the book they use P( (Var S)) = P(Var S). We don t. 22

Lifting the transfer functions We have a transfer function f l : L L. Lift pointwise to f l : ( L) ( L): Or simply, f l ( l) = f l l f l ( l) = λδ f l ( l(δ)) for l L In words, apply old transfer function independently, i.e., pointwise, for each value in. Example: f l ([δ 1 {g}, δ 2 {h, g}]) = [δ 1 f l ({g}), δ 2 f l ({g, h})] = [δ 1 {g}, δ 2 {h 0, h, h +, g}]. 23

Data flow in the new set-up Information flows along dataflow graph edges: A (l) = {A (l ) (l, l) F (l ; l) F } ι l E So for procedure entry labels, we take the join over all callers. How do we tell different calls apart? By using context. Transfer almost as usual: A (l) = f l (A (l)) Call and return are somewhat different. 24

What happens for a (forward) procedure call? Assume a call to procedure p: (l c, l n, l x, l r ) inter-flow(s ). Two transfer functions: f lc and f ln. f ln is the same for every call to p. In NNH always identity function. f lc can be different for each call to p. Chronologically : f lc transfer value at call A (l c ) = f lc (A (l c )) compute A (l n ) by joining A for all calls to p. transfer value at entry: A (l n ) = f ln (A (l n )) Often the identity function value ready to flow through p. is typically a function that knows about context. 25

What happens at procedure return? Procedure return encompasses the real difference: A (l r ) = f 2 l c,l r (A (l c ), A (l r )) 26 Transfers information from inside the procedure and from before the call to just after the call. Note: A (l r ) is (normally) just A (l x ). Information before a call can be passed directly to after the call. Instead of propagating it through the call. f 2 lc,l r may ignore one (or both) arguments. For a backward analysis, the transfer functions change arity: the one for call becomes binary, the one for return becomes unary.

Call strings as context Context intends to keep analyses of separate calls separated Call string: list of addresses from which a call was made. Abstraction of the call stack: = [Lab ] For fib: Λ, [4], [6], [9], [4, 4],..., [9, 9], [4, 4, 4],... Generate only when needed. Call-string abstracts an execution into the labels of calls seen during execution without seeing the corresponding return: [1, 6, 5, 8, 3, 2, 1, 4, 2, 1, 9] becomes [6, 9] Procedure call labels are added to the front (stack like). 27

The flow graph again [call fib(x,0,y)] 9 10 is 1 [z<3] 2 yes [v := 1] 3 no [call fib(z-2,0,u)] 4 5 [call fib(z-1,0,v)] 6 7 end 8 [v := u+v] 11 28

Call strings as context Call string: list of addresses from which a call was made. For (l c, l n, l x, l r ) we define f 1 l c ( l)(l c :δ) = f 1 l c ( l(δ)) and f 1 lc ( l)(λ) = f 1 computes the effect of a call and f 1 selects where the effect values should go. Valid paths simulated by the transferring between corresponding call strings. 29

Call strings example snapshot pointwise join [ ] [1] [4] [1, 1] [4, 1] A (3) f1 1 (ι) f 4 (V )... proc p(..,..) is 3 A (4) [call p(..,..)] 4.. A (4) end 8 V............ f 4 (V )... A (1) ι... 30 [call p(..,..)] 1.. f1 1 (ι)... A (1)

Call strings as context, return Similarly, for procedure return: f 2 l c,l r ( l, l )(δ) = f 2 l c,l r ( l(δ), l (l c :δ)) We use two values: from before the call, which is under the same context as the return, from inside the procedure, which is under the extended call string. 31

Detection of Signs: procedure calls Assume [call p(a,x)] lc l r and proc p(val x, res y) is ln S end lx A call consists of two assignments x := a and y :=?. The context-less transfer function mimicks those. For σ = [x +, z ] and a = -x we ought to obtain φ lc (σ) = {[x, y, z ], [x, y 0, z ], [x, y +, z ]} Semantics says value of y is undefined (instead of 0). New x shadows the old. In general, unshadow when returning. 32

Detection of Signs: procedure calls Assume [call p(a,x)] lc l r and proc p(val x, res y) is ln S end lx For σ = [x +, z ] and a = -x we ought to obtain φ lc (σ) = {[x, y, z ], [x, y 0, z ], [x, y +, z ]} f lc (Z) = {φ lc (σ) σ Z} φ lc (σ) = {σ[x s][y s ] s A s -x (σ) s {0, +, }} 33

Detection of Signs: adding context Consider the function Z L = L We want to obtain Z = [Λ σ 1, δ 2 σ 2,...] [Λ, [l c ] f 1 l c (σ 1 ), (l c :δ 2 ) f 1 l c (σ 2 ),...] So f 1 l c (Z) is such that for all δ f 1 l c (Z)(δ ) = { if δ = Λ f 1 l c (Z(δ)) if δ = l c :δ 34 Warning: in NNH they give the same general formula, but the example of Detection of Signs (2.38) uses different notation.

Call strings of bounded size L might have ACC, but L might not Call strings can be arbitrarily long for recursive programs Enforce termination by restricting length call strings to k For every different list of call labels, potentially a different analysis result: quickly exponential. 35

What if we run out of bounds? Assume k = 2. Consider call from 4 either with context [1, 4], [1, 1] or [1]. Then in all three cases, the context inside the call will be [4, 1]. To stay sound we must join the transferred analysis results. Here s where we gain finiteness at the price of precision. In a formula f 4 l c (Z)([4, 1]) = f 4 l c (Z([1, 4])) f 4 l c (Z([1, 1])) f 4 l c (Z([1])) We can choose the level of detail (value of k) with a known price to pay. Take k = 0 to omit context: then equals {Λ} 36

k = 2 bounded call strings, snapshot pointwise join [ ] [1] [4] [1, 1] [4, 1] [1, 4] [4, 4] A (3) f1 1 (ι)...... Y...... proc p(..,..) is 3 A (4) V X... W... [call p(..,..)] 4.. A (4) Y end 8 Y = f 4 (V ) f 4 (X) f 4 (W ) 37 [call p(..,..)] 1.. f 1 1 (ι) A (1)

Separate the context from the transfer Context is never used to compute the transfer, it only tells you which part of the value to use (and update). For different analyses you can use the same kind of context and context change In an implementation: decouple the context change from transfer The former selects which values influence a given value. The latter says how. 38

Flow-sensitive versus flow-insensitive Flow-sensitive vs. flow-insensitive: does the result of the analysis depend on the order of statements? Again a matter of cost vs. precision. To go from flow-insensitive to flow-sensitive: add program points as a form of context. In NNH, flow-sensitivity is hard-coded into the framework. 39

Final remarks about procedures Except for binary transfer functions, the technical changes are slight. Conceptually, changes may be bigger. For termination, restrict context to finite sets of values. Use context to balance cost and precision. Simple monotone frameworks can be easily extended to become embellished. A first step in building an analysis. Analyzing procedures can be a pain when scoping enters the picture. 40