Static analysis and all that


Static analysis and all that. Martin Steffen, IfI, UiO, Spring 2014


Plan
- approx. 15 lectures; for details, see the web page
- flexible time schedule, depending on progress/interest
- covering parts of, and following the structure of, the textbook [2], concentrating on an overview of:
  - data-flow analysis
  - control-flow analysis
  - type and effect systems
- helpful prior knowledge: having at least heard of
  - typed lambda calculi (especially for CFA)
  - simple type systems
  - operational semantics
  - lattice theory, fixpoints, induction

1 Introduction
- Setting the scene
- Data-flow analysis
  - Equational approach
  - Constraint-based approach
- Constraint-based analysis
- Type and effect systems
- Algorithms

Plan
- introduction to and motivation for the field
- short survey of the material: 5 main topics
  - data-flow analysis
  - control-flow analysis/constraint-based analysis
  - [abstract interpretation]
  - type and effect systems
  - [algorithmic issues]
- 2 lessons

SA: why and what?
What:
- static: at compile time
- analysis: deduction of program properties
  - automatic/decidable
  - formally, based on semantics
Why:
- error catching
  - enhancing program quality
  - catching common "stupid" errors without bothering the user much
  - spotting errors early
  - certain similarities to model checking
  - examples: type checking, uninitialized variables (potential nil-pointer dereferences), unused code
- optimization: based on the analysis, transform the code (footnote 1) such that the result is "better"
  - examples: precalculation of results, optimized register allocation, ...
- success story for formal methods
Footnote 1: source code, or intermediate code at various levels.

Nature of SA
- programs have different semantic phases, corresponding to Chomsky's hierarchy
- static = in principle: before run time; but in practice: "context-free" (footnote 2)
- since run-time properties are most often undecidable: static analysis as approximation
- see [2, Figure 1.1]
[Figure: language classes L0, L1, L2, L3 and the phases lexer, parser, sa, exec., divided into compile time and run time]
Footnote 2: Playing with words, one could call full-scale (hand?) verification "static analysis", and likewise call lexical analysis a static analysis.

Phases
[Figure: compiler phases. Boxes: lexical analysis, syntactic analysis, static semantic checking, code generation, machine-independent optimizations, machine-dependent optimizations. Data passed between them: stream of chars, stream of tokens, syntax tree, syntax tree, machine code; plus the symbol table.]

SA as approximation
[Figure: the universe, with regions labelled unsafe, exact, safe, and over-approximation]

While-language
- simple, prototypical imperative language:
  - untyped
  - simple control structure: while, conditional, sequencing
  - simple data (numerals, booleans)
- abstract syntax, not concrete syntax
- disambiguation when needed: (...), or {...}, or begin ... end

$a ::= x \mid n \mid a \; op_a \; a$   (arithmetic expressions)
$b ::= \mathtt{true} \mid \mathtt{false} \mid \mathtt{not}\ b \mid b \; op_b \; b \mid a \; op_r \; a$   (boolean expressions)
$S ::= x := a \mid \mathtt{skip} \mid S_1; S_2 \mid \mathtt{if}\ b\ \mathtt{then}\ S\ \mathtt{else}\ S \mid \mathtt{while}\ b\ \mathtt{do}\ S$   (statements)
Table: Abstract syntax

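While-language: labelling
- associate flow information with labels
- elementary block = labelled item
- labels identify the basic building blocks
- unique labelling

$a ::= x \mid n \mid a \; op_a \; a$   (arithmetic expressions)
$b ::= \mathtt{true} \mid \mathtt{false} \mid \mathtt{not}\ b \mid b \; op_b \; b \mid a \; op_r \; a$   (boolean expressions)
$S ::= [x := a]^l \mid [\mathtt{skip}]^l \mid S_1; S_2 \mid \mathtt{if}\ [b]^l\ \mathtt{then}\ S\ \mathtt{else}\ S \mid \mathtt{while}\ [b]^l\ \mathtt{do}\ S$   (statements)
Table: Abstract syntax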

Example: factorial

y := x; z := 1; while y > 1 do (z := z * y; y := y - 1); y := 0

- input variable: x
- output variable: z
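As a quick sanity check, the While program can be transliterated statement by statement into Python. This is only a sketch; the function name `factorial_while` is an assumption introduced for illustration and does not appear in the slides.

```python
def factorial_while(x: int) -> int:
    """Transliteration of the While factorial program: x is the input, z the output."""
    y = x            # y := x
    z = 1            # z := 1
    while y > 1:     # while y > 1 do
        z = z * y    #   z := z * y
        y = y - 1    #   y := y - 1
    y = 0            # y := 0 (kept only to mirror the program; y is not used afterwards)
    return z
```

For inputs below 2 the loop body never runs, so the result is 1.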

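Example: factorial

$[y := x]^1; [z := 1]^2; \mathtt{while}\ [y > 1]^3\ \mathtt{do}\ ([z := z * y]^4; [y := y - 1]^5); [y := 0]^6$

[Figure: flow graph. $[y := x]^1 \to [z := 1]^2 \to [y > 1]^3$; from $[y > 1]^3$ the "yes" edge leads to $[z := z * y]^4 \to [y := y - 1]^5 \to [y > 1]^3$, and the "no" edge leads to $[y := 0]^6$.]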

Reaching definitions analysis
- "definition" of x: assignment to x: x := a
- better name: reaching assignment analysis
- first, simple example of a data-flow analysis

An assignment (= "definition") $[x := a]^l$ may reach a program point if there exists an execution where x was last assigned at l when the mentioned program point is reached.

Factorial: reaching assignment

[Figure: flow graph of the labelled factorial program, blocks 1 to 6, with the "yes" edge from 3 to 4 and the "no" edge from 3 to 6]

(y, 1) (short for $[y := x]^1$) may reach:
- the entry of 4 (short for $[z := z * y]^4$)
- the exit of 4 (not in the picture as an arrow)
- the entry of 5
- but: not the exit of 5

Factorial: reaching assignments
- "points" in the program: entry and exit of elementary blocks/labels
- ?: special label (not occurring otherwise), representing the entry to the program; i.e., (x, ?) represents the initial (uninitialized) value of x
- full information: a pair of functions

$RD = (RD_{entry}, RD_{exit})$   (1)

l | RD_entry(l)                            | RD_exit(l)
1 | (x,?), (y,?), (z,?)                    | (x,?), (y,1), (z,?)
2 | (x,?), (y,1), (z,?)                    | (x,?), (y,1), (z,2)
3 | (x,?), (y,1), (y,5), (z,2), (z,4)      | (x,?), (y,1), (y,5), (z,2), (z,4)
4 | (x,?), (y,1), (y,5), (z,2), (z,4)      | (x,?), (y,1), (y,5), (z,4)
5 | (x,?), (y,1), (y,5), (z,4)             | (x,?), (y,5), (z,4)
6 | (x,?), (y,1), (y,5), (z,2), (z,4)      | (x,?), (y,6), (z,2), (z,4)

Reaching assignments: remarks
- elementary blocks of the form
  - $[b]^l$: entry and exit information coincide
  - $[x := a]^l$: entry and exit information are (in general) different
- at program exit: (x, ?), since x is the input variable
- table: "best" information = "smallest" sets:
  - additional pairs in the table: still safe
  - removing pairs: unsafe
- note: still an approximation
  - no real (= run-time) data, no real execution, only data flow
  - approximate, since in a concrete run there is, at each point in that run, exactly one last assignment, not a set
  - a label represents (potentially infinitely many) runs
  - e.g., at program exit in a concrete run: either (z, 2) or else (z, 4)

Data flow analysis
- standard: representation of the program as a flow graph
  - nodes: elementary blocks with labels
  - edges: flow of control
- two approaches (both quite similar here)
  - equational approach
  - constraint-based approach

From flow graphs to equations
- associate an equation system with the flow graph:
  - describing the "flow of information"
  - here: the information related to reaching assignments
  - information imagined to flow forwards
- solutions of the equations describe safe approximations
- solutions are not unique; interest is in the least (or largest) solution
- here: the least solution gives back the RD of equation (1)

Equations for RD and factorial: intra-block
- first type: local, "intra-block": flow through each individual block
- relating, for each elementary block, its exit with its entry
- assignment blocks such as $[y := x]^1$ remove all earlier assignments to the variable and add their own; test blocks such as $[y > 1]^3$ leave the information unchanged
- all equations with $RD_{exit}$ as left-hand side:

$$
\begin{aligned}
RD_{exit}(1) &= (RD_{entry}(1) \setminus \{(y, l) \mid l \in Lab\}) \cup \{(y, 1)\} \\
RD_{exit}(2) &= (RD_{entry}(2) \setminus \{(z, l) \mid l \in Lab\}) \cup \{(z, 2)\} \\
RD_{exit}(3) &= RD_{entry}(3) \\
RD_{exit}(4) &= (RD_{entry}(4) \setminus \{(z, l) \mid l \in Lab\}) \cup \{(z, 4)\} \\
RD_{exit}(5) &= (RD_{entry}(5) \setminus \{(y, l) \mid l \in Lab\}) \cup \{(y, 5)\} \\
RD_{exit}(6) &= (RD_{entry}(6) \setminus \{(y, l) \mid l \in Lab\}) \cup \{(y, 6)\}
\end{aligned}
\qquad (2)
$$

Equations for RD and factorial: inter-block
- second type: global, "inter-block"
- reflecting the control-flow graph: flow between the elementary blocks, following the control-flow edges
- relating the entry of each block (footnote 3) with the exits of the other blocks that are connected to it via an edge
- initial block: mark the variables as uninitialized

$$
\begin{aligned}
RD_{entry}(2) &= RD_{exit}(1) \\
RD_{entry}(3) &= RD_{exit}(2) \cup RD_{exit}(5) \\
RD_{entry}(4) &= RD_{exit}(3) \\
RD_{entry}(5) &= RD_{exit}(4) \\
RD_{entry}(6) &= RD_{exit}(3) \\
RD_{entry}(1) &= \{(x, ?), (y, ?), (z, ?)\}
\end{aligned}
\qquad (3)
$$

Footnote 3: except (in general) the initial block.

RD: general scheme
Intra:
- for assignments $[x := a]^l$:
  $RD_{exit}(l) = (RD_{entry}(l) \setminus \{(x, l') \mid l' \in Lab\}) \cup \{(x, l)\}$   (4)
- for other blocks $[b]^l$ (side-effect free):
  $RD_{exit}(l) = RD_{entry}(l)$   (5)
Inter:
  $RD_{entry}(l) = \bigcup_{l' \to l} RD_{exit}(l')$   (6)
Initial: for l the label of the initial block (footnote 4):
  $RD_{entry}(l) = \{(x, ?) \mid x \text{ is a program variable}\}$   (7)
Footnote 4: isolated entry.
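The general scheme can be checked mechanically against the table given earlier for the factorial program. The following is a minimal sketch, assuming the flow edges 1→2, 2→3, 3→4, 4→5, 5→3, 3→6 read off the flow graph; the names `apply_equations`, `ENTRY_TABLE`, and `EXIT_TABLE` are hypothetical helpers, not part of the slides.

```python
VARS = ("x", "y", "z")
LABELS = (1, 2, 3, 4, 5, 6)
ASSIGNS = {1: "y", 2: "z", 4: "z", 5: "y", 6: "y"}  # label -> assigned variable
FLOW = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 3), (3, 6)]  # assumed flow edges
INIT = 1  # label of the initial block


def apply_equations(entry, exit_):
    """Evaluate the right-hand sides of equations (4)-(7) once."""
    new_entry, new_exit = {}, {}
    for l in LABELS:
        if l == INIT:
            new_entry[l] = {(x, "?") for x in VARS}                     # (7)
        else:
            preds = [exit_[p] for (p, q) in FLOW if q == l]
            new_entry[l] = set().union(*preds)                          # (6)
        if l in ASSIGNS:
            x = ASSIGNS[l]
            kept = {(v, d) for (v, d) in entry[l] if v != x}
            new_exit[l] = kept | {(x, l)}                               # (4)
        else:
            new_exit[l] = set(entry[l])                                 # (5)
    return new_entry, new_exit


X = ("x", "?")
LOOP = {X, ("y", 1), ("y", 5), ("z", 2), ("z", 4)}
ENTRY_TABLE = {
    1: {X, ("y", "?"), ("z", "?")},
    2: {X, ("y", 1), ("z", "?")},
    3: set(LOOP),
    4: set(LOOP),
    5: {X, ("y", 1), ("y", 5), ("z", 4)},
    6: set(LOOP),
}
EXIT_TABLE = {
    1: {X, ("y", 1), ("z", "?")},
    2: {X, ("y", 1), ("z", 2)},
    3: set(LOOP),
    4: {X, ("y", 1), ("y", 5), ("z", 4)},
    5: {X, ("y", 5), ("z", 4)},
    6: {X, ("y", 6), ("z", 2), ("z", 4)},
}

# The table is a solution iff evaluating the right-hand sides reproduces it.
IS_SOLUTION = apply_equations(ENTRY_TABLE, EXIT_TABLE) == (ENTRY_TABLE, EXIT_TABLE)
```

The check only confirms that the table is a solution of the equations; that it is the least one is a separate claim.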

The equation system as fixpoint
- in the example: a solution to the equation system = 12 sets $RD_{entry}(1), \ldots, RD_{exit}(6)$
- i.e., the $RD_{entry}(l)$, $RD_{exit}(l)$ are the variables of the equation system, of type: sets of $(x, l)$-pairs
- $\vec{RD}$: the mentioned twelve-tuple
- equation system understood as a function $F$. Equations:

$\vec{RD} = F(\vec{RD})$

- more explicitly, broken down into its 12 parts (the "equations"):

$F(\vec{RD}) = (F_{entry}(1)(\vec{RD}), F_{exit}(1)(\vec{RD}), \ldots, F_{exit}(6)(\vec{RD}))$

- for instance:

$F_{entry}(3)(\ldots, RD_{exit}(2), \ldots, RD_{exit}(5), \ldots) = RD_{exit}(2) \cup RD_{exit}(5)$

The least solution
- Var = variables of interest (i.e., occurring), Lab = labels of interest
- here: Var = {x, y, z}, Lab = {?, 1, ..., 6}

$F : (2^{Var \times Lab})^{12} \to (2^{Var \times Lab})^{12}$   (8)

- domain $(2^{Var \times Lab})^{12}$: partially ordered pointwise; a complete lattice

$\vec{RD} \sqsubseteq \vec{RD}'$ iff $\forall i.\ RD_i \subseteq RD'_i$   (9)

Constraint-based approach
- here, for DFA: a simple variant of the equational approach
- trivial rearrangement of the entry-exit relationships
- instead of equations: inequations (subset instead of set equality)
- in more complex settings: constraints become more complex, with no split into exit and entry constraints

Factorial program: intra-block constraints
- each assignment block, such as $[y := x]^1$, yields two constraints; each test block, such as $[y > 1]^3$, yields one
- all constraints with $RD_{exit}$ as left-hand side:

$$
\begin{aligned}
RD_{exit}(1) &\supseteq RD_{entry}(1) \setminus \{(y, l) \mid l \in Lab\} & RD_{exit}(1) &\supseteq \{(y, 1)\} \\
RD_{exit}(2) &\supseteq RD_{entry}(2) \setminus \{(z, l) \mid l \in Lab\} & RD_{exit}(2) &\supseteq \{(z, 2)\} \\
RD_{exit}(3) &\supseteq RD_{entry}(3) \\
RD_{exit}(4) &\supseteq RD_{entry}(4) \setminus \{(z, l) \mid l \in Lab\} & RD_{exit}(4) &\supseteq \{(z, 4)\} \\
RD_{exit}(5) &\supseteq RD_{entry}(5) \setminus \{(y, l) \mid l \in Lab\} & RD_{exit}(5) &\supseteq \{(y, 5)\} \\
RD_{exit}(6) &\supseteq RD_{entry}(6) \setminus \{(y, l) \mid l \in Lab\} & RD_{exit}(6) &\supseteq \{(y, 6)\}
\end{aligned}
$$

Factorial program: inter-block constraints
Compare the inter-block equations (3):

$$
\begin{aligned}
RD_{entry}(2) &= RD_{exit}(1) \\
RD_{entry}(3) &= RD_{exit}(2) \cup RD_{exit}(5) \\
RD_{entry}(4) &= RD_{exit}(3) \\
RD_{entry}(5) &= RD_{exit}(4) \\
RD_{entry}(6) &= RD_{exit}(3) \\
RD_{entry}(1) &= \{(x, ?), (y, ?), (z, ?)\}
\end{aligned}
$$

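Factorial program: inter-block constraints
Splitting the composed right-hand sides and using $\supseteq$ instead of $=$:

$$
\begin{aligned}
RD_{entry}(2) &\supseteq RD_{exit}(1) \\
RD_{entry}(3) &\supseteq RD_{exit}(2) \\
RD_{entry}(3) &\supseteq RD_{exit}(5) \\
RD_{entry}(4) &\supseteq RD_{exit}(3) \\
RD_{entry}(5) &\supseteq RD_{exit}(4) \\
RD_{entry}(6) &\supseteq RD_{exit}(3) \\
RD_{entry}(1) &\supseteq \{(x, ?), (y, ?), (z, ?)\}
\end{aligned}
$$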

Least solution revisited
- instead of $F(\vec{RD}) = \vec{RD}$, for the same $F$:

$F(\vec{RD}) \sqsubseteq \vec{RD}$   (10)

- clear: a solution to the equation system is also a solution to the constraint system
- important: the least solutions coincide!

Control-flow analysis
- goal: which elementary blocks lead to which other elementary blocks
- for the While-language: immediate (labelled elementary blocks, resp. the flow graph)
- complex for: more advanced features, higher-order languages, OO languages, ...
- here: prototypical higher-order functional language (λ-calculus)
- formulated as a constraint-based analysis

Simple example

let f = fn x => x 1;
    g = fn y => y + 2;
    h = fn z => z + 3;
in (f g) + (f h)

- higher-order function f; for simplicity: untyped
- local definitions via let-in (footnote 5)
- goal (more specific): for each function application, which function may be applied
- interesting above: the application x 1
Footnote 5: That's something else than assignment. We will not consider it (here) anyway.

Example
- more complex language, hence more complex labelling; "elementary blocks" can be nested
- all syntactic constructs (expressions) are labelled
- consider: (fn x => x) (fn y => y), labelled as

$[\,[\mathtt{fn}\ x \Rightarrow [x]^1]^2 \; [\mathtt{fn}\ y \Rightarrow [y]^3]^4\,]^5$

- functional language: side-effect free, so there is no need to distinguish the entry and exit of labelled blocks
- data of the analysis: $(\hat{C}, \hat{\rho})$, a pair of functions
  - abstract cache $\hat{C}(l)$: the set of values/function abstractions that the subexpression labelled l may evaluate to
  - abstract environment $\hat{\rho}(x)$: the values that x may be bound to

The constraint system
- ignoring let here: three syntactic constructs, hence three kinds of constraints
  1. function abstraction: $[\mathtt{fn}\ x \Rightarrow x]^l$
  2. variables: $[x]^l$
  3. application: $[f\ g]^l$
- relating $\hat{C}$, $\hat{\rho}$, and the program in the form of constraints (subsets, order relation)

Function abstractions:
  $\{\mathtt{fn}\ x \Rightarrow [x]^1\} \subseteq \hat{C}(2)$
  $\{\mathtt{fn}\ y \Rightarrow [y]^3\} \subseteq \hat{C}(4)$

Variables:
  $\hat{\rho}(x) \subseteq \hat{C}(1)$
  $\hat{\rho}(y) \subseteq \hat{C}(3)$

Application: connecting function entry and (body) exit with the argument and the result:
  $\hat{C}(4) \subseteq \hat{\rho}(x)$
  $\hat{C}(1) \subseteq \hat{C}(5)$
but: also $[\mathtt{fn}\ y \Rightarrow [y]^3]^4$ is a candidate at 2 (according to $\hat{C}(2)$):
  $\hat{C}(4) \subseteq \hat{\rho}(y)$
  $\hat{C}(3) \subseteq \hat{C}(5)$

Hence the application constraints are conditional:
  $\{\mathtt{fn}\ x \Rightarrow [x]^1\} \subseteq \hat{C}(2) \Rightarrow \hat{C}(4) \subseteq \hat{\rho}(x)$
  $\{\mathtt{fn}\ x \Rightarrow [x]^1\} \subseteq \hat{C}(2) \Rightarrow \hat{C}(1) \subseteq \hat{C}(5)$
  $\{\mathtt{fn}\ y \Rightarrow [y]^3\} \subseteq \hat{C}(2) \Rightarrow \hat{C}(4) \subseteq \hat{\rho}(y)$
  $\{\mathtt{fn}\ y \Rightarrow [y]^3\} \subseteq \hat{C}(2) \Rightarrow \hat{C}(3) \subseteq \hat{C}(5)$

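The least solution
  $\hat{C}(1) = \{\mathtt{fn}\ y \Rightarrow [y]^3\}$
  $\hat{C}(2) = \{\mathtt{fn}\ x \Rightarrow [x]^1\}$
  $\hat{C}(3) = \emptyset$
  $\hat{C}(4) = \{\mathtt{fn}\ y \Rightarrow [y]^3\}$
  $\hat{C}(5) = \{\mathtt{fn}\ y \Rightarrow [y]^3\}$
  $\hat{\rho}(x) = \{\mathtt{fn}\ y \Rightarrow [y]^3\}$
  $\hat{\rho}(y) = \emptyset$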
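The least solution can be recomputed by starting from empty sets and adding elements until all constraints hold. The sketch below hard-codes the labelled example; function abstractions are represented as hypothetical `(parameter, body_label)` tuples, and the names `solve_cfa`, `ABSTRACTIONS`, `VARIABLES`, and `APPLICATIONS` are assumptions made for illustration.

```python
# Labelled program: [[fn x => [x]^1]^2 [fn y => [y]^3]^4]^5
ABSTRACTIONS = {2: ("x", 1), 4: ("y", 3)}  # label -> (parameter, body label)
VARIABLES = {1: "x", 3: "y"}               # label of a variable occurrence -> variable
APPLICATIONS = {5: (2, 4)}                 # label -> (operator label, operand label)


def solve_cfa():
    """Iterate the three kinds of constraints until nothing changes."""
    cache = {l: set() for l in range(1, 6)}
    env = {"x": set(), "y": set()}
    changed = True

    def include(target, source):
        # Enforce source ⊆ target by growing target if necessary.
        nonlocal changed
        if not source <= target:
            target |= source
            changed = True

    while changed:
        changed = False
        for l, fn in ABSTRACTIONS.items():           # {fn ...} ⊆ C(l)
            include(cache[l], {fn})
        for l, x in VARIABLES.items():                # rho(x) ⊆ C(l)
            include(cache[l], env[x])
        for l, (l1, l2) in APPLICATIONS.items():      # conditional constraints
            for (param, body) in list(cache[l1]):
                include(env[param], cache[l2])        # argument flows to parameter
                include(cache[l], cache[body])        # body result flows to application
    return cache, env


CACHE, ENV = solve_cfa()
```

Because sets only ever grow and only when a constraint forces it, the result is the least solution; `CACHE[3]` and `ENV["y"]` stay empty since `fn y` never reaches the operator position.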

Effects: Intro
- type system: "classical" static analysis:

$\vdash t : T$

- judgment: "term/program phrase t has type T"
- in general: context-sensitive judgments, with $\Gamma$ the assumptions/context:

$\Gamma \vdash t : T$

- here: "non-standard" type systems: effects and annotations
- natural setting: typed languages; here: a trivial setting (While-language)

Trivial type system
- setting: While-language
- each statement maps a state to a state
- $\Sigma$: type of states
- judgment:

$\vdash S : \Sigma \to \Sigma$   (11)

- specified as a derivation system
- note: partial correctness assertion

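Trivial type system: rules

$$
\frac{}{\vdash [x := a]^l : \Sigma \to \Sigma}\ \text{ASS}
\qquad
\frac{}{\vdash [\mathtt{skip}]^l : \Sigma \to \Sigma}\ \text{SKIP}
$$

$$
\frac{\vdash S_1 : \Sigma \to \Sigma \qquad \vdash S_2 : \Sigma \to \Sigma}{\vdash S_1; S_2 : \Sigma \to \Sigma}\ \text{SEQ}
\qquad
\frac{\vdash S : \Sigma \to \Sigma}{\vdash \mathtt{while}\ [b]^l\ \mathtt{do}\ S : \Sigma \to \Sigma}\ \text{WHILE}
$$

$$
\frac{\vdash S_1 : \Sigma \to \Sigma \qquad \vdash S_2 : \Sigma \to \Sigma}{\vdash \mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 : \Sigma \to \Sigma}\ \text{COND}
$$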

Types, effects, and annotations
- type and effect system (TES): often effect system + annotated type system (the border is fuzzy)
- annotated type system
  - judgments:

$\vdash S : \Sigma_1 \to \Sigma_2$   (12)

  - $\Sigma_i$: property of a state ("$\Sigma_i \subseteq \Sigma$")
  - "abstract" properties: invariants, a variable is positive, etc.
- effect system
  - judgments:

$\vdash S : \Sigma \xrightarrow{\varphi} \Sigma$   (13)

  - "statement S maps state to state, with (potential ...) effect $\varphi$"
  - effect $\varphi$: e.g., errors, exceptions, file/resource access, ...

Annotated type systems
- example: reaching definitions/assignments in the While-language
- 2 flavors
  1. annotated base types: $S : RD_1 \to RD_2$
  2. annotated type constructors: $S : \Sigma \xrightarrow[RD]{X} \Sigma$

Annotated base types
- judgment, with $RD_1, RD_2 \in 2^{Var \times Lab}$:

$\vdash S : RD_1 \to RD_2$   (14)

- auxiliary functions
  - note: every S has one "initial" elementary block, and potentially more than one at the end
  - init(S): the (unique) label at the entry of S
  - final(S): the set of labels at the exits of S
- "meaning" of the judgment $S : RD_1 \to RD_2$: "$RD_1$ is the set of var/label pairs reaching the entry of S, and $RD_2$ is the corresponding set at the exit(s) of S":

$RD_1 = RD_{entry}(init(S))$
$RD_2 = \bigcup \{RD_{exit}(l) \mid l \in final(S)\}$

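$$
\frac{}{\vdash [x := a]^{l'} : RD \to (RD \setminus \{(x, l) \mid l \in Lab\}) \cup \{(x, l')\}}\ \text{ASS}
\qquad
\frac{}{\vdash [\mathtt{skip}]^l : RD \to RD}\ \text{SKIP}
$$

$$
\frac{\vdash S_1 : RD_1 \to RD_2 \qquad \vdash S_2 : RD_2 \to RD_3}{\vdash S_1; S_2 : RD_1 \to RD_3}\ \text{SEQ}
$$

$$
\frac{\vdash S_1 : RD_1 \to RD_2 \qquad \vdash S_2 : RD_1 \to RD_2}{\vdash \mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 : RD_1 \to RD_2}\ \text{IF}
\qquad
\frac{\vdash S : RD \to RD}{\vdash \mathtt{while}\ [b]^l\ \mathtt{do}\ S : RD \to RD}\ \text{WHILE}
$$

$$
\frac{\vdash S : RD_1' \to RD_2' \qquad RD_1 \subseteq RD_1' \qquad RD_2' \subseteq RD_2}{\vdash S : RD_1 \to RD_2}\ \text{SUB}
$$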

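Meaning of annotated judgment
Once more, the "meaning" of the judgment $S : RD_1 \to RD_2$: "$RD_1$ is the set of var/label pairs reaching the entry of S, and $RD_2$ is the corresponding set at the exit(s) of S":

$RD_1 = RD_{entry}(init(S))$
$RD_2 = \bigcup \{RD_{exit}(l) \mid l \in final(S)\}$

Be careful with statements that have more than one exit: $\mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2$, or more concretely

$\mathtt{if}\ [b]^l\ \mathtt{then}\ [x := y]^{l_1}\ \mathtt{else}\ [y := x]^{l_2}$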

Meaning of annotated judgment (refined)
Once more, the "meaning" of the judgment $S : RD_1 \to RD_2$: "$RD_1$ is the set of var/label pairs reaching the entry of S, and $RD_2$ is the corresponding set at the exit(s) of S", now stated with inclusions instead of equalities:

if $RD_1 \supseteq RD_{entry}(init(S))$ then $\forall l \in final(S).\ RD_{exit}(l) \subseteq RD_2$

- compare the subsumption rule SUB
- subsumption adds the necessary slack, similar to the constraint formulation
- remember: data-flow equations and their (possible/minimal) solutions

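Example: factorial

$[y := x]^1; [z := 1]^2; \mathtt{while}\ [y > 1]^3\ \mathtt{do}\ ([z := z * y]^4; [y := y - 1]^5); [y := 0]^6$

[Figure: flow graph. $[y := x]^1 \to [z := 1]^2 \to [y > 1]^3$; from $[y > 1]^3$ the "yes" edge leads to $[z := z * y]^4 \to [y := y - 1]^5 \to [y > 1]^3$, and the "no" edge leads to $[y := 0]^6$.]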

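Derivation for the factorial program $f$ (pairs abbreviated: $?_x$ for (x, ?), and a plain label for the assignment carrying it, e.g. 1 for (y, 1)):

$RD_0 = \{?_x, ?_y, ?_z\}$,   $RD_{final} = \{?_x, 6, 2, 4\}$

Top level, with $f_2$ the program after block 1 and $f_3$ the program after block 2:
- $[y := x]^1 : RD_0 \to \{?_x, 1, ?_z\}$   (ASS)
- $[z := 1]^2 : \{?_x, 1, ?_z\} \to \{?_x, 1, 2\}$   (ASS)
- $f_3 : \{?_x, 1, 2\} \to RD_{final}$   (sub-derivation below)
- $f_2 : \{?_x, 1, ?_z\} \to RD_{final}$   (SEQ)
- $f : RD_0 \to RD_{final}$   (SEQ)

Type sub-derivation for the rest, $f_3 = \mathtt{while} \ldots; [y := 0]^6$, with loop invariant $RD_{body} = \{?_x, 1, 5, 2, 4\}$:
- $[z := \ldots]^4 : RD_{body} \to \{?_x, 1, 5, 4\}$   (ASS)
- $[y := \ldots]^5 : \{?_x, 1, 5, 4\} \to \{?_x, 5, 4\}$   (ASS)
- $f_{body} : RD_{body} \to \{?_x, 5, 4\}$   (SEQ)
- $f_{body} : RD_{body} \to RD_{body}$   (SUB)
- $f_{while} : RD_{body} \to RD_{body}$   (WHILE)
- $f_{while} : \{?_x, 1, 2\} \to RD_{body}$   (SUB)
- $[y := 0]^6 : RD_{body} \to RD_{final}$   (ASS)
- $f_3 : \{?_x, 1, 2\} \to RD_{final}$   (SEQ)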

Annotated type constructors
- alternative approach to annotated type systems
- the arrow constructor itself is annotated
- annotation of $\to$: flavor of an effect system
- judgment:

$S : \Sigma \xrightarrow[RD]{X} \Sigma$

- annotation with RD (corresponding to the post-condition from above) alone is not enough
- also needed: the variables "being changed"

Meaning: "S maps states to states, where RD is the set of reaching definitions S may produce, and X is the set of variables S must (= unavoidably) assign."

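$$
\frac{}{\vdash [x := a]^l : \Sigma \xrightarrow[\{(x, l)\}]{\{x\}} \Sigma}\ \text{ASS}
\qquad
\frac{}{\vdash [\mathtt{skip}]^l : \Sigma \xrightarrow[\emptyset]{\emptyset} \Sigma}\ \text{SKIP}
$$

$$
\frac{\vdash S_1 : \Sigma \xrightarrow[RD_1]{X_1} \Sigma \qquad \vdash S_2 : \Sigma \xrightarrow[RD_2]{X_2} \Sigma}{\vdash S_1; S_2 : \Sigma \xrightarrow[(RD_1 \setminus X_2) \cup RD_2]{X_1 \cup X_2} \Sigma}\ \text{SEQ}
$$

$$
\frac{\vdash S_1 : \Sigma \xrightarrow[RD]{X} \Sigma \qquad \vdash S_2 : \Sigma \xrightarrow[RD]{X} \Sigma}{\vdash \mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 : \Sigma \xrightarrow[RD]{X} \Sigma}\ \text{IF}
\qquad
\frac{\vdash S : \Sigma \xrightarrow[RD]{X} \Sigma}{\vdash \mathtt{while}\ [b]^l\ \mathtt{do}\ S : \Sigma \xrightarrow[RD]{\emptyset} \Sigma}\ \text{WHILE}
$$

$$
\frac{\vdash S : \Sigma \xrightarrow[RD]{X} \Sigma \qquad X' \subseteq X \qquad RD \subseteq RD'}{\vdash S : \Sigma \xrightarrow[RD']{X'} \Sigma}\ \text{SUB}
$$

Here $RD_1 \setminus X_2$ stands for $\{(x, l) \in RD_1 \mid x \notin X_2\}$.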
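Read bottom-up, these rules suggest a syntax-directed computation of an annotation (X, RD) for each statement. The sketch below is one such reading, not the slides' own algorithm: it assumes the rules as given in the textbook [2], encodes statements as hypothetical nested tuples, and, for conditionals, uses SUB to weaken both branches to the intersection of the X sets and the union of the RD sets.

```python
def annotate(stmt):
    """Return (X, RD): variables definitely assigned, and definitions possibly produced."""
    kind = stmt[0]
    if kind == "ass":                      # ("ass", x, l)
        _, x, l = stmt
        return frozenset({x}), frozenset({(x, l)})
    if kind == "skip":                     # ("skip", l)
        return frozenset(), frozenset()
    if kind == "seq":                      # ("seq", s1, s2)
        x1, rd1 = annotate(stmt[1])
        x2, rd2 = annotate(stmt[2])
        surviving = frozenset(p for p in rd1 if p[0] not in x2)   # RD1 \ X2
        return x1 | x2, surviving | rd2
    if kind == "if":                       # ("if", l, s1, s2)
        x1, rd1 = annotate(stmt[2])
        x2, rd2 = annotate(stmt[3])
        return x1 & x2, rd1 | rd2          # common annotation reached via SUB
    if kind == "while":                    # ("while", l, body)
        _, rd = annotate(stmt[2])
        return frozenset(), rd             # the body may run zero times
    raise ValueError(f"unknown statement kind: {kind!r}")


# [y := x]^1; [z := 1]^2; while [y > 1]^3 do ([z := z*y]^4; [y := y-1]^5); [y := 0]^6
FACTORIAL = (
    "seq", ("ass", "y", 1),
    ("seq", ("ass", "z", 2),
     ("seq", ("while", 3, ("seq", ("ass", "z", 4), ("ass", "y", 5))),
      ("ass", "y", 6))))

X_FACT, RD_FACT = annotate(FACTORIAL)
```

For the factorial program this yields X = {y, z} and RD = {(z, 2), (z, 4), (y, 6)}, which agrees with the exit information of block 6 in the data-flow table once the pair (x, ?) is left out.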

Effect systems
- this time: functional language (footnote 6)
- starting point: simple type system
- judgment:

$\Gamma \vdash e : \tau$

- $\Gamma$: type environment, a mapping from variables to types
- types: bool, int, and $\tau \to \tau$
Footnote 6: same as for constraint-based CFA.

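$$
\frac{\Gamma(x) = \tau}{\Gamma \vdash x : \tau}\ \text{VAR}
\qquad
\frac{\Gamma, x{:}\tau_1 \vdash e : \tau_2}{\Gamma \vdash \mathtt{fn}_\pi\ x \Rightarrow e : \tau_1 \to \tau_2}\ \text{ABS}
\qquad
\frac{\Gamma \vdash e_1 : \tau_1 \to \tau_2 \qquad \Gamma \vdash e_2 : \tau_1}{\Gamma \vdash e_1\ e_2 : \tau_2}\ \text{APP}
$$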

Effect: Call tracking analysis
- call tracking analysis: determine, for each subexpression, which function abstractions may be applied during its evaluation
- the effect is a set of function names
- annotate: function types with their latent effect
- annotated types $\hat{\tau}$: base types as before; arrow types:

$\hat{\tau}_1 \xrightarrow{\varphi} \hat{\tau}_2$   (15)

- functions from $\hat{\tau}_1$ to $\hat{\tau}_2$, where in the execution, functions from the set $\varphi$ are called
- judgment:

$\hat{\Gamma} \vdash e : \hat{\tau} \;\&\; \varphi$   (16)

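$$
\frac{\hat{\Gamma}(x) = \hat{\tau}}{\hat{\Gamma} \vdash x : \hat{\tau} \;\&\; \emptyset}\ \text{VAR}
\qquad
\frac{\hat{\Gamma}, x{:}\hat{\tau}_1 \vdash e : \hat{\tau}_2 \;\&\; \varphi}{\hat{\Gamma} \vdash \mathtt{fn}_\pi\ x \Rightarrow e : \hat{\tau}_1 \xrightarrow{\varphi \cup \{\pi\}} \hat{\tau}_2 \;\&\; \emptyset}\ \text{ABS}
$$

$$
\frac{\hat{\Gamma} \vdash e_1 : \hat{\tau}_1 \xrightarrow{\varphi} \hat{\tau}_2 \;\&\; \varphi_1 \qquad \hat{\Gamma} \vdash e_2 : \hat{\tau}_1 \;\&\; \varphi_2}{\hat{\Gamma} \vdash e_1\ e_2 : \hat{\tau}_2 \;\&\; \varphi \cup \varphi_1 \cup \varphi_2}\ \text{APP}
$$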

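Call tracking: example

$$
\frac{x : \mathtt{int} \xrightarrow{\{Y\}} \mathtt{int} \;\vdash\; x : \mathtt{int} \xrightarrow{\{Y\}} \mathtt{int} \;\&\; \emptyset}{\vdash (\mathtt{fn}_X\ x \Rightarrow x) : (\mathtt{int} \xrightarrow{\{Y\}} \mathtt{int}) \xrightarrow{\{X\}} (\mathtt{int} \xrightarrow{\{Y\}} \mathtt{int}) \;\&\; \emptyset}
$$

$$
\vdash (\mathtt{fn}_Y\ y \Rightarrow y) : \mathtt{int} \xrightarrow{\{Y\}} \mathtt{int} \;\&\; \emptyset
$$

From these two judgments, APP gives:

$$
\vdash (\mathtt{fn}_X\ x \Rightarrow x)\ (\mathtt{fn}_Y\ y \Rightarrow y) : \mathtt{int} \xrightarrow{\{Y\}} \mathtt{int} \;\&\; \{X\}
$$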
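The example derivation can be replayed with a small checker for the three call-tracking rules. This is a sketch under simplifying assumptions that are not in the slides: parameters carry explicit annotated types (so nothing is inferred), there is no subeffecting, and expressions are encoded as hypothetical tuples `("var", x)`, `("fn", pi, x, arg_type, body)`, and `("app", e1, e2)`.

```python
from dataclasses import dataclass

INT = "int"


@dataclass(frozen=True)
class Arrow:
    """Annotated arrow type: arg --latent--> res."""
    arg: object
    latent: frozenset
    res: object


def infer(env, e):
    """Return (annotated type, effect) of expression e under environment env."""
    kind = e[0]
    if kind == "var":                                   # VAR: empty effect
        return env[e[1]], frozenset()
    if kind == "fn":                                    # ABS: body effect becomes latent
        _, pi, x, arg_type, body = e
        res, phi = infer({**env, x: arg_type}, body)
        return Arrow(arg_type, phi | {pi}, res), frozenset()
    if kind == "app":                                   # APP: latent effect is released
        fun_type, phi1 = infer(env, e[1])
        arg_type, phi2 = infer(env, e[2])
        if not isinstance(fun_type, Arrow) or fun_type.arg != arg_type:
            raise TypeError("ill-typed application")
        return fun_type.res, fun_type.latent | phi1 | phi2
    raise ValueError(f"unknown expression kind: {kind!r}")


Y_TYPE = Arrow(INT, frozenset({"Y"}), INT)              # int --{Y}--> int
FN_X = ("fn", "X", "x", Y_TYPE, ("var", "x"))           # fn_X x => x
FN_Y = ("fn", "Y", "y", INT, ("var", "y"))              # fn_Y y => y
RESULT_TYPE, RESULT_EFFECT = infer({}, ("app", FN_X, FN_Y))
```

The application has type int --{Y}--> int with effect {X}: only `fn_X` is called while evaluating the expression, and `Y` remains a latent effect of the returned function.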

Chaotic iteration
- back to data flow/reaching definitions
- goal: solve

$\vec{RD} = F(\vec{RD})$   or   $\vec{RD} \sqsupseteq F(\vec{RD})$

- $F$: monotone, finite domain
- straightforward/naive approach:
  - init: $\vec{RD}_0 = F^0(\emptyset)$
  - iterate: $\vec{RD}_{n+1} = F(\vec{RD}_n) = F^{n+1}(\emptyset)$ until stabilization
- approach to implement that: chaotic iteration
- abbreviations:

$\vec{RD} = (RD_1, \ldots, RD_{12})$
$F(\vec{RD}) = (F_1(\vec{RD}), \ldots, F_{12}(\vec{RD}))$

Chaotic iteration (for RD)
Input: the example equations for reaching definitions
Output: the least solution $\vec{RD} = (RD_1, \ldots, RD_{12})$
Method:
- step 1: initialization
    $RD_1 := \emptyset; \ldots; RD_{12} := \emptyset$
- step 2: iteration
    while $RD_j \neq F_j(RD_1, \ldots, RD_{12})$ for some $j$
    do $RD_j := F_j(RD_1, \ldots, RD_{12})$
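The method can be sketched in a few lines for the factorial example. The code below assumes the flow edges 1→2, 2→3, 3→4, 4→5, 5→3, 3→6; the names `rhs` and `chaotic_iteration` and the indexing of the twelve components by `("entry", l)` and `("exit", l)` are choices made for illustration. It sweeps over the components in a fixed order, which is one of many admissible orders.

```python
VARS = ("x", "y", "z")
LABELS = range(1, 7)
ASSIGNS = {1: "y", 2: "z", 4: "z", 5: "y", 6: "y"}  # label -> assigned variable
FLOW = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 3), (3, 6)]  # assumed flow edges
INIT = 1


def rhs(kind, l, rd):
    """Right-hand side F_j of the equation for component j = (kind, l)."""
    if kind == "entry":
        if l == INIT:
            return frozenset((x, "?") for x in VARS)
        return frozenset().union(*(rd[("exit", p)] for p, q in FLOW if q == l))
    before = rd[("entry", l)]
    if l in ASSIGNS:
        x = ASSIGNS[l]
        return frozenset(p for p in before if p[0] != x) | {(x, l)}
    return before


def chaotic_iteration():
    components = [(kind, l) for l in LABELS for kind in ("entry", "exit")]
    rd = {j: frozenset() for j in components}      # step 1: all components empty
    changed = True
    while changed:                                 # step 2: until RD_j = F_j(...) for all j
        changed = False
        for j in components:
            new = rhs(*j, rd)
            if new != rd[j]:
                rd[j] = new
                changed = True
    return rd


RD = chaotic_iteration()
```

Termination follows from monotonicity of F and the finite domain: each update can only enlarge a component. With the sweep order used here the loop stabilizes after a few passes, and `RD[("exit", 6)]` contains (x, ?), (y, 6), (z, 2), and (z, 4).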

References
[1] A. W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, 1998.
[2] F. Nielson, H.-R. Nielson, and C. L. Hankin. Principles of Program Analysis. Springer-Verlag, 1999.
[3] J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12:23–41, 1965.