Static Program Analysis Part 8 control flow analysis

Similar documents
Static Program Analysis Part 9 pointer analysis. Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Static Program Analysis Part 1 the TIP language

Static Program Analysis Part 7 interprocedural analysis

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

Static Program Analysis

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

OOPLs - call graph construction Compile-time analysis of reference variables and fields. Example

Lecture 2: Control Flow Analysis

OOPLs - call graph construction. Example executed calls

Lambda Calculus: Implementation Techniques and a Proof. COS 441 Slides 15

Interprocedural Analysis. CS252r Fall 2015

Data Flow Analysis. Program Analysis

Lecture Notes on Static Analysis

Introduction A Tiny Example Language Type Analysis Static Analysis 2009

Flow Analysis. Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM

Static Program Analysis

Reference Analyses. VTA - Variable Type Analysis

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

Analysis of Object-oriented Programming Languages

Functional programming with Common Lisp

Comp 411 Principles of Programming Languages Lecture 7 Meta-interpreters. Corky Cartwright January 26, 2018

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Static Analysis and Dataflow Analysis

Lecture Notes: Pointer Analysis

CSE 501 Midterm Exam: Sketch of Some Plausible Solutions Winter 1997

CO444H. Ben Livshits. Datalog Pointer analysis

Control Flow Analysis with SAT Solvers

Conditioned Slicing. David Jong-hoon An. May 23, Abstract

CS422 - Programming Language Design

do such opportunities arise? (i) calling library code (ii) Where programs. object-oriented Propagating information across procedure boundaries is usef

CS153: Compilers Lecture 17: Control Flow Graph and Data Flow Analysis

Typed Lambda Calculus. Chapter 9 Benjamin Pierce Types and Programming Languages

CS 242. Fundamentals. Reading: See last slide

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

Dimensions of Precision in Reference Analysis of Object-oriented Programming Languages. Outline

INF 102 CONCEPTS OF PROG. LANGS FUNCTIONAL COMPOSITION. Instructors: James Jones Copyright Instructors.

Lecture Notes: Pointer Analysis

Lecture 09: Data Abstraction ++ Parsing is the process of translating a sequence of characters (a string) into an abstract syntax tree.

Alias Analysis. Last time Interprocedural analysis. Today Intro to alias analysis (pointer analysis) CS553 Lecture Alias Analysis I 1

Restricting Grammars with Tree Automata

It is better to have 100 functions operate one one data structure, than 10 functions on 10 data structures. A. Perlis

CSC312 Principles of Programming Languages : Functional Programming Language. Copyright 2006 The McGraw-Hill Companies, Inc.

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Alloy: A Lightweight Object Modelling Notation

LECTURE 3. Compiler Phases

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Lecture Notes: Control Flow Analysis for Functional Languages

Topology I Test 1 Solutions October 13, 2008

Advanced Compiler Construction

Data Flow Analysis. Agenda CS738: Advanced Compiler Optimizations. 3-address Code Format. Assumptions

Lambda Calculus. Variables and Functions. cs3723 1

Introduction to Shape and Pointer Analysis

Trust, but Verify. Two-Phase Typing for Dynamic Languages. Panagiotis Vekris, Benjamin Cosman, Ranjit Jhala. University of California, San Diego

System Functions and their Types in Shen

Lecture Notes on Static Analysis

Static Analysis. Systems and Internet Infrastructure Security

Static Analysis of JavaScript. Ben Hardekopf

Code Optimization. Code Optimization

Data Flow Analysis. CSCE Lecture 9-02/15/2018

CIS 341 Final Examination 4 May 2017

Lecture 15 CIS 341: COMPILERS

18.3 Deleting a key from a B-tree

Program Analysis and Verification

Closures. Mooly Sagiv. Michael Clarkson, Cornell CS 3110 Data Structures and Functional Programming

A Simple SQL Injection Pattern

Points-to Analysis. Xiaokang Qiu Purdue University. November 16, ECE 468 Adapted from Kulkarni 2012

Interprocedural Analysis. Dealing with Procedures. Course so far: Terminology

A Framework for Call Graph Construction Algorithms

Extra notes on Field- sensitive points- to analysis with inclusion constraints Sept 9, 2015

Lexical Analysis (ASU Ch 3, Fig 3.1)

Calvin Lin The University of Texas at Austin

Interprocedural Analysis. Motivation. Interprocedural Analysis. Function Calls and Pointers

Polymorphism and Type Inference

More Dataflow Analysis

A Type System for Functional Traversal-Based Aspects

Homework 1. Reading. Problems. Handout 1 CS242: Autumn September

Program analysis for determining opportunities for optimization: 2. analysis: dataow Organization 1. What kind of optimizations are useful? lattice al

CS553 Lecture Generalizing Data-flow Analysis 3

Lesson 4 Typed Arithmetic Typed Lambda Calculus

Constraint-based Analysis. Harry Xu CS 253/INF 212 Spring 2013

A Brief Introduction to Scheme (II)

Conflicts in LR Parsing and More LR Parsing Types

Foundations of Dataflow Analysis

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Intermediate Representation (IR)

April 2 to April 4, 2018

Compilation 2012 Static Analysis

Closures. Mooly Sagiv. Michael Clarkson, Cornell CS 3110 Data Structures and Functional Programming

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 17: Types and Type-Checking 25 Feb 08

INF 212 ANALYSIS OF PROG. LANGS FUNCTION COMPOSITION. Instructors: Crista Lopes Copyright Instructors.

Slides for Faculty Oxford University Press All rights reserved.

CMSC 330, Fall 2018 Midterm 2

Pointer Analysis in the Presence of Dynamic Class Loading. Hind Presented by Brian Russell

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking

V Advanced Data Structures

CS558 Programming Languages

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Why Data Flow Models? Dependence and Data Flow Models. Learning objectives. Def-Use Pairs (1) Models from Chapter 5 emphasized control

Transcription:

Static Program Analysis Part 8 control flow analysis http://cs.au.dk/~amoeller/spa/ Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Agenda Control flow analysis for the -calculus The cubic framework Control flow analysis for TIP with function pointers Control flow analysis for object-oriented languages 2

Control flow complications Function pointers in TIP complicate CFG construction: several functions may be invoked at a call site this depends on the dataflow but dataflow analysis first requires a CFG Same situation for other features: higher-order functions (closures) a class hierarchy with objects and methods prototype objects with dynamic properties 3

Control flow analysis A control flow analysis approximates the CFG conservatively computes possible functions at call sites the trivial answer: all functions Control flow analysis is usually flow-insensitive: it is based on the AST the CFG is not available yet a subsequent dataflow analysis may use the CFG Alternative: use flow-sensitive analysis potentially on-the-fly, during dataflow analysis 4

CFA for the lambda calculus The pure lambda calculus E x.e (function definition) E 1 E 2 (function application) x (variable reference) Assume all -bound variables are distinct An abstract closure x abstracts the function x.e in all contexts (values of free variables) Goal: for each call site E 1 E 2 determine the possible functions for E 1 from the set { x 1, x 2,..., x n } 5

Closure analysis A flow-insensitive analysis that tracks function values: For every AST node, v, we introduce a variable v ranging over subsets of abstract closures For x.e we have the constraint x x.e For E 1 E 2 we have the conditional constraint x E 1 ( E 2 x E E 1 E 2 ) for every function x.e 6

Agenda Control flow analysis for the -calculus The cubic framework Control flow analysis for TIP with function pointers Control flow analysis for object-oriented languages 7

The cubic framework We have a set of tokens {t 1, t 2,..., t k } We have a collection of variables {x 1,..., x n } ranging over subsets of tokens A collection of constraints of these forms: t x t x y z Compute the unique minimal solution this exists since solutions are closed under intersection A cubic time algorithm exists! 8

The solver data structure Each variable is mapped to a node in a DAG Each node has a bitvector in {0,1} k initially set to all 0 s Each bit has a list of pairs of variables used to model conditional constraints The DAG edges model inclusion constraints The bitvectors will at all times directly represent the minimal solution to the constraints seen so far 9

An example graph x 1 x 2 x 3 x 4 (x 2,x 4 ) 10

Adding constraints (1/2) Constraints of the form t x: look up the node associated with x set the bit corresponding to t to 1 if the list of pairs for t is not empty, then add the edges corresponding to the pairs to the DAG y x t 0 x t 1 z (y,z) 11

Adding constraints (2/2) Constraints of the form t x y z: test if the bit corresponding to t is 1 if so, add the DAG edge from y to z otherwise, add (y,z) to the list of pairs for t x t 1 x t 1 y z x t 0 x t 0 (y,z) 12

Collapse cycles If a newly added edge forms a cycle: merge the nodes on the cycle into a single node form the union of the bitvectors concatenate the lists of pairs update the map from variables accordingly x 1 1 0 0 0 0 x,y,z y 1 0 1 0 0 1 1 1 1 0 0 1 z 0 0 1 0 0 0 (a,b) (a,b) (c,d) (c,d) 13

Propagate bitvectors Propagate the values of all newly set bits along all edges in the DAG 1 1 1 1 1 1 14

Time complexity (1/2) O(n) functions and O(n) applications, with program size n O(n) singleton constraints, O(n 2 ) conditional constraints O(n) nodes, O(n 2 ) edges, O(n) bits per node Total time for bitvector propagation: O(n 3 ) Total time for collapsing cycles: O(n 3 ) Total time for handling lists of pairs: O(n 3 ) 01 0 0 0 0 1 01 0 0 0 0 1 15

Time complexity (2/2) Adding it all up, the upper bound is O(n 3 ) This is known as the cubic time bottleneck: occurs in many different scenarios but O(n 3 /log n) is possible A special case of general set constraints: defined on sets of terms instead of sets of tokens solvable in time O(2 2 ) n 16

Agenda Control flow analysis for the -calculus The cubic framework Control flow analysis for TIP with function pointers Control flow analysis for object-oriented languages 17

CFA for TIP with function pointers For a computed function call E E( E,..., E ) we cannot immediately see which function is called A coarse but sound approximation: assume any function with right number of arguments Use CFA to get a much better result! 18

CFA constraints (1/2) Tokens are all functions {f 1, f 2,..., f k } For every AST node, v, we introduce the variable v denoting the set of functions to which v may evaluate For function definitions f(...){...}: f f For assignments x = E: E x 19

CFA constraints (2/2) For direct function calls f(e 1,..., E n ): E i a i for i=1,...,n E f(e 1,..., E n ) where f is a function with arguments a 1,..., a n and return expression E For computed function calls E(E 1,..., E n ): f E ( E i a i for i=1,...,n E (E)(E 1,..., E n ) ) for every function f with arguments a 1,..., a n and return expression E If we consider typable programs only: only generate constraints for those functions f for which the call would be type correct 20

Example program inc(i) { return i+1; } dec(j) { return j-1; } ide(k) { return k; } foo(n,f) { var r; if (n==0) { f=ide; } r = f(n); return r; } main() { var x,y; x = input; if (x>0) { y = foo(x,inc); } else { y = foo(x,dec); } return y; } 21

Generated constraints inc inc dec dec ide ide ide f f(n) r inc f n i i+1 f(n) dec f n j j-1 f(n) ide f n k k f(n) input x foo(x,inc) y foo(x,dec) y foo foo foo foo x n inc f f(n) foo(x,inc) foo foo x n dec f f(n) foo(x,dec) main main 22

Least solution inc = {inc} dec = {dec} ide = {ide} f = {inc, dec, ide} foo = {foo} main = {main} With this information, we can construct the call edges and return edges in the interprocedural CFG 23

Agenda Control flow analysis for the -calculus The cubic framework Control flow analysis for TIP with function pointers Control flow analysis for object-oriented languages 24

Simple CFA for OO (1/3) CFA in an object-oriented language: x.m(a,b,c) Which method implementations may be invoked? Full CFA is a possibility... But the extra structure allows simpler solutions 25

Simple CFA for OO (2/3) Simplest solution: select all methods named m with three arguments Class Hierarchy Analysis (CHA): consider only the part of the class hierarchy rooted by the declared type of x x 26

Simple CFA for OO (3/3) Rapid Type Analysis (RTA): restrict to those classes that are actually used in the program in new expressions x Variable Type Analysis (VTA): perform intraprocedural control flow analysis 27