Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

Similar documents
Lecture Compiler Middle-End

Global Optimization. Lecture Outline. Global flow analysis. Global constant propagation. Liveness analysis. Local Optimization. Global Optimization

More Dataflow Analysis

Flow Analysis. Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

Lecture 6 Foundations of Data Flow Analysis

Data Flow Analysis. Agenda CS738: Advanced Compiler Optimizations. 3-address Code Format. Assumptions

Lecture 4 Introduction to Data Flow Analysis

Why Data Flow Models? Dependence and Data Flow Models. Learning objectives. Def-Use Pairs (1) Models from Chapter 5 emphasized control

Compiler Optimization and Code Generation

Dependence and Data Flow Models. (c) 2007 Mauro Pezzè & Michal Young Ch 6, slide 1

Lecture 6 Foundations of Data Flow Analysis

Data Flow Analysis. CSCE Lecture 9-02/15/2018

CMSC430 Spring 2009 Midterm 2 (Solutions)

Midterm 2. CMSC 430 Introduction to Compilers Fall Instructions Total 100. Name: November 21, 2016

Dataflow analysis (ctd.)

Data Flow Analysis. Program Analysis

Compiler Optimisation

Control Flow Analysis. Reading & Topics. Optimization Overview CS2210. Muchnick: chapter 7

compile-time reasoning about the run-time ow represent facts about run-time behavior represent eect of executing each basic block

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Lecture 2. Introduction to Data Flow Analysis

CMSC430 Spring 2014 Midterm 2 Solutions

Why Global Dataflow Analysis?

CS 432 Fall Mike Lam, Professor. Data-Flow Analysis

Review; questions Basic Analyses (2) Assign (see Schedule for links)

Lecture 4. More on Data Flow: Constant Propagation, Speed, Loops

CS553 Lecture Generalizing Data-flow Analysis 3

Homework Assignment #1 Sample Solutions

(How Not To Do) Global Optimizations

Static Analysis and Dataflow Analysis

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Problem Score Max Score 1 Syntax directed translation & type

Compiler Design. Fall Data-Flow Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

We can express this in dataflow equations using gen and kill sets, where the sets are now sets of expressions.

Data-flow Analysis - Part 2

Principles of Program Analysis: Data Flow Analysis

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

Example (cont.): Control Flow Graph. 2. Interval analysis (recursive) 3. Structural analysis (recursive)

Flow Graph Theory. Depth-First Ordering Efficiency of Iterative Algorithms Reducible Flow Graphs

Live Variable Analysis. Work List Iterative Algorithm Rehashed

Introduction to Machine-Independent Optimizations - 4

Program Static Analysis. Overview

Data-flow Analysis. Y.N. Srikant. Department of Computer Science and Automation Indian Institute of Science Bangalore

Dependence and Data Flow Models. (c) 2007 Mauro Pezzè & Michal Young Ch 6, slide 1

Compiler Optimisation

We can express this in dataflow equations using gen and kill sets, where the sets are now sets of expressions.

EECS 583 Class 6 Dataflow Analysis

CS153: Compilers Lecture 17: Control Flow Graph and Data Flow Analysis

When we eliminated global CSE's we introduced copy statements of the form A:=B. There are many

Statements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control

Register Allocation. CS 502 Lecture 14 11/25/08

CS5363 Final Review. cs5363 1

Compilation 2012 Static Analysis

Foundations of Dataflow Analysis

A main goal is to achieve a better performance. Code Optimization. Chapter 9

Control-Flow Analysis

Thursday, December 23, The attack model: Static Program Analysis

Program analysis for determining opportunities for optimization: 2. analysis: dataow Organization 1. What kind of optimizations are useful? lattice al

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Control Flow Analysis & Def-Use. Hwansoo Han

Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

CS 406/534 Compiler Construction Putting It All Together

20b -Advanced-DFA. J. L. Peterson, "Petri Nets," Computing Surveys, 9 (3), September 1977, pp

Page # 20b -Advanced-DFA. Reading assignment. State Propagation. GEN and KILL sets. Data Flow Analysis

Intermediate representation

EECS 583 Class 6 More Dataflow Analysis

Declarative Intraprocedural Flow Analysis of Java Source Code

CSE 501 Midterm Exam: Sketch of Some Plausible Solutions Winter 1997

Class 6. Review; questions Assign (see Schedule for links) Slicing overview (cont d) Problem Set 3: due 9/8/09. Program Slicing

4/24/18. Overview. Program Static Analysis. Has anyone done static analysis? What is static analysis? Why static analysis?

Lecture 23 CIS 341: COMPILERS

EECS 583 Class 6 Dataflow Analysis

Data-Flow Analysis Foundations

Contents of Lecture 2

Lecture 3 Local Optimizations, Intro to SSA

Calvin Lin The University of Texas at Austin

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006)

Interprocedural Analysis. Motivation. Interprocedural Analysis. Function Calls and Pointers

Program Analysis. Readings

Static Analysis. Systems and Internet Infrastructure Security

Introduction. Data-flow Analysis by Iteration. CSc 553. Principles of Compilation. 28 : Optimization III

CS 6110 S14 Lecture 38 Abstract Interpretation 30 April 2014

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Static Program Analysis Part 9 pointer analysis. Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Advanced Program Analyses for Object-oriented Systems

Lecture 5. Data Flow Analysis

What If. Static Single Assignment Form. Phi Functions. What does φ(v i,v j ) Mean?

Compiler Construction 2016/2017 Loop Optimizations

Dataflow Analysis. Xiaokang Qiu Purdue University. October 17, 2018 ECE 468

CSC D70: Compiler Optimization LICM: Loop Invariant Code Motion

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Construction 2009/2010 SSA Static Single Assignment Form

Department of Computer Sciences CS502. Purdue University is an Equal Opportunity/Equal Access institution.

Midterm 2. CMSC 430 Introduction to Compilers Spring Instructions Total 100. Name: April 18, 2012

Control Flow Analysis

Lecture Notes: Dataflow Analysis Examples

Programming Language Processor Theory

Automatic Determination of May/Must Set Usage in Data-Flow Analysis

Transcription:

Compiler Structure Source Code Abstract Syntax Tree Control Flow Graph Object Code CMSC 631 Program Analysis and Understanding Fall 2003 Data Flow Analysis Source code parsed to produce AST AST transformed to CFG Simpler representation of the program Fewer forms to reason about Data flow analysis operates on control flow graph (and other intermediate representations) 2 Control-Flow Graph Data Flow Analysis A framework for proving facts about programs Reasons about lots of little facts Little or no interaction between facts Works best on properties about how program computes Based on all paths through program Including infeasible paths 3 4 Available Expressions Data Flow Facts An expression e is available at program point p if e is computed on every path to p, and the value of e has not changed since the last time e is computed on p Optimization If an expression is available, need not be recomputed - (At least, if it s still in a register somewhere) Is expression e available? Facts: a + b is available a * b is available is available 5 6

Gen and Kill What is the effect of each statement on the set of facts? Stmt Gen Kill a + b Data Flow Equations Let s be a statement succ(s) = { immediate successor statements of s } pred(s) = { immediate predecessor statements of s} In(s) = program point just before executing s Out(s) = program point just after executing s a * b In(s) = s pred(s) Out(s ) a + b, a * b, Out(s) = Gen(s) (In(s) - Kill(s)) Note: These are also called transfer functions 7 8 Computing Available Expressions Liveness Analysis A variable v is live at program point p if {a + b, a * b} {a + b, a * b, } v will be used on some execution path originating from p... before v is overwritten Optimization If a variable is not live, no need to keep it in a register If variable is dead at assignment, can eliminate assignment 9 10 Data Flow Equations Gen and Kill Available expressions is a forward must analysis Data flow propagate in same dir as CFG edges What is the effect of each statement on the set of facts? Expr is available only if available on all paths Liveness is a backward may problem To know if variable live, need to look at future uses Variable is live if available on some path In(s) = Gen(s) (Out(s) - Kill(s)) Out(s) = s succ(s) In(s ) Stmt Gen Kill a, b x a, b y y, a, b a a 11 12

Computing Live Variables Very Busy Expressions {a, b} {x, a, b} {x, y, a, b} {y, a, b} {y, a, b} {x, y, a, b} {x} An expression e is very busy at point p if On every path from p, e is evaluated before the value of e is changed Optimization Can hoist very busy expression computation What kind of problem? Forward or backward? May or must? backward must 13 14 Reaching Definitions Space of Data Flow Analyses A definition of a variable v is an assignment to v A definition of variable v reaches point p if There is no intervening assignment to v Also called def-use information What kind of problem? Forward or backward? May or must? forward may Forward Backward May Reaching definitions Live variables Must Available expressions Very busy expressions Most data flow analyses can be classified this way A few don t fit: bidirectional analysis Lots of literature on data flow analysis 15 16 Partial Orders Lattices A partial order is a pair (P, ) such that P P is reflexive: x x A partial order is a lattice if and are defined on any set S: is the meet or greatest lower bound operation: is anti-symmetric: x y and y x x = y is transitive: x y and y z x z - x y x and x y y - if z x and z y, then z x y is the join or least upper bound operation: - x x y and y x y - if x z and y z, then x y z 17 18

Lattices (cont d) A finite partial order is a lattice if meet and join exist for every pair of elements A lattice has unique elements and such that x = x = x x = x x = Data Flow Facts and Lattices Typically, data flow facts form a lattice Example: Available expressions a+b, a*b, y > a+b a+b, a*b a+b, y > a+b a*b, y > a+b In a lattice, x y iff x y = x x y iff x y = y a*b a+b (none) y > a+b 19 20 Forward May Data Flow Algorithm Monotonicity Out(s) = Gen(s) for all statements s Or, if you like, Out(s) = Top W := { all statements } repeat Take s from W In(s) := s pred(s) Out(s ) temp := Gen(s) (In(s) - Kill(s)) if (temp!= Out(s)) { - Out(s) := temp - W := W succ(s) } until W = (worklist) A function f on a partial order is monotonic if Easy to check that operations to compute In and Out are monotonic s pred(s) Out(s) Gen(s) (In(s) - Kill(s)) x y f(x) f(y) 21 22 Forward May Data Flow Algorithm Termination Out(s) = Gen(s) for all statements s Or, if you like, Out(s) = Top W := { all statements } repeat (worklist) Take s from W temp := f s ( s pred(s) Out(s )) if (temp!= Out(s)) { - Out(s) := temp - W := W succ(s) } until W = We know the algorithm terminates because The lattice has finite height The operations to compute In and Out are monotonic On every iteration, we remove a statement from the worklist and/or move down the lattice 23 24

Distributive Data Flow Problems Benefit of Distributivity By monotonicity, we also have f(x y) f(x) f(y) Joins lose no information f g A function f is distributive if f(x y) = f(x) f(y) k(h(f( ) g( ))) = k(h(f( )) h(g( ))) = k(h(f( ))) k(h(g( ))) h k 25 26 Accuracy of Data Flow Analysis What Problems are Distributive? Ideally, we would like to compute the meet over all paths (MOP) solution: Let f s be the transfer function for statement s If p is a path {s 1,..., s n }, let f p = f n ;...;f 1 Let path(s) be the set of paths from the entry to s If a data flow problem is distributive, then solving the data flow equations in the standard way yields the MOP solution MOP(s) = p path(s) f p( ) 27 Analyses of how the program computes Live variables Available expressions Reaching definitions Very busy expressions All Gen/Kill problems are distributive 28 A Non-Distributive Example Constant propagation Practical Implementation Data flow facts = assertions that are true or false at a program point x := 1 x := 2 y := 2 y := 1 Represent set of facts as bit vector Fact i represented by bit i z := x + y Intersection = bitwise and, union = bitwise or, etc In general, analysis of what the program computes in not distributive Only a constant factor speedup But very useful in practice 29 30

Basic Blocks A basic block is a sequence of statements s.t. No statement except the last in a branch There are no branches to any statement in the block except the first In practical data flow implementations, Compute Gen/Kill for each basic block - Compose transfer functions Store only In/Out for each basic block Typical basic block ~5 statements Complexity Data flow algorithm works for any order in which nodes are visited Let G = (V, E) be the CFG Let k be the height of the lattice Running time is O(k E ) If k non-trivial, can do better by choosing order carefully 31 32 Order Matters Order Matters Cycles Assume forward data flow problem If G has cycles, visit in reverse postorder Order from depth-first search If G acyclic, visit in topological order Visit head before tail of edge Running time O( E ) No matter what size the lattice 33 Let Q = max # back edges on cycle-free path Nesting depth Back edge is from node to ancestor on DFS tree Then if Running time is O((Q + 1) E ) x.f(x) x (sufficient, but not necessary) - Slightly better bound than Kam + Ullman paper - Note direction of req t depends on top vs. bottom 34 Another Approach: Elimination Elimination Methods: Conditionals Recall in practice, one transfer function per basic block If IfThenElse Then Else Why not generalize this idea beyond a basic block? Collapse larger constructs into smaller ones, combining data flow equations Eventually program collapsed into a single node! Expand out back to original constructs, rebuilding information z f ite = (f then f if ) (f else f if ) Out(if) = f if (In(ite))) Out(then) = (f then f if )(In(ite))) Out(else) = (f else f if )(In(ite))) z 35 36

Non-Reducible Flow Graphs Elimination methods only work on reducible flow graphs Ones that can be collapsed Standard constructs yield only reducible flow graphs Unrestricted goto can yield non-reducible graphs B A C Flow-Sensitivity Data flow analysis is flow-sensitive The order of statements is taken into account I.e., we keep track of facts per program point Alternative: Flow-insensitive analysis Analysis the same regardless of statement order Standard example: types - /* x : int */ x :=... /* x : int */ z w 37 38 Terminology Review Data Flow Analysis and Functions Must vs. May (Not always followed in literature) Forwards vs. Backwards Flow-sensitive vs. Flow-insensitive Distributive vs. Non-distributive What happens at a function call? Lots of proposed solutions in data flow analysis literature In practice, only analyze one procedure at a time Consequences Call to function kills all data flow facts May be able to improve depending on language, e.g., function call may not affect locals 39 40 More Terminology Data Flow Analysis and The Heap An analysis that models only a single function at a time is intraprocedural An analysis that takes multiple functions into account is interprocedural An analysis that takes the whole program into account is...guess? Note: global analysis means more than one basic block, but still within a function 41 Data Flow is good at analyzing local variables But what about values stored in the heap? Not modeled in traditional data flow In practice: *x := e Assume all data flow facts killed (!) Or, assume x may affect any variable whose address has not been taken In general, hard to analyze pointers 42