More Dataflow Analysis


Steps to building an analysis. Step 1: choose a lattice. Step 2: choose the direction of dataflow (forward or backward). Step 3: create transfer functions. Step 4: choose a confluence operator (i.e., what to do at merges). Let's walk through these steps for a new analysis.

Liveness analysis. Which variables are live at a particular program point? Used all over the place in compilers: register allocation, loop optimizations.

Choose lattice. What do we want to know? At each program point, we want to maintain the set of variables that are live. Lattice elements: sets of variables. Natural choice of lattice: the powerset of the variables, ordered by inclusion. For variables a, b, c the elements are { }, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, and {a,b,c}.
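
As a concrete illustration, here is a minimal sketch of this powerset lattice in Python (the helper names leq, join, and meet are made up for this example; this is not code from the lecture):

from itertools import combinations

variables = ("a", "b", "c")

# Every element of the powerset lattice over `variables`.
lattice = [frozenset(c)
           for r in range(len(variables) + 1)
           for c in combinations(variables, r)]

# The ordering is subset inclusion; join is union, meet is intersection.
def leq(x, y):      # x is below (or equal to) y in the lattice
    return x <= y

def join(x, y):     # least upper bound
    return x | y

def meet(x, y):     # greatest lower bound
    return x & y

print([set(v) for v in lattice])   # { } up through {a, b, c}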

Choose dataflow direction. A variable is live if it is used later in the program without being redefined. At a given program point, we want to know information about what happens later in the program. This means that liveness is a backwards analysis. Recall that we did liveness backwards when we looked at single basic blocks.

Create transfer functions. What do we do for a statement like x = y + z? If x was live before (i.e., live after the statement), it isn't now (i.e., it is not live before the statement). If y and z were not live before, they are now. What about x = x?

Let's generalize transfer functions. For any statement s, we can look at which live variables are killed, and which new variables are made live (generated). Which variables are killed in s? The variables that are defined in s: DEF(s). Which variables are made live in s? The variables that are used in s: USE(s). If the set of variables that are live after s is X, what is the set of variables live before s? T_s(X) = use(s) ∪ (X − def(s))
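
A direct transcription of this transfer function as code (a sketch; the Stmt representation with defs and uses fields is an assumption made for the example):

from dataclasses import dataclass

@dataclass
class Stmt:
    defs: frozenset   # variables written by the statement, DEF(s)
    uses: frozenset   # variables read by the statement, USE(s)

def transfer(s, live_after):
    # T_s(X) = use(s) ∪ (X − def(s)): a variable is live before s if s uses it,
    # or if it was live after s and s does not redefine it.
    return s.uses | (live_after - s.defs)

# x = y + z kills x and generates y and z:
print(transfer(Stmt(defs=frozenset({"x"}), uses=frozenset({"y", "z"})),
               frozenset({"x", "w"})))    # y, z, and w are live before

# x = x: x stays live before the statement, because the statement uses it:
print(transfer(Stmt(defs=frozenset({"x"}), uses=frozenset({"x"})),
               frozenset({"x"})))         # x is live before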

Dealing with aliases. Aliases, as usual, cause problems. Consider:
int x, y, r, s;
int *z, *w;
if (...) z = &y; else z = &x;
if (...) w = &r; else w = &s;
*z = *w; // which variable is defined? which is used?
What should USE(*z = *w) and DEF(*z = *w) be? Keep in mind: the goal is to get a list of variables that may be live at a program point. For now, assume there is no aliasing.

Dealing with function calls. A similar problem to aliases:
int foo(int &x, int &y); // pass by reference!
void main() { int x, y, z; z = foo(x, y); }
Simple solution: assume functions can do anything (redefine variables, use variables), so DEF(foo()) is { } and USE(foo()) is V. Real solution: interprocedural analysis, which determines what variables are used and defined in foo.

Choose confluence operator. What happens at a merge point? The variables live in to a merge point are the variables that are live along either outgoing branch (the slide's figure shows a merge whose successor statements assign to y and to x). Confluence operator: set union (∪) of the live sets of all outgoing edges: T_merge = ∪_{X ∈ succ(merge)} X
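
A minimal sketch of this confluence step (the function name is made up for the example):

def merge_live(successor_live_sets):
    # Liveness confluence: a variable is live into the merge if it is live
    # along any outgoing edge, so take the union of the successors' sets.
    result = set()
    for live in successor_live_sets:
        result |= live
    return result

print(merge_live([{"x", "y"}, {"y", "w"}]))   # x, y, and w are live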

How do we initialize the analysis? At the end of the program, we know no variables are live, so the value at the exit point is { }. What if we're analyzing a single function? We need to make a conservative assumption about what may be live. What about elsewhere in the program? We should initialize the other sets to { }.

(Worked example: a CFG with nodes READ(Z), READ(N), X = 2, a loop containing X = X + Z and the test X < N?, and PRINT(X); every live set is initialized to { } and then filled in by the analysis.)

An alternate approach. Dataflow analyses like live-variable analysis are bit-vector analyses: they are even more structured than general dataflow analyses. Consistent lattice: a powerset. Consistent transfer functions. Many sources only talk about bit-vector dataflow.

Bit-vector lattices. Consider a single element, V, of the powerset(S) lattice. Each item in S either appears in V or does not, so it can be represented using a single bit, and V can be represented as a bit vector: {a, b, c} = <1, 1, 1>, { } = <0, 0, 0>, {b, c} = <0, 1, 1>. The lattice's meet and join (which are just ∩ and ∪) are simply bitwise and and or, respectively.
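
A small sketch of the bit-vector encoding (the helper names and the fixed ordering of S are assumptions): with the elements of S numbered, each subset becomes an integer bit mask, and meet and join become single bitwise operations.

S = ["a", "b", "c"]                       # fixed ordering of the underlying set
index = {v: i for i, v in enumerate(S)}

def to_bits(subset):
    # Encode a subset of S as an integer bit vector.
    bits = 0
    for v in subset:
        bits |= 1 << index[v]
    return bits

def to_set(bits):
    return {v for v, i in index.items() if bits & (1 << i)}

x = to_bits({"b", "c"})                   # <0, 1, 1>
y = to_bits({"a", "b"})                   # <1, 1, 0>
print(to_set(x | y))                      # join (union): a, b, c
print(to_set(x & y))                      # meet (intersection): b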

Eliminating merge nodes. Many dataflow presentations do not use explicit merge nodes in the CFG. How do we handle this? Problem: now a node may be both a statement (e.g., X = X + Z) and a merge point. Solution: compose the confluence operator and the transfer functions. Note: non-merge nodes have just one successor, so this equation works for all nodes! T(s) = use(s) ∪ ((∪_{X ∈ succ(s)} X) − def(s))

Simplifying matters. T(s) = use(s) ∪ ((∪_{X ∈ succ(s)} X) − def(s)). Let's split this up into two different sets. OUT(s): the set of variables that are live immediately after the statement is executed. IN(s): the set of variables that are live immediately before the statement is executed. IN(s) = use(s) ∪ (OUT(s) − def(s)) OUT(s) = ∪_{t ∈ succ(s)} IN(t)

Generalizing. USE(s) are the variables that become live due to a statement: they are generated by this statement. DEF(s) are the variables that stop being live due to a statement: they are killed by this statement. IN(s) = gen(s) ∪ (OUT(s) − kill(s)) OUT(s) = ∪_{t ∈ succ(s)} IN(t)

Bit-vector analyses. A bit-vector analysis is any analysis that operates over the powerset lattice, ordered by ⊆ and with ∩ and ∪ as its meet and join, and that has transfer functions that can be written in the form IN(s) = gen(s) ∪ (OUT(s) − kill(s)), OUT(s) = ∪_{t ∈ succ(s)} IN(t), where gen and kill depend on the statement but not on IN or OUT. Things are a little different for forward analyses, and some analyses use ∩ instead of ∪.
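
Putting the equations together, here is a sketch of a generic iterative solver for analyses of this shape (the CFG representation, the gen/kill maps, and the meet_is_union flag are assumptions made for this example, not code from the lecture):

def solve(nodes, succ, gen, kill, meet_is_union=True, all_facts=frozenset()):
    # Backward bit-vector solver:
    #   OUT(s) = meet over t in succ(s) of IN(t)
    #   IN(s)  = gen(s) ∪ (OUT(s) − kill(s))
    # For a forward analysis, pass predecessors in place of succ and read
    # IN/OUT with their roles swapped.
    # "Any path" (∪) analyses start at { }; "all paths" (∩) analyses start
    # at the full set, so the first intersection does not destroy facts.
    init = frozenset() if meet_is_union else frozenset(all_facts)
    in_ = {n: init for n in nodes}
    out = {n: init for n in nodes}

    changed = True
    while changed:
        changed = False
        for n in nodes:
            if succ[n]:
                sets = [in_[t] for t in succ[n]]
                merged = (frozenset.union(*sets) if meet_is_union
                          else frozenset.intersection(*sets))
            else:
                merged = frozenset()      # boundary node: no facts
            new_in = frozenset(gen[n] | (merged - kill[n]))
            if merged != out[n] or new_in != in_[n]:
                out[n], in_[n] = merged, new_in
                changed = True
    return in_, out

For liveness, gen(s) = USE(s), kill(s) = DEF(s), and meet_is_union=True; an "all paths" backward analysis such as very busy expressions would run the same loop with meet_is_union=False.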

Reaching definitions. What definitions of a variable reach a particular program point? A definition of variable x from statement s reaches a statement t if there is a path from s to t on which x is not redefined. Especially important if x is used in t. Used to build def-use chains and use-def chains, which are key building blocks of other analyses. Used to determine dependences: if x is defined in s and that definition reaches t, then there is a flow dependence from s to t. We used this to determine whether statements were loop invariant: all definitions that reach an expression must originate from outside the loop, or themselves be invariant.

Creating a reaching-def analysis. Can we use a powerset lattice? At each program point, we want to know which definitions have reached that point, so we can use the powerset of the set of definitions in the program. If V is the set of variables and S is the set of program statements, a definition is d ∈ V × S, written as a tuple <v, s>. How big is this set? There are at most |V| × |S| definitions.

Forward or backward? What do you think?

Choose confluence operator. Remember: we want to know if a definition may reach a program point. What happens if we are at a merge point and a definition reaches from one branch but not the other? We don't know which branch is taken! We should union the two sets: any of those definitions can reach. We want to avoid getting too many reaching definitions, so we should start the sets at { }.

Transfer functions for RD. This is a forward analysis, so we need a slightly different formulation: merged data flows into a statement. IN(s) = ∪_{t ∈ pred(s)} OUT(t) OUT(s) = gen(s) ∪ (IN(s) − kill(s)) What are gen and kill? gen(s): the set of definitions that may occur at s; e.g., gen(s1: x = e) is {<x, s1>}. kill(s): all previous definitions of variables that are definitely redefined by s; e.g., kill(s1: x = e) is <x, *>.
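
A sketch of how gen and kill for reaching definitions might be built (the representation of statements as (label, defined variable) pairs is hypothetical):

# Each statement is (label, variable it defines); a definition is a <var, label> pair.
stmts = [("s1", "x"), ("s2", "y"), ("s3", "x")]

all_defs = {(var, label) for label, var in stmts}

gen, kill = {}, {}
for label, var in stmts:
    gen[label] = {(var, label)}                                       # the definition made at s
    kill[label] = {d for d in all_defs if d[0] == var} - gen[label]   # <var, *> made elsewhere

print(gen["s3"])    # the definition <x, s3>
print(kill["s3"])   # the other definition of x, <x, s1>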

Available expressions. We've seen this one before. What is the lattice? The powerset of all expressions appearing in a procedure. Forward or backward? Confluence operator?

Transfer functions for meet. What do the transfer functions look like if we are doing a meet (intersection)? IN(s) = ∩_{t ∈ pred(s)} OUT(t) OUT(s) = gen(s) ∪ (IN(s) − kill(s)) gen(s): expressions that must be computed in this statement. kill(s): expressions that use variables that may be defined in this statement. Note the difference between these sets and the sets for reaching definitions or liveness. Insight: gen and kill must never lead to incorrect results. We must not decide an expression is available when it isn't, but it is OK to be safe and say it isn't. We must not decide a definition doesn't reach, but it is OK to overestimate and say it does.
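
A sketch of the corresponding gen/kill construction for available expressions (the tuple encoding of an expression as (left operand, operator, right operand) is an assumption made for the example):

def avail_gen_kill(defined_var, exprs_computed, all_exprs):
    # kill(s): every expression that uses the variable this statement defines.
    kill = {e for e in all_exprs if defined_var in (e[0], e[2])}
    # gen(s): expressions computed here that are not immediately killed.
    gen = exprs_computed - kill
    return gen, kill

all_exprs = {("y", "+", "z"), ("x", "+", "1")}
# x = y + z: generates y + z and kills x + 1 (it mentions x, which is redefined).
print(avail_gen_kill("x", {("y", "+", "z")}, all_exprs))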

Analysis initialization. How do we initialize the sets? If we start with everything initialized to { }, we compute the smallest sets; if we start with everything initialized to the full set, we compute the largest. Which do we want? It depends! Reaching definitions: a definition that may reach this point, so we want to have as few reaching definitions as possible. Available expressions: an expression that was definitely computed earlier, so we want to have as many available expressions as possible. Rule of thumb: if the confluence operator is ∪, start with { }; if it is ∩, start with the full set.

Analysis initialization (II). The set at the entry of the program (for forward analyses) or the exit of the program (for backward analyses) may be different. One way of looking at this: the start statement and end statement have their own transfer functions. General rule for bit-vector analyses: there is no information at the beginning of the analysis, so the first set is always { }.

Very busy expressions. An expression is very busy if it is computed on every path that leads from a program point. Why does this matter? We can calculate very busy expressions early without wasting computation (since the expression is used at least once on every outgoing path), and this can save space. They are good candidates for loop-invariant code motion.

Very busy expressions Lattice? Direction? Confluence operator? Initialization? Transfer functions? Gen? Kill?

Four types of dataflow. An analysis can be either forward or backward, and either over all paths or over any path. All paths: merges consider values from all paths. Any path: merges consider values from any path.

                 Forward                  Backward
All paths:       available expressions    very busy expressions
Any path:        reaching definitions     liveness analysis