Dataflow Analysis. Xiaokang Qiu Purdue University. October 17, 2018 ECE 468

Similar documents
Control flow graphs and loop optimizations. Thursday, October 24, 13

Data Flow Analysis. Program Analysis

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

Data Flow Analysis. Agenda CS738: Advanced Compiler Optimizations. 3-address Code Format. Assumptions

CFG (Control flow graph)

More Dataflow Analysis

Global Optimization. Lecture Outline. Global flow analysis. Global constant propagation. Liveness analysis. Local Optimization. Global Optimization

Lecture 3 Local Optimizations, Intro to SSA

(How Not To Do) Global Optimizations

CSE 501 Midterm Exam: Sketch of Some Plausible Solutions Winter 1997

Control Flow Analysis & Def-Use. Hwansoo Han

Compiler Construction 2009/2010 SSA Static Single Assignment Form

Introduction to Machine-Independent Optimizations - 6

Goals of Program Optimization (1 of 2)

Program analysis for determining opportunities for optimization: 2. analysis: dataow Organization 1. What kind of optimizations are useful? lattice al

Control Flow Analysis

Sta$c Single Assignment (SSA) Form

Calvin Lin The University of Texas at Austin

CS153: Compilers Lecture 17: Control Flow Graph and Data Flow Analysis

Introduction to Code Optimization. Lecture 36: Local Optimization. Basic Blocks. Basic-Block Example

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

Intermediate representation

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

Data Flow Analysis. CSCE Lecture 9-02/15/2018

Conditional Elimination through Code Duplication

Intermediate Code & Local Optimizations

Tour of common optimizations

Optimizations. Optimization Safety. Optimization Safety CS412/CS413. Introduction to Compilers Tim Teitelbaum

Control-Flow Analysis

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

Compiler Optimization Techniques

CS553 Lecture Generalizing Data-flow Analysis 3

Dataflow Analysis 2. Data-flow Analysis. Algorithm. Example. Control flow graph (CFG)

Dataflow analysis (ctd.)

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Intermediate Code & Local Optimizations. Lecture 20

MIT Introduction to Program Analysis and Optimization. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Control Flow Analysis. Reading & Topics. Optimization Overview CS2210. Muchnick: chapter 7

Using Static Single Assignment Form

Constant Propagation with Conditional Branches

Homework Assignment #1 Sample Solutions

The Static Single Assignment Form:

Multi-dimensional Arrays

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Combining Optimizations: Sparse Conditional Constant Propagation

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

Programming Language Processor Theory

Register Allocation. CS 502 Lecture 14 11/25/08

Program Static Analysis. Overview

Topic 9: Control Flow

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

COP5621 Exam 4 - Spring 2005

Compiler Optimisation

CSE Section 10 - Dataflow and Single Static Assignment - Solutions

CS577 Modern Language Processors. Spring 2018 Lecture Optimization

Midterm 2. CMSC 430 Introduction to Compilers Fall Instructions Total 100. Name: November 11, 2015

Statements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control

More Code Generation and Optimization. Pat Morin COMP 3002

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

Other Forms of Intermediate Code. Local Optimizations. Lecture 34

Administrative. Other Forms of Intermediate Code. Local Optimizations. Lecture 34. Code Generation Summary. Why Intermediate Languages?

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

Combining Analyses, Combining Optimizations - Summary

Lecture Compiler Middle-End

Static Analysis and Dataflow Analysis

Sparse Conditional Constant Propagation

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

CIS 341 Final Examination 4 May 2018

Contents of Lecture 2

Principles of Compiler Design

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

CODE GENERATION Monday, May 31, 2010

Intermediate Representations Part II

Calvin Lin The University of Texas at Austin

A Gentle Introduction to Program Analysis

Sparse Conditional Constant Propagation

Lecture 21 CIS 341: COMPILERS

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

MIT Introduction to Dataflow Analysis. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Data-flow Analysis - Part 2

Lecture Notes: Widening Operators and Collecting Semantics

Topic I (d): Static Single Assignment Form (SSA)

Sparse Conditional Constant Propagation

Principle of Compilers Lecture VIII: Intermediate Code Generation. Alessandro Artale

Lecture 4 Introduction to Data Flow Analysis

Operational Semantics of Cool

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

A Propagation Engine for GCC

Impossible Programs. Tom Stuart

Flow Control. CSC215 Lecture

Compiler Optimization and Code Generation

Code Optimization. Code Optimization

CSC D70: Compiler Optimization

Compilers. Optimization. Yannis Smaragdakis, U. Athens

CS 426 Fall Machine Problem 4. Machine Problem 4. CS 426 Compiler Construction Fall Semester 2017

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

Challenges in the back end. CS Compiler Design. Basic blocks. Compiler analysis

20b -Advanced-DFA. J. L. Peterson, "Petri Nets," Computing Surveys, 9 (3), September 1977, pp

Introduction to Optimization Local Value Numbering

Transcription:

Dataflow Analysis Xiaokang Qiu Purdue University ECE 468 October 17, 2018

Program optimizations So far we have talked about different kinds of optimizations Peephole optimizations Local common sub-expression elimination What about global optimizations Optimizations across multiple basic blocks (usually a whole procedure) Not just a single loop

Useful optimizations Common subexpression elimination (global) Need to know which expressions are available at a point Dead code elimination Need to know if the effects of a piece of code are never needed, or if code cannot be reached Constant folding Need to know if variable has a constant value So how do we get this information?

Dataflow analysis Framework for doing compiler analyses to drive optimization Works across basic blocks Examples Constant propagation: determine which variables are constant Liveness analysis: determine which variables are live Available expressions: determine which expressions have valid computed values Reaching definitions: determine which definitions could reach a use

Example: constant propagation Goal: determine when variables take on constant values Why? Can enable many optimizations Constant folding x = 1; y = x + 2; if (x > z) then y = 5... y... Create dead code x = 1; y = 3; if (1 > z) then y = 5... y... x = 1; y = x + 2; if (y > x) then y = 5... y... x = 1; y = 3; //dead code if (true) then y = 5 //simplify!... y...

How can we find constants? Run program and see which variables are constant? Problem: variables can be constant with some inputs, not others need an approach that works for all inputs! Problem: program can run forever (infinite loops?) need an approach that we know will finish Idea: run program symbolically Essentially, keep track of whether a variable is constant or not constant (but nothing else)

Overview of algorithm Build control flow graph Perform symbolic evaluation Keep track of whether variables are constant or not Replace constant-valued variable uses with their values, try to simplify expressions and control flow

Control Flow Graph

x = 1; y = x + 2; if (y > x) then y = 5;... y... Build CFG

Moving beyond basic blocks Up until now, we have focused on single basic blocks What do we do if we want to consider larger units of computation Whole procedures? Whole program? Idea: capture control flow of a program How control transfers between basic blocks due to: Conditionals Loops

Representation Use standard three-address code Jump targets are labeled Also label beginning/end of functions Want to keep track of targets of jump statements Any statement whose execution may immediately follow execution of jump statement Explicit targets: targets mentioned in jump statement Implicit targets: statements that follow conditional jump statements The statement that gets executed if the branch is not taken

Running example A = 4 t1 = A * B repeat { t2 = t1/c if (t2 W) { M = t1 * k t3 = M + I } H = I M = t3 - H } until (T3 0)

Running example 1 A = 4 2 t1 = A * B 3 L1: t2 = t1 / C 4 if t2 < W goto L2 5 M = t1 * k 6 t3 = M + I 7 L2: H = I 8 M = t3 - H 9 if t3 0 goto L3 10 11 L3: goto L1 halt

CFG for running example How do we build this automatically?

Control flow graphs Divides statements into basic blocks Basic block: a maximal sequence of statements I0, I1, I2,..., In such that if Ij and Ij+1 are two adjacent statements in this sequence, then The execution of Ij is always immediately followed by the execution of Ij+1 The execution of Ij+1 is always immediate preceded by the execution of Ij Edges between basic blocks represent potential flow of control

Constructing a CFG How to divide statements into basic blocks? Identify leaders: first statement of a basic block In program order, construct a block by appending subsequent statements up to, but not including, the next leader How to identify leaders? First statement in the program Explicit target of any conditional or unconditional branch Implicit target of any branch

Running example 1 A = 4 2 t1 = A * B 3 L1: t2 = t1 / C 4 if t2 < W goto L2 5 M = t1 * k 6 t3 = M + I 7 L2: H = I 8 M = t3 - H 9 if t3 0 goto L3 10 11 L3: goto L1 halt Leaders = Basic blocks =

Partitioning 1 A = 4 2 t1 = A * B 3 L1: t2 = t1 / C 4 if t2 < W goto L2 5 M = t1 * k 6 t3 = M + I 7 L2: H = I 8 M = t3 - H 9 if t3 0 goto L3 10 11 L3: goto L1 halt Leaders = {1, 3, 5, 7, 10, 11} Basic blocks = { {1, 2}, {3, 4}, {5, 6}, {7, 8, 9}, {10}, {11} }

Partitioning algorithm Input: set of statements, stat(i) = i th statement in input Output: set of leaders, set of basic blocks where block(x) is the set of statements in the block with leader x Algorithm leaders = {1} //Leaders always includes first statement for i = 1 to n // n = number of statements if stat(i) is a branch, then leaders = leaders all potential targets end for worklist = leaders while worklist not empty do x = remove earliest statement in worklist block(x) = {x} for (i = x + 1; i n and i leaders; i++) block(x) = block(x) {i} end for end while

Putting edges in CFG

Putting edges in CFG There is a directed edge from B1 to B2 if There is a branch from the last statement of B1 to the first statement (leader) of B2 B2 immediately follows B1 in program order and B1 does not end with an unconditional branch Input: block, a sequence of basic blocks Output: The CFG for i = 1 to block x = last statement of block(i) if stat(x) is a branch, then for each explicit target y of stat(x) create edge from block i to block y end for if stat(x) is not unconditional then create edge from block i to block i+1 end for

Discussion In constant analysis, we will also consider the statementlevel CFG, where each node is a statement rather than a basic block Either kind of graph is referred to as a CFG In statement-level CFG, we often use a node to explicitly represent merging of control Control merges when two different CFG nodes point to the same node Note: if input language is structured, front-end can generate basic block directly GOTO considered harmful

Statement level CFG

Reachability B2 is reachable from B1: there is a path from B1 to B2 in the CFG Useful in optimization Not reachable from the entry block: dead code Not reachable to the exit block: infinite loop Need to check the feasibility of the path conditions Undecidable in general: Halting problem Domination Must-reach relationship B1 dominates B2: Every path from entry to B2 pass through B1

Procedure calls More difficult to generate: why? Function pointers Dynamic dispatch Higher-order functions Procedure Call Graphs An edge from P to Q: P can invoke Q by its name

Symbolic Evaluation

Symbolic evaluation Idea: replace each value with a symbol constant (specify which), no information, definitely not constant Can organize these possible values in a lattice Set of possible values, arranged from least information to most information

Symbolic evaluation Evaluate expressions symbolically: eval(e, Vin) If e evaluates to a constant, return that value. If any input is (or ), return (or ) Why? Two special operations on lattice meet(a, b) highest value less than or equal to both a and b join(a, b) lowest value greater than or equal to both a and b Join often written as a b Meet often written as a b

Putting it together Keep track of the symbolic value of a variable at every program point (on every CFG edge) State vector What should our initial value be? Starting state vector is all Can t make any assumptions about inputs must assume not constant Everything else starts as, since we have no information about the variable at that point

Executing symbolically For each statement t = e evaluate e using Vin, update value for t and propagate state vector to next statement What about switches? If e is true or false, propagate Vin to appropriate branch What if we can t tell? Propagate Vin to both branches, and symbolically execute both sides What do we do at merges?

Handling merges Have two different Vins coming from two different paths Goal: want new value for Vin to be safe (shouldn t generate wrong information), and we don t know which path we actually took Consider a single variable. Several situations: V1 =, V2 = * Vout = * V1 = constant x, V2 = x Vout = x V1 = constant x, V2 = constant y Vout = V1 =, V2 = * Vout = Generalization: Vout = V1 V2 (Common Ancester)

Result: worklist algorithm Associate state vector with each edge of CFG, initialize all values to, worklist has just start edge While worklist not empty, do: Process the next edge from worklist Symbolically evaluate target node of edge using input state vector If target node is assignment (x = e), propagate Vin[eval(e)/x] to output edge If target node is branch (e?) If eval(e) is true or false, propagate Vin to appropriate output edge Else, propagate Vin along both output edges If target node is merge, propagate join(all Vin) to output edge If any output edge state vector has changed, add it to worklist

Running example

Running example 3 3

What do we do about loops? Unless a loop never executes, symbolic execution looks like it will keep going around to the same nodes over and over again Insight: if the input state vector(s) for a node don t change, then its output doesn t change If input stops changing, then we are done! Claim: input will eventually stop changing. Why? Because the value of a variable can only move up the lattice

Loop example First time through loop, x = 1 Subsequent times, x =

Complexity of algorithm V = # of variables, E = # of edges Height of lattice = 2 each state vector can be updated at most 2 * V times. So each edge is processed at most 2 * V times, so we process at most 2 * E * V elements in the worklist. Cost to process a node: O(V) Overall, algorithm takes O(EV 2 ) time

Question Can we generalize this algorithm and use it for more analyses?

Constant propagation Step 1: choose lattice (which values are you going to track during symbolic execution)? Use constant lattice Step 2: choose direction of dataflow (if executing symbolically, can run program backwards!) Run forward through program Step 3: create transfer functions How does executing a statement change the symbolic state? Step 4: choose confluence operator What do do at merges? For constant propagation, use join