Control Flow Analysis. Reading & Topics. Optimization Overview CS2210. Muchnick: chapter 7

Similar documents
Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Goals of Program Optimization (1 of 2)

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Control-Flow Analysis

CS553 Lecture Generalizing Data-flow Analysis 3

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

Control Flow Analysis & Def-Use. Hwansoo Han

Example (cont.): Control Flow Graph. 2. Interval analysis (recursive) 3. Structural analysis (recursive)

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Data Flow Analysis. Agenda CS738: Advanced Compiler Optimizations. 3-address Code Format. Assumptions

CONTROL FLOW ANALYSIS. The slides adapted from Vikram Adve

CS 406/534 Compiler Construction Putting It All Together

Statements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control

Homework Assignment #1 Sample Solutions

Lecture 2: Control Flow Analysis

Compiler Construction 2010/2011 Loop Optimizations

Compiler Optimization and Code Generation

Why Global Dataflow Analysis?

Data Flow Analysis. Program Analysis

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Compiler Construction 2016/2017 Loop Optimizations

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

Intermediate representation

Control Flow Analysis

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

Data-flow Analysis. Y.N. Srikant. Department of Computer Science and Automation Indian Institute of Science Bangalore

CSC D70: Compiler Optimization LICM: Loop Invariant Code Motion

Global Optimization. Lecture Outline. Global flow analysis. Global constant propagation. Liveness analysis. Local Optimization. Global Optimization

Lecture 4. More on Data Flow: Constant Propagation, Speed, Loops

Compiler Optimisation

A main goal is to achieve a better performance. Code Optimization. Chapter 9

More Dataflow Analysis

EECS 583 Class 2 Control Flow Analysis LLVM Introduction

Challenges in the back end. CS Compiler Design. Basic blocks. Compiler analysis

Lecture 8: Induction Variable Optimizations

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

Why Data Flow Models? Dependence and Data Flow Models. Learning objectives. Def-Use Pairs (1) Models from Chapter 5 emphasized control

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Flow Analysis. Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM

Tour of common optimizations

Review; questions Basic Analyses (2) Assign (see Schedule for links)

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Program Static Analysis. Overview

Data-flow Analysis - Part 2

MIT Introduction to Program Analysis and Optimization. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

CS 432 Fall Mike Lam, Professor. Data-Flow Analysis

Lecture 9: Loop Invariant Computation and Code Motion

Dependence and Data Flow Models. (c) 2007 Mauro Pezzè & Michal Young Ch 6, slide 1

Compilers. Optimization. Yannis Smaragdakis, U. Athens

COP5621 Exam 4 - Spring 2005

Introduction to Machine-Independent Optimizations - 4

CSE 501 Midterm Exam: Sketch of Some Plausible Solutions Winter 1997

Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013

Dataflow Analysis. Xiaokang Qiu Purdue University. October 17, 2018 ECE 468

Control flow graphs and loop optimizations. Thursday, October 24, 13

Lecture Compiler Middle-End

Lecture 3 Local Optimizations, Intro to SSA

Lecture 6 Foundations of Data Flow Analysis

7. Optimization! Prof. O. Nierstrasz! Lecture notes by Marcus Denker!

Code Optimization. Code Optimization

Lecture 6 Foundations of Data Flow Analysis

Data-Flow Analysis Foundations

Module 14: Approaches to Control Flow Analysis Lecture 27: Algorithm and Interval. The Lecture Contains: Algorithm to Find Dominators.

Dataflow analysis (ctd.)

Calvin Lin The University of Texas at Austin

CS 6110 S14 Lecture 38 Abstract Interpretation 30 April 2014

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

EECS 583 Class 3 More on loops, Region Formation

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

Lecture Notes on Loop Optimizations

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Data Flow Analysis. CSCE Lecture 9-02/15/2018

Office Hours: Mon/Wed 3:30-4:30 GDC Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5.

Control Flow Analysis

CSE P 501 Compilers. Loops Hal Perkins Spring UW CSE P 501 Spring 2018 U-1

Compiler Construction 2009/2010 SSA Static Single Assignment Form

CS5363 Final Review. cs5363 1

Loop Invariant Code Motion. Background: ud- and du-chains. Upward Exposed Uses. Identifying Loop Invariant Code. Last Time Control flow analysis

Lecture 4 Introduction to Data Flow Analysis

Introduction to Optimization Local Value Numbering

LOGIC SYNTHESIS AND VERIFICATION ALGORITHMS. Gary D. Hachtel University of Colorado. Fabio Somenzi University of Colorado.

Algorithms for Finding Dominators in Directed Graphs

Thursday, December 23, The attack model: Static Program Analysis

ELEC 876: Software Reengineering

Lecture 2. Introduction to Data Flow Analysis

Program Analysis. Readings

Using Static Single Assignment Form

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210

4/24/18. Overview. Program Static Analysis. Has anyone done static analysis? What is static analysis? Why static analysis?

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

Compiler Optimisation

Live Variable Analysis. Work List Iterative Algorithm Rehashed

Introduction to Machine-Independent Optimizations - 1

Lecture 23 CIS 341: COMPILERS

Transcription:

Control Flow Analysis CS2210 Lecture 11 Reading & Topics Muchnick: chapter 7 Optimization Overview Control Flow Analysis Maybe start data flow analysis Optimization Overview Two step process Analyze program to learn things about it program analysis Determine when transformations are legal & profitable Transform the program based on information into semantically equivalent but better output program Optimization is a misnomer Almost never optimal Sometimes slows some programs down on some inputs (try to speed up most programs on most inputs) 1

Semantics Subtleties Evaluation order Arithmetic properties (e.g. associativity) Behavior in error cases Some languages very precise E.g., Ada Some weaker Potentially more optimization opportunity Analysis Scope Peephole Across small number of adjacent instructions Trivial Local Within a basic block simple Intraprocedural (aka. Global) Across basic blocks within a procedure More complex, branches, merges loops Interprocedural Across procedures, within whole program Even more complex, calls, returns More useful for higher-level languages Hard with separate compilation Whole-program Useful for safety properties Most complex Catalog of Optimizations Arithmetic simplification Constant folding x: = 3+4 => x := 7 Strength reduction x := y*4 => x := y<<2 Constant propagation x:=5 => x:=5 y := x+2 y :=5+2 => y := 7 Copy propagation x := y => x := y w := w+ x w := w+y 2

Catalog (2) Common Subexpression Elimination (CSE) x : = a + b => x := a+b y := a +b y := x Can also eliminate redundant memory references, branch tests Partial Redundancy Elimination (PRE) Like CSE but earlier expression available only along some path if then => if then x := a+b t :=a+b; x:=t end else t:=a+b end y := a+b y := t Catalog (3) Pointer analysis p := &x => p = &x *p := 5 *p := 5 y := x+1 y := 6 Dead assignment elimination x := y * z /* no use of x */ x := 6 Dead code elimination if (false) then Integer range analysis for (i=0;i<10;i++) { if (i>=10) goto error a[i] := 0; } Loop Optimizations (1) Loop-invariant code motion for j := 1 to 10 for i := 1 to 10 a[i] := a[i] + b[j] for j := 1 to 10 t := b[j] for i := 1 to 10 a[i] := a[i] + t Induction variable elimination for i := t to 10 a[i] := a[i] + 1 for p := &a[1] to &a[10] *p := *p + 1 a[i] is several instructions *p is one 3

Loop Optimizations (2) Loop unrolling for i := 1 to N a[i] := a[i] + 1 for i := 1 to N by 4 a[i] := a[i] + 1 a[i+1] := a[i+1] + 1 a[i+2] := a[i+2] + 1 a[i+3] := a[i+3] + 1 Creates more optimization opportunities in loop body Parallelization Interchange Reversal Fusion Blocking / tiling Data cache locality optimization Call Optimizations Inlining l := w : = 4 a := area(l,w) l := w := 4 a := l*w l := w := 4 a := l <<2 Many simple optimizations become important after inlining Interprocedural constant propagation More Call Optimizations Static binding of dynamic calls Calls through function pointers in imperative languages Call of computed function in functional language OO-dispatch in OO languages (e.g., COOL) If receiver class can be deduced, can replace with direct call Other optimizations possible even when multiple targets (e.g., using PICs = polymorphic inline caches) Procedure specialization Partial evaluation 4

Machine-dependent Optimizations Register allocation Instruction selection Important for CISCs Instruction scheduling Particularly important with long-delay instructions and on wide-issue machines (superscalar + VLIW) The Phase Ordering Problem In what order should optimizations be performed? Some optimizations create opportunities for others (order according to this dependence) Can make some optimizations simple Later optimization will clean up What about adverse interactions Common subexpression elimination register allocation Register allocation instructions scheduling What about cyclic dependences? Constant folding constant propagation Control Flow Analysis 5

Approaches Dominator-based Control flow graph with dominator relation to identify loops Most commonly used Interval-based Nested regions (= intervals) Control tree Special case: structural analysis Most sophisticated Classifies control structures (not just loops) Basic Blocks and Control Flow Graphs (CFGs) Basic block = maximal sequence of instructions entered only from first and exited from last Entry can be Procedure entry point Branch target Instruction immediately following branch or return Entry instructions are called leaders receive m Example receive m f0 <- 0 Y f1 <- 1 if m <= 1 goto L3 i <- 2 return m L1:if i<= m goto L2 return f2 L2:f2 <- f0 + f1 N f0 <- f1 f1 <- f2 return f2 i <- i+1 L3:return m f0 <- 0 f1 <- 1 m <= 1 N i <- 2 i <= m Y f2 <- f0+f1 f0 <- f1 f1 <- f2 i <- i+1 6

Example CFG Y entry N B2 B1 B3 B5 N B4 B6 Y exit Dominators & Postdominators Binary relation useful to determine loops Node d dom i : every possible execution path from entry to i includes d Dominance relatation is Reflexive: d dom d Transitive: a dom b and b dom c then a dom c Anti-symmetric: if a dom b and b dom a then a = b Immediate dominance For a b :a idom b iff a dom b and c: c dom b and a dom c c = a Dominators & Postdominators p pdom i : every possible execution path from node i to exit includes p Dual relation: i dom p in CFG with edges reversed and entry and exit switched a strictly dominates b : a dom b and a b 7

Dominator Example Y entry N B2 B1 B3 B5 N B4 B6 Y exit Computing Dominators (easy but slow) Initialize dom(i) = set of all nodes for i entry, dom(entry) = {entry} While changes occur do dom(i) = {i} U (dom(i) dom(pred(i)) for all predecessors of I) Works fastest if nodes are processed in reverse postorder O(n 2 e) complexity, n number of nodes, e number of edges Another Example Preorder: 0-1 - 2-3 - 4-5 -6 Postorder: 5-4 - 3-6 - 2-1 - 0 Reverse Postorder: 0-1 -2-6 - 3-4 - 5 8

Computing IDOM Compute dominators tmp(i) := dom(i) - {i} Remove all n tmp(i) from tmp(i) for which x ( n) tmp(i) such that n tmp(x) Computing Dominators Faster Lengauer & Tarjan s algorithm Described in the book O(e α(e,n)) running time where α is the inverse of Ackerman s function 9

Loops & SCCs An edge e = (m,n) is called a back edge iff n dom m (head dominates tail) A natural loop of back edge m->n =: subgraph containing n and all nodes from which m can be reached w/o passing through n and the edges that connect those nodes n is called the loop header Preheader a node inserted immediately before the loop header Useful in many loop optimization as a landing pad for code from the loop body Headers and Preheaders preheader header header B1 B2 B1 B2 B3 B3 Natural Loop Properties Natural loops with different headers are either Nested Disjoint What about natural loops with the same header? B1 B2 B3 10

Strongly Connected Components Generalization of loops A SCC is a subgraph G S =<N S,E S > in which every node in N S is reachable from every other node in N S by a path including only edges from E S An SCC S is maximal, iff every SCC containing it is the S itself Reducible Flow Graphs A flow graph G=(N,E) is reducible (aka. well-structured) if E can be partitioned into E = E B U E F, where E B is the back edge set, so that (N, E F ) forms a DAG in which all nodes are reachable from the entry node Patterns that make CFGs irreducible, are called improper regions Impossible in some languages (e.g., Modula-2) Dealing with Irreducibility Cannot use structural analysis directly Use iterative data flow analysis instead Make graph well-structured using node splitting Induced iteration on the lattice of monotone functions from the lattice to itself (more on this later) 11

Control Trees & Interval Analysis Idea: divide CFG into regions of various kinds Combine each region into a new node (abstract node) Obtain an abstract flow graph Final result is called control tree Root of control tree is abstract flow graph representing original flowgraph Leaves of control tree are basic blocks Nodes between root and leaves represent regions Edges represent relationships between abstract node and its descendents Example: T1-T2 Analysis T1: B1 B1a T2: B1 B2 B1a Interval Analysis A (maximal) interval I M (h) with header h is the maximal single-entry subgraph with h as only entry node and with all closed subpaths in the subgraph containing h Like natural loop but with acyclic stuff dangling off loop exits A (minimal) interval is either A natural loop A maximal acyclic subgraph A minimal irreducible region 12

Interval Analysis Steps Iterate until done: Postorder traversal of CFG looking for loop headers Construct natural loop for each loop header and reduce the loop to an abstract region natural loop For each set of entries of an improper region construct minimal SCC and reduce it to improper region For entry node and each immediate descendant of a node in a natural loop or irreducible region construct maximal acyclic graph with that node as root: if more than one node results, reduce to acyclic region Structural Analysis A refinement of interval analysis Advantage compared to standard iterative data flow analysis Uses specialized flow functions for recognized structures that are much faster Data flow equations are determined by the syntax and semantics of the (source) language Recognizes more structures than standard interval analysis Region Types Blocks If-then If-then-else Case-switch Self loop While loop Natural loop Improper interval Proper interval 13

Data Flow Analysis Reading Muchnick Chapter 8 Some material not in book just in lecture notes Data Flow Analysis Compute information about program At program points To identify optimization opportunities Can model as solving a system of constraints Each CFG node imposes a constraint relating predecessor and successor info Solution to constraints is result of analysis Solution must be Sound (aka safe) Can be conservative 14

Key Issues Do constraints define analysis correctly? How to efficiently represent information? How to represent and solve constraints efficiently? What about multiple solutions? How synchronize transformation with analysis? Example: Reaching Definitions For each program point want to know What set of definitions (statements) may reach that point Reach = the last definition of a variable Info = set of var -> LIR bindings, e.g., {x->s 1, y->s 5, y->s 8 } Can use the info to Build def-use chains Do constant- and copy propagation Safety Can have more bindings that true answer Constraints for Reaching Defs Strong update s: x := info succ = info pred - {x->s s } U {x->s} Weak update s: *p := info succ = info pred U {x->s x may-point-to(p)} Other statements do nothing info succ = info pred 15

More Constraints for RDEFs Branches info succ[i] = info pred Control flow merges We don t know which path is taken at run time Be conservative: info succ = i info pred[i] Procedure entry: receive x info = {x->entry} Solving Constraints For reaching definitions can solve traversing instructions in forward topological order x (5) x := y (1) x := (2) y := (3) y := (4) p := x (6) x := (7) *p := x y (8) y:= Another Example (1) x := (2) y := (3) y := (4) p := x (5) x := y x (6) x := (7) *p := x y (8) y:= 16

Loop Schema: Constraints and Loops Constraints are now recursively defined info loop-head =info loop-entry U info back-edge Info back-edge = info loop-head Substituting definition of info back-edge : info loo-head =info loop-entry U ( info loop-head ) summarized rhs as F: info loop-head =F(info loop-head ) Legal solutions to constraints is a fixedpoint of F Recursive Constraints Many solutions possible Want least or greatest fixed-point, whichever corresponds to most precise answer How can fixed-point be found Interval & structural analysis for certain CFGs Iterative approximation for arbitrary CFGs 17

Iterative Data Flow Analysis Start with initial guess of info at loop head info loop-head =guess Solve equations for body info back-edge =F body (info loop-head ) info loop-head =info loop-entry U info back-edge Test if fixed-point found info loop-head =info,loop-head If same done Else use result as new (better) guess: info back-edge =F body (info loop-head ) info loop-head =info loop-entry Uinfo back-edge When does Iterating work? Have to be able to make initial guess Info n+1 must be closer to fixed-point than info n Must eventually reach fixed-point, info must be drawn from a finite height domain To reach best fixed-point, initial guess should be optimististic Info loop-head = info loop-entry Even if guess is overly optimistic, iteration will ensure that we get a safe Iteration speed Ideal: solve constraints along shortest path from loop head to tail Practical: avoid solving constraints outside of loop until fixed-point reached inside loop Data Flow Analysis Direction Constraints are declarative, so may require mix of forward and backward Frequently directional propagation & iteration Forward or backward Topological traversal of acyclic subgraph minimizes analysis time Directional constraints are called flow functions RDEF s:x:= (in) = in - {x->s x } U {x->s} 18

GEN & KILL sets Flow functions can often be described by their GEN and KILL sets GEN = new information added KILL = old information removed F instr (in) = in - KILL instr U GEN instr Example reaching definitions GEN s:x:= = {x->s} KILL s:x:= = {x->s x } Bit Vectors Encode data flow information and GEN/KILL sets as bit vectors Works when info can be expressed abstractly as a set of things Each gets a specific bit position Reaching defs: info = bit vector over statements, each bit represents a specific statement, defined variable is implied by statement Advantages Compact representation Fast union & difference operations Can combine GEN / KILL sets of whole basic blocok into one GEN/KILL set for faster iteration Example:Constant Propagation What info should be represented? CP x:=n (in) = CP x := y+x (in) = Merge function? Initial info? Direction? Can bit vectors be used? 19

Another Example: Constant Propagation x := 3 x := x+1 w := 3 w := 3 y := x*2 z := y+5 w := w*2 May vs. Must Info Some kind of info implies guarantees: must info Some kind of info implies possibilities: may info Desired info Safe GEN KILL MERGE May Small set Overly big set Add everything that might be true Remove only if guaranteed wrong Must Big set Overly small set Add only if guaranteed true Remove everything possibly wrong Live Variables Analysis Desired info: set of variables live at each program point Live = might be used later in the program Supports dead assignment elimination, register allocation May or must? Direction? Merge function Bit vectors usable? Initial info? 20

Live Variables Example x := 5 y := x*2 x := x+1 y := x+10 y Lattice-theoretic Data Flow Analysis Framework Goals Provide single, formal model to describe all DFAs Formalize safe, conservative, optimistic Precise bounds on time complexity Connect analysis with underlying semantics to enable correctness proofs Plan Define domain of program properties computed by DFA Each domain has set of elements Each element represents one possible value of the property Order sets to reflect relative precision Domain = set of elements + order over elements = lattice Define flow/merge functions using lattice operators Benefit from lattice theory for realizing goals Lattices D = (S, ) S set of elements induces a partial order Reflexive, transitive & anti-symmetrics x,y S: meet ^(x,y) (greatest lower bound) join v(x,y) (least upper bound) ν = closure property Unique Top (T) & Bottom elements: X^ = and x v T = T ν Meet and join are commutative and associative Height of lattice: longest path through partial order from top to bottom 21

Lattices in Data Flow Analysis Model information by elements of a lattice domain Top = best case info Bottom = worst case info Initial info for optimistic analyses (at least back edges: top) If a b then a is a conservative approximation of b Merge function = meet (^) : the most precise element that s a conservative approximation of both input elements Initial info for optimistic analyses (at least back edges: top) Some Typical Lattice Domains Two point lattice: T Boolean property A tuple of two point lattices = bit vector Lifted set: set of incomparable values and and T Example? Powerset lattice: set of all subsets of S, ordered somehow (often by ) T = {} = S or vice versa Collecting analysis Isomorphic to tuple of booleans indicating membership in subset of elements of S Tuples of Lattices Often useful to break complex lattice into a tuple of lattices, one per variable analyzed D T =<S T, T > = <S, > N ν ν S T = S 1 XS 2 X XS N T pointwise ordering T T = <T D,,T D >, bottom tuple of bottoms Height(D T ) = N * height(d) Example? 22

Analysis of Loops with Lattices F=flow function for loop body F(info-at-loop-head) = info at back edge: F 0 =d entry^t F 1 =d entry^b(f 0 )=F(F 0 )=F(d entry ) F 2 =d entry^b(f 1 )=F(F(F 0 )) F k =d entry^b(f k-1 ) = F(F( (F(d entry )) )) Repeat until F k+1 =F k 23