CSE P 501 Compilers. SSA. Hal Perkins. Spring 2018. UW CSE P 501 Spring 2018 V-1



Agenda: Overview of the SSA IR; constructing SSA graphs; a sample of SSA-based optimizations; converting back from SSA form. Sources: Appel's SSA chapter, an extended discussion in Cooper-Torczon, and Mike Ringenburg's CSE 501 slides (wi).

Def-Use (DU) Chains. A common dataflow analysis problem: find all sites where a variable is used, or find the definition site of a variable used in an expression. The traditional solution is def-use chains, an additional data structure on top of the dataflow graph: link each statement defining a variable to all statements that use it, and link each use of a variable to its definition.
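The DU-chain idea can be sketched for straight-line code, where the reaching definition of each use is simply the most recent prior definition. The statement representation (a destination variable plus a list of operand variables) is a hypothetical one chosen for illustration:

```python
def build_du_chains(stmts):
    """Map each definition site to the list of sites that use it."""
    current_def = {}          # variable -> index of its most recent definition
    du = {}                   # definition site index -> list of use site indices
    for i, (dest, ops) in enumerate(stmts):
        for v in ops:         # link each use back to its reaching definition
            if v in current_def:
                du.setdefault(current_def[v], []).append(i)
        if dest is not None:  # this statement is now the definition of dest
            current_def[dest] = i
            du.setdefault(i, [])
    return du

stmts = [
    ("a", ["x", "y"]),   # 0: a := x + y
    ("b", ["a"]),        # 1: b := a - 1
    ("a", ["y", "b"]),   # 2: a := y + b
    ("c", ["a", "b"]),   # 3: c := a + b
]
chains = build_du_chains(stmts)
# the definition of a at statement 0 is used at 1; its redefinition at 2 is used at 3
```

With branching control flow this single-pass trick no longer works, which is exactly where the dataflow machinery (and, later, SSA) comes in.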

Def-Use (DU) Chains. [figure: a control-flow graph from entry to exit containing several definitions (x = ...) and uses (y = x + ..., z = x - ...) of x; in this example, two DU chains intersect]

DU-Chain Drawbacks. Expensive: if a typical variable has N uses and M definitions, the total cost per variable is O(N × M), i.e., O(n²) overall. It would be nice if the cost were proportional to the size of the program. Also, unrelated uses of the same variable are mixed together, which complicates analysis: the variable looks live across all uses even if they are unrelated.

SSA: Static Single Assignment. An IR in which each variable has only one definition in the program text. This is a single static definition, but that definition can be in a loop that is executed dynamically many times. SSA makes many analyses (and associated optimizations) more efficient, and it separates values from memory storage locations. It is complementary to the CFG/DFG: better for some things, but it cannot do everything.

SSA in Basic Blocks. Idea: for each original variable x, create a new variable x_n at the n-th definition of the original x. Subsequent uses of x refer to x_n until the next definition point.
Original:          SSA:
a := x + y         a1 := x + y
b := a - 1         b1 := a1 - 1
a := y + b         a2 := y + b1
b := x * 4         b2 := x * 4
a := a + b         a3 := a2 + b2
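The renaming rule for a single basic block can be sketched directly; no Φ-functions are needed because there are no merge points. The statement format (destination, operator, operand list) is hypothetical:

```python
def rename_block(stmts):
    """Rewrite one basic block into SSA form by subscripting each definition."""
    count = {}                      # variable -> number of definitions so far
    def cur(v):                     # current SSA name for a use of v
        return f"{v}{count[v]}" if v in count else v
    out = []
    for dest, op, operands in stmts:
        # rename uses first (they refer to the previous definition),
        # then create the new subscripted name for the destination
        new_ops = [cur(v) if isinstance(v, str) else v for v in operands]
        count[dest] = count.get(dest, 0) + 1
        out.append((f"{dest}{count[dest]}", op, new_ops))
    return out

block = [
    ("a", "+", ["x", "y"]),
    ("b", "-", ["a", 1]),
    ("a", "+", ["y", "b"]),
    ("b", "*", ["x", 4]),
    ("a", "+", ["a", "b"]),
]
renamed = rename_block(block)
# yields a1 := x + y; b1 := a1 - 1; a2 := y + b1; b2 := x * 4; a3 := a2 + b2
```

Variables with no prior definition in the block (x and y here) are treated as live-in and left unrenamed in this sketch.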

Merge Points. The issue is how to handle merge points:
Original: if (...) a = x; else a = y; b = a;
After renaming: if (...) a1 = x; else a2 = y; b = ??;

Merge Points. The issue is how to handle merge points:
if (...) a1 = x; else a2 = y; a3 = Φ(a1, a2); b = a3;
Solution: introduce a Φ-function, a3 := Φ(a1, a2). Meaning: a3 is assigned either a1 or a2, depending on which control path is used to reach the Φ-function.

Another Example.
Original:         SSA:
b := M[x]         b1 := M[x]
a := 0            a1 := 0
if b < 4          if b1 < 4
a := b            a2 := b1
c := a + b        a3 := Φ(a2, a1)
                  c1 := a3 + b1

How Does Φ Know What to Pick? It doesn't: Φ-functions don't actually exist at runtime. When we're done using the SSA IR, we translate back out of SSA form, removing all Φ-functions, basically by adding code to copy each SSA value x_i to the single, non-SSA, actual x. For analysis, all we typically need to know is the connection of uses to definitions; there is no need to execute anything.

Example With a Loop.
Original:              SSA:
a := 0                 a1 := 0
L: b := a + 1          L: a2 := Φ(a1, a3)
   c := c + b             b1 := Φ(b0, b2)
   a := b * 2             c1 := Φ(c0, c2)
   if a < N goto L        b2 := a2 + 1
return c                  c2 := c1 + b2
                          a3 := b2 * 2
                          if a3 < N goto L
                       return c2
Notes: Loop back edges are also merge points, so they require Φ-functions. b0 and c0 are the initial values of b and c on block entry. b1 is dead and can be deleted later. c is live on entry: it is either an input parameter or uninitialized.

What does SSA buy us? No need for DU or UD chains: they are implicit in SSA. A compact representation. SSA is relatively recent (it emerged in the 1980s) and is prevalent in real compilers for { } languages.

Converting To SSA Form. Basic idea: first, add Φ-functions; then, rename all definitions and uses of variables by adding subscripts.

Inserting Φ-Functions. We could simply add Φ-functions for every variable at every join point(!). This is called maximal SSA. But it wastes far too much space and time, and is not needed in many cases.

Path-Convergence Criterion. Insert a Φ-function for variable a at point z when: there are blocks x and y, both containing definitions of a, with x ≠ y; there are nonempty paths from x to z and from y to z; and these paths have no common nodes other than z.

Details. The start node of the flow graph is considered to define every variable (even the "undefined" ones). Each Φ-function itself defines a variable, which may create the need for a new Φ-function, so we need to keep adding Φ-functions until things converge. How can we do this efficiently? Use a new concept: dominance frontiers.

Dominators (review). Definition: a block x dominates a block y iff every path from the entry of the control-flow graph to y includes x. So, by definition, x dominates x. We can associate a Dom(inator) set with each CFG node x: Dom(x) = the set of all blocks dominated by x. Properties: the relation is transitive (if a dom b and b dom c, then a dom c), and there are no cycles, so we can represent the dominator relationship as a tree.
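The definition above leads directly to the classic iterative computation: Dom(entry) = {entry}, and for every other node n, the dominators of n are n itself plus whatever dominates all of n's predecessors. A minimal sketch (note: dom[n] here is the set of nodes that dominate n, the transpose of the slide's "blocks dominated by x" view; the CFG dict is a made-up example):

```python
def dominators(cfg, entry):
    """Iteratively compute dom[n] = set of nodes dominating n."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    dom = {n: set(cfg) for n in cfg}      # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for n in cfg:
            if n == entry or not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# 1 -> 2 -> {3, 4} -> 5 -> back to 2 or on to 6
cfg = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [2, 6], 6: []}
dom = dominators(cfg, 1)
# the diamond's join 5 is dominated only by 1, 2, and itself
```

This simple version can be slow on pathological graphs, which is the motivation for the Lengauer-Tarjan algorithm mentioned later in these slides.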

Example. [figure: an example CFG with numbered blocks and its corresponding dominator tree]

Dominators and SSA. One property of SSA is that definitions dominate uses; more specifically: if x := Φ(..., x_i, ...) is in block B, then the definition of x_i dominates the i-th predecessor of B; and if x is used in a non-Φ statement in block B, then the definition of x dominates block B.

Dominance Frontier (1). To get a practical algorithm for placing Φ-functions, we need to avoid looking at all combinations of nodes leading from x to y. Instead, use the dominator tree of the flow graph.

Dominance Frontier (2). Definitions: x strictly dominates y if x dominates y and x ≠ y. The dominance frontier of a node x is the set of all nodes w such that x dominates a predecessor of w, but x does not strictly dominate w. This means that x can be in its own dominance frontier! That can happen if there is a back edge to x (i.e., x is the head of a loop). Essentially, the dominance frontier is the border between dominated and undominated nodes.
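One well-known way to compute dominance frontiers (the Cooper-Harvey-Kennedy formulation, not stated in these slides) walks up the dominator tree from each predecessor of a join point. It assumes the immediate-dominator map idom has already been computed; the example graph and idom values below are assumptions matching the loop-shaped CFG used earlier:

```python
def dominance_frontiers(preds, idom):
    """For each join node, climb from each predecessor up to idom[join],
    adding the join to the frontier of every node passed on the way."""
    df = {n: set() for n in preds}
    for n, ps in preds.items():
        if len(ps) >= 2:                 # only join points contribute
            for p in ps:
                runner = p
                while runner != idom[n]:
                    df[runner].add(n)
                    runner = idom[runner]
    return df

# predecessor map of the CFG 1->2->{3,4}->5->{2,6}, with idom assumed known
preds = {1: [], 2: [1, 5], 3: [2], 4: [2], 5: [3, 4], 6: [5]}
idom = {1: None, 2: 1, 3: 2, 4: 2, 5: 2, 6: 5}
df = dominance_frontiers(preds, idom)
# 5 has a back edge to loop header 2, so 2 lands in its own frontier: df[2] == {2}
```

The result illustrates the slide's point: the loop header 2 appears in its own dominance frontier because of the back edge from 5.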

Examples. [figures: a sequence of slides stepping through the example CFG, highlighting in turn each node x, the nodes it strictly dominates (StrictDom(x)), and its dominance frontier (DomFrontier(x))]

Dominance Frontier Criterion for Placing Φ-Functions. If a node x contains a definition of variable a, then every node in the dominance frontier of x needs a Φ-function for a. Idea: everything dominated by x will see x's definition of a. The dominance frontier contains the first nodes we could have reached via an alternative path, and those nodes will have an alternate reaching definition (recall that the entry node defines everything). Why is this right for loops? Hint: strict dominance. Since the Φ-function itself is a definition, this placement rule needs to be iterated until it reaches a fixed point. Theorem: this algorithm places exactly the same set of Φ-functions as the path-convergence criterion given previously.
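The iterate-to-a-fixed-point requirement falls out naturally as a worklist: when a Φ is placed in a block that did not already define the variable, that block becomes a new definition site and goes back on the worklist. A minimal sketch, reusing the dominance frontiers of the earlier example CFG (the defsites map is a made-up input):

```python
def place_phis(defsites, df):
    """Return variable -> set of blocks needing a Φ, per the DF criterion."""
    phis = {}
    for var, sites in defsites.items():
        phis[var] = set()
        work = list(sites)
        while work:
            block = work.pop()
            for frontier_block in df[block]:
                if frontier_block not in phis[var]:
                    phis[var].add(frontier_block)
                    if frontier_block not in sites:   # a Φ is a new definition
                        work.append(frontier_block)
    return phis

# dominance frontiers of the CFG 1->2->{3,4}->5->{2,6}
df = {1: set(), 2: {2}, 3: {5}, 4: {5}, 5: {2}, 6: set()}
defsites = {"a": {1, 3}, "b": {3, 4}}
phis = place_phis(defsites, df)
# "b" is defined in 3 and 4, so a Φ lands at join 5; 5's frontier {2} then needs one too
```

Note how the Φ placed at block 5 propagates another Φ to the loop header 2, which is exactly the iteration the slide describes.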

Placing Φ-Functions: Details. See the book for the full construction, but the basic steps are:
1. Compute the dominance frontier for each node in the flowgraph.
2. Insert just enough Φ-functions to satisfy the criterion, using a worklist algorithm to avoid reexamining nodes unnecessarily.
3. Walk the dominator tree and rename the different definitions of each variable a to be a1, a2, a3, ...

Efficient Dominator Tree Computation. Goal: SSA makes optimizing compilers faster, since we can find definitions and uses without expensive bit-vector dataflow algorithms. So we need to be able to compute SSA form quickly. Computing SSA from the dominator tree is efficient, but we first need the dominator tree itself...

Lengauer-Tarjan Algorithm. The iterative set-based algorithm for finding dominator trees is slow in the worst case. Lengauer-Tarjan runs in near-linear time; it uses a depth-first spanning tree from the start node of the control-flow graph. See the books for details.

SSA Optimizations. Why go to the trouble of translating to SSA? The advantage of SSA is that it makes many optimizations and analyses simpler and more efficient. We'll give a couple of examples. But first, what do we know? (i.e., what information is kept in the SSA graph?)

SSA Data Structures. Statement: links to the containing block, the next and previous statements, the variables defined, and the variables used. Variable: a link to its (single) definition and its (possibly multiple) use sites. Block: a list of contained statements, an ordered list of predecessors, and successor(s).

Dead-Code Elimination. A variable is live ⟺ its list of uses is not empty(!). That's it: nothing further to compute. Algorithm to delete dead code: while there is some variable v with no uses, if the statement that defines v has no other side effects, delete it. We then need to remove that statement from the list of uses of its operand variables, which may cause those variables to become dead in turn.
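The cascading deletion can be sketched with a hypothetical representation that keeps, for each SSA variable, its defining statement's operands and a side-effect flag, plus a use count:

```python
def eliminate_dead_code(defs, uses):
    """defs: var -> (operand vars, has_side_effects); uses: var -> use count.
    Repeatedly delete side-effect-free definitions of unused variables."""
    changed = True
    while changed:
        changed = False
        for v in list(defs):
            operands, side_effects = defs[v]
            if uses.get(v, 0) == 0 and not side_effects:
                del defs[v]                  # delete the dead definition
                for op in operands:          # its operands each lose one use...
                    uses[op] -= 1
                changed = True               # ...which may make them dead too
    return defs

defs = {
    "a1": ([], False),          # a1 := 5
    "b1": (["a1"], False),      # b1 := a1 + 1   (b1 never used: dead)
    "c1": (["a1"], True),       # c1 := f(a1)    (side effects: must keep)
}
uses = {"a1": 2, "b1": 0, "c1": 0}
live = eliminate_dead_code(defs, uses)
# b1 is deleted; a1 survives because the side-effecting c1 still uses it
```

A real implementation would drive this from a worklist of newly dead variables rather than rescanning, but the liveness test itself really is just "use list is empty", as the slide says.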

Simple Constant Propagation. If c is a constant in v := c, any use of v can be replaced by c; then update every use of v to use the constant c. If the c_i in v := Φ(c1, c2, ..., cn) are all the same constant c, we can replace the Φ with v := c. Copy propagation, constant folding, and other transformations can be incorporated into the same worklist algorithm.

Simple Constant Propagation.
W := list of all statements in the SSA program
while W is not empty:
    remove some statement S from W
    if S is v := Φ(c, c, ..., c) with all arguments the same constant, replace S with v := c
    if S is v := c:
        delete S from the program
        for each statement T that uses v:
            substitute c for v in T
            add T to W
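The worklist above can be sketched directly. The statement encoding is hypothetical: each statement is ("phi", v, args) or ("assign", v, rhs), where args/rhs mix integer constants and variable names, and an "assign" with a single constant operand is the v := c case:

```python
def propagate_constants(stmts):
    """Run the simple constant-propagation worklist over SSA statements."""
    stmts = dict(stmts)                  # statement id -> statement
    work = list(stmts)
    while work:
        sid = work.pop()
        if sid not in stmts:             # statement was already deleted
            continue
        kind, v, rhs = stmts[sid]
        # a Φ whose arguments are all the same constant becomes v := c
        if kind == "phi" and len(set(rhs)) == 1 and isinstance(rhs[0], int):
            stmts[sid] = ("assign", v, [rhs[0]])
            work.append(sid)
        # v := c: delete it, substitute c at every use, requeue those uses
        elif kind == "assign" and len(rhs) == 1 and isinstance(rhs[0], int):
            c = rhs[0]
            del stmts[sid]
            for tid, (k2, v2, rhs2) in stmts.items():
                if v in rhs2:
                    stmts[tid] = (k2, v2, [c if x == v else x for x in rhs2])
                    work.append(tid)
    return stmts

prog = {
    0: ("assign", "x1", [4]),
    1: ("phi", "y1", ["x1", "x1"]),
    2: ("assign", "z1", ["y1", "y1"]),   # z1 := y1 + y1
}
result = propagate_constants(prog)
# only z1 := 4 + 4 remains; x1 and y1 were propagated away
```

The SSA property is what makes this safe: each variable has exactly one definition, so substituting its constant at every use needs no reaching-definitions analysis.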

Converting Back from SSA. Unfortunately, real machines do not include a Φ instruction. So after analysis, optimization, and transformation, we need to convert back to a Φ-less form for execution.

Translating Φ-Functions. The meaning of x := Φ(x1, x2, ..., xn) is "set x := x1 if arriving on edge 1, set x := x2 if arriving on edge 2, etc." So, for each i, insert x := x_i at the end of predecessor block i. Rely on copy propagation and on coalescing in register allocation to eliminate redundant copy instructions.
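The insert-copies-in-predecessors rule can be sketched as follows. The block and Φ shapes are hypothetical, and this sketch deliberately ignores complications a real compiler must handle, such as critical edges and the parallel-copy semantics of multiple Φs in one block:

```python
def eliminate_phis(blocks, preds):
    """For each x := Φ(x1, ..., xn) in block B, append a copy x := xi
    to the end of B's i-th predecessor, then drop the Φ."""
    for name, b in blocks.items():
        for dest, args in b["phis"]:
            for pred, arg in zip(preds[name], args):
                blocks[pred]["code"].append((dest, arg))  # copy at edge source
        b["phis"] = []
    return blocks

# the if/else merge example from earlier in these slides
blocks = {
    "then": {"phis": [], "code": [("a1", "x")]},
    "else": {"phis": [], "code": [("a2", "y")]},
    "join": {"phis": [("a3", ["a1", "a2"])], "code": [("b1", "a3")]},
}
preds = {"then": [], "else": [], "join": ["then", "else"]}
out = eliminate_phis(blocks, preds)
# "then" now ends with a3 := a1 and "else" with a3 := a2; the Φ is gone
```

The inserted copies are exactly the redundant moves that copy propagation and register-allocator coalescing are later expected to clean up.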

SSA Wrapup. More details are needed to implement SSA fully and efficiently, but these are the main ideas; see recent compiler books (but not the Dragon book!). SSA allows efficient implementation of many optimizations. It is used in most modern optimizing compilers (LLVM is based on it) and has been retrofitted into many older ones (GCC is a well-known example). It is not a silver bullet: some optimizations still need non-SSA forms, but it is very effective for many.