Lecture 10: Lazy Code Motion

Lecture 10: Lazy Code Motion

I. Mathematical concept: a cut set
II. Lazy Code Motion Algorithm
   Pass 1: Anticipated Expressions
   Pass 2: (Will be) Available Expressions
   Pass 3: Postponable Expressions
   Pass 4: Used Expressions

Reading: ALSU 9.5.3-9.5.5

Phillip B. Gibbons
15-745: Lazy Code Motion

Review: Loop Invariant Code Motion

[Figure: loops containing a = b + c; hoisting it to t = b + c in the header (with a = t in the body) is valid in one case ("yes") but not when b = read() changes b inside the loop or when an exit path may skip the computation ("no")]

Can b + c be moved to the header? Given an expression (b+c) inside a loop:
- does the value of b+c change inside the loop?
- is the code executed at least once?

Review: Partial Redundancy Elimination

[Figure: a CFG where a = b + c on one path makes a later d = b + c partially redundant; inserting t = b + c on the other path and rewriting both computations to use t removes the re-execution]

Can we place calculations of b+c such that no path re-executes the same expression?

Partial Redundancy Elimination (PRE) subsumes:
- global common subexpression elimination (full redundancy)
- loop invariant code motion (partial redundancy for loops)

I. Full Redundancy: A Cut Set in a Graph

Key mathematical concept.

[Figure: a CFG from entry down to a point p: = a+b, with a set of blocks each computing = a+b highlighted as a cut set]

Full redundancy at p: the expression a+b is redundant on all paths. A cut set is a set of nodes that separates entry from p (there can be many cut sets). Here each node in the cut set contains a calculation of a+b, and a and b are not redefined between the cut set and p.

Partial Redundancy: Completing a Cut Set

[Figure: the same CFG, but only some paths to p: = a+b compute a+b; adding computations on the remaining paths completes a cut set]

Partial redundancy at p: redundant on some but not all paths. Add operations to create a cut set containing a+b. Note: moving operations up can eliminate redundancy.

Constraint on placement: no wasted operation. a+b is anticipated at B if its value computed at B will be used along ALL subsequent paths (a, b not redefined, no branches that lead to exit without a use). There is a choice of placement anywhere within the range where a+b is anticipated.

Review: Where Can We Insert Computations?

Safety: never introduce a new expression along any path.

[Figure: a CFG with d = b + c on one branch; it is unsafe to insert b + c above the branch, on a path that never computed it]

Insertion could introduce an exception and change program behavior. Solution: insert an expression only where it is anticipated, i.e., its value computed at point p will be used along ALL subsequent paths.

Performance: never increase the number of computations on any path. Under a simple model, this guarantees the program won't get worse. In reality, it might increase register lifetimes, add copies, and lose.

Preparing the Flow Graph

[Figure: an edge from a block computing a = b + c into a join block computing d = b + c; splitting the edge with a new block makes an insertion of = b + c there safe]

Definition: a critical edge is one whose source basic block has multiple successors and whose destination basic block has multiple predecessors.

Modify the flow graph: add a basic block for every edge that leads to a basic block with multiple predecessors (not just on critical edges). How does this help the example?

To keep the algorithm simple: consider each statement as its own basic block, and restrict placement of instructions to the beginning of a basic block.
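As a sketch, this pre-processing step is a small graph transformation. The Python below is illustrative only: the successor-map encoding and the block names (b1, b2, b3, s1, ...) are invented, not from the lecture.

```python
# Hypothetical CFG as a successor map: b1 and b2 both branch to the join b3.
succ = {"b1": ["b3"], "b2": ["b3"], "b3": []}

def split_edges(succ):
    """Insert a synthetic empty block on every edge whose destination
    has multiple predecessors (this covers all critical edges)."""
    npreds = {b: 0 for b in succ}
    for ss in succ.values():
        for s in ss:
            npreds[s] += 1
    out = {b: list(ss) for b, ss in succ.items()}
    count = 0
    for b in succ:
        for i, s in enumerate(succ[b]):
            if npreds[s] > 1:           # destination has multiple predecessors
                count += 1
                synth = f"s{count}"     # new synthetic block
                out[b][i] = synth       # redirect b -> synth -> s
                out[synth] = [s]
    return out

new_succ = split_edges(succ)
print(new_succ)  # b1 and b2 now reach b3 only through synthetic blocks
```

The synthetic blocks give the later passes a legal insertion point on each incoming path to a join.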

II. Lazy Code Motion Algorithm

Pass 1: Anticipated Expressions
Pass 2: (Will be) Available Expressions
Pass 3: Postponable Expressions
Pass 4: Used Expressions

Big picture:
- First calculate the earliest set of blocks for insertion: this maximizes redundancy elimination, but may also result in long register lifetimes.
- Then calculate the latest set of blocks for insertion: achieving the same amount of redundancy elimination as earliest, but hopefully reducing the lifetime of the register holding the value of the expression.

Pass 1: Anticipated Expressions

Backward pass: anticipated expressions. Anticipated[b].in: the set of expressions anticipated at the entry of b. An expression is anticipated if its value computed at point p will be used along ALL subsequent paths.

First approximation: place operations at the frontier of anticipation (the boundary between not anticipated and anticipated). This pass does most of the heavy lifting in eliminating redundancy.

Domain: sets of expressions
Direction: backward
Transfer function: f_b(x) = EUse_b ∪ (x - EKill_b), where EUse = expressions used in b, EKill = expressions killed in b
Boundary: in[exit] = ∅
Initialization: in[b] = {all expressions}
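A minimal sketch of Pass 1 as an iterative backward analysis, on a made-up diamond CFG where a+b is computed on both branch arms. All block names and per-block sets here are hypothetical, not the lecture's running example.

```python
from functools import reduce

# Diamond CFG: entry -> b1 -> {b2, b3} -> exit; both arms use a+b.
succ = {"entry": ["b1"], "b1": ["b2", "b3"],
        "b2": ["exit"], "b3": ["exit"], "exit": []}
ALL = frozenset({"a+b"})                      # universe of expressions
e_use = {"entry": set(), "b1": set(),
         "b2": {"a+b"}, "b3": {"a+b"}, "exit": set()}
e_kill = {b: set() for b in succ}             # nothing redefines a or b here

# in[b] = EUse[b] ∪ (out[b] - EKill[b]); out[b] = ∩ of in[s] over successors
ant_in = {b: (set() if b == "exit" else set(ALL)) for b in succ}
changed = True
while changed:
    changed = False
    for b in succ:
        if b == "exit":
            continue                          # boundary: in[exit] = ∅
        out_b = reduce(set.intersection, (ant_in[s] for s in succ[b]), set(ALL))
        new_in = e_use[b] | (out_b - e_kill[b])
        if new_in != ant_in[b]:
            ant_in[b] = new_in
            changed = True

print(ant_in["b1"])  # a+b is anticipated at b1: used along ALL paths from there
```

The ∩ meet is what enforces "ALL subsequent paths": had only one arm used a+b, the intersection would drop it from ant_in at the branch.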

Example 1

See the algorithm in action:

IN[i] = EUse[i] ∪ (OUT[i] - EKill[i]), meet = ∩

[Figure: a CFG with x = a + b and r = a + b on converging paths, a redefinition a = ..., then y = a + b and z = a + b below; one synthetic block is shown, added for an edge into a block with multiple predecessors (5 others not shown)]

Where is a + b anticipated? What is the result if we insert t = a + b at the frontier of anticipation?

Example 2 (Loop Invariant Code)

IN[i] = EUse[i] ∪ (OUT[i] - EKill[i]), meet = ∩

[Figure: a loop computing x = a + b, with a = ... before the loop; a synthetic block is added for every edge into a block with multiple predecessors]

Where is a + b anticipated? Was inserting a + b at the frontier of anticipation the right thing to do in this case? It doesn't eliminate redundancy within the loop (why not?).

Example 3 (More Complex Loop)

IN[i] = EUse[i] ∪ (OUT[i] - EKill[i]), meet = ∩

[Figure: a loop with a = ..., x = a+b, and y = a+b; a + b is anticipated only in the added block on the left, so frontier insertion places it in both highlighted blocks]

Where would we ideally like to insert a+b in this case? What happens if we insert at the frontier of anticipation?

Example 4 (Variation on Previous Loop)

IN[i] = EUse[i] ∪ (OUT[i] - EKill[i]), meet = ∩

[Figure: the same loop shape, but now a + b is not anticipated in the left added block (2 synthetic blocks not shown)]

Is there any opportunity to eliminate redundancy here? No: it is unsafe to insert in the left added block, because a+b is not anticipated there (e.g., a+b could be b/a and the orange block could be guarded by if a > 0).

Pass 2: Place As Early As Possible

First approximation: the frontier between not anticipated and anticipated. Complication: anticipation may oscillate, so there is still some redundancy left.

[Figure: a loop with a = ..., x = a+b, y = a+b where frontier insertion alone leaves redundancy]

Pretend we calculate expression e whenever it is anticipated: e will be available at p if e has been anticipated but not subsequently killed on all paths reaching p.

Domain: sets of (will be) available expressions
Direction: forward
Transfer function: f_b(x) = (Anticipated[b].in ∪ x) - EKill_b
Boundary: out[entry] = ∅
Initialization: out[b] = {all expressions}

Early Placement

earliest(b): the set of expressions added to block b under early placement, calculated from the results of the first two passes. Place an expression at the earliest point where it is anticipated and not already available:

earliest(b) = anticipated[b].in - available[b].in

Algorithm: for all basic blocks b, if x+y ∈ earliest[b], then at the beginning of b create a new variable t, insert t = x+y, and replace every original x+y by t.

Result: maximized redundancy elimination, placed as early as possible. But what about register lifetimes?
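Passes 1 and 2 combine into earliest via plain set subtraction. Here is a sketch under assumed Pass 1 results; the straight-line CFG and all per-block sets are invented for illustration.

```python
# Straight-line CFG b1 -> b2 -> b3, encoded by predecessor lists.
pred = {"b1": [], "b2": ["b1"], "b3": ["b2"]}
ALL = frozenset({"a+b"})
ant_in = {"b1": {"a+b"}, "b2": {"a+b"}, "b3": set()}  # assumed Pass 1 output
e_kill = {"b1": set(), "b2": set(), "b3": {"a+b"}}    # b3 redefines a or b

# Forward Pass 2: in[b] = ∩ over preds p of (Anticipated.in[p] ∪ in[p]) - EKill[p]
avail_in = {b: (set() if not pred[b] else set(ALL)) for b in pred}
changed = True
while changed:
    changed = False
    for b in pred:
        if not pred[b]:
            continue                          # boundary: out[entry] = ∅
        new_in = set(ALL)
        for p in pred[b]:
            new_in &= (ant_in[p] | avail_in[p]) - e_kill[p]
        if new_in != avail_in[b]:
            avail_in[b] = new_in
            changed = True

# earliest[b] = anticipated[b].in - available[b].in
earliest = {b: ant_in[b] - avail_in[b] for b in pred}
print(earliest)  # insertion lands only at b1, the earliest anticipated point
```

At b2 the expression is already (will-be) available from b1, so the subtraction suppresses a second insertion there.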

Pass 3: Postponable Expressions

Let's be lazy without introducing redundancy: delay placement to reduce register pressure.

[Figure: uses x = b+c and y = b+c below an earliest placement; latest pushes the computation down toward the uses]

An expression e is postponable at a program point p if all paths leading to p have seen the earliest placement of e but not a subsequent use.

Domain: sets of postponable expressions
Direction: forward
Transfer function: f_b(x) = (earliest[b] ∪ x) - EUse_b
Boundary: out[entry] = ∅
Initialization: out[b] = {all expressions}
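Pass 3 has the same forward shape as Pass 2, with earliest as the "gen" set and EUse as the "kill" set. An illustrative sketch on an invented chain (earliest at b1, the only use at b3):

```python
pred = {"b1": [], "b2": ["b1"], "b3": ["b2"]}
ALL = frozenset({"b+c"})
earliest = {"b1": {"b+c"}, "b2": set(), "b3": set()}  # assumed earliest placement
e_use = {"b1": set(), "b2": set(), "b3": {"b+c"}}     # b+c used only in b3

# Forward: out[p] = (earliest[p] ∪ in[p]) - EUse[p]; in[b] = ∩ out[p] over preds
post_in = {b: (set() if not pred[b] else set(ALL)) for b in pred}
changed = True
while changed:
    changed = False
    for b in pred:
        if not pred[b]:
            continue                          # boundary: out[entry] = ∅
        new_in = set(ALL)
        for p in pred[b]:
            new_in &= (earliest[p] | post_in[p]) - e_use[p]
        if new_in != post_in[b]:
            post_in[b] = new_in
            changed = True

print(post_in)  # b+c stays postponable all the way down to its use in b3
```

The use in b3 subtracts b+c from post.out[b3], which is exactly what stops postponement past a use.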

Example Illustrating Postponable

[Figure: a CFG from Entry (b = ...) through x = b + c and y = b + c to Exit, annotated at each point with Anticipated.in (Ant), Available.in (Av), and Postponable.in (P); Earliest holds where Ant = 1 and Av = 0; at a use, EUse = TRUE forces P.out = ∅]

Ant.IN[i] = EUse[i] ∪ (Ant.OUT[i] - EKill[i])
Avail.OUT[i] = (Ant.IN[i] ∪ Avail.IN[i]) - EKill[i]
Post.OUT[i] = (Earliest[i] ∪ Post.IN[i]) - EUse[i]

Latest: Frontier at the End of the Postponable Cut Set

latest[b] = (earliest[b] ∪ postponable.in[b]) ∩ (EUse_b ∪ ¬(∩_{s ∈ succ[b]} (earliest[s] ∪ postponable.in[s])))

It is OK to place an expression in b if it is earliest or postponable there. We need to place it at b if it is either used in b, or it is not OK to place it in one of b's successors.

This works because of the pre-processing step (an empty block was introduced on every edge whose destination has multiple predecessors): if b has a successor that cannot accept postponement, then b has only one successor. The following cannot exist: an OK-to-place block with two successors, one OK to place and one not OK to place.
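The latest formula is a pure set computation over the earlier results. A sketch with invented per-block sets (earliest at b1, the use at b3, in a straight-line chain):

```python
ALL = frozenset({"b+c"})
succ = {"b1": ["b2"], "b2": ["b3"], "b3": []}
earliest = {"b1": {"b+c"}, "b2": set(), "b3": set()}
post_in = {"b1": set(), "b2": {"b+c"}, "b3": {"b+c"}}  # assumed Pass 3 output
e_use = {"b1": set(), "b2": set(), "b3": {"b+c"}}

def latest(b):
    ok_here = earliest[b] | post_in[b]         # OK to place in b
    ok_in_all_succ = set(ALL)
    for s in succ[b]:
        ok_in_all_succ &= earliest[s] | post_in[s]
    # must place in b: used here, or some successor cannot accept the placement
    return ok_here & (e_use[b] | (set(ALL) - ok_in_all_succ))

print({b: latest(b) for b in succ})  # placement sinks to b3, where b+c is used
```

Note the complement is taken relative to the expression universe ALL, matching the ¬ in the formula.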

Example Illustrating Latest

[Figure: the same CFG annotated with Anticipated.in (Ant), Available.in (Av), and Postponable.in (P) at each point; Earliest is marked near the top, Latest at the uses x = b + c and y = b + c]

latest[b] = (earliest[b] ∪ postponable.in[b]) ∩ (EUse_b ∪ ¬(∩_{s ∈ succ[b]} (earliest[s] ∪ postponable.in[s])))

Pass 4: Used Expressions

Finally, an easy one: it is like liveness, but for expressions. Eliminate temporary variable assignments that are unused beyond the current block (e.g., x = a + b where a + b is not used afterwards). Compute Used.out[b]: the set of used (live) expressions at the exit of b.

Domain: sets of used expressions
Direction: backward
Transfer function: f_b(x) = (EUse[b] ∪ x) - latest[b]
Boundary: in[exit] = ∅
Initialization: in[b] = ∅

Code Transformation

For all basic blocks b, if (x+y) ∈ (latest[b] ∩ used.out[b]), then at the beginning of b add new t = x+y and replace every original x+y by t.
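Pass 4 plus this final rewrite can be sketched end-to-end. Everything below is illustrative: the blocks, the temporary name t, and the latest sets (assumed at b2 because b3 reuses the value) are hypothetical.

```python
succ = {"b1": ["b2"], "b2": ["b3"], "b3": []}
latest = {"b1": set(), "b2": {"b+c"}, "b3": set()}    # assumed latest result
e_use = {"b1": set(), "b2": {"b+c"}, "b3": {"b+c"}}
blocks = {"b1": ["a = 1"], "b2": ["x = b + c"], "b3": ["y = b + c"]}

# Pass 4 (backward): in[b] = (EUse[b] ∪ out[b]) - latest[b]; out[b] = ∪ of in[s]
used_in = {b: set() for b in succ}
changed = True
while changed:
    changed = False
    for b in succ:
        out_b = set().union(*(used_in[s] for s in succ[b])) if succ[b] else set()
        new_in = (e_use[b] | out_b) - latest[b]
        if new_in != used_in[b]:
            used_in[b] = new_in
            changed = True
used_out = {b: set().union(*(used_in[s] for s in succ[b])) if succ[b] else set()
            for b in succ}

# Transformation: insert t = b+c where latest ∩ used.out holds, then rewrite uses
for b in blocks:
    if "b+c" in (latest[b] & used_out[b]):
        blocks[b].insert(0, "t = b + c")
    blocks[b] = [i if i == "t = b + c" else i.replace("b + c", "t")
                 for i in blocks[b]]

print(blocks)  # b2: ['t = b + c', 'x = t']; b3: ['y = t']
```

Had b3 not reused the value, used.out[b2] would be empty and no temporary would be created, which is exactly the cleanup Pass 4 provides.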

4 Passes for Partial Redundancy Elimination

1. Safety: cannot introduce operations not executed originally.
   Pass 1 (backward): anticipation determines the range of code motion. Placing operations at the frontier of anticipation gets most of the redundancy.
2. Squeezing out the last drop of redundancy: an anticipation frontier may cover a subsequent frontier.
   Pass 2 (forward): availability. Earliest: anticipated, but not yet available.
3. Push the cut set out, as late as possible, to minimize register lifetimes.
   Pass 3 (forward): postponability: move it down provided it does not create redundancy. Latest: where it is used, or the frontier of postponability.
4. Cleaning up.
   Pass 4 (backward): remove unneeded temporary assignments.

Remarks

A powerful algorithm: it finds many forms of redundancy in one unified framework, and it illustrates the power of data flow by composing multiple data flow problems.

Today's Class

I. Mathematical concept: a cut set
II. Lazy Code Motion Algorithm
   Pass 1: Anticipated Expressions
   Pass 2: (Will be) Available Expressions
   Pass 3: Postponable Expressions
   Pass 4: Used Expressions

Friday's Class: Static Single Assignment (ALSU 6.2.4)