Overview Of Op*miza*on, 2

Similar documents
Introduction to Optimization. CS434 Compiler Construction Joel Jones Department of Computer Science University of Alabama

Loop Invariant Code Mo0on

Construc)on of Sta)c Single- Assignment Form

Local Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code

Operator Strength Reduc1on

The Big Picture. The Programming Assignment & A Taxonomy of Op6miza6ons. COMP 512 Rice University Spring 2015

Introduction to Optimization Local Value Numbering

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators.

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Welcome to COMP 512. COMP 512 Rice University Spring 2015

Data Structures and Algorithms in Compiler Optimization. Comp314 Lecture Dave Peixotto

Transla'on Out of SSA Form

Dynamic Compila-on in Smalltalk- 80

Instruction Selection: Preliminaries. Comp 412

Lazy Code Motion. Comp 512 Spring 2011

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

Intermediate Representations

Combining Optimizations: Sparse Conditional Constant Propagation

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

Code Shape Comp 412 COMP 412 FALL Chapters 4, 5, 6 & 7 in EaC2e. source code. IR IR target. code. Front End Optimizer Back End

Register Alloca.on via Graph Coloring

CS293S Redundancy Removal. Yufei Ding

What If. Static Single Assignment Form. Phi Functions. What does φ(v i,v j ) Mean?

Lab 3, Tutorial 1 Comp 412

Lessons from Fi,een Years of Adap2ve Compila2on

CS415 Compilers. Intermediate Represeation & Code Generation

Calvin Lin The University of Texas at Austin

Intermediate Representations

Code Shape II Expressions & Assignment

Parsing II Top-down parsing. Comp 412

The View from 35,000 Feet

Calvin Lin The University of Texas at Austin

The Static Single Assignment Form:

Global Register Allocation via Graph Coloring

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

CS /534 Compiler Construction University of Massachusetts Lowell

Topic I (d): Static Single Assignment Form (SSA)

Implementing Control Flow Constructs Comp 412

Fortran H and PL.8. Papers about Great Op.mizing Compilers. COMP 512 Rice University Spring 2015

The Software Stack: From Assembly Language to Machine Code

CS 406/534 Compiler Construction Instruction Scheduling

CS553 Lecture Profile-Guided Optimizations 3

The Processor Memory Hierarchy

CS 406/534 Compiler Construction Putting It All Together

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

CSE Section 10 - Dataflow and Single Static Assignment - Solutions

Intermediate representation

Example. Example. Constant Propagation in SSA

Generating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

Instruction Selection, II Tree-pattern matching

Induction Variable Identification (cont)

Data Flow Analysis. Agenda CS738: Advanced Compiler Optimizations. 3-address Code Format. Assumptions

Local Register Allocation (critical content for Lab 2) Comp 412

Intermediate Representations & Symbol Tables

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02)

Sta$c Single Assignment (SSA) Form

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Languages and Compiler Design II IR Code Optimization

Tour of common optimizations

Compiler Optimization Intermediate Representation

Intermediate Representations Part II

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Lecture 09X: C Func/on Pointers Concurrent and Mul/core Programming

Goals of Program Optimization (1 of 2)

MIT Introduction to Program Analysis and Optimization. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Building a Parser Part III

Syntax Analysis, III Comp 412

Program Optimizations using Data-Flow Analysis

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Computer Science 160 Translation of Programming Languages

Syntax Analysis, III Comp 412

Pipelined Datapath. One register file is enough

Loops and Locality. with an introduc-on to the memory hierarchy. COMP 506 Rice University Spring target code. source code OpJmizer

Simple Machine Model. Lectures 14 & 15: Instruction Scheduling. Simple Execution Model. Simple Execution Model

Flaco Lock Service. RESTful Distributed Locking Over HTTP. Jose Falcon Bryon Jacob

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

: Advanced Compiler Design. 8.0 Instruc?on scheduling

Operational Semantics of Cool

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

UG3 Compiling Techniques Overview of the Course

Intermediate Representations

Supplement to. Logic and Computer Design Fundamentals 4th Edition 1

Computer Science 146. Computer Architecture

Query Processing & Optimization

Image Transforms & Warps. 9/25/08 Comp 665 Image Transforms & Warps 1

Just-In-Time Compilers & Runtime Optimizers

Advanced Compilers CMPSCI 710 Spring 2003 Dominators, etc.

Reuse Optimization. LLVM Compiler Infrastructure. Local Value Numbering. Local Value Numbering (cont)

SSA Construction. Daniel Grund & Sebastian Hack. CC Winter Term 09/10. Saarland University

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS202 Compiler Construction

Transcription:

OMP 512 Rice University Spring 2015 Overview Of Op*miza*on, 2 Superlocal Value Numbering, SE opyright 2015, Keith D. ooper & Linda Torczon, all rights reserved. Students enrolled in omp 512 at Rice University have explicit permission to make copies of these materials for their personal use. aculty from other educahonal inshtuhons may use these materials for nonprofit educahonal purposes, provided this copyright nohce is preserved itahon numbers, when given, refer to entries in the Ea2e bibliography.

Local Value Numbering Review The algorithm or each operahon o in the block 1 et value numbers for the operands from a hash lookup 2 Hash <operator,vn(o 1 ),VN(o 2 )> to get a value number for o 3 If o already had a value number, replace o with a reference 4 If o 1 & o 2 are constant, evaluate it & use a load immediate If hashing behaves, the algorithm runs in linear Hme If you don t believe in hashing, try mulh- set discriminahon Minor issues ommutahve operator hash operands in each order or sort the operands by VN before hashing (either works, sor;ng is cheaper ) Looks at operand s value number, not its name Ea2e: digression on page 256 or reference [65] OMP 512, Spring 2015 2

Mul9- lock Example Review ontrol- flow graph () Nodes for basic blocks Edges for branches asis for much of program analysis & transformahon = (N,E) N = {,,, D, E,, } E = { (,), (,), (,),(,D), (,E), (D,), (E,), (,E) } N = 7, E = 8 OMP 512, Spring 2015 3

Mul9- lock Example Review Local Value Numbering (LVN) 1 block at a Hme Strong local results No inter- block effects LVN finds redundant ops in red OMP 512, Spring 2015 4

Mul9- lock Example Review Local Value Numbering (LVN) 1 block at a Hme Strong local results No inter- block effects LVN finds redundant ops in red LVN misses redundant ops in blue OMP 512, Spring 2015 5

eyond asic locks: Extended asic locks Review n Extended asic lock (E) Set of blocks 1, 2,, n 1 has > 1 predecessor ll other i have 1 pred. & that pred. is in the E OMP 512, Spring 2015 6

Extended asic locks Review n Extended asic lock (E) Set of blocks 1, 2,, n 1 has > 1 predecessor ll other i have 1 pred. & that pred. is in the E Three Es in this 1. {,,, D, E} OMP 512, Spring 2015 7

Extended asic locks Review n Extended asic lock (E) Set of blocks 1, 2,, n 1 has > 1 predecessor ll other i have 1 pred. & that pred. is in the E Three Es in this 1. {,,, D, E } 2. { } OMP 512, Spring 2015 8

Extended asic locks Review n Extended asic lock (E) Set of blocks 1, 2,, n 1 has > 1 predecessor ll other i have 1 pred. & that pred. is in the E Three Es in this 1. {,,, D, E } 2. { } 3. { } OMP 512, Spring 2015 9

Extended asic locks Review n Extended asic lock (E) Set of blocks 1, 2,, n 1 has > 1 predecessor ll other i have 1 pred. & that pred. is in the E Three Es in this 1. {,,, D, E } 2. { } 3. { } Degenerate or trivial Es OMP 512, Spring 2015 10

Value Numbering Over Extended asic locks Review Superlocal VN (SVN) pply LVN to each path in E arry hash table forward, block to block pply LVN to each path in E 1. (, ) OMP 512, Spring 2015 11

Value Numbering Over Extended asic locks Review Superlocal VN pply LVN to each path in E arry hash table forward, block to block pply LVN to each path in E 1. (, ) 2. (,, D) OMP 512, Spring 2015 12

Value Numbering Over Extended asic locks Review Superlocal VN pply LVN to each path in E arry hash table forward, block to block pply LVN to each path in E 1. (, ) 2. (,, D) 3. (,, E) OMP 512, Spring 2015 13

Superlocal Value Numbering Efficiency Easy to implement if we are willing to process three Hmes & twice,,,, D,,, E,, ould be faster if we reused the results from &,,, D, E,, OMP 512, Spring 2015 14

Superlocal Value Numbering Efficiency Easy to implement if we are willing to process three Hmes & twice,,,, D,,, E,, ould be faster if we reused the results from &,,, D, E,, Worst ase Imagine SVN on a case statement 1 2 3 n- 1 n OMP 512, Spring 2015 15

The Role of Names in Superlocal Value Numbering What work must be repeated in a predecessor block? Value numbers are stored in a hash table Keyed by name or <op,vn,vn> construct To avoid repeated work, SVN should roll back changes to the hash table Rather than,,, we want to go from to without revisihng x c + d x a + b In the example, the definihon of x in changes the hash table entry for x oer, SVN needs to roll x s value number back to the value from ould run backward through and undo each definihon (with bookkeeping) ould reprocess eper way is to rename so that each definihon has a unique name y a + b We saw the same issue in LVN, in local register alloca*on, & in local scheduling. We need a global name space with the right set of properhes OMP 512, Spring 2015 16

Superlocal Value Numbering Efficiency Easy to implement if we are willing to process three Hmes & twice,,,, D,,, E,, ould be faster if we reused the results from &,,, D, E,, Need an appropriate name space & a scoped hash table (parsing? ) lterna*ve is to add lots of complex mechanism for kills & table management Desired Name Space Unique name for each definihon Name VN SS name space is ideal OMP 512, Spring 2015 Scoped Table? 5.5 in Ea2e 17

side: SS Name Space (In eneral) Two principles Each name is defined by exactly one operahon Each operand refers to exactly one definihon To reconcile these principles with real code Insert φ- funchons at merge points to reconcile name space dd subscripts to variable names for uniqueness φ- funchon selects one of its operands, based on the control- flow path used to reach the block. x... x...... x +... becomes x 0... x 1... x 2 φ(x 0,x 1 ) x 2 +... We ll look at how to construct SS form in a week or two OMP 512, Spring 2015 18

Superlocal Value Numbering Now, SVN becomes 1. IdenHfy Es 2. In depth- first order over an E, starhng with the head of the E, b 0 a. pply LVN to b i b. Invoke SVN on each of b i s E successors When going from b i to its E successor b j, extend the symbol table with a new scope for b j, apply LVN to b j, & process b j s E successors When going from b j to its E predecessor b i, discard the scope for b j It is that easy, with a scoped table & the right name space OMP 512, Spring 2015 19

SVN on the Example LVN finds redundant ops in red SVN finds redundant ops in blue OMP 512, Spring 2015 20

SVN on the Example LVN finds redundant ops in red SVN finds redundant ops in blue oth miss redundancies in & OMP 512, Spring 2015 21

Perspec9ve SVN sidesteps the need for separate analysis & transforma9on pplies LVN over a larger acyclic context long a path in an E, order is fully specified Direct contrast with scheduling in an E or a trace, because scheduling moves around operahons and changes the order Result, in scheduling, is compensa;on code Redundancy eliminahon preserves the order, so we can stretch LVN to Es To go (much) beyond Es, we need separate transforma9on & analysis Later in the semester, we will look at methods that combine code mo;on & redundancy elimina;on, such as lazy code mo;on [225,133], and at a technique that applies HopcroL s par;;oning algorithm to expressions over SS names [22]. ut first, we will look at the classical formulahon of global common subexpression elimina;on based on the global data- flow problem: available expressions [218] OMP 512, Spring 2015 22

lobal ommon Subexpression Elimina9on (SE) The oal ind redundant expressions ( common subexpressions ) whose range spans mulhple basic blocks, and eliminate any unnecessary re- evaluahons Safety ormulate availability of a redundant expression at point p as a data- flow problem: available expressions (annotate each block b with a set VIL(b) ) If x VIL(b), then, along each path from the entry to block b, x is evaluated and its conshtuent subexpressions (i.e., operands) are not redefined EvaluaHng x at the start of b would produce the same answer as at its most recent evaluahon, along any path leading from the entry to b TransformaHon preserves the result of prior computahons and uses them Only replaces an evaluahon that is in the VIL set of its block & shll available at the point of evaluahon SE does not move evaluahons, it eliminates them Safety of SE hinges on the correctness of the VIL sets OMP 512, Spring 2015 This treatment follows ocke s classic paper [87]. 23

lobal ommon Subexpression Elimina9on The oal ind redundant expressions ( common subexpressions ) whose range spans mulhple basic blocks, and eliminate any unnecessary re- evaluahons Profitability The transformahon does not add any new evaluahons to the code The transformahon replaces the evaluahon of the redundant expression with a register- to- register copy from a preserved value opy operahons are inexpensive Many copies will coalesce away The transformahon can increase or decrease demand for registers If the redundant expression is the last use of one of its operands, it may reduce register pressure Difficult to understand the impact of any given replacement on register pressure OMP 512, Spring 2015 24

vailable Expressions or each block b Let VIL(b) be the set of expressions available on entry to b IniHally, VIL(n) = { all expressions }, n N, except n 0 IniHally, VIL(n 0 ) = Ø Let EXPRKILL(b) be the set of expressions killed in b Let DEEXPR(b) be the set of expressions defined in b and not subsequently killed in b (downward- exposed expressions) Now, VIL(b) can be defined as: complement operator VIL(b) = x preds(b) (DEEXPR(x) (VIL(x) EXPRKILL(x))) where preds(b) is the set of b s predecessors in the control- flow graph This system of simultaneous equahons forms a data- flow problem Solve it with a data- flow algorithm (e.g., itera;ve fixed- point scheme) OMP 512, Spring 2015 25

Using vailable Expressions for SE The Method 1. uild a control- flow graph () 2. block b, compute DEEXPR(b) and EXPRKILL(b) & inihalize VIL(b) 3. block b, compute VIL(b) Expressions killed in b Downward- exposed expressions 4. block b, replace expressions that are available with references Two key issues ompuhng VIL(b) Managing the replacement process We ll look at the replacement issue first ssume, without loss of generality (wlog), that we can compute VIL(b) correctly OMP and efficiently 512, Spring for 2015 each block b. 26

Replacement in SE The key lies in managing the name space Need a unique name e VIL(b) 1. an generate them as replacements are done (ortran H) 2. an pre- compute a stahc mapping (lassic answer) 3. an encode value numbers into names (Simpson) Strategy 1. This works; it is the classic method 2. ast; allows single pass to insert code to preserve values of non- redundant evaluahons & to replace the redundant evaluahons 3. Requires more analysis (VN), but yields more SEs ssume soluhon 2 OMP 512, Spring 2015 27

lobal SE (replacement step) ompute a sta9c mapping from expressions to names oer analysis & before transformahon block b, e VIL(b), assign a global name to e Integer can be Hed to index of bit- vector set representahon During transformahon step ommon strategy: EvaluaHon of e insert copy name(e) e Insert copies that might be useful Reference to e replace e with name(e) Let dead code elim. sort them out Simplifies design & implementahon The major problem with this approach Inserts extraneous copies to preserve values that are of no later use t all definihons and uses of any e VIL(b), b e VIL(b) says nothing about whether or not e is ever computed again Those extra copies are dead and easy to remove The useful ones ooen coalesce away OMP 512, Spring 2015 28

n side on Dead ode Elimina9on What does dead mean? Useless code result is never used Unreachable code code that cannot execute oth useless code & unreachable are ooen lumped together as dead To perform Dead ode Elimina9on Must have a global mechanism to recognize usefulness Must have a global mechanism to eliminate unneeded stores Must have a global mechanism to simplify control- flow predicates ll of these will come later in the course OMP 512, Spring 2015 29

lobal SE So, we have a three step process 1. ompute VIL(b), block b 2. ssign unique global names to expressions in VIL(b) 3. Perform replacement with local value numbering Earlier in the lecture, the slide said ssume, without loss of generality, that we can compute available expressions for a procedure. Next lecture, we will make good on that assump;on OMP 512, Spring 2015 30