Local Optimization: Value Numbering (The Desert Island Optimization). COMP 412, Fall 2017. Chapter 8 in EaC2e.


COMP 412, Fall 2017
Local Optimization: Value Numbering
The Desert Island Optimization
Chapter 8 in EaC2e

[Figure: source code → Front End → IR → Optimizer → IR → Back End → target code]

Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

The Optimizer

[Figure: source code → Front End → IR → Optimizer → IR → Back End → target code]

Typical transformations
- Discover & propagate a constant value
- Move evaluation to a less-frequently executed place in the code
- Specialize code based on context
- Find & remove redundant code
- Remove useless or unreachable code
- Take advantage of a processor feature

Optimizers Work at Several Different Scopes
(Yet another meaning for "scope")

Local optimization
- Operates entirely within a single basic block
- Properties of the block lead to strong optimizations

Regional optimization
- Operates on a region in the CFG that contains multiple blocks
- Loops, trees, paths, extended basic blocks

Whole-procedure optimization (intraprocedural or global)
- Operates on the entire CFG for a procedure
- Presence of cyclic paths forces analysis, then transformation

Whole-program optimization (interprocedural or universal)
- Operates on some or all of the call graph (multiple procedures)
- Must contend with call/return & parameter binding

Definitions
- A basic block is a maximal-length sequence of straight-line code.
- A control-flow graph (CFG) has a node for each basic block and an edge for each branch or jump.
- A call graph has a node for each procedure and an edge for each call.
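The basic-block definition above can be made concrete with a short sketch. This is only an illustration: the tuple layout `(label, opcode, text)` and the opcode names `jump` and `cbr` are assumptions made for the example, not something the slides specify.

```python
def split_blocks(ops, branch_ops=("jump", "cbr")):
    """Partition linear code into basic blocks.

    ops: list of (label_or_None, opcode, text) tuples (hypothetical layout).
    Returns a list of blocks, each a list of those tuples.
    """
    blocks, current = [], []
    for label, opcode, text in ops:
        # A labeled operation may be a branch target, so it starts a new block.
        if label is not None and current:
            blocks.append(current)
            current = []
        current.append((label, opcode, text))
        # A branch or jump ends the current block.
        if opcode in branch_ops:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```

Each resulting block is straight-line code, which is the unit a local optimization such as value numbering works on.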

Scope of Optimization

In scanning and parsing, "scope" refers to a region of the code that corresponds to a distinct name space.
- Recall that most Algol-like languages (ALLs) and object-oriented languages (OOLs) follow lexical scope rules.

In optimization, "scope" refers to a region of the code that is subject to analysis and transformation.
- The two notions are somewhat related
- The connection is not necessarily intuitive

Different scopes introduce different challenges & different opportunities.

Historically, optimization has been performed at several distinct scopes. Today, we will look at one local optimization. Next lecture, we will look at a regional optimization.

The Optimizer

[Figure: source code → Front End → IR → Optimizer → IR → Back End → target code]

Today: Local Value Numbering

Typical transformations
- Discover & propagate a constant value
- Move evaluation to a less-frequently executed place in the code
- Specialize code based on context
- Find & remove redundant code
- Remove useless or unreachable code
- Take advantage of a processor feature

Redundancy Elimination as an Example

An expression x+y is redundant if and only if, along every path from the procedure's entry, it has been evaluated, and its constituent subexpressions (x & y) have not been re-defined.

If the compiler can prove that an expression is redundant
- It can preserve the results of earlier evaluations
- It can replace the current evaluation with a reference

Two pieces to the problem
- Proving that x+y is redundant, or available
- Rewriting the code to eliminate the redundant evaluation

One local technique for accomplishing both is called value numbering.

Assume a low-level, linear IR such as ILOC.

Local Value Numbering

Performing redundancy elimination in the local context works well
- Within a block, the compiler understands the execution order
- Blocks that execute before or after the current block are uncertain territory
- Any value created inside the block is certain

Much of the available improvement can be caught locally
- Redundancy in address expressions, constant folding
- Algebraic simplification of expressions
- Discovering constants at the whole-procedure or whole-program scope is profitable, but folding them is local

What, then, is the role of larger scopes?
- Some opportunities need more context than a block
  - Code motion & placement are clearly non-local
  - Regional optimizations, such as improving a loop nest
  - Removing useless or unreachable code requires a larger scope
- Discovering and using knowledge about the uncertain territory

Serious opportunities exist, but the compiler should get the local ones first.

Value Numbering
(Local algorithm due to Balke (1968) or Ershov (1954))

The Key Notion
- Assign an identifying number, VN(n), to each expression
  - VN(x+y) = VN(j) iff x+y and j always have the same value
  - Use hashing over the value numbers to make it efficient
- Use these numbers to improve the code

Improving the Code
- Replace redundant expressions (within a basic block; the definition becomes more complex across blocks)
  - Same VN ⇒ refer to the prior value rather than recompute
- Simplify algebraic identities
- Discover constant-valued expressions, fold & propagate them

The technique is designed for low-level, linear IRs; similar methods exist for trees (e.g., build a DAG, a directed acyclic graph).

Local Value Numbering: The Algorithm

For each operation o = <operator, o1, o2> in the block, in order
1. Get value numbers for the operands from a hash lookup
2. Hash <operator, VN(o1), VN(o2)> to get a value number for o
3. If o already had a value number, replace o with a reference
4. If o1 & o2 are constant, evaluate it & replace with a loadI

If hashing behaves, the algorithm runs in linear time (constant time per operation)
- Hashing 3 small integers is a hard case; use a proven hash function & test it
- If hashing does not behave, use multi-set discrimination [1] or acyclic DFAs [2]

Handling algebraic identities
- Case statement on operator type
- Handle special cases within each operator
- And, of course, give the result a value number

[1] see p. 256 in EaC2e
[2] see § 2.6.3 on p. 77 in EaC2e
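Steps 1 to 3 of the algorithm can be sketched in a few lines of Python. This is a minimal illustration rather than the course's implementation: the `(dest, op, left, right)` tuple layout and the `"copy"` pseudo-operation are assumptions made for the example, each name is assumed to be defined at most once in the block, and step 4 (constant folding) is left out.

```python
from itertools import count

def lvn(block):
    """Local value numbering over one block.

    block: list of (dest, op, left, right) tuples (hypothetical layout).
    Assumes each dest is defined at most once in the block.
    """
    fresh = count()
    vn_of = {}    # name or expression key -> value number
    holder = {}   # value number -> name that first computed it
    out = []

    def number(key):
        # A Python dict plays the role of the hash table.
        if key not in vn_of:
            vn_of[key] = next(fresh)
        return vn_of[key]

    for dest, op, left, right in block:
        key = (op, number(left), number(right))       # steps 1 and 2
        if key in vn_of:                              # step 3: seen before
            vn = vn_of[key]
            out.append((dest, "copy", holder[vn], None))
        else:
            vn = number(key)
            holder[vn] = dest
            out.append((dest, op, left, right))
        vn_of[dest] = vn
    return out
```

With a well-behaved hash table, each operation costs a constant number of lookups, which is where the linear running time comes from.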

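Local Value Numbering: A Simple Example

(Superscripts, written ^n, are value numbers. A * marks a redundant operation.)

  Original code      With VNs                  Rewritten
    a ← x + y          a^3 ← x^1 + y^2           a^3 ← x^1 + y^2
  * b ← x + y        * b^3 ← x^1 + y^2         * b^3 ← a^3
    a ← 17             a^4 ← 17                  a^4 ← 17
  * c ← x + y        * c^3 ← x^1 + y^2         * c^3 ← a^3   (oops!)

The last rewrite is wrong: by that point a has been overwritten with 17, so a holds VN 4, not VN 3.

Two redundancies
- Eliminate exprs with a *
- Coalesce results?

Options
- Use c^3 ← b^3
- Save a^3 in t^3
- Rename around it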
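The first option, "use c ← b", amounts to remembering every name that was assigned a value number and picking one that still holds it. The sketch below illustrates that idea under assumptions of my own: operations are `(dest, op, left, right)` tuples, `"const"` loads a literal, and `"copy"` is a pseudo-operation for the rewritten code.

```python
from itertools import count

def lvn_track_names(block):
    """LVN that only references a name if it still holds the value number."""
    fresh = count()
    vn_of = {}   # name or expression key -> value number
    names = {}   # value number -> names assigned that value, oldest first
    out = []

    def number(key):
        if key not in vn_of:
            vn_of[key] = next(fresh)
        return vn_of[key]

    for dest, op, left, right in block:
        source = None
        if op == "const":
            vn = number(("const", left))
        else:
            key = (op, number(left), number(right))
            if key in vn_of:
                want = vn_of[key]
                # Skip names that have since been overwritten with another value.
                source = next((n for n in names[want] if vn_of[n] == want), None)
            vn = number(key)
        if source is not None:
            out.append((dest, "copy", source, None))
        else:
            out.append((dest, op, left, right))
        vn_of[dest] = vn
        names.setdefault(vn, []).append(dest)
    return out
```

On the example block, the final operation becomes a copy from b rather than from the overwritten a. The price is the per-value list of names and the code to maintain it.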

Local Value Numbering: Example (continued)

(The digit after each name is its subscript, a unique version of that name; ^n is the value number.)

  Original code        With VNs                     Rewritten
    a0 ← x0 + y0         a0^3 ← x0^1 + y0^2           a0^3 ← x0^1 + y0^2
  * b0 ← x0 + y0       * b0^3 ← x0^1 + y0^2         * b0^3 ← a0^3
    a1 ← 17              a1^4 ← 17                    a1^4 ← 17
  * c0 ← x0 + y0       * c0^3 ← x0^1 + y0^2         * c0^3 ← a0^3

Renaming
- Give each value a unique name
- Makes it clear

Notation
- While complex, the meaning is clear

Result
- a0^3 is available
- Simple rewriting now works
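A renaming pass of this kind might look like the sketch below. The details are assumptions for illustration: operations are `(dest, op, left, right)` tuples, subscripts are written with an underscore (`a_0`) to avoid clashing with names that end in digits, and a name that is read before the block defines it gets subscript 0.

```python
def rename(block):
    """Give every value defined in the block its own subscripted name."""
    version = {}   # base name -> subscript of its current value

    def use(operand):
        if not isinstance(operand, str):
            return operand                 # literals and None pass through
        # A name read before any definition refers to its value on entry.
        return f"{operand}_{version.setdefault(operand, 0)}"

    def define(name):
        # Each new definition takes the next unused subscript.
        version[name] = version[name] + 1 if name in version else 0
        return f"{name}_{version[name]}"

    out = []
    for dest, op, left, right in block:
        left, right = use(left), use(right)   # read operands before defining dest
        out.append((define(dest), op, left, right))
    return out
```

After this pass no name is ever overwritten, so a later reference to an earlier result is always safe.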

Digression on Renaming

In the local register allocation lab, we introduced you to renaming
- Renaming fits rather naturally into that particular algorithm
- Choose a name space that matches the creation & destruction of values
- The right name space simplifies the transformation on the code

This insight has profound consequences
- The compiler's choice of name space can differ from the programmer's
  - This notion has become a theme of optimization in the last 20 years
  - It has led to simpler, more effective algorithms
- The compiler may choose different name spaces at different times
  - The name space for one situation may not be best for another
  - Translation between name spaces has both costs and benefits

In LVN, arbitrary names add myriad complications to the algorithm
- Lists to keep track of all names that hold VN 3, and code to maintain them, or
- Introduction of temporary names, copies, & a copy-coalescing problem

Local Value Numbering: The Algorithm

For each operation o = <operator, o1, o2> in the block, in order
1. Get value numbers for the operands from a hash lookup
2. Hash <operator, VN(o1), VN(o2)> to get a value number for o
3. If o already had a value number, replace o with a reference
4. If o1 & o2 are constant, evaluate it & replace with a loadI

Complexity & Speed Issues (asymptotic constants)
- Get value numbers: search versus hash
- Hash <op, VN(o1), VN(o2)>: search versus hash
- Copy folding: set the value number of the result
- Commutative ops: hash twice versus sorting the operands

Fast execution of the compiler's code is critical. As Randy Scarborough (FORTRAN H ENHANCED) put it, a compiler is one of the few applications that executes literally hundreds of millions of times. Modern JIT environments make the tie between compiler speed & application speed even more critical.
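The two choices for commutative operators can be sketched side by side. The function names and the set of commutative operators are assumptions for the example, and recording both operand orders is only one reading of "hash twice".

```python
COMMUTATIVE = {"+", "*"}   # assumed set of commutative operators

def sorted_key(op, v1, v2):
    """One table entry per expression: put the operand VNs in a canonical order."""
    if op in COMMUTATIVE and v1 > v2:
        v1, v2 = v2, v1
    return (op, v1, v2)

def insert_both(table, op, v1, v2, vn):
    """Alternative: record both operand orders, so lookups need no sorting."""
    table[(op, v1, v2)] = vn
    if op in COMMUTATIVE:
        table[(op, v2, v1)] = vn
```

Sorting costs a comparison per operation; recording both orders costs an extra table entry and an extra hash for each new commutative expression.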

Simple Extensions to Value Numbering

Constant Folding
- Add a field to the hash table that records when a value is constant
- Evaluate constant values at compile time
- Replace with a load immediate or an immediate operand
- No stronger local algorithm

Algebraic Identities
- Must check (many) special cases
- Replace the result with the input VN
- Build a decision tree on the operation
- No obvious way to hash

Identities (on VNs)
x ← y, x+0, x-0, x*1, x÷1, x-x, x*0, x÷x, x∨0, x ∧ 0xFF…FF, max(x,maxint), min(x,minint), max(x,x), min(y,y), and so on

See Figure 8.3 on page 424 of EaC2e.
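The "decision tree on operation" can be pictured as a function that branches on the operator and then checks that operator's special cases. The sketch below is illustrative only: it covers a handful of the identities, assumes integer arithmetic, and invents its own interface (operand value numbers plus a `const_of` map from value number to known constant).

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def simplify(op, l, r, const_of):
    """Return ("const", value), ("copy", vn), or None if nothing applies.

    l and r are the operands' value numbers; const_of maps a value number
    to its constant when one is known (hypothetical interface).
    """
    cl, cr = const_of.get(l), const_of.get(r)
    # Constant folding: both operands are known at compile time.
    if cl is not None and cr is not None and op in FOLD:
        return ("const", FOLD[op](cl, cr))
    # Algebraic identities: branch on the operator, then on its special cases.
    if op == "+":
        if cr == 0:
            return ("copy", l)
        if cl == 0:
            return ("copy", r)
    elif op == "-":
        if l == r:
            return ("const", 0)      # x - x, valid for integers
        if cr == 0:
            return ("copy", l)
    elif op == "*":
        if cl == 0 or cr == 0:
            return ("const", 0)      # x * 0, valid for integers
        if cr == 1:
            return ("copy", l)
        if cl == 1:
            return ("copy", r)
    elif op in ("max", "min"):
        if l == r:
            return ("copy", l)
    return None
```

A `("copy", vn)` result means the operation's result simply takes the input's value number, as the slide says.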

Local Value Numbering (Recap)

The LVN Algorithm, with bells & whistles

The block is a sequence of n operations of the form T_i ← L_i Op_i R_i.

for i ← 0 to n-1
1. get the value numbers V1 and V2 for L_i and R_i
2. if L_i and R_i are both constant, then evaluate L_i Op_i R_i, assign it to T_i, and mark T_i as a constant   (constant folding)
3. if L_i Op_i R_i matches an identity, then replace operation i with a copy operation or an assignment   (algebraic identities)
4. if Op_i commutes and V1 > V2, then swap V1 and V2   (commutativity)
5. construct a hash key <V1, Op_i, V2>
6. if the hash key is already present in the table, then replace operation i with a copy into T_i and mark T_i with the VN; else insert a new VN into the table for the hash key & mark T_i with the VN
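Putting the six steps together gives something like the following sketch. It is a hedged illustration, not the reference implementation: operations are `(T, L, Op, R)` tuples, operands are names (strings) or integer literals, each T is assumed to be defined once in the block, only three identities are checked, and the `"loadI"` and `"copy"` output forms are choices made for the example.

```python
import operator
from itertools import count

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}
COMMUTES = {"+", "*"}

def lvn_full(block):
    """LVN with constant folding, a few identities, and commutativity."""
    fresh = count()
    table = {}   # operand or hash key -> value number
    const = {}   # value number -> constant value, when known
    name = {}    # value number -> name that holds it
    out = []

    def vn(operand):
        if operand not in table:
            table[operand] = next(fresh)
            if isinstance(operand, int):
                const[table[operand]] = operand
        return table[operand]

    for t, l, op, r in block:
        v1, v2 = vn(l), vn(r)                                # step 1
        if v1 in const and v2 in const and op in FOLD:       # step 2
            value = FOLD[op](const[v1], const[v2])
            table[t] = vn(value)
            out.append((t, "loadI", value))
            continue
        is_add_sub_zero = op in ("+", "-") and const.get(v2) == 0
        is_mul_one = op == "*" and const.get(v2) == 1
        if is_add_sub_zero or is_mul_one:                    # step 3
            table[t] = v1
            out.append((t, "copy", l))
            continue
        if op in COMMUTES and v1 > v2:                       # step 4
            v1, v2 = v2, v1
        key = (v1, op, v2)                                   # step 5
        if key in table:                                     # step 6
            table[t] = table[key]
            out.append((t, "copy", name[table[key]]))
        else:
            table[key] = table[t] = next(fresh)
            name[table[t]] = t
            out.append((t, l, op, r))
    return out
```

Because every name is assumed to be defined once, the name recorded for a value number is always still valid when a later operation copies from it.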

Another Brief Example

In Lab 1, your allocator might have encountered, or generated, code that looked something like this sample:

    add   r1, r2   => r3
    loadI 4        => r4
    store r3       => r4     // spill
    loadI 4        => r17
    load  r17      => r18    // restore
    add   r18, r12 => r19

The uses of the value 4 are clearly redundant. If r4 is still alive at the load, the compiler could avoid the loadI that puts 4 into r17. LVN can rewrite this code to avoid the second loadI.

Tracking the spill through RAM is more complicated, as you will learn in Lab 3.

Ershov built an LVN-like capability into an assembler, back in the early 1950s. It would simplify this example.
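A narrow sketch of that rewrite, restricted to loadI, is shown below. The `(opcode, sources, dest)` tuple layout is an assumption for the example. The sketch also assumes that each register is defined at most once in the block and that the register whose loadI is removed is not needed after the block. It does not track the spilled value through memory.

```python
def drop_redundant_loadi(block):
    """Remove a loadI when another register already holds the same constant.

    block: list of (opcode, sources, dest) tuples; dest is None for store.
    Assumes each register is defined at most once in the block.
    """
    holder = {}   # constant -> register that holds it
    alias = {}    # register whose loadI was removed -> register to use instead
    out = []
    for opcode, sources, dest in block:
        # Redirect uses of a removed register to the surviving one.
        sources = tuple(alias.get(s, s) for s in sources)
        if opcode == "loadI":
            value = sources[0]
            if value in holder:
                alias[dest] = holder[value]
                continue              # the constant is already in a register
            holder[value] = dest
        out.append((opcode, sources, dest))
    return out
```

On the sample above, the second loadI disappears and the restore becomes a load from r4.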

Value Numbering

[Figure: control-flow graph with blocks A through G; the edges are not recoverable from the transcript. Block contents:]

  A:  m ← a + b;  n ← a + b
  B:  p ← c + d;  r ← c + d
  C:  q ← a + b;  r ← c + d
  D:  e ← b + 18;  s ← a + b;  u ← e + f
  E:  e ← a + 17;  t ← c + d;  u ← e + f
  F:  v ← a + b;  w ← c + d;  x ← e + f
  G:  y ← a + b;  z ← c + d

Figure callouts
- LVN found these redundant ops
- LVN missed these opportunities (need stronger methods)

Local Value Numbering
- 1 block at a time
- Strong local results
- No cross-block effects