Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Compiler Passes

Analysis of the input program (front end):

    character stream -> Lexical Analysis -> token stream -> Syntactic Analysis -> abstract syntax tree -> Semantic Analysis -> annotated AST -> Intermediate Code Generation -> intermediate form

Synthesis of the output program (back end):

    intermediate form -> Optimization -> intermediate form -> Code Generation -> target language

Optimization: before and after generating machine code, devote one or more passes over the program to improving code quality.

Optimizations

- Identify inefficiencies in intermediate or target code
- Replace them with equivalent but better sequences ("equivalent" = has the same externally visible behavior)
- Target-independent optimizations are best done on IL code
- Target-dependent optimizations are best done on target code

The Role of the Optimizer

- The compiler can implement a procedure in many ways; the optimizer tries to find an implementation that is better (speed, code size, data space, ...)
- To accomplish this, it analyzes the code to derive knowledge about run-time behavior (data-flow analysis, pointer disambiguation, ...; the general term is static analysis) and uses that knowledge in an attempt to improve the code
- Literally hundreds of transformations have been proposed, with a large amount of overlap between them
- There is nothing optimal about optimization: proofs of optimality assume restrictive and unrealistic conditions, so a better goal is to usually improve the code

Traditional Three-pass Compiler

    Source Code -> Front End -> IR -> Middle End -> IR -> Back End -> Machine code
    (every phase reports errors)

The Optimizer (or Middle End)

    IR -> Opt 1 -> IR -> Opt 2 -> IR -> Opt 3 -> IR -> ... -> Opt n -> IR

Modern optimizers are structured as a series of passes. The middle end does code improvement (optimization):
- Analyzes the IR and rewrites (or transforms) it
- Primary goal is to reduce the running time of the compiled code
- May also improve space, power consumption, ...
- Must preserve the meaning of the code, measured by the values of named variables
- A course (or two) unto itself

Typical Transformations

- Discover & propagate some constant value
- Move a computation to a less frequently executed place
- Specialize some computation based on context
- Discover a redundant computation & remove it
- Remove useless or unreachable code
- Encode an idiom in some particularly efficient form
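To make the pass structure concrete, here is a minimal sketch (Python, invented for this writeup; the quad IR and pass names are not from any particular compiler) of a middle end that threads IR through a series of behavior-preserving passes:

    # Illustrative only: IR as a list of (op, dest, arg1, arg2) quads.
    def fold_constants(ir):
        """One tiny pass: fold additions of two literals."""
        out = []
        for op, dst, a, b in ir:
            if op == '+' and isinstance(a, int) and isinstance(b, int):
                out.append(('li', dst, a + b, None))   # load immediate
            else:
                out.append((op, dst, a, b))
        return out

    def run_middle_end(ir, passes):
        for p in passes:     # each pass rewrites IR into equivalent IR
            ir = p(ir)
        return ir

    ir = [('+', 't1', 3, 4), ('+', 'z', 't1', 0)]
    print(run_middle_end(ir, [fold_constants]))
    # [('li', 't1', 7, None), ('+', 'z', 't1', 0)]

A real optimizer differs mainly in scale: dozens of passes, each preserving observable behavior. (A second pass could simplify z = t1 + 0 using the algebraic identities discussed below.)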

Kinds of Optimizations

Optimizations are characterized by which transformation is applied over what scope. Typical scopes are:
- peephole: look at adjacent instructions
- local: look at a straight-line sequence of statements
- global (intraprocedural): look at an entire procedure
- whole program (interprocedural): look across procedures
Larger scope => better optimization, but more cost and complexity.

Peephole Optimization

After target code generation, look at adjacent instructions (a "peephole" on the code stream) and try to replace adjacent instructions with something faster. Example:

    movl %eax, 12(%ebp)
    movl 12(%ebp), %ebx
=>
    movl %eax, 12(%ebp)
    movl %eax, %ebx

Algebraic Simplification

Constant folding and strength reduction:

    z = 3 + 4;
    z = x + 0;
    z = x * 1;
    z = x * 2;
    z = x * 8;
    z = x / 8;
    double x, y, z;  z = (x + y) - y;   // not safe to simplify in floating point

Can be done by the peephole optimizer, or by the code generator.

Local Optimizations

Analysis and optimizations within a basic block: a straight-line sequence of statements with no control flow into or out of the middle of the sequence. Better than peephole; not too hard to implement; machine-independent, if done on intermediate code.

Local Constant Propagation

If a variable is assigned a constant value, replace downstream uses of the variable with the constant. This can enable more constant folding. Example:

    final int count = 10;
    x = count * 5;
    y = x ^ 3;

Unoptimized intermediate code:

    t1 = 10;
    t2 = 5;
    t3 = t1 * t2;
    x = t3;
    t4 = x;
    t5 = 3;
    t6 = exp(t4, t5);
    y = t6;

Local Dead Assignment (Store) Elimination

If the l.h.s. of an assignment is never referenced again before being overwritten, the assignment can be deleted. Example:

    final int count = 10;
    x = count * 5;
    y = x ^ 3;
    x = 7;

Intermediate code after constant propagation:

    t1 = 10;
    t2 = 5;
    t3 = 50;
    x = 50;
    t4 = 50;
    t5 = 3;
    t6 = 125000;
    y = 125000;
    x = 7;

The temporaries and the first assignment to x are now dead and can be deleted. Primary use: clean-up after previous optimizations!
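The two local optimizations above compose naturally. Here is a hedged Python sketch (the quad format is invented for this writeup) of a single forward walk over a basic block that both propagates and folds constants, reproducing the t3 = 50 and x = 50 rewrites shown above:

    def const_prop_block(block):
        consts = {}                       # var -> known constant value
        out = []
        for dst, op, a, b in block:
            a = consts.get(a, a)          # propagate known constants into operands
            b = consts.get(b, b)
            if op == 'mov' and isinstance(a, int):
                consts[dst] = a           # dst now holds a known constant
                out.append((dst, 'mov', a, None))
            elif op in ('+', '*') and isinstance(a, int) and isinstance(b, int):
                val = a + b if op == '+' else a * b
                consts[dst] = val         # fold the operation at compile time
                out.append((dst, 'mov', val, None))
            else:
                consts.pop(dst, None)     # dst is no longer a known constant
                out.append((dst, op, a, b))
        return out

    block = [('t1', 'mov', 10, None), ('t2', 'mov', 5, None),
             ('t3', '*', 't1', 't2'), ('x', 'mov', 't3', None)]
    print(const_prop_block(block))
    # [('t1','mov',10,None), ('t2','mov',5,None),
    #  ('t3','mov',50,None), ('x','mov',50,None)]

Dead assignment elimination is then a backward walk over the result, deleting any assignment whose destination is not read before being overwritten (or before the block ends, for temporaries).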

Local Common Subexpression Elimination (a.k.a. Redundancy Elimination)

Avoid repeating the same calculation. CSE of repeated loads is redundant load elimination. Keep track of available expressions. Source:

    a[i] + b[i]

Unoptimized intermediate code:

    t1 = *(fp + ioffset);
    t2 = t1 * 4;
    t3 = fp + t2;
    t4 = *(t3 + aoffset);
    t5 = *(fp + ioffset);
    t6 = t5 * 4;
    t7 = fp + t6;
    t8 = *(t7 + boffset);
    t9 = t4 + t8;

(t5, t6, and t7 recompute the values already in t1, t2, and t3.)

Redundancy Elimination Implementation

An expression x+y is redundant if and only if, along every path from the procedure's entry, it has been evaluated and its constituent subexpressions (x & y) have not been re-defined since. If the compiler can prove that an expression is redundant:
- It can preserve the results of earlier evaluations
- It can replace the current evaluation with a reference
Two pieces to the problem:
- Proving that x+y is redundant
- Rewriting the code to eliminate the redundant evaluation
One technique for accomplishing both is called value numbering.

Value Numbering (an old idea)

The key notion (Balke 1968 or Ershov 1954): assign an identifying number, V(n), to each expression, so that V(x+y) = V(j) iff x+y and j always have the same value. Use hashing over the value numbers to make it efficient, and use these numbers to improve the code:
- Replace redundant expressions
- Simplify algebraic identities
- Discover constant-valued expressions; fold & propagate them
This technique was invented for low-level, linear IRs; equivalent methods exist for trees (build a DAG).

Local Value Numbering: The Algorithm

For each operation o = <operator, o1, o2> in the block:
1. Get value numbers for the operands o1 and o2 from a hash lookup
2. Hash <operator, VN(o1), VN(o2)> to get a value number for o
3. If o already had a value number, replace o with a reference
4. If o1 & o2 are constant, evaluate o and replace it with a load immediate
If hashing behaves, the algorithm runs in linear time. Handling algebraic identities: a case statement on the operator type, with the special cases handled within each operator.

Local Value Numbering Example

Original code:
    a <- x + y
    b <- x + y
    a <- 17
    c <- x + y
With value numbers (written here as superscripts, a^3 etc.):
    a^3 <- x^1 + y^2
    b^3 <- x^1 + y^2
    a^4 <- 17
    c^3 <- x^1 + y^2
Rewritten:
    a^3 <- x^1 + y^2
    b^3 <- a^3
    a^4 <- 17
    c^3 <- a^3   (oops! a no longer holds x + y)
Two redundancies: eliminate the statements that recompute x + y; coalesce the results? Options: use c^3 <- b^3; save a^3 in t^3; rename around it.

Local Value Numbering (continued)

Renaming: give each value a unique name; this makes the redundancy clear.
Original code:
    a0 <- x0 + y0
    b0 <- x0 + y0
    a1 <- 17
    c0 <- x0 + y0
With value numbers:
    a0^3 <- x0^1 + y0^2
    b0^3 <- x0^1 + y0^2
    a1^4 <- 17
    c0^3 <- x0^1 + y0^2
Rewritten:
    a0^3 <- x0^1 + y0^2
    b0^3 <- a0^3
    a1^4 <- 17
    c0^3 <- a0^3
The notation, while complex, has a clear meaning. Result: a0^3 is still available, and the rewriting just works.
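To make the algorithm concrete, here is a small Python sketch of local value numbering over quads. The representation and helper names are invented for this writeup, and instead of renaming it uses the "reuse another name that still holds the value" option described above:

    import itertools

    def local_value_numbering(block):
        fresh = itertools.count()      # source of new value numbers
        vn = {}                        # operand (name or literal) -> VN
        expr = {}                      # (op, VN(o1), VN(o2)) -> VN
        holder = {}                    # VN -> a name that currently holds it
        out = []

        def number(x):
            if x not in vn:
                vn[x] = next(fresh)
            return vn[x]

        for dst, op, a, b in block:
            key = (op, number(a), number(b))
            if key in expr:                    # redundant computation
                v = expr[key]
                out.append((dst, 'mov', holder[v], None))
            else:
                v = expr[key] = next(fresh)
                out.append((dst, op, a, b))
            vn[dst] = v                        # dst now names value v
            holder[v] = dst
        return out

    block = [('a', '+', 'x', 'y'),
             ('b', '+', 'x', 'y'),
             ('a', 'mov', 17, None),
             ('c', '+', 'x', 'y')]
    print(local_value_numbering(block))
    # [('a','+','x','y'), ('b','mov','a',None),
    #  ('a','mov',17,None), ('c','mov','b',None)]

Because holder is updated on every assignment, c correctly copies from b here; but if b had also been overwritten before the last statement, the holder entry would be stale, which is exactly the "(oops!)" above. SSA-style renaming removes that fragility, and the constant-folding and algebraic-identity extensions slot into the loop as extra cases before hashing.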

Simple Extensions to Value Numbering

Constant folding:
- Add a bit that records when a value is constant
- Evaluate constant values at compile time
- Replace the operation with a load immediate or an immediate operand
- No stronger local algorithm exists
Algebraic identities (over values, not names):
- Must check (many) special cases
- Replace the result with the input VN
- Build a decision tree on the operation
- Identities: x <- y, x+0, x-0, x*1, x/1, x-x, x*0, x/x, x ∨ 0, x ∧ 0xFF…FF, max(x,MAXINT), min(x,MININT), max(x,x), min(y,y), and so on

Intraprocedural (Global) Optimizations

Enlarge the scope of analysis to an entire procedure:
- more opportunities for optimization
- have to deal with branches, merges, and loops
Constant propagation, common subexpression elimination, etc. can be done at the global level, along with new things, e.g. loop optimizations. Optimizing compilers usually work at this level.

Two data structures are commonly used to help analyze the procedure body.
Control flow graph (CFG): captures the flow of control
- nodes are IL statements, or whole basic blocks
- edges represent control flow
- a node with multiple successors = branch/switch
- a node with multiple predecessors = merge
- a cycle in the graph = loop
Data flow graph (DFG): captures the flow of data. A common form is def/use chains:
- nodes are def(inition)s and uses
- an edge runs from a def to each use it reaches
- a def can reach multiple uses
- a use can have multiple reaching defs

Control-flow Graph

Models the transfer of control in the procedure. Nodes are basic blocks (which can be represented with quads or any other linear representation); edges represent control flow. Example:

        a <- 2
        b <- 5
      if (x = y)
       /       \
    a <- 3   b <- 4
       \       /
      c <- a * b

Data-flow Graph (Def/Use Chains)

Models the transfer of data in the procedure. Nodes are definitions and uses; edges represent data flow. In the example above, both defs of a (a <- 2 and a <- 3) reach the use of a in c <- a * b, and both defs of b (b <- 5 and b <- 4) reach the use of b there.

Analysis and Transformation

Each optimization is made up of some number of analyses followed by a transformation.
Analyze the CFG and/or DFG by propagating info forward or backward along the edges (the edges are called program points):
- merges in the graph require combining info
- loops in the graph require iterative approximation
Perform improving transformations based on the info computed; you have to wait until any iterative approximation has converged. The analysis must be conservative/safe/sound so that transformations preserve program behavior.
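To make the CFG side concrete, here is a hedged Python sketch of the standard leader-based construction (the instruction encoding with 'label', 'targets', and 'conditional' keys is invented for illustration): find the block leaders, cut the code at each leader, then add branch and fall-through edges.

    def build_cfg(code):
        # 1. Leaders: the first instruction, every branch target, and
        #    every instruction following a branch.
        leaders = {0}
        label_at = {ins['label']: i for i, ins in enumerate(code) if 'label' in ins}
        for i, ins in enumerate(code):
            if ins.get('targets'):
                leaders.update(label_at[t] for t in ins['targets'])
                if i + 1 < len(code):
                    leaders.add(i + 1)
        # 2. Basic blocks run from one leader up to the next.
        starts = sorted(leaders)
        blocks = {s: code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])}
        # 3. Edges: explicit targets, plus fall-through where control continues.
        edges = []
        for s, blk in blocks.items():
            last = blk[-1]
            for t in last.get('targets', []):
                edges.append((s, label_at[t]))
            if (not last.get('targets') or last.get('conditional')) \
                    and s + len(blk) < len(code):
                edges.append((s, s + len(blk)))
        return blocks, edges

    # The diamond from the example above: a branch and a merge.
    code = [{'op': 'a = 2; b = 5'},
            {'op': 'if (x = y)', 'targets': ['L1'], 'conditional': True},
            {'op': 'a = 3'},
            {'op': 'goto L2', 'targets': ['L2']},
            {'op': 'b = 4', 'label': 'L1'},
            {'op': 'c = a * b', 'label': 'L2'}]
    print(build_cfg(code)[1])   # [(0, 4), (0, 2), (2, 5), (4, 5)]

The merge at c = a * b is exactly where the def/use chains fan in: both defs of a and both defs of b reach its uses.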

Data-flow Analysis

Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values.
- It almost always involves building a graph: the problems are trivial on a basic block, global problems use the control-flow graph (or a derivative), and whole-program problems use the call graph (or a derivative)
- It is usually formulated as a set of simultaneous equations, with sets attached to nodes and edges and a lattice (or semilattice) to describe the values
- The desired result is usually the meet-over-all-paths solution: What is true on every path from the entry? Can this happen on any path from the entry? These questions relate to the safety of optimization.

Data-flow Analysis Limitations

1. Precision: "up to symbolic execution"; all paths are assumed to be taken
2. Solution: we cannot afford to compute the MOP solution; there is a large class of problems for which MOP = MFP = LFP, but not all problems of interest are in this class
3. Arrays: treated naively in classical analysis; the whole array is represented with a single fact
4. Pointers: difficult (and expensive) to analyze, and the imprecision rapidly adds up
Good news: if we ask the right questions, simple problems can carry us pretty far. Summary: for scalar values, we can quickly solve simple problems.

Data-flow Example (Partial): Live Variables

Equations:

    LiveOut(b) = ∪ LiveIn(i)  for i ∈ Succ(b)
    LiveIn(b)  = LiveUse(b) ∪ (LiveOut(b) − Def(b))

(CFG figure: b1 defines a using b and branches to b2 and b3; b2 defines b; b3 defines c; they merge at b4, which computes d = a + b.)

    Block   Def   LiveUse   LiveIn   LiveOut
    b1      {a}   {b}       {b}      {a,b}
    b2      {b}   {}        {a}      {a,b}
    b3      {c}   {}        {a,b}    {a,b}
    b4      {d}   {a,b}     {a,b}    {}

Example: Constant Propagation, Folding

Can use either the CFG or the DFG. CFG analysis info: a table mapping each variable in scope to one of
- a particular constant,
- NonConstant, or
- Undefined.
Transformation, at each instruction:
- if the instruction references a variable that the table maps to a constant, replace it with that constant (constant propagation)
- if the r.h.s. expression involves only constants and has no side-effects, perform the operation at compile time and replace the r.h.s. with the constant result (constant folding)
For the best analysis, do constant folding as part of the analysis, to learn all constants in one pass.

Example program:

    x = 3;
    y = x * x;
    v = y - 2;
    if (y > 10) {
        x = 5;
        y = y + 1;
    } else {
        x = 6;
        y = x + 4;
    }
    w = y / v;
    if (v > 0) {
        z = w * w;
        x = x - z;
        y = y - 1;
    }
    System.out.println(x);

Analysis of Loops

How to analyze a loop?

    i = 0; x = 10; y = 0;
    while (...) {
        // what's true here?
        i = i + 1;
        y = 30;
    }
    // what's true here? (what are the possible values of x, i, and y?)

A safe but imprecise approach: forget everything when we enter or exit a loop. A precise but unsafe approach: keep everything when we enter or exit a loop. Can we do better?
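Returning to the live-variables table above: it can be reproduced mechanically. Below is a hedged Python sketch of round-robin iteration on the two liveness equations; note that the CFG shape (b1 branches to b2 and b3, which merge at b4) is inferred from the table rather than stated explicitly.

    DEF     = {'b1': {'a'}, 'b2': {'b'}, 'b3': {'c'}, 'b4': {'d'}}
    LIVEUSE = {'b1': {'b'}, 'b2': set(), 'b3': set(), 'b4': {'a', 'b'}}
    SUCC    = {'b1': ['b2', 'b3'], 'b2': ['b4'], 'b3': ['b4'], 'b4': []}

    livein  = {b: set() for b in DEF}
    liveout = {b: set() for b in DEF}
    changed = True
    while changed:                      # iterate until a fixed point
        changed = False
        for b in DEF:
            out_ = set().union(*(livein[s] for s in SUCC[b])) if SUCC[b] else set()
            in_  = LIVEUSE[b] | (out_ - DEF[b])
            if (out_, in_) != (liveout[b], livein[b]):
                liveout[b], livein[b] = out_, in_
                changed = True

    print(livein)    # b1: {b}, b2: {a}, b3: {a,b}, b4: {a,b}
    print(liveout)   # b1: {a,b}, b2: {a,b}, b3: {a,b}, b4: {}

Because merging is set union and the sets only grow, termination is guaranteed: each set is bounded by the finite number of variables.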

Loop Terminology

(Figure: a loop in the CFG, labeling the entry edge into the preheader, the loop head, the loop body, the back edge from the tail to the head, and the exit edge.)

Optimistic Iterative Analysis

1. Assume the info at the loop head is the same as the info at loop entry
2. Analyze the loop body, computing the info at the back edge
3. Merge the infos at the loop back edge and the loop entry
4. Test whether the merged info is the same as the original assumption:
   a) if so, then we're done
   b) if not, replace the previous assumption with the merged info, and go to step 2

Example:

    i = 0; x = 10; y = 0;
    while (...) {
        // what's true here?
        i = i + 1;
        y = 30;
    }
    // what's true here?

Why does optimistic iterative analysis work? Why are the results always conservative? Because if the algorithm stops, then:
- the loop head info is at least as conservative as both the loop entry info and the loop back edge info
- the analysis within the loop body is conservative, given the assumption that the loop head info is conservative
Why does the algorithm terminate? It might not! But it does if there are only a finite number of times we could merge values together without reaching the worst-case info (e.g. NotConstant).

Loop Optimization: Code Motion

Goal: move loop-invariant calculations out of loops. This can be done at the source level or at the intermediate-code level. Source:

    for (i = 0; i < 10; i = i+1) {
        a[i] = a[i] + b[j];
        z = z + 10000;
    }

Transformed source:

    t1 = b[j];
    t2 = 10000;
    for (i = 0; i < 10; i = i+1) {
        a[i] = a[i] + t1;
        z = z + t2;
    }

Loop Optimization: Induction Variable Elimination

A for-loop index is an induction variable: it is incremented each time around the loop, and offsets & pointers are calculated from it. If it is used only to index arrays, the loop can be rewritten with pointers:
- compute the initial offsets/pointers before the loop
- increment the offsets/pointers each time around the loop
- no expensive scaling inside the loop
Source:

    for (i = 0; i < 10; i = i+1) {
        a[i] = a[i] + x;
    }

Transformed source:

    for (p = &a[0]; p < &a[10]; p = p+4) {
        *p = *p + x;
    }

...then do loop-invariant code motion.
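Here is a hedged sketch of the loop-invariant test behind code motion, with statements reduced to (destination, set-of-names-used) pairs. This invented representation deliberately ignores control flow inside the loop, side effects, and aliasing, all of which a real implementation must handle:

    def hoist_invariants(preheader, body, loop_defs):
        """Repeatedly move statements none of whose operands are defined in the loop."""
        changed = True
        while changed:                    # hoisting can expose more invariants
            changed = False
            for stmt in list(body):
                dst, uses = stmt
                if not (uses & loop_defs):
                    body.remove(stmt)
                    preheader.append(stmt)
                    loop_defs.discard(dst)    # dst is no longer defined in the loop
                    changed = True
        return preheader, body

    # for (i = 0; i < 10; i = i+1) { a[i] = a[i] + b[j]; z = z + 10000; }
    body = [('t1', {'b', 'j'}),             # t1 = b[j]
            ('t2', set()),                  # t2 = 10000
            ('a[i]', {'a', 'i', 't1'}),     # a[i] = a[i] + t1
            ('z', {'z', 't2'})]             # z = z + t2
    pre, rest = hoist_invariants([], body, {'i', 't1', 't2', 'a[i]', 'z'})
    print(pre)    # t1 and t2 are hoisted, matching the transformation above

A production-quality pass must also verify that a hoisted statement dominates the loop exits (or that its target is dead there) and that it cannot fault, before moving it to the preheader.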

Interprocedural ("Whole Program") Optimizations

Expand the scope of analysis to procedures calling each other. Local & intraprocedural optimizations can then be done at this larger scope, along with new optimizations, e.g. inlining.

Inlining

Replace a procedure call with the body of the called procedure. Source:

    double pi = 3.1415927;
    double circle_area(double radius) {
        return pi * (radius * radius);
    }
    ...
    double r = 5.0;
    ...
    double a = circle_area(r);

After inlining:

    double r = 5.0;
    ...
    double a = pi * r * r;

(Then what?)

Summary

- Enlarging the scope of analysis yields better results; today, most optimizing compilers work at the intraprocedural (global) level
- Optimizations are organized as collections of passes, each rewriting the IL in place into a better version
- The presence of optimizations makes other parts of the compiler (e.g. intermediate and target code generation) easier to write

Additional Material

Another Example: Live Variable Analysis

We want the set of live variables at each point in the program (live = might be used later in the program). This supports dead assignment elimination and register allocation. Questions to answer:
- What info is computed for each program point?
- What is the requirement for this info to be conservative?
- How do we merge two infos conservatively?
- How do we analyze an assignment, e.g. x := y + z? Given the live variables before (or after?) it, what is computed after (or before?)
- What is live at procedure entry (or exit?)?
(CFG figure with statements such as: x := read(); y := x * 2; z := sin(y); z := z + 1; y := x + 10; return y)
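As a tiny concrete answer to the assignment question above (a sketch; infos are sets of variable names), the backward transfer function for x := y + z is:

    def live_before(live_after, defs, uses):
        # Everything live after the assignment, minus what it kills,
        # plus what it uses.
        return (live_after - defs) | uses

    # x := y + z, with {x, w} live after it:
    print(live_before({'x', 'w'}, defs={'x'}, uses={'y', 'z'}))
    # {'w', 'y', 'z'}  (x is killed; y and z become live; set order may vary)

At a merge, the conservative combination is the union of the successors' live sets, and what is live at procedure exit depends on the language (typically nothing, or only globals and the return value).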

Peephole Optimization of Jumps

Eliminate jumps to jumps; eliminate jumps after conditional branches. Here "adjacent instructions" means adjacent in control flow. Source code:

    if (a < b) {
        if (c < d) {
            // do nothing
        } else {
            stmt1;
        }
    } else {
        stmt2;
    }

(The IL for this nest contains conditional branches over unconditional jumps and jumps to jumps, which the peephole optimizer can collapse.)

Global Register Allocation

Try to allocate local variables to registers. If the lifetimes of two locals don't overlap, they can share a register. Try to allocate the most frequently used variables to registers first. Example:

    int foo(int n, int x) {
        int sum;
        int i;
        int t;
        sum = x;
        for (i = n; i > 0; i = i-1) {
            sum = sum + i;
        }
        t = sum * sum;
        return t;
    }

Handling Larger Scopes: Extended Basic Blocks

Initialize the table for block b_i with the table from its parent b_{i-1}; otherwise, it is complex. With single-assignment naming, we can use a scoped hash table. (Figure: an EBB tree in which b1's children are b2 and b3, b4 is a child of b2, and b5 and b6 are merge points that start their own EBBs.) The plan:
- Process b1, b2, b4
- Pop two levels
- Process b3 relative to b1
- Start clean with b5
- Start clean with b6
Using a scoped table makes processing the full tree of EBBs that share a common header efficient.

Handling Larger Scopes (continued)

To go further, we must deal with merge points. (Figure: a diamond CFG; b1 branches to b2 and b3, which merge at b4.)
- Our simple naming scheme falls apart in b4
- We need more powerful analysis tools
- The naming scheme becomes SSA
This requires global data-flow analysis: compile-time reasoning about the run-time flow of values.
1. Build a model of control flow
2. Pose questions as sets of simultaneous equations
3. Solve the equations
4. Use the solution to transform the code
Examples: LIVE, REACHES, AVAIL
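Returning to the scoped hash table mentioned under extended basic blocks, here is a hedged Python sketch (the interface is invented): entering a child block of the EBB tree pushes a scope, and popping discards only that block's facts, which is what lets b3 be processed relative to b1 once b2's and b4's scopes are popped.

    class ScopedTable:
        def __init__(self):
            self.scopes = [{}]
        def push(self):                 # enter a child block in the EBB tree
            self.scopes.append({})
        def pop(self):                  # leave the block: discard only its facts
            self.scopes.pop()
        def insert(self, key, val):
            self.scopes[-1][key] = val
        def lookup(self, key):          # innermost binding wins
            for s in reversed(self.scopes):
                if key in s:
                    return s[key]
            return None

    t = ScopedTable()
    t.insert(('+', 'x', 'y'), 't1')     # fact recorded while processing b1
    t.push()                            # descend into b2
    print(t.lookup(('+', 'x', 'y')))    # t1: b1's facts are visible in b2
    t.pop()                             # back out of b2; now process b3
    print(t.lookup(('+', 'x', 'y')))    # still t1

With single-assignment naming, popping a scope safely undoes everything the block added, so value numbering a whole EBB tree costs little more than one pass per path from the shared header.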