Compiler Construction SMD163, Lecture 11: Introduction to Optimization


Compiler Construction SMD163
Lecture 11: Introduction to Optimization
Viktor Leijon & Peter Jonsson, with slides by Johan Nordlander.
Contains material generously provided by Mark P. Jones.

Understanding Optimization

Goals of Optimization:
Optimization is about improving the target programs that are generated by a compiler. In most cases, optimization has two principal goals:
- Time: make programs run faster;
- Space: make programs use less memory.
Other applications of optimization include adapting code to particular architectures, etc.

Optimization is not Magic:
Optimizing compilers are just a tool; they work with what they are given. No optimizing compiler can make up for a poor choice of algorithms or data structures.

Optimization is not Absolute:
Some optimization techniques give clear wins in both time and space. Others may require us to trade one against the other. The priorities that we attach to different optimizations will depend on the application:
- In embedded systems, memory is often limited and speed may be less of an issue (e.g., a VCR).
- In high-performance systems, execution speed is critical (e.g., video games).

Optimization is not Free:
Some optimizations apply only in particular situations and require time-consuming analysis of the program. Use of such optimizations is only justified for programs that will be run often or for a long time. Optimization is appropriate in the construction and testing of products before wide release/distribution.

Optimization is a Misnomer:
A compiler writer's job will never be done; there are always opportunities for new optimizations. Proof: suppose there were an optimizing compiler Comp that could rewrite any program into the shortest possible equivalent program. Then Comp would compile every program that loops forever without producing output into the same easily recognizable loop:

    lab: jmp lab

This is impossible, because a program that could do this would be able to solve the halting problem.

Terminology:
Terms like "program optimization" and "optimizing compiler" are firmly established, even though we cannot build a truly optimizing compiler. We will focus instead on techniques for improving programs but, following common usage, we will still refer to each one as an optimization.

Optimization by Transformation:
Optimizations are program transformations. Most apply only in particular circumstances, and the effectiveness of an optimization depends on the program to which it is applied:
- Optimization of a particular language feature will have no impact on a program that does not use it.
- In some cases, an optimization may actually result in a slower or bigger program.

Correctness is Essential!
In all cases, it is essential that optimization preserves meaning: the optimized program must have the same meaning/behavior as the original program. Such transformations are often described as being safe. Better safe than sorry: if an optimization isn't safe, you shouldn't use it! A slow program that gives the right answer is better than a fast program that gives the wrong answer.

An Example:
Suppose that we have a loop in which the values of x and y do not change from one iteration to the next:

    for (int i = 0; i < N; i++) {
        ... x/y ...
    }

Then we can compute the quotient once, outside the loop:

    z = x/y;
    for (int i = 0; i < N; i++) {
        ... z ...
    }

This is an example of code motion.

Take Care! (part one)
If N = 0, then the optimized code will evaluate x/y once, but the original won't evaluate it at all! So this is only an optimization if we can be sure that the loop will be executed at least once.

Take Care! (part two)
Worse, if N = 0 the optimized program might raise a divide-by-zero exception where the original runs without fault. So the optimization is applicable only if:
- we know that y will never be zero; or
- we know that the loop will always be executed at least once, and that there are no other observable effects in the code between the new and old positions of x/y.
(A guarded version of this transformation is sketched in C below.)

Caveat Optimizer:
Optimizations can be quite subtle, and may require detailed analysis of a program to determine:
- whether they are applicable;
- whether they are safe; and
- whether they will actually improve the program.
In general, these questions are undecidable. But we always have the option not to use a given optimization!

Combining Optimizations:
The task of an optimizing compiler is to choose and apply an appropriate sequence of optimizations:

    P0 --o1--> P1 --o2--> P2 --o3--> ... --o7--> P7

Applying one optimization may create new opportunities to apply another, and the order in which they are applied can make a difference. If each intermediate step preserves behavior, then every program in the chain is equivalent to the original.

Controlling Optimization:
Compilers often allow programmers to control the use of optimization techniques:
- to set priorities (for example, time or space?);
- to select/deselect particular kinds of optimization;
- to limit the time or the number of steps spent in optimization.
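To make the two Take Care caveats concrete, here is a minimal C sketch of the x/y example (the function names are hypothetical, introduced only for illustration). The hoisted division sits behind a trip-count guard, so the transformed code divides exactly when the original would:

    /* Original: the invariant division is re-evaluated on every pass. */
    int sum_quotients0(int x, int y, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += x / y;
        return s;
    }

    /* Guarded code motion: safe even when n == 0 or y == 0, because
       x/y is only evaluated if the loop body would have evaluated it. */
    int sum_quotients1(int x, int y, int n) {
        int s = 0;
        if (n > 0) {              /* loop runs at least once */
            int z = x / y;        /* hoisted loop-invariant computation */
            for (int i = 0; i < n; i++)
                s += z;
        }
        return s;
    }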

Optimization by Hand:
Programmers sometimes have an opportunity to optimize their code by hand. Beware:
- It is difficult to get right, it can obscure the code, it can make the code less portable, and it can introduce bugs.
- It is hard to compete with a good optimizing compiler.
If performance is critical and you need to optimize by hand:
- Wait until the program is almost finished;
- Use a profiler to identify the hot spots.

A Catalogue of Common Optimization Techniques

An Overview:
Compiler writers and researchers have discovered many different techniques for optimization. For example, some of the most common optimization techniques try to remove:
- code that serves no useful purpose;
- code that repeats earlier computations;
- code that uses an inefficient method to calculate a value;
- code that carries an unnecessary overhead;
- etc.

Dead Code Elimination:
Unreachable code can be eliminated.
- Code that follows a return, break, continue, or goto, and has no label, can be eliminated:

      int f(int x) {
          return x+1;
          // any statements here are unreachable and can be removed
      }

- Code that appears in functions that are never called can also be eliminated. (This process is sometimes described as tree-shaking.)
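The unreachable-code rule above is easy to mechanize. A minimal sketch in C, assuming a toy Stmt record with just the fields the rule needs (this is illustrative, not the course's IR):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        bool is_return_or_goto;   /* unconditional transfer of control */
        bool has_label;           /* a possible jump target */
        bool keep;                /* set by the pass */
    } Stmt;

    /* Everything after a return/goto is dead until the next labelled
       statement, which may be reached by a jump from elsewhere. */
    void eliminate_unreachable(Stmt *s, size_t n) {
        bool reachable = true;    /* the first statement is reachable */
        for (size_t i = 0; i < n; i++) {
            if (s[i].has_label)
                reachable = true; /* control may re-enter here */
            s[i].keep = reachable;
            if (s[i].is_return_or_goto)
                reachable = false;
        }
    }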

Continued:
Code that has no effect can also be eliminated.
- An assignment to a variable that will not be used again can be eliminated:

      int f(int x) {                    int f(int x) {
          int temp = x*x;      ==>          return x+1;
          return x+1;                   }
      }

- An assignment to a variable that will be overwritten before it is used again can be eliminated:

      x = y; x = z;      ==>      x = z;

But be Careful!
Items that have an effect cannot be eliminated outright:
- If the discarded value comes from a function call, the call must be kept for its side effects:

      int f(int x) {                    int f(int x) {
          int temp = g(x);     ==>          g(x);
          return x+1;                       return x+1;
      }                                 }

- Likewise for overwritten variables:

      x = f1()+f2(); x = z;      ==>      f1(); f2(); x = z;

Common-Subexpression Elimination:
The results of computations can be shared rather than duplicated:

      x = a + b;                 x = a + b;
      y = a + b;        ==>      y = x;

      x = (a+b)*(a+b);  ==>      t = a + b;  x = t * t;

But beware side effects! The rewrite

      x = f(a)*f(a);    ==>      t = f(a);  x = t * t;

changes the number of calls to f, so it is safe only if f has no observable effects.

Copy and Constant Propagation:
An assignment of the form x = y; is called a copy instruction. Uses of x after a copy can be replaced by y, and uses of x after a constant assignment can be replaced by the constant:

      x = y;            x = y;           x = 0;            x = 0;
      z = x;    ==>     z = y;           z = x;    ==>     z = 0;

What have we gained? Nothing directly, but if we manage to remove all remaining references to x, then the first assignment will become dead code.

Constant-Folding:
Do not put off until run time what you can do at compile time:

      x = 2^8 - 1;      ==>      x = 255;

More generally: evaluate expressions that involve only constant values at compile time.

Strength-Reduction:
Replace expensive operations with cheaper, but equivalent, ones. For example:

      x**2     =  x * x
      2 * x    =  x + x
      x / 2    =  x >> 1
      x * 16   =  x << 4
      x % 128  =  x & 127

(The shift and mask forms agree with signed / and % only for non-negative values of x, so the compiler must be able to establish that.)

Algebraic Identities:
Standard algebraic identities can often be put to good use when only part of a program's data is known at compile time. For example:

      x + 0 = x        x * 1 = x
      x - 0 = x        x * 0 = 0
      x - x = 0        x / 1 = x

Who writes x+0 in source code?
How many programmers would actually write an expression like x + 0 in their source code? Are these optimizations of any use in practice? Yes!
- Examples like this can occur in handwritten programs when symbolic constants are used.
- Examples like this can show up in the code that we generate for other language constructs. For example, the address of a[0] is a + 4*0 = a + 0 = a.
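Constant folding is usually implemented as a bottom-up walk over expression trees. Here is a minimal C sketch under simplifying assumptions (a toy expression type with only integer constants, variables, +, and *; not the course's actual representation):

    #include <stdbool.h>

    typedef enum { CONST, VAR, ADD, MUL } Kind;

    typedef struct Expr {
        Kind kind;
        int value;               /* valid when kind == CONST */
        struct Expr *l, *r;      /* valid for ADD and MUL */
    } Expr;

    /* Returns true and writes the folded value to *out when the whole
       subtree is known at compile time; returns false otherwise. */
    bool fold(const Expr *e, int *out) {
        int a, b;
        switch (e->kind) {
        case CONST: *out = e->value; return true;
        case VAR:   return false;    /* unknown until run time */
        case ADD:
            if (fold(e->l, &a) && fold(e->r, &b)) { *out = a + b; return true; }
            return false;
        case MUL:
            if (fold(e->l, &a) && fold(e->r, &b)) { *out = a * b; return true; }
            return false;
        }
        return false;
    }

A real folder would also rewrite the folded subtree into a single CONST node so that later passes see the simplified form.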

Continued:
At first, some identities might not seem to have any significant applications. Examples include associativity:

      (x + y) + z = x + (y + z)

and commutativity:

      x + y = y + x

(But don't forget the role that commutativity played in our register allocator.)

Enabling Transformations:
Use of the associativity and commutativity laws can open up opportunities for other optimizations:

      a = b+c;              a = b + c;              a = b + c;
      t = (c+d)+b;   ==>    t = (b+c) + d;   ==>    t = a + d;

Another Example:
Suppose that d is an array of Date objects, each with four 4-byte fields:

      tag  day  month  year

Now suppose that we want to access d[3].month; then we need to load the value at address:

      (d + 16*3) + 8  =  d + (16*3 + 8)     (associativity)
                      =  d + 56             (constant folding)

Identities for Floating Point:
Floating-point numbers do not behave like real numbers, and floating-point operators do not satisfy many of the usual laws.
- Associativity? small + (big + (-big)) = small + 0 = small, but (small + big) + (-big) = big + (-big) = 0.
- Additive identities? Rewriting NaN + 0 as NaN is not behavior-preserving: the addition can raise an exception, while NaN alone does not.
- Multiplicative zeroes? inf * 0 = NaN, not 0.
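The associativity failure is easy to observe on any machine with IEEE-754 doubles; this small C program prints two different answers for the same mathematical sum:

    #include <stdio.h>

    int main(void) {
        double small = 1e-16, big = 1e16;
        /* small is absorbed when added to big, so the grouping matters: */
        printf("small + (big + -big) = %g\n", small + (big + -big)); /* 1e-16 */
        printf("(small + big) + -big = %g\n", (small + big) + -big); /* 0 */
        return 0;
    }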

What the Language Permits:
Even for integer values, algebraic identities can only be used within whatever scope the host language permits. For example, the definition of Fortran 77 states that the order of evaluation of expressions involving parentheses must respect the parentheses. (So Fortran 77 is a language in which parenthesized expressions would show up in the abstract syntax.)

Removing Overhead:
When we call a function, we spend some time constructing and then destroying the stack frame. When we execute a loop, we spend some time setting up and testing the loop variable. If the body of the function or of the loop is small, then the overhead associated with either of these will be quite large in proportion.

Function Inlining:
If we know the body of a function, we can use that body instead of a call. For example, suppose we know:

      int square(int x) { return x*x; }

Then we can rewrite a section of code:

      { ... square(square(x)) ... }

as:

      { int t1 = x*x; int t2 = t1*t1; ... t2 ... }

Cautionary Notes:
Inlining a large function many times can increase the size of the compiled program. Naive inlining by copying text can also increase the amount of work to be done. For example, changing:

      square(square(f(x)))

to:

      (f(x)*f(x)) * (f(x)*f(x))

will require 3 multiplications instead of 2, and will duplicate any side effect of f(x).

Loop Unrolling:
For example, we can rewrite a section of code:

      for (int i=0; i<3; i++) { f(i); }

as:

      f(0); f(1); f(2);

This typically produces more code, but the code is faster because we have eliminated the loop variable and all of the operations on it.

Peephole Optimization:
It is often possible to implement useful optimizations by looking for simple patterns in small sections of generated assembly code: looking at the code through a "peephole". To a large extent, the choice of peephole optimizations depends on the target machine.

Examples for IA-32:
- An instruction of the form addl $1,reg can be replaced by incl reg.
- An instruction of the form imull $2,reg can be replaced by addl reg,reg.
- In a sequence of instructions:

      movl reg,var
      movl var,reg

  the second instruction can be deleted, provided that it does not have a label.
- In a sequence of instructions:

      addl $4,%esp
      movl %ebp,%esp

  the first instruction is dead code (its result is immediately overwritten).

Summary:
We have looked at the basic goals and limits of optimization, and a catalogue of standard optimization techniques:
- dead-code elimination;
- common-subexpression elimination;
- constant and copy propagation;
- constant folding;
- strength reduction;
- algebraic identities;
- function inlining;
- loop unrolling;
- peephole optimization.
Next: putting these techniques into practice.
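A peephole optimizer can be as simple as a linear scan with pattern matching. The sketch below implements two of the IA-32 rules above over a toy instruction record (the Insn type and its fields are assumptions made for illustration, not the course's representation):

    #include <string.h>

    typedef struct {
        char op[8];        /* "addl", "movl", "incl", ... */
        char src[16];
        char dst[16];
        int  has_label;    /* a labelled instruction may be a jump target */
        int  dead;         /* set by the pass */
    } Insn;

    void peephole(Insn *code, int n) {
        for (int i = 0; i < n; i++) {
            /* addl $1,reg  =>  incl reg */
            if (strcmp(code[i].op, "addl") == 0 &&
                strcmp(code[i].src, "$1") == 0) {
                strcpy(code[i].op, "incl");
                code[i].src[0] = '\0';
            }
            /* movl reg,var ; movl var,reg  =>  delete the second movl,
               provided it does not carry a label */
            if (i + 1 < n &&
                strcmp(code[i].op, "movl") == 0 &&
                strcmp(code[i + 1].op, "movl") == 0 &&
                strcmp(code[i].src, code[i + 1].dst) == 0 &&
                strcmp(code[i].dst, code[i + 1].src) == 0 &&
                !code[i + 1].has_label)
                code[i + 1].dead = 1;
        }
    }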

Source High, Target Low:
Source code is often too high-level to reveal opportunities for optimization. For example, the assignment a[i] = x + y requires an (implicit) calculation of the address of the array element a[i].
Target code is often too low-level to reveal opportunities for optimization. For example, temporary values have already been assigned to registers, and it is more difficult to identify repeated or redundant computations.

Optimization using an Intermediate Language

Code Generation:
Here is part of a MiniJava program and the code that we might generate from it, with an independent sequence of instructions for each statement:

      class C {
          int[] makearray(int n) {
              int[] a;
              a = new int[200];
              a[3] = n;
              return a;
          }
      }

      C_makeArray:
          pushl %ebp            # set up the stack frame
          movl  %esp,%ebp
          subl  $4,%esp         # space for local a
          pushl $800            # 200 ints * 4 bytes
          call  _malloc
          addl  $4,%esp
          movl  %eax,-4(%ebp)   # a = new int[200]
          movl  8(%ebp),%eax
          movl  -4(%ebp),%ebx
          movl  %eax,12(%ebx)   # a[3] = n (offset 4*3)
          movl  -4(%ebp),%eax   # return a
          movl  %ebp,%esp
          popl  %ebp
          ret

Breaking Down Programs:
The constructs of a language, and the nature of the problem that is being solved, will lead a programmer to break down a program into a particular sequence of tasks. The output from a compiler is supposed to execute the same sequence of tasks. There is no reason, however, for it to use exactly the same breakdown as the programmer.

Code Generation:
The same program (the class C and assembly code shown above), broken down into basic blocks: the correspondence between source and target is less direct, but there are more opportunities for optimization.

Intermediate Code:
Intermediate code provides a compromise between the extremes of source and target code (working at the level of 386 assembly code is difficult!). It aims to be:
- sufficiently low-level to capture the single steps of a program;
- sufficiently high-level to avoid machine dependencies and premature code generation.
Intermediate codes are usually some kind of idealized machine code. As a useful side benefit, intermediate code provides a degree of portability (e.g., RTL in gcc, Java bytecodes and the JVM).

A High-Level View:

      flat input -> structure -> intermediate code -> flat output
                                 (optimizations applied here)

The Search Goes On:
UNCOL (UNiversal Computer Oriented Language) is an old (1958), and as yet unrealized, dream of compiler writers. It was hoped that a universal intermediate code could serve as a meeting point for all languages: one front end for each language, one back end for each machine, and smooth interoperability between languages. But no satisfactory UNCOL has been found yet; the range of programming languages is very diverse! There have been numerous attempts, some ongoing: ANDF, C, JVM, UVM, C--, ...

Three-Address Code: a simple UNCOL?
Three-address code is primarily a sequence of statements of the general form:

      x := y op z

where x, y, and z are names, constants, or compiler-generated temporaries. For example, evaluation of x+y*z becomes:

      t1 := y * z
      t2 := x + t1

An Example:
In three-address code, the statement a[i] = a[i] + a[j] can be expressed as:

      t1 := 4 * i
      t2 := a[t1]
      t3 := 4 * j
      t4 := a[t3]
      t5 := t2 + t4
      t6 := 4 * i
      a[t6] := t5

Note that the calculation of 4*i is duplicated!

Intermediate Code as Trees:
Three-address code is really just a linear representation of the syntax tree for the intermediate code:

        t2:+
       /    \
      x    t1:*
           /   \
          y     z

Quads:
In practice, three-address code is often represented or described by quadruples ("quads") of the form (op, dest, arg1, arg2). The statement a = b * (-c) + b * (-c) is represented by the following three-address code and quads:

      0) t1 := -c         (uminus, t1, c,  _)
      1) t2 := b * t1     (mult,   t2, b,  t1)
      2) t3 := -c         (uminus, t3, c,  _)
      3) t4 := b * t3     (mult,   t4, b,  t3)
      4) t5 := t2 + t4    (add,    t5, t2, t4)
      5) a  := t5         (save,   a,  t5, _)

Other forms of instruction (goto, unary operations, conditional branches, etc.) are also used in practice.
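As an illustration, the quads above could be encoded like this in C (the Op and Quad types are hypothetical, not the course's actual data structure):

    #include <stddef.h>

    typedef enum { OP_UMINUS, OP_MULT, OP_ADD, OP_SAVE } Op;

    typedef struct {
        Op op;
        const char *dest, *arg1, *arg2;  /* names/temporaries; NULL if unused */
    } Quad;

    /* a = b * (-c) + b * (-c), exactly as listed above */
    static const Quad code[] = {
        { OP_UMINUS, "t1", "c",  NULL },
        { OP_MULT,   "t2", "b",  "t1" },
        { OP_UMINUS, "t3", "c",  NULL },
        { OP_MULT,   "t4", "b",  "t3" },
        { OP_ADD,    "t5", "t2", "t4" },
        { OP_SAVE,   "a",  "t5", NULL },
    };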

Finding Basic Blocks:
A basic block is a sequence of statements where control enters only at the first statement and leaves only at the last. To partition a sequence of instructions into basic blocks, start by identifying the leaders:
- The first statement is a leader.
- Any statement that is the target of a call or a goto (conditional or unconditional) is a leader.
- Any statement that immediately follows a call or a goto is a leader.
For each leader, there is a basic block consisting of the leader and all following statements up to, but not including, the next leader or the end of the program.

An Example:
Consider the following implementation of quicksort, which sorts a global array a. We will focus on optimizing the partitioning code (everything up to the final swap):

      void quicksort(int m, int n) {    // sorts global array a
          if (m < n) {
              int i = m - 1, j = n, p = a[n], t;
              while (1) {
                  do { i = i + 1; } while (a[i] < p);
                  do { j = j - 1; } while (a[j] > p);
                  if (i >= j) break;
                  t = a[i]; a[i] = a[j]; a[j] = t;
              }
              t = a[i]; a[i] = a[n]; a[n] = t;
              quicksort(m, j);
              quicksort(i + 1, n);
          }
      }

In Three-Address Code:
Translation of the core of quicksort into three-address code:

      1)  i := m - 1            16) t7 := 4*i
      2)  j := n                17) t8 := 4*j
      3)  t1 := 4*n             18) t9 := a[t8]
      4)  p := a[t1]            19) a[t7] := t9
      5)  i := i + 1            20) t10 := 4*j
      6)  t2 := 4*i             21) a[t10] := t
      7)  t3 := a[t2]           22) goto 5
      8)  if t3<p goto 5        23) t11 := 4*i
      9)  j := j - 1            24) t := a[t11]
      10) t4 := 4*j             25) t12 := 4*i
      11) t5 := a[t4]           26) t13 := 4*n
      12) if t5>p goto 9        27) t14 := a[t13]
      13) if i>=j goto 23       28) a[t12] := t14
      14) t6 := 4*i             29) t15 := 4*n
      15) t := a[t6]            30) a[t15] := t

There are six basic blocks: 1-4, 5-8, 9-12, 13, 14-22, and 23-30.

Flow Graphs:
We add directed edges between basic blocks to capture control flow. One basic block is distinguished as the initial block; it contains the first statement to be executed. For any pair of basic blocks B1 and B2, there is an edge from B1 to B2 if B2 can directly follow B1 in some execution sequence.
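The three leader rules translate almost directly into code. A sketch, assuming a minimal Stmt record in which is_jump covers goto, conditional goto, and call:

    #include <stdbool.h>

    typedef struct {
        bool is_jump;      /* goto, conditional goto, or call */
        int  target;       /* index of the jump target, or -1 */
    } Stmt;

    void find_leaders(const Stmt *s, int n, bool *leader) {
        for (int i = 0; i < n; i++) leader[i] = false;
        if (n > 0) leader[0] = true;             /* rule 1: first statement */
        for (int i = 0; i < n; i++) {
            if (s[i].is_jump) {
                if (s[i].target >= 0)
                    leader[s[i].target] = true;  /* rule 2: jump targets */
                if (i + 1 < n)
                    leader[i + 1] = true;        /* rule 3: after a jump */
            }
        }
        /* Each basic block runs from a leader up to, but not including,
           the next leader (or the end of the program). */
    }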

The Flow Graph:
Numbering the six blocks B1..B6 in order, the edges are: B1 -> B2; B2 -> B2 (the backward conditional branch) and B2 -> B3; B3 -> B3 and B3 -> B4; B4 -> B5 and B4 -> B6; B5 -> B2. B1 is the initial block; B6 is the exit.

Program = Flow Graph + Blocks:

      B1:  i := m - 1            B5:  t6 := 4*i
           j := n                     t := a[t6]
           t1 := 4*n                  t7 := 4*i
           p := a[t1]                 t8 := 4*j
                                      t9 := a[t8]
      B2:  i := i + 1                 a[t7] := t9
           t2 := 4*i                  t10 := 4*j
           t3 := a[t2]                a[t10] := t
           if t3<p goto B2            goto B2

      B3:  j := j - 1            B6:  t11 := 4*i
           t4 := 4*j                  t := a[t11]
           t5 := a[t4]                t12 := 4*i
           if t5>p goto B3            t13 := 4*n
                                      t14 := a[t13]
      B4:  if i>=j goto B6            a[t12] := t14
                                      t15 := 4*n
                                      a[t15] := t

Optimizing Basic Block Code:
Our goal is to optimize programs expressed in this format by transforming their basic blocks. In general, there are two kinds of transformation that we might want to use:
- Local transformations, which can be applied to individual basic blocks, regardless of where they appear in the flow graph.
- Global transformations, which typically make use of information about larger sections of the flow graph.
Many transformations can be performed at both the local and the global level. Local transformations are usually performed first.

Common Subexpression Elimination:
Consider the basic block:

      1) a := b + c
      2) b := a - d
      3) c := b + c
      4) d := a - d

The second and fourth statements calculate the same value, so this block can be rewritten as:

      1) a := b + c
      2) b := a - d
      3) c := b + c
      4) d := b

Note that, even though the same expression, b + c, appears on the right of both the first and third lines, it does not have the same value in each case (b is redefined in between).
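Given the blocks, the flow-graph edges can be computed in one pass over the block table. A sketch under the same kind of simplified representation as before (field names are assumptions for illustration):

    typedef struct {
        int  jump_target;      /* block number jumped to, or -1 */
        int  is_unconditional; /* nonzero for a plain goto */
        int  succ[2], nsucc;   /* computed successor blocks */
    } Block;

    void add_edges(Block *b, int nblocks) {
        for (int i = 0; i < nblocks; i++) {
            b[i].nsucc = 0;
            if (b[i].jump_target >= 0)
                b[i].succ[b[i].nsucc++] = b[i].jump_target;
            if (!b[i].is_unconditional && i + 1 < nblocks)
                b[i].succ[b[i].nsucc++] = i + 1;   /* fall-through edge */
        }
    }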

A More Careful Analysis:
To see this more formally, annotate each variable on the right-hand side with the number of the step at which it was last defined (the columns track the current definition point of a, b, c, and d):

                                         a  b  c  d
         (initially)                     0  0  0  0
      1) a := b + c    a1 := b0 + c0     1  0  0  0
      2) b := a - d    b2 := a1 - d0     1  2  0  0
      3) c := b + c    c3 := b2 + c0     1  2  3  0
      4) d := a - d    d4 := a1 - d0     1  2  3  4

Statements 2 and 4 both compute a1 - d0, so they yield the same value; statements 1 and 3 compute b0 + c0 and b2 + c0, which are different. (This numbering idea is mechanized in the value-numbering sketch below.)

Local CSE on blocks B5 and B6:
Within each block, the recomputed expressions 4*i, 4*j, and 4*n become copies:

      BEFORE (B5):          AFTER (B5):        BEFORE (B6):          AFTER (B6):
      t6 := 4*i             t6 := 4*i          t11 := 4*i            t11 := 4*i
      t := a[t6]            t := a[t6]         t := a[t11]           t := a[t11]
      t7 := 4*i             t7 := t6           t12 := 4*i            t12 := t11
      t8 := 4*j             t8 := 4*j          t13 := 4*n            t13 := 4*n
      t9 := a[t8]           t9 := a[t8]        t14 := a[t13]         t14 := a[t13]
      a[t7] := t9           a[t7] := t9        a[t12] := t14         a[t12] := t14
      t10 := 4*j            t10 := t8          t15 := 4*n            t15 := t13
      a[t10] := t           a[t10] := t        a[t15] := t           a[t15] := t
      goto B2               goto B2

The other blocks are unchanged.
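The step-numbering trick above is the essence of local value numbering: two right-hand sides denote the same computation only if their operands have the same value numbers. A compact C sketch, with small fixed tables, linear search, and no overflow checks, for illustration only:

    #include <string.h>

    enum { MAXV = 64, MAXE = 64 };

    static char vname[MAXV][8]; static int vnum[MAXV]; static int nv;
    static int  eop[MAXE], el[MAXE], er[MAXE], eval[MAXE]; static int ne;
    static int  next_value;

    static int value_of(const char *x) {          /* current value number of x */
        for (int i = 0; i < nv; i++)
            if (strcmp(vname[i], x) == 0) return vnum[i];
        strcpy(vname[nv], x);
        vnum[nv] = next_value++;
        return vnum[nv++];
    }

    static void set_value(const char *x, int v) { /* record x := (value v) */
        for (int i = 0; i < nv; i++)
            if (strcmp(vname[i], x) == 0) { vnum[i] = v; return; }
        strcpy(vname[nv], x); vnum[nv++] = v;
    }

    /* Process "dst := a op b"; returns 1 if it recomputes a value that is
       already available, i.e., a candidate for local CSE. */
    int visit(const char *dst, int op, const char *a, const char *b) {
        int va = value_of(a), vb = value_of(b);
        for (int i = 0; i < ne; i++)
            if (eop[i] == op && el[i] == va && er[i] == vb) {
                set_value(dst, eval[i]);          /* reuse the earlier value */
                return 1;
            }
        eop[ne] = op; el[ne] = va; er[ne] = vb; eval[ne] = next_value;
        set_value(dst, next_value++); ne++;
        return 0;
    }

    /* The block from the slide:
       visit("a", '+', "b", "c");   -> 0 (new value)
       visit("b", '-', "a", "d");   -> 0
       visit("c", '+', "b", "c");   -> 0 (b was redefined, so no match)
       visit("d", '-', "a", "d");   -> 1 (same as step 2: rewrite d := b) */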

Copy Propagation for t7, t10, t12, t15:
Each of these temporaries is now defined by a copy instruction, so their uses can be replaced by the copied source (starting from the result of the previous step):

      B5:                                B6:
      t6 := 4*i                          t11 := 4*i
      t := a[t6]                         t := a[t11]
      t7 := t6                           t12 := t11
      t8 := 4*j                          t13 := 4*n
      t9 := a[t8]                        t14 := a[t13]
      a[t6] := t9     (was a[t7])        a[t11] := t14    (was a[t12])
      t10 := t8                          t15 := t13
      a[t8] := t      (was a[t10])       a[t13] := t      (was a[t15])
      goto B2

Dead Code Elimination for t7, t10, t12, t15:
The copies t7 := t6, t10 := t8, t12 := t11, and t15 := t13 are now dead and can be removed:

      B5:                                B6:
      t6 := 4*i                          t11 := 4*i
      t := a[t6]                         t := a[t11]
      t8 := 4*j                          t13 := 4*n
      t9 := a[t8]                        t14 := a[t13]
      a[t6] := t9                        a[t11] := t14
      a[t8] := t                         a[t13] := t
      goto B2

Global CSE:
Looking across blocks: on every path into B5 and B6, the value 4*i is already available in t2 (computed in B2), 4*j in t4 (B3), and 4*n in t1 (B1). So the remaining multiplications become copies:

      B5:                                B6:
      t6 := t2                           t11 := t2
      t := a[t6]                         t := a[t11]
      t8 := t4                           t13 := t1
      t9 := a[t8]                        t14 := a[t13]
      a[t6] := t9                        a[t11] := t14
      a[t8] := t                         a[t13] := t
      goto B2

Copy Propagation on t6, t8, t11, t13:
These are now copies too, so propagate them into the array references:

      B5:                                B6:
      t6 := t2                           t11 := t2
      t := a[t2]                         t := a[t2]
      t8 := t4                           t13 := t1
      t9 := a[t4]                        t14 := a[t1]
      a[t2] := t9                        a[t2] := t14
      a[t4] := t                         a[t1] := t
      goto B2

Dead Code Elimination on t6, t8, t11, t13:
The four copies are now dead and can be removed:

      B5:                                B6:
      t := a[t2]                         t := a[t2]
      t9 := a[t4]                        t14 := a[t1]
      a[t2] := t9                        a[t2] := t14
      a[t4] := t                         a[t1] := t
      goto B2

Global CSE (again):
The loads can be shared as well: a[t2] is already available in t3 (loaded in B2), a[t4] in t5 (B3), and a[t1] in p (B1):

      B5:                                B6:
      t := t3                            t := t3
      t9 := t5                           t14 := p
      a[t2] := t9                        a[t2] := t14
      a[t4] := t                         a[t1] := t
      goto B2

Copy Propagation on t, t9, t14:

      B5:                                B6:
      t := t3                            t := t3
      t9 := t5                           t14 := p
      a[t2] := t5                        a[t2] := p
      a[t4] := t3                        a[t1] := t3
      goto B2

Dead Code Elimination on t, t9, t14:
Finally, the remaining copies are dead and disappear:

      B5:                                B6:
      a[t2] := t5                        a[t2] := p
      a[t4] := t3                        a[t1] := t3
      goto B2

Summary:
A fairly complex process, but one described by simple steps that are sequenced and repeated until we get a good result; the body of B5, for instance, has shrunk from nine statements to three. What more can we do? Where should we focus our efforts?

Loop Optimization:
Loops are an obvious source of repeated computation, and good candidates for optimization. The most important loop optimizations are:
1) Code motion: move loop-invariant code outside the loop.
2) Strength reduction: replace expensive operations with cheaper ones.
3) Induction variables: recognize relationships between the values of variables on each pass through the body of a loop.

1) Code Motion:
If we can decrease the amount of code in the body of a loop, then we can also decrease the execution time of each iteration. Remove a duplicated computation from a loop and the savings are multiplied by the number of iterations! We need to find loop invariants: expressions that are guaranteed to have the same value on each pass through the loop.

Loop Invariants:
Start by looking for variables that do not change, then extend to expressions: if x and y are invariant, then so are x+y and x*y. Use algebraic identities to increase the size of loop-invariant expressions; for example, by rewriting (x-z)+y as (x+y)-z, we might be able to extract x+y as an invariant expression. But beware of aliasing! The expression a[i] is not necessarily invariant just because a and i are.

Moving the Code:
When we move code, we might need to introduce new temporaries. For example, the C/C++ loop:

      for (i = 0; i < n * n; i++)
          ... code which doesn't change n ...

can be rewritten as:

      t1 = n * n;
      for (i = 0; i < t1; i++)
          ... code which doesn't change n ...

because the expression n*n is a loop invariant.

2) Strength Reduction:
Instead of using expensive multiplications, we can obtain the same results using simple shifts when one of the operands is a constant power of 2. For example, in the quicksort code, we can replace t2 := 4*i and t4 := 4*j with the cheaper operations t2 := i << 2 and t4 := j << 2. The same idea can be used to simplify uses of other expensive operations, including / and %. Strength reduction is, however, a much more general technique, and more opportunities are revealed if we can identify some induction variables.

3) Induction Variables:
An induction variable takes values in some arithmetic progression as we step through a loop. Variables that are used to control a loop often show this behavior:

      for (int j = 10; j < 20; j++) ...

and it can happen to other variables in the loop too:

      for (int j = 10; j < 20; j++) {
          int jthodd = 2*j + 1;
          ...
      }

Induction Variables in Quicksort:
In the quicksort example, notice that every time i increases by 1, the value of t2 = 4*i increases by 4; and every time j decreases by 1, the value of t4 = 4*j decreases by 4. As a result, the address of a[t2] increases by 4, and the address of a[t4] decreases by 4.

Another Strength Reduction:
Instead of computing 4*i and 4*j with multiplications on every iteration, why not initialize t2 and t4 once at the beginning of the loop, and then increment t2 (and decrement t4) on each pass? This is another form of strength reduction.

Before (where we had got to previously):

      B1:  i := m - 1            B4:  if i>=j goto B6
           j := n
           t1 := 4*n             B5:  a[t2] := t5
           p := a[t1]                 a[t4] := t3
                                      goto B2
      B2:  i := i + 1
           t2 := 4*i             B6:  a[t2] := p
           t3 := a[t2]                a[t1] := t3
           if t3<p goto B2

      B3:  j := j - 1
           t4 := 4*j
           t5 := a[t4]
           if t5>p goto B3

After (induction variables t2 and t4, strength-reduced):

      B1:  i := m - 1            B4:  if i>=j goto B6
           j := n
           t1 := 4*n             B5:  a[t2] := t5
           p := a[t1]                 a[t4] := t3
           t2 := 4 * i                goto B2
           t4 := 4 * j
                                 B6:  a[t2] := p
      B2:  i := i + 1                 a[t1] := t3
           t2 := t2 + 4
           t3 := a[t2]
           if t3<p goto B2

      B3:  j := j - 1
           t4 := t4 - 4
           t5 := a[t4]
           if t5>p goto B3

More Strength Reduction:
Consider the following loop:

      for (int i = 0; i < n; i++) {
          t = t + (i*i);
      }

This code does n multiplications and n additions. Multiplies are expensive: can we eliminate them?

Identifying Induction Variables:
As i takes on the values 0, 1, 2, 3, 4, 5, 6, ..., i*i takes the values 0, 1, 4, 9, 16, 25, 36, ... Note that:

      (i+1)*(i+1) = i*i + 2*i + 1

so the difference from one value of i*i to the next is d = 2*i + 1, and we can rewrite the loop as:

      int u = 0;
      for (int i = 0; i < n; i++) {
          t = t + u;
          u = u + 2*i + 1;    // 2*i can be reduced to a shift
      }

Differences of Differences:
What is the difference from one value of d to the next? It is the constant (2*(i+1)+1) - (2*i+1) = 2. So we can rewrite the loop again as:

      int u = 0;
      int d = 1;
      for (int i = 0; i < n; i++) {
          t = t + u;
          u = u + d;
          d = d + 2;
      }

Longer, but each multiplication has been replaced by two additions!

Choosing an Intermediate Code:
Our intermediate code separates the process of accessing an array element into two stages: multiply the index by 4, then look up the value in the array. On a 386, a single (fast) instruction accomplishes the same thing:

      movl a(,%eax,4), %eax

Doesn't our choice of intermediate code force us to use the less efficient version?

      imull $4, %eax
      movl  a(%eax), %eax

On the other hand...
By including indexed addressing, perhaps we have chosen an intermediate code that is too high-level. Suppose instead that we used an intermediate code with only a simple load instruction, x := [y], in which every address calculation must be done explicitly.
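The transformation can be checked directly: this small C program verifies that the multiply-based loop and the fully strength-reduced form from the slide compute the same sum (the function names are made up for the test):

    #include <assert.h>

    int sum_squares_mul(int n) {          /* original: n multiplications */
        int t = 0;
        for (int i = 0; i < n; i++)
            t = t + (i*i);
        return t;
    }

    int sum_squares_add(int n) {          /* reduced: additions only */
        int t = 0, u = 0, d = 1;
        for (int i = 0; i < n; i++) {
            t = t + u;                    /* u tracks i*i */
            u = u + d;                    /* d tracks 2*i + 1 */
            d = d + 2;
        }
        return t;
    }

    int main(void) {
        for (int n = 0; n < 100; n++)
            assert(sum_squares_mul(n) == sum_squares_add(n));
        return 0;
    }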

Before:
This is where we would have got to previously if we had only load instructions (v := [u]) and store instructions ([v] := u); the address calculations u1, u2, u4 are now explicit:

      B1:  i := m - 1            B4:  if i>=j goto B6
           j := n
           t1 := 4*n             B5:  [u2] := t5
           u1 := a + t1               [u4] := t3
           p := [u1]                  goto B2
           t2 := 4*i
           t4 := 4*j             B6:  [u2] := p
                                      [u1] := t3
      B2:  i := i + 1
           t2 := t2 + 4
           u2 := a + t2
           t3 := [u2]
           if t3<p goto B2

      B3:  j := j - 1
           t4 := t4 - 4
           u4 := a + t4
           t5 := [u4]
           if t5>p goto B3

After:
Now we have used an optimization based on the observation that the addresses of a[i] and a[j] are themselves induction variables. (Note that u4 can be initialized to u1, because j = n initially, so a + 4*j = a + 4*n = u1.)

      B1:  i := m - 1            B4:  if i>=j goto B6
           j := n
           t1 := 4*n             B5:  [u2] := t5
           t2 := 4*i                  [u4] := t3
           u1 := a + t1               goto B2
           u2 := a + t2
           u4 := u1              B6:  [u2] := p
           p := [u1]                  [u1] := t3

      B2:  i := i + 1
           u2 := u2 + 4
           t3 := [u2]
           if t3<p goto B2

      B3:  j := j - 1
           u4 := u4 - 4
           t5 := [u4]
           if t5>p goto B3

The Right Level of Abstraction?
Designing an intermediate code is hard because it is difficult to get the right level of abstraction:
- too high-level, and you hide opportunities for optimization;
- too low-level, and it is harder to utilize advanced target instructions and addressing modes.
There is a spectrum of increasing abstraction: simple loads and stores only; indexed loads and stores; indexed and scaled loads and stores.

Instruction Selection:
Finding the best match between an intermediate program and the available machine instructions is the act of instruction selection. Together with register allocation, instruction selection constitutes the code generation phase proper in a compiler that uses an intermediate representation. If the intermediate language is low-level, instruction selection might involve mapping several intermediate operations onto a single target machine instruction. More on instruction selection next week.

Summary:
In this lecture we have seen:
- How optimization techniques can be used in practice.
- The role of intermediate code, illustrated by three-address code.
- Using flow graphs to capture the control flow of a particular program.
- Using basic blocks to coalesce the effects of multiple program statements into a single unit.