Compiler Construction SMD163, Lecture 11: Introduction to optimization


Viktor Leijon & Peter Jonsson, with slides by Johan Nordlander. Contains material generously provided by Mark P. Jones.

Goals of Optimization:
Optimization is about improving the target programs that are generated by a compiler. In most cases, optimization has two principal goals:
! Time: make programs run faster;
! Space: make programs use less memory.
Other applications of optimization include adapting code to particular architectures, etc.

Optimization is not Magic:
Optimizing compilers are just a tool; they work with what they're given. No optimizing compiler can make up for a poor choice of algorithms or data structures.

Optimization is not Absolute:
Some optimization techniques give clear wins in both time and space. Others may require us to trade one against the other. The priorities that we attach to different optimizations will depend on the application.
! In embedded systems, memory is often limited, and speed may be less of an issue (e.g., a VCR).
! In high performance systems, execution speed is critical (e.g., video games).

Optimization is not Free:
Some optimizations only apply in particular situations, and require time-consuming analysis of the program. Use of such optimizations is only justified for programs that will be run often or for a long time. Optimization is appropriate in the construction and testing of products before wide release/distribution.

Optimization is a Misnomer:
A compiler writer's job will never be done; there are always opportunities for new optimizations.
Proof: Suppose that there is an optimizing compiler Comp that can optimize any program to the shortest possible equivalent program. Then Comp will compile any program that goes into an infinite loop without any output to the following, easily recognizable loop:
    lab: jmp lab
This is impossible, because a program that did this would be able to solve the halting problem.

Terminology:
Terms like "program optimization" and "optimizing compiler" are firmly established. But we cannot build a truly optimizing compiler! We will focus instead on techniques for improving programs. But, following common usage, we will still refer to each one as an "optimization".

Optimization by Transformation:
Optimizations are program transformations. Most apply only in particular circumstances. The effectiveness of an optimization depends on the program in which it is applied.
! Optimization of a particular language feature will have no impact on a program that does not use it.
! In some cases, an optimization may actually result in a slower or bigger program.

Correctness is Essential!
In all cases, it is essential that optimization preserves meaning: the optimized program must have the same meaning/behavior as the original program. Such transformations are often described as being safe. Better safe than sorry: if an optimization isn't safe, you shouldn't use it! A slow program that gives the right answer is better than a program that gives the wrong answer quickly.

An Example:
Suppose that we have a loop:
    for (int i=0; i<N; i++) { ... x/y ... }
If we can ensure that the values of x and y do not change on each iteration, then we can optimize it to:
    z = x/y;
    for (int i=0; i<N; i++) { ... z ... }
This is an example of code motion.

Take Care! (part one)
If N=0, then the optimized code will evaluate x/y once, but the original won't evaluate it at all! So this is only an optimization if we can be sure that the loop will be executed at least once.
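The caveats above can be made concrete. Below is a minimal C sketch of the hoisting transformation; the guard `n > 0` is an addition of this sketch, ensuring the hoisted division is only evaluated when the original loop body would have run at least once:

```c
/* Original: divides x by y on every iteration. */
long sum_orig(int n, long x, long y) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += x / y;        /* loop-invariant: x and y never change */
    }
    return total;
}

/* After code motion: x/y hoisted out of the loop, guarded so that
 * n == 0 with a zero divisor cannot introduce a fault the original
 * program did not have. */
long sum_hoisted(int n, long x, long y) {
    long total = 0;
    if (n > 0) {               /* guard: only divide if the body runs */
        long z = x / y;        /* evaluated once instead of n times */
        for (int i = 0; i < n; i++) {
            total += z;
        }
    }
    return total;
}
```

With the guard in place, the two functions agree on all inputs, including n == 0 and y == 0.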

Take Care! (part two)
If N=0, then the optimized program might raise a divide-by-zero exception where the original runs without fault. So the optimization is applicable only if:
! We know that y will never be zero; or
! We know that the loop will always be executed at least once, and that there are no other observable effects in the code between the new and old positions of x/y.

Caveat Optimizer:
Optimizations can be quite subtle, and may require detailed analysis of a program to determine:
! whether they are applicable;
! whether they are safe; and
! whether they will actually improve the program.
In general, these questions are undecidable. But we always have the option not to use a given optimization!

Combining Optimizations:
The task of an optimizing compiler is to choose and apply an appropriate sequence of optimizations:
    P0 -o1-> P1 -o2-> P2 -o3-> P3 -o4-> P4 -o5-> P5 -o6-> P6 -o7-> P7
Applying one optimization may create new opportunities to apply another; the order in which they are applied can make a difference. If intermediate steps preserve behavior, each program will be equivalent to the original.

Controlling Optimization:
Compilers often allow programmers to control the use of optimization techniques:
! To set priorities (for example, time or space?);
! To select/deselect particular kinds of optimization;
! To limit the time or the number of steps that are spent in optimization.

Optimization by Hand:
Programmers sometimes have an opportunity to optimize their code by hand. Beware:
! It's difficult to get right, it can obscure the code, it can make it less portable, and it can introduce bugs.
! It's hard to compete with a good optimizing compiler.
If performance is critical and you need to optimize by hand:
! Wait until the program is almost finished;
! Use a profiler to identify the hot spots.

A Catalogue of Common Optimization Techniques, an Overview:
Compiler writers and researchers have discovered many different techniques for optimization. For example, some of the most common optimization techniques try to remove:
! Code that serves no useful purpose;
! Code that repeats earlier computations;
! Code that uses an inefficient method to calculate a value;
! Code that carries an unnecessary overhead;
! Etc.

Dead Code Elimination:
Unreachable code can be eliminated.
! Code that follows a return, break, continue, or goto and has no label can be eliminated:
    int f(int x) {                    int f(int x) {
        return x+1;                       return x+1;
        /* unreachable code */        }
    }
! Code that appears in functions that are never called can be eliminated. (This process is sometimes described as tree-shaking.)

Dead Code Elimination, Continued:
Code that has no effect can be eliminated.
! An assignment to a variable that will not be used again can be eliminated:
    int f(int x) {               int f(int x) {
        int temp = x*x;              return x+1;
        return x+1;              }
    }
! An assignment to a variable that will be overwritten before it is used again can be eliminated:
    x = y;            x = z;
    x = z;

But be Careful!
Items that have an effect cannot be eliminated:
! An assignment to a variable that will not be used again: the call must be kept for its effect.
    int f(int x) {               int f(int x) {
        int temp = g(x);             g(x);
        return x+1;                  return x+1;
    }                            }
! An assignment to a variable that will be overwritten before it is used again: the calls must be kept for their effects.
    x = f1()+f2();       f1();
    x = z;               f2();
                         x = z;

Common-Subexpression Elimination:
The results of computations can be shared rather than duplicated:
    x = a + b;           x = a + b;
    y = a + b;           y = x;

    x = (a+b)*(a+b);     t = a + b;
                         x = t * t;

    x = f(a)*f(a);       t = f(a);
                         x = t * t;      (but beware side effects!)

Copy and Constant Propagation:
An assignment of the form x = y; is called a copy instruction.
    x = y;               x = y;
    z = x;               z = y;

    x = 0;               x = 0;
    z = x;               z = 0;
What have we gained? Nothing directly, but if we manage to remove all references to x, then the first assignment will become dead code.

Constant-Folding:
Do not put off until run-time what you can do at compile-time.
    x = 2^8 - 1;       becomes       x = 255;
More generally: evaluate expressions involving only constant values at compile-time.

Strength-Reduction:
Replace expensive operations with cheaper, but equivalent ones. For example:
    x**2    =  x * x
    2 * x   =  x + x
    x / 2   =  x >> 1
    x * 16  =  x << 4
    x % 128 =  x & 127

Algebraic Identities:
Standard algebraic identities can often be put to good use when only part of a program's data is known at compile-time. For example:
    x + 0 = x        x - 0 = x        x - x = 0
    x * 1 = x        x * 0 = 0        x / 1 = x

Who writes x+0 in source code?
How many programmers would actually write an expression like x + 0 in their source code? Are these optimizations of any use in practice? Yes!
! Examples like this can occur in handwritten programs when symbolic constants are used.
! Examples like this can show up in the code that we generate for other language constructs. For example, the address of a[0] is: a + 4*0 = a + 0 = a.
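A quick sanity check of the strength reductions listed above, sketched in C. Note that the shift and mask forms agree with division and remainder only for non-negative operands, which is exactly the kind of side condition an optimizer must verify before applying them:

```c
/* Strength reduction: replace expensive ops by cheaper equivalents.
 * These equivalences hold for non-negative x; for negative signed x,
 * C's / and % round toward zero, so x >> 1 and x & 127 would disagree. */
int square(int x)   { return x * x; }     /* x**2   -> x * x   */
int twice(int x)    { return x + x; }     /* 2 * x  -> x + x   */
int half(int x)     { return x >> 1; }    /* x / 2  -> x >> 1  */
int times16(int x)  { return x << 4; }    /* x * 16 -> x << 4  */
int mod128(int x)   { return x & 127; }   /* x % 128 -> x & 127 */
```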

Algebraic Identities, Continued:
At first, some identities might not seem to have any significant applications. Examples include associativity:
    (x + y) + z = x + (y + z)
and commutativity:
    x + y = y + x
(But don't forget the role that commutativity played in our register allocator.)

Enabling Transformations:
Use of associativity and commutativity laws can open up opportunities for other optimizations:
    a = b+c;           a = b + c;          a = b + c;
    t = (c+d)+b;       t = (b+c) + d;      t = a + d;

Another Example:
Suppose that d is an array of Date objects:
    tag | day | month | year
Now suppose that we want to access d[3].month; then we need to load the value at address:
    (d + 16*3) + 8
        = d + (16*3 + 8)     (associativity)
        = d + 56             (constant folding)

Identities for Floating Point:
Floating point numbers do not behave like real numbers: floating point operators do not satisfy many useful laws.
! Associativity? small + (big + (-big)) = small + 0 = small, but (small + big) + (-big) = big + (-big) = 0.
! Additive identities? NaN + 0 raises an exception, NaN does not.
! Multiplicative zeroes? inf * 0 = NaN.
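The associativity failure is easy to reproduce. A small C sketch, using 1e-20 and 1e20 as hypothetical values for small and big:

```c
/* Floating point addition is not associative: reassociating a sum can
 * change its value, so a compiler must not apply the identity
 * (x + y) + z = x + (y + z) to floating point expressions. */
double sum_right(double small, double big) {
    return small + (big + (-big));   /* big cancels first: result is small */
}

double sum_left(double small, double big) {
    return (small + big) + (-big);   /* small is absorbed into big: result is 0 */
}
```

With small = 1e-20 and big = 1e20, the addend 1e-20 is far below the precision of a double near 1e20, so `small + big` rounds to exactly `big` and the two groupings give different answers.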

What the Language Permits:
Even for integer values, algebraic identities can only be used within whatever scope the host language permits. For example, the definition of Fortran 77 states that the order of evaluation of expressions involving parentheses must respect the parentheses. (So Fortran 77 is a language in which parenthesized expressions would show up in the abstract syntax.)

Removing Overhead:
When we call a function, we spend some time constructing and then destroying the stack frame. When we execute a loop, we spend some time setting up and testing the loop variable. If the body of the function, or the body of the loop, is small, then the overhead associated with either of these will be quite large in proportion.

Function Inlining:
If we know the body of a function, we can use that instead of a call. For example, suppose we know:
    int square(int x) { return x*x; }
Then we can rewrite a section of code:
    { ... square(square(x)) ... }
as:
    { int t1 = x*x; int t2 = t1*t1; ... t2 ... }

Cautionary Notes:
Inlining a large function many times can increase the size of the compiled program. Naive inlining by copying text can increase the amount of work to be done. For example, changing:
    square(square(f(x)))
to:
    (f(x)*f(x)) * (f(x)*f(x))
will require 3 multiplications, not 2, and will duplicate any side-effect of f(x).

Loop Unrolling:
For example, we can rewrite a section of code:
    for (int i=0; i<3; i++) { f(i); }
as:
    f(0); f(1); f(2);
This typically produces more code, but it is faster because we have eliminated a temporary variable, and all the operations on it.

Peephole Optimization:
It is often possible to implement useful optimizations by looking for simple patterns in small sections of generated assembly code: looking at the code through a "peephole". To a large extent, the choice of peephole optimizations depends on the target machine.

Examples for IA 32:
! An instruction of the form addl $1,reg can be replaced by incl reg.
! An instruction of the form imul $2,reg can be replaced by addl reg,reg.
! In a sequence of instructions:
      movl reg,var
      movl var,reg
  the second instruction can be deleted, provided that it does not have a label.
! In a sequence of instructions:
      addl $4,%esp
      movl %ebp,%esp
  the first instruction is dead code.

Summary:
We have looked at the basic goals and limits of optimization, and a catalogue of standard optimization techniques:
! Dead-code elimination;
! Common subexpression elimination;
! Constant and copy propagation;
! Constant folding;
! Strength reduction;
! Algebraic identities;
! Function inlining;
! Loop unrolling;
! Peephole optimization.
Next: putting these techniques into practice.

Source High, Target Low:
Source code is often too high-level to reveal opportunities for optimization.
! For example, the assignment a[i] = x + y requires an (implicit) calculation of the address of the array element a[i].
Target code is often too low-level to reveal opportunities for optimization.
! For example, temporary values have already been assigned to registers and it is more difficult to identify repeated or redundant computations.

Optimization using an Intermediate Language; Code Generation:
Here's part of a MiniJava program and the code that we might generate from it:
    class C {
        int[] makearray(int n) {
            int[] a;
            a = new int[200];
            a[3] = n;
            return a;
        }
    }

    C_makeArray:
        pushl %ebp
        movl %esp,%ebp
        subl $4,%esp
        pushl $800
        call _malloc
        addl $4,%esp
        movl %eax,-4(%ebp)
        movl 8(%ebp),%eax
        movl -4(%ebp),%ebx
        movl %eax,12(%ebx)
        movl -4(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        ret
There is an independent sequence of instructions for each statement.

Breaking Down Programs:
The constructs of a language, and the nature of the problem that is being solved, will lead a programmer to break down a program into a particular sequence of tasks. The output from a compiler is supposed to execute the same sequence of tasks. There is no reason, however, for it to use exactly the same break down as the programmer.

Code Generation with Basic Blocks:
The same MiniJava method and generated code as before, now broken down into basic blocks: the correspondence is less direct, but there are more opportunities for optimization.

Intermediate Code:
Intermediate code provides a compromise between the extremes of source and target code. It aims to be:
! Sufficiently low-level to capture single steps in the program.
! Sufficiently high-level to avoid machine dependencies, and premature code generation.
Intermediate codes are usually some kind of idealized machine code. As a useful side-benefit, intermediate code provides a degree of portability (e.g., RTL in gcc, Java bytecodes and the JVM). But working at the level of 386 assembly code is difficult!

A High-Level View:
    Flat input -> Structure -> Intermediate Code -> Optimizations -> Flat output

UNCOL: The Search Goes On:
UNCOL (UNiversal Computer Oriented Language) is an old (1958), and as yet unrealized, dream of compiler writers. It was hoped that a universal intermediate code could serve as a meeting point for all languages: one front end for each language, one back end for each machine, and smooth interoperability between languages. But no satisfactory UNCOL has been found yet; the range of programming languages is very diverse! There have been numerous attempts, some ongoing: ANDF, C, JVM, UVM, C--, ...

Three-Address Code: a simple UNCOL?
Three-address code is primarily a sequence of statements of the general form:
    x := y op z
where x, y and z are names, constants or compiler-generated temporaries. For example, evaluation of x+y*z becomes:
    t1 := y * z
    t2 := x + t1
In three-address code, the statement a[i] = a[i]+a[j] can be expressed as:
    t1 := 4 * i
    t2 := a[t1]
    t3 := 4 * j
    t4 := a[t3]
    t5 := t2 + t4
    t6 := 4 * i
    a[t6] := t5
The calculation of 4*i is duplicated!

Intermediate Code as Trees:
Three-address code is really just a linear representation of the syntax tree for the intermediate code:
        t2:+
       /    \
      x     t1:*
           /    \
          y      z
Other forms of instruction, for example goto, unary operations, conditional branches, etc., are also used in practice.

Quads:
In practice, three-address code is often represented or described by quadruples (quads), of the form (op, dest, arg1, arg2). The statement a = b * (-c) + b * (-c) is represented by the following three-address code:
    0) t1 := -c         (uminus, t1, c, _)
    1) t2 := b * t1     (mult,   t2, b, t1)
    2) t3 := -c         (uminus, t3, c, _)
    3) t4 := b * t3     (mult,   t4, b, t3)
    4) t5 := t2 + t4    (add,    t5, t2, t4)
    5) a := t5          (save,   a,  t5, _)
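To make the quad representation concrete, here is a small C sketch (the encoding with slot indices instead of names is invented for illustration) that stores the sequence above as (op, dest, arg1, arg2) tuples and evaluates it, computing a = b*(-c) + b*(-c):

```c
/* A toy quad machine with a small slot file.
 * Slot assignment: 0=a, 1=b, 2=c, 3=t1, 4=t2, 5=t3, 6=t4, 7=t5. */
enum op { UMINUS, MULT, ADD, SAVE };

struct quad { enum op op; int dest, arg1, arg2; };

int eval(const struct quad *code, int n, int *slot) {
    for (int i = 0; i < n; i++) {
        const struct quad q = code[i];
        switch (q.op) {
        case UMINUS: slot[q.dest] = -slot[q.arg1];               break;
        case MULT:   slot[q.dest] = slot[q.arg1] * slot[q.arg2]; break;
        case ADD:    slot[q.dest] = slot[q.arg1] + slot[q.arg2]; break;
        case SAVE:   slot[q.dest] = slot[q.arg1];                break;
        }
    }
    return slot[0];  /* the final value of a */
}

int demo(int b, int c) {
    struct quad code[] = {
        {UMINUS, 3, 2, 0},  /* t1 := -c      */
        {MULT,   4, 1, 3},  /* t2 := b * t1  */
        {UMINUS, 5, 2, 0},  /* t3 := -c      */
        {MULT,   6, 1, 5},  /* t4 := b * t3  */
        {ADD,    7, 4, 6},  /* t5 := t2 + t4 */
        {SAVE,   0, 7, 0},  /* a  := t5      */
    };
    int slot[8] = {0};
    slot[1] = b; slot[2] = c;
    return eval(code, 6, slot);
}
```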

Finding Basic Blocks:
A basic block is a sequence of statements where control only enters at the first statement and leaves at the last. To partition a sequence of instructions into basic blocks, start by identifying the leaders:
! The first statement is a leader.
! Any statement that is the target of a call or a goto (conditional or unconditional) is a leader.
! Any statement that immediately follows a call or a goto is a leader.
For each leader, there is a basic block consisting of the leader and all following statements up to, but not including, the next leader or the end of the program.

An Example:
Consider the following implementation of quicksort:
    void quicksort(int m, int n) {   // sorts global array a
        if (m<n) {
            int i = m - 1, j = n, p = a[n], t;
            while (1) {
                do { i = i + 1; } while (a[i] < p);
                do { j = j - 1; } while (a[j] > p);
                if (i >= j) break;
                t = a[i]; a[i] = a[j]; a[j] = t;
            }
            t = a[i]; a[i] = a[n]; a[n] = t;
            quicksort(m,j);
            quicksort(i+1,n);
        }
    }
We will focus on optimizing the highlighted section: the partitioning code, from the initialization of i, j, and p up to the final swap.

In Three-Address Code:
Translation of the core of quicksort into 3-address code:
     1) i := m - 1          16) t7 := 4*i
     2) j := n              17) t8 := 4*j
     3) t1 := 4*n           18) t9 := a[t8]
     4) p := a[t1]          19) a[t7] := t9
     5) i := i + 1          20) t10 := 4*j
     6) t2 := 4*i           21) a[t10] := t
     7) t3 := a[t2]         22) goto 5
     8) if t3<p goto 5      23) t11 := 4*i
     9) j := j - 1          24) t := a[t11]
    10) t4 := 4*j           25) t12 := 4*i
    11) t5 := a[t4]         26) t13 := 4*n
    12) if t5>p goto 9      27) t14 := a[t13]
    13) if i>=j goto 23     28) a[t12] := t14
    14) t6 := 4*i           29) t15 := 4*n
    15) t := a[t6]          30) a[t15] := t
There are six basic blocks: 1-4, 5-8, 9-12, 13, 14-22, 23-30.

Flow Graphs:
We add directed edges between basic blocks to capture control flow. One basic block is distinguished as initial; it contains the first statement to be executed. For any pair of basic blocks B1 and B2, there is an edge from B1 to B2 if B2 can directly follow B1 in some execution sequence.
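The leader rules above can be sketched as a small pass over an instruction array. The instruction encoding (a `target` field that is -1 for ordinary instructions) is invented for this sketch:

```c
/* Mark the leaders in a sequence of n instructions, following the three
 * rules above: the first instruction; any jump target; any instruction
 * immediately after a jump. Encoding (invented for this sketch):
 * code[i].target >= 0 means instruction i jumps to that index. */
struct instr { int target; };          /* -1: ordinary instruction */

void find_leaders(const struct instr *code, int n, int *leader) {
    for (int i = 0; i < n; i++) leader[i] = 0;
    if (n > 0) leader[0] = 1;                  /* rule 1: first statement  */
    for (int i = 0; i < n; i++) {
        if (code[i].target >= 0) {
            leader[code[i].target] = 1;        /* rule 2: jump target      */
            if (i + 1 < n) leader[i + 1] = 1;  /* rule 3: follows a jump   */
        }
    }
}
```

Each basic block then runs from one leader up to, but not including, the next leader (or the end of the program).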

The six basic blocks, laid out as a flow graph (B1 is the initial block):
    B1: i := m - 1
        j := n
        t1 := 4*n
        p := a[t1]

    B2: i := i + 1
        t2 := 4*i
        t3 := a[t2]
        if t3<p goto B2

    B3: j := j - 1
        t4 := 4*j
        t5 := a[t4]
        if t5>p goto B3

    B4: if i>=j goto B6

    B5: t6 := 4*i            B6: t11 := 4*i
        t := a[t6]               t := a[t11]
        t7 := 4*i                t12 := 4*i
        t8 := 4*j                t13 := 4*n
        t9 := a[t8]              t14 := a[t13]
        a[t7] := t9              a[t12] := t14
        t10 := 4*j               t15 := 4*n
        a[t10] := t              a[t15] := t
        goto B2
Edges: B1->B2, B2->B2, B2->B3, B3->B3, B3->B4, B4->B5, B4->B6, B5->B2.

Program = Flow Graph + Blocks:
A program can thus be represented as a flow graph whose nodes are the basic blocks.

Optimizing Basic Block Code:
Our goal is to optimize programs expressed in this format by transforming their basic blocks. In general, there are two kinds of transformation that we might want to use:
! Local transformations, which can be applied to individual basic blocks, regardless of where they appear in the flow graph.
! Global transformations, which typically make use of information about larger sections of the flow graph.
Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.

Common Subexpression Elimination:
Consider the basic block:
    1) a := b + c
    2) b := a - d
    3) c := b + c
    4) d := a - d
The second and fourth statements calculate the same value, so this block can be rewritten as:
    1) a := b + c
    2) b := a - d
    3) c := b + c
    4) d := b
Note that, even though the same expression, b + c, appears on the right of both the first and third lines, it does not have the same value in each case.
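The rewrite of this basic block can be checked directly. A small C sketch encoding the block before and after local CSE as functions over the four variables:

```c
/* The basic block before local CSE: statement 4 recomputes a - d. */
void block_before(int *a, int *b, int *c, int *d) {
    *a = *b + *c;    /* 1) a := b + c */
    *b = *a - *d;    /* 2) b := a - d */
    *c = *b + *c;    /* 3) c := b + c */
    *d = *a - *d;    /* 4) d := a - d  (same value as statement 2) */
}

/* After local CSE: statement 4 reuses the value already in b. */
void block_after(int *a, int *b, int *c, int *d) {
    *a = *b + *c;    /* 1) a := b + c */
    *b = *a - *d;    /* 2) b := a - d */
    *c = *b + *c;    /* 3) c := b + c */
    *d = *b;         /* 4) d := b     */
}
```

Both versions leave the variables in the same final state, because d is not modified between statements 2 and 4.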

A More Careful Analysis:
To see this more formally, annotate each variable on the right with the number of the step where it was last defined (0 for values defined before the block):
    1) a := b + c        a1 := b0 + c0
    2) b := a - d        b2 := a1 - d0
    3) c := b + c        c3 := b2 + c0
    4) d := a - d        d4 := a1 - d0
Steps 2 and 4 now have identical right-hand sides, while steps 1 and 3 do not.

Local CSE on blocks B5 and B6:
BEFORE:
    B5: t6 := 4*i            B6: t11 := 4*i
        t := a[t6]               t := a[t11]
        t7 := 4*i                t12 := 4*i
        t8 := 4*j                t13 := 4*n
        t9 := a[t8]              t14 := a[t13]
        a[t7] := t9              a[t12] := t14
        t10 := 4*j               t15 := 4*n
        a[t10] := t              a[t15] := t
        goto B2
AFTER:
    B5: t6 := 4*i            B6: t11 := 4*i
        t := a[t6]               t := a[t11]
        t7 := t6                 t12 := t11
        t8 := 4*j                t13 := 4*n
        t9 := a[t8]              t14 := a[t13]
        a[t7] := t9              a[t12] := t14
        t10 := t8                t15 := t13
        a[t10] := t              a[t15] := t
        goto B2
(Blocks B1, B2, B3, and B4 are unchanged.)

Copy Propagation for t7, t10, t12, t15:
Since t7 = t6, t10 = t8, t12 = t11, and t15 = t13, each use of the copies can be replaced:
BEFORE:                          AFTER:
    B5: ... a[t7] := t9              B5: ... a[t6] := t9
        ... a[t10] := t                  ... a[t8] := t
    B6: ... a[t12] := t14            B6: ... a[t11] := t14
        ... a[t15] := t                  ... a[t13] := t

Dead Code Elimination for t7, t10, t12, t15:
The copy instructions t7 := t6, t10 := t8, t12 := t11, and t15 := t13 are now dead and can be removed:
AFTER:
    B5: t6 := 4*i            B6: t11 := 4*i
        t := a[t6]               t := a[t11]
        t8 := 4*j                t13 := 4*n
        t9 := a[t8]              t14 := a[t13]
        a[t6] := t9              a[t11] := t14
        a[t8] := t               a[t13] := t
        goto B2

Global CSE:
The values 4*i, 4*j, and 4*n are already available in t2, t4, and t1, computed in blocks B2, B3, and B1:
AFTER:
    B5: t6 := t2             B6: t11 := t2
        t := a[t6]               t := a[t11]
        t8 := t4                 t13 := t1
        t9 := a[t8]              t14 := a[t13]
        a[t6] := t9              a[t11] := t14
        a[t8] := t               a[t13] := t
        goto B2

Copy Propagation on t6, t8, t11, t13:
AFTER:
    B5: t6 := t2             B6: t11 := t2
        t := a[t2]               t := a[t2]
        t8 := t4                 t13 := t1
        t9 := a[t4]              t14 := a[t1]
        a[t2] := t9              a[t2] := t14
        a[t4] := t               a[t1] := t
        goto B2

Dead Code Elimination on t6, t8, t11, t13:
The copies are now unused and can be removed:
AFTER:
    B5: t := a[t2]           B6: t := a[t2]
        t9 := a[t4]              t14 := a[t1]
        a[t2] := t9              a[t2] := t14
        a[t4] := t               a[t1] := t
        goto B2

Global CSE:
On entry to B5 and B6, the value of a[t2] is already available in t3 (loaded in B2), a[t4] in t5 (loaded in B3), and a[t1] in p (loaded in B1):
AFTER:
    B5: t := t3              B6: t := t3
        t9 := t5                 t14 := p
        a[t2] := t9              a[t2] := t14
        a[t4] := t               a[t1] := t
        goto B2

Copy Propagation on t, t9, t14:
AFTER:
    B5: t := t3              B6: t := t3
        t9 := t5                 t14 := p
        a[t2] := t5              a[t2] := p
        a[t4] := t3              a[t1] := t3
        goto B2

Dead Code Elimination on t, t9, t14:
AFTER:
    B5: a[t2] := t5          B6: a[t2] := p
        a[t4] := t3              a[t1] := t3
        goto B2

Summary:
A fairly complex process, but described using simple steps that are sequenced and repeated until we get a good result. What more can we do? Where should we focus our efforts?

Loop Optimization:
Loops are an obvious source of repeated computation, and good candidates for optimization. The most important loop optimizations are:
1) Code motion: move loop-invariant code outside the loop.
2) Strength reduction: replace expensive operations with cheaper ones.
3) Induction variables: recognize relationships between the values of variables on each pass through the body of a loop.

1) Code Motion:
If we can decrease the amount of code in the body of a loop, then we can also decrease the execution time for each iteration of the loop. Avoid duplicated computation and you duplicate the savings instead! We need to find loop invariants: expressions that are guaranteed to have the same value on each pass through the loop.

Loop Invariants:
Start by looking for variables that do not change, then extend to expressions. For example, if x and y are invariant, then so are x+y and x*y. Use algebraic identities to increase the size of the loop-invariant expressions. For example, by rewriting (x-z)+y as (x+y)-z, we might be able to extract x+y as an invariant expression. But beware of aliasing! The expression a[i] is not necessarily invariant just because a and i are.

Moving the Code:
When we move the code, we might need to introduce new temporaries. For example, the C/C++ loop:
    for (i = 0; i < n * n; i++)
        ... code which doesn't change n ...
can be rewritten as:
    t1 = n * n;
    for (i = 0; i < t1; i++)
        ... code which doesn't change n ...
because the expression n*n is a loop invariant.

2) Strength Reduction:
Instead of using expensive multiplications, we can obtain the same results using simple shifts, if one of the operands is a constant power of 2. For example, in the quicksort code, we can replace t2 := 4*i and t4 := 4*j with the cheaper operations t2 := i << 2 and t4 := j << 2. The same idea can be used to simplify uses of other expensive operations, including / and %. Strength reduction is, however, a much more general technique, and more opportunities can be revealed if we are able to identify some induction variables.

3) Induction Variables:
An induction variable takes values in some arithmetic progression as we step through a loop. Variables that are used to control a loop often show this behavior:
    for (int j = 10; j<20; j++) ...
And it can happen to other variables in the loop too:
    for (int j = 10; j<20; j++) { int jthodd = 2*j + 1; ... }

Induction Variables in Quicksort:
In the quicksort example, notice that:
! Every time that i increases by 1, the value of t2 = 4*i increases by 4.
! Every time that j decreases by 1, the value of t4 = 4*j decreases by 4.
As a result:
! Every time that i increases by 1, the address of a[t2] increases by 4.
! Every time that j decreases by 1, the address of a[t4] decreases by 4.

Another Strength Reduction:
Instead of using expensive multiplications, why not initialize t2 (and t4) at the beginning of the loop and then increment t2 (and decrement t4) on each iteration? This is another form of strength reduction, using the induction variables t2 and t4.

Before (where we had got to previously):
    B2: i := i + 1           B3: j := j - 1
        t2 := 4*i                t4 := 4*j
        t3 := a[t2]              t5 := a[t4]
        if t3<p goto B2          if t5>p goto B3
    B5: a[t2] := t5          B6: a[t2] := p
        a[t4] := t3              a[t1] := t3
        goto B2

After (t2 and t4 initialized once in B1, then updated incrementally):
    B1: ...                  B2: i := i + 1           B3: j := j - 1
        t2 := 4 * i              t2 := t2 + 4             t4 := t4 - 4
        t4 := 4 * j              t3 := a[t2]              t5 := a[t4]
                                 if t3<p goto B2          if t5>p goto B3
    B5: a[t2] := t5          B6: a[t2] := p
        a[t4] := t3              a[t1] := t3
        goto B2

More Strength Reduction:
Consider the following loop:
    for (int i = 0; i<n; i++) { t = t + (i*i); }
This code does N multiplications, and N additions. Multiplies are expensive: can we eliminate them?

Identifying Induction Variables:
As i takes on values 0, 1, 2, 3, 4, 5, 6, ..., so i*i takes values 0, 1, 4, 9, 16, 25, 36, ... Note that:
    (i+1)*(i+1) = i*i + 2*i + 1
so the difference from one value of i*i to the next is d = 2*i + 1. We could therefore rewrite the loop as:
    int u = 0;
    for (int i = 0; i<n; i++) {
        t = t + u;
        u = u + 2*i + 1;    // the multiply 2*i can be reduced to a shift
    }

Differences of Differences:
What is the difference from one value of d to the next? It is the constant 2. So now we can rewrite the loop again as:
    int u = 0;
    int d = 1;
    for (int i = 0; i<n; i++) {
        t = t + u;
        u = u + d;
        d = d + 2;
    }
Longer, but each multiply has been replaced by two adds!

Choosing an Intermediate Code:
Our intermediate code separates out the process of accessing an array element into two stages:
! Multiply the index by 4;
! Look up the value in the array.
On a 386, we can use a single instruction to accomplish the same thing (and it's fast):
    movl a(,%eax,4), %eax
Doesn't our choice of intermediate code force us to use the less efficient version?
    imull $4, %eax; movl a(%eax), %eax

On the other hand:
By including indexed addressing, perhaps we've chosen an intermediate code that is too high-level. Suppose that we used an intermediate code that has only a simple load instruction x := [y] and in which the address calculation must be done explicitly.
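The three versions of the summation loop above can be checked against each other. A small C sketch, one function per version:

```c
/* Sum of squares three ways: the original loop with a multiply, the
 * version driven by the first difference u = i*i (updated by 2*i + 1),
 * and the fully strength-reduced version using second differences. */
int sum_squares_mul(int n) {
    int t = 0;
    for (int i = 0; i < n; i++) t = t + (i * i);
    return t;
}

int sum_squares_diff(int n) {
    int t = 0, u = 0;                 /* invariant: u == i*i */
    for (int i = 0; i < n; i++) {
        t = t + u;
        u = u + 2 * i + 1;            /* (i+1)^2 = i^2 + 2i + 1 */
    }
    return t;
}

int sum_squares_diff2(int n) {
    int t = 0, u = 0, d = 1;          /* u == i*i, d == 2*i + 1 */
    for (int i = 0; i < n; i++) {
        t = t + u;
        u = u + d;                    /* two adds replace the multiply */
        d = d + 2;
    }
    return t;
}
```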

With Simple Loads Only:
This is where we would have got to previously if we'd had only load instructions, v := [u], and save instructions, [v] := u:

Before:
    B1: u1 := a + t1         B2: u2 := a + t2         B3: u4 := a + t4
        p := [u1]                t3 := [u2]               t5 := [u4]
                                 if t3<p goto B2          if t5>p goto B3
    B5: [u2] := t5           B6: [u2] := p
        [u4] := t3               [u1] := t3
        goto B2

After:
    B1: u1 := a + t1         B2: u2 := u2 + 4         B3: u4 := u4 - 4
        u2 := a + t2             t3 := [u2]               t5 := [u4]
        u4 := u1                 if t3<p goto B2          if t5>p goto B3
        p := [u1]
    B5: [u2] := t5           B6: [u2] := p
        [u4] := t3               [u1] := t3
        goto B2
Now we've used an optimization based on the observation that the addresses of a[i] and a[j] are induction variables!

The Right Level of Abstraction?
Designing an intermediate code is hard because it is difficult to get the right level of abstraction:
! Too high-level, and you will hide opportunities for optimization.
! Too low-level, and it will be harder to utilize advanced target instructions and addressing modes.
Increasing level of abstraction: simple loads and saves only; indexed loads and saves; indexed and scaled loads and saves.

Instruction Selection:
Finding the best match between an intermediate program and the available machine instructions is the act of instruction selection. Together with register allocation, instruction selection constitutes the proper code generation phase in a compiler that uses an intermediate representation. If the intermediate language is low level, instruction selection might involve mapping several intermediate operations onto a single target machine instruction. More on instruction selection next week.

Summary:
In this lecture we have seen:
! How optimization techniques can be used in practice.
! The role of intermediate code, illustrated by three-address code.
! Using flow graphs to capture the control flow in a particular program.
! Using basic blocks to coalesce the effects of multiple program statements into a single unit.


More information

Compiler Optimization Techniques

Compiler Optimization Techniques Compiler Optimization Techniques Department of Computer Science, Faculty of ICT February 5, 2014 Introduction Code optimisations usually involve the replacement (transformation) of code from one sequence

More information

Lecture Outline. Code Generation. Lecture 30. Example of a Stack Machine Program. Stack Machines

Lecture Outline. Code Generation. Lecture 30. Example of a Stack Machine Program. Stack Machines Lecture Outline Code Generation Lecture 30 (based on slides by R. Bodik) Stack machines The MIPS assembly language The x86 assembly language A simple source language Stack-machine implementation of the

More information

Tour of common optimizations

Tour of common optimizations Tour of common optimizations Simple example foo(z) { x := 3 + 6; y := x 5 return z * y } Simple example foo(z) { x := 3 + 6; y := x 5; return z * y } x:=9; Applying Constant Folding Simple example foo(z)

More information

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis Synthesis of output program (back-end) Intermediate Code Generation Optimization Before and after generating machine

More information

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation CSE 501: Compiler Construction Course outline Main focus: program analysis and transformation how to represent programs? how to analyze programs? what to analyze? how to transform programs? what transformations

More information

CIT Week13 Lecture

CIT Week13 Lecture CIT 3136 - Week13 Lecture Runtime Environments During execution, allocation must be maintained by the generated code that is compatible with the scope and lifetime rules of the language. Typically there

More information

What Do Compilers Do? How Can the Compiler Improve Performance? What Do We Mean By Optimization?

What Do Compilers Do? How Can the Compiler Improve Performance? What Do We Mean By Optimization? What Do Compilers Do? Lecture 1 Introduction I What would you get out of this course? II Structure of a Compiler III Optimization Example Reference: Muchnick 1.3-1.5 1. Translate one language into another

More information

CODE GENERATION Monday, May 31, 2010

CODE GENERATION Monday, May 31, 2010 CODE GENERATION memory management returned value actual parameters commonly placed in registers (when possible) optional control link optional access link saved machine status local data temporaries A.R.

More information

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013 1 COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013 Lecture Outline 1. Code optimization strategies 2. Peephole optimization 3. Common subexpression elimination

More information

Machine Programming 1: Introduction

Machine Programming 1: Introduction Machine Programming 1: Introduction CS61, Lecture 3 Prof. Stephen Chong September 8, 2011 Announcements (1/2) Assignment 1 due Tuesday Please fill in survey by 5pm today! Assignment 2 will be released

More information

CS 2505 Computer Organization I

CS 2505 Computer Organization I Instructions: Print your name in the space provided below. This examination is closed book and closed notes, aside from the permitted one-page formula sheet. No calculators or other computing devices may

More information

Assembly Language: Function Calls" Goals of this Lecture"

Assembly Language: Function Calls Goals of this Lecture Assembly Language: Function Calls" 1 Goals of this Lecture" Help you learn:" Function call problems:" Calling and returning" Passing parameters" Storing local variables" Handling registers without interference"

More information

Introduction to Code Optimization. Lecture 36: Local Optimization. Basic Blocks. Basic-Block Example

Introduction to Code Optimization. Lecture 36: Local Optimization. Basic Blocks. Basic-Block Example Lecture 36: Local Optimization [Adapted from notes by R. Bodik and G. Necula] Introduction to Code Optimization Code optimization is the usual term, but is grossly misnamed, since code produced by optimizers

More information

Optimization Prof. James L. Frankel Harvard University

Optimization Prof. James L. Frankel Harvard University Optimization Prof. James L. Frankel Harvard University Version of 4:24 PM 1-May-2018 Copyright 2018, 2016, 2015 James L. Frankel. All rights reserved. Reasons to Optimize Reduce execution time Reduce memory

More information

Assembly Language: Function Calls" Goals of this Lecture"

Assembly Language: Function Calls Goals of this Lecture Assembly Language: Function Calls" 1 Goals of this Lecture" Help you learn:" Function call problems:" Calling and urning" Passing parameters" Storing local variables" Handling registers without interference"

More information

Compiler construction 2009

Compiler construction 2009 Compiler construction 2009 Lecture 3 JVM and optimization. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int

More information

Compiler Design and Construction Optimization

Compiler Design and Construction Optimization Compiler Design and Construction Optimization Generating Code via Macro Expansion Macroexpand each IR tuple or subtree A := B+C; D := A * C; lw $t0, B, lw $t1, C, add $t2, $t0, $t1 sw $t2, A lw $t0, A

More information

Assembly Language: Function Calls

Assembly Language: Function Calls Assembly Language: Function Calls 1 Goals of this Lecture Help you learn: Function call problems: Calling and returning Passing parameters Storing local variables Handling registers without interference

More information

Running class Timing on Java HotSpot VM, 1

Running class Timing on Java HotSpot VM, 1 Compiler construction 2009 Lecture 3. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int s = r + 5; return

More information

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis Hall http://www.cse.buffalo.edu/faculty/alphonce/sp17/cse443/index.php https://piazza.com/class/iybn4ndqa1s3ei Announcements Grading survey

More information

Compiler construction in4303 lecture 9

Compiler construction in4303 lecture 9 Compiler construction in4303 lecture 9 Code generation Chapter 4.2.5, 4.2.7, 4.2.11 4.3 Overview Code generation for basic blocks instruction selection:[burs] register allocation: graph coloring instruction

More information

Goals of Program Optimization (1 of 2)

Goals of Program Optimization (1 of 2) Goals of Program Optimization (1 of 2) Goal: Improve program performance within some constraints Ask Three Key Questions for Every Optimization 1. Is it legal? 2. Is it profitable? 3. Is it compile-time

More information

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions? administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions? exam on Wednesday today s material not on the exam 1 Assembly Assembly is programming

More information

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code Traditional Three-pass Compiler Source Code Front End IR Middle End IR Back End Machine code Errors Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce

More information

CSE P 501 Exam 8/5/04 Sample Solution. 1. (10 points) Write a regular expression or regular expressions that generate the following sets of strings.

CSE P 501 Exam 8/5/04 Sample Solution. 1. (10 points) Write a regular expression or regular expressions that generate the following sets of strings. 1. (10 points) Write a regular ression or regular ressions that generate the following sets of strings. (a) (5 points) All strings containing a s, b s, and c s with at least one a and at least one b. [abc]*a[abc]*b[abc]*

More information

CS153: Compilers Lecture 15: Local Optimization

CS153: Compilers Lecture 15: Local Optimization CS153: Compilers Lecture 15: Local Optimization Stephen Chong https://www.seas.harvard.edu/courses/cs153 Announcements Project 4 out Due Thursday Oct 25 (2 days) Project 5 out Due Tuesday Nov 13 (21 days)

More information

Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction

Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction E I P CPU isters Condition Codes Addresses Data Instructions Memory Object Code Program Data OS Data Topics Assembly Programmer

More information

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University Assembly I: Basic Operations Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moving Data (1) Moving data: movl source, dest Move 4-byte ( long )

More information

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction Group B Assignment 8 Att (2) Perm(3) Oral(5) Total(10) Sign Title of Assignment: Code optimization using DAG. 8.1.1 Problem Definition: Code optimization using DAG. 8.1.2 Perquisite: Lex, Yacc, Compiler

More information

Code Optimization April 6, 2000

Code Optimization April 6, 2000 15-213 Code Optimization April 6, 2000 Topics Machine-Independent Optimizations Code motion Reduction in strength Common subexpression sharing Machine-Dependent Optimizations Pointer code Unrolling Enabling

More information

Lecture #16: Introduction to Runtime Organization. Last modified: Fri Mar 19 00:17: CS164: Lecture #16 1

Lecture #16: Introduction to Runtime Organization. Last modified: Fri Mar 19 00:17: CS164: Lecture #16 1 Lecture #16: Introduction to Runtime Organization Last modified: Fri Mar 19 00:17:19 2010 CS164: Lecture #16 1 Status Lexical analysis Produces tokens Detects & eliminates illegal tokens Parsing Produces

More information

CS 403 Compiler Construction Lecture 10 Code Optimization [Based on Chapter 8.5, 9.1 of Aho2]

CS 403 Compiler Construction Lecture 10 Code Optimization [Based on Chapter 8.5, 9.1 of Aho2] CS 403 Compiler Construction Lecture 10 Code Optimization [Based on Chapter 8.5, 9.1 of Aho2] 1 his Lecture 2 1 Remember: Phases of a Compiler his lecture: Code Optimization means floating point 3 What

More information

Office Hours: Mon/Wed 3:30-4:30 GDC Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5.

Office Hours: Mon/Wed 3:30-4:30 GDC Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5. CS380C Compilers Instructor: TA: lin@cs.utexas.edu Office Hours: Mon/Wed 3:30-4:30 GDC 5.512 Jia Chen jchen@cs.utexas.edu Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5.440 January 21, 2015 Introduction

More information

Lecture 11 Code Optimization I: Machine Independent Optimizations. Optimizing Compilers. Limitations of Optimizing Compilers

Lecture 11 Code Optimization I: Machine Independent Optimizations. Optimizing Compilers. Limitations of Optimizing Compilers Lecture 11 Code Optimization I: Machine Independent Optimizations Topics Machine-Independent Optimizations Code motion Reduction in strength Common subexpression sharing Tuning Identifying performance

More information

Code Optimization. What is code optimization?

Code Optimization. What is code optimization? Code Optimization Introduction What is code optimization Processor development Memory development Software design Algorithmic complexity What to optimize How much can we win 1 What is code optimization?

More information

The Hardware/Software Interface CSE351 Spring 2013

The Hardware/Software Interface CSE351 Spring 2013 The Hardware/Software Interface CSE351 Spring 2013 x86 Programming II 2 Today s Topics: control flow Condition codes Conditional and unconditional branches Loops 3 Conditionals and Control Flow A conditional

More information

Intermediate Code & Local Optimizations

Intermediate Code & Local Optimizations Lecture Outline Intermediate Code & Local Optimizations Intermediate code Local optimizations Compiler Design I (2011) 2 Code Generation Summary We have so far discussed Runtime organization Simple stack

More information

Programming Language Implementation

Programming Language Implementation A Practical Introduction to Programming Language Implementation 2014: Week 12 Optimisation College of Information Science and Engineering Ritsumeikan University 1 review of last week s topics why primitives

More information

Second Part of the Course

Second Part of the Course CSC 2400: Computer Systems Towards the Hardware 1 Second Part of the Course Toward the hardware High-level language (C) assembly language machine language (IA-32) 2 High-Level Language g Make programming

More information

Assembly Language: Function Calls. Goals of this Lecture. Function Call Problems

Assembly Language: Function Calls. Goals of this Lecture. Function Call Problems Assembly Language: Function Calls 1 Goals of this Lecture Help you learn: Function call problems: Calling and urning Passing parameters Storing local variables Handling registers without interference Returning

More information

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator CS 412/413 Introduction to Compilers and Translators Andrew Myers Cornell University Administration Design reports due Friday Current demo schedule on web page send mail with preferred times if you haven

More information

CS , Fall 2004 Exam 1

CS , Fall 2004 Exam 1 Andrew login ID: Full Name: CS 15-213, Fall 2004 Exam 1 Tuesday October 12, 2004 Instructions: Make sure that your exam is not missing any sheets, then write your full name and Andrew login ID on the front.

More information

Introduction to Optimization Local Value Numbering

Introduction to Optimization Local Value Numbering COMP 506 Rice University Spring 2018 Introduction to Optimization Local Value Numbering source IR IR target code Front End Optimizer Back End code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights

More information

Control flow graphs and loop optimizations. Thursday, October 24, 13

Control flow graphs and loop optimizations. Thursday, October 24, 13 Control flow graphs and loop optimizations Agenda Building control flow graphs Low level loop optimizations Code motion Strength reduction Unrolling High level loop optimizations Loop fusion Loop interchange

More information

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p text C program (p1.c p2.c) Compiler (gcc -S) text Asm

More information

Compiler Optimization

Compiler Optimization Compiler Optimization The compiler translates programs written in a high-level language to assembly language code Assembly language code is translated to object code by an assembler Object code modules

More information

8 Optimisation. 8.2 Machine-Independent Optimisation

8 Optimisation. 8.2 Machine-Independent Optimisation 8 8.2 Machine-Independent 8.2.4 Replacing binary with Unary operations Replacing binary with Unary operators The following operations do not produce redundant quads: a=c-d; 1. (-, c, d, t1) b=d-c; => 2.

More information

Computer Systems Architecture I. CSE 560M Lecture 3 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 3 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 3 Prof. Patrick Crowley Plan for Today Announcements Readings are extremely important! No class meeting next Monday Questions Commentaries A few remaining

More information

Code Optimization September 27, 2001

Code Optimization September 27, 2001 15-213 Code Optimization September 27, 2001 Topics Machine-Independent Optimizations Code motion Reduction in strength Common subexpression sharing Tuning Identifying performance bottlenecks Great Reality

More information

CSC D70: Compiler Optimization

CSC D70: Compiler Optimization CSC D70: Compiler Optimization Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry and Phillip Gibbons CSC D70: Compiler Optimization

More information

AS08-C++ and Assembly Calling and Returning. CS220 Logic Design AS08-C++ and Assembly. AS08-C++ and Assembly Calling Conventions

AS08-C++ and Assembly Calling and Returning. CS220 Logic Design AS08-C++ and Assembly. AS08-C++ and Assembly Calling Conventions CS220 Logic Design Outline Calling Conventions Multi-module Programs 1 Calling and Returning We have already seen how the call instruction is used to execute a subprogram. call pushes the address of the

More information

CHAPTER 3. Register allocation

CHAPTER 3. Register allocation CHAPTER 3 Register allocation In chapter 1 we simplified the generation of x86 assembly by placing all variables on the stack. We can improve the performance of the generated code considerably if we instead

More information

CS 2505 Computer Organization I Test 2. Do not start the test until instructed to do so! printed

CS 2505 Computer Organization I Test 2. Do not start the test until instructed to do so! printed Instructions: Print your name in the space provided below. This examination is closed book and closed notes, aside from the permitted one-page formula sheet. No calculators or other electronic devices

More information

Assembly I: Basic Operations. Jo, Heeseung

Assembly I: Basic Operations. Jo, Heeseung Assembly I: Basic Operations Jo, Heeseung Moving Data (1) Moving data: movl source, dest Move 4-byte ("long") word Lots of these in typical code Operand types Immediate: constant integer data - Like C

More information

CS 2505 Computer Organization I Test 2. Do not start the test until instructed to do so! printed

CS 2505 Computer Organization I Test 2. Do not start the test until instructed to do so! printed Instructions: Print your name in the space provided below. This examination is closed book and closed notes, aside from the permitted one-page formula sheet. No calculators or other electronic devices

More information

Great Reality #4. Code Optimization September 27, Optimizing Compilers. Limitations of Optimizing Compilers

Great Reality #4. Code Optimization September 27, Optimizing Compilers. Limitations of Optimizing Compilers 15-213 Code Optimization September 27, 2001 Topics Machine-Independent Optimizations Code motion Reduction in strength Common subexpression sharing Tuning Identifying performance bottlenecks Great Reality

More information

Introduction to Computer Systems. Exam 1. February 22, Model Solution fp

Introduction to Computer Systems. Exam 1. February 22, Model Solution fp 15-213 Introduction to Computer Systems Exam 1 February 22, 2005 Name: Andrew User ID: Recitation Section: Model Solution fp This is an open-book exam. Notes are permitted, but not computers. Write your

More information

Compilation /15a Lecture 7. Activation Records Noam Rinetzky

Compilation /15a Lecture 7. Activation Records Noam Rinetzky Compilation 0368-3133 2014/15a Lecture 7 Activation Records Noam Rinetzky 1 Code generation for procedure calls (+ a few words on the runtime system) 2 Code generation for procedure calls Compile time

More information

CS 137 Part 8. Merge Sort, Quick Sort, Binary Search. November 20th, 2017

CS 137 Part 8. Merge Sort, Quick Sort, Binary Search. November 20th, 2017 CS 137 Part 8 Merge Sort, Quick Sort, Binary Search November 20th, 2017 This Week We re going to see two more complicated sorting algorithms that will be our first introduction to O(n log n) sorting algorithms.

More information

Compiler Construction 2010/2011 Loop Optimizations

Compiler Construction 2010/2011 Loop Optimizations Compiler Construction 2010/2011 Loop Optimizations Peter Thiemann January 25, 2011 Outline 1 Loop Optimizations 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6

More information

ASSEMBLY I: BASIC OPERATIONS. Jo, Heeseung

ASSEMBLY I: BASIC OPERATIONS. Jo, Heeseung ASSEMBLY I: BASIC OPERATIONS Jo, Heeseung MOVING DATA (1) Moving data: movl source, dest Move 4-byte ("long") word Lots of these in typical code Operand types Immediate: constant integer data - Like C

More information

! Must optimize at multiple levels: ! How programs are compiled and executed , F 02

! Must optimize at multiple levels: ! How programs are compiled and executed , F 02 Code Optimization I: Machine Independent Optimizations Sept. 26, 2002 class10.ppt 15-213 The course that gives CMU its Zip! Topics! Machine-Independent Optimizations " Code motion " Reduction in strength

More information

Great Reality # The course that gives CMU its Zip! Code Optimization I: Machine Independent Optimizations Feb 11, 2003

Great Reality # The course that gives CMU its Zip! Code Optimization I: Machine Independent Optimizations Feb 11, 2003 Code Optimization I: Machine Independent Optimizations Feb 11, 2003 class10.ppt 15-213 The course that gives CMU its Zip! Topics Machine-Independent Optimizations Code motion Strength Reduction/Induction

More information

CS 31: Intro to Systems Functions and the Stack. Martin Gagne Swarthmore College February 23, 2016

CS 31: Intro to Systems Functions and the Stack. Martin Gagne Swarthmore College February 23, 2016 CS 31: Intro to Systems Functions and the Stack Martin Gagne Swarthmore College February 23, 2016 Reminders Late policy: you do not have to send me an email to inform me of a late submission before the

More information

CSC 2400: Computing Systems. X86 Assembly: Function Calls"

CSC 2400: Computing Systems. X86 Assembly: Function Calls CSC 24: Computing Systems X86 Assembly: Function Calls" 1 Lecture Goals! Challenges of supporting functions" Providing information for the called function" Function arguments and local variables" Allowing

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Introduction to Computer Systems. Exam 1. February 22, This is an open-book exam. Notes are permitted, but not computers.

Introduction to Computer Systems. Exam 1. February 22, This is an open-book exam. Notes are permitted, but not computers. 15-213 Introduction to Computer Systems Exam 1 February 22, 2005 Name: Andrew User ID: Recitation Section: This is an open-book exam. Notes are permitted, but not computers. Write your answer legibly in

More information

Principles of Compiler Design

Principles of Compiler Design Principles of Compiler Design Intermediate Representation Compiler Lexical Analysis Syntax Analysis Semantic Analysis Source Program Token stream Abstract Syntax tree Unambiguous Program representation

More information

CS429: Computer Organization and Architecture

CS429: Computer Organization and Architecture CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Science University of Texas at Austin Last updated: July 18, 2018 at 08:44 CS429 Slideset 20: 1 Performance: More than

More information

Compiler Construction 2016/2017 Loop Optimizations

Compiler Construction 2016/2017 Loop Optimizations Compiler Construction 2016/2017 Loop Optimizations Peter Thiemann January 16, 2017 Outline 1 Loops 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6 Loop Unrolling

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program and Code Improvement Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program Review Front end code Source code analysis Syntax tree Back end code Target code

More information

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu. Optimization ASU Textbook Chapter 9 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction For some compiler, the intermediate code is a pseudo code of a virtual machine.

More information

CS577 Modern Language Processors. Spring 2018 Lecture Optimization

CS577 Modern Language Processors. Spring 2018 Lecture Optimization CS577 Modern Language Processors Spring 2018 Lecture Optimization 1 GENERATING BETTER CODE What does a conventional compiler do to improve quality of generated code? Eliminate redundant computation Move

More information

ASSEMBLY II: CONTROL FLOW. Jo, Heeseung

ASSEMBLY II: CONTROL FLOW. Jo, Heeseung ASSEMBLY II: CONTROL FLOW Jo, Heeseung IA-32 PROCESSOR STATE Temporary data Location of runtime stack %eax %edx %ecx %ebx %esi %edi %esp %ebp General purpose registers Current stack top Current stack frame

More information

Systems I. Code Optimization I: Machine Independent Optimizations

Systems I. Code Optimization I: Machine Independent Optimizations Systems I Code Optimization I: Machine Independent Optimizations Topics Machine-Independent Optimizations Code motion Reduction in strength Common subexpression sharing Tuning Identifying performance bottlenecks

More information

CS241 Computer Organization Spring 2015 IA
