Introduction to optimizations. CS Compiler Design. Phases inside the compiler. Optimization. Introduction to Optimizations. V.

Similar documents
Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

Acknowledgement. CS Compiler Design. Semantic Processing. Alternatives for semantic processing. Intro to Semantic Analysis. V.

Runtime management. CS Compiler Design. The procedure abstraction. The procedure abstraction. Runtime management. V.

Academic Formalities. CS Modern Compilers: Theory and Practise. Images of the day. What, When and Why of Compilers

Challenges in the back end. CS Compiler Design. Basic blocks. Compiler analysis

Academic Formalities. CS Language Translators. Compilers A Sangam. What, When and Why of Compilers

Loop Transformations! Part II!

The compilation process is driven by the syntactic structure of the program as discovered by the parser

Register allocation. instruction selection. machine code. register allocation. errors

Alternatives for semantic processing

Introduction to Code Optimization. Lecture 36: Local Optimization. Basic Blocks. Basic-Block Example

CS 701. Class Meets. Instructor. Teaching Assistant. Key Dates. Charles N. Fischer. Fall Tuesdays & Thursdays, 11:00 12: Engineering Hall

CS 2461: Computer Architecture 1

CS553 Lecture Dynamic Optimizations 2

7. Optimization! Prof. O. Nierstrasz! Lecture notes by Marcus Denker!

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013

Office Hours: Mon/Wed 3:30-4:30 GDC Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5.

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Matrix Multiplication

Supercomputing in Plain English Part IV: Henry Neeman, Director

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

Tiling: A Data Locality Optimizing Algorithm

PERFORMANCE OPTIMISATION

CS 132 Compiler Construction

Machine-Independent Optimizations

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as

Compiler Design and Construction Optimization

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

Outline. Issues with the Memory System Loop Transformations Data Transformations Prefetching Alias Analysis

Introduction to Compilers

CS222: Cache Performance Improvement

Last class. CS Principles of Programming Languages. Introduction. Outline

Autotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT

Computer Science 160 Translation of Programming Languages

Module 13: INTRODUCTION TO COMPILERS FOR HIGH PERFORMANCE COMPUTERS Lecture 25: Supercomputing Applications. The Lecture Contains: Loop Unswitching

Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

Intermediate Code & Local Optimizations. Lecture 20

Multi-dimensional Arrays

Compiler Optimization

Lecture 9 Basic Parallelization

Lecture 9 Basic Parallelization

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

Simone Campanoni Loop transformations

Lecture 2. Memory locality optimizations Address space organization

Tiling: A Data Locality Optimizing Algorithm

Program Transformations for the Memory Hierarchy

Calvin Lin The University of Texas at Austin

Polyèdres et compilation

Control flow graphs and loop optimizations. Thursday, October 24, 13

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Administration. Prerequisites. Meeting times. CS 380C: Advanced Topics in Compilers

Interprocedural Analysis. Motivation. Interprocedural Analysis. Function Calls and Pointers

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

IR Optimization. May 15th, Tuesday, May 14, 13

Compilers. Optimization. Yannis Smaragdakis, U. Athens

Matrix Multiplication

211: Computer Architecture Summer 2016

Goals of Program Optimization (1 of 2)

Compiler Construction

Intermediate Code & Local Optimizations

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction

Optimization Prof. James L. Frankel Harvard University

Group B Assignment 9. Code generation using DAG. Title of Assignment: Problem Definition: Code generation using DAG / labeled tree.

Compiling for Performance on hp OpenVMS I64. Doug Gordon Original Presentation by Bill Noyce European Technical Update Days, 2005

Languages and Compiler Design II IR Code Optimization

Lecture Notes on Loop Optimizations

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Code Optimization. Code Optimization

Assignment statement optimizations

CSE P 501 Compilers. Loops Hal Perkins Spring UW CSE P 501 Spring 2018 U-1

Supercomputing and Science An Introduction to High Performance Computing

Compilers. Lecture 2 Overview. (original slides by Sam

Chapter 4: LR Parsing

Compilers and Interpreters

CSc 520 Principles of Programming Languages

Overview of a Compiler

Review. Topics. Lecture 3. Advanced Programming Topics. Review: How executable files are generated. Defining macros through compilation flags

Lecture 25: Register Allocation

IA-64 Compiler Technology

Tour of common optimizations

Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Other Forms of Intermediate Code. Local Optimizations. Lecture 34

Administrative. Other Forms of Intermediate Code. Local Optimizations. Lecture 34. Code Generation Summary. Why Intermediate Languages?

Arrays and Functions

6.189 IAP Lecture 11. Parallelizing Compilers. Prof. Saman Amarasinghe, MIT IAP 2007 MIT

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Code generation and optimization

Advanced Compiler Construction Theory And Practice

Chapter 10 Language Translation

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis

Transcription:

Introduction to optimizations CS3300 - Compiler Design Introduction to Optimizations V. Krishna Nandivada IIT Madras Copyright c 2018 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from hosking@cs.purdue.edu. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 2 / 30 Phases inside the compiler Optimization Front responsibilities: Recognize syntactically legal code; report errors. Recognize semantically legal code; report errors. Produce IR. Back responsibilities: Optimizations, code generation. Our target five out of seven phases. glance over optimizations att the graduate course, if interested. Goal: produce fast code What is optimality? Problems are often hard Many are intractable or even undecideable Many are NP-complete Which optimizations should be used? Many optimizations overlap or interact V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 3 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 4 / 30

Optimization Optimization - conflicting goals Definition: An optimization is a transformation that is expected to: improve the running time of a program or decrease its space requirements The point: improved code, not optimal code (forget optimum ) sometimes produces worse code range of speedup might be from 1.000001 to xxx It is undecidable whether in most cases, a particular optimization improves or (at least does not worsen) performance. Q: Can we not even say a simple transformation like algebraic simplification will always improve the code? Typical goals of optimization for the generated code: Qs: Speed Space Power Which one matters? deps on the target machine. Traditionally (hence the default behavior) compilers have targeted the speed of the generated code. Some times improving one goal improves another. Some times it does not. Example: loop unrolling, strength reduction (mult 5). V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 5 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 6 / 30 Choosing the optimizations Types of optimizations Some optimizations are more important than others. Optimizations that apply to loops impact register allocation instruction scheduling are essential for high performance. Choice of optimizations may dep on the input program: OO programs - inlining (why?) and leaf-routine optimizations. For recursive programs - tail call optimizations (replace some calls by jumps) For self-recursive programs - turn the recursive calls to loops. Classification of optimization (based on their scope) Local (within basic blocks) Intra-procedural Inter-procedural Classification based on their positioning: High level optimizations (use the program structure to optimize). Low level optimizations (work on medium/lower level IR) V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 7 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 8 / 30

Optimization classification (contd) A classical distinction Classification with respect to their depence on the target machine. Machine indepent applicable across broad range of machines move evaluation to a less frequently executed place specialize some general-purpose code remove redundant (unreachable, useless) code. create opportunities. Machine depent capitalize on machine-specific properties improve mapping from IR onto machine strength reduction. replace sequence of instructions with more powerful one (use exotic instructions) The distinction (Machine specific / indepent) is not always clear: replace multiply with shifts and adds V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 9 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 10 / 30 Optimization Types of program analysis Desirable properties of an optimizing compiler code at least as good as an assembler programmer stable, robust performance (predictability) architectural strengths fully exploited architectural weaknesses fully hidden broad, efficient support for language features instantaneous compiles Unfortunately, modern compilers often drop the ball Classification of analysis (based on their view) if (cond) { a =... b =... else { a =... c =... // Which of the variables may be assigned? -- {a,b,c // Which of the variables must be assigned? -- {a May analysis the analysis holds on at least one data flow path. Must analysis the analysis must hold on all data flow paths. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 11 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 12 / 30

Example optimization: constant propagation Goal: Find the constant expressions in the program. Replace all the constant expressions with their constant literals. foo (int b) { a[1] = 1; a[2] = 2; a[3] = 3; i = 1; if (b > 2) { j = 2; else { j = i + 1; k = a[j]; return k; Classification of analysis (contd) Classification of analysis (based on precision) Flow sensitive / insensitive. Insensitive - the analysis should hold at every program point; does not dep on the type of control flow. Sensitive - Each program point has its own analysis. if (c) { a = 2; b = a; c = 3; print (a, b, c); // constants? else { a = 3 b = a; c = 3; print (a, b, c); // constants? V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 13 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 14 / 30 Classification of analysis (contd) Copy propagation Context sensitive and insensitive a = foo(2); b = foo (3); Copy propagation: given an assignent x := y, replaces the later uses of x with y, provided that the intervening instructions do not change the value of either x or y c = bar (2); d = bar(2); print (a, b, c, d); // a, b, c, d constants? int foo(int x) { return x int bar(int x) { return x x V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 15 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 16 / 30

Copy propagation (effect) Alias Analysis A seemingly simple and weak optimization. Eliminates copy instructions. Helps identify dead code. Helps in register allocation (fewer live ranges) Assists in other optimizations. Can be done both localy (basic block level) or globaly (whole procedure). Alias analysis: problem of identifying storage locations that can be accessed by more than one way. Are variable a and b aliases? a and b refer to the same location? Modifying the contents of a, modifies the contents of b. Unlike in copy propagation, in alias analysis we only talk about memory references (and not scalar values). foo(){ int p; int n; p = &n; n = 4; print ("%d", p); p n 4 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 17 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 18 / 30 Alias analysis (contd) Alias analysis (contd) Granularity of analysis: Flow sensitive or insensitive. extern int q; foo( ) { int a, k; k = a + 5; f (a, &k); q = 13; k = a + 5; / Assignment is redundant? / / Expression is redundant? / Only if a) the assignment to q does not change k or a, b) the function call f, does not change k. bar ( ) { int a, b, e[], d, i; extern int q; q = &a; a = 2; b = q + 2; q = &b; for (i = 0; i < 100; i++) { e [i] = e [i] + a; q = i; d = q + a; V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 19 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 20 / 30

Loop unrolling (Example) Matrix-matrix multiply do i 1, n, 1 do j 1, n, 1 c(i,j) 0 do k 1, n, 1 c(i,j) c(i,j) + a(i,k) b(k,j) All the array elements are floating point values. 2n 3 flops, n 3 loop increments and branches each iteration does 2 loads and 2 flops This is the most overstudied example in the literature Example: loop unrolling Matrix-matrix multiply (assume 4-word cache line) do i 1, n, 1 do j 1, n, 1 c(i,j) 0 do k 1, n, 4 c(i,j) c(i,j) + a(i,k) b(k,j) c(i,j) c(i,j) + a(i,k+1) b(k+1,j) c(i,j) c(i,j) + a(i,k+2) b(k+2,j) c(i,j) c(i,j) + a(i,k+3) b(k+3,j) 2n 3 flops, n3 4 loop increments and branches each iteration does 8 loads and 8 flops memory traffic is better c(i,j) is reused a(i,k) reference are from cache b(k,j) is problematic (put it in a register) V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 21 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 22 / 30 Example: loop unrolling Example: loop unrolling Matrix-matrix multiply (to improve traffic on b) do j 1, n, 1 do i 1, n, 4 c(i,j) 0 do k 1, n, 4 c(i,j) c(i,j) + a(i,k) b(k,j) + a(i,k+1) b(k+1,j) + a(i,k+2) b(k+2,j) + a(i,k+3) b(k+3,j) c(i+1,j) c(i+1,j) + a(i+1,k) b(k,j) + a(i+1,k+1) b(k+1,j) + a(i+1,k+2) b(k+2,j) + a(i+1,k+3) b(k+3,j) c(i+2,j) c(i+2,j) + a(i+2,k) b(k,j) + a(i+2,k+1) b(k+1,j) + a(i+2,k+2) b(k+2,j) + a(i+2,k+3) b(k+3,j) c(i+3,j) c(i+3,j) + a(i+3,k) b(k,j) + a(i+3,k+1) b(k+1,j) + a(i+3,k+2) b(k+2,j) + a(i+3,k+3) b(k+3,j) What happened? interchanged i and j loops unrolled i loop fused inner loops 2n 3 flops, n3 16 loop increments and branches first assignment does 8 loads and 8 flops 2 nd through 4 th do 4 loads and 8 flops memory traffic is better c(i,j) is reused a(i,k) references are from cache b(k,j) is reused (register) (register) V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 23 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 24 / 30

Loop optimizations: factoring loop-invariants Example: loop invariants Loop invariants: expressions constant within loop body Goal: move the loop invariant computation to outside the loop. The loop indepent code executes only once, instead of many times the loop might. foreach i=1.. 100 do foreach j=1.. 100 do foreach k=1.. 100 do A[i,j,k] = i j k; 3 million index operations 2 million multiplications V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 25 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 26 / 30 Example: loop invariants (cont.) Instruction Scheduling Factoring the inner loop: foreach i=1.. 100 do foreach j=1.. 100 do t1 = &A[i][j]; t2 = i j ; foreach k=1.. 100 do t1[k] = t2 k; And the second loop: foreach i=1.. 100 do t3 = &A[i]; foreach j=1.. 100 do t1 = &t3[j]; t2 = i j ; foreach k=1.. 100 do t1[k] = t2 k; Instruction scheduling V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 27 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 28 / 30

Optimization - overview Back to first lecture Good compilers are crafted, not assembled consistent philosophy careful selection of transformations thorough application coordinate transformations and data structures attention to results (code, time, space) Compilers are engineered objects minimize running time of compiled code minimize compile time use reasonable compile-time space Thus, results are sometimes unexpected (serious problem) Front responsibilities: Recognize syntactically legal code; report errors. Recognize semantically legal code; report errors. Produce IR. Back responsibilities: Optimizations, code generation. Our target five out of seven phases. glance over optimizations att the graduate course, if interested. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 29 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2016 30 / 30