Fast Searches for Effective Optimization Phase Sequences

Fast Searches for Effective Optimization Phase Sequences

Prasad Kulkarni, Stephen Hines, Jason Hiser, David Whalley, Jack Davidson, Douglas Jones

Computer Science Department, Florida State University, Tallahassee, Florida
Computer Science Department, University of Virginia, Charlottesville, Virginia
Electrical and Computer Engineering Department, University of Illinois, Urbana, Illinois

Phase Ordering Problem

A single ordering of optimization phases will not always produce the best code:
- different applications
- different compilers
- different target machines

Example: register allocation and instruction selection.

Approaches to Addressing the Phase Ordering Problem

- A framework for formally specifying compiler optimizations.
- A single intermediate language representation with repeated applications of optimization phases.
- Exhaustive search?
- Our approach: intelligent search of the optimization space using a genetic algorithm.

Genetic Algorithm

A biased sampling search method that evolves solutions by merging parts of different solutions:

1. Create an initial population of optimization sequences.
2. Evaluate the fitness of each sequence in the population.
3. If the termination condition holds, output the best sequence found; otherwise, perform crossover and mutation to create a new generation and return to step 2.

Genetic Algorithm (cont.)

- Crossover: 20% of the sequences in each generation are replaced.
- Mutation: phases in each sequence are replaced with a low probability.
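One generation step can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the abstract phase names, the fitness interface, and the 5% mutation rate are assumptions (the slides state only 20% replacement per generation and a "low" mutation probability).

```python
import random

PHASES = "abcdefghijklmno"   # 15 candidate phases, abstractly named a..o

def random_sequence(length):
    return [random.choice(PHASES) for _ in range(length)]

def next_generation(population, fitness, replace_frac=0.20, mutate_prob=0.05):
    """One GA step: keep the fittest sequences, replace the worst ~20% with
    single-point crossovers of surviving parents, then mutate each phase
    position with low probability."""
    ranked = sorted(population, key=fitness, reverse=True)  # best first
    n_replace = max(1, int(len(ranked) * replace_frac))
    survivors = ranked[:-n_replace]
    children = []
    for _ in range(n_replace):
        p1, p2 = random.sample(survivors, 2)
        cut = random.randrange(1, len(p1))       # single-point crossover
        children.append(p1[:cut] + p2[cut:])
    new_pop = survivors + children
    for seq in new_pop:                          # low-probability mutation
        for i in range(len(seq)):
            if random.random() < mutate_prob:
                seq[i] = random.choice(PHASES)
    return new_pop
```

Evaluating `fitness` is where the real cost lies, since each evaluation compiles and runs the application; the techniques in the following slides aim to avoid those runs.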

Genetic Algorithm (cont.)

The genetic algorithm supplies candidate phases to the compiler, which translates each C source function into an assembly function; the best sequence found is reported for each function.

Experiments

- Performed on six MiBench benchmarks, which contained a total of 106 functions.
- Used 15 candidate optimization phases.
- Sequence length set to 1.25 times the number of successful batch phases.
- Population size set to 20.
- Performed 100 generations.
- Fitness value was 50% speed and 50% size.

Genetic Algorithm Results (results chart)

Our Earlier Work

Published in LCTES '03:
- complete compiler framework
- detailed description of the genetic algorithm
- improvements given by the genetic algorithm for code size, speed, and a 50% weighting of both factors
- optimization sequences found by the genetic algorithm for each function

Finding Effective Optimization Phase Sequences
http://www.cs.fsu.edu/~whalley/papers/lctes03.ps

Genetic Algorithm Issues

Very long search times:
- evaluating each sequence involves compiling, assembling, linking, execution, and verification
- simulation/execution on embedded processors is generally slower than on general-purpose processors

Reducing the search overhead:
- avoiding redundant executions of the application
- modifying the search to obtain comparable results in fewer generations

Methods for Avoiding Redundant Executions

1. Detect sequences that have already been attempted.
2. Detect sequences of phases that have been successfully applied.
3. Check if an instance of this function has already been generated.
4. Check if an equivalent function has already been generated.

Reducing the Search Overhead

- Avoiding redundant executions.
- Obtaining similar results in fewer generations.

Overview of Avoiding Redundant Executions (diagram)

Finding Redundant Attempted Sequences

The same optimization phase sequence may be reattempted:
- a crossover operation producing a previously attempted sequence
- mutation not occurring on any of the phases in the sequence
- mutation changing phases, but producing a previously attempted sequence

Finding Redundant Attempted Sequences (cont.)

Before mutation:        After mutation:
seq i: d a e d c f      seq i: d a e d c f   (unchanged)
seq j: f a c b c d      seq j: f a c a c d
seq k: f e c b b d      seq k: f a c b c d   (same as seq j before mutation)

Finding Redundant Active Sequences

- An active optimization phase is one that is able to complete one or more transformations.
- Dormant phases do not affect the compilation.
- The compiler must indicate whether each phase was active.

Attempted:  seq i: d b e d c f    seq j: d a e b c f
Active:     seq i: d e c f        seq j: d e c f
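Both redundancy checks reduce to set lookups. The sketch below is an assumption about the bookkeeping, not the paper's code; it assumes the compiler reports a per-phase flag saying whether each attempted phase was active.

```python
attempted_seqs = set()   # every sequence handed to the compiler
active_seqs = set()      # the same sequences with dormant phases removed

def is_redundant(seq, active_flags=None):
    """Return True if this sequence need not be executed again.
    `active_flags[i]` is True when phase seq[i] completed a transformation
    (reported by the compiler).  A sequence is redundant if it was already
    attempted, or if its active subsequence was already evaluated."""
    key = tuple(seq)
    if key in attempted_seqs:
        return True
    attempted_seqs.add(key)
    if active_flags is not None:
        active = tuple(p for p, on in zip(seq, active_flags) if on)
        if active in active_seqs:
            return True
        active_seqs.add(active)
    return False
```

With the slide's example, seq j (d a e b c f) is caught because its active subsequence d e c f matches that of seq i.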

Detecting Identical Code

- Sometimes identical code for a function can be generated from different active sequences.
- Some phases are essentially independent, e.g. branch chaining and register allocation.
- Sometimes there is more than one way to produce the same code.

Detecting Identical Code (cont.)

Example: starting from

  r[2] = 1;
  r[3] = r[4] + r[2];

instruction selection alone produces

  r[3] = r[4] + 1;

while constant propagation followed by dead assignment elimination produces the same code:

  r[2] = 1;
  r[3] = r[4] + 1;     (after constant propagation)
  r[3] = r[4] + 1;     (after dead assignment elimination)

CRC checksums were used to compare function instances.
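A minimal sketch of the checksum comparison, assuming the generated assembly is available as text; the cache interface here is invented for illustration.

```python
import zlib

seen_checksums = {}   # CRC-32 of generated code -> previously measured fitness

def lookup_identical(asm_text):
    """Identical code produced by different active sequences yields the same
    checksum, so the earlier fitness value can be reused without another
    execution of the application.  Returns None on a miss."""
    return seen_checksums.get(zlib.crc32(asm_text.encode()))

def record(asm_text, fitness_value):
    """Remember the fitness measured for this function instance."""
    seen_checksums[zlib.crc32(asm_text.encode())] = fitness_value
```

Comparing checksums rather than full function bodies keeps the cache small; a CRC collision could in principle return a wrong fitness, but is vanishingly unlikely at this scale.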

Detecting Equivalent Code

- Code generated by different optimization sequences may be equivalent, but not identical.
- Some optimization phases consume registers.
- A different ordering of such phases may result in equivalent instructions, but different registers being used.

Detecting Equivalent Code (cont.)

Source code:

  sum = 0;
  for (i = 0; i < 1000; i++)
      sum += a[i];

Register allocation before code motion:

  r[10]=0;
  r[12]=hi[a];
  r[12]=r[12]+lo[a];
  r[1]=r[12];
  r[9]=4000+r[12];
  L3:
  r[8]=m[r[1]];
  r[10]=r[10]+r[8];
  r[1]=r[1]+4;
  IC=r[1]?r[9];
  PC=IC<0,L3;

Code motion before register allocation:

  r[11]=0;
  r[10]=hi[a];
  r[10]=r[10]+lo[a];
  r[1]=r[10];
  r[9]=4000+r[10];
  L3:
  r[8]=m[r[1]];
  r[11]=r[11]+r[8];
  r[1]=r[1]+4;
  IC=r[1]?r[9];
  PC=IC<0,L3;

After mapping registers (both versions map to the same code):

  r[32]=0;
  r[33]=hi[a];
  r[33]=r[33]+lo[a];
  r[34]=r[33];
  r[35]=4000+r[33];
  L3:
  r[36]=m[r[34]];
  r[32]=r[32]+r[36];
  r[34]=r[34]+4;
  IC=r[34]?r[35];
  PC=IC<0,L3;
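The register mapping above can be sketched as renaming registers in first-use order, so two function instances that differ only in register assignment canonicalize to identical text. The `r[n]` syntax and the starting pseudo-register number 32 follow the slide; treating everything else in a line as opaque text is a simplifying assumption.

```python
import re

def canonicalize_registers(asm_lines):
    """Rewrite each r[n] register to a fresh pseudo-register (r[32], r[33],
    ...) in order of first appearance, so equivalent code that differs only
    in register numbering compares equal."""
    mapping = {}
    def rename(match):
        reg = match.group(0)
        if reg not in mapping:
            mapping[reg] = f"r[{len(mapping) + 32}]"
        return mapping[reg]
    return [re.sub(r"r\[\d+\]", rename, line) for line in asm_lines]
```

After canonicalization the two loop versions from the slide hash to the same checksum, so the identical-code cache catches them as well.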

Number of Avoided Executions (results chart)

Relative Total Search Time (results chart)

Reducing the Search Overhead

- Avoiding redundant executions.
- Obtaining similar results in fewer generations.

Producing Similar Results in Fewer Generations

- Search time can be reduced by running the genetic algorithm for fewer generations.
- Better results can also be obtained in the same number of generations.
- We evaluate four methods for reducing the number of generations required to find the best sequence in the search.

Using the Batch Sequence

- Capture the active sequence of phases applied by the batch compiler.
- Place this sequence in the initial population.
- This may allow the genetic algorithm to converge faster to the best sequence it can find.
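Seeding the initial population is a one-line change to population creation. A minimal sketch, assuming the batch compiler's active sequence is available as a string of phase letters; padding it with random phases to the GA's sequence length is an assumption.

```python
import random

PHASES = "abcdefghijklmno"   # 15 candidate phases, abstractly named a..o

def random_sequence(length):
    return [random.choice(PHASES) for _ in range(length)]

def initial_population(batch_sequence, seq_length, pop_size=20):
    """Seed the first generation with the batch compiler's active sequence
    (padded with random phases, or truncated, to the GA sequence length);
    the remaining members are random."""
    seed = (list(batch_sequence) + random_sequence(seq_length))[:seq_length]
    return [seed] + [random_sequence(seq_length) for _ in range(pop_size - 1)]
```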

Number of Generations When Using the Batch Sequence (results chart)

Prohibiting Specific Phases

Perform static analysis on the function:
- no loops, then no loop optimizations
- no scalar variables, then no register allocation
- only one basic block, then no unreachable code elimination and no branch optimizations
- etc.

Such phases are prohibited from being attempted for the entire search for that function.
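The static-analysis filter amounts to a few rules over function properties. The phase names below are hypothetical stand-ins (the slides do not list the 15 phases by name); the rules themselves follow the slide.

```python
# Hypothetical names for the loop-related phases.
LOOP_PHASES = {"loop_unrolling", "loop_invariant_code_motion", "strength_reduction"}

def prohibited_phases(has_loops, has_scalars, num_blocks):
    """Phases that static analysis shows cannot possibly be active for this
    function; they are excluded for the entire search."""
    banned = set()
    if not has_loops:
        banned |= LOOP_PHASES
    if not has_scalars:
        banned.add("register_allocation")
    if num_blocks == 1:
        banned |= {"unreachable_code_elimination", "branch_optimizations"}
    return banned
```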

Number of Generations When Prohibiting Specific Phases (results chart)

Prohibiting Prior Dormant Phases

- Some phases will be found to be dormant given a specific prefix of active phases.
- If the same prefix is encountered again, these prior dormant phases are not reattempted.
- A tree of active prefixes is kept, and the dormant phases are stored with each node in the tree.
- The genetic algorithm was changed to force a prior dormant phase to mutate until finding a phase that has been active, or has not yet been attempted, with that prefix.

Prohibiting Prior Dormant Phases (cont.)

a and f are dormant phases given the active prefix b, a, c in the tree.

(tree diagram of active prefixes, with the dormant phases recorded at each node)
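The prefix tree can be sketched as a simple trie whose nodes record the phases found dormant after that prefix; the class and function names here are illustrative, not from the paper.

```python
class PrefixNode:
    """Node in the tree of active-phase prefixes; `dormant` records the
    phases found dormant immediately after this prefix."""
    def __init__(self):
        self.children = {}
        self.dormant = set()

root = PrefixNode()

def record_dormant(active_prefix, phase):
    """Remember that `phase` was dormant after this prefix of active phases."""
    node = root
    for p in active_prefix:
        node = node.children.setdefault(p, PrefixNode())
    node.dormant.add(phase)

def known_dormant(active_prefix, phase):
    """True if `phase` was already found dormant after this prefix, so the
    GA should mutate it to a phase that was active or not yet attempted."""
    node = root
    for p in active_prefix:
        if p not in node.children:
            return False
        node = node.children[p]
    return phase in node.dormant
```

With the slide's example, recording a and f as dormant after the prefix b, a, c prevents either from being reattempted whenever that prefix recurs.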

Number of Generations When Prohibiting Prior Dormant Phases (results chart)

Prohibiting Un-enabled Phases

Most optimization phases, once performed, cannot be applied again until enabled. For example, register allocation will not be enabled by most branch optimizations.

Suppose c enables a, while b and d do not enable a:

  a b c a   (the second a may be active)
  a b d a   (the second a cannot be active)
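This rule can be sketched as a check against an enabling relation: a phase that has already been applied can only be active again if some later phase re-enables it. The relation below encodes only the slide's example (c enables a); a full table for all 15 phases is an assumption.

```python
# Enabling relation: ENABLES[x] is the set of phases whose application can
# make x applicable again.  Here only c enables a, matching the slide.
ENABLES = {"a": {"c"}}

def can_be_active(phase, active_prefix):
    """A phase already present in the active prefix cannot be active again
    unless some phase applied after its last occurrence re-enables it."""
    if phase not in active_prefix:
        return True   # not yet applied, so it may be active
    last = max(i for i, p in enumerate(active_prefix) if p == phase)
    return any(p in ENABLES.get(phase, set())
               for p in active_prefix[last + 1:])
```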

Prohibiting Un-enabled Phases (cont.)

Assume b can be enabled by a, but cannot be enabled by c. Given the prefix b, a, c, then b cannot be active at this point.

(tree diagram of active prefixes)

Number of Generations When Prohibiting Un-enabled Phases (results chart)

Number of Generations When Applying All Techniques (results chart)

Number of Avoided Executions When Reducing the Number of Generations (results chart)

Relative Search Time before Finding the Best Sequence (results chart)

Related Work

Superoptimizers:
- instruction selection: Massalin
- branch elimination: Granlund, Kenner

Iterative compilation techniques using performance feedback information:
- loop unrolling, software pipelining, blocking

Using genetic algorithms to improve compiler optimizations:
- parallelizing loop nests: Nisbet
- improving compiler heuristics: Stephenson et al.
- optimization sequences: Cooper et al.

Future Work

- Detecting likely active phases given the active phases that precede them.
- Varying the characteristics of the search.
- Parallelizing the genetic algorithm.

Conclusions

Avoiding executions:
- It is important for the genetic algorithm to know whether attempted phases were active or dormant, in order to avoid redundant active sequences.
- The same code is often generated by different active sequences.

Reducing the number of generations required to find the best sequence in the search:
- Inserting the batch compilation active sequence is simple and effective.
- Static analysis and empirical data can often detect when phases cannot be active.