Induction-variable Optimizations in GCC

Similar documents
Compiler Construction 2016/2017 Loop Optimizations

Compiler Construction 2010/2011 Loop Optimizations

Tour of common optimizations

Spring 2 Spring Loop Optimizations

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

CMPT 379 Compilers. Anoop Sarkar. 11/13/07 1. TAC: Intermediate Representation. Language + Machine Independent TAC

The GNU Compiler Collection

ECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment

Simone Campanoni Loop transformations

CS553 Lecture Dynamic Optimizations 2

Static Single Assignment Form in the COINS Compiler Infrastructure

Lecture Notes on Loop Optimizations

An example of optimization in LLVM. Compiler construction Step 1: Naive translation to LLVM. Step 2: Translating to SSA form (opt -mem2reg)

Using GCC as a Research Compiler

A Propagation Engine for GCC

Optimization Prof. James L. Frankel Harvard University

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

GCC Internals Control and data flow support

Control flow graphs and loop optimizations. Thursday, October 24, 13

Administrivia. CMSC 411 Computer Systems Architecture Lecture 14 Instruction Level Parallelism (cont.) Control Dependencies

The New Framework for Loop Nest Optimization in GCC: from Prototyping to Evaluation

Compiler Optimization

Diego Caballero and Vectorizer Team, Intel Corporation. April 16 th, 2018 Euro LLVM Developers Meeting. Bristol, UK.

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.

7. Optimization! Prof. O. Nierstrasz! Lecture notes by Marcus Denker!

EECS 583 Class 8 Classic Optimization

Languages and Compiler Design II IR Code Generation I

Plan for Today. Concepts. Next Time. Some slides are from Calvin Lin s grad compiler slides. CS553 Lecture 2 Optimizations and LLVM 1

A UNIFIED COMPILER FRAMEWORK FOR PROGRAM ANALYSIS, OPTIMIZATION, AND AUTOMATIC VECTORIZATION WITH CHAINS OF RECURRENCES

Appendix A The DL Language

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Using Static Single Assignment Form

Intermediate Representa.on

Code Optimization Using Graph Mining

Loop Unrolling. Source: for i := 1 to 100 by 1 A[i] := A[i] + B[i]; endfor

Intermediate representation

Functional programming languages

Machine-Independent Optimizations

CS107 Handout 13 Spring 2008 April 18, 2008 Computer Architecture: Take II

4/1/15 LLVM AND SSA. Low-Level Virtual Machine (LLVM) LLVM Compiler Infrastructure. LL: A Subset of LLVM. Basic Blocks

GCC Internals Alias analysis

Languages and Compiler Design II IR Code Optimization

An Architectural Overview of GCC

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Intermediate Programming, Spring 2017*

CSCI-1200 Data Structures Spring 2018 Lecture 7 Order Notation & Basic Recursion

COMPILER DESIGN - CODE OPTIMIZATION

Linear Scan Register Allocation. Kevin Millikin

Hello I am Zheng, Hongbin today I will introduce our LLVM based HLS framework, which is built upon the Target Independent Code Generator.

CS 403 Compiler Construction Lecture 10 Code Optimization [Based on Chapter 8.5, 9.1 of Aho2]

15-411: LLVM. Jan Hoffmann. Substantial portions courtesy of Deby Katz

Introduction to optimizations. CS Compiler Design. Phases inside the compiler. Optimization. Introduction to Optimizations. V.

Auto-Vectorization with GCC

Porting LLVM to a Next Generation DSP

CS C Primer. Tyler Szepesi. January 16, 2013

CS 701. Class Meets. Instructor. Teaching Assistant. Key Dates. Charles N. Fischer. Fall Tuesdays & Thursdays, 11:00 12: Engineering Hall

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013

Compiler Construction SMD163. Understanding Optimization: Optimization is not Magic: Goals of Optimization: Lecture 11: Introduction to optimization

A Fast Review of C Essentials Part II

CprE 488 Embedded Systems Design. Lecture 6 Software Optimization

Induction Variable Identification (cont)

Dealing with Register Hierarchies

Usually, target code is semantically equivalent to source code, but not always!

Compiler Design. Fall Data-Flow Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

HW 2 is out! Due 9/25!

Review. Lecture #7 MIPS Decisions So Far...

CS 61C: Great Ideas in Computer Architecture Introduction to Assembly Language and RISC-V Instruction Set Architecture

Introducing the GCC to the Polyhedron Model

A main goal is to achieve a better performance. Code Optimization. Chapter 9

ECE 5775 (Fall 17) High-Level Digital Design Automation. More Pipelining

LLVM Auto-Vectorization

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

Compiling for HSA accelerators with GCC

On recovering multi-dimensional arrays in Polly

Lecture 23 CIS 341: COMPILERS

Procedure Call Instructions

AutoTuneTMP: Auto-Tuning in C++ With Runtime Template Metaprogramming

Topic 9: Control Flow

17.1. Unit 17. Instruction Sets Picoblaze Processor

HW/SW Codesign. WCET Analysis

Machine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine

Software Protection: How to Crack Programs, and Defend Against Cracking Lecture 3: Program Analysis Moscow State University, Spring 2014

Code generation for modern processors

Code generation for modern processors

Introduction to Compilers

Intermediate Representation (IR)

Introduction to Machine Descriptions

ECE232: Hardware Organization and Design

DETAILED SYLLABUS INTRODUCTION TO C LANGUAGE

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Translating JVM Code to MIPS Code 1 / 43

Fast Recognition of Scalar Evolutions on Three-Address SSA Code

Topic I (d): Static Single Assignment Form (SSA)

UCB CS61C : Machine Structures

CS61C : Machine Structures

Unit 17. Instruction Set Review INSTRUCTION SET OVERVIEW. Historical Instruction Format Options. Instruction Sets Picoblaze Processor

Transcription:

Induction-variable Optimizations in GCC 程斌 bin.cheng@arm.com 2013.11

Outline Background Implementation of GCC Learned points Shortcomings Improvements Question & Answer References

Background Induction variable Induction variable Variables whose successive values form an arithmetic progression over some part of program, usually a loop. int a[100], i; for (i = 0; i < 100; i++) { a[i] = 202-2 * i /*Induction variables: i, &a[i], 2*i, 202-2*i */

Background Identify induction variables Basic/Fundamental induction variable Variables explicitly modified by a same constant amount during each iteration of a loop. Derived/General induction variable Variables modified in more complex ways int a[100], i; for (i = 0; i < 100; i++) { a[i] = 202-2 * i /*Induction variables: i, &a(i), 2*i, 202-2*i */

Background Induction-variable optimizations Strength reduction int a[100], i; for (i = 0; i < 100; i++) { a[i] = 202-2 * i //&a[i] a + 4 * i //202 2 * i int a[100], i; int *iv1 = a, iv2 = 202; for (i = 0; i < 100; i++) { *iv1 = iv2; iv1 += 4; iv2 -= 2;

Background Induction-variable optimizations Linear function test replacement int a[100], i; int *iv1 = a, iv2 = 202; for (i = 0; i < 100; i++) { *iv1 = iv2; iv1 += 4; iv2 -= 2; int a[100], i; int *iv1 = a, iv2 = 202; for (i = 0; iv2!= 2; i++) { *iv1 = iv2; iv1 += 4; iv2 -= 2;

Background Induction-variable optimizations Removal of induction variables int a[100], i; int *iv1 = a, iv2 = 202; for (i = 0; iv2!= 2; i++) { *iv1 = iv2; iv1 += 4; iv2 -= 2; int a[100]; int *iv1 = a, iv2 = 202; for (; iv2!= 2;) { *iv1 = iv2; iv1 += 4; iv2 -= 2;

Background Induction-variable optimizations Unnecessary bounds checking elimination refers to determine whether the value of a variable is within specified bounds in all of its uses in a program

Implementation of GCC Identify induction variable SSA form Scalar evolution implemented in tree-scalar-evolution.[ch] analyzes the evolution of scalar variables in loop Chain of recurrence is a canonical representation of polyn1mials functions can evaluate polynomials function at a number of points in an interval effectively models induction variable analysis f(x) = x^a + x^(a-1) + c f(0), f(1), f(2),...f(n) 0 1 2 3 4... n

Implementation of GCC Unified algorithm Find IV uses int a[100], i; for (i = 0; i < 100; i++) { a[i] = 202-2 * i Create IV candidates Compute costs Choose optimal set Rewrite IV uses Remove dead code use 0: 202 2*i use 1: &a[i] use 2: i < 100 int a[100]; int *iv1 = a, iv2 = 202; for (; iv2!= 2;) { *iv1 = iv2; iv1 += 4; iv2 -= 2; cand 0: <0, 1> cand 1: <202, -2> cand 2: <a, 4> cand 3: <a-4, 4>

Implementation of GCC Example IR before IVOPT IV uses use 0: <202, -2, _9> use 1: <a, 4, _7> use 2: <0, 1, i_11> IV candidates cand 0: <0, 1, normal> cand 1: <202, -2, normal> cand 2: <a, 4, normal> cand 3: <a-4, 4, before> cand 4: <a, 4, after> <bb 3>: # i_14 = PHI <i_11(4), 0(2)> i.0_4 = (unsigned int) i_14; _5 = i.0_4 * 4; _7 = a_6(d) + _5; _8 = 101 - i_14; _9 = _8 * 2; *_7 = _9; i_11 = i_14 + 1; if (i_11!= 100) goto <bb 3>; else goto <bb 5>;

Implementation of GCC Example Representation use 0 use 1 use 2 cand 0 202-2*iv_cand a + 4 * iv_cand iv_cand!= 100 cand 1 iv_cand a+2*(202-iv_cand) iv_cand!= 2 cand 2 NA MEM[iv_cand] iv_cand!= a+400 cand 3 NA MEM[iv_cand] iv_cand!= a+400 cand 4 NA MEM[iv_cand] iv_cand!= a+400 Costs use 0 use 1 use 2 cand 0 8 8 0 cand 1 0 26 0 cand 2 NA 6 0 cand 3 NA 2 0 cand 4 NA 2 0

Implementation of GCC Example Choose optimal iv set use:0 cand:0 cand:1 cand:1 use:1 cand:0 cand:0 cand:4 use:2 cand:0 cand:0 cand:1 Rewrite iv uses and remove dead code <bb 3>: # i_14 = PHI <i_11(4), 0(2)> i.0_4 = (unsigned int) i_14; _5 = i_14 * 4; _7 = a_6(d) + _5; _8 = 101 - i_14; _9 = _8 * 2; *_7 = _9; i_11 = i_14 + 1; if (i_11!= 100) goto <bb 3>; else goto <bb 5>; <bb 2>: iv.9_2 = (unsigned int) a_6(d); <bb 3>: # i_14 = PHI <i_11(4), 0(2)> # iv.6_3 = PHI <iv.6_2(4), 202(2)> # iv.9_1 = PHI <iv.9_3(4),iv.9_2(2)> _5 = i_14 * 4; _7 = a_6(d) + _5; _8 = 101 - i_14; _9 = (int) iv.6_3; _16 = (void *) iv.9_1; MEM[base: _16, offset: 0B] = _9; iv.9_3 = iv.9_1 + 4; iv.6_2 = iv.6_3-2; i_11 = i_14 + 1; if (iv.6_2!= 2) goto <bb 3>; else goto <bb 5>;

Implementation of GCC Learned points Infrastructure INPUT IVOPT ALGORITHM OUTPUT Cost Model Unified algorithm Cost Model rtx cost, like +,-,*,/,%,<<,... address cost Context information Loop invariant Register pressure Interact with other optimizations addressing mode [reg] [reg + reg] [reg + reg<<const] [reg + const]! [reg], #const

Shortcomings implementation fine tuned only for x86/x86_64 scaled addressing mode not supported for ARM, etc. support of auto-increment addressing mode is weak for ARM addressing mode [reg] [reg + reg] [reg + reg<<const] [reg + const]! [reg], #const int a[100], i; for (i = 0; i < 100; i++){ tmp = a + i*4 MEM[tmp] = 202-2*i int a[100], i; for (i = 0; i < 100; i++){ MEM[a+i<<2] = 202-2*i int a[100]; int *iv1 = a, iv2 = 202; for (; iv2!= 2;) { *iv1 = iv2; iv1 += 4; iv2 -= 2; int a[100]; int *iv1 = a, iv2 = 202; for (; iv2!= 2;) { MEM([iv1], #4) = iv2; iv2 -= 2;

Shortcomings implementation iv uses outside of loop are not handled along loop exit edges a[i2] = 0 i1 = i2 i2 = i2 + 1 if (i2 < 100) a[i2] = 0 i2 = i2 + 1 if (i2 < 100) use i1 use (i2-1) induction variables in both branches of IFstatement are not recognized int a[100], i = 0; while (i < 100) { if (i % 2 == 0) a[i] = 0; i++; else a[i] = 1; i++;

Shortcomings implementation complex address expression hard to process inaccurate cost &MEM[ptr+offset] &arr[index].y &MEM[ptr+offset] +/- step &arr[index].y +/- step ptr + offset arr + index*obj_size + y_offset ptr + offset arr + index*obj_size + y_offset +/- step +/- step ptr + const_1 arr + const_2

Shortcomings infrastructure GCC s two IRs: GIMPLE(IVOPT) & RTL Inaccurate rtx cost WRONG address cost model for ARM Conflict with RTL optimizations, like loop unrolling INPUT Cost Model IVOPT ALGORITHM OUTPUT TREE IR IVOPT Algorithm TREE IR RTL IR RTL passes a.s Cost Model GIMPLE problems Conflict with GIMPLE optimizations, like DOM Association of loop invariant in IVOPT x=a+b+c t=a+b; x=t+c or t=a+c; x=t+b

Improvements Work & patches Support scaled addressing mode for architectures other than x86/x86_64. http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01642.html http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01927.html Simplify address expressions for IVOPT http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00537.html http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01075.html Fix wrong address cost for auto-increment addressing mode http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00156.html Expedite the use of auto-increment addressing mode http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00034.html more patches coming... Compute outside loop iv uses along exit edges http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00535.html Improve the optimal iv set choosing algorithm patches coming... Miscellaneous fixes for IVOPT...

Improvements Work & patches Following work Fix address cost mode and RTL passes, e.g. fwprop_addr addr mode [reg] [reg+reg] [reg+reg<<i] [reg+offset] [reg], #off cost 6 4 3 2 0

Improvements Benchmark data Performance Coremark EEMBC_v1 automotive, office, consumer, telecom Spec2000 Code size CSIBE

Question and Answer Thank You!

References source code & internal & mailing list gcc.gnu.org Advanced compiler design and implementation Symbolic Evaluation of Chains of Recurrences for Loop Optimization, etc.