Static Single Assignment Form in the COINS Compiler Infrastructure

Similar documents
Static Single Assignment Form in the COINS Compiler Infrastructure - Current Status and Background -

Static Single Assignment Form in the COINS Compiler Infrastructure Current Status and Background

Tour of common optimizations

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013

8. Static Single Assignment Form. Marcus Denker

What Do Compilers Do? How Can the Compiler Improve Performance? What Do We Mean By Optimization?

6. Intermediate Representation

Programming Language Processor Theory

Code Merge. Flow Analysis. bookkeeping

Simple and Efficient Construction of Static Single Assignment Form

Intermediate representation

ECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment

Supercomputing and Science An Introduction to High Performance Computing

6. Intermediate Representation!

CS577 Modern Language Processors. Spring 2018 Lecture Optimization

7. Optimization! Prof. O. Nierstrasz! Lecture notes by Marcus Denker!

Compiler Optimization

Static single assignment

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

Supercomputing in Plain English Part IV: Henry Neeman, Director

CSE Section 10 - Dataflow and Single Static Assignment - Solutions

Languages and Compiler Design II IR Code Optimization

Introduction to Machine-Independent Optimizations - 1

Principles of Compiler Design

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

Functional programming languages

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

Topic I (d): Static Single Assignment Form (SSA)

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

Overview of a Compiler

Compiler Construction 2010/2011 Loop Optimizations

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

Overview of a Compiler

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

SSA Construction. Daniel Grund & Sebastian Hack. CC Winter Term 09/10. Saarland University

CPSC 510 (Fall 2007, Winter 2008) Compiler Construction II

Compiler Optimization Intermediate Representation

Compiler Construction 2016/2017 Loop Optimizations

EECS 583 Class 8 Classic Optimization

A main goal is to achieve a better performance. Code Optimization. Chapter 9

MIT Introduction to Program Analysis and Optimization. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Code generation for modern processors

Code generation for modern processors

Towards an SSA based compiler back-end: some interesting properties of SSA and its extensions

Compiler Construction 2009/2010 SSA Static Single Assignment Form

Running class Timing on Java HotSpot VM, 1

Induction Variable Identification (cont)

CS 403 Compiler Construction Lecture 10 Code Optimization [Based on Chapter 8.5, 9.1 of Aho2]

Induction-variable Optimizations in GCC

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

CSC D70: Compiler Optimization

Introduction to Compilers

Usually, target code is semantically equivalent to source code, but not always!

SSA, CPS, ANF: THREE APPROACHES

Intermediate Code Generation

Appendix A The DL Language

Overview of a Compiler

Intermediate Representation (IR)

Compiler construction 2009

A Practical and Fast Iterative Algorithm for φ-function Computation Using DJ Graphs

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

Goals of Program Optimization (1 of 2)

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

IA-64 Compiler Technology

Machine-Independent Optimizations

Lecture Notes on Loop Optimizations

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Why Global Dataflow Analysis?

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

The View from 35,000 Feet

FSU. Av oiding Unconditional Jumps by Code Replication. Frank Mueller and David B. Whalley. Florida State University DEPARTMENT OF COMPUTER SCIENCE

Intermediate Code & Local Optimizations

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

CS202 Compiler Construction

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210

int foo() { x += 5; return x; } int a = foo() + x + foo(); Side-effects GCC sets a=25. int x = 0; int foo() { x += 5; return x; }

Compiler Design and Construction Optimization

Advanced Compilers CMPSCI 710 Spring 2003 Computing SSA

Using Static Single Assignment Form

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

Introduction to Machine-Independent Optimizations - 6

Introduction to Optimization Local Value Numbering

A Sparse Algorithm for Predicated Global Value Numbering

Using GCC as a Research Compiler

Register Allocation. CS 502 Lecture 14 11/25/08

CS 6353 Compiler Construction, Homework #3

Single-Pass Generation of Static Single Assignment Form for Structured Languages

Lecture Notes on Loop-Invariant Code Motion

Stack Tracing In A Statically Typed Language

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

This test is not formatted for your answers. Submit your answers via to:

CS553 Lecture Dynamic Optimizations 2

Quality and Speed in Linear-Scan Register Allocation

Compiler Construction SMD163. Understanding Optimization: Optimization is not Magic: Goals of Optimization: Lecture 11: Introduction to optimization

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function

Abstract Interpretation Continued

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

Intermediate Code & Local Optimizations. Lecture 20

Transcription:

Static Single Assignment Form in the COINS Compiler Infrastructure Masataka Sassa (Tokyo Institute of Technology)

Background Static single assignment (SSA) form facilitates compiler optimizations. Compiler infrastructure facilitates compiler development. Outline 0. COINS infrastructure and the SSA form 1. SSA optimization module in COINS 2. A comparison of two major algorithms for translating from normal form into SSA form 3. A comparison of two major algorithms for translating back from SSA form into normal form 4. Examples of SSA form optimization 5. Preliminary experimental results

0. COINS infrastructure and the static single assignment form (SSA form)

COINS compiler infrastructure Multiple source languages Retargetable Two intermediate forms, HIR and LIR Optimizations Parallelization C generation, source-tosource translation Written in Java 2000~ developed by Japanese institutions under Grant of the Ministry C C frontend HIR to LIR Code generator SPARC High-level Intermediate Representation (HIR) Basic analyzer & optimizer Low-level Intermediate Representation (LIR) x86 Fortran Fortran frontend SSA optimizer New language New machine frontend Basic parallelizer SIMD parallelizer C C generation OpenMP Advanced optimizer C generation C

Static single assignment (SSA) form 1: a = x + y 2: a = a + 3 3: b = x + y (a) Normal (conventional) form (source program or internal form) 1: a 1 = x 0 + y 0 2: a 2 = a 1 + 3 3: b 1 = x 0 + y 0 (b) SSA form SSA form is a recently proposed internal representation where each use of a variable has a single definition point. Indices are attached to variables so that their definitions become unique.

Optimization in static single assignment (SSA) form 1: a = x + y 2: a = a + 3 3: b = x + y SSA translation 1: a 1 = x 0 + y 0 2: a 2 = a 1 + 3 3: b 1 = x 0 + y 0 (a) Normal form 1: a 1 = x 0 + y 0 2: a 2 = a 1 + 3 3: b 1 = a 1 (b) SSA form Optimization in SSA form (common subexpression elimination) SSA back translation 1: a1 = x0 + y0 2: a2 = a1 + 3 3: b1 = a1 (c) After SSA form optimization (d) Optimized normal form SSA form is becoming increasingly popular in compilers, since it is suited for clear handling of dataflow analysis and optimization.

Translating into SSA form (SSA translation) L1 x = 1 L2 x = 2 L1 x1 = 1 L2 x2 = 2 L3 = x L3 x3 = φ (x1;l1, x2:l2) = x3 (a) Normal form (b) SSA form

Translating into SSA form (SSA translation) x = = x y = z = x = = x y = z = = y = y = z Normal form x1= = x1 y1= z1= x2= = x2 y2= z2= x1= = x1 y1= z1= x2= = x2 y2= z2= x1= = x1 y1= z1= x2= = x2 y2= z2= = y1 = y2 = y1 = y2 = y1 = y2 x3= φ (x1,x2) y3= φ (y1,y2) z3= φ (z1,z2) = z3 Minimal SSA form y3= φ (y1,y2) z3= φ (z1,z2) = z3 Semi-pruned SSA form z3= φ (z1,z2) = z3 Pruned SSA form

Translating back from SSA form (SSA back translation) L1 L2 x1 = 1 x2 = 2 L1 x1 = 1 x3 = x1 L2 x2 = 2 x3 = x2 L3 x3 = φ (x1;l1, x2:l2) = x3 L3 = x3 (a) SSA form (b) Normal form

1. SSA optimization module in COINS

COINS compiler infrastructure C Fortran New language C OpenMP C frontend Fortran frontend frontend C generation High-level Intermediate Representation (HIR) HIR to LIR Basic analyzer & optimizer Basic parallelizer Advanced optimizer Low-level Intermediate Representation (LIR) Code generator SSA optimizer SIMD parallelizer C generation SPARC x86 New machine C

SSA optimization module in COINS Source program Low-level Intermediate Representation (LIR) SSA optimization module LIR to SSA translation (3 variations) LIR in SSA SSA optimization common subexp elim GVN by question prop copy propagation cond const propagation and much more... transformation on SSA edge splitting redund φ elim empty block elim Code generation object code Optimized LIR in SSA SSA to LIR back translation (3 variations) + 2 coalescing about 9,000 lines

Outline of SSA module in COINS(1) Translation into and back from SSA form on Low-level Intermediate Representation (LIR) SSA translation Use dominance frontier [Cytron et al. 91] 3 variations: translation into Minimal, Semi-pruned and Pruned SSA forms SSA back translation [Sreedhar et al. 99] 3 variations: Method I, II, and III Coalescing SSA-based coalescing during SSA back translation [Sreedhar et al. 99] Chaitin s style coalescing after back translation Each variation and coalescing can be specified by options

Outline of SSA module in COINS (2) Several optimization on SSA form: dead code elimination, copy propagation, common subexpression elimination, global value numbering based on efficient query propagation, conditional constant propagation, loop invariant code motion, operator strength reduction for induction variable and linear function test replacement, empty block elimination, copy folding at SSA translation time... Useful transformation as an infrastructure for SSA form optimization: critical edge removal on control flow graph, loop transformation from while loop to if-do-while loop, redundant phi-function elimination, making SSA graph Each variation, optimization and transformation can be made selectively by specifying options

2. A comparison of two major algorithms for SSA translation Algorithm by Cytron [1991] Dominance frontier Algorithm by Sreedhar [1995] DJ-graph Comparison made to decide the algorithm to be included in COINS

Translating into SSA form (SSA translation) L1 x = 1 L2 x = 2 L1 x1 = 1 L2 x2 = 2 L3 = x L3 x3 = φ (x1;l1, x2:l2) = x3 (a) Normal form (b) SSA form

Translation time (milli sec) Translation time Usual programs No. of nodes of control flow graph (The gap is due to the garbage collection)

Peculiar programs (a) nested loop (b) ladder graph

Translation time (milli sec) Translation time Nested loop programs No. of nodes of control flow graph

Translation time (milli sec) Translation time Ladder graph programs No. of nodes of control flow graph

3. A comparison of two major algorithms for SSA back translation Algorithm by Briggs [1998] Insert copy statements Algorithm by Sreedhar [1999] Eliminate interference There have been no studies of comparison Comparison made on COINS

Translating back from SSA form (SSA back translation) L1 L2 x1 = 1 x2 = 2 L1 x1 = 1 x3 = x1 L2 x2 = 2 x3 = x2 L3 x3 = φ (x1;l1, x2:l2) = x3 L3 = x3 (a) SSA form (b) Normal form

Problems of naïve SSA back translation (lost copy problem) block1 block2 x0 = 1 x1 = φ (x0, x2) y = x1 x2 = 2 block3 return y block1 block2 x0 = 1 x1 = φ (x0, x2) x2 = 2 block3 return x1 block1 block2 block3 x0 = 1 x1 = x0 x2 = 2 x1 = x2 return x1 ï

To remedy these problems... (i) SSA back translation algorithm by Briggs block1 block2 x0 = 1 x1 = φ (x0, x2) x2 = 2 live range of x1 block1 block2 x0 = 1 x1 = x0 temp = x1 x2 = 2 x1 = x2 live range of temp block3 block3 return x1 return temp (a) SSA form (b) normal form after back translation

(ii) SSA back translation algorithm by Sreedhar block1 live range of x0 x1 x2 block1 live range of x0 x1' x2 block1 x0 = 1 x0 = 1 A = 1 block2 block2 block2 x1 = φ (x0, x2) x2 = 2 x1 = φ (x0, x2) x1 = x1 x2 = 2 x1 = A A = 2 block3 block3 block3 return x1 return x1 return x1 {x0, x1, x2} A (a) SSA form (b) eliminating interference (c) normal form after back translation

Empirical comparison of SSA back translation No. of copies (no. of copies in loops) SSA form Briggs Briggs + Coalescing Sreedhar Lost copy 0 3 1 (1) 1 (1) Simple ordering 0 5 2 (2) 2 (2) Swap 0 7 5 (5) 3 (3) Swap-lost 0 10 7 (7) 4 (4) do 0 9 6 (4) 4 (2) fib 0 4 0 (0) 0 (0) GCD 0 9 5 (2) 5 (2) Selection Sort 0 9 0 (0) 0 (0) Hige Swap 0 8 3 (3) 4 (4)

4. Examples of SSA form optimization

Conditional constant propagation Example source program /* from Appel, A.: Modern compiler implementation in Java, 2nd ed., 2002, Fig. 19.4 */ int main() { int i, j, k; i = 1; j = 1; k = 0; while (k < 100) { if (j < 20) { j = i; k = k + 1; } else { j = k; k = k + 2; } } printf("%d n",j); }

Conditional constant propagation By carefully analyzing the source program, we know that j is always 1. i = 1; j = 1; k = 0; while (k < 100) { if (j < 20) { j = i; k = k + 1; } else { j = k; k = k + 2; } } printf("%d n",j); after conditional constant propagation k = 0; L3: if (k >= 100) go to L8; k = k + 1; goto L3; L8: printf("%d n",1); (it is actually in SSA fom intermediate representation, but shown here in C style without φ)

Conditional constant propagation By carefully analyzing the source program, we know that j is always 1. i = 1; j = 1; k = 0; while (k < 100) { if (j < 20) { j = i; k = k + 1; } else { j = k; k = k + 2; } } printf("%d n",j); (same as before) k = 0; while (k < 100) { k = k + 1; } printf("%d n",1); (in structured program style) We see that the else part of the while statement is deleted.

Operator strength reduction of loop induction variables and linear function test replacement Example source program /* addvectosr.c -- add vector for osr example */ void addvect(int z[], int x[], int y[], int n) { int i; for (i = 0; i < n; i++) { z[i] = x[i] + y[i]; } }

Operator strength reduction of loop induction variables and linear function test replacement i1 = 0 L1: i2 = φ(i1, i3) if (i2 >= n1) goto L6 temp4i = 4 * i2 tempa = x + temp4i tempxi = * tempa... L6: i3 = i2 + 1 goto L1 SSA form intermediate representation shown in C style i1 i2 i3 0 φ + 1 temp4i * x + tempa SSA graph tempxi load ( load corresponds to prefix * in C)

Operator strength reduction of loop induction variables and linear function test replacement i1 temp4i1 0 0 i2 φ temp4i * x temp4i2 φ x i3 + + tempa tempxi load + temp4i3 + tempxi load 1 (same as before) 4 transformation of SSA graph - strength reduction

Operator strength reduction of loop induction variables and linear function test replacement temp4i1 temp4i2 0 φ x temp4i1 = 0 L1: temp4i2 = φ(temp4i1, temp4i3)... + tempxi + load temp4i3 4 SSA graph (same as before) tempxi = * (x + temp4i2)... temp4i3 = temp4i2 + 4... SSA form reconstruct SSA form

Operator strength reduction of loop induction variables and linear function test replacement Example source program void addvect (int z[], int x[], int y[], int n) { int i; for (i = 0; i < n; i++) { z[i] = x[i] + y[i]; } } In array accesses, multiply by 4 disappears. (Actually it is in SSA form intermediate representation, but shown here in C style without.) after strength reduction of ind var... temp4n = n << 2;... temp4i = 0; i = 0; if (i >= n) goto L6; L4: tempxi = * (x + temp4i); tempyi = * (y + temp4i); * (z + temp4i) = tempxi + tempyi; i = i + 1; temp4i = temp4i + 4; if (temp4i < temp4n) goto L4; L6:

Operator strength reduction of loop induction variables and linear function test replacement after strength reduction of ind var... temp4n = n << 2;... temp4i = 0; i = 0; if (i >= n) goto L6; L4: tempxi = * (x + temp4i); tempyi = * (y + temp4i); * (z + temp4i) = tempxi + tempyi; i = i + 1; temp4i = temp4i + 4; if (temp4i < temp4n) goto L4; L6: (same as before) after all optimization... temp4n = n << 2; if (0 >= n) goto L6; temp4i = 0; L4: tempxi = * (x + temp4i); tempyi = * (y + temp4i); * (z + temp4i) = tempxi + tempyi; temp4i = temp4i + 4; if (temp4i < temp4n) goto L4; L6: Test of induction variable i is replaced by that of temp4i, and i disappears.

Common subexpression elimination Example source program /* cse_test.c */... a = 1; b = 2; x = 3; y = a + b; c = x + y; if (a < 10) z = a + b; else z = a + b; d = x + z; printf("%d n",d);

Common subexpression elimination source program a = 1; b = 2; x = 3; y = a + b; c = x + y; if (a < 10) z = a + b; else z = a + b; d = x + z; printf("%d n", d); after analysis a = 1; b = 2; x = 3; y = a + b; c = x + y; if (a < 10) (z IS y) // a + b is a common subexpr // and z is equal to y. else (z IS y) // a + b is a common subexpr //and z is equal to y. (z IS y IS a+b) // therefore, z is equal to y // or a+b after if statement. (d IS x+z) // d is equal to x+z printf("%d n", d);

Common subexpression elimination a = 1; b = 2; x = 3; y = a + b; c = x + y; if (a < 10) (z IS y) else (z IS y) (z IS y IS a+b) (d IS x+z) printf("%d n", d); after analysis (same as before) a = 1; b = 2; x = 3; z = a + b; printf("%d n", x+z); after common subexpr elim, dead code elim, and coalescing printf("%d n", 6); after all optimizations

5. Preliminary experimental results Coins C-to-SPARC compiler Measured on Sun Blade 1000 Ultra SPARC III Superscalar SPARC V9 750 MHz * 2 CPU L1 cache : 64 kb data, 32 kb instruction memory: 1 GB SunOS 5.8

Execution time of object code Legend gcc -O0 NO SSA: Coins baseline compiler (without SSA optimization) SSA CSE: constant propagation, hoisting loop invariant, SSA graph, common subexpression elimination, copy propagation, constant propagation, dead code elimination, operator strength reduction, constant propagation, common subexpression elimination, redundant phi elimination, dead code elimination, empty block elimination, in this order (pruned SSA, Sreedhar s method 3 back translation, no memory alias analysis, loop transformation) SSA CSEQP: constant propagation, hoisting loop invariant, SSA graph, common subexpression elimination by EQP, copy propagation, constant propagation, dead code elimination, operator strength reduction, constant propagation, common subexpression elimination by EQP, redundant phi elimination, dead code elimination, empty block elimination, in this order (pruned SSA, Sreedhar s method 3 back translation and Chaitin s coalescing, loop transformation) gcc -O2

(gcc -O0 =1) Execution time of object code (small integer programs)

(gcc -O0 =1) Execution time of object code (small floating point programs)

(gcc -O0 = 1) Execution time of object code (SPEC 95 Fortran benchmarks rewritten to C)

Execution time of object code (SPEC 2000 benchmark) (gcc -O0 = 1) SSA CSE: hoist loop inv, common subexpression elimination SSA MANY: hoist loop inv, const prop, com subexp elim, dead code elim, coalescing

Related work: SSA form in compiler infrastructures SUIF (Stanford Univ.): no SSA form machine SUIF (Harvard Univ.): only one optimization in SSA form Scale (Univ. Massachusetts): a couple of SSA form optimizations. But it generates only C programs, and cannot generate machine codes like in COINS. GCC: some attempts but experimental Only COINS will have full support of SSA form as a compiler infrastructure

Summary SSA form module of the COINS infrastructure Empirical comparison of algorithms for SSA translation gave criterion to make a good choice Empirical comparison of algorithms for SSA back translation suggests Sreedhar et al. s method is recommended. A rich set of SSA form optimization modules Preliminary experimental results Hope COINS and its SSA module help the compiler writer to compare/evaluate/add optimization methods

the following slides are unnecessary

After Conditional Constant Propagation (cont.) k = 0; L3: if (k >= 100) go to L8; k = k + 1; goto L3; L8: printf("%d n",1); = k = 0; while (k < 100) { k = k + 1; } printf("%d n",1); (in structure program style) We see that the else part of the while statement is deleted.