Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time

Similar documents
Improvements to Graph Coloring Register Allocation

Improvements to Graph Coloring Register Allocation

We describe two improvements to Chaitin-style graph coloring register allocators. The rst, optimistic

Register Allocation. Preston Briggs Reservoir Labs

Register allocation. Overview

Compiler Design. Register Allocation. Hwansoo Han

Register Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Topic 12: Register Allocation

Global Register Allocation - Part 2

Global Register Allocation via Graph Coloring

CSC D70: Compiler Optimization Register Allocation

Code generation for modern processors

Code generation for modern processors

Today More register allocation Clarifications from last time Finish improvements on basic graph coloring concept Procedure calls Interprocedural

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210

CS 406/534 Compiler Construction Instruction Selection and Global Register Allocation

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Using Static Single Assignment Form

Register allocation. instruction selection. machine code. register allocation. errors

Global Register Allocation

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

Register Allocation in Just-in-Time Compilers: 15 Years of Linear Scan

Register Allocation 3/16/11. What a Smart Allocator Needs to Do. Global Register Allocation. Global Register Allocation. Outline.

Linear Scan Register Allocation. Kevin Millikin

Global Register Allocation - Part 3

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

register allocation saves energy register allocation reduces memory accesses.

CS 406/534 Compiler Construction Putting It All Together

Sparse Conditional Constant Propagation

Code generation and local optimization

Sparse Conditional Constant Propagation

Code generation and local optimization

Sparse Conditional Constant Propagation

A Propagation Engine for GCC

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1

Low-Level Issues. Register Allocation. Last lecture! Liveness analysis! Register allocation. ! More register allocation. ! Instruction scheduling

The C2 Register Allocator. Niclas Adlertz

Global Register Allocation via Graph Coloring The Chaitin-Briggs Algorithm. Comp 412

Live Variable Analysis. Work List Iterative Algorithm Rehashed

Reuse Optimization. LLVM Compiler Infrastructure. Local Value Numbering. Local Value Numbering (cont)

EECS 583 Class 15 Register Allocation

Optimized Interval Splitting in a Linear Scan Register Allocator

Computing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships

Improvements to Linear Scan register allocation

Iterated Register Coalescing

Live Range Splitting in a. the graph represent live ranges, or values. An edge between two nodes

Register Allocation via Hierarchical Graph Coloring

Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

Lecture 21 CIS 341: COMPILERS

Register Allocation. Lecture 38

Low-level optimization

CSc 553. Principles of Compilation. 23 : Register Allocation. Department of Computer Science University of Arizona

Calvin Lin The University of Texas at Austin

Register Allocation 1

Lecture 6. Register Allocation. I. Introduction. II. Abstraction and the Problem III. Algorithm

Register Allocation & Liveness Analysis

Lecture Overview Register Allocation

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

Register Allocation. CS 502 Lecture 14 11/25/08

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

Semantic actions for expressions

Linear Scan Register Allocation in the Context of SSA Form and Register Constraints 1

Compilers and Code Optimization EDOARDO FUSELLA

Quality and Speed in Linear-Scan Register Allocation

Register allocation. TDT4205 Lecture 31

Lecture 15 Register Allocation & Spilling

SSA-Form Register Allocation

Program Optimizations using Data-Flow Analysis

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

k register IR Register Allocation IR Instruction Scheduling n), maybe O(n 2 ), but not O(2 n ) k register code

A Sparse Algorithm for Predicated Global Value Numbering

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

Static Single Assignment (SSA) Form

CHAPTER 3. Register allocation

Lecture 5. Announcements: Today: Finish up functions in MIPS

Graph-Coloring Register Allocation for Irregular Architectures

MODEL ANSWERS COMP36512, May 2016

Local Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code

Today s Menu. >Use the Internal Register(s) >Use the Program Memory Space >Use the Stack >Use global memory

Register Allocation. Lecture 16

Abstract Interpretation Continued

CS153: Compilers Lecture 1: Introduction

Register Allocation III. Interference Graph Allocators. Coalescing. Granularity of Allocation (Renumber step in Briggs) Chaitin

CHAPTER 3. Register allocation

Register Allocation III. Interference Graph Allocators. Computing the Interference Graph (in MiniJava compiler)

Anticipation-based partial redundancy elimination for static single assignment form

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

Casting in C++ (intermediate level)

Register Allocation. 2. register contents, namely which variables, constants, or intermediate results are currently in a register, and

Quality and Speed in Linear-scan Register Allocation

Calvin Lin The University of Texas at Austin

CS61C : Machine Structures

Functions in MIPS. Functions in MIPS 1

Lectures 5. Announcements: Today: Oops in Strings/pointers (example from last time) Functions in MIPS

Register Allocation. Michael O Boyle. February, 2014

Transcription:

Graph Coloring Register Allocation Last Time Chaitin et al. Briggs et al. Today Finish Briggs et al. basics An improvement: rematerialization Rematerialization Some expressions are especially simple to recompute: Operands are constant (though not necessarily known) Operands are available globally Chaitin calls these expressions never-killed Typical examples include: Constant Constant + frame pointer Load of constant parameter Load from constant pool Access through display CS 380C Lecture 13 1 Register Allocation CS 380C Lecture 13 2 Register Allocation

Rematerialization What we get We should recognize that these are cheaper before we try to color Helps resolve spill choices correctly Chaitin spill p Chow - Spitting spill p Original Ideal reload p y y+[p] reload p y y+[p] y y+[p] y y+[p] reload p p p+1 p p+1 reload p p p+1 spill p p p+1 CS 380C Lecture 13 3 Register Allocation CS 380C Lecture 13 4 Register Allocation

Live Ranges and Values Chaitin s allocator works with live ranges A live range may include many values, connected by common uses The Plan To discover and isolate rematerializable values: Find values (use pruned SSA graph) Tag values according to definition A value corresponds to a single definition, including the merge of two values Propagate tags (use sparse simple constant algorithm) Union connected values if tags are identical You should be thinking: Hmmm, smells like SSA or something,... Chaitin s allocator can handle rematerializing a live range with a single value CS 380C Lecture 13 5 Register Allocation CS 380C Lecture 13 6 Register Allocation

A Lattice Lattice elements may have one of three types: No information is known a value defined by a copy instruction or a φ-node has an initial tag of inst A value defined by an appropriate instruction (never-killed) should be rematerialized the value s tag is just a pointer to the instruction The Meet Operation The meet operation is any = any any = inst i inst j = inst i if inst i = inst j inst i inst j = if inst i inst j Value cannot be rematerialized values defined by inappropriate instructions are immediately tagged with inst 1 inst 2 inst 3 inst n inst i = inst j compares the instructions on an operand-by-operand basis Since our instructions have only 2 operands, asymptotic complexity is not affected CS 380C Lecture 13 7 Register Allocation CS 380C Lecture 13 8 Register Allocation

Conservative Coalescing Example We remove splits where the source and destination values have the same tag Undoing Splits Briggs claims that they end up with splits placed perfectly SSA Splits Minimal All never-killed values are isolated with the minimum number of splits p 0 123 p 0 123 p 0 123 Nevertheless, some of the splits (copies) are never required y y+[p 0 ] y y+[p 0 ] y y+[p 0 ] Conservative coalescing Biased coloring p 1 p 0 p 12 p 0 p 1 φ(p 0, p 2 ) p 2 p 1 + 1 p 2 p 1 + 1 p 1 p 2 p 12 p 12 + 1 Similarly, we remove copies if the source and destination values have identical inst tags CS 380C Lecture 13 9 Register Allocation CS 380C Lecture 13 10 Register Allocation

Briggs phases renumber Find all distinct live ranges and number them uniquely build Construct the interference graph coalesce For each copy where the source and destination live ranges don t interfere, union the two live ranges and remove the copy spill costs Estimate the dynamic cost for spilling each live range simplify Repeatedly remove nodes with degree < k from the graph and push them on a stack. When necessary, choose spill candidates and push them on the stack select Reassemble the graph with nodes popped from the graph. As each node is added to the graph, choose a color differing from neighbors in the graph. If no color is available, the node is left uncolored spill code Spill uncolored nodes by inserting a load or store at each use or definition Conservative Coalescing Two rounds of coalescing 1. Coalesce copies (subsumption) 2. Conservatively coalesce splits Note that each round of coalescing may be repeated several times Conservative coalescing removes splits when it doesn t hurt colorability The conservative approximation is to remove a split if the resulting live range has < k neighbors of degree k CS 380C Lecture 13 11 Register Allocation CS 380C Lecture 13 12 Register Allocation

Biased Coloring We also modify select slightly Results Dynamic measurements on 70 routines (mostly SPEC Fortran) On a (simulated) machine with 16 integer registers and 16 floating-point registers Select (re)assembles the graph 1 node at a time As each node is added to the graph, we choose a color that differs from any neighbor With biased coloring, we try to choose a color that helps remove a split Counting loads, stores, copies, ldi s, addi s contributing to spill costs New < Old 28 routines New > Old 2 routines The new allocator is typically slightly slower (compile time) due to Propagation of tags Conservative coalescing An additional refinement limited lookahead Avoid choosing colors that an uncolored partner can t use Occasionally, the new allocator is faster because of simplified coalescing CS 380C Lecture 13 13 Register Allocation CS 380C Lecture 13 14 Register Allocation

Rematerialization A natural, low-cost extension to Chaitin s ideas on rematerialization Handles complex live ranges that are wholly or partially rematerializable Next Time Read: Traub, Holloway, and Smith, Quality and Speed in Linear-scan Register Allocation, ACM SIGPLAN 98 Conference on Programming Language Design and Implementation, June 1998, pp. 142-151. Especially significant to splitting allocators (e.g., Chow) A simple adaptation of Wegman and Zadeck s sparse simple constant propagation algorithm Other required engineering changes Results showing that significant opportunities for rematerialization occur in practice CS 380C Lecture 13 15 Register Allocation CS 380C Lecture 13 16 Register Allocation