Compiler Design. Register Allocation. Hwansoo Han

Similar documents
Global Register Allocation via Graph Coloring

Code generation for modern processors

Code generation for modern processors

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

CS 406/534 Compiler Construction Instruction Selection and Global Register Allocation

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

k register IR Register Allocation IR Instruction Scheduling n), maybe O(n 2 ), but not O(2 n ) k register code

Register allocation. Overview

Global Register Allocation - Part 2

Global Register Allocation

Global Register Allocation via Graph Coloring The Chaitin-Briggs Algorithm. Comp 412

Register Allocation 3/16/11. What a Smart Allocator Needs to Do. Global Register Allocation. Global Register Allocation. Outline.

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

CS 406/534 Compiler Construction Putting It All Together

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

The C2 Register Allocator. Niclas Adlertz

Register Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops

Register allocation. instruction selection. machine code. register allocation. errors

Global Register Allocation - Part 3

Lecture Overview Register Allocation

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210

register allocation saves energy register allocation reduces memory accesses.

CSC D70: Compiler Optimization Register Allocation

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1

Register Allocation & Liveness Analysis

Topic 12: Register Allocation

Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time

Register Allocation. Preston Briggs Reservoir Labs

EECS 583 Class 15 Register Allocation

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

Low-Level Issues. Register Allocation. Last lecture! Liveness analysis! Register allocation. ! More register allocation. ! Instruction scheduling

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

Lecture 21 CIS 341: COMPILERS

Register Allocation via Hierarchical Graph Coloring

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

Abstract Interpretation Continued

Global Register Allocation - 2

Register Allocation 1

Register Allocation. Lecture 38

Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

Today More register allocation Clarifications from last time Finish improvements on basic graph coloring concept Procedure calls Interprocedural

Register Allocation. Michael O Boyle. February, 2014

Register allocation. TDT4205 Lecture 31

Instruction Selection and Scheduling

How to efficiently use the address register? Address register = contains the address of the operand to fetch from memory.

Lecture Compiler Backend

Compilers and Code Optimization EDOARDO FUSELLA

CS415 Compilers. Instruction Scheduling and Lexical Analysis

Register Allocation. CS 502 Lecture 14 11/25/08

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

Live Range Splitting in a. the graph represent live ranges, or values. An edge between two nodes

Global Register Allocation

Characteristics of RISC processors. Code generation for superscalar RISCprocessors. What are RISC and CISC? Processors with and without pipelining

CHAPTER 3. Register allocation

Code generation for superscalar RISCprocessors. Characteristics of RISC processors. What are RISC and CISC? Processors with and without pipelining

Improvements to Graph Coloring Register Allocation

Register Allocation. Lecture 16

Lecture 6. Register Allocation. I. Introduction. II. Abstraction and the Problem III. Algorithm

CHAPTER 3. Register allocation

SSA-Form Register Allocation

Lecture 15 Register Allocation & Spilling

Register Allocation. 2. register contents, namely which variables, constants, or intermediate results are currently in a register, and

Quality and Speed in Linear-Scan Register Allocation

Liveness Analysis and Register Allocation

Register Allocation. Stanford University CS243 Winter 2006 Wei Li 1

Static single assignment

CS415 Compilers Register Allocation. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Instruction Selection: Peephole Matching. Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Compiler Architecture

Linear Scan Register Allocation. Kevin Millikin

Lecture 25: Register Allocation

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

Improvements to Graph Coloring Register Allocation

Register Allocation. Introduction Local Register Allocators

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

Lecture Notes on Register Allocation

Reducing the Impact of Spill Code 1

Introduction to Optimization Local Value Numbering

April 15, 2015 More Register Allocation 1. Problem Register values may change across procedure calls The allocator must be sensitive to this

Computational Geometry

CSc 553. Principles of Compilation. 23 : Register Allocation. Department of Computer Science University of Arizona

Code Generation. CS 540 George Mason University

Register Allocation Amitabha Sanyal Department of Computer Science & Engineering IIT Bombay

1 Matchings in Graphs

We describe two improvements to Chaitin-style graph coloring register allocators. The rst, optimistic

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators.

Linear-Scan Register Allocation. CS 352 Lecture 12 11/28/07

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Chapter 9. Register Allocation

CS5363 Final Review. cs5363 1

Register Allocation. Ina Schaefer Selected Aspects of Compilers 100. Register Allocation

Compiler Optimisation

CS415 Compilers. Intermediate Represeation & Code Generation

Review; questions Basic Analyses (2) Assign (see Schedule for links)

Instruction Selection: Preliminaries. Comp 412

Live Variable Analysis. Work List Iterative Algorithm Rehashed

Why Global Dataflow Analysis?

CS 406/534 Compiler Construction Instruction Scheduling

Transcription:

Compiler Design Register Allocation Hwansoo Han

Big Picture of Code Generation Register allocation Decides which values will reside in registers Changes the storage mapping Concerns about placement of data & memory operations Single basic block, no spilling, and one register size Whole procedure is NP-Complete Start from virtual registers and map enough into k With targeting, focus on good priority heuristic 2

Register Allocation Part of the compiler s back end IR Instruction Selection Instruction Scheduling m register IR Register Allocation k register IR Machine code Critical properties Produce correct code that uses k (or fewer) registers Minimize running time Minimize added loads and stores Minimize space used to hold spilled values NP complete problem Need heuristic Operate efficiently- O(n), O(n log 2 n), maybe O(n 2 ), but not O(2 n ) 3 Errors

Global Register Allocation Taking a global approach Abandon the distinction between local & global Make systematic use of registers or memory Adopt a general scheme to approximate a good allocation Need to discover relationship between definitions and uses. DU-chain, SSA are good representations SSA has information about defs and uses relationship presented with a single name Construct web DU-chain: maximal union of DU-chains SSA: Ø( ) unions sets associated with each parameter 4

Web from DU-chains Built from DU chains The maximal union of intersecting DU-chains for a variable Ex) assigning webs to temporaries B1 x := y := T1 union B4 x := := y T2 B5 := y y := B2 := x := y T3 B3 := x x := T4 B6 y := := x 5

Web from SSA Instead of DU-chains, the SSA form can be used to identify webs in the program. Each SSA-form variable is the head of a DU-chain. Union is found at -function. B5 B1 x 1 := y 1 := T1 union B4 x 2 := := y 2 T2 := y 4 y 2 := B2 := x 1 := y 1 false union B3 Pruned SSA Will eliminate x 3 = (x 1,x 2 ) y 3 = (y 1,y 2 ) := x 3 x 4 := T3 6 B6 y 5 := := x 4

Graph Coloring (A Background Digression) The problem A graph G is said to be k-colorable iff the nodes can be colored with k different colors so that no edge in G connects two nodes with the same color Examples 2-colorable 3-colorable Each color can be mapped to a distinct physical register 7

Global Register Allocation Graph coloring paradigm (Lavrov & (later) Chaitin ) 1 Build an interference graph G I for the procedure Computing LIVE is harder than in the local case G I is not an interval graph 2 Try to construct a k-coloring Minimal coloring is NP-Complete Spill placement becomes a critical issue 3 Map colors onto physical registers 8

Building the Interference Graph (1) What is an interference? (or conflict) Two values interfere if there exists an operation where both are simultaneously live If x and y interfere, they cannot occupy the same register The interference graph, G I Nodes in G I represent values, webs or live ranges Edges in G I represent individual interferences For x, y G I, <x,y> iff x and y interfere A k-coloring of G I can be mapped into an allocation to k registers 9

Building the Interference Graph (2) To build the interference graph 1 Discover live ranges Build SSA form At each -function, take the union of the arguments 2 Compute LIVE sets for each block Use an iterative data-flow solver Solve equations for LIVEOUT over domain of live ranges Draw edges among live ranges in LIVEOUT 3 Iterate over each block Track the current LIVENOW set backward from the bottom At each operation, add appropriate edges & update LIVENOW With operation form of op r1, r2 r3 for each r in LIVENOW draw an edge (r3, r) remove r3 from LIVENOW add r1, r2 to LIVENOW 10

Coloring for Register Allocation Suppose you have k registers look for a k coloring Any vertex n that has fewer than k neighbors in the interference graph (n < k) can always be colored! Pick any color not used by its neighbors there must be one Ideas behind Chaitin s algorithm: Pick any vertex n such that n < k and put it on the stack Remove that vertex and all edges incident from the graph This may make some new nodes have fewer than k neighbors At the end, if some vertex n still has k or more neighbors, then spill the live range associated with n Otherwise successively pop vertices off the stack and color them in the lowest color not used by some neighbor 11

Chaitin s Algorithm 1. While vertices with < k neighbors in G I Pick any vertex n such that n < k and put it on the stack Remove that vertex and all edges incident to it from G I This will lower the degree of n s neighbors 2. If G I is non-empty (all vertices have k or more neighbors) then: Pick a vertex n (using some heuristic) and spill the live range associated with n. Then a new code is generated with spill. Build G I from the new code and restart at step 1. 3. Successively pop vertices off the stack and color them in the lowest color not used by already colored neighbors 12

Chaitin s Algorithm in Practice 3 Registers 2 1 4 5 3 Stack 13

Chaitin s Algorithm in Practice 3 Registers 2 4 5 3 1 Stack 14

Chaitin s Algorithm in Practice 3 Registers 4 5 2 1 3 Stack 15

Chaitin s Algorithm in Practice 3 Registers 5 4 2 1 3 Stack 16

Chaitin s Algorithm in Practice 3 Registers Colors: 5 3 4 2 1 1: 2: 3: Stack 17

Chaitin s Algorithm in Practice 3 Registers Colors: 5 1: 3 4 2 1 2: 3: Stack 18

Chaitin s Algorithm in Practice 3 Registers Colors: 5 1: 4 2 1 3 2: 3: Stack 19

Chaitin s Algorithm in Practice 3 Registers Colors: 4 5 1: 2 1 3 2: 3: Stack 20

Chaitin s Algorithm in Practice 3 Registers 1 Stack 2 3 4 5 Colors: 1: 2: 3: 21

Chaitin s Algorithm in Practice 3 Registers 2 1 4 5 3 Colors: 1: 2: 3: Stack 22

Improvement in Coloring Scheme Optimistic Coloring (Briggs, Cooper, Kennedy, and Torczon) Instead of stopping at the end when all vertices have at least k neighbors, put each on the stack according to some priority When you pop them off they may still color! 2 Registers: 2-colorable 23

Chaitin-Briggs Algorithm 1. While vertices with < k neighbors in G I Pick any vertex n such that n < k and put it on the stack Remove that vertex and all edges incident to it from G I This may create vertices with fewer than k neighbors 2. If G I is non-empty (all vertices have k or more neighbors) then: Pick a vertex n (using some heuristic condition), push n on the stack and remove n from G I, along with all edges incident to it If this causes some vertex in G I to have fewer than k neighbors, then go to step 1; otherwise, repeat step 2 3. Successively pop vertices off the stack and color them in the lowest color not used by already colored neighbors If some vertex cannot be colored, then pick an uncolored vertex to spill, spill it. Build G I from the new code and restart at step 1. 24

Chaitin Allocator (Bottom-up Coloring) renumber Build SSA, build live ranges, rename build Build the interference graph coalesce Fold unneeded copies LR x LR y, and < LR x,lr y > G I combine LR x & LR y spill costs simplify Estimate cost for spilling each live range Remove nodes from the graph and push to the stack W at c h this edge select spill While stack is non-empty pop n, insert n back into G I, & try to color it Spill uncolored definitions & uses 25

Chaitin-Briggs Allocator (Bottom-up Coloring) renumber Build SSA, build live ranges, rename build Build the interference graph coalesce Fold unneeded copies LR x LR y, and < LR x,lr y > G I combine LR x & LR y spill costs simplify Estimate cost for spilling each live range Remove nodes from the graph & push to the stack select While stack is non-empty pop n, insert n into G I, & try to color it spill Spill uncolored definitions & uses 26

Picking a Spill Candidate Spill cost metric Weighted cost of loads & stores needed to spill x Repeat simplify, select, & spill with several different spill choice heuristics and keep the best (Bernstein et al) Chaitin s heuristic Minimize spill cost and reduce current degree of G I A few special cases Negative spill cost (spill preemptively) load and store to the same memory location, but no other uses Infinite spill cost (cannot be spilled) A use immediately follows its definition (short Live-Range) 27

Other Improvements to Chaitin-Briggs Spilling partial live ranges Bergner introduced interference region spilling Limits spilling to regions of high demand for registers Splitting live ranges Simple idea break up one or more live ranges Allocator can use different registers for distinct subranges Allocator can spill subranges independently Conservative coalescing Combining LR x LR y to form LR xy may increase register pressure Limit coalescing to case where LR xy < k Iterative form tries to coalesce before spilling 28

Chaitin-Briggs Allocator (Bottom-up Global) Strengths Precise interference graph Strong coalescing mechanism Handles register assignment well Runs fairly quickly Weaknesses Known to overspill in tight cases (overspill in high demand) Interference graph has no geography (size of overlapping range) Spills a live range everywhere (no partial spill) 29

Summary Register allocation Infinite register code to finite register code Spill may be needed Graph coloring problem Chaitin-Briggs algorithm Build interference graph Simplify graph with stack Optimistic spill Many more tweaks are possible to improve 30