Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Similar documents
Register Allocation 3/16/11. What a Smart Allocator Needs to Do. Global Register Allocation. Global Register Allocation. Outline.

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

register allocation saves energy register allocation reduces memory accesses.

Compiler Design. Register Allocation. Hwansoo Han

Global Register Allocation via Graph Coloring

CSC D70: Compiler Optimization Register Allocation

Code generation for modern processors

Code generation for modern processors

The C2 Register Allocator. Niclas Adlertz

Global Register Allocation - Part 3

Register allocation. Overview

Register Allocation. Introduction Local Register Allocators

Low-Level Issues. Register Allocation. Last lecture! Liveness analysis! Register allocation. ! More register allocation. ! Instruction scheduling

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS 406/534 Compiler Construction Instruction Selection and Global Register Allocation

CS 406/534 Compiler Construction Putting It All Together

Lecture 15 Register Allocation & Spilling

Lecture 21 CIS 341: COMPILERS

Global Register Allocation

Global Register Allocation - Part 2

CHAPTER 3. Register allocation

Register Allocation via Hierarchical Graph Coloring

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210

Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time

Register Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops

k register IR Register Allocation IR Instruction Scheduling n), maybe O(n 2 ), but not O(2 n ) k register code

Register allocation. instruction selection. machine code. register allocation. errors

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

SSA-Form Register Allocation

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1

Register allocation. TDT4205 Lecture 31

Today More register allocation Clarifications from last time Finish improvements on basic graph coloring concept Procedure calls Interprocedural

CHAPTER 3. Register allocation

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

Introduction to Machine-Independent Optimizations - 6

April 15, 2015 More Register Allocation 1. Problem Register values may change across procedure calls The allocator must be sensitive to this

Register Allocation. Preston Briggs Reservoir Labs

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

Register Allocation & Liveness Analysis

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

Chapter 9. Register Allocation

Lecture 6. Register Allocation. I. Introduction. II. Abstraction and the Problem III. Algorithm

Intermediate Code & Local Optimizations. Lecture 20

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

How to efficiently use the address register? Address register = contains the address of the operand to fetch from memory.

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Linear Scan Register Allocation. Kevin Millikin

Static single assignment

EECS 583 Class 15 Register Allocation

Register Allocation. Stanford University CS243 Winter 2006 Wei Li 1

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

Register Allocation. Michael O Boyle. February, 2014

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

Topic 12: Register Allocation

Requirements, Partitioning, paging, and segmentation

CSc 553. Principles of Compilation. 23 : Register Allocation. Department of Computer Science University of Arizona

Lecture 3 Local Optimizations, Intro to SSA

Low-level optimization

Run Time Environment. Procedure Abstraction. The Procedure as a Control Abstraction. The Procedure as a Control Abstraction

Lecture Compiler Backend

Simple Machine Model. Lectures 14 & 15: Instruction Scheduling. Simple Execution Model. Simple Execution Model

Functional programming languages

Compilers and Code Optimization EDOARDO FUSELLA

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

CS143 Final Spring 2016

Register Allocation 1

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

Register Allocation. Lecture 38

Towards an SSA based compiler back-end: some interesting properties of SSA and its extensions

Syntactic Directed Translation

Register Allocation Amitabha Sanyal Department of Computer Science & Engineering IIT Bombay

Requirements, Partitioning, paging, and segmentation

Register Allocation. CS 502 Lecture 14 11/25/08

Intermediate Code & Local Optimizations

Code generation and local optimization

Lecture Notes on Register Allocation

Lecture Overview Register Allocation

Code generation and local optimization

Global Optimization. Lecture Outline. Global flow analysis. Global constant propagation. Liveness analysis. Local Optimization. Global Optimization

Rethinking Code Generation in Compilers

Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

III Data Structures. Dynamic sets

Parsing Scheme (+ (* 2 3) 1) * 1

Run Time Environment. Activation Records Procedure Linkage Name Translation and Variable Access

Compiler Design. Fall Data-Flow Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Register Allocation in Just-in-Time Compilers: 15 Years of Linear Scan

Compiler Optimisation

Tour of common optimizations

Code generation for superscalar RISCprocessors. Characteristics of RISC processors. What are RISC and CISC? Processors with and without pipelining

Calvin Lin The University of Texas at Austin

Calvin Lin The University of Texas at Austin

Global Register Allocation via Graph Coloring The Chaitin-Briggs Algorithm. Comp 412

Quality and Speed in Linear-Scan Register Allocation

Run-time Environment

Global Register Allocation - 2

Register Allocation, i. Overview & spilling

Abstract Interpretation Continued

Transcription:

Register Allocation Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies of these materials for their personal use.

What a Smart Allocator Needs to Do Determine ranges for each variable can benefit from using a register (webs) Determine which of these ranges overlap (interference) Find the benefit of keeping each web in a register (spill cost) Decide which webs gets a register (allocation) Split webs if needed (spilling and splitting) Assign hard registers to webs (assignment) Generate code including spills (code gen) 2

Global Register Allocation... store r4 x load x r1... This is an assignment problem, not an allocation problem! What s harder across multiple blocks? Could replace a load with a move Good assignment would obviate the move Must build a control-flow graph to understand inter-block flow Can spend an inordinate amount of time adjusting the allocation 3

... store r4 x Global Register Allocation... store r4 x load x r1... What if one block has x in a register, but the other does not? A more complex scenario Block with multiple predecessors in the control-flow graph Must get the right values in the right registers in each predecessor In a loop, a block can be its own predecessors This adds tremendous complications 4

Outline What is Register allocation and Its Importance Simple Register Allocators Webs Interference Graphs Graph Coloring Splitting More Optimizations 5

Webs What needs to Gets Memorized is the Value Divide Accesses to a Variable into Multiple Webs All definitions that reaches a use are in the same web All uses that use the value defined are in the same web Divide the Variable into Live Ranges Implementation: use DU chains A du-chain connects a definition to all uses reached by the definition A web combines du-chains containing a common use 6

Example write y write y read y read y 7

Example write y write y read y read y 8

Example write y write y read y read y 9

Example write y write y read y read y 10

Example write y write y read y read y 11

Example write y write y read y read y 12

Example write y write y read y read y 13

Example write y w2 write y w1 read y read y w3 w4 14

Webs (continued) In two Webs of the same Variable: No use in one web will ever use a value defined by the other web Thus, no value need to be carried between webs Each web can be treated independently as values are independent Web is used as the Unit of Register Allocation If a web is allocated to a register, all the uses and definitions within that web don t need to load and store from memory Solves the issue of cross Basic Block register assignment Different webs may be assigned to different registers or one to register and one to memory 15

Outline What is Register Allocation A Simple Register Allocator Webs Interference Graphs Graph Coloring Splitting More Optimizations 16

Interference Two webs interfere if their live ranges overlap in time What does time Mean, more precisely? There exists an instruction common to both ranges where c. : a = b op c. : They variable values of webs are operands of the instruction If there is a single instruction in the overlap a and the variable for the web that ends at that instruction is an operands and the variable for the web that starts at the instruction is the destination of the instruction then the webs do not interfere Non-interfering webs can be assigned to the same register c. : a = b op c. : a 17

Example write y w1 write y w2 read y read y w3 w4 18

Webs w1 and w2 interfere Webs w2 and w3 interfere Example write y w1 write y w2 read y read y w3 w4 19

Interference Graph Representation of Webs & their Interference Nodes are the webs An edge exists between two nodes if they interfere w1 w2 w3 w4 20

Example w1 w2 w3 w4 21

Webs w1 and w2 interfere Webs w2 and w3 interfere Example write y w1 write y w2 read y read y w3 w1 w2 w4 w3 w4 22

Outline What is Register Allocation A Simple Register Allocator Webs Interference Graphs Graph Coloring Splitting More Optimizations 23

Reg. Allocation Using Graph Coloring Each Web is Allocated a Register each node gets a register (color) If two webs interfere they cannot use the same register if two nodes have an edge between them, they cannot have the same color w1 w2 w3 w4 24

Graph Coloring What is the minimum number of colors that takes to color the nodes of the graph such that any nodes connected with an edge does not have the same color? Classic Problem in Graph Theory 25

Graph Coloring Example 26

Graph Coloring Example 1 Color 27

Graph Coloring Example 28

Graph Coloring Example 2 Colors 29

Graph Coloring Example 30

Graph Coloring Example Still 2 Colors 31

Graph Coloring Example 32

Graph Coloring Example 3 Colors 33

Heuristics for Register Coloring Coloring a graph with N colors If degree < N (degree of a node = # of edges) Node can always be colored After coloring the rest of the nodes, you ll have at least one color left to color the current node If degree N still may be colorable with N colors exact solution is NP complete 34

Heuristics for Register Coloring Remove nodes that have degree < N Push the removed nodes onto a stack If all the nodes have degree N Find a node to spill (no color for that node) Remove that node When empty, start the coloring step pop a node from stack back Assign it a color that is different from its connected nodes (since degree < N, a color should exist) 35

N = 3 Coloring Example w1 w2 w0 w3 w4 36

N = 3 Coloring Example w1 w2 w0 w3 w4 37

N = 3 Coloring Example w1 w2 w0 w3 w4 w4 38

N = 3 Coloring Example w1 w2 w3 w0 w4 w2 w4 39

N = 3 Coloring Example w1 w2 w3 w0 w4 w1 w2 w4 40

N = 3 Coloring Example w1 w2 w3 w0 w4 w3 w1 w2 w4 41

N = 3 Coloring Example w1 w2 w3 w0 w4 w3 w1 w2 w4 42

N = 3 Coloring Example w1 w2 w3 w0 w4 w3 w1 w2 w4 43

N = 3 Coloring Example w1 w2 w3 w0 w4 w1 w2 w4 44

N = 3 Coloring Example w1 w2 w3 w0 w4 w1 w2 w4 45

N = 3 Coloring Example w1 w2 w3 w0 w4 w2 w4 46

N = 3 Coloring Example w1 w2 w3 w0 w4 w2 w4 47

N = 3 Coloring Example w1 w2 w0 w3 w4 w4 48

N = 3 Coloring Example w1 w2 w0 w3 w4 w4 49

N = 3 Coloring Example w1 w2 w0 w3 w4 50

N = 3 Coloring Example w1 w2 w0 w3 w4 51

Another Coloring Example N = 3 s1 s2 s0 s3 s4 52

Another Coloring Example N = 3 s1 s2 s0 s3 s4 s4 53

Another Coloring Example N = 3 s1 s2 s0 s3 s4 s4 54

Another Coloring Example N = 3 s1 s2 s3 s0 s4 s3 s4 55

Another Coloring Example N = 3 s1 s2 s3 s0 s4 s2 s3 s4 56

Another Coloring Example N = 3 s1 s2 s3 s0 s4 s2 s3 s4 57

Another Coloring Example N = 3 s1 s2 s3 s0 s4 s2 s3 s4 58

Another Coloring Example N = 3 s1 s2 s3 s0 s4 s3 s4 59

Another Coloring Example N = 3 s1 s2 s3 s0 s4 s3 s4 60

Another Coloring Example N = 3 s1 s2 s0 s3 s4 s4 61

Another Coloring Example N = 3 s1 s2 s0 s3 s4 s4 62

Another Coloring Example N = 3 s1 s2 s0 s3 s4 63

Another Coloring Example N = 3 s1 s2 s0 s3 s4 64

Outline What is Register Allocation A simple register Allocator Webs Interference Graphs Graph coloring Splitting More Optimizations 65

Spilling and Splitting When the graph is non-n-colorable Select a Web to Spill Find the least costly Web to Spill Use and Defs of that web are read and writes to memory Split the web Split a web into multiple webs so that there will be less interference in the interference graph making it N-colorable Spill the value to memory and load it back at the points where the web is split 66

Splitting Example write z read z x y z write y read y read z 67

Splitting Example write z read z x y z write y read y x z y read z 68

Splitting Example write z read z x y z write y read y x z y read z 2 colorable? 69

Splitting Example write z read z x y z write y read y x z y read z 2 colorable? NO! 70

Splitting Example write z read z x y z write y read y read z 71

Splitting Example write z read z x y z write y read y read z 72

Splitting Example write z read z write y read y x y z x z1 z2 y read z 73

Splitting Example write z read z write y read y x y z x z1 z2 y read z 2 colorable? 74

Splitting Example x y z write z read z z1 write y read y x z2 y read z 2 colorable? YES! 75

Splitting Example x y z write z read z write y read y r1 r2 r1 r1 x z1 z2 y read z 2 colorable? YES! 76

Splitting Example write z read z store z write y read y x y z r1 r2 r1 r1 x z1 z2 y load z read z 2 colorable? YES! 77

Splitting Identify a Program Point where the Graph is not R-colorable (point where # of webs > N) Pick a web that is not used for the largest enclosing block around that point of the program Split that web Redo the interference graph Try to re-color the graph 78

Cost and Benefit of Splitting Cost of splitting a node Proportion to number of times splitted edge has to be crossed dynamically Estimate by its loop nesting Benefit Increase colorability of the nodes the splitted web interferes with Can approximate by its degree in the interference graph Greedy heuristic pick the live-range with the highest benefit-to-cost ratio to spill 79

Outline Overview of procedure optimizations What is register allocation A simple register allocator Webs Interference Graphs Graph coloring Splitting More Optimizations 80

More Transformations Register Coalescing Register Targeting (pre-coloring) Pre-Splitting of Webs Inter-procedural Register Allocation 81

Register Coalescing Find register copy instructions s j = s i If s j and s i do not interfere, combine their webs Pros Similar to copy propagation Reduce the number of instructions Cons May increase the degree of the combined node A colorable graph may become non-colorable 82

Register Targeting (pre-coloring) Some Variables need to be in Special Registers at Specific Points in the Execution first 4 arguments to a function return value Pre-color those webs and bind them to the appropriate register Will eliminate unnecessary copy instructions 83

Pre-Splitting of the Webs Some Ranges have Very Large dead Regions Large region where the variable is unused Break-up the Ranges need to pay a small cost in spilling but the graph will be very easy to color Can find Strategic Locations to Break-up at a call site (need to spill anyway) around a large loop nest (reserve registers for values used in the loop) 84

Inter-Procedural Register Allocation Saving Registers across Procedure boundaries is expensive especially for programs with many small functions Calling convention is too general and inefficient Customize calling convention per function by doing inter-procedural register allocation 85

Chaitin-Briggs Allocator renumber Build SSA, build live ranges, rename build coalesce spill costs simplify select Build the interference graph Fold unneeded copies LR x LR y, and < LR x,lr y > G I combine LR x & LR y Estimate cost for spilling each live range Remove nodes from the graph While stack is non-empty pop n, insert n into G I, & try to color it while N is non-empty if n with n < k then push n onto stack else pick n to spill push n onto stack remove n from G I spill Spill uncolored definitions & uses Briggs algorithm (1989) 86

Summary Register Allocation and Assignment Very Important Transformations and Optimization In General Hard Problem (NP-Complete) Many Approaches Local Methods: Top-Down and Bottom-Up Global Methods: Graph Coloring Webs Interference Graphs Coloring Other Transformations 87