Global Register Allocation

Similar documents
Global Register Allocation - Part 2

Global Register Allocation - 2

Global Register Allocation - Part 3

Compiler Design. Register Allocation. Hwansoo Han

Linear-Scan Register Allocation. CS 352 Lecture 12 11/28/07

Register Allocation. CS 502 Lecture 14 11/25/08

Global Register Allocation via Graph Coloring

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

CSC D70: Compiler Optimization Register Allocation

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Implementing Object-Oriented Languages - 2

Register allocation. Overview

Lecture 6. Register Allocation. I. Introduction. II. Abstraction and the Problem III. Algorithm

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

Register allocation. instruction selection. machine code. register allocation. errors

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Code generation for modern processors

Register Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops

Code generation for modern processors

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

Register Allocation. Preston Briggs Reservoir Labs

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210

CS 406/534 Compiler Construction Instruction Selection and Global Register Allocation

register allocation saves energy register allocation reduces memory accesses.

Linear Scan Register Allocation. Kevin Millikin

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Register Allocation. Lecture 38

Register Allocation via Hierarchical Graph Coloring

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

Register Allocation in Just-in-Time Compilers: 15 Years of Linear Scan

Register Allocation 1

Lecture 15 Register Allocation & Spilling

Compiler Architecture

Topic 12: Register Allocation

Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

Register Allocation & Liveness Analysis

Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time

Lecture Overview Register Allocation

The C2 Register Allocator. Niclas Adlertz

Register Allocation 3/16/11. What a Smart Allocator Needs to Do. Global Register Allocation. Global Register Allocation. Outline.

Code Generation. CS 540 George Mason University

Run-time Environments - 2

Liveness Analysis and Register Allocation

Register Allocation. Introduction Local Register Allocators

Register Allocation. Stanford University CS243 Winter 2006 Wei Li 1

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Register Allocation. Lecture 16

Compilers and Code Optimization EDOARDO FUSELLA

CS2210: Compiler Construction. Compiler Optimization

More Code Generation and Optimization. Pat Morin COMP 3002

Characteristics of RISC processors. Code generation for superscalar RISCprocessors. What are RISC and CISC? Processors with and without pipelining

Code generation for superscalar RISCprocessors. Characteristics of RISC processors. What are RISC and CISC? Processors with and without pipelining

Chapter 12: Query Processing. Chapter 12: Query Processing

Database System Concepts

Global Register Allocation

Low-Level Issues. Register Allocation. Last lecture! Liveness analysis! Register allocation. ! More register allocation. ! Instruction scheduling

Chapter 12: Query Processing

Chapter 12: Query Processing

Abstract Interpretation Continued

Lecture 21 CIS 341: COMPILERS

EECS 583 Class 15 Register Allocation

Lecture 25: Register Allocation

Run-time Environments - 3

Chapter 13: Query Processing

Calvin Lin The University of Texas at Austin

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

CS 406/534 Compiler Construction Putting It All Together

Relational Model, Relational Algebra, and SQL

Lecture Compiler Backend

Improvements to Linear Scan register allocation

Lecture 3 Local Optimizations, Intro to SSA

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Compilation /17a Lecture 10. Register Allocation Noam Rinetzky

Register Allocation. Michael O Boyle. February, 2014

Compiler Optimization Techniques

Control Flow Analysis & Def-Use. Hwansoo Han

Register Allocation Amitabha Sanyal Department of Computer Science & Engineering IIT Bombay

Low-level optimization

Solutions to Problem Set 1

Outline. Graphs. Divide and Conquer.

Global Register Allocation via Graph Coloring The Chaitin-Briggs Algorithm. Comp 412

Compiler Optimization and Code Generation

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

Code generation and optimization

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today

Compiler Optimization

Advanced Database Systems

Computer Architecture

Introduction to Linked List: Review. Source:

Intermediate Representation (IR)

Joint Entity Resolution

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Basic Search Algorithms

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

IR Lowering. Notation. Lowering Methodology. Nested Expressions. Nested Statements CS412/CS413. Introduction to Compilers Tim Teitelbaum

Transcription:

Global Register Allocation Y N Srikant Computer Science and Automation Indian Institute of Science Bangalore 560012 NPTEL Course on Compiler Design

Outline n Issues in Global Register Allocation n The Problem n Register Allocation based in Usage Counts n Linear Scan Register allocation n Chaitin s graph colouring based algorithm Y.N. Srikant 2

Some Issues in Register Allocation n Which values in a program reside in registers? (register allocation) n In which register? (register assignment) q The two together are usually loosely referred to as register allocation n What is the unit at the level of which register allocation is done? q q q q Typical units are basic blocks, functions and regions. RA within basic blocks is called local RA The other two are known as global RA Global RA requires much more time than local RA Y.N. Srikant 3

Some Issues in Register Allocation n Phase ordering between register allocation and instruction scheduling q q Performing RA first restricts movement of code during scheduling not recommended Scheduling instructions first cannot handle spill code introduced during RA n Requires another pass of scheduling n Tradeoff between speed and quality of allocation q In some cases e.g., in Just-In-Time compilation, cannot afford to spend too much time in register allocation. Y.N. Srikant 4

The Problem n Global Register Allocation assumes that allocation is done beyond basic blocks and usually at function level n Decision problem related to register allocation : q Given an intermediate language program represented as a control flow graph and a number k, is there an assignment of registers to program variables such that no conflicting variables are assigned the same register, no extra loads or stores are introduced, and at most k registers are used. n This problem has been shown to be NP-hard (Sethi 1970). n Graph colouring is the most popular heuristic used. n However, there are simpler algorithms as well Y.N. Srikant 5

Conflicting variables n Two variables interfere or conflict if their live ranges intersect q A variable is live at a point p in the flow graph, if there is a use of that variable in the path from p to the end of the flow graph q A live range of a variable is the smallest set of program points at which it is live. q Typically, instruction no. in the basic block along with the basic block no. is the representation for a point. Y.N. Srikant 6

Example Live range of A: B2, B4 B5 Live range of B: B3, B4, B6 If (cond) A not live then A = else B = X: if (cond) B not live then = A else = B ----------------------------- A and B both live B5 B2 B1 If (cond) T F B3 A= B= B4 If (cond) F B6 =A =B Y.N. Srikant 7

Global Register Allocation via Usage Counts (for Single Loops) n Allocate registers for variables used within loops n Requires information about liveness of variables at the entry and exit of each basic block (BB) of a loop n Once a variable is computed into a register, it stays in that register until the end of of the BB (subject to existence of next-uses) n Load/Store instructions cost 2 units (because they occupy two words) Y.N. Srikant 8

Global Register Allocation via Usage Counts (for Single Loops) 1. For every usage of a variable v in a BB, until it is first defined, do: Ø savings(v) = savings(v) + 1 Ø after v is defined, it stays in the register any way, and all further references are to that register 2. For every variable v computed in a BB, if it is live on exit from the BB, Ø count a savings of 2, since it is not necessary to store it at the end of the BB Y.N. Srikant 9

Global Register Allocation via Usage Counts (for Single Loops) n Total savings per variable v are B Loop ( savings( v, B) + 2* liveandcomputed ( v, B)) q liveandcomputed(v,b) in the second term is 1 or 0 n On entry to (exit from) the loop, we load (store) a variable live on entry (exit), and lose 2 units for each q But, these are one time costs and are neglected n Variables, whose savings are the highest will reside in registers Y.N. Srikant 10

Global Register Allocation via Usage Counts (for Single Loops) B2 B1 b = a-f e = d+c a = b*c d = b-a e = b/f acdf bcf f = e * a acde B3 Savings for the variables B1 B2 B3 B4 a: (0+2)+(1+0)+(1+0)+(0+0) = 4 b: (3+0)+(0+0)+(0+0)+(0+2) = 5 c: (1+0)+(1+0)+(0+0)+(1+0) = 3 d: (0+2)+(1+0)+(0+0)+(1+0) = 4 e: (0+2)+(0+2)+(1+0)+(0+0) = 5 f: (1+0)+(1+0)+(0+2)+(0+0) = 4 cdef b = c - d B4 aef If there are 3 registers, they will be allocated to the variables, a, b, bcdf abcdef and e Y.N. Srikant 11

Global Register Allocation via Usage Counts (for Nested Loops) n We first assign registers for inner loops and then consider outer loops. Let L1 nest L2 n For variables assigned registers in L2, but not in L1 q load these variables on entry to L2 and store them on exit from L2 n For variables assigned registers in L1, but not in L2 q store these variables on entry to L2 and load them on exit from L2 n All costs are calculated keeping the above rules Y.N. Srikant 12

Global Register Allocation via Usage Counts (for Nested Loops) L2 Body of L2 L1 n n n case 1: variables x,y,z assigned registers in L2, but not in L1 q q Load x,y,z on entry to L2 Store x,y,z on exit from L2 case 2: variables a,b,c assigned registers in L1, but not in L2 q q Store a,b,c on entry to L2 Load a,b,c on exit from L2 case 3: variables p,q assigned registers in both L1 and L2 q No special action Y.N. Srikant 13

A Fast Register Allocation Scheme n Linear scan register allocation(poletto and Sarkar 1999) uses the notion of a live interval rather than a live range. n Is relevant for applications where compile time is important, such as in dynamic compilation and in just-in-time compilers. n Other register allocation schemes based on graph colouring are slow and are not suitable for JIT and dynamic compilers Y.N. Srikant 14

Linear Scan Register Allocation n Assume that there is some numbering of the instructions in the intermediate form n An interval [i,j] is a live interval for variable v if there is no instruction with number j > j such that v is live at j and no instruction with number i < i such that v is live at i n This is a conservative approximation of live ranges: there may be subranges of [i,j] in which v is not live but these are ignored Y.N. Srikant 15

Live Interval Example v live sequentially numbered instructions v live i : i: j: j :............... } i i does not exist j : live interval for variable v j does not exist Y.N. Srikant 16

Example If (cond) then A= else B= X: if (cond) then =A else = B A NOT LIVE HERE LIVE INTERVAL FOR A If (cond) T F A= B= If (cond) F =A =B Y.N. Srikant 17

Live Intervals n Given an order for pseudo-instructions and live variable information, live intervals can be computed easily with one pass through the intermediate representation. n Interference among live intervals is assumed if they overlap. n Number of overlapping intervals changes only at start and end points of an interval. Y.N. Srikant 18

The Data Structures n Live intervals are stored in the sorted order of increasing start point. n At each point of the program, the algorithm maintains a list (active list) of live intervals that overlap the current point and that have been placed in registers. n active list is kept in the order of increasing end point. Y.N. Srikant 19

Example i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 A B C D Active lists (in order of increasing end pt) Active(A)= {i1} Active(B)={i1,i5} Active(C)={i8,i5} Active(D)= {i7,i4,i11} Sorted order of intervals (according to start point): i1, i5, i8, i2, i9, i6, i3, i10, i7, i4, i11 Three registers enough for computation without spills Y.N. Srikant 20

The Algorithm (1) { active := []; for each live interval i, in order of increasing start point do { ExpireOldIntervals (i); if length(active) == R then SpillAtInterval(i); else { register[i] := a register removed from the pool of free registers; add i to active, sorted by increasing end point } } } Y.N. Srikant 21

The Algorithm (2) ExpireOldIntervals (i) { for each interval j in active, in order of increasing end point do { if endpoint[j] > startpoint[i] then continue else { remove j from active; add register[j] to pool of free registers; } } } Y.N. Srikant 22

The Algorithm (3) SpillAtInterval (i) { spill := last interval in active; if endpoint [spill] > endpoint [i] then { register [i] := register [spill]; location [spill] := new stack location; remove spill from active; add i to active, sorted by increasing end point; } else location [i] := new stack location; } Y.N. Srikant 23

i1 i2 i3 i4 i5 i6 i7 Example 1 A B C i8 i9 i10 i11 D Active lists (in order of increasing end pt) Active(A)= {i1} Active(B)={i1,i5} Active(C)={i8,i5} Active(D)= {i7,i4,i11} Sorted order of intervals (according to start point): i1, i5, i8, i2, i9, i6, i3, i10, i7, i4, i11 Three registers enough for computation without spills Y.N. Srikant 24

Example 2 A B 2 registers available C D E 1 2 3 4 5 1,2 : give A,B register 3: Spill C since endpoint[c] > endpoint [B] 4: A expires, give D register 5: B expires, E gets register Y.N. Srikant 25

Example 3 A B 2 registers available C D E 1 2 3 4 5 1,2 : give A,B register 3: Spill B since endpoint[b] > endpoint [C] give register to C 4: A expires, give D register 5: C expires, E gets register Y.N. Srikant 26

Complexity of the Linear Scan Algorithm n n If V is the number of live intervals and R the number of available physical registers, then if a balanced binary tree is used for storing the active intervals, complexity is O(V log R). q q Active list can be at most R long Insertion and deletion are the important operations Empirical results reported in literature indicate that linear scan is significantly faster than graph colouring algorithms and code emitted is at most 10% slower than that generated by an aggressive graph colouring algorithm. Y.N. Srikant 27

Chaitin s Formulation of the Register Allocation Problem n A graph colouring formulation on the interference graph n Nodes in the graph represent either live ranges of variables or entities called webs n An edge connects two live ranges that interfere or conflict with one another n Usually both adjacency matrix and adjacency lists are used to represent the graph. Y.N. Srikant 28

Chaitin s Formulation of the Register Allocation Problem n Assign colours to the nodes such that two nodes connected by an edge are not assigned the same colour q The number of colours available is the number of registers available on the machine q A k-colouring of the interference graph is mapped onto an allocation with k registers Y.N. Srikant 29

Example n Two colourable Three colourable Y.N. Srikant 30

Idea behind Chaitin s Algorithm n Choose an arbitrary node of degree less than k and put it on the stack n Remove that vertex and all its edges from the graph q This may decrease the degree of some other nodes and cause some more nodes to have degree less than k n At some point, if all vertices have degree greater than or equal to k, some node has to be spilled n If no vertex needs to be spilled, successively pop vertices off stack and colour them in a colour not used by neighbours (reuse colours as far as possible) Y.N. Srikant 31

Simple example Given Graph 2 1 4 5 3 3 REGISTERS STACK Y.N. Srikant 32

Simple Example Delete Node 1 2 1 4 5 3 1 STACK 3 REGISTERS Y.N. Srikant 33

Simple Example Delete Node 2 2 1 4 5 2 1 STACK 3 3 REGISTERS Y.N. Srikant 34

Simple Example Delete Node 4 2 1 4 5 4 2 1 STACK 3 3 REGISTERS Y.N. Srikant 35

Simple Example Delete Nodes 3 2 1 4 5 3 4 2 1 STACK 3 3 REGISTERS Y.N. Srikant 36

Simple Example Delete Nodes 5 2 5 3 4 2 1 1 3 4 5 STACK 3 REGISTERS Y.N. Srikant 37

Simple Example Colour Node 5 COLOURS 3 4 2 1 STACK 5 3 REGISTERS Y.N. Srikant 38

Simple Example Colour Node 3 COLOURS 5 4 2 1 3 STACK 3 REGISTERS Y.N. Srikant 39

Simple Example Colour Node 4 COLOURS 4 5 2 1 3 STACK 3 REGISTERS Y.N. Srikant 40

Simple Example Colour Node 2 COLOURS 2 4 5 3 1 STACK 3 REGISTERS Y.N. Srikant 41

Simple Example Colour Node 1 COLOURS 2 1 4 5 3 3 REGISTERS STACK Y.N. Srikant 42

Steps in Chaitin s Algorithm n Identify units for allocation q Renames variables/symbolic registers in the IR such that each live range has a unique name (number) n Build the interference graph n Coalesce by removing unnecessary move or copy instructions n Colour the graph, thereby selecting registers n Compute spill costs, simplify and add spill code till graph is colourable Y.N. Srikant 43

The Chaitin Framework SPILL CODE RENUMBER BUILD COALESCE SPILL COST SIMPLIFY SELECT Y.N. Srikant 44

Example of Renaming a = a = s1 = s1 = = a a = Renaming = s1 s2 = = a = a = s2 = s2 Y.N. Srikant 45

An Example Original code Code with symbolic registers x= 2 y = 4 w = x+ y z = x+1 u = x*y x= z*2 1. s1=2; (lv of s1: 1-5) 2. s2=4; (lv of s2: 2-5) 3. s3=s1+s2; (lv of s3: 3-4) 4. s4=s1+1; (lv of s4: 4-6) 5. s5=s1*s2; (lv of s5: 5-6) 6. s6=s4*2; (lv of s6: 6-...) Y.N. Srikant 46

s5 s3 s1 r3 s6 s2 r1 s4 INTERFERENCE GRAPH HERE ASSUME VARIABLE Z (s4) CANNOT OCCUPY r1 r2 Y.N. Srikant 47

Example(continued) Final register allocated code r1 = 2 r2= 4 r3= r1+r2 r3= r1+1 r1= r1 *r2 r2= r3+r2 Three registers are sufficient for no spills Y.N. Srikant 48

Renumbering - Webs n The definition points and the use points for each variable v are assumed to be known n Each definition with its set of uses for v is a duchain n A web is a maximal union of du-chains such that, for each definition d and use u, either u is in the du-chain of d, or there exists a sequence d =d 1,u 1,d 2,u 2,, d n,u n such that for each i, u i is in the du-chains of both d i and d i+1. Y.N. Srikant 49

Renumbering - Webs n Each web is given a unique symbolic register. n Webs arise when variables are redefined several times in a program n Webs have intersecting du-chains, intersecting at the points of join in the control flow graph Y.N. Srikant 50

Example of Webs Def x Def y B2 Def x Use y Def y B3 B1 w3 w1 Use x Use y B4 Use x Def x Use x B5 B6 w2 W1: def x in B2, def x in B3, use x in B4, Use x in B5 W2: def x in B5, use x in B6 W3: def y in B2, use y in B4 W4: def y in B1, use y in B3 w4 Y.N. Srikant 51

Build Interference Graph n Create a node for each web and for each physical register in the interference graph n If two distinct webs interfere, that is, a variable associated with one web is live at a definition point of another add an edge between the two webs n If a particular variable cannot reside in a register, add an edge between all webs associated with that variable and the register Y.N. Srikant 52

Copy Subsumption or Coalescing n Consider a copy instruction: b := e in the program n If the live ranges of b and e do not overlap, then b and e can be given the same register (colour) q Implied by lack of any edges between b and e in the interference graph n The copy instruction can then be removed from the final program n Coalesce by merging b and e into one node that contains the edges of both nodes Y.N. Srikant 53

Copy Subsumption or Coalescing l.r of old b l.r of old b l.r of e b = e l.r of e b = e l.r of new b l.r of new b copy subsumption is not possible; lr(e) and lr(new b) interfere copy subsumption is possible; lr(e) and lr(new b) do not interfere Y.N. Srikant 54

Example of coalescing c Copy inst: b:=e c d d b e f be f a BEFORE AFTER a Y.N. Srikant 55

Copy Subsumption Repeatedly l.r of x l.r of e l.r of b b = e a = b copy subsumption happens twice - once between b and e, and second time between a and b. e, b, and a are all given the same register. l.r of a Y.N. Srikant 56

Coalescing n Coalesce all possible copy instructions q Rebuild the graph n may offer further opportunities for coalescing n build-coalesce phase is repeated till no further coalescing is possible. n Coalescing reduces the size of the graph and possibly reduces spilling Y.N. Srikant 57

Simple fact n Suppose the no. of registers available is R. n If a graph G contains a node n with fewer than R neighbors then removing n and its edges from G will not affect its R-colourability n If G = G-{n} can be coloured with R colours, then so can G. n After colouring G, just assign to n, a colour different from its R-1 neighbours. Y.N. Srikant 58

Simplification n If a node n in the interference graph has degree less than R, remove n and all its edges from the graph and place n on a colouring stack. n When no more such nodes are removable then we need to spill a node. n Spilling a variable x implies q loading x into a register at every use of x q storing x from register into memory at every definition of x Y.N. Srikant 59

Spilling Cost n The node to be spilled is decided on the basis of a spill cost for the live range represented by the node. n Chaitin s estimate of spill cost of a live range v q cost(v) = q q q all load or store operations in a live range v c *10 d where c is the cost of the op and d, the loop nesting depth. 10 in the eqn above approximates the no. of iterations of any loop The node to be spilled is the one with MIN(cost(v)/deg(v)) Y.N. Srikant 60

Spilling Heuristics n Multiple heuristic functions are available for making spill decisions (cost(v) as before) 1. h 0 (v) = cost(v)/degree(v) : Chaitin s heuristic 2. h 1 (v) = cost(v)/[degree(v)] 2 3. h 2 (v) = cost(v)/[area(v)*degree(v)] 4. h 3 (v) = cost(v)/[area(v)*(degree(v)) 2 ] where area(v) = all instructions I in the live range v width v I (,)*5 depth ( v, I ) n width(v,i) is the number of live ranges overlapping with instruction I and depth(v,i) is the depth of loop nesting of I in v Y.N. Srikant 61

Spilling Heuristics n n n n area(v) represents the global contribution by v to register pressure, a measure of the need for registers at a point Spilling a live range with high area releases register pressure; i.e., releases a register when it is most needed Choose v with MIN(h i (v)), as the candidate to spill, if h i is the heuristic chosen It is possible to use different heuristics at different times Y.N. Srikant 62

Example Here R = 3 and the graph is 3-colourable No spilling is necessary Y.N. Srikant 63

A 3-colourable graph which is not 3-coloured by colouring heuristic Example 1 2 3 4 5 Y.N. Srikant 64

Spilling a Node n To spill a node we remove it from the graph and represent the effect of spilling as follows (It cannot just be removed from the graph). q q Reload the spilled object at each use and store it in memory at each definition point This creates new webs with small live ranges but which will need registers. n After all spill decisions are made, insert spill code, rebuild the interference graph and then repeat the attempt to colour. n When simplification yields an empty graph then select colours, that is, registers Y.N. Srikant 65

Effect of Spilling Use x Use y Def x Def y B4 B2 x is spilled in web W1 Use x Def x Use x B5 B6 Def x Use y Def y B3 B1 w3 w2 W1: def x in B2, def x in B3, use x in B4, Use x in B5 W2: def x in B5, use x in B6 W3: def y in B2, use y in B4 W4: def y in B1, use y in B3 w1 w4 Y.N. Srikant 66

Effect of Spilling W4 W1 Def x store x Def y B2 W5 Def y Def x store x Use y B1 W2 B3 W6 load x Use x Use y B4 W7 load x Use x Def x B5 Interference Graph W3 w4 w5 w3 Use x B6 w6 w1 w2 w7 Y.N. Srikant 67

Colouring the Graph(selection) Repeat v= pop(stack). Colours_used(v)= colours used by neighbours of v Colours_free(v)=all colours - Colours_used(v). Colour (v) = any colour in Colours_free(v). Until stack is empty n Convert the colour assigned to a symbolic register to the corresponding real register s name in the code. Y.N. Srikant 68

A Complete Example 1. t1 = 202 2. i = 1 3. L1: t2 = i>100 4. if t2 goto L2 5. t1 = t1-2 6. t3 = addr(a) 7. t4 = t3-4 8. t5 = 4*i 9. t6 = t4 + t5 10. *t6 = t1 11. i = i+1 12. goto L1 13. L2: variable live range t1 1-10 i 2-11 t2 3-4 t3 6-7 t4 7-9 t5 8-9 t6 9-10 Y.N. Srikant 69

A Complete Example t6 t5 variable live range t1 1-10 t1 i i 2-11 t2 3-4 t3 6-7 t2 t3 t4 7-9 t5 8-9 t4 t6 9-10 Y.N. Srikant 70

A Complete Example t6 t5 Assume 3 registers. Nodes t6,t2, and t3 are first pushed onto a stack during reduction. t1 i t5 t1 i t2 t3 t4 t4 This graph cannot be reduced further. Spilling is necessary. Y.N. Srikant 71

A Complete Example Node V Cost(v) deg(v) t1 h 0 (v) t1 31 3 10 i 41 3 14 t4 20 3 7 t5 20 3 7 t1: 1+(1+1+1)*10 = 31 i : 1+(1+1+1+1)*10 = 41 t4: (1+1)*10 = 20 t5: (1+1)*10 = 20 t5 will be spilled. Then the graph can be coloured. t5 t4 i 1. t1 = 202 2. i = 1 3. L1: t2 = i>100 4. if t2 goto L2 5. t1 = t1-2 6. t3 = addr(a) 7. t4 = t3-4 8. t5 = 4*i 9. t6 = t4 + t5 10. *t6 = t1 11. i = i+1 12. goto L1 13. L2: Y.N. Srikant 72

A Complete Example t1 i t1 t4 t3 t2 t6 t4 i R3 t6 R1 t1 t2 R3 R3 t4 spilled t5 i t3 R2 R3 1. R1 = 202 2. R2 = 1 3. L1: R3 = i>100 4. if R3 goto L2 5. R1 = R1-2 6. R3 = addr(a) 7. R3 = R3-4 8. t5 = 4*R2 9. R3 = R3 + t5 10. *R3 = R1 11. R2 = R2+1 12. goto L1 13. L2: t5: spilled node, will be provided with a temporary register during code generation Y.N. Srikant 73

Drawbacks of the Algorithm n Constructing and modifying interference graphs is very costly as interference graphs are typically huge. n For example, the combined interference graphs of procedures and functions of gcc in mid-90 s have approximately 4.6 million edges. Y.N. Srikant 74

Some modifications n Careful coalescing: Do not coalesce if coalescing increases the degree of a node to more than the number of registers n Optimistic colouring: When a node needs to be spilled, push it onto the colouring stack instead of spilling it right away q spill it only when it is popped and if there is no colour available for it q this could result in colouring graphs that need spills using Chaitin s technique. Y.N. Srikant 75

A 3-colourable graph which is not 3-coloured by colouring heuristic, but coloured by optimistic colouring Example 1 2 4 3 5 Say, 1 is chosen for spilling. Push it onto the stack, and remove it from the graph. The remaining graph (2,3,4,5) is 3-colourable. Now, when 1 is popped from the colouring stack, there is a colour with which 1 can be coloured. It need not be spilled. Y.N. Srikant 76