Register Allocation via Hierarchical Graph Coloring



by Qunyan Wu

A THESIS

Submitted in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE IN COMPUTER SCIENCE

MICHIGAN TECHNOLOGICAL UNIVERSITY
1996

This thesis, Register Allocation via Hierarchical Graph Coloring, is hereby approved in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE IN COMPUTER SCIENCE.

DEPARTMENT of Computer Science

Thesis Advisor: Dr. Steve Carr
Thesis Advisor: Dr. Philip Sweany
Chair of Department: Dr. Linda Ott
Date:

Abstract

Register allocation is a vital stage in compiler optimization. It greatly affects the effectiveness of other compiler optimization techniques. Graph coloring is the commonly used mechanism for register allocation. Since graph coloring is an NP-complete problem, heuristics are needed to find a practical solution, and several heuristics perform well in practice. Commonly used heuristics, however, do not include program structure information in register allocation. Without this information, poor spill decisions can be made. Callahan and Koblenz developed an algorithm that takes program structure into account by performing register allocation hierarchically. Unfortunately, Callahan and Koblenz have not presented any experimental evidence to establish the effectiveness of their algorithm. This thesis examines the effectiveness of Callahan and Koblenz's algorithm. It shows that when register pressure is high, Callahan and Koblenz's method generates worse code than Briggs' method, the generally accepted method of graph coloring register allocation. The performance degradation of Callahan and Koblenz's method is due to reserving registers in the graph coloring process. Also, because register allocation is performed hierarchically, compensation move and jump instructions are required to preserve program semantics. Furthermore, the experimental results suggest that when the spill cost does not account for the degree of conflict, more values can be spilled than necessary.

Acknowledgments

I would like to thank my advisors, Dr. Steve Carr and Dr. Phil Sweany, for their encouragement and valuable advice during my two years of study at Michigan Technological University. I would like to thank Dr. David Poplawski and Dr. Barbara Bertram for taking the time to be my committee members. I am also grateful to all the people in the Computer Science Department for providing a pleasant and friendly environment. And finally, I would like to thank my parents, my brothers and John Mangus, who have always been there for me.

Table of Contents

1 Introduction
2 Register Allocation
   Graph Coloring Register Allocation
   Chaitin's method
   Interference Graph
   Coloring
   Problems with Chaitin's method
   Briggs' Method
   Chow's Method
   The Problem
3 Callahan and Koblenz's Algorithm
   Tile Tree
   The First Phase
   Tile Allocation Summary
   Tile Interference Graph
   Coloring and Preferencing
   The Second Pass
   Coloring Spilled Nodes
   Adding Spill Code
   Spill Cost and Spill Decisions
4 Algorithm Evaluation
   Benchmarks
   Register Allocation and Instruction Scheduling
   Experiment Data
   Evaluation of the Tile Interference Graph
   Evaluation of Reserving Registers
   Evaluation of Different Levels of Tiles
   Evaluation of the Effectiveness of Preference
   Briggs vs. Callahan and Koblenz with No Spilling
5 Conclusions

List of Figures

Figure 2-1 An Example Of Interference Graph
Figure 2-2 Resulting code of spilling different nodes
Figure 2-3 Chaitin's Register Allocator
Figure 2-4 Example Code Segment for Chaitin's Interference Graph
Figure 2-5 Chaitin's Interference Graph
Figure 2-6 2-colorable Interference Graph
Figure 2-7 Code Segment
Figure 2-8 Resulting Code After Spilling
Figure 2-9 Briggs' Register Allocator
Figure 2-10 The Result of Coloring Using Briggs' Method
Figure 2-11 Difference Between Chow's and Chaitin's Interference Graph
Figure 2-12 Example of the Effectiveness of Live Range Splitting
Figure 2-13 Live Range Splitting
Figure 2-14 Control Flow Graph
Figure 2-15 Adding Spill Code for Loop Structure
Figure 3-1 Control Flow Graph
Figure 3-2 Tiles and Tile Tree
Figure 3-3 Outline of the first pass of tile register allocation
Figure 3-4 Outline of the second pass of tile register allocation
Figure 3-5 An example of move from proceed move to
Figure 3-6 An example for breaking move loops
Figure 3-7 Formulas for calculating spill cost
Figure 4-1 The effectiveness of instruction scheduling
Figure 4-2 Normalized execution time on preg11
Figure 4-3 Normalized execution time on preg22
Figure 4-4 Different tile levels on preg11 reserving 2 registers
Figure 4-5 Different levels of tiles on preg22 reserving 2 registers
Figure 4-6 Different levels of tiles on preg11 reserving 4 registers
Figure 4-7 Different levels of tiles on preg22 reserving 4 registers
Figure 4-8 With preference and without preference on preg11
Figure 4-9 With preference and without preference on preg22
Figure 4-10 Example for coloring with preference
Figure 4-11 Results with no spilling on preg11
Figure 4-12 Results with no spilling on preg22

List of Tables

Table 4-1 Least number of registers required without spilling
Table 4-2 Execution times on preg11
Table 4-3 Execution time on preg22
Table 4-4 Different tile levels when reserving 2 registers
Table 4-5 Different tile levels when reserving 4 registers
Table 4-6 Reserve 4 registers
Table 4-7 Briggs vs. Callahan and Koblenz with no spilling

1 Introduction

In recent years, microprocessor speed has improved dramatically, but memory speed has not kept pace. Memory access latency has become a major bottleneck to computing speed. Since register access is much faster than memory access, it is important to keep values in registers. However, as a program gets larger, the number of values used in a function can exceed the number of registers available on the target machine. Therefore, it is often impossible to keep all values in registers.

Register allocation is the mechanism in a compiler that determines how registers are used. Two issues are significant in register allocation. First, given the number of registers, is it possible to put all scalar values used in a function into registers? Second, when it is impossible to assign all values to registers, which values should be in memory? Allocating values to memory is called spilling. On a load/store architecture, values must be in registers for the CPU to operate on them. Therefore, once a value is spilled, load and store instructions are required to move it between memory and a register. A spilled value is loaded from memory into a register before it is used and stored back to memory after it is defined. These load and store instructions are called spill code.

Spilling different values in a function can generate code with different numbers of load/store instructions, because some values are used more frequently than others. There are also different places in a function where spill code can be inserted, and different sections of a function can have different execution frequencies. Inserting a load/store instruction into a heavily executed portion of a function results in more memory operations at run time than inserting it into a less frequently executed portion.
Register allocation determines which values to spill when spilling is needed and where to add spill code. The goal of register allocation is to minimize the number of memory operations executed at run time. Register allocation is commonly treated as a graph coloring problem. An interference graph is built for a function. Each node represents a value in the function, and an edge between two nodes represents a conflict. Two nodes conflict if they cannot share the same register at the same time. Allocating M values to N registers is solved by coloring the corresponding interference graph with N colors, where each color represents a register. The graph is colored so that each node has a color different from each of its neighbors. When there is no suitable color for a node, the node is spilled to memory. The register allocation process continues until every value is either allocated to a register or assigned to memory.

Graph coloring has been shown to be an NP-complete problem [6]. Therefore, finding an optimal solution is impractical in general. Several heuristics are available that perform well in practice. However, commonly used heuristics share a problem: they have no way to encode program structure in the interference graph [2][3][4]. Program structure refers to the sections of a function that have different execution frequencies, such as loops and conditional structures. Without information about program structure, poor spill decisions can be made. For example, without knowledge of a loop boundary, spill code can be inserted inside a loop when it is not necessary.

Callahan and Koblenz [1] developed an algorithm that takes program structure into account by performing register allocation hierarchically. Unfortunately, Callahan and Koblenz have not presented any experimental evidence to establish the effectiveness of their algorithm. This research examines the effectiveness of Callahan and Koblenz's algorithm. It shows that when register pressure is high, Callahan and Koblenz's method generates worse code than Briggs' method, the generally accepted method of graph coloring register allocation.

A brief description of graph coloring register allocation and some commonly used heuristics can be found in Chapter 2. Callahan and Koblenz's algorithm is detailed in Chapter 3. Experimental data and an analysis of the factors that affect Callahan and Koblenz's method are presented in Chapter 4. Conclusions can be found in Chapter 5.

2 Register Allocation

Register allocation is a vital stage in compiler optimization because it determines the effectiveness of other compiler optimization techniques. Most compiler optimizations create temporaries that are candidates for registers. The benefit of these optimizations depends on the cost of accessing those temporaries. For example, common subexpression elimination stores the result of a repeatedly computed expression in a temporary instead of recomputing it every time. However, if the temporary is kept in memory, recomputing the expression can be faster than reloading the temporary from memory [8].

Graph coloring, described in Section 2.1, is the commonly used mechanism for register allocation. Since graph coloring is an NP-complete problem, heuristics are needed to find a practical solution. Several commonly used heuristics are detailed. Section 2.2 describes Chaitin's [2] method, Section 2.3 presents Briggs' [4] method, and Section 2.4 presents Chow's [3] method. Finally, the problem common to all these methods is presented in Section 2.5.

2.1 Graph Coloring Register Allocation

Register allocation is commonly treated as a graph coloring problem. An interference graph for a function is a graph in which a node represents a value in the function and an edge represents a conflict between two nodes. Two nodes conflict with each other if the values they represent are live at the same time. A value is live at a point p if it is defined before p and used at some point after p, without being redefined in between. The live range of a value is the region from its definition to its last use. As an example of an interference graph, consider the code fragment of Figure 2-1(a). In this code, the live ranges of variables a, b and c overlap with each other. This means that these three variables cannot share the same register. Therefore, the interference graph has an edge between each pair of nodes. Figure 2-1(c) shows the corresponding interference graph.

   a = 3
   b = 4
   c = 1
   c = b+c
   a = a*c
   output c
   output a

[Figure 2-1 An Example Of Interference Graph: (a) code segment, shown above; (b) live ranges; (c) interference graph in which a, b and c are pairwise connected]

The problem of assigning these values to N registers is equivalent to coloring the interference graph with N colors so that each node has a color different from all its neighbors. A graph is N-colorable if it can be colored using N colors. A node with fewer than N neighbors is called a colorable node. A colorable node is guaranteed to receive a color in the coloring process, because its neighbors can use at most N-1 colors. One heuristic method for graph coloring starts by removing colorable nodes from the graph [2]. Because a colorable node can always be given a color later, the node and its adjacent edges are deleted from the graph. Deleting a colorable node simplifies the interference graph and can make other nodes in the graph colorable. When no node in the graph has fewer than N neighbors, the coloring process is blocked, and spilling is required in order to continue the coloring. A spilled value resides in memory instead of in a register. Spill code is required to move the spilled value between memory and a register.

When all the nodes in the interference graph are colorable, register allocation is trivial. However, when the interference graph is not N-colorable, spill decisions must be made. Choosing different nodes to spill can result in code with different numbers of memory operations. The fewer the memory operations, the faster the code runs. The graph in Figure 2-1(c) is not 2-colorable. Thus, if only two registers are available, at least one of the variables must be spilled to memory, and spill code is inserted to move the value to and from memory. Figure 2-2(a) shows the resulting code for spilling a, and Figure 2-2(b) shows the resulting code for spilling b.
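The interference graph of Figure 2-1(c) and the simplification heuristic can be sketched together. The sketch makes several assumptions. Straight-line code is encoded as (defined variable, used variables) pairs, and every used value is also defined in the code. Liveness is computed in one backward pass. Every two values live after the same statement are connected, and then nodes with fewer than N neighbors are removed repeatedly. The encoding and the names `build_interference` and `simplify` are hypothetical. Colors are assigned only afterwards, when the removed nodes are added back in reverse order.

```python
from itertools import combinations

# Figure 2-1(a): one statement = (defined variable or None, used variables).
FIG_2_1 = [
    ("a", set()),        # a = 3
    ("b", set()),        # b = 4
    ("c", set()),        # c = 1
    ("c", {"b", "c"}),   # c = b+c
    ("a", {"a", "c"}),   # a = a*c
    (None, {"c"}),       # output c
    (None, {"a"}),       # output a
]

def build_interference(code):
    """Adjacency sets linking every two values live at the same point."""
    graph = {d: set() for d, _ in code if d is not None}
    live = set()                      # values live after the current statement
    for defined, used in reversed(code):
        for x, y in combinations(live, 2):
            graph[x].add(y)
            graph[y].add(x)
        live.discard(defined)         # backward liveness: kill the definition,
        live |= used                  # then add the uses
    return graph

def simplify(graph, n):
    """Remove nodes with fewer than n neighbors until none qualifies.

    Returns (removal order, nodes left when simplification is blocked).
    """
    g = {v: set(nbrs) for v, nbrs in graph.items()}   # work on a copy
    order = []
    while True:
        colorable = [v for v in g if len(g[v]) < n]
        if not colorable:
            return order, set(g)
        v = colorable[0]
        order.append(v)
        for u in g.pop(v):            # delete v together with its edges
            g[u].discard(v)

graph = build_interference(FIG_2_1)
print(simplify(graph, 3))   # → (['a', 'b', 'c'], set())
print(simplify(graph, 2))   # blocked: a, b and c all remain
```

With three colors the graph simplifies completely. With two colors it is blocked immediately, which is why one of a, b or c must be spilled.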

(a) code after spilling a:
   a = 3
   store a
   b = 4
   c = 1
   c = b+c
   load a
   a = a*c
   store a
   output c
   load a
   output a

(b) code after spilling b:
   a = 3
   b = 4
   store b
   c = 1
   load b
   c = b+c
   a = a*c
   output c
   output a

Figure 2-2 Resulting code of spilling different nodes

As shown in Figure 2-2, the resulting code depends on the spill decision. When a is spilled, two load and two store instructions are added, for a total of four memory operations. When b is spilled, one load and one store instruction are added, for a total of two memory operations. Therefore, spilling b generates better code than spilling a. However, since the graph coloring problem is NP-complete, it is impractical to search for an optimal solution. Heuristics are needed to find practical solutions to the register allocation problem. The commonly used heuristics are Chaitin's method, Briggs' method and Chow's method.

2.2 Chaitin's method

In Chaitin's method, the spill decision is based on the spill cost of each node. Spill cost reflects the use pattern of each node. A reference to a value is either a definition or a use of the value. The higher the spill cost, the more references to the node exist in the function. The more references to a node, the more beneficial it is to keep the node in a register. Chaitin's method is outlined in Figure 2-3.
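The memory-operation counts given for Figure 2-2 follow from the spill-everywhere rule. One load is placed before each instruction that uses the spilled value, and one store after each instruction that defines it. The sketch below counts these static instructions for each candidate. It uses a hypothetical (defined variable, used variables) encoding of Figure 2-1(a). It is an illustration only: it does not weight references by nesting level the way Chaitin's spill cost does.

```python
# Figure 2-1(a) as (defined variable or None, set of used variables).
FIG_2_1 = [
    ("a", set()),        # a = 3
    ("b", set()),        # b = 4
    ("c", set()),        # c = 1
    ("c", {"b", "c"}),   # c = b+c
    ("a", {"a", "c"}),   # a = a*c
    (None, {"c"}),       # output c
    (None, {"a"}),       # output a
]

def spill_everywhere_ops(code, var):
    """Return (loads, stores) added if var is spilled everywhere."""
    loads = sum(1 for _, used in code if var in used)         # one load per using instruction
    stores = sum(1 for defined, _ in code if defined == var)  # one store per defining instruction
    return loads, stores

for var in "abc":
    loads, stores = spill_everywhere_ops(FIG_2_1, var)
    print(var, loads, stores, loads + stores)
# a 2 2 4
# b 1 1 2
# c 3 2 5
```

Spilling b adds the fewest memory operations, in agreement with Figure 2-2. The count for c is computed here only for comparison; the text does not discuss it.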

[Figure 2-3 Chaitin's Register Allocator: build the interference graph and simplify it; if spilling occurs, add spill code and rebuild the graph; if no spilling occurs, color the interference graph]

Chaitin's method performs register allocation iteratively. In each iteration, the interference graph is built. The graph is then simplified, either by removing a colorable node or by spilling. Spill code is inserted for any node that is spilled, and simplification stops once the graph is empty. If no node is spilled during an iteration, register allocation is done. Otherwise, the interference graph is rebuilt and recolored to check whether the spilling was sufficient.

Interference Graph

In Chaitin's interference graph, conflicts are calculated differently from the way described in Section 2.1. In Section 2.1, two nodes conflict with each other if they are live at the same time. In Chaitin's method, however, two nodes conflict with each other if one of them is live and available at the definition point of the other. The chromatic number of a graph is the minimum number of colors required to color it successfully. For some programs, Chaitin's method yields an interference graph with a smaller chromatic number than the method described in Section 2.1. For example, consider the code segment in Figure 2-4.

   label: a = ...
          b = ...
          if (a > b) {
              m = ...
              sum = a + m
          } else {
              n = ...
              sum = b + n
          }
          output sum

Figure 2-4 Example Code Segment for Chaitin's Interference Graph

Variables a, b, m, n and sum are live at the statement label. However, m is no longer live at the definition point of n, because only one branch of a conditional structure is taken during execution. Therefore, in Chaitin's interference graph, m does not conflict with n. Furthermore, variables a, b, m and n do not conflict with variable sum, since they are no longer live after the definition of sum. Therefore, sum can share a register with any of them. Figure 2-5 shows the interference graph that Chaitin's method produces.

[Figure 2-5 Chaitin's Interference Graph, over the nodes a, b, m, n and sum]

After the interference graph is built, nodes that are the source and target of a copy operation are coalesced if they do not conflict with each other. Coalescing combines the nodes into a single node. This node interferes with every node that any of the coalesced nodes conflicted with before coalescing. Once nodes are coalesced, they are assigned to the same register, and the copy operation becomes unnecessary. The disadvantage is that the chromatic number of the interference graph can be higher after coalescing. Spilling may then be required even though the interference graph before coalescing was N-colorable. Therefore, coalescing is only done if it does not give a colorable node a degree of N or more.

Coloring

Chaitin's allocator tries to color the interference graph with N colors, using the method described in Section 2.1, where N is the number of registers available on the machine. When the graph is not colorable, spilling is required: spill code is added to the original program and the interference graph is rebuilt. Register allocation is repeated until the resulting code has an N-colorable interference graph.

Chaitin's allocator has two phases: graph simplification and graph coloring. It simplifies the interference graph by repeating the following two steps until the graph is empty:
1. Remove nodes with fewer than N neighbors from the graph along with their edges, and push these nodes onto a stack for coloring.
2. If no node with fewer than N neighbors exists in the graph, pick the node with the lowest cost/degree ratio to spill. Remove that node and its adjacent edges from the graph.

Colorable nodes are removed from the graph in random order. When the simplification process is blocked, a node is picked to spill and spill code is added to the program. Spill code consists of load/store instructions that move a value between memory and a register. Therefore, even a spilled value needs a register when it is loaded from memory. Since spilling introduces new register uses, the interference graph must be rebuilt and colored again to check whether more spilling is needed. This process is repeated until no more values need to be spilled.
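The two simplification steps above can be sketched as follows. The sketch assumes the interference graph is a dict of adjacency sets and that spill costs are already known. It stops at the spill decision and leaves out inserting spill code and rebuilding the graph. The example graph is a four-node cycle, assumed to have the shape of Figure 2-6: every node has two neighbors, yet two colors suffice. With equal costs, the tie is broken by dict order, so node 1 is spilled. The function names are hypothetical.

```python
def chaitin_simplify(graph, n, cost):
    """Return (stack of removed nodes, spilled nodes)."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}   # work on a copy
    stack, spilled = [], []
    while g:
        colorable = [v for v in g if len(g[v]) < n]
        if colorable:
            v = colorable[0]                # step 1: push a colorable node
            stack.append(v)
        else:
            # Step 2: blocked, so spill the node with the lowest
            # cost/degree ratio (every degree is >= n >= 1 here).
            v = min(g, key=lambda u: cost[u] / len(g[u]))
            spilled.append(v)
        for u in g.pop(v):                  # delete v and its edges
            g[u].discard(v)
    return stack, spilled

def select(graph, n, stack):
    """Add nodes back in reverse order, giving each the lowest free color."""
    colors = {}
    for v in reversed(stack):
        taken = {colors[u] for u in graph[v] if u in colors}
        colors[v] = min(c for c in range(n) if c not in taken)
    return colors

# Four-node cycle, assumed to have the shape of Figure 2-6.
cycle = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
stack, spilled = chaitin_simplify(cycle, 2, {v: 1.0 for v in cycle})
print(stack, spilled)             # → [2, 3, 4] [1]
print(select(cycle, 2, stack))    # → {4: 0, 3: 1, 2: 1}
```

Every node on the stack had fewer than n neighbors among the nodes still present when it was pushed. As a result, `select` always finds a free color for it.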
When the simplification process is blocked, a heuristic is needed to decide which node to spill. In Chaitin's method, each node is associated with a spill cost. The spill cost equals the number of definitions and uses of the node, weighted by loop nesting level. The degree of a node is the number of edges incident to it in the interference graph. When the coloring process reaches a point where spilling is necessary, Chaitin's method spills the node with the lowest ratio of spill cost to current degree. Such nodes tend to be referenced infrequently, so few load/store instructions are needed after they are spilled. They also tend to have a high degree, and removing high-degree nodes makes the graph more likely to be N-colorable without further spilling.

After the graph is simplified without further spilling, the coloring process starts. Nodes are added back to the interference graph in the reverse order of deletion, and each node is assigned a color different from all its neighbors.

Problems with Chaitin's method

The spill decision of Chaitin's method is too pessimistic. In Chaitin's method, a node can be spilled when it has N colored neighbors, even if two of those neighbors have the same color. For example, consider the graph in Figure 2-6. The graph can easily be colored with two colors. However, Chaitin's method cannot color it with two colors. No node in the graph has a degree less than two, so a node is spilled in order to continue the coloring process.

[Figure 2-6 A 2-colorable Interference Graph]

Another problem with Chaitin's method is spilling everywhere, which can cause more memory operations than necessary. With spilling everywhere, once a value is spilled, it is loaded from memory into a register before each use and stored back to memory after each definition. However, in some portions of a program the register pressure can be low enough for the spilled value to be kept in a register there. Doing so lowers the number of load/store instructions in the resulting code.

For example, consider the section of code shown in Figure 2-7:

   stmt 1: a = 3
   stmt 2: b = 4
   stmt 3: c = 1
   stmt 4: c = b+c
   stmt 5: b = b*c
   stmt 6: output b
   stmt 7: c = a*c
   stmt 8: output c
   stmt 9: output a

[Figure 2-7 Code Segment, with panels showing the live ranges and the interference graph of a, b and c]

As shown in Figure 2-7, the interference graph is not 2-colorable. The spill costs of a, b and c are 3, 4 and 7, respectively. If the graph is colored using two registers, node a is spilled. If a is spilled everywhere, it stays in memory throughout its live range: a load instruction is added before each use and a store instruction after each definition. Figure 2-8(a) shows the resulting code after spilling everywhere. A closer look at the code in Figure 2-7 shows that variable b is no longer live after stmt 6. Only variables a and c are used in stmts 7, 8 and 9, so both can be in registers after stmt 6. Figure 2-8(b) shows the resulting code. Compared with spilling everywhere, one load instruction is eliminated.

(a) resulting code after spilling everywhere:
   a(r1) = 3
   store a(r1)
   b(r1) = 4
   c(r2) = 1
   c(r2) = b(r1) + c(r2)
   b(r1) = b(r1)*c(r2)
   output b(r1)
   load a(r1)
   c(r2) = a(r1)*c(r2)
   output c(r2)
   load a(r1)        <-- redundant since a is in r1 already
   output a(r1)

(b) a better way of adding spill code:
   a(r1) = 3
   store a(r1)
   b(r1) = 4
   c(r2) = 1
   c(r2) = b(r1) + c(r2)
   b(r1) = b(r1)*c(r2)
   output b(r1)
   load a(r1)
   c(r2) = a(r1)*c(r2)
   output c(r2)
   output a(r1)

Figure 2-8 Resulting Code After Spilling
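A simple scan can detect the redundant load marked in Figure 2-8(a) by remembering which value each register currently holds. This sketch is a local clean-up for straight-line code only; it is not the approach the thesis discusses. At control-flow joins, the register map would have to be reset or merged. The three-field instruction encoding and the function name are assumptions made for illustration.

```python
# Each instruction is (kind, variable, register):
#   ("load", v, r) and ("store", v, r) move v between memory and r,
#   ("op", v, r) writes v into r, ("op", None, None) writes no register.
FIG_2_8A = [
    ("op", "a", "r1"),     # a(r1) = 3
    ("store", "a", "r1"),  # store a(r1)
    ("op", "b", "r1"),     # b(r1) = 4
    ("op", "c", "r2"),     # c(r2) = 1
    ("op", "c", "r2"),     # c(r2) = b(r1) + c(r2)
    ("op", "b", "r1"),     # b(r1) = b(r1)*c(r2)
    ("op", None, None),    # output b(r1)
    ("load", "a", "r1"),   # load a(r1)
    ("op", "c", "r2"),     # c(r2) = a(r1)*c(r2)
    ("op", None, None),    # output c(r2)
    ("load", "a", "r1"),   # load a(r1), redundant
    ("op", None, None),    # output a(r1)
]

def drop_redundant_loads(code):
    """Delete loads whose register already holds the loaded value."""
    holds = {}                     # register -> value it currently holds
    result = []
    for kind, var, reg in code:
        if kind == "load" and holds.get(reg) == var:
            continue               # value is still in the register
        if reg is not None:
            holds[reg] = var       # after a load, store or write, reg holds var
        result.append((kind, var, reg))
    return result

better = drop_redundant_loads(FIG_2_8A)
print(len(FIG_2_8A), len(better))                   # → 12 11
print(sum(1 for k, _, _ in better if k == "load"))  # → 1
```

The first `load a(r1)` is kept because b overwrote r1. The second is dropped because nothing writes r1 in between. The result is the code of Figure 2-8(b).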

2.3 Briggs' Method

Briggs' method is an improvement over Chaitin's method. In Chaitin's method, nodes with N colored neighbors are spilled during the graph simplification stage. The assumption is that once N neighbors of a node are colored, no color is left for the node. This is not always true: if two neighbors are assigned the same color, a color is left for the node, and spilling is unnecessary. Briggs improved on this by delaying the spill decision. In Briggs' method, nodes are simply removed from the interference graph during graph simplification in increasing order of their cost/degree ratio. No node is spilled, even if it is not colorable. After all the nodes are removed, they are added back to the graph in the reverse order of deletion. Each node is given a color different from all its colored neighbors. If no such color is available, the node is spilled and spill code is added. In this way, the algorithm checks whether a color is available for a node before spilling it.

Briggs' method uses spill cost to determine the order in which nodes are removed from the interference graph, since spill cost indicates which nodes are more cost-effective to keep in registers. Nodes are removed from the interference graph and pushed onto a stack in increasing order of cost/degree ratio. Therefore, more expensive nodes are near the top of the stack and are likely to be colored before less expensive nodes. Nodes with fewer than N neighbors are removed from the graph in random order. When no such node exists, the node with the lowest ratio of spill cost to current degree is removed from the graph without being spilled. The spill cost of each node is calculated the same way as in Chaitin's method. Therefore, when spilling is required in Briggs' method, the same value is chosen as in Chaitin's method, and Briggs' method spills a subset of the nodes that Chaitin's method spills. Figure 2-9 shows how Briggs' method works.
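Briggs' delayed spill decision can be sketched by changing the blocked case of a Chaitin-style simplification. Instead of spilling the lowest cost/degree node, the sketch pushes it. Spilling happens only when a popped node finds no free color. The sketch assumes a dict of adjacency sets and given spill costs. It uses the same four-node cycle assumed for Figure 2-6, and the function name is hypothetical.

```python
def briggs_color(graph, n, cost):
    """Optimistic coloring: returns (colors, spilled)."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}   # work on a copy
    stack = []
    while g:
        colorable = [v for v in g if len(g[v]) < n]
        # When blocked, remove the lowest cost/degree node WITHOUT spilling it.
        if colorable:
            v = colorable[0]
        else:
            v = min(g, key=lambda u: cost[u] / len(g[u]))
        stack.append(v)
        for u in g.pop(v):
            g[u].discard(v)
    colors, spilled = {}, []
    while stack:
        v = stack.pop()                    # reverse order of removal
        taken = {colors[u] for u in graph[v] if u in colors}
        free = [c for c in range(n) if c not in taken]
        if free:
            colors[v] = free[0]
        else:
            spilled.append(v)              # the spill decision is made only now
    return colors, spilled

# Four-node cycle, assumed to have the shape of Figure 2-6.
cycle = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
print(briggs_color(cycle, 2, {v: 1.0 for v in cycle}))
# → ({4: 0, 3: 1, 2: 1, 1: 0}, [])
```

Nodes are pushed in the order 1, 2, 3, 4, as in the walkthrough below. When node 1 is popped last, both of its neighbors already share color 1, so color 0 is still free and nothing is spilled.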

[Figure 2-9 Briggs' Register Allocator: build the interference graph, simplify it and color it; if a node cannot be colored, it is spilled, spill code is added and the graph is rebuilt]

For the graph in Figure 2-6, Chaitin's method must spill a node in order to color the graph with two colors. Assuming node 1 is picked, Chaitin's method spills that node to memory. Spill code is added to the original code, and the node is removed from the interference graph along with its adjacent edges. This makes the remaining graph 2-colorable, and nodes 2, 3 and 4 can be removed from the graph. Therefore, Chaitin's method requires spilling.

The same interference graph is 2-colorable using Briggs' method. Since Briggs' method removes nodes in the same order as Chaitin's method, nodes are removed and pushed onto the stack in the order 1, 2, 3, 4. Even though node 1 is not colorable, it is not spilled at this point, so no spill code is added. After all the nodes are removed from the graph, they are popped from the stack and assigned appropriate colors. If no suitable color is found for a node, the node is spilled to memory and spill code is added. With this method, the interference graph in Figure 2-6 can be colored with two colors without spilling. The result of Briggs' method is shown in Figure 2-10.

[Figure 2-10 The Result of Coloring Using Briggs' Method: the four nodes are assigned registers r1, r2, r2 and r1]

Although Briggs' method makes better spill decisions than Chaitin's method, it still suffers from the problem of spilling everywhere.

2.4 Chow's Method

Chow's method overcomes the spill-everywhere problem from which Chaitin's and Briggs' methods suffer. In Chow's method, live ranges are computed at the basic block level, and each live range is a candidate for a register. During register allocation, a live range can be split into several smaller live ranges, each of which can be assigned to a different register or to memory. Chow's method also performs register allocation in a single pass. It does this by reserving a number of registers to handle accesses to spilled variables. With no backtracking, register allocation can be performed faster. However, reserving registers can result in more spilling than necessary.

There are two major differences between Chow's method and Chaitin's method. First, Chow's method uses a coarser interference graph than Chaitin's method. Chaitin's method calculates conflicts at the instruction level, whereas Chow's method calculates them at the basic block level. As a result, Chow's method may produce an interference graph with a higher chromatic number. For example, consider the code in Figure 2-11(a). Chaitin's interference graph for this code is shown in Figure 2-11(b), and Chow's interference graph for the same code is shown in Figure 2-11(c). Variables a, b and c are live in the same basic block. They therefore interfere with each other in Chow's interference graph, which makes that graph not 2-colorable. Chaitin's method computes interference at the instruction level, and variable a is no longer live after the definition point of c. Therefore, Chaitin's interference graph has no edge between nodes a and c, and the graph is 2-colorable.
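The difference in granularity can be sketched on the code of Figure 2-11(a). The instruction-level function treats two values as conflicting when they are live after the same statement. For this straight-line block, that gives the same edges as Chaitin's definition-point rule. The block-level function is a deliberate simplification of Chow's approach: it assumes every value referenced in the block is live throughout it. The encoding and the function names are illustrative assumptions.

```python
from itertools import combinations

# Figure 2-11(a) as (defined variable or None, set of used variables).
BLOCK = [
    ("a", set()),          # a = ...
    ("b", set()),          # b = ...
    ("c", {"a", "b"}),     # c = b+a
    (None, {"c"}),         # output c
    (None, {"b"}),         # output b
]

def pairs(names):
    """All unordered pairs of the given names, as frozensets."""
    return {frozenset(p) for p in combinations(names, 2)}

def instruction_level(code):
    """Values conflict if they are live after the same statement."""
    live, edges = set(), set()
    for defined, used in reversed(code):
        edges |= pairs(live)
        live.discard(defined)      # backward liveness step
        live |= used
    return edges

def block_level(code):
    """Coarse view: every value referenced in the block conflicts."""
    names = {d for d, _ in code if d is not None}
    for _, used in code:
        names |= used
    return pairs(names)

def show(edges):
    return sorted("".join(sorted(e)) for e in edges)

print(show(instruction_level(BLOCK)))  # → ['ab', 'bc']
print(show(block_level(BLOCK)))        # → ['ab', 'ac', 'bc']
```

The extra edge between a and c is what makes the block-level graph need three colors.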

   a = ...
   b = ...
   c = b+a
   output c
   output b

[Figure 2-11 Difference Between Chow's and Chaitin's Interference Graph: (a) code in a basic block; (b) Chaitin's interference graph; (c) Chow's interference graph]

Another difference between Chow's and Chaitin's methods is that Chaitin's method never splits a live range: when the allocation process is blocked, a variable is selected to spill. Instead of spilling, Chow's method splits a live range that cannot be colored into several smaller live ranges. These smaller live ranges usually interfere with fewer nodes in the graph. This may allow the graph to be colored without spilling, or at least with only part of the live range spilled to memory. The effect of live range splitting is shown in Figure 2-12.

[Figure 2-12 Example of the Effectiveness of Live Range Splitting: (a) live ranges of a, b and c before splitting; (b) live ranges and interference graph after splitting a]

As shown in Figure 2-12(a), variable a interferes with variables b and c. Therefore, the corresponding interference graph is not 2-colorable. If variable a is spilled, a resides in memory throughout its live range. However, if a is split into two smaller live ranges, part of a's live range can be in a register, so fewer memory operations are needed.

Chow's live range splitting technique separates as large a portion of a live range as possible, to avoid creating too many small live ranges. However, this can end up splitting a live range inside a loop, even when it is possible to split it outside the loop. For example, with the live ranges shown in Figure 2-13(a), Chow's method keeps splitting live range a as long as possible. This ends up splitting a inside the loop, as shown in Figure 2-13(b). A register-to-register move instruction or spill code is then added inside the loop, which is executed more frequently than other portions of the program. A better way is to split a outside the loop, as shown in Figure 2-13(c). The problem of splitting inside a loop arises because no information about program structure is encoded in the interference graph. Therefore, there is no way to detect the loop boundary when splitting.

[Figure 2-13 Live Range Splitting: (a) live ranges of a, b and c around a loop; (b) splitting a with Chow's method; (c) a better way of splitting a. a1 and a2 represent the different live ranges after splitting a.]

2.5 The Problem

The problem with graph coloring register allocation stems from its lack of a way to encode program structure information in the interference graph. Although spill cost gives values inside a loop higher priority, values in a conditional structure are treated the same as values outside it. Spill cost ignores the fact that some branches of a conditional structure are less likely to be executed than other portions of a program. When spilling is required, spilling a node in a conditional structure may therefore result in fewer executed memory operations. For example, consider the control flow graph in Figure 2-14. Using Chaitin's and Briggs' methods, node a is spilled, and a load instruction is added in block 0, which is executed every time the program runs. A better solution is to spill node b, which adds two load instructions. The probability that these load instructions are executed is only 25%. Therefore, the number of memory operations at execution time may be reduced.

[Figure 2-14 Control Flow Graph: Block 0 contains a use of a; Blocks 1 and 2, each reached with probability 50%, contain uses of b; Block 3]

Also, in Figure 2-15, node b is used but not defined in a loop. If b is spilled using Chaitin's or Briggs' method, a load of b is added inside the loop, where it executes n times each time the loop runs. Since b is never defined in the loop, it is better to load b before entering the loop; in this way, the load instruction executes only once.

Figure 2-15: Adding Spill Code for Loop Structure. (a) control flow graph; (b) Chaitin's method for adding spill code (load b inside the loop); (c) a better way of adding spill code (load b before the loop).

The examples discussed above suggest that program structure can help to make better spill decisions and to find better places to insert spill code. However, it has not yet been shown on real programs that encoding program structure information in the register allocation process is beneficial.
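A rough sketch of the load count for Figure 2-15, assuming that once b is loaded a register stays available for it for the whole loop; `executed_loads` and its parameters are hypothetical names used only for illustration.

```python
def executed_loads(n_iterations: int, uses_per_iteration: int,
                   defined_in_loop: bool, structure_aware: bool) -> int:
    """Loads of a spilled value b executed during one run of a loop.

    Without structure information, a load is placed before every use inside
    the loop. If the allocator knows the loop boundary and b is never defined
    in the loop, a single load before the loop is enough (assuming b can stay
    in a register for the whole loop).
    """
    if structure_aware and not defined_in_loop:
        return 1
    return n_iterations * uses_per_iteration


print(executed_loads(100, 1, defined_in_loop=False, structure_aware=False))  # 100
print(executed_loads(100, 1, defined_in_loop=False, structure_aware=True))   # 1
```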

3 Callahan and Koblenz's Algorithm

Callahan and Koblenz's algorithm uses a tree structure called a tile tree to represent program structure. The algorithm performs register allocation in two passes. In the first pass, the tile tree is visited bottom-up; an interference graph is built for each tile and colored using pseudo-registers. Local spill decisions are made, and register allocation information is summarized for each tile. Each tile's register allocation summary is passed to its parent. In the second pass, the tile tree is visited top-down to update spill decisions and map pseudo-registers to physical registers. Using tiles, program sections with different execution frequencies, such as loops and conditional structures, can be separated. In this way, program structure is encoded in the register allocation process.

3.1 Tile Tree

Before the two passes of register allocation are performed, a tile tree is built for each function. A tile is a set of basic blocks that represents a loop or a conditional structure of a program. A tile tree is valid if any two tiles in it are either disjoint or one is a subset of the other. If $t_1 \subset t_2$ and there is no tile $t$ such that $t_1 \subset t \subset t_2$, then $t_2$ is the parent of $t_1$. $\mathrm{blocks}(t)$ is the set of basic blocks that belong to tile $t$ but do not belong to any child of $t$. An entry edge of tile $t$ is an edge $\langle n, m \rangle$ where $n \in \mathrm{blocks}(\mathrm{parent}(t))$ and $m \in t$. An exit edge of tile $t$ is an edge $\langle m, n \rangle$ where $m \in t$ and $n \in \mathrm{blocks}(\mathrm{parent}(t))$. The root of the tile tree is the tile $t$ that satisfies $\{start, end\} \subseteq \mathrm{blocks}(t)$, where start is the basic block with no predecessor and end is the basic block with no successor. Many different tile trees can be built for a function. The main purpose of building a tile tree is to separate loop structures and conditional structures from other portions of a function.
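As a sketch of these definitions, the Python below represents each tile as a frozen set of basic-block names, checks the validity condition, finds a tile's parent as its smallest strict superset, and computes blocks(t). The helper names are hypothetical, and the tiles are those of the example in Figure 3-2, before new basic blocks are added. Entry and exit edges are omitted because they also need the CFG edges.

```python
from itertools import combinations


def is_valid(tiles):
    """Valid if any two tiles are disjoint or one contains the other."""
    return all(not (a & b) or a <= b or b <= a for a, b in combinations(tiles, 2))


def parent(tile, tiles):
    """Smallest tile strictly containing `tile`; None for the root."""
    return min((t for t in tiles if tile < t), key=len, default=None)


def blocks(tile, tiles):
    """Basic blocks of `tile` that belong to none of its child tiles."""
    own = set(tile)
    for t in tiles:
        if parent(t, tiles) == tile:
            own -= t
    return own


t0 = frozenset({"B1", "B2", "B3", "B4", "B5", "B6"})
t1 = frozenset({"B2", "B3", "B4", "B5"})
t2 = frozenset({"B3"})
t3 = frozenset({"B4"})
tiles = [t0, t1, t2, t3]

print(is_valid(tiles))            # True
print(parent(t2, tiles) == t1)    # True
print(sorted(blocks(t1, tiles)))  # ['B2', 'B5']
```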
In this thesis, a tile tree is built from the control flow graph (CFG) using the following steps:

Step 1. Start with a tile that includes all basic blocks in the CFG. This is the root of the tile tree.

Step 2. Identify the loops in the program. Basic block A dominates basic block B if every path from the start block to B passes through A. Basic block A post-dominates basic block B if every path from B to the end block passes through A. A loop head is the entry to a loop, and a loop tail is the exit from a loop. If the loop head does not dominate all basic blocks in the loop, a new loop head is formed so that the loop has a single entry. Similarly, if the loop tail does not post-dominate all basic blocks in the loop, a new loop tail is formed so that the loop has a single exit. Each loop is a tile by itself.

Step 3. Recognize conditional structures inside each loop L. Loops that are strictly contained in L are coalesced into a single basic block. The basic blocks in L are divided into groups using the dominator and post-dominator relations: in each group $g = \{b_n, \ldots, b_i, b_{i+1}, \ldots, b_m\}$, $b_i$ dominates $b_{i+1}$ and is post-dominated by $b_{i+1}$. If the basic blocks in g are directly connected, g is a tile. Otherwise, g is $\mathrm{blocks}(L)$.

Step 4. Once the tiles are recognized, a tile tree is built from the parent-child relationship. Program structure is represented hierarchically by the tile tree: inner loops and conditional structures are at the bottom of the tile tree, while outer loops are at the top. A new basic block is added on each entry and exit edge of each tile.

For example, consider the control flow graph in Figure 3-1.

Figure 3-1: Control Flow Graph (basic blocks B1 through B6).

Using the steps described above, the root tile t0 = {B1, B2, B3, B4, B5, B6} is built first. Then the loop {B2, B3, B4, B5} is recognized as tile t1. In t1, the basic blocks are divided into three groups by the dominator and post-dominator relations: g1 = {B3}, g2 = {B4}, and g3 = {B2, B5}. Groups g1 and g2 become tiles t2 and t3, since each contains only one basic block. Group g3 is blocks(t1), since B2 and B5 are not directly connected. New basic blocks NB0 to NB5 are added along the entry and exit edges of each tile. The resulting tile tree is shown in Figure 3-2.
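The grouping step can be sketched with the textbook iterative dominator computation. The text does not list the edges of Figure 3-1; the sketch assumes B1→B2, B2→B3, B2→B4, B3→B5, B4→B5, B5→B2 and B5→B6, which is our reading of the figure and reproduces the groups described above. Two blocks go into the same group when one dominates the other and is post-dominated by it; the "directly connected" test that decides between a tile and blocks(L) is not shown.

```python
def dominators(succ, entry):
    """Iterative dataflow: dom[n] is the set of blocks that dominate n."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for m in targets:
            pred[m].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def reverse(succ):
    """Reverse every edge; dominators of the reversed CFG are post-dominators."""
    rev = {n: [] for n in succ}
    for n, targets in succ.items():
        for m in targets:
            rev[m].append(n)
    return rev


def groups(loop_blocks, dom, pdom):
    """Group blocks where one dominates the other and is post-dominated by it."""
    result = []
    for b in sorted(loop_blocks):
        for g in result:
            if any((a in dom[b] and b in pdom[a]) or (b in dom[a] and a in pdom[b])
                   for a in g):
                g.add(b)
                break
        else:
            result.append({b})
    return result


# Assumed edges for Figure 3-1 (not stated in the text).
cfg = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B5"],
       "B4": ["B5"], "B5": ["B2", "B6"], "B6": []}
dom = dominators(cfg, "B1")
pdom = dominators(reverse(cfg), "B6")
print([sorted(g) for g in groups({"B2", "B3", "B4", "B5"}, dom, pdom)])
# [['B2', 'B5'], ['B3'], ['B4']]
```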

Figure 3-2: Tiles and Tile Tree. Tiles: t0 = {B1, B2, B3, B4, B5, B6}, t1 = {B2, B3, B4, B5}, t2 = {B3}, t3 = {B4}. Tile tree: t0 is the root, t1 is the child of t0, and t2 and t3 are the children of t1. The resulting CFG after adding new basic blocks contains B1, NB0, B2, NB2, B3, NB3, NB4, B4, NB5, B5, NB1 and B6.

3.2 The First Phase

In the first phase of register allocation, the tile tree is visited bottom-up. An interference graph is built for each tile and colored using pseudo-registers. The number of pseudo-registers equals the number of registers the target machine provides. The register allocation information is summarized and passed up to the tile's parent. Figure 3-3 shows an outline of the first pass of register allocation.

    first_pass(tile t)
        for each child s of t do
            first_pass(s)
        end do
        build_interference_graph for t based on information from t and t's children
        color the interference graph
        summarize tile register allocation information for the parent tile of t

Figure 3-3: Outline of the first pass of tile register allocation

Tile Allocation Summary

Variables are divided into two groups with respect to each tile. Local variables are variables that are live only inside a tile. Global variables are variables that are live along any entry or exit edge of a tile. After a tile is visited in the first pass, each variable used in the tile is either assigned to a register or spilled to memory. The local register allocation information is summarized and passed to the tile's parent. A tile summary variable is created for each register that is assigned a local variable; local variables assigned to the same register are represented by the tile summary variable associated with that register. Information about each local variable is passed to a tile's parent through tile summary variables. Because tile summaries coalesce the local variables allocated to the same register, passing information about tile summary variables is more efficient than passing information about each local variable. For each global variable that is assigned to a register in a tile, two types of conflicts are passed to the tile's parent. First, conflicts between the global variable and tile summary variables are passed: a conflict exists between a global variable g and a tile summary variable t if g conflicts with any local variable contained in t. Second, conflicts between global variables that are in registers in the current tile are passed.

Tile Interference Graph

An interference graph is built for each tile in the first pass. It contains two types of nodes: variables that are referenced in the current tile, and tile summary variables from its child tiles. The first type represents the values used in the current tile; the second represents variables that are local to the current tile's children. Variables that are live across the current tile but never referenced in it are not included in the current tile's interference graph during the first pass; these values are the best candidates to spill when spilling is required. An edge exists between two nodes if they conflict with each other. Conflicts are calculated as follows:

1. Two values conflict with each other if they conflict in a basic block in blocks(t). This is the ordinary conflict within a tile, without considering its child tiles' allocation information. In Chaitin's method, it is sufficient to make a variable defined at a statement p conflict with all variables that are live and available at p. With tile register allocation, however, the definition point of a live variable may not be included in the tile.
   Therefore, conflicts for a basic block in blocks(t) are computed as follows:
   - make a variable defined at p conflict with all variables that are live at p and referenced in blocks(t);
   - make a variable used at p conflict with all variables that are live at p and referenced in blocks(t);
   - make the live variables that are referenced in blocks(t) conflict with one another.
   Including conflicts between live variables ensures that no conflict is missed, while restricting these live variables to those referenced in blocks(t) keeps conflicts from being overestimated.

2. A node corresponding to a tile summary variable conflicts with the other tile summary variables from the same subtile. Tile summary variables do not, however, conflict with tile summary variables from sibling tiles, because tile summary variables represent local variables, and the live ranges of local variables from sibling tiles never overlap.

3. For each subtile, global variables conflict with tile summary variables as indicated in that subtile's conflict summary.

4. A variable that is spilled in a subtile but live in the current tile conflicts with all tile summary variables from that subtile.

5. A variable that is live across a subtile but is not used or defined in the subtile conflicts with all tile summary variables from that subtile.

Coloring and Preferencing

In Callahan and Koblenz's algorithm, the tile interference graph is colored in one pass with no backtracking. Therefore, registers are reserved to handle accesses to spilled variables. Preferencing is used to meet machine linkage conventions and to minimize the number of register-to-register moves. Under some machine conventions, certain values must be placed in particular registers, such as parameters passed to a function or values returned from a function; those registers are set as the preferences of those variables. Also, since register allocation is performed for each tile locally, the same variable may be assigned to different registers in different tiles. In this case, a register-to-register move instruction is required to preserve program semantics. To minimize the number of move instructions, each node in the interference graph is associated with a preference, which records the registers the value is assigned in other tiles. The interference graph can be colored using any standard method; this thesis uses Briggs' method, with some changes to satisfy preferencing.
When coloring a node that has a preference, a preferred register is assigned to the node if one is available; if not, the preference is ignored. When coloring a node without a preference, the node is assigned a color that differs from those of all its colored neighbors and from the preferences of all its uncolored neighbors. If no such color is available, the node is assigned any color that differs from those of its colored neighbors.

3.3 The Second Pass

The second pass starts after the root tile has been visited in the first pass. In the second pass, the tile tree is visited top-down, and an interference graph is rebuilt for each tile. The interference graph is built in the same way as in the first pass, with one exception: in the second pass, variables that are live across a tile but never referenced in it are included in the interference graph if they are in a register in the parent tile. Those variables conflict with all other nodes in the interference graph. The interference graph is colored in the same way as in the first pass. Figure 3-4 shows an outline of the second pass of register allocation.

    second_pass(t)
        rebuild_interference_graph for t based on information from t, t's children and t's parent tile
        color the interference graph
        color spilled values and add spill code
        save register allocation information for child tiles of t
        for each child s of t do
            second_pass(s)
        end do

Figure 3-4: Outline of the second pass of tile register allocation
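The color-selection rules from Coloring and Preferencing, which the second pass reuses, can be sketched as a single function. This is only the select step, not a full Briggs allocator (simplification and spilling are omitted), and the function and parameter names are hypothetical. One assumption: when none of a node's preferred registers is available, the sketch falls back to the rule for nodes without a preference.

```python
def choose_color(node, graph, colors, prefs, k):
    """Choose a register 0..k-1 for `node`, honoring preferences where possible.

    graph:  dict node -> set of neighbouring nodes
    colors: dict node -> register, for nodes colored so far
    prefs:  dict node -> set of preferred registers
    Returns None if every register is taken by a colored neighbour (spill).
    """
    taken = {colors[n] for n in graph[node] if n in colors}
    free = [r for r in range(k) if r not in taken]
    if not free:
        return None
    # A preferred register is used when available; otherwise it is ignored.
    for r in free:
        if r in prefs.get(node, set()):
            return r
    # Avoid registers that uncolored neighbours would prefer.
    wanted = set()
    for n in graph[node]:
        if n not in colors:
            wanted |= prefs.get(n, set())
    for r in free:
        if r not in wanted:
            return r
    # No such register: any register not used by a colored neighbour.
    return free[0]


graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
prefs = {"c": {1}}   # e.g. c must be returned in register 1
colors = {}
colors["b"] = choose_color("b", graph, colors, prefs, k=2)  # avoids 1, gets 0
colors["c"] = choose_color("c", graph, colors, prefs, k=2)  # gets preferred 1
print(colors)  # {'b': 0, 'c': 1}
```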

Coloring Spilled Nodes

Registers are needed for moving spilled values between memory and registers. In Callahan and Koblenz's method, a number of registers are reserved for accessing spilled values, and spilled nodes are colored using these reserved registers. Spilled values are colored so that two spilled values cannot be loaded into the same register at the same time; this is done by ensuring that two spilled values used at the same statement are not assigned the same color. For each statement p, if a spilled node used at p is not yet colored, it is assigned a reserved color. If it is already colored, its color is checked against those of the other spilled nodes used at p; if they coincide, the node is assigned a different color at p.

Adding Spill Code

After the second pass, the final spill decision has been made for each tile. New basic blocks are added along the entry and exit edges of each tile, and spill code is added to these new basic blocks. There are four possible cases:

1. If a variable v is spilled in tile t but resides in a register in t's parent, a store instruction is added on each entry edge of t where v is live, and a load instruction is added on each exit edge of t where v is live.

2. If a variable v is in a register in tile t but spilled in t's parent, a load instruction is added on each entry edge of t where v is live, and a store instruction is added on each exit edge of t where v is live. If v is never redefined in tile t, the store instruction is redundant, because the value of v is already in memory and has not changed.

3. If a variable v is assigned to a register in tile t and to a different register in t's parent, register move instructions are inserted on the entry and exit edges where v is live.

4. If a variable v is spilled both in tile t and in t's parent, no instructions are needed at t's boundary, since both tiles spill the variable to the same memory location. A load instruction is added before each use of v, and a store instruction after each definition of v, in blocks(t).
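The four cases can be summarized as a small function that returns the instructions for one entry edge and one exit edge of tile t along which v is live. The `mov dst, src` spelling follows Figure 3-5; the `store`/`load` text is our own notation, and the function is a hypothetical sketch rather than the thesis implementation.

```python
def boundary_code(v, reg_in_t, reg_in_parent, redefined_in_t=True):
    """Spill/move code for one entry edge and one exit edge of tile t.

    reg_in_t / reg_in_parent: register name, or None if v is spilled there.
    Assumes v is live on both edges.
    """
    entry, exit_ = [], []
    if reg_in_t is None and reg_in_parent is not None:
        # Case 1: spilled in t, in a register in the parent.
        entry.append(f"store {reg_in_parent} -> {v}")
        exit_.append(f"load {v} -> {reg_in_parent}")
    elif reg_in_t is not None and reg_in_parent is None:
        # Case 2: in a register in t, spilled in the parent.
        entry.append(f"load {v} -> {reg_in_t}")
        if redefined_in_t:  # otherwise memory already holds v's value
            exit_.append(f"store {reg_in_t} -> {v}")
    elif reg_in_t is not None and reg_in_t != reg_in_parent:
        # Case 3: different registers in t and the parent.
        entry.append(f"mov {reg_in_t}, {reg_in_parent}")
        exit_.append(f"mov {reg_in_parent}, {reg_in_t}")
    # Case 4 (spilled in both) or the same register: nothing at the boundary;
    # case 4 still needs loads/stores around uses/defs inside blocks(t).
    return entry, exit_


print(boundary_code("a", "r4", "r5"))  # (['mov r4, r5'], ['mov r5, r4'])
```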

Code added to the new basic blocks must be ordered: store and move-from instructions should precede load and move-to instructions. For example, assume variable a is assigned to register r4 in tile t and to r5 in t's parent tile. A move instruction from r5 to r4 is added along t's entry edge, and a move instruction from r4 to r5 is added along t's exit edges. Suppose another variable b is assigned to register r5 in tile t and to r6 in t's parent tile. Then a move instruction from r6 to r5 is needed for b along t's entry edge, and a move instruction from r5 to r6 along t's exit edge. To ensure program correctness, the move instructions should be ordered as follows (move-from-r5 instructions precede move-to-r5 instructions):

    t's entry:  mov r4, r5
                mov r5, r6
    tile t
    t's exit:   mov r6, r5
                mov r5, r4

Figure 3-5: An example of move-from instructions preceding move-to instructions

Sometimes, however, a move loop exists. In this case, an unused register can be used to break the loop; if no such register exists, a value must be spilled. For example, suppose variable a is assigned to register r4 in tile t and to r5 in t's child tile s, and variable b is assigned to register r5 in tile t and to r4 in s. Then a move loop between registers r4 and r5 exists on the entry and exit edges of s.

    A move loop:                              mov r4, r5
                                              mov r5, r4
    Break the loop using unused register r6:  mov r6, r5
                                              mov r5, r4
                                              mov r4, r6
    Break the loop by spilling b (r5):        store r5 to memory address p
                                              mov r5, r4
                                              load from p to r4

Figure 3-6: An example of breaking move loops
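The ordering rule and the loop-breaking step together amount to sequentializing a set of parallel register moves. The sketch below is a standard way to do this, not necessarily the thesis implementation: a move is emitted once no pending move still reads its destination, and when only cycles remain, one source is copied into a free register. Registers are plain strings, `sequentialize` is a hypothetical name, and the spill fallback of Figure 3-6 is only signaled, not generated.

```python
def sequentialize(moves, temp=None):
    """Order parallel moves {dst: src} into a list of 'mov dst, src' strings.

    A destination is written only after every pending move that reads it has
    been emitted ("move from" before "move to"). Cycles are broken with the
    free register `temp`; without one, a value would have to be spilled.
    """
    pending = {d: s for d, s in moves.items() if d != s}
    out = []
    while pending:
        sources = set(pending.values())
        ready = [d for d in pending if d not in sources]
        if ready:
            for d in ready:
                out.append(f"mov {d}, {pending.pop(d)}")
            continue
        if temp is None:
            raise ValueError("move loop with no free register: spill a value")
        _, s = next(iter(pending.items()))
        out.append(f"mov {temp}, {s}")
        # Every pending move that read s now reads the copy in temp.
        pending = {d: (temp if src == s else src) for d, src in pending.items()}
    return out


# Figure 3-5, entry edge of t: a goes r5 -> r4, b goes r6 -> r5.
print(sequentialize({"r4": "r5", "r5": "r6"}))  # ['mov r4, r5', 'mov r5, r6']
# Figure 3-6: a and b swap between r4 and r5; r6 is unused.
print(sequentialize({"r4": "r5", "r5": "r4"}, temp="r6"))
# ['mov r6, r5', 'mov r5, r4', 'mov r4, r6']
```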

3.4 Spill Cost and Spill Decisions

The following equations are used to determine which variables deserve to be in registers:

$$\mathrm{LocalWeight}_t(v) = \sum_{b} \mathrm{prob}(b)\,\mathrm{ref}_b(v)$$
$$\mathrm{Transfer}_t(v) = \sum_{e} \mathrm{prob}(e)\,\mathrm{live}_e(v)$$
$$\mathrm{Reg}_t(v) = \mathrm{reg}_t(v)\,\min\bigl(\mathrm{Transfer}_t(v), \mathrm{Weight}_t(v)\bigr)$$
$$\mathrm{Mem}_t(v) = \mathrm{mem}_t(v)\,\mathrm{Transfer}_t(v)$$
$$\mathrm{Weight}_t(v) = \sum_{s} \bigl(\mathrm{Reg}_s(v) - \mathrm{Mem}_s(v)\bigr) + \mathrm{LocalWeight}_t(v)$$

where
t - the current tile
b - ranges over all basic blocks in blocks(t)
e - ranges over the entry and exit edges of t
s - ranges over t's child tiles
prob(b) - the probability of b being executed
prob(e) - the probability of e being executed
ref_b(v) - the number of references to v in basic block b
live_e(v) - 1 if v is live along edge e, otherwise 0
reg_t(v) - 1 if v is in a register in t, otherwise 0
mem_t(v) - 1 if v is in memory in t, otherwise 0

Figure 3-7: Formulas for calculating spill cost

LocalWeight represents the typical use pattern of value v in blocks(t), without considering information from t's child tiles; the LocalWeight of a tile summary variable from t's child tiles is 0. Transfer is the cost of loading and storing v along t's entry and exit edges. Reg is the cost of allocating v to memory in t's parent tile when v is in a register in t.
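The formulas in Figure 3-7 can be evaluated recursively over the tile tree. The sketch below assumes each tile records prob(b) for its own blocks, per-block reference counts, the probability and live set of each entry or exit edge, and which variables it keeps in registers, and it takes mem_t(v) = 1 - reg_t(v). The `Tile` class and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    block_prob: dict               # b in blocks(t) -> prob(b)
    refs: dict                     # (b, v) -> ref_b(v)
    edges: list                    # (prob(e), set of variables live on e)
    in_reg: set                    # variables with reg_t(v) = 1
    children: list = field(default_factory=list)


def local_weight(t, v):           # LocalWeight_t(v)
    return sum(p * t.refs.get((b, v), 0) for b, p in t.block_prob.items())


def transfer(t, v):               # Transfer_t(v)
    return sum(p for p, live in t.edges if v in live)


def reg_cost(t, v):               # Reg_t(v)
    return min(transfer(t, v), weight(t, v)) if v in t.in_reg else 0.0


def mem_cost(t, v):               # Mem_t(v), assuming mem_t(v) = 1 - reg_t(v)
    return transfer(t, v) if v not in t.in_reg else 0.0


def weight(t, v):                 # Weight_t(v)
    return local_weight(t, v) + sum(reg_cost(s, v) - mem_cost(s, v)
                                    for s in t.children)


# Hypothetical example: v is used once in the root and 4 times inside a child
# tile entered with probability 0.5 (v live on its entry and exit edges).
child = Tile({"L": 0.5}, {("L", "v"): 4}, [(0.5, {"v"}), (0.5, {"v"})], {"v"})
root = Tile({"B0": 1.0}, {("B0", "v"): 1}, [], set(), [child])
print(weight(root, "v"))  # 2.0
child.in_reg = set()      # now v is spilled in the child tile
print(weight(root, "v"))  # 0.0
```

With v in a register in the child, the parent gains min(Transfer, Weight) = 1.0 on top of its own LocalWeight; once v is spilled in the child, the transfer cost is subtracted instead.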
