Register Allocation via Hierarchical Graph Coloring


Register Allocation via Hierarchical Graph Coloring
by Qunyan Wu
A THESIS Submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN COMPUTER SCIENCE
MICHIGAN TECHNOLOGICAL UNIVERSITY
1996

This thesis, "Register Allocation via Hierarchical Graph Coloring," is hereby approved in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE IN COMPUTER SCIENCE.
DEPARTMENT of Computer Science
Thesis Advisor: Dr. Steve Carr
Thesis Advisor: Dr. Philip Sweany
Chair of Department: Dr. Linda Ott
Date

Abstract

Register allocation is a vital stage in compiler optimization. It greatly impacts the effectiveness of other compiler optimization techniques. Graph coloring is the commonly used mechanism for register allocation. Since graph coloring is an NP-complete problem, heuristics are needed to find a practical solution. Several heuristics are available that perform well in practice. Commonly used heuristics, however, do not include program structure information in register allocation. Without information about program structure, poor spill decisions can be made. Callahan and Koblenz developed an algorithm that takes the program structure into account by performing register allocation hierarchically. Unfortunately, Callahan and Koblenz have not presented any experimental evidence to establish the effectiveness of their algorithm.

This thesis examines the effectiveness of Callahan and Koblenz's algorithm and shows that when register pressure is high, Callahan and Koblenz's method generates worse code than Briggs' method, the generally accepted method of graph coloring register allocation. The performance degradation of Callahan and Koblenz's method is due to reserving registers in the graph coloring process. Also, because register allocation is performed hierarchically, compensation move and jump instructions are required to ensure the correctness of program semantics. Furthermore, the experimental results suggest that when the degree of conflict is not considered in the spill cost, more values can be spilled than necessary.

Acknowledgments

I would like to thank my advisors, Dr. Steve Carr and Dr. Phil Sweany, for their encouragement and valuable advice during my two years of study at Michigan Technological University. I would like to thank Dr. David Poplawski and Dr. Barbara Bertram for taking time to be my committee members. I am also grateful to all the people in the Computer Science Department for providing a pleasant and friendly environment. And finally, I would like to thank my parents, brothers and John Mangus, who have always been there for me.

Table of Contents

1.0 Introduction Register Allocation Graph Coloring Register Allocation Chaitin's method Interference Graph Coloring Problems with Chaitin's method Briggs' Method Chow's Method The Problem Callahan and Koblenz's Algorithm Tile Tree The First Phase Tile Allocation Summary Tile Interference graph Coloring and Preferencing The Second Pass Coloring Spilled Nodes Adding Spill Code Spill cost and Spill Decisions Algorithm Evaluation Benchmarks Register Allocation and Instruction Scheduling Experiment Data Evaluation of the Tile Interference Graph Evaluation of Reserving Registers Evaluation of Different Levels of Tiles Evaluation of the effectiveness of Preference Briggs vs. Callahan and Koblenz with no spilling Conclusions

List of Figures

Figure 2-1 An Example Of Interference Graph
Figure 2-2 Resulting code of spilling different nodes
Figure 2-3 Chaitin's Register Allocator
Figure 2-4 Example Code Segment for Chaitin's Interference Graph
Figure 2-5 Chaitin's Interference Graph
Figure 2-6 2-colorable Interference Graph
Figure 2-7 Code Segment
Figure 2-8 Resulting Code After Spilling
Figure 2-9 Briggs' Register Allocator
Figure 2-10 The Result of Coloring Using Briggs' Method
Figure 2-11 Difference Between Chow's and Chaitin's Interference Graph
Figure 2-12 Example of the Effectiveness of Live Range Splitting
Figure 2-13 Live Range Splitting
Figure 2-14 Control Flow Graph
Figure 2-15 Adding Spill Code for Loop Structure
Figure 3-1 Control Flow Graph
Figure 3-2 Tiles and Tile Tree
Figure 3-3 Outline of the first pass of tile register allocation
Figure 3-4 Outline of the second pass of tile register allocation
Figure 3-5 An example of move from proceed move to
Figure 3-6 An example for breaking move loops
Figure 3-7 Formulas for calculating spill cost
Figure 4-1 The effectiveness of instruction scheduling
Figure 4-2 Normalized execution time on preg11
Figure 4-3 Normalized execution time on preg22
Figure 4-4 Different tile levels on preg11 reserving 2 registers
Figure 4-5 Different levels of tiles on preg22 reserving 2 registers
Figure 4-6 Different levels of tiles on preg11 reserving 4 registers
Figure 4-7 Different levels of tiles on preg22 reserving 4 registers
Figure 4-8 With preference and without preference on preg11
Figure 4-9 With preference and without preference on preg22
Figure 4-10 Example for coloring with preference
Figure 4-11 Results with no spilling on preg11
Figure 4-12 Results with no spilling on preg22

List of Tables

Table 4-1 Least number of registers required without spilling
Table 4-2 Execution times on preg11
Table 4-3 Execution time on preg22
Table 4-4 Different tile levels when reserving 2 registers
Table 4-5 Different tile levels when reserving 4
Table 4-6 Reserve 4 registers
Table 4-7 Briggs vs. Callahan and Koblenz with no spilling

1 Introduction

In recent years, microprocessor speed has improved dramatically. However, the speed of memory has not kept up with it. The latency of a memory access has become a major bottleneck to computing speed. Since register access is much faster than memory access, it is important and desirable to put values into registers. However, as a program gets larger, the number of values used in a function can be larger than the number of registers available in the target machine. Therefore, it is often impossible to put all values in registers.

Register allocation is the mechanism in a compiler that determines the usage of the registers. There are two issues of significance in register allocation. First, given the number of registers, is it possible to put all scalar values used in a function into registers? Second, when it is impossible to assign all values to registers, which values should be in memory?

Allocating values to memory is called spilling. In a load/store architecture machine, values have to be in registers in order for the CPU to use them. Therefore, once a value is spilled, load and store instructions are required to move it between memory and a register. A spilled value is loaded into a register from memory before it is used and stored back to memory after it is defined. These load and store instructions are called spill code.

Spilling different values in a function can generate code with different numbers of load/store instructions. This is because some values are used more frequently than others. Also, there are different places where spill code can be inserted in a function. Different sections of a function can have different execution frequencies. Inserting a load/store instruction into a heavily used portion of a function can result in more memory operations than inserting the instruction into a less frequently used portion.
Register allocation determines which value to spill when spilling is needed and where to add spill code. The goal of register allocation is to minimize dynamic memory operations.

Register allocation is commonly treated as a graph coloring problem. An interference graph is built for a function where each node represents a value in the function and an edge between nodes represents a conflict. Two nodes conflict if they cannot share the same register at the same time. Allocating M values to N registers is solved by coloring the corresponding interference graph using N colors. Each color represents a register. The graph is colored so that each node has a color different from each of its neighbors. When there is no suitable color for a node, the node is spilled to memory. The register allocation process continues until all values are either allocated to registers or assigned to memory.

Graph coloring has been shown to be an NP-complete problem [6]. Therefore, it is impractical to find the optimal solution. Several heuristics are available that perform well in practice. However, commonly used heuristics have no way to encode the program structure in the interference graph [2][3][4]. Program structure is used to represent sections of a function that have different execution frequencies, like loops or conditional structures. Without information about program structure, poor spill decisions can be made. For example, without knowledge of a loop boundary, spill code can be inserted inside a loop when it is not necessary.

Callahan and Koblenz [1] developed an algorithm that takes the program structure into account by performing register allocation hierarchically. Unfortunately, Callahan and Koblenz have not presented any experimental evidence to establish the effectiveness of their algorithm. This research examines the effectiveness of Callahan and Koblenz's algorithm and shows that when the register pressure is high, Callahan and Koblenz's method generates worse code than Briggs' method, the generally accepted method of graph coloring register allocation.

A brief description of graph-coloring register allocation and some commonly used heuristics can be found in Chapter 2. Callahan and Koblenz's algorithm is detailed in Chapter 3. Experimental data and an analysis of factors that affect Callahan and Koblenz's method are presented in Chapter 4. Conclusions can be found in Chapter 5.

2 Register Allocation

Register allocation is a vital stage in compiler optimization. It determines the effectiveness of other compiler optimization techniques. Most compiler optimization techniques create temporaries that are candidates for registers. The improvement from these techniques depends on the cost of accessing those temporaries. For example, common subexpression elimination stores the result of a repeatedly computed expression in a temporary instead of computing it every time. However, if the temporary is stored in memory, it can be faster to recompute the expression than to reload the temporary from memory [8].

Graph coloring, described in Section 2.1, is the commonly used mechanism for register allocation. Since graph coloring is an NP-complete problem, heuristics are needed to find a practical solution. Several commonly used heuristics are detailed. Section 2.2 describes Chaitin's [2] method, Section 2.3 presents Briggs' [4] method, and Section 2.4 presents Chow's [3] method. Finally, the common problem with all these methods is presented in Section 2.5.

2.1 Graph Coloring Register Allocation

Register allocation is commonly treated as a graph-coloring problem. An interference graph for a function is a graph in which a node represents a value in the function and an edge represents a conflict between two nodes. Two nodes conflict with each other if the values that the nodes represent are live at the same time. As an example of an interference graph, consider the code fragment of Figure 2-1(a). A value is live at a point p if the value is defined before p and is used at some point after p. The live range of a value is the region from its definition to its last use. In the section of code shown in Figure 2-1(a), the live ranges of variables a, b and c overlap with each other. This means that these three variables cannot share the same register. Therefore, in the interference graph, an edge exists between each pair of nodes. Figure 2-1(c) shows the corresponding interference graph.
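The construction just described can be illustrated with a minimal Python sketch. It assumes straight-line code only (no branches), encodes each statement of Figure 2-1(a) as a hypothetical (defined variable, used variables) pair, and adds an edge between any two values that are live at the same program point. The names CODE, live_sets and interference_edges are illustrative and do not come from the thesis.

```python
from itertools import combinations

# Straight-line code of Figure 2-1(a).
# Each entry is (defined variable or None, list of used variables).
CODE = [
    ("a", []),          # a = 3
    ("b", []),          # b = 4
    ("c", []),          # c = 1
    ("c", ["b", "c"]),  # c = b+c
    ("a", ["a", "c"]),  # a = a*c
    (None, ["c"]),      # output c
    (None, ["a"]),      # output a
]


def live_sets(code):
    """Return the set of variables live just before each statement.

    A backward scan: a definition kills the variable, a use makes it live.
    """
    live = set()
    result = []
    for defined, used in reversed(code):
        live = (live - {defined}) | set(used)
        result.append(set(live))
    result.reverse()
    return result


def interference_edges(code):
    """Section 2.1 rule: two values conflict if they are live at the same time."""
    edges = set()
    for live in live_sets(code):
        for u, v in combinations(sorted(live), 2):
            edges.add((u, v))
    return edges


EDGES = interference_edges(CODE)
print(sorted(EDGES))
```

For this fragment the sketch yields an edge between every pair of a, b and c, which agrees with the triangle described for Figure 2-1(c).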

    a = 3
    b = 4
    c = 1
    c = b+c
    a = a*c
    output c
    output a

Panels: (a) code segment, (b) live range, (c) interference graph.
Figure 2-1 An Example Of Interference Graph

The problem of assigning these values to N registers is equivalent to coloring the interference graph using N colors so that each node has a different color from all its neighbors. A graph is N-colorable if the graph can be colored using N colors. A node with fewer than N neighbors is called a colorable node. A colorable node is guaranteed to receive a color in the coloring process. One heuristic method for graph coloring starts by removing colorable nodes from the graph [2]. A color is assigned to a colorable node. The node, along with its adjacent edges, is deleted from the graph. By deleting the colorable node from the graph, the interference graph is simplified. This can make other nodes in the graph colorable. When no node in the graph has fewer than N neighbors, the coloring process is blocked. Spilling is required in order to continue the coloring. The value that is spilled is resident in memory instead of in a register. Spill code is required to move the spilled value between memory and a register.

When all the nodes in the interference graph are colorable, it is trivial to do register allocation. However, when the interference graph is not N-colorable, spill decisions must be made. Choosing different nodes to spill can result in code with a different number of memory operations. The smaller the number of memory operations, the faster the code runs.

The graph in Figure 2-1(c) is not 2-colorable. Thus, if there are only two registers available, at least one of the variables must be spilled into memory. Spill code is then inserted to move the value to/from memory. Figure 2-2(a) shows the resulting code for spilling a and Figure 2-2(b) shows the resulting code for spilling b.

(a) code after spilling a

    a = 3
    store a
    b = 4
    c = 1
    c = b+c
    load a
    a = a*c
    store a
    output c
    load a
    output a

(b) code after spilling b

    a = 3
    b = 4
    store b
    c = 1
    load b
    c = b+c
    a = a*c
    output c
    output a

Figure 2-2 Resulting code of spilling different nodes

As shown in Figure 2-2, the resulting code is affected by the spill decision. When a is spilled, two load and two store instructions are added, resulting in a total of four memory operations. On the other hand, when b is spilled, one load and one store instruction are added, resulting in a total of two memory operations. Therefore, spilling b generates better code than spilling a.

However, the graph coloring problem is NP-complete. Therefore, it is impractical to try to find an optimal solution. Heuristics are needed to find practical solutions to the register allocation problem. The commonly used heuristics are Chaitin's method, Briggs' method and Chow's method.

2.2 Chaitin's method

In Chaitin's method, the spill decision is made based upon the spill cost of each node. Spill cost represents the use pattern of each node. A reference to a value is either a definition or a use of the value. The higher the spill cost, the more references to the node exist in a function. The more references to the node, the more beneficial it is to put the node in a register. Chaitin's method can be pictured as in Figure 2-3:

Flowchart boxes: Build the interference graph; Simplify the interference graph; Add spill code (on spilling); Color the interference graph (on no spilling).
Figure 2-3 Chaitin's Register Allocator

Chaitin's method performs register allocation iteratively. In each iteration, the interference graph is built. The graph is simplified either by removing a colorable node or by spilling. Spill code is inserted for any node that is spilled. The simplification process stops once the graph is empty. If no node is spilled during an iteration, then the register allocation process is done. Otherwise, the interference graph is rebuilt and recolored to check whether the spilling was sufficient.

Interference Graph

In Chaitin's interference graph, conflicts are calculated in a way different from the one described in Section 2.1. In Section 2.1, two nodes conflict with each other if they are live at the same time. However, in Chaitin's method, two nodes conflict with each other if one of them is live and available at the definition point of the other. The chromatic number is the minimum number of colors required to color a graph successfully. For some programs, Chaitin's method yields an interference graph with a smaller chromatic number than the method described in Section 2.1. For example, consider the code segment in Figure 2-4.

    label: a = ...
           b = ...
           if (a>b) {
               m = ...
               sum = a+m
           } else {
               n = ...
               sum = b+n
           }
           output sum

Figure 2-4 Example Code Segment for Chaitin's Interference Graph

Variables a, b, m, n and sum are live at the statement label. However, m is no longer live at the definition point of n. This is because only one branch of a conditional structure is taken during program execution. Therefore, in Chaitin's interference graph, m doesn't conflict with n. Furthermore, variables a, b, m and n do not conflict with variable sum since they are no longer live after the definition of sum. Therefore, sum can share the same register with any of them. The interference graph that results from Chaitin's method is shown in Figure 2-5.

Nodes: a, b, m, n, sum.
Figure 2-5 Chaitin's Interference Graph

After the interference graph is built, nodes that are the source and target of copy operations are coalesced if they do not conflict with each other. This is done by combining the nodes into a single node which interferes with all the nodes with which the coalesced nodes conflicted before coalescing. Therefore, once nodes are coalesced, they are assigned to the same register and the copy operation becomes unnecessary. The disadvantage is that the chromatic number of the interference graph can be higher after coalescing. Spilling may be required after coalescing even though the interference graph before coalescing is N-colorable. Therefore, coalescing is only done if doing so does not make a colorable node have degree N.

Coloring

Chaitin's allocator tries to color the interference graph using N colors, where N is the number of registers available in a machine, using the method described in Section 2.1. When the graph is not colorable, spilling is required. Spill code is added to the original program and the interference graph is rebuilt. Register allocation is repeated until the resulting code has an interference graph that is N-colorable.

Two phases exist in Chaitin's allocator: graph simplification and graph coloring. Chaitin's allocator simplifies the interference graph using the following two steps. These two steps are repeated until the graph is empty.

1. Remove nodes with fewer than N neighbors from the graph along with their edges. Push these nodes onto a stack for coloring.
2. If no node with fewer than N neighbors exists in the graph, pick a node with the lowest cost/degree ratio to spill. Remove the node and its adjacent edges from the graph.

Colorable nodes are removed from the graph in a random order. When the simplification process is blocked, a node is picked to spill and the spill code is added to the program. Spill code consists of load/store instructions that move a value between memory and a register. Therefore, even if a value is spilled, a register is required to load the value from memory. Since spilling involves new register use, the interference graph must be rebuilt and colored to check whether more spilling is needed. This process is repeated until no more values need to be spilled.
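The two simplification steps above can be sketched in Python as follows. This is a simplified illustration and not Chaitin's full allocator: it performs one simplify/select round and omits inserting spill code and rebuilding the graph. The function names, the deterministic tie-breaking (the text says colorable nodes are removed in random order) and the unweighted reference counts used as spill costs are all assumptions made for the example.

```python
def chaitin_simplify(graph, cost, n_colors):
    """One simplification round.

    graph: dict mapping node -> set of neighbors.
    cost:  dict mapping node -> spill cost.
    Returns (stack of removed colorable nodes, list of spilled nodes).
    """
    g = {v: set(ns) for v, ns in graph.items()}  # working copy
    stack, spilled = [], []
    while g:
        colorable = [v for v in g if len(g[v]) < n_colors]
        if colorable:
            v = min(colorable)  # deterministic pick, for the sketch only
            stack.append(v)
        else:
            # Blocked: spill the node with the lowest cost/degree ratio.
            v = min(g, key=lambda x: (cost[x] / len(g[x]), x))
            spilled.append(v)
        for u in g.pop(v):
            g[u].discard(v)
    return stack, spilled


def select_colors(graph, stack, n_colors):
    """Color nodes in reverse order of removal; spilled nodes get no color."""
    colors = {}
    for v in reversed(stack):
        used = {colors[u] for u in graph[v] if u in colors}
        colors[v] = next(c for c in range(n_colors) if c not in used)
    return colors


# Interference graph of Figure 2-1(c) with two registers. The costs are
# plain reference counts taken from Figure 2-1(a), an assumed cost model.
TRIANGLE = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
COST = {"a": 4, "b": 2, "c": 5}

STACK, SPILLED = chaitin_simplify(TRIANGLE, COST, 2)
COLORS = select_colors(TRIANGLE, STACK, 2)
print(SPILLED, COLORS)
```

Under these assumed costs the sketch spills b, which matches the observation in Section 2.1 that spilling b produces fewer memory operations than spilling a.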
When the simplification process is blocked, a heuristic is needed to decide which node to spill. In Chaitin's method, each node is associated with a spill cost. The spill cost equals the number of definitions and uses weighted by the nesting level. The degree of a node equals the number of edges associated with the node in the interference graph. When the coloring process reaches a point where spilling is necessary, Chaitin's method spills a node with the lowest ratio of spill cost to the node's current degree. These are nodes that are referenced less frequently, so fewer load/store instructions are required after spilling. Also, these nodes have a high degree. The interference graph is more likely to be N-colorable without further spilling if nodes with a high degree are removed from the graph.

After the graph is simplified without spilling, the coloring process starts. Nodes are added back to the interference graph in the reverse order of deletion. Each node is assigned a color that is different from all its neighbors.

Problems with Chaitin's method

The spill decision of Chaitin's method is too pessimistic. In Chaitin's method, a node can be spilled when N of its neighbors are colored even if two neighbors have the same color. For example, consider the graph in Figure 2-6. The graph can be easily colored using two colors. However, using Chaitin's method, the graph is not 2-colorable. No node with a degree less than two exists in the graph. Therefore, a node is spilled in order to continue the coloring process.

Figure 2-6 2-colorable Interference Graph

Another problem with Chaitin's method is spilling everywhere. Spilling everywhere can cause more memory operations than necessary. In spilling everywhere, once a value is spilled, it is loaded into a register from memory before each use and stored back to memory after each definition. However, in some portions of a program, the register pressure can be so low that the spilled node can be assigned to a register. In this way, the number of load/store instructions is lowered in the resulting code.

For example, consider the section of code shown in Figure 2-7.

    stmt 1: a = 3
    stmt 2: b = 4
    stmt 3: c = 1
    stmt 4: c = b+c
    stmt 5: b = b*c
    stmt 6: output b
    stmt 7: c = a*c
    stmt 8: output c
    stmt 9: output a

Panels: Code Segment, Live Range, Interference Graph (over a, b and c).
Figure 2-7 Code Segment

As shown in Figure 2-7, the interference graph is not 2-colorable. The spill costs of a, b and c are equal to 3, 4 and 7, respectively. If the graph is colored using two registers, node a is spilled. If a is spilled everywhere, then a is in memory throughout its live range. A load instruction is added before each use and a store instruction is added after each definition. The resulting code after spilling everywhere is shown in Figure 2-8(a). A closer look at the code in Figure 2-7 shows that variable b is no longer live after stmt 6. Only two variables, a and c, are used in stmts 7, 8 and 9. Both variables a and c can be in registers after stmt 6. The resulting code after spilling is shown in Figure 2-8(b). One load instruction is deleted compared to spilling everywhere.

(a) resulting code after spill everywhere

    a(r1) = 3
    store a(r1)
    b(r1) = 4
    c(r2) = 1
    c(r2) = b(r1)+c(r2)
    b(r1) = b(r1)*c(r2)
    output b(r1)
    load a(r1)
    c(r2) = a(r1)*c(r2)
    output c(r2)
    load a(r1)      <-- redundant since a is in r1 already
    output a(r1)

(b) a better way of adding spill code

    a(r1) = 3
    store a(r1)
    b(r1) = 4
    c(r2) = 1
    c(r2) = b(r1)+c(r2)
    b(r1) = b(r1)*c(r2)
    output b(r1)
    load a(r1)
    c(r2) = a(r1)*c(r2)
    output c(r2)
    output a(r1)

Figure 2-8 Resulting Code After Spilling
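The spill-everywhere rule (a load before each use, a store after each definition) can be sketched in a few lines of Python. The statement encoding as (text, defined variables, used variables) and the function name spill_everywhere are assumptions made for illustration; register assignment is left out, so the output corresponds to Figure 2-8(a) without the (r1)/(r2) annotations.

```python
def spill_everywhere(code, var):
    """Insert spill code for `var`: load before each use, store after each definition.

    code: list of (text, defined variables, used variables) tuples.
    Returns the new list of instruction strings.
    """
    out = []
    for text, defs, uses in code:
        if var in uses:
            out.append(f"load {var}")
        out.append(text)
        if var in defs:
            out.append(f"store {var}")
    return out


# The code segment of Figure 2-7.
FIG_2_7 = [
    ("a = 3", ["a"], []),
    ("b = 4", ["b"], []),
    ("c = 1", ["c"], []),
    ("c = b+c", ["c"], ["b", "c"]),
    ("b = b*c", ["b"], ["b", "c"]),
    ("output b", [], ["b"]),
    ("c = a*c", ["c"], ["a", "c"]),
    ("output c", [], ["c"]),
    ("output a", [], ["a"]),
]

SPILLED_CODE = spill_everywhere(FIG_2_7, "a")
for line in SPILLED_CODE:
    print(line)
```

Spilling a this way adds one store and two loads. The second load, before "output a", is the one Figure 2-8(b) avoids by keeping a in a register once b is dead.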

2.3 Briggs' Method

Briggs' method is an improvement over Chaitin's method. In Chaitin's method, nodes with N colored neighbors are spilled during the graph simplification stage. The assumption behind this is that once N neighbors of a node are colored, no color is left for this node. However, this is not always true. If two neighbors are assigned the same color, then there is a color left for the node. In this case, spilling is unnecessary.

Briggs made an improvement by delaying the spill decision. In Briggs' method, during graph simplification, nodes are simply removed from the interference graph in increasing order of the cost/degree ratio. No node is spilled even if it is not colorable. After all the nodes are removed from the interference graph, nodes are added back to the graph in the reverse order of deletion. A node is colored so that it has a different color from all its colored neighbors. If no such color is available, then the node is spilled and spill code is added. In this way, the algorithm can check whether a color is available for a node before spilling.

Briggs' method uses spill cost to determine the order in which nodes are removed from the interference graph. Spill cost determines which node is more cost effective to keep in a register. Nodes are removed from the interference graph and pushed onto a stack in increasing order of cost/degree ratio. Therefore, more expensive nodes are on the top of the stack. This ensures that more expensive nodes are likely to be colored sooner than less expensive nodes. Nodes with fewer than N neighbors are removed from the graph in a random order. When no such node exists, a node with the lowest ratio of spill cost to its current degree is removed from the graph without spilling. The spill cost for each node is calculated in the same way as in Chaitin's method. In this way, when spilling is required in Briggs' method, the same value is spilled as in Chaitin's method. Therefore, Briggs' method spills a subset of the nodes that are spilled in Chaitin's method. Figure 2-9 shows how Briggs' method works.
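The delayed spill decision can be sketched in Python as below. This is a simplified single round, not the full allocator: spill code insertion and graph rebuilding are omitted. The example graph is a four-node cycle, which is an assumption about the shape of Figure 2-6 (the figure itself is not reproduced in this text); the function name, the equal costs and the deterministic tie-breaking are likewise illustrative.

```python
def briggs_color(graph, cost, n_colors):
    """Optimistic coloring: push every node, decide on spilling only at select time.

    graph: dict mapping node -> set of neighbors.
    cost:  dict mapping node -> spill cost.
    Returns (colors, spilled, removal_order).
    """
    g = {v: set(ns) for v, ns in graph.items()}  # working copy
    stack = []
    while g:
        colorable = [v for v in g if len(g[v]) < n_colors]
        if colorable:
            v = min(colorable)  # deterministic pick, for the sketch only
        else:
            # Blocked: remove the lowest cost/degree node, but do NOT spill it yet.
            v = min(g, key=lambda x: (cost[x] / len(g[x]), x))
        stack.append(v)
        for u in g.pop(v):
            g[u].discard(v)

    order = list(stack)
    colors, spilled = {}, []
    while stack:
        v = stack.pop()
        used = {colors[u] for u in graph[v] if u in colors}
        free = [c for c in range(n_colors) if c not in used]
        if free:
            colors[v] = free[0]
        else:
            spilled.append(v)  # only now is the node actually spilled
    return colors, spilled, order


# Assumed shape of Figure 2-6: a cycle 1-2-4-3-1, every node of degree 2.
CYCLE = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
EQUAL_COST = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

COLORS, SPILLED, ORDER = briggs_color(CYCLE, EQUAL_COST, 2)
print(ORDER, COLORS, SPILLED)
```

With two colors no node of the cycle is colorable at the start, so a Chaitin-style simplification would spill one. Here node 1 is only pushed, and at select time its two neighbors turn out to share a color, so nothing is spilled.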

Flowchart boxes: Build the interference graph; Simplify the interference graph; Color the interference graph; Add spill code (on spilling); done (on no spill).
Figure 2-9 Briggs' Register Allocator

For the graph in Figure 2-6, using Chaitin's method, a node is spilled in order to color the graph using two colors. Assuming node 1 is picked, Chaitin's method spills the node to memory. Spill code is added to the original code and the node is removed from the interference graph along with its adjacent edges. This makes the remaining graph 2-colorable, and nodes 2, 3 and 4 can be removed from the graph. Therefore, using Chaitin's method, spilling is required.

The same interference graph is 2-colorable using Briggs' method. Since Briggs' method removes the nodes in the same order as Chaitin's method, nodes are removed and pushed onto the stack in the order 1, 2, 3, 4. Even though node 1 is not colorable, it is not spilled at this point. Therefore, no spill code is added. After all the nodes are removed from the graph, the nodes are popped from the stack and appropriate colors are assigned to each node. If no suitable color is found for a node, the node is spilled to memory and spill code is added. Using this method, the interference graph in Figure 2-6 can be colored using two colors without spilling. The result of Briggs' method is shown in Figure 2-10.

Node colors: r1, r2, r2, r1.
Figure 2-10 The Result of Coloring Using Briggs' Method

Although Briggs' method makes better spill decisions than Chaitin's method, it still suffers from the problem of spilling everywhere.

2.4 Chow's Method

Chow's method overcomes the spill-everywhere problem from which Chaitin's and Briggs' methods suffer. In Chow's method, the live range is computed at the basic block level. Each live range is a candidate for a register. In register allocation, a live range can be split into several smaller live ranges. Each smaller live range can be assigned to a different register or to memory. Also, in Chow's method, register allocation is done in a single pass. This is done by reserving a number of registers to handle the accesses to variables that are spilled. With no backtracking, register allocation can be performed faster. However, reserving registers can result in more spilling than necessary.

There are two major differences between Chow's method and Chaitin's method. First, Chow's method uses a coarser interference graph than Chaitin's method. In Chaitin's method, the conflicts are calculated at the instruction level, whereas in Chow's method, the conflicts are calculated at the basic block level. Using Chow's method, an interference graph with a higher chromatic number may result. For example, consider the section of code in Figure 2-11(a). Chaitin's interference graph for this section of code is shown in Figure 2-11(b). Chow's interference graph for the same section of code is shown in Figure 2-11(c). Variables a, b and c are live in the same basic block. Therefore, they interfere with each other in Chow's interference graph, which makes the graph not 2-colorable. In contrast, under Chaitin's method, which computes interference at the instruction level, variable a is no longer live after the definition point of c. Therefore, in Chaitin's interference graph, there is no edge between nodes a and c and the graph is 2-colorable.
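The difference in granularity can be made concrete with a small Python sketch for the basic block of Figure 2-11(a). Both functions are simplified stand-ins written for this illustration, not the published algorithms: the instruction-level version adds an edge between a defined value and every other value live just after the definition, and the block-level version treats every value referenced in the block as interfering with every other. The right-hand sides of the first two statements are elided in the figure and are modelled here as definitions with no uses.

```python
from itertools import combinations

# Basic block of Figure 2-11(a): (defined variable or None, used variables).
BLOCK = [
    ("a", []),          # a = (right-hand side elided in the figure)
    ("b", []),          # b = (right-hand side elided in the figure)
    ("c", ["b", "a"]),  # c = b+a
    (None, ["c"]),      # output c
    (None, ["b"]),      # output b
]


def instruction_level_edges(code):
    """Chaitin-style rule: a definition conflicts with what is live right after it."""
    edges, live = set(), set()
    for defined, used in reversed(code):
        if defined is not None:
            for other in live - {defined}:
                edges.add(tuple(sorted((defined, other))))
        live = (live - {defined}) | set(used)
    return edges


def block_level_edges(code):
    """Coarse rule: all values that appear in the same basic block conflict."""
    names = set()
    for defined, used in code:
        names.update(used)
        if defined is not None:
            names.add(defined)
    return set(combinations(sorted(names), 2))


FINE = instruction_level_edges(BLOCK)
COARSE = block_level_edges(BLOCK)
print(sorted(FINE))
print(sorted(COARSE))
```

The instruction-level graph has no edge between a and c and is a simple path, so two colors suffice. The block-level graph is a triangle and needs three.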

    a =
    b =
    c = b+a
    output c
    output b

Panels: (a) code in a basic block, (b) Chaitin's Interference Graph, (c) Chow's Interference Graph.
Figure 2-11 Difference Between Chow's and Chaitin's Interference Graph

Another difference between Chow's and Chaitin's methods is that Chaitin's method never splits a live range. When the allocation process is blocked, a variable is selected to spill. Instead of spilling, Chow's method splits the live range that cannot be colored. By splitting, the node is divided into several smaller live ranges. These smaller live ranges usually interfere with fewer nodes in the graph. This may allow the graph to be colored without spilling to memory, or at least with only part of the live range spilled to memory. The effect of live range splitting is shown in Figure 2-12.

Panels: (a) live range before spilling, (b) live range and interference graph after splitting.
Figure 2-12 Example of the Effectiveness of Live Range Splitting

As shown in Figure 2-12(a), variable a interferes with variables b and c. Therefore, the corresponding interference graph is not 2-colorable. If variable a is spilled, a is resident in memory throughout its live range. However, if a is split into two smaller live ranges, part of a's live range can be in a register. Therefore, fewer memory operations are needed.

Chow's live-range splitting technique separates a portion of a live range that is as large as possible to avoid creating too many small live ranges. However, this can end up splitting a live range inside a loop, even though it is possible to split the node outside the loop. For example, with the live ranges shown in Figure 2-13(a), Chow's method keeps splitting live range a as long as it is possible, which ends up splitting a inside the loop as shown in Figure 2-13(b). A register-to-register move instruction or spill code is added inside the loop, which is executed more frequently than other portions of the program. A better way to split a is to split it outside the loop, as shown in Figure 2-13(c). The problem of splitting inside a loop results from the fact that no information about program structure is encoded into the interference graph. Therefore, there is no way to detect the loop boundary when splitting.

Panels: (a) live ranges, (b) split a with Chow's method, (c) better way of splitting a. a1 and a2 represent different live ranges after splitting a.
Figure 2-13 Live Range Splitting

2.5 The Problem

The problem with graph coloring register allocation results from its lack of ways to encode program structure information in the interference graph. Although spill cost gives values inside a loop higher priority, values in a conditional structure are still treated equally with values outside a conditional structure. Spill cost ignores the fact that some branches of a conditional structure are less likely to be executed than other portions of a program. When spilling is required, spilling a node in a conditional structure may result in fewer executed memory operations. For example, consider the control flow graph in Figure 2-14. Using Chaitin's and Briggs' methods, node a spills and a load instruction is added in block 0, which is executed every time the program runs. A better solution is to spill node b, which results in adding two load instructions. The probability that the load instructions are executed is only 25%. Therefore, the number of memory operations at execution time may be reduced.

Control flow graph: Block 0 contains "... = a"; Block 1 and Block 2 each contain "... = b" and are reached over edges labeled 50%; Block 3 follows.
Figure 2-14 Control Flow Graph

Also, in Figure 2-15, node b is used but not defined in a loop. If b is spilled using Chaitin's or Briggs' method, a load b instruction is added inside the loop, where it is executed n times in one execution. Since b is never defined in the loop, it is better to load b before entering the loop. In this way, the load instruction is executed only once.

Figure 2-15 Adding Spill Code for Loop Structure: (a) control flow graph; (b) Chaitin's method for adding spill code; (c) a better way of adding spill code.

The examples discussed above suggest that program structure can help to make better spill decisions and to find a better place to insert spill code. However, it has not yet been shown on real programs that it is beneficial to encode program structure information in the register allocation process.

3 Callahan and Koblenz's Algorithm

Callahan and Koblenz's algorithm uses a tree structure called a tile tree to represent program structure. The algorithm performs register allocation in two passes. In the first pass, the tile tree is visited in a bottom-up fashion, and an interference graph is built for each tile and colored using pseudo-registers. Local spill decisions are made and register allocation information is summarized for each tile. The tile's register allocation summary is passed to its parent. In the second pass, the tile tree is visited in a top-down fashion to update spill decisions and map pseudo-registers to physical registers. Using the tile structure, program sections with different execution frequencies, such as a loop or a conditional structure, can be separated. In this way, the program structure can be encoded in the register allocation process.

3.1 Tile Tree

Before performing the two passes of register allocation, a tile tree is built for each function. A tile is a set of basic blocks that represents a loop or a conditional structure of a program. A tile tree is valid if any two tiles in the tile tree are either disjoint or one is a subset of the other. If tile t1 is a subset of tile t2 and there is no tile t that satisfies $t_1 \subset t \subset t_2$, then t2 is the parent of t1. blocks(t) is the set of basic blocks that belong to tile t but do not belong to any child of t. An entry edge of tile t is an edge <n,m>, where $n \in blocks(parent(t))$ and m belongs to tile t. An exit edge of tile t is an edge <m,n>, where m belongs to tile t and $n \in blocks(parent(t))$. The root of the tile tree is the tile t that satisfies $\{start, end\} \subseteq blocks(t)$, where start is the basic block with no predecessor and end is the basic block with no successor.

There are many different tile trees that can be built for a function. The main purpose of building a tile tree is to separate loop structures or conditional structures from other portions of a function. Therefore, in this thesis, a tile tree is built based on a control flow graph (CFG) using the following steps:

Step 1. Start with a tile that includes all basic blocks in the CFG. This is the root of the tile tree.

Step 2. Identify loops in the program. Basic block A dominates basic block B if every path from the start block to B passes through A. Basic block A post-dominates basic block B if every path from B to the end block passes through A. A loop head is the entry to a loop. A loop tail is the exit from a loop. If the loop head does not dominate all basic blocks in a loop, then a new loop head is formed so that the loop has a single entry. Similarly, if the loop tail does not post-dominate all basic blocks in a loop, then a new loop tail is formed so that the loop has a single exit. Each loop is considered a tile by itself.

Step 3. Recognize conditional structures inside a loop L. Loops that are strictly contained in L are each coalesced into a single basic block. Basic blocks in L are divided into several groups using dominator and post-dominator relations. In each group $g = \{b_n, \ldots, b_i, b_{i+1}, \ldots, b_m\}$, $b_i$ dominates $b_{i+1}$ and is post-dominated by $b_{i+1}$. If the basic blocks in g are directly connected, g is a tile. Otherwise, g is blocks(L).

Step 4. Once tiles are recognized, a tile tree is built based on the parent and child relationship. Program structure is represented hierarchically by the tile tree. Inner loops and conditional structures are at the bottom of the tile tree, whereas outer loops are at the top. A new basic block is added on each entry and exit edge of each tile.

For example, consider the control flow graph in Figure 3-1.

Figure 3-1 Control Flow Graph (basic blocks B1 to B6)

Using the steps described above, the root tile t0 = {B1, B2, B3, B4, B5, B6} is built first. Then the loop {B2, B3, B4, B5} is recognized as tile t1. In t1, basic blocks are divided into three groups by dominator and post-dominator relations. The three groups are g1 = {B3}, g2 = {B4} and g3 = {B2, B5}. Groups g1 and g2 become tiles t2 and t3, since only one basic block exists in each group. Group g3 is blocks(t1), since B2 and B5 are not directly connected. New basic blocks NB0 to NB5 are added along the entry and exit edges of each tile. The resulting tile tree is shown in Figure 3-2.
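The parent relation and blocks(t) from Section 3.1 can be checked against this example with a short sketch. The function name and the dictionary representation are assumptions made for illustration, and the sketch assumes that no two tiles contain exactly the same set of blocks.

```python
def build_tile_tree(tiles):
    """tiles maps a tile name to its set of basic blocks.

    Returns (parent, blocks): parent[t] is the smallest tile that strictly
    contains t (None for the root), and blocks[t] holds the basic blocks
    of t that belong to none of t's children.
    """
    names = list(tiles)
    # Validity: any two tiles are disjoint or one is a subset of the other.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ta, tb = tiles[a], tiles[b]
            if not (ta.isdisjoint(tb) or ta <= tb or tb <= ta):
                raise ValueError(f"tiles {a} and {b} overlap partially")
    parent = {}
    for t in names:
        supersets = [u for u in names if tiles[t] < tiles[u]]
        # The parent is the superset with no other tile in between,
        # which is the smallest strict superset in a valid tile tree.
        parent[t] = min(supersets, key=lambda u: len(tiles[u])) if supersets else None
    blocks = {}
    for t in names:
        own = set(tiles[t])
        for c in names:
            if parent[c] == t:
                own -= tiles[c]
        blocks[t] = own
    return parent, blocks


tiles = {
    "t0": {"B1", "B2", "B3", "B4", "B5", "B6"},
    "t1": {"B2", "B3", "B4", "B5"},
    "t2": {"B3"},
    "t3": {"B4"},
}
parent, blocks = build_tile_tree(tiles)
print(parent)  # {'t0': None, 't1': 't0', 't2': 't1', 't3': 't1'}
```

For the example tiles, blocks(t1) comes out as {B2, B5} and blocks(t0) as {B1, B6}.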

Figure 3-2 Tiles and Tile Tree
Tiles: t0 = {B1, B2, B3, B4, B5, B6}; t1 = {B2, B3, B4, B5}; t2 = {B3}; t3 = {B4}
Tile tree: t0 is the parent of t1; t1 is the parent of t2 and t3
Resulting CFG after adding new basic blocks: B1 to B6 plus NB0 to NB5

3.2 The First Phase

In the first phase of register allocation, a tile tree is visited in a bottom-up fashion. An interference graph is built for each tile and colored using pseudo-registers. The number of pseudo-registers equals the number of registers the target machine provides. The register allocation information is summarized and passed up to the tile's parent. Figure 3-3 shows the outline of the first pass of register allocation.

first_pass(tile t)
    for each child s of t do
        first_pass(s)
    end do
    build_interference_graph for t based on information from t and t's children
    color the interference graph
    summarize tile register allocation information for the parent tile of t

Figure 3-3 Outline of the first pass of tile register allocation

Tile Allocation Summary

Variables are divided into two groups with respect to each tile. Local variables are variables that are live only inside a tile. Global variables are variables that are live along any entry or exit edge of a tile. After a tile is visited in the first pass, each variable that is used in the tile is either assigned to a register or spilled to memory. Local register allocation information is summarized and passed to the tile's parent. A tile summary variable is created for each register that is assigned to a local variable. Local variables assigned to the same register are represented by the tile summary variable associated with that register. Information for each local variable is passed to a tile's parent through tile summary variables. Using tile summaries, local variables allocated to the same register are coalesced. Passing information about tile summary variables, therefore, is more efficient than passing information for each local variable.

For each global variable that is assigned to a register in a tile, two types of conflicts are passed to the tile's parent. First, the conflicts between a global variable and a tile summary variable are passed. A conflict exists between a global variable g and a tile summary variable t if g conflicts with any local variable that is contained in t. Second, conflicts between global variables that are in registers in the current tile are passed.

Tile Interference Graph

An interference graph is built for each tile in the first pass. There are two types of variables in the interference graph: variables that are referenced in the current tile, and the tile summary variables from its child tiles. The first type represents the usage of values in the current tile. The second type represents the usage of variables that are local to the current tile's children. Variables that are live across the current tile but never referenced in the current tile are not included in the current tile's interference graph during the first pass. These values are the best candidates to spill when spilling is required. An edge exists between two nodes if they conflict with each other. Conflicts are calculated based on the following:

1. Two values conflict with each other if they conflict in a basic block in blocks(t). This is the typical conflict of a tile without considering its child tiles' allocation information. In Chaitin's method, it is sufficient to conflict a variable that is defined in a statement p only with the variables that are live and available at p. However, using tile register allocation, the definition point of a live variable may not be included in a tile.
Therefore, conflicts for a basic block in blocks(t) are computed as follows:
   - conflict a variable defined at p with all variables that are live at p and referenced in blocks(t)
   - conflict a variable used at p with all variables that are live at p and referenced in blocks(t)
   - conflict live variables that are referenced in blocks(t)
   By including conflicts between live variables, no conflict is missed, yet conflicts are not overestimated because these live variables are restricted to those referenced in blocks(t).

2. A node corresponding to a tile summary variable conflicts with other tile summary variables from the same subtile. Tile summary variables, however, do not conflict with tile summary variables from sibling tiles. This is because tile summary variables represent local variables, and the live ranges of local variables from sibling tiles never overlap.

3. For each subtile, global variables conflict with tile summary variables as indicated in the conflict summary for each tile.

4. A variable that is spilled in a subtile but live in the current tile conflicts with all tile summary variables from that subtile.

5. A variable that is live across a subtile but is not used or defined in the subtile conflicts with all the tile summary variables from that subtile.

Coloring and Preferencing

In Callahan and Koblenz's algorithm, the tile interference graph is colored in one pass with no backtracking. Therefore, registers are reserved to handle access to the spilled variables. Preferencing is used to meet the machine linkage conventions and to minimize the number of register-to-register move operations. In some machine conventions, certain variables have to be placed in particular registers, such as parameters passed to a function or values returned from a function. Specific registers are set as the preferencing for those variables. Also, since register allocation is performed for each tile locally, it is possible that the same variable is assigned to a different register in a different tile. In this case, a register-to-register move instruction is required to preserve the program semantics. In order to minimize the number of move instructions, each node in the interference graph is associated with a preferencing. The preferencing records the registers the value is assigned in other tiles.

The interference graph can be colored using any standard method. In this thesis, Briggs' method is used. Some changes have been made in order to satisfy preferencing. When coloring a node that has preferencing, a preferred register is assigned to the node if available. If the register is not available, the preferencing is ignored. When coloring a node without preferencing, a color is assigned that is different from the colors of all of its colored neighbors as well as from the preferencing of all of its uncolored neighbors. If no such color is available, a color that is not the same as that of any of its colored neighbors is assigned to the node.

3.3 The Second Pass

The second pass starts after the root tile is visited in the first pass. In the second pass, the tile tree is visited in a top-down fashion. An interference graph is rebuilt for each tile. The interference graph is built in the same way as in the first pass, with one exception. In the second pass, nodes that are live across a tile but are never referenced in it are included in the interference graph if they are in a register in the parent tile. Those variables conflict with all the other nodes in the interference graph. The interference graph is colored in the same way as in the first pass. Figure 3-4 shows the outline of the second pass of register allocation.

second_pass(t)
    rebuild_interference_graph for t based on information from t, t's children and t's parent tile
    color the interference graph
    color spilled values and add spill code
    save register allocation information for child tiles of t
    for each child s of t do
        second_pass(s)
    end do

Figure 3-4 Outline of the second pass of tile register allocation
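Since both passes color the graph in the same way, the preference-aware color choice described under Coloring and Preferencing applies to each of them. The following is a minimal sketch of that choice for a single node, not the thesis implementation: the function name, the dictionary layout, and the integer register numbering are assumptions, and a node whose preferred registers are all unavailable is assumed to fall back to the rule for nodes without preferencing.

```python
def choose_color(node, neighbors, color, preference, k):
    """Pick a register in range(k) for node, or None if no color is free.

    neighbors:  list of nodes adjacent to node in the interference graph
    color:      dict mapping already colored nodes to their register
    preference: dict mapping a node to its set of preferred registers
    """
    taken = {color[n] for n in neighbors if n in color}
    free = [r for r in range(k) if r not in taken]
    if not free:
        return None  # the caller would treat the node as spilled
    # A node with preferencing takes a preferred register when one is free.
    for r in sorted(preference.get(node, ())):
        if r in free:
            return r
    # Otherwise avoid registers that uncolored neighbors would prefer.
    wanted = set()
    for n in neighbors:
        if n not in color:
            wanted |= set(preference.get(n, ()))
    for r in free:
        if r not in wanted:
            return r
    # No such register: any color unused by colored neighbors will do.
    return free[0]


# x has no preference; neighbor y holds register 0; uncolored neighbor z prefers 1.
print(choose_color("x", ["y", "z"], {"y": 0}, {"z": {1}}, 3))  # 2
```

Leaving register 1 free in the usage example gives z a chance to receive its preferred register later, which is the point of the second rule.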

Coloring Spilled Nodes

Registers are needed for moving spilled values between memory and the registers. In Callahan and Koblenz's method, a number of registers are reserved for accessing the spilled values. Spilled nodes are colored using the reserved registers. Spilled values are colored in such a way that two spilled values cannot be loaded into the same register at the same time. This is done by ensuring that two spilled values used in the same statement are not assigned the same color. For each statement p, if a spilled node is not colored, a reserved color is assigned to the node. If a spilled node is already colored, the color is checked to see whether it is the same as that of another spilled node used in statement p. If so, the node is assigned a different color in p.

Adding Spill Code

After the second pass, the final spill decision is made for each tile. New basic blocks are added along the entry and exit edges of each tile. Spill code is added to the new basic blocks. There are four different results of register allocation.

1. If a variable v is spilled in tile t but resident in a register in t's parent, then a store instruction is added on each entry edge of t where v is live, and a load instruction is added on each exit edge of t where v is live.

2. If a variable v is in a register in tile t but spilled in t's parent, then a load instruction is added on each entry edge of t where v is live, and a store instruction is added on each exit edge of t where v is live. If v is never redefined in tile t, the store instruction is redundant, because the value of v is already in memory and never changes.

3. If a variable v is assigned to a register in tile t and is assigned to a different register in t's parent, register move instructions are inserted on the entry and exit edges where v is live.

4. If a variable v is spilled both in tile t and in t's parent, no instructions are needed on t's boundary, since both tiles spill the variable to the same memory location. A load instruction is added before each use of v and a store instruction is added after each definition of v in blocks(t).
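The four cases above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the use of None to mean "spilled", and the textual instruction format are all hypothetical, and the move instructions follow the destination-first operand order used in this chapter's figures.

```python
def boundary_code(v, in_tile, in_parent, redefined_in_tile=True):
    """Instructions for the entry and exit edges of a tile where v is live.

    in_tile / in_parent: register name such as "r4", or None if v is spilled
    there. Returns (entry_code, exit_code) as lists of instruction strings.
    """
    if in_tile is None and in_parent is None:
        return [], []  # case 4: same memory location on both sides
    if in_tile is None:
        # case 1: spilled in the tile, in a register in the parent
        return [f"store {in_parent}, mem[{v}]"], [f"load {in_parent}, mem[{v}]"]
    if in_parent is None:
        # case 2: the store is redundant when v is never redefined in the tile
        exit_code = [f"store {in_tile}, mem[{v}]"] if redefined_in_tile else []
        return [f"load {in_tile}, mem[{v}]"], exit_code
    if in_tile != in_parent:
        # case 3: different registers, so move in on entry and back on exit
        return [f"mov {in_tile}, {in_parent}"], [f"mov {in_parent}, {in_tile}"]
    return [], []  # same register in both tiles: nothing to do


print(boundary_code("a", "r4", "r5"))  # (['mov r4, r5'], ['mov r5, r4'])
```

The loads and stores that case 4 places around individual uses and definitions inside blocks(t) are outside the scope of this function, which covers only the tile boundary.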

Code that is added to the new basic blocks needs to be ordered. Store and move-from instructions should precede load and move-to instructions. For example, assume variable a is assigned to register r4 in tile t and r5 in t's parent tile. A move instruction from r5 to r4 is added along t's entry edge and a move instruction from r4 to r5 is added along t's exit edges. Suppose another variable b is assigned to register r5 in tile t and r6 in t's parent tile. Then a move instruction from r6 to r5 is needed for b along t's entry edge and a move instruction from r5 to r6 is needed along t's exit edge. In order to ensure program correctness, the move instructions should be ordered as follows (the destination register is written first):

t's entry:   mov r4, r5
             mov r5, r6
tile t
t's exit:    mov r6, r5
             mov r5, r4

Move-from-r5 instructions precede move-to-r5 instructions.

Figure 3-5 An example of move-from preceding move-to

Sometimes, however, a move loop can exist. In this case, an unused register can be used to break the move loop. If no such register exists, then a node needs to be spilled. For example, variable a is assigned to register r4 in tile t and r5 in t's child tile s. Variable b is assigned to register r5 in tile t and r4 in t's child tile s. On the entry and exit edges of s, a move loop exists between registers r4 and r5.

A move loop:
             mov r4, r5
             mov r5, r4
Break the loop by using unused register r6:
             mov r6, r5
             mov r5, r4
             mov r4, r6
Break the loop by spilling b (r5):
             store r5 to memory address p
             mov r5, r4
             load from p to r4

Figure 3-6 An example of breaking move loops
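The ordering rule and the loop-breaking step can be sketched as a routine that turns a set of simultaneous moves into a safe sequence. This is an illustrative sketch rather than the thesis implementation: the function name and tuple format are assumptions, moves are written destination first as in the figures, and the scratch register is assumed not to appear in any of the moves. The spilling fallback is not modeled.

```python
def order_moves(moves, scratch=None):
    """Order simultaneous register moves so no source is overwritten early.

    moves:   list of (dst, src) register pairs, each dst appearing once
    scratch: an unused register for breaking move loops, or None
    Returns a list of ("mov", dst, src) tuples.
    """
    pending = {dst: src for dst, src in moves if dst != src}
    out = []
    while pending:
        # A move is safe once no pending move still reads its destination.
        ready = [d for d in pending if d not in pending.values()]
        if ready:
            d = ready[0]
            out.append(("mov", d, pending.pop(d)))
            continue
        # Every remaining destination is also a source: a move loop.
        if scratch is None:
            raise ValueError("move loop with no unused register; spill needed")
        d = next(iter(pending))
        out.append(("mov", scratch, d))  # save the value about to be overwritten
        for dst, src in list(pending.items()):
            if src == d:
                pending[dst] = scratch  # readers of d now read the saved copy
    return out


# Entry edge of Figure 3-5: a moves r5 -> r4, b moves r6 -> r5.
print(order_moves([("r5", "r6"), ("r4", "r5")]))
# [('mov', 'r4', 'r5'), ('mov', 'r5', 'r6')]

# Move loop of Figure 3-6, broken with unused register r6.
print(order_moves([("r5", "r4"), ("r4", "r5")], scratch="r6"))
# [('mov', 'r6', 'r5'), ('mov', 'r5', 'r4'), ('mov', 'r4', 'r6')]
```

The scratch register can be reused for several loops, because the routine only looks for a new loop after every move that reads the saved copy has been emitted.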

3.4 Spill Cost and Spill Decisions

The following equations are used to determine which nodes deserve to be in registers.

$$LocalWeight_t(v) = \sum_{b} prob(b) \cdot ref_b(v)$$
$$Transfer_t(v) = \sum_{e} prob(e) \cdot live_e(v)$$
$$Reg_t(v) = reg_t(v) \cdot \min(Transfer_t(v), Weight_t(v))$$
$$Mem_t(v) = mem_t(v) \cdot Transfer_t(v)$$
$$Weight_t(v) = \sum_{s} (Reg_s(v) - Mem_s(v)) + LocalWeight_t(v)$$

t - the current tile
b - all the basic blocks in blocks(t)
e - entry and exit edges of t
s - t's child tiles
prob(b) - the probability of b being executed
prob(e) - the probability of e being executed
ref_b(v) - the number of references to v in basic block b
live_e(v) - 1 if v is live along the edge e, otherwise 0
reg_t(v) - 1 if v is in a register in t, otherwise 0
mem_t(v) - 1 if v is in memory in t, otherwise 0

Figure 3-7 Formulas for calculating spill cost

LocalWeight represents the typical use pattern of value v in blocks(t) without considering information from t's child tiles. LocalWeight of tile summary variables from t's child tiles equals 0. Transfer calculates the cost of loading and storing v along t's entry and exit edges. Reg is the cost of allocating v to memory in t's parent tile when v is in a register in t.
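A small numeric sketch can show how the quantities in Figure 3-7 combine from a child tile up to its parent. The function names, the dictionary inputs, and the example numbers are all assumptions chosen for illustration, and the sketch assumes that the child-tile terms enter Weight as Reg_s(v) - Mem_s(v).

```python
def local_weight(prob_b, ref_b):
    """Sum of prob(b) * ref_b(v) over the basic blocks in blocks(t)."""
    return sum(p * ref_b.get(b, 0) for b, p in prob_b.items())


def transfer(prob_e, live_e):
    """Sum of prob(e) over the entry and exit edges where v is live."""
    return sum(p for e, p in prob_e.items() if live_e.get(e, False))


def weight(child_costs, local):
    """child_costs is a list of (Reg_s(v), Mem_s(v)) pairs, one per child tile."""
    return sum(reg - mem for reg, mem in child_costs) + local


def reg_cost(in_register, transfer_t, weight_t):
    return min(transfer_t, weight_t) if in_register else 0.0


def mem_cost(in_memory, transfer_t):
    return transfer_t if in_memory else 0.0


# Hypothetical child tile s: one block run with probability 0.5 holding four
# references to v, and two boundary edges (probability 0.5 each) where v is live.
lw_s = local_weight({"B3": 0.5}, {"B3": 4})                                     # 2.0
tr_s = transfer({"entry": 0.5, "exit": 0.5}, {"entry": True, "exit": True})     # 1.0
w_s = weight([], lw_s)                                                          # 2.0
reg_s = reg_cost(True, tr_s, w_s)                                               # 1.0
mem_s = mem_cost(False, tr_s)                                                   # 0.0

# Hypothetical parent tile t: v is referenced once in a block that always runs.
lw_t = local_weight({"B2": 1.0, "B5": 1.0}, {"B2": 1})                          # 1.0
w_t = weight([(reg_s, mem_s)], lw_t)
print(w_t)  # 2.0
```

In this example the child tile raises the parent's weight for v by 1.0, the smaller of the boundary transfer cost and the child's own weight.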


Register Allocation. Lecture 38 Register Allocation Lecture 38 (from notes by G. Necula and R. Bodik) 4/27/08 Prof. Hilfinger CS164 Lecture 38 1 Lecture Outline Memory Hierarchy Management Register Allocation Register interference graph

More information

CSc 553. Principles of Compilation. 23 : Register Allocation. Department of Computer Science University of Arizona

CSc 553. Principles of Compilation. 23 : Register Allocation. Department of Computer Science University of Arizona CSc 553 Principles of Compilation 3 : egister Allocation Department of Computer Science University of Arizona collberg@gmail.com Copyright c 0 Christian Collberg Introduction Lexing, Parsing Semantic Analysis,

More information

Static single assignment

Static single assignment Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Static single

More information

Lecture 18 List Scheduling & Global Scheduling Carnegie Mellon

Lecture 18 List Scheduling & Global Scheduling Carnegie Mellon Lecture 18 List Scheduling & Global Scheduling Reading: Chapter 10.3-10.4 1 Review: The Ideal Scheduling Outcome What prevents us from achieving this ideal? Before After Time 1 cycle N cycles 2 Review:

More information

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210 Redundant Computation Elimination Optimizations CS2210 Lecture 20 Redundancy Elimination Several categories: Value Numbering local & global Common subexpression elimination (CSE) local & global Loop-invariant

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Register Allocation 1

Register Allocation 1 Register Allocation 1 Lecture Outline Memory Hierarchy Management Register Allocation Register interference graph Graph coloring heuristics Spilling Cache Management The Memory Hierarchy Registers < 1

More information

Register Allocation. Lecture 16

Register Allocation. Lecture 16 Register Allocation Lecture 16 1 Register Allocation This is one of the most sophisticated things that compiler do to optimize performance Also illustrates many of the concepts we ve been discussing in

More information

Today More register allocation Clarifications from last time Finish improvements on basic graph coloring concept Procedure calls Interprocedural

Today More register allocation Clarifications from last time Finish improvements on basic graph coloring concept Procedure calls Interprocedural More Register Allocation Last time Register allocation Global allocation via graph coloring Today More register allocation Clarifications from last time Finish improvements on basic graph coloring concept

More information

Compiler Optimization

Compiler Optimization Compiler Optimization The compiler translates programs written in a high-level language to assembly language code Assembly language code is translated to object code by an assembler Object code modules

More information

Global Register Allocation via Graph Coloring The Chaitin-Briggs Algorithm. Comp 412

Global Register Allocation via Graph Coloring The Chaitin-Briggs Algorithm. Comp 412 COMP 412 FALL 2018 Global Register Allocation via Graph Coloring The Chaitin-Briggs Algorithm Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda

More information

Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

Register Allocation (via graph coloring) Lecture 25. CS 536 Spring Register Allocation (via graph coloring) Lecture 25 CS 536 Spring 2001 1 Lecture Outline Memory Hierarchy Management Register Allocation Register interference graph Graph coloring heuristics Spilling Cache

More information

CHAPTER 3. Register allocation

CHAPTER 3. Register allocation CHAPTER 3 Register allocation In chapter 1 we simplified the generation of x86 assembly by placing all variables on the stack. We can improve the performance of the generated code considerably if we instead

More information

Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time

Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time Graph Coloring Register Allocation Last Time Chaitin et al. Briggs et al. Today Finish Briggs et al. basics An improvement: rematerialization Rematerialization Some expressions are especially simple to

More information

EECS 583 Class 15 Register Allocation

EECS 583 Class 15 Register Allocation EECS 583 Class 15 Register Allocation University of Michigan November 2, 2011 Announcements + Reading Material Midterm exam: Monday, Nov 14?» Could also do Wednes Nov 9 (next week!) or Wednes Nov 16 (2

More information

Agenda. CSE P 501 Compilers. Big Picture. Compiler Organization. Intermediate Representations. IR for Code Generation. CSE P 501 Au05 N-1

Agenda. CSE P 501 Compilers. Big Picture. Compiler Organization. Intermediate Representations. IR for Code Generation. CSE P 501 Au05 N-1 Agenda CSE P 501 Compilers Instruction Selection Hal Perkins Autumn 2005 Compiler back-end organization Low-level intermediate representations Trees Linear Instruction selection algorithms Tree pattern

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Register Allocation. Michael O Boyle. February, 2014

Register Allocation. Michael O Boyle. February, 2014 Register Allocation Michael O Boyle February, 2014 1 Course Structure L1 Introduction and Recap L2 Course Work L3+4 Scalar optimisation and dataflow L5 Code generation L6 Instruction scheduling L7 Register

More information

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height = M-ary Search Tree B-Trees Section 4.7 in Weiss Maximum branching factor of M Complete tree has height = # disk accesses for find: Runtime of find: 2 Solution: B-Trees specialized M-ary search trees Each

More information

Chapter 9. Register Allocation

Chapter 9. Register Allocation Chapter 9. Register Allocation Basics of Compiler Design Torben Ægidius Mogensen Dr. Marco Valtorta, Professor Computer Science and Engineering Dept. University of South Carolina Radu Vitoc, PhD candidate

More information

Compiler Optimization and Code Generation

Compiler Optimization and Code Generation Compiler Optimization and Code Generation Professor: Sc.D., Professor Vazgen elikyan 1 Course Overview ntroduction: Overview of Optimizations 1 lecture ntermediate-code Generation 2 lectures achine-ndependent

More information

Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas

Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas Software Pipelining by Modulo Scheduling Philip Sweany University of North Texas Overview Instruction-Level Parallelism Instruction Scheduling Opportunities for Loop Optimization Software Pipelining Modulo

More information

On the Near-Optimality of List Scheduling Heuristics for Local and Global Instruction Scheduling

On the Near-Optimality of List Scheduling Heuristics for Local and Global Instruction Scheduling On the Near-Optimality of List Scheduling Heuristics for Local and Global Instruction Scheduling by John Michael Chase A thesis presented to the University of Waterloo in fulfillment of the thesis requirement

More information

CHAPTER 3. Register allocation

CHAPTER 3. Register allocation CHAPTER 3 Register allocation In chapter 1 we simplified the generation of x86 assembly by placing all variables on the stack. We can improve the performance of the generated code considerably if we instead

More information

Characteristics of RISC processors. Code generation for superscalar RISCprocessors. What are RISC and CISC? Processors with and without pipelining

Characteristics of RISC processors. Code generation for superscalar RISCprocessors. What are RISC and CISC? Processors with and without pipelining Code generation for superscalar RISCprocessors What are RISC and CISC? CISC: (Complex Instruction Set Computers) Example: mem(r1+r2) = mem(r1+r2)*mem(r3+disp) RISC: (Reduced Instruction Set Computers)

More information

Compiler Architecture

Compiler Architecture Code Generation 1 Compiler Architecture Source language Scanner (lexical analysis) Tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Intermediate Language Code Optimizer

More information

k register IR Register Allocation IR Instruction Scheduling n), maybe O(n 2 ), but not O(2 n ) k register code

k register IR Register Allocation IR Instruction Scheduling n), maybe O(n 2 ), but not O(2 n ) k register code Register Allocation Part of the compiler s back end IR Instruction Selection m register IR Register Allocation k register IR Instruction Scheduling Machine code Errors Critical properties Produce correct

More information

Module 2: Classical Algorithm Design Techniques

Module 2: Classical Algorithm Design Techniques Module 2: Classical Algorithm Design Techniques Dr. Natarajan Meghanathan Associate Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Module

More information

Compiler Optimization Techniques

Compiler Optimization Techniques Compiler Optimization Techniques Department of Computer Science, Faculty of ICT February 5, 2014 Introduction Code optimisations usually involve the replacement (transformation) of code from one sequence

More information

Principles of Compiler Design

Principles of Compiler Design Principles of Compiler Design Code Generation Compiler Lexical Analysis Syntax Analysis Semantic Analysis Source Program Token stream Abstract Syntax tree Intermediate Code Code Generation Target Program

More information

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss) M-ary Search Tree B-Trees (4.7 in Weiss) Maximum branching factor of M Tree with N values has height = # disk accesses for find: Runtime of find: 1/21/2011 1 1/21/2011 2 Solution: B-Trees specialized M-ary

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

More Code Generation and Optimization. Pat Morin COMP 3002

More Code Generation and Optimization. Pat Morin COMP 3002 More Code Generation and Optimization Pat Morin COMP 3002 Outline DAG representation of basic blocks Peephole optimization Register allocation by graph coloring 2 Basic Blocks as DAGs 3 Basic Blocks as

More information

Code Generation. M.B.Chandak Lecture notes on Language Processing

Code Generation. M.B.Chandak Lecture notes on Language Processing Code Generation M.B.Chandak Lecture notes on Language Processing Code Generation It is final phase of compilation. Input from ICG and output in the form of machine code of target machine. Major issues

More information

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009 What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization

More information

A Framework for Space and Time Efficient Scheduling of Parallelism

A Framework for Space and Time Efficient Scheduling of Parallelism A Framework for Space and Time Efficient Scheduling of Parallelism Girija J. Narlikar Guy E. Blelloch December 996 CMU-CS-96-97 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523

More information

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13 Run-time Environments Lecture 13 by Prof. Vijay Ganesh) Lecture 13 1 What have we covered so far? We have covered the front-end phases Lexical analysis (Lexer, regular expressions,...) Parsing (CFG, Top-down,

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V. Register allocation CS3300 - Compiler Design Liveness analysis and Register allocation V. Krishna Nandivada IIT Madras Copyright c 2014 by Antony L. Hosking. Permission to make digital or hard copies of

More information

Chapter 11. Instruction Sets: Addressing Modes and Formats. Yonsei University

Chapter 11. Instruction Sets: Addressing Modes and Formats. Yonsei University Chapter 11 Instruction Sets: Addressing Modes and Formats Contents Addressing Pentium and PowerPC Addressing Modes Instruction Formats Pentium and PowerPC Instruction Formats 11-2 Common Addressing Techniques

More information

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis Synthesis of output program (back-end) Intermediate Code Generation Optimization Before and after generating machine

More information

Algorithm Design Techniques (III)

Algorithm Design Techniques (III) Algorithm Design Techniques (III) Minimax. Alpha-Beta Pruning. Search Tree Strategies (backtracking revisited, branch and bound). Local Search. DSA - lecture 10 - T.U.Cluj-Napoca - M. Joldos 1 Tic-Tac-Toe

More information

Code generation for superscalar RISCprocessors. Characteristics of RISC processors. What are RISC and CISC? Processors with and without pipelining

Code generation for superscalar RISCprocessors. Characteristics of RISC processors. What are RISC and CISC? Processors with and without pipelining Code generation for superscalar RISCprocessors What are RISC and CISC? CISC: (Complex Instruction Set Computers) Example: mem(r1+r2) = mem(r1+r2)*mem(r3+disp) RISC: (Reduced Instruction Set Computers)

More information

Code Generation. CS 540 George Mason University

Code Generation. CS 540 George Mason University Code Generation CS 540 George Mason University Compiler Architecture Intermediate Language Intermediate Language Source language Scanner (lexical analysis) tokens Parser (syntax analysis) Syntactic structure

More information

Motivation for B-Trees

Motivation for B-Trees 1 Motivation for Assume that we use an AVL tree to store about 20 million records We end up with a very deep binary tree with lots of different disk accesses; log2 20,000,000 is about 24, so this takes

More information

Register Allocation. Introduction Local Register Allocators

Register Allocation. Introduction Local Register Allocators Register Allocation Introduction Local Register Allocators Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit

More information

Register Allocation. CS 502 Lecture 14 11/25/08

Register Allocation. CS 502 Lecture 14 11/25/08 Register Allocation CS 502 Lecture 14 11/25/08 Where we are... Reasonably low-level intermediate representation: sequence of simple instructions followed by a transfer of control. a representation of static

More information

Register Allocation in the SPUR Lisp Compiler?

Register Allocation in the SPUR Lisp Compiler? Register Allocation in the SPUR Lisp Compiler? James R. Larus Paul N. Hilfnger Computer Science Division Department of Electrical Engineering and Computer Sciences University of California Berkeley, California

More information

Virtual Memory Outline

Virtual Memory Outline Virtual Memory Outline Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating Kernel Memory Other Considerations Operating-System Examples

More information

Lecture Overview Register Allocation

Lecture Overview Register Allocation 1 Lecture Overview Register Allocation [Chapter 13] 2 Introduction Registers are the fastest locations in the memory hierarchy. Often, they are the only memory locations that most operations can access

More information

EECS 583 Class 3 More on loops, Region Formation

EECS 583 Class 3 More on loops, Region Formation EECS 583 Class 3 More on loops, Region Formation University of Michigan September 19, 2016 Announcements & Reading Material HW1 is out Get busy on it!» Course servers are ready to go Today s class» Trace

More information

Global Register Allocation

Global Register Allocation Global Register Allocation Lecture Outline Memory Hierarchy Management Register Allocation via Graph Coloring Register interference graph Graph coloring heuristics Spilling Cache Management 2 The Memory

More information

Lecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code)

Lecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Lecture 7 Instruction Scheduling I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Reading: Chapter 10.3 10.4 CS243: Instruction Scheduling 1 Scheduling Constraints Data dependences

More information

Investigating Different Register Allocation Techniques for a GPU Compiler

Investigating Different Register Allocation Techniques for a GPU Compiler MASTER S THESIS LUND UNIVERSITY 2016 Investigating Different Register Allocation Techniques for a GPU Compiler Max Andersson Department of Computer Science Faculty of Engineering LTH ISSN 1650-2884 LU-CS-EX

More information