Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

Similar documents
Register Allocation 1

Register Allocation. Lecture 38

Global Register Allocation

Register Allocation. Lecture 16

Lecture 25: Register Allocation

Register allocation. TDT4205 Lecture 31

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

Lecture 6. Register Allocation. I. Introduction. II. Abstraction and the Problem III. Algorithm

CSC D70: Compiler Optimization Register Allocation

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

register allocation saves energy register allocation reduces memory accesses.

Chapter 9. Register Allocation

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1

Compiler Design. Register Allocation. Hwansoo Han

Introduction to Code Optimization. Lecture 36: Local Optimization. Basic Blocks. Basic-Block Example

Global Register Allocation - Part 3

Global Register Allocation

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

1 Motivation for Improving Matrix Multiplication

Register Allocation & Liveness Analysis

Register allocation. Overview

CS 406/534 Compiler Construction Putting It All Together

G Compiler Construction Lecture 12: Code Generation I. Mohamed Zahran (aka Z)

Low-Level Issues. Register Allocation. Last lecture! Liveness analysis! Register allocation. ! More register allocation. ! Instruction scheduling

Global Register Allocation - 2

Register Allocation. Preston Briggs Reservoir Labs

Code Generation. CS 540 George Mason University

Lecture Notes on Register Allocation

Global Register Allocation via Graph Coloring

Register Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops

Compiler Optimization and Code Generation

Register allocation. instruction selection. machine code. register allocation. errors

Register Allocation. CS 502 Lecture 14 11/25/08

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

Lecture Overview Register Allocation

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

Low-level optimization

Register Allocation. Ina Schaefer Selected Aspects of Compilers 100. Register Allocation

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Global Register Allocation - Part 2

Lecture 15 Register Allocation & Spilling

Compiler Architecture

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Lecture 12. Lecture 12: The IO Model & External Sorting

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

Compilers and Code Optimization EDOARDO FUSELLA

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14

The C2 Register Allocator. Niclas Adlertz

Calvin Lin The University of Texas at Austin

CHAPTER 3. Register allocation

Register Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210

Speicherarchitektur. Who Cares About the Memory Hierarchy? Technologie-Trends. Speicher-Hierarchie. Referenz-Lokalität. Caches

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

Topic 12: Register Allocation

Lecture 21 CIS 341: COMPILERS

CS2210: Compiler Construction. Compiler Optimization

Graph Coloring. Margaret M. Fleck. 3 May This lecture discusses the graph coloring problem (section 9.8 of Rosen).

CSE 401 Final Exam. March 14, 2017 Happy π Day! (3/14) This exam is closed book, closed notes, closed electronics, closed neighbors, open mind,...

Register Allocation. Introduction Local Register Allocators

Rematerialization. Graph Coloring Register Allocation. Some expressions are especially simple to recompute: Last Time

CSE 504: Compiler Design. Code Generation

CS 2461: Computer Architecture 1

Computer Systems C S Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College

Run-time Environments

CHAPTER 3. Register allocation

Compiling Techniques

Run-time Environments

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015

Calvin Lin The University of Texas at Austin

CISC 360. Cache Memories Exercises Dec 3, 2009

Intermediate Code & Local Optimizations

Caches (Writing) P & H Chapter 5.2 3, 5.5. Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University

Administrivia. CMSC 411 Computer Systems Architecture Lecture 8 Basic Pipelining, cont., & Memory Hierarchy. SPEC92 benchmarks

Compilation /17a Lecture 10. Register Allocation Noam Rinetzky

Register Allocation 3/16/11. What a Smart Allocator Needs to Do. Global Register Allocation. Global Register Allocation. Outline.

Redundant Computation Elimination Optimizations. Redundancy Elimination. Value Numbering CS2210

Register Allocation. Stanford University CS243 Winter 2006 Wei Li 1

Agenda. Cache-Memory Consistency? (1/2) 7/14/2011. New-School Machine Structures (It s a bit more complicated!)

CS321 Languages and Compiler Design I. Winter 2012 Lecture 1

CODE GENERATION Monday, May 31, 2010

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

MIT Spring 2011 Quiz 3

Lecture 4: RISC Computers

Register Allocation via Hierarchical Graph Coloring

Intermediate Code & Local Optimizations. Lecture 20

Multi-dimensional Arrays

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

More Code Generation and Optimization. Pat Morin COMP 3002

CS61C : Machine Structures

Computer Architecture and Engineering. CS152 Quiz #2. March 3rd, Professor Krste Asanovic. Name:

Register Allocation. 2. register contents, namely which variables, constants, or intermediate results are currently in a register, and

Compilation for Heterogeneous Platforms

Outline. Issues with the Memory System Loop Transformations Data Transformations Prefetching Alias Analysis

CS 31: Intro to Systems Caching. Martin Gagne Swarthmore College March 23, 2017

Code Generation. CS 1622: Code Generation & Register Allocation. Arrays. Why is Code Generation Hard? Multidimensional Arrays. Array Element Address

So, coming back to this picture where three levels of memory are shown namely cache, primary memory or main memory and back up memory.

CS 61C: Great Ideas in Computer Architecture. Cache Performance, Set Associative Caches

Transcription:

Register Allocation (via graph coloring) Lecture 25 CS 536 Spring 2001 1

Lecture Outline Memory Hierarchy Management Register Allocation Register interference graph Graph coloring heuristics Spilling Cache Management CS 536 Spring 2001 2

The Memory Hierarchy Registers 1 cycle 256-8000 bytes Cache 3 cycles 256k-1M Main memory 20-100 cycles 32M-1G Disk 0.5-5M cycles 4G-1T CS 536 Spring 2001 3

Managing the Memory Hierarchy Programs are written as if there are only two kinds of memory: main memory and disk Programmer is responsible for moving data from disk to memory (e.g., file I/O) Hardware is responsible for moving data between memory and caches Compiler is responsible for moving data between memory and registers CS 536 Spring 2001 4

Current Trends Cache and register sizes are growing slowly Processor speed improves faster than memory speed and disk speed The cost of a cache miss is growing The widening gap is bridged with more caches It is very important to: Manage registers properly Manage caches properly Compilers are good at managing registers CS 536 Spring 2001 5

The Register Allocation Problem Recall that intermediate code uses as many temporaries as necessary This complicates final translation to assembly But simplifies code generation and optimization Typical intermediate code uses too many temporaries The register allocation problem: Rewrite the intermediate code to use fewer temporaries than there are machine registers Method: assign more temporaries to a register But without changing the program behavior CS 536 Spring 2001 6

History Register allocation is as old as intermediate code Register allocation was used in the original FORTRAN compiler in the 50s Very crude algorithms A breakthrough was not achieved until 1980 when Chaitin invented a register allocation scheme based on graph coloring Relatively simple, global and works well in practice CS 536 Spring 2001 7

An Example Consider the program a := c + d e := a + b f := e - 1 with the assumption that a and e die after use Temporary a can be reused after e := a + b Same with temporary e Can allocate a, e, and f all to one register (r 1 ): r 1 := r 2 + r 3 r 1 := r 1 + r 4 r 1 := r 1-1 CS 536 Spring 2001 8

Basic Register Allocation Idea The value in a dead temporary is not needed for the rest of the computation A dead temporary can be reused Basic rule: Temporaries t 1 and t 2 can share the same register if at any point in the program at most one of t 1 or t 2 is live! CS 536 Spring 2001 9

Algorithm: Part I Compute live variables for each point: {a,c,f} {c,d,f} {c,e} f := 2 * e {c,f} a := b + c d := -a e := d + f {c,f} b := d + e e := e - 1 {c,d,e,f} {b,c,f} {b,c,e,f} b := f + c {b} {b} CS 536 Spring 2001 10

The Register Interference Graph Two temporaries that are live simultaneously cannot be allocated in the same register We construct an undirected graph A node for each temporary An edge between t 1 and t 2 if they are live simultaneously at some point in the program This is the register interference graph (RIG) Two temporaries can be allocated to the same register if there is no edge connecting them CS 536 Spring 2001 11

Register Interference Graph. Example. For our example: f a b e c d E.g., b and c cannot be in the same register E.g., b and d can be in the same register CS 536 Spring 2001 12

Register Interference Graph. Properties. It extracts exactly the information needed to characterize legal register assignments It gives a global (i.e., over the entire flow graph) picture of the register requirements After RIG construction the register allocation algorithm is architecture independent CS 536 Spring 2001 13

Graph Coloring. Definitions. A coloring of a graph is an assignment of colors to nodes, such that nodes connected by an edge have different colors A graph is k-colorable if it has a coloring with k colors CS 536 Spring 2001 14

Register Allocation Through Graph Coloring In our problem, colors = registers We need to assign colors (registers) to graph nodes (temporaries) Let k = number of machine registers If the RIG is k-colorable then there is a register assignment that uses no more than k registers CS 536 Spring 2001 15

Graph Coloring. Example. Consider the example RIG a r 2 r 1 f b r 3 r 2 e c r 4 d r 3 There is no coloring with less than 4 colors There are 4-colorings of this graph CS 536 Spring 2001 16

Graph Coloring. Example. Under this coloring the code becomes: r 2 := r 3 + r 4 r 3 := -r 2 r 2 := r 3 + r 1 r 1 := 2 * r 2 r 3 := r 3 + r 2 r 2 := r 2-1 r 3 := r 1 + r 4 CS 536 Spring 2001 17

Computing Graph Colorings The remaining problem is to compute a coloring for the interference graph But: 1. This problem is very hard (NP-hard). No efficient algorithms are known. 2. A coloring might not exist for a given number or registers The solution to (1) is to use heuristics We ll consider later the other problem CS 536 Spring 2001 18

Graph Coloring Heuristic Observation: Pick a node t with fewer than k neighbors in RIG Eliminate t and its edges from RIG If the resulting graph has a k-coloring then so does the original graph Why: Let c 1,,c n be the colors assigned to the neighbors of t in the reduced graph Since n < k we can pick some color for t that is different from those of its neighbors CS 536 Spring 2001 19

Graph Coloring Heuristic The following works well in practice: Pick a node t with fewer than k neighbors Put t on a stack and remove it from the RIG Repeat until the graph has one node Then start assigning colors to nodes on the stack (starting with the last node added) At each step pick a color different from those assigned to already colored neighbors CS 536 Spring 2001 20

Graph Coloring Example (1) Start with the RIG and with k = 4: f e a b c Stack: {} Remove a and then d d CS 536 Spring 2001 21

Graph Coloring Example (2) Now all nodes have fewer than 4 neighbors and can be removed: c, b, e, f f e b c Stack: {d, a} CS 536 Spring 2001 22

Graph Coloring Example (2) Start assigning colors to: f, e, b, c, d, a r 1 f a r 2 b r 3 r 2 e c r 4 d r 3 CS 536 Spring 2001 23

What if the Heuristic Fails? What if during simplification we get to a state where all nodes have k or more neighbors? Example: try to find a 3-coloring of the RIG: f a b e d c CS 536 Spring 2001 24

What if the Heuristic Fails? Remove a and get stuck (as shown below) Pick a node as a candidate for spilling A spilled temporary lives is memory Assume that f is picked as a candidate f e b c d CS 536 Spring 2001 25

What if the Heuristic Fails? Remove f and continue the simplification Simplification now succeeds: b, d, e, c b e c d CS 536 Spring 2001 26

What if the Heuristic Fails? On the assignment phase we get to the point when we have to assign a color to f We hope that among the 4 neighbors of f we use less than 3 colors optimistic coloring? f b r 3 r 2 e c r 1 d r3 CS 536 Spring 2001 27

Spilling Since optimistic coloring failed we must spill temporary f We must allocate a memory location as the home of f Typically this is in the current stack frame Call this address fa Before each operation that uses f, insert f := load fa After each operation that defines f, insert store f, fa CS 536 Spring 2001 28

Spilling. Example. This is the new code after spilling f a := b + c d := -a f := load fa e := d + f f := 2 * e store f, fa b := d + e e := e - 1 f := load fa b := f + c CS 536 Spring 2001 29

Recomputing Liveness Information The new liveness information after spilling: {c,d,f} {c,f} {c,f} {a,c,f} {c,d,f} {c,e} f := 2 * e store f, fa {c,f} {b} a := b + c d := -a f := load fa e := d + f {c,f} f := load fa b := f + c b := d + e e := e - 1 {c,d,e,f} CS 536 Spring 2001 30 {b} {b,c,f} {b,c,e,f}

Recomputing Liveness Information The new liveness information is almost as before f is live only Between a f := load fa and the next instruction Between a store f, fa and the preceding instr. Spilling reduces the live range of f And thus reduces its interferences Which result in fewer neighbors in RIG for f CS 536 Spring 2001 31

Recompute RIG After Spilling The only changes are in removing some of the edges of the spilled node In our case f still interferes only with c and d And the resulting RIG is 3-colorable a f b e c d CS 536 Spring 2001 32

Spilling (Cont.) Additional spills might be required before a coloring is found The tricky part is deciding what to spill Possible heuristics: Spill temporaries with most conflicts Spill temporaries with few definitions and uses Avoid spilling in inner loops Any heuristic is correct CS 536 Spring 2001 33

Caches Compilers are very good at managing registers Much better than a programmer could be Compilers are not good at managing caches This problem is still left to programmers It is still an open question whether a compiler can do anything general to improve performance Compilers can, and a few do, perform some simple cache optimization CS 536 Spring 2001 34

Cache Optimization Consider the loop for(j := 1; j < 10; j++) for(i=1; i<1000; i++) a[i] *= b[i] This program has a terrible cache performance Why? CS 536 Spring 2001 35

Cache Optimization (Cont.) Consider the program: for(i=1; i<1000; i++) for(j := 1; j < 10; j++) a[i] *= b[i] Computes the same thing But with much better cache behavior Might actually be more than 10x faster A compiler can perform this optimization called loop interchange CS 536 Spring 2001 36

Conclusions Register allocation is a must have optimization in most compilers: Because intermediate code uses too many temporaries Because it makes a big difference in performance Graph coloring is a powerful register allocation schemes Register allocation is more complicated for CISC machines CS 536 Spring 2001 37