Global Register Allocation

Similar documents
Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

Register Allocation 1

Register Allocation. Lecture 38

Register Allocation. Lecture 16

Lecture 25: Register Allocation

Register allocation. TDT4205 Lecture 31

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

Register allocation. instruction selection. machine code. register allocation. errors

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1

Register Allocation & Liveness Analysis

Topic 12: Register Allocation

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

register allocation saves energy register allocation reduces memory accesses.

Low-Level Issues. Register Allocation. Last lecture! Liveness analysis! Register allocation. ! More register allocation. ! Instruction scheduling

Lecture 6. Register Allocation. I. Introduction. II. Abstraction and the Problem III. Algorithm

Register Allocation. CS 502 Lecture 14 11/25/08

CSC D70: Compiler Optimization Register Allocation

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

Lecture Notes on Register Allocation

Register Allocation. Ina Schaefer Selected Aspects of Compilers 100. Register Allocation

More Code Generation and Optimization. Pat Morin COMP 3002

Compiler Architecture

Lecture 21 CIS 341: COMPILERS

Compiler Design. Register Allocation. Hwansoo Han

Chapter 9. Register Allocation

G Compiler Construction Lecture 12: Code Generation I. Mohamed Zahran (aka Z)

Lecture 4: RISC Computers

Global Register Allocation

Compilers and Code Optimization EDOARDO FUSELLA

Intermediate Code & Local Optimizations

Introduction to Code Optimization. Lecture 36: Local Optimization. Basic Blocks. Basic-Block Example

Register Allocation. Introduction Local Register Allocators

Global Register Allocation - Part 3

1 Motivation for Improving Matrix Multiplication

Lecture Overview Register Allocation

SE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan Memory Hierarchy

CS 2461: Computer Architecture 1

Compilation /17a Lecture 10. Register Allocation Noam Rinetzky

Compiler Optimization and Code Generation

CS 406/534 Compiler Construction Putting It All Together

Run-time Environments

Run-time Environments

Register Allocation. Preston Briggs Reservoir Labs

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Global Register Allocation via Graph Coloring

Abstract Interpretation Continued

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

So, coming back to this picture where three levels of memory are shown namely cache, primary memory or main memory and back up memory.

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

CSE 504: Compiler Design. Code Generation

Naming in OOLs and Storage Layout Comp 412

PERFORMANCE OPTIMISATION

CODE GENERATION Monday, May 31, 2010

Register allocation. Overview

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

University of California at Santa Barbara. ECE 154A Introduction to Computer Architecture. Quiz #1. October 30 th, Name (Last, First)

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

Exercises 6 - Virtual vs. Physical Memory, Cache

Register Allocation. Register Allocation. Local Register Allocation. Live range. Register Allocation for Loops

ECE 486/586. Computer Architecture. Lecture # 7

Lecture 15 Register Allocation & Spilling

Low-level optimization

Optimising for the p690 memory system

Lectures 5. Announcements: Today: Oops in Strings/pointers (example from last time) Functions in MIPS

Run Time Environment. Procedure Abstraction. The Procedure as a Control Abstraction. The Procedure as a Control Abstraction

Intermediate Code & Local Optimizations. Lecture 20

Lecture 4: RISC Computers

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010

Code Generation. CS 540 George Mason University

Computer Architecture, RISC vs. CISC, and MIPS Processor

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

William Stallings Computer Organization and Architecture. Chapter 12 Reduced Instruction Set Computers

Agenda. Cache-Memory Consistency? (1/2) 7/14/2011. New-School Machine Structures (It s a bit more complicated!)

Lecture 5. Announcements: Today: Finish up functions in MIPS

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

Code Generation & Parameter Passing

RISC Principles. Introduction

Multi-dimensional Arrays

Run-time Environment

Calvin Lin The University of Texas at Austin

Loop Transformations! Part II!

Agenda. EE 260: Introduction to Digital Design Memory. Naive Register File. Agenda. Memory Arrays: SRAM. Memory Arrays: Register File

CSE 401 Final Exam. March 14, 2017 Happy π Day! (3/14) This exam is closed book, closed notes, closed electronics, closed neighbors, open mind,...

Global Register Allocation - 2

COMP3221: Microprocessors and. and Embedded Systems. Instruction Set Architecture (ISA) What makes an ISA? #1: Memory Models. What makes an ISA?

ECE232: Hardware Organization and Design

CSE351 Winter 2016, Final Examination March 16, 2016

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as

Run-time Environments -Part 1

April 15, 2015 More Register Allocation 1. Problem Register values may change across procedure calls The allocator must be sensitive to this

MODEL ANSWERS COMP36512, May 2016

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD

Heap Management. Heap Allocation

CSCI Compiler Design

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Functions in MIPS. Functions in MIPS 1

Code Generation. The Main Idea of Today s Lecture. We can emit stack-machine-style code for expressions via recursion. Lecture Outline.

Transcription:

Global Register Allocation

Lecture Outline Memory Hierarchy Management Register Allocation via Graph Coloring Register interference graph Graph coloring heuristics Spilling Cache Management 2

The Memory Hierarchy Registers 1 cycle 256-8000 bytes Cache 3 cycles 256k-16M Main memory 20-100 cycles 512M-64G Disk 0.5-5M cycles 10G-1T 3

Managing the Memory Hierarchy Programs are written as if there are only two kinds of memory: main memory and disk Programmer is responsible for moving data from disk to memory (e.g., file I/O) Hardware is responsible for moving data between memory and caches Compiler is responsible for moving data between memory and registers 4

Current Trends Power usage limits Size and speed of registers/caches Speed of processors Improves faster than memory speed (and disk speed) The cost of a cache miss is growing The widening gap between processors and memory is bridged with more levels of caches It is very important to: Manage registers properly Manage caches properly Compilers are good at managing registers 5

The Register Allocation Problem Recall that intermediate code uses as many temporaries as necessary This complicates final translation to assembly But simplifies code generation and optimization Typical intermediate code uses too many temporaries The register allocation problem: Rewrite the intermediate code to use at most as many temporaries as there are machine registers Method: Assign multiple temporaries to a register But without changing the program behavior 6

History Register allocation is as old as intermediate code Register allocation was used in the original FORTRAN compiler in the 50s Very crude algorithms were used back then A breakthrough was not achieved until 1980 Register allocation scheme based on graph coloring Relatively simple, global, and works well in practice 7

An Example Consider the program a := c + d e := a + b f := e - 1 with the assumption that a and e die after use Temporary a can be reused after a + b Same with temporary e after e - 1 Can allocate a, e, and f all to one register (r 1 ): r 1 := r 2 + r 3 r 1 := r 1 + r 4 r 1 := r 1-1 8

Basic Register Allocation Idea The value in a dead temporary is not needed for the rest of the computation A dead temporary can be reused Basic rule: Temporaries t 1 and t 2 can share the same register if at all points in the program at most one of t 1 or t 2 is live! 9

Algorithm: Part I Compute live variables for each program point: {a,c,f} {c,d,f} {c,e} f := 2 * e {c,f} {b} a := b + c d := -a e := d + f {c,f} b := f + c b := d + e e := e - 1 {b,c,f} {c,d,e,f} {b,c,e,f} {b,e} 10

The Register Interference Graph Two temporaries that are live simultaneously cannot be allocated in the same register We construct an undirected graph with A node for each temporary An edge between t 1 and t 2 if they are live simultaneously at some point in the program This is the register interference graph (RIG) Two temporaries can be allocated to the same register if there is no edge connecting them 11

Register Interference Graph: Example For our example: f a b e c d E.g., b and c cannot be in the same register E.g., b and d can be in the same register 12

Register Interference Graph: Properties It extracts exactly the information needed to characterize legal register assignments It gives a global (i.e., over the entire flow graph) picture of the register requirements After RIG construction, the register allocation algorithm is architecture independent 13

Graph Coloring: Definitions A coloring of a graph is an assignment of colors to nodes, such that nodes connected by an edge have different colors A graph is k-colorable if it has a coloring with k colors 14

Register Allocation Through Graph Coloring In our problem, colors = registers We need to assign colors (registers) to graph nodes (temporaries) Let k = number of machine registers If the RIG is k-colorable then there is a register assignment that uses no more than k registers 15

Graph Coloring: Example Consider the example RIG a r 2 r 1 f b r 3 r 2 e c r 4 d r 3 There is no coloring with less than 4 colors There are various 4-colorings of this graph (one of them is shown in the figure) 16

Graph Coloring: Example Under this coloring the code becomes: r 2 := r 3 + r 4 r 3 := -r 2 r 2 := r 3 + r 1 r 1 := 2 * r 2 r 3 := r 3 + r 2 r 2 := r 2-1 r 3 := r 1 + r 4 17

Computing Graph Colorings The remaining problem is how to compute a coloring for the interference graph But: (1) Computationally this problem is NP-hard: No efficient algorithms are known (2) A coloring might not exist for a given number of registers The solution to (1) is to use heuristics We will consider the other problem later 18

Graph Coloring Heuristic Observation: Pick a node t with fewer than k neighbors in RIG Eliminate t and its edges from RIG If the resulting graph has a k-coloring then so does the original graph Why: Let c 1,,c n be the colors assigned to the neighbors of t in the reduced graph Since n < k we can pick some color for t that is different from those of its neighbors 19

Graph Coloring Simplification Heuristic The following works well in practice: Pick a node t with fewer than k neighbors Put t on a stack and remove it from the RIG Repeat until the graph has one node Then start assigning colors to nodes on the stack (starting with the last node added) At each step pick a color different from those assigned to already colored neighbors 20

Graph Coloring Example (1) Start with the RIG and with k = 4: f e Remove a a d b c Stack: {} 21

Graph Coloring Example (2) Start with the RIG and with k = 4: f e b c Stack: {a} Remove d d 22

Graph Coloring Example (3) Now all nodes have fewer than 4 neighbors and can be removed: c, b, e, f f e b c Stack: {d, a} 23

Graph Coloring Example (4) Start assigning colors to: f, e, b, c, d, a r 1 f a r 2 b r 3 r 2 e c r 4 d r 3 24

What if the Heuristic Fails? What if during simplification we get to a state where all nodes have k or more neighbors? Example: try to find a 3-coloring of the RIG: f a b e d c 25

What if the Heuristic Fails? Remove a and get stuck (as shown below) Pick a node as a possible candidate for spilling A spilled temporary lives is memory Assume that f is picked as a candidate f b e c d 26

What if the Heuristic Fails? Remove f and continue the simplification Simplification now succeeds: b, d, e, c b e c d 27

What if the Heuristic Fails? On the assignment phase we get to the point when we have to assign a color to f We hope that among the 4 neighbors of f we used less than 3 colors optimistic coloring? f b r 3 r 2 e c r 1 d r3 28

Spilling Since optimistic coloring failed, we must spill temporary f (actual spill) We must allocate a memory location as the home of f Typically this is in the current stack frame Call this address fa Before each operation that uses f, insert f := load fa After each operation that defines f, insert store f, fa 29

Spilling: Example This is the new code after spilling f a := b + c d := -a f := load fa e := d + f f := 2 * e store f, fa b := d + e e := e - 1 f := load fa b := f + c 30

Recomputing Liveness Information The new liveness information after spilling: {c,d,f} {c,f} {c,f} {a,c,f} {c,d,f} {c,e} f := 2 * e store f, fa {c,f} {b} a := b + c d := -a f := load fa e := d + f {c,f} f := load fa b := f + c b := d + e e := e - 1 {b,c,f} {c,d,e,f} {b,c,e,f} {b,e} 31

Recomputing Liveness Information New liveness information is almost as before f is live only Between a f := load fa and the next instruction Between a store f, fa and the preceding instruction Spilling reduces the live range of f And thus reduces its interferences Which results in fewer RIG neighbors for f 32

Recompute RIG After Spilling The only changes are in removing some of the edges of the spilled node In our case f now interferes only with c and d And now the resulting RIG is 3-colorable r2 a r2 f b r3 r2 e c r1 d r3 33

Spilling Notes Additional spills might be required before a coloring is found The tricky part is deciding what to spill Possible heuristics: Spill temporaries with most conflicts Spill temporaries with few definitions and uses Avoid spilling in inner loops Any heuristic is correct 34

Precolored Nodes Precolored nodes are nodes which are a priori bound to actual machine registers These nodes are usually used for some specific (time-critical) purpose, e.g.: for the frame pointer for the first N arguments (N=2,3,4,5) 35

Precolored Nodes (Cont.) For each color, there should be only one precolored node with that color; all precolored nodes usually interfere with each other We can give an ordinary temporary the same color as a precolored node as long as it does not interfere with it However, we cannot simplify or spill precolored nodes; we thus treat them as having infinite degree 36

Effects of Global Register Allocation Reduction in % for MIPS C Compiler Program cycles total loads/stores scalar loads/stores boyer 37.6 76.9 96.2 diff 40.6 69.4 92.5 yacc 31.2 67.9 84.4 nroff 16.3 49.0 54.7 ccom 25.0 53.1 67.2 upas 25.3 48.2 70.9 as1 30.5 54.6 70.8 Geo Mean 28.4 59.0 75.4 37

Managing Caches Compilers are very good at managing registers Much better than a programmer could be Compilers are not good at managing caches This problem is still left to programmers It is still an open question whether a compiler can do anything general to improve performance Compilers can, and a few do, perform some simple cache optimization 38

Cache Optimization Consider the loop for (j = 1; j < 10; j++) for (i = 1; i < 1000000; i++) a[i] *= b[i] This program has terrible cache performance Why? 39

Cache Optimization (Cont.) Consider now the program: for (i = 1; i < 1000000; i++) for (j = 1; j < 10; j++) a[i] *= b[i] Computes the same thing But with much better cache behavior Might actually be more than 10x faster A compiler can perform this optimization called loop interchange 40

Concluding Remarks Register allocation is a must have optimization in most compilers: Because intermediate code uses too many temporaries Because it makes a big difference in performance Graph coloring is a powerful register allocation scheme (with many variations on the heuristics) Register allocation is more complicated for CISC machines 41