Register allocation saves energy and reduces memory accesses.

Lesson 10: Register Allocation

Full Compiler Structure

Embedded systems need highly optimized code. This part of the course focuses on back-end code generation. The back end generates assembly instructions from the IR, and then performs:
- instruction selection
- register allocation
- instruction scheduling

Preparing IR for Efficient Code Generation (register allocation)

1. First step of code generation: convert the IR into a CFG (lesson 8).
2. Perform data-flow analysis: liveness analysis (lesson 9).
3. Gather live-variable information and perform register allocation.

The role of registers depends on the architecture:
- CISC: all operands may be in memory, and there are long instructions to retrieve and use them.
- RISC and VLIW: register allocation is a must for these architectures. In VLIW the compiler needs to have control over the registers for bundling and scheduling.

Register allocation saves energy. Register allocation reduces memory accesses.

The importance of register allocation through an example: adding two numbers together.

There are two options:
- Keep the numbers in memory: load and store instructions are used. For this simple example there are 2 loads and one store.
- Keep the numbers in registers: the values are already in the registers, they just need to be added. Registers had to be allocated. Nothing goes to memory, so there are no memory latencies.

Register allocation is constrained because few registers are available: 32 float registers and 32 integer registers. Some registers have special purposes:
- integer registers and float registers
- doubleword registers (odd/even pairs)
- branch and predicate registers
- stack pointer, including the activation record (AR): all the local variables, return address, etc. for function calls
- R0 holds the program: this register is initialized to the address of the program
- string manipulation registers for implicit uses

In this lesson we focus on general-purpose registers for holding integers and floats.

The answer is no, and the reason is that sometimes a[i] == a[j], but this is not always the case. The generated code always implements the code as if they are the same. This example illustrates the issue with arrays: they can become inefficient, with multiple loads and stores. Unless we can prove that the two are always equal, we have to allocate as if they are not.

Register allocation: determines which of the values (variables, temporaries, large constants) should be in registers at each execution point.

Register assignment: determines which register should be used by each allocated value.

The goal of register allocation: to minimize the traffic between the CPU and the memory hierarchy. This means increasing the number of values that stay in registers for the duration of the program. This optimization has the highest impact for both embedded systems and general-purpose processors. Speed, code size and energy efficiency are all affected by register allocation: the better the allocation, the better the three parameters. Yet there are very few registers in general-purpose processors, and in embedded systems there could be as few as 16 registers.

Register Allocation Definitions

- gen (generated): a variable is newly defined
- kill: the old value is no longer accessible
- def: the variable is on the left-hand side (in a = 5, a is on the LHS)
- use: the variable is on the right-hand side (in a = b + c, b and c are being used)
- live: the variable will be used again before it is overwritten (in a = 10, ..., c = a + b, a is live because it is used)
- dead: the variable will never be used again, or it will be overwritten before the next use

We make the assumption that the values are not used after the last line of the program. The live range begins at a definition.
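The live/dead definitions above can be sketched for straight-line code. This is a minimal illustration, not the lesson's own algorithm: the function name live_before and the (dest, uses) instruction encoding are assumptions made for the example, and it relies on the stated assumption that nothing is used after the last line.

```python
def live_before(instructions):
    """Return, for each instruction, the set of variables live just before it.

    Each instruction is a pair (dest, uses): the variable it defines
    (or None) and the list of variables it reads. Hypothetical encoding.
    """
    live = set()                      # nothing is live after the last line
    result = [None] * len(instructions)
    for i in range(len(instructions) - 1, -1, -1):
        dest, uses = instructions[i]
        live.discard(dest)            # a def ends liveness of the old value
        live.update(uses)             # a use makes the variable live above it
        result[i] = set(live)
    return result


program = [
    ("a", []),            # a = 10
    ("b", []),            # b = 5
    ("c", ["a", "b"]),    # c = a + b
    ("d", ["c"]),         # d = c * 2
]
for (dest, uses), live in zip(program, live_before(program)):
    print(dest, sorted(live))
```

Here a is live before the second and third instructions, which is the span from its definition to its last use.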

Now we find the interference of live ranges. Where live ranges overlap, they interfere. If this is graphed, we have an interference graph. To allocate registers we color the graph. Each register is a different color: r1 = green, r2 = blue, r3 = pink. The same color cannot be used for overlapping live ranges (nodes that share an edge), because the assignments would interfere with each other. For s6 any register would work, so blue was used.

The final result of register allocation: three registers were needed for this code. Usually there are more variables than registers, so the decision is which values are kicked out of which registers.

Register Allocation Defined

1. Determine live ranges for each value (web).
2. Determine overlapping ranges (interference).
3. Compute the benefit of keeping each web in a register (spill cost). Spill cost = how many definitions and uses are in a range (to help avoid loads/stores).
4. Decide which webs get a register (allocation).
5. Split webs if needed (spilling and splitting).
6. Assign hard registers to webs (assignment, coloring).
7. Generate the code, including spills (code gen).

Spill = a value needs to be removed from a register, but it is still live.
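Steps 1 and 2 of the list above can be sketched as follows. This is an illustration under assumptions: each live range is modeled as a half-open interval (definition point, last use), the web names s1 to s3 are hypothetical, and interference_graph is a name chosen for the example.

```python
from itertools import combinations


def interference_graph(ranges):
    """Build an interference graph from live ranges.

    ranges maps a web name to (start, end), where start is the definition
    point and end is the last use. Two webs interfere when their intervals
    overlap; a web that ends exactly where another starts does not interfere.
    """
    graph = {web: set() for web in ranges}
    for a, b in combinations(ranges, 2):
        a_start, a_end = ranges[a]
        b_start, b_end = ranges[b]
        if a_start < b_end and b_start < a_end:   # intervals overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph


ranges = {"s1": (0, 2), "s2": (1, 3), "s3": (2, 4)}
graph = interference_graph(ranges)
for web in sorted(graph):
    print(web, sorted(graph[web]))
```

In this example s1 and s3 do not overlap, so they could share one register, while s2 needs a different one.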

Alias variable = a variable that can be modified through pointers.

The claim that every variable can be kept in a register is false: keeping all variables in registers is unsafe. For example, if a variable is kept in a register and is then updated in memory, the value in the register will be stale. So alias variables cannot be kept in registers, EVER!

Webs

Webs are a way to extend liveness across basic blocks: what if the definition is in one basic block and the use is in another basic block?
- Webs start at a definition and end at the last use.
- Def-use (DU) chains connect a definition to all reachable uses.
- Conditions for putting defs and uses into the same web:
  - a definition and all its reachable uses must be part of the same web
  - all defs that reach the same use must be in the same web

Example of a web: S1 is a web for y. Note that there is another web for y, S3. They are different webs because their uses are not reachable from each other.

Interference

Two webs interfere if their live ranges overlap (have a non-empty intersection). If two webs interfere, their values must be stored in different registers or memory locations. If two webs do NOT interfere, they can store their values in the same register or memory location. In the above example, webs s1 and s2 interfere, and webs s2 and s3 interfere.
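The two conditions for grouping defs and uses into webs amount to merging DU chains that share a definition or a use. The sketch below does this with a small union-find structure; the site labels d1, u1 and so on, and the function name build_webs, are hypothetical.

```python
def build_webs(du_pairs):
    """Group (def_site, use_site) pairs of one variable into webs.

    A def and every use it reaches land in the same web, and all defs
    reaching the same use land in the same web. Returns a sorted list of
    webs, each a sorted list of ("def", site) / ("use", site) entries.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for d, u in du_pairs:
        root_d, root_u = find(("def", d)), find(("use", u))
        if root_d != root_u:
            parent[root_d] = root_u         # merge the two groups

    groups = {}
    for site in list(parent):
        groups.setdefault(find(site), set()).add(site)
    return sorted(sorted(group) for group in groups.values())


# d1 and d2 both reach u1, so they form one web; d3 only reaches u2.
webs = build_webs([("d1", "u1"), ("d2", "u1"), ("d3", "u2")])
for web in webs:
    print(web)
```

The same variable ends up with two webs here, which matches the situation described for y above: each web can be given its own register.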

An edge exists between two nodes if they interfere.

Register Allocation

After completing the interference graph, register allocation is done. Each web is allocated a register: each node gets a register (color). If two webs interfere, they cannot have the same color. So the question is whether we can color the graph with the minimum number of registers. This problem is NP-complete: to solve it exactly we may need a program that takes an exponential amount of time. But good heuristics exist for register allocation.

Graph Coloring Example 1:
- This graph takes 1 color because there is no interference between the nodes.
- This graph requires two colors because two nodes interfere with each other.
- This graph requires two colors.
- This graph requires 3 colors.
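To make the exponential cost concrete, here is a brute-force sketch that finds the minimum number of colors by trying every assignment. It is only practical for tiny graphs, which is the point. The three small graphs are hypothetical stand-ins, not the figures from the lesson, and min_colors is a name chosen for the example.

```python
from itertools import product


def min_colors(graph):
    """Return the smallest number of colors that properly colors the graph.

    graph maps each node to the set of its neighbors (symmetric).
    Tries k = 1, 2, ... and all k**len(nodes) assignments: exponential time.
    """
    nodes = sorted(graph)
    if not nodes:
        return 0
    for k in range(1, len(nodes) + 1):
        for assignment in product(range(k), repeat=len(nodes)):
            color = dict(zip(nodes, assignment))
            if all(color[u] != color[v] for u in graph for v in graph[u]):
                return k
    return len(nodes)


no_edges = {"a": set(), "b": set(), "c": set()}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(min_colors(no_edges), min_colors(path), min_colors(triangle))
```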

Graph Coloring Heuristics

Finding the minimum number of colors at compile time could take an incredibly long time. So most graph coloring heuristics seek an almost minimal set instead of the minimum set. With N colors (registers) available:
- If degree < N (degree of a node = number of edges it has), the node can always be colored. As long as the number of neighbors is less than N, after coloring the rest of the nodes you'll have at least one color left for the current node.
- If degree >= N, the node may still be colorable with N colors.

Graph Coloring Heuristics

1. Remove nodes that have degree < N and push the removed nodes onto the stack.
2. When all the remaining nodes have a degree >= N, find the node with the least spill cost and push it onto the stack.
3. Repeat until the list of nodes is empty.

What we are doing is ordering the nodes on the stack in the order of their spill cost. The top of the stack will have the highest spill cost, and the bottom of the stack will have the least spill cost. The nodes with the highest spill costs will get registers.

When the list of nodes is empty, it is time to start coloring:
1. Pop a node from the top of the stack (highest spill cost).
2. Assign it a color that is different from the colors of its connected nodes (if its degree was < N, a color must exist).
3. If there is no color available, there will be a spill.
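The push-then-pop procedure above can be sketched as follows. This is a simplified illustration: it pushes the spill candidate onto the stack and only decides at coloring time whether it really spills. The graph, the spill costs, and the name color_graph are hypothetical and are not taken from the lesson's figures.

```python
def color_graph(graph, spill_cost, n):
    """Color an interference graph with n colors using simplify and select.

    graph maps node -> set of neighbors; spill_cost maps node -> number.
    Returns (coloring, spilled): a dict node -> color index in range(n),
    and the list of nodes that found no free color.
    """
    work = {v: set(adj) for v, adj in graph.items()}
    stack = []
    while work:
        low = [v for v in work if len(work[v]) < n]
        if low:
            node = min(low)                       # any low-degree node works
        else:
            # every remaining node has degree >= n: pick the cheapest to spill
            node = min(work, key=lambda v: (spill_cost[v], v))
        stack.append(node)
        for nb in work.pop(node):
            work[nb].discard(node)                # removing a node lowers degrees

    coloring, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[nb] for nb in graph[node] if nb in coloring}
        free = [c for c in range(n) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)                  # no color left: spill
    return coloring, spilled


# Hypothetical graph: S0..S3 all interfere with each other, S4 only with S0.
graph = {
    "S0": {"S1", "S2", "S3", "S4"},
    "S1": {"S0", "S2", "S3"},
    "S2": {"S0", "S1", "S3"},
    "S3": {"S0", "S1", "S2"},
    "S4": {"S0"},
}
costs = {"S0": 5, "S1": 1, "S2": 4, "S3": 3, "S4": 2}
coloring, spilled = color_graph(graph, costs, 3)
print(coloring, spilled)
```

With three colors, four mutually interfering webs cannot all get a register, so the cheapest one (S1 in this made-up cost table) is the one left without a color.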

Graph Coloring Example 5

For this example N = 3. Applying the algorithm to this example:
1. Push onto the stack the nodes that have a degree < N. This would be nodes S4 and S2.
2. Then put on the stack the nodes that have degrees >= N. This would be nodes S1, S3, S0.

This is known as graph simplification.

So there are three registers (blue, green, red). Time to color the graph:
1. Remove the first node from the top, and work down the stack. So S0 is colored first.
2. Then the next node in the stack is colored (S3).
3. Then remove the next node (S1) and determine the color, in this case red.
4. Then S2 is to be colored. Since it cannot be blue or red, green is selected as the color.
5. S4 must now be colored. It cannot be green (due to interference), but it can be red or blue. Randomly select blue.

The final outcome of coloring:

Graph Coloring Example 6

1. N = 3, so we put S4 on the stack first, because its degree is < N.
2. All other nodes have degrees that are == 3.
3. So no more nodes can go on the stack.
4. So now find the node, of those that are left, that has the lowest spill cost.
5. Using Chaitin's algorithm, take out S1 and remove all its edges. If using Briggs, S1 also goes on the stack next. S1 is a spill node, so it does not get a color.
6. Then decide between S3, S0, S2. Select S3 and push it on the stack.
7. Then push S2 on the stack. Then S0.
8. The final result is the completed stack.

So now we have to color the nodes on the stack. For S0 choose red. S2 cannot be red, so choose green. S3 cannot be red or green, so choose blue. S4 must be red. Note that S1 does not have a color in this scheme: it has been spilled to memory. In both algorithms, S1 is spilled to memory. But there are graphs where one algorithm will work better than the other.

When Coloring Fails

What are the options?
1. Pick a web and allocate the value in memory. All the definitions go to memory, and all uses come from memory.
2. Split the web into multiple webs.

In either case, we retry the coloring.

Which web to choose?
- one with interference degree >= N
- one with minimal spill cost (the cost of placing the value in memory rather than in a register)

Ideal spill cost: the dynamic cost of the extra load and store instructions. We can't expect to compute this:
- We don't know which way branches resolve.
- We don't know how many times loops execute.
- The actual cost may be different for different executions.

Solution: use a static approximation. Profiling can give instruction execution frequencies, or we can use heuristics based on the structure of the control flow graph. The second technique is usually used in practice.

Computing spill costs using heuristics based on the structure of the control flow graph

Goal: give priority to values used in loops. Assume loops execute 8 to 10 times.

Spill cost = sum over all def sites of (cost of a store instruction * 10^d) + sum over all use sites of (cost of a load instruction * 10^d), where d is the loop nesting depth of the site.

For example, if you have a loop that executes 10 times, then:
- d = nested loop level
- # executions = 10^d
- store cost = # executions * cost of store instruction
- load cost = # executions * cost of load instruction

Then choose the web with the lowest spill cost.
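The static spill-cost estimate above is simple enough to write down directly. In this sketch the unit costs of 1 for a store and a load are assumptions, as are the function and parameter names.

```python
def spill_cost(def_depths, use_depths, store_cost=1, load_cost=1):
    """Estimate the spill cost of a web from loop nesting depths.

    def_depths / use_depths list the loop nesting depth d of every def site
    and use site of the web. Each site is weighted by 10**d, i.e. every loop
    is assumed to run about 10 times.
    """
    stores = sum(store_cost * 10 ** d for d in def_depths)
    loads = sum(load_cost * 10 ** d for d in use_depths)
    return stores + loads


# One def and one use outside any loop.
print(spill_cost([0], [0]))
# One def and one use inside a single loop (depth 1).
print(spill_cost([1], [1]))
# One def outside, two uses inside a doubly nested loop.
print(spill_cost([0], [2, 2]))
```

A web touched only outside loops is far cheaper to spill than one used inside a nested loop, which is how this heuristic gives priority to values used in loops.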

Spill Cost Example

The x web spans from the start to the end. There are two y webs.
- Spill cost for x: storeCost + loadCost
- Spill cost for y: 10*storeCost + 10*loadCost (we are only looking at the spill cost of y inside the loop)

With 1 register, which variable gets spilled? The one with the lower cost, x.

Splitting the Web

Split a web into multiple webs so that there is less interference in the interference graph, making it N-colorable. The web that should be split is the one with the highest interference and the lowest spill cost. After splitting, spill the value to memory and load it back at the points where the web is split.

Splitting Example

We will need at least 3 colors. Can we split the web? Yes, split the z variable. The graph is now two-colorable (two registers), with a spill of z. z will require a store to and a load from main memory.

Splitting Heuristic

1. Identify a program point where the graph is not N-colorable (a point where the number of webs > N).
2. Pick a web that is not used for the largest enclosing block around that point of the program (for example, z in the last example).
3. Split the web at the corresponding edge.
4. Redo the interference graph.
5. Try to re-color the graph.

Costs and Benefits of Splitting

Cost of splitting a node:
- proportional to the number of times the split edge has to be crossed dynamically
- estimate it by its loop nesting

So the cost of each load and store must be considered when splitting the web.

Benefit of splitting a node:
- increases the colorability of the nodes the split web interferes with
- can be approximated by its degree in the interference graph

Greedy heuristic: pick the live range with the highest benefit-to-cost ratio to spill.

Further Optimizations

- register coalescing
- register targeting (pre-coloring)
- pre-splitting of webs
- interprocedural register allocation

Register Coalescing

Find register copy instructions sj = si (register-to-register copying). Remove these by combining their webs into one web: if sj and si do not interfere, combine their webs.
- Pros: reduces the number of instructions.
- Cons: may increase the degree of the combined node, so a colorable graph may become non-colorable.
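The coalescing rule above (merge the webs of sj = si only when they do not interfere) can be sketched on an interference graph. The graph, the copy list, and the name coalesce are hypothetical; the sketch merges every non-interfering copy without checking whether the result stays colorable.

```python
def coalesce(graph, copies):
    """Merge the webs of each copy (dst, src) when they do not interfere.

    graph maps node -> set of neighbors. Returns (new_graph, alias), where
    alias maps each merged-away web to the web that now represents it.
    """
    graph = {v: set(adj) for v, adj in graph.items()}
    alias = {}

    def rep(v):
        while v in alias:                 # follow earlier merges
            v = alias[v]
        return v

    for dst, src in copies:
        a, b = rep(dst), rep(src)
        if a == b or b in graph[a]:
            continue                      # already merged, or they interfere
        for nb in graph.pop(b):           # move b's edges over to a
            graph[nb].discard(b)
            graph[nb].add(a)
            graph[a].add(nb)
        alias[b] = a
    return graph, alias


graph = {"s1": {"s3"}, "s2": {"s4"}, "s3": {"s1"}, "s4": {"s2"}}
merged, alias = coalesce(graph, [("s2", "s1")])   # copy s2 = s1
print(merged, alias)
```

In this example s1 and s2 each had degree 1, while the combined node has degree 2, which illustrates the drawback listed above: the merged node inherits the neighbors of both webs.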

Register Targeting (Pre-coloring)

Some variables need to be in a special register at a given time:
- the first 4 arguments to a function
- the return value

Pre-color those webs and bind them to the correct register. This eliminates unnecessary copy instructions.

Pre-Splitting the Webs

Some live ranges have very large dead regions, that is, large regions where the variable is unused. Break up these live ranges:
- we need to pay a small cost in spilling
- the graph will be very easy to color

Strategic locations to break up a live range:
- at a call site (because you need to spill anyway)
- around a large loop nest (reserve registers for values used in the loop)

There is a cost associated with splitting: recall the loads and stores.

Interprocedural Register Allocation

Saving and restoring registers across procedure boundaries is expensive, especially for programs with many small functions. The calling convention is too general and inefficient. Customize the calling convention per function by doing interprocedural register allocation. You may need to create a cloned version of the function. A cloned version is only effective at the call site, where it has a specialized register allocation.