CS415 Compilers Register Allocation. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Transcription:

CS415 Compilers Register Allocation These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Review: The Back End

IR -> Instruction Selection -> IR -> Register Allocation -> IR -> Instruction Scheduling -> Machine code (each stage may report errors)

Responsibilities
- Translate IR into target machine code
- Choose instructions to implement each IR operation
- Decide which values to keep in registers
- Ensure conformance with system interfaces

Automation has been less successful in the back end.

cs415, spring 14, Lecture 2

Register Allocation

Part of the compiler's back end:

IR -> Instruction Selection -> m register IR -> Register Allocation -> k register IR -> Instruction Scheduling -> Machine code (each stage may report errors)

Critical properties
- Produce correct code that uses k (or fewer) registers
- Minimize added loads and stores
- Minimize space used to hold spilled values
- Operate efficiently: O(n), O(n log2 n), maybe O(n^2), but not O(2^n)

Local Register Allocation (and Assignment)

Readings: EaC 13.1-13.3, Appendix A (ILOC)

- Local: within a single basic block
- Global: across a procedure/function

ILOC

Register allocation on basic blocks in ILOC
- Pseudo-code for a simple, abstracted RISC machine, generated by the instruction selection process
- Simple, compact data structures
- Here: we only use a small subset of ILOC

Naïve representation: quadruples (a table of k x 4 small integers)

    loadI    2     -     r1
    loadAI   r0    @y    r2
    add      r1    r2    r3
    store    r0    -     r4
    sub      r4    r3    r5

- simple record structure
- easy to reorder
- all names are explicit

ILOC is described in Appendix A of EaC.
Simulator at ~zz124/cs415_2014/iloc_simulator on the ilab cluster.
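The quadruple table above can be sketched as a list of four-field records. This is only an illustration: the `Quad` type and its field names are hypothetical, and operands are kept as strings here, whereas the slide describes a table of small integers.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    """One ILOC operation as a quadruple (hypothetical field names)."""
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: Optional[str]

# The five operations from the slide, one row per quadruple.
block = [
    Quad("loadI", "2", None, "r1"),
    Quad("loadAI", "r0", "@y", "r2"),
    Quad("add", "r1", "r2", "r3"),
    Quad("store", "r0", None, "r4"),
    Quad("sub", "r4", "r3", "r5"),
]

# "Easy to reorder": the first two rows are independent, so they can be swapped
# by moving whole records, without rewriting any operand.
reordered = [block[1], block[0]] + block[2:]

for quad in reordered:
    print(quad)
```

Because every name is explicit in the record, reordering never needs to patch references between rows.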

Memory Model / Code Shape

Source code

    A = 5;
    B = 6;
    A = A + B;

(Figure: memory layout, addresses starting at 0, data addresses starting at 1024.)

Memory Model / Code Shape

Source code

    A = 5;
    B = 6;
    A = A + B;

ILOC code

    loadI 5 => r1
    // compute address of A in r2
    ...
    store r1 => r2       // content(A) = r1
    loadI 6 => r3
    // compute address of B in r4
    ...
    store r3 => r4       // content(B) = r3
    add r1, r3 => r5
    // compute address of A in r6
    ...
    store r5 => r6       // content(A) = r1 + r3

(Figure: memory layout, addresses starting at 0, data addresses starting at 1024.)

Is this code correct?


Memory Model / Code Shape

Source code

    foo (var A, B)
        A = 5;
        B = 6;
        A = A + B;
    end foo;
    call foo(x,x);
    print x;

ILOC code

    loadI 5 => r1
    // compute address of A in r2
    ...
    store r1 => r2       // content(A) = r1
    loadI 6 => r3
    // compute address of B in r4
    ...
    store r3 => r4       // content(B) = r3
    add r1, r3 => r5
    // compute address of A in r6
    ...
    store r5 => r6       // content(A) = r1 + r3

(Figure: memory layout, addresses starting at 0, data addresses starting at 1024.)

Is this code correct?

Incorrect for call-by-reference! With foo(x,x), A and B name the same memory location. After B = 6 that location holds 6, so A + B must be 12, but the code adds the stale register copies r1 = 5 and r3 = 6 and stores 11.
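The aliasing problem can be replayed with a small simulation. This is a sketch under stated assumptions: memory is modeled as a Python dict, and the function names and the address 1024 for x are chosen only for illustration.

```python
def foo_reusing_registers(mem, addr_a, addr_b):
    """Mirrors the ILOC on the slide: the add reuses r1 and r3."""
    r1 = 5
    mem[addr_a] = r1          # content(A) = r1
    r3 = 6
    mem[addr_b] = r3          # content(B) = r3
    r5 = r1 + r3              # stale copies if A and B alias
    mem[addr_a] = r5          # content(A) = r1 + r3

def foo_memory_model(mem, addr_a, addr_b):
    """Memory-memory shape: reload both operands before the add."""
    mem[addr_a] = 5
    mem[addr_b] = 6
    mem[addr_a] = mem[addr_a] + mem[addr_b]

# call foo(x, x): both parameters refer to the same address (assumed 1024).
mem1 = {1024: 0}
foo_reusing_registers(mem1, 1024, 1024)
mem2 = {1024: 0}
foo_memory_model(mem2, 1024, 1024)

print(mem1[1024])  # 11, the wrong answer
print(mem2[1024])  # 12, what call-by-reference semantics require
```

With distinct addresses for A and B, both versions store 11 in A, which is why the register-reusing code looks correct until the parameters alias.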


Memory Model / Code Shape

register-register model
- Values that may safely reside in registers are assigned to a unique virtual register
- Register allocation/assignment maps virtual registers to a limited set of physical registers
- A register allocation/assignment pass is needed to make the code work

memory-memory model (will use this one from now on)
- All values reside in memory and are kept in registers only as briefly as possible (load operands from memory, perform computation, store result into memory)
- Register allocation/assignment has to try to identify cases where values can be safely kept in registers
- Safety verification is hard at the low level of program abstraction
- Even without register allocation/assignment, the code will work


Register Allocation

Consider a fragment of assembly code (or ILOC); slide annotations on the operands: address, immediate

    loadI  1024    => r0     // r0 <- 1024
    loadI  2       => r1     // r1 <- 2
    loadAI r0, @y  => r2     // r2 <- y
    mult   r1, r2  => r3     // r3 <- 2 * y
    loadAI r0, @x  => r4     // r4 <- x
    sub    r4, r3  => r5     // r5 <- x - (2 * y)

The Problem
- At each instruction, decide which values to keep in registers
- Note: a value is a pseudo-register
- Simple if |values| <= |registers|
- Harder if |values| > |registers|
- The compiler must automate this process

Register allocation is described in Chapters 1 & 13 of EaC.
ILOC is discussed in Appendix A of EaC.

Register Allocation

The General Task
- At each point in the code, pick the values to keep in registers
- Insert code to move values between registers & memory
  - No instruction reordering (leave that to scheduling)
- Minimize inserted code, by both dynamic & static measures
- Make good use of any extra registers

Allocation versus assignment
- Allocation is deciding which values to keep in registers
- Assignment is choosing specific registers for values
- This distinction is often lost in the literature

The compiler must perform both allocation & assignment.

Basic Blocks

Definition
- A basic block is a maximal-length segment of straight-line (i.e., branch-free) code

Importance (assuming normal execution)
- Strongest facts are provable for branch-free code
- If any statement executes, they all execute
- Execution is totally ordered

Optimization
- Many techniques for improving basic blocks
- Simplest problems
- Strongest methods

Local Register Allocation

What's "local"?
- A local transformation operates on basic blocks
- Many optimizations are done locally

Does local allocation solve the problem?
- It produces decent register use inside a block (as opposed to "global")
- Inefficiencies can arise at boundaries between blocks

The first project (instruction scheduling) assumes that the block is the entire program.

How many passes can the allocator make?
- This is an off-line problem
- As many passes as it takes

memory-to-memory vs. register-to-register model
- code shape and safety issues

ILOC

Register allocation on basic blocks in ILOC
- Pseudo-code for a simple, abstracted RISC machine, generated by the instruction selection process
- Simple, compact data structures
- Here: we only use a small subset of ILOC

Naïve representation: quadruples (a table of k x 4 small integers)

    loadI    2     -     r1
    loadAI   r0    @y    r2
    add      r1    r2    r3
    storeAI  r0    @x    r4
    sub      r4    r3    r5

- simple record structure
- easy to reorder
- all names are explicit

ILOC is described in Appendix A of EaC.
Simulator at ~zz124/cs415_2014/iloc_simulator on the ilab cluster.

Register Allocation

Can we do this optimally? (on real code?)

Local Allocation
- Simplified cases: O(n)
- Real cases: NP-Complete

Local Assignment
- Single size, no spilling: O(n)
- Two sizes: NP-Complete

Global Allocation
- NP-Complete for 1 register
- NP-Complete for k registers (most sub-problems are NPC, too)

Global Assignment
- NP-Complete

Real compilers face real problems.

F: Set of Feasible Registers

The allocator may need to reserve registers to ensure feasibility
- Must be able to compute addresses
- Requires some minimal set of registers, F
  - F depends on the target architecture
- F contains registers to make spilling work (set F registers aside, i.e., not available for register assignment)

Notation: k is the number of registers on the target machine.

Live Range of a Value (Virtual Register)

A value is live between its definition and its uses
- Find definitions (x <- ...) and uses (y <- ... x ...)
- From definition to last use is its live range
  - How does a second definition affect this?
- Can represent a live range as an interval [i,j] (in block)
  - live on exit

Let MAXLIVE be the maximum, over each instruction i in the block, of the number of values (pseudo-registers) live at i.
- If MAXLIVE <= k, allocation should be easy
- If MAXLIVE <= k, no need to reserve F registers for spilling
- If MAXLIVE > k, some values must be spilled to memory

Finding live ranges is harder in the global (non-local) case.
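The interval view and MAXLIVE can be sketched in a few lines. This is only an illustration under assumptions: each instruction is a hypothetical (opcode, uses, defs) tuple, every register is defined once, "live at i" is taken to mean live on exit from instruction i, and values live on exit from the block are ignored. The block is the six-instruction fragment r0 ... r5 from the Register Allocation slide.

```python
# Each instruction: (opcode, registers read, registers written).
block = [
    ("loadI",  [],           ["r0"]),
    ("loadI",  [],           ["r1"]),
    ("loadAI", ["r0"],       ["r2"]),
    ("mult",   ["r1", "r2"], ["r3"]),
    ("loadAI", ["r0"],       ["r4"]),
    ("sub",    ["r4", "r3"], ["r5"]),
]

def live_ranges(block):
    """Map each register to [definition index, last-use index], 1-based."""
    ranges = {}
    for i, (_, uses, defs) in enumerate(block, start=1):
        for reg in uses:
            if reg in ranges:          # extend the interval to this use
                ranges[reg][1] = i
        for reg in defs:
            ranges[reg] = [i, i]       # assumes a single definition per register
    return ranges

def max_live(block):
    """Largest number of values live on exit from any instruction."""
    ranges = live_ranges(block)
    best = 0
    for i in range(1, len(block) + 1):
        live = [reg for reg, (d, u) in ranges.items() if d <= i < u]
        best = max(best, len(live))
    return best

print(live_ranges(block))
print(max_live(block))  # 3
```

For this fragment MAXLIVE is 3 (r0, r1 and r2 are all live after the third instruction), so three registers suffice without spilling.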

Top-down Versus Bottom-up Allocation

Top-down allocator
- Work from an external notion of what is important
- Assign registers in priority order
- Register assignment remains fixed for the entire basic block
- Save some registers for the values relegated to memory (feasible set F)

Bottom-up allocator
- Work from detailed knowledge about the problem instance
- Incorporate knowledge of the partial solution at each step
- Register assignment may change across the basic block (different register assignments for different parts of a live range)
- Save some registers for the values relegated to memory (feasible set F)

Top-down Allocator

The idea:
- Keep the busiest values in a register
- Use the feasible (reserved) set, F, for the rest

Algorithm:
- Rank values by number of occurrences
- Allocate the first k - F values to registers
- Rewrite code to reflect these choices

SPILL: Move values with no register into memory (add LOADs & STOREs)
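The ranking step can be sketched as follows. This is a minimal illustration, not the course's reference implementation: instructions are hypothetical (opcode, uses, defs) tuples, an occurrence is any definition or use, and ties keep first-appearance order, which the slide does not specify.

```python
from collections import Counter

# Six-instruction fragment used as sample input (r0 ... r5).
block = [
    ("loadI",  [],           ["r0"]),
    ("loadI",  [],           ["r1"]),
    ("loadAI", ["r0"],       ["r2"]),
    ("mult",   ["r1", "r2"], ["r3"]),
    ("loadAI", ["r0"],       ["r4"]),
    ("sub",    ["r4", "r3"], ["r5"]),
]

def top_down_allocate(block, k, f):
    """Split virtual registers into (kept in registers, spilled to memory)."""
    counts = Counter()
    for _, uses, defs in block:
        counts.update(uses)
        counts.update(defs)
    # Most occurrences first; sorted() is stable, so ties keep appearance order.
    ranked = sorted(counts, key=lambda reg: -counts[reg])
    return ranked[: k - f], ranked[k - f:]

kept, spilled = top_down_allocate(block, k=4, f=2)
print(kept)     # ['r0', 'r1']
print(spilled)  # ['r2', 'r3', 'r4', 'r5']
```

With k = 4 and two feasible registers reserved, only two values keep a register for the whole block; every occurrence of the others would need spill code.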

Bottom-up Allocator

The idea:
- Focus on replacement rather than allocation
- Keep values used "soon" in registers

Algorithm:
- Start with an empty register set
- Load on demand
- When no register is available, free one

Replacement:
- Spill the value whose next use is farthest in the future
- Sound familiar? Think page replacement ...
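The replacement rule can be sketched as a small simulation that only records which value is evicted and when it must be reloaded. Assumptions: instructions are hypothetical (opcode, uses, defs) tuples, registers whose values have no further use are freed immediately, operands of the current instruction are never evicted, and n_regs is large enough to hold one instruction's operands. The sample block is a nine-instruction sequence with virtual registers r1 ... r8.

```python
INF = float("inf")

block = [
    ("loadI", [],           ["r1"]),
    ("load",  ["r1"],       ["r2"]),
    ("mult",  ["r1", "r2"], ["r3"]),
    ("loadI", [],           ["r4"]),
    ("sub",   ["r4", "r2"], ["r5"]),
    ("loadI", [],           ["r6"]),
    ("mult",  ["r5", "r6"], ["r7"]),
    ("sub",   ["r7", "r3"], ["r8"]),
    ("store", ["r8", "r1"], []),
]

def bottom_up(block, n_regs):
    """Return ('spill' | 'reload', 1-based instruction, value) events."""
    in_regs, events = set(), []

    def next_use(reg, start):
        # Index of the first instruction at or after `start` that reads reg.
        for j in range(start, len(block)):
            if reg in block[j][1]:
                return j
        return INF

    def free_dead(start):
        for reg in list(in_regs):
            if next_use(reg, start) == INF:
                in_regs.discard(reg)

    def ensure(reg, i, pinned):
        if reg in in_regs:
            return
        if len(in_regs) >= n_regs:
            # Evict the value whose next use is farthest in the future.
            candidates = sorted(in_regs - pinned)
            victim = max(candidates, key=lambda v: next_use(v, i + 1))
            in_regs.remove(victim)
            events.append(("spill", i + 1, victim))
        in_regs.add(reg)

    for i, (_, uses, defs) in enumerate(block):
        free_dead(i)
        for reg in uses:
            missing = reg not in in_regs
            ensure(reg, i, set(uses))
            if missing:
                events.append(("reload", i + 1, reg))
        free_dead(i + 1)          # operands read for the last time free up here
        for reg in defs:
            ensure(reg, i, set(defs))
    return events

print(bottom_up(block, 3))  # [('spill', 4, 'r1'), ('reload', 9, 'r1')]
```

With three registers, the value evicted at instruction 4 is r1, because its next use (instruction 9) is the farthest away.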

An Example

Here is a sample code sequence

    loadI 1028    => r1     // r1 <- 1028
    load  r1      => r2     // r2 <- MEM(r1) == y
    mult  r1, r2  => r3     // r3 <- 1028 * y
    loadI 5       => r4     // r4 <- 5
    sub   r4, r2  => r5     // r5 <- 5 - y
    loadI 8       => r6     // r6 <- 8
    mult  r5, r6  => r7     // r7 <- 8 * (5 - y)
    sub   r7, r3  => r8     // r8 <- 8 * (5 - y) - (1028 * y)
    store r8      => r1     // MEM(r1) <- 8 * (5 - y) - (1028 * y)

An Example: Top-Down

Live Ranges

    1  loadI 1028    => r1     // r1
    2  load  r1      => r2     // r1 r2
    3  mult  r1, r2  => r3     // r1 r2 r3
    4  loadI 5       => r4     // r1 r2 r3 r4
    5  sub   r4, r2  => r5     // r1 r3 r5
    6  loadI 8       => r6     // r1 r3 r5 r6
    7  mult  r5, r6  => r7     // r1 r3 r7
    8  sub   r7, r3  => r8     // r1 r8
    9  store r8      => r1     //

NOTE: the comments show the live sets on exit of each instruction.


An Example

Top down (3 physical registers (k - F): ra, rb, rc)

    1  loadI 1028    => r1     // r1
    2  load  r1      => r2     // r1 r2
    3  mult  r1, r2  => r3     // r1 r2 r3
    4  loadI 5       => r4     // r1 r2 r3 r4
    5  sub   r4, r2  => r5     // r1 r3 r5
    6  loadI 8       => r6     // r1 r3 r5 r6
    7  mult  r5, r6  => r7     // r1 r3 r7
    8  sub   r7, r3  => r8     // r1 r8
    9  store r8      => r1     //

Consider statements with MAXLIVE > (k - F):
- number of occurrences of virtual register (most important)
- length of live range (tie breaker)
- Note: This is different from the algorithm discussed in EaC!
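One way to read the two ranking criteria on this slide is sketched below. This is an interpretation, not the course's stated algorithm: at the first instruction where more than k - F values are live, it spills the live value with the fewest occurrences, and it assumes that ties go to the value with the longer live range. Instructions are hypothetical (opcode, uses, defs) tuples.

```python
block = [
    ("loadI", [],           ["r1"]),
    ("load",  ["r1"],       ["r2"]),
    ("mult",  ["r1", "r2"], ["r3"]),
    ("loadI", [],           ["r4"]),
    ("sub",   ["r4", "r2"], ["r5"]),
    ("loadI", [],           ["r6"]),
    ("mult",  ["r5", "r6"], ["r7"]),
    ("sub",   ["r7", "r3"], ["r8"]),
    ("store", ["r8", "r1"], []),
]

def choose_spills(block, n_regs):
    """Pick virtual registers to spill until no instruction is over-subscribed."""
    ranges, counts = {}, {}
    for i, (_, uses, defs) in enumerate(block, start=1):
        for reg in uses:
            counts[reg] = counts.get(reg, 0) + 1
            if reg in ranges:
                ranges[reg][1] = i          # extend live range to this use
        for reg in defs:
            counts[reg] = counts.get(reg, 0) + 1
            ranges[reg] = [i, i]            # assumes one definition per register
    spilled = []
    while True:
        crowded = None
        for i in range(1, len(block) + 1):
            live = [r for r, (d, u) in ranges.items()
                    if r not in spilled and d <= i < u]
            if len(live) > n_regs:
                crowded = live
                break
        if crowded is None:
            return spilled
        # Fewest occurrences first; assumed tie breaker: longest live range.
        victim = min(crowded,
                     key=lambda r: (counts[r], -(ranges[r][1] - ranges[r][0])))
        spilled.append(victim)

print(choose_spills(block, 3))  # ['r3']
```

On the nine-instruction example, instruction 4 has four live values; r3 and r4 both occur twice, and r3 has the longer live range, so r3 is chosen.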

An Example

Top down (3 physical registers: ra, rb, rc)

    1  loadI 1028    => ra     // r1
    2  load  ra      => rb     // r1 r2
    3  mult  ra, rb  => f1     // r1 r2 r3
       store* f1     => 10     // spill code
    4  loadI 5       => rc     // r1 r2 r3 r4
    5  sub   rc, rb  => rb     // r1 r3 r5
    6  loadI 8       => rc     // r1 r3 r5 r6
    7  mult  rb, rc  => rb     // r1 r3 r7
       load* 10      => f1     // spill code
    8  sub   rb, f1  => rb     // r1 r8
    9  store rb      => ra     //

Insert spill code for every occurrence of the spilled virtual register in the basic block.

Spill code

A virtual register is spilled by using only registers from the feasible set (F), not the allocated set (k - F).

How to insert spill code, with F = {f1, f2, ...}?

For the definition of the spilled value (assignment of the value to the virtual register), use a feasible register as the target register, then use an additional register to load its address in memory, and perform the store:

    add   r1, r2 => f1
    loadI @f     => f2     // value lives at memory location @f
    store f1     => f2

For the use of the spilled value, load the value from memory into a feasible register:

    loadI @f     => f1
    load  f1     => f1
    add   f1, r2 => r1

How many feasible registers do we need for an add instruction?
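The two rewriting rules above can be sketched as a small pass over three-address operations. Assumptions made for illustration: instructions are hypothetical (opcode, sources, target) tuples whose operands are all registers, two feasible registers f1 and f2 are available, and spill_addr maps each spilled virtual register to its memory label.

```python
def insert_spill_code(block, spill_addr, feasible=("f1", "f2")):
    """Rewrite a block so spilled registers live in memory; returns ILOC text lines."""
    out = []
    for op, srcs, dst in block:
        free = list(feasible)
        new_srcs = []
        for src in srcs:
            if src in spill_addr:
                # Use of a spilled value: load its address, then the value itself.
                f = free.pop(0)
                out.append(f"loadI {spill_addr[src]} => {f}")
                out.append(f"load {f} => {f}")
                new_srcs.append(f)
            else:
                new_srcs.append(src)
        operands = ", ".join(new_srcs)
        if dst in spill_addr:
            # Definition of a spilled value: compute into f1, address in f2, store.
            f_val, f_addr = feasible[0], feasible[1]
            out.append(f"{op} {operands} => {f_val}")
            out.append(f"loadI {spill_addr[dst]} => {f_addr}")
            out.append(f"store {f_val} => {f_addr}")
        else:
            out.append(f"{op} {operands} => {dst}")
    return out

# r3 is spilled to the (hypothetical) location @f; it is defined, then used.
block = [
    ("add", ["r1", "r2"], "r3"),
    ("add", ["r3", "r2"], "r1"),
]
for line in insert_spill_code(block, {"r3": "@f"}):
    print(line)
```

The printed sequence matches the two patterns on the slide: the definition becomes add, loadI, store, and the use becomes loadI, load, add.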

Next class

- More Register Allocation
- Instruction Scheduling
- Read EaC: Chapters 12.1-12.3

Recitation starts TODAY!
First homework will be posted tomorrow.
http://www.cs.rutgers.edu/~zz124/cs415_spring2014/