Compiler Architecture

Similar documents
Code Generation. CS 540 George Mason University

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Instruction Set Architecture

Processor Architecture

Chapter 2. Instructions:

Do-While Example. In C++ In assembly language. do { z--; while (a == b); z = b; loop: addi $s2, $s2, -1 beq $s0, $s1, loop or $s2, $s1, $zero

Computer Organization MIPS Architecture. Department of Computer Science Missouri University of Science & Technology

LECTURE 10. Pipelining: Advanced ILP

Introduction to MIPS Processor

Computer Architecture. The Language of the Machine

COMP 303 Computer Architecture Lecture 3. Comp 303 Computer Architecture

Instruction Set Architecture part 1 (Introduction) Mehran Rezaei

Instruction Set Architecture (Contd)

Prof. Kavita Bala and Prof. Hakim Weatherspoon CS 3410, Spring 2014 Computer Science Cornell University. See P&H 2.8 and 2.12, and A.

Instructions: Assembly Language

CS 110 Computer Architecture Lecture 6: More MIPS, MIPS Functions

Quiz for Chapter 2 Instructions: Language of the Computer3.10

Math 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro

Instruction Set Architecture. "Speaking with the computer"

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

CA Compiler Construction

Lectures 3-4: MIPS instructions

Lectures 5. Announcements: Today: Oops in Strings/pointers (example from last time) Functions in MIPS

Chapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes

Computer Architecture. Chapter 2-2. Instructions: Language of the Computer

Principles of Compiler Design

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

Chapter 2A Instructions: Language of the Computer

Math 230 Assembly Programming (AKA Computer Organization) Spring 2008

Lecture 2. Instructions: Language of the Computer (Chapter 2 of the textbook)

ECE 331 Hardware Organization and Design. Professor Jay Taneja UMass ECE - Discussion 3 2/8/2018

MIPS Functions and Instruction Formats

Announcements HW1 is due on this Friday (Sept 12th) Appendix A is very helpful to HW1. Check out system calls

CS 61c: Great Ideas in Computer Architecture

Architecture II. Computer Systems Laboratory Sungkyunkwan University

ECE232: Hardware Organization and Design. Computer Organization - Previously covered

Branch Addressing. Jump Addressing. Target Addressing Example. The University of Adelaide, School of Computer Science 28 September 2015

Computer Science 246 Computer Architecture

Today s topics. MIPS operations and operands. MIPS arithmetic. CS/COE1541: Introduction to Computer Architecture. A Review of MIPS ISA.

Chapter 3. Instructions:

101 Assembly. ENGR 3410 Computer Architecture Mark L. Chang Fall 2009

MIPS Procedure Calls - Review

5/17/2012. Recap from Last Time. CSE 2021: Computer Organization. The RISC Philosophy. Levels of Programming. Stored Program Computers

Recap from Last Time. CSE 2021: Computer Organization. Levels of Programming. The RISC Philosophy 5/19/2011

Levels of Programming. Registers

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

Chapter 2. Computer Abstractions and Technology. Lesson 4: MIPS (cont )

Control Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary

Control Instructions

CS 61C: Great Ideas in Computer Architecture More MIPS, MIPS Functions

Register allocation. instruction selection. machine code. register allocation. errors

Lecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions

Machine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine

Architectures. Code Generation Issues III. Code Generation Issues II. Machine Architectures I

Introduction to the MIPS. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

MIPS R-format Instructions. Representing Instructions. Hexadecimal. R-format Example. MIPS I-format Example. MIPS I-format Instructions

CS 316: Procedure Calls/Pipelining

ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design

CENG3420 Lecture 03 Review

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

Pipeline issues. Pipeline hazard: RaW. Pipeline hazard: RaW. Calcolatori Elettronici e Sistemi Operativi. Hazards. Data hazard.

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA

COE608: Computer Organization and Architecture

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture

CSCI 402: Computer Architectures. Instructions: Language of the Computer (3) Fengguang Song Department of Computer & Information Science IUPUI.

Instructor: Randy H. Katz hap://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #7. Warehouse Scale Computer

Code Generation. The Main Idea of Today s Lecture. We can emit stack-machine-style code for expressions via recursion. Lecture Outline.

This course provides an overview of the SH-2 32-bit RISC CPU core used in the popular SH-2 series microcontrollers

We can emit stack-machine-style code for expressions via recursion

Course on Advanced Computer Architectures

Instructions: Language of the Computer

Photo David Wright STEVEN R. BAGLEY PIPELINES AND ILP

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

Four Steps of Speculative Tomasulo cycle 0

Chapter 2. Instruction Set Architecture (ISA)

COMPSCI 313 S Computer Organization. 7 MIPS Instruction Set

MIPS%Assembly% E155%

CS61C : Machine Structures

MODULE 4 INSTRUCTIONS: LANGUAGE OF THE MACHINE

Computer Architecture, EIT090 exam

CODE GENERATION Monday, May 31, 2010

Compilation /17a Lecture 10. Register Allocation Noam Rinetzky

Chapter 2: Instructions:

Computer Architecture

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Memory Usage 0x7fffffff. stack. dynamic data. static data 0x Code Reserved 0x x A software convention

register allocation saves energy register allocation reduces memory accesses.

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

EEC 581 Computer Architecture Lecture 1 Review MIPS

CS/COE1541: Introduction to Computer Architecture

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced processor designs

Lecture 4: MIPS Instruction Set

CSE 378 Midterm 2/12/10 Sample Solution

Instructions: Language of the Computer

MIPS Assembly Language Guide

CS252 Graduate Computer Architecture Midterm 1 Solutions

Transcription:

Code Generation 1

Compiler Architecture Source language Scanner (lexical analysis) Tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Intermediate Language Code Optimizer Code Generator Target language Symbol Table 2

Code Generation The code generation problem is the task of mapping intermediate code to machine code Machine Dependent Optimization Requirements: Correctness Efficiency 3

Issues Input language: intermediate code (optimized or not) Target architecture: must be well understood Interplay between Instruction Selection Register Allocation Instruction Scheduling 4

Example Target: MIPS Assembly Language General Characteristics Byte-addressable with 4-byte words N general-purpose registers Three-address instructions: op destination, source1, source2 0 zero constant 0 1 at reserved for assembler 2 v0 expression evaluation & 3 v1 function results 4 a0 arguments 5 a1 6 a2 7 a3 8 t0 temporary: caller saves... (callee can clobber) 15 t7 16 s0 callee saves... (caller can clobber) 23 s7 24 t8 temporary (cont d) 25 t9 26 k0 reserved for OS kernel 27 k1 28 gp Pointer to global area 29 sp Stack pointer 30 fp frame pointer 31 ra Return Address (HW) 5

Instruction Selection There may be a large number of candidate machine instructions for a given IC instruction each has own cost and constraints cost may be influenced by surrounding context different architectures have different needs that must be considered: speed, power constraints, space 6

Instruction Scheduling Choosing the order of instructions to best utilize resources Architecture RISC (pipeline) Vector processing Superscalar and VLIW Memory hierarchy Ordering to decrease memory fetching Latency tolerance doing something when data does have to be fetched 7

Register Allocation How to best use the bounded number of registers Complications: Special purpose registers Operators requiring multiple registers 8

Naive Approach to Code Generation Simple code generation algorithm: Define a target code sequence for each intermediate code statement type Why is this not sufficient? 9

Mapping from Intermediate Code Simple code generation algorithm: Define a target code sequence to each intermediate code statement type Intermediate becomes Intermediate becomes a := b lw $t0,b a := b + c lw $t0,b sw $t0,a lw $,c add $t0,$t0,$ sw $t0,a a := b[c] la $t0,b lw $,c a[b] := c la $t0,a lw $,b add $t0,$t0,$ lw $t0,($t0) add $t0,$t0,$ lw $,c sw $t0,a sw $,($t0) 10

Mapping from Intermediate Code Consider the C statement: a[i] = d[c[k]]; := 4 * k t2 := c[] t3 := 4 * t2 lw $t0,k sll $t0,$t0,2 sw $t0, la $t0,c lw $, add $t0,$t0,$ lw $t0,($t0) sw $t0,t2 lw $t0,t2 sll $t0,$t0,2 sw $t0,t3 t4 := d[t3] t5 := 4 * i a[t5] := t4 la $t0,d lw $,t3 add $t0,$t0,$ lw $t0,($t0) sw $t0,t4 lw $t0,i sll $t0,$t0,2 sw $t0,t5 la $t0,a lw $,t5 add $t0,$t0,$ lw $,t4 sw $,($t0) We use 24 instructions (18 load/store + 6 arithmetic) and allocate space for five temporaries (but only use two registers). 11

Problems with this approach Local decisions do not produce good code Does not take temporary variables into account Get rid of the temporaries (reduce load/store): a[i] = d[c[k]]; la $t0,c lw $,k sll $,$,2 add $t0,$t0,$ # address of c[k] lw $t0,($t0) la $,d sll $t0,$t0,2 add $,$,$t0 # address of d[c[k]] lw $,($) la $t0,a lw $t2,i sll $t2,$t2,2 add $t0,$t0,$t2 # address of a[i] sw $,($t0) 12

How to Improve Quality Need a way to generate machine code based on past and future use of the data Analyze the code Use results of analysis 13

Representing Intermediate Code: Control Flow Graph - CFG CFG = < V, E, Entry >, where V = vertices or nodes, representing an instruction or basic block (group of statements). E = (V x V) edges, potential flow of control Entry is an element of V, the unique program entry 1 2 3 4 5 14

Basic Blocks A basic block is a sequence of consecutive statements with single entry/single exit: Flow of control only enters at the beginning Flow of control only leaves at the end Variants: single entry/multiple exit, multiple entry/single exit 15

Generating CFGs from Intermediate Code Partition intermediate code into basic blocks Add edges corresponding to control flow between blocks Unconditional goto Conditional goto multiple edges No goto at end control passes to first statement of next block 16

Partitioning into Basic Blocks Input: A sequence of intermediate code statements Determine the leaders, the first statements of basic blocks The first statement in the sequence is a leader Any statement that is the target of a goto (conditional or unconditional) is a leader Any statement immediately following a goto (conditional or unconditional) is a leader For each leader, its basic block is the leader and all statements up to, but not including, the next leader or the end of the program 17

Example Code (1) i := m 1 (16) t7 := 4 * i (2) j := n (17) t8 := 4 * j (3) := 4 * n (18) t9 := a[t8] (4) v := a[] (19) a[t7] := t9 (5) i := i + 1 (20) 0 := 4 * j (6) t2 := 4 * i (21) a[0] := x (7) t3 := a[t2] (22) goto (5) (8) if t3 < v goto (5) (23) 1 := 4 * i (9) j := j - 1 (24) x := a[1] (10) t4 := 4 * j (25) 2 := 4 * i (11) t5 := a[t4] (26) 3 := 4 * n (12) If t5 > v goto (9) (27) 4 := a[3] (13) if i >= j goto (23) (28) a[2] := 4 (14) t6 := 4*i (29) 5 := 4 * n (15) x := a[t6] (30) a[5] := x 18

Block Leaders (1) i := m 1 (16) t7 := 4 * i (2) j := n (17) t8 := 4 * j (3) := 4 * n (18) t9 := a[t8] (4) v := a[] (19) a[t7] := t9 (5) i := i + 1 (20) 0 := 4 * j (6) t2 := 4 * i (21) a[0] := x (7) t3 := a[t2] (22) goto (5) (8) if t3 < v goto (5) (23) 1 := 4 * i (9) j := j - 1 (24) x := a[1] (10) t4 := 4 * j (25) 2 := 4 * i (11) t5 := a[t4] (26) 3 := 4 * n (12) If t5 > v goto (9) (27) 4 := a[3] (13) if i >= j goto (23) (28) a[2] := 4 (14) t6 := 4*i (29) 5 := 4 * n (15) x := a[t6] (30) a[5] := x 19

Flow Graph (1) i := m 1 (16) t7 := 4 * i (2) j := n (17) t8 := 4 * j (3) := 4 * n (18) t9 := a[t8] (4) v := a[] (19) a[t7] := t9 (5) i := i + 1 (20) 0 := 4 * j (6) t2 := 4 * i (21) a[0] := x (7) t3 := a[t2] (22) goto (5) (8) if t3 < v goto (5) (23) 1 := 4 * i (9) j := j - 1 (24) x := a[1] (10) t4 := 4 * j (25) 2 := 4 * i (11) t5 := a[t4] (26) 3 := 4 * n (12) If t5 > v goto (9) (27) 4 := a[3] (13) if i >= j goto (23) (28) a[2] := 4 (14) t6 := 4*i (29) 5 := 4 * n (15) x := a[t6] (30) a[5] := x 20

Instruction Scheduling Choosing the order of instructions to best utilize resources (CPU, registers, ) Consider RISC pipeline architecture: IF ID IF EX ID IF MA EX ID WB MA EX WB MA WB IF Instruction Fetch ID Instruction Decode EX Execute MA Memory access WB Write back time 21

Hazards 1. Structural hazards machine resources limit overlap 2. Data hazards output of instruction needed by later instruction 3. Control hazards branching Pipeline stalls! 22

Data Hazards Memory latency: lw R1,0(R2) IF ID EX MA WB add R3,R1,R4 IF ID stall EX MA WB Can t add until register R1 is loaded. 23

Structural Hazards Instruction latency: addf R3,R1,R2 IF ID EX EX MA WB addf R3,R3,R4 IF ID stall EX EX MA WB Assumes floating point ops take 2 execute cycles 24

Dealing with Data Hazards Typical solution is to re-order statements To do this without changing the outcome, need to understand the relationship (dependences) between statements addf R3,R1,R2 IF ID EX EX MA WB add R5,R5,R6 IF ID EX MA WB addf R3,R3,R4 IF ID EX EX MA WB 25

Instruction Scheduling Many operations have non-zero latencies Execution time is order-dependent Assumed latencies (conservative): Operation Cycles load 3 store 3 loadi 1 add 1 mult 2 fadd 1 fmult 2 shift 1 branch 0 to 8 26

w w * 2 * x * y * z Schedule 1 1 lw $t0,w 4 add $t0,$t0,$t0 5 lw $,x 8 mult $t0,$t0,$ 9 lw $,y 12 mult $t0,$t0,$ 13 lw $,z 16 mult $t0,$t0,$ 18 sw $t0,w Schedule 2 1 lw $t0,w 2 lw $,x 3 lw $t2,y 4 add $t0,$t0,$t0 5 mult $t0,$t0,$ 6 lw $,z 7 mult $t0,$t0,$t2 9 mult $t0,$t0,$ 11 sw $t0,w done at time 21 done at time 14 Issue time 27

Control Hazards Branch instruction IF ID EX MA WB IF stall stall IF ID EX MA WB Stall if branch is made 28

Branch Scheduling Problem: Branches often take some number of cycles to complete, creating delay slots Can be a delay between a compare b and its associated branch Even unconditional branches have delay slots A compiler will try to fill these delay slots with valid instructions (rather than nop) 29

Example Assume loads take 2 cycles and branches have a delay slot 7 cycles Can look at the dependencies between the statements and move a statement into the delay slot Instruction lw $t2,4($) lw $t3,8($) add $t4, $t2, $t3 add $t5, $t2,1 b L1 nop 1 2 Start Time 1 2 4 5 6 7 4 3 30

Example 5 cycles filling delay slots Instruction lw $t2,4($) lw $t3,8($) add $t5, $t2,1 b L1 add $t4, $t2, $t3 Start Time 1 2 3 4 5 31

Register Allocation How to best use the bounded number of registers. Reducing load/store operations What are best values to keep in registers? When can we free registers? Complications: special purpose registers operators requiring multiple registers 32

Register Allocation Algorithms Local (basic block level): Basic - using liveness information Register Allocation using graph coloring Global (CFG) Need to use global liveness information 33

Basic Code Generation Deal with each basic block individually Compute liveness information for the block Using liveness information, generate code that uses registers as well as possible At end, generate code that saves any live values left in registers 34

Concept: Variable Liveness For some statement s, variable x is live if there is a statement t that uses x there is a path in the CFG from s to t there is no assignment to x on some path from s to t A variable is live at a given point in the source code if it could be used before it is defined Liveness tells us whether we care about the value held by a variable 35

Example: When Is a Live? a := b + c := a * a a is live b := + a c := * b t2 := c + b a := t2 + t2 Assume a,b and c are used after this basic block 36

Example: When Is b Live? a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 Assume a,b and c are used after this basic block 37

Computing Live Status in Basic Blocks Input: A basic block Output: For each statement, set of variables live after the statement Initially all non-temporary variables go into live set (L) for i = last statement to first statement: For statement i: x := y op z 1. Attach L to statement i 2. Remove x from set L 3. Add y and z to set L 38

Example a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 live = {} live = {} live = {} live = {} live = {} live = {} live = {a,b,c} 39

Example Answers a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 live = {} live = {} live = {} live = {} live = {} live = {b,c,t2} live = {a,b,c} 40

Example Answers a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 live = {} live = {} live = {} live = {} live = {b,c} live = {b,c,t2} live = {a,b,c} 41

Example Answers a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 live = {} live = {} live = {} live = {b,} live = {b,c} live = {b,c,t2} live = {a,b,c} 42

Example Answers a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 live = {} live = {} live = {a,} live = {b,} live = {b,c} live = {b,c,t2} live = {a,b,c} 43

Example Answers a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 live = {} live = {a} live = {a,} live = {b,} live = {b,c} live = {b,c,t2} live = {a,b,c} 44

Example Answers a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 live = {b,c} live = {a} live = {a,} live = {b,} live = {b,c} live = {b,c,t2} live = {a,b,c} what does this mean? 45

Basic Code Generation Deal with each basic block individually Compute liveness information for the block Using liveness information, generate code that uses registers as well as possible At end, generate code that saves any live values left in registers 46

Basic Code Generation Idea: Deal with the instructions from beginning to end. For each instruction, Use registers whenever possible A non-live value in a register can be discarded, freeing that register Data Structures: Register descriptor - register status (empty, full) and contents (one or more "values") Address descriptor - the location (or locations) where the current value for a variable can be found (register, stack, memory) 47

Instruction type: x := y op z Choose R x, the register where the result (x) will be kept If y (or z) is the only variable in a register t and not live after the statement, choose R x = t Else if there is a free register t, choose R x = t Else must free up a register for R x Find R y. If y is not in a register, generate load into a free register (or R x ) Find R z. If z is not in a register, generate load into a free register (can use R x if not used by y) Generate: OP R x, R y, R z 48

Instruction type: x := y op z Update information about the current location of x Update information for the register holding x If y and/or z are not live after this instruction, update register and address descriptors accordingly 49

Example Code a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 live = {b,c} live = {a} live = {a,} live = {b,} live = {b,c} live = {b,c,t2} live = {a,b,c} 50

Code Generation Example Initially Three Registers: ( -; -; -) all empty Current values: (a;b;c;;t2) = (m;m;m;-;-) Instruction 1: a := b + c, Live = {a} R a = $t0, R b = $t0, R c = $ lw $t0,b lw $,c add $t0,$t0,$ Registers: (a;-;-) Don t need to keep track of b or c since aren t live. current values: ($t0;-;-;-;-) 51

Code Generation Example Instruction 2: := a * a, Live = {a,} R = $ (since a is live after instruction) mul $,$t0,$t0 Registers: (a;;-) current values: ($t0;-;-;$;-) Instruction 3: b := + a, Live = {b,} Since a is not live after instruction, R b = $t0 add $t0,$,$t0 Registers: (b;;-) current values: (-;$t0;-;$;-) 52

Code Generation Example Instruction 4: c := * b, Live = {b,c} Since is not live after instruction, R c = $ mul $,$,$t0 Registers: (b;c;-) current values: (-;$t0;$;-;-) Instruction 5: t2 := c + b, Live = {b,c,t2} R t2 = $t2 add $t2,$,$t0 Registers: (b;c;t2) current values: (-;$t0;$;-;$t2) 53

Code Generation Example Instruction 6: a := t2 + t2, Live = {a,b,c} R a = $t2 add $t2,$t2,$t2 Registers: (b;c;a) current values: ($t2;$t0;$;-;-) Since end of block, move live variables: sw $t2,a sw $t0,b sw $,c all registers available all live variables moved to memory 54

Generated code lw $t0,b lw $,c add $t0,$t0,$ mul $,$t0,$t0 add $t0,$,$t0 mul $,$,$t0 add $t2,$,$t0 add $t2,$t2,$t2 sw $t2,a sw $t0,b sw $,c a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 Cost = 16 How does this compare to naive approach? 55

Improving Efficiency Liveness information allows us to keep values in registers if they will be used later (efficiency) Why do we assume all variables are live at the end of blocks? Can we do better? Why do we need to save live variables at the end? We might have to reload them in the next block. 56

Register Allocation with Graph Coloring Local register allocation - graph coloring problem Uses liveness information Allocate K registers where each register is associated with one of the K colors 57

Graph Coloring The coloring of a graph G = (V,E) is a mapping C: V S, where S is a finite set of colors, such that if edge vw is in E, C(v) <> C(w) Problem is NP (for more than 2 colors) no polynomial time solution Fortunately there are approximation algorithms 58

Coloring a Graph with K Colors K = 3 K = 4 No color for this node 59

Register Allocation and Graph K-Coloring K = number of available registers G = (V,E) where Vertex set V = {V s s is a program variable} Edge V s V t in E if s and t can be live at the same time G is an interference graph 60

Algorithm: K Registers 1. Compute liveness information for the basic block. Assume, that every live variable will be stored in a register. 2. Create interference graph G - one node for each variable, an edge connecting any two variables alive simultaneously 61

Example Interference Graph a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 {b,c} {a} {,a} {b,} {b,c} {b,c,t2} {a,b,c} a b c t2 62

Algorithm: K Registers 3. Simplify - For any node m with fewer than K neighbors, remove it from the graph and push it onto a stack. If G - m can be colored with K colors, so can G. If we reduce the entire graph, goto step 5. 4. Spill - If we get to the point where we are left with only nodes with degree >= K, mark some node for potential spilling. Remove and push onto stack. Back to step 3. 63

Choosing a Spill Node Potential criteria: Random Most neighbors Longest live range (in code) with or without taking the access pattern into consideration 64

Algorithm: K Registers 5. Assign colors - Starting with empty graph, rebuild graph by popping elements off the stack, putting them back into the graph and assigning them colors different from neighbors. Potential spill nodes may or may not be colorable. Process may require iterations and rewriting of some of the code to create more temporaries 65

Rewriting the Code Want to be able to remove some edges in the interference graph write variable to memory earlier compute/read in variable later Not all live variables will be stored in registers all the time. 66

Back to example a := b + c {b,c} := a * a {a} b := + a {,a} c := * b {b,} t2 := c + b {b,c} a := t2 + t2 {b,c,t2} {a,b,c} a b c t2 67

Example, k = 3 Assume k = 3 a c b t2 Remove Interference graph 68

Example Assume k = 3 a c b t2 a Remove a 69

Example Assume k = 3 a c a b Remove b b t2 70

Example Assume k = 3 a c c a b Remove c b t2 71

Example Assume k = 3 a c t2 c a b Remove t2 b t2 72

Rebuild the graph Assume k = 3 c a b t2 73

Example Assume k = 3 c a b t2 74

Example Assume k = 3 c b t2 a 75

Example Assume k = 3 a c b t2 76

Example Assume k = 3 a t0 a c b c t2 t2 b t2 t2 t0 77

Back to example a := b + c := a * a b := + a c := * b t2 := c + b a := t2 + t2 a b c t2 t0 t2 t2 t0 lw $,b lw $t2,c add $t0,$,$t2 mul $t2,$t0,$t0 add $,$t2,$t0 mul $t2,$t2,$ add $t0,$t2,$ add $t0,$t0,$t0 sw $t0,a sw $,b sw $t2,c 78

Generated code: Basic Generated Code: Coloring lw $t0,b lw $,c add $t0,$t0,$ mul $,$t0,$t0 add $t0,$,$t0 mul $,$,$t0 add $t2,$,$t0 add $t2,$t2,$t2 sw $t2,a sw $t0,b sw $,c lw $,b lw $t2,c add $t0,$,$t2 mul $t2,$t0,$t0 add $,$t2,$t0 mul $t2,$t2,$ add $t0,$t2,$ add $t0,$t0,$t0 sw $t0,a sw $,b sw $t2,c 79

Example, k = 2 Assume k = 2 a c b t2 b* Remove b as spill 80

Example Assume k = 2 a c b t2 b* Remove 81

Example Assume k = 2 a c a b* Remove a b t2 82

Example Assume k = 2 a c c a b* Remove c b t2 83

Example Assume k = 2 a c t2 c a b* Remove t2 b t2 84

Example Assume k = 2 a c Can flush b out to memory, creating a smaller window b??? t2 85

After Spilling b: a := b + c := a * a b := + a {b,c} {a} {,a} a c c := * b {b,} b to memory b t2 t2 := c + b {b,c} a := t2 + t2 {c,t2} {a,c} 86

After Spilling b: a c b t2 t2 87

After Spilling b: a c b t2 c* t2 Have to choose c as a potential spill node. 88

After Spilling b: a c b c* t2 b t2 89

After Spilling b: a c a b c* t2 b t2 90

After Spilling b: a c a b c* t2 b t2 91

Now Rebuild: a c a b c* t2 b t2 92

Now Rebuild: a c b c* t2 b t2 93

Now Rebuild: a c b t2 c* t2 94

Now Rebuild: a c b t2 t2 Fortunately, there is a color for c 95

Now Rebuild: a b c t2 t0 t0 t0 a c b t2 The graph is 2-colorable now 96

The Code a := b + c := a * a b := + a c := * b b to memory t2 := c + b a := t2 + t2 a b c t2 t0 t0 t0 lw $t0,b lw $,c add $t0,$t0,$ mul $,$t0,$t0 add $t0,$,$t0 mul $,$,$t0 sw $t0,b add $t0,$,$t0 add $t0,$t0,$t0 sw $t0,a sw $,c 97