Compiler Optimization and Code Generation

Similar documents
Lecture 6. Register Allocation. I. Introduction. II. Abstraction and the Problem III. Algorithm

Lecture 15 Register Allocation & Spilling

Register Allocation. Stanford University CS243 Winter 2006 Wei Li 1

CODE GENERATION Monday, May 31, 2010

CSC D70: Compiler Optimization Register Allocation

CSE 504: Compiler Design. Code Generation

Code Generation (#')! *+%,-,.)" !"#$%! &$' (#')! 20"3)% +"#3"0- /)$)"0%#"

Simple Machine Model. Lectures 14 & 15: Instruction Scheduling. Simple Execution Model. Simple Execution Model

Compiler Architecture

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

Register Allocation. CS 502 Lecture 14 11/25/08

Register Allocation. Lecture 38

Register Allocation. Lecture 16

Register Allocation (via graph coloring) Lecture 25. CS 536 Spring

CSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1

Register Allocation 1

Code Generation. CS 540 George Mason University

Lecture 25: Register Allocation

Register allocation. Overview

Global Register Allocation

CS5363 Final Review. cs5363 1

Control Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary

Control Instructions

Instruction Selection and Scheduling

General issues. Section 9.1. Compiler Construction: Code Generation p. 1/18

Compilers and Code Optimization EDOARDO FUSELLA

register allocation saves energy register allocation reduces memory accesses.

Code generation scheme for RCMA

Compiler Optimization Techniques

Fall Compiler Principles Lecture 12: Register Allocation. Roman Manevich Ben-Gurion University

Code Generation. M.B.Chandak Lecture notes on Language Processing

Wrap-Up. Lecture 10 Computer Architecture V. Performance Metrics. Overview. CPI for PIPE (Cont.) CPI for PIPE. Clock rate

Register Allocation via Hierarchical Graph Coloring

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

UNIT-V. Symbol Table & Run-Time Environments Symbol Table

CSc 553. Principles of Compilation. 23 : Register Allocation. Department of Computer Science University of Arizona

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

COMP 303 Computer Architecture Lecture 3. Comp 303 Computer Architecture

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

CA Compiler Construction

Lecture 21 CIS 341: COMPILERS

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Global Register Allocation

Lecture 19: Instruction Level Parallelism

More Code Generation and Optimization. Pat Morin COMP 3002

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

Variables vs. Registers/Memory. Simple Approach. Register Allocation. Interference Graph. Register Allocation Algorithm CS412/CS413

G Compiler Construction Lecture 12: Code Generation I. Mohamed Zahran (aka Z)

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

CS 406/534 Compiler Construction Putting It All Together

Wrap-Up. Lecture 10 Computer Architecture V. Performance Metrics. Overview. CPI for PIPE (Cont.) CPI for PIPE. Clock rate

Superscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency

SISTEMI EMBEDDED. Computer Organization Pipelining. Federico Baronti Last version:

Register Allocation & Liveness Analysis

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Compiler Theory. (Intermediate Code Generation Abstract S yntax + 3 Address Code)

Global Register Allocation - Part 2

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Liveness Analysis and Register Allocation. Xiao Jia May 3 rd, 2013

CS553 Lecture Profile-Guided Optimizations 3

Chapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes

Code Generation. The Main Idea of Today s Lecture. We can emit stack-machine-style code for expressions via recursion. Lecture Outline.

We can emit stack-machine-style code for expressions via recursion

CSc 453. Code Generation I Christian Collberg. Compiler Phases. Code Generation Issues. Slide Compilers and Systems Software.

Compilation /17a Lecture 10. Register Allocation Noam Rinetzky

Getting CPI under 1: Outline

CS 110 Computer Architecture Lecture 6: More MIPS, MIPS Functions

Do-While Example. In C++ In assembly language. do { z--; while (a == b); z = b; loop: addi $s2, $s2, -1 beq $s0, $s1, loop or $s2, $s1, $zero

Intermediate Code & Local Optimizations

Instruction Set Principles and Examples. Appendix B

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.

Global Register Allocation - Part 3

Register Allocation. Introduction Local Register Allocators

Liveness Analysis and Register Allocation

EECS 583 Class 15 Register Allocation

Instruction Selection: Peephole Matching. Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Low-level optimization

Architectures. Code Generation Issues III. Code Generation Issues II. Machine Architectures I

MIPS Programming. A basic rule is: try to be mechanical (that is, don't be "tricky") when you translate high-level code into assembler code.

Announcements. Class 7: Intro to SRC Simulator Procedure Calls HLL -> Assembly. Agenda. SRC Procedure Calls. SRC Memory Layout. High Level Program

Administration CS 412/413. Instruction ordering issues. Simplified architecture model. Examples. Impact of instruction ordering

CS 432 Fall Mike Lam, Professor. Code Generation

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Anne Bracy CS 3410 Computer Science Cornell University

Introduction. CSc 453. Compilers and Systems Software. 19 : Code Generation I. Department of Computer Science University of Arizona.

Register allocation. Register allocation: ffl have value in a register when used. ffl limited resources. ffl changes instruction choices

Intermediate Code & Local Optimizations. Lecture 20

Compilers and computer architecture: A realistic compiler to MIPS

Instruction Set Architecture. "Speaking with the computer"

Topic 12: Register Allocation

Course Administration

Agenda. CSE P 501 Compilers. Big Picture. Compiler Organization. Intermediate Representations. IR for Code Generation. CSE P 501 Au05 N-1

Branch Addressing. Jump Addressing. Target Addressing Example. The University of Adelaide, School of Computer Science 28 September 2015

Background: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

Other Forms of Intermediate Code. Local Optimizations. Lecture 34

Administrative. Other Forms of Intermediate Code. Local Optimizations. Lecture 34. Code Generation Summary. Why Intermediate Languages?

! What do we care about? n Fast program execution. n Efficient memory usage. n Avoid memory fragmentation. n Maintain data locality

Memory Usage 0x7fffffff. stack. dynamic data. static data 0x Code Reserved 0x x A software convention

Chapter 10 Memory Model for Program Execution. Problem

Transcription:

Compiler Optimization and Code Generation Professor: Sc.D., Professor Vazgen elikyan 1

Course Overview ntroduction: Overview of Optimizations 1 lecture ntermediate-code Generation 2 lectures achine-ndependent Optimizations 3 lectures Code Generation 2 lectures 2

Code Generation 3

achine Code Generation nput: intermediate code + symbol tables n this case, three-address code All variables have values that machines can directly manipulate Assume program is free of errors Output: Type checking has taken place, type conversion done Absolute/relocatable machine code or assembly code n this case, use assembly Architecture variations: SC, CSC, stack-based ssues: emory management, instruction selection and scheduling, register allocation and assignment 4

etargetable ack nd achine Description uild retargetable compilers solate machine dependent info Compilers on different machines share a common Can have common front and mid ends Table-based back ends share common algorithms Table-based instruction selector ack nd Generator Tables Pattern- atching engine nstruction Selector Create a description of target machine, use back-end generator 5

Translating from Three-Address Code No more support for structured control-flow unction calls => explicit memory management and goto jumps very three-address instruction is translated into one or more target machine instructions The original evaluation order is maintained emory management very variable must have a location to store its value egister, stack, heap, static storage emory allocation convention Scalar/atomic values and addresses => registers, stacks Arrays => heap Global variables => static storage 6

Assigning Storage Locations Compilers must choose storage locations for all values Procedure-local storage Local variables not preserved across procedural calls Procedure-static storage Local variables preserved across procedural calls Global storage - global variables un-time heap - dynamically allocated storage egisters - temporary storage for applying operations to values Unambiguous values can be assigned to registers with no backup storage 7

unction Call and eturn At each function call Allocate an new A on stack Save return address in new A Set parameter values and return results Go to caller s code Save SP and other regs; set AL if necessary At each function return estore SP and regs Go to return address in caller s A Pop caller s A off stack Different languages may implement this differently p1 sp eturn address parameters eturn result Control link Access link egister save area Local variables eturn address parameters eturn result Control link Access link egister save area Local variables 8

Translating unction Calls Use a register SP to store address of activation record on top of stack SP,AL and other registers saved/restored by caller Use C(s) address mode to access parameters and local variables /* code for s */ Action1 Param 5 Call q, 1 Action2 Halt /* code for q */ Action3 return LD stackstart =>SP /* initialize stack*/ 108: ACTON1 128: Add SP,ssize=>SP /*now call sequence*/ 136: ST 160 =>*SP /*push return addr*/ 144: ST 5 => 2(SP) /* push param1*/ 152: 300 /* call q */ 160: SU SP, ssize =>SP /*restore SP*/ 168: ACTON2 190: HALT /* code for q*/ 300: save SP,AL and other regs ACTON3 restore SP,AL and other regs 400: *0(SP) /* return to caller*/ 9

Translating Variable Assignment Keep track of locations for variables in symbol table The current value of a variable may reside in a register, a stack memory location, a static memory location, or a set of these Use symbol table to store locations of variables Allocation of variables to registers Assume infinite number of pseudo registers elocate pseudo registers afterwards Statements t := a - b u := t + c Generated code LD a => r0 LD b => r1 SU r0,r1=>r0 LD c => r2 ADD r0,r2=>r0 egister descriptor r0 contains t r1 contains b r0 contains u r1 contains b r2 contains c Address descriptor t in r0 b in r1 u in r0 b in r1 c in r2 10

Translating Arrays Arrays are allocated in heap Statement i in register ri i in memory i i in stack a := b[i] ult ri, elsize=>r1 LD b(r1)=>ra LD i => ri ult i,elsize=>r1 LD b(r1) =>ra LD i(sp) => ri ult ri,elsize=>r1 LD b(r1) =>ra a[i] := b ult ri, elsize=>r1 ST rb => a(r1) LD i => ri ult i,elsize=>r1 ST rb => a(r1) LD i(sp) => ri ult ri,elsize=>r1 ST rb => a(r1) 11

Translating Conditional Statements Condition determined after ADD or SU f x < y goto z SU rx, ry =>rt LTZ z := y + z if (x < 0) goto L ADD ry, rz => rx LTZ L 12

Peephole Optimization Use a simple scheme to match to machine code fficiently discover local improvements by examining short sequences of adjacent operations StoreA r1 => SP, 8 loada SP,8 => r15 StoreA r1 => SP, 8 r2r r1 => r15 add r2, 0 => r7 ult r4, r7 => r10 ult r4, r2 => r10 jump -> L10 L10: jump -> L11 jump -> L11 L10: jump -> L11 13

fficiency of Peephole Optimization Design issues Dead values ay intervene with valid simplification Need to be recognized expansion process Control flow operations Complicates simplifier: Clear window vs. special-case handling Physical vs. logical windows Adjacent operations may be irrelevant Sliding window includes ops that define or use common values SC vs. CSC architectures SC architectures makes instruction selection easier Additional issues Automatic tools to generate large pattern libraries for different architectures ront ends that generate LL make compilers more portable 14

egister Allocation Problem Allocation of variables (pseudo-registers) to hardware registers in a procedure eatures The most important optimization Directly reduces running time (memory access => register access) Useful for other optimizations Goals.g. CS assumes old values are kept in registers ind an allocation for all pseudo-registers, if possible. f there are not enough registers in the machine, choose registers to spill to memory 15

An Abstraction for Allocation and Assignment Two pseudo-registers interfere if at some point in the program they cannot both occupy the same register. nterference graph: an undirected graph, where Nodes = pseudo-registers There is an edge between two nodes if their corresponding pseudo-registers interfere hat is not represented xtent of the interference between uses of different variables here in the program is the interference 16

egister Allocation and Coloring A graph is n-colorable if: very node in the graph can be colored with one of the n colors such that two adjacent nodes do not have the same color. Assigning n register (without spilling) = Coloring with n colors Assign a node to a register (color) such that no two adjacent nodes are assigned same registers(colors) Spilling is necessary = The graph is n-colorable 17

Algorithm Step 1: uild an interference graph efining notion of a node inding the edges Step 2. Coloring Use heuristics to try to find an n-coloring Success: Colorable and we have an assignment ailure: Graph not colorable Graph is colorable, but it is too expensive to color 18

Live anges and erged Live anges otivation: to create an interference graph that is easier to color liminate interference in a variable s dead zones ncrease flexibility in allocation Can allocate same variable to different registers A live range consists of a definition and all the points in a program (e.g. end of an instruction) in which that definition is live. Two overlapping live ranges for the same variable must be merged a = a = = a 19

xample Live Variables eaching Definitions A =... (A1) A goto L1 {} {} {A} {A1} {A} {A1} {A} {A1} {A,} {A1,1} {} {A1,1} {D} {A1,1,D2} =... (1) = A D = (D2) L1: C = (C1) = A {A} {A1} {A,C} {A1,C1} {C} {A1,C1} {D} {A1,C1,D1} D = (D1) A = 2 (A2) {D} {A1,1,C1,D1,D2} {A,D} {A2,1,C1,D1,D2} {A,D} {A2,1,C1,D1,D2} {D} {A2,1,C1,D1,D2} = A et D erge 20

erging Live anges erging definitions into equivalence classes Start by putting each definition in a different equivalence class or each point in a program: f (i) variable is live, and (ii) there are multiple reaching definitions for the variable, then: erge the equivalence classes of all such definitions into one equivalence class rom now on, refer to merged live ranges simply as live ranges erged live ranges are also known as webs 21

dges of nterference Graph Two live ranges (necessarily of different variables) may interfere if they overlap at some point in the program. Algorithm At each point in the program enter an edge for every pair of live ranges at that point. An optimized definition & algorithm for edges: Algorithm: aster Check for interference only at the start of each live range etter quality 22

Coloring Coloring for n > 2 is NP-complete Observations: A node with degree < n can always color it successfully, given its neighbors colors Coloring Algorithm terate until stuck or done Pick any node with degree < n emove the node and its edges from the graph f done (no nodes left) reverse process and add colors xample ( n=3 ) A C D Note: degree of a node may drop in iteration 23

hen Coloring ails Using heuristics to improve its chance of success and to spill code uild interference graph terative until there are no nodes left f there exists a node v with less than n neighbor place v on stack to register allocate else v = node chosen by heuristics (least frequently executed, has many neighbors) place v on stack to register allocate (mark as spilled) remove v and its edges from graph hile stack is not empty emove v from stack einsert v and its edges into the graph Assign v a color that differs from all its neighbors (guaranteed to be possible for nodes not marked as spilled) 24

egister Allocation: Summary Problem: ind an assignment for all pseudo-registers, whenever possible. Solution: Abstract on Abstraction: an interference graph Nodes: live ranges dges: presence of live range at time of definition egister Allocation and Assignment problems quivalent to n-colorability of interference graph Heuristics to find an assignment for n colors Successful: colorable, and finds assignment Not successful: colorability unknown and no assignment 25

nstruction Scheduling: The Goal Assume that the remaining instructions are all essential: otherwise, earlier passes would have eliminated them The way to perform fixed amount of work in less time: execute the instructions in parallel Time a = 1 + x; a = 1 + x; = 2 + y; c = z + 3; = 2 + y; c = z + 3; 26

Hardware Support for Parallel xecution Three forms of parallelism are found in modern machines: Pipelining Superscalar Processing ultiprocessing nstruction Scheduling Automatic Parallelization 27

Pipelining asic idea: reak instruction into stages that can be overlapped xample: Simple 5-stage pipeline from early SC machines Time nstruction = nstruction etch = Decode & egister etch = xecute on ALU = emory Access = rite ack to egister ile 28

29 Pipelining llustration Time n a given cycle, each instruction is in a different stage

eyond Pipelining: Superscalar Processing asic idea: ultiple (independent) instructions can proceed simultaneously through the same pipeline stages equires additional hardware Pipe egister Pipe egister Pipe egister r1 + r2 Pipe egister Pipe egister r3 + r4 r1 + r2 Pipe egister Abstract epresentation 30 Hardware for Scalar pipeline 1 ALU Hardware for 2 way superscalar 2 ALUs

31 Superscalar Pipeline llustration Time Original (scalar) pipeline: Only one instruction in a given pipe stage at a given time Superscalar pipeline: ultiple instructions in the same pipe stage at the same time

Limitations upon Scheduling Hardware esources Processors have finite resources, and there are often constraints on how these resources can be used. xamples: inite issue width Limited functional units (Us) per given instruction type Limited pipelining within a given functional unit (U) Data Dependences hile reading or writing a data location too early, the program may behave incorrectly. Control Dependences mpractical to schedule for all possible paths Choosing an expected path may be difficult 32

Predictable Success 33