Agenda. CSE P 501 Compilers. Big Picture. Compiler Organization. Intermediate Representations. IR for Code Generation. CSE P 501 Au05 N-1


Agenda

CSE P 501 Compilers: Instruction Selection. Hal Perkins, Autumn 2005.

- Compiler back-end organization
- Low-level intermediate representations: trees and linear forms
- Instruction selection algorithms: tree pattern matching, peephole matching

Credits: much of this material is adapted from slides by Keith Cooper (Rice) and material in Appel's Modern Compiler Implementation in Java.

11/22/2005 2002-05 Hal Perkins & UW CSE N-1

Compiler Organization: The Big Picture

scan -> parse -> semantics | opt1 -> opt2 -> ... -> optn | instr. select -> instr. sched -> reg. alloc
     (front end)                  (middle)                           (back end)
Shared infrastructure: symbol tables, trees, graphs, etc.

A compiler consists of lots of fast stuff followed by hard problems:
- Scanner: O(n)
- Parser: O(n)
- Analysis & optimization: ~O(n log n)
- Instruction selection: fast or NP-complete
- Instruction scheduling: NP-complete
- Register allocation: NP-complete

Intermediate Representations

- Tree or linear?
- Closer to the source language or to the machine?
  - Source language: more context for high-level optimizations
  - Machine: exposes opportunities for low-level optimizations and is easier to map to actual code
- Common strategy:
  - The initial IR is an AST, close to the source
  - After some optimizations, transform to a lower-level IR, either tree or linear; use this to optimize further and generate code

IR for Code Generation

- Assume a low-level RISC-like IR: 3-address, register-register instructions plus load/store, e.g. r1 <- r2 op r3
- Could be tree-structured or linear
- Expose as much detail as possible
- Assume enough registers:
  - Invent new temporaries for intermediate results
  - Map to actual registers later
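As a sketch of what this level of IR looks like in practice, the following Python fragment (hypothetical, not from the slides) lowers a small expression tree into 3-address instructions, inventing a fresh temporary for each intermediate result:

```python
class Lowerer:
    """Hypothetical treewalk that emits 3-address, register-register code."""
    def __init__(self):
        self.count = 0
        self.code = []

    def new_temp(self):
        # invent a new temporary for each intermediate result
        self.count += 1
        return f"t{self.count}"

    def lower(self, expr):
        # expr is either a leaf (variable or constant) or (op, left, right)
        if not isinstance(expr, tuple):
            return str(expr)
        op, left, right = expr
        l, r = self.lower(left), self.lower(right)
        t = self.new_temp()
        self.code.append(f"{t} <- {l} {op} {r}")
        return t

low = Lowerer()
low.lower(('+', ('*', 'b', 'c'), 'd'))   # the expression b*c + d
```

After the walk, `low.code` holds the linear sequence `t1 <- b * c`, `t2 <- t1 + d`; mapping t1 and t2 to actual registers is deferred, exactly as the slide prescribes.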

Overview: Instruction Selection

- Map the IR into assembly code
- Assume known storage layout and code shape, i.e., the optimization phases have already done their thing
- Combine low-level IR operations into machine instructions (addressing modes, etc.)

Overview: Instruction Scheduling

- Reorder operations to hide latencies (processor function units; memory/cache)
- Originally invented for supercomputers (1960s); now important for consumer machines, even non-RISC machines such as the x86
- Assume a fixed program

Overview: Register Allocation

- Map values to actual registers
- Previous phases change the need for registers
- Add code to spill values to temporaries as needed, etc.

How Hard?

- Instruction selection: can make locally optimal choices; the global problem is undoubtedly NP-complete
- Instruction scheduling: quick heuristics for a single basic block; the general problem is NP-complete
- Register allocation: a single basic block with no spilling and interchangeable registers is linear; the general problem is NP-complete

Conventional Wisdom

We probably lose little by solving these independently:
- Instruction selection: use some form of pattern matching; assume enough registers
- Instruction scheduling: within a block, list scheduling is close to optimal; across blocks, build a framework to apply list scheduling
- Register allocation: start with virtual registers and map enough to the machine's K registers; for targeting, use a good priority heuristic

A Simple Low-Level IR (1)

Source: Appel, Modern Compiler Implementation. The details are not important for our purposes; the point is to get a feeling for the level of detail involved.

Expressions:
- CONST(i): integer constant i
- TEMP(t): temporary t (i.e., register)
- BINOP(op, e1, e2): application of op to e1, e2
- MEM(e): contents of memory at address e (means the value when used in an expression; means the address when used on the left side of an assignment)
- CALL(f, args): application of function f to argument list args
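The expression forms above can be rendered directly as data types. This is a hypothetical Python sketch (the names mirror the slides' notation, not Appel's actual Java classes):

```python
from dataclasses import dataclass

@dataclass
class Const:          # CONST(i): integer constant i
    value: int

@dataclass
class Temp:           # TEMP(t): temporary t, i.e., a virtual register
    name: str

@dataclass
class Binop:          # BINOP(op, e1, e2): apply op to e1 and e2
    op: str
    left: object
    right: object

@dataclass
class Mem:            # MEM(e): contents of memory at address e
    addr: object

# for example, a local variable at offset 8 from the frame pointer fp:
local = Mem(Binop("PLUS", Temp("fp"), Const(8)))
```

Building IR as plain nested objects like this is what makes tree-based instruction selection natural: patterns are just small trees of the same shapes.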

Simple Low-Level IR (2)

Statements:
- MOVE(TEMP t, e): evaluate e and store the result in temporary t
- MOVE(MEM(e1), e2): evaluate e1 to yield an address a; evaluate e2 and store the result at a
- EXP(e): evaluate expression e and discard the result
- SEQ(s1, s2): execute s1 followed by s2
- NAME(n): assembly language label n
- JUMP(e): jump to e, which can be a NAME label or something more complex (e.g., a switch)
- CJUMP(op, e1, e2, t, f): evaluate e1 op e2; if true, jump to label t, otherwise jump to f
- LABEL(n): defines the location of label n in the code

Low-Level IR Example (1)

For a local variable at a known offset k from the frame pointer fp:
- Linear: MEM(BINOP(PLUS, TEMP fp, CONST k))
- Tree: a MEM node over a + node whose children are TEMP fp and CONST k

Low-Level IR Example (2)

For an array element e(k), where each element takes up w storage locations: a MEM node over the sum of the array's base address MEM(e) and the scaled index k * w, i.e., MEM(BINOP(PLUS, MEM(e), BINOP(MUL, k, CONST w))).

Generating Low-Level IR

- Assuming the initial IR is an AST, a simple treewalk can be used to generate the low-level IR; this can be done before, during, or after optimizations in the middle part of the compiler
- Create registers (temporaries) for values and intermediate results
  - A value can be safely allocated in a register when only one name can reference it (trouble: pointers, arrays, reference parameters)
  - Assign a virtual register to anything that can go into one; generate loads/stores for other values

Instruction Selection Issues

- Given the low-level IR, there are many possible code sequences that implement it correctly; e.g., to set eax to 0 on x86:
    mov eax,0
    xor eax,eax
    sub eax,eax
    imul eax,0
- Many machine instructions do several things at once, e.g., register arithmetic and effective address calculation

Instruction Selection Criteria

Several possibilities: fastest, smallest, minimize power consumption. Sometimes the choice is not obvious: e.g., if one of the function units in the processor is idle and we can select an instruction that uses that unit, it effectively executes for free, even if that instruction wouldn't be chosen normally. (There is some interaction with scheduling here.)
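One way to act on such criteria is to attach a cost to each correct sequence and let the selector take the minimum. The sketch below is illustrative only: the byte sizes are made-up stand-ins for a real cost model, not actual x86 encoding lengths.

```python
# Candidate sequences that all set eax to 0, with assumed (hypothetical)
# costs; a real selector would use measured size, latency, or energy.
candidates = {
    "xor eax,eax": 2,
    "sub eax,eax": 2,
    "mov eax,0":   5,
    "imul eax,0":  3,
}

# pick the cheapest; min() returns the first minimum in insertion order
best = min(candidates, key=candidates.get)
```

Changing the optimization goal (speed vs. size vs. power) only changes the numbers in the table, not the selection machinery, which is part of why cost-driven pattern matching is attractive.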

Implementation

- Problem: we need some representation of the target machine instruction set that facilitates code generation
- Idea: describe the machine instructions in the same low-level IR used for the program, then use pattern-matching techniques to pick machine instructions that match fragments of the program's IR tree
- We want this to run quickly, and we would like to automate as much of it as possible

Matching: How?

- Tree IR: pattern match on trees
  - Tree patterns as input; each pattern maps to a target machine instruction (or sequence)
  - Use dynamic programming or a bottom-up rewrite system (BURS)
- Linear IR: some sort of string matching
  - Strings as input; each string maps to a target machine instruction sequence
  - Use text matching or peephole matching
- Both work well in practice; the actual algorithms are quite different

An Example Target Machine (1-4)

Also from Appel. Each instruction is given as a tree pattern; the original slides draw the pattern trees, so only the instruction forms are reproduced here:
- Arithmetic: ADD ri <- rj + rk; MUL ri <- rj * rk; SUB and DIV are similar. (A bare TEMP node is covered by an unnamed register-to-register pattern.)
- Immediate instructions: ADDI ri <- rj + c; SUBI ri <- rj - c
- Load: LOAD ri <- M[rj + c], matching several MEM shapes: MEM of a sum with the constant on either side, MEM of a constant, and MEM of a register
- Store: STORE M[rj + c] <- ri, matching the corresponding MOVE(MEM(...), e) shapes

Tree Pattern Matching (1)

- Goal: tile the low-level tree with operation trees
- A tiling is a collection of <node, op> pairs, where node is a node in the tree and op is an operation tree; <node, op> means that op could implement the subtree at node

Tree Pattern Matching (2)

- A tiling implements a tree if it covers every node in the tree and the overlap between any two tiles (trees) is limited to a single node
- If <node, op> is in the tiling, then node is also covered by a leaf in another operation tree in the tiling, unless it is the root
- Where two operation trees meet, they must be compatible (i.e., expect the same value in the same location)

Generating Code

Given a tiled tree, to generate code:
- Do a postorder treewalk, with a node-dependent order for the children
- Emit the code sequences corresponding to the tiles, in order
- Connect tiles by using the same register name to tie boundaries together

Tiling Algorithm

- There may be many tiles that could match at a particular node
- Idea: walk the tree and accumulate the set of all possible tiles that could match at each point, Tiles(n)
- Later: keep only the lowest-cost match at each point; this generates local optimality, i.e., the lowest-cost match at each point

Tile(Node n):
  Tiles(n) <- empty
  if n has two children then
    Tile(left child of n); Tile(right child of n)
    for each rule r that implements n
      if left(r) is in Tiles(left(n)) and right(r) is in Tiles(right(n)) then
        Tiles(n) <- Tiles(n) + r
  else if n has one child then
    Tile(child of n)
    for each rule r that implements n
      if left(r) is in Tiles(child(n)) then
        Tiles(n) <- Tiles(n) + r
  else  /* n is a leaf */
    Tiles(n) <- { all rules that implement n }

Peephole Matching

- A code generation/improvement strategy for linear representations
- Basic idea: look at small sequences of adjacent operations; the compiler moves a sliding window (the "peephole") over the code and looks for improvements
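The Tile(n) routine above can be turned into a short runnable sketch. The rules here are illustrative stand-ins for a machine description, and the slides' left(r)/right(r) compatibility check is reduced to an arity check to keep the example small:

```python
# Each rule: (instruction name, IR operator it implements, expected arity).
RULES = [
    ("ADD",   "+",     2),   # ADD  ri <- rj + rk
    ("ADDI",  "+",     2),   # ADDI ri <- rj + c
    ("LOAD",  "MEM",   1),   # LOAD ri <- M[rj + c]
    ("TEMP",  "temp",  0),   # a bare register
    ("CONST", "const", 0),   # an integer constant
]

def tile(node, tiles):
    """Bottom-up walk: tile the subtrees first, then record every rule
    that could implement this node, as in the slide's Tile(n)."""
    op, children = node[0], node[1:]
    for child in children:
        tile(child, tiles)
    tiles[id(node)] = [name for name, rop, arity in RULES
                       if rop == op and arity == len(children)]
    return tiles

# MEM(+(temp, const)), e.g., loading a local variable
tree = ("MEM", ("+", ("temp",), ("const",)))
tiles = tile(tree, {})
```

Note that the inner + node accumulates two candidate rules (ADD and ADDI); a cost-driven pass would then keep only the cheaper one at each node, giving the local optimality the slide describes.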

Peephole Optimizations (1)

Classic example: a store followed by a load, or a push followed by a pop:

  original              improved
  mov [ebp-8],eax       mov [ebp-8],eax
  mov eax,[ebp-8]

  push eax              (deleted)
  pop eax

Peephole Optimizations (2)

Simple algebraic identities:

  original       improved
  add eax,0      (deleted)
  add eax,1      inc eax
  mul eax,2      add eax,eax
  mul eax,4      shl eax,2

Peephole Optimizations (3)

Jump to a jump:

  original              improved
  jmp here              jmp there
  here: jmp there       here: jmp there

Implementing Peephole Matching

- Early versions: a limited set of hand-coded patterns; a modest window size to ensure speed
- Modern: break the problem into an expander, a simplifier, and a matcher; apply symbolic interpretation and simplification systematically

Expander

- Turns the IR code into a very low-level IR (LLIR) via template-driven rewriting
- The LLIR includes all direct effects of instructions, e.g., setting condition codes
- This is a big, although constant-size, expansion

Simplifier

- Looks at the LLIR through the window and rewrites it using forward substitution, algebraic simplification, local constant propagation, and dead code elimination
- This is the heart of the processing
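A minimal runnable sketch of the sliding-window idea, covering two of the patterns above (a load of a value just stored, and add of zero). The instruction syntax is the illustrative x86 used in the slides, and the string parsing is deliberately naive:

```python
def peephole(code):
    """One pass of a two-instruction peephole window over linear code."""
    out = []
    for insn in code:
        # algebraic identity: add r,0 is a no-op
        if insn.startswith("add ") and insn.endswith(",0"):
            continue
        # store to memory immediately followed by a load of the same value
        if out and out[-1].startswith("mov ["):
            mem, reg = out[-1][4:].split(",")
            if insn == f"mov {reg},{mem}":
                continue
        out.append(insn)
    return out

code = ["mov [ebp-8],eax", "mov eax,[ebp-8]", "add ebx,0", "add eax,1"]
improved = peephole(code)
```

A production simplifier would of course work on the LLIR with full effect information (condition codes and so on) rather than on instruction text, as the expander/simplifier/matcher slides describe.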

Matcher

- Compares the simplified LLIR against a library of patterns, picking the low-cost pattern that captures the effects
- Must preserve LLIR effects; it can add new ones (condition codes, etc.)
- Generates the assembly code output

Peephole Optimization Considered

- The LLIR is largely machine independent (RTL)
- The target machine description is a set of LLIR -> ASM patterns
- Pattern matching: use a hand-coded matcher (classical gcc), or turn the patterns into a grammar and use an LR parser
- Used in several important compilers; seems to produce good, portable instruction selectors

Coming Attractions

- Instruction scheduling
- Register allocation
- Survey of optimization
- Survey of new technologies: memory management & garbage collection; virtual machines, portability, and security