CSE P 501 Compilers: Intermediate Representations
Hal Perkins, Spring 2018

Administrivia
- Semantics/types/symbol table project due in ~2 weeks - how goes it?
- Should be caught up on grading and parser sanity checks late this week
- End-of-quarter schedule:
  - Exam: Thur. 5/24, 6:30-8:00 (both locations)
  - Compiler project final commit/push: Sun. 6/3, 11 pm (codegen)
  - Compiler short report push: Mon. 6/4, 11 pm

Agenda
A survey of intermediate representations:
- Graphical
  - Concrete/Abstract Syntax Trees (ASTs)
  - Control Flow Graph
  - Dependence Graph
- Linear Representations
  - Stack Based
  - 3-Address
Several of these will show up again as we explore program analysis and optimization.

Compiler Structure (review)
[Pipeline diagram:] Source characters -> Scanner -> tokens -> Parser -> IR -> Semantic Analysis -> IR (maybe different) -> Middle (optimization) -> IR (often different) -> Code Gen -> Assembly or binary code (Target)

Intermediate Representations
- In most compilers, the parser builds an intermediate representation of the program - typically an AST, as in the MiniJava project
- The rest of the compiler transforms the IR to improve ("optimize") it and eventually translates it to final target code
  - Typically the initial IR is transformed into one or more different IRs along the way
- Some general examples now; more specifics later as needed

IR Design
- Decisions affect the speed and efficiency of the rest of the compiler
- General rule: compile time is important, but the performance/quality of the generated code is often more important
  - Typical case for production code: compile a few times, run many times (although the reverse is true during development)
- So make choices that improve compile time as long as they don't compromise the result

IR Design
Desirable properties:
- Easy to generate
- Easy to manipulate
- Expressive
- Appropriate level of abstraction
Different tradeoffs depending on compiler goals; different tradeoffs in different parts of the same compiler - so often different IRs are used in different parts.

IR Design Taxonomy
- Structure
  - Graphical (trees, graphs, etc.)
  - Linear (code for some abstract machine)
  - Hybrids are common (e.g., control-flow graphs whose nodes are basic blocks of linear code)
- Abstraction Level
  - High-level, near to source language
  - Low-level, closer to machine (exposes more details to the compiler)

Examples: Array Reference
Source: A[i,j]

Tree (high-level): [diagram: a subscript node with children A, i, j]

Three-address code (mid-level):
  t1 <- A[i,j]

ILOC-style code (low-level):
  loadi 1     => r1
  sub  rj,r1  => r2
  loadi 10    => r3
  mult r2,r3  => r4
  sub  ri,r1  => r5
  add  r4,r5  => r6
  loadi @A    => r7
  add  r7,r6  => r8
  load r8     => r9

(The low-level code computes the element offset (j-1)*10 + (i-1), adds it to the base address @A, and loads the element - exposing the subscript arithmetic that the high-level forms hide.)

Levels of Abstraction
- Key design decision: how much detail to expose
  - Affects the possibility and profitability of various optimizations
  - Depends on compiler phase: some semantic analysis and optimizations are easier with high-level IRs close to the source code; low-level is usually preferred for other optimizations, register allocation, code generation, etc.
- Structural (graphical) IRs are typically fairly high-level, but are also used for low-level
- Linear IRs are typically low-level
- But these generalizations don't always hold

Graphical IRs
- IR represented as a graph (or tree)
- Nodes and edges typically reflect some structure of the program, e.g., source code, control flow, data dependence
- May be large (especially syntax trees)
- High-level examples: syntax trees, DAGs - generally used in early phases of compilers
- Other examples: control flow graphs and data dependency graphs - often used in optimization and code generation

Concrete Syntax Trees
- The full grammar is needed to guide the parser, but contains many extraneous details:
  - Chain productions
  - Rules that control precedence and associativity
- Typically the full syntax tree (parse tree) does not need to be used explicitly, but sometimes we want it (structured source code editors or transformations, ...)

Example
Grammar:
  assign ::= id = expr ;
  expr   ::= expr + term | expr - term | term
  term   ::= term * factor | term / factor | factor
  factor ::= int | id | ( expr )
[Diagram: concrete syntax tree (parse tree) for x = 2*(n+m)]

Abstract Syntax Trees
- Want only essential structural information; omit extra junk
- Can be represented explicitly as a tree or in a linear form
  - Example: LISP/Scheme S-expressions are essentially ASTs
- Common output from the parser; used for static semantics (type checking, etc.) and sometimes high-level optimizations

Example
Grammar:
  assign ::= id = expr ;
  expr   ::= expr + term | expr - term | term
  term   ::= term * factor | term / factor | factor
  factor ::= int | id | ( expr )
[Diagram: abstract syntax tree for x = 2*(n+m) - an assignment node with children x and *, where * has children 2 and +, and + has children n and m]
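As a concrete illustration, here is a minimal Java sketch of AST node classes sufficient to build that tree. These are hypothetical names for illustration, not the MiniJava project's actual classes:

  // Minimal AST sketch (hypothetical classes, not the MiniJava project's API)
  abstract class Exp { }

  class IntLit extends Exp {            // integer literal, e.g., 2
      final int value;
      IntLit(int value) { this.value = value; }
  }

  class Id extends Exp {                // identifier, e.g., x, n, m
      final String name;
      Id(String name) { this.name = name; }
  }

  class BinOp extends Exp {             // binary operator node: +, -, *, /
      final char op;
      final Exp left, right;
      BinOp(char op, Exp left, Exp right) {
          this.op = op; this.left = left; this.right = right;
      }
  }

  class Assign {                        // assignment statement: id = expr ;
      final Id target;
      final Exp value;
      Assign(Id target, Exp value) { this.target = target; this.value = value; }
  }

  class AstDemo {
      public static void main(String[] args) {
          // AST for: x = 2 * (n + m)
          // Note: no parentheses node - grouping is implicit in the tree shape
          Assign tree = new Assign(
              new Id("x"),
              new BinOp('*', new IntLit(2),
                             new BinOp('+', new Id("n"), new Id("m"))));
      }
  }

Notice how the chain productions and the parenthesization rule from the grammar leave no trace in the tree - that is the "extra junk" the AST omits.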

DAGs (Directed Acyclic Graphs)
- Variation on ASTs with shared substructures
- Pro: saves space, exposes redundant subexpressions
- Con: less flexibility if part of the structure needs to be changed
[Diagram: expression DAG with nodes +, *, *, a, 2, b, where a shared subexpression appears once with multiple parents]
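One common way to get sharing is hash-consing: before creating a node, look it up in a table keyed by (operator, children) and reuse an existing node if one matches. A minimal sketch with hypothetical names, building on the AST classes above (not how any particular compiler does it):

  import java.util.HashMap;
  import java.util.Map;

  // Hash-consing sketch: reuse structurally identical nodes so repeated
  // subexpressions become shared DAG nodes instead of duplicate subtrees.
  class DagBuilder {
      private final Map<String, Exp> table = new HashMap<>();

      Exp binOp(char op, Exp left, Exp right) {
          // Key on the operator plus the identity of the (already shared)
          // children. identityHashCode is not guaranteed collision-free;
          // fine for a sketch, a real implementation would use proper keys.
          String key = op + ":" + System.identityHashCode(left)
                          + ":" + System.identityHashCode(right);
          return table.computeIfAbsent(key, k -> new BinOp(op, left, right));
      }

      Exp id(String name) {
          return table.computeIfAbsent("id:" + name, k -> new Id(name));
      }

      Exp intLit(int value) {
          return table.computeIfAbsent("int:" + value, k -> new IntLit(value));
      }
  }

With this builder, constructing a*2 twice yields a single shared node - exactly what exposes the redundant subexpression to later phases.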

Linear IRs
- Pseudo-code for some abstract machine
- Level of abstraction varies
- Simple, compact data structures; commonly used: arrays, linked lists
- Examples: 3-address code, stack machine code

Three-address code (for a - 2*b):
  t1 <- 2
  t2 <- b
  t3 <- t1 * t2
  t4 <- a
  t5 <- t4 - t3
Fairly compact; the compiler can control the reuse of names - a clever choice can reveal optimizations. (ILOC and similar code.)

Stack machine code (same expression):
  push 2
  push b
  multiply
  push a
  subtract
Each instruction consumes the top of the stack and pushes its result. Very compact; easy to create and interpret. (Java bytecode, MSIL.)

Abstraction Levels in Linear IR
Linear IRs can also be close to the source language, very low-level, or somewhere in between. Examples: linear IRs for the C array reference a[i][j+2].
High-level:
  t1 <- a[i, j+2]

More IRs for a[i][j+2]
Medium-level:
  t1 <- j + 2
  t2 <- i * 20
  t3 <- t1 + t2
  t4 <- 4 * t3
  t5 <- addr a
  t6 <- t5 + t4
  t7 <- *t6
Low-level:
  r1 <- [fp-4]
  r2 <- r1 + 2
  r3 <- [fp-8]
  r4 <- r3 * 20
  r5 <- r4 + r2
  r6 <- 4 * r5
  r7 <- fp - 216
  f1 <- [r7+r6]
(Both compute the address a + 4*(20*i + (j+2)) - i.e., 20-element rows of 4-byte elements. The low-level version additionally exposes that i, j, and a live at known offsets in the stack frame.)

Abstraction Level Tradeoffs
- High-level: good for some source-level optimizations and semantic checking, but can't optimize things that are hidden, like the address arithmetic for array subscripting
- Low-level: needed for good code generation and resource utilization in the back end, but loses semantic knowledge (e.g., variables, data aggregates, and source relationships are usually missing)
- Medium-level: more detail, but keeps more higher-level semantic information - great for machine-independent optimizations. Many (all?) optimizing compilers work at this level
- Many compilers use all 3 in different phases

Three-Address Code (TAC)
- Usual form: x <- y op z
  - One operator
  - Maximum of 3 names
  - (Copes with nullary x <- y and unary x <- op y)
- E.g., x = 2 * (m + n) becomes:
  t1 <- m + n
  t2 <- 2 * t1
  x  <- t2
  (You may prefer: add t1, m, n; mul t2, 2, t1; mov x, t2)
- Invent as many new temp names as needed; these "expression temps" don't correspond to any user variables, and de-anonymize expressions
- Store each instruction in a quad(ruple) <lhs, rhs1, op, rhs2>
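A minimal sketch of how quads might be stored, with hypothetical names (the field order mirrors the <lhs, rhs1, op, rhs2> layout above; a real compiler would use richer operand types than strings):

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical quadruple representation of three-address code.
  class Quad {
      final String lhs, rhs1, rhs2;
      final String op;                   // "+", "*", "copy", ...
      Quad(String lhs, String rhs1, String op, String rhs2) {
          this.lhs = lhs; this.rhs1 = rhs1; this.op = op; this.rhs2 = rhs2;
      }
      public String toString() {
          return "copy".equals(op) ? lhs + " <- " + rhs1
                                   : lhs + " <- " + rhs1 + " " + op + " " + rhs2;
      }
  }

  class TacDemo {
      public static void main(String[] args) {
          List<Quad> code = new ArrayList<>();
          // x = 2 * (m + n)
          code.add(new Quad("t1", "m", "+", "n"));
          code.add(new Quad("t2", "2", "*", "t1"));
          code.add(new Quad("x", "t2", "copy", null));
          code.forEach(System.out::println);
      }
  }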

Three Address Code Advantages
- Resembles code for actual machines
- Explicitly names intermediate results
- Compact
- Often easy to rearrange
- Various representations: quadruples, triples, SSA (Static Single Assignment)
- We will see much more of this

Stack Machine Code Example
Hypothetical code for x = 2 * (m + n):
  pushaddr x
  pushconst 2
  pushval n
  pushval m
  add
  mult
  store
[Diagram: stack contents after each instruction - @x; @x 2; @x 2 n; @x 2 n m; add leaves @x 2 m+n; mult leaves @x 2*(m+n); store pops the value and the address @x]
Compact: common opcodes are just 1 byte wide; instructions have 0 or 1 operand.

Stack Machine Code
- Originally used for stack-based computers (famous example: B5000, ~1961)
- Often used for virtual machines:
  - Pascal P-code
  - Forth
  - Java bytecode in .class files (generated by the Java compiler)
  - MSIL in a .dll or .exe assembly (generated by the C#/F#/VB compiler)
- Advantages
  - Compact; mostly 0-address opcodes (fast download over a network)
  - Easy to generate; easy to write a front-end compiler, leaving the "heavy lifting" and optimizations to the JIT
  - Simple to interpret or compile to machine code
- Disadvantages
  - Somewhat inconvenient/difficult to optimize directly
  - Does not match up with modern chip architectures
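To make "simple to interpret" concrete, here is a minimal sketch of an interpreter loop for the toy opcodes from the previous slide. The encoding and opcode names are hypothetical - this is not Java bytecode or any real VM:

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.HashMap;
  import java.util.Map;

  // Minimal stack-machine interpreter sketch (hypothetical opcodes).
  class StackVm {
      private final Deque<Object> stack = new ArrayDeque<>();
      private final Map<String, Integer> memory = new HashMap<>();

      void run(String[][] program) {
          for (String[] insn : program) {
              switch (insn[0]) {
                  case "pushconst": stack.push(Integer.parseInt(insn[1])); break;
                  case "pushval":   stack.push(memory.getOrDefault(insn[1], 0)); break;
                  case "pushaddr":  stack.push(insn[1]); break;  // "address" = variable name here
                  case "add":  stack.push((Integer) stack.pop() + (Integer) stack.pop()); break;
                  case "mult": stack.push((Integer) stack.pop() * (Integer) stack.pop()); break;
                  case "store": {  // pop value, pop address, write to memory
                      int value = (Integer) stack.pop();
                      memory.put((String) stack.pop(), value);
                      break;
                  }
              }
          }
      }

      public static void main(String[] args) {
          StackVm vm = new StackVm();
          vm.memory.put("m", 3);
          vm.memory.put("n", 4);
          vm.run(new String[][] {
              {"pushaddr", "x"}, {"pushconst", "2"}, {"pushval", "n"},
              {"pushval", "m"}, {"add"}, {"mult"}, {"store"},
          });
          System.out.println(vm.memory.get("x"));  // prints 14 = 2 * (3 + 4)
      }
  }

The dispatch loop is the whole machine - which is why stack code is so easy to interpret, and why the harder work (optimization) is left to a JIT.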

Hybrid IRs
- Combination of structural and linear
- Level of abstraction varies
- Most common example: control-flow graph (CFG)

Control Flow Graph (CFG)
- Nodes: basic blocks
- Edges: represent possible flow of control from one block to another, i.e., possible execution orderings
  - Edge from A to B if B could execute immediately after A in some possible execution
- Required for much of the analysis done during the optimization phases
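A minimal sketch of a CFG data structure, with hypothetical names - blocks identified by id, with their straight-line code and successor edge lists:

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Hypothetical CFG skeleton: blocks hold straight-line code (here, just
  // strings), and edges record which blocks may execute immediately after which.
  class Cfg {
      final Map<String, List<String>> blockCode = new HashMap<>();   // id -> instructions
      final Map<String, List<String>> successors = new HashMap<>();  // id -> successor ids

      void addBlock(String id, List<String> code) {
          blockCode.put(id, code);
          successors.put(id, new ArrayList<>());
      }

      // Edge A -> B: B could execute immediately after A
      void addEdge(String from, String to) {
          successors.get(from).add(to);
      }
  }

This is the hybrid in action: a graph (the edges) whose nodes each contain a small linear IR.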

Basic Blocks
- Fundamental concept in analysis/optimization
- A basic block is:
  - A sequence of code
  - One entry, one exit
  - Always executes as a single unit ("straight-line code"), so it can be treated as an indivisible unit
  - We'll ignore exceptions, at least for now
- Usually represented as some sort of a list, although trees/DAGs are possible

CFG Example
Source:
  print("hello");
  a = 7;
  if (x == y) {
    print("same");
    b = 9;
  } else {
    b = 10;
  }
  while (a < b) {
    a++;
    print("bump");
  }
  print("finis");
[Diagram: CFG - a block for print("hello"); a = 7; if (x == y) branches to either print("same"); b = 9; or b = 10; both join at the while (a < b) test, whose body is a++; print("bump"); looping back to the test, and whose exit leads to print("finis");]

Basic Blocks: Start with Tuples
   1  i = 1
   2  j = 1
   3  t1 = 10 * i
   4  t2 = t1 + j
   5  t3 = 8 * t2
   6  t4 = t3 - 88
   7  a[t4] = 0
   8  j = j + 1
   9  if j <= 10 goto #3
  10  i = i + 1
  11  if i <= 10 goto #2
  12  i = 1
  13  t5 = i - 1
  14  t6 = 88 * t5
  15  a[t6] = 1
  16  i = i + 1
  17  if i <= 10 goto #13
Typical "tuple stew" - IR generated by traversing an AST.
Partition into basic blocks:
- Sequence of consecutive instructions
- No jumps into the middle of a BB
- No jumps out of the middle of a BB
- "I've started, so I'll finish"
- (Ignore exceptions)

Basic Blocks: Leaders
   1  i = 1
   2  j = 1
   3  t1 = 10 * i
   4  t2 = t1 + j
   5  t3 = 8 * t2
   6  t4 = t3 - 88
   7  a[t4] = 0
   8  j = j + 1
   9  if j <= 10 goto #3
  10  i = i + 1
  11  if i <= 10 goto #2
  12  i = 1
  13  t5 = i - 1
  14  t6 = 88 * t5
  15  a[t6] = 1
  16  i = i + 1
  17  if i <= 10 goto #13
Identify leaders (the first instruction in each basic block):
- The first instruction is a leader
- Any target of a branch/jump/goto is a leader
- Any instruction immediately after a branch/jump/goto is a leader
Leaders here (shown in red on the slide): instructions 1, 2, 3, 10, 12, and 13. Why is each leader a leader?

Basic Blocks: Flowgraph
  ENTRY
  B1: i = 1
  B2: j = 1
  B3: t1 = 10 * i
      t2 = t1 + j
      t3 = 8 * t2
      t4 = t3 - 88
      a[t4] = 0
      j = j + 1
      if j <= 10 goto B3
  B4: i = i + 1
      if i <= 10 goto B2
  B5: i = 1
  B6: t5 = i - 1
      t6 = 88 * t5
      a[t6] = 1
      i = i + 1
      if i <= 10 goto B6
  EXIT
Edges: ENTRY->B1, B1->B2, B2->B3, B3->B3 and B3->B4, B4->B2 and B4->B5, B5->B6, B6->B6 and B6->EXIT.
This is a control flow graph ("CFG", again!). There are 3 loops total; 2 of the loops are nested. Most of the execution time is likely spent in the loop bodies - that's where to focus optimization efforts.

Identifying Basic Blocks: Recap
- Perform a linear scan of the instruction stream
- A basic block begins at each instruction that:
  - Is the beginning of a method
  - Is the target of a branch
  - Immediately follows a branch or return
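A minimal sketch of that scan, with a hypothetical stripped-down instruction type; it marks leaders and then partitions the instruction list into blocks (assumes a non-empty method):

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical minimal instruction: only what leader-finding needs.
  class Insn {
      final boolean isBranch;   // branch/jump/goto/return
      final int target;         // index of the branch target, or -1 if none
      Insn(boolean isBranch, int target) { this.isBranch = isBranch; this.target = target; }
  }

  class BlockBuilder {
      // Partition instructions into basic blocks via the leader rules.
      static List<List<Insn>> partition(List<Insn> code) {
          boolean[] leader = new boolean[code.size()];
          leader[0] = true;                                // rule 1: first instruction
          for (int i = 0; i < code.size(); i++) {
              Insn insn = code.get(i);
              if (insn.isBranch) {
                  if (insn.target >= 0) leader[insn.target] = true;  // rule 2: branch target
                  if (i + 1 < code.size()) leader[i + 1] = true;     // rule 3: after a branch
              }
          }
          List<List<Insn>> blocks = new ArrayList<>();
          List<Insn> current = null;
          for (int i = 0; i < code.size(); i++) {
              if (leader[i]) { current = new ArrayList<>(); blocks.add(current); }
              current.add(code.get(i));
          }
          return blocks;
      }
  }

Run on the 17-tuple example above, this marks instructions 1, 2, 3, 10, 12, and 13 as leaders and produces blocks B1-B6.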

Dependency Graphs
- Often used in conjunction with another IR
- Data dependency: edges between nodes that reference common data
- Examples:
  - Block A defines x, then B reads it (RAW - read after write)
  - Block A reads x, then B writes it (WAR - antidependence)
  - Blocks A and B both write x (WAW) - the order of the blocks must reflect the original program semantics
- These restrict the reorderings the compiler can do
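A minimal sketch of classifying the dependence between two instructions, reusing the hypothetical Quad class from the TAC slide. This checks a single name pair between one earlier and one later quad; real dependence analysis (arrays, aliasing, etc.) is far more involved:

  // Hypothetical dependence check between an earlier quad p and a later quad q.
  class Dependence {
      static String classify(Quad p, Quad q) {
          boolean qReadsPWrite = p.lhs != null && (p.lhs.equals(q.rhs1) || p.lhs.equals(q.rhs2));
          boolean qWritesPRead = q.lhs != null && (q.lhs.equals(p.rhs1) || q.lhs.equals(p.rhs2));
          boolean bothWrite    = p.lhs != null && p.lhs.equals(q.lhs);
          if (qReadsPWrite) return "RAW";  // read after write: true dependence
          if (qWritesPRead) return "WAR";  // write after read: antidependence
          if (bothWrite)    return "WAW";  // write after write: output dependence
          return "none";                   // safe to reorder, w.r.t. these names
      }
  }

An edge is added to the dependence graph whenever classify returns anything but "none"; a reordering is legal only if it preserves all such edges.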

What IR to Use?
Common choice: all(!)
- AST used in the early stages of the compiler
  - Closer to source code
  - Good for semantic analysis
  - Facilitates some higher-level optimizations
- Lower to a linear IR for optimization and codegen
  - Closer to machine code
  - Used to build the control-flow graph
  - Exposes machine-related optimizations
- Hybrid (graph + linear IR = CFG) for dataflow analysis and optimization

Coming Attractions
- Survey of compiler optimizations
- Analysis and transformation algorithms for optimizations (including SSA IR)
- Back-end organization in production compilers
- Instruction selection and scheduling, register allocation
- Other topics depending on time