Intermediate Code, Object Representation, Type-Based Optimization

Similar documents
Data-Flow Analysis Foundations

Intermediate Code Generation

Efficient Separate Compilation of Object-Oriented Languages

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

2. Reachability in garbage collection is just an approximation of garbage.

Tiny Compiler: Back End

Efficient Separate Compilation of Object-Oriented Languages

CS 360 Programming Languages Interpreters

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

IC Language Specification

Dynamic Languages. CSE 501 Spring 15. With materials adopted from John Mitchell

IR Lowering. Notation. Lowering Methodology. Nested Expressions. Nested Statements CS412/CS413. Introduction to Compilers Tim Teitelbaum

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 19: Efficient IL Lowering 5 March 08

Compiler Internals. Reminders. Course infrastructure. Registering for the course

Today's Topics. CISC 458 Winter J.R. Cordy

Intermediate Formats. for object oriented languages

Intermediate Code Generation

Compilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam

Structure of Programming Languages Lecture 10

Problem Score Max Score 1 Syntax directed translation & type

Lecture 31: Building a Runnable Program

(Not Quite) Minijava

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

CS558 Programming Languages Winter 2013 Lecture 8

CSCE 531, Spring 2015 Final Exam Answer Key

Lecture08: Scope and Lexical Address

CSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures

Compiler Construction Lent Term 2015 Lectures 10, 11 (of 16)

Short Notes of CS201

Compiler Theory. (Intermediate Code Generation Abstract S yntax + 3 Address Code)

COS 320. Compiling Techniques

CSE 504. Expression evaluation. Expression Evaluation, Runtime Environments. One possible semantics: Problem:

Intermediate Formats. for object oriented languages

CS201 - Introduction to Programming Glossary By

19 Much that I bound, I could not free; Much that I freed returned to me. Lee Wilson Dodd

CSE 401/M501 Compilers

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Intermediate Formats. Martin Rinard Laboratory for Computer Science

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

CSE 401 Final Exam. March 14, 2017 Happy π Day! (3/14) This exam is closed book, closed notes, closed electronics, closed neighbors, open mind,...

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

Programming Language Processor Theory

UNIT IV INTERMEDIATE CODE GENERATION

Tour of common optimizations

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Writing Evaluators MIF08. Laure Gonnord

CS250 Intro to CS II. Spring CS250 - Intro to CS II 1

Computer Science II (20082) Week 1: Review and Inheritance

Intermediate Code Generation

Inheritance and Interfaces

IR Optimization. May 15th, Tuesday, May 14, 13

2. The object-oriented paradigm!

Compilers Crash Course

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Autumn /20/ Hal Perkins & UW CSE H-1

What about Object-Oriented Languages?

Announcements. Written Assignment 2 Due Monday at 5:00PM. Midterm next Wednesday in class, 11:00 1:00. Midterm review session next Monday in class.

Compiler Construction

CA Compiler Construction

We ve written these as a grammar, but the grammar also stands for an abstract syntax tree representation of the IR.

Compiler Construction

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

CMSC430 Spring 2014 Midterm 2 Solutions

CS-XXX: Graduate Programming Languages. Lecture 9 Simply Typed Lambda Calculus. Dan Grossman 2012

CSE351 Winter 2016, Final Examination March 16, 2016

Compilers CS S-05 Semantic Analysis

Compiler Code Generation COMP360

Where we are. What makes a good IR? Intermediate Code. CS 4120 Introduction to Compilers

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

1 Lexical Considerations

An Introduction to Object Orientation

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

G Programming Languages - Fall 2012

Compiler Construction

Compiler Construction

Three-Address Code IR

Arrays Classes & Methods, Inheritance

Static Checking and Intermediate Code Generation Pat Morin COMP 3002

Topic 7: Intermediate Representations

7. Optimization! Prof. O. Nierstrasz! Lecture notes by Marcus Denker!

Compiler Construction

The Procedure Abstraction

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking

LECTURE 3. Compiler Phases

In fact, as your program grows, you might imagine it organized by class and superclass, creating a kind of giant tree structure. At the base is the

Intermediate representation

Optimising Functional Programming Languages. Max Bolingbroke, Cambridge University CPRG Lectures 2010

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

In this chapter you ll learn:

CSE341: Programming Languages Lecture 17 Implementing Languages Including Closures. Dan Grossman Autumn 2018

CSCI565 Compiler Design

CSE 401/M501 Compilers

Languages and Compiler Design II IR Code Generation I

CSE 504: Compiler Design. Runtime Environments

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210


Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

CS153: Compilers Lecture 15: Local Optimization

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

Programming Languages Third Edition. Chapter 10 Control II Procedures and Environments

Transcription:

CS 301 Spring 2016 Meetings March 14 Intermediate Code, Object Representation, Type-Based Optimization Plan Source Program Lexical Syntax Semantic Intermediate Code Generation Machine- Independent Optimization Code Generation Target Program This week brings our departure from the front end of the compiler as we start to consider translating to intermediate code and implementing objects. The first topic is intermediate code generation, the compiler phase that converts the high-level, hierarchical AST representation into a low-level, flat three-address-code representation. This lowering or flattening brings our program representation closer to a realistic machine model and provides a convenient platform for optimization and machine code generation, our topics for the next few weeks. The second topic is object representation: how to implement the run-time behavior of IC objects, including storage of fields and dispatch of method calls. Finally, we consider two optimization techniques that exploit realistic use of inheritance and polymorphism in object-oriented languages, statically (based on the class hierarchy) and dynamically (based on observed run-time behavior). These techniques are key in many modern language implementations. There are several small readings on intermediate code and object representation, plus two classic research papers on type-based optimizations. Readings TAC specification, https://cs.wellesley.edu/~cs301/project/tac.pdf Dragon 6.2 6.2.1 EC 5.3 (Skim as needed) EC 6.3.3-6.3.4 Appel, Modern Compiler Implementation in Java, 14.1-14.2, 14.6-14.7. Available in bookcase outside my door. Optimization of Object-Oriented Programs Using Static Class Hierarchy, Jeffrey Dean, David Grove, and Craig Chambers, ECOOP 1995. Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches, Urs Hölzle, Craig Chambers, David Ungar, ECOOP 1991. Exercises 1. (Optional) As you approach the translation of AST to three-address code (TAC), you may find it helpful to revisit the Tiny compiler back end as a small-scale example. https://cs.wellesley.edu/~cs301/labs.html#tiny-backend 1

2. This problem explores how to translate a program represented as an AST into three-address code (TAC), an intermediate representation of programs that we will use for optimization and translation to machine code. A specification of the TAC instruction set for this question is on the web site project page. As an example, consider the following while loop and its translation: n = 0; while (n < 10) { n = n + 1; n = 0 label test t1 = n < 10 t2 = not t1 cjump t2 end label body n = n + 1 jump test label end To leave the AST behind and move toward optimization and code generation, the compiler will translate programs to TAC. Here, we develop a definition of this step as a syntax-directed translation, meaning we give a function from syntactic forms of the source language to semantically sequences of TAC instructions. Your next project phase after spring break will start with a more concrete implementation of this translation, working from your AST. Note that the operands of each TAC instruction are either program variable names (e.g., n), temporary variable names introduced during translation (e.g., t2, t3), or constants (e.g., 0, 10, 1). Branch instructions refer to label names generated during the translation. Below, the function T defines a translation such that T [s] is an equivalent TAC representation for the high-level source statement s. T [e] does the same for an expression e. When translating expressions, e, use t := T [e] to denote the series of instructions to compute e, concluding by storing the result of e into temporary variable t. Translate expressions recursively. To translate an expression with subexpressions, first translate the subexpressions, then generate code to combine their results according to the top-level expression. For example, t := T [e 1 + e 2 ] would be: t1 := T [e 1 ] t2 := T [e 2 ] t = t1 + t2 The first two lines recursively translate e 1 and e 2 and store their results in new temporary variables t1 and t2, which are then added together and stored in t. Here are a few other general cases: e t := T [e] (description) v t := v (variable) n t := n (integer) e 1.f t1 := T [e 1 ] (field access) t = t1.f e 1 [e 2 ] = e 3 t1 := T [e 1 ] (array assignment) t2 := T [e 2 ] t3 := T [e 3 ] t1[t2] := t3 Generate new temporary names whenever necessary. For more complex expressions, apply rules recursively. T [ a[i] = x y + 1 ] becomes: 2

t1 := T [a] t2 := T [i] t3 := T [x y + 1] t4 := T [x y] t5 := T [1] t6 := T [x] t7 := T [y] t5 = 1 t6 = x t7 = y t5 = 1 Translation of statements follows the same pattern. T [ while ( e ) s ] becomes: (a) Define T for the following syntactic forms: label test t1 := T [e] t2 = not t1 cjump t2 end T [s] jump test label end t := T [e 1 * e 2 ] t := T [e 1 e 2 ] (where is short-circuited) T [ if ( e ) s 1 else s 2 ] T [ s 1 ; s 2 ;...; s n ] t := T [ f(e 1,..., e n ) ] (b) These translation rules introduce more copy instructions than strictly necessary. For example, t4 := T [ x * y ] becomes instead of the single statement t6 = x t7 = y t4 = x * y Describe how you would change your translation function to avoid generating these unnecessary copy statements. (c) The original rules also use more unique temporary variables than required, even after changing them to avoid the unnecessary copy instructions. For example, T [ x = x*x+1; y = y*y-z*z; z = (x+y+w)*(y+z+w)] becomes the following: t1 = x * x t2 = t1 + 1 x = t2 t3 = y * y t4 = z * z y = t3 - t4 t5 = x + y t6 = t5 + w t7 = y + z t8 = t7 + w z = t6 * t8 Rewrite this to use as few temporaries as possible. Generalizing from this example, how would you change T to avoid using more temporaries than necessary. 3

(d) Eliminating unnecessary variables here may seem like a good idea, but there are enough downsides that we will avoid it in our implementation. What are some reasons to stick with original, more verbose translation (for both or either of questions 2b and 2c)? (e) (Optional) Translation of some constructs, such as nested if statements, while loops, shortcircuit and/or statements may generate adjacent labels in the TAC. This is less than ideal, since the labels are clearly redundant and only one of them is needed. Illustrate an example where this occurs, and devise a scheme for generating TAC that does not generate consecutive labels. 3. Draw the run-time structures for the IC program given below, following the style of EC 6.3.4. Show information about each class, including the method vector and superclass pointer, but omitting the class pointer and classmethods vector. (IC does not support classes as objects themselves or static methods.) The representation of each object allocated by the sample program, including its class pointer, method vector pointer (point to class s method vector, not a copy as in EC), and fields. Draw only data and metadata. Omit representation of code itself (or abstract as a box). class A { int x; void setx(int x) { this.x = x; int f() { return this.x + 73; int g() { return f() + 2; class B extends A { string s; void sets(string s) { this.s = s; int f() { Library.println(s); return x + 12; void main(string[] args) { A a = new A(); B b = new B(); a.setx(1); b.setx(2); b.sets("yum. "); Library.printi(a.g()); Library.printi(b.g()); 4. Give the series of lower-level steps taken to execute the main method using basic operations on the run-time structures from exercise 3. (You do not need to write these steps down, but be prepared to walk me through them.) 4

5. Exercise 14.3 from Appel, Modern Compiler Implementation in Java, (see above). This exercise (and the next) use ideas from Optimization of Object-Oriented Programs Using Static Class Hierarchy, which Appel summarizes. 6. Exercise 14.4 from Appel, Modern Compiler Implementation in Java, (see above). 7. Read Optimization of Object-Oriented Programs Using Static Class Hierarchy. This paper introduces many of the ideas that power the preceding two exercises. Read and consider: For what kinds of programs is CHA particularly effective or ineffective? What features or idioms does it affect? What are the main limitations of CHA? Section 2.2.3 discusses how CHA can be used for dynamically-typed languages like Smalltalk (or Ruby). Would these techniques be useful for Java or IC? 8. Read Optimizing Dynamically-Typed Object-Oriented Languages with Polymorphic Inline Caches. What does this optimization do? When are PICs most or least effective? What features or idioms do they target? Compare PICs with CHA for dynamically-typed languages. What is the same? What is different? Self, like JavaScript, is a classless object-oriented language that relies on prototypes to define object behavior. This is not the same as a dynamically-typed class-based language. Prototypes essentially give every object its own method vector, which can be arranged at object allocation time by the programmer. Distinct objects may share the same prototype, giving them the same behavior, or include some of the same methods, but there are no classes or inheritance that enforce a strict hierarchy as in Java or IC. PIC was developed for such a language (Self). Naturally, with no class hierarchy, CHA does not apply directly. Can any of the techniques from CHA be adapted for use with prototypes? 5