Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Similar documents
Where we are. What makes a good IR? Intermediate Code. CS 4120 Introduction to Compilers

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

CSc 453 Interpreters & Interpretation

Intermediate Representations

Intermediate representation

Project Compiler. CS031 TA Help Session November 28, 2011

Building a Compiler with. JoeQ. Outline of this lecture. Building a compiler: what pieces we need? AKA, how to solve Homework 2

Administration. Where we are. Canonical form. Canonical form. One SEQ node. CS 412 Introduction to Compilers

CSE 401/M501 Compilers

Principles of Compiler Design

IR Lowering. Notation. Lowering Methodology. Nested Expressions. Nested Statements CS412/CS413. Introduction to Compilers Tim Teitelbaum

CSc 553. Principles of Compilation. 2 : Interpreters. Department of Computer Science University of Arizona

CMPSC 160 Translation of Programming Languages. Three-Address Code

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 19: Efficient IL Lowering 5 March 08

Run-time Program Management. Hwansoo Han

CSc 520 Principles of Programming Languages

Running class Timing on Java HotSpot VM, 1

JVML Instruction Set. How to get more than 256 local variables! Method Calls. Example. Method Calls

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

CSc 453. Compilers and Systems Software. 18 : Interpreters. Department of Computer Science University of Arizona. Kinds of Interpreters

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

Where we are. Instruction selection. Abstract Assembly. CS 4120 Introduction to Compilers

Compiler construction 2009

Compiling Techniques

Compiler construction 2009

Intermediate Representation (IR)

SSA Based Mobile Code: Construction and Empirical Evaluation

Soot A Java Bytecode Optimization Framework. Sable Research Group School of Computer Science McGill University

Code Generation. Lecture 30

Alternatives for semantic processing

Compilers Crash Course

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

JVM. What This Topic is About. Course Overview. Recap: Interpretive Compilers. Abstract Machines. Abstract Machines. Class Files and Class File Format

Intermediate Representations

High-Level Language VMs

CS415 Compilers. Intermediate Represeation & Code Generation

Lecture Outline. Code Generation. Lecture 30. Example of a Stack Machine Program. Stack Machines

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CS321 Languages and Compiler Design I. Winter 2012 Lecture 1

Lecture 31: Building a Runnable Program

Code Generation. The Main Idea of Today s Lecture. We can emit stack-machine-style code for expressions via recursion. Lecture Outline.

We can emit stack-machine-style code for expressions via recursion

Intermediate Representations

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

55:132/22C:160, HPCA Spring 2011

Where We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization.

Outline. Java Models for variables Types and type checking, type safety Interpretation vs. compilation. Reasoning about code. CSCI 2600 Spring

The role of semantic analysis in a compiler

Announcements. Working on requirements this week Work on design, implementation. Types. Lecture 17 CS 169. Outline. Java Types

We ve written these as a grammar, but the grammar also stands for an abstract syntax tree representation of the IR.

CSE 341: Programming Languages

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Autumn /20/ Hal Perkins & UW CSE H-1

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 16

Intermediate Code Generation

Lecture 8 CS 412/413 Spring '00 -- Andrew Myers 2. Lecture 8 CS 412/413 Spring '00 -- Andrew Myers 4

COMP 181. Prelude. Intermediate representations. Today. High-level IR. Types of IRs. Intermediate representations and code generation

Intermediate Representations

The View from 35,000 Feet

Structure of Programming Languages Lecture 10

Azul Systems, Inc.

02 B The Java Virtual Machine

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Java TM. Multi-Dispatch in the. Virtual Machine: Design and Implementation. Computing Science University of Saskatchewan

Today. Instance Method Dispatch. Instance Method Dispatch. Instance Method Dispatch 11/29/11. today. last time

CSc 453. Compilers and Systems Software. 13 : Intermediate Code I. Department of Computer Science University of Arizona

Just-In-Time Compilation

Java Internals. Frank Yellin Tim Lindholm JavaSoft

Usually, target code is semantically equivalent to source code, but not always!

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Winter /22/ Hal Perkins & UW CSE H-1

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

Compiler Code Generation COMP360

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

SOFTWARE ARCHITECTURE 7. JAVA VIRTUAL MACHINE

Intermediate Code & Local Optimizations

Lecture #16: Introduction to Runtime Organization. Last modified: Fri Mar 19 00:17: CS164: Lecture #16 1

! Broaden your language horizons! Different programming languages! Different language features and tradeoffs. ! Study how languages are implemented

Type Inference. Prof. Clarkson Fall Today s music: Cool, Calm, and Collected by The Rolling Stones

Semantic Analysis. Lecture 9. February 7, 2018

Undergraduate Compilers in a Day

Lecture #19: Code Generation

SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine p. 1

Intermediate Languages and Machine Languages. Lecture #31: Code Generation. Stack Machine with Accumulator. Stack Machines as Virtual Machines

Intermediate Representations & Symbol Tables

CS251 Programming Languages Handout # 29 Prof. Lyn Turbak March 7, 2007 Wellesley College

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

COS 320. Compiling Techniques

Lecture #31: Code Generation

Java byte code verification

Tizen/Artik IoT Lecture Chapter 3. JerryScript Parser & VM

The Java Virtual Machine

Compilers and Interpreters

Transcription:

CS 412/413 Introduction to Compilers and Translators Andrew Myers Cornell University Administration Design reports due Friday Current demo schedule on web page send mail with preferred times if you haven t signed up yet keep on eye on the schedule! Lecture 38: Compilation strategies 3 May 00 CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 2 Why build a compiler? You can design your own programming language Domain-specific languages can be designed for problems being solved Code is shorter, easier to maintain: language has the right concepts baked in Faster: can use optimize using special knowledge of language semantics This lecture: how to make it a little easier CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 3 Compiler front end back end Compilers Scanning Parsing Type checking Intermediate Code Generation MIR Optimization Code generation LIR optimization Linking & Loading source-tosource translator interpreter Java bytecode compiler JIT compiler CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 4 Architectural independence Source-to-source translator: compile from source to another high-level language (e.g. C), let other compiler deal with code gen, etc. Compile from source to an intermediate code format for which a back end already exists (ucode, RTF, LCC,...) Compile from source to an executable intermediate code format, interpret: abstract syntax tree bytecodes (stack or register machine) threaded code CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 5 Source-to-source translator Idea: choose well-supported high-level language (e.g. C, C++, Java) as target Translate AST to high-level language constructs instead of to IR, pass translated code off to underlying compiler Advantage: easy, can leverage good underlying compiler technology. Examples: C++ (to C), PolyJ (to Java), Toba (JVM to C) Disadvantages: target language won t support all features, optimization harder in target language, language may impose extra checks CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 6 1

Compiling to C C doesn t impose extra checks, is reasonably close to assembly, widely available (but can t support static exception tables Mismatch: no statements underneath expressions; must translate to canonical form in one step Translation of expression into C (or Java) is: sequence of statements to be executed expression to be evaluated afterward E Ze[ = { s 1 ; ; s n ; } ; e S Zs[ = { s 1 ; ; s n ; } Translation rules Translation still can be performed by recursive traversal of AST Some Iota C rules: assignment block E Z e [ = s ; e E Z id = e [ = s ; id = e E Z s i [ = s i ; e i i 1..n E Z ( s 1 ; ; s n ) [ = { s 1 ; ; s n ; } ; e n CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 7 CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 8 Translating to Java Same problems as C, plus: Java is typesafe (good in a HLL, not so good in an intermediate language!) May need to use casting and instanceof expressions in generated code dynamic type discrimination: slow Back Ends Several standard intermediate code formats exist with back ends for various architectures can reuse back ends p-code: very old stack machine format UCODE: old Stanford/MIPS stack machine format Java bytecode: new stack machine format RTL: GNU gcc, etc. SUIF: Stanford format for optimization LCC: Lightweight C compiler CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 9 CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 10 Intermediate code formats Stack machine format Quadruples compact, similar to machine code, good for standard optimization techniques Stack machine E.g., Java bytecode format easy to generate code for hard to optimize directly can be converted back into quadruples used by some (sort of) high-level languages: FORTH, PostScript, HP calculators CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 11 Code is a sequence of stack operations (not necessarily the same stack as the call stack) push CONST : add CONST to the top of stack pop : discard top of stack store : in memory location specified by top of stack store element just below. load : replace top of stack with memory location it points to +, *, /, -, : replace top two elems w/ result of operation CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 12 2

Generating code Stack operations mostly don t name operands (implicit): can code in 1 byte Expression is translated to code that leaves its value on the top of stack Translation of E 1 + E 2 : ZE 1 + E 2 [ = ZE 1 [; ZE 2 [; + Translation of id = E : Z id = E [ = Z E [; push addr(id); store Bad code generation is easy Compactness Values get trapped down low on stack especially with subexpression elim. Often need instruction that re-pushes element at known stack index on top Might as well have register operands! Result: not more compact than a register-based format; extra copies of data on stack too CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 13 CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 14 Stack machine quadruples At each point in code, keep track of stack depth (if possible) Assign temporaries according to depth Replace stack operands with quadruples using these temporaries 0 1 2 a b c a b*c a + b*c a+b*c push a ; 0 push b ; 1 push c ; 2 * ; 1 + ; 0 t0 = a t1 = b t2 = c t1 = t1 * t2 t0 = t0 + t1 CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 15 Java compilation model Java source code javac Java bytecode bytecode verifier Java interpreter sun.tools.javac.main(o, s) Just-In-Time (JIT) compiler native code stack machine instructions type annotations CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 16 Verification Java security depends on access only through public/protected methods hidden private variables unforgeable references to objects (capabilities) If Java program is not strongly typed, security of machine can be compromised! Java bytecode verifier checks Java bytecode to ensure strong typing: typed intermediate language Java Virtual Machine interpreter runs verified bytecode quickly, avoids run-time checks CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 17 JVM bytecode stack-machine intermediate code add, sub, mul, rem, div, : arithmetic dup, swap, pop, : stack ops also has local registers/temporaries load, store untyped, reused for different types built-in object operations invokevirtual, invokestatic, getfield, putfield, types of methods, fields are declared control flow ifeq, goto, ifne, : conditional branch How to show that code is type-safe? (efficiently!) CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 18 3

Type inference Type-checking bytecode: need to know type of every stack entry type of every local at every instruction Not present in bytecode file: inferred Start from known argument, return types to method object calls inside method Use forward data-flow analysis to propagate types to all bytecode instructions! Data-flow value is type of every stack entry, type of every local Meet is point-wise join in type hierarchy CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 19 Example Data-flow value = (T 1, T 2, ), [0: T 0, 1: T 1, ] stack types local types (T 1, T 2, ), [0: T 0, 1: T 1, ] (T 1, T 2, ), [0: T 0,, i: T i, ] swap (T 2, T 1, ), [0: T 0, 1: T 1, ] ( ), [, i: T i, ] flow functions load i (T i, T 1,T 2, ), [0: T 0,, i: T i, ] ( ), [, i: T i, ] combining operator K ( ), [, i: T i J T i, ] Object J int =? CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 20 JIT compilers Particularly widely available back end(s) with well-defined intermediate code (JVM bytecode) Generate code by reconstructing registers from stack machine as discussed Inferred types allow better code Compilation is done on-the-fly: generating code quickly is essential generated code quality is usually low HotSpot: new Sun JIT. High-quality optimization (esp. inlining and specialization), but used sparingly CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 21 Interpreters Why generate machine code at all? Just run it. Processors are really fast Options: token interpreters (parsing on the fly) -- reallly slow (>1000x) AST interpreters -- 300x threaded interpreters -- 20-50x bytecode interpreters -- 10-30x CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 22 AST interpreters Yet another recursive traversal For every node type in AST, add method Object evaluate(runtimecontext r) Evaluate method is implemented recursively Object PlusNode.evaluate(r) { return left.evaluate(r).plus(right.evaluate(r)); } Variables, etc. looked up in r; some help from AST yields big speed-ups (e.g. pre-computed variable locations) Interpreter code broken into tiny methods w/ lots of method invocations: slow CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 23 Implementing bytecode interpreters Bytecode interpreter simulates a simple architecture (either stack or register machine) Interpreter state: current code pointer current simulated function return stack current registers or stack & stack pointer Interpreter code is a big loop containing a switch over kinds of bytecode instructions one big function: optimizer does good things Avoid: recursion on function calls Result: 10-30x slowdown if done right CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 24 4

Summary Building a new system for executing code doesn t require construction of a full compiler Cost-effective strategies: source-to-source translation or translation to an existing intermediate code format Material covered in this course still helps High performance: translate to C Portability, extensibility: translate to Java or JVM (leverage existing back end/interpreter) CS 412/413 Spring '00 Lecture 38 -- Andrew Myers 25 5