Compiler Construction 1. Introduction. Oscar Nierstrasz

Similar documents
Compiler Construction 1. Introduction. Oscar Nierstrasz

Compiler Construction

CS 352: Compilers: Principles and Practice

CS 132 Compiler Construction

Fundamentals of Compilation

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CPSC 411: Introduction to Compiler Construction

Compilers and Interpreters

PART ONE Fundamentals of Compilation

Compiling Techniques

6. Intermediate Representation

The View from 35,000 Feet

Compilation 2014 Warm-up project

Academic Formalities. CS Modern Compilers: Theory and Practise. Images of the day. What, When and Why of Compilers

3. Parsing. Oscar Nierstrasz

6. Intermediate Representation!

Overview of a Compiler

Compiler Design (40-414)

High-level View of a Compiler

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

Academic Formalities. CS Language Translators. Compilers A Sangam. What, When and Why of Compilers

CA Compiler Construction

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Compilers and Code Optimization EDOARDO FUSELLA

Programming Language Processor Theory

4. Parsing in Practice

Introduction to Compilers

Modern Compiler Implementation in ML

Compiler Construction LECTURE # 1

5. Semantic Analysis. Mircea Lungu Oscar Nierstrasz

Introduction to Compiler

Front End. Hwansoo Han

5. Semantic Analysis!

Undergraduate Compilers in a Day

Compilers. Lecture 2 Overview. (original slides by Sam

Overview of a Compiler

Intermediate Representations

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

Overview of a Compiler

CS415 Compilers. Lexical Analysis

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

7. Introduction to Denotational Semantics. Oscar Nierstrasz

5. Semantic Analysis. Mircea Lungu Oscar Nierstrasz

Compiler Code Generation COMP360

CS Compiler Construction West Virginia fall semester 2014 August 18, 2014 syllabus 1.0

Compiler Theory Introduction and Course Outline Sandro Spina Department of Computer Science

Principles of Compiler Construction ( )

Lexical Analysis. Introduction

CS 415 Midterm Exam Spring 2002

CST-402(T): Language Processors

COSE312: Compilers. Lecture 1 Overview of Compilers

CS 4120 and 5120 are really the same course. CS 4121 (5121) is required! Outline CS 4120 / 4121 CS 5120/ = 5 & 0 = 1. Course Information

Part 5 Program Analysis Principles and Techniques

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Lexical Scanning COMP360

7. Optimization! Prof. O. Nierstrasz! Lecture notes by Marcus Denker!

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

A simple syntax-directed

Compiling Regular Expressions COMP360

CSE 3302 Programming Languages Lecture 2: Syntax

Topic 3: Syntax Analysis I

Compiler Construction

CS 4201 Compilers 2014/2015 Handout: Lab 1

JavaCC. JavaCC File Structure. JavaCC is a Lexical Analyser Generator and a Parser Generator. The structure of a JavaCC file is as follows:

Compilers Crash Course

Formal Languages and Compilers Lecture I: Introduction to Compilers

CS 406/534 Compiler Construction Putting It All Together

Abstract Syntax Trees

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Working of the Compilers

Compiler Construction Lecture 1: Introduction Winter Semester 2018/19 Thomas Noll Software Modeling and Verification Group RWTH Aachen University

Principles of Compiler Construction ( )

Overview of the Class

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Compiler Construction

Compiler Construction Lecture 1: Introduction Summer Semester 2017 Thomas Noll Software Modeling and Verification Group RWTH Aachen University

General Course Information. Catalogue Description. Objectives

Alternatives for semantic processing

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

8. Static Single Assignment Form. Marcus Denker

10. PEGs, Packrats and Parser Combinators

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Chapter 4. Abstract Syntax

2. Lexical Analysis! Prof. O. Nierstrasz!

CS 314 Principles of Programming Languages

CPSC 510 (Fall 2007, Winter 2008) Compiler Construction II

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

PLAGIARISM. Administrivia. Compilers. CS143 11:00-12:15TT B03 Gates. Text. Staff. Instructor. TAs. Office hours, contact info on 143 web site

Chapter 2 - Programming Language Syntax. September 20, 2017

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation

Topic 5: Syntax Analysis III

1DL321: Kompilatorteknik I (Compiler Design 1)

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Transcription:

Compiler Construction 1. Introduction Oscar Nierstrasz

Compiler Construction Lecturers Assistants Lectures Exercises WWW Prof. Oscar Nierstrasz, Dr. Mircea Lungu Jan Kurš, Boris Spasojević E8 001, Fridays @ 10h15-12h00 E8 001, Fridays @ 12h00-13h00 scg.unibe.ch/teaching/cc 2

MSc registration Spring 2015 JMCS students Register on Academia for teaching units by March 13, 2015 Register on Academia for exams by May 15, 2015 Request reimbursement of travel expenses by June 30, 2015 NB: Hosted JMCS students (e.g. CS bachelor students etc.) must additionally: Request for Academia access by February 28, 2015

Roadmap > Overview > Front end > Back end > Multi-pass compilers > Example: compiler and interpreter for a toy language See Modern compiler implementation in Java (Second edition), chapter 1. 4

Roadmap > Overview > Front end > Back end > Multi-pass compilers > Example: compiler and interpreter for a toy language 5

Textbook > Andrew W. Appel, Modern compiler implementation in Java (Second edition), Cambridge University Press, New York, NY, USA, 2002, with Jens Palsberg. Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/ 6

Other recommended sources > Compilers: Principles, Techniques, and Tools, Aho, Sethi and Ullman http://dragonbook.stanford.edu/ > Parsing Techniques, Grune and Jacobs http://www.cs.vu.nl/~dick/pt2ed.html > Advanced Compiler Design and Implementation, Muchnik 7

Schedule 1 20-Feb-15 Introduction 2 27-Feb-15 Lexical Analysis 3 06-Mar-15 Parsing 4 13-Mar-15 Parsing in Practice 5 20-Mar-15 Semantic Analysis 6 27-Mar-15 Intermediate Representation 03-Apr-15 Good Friday 10-Apr-15 Spring break 7 17-Apr-15 Optimization 8 24-Apr-15 Code Generation 9 01-May-15 Bytecode and Virtual Machines 10 08-May-15 PEGs, Packrats and Parser Combinators 11 15-May-15 Program Transformation 12 22-May-15 Project Presentations 13 29-May-15 Final Exam 8

What is a compiler? a program that translates an executable program in one language into an executable program in another language Source code Compiler Target code errors 9 Translates source code into target code. We expect the program produced by the compiler to be better, in some way, than the original.

What is an interpreter? a program that reads an executable program and produces the results of running that program Source code Interpreter Output errors 10 Usually, this involves executing the source program in some fashion.

Implementing Compilers, Interpreters Pre-processor Parser Code Generator Assembler Program Parse tree / IR Assembly code Machine code Translator Interpreter Bytecode Generator JIT Compiler Program... Bytecode Bytecode Interpreter 11 This picture offers a high-level overview of the different approaches to implementing languages

Why do we care? Compiler construction is a microcosm of computer science artificial intelligence algorithms theory systems architecture greedy algorithms learning algorithms graph algorithms union-find dynamic programming DFAs for scanning parser generators lattice theory for analysis allocation and naming locality synchronization pipeline management hierarchy management instruction set use Inside a compiler, all these things come together 12

Isn t it a solved problem? > Machines are constantly changing Changes in architecture changes in compilers new features pose new problems changing costs lead to different concerns old solutions need re-engineering > Innovations in compilers should prompt changes in architecture New languages and features 13 For example, computationally expensive but simpler scannerless parsing techniques are undergoing a renaissance.

What qualities are important in a compiler? > Correct code > Output runs fast > Compiler runs fast > Compile time proportional to program size > Support for separate compilation > Good diagnostics for syntax errors > Works well with the debugger > Good diagnostics for flow anomalies > Cross language calls > Consistent, predictable optimization 14 Each of these shapes your feelings about the correct contents of this course

A bit of history > 1952: First compiler (linker/loader) written by Grace Hopper for A-0 programming language > 1957: First complete compiler for FORTRAN by John Backus and team > 1960: COBOL compilers for multiple architectures > 1962: First self-hosting compiler for LISP 15 http://en.wikipedia.org/wiki/compiler ;-)

A compiler was originally a program that compiled subroutines [a linkloader]. When in 1954 the combination algebraic compiler came into use, or rather into misuse, the meaning of the term had already shifted into the present one. Bauer and Eickel [1975] 16

Abstract view Source code Compiler Target code errors recognize legal (and illegal) programs generate correct code manage storage of all variables and code agree on format for object (or assembly) code Big step up from assembler higher level notations 17

Traditional two pass compiler source code front end IR code generation machine code errors front end maps legal code into IR intermediate representation (IR) back end maps IR onto target machine simplifies retargeting allows multiple front ends multiple passes better code 18 A classical compiler consists of a front end that parses the source code into an intermediate representation, and a back end that generates executable code.

A fallacy! Front-end, IR and back-end must encode knowledge needed for all n m combinations! 19 - must encode all the knowledge in each front end - must represent all the features in one IR - must handle all the features in each back end Limited success with low-level IRs

Roadmap > Overview > Front end > Back end > Multi-pass compilers > Example: compiler and interpreter for a toy language 20

Front end source code scanner tokens parser IR errors recognize legal code report errors produce IR preliminary storage map shape code for the back end Much of front end construction can be automated 21 preliminary storage map => not only prepare symbol table, but decide what part of storage different names (entities) should be mapped to (local, global, automatic etc) shape code for the back end => decide how different parts of code are organized (in the IR)

Scanner source code scanner tokens parser IR errors map characters to tokens character string value for a token is a lexeme eliminate white space x = x + y <id,x> = <id,x> + <id,y> 22 character string value for a token is a lexeme Typical tokens: id, number, do, end Key issue is speed

Parser source code scanner tokens parser IR errors recognize context-free syntax guide context-sensitive analysis construct IR(s) produce meaningful error messages attempt error correction Parser generators mechanize much of the work 23

Context-free grammars Context-free syntax is specified with a grammar, usually in Backus-Naur form (BNF) 1.<goal> := <expr> 2.<expr> := <expr> <op> <term> 3. <term> 4.<term> := number 5. id 6.<op> := + 7. - A grammar G = (S,N,T,P) S is the start-symbol N is a set of non-terminal symbols T is a set of terminal symbols P is a set of productions P: N (N T) * 24 Called context-free because rules for non-terminals can be written without regard for the context in which they appear.

Deriving valid sentences Production Result <goal> 1 <expr> 2 <expr> <op> <term> 5 <expr> <op> y 7 <expr> - y 2 <expr> <op> <term> - y 4 <expr> <op> 2 - y 6 <expr> + 2 - y 3 <term> + 2 - y 5 x + 2 - y Given a grammar, valid sentences can be derived by repeated substitution. To recognize a valid sentence in some CFG, we reverse this process and build up a parse. 25 The parse is the sequence of productions needed to parse the input. The parse tree is something else

Parse trees A parse can be represented by a tree called a parse tree (or syntax tree). Obviously, this contains a lot of unnecessary information 26

Abstract syntax trees So, compilers often use an abstract syntax tree (AST). ASTs are often used as an IR. 27

Roadmap > Overview > Front end > Back end > Multi-pass compilers > Example: compiler and interpreter for a toy language 28

Back end IR instruction selection register allocation machine code errors translate IR into target machine code choose instructions for each IR operation decide what to keep in registers at each point ensure conformance with system interfaces Automation has been less successful here 29

Instruction selection IR instruction selection register allocation machine code errors produce compact, fast code use available addressing modes pattern matching problem ad hoc techniques tree pattern matching string pattern matching dynamic programming 30

Register allocation IR instruction selection register allocation machine code errors have value in a register when used limited resources changes instruction choices can move loads and stores optimal allocation is difficult Modern allocators often use an analogy to graph coloring 31

Roadmap > Overview > Front end > Back end > Multi-pass compilers > Example: compiler and interpreter for a toy language 32

Traditional three-pass compiler analyzes and changes IR goal is to reduce runtime (optimization) must preserve results 33 Code improvement and optimization

Optimizer (middle end) Modern optimizers are usually built as a set of passes IR opt 1 IR... IR opt n IR errors constant expression propagation and folding code motion reduction of operator strength common sub-expression elimination redundant store elimination dead code elimination 34 constant propagation and folding (evaluate and propagate constant expressions at compile time) code motion (move code that does not need to be reevaluated out of loops) reduction of operator strength (replace slow operations by equivalent faster ones) common sub-expression elimination (evaluate once and store) redundant store elimination (detect when values are stored repeatedly and eliminate) dead code elimination (eliminate code that can never be executed)

The MiniJava compiler 35 Cf. MCIJ 2d edn p 4

Compiler phases Lex Parse Parsing Actions Semantic Analysis Frame Layout Translate Canonicalize Instruction Selection Control Flow Analysis Data Flow Analysis Break source file into individual words, or tokens Analyse the phrase structure of program Build a piece of abstract syntax tree for each phrase Determine what each phrase means, relate uses of variables to their definitions, check types of expressions, request translation of each phrase Place variables, function parameters, etc., into activation records (stack frames) in a machine-dependent way Produce intermediate representation trees (IR trees), a notation that is not tied to any particular source language or target machine Hoist side effects out of expressions, and clean up conditional branches, for convenience of later phases Group IR-tree nodes into clumps that correspond to actions of target-machine instructions Analyse sequence of instructions into control flow graph showing all possible flows of control program might follow when it runs Gather information about flow of data through variables of program; e.g., liveness analysis calculates places where each variable holds a still-needed (live) value Register Allocation Code Emission Choose registers for variables and temporary values; variables not simultaneously live can share same register Replace temporary names in each machine instruction with registers 36

Roadmap > Overview > Front end > Back end > Multi-pass compilers > Example: compiler and interpreter for a toy language 37

A straight-line programming language (no loops or conditionals): Stm Stm ; Stm CompoundSt m Stm id := Exp AssignStm Stm print ( ExpList ) PrintStm Exp id IdExp Exp num NumExp Exp Exp Binop Exp OpExp Exp ( Stm, Exp ) EseqExp ExpList Exp, ExpList PairExpList ExpList Exp LastExpList Binop + Plus Binop Minus Binop Times Binop / Div a := 5 + 3; b := (print(a,a 1),10 a); print(b) prints 8 7 80 38

Tree representation a := 5 + 3; b := (print(a,a 1),10 a); print(b) 39

Straightline Interpreter and Compiler Files Source files Generated files «Grammar spec» slpl.jj JTB «Grammar spec with actions» jtb.out.jj JavaCC «Parser source» StraightLineParser... produces «Compiler source» CompilerVisitor... «Default visitors and interfaces» Visitor... visits «Syntax Tree Nodes» Goal... «Interpreter source» InterpreterVisitor... JavaC uses «Abstract Machine for Interpreter» Machine «Bytecode» StraightLineParser... Key generates 40

Java classes for trees abstract class Stm {} class CompoundStm extends Stm { Stm stm1, stm2; CompoundStm(Stm s1, Stm s2) {stm1=s1; stm2=s2;} } class AssignStm extends Stm { String id; Exp exp; AssignStm(String i, Exp e) {id=i; exp=e;} } class PrintStm extends Stm { ExpList exps; PrintStm(ExpList e) {exps=e;} } abstract class Exp {} class IdExp extends Exp { String id; IdExp(String i) {id=i;} } class NumExp extends Exp { int num; NumExp(int n) {num=n;} } class OpExp extends Exp { Exp left, right; int oper; final static int Plus=1,Minus=2,Times=3,Div=4; OpExp(Exp l, int o, Exp r) {left=l; oper=o; right=r;} } class EseqExp extends Exp { Stm stm; Exp exp; EseqExp(Stm s, Exp e) {stm=s; exp=e;} } abstract class ExpList {} class PairExpList extends ExpList { Exp head; ExpList tail; public PairExpList(Exp h, ExpList t) {head=h; tail=t;} } class LastExpList extends ExpList { Exp head; public LastExpList(Exp h) {head=h;} } 41

Straightline Interpreter and Compiler Runtime «Straightline source code» Examples StraightLineParser «Syntax Tree» Goal... visits visits InterpreterVisitor... CompilerVisitor... instructs Machine uses «Bytecode generation library» bcel.jar output bytecode Key generates output 42

What you should know! What is the difference between a compiler and an interpreter? What are important qualities of compilers? Why are compilers commonly split into multiple passes? What are the typical responsibilities of the different parts of a modern compiler? How are context-free grammars specified? What is abstract about an abstract syntax tree? What is intermediate representation and what is it for? Why is optimization a separate activity? 43

Can you answer these questions? Is Java compiled or interpreted? What about Smalltalk? Ruby? PHP? Are you sure? What are the key differences between modern compilers and compilers written in the 1970s? Why is it hard for compilers to generate good error messages? What is context-free about a context-free grammar? 44

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) You are free to: Share copy and redistribute the material in any medium or format Adapt remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms. Under the following terms: Attribution You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. ShareAlike If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. No additional restrictions You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits. http://creativecommons.org/licenses/by-sa/4.0/