Compiler Construction D7011E

Similar documents
CA Compiler Construction

Compiler Construction D7011E

Philadelphia University Faculty of Information Technology Department of Computer Science --- Semester, 2007/2008. Course Syllabus

Compilers for Modern Architectures Course Syllabus, Spring 2015

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Compiler Construction 2010 (6 credits)

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

Compiler Construction LECTURE # 1

CS Compiler Construction West Virginia fall semester 2014 August 18, 2014 syllabus 1.0

CS 4120 and 5120 are really the same course. CS 4121 (5121) is required! Outline CS 4120 / 4121 CS 5120/ = 5 & 0 = 1. Course Information

Compiler Design. Dr. Chengwei Lei CEECS California State University, Bakersfield

CMPE 152 Compiler Design

ECE573 Introduction to Compilers & Translators

Compiler Construction

Compiler Construction

COMP 3002: Compiler Construction. Pat Morin School of Computer Science

CPS 506 Comparative Programming Languages. Syntax Specification

Compilers Crash Course

Lecture 1: Introduction

Finding People and Information (1) G53CMP: Lecture 1. Aims and Motivation (1) Finding People and Information (2)

CSCI 565 Compiler Design and Implementation Spring 2014

CMPE 152 Compiler Design

CS/SE 153 Concepts of Compiler Design

Formal Languages and Compilers Lecture I: Introduction to Compilers

Evaluation Scheme L T P Total Credit Theory Mid Sem Exam

Syntax Intro and Overview. Syntax

Compiler Construction D7011E

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation

CMPE 152 Compiler Design

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, Pilani Pilani Campus Instruction Division

Compiler Design Overview. Compiler Design 1

Building Compilers with Phoenix

Error Recovery during Top-Down Parsing: Acceptable-sets derived from continuation

LECTURE NOTES ON COMPILER DESIGN P a g e 2

G.PULLAIH COLLEGE OF ENGINEERING & TECHNOLOGY

Compiler Construction

Introduction to Parsing Ambiguity and Syntax Errors

Compiler Optimisation

Introduction to Parsing Ambiguity and Syntax Errors

CST-402(T): Language Processors

CS/SE 153 Concepts of Compiler Design

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

General Course Information. Catalogue Description. Objectives

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

Stating the obvious, people and computers do not speak the same language.

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

CSE 504: Compiler Design

Chapter 2 A Quick Tour

Compilers - Chapter 2: An introduction to syntax analysis (and a complete toy compiler)

Compiler Construction Lecture 1: Introduction Winter Semester 2018/19 Thomas Noll Software Modeling and Verification Group RWTH Aachen University

Compiler Design (40-414)

PLAGIARISM. Administrivia. Compilers. CS143 11:00-12:15TT B03 Gates. Text. Staff. Instructor. TAs. Office hours, contact info on 143 web site

Introduction to Compilers

Chapter 2 - Programming Language Syntax. September 20, 2017

Compilers and computer architecture: introduction

Working of the Compilers

CS426 Compiler Construction Fall 2006

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

COP 3402 Systems Software Syntax Analysis (Parser)

CSCI 2041: Advanced Language Processing

EI326 ENGINEERING PRACTICE & TECHNICAL INNOVATION (III-G) Kenny Q. Zhu Dept. of Computer Science Shanghai Jiao Tong University

G53CMP: Lecture 1. Administrative Details 2016 and Introduction to Compiler Construction. Henrik Nilsson. University of Nottingham, UK

EECS 6083 Intro to Parsing Context Free Grammars

Context-Free Language Translation for the DDC Assembler

CSE 582 Autumn 2002 Exam 11/26/02

Compiler Construction

CS131: Programming Languages and Compilers. Spring 2017

Compiler Construction

Introduction. Compiler Design CSE Overview. 2 Syntax-Directed Translation. 3 Phases of Translation

1 Lexical Considerations

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Compilers. Pierre Geurts

Compilers CS S-01 Compiler Basics & Lexical Analysis

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

15-411/ Compiler Design

Administrivia. Compilers. CS143 10:30-11:50TT Gates B01. Text. Staff. Instructor. TAs. The Purple Dragon Book. Alex Aiken. Aho, Lam, Sethi & Ullman

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

SOFTWARE ARCHITECTURE 5. COMPILER

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Overview. Overview. Programming problems are easier to solve in high-level languages

A Simple Syntax-Directed Translator

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Introduction to Lexing and Parsing

Compiler Code Generation COMP360

Compilers and Interpreters

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

GUJARAT TECHNOLOGICAL UNIVERSITY

Compilers CS S-01 Compiler Basics & Lexical Analysis

Translator Design CRN Course Administration CMSC 4173 Spring 2017

Semantic Analysis. Lecture 9. February 7, 2018

CS 406/534 Compiler Construction Putting It All Together

Experiences in Building a Compiler for an Object-Oriented Language

Transcription:

Compiler Construction D7011E Lecture 1: Introduction to compilers Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1

Course Formalities 2

Teacher (me): Viktor Leijon: n Email: viktor.leijon@ltu.se n Lives and works in Stockholm, meetings by appointment but email at any time. n The person to contact first with all questions. D7011E web page: http://www.sm.luth.se/csee/courses/d7011e/ (or just google d7011e ). 3

Examiner: Johan Nordlander: n Email: johan.nordlander@ltu.se Responsible for the quality of the course. The person to turn to if talking to the teacher is not enough. 4

Lectures: Currently scheduled: n 2-3 classes / batch. n 15 in total. n See online schedule for time and location. Rules: n Questions and interruptions are welcome! n Cell-phones are turned off! 5

Methods of Assessment: Written examination n 4.5 hp (3p). n Grades U, 3, 4, 5. n Wednesday October 31. Programming project n 3 hp (2p). n Grades U, G. n 6 steps, each with a deadline. n Final deadline: October 26. 6

Evaluation follow-up Review of the previous course evaluation. Earlier evaluations available on the website. 7

Programming project: The project is about implementing a compiler (of course!) for a subset of Java. 6 subtopics, will follow the lectures closely. Use your Linux accounts (required tools installed!). n Your compiler must work on the Linux account. Project details can be found on the web page. Project work will be carried out individually. Each of the 6 project steps will be examined separately grade G required before proceeding. The final step will include a written project report that covers the full compiler implementation. 8

Policies: By default, all deadlines are firm. n Sept 13, Sept 24, Oct 5, Oct 9, Oct 19, Oct 26. I will be as flexible as possible in accommodating special circumstances; but advance notice will make this a lot easier. Standard guidelines for academic integrity: n Discussion is good, copying files is not! n Items turned in should be your own, individual work; n Please ask if this is unclear! n Violations will be reported to the disciplinary board. 9

What will I be looking for? Always include some details of your working/ methods/explanations: n to demonstrate your understanding of concepts. n to show your problem solving skills in applying what you have learned. Stating only the final answer may result in grade U, even though the answer is technically correct. 10

Prerequisites: Good knowledge of imperative programming (D0009E/ SMD180) and object-oriented programming & design (D0010E/SMD181). Functions and relations, set theory, state automata (M0009M/MAM200). Searching and sorting, common data structures like queues, stacks, lists, trees and graphs (D0012E/ SMD184). Stack-based assembly programming (D0013E). No previous experience of compiler construction is assumed. Programming experience in Java is expected. 11

Text book Compilers: principles, techniques and tools (2nd ed.) Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman. Addison Wesley, 2007, ISBN: 0321486811. New and improved! 12

Complementary material The Definitive ANTLR Reference by Terrence Parr; The Pragmatic Bookshelf (2007); ISBN: 0-9787392-5-6 Very useful text on parsing, available online (at a small cost) 13

Previous course book Modern Compiler Implementation in Java by Andrew W. Appel with Jens Palsberg; 2nd edition, Cambridge Univ Press (2002); ISBN: 0-521-82060-X 14

Alternatives (1/2) Engineering a compiler 2nd ed by K. Cooper, L. Torczon More pragmatic approach to the subject, covers many modern topics. May become the main course literature in the future. 15

Alternatives (2/2) Compilers Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman Classic text, rich in content, but slightly dated. Previous edition of the course book. Advanced Compiler Design and Implementation by Steven S. Muchnick More advanced and more narrow, good source for info on advanced compiler optimization techniques 16

Rough Syllabus: Lecture Contents Chapters 1 Introduction 1, (2) 2 Lexical analysis and finite automata 3.1-3.4 3 Lexer generators 3.5-3.11 4 Introduction to parsing 4.1-4.4 5 Details of top-down & bottom-up parsing 4.5-4.5 6 Parser generators and abstract syntax 4.8-4.9, (5) 7 Semantic analysis 6.1-6.3 8 Introduction to code generation (6.4-6.9), 8 9 Implementing functions and objects (7.1, 7.2), 8 10 Continuing code generation 8 11 Introduction to optimization and LLVM 8, 9, (9.2) 12 Optimization using LLVM 8, 9, (9.2) 13 Instruction selection 8 14 Memory Management 7 15 Polymorphic types 17

Basics of Compiler Structure: 18

Aims of this Course: To understand how compilers work, and to see how they are constructed. n n To build familiarity with compiler tools. ANTLR & LLVM. lex/yacc To build familiarity with some theoretical tools that are useful in the study of compilers. To investigate advanced language features from an implementer s perspective. 19

What is a Compiler? Compilers are translators: compiler Source Programs Target Programs Diagnostics 20

The Ins and Outs: Source programs: Many possible source languages, from traditional, to application specific languages. Target programs: Usually another programming language, often the machine language of a particular computer system. Error diagnostics: Essential for serious program development. 21

Critical Properties: Always: n Correctness: The compiler must accept valid input and produce valid output; and it must preserve the intended semantics. Almost Always: n Performance of compiled code: time and space. n Performance of compiler: again, time and space. n Diagnostics: To permit early and accurate detection of programming errors. 22

Pragmatic Issues: n Support for large programming projects: In particular, this includes: Separate compilation; reducing the amount of recompilation that is needed when a program is changed. n Use of libraries. Convenient development environment: n Supports program development with a range of useful tools, for example, profiling, debugging, cross-referencing, browsing, project management (make), 23

Language vs Implementation: Be very careful to distinguish between languages and their implementations. It makes little sense to talk about a slow language. It makes little sense to talk about a compiled language. 24

Quick Questions! What is your favorite language? n Why? What is your least favorite one? n Why? What makes a language great? n Why? 25

Inside the Compiler: How do we get from here count = 0; total = 0; while (total < 10) { } total = total + count*count; count = count + 1; 26

Inside the Compiler: How do we get from here to here? count = 0; total = 0; while (total < 10) { } total = total + count*count; count = count + 1; movl $0,count addl total,%eax movl $0,total movl %eax,total, L0: movl total,%eax movl count,%eax cmpl $10,%eax addl $1,%eax jnl L1 movl %eax,count movl count,%eax jmp L0 imull %eax,%eax L1: 26

Inside the Compiler (aside): How do we get from here Did we really mean total < 10? to here? count = 0; total = 0; while (total < 10) { } total = total + count*count; count = count + 1; movl $0,count addl total,%eax movl $0,total movl %eax,total, L0: movl total,%eax movl count,%eax cmpl $10,%eax addl $1,%eax jnl L1 movl %eax,count movl count,%eax jmp L0 imull %eax,%eax L1: 27

Inside the Compiler (aside): How do we get from here Did we really mean total < 10? A compiler can detect invalid programs to here? A compiler can t determine if the program you entered was the program you intended! count = 0; total = 0; while (total < 10) { } total = total + count*count; count = count + 1; movl $0,count addl total,%eax movl $0,total movl %eax,total, L0: movl total,%eax movl count,%eax cmpl $10,%eax addl $1,%eax jnl L1 movl %eax,count movl count,%eax jmp L0 imull %eax,%eax L1: 27

The Compiler Pipeline: Traditionally, this task is broken down into several steps, or phases: 28

Lexical Analysis: Character stream Token stream Translate a raw input sequence of characters into a sequence of tokens. For example, the keyword for is a single entity, not a 3 character string. 29

Parsing: Token stream Structured representation Build data structures that capture the underlying structure (abstract syntax) of the input program. Check for ill-formed programs. 30

Static/Semantic Analysis: Structured representation Validated representation Check that the program is reasonable: n No references to unbound variables; n No type inconsistencies; n Etc 31

Simplification: Validated representation Simplified representation Remove redundant constructions, e.g.: n Rewrite for loops in terms of while; n Break down nested function calls; n Etc 32

Code Generation: Simplified representation Translated program Generate an appropriate sequence of machine instructions as output. Different strategies will be needed for different target machines. 33

Optimization: Translated program Optimized code Look for opportunities to improve the quality of output code. n There may be conflicting ways to improve a given program; choice depends on context. n Producing genuinely optimal code is theoretically impossible; improved is as good as it gets. 34

Variations on a Theme: In practice, almost every compiler has its own variation on this basic idea: n Extra phases (e.g., preprocessing); n Repeated phases (e.g., type checking at multiple points in the pipeline); n Iterated phases (e.g., running the optimizer multiple times); n Additional data may be passed from one phase to the next (e.g., symbol tables). 35

Phases and Passes: A phase is a logical stage in a compiler s pipeline. A pass is a physical traversal over the representation of a program. Several phases may be combined in one pass. Passes may be run in sequence or in parallel. Some languages are specifically designed so that they can be compiled in a single pass. 36

The Quick Calculator 37

A Quick Calculator Example: > java QuickCalc Welcome to the quick calculator program! 1 + 2 + 3; Result = 6 Compiled version: iconst 1 iconst 2 iadd iconst 3 iadd 1 + 2 * 3; Result = 7 Compiled version: iconst 1 iconst 2 iconst 3 mul iadd ^C > 38

Inside the Quick Calculator: This is a toy interpreter/compiler. Its implementation involves: n Lexical analysis; n Parsing; n Abstract syntax; n Simple compilation and evaluation. I want you to get a feel for the structure; details are less important. 39

Specifying the Input Language: What does the input to the program look like? In BNF (Backus-Naur Form): program ::= expr program ; program expr ::= expr + expr expr - expr expr * expr expr / expr ( expr ) integer Tokens: + - * / ; ( ) integer 40

Lexical Analysis: static void gettoken() { while (Character.isWhitespace((char)c)) // Skip whitespace c = getchar(); if (c<0) { token = 0; return; } // End of file? switch (c) { } case '+' : case '-' : case '*' : case '/' : case ';' : case '(' : case ')' : token = c; c = getchar(); return; default: if (Character.isDigit((char)c)) { } int n = 0; do { n = 10*n + (c - '0'); c = getchar(); } while (Character.isDigit((char)c)); tokval = n; token = integer; return; calcerror("illegal character "+c); } c holds the next character in the input; token is used to return the code of the next token; tokval is used to return a value (if any) for the token. 41

Source Input: static void initinput() { c = getchar(); gettoken(); } static int getchar() { int c = 0; try { c = System.in.read(); } catch (Exception e) { calcerror("error in input stream"); } return c; } 42

Error Handling: For now, we provide a very crude error handler: static void calcerror(string msg) { System.err.println("ERROR: " + msg); System.exit(1); } In a more sophisticated system, we may wish to provide: n Additional diagnostic information; n Error recovery; n Etc. 43

Representation for Expressions: Before we can write the parser, we must choose a representation for expressions. There are really two kinds of expression that we have to deal with: n Integers, described by a simple numeric value; n Binary expressions, which have an operator and two subexpressions. 44

Using Java Classes: abstract class Expr { abstract void compile(); abstract int eval(); } class IntExpr extends Expr { private int value; } class BinExpr extends Expr { private char op; private Expr left; private Expr right; } 45

Integer Expressions: class IntExpr extends Expr { private int value; IntExpr(int value) { this.value = value; } void compile() { } emit("iconst " + value); } int eval() { } return value; 46

Binary Expressions: class BinExpr extends Expr { private char op; private Expr left; private Expr right; int eval() { int l = left.eval(); int r = right.eval(); switch (op) { case '+' : return l + r; case '-' : return l - r; case '*' : return l * r; case '/' : return l / r; } return 0; /* not used */ } } 47

Ambiguities in the Grammar: The grammar that we used to describe the input is ambiguous: expr ::= expr + expr expr - expr expr * expr expr / expr ( expr ) integer Should 1+2*3 be read as (1+2)*3 or as 1+(2*3)? 48

Changing the Grammar: We can rewrite the grammar to avoid ambiguity: expr ::= expr + term expr - term term term ::= term * atom term / atom atom atom ::= ( expr ) integer Now 1+2*3 must be read as 1+(2*3)! 49

A Recursive Descent Parser: static Expr expr() { // Expr = Term Expr e = term(); // Expr '+' Term // Expr '-' Term for (;;) { if (token=='+') { gettoken(); e = new BinExpr('+', e, term()); } else if (token=='-') { gettoken(); e = new BinExpr('-', e, term()); } else { return e; } } } 50

Putting it All Together: public static void main(string[] args) { System.out.println("Welcome!"); initinput(); for (;;) { Expr e = expr(); System.out.println("Result = " + e.eval()); System.out.println("Compiled version:"); e.compile(); if (token==';') { gettoken(); } else if (token!=0) { calcerror("syntax error"); } else { return; } } } 51

Deficiencies of QuickCalc: Ad-hoc construction; Poor error handling; Poor storage management; Crude user interface; Limited input language. 52

Project warm-up (step 0) Compile and run the quick calculator program (link QuickCalc.java). You can also rewrite it in a language of your own choice if you wish Modify your quick calculator to support unary minus: n Extend the data structures with a representation for unary minus n Add methods to construct and evaluate the new forms of expression (you can ignore the compile method; just use {} for the body) n Modify the parser, giving unary minus a higher precedence than any other operator (for example, -2+3, -(2+3), 4 * -5, and - - 23 should give 1, -5, -20, and 23, respectively) Deadline Wed Sept 13 (midnight). 53

Reading instructions Chapter 1 is the introduction and overview and should be read first. Chapter 2 is a capsule version of the course, you can either read it or skip it and refer back to it when needed (it defines terms). Chapter 3 is on lexical analysis, read up to 3.4 until next lecture. Many pages this time, but comparatively easy. 54