Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

Similar documents
Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

A programming language requires two major definitions A simple one pass compiler

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

A Simple Syntax-Directed Translator

Building Compilers with Phoenix

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Introduction. Compilers and Interpreters

Introduction to Compiler Construction

Compiler Design. Dr. Chengwei Lei CEECS California State University, Bakersfield

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

Compiler Design (40-414)

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Introduction to Compiler Construction

Lexical and Syntax Analysis. Top-Down Parsing

CS 314 Principles of Programming Languages

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CS131: Programming Languages and Compilers. Spring 2017

Formal Languages and Compilers Lecture I: Introduction to Compilers

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

2.2 Syntax Definition

Lexical and Syntax Analysis

CST-402(T): Language Processors

3. Parsing. Oscar Nierstrasz

Principles of Programming Languages COMP251: Syntax and Grammars

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Syntax Analysis Part I

A simple syntax-directed

Introduction to Compiler Construction

CS 406/534 Compiler Construction Parsing Part I

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CPS 506 Comparative Programming Languages. Syntax Specification

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Principles of Programming Languages COMP251: Syntax and Grammars

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

CS426 Compiler Construction Fall 2006

Introduction to Compiler Construction

Chapter 3. Describing Syntax and Semantics ISBN

LANGUAGE PROCESSORS. Introduction to Language processor:

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

CSE 3302 Programming Languages Lecture 2: Syntax

Lecture 14 Sections Mon, Mar 2, 2009

Introduction to Compiler Construction

Abstract Syntax Trees & Top-Down Parsing

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Syntax. A. Bellaachia Page: 1

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CMSC 330: Organization of Programming Languages

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Monday, September 13, Parsers

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

More Assigned Reading and Exercises on Syntax (for Exam 2)

Summary: Semantic Analysis

Abstract Syntax Trees Synthetic and Inherited Attributes

LECTURE NOTES ON COMPILER DESIGN P a g e 2

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Compiler Design Overview. Compiler Design 1

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CSCI312 Principles of Programming Languages

Defining syntax using CFGs

ICOM 4036 Spring 2004

Chapter 4 :: Semantic Analysis

COMP3131/9102: Programming Languages and Compilers

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Introduction to Parsing

Syntax Analysis Check syntax and construct abstract syntax tree

VIVA QUESTIONS WITH ANSWERS

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Introduction to Compiler Design

EECS 6083 Intro to Parsing Context Free Grammars

COMP455: COMPILER AND LANGUAGE DESIGN. Dr. Alaa Aljanaby University of Nizwa Spring 2013

LECTURE 3. Compiler Phases

Compiler Code Generation COMP360

CMSC 330: Organization of Programming Languages

Types of parsing. CMSC 430 Lecture 4, Page 1

Parsing II Top-down parsing. Comp 412

Undergraduate Compilers in a Day

Introduction to Compilers

CSCI312 Principles of Programming Languages!

Philadelphia University Faculty of Information Technology Department of Computer Science --- Semester, 2007/2008. Course Syllabus

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Wednesday, August 31, Parsers

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

CMSC 330: Organization of Programming Languages. Context Free Grammars

CS Compiler Construction West Virginia fall semester 2014 August 18, 2014 syllabus 1.0

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Transcription:

Compilerconstructie najaar 2013 http://www.liacs.nl/home/rvvliet/coco/ Rudy van Vliet kamer 124 Snellius, tel. 071-527 5777 rvvliet(at)liacs(dot)nl college 1, dinsdag 3 september 2013 Overview 1 Why this course It s part of the general background of a software engineer How do compilers work? How do computers work? What machine code is generated for certain language constructs? Working on a non-trivial programming project After the course Know how to build a compiler for a simplified progr. language Know how to use compiler construction tools, such as generators for scanners and parsers Be familiar with compiler analysis and optimization techniques 2 In class, we discuss the theory using the dragon book by Aho et al. The theory is applied in the practicum to build a compiler that converts Pascal code to MIPS instructions. A.V. Aho, M.S. Lam, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, & Tools, Addison-Wesley, 2007, ISBN: 978-0-321-49169-5 (international edition). 3 In class, we discuss the theory using the dragon book by Aho et al. The theory is applied in the practicum to build a compiler that converts Pascal code to MIPS instructions. A.V. Aho, M.S. Lam, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, & Tools, Addison-Wesley, 2006, ISBN: 978-0-321-54798-9. 4 Earlier edition Dragon book has been revised in 2006 In Second edition good improvements are made Parallelism... Array data-dependence analysis First edition may also be used A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986, ISBN-10: 0-201-10088-6 / 0-201-10194-7 (international edition). Contact Room 124, tel. 071-5275777, rvvliet(at)liacs(dot)nl Course website: http://www.liacs.nl/home/rvvliet/coco/ Lecture slides, assignments, grades Practicum 4 self-contained assignments These assignments are done by groups of two persons Assignments are submitted by e-mail Assistants: Mathijs van de Nes (and Teddy Zhai) Written exam 17 December 2013, 10:00 13:00 11 March 2014, 14:00 17:00 5 6 Grading: Average (50-50) of the grades from the written exam and the practicum You need to pass all 4 assignments to obtain a grade Final grade is only accepted if all grades are 5.5 Then, you obtain 6 EC Studying only from the lecture slides may not be sufficient. Relevant book chapters will be given. 7 (tentative) 1. Overview 2. Lexical Analysis 3. Syntax Analysis Part 1 4. Syntax Analysis Part 2 (+ exercise class) 5. Assignment 1 6. Static Type Checking 7. Assignment 2 8. Intermediate Code Generation (+ exercise class) 9. Assignment 3 10. Code Generation 11. Code optimization (+ exercise class) 12. Assignment 4 13. spare date 14. Assignment 4 (extra session) 8

Practicum Assignment 1: Calculator Assignment 2: Parsing & Syntax tree Assignment 3: Intermediate code Assignment 4: Assembly generation 2 academic hours of Lab session + 3 weeks to complete (except assignment 1) 9 Short History of Compiler Construction Formerly a mystery, today one of the best known areas of computing 1957 Fortran first compilers (arithmetic expressions, statements, procedures) 1960 Algol first formal language definition (grammars in Backus-Naur form, block structure, recursion,...) 1970 Pascal user-defined types, virtual machines (P-code) 1985 C++ object-orientation, exceptions, templates 1995 Java just-in-time compilation We only consider imperative languages Functional languages (e.g., Lisp) and logical languages (e.g., Prolog) require different techniques. 10 Compilers and Interpreters Compilation: Translation of a program written in a source language into a semantically equivalent program written in a target language Source Compiler Error messages Input Target Output Interpretation: Performing the operations implied by the source program. Source Input Interpreter Output Compilers and Interpreters Compiler: Translates source code into machine code, with scanner, parser,..., code generator Interpreter: Executes source code directly, with scanner, parser Statements in, e.g., a loop are scanned and parsed again and again Error messages 11 12 Compilers and Interpreters Hybrid compiler (Java): Analysis-Synthesis Model of Compilation There are two parts to compilation: Translation of a program written in a source language into a semantically equivalent program written in an intermediate language (bytecode) Interpretation of intermediate program by virtual machine, which simulates physical machine Analysis Determines the operations implied by the source program which are recorded in an intermediate representation, e.g., a tree structure Source Translator Error messages Intermed. Input Virtual Machine Output Synthesis Takes the intermediate representation and translates the operations therein into the target program 13 14 Compilation flow source program Other tools that use A-S Model Editors (syntax highlighting, text auto completion) Preprocessor modified source program Compiler Text formatters (LAT E X, MS Word) target assembly program Assembler relocatable machine code Linker/Loader library files relocatable object files 15 target machine code 16

Symbol Table source program / character stream Lexical Analyser (scanner) Syntax Analyser (parser) Semantic Analyser Intermediate Code Generator Code optimizer Code Generator target machine code 17 Character stream: position initial + rate * 60 Lexical Analyser (scanner) Token stream: id,1 id,2 + id,3 60 18 Token stream: Syntax Analyser (parser) Parse tree / syntax tree: id stmt expr expr + term id,1 id,2 + id,3 60 term term factor factor factor number id id id, 3 60 19 Syntax tree: Semantic Analyser Syntax tree: id, 3 60 id, 3 inttofloat 60 20 Syntax tree: id, 3 inttofloat 60 Intermediate Code Generator t1 inttofloat(60) t2 id3 * t1 t3 id2 + t2 id1 t3 Code Optimizer t1 inttofloat(60) t2 id3 * t1 t3 id2 + t2 id1 t3 21 t1 id3 * 60.0 id1 id2 + t1 22 The Grouping of Phases t1 id3 * 60.0 id1 id2 + t1 Code Generator Target code (assembly code): Front End: scanning, parsing, semantic analysis, intermediate code generation (source code intermediate representation) Back End: code optimizing, code generation (intermediate representation target machine code) LDF R2, id3 MULF R2, R2, #60.0 LDF R1, id2 ADDF R1, R1, R2 STF id1, R1 language-dependent Java C Pascal machine-dependent Pentium PowerPC SPARC 23 24

Passes: Single-Pass Compilers Phases work in an interleaved way do scan token parse token check token generate code for token while (not eof) Portion of code is generated while reading portion of source program Compiler-Construction Tools 25 Passes: Multi-Pass Compilers Phases are separate programs, which run sequentially characters Scanner tokens Parser tree Semantic analyser... code Each phase reads from a file and writes to a new file. Time vs memory Why multi-pass? If the language is complex If portability is important Today: often two-pass compiler The Structure of our compiler 26 Software development tools are available to implement one or more compiler phases Scanner generators Parser generators Syntax-directed translation engines Character stream Lexical Analyser Token Stream Syntax-Directed Translation Develop a parser and code generator Syntax Definition (Grammar) MIPS Specification MIPS Assembly Code Automatic code generators Data-flow engines 27 Syntax directed translation: The compiler uses the syntactic structure of the language to generate output 28 What is a grammar? Context-free grammar is a 4-tuple with A set of nonterminals (syntactic variables) A set of tokens (terminal symbols) A designated start symbol (nonterminal) A set of productions: rules how to decompose nonterminals Example: Context-free grammar for simple expressions: G ({list,digit, {+,,0,1,2,3,4,5,6,7,8,9,list, P) with productions P: list list +digit list list digit list digit digit 0 1 2 3 4 5 6 7 8 9 29 Derivation Given a context-free grammar, we can determine the set of all s (sequences of tokens) generated by the grammar using derivations: We begin with the start symbol In each step, we replace one nonterminal in the current form with one of the right-hand sides of a production for that nonterminal 30 Derivation (Example) G ({list,digit, {+,,0,1,2,3,4,5,6,7,8,9,list, P) list list +digit list digit digit digit 0 1 2 3 4 5 6 7 8 9 Example: 9-5+2 list list +digit list digit +digit digit digit +digit 9 digit +digit 9 5+digit 9 5+2 This is an example of leftmost derivation, because we replaced the leftmost nonterminal (underlined) in each step 31 Parse Tree (derivation tree in FI2) The root of the tree is labelled by the start symbol Each leaf of the tree is labelled by a terminal (token) or ǫ (empty) Each interior node is labelled by a nonterminal If node A has children X 1,X 2,...,X n, then there must be a production A X 1 X 2...X n 32

Parse Tree (Example) Parse tree of the 9 5+2 using grammar G list digit list digit list digit 9 5 + 2 Yield of the parse tree: the sequence of leafs (left to right) Parsing: the process of finding a parse tree for a given Language: the set of s that can be generated by some parse tree 33 Ambiguity Consider the following context-free grammar: G ({, {+,,0,1,2,3,4,5,6,7,8,9,, P) with productions P + 0 1... 9 This grammar is ambiguous, because more than one parse tree generates the 9 5+2 34 Ambiguity (Example) Parse trees of the 9 5+2 using grammar G Associativity of Operators By convention 9+5+2 (9+5)+2 9 5 2 (9 5) 2 In most programming languages: +,,,/ are left associative left associative 9 5 + 2 (9 5)+2 6 9 5 + 2 9 (5+2) 2, are right associative: abc a(bc) a b c a (b c) 35 36 Precedence of Operators Consider: 9+52 Is this (9+5)2 or 9+(52)? Associativity does not resolve this Precedence of operators: + increasing / precedence A grammar for arithmetic expressions:... Example: 9+523+1+47 37 Precedence of Operators Consider: 9+52 Is this (9+5)2 or 9+(52)? Associativity does not resolve this Precedence of operators: A grammar for arithmetic expressions: + increasing / precedence expr expr +term expr term term term term factor term/factor factor factor digit (expr) digit 0 1... 9 Parse tree for 9+52... 38 Syntax-Directed Translation Syntax-Directed Translation Using the syntactic structure of the language to generate output corresponding to some input Two techniques: Syntax-directed definition Translation scheme Using the syntactic structure of the language to generate output corresponding to some input Two variants: Syntax-directed definition Translation scheme Example: translation of infix notation to postfix notation What is 952+ 3? infix postfix (9 5)+2 95 2+ 9 (5+2) 952+ 39 Example: translation of infix notation to postfix notation Simple infix expressions generated by expr expr 1 +term expr 1 term term term 0 1... 9 40

Syntax-Directed Definition Uses a context-free grammar to specify the syntactic structure of the language Associates a set of attributes with (non)terminals Associates with each production a set of semantic rules for computing values for the attributes The attributes contain the translated form of the input after the computations are completed (in example: postfix notation corresponding to subtree) 41 Syntax-Directed Definition (Example) Production Semantic rule expr expr 1 +term expr.t expr 1.t term.t + expr expr 1 term expr.t expr 1.t term.t expr term expr.t term.t term 0 term.t 0 term 1 term.t 1...... term 9 term.t 9 Result: annotated parse tree Example: 9 5+2 42 Synthesized and Inherited Attributes An attribute is said to be... synthesized if its value at a parse tree node N is determined from attribute values at the children of N (and at N itself) inherited if its value at a parse tree node N is determined from attribute values at the parent of N (and at N itself and its siblings) We (mainly) consider synthesized attributes 43 Depth-First Traversal A syntax-directed definition does not impose an evaluation order of the attributes on a parse tree Different orders might be suitable Tree traversal: a specific order to visit the nodes of a tree (always starting from the root node) Depth-first traversal Start from root Recursively visit children (in any order) Hence, visit nodes far away from the root as quickly as it can (DF) 44 A Possible DF Traversal Postorder traversal procedure visit (node N) { for (each child C of N, from left to right) { visit (C); evaluate semantic rules at node N; Can be used to determine synthesized attributes / annotated parse tree 45 Translation Scheme A translation scheme is a context-free grammar with semantic actions embedded in the bodies of the productions Example With semantic action: rest +term rest 1 rest +term {print( + ) rest 1 46 Translation Scheme Translation Scheme (Example) A translation scheme is a context-free grammar with semantic actions embedded in the bodies of the productions Example rest +term {print( + ) rest 1 Corresponding effect on parse tree: rest.... + term {print( + ) rest1 47 expr expr 1 +term {print( + ) expr expr 1 term {print( ) expr term term 0 {print( 0 ) term 1 {print( 1 )...... term 9 {print( 9 ) Example: parse tree for 9 5+2 Implementation requires postorder traversal (LRW) 48

Parsing Process of determining if a of tokens can be generated by a grammar For any context-free grammar, there is a parser that takes at most O(n 3 ) time to parse a of n tokens Linear algorithms sufficient for parsing programming languages Two methods of parsing: Top-down constructs parse tree from root to leaves Bottom-up constructs parse tree from leaves to root Cf. top-down PDA and bottom-up PDA in FI2 49 Parsing (Top-Down Example) stmt expr ; optexpr ǫ if (expr )stmt for (optexpr ; optexpr ; optexpr )stmt other expr How to determine parse tree for for (;expr ;expr )other Use lookahead: current terminal in input 50 Predictive Parsing Recursive-descent parsing is a top-down parsing method: Executes a set of recursive procedures to process the input Every nonterminal has one (recursive) procedure parsing the nonterminal s syntactic category of input tokens Predictive parsing is a special form of recursive-descent parsing: Predictive Parsing (Example) void stmt() { switch (lookahead) { case expr: match(expr); match( ; ); break; case if: match(if); match( ( ); match(expr); match( ) ); stmt(); break; case for: match(for); match( ( ); optexpr(); match( ; ); optexpr(); match( ; ); optexpr(); match( ) ); stmt(); break; case other; match(other); break; default: report("syntax error"); The lookahead symbol unambiguously determines the production for each nonterminal 51 void match(terminal t) { if (lookaheadt) lookahead nextterminal; else report("syntax error"); 52 Using FIRST Let α be of grammar symbols FIRST(α) is the set of terminals that appear as first symbols of s generated from α Simple example: stmt expr ; if (expr )stmt for (optexpr ; optexpr ; optexpr )stmt other Using FIRST Let α be of grammar symbols FIRST(α) is the set of terminals that appear as first symbols of s generated from α When a nontermimal has multiple productions, e.g., A α β then FIRST(α) and FIRST(β) must be disjoint in order for predictive parsing to work Right-hand side may start with nonterminal... 53 54 Compilerconstructie college 1 Overview Chapters for reading: 1.1, 1.2, 2.1-2.5 55