Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

Similar documents
Languages and Compilers (SProg og Oversættere)

Parsing. Parsing. Bottom Up Parsing. Bottom Up Parsing. Bottom Up Parsing. Bottom Up Parsing

Languages and Compilers (SProg og Oversættere) Lecture 3 recap Bent Thomsen Department of Computer Science Aalborg University

The Phases of a Compiler. Course Overview. In Chapter 4. Syntax Analysis. Syntax Analysis. Multi Pass Compiler. PART I: overview material

Course Overview. Levels of Programming Languages. Compilers and other translators. Tombstone Diagrams. Syntax Specification

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

A simple syntax-directed

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CSE 3302 Programming Languages Lecture 2: Syntax

LL parsing Nullable, FIRST, and FOLLOW

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Lexical and Syntax Analysis. Top-Down Parsing

CPS 506 Comparative Programming Languages. Syntax Specification

Chapter 3. Parsing #1

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CS 536 Midterm Exam Spring 2013

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Syntax. In Text: Chapter 3

Syntax and Grammars 1 / 21

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

4. Lexical and Syntax Analysis

Lexical and Syntax Analysis

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Top down vs. bottom up parsing

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

4. Lexical and Syntax Analysis

Topic 3: Syntax Analysis I

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Wednesday, August 31, Parsers

Chapter 4. Lexical and Syntax Analysis

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Chapter 3. Describing Syntax and Semantics ISBN

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Lexical and Syntax Analysis

ICOM 4036 Spring 2004

Non-deterministic Finite Automata (NFA)

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Syntax-Directed Translation. Lecture 14

CMSC 330: Organization of Programming Languages. Context Free Grammars

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Introduction to Lexical Analysis

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Week 2: Syntax Specification, Grammars

A programming language requires two major definitions A simple one pass compiler

CS 230 Programming Languages

3. Parsing. Oscar Nierstrasz

Part 3. Syntax analysis. Syntax analysis 96

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

COMP3131/9102: Programming Languages and Compilers

CS Parsing 1

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

Context-Free Grammars

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Grammars and Parsing. Paul Klint. Grammars and Parsing

Building Compilers with Phoenix

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Context-free grammars (CFG s)

Last time. What are compilers? Phases of a compiler. Scanner. Parser. Semantic Routines. Optimizer. Code Generation. Sunday, August 29, 2010

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Programming Language Syntax and Analysis

A Simple Syntax-Directed Translator

Lexical Analysis. Introduction

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Syntax Intro and Overview. Syntax

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Outline. Top Down Parsing. SLL(1) Parsing. Where We Are 1/24/2013

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Describing Syntax and Semantics

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

CMSC 330: Organization of Programming Languages

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Syntax Analysis. Chapter 4

LECTURE 7. Lex and Intro to Parsing

COP4020 Programming Assignment 2 - Fall 2016

Abstract Syntax Trees & Top-Down Parsing

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

shift-reduce parsing

Downloaded from Page 1. LR Parsing

ECE251 Midterm practice questions, Fall 2010

Parsing II Top-down parsing. Comp 412

Transcription:

Course Overview Introduction (Chapter 1) Compiler Frontend: Today Lexical Analysis & Parsing (Chapter 2,3,4) Semantic Analysis (Chapter 5) Activation Records (Chapter 6) Translation to Intermediate Code (Chapter 7) Basic Blocks and Traces (Chapter 8) Compiler Backend: Instruction Selection (Chapter 9) Liveness Analysis (Chapter 10) Register Allocation (Chapter 11) Code Emission (Chapter 12) 1

Parsing Overview Syntax Analysis Scanning: recognize words or tokens in the input (ch 2) Parsing: recognize phrase structure (Chapter 3) Different parsing strategies: Top Down vs Bottom Up. Top Down predictive parsers (LL parser generators) Manual Construction Using JavaCC Bottum Up Parsers (LR parser generators) AST Construction (Chapter 4) 2

RECAP: Syntax Analysis The job of syntax analysis is to read the source text and determine its phrase structure. Subphases Scanning Parsing Construct an internal representation of the source text that reifies the phrase structure (usually an AST) Note: A single-pass compiler usually does not construct an AST. 3

1) Scan: Divide Input into Tokens An example mini Triangle source program: let var y: Integer in!new year y := y+1 scanner Tokens are words in the input, for example keywords, operators, identifiers, literals, etc. let let var var ident. y colon : ident. Integer in in...... ident. y becomes := ident. y op. + intlit 1 eot 4

2) Parse: Determine phrase structure Parser analyzes the phrase structure of the token stream with respect to the grammar of the language. Program single-command Declaration single-declaration single-command Expression primary-exp Type Denoter V-Name V-Name primary-exp Ident Ident Ident Ident Op. Int.Lit let var id. col. id. in id. bec. id. op intlit eot let var y : Int in y := y + 1 5

3) AST Construction Program LetCommand AssignCommand VarDecl BinaryExpr SimpleT SimpleV. VNameExp Int.Expr SimpleV Ident Ident Ident Ident Op Int.Lit y Integer y y + 1 6

Grammars (ctd.) For our convenience, we will use EBNF or Extended BNF rather than simple BNF. EBNF = BNF + regular expressions Example: Mini Triangle in EBNF * means 0 or more occurrences of Program ::= single Command Command ::= ( Command ; )* single Command single Command ::= V name := Expression begin Command end... 7

Regular Expressions RE are a notation for expressing a set of strings of terminal symbols. Different kinds of RE: ε The empty string t Generates only the string t X Y Generates any string xy such that x is generated by x and y is generated by Y X Y Generates any string which generated either by X or by Y X* The concatenation of zero or more strings generated by X (X) For grouping, 8

Regular Expressions The languages that can be defined by RE and CFG have been extensively studied by theoretical computer scientists. These are some important conclusions / terminology RE is a weaker formalism than CFG: Any language expressible by a RE can be expressed by CFG but not the other way around! The languages expressible as RE are called regular languages Generally: a language that exhibits self embedding cannot be expressed by RE. Programming languages exhibit self embedding. (Example: an expression can contain an (other) expression). 9

Extended BNF Extended BNF combines BNF with RE A production in EBNF looks like LHS ::= RHS where LHS is a non terminal symbol and RHS is an extended regular expression An extended RE is just like a regular expression except it is composed of terminals and non terminals of the grammar. Simply put... EBNF adds to BNF the notation of (...) for the purpose of grouping and * for denoting 0 or more repetitions of 10

Extended BNF: an Example Example: a simple expression language Expression ::= PrimaryExp (Operator PrimaryExp)* PrimaryExpression ::= Literal Identifier ( Expression ) Identifier ::= Letter (Letter Digit)* Literal ::= Digit Digit* Letter ::= a b c... z Digit ::= 0 1 2 3 4... 9 11

Parsing We will now look at parsing. Topics: Different types of parsing strategies bottom up top down Recursive descent parsing What is it How to implement one given an EBNF specification Bottom up parsing algorithms 12

Parsing: Some Terminology Recognition To answer the question does the input conform to the syntax of the language Parsing Recognition + determine phrase structure (for example by generating AST data structures) (Un)ambiguous grammar: A grammar is unambiguous if there is only at most one way to parse any input. (i.e. for syntactically correct program there is precisely one parse tree) 13

Different kinds of Parsing Algorithms Two big groups of algorithms can be distinguished: bottom up strategies top down strategies Example parsing of Micro-English Sentence ::= Subject Verb Object. Subject ::= I a Noun the Noun Object ::= me a Noun the Noun Noun ::= cat mat rat Verb ::= like is see sees The cat sees the rat. The rat sees me. I like a cat The rat like me. I see the rat. I sees a rat. 14

Bottom up parsing (LR) The parse tree grows from the bottom (leafs) up to the top (root). Sentence Subject Object Noun Verb Noun The cat sees a rat. 15

Top-down parsing (LL) The parse tree is constructed starting at the top (root). Sentence Subject Verb Object. Noun Noun The cat sees a rat. 16

Recursive Descent Parsing Recursive descent parsing is a straightforward topdown parsing algorithm. We will now look at how to develop a recursive descent parser from an EBNF specification. Idea: the parse tree structure corresponds to the call graph structure of parsing procedures that call each other. In the book called predictive parsing. Theorethical name for this family of parsers and languages: LL k (manually developed usually with k = 1, computer generated can use any k) 17

Recursive Descent Parsing Sentence ::= Subject Verb Object. Subject ::= I a Noun the Noun Object ::= me a Noun the Noun Noun ::= cat mat rat Verb ::= like is see sees Define a procedure parsen for each non terminal N private void parsesentence() ; private void parsesubject(); private void parseobject(); private void parsenoun(); private void parseverb(); 18

Recursive Descent Parsing public class MicroEnglishParser { private TerminalSymbol currentterminal; //Auxiliary methods will go here... } //Parsing methods will go here... 19

Recursive Descent Parsing: Auxiliary Methods public class MicroEnglishParser { private TerminalSymbol currentterminal private void accept(terminalsymbol expected) { if (currentterminal matches expected) currentterminal = next input terminal ; else report a syntax error } }... 20

Recursive Descent Parsing: Parsing Methods Sentence ::= Subject Verb Object. private void parsesentence() { parsesubject(); parseverb(); parseobject(); accept(. ); } 21

Recursive Descent Parsing: Parsing Methods Subject ::= I a Noun the Noun private void parsesubject() { if (currentterminal matches I ) accept( I ); else if (currentterminal matches a ) { accept( a ); parsenoun(); } else if (currentterminal matches the ) { accept( the ); parsenoun(); } else report a syntax error } 22

Recursive Descent Parsing: Parsing Methods Noun ::= cat mat rat private void parsesubject() { if (currentterminal matches cat ) accept( cat ); else if (currentterminal matches mat ) accept( mat ); else if (currentterminal matches rat ) accept( rat ); else report a syntax error } 23

Recursive Descent Parsing: Parsing Methods Object Verb ::= me a Noun the Noun ::= like is see sees private void parseobject() {? } private void parseverb() {? } 24

Systematic Development of RD Parser (1) Express grammar in EBNF (2) Grammar Transformations: Left factorization and Left recursion elimination (3) Create a parser class with private variable currenttoken methods to call the scanner: accept and acceptit (4) Implement parsing methods: add parsen method for each non terminal N public parse method that gets the first token form the scanner calls parses (S is the start symbol of the grammar) 25

An Example Let's develop a RD parser for this language. I'll sketch out the algorithm and then we'll use JavaCC to do the heavy lifting. Exp ::= Exp ( * ) Exp Num Ide Ide [ Exp ] ( Exp ) Ide ::= Letter (Letter Digit)* Num ::= Digit+ 26

Algorithm for RD parsing Basic idea: Compute FIRST and FOLLOW sets for nonterminals and EBNF phrases. Use first and follow sets to make decisions on what parser should do next based on a lookahead token. LL(1) means one lookahead token. JavaCC can do LL(k) for any k. Beware: higher k can be quite expensive (large parsing tables!) 27

FIRST, FOLLOW and NULLABLE See book for detailed algorithms to compute these. FIRST (X) : Set of tokens that can be the first token of an X. FOLLOW(X) : Set of tokens that can be seen in a valid program, directly after parsing the non-terminal X. NULLABLE(X) : True if the non-terminal X can derive the empty string. 28

Algorithm to convert EBNF into a RD parser parse t where t is a terminal accept(t); parse N where N is a non terminal parsen(); parse ε // a dummy statement parse XY parse X parse Y 29

Algorithm to convert EBNF into a RD parser parse X* while (currenttoken.kind is in FIRST[X]) { parse X } // continue if currenttoken.kind is in FOLLOW[X] parse X Y switch (currenttoken.kind) { cases in FIRST[X]: parse X break; cases in FIRST[Y]: parse Y break; default: report syntax error } 30

LL 1 Grammars The presented algorithm to convert EBNF into a parser does not work for all possible grammars. It only works for so called LL 1 grammars. Basically, an LL1 grammar is a grammar which can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token. What grammars are LL1? How can we recognize that a grammar is (or is not) LL1? => We can deduce the necessary conditions from the parser generation algorithm. 31

LL 1 Grammars parse X* while (currenttoken.kind is in FIRST[X]) { parse X } // continue if currenttoken.kind is in FOLLOW[X] parse X Y switch (currenttoken.kind) { cases in starters[x]: } Condition: FIRST[X] must be disjoint from FOLLOW[X] parse X break; cases in starters[y]: parse Y break; default: report syntax error Condition: FIRST[X] and FIRST[Y] must be disjoint 32

Is our example grammar LL1? An Example Exp ::= Exp ( * ) Exp Num Ide Ide [ Exp ] ( Exp ) Ide ::= <IDE> Num ::= <NUM> One way to find out is feed it to an LL1 parser generator such as JavaCC. We'll find it is not, and we have to fix some problems... Let's do it! 33