EDA180: Compiler Construction. Top-down parsing. Görel Hedin Revised: a

EDA180: Compiler Construction. Top-down parsing. Görel Hedin. Revised: 2013-01-30a

Compiler phases and program representations

Analysis:
  source code -> Lexical analysis (scanning) -> tokens
  tokens -> Syntactic analysis (parsing) -> AST
  AST -> Semantic analysis -> attributed AST
Synthesis:
  attributed AST -> Intermediate code generation -> intermediate code
  intermediate code -> Optimization -> intermediate code
  intermediate code -> Machine code generation -> machine code

A closer look at the parser

  text -> Scanner -> tokens -> Parser -> AST

Scanner: defined by regular expressions.
Parser, in two parts:
  Pure parsing (this lecture): defined by a context-free grammar; the concrete parse tree is implicit.
  AST building: defined by an abstract grammar; produces the AST.

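Different parsing algorithms

Grammar classes (figure): all context-free grammars are either ambiguous or unambiguous. The LR grammars lie within the unambiguous grammars, and the LL grammars (this lecture) lie within the LR grammars.

LL:
  Left-to-right scan
  Leftmost derivation
  Builds tree top-down
  Simple to understand
LR:
  Left-to-right scan
  Rightmost derivation
  Builds tree bottom-up
  More powerful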

LL and LR parsers, main idea

Example input: if ID then ID = ID ; ID ...
(Figure: a syntax tree with CompoundStmt, IfStmt, Assign, and Id nodes over this token sequence, shown once for LL and once for LR.)

LL(1): decides to build Assign after seeing the first token of its subtree. The tree is built top-down.
LR(1): decides to build Assign after seeing the first token following its subtree. The tree is built bottom-up.

The token is called the lookahead. LL(k) and LR(k) use k lookahead tokens.

Recursive-descent parsing

A way of programming an LL(1) parser by recursive method calls.
Assume an EBNF grammar with exactly one production rule for each nonterminal symbol.
For each nonterminal, a method is constructed.
A nonterminal method matches tokens and calls other nonterminal methods, according to the grammar.
If the lookahead token does not match, an error is reported.

  A -> B C D
  B -> a C b D
  C -> ...
  D -> ...

Example Java implementation: overview

  statement    -> assignment | compoundstmt
  assignment   -> ID ASSIGN expr SEMICOLON
  compoundstmt -> LBRACE statement* RBRACE
  expr         -> ...

  class Parser {
      private int token;             // current lookahead token
      void accept(int t) {...}       // accept t and read in next token
      void error(String str) {...}   // generate error message
      void statement() {...}
      void assignment() {...}
      void compoundstmt() {...}
      ...
  }

Example: recursive descent methods

  statement    -> assignment | compoundstmt
  assignment   -> ID ASSIGN expr SEMICOLON
  compoundstmt -> LBRACE statement* RBRACE

  class Parser {
      void statement() {
          switch (token) {
              case ID: assignment(); break;
              case LBRACE: compoundstmt(); break;
              default: error("expecting statement, found: " + token);
          }
      }
      void assignment() {
          accept(ID); accept(ASSIGN); expr(); accept(SEMICOLON);
      }
      void compoundstmt() {
          accept(LBRACE);
          while (token != RBRACE) { statement(); }
          accept(RBRACE);
      }
      ...
  }

Example: Parser skeleton details

  statement    -> assignment | compoundstmt
  assignment   -> ID ASSIGN expr SEMICOLON
  compoundstmt -> LBRACE statement* RBRACE
  expr         -> ...

  class Parser {
      final static int ID=1, WHILE=2, DO=3, ASSIGN=4, ...;
      private int token;             // current lookahead token
      void accept(int t) {           // accept t and read in next token
          if (token == t) {
              token = nexttoken();
          } else {
              error("expected " + t + ", but found " + token);
          }
      }
      void error(String str) {...}       // generate error message
      private int nexttoken() {...}      // read next token from scanner
      void statement() ...
      ...
  }
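
The skeleton above can be filled in to give something runnable. The following is only a sketch under assumptions that are not in the slides: the scanner is replaced by an int array, expr is reduced to a single ID (the slides leave expr elided), error() throws an exception, and the class name SketchParser and the helper parses() are hypothetical.

```java
// Minimal sketch of the recursive-descent skeleton.
// Assumptions (not from the slides): tokens come from an int array, EOF is 0,
// expr is just an ID, and error() throws instead of printing a message.
class SketchParser {
    static final int EOF = 0, ID = 1, ASSIGN = 2, SEMICOLON = 3, LBRACE = 4, RBRACE = 5;

    private final int[] input; // hypothetical stand-in for the scanner
    private int pos = 0;
    private int token;         // current lookahead token

    SketchParser(int[] input) {
        this.input = input;
        this.token = nextToken();
    }

    // Read the next token, or EOF when the input is exhausted.
    private int nextToken() {
        return pos < input.length ? input[pos++] : EOF;
    }

    // Accept t and read in the next token.
    void accept(int t) {
        if (token == t) token = nextToken();
        else error("expected " + t + ", but found " + token);
    }

    void error(String msg) { throw new IllegalStateException(msg); }

    void statement() {
        switch (token) {
            case ID: assignment(); break;
            case LBRACE: compoundStmt(); break;
            default: error("expecting statement, found: " + token);
        }
    }

    void assignment() { accept(ID); accept(ASSIGN); expr(); accept(SEMICOLON); }

    void compoundStmt() {
        accept(LBRACE);
        while (token != RBRACE) statement(); // statement() throws on EOF, so this ends
        accept(RBRACE);
    }

    void expr() { accept(ID); } // assumption: expr is a single ID

    // Returns true if the tokens form exactly one statement.
    static boolean parses(int... tokens) {
        try {
            SketchParser p = new SketchParser(tokens);
            p.statement();
            p.accept(EOF);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For instance, SketchParser.parses(4, 1, 2, 1, 3, 5) corresponds to the token sequence LBRACE ID ASSIGN ID SEMICOLON RBRACE.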

Are these grammars LL(1)?

Common prefix:
  expr -> name params | name

Left recursion:
  expr -> expr "+" term | term
  term -> ID

What would happen in a recursive-descent parser? Could they be LL(2)? LL(k)?

Dealing with common prefix of limited length: Local lookahead

LL(2) grammar:
  statement    -> assignment | compoundstmt | callstmt
  assignment   -> ID ASSIGN expr SEMICOLON
  compoundstmt -> LBRACE statement* RBRACE
  callstmt     -> ID LPAR expr RPAR SEMICOLON

  void statement() {
      switch (token) {
          case ID:
              if (lookahead(2) == ASSIGN) {
                  assignment();
              } else {
                  callstmt();
              }
              break;
          case LBRACE: compoundstmt(); break;
          default: error("expecting statement, found: " + token);
      }
  }
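
The slide uses lookahead(2) without showing how it might be implemented. One possible sketch, assuming the token stream is available as an int array and that lookahead(1) denotes the current token (the class LookaheadBuffer and its method chooseStatement are hypothetical names):

```java
// Sketch of a token buffer supporting lookahead(k).
// Assumptions (not from the slides): the scanner is modelled as an int array,
// lookahead(1) is the current token, and EOF is returned past the end.
class LookaheadBuffer {
    static final int EOF = 0, ID = 1, ASSIGN = 2, LPAR = 3;

    private final int[] input;
    private int pos = 0;

    LookaheadBuffer(int... input) { this.input = input; }

    // Return the k-th upcoming token without consuming anything.
    int lookahead(int k) {
        int i = pos + k - 1;
        return i < input.length ? input[i] : EOF;
    }

    // Consume the current token.
    void consume() { if (pos < input.length) pos++; }

    // Mirrors the ID case of statement(): pick between the two rules that start with ID.
    String chooseStatement() {
        if (lookahead(1) != ID) return "error";
        return lookahead(2) == ASSIGN ? "assignment" : "callstmt";
    }
}
```

As in the slide's code, the second token is only inspected when the first one is ID, so the other alternatives still need just one token of lookahead.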

Common prefix?

If two productions can derive a sentence starting in the same way, they share a common prefix.
Which nonterminals have common-prefix productions? How long is the common prefix? Is the grammar LL(1), LL(2), ...?

Grammar 1:
  A -> a B
  A -> a C
  B -> b
  C -> c
A has two rules that can derive the prefix a. The grammar is LL(2).

Grammar 2:
  A -> B a
  A -> B b
  B -> c d
A has two rules that can derive the prefix c d. The grammar is LL(3).

Grammar 3:
  A -> a B
  B -> a C
  B -> b C
  C -> c
No problem. The two rules that start the same way belong to different nonterminals. The grammar is LL(1).

Common prefix?

Which nonterminals have common-prefix productions? How long is the common prefix? Is the grammar LL(1), LL(2), ...?

Grammar 1:
  A -> B
  A -> C
  A -> D
  B -> b a
  C -> b d
  D -> e
A has two rules that can derive the prefix b. The grammar is LL(2).

Grammar 2:
  A -> B a
  A -> B b
  B -> B c
  B -> d
A has two rules that can derive the prefix d c*. So, the prefix can become arbitrarily long. The grammar is not LL(k), no matter what k we use. We need to rewrite the grammar, or use another parsing method.

Eliminating the common prefix

Rewrite to an equivalent grammar without the common prefix.

With common prefix, not LL(1):
  Exp -> Name Params
  Exp -> Name

Without common prefix, LL(1):
  Exp       -> Name OptParams
  OptParams -> Params
  OptParams -> ε

Eliminating a common prefix this way is called "left factoring".

Eliminating the common prefix

Rewrite to an equivalent grammar without the common prefix.

Indirect common prefix:
  A -> B
  A -> C
  B -> b a
  B -> e D
  B -> f
  C -> b d
  D -> B C

First, make the common prefix directly visible: substitute all B right-hand sides into the A -> B rule. We can't remove the B rules since B is used in other places. Similarly for the A -> C rule.

Direct common prefix:
  A -> b a
  A -> e D
  A -> f
  A -> b d
  B -> b a
  B -> e D
  B -> f
  C -> b d
  D -> B C

Then, eliminate the direct common prefix, as previously.

Dealing with left recursion in LL parsers
Method 1: Rewrite to an equivalent grammar without left recursion (a bit cumbersome)

Left-recursive grammar, not LL(k):
  E -> E "+" T
  E -> T
  T -> ID

Rewrite to right recursion. But there is now a common prefix, so the grammar is still not LL(1) (and not LL(k) for any fixed k once T can derive arbitrarily long token sequences):
  E -> T "+" E
  E -> T
  T -> ID

Eliminate the common prefix. The grammar is now LL(1):
  E  -> T E'
  E' -> "+" E
  E' -> ε
  T  -> ID

A left-recursive AST can be built during the right-recursive parse.

Dealing with left recursion in LL parsers
Method 2: Rewrite to EBNF (easy!)

Left-recursive grammar, not LL(k):
  E -> E "+" T
  E -> T
  T -> ID

Rewrite to EBNF:
  E -> T ( "+" T )*
  T -> ID

A left-recursive AST can be built during the iteration.
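
How a left-recursive AST can be built during the iteration may be easier to see in code. The sketch below assumes that tokens are pre-split strings, that every token other than "+" counts as an ID, and that the tree is rendered as a parenthesized string rather than as AST node objects; the class name SumParser is hypothetical.

```java
// Sketch: parse E -> T ( "+" T )* and build a left-leaning tree in the loop.
// Assumptions (not from the slides): tokens are strings, anything other than "+"
// is treated as an ID, and the tree is shown as a parenthesized string.
class SumParser {
    private final String[] tokens;
    private int pos = 0;

    SumParser(String... tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.length ? tokens[pos] : "<EOF>"; }

    // E -> T ( "+" T )*
    String parseE() {
        String left = parseT();
        while (peek().equals("+")) {
            pos++;                                   // consume "+"
            String right = parseT();
            left = "(" + left + " + " + right + ")"; // tree so far becomes the left child
        }
        return left;
    }

    // T -> ID
    String parseT() {
        String t = peek();
        if (t.equals("+") || t.equals("<EOF>")) {
            throw new IllegalStateException("expected ID, found " + t);
        }
        pos++;
        return t;
    }
}
```

The key step is that the tree built so far is wrapped as the left operand on each iteration, which gives the same shape the left-recursive grammar would have produced.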

JavaCC: An LL-based parser generator

  CFG (in Java-like specification language) -> JavaCC -> Parser (in Java code)

JavaCC specification

CFG:
  statement    -> assignment | compoundstmt
  assignment   -> ID ASSIGN expr SEMICOLON
  compoundstmt -> LBRACE statement* RBRACE

JavaCC:
  void statement() : {} { assignment() | compoundstmt() }
  void assignment() : {} { id() <ASSIGN> expr() <SEMICOLON> }
  void compoundstmt() : {} { <LBRACE> (statement())* <RBRACE> }
  void id() : {} { <ID> }

The empty {} after the colon is a place where Java code can be added. You can also add Java code inside the rules (for semantic actions, e.g. to build the AST).
It is a good idea to add a nonterminal id for ID tokens. This way you can avoid code duplication in the semantic actions.

Using local lookahead in JavaCC

Local lookahead can be used to discriminate between an assignment and a procedure call:
  statement  -> assignment | callstmt | whilestmt
  assignment -> ID ASSIGN expr SEMICOLON
  callstmt   -> ID LPAR expr RPAR SEMICOLON

JavaCC:
  void statement() : {} {
      LOOKAHEAD(2) assignment()
    | callstmt()
    | whilestmt()
  }
  ...

A lookahead of 2 tokens will be used before selecting assignment. If that fails, ordinary single-token lookahead will be used in the following alternatives.

Using EBNF in JavaCC

Straightforward!
  expr   -> term (PLUS term)*
  term   -> factor (TIMES factor)*
  factor -> ID | INT | LPAR expr RPAR

JavaCC:
  void expr() : {} { term() (<PLUS> term())* }
  void term() : {} { factor() (<TIMES> factor())* }
  void factor() : {} { id() | intexpr() | <LPAR> expr() <RPAR> }
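
For comparison, the same three EBNF rules can be written by hand as recursive-descent methods in plain Java. This is only a sketch, not JavaCC output: it assumes single-character tokens, treats INT as one digit, leaves out ID, and evaluates the expression instead of building an AST. The class name CalcParser is hypothetical.

```java
// Sketch: hand-written counterpart of the expr/term/factor rules.
// Assumptions (not from the slides): single-character tokens, INT is one digit,
// ID is omitted, and the methods compute a value instead of building an AST.
class CalcParser {
    private final String src;
    private int pos = 0;

    CalcParser(String src) { this.src = src.replace(" ", ""); }

    // Current character, or '$' as an end marker.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '$'; }

    // expr -> term (PLUS term)*
    int expr() {
        int v = term();
        while (peek() == '+') { pos++; v += term(); }
        return v;
    }

    // term -> factor (TIMES factor)*
    int term() {
        int v = factor();
        while (peek() == '*') { pos++; v *= factor(); }
        return v;
    }

    // factor -> INT | LPAR expr RPAR
    int factor() {
        char c = peek();
        if (Character.isDigit(c)) { pos++; return c - '0'; }
        if (c == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') throw new IllegalStateException("expected )");
            pos++;
            return v;
        }
        throw new IllegalStateException("unexpected " + c);
    }
}
```

Because term is called from expr, multiplication binds tighter than addition; the precedence falls out of the grammar structure rather than from any explicit precedence table.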

Algorithm for constructing an LL(1) parser

Fairly simple. The non-trivial part: how to select the correct production p for X, based on the lookahead token.

  p1: X -> ...
  p2: X -> ...

Suppose X derives the tokens t1 ... tn, and tn+1 is the token that follows them in the input:
  FIRST position: t1
  FOLLOW position: tn+1

Which tokens can occur in the FIRST position?
Can one of the productions derive the empty string? I.e., is it "NULLABLE"?
If so, which tokens can occur in the FOLLOW position?

Steps in constructing an LL(1) parser

1. Write the grammar in canonical form.
2. Analyze the grammar to construct a table. The table shows what production to select, given the current lookahead token.
3. Conflicts in the table? The grammar is not LL(1).
4. No conflicts? Straightforward implementation using a table-driven parser or recursive descent.

Table sketch: the columns are tokens t1, t2, t3, t4 and the rows are nonterminals. Row X1 holds the entries p1 and p2; row X2 holds p3, p3, and p4.

Example: Construct the LL(1) table for this grammar:

  p1: statement    -> assignment
  p2: statement    -> compoundstmt
  p3: assignment   -> ID "=" expr ";"
  p4: compoundstmt -> "{" statements "}"
  p5: statements   -> statement statements
  p6: statements   -> ε

Table to fill in: rows statement, assignment, compoundstmt, statements; columns ID, "=", ";", "{", "}".

For each production p: X -> γ, we are interested in:
  FIRST(γ): the tokens that occur first in a sentence derived from γ.
  NULLABLE(γ): is it possible to derive ε from γ? And if so:
  FOLLOW(X): the tokens that can occur immediately after an X-sentence.

Example: Construct the LL(1) table for this grammar:

  p1: statement    -> assignment
  p2: statement    -> compoundstmt
  p3: assignment   -> ID "=" expr ";"
  p4: compoundstmt -> "{" statements "}"
  p5: statements   -> statement statements
  p6: statements   -> ε

                  ID    "="   ";"   "{"   "}"
  statement       p1                p2
  assignment      p3
  compoundstmt                      p4
  statements      p5                p5    p6

To construct the table, look at each production p: X -> γ.
Compute the token set FIRST(γ). Add p to each corresponding entry for X.
Then, check if γ is NULLABLE. If so, compute the token set FOLLOW(X), and add p to each corresponding entry for X.
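
The table construction described above can be sketched as a fixed-point computation of NULLABLE, FIRST, and FOLLOW. The code below is one possible formulation, not the lecture's implementation. It assumes the grammar is given as string arrays, and it treats any symbol without its own productions (including expr, which the slide leaves undefined) as a token. The class and method names are hypothetical.

```java
import java.util.*;

// Sketch: compute NULLABLE, FIRST, FOLLOW by fixed-point iteration and build the LL(1) table.
// Assumption (not from the slides): symbols without productions, including expr, are tokens.
class LL1Table {
    // Element 0 of each production is the left-hand side; the rest is the right-hand side.
    static final String[][] P = {
        {"statement", "assignment"},                 // p1
        {"statement", "compoundstmt"},               // p2
        {"assignment", "ID", "=", "expr", ";"},      // p3
        {"compoundstmt", "{", "statements", "}"},    // p4
        {"statements", "statement", "statements"},   // p5
        {"statements"}                               // p6 (epsilon)
    };
    static final Set<String> nonterminals = new HashSet<>();
    static final Set<String> nullable = new HashSet<>();
    static final Map<String, Set<String>> first = new HashMap<>();
    static final Map<String, Set<String>> follow = new HashMap<>();

    // Is the symbol sequence p[from..] nullable?
    static boolean isNullable(String[] p, int from) {
        for (int i = from; i < p.length; i++) if (!nullable.contains(p[i])) return false;
        return true;
    }

    // FIRST of the symbol sequence p[from..], using the FIRST sets computed so far.
    static Set<String> firstOf(String[] p, int from) {
        Set<String> result = new TreeSet<>();
        for (int i = from; i < p.length; i++) {
            if (nonterminals.contains(p[i])) result.addAll(first.get(p[i]));
            else result.add(p[i]);
            if (!nullable.contains(p[i])) break;
        }
        return result;
    }

    static void analyze() {
        for (String[] p : P) nonterminals.add(p[0]);
        for (String n : nonterminals) {
            first.putIfAbsent(n, new TreeSet<>());
            follow.putIfAbsent(n, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) { // repeat until no set grows any more
            changed = false;
            for (String[] p : P) {
                if (isNullable(p, 1)) changed |= nullable.add(p[0]);
                changed |= first.get(p[0]).addAll(firstOf(p, 1));
                for (int i = 1; i < p.length; i++) {
                    if (!nonterminals.contains(p[i])) continue;
                    Set<String> f = follow.get(p[i]);
                    changed |= f.addAll(firstOf(p, i + 1));
                    if (isNullable(p, i + 1)) changed |= f.addAll(new ArrayList<>(follow.get(p[0])));
                }
            }
        }
    }

    // table.get(X).get(token) lists production numbers; more than one entry is a conflict.
    static Map<String, Map<String, List<Integer>>> table() {
        analyze();
        Map<String, Map<String, List<Integer>>> t = new TreeMap<>();
        for (int n = 0; n < P.length; n++) {
            String[] p = P[n];
            Set<String> select = firstOf(p, 1);
            if (isNullable(p, 1)) select.addAll(follow.get(p[0]));
            for (String tok : select)
                t.computeIfAbsent(p[0], k -> new TreeMap<>())
                 .computeIfAbsent(tok, k -> new ArrayList<>()).add(n + 1);
        }
        return t;
    }
}
```

Calling LL1Table.table() yields a nested map in which every entry holds a single production number for this grammar, matching the conflict-free table on the slide.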

Example: Dealing with End of File:

  p1: vardecl -> type ID optinit
  p2: type    -> "integer"
  p3: type    -> "boolean"
  p4: optinit -> "=" INT
  p5: optinit -> ε

Table to fill in: rows vardecl, type, optinit; columns ID, integer, boolean, "=", ";", INT.

Example: Dealing with End of File:

  p0: S       -> vardecl EOF
  p1: vardecl -> type ID optinit
  p2: type    -> "integer"
  p3: type    -> "boolean"
  p4: optinit -> "=" INT
  p5: optinit -> ε

             ID    integer   boolean   "="   ";"   INT   EOF
  S                p0        p0
  vardecl          p1        p1
  type             p2        p3
  optinit                              p4                p5

With the added rule p0, FOLLOW(optinit) contains EOF, so the nullable production p5 gets a table entry.
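
A table like this one can drive a parser directly. The following sketch hard-codes the table for the vardecl grammar and runs the usual stack-based loop. It assumes tokens are given as strings (with "ID" and "INT" standing for the token kinds) and that EOF is implied once the input runs out; the class TableParser and its helper rule are hypothetical names.

```java
import java.util.*;

// Sketch of a table-driven LL(1) parser for the vardecl grammar with the EOF rule.
// Assumptions (not from the slides): tokens are strings, "ID" and "INT" stand for
// the token kinds, and EOF is implied when the input is exhausted.
class TableParser {
    static final Map<String, String[]> TABLE = new HashMap<>();

    // Register: on nonterminal + lookahead token, expand to rhs.
    static void rule(String nonterminal, String token, String... rhs) {
        TABLE.put(nonterminal + " " + token, rhs);
    }

    static {
        rule("S", "integer", "vardecl", "EOF");               // p0
        rule("S", "boolean", "vardecl", "EOF");               // p0
        rule("vardecl", "integer", "type", "ID", "optinit");  // p1
        rule("vardecl", "boolean", "type", "ID", "optinit");  // p1
        rule("type", "integer", "integer");                   // p2
        rule("type", "boolean", "boolean");                   // p3
        rule("optinit", "=", "=", "INT");                     // p4
        rule("optinit", "EOF");                               // p5 (epsilon)
    }

    static final Set<String> NONTERMINALS =
        new HashSet<>(Arrays.asList("S", "vardecl", "type", "optinit"));

    static boolean parse(String... tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("S");
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String tok = pos < tokens.length ? tokens[pos] : "EOF";
            if (NONTERMINALS.contains(top)) {
                String[] rhs = TABLE.get(top + " " + tok);
                if (rhs == null) return false;              // empty table entry: syntax error
                for (int i = rhs.length - 1; i >= 0; i--) { // push right-hand side in reverse
                    stack.push(rhs[i]);
                }
            } else {
                if (!top.equals(tok)) return false;         // token mismatch
                pos++;
            }
        }
        return pos == tokens.length + 1; // all tokens plus the implied EOF were matched
    }
}
```

Without the p5 entry in the EOF column, an input such as integer ID (a declaration with no initializer) would hit an empty table entry when optinit is expanded at the end of the input.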

Example: Ambiguous grammar:

  p1: E -> E "+" E
  p2: E -> ID
  p3: E -> INT

        "+"   ID       INT
  E           p1, p2   p1, p3

Collision in a table entry! The grammar is not LL(1).
An ambiguous grammar is not even LL(k): adding more lookahead does not help.

Example: Unambiguous, but left-recursive grammar:

  p1: E -> E "*" F
  p2: E -> F
  p3: F -> ID
  p4: F -> INT

        "*"   ID       INT
  E           p1, p2   p1, p2
  F           p3       p4

Collision in a table entry! The grammar is not LL(1).
A grammar with left recursion is not even LL(k): adding more lookahead does not help.
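
The same problem shows up if the rule E -> E "*" F is coded naively as a recursive-descent method: the method calls itself before consuming any token. The sketch below illustrates this with a depth guard so that the demonstration terminates; the guard, the limit of 1000, and the class name LeftRecursionDemo are assumptions added for illustration, and a real parser without the guard would recurse until the stack overflows.

```java
// Sketch: why a left-recursive rule breaks recursive descent.
// Assumption (not from the slides): a depth counter stops the demo after LIMIT calls.
class LeftRecursionDemo {
    static int depth = 0;
    static final int LIMIT = 1000; // hypothetical guard so the demo terminates

    // Naive method for E -> E "*" F: the first action is a recursive call,
    // so no token is consumed between calls and the lookahead never changes.
    static void parseE() {
        if (++depth > LIMIT) {
            throw new IllegalStateException("no progress after " + LIMIT + " nested calls");
        }
        parseE();
        // accept("*") and parseF() would follow here, but are never reached
    }

    // Returns false when the guard trips, i.e. when parseE() made no progress.
    static boolean terminates() {
        depth = 0;
        try {
            parseE();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```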

Example: Grammar with common prefix:

  p1: E -> F "*" E
  p2: E -> F
  p3: F -> ID
  p4: F -> INT
  p5: F -> "(" E ")"

        "*"   ID       INT      "("      ")"
  E           p1, p2   p1, p2   p1, p2
  F           p3       p4       p5

Collision in a table entry! The grammar is not LL(1).
A grammar with common prefix is not LL(1). Some grammars with common prefix are LL(k), for some k, but not this one.

Example: Another grammar with common prefix:

  p1: Stmt -> ID "(" IdList ")"
  p2: Stmt -> ID "=" Exp

           ...   ID       "("   ")"   "="
  Stmt           p1, p2

Collision in a table entry! The grammar is not LL(1).
A grammar with common prefix is not LL(1). But this grammar is LL(2).

We could create an LL(2) table! (global lookahead 2)

  p1: Stmt -> ID "(" IdList ")"
  p2: Stmt -> ID "=" Exp

           ID ID   ID "("   ID "="   ID ")"   "(" ID   "(" "("   ...
  Stmt             p1       p2                                   ...

No conflicts! The grammar is LL(2)!
But k > 1 gives very large tables, which is inefficient!

A better alternative: Local lookahead!

  p1: Stmt -> ID "(" IdList ")"
  p2: Stmt -> ID "=" Exp

Main table (first lookahead token):
           ...   ID            "("   ")"   "="
  Stmt           (sub-table)

Sub-table for Stmt on ID (second lookahead token):
           "("   "="   ...
           p1    p2

No collisions! A slightly more complex table structure for the local lookahead. Will be efficient.
JavaCC can generate both LL(k) and local lookahead parsers. Using k > 1 is not recommended: parsing becomes too slow. Use local lookahead if needed.

Summary: constructing an LL(1) parser

1. Write the grammar in canonical form.
2. Analyze the grammar using FIRST, NULLABLE, and FOLLOW.
3. Use the analysis to construct a table. The table shows what production to select, given the current lookahead token.
4. Conflicts in the table? The grammar is not LL(1).
5. No conflicts? Straightforward implementation using a table-driven parser or recursive descent.

Summary questions

Construct a CFG for a simple part of a programming language.
Construct a recursive descent parser for a simple language.
Give typical examples of ambiguities in CFGs.
What is the difference between LL(1) and LL(k)?
What is a "common prefix", and how can it be eliminated?
What is meant by "left factoring"?
What is "left recursion" and how can it be eliminated?
In what way can an LL syntax tree differ from the desired AST?
Construct an EBNF grammar for conventional arithmetic expressions that respect standard precedence and associativity.
What is NULLABLE(X), FIRST(X), and FOLLOW(X)?
Construct an LL(1) table for a grammar.
What does it mean if there is a collision in an LL(1) table?
What is the difference between local lookahead and global lookahead?
Why can it be useful to add an end-of-file rule to some grammars?
How can we decide if a grammar is LL(1) or not?

Readings

F4: Predictive parsing. Recursive descent. LL grammars and parsing. Left recursion and factorization.
Appel, chapter 3.2