Types of parsing. CMSC 430 Lecture 4, Page 1

Similar documents
CA Compiler Construction

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Syntactic Analysis. Top-Down Parsing

3. Parsing. Oscar Nierstrasz

Parsing Part II (Top-down parsing, left-recursion removal)

CSCI312 Principles of Programming Languages

Computer Science 160 Translation of Programming Languages

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Syntax Analysis, III Comp 412

CS 406/534 Compiler Construction Parsing Part I

Syntax Analysis, III Comp 412

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Table-Driven Parsing

CS 314 Principles of Programming Languages

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.


Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Introduction to Parsing. Comp 412

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Monday, September 13, Parsers

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Parsing II Top-down parsing. Comp 412

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Top down vs. bottom up parsing

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

CS502: Compilers & Programming Systems

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Wednesday, August 31, Parsers

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Concepts Introduced in Chapter 4

Compilers. Predictive Parsing. Alex Aiken

CS 321 Programming Languages and Compilers. VI. Parsing

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Chapter 3. Parsing #1

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

CS Parsing 1

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

CMSC 330: Organization of Programming Languages

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1. Top-Down Parsing. Lect 5. Goutam Biswas

Lecture Notes on Top-Down Predictive LL Parsing

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Context-free grammars

Chapter 4: LR Parsing

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Parsing. Lecture 11: Parsing. Recursive Descent Parser. Arithmetic grammar. - drops irrelevant details from parse tree

Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Compiler Construction: Parsing

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CSC 4181 Compiler Construction. Parsing. Outline. Introduction

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky

4 (c) parsing. Parsing. Top down vs. bo5om up parsing

Topdown parsing with backtracking

LL parsing Nullable, FIRST, and FOLLOW

Parsing. Earley Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 39

CS 406/534 Compiler Construction Parsing Part II LL(1) and LR(1) Parsing

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Lexical and Syntax Analysis. Top-Down Parsing

Syntactic Analysis. Chapter 4. Compiler Construction Syntactic Analysis 1

Topic 3: Syntax Analysis I

Extra Credit Question

CMSC 330: Organization of Programming Languages

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Compila(on (Semester A, 2013/14)

Syntax Analyzer --- Parser

Compiler Construction 2016/2017 Syntax Analysis

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

A programming language requires two major definitions A simple one pass compiler

Table-Driven Top-Down Parsers

S Y N T A X A N A L Y S I S LR

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

VIVA QUESTIONS WITH ANSWERS

CSE431 Translation of Computer Languages

Transcription:

Types of parsing Top-down parsers start at the root of derivation tree and fill in picks a production and tries to match the input may require backtracking some grammars are backtrack-free (predictive) Bottom-up parsers start at the leaves and fill in start in a state valid for legal first tokens as input is consumed, change state to encode possibilities (recognize valid prefixes) use a stack to store both state and sentential forms CMSC 430 Lecture 4, Page 1

Top-down parsing A top-down parser starts with the root of the parse tree. It is labeled with the start symbol or goal symbol of the grammar. To build a parse, it repeats the following steps until the fringe of the parse tree matches the input string. 1. At a node labeled A, select a production with A on its lhs and for each symbol on its rhs, construct the appropriate child. 2. When a terminal is added to the fringe that doesn t match the input string, backtrack. 3. Find the next node to be expanded. (Must have a label in NT ) The key is selecting the right production in step 1. should be guided by input string CMSC 430 Lecture 4, Page 2

Example grammar This is a grammar for simple expressions: 1 <goal> ::= <expr> 2 <expr> ::= <expr> + <term> 3 <expr> - <term> 4 <term> 5 <term> ::= <term> * <factor> 6 <term> / <factor> 7 <factor> 8 <factor> ::= number 9 id Consider parsing the input string x - 2 * y CMSC 430 Lecture 4, Page 3

Backtracking parse example One possible parse for x - 2 * y Prod n Sentential form Input <goal> x - 2 * y 1 <expr> x - 2 * y 3 <expr> - <term> x - 2 * y 4 <term> - <term> x - 2 * y 7 <factor> - <term> x - 2 * y 9 <id> - <term> x - 2 * y <id> - <term> x - 2 * y <id> - <term> x - 2 * y 7 <id> - <factor> x - 2 * y 9 <id> - <num> x - 2 * y <id> - <num> x - 2 * y <id> - <term> x - 2 * y 5 <id> - <term> * <factor> x - 2 * y 7 <id> - <factor> * <factor> x - 2 * y 9 <id> - <num> * <factor> x - 2 * y <id> - <num> * <factor> x - 2 * y <id> - <num> * <factor> x - 2 * y 9 <id> - <num> * <id> x - 2 * y <id> - <num> * <id> x - 2 * y CMSC 430 Lecture 4, Page 4

Example Another possible parse for x - 2 * y Prod n Sentential form Input <goal> x - 2 * y 1 <expr> x - 2 * y 2 <expr> + <term> x - 2 * y 2 <expr> + <term> + <term> x - 2 * y 2 <expr> + <term> + <term> + <term> x - 2 * y 2 <expr> + <term> + <term> + <term> + x - 2 * y 2 x - 2 * y If the parser makes the wrong choices, the expansion doesn t terminate. This isn t a good property for a parser to have. CMSC 430 Lecture 4, Page 5

Left recursion Top-down parsers cannot handle left-recursion in a grammar. Formally, a grammar is left recursive if A NT such that a derivation A + Aα for some string α. Our simple expression grammar is left recursive. CMSC 430 Lecture 4, Page 6

Eliminating left recursion To remove left recursion, we can transform the grammar. Consider the grammar fragment: <foo> ::= <foo> α β where α and β do not start with <foo>. We can rewrite this as: <foo> ::= β <bar> <bar> ::= α <bar> ɛ where <bar> is a new non-terminal. This fragment contains no left recursion. CMSC 430 Lecture 4, Page 7

Example Our expression grammar contains two cases of left recursion <expr> ::= <expr> + <term> <expr> - <term> <term> <term> ::= <term> * <factor> <term> / <factor> <factor> Applying the transformation gives <expr> ::= <term> <expr > <expr > ::= + <term> <expr > - <term> <expr > ɛ <term> ::= <factor> <term > <term > ::= * <factor> <term > / <factor> <term > ɛ CMSC 430 Lecture 4, Page 8

Eliminating left recursion A general technique for removing left recursion arrange the non-terminals in some order A 1, A 2,..., A n for i 1 to n for j 1 to i-1 replace each production of the form A i ::= A j γ with the productions A i ::= δ 1 γ δ 2 γ... δ k γ, where A j ::= δ 1 δ 2... δ k are all the current A j productions. eliminate any immediate left recursion on A i using the direct transformation This assumes that the grammar has no cycles (A + A) or ɛ productions (A ::= ɛ). CMSC 430 Lecture 4, Page 9

How does this algorithm work? 1. impose an arbitrary order on the non-terminals 2. outer loop cycles through NT in order 3. inner loop ensures that a production expanding A i has no non-terminal A j with j < i 4. It forward substitutes those away 5. last step in the outer loop converts any direct recursion on A i to right recursion using the simple transformation showed earlier 6. new non-terminals are added at the end of the order and only involve right recursion At the start of the i th outer loop iteration for all k < i, a production expanding A k that has A l in its rhs, for l < k. At the end of the process (n < i), the grammar has no remaining left recursion. CMSC 430 Lecture 4, Page 10

How much lookahead is needed? We saw that top-down parsers may need to backtrack when they select the wrong production Do we need arbitrary lookahead to parse CFGs? in general, yes Fortunately large subclasses of CFGs can be parsed with limited lookahead most programming language constructs can be expressed in a grammar that falls in these subclasses Among the interesting subclasses are LL(1) and LR(1). CMSC 430 Lecture 4, Page 11

Recursive Descent Parsing Properties top-down parsing algorithm parser built on procedure calls procedures may be (mutually) recursive Algorithm write procedure for each non-terminal turn each production into clause insert call to procedure A() for non-terminal A to match(x) for terminal x start by invoking procedure for start symbol S Example A ::= a B c A() { match(a); B(); match(c); } CMSC 430 Lecture 4, Page 12

Recursive Descent Parsing Example grammar 1 S ::= a A 2 b 3 A ::= S c Helpers tok; // current token match(x) { if (tok!= x) error(); tok = gettoken(); } Parser S() { if (tok == a) match(a); A(); else if (tok == b) match(b); else error(); } A() { S(); match(c); } CMSC 430 Lecture 4, Page 13

Predictive Parsing Basic idea For any two productions A ::= α β, we would like a distinct way of choosing the correct production to expand. FIRST sets For some rhs α G, define FIRST(α) as the set of tokens that appear as the first symbol in some string derived from α. That is, x FIRST(α) iff α xγ for some γ. LL(1) property Whenever two productions A ::= α and A ::= β both appear in the grammar, we would like F IRST (α) F IRST (β) = ɛ This would allow the parser to make a correct choice with a lookahead of only one symbol! Pursuing this idea leads to predictive LL(1) parsers. CMSC 430 Lecture 4, Page 14

Left Factoring What if a grammar does not have the LL(1) property? Sometimes, we can transform a grammar to have this property. For each non-terminal A find the longest prefix α common to two or more of its alternatives. if α = ɛ, then replace all of the A productions A ::= αβ 1 αβ 2 αβ n γ with A ::= αl γ L ::= β 1 β 2 β n where L is a new non-terminal. Repeat until no two alternatives for a single non-terminal have a common prefix. CMSC 430 Lecture 4, Page 15

Example Consider a right-recursive version of the expression grammar: 1 <goal> ::= <expr> 2 <expr> ::= <term> + <expr> 3 <term> - <expr> 4 <term> 5 <term> ::= <factor> * <term> 6 <factor> / <term> 7 <factor> 8 <factor> ::= number 9 id To choose between productions 2, 3, & 4, the parser must see past the number or id and look at the +, -, *, or /. This grammar fails the test. FIRST(2) FIRST(3) FIRST(4) Note: This grammar is right-associative. CMSC 430 Lecture 4, Page 16

Example There are two nonterminals that must be left factored: <expr> ::= <term> + <expr> <term> - <expr> <term> <term> ::= <factor> * <term> <factor> / <term> <factor> Applying the transformation gives us: <expr> ::= <term> <expr > <expr > ::= + <expr> - <expr> ɛ <term> ::= <factor> <term > <term > ::= * <term> / <term> ɛ CMSC 430 Lecture 4, Page 17

Example Substituting back into the grammar yields 1 <goal> ::= <expr> 2 <expr> ::= <term> <expr > 3 <expr > ::= + <expr> 4 - <expr> 5 ɛ 6 <term> ::= <factor> <term > 7 <term > ::= * <term> 8 / <term> 9 ɛ 10 <factor> ::= number 11 id Now, selection requires only a single token lookahead. Note: This grammar is still right-associative. CMSC 430 Lecture 4, Page 18

Example: Sentential form Input <goal> x - 2 * y 1 <expr> x - 2 * y 2 <term> <expr > x - 2 * y 6 <factor> <term > <expr > x - 2 * y 11 <id> <term > <expr > x - 2 * y <id> <term > <expr > x - 2 * y 9 <id> ɛ <expr > x - 2 4 <id> - <expr> x - 2 * y <id> - <expr> x - 2 * y 2 <id> - <term> <expr > x - 2 * y 6 <id> - <factor> <term > <expr > x - 2 * y 10 <id> - <num> <term > <expr > x - 2 * y <id> - <num> <term > <expr > x - 2 * y 7 <id> - <num> * <term> <expr > x -2 * y <id> - <num> * <term> <expr > x -2 * y 6 <id> - <num> * <factor> <term > <expr > x -2 * y 11 <id> - <num> * <id> <expr > x -2 * y <id> - <num> * <id> <term > <expr > x -2 * y 9 <id> - <num> * <id> <expr > x -2 * y 5 <id> - <num> * <id> x -2 * y The next symbol determined each choice correctly. CMSC 430 Lecture 4, Page 19

Generality Question: By eliminating left recursion and left factoring, can we transform an arbitrary context free grammar to a form where it can be predictively parsed with a single token lookahead? Answer: Given a context free grammar that doesn t meet our conditions, it is undecidable whether an equivalent grammar exists that does meet our conditions. Many context free languages do not have such a grammar. {a n 0b n n 1} {a n 1b 2n n 1} CMSC 430 Lecture 4, Page 20

The FIRST set For a string of grammar symbols α, define FIRST(α) as the set of terminal symbols that begin strings derived from α if α ɛ, then ɛ FIRST(α) FIRST(α) contains the set of tokens valid in the first position of α To build FIRST(X): 1. if X is a terminal, FIRST(X) is {X} 2. if X ::= ɛ, then ɛ FIRST(X) 3. if X ::= Y 1 Y 2 Y k then put FIRST(Y 1 ) in FIRST(X) 4. if X is a non-terminal and X ::= Y 1 Y 2 Y k, then a FIRST(X) if a FIRST(Y i ) and ɛ FIRST(Y j ) for all 1 j < i (If ɛ FIRST(Y 1 ), then FIRST(Y i ) is irrelevant, for 1 < i) CMSC 430 Lecture 4, Page 21

Our example grammar 1 goal ::= expr 2 expr ::= term expr 3 expr ::= + expr 4 - expr 5 ɛ 6 term ::= factor term 7 term ::= * term 8 / term 9 ɛ 10 factor ::= num 11 id CMSC 430 Lecture 4, Page 22

The FIRST construction rule 1 2 3 4 FIRST goal num,id {num,id} expr num,id {num,id} expr ɛ +,- {ɛ,+,-} term num,id {num,id} term ɛ *,/ {ɛ,*,/} factor num,id {num,id} num num {num} id id {id} + + {+} - - {-} * * {*} / / {/} CMSC 430 Lecture 4, Page 23

The FOLLOW set For a non-terminal A, define FOLLOW(A) as the set of terminals that can appear immediately to the right of A in some sentential form Thus, a non-terminal s FOLLOW set specifies the tokens that can legally appear after it A terminal symbol has no FOLLOW set To build FOLLOW(X): 1. place eof in FOLLOW( goal ) 2. if A ::= αbβ, then put {FIRST(β) ɛ} in FOLLOW(B) 3. if A ::= αb then put FOLLOW(A) in FOLLOW(B) 4. if A ::= αbβ and ɛ FIRST(β), then put FOLLOW(A) in FOLLOW(B) CMSC 430 Lecture 4, Page 24

The FOLLOW construction rule 1 2 3 4 FOLLOW goal eof {eof} expr eof {eof} expr eof {eof} term +,- eof {eof,+,-} term eof,+,- {eof,+,-} factor *,/ eof,+,- {eof,+,-,*,/} CMSC 430 Lecture 4, Page 25

Using FIRST and FOLLOW To build a predicative recursive-descent parser: For each production A ::= α and lookahead token expand A using production if token FIRST(α) if ɛ FIRST(α) expand A using production if token FOLLOW(A) all other tokens return error If multiple choices, the grammar is not LL(1) (predicative). id num + - * / eof goal g e g e expr e te e te expr e +e e -e e ɛ term t ft t ft term t ɛ t ɛ t *t t /t t ɛ factor f id f num CMSC 430 Lecture 4, Page 26

LL(1) grammars Features input parsed from left to right leftmost derivation one token lookahead Definition A grammar G is LL(1) if and only if, for all non-terminals A, each distinct pair of productions A ::= β and A ::= γ satisfy the condition FIRST(β) FIRST(γ) = A grammar G is LL(1) if and only if for each set of productions A ::= α 1 α 2 α n 1. FIRST(α 1 ), FIRST(α 2 ),, FIRST(α n ) are all pairwise disjoint 2. if α i ɛ, then FIRST(α j ) FOLLOW(A) =, for all 1 j n, i =j. If G is ɛ free, condition 1 is sufficient. CMSC 430 Lecture 4, Page 27

LL(1) grammars Provable facts about LL(1) grammars: no left recursive grammar is LL(1) no ambiguous grammar is LL(1) LL(1) parsers operate in linear time an ɛ free grammar where each alternative expansion for A begins with a distinct terminal is a simple LL(1) grammar Not all grammars are LL(1) S ::= as a is not LL(1) FIRST(aS) = FIRST(a) = {a} S ::= as S ::= as ɛ accepts the same language and is LL(1) CMSC 430 Lecture 4, Page 28

LL grammars LL(1) grammars may need to rewrite grammar (left recursion, left factoring) resulting grammar larger, less maintainable LL(k) grammars k-token lookahead, more powerful than LL(1) grammars example: S ::= ac abc is LL(2) Not all grammars are LL(k) example: S ::= a i b j where i j equivalent to dangling else problem problem - must choose production after k tokens of lookahead Bottom-up parsers avoid this problem CMSC 430 Lecture 4, Page 29