CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Similar documents
Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Top down vs. bottom up parsing

Compilers. Predictive Parsing. Alex Aiken

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Types of parsing. CMSC 430 Lecture 4, Page 1

Syntactic Analysis. Top-Down Parsing

4 (c) parsing. Parsing. Top down vs. bo5om up parsing

CSCI312 Principles of Programming Languages

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Parsing Part II (Top-down parsing, left-recursion removal)

Parsing III. (Top-down parsing: recursive descent & LL(1) )

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

CS502: Compilers & Programming Systems

Extra Credit Question

Error Handling Syntax-Directed Translation Recursive Descent Parsing

CA Compiler Construction

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Syntax Analysis, III Comp 412

3. Parsing. Oscar Nierstrasz

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Syntax Analysis, III Comp 412

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1. Top-Down Parsing. Lect 5. Goutam Biswas

Lexical and Syntax Analysis. Top-Down Parsing

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5

Monday, September 13, Parsers

Error Handling Syntax-Directed Translation Recursive Descent Parsing. Lecture 6

Lexical and Syntax Analysis

Table-Driven Parsing

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky

Parsing. Lecture 11: Parsing. Recursive Descent Parser. Arithmetic grammar. - drops irrelevant details from parse tree

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

CS 406/534 Compiler Construction Parsing Part I


CS 314 Principles of Programming Languages

Introduction to Parsing. Comp 412

Wednesday, August 31, Parsers

LL Parsing: A piece of cake after LR

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Chapter 3. Parsing #1

Compila(on (Semester A, 2013/14)

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

CMSC 330: Organization of Programming Languages

A programming language requires two major definitions A simple one pass compiler

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University

CS Parsing 1

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

More Bottom-Up Parsing

Topdown parsing with backtracking

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

LL parsing Nullable, FIRST, and FOLLOW

CS 406/534 Compiler Construction Parsing Part II LL(1) and LR(1) Parsing

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Compiler construction in4303 lecture 3

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Outline. Top Down Parsing. SLL(1) Parsing. Where We Are 1/24/2013

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Lecture Notes on Top-Down Predictive LL Parsing

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

Error Handling Syntax-Directed Translation Recursive Descent Parsing

4. Lexical and Syntax Analysis

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

The procedure attempts to "match" the right hand side of some production for a nonterminal.

Bottom-Up Parsing. Lecture 11-12

Compiler construction lecture 3

4. Lexical and Syntax Analysis

CS 321 Programming Languages and Compilers. VI. Parsing

Concepts Introduced in Chapter 4

Computer Science 160 Translation of Programming Languages

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Bottom-Up Parsing. Lecture 11-12

Defining syntax using CFGs

Transcription:

CS1622 Lecture 9 Parsing (4) CS 1622 Lecture 9 1 Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables CS 1622 Lecture 9 2 A Recursive Descent Parser. Preliminaries Let TOKEN be the type of tokens Special tokens INT, OPEN, CLOSE, PLUS, TIMES Let the global variable next point to the next token CS 1622 Lecture 9 3 1

A Recursive Descent Parser (2) Define boolean functions that check the token string for a match of A given token terminal bool term(token tok) { return *next++ == tok; } S -> S 1 S 2 S S n A given production of S (the n th ) bool S n () { } Any production of S: bool S() { } These functions advance next CS 1622 Lecture 9 4 A Recursive Descent Parser (3) For production E T bool E 1 () { return T(); } For production E T + E bool E 2 () { return T() && term(plus) && E(); } For all productions of E (with backtracking) bool E() { TOKEN *save = next; return (next = save, E 1 ()) (next = save, E 2 ()); } CS 1622 Lecture 9 5 A Recursive Descent Parser (4) Functions for non-terminal T bool T 1 () { return term(open) && E() && term(close); } bool T 2 () { return term(int) && term(times) && T(); } bool T 3 () { return term(int); } bool T() { TOKEN *save = next; return (next = save, T 1 ()) (next = save, T 2 ()) (next = save, T 3 ()); } CS 1622 Lecture 9 6 2

Recursive Descent Parsing. Notes. To start the parser Initialize next to point to first token Invoke E() Notice how this simulates our previous example Easy to implement by hand But does not always work CS 1622 Lecture 9 7 When Recursive Descent Does Not Work Consider a production S S a bool S 1 () { return S() && term(a); } bool S() { return S 1 (); } S() will get into an infinite loop A left-recursive grammar has a non-terminal S S + Sα for some α Recursive descent does not work in such cases CS 1622 Lecture 9 8 Elimination of Left Recursion Consider the left-recursive grammar S S α β S generates all strings starting with a β and followed by a number of α Can rewrite using right-recursion S β S S α S ε CS 1622 Lecture 9 9 3

More Elimination of Left- Recursion In general S S α 1 S α n β 1 β m All strings derived from S start with one of β 1,,β m and continue with several instances of α 1,,α n Rewrite as S β 1 S β m S S α 1 S α n S ε CS 1622 Lecture 9 10 General Left Recursion The grammar S A α δ A S β is also left-recursive because S + S β α This left-recursion can also be eliminated - won t go into this CS 1622 Lecture 9 11 Summary of Recursive Descent Simple and general parsing strategy Left-recursion must be eliminated first but that can be done automatically Unpopular because of backtracking Too inefficient In practice, backtracking is eliminated by restricting the grammar CS 1622 Lecture 9 12 4

When Recursive Descent Does Not Work Consider a production S S a bool S 1 () { return S() && term(a); } bool S() { return S 1 (); } S() will get into an infinite loop A left-recursive grammar has a non-terminal S S + Sα for some α Recursive descent does not work in such cases CS 1622 Lecture 9 13 Elimination of Left Recursion Consider the left-recursive grammar S S α β S generates all strings starting with a β and followed by a number of α Can rewrite using right-recursion S β S S α S ε CS 1622 Lecture 9 14 More Elimination of Left- Recursion In general S S α 1 S α n β 1 β m All strings derived from S start with one of β 1,,β m and continue with several instances of α 1,,α n Rewrite as S β 1 S β m S S α 1 S α n S ε CS 1622 Lecture 9 15 5

General Left Recursion The grammar S A α δ A S β is also left-recursive because S + S β α This left-recursion can also be eliminated - won t go into this CS 1622 Lecture 9 16 Summary of Recursive Descent Simple and general parsing strategy Left-recursion must be eliminated first but that can be done automatically Unpopular because of backtracking Thought to be too inefficient In practice, backtracking is eliminated by restricting the grammar CS 1622 Lecture 9 17 Predictive Parsers Like recursive-descent but parser can predict which production to use By looking at the next few tokens No backtracking Predictive parsers accept LL(k) grammars L means left-to-right scan of input L means leftmost derivation k means predict based on k tokens of lookahead In practice, LL(1) is used CS 1622 Lecture 9 18 6

LL(1) Languages In recursive-descent, for each non-terminal and input token there may be a choice of production LL(1) means that for each non-terminal and token there is only one production Implementation Recursive procedures Table driven: Can be specified via 2D table One dimension for current non-terminal to expand One dimension for next token A table entry contains one production CS 1622 Lecture 9 19 Predictive Parser Avoid Backtracking by selecting the correct production How? Use look ahead LL(1) uses 1 lookahead Must structure the grammar to enable this A a B c D e F Not A a B a C CS 1622 Lecture 9 20 Predictive Parsing and Left Factoring Consider the expression grammar E T + E T T int int * T ( E ) Hard to predict because For T two productions start with int For E it is not clear how to predict A grammar must be left-factored before use for predictive parsing CS 1622 Lecture 9 21 7

Left-Factoring Example Recall the grammar E T + E T T int int * T ( E ) Factor out common prefixes of productions E T X X + E ε T ( E ) int Y Y * T ε CS 1622 Lecture 9 22 Predictive Parsing Basic idea Given A α β, the parser should be able to choose between α & β FIRST sets (see later how to compute) For some rhs α G, define FIRST(α) as the set of tokens that appear as the first symbol in some string derived from α That is, x FIRST(α) iff α * x γ, for some γ (Pursuing this idea leads to LL(1) parser generators...) Note: x is terminal token CS 1622 Lecture 9 23 LL(1) Property The LL(1) Property If A α β both appear in the grammar, α and β are strings consisting of terminal and non-terminals, we would like FIRST(α) FIRST(β) = (see later how to compute) This would allow the parser to make a correct choice with a lookahead of exactly one symbol! CS 1622 Lecture 9 24 8

LL(1) Parsing Consider A β 1 β 2 β 3, with FIRST(β 1 ) FIRST(β 2 ) FIRST(β 3 ) = /* find an A */ if (current_word FIRST(β 1 )) do β 1 else if (current_word FIRST(β 2 )) do β 2 else if (current_word FIRST(β 3 )) do β 3 else report an error and return false CS 1622 Lecture 9 25 Predictive Recursive Descent Parsing 1 Goal! Expr 2 Expr! Term Expr" 3 Expr"! + Term Expr" 4 - Term Expr" 5 # 6 Term! Factor Term" 7 Term"! * Factor Term" 8 / Factor Term" 9 # 10 Factor! number 11 id This produces a parser with six mutually recursive routines: Goal Expr Expr_Prime Term Term_Prime Factor Each recognizes one nonterminal CS 1622 Lecture 9 26 Recursive Descent Parsing Goal( ) token next_token( ); if (Expr( ) = true) then next compilation step; else return false; Expr( ) result true; if (Term( ) = false) then result false; else if (EPrime( ) = false) then result false; return result; Factor( ) result true; if (token = Number) then token next_token( ); else if (token = identifier) then token next_token( ); else report syntax error; result false; return result; CS 1622 Lecture 9 27 9

Implementation Alternative Recursive procedures expensive Stack space & call overhead Parsing table + driver normally used as alternative CS 1622 Lecture 9 28 LL(1) Parsing Table Example E X T Y Left-factored grammar E T X T ( E ) int Y The LL(1) parsing table: int T X int Y * * T + + E ε ( T X ( E ) X + E ε Y * T ε CS 1622 Lecture 9 29 ) ε ε $ ε ε LL(1) Parsing Table Example (Cont.) Consider the [E, int] entry When current non-terminal is E and next input is int, use production E T X This production can generate an int in the first place Consider the [Y,+] entry When current non-terminal is Y and current token is +, get rid of Y Y can be followed by + only in a derivation in which Y ε CS 1622 Lecture 9 30 10

LL(1) Parsing Tables. Errors Blank entries indicate error situations Consider the [E,*] entry There is no way to derive a string starting with * from non-terminal E CS 1622 Lecture 9 31 Using Parsing Tables Method similar to recursive descent, except For each non-terminal S We look at the next token a And chose the production shown at [S,a] We use a stack to keep track of pending nonterminals - instead of procedure stack! We reject when we encounter an error state We accept when we encounter end-of-input CS 1622 Lecture 9 32 LL(1) Parsing Algorithm initialize stack = <S $> and next repeat case stack of <X, rest> : if T[X,*next] = Y 1 Y n then stack <Y 1 Y n rest>; else error (); <t, rest> : if t == *next ++ then stack <rest>; else error (); until stack == < > CS 1622 Lecture 9 33 11

LL(1) Parsing Example Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal T X $ int $ int Y int Y X $ int $ terminal Y X $ $ ε X $ $ ε $ $ ACCEPT CS 1622 Lecture 9 34 Constructing Parsing Tables LL(1) languages are those defined by a parsing table for the LL(1) algorithm No table entry can be multiply defined We want to generate parsing tables from CFG CS 1622 Lecture 9 35 Constructing Parsing Tables (Cont.) If A α, where in the line of A we place α? In the column of t where t can start a string derived from α α * t β We say that t First(α) In the column of t if α is ε and t can follow an A S * β A t δ We say t Follow(A) CS 1622 Lecture 9 36 12

Computing First Sets Definition First(X) = { t X * tα} {ε X * ε} Algorithm: First(t) = { t } ε First(X) if X ε is a production ε First(X) if X A 1 A n and ε First(A i ) for 1 i n First(α) First(X) if X A 1 A n α and ε First(A i ) for 1 i n CS 1622 Lecture 9 37 First Sets. Example Recall the grammar E T X X + E ε T ( E ) int Y Y * T ε First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {int, ( } First( int) = { int } First( X ) = {+, ε } First( + ) = { + } First( Y ) = {*, ε } First( * ) = { * } CS 1622 Lecture 9 38 Computing Follow Sets for Nonterminals Follow(S) needed for productions that generate the empty string ε Definition: Follow(X) = { t S * β X t δ } Note that ε CAN NEVER BE IN FOLLOW(X)!! Intuition If X A B then First(B) Follow(A) and Follow(X) Follow(B) Also if B * ε then Follow(X) Follow(A) If S is the start symbol then $ Follow(S) CS 1622 Lecture 9 39 13

Computing Follow Sets (Cont.) Algorithm sketch: $ Follow(S) For each production A α X β First(β) - {ε} Follow(X) For each production A α X β where ε First(β) Follow(A) Follow(X) CS 1622 Lecture 9 40 Follow Sets. Example Recall the grammar E T X T ( E ) int Y Follow sets X + E ε Y * T ε Follow( E ) = {), $} Follow( X ) = {$, ) } Follow( T ) = {+, ), $} Follow( Y ) = {+, ), $} CS 1622 Lecture 9 41 Constructing LL(1) Parsing Tables Construct a parsing table T for CFG G For each production A α in G do: For each terminal t First(α) do T[A, t] = α If ε First(α), for each t Follow(A) do T[A, t] = α If ε First(α) and $ Follow(A) do T[A, $] = α CS 1622 Lecture 9 42 14

Notes on LL(1) Parsing Tables If any entry is multiply defined then G is not LL(1). Possible reasons: If G is ambiguous If G is left recursive If G is not left-factored Grammar is not LL(1) Most programming language grammars are not LL(1) Some can be made LL(1) though but others can t There are tools that build LL(1) tables E.g. LLGen CS 1622 Lecture 9 43 15