Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Similar documents
Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Extra Credit Question

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Abstract Syntax Trees & Top-Down Parsing

Compilers. Predictive Parsing. Alex Aiken

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Top down vs. bottom up parsing

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

LL Parsing: A piece of cake after LR

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

4 (c) parsing. Parsing. Top down vs. bo5om up parsing

Types of parsing. CMSC 430 Lecture 4, Page 1

Chapter 3. Parsing #1

Error Handling Syntax-Directed Translation Recursive Descent Parsing

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

CSCI312 Principles of Programming Languages

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Syntactic Analysis. Top-Down Parsing

CS Parsing 1

Error Handling Syntax-Directed Translation Recursive Descent Parsing. Lecture 6

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

CA Compiler Construction

Lexical and Syntax Analysis

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Lexical and Syntax Analysis. Top-Down Parsing

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

More Bottom-Up Parsing

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

3. Parsing. Oscar Nierstrasz

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Monday, September 13, Parsers

Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Compila(on (Semester A, 2013/14)

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Syntax Analysis, III Comp 412

CS502: Compilers & Programming Systems

Outline. Top Down Parsing. SLL(1) Parsing. Where We Are 1/24/2013

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

CSE431 Translation of Computer Languages

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1. Top-Down Parsing. Lect 5. Goutam Biswas

Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky

CMSC 330: Organization of Programming Languages

Syntax Analysis, III Comp 412

LL parsing Nullable, FIRST, and FOLLOW

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

CMSC 330: Organization of Programming Languages

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5

The procedure attempts to "match" the right hand side of some production for a nonterminal.

Parsing Part II (Top-down parsing, left-recursion removal)

Syntax-Directed Translation. Lecture 14

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

Parsing II Top-down parsing. Comp 412

Wednesday, August 31, Parsers

CS 536 Midterm Exam Spring 2013

Lecture Notes on Top-Down Predictive LL Parsing

Part 3. Syntax analysis. Syntax analysis 96

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Table-Driven Parsing

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

1. Consider the following program in a PCAT-like language.

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Parsing. Lecture 11: Parsing. Recursive Descent Parser. Arithmetic grammar. - drops irrelevant details from parse tree

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

LECTURE 7. Lex and Intro to Parsing

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Question Points Score

CSE 401 Midterm Exam Sample Solution 2/11/15

Introduction to Parsing. Comp 412

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

CS 321 Programming Languages and Compilers. VI. Parsing

Parsing III. (Top-down parsing: recursive descent & LL(1) )


Introduction to Parsing. Lecture 8

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

Context-free grammars (CFG s)

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

LL(1) Grammars. Example. Recursive Descent Parsers. S A a {b,d,a} A B D {b, d, a} B b { b } B λ {d, a} D d { d } D λ { a }

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Topdown parsing with backtracking

Transcription:

Building a Parser III CS164 3:30-5:00 TT 10 Evans 1

Overview Finish recursive descent parser when it breaks down and how to fix it eliminating left recursion reordering productions Predictive parsers (aka LL(1) parsers) computing FIRST, FOLLOW table-driven, stack-manipulating version of the parser 3

Review: grammar for arithmetic expressions Simple arithmetic expressions: E n id ( E ) E + E E * E Some elements of this language: id n ( n ) n + id id * ( id + id ) 4

Review: derivation Grammar: E n id ( E ) E + E E * E a derivation: E rewrite E with ( E ) ( E ) rewrite E with n ( n ) this is the final string of terminals another derivation (written more concisely): E ( E ) ( E * E ) ( E + E * E ) ( n + E * E ) ( n + id * E ) ( n + id * id ) this is left-most derivation (remember it) always expand the left-most non-terminal can you guess what s right-most derivation? 5

Recursive Descent Parsing Consider the grammar E T + E T T int int * T ( E ) Token stream is: int 5 * int 2 Start with top-level non-terminal E Try the rules for E in order 6

Recursive-Descent Parsing Parsing: given a string of tokens t 1 t 2... t n, find its parse tree Recursive-descent parsing: Try all the productions exhaustively At a given moment the fringe of the parse tree is: t 1 t 2 t k A Try all the productions for A: if A BC is a production, the new fringe is t 1 t 2 t k B C Backtrack when the fringe doesn t match the string Stop when there are no more non-terminals 7

When Recursive Descent Does Not Work Consider a production S S a: In the process of parsing S we try the above rule What goes wrong? A fix? S must have a non-recursive production, say S b expand this production before you expand S S a Problems remain performance (steps needed to parse baaaaa ) termination (parse the error input c ) 8

Solutions First, restrict backtracking backtrack just enough to produce a sufficiently powerful r.d. parser Second, eliminate left recursion transformation that produces a different grammar the new grammar generates same strings but does it give us same parse tree as old grammar? Let s see the restricted r.d. parser first 9

A Recursive Descent Parser (1) Define boolean functions that check the token string for a match of A given token terminal bool term(token tok) { return in[next++] == tok; } A given production of S (the n th ) bool S n () { } Any production of S: bool S() { } These functions advance next 10

A Recursive Descent Parser (2) For production E T + E bool E 1 () { return T() && term(plus) && E(); } For production E T bool E 2 () { return T(); } For all productions of E (with backtracking) bool E() { int save = next; return (next = save, E 1 ()) (next = save, E 2 ()); } 11

A Recursive Descent Parser (3) Functions for non-terminal T bool T 1 () { return term(open) && E() && term(close); } bool T 2 () { return term(int) && term(times) && T(); } bool T 3 () { return term(int); } bool T() { int save = next; return (next = save, T 1 ()) (next = save, T 2 ()) (next = save, T 3 ()); } 12

Recursive Descent Parsing. Notes. To start the parser Initialize next to point to first token Invoke E() Notice how this simulates our backtracking example from lecture Easy to implement by hand Predictive parsing is more efficient 13

Now back to left-recursive grammars Does this style of r.d. parser work for our left-recursive grammar? the grammar: S S a b what happens when S S a is expanded first? what happens when S b is expanded first? 14

Left-recursive grammars A left-recursive grammar has a non-terminal S S + S for some Recursive descent does not work in such cases It goes into an loop Notes: : a shorthand for any string of terminals, nonterminals symbol + is a shorthand for can be derived in one or more steps : S + S is same as S S 15

Elimination of Left Recursion Consider the left-recursive grammar S S S generates all strings starting with a and followed by a number of Can rewrite using right-recursion S S S S 16

Elimination of Left-Recursion. Example Consider the grammar S 1 S 0 ( = 1 and = 0 ) can be rewritten as S 1 S S 0 S 17

More Elimination of Left-Recursion In general S S 1 S n 1 m All strings derived from S start with one of 1,, m and continue with several instances of 1,, n Rewrite as S 1 S m S S 1 S n S 18

General Left Recursion The grammar S A A S is also left-recursive because S + S This left-recursion can also be eliminated See [ASU], Section 4.3 for general algorithm 19

Summary of Recursive Descent simple parsing strategy left-recursion must be eliminated first but that can be done automatically unpopular because of backtracking thought to be too inefficient in practice, backtracking is (sufficiently) eliminated by restricting the grammar so, it s good enough for small languages careful, though: order of productions important even after left-recursion eliminated try to reverse the order of E T + E T what goes wrong? 20

Motivation Wouldn t it be nice if the r.d. parser just knew which production to expand next? Idea: replace with return (next = save, E1()) (next = save, E2()); } switch ( something ) { case L1: return E1(); case L2: return E2(); otherwise: print syntax error ; } what s something, L1, L2? the parser will do lookahead (look at next token) 22

Predictive Parsers Like recursive-descent but parser can predict which production to use By looking at the next few tokens No backtracking Predictive parsers accept LL(k) grammars L means left-to-right scan of input L means leftmost derivation k means predict based on k tokens of lookahead In practice, LL(1) is used 23

LL(1) Languages In recursive-descent, for each non-terminal and input token there may be a choice of production LL(1) means that for each non-terminal and token there is only one production that could lead to success Can be specified as a 2D table One dimension for current non-terminal to expand One dimension for next token A table entry contains one production 24

Predictive Parsing and Left Factoring Recall the grammar E T + E T T int int * T ( E ) Impossible to predict because For T two productions start with int For E it is not clear how to predict A grammar must be left-factored before use predictive parsing 26

Left-Factoring Example Recall the grammar E T + E T T int int * T ( E ) Factor out common prefixes of productions E T X X + E T ( E ) int Y Y * T 27

LL(1) parser to simplify things, instead of switch ( something ) { case L1: return E1(); case L2: return E2(); otherwise: print syntax error ; } we ll use a LL(1) table and a parse stack the LL(1) table will replace the switch the parse stack will replace the call stack 29

LL(1) Parsing Table Example Left-factored grammar E T X T ( E ) int Y The LL(1) parsing table: X + E Y * T int * + ( ) $ T int Y ( E ) E T X T X X + E Y * T 30

LL(1) Parsing Table Example (Cont.) Consider the [E, int] entry When current non-terminal is E and next input is int, use production E T X This production can generate an int in the first place Consider the [Y,+] entry When current non-terminal is Y and current token is +, get rid of Y We ll see later why this is so 31

LL(1) Parsing Tables. Errors Blank entries indicate error situations Consider the [E,*] entry There is no way to derive a string starting with * from non-terminal E 32

Using Parsing Tables Method similar to recursive descent, except For each non-terminal S We look at the next token a And choose the production shown at [S,a] We use a stack to keep track of pending nonterminals We reject when we encounter an error state We accept when we encounter end-of-input 33

LL(1) Parsing Algorithm initialize stack = < S $ > and next ( pointer to tokens) repeat case stack of < X, rest> : if T [ X, * next] = Y 1 Y n then stack < Y 1 Y n rest> ; else error ( ) ; < t, rest> : if t = = * next + + then stack < rest> ; else error ( ) ; until stack = = < > 34

LL(1) Parsing Example Stack Input Action E $ int * int $ T X T X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal T X $ int $ int Y int Y X $ int $ terminal Y X $ $ X $ $ $ $ ACCEPT 35

Constructing Parsing Tables LL(1) languages are those defined by a parsing table for the LL(1) algorithm No table entry can be multiply defined We want to generate parsing tables from CFG 36

Constructing Predictive Parsing Tables Consider the state S * A With b the next token Trying to match b There are two possibilities: 1. b belongs to an expansion of A Or Any A can be used if b can start a string derived from In this case we say that b First( ) 37

Constructing Predictive Parsing Tables (Cont.) 2. b does not belong to an expansion of A The expansion of A is empty and b belongs to an expansion of Means that b can appear after A in a derivation of the form S * Ab We say that b Follow(A) in this case What productions can we use in this case? Any A can be used if can expand to We say that First(A) in this case 38

First Sets. Example Recall the grammar E T X X + E T ( E ) int Y Y * T First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {int, ( } First( int) = { int } First( X ) = {+, } First( + ) = { + } First( Y ) = {*, } First( * ) = { * } 40

Computing First Sets Definition 1. First(b) = { b } First(X) = { b X * b } { X * } 2. For all productions X A 1 A n Add First(A 1 ) { } to First(X). Stop if First(A 1 ) Add First(A 2 ) { } to First(X). Stop if First(A 2 ) Add First(A n ) { } to First(X). Stop if First(A n ) Add to First(X) 3. Repeat step 2 until no First set grows 41

Follow Sets. Example Recall the grammar E T X X + E T ( E ) int Y Y * T Follow sets Follow( + ) = { int, ( } Follow( * ) = { int, ( } Follow( ( ) = { int, ( } Follow( E ) = {), $} Follow( X ) = {$, ) } Follow( T ) = {+, ), $} Follow( ) ) = {+, ), $} Follow( Y ) = {+, ), $} Follow( int) = {*, +, ), $} 42

Computing Follow Sets Definition Follow(X) = { b S * X b } 1. Compute the First sets for all non-terminals first 2. Add $ to Follow(S) (if S is the start non-terminal) 3. For all productions Y X A 1 A n Add First(A 1 ) { } to Follow(X). Stop if First(A 1 ) Add First(A 2 ) { } to Follow(X). Stop if First(A 2 ) Add First(A n ) { } to Follow(X). Stop if First(A n ) Add Follow(Y) to Follow(X) 4. Repeat step 3 until no Follow set grows 43

Constructing LL(1) Parsing Tables Construct a parsing table T for CFG G For each production A in G do: For each terminal b First( ) do T[A, b] = If *, for each b Follow(A) do T[A, b] = If * and $ Follow(A) do T[A, $] = 45

Constructing LL(1) Tables. Example Recall the grammar E T X X + E T ( E ) int Y Y * T Where in the line of Y we put Y * T? In the lines of First( *T) = { * } Where in the line of Y we put Y? In the lines of Follow(Y) = { $, +, ) } 46

Notes on LL(1) Parsing Tables If any entry is multiply defined then G is not LL(1) If G is ambiguous If G is left recursive If G is not left-factored And in other cases as well Most programming language grammars are not LL(1) There are tools that build LL(1) tables 47

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E T + E int * int + int 49

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves int Always expand the leftmost non-terminal E T * T + E int * int + int The leaves at any point form a string A contains only terminals The input string is b The prefix matches The next token is b 50

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves int Always expand the leftmost non-terminal E T * T int + E T int * int + int The leaves at any point form a string A contains only terminals The input string is b The prefix matches The next token is b 51

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves int Always expand the leftmost non-terminal E T * T int + E T int int * int + int The leaves at any point form a string A contains only terminals The input string is b The prefix matches The next token is b 52

Predictive Parsing. Review. A predictive parser is described by a table For each non-terminal A and for each token b we specify a production A When trying to expand A we use A if b follows next Once we have the table The parsing algorithm is simple and fast No backtracking is necessary 53