Extra Credit Question

Similar documents
Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Abstract Syntax Trees & Top-Down Parsing

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Compilers. Predictive Parsing. Alex Aiken

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Top down vs. bottom up parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

LL Parsing: A piece of cake after LR

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

4 (c) parsing. Parsing. Top down vs. bo5om up parsing

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Types of parsing. CMSC 430 Lecture 4, Page 1

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

More Bottom-Up Parsing

Table-Driven Parsing

Chapter 3. Parsing #1

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

CS Parsing 1

CMSC 330: Organization of Programming Languages

CSE431 Translation of Computer Languages

Monday, September 13, Parsers

Syntactic Analysis. Top-Down Parsing

CA Compiler Construction

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CSCI312 Principles of Programming Languages

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Part 3. Syntax analysis. Syntax analysis 96

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Lexical and Syntax Analysis. Top-Down Parsing

Wednesday, August 31, Parsers

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

3. Parsing. Oscar Nierstrasz

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1. Top-Down Parsing. Lect 5. Goutam Biswas

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University

Outline. Top Down Parsing. SLL(1) Parsing. Where We Are 1/24/2013

Lexical and Syntax Analysis

Parsing Part II (Top-down parsing, left-recursion removal)

Compila(on (Semester A, 2013/14)

1. Consider the following program in a PCAT-like language.

CS502: Compilers & Programming Systems

CSE 401 Midterm Exam Sample Solution 2/11/15

CS164: Midterm I. Fall 2003

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

ECE251 Midterm practice questions, Fall 2010

Syntax Analysis, III Comp 412

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky

CMSC 330: Organization of Programming Languages

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Syntax Analysis, III Comp 412

CMSC 330: Organization of Programming Languages

Lecture Notes on Top-Down Predictive LL Parsing

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

LL parsing Nullable, FIRST, and FOLLOW

CMSC 330: Organization of Programming Languages

Topic 3: Syntax Analysis I

Intro To Parsing. Step By Step

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

In this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CS 321 Programming Languages and Compilers. VI. Parsing

LL(1) predictive parsing

CS 536 Midterm Exam Spring 2013

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

CSE 3302 Programming Languages Lecture 2: Syntax

Computing PREDICT and FOLLOW sets

Language Processing note 12 CS

Compilers - Chapter 2: An introduction to syntax analysis (and a complete toy compiler)

LL(1) predictive parsing

CMSC 330: Organization of Programming Languages. Context Free Grammars

Chapter 3: Lexing and Parsing

Question Points Score

CMSC 330, Fall 2009, Practice Problem 3 Solutions

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Transcription:

Top-Down Parsing #1

Extra Credit Question Given this grammar G: E E+T E T T T * int T int T (E) Is the string int * (int + int) in L(G)? Give a derivation or prove that it is not. #2

Revenge of Theory How do we tell if DFA P is equal to DFA Q? We can do: is DFA P empty? How? We can do: P := not Q How? We can do: P := Q intersect R How? So do: is P intersect not Q empty? Does this work for CFG X and CFG Y? Can we tell if s is in CFG X? #3

Outline Recursive Descent Parsing Left Recursion LL(1) Parsing LL(1) Parsing Tables LP(1) Parsing Algorithm Constructing LL(1) Parsing Tables First, Follow #4

In One Slide An LL(1) parser reads tokens from left to right and constructs a top-down leftmost derivation. LL(1) parsing is a special case of recursive descent parsing in which you can predict which single production to use from one token of lookahead. LL(1) parsing is fast and easy, but it does not work if the grammar is ambiguous, left-recursive, or not left-factored (i.e., it does not work for most programming languages). #5

Intro to Top-Down Parsing Terminals are seen in order of appearance in the token stream: t1 t2 t3 t4 t5 The parse tree is constructed From the top From left to right A t1 B C t2 t4 D t3 t4 #6

Recursive Descent Parsing We ll try recursive descent parsing first Try all productions exhaustively, backtrack Consider the grammar E T+E T T ( E ) int int * T Token stream is: int * int Start with top-level non-terminal E Try the rules for E in order #7

Recursive Descent Example Try E0 T1 + E2 E T+E T T ( E ) int int * T Input = int * int Then try a rule for T1 ( E3 ) But ( does not match input token int Try T1 int. Token matches. But + after T1 does not match input token * Try T1 int * T2 This will match but + after T1 will be unmatched Have exhausted the choices for T1 Backtrack to choice for E0 #8

Recursive Descent Example (2) E T+E T T ( E ) int int * T Input = int * int Try E0 T1 Follow same steps as before for T1 And succeed with T1 int * T2 and T2 int With the following parse tree E0 T1 int * T2 int #9

Recursive Descent Parsing Parsing: given a string of tokens t1 t2... tn, find its parse tree Recursive descent parsing: Try all the productions exhaustively At a given moment the fringe of the parse tree is: t1 t2 tk A Try all the productions for A: if A! BC is a production, the new fringe is t1 t2 tk B C Backtrack when the fringe doesn t match the string Stop when there are no more non-terminals #10

When Recursive Descent Does Not Work Consider a production S S a: In the process of parsing S we try the above rule What goes wrong? A left-recursive grammar has S + S for some Recursive descent does not work in such cases It goes into an 1 loop #11

What's Wrong With That Picture? #12

Elimination of Left Recursion Consider the left-recursive grammar S S S generates all strings starting with a and followed by a number of Can rewrite using right-recursion S T T T #13

Example of Eliminating Left Recursion Consider the grammar S!1 S0 ( = 1 and = 0 ) It can be rewritten as S!1T T!0T #14

More Left Recursion Elimination In general S S 1 S n 1 m All strings derived from S start with one of 1,, m and continue with several instances of 1,, n Rewrite as S 1 T m T T 1 T n T #15

General Left Recursion The grammar S A A S is also left-recursive because S + S This left-recursion can also be eliminated See book, Section 2.3 Detecting and eliminating left recursion are popular test questions #16

Summary of Recursive Descent Simple and general parsing strategy Left-recursion must be eliminated first but that can be done automatically Unpopular because of backtracking Thought to be too inefficient (repetition) We can avoid backtracking Sometimes... #17

Predictive Parsers Like recursive descent but parser can predict which production to use By looking at the next few tokens No backtracking Predictive parsers accept LL(k) grammars First L means left-to-right scan of input Second L means leftmost derivation The k means predict based on k tokens of lookahead In practice, LL(1) is used #18

Sometimes Things Are Perfect The.ml-lex format you emit in PA2 Will be the input for PA3 actually the reference.ml-lex will be used It can be parsed with no lookahead You always know just what to do next Ditto with the.ml-ast output of PA3 Just write a few mutually-recursive functions They read in the input, one line at a time #19

LL(1) In recursive descent, for each non-terminal and input token there may be a choice of which production to use LL(1) means that for each non-terminal and token there is only one production that could lead to success Can be specified as a 2D table One dimension for current non-terminal to expand One dimension for next token Each table entry contains one production #20

Predictive Parsing and Left Factoring Recall the grammar E T+E T T int int * T ( E ) Impossible to predict because For T two productions start with int For E it is not clear how to predict A grammar must be left-factored before use for predictive parsing #21

Left-Factoring Example Recall the grammar E T+E T T int int * T ( E ) Factor out common prefixes of productions E T X X + E T ( E ) Y * T #22

Introducing: Parse Tables #23

LL(1) Parsing Table Example Left-factored grammar E T ( E ) X +E Y *T The LL(1) parsing table ($ is a special end marker): T E X Y int * + +E *T ( (E) ) $ #24

LL(1) Parsing Table Example Analysis Consider the [E, int] entry When current non-terminal is E and next input is int, use production E T X This production can generate an int in the first position T E X Y int * + +E *T ( (E) ) $ #25

LL(1) Parsing Table Example Analysis Consider the [Y,+] entry When current non-terminal is Y and current token is +, get rid of Y We ll see later why this is so T E X Y int * + +E *T ( (E) ) $ #26

LL(1) Parsing Tables: Errors Blank entries indicate error situations Consider the [E,*] entry There is no way to derive a string starting with * from non-terminal E T E X Y int * + +E *T ( (E) ) $ #27

Using Parsing Tables Method similar to recursive descent, except For each non-terminal S We look at the next token a And choose the production shown at [S,a] We use a stack to keep track of pending nonterminals We reject when we encounter an error state We accept when we encounter end-of-input #28

LL(1) Parsing Algorithm initialize stack = <S $> next = (pointer to tokens) repeat match stack with <X, rest>: if T[X,*next] = Y1 Yn then stack <Y1 Yn rest> else error () <t, rest>: if t == *next ++ then stack <rest> else error () until stack == < > #29

Stack Input int * + Action ( ) $ +E T (E) E X Y *T #30

Stack E$ Input int * int $ int * + Action ( ) $ +E T (E) E X Y *T #31

Stack E$ $ Input int * int $ int * int $ int * + Action ( ) $ +E T (E) E X Y *T #32

Stack E$ $ X $ int Input int * int $ int * int $ int * int $ * + Action terminal ( ) $ +E T (E) E X Y *T #33

Stack E$ $ X $ YX$ int Input int * int $ int * int $ int * int $ * int $ * + Action terminal *T ( ) $ +E T (E) E X Y *T #34

Stack E$ $ X $ YX$ *$ int Input int * int $ int * int $ int * int $ * int $ * int $ * + Action terminal *T terminal ( ) $ +E T (E) E X Y *T #35

Stack E$ $ X $ YX$ *$ $ int Input int * int $ int * int $ int * int $ * int $ * int $ int $ * + Action terminal *T terminal ( ) $ +E T (E) E X Y *T #36

Stack E$ $ X $ YX$ *$ $ X $ int Input int * int $ int * int $ int * int $ * int $ * int $ int $ int $ * + Action terminal *T terminal terminal ( ) $ +E T (E) E X Y *T #37

Stack E$ $ X $ YX$ *$ $ X $ YX$ int Input int * int $ int * int $ int * int $ * int $ * int $ int $ int $ $ * + Action terminal *T terminal terminal ( ) $ +E T (E) E X Y *T #38

Stack E$ $ X $ YX$ *$ $ X $ YX$ X$ int Input int * int $ int * int $ int * int $ * int $ * int $ int $ int $ $ $ * + Action terminal *T terminal terminal ( ) $ +E T (E) E X Y *T #39

Stack E$ $ X $ YX$ *$ $ X $ YX$ X$ $ int Input int * int $ int * int $ int * int $ * int $ * int $ int $ int $ $ $ $ * + Action terminal *T terminal terminal ACCEPT ( ) $ +E T (E) E X Y *T #40

LL(1) Languages LL(1) languages can be LL(1) parsed A language Q is LL(1) if there exists an LL(1) table such the LL(1) parsing algorithm using that table accepts exactly the strings in Q No table entry can be multiply defined Once we have the table The parsing algorithm is simple and fast No backtracking is necessary Want to generate parsing tables from CFG! #41

Q: Movies (263 / 842) This 1982 Star Trek film features Spock nerve-pinching McCoy, Kirstie Alley "losing" the Kobayashi Maru, and Chekov being mind-controlled by a slug-like alien. Ricardo Montalban is "is intelligent, but not experienced. His pattern indicates two-dimensional thinking."

Q: Music (238 / 842) For two of the following four lines from the 1976 Eagles song Hotel California, give enough words to complete the rhyme. So I called up the captain / "please bring me my wine" Mirrors on the ceiling / pink champagne on ice And in the master's chambers / they gathered for the feast We are programmed to receive / you can checkout any time you like,

Q: Books (727 / 842) Name 5 of the 9 major characters in A. A. Milne's 1926 books about a "bear of very little brain" who composes poetry and eats honey.

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E T int * + E int + int #45

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E T int * + E T The leaves at any point form a string A contains only terminals The input string is b The prefix matches int * int + int The next token is b #46

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E T int * + T E T int int * int + int The leaves at any point form a string A contains only terminals The input string is b The prefix matches The next token is b #47

Top-Down Parsing. Review Top-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal E T int * int * + E T T int int int + int The leaves at any point form a string A contains only terminals The input string is b The prefix matches The next token is b #48

Constructing Predictive Parsing Tables Consider the state S!* A With b the next token Trying to match b There are two possibilities: b belongs to an expansion of A Any A! can be used if b can start a string derived from In this case we say that b 2 First( ) Or #49

Constructing Predictive Parsing Tables b does not belong to an expansion of A The expansion of A is empty and b belongs to an expansion of (e.g., b ) Means that b can appear after A in a derivation of the form S!* Ab We say that b 2 Follow(A) in this case What productions can we use in this case? Any A! can be used if can expand to We say that 2 First(A) in this case #50

Computing First Sets Definition First(X) = { b X * b } { X * } First(b) = { b } For all productions X! A1 An Add First(A1) {} to First(X). Stop if First(A1) Add First(A2) {} to First(X). Stop if First(A2) Add First(An) {} to First(X). Stop if First(An) Add to First(X) (ignore Ai if it is X) #51

Example First Set Computation Recall the grammar E T ( E ) X +E Y *T First sets First( ( ) = { ( } First( First( ) ) = { ) } First( First( int) = { int } First( First( + ) = { + } First( First( * ) = { * } T ) = {int, ( } E ) = {int, ( } X ) = {+, } Y ) = {*, } #52

Computing Follow Sets Definition Follow(X) = { b S * X b } Compute the First sets for all non-terminals first Add $ to Follow(S) (if S is the start non-terminal) For all productions Y! X A1 An Add First(A1) {} to Follow(X). Stop if First(A1) Add First(A2) {} to Follow(X). Stop if First(A2) Add First(An) {} to Follow(X). Stop if First(An) Add Follow(Y) to Follow(X) #53

Example Follow Set Computation Recall the grammar E T ( E ) X +E Y *T Follow sets Follow( + ) = { int, ( } Follow( Follow( ( ) = { int, ( } Follow( Follow( X ) = {$, ) } Follow( Follow( ) ) = {+, ), $} Follow( Follow( int) = {*, +, ), $} * ) = { int, ( } E ) = {), $} T ) = {+, ), $} Y ) = {+, ), $} #54

Constructing LL(1) Parsing Tables Here is how to construct a parsing table T for context-free grammar G For each production A in G do: For each terminal b First( ) do T[A, b] = If! *, for each b Follow(A) do T[A, b] = #55

LL(1) Table Construction Example Recall the grammar E T ( E ) X +E Y *T Where in the row of Y do we put Y! * T? In the columns of First( *T ) = { * } T E X Y int * + +E *T ( (E) ) $ #56

LL(1) Table Construction Example Recall the grammar E T ( E ) X +E Y *T Where in the row of Y we put Y!? In the columns of Follow(Y) = { $, +, ) } T E X Y int * + +E *T ( (E) ) $ #57

Avoid Multiple Definitions! #58

Notes on LL(1) Parsing Tables If any entry is multiply defined then G is not LL(1) If G is ambiguous If G is left recursive If G is not left-factored And in other cases as well Most programming language grammars are not LL(1) (e.g., Java, Ruby, C++, OCaml, Cool, Perl,...) There are tools that build LL(1) tables #59

Simple Parsing Strategies Recursive Descent Parsing But backtracking is too annoying, etc. Predictive Parsing, aka. LL(k) Predict production from k tokens of lookahead Build LL(1) table Parsing using the table is fast and easy But many grammars are not LL(1) (or even LL(k)) Next: a more powerful parsing strategy for grammars that are not LL(1) #60

Homework WA1 (written homework) due Turn in to drop-box. PA2 (Lexer) due You may work in pairs. Keep up with the reading... #61