Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Similar documents
Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Extra Credit Question

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Top down vs. bottom up parsing

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Compilers. Predictive Parsing. Alex Aiken

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Error Handling Syntax-Directed Translation Recursive Descent Parsing

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Types of parsing. CMSC 430 Lecture 4, Page 1

LL Parsing: A piece of cake after LR

4 (c) parsing. Parsing. Top down vs. bo5om up parsing

CSCI312 Principles of Programming Languages

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

Syntactic Analysis. Top-Down Parsing

CS Parsing 1

Lexical and Syntax Analysis. Top-Down Parsing

Lexical and Syntax Analysis

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

More Bottom-Up Parsing

Chapter 3. Parsing #1

CA Compiler Construction

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS502: Compilers & Programming Systems

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

3. Parsing. Oscar Nierstrasz

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Parsing Part II (Top-down parsing, left-recursion removal)

Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky

Compila(on (Semester A, 2013/14)

Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Monday, September 13, Parsers

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1. Top-Down Parsing. Lect 5. Goutam Biswas

CS 314 Principles of Programming Languages

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Introduction to Bottom-Up Parsing

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

CMSC 330: Organization of Programming Languages

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Introduction to Parsing. Comp 412

Parsing. Lecture 11: Parsing. Recursive Descent Parser. Arithmetic grammar. - drops irrelevant details from parse tree

Table-Driven Parsing

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CS 321 Programming Languages and Compilers. VI. Parsing

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Bottom-Up Parsing. Lecture 11-12

Syntax Analysis, III Comp 412

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Syntax Analysis, III Comp 412

CSE431 Translation of Computer Languages

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

Lecture 8: Deterministic Bottom-Up Parsing

Parsing II Top-down parsing. Comp 412


LL parsing Nullable, FIRST, and FOLLOW

Wednesday, August 31, Parsers

Some Basic Definitions. Some Basic Definitions. Some Basic Definitions. Language Processing Systems. Syntax Analysis (Parsing) Prof.

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Lecture 7: Deterministic Bottom-Up Parsing

Bottom-Up Parsing II. Lecture 8

Lecture Notes on Top-Down Predictive LL Parsing

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Syntax-Directed Translation. Lecture 14

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

Introduction to Parsing. Lecture 8

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Outline. The strategy: shift-reduce parsing. Introduction to Bottom-Up Parsing. A key concept: handles

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

The procedure attempts to "match" the right hand side of some production for a nonterminal.

Transcription:

Administrativia Building a Parser III CS164 3:30-5:00 10 vans WA1 due on hu PA2 in a week Slides on the web site I do my best to have slides ready and posted by the end of the preceding logical day yesterday, my best was not good enough 1 2 Overview Finish recursive descent parser when it breaks down and how to fix it eliminating left recursion reordering productions Predictive parsers (aka LL(1) parsers) computing FIRS, FOLLOW table-driven, stack-manipulating version of the parser Review: grammar for arithmetic expressions Simple arithmetic expressions: d n id ( ) * Some elements of this language: id n ( n ) n id id * ( id id ) 3 4 Review: derivation Recursive Descent Parsing Grammar: d n id ( ) * a derivation: rewrite with ( ) ( ) rewrite with n ( n ) this is the final string of terminals another derivation (written more concisely): d ( ) d ( * ) d ( * ) d ( n * ) d ( n id * ) d ( n id * id ) this is left-most derivation (remember it) always expand the left-most non-terminal can you guess what s right-most derivation? Consider the grammar int int * ( ) oken stream is: int 5 * int 2 Start with top-level non-terminal ry the rules for in order 5 6 1

Recursive-Descent Parsing Parsing: given a string of tokens t 1 t 2... t n, find its parse tree Recursive-descent parsing: ry all the productions exhaustively At a given moment the fringe of the parse tree is: t 1 t 2 t k A ry all the productions for A: if A d BC is a production, the new fringe is t 1 t 2 t k B C Backtrack when the fringe doesn t match the string Stop when there are no more non-terminals When Recursive Descent Does Not Work Consider a production S S a: In the process of parsing S we try the above rule What goes wrong? A fix? S must have a non-recursive production, say S b expand this production before you expand S S a Problems remain performance (steps needed to parse baaaaa ) termination (parse the error input c ) 7 8 Solutions First, restrict backtracking backtrack just enough to produce a sufficiently powerful r.d. parser Second, eliminate left recursion transformation that produces a different grammar the new grammar generates same strings but does it give us same parse tree as old grammar? Let s see the restricted r.d. parser first A Recursive Descent Parser (1) Define boolean functions that check the token string for a match of A given token terminal bool term(okn tok) { return in[next] == tok; } A given production of S (the n th ) bool S n () { } Any production of S: bool S() { } hese functions advance next 9 10 A Recursive Descent Parser (2) A Recursive Descent Parser (3) For production bool 1 () { return () && term(plus) && (); } For production bool 2 () { return (); } For all productions of (with backtracking) bool () { int save = next; return (next = save, 1 ()) (next = save, 2 ()); } Functions for non-terminal bool 1 () { return term(opn) && () && term(clos); } bool 2 () { return term(in) && term(ims) && (); } bool 3 () { return term(in); } bool () { int save = next; return (next = save, 1 ()) (next = save, 2 ()) (next = save, 3 ()); } 11 12 2

Recursive Descent Parsing. Notes. o start the parser Initialize next to point to first token Invoke () Notice how this simulates our backtracking example from lecture asy to implement by hand Predictive parsing is more efficient Now back to left-recursive grammars Does this style of r.d. parser work for our left-recursive grammar? the grammar: S S a b what happens when S S a is expanded first? what happens when S b is expanded first? 13 14 Left-recursive grammars limination of Left Recursion A left-recursive grammar has a non-terminal S S Sα for some α Recursive descent does not work in such cases It goes into an loop Notes: α: a shorthand for any string of terminals, nonterminals symbol is a shorthand for can be derived in one or more steps : S Sα is same as S Sα Consider the left-recursive grammar S S α β S generates all strings starting with a β and followed by a number of α Can rewrite using right-recursion S βs S αs 15 16 limination of Left-Recursion. xample More limination of Left-Recursion Consider the grammar S d 1 S 0 ( β = 1 and α = 0 ) can be rewritten as S d 1 S S d 0 S In general S S α 1 S α n β 1 β m All strings derived from S start with one of β 1,,β m and continue with several instances of α 1,,α n Rewrite as S β 1 S β m S S α 1 S α n S 17 18 3

General Left Recursion he grammar S A α δ A S β is also left-recursive because S S β α his left-recursion can also be eliminated See [ASU], Section 4.3 for general algorithm Summary of Recursive Descent simple parsing strategy left-recursion must be eliminated first but that can be done automatically unpopular because of backtracking thought to be too inefficient in practice, backtracking is (sufficiently) eliminated by restricting the grammar so, it s good enough for small languages careful, though: order of productions important even after left-recursion eliminated try to reverse the order of what goes wrong? 19 20 Motivation Predictive parsers Wouldn t it be nice if the r.d. parser just knew which production to expand next? Idea: replace with return (next = save, 1()) (next = save, 2()); } switch ( something ) { case L1: return 1(); case L2: return 2(); otherwise: print syntax error ; } what s something, L1, L2? the parser will do lookahead (look at next token) 22 Predictive Parsers LL(1) Languages Like recursive-descent but parser can predict which production to use By looking at the next few tokens No backtracking Predictive parsers accept LL(k) grammars L means left-to-right scan of input L means leftmost derivation k means predict based on k tokens of lookahead In practice, LL(1) is used 23 In recursive-descent, for each non-terminal and input token there may be a choice of production LL(1) means that for each non-terminal and token there is only one production that could lead to success Can be specified as a 2D table One dimension for current non-terminal to expand One dimension for next token A table entry contains one production 24 4

Predictive Parsing and Left Factoring int int * ( ) Left factoring Impossible to predict because For two productions start with int For it is not clear how to predict A grammar must be left-factored before use predictive parsing 26 Left-Factoring xample int int * ( ) Factor out common prefixes of productions X X ( ) int Y Y * LL(1) parser (details) 27 LL(1) parser LL(1) Parsing able xample to simplify things, instead of switch ( something ) { case L1: return 1(); case L2: return 2(); otherwise: print syntax error ; Left-factored grammar X ( ) int Y he LL(1) parsing table: X Y * } we ll use a LL(1) table and a parse stack the LL(1) table will replace the switch the parse stack will replace the call stack X int int Y X * ( ( ) X ) $ Y * 29 30 5

LL(1) Parsing able xample (Cont.) Consider the [, int] entry When current non-terminal is and next input is int, use production X his production can generate an int in the first place Consider the [Y,] entry When current non-terminal is Y and current token is, get rid of Y We ll see later why this is so LL(1) Parsing ables. rrors Blank entries indicate error situations Consider the [,*] entry here is no way to derive a string starting with * from non-terminal 31 32 Using Parsing ables LL(1) Parsing Algorithm Method similar to recursive descent, except For each non-terminal S We look at the next token a And choose the production shown at [S,a] We use a stack to keep track of pending nonterminals We reject when we encounter an error state We accept when we encounter end-of-input initialize stack = <S $> and next (pointer to tokens) repeat case stack of <X, rest> : if [X,*next] = Y 1 Y n then stack <Y 1 Y n rest>; else error (); <t, rest> : if t == *next then stack <rest>; else error (); until stack == < > 33 34 LL(1) Parsing xample Constructing Parsing ables Stack Input Action $ int * int $ X X $ int * int $ int Y int Y X $ int * int $ terminal Y X $ * int $ * * X $ * int $ terminal X $ int $ int Y int Y X $ int $ terminal Y X $ $ X $ $ $ $ ACCP LL(1) languages are those defined by a parsing table for the LL(1) algorithm No table entry can be multiply defined We want to generate parsing tables from CFG 35 36 6

Constructing Predictive Parsing ables Consider the state S d * βaγ With b the next token rying to match βbδ here are two possibilities: 1. b belongs to an expansion of A Any A d α can be used if b can start a string derived from α In this case we say that b First(α) Or Constructing Predictive Parsing ables (Cont.) 2. b does not belong to an expansion of A he expansion of A is empty and b belongs to an expansion of γ Means that b can appear after A in a derivation of the form S d * βabω We say that b Follow(A) in this case What productions can we use in this case? Any A d α can be used if α can expand to We say that First(A) in this case 37 38 First Sets. xample Computing First, Follow sets X X ( ) int Y Y * First sets First( ( ) = { ( } First( ) = {int, ( } First( ) ) = { ) } First( ) = {int, ( } First( int) = { int } First( X ) = {, } First( ) = { } First( Y ) = {*, } First( * ) = { * } 40 Computing First Sets Definition First(X) = { b X * bα} { X * } 1. First(b) = { b } 2. For all productions X A 1 A n Add First(A 1 ) {} to First(X). Stop if First(A 1 ) Add First(A 2 ) {} to First(X). Stop if First(A 2 ) Add First(A n ) {} to First(X). Stop if First(A n ) Add to First(X) 3. Repeat step 2 until no First set grows Follow Sets. xample X X ( ) int Y Y * Follow sets Follow( ) = { int, ( } Follow( * ) = { int, ( } Follow( ( ) = { int, ( } Follow( ) = {), $} Follow( X ) = {$, ) } Follow( ) = {, ), $} Follow( ) ) = {, ), $} Follow( Y ) = {, ), $} Follow( int) = {*,, ), $} 41 42 7

Computing Follow Sets Definition Follow(X) = { b S * β X b δ } 1. Compute the First sets for all non-terminals first 2. Add $ to Follow(S) (if S is the start non-terminal) 3. For all productions Y X A 1 A n Add First(A 1 ) {} to Follow(X). Stop if First(A 1 ) Add First(A 2 ) {} to Follow(X). Stop if First(A 2 ) Add First(A n ) {} to Follow(X). Stop if First(A n ) Add Follow(Y) to Follow(X) Constructing the LL(1) parsing table 4. Repeat step 3 until no Follow set grows 43 Constructing LL(1) Parsing ables Constructing LL(1) ables. xample Construct a parsing table for CFG G X X ( ) int Y Y * Where in the line of Y we put Y *? In the lines of First( *) = { * } For each production A αin G do: For each terminal b First(α) do [A, b] = α If α d *, for each b Follow(A) do [A, b] = α If α d * and $ Follow(A) do [A, $] = α Where in the line of Y we put Y d? In the lines of Follow(Y) = { $,, ) } 45 46 Notes on LL(1) Parsing ables If any entry is multiply defined then G is not LL(1) If G is ambiguous If G is left recursive If G is not left-factored And in other cases as well Most programming language grammars are not LL(1) here are tools that build LL(1) tables Notes and Review 47 8

op-down Parsing. Review op-down Parsing. Review op-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal int * int int 49 op-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal he leaves at any point form a string βaγ β contains only terminals int * he input string is βbδ he prefix β matches he next token is b int * int int 50 op-down Parsing. Review op-down Parsing. Review op-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal he leaves at any point form a string βaγ β contains only terminals int * he input string is βbδ he prefix β matches int he next token is b int * int int 51 op-down parsing expands a parse tree from the start symbol to the leaves Always expand the leftmost non-terminal he leaves at any point form a string βaγ β contains only terminals int he input string is βbδ * he prefix β matches int int he next token is b int * int int 52 Predictive Parsing. Review. A predictive parser is described by a table For each non-terminal A and for each token b we specify a production A d α When trying to expand A we use A d α if b follows next Once we have the table he parsing algorithm is simple and fast No backtracking is necessary 53 9