Syntactic Analysis: Top-Down Parsing


Copyright 2011, Pedro C. Diniz, all rights reserved. Students enrolled in Compilers class at University of Southern California (USC) have explicit permission to make copies of these materials for their personal use.

Parsing Techniques

Top-down parsers (LL(1), recursive descent)
- Start at the root of the parse tree and grow toward leaves
- Pick a production & try to match the input
- Bad pick ⇒ may need to backtrack
- Some grammars are backtrack-free (predictive parsing)

Bottom-up parsers (LR(1), operator precedence)
- Start at the leaves and grow toward root
- As input is consumed, encode possibilities in an internal state
- Start in a state valid for legal first tokens
- Bottom-up parsers handle a large class of grammars

Top-Down Parsing

A top-down parser starts with the root of the parse tree. The root node is labeled with the goal symbol of the grammar.

Top-down parsing algorithm:
Construct the root node of the parse tree
Repeat until the fringe of the parse tree matches the input string
1. At a node labeled A, select a production with A on its lhs and, for each symbol on its rhs, construct the appropriate child
2. When a terminal symbol is added to the fringe and it doesn't match the input, backtrack
3. Find the next node to be expanded (label ∈ NT)

The key is picking the right production in step 1. That choice should be guided by the input string.

Remember the Expression Grammar?

CFG:
1  Goal   → Expr
2  Expr   → Expr + Term
3         | Expr - Term
4         | Term
5  Term   → Term * Factor
6         | Term / Factor
7         | Factor
8  Factor → number
9         | id

And the input x - 2 * y

Example

Let's try x - 2 * y. In the tables, ↑ marks the parser's position in the input, and a rule of "-" means the step matched or advanced rather than expanded.

Rule  Sentential Form    Input
 -    Goal               ↑x - 2 * y
 1    Expr               ↑x - 2 * y
 2    Expr + Term        ↑x - 2 * y
 4    Term + Term        ↑x - 2 * y
 7    Factor + Term      ↑x - 2 * y
 9    <id,x> + Term      ↑x - 2 * y
 -    <id,x> + Term      x ↑- 2 * y

[Figure: partial parse tree. Goal → Expr → Expr + Term, with the left Expr expanded through Term and Factor to <id,x>.]

Leftmost derivation, choose productions in an order that exposes problems.

This worked well, except that "-" doesn't match "+". The parser must backtrack to the point where it expanded Expr.

Example

Continuing with x - 2 * y (↑ marks the parser's position in the input; a rule of "-" means a match or advance step):

Rule  Sentential Form    Input
 -    Goal               ↑x - 2 * y
 1    Expr               ↑x - 2 * y
 3    Expr - Term        ↑x - 2 * y
 4    Term - Term        ↑x - 2 * y
 7    Factor - Term      ↑x - 2 * y
 9    <id,x> - Term      ↑x - 2 * y
 -    <id,x> - Term      x ↑- 2 * y
 -    <id,x> - Term      x - ↑2 * y

[Figure: parse tree. Goal → Expr → Expr - Term, with the left Expr expanded through Term and Factor to <id,x>.]

This time, "-" and "-" matched. We can advance past "-" to look at "2".
Now, we need to expand Term, the last NT on the fringe.

Example

Trying to match the "2" in x - 2 * y:

Rule  Sentential Form       Input
 -    <id,x> - Term         x - ↑2 * y
 7    <id,x> - Factor       x - ↑2 * y
 8    <id,x> - <num,2>      x - ↑2 * y
 -    <id,x> - <num,2>      x - 2 ↑* y

Where are we?
- "2" matches "2"
- We have more input, but no NTs left to expand
- The expansion terminated too soon
- Need to backtrack

Example

Trying again with "2" in x - 2 * y:

Rule  Sentential Form               Input
 -    <id,x> - Term                 x - ↑2 * y
 5    <id,x> - Term * Factor        x - ↑2 * y
 7    <id,x> - Factor * Factor      x - ↑2 * y
 8    <id,x> - <num,2> * Factor     x - ↑2 * y
 -    <id,x> - <num,2> * Factor     x - 2 ↑* y
 -    <id,x> - <num,2> * Factor     x - 2 * ↑y
 9    <id,x> - <num,2> * <id,y>     x - 2 * ↑y
 -    <id,x> - <num,2> * <id,y>     x - 2 * y ↑

[Figure: complete parse tree for x - 2 * y, with Term expanded to Term * Factor over <num,2> and <id,y>.]

This time, we matched & consumed all the input. Success!

Another Possible Parse

Other choices for expansion are possible:

Rule  Sentential Form                    Input
 -    Goal                               ↑x - 2 * y
 1    Expr                               ↑x - 2 * y
 2    Expr + Term                        ↑x - 2 * y
 2    Expr + Term + Term                 ↑x - 2 * y
 2    Expr + Term + Term + Term          ↑x - 2 * y
 2    Expr + Term + Term + Term + ...    ↑x - 2 * y

The parser keeps expanding while consuming no input!

This doesn't terminate (obviously).
- Wrong choice of expansion leads to non-termination
- Non-termination is a bad property for a parser to have
- Parser must make the right choice
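The runaway expansion in the last table can be reproduced in a few lines. This is only an illustrative sketch: the function name `expand_with_rule_2` and the list-of-symbols representation are assumptions made for the example, not part of the slides. It applies rule 2 (Expr → Expr + Term) to the leftmost Expr a fixed number of times and records that no input token is ever matched.

```python
def expand_with_rule_2(steps):
    """Apply Expr -> Expr + Term to the leftmost Expr `steps` times."""
    form = ["Expr"]
    consumed = 0  # number of input tokens matched so far
    history = [(list(form), consumed)]
    for _ in range(steps):
        # The leftmost symbol is always the non-terminal Expr, so no terminal
        # is ever exposed at the left edge and nothing can be matched.
        assert form[0] == "Expr"
        form = ["Expr", "+", "Term"] + form[1:]
        history.append((list(form), consumed))
    return history


for form, consumed in expand_with_rule_2(3):
    print(" ".join(form), "| tokens consumed:", consumed)
```

A real parser has no step bound, so with this choice of production it would grow the sentential form forever.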

Left Recursion

Top-down parsers cannot handle left-recursive grammars.

Formally, a grammar is left recursive if ∃ A ∈ NT such that ∃ a derivation A ⇒+ Aα, for some string α ∈ (NT ∪ T)+

Our expression grammar is left recursive.
- This can lead to non-termination in a top-down parser
- For a top-down parser, any recursion must be right recursion
- We would like to convert the left recursion to right recursion

Non-termination is a bad property in any part of a compiler.

Eliminating Left Recursion

To remove left recursion, we can transform the grammar.

Consider a grammar fragment of the form
  Fee → Fee α
      | β
where neither α nor β start with Fee.

We can rewrite this as
  Fee → β Fie
  Fie → α Fie
      | ε
where Fie is a new non-terminal.

This accepts the same language, but uses only right recursion.

Eliminating Left Recursion

The expression grammar contains two cases of left recursion:
  Expr → Expr + Term        Term → Term * Factor
       | Expr - Term             | Term / Factor
       | Term                    | Factor

Applying the transformation yields
  Expr  → Term Expr'        Term  → Factor Term'
  Expr' → + Term Expr'      Term' → * Factor Term'
        | - Term Expr'            | / Factor Term'
        | ε                       | ε

These fragments use only right recursion. They retain the original left associativity.

Eliminating Left Recursion

Substituting them back into the grammar yields
 1  Goal   → Expr
 2  Expr   → Term Expr'
 3  Expr'  → + Term Expr'
 4         | - Term Expr'
 5         | ε
 6  Term   → Factor Term'
 7  Term'  → * Factor Term'
 8         | / Factor Term'
 9         | ε
10  Factor → number
11         | id
12         | ( Expr )

- This grammar is correct, if somewhat non-intuitive.
- It is left associative, as was the original.
- A top-down parser will terminate using it.
- A top-down parser may need to backtrack with it.

Eliminating Left Recursion

The transformation (above) eliminates immediate left recursion. What about more general, indirect left recursion?

The general algorithm:
  arrange the NTs into some order A1, A2, …, An
  for i ← 1 to n
    for s ← 1 to i - 1
      replace each production Ai → As γ with Ai → δ1 γ | δ2 γ | … | δk γ,
        where As → δ1 | δ2 | … | δk are all the current productions for As
    eliminate any immediate left recursion on Ai using the direct transformation

Must start with 1 to ensure that A1 → A1 β is transformed.

This assumes that the initial grammar has no cycles (Ai ⇒+ Ai), and no epsilon productions (may need to transform grammar).

Eliminating Left Recursion

How does this algorithm work?
1. Impose arbitrary order on the non-terminals
2. Outer loop cycles through NT in order
3. Inner loop ensures that a production expanding Ai has no non-terminal As as the first symbol of its rhs, for s < i
4. Last step in outer loop converts any direct recursion on Ai to right recursion using the transformation shown earlier
5. New non-terminals are added at the end of the order & have no left recursion

At the start of the i-th outer loop iteration:
for all k < i, no production that expands Ak has a non-terminal As as the first symbol of its rhs, for s < k
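The general algorithm above can be sketched as follows. The representation is an assumption made for illustration: a grammar is a dict from non-terminal to a list of right-hand sides, each a list of symbols, with the empty list standing for ε. The helper names and the convention of naming the new non-terminal by appending a prime are likewise hypothetical.

```python
def eliminate_immediate(nt, rhss):
    """Rewrite A -> A a | b as A -> b A' and A' -> a A' | epsilon."""
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nt]
    others = [rhs for rhs in rhss if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: rhss}  # nothing to do for this non-terminal
    new_nt = nt + "'"
    return {
        nt: [rhs + [new_nt] for rhs in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion, visiting NTs in `order`."""
    g = {nt: [list(r) for r in rhss] for nt, rhss in grammar.items()}
    for i, a_i in enumerate(order):
        for a_s in order[:i]:
            expanded = []
            for rhs in g[a_i]:
                if rhs and rhs[0] == a_s:
                    # Replace the leading A_s by each of its current rhs's.
                    expanded += [delta + rhs[1:] for delta in g[a_s]]
                else:
                    expanded.append(rhs)
            g[a_i] = expanded
        g.update(eliminate_immediate(a_i, g[a_i]))
    return g


expression_grammar = {
    "Goal": [["Expr"]],
    "Expr": [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]],
    "Term": [["Term", "*", "Factor"], ["Term", "/", "Factor"], ["Factor"]],
    "Factor": [["number"], ["id"], ["(", "Expr", ")"]],
}
result = eliminate_left_recursion(
    expression_grammar, ["Goal", "Expr", "Term", "Factor"])
for nt, rhss in result.items():
    print(nt, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```

On the expression grammar this reproduces the right-recursive productions for Expr, Expr', Term and Term' given above. Like the slide's algorithm, the sketch assumes the input grammar has no cycles and no ε-productions.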

Example

Order of symbols: G, E, T

Starting grammar:
  G → E
  E → E + T
  E → T
  T → E ~ T
  T → id

1. Ai = G
  G → E
  E → E + T
  E → T
  T → E ~ T
  T → id

2. Ai = E
  G  → E
  E  → T E'
  E' → + T E'
  E' → ε
  T  → E ~ T
  T  → id

3. Ai = T, As = E
  G  → E
  E  → T E'
  E' → + T E'
  E' → ε
  T  → T E' ~ T
  T  → id

4. Ai = T
  G  → E
  E  → T E'
  E' → + T E'
  E' → ε
  T  → id T'
  T' → E' ~ T T'
  T' → ε

Roadmap (Where are we?)

We set out to study parsing
- Specifying syntax
  - Context-free grammars
  - Ambiguity
- Top-down parsers
  - Algorithm & its problem with left recursion
  - Left-recursion removal
- Predictive top-down parsing
  - The LL(1) condition
  - Simple recursive descent parsers
  - Table-driven LL(1) parsers

Picking the Right Production

If it picks the wrong production, a top-down parser may backtrack.
Alternative is to look ahead in input & use context to pick correctly.

How much lookahead is needed?
- In general, an arbitrarily large amount
- Use the Cocke-Younger-Kasami algorithm or Earley's algorithm

Fortunately,
- Large subclasses of CFGs can be parsed with limited lookahead
- Most programming language constructs fall in those subclasses

Among the interesting subclasses are LL(1) and LR(1) grammars.

Predictive Parsing

Basic idea
Given A → α | β, the parser should be able to choose between α and β.

FIRST sets
For some rhs α ∈ G, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α.
That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γ.

We will defer the problem of how to compute FIRST sets until we look at the LL(1) table construction algorithm.

The LL(1) Property
If A → α and A → β both appear in the grammar, we would like
  FIRST(α) ∩ FIRST(β) = ∅
This would allow the parser to make a correct choice with a lookahead of exactly one symbol!

This is almost correct. See the next slide.

Predictive Parsing

What about ε-productions?
They complicate the definition of LL(1).

If A → α and A → β and ε ∈ FIRST(α), then we need to ensure that FIRST(β) is disjoint from FOLLOW(α), too.

FOLLOW(α) is the set of all words in the grammar that can legally appear immediately after an α.

Define FIRST+(α) as
  FIRST(α) ∪ FOLLOW(α), if ε ∈ FIRST(α)
  FIRST(α), otherwise

Then, a grammar is LL(1) iff A → α and A → β implies
  FIRST+(α) ∩ FIRST+(β) = ∅

Predictive Parsing

Given a grammar that has the LL(1) property
- Can write a simple routine to recognize each lhs
- Code is both simple & fast

Consider A → β1 | β2 | β3, with FIRST+(β1), FIRST+(β2) and FIRST+(β3) pairwise disjoint.

  /* find an A */
  if (current_word ∈ FIRST+(β1))
    find a β1 and return true
  else if (current_word ∈ FIRST+(β2))
    find a β2 and return true
  else if (current_word ∈ FIRST+(β3))
    find a β3 and return true
  else
    report an error and return false

Grammars with the LL(1) property are called predictive grammars because the parser can predict the correct expansion at each point in the parse.

Parsers that capitalize on the LL(1) property are called predictive parsers.

One kind of predictive parser is the recursive descent parser.

Recursive Descent Parsing

Recall the expression grammar, after transformation:
 1  Goal   → Expr
 2  Expr   → Term Expr'
 3  Expr'  → + Term Expr'
 4         | - Term Expr'
 5         | ε
 6  Term   → Factor Term'
 7  Term'  → * Factor Term'
 8         | / Factor Term'
 9         | ε
10  Factor → number
11         | id

This produces a parser with six mutually recursive routines:
- Goal
- Expr
- EPrime
- Term
- TPrime
- Factor

Each recognizes one NT or T.

The term descent refers to the direction in which the parse tree is built.
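A minimal sketch of the six routines, written as a recognizer for the transformed grammar above (rules 1 to 11, so no parentheses). The class name `Parser`, the helper `classify`, and the whitespace-separated input format are assumptions made to keep the example short; they are not taken from the slides.

```python
def classify(word):
    """Map a whitespace-separated word to a token kind."""
    if word.isdigit():
        return "num"
    if word.isidentifier():
        return "id"
    return word  # operators stand for themselves


class Parser:
    def __init__(self, text):
        self.tokens = [classify(w) for w in text.split()] + ["EOF"]
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def goal(self):        # Goal -> Expr
        return self.expr() and self.token == "EOF"

    def expr(self):        # Expr -> Term Expr'
        return self.term() and self.expr_prime()

    def expr_prime(self):  # Expr' -> + Term Expr' | - Term Expr' | epsilon
        if self.token in ("+", "-"):
            self.advance()
            return self.term() and self.expr_prime()
        return True        # epsilon: leave the token for the caller to check

    def term(self):        # Term -> Factor Term'
        return self.factor() and self.term_prime()

    def term_prime(self):  # Term' -> * Factor Term' | / Factor Term' | epsilon
        if self.token in ("*", "/"):
            self.advance()
            return self.factor() and self.term_prime()
        return True

    def factor(self):      # Factor -> number | id
        if self.token in ("num", "id"):
            self.advance()
            return True
        return False


print(Parser("x - 2 * y").goal())
print(Parser("x - * y").goal())
```

Each routine looks at exactly one token to pick its alternative, which is what the LL(1) property allows. For brevity the ε cases simply return true instead of testing the token against FOLLOW; an unexpected token is then caught by whichever caller next needs to match it, ultimately the EOF check in `goal`.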

Recursive Descent Parsing (Procedural)

A couple of routines from the expression parser:

  Goal( )
    token ← next_token( );
    if (Expr( ) = true & token = EOF)
      then next compilation step;
      else
        report syntax error;    // looking for EOF, found token
        return false;

  Expr( )
    if (Term( ) = false)
      then return false;
      else return EPrime( );

  Factor( )
    if (token = Number) then
      token ← next_token( );
      return true;
    else if (token = Identifier) then
      token ← next_token( );
      return true;
    else
      report syntax error;      // looking for Number or Identifier, found token instead
      return false;

EPrime, Term, & TPrime follow the same basic lines.

Recursive Descent Parsing

To build a parse tree:
- Augment parsing routines to build nodes
- Pass nodes between routines using a stack
- Node for each symbol on rhs
- Action is to pop rhs nodes, make them children of lhs node, and push this subtree

To build an abstract syntax tree:
- Build fewer nodes
- Put them together in a different order

  Expr( )
    result ← true;
    if (Term( ) = false)
      then return false;
    else if (EPrime( ) = false)
      then result ← false;
    else                        // success ⇒ build a piece of the parse tree
      build an Expr node
      pop EPrime node
      pop Term node
      make EPrime & Term children of Expr
      push Expr node
    return result;

Left Factoring

What if my grammar does not have the LL(1) property?
⇒ Sometimes, we can transform the grammar.

The algorithm:
  ∀ A ∈ NT,
    find the longest prefix α that occurs in two or more right-hand sides of A
    if α ≠ ε then replace all of the A productions,
        A → αβ1 | αβ2 | … | αβn | γ,
      with
        A → α Z | γ
        Z → β1 | β2 | … | βn
      where Z is a new element of NT
  Repeat until no common prefixes remain

Left Factoring

A graphical explanation for the same idea:
  A → αβ1 | αβ2 | αβ3
becomes
  A → α Z
  Z → β1 | β2 | β3

[Figure: before factoring, three edges leave A, labeled αβ1, αβ2 and αβ3; after factoring, a single edge αZ leaves A, and Z branches to β1, β2 and β3.]

Left Factoring (An example)

Consider the following fragment of the expression grammar:
  Factor → Identifier
         | Identifier [ ExprList ]
         | Identifier ( ExprList )

  FIRST(rhs1) = { Identifier }
  FIRST(rhs2) = { Identifier }
  FIRST(rhs3) = { Identifier }

It does not have the LL(1) property.

After left factoring, it becomes
  Factor    → Identifier Arguments
  Arguments → [ ExprList ]
            | ( ExprList )
            | ε

  FIRST(rhs1) = { Identifier }
  FIRST(rhs2) = { [ }
  FIRST(rhs3) = { ( }
  FIRST+(rhs4) = FOLLOW(Arguments) = FOLLOW(Factor)

It has the LL(1) property, provided neither [ nor ( is in FOLLOW(Factor).

This form has the same syntax, with the LL(1) property.

Left Factoring

[Figure: graphically, before factoring, Factor has three branches that all begin with Identifier, so there is no basis for choice. After factoring, Factor has a single Identifier branch followed by a choice among [ ExprList ], ( ExprList ) and ε, so the next word determines the correct choice.]
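The left-factoring algorithm can be sketched like this. The grammar representation (dict of non-terminal to list of symbol lists, empty list for ε) and the generated names such as `Factor_tail1` are assumptions for illustration; the slides call the new non-terminal Z, or Arguments in the example.

```python
def longest_common_prefix(rhss):
    """Longest prefix shared by at least two right-hand sides."""
    best = []
    for i in range(len(rhss)):
        for j in range(i + 1, len(rhss)):
            k = 0
            limit = min(len(rhss[i]), len(rhss[j]))
            while k < limit and rhss[i][k] == rhss[j][k]:
                k += 1
            if k > len(best):
                best = rhss[i][:k]
    return best


def left_factor(grammar):
    """Repeat the factoring step until no common prefixes remain."""
    g = {nt: [list(r) for r in rhss] for nt, rhss in grammar.items()}
    counter = 0
    changed = True
    while changed:
        changed = False
        for nt in list(g):  # snapshot: new non-terminals are added below
            alpha = longest_common_prefix(g[nt])
            if not alpha:
                continue
            counter += 1
            z = f"{nt}_tail{counter}"  # hypothetical name for the new NT
            with_prefix = [r for r in g[nt] if r[:len(alpha)] == alpha]
            without = [r for r in g[nt] if r[:len(alpha)] != alpha]
            g[nt] = [alpha + [z]] + without
            g[z] = [r[len(alpha):] for r in with_prefix]
            changed = True
    return g


factored = left_factor({
    "Factor": [
        ["Identifier"],
        ["Identifier", "[", "ExprList", "]"],
        ["Identifier", "(", "ExprList", ")"],
    ],
})
for nt, rhss in factored.items():
    print(nt, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```

On the Factor fragment this yields one production for Factor and three alternatives for the new non-terminal, matching the factored form above up to the order of alternatives.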

Left Factoring (Generality)

Question
By eliminating left recursion and left factoring, can we transform an arbitrary CFG to a form where it meets the LL(1) condition? (and can be parsed predictively with a single token lookahead?)

Answer
Given a CFG that doesn't meet the LL(1) condition, it is undecidable whether or not an equivalent LL(1) grammar exists.

Example
{a^n 0 b^n | n ≥ 1} ∪ {a^n 1 b^2n | n ≥ 1} has no LL(1) grammar

Recursive Descent (Summary)

1. Build FIRST (and FOLLOW) sets
2. Massage grammar to have LL(1) condition
   a. Remove left recursion
   b. Left factor it
3. Define a procedure for each non-terminal
   a. Implement a case for each right-hand side
   b. Call procedures as needed for non-terminals
4. Add extra code, as needed
   a. Perform context-sensitive checking
   b. Build an IR to record the code

Can we automate this process?

FIRST and FOLLOW Sets

FIRST(α)
For some α ∈ T ∪ NT, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α.
That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γ.

FOLLOW(α)
For some α ∈ NT, define FOLLOW(α) as the set of symbols that can occur immediately after α in a valid sentence.
FOLLOW(S) = {EOF}, where S is the start symbol.

To build FOLLOW sets, we need FIRST sets; the FIRST+ sets in turn need FOLLOW.

Computing FIRST Sets

Define FIRST as
- If α ⇒* aβ, a ∈ T, β ∈ (T ∪ NT)*, then a ∈ FIRST(α)
- If α ⇒* ε, then ε ∈ FIRST(α)
- If α → β1 β2 … βk then a ∈ FIRST(α) if for some i, a ∈ FIRST(βi) and ε ∈ FIRST(β1), …, FIRST(βi-1)

Note: if α = Xβ and ε ∉ FIRST(X), then FIRST(α) = FIRST(X)

To compute FIRST
- Use a fixed-point method
- FIRST(A) ∈ 2^(T ∪ {ε})
- Loop is monotonic
⇒ Algorithm halts

Computing FIRST Sets

  for each x ∈ T, FIRST(x) ← { x }
  for each A ∈ NT, FIRST(A) ← Ø
  while (FIRST sets are still changing)
    for each p ∈ P, of the form A → β,
      if β is ε then
        FIRST(A) ← FIRST(A) ∪ { ε }
      else if β is B1 B2 … Bk then begin
        FIRST(A) ← FIRST(A) ∪ ( FIRST(B1) - { ε } )
        for i ← 1 to k-1 by 1 while ε ∈ FIRST(Bi)
          FIRST(A) ← FIRST(A) ∪ ( FIRST(Bi+1) - { ε } )
        if i = k-1 and ε ∈ FIRST(Bk)
          then FIRST(A) ← FIRST(A) ∪ { ε }
      end
  for each A ∈ NT
    if ε ∈ FIRST(A) then
      FIRST(A) ← FIRST(A) ∪ FOLLOW(A)

The final loop folds FOLLOW(A) into the result, so what it leaves in FIRST(A) is the FIRST+ set used for LL(1) decisions.

Computing FOLLOW Sets

Define FOLLOW as
- Place $ in FOLLOW(S) where S is the start symbol
- If A → αBβ then every a ∈ FIRST(β) other than ε is in FOLLOW(B)
- If A → αB, or A → αBβ where ε ∈ FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B)

Note: [...] FIRST(α)
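The two fixed-point computations can be sketched together. This is an illustrative version under stated assumptions: the grammar is the transformed expression grammar from earlier in the deck, held in a dict whose keys are the non-terminals (any other symbol is treated as a terminal), and the names `first_of_string`, `first_sets` and `follow_sets` are made up for the example. The sketch returns plain FIRST sets and does not perform the final step that folds FOLLOW into FIRST. FOLLOW is computed right to left over each right-hand side, carrying a trailer set of what can come next.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {
    "Goal":   [["Expr"]],
    "Expr":   [["Term", "Expr'"]],
    "Expr'":  [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term":   [["Factor", "Term'"]],
    "Term'":  [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["number"], ["id"], ["(", "Expr", ")"]],
}


def first_of_string(symbols, first):
    """FIRST of a symbol string; a symbol missing from `first` is a terminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}  # every symbol can vanish, or the string is empty


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # sets only grow, so this loop must stop
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_string(rhs, first) - first[lhs]
                if new:
                    first[lhs] |= new
                    changed = True
    return first


def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                trailer = set(follow[lhs])  # what may follow the whole rhs
                for sym in reversed(rhs):
                    if sym in grammar:      # non-terminal
                        if not trailer <= follow[sym]:
                            follow[sym] |= trailer
                            changed = True
                        if EPS in first[sym]:
                            trailer = trailer | (first[sym] - {EPS})
                        else:
                            trailer = set(first[sym])
                    else:                   # terminal
                        trailer = {sym}
    return follow


FIRST = first_sets(GRAMMAR)
FOLLOW = follow_sets(GRAMMAR, FIRST, "Goal")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

Both loops only ever add elements to finite sets, which is the monotonicity argument the slides give for termination.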

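Computing FOLLOW Sets

To compute FOLLOW sets
- Use a fixed-point method
- FOLLOW(A) ∈ 2^(T ∪ {$})
- Loop is monotonic
⇒ Algorithm halts

  for each A ∈ NT, FOLLOW(A) ← Ø
  FOLLOW(S) ← { $ }
  while (FOLLOW sets are still changing)
    for each p ∈ P, of the form A → β1 β2 … βk
      FOLLOW(βk) ← FOLLOW(βk) ∪ FOLLOW(A)
      TRAILER ← FOLLOW(A)
      for i ← k down to 2
        if ε ∈ FIRST(βi) then
          FOLLOW(βi-1) ← FOLLOW(βi-1) ∪ ( FIRST(βi) - { ε } ) ∪ TRAILER
          TRAILER ← TRAILER ∪ ( FIRST(βi) - { ε } )
        else
          FOLLOW(βi-1) ← FOLLOW(βi-1) ∪ FIRST(βi)
          TRAILER ← FIRST(βi)

FOLLOW sets are kept only for non-terminals, so the updates apply only when the symbol being updated is in NT. TRAILER must keep accumulating FIRST(βi) across nullable symbols; otherwise symbols further to the left miss part of their FOLLOW set.

Building Top-Down Parsers

Given an LL(1) grammar, and its FIRST & FOLLOW sets
- Emit a routine for each non-terminal
  - Nest of if-then-else statements to check alternate rhs's
  - Each returns true on success and throws an error on false
  - Simple, working (, perhaps ugly,) code
- This automatically constructs a recursive-descent parser (I don't know of a system that does this)

Improving matters
- Nest of if-then-else statements may be slow
  - Good case statement implementation would be better
- What about a table to encode the options?
  - Interpret the table with a skeleton, as we did in scanning

Example: First and Follow Sets

  E  → T E'
  E' → + T E' | ε
  T  → F T'
  T' → * F T' | ε
  F  → ( E ) | id

  First(F) = { (, id }
  First(T) = First(E) = { (, id }
  First(E') = { +, ε }
  First(T') = { *, ε }

  Follow(E) = { $ }, but since F → ( E ), Follow(E) = { ), $ }
  Follow(E') = { ), $ }
  Follow(T) = Follow(T') = { +, ), $ }, because T is followed by E': First(E') contributes +, and since E' ⇒* ε, Follow(E) contributes ) and $
  Follow(F) = { *, +, ), $ }, because F is followed by T': First(T') contributes *, and since T' ⇒* ε, Follow(T) contributes +, ) and $

Parsing table M for this grammar (rows are non-terminals, columns are input symbols, blank entries are errors):

        id          +             *             (            )          $
  E     E → T E'                                E → T E'
  E'                E' → + T E'                              E' → ε     E' → ε
  T     T → F T'                                T → F T'
  T'                T' → ε        T' → * F T'                T' → ε     T' → ε
  F     F → id                                  F → ( E )

Building Top-Down Parsers

Strategy:
- Encode knowledge in a table
- Use standard skeleton parser to interpret the table

[Figure: the predictive parsing program reads the INPUT buffer (a + b $), manipulates a STACK (X Y Z $), consults the parsing table M, and produces OUTPUT.]

Building the complete table
- Need a row for every NT & a column for every T
- Need a table-driven interpreter for the table

Algorithm: let X be the symbol on top of the symbol stack (TOS) and a the current input symbol. The pair (X, a) determines the action as follows:
- If X = a = $, the parser halts and announces success
- If X = a ≠ $, the parser pops X off the stack and advances the input
- If X is a non-terminal, the parser consults entry M[X, a] of parsing table M. If the entry is not an error entry but a production, i.e., M[X, a] = { X → UVW }, then replace X with WVU (the production's RHS in reverse, so that U ends up on top). If it is an error entry, invoke the error recovery routine.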

LL(1) Skeleton Parser

  token ← next_token()
  push EOF onto Stack
  push the start symbol, S, onto Stack
  TOS ← top of Stack
  loop forever
    if TOS = EOF and token = EOF then
      break & report success            // exit on success
    else if TOS is a terminal then
      if TOS matches token then
        pop Stack                       // recognized TOS
        token ← next_token()
      else report error looking for TOS
    else                                // TOS is a non-terminal
      if TABLE[TOS, token] is A → B1 B2 … Bk then
        pop Stack                       // get rid of A
        push Bk, Bk-1, …, B1            // in that order
      else report error expanding TOS
    TOS ← top of Stack

Building Top-Down Parsers

Building the complete table
- Need a row for every NT & a column for every T
- Need an algorithm to build the table

Filling in M[X, y], X ∈ NT, y ∈ T
1. Entry is the rule X → β, if y ∈ FIRST(β)
2. Entry is the rule X → ε if y ∈ FOLLOW(X) and X → ε ∈ G
3. Entry is error if neither 1 nor 2 define it

If any entry is defined multiple times, G is not LL(1).

This is the LL(1) table construction algorithm.
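The table construction and the skeleton parser can be sketched together. The grammar below is the small E, E', T, T', F expression grammar used as the running example for table-driven parsing, and its FIRST and FOLLOW sets are written out by hand here rather than computed. The function names and the dict-based table keyed by (non-terminal, token) are assumptions made for the example.

```python
EPS, EOF = "ε", "$"

PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", EOF}, "E'": {")", EOF},
          "T": {"+", ")", EOF}, "T'": {"+", ")", EOF},
          "F": {"*", "+", ")", EOF}}


def first_of_rhs(rhs):
    """FIRST of a right-hand side; symbols not in FIRST are terminals."""
    out = set()
    for sym in rhs:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}


def build_table():
    """Fill M[X, y]; a doubly defined entry means the grammar is not LL(1)."""
    table = {}
    for lhs, rhs in PRODUCTIONS:
        lookahead = first_of_rhs(rhs)
        if EPS in lookahead:
            lookahead = (lookahead - {EPS}) | FOLLOW[lhs]
        for tok in lookahead:
            if (lhs, tok) in table:
                raise ValueError(f"not LL(1): conflict at M[{lhs},{tok}]")
            table[(lhs, tok)] = rhs
    return table


def parse(tokens, table, start="E"):
    """Skeleton parser; returns the productions applied, in order."""
    nonterminals = {lhs for lhs, _ in PRODUCTIONS}
    stack = [EOF, start]
    stream = list(tokens) + [EOF]
    pos, output = 0, []
    while True:
        tos, tok = stack[-1], stream[pos]
        if tos == EOF and tok == EOF:
            return output
        if tos not in nonterminals:          # terminal (or EOF) on top
            if tos != tok:
                raise SyntaxError(f"expected {tos}, found {tok}")
            stack.pop()
            pos += 1
        else:
            rhs = table.get((tos, tok))
            if rhs is None:
                raise SyntaxError(f"cannot expand {tos} on {tok}")
            stack.pop()
            stack.extend(reversed(rhs))      # leftmost symbol ends up on top
            output.append((tos, rhs))


table = build_table()
for lhs, rhs in parse("id + id * id".split(), table):
    print(lhs, "→", " ".join(rhs) or EPS)
```

A dict lookup that returns nothing plays the role of an error entry, so only the defined cells of the table need to be stored.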


Example of LL(1) Parsing

Trace of the skeleton parser on the input id + id * id, using the grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. The top of the stack is at the right.

  STACK          INPUT              OUTPUT
  $ E            id + id * id $
  $ E' T         id + id * id $     E → T E'
  $ E' T' F      id + id * id $     T → F T'
  $ E' T' id     id + id * id $     F → id
  $ E' T'        + id * id $
  $ E'           + id * id $        T' → ε
  $ E' T +       + id * id $        E' → + T E'
  $ E' T         id * id $
  $ E' T' F      id * id $          T → F T'
  $ E' T' id     id * id $          F → id
  $ E' T'        * id $
  $ E' T' F *    * id $             T' → * F T'
  $ E' T' F      id $
  $ E' T' id     id $               F → id
  $ E' T'        $
  $ E'           $                  T' → ε
  $              $                  E' → ε

[Figure: the resulting parse tree for id + id * id.]

Error Recovery in Predictive Parsing

What happens when the parsing table entry is empty?
- Announce error, stop and terminate!?
- Engage in error recovery mode:
  - Panic-mode: skip symbols on the input until a token in a synchronizing (synch) set of tokens appears on the input; complete entries to the table
  - Phrase-level mode: invoke an external (possibly programmer-defined) procedure that manipulates the stack and the input; less structure, more ad-hoc

Panic-Mode Error Recovery

No universally accepted method. Heuristics to fill in empty table entries include:
- Place all symbols in Follow(A) in the synch set of the non-terminal A; skip input tokens until one of the elements of synch is seen and then pop A. This pretends that we have seen A and successfully parsed it.
- Use the hierarchical relation between grammar symbols (e.g., expr and stats). Use First(H) as synch of lower non-terminal symbols. In effect, skip or ignore lower constructs, popping them off the stack.
- Add First(A) to the synch set of A without popping. Skip input until they match. Try to move on to the beginning of the next occurrence of A.
- If A ⇒* ε, then try to use this production as the default and proceed.
- If a terminal cannot be matched, pop it from the stack, in effect mimicking its insertion in the input stream.
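One way to combine a few of these heuristics is sketched below, for the same E, E', T, T', F grammar. The choices are assumptions for illustration, not the deck's definitive method: Follow(A) serves as the synch set of A, an unmatched terminal is popped, and the start symbol is never popped while it is the only symbol left on the stack (the offending token is skipped instead). The table and FOLLOW sets are written out by hand, and `parse_with_recovery` is a hypothetical name.

```python
EOF = "$"

TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", EOF): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", EOF): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
FOLLOW = {"E": {")", EOF}, "E'": {")", EOF},
          "T": {"+", ")", EOF}, "T'": {"+", ")", EOF},
          "F": {"*", "+", ")", EOF}}


def parse_with_recovery(tokens, start="E"):
    """Table-driven parse that records errors instead of stopping."""
    stack = [EOF, start]
    stream = list(tokens) + [EOF]
    pos, errors = 0, []
    while True:
        tos, tok = stack[-1], stream[pos]
        if tos == EOF:
            if tok == EOF:
                return errors
            errors.append(f"skip {tok}: input left after the parse finished")
            pos += 1
        elif tos not in FOLLOW:                 # terminal on top of stack
            if tos == tok:
                pos += 1
            else:
                errors.append(f"pop {tos}: missing from the input")
            stack.pop()
        elif (tos, tok) in TABLE:               # normal expansion
            stack.pop()
            stack.extend(reversed(TABLE[(tos, tok)]))
        elif tok == EOF or (tok in FOLLOW[tos] and len(stack) > 2):
            errors.append(f"pop {tos}: {tok} is in its synch set")
            stack.pop()                         # pretend tos was parsed
        else:
            errors.append(f"skip {tok}: no table entry for {tos}")
            pos += 1


for message in parse_with_recovery(") id * + id".split()):
    print(message)
```

On the input ) id * + id the sketch reports two errors, skipping the stray ) and popping F when + arrives, and then finishes the parse. Every iteration either advances the input or shrinks or expands the stack through a valid table entry, so the loop cannot stall on bad input.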

Panic-mode Error Recovery Example

[Table: STACK / INPUT / REMARK trace of panic-mode recovery on an input that begins with ) id * + ... Two remarks survive: "error: skip ) until id in First(...)" and "error: skip + until id in First(...)". The accompanying parsing table carries the synch set { (, id } in its entries.]

Summary

Top-Down Parsing
- Predictive-Procedural Parsing
- Eliminating Left-Recursion & Left Factoring
- First and Follow Sets
- Table-driven Parsing
- Error Recovery