Building a Parser III

Administrivia
  CS164, 3:30-5:00, 10 Evans
  WA1 due on Thu
  PA2 in a week
  Slides on the web site
    I do my best to have slides ready and posted by the end of the preceding logical day
    yesterday, my best was not good enough

Overview
  Finish recursive descent parser
    when it breaks down and how to fix it
    eliminating left recursion
    reordering productions
  Predictive parsers (aka LL(1) parsers)
    computing FIRST, FOLLOW
    table-driven, stack-manipulating version of the parser

Review: grammar for arithmetic expressions
  Simple arithmetic expressions:
    E → n | id | ( E ) | E + E | E * E
  Some elements of this language:
    id
    n
    ( n )
    n + id
    id * ( id + id )

Review: derivation
  Grammar: E → n | id | ( E ) | E + E | E * E
  a derivation:
    E        rewrite E with ( E )
    ( E )    rewrite E with n
    ( n )    this is the final string of terminals
  another derivation (written more concisely):
    E → ( E ) → ( E * E ) → ( E + E * E ) → ( n + E * E ) → ( n + id * E ) → ( n + id * id )
  this is a left-most derivation (remember it)
    always expand the left-most non-terminal
    can you guess what a right-most derivation is?

Recursive Descent Parsing
  Consider the grammar
    E → T + E | T
    T → int | int * T | ( E )
  Token stream is: int5 * int2
  Start with top-level non-terminal E
  Try the rules for E in order
Recursive-Descent Parsing
  Parsing: given a string of tokens t1 t2 ... tn, find its parse tree
  Recursive-descent parsing: try all the productions exhaustively
    At a given moment the fringe of the parse tree is: t1 t2 ... tk A ...
    Try all the productions for A: if A → B C is a production, the new fringe is t1 t2 ... tk B C ...
    Backtrack when the fringe doesn't match the string
    Stop when there are no more non-terminals

When Recursive Descent Does Not Work
  Consider a production S → S a:
    In the process of parsing S we try the above rule
    What goes wrong?
  A fix?
    S must have a non-recursive production, say S → b
    expand this production before you expand S → S a
  Problems remain
    performance (steps needed to parse "baaaaa")
    termination (parse the error input "c")

Solutions
  First, restrict backtracking
    backtrack just enough to produce a sufficiently powerful r.d. parser
  Second, eliminate left recursion
    transformation that produces a different grammar
    the new grammar generates the same strings
    but does it give us the same parse tree as the old grammar?
  Let's see the restricted r.d. parser first

A Recursive Descent Parser (1)
  Define boolean functions that check the token string for a match of
    A given token terminal
      bool term(TOKEN tok) { return in[next++] == tok; }
    A given production of S (the nth)
      bool Sn() { ... }
    Any production of S:
      bool S() { ... }
  These functions advance next

A Recursive Descent Parser (2)
  For production E → T + E
    bool E1() { return T() && term(PLUS) && E(); }
  For production E → T
    bool E2() { return T(); }
  For all productions of E (with backtracking)
    bool E() {
      int save = next;
      return (next = save, E1()) || (next = save, E2());
    }

A Recursive Descent Parser (3)
  Functions for non-terminal T
    bool T1() { return term(OPEN) && E() && term(CLOSE); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(INT); }
    bool T() {
      int save = next;
      return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
    }
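The functions on the slides above can be assembled into a complete program. The following is a minimal sketch, not the course's reference implementation: the TOKEN enum, the END end-of-input marker, the globals in and next, and the parse() driver are assumptions made here so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds. END is an end-of-input marker assumed by this sketch. */
typedef enum { INT, PLUS, TIMES, OPEN, CLOSE, END } TOKEN;

static const TOKEN *in;  /* token array being parsed */
static size_t next;      /* index of the next unread token */

static bool E(void);
static bool T(void);

/* Match one terminal; advances next whether or not the match succeeds. */
static bool term(TOKEN tok) { return in[next++] == tok; }

/* E -> T + E */
static bool E1(void) { return T() && term(PLUS) && E(); }
/* E -> T */
static bool E2(void) { return T(); }

/* Any production of E: restore next before trying each alternative. */
static bool E(void) {
    size_t save = next;
    return (next = save, E1()) || (next = save, E2());
}

/* T -> ( E ) */
static bool T1(void) { return term(OPEN) && E() && term(CLOSE); }
/* T -> int * T */
static bool T2(void) { return term(INT) && term(TIMES) && T(); }
/* T -> int */
static bool T3(void) { return term(INT); }

static bool T(void) {
    size_t save = next;
    return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
}

/* Hypothetical driver: accept iff E matches and all input is consumed. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    in = tokens;
    next = 0;
    return E() && term(END);
}
```

Calling parse() on the token array { INT, TIMES, INT, END } tries T1, then T2, and backtracks inside E exactly as the slides describe. Because a failed term() call still advances next, every alternative must reset next to save before it starts.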
Recursive Descent Parsing. Notes.
  To start the parser
    Initialize next to point to the first token
    Invoke E()
  Notice how this simulates our backtracking example from lecture
  Easy to implement by hand
  Predictive parsing is more efficient

Now back to left-recursive grammars
  Does this style of r.d. parser work for our left-recursive grammar?
    the grammar: S → S a | b
    what happens when S → S a is expanded first?
    what happens when S → b is expanded first?

Left-recursive grammars
  A left-recursive grammar has a non-terminal S such that
    S →+ S α for some α
  Recursive descent does not work in such cases
    It goes into an infinite loop
  Notes:
    α: a shorthand for any string of terminals and non-terminals
    the symbol →+ is a shorthand for "can be derived in one or more steps":
      S →+ S α is the same as S → ... → S α

Elimination of Left Recursion
  Consider the left-recursive grammar
    S → S α | β
  S generates all strings starting with a β and followed by any number of α
  Can rewrite using right-recursion
    S → β S'
    S' → α S' | ε

Elimination of Left-Recursion. Example
  Consider the grammar
    S → 1 | S 0     ( β = 1 and α = 0 )
  can be rewritten as
    S → 1 S'
    S' → 0 S' | ε

More Elimination of Left-Recursion
  In general
    S → S α1 | ... | S αn | β1 | ... | βm
  All strings derived from S start with one of β1, ..., βm and continue with several instances of α1, ..., αn
  Rewrite as
    S → β1 S' | ... | βm S'
    S' → α1 S' | ... | αn S' | ε
General Left Recursion
  The grammar
    S → A α | δ
    A → S β
  is also left-recursive because
    S →+ S β α
  This left-recursion can also be eliminated
  See [ASU], Section 4.3 for the general algorithm

Summary of Recursive Descent
  simple parsing strategy
    left-recursion must be eliminated first
    but that can be done automatically
  unpopular because of backtracking
    thought to be too inefficient
    in practice, backtracking is (sufficiently) eliminated by restricting the grammar
  so, it's good enough for small languages
    careful, though: the order of productions is important even after left-recursion is eliminated
    try to reverse the order of E → T + E | T
    what goes wrong?

Motivation
  Wouldn't it be nice if the r.d. parser just knew which production to expand next?
  Idea: replace
      return (next = save, E1()) || (next = save, E2()); }
    with
      switch ( something ) {
        case L1: return E1();
        case L2: return E2();
        otherwise: print "syntax error";
      }
  what's "something", L1, L2?
    the parser will do lookahead (look at the next token)

Predictive parsers

Predictive Parsers
  Like recursive-descent, but the parser can "predict" which production to use
    By looking at the next few tokens
    No backtracking
  Predictive parsers accept LL(k) grammars
    L means "left-to-right" scan of input
    L means "leftmost derivation"
    k means "predict based on k tokens of lookahead"
  In practice, LL(1) is used

LL(1) Languages
  In recursive-descent, for each non-terminal and input token there may be a choice of production
  LL(1) means that for each non-terminal and token there is only one production that could lead to success
  Can be specified as a 2D table
    One dimension for the current non-terminal to expand
    One dimension for the next token
    A table entry contains one production
Left factoring

Predictive Parsing and Left Factoring
  Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
  Impossible to predict because
    For T, two productions start with int
    For E, it is not clear how to predict
  A grammar must be left-factored before it is used for predictive parsing

Left-Factoring Example
  Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
  Factor out common prefixes of productions
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε

LL(1) parser (details)

LL(1) parser
  to simplify things, instead of
    switch ( something ) {
      case L1: return E1();
      case L2: return E2();
      otherwise: print "syntax error";
    }
  we'll use an LL(1) table and a parse stack
    the LL(1) table will replace the switch
    the parse stack will replace the call stack

LL(1) Parsing Table Example
  Left-factored grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
  The LL(1) parsing table:

         int      *      +      (        )      $
    E    T X                    T X
    X                    + E             ε      ε
    T    int Y                  ( E )
    Y             * T    ε               ε      ε
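Before moving to the table and stack, the switch idea can be tried directly on the left-factored grammar. The sketch below is one possible reading of it, not code from the lecture: each non-terminal becomes a function that switches on the next token, and the case labels are the non-blank columns of that non-terminal's row in the table above. The TOKEN enum, END marker, globals, and parse() driver are assumptions of this sketch, and term() here advances only on a successful match, unlike the backtracking version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds. END plays the role of $ and is an assumption of this sketch. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, END } TOKEN;

static const TOKEN *in;  /* token array being parsed */
static size_t next;      /* index of the lookahead token */

static bool E(void);
static bool T(void);

/* Match one terminal; advances only on success, so no state to restore. */
static bool term(TOKEN tok) {
    if (in[next] != tok) return false;
    next++;
    return true;
}

/* X -> + E | epsilon */
static bool X(void) {
    switch (in[next]) {
    case PLUS:  return term(PLUS) && E();
    case CLOSE:
    case END:   return true;            /* epsilon: consume nothing */
    default:    return false;           /* blank table entry: syntax error */
    }
}

/* Y -> * T | epsilon */
static bool Y(void) {
    switch (in[next]) {
    case TIMES: return term(TIMES) && T();
    case PLUS:
    case CLOSE:
    case END:   return true;            /* epsilon */
    default:    return false;
    }
}

/* T -> ( E ) | int Y */
static bool T(void) {
    switch (in[next]) {
    case OPEN:  return term(OPEN) && E() && term(CLOSE);
    case INT:   return term(INT) && Y();
    default:    return false;
    }
}

/* E -> T X */
static bool E(void) {
    switch (in[next]) {
    case INT:
    case OPEN:  return T() && X();
    default:    return false;
    }
}

/* Hypothetical driver: accept iff E matches and only END remains. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    in = tokens;
    next = 0;
    return E() && term(END);
}
```

No function saves or restores next, which is the point of prediction: one token of lookahead selects the production, so nothing ever has to be undone.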
LL(1) Parsing Table Example (Cont.)
  Consider the [E, int] entry
    When the current non-terminal is E and the next input is int, use production E → T X
    This production can generate an int in the first place
  Consider the [Y, +] entry
    When the current non-terminal is Y and the current token is +, get rid of Y
    We'll see later why this is so

LL(1) Parsing Tables. Errors
  Blank entries indicate error situations
    Consider the [E, *] entry
    There is no way to derive a string starting with * from non-terminal E

Using Parsing Tables
  Method similar to recursive descent, except
    For each non-terminal S
    We look at the next token a
    And choose the production shown at [S, a]
  We use a stack to keep track of pending non-terminals
  We reject when we encounter an error state
  We accept when we encounter end-of-input

LL(1) Parsing Algorithm
  initialize stack = <S $> and next (pointer to tokens)
  repeat
    case stack of
      <X, rest> : if T[X, *next] = Y1 ... Yn
                    then stack ← <Y1 ... Yn rest>;
                    else error();
      <t, rest> : if t == *next++
                    then stack ← <rest>;
                    else error();
  until stack == < >

LL(1) Parsing Example

    Stack          Input          Action
    E $            int * int $    T X
    T X $          int * int $    int Y
    int Y X $      int * int $    terminal
    Y X $          * int $        * T
    * T X $        * int $        terminal
    T X $          int $          int Y
    int Y X $      int $          terminal
    Y X $          $              ε
    X $            $              ε
    $              $              ACCEPT

Constructing Parsing Tables
  LL(1) languages are those defined by a parsing table for the LL(1) algorithm
  No table entry can be multiply defined
  We want to generate parsing tables from a CFG
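The LL(1) parsing algorithm above can be sketched in C with the table for the left-factored grammar written out by hand. This is an illustrative sketch under stated assumptions: the Sym enum, the Prod struct with its used flag (false marks a blank entry), the fixed-size stack, and the name ll1_parse are all choices made here, not part of the lecture.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals first (END is $), then non-terminals. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, END, E, X, T, Y } Sym;
enum { NTERM = E, NNONTERM = Y - E + 1, MAXSTACK = 256 };

/* One table entry: used == false marks a blank (error) entry. */
typedef struct { bool used; int len; Sym rhs[3]; } Prod;

#define EPSILON {true, 0, {INT}}   /* empty right-hand side; rhs is ignored */

/* The LL(1) table from the slides: rows are non-terminals, columns tokens. */
static const Prod table[NNONTERM][NTERM] = {
    [E - E] = { [INT]   = {true, 2, {T, X}},
                [OPEN]  = {true, 2, {T, X}} },
    [X - E] = { [PLUS]  = {true, 2, {PLUS, E}},
                [CLOSE] = EPSILON,
                [END]   = EPSILON },
    [T - E] = { [INT]   = {true, 2, {INT, Y}},
                [OPEN]  = {true, 3, {OPEN, E, CLOSE}} },
    [Y - E] = { [TIMES] = {true, 2, {TIMES, T}},
                [PLUS]  = EPSILON,
                [CLOSE] = EPSILON,
                [END]   = EPSILON },
};

/* input must contain only terminals and be terminated by END. */
bool ll1_parse(const Sym *input) {
    Sym stack[MAXSTACK];
    int top = 0;
    const Sym *next = input;

    stack[top++] = END;   /* bottom of stack: $ */
    stack[top++] = E;     /* start symbol on top */

    while (top > 0) {
        Sym s = stack[--top];
        if (s >= E) {                          /* non-terminal: consult table */
            assert(*next < NTERM);
            const Prod *p = &table[s - E][*next];
            if (!p->used) return false;        /* blank entry: error */
            assert(top + p->len <= MAXSTACK);
            for (int i = p->len - 1; i >= 0; i--)
                stack[top++] = p->rhs[i];      /* push RHS, leftmost on top */
        } else {                               /* terminal: must match input */
            if (s != *next) return false;
            next++;
        }
    }
    return true;                               /* stack empty: accept */
}
```

Running ll1_parse on { INT, TIMES, INT, END } goes through the same stack contents as the example trace. The right-hand side is pushed in reverse so that its leftmost symbol ends up on top of the stack, which keeps the derivation leftmost.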
Constructing Predictive Parsing Tables
  Consider the state S →* β A γ
    With b the next token
    Trying to match β b δ
  There are two possibilities:
  1. b belongs to an expansion of A
    Any A → α can be used if b can start a string derived from α
    In this case we say that b ∈ First(α)
  Or...

Constructing Predictive Parsing Tables (Cont.)
  2. b does not belong to an expansion of A
    The expansion of A is empty and b belongs to an expansion of γ
    Means that b can appear after A in a derivation of the form S →* β A b ω
    We say that b ∈ Follow(A) in this case
    What productions can we use in this case?
      Any A → α can be used if α can expand to ε
      We say that ε ∈ First(A) in this case

Computing First, Follow sets

First Sets. Example
  Recall the grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
  First sets
    First( ( ) = { ( }          First( T ) = { int, ( }
    First( ) ) = { ) }          First( E ) = { int, ( }
    First( int ) = { int }      First( X ) = { +, ε }
    First( + ) = { + }          First( Y ) = { *, ε }
    First( * ) = { * }

Computing First Sets
  Definition: First(X) = { b | X →* b α } ∪ { ε | X →* ε }
  1. First(b) = { b }
  2. For all productions X → A1 ... An
    Add First(A1) - {ε} to First(X). Stop if ε ∉ First(A1)
    Add First(A2) - {ε} to First(X). Stop if ε ∉ First(A2)
    ...
    Add First(An) - {ε} to First(X). Stop if ε ∉ First(An)
    Add ε to First(X)
  3. Repeat step 2 until no First set grows

Follow Sets. Example
  Recall the grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
  Follow sets
    Follow( + ) = { int, ( }    Follow( * ) = { int, ( }
    Follow( ( ) = { int, ( }    Follow( E ) = { ), $ }
    Follow( X ) = { $, ) }      Follow( T ) = { +, ), $ }
    Follow( ) ) = { +, ), $ }   Follow( Y ) = { +, ), $ }
    Follow( int ) = { *, +, ), $ }
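The three steps for computing First sets can be sketched as a fixed-point iteration over the example grammar. The encoding is an assumption of this sketch: sets are bit masks indexed by the Sym enum, one extra bit (EPS) stands for ε, and a production of length 0 encodes an ε right-hand side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Terminals first (END is $), then non-terminals. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, END, E, X, T, Y, NSYM } Sym;
enum { EPS = NSYM };                  /* bit index that stands for epsilon */
#define BIT(s) (1u << (s))

typedef struct { Sym lhs; int len; Sym rhs[3]; } Prod;

/* The left-factored grammar; len == 0 encodes an epsilon production. */
static const Prod grammar[] = {
    {E, 2, {T, X}},
    {X, 2, {PLUS, E}},
    {X, 0, {INT}},                    /* X -> epsilon (rhs ignored) */
    {T, 3, {OPEN, E, CLOSE}},
    {T, 2, {INT, Y}},
    {Y, 2, {TIMES, T}},
    {Y, 0, {INT}},                    /* Y -> epsilon (rhs ignored) */
};
#define NPROD (sizeof grammar / sizeof grammar[0])

/* Fills first[s] with a bit set over the terminals plus EPS. */
void compute_first(unsigned first[NSYM]) {
    /* Step 1: First(b) = { b } for terminals; non-terminals start empty. */
    for (int s = 0; s < NSYM; s++)
        first[s] = (s < E) ? BIT(s) : 0u;

    bool grew = true;
    while (grew) {                            /* Step 3: repeat until stable */
        grew = false;
        for (size_t p = 0; p < NPROD; p++) {  /* Step 2: each X -> A1 ... An */
            const Prod *pr = &grammar[p];
            assert(pr->len <= 3);
            unsigned add = 0;
            int i = 0;
            for (; i < pr->len; i++) {
                unsigned f = first[pr->rhs[i]];
                add |= f & ~BIT(EPS);         /* First(Ai) - {epsilon} */
                if (!(f & BIT(EPS))) break;   /* stop: Ai cannot vanish */
            }
            if (i == pr->len)
                add |= BIT(EPS);              /* every Ai can derive epsilon */
            if ((first[pr->lhs] | add) != first[pr->lhs]) {
                first[pr->lhs] |= add;
                grew = true;
            }
        }
    }
}
```

On this grammar the loop settles after three passes: the first pass fills in X, T and Y, the second gives E the contents of First(T), and the third detects that nothing changed.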
Computing Follow Sets
  Definition: Follow(X) = { b | S →* β X b δ }
  1. Compute the First sets for all non-terminals first
  2. Add $ to Follow(S) (if S is the start non-terminal)
  3. For all productions Y → ... X A1 ... An
    Add First(A1) - {ε} to Follow(X). Stop if ε ∉ First(A1)
    Add First(A2) - {ε} to Follow(X). Stop if ε ∉ First(A2)
    ...
    Add First(An) - {ε} to Follow(X). Stop if ε ∉ First(An)
    Add Follow(Y) to Follow(X)
  4. Repeat step 3 until no Follow set grows

Constructing the LL(1) parsing table

Constructing LL(1) Parsing Tables
  Construct a parsing table T for CFG G
  For each production A → α in G do:
    For each terminal b ∈ First(α) do
      T[A, b] = α
    If α →* ε, for each b ∈ Follow(A) do
      T[A, b] = α
    If α →* ε and $ ∈ Follow(A) do
      T[A, $] = α

Constructing LL(1) Tables. Example
  Recall the grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
  Where in the row of Y do we put Y → * T?
    In the columns of First( * T ) = { * }
  Where in the row of Y do we put Y → ε?
    In the columns of Follow(Y) = { $, +, ) }

Notes on LL(1) Parsing Tables
  If any entry is multiply defined then G is not LL(1)
    If G is ambiguous
    If G is left recursive
    If G is not left-factored
    And in other cases as well
  Most programming language grammars are not LL(1)
  There are tools that build LL(1) tables

Notes and Review
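Returning to the Follow-set steps and the table-construction rule above, both can be sketched over the same example grammar. Several things here are assumptions of the sketch rather than part of the lecture: sets are bit masks, the First sets are copied from the example slide instead of being recomputed, Follow is tracked for non-terminals only (the table needs nothing else), and table_columns is a hypothetical helper that returns the columns in which a production is entered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, END, E, X, T, Y, NSYM } Sym;
enum { EPS = NSYM };                  /* bit index that stands for epsilon */
#define BIT(s) (1u << (s))

typedef struct { Sym lhs; int len; Sym rhs[3]; } Prod;

/* The left-factored grammar; len == 0 encodes an epsilon production. */
static const Prod grammar[] = {
    {E, 2, {T, X}},
    {X, 2, {PLUS, E}},        {X, 0, {INT}},
    {T, 3, {OPEN, E, CLOSE}}, {T, 2, {INT, Y}},
    {Y, 2, {TIMES, T}},       {Y, 0, {INT}},
};
#define NPROD (sizeof grammar / sizeof grammar[0])

/* Step 1: First sets, copied here from the example slide. */
static const unsigned first[NSYM] = {
    [INT] = BIT(INT), [TIMES] = BIT(TIMES), [PLUS] = BIT(PLUS),
    [OPEN] = BIT(OPEN), [CLOSE] = BIT(CLOSE), [END] = BIT(END),
    [E] = BIT(INT) | BIT(OPEN), [X] = BIT(PLUS) | BIT(EPS),
    [T] = BIT(INT) | BIT(OPEN), [Y] = BIT(TIMES) | BIT(EPS),
};

/* First of the symbol string rhs[from..len), with EPS if it can vanish. */
static unsigned first_of_tail(const Prod *pr, int from) {
    unsigned set = 0;
    for (int i = from; i < pr->len; i++) {
        unsigned f = first[pr->rhs[i]];
        set |= f & ~BIT(EPS);
        if (!(f & BIT(EPS))) return set;      /* stop: Ai cannot vanish */
    }
    return set | BIT(EPS);
}

/* Fills follow[] for the non-terminals; terminal entries stay empty. */
void compute_follow(unsigned follow[NSYM]) {
    for (int s = 0; s < NSYM; s++) follow[s] = 0;
    follow[E] = BIT(END);                     /* Step 2: $ in Follow(start) */
    bool grew = true;
    while (grew) {                            /* Step 4: repeat until stable */
        grew = false;
        for (size_t p = 0; p < NPROD; p++) {  /* Step 3 */
            const Prod *pr = &grammar[p];
            assert(pr->len <= 3);
            for (int k = 0; k < pr->len; k++) {
                Sym x = pr->rhs[k];
                if (x < E) continue;          /* skip terminals */
                unsigned t = first_of_tail(pr, k + 1);
                unsigned add = t & ~BIT(EPS);
                if (t & BIT(EPS)) add |= follow[pr->lhs];
                if ((follow[x] | add) != follow[x]) {
                    follow[x] |= add;
                    grew = true;
                }
            }
        }
    }
}

/* Columns of row A in which production A -> alpha is entered. */
unsigned table_columns(const Prod *pr, const unsigned follow[NSYM]) {
    unsigned t = first_of_tail(pr, 0);        /* First(alpha) */
    unsigned cols = t & ~BIT(EPS);
    if (t & BIT(EPS)) cols |= follow[pr->lhs];
    return cols;
}
```

With follow filled in by compute_follow, table_columns applied to Y → * T yields only the * column, and applied to Y → ε it yields the columns of Follow(Y), matching the example slide. If two productions of the same non-terminal ever returned overlapping columns, that table entry would be multiply defined and the grammar would not be LL(1).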
Top-down Parsing. Review
  Top-down parsing expands a parse tree from the start symbol to the leaves
    Always expand the leftmost non-terminal
  The leaves at any point form a string β A γ
    β contains only terminals
    The input string is β b δ
    The prefix β matches
    The next token is b
  [Figure: partially expanded parse tree over the input int * int + int]

Predictive Parsing. Review.
  A predictive parser is described by a table
    For each non-terminal A and for each token b we specify a production A → α
    When trying to expand A we use A → α if b follows next
  Once we have the table
    The parsing algorithm is simple and fast
    No backtracking is necessary