Top-down parsing with a parsing table (once more)

                          CURRENT INPUT TOKEN
VAR   a      b      c      d      e        f      g      h      $
S            b      AaS    AaS    AaS
A                   cB     dB     eCDBfDB
B     ε                                    ε      DB     DB
C                   c      d      eCDBf
D                                                 gC     hC

STACK         CURRENT INPUT   PRODUCTION TO APPLY
S$            cgedhcfab$      S → AaS
AaS$          cgedhcfab$      A → cB
cBaS$         cgedhcfab$      match
BaS$          gedhcfab$       B → DB
DBaS$         gedhcfab$       D → gC
gCBaS$        gedhcfab$       match
CBaS$         edhcfab$        C → eCDBf
eCDBfBaS$     edhcfab$        match
CDBfBaS$      dhcfab$

Panic-mode error recovery

Idea 1: If you have a variable on top of the stack, skip input tokens until a synchronizing token for that variable appears. At that point, pop the variable and try to resume. (Of course, also say something about what has happened.)

When does it (possibly) make sense to discard the variable on top of the stack? When we see a token that can follow whatever that variable can generate.

This idea suggests that the synchronizing tokens for variable A will be the elements of FOLLOW(A).
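The error-free trace at the start of this section can be replayed mechanically. What follows is a minimal Python sketch of the table-driven loop, not the course's own implementation. It assumes that every grammar symbol is a single character, that the parsing table is stored as a dictionary of dictionaries, and that ε is written as the empty string; the names TABLE and parse are illustrative only.

```python
# Parsing table for the example grammar (no synch entries yet).
# Assumption: each entry is the right-hand side to push; "" stands for ε.
TABLE = {
    "S": {"b": "b", "c": "AaS", "d": "AaS", "e": "AaS"},
    "A": {"c": "cB", "d": "dB", "e": "eCDBfDB"},
    "B": {"a": "", "f": "", "g": "DB", "h": "DB"},
    "C": {"c": "c", "d": "d", "e": "eCDBf"},
    "D": {"g": "gC", "h": "hC"},
}


def parse(tokens):
    """Return the productions applied, in order, or raise SyntaxError."""
    stack = ["$", "S"]              # top of the stack is the end of the list
    inp = list(tokens) + ["$"]
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        cur = inp[pos]
        if top in TABLE:            # variable on top: consult the table
            rhs = TABLE[top].get(cur)
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top}, {cur})")
            applied.append(f"{top} -> {rhs or 'ε'}")
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == cur:            # terminal (or $) on top: match
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, saw {cur}")
    return applied


for step in parse("cgedhcfab"):
    print(step)
```

This version simply stops at the first error, which is the behaviour panic-mode recovery is meant to improve on.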
S → AaS | b
A → cB | dB | eCDBfDB
B → DB | ε
C → c | d | eCDBf
D → gC | hC

For every variable A, FOLLOW(A) is the set consisting of
  all terminals a s.t. S ⇒* αAaβ for some strings α, β over V ∪ Σ, along with
  $, if S ⇒* αA for some string α over V ∪ Σ.

We previously saw that FOLLOW(B) = {a, f}.

VARIABLE   FOLLOW SET
S
A
B          {a, f}
C
D

$ ∈ FOLLOW(S) since S ⇒* S
a ∈ FOLLOW(A) since S ⇒ AaS
a, f, g, h ∈ FOLLOW(D) since S ⇒* eCDfDBaS and FIRST(B) = {g, h, ε}
a, f, g, h ∈ FOLLOW(C) since S ⇒* eCgCfgCBaS and FIRST(B) = {g, h, ε}
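The FOLLOW sets argued for above can also be computed by iterating to a fixed point. The sketch below uses the standard textbook FIRST/FOLLOW closure rather than the derivation-based reasoning on the slide. It assumes single-character symbols and writes ε-productions as empty strings; GRAMMAR, first_of, compute_first and compute_follow are illustrative names.

```python
EPS = "ε"
GRAMMAR = {
    "S": ["AaS", "b"],
    "A": ["cB", "dB", "eCDBfDB"],
    "B": ["DB", ""],                 # "" is the ε-production
    "C": ["c", "d", "eCDBf"],
    "D": ["gC", "hC"],
}


def first_of(symbols, first):
    """FIRST of a string of symbols, given the current FIRST of each variable."""
    out = set()
    for x in symbols:
        if x not in GRAMMAR:         # a terminal starts the string
            out.add(x)
            return out
        out |= first[x] - {EPS}
        if EPS not in first[x]:      # x cannot vanish, so stop here
            return out
    out.add(EPS)                     # every symbol can vanish (or string empty)
    return out


def compute_first():
    first = {v: set() for v in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for v, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of(rhs, first)
                if not new <= first[v]:
                    first[v] |= new
                    changed = True
    return first


def compute_follow(first, start="S"):
    follow = {v: set() for v in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for v, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, x in enumerate(rhs):
                    if x not in GRAMMAR:
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # the rest can vanish: inherit FOLLOW(v)
                        new |= follow[v]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for v in GRAMMAR:
    print(v, sorted(FOLLOW[v]))
```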
FOLLOW TABLE

VARIABLE   FOLLOW SET
S          {$}
A          {a}
B          {a, f}
C          {a, f, g, h}
D          {a, f, g, h}

So we can augment the parsing table to indicate variable/lookahead pairs that may be useful for error recovery synchronization.

If there is currently no entry for a pair from the follow table, add a synch entry.

                          CURRENT INPUT TOKEN
VAR   a      b      c      d      e        f      g      h      $
S            b      AaS    AaS    AaS                           synch
A     synch         cB     dB     eCDBfDB
B     ε                                    ε      DB     DB
C     synch         c      d      eCDBf    synch  synch  synch
D     synch                                synch  gC     hC

Let's try this on an example.

STACK         CURRENT INPUT   PRODUCTION TO APPLY
S$            cgah$           S → AaS
AaS$          cgah$           A → cB
cBaS$         cgah$           match
BaS$          gah$            B → DB
DBaS$         gah$            D → gC
gCBaS$        gah$            match
CBaS$         ah$             error, synch
BaS$          ah$             B → ε
aS$           ah$             match
S$            h$              error
S$            $               error, synch
$             $               parse complete

First error? Missing C (missing term)
Second error? Ignored unexpected h
Third error? Missing S (missing eof?)
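The augmentation rule ("if there is currently no entry for a pair from the follow table, add a synch entry") is short enough to state as code. This is a sketch under the assumption that the table is a dictionary of dictionaries with ε written as the empty string. SYNCH and add_synch are illustrative names, and the marker string is safe only because no production in this grammar has "synch" as its right-hand side.

```python
SYNCH = "synch"   # assumption: a marker string distinct from every right-hand side

TABLE = {
    "S": {"b": "b", "c": "AaS", "d": "AaS", "e": "AaS"},
    "A": {"c": "cB", "d": "dB", "e": "eCDBfDB"},
    "B": {"a": "", "f": "", "g": "DB", "h": "DB"},   # "" is ε
    "C": {"c": "c", "d": "d", "e": "eCDBf"},
    "D": {"g": "gC", "h": "hC"},
}

FOLLOW = {
    "S": {"$"},
    "A": {"a"},
    "B": {"a", "f"},
    "C": {"a", "f", "g", "h"},
    "D": {"a", "f", "g", "h"},
}


def add_synch(table, follow):
    """Return a copy of table with synch added to every empty FOLLOW cell."""
    augmented = {var: dict(row) for var, row in table.items()}
    for var, tokens in follow.items():
        for tok in tokens:
            # setdefault leaves existing entries (including ε) untouched
            augmented[var].setdefault(tok, SYNCH)
    return augmented


AUGMENTED = add_synch(TABLE, FOLLOW)
for var, row in AUGMENTED.items():
    print(var, sorted(tok for tok, entry in row.items() if entry == SYNCH))
```

Note that row B gains no synch entries: both tokens in FOLLOW(B) already select the production B → ε.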
Two more ideas for what to do when an error occurs:

Idea 2: If you have a variable A on top of the stack, skip input tokens until you get a token in FIRST(A). (Also say something about what has happened.)

Notice that we don't need to add any information to the parse table in order to implement Idea 2.

So in combination with Idea 1, when there is a parse error and a variable A on top of the stack, we skip input tokens until we see either
  a token in FIRST(A), in which case we simply continue, or
  a token in FOLLOW(A), in which case we pop A off the stack and continue.

Idea 3: If you have a token a on top of the stack that does not match the current input token, discard it, and say "inserting a in input".

STACK         CURRENT INPUT   PRODUCTION TO APPLY
S$            caab$           S → AaS
AaS$          caab$           A → cB
cBaS$         caab$           match
BaS$          aab$            B → ε
aS$           aab$            match
S$            ab$             error
S$            b$              S → b
b$            b$              match
$             $               parse complete

What to say about this error? "Ignored unexpected a."
STACK         CURRENT INPUT   PRODUCTION TO APPLY
S$            feab$           error
S$            eab$            S → AaS
AaS$          eab$            A → eCDBfDB
eCDBfDBaS$    eab$            match
CDBfDBaS$     ab$             error, synch
DBfDBaS$      ab$             error, synch
BfDBaS$       ab$             B → ε
fDBaS$        ab$             error
DBaS$         ab$             error, synch
BaS$          ab$             B → ε
aS$           ab$

First error, ignoring unexpected f
Second error, missing C
Third error, missing D
Fourth error, inserted f

It appears that on this example, the techniques we've looked at work pretty well.

The book describes two more ideas for panic-mode error handling in top-down parsing. They are less convincing.

Of course, if you like, you can simply insert error routines as actions in the parse table, doing arbitrarily helpful and/or complex things in response to errors. The book calls this phrase-level recovery.
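Ideas 1 to 3 combine into one small recovery loop. The sketch below is one possible reading of them, not the book's or the course's code. It assumes single-character symbols and ε written as the empty string. Two cases that the slides do not cover are handled by assumption and marked in the comments: a variable facing the end marker with no table entry is popped, and input left over once only $ remains on the stack is skipped. The message wording follows the slides.

```python
SYNCH = "synch"
TABLE = {   # augmented parsing table; "" is ε
    "S": {"b": "b", "c": "AaS", "d": "AaS", "e": "AaS", "$": SYNCH},
    "A": {"a": SYNCH, "c": "cB", "d": "dB", "e": "eCDBfDB"},
    "B": {"a": "", "f": "", "g": "DB", "h": "DB"},
    "C": {"a": SYNCH, "c": "c", "d": "d", "e": "eCDBf",
          "f": SYNCH, "g": SYNCH, "h": SYNCH},
    "D": {"a": SYNCH, "f": SYNCH, "g": "gC", "h": "hC"},
}


def parse_with_recovery(tokens):
    """Parse tokens, recovering from errors; return the error messages."""
    stack = ["$", "S"]              # top of the stack is the end of the list
    inp = list(tokens) + ["$"]
    pos, errors = 0, []
    while stack:
        top, cur = stack[-1], inp[pos]
        if top in TABLE:
            entry = TABLE[top].get(cur)
            if entry == SYNCH or (entry is None and cur == "$"):
                # Idea 1: cur can follow top, so give up on top.
                # (Popping on "$" with no entry is an assumption: "$" cannot be skipped.)
                errors.append(f"missing {top}")
                stack.pop()
            elif entry is None:
                # Idea 2: skip tokens until the table has an entry for top.
                errors.append(f"ignored unexpected {cur}")
                pos += 1
            else:
                stack.pop()
                stack.extend(reversed(entry))
        elif top == cur:            # match (also ends the parse on $ against $)
            stack.pop()
            pos += 1
        elif top == "$":
            # Assumption: leftover input after the parse is done is skipped.
            errors.append(f"ignored unexpected {cur}")
            pos += 1
        else:
            # Idea 3: pretend the expected terminal was present.
            errors.append(f"inserted {top}")
            stack.pop()
    return errors


for text in ["cgah", "caab", "feab"]:
    print(text, parse_with_recovery(text))
```

On feab the sketch reports a fifth message, a second "missing D", corresponding to the last "error, synch" row of the trace.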
Closing Remarks on Top-Down Parsing

In many cases, as in the long example last time, we can eliminate all left recursion (in three steps), and if we simply left factor at that point, we will fail to obtain an LL(1) grammar even though there is in fact an equivalent LL(1) grammar.

Finding an equivalent LL(1) grammar is too much of an art! And if we do find one, it may be hard to understand and awkward for producing a translation.

Top-down parsing is appealing because it is relatively intuitive. But in practice, the approach often leads to grammars that are unintuitive, because we need an LL(1) grammar.

Moreover, there are many languages that are eminently parsable, but for which there is no LL(1) grammar.

Bottom-up parsing

Bottom-up parsing is more widely applicable than top-down parsing, and more widely used, but less intuitive.

Rough idea: construct a parse tree from the bottom up (instead of from the top down).

That sounds simple enough, but it seems to be harder to understand in detail how it works.
Bottom-up parsing aka shift-reduce parsing

Bottom-up parsing is also called shift-reduce parsing.

A successful parse reduces the input string to the start symbol.

Example

Consider input a + b * a and grammar

E → E + E | E * E | a | b

STACK    INPUT     ACTION
$        a+b*a$    shift
$a       +b*a$     E → a
$E       +b*a$     shift
$E+      b*a$      shift
$E+b     *a$       E → b
$E+E     *a$       E → E + E
$E       *a$       shift
$E*      a$        shift
$E*a     $         E → a
$E*E     $         E → E * E
$E       $         accept

To what derivation does this correspond?

Notice something remarkable about the correspondence between the derivation steps and the stack and input contents at each step?
Before beginning to say more precisely what is happening here (a long story!), let's consider another example.

Here's the (transposed) grammar used to specify the first problem in hwk 1:

S → AaS | b
A → c | d | B
B → AgC | AhC | DgC | DhC
C → c | d | D
D → ebf

STACK    INPUT    ACTION
$        cab$     shift
$c       ab$      A → c
$A       ab$      shift
$Aa      b$       shift
$Aab     $        S → b
$AaS     $        S → AaS
$S       $        accept

Now why didn't we use production C → c at the second step? Cab is not a sentential form!

Right-sentential forms

We notice that at each step in a successful bottom-up parse, as illustrated in the previous examples, the concatenation of the current stack and input corresponds to a sentential form.

Moreover, the parse as a whole corresponds to a rightmost derivation of the input.

We call a sentential form a right-sentential form if it appears in some rightmost derivation from the start symbol.

Notice, in particular, that every sentence of the grammar (i.e. every string in the language generated by the grammar) is a right-sentential form.
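The observation that stack plus remaining input is always a right-sentential form can be checked by replaying a trace. The sketch below takes the shift/reduce actions as given (it does not decide them, which is the hard part of bottom-up parsing) and records stack + input after each reduction. It assumes single-character symbols; replay and the action encoding are illustrative, not from the slides.

```python
def replay(tokens, actions):
    """Replay shift/reduce actions; return the sentential forms seen.

    Each action is either the string "shift" or a pair (lhs, rhs) meaning
    "reduce by lhs -> rhs". A form is recorded after every reduction.
    """
    stack, rest = [], list(tokens)
    forms = ["".join(rest)]
    for action in actions:
        if action == "shift":
            stack.append(rest.pop(0))
            continue
        lhs, rhs = action
        start = len(stack) - len(rhs)
        if start < 0 or stack[start:] != list(rhs):
            raise ValueError(f"{rhs} is not on top of the stack")
        del stack[start:]           # pop the right-hand side ...
        stack.append(lhs)           # ... and push the left-hand side
        forms.append("".join(stack + rest))
    return forms


# The trace for input cab from the slide above.
CAB = ["shift", ("A", "c"), "shift", "shift", ("S", "b"), ("S", "AaS")]
forms = replay("cab", CAB)
print(" => ".join(reversed(forms)))   # prints: S => AaS => Aab => cab
```

Read in reverse, the recorded forms are the steps of a derivation in which the rightmost variable is expanded each time.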
Handles

We can reduce by a production B → β when there is a right-sentential form αβγ s.t. the stack holds $αβ and the current input is γ$, and αBγ ⇒ αβγ is the last step in a rightmost derivation of αβγ from the start symbol.

Definition

A production B → β is a handle of αβγ in the position following α if αBγ ⇒ αβγ is the last step in a rightmost derivation from the start symbol.

It is convenient to simply say that β is a handle of αβγ, if it is clear what production and position are meant.

Example

S → cAd
A → ab | a

So a is a handle for cad, since S ⇒ cAd ⇒ cad.

Is a a handle for cabd? What production is meant? And in what position? So, is there a rightmost derivation that ends cAbd ⇒ cabd?

Is ab a handle for cabd?

Does ab have a handle? Does cad?
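For a grammar this small, the definition of a handle can be tested by brute force: enumerate every rightmost derivation from S and record the last step whenever it produces the string of interest. This is only a sketch for checking answers to the questions above, not a practical way to find handles. It assumes single-character symbols and no ε-productions (so forms never shrink and the length bound is safe); rightmost_steps and handles are illustrative names.

```python
GRAMMAR = {"S": ["cAd"], "A": ["ab", "a"]}


def rightmost_steps(form):
    """Yield (lhs, rhs, position, new_form) for each rightmost step from form."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in GRAMMAR:
            for rhs in GRAMMAR[form[i]]:
                yield form[i], rhs, i, form[:i] + rhs + form[i + 1:]
            return                  # only the rightmost variable is expanded


def handles(target, start="S", max_len=10):
    """Return the set of (lhs, rhs, position) handles of target.

    position is the length of α, i.e. the index at which rhs begins.
    """
    found = set()
    seen, todo = {start}, [start]
    while todo:
        form = todo.pop()
        for lhs, rhs, pos, new in rightmost_steps(form):
            if new == target:       # form => target is a last rightmost step
                found.add((lhs, rhs, pos))
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return found


print(handles("cad"))   # prints: {('A', 'a', 1)}
```

The printed result restates the slide's example: a is a handle of cad via A → a in the position following c. Calling handles on the other strings asked about is one way to check your own answers.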
For next time

We'll continue our study of bottom-up parsing.

Read 4.5