Bottom-Up Parsing
Originated from Prof. Aiken, CS 143. Modified by Yu Zhang.

Bottom-Up Parsing
Bottom-up parsing is more general than top-down parsing
  And just as efficient
  Builds on ideas in top-down parsing
Bottom-up is the preferred method

An Introductory Example
Bottom-up parsers don't need left-factored grammars
Revert to the "natural" grammar for our example:
  E → T + E | T
  T → int * T | int | (E)
Consider the string: int * int + int

The Idea
Bottom-up parsing reduces a string to the start symbol by inverting productions:
  int * int + int      T → int
  int * T + int        T → int * T
  T + int              T → int
  T + T                E → T
  T + E                E → T + E
  E

Observation
Read the productions in reverse (from bottom to top)
This is a rightmost derivation!

Important Fact #1
Important Fact #1 about bottom-up parsing:
A bottom-up parser traces a rightmost derivation in reverse

A Bottom-up Parse
A Bottom-up Parse in Detail (1) to (5)
[Figures: the bottom-up parse of int * int + int, one reduction per slide.]

A Bottom-up Parse in Detail (6)
[Figure: the final step of the parse.]

A Trivial Bottom-Up Parsing Algorithm
Let I = input string
repeat
  pick a non-empty substring β of I where X → β is a production
  if no such β, backtrack
  replace one β by X in I
until I = "S" (the start symbol) or all possibilities are exhausted

Questions
Does this algorithm terminate?
How fast is the algorithm?
Does the algorithm handle all cases?
How do we choose the substring to reduce at each step?

Where Do Reductions Happen?
Important Fact #1 has an interesting consequence:
  Let αβω be a step of a bottom-up parse
  Assume the next reduction is by X → β
  Then ω is a string of terminals
Why? Because αXω → αβω is a step in a rightmost derivation

Notation
Idea: Split the string into two substrings
  Right substring is as yet unexamined by parsing (a string of terminals)
  Left substring has terminals and non-terminals
The dividing point is marked by a |
  The | is not part of the string
Initially, all input is unexamined: |x1 x2 ... xn

Shift-Reduce Parsing
Bottom-up parsing uses only two kinds of actions:
  Shift
  Reduce
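
Shift
Shift: Move | one place to the right
  Shifts a terminal to the left string
  ABC|xyz ⇒ ABCx|yz

Reduce
Apply an inverse production at the right end of the left string
  If A → xy is a production, then
  Cbxy|ijk ⇒ CbA|ijk

The Example with Reductions Only
  int * int | + int      reduce T → int
  int * T | + int        reduce T → int * T
  T + int |              reduce T → int
  T + T |                reduce E → T
  T + E |                reduce E → T + E
  E |

The Example with Shift-Reduce Parsing
  |int * int + int       shift
  int | * int + int      shift
  int * | int + int      shift
  int * int | + int      reduce T → int
  int * T | + int        reduce T → int * T
  T | + int              shift
  T + | int              shift
  T + int |              reduce T → int
  T + T |                reduce E → T
  T + E |                reduce E → T + E
  E |

A Shift-Reduce Parse in Detail (1) and (2)
[Figures: the first steps of the shift-reduce parse of int * int + int.]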
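
The two moves can be written as small functions on a configuration (left string, remaining input). The sketch below replays the shift-reduce trace for int * int + int; the names shift, reduce, MOVES and show are illustrative assumptions, and the move list is supplied by hand rather than chosen by a parser.

```python
def shift(stack, rest):
    """Move the divider one place right: the next terminal joins the left string."""
    return stack + [rest[0]], rest[1:]


def reduce(stack, rest, lhs, rhs):
    """Apply production lhs -> rhs in reverse at the right end of the left string."""
    cut = len(stack) - len(rhs)
    assert stack[cut:] == list(rhs), "rhs is not at the right end of the left string"
    return stack[:cut] + [lhs], rest


def show(stack, rest):
    return (" ".join(stack) + " | " + " ".join(rest)).strip()


# Hand-written move sequence for int * int + int (a parser would have to choose these).
MOVES = [
    "shift", "shift", "shift",
    ("T", ["int"]), ("T", ["int", "*", "T"]),
    "shift", "shift",
    ("T", ["int"]), ("E", ["T"]), ("E", ["T", "+", "E"]),
]

stack, rest = [], ["int", "*", "int", "+", "int"]
trace = [show(stack, rest)]
for move in MOVES:
    if move == "shift":
        stack, rest = shift(stack, rest)
    else:
        stack, rest = reduce(stack, rest, *move)
    trace.append(show(stack, rest))

print("\n".join(trace))
```

Every reduce touches only the right end of the left string, which is what later allows the left string to be kept on a stack.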

A Shift-Reduce Parse in Detail (3) to (8)
[Figures: successive steps of the shift-reduce parse of int * int + int.]

A Shift-Reduce Parse in Detail (9) to (11)
[Figures: the remaining steps of the shift-reduce parse of int * int + int.]

The Stack
Left string can be implemented by a stack
  Top of the stack is the |
Shift pushes a terminal on the stack
Reduce pops 0 or more symbols off of the stack (production rhs) and pushes a nonterminal on the stack (production lhs)

Conflicts
In a given state, more than one action (shift or reduce) may lead to a valid parse
If it is legal to shift or reduce, there is a shift-reduce conflict
If it is legal to reduce by two different productions, there is a reduce-reduce conflict

Key Issue
How do we decide when to shift or reduce?
Example grammar:
  E → T + E | T
  T → int * T | int | (E)
Consider step int | * int + int
  We could shift: int * | int + int
  We could reduce by T → int giving T | * int + int
  A fatal mistake! No way to reduce to the start symbol E

Conflicts in Shift-Reduce Parsing
1. Shift/reduce conflict
Example:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
Suppose the shift-reduce parser is in the configuration
  Stack: ... if expr then stmt
  Input: else ... $
The parser cannot tell whether to reduce if expr then stmt to stmt or to shift else.

Conflicts in Shift-Reduce Parsing
2. Reduce/reduce conflict
  stmt → id ( parameter_list ) | expr = expr
  parameter_list → parameter_list , parameter | parameter
  parameter → id
  expr → id ( expr_list ) | id
  expr_list → expr_list , expr | expr
Here an expr of the form id ( ... ) is an array element reference.
For a statement beginning with A(I, J):
  Stack: ... id ( id
  Input: , id ) ...
Reduce the top id to expr or to parameter?

Conflicts in Shift-Reduce Parsing
2. Reduce/reduce conflict (continued)
One remedy: the lexical analyzer consults the symbol table to distinguish the first id (returning procid for a procedure name). The grammar above must be modified accordingly.
  Stack: ... procid ( id
  Input: , id ) ...

Comparing Bottom-Up Parsing with LL Parsing
In the derivation below, the last step uses A → l:
  S ⇒rm ... ⇒rm γ A b w ⇒rm γ l b w
LL(1) must decide to use this production after seeing only the first symbol of what l derives.
LR(1) decides to use it after all of l has been seen, plus one symbol of lookahead (b).

Handles
Intuition: Want to reduce only if the result can still be reduced to the start symbol
Assume a rightmost derivation S ⇒* αXω ⇒ αβω
Then X → β in the position after α is a handle of αβω

Handles (Cont.)
Handles formalize the intuition
  A handle is a string that can be reduced and also allows further reductions back to the start symbol (using a particular production at a specific spot)
We only want to reduce at handles
Note: We have said what a handle is, not how to find handles

Important Fact #2
Important Fact #2 about bottom-up parsing:
In shift-reduce parsing, handles appear only at the top of the stack, never inside

Why?
Informal induction on # of reduce moves:
True initially, stack is empty
Immediately after reducing a handle
  right-most non-terminal is on top of the stack
  next handle must be to the right of the right-most non-terminal, because this is a right-most derivation
  a sequence of shift moves reaches the next handle

Summary of Handles
In shift-reduce parsing, handles always appear at the top of the stack
Handles are never to the left of the rightmost non-terminal
  Therefore, shift-reduce moves are sufficient; the | need never move left
Bottom-up parsing algorithms are based on recognizing handles

Recognizing Handles
There are no known efficient algorithms to recognize handles
Solution: use heuristics to guess which stacks are handles
On some CFGs, the heuristics always guess correctly
  For the heuristics we use here, these are the SLR grammars
  Other heuristics work for other grammars

Grammars
[Figure: nested sets All CFGs ⊃ Unambiguous CFGs ⊃ SLR CFGs, with the annotation "will generate conflicts".]

Viable Prefixes
It is not obvious how to detect handles
At each step the parser sees only the stack, not the entire input; start with that...
α is a viable prefix if there is an ω such that α|ω is a state of a shift-reduce parser
Viable prefix: a prefix of a right-sentential form that does not extend past the right end of the rightmost handle
  S ⇒*rm γ, where S is the start symbol and γ is a right-sentential form

Huh?
What does this mean? A few things:
  A viable prefix does not extend past the right end of the handle
  It's a viable prefix because it is a prefix of the handle
  As long as a parser has viable prefixes on the stack no parsing error has been detected

Important Fact #3
Important Fact #3 about bottom-up parsing:
For any grammar, the set of viable prefixes is a regular language

Important Fact #3 (Cont.)
Important Fact #3 is non-obvious
We show how to compute automata that accept viable prefixes

Items
An item is a production with a "." somewhere on the rhs
The items for T → (E) are
  T → .(E)
  T → (.E)
  T → (E.)
  T → (E).

Items (Cont.)
The only item for X → ε is X → .
Items are often called "LR(0) items"
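
As a small illustration of the definition of items, the sketch below lists every item of one production by placing the dot at each position of the rhs. The function name items and the string layout are assumptions made for this example.

```python
def items(lhs, rhs):
    """All LR(0) items of lhs -> rhs, with '.' marking how much has been seen."""
    return [
        f"{lhs} -> {' '.join(rhs[:i] + ('.',) + rhs[i:])}"
        for i in range(len(rhs) + 1)
    ]


for item in items("T", ("(", "E", ")")):
    print(item)

# An empty rhs (X -> epsilon) has exactly one item.
print(items("X", ()))
```

A production with n symbols on its rhs always has n + 1 items, so the set of items of a grammar is finite.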

Intuition
The problem in recognizing viable prefixes is that the stack has only bits and pieces of the rhs of productions
  If it had a complete rhs, we could reduce
These bits and pieces are always prefixes of rhs of productions

Example
Consider the input (int)
  Then (E|) is a state of a shift-reduce parse
  (E is a prefix of the rhs of T → (E)
    Will be reduced after the next shift
  Item T → (E.) says that so far we have seen (E of this production and hope to see )

Generalization
The stack may have many prefixes of rhs's
  Prefix 1 Prefix 2 ... Prefix n-1 Prefix n
Let Prefix i be a prefix of the rhs of X_i → α_i
  Prefix i will eventually reduce to X_i
  The missing part of α_(i-1) starts with X_i
  i.e. there is a production X_(i-1) → Prefix i-1 X_i β for some β
Recursively, Prefix k+1 ... Prefix n eventually reduces to the missing part of α_k

An Example
Consider the string (int * int):
  (int * | int) is a state of a shift-reduce parse
  "(" is a prefix of the rhs of T → (E)
  "ε" is a prefix of the rhs of E → T
  "int *" is a prefix of the rhs of T → int * T

An Example (Cont.)
The "stack of items"
  T → (.E)
  E → .T
  T → int * .T
Says
  We've seen "(" of T → (E)
  We've seen ε of E → T
  We've seen int * of T → int * T

Recognizing Viable Prefixes
Idea: To recognize viable prefixes, we must
  Recognize a sequence of partial rhs's of productions, where
  Each sequence can eventually reduce to part of the missing suffix of its predecessor

An NFA Recognizing Viable Prefixes
1. Add a dummy production S' → S to G
2. The NFA states are the items of G
   Including the extra production
3. For item E → α.Xβ add transition
   E → α.Xβ  →X  E → αX.β
4. For item E → α.Xβ and production X → γ add
   E → α.Xβ  →ε  X → .γ

An NFA Recognizing Viable Prefixes (Cont.)
5. Every state is an accepting state
6. Start state is S' → .S

NFA for Viable Prefixes of the Example
[Figure: the NFA whose states are the items of S' → E, E → T + E | T, T → int * T | int | (E).]

NFA for Viable Prefixes in Detail (1) to (3)
[Figures: the NFA built up step by step, starting from S' → .E.]
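
The six construction rules can be sketched directly as code: items are the NFA states, nfa_moves produces the transitions of rules 3 and 4, and a standard ε-closure simulation decides whether a stack is a viable prefix. The tuple encoding (lhs, rhs, dot) and all function names are assumptions for this illustration; the grammar is the running example with the dummy production S' → E.

```python
GRAMMAR = [
    ("S'", ("E",)),                                   # rule 1: dummy production
    ("E", ("T", "+", "E")), ("E", ("T",)),
    ("T", ("int", "*", "T")), ("T", ("int",)), ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def nfa_moves(item):
    """Yield (label, target) pairs for one item; label None is an epsilon move."""
    lhs, rhs, dot = item
    if dot == len(rhs):
        return
    x = rhs[dot]
    yield x, (lhs, rhs, dot + 1)                      # rule 3: move the dot over X
    if x in NONTERMINALS:
        for l, r in GRAMMAR:
            if l == x:
                yield None, (l, r, 0)                 # rule 4: epsilon to X -> .gamma


def eps_closure(states):
    seen, todo = set(states), list(states)
    while todo:
        for label, target in nfa_moves(todo.pop()):
            if label is None and target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


def is_viable_prefix(symbols):
    current = eps_closure({("S'", ("E",), 0)})        # rule 6: start state
    for sym in symbols:
        step = {t for item in current for label, t in nfa_moves(item) if label == sym}
        current = eps_closure(step)
        if not current:
            return False                              # the NFA is stuck
    return True                                       # rule 5: every state accepts


print(is_viable_prefix(["(", "int", "*"]))
print(is_viable_prefix(["int", "int"]))
```

Because every state accepts, the only way to reject is to get stuck, which corresponds to a stack that can no longer be extended to a right-sentential form.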

NFA for Viable Prefixes in Detail (4) to (9)
[Figures: further steps of the NFA construction.]

NFA for Viable Prefixes in Detail (10) to (13)
[Figures: the last steps of the NFA construction.]

Translation to the DFA
[Figure: the DFA obtained from the NFA; each DFA state is a set of items.]

Lingo
The states of the DFA are "canonical collections of items" or "canonical collections of LR(0) items"
  The Dragon book gives another way of constructing LR(0) items
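
One way to obtain the DFA states without drawing the NFA first is the closure/goto formulation familiar from the Dragon book. The sketch below is a minimal version for the running example grammar; the state numbering it produces is an artifact of this code and need not match the numbering used in the slide figures.

```python
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")), ("E", ("T",)),
    ("T", ("int", "*", "T")), ("T", ("int",)), ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> .gamma for every nonterminal X that appears right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for l, r in GRAMMAR:
                if l == rhs[dot] and (l, r, 0) not in result:
                    result.add((l, r, 0))
                    todo.append((l, r, 0))
    return frozenset(result)


def goto(state, symbol):
    """The DFA transition: move the dot over `symbol`, then close."""
    moved = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def build_dfa():
    start = closure({GRAMMAR[0] + (0,)})              # S' -> .E
    states, edges, todo = [start], {}, [start]
    while todo:
        s = todo.pop()
        for sym in sorted({r[d] for _, r, d in s if d < len(r)}):
            t = goto(s, sym)
            if t not in states:
                states.append(t)
                todo.append(t)
            edges[states.index(s), sym] = states.index(t)
    return states, edges


states, edges = build_dfa()
print(len(states), "states,", len(edges), "transitions")
```

Each state is a frozenset of items, so two item sets reached along different paths are recognized as the same DFA state.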

Valid Items
Item X → β.γ is valid for a viable prefix αβ if
  S' ⇒* αXω ⇒ αβγω
by a right-most derivation
After parsing αβ, the valid items are the possible tops of the stack of items

Items Valid for a Prefix
An item I is valid for a viable prefix α if the DFA recognizing viable prefixes terminates on input α in a state s containing I
The items in s describe what the top of the item stack might be after reading input α

Valid Items Example
An item is often valid for many prefixes
Example: The item T → (.E) is valid for prefixes
  (
  ((
  (((
  ((((
  ...
[Figure: the DFA for viable prefixes.]

LR(0) Parsing
Idea: Assume
  stack contains α
  next input is t
  DFA on input α terminates in state s
Reduce by X → β if
  s contains item X → β.
Shift if
  s contains item X → β.tω
  equivalent to saying s has a transition labeled t

LR(0) Conflicts
LR(0) has a reduce/reduce conflict if:
  Any state has two reduce items:
  X → β. and Y → ω.
LR(0) has a shift/reduce conflict if:
  Any state has a reduce item and a shift item:
  X → β. and Y → ω.tδ
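
The two conflict conditions can be checked mechanically on a single DFA state. The sketch below takes a state as a set of (lhs, rhs, dot) items, an encoding assumed for illustration, and treats the accept item of the dummy production S' as not being a reduce item, which is a common convention rather than something the slides state.

```python
def lr0_conflicts(state, nonterminals, dummy="S'"):
    """Return the kinds of LR(0) conflict present in one state (a set of items)."""
    reduce_items = [(l, r) for l, r, d in state if d == len(r) and l != dummy]
    shift_terminals = {
        r[d] for _, r, d in state if d < len(r) and r[d] not in nonterminals
    }
    found = []
    if len(reduce_items) > 1:
        found.append("reduce/reduce")
    if reduce_items and shift_terminals:
        found.append("shift/reduce")
    return found


NONTERMINALS = {"S'", "E", "T"}

# State reached on T in the example grammar: E -> T.+E and E -> T.
after_T = {("E", ("T", "+", "E"), 1), ("E", ("T",), 1)}
# State reached on "( E )": T -> (E).
after_paren = {("T", ("(", "E", ")"), 3)}

print(lr0_conflicts(after_T, NONTERMINALS))
print(lr0_conflicts(after_paren, NONTERMINALS))
```

The first state shows why LR(0) is too weak for this grammar: with no lookahead it cannot choose between reducing E → T and shifting +.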

LR(0) Conflicts
[Figure: the DFA for viable prefixes of the example grammar.]
Two shift/reduce conflicts with LR(0) rules (in the states containing E → T. and T → int.)

Example: S → aS | ε
  State 0: S' → .S    S → .aS    S → .
  State 1: S' → S.
  State 2: S → a.S    S → .aS    S → .
  State 3: S → aS.
  Transitions: 0 →S 1, 0 →a 2, 2 →a 2, 2 →S 3
shift/reduce conflicts with LR(0) rules (states 0 and 2)

Example: S → Sa | ε
  State 0: S' → .S    S → .Sa    S → .
  State 1: S' → S.    S → S.a
  State 2: S → Sa.
  Transitions: 0 →S 1, 1 →a 2
No shift/reduce conflict

SLR
LR: L = Left-to-right scan, R = Rightmost derivation
SLR = "Simple LR"
SLR improves on LR(0) shift/reduce heuristics
  Fewer states have conflicts

SLR Parsing
Idea: Assume
  stack contains α
  next input is t
  DFA on input α terminates in state s
Reduce by X → β if
  s contains item X → β.
  t ∈ Follow(X)
Shift if
  s contains item X → β.tω

SLR Parsing (Cont.)
If there are conflicts under these rules, the grammar is not SLR
The rules amount to a heuristic for detecting handles
  The SLR grammars are those where the heuristics detect exactly the handles

Example: S → aS | ε
  States 0 to 3 as above
  Follow(S) = {$}
  States 0, 2: reduce by S → ε only if t is $

Example: S → Sa | ε
  States 0 to 2 as above
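
The SLR reduce rule depends on Follow sets. The sketch below computes them with the usual fixed-point iteration, assuming productions are given as (lhs, rhs) tuples and "$" marks end of input; the function name follow_sets is a hypothetical choice for this example.

```python
def follow_sets(grammar, start):
    """Compute Follow(X) for every nonterminal by iterating to a fixed point."""
    nts = {lhs for lhs, _ in grammar}
    nullable = set()
    first = {n: set() for n in nts}
    follow = {n: set() for n in nts}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            # First sets and nullability, scanning the rhs left to right.
            all_nullable = True
            for sym in rhs:
                add = first[sym] if sym in nts else {sym}
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
                if sym not in nullable:
                    all_nullable = False
                    break
            if all_nullable and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            # Follow sets, scanning the rhs right to left.
            trailer = set(follow[lhs])
            for sym in reversed(rhs):
                if sym in nts:
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
                    trailer = trailer | first[sym] if sym in nullable else set(first[sym])
                else:
                    trailer = {sym}
    return follow


right_rec = [("S", ("a", "S")), ("S", ())]        # S -> aS | epsilon
left_rec = [("S", ("S", "a")), ("S", ())]         # S -> Sa | epsilon
print(follow_sets(right_rec, "S"))
print(follow_sets(left_rec, "S"))
```

With Follow(S) = {$} for S → aS | ε, the reduction by S → ε is allowed only at end of input, so the shift on a no longer competes with it.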

SLR Conflicts
[Figure: the DFA for viable prefixes of the example grammar.]
Follow(E) = { ), $ }
Follow(T) = { +, ), $ }
No conflicts with SLR rules!

Precedence Declarations
Digression
Lots of grammars aren't SLR
  including all ambiguous grammars
We can parse more grammars by using precedence declarations
  Instructions for resolving conflicts

Precedence Declarations (Cont.)
Consider our favorite ambiguous grammar:
  E → E + E | E * E | (E) | int
The DFA for this grammar contains a state with the following items:
  E → E * E.
  E → E. + E
  shift/reduce conflict!
Declaring "* has higher precedence than +" resolves this conflict in favor of reducing

Precedence Declarations (Cont.)
The term "precedence declaration" is misleading
These declarations do not define precedence; they define conflict resolutions
  Not quite the same thing!

Notes
If there is a conflict in the last step of the algorithm that follows (the choice between shift and reduce), the grammar is not SLR(k)
k is the amount of lookahead
  In practice k = 1

Naïve SLR Parsing Algorithm
1. Let M be the DFA for viable prefixes of G
2. Let |x1 ... xn $ be the initial configuration
3. Repeat until the configuration is S|$
   Let α|ω be the current configuration
   Run M on the current stack α
   If M rejects α, report parsing error
     Stack α is not a viable prefix
   If M accepts α with items I, let a be the next input
     Shift if X → β.aγ ∈ I
     Reduce if X → β. ∈ I and a ∈ Follow(X)
     Report parsing error if neither applies
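
The naïve algorithm can be sketched as follows for the example grammar. The DFA is never stored: run_dfa recomputes the state from the whole stack at every step, exactly as step 3 describes, using a closure/goto formulation of the transition function. The Follow sets are copied from the slide; the tuple encoding of items and all function names are assumptions for this illustration.

```python
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")), ("E", ("T",)),
    ("T", ("int", "*", "T")), ("T", ("int",)), ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
FOLLOW = {"E": {")", "$"}, "T": {"+", ")", "$"}}      # as stated on the slide


def closure(items):
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for l, r in GRAMMAR:
                if l == rhs[dot] and (l, r, 0) not in result:
                    result.add((l, r, 0))
                    todo.append((l, r, 0))
    return result


def run_dfa(stack):
    """Run M on the whole stack; an empty result means M rejects."""
    state = closure({("S'", ("E",), 0)})
    for sym in stack:
        state = closure({(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == sym})
    return state


def naive_slr(tokens):
    stack, rest, actions = [], list(tokens) + ["$"], []
    while not (stack == ["E"] and rest == ["$"]):
        items = run_dfa(stack)
        if not items:
            raise SyntaxError("stack is not a viable prefix")
        a = rest[0]
        can_shift = any(d < len(r) and r[d] == a for _, r, d in items)
        reduces = [(l, r) for l, r, d in items
                   if d == len(r) and l != "S'" and a in FOLLOW[l]]
        if can_shift and not reduces:
            stack.append(rest.pop(0))
            actions.append("shift")
        elif len(reduces) == 1 and not can_shift:
            lhs, rhs = reduces[0]
            del stack[len(stack) - len(rhs):]
            stack.append(lhs)
            actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
        else:
            raise SyntaxError(f"no unique action on {a!r}")
    actions.append("accept")
    return actions


print(naive_slr(["int", "*", "int"]))
```

Recomputing the state from scratch at every step makes the cost of each move proportional to the stack height, which motivates remembering DFA states on the stack.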

SLR Example
[Figures: the DFA for viable prefixes with numbered states, shown alongside each step of the parse of int * int.]

  Configuration       DFA Halt State    Action
  |int * int$         1                 shift
  int | * int$        3                 * not in Follow(T): shift
  int * | int$        11                shift

SLR Example
[Figures: the DFA for viable prefixes, shown alongside each step of the parse.]

  Configuration       DFA Halt State    Action
  |int * int$         1                 shift
  int | * int$        3                 * not in Follow(T): shift
  int * | int$        11                shift
  int * int |$        3                 $ ∈ Follow(T): red. T → int

SLR Example
[Figures: the DFA for viable prefixes, shown alongside each step of the parse.]

  Configuration       DFA Halt State    Action
  |int * int$         1                 shift
  int | * int$        3                 * not in Follow(T): shift
  int * | int$        11                shift
  int * int |$        3                 $ ∈ Follow(T): red. T → int
  int * T |$          4                 $ ∈ Follow(T): red. T → int * T

SLR Example
[Figures: the DFA for viable prefixes, shown alongside the final steps of the parse.]

  Configuration       DFA Halt State    Action
  |int * int$         1                 shift
  int | * int$        3                 * not in Follow(T): shift
  int * | int$        11                shift
  int * int |$        3                 $ ∈ Follow(T): red. T → int
  int * T |$          4                 $ ∈ Follow(T): red. T → int * T
  T |$                5                 $ ∈ Follow(E): red. E → T
  E |$                                  accept

Notes
Skipped using the extra start state S' in this example to save space on slides
Rerunning the automaton at each step is wasteful
  Most of the work is repeated

An Improvement
Remember the state of the automaton on each prefix of the stack
Change stack to contain pairs
  ⟨Symbol, DFA State⟩

An Improvement (Cont.)
For a stack
  ⟨sym1, state1⟩ ... ⟨symn, staten⟩
staten is the final state of the DFA on sym1 ... symn
Detail: The bottom of the stack is ⟨any, start⟩ where
  any is any dummy symbol
  start is the start state of the DFA

Goto Table
Define goto[i,A] = j if state i →A state j
goto is just the transition function of the DFA
  One of two parsing tables

Refined Parser Moves
Shift x
  Push ⟨a, x⟩ on the stack
  a is the current input
  x is a DFA state
Reduce X → α
  As before
Accept
Error

Action Table
For each state s_i and terminal a
  If s_i has item X → α.aβ and goto[i,a] = j then action[i,a] = shift j
  If s_i has item X → α. and a ∈ Follow(X) and X ≠ S' then action[i,a] = reduce X → α
  If s_i has item S' → S. then action[i,$] = accept
  Otherwise, action[i,a] = error

SLR Parsing Algorithm
Let I = w$ be initial input
Let j = 0
Let DFA state 1 have item S' → .S
Let stack = ⟨dummy, 1⟩
repeat
  case action[top_state(stack), I[j]] of
    shift k: push ⟨I[j++], k⟩
    reduce X → A:
      pop |A| pairs,
      push ⟨X, goto[top_state(stack), X]⟩
    accept: halt normally
    error: halt and report error
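
A minimal sketch of the table-driven loop is shown below. To stay short it uses hand-written action and goto tables for the small grammar S → aS | ε (productions numbered 1: S → aS, 2: S → ε, start state 0); the dictionary layout and the names ACTION, GOTO, PRODUCTIONS and slr_parse are assumptions made for this illustration.

```python
# Production number -> (lhs, length of rhs). 1: S -> a S, 2: S -> epsilon.
PRODUCTIONS = {1: ("S", 2), 2: ("S", 0)}

# Hand-written tables for S -> aS | epsilon; missing entries mean "error".
ACTION = {
    (0, "a"): ("shift", 2), (0, "$"): ("reduce", 2),
    (1, "$"): ("accept", None),
    (2, "a"): ("shift", 2), (2, "$"): ("reduce", 2),
    (3, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 3}


def slr_parse(tokens, start_state=0):
    """Run the SLR driver loop and return the list of actions taken."""
    inp = list(tokens) + ["$"]
    j = 0
    stack = [("dummy", start_state)]          # pairs (symbol, DFA state)
    trace = []
    while True:
        state = stack[-1][1]
        kind, arg = ACTION.get((state, inp[j]), ("error", None))
        trace.append(kind if arg is None else f"{kind} {arg}")
        if kind == "shift":
            stack.append((inp[j], arg))
            j += 1
        elif kind == "reduce":
            lhs, n = PRODUCTIONS[arg]
            if n:
                del stack[-n:]                # pop |rhs| pairs
            stack.append((lhs, GOTO[stack[-1][1], lhs]))
        elif kind == "accept":
            return trace
        else:
            raise SyntaxError(f"unexpected {inp[j]!r} in state {state}")


print(slr_parse(["a", "a", "a"]))
```

Only the state component of each pair is consulted by the loop; the symbol is carried along because semantic actions would need it.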

Example: parsing tables
[Figure: the LR(0) automata for the two grammars below.]

S → aS | ε   (1: S → aS, 2: S → ε)
  State    a     $     S
  0        s2    r2    1
  1              acc
  2        s2    r2    3
  3              r1

S → Sa | ε   (1: S → Sa, 2: S → ε)
  State    a     $     S
  0        r2    r2    1
  1        s2    acc
  2        r1    r1

Example: parsing aaa$
Each stack entry is a symbol followed by its DFA state.

S → aS | ε
  Stack            Input    Action
                   aaa$     shift
  a2               aa$      shift
  a2 a2            a$       shift
  a2 a2 a2         $        reduce
  a2 a2 a2 S3      $        reduce
  a2 a2 S3         $        reduce
  a2 S3            $        reduce
  S1               $        accept

S → Sa | ε
  Stack            Input    Action
                   aaa$     reduce
  S1               aaa$     shift
  S1 a2            aa$      reduce
  S1               aa$      shift
  S1 a2            a$       reduce
  S1               a$       shift
  S1 a2            $        reduce
  S1               $        accept

Notes on SLR Parsing Algorithm
Note that the algorithm uses only the DFA states and the input
  The stack symbols are never used!
However, we still need the symbols for semantic actions

More Notes
Some common constructs are not SLR(1)
LR(1) is more powerful
  Build lookahead into the items
  An LR(1) item is a pair: LR(0) item × lookahead
  [T → . int * T, $] means
    After seeing T → int * T, reduce if lookahead is $
  More accurate than just using follow sets
  Take a look at the LR(1) automaton for your parser!