Bottom-up Parsing:

Table-driven, using an explicit stack (no recursion!).
The stack can be viewed as containing both terminals and non-terminals.
The basic operation is to shift terminals from the input onto the stack until the right-hand side of an appropriate grammar rule is seen, and then to reduce the symbols on the stack that match the right-hand side to the single non-terminal of the rule.
Hence, bottom-up parsers are often called shift-reduce parsers.

Example

Grammar: E → E + n | n
Input: 2 + 3, or n + n
Parse: ($ is EOF in the input, and also the bottom of the stack)

      Parsing stack    Input       Action
  1   $                n + n $     shift
  2   $ n              + n $       reduce E → n
  3   $ E              + n $       shift
  4   $ E +            n $         shift
  5   $ E + n          $           reduce E → E + n
  6   $ E              $           accept

Compiler Construction F6S
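The shift and reduce steps of the example can be imitated with a few lines of Python. This is only a sketch under a strong assumption: it reduces greedily whenever the top of the stack matches a right-hand side (longest rule first), which happens to work for E → E + n | n but is not how a real table-driven parser decides. The names RULES and shift_reduce are made up for this illustration.

```python
# Hypothetical greedy shift-reduce recognizer for E -> E + n | n.
# Assumption: reducing as soon as a right-hand side is on top of the stack
# is always correct for this grammar; general grammars need a parsing table.
RULES = [("E", ["E", "+", "n"]), ("E", ["n"])]  # longest right-hand side first


def shift_reduce(tokens):
    """Return the list of actions taken while recognizing `tokens`."""
    stack, rest, actions = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: replace the matched right-hand side by the left side.
                del stack[-len(rhs):]
                stack.append(lhs)
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if not rest:
                break  # nothing to reduce and nothing left to shift
            stack.append(rest.pop(0))
            actions.append("shift")
    actions.append("accept" if stack == ["E"] else "error")
    return actions
```

Calling shift_reduce(["n", "+", "n"]) yields the same six actions as the parse table of the example.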
Notes:

Left recursion is not a problem in bottom-up parsing.
Indeed, as we shall see, look-ahead is not as serious an issue either.
Keeping track of what is on the stack, however, is an issue (note the difference in the grammar rule reductions at lines 2 and 5 of the previous example). See the later discussion on stack states.
Right recursion is actually a bit of a problem, because it makes the stack grow large (see next example).

Example

Grammar: E → n + E | n
Input: 2 + 3, or n + n
Parse:

      Parsing stack    Input       Action
  1   $                n + n $     shift
  2   $ n              + n $       shift
  3   $ n +            n $         shift
  4   $ n + n          $           reduce E → n
  5   $ n + E          $           reduce E → n + E
  6   $ E              $           accept
Decision Problems in Bottom-up Parsing (parsing conflicts):

Shift-reduce conflicts: Almost always come from ambiguities, and almost always the right disambiguating rule is to shift (dangling-else).
Reduce-reduce conflicts: More difficult; bottom-up parsers try to resolve them using Follow contexts.
There are no shift-shift conflicts.

Dangling-else Example:

Grammar: S → I | o
         I → i S | i S e S
Input: i i o e o
Parse:

      Parsing stack    Input           Action
  1   $                i i o e o $     shift
  2   $ i              i o e o $       shift
  3   $ i i            o e o $         shift
  4   $ i i o          e o $           reduce S → o
  5   $ i i S          e o $           shift/reduce (shift)
  6   $ i i S e        o $             shift
  7   $ i i S e o      $               reduce S → o
  8   $ i i S e S      $               reduce I → i S e S
  9   $ i I            $               reduce S → I
 10   $ i S            $               reduce I → i S
 11   $ I              $               reduce S → I
 12   $ S              $               accept
Reduce-reduce Example

Grammar: S → A B, A → x, B → x
Input: x x
Parse: (Follow(A) = {x}, Follow(B) = {$})

      Parsing stack    Input     Action
  1   $                x x $     shift
  2   $ x              x $       reduce A → x (reduce B → x)
  3   $ A              x $       shift
  4   $ A x            $         reduce B → x
  5   $ A B            $         reduce S → A B
  6   $ S              $         accept

Shift-reduce parsers differ in their use of Follow information:

LR(0) parsers never consult the look-ahead at all.
SLR(1) parsers use the Follow sets as previously constructed.
LR(1) parsers use context to split the Follow sets into subsets for different parsing paths (huge, inefficient parsers).
LALR(1) parsers: like LR(1), but coarser subsets are used (achieves most of the benefit, but much smaller and faster).
Technical Addendum

Shift-reduce parsers have trouble figuring out when to accept, so acceptance is turned into a reduction by a new rule S' → S with a new start symbol S'. Adding this rule is called augmenting the grammar:

      Parsing stack    Input     Action
      <previous example> ...
  5   $ A B            $         reduce S → A B
  6   $ S              $         reduce S' → S
  7   $ S'             $         accept

Format of a Yacc/Bison definition file

  {definitions}
  %%
  {rules}
  %%
  {auxiliary C functions}

Typical file extensions: .y .yacc .bison
Stack States: the DFA of sets of LR(0) items

An item is a grammar rule option with a distinguished position (indicated by a period or other symbol):

  A → α . β

The position in an item indicates that a parse has reached that position in recognizing that rule. α is then on the stack, and a β may be coming in the input (the α is called a viable prefix).
A stack state consists of the set of items which have compatible viable prefixes.

Computing the DFA of sets of LR(0) items

Add the augmentation item S' → . S to the start state of the DFA.
Then add all the initial items for S to the state: S → . α.
Continue by adding all the initial items for those non-terminals which appear right after the dot in any previous item (this is called the closure of the set of items).
Every symbol that comes immediately after the dot gives rise to a transition to a new state, generated by moving the dot past that symbol in each such item and then adding the closure items.
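The closure and transition steps just described can be sketched in Python. The sample grammar S → A B, A → x, B → x, the representation of an item as a pair (rule index, dot position), and the names closure, goto and build_dfa are all choices made for this illustration. State numbers depend on the order in which states are discovered, so they need not match a hand-drawn DFA.

```python
from collections import deque

# Augmented sample grammar; rule 0 is the augmentation rule S' -> S.
GRAMMAR = [("S'", ("S",)), ("S", ("A", "B")), ("A", ("x",)), ("B", ("x",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add initial items for every non-terminal that appears right after a dot."""
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot past `symbol` in every item that allows it, then close."""
    moved = {(r, d + 1) for r, d in state
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


def build_dfa():
    """Return (list of states, transition map) for the LR(0) item DFA."""
    start = closure({(0, 0)})
    states, transitions, queue = [start], {}, deque([start])
    while queue:
        state = queue.popleft()
        # Every symbol that appears right after a dot gives a transition.
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                queue.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions
```

For the sample grammar, build_dfa() finds six states and five transitions.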
Example: S → A B, A → x, B → x

The DFA of sets of items:

  State 0:  S' → . S
            S → . A B
            A → . x
            (on S go to state 5, on A go to state 2, on x go to state 1)
  State 1:  A → x .
  State 2:  S → A . B
            B → . x
            (on B go to state 4, on x go to state 3)
  State 3:  B → x .
  State 4:  S → A B .
  State 5:  S' → S .

Parsing Table from the DFA

(1) S → A B, (2) A → x, (3) B → x

  State        Input                  Goto
           x     other   $         S    A    B
    0      s1                      5    2
    1      r2    r2      r2
    2      s3                                4
    3      r3    r3      r3
    4      r1    r1      r1
    5                    accept

Note that this table is LR(0), since each state is either a shift state or a reduce state by a single rule. Thus, no look-ahead needs to be consulted.
Example of a parse by the table

      Parsing stack      Input     Action
  1   $ 0                x x $     s1
  2   $ 0 x 1            x $       r2 (A → x)
  3   $ 0 A 2            x $       s3
  4   $ 0 A 2 x 3        $         r3 (B → x)
  5   $ 0 A 2 B 4        $         r1 (S → A B)
  6   $ 0 S 5            $         accept

Note that, with the state numbers on the stack, the symbols themselves need not be put on the stack, and we will occasionally omit them in future examples.

Bison table is essentially the same:

  state 0
      'x'  shift, and go to state 1
      S    go to state 5
      A    go to state 2
  state 1
      A -> 'x' .   (rule 2)
      $default  reduce using rule 2 (A)
  state 2
      S -> A . B   (rule 1)
      'x'  shift, and go to state 3
      B    go to state 4
  state 3
      B -> 'x' .   (rule 3)
      $default  reduce using rule 3 (B)
  state 4
      S -> A B .   (rule 1)
      $default  reduce using rule 1 (S)
  state 5
      $    go to state 6
  state 6
      $    go to state 7
  state 7
      $default  accept
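The hand-built table for S → A B, A → x, B → x can be driven by a short loop. The following Python sketch keeps only state numbers on the stack, as the note above suggests. The dictionary layout and the names ACTION, GOTO, RULES and parse are assumptions of this illustration, and the "other" input column is left out because the only tokens here are x and $.

```python
# Rule number -> (left-hand side, length of right-hand side).
RULES = {1: ("S", 2), 2: ("A", 1), 3: ("B", 1)}

# Action part of the table, keyed by (state, look-ahead token).
ACTION = {
    (0, "x"): "s1",
    (1, "x"): "r2", (1, "$"): "r2",
    (2, "x"): "s3",
    (3, "x"): "r3", (3, "$"): "r3",
    (4, "x"): "r1", (4, "$"): "r1",
    (5, "$"): "accept",
}

# Goto part of the table, keyed by (state, non-terminal).
GOTO = {(0, "S"): 5, (0, "A"): 2, (2, "B"): 4}


def parse(tokens):
    """Run the table on `tokens` and return the sequence of actions taken."""
    stack, rest, trace = [0], list(tokens) + ["$"], []
    while True:
        action = ACTION.get((stack[-1], rest[0]), "error")
        trace.append(action)
        if action in ("accept", "error"):
            return trace
        number = int(action[1:])
        if action[0] == "s":
            # Shift: push the new state and consume one input token.
            stack.append(number)
            rest.pop(0)
        else:
            # Reduce: pop one state per right-hand-side symbol, then follow Goto.
            lhs, length = RULES[number]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
```

parse(["x", "x"]) reproduces the action column of the sample parse, and parse(["x"]) ends in an error because state 2 has no entry for $.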
Example: E → E + n | n

The DFA of sets of items:

  State 0:  E' → . E
            E → . E + n
            E → . n
            (on E go to state 1, on n go to state 2)
  State 1:  E' → E .
            E → E . + n
            (on + go to state 3)
  State 2:  E → n .
  State 3:  E → E + . n
            (on n go to state 4)
  State 4:  E → E + n .

Parsing Table (not LR(0)!):

  State              Input                                  Goto
           n     +                $                         E
    0      s2                                               1
    1            s3               accept
    2            r (E → n)        r (E → n)
    3      s4
    4            r (E → E + n)    r (E → E + n)

This table uses the SLR(1) rule: consult the Follow sets to decide between a shift and a reduce, or between two reduces:

  Follow(E) = { +, $ }, Follow(E') = { $ }

(Note the clear need for augmentation in this case.)
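The SLR(1) rule can be sketched as a small table builder: shifts come from the DFA transitions on terminals, and a complete item A → α . contributes a reduce only on the tokens in Follow(A). In the Python below, the item sets, transitions and Follow sets for E' → E, E → E + n | n are typed in by hand rather than computed, and the names STATES, TRANSITIONS, FOLLOW and slr_actions are invented for this illustration.

```python
# LR(0) item sets, each item written as (left side, right side, dot position).
STATES = {
    0: [("E'", ("E",), 0), ("E", ("E", "+", "n"), 0), ("E", ("n",), 0)],
    1: [("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)],
    2: [("E", ("n",), 1)],
    3: [("E", ("E", "+", "n"), 2)],
    4: [("E", ("E", "+", "n"), 3)],
}
TRANSITIONS = {(0, "n"): 2, (0, "E"): 1, (1, "+"): 3, (3, "n"): 4}
FOLLOW = {"E": {"+", "$"}, "E'": {"$"}}
TERMINALS = {"n", "+"}


def slr_actions():
    """Map (state, token) to the set of actions allowed by the SLR(1) rule."""
    table = {}

    def put(state, token, action):
        table.setdefault((state, token), set()).add(action)

    # Shift entries come from transitions on terminals.
    for (state, symbol), target in TRANSITIONS.items():
        if symbol in TERMINALS:
            put(state, symbol, f"s{target}")

    # A complete item reduces only on tokens in the Follow set of its left side.
    for state, items in STATES.items():
        for lhs, rhs, dot in items:
            if dot == len(rhs):
                for token in FOLLOW[lhs]:
                    action = "accept" if lhs == "E'" else f"r({lhs} -> {' '.join(rhs)})"
                    put(state, token, action)
    return table
```

An entry holding more than one action would be a conflict. For this grammar every entry is a single action, so state 1 (shift on +, accept on $) is resolved by the Follow sets alone.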
Sample Parse:

      Parsing stack          Input           Action
  1   $ 0                    n + n + n $     shift 2
  2   $ 0 n 2                + n + n $       reduce E → n
  3   $ 0 E 1                + n + n $       shift 3
  4   $ 0 E 1 + 3            n + n $         shift 4
  5   $ 0 E 1 + 3 n 4        + n $           reduce E → E + n
  6   $ 0 E 1                + n $           shift 3
  7   $ 0 E 1 + 3            n $             shift 4
  8   $ 0 E 1 + 3 n 4        $               reduce E → E + n
  9   $ 0 E 1                $               accept

Example with ε (Bison table):

S → A B, A → x, A → ε, B → x

  state 0
      'x'  shift, and go to state 1
      'x'  [reduce using rule 3 (A)]
      $default  reduce using rule 3 (A)
      S    go to state 5
      A    go to state 2
  state 1
      A -> 'x' .   (rule 2)
      $default  reduce using rule 2 (A)
  state 2
      etc...
Remarks on Example with ε

This grammar is unambiguous, but cannot be parsed with one symbol of look-ahead.
The default disambiguation rule in Bison (prefer the shift) means it will never accept the string x, only the string x x.
Thus, even bottom-up parsers have their power limited by look-ahead.

A Contrasting Example with ε:

S → A B, A → x, B → x, B → ε

  state 0
      'x'  shift, and go to state 1
      S    go to state 5
      A    go to state 2
  state 1
      A -> 'x' .   (rule 2)
      $default  reduce using rule 2 (A)
  state 2
      S -> A . B   (rule 1)
      'x'  shift, and go to state 3
      $default  reduce using rule 4 (B)
      B    go to state 4
Contrasting Example with ε (continued):

  state 3
      B -> 'x' .   (rule 3)
      $default  reduce using rule 3 (B)
  state 4
      S -> A B .   (rule 1)
      $default  reduce using rule 1 (S)
  state 5
      $    go to state 6
  state 6
      $    go to state 7
  state 7
      $default  accept

Example: E → n + E | n

  state 0
      'n'  shift, and go to state 1
      E    go to state 4
  state 1
      E -> 'n' . '+' E   (rule 1)
      E -> 'n' .         (rule 2)
      '+'  shift, and go to state 2
      $default  reduce using rule 2 (E)
  state 2
      E -> 'n' '+' . E   (rule 1)
      'n'  shift, and go to state 1
      E    go to state 3
  state 3
      E -> 'n' '+' E .   (rule 1)
      $default  reduce using rule 1 (E)
  state 4
      $ ... accept
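Dangling-else:

Grammar: (1) S → I
         (2) S → other
         (3) I → if S
         (4) I → if S else S

DFA of sets of LR(0) items:

  State 0:  S' → . S
            S → . I
            S → . other
            I → . if S
            I → . if S else S
            (on S go to state 1, on I go to state 2, on other go to state 3, on if go to state 4)
  State 1:  S' → S .
  State 2:  S → I .
  State 3:  S → other .
  State 4:  I → if . S
            I → if . S else S
            S → . I
            S → . other
            I → . if S
            I → . if S else S
            (on S go to state 5, on I go to state 2, on other go to state 3, on if go to state 4)
  State 5:  I → if S .
            I → if S . else S
            (on else go to state 6)
  State 6:  I → if S else . S
            S → . I
            S → . other
            I → . if S
            I → . if S else S
            (on S go to state 7, on I go to state 2, on other go to state 3, on if go to state 4)
  State 7:  I → if S else S .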
The parsing table (with conflict):

  State             Input                     Goto
           if    else     other   $          S    I
    0      s4             s3                 1    2
    1                             accept
    2            r1               r1
    3            r2               r2
    4      s4             s3                 5    2
    5            s6/r3            r3
    6      s4             s3                 7    2
    7            r4               r4

When Follow Sets Fail:

  S → id | V = E
  V → id
  E → V | n

But Yacc/Bison has no trouble with this grammar!

  state 0
      id   shift, and go to state 1
      S    go to state 8
      V    go to state 2
  state 1
      S -> id .   (rule 1)
      V -> id .   (rule 3)
      '='  reduce using rule 3 (V)
      $default  reduce using rule 1 (S)
  ...
Notes on this example

Legal strings are: { id, id=id, id=n }
First(S) = First(V) = {id}, First(E) = {id, n}
Follow(S) = Follow(E) = {$}
Follow(V) = {$, =}
There is a reduce-reduce SLR(1) conflict on $ in state 1 (Follow(S) and Follow(V) both contain $).

LALR(1) parsers add look-ahead

An LR(1) item has a look-ahead component: [ S' → . S, $ ]
Begin with the above item, and propagate look-ahead to closure items as follows: given an item [ A → α . B γ, a ], add the closure item [ B → . β, b ] for every production choice B → β and for all b in First(γa).
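The look-ahead propagation rule can be tried out on the grammar S → id | V = E, V → id, E → V | n. The Python sketch below represents an LR(1) item as a triple (rule index, dot position, look-ahead). It assumes that no rule derives ε, which holds for this grammar and keeps the First computation short. The names GRAMMAR, first_sets and closure are choices made for this illustration.

```python
# Augmented grammar; rule 0 is S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("id",)),
    ("S", ("V", "=", "E")),
    ("V", ("id",)),
    ("E", ("V",)),
    ("E", ("n",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets by fixed-point iteration (assumes no rule derives ε)."""
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def closure(items):
    """Close a set of LR(1) items (rule index, dot position, look-ahead)."""
    result, work = set(items), list(items)
    while work:
        rule, dot, a = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:] + (a,)  # the string γa from the rule above
        head = rest[0]               # without ε, First(γa) is First of its head
        lookaheads = FIRST[head] if head in NONTERMINALS else {head}
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for b in lookaheads:
                    if (i, 0, b) not in result:
                        result.add((i, 0, b))
                        work.append((i, 0, b))
    return result
```

closure({(0, 0, "$")}) gives the four items [S' → . S, $], [S → . id, $], [S → . V = E, $] and [V → . id, =]. The item for V carries only = as look-ahead, which is why the reduce-reduce conflict on $ does not arise once id has been shifted from the start state.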
The DFA for the example is then:

  State 0:  [ S' → . S, $ ]
            [ S → . id, $ ]
            [ S → . V = E, $ ]
            [ V → . id, = ]
            (on S go to state 8, on id go to state 1, on V go to state 2)
  State 1:  [ S → id ., $ ]
            [ V → id ., = ]
  State 2:  [ S → V . = E, $ ]
            (on = go to state 3)
  State 3:  [ S → V = . E, $ ]
            [ E → . V, $ ]
            [ E → . n, $ ]
            [ V → . id, $ ]
            (on E go to state 7, on V go to state 6, on n go to state 5, on id go to state 4)
  State 4:  [ V → id ., $ ]
  State 5:  [ E → n ., $ ]
  State 6:  [ E → V ., $ ]
  State 7:  [ S → V = E ., $ ]
  State 8:  [ S' → S ., $ ]

Definition of First Sets for Symbols (and ε):

If X is a terminal or ε, then First(X) = {X}.
If X is a non-terminal, then for each production choice X → X_1 X_2 ... X_n, First(X) contains First(X_1) - {ε}.
If also for some i < n, all of the sets First(X_1), ..., First(X_i) contain ε, then First(X) contains First(X_{i+1}) - {ε}.
If all of the sets First(X_1), ..., First(X_n) contain ε, then First(X) also contains ε.
Definition of First Sets for Strings of Symbols:

First(α) for any string α = X_1 X_2 ... X_n (a string of terminals and non-terminals) is as follows.
First(α) contains First(X_1) - {ε}.
For each i = 2, ..., n, if First(X_k) contains ε for all k = 1, ..., i-1, then First(α) contains First(X_i) - {ε}.
Finally, if for all i = 1, ..., n, First(X_i) contains ε, then First(α) contains ε.

Example

  S → A B | z
  A → x | ε
  B → y | ε

First(A) = { x, ε }
First(B) = { y, ε }
First(S) = { x, y, z, ε }
First(ASw) = { x, y, z, w }
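The definition for strings translates almost line by line into a loop. In this Python sketch the First sets of the non-terminals are simply copied from the example rather than computed, and the names EPS, FIRST and first_of_string are made up for the illustration.

```python
EPS = "ε"

# First sets of the non-terminals, taken from the example as given.
FIRST = {"A": {"x", EPS}, "B": {"y", EPS}, "S": {"x", "y", "z", EPS}}


def first_of_string(symbols):
    """First set of a string of terminals and non-terminals."""
    result = set()
    for symbol in symbols:
        first = FIRST.get(symbol, {symbol})  # a terminal's First set is itself
        result |= first - {EPS}
        if EPS not in first:
            return result  # this symbol cannot vanish, so stop here
    # Every symbol can derive ε (or the string is empty).
    result.add(EPS)
    return result
```

first_of_string(["A", "S", "w"]) returns {x, y, z, w}: A and S can both vanish, so w is reached, but w itself cannot vanish, so ε is not included.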
Algorithm for Computing First Sets for Non-terminals:

  for all non-terminals A do First(A) := {} ;
  while there are changes to any First(A) do
    for each production choice A → X_1 X_2 ... X_n do
      k := 1 ; Continue := true ;
      while Continue = true and k <= n do
        add First(X_k) - {ε} to First(A) ;
        if ε is not in First(X_k) then Continue := false ;
        k := k + 1 ;
      if Continue = true then add ε to First(A) ;

Example of Algorithm

  Grammar Rule                                   Pass 1                         Pass 2
  statement → if-stmt                                                           First(statement) = {if, other}
  statement → other                              First(statement) = {other}
  if-stmt → if ( exp ) statement else-part       First(if-stmt) = {if}
  else-part → else statement                     First(else-part) = {else}
  else-part → ε                                  First(else-part) = {else, ε}
  exp → 0                                        First(exp) = {0}
  exp → 1                                        First(exp) = {0, 1}
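A direct Python transcription of the algorithm is sketched below. The grammar encoding (a dictionary from each non-terminal to a list of right-hand sides, with the empty list standing for ε) and the names compute_first and IF_GRAMMAR are assumptions of this illustration.

```python
EPS = "ε"


def compute_first(grammar):
    """Fixed-point computation of First sets for all non-terminals.

    `grammar` maps each non-terminal to a list of right-hand sides;
    each right-hand side is a list of symbols, and [] stands for ε.
    """
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, choices in grammar.items():
            for rhs in choices:
                before = len(first[a])
                can_continue = True
                for x in rhs:
                    fx = first[x] if x in grammar else {x}
                    first[a] |= fx - {EPS}
                    if EPS not in fx:
                        can_continue = False
                        break
                if can_continue:
                    # Every symbol of the right-hand side can vanish.
                    first[a].add(EPS)
                if len(first[a]) != before:
                    changed = True
    return first


# The if-statement grammar from the example.
IF_GRAMMAR = {
    "statement": [["if-stmt"], ["other"]],
    "if-stmt": [["if", "(", "exp", ")", "statement", "else-part"]],
    "else-part": [["else", "statement"], []],
    "exp": [["0"], ["1"]],
}
```

compute_first(IF_GRAMMAR) arrives at the same sets as the table. As in the table, First(statement) only picks up if on the second pass, because First(if-stmt) is still empty when the rule statement → if-stmt is first visited.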
Definition of Follow Sets for Non-terminals:

Given a non-terminal A, the set Follow(A), consisting of terminals, and possibly $, is defined as follows.
1. If A is the start symbol, then $ is in Follow(A).
2. If there is a production B → α A γ, then First(γ) - {ε} is contained in Follow(A).
3. If there is a production B → α A γ such that ε is in First(γ), then Follow(A) contains Follow(B).

Algorithm for Computing Follow Sets for Non-terminals:

  Follow(start-symbol) := {$} ;
  for all non-terminals A ≠ start-symbol do Follow(A) := {} ;
  while there are changes to any Follow sets do
    for each production A → X_1 X_2 ... X_n do
      for each X_i that is a non-terminal do
        add First(X_{i+1} X_{i+2} ... X_n) - {ε} to Follow(X_i)
        (* Note: if i = n, then X_{i+1} X_{i+2} ... X_n = ε *)
        if ε is in First(X_{i+1} X_{i+2} ... X_n) then
          add Follow(A) to Follow(X_i)
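The Follow algorithm can be sketched in Python in the same style. To keep the block short, the First sets are entered by hand for the sample grammar S → A B | z, A → x | ε, B → y | ε instead of being computed. The names GRAMMAR, FIRST, first_of_string and compute_follow are choices made for this illustration.

```python
EPS, EOF = "ε", "$"

# Sample grammar; [] stands for an ε right-hand side.
GRAMMAR = {"S": [["A", "B"], ["z"]], "A": [["x"], []], "B": [["y"], []]}

# First sets of the non-terminals, entered by hand for this grammar.
FIRST = {"S": {"x", "y", "z", EPS}, "A": {"x", EPS}, "B": {"y", EPS}}


def first_of_string(symbols):
    """First set of a string of symbols (the empty string gives {ε})."""
    result = set()
    for s in symbols:
        fs = FIRST.get(s, {s})  # a terminal's First set is itself
        result |= fs - {EPS}
        if EPS not in fs:
            return result
    return result | {EPS}


def compute_follow(start):
    """Fixed-point computation of Follow sets for all non-terminals."""
    follow = {a: set() for a in GRAMMAR}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for a, choices in GRAMMAR.items():
            for rhs in choices:
                for i, x in enumerate(rhs):
                    if x not in GRAMMAR:
                        continue  # only non-terminals have Follow sets
                    tail = first_of_string(rhs[i + 1:])
                    new = tail - {EPS}
                    if EPS in tail:
                        # The rest of the rule can vanish: inherit Follow(A).
                        new |= follow[a]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow
```

compute_follow("S") gives Follow(S) = {$}, Follow(A) = {y, $} and Follow(B) = {$}.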
Example

  S → A B | z
  A → x | ε
  B → y | ε

First(A) = { x, ε }
First(B) = { y, ε }
First(S) = { x, y, z, ε }
First(ASw) = { x, y, z, w }

Follow(S) = { $ }
Follow(A) = { y, $ }
Follow(B) = { $ }

A Second Example

  statement → if-stmt | other
  if-stmt → if ( exp ) statement else-part
  else-part → else statement | ε
  exp → 0 | 1

First(statement) = {if, other}
First(if-stmt) = {if}
First(else-part) = {else, ε}
First(exp) = {0, 1}

Follow(statement) = {$, else}
Follow(if-stmt) = {$, else}
Follow(else-part) = {$, else}
Follow(exp) = { ) }