1 Formal Languages and Compiler (CSE322)
Bottom-up Parser
Jungsik Choi
* Some slides taken from SKKU SWE3010 (Prof. Hwansoo Han) and TAMU CSCE (Prof. Lawrence Rauchwerger)

2 Bottom-up Parsing
Bottom-up parsing and reverse rightmost derivation
- A derivation consists of a series of rewrite steps
- A bottom-up parser builds a derivation by working from the input sentence back toward the start symbol S
  $S \Rightarrow \gamma_0 \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \dots \Rightarrow \gamma_{n-1} \Rightarrow \gamma_n \Rightarrow \text{sentence}$
  (the bottom-up parser traverses this chain from right to left)
- In terms of the parse tree, this is working from leaves to root
- Nodes with no parent in a partial tree form its upper fringe
- Since each replacement of β with A shrinks the upper fringe, we call it a reduction

3 Bottom-Up Parser Example
The expression grammar
   1  Goal   → Expr
   2  Expr   → Expr + Term
   3         | Expr - Term
   4         | Term
   5  Term   → Term * Factor
   6         | Term / Factor
   7         | Factor
   8  Factor → number
   9         | id
  10         | ( Expr )
Handles for the rightmost derivation of x - 2 * y
  Prod'n   Sentential Form                 Handle
  -        Goal                            -
  1        Expr                            1,1
  3        Expr - Term                     3,3
  5        Expr - Term * Factor            5,5
  9        Expr - Term * <id,y>            9,5
  7        Expr - Factor * <id,y>          7,3
  8        Expr - <num,2> * <id,y>         8,3
  4        Term - <num,2> * <id,y>         4,1
  7        Factor - <num,2> * <id,y>       7,1
  9        <id,x> - <num,2> * <id,y>       9,1
Read from bottom to top, the table is the reverse rightmost derivation (RRD).
Each handle is written as (production number, position of the handle's rightmost symbol). In the original slides the handles are highlighted in blue.

4 Finding Reductions - Handles
The parser must find a substring β of the tree's frontier that
- matches some production A → β, and
- occurs as one step in the rightmost derivation
Informally, we call this substring β a handle
Formally,
- A handle of a right-sentential form γ is a pair <A → β, k>, where A → β ∈ P and k is the position in γ of β's rightmost symbol
- If <A → β, k> is a handle, then replacing β at k with A yields the previous right-sentential form in the rightmost derivation
Handle pruning
- The process of discovering a handle and reducing it to the appropriate lhs (nonterminal) is called handle pruning
- Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols
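
Handle pruning can be replayed mechanically. The sketch below is illustrative only: it takes the handle list for x - 2 * y from the expression-grammar slide as given (it does not discover handles), and the names `PRODUCTIONS`, `prune`, and `reduce_all` are hypothetical helpers, not part of the course material. Tokens are represented by their classes (`id`, `num`).

```python
# Expression grammar from the slides: production number -> (lhs, rhs)
PRODUCTIONS = {
    1: ("Goal", ["Expr"]),
    2: ("Expr", ["Expr", "+", "Term"]),
    3: ("Expr", ["Expr", "-", "Term"]),
    4: ("Expr", ["Term"]),
    5: ("Term", ["Term", "*", "Factor"]),
    6: ("Term", ["Term", "/", "Factor"]),
    7: ("Term", ["Factor"]),
    8: ("Factor", ["num"]),
    9: ("Factor", ["id"]),
    10: ("Factor", ["(", "Expr", ")"]),
}


def prune(form, handle):
    """Replace the rhs that ends at 1-based position k with its lhs.

    This only checks that the rhs really sits at that position; it does
    not prove the pair is a handle of a rightmost derivation.
    """
    prod, k = handle
    lhs, rhs = PRODUCTIONS[prod]
    start = k - len(rhs)
    if start < 0 or form[start:k] != rhs:
        raise ValueError(f"<{prod},{k}> does not match {form}")
    return form[:start] + [lhs] + form[k:]


def reduce_all(form, handles):
    """Apply a sequence of handles, returning the final sentential form."""
    for handle in handles:
        form = prune(form, handle)
    return form


SENTENCE = ["id", "-", "num", "*", "id"]  # x - 2 * y as token classes
HANDLES = [(9, 1), (7, 1), (4, 1), (8, 3), (7, 3),
           (9, 5), (5, 5), (3, 3), (1, 1)]

print(reduce_all(SENTENCE, HANDLES))  # ['Goal']
```

Each call to `prune` shrinks or relabels the upper fringe, and nine reductions take the sentence back to `Goal`.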

5 Shift-Reduce Parser
One of the bottom-up parsers
    push INVALID
    token ← next_token( )
    repeat until (top of stack = Goal and token = EOF)
        if the top of the stack is a handle <A → β, k> then
            // reduce β to A
            pop |β| symbols off the stack
            push A onto the stack
        else if (token != EOF) then
            // shift
            push token
            token ← next_token( )
        else
            // need to shift, but out of input
            report an error
How do errors show up?
- failure to find a handle
- hitting EOF & needing to shift (final else clause)
Either generates an error

11 Back to x - 2 * y
  Stack                      Input                  Handle   Action
  \$                         id - num * id \$       none     shift
  \$ id                      - num * id \$          9,1      red. 9
  \$ Factor                  - num * id \$          7,1      red. 7
  \$ Term                    - num * id \$          4,1      red. 4
  \$ Expr                    - num * id \$          none     shift
  \$ Expr -                  num * id \$            none     shift
  \$ Expr - num              * id \$                8,3      red. 8
  \$ Expr - Factor           * id \$                7,3      red. 7
  \$ Expr - Term             * id \$                none     shift
  \$ Expr - Term *           id \$                  none     shift
  \$ Expr - Term * id        \$                     9,5      red. 9
  \$ Expr - Term * Factor    \$                     5,5      red. 5
  \$ Expr - Term             \$                     3,3      red. 3
  \$ Expr                    \$                     1,1      red. 1
  \$ Goal                    \$                     none     accept
5 shifts + 9 reduces + 1 accept

12 Example
The shift-reduce trace for x - 2 * y (same Stack, Input, and Action columns as the previous slide) builds the parse tree bottom-up, one reduction at a time:
  Goal
  └─ Expr
     ├─ Expr
     │  └─ Term
     │     └─ Factor
     │        └─ <id,x>
     ├─ -
     └─ Term
        ├─ Term
        │  └─ Factor
        │     └─ <num,2>
        ├─ *
        └─ Factor
           └─ <id,y>

13 Shift-Reduce Parsing
Shift-reduce parsers are easily built and easily understood
A shift-reduce parser has just four actions
- Shift: next word is shifted onto the stack
- Reduce: right end of handle is at top of stack
  - Locate left end of handle within the stack
  - Pop handle off stack & push appropriate lhs
- Accept: stop parsing & report success
- Error: call an error reporting/recovery routine
Handle finding is key
- handle is on stack
- finite set of handles ⇒ use a DFA!
Critical question: How can we know when we have found a handle without generating lots of different derivations?
Answer: we use lookahead in the grammar along with tables produced as the result of analyzing the grammar. LR(1) parsers build a DFA that runs over the stack & finds the handles.

14 Another Bottom-Up Parser: LR(1) Parsers
- LR(1) parsers are table-driven, shift-reduce parsers that use a limited right context (1 token) for handle recognition
- LR(1) parsers recognize languages that have an LR(1) grammar
Informal definition: a grammar is LR(1) if, given a rightmost derivation
  $S \Rightarrow \gamma_0 \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \dots \Rightarrow \gamma_{n-1} \Rightarrow \gamma_n \Rightarrow \text{sentence}$
we can
1. isolate the handle of each right-sentential form $\gamma_i$, and
2. determine the production with which to reduce,
by scanning $\gamma_i$ from left to right, going at most 1 symbol beyond the right end of the handle of $\gamma_i$

15 LR(1) Parsers
A table-driven LR(1) parser looks like
  source code → Scanner → Table-driven Parser → IR
  grammar → Parser Generator → ACTION & GOTO Tables → (read by the Table-driven Parser)
- Tables can be built by hand
- However, this is a perfect task to automate

16 LR(1) Skeleton Parser
The skeleton parser
- pushes tokens & NTs along with DFA states
- uses ACTION & GOTO tables (DFA)
- does |words| shifts
- does |derivation| reductions
- does 1 accept
- detects errors by failure of the 3 other cases
    stack.push(INVALID);
    stack.push(s0);
    not_found = true;
    token = scanner.next_token();
    while (not_found) {
        s = stack.top();
        if ( ACTION[s,token] == "shift s_next" ) then {
            stack.push(token);
            stack.push(s_next);
            token = scanner.next_token();
        }
        else if ( ACTION[s,token] == "reduce A → β" ) then {
            stack.popnum(2 * |β|);     // pop 2 * |β| entries (symbols and states)
            s = stack.top();
            stack.push(A);
            stack.push(GOTO[s,A]);
        }
        else if ( ACTION[s,token] == "accept" && token == EOF ) then {
            not_found = false;
        }
        else report a syntax error and recover;
    }
    report success;

17 LR(1) Parsers
How does this LR(1) stuff work?
- Unambiguous grammar ⇒ unique rightmost derivation
- Keep upper fringe on a stack
  - All active handles include top of stack (TOS)
  - Shift inputs until TOS is right end of a handle
- Language of handles is regular (finite)
  - Build a handle-recognizing DFA
  - ACTION & GOTO tables encode the DFA
The Big Picture
- Model the state of the parser
- Use two functions, goto(s, X) and closure(s)
  - goto() is analogous to move() in the subset construction (NFA → DFA)
  - closure() adds information to form a state
- Build up the states and transition functions of the DFA
- Use this information to fill in the ACTION and GOTO tables

18 LR(0) example
  Z → E
  E → E + T
    | T
  T → i
    | ( E )

19 LR(1) Parsing - Example
Grammar
  1  LIST    → LIST , ELEMENT
  2  LIST    → ELEMENT
  3  ELEMENT → a
Tables
  State   ACTION                GOTO
          a     ,     \$        LIST   ELEMENT
  0       s3                    1      2
  1             s4    acc
  2             r2    r2
  3             r3    r3
  4       s3                           5
  5             r1    r1
LR(1) parsing of a,a
  Stack                      Input    Action
  0                          a,a\$    s3
  0 a 3                      ,a\$     r3, GOTO 2
  0 ELEMENT 2                ,a\$     r2, GOTO 1
  0 LIST 1                   ,a\$     s4
  0 LIST 1 , 4               a\$      s3
  0 LIST 1 , 4 a 3           \$       r3, GOTO 5
  0 LIST 1 , 4 ELEMENT 5     \$       r1, GOTO 1
  0 LIST 1                   \$       accept
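
The trace above follows directly from the skeleton parser loop. As a minimal sketch, the Python below hard-codes the ACTION and GOTO tables of the LIST grammar and runs the same shift/reduce/accept cases. The dictionary encoding, the `parse` function, and the use of `"$"` as the end marker are assumptions made for illustration.

```python
# Production number -> (lhs, number of rhs symbols), as numbered on the slide
PRODUCTIONS = {1: ("LIST", 3), 2: ("LIST", 1), 3: ("ELEMENT", 1)}

# ACTION[(state, terminal)] -> (kind, argument); missing entries are errors
ACTION = {
    (0, "a"): ("shift", 3),
    (1, ","): ("shift", 4),
    (1, "$"): ("accept", None),
    (2, ","): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, ","): ("reduce", 3), (3, "$"): ("reduce", 3),
    (4, "a"): ("shift", 3),
    (5, ","): ("reduce", 1), (5, "$"): ("reduce", 1),
}

# GOTO[(state, nonterminal)] -> state
GOTO = {(0, "LIST"): 1, (0, "ELEMENT"): 2, (4, "ELEMENT"): 5}


def parse(tokens):
    """Run the table-driven loop; return the list of actions taken."""
    words = list(tokens) + ["$"]
    stack = [0]            # state, symbol, state, symbol, ..., state
    pos = 0
    trace = []
    while True:
        state, token = stack[-1], words[pos]
        kind, arg = ACTION.get((state, token), ("error", None))
        if kind == "shift":
            stack += [token, arg]
            pos += 1
            trace.append(f"s{arg}")
        elif kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[-2 * rhs_len:]       # pop 2 * |rhs| entries
            exposed = stack[-1]            # state uncovered by the pop
            stack += [lhs, GOTO[(exposed, lhs)]]
            trace.append(f"r{arg}")
        elif kind == "accept":
            trace.append("accept")
            return trace
        else:
            raise SyntaxError(f"unexpected {token!r} in state {state}")


print(parse(["a", ",", "a"]))
# ['s3', 'r3', 'r2', 's4', 's3', 'r3', 'r1', 'accept']
```

The printed actions match the Action column of the trace, one entry per row.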

20 LR(k) Items
A state of the parser == a set of LR(k) items
An LR(k) item is a pair [P, δ], where
- P is a production A → β with a • at some position in the rhs
- δ is a lookahead string of length at most k (words/tokens or EOF)

21 LR(k) Items
LR(1) items
The • in an item indicates the position of the top of the stack
- [A → •βγ, a] means that the input seen so far is consistent with the use of A → βγ immediately after the symbol on top of the stack (possibility)
- [A → β•γ, a] means that the input seen so far is consistent with the use of A → βγ at this point, and that the parser has already recognized β (partially complete)
- [A → βγ•, a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A (complete)

22 Computing goto()
goto(s, X) computes the state that the parser would reach if it recognized an X while in state s
- goto( { [A → β•Xδ, a] }, X ) produces [A → βX•δ, a]
- It should also include closure( [A → βX•δ, a] )
The algorithm
    goto(s, X)
        moved ← Ø
        for each item [A → β•Xδ, a] ∈ s
            moved ← moved ∪ { [A → βX•δ, a] }
        return closure(moved)

23 Computing closure()
closure(s) adds all the items implied by items already in s
- Any item [A → β•Bδ, a] implies [B → •τ, x] for each production with B on the lhs, and each x ∈ FIRST(δa)
- Since βBδ is valid, any way to derive βBδ is valid, too
The algorithm
    closure(s)
        while ( s is still changing )
            for each item [A → β•Cδ, a] ∈ s
                for each production C → τ ∈ P
                    for each b ∈ FIRST(δa)     // δ might be ε
                        s ← s ∪ { [C → •τ, b] }
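
The two algorithms can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: an item is a tuple `(lhs, rhs, dot, lookahead)`, the grammar is the right-recursive expression grammar used in Example-1, its FIRST sets are hard-coded, and the grammar has no ε-productions, so FIRST(δa) is FIRST of δ's first symbol, or {a} when δ is empty. The names `GRAMMAR`, `FIRST`, `closure`, and `goto` are illustrative.

```python
# Example-1 grammar: nonterminal -> list of right-hand sides
GRAMMAR = {
    "Goal": [("Expr",)],
    "Expr": [("Term", "-", "Expr"), ("Term",)],
    "Term": [("Factor", "*", "Term"), ("Factor",)],
    "Factor": [("ident",)],
}

# FIRST sets as listed in the Example-1 table (assumed, not computed here)
FIRST = {
    "Goal": {"ident"}, "Expr": {"ident"}, "Term": {"ident"},
    "Factor": {"ident"}, "-": {"-"}, "*": {"*"}, "ident": {"ident"},
}


def closure(items):
    """Add every item implied by an item whose dot precedes a nonterminal."""
    items = set(items)
    work = list(items)
    while work:
        _, rhs, dot, look = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue                       # dot at the end or before a terminal
        rest = rhs[dot + 1:]               # this is δ in [A → β•Cδ, a]
        lookaheads = FIRST[rest[0]] if rest else {look}   # FIRST(δa)
        for production in GRAMMAR[rhs[dot]]:
            for b in lookaheads:
                item = (rhs[dot], production, 0, b)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)


def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it."""
    moved = {
        (lhs, rhs, dot + 1, look)
        for lhs, rhs, dot, look in state
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


cc0 = closure({("Goal", ("Expr",), 0, "EOF")})
cc2 = goto(cc0, "Term")
print(len(cc0), len(cc2))  # 10 2
```

A worklist replaces the "while s is still changing" loop; both reach the same fixed point, the worklist just avoids rescanning items that have already been expanded.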

24 LR(1) Table Construction
High-level overview
1. Build the canonical collection of sets of LR(1) items
   a. Begin in an appropriate state, cc0
      - [S' → •S, EOF], along with any equivalent items
      - Derive equivalent items as closure(cc0)
   b. Repeatedly compute, for each cck and each X, goto(cck, X)
      - If the set is not already in the collection, add it
      - Record all the transitions created by goto()
      This eventually reaches a fixed point
2. Fill in the table from the collection of sets of LR(1) items

25 Canonical Collection
Building CC: all possible states
- Start from cc0 = closure( { [S' → •S, EOF] } )
- Repeatedly construct new states, until all are found
The algorithm
    cc0 ← closure( { [S' → •S, EOF] } )
    CC ← { cc0 }
    k ← 1
    while ( CC is still changing )
        for each ccj ∈ CC and for each x ∈ (T ∪ NT)
            cck ← goto(ccj, x)
            record ccj → cck on x
            if cck ∉ CC then
                CC ← CC ∪ { cck }     // new state in DFA
                k ← k + 1

26 Example-1 (grammar & sets)
Simplified, right-recursive expression grammar
  1  Goal   → Expr
  2  Expr   → Term - Expr
  3  Expr   → Term
  4  Term   → Factor * Term
  5  Term   → Factor
  6  Factor → ident
FIRST sets
  Symbol   FIRST
  Goal     { ident }
  Expr     { ident }
  Term     { ident }
  Factor   { ident }
  -        { - }
  *        { * }
  ident    { ident }
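
The FIRST table can be reproduced with a small fixed-point computation. The sketch below assumes, as holds for this grammar, that there are no ε-productions, so only the first symbol of each right-hand side contributes. `GRAMMAR` and `first_sets` are illustrative names.

```python
# Example-1 grammar as (lhs, rhs) pairs
GRAMMAR = [
    ("Goal", ["Expr"]),
    ("Expr", ["Term", "-", "Expr"]),
    ("Expr", ["Term"]),
    ("Term", ["Factor", "*", "Term"]),
    ("Term", ["Factor"]),
    ("Factor", ["ident"]),
]


def first_sets(grammar):
    """Compute FIRST for every symbol of an epsilon-free grammar."""
    nonterminals = {lhs for lhs, _ in grammar}
    symbols = nonterminals | {s for _, rhs in grammar for s in rhs}
    # FIRST of a terminal is the terminal itself; nonterminals start empty.
    first = {s: (set() if s in nonterminals else {s}) for s in symbols}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            # Without epsilon-productions only rhs[0] can start the string.
            new = first[rhs[0]] - first[lhs]
            if new:
                first[lhs] |= new
                changed = True
    return first


for symbol, terminals in sorted(first_sets(GRAMMAR).items()):
    print(symbol, sorted(terminals))
```

A grammar with ε-productions would need the usual extension that walks past nullable symbols; that case is deliberately left out here.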

27 Example-1 (building the collection 1)
Initialization step
  cc0 ← closure( { [Goal → •Expr, EOF] } )
      = { [Goal → •Expr, EOF],
          [Expr → •Term - Expr, EOF], [Expr → •Term, EOF],
          [Term → •Factor * Term, EOF], [Term → •Factor * Term, -],
          [Term → •Factor, EOF], [Term → •Factor, -],
          [Factor → •ident, EOF], [Factor → •ident, -], [Factor → •ident, *] }
Add cc0 to the set of states: CC ← { cc0 }

28 Example-1 (building the collection 2)
Iteration 1
  cc1 ← goto(cc0, Expr)
  cc2 ← goto(cc0, Term)
  cc3 ← goto(cc0, Factor)
  cc4 ← goto(cc0, ident)
Iteration 2
  cc5 ← goto(cc2, -)
  cc6 ← goto(cc3, *)
Iteration 3
  cc7 ← goto(cc5, Expr)     # Term, Factor, ident lead to existing states
  cc8 ← goto(cc6, Term)     # Factor, ident lead to existing states

29 Example-1 (summary 1)
cc0 : { [Goal → •Expr, EOF],
        [Expr → •Term - Expr, EOF], [Expr → •Term, EOF],
        [Term → •Factor * Term, EOF], [Term → •Factor * Term, -],
        [Term → •Factor, EOF], [Term → •Factor, -],
        [Factor → •ident, EOF], [Factor → •ident, -], [Factor → •ident, *] }
cc1 : { [Goal → Expr•, EOF] }
cc2 : { [Expr → Term• - Expr, EOF], [Expr → Term•, EOF] }
cc3 : { [Term → Factor• * Term, EOF], [Term → Factor• * Term, -],
        [Term → Factor•, EOF], [Term → Factor•, -] }
cc4 : { [Factor → ident•, EOF], [Factor → ident•, -], [Factor → ident•, *] }
cc5 : { [Expr → Term - •Expr, EOF],
        [Expr → •Term - Expr, EOF], [Expr → •Term, EOF],
        [Term → •Factor * Term, EOF], [Term → •Factor * Term, -],
        [Term → •Factor, EOF], [Term → •Factor, -],
        [Factor → •ident, EOF], [Factor → •ident, -], [Factor → •ident, *] }

30 Example-1 (summary 2)
cc6 : { [Term → Factor * •Term, EOF], [Term → Factor * •Term, -],
        [Term → •Factor * Term, EOF], [Term → •Factor * Term, -],
        [Term → •Factor, EOF], [Term → •Factor, -],
        [Factor → •ident, EOF], [Factor → •ident, -], [Factor → •ident, *] }
cc7 : { [Expr → Term - Expr•, EOF] }
cc8 : { [Term → Factor * Term•, EOF], [Term → Factor * Term•, -] }
[Figure: the DFA over states 0 to 8, with edges labeled E (Expr), T (Term), F (Factor), -, *, and id]

31 Example-1 (summary 3)
The goto() relationship (from the construction)
  State   Expr   Term   Factor   -     *     ident
  0       1      2      3                    4
  1
  2                              5
  3                                    6
  4
  5       7      2      3                    4
  6              8      3                    4
  7
  8

32 Filling in the ACTION and GOTO Tables
The algorithm
    for each set ccx ∈ CC
        for each item i ∈ ccx
            if i is [A → β•aγ, b] and goto(ccx, a) = cck, a ∈ T
                then ACTION[x, a] ← "shift k"
            else if i is [S' → S•, EOF]
                then ACTION[x, EOF] ← "accept"
            else if i is [A → β•, a]
                then ACTION[x, a] ← "reduce A → β"
        for each nt ∈ NT
            if goto(ccx, nt) = cck
                then GOTO[x, nt] ← k
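
Putting the pieces together, the sketch below builds the canonical collection for the Example-1 grammar and then fills ACTION and GOTO as the algorithm describes. It is a minimal illustration under stated assumptions: no ε-productions, hard-coded FIRST sets, productions numbered from 1 in grammar order, and no conflict detection (the grammar is taken to be LR(1)). All function and variable names are hypothetical.

```python
PRODUCTIONS = [                      # list index + 1 = production number
    ("Goal", ("Expr",)),
    ("Expr", ("Term", "-", "Expr")),
    ("Expr", ("Term",)),
    ("Term", ("Factor", "*", "Term")),
    ("Term", ("Factor",)),
    ("Factor", ("ident",)),
]
NONTERMINALS = ["Expr", "Term", "Factor"]   # Goal never appears in a rhs
TERMINALS = ["-", "*", "ident"]
FIRST = {"Expr": {"ident"}, "Term": {"ident"}, "Factor": {"ident"},
         "-": {"-"}, "*": {"*"}, "ident": {"ident"}}


def closure(items):
    items, work = set(items), list(items)
    while work:
        _, rhs, dot, look = work.pop()
        if dot == len(rhs):
            continue
        rest = rhs[dot + 1:]
        lookaheads = FIRST[rest[0]] if rest else {look}   # FIRST(δa)
        for lhs, body in PRODUCTIONS:
            if lhs != rhs[dot]:            # never true for a terminal
                continue
            for b in lookaheads:
                item = (lhs, body, 0, b)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)


def goto(state, symbol):
    return closure({(l, r, d + 1, a) for l, r, d, a in state
                    if d < len(r) and r[d] == symbol})


def build_collection():
    """Return the list of states and the recorded transitions."""
    states = [closure({("Goal", ("Expr",), 0, "EOF")})]
    edges = {}
    j = 0
    while j < len(states):                 # states grows as new sets appear
        for x in NONTERMINALS + TERMINALS:
            target = goto(states[j], x)
            if target:
                if target not in states:
                    states.append(target)
                edges[(j, x)] = states.index(target)
        j += 1
    return states, edges


def fill_tables(states, edges):
    action, goto_table = {}, {}
    for x, state in enumerate(states):
        for lhs, rhs, dot, look in state:
            if dot < len(rhs):
                if rhs[dot] in TERMINALS:
                    action[(x, rhs[dot])] = f"s{edges[(x, rhs[dot])]}"
            elif lhs == "Goal":
                action[(x, "EOF")] = "accept"
            else:
                action[(x, look)] = f"r{PRODUCTIONS.index((lhs, rhs)) + 1}"
        for nt in NONTERMINALS:
            if (x, nt) in edges:
                goto_table[(x, nt)] = edges[(x, nt)]
    return action, goto_table


states, edges = build_collection()
action, goto_table = fill_tables(states, edges)
print(len(states))            # 9
print(action[(3, "*")])       # s6
```

Because states are explored in creation order and symbols in the order Expr, Term, Factor, -, *, ident, the state numbers come out the same as cc0 through cc8 in the example.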

33 Example-1 (filling in the tables)
The algorithm produces the following table
  State   GOTO                      ACTION
          Expr   Term   Factor      -     *     ident   EOF
  0       1      2      3                       s4
  1                                                     accept
  2                                 s5                  r3
  3                                 r5    s6            r5
  4                                 r6    r6            r6
  5       7      2      3                       s4
  6              8      3                       s4
  7                                                     r2
  8                                 r4                  r4

34 Example-2 (grammar & sets)
Simplified, right-recursive expression grammar
  S → E
  E → T + E
    | T
  T → id
FIRST sets
  Symbol   FIRST
  S        { id }
  E        { id }
  T        { id }

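43 Example-2 (building the collection)
Grammar: S → E,  E → T + E | T,  T → id
States
  S0 : { [S → •E, \$], [E → •T + E, \$], [E → •T, \$], [T → •id, +], [T → •id, \$] }
  S1 : { [S → E•, \$] }
  S2 : { [E → T• + E, \$], [E → T•, \$] }
  S3 : { [T → id•, +], [T → id•, \$] }
  S4 : { [E → T + •E, \$], [E → •T + E, \$], [E → •T, \$], [T → •id, +], [T → •id, \$] }
  S5 : { [E → T + E•, \$] }
Transitions
  S0 --E--> S1     S0 --T--> S2     S0 --id--> S3
  S2 --+--> S4
  S4 --E--> S5     S4 --T--> S2     S4 --id--> S3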

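44 Example-2 (filling in the tables)
Applying the table-filling algorithm to states S0 through S5 gives the following table (reduce entries are written with the production itself)
  State   ACTION                                            GOTO
          id    +                \$                         E     T
  0       s3                                                1     2
  1                              accept
  2             s4               reduce E → T
  3             reduce T → id    reduce T → id
  4       s3                                                5     2
  5                              reduce E → T + E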

45 Left Recursion vs. Right Recursion
Right recursion
- Required for termination in top-down parsers
- Uses (on average) more stack space
- Produces right-associative operators
Left recursion
- Works fine in bottom-up parsers
- Limits required stack space
- Produces left-associative operators
Rule of thumb
- Left recursion for bottom-up parsers
- Right recursion for top-down parsers
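
The associativity point matters for operators such as subtraction. As a small worked example (not a parser), the Python below evaluates 10 - 4 - 3 with the grouping a left-recursive rule Expr → Expr - Term implies and with the grouping a right-recursive rule Expr → Term - Expr implies. The function names are illustrative.

```python
from functools import reduce


def eval_left(operands):
    """Grouping from Expr -> Expr - Term: ((a - b) - c)."""
    return reduce(lambda acc, x: acc - x, operands)


def eval_right(operands):
    """Grouping from Expr -> Term - Expr: (a - (b - c))."""
    head, *tail = operands
    return head - eval_right(tail) if tail else head


print(eval_left([10, 4, 3]))   # 3
print(eval_right([10, 4, 3]))  # 9
```

The two groupings disagree, which is why a right-recursive grammar such as the one in Example-1 needs extra care if subtraction is meant to associate to the left.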

