CS415 Compilers. LR Parsing & Error Recovery

CS415 Compilers LR Parsing & Error Recovery These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Review: LR(k) items The LR(1) table construction algorithm uses LR(1) items to represent valid configurations of an LR(1) parser An LR(k) item is a pair [P, δ], where P is a production A β with a at some position in the rhs δ is a lookahead string of length k (words or EOF) The in an item indicates the position of the top of the stack LR(1): [A βγ,a] means that the input seen so far is consistent with the use of A βγ immediately after the symbol on top of the stack [A β γ,a] means that the input seen so far is consistent with the use of A βγ at this point in the parse, and that the parser has already recognized β. [A βγ,a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A. Lecture 13 2

Review - Computing Closures Closure(s) adds all the items implied by items already in s Any item [A β Bδ,a] implies [B τ,x] for each production with B on the lhs, and each x FIRST(δa) for LR(1) item The algorithm Closure( s ) while ( s is still changing ) items [A β Bδ,a] s productions B τ P b FIRST(δa) // δ might be ε if [B τ,b] s then add [B τ,b] to s Ø Classic fixed-point method Ø Halts because s ITEMS Closure fills out a state 3

Review - Computing Gotos Goto(s,x) computes the state that the parser would reach if it recognized an x while in state s Goto( { [A β Xδ,a] }, X ) produces [A βx δ,a] (easy part) Should also includes closure( [A βx δ,a] ) (fill out the state) The algorithm Goto( s, X ) new Ø items [A β Xδ,a] s new new [A βx δ,a] return closure(new) Ø Not a fixed-point method! Ø Straightforward computation Ø Uses closure ( ) Goto() moves forward 4

Review - Building the Canonical Collection Start from s 0 = closure( [S S,EOF ] ) Repeatedly construct new states, until all are found The algorithm cc 0 closure ( [S S, EOF] ) CC { cc 0 } while ( new sets are still being added to CC) for each unmarked set cc j CC mark cc j as processed for each x following a in an item in cc j temp goto(cc j, x) if temp CC then CC CC { temp } record transitions from cc j to temp on x Ø Fixed-point computation (worklist version) Ø Loop adds to CC Ø CC 2 ITEMS, so CC is finite 5

Review LR(1) Table Construction High-level overview 1 Build the canonical collection of sets of LR(1) Items, I a Begin in an appropriate state, s 0 Assume: S S, and S is unique start symbol that does not occur on any RHS of a production (extended CFG - ECFG) [S S,EOF], along with any equivalent items Derive equivalent items as closure( s 0 ) b Repeatedly compute, for each s k, and each X, goto(s k,x) If the set is not already in the collection, add it Record all the transitions created by goto( ) This eventually reaches a fixed point 2 Fill in the table from the collection of sets of LR(1) items The canonical collection completely encodes the transition diagram for the handle-finding DFA 6

Review: Example (building the collection) 1: Goal Expr 2: Expr Term Expr 3: Expr Term 4: Term Factor * Term 5: Term Factor 6: Factor ident Initialization Step Symbol FIRST Goal { ident } Expr { ident } Term { ident } Factor { ident } { } * { * } ident { ident } s 0 closure( { [Goal Expr, EOF] } ) = {[Goal Expr, EOF], [Expr à Term Expr, EOF], [Expr à Term, EOF], [Term à Factor * Term, -], [Term à Factor, -], [Term à Factor * Term, EOF], [Term à Factor, EOF], [Factor à ident, *], [Factor à ident, -], [Factor à ident, EOF]} S { S 0 } 7

Example (building the collection) s 0 closure( { [Goal Expr, EOF] } ) { [Goal Expr, EOF], [Expr Term Expr, EOF], [Expr Term, EOF], [Term Factor * Term, EOF], [Term Factor * Term, ], [Term Factor, EOF], [Term Factor, ], [Factor ident, EOF], [Factor ident, ], [Factor ident, *] } Iteration 1 s 1 goto(s 0, Expr) = { [Goal Expr, EOF] } s 2 goto(s 0, Term) = { [Expr Term Expr, EOF], [Expr Term, EOF] } s 3 goto(s 0, Factor) = { [Term Factor * Term, EOF],[Term Factor * Term, ], [Term Factor, EOF], [Term Factor, ] } s 4 goto(s 0, ident ) = { [Factor ident, EOF],[Factor ident, ], [Factor ident, *] } 9

Example (building the collection) Iteration 1 s 1 goto(s 0, Expr) = { [Goal Expr, EOF] } s 2 goto(s 0, Term) = { [Expr Term Expr, EOF], [Expr Term, EOF] } s 3 goto(s 0, Factor) = { [Term Factor * Term, EOF],[Term Factor * Term, ], [Term Factor, EOF], [Term Factor, ] } s 4 goto(s 0, ident ) = { [Factor ident, EOF],[Factor ident, ], [Factor ident, *] } Iteration 2 s 5 goto(s 2, ) = { [Expr Term Expr, EOF], [Expr Term Expr, EOF], [Expr Term, EOF], [Term Factor * Term, ], [Term Factor, ], [Term Factor * Term, EOF], [Term Factor, EOF], [Factor ident, *], [Factor ident, ], [Factor ident, EOF] } s 6 goto(s 3, * ) = see next page 11

Example (building the collection) Iteration 2 s 5 goto(s 2, ) = { [Expr Term Expr, EOF], [Expr Term Expr, EOF], [Expr Term, EOF], [Term Factor * Term, ], [Term Factor * Term, EOF], [Term Factor, ], [Term Factor, EOF], [Factor ident, *], [Factor ident, ], [Factor ident, EOF] } s 6 goto(s 3, * ) = { [Term Factor * Term, EOF], [Term Factor * Term, ], [Term Factor * Term, EOF], [Term Factor * Term, ], [Term Factor, EOF], [Term Factor, ], [Factor ident, EOF], [Factor ident, ], [Factor ident, *] } Iteration 3 s 7 goto(s 5, Expr ) = {? } s 8 goto(s 6, Term ) = {? } s 2 goto(s 5, Term), s 3 goto(s 5, factor), s 4 goto(s 5, ident), s 3 goto(s 6, Factor), s 4 goto(s 6, ident) 12

Example (building the collection) Iteration 2 s 5 goto(s 2, ) = { [Expr Term Expr, EOF], [Expr Term Expr, EOF], [Expr Term, EOF], [Term Factor * Term, ], [Term Factor * Term, EOF], [Term Factor, ], [Term Factor, EOF], [Factor ident, *], [Factor ident, ], [Factor ident, EOF] } s 6 goto(s 3, * ) = { [Term Factor * Term, EOF], [Term Factor * Term, ], [Term Factor * Term, EOF], [Term Factor * Term, ], [Term Factor, EOF], [Term Factor, ], [Factor ident, EOF], [Factor ident, ], [Factor ident, *] } Iteration 3 s 7 goto(s 5, Expr ) = { [Expr Term Expr, EOF] } s 8 goto(s 6, Term ) = { [Term Factor * Term, EOF], [Term Factor * Term, ] } 13

Example (Summary) S 0 : { [Goal Expr, EOF], [Expr Term Expr, EOF], [Expr Term, EOF], [Term Factor * Term, EOF], [Term Factor * Term, ], [Term Factor, EOF], [Term Factor, ], [Factor ident, EOF], [Factor ident, ], [Factor ident, *] } S 1 : { [Goal Expr, EOF] } S 2 : { [Expr Term Expr, EOF], [Expr Term, EOF] } S 3 : { [Term Factor * Term, EOF],[Term Factor * Term, ], [Term Factor, EOF], [Term Factor, ] } S 4 : { [Factor ident, EOF],[Factor ident, ], [Factor ident, *] } S 5 : { [Expr Term Expr, EOF], [Expr Term Expr, EOF], [Expr Term, EOF], [Term Factor * Term, ], [Term Factor, ], [Term Factor * Term, EOF], [Term Factor, EOF], [Factor ident, *], [Factor ident, ], [Factor ident, EOF] } 14

Example (Summary) S 6 : { [Term Factor * Term, EOF], [Term Factor * Term, ], [Term Factor * Term, EOF], [Term Factor * Term, ], [Term Factor, EOF], [Term Factor, ], [Factor ident, EOF], [Factor ident, ], [Factor ident, *] } S 7 : { [Expr Term Expr, EOF] } S 8 : { [Term Factor * Term, EOF], [Term Factor * Term, ] } 15

Example (DFA) term s 1 s 2 s 7 term - expr expr ident ident s 0 s 4 s 5 term ident factor s 6 * s 3 s 8 factor factor The State Transition Table State Ident - * Expr Term Factor 0 4 1 2 3 1 2 5 3 6 4 5 4 7 2 3 6 4 8 3 7 8 16

Filling in the ACTION and GOTO Tables The algorithm set s x S item i s x if i is [A β ad,b] and goto(s x,a) = s k, a T then ACTION[x,a] shift k else if i is [S S,EOF] then ACTION[x, EOF] accept else if i is [A β,a] then ACTION[x,a] reduce A β n NT if goto(s x,n) = s k then GOTO[x,n] k Many items generate no table entry 18

Example (Filling in the tables) The algorithm produces LR(1) parse table ACTION GOTO Ident - * EOF Expr Term Factor 0 s 4 1 2 3 1 acc 2 s 5 r 3 3 r 5 s 6 r 5 4 r 6 r 6 r 6 5 s 4 7 2 3 6 s 4 8 3 7 r 2 8 r 4 r 4 Plugs into the skeleton LR(1) parser Remember the state transition table? State Ident - * Expr Term Factor 0 4 1 2 3 1 2 5 3 6 4 5 4 7 2 3 6 4 8 3 7 8 19

An Example for Table Filling Practice A Parse Table Filling Example For pdf lecture notes readers, see attached LR(1) parse table example file Lecture 15 20

What can go wrong? What if set s contains [A β aγ,b] and [B β,a]? First item generates shift, second generates reduce Both define ACTION[s,a] cannot do both actions This is a fundamental ambiguity, called a shift/reduce error Modify the grammar to eliminate it (if-then-else) What if set s contains [A γ, a] and [B γ, a]? Each generates reduce, but with a different production Both define ACTION[s,a] cannot do both reductions This fundamental ambiguity is called a reduce/reduce error Modify the grammar to eliminate it In either case, the grammar is not LR(1) EaC includes a worked example 21

Shrinking the Tables Three options: Combine terminals such as number & identifier, + & -, * & / Directly removes a column, may remove a row For expression grammar, 198 (vs. 384) table entries Combine rows or columns Implement identical rows once & remap states Requires extra indirection on each lookup Use separate mapping for ACTION & for GOTO Use another construction algorithm Both LALR(1) and SLR(1) produce smaller tables Implementations are readily available (table compression) 22

LR(0) versus SLR(1) versus LR(1) LR(0)? -- set of LR(0) items as states LR(1)? -- set of LR(1) items as states, different states compared to LR(0) SLR(1)? -- LR(0) items and canonical sets, same as LR(0) SLR(1): add FOLLOW(A) to each LR(0) item [A γ ] as its second component: [A γ, a], a FOLLOW(A) 23

LR(0) versus SLR(1) versus LR(1) Example: S S S S ; a a LR(0)? LR(1)? SLR(1)? Lecture 15 24

LR(0) versus LR(1) versus SLR(1) LR(0) States s0 = Closure({[S.S]}) = {[S ->.S], [S ->.S; a], [S ->.a] } s1 = Closure( GoTo (s0, S)) = {[S S. ], [S S.; a] } s2 = Closure( GoTo (s0, a)) = {[S a.]} s3 = Closure( GoTo (s1, ;)) = {[S S;. a]} LR(1) States s4 = Closure( GoTo (s3, a)) = {[S S;a.] } s0 = Closure({[S.S,eof]}) = {[S ->.S,eof], [S ->.S; a,eof], [S ->.a,;] } s1 = Closure( GoTo (s0, S)) = {[S S. eof], [S S.; a,eof] } s2 = Closure( GoTo (s0, a)) = {[S a.,;]} s3 = Closure( GoTo (s1, ;)) = {[S S;. a,eof]} s4 = Closure( GoTo (s3, a)) = {[S S;a., eof] } Grammar is not LR(0), but LR(1) and SLR(1) Lecture 15 25

LALR(1) versus LR(1) LALR(1)? LR(1) items, State -> Grouped LR(1) states LALR(1): Merge two sets of LR(1) items (states), if they have the same core. core of set of LR(1) items: set of LR(0) items derived by ignoring the lookahead symbols FACT: collapsing LR(1) states into LALR(1) states cannot introduce shift/reduce conflicts 26

LALR(1) versus LR(1) s0 = Closure({[S.S, eof]}) s1 = Closure( GoTo (s0, a)) = {[S a. Ad, eof], [S a. Be, eof], [A.c, d], [ B.c, e]} s2 = Closure( GoTo (s0, b)) = {[S b. Ae, eof], [S b. Bd, eof], [A.c, e], [B.c, d]} s3 = Closure( GoTo (s1, c)) = {[A c., d], [B c., e]} s4 = Closure( GoTo (s2, c)) = {[A c., e], [B c., d]} There are other states that are not listed here! Grammar is LR(1), but not LALR(1), since collapsing s3 and s4 (same core) will introduce reduce-reduce conflict. Lecture 15 27

Hierarchy of Context-Free Grammars Floyd-Evans Parsable Context-free grammars Unambiguous CFGs Operator Precedence Operator precedence includes some ambiguous grammars LL(1) is a subset of SLR(1) LR(k) LR(1) LALR(1) LL(k) The inclusion hierarchy for context-free grammars SLR(1) LR(0) LL(1) Ref Book: Michael Sipser, Introduction to the Theory of Computation, 3 rd Edition Lecture 16 28

Error Recovery in Shift-Reduce Parsers The problem: parser encounters an invalid token Goal: Want to parse the rest of the file Basic idea (panic mode): Assume something went wrong while trying to find handle for nonterminal A Pretend handle for A has been found; pop handle, skip over input to find terminal that can follow A Restarting the parser (panic mode): find a restartable state on the stack (has transition for nonterminal A) move to a consistent place in the input (token that can follow A) perform (error) reduction (for nonterminal A) print an informative message Lecture 15 29

Error Recovery in YACC Yacc s (bison s) error mechanism (note: version dependent!) designated token error used in error productions of the form A error α // basic case α specifies synchronization points When error is discovered pops stack until it finds state where it can shift the error token resumes parsing to match α special cases: α = w, where w is string of terminals: skip input until w has been read α = ε : skip input until state transition on input token is defined error productions can have actions Lecture 15 30

Error Recovery in YACC cmpdstmt: BEG stmt_list END stmt_list : stmt stmt_list ; stmt error { yyerror( \n***error: illegal statement\n );} This should throw out the erroneous statement synchronize at ; or end (implicit: α = ε) writes message ***Error: illegal statement to stderr Example: begin a & 5 hello ; a := 3 end resume parsing ***Error: illegal statement Lecture 15 31

Project #2 (see lex & yacc, Levine et al., O Reilly) You do have to (slightly) change the scanner (scan.l) How to specify and use attributes in YACC? Define attributes as types in attr.h typedef struct info_node {int a; int b} infonode; Include type attribute name in %union in parse.y %union {tokentype token; infonode myinfo; } Assign attributes in parse.y to Terminals: %token <token> ID ICONST Non-terminals: %type <myinfo> block variables procdecls cmpdstmt Accessing attribute values in parse.y use $$, $1, $2 etc. notation: block : variables procdecls {$2.b = $1.b + 1;} cmpdstmt { $$.a = $1.a + $2.a + $3.b;} Lecture 16 32

YACC : parse.y parse.y : %{ #include <stdio.h> #include "attr.h" int yylex(); void yyerror(char * s); #include "symtab.h" %} %union {tokentype token; } Will be included verbatim in parse.tab.c List and assign attributes %token PROG PERIOD PROC VAR ARRAY RANGE OF %token INT REAL DOUBLE WRITELN THEN ELSE IF %token BEG END ASG NOT %token EQ NEQ LT LEQ GEQ GT OR EXOR AND DIV NOT %token <token> ID CCONST ICONST RCONST %start program %% program : PROG ID ';' block PERIOD { } ; block : BEG ID ASG ICONST END { } ; %% void yyerror(char* s) { fprintf(stderr,"%s\n",s); } int main() { printf("1\t"); yyparse(); return 1; } Rules with semantic actions Main program and helper functions; may contain initialization code of global structures. Will be included verbatim in parse.tab.c Lecture 16 33

Project #2 : Things to do Learn/Review the C programming language Add error productions (syntax errors) Define and assign attributes to non-terminals Implement single-level symbol table Perform type checking and produce required error messages; note: actions may occur at any location on right-hand side (implicit use of marker productions) Lecture 16 34

Next two classes Ad-hoc, syntax directed translation schemes,type checking Read EaC: Chapters 4.1-4.4 Lecture 15 35