A clarification on terminology:
    Recognizer: accepts or rejects strings in a language
    Parser: recognizes strings and also generates parse trees (imminent topic)
    Assignment 3: building a recognizer for the Lake expression language

CS 202-9 assignment 3 1

A general comment:
    Many details in PL implementation (lots of room for ambiguity)
    No one will suffer if they forget to dot an i
    10 years from now:
        Will you remember whether an octal has a leading \?
        Will you remember what top-down vs. bottom-up parsing is?

How does a parser recover from a syntax error? Three choices:
    1. Insert a token: guess what token the user forgot
    2. Delete a token: guess what extra token the user inserted
    3. Replace a token: guess what token the user really meant
    Can mix and match all 3 approaches
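The three repair choices can be pictured as operations on the token stream. The following is a minimal sketch, not anything from the course code: the class `TokenRepairs`, its method names, and the use of plain strings as tokens are all assumptions made for illustration. A real parser would also have to guess which position and which token to use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers: each returns a repaired copy of the token list. */
class TokenRepairs {
    /** Choice 1: insert the token we guess the user forgot. */
    static List<String> insert(List<String> tokens, int at, String guess) {
        List<String> fixed = new ArrayList<>(tokens);
        fixed.add(at, guess);
        return fixed;
    }

    /** Choice 2: delete the token we guess the user added by mistake. */
    static List<String> delete(List<String> tokens, int at) {
        List<String> fixed = new ArrayList<>(tokens);
        fixed.remove(at); // removes by index
        return fixed;
    }

    /** Choice 3: replace a token with the one we guess the user meant. */
    static List<String> replace(List<String> tokens, int at, String guess) {
        List<String> fixed = new ArrayList<>(tokens);
        fixed.set(at, guess);
        return fixed;
    }
}
```

Because each helper returns a new list, the repairs can be chained, which corresponds to mixing and matching the approaches.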
Common LL(1) strategy: skip all tokens not in the follow set

    parsefact(token nexttoken) {
        switch (nexttoken) {
        case ID:
            break;
        case NUM:
            break;
        case LPAREN:
            parseexp(yylex());
            consume(RPAREN);
            break;
        default:
            error("Expected an ID, NUM or (");
            scanto(follow(factor));   /* skip ahead to a token in FOLLOW(factor) */
            break;
        }
    }

Common strategy: insert a token if looking for a specific token

    parsefact(token nexttoken) {
        switch (nexttoken) {
        case ID:
            break;
        case NUM:
            break;
        case LPAREN:
            parseexp(yylex());
            consume_or_insert(RPAREN);   /* pretend the ) was there if it is missing */
            break;
        default:
            error("Expected an ID, NUM or (");
            scanto(follow(factor));
            break;
        }
    }

Common strategy: substitute in special cases

    parsefact(token nexttoken) {
        switch (nexttoken) {
        case ID:
            break;
        case NUM:
            break;
        case LPAREN:
            parseexp(yylex());
            consume(RPAREN);
            break;
        case FOR:
        case IF:
            error("Expected an ID");   /* treat the keyword as if it were an ID */
            break;
        default:
            error("Expected an ID, NUM or (");
            scanto(follow(factor));
            break;
        }
    }
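The skip and insert strategies above can be combined in a small runnable recognizer. This is only a sketch under assumptions: the toy grammar (exp ::= factor (PLUS factor)*, factor ::= ID | NUM | LPAREN exp RPAREN), the class `RecoveringRecognizer`, the stray `ASSIGN` token, and the hard-coded follow set are all invented for illustration and are not the Lake grammar.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical recognizer for a toy grammar, with panic-mode recovery. */
class RecoveringRecognizer {
    // ASSIGN is not used by the grammar; it stands in for any stray token.
    enum Tok { ID, NUM, PLUS, ASSIGN, LPAREN, RPAREN, EOF }

    // FOLLOW(factor) for this toy grammar (an assumption of the sketch).
    private static final Set<Tok> FOLLOW_FACTOR =
            EnumSet.of(Tok.PLUS, Tok.RPAREN, Tok.EOF);

    private final List<Tok> input;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    RecoveringRecognizer(List<Tok> input) { this.input = input; }

    private Tok peek() { return pos < input.size() ? input.get(pos) : Tok.EOF; }
    private void advance() { if (pos < input.size()) pos++; }

    /** Skip tokens until one in the synchronizing set (EOF always stops it). */
    private void scanTo(Set<Tok> sync) {
        while (!sync.contains(peek())) advance();
    }

    /** Consume the expected token, or report it and carry on as if present. */
    private void consumeOrInsert(Tok expected) {
        if (peek() == expected) advance();
        else errors.add("Inserted missing " + expected);
    }

    void parseExp() {
        parseFactor();
        while (peek() == Tok.PLUS) { advance(); parseFactor(); }
    }

    void parseFactor() {
        switch (peek()) {
            case ID:
            case NUM:
                advance();
                break;
            case LPAREN:
                advance();
                parseExp();
                consumeOrInsert(Tok.RPAREN);
                break;
            default:
                errors.add("Expected an ID, NUM or ( but found " + peek());
                scanTo(FOLLOW_FACTOR);
                break;
        }
    }

    /** Runs the recognizer and returns every error message it collected. */
    static List<String> check(List<Tok> tokens) {
        RecoveringRecognizer r = new RecoveringRecognizer(tokens);
        r.parseExp();
        if (r.peek() != Tok.EOF) r.errors.add("Unexpected " + r.peek());
        return r.errors;
    }
}
```

Collecting messages in a list rather than stopping at the first one is what lets the recognizer keep going and report several errors in one pass.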
A warning on the substitution strategy:
    If the original token was in the follow set (indicating omitted tokens), the insert strategy may be better
    Substitution may introduce cascading errors (in the next production)

Basically the same approach to error recovery for LR(1):
    Primary approach is to skip tokens
    Look for synchronizing tokens: things like ) ; else
    Selected insertion or substitution can be added

Java CUP (and bison/yacc):
    Compiler writer defines error recovery
    Typically use the error non-terminal
        Pre-defined non-terminal
        Matched only when an error is reported
    Example:
        expr ::= LPAREN error RPAREN
When a syntax error is detected:
    Parser pops off states until a state with error recovery is found
    Shifts on the error non-terminal
    Discards tokens until a synchronizing token is found
    Then resumes normal parsing

Error example (production: factor ::= ( error ), input: ( a + + )):
    Stack, top first: + factor (
    Action: pop states

Error example (production: factor ::= ( error ), input: ( a + + )):
    Stack, top first: error (
    Action: shift error
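Error example (production: factor ::= ( error ), input: ( a + + )):
    Stack, top first: error (
    Action: skip token +

Error example (production: factor ::= ( error ), input: ( a + + )):
    Stack, top first: ) error (
    Action: push )

Error example (production: factor ::= ( error ), input: ( a + + )):
    Stack, top first: factor
    Action: reduce factor ::= ( error )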
For other recovery strategies, add "wrong" productions:
    To replace a token:
        a_expr ::= access ASSIGN expr SEMI;
        a_expr ::= access EQ expr SEMI;
    To insert a token:
        access ::= access LBRACK expr RBRACK;
        access ::= access LBRACK RBRACK;
    Must add explicit error messages for these

Strategies often fail to recognize the real error. Consider:
    fi (b) x = 1; else x = 2;
Will report 2 errors:
    missing ; after fi(b)
    unexpected else
But the real problem is a misspelled if

Some research in global error recovery:
    Try various strategies in various combinations, including looking back
    Compare the results of the different strategies, pick the best solution (usually the shortest fix)
    Interesting theoretically, currently too inefficient in practice
So far we are working on a recognizer:
    It only silently accepts or reports an error message
    Not very useful for compilation
    Need to do work with each derivation step
    Need to add parser actions

Every parser generator gives an option to perform actions on each reduction:
    java_cup is much like yacc or bison
    Insert actions in the grammar as code
    java_cup actions go in {: :}

Cup actions can go anywhere in the grammar:
    Unlike lex
    Usually at the end
    Any code can go in actions
        Usually not return; that would end the parse!

    expr ::= expr PLUS factor
        {: System.out.println("x+f"); :};
Printing x+f is still not very useful:
    Want to get values from the rhs elements
    Can name and reference each piece
    Use :id to name a piece; refer to it as id in the action

    expr ::= INT_CONST:l PLUS INT_CONST:r
        {: System.out.println("sum = " + (l.intValue() + r.intValue())); :};

Can use the value of any rhs element:
    Token or non-terminal
    What is its type?
    Must declare types when declaring terminals and non-terminals

    terminal Integer INT_CONST;
    non terminal Integer expr;

For tokens:
    Value is the value assigned during lexing
    Third argument to LakeSym()
What is the value of a non-terminal?
    Set by each production rule
    Assign to RESULT (implicitly labels the lhs)

    expr ::= expr:l PLUS expr:r
        {: RESULT = new Integer(l.intValue() + r.intValue()); :};
Can now build a working calculator:

    terminal Integer NUMBER;
    terminal PLUS, MINUS, TIMES, DIVIDE;
    terminal LPAREN, RPAREN;
    non terminal Integer expr, term, factor;
    non terminal calculation;

    calculation ::= expr:e
        {: System.out.println(e); :};
    expr ::= expr:l PLUS term:r
        {: RESULT = new Integer(l.intValue() + r.intValue()); :};
    expr ::= expr:l MINUS term:r
        {: RESULT = new Integer(l.intValue() - r.intValue()); :};
    expr ::= term:t
        {: RESULT = t; :};

    term ::= term:l TIMES factor:r
        {: RESULT = new Integer(l.intValue() * r.intValue()); :};
    term ::= term:l DIVIDE factor:r
        {: RESULT = new Integer(l.intValue() / r.intValue()); :};
    term ::= factor:f
        {: RESULT = f; :};
    factor ::= NUMBER:n
        {: RESULT = n; :};
    factor ::= LPAREN expr:e RPAREN
        {: RESULT = e; :};
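To see what these actions compute without running java_cup, the same expr/term/factor layering can be sketched by hand. Note the swap: this is a hand-written recursive-descent evaluator, not the generated LR parser, and the left-recursive rules are replaced by loops that keep the same left-to-right grouping. The class name `MiniCalc` and the character-level scanning are assumptions of the sketch.

```java
/** Hypothetical hand-written evaluator mirroring the calculator's actions. */
class MiniCalc {
    private final String src;
    private int pos = 0;

    private MiniCalc(String src) { this.src = src; }

    /** Evaluates an expression over integers, + - * / and parentheses. */
    static int eval(String text) {
        MiniCalc c = new MiniCalc(text.replace(" ", ""));
        int value = c.expr();
        if (c.pos != c.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + c.pos);
        }
        return value;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // expr ::= expr PLUS term | expr MINUS term | term
    private int expr() {
        int result = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int r = term();
            result = (op == '+') ? result + r : result - r;
        }
        return result;
    }

    // term ::= term TIMES factor | term DIVIDE factor | factor
    private int term() {
        int result = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int r = factor();
            result = (op == '*') ? result * r : result / r;
        }
        return result;
    }

    // factor ::= NUMBER | LPAREN expr RPAREN
    private int factor() {
        if (peek() == '(') {
            pos++;
            int e = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ) at " + pos);
            }
            pos++;
            return e;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }
}
```

Each method's return value plays the role of RESULT for its non-terminal. Putting term below expr is what makes multiplication bind tighter than addition, just as in the grammar.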
This was an example of an interpreter:
    Behavior of the source language is defined in terms of another language
    A compiler transforms source code into machine language; no mediation

Most actions are at the end of a production, but they need not be
Sometimes embedded actions are useful:

    block ::= LBRACE
        {: startblock(); :}
        declarations statements RBRACE
        {: endblock(); :};

startblock will run before any declarations are reduced

What if there are two rules for block?

    block ::= LBRACE
        {: startblock(); :}
        declarations statements RBRACE
        {: endblock(); :};
    block ::= LBRACE RBRACE;

    Parser reads the token past LBRACE before calling startblock (it needs the lookahead to choose a rule)
    Can cause unexpected behavior if the action affects the next token
        More problems in yacc/lex
Each action is assumed distinct on the parse stack:

    block ::= LBRACE
        {: startblock(); :}
        declarations statements RBRACE
        {: endblock(); :};
    block ::= LBRACE
        {: startblock(); :}
        declarations RBRACE
        {: endblock(); :};

Introduces a conflict: don't know which action to run until RBRACE

Fix with a common rule:

    block ::= LBRACE startblock
        declarations statements RBRACE
        {: endblock(); :};
    block ::= LBRACE startblock
        declarations RBRACE
        {: endblock(); :};
    startblock ::= {: startblock(); :};

We have seen many parse trees thus far. Most of them:
    have tokens on the leaves
    have non-terminals on the interior nodes
Reading the leaves gives the entire program as a sequence of tokens
Called a concrete parse tree
Assume the grammar:

    expr ::= expr PLUS mult_expr;
    expr ::= expr MINUS mult_expr;
    expr ::= mult_expr;
    mult_expr ::= mult_expr TIMES factor;
    mult_expr ::= mult_expr DIV factor;
    mult_expr ::= factor;
    factor ::= ID;
    factor ::= NUMBER;
    factor ::= LPAREN expr RPAREN;

What is the concrete parse tree for a * (b - c)?

Concrete parse tree for a * (b - c) (children indented under their parent):

    expr
        mult_expr
            mult_expr
                factor
                    ID (a)
            *
            factor
                (
                expr
                    expr
                        mult_expr
                            factor
                                ID (b)
                    -
                    mult_expr
                        factor
                            ID (c)
                )
Concrete parse trees are messy:
    Include specifics of the language
    Include specifics of the grammar
Want the semantics of the program
    Leave behind all the clutter
    Called an abstract parse tree
    Sometimes just called "abstract trees"

Abstract parse tree for a * (b - c) (children indented under their parent):

    *
        ID (a)
        -
            ID (b)
            ID (c)

Obviously much simpler
    And much easier to work with
Abstracted away the grammar
    Don't care about mult_expr
Abstracted away the language
    Don't care about infix/prefix/RPN/...
Compilers use abstract parse trees:
    Assume abstract parse trees from now on
    Unless specified otherwise
But remember that ambiguity is defined on concrete parse trees:
    If distinct concrete parse trees exist for the same sentence, the grammar is ambiguous

Create parse trees with actions
Type of the non-terminals is ParseTree:

    non terminal ParseTree expr, stmt;

    stmt ::= expr:e SEMI
        {: RESULT = e; :};
    expr ::= expr:l PLUS term:r
        {: RESULT = new BinParseTree(l, r, BinParseTree.Plus); :};
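The slides name ParseTree and BinParseTree but do not show their definitions, so the following is only a guess at what such classes might look like. The leaf class `IdParseTree`, the `show()` method, the integer operator codes other than the name `Plus`, and the `AstDemo` holder are all assumptions. The constructor argument order (left, right, operator) follows the action above.

```java
/** Hypothetical base class for abstract parse tree nodes. */
abstract class ParseTree {
    /** Renders the tree in prefix form, for inspection only. */
    abstract String show();
}

/** Hypothetical leaf node for an identifier. */
class IdParseTree extends ParseTree {
    final String name;

    IdParseTree(String name) { this.name = name; }

    @Override
    String show() { return name; }
}

/** Binary operator node; operator codes are an assumption of the sketch. */
class BinParseTree extends ParseTree {
    static final int Plus = 0, Minus = 1, Times = 2, Div = 3;
    private static final String[] SYMBOLS = {"+", "-", "*", "/"};

    final ParseTree left;
    final ParseTree right;
    final int op;

    BinParseTree(ParseTree left, ParseTree right, int op) {
        this.left = left;
        this.right = right;
        this.op = op;
    }

    @Override
    String show() {
        return "(" + SYMBOLS[op] + " " + left.show() + " " + right.show() + ")";
    }
}

class AstDemo {
    /** Builds the abstract parse tree for a * (b - c) by hand. */
    static ParseTree example() {
        ParseTree difference = new BinParseTree(
                new IdParseTree("b"), new IdParseTree("c"), BinParseTree.Minus);
        return new BinParseTree(
                new IdParseTree("a"), difference, BinParseTree.Times);
    }
}
```

Calling `AstDemo.example().show()` yields `(* a (- b c))`. The parentheses and the mult_expr and factor layers of the concrete tree leave no trace in the node structure.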