Context free grammars and predictive parsing
Programming Language Concepts and Implementation
Fall 2011, Lecture 6

Mandatory exercise 5
Only 8/15 submitted! Why?
Merge:
public static List<T> Merge<T>(List<T> first, List<T> second) where T : IComparable<T> {
  List<T> result = new List<T>();
  result.AddRange(first);
  result.AddRange(second);
  result.Sort();
  return result;
}
Complexity?
This does the right thing, but what did we learn?
Exercises are important! The non-mandatory ones too.
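The exercise text is not reproduced here. Assuming that both input lists are already sorted, which is what the name suggests, the merge can run in linear time, O(n + m). AddRange followed by Sort costs O((n + m) log(n + m)). The Java sketch below shows the linear version. The class name LinearMerge is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Linear-time merge of two lists that are ASSUMED to be sorted already.
class LinearMerge {
    static <T extends Comparable<T>> List<T> merge(List<T> first, List<T> second) {
        List<T> result = new ArrayList<>(first.size() + second.size());
        int i = 0, j = 0;
        // Repeatedly take the smaller head element; every element is copied exactly once.
        while (i < first.size() && j < second.size()) {
            if (first.get(i).compareTo(second.get(j)) <= 0) {
                result.add(first.get(i++));   // <= keeps equal elements in input order
            } else {
                result.add(second.get(j++));
            }
        }
        // At most one of these loops does any work: copy the leftover tail.
        while (i < first.size()) result.add(first.get(i++));
        while (j < second.size()) result.add(second.get(j++));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(1, 3, 5), List.of(2, 4, 6, 7)));
    }
}
```

Because each comparison moves one element into the result, the loop performs at most n + m - 1 comparisons.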
Overview
Context free grammars
- Describing programming language syntax
- Ambiguities and eliminating these
The parser generator coco/r
Predictive parsing: under the hood of coco/r
Next week: LR parsing

Context free grammars
An example and a derivation
Expr = Expr + Expr | Expr * Expr | ( Expr ) | num
Expr => Expr + Expr => Expr + Expr * Expr => ... => 2 + 3*4
Think of it as regular expressions + recursion.
Terminology:
- 1 nonterminal: Expr
- 5 terminals (tokens): +, *, (, ), num
- 4 productions (right hand sides)
- Terminals and nonterminals are collectively called symbols
Another example: straight line programs (from the book)
S = S ; S | id := E | print ( L )
E = id | num | E + E | ( S , E )
L = E | L , E

S
=> S ; S
=> S ; id := E
=> id := E ; id := E
=> id := num ; id := E
=> id := num ; id := E + E
=> id := num ; id := E + ( S , E )
=> id := num ; id := id + ( S , E )
=> id := num ; id := id + ( id := E , E )
=> id := num ; id := id + ( id := E + E , E )
=> id := num ; id := id + ( id := E + E , id )
=> id := num ; id := id + ( id := num + E , id )
=> id := num ; id := id + ( id := num + num , id )

Official definition
A context free grammar consists of
- A finite set of nonterminals
- A finite set of terminals
- A finite set of productions
A production consists of
- A nonterminal (called the left hand side)
- A string of symbols (terminals or nonterminals)
This notation is called Backus-Naur Form (BNF).
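The official definition maps directly onto a small data structure. The Java sketch below stores the expression grammar from the start of the lecture as sets of nonterminals and terminals plus a list of productions. It checks the side conditions of the definition and recovers the counts given under "Terminology". The names Cfg, Production, Grammar and wellFormed are hypothetical.

```java
import java.util.List;
import java.util.Set;

// A context-free grammar as plain data: nonterminals, terminals, productions.
class Cfg {
    // A production: a left-hand-side nonterminal and a string of symbols.
    record Production(String lhs, List<String> rhs) {}

    record Grammar(Set<String> nonterminals, Set<String> terminals, List<Production> productions) {
        // Every left-hand side must be a nonterminal, and every right-hand-side
        // symbol must be a terminal or a nonterminal.
        boolean wellFormed() {
            for (Production p : productions) {
                if (!nonterminals.contains(p.lhs())) return false;
                for (String s : p.rhs()) {
                    if (!nonterminals.contains(s) && !terminals.contains(s)) return false;
                }
            }
            return true;
        }
    }

    // Expr = Expr + Expr | Expr * Expr | ( Expr ) | num
    static Grammar exprGrammar() {
        return new Grammar(
            Set.of("Expr"),
            Set.of("+", "*", "(", ")", "num"),
            List.of(
                new Production("Expr", List.of("Expr", "+", "Expr")),
                new Production("Expr", List.of("Expr", "*", "Expr")),
                new Production("Expr", List.of("(", "Expr", ")")),
                new Production("Expr", List.of("num"))));
    }

    public static void main(String[] args) {
        Grammar g = exprGrammar();
        System.out.println(g.nonterminals().size() + " nonterminal, "
            + g.terminals().size() + " terminals, "
            + g.productions().size() + " productions, well-formed: " + g.wellFormed());
    }
}
```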
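Ambiguity
Expr = Expr + Expr | Expr * Expr | ( Expr ) | num
The same string has two derivations, and two parse trees:
Expr => Expr + Expr => Expr + Expr * Expr => ... => 2 + 3*4   (+ at the root: 2 + (3*4))
Expr => Expr * Expr => Expr + Expr * Expr => ... => 2 + 3*4   (* at the root: (2+3) * 4)

Encoding operator precedence
Multiplication has higher precedence (binds more tightly) than addition.
One nonterminal per precedence level.
Exercise:
Expr = Expr + Expr | Term
Term = Term * Term | ( Expr ) | num
- How many ways can you parse 2+3*4?
- How about 2 + 3 + 4?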
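Ambiguity matters because the two parse trees of 2 + 3*4 mean different things. The Java sketch below builds both trees by hand and evaluates them. The names TwoTrees, Node, Num, Add and Mul are hypothetical.

```java
// Hand-built parse trees for "2 + 3*4" under the ambiguous grammar.
class TwoTrees {
    interface Node {}
    record Num(int value) implements Node {}
    record Add(Node left, Node right) implements Node {}
    record Mul(Node left, Node right) implements Node {}

    // Evaluates a tree bottom-up.
    static int eval(Node n) {
        if (n instanceof Num num) return num.value();
        if (n instanceof Add add) return eval(add.left()) + eval(add.right());
        if (n instanceof Mul mul) return eval(mul.left()) * eval(mul.right());
        throw new IllegalArgumentException("unknown node");
    }

    // + at the root: 2 + (3*4)
    static final Node PLUS_AT_ROOT = new Add(new Num(2), new Mul(new Num(3), new Num(4)));
    // * at the root: (2+3) * 4
    static final Node TIMES_AT_ROOT = new Mul(new Add(new Num(2), new Num(3)), new Num(4));

    public static void main(String[] args) {
        System.out.println(eval(PLUS_AT_ROOT));   // 14
        System.out.println(eval(TIMES_AT_ROOT));  // 20
    }
}
```

Only the first tree matches the usual convention that * binds more tightly than +. The precedence-encoding grammar is designed to leave exactly that tree.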
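Ambiguity and associativity
Expr = Expr - Expr | num
5 - 3 - 2 has two parse trees: (5 - 3) - 2 and 5 - (3 - 2)

Forcing left associativity
Expr = Expr - num | num
Only the tree for (5 - 3) - 2 remains.

Exercise
What ambiguities exist in the following grammar, and how do we get rid of them?
Expr = Expr + Expr | Expr * Expr | Expr - Expr | Expr / Expr | ( Expr ) | num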
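The two groupings of 5 - 3 - 2 really do give different answers. This Java sketch evaluates a chain of subtractions both ways. The class and method names are hypothetical. The left-to-right loop mirrors the tree that Expr = Expr - num | num produces. The right-to-left loop mirrors the tree a right-recursive rule such as Expr = num - Expr | num would produce.

```java
import java.util.List;

// Evaluates n0 - n1 - ... - nk under both possible groupings.
class Assoc {
    // Left associative: ((5 - 3) - 2)
    static int leftAssoc(List<Integer> nums) {
        int acc = nums.get(0);
        for (int i = 1; i < nums.size(); i++) acc = acc - nums.get(i);
        return acc;
    }

    // Right associative: (5 - (3 - 2))
    static int rightAssoc(List<Integer> nums) {
        int acc = nums.get(nums.size() - 1);
        for (int i = nums.size() - 2; i >= 0; i--) acc = nums.get(i) - acc;
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> nums = List.of(5, 3, 2);
        System.out.println(leftAssoc(nums));   // 0
        System.out.println(rightAssoc(nums));  // 4
    }
}
```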
Exercise: answer
- * and / have higher precedence than + and -
- All operators associate to the left, e.g.,
  3/6*2 = (3/6)*2 ≠ 3/(6*2)
  3-6+2 = (3-6)+2 ≠ 3-(6+2)

Encoding operator precedence
Expr = Expr + Expr | Expr * Expr | Expr - Expr | Expr / Expr | ( Expr ) | num
becomes
Expr = Expr + Expr | Expr - Expr | Term
Term = Term * Term | Term / Term | ( Expr ) | num

Encoding associativity
Expr = Expr + Term | Expr - Term | Term
Term = Term * num | Term * ( Expr ) | Term / num | Term / ( Expr ) | ( Expr ) | num
or (better):
Expr = Expr + Term | Expr - Term | Term
Term = Term * Prim | Term / Prim | Prim
Prim = ( Expr ) | num
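Here is a minimal recursive-descent evaluator for the final Expr/Term/Prim grammar, written in Java. A top-down parser cannot follow the left-recursive rules literally, so each rule such as Expr = Expr + Term | Term is implemented as "parse a Term, then loop while the next character is + or -". Folding from left to right inside that loop keeps the operators left associative. Several simplifications are assumed: the input is a string of single-character operators and decimal integers, spaces are simply deleted, and / is integer division. The class name PrecEval is hypothetical.

```java
// Recursive-descent evaluator for
//   Expr = Expr + Term | Expr - Term | Term
//   Term = Term * Prim | Term / Prim | Prim
//   Prim = ( Expr ) | num
class PrecEval {
    private final String src;
    private int pos = 0;

    private PrecEval(String src) { this.src = src.replace(" ", ""); }

    static int eval(String s) {
        PrecEval p = new PrecEval(s);
        int v = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return v;
    }

    // Next character, or '\0' at end of input.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private int expr() {
        int v = term();
        while (peek() == '+' || peek() == '-') {   // left recursion becomes a loop
            char op = src.charAt(pos++);
            int r = term();
            v = (op == '+') ? v + r : v - r;       // fold left: ((a op b) op c)
        }
        return v;
    }

    private int term() {
        int v = prim();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int r = prim();
            v = (op == '*') ? v * r : v / r;       // integer division
        }
        return v;
    }

    private int prim() {
        if (peek() == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return v;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("2 + 3*4"));   // 14
        System.out.println(eval("3 - 6 + 2"));  // -1, i.e. (3-6)+2
        System.out.println(eval("12/3*2"));     // 8, i.e. (12/3)*2
    }
}
```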
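Associativity of operators
- Most binary operators are left associative, e.g., +, -, *, /
- A few are right associative, e.g., = in C: x = y = 2 is parsed as x = (y = 2)
  Forcing right associativity:
  Expr = ident = Expr | ident
- Some are not associative: in languages where < is non-associative, 1<2<3 is not legal
  LogExpr = Expr < Expr | Expr = Expr | ...   (here = is the equality comparison)

Ambiguity: dangling else
Consider the grammar
Stmt = if Expr then Stmt else Stmt | if Expr then Stmt | id = Expr
How should the following be parsed?
if Expr then if Expr then Stmt else Stmt
(Does the else belong to the inner if or to the outer one?)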
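The right-associative assignment rule Expr = ident = Expr | ident can be parsed top-down once it is left-factored into Expr = ident [ "=" Expr ]. The recursive call on the right-hand side is what makes the grouping right associative. The Java sketch below assumes a pre-split token list and uses identifiers only, as the rule does, so the example is x = y = z rather than x = y = 2. It returns a fully parenthesized string. The class name RightAssoc is hypothetical.

```java
import java.util.List;

// Parses Expr = ident [ "=" Expr ] and shows the grouping with parentheses.
class RightAssoc {
    private final List<String> toks;
    private int pos = 0;

    private RightAssoc(List<String> toks) { this.toks = toks; }

    static String parse(List<String> toks) {
        RightAssoc p = new RightAssoc(toks);
        String s = p.expr();
        if (p.pos != toks.size()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return s;
    }

    private String expr() {
        String id = toks.get(pos++);  // assumed to be an identifier
        if (pos < toks.size() && toks.get(pos).equals("=")) {
            pos++;
            return "(" + id + " = " + expr() + ")";  // recursion on the right
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("x", "=", "y", "=", "z")));  // (x = (y = z))
    }
}
```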
Resolving the ambiguity
Stmt = Matched_Stmt | Unmatched_Stmt
Matched_Stmt = if Expr then Matched_Stmt else Matched_Stmt | id = Expr
Unmatched_Stmt = if Expr then Stmt | if Expr then Matched_Stmt else Unmatched_Stmt
It is better to handle this using parser tricks; see later.

Example: MiniJava
[Figure: MiniJava grammar from MCIJ (note the mixed notation)]
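One common parser trick for the dangling else, offered here as an illustration rather than necessarily the one presented later in the course, is to parse greedily. After the then-branch, if the next token is else, the parser takes it, so every else attaches to the nearest open if. The Java sketch below simplifies heavily: a condition is a single placeholder token, and any token other than if is an atomic statement. The class name DanglingElse is hypothetical.

```java
import java.util.List;

// Stmt = if Cond then Stmt [ else Stmt ] | atom, parsed greedily.
class DanglingElse {
    private final List<String> toks;
    private int pos = 0;

    private DanglingElse(List<String> toks) { this.toks = toks; }

    static String parse(List<String> toks) {
        DanglingElse p = new DanglingElse(toks);
        String s = p.stmt();
        if (p.pos != toks.size()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return s;
    }

    private void expect(String t) {
        if (!toks.get(pos).equals(t)) throw new IllegalArgumentException("expected " + t + " at " + pos);
        pos++;
    }

    private String stmt() {
        if (toks.get(pos).equals("if")) {
            pos++;
            String cond = toks.get(pos++);  // condition: a single placeholder token
            expect("then");
            String thenPart = stmt();
            // Greedy choice: an else always goes to the innermost if still waiting for one.
            if (pos < toks.size() && toks.get(pos).equals("else")) {
                pos++;
                String elsePart = stmt();
                return "(if " + cond + " then " + thenPart + " else " + elsePart + ")";
            }
            return "(if " + cond + " then " + thenPart + ")";
        }
        return toks.get(pos++);  // any other token is an atomic statement
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("if", "a", "then", "if", "b", "then", "s1", "else", "s2")));
        // (if a then (if b then s1 else s2))
    }
}
```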
Example: SQL specification (in extended BNF)
...
<query specification> ::= SELECT [ <set quantifier> ] <select list> <table expression>
<select list> ::= <asterisk> | <select sublist> [ { <comma> <select sublist> }... ]
<select sublist> ::= <derived column> | <qualifier> <period> <asterisk>
<derived column> ::= <value expression> [ <as clause> ]
<as clause> ::= [ AS ] <column name>
<table expression> ::= <from clause> [ <where clause> ] [ <group by clause> ] [ <having clause> ]
<from clause> ::= FROM <table reference> [ { <comma> <table reference> }... ]
...
Source: http://savage.net.au/SQL/sql-92.bnf

Extended BNF
Example:
Expr = Term { + Term | - Term }
Term = num { * num }
Extra symbols:
- {α} means zero or more occurrences of α
- [α] means zero or one α
- (α) is used for grouping
EBNF is no more expressive than BNF, only more convenient.
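EBNF translates almost mechanically into recursive-descent code: each { ... } becomes a while loop and each | becomes an if. The Java sketch below applies that translation to the example grammar and builds a fully parenthesized string, so the grouping produced by the loops is visible. It assumes a pre-split list of string tokens. The class name Ebnf is hypothetical.

```java
import java.util.List;

// Expr = Term { "+" Term | "-" Term }
// Term = num { "*" num }
class Ebnf {
    private final List<String> toks;
    private int pos = 0;

    private Ebnf(List<String> toks) { this.toks = toks; }

    static String parse(List<String> toks) {
        Ebnf p = new Ebnf(toks);
        String s = p.expr();
        if (p.pos != toks.size()) throw new IllegalArgumentException("unexpected token at " + p.pos);
        return s;
    }

    // Next token, or "" at end of input.
    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    private String expr() {
        String left = term();
        while (peek().equals("+") || peek().equals("-")) {    // { ... } as a loop
            String op = toks.get(pos++);
            left = "(" + left + " " + op + " " + term() + ")"; // grows to the left
        }
        return left;
    }

    private String term() {
        String left = num();
        while (peek().equals("*")) {
            pos++;
            left = "(" + left + " * " + num() + ")";
        }
        return left;
    }

    private String num() {
        String t = peek();
        if (t.isEmpty() || !t.chars().allMatch(Character::isDigit)) {
            throw new IllegalArgumentException("expected num at " + pos);
        }
        pos++;
        return t;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("1", "-", "2", "+", "3", "*", "4")));
        // ((1 - 2) + (3 * 4))
    }
}
```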
Using coco/r
COMPILER Expressions
...
PRODUCTIONS
/*-------------------------------------------------------------------*/
Expr = Term { '+' Term | '-' Term }.
Term = number { '*' number }.
Expressions = Expr.
END Expressions.
Semantic actions in coco/r
COMPILER Expressions
  public int res;
...
PRODUCTIONS
/*-------------------------------------------------------------------*/
Expr<out int n>              (. int n1, n2; .)
= Term<out n1>               (. n = n1; .)
  { '+' Term<out n2>         (. n = n+n2; .)
  | '-' Term<out n2>         (. n = n-n2; .)
  }.
Term<out int n>
= number                     (. n = Convert.ToInt32(t.val); .)
  { '*' number               (. n = n*Convert.ToInt32(t.val); .)
  }.
Expressions                  (. int n; .)
= Expr<out n>                (. res = n; .).
END Expressions.

The generated parser
Method for parsing expressions in the resulting Parser.cs:
void Expr(out int n) {       // out: pass by reference, similar to ref
  int n1, n2;
  Term(out n1);
  n = n1;
  while (la.kind == 3 || la.kind == 4) {
    if (la.kind == 3) {      // if the next token is +
      Get();
      Term(out n2);
      n = n+n2;
    } else {
      Get();
      Term(out n2);
      n = n-n2;
    }
  }
}
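To make the shape of the generated code concrete outside Coco/R, here is a hand-written Java imitation of the Expr and Term methods. It is not Coco/R output. Java has no out parameters, so the attribute n becomes the return value. The fields la and t and the method get() play the roles of Coco/R's la, t and Get(). Token kinds 3 ('+') and 4 ('-') match the generated C# above. The remaining kind numbers and the tiny scanner are assumptions made for this sketch, and the class name CocoStyle is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CocoStyle {
    // 3 and 4 as in the generated code above; 0, 2 and 5 are assumed.
    static final int EOF = 0, NUMBER = 2, PLUS = 3, MINUS = 4, TIMES = 5;

    record Token(int kind, String val) {}

    private final List<Token> tokens;
    private int next = 0;
    private Token t;   // last recognized token
    private Token la;  // lookahead token

    private CocoStyle(String src) { tokens = scan(src); get(); }

    private void get() { t = la; la = tokens.get(next++); }

    private void expect(int kind) {
        if (la.kind() != kind) throw new IllegalStateException("unexpected token '" + la.val() + "'");
        get();
    }

    // Minimal scanner: integers, +, -, *; whitespace is skipped.
    private static List<Token> scan(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Token(NUMBER, src.substring(start, i)));
            } else if (c == '+') { out.add(new Token(PLUS, "+")); i++; }
            else if (c == '-') { out.add(new Token(MINUS, "-")); i++; }
            else if (c == '*') { out.add(new Token(TIMES, "*")); i++; }
            else throw new IllegalArgumentException("unexpected character " + c);
        }
        out.add(new Token(EOF, ""));
        return out;
    }

    // Expr<out int n> = Term<out n1> { '+' Term<out n2> | '-' Term<out n2> }.
    private int expr() {
        int n = term();
        while (la.kind() == PLUS || la.kind() == MINUS) {
            if (la.kind() == PLUS) { get(); n = n + term(); }
            else { get(); n = n - term(); }
        }
        return n;
    }

    // Term<out int n> = number { '*' number }.
    private int term() {
        expect(NUMBER);
        int n = Integer.parseInt(t.val());
        while (la.kind() == TIMES) {
            get();
            expect(NUMBER);
            n = n * Integer.parseInt(t.val());
        }
        return n;
    }

    // Expressions = Expr<out n>, plus a check that all input was consumed.
    static int parse(String src) {
        CocoStyle p = new CocoStyle(src);
        int res = p.expr();
        if (p.la.kind() != EOF) throw new IllegalStateException("unexpected token '" + p.la.val() + "'");
        return res;
    }

    public static void main(String[] args) {
        System.out.println(parse("2 + 3 * 4"));   // 14
        System.out.println(parse("10 - 2 - 3"));  // 5
    }
}
```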
Predictive parsing
- A top-down parsing method, aka LL parsing
- coco/r generates LL parsers
- Produces leftmost derivations
- Predicts (guesses) a production based on the next token
Example grammar 3.11 (from the book); example parsing on the board:
S = if E then S else S | begin S L | print E
L = end | ; S L
E = num | ident
Parser implementation
final int IF=1, THEN=2, ELSE=3, BEGIN=4, END=5, PRINT=6, SEMI=7, NUM=8, EQ=9;
int tok = getToken();
void advance() { tok = getToken(); }
void eat(int t) { if (tok == t) advance(); else error(); }

void S() {
  switch (tok) {
    case IF:    eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
    case BEGIN: eat(BEGIN); S(); L(); break;
    case PRINT: eat(PRINT); E(); break;
    default:    error();
  }
}

void L() {
  switch (tok) {
    case END:  eat(END); break;
    case SEMI: eat(SEMI); S(); L(); break;
    default:   error();
  }
}
// E() (for E = num | ident) is not shown.

Parsing table (row: next token; column: nonterminal being expanded)
token | S                       | L          | E
------+-------------------------+------------+-----------
if    | S -> if E then S else S |            |
begin | S -> begin S L          |            |
print | S -> print E            |            |
end   |                         | L -> end   |
;     |                         | L -> ; S L |
num   |                         |            | E -> num
ident |                         |            | E -> ident
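The parsing table can also drive the parser directly, with an explicit stack in place of recursive calls. When a nonterminal is on top of the stack, the parser looks up the right-hand side for the next token and pushes it. When a terminal is on top, that terminal must match the next token. The Java sketch below assumes tokens are plain strings. It uses "$" as a hypothetical end-of-input marker, and a missing map entry stands for a blank cell in the table. The class name TableParser is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Table-driven predictive parser for
//   S = if E then S else S | begin S L | print E
//   L = end | ; S L
//   E = num | ident
class TableParser {
    // TABLE.get(nonterminal).get(next token) = right-hand side to expand with.
    static final Map<String, Map<String, List<String>>> TABLE = Map.of(
        "S", Map.of(
            "if",    List.of("if", "E", "then", "S", "else", "S"),
            "begin", List.of("begin", "S", "L"),
            "print", List.of("print", "E")),
        "L", Map.of(
            "end", List.of("end"),
            ";",   List.of(";", "S", "L")),
        "E", Map.of(
            "num",   List.of("num"),
            "ident", List.of("ident")));

    // Returns true iff the token list can be derived from S.
    static boolean parse(List<String> input) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("S");
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String tok = pos < input.size() ? input.get(pos) : "$";
            if (TABLE.containsKey(top)) {
                List<String> rhs = TABLE.get(top).get(tok);
                if (rhs == null) return false;  // blank table entry: syntax error
                // Push in reverse so the first symbol of the rhs ends up on top.
                for (int i = rhs.size() - 1; i >= 0; i--) stack.push(rhs.get(i));
            } else if (top.equals(tok)) {
                pos++;                          // terminal matches: consume it
            } else {
                return false;                   // terminal mismatch
            }
        }
        return pos == input.size();             // all input must be consumed
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("begin", "print", "num", ";", "print", "ident", "end")));  // true
        System.out.println(parse(List.of("if", "num", "then", "print", "num")));                   // false
    }
}
```

The explicit stack holds exactly the symbols that the recursive-descent version keeps implicitly on the call stack.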
Intended learning outcomes
- Construct grammars for programming languages
- Eliminate ambiguity by
  - Encoding operator precedence
  - Encoding operator associativity
- Use coco/r to create parsers and lexers