Prelude

COMP 8 October, 9

What is triskaidekaphobia?
- Fear of the number 13
  - No aisle 13 in airplanes, no 13th floor in buildings
- Fear of Friday the 13th?
  - Paraskevidekatriaphobia or friggatriskaidekaphobia
- Why unlucky?
  - Partly: 12 considered a perfect number
  - Christianity: the 13th guest at the Last Supper was Judas
  - Norse myth: the 13th god, Loki, arranged the murder of Baldr
  - Code of Hammurabi: no number 13
- Lucky
  - Christianity: 13 guests at the Last Supper, 13 attributes of mercy
  - Judaism: 13 principles of Maimonides, age of Bar Mitzvah
  - Sikhism: lucky number

Tufts University Computer Science

Prelude: Analysis

What is the probability that the 13th occurs on a Friday?

    Day of the week    # of occurrences
    Sunday             687
    Monday             685
    Tuesday            685
    Wednesday          687
    Thursday           684
    Friday             688
    Saturday           684

(Counts cover one 400-year Gregorian cycle, 4800 months in total.)

Summary of parsing

Parsing
- A solid foundation: context-free grammars
- A simple parser: LL(1)
- A more powerful parser: LR(1)
- An efficiency hack: LALR(1)
- LALR(1) parser generators

A Hierarchy of Grammar Classes

[Figure: hierarchy of grammar classes. From Andrew Appel, Modern Compiler Implementation in Java]

More power?

So far:
- LALR and SLR: reduce size of tables
- Also reduce space of languages

What if I want to expand the space of languages?
- What could I do at a reduce/reduce conflict?
- Try both reductions!

GLR parsing
- At a choice: split the stack, explore both possibilities
- If one doesn't work out, kill it
- Run-time proportional to amount of ambiguity
- Must design the stack data structure very carefully
General algorithms

Parsers for the full class of context-free grammars
- Mostly used in linguistics; constructive proof of decidability

CYK (1965)
- Bottom-up dynamic programming algorithm
- O(n^3)

Earley's algorithm (1970)
- Top-down dynamic programming algorithm
- Developed the notation for partial production (LR item)
- Worst-case O(n^3) running time
- But, O(n^2) even for unambiguous grammars

GLR
- Worst-case O(n^3), but O(n) for unambiguous grammars

LR parsing

[Figure: LR parsing. The scanner supplies the input a1 a2 ... ai ... an $ to the LR parsing engine, which keeps a stack sm Xm sm-1 Xm-1 ... s0 and consults the Action and Goto tables. At compiler construction time, a parser generator builds the LR tables from the grammar.]

Real world parsers
- Real generated code
  - lex, flex, yacc, bison
- Interaction between lexer and parser
  - C typedef problem
  - Merging two languages
- Debugging
  - Diagnosing reduce/reduce conflicts
  - How to step through an LR parser

Parser generators: JavaCUP
- LALR(1) parser generator
- Input: grammar specification
- Output: Java classes
  - Generic engine
  - Action/goto tables
- Separate scanner specification
- Similar tools:
  - SableCC
  - yacc and bison generate C/C++ parsers
  - JavaCC: similar, but generates an LL(1) parser

JavaCUP example
- Simple expression grammar
- Operations over numbers only

    // Import generic engine code
    import java_cup.runtime.*;

    /* Preliminaries to set up and use the scanner. */
    init with {: scanner.init(); :};
    scan with {: return scanner.next_token(); :};

Note: interface to scanner
- One issue: how to agree on names of the tokens

Next steps in the specification:
- Define terminals and non-terminals
- Indicate operator precedence

    /* Terminals (tokens returned by the scanner). */
    terminal SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD;
    terminal UMINUS, LPAREN, RPAREN;
    terminal Integer NUMBER;

    /* Non terminals */
    non terminal expr_list, expr_part;
    non terminal Integer expr, term, factor;

    /* Precedences */
    precedence left PLUS, MINUS;
    precedence left TIMES, DIVIDE, MOD;
Grammar rules

    expr_list ::= expr_list expr_part
                | expr_part
                ;

    expr_part ::= expr SEMI
                ;

    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr TIMES expr
           | expr DIVIDE expr
           | expr MOD expr
           | LPAREN expr RPAREN
           | NUMBER
           ;

Roadmap

[Figure: text (chars) -> Lexical analyzer -> tokens -> Parser -> IR, with an Errors output]

Parsing
- Tells us if input is syntactically correct
- Gives us derivation or parse tree

But we want to do more:
- Build some data structure: the IR
- Perform other checks and computations

In practice:
- Fold some computations into parsing
- Computations are triggered by parsing steps

Parser generators
- Add action code to do something
- Typically build the IR

How much can we do during parsing?

General strategy
- Associate values with grammar symbols
- Associate computations with productions

Implementation approaches
- Formal: attribute grammars
- Informal: ad-hoc translation schemes

Some things cannot be folded into parsing

Desk calculator
- Expression grammar
- Build parse tree
- Evaluate the resulting tree

    G -> E
    E -> E + T | T
    T -> T * F | F
    F -> ( E ) | num

[Figure: parse tree for an example expression using * and +; the operands did not survive extraction]

Can we evaluate the expression without building the tree first?
- Piggyback on parsing

[Figure: the same parse tree with a value computed at each node as parsing proceeds]
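A minimal sketch of piggybacking evaluation on parsing, assuming a hand-written recursive-descent parser rather than the table-driven LR parser the slides build toward. The left-recursive productions E -> E + T and T -> T * F are rewritten as loops, and the class name DeskCalc and its methods are hypothetical. Each procedure returns the value of its non-terminal, so no parse tree is ever built.

```java
// Hypothetical illustration: DeskCalc and its method names are not from the slides.
class DeskCalc {
    private final String src;
    private int i = 0; // index of the next unread character

    DeskCalc(String src) {
        this.src = src.replace(" ", "");
    }

    // G -> E : the value of the input is the value of E.
    static int eval(String src) {
        DeskCalc p = new DeskCalc(src);
        int val = p.expr();
        if (p.i != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.i);
        }
        return val;
    }

    // E -> E + T | T, written as T { + T }; the sum accumulates left to right.
    private int expr() {
        int val = term();
        while (peek() == '+') {
            i++;
            val = val + term();
        }
        return val;
    }

    // T -> T * F | F, written as F { * F }.
    private int term() {
        int val = factor();
        while (peek() == '*') {
            i++;
            val = val * factor();
        }
        return val;
    }

    // F -> ( E ) | num
    private int factor() {
        if (peek() == '(') {
            i++;
            int val = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ) at " + i);
            }
            i++;
            return val;
        }
        int start = i;
        while (Character.isDigit(peek())) {
            i++;
        }
        if (start == i) {
            throw new IllegalArgumentException("number expected at " + i);
        }
        return Integer.parseInt(src.substring(start, i));
    }

    // Returns the next character, or '\0' at end of input.
    private char peek() {
        return i < src.length() ? src.charAt(i) : '\0';
    }
}
```

For instance, DeskCalc.eval("2 * 3 + 4") returns 10 and DeskCalc.eval("2 * (3 + 4)") returns 14; the values flow upward exactly as they would through the nodes of the parse tree.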
Codify:
- Store intermediate values with non-terminals
- Perform computations in each production

    Production       Computation
    G -> E           print(E.val)
    E -> E1 + T      E.val <- E1.val + T.val
    E -> T           E.val <- T.val
    T -> T1 * F      T.val <- T1.val * F.val
    T -> F           T.val <- F.val
    F -> ( E )       F.val <- E.val
    F -> num         F.val <- valueof(num)

Attribute grammars
- A context-free grammar with a set of rules
- Each symbol has a set of values, or attributes
- Semantic rules: how to compute each attribute

The bad news:
- Formal attribute grammars never widely adopted

Why study them?
- The attribute grammar formalism is important
  - Succinctly makes many points clear
  - Sets the stage for actual, ad-hoc practice
- The problems motivate practice

Example

Grammar:

    Number -> Sign List
    Sign   -> + | -
    List   -> List Bit | Bit
    Bit    -> 0 | 1

- Describes signed binary numbers
- We would like to augment it with rules that compute the decimal value of each valid input string

Example derivations

[Figure: derivations of two signed binary input strings; the strings themselves did not survive extraction]

Attribute grammar

Goal: compute the value of the binary number

Information we need
- Position of each bit to compute place value, e.g. 101 = 1*2^2 + 0*2^1 + 1*2^0
- Sum of bit values

Computation
- Propagate position information
- Accumulate the sums

Attribution rules

    Number -> Sign List
    Sign   -> + | -
    List   -> List1 Bit | Bit
    Bit    -> 0 | 1

- How to compute Number.val from Sign and List?
    if Sign.neg then Number.val <- -1 * List.val else Number.val <- List.val
- How to compute List.val?
    List.val <- List1.val + Bit.val   or   List.val <- Bit.val
- How to compute Bit.val?
    Bit.val <- 0   or   Bit.val <- 2^Bit.pos   (bit position)
- Where does Bit.pos come from?
Attribution rules

    Number -> Sign List
    Sign   -> + | -
    List   -> List1 Bit | Bit
    Bit    -> 0 | 1

- Start at bit position 0
    List.pos <- 0
- Push position information down
    List1.pos <- List.pos + 1 and Bit.pos <- List.pos   or   Bit.pos <- List.pos
- Now we can compute Bit.val
    Bit.val <- 0   or   Bit.val <- 2^(Bit.pos)
- And accumulate the sums
    List.val <- List1.val + Bit.val

Notice: information can flow top-down or bottom-up
(Also: left-to-right or right-to-left)

Attribution rules
- Top-down values are inherited attributes
- Bottom-up values are synthesized attributes
- Values with no dependence are called independent attributes

Attribute grammars

Specification:
- Attributes
  - Associated with nodes in parse tree
  - Distinguish multiple non-terminals with index
- Rules
  - Value assignments associated with productions
  - Information is entirely local: it can only refer to values in the given production

Given a parse tree
- Rules form a dependence graph between attributes

Result: a high-level, functional specification

Evaluation

Tricky part
- Values flowing both up and down in tree
- How do we order the computation?
- And, how does that relate to parsing order? (i.e., the order in which parse tree nodes are created)

Key
- Must obey the dependences between attributes
- Let's look at a general technique for evaluation

[Figure: parse tree of an example signed binary string]
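To make the two directions of flow concrete, here is a small sketch of the signed-binary attribution rules as hand-written recursion. It assumes the non-terminal names Number, Sign, List and Bit; the class SignedBinary and its method names are hypothetical. Inherited attributes (pos) travel down as parameters, and synthesized attributes (val, neg) travel up as return values.

```java
// Hypothetical sketch: one method per non-terminal of the signed-binary grammar.
class SignedBinary {

    // Number -> Sign List
    static int number(String s) {
        if (s.length() < 2 || (s.charAt(0) != '+' && s.charAt(0) != '-')) {
            throw new IllegalArgumentException("expected a sign followed by bits: " + s);
        }
        boolean neg = s.charAt(0) == '-';          // Sign.neg (synthesized)
        int listVal = list(s.substring(1), 0);     // List.pos <- 0 (inherited)
        return neg ? -listVal : listVal;           // Number.val
    }

    // List -> List1 Bit | Bit
    // pos is the inherited attribute List.pos; the result is List.val.
    static int list(String bits, int pos) {
        int last = bits.length() - 1;
        int bitVal = bit(bits.charAt(last), pos);  // Bit.pos <- List.pos
        if (last == 0) {
            return bitVal;                         // List -> Bit: List.val <- Bit.val
        }
        // List1.pos <- List.pos + 1, then List.val <- List1.val + Bit.val
        int list1Val = list(bits.substring(0, last), pos + 1);
        return list1Val + bitVal;
    }

    // Bit -> 0 | 1 : Bit.val <- 0 or 2^Bit.pos
    static int bit(char c, int pos) {
        if (c == '0') {
            return 0;
        }
        if (c == '1') {
            return 1 << pos;
        }
        throw new IllegalArgumentException("not a bit: " + c);
    }
}
```

With these rules SignedBinary.number("-101") yields -5. The recursion works here only because every inherited value is available before the call that needs it; a general attribute grammar gives no such guarantee, which is why evaluation order is the tricky part.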
[Figure sequence: building the attribute dependence graph on the parse tree of an example signed binary string. The Sign node carries neg (true in the example); the other nodes carry pos and val.]

- Annotate parse tree with attributes
- Inherited attributes flow down in the tree
- At leaves, add dependences between inherited and synthesized attributes
- Collect the synthesized attributes
- Complete graph
- Now, throw away the parse tree

Need a method to solve the set of constraints described by this dependence graph
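One way to solve such a set of constraints is to evaluate the attributes in topological order of the dependence graph. The sketch below assumes a small hypothetical representation (class DependenceEval, one Attr per attribute instance, integer values only, Sign.neg encoded as 1) and uses the attribute instances for the input "-10" as its example; none of these names come from the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: dependence-based evaluation by topological sort.
class DependenceEval {
    // One attribute instance: its name, the attributes it reads, and its rule.
    static final class Attr {
        final String name;
        final List<String> deps;
        final Function<Map<String, Integer>, Integer> rule;
        Attr(String name, List<String> deps, Function<Map<String, Integer>, Integer> rule) {
            this.name = name; this.deps = deps; this.rule = rule;
        }
    }

    static Attr attr(String name, Function<Map<String, Integer>, Integer> rule, String... deps) {
        return new Attr(name, Arrays.asList(deps), rule);
    }

    static Map<String, Integer> evaluate(List<Attr> attrs) {
        Map<String, Attr> byName = new HashMap<>();
        Map<String, Integer> pending = new HashMap<>();     // unevaluated inputs per attribute
        Map<String, List<String>> users = new HashMap<>();  // reverse dependence edges
        for (Attr a : attrs) {
            byName.put(a.name, a);
            pending.put(a.name, a.deps.size());
            for (String d : a.deps) {
                users.computeIfAbsent(d, k -> new ArrayList<>()).add(a.name);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Attr a : attrs) {
            if (a.deps.isEmpty()) ready.add(a.name);
        }
        Map<String, Integer> values = new HashMap<>();
        while (!ready.isEmpty()) {
            String n = ready.remove();
            values.put(n, byName.get(n).rule.apply(values)); // all inputs are already computed
            for (String u : users.getOrDefault(n, Collections.emptyList())) {
                if (pending.merge(u, -1, Integer::sum) == 0) ready.add(u);
            }
        }
        if (values.size() != attrs.size()) {
            throw new IllegalStateException("cyclic or incomplete dependence graph");
        }
        return values;
    }

    // Attribute instances for "-10", deliberately listed out of order.
    static List<Attr> example() {
        List<Attr> a = new ArrayList<>();
        a.add(attr("Number.val", v -> v.get("Sign.neg") == 1 ? -v.get("List0.val") : v.get("List0.val"),
                   "Sign.neg", "List0.val"));
        a.add(attr("List0.val", v -> v.get("List1.val") + v.get("Bit0.val"), "List1.val", "Bit0.val"));
        a.add(attr("List1.val", v -> v.get("Bit1.val"), "Bit1.val"));
        a.add(attr("Bit1.val", v -> 1 << v.get("Bit1.pos"), "Bit1.pos"));
        a.add(attr("Bit0.val", v -> 0));
        a.add(attr("Bit1.pos", v -> v.get("List1.pos"), "List1.pos"));
        a.add(attr("Bit0.pos", v -> v.get("List0.pos"), "List0.pos"));
        a.add(attr("List1.pos", v -> v.get("List0.pos") + 1, "List0.pos"));
        a.add(attr("List0.pos", v -> 0));
        a.add(attr("Sign.neg", v -> 1));
        return a;
    }
}
```

Calling DependenceEval.evaluate(DependenceEval.example()) assigns Number.val the value -2 regardless of the order in which the attributes were listed. The cost of this generality is that the whole graph must exist before any rule runs.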
Evaluation

Dynamic, dependence-based methods
- Build the parse tree, dependence graph
- Topologically sort the graph
- Gives us an order of evaluation

Rule-based methods
- Analyze rules at compiler-generation time
- Determine a fixed (static) ordering
- Evaluate nodes in that order

Oblivious methods
- Ignore rules & parse tree
- Pick a convenient order (at design time) & use it

Attribute grammars
- Clean, declarative
- Handle a wide variety of problems
- BUT, have limitations and evaluation issues

Reality
- Never widely adopted

In practice:
- Apply arbitrary code actions on attributes
- Order of evaluation dictated by parsing algorithm
- Only works for limited classes of attribute grammars

Special class: L-attributed

L-attributed definition
- Use values from parent and siblings
- For production A -> X1 X2 ... Xn
  - Each attribute of Xi depends on
    - Attributes of X1 X2 ... Xi-1, and
    - Inherited attributes of A

Suited to LL parsing
- Evaluate in a single top-down pass (left to right)
- Pass values down through recursive descent
- Emit code to compute attributes in appropriate procedures

Special class: S-attributed

S-attributed definition
- All attributes are synthesized
- For production A -> X1 X2 ... Xn
  - Value of A is computed as a function of the attributes already computed for X1 X2 ... Xn

Suited to LR parsing
- Can be computed in a single bottom-up pass
- Associate pieces of code with each production
- At each reduction, the code is executed

Another example

    expr_part ::= expr SEMI
                ;

    expr ::= expr:e1 PLUS expr:e2
             {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
           | expr:e1 MINUS expr:e2
             {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
           ...
           | LPAREN expr:e RPAREN
             {: RESULT = e; :}
           | NUMBER:n
             {: RESULT = n; :}
           ;

- The label after the colon (e1, e2, e, n) is the name for the value associated with that symbol in this production
- Arbitrary code between {: and :}
- RESULT refers to the attribute of the LHS non-terminal

Build an abstract syntax tree

    expr_part ::= expr SEMI
                ;

    expr ::= expr:e1 PLUS expr:e2
             {: RESULT = new AddNode(e1, e2); :}
           | expr:e1 MINUS expr:e2
             {: RESULT = new SubNode(e1, e2); :}
           ...
           | LPAREN expr:e RPAREN
             {: RESULT = e; :}
           | NUMBER:n
             {: RESULT = new Node(n); :}
           ;
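The slides name AddNode, SubNode and Node in the action code but do not define them. The following is one possible shape for such classes, purely as an assumption: Node doubles as the leaf for a NUMBER, and the eval method is added here only to show that the tree built during parsing can be walked afterwards.

```java
// Hypothetical AST classes matching the constructor calls in the action code.
class Node {
    private final Integer value; // set only for number leaves

    Node(Integer value) {
        this.value = value;
    }

    // Used by operator nodes, which carry children instead of a value.
    protected Node() {
        this.value = null;
    }

    int eval() {
        return value;
    }
}

class AddNode extends Node {
    private final Node left;
    private final Node right;

    AddNode(Node left, Node right) {
        this.left = left;
        this.right = right;
    }

    @Override
    int eval() {
        return left.eval() + right.eval();
    }
}

class SubNode extends Node {
    private final Node left;
    private final Node right;

    SubNode(Node left, Node right) {
        this.left = left;
        this.right = right;
    }

    @Override
    int eval() {
        return left.eval() - right.eval();
    }
}
```

With classes like these, the grammar's non-terminal declaration would carry the type Node instead of Integer, and new AddNode(new Node(1), new SubNode(new Node(5), new Node(2))).eval() gives 4. Building the tree first and evaluating later separates parsing from everything that cannot be folded into it.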
Implementation

How does this work?
- Where are the attributes stored?
- What do e1, e2, n, RESULT refer to?

Key: store attributes on stack

At a reduction of A -> β
- Pop the β symbols: each entry holds symbol, state, attribute value
- Map values to names: for expr:e1 PLUS expr:e2, e2 = top of stack, then PLUS, then e1
- Invoke action code on the values, store in RESULT
- Push RESULT back on stack with new symbol, state

Next
- Static checking
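The reduction step described under Implementation can be sketched with an explicit stack. This is a simplified illustration, not the code a parser generator emits: the class AttrStack and its methods are hypothetical, LR states are left out, and the shift/reduce sequence for "1 + 2" is driven by hand instead of by Action/Goto tables.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Hypothetical sketch of attribute storage on the parse stack.
class AttrStack {
    // One stack entry: grammar symbol plus attribute value (LR state omitted here).
    static final class Entry {
        final String symbol;
        final Object value;
        Entry(String symbol, Object value) {
            this.symbol = symbol;
            this.value = value;
        }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    void shift(String symbol, Object value) {
        stack.push(new Entry(symbol, value));
    }

    // Reduce by A -> beta: pop |beta| entries, run the action, push A with RESULT.
    void reduce(String lhs, int rhsLength, Function<Object[], Object> action) {
        Object[] values = new Object[rhsLength];
        // The top of the stack holds the rightmost symbol, so fill from the right.
        for (int k = rhsLength - 1; k >= 0; k--) {
            values[k] = stack.pop().value;
        }
        Object result = action.apply(values); // plays the role of RESULT
        stack.push(new Entry(lhs, result));
    }

    Object topValue() {
        return stack.peek().value;
    }

    // Hand-driven parse of "1 + 2".
    static int demo() {
        AttrStack s = new AttrStack();
        s.shift("NUMBER", 1);
        s.reduce("expr", 1, v -> v[0]);      // expr ::= NUMBER:n {: RESULT = n; :}
        s.shift("PLUS", null);
        s.shift("NUMBER", 2);
        s.reduce("expr", 1, v -> v[0]);
        s.reduce("expr", 3, v -> {           // expr ::= expr:e1 PLUS expr:e2
            Integer e1 = (Integer) v[0];
            Integer e2 = (Integer) v[2];
            return e1 + e2;
        });
        return (Integer) s.topValue();
    }
}
```

AttrStack.demo() returns 3. Because every value needed by an action is already on the stack when the reduction happens, this scheme handles synthesized attributes naturally, which is why S-attributed definitions suit LR parsing.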