Ambiguity, Precedence, Associativity & Top-Down Parsing
Lecture 9-10 (from slides by G. Necula & R. Bodik)
2/13/2008 Prof. Hilfinger CS164 Lecture 9

Administrivia
- Team assignments this evening for all those not listed as having one.
- HW#3 is now available, due next Tuesday morning (Monday is a holiday).

Remaining Issues
- How do we find a derivation of s?
- Ambiguity: what if there is more than one parse tree (interpretation) for some string s?
- Errors: what if there is no parse tree for some string s?
- Given a derivation, how do we construct an abstract syntax tree from it?

Ambiguity
- Grammar: E → E + E | E * E | ( E ) | int
- Strings: int + int + int, int * int + int

Ambiguity. Example
- The string int + int + int has two parse trees: (int + int) + int and int + (int + int).
- + is left-associative.

Ambiguity. Example
- The string int * int + int has two parse trees: (int * int) + int and int * (int + int).
- * has higher precedence than +.
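The two-parse-trees claim can be checked mechanically. The sketch below (my own illustrative code, not from the slides) counts the parse trees that the ambiguous grammar E → E + E | E * E | ( E ) | int assigns to a token string, by trying every split point at a + or *:

```python
from functools import lru_cache

def count_trees(tokens):
    """Count parse trees of E -> E + E | E * E | ( E ) | int for a token list."""
    toks = tuple(tokens)

    @lru_cache(None)
    def count(i, j):
        # Number of parse trees deriving toks[i:j] from E.
        n = 0
        if j - i == 1 and toks[i] == 'int':
            n += 1                                  # E -> int
        if j - i >= 3 and toks[i] == '(' and toks[j - 1] == ')':
            n += count(i + 1, j - 1)                # E -> ( E )
        for k in range(i + 1, j - 1):
            if toks[k] in ('+', '*'):               # E -> E + E | E * E
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))

print(count_trees(['int', '+', 'int', '+', 'int']))   # 2
print(count_trees(['int', '*', 'int', '+', 'int']))   # 2
```

Both example strings from the slide get exactly two trees, matching the figures.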
Ambiguity (Cont.)
- A grammar is ambiguous if it has more than one parse tree for some string.
- Equivalently, there is more than one rightmost or leftmost derivation for some string.
- Ambiguity is bad: it leaves the meaning of some programs ill-defined.
- Ambiguity is common in programming languages: arithmetic expressions, IF-THEN-ELSE.

Dealing with Ambiguity
- There are several ways to handle ambiguity.
- The most direct method is to rewrite the grammar unambiguously:
    E → E + T | T
    T → T * int | int | ( E )
- Enforces precedence of * over +.
- Enforces left-associativity of + and *.

Ambiguity. Example
- The string int * int + int has only one parse tree now.

Ambiguity: The Dangling Else
- Consider the grammar
    E → if E then E
      | if E then E else E
      | OTHER
- This grammar is also ambiguous.

The Dangling Else: Example
- The expression if E1 then if E2 then E3 else E4 has two parse trees:
    if E1 then (if E2 then E3) else E4
    if E1 then (if E2 then E3 else E4)
- Typically we want the second form.

The Dangling Else: A Fix
- else matches the closest unmatched then.
- We can describe this in the grammar (distinguish between matched and unmatched then):
    E → MIF            /* all then are matched */
      | UIF            /* some then are unmatched */
    MIF → if E then MIF else MIF
        | OTHER
    UIF → if E then E
        | if E then MIF else UIF
- Describes the same set of strings.
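The unambiguous rewrite can be turned directly into a parser. A minimal sketch (my own code, not the slides'; the token-list input and nested-tuple tree representation are illustrative choices): the left-recursive rules E → E + T and T → T * int become loops, which is exactly what makes + and * left-associative, and the E/T split is what gives * higher precedence.

```python
def parse(tokens):
    """Parse the unambiguous grammar  E -> E + T | T ;  T -> T * int | int | ( E )."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def scan(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t!r}, got {peek()!r}")
        pos += 1

    def E():                       # E -> E + T | T, as a loop: left-associative
        tree = T()
        while peek() == '+':
            scan('+')
            tree = ('+', tree, T())
        return tree

    def T():                       # T -> T * int | int | ( E ), also as a loop
        if peek() == '(':
            scan('('); tree = E(); scan(')')
        else:
            scan('int'); tree = 'int'
        while peek() == '*':
            scan('*'); scan('int')
            tree = ('*', tree, 'int')
        return tree

    tree = E()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return tree

# * binds tighter than +, and both nest to the left:
print(parse(['int', '*', 'int', '+', 'int']))  # ('+', ('*', 'int', 'int'), 'int')
print(parse(['int', '+', 'int', '+', 'int']))  # ('+', ('+', 'int', 'int'), 'int')
```

Each input now gets exactly one tree shape, which is the point of the rewrite.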
The Dangling Else: Example Revisited
- The expression if E1 then if E2 then E3 else E4:
    if E1 then (if E2 then E3 else E4) is a valid parse tree (for a UIF).
    if E1 then (if E2 then E3) else E4 is not valid, because the then expression is not a MIF.

Ambiguity
- Impossible to convert automatically an ambiguous grammar to an unambiguous one.
- Used with care, ambiguity can simplify the grammar: it sometimes allows more natural definitions. But we need disambiguation mechanisms.
- Instead of rewriting the grammar: use the more natural (ambiguous) grammar, along with disambiguating declarations.
- Most tools allow precedence and associativity declarations to disambiguate grammars.
- Examples follow.

Associativity Declarations
- Consider the grammar E → E + E | int. Ambiguous: two parse trees of int + int + int.
- Left-associativity declaration: %left +

Precedence Declarations
- Consider the grammar E → E + E | E * E | int, and the string int + int * int.
- Precedence declarations: %left +  %left *

How It's Done I: Intro to Top-Down Parsing
- Terminals are seen in order of appearance in the token stream: t1 t2 t3 t4 t5.
- The parse tree is constructed from the top, from left to right, as for a leftmost derivation.
(Figure: a parse tree over t1 ... t4 with internal nodes A, B, C, D.)

Top-Down Depth-First Parsing
- Consider the grammar
    E → T + E | T
    T → ( E ) | int | int * T
- Token stream is: int * int
- Start with the top-level non-terminal E.
- Try the rules for E in order.
Depth-First Parsing. Example
- Start with the start symbol E.
- Try E → T + E.
- Then try a rule for T: T → ( E ). But ( does not match input token int; backtrack to T.
- Try T → int. The token matches. But + does not match input token *; back to T.
- Try T → int * T. But (skipping some steps) the + can't be matched. Must backtrack to E.

Depth-First Parsing. Example (cont.)
- Try E → T.
- Follow the same steps as before for T, and succeed with T → int * T and T → int.
(Figure: the resulting parse tree, E over T, with T → int * T and the inner T → int.)

Depth-First Parsing
- Parsing: given a string of tokens t1 t2 ... tn, find a leftmost derivation (and thus, a parse tree).
- Depth-first parsing: beginning with the start symbol, try each production exhaustively on the leftmost non-terminal in the current sentential form, and recurse.

Depth-First Parsing of t1 t2 ... tn
- At a given moment, we have a sentential form that looks like this: t1 t2 ... tk A ..., 0 ≤ k ≤ n.
- Initially, k = 0 and A is just the start symbol.
- Try a production for A: if A → B C is a production, the new form is t1 t2 ... tk B C ...
- Backtrack when the leading terminals aren't a prefix of t1 t2 ... tn, and try another production.
- Stop when there are no more non-terminals and all terminals are matched (accept), or no more productions are left (reject).

When Depth-First Doesn't Work Well
- Consider the productions S → S a | a. In the process of parsing S we try the above rules.
- Applied consistently in this order, we get an infinite loop.
- Could re-order the productions, but the search will have lots of backtracking, and a general rule for ordering is complex.
- The problem here is a left-recursive grammar: one that has a non-terminal S with S →+ S α for some α.

Elimination of Left Recursion
- Consider the left-recursive grammar S → S α | β.
- S generates all strings starting with a β and followed by any number of α's.
- Can rewrite using right-recursion:
    S → β S'
    S' → α S' | ε
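The sentential-form procedure above can be sketched directly as a recursive search (illustrative code, not the slides' own; grammars are dicts mapping each non-terminal to its right-hand sides, tried in order). As the next slide warns, a left-recursive grammar would send this into an infinite loop:

```python
def derives(grammar, sentential, tokens):
    """Backtracking depth-first top-down parse: does some leftmost derivation
    of the sentential form yield exactly the token string?"""
    if not sentential:
        return not tokens                     # accept iff all input consumed
    head, rest = sentential[0], sentential[1:]
    if head in grammar:                       # expand the leftmost non-terminal
        return any(derives(grammar, list(rhs) + rest, tokens)
                   for rhs in grammar[head])
    # head is a terminal: it must match the next input token
    return bool(tokens) and tokens[0] == head and derives(grammar, rest, tokens[1:])

# The grammar from the example: E -> T + E | T ; T -> ( E ) | int | int * T
G = {'E': [['T', '+', 'E'], ['T']],
     'T': [['(', 'E', ')'], ['int'], ['int', '*', 'T']]}
print(derives(G, ['E'], ['int', '*', 'int']))   # True
print(derives(G, ['E'], ['int', '*']))          # False
```

On int * int this retraces the example exactly: E → T + E fails after trying all three T rules, then E → T succeeds.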
Elimination of Left Recursion. Example
- Consider the grammar S → 1 | S 0 (β = 1 and α = 0).
- It can be rewritten as
    S → 1 S'
    S' → 0 S' | ε

More Elimination of Left Recursion
- In general
    S → S α1 | ... | S αn | β1 | ... | βm
- All strings derived from S start with one of β1, ..., βm and continue with several instances of α1, ..., αn.
- Rewrite as
    S → β1 S' | ... | βm S'
    S' → α1 S' | ... | αn S' | ε

General Left Recursion
- The grammar
    S → A α | δ    (1)
    A → S β        (2)
  is also left-recursive, because S →+ S β α.
- This left recursion can also be eliminated, by first substituting (2) into (1).
- There is a general algorithm (e.g. Aho, Sethi, Ullman §4.3).
- But personally, I'd just do this by hand.

An Alternative Approach
- Instead of reordering or rewriting the grammar, can use top-down breadth-first search.
- Grammar S → S a | a; string: aaa.
(Figure: the breadth-first frontier of sentential forms: S; then S a and a; then S a a and a a (not all matched); then a a a, which matches.)

Summary of Top-Down Parsing So Far
- Simple and general parsing strategy.
- Left recursion must be eliminated first, but that can be done automatically. Or can use breadth-first search.
- But backtracking (depth-first) or maintaining a list of possible sentential forms (breadth-first) can make it slow.
- Often, though, we can avoid both.

Predictive Parsers
- Modification of depth-first parsing in which the parser predicts which production to use by looking at the next few tokens. No backtracking.
- Predictive parsers accept LL(k) grammars:
    first L means left-to-right scan of input
    second L means leftmost derivation
    k means k tokens of lookahead needed to predict
- In practice, LL(1) is used.
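The breadth-first alternative can be sketched as below (again my own illustrative code, not the slides'). Sentential forms are explored level by level; because this grammar has no ε-productions, a form can never get shorter, so forms longer than the input are pruned, and that pruning is what lets the search terminate despite the left recursion:

```python
from collections import deque

def bfs_parse(grammar, start, tokens):
    """Top-down breadth-first search. Assumes no eps-productions, so
    sentential forms never shrink and over-long forms can be pruned."""
    frontier = deque([(start,)])
    seen = {(start,)}
    while frontier:
        form = frontier.popleft()
        if list(form) == tokens:
            return True
        for i, sym in enumerate(form):
            if sym in grammar:                      # leftmost non-terminal
                if list(form[:i]) != tokens[:i]:    # terminal prefix must match
                    break                           # dead end: drop this form
                for rhs in grammar[sym]:
                    new = form[:i] + tuple(rhs) + form[i + 1:]
                    if len(new) <= len(tokens) and new not in seen:
                        seen.add(new)
                        frontier.append(new)
                break                               # only expand the leftmost
    return False

G = {'S': [['S', 'a'], ['a']]}                      # left-recursive!
print(bfs_parse(G, 'S', ['a', 'a', 'a']))           # True
print(bfs_parse(G, 'S', ['a', 'b']))                # False
```

On aaa the frontier evolves exactly as in the slide's figure: S; then S a and a; then S a a and a a; then a a a, which matches.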
Recursive Descent: Grammar as Program
- In recursive descent, we think of a grammar as a program. Here, we'll look at the predictive version.
- Each non-terminal is turned into a procedure.
- Each right-hand side is transliterated into part of the procedure body for its non-terminal.
- First, define
    next(): the current token of input
    scan(t): check that next() = t (else ERROR), and then read a new token.

Recursive Descent: Example
- Grammar ($ = end marker):
    P → S $
    S → T S'
    S' → + S | ε
    T → int | ( S )
- As a program:
    def P():  S(); scan('$')
    def S():  T(); S'()
    def S'():                           # predict
        if next() == '+': scan('+'); S()
        elif next() in [')', '$']: pass
        else: ERROR
    def T():
        if next() == 'int': scan('int')
        elif next() == '(': scan('('); S(); scan(')')
        else: ERROR
- But where do the prediction tests come from?

Predicting Right-hand Sides
- The if-tests are the conditions by which the parser predicts which right-hand side to use.
- In our example, we used only the next symbol (LL(1)); but we could use more.
- Can be specified as a 2D table: one dimension for the current non-terminal to expand, one dimension for the next token. A table entry contains one production.

But First: Left Factoring
- With the grammar
    E → T + E | T
    T → int | int * T | ( E )
  it is impossible to predict, because:
    for T, two productions start with int
    for E, it is not clear how to predict between its two alternatives
- A grammar must be left-factored before use for predictive parsing.

Left-Factoring Example
- Starting with the grammar
    E → T + E | T
    T → int | int * T | ( E )
  factor out common prefixes of productions:
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε

Recursive Descent for Real
- So far, we have presented a purist view. In fact, use of recursive descent makes life simpler in many ways if we cheat a bit.
- Here's how you really handle left recursion in recursive descent, for S → S A | R:
    def S():
        R()
        while next() in FIRST(A):
            A()
- It's a program: all kinds of shortcuts are possible.
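The slides' example parser, made runnable (a sketch: S' is renamed S1, since a quote isn't legal in a Python identifier, and ERROR becomes a raised SyntaxError):

```python
def make_parser(tokens):
    """Predictive recursive descent for
    P -> S $ ; S -> T S1 ; S1 -> + S | eps ; T -> int | ( S )."""
    toks = list(tokens) + ['$']        # $ is the end marker
    pos = 0

    def next_():
        return toks[pos]

    def scan(t):
        nonlocal pos
        if next_() != t:
            raise SyntaxError(f"expected {t!r}, got {next_()!r}")
        pos += 1

    def P():
        S(); scan('$')

    def S():
        T(); S1()

    def S1():                          # predict on the next token
        if next_() == '+':
            scan('+'); S()
        elif next_() in (')', '$'):
            pass                       # S1 -> eps
        else:
            raise SyntaxError(f"unexpected {next_()!r}")

    def T():
        if next_() == 'int':
            scan('int')
        elif next_() == '(':
            scan('('); S(); scan(')')
        else:
            raise SyntaxError(f"unexpected {next_()!r}")

    return P

make_parser(['int', '+', '(', 'int', '+', 'int', ')'])()   # parses silently
```

No call ever backtracks: each procedure commits to one alternative after looking at a single token.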
LL(1) Parsing Table Example
- Left-factored grammar:
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
- The LL(1) parsing table ($ is a special end marker):

         |  int    |  *    |  +    |  (      |  )  |  $
    E    |  T X    |       |       |  T X    |     |
    X    |         |       |  + E  |         |  ε  |  ε
    T    |  int Y  |       |       |  ( E )  |     |
    Y    |         |  * T  |  ε    |         |  ε  |  ε

LL(1) Parsing Table Example (Cont.)
- Consider the [E, int] entry. It means: when the current non-terminal is E and the next input is int, use production E → T X. E → T X can generate a string with int in front.
- Consider the [Y, +] entry: when the current non-terminal is Y and the current token is +, get rid of Y (use Y → ε). We'll see later why this is right.

LL(1) Parsing Tables. Errors
- Blank entries indicate error situations.
- Consider the [E, *] entry: there is no way to derive a string starting with * from non-terminal E.

Using Parsing Tables
- Essentially table-driven recursive descent, except:
    we use a stack to keep track of pending non-terminals
    we reject when we encounter an error state
    we accept when we encounter end-of-input

LL(1) Parsing Algorithm
    initialize stack = <S, $>
    repeat
      case stack of
        <X, rest> : if table[X, next()] == Y1 ... Yn
                      then stack ← <Y1 ... Yn rest>
                      else error();
        <t, rest> : scan(t); stack ← <rest>;
    until stack == < >

LL(1) Parsing Example

    Stack          Input          Action
    E $            int * int $    T X
    T X $          int * int $    int Y
    int Y X $      int * int $    terminal
    Y X $          * int $        * T
    * T X $        * int $        terminal
    T X $          int $          int Y
    int Y X $      int $          terminal
    Y X $          $              ε
    X $            $              ε
    $              $              ACCEPT
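The stack algorithm and the table above, rendered as runnable code (my own sketch; the TABLE entries are exactly the non-blank cells of the table on this slide, with [] standing for an ε right-hand side):

```python
NONTERMS = {'E', 'X', 'T', 'Y'}
TABLE = {  # (non-terminal, next token) -> right-hand side ([] means eps)
    ('E', 'int'): ['T', 'X'],   ('E', '('): ['T', 'X'],
    ('X', '+'):   ['+', 'E'],   ('X', ')'): [],        ('X', '$'): [],
    ('T', 'int'): ['int', 'Y'], ('T', '('): ['(', 'E', ')'],
    ('Y', '*'):   ['*', 'T'],   ('Y', '+'): [],
    ('Y', ')'):   [],           ('Y', '$'): [],
}

def ll1_parse(tokens):
    """Stack-based LL(1) driver: expand non-terminals via the table, match
    terminals against the input, accept when stack and input both run out."""
    toks = list(tokens) + ['$']
    stack = ['$', 'E']                 # top of stack = end of the list
    pos = 0
    while stack:
        top = stack.pop()
        if top in NONTERMS:
            rhs = TABLE.get((top, toks[pos]))
            if rhs is None:            # blank table entry: error
                return False
            stack.extend(reversed(rhs))
        else:                          # terminal: must match the input
            if toks[pos] != top:
                return False
            pos += 1
    return pos == len(toks)

print(ll1_parse(['int', '*', 'int']))   # True  (the trace on this slide)
print(ll1_parse(['int', '+']))          # False
```

Running it on int * int pops and pushes in exactly the order of the trace table above.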
Constructing Parsing Tables
- LL(1) languages are those definable by a parsing table for the LL(1) algorithm such that no table entry is multiply defined.
- Once we have the table:
    can create a table-driven or recursive-descent parser
    the parsing algorithms are simple and fast
    no backtracking is necessary
- We want to generate parsing tables from a CFG.

Predicting Productions I
- Top-down parsing expands a parse tree from the start symbol to the leaves.
- Always expand the leftmost non-terminal.

Predicting Productions II
- The leaves at any point form a string β A γ:
    β contains only terminals
    the input string is β b δ
    the prefix β matches
    the next token is b

Predicting Productions III
- The leaves at any point form a string β A γ:
    β contains only terminals
    γ contains any symbols
    the input string is β b δ
- So A γ must derive b δ.

Predicting Productions IV
- So choose a production for A that can eventually derive something that starts with b.

Predicting Productions V
- Go back to the previous grammar, with
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
- Here Y X must match the remaining input + int.
- Since + int doesn't start with *, we can't use Y → * T.
Predicting Productions V (cont.)
- Same grammar:
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
- But + can follow Y, so we choose Y → ε.

FIRST and FOLLOW
- To summarize, we are trying to predict how to expand A, with b the next token. Then either:
    b belongs to an expansion of A: any A → α can be used if b can start a string derived from α. In this case we say that b ∈ First(α).
    or b does not belong to an expansion of A: A has an expansion that derives ε, and b belongs to something that can follow A (so S →* β A b ω). We say that b ∈ Follow(A) in this case.

Summary of Definitions
- For b ∈ Σ (the set of terminals), α a sequence of terminal & non-terminal symbols, S the start symbol, A ∈ N (the set of non-terminals):
    b ∈ FIRST(α)   iff α →* b β for some β
    ε ∈ FIRST(α)   iff α →* ε
    b ∈ FOLLOW(A)  iff S →* β A b ω for some β, ω

Computing First Sets
- Definition: First(X) = { b | X →* b α } ∪ { ε | X →* ε }, for X any grammar symbol.
  1. First(b) = { b } for b any terminal symbol.
  2. For all productions X → A1 ... An:
       Add First(A1) − {ε} to First(X). Stop if ε ∉ First(A1).
       Add First(A2) − {ε} to First(X). Stop if ε ∉ First(A2).
       ...
       Add First(An) − {ε} to First(X). Stop if ε ∉ First(An).
       Add ε to First(X).
  3. Repeat 2 until nothing changes (compute a fixed point).

Computing First Sets, Contd.
- That takes care of the single-symbol case. In general:
    FIRST(X1 X2 ... Xk) = FIRST(X1)
                          ∪ FIRST(X2) if ε ∈ FIRST(X1)
                          ∪ ...
                          ∪ FIRST(Xk) if ε ∈ FIRST(X1 X2 ... Xk−1)
  with ε removed from the result unless ε ∈ FIRST(Xi) for all i.

First Sets. Example
- For the grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
- First sets:
    First( ( ) = { ( }        First( T ) = { int, ( }
    First( ) ) = { ) }        First( E ) = { int, ( }
    First( int ) = { int }    First( X ) = { +, ε }
    First( + ) = { + }        First( Y ) = { *, ε }
    First( * ) = { * }
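The fixed-point First computation above, as a sketch in code (the dict grammar representation is mine; the string 'eps' plays the role of ε):

```python
EPS = 'eps'

def first_sets(grammar, terminals):
    """Fixed-point First computation: for each X -> A1 ... An, add
    First(Ai) - {eps} while every earlier Aj is nullable; add eps if all are."""
    first = {t: {t} for t in terminals}        # rule 1: First(b) = {b}
    first.update({n: set() for n in grammar})
    changed = True
    while changed:                             # rule 3: repeat to a fixed point
        changed = False
        for X, rhss in grammar.items():
            for rhs in rhss:                   # rule 2
                before = set(first[X])
                nullable = True
                for A in rhs:
                    first[X] |= first[A] - {EPS}
                    if EPS not in first[A]:
                        nullable = False
                        break
                if nullable:                   # whole rhs (possibly empty) derives eps
                    first[X].add(EPS)
                if first[X] != before:
                    changed = True
    return first

G = {'E': [['T', 'X']],
     'X': [['+', 'E'], []],
     'T': [['(', 'E', ')'], ['int', 'Y']],
     'Y': [['*', 'T'], []]}
fs = first_sets(G, {'int', '+', '*', '(', ')'})
print(sorted(fs['E']))   # ['(', 'int']
```

On the example grammar this reproduces the First sets listed on the slide.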
Computing Follow Sets
- Definition: Follow(X) = { b | S →* β X b ω }.
  1. Compute the First sets for all non-terminals first.
  2. Add $ to Follow(S) (S is the start non-terminal).
  3. For all productions Y → ... X A1 ... An:
       Add First(A1) − {ε} to Follow(X). Stop if ε ∉ First(A1).
       Add First(A2) − {ε} to Follow(X). Stop if ε ∉ First(A2).
       ...
       Add First(An) − {ε} to Follow(X). Stop if ε ∉ First(An).
       Add Follow(Y) to Follow(X).
  4. Repeat 3 until nothing changes.

Follow Sets. Example
- For the grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
- Follow sets:
    Follow( E ) = { ), $ }       Follow( X ) = { $, ) }
    Follow( T ) = { +, ), $ }    Follow( Y ) = { +, ), $ }

Constructing LL(1) Parsing Tables
- Construct a parsing table T for CFG G.
- For each production A → α in G do:
    For each terminal b ∈ First(α) do: T[A, b] = α.
    If ε ∈ First(α), for each b ∈ Follow(A) do: T[A, b] = α.

LL(1) Parsing Table Example
- Grammar:
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
- First and Follow sets:
    First( E ) = { int, ( }    Follow( E ) = { ), $ }
    First( T ) = { int, ( }    Follow( X ) = { $, ) }
    First( X ) = { +, ε }      Follow( T ) = { +, ), $ }
    First( Y ) = { *, ε }      Follow( Y ) = { +, ), $ }
- Resulting table:

         |  int    |  *    |  +    |  (      |  )  |  $
    E    |  T X    |       |       |  T X    |     |
    X    |         |       |  + E  |         |  ε  |  ε
    T    |  int Y  |       |       |  ( E )  |     |
    Y    |         |  * T  |  ε    |         |  ε  |  ε

Notes on LL(1) Parsing Tables
- If any entry is multiply defined, then G is not LL(1). This happens:
    if G is ambiguous
    if G is left-recursive
    if G is not left-factored
    and in other cases as well
- Most programming language grammars are not LL(1).
- There are tools that build LL(1) tables.

Review
- For some grammars there is a simple parsing strategy: predictive parsing, LL(1).
- Once you build the LL(1) table (or know the FIRST and FOLLOW sets), you can write the parser by hand: recursive descent.
- Next: a more powerful parsing strategy for grammars that are not LL(1).
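The Follow-set fixed point and the table-construction rule can be sketched the same way (illustrative code, not from the slides; to stay self-contained, the First sets are written down from the earlier slide rather than recomputed):

```python
EPS = 'eps'
GRAMMAR = {'E': [['T', 'X']],
           'X': [['+', 'E'], []],
           'T': [['(', 'E', ')'], ['int', 'Y']],
           'Y': [['*', 'T'], []]}
# First sets as computed on the earlier slide (terminals have First(t) = {t}).
FIRST = {'E': {'int', '('}, 'X': {'+', EPS}, 'T': {'int', '('}, 'Y': {'*', EPS},
         '+': {'+'}, '*': {'*'}, '(': {'('}, ')': {')'}, 'int': {'int'}}

def first_of_seq(seq):
    """First of a sequence of symbols; {eps} for the empty sequence."""
    out = set()
    for A in seq:
        out |= FIRST[A] - {EPS}
        if EPS not in FIRST[A]:
            return out
    out.add(EPS)
    return out

def follow_sets(start='E'):
    follow = {n: set() for n in GRAMMAR}
    follow[start].add('$')                     # step 2
    changed = True
    while changed:                             # step 4: fixed point
        changed = False
        for Y, rhss in GRAMMAR.items():        # step 3
            for rhs in rhss:
                for i, X in enumerate(rhs):
                    if X not in GRAMMAR:
                        continue               # only non-terminals get Follow
                    trailer = first_of_seq(rhs[i + 1:])
                    add = trailer - {EPS}
                    if EPS in trailer:         # trailer nullable: add Follow(Y)
                        add |= follow[Y]
                    if not add <= follow[X]:
                        follow[X] |= add
                        changed = True
    return follow

def ll1_table(follow):
    table = {}
    for A, rhss in GRAMMAR.items():
        for rhs in rhss:
            f = first_of_seq(rhs)
            targets = (f - {EPS}) | (follow[A] if EPS in f else set())
            for b in targets:
                assert (A, b) not in table, "grammar is not LL(1)"
                table[(A, b)] = rhs
    return table

fol = follow_sets()
tbl = ll1_table(fol)
print(sorted(fol['T']))   # ['$', ')', '+']
```

The computed Follow sets and table entries match the slide: for instance Follow(T) = {+, ), $}, and the ε-alternatives of X and Y land exactly in the columns of their Follow sets.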