C12/1 Introduction to Compilers and Translators pring 00 Outline More tips forll1) grammars Bottom-up parsing LR0) parser construction Lecture 5: Bottom-up parsing Lecture 5 C 12/1 pring '00 Andrew Myers 2 Administrivia Programming Assignment 1 due next class Fray) should be well under way -- leave time for testing, documentation do not need to construct DFA! All group assignments should have settled out Homework 2 due next Fray Reading: finish Chapter of Appel Lecture 5 C 12/1 pring '00 Andrew Myers Review Can make recursive descent parsers for LL1) grammars Natural language grammar LL1) grammar predictive parse table recursive-descent parser recursive-descent parser w/ AT generation How to perform this step? Lecture 5 C 12/1 pring '00 Andrew Myers Grammars Have been using grammar for language of sums with parentheses 1++))+5 imple grammar w/ left associativity: + E E E number ) LL1) grammar for same language: E ε + E number ) Lecture 5 C 12/1 pring '00 Andrew Myers 5 E ε + Left vs. Right Recursion + E E Right recursion : right-associative E + E 1 + Lecture 5 C 12/1 pring '00 Andrew Myers 6 + + 5 2 + Left recursion : left-associative + + 5 + + 1 2 1
Left-recursive vs Right-recursive Left-recursive grammars don t work with topdown parsing: arbitrary amount of lookahead needed derived string lookahead read/unread 1 1 + 2 + + + E 1 1 + 2 + + + E + E 1 1 + 2 + + + E + E + E 1 1 + 2 + + E + E + E + E 1 1 + 2 + + 1 + E + E + E 2 1 + 2 + + 1 + 2 + E + E 1 + 2 + + 1 + 2 + + E 1 + 2 + + 1 + 2 + + $ 1 + 2 + + + E E How to create an LL1) grammar Write a right-recursive grammar E + E Left-factor common prefixes, place suffix in new non-terminal E ε + Lecture 5 C 12/1 pring '00 Andrew Myers 7 Lecture 5 C 12/1 pring '00 Andrew Myers 8 EBNF Extended Backus-Naur Form: allows some regular expression syntax on RH *, +, ),? operators Iota spec:? = [ ]) BNF: operator at top level E ε + E + E ) * 1 + 2 + +)) + 5 EBNF version: no position on + associativity Lecture 5 C 12/1 pring '00 Andrew Myers 9 + + 5 1 2 + Top-down parsing EBNF Recursive-descent code can directly implement the EBNF grammar: E + E ) * vo parse_ ) { // parses sequence of E + E + E... parse_e ); while true) { switch token) { case + : token = input.read); parse_e); break; case ) : case EOF: return; default: throw new ParseError); Lecture 5 C 12/1 pring '00 Andrew Myers 10 Building a left-associative AT Expr parse_) { Expr result = parse_e); while true) { switch token) { case + : token = input.read); result = new Addresult, parse_e)); break; case ) : case EOF: return result; default: throw new ParseError); ummary Now have complete recipe for building a parser Language grammar LL1) grammar predictive parse table recursive-descent parser recursive-descent parser w/ AT generation Lecture 5 C 12/1 pring '00 Andrew Myers 11 Lecture 5 C 12/1 pring '00 Andrew Myers 12 2
Bottom-up parsing A more powerful parsing technology LR grammars -- more expressive than LL can handle left-recursive grammars, virtually all programming languages More natural expression of programming language syntax hift-reduce parsers automatic parser generators e.g. yacc,cup) detect errors as soon as possible allows better error recovery Lecture 5 C 12/1 pring '00 Andrew Myers 1 Top-down parsing 1+2++))+5 +E E+E )+E +E)+E +E+E)+E E+E+E)+E 1+E+E)+E 1+2+E)+E... In left-most derivation, entire tree above a token 2) has been expanded when encountered Must be able to predict productions! + E E E number ) + E E 5 ) + E + E ) E 2 + E 1 E Lecture 5 C 12/1 pring '00 Andrew Myers 1 Bottom-up parsing Right-most derivation -- backward tart with the tokens End with the start symbol + E E E number ) 1+2++))+5 E+2++))+5 +2++))+5 +E++))+5 ++))+5 +E+))+5 ++))+5 ++E))+5 +))+5 +E)+5 )+5 E+5 +E Lecture 5 C 12/1 pring '00 Andrew Myers 15 right-most derivation Progress of bottom-up parsing 1+2++))+5 1+2++))+5 E+2++))+5 1 +2++))+5 +2++))+5 1 +2++))+5 +E++))+5 1+2 ++))+5 ++))+5 1+2+ +))+5 +E+))+5 1+2+ +))+5 ++))+5 1+2+ +))+5 ++E))+5 1+2++ ))+5 +))+5 1+2++ ))+5 +E)+5 1+2++) )+5 )+5 1+2++) )+5 E+5 1+2++)) +5 +E 1+2++))+5 1+2++))+5 Lecture 5 C 12/1 pring '00 Andrew Myers 16 1+2++))+5 E+2++))+5 +2++))+5 +E++))+5 Bottom-up parsing + E E E number ) Advantage of bottom-up parsing: can select productions based on more information + E E 5 ) + E +E ) E 2 + E 1 E Top-down vs. Bottom-up Bottom-up: Don t need to figure out as much of the parse tree for a given amount of input scanned unscanned Top-down scanned unscanned Bottom-up Lecture 5 C 12/1 pring '00 Andrew Myers 17 Lecture 5 C 12/1 pring '00 Andrew Myers 18
hift-reduce parsing Parsing is a sequence of shift and reduce operations Parser state is a stack of terminals and nonterminals grows to the right) Unconsumed input is a string of terminals Current derivation step is always stack+input Derivation step stack unconsumed input 1+2++))+5 1+2++))+5 E+2++))+5 E +2++))+5 +2++))+5 +2++))+5 +E++))+5 +E ++))+5 Lecture 5 C 12/1 pring '00 Andrew Myers 19 hift-reduce parsing Parsing is a sequence of shifts and reduces hift -- move look-ahead token to stack stack input action 1+2++))+5 shift 1 1 +2++))+5 Reduce -- Replace symbols γ in top of stack with non-terminal symbol X, corresponding to production X γ pop γ, push X) stack input action +E ++))+5 reduce +E ++))+5 Lecture 5 C 12/1 pring '00 Andrew Myers 20 hift-reduce parsing derivation stack input stream action + E E E number ) 1+2++))+5 1+2++))+5 shift 1+2++))+5 1+2++))+5 shift 1+2++))+5 1 +2++))+5 reduce E num E+2++))+5 E +2++))+5 reduce E +2++))+5 +2++))+5 shift +2++))+5 + 2++))+5 shift +2++))+5 +2 ++))+5 reduce E num +E++))+5 +E ++))+5 reduce +E ++))+5 ++))+5 shift ++))+5 + +))+5 shift ++))+5 + +))+5 shift ++))+5 + +))+5 reduce E num Lecture 5 C 12/1 pring '00 Andrew Myers 21 Problem How do we know which action to take -- whether to shift or reduce, and which production? ometimes can reduce but shouldn t e.g., X εcan always be reduced ometimes can reduce in different ways Lecture 5 C 12/1 pring '00 Andrew Myers 22 Action election Problem Given stack σ and look-ahead symbol b, should we shift b onto the stack making it σb) reduce some production X γassuming that stack has the form αγ making it αx) If stack has form αγ, should apply reduction X γdepending on what stack prefix α is -- but α is different for different possible reductions, since γ s have different length. How to keep track? Lecture 5 C 12/1 pring '00 Andrew Myers 2 Parser tates Goal: know what reductions are legal at any given point Idea: summarize all possible stack prefixes α as a parser state Parser state is defined by a DFA that reads in the stack α Accept states of DFA: unique reduction! ummarizing discards information affects what grammars parser handles affects size of DFA number of states) Lecture 5 C 12/1 pring '00 Andrew Myers 2
LR0) parser Left-to-right scanning, Right-most derivation, zero look-ahead characters Too weak to handle most language grammars including this one) But will help us understand how to build better parsers An LR0) grammar: non-empty lists L ) L L L, x x,y) x, y,z), w) x)))) x, y, z, w))) Lecture 5 C 12/1 pring '00 Andrew Myers 25 Lecture 5 C 12/1 pring '00 Andrew Myers 26 LR0) states tart tate & Closure L ) L L, A state is a set of items An LR0) item is a production from the language with a separator. somewhere in the RH of the production E number. state E. ) item tuff before. already on stack beginnings of possible γ s to be reduced) tuff after. : what we might see next The prefixes α represented by state itself Lecture 5 C 12/1 pring '00 Andrew Myers 27 DFA start state closure. $. $ First step: augment grammar with prod n $ tart state of DFA: empty stack =. $ Closure of a state adds items for all productions whose LH occurs in an item in the state, just after. set of possible productions to be reduced next Added items have the. located at the beginning: no symbols for these items on the stack yet Lecture 5 C 12/1 pring '00 Andrew Myers 28. $ Applying symbols.. L ) L. L. L, L ) L L, In new state, include all items that have appropriate input symbol just after dot, and advance dot in those items and take closure.) Lecture 5 C 12/1 pring '00 Andrew Myers 29 Applying reduce actions. $.. L ) L. L. L, Pop RH off stack, replace with LH X X γ), rerun DFA e.g. x)) Lecture 5 C 12/1 pring '00 Andrew Myers 0 L states causing reductions L. ) L L., L. 5
1. $ Full DFA Appel p. 6). $ $ final state 2.. L ) L. L. L, L. 7 L L L,. L. ) L L., 6 L ). 8 9 L L,. Lecture 5 C 12/1 pring '00 Andrew Myers 1 ) 5 L ) L L, L ) Optimization: stack is labeled w/state L L, Let s try parsing x),y) derivation stack input action x),y) 1 x),y) shift, goto x),y) 1 x),y) shift, goto x),y) 1 x),y) shift, goto 2 x),y) 1 x 2 ),y) reduce ),y) 1 7 ),y) reduce L L),y) 1 L 5 ),y) shift, goto 6 L),y) 1 L 5 ) 6,y) reduce L),y) 1 7,y) reduce L L,y) 1 L 5,y) shift, goto 8 L,y) 1 L 5, 8 y) shift, goto 9 L,y) 1 L 5, 8 y 2 ) reduce L,) 1 L 5, 8 9 ) reduce L L, L) 1 L 5 ) shift, goto 6 L) 1 L 5 ) 6 reduce L) 1 $ done Lecture 5 C 12/1 pring '00 Andrew Myers 2 Bottom-up parsing Grammars can be parsed bottom-up using a DFA + stack tate construction converts grammar into states that capture information needed to know what action to take Next time: shift-reduce parsing tables LR, LR1) parsers, automatic parser generators Lecture 5 C 12/1 pring '00 Andrew Myers 6