COMP3131/9102: Programming Languages and Compilers Jingling Xue School of Computer Science and Engineering The University of New South Wales Sydney, NSW 2052, Australia http://www.cse.unsw.edu.au/~cs3131 http://www.cse.unsw.edu.au/~cs9102 Copyright @2018, Jingling Xue COMP3131/9102 Page 257 March 26, 2018
Feedback for Assignment 2 Not required to handle errors. The output is specified precisely in the spec ( successful or unsuccessful ) The empty program (i.e.,ǫ) is legal (as in C, C++ an Java) The following declaration is (syntactically) legal: void v; To build a recursive-descent parser, we may need to transform the grammar to eliminate common prefixes and left recursion. But L(The VC grammar) = L(your transformed grammar) COMP3131/9102 Page 258 March 26, 2018
Assignment 3 Modify your recogniser to build a parser (that constructs the AST for the program being compiled) Important to complete your Assignment 2 on time The spec will be available on the coming Monday COMP3131/9102 Page 259 March 26, 2018
Lecture 5: Top-Down Parsing: Table-Driven 1. Compare and contrast top-down and bottom-up parsing 2. LL(1) table-driven parsing 3. Parser generators 4. Recursive-descent parsing revisited 5. Error recovery Grammar G Eliminating left recursion & common prefixes The Transformed Grammar G Constructing First, Follow and Select Sets forg A Recursive-Descent Parser The LL(1) Parsing Table The LL(1) Table-Driving Parser The red parts done last week The the blue parts today COMP3131/9102 Page 260 March 26, 2018
The micro-english Grammar Revisited 1 sentence subject predicate 2 subject NOUN 3 ARTICLE NOUN 4 predicate VERB object 5 object NOUN 6 ARTICLE NOUN The English Sentence PETER PASSED THE TEST COMP3131/9102 Page 261 March 26, 2018
The micro-english Grammar Revisited (Cont d) The Leftmost Derivation: sentence = lm subject predicate by P1 = lm NOUN predicate by P2 = lm NOUN VERB object by P4 = lm NOUN VERB ARTICLE NOUN by P6 The Rightmost Derivation: sentence = rm subject predicate by P1 = rm subject VERB object by P4 = rm subject VERB ARTICLE NOUN by P6 = rm NOUN VERB ARTICLE NOUN by P2 COMP3131/9102 Page 262 March 26, 2018
The Role of the Parser PETER PASSED THE TEST Scanner NOUN1 VERB ARTICLE NOUN2 Parser subject NOUN PETER sentence VERB PASSED predicate ARTICLE THE object NOUN TEST COMP3131/9102 Page 263 March 26, 2018
Two General Parsing Methods 1. Top-down parsing Build the parse tree top-down: Productions used represent the leftmost derivation. The best known and widely used methods: Recursive descent Table-driven LL(k) (Left-to-right scan of input, Leftmost derivation, k tokens of lookahead). Almost all programming languages can be specified by LL(1) grammars, but such grammars may not reflect the structure of a language In practice, LL(k) for small k is used Implemented more easily by hand. Used in parser generators such as JavaCC 2. Bottom-up parsing Build the parse tree bottom-up: Productions used represent the rightmost derivation in reverse. The best known and widely used method: LR(1) (Left-to- right scan of input, Rightmost derivation in reverse, 1 token of lookahead) More powerful every LL(1) is LR(1) but the converse is false Used by parser generators (e.g., Yacc and JavaCUP). COMP3131/9102 Page 264 March 26, 2018
Lookahead Token(s) Lookahead Token(s): The currently scanned token(s) in the input. In Recogniser.java, currenttoken represents the lookahead token For most programming languages, one token lookahead only. Initially, the lookahead token is the leftmost token in the input. COMP3131/9102 Page 265 March 26, 2018
Top-Down Parse Of NOUN VERB ARTICLE NOUN TREE INPUT NOUN sentence TREE INPUT subject NOUN sentence predicate Notations: on the tree indicates the nonterminal being expanded or recognised on the sentence points to the lookahead token All tokens to theleft of have been read All tokens to theright of have NOT been processed COMP3131/9102 Page 266 March 26, 2018
TREE sentence subject predicate INPUT NOUN NOUN TREE sentence INPUT subject NOUN NOUN VERB predicate COMP3131/9102 Page 267 March 26, 2018
TREE sentence subject predicate INPUT NOUN NOUN VERB VERB object TREE sentence subject predicate INPUT NOUN VERB NOUN VERB ARTICLE object COMP3131/9102 Page 268 March 26, 2018
TREE sentence subject NOUN VERB predicate object INPUT ARTICLE NOUN VERB ARTICLE NOUN TREE sentence subject NOUN VERB predicate object INPUT ARTICLE NOUN VERB ARTICLE NOUN NOUN COMP3131/9102 Page 269 March 26, 2018
Top-Down Parsing Build the parse tree starting with the start symbol (i.e., the root) towards the sentence being analysed (i.e., leaves). Use one token of lookahead, in general Discover the leftmost derivation I.e, the productions used in expanding the parse tree represent a leftmost derivation COMP3131/9102 Page 270 March 26, 2018
Predictive (Non-Backtracking) Top-Down Parsing To expand a nonterminal, the parser always predict (choose) the right alternative for the nonterminal by looking at the lookahead symbol only. Flow-of-control constructs, with their distinguishing keywords, are detectable this way, e.g., in the VC grammar: stmt compound-stmt if ( expr ) (ELSE stmt )? break ; continue ; Prediction happens before the actual match begins. COMP3131/9102 Page 271 March 26, 2018
Bottom-Up Parse Of NOUN VERB ARTICLE NOUN TREE INPUT NOUN TREE INPUT subject NOUN NOUN TREE INPUT subject NOUN NOUN VERB COMP3131/9102 Page 272 March 26, 2018
TREE INPUT subject NOUN NOUN VERB ARTICLE TREE INPUT subject NOUN NOUN VERB ARTICLE NOUN TREE INPUT subject NOUN ARTICLE object NOUN VERB ARTICLE NOUN NOUN Note: What if the parser had chosen subject ARTICLENOUN instead of object ARTICLENOUN? In this case, the parser would not make any further process. Having read a subject andverb, the parser has reached a state in which it should not parse another subject. COMP3131/9102 Page 273 March 26, 2018
TREE subject predicate NOUN VERB object INPUT ARTICLE NOUN VERB ARTICLE NOUN NOUN TREE sentence subject predicate NOUN VERB object INPUT ARTICLE NOUN VERB ARTICLE NOUN NOUN COMP3131/9102 Page 274 March 26, 2018
Bottom-Up Parsing Build the parse tree starting with the the sentence being analysed (i.e., leaves) towards the start symbol (i.e., the root). Use one token of lookahead, in general. The basic (smallest) language constructs recognised first, then they are used to discover more complex constructs. Discover the rightmost derivation in reverse the productions used in expanding the parse tree represent a rightmost derivation in reverse order COMP3131/9102 Page 275 March 26, 2018
Lecture 5: Top-Down Parsing: Table-Driven 1. Compare and contrast top-down and bottom-up parsing 2. LL(1) table-driven parsing 3. Parser generators 4. Recursive-descent parsing revisited COMP3131/9102 Page 276 March 26, 2018
Predictive Non-Recursive Top-Down Parsers Recursion = Iteration + Stack Recursive calls in a recursive-descent parser can be implemented using an explicit stack, and a parsing table Understanding one helps your understanding the other COMP3131/9102 Page 277 March 26, 2018
The Structure of a Table-Driven LL(1) Parser Input parsed from left to right Leftmost derivation 1 token of lookahead Stack source code Scanner token get next token Parser tree grammar LL(1) Table Generator Parsing Table LR(1) parsers (almost always table-driven) also built this way COMP3131/9102 Page 278 March 26, 2018
The Model of an LL(1) Table-Driven Parser INPUT a + b $ STACK X Y Z $ LL(1) Parsing Program LL(1) Parsing Table Output S T T XYZ Output: The productions used (representing the leftmost derivation), or An AST (Lecture 6) COMP3131/9102 Page 279 March 26, 2018
Push $ onto the stack The LL(1) Parsing Program Push the start symbol onto the stack WHILE (stack not empty)do BEGIN Let X be the top stack symbol and a be the lookahead symbol in the input IF X is a terminalthen IF X = a then pop X and get the next token /* match */ ELSE error ELSE / X is a nonterminal / IF T able[x, a] nonblank THEN PopX PushTable[X,a] onto stack in the reverse order ELSE error END The parsing is successful when the stack is empty and no errors reported COMP3131/9102 Page 280 March 26, 2018
Building a Table-Driving Parser from a Grammar Grammar G Eliminating left recursion & common prefixes (Slide 255) The Transformed Grammar G Constructing First, Follow and Select Sets forg Constructing the LL(1) Parsing Table The LL(1) Table-Driving Parser Follow Slide 255 to eliminate left recursion to get a BNF grammar Can build a table-driven parser for EBNF as well (but not considered) COMP3131/9102 Page 281 March 26, 2018
The Expression Grammar The grammar with left recursion: Grammar 1: E E +T E T T T T F T/F F F INT (E) The transformed grammar without left recursion: Grammar 2: E TQ Q +TQ TQ ǫ T FR R FR /FR ǫ F INT (E) COMP3131/9102 Page 282 March 26, 2018
First sets: Follow sets: First and Follow Sets for Grammar 2 First(TQ) = First(FR) = {(,i} First(Q) = {(+,, ǫ} First(R) = {(,/,ǫ} First(+T Q) = {+} First( T Q) = { } First( F R) = { } First(/FR) = {/} First((E)) = {(} First(i) = {i} Follow(E) = {$,)} Follow(Q) = {$,)} Follow(T) = {+,,$,)} Follow(R) = {+,, $,)} Follow(F) = {+,,,/,$,)} COMP3131/9102 Page 283 March 26, 2018
Select Sets for Grammar 2 Select(E T Q) = First(T Q) = {(, INT} Select(Q + T Q) = First(+T Q) = {+} Select(Q T Q) = First( T Q) = { } Select(Q ǫ) = (First(ǫ) {ǫ}) Follow(Q) = {), $} Select(T F R) = First(F R) = {(, INT} Select(R F R) = First(+F R) = { } Select(R /FR) = First(/FR) = {/} Select(R ǫ) = (First(ǫ) {ǫ}) Follow(T) = {+,,), $} Select(F INT) = First(INT) = {INT} Select(F (E)) = First((E)) = {(} COMP3131/9102 Page 284 March 26, 2018
The Rules for Constructing an LL(1) Parsing Table For every production of the A α in the grammar, do: for allain Select(A α), set Table[A,a] = α COMP3131/9102 Page 285 March 26, 2018
LL(1) Parsing Table for Grammar 2 INT + / ( ) $ E TQ TQ Q +TQ TQ ǫ ǫ T FR FR R ǫ ǫ FR /FR ǫ ǫ F INT The blanks are errors. (E) COMP3131/9102 Page 286 March 26, 2018
An LL(1) Parse on Input i+i: INT i STACK INPUT PRODUCTION DERIVATION $E i+i$ E TQ E= lm TQ $QT i+i$ T FR = lm FRQ $QRF i+i$ F i = lm irq $QRi i+i$ pop and go to next token $QR +i$ R ǫ = lm iq $Q +i$ Q +TQ = lm i+tq $QT+ +i$ pop and go to next token $QT i$ T FR = lm i+frq $QRF i$ F i = lm i+irq $QRi i$ pop and go to next token $QR $ R ǫ = lm i+irq $Q $ Q ǫ = lm i+iq $ $ F i PARSE TREE E T Q R + T ǫ F R i ǫ Q ǫ COMP3131/9102 Page 287 March 26, 2018
An LL(1) Parse on an Erroneous Input () STACK INPUT PRODUCTION DERIVATION $E ()$ E TQ E= lm TQ $QT ()$ T FR E= lm FRQ $QRF ()$ F (E) E= lm (E)RQ $QR)E( ()$ pop and go to next token $QR)E )$ Error: no table entry for [E,)] A better error message: expression missing inside ( ) COMP3131/9102 Page 288 March 26, 2018
LL(1) Grammars and Table-Driven LL(1) Parsers Like recursive descent, table-driven LL(1) parsers can only parse LL(1) grammars. Conversely, only LL(1) grammars can be parsed by the table-driven LL(1) parsers. Definition of LL(1) grammar given in Slide 237 Definition of LL(1) grammar using the parsing table: A grammar is LL(1) if every table entry contains at most one production. COMP3131/9102 Page 289 March 26, 2018
Why Table-Driven LL(1) Parsers Cannot Handle Left Recursions? A grammar with left recursion: Select Sets: The parsing table: expr expr + id id Select( expr +id) = {id} Select(id) = {id} id $ expr expr +id id T able[ expr, id] contains two entries! Any grammar with left recursions is not LL(1) COMP3131/9102 Page 290 March 26, 2018
Why Table-Driven LL(1) Parsers Cannot Handle Left Recursions (Cont d)? Eliminating the left recursion yields an LL(1) grammar: Select Sets: expr id expr-tail expr-tail ǫ + id expr-tail Select expr id expr-tail ) = {id} Select( expr-tail ǫ) = = {$} Select( expr-tail +id expr-tail ) = {+} The parsing table for the transformed grammar: expr expr-tail id + $ id expr-tail +id expr-tail ǫ COMP3131/9102 Page 291 March 26, 2018
Why LL(1) Table-Driven Parsers Cannot Handle Common Prefixes? A grammar with a common prefix: Select sets: S if(e) S if(e) S elses s E e Select(S if(e) S) = {if} Select(S if(e) S else S) = {if} Any grammar with common prefixes is not LL(1) Eliminating the common prefix does not yield an LL(1) grammar: S if(e) SQ s Q else S ǫ E e COMP3131/9102 Page 292 March 26, 2018
Why LL(1) Table-Driven Parsers Cannot Handle Common Prefixes (Cont d)? Select sets: The parsing table: Select(S if(e)sq) = {if} Select(S s) = {s} Select(Q elses) = {else} Select(Q ǫ) = {else, ǫ} Select(E e) = {e} if ( e ) s else $ S ife then SQ s E Q e else S ǫ This modified grammar, although having no common prefixes, is still ambiguous. You are referred to Week 6 Tutorial. To resolve the ambiguity in the grammar, we make the convention to selectelses as the table entry. This effectively implements the following rule: Match anelse to the most recent unmatchedthen COMP3131/9102 Page 293 March 26, 2018 ǫ
Recognise Palindromes Easily Grammar: Parsing Table: S (S) ǫ ( ) $ S (S) ǫ ǫ Try to parse the following three inputs: a. (()) b. (() c. ()) Cannot design a DFA/NFA to recognise the language L(S) COMP3131/9102 Page 294 March 26, 2018
Lecture 5: Top-Down Parsing: Table-Driven 1. Compare and contrast top-down and bottom-up parsing 2. LL(1) table-driven parsing 3. Parser generators 4. Recursive-descent parsing revisited COMP3131/9102 Page 295 March 26, 2018
The Expression Grammar The grammar with left recursion: Grammar 1: E E +T E T T T T F T/F F F INT (E) Eliminating left recursion using the Kleene Closure Grammar 3: E T ( + T - T) T F ( * F / F) F INT ( E ) All tokens are enclosed in double quotes to distinguish them for the regular operators: (, ) and Compare with Slide 279 COMP3131/9102 Page 296 March 26, 2018
Parser Generators (Generating Top-Down Parsers) Grammar G Parser Generator Recursive-Descent Parser LL(k) Parsing Tables Tool Grammar Accepted Parsers and Their Implementation Languages JavaCC EBNF Recursive-Descent LL(1) (with some LL(k) portions) in Java COCO/R EBNF Recursive-Descent LL(1) in Pascal, C, C++,, Java, etc. ANTLR Predicated LL(k) Recursive-Descent LL(k) in C, C++,, Java These and other tools can be found on the internet Predicated: a conditional evaluated by the parser at run time to determine which of the two conflicting productions to use Q if (lookahead is else ) else S ǫ where the condition inside the box resolves the dangling-else problem. COMP3131/9102 Page 297 March 26, 2018
Parser Generators (Generating Bottom-Up Parsers) Grammar G Parser Generator LALR(1) Parsing Tables Tool Grammar Accepted Parsers and Their Implementation Languages Yacc BNF LALR(1) table-driven in C JavaCUP BNF LALR(1) table-driven in Java These and other tools can be found on the internet Will not deal with LR parsing in this course COMP3131/9102 Page 298 March 26, 2018
The JavaCC Spec for Grammar 3 /* * Parser.jj * * The scanner and parser for Grammar 3 * * Install JavaCC from https://javacc.dev.java.net/ * * 1. javacc Parser.jj * 2. javac Parser.java * 3. java Parser */ options { LOOKAHEAD=1; } PARSER_BEGIN(Parser) public class Parser { public static void main(string args[]) throws ParseException { Parser parser = new Parser (System.in); while (true) { System.out.print("Enter Expression: "); System.out.flush(); try { switch (parser.one_line()) { COMP3131/9102 Page 299 March 26, 2018
case -1: System.exit(0); default: System.out.println("Compilation was successful."); break; } } catch (ParseException x) { System.out.println("Exiting."); throw x; } } } } PARSER_END(Parser) SKIP : { " " "\r" "\t" } TOKEN : { } < EOL: "\n" > TOKEN : /* OPERATORS */ { COMP3131/9102 Page 300 March 26, 2018
< PLUS: "+" > < MINUS: "-" > < MULTIPLY: "*" > < DIVIDE: "/" > } TOKEN : { < CONSTANT: ( <DIGIT> )+ > < #DIGIT: ["0" - "9"] > } int one_line() : {} { Expr() <EOL> { return 1; } <EOL> { return 0; } <EOF> { return -1; } } void Expr() : { } { Term() (( <PLUS> <MINUS> ) Term())* } COMP3131/9102 Page 301 March 26, 2018
void Term() : { } { Factor() (( <MULTIPLY> <DIVIDE> ) Factor())* } void Factor() : {} { <CONSTANT> "(" Term() ")" } COMP3131/9102 Page 302 March 26, 2018
Lecture 5: Top-Down Parsing: Table-Driven 1. Compare and contrast top-down and bottom-up parsing 2. LL(1) table-driven parsing 3. Parser generators 4. Recursive-descent parsing revisited COMP3131/9102 Page 303 March 26, 2018
Reading Table-driven parsing: pages 186 192 of Red Dragon or 4.4.4 of Purple Dragon JavaCC, ANTLR and COCO/R: available on the Web JavaCUP also available on the Web http://www.cs.princeton.edu/~appel/modern/java/cup/ Error recovery for LL parsers: Using acceptance sets: Pages 192 195 of Red Dragon / Pages 4.4.5 of Purple Dragon http://teaching.idallen.com/cst8152/98w/panic_mode.html Using continuation (pages 136 142 of Grune et al s 2000 compiler book. ISBN: 0-471-97697-0) Next Class: Abstract Syntax Trees and Assignment 3 COMP3131/9102 Page 304 March 26, 2018