Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

Syntactic Analysis

Syntactic analysis, or parsing, is the second phase of compilation: the token stream is converted to an abstract syntax tree.

Compiler Passes

Analysis of input program (front-end):
  character stream -> Lexical Analysis -> token stream -> Syntactic Analysis -> abstract syntax tree -> Semantic Analysis -> annotated AST

Synthesis of output program (back-end):
  Intermediate Code Generation -> intermediate form -> Optimization -> intermediate form -> Code Generation -> target language

Syntactic Analysis / Parsing

Goal: convert the token stream to an abstract syntax tree.

Abstract syntax tree (AST):
- captures the structural features of the program
- primary data structure for the remainder of analysis

Three Part Plan
- Study how context-free grammars specify syntax
- Study algorithms for parsing / building ASTs
- Study the MiniJava implementation

Context-free Grammars

A compromise between:
- REs, which can't nest or specify recursive structure
- general grammars, which are too powerful (undecidable)

Context-free grammars are a sweet spot:
- powerful enough to describe nesting and recursion
- easy to parse, but also allow restrictions for speed

Not perfect:
- cannot capture semantics, as in "variable must be declared", requiring a later semantic pass
- can be ambiguous

EBNF, Extended Backus-Naur Form, is a popular notation.

CFG Terminology

- Terminals: the alphabet of the language defined by the CFG
- Nonterminals: symbols defined in terms of terminals and nonterminals
- Productions: rules for how a nonterminal (lhs) is defined in terms of a (possibly empty) sequence of terminals and nonterminals
  - recursion is allowed!
  - multiple productions are allowed for a nonterminal (alternatives)
- Start symbol: the root of the defining language

  Program ::= Stmt
  Stmt    ::= if ( Expr ) then Stmt else Stmt
  Stmt    ::= while ( Expr ) do Stmt

EBNF Syntax of initial MiniJava

  Program       ::= MainClassDecl { ClassDecl }
  MainClassDecl ::= class ID { public static void main ( String [ ] ID ) { { Stmt } } }
  ClassDecl     ::= class ID [ extends ID ] { { ClassVarDecl } { MethodDecl } }
  ClassVarDecl  ::= Type ID ;
  MethodDecl    ::= public Type ID ( [ Formal {, Formal } ] ) { { Stmt } return Expr ; }
  Formal        ::= Type ID
  Type          ::= int | boolean | ID
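To make the EBNF concrete, here is a small program accepted by this initial MiniJava grammar (an illustrative example, not taken from the course materials; it uses only the statement and expression forms listed in the continuation below):

  class Example {
      public static void main(String[] args) {
          int x;
          x = 3 + 4 * 2;
          if (x < 12)
              System.out.println(x);
          else
              System.out.println(0);
          while (0 < x)
              x = x - 1;
      }
  }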

Initial MiniJava [continued]

  Stmt ::= Type ID ;
         | { { Stmt } }
         | if ( Expr ) Stmt else Stmt
         | while ( Expr ) Stmt
         | System.out.println ( Expr ) ;
         | ID = Expr ;
  Expr ::= Expr Op Expr
         | ! Expr
         | Expr . ID ( [ Expr {, Expr } ] )
         | ID
         | this
         | Integer
         | true
         | false
         | ( Expr )
  Op   ::= + | - | * | / | < | <= | >= | > | == | != | &&

RE Specification of initial MiniJava Lex

  Program      ::= (Token | Whitespace)*
  Token        ::= ID | Integer | ReservedWord | Operator | Delimiter
  ID           ::= Letter (Letter | Digit)*
  Letter       ::= a | ... | z | A | ... | Z
  Digit        ::= 0 | ... | 9
  Integer      ::= Digit+
  ReservedWord ::= class | public | static | extends | void | int | boolean | if | else
                 | while | return | true | false | this | new | String | main
                 | System.out.println
  Operator     ::= + | - | * | / | < | <= | >= | > | == | != | && | !
  Delimiter    ::= ; | . | , | = | ( | ) | { | } | [ | ]

Derivations and Parse Trees

Derivation: a sequence of expansion steps, beginning with a start symbol and leading to a sequence of terminals.

Parsing: the inverse of derivation. Given a sequence of terminals (a.k.a. tokens), we want to recover the nonterminals representing the structure.

Example grammar:

  E  ::= E op E | - E | ( E ) | id
  op ::= + | - | * | /

A derivation can be represented as a parse tree, that is, the concrete syntax tree, e.g. for the input

  a * ( b + - c )

(a worked derivation for this input appears at the end of this section).

Ambiguity

Some grammars are ambiguous: there are multiple distinct parse trees for the same terminal string.

Famous ambiguity: the dangling else.

  Stmt ::= ... | if ( Expr ) Stmt | if ( Expr ) Stmt else Stmt

The structure of the parse tree captures much of the meaning of the program; ambiguity implies multiple possible meanings for the same program. For example,

  if (e1) if (e2) s1 else s2

has two parse trees: one where the else attaches to the inner if, and one where it attaches to the outer if.
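As an added worked example (not in the original slides), here is one leftmost derivation in the example grammar above for the input a * ( b + - c ), treating a, b, and c as id tokens:

  E => E op E
    => id op E
    => id * E
    => id * ( E )
    => id * ( E op E )
    => id * ( id op E )
    => id * ( id + E )
    => id * ( id + - E )
    => id * ( id + - id )

Reading the expansion steps as a tree (each right-hand side hanging under the nonterminal it replaces) gives the parse tree for this string.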

Resolving Ambiguity

Option 1: add a meta-rule, for example "else associates with the closest previous if".
- works, keeps the original grammar intact
- ad hoc and informal

Resolving Ambiguity [continued]

Option 2: rewrite the grammar to resolve the ambiguity explicitly.

  Stmt          ::= MatchedStmt | UnmatchedStmt
  MatchedStmt   ::= ... | if ( Expr ) MatchedStmt else MatchedStmt
  UnmatchedStmt ::= if ( Expr ) Stmt | if ( Expr ) MatchedStmt else UnmatchedStmt

- formal, no additional rules beyond the syntax
- sometimes obscures the original grammar

Resolving Ambiguity Example

Applying the rewritten grammar to if (e1) if (e2) s1 else s2: only one parse tree remains, with the else attached to the inner if (attaching it to the outer if would require the then-branch, if (e2) s1, to be a MatchedStmt, which it is not).

Resolving Ambiguity [continued]

Option 3: redesign the language to remove the ambiguity.

  Stmt ::= ... | if Expr then Stmt end | if Expr then Stmt else Stmt end

With this syntax, if (e1) if (e2) s1 else s2 must be written with explicit end markers, so only one reading is possible.
- formal, clear, elegant
- allows a sequence of Stmts in the then and else branches, no { } needed
- extra end required for every if

Another Famous Example

  E  ::= E Op E | - E | ( E ) | id
  Op ::= + | - | * | /

The string a + b * c has two parse trees: one grouping (a + b) * c and one grouping a + (b * c).

Resolving Ambiguity (Option 1)

Add some meta-rules, e.g. precedence and associativity rules.

Example grammar:

  E  ::= E Op E | - E | E ++ | ( E ) | id
  Op ::= + | - | * | / | % | ** | == | < | && | ||

  Operator     Precedence   Assoc
  Postfix ++   Highest      Left
  Prefix -                  Right
  ** (Exp)                  Right
  *, /, %                   Left
  +, -                      Left
  ==, <                     None
  &&                        Left
  ||           Lowest       Left

(rows are listed from highest to lowest precedence)
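As an added illustration (not from the slides), these meta-rules imply groupings such as:

  a + b * c     is read as  a + (b * c)      * binds tighter than +
  a - b - c     is read as  (a - b) - c      - is left-associative
  a ** b ** c   is read as  a ** (b ** c)    ** is right-associative
  a == b == c   is rejected                  == is non-associative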

Removing Ambiguity (Option 2)

Option 2: modify the grammar to explicitly resolve the ambiguity.

Strategy:
- create a nonterminal for each precedence level
- expr is the lowest-precedence nonterminal; each nonterminal can be rewritten with a higher-precedence operator; the highest precedence level includes the atomic exprs
- at each precedence level, use:
  - left recursion for left-associative operators
  - right recursion for right-associative operators
  - no recursion for non-associative operators

Redone Example

  E  ::= E0
  E0 ::= E0 || E1 | E1            left associative
  E1 ::= E1 && E2 | E2            left associative
  E2 ::= E3 (== | <) E3 | E3      non associative
  E3 ::= E3 (+ | -) E4 | E4       left associative
  E4 ::= E4 (* | / | %) E5 | E5   left associative
  E5 ::= E6 ** E5 | E6            right associative
  E6 ::= - E6 | E7                right associative
  E7 ::= E7 ++ | E8               left associative
  E8 ::= id | ( E )

Designing A Grammar

Concerns:
- accuracy
- unambiguity
- formality
- readability, clarity
- ability to be parsed by a particular algorithm:
  - top-down parser ==> LL(k) grammar
  - bottom-up parser ==> LR(k) grammar
- ability to be implemented using a particular approach:
  - by hand
  - by automatic tools

Parsing Algorithms

Given a grammar, we want to parse the input programs:
- check legality
- produce an AST representing the structure
- be efficient

Kinds of parsing algorithms:
- top down
- bottom up

Top Down Parsing

Build the parse tree from the top (start symbol) down to the leaves (terminals).

Basic issue: when "expanding" a nonterminal with some r.h.s., how do we pick which r.h.s.? E.g.

  Stmts  ::= Call | Assign | If | While
  Call   ::= Id ( Expr {, Expr} )
  Assign ::= Id = Expr ;
  If     ::= if Test then Stmts end
           | if Test then Stmts else Stmts end
  While  ::= while Test do Stmts end

Solution: look at the input tokens to help decide.

Predictive Parser

Predictive parser: a top-down parser that can select the rhs by looking at most k input tokens ahead (the lookahead).

Efficient:
- no backtracking needed
- linear time to parse

Implementation of predictive parsers:
- recursive-descent parser
  - each nonterminal is parsed by a procedure call
  - calls other procedures to parse sub-nonterminals, recursively
  - typically written by hand (a sketch follows at the end of this section)
- table-driven parser
  - PDA: like a table-driven FSA, plus a stack to do recursive FSA calls
  - typically generated by a tool from a grammar specification
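The following is a minimal, hedged sketch (not the course's actual parser) of how a hand-written recursive-descent parser might choose the right-hand side for the Stmts grammar above by peeking at upcoming tokens; the token kinds, scanner interface, and stubbed Expr/Test methods are hypothetical.

  import java.util.List;

  // Sketch for:
  //   Stmts  ::= Call | Assign | If | While
  //   Call   ::= Id ( Expr {, Expr} )        Assign ::= Id = Expr ;
  //   If     ::= if Test then Stmts end | if Test then Stmts else Stmts end
  //   While  ::= while Test do Stmts end
  public class RecursiveDescentSketch {
      enum Kind { ID, IF, WHILE, THEN, ELSE, END, DO, LPAREN, RPAREN, COMMA, EQ, SEMI, EOF }
      record Token(Kind kind, String text) {}

      private final List<Token> tokens;     // assumed to end with an EOF token
      private int pos = 0;

      public RecursiveDescentSketch(List<Token> tokens) { this.tokens = tokens; }

      private Token peek()          { return tokens.get(pos); }
      private Token peek(int ahead) { return tokens.get(Math.min(pos + ahead, tokens.size() - 1)); }
      private void expect(Kind k) {
          if (peek().kind() != k)
              throw new RuntimeException("expected " + k + ", saw " + peek().kind());
          pos++;
      }

      // The next input token selects which right-hand side to expand.
      void parseStmt() {
          switch (peek().kind()) {
              case IF    -> parseIf();
              case WHILE -> parseWhile();
              // Call and Assign share the prefix Id, so peek one token further.
              case ID    -> { if (peek(1).kind() == Kind.LPAREN) parseCall(); else parseAssign(); }
              default    -> throw new RuntimeException("no statement starts with " + peek().kind());
          }
      }

      void parseIf() {
          expect(Kind.IF); parseTest(); expect(Kind.THEN); parseStmt();
          // The two If productions share a common prefix; decide on 'else' vs 'end' here.
          if (peek().kind() == Kind.ELSE) { expect(Kind.ELSE); parseStmt(); }
          expect(Kind.END);
      }

      void parseWhile() { expect(Kind.WHILE); parseTest(); expect(Kind.DO); parseStmt(); expect(Kind.END); }

      void parseCall() {
          expect(Kind.ID); expect(Kind.LPAREN); parseExpr();
          while (peek().kind() == Kind.COMMA) { expect(Kind.COMMA); parseExpr(); }
          expect(Kind.RPAREN);
      }

      void parseAssign() { expect(Kind.ID); expect(Kind.EQ); parseExpr(); expect(Kind.SEMI); }

      // Placeholders: a real parser would parse real expressions and build AST nodes.
      void parseTest() { parseExpr(); }
      void parseExpr() { expect(Kind.ID); }
  }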

LL(k) Grammars

A predictive parser can be constructed automatically / easily if the grammar is LL(k):
- Left-to-right scan of the input, Leftmost derivation
- k tokens of lookahead needed, typically 1

Some restrictions:
- no ambiguity (true for any parsing algorithm)
- no common prefixes of length >= k:
    If ::= if Test then Stmts end | if Test then Stmts else Stmts end
- no left recursion:
    E ::= E Op E | ...
- a few others

The restrictions guarantee that, given k input tokens, the parser can always select the correct rhs to expand a nonterminal. Easy to do by hand in a recursive-descent parser.

Eliminating Common Prefixes

Can left-factor common prefixes to eliminate them:
- create a new nonterminal for the differing suffixes
- delay the choice until after the common prefix

Before:
  If ::= if Test then Stmts end
       | if Test then Stmts else Stmts end

After:
  If     ::= if Test then Stmts IfCont
  IfCont ::= end | else Stmts end

Eliminating Left Recursion

Can rewrite the grammar to eliminate left recursion.

Before:
  E ::= E + T | T
  T ::= T * F | F
  F ::= id | ...

After:
  E    ::= T ECon
  ECon ::= + T ECon | ε
  T    ::= F TCon
  TCon ::= * F TCon | ε
  F    ::= id | ...

(The sketch at the end of this section shows how this rewritten grammar maps onto recursive-descent code.)

Bottom Up Parsing

Construct the parse tree for the input from the leaves up, reducing a string of tokens to the single start symbol (the inverse of deriving a string of tokens from the start symbol).

Shift-reduce strategy:
- read ("shift") tokens until the r.h.s. of the correct production has been seen (e.g. in the input xyzabcdef with the production A ::= bcd, the item A ::= bc.d marks that b and c have been shifted so far)
- reduce the handle to the l.h.s. nonterminal, then continue
- done when all input has been read and reduced to the start nonterminal

LR(k)

LR(k) parsing:
- Left-to-right scan of the input, Rightmost derivation
- k tokens of lookahead

Strictly more general than LL(k):
- gets to look at the whole rhs of a production before deciding what to do, not just the first k tokens of the rhs
- can handle left recursion and common prefixes fine
- still as efficient as any top-down or bottom-up parsing method

Complex to implement: need automatic tools to construct the parser from the grammar.

LR Parsing Tables

Construct parsing tables implementing a FSA with a stack:
- rows: states of the parser
- columns: token(s) of lookahead
- entries: the action of the parser
  - shift, goto state X
  - reduce production X ::= RHS
  - accept
  - error

The algorithm to construct the FSA is similar to the algorithm to build a DFA from an NFA: each state represents a set of possible places in the parsing.

The LR(k) algorithm builds huge tables.
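Tying back to the Eliminating Left Recursion example above: after the rewrite, the tail nonterminals ECon and TCon correspond naturally to loops in a recursive-descent parser. A minimal sketch, assuming a hypothetical pre-tokenized input (nothing here is the course's actual code); building a bracketed string in place of an AST to show the grouping:

  // E ::= T ECon      ECon ::= + T ECon | epsilon
  // T ::= F TCon      TCon ::= * F TCon | epsilon
  // F ::= id
  // The right-recursive "Con" nonterminals become while-loops, which also
  // makes + and * group to the left when values are built up.
  public class ExprParser {
      private final String[] toks;   // e.g. {"id", "+", "id", "*", "id"}
      private int pos = 0;

      public ExprParser(String[] toks) { this.toks = toks; }

      private String peek() { return pos < toks.length ? toks[pos] : "$"; }
      private void advance() { pos++; }

      // E ::= T ECon
      public String parseE() {
          String left = parseT();
          while (peek().equals("+")) {      // ECon ::= + T ECon | epsilon
              advance();
              left = "(" + left + " + " + parseT() + ")";
          }
          return left;
      }

      // T ::= F TCon
      private String parseT() {
          String left = parseF();
          while (peek().equals("*")) {      // TCon ::= * F TCon | epsilon
              advance();
              left = "(" + left + " * " + parseF() + ")";
          }
          return left;
      }

      // F ::= id
      private String parseF() {
          if (!peek().equals("id")) throw new RuntimeException("expected id, saw " + peek());
          advance();
          return "id";
      }

      public static void main(String[] args) {
          // Parses id + id * id and prints the grouping: (id + (id * id))
          System.out.println(new ExprParser(new String[]{"id", "+", "id", "*", "id"}).parseE());
      }
  }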

LALR - Look-Ahead LR

The LALR(k) algorithm has fewer states ==> smaller tables:
- less general than LR(k), but still good in practice
- size of tables acceptable in practice
- k == 1 in practice
- most parser generators, including yacc and CUP, are LALR(1)

Global Plan for LR(0) Parsing

Goal: set up the tables for parsing an LR(0) grammar.
- Add S' ::= S $ to the grammar, i.e. solve the problem for a new grammar with a terminator
- Compute parser states, starting with state 1 containing the added production, S' ::= . S $
- Form closures of states and shift to complete the diagram
- Convert the diagram to a transition table for a PDA
- Step through the parse using the table and a stack

LR(0) Parser Generation

Example grammar:

  S' ::= S $            // always add this production
  S  ::= beep | { L }
  L  ::= S | L ; S

Key idea: simulate where the input might be in the grammar as the parser reads tokens.

"Where the input might be in the grammar" is captured by a set of items, which forms a state in the parser's FSA.
- LR(0) item: an lhs ::= rhs production, with a dot somewhere in the rhs marking what has been read (shifted) so far
- LR(k) item: also add k tokens of lookahead to each item

Initial item: S' ::= . S $

Closure

The initial state is the closure of the initial item.

Closure: if the dot is before a nonterminal, add all productions for that nonterminal with the dot at the start ("epsilon transitions").

Initial state (1):
  S' ::= . S $
  S  ::= . beep
  S  ::= . { L }

State Transitions

Given a set of items, compute new state(s) for each symbol (terminal and nonterminal) after a dot:
- state transitions correspond to shift actions
- a new item is derived from an old item by shifting the dot over the symbol
- do a closure to compute the new state

Initial state (1):
  S' ::= . S $
  S  ::= . beep
  S  ::= . { L }

State (2), reached on the transition that shifts S:
  S' ::= S . $

State (3), reached on the transition that shifts beep:
  S ::= beep .

State (4), reached on the transition that shifts {:
  S ::= { . L }
  L ::= . S
  L ::= . L ; S
  S ::= . beep
  S ::= . { L }

Accepting Transitions

If a state has an item of the form S' ::= ... . $, then add a transition labeled $ to the accept action.

Example: S' ::= S . $ has a transition labeled $ to the accept action.
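The closure computation described above is easy to code directly. Below is a small, hedged sketch (not the course's implementation) that computes the closure of an item set for the beep/{ L } grammar, representing an item as a production plus a dot position:

  import java.util.*;

  // Sketch: LR(0) closure for the example grammar
  //   S' ::= S $     S ::= beep | { L }     L ::= S | L ; S
  // Symbols are plain strings; S', S, and L are the nonterminals.
  public class Lr0Closure {
      record Production(String lhs, List<String> rhs) {}
      record Item(Production prod, int dot) {
          // Symbol immediately after the dot, or null if the dot is at the end.
          String nextSymbol() { return dot < prod.rhs().size() ? prod.rhs().get(dot) : null; }
      }

      static final List<Production> GRAMMAR = List.of(
          new Production("S'", List.of("S", "$")),
          new Production("S",  List.of("beep")),
          new Production("S",  List.of("{", "L", "}")),
          new Production("L",  List.of("S")),
          new Production("L",  List.of("L", ";", "S")));

      static final Set<String> NONTERMINALS = Set.of("S'", "S", "L");

      // closure(I): while some item has its dot before a nonterminal X,
      // add every production X ::= rhs as a new item X ::= . rhs.
      static Set<Item> closure(Set<Item> items) {
          Deque<Item> work = new ArrayDeque<>(items);
          Set<Item> result = new LinkedHashSet<>(items);
          while (!work.isEmpty()) {
              String x = work.pop().nextSymbol();
              if (x == null || !NONTERMINALS.contains(x)) continue;
              for (Production p : GRAMMAR) {
                  if (p.lhs().equals(x)) {
                      Item item = new Item(p, 0);
                      if (result.add(item)) work.push(item);   // only revisit newly added items
                  }
              }
          }
          return result;
      }

      public static void main(String[] args) {
          // Closure of { S' ::= . S $ } should also contain S ::= . beep and S ::= . { L }
          Set<Item> state1 = closure(Set.of(new Item(GRAMMAR.get(0), 0)));
          state1.forEach(System.out::println);
      }
  }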

Reducing States

If a state has an item lhs ::= rhs . then it has a "reduce lhs ::= rhs" action.

Example: S ::= beep . has a "reduce S ::= beep" action.

No label: this state always reduces this production.
- What if other items in this state shift, or accept?
- What if other items in this state reduce differently?

Rest of the States, Part 1

- State (4): if shift beep, goto State (3)
- State (4): if shift {, goto State (4)
- State (4): if shift S, goto State (5)
- State (4): if shift L, goto State (6)
- State (5): L ::= S .
- State (6): S ::= { L . }
             L ::= L . ; S
- State (6): if shift }, goto State (7)
- State (6): if shift ;, goto State (8)
- State (7): S ::= { L } .
- State (8): L ::= L ; . S
             S ::= . beep
             S ::= . { L }

Rest of the States, Part 2

- State (8): if shift beep, goto State (3)
- State (8): if shift {, goto State (4)
- State (8): if shift S, goto State (9)
- State (9): L ::= L ; S .

(whew)

LR(0) State Diagram

For the grammar S' ::= S $, S ::= beep | { L }, L ::= S | L ; S, the diagram contains the nine states above:

  State 1: S' ::= . S $, S ::= . beep, S ::= . { L }      (S -> 2, beep -> 3, { -> 4)
  State 2: S' ::= S . $                                   ($ -> accept)
  State 3: S ::= beep .
  State 4: S ::= { . L }, L ::= . S, L ::= . L ; S,
           S ::= . beep, S ::= . { L }                    (S -> 5, L -> 6, beep -> 3, { -> 4)
  State 5: L ::= S .
  State 6: S ::= { L . }, L ::= L . ; S                   (} -> 7, ; -> 8)
  State 7: S ::= { L } .
  State 8: L ::= L ; . S, S ::= . beep, S ::= . { L }     (S -> 9, beep -> 3, { -> 4)
  State 9: L ::= L ; S .

Building Table of States & Transitions

- Create a row for each state
- Create a column for each terminal, each nonterminal, and $
- For every "state (i): if shift X, goto state (j)" transition:
  - if X is a terminal, put a "shift, goto j" action in row i, column X
  - if X is a nonterminal, put a "goto j" action in row i, column X
- For every "state (i): if $, accept" transition: put an "accept" action in row i, column $
- For every "state (i): lhs ::= rhs ." item: put a "reduce lhs ::= rhs" action in all columns of row i

Table of This Grammar

  State | {    | }    | beep | ;    | S  | L  | $
  ------+------+------+------+------+----+----+----
    1   | s,g4 |      | s,g3 |      | g2 |    |
    2   |      |      |      |      |    |    | a!
    3   | reduce S ::= beep (all columns)
    4   | s,g4 |      | s,g3 |      | g5 | g6 |
    5   | reduce L ::= S (all columns)
    6   |      | s,g7 |      | s,g8 |    |    |
    7   | reduce S ::= { L } (all columns)
    8   | s,g4 |      | s,g3 |      | g9 |    |
    9   | reduce L ::= L ; S (all columns)
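Once the table exists, the parser itself is a small driver loop over a stack of states. Here is a hedged Java sketch of that loop with this particular table hard-coded (the encoding is made up for the sketch; a real generator such as CUP emits something more elaborate):

  import java.util.*;

  // Table-driven LR(0) driver sketch for:
  //   S' ::= S $    S ::= beep | { L }    L ::= S | L ; S
  public class Lr0Driver {
      // shift/goto entries: SHIFT.get(state).get(symbol) = next state (terminals and nonterminals together)
      static final List<Map<String, Integer>> SHIFT = List.of(
          Map.of(),                                          // index 0 unused
          Map.of("{", 4, "beep", 3, "S", 2),                 // state 1
          Map.of(),                                          // state 2: accept on $
          Map.of(),                                          // state 3: reduce S ::= beep
          Map.of("{", 4, "beep", 3, "S", 5, "L", 6),         // state 4
          Map.of(),                                          // state 5: reduce L ::= S
          Map.of("}", 7, ";", 8),                            // state 6
          Map.of(),                                          // state 7: reduce S ::= { L }
          Map.of("{", 4, "beep", 3, "S", 9),                 // state 8
          Map.of());                                         // state 9: reduce L ::= L ; S
      // reduce entries: state -> { lhs, rhs length }
      static final Map<Integer, Object[]> REDUCE = Map.of(
          3, new Object[]{"S", 1}, 5, new Object[]{"L", 1},
          7, new Object[]{"S", 3}, 9, new Object[]{"L", 3});

      public static boolean parse(List<String> input) {
          Deque<Integer> stack = new ArrayDeque<>();
          stack.push(1);
          int pos = 0;
          while (true) {
              int state = stack.peek();
              if (state == 2 && input.get(pos).equals("$")) return true;     // accept
              Object[] r = REDUCE.get(state);
              if (r != null) {                                               // reduce lhs ::= rhs
                  for (int i = 0; i < (int) r[1]; i++) stack.pop();          // pop |rhs| states
                  stack.push(SHIFT.get(stack.peek()).get((String) r[0]));    // goto on lhs
              } else {                                                       // try to shift a token
                  Integer go = SHIFT.get(state).get(input.get(pos));
                  if (go == null) return false;                              // syntax error
                  stack.push(go);
                  pos++;
              }
          }
      }

      public static void main(String[] args) {
          // The worked example from the next section: { beep ; { beep } } $
          System.out.println(parse(List.of("{", "beep", ";", "{", "beep", "}", "}", "$")));  // true
          System.out.println(parse(List.of("beep", ";", "$")));                              // false
      }
  }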

Example

Parsing the input { beep ; { beep } } $ with the table above; each line shows the stack of symbols and states, then the remaining input:

  1                             { beep ; { beep } } $
  1 { 4                         beep ; { beep } } $
  1 { 4 beep 3                  ; { beep } } $
  1 { 4 S 5                     ; { beep } } $
  1 { 4 L 6                     ; { beep } } $
  1 { 4 L 6 ; 8                 { beep } } $
  1 { 4 L 6 ; 8 { 4             beep } } $
  1 { 4 L 6 ; 8 { 4 beep 3      } } $
  1 { 4 L 6 ; 8 { 4 S 5         } } $
  1 { 4 L 6 ; 8 { 4 L 6         } } $
  1 { 4 L 6 ; 8 { 4 L 6 } 7     } $
  1 { 4 L 6 ; 8 S 9             } $
  1 { 4 L 6                     } $
  1 { 4 L 6 } 7                 $
  1 S 2                         $
  accept

Problems In Shift-Reduce Parsing

One can write grammars that cannot be handled with shift-reduce parsing:
- Shift/reduce conflict: a state has both shift action(s) and reduce action(s)
- Reduce/reduce conflict: a state has more than one reduce action

Shift/Reduce Conflicts

LR(0) example:

  E ::= E + T | T

  State:  E ::= E . + T
          E ::= T .

  Can shift +; can reduce E ::= T.

LR(k) example:

  S ::= if E then S | if E then S else S | ...

  State:  S ::= if E then S .
          S ::= if E then S . else S

  Can shift else; can reduce S ::= if E then S.

Avoiding Shift-Reduce Conflicts

- Can rewrite the grammar to remove the conflict, e.g. MatchedStmt vs. UnmatchedStmt
- Can resolve in favor of the shift action: try to find the longest r.h.s. before reducing
  - works well in practice
  - yacc, CUP, et al. do this

Reduce/Reduce Conflicts

Example:

  Stmt ::= Type id ; | LHS = Expr ; | ...
  LHS  ::= id | LHS [ Expr ] | ...
  Type ::= id | Type [ ] | ...

  State:  Type ::= id .
          LHS  ::= id .

  Can reduce Type ::= id; can reduce LHS ::= id.

Avoid Reduce/Reduce Conflicts

- Can rewrite the grammar to remove the conflict
  - can be hard, e.g. the C/C++ declaration vs. expression problem, or the MiniJava array declaration vs. array store problem
- Can resolve in favor of one of the reduce actions, but which?
  - yacc, CUP, et al. pick the reduce action for the production listed textually first in the specification

Abstract Syntax Trees

The parser's output is an abstract syntax tree (AST) representing the grammatical structure of the parsed input.

ASTs represent only the semantically meaningful aspects of the input program, unlike concrete syntax trees, which record the complete textual form of the input:
- there is no need to record keywords or punctuation like (), ;, else
- the rest of the compiler only cares about the abstract structure

AST Node Classes

Each node in an AST is an instance of an AST class: IfStmt, AssignStmt, AddExpr, VarDecl, etc.

Each AST class declares its own instance variables holding its AST subtrees:
- IfStmt has testExpr, thenStmt, and elseStmt
- AssignStmt has lhsVar and rhsExpr
- AddExpr has arg1Expr and arg2Expr
- VarDecl has typeExpr and varName

AST Class Hierarchy

AST classes are organized into an inheritance hierarchy based on commonalities of meaning and structure. Each "abstract nonterminal" that has multiple alternative concrete forms has an abstract class that is the superclass of the various alternative forms (a Java sketch of such a hierarchy appears at the end of this section):
- Stmt is the abstract superclass of IfStmt, AssignStmt, etc.
- Expr is the abstract superclass of AddExpr, VarExpr, etc.
- Type is the abstract superclass of IntType, ClassType, etc.

AST Extensions For Project

- New variable declarations: StaticVarDecl
- New types: DoubleType, ArrayType
- New/changed statements: IfStmt can omit the else branch, ForStmt, BreakStmt, ArrayAssignStmt
- New expressions: DoubleLiteralExpr, OrExpr, ArrayLookupExpr, ArrayLengthExpr, ArrayNewExpr

Automatic Parser Generation in MiniJava

We use the CUP tool to automatically create a parser from a specification file, Parser/minijava.cup. The MiniJava Makefile automatically rebuilds the parser whenever its specification file changes.

A CUP file has several sections:
- introductory declarations, included with the generated parser
- declarations of the terminals and nonterminals with their types
  - the type is the AST node or other value returned when finished parsing that nonterminal or terminal
- precedence declarations
- productions + actions

Terminal and Nonterminal Declarations

Terminal declarations we saw before:

  /* reserved words: */
  terminal CLASS, PUBLIC, STATIC, EXTENDS;
  ...
  /* tokens with values: */
  terminal String IDENTIFIER;
  terminal Integer INT_LITERAL;

Nonterminals are similar:

  nonterminal Program Program;
  nonterminal MainClassDecl MainClassDecl;
  nonterminal List/*<...>*/ ClassDecls;
  nonterminal RegularClassDecl ClassDecl;
  ...
  nonterminal List/*<Stmt>*/ Stmts;
  nonterminal Stmt Stmt;
  nonterminal List/*<Expr>*/ Exprs;
  nonterminal List/*<Expr>*/ MoreExprs;
  nonterminal Expr Expr;
  nonterminal String Identifier;
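As referenced above, here is a hedged Java sketch of what such an AST class hierarchy might look like. The field names follow the slides (testExpr, thenStmt, ...); the exact constructors, field types, and line-number bookkeeping of the course's MiniJava classes may differ.

  // Illustrative AST hierarchy: abstract superclasses Stmt, Expr, Type
  // with one concrete subclass per alternative form.
  abstract class Stmt { final int line; Stmt(int line) { this.line = line; } }
  abstract class Expr { final int line; Expr(int line) { this.line = line; } }
  abstract class Type { final int line; Type(int line) { this.line = line; } }

  class IfStmt extends Stmt {
      final Expr testExpr; final Stmt thenStmt; final Stmt elseStmt;
      IfStmt(Expr testExpr, Stmt thenStmt, Stmt elseStmt, int line) {
          super(line); this.testExpr = testExpr; this.thenStmt = thenStmt; this.elseStmt = elseStmt;
      }
  }

  class AssignStmt extends Stmt {
      final String lhsVar; final Expr rhsExpr;
      AssignStmt(String lhsVar, Expr rhsExpr, int line) {
          super(line); this.lhsVar = lhsVar; this.rhsExpr = rhsExpr;
      }
  }

  class AddExpr extends Expr {
      final Expr arg1Expr; final Expr arg2Expr;
      AddExpr(Expr arg1Expr, Expr arg2Expr, int line) {
          super(line); this.arg1Expr = arg1Expr; this.arg2Expr = arg2Expr;
      }
  }

  class VarExpr extends Expr {
      final String varName;
      VarExpr(String varName, int line) { super(line); this.varName = varName; }
  }

  class IntType extends Type { IntType(int line) { super(line); } }

  class ClassType extends Type {
      final String className;
      ClassType(String className, int line) { super(line); this.className = className; }
  }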

Precedence Declarations

Can specify the precedence and associativity of operators:
- operators of equal precedence go in a single declaration
- lowest precedence comes textually first
- specify left, right, or nonassoc with each declaration

Examples:

  precedence left AND_AND;
  precedence nonassoc EQUALS_EQUALS, EXCLAIM_EQUALS;
  precedence left LESSTHAN, LESSEQUAL, GREATEREQUAL, GREATERTHAN;
  precedence left PLUS, MINUS;
  precedence left STAR, SLASH;
  precedence left EXCLAIM;
  precedence left PERIOD;

Productions

All of the form:

  LHS ::= RHS1 {: Java code 1 :}
        | RHS2 {: Java code 2 :}
        | ...
        | RHSn {: Java code n :}
        ;

Can label a symbol in a RHS with a :var suffix to refer to its result value in the Java code; varleft is set to the line in the input where the var symbol was.

E.g.:

  Expr ::= Expr:arg1 PLUS Expr:arg2
             {: RESULT = new AddExpr(arg1, arg2, arg1left); :}
         | INT_LITERAL:value
             {: RESULT = new IntLiteralExpr(value.intValue(), valueleft); :}
         | Expr:rcvr PERIOD Identifier:message OPEN_PAREN Exprs:args CLOSE_PAREN
             {: RESULT = new MethodCallExpr(rcvr, message, args, rcvrleft); :}
         | ...
         ;

Error Handling

How to handle a syntax error?
- Option 1: quit compilation
    + easy
    - inconvenient for the programmer
- Option 2: error recovery
    + try to catch as many errors as possible in one compile
    - difficult to avoid streams of spurious errors
- Option 3: error correction
    + fix syntax errors as part of compilation
    - hard!!

Panic Mode Error Recovery

When finding a syntax error, skip tokens until reaching a landmark:
- landmarks in MiniJava: ;  )  }
- once a landmark is found, hope to have gotten back on track

In a top-down parser, maintain a set of landmark tokens as recursive descent proceeds (a sketch follows at the end of this section):
- landmarks are selected from terminals later in the production
- as parsing proceeds, the set of landmarks will change, depending on the parsing context

In a bottom-up parser, can add special error nonterminals, followed by landmarks:
- on a syntax error, the parser skips tokens until it sees a landmark, then reduces and continues normally
- E.g.  Stmt ::= ... | error ; | { error }
        Expr ::= ... | ( error )
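To illustrate the top-down variant, here is a minimal hedged sketch of panic-mode recovery in a recursive-descent setting; the token representation and error reporting are hypothetical, and a real compiler would track the landmark set per parsing context as described above.

  import java.util.List;
  import java.util.Set;

  // Sketch: on a syntax error, report it once, then skip tokens until one of
  // the landmark tokens (';', ')', '}') is reached and resume from there,
  // instead of issuing a cascade of spurious errors for the same mistake.
  public class PanicModeSketch {
      private final List<String> tokens;   // assume a pre-tokenized input ending with "$"
      private int pos = 0;
      private static final Set<String> LANDMARKS = Set.of(";", ")", "}", "$");

      public PanicModeSketch(List<String> tokens) { this.tokens = tokens; }

      private String peek() { return tokens.get(pos); }

      void expect(String tok) {
          if (peek().equals(tok)) { pos++; return; }
          System.err.println("syntax error: expected '" + tok + "' but saw '" + peek() + "'");
          while (!LANDMARKS.contains(peek())) pos++;   // panic: skip to the nearest landmark
          if (peek().equals(tok)) pos++;               // consume the landmark if it is what we wanted
      }
  }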