A clarification on terminology: Recognizer: accepts or rejects strings in a language. Parser: recognizes and generates parse trees (imminent topic)

Similar documents
Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Topic 3: Syntax Analysis I

Fall Compiler Principles Lecture 4: Parsing part 3. Roman Manevich Ben-Gurion University of the Negev

Compilers. Bottom-up Parsing. (original slides by Sam

Context-free grammars

UNIVERSITY OF CALIFORNIA

Chapter 4. Lexical and Syntax Analysis

COMP 181. Prelude. Prelude. Summary of parsing. A Hierarchy of Grammar Classes. More power? Syntax-directed translation. Analysis

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

CS 4120 Introduction to Compilers

Compiler Construction: Parsing

Fall Compiler Principles Lecture 5: Parsing part 4. Roman Manevich Ben-Gurion University

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

How do LL(1) Parsers Build Syntax Trees?

Abstract Syntax. Mooly Sagiv. html://

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Syntax-Directed Translation

Recursive Descent Parsers

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Syntax-Directed Translation. Lecture 14

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Context-free grammars (CFG s)

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Monday, September 13, Parsers

Lecture Code Generation for WLM

For example, we might have productions such as:

CS 415 Midterm Exam Spring 2002

Alternatives for semantic processing

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Introduction to Parsing

CS 11 Ocaml track: lecture 6

Parsing II Top-down parsing. Comp 412


Syntactic Analysis. Top-Down Parsing

Programming Language Syntax and Analysis

4. Lexical and Syntax Analysis

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Using an LALR(1) Parser Generator

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Syntax. In Text: Chapter 3

Project 2 Interpreter for Snail. 2 The Snail Programming Language

Defining syntax using CFGs

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

4. Lexical and Syntax Analysis

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

CS 406/534 Compiler Construction Parsing Part I

Wednesday, August 31, Parsers

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Compiler construction in4020 lecture 5

Chapter 3. Describing Syntax and Semantics ISBN

Parsing III. (Top-down parsing: recursive descent & LL(1) )

3.5 Practical Issues PRACTICAL ISSUES Error Recovery

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:

Lexical and Syntax Analysis (2)

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

MP 3 A Lexer for MiniJava

Syntax Analysis, VII One more LR(1) example, plus some more stuff. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. target code.

Lexical and Syntax Analysis

In One Slide. Outline. LR Parsing. Table Construction

EECS 6083 Intro to Parsing Context Free Grammars

COP 3402 Systems Software Syntax Analysis (Parser)

Topic 5: Syntax Analysis III

Compiler Theory. (Semantic Analysis and Run-Time Environments)

3. Parsing. Oscar Nierstrasz

Reachability and error diagnosis in LR(1) parsers

CS 230 Programming Languages

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Syntax Analysis Part I

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Chapter 3. Parsing #1

Lecture 8: Deterministic Bottom-Up Parsing

LECTURE 11. Semantic Analysis and Yacc

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

Building a Parser Part III

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Bottom-Up Parsing. Lecture 11-12

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

ASTs, Objective CAML, and Ocamlyacc

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Compiler construction 2002 week 5

Lecture 7: Deterministic Bottom-Up Parsing

Yacc: A Syntactic Analysers Generator

MP 3 A Lexer for MiniJava

CSE 340 Fall 2014 Project 4

Table-Driven Top-Down Parsers

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Transcription:

A clarification on terminology: Recognizer: accepts or rejects strings in a language Parser: recognizes and generates parse trees (imminent topic) Assignment 3: building a recognizer for the Lake expression language CS 202-9 assignment 3 1 A general comment: Many details in PL implementation (lots of space for ambiguity) No one will suffer if they forget to dot an i 10 years from now: Will you remember whether an octal has a leading \? Will you remember what top-down vs. bottomup parsing is? CS 202-9 assignment 3 2 How does parser recover from syntax error? Three choices: 1. Insert a token guess what token the user forgot 2. Delete a token guess what extra token the user inserted 3. Replace a token guess what token the user really meant Can mix and match all 3 approaches CS 202-9 error recovery 3 1

Common LL(1) strategy: skip all tokens not in follow set parsefact(token nexttoken ) { switch (nexttoken) { case ID: break; case NUM: break; case LPAREN: parseexp(yylex()); consume(rparen); break; default: error( Expected an ID, NUM or ( ); scanto(follow(factor)); break; } } CS 202-9 recursive descent recovery 4 Common strategy: insert token if looking for specific token parsefact(token nexttoken ) { switch (nexttoken) { case ID: break; case NUM: break; case LPAREN: parseexp(yylex()); consume_or_insert(rparen); break; default: error( Expected an ID, NUM or () ); scanto(follow(factor)); break; } } CS 202-9 recursive descent error recovery 5 Common strategy: substitute in special cases parsefact(token nexttoken ) { switch (nexttoken) { case ID: break; case NUM: break; case LPAREN: parseexp(yylex()); consume(rparen); break; case FOR: IF: error( expected an ID ); break; default: error( Expected an ID, NUM or ( ); scanto(follow(factor)); break; } } CS 202-9 recursive descent error recovery 6 2

A warning on substitution strategy: If original token was in the follow set (indicating ommitted tokens), insert strategy may be better Substitution may introduce cascading errors (in next production) CS 202-9 error recovery 7 Basically same approach to error recovery for LR(1) Primary approach is skip tokens Look for synchronizing tokens things like ) ; else Selected insertion or substitution can be added CS 202-9 LR(1) error recovery 8 Java cup (and bison/yacc) Compiler writer defines error recovery Typically use error non-terminal Pre-defined non terminal Match only when error reported example: expr ::= LPAREN error RPAREN CS 202-9 error production 9 3

When syntax error detected Parser pops off states until state with error recovery found Shifts on error non terminal Discards tokens until synchronizing token found Then resume normal parsing CS 202-9 error processing 10 factor ( error ) + factor ( pop states ( a + + ) CS 202-9 error example 11 factor ( error ) error ( shift error ( a + + ) CS 202-9 error example 12 4

factor ( error ) error ( skip token + ( a + + ) CS 202-9 error example 13 factor ( error ) ) error ( push ) ( a + + ) CS 202-9 error example 14 factor ( error ) factor reduce factor ( error ) ( a + + ) CS 202-9 error example 15 5

For other recovery strategies, add wrong productions: To replace token a_expr ::= access ASSIGN expr SEMI; a_expr ::= access EQ expr SEMI; To insert access ::= access LBRACK expr RBRACK access ::= access LBRACK RBRACK Must add explicit messages for these CS 202-9 grammar changes 16 Strategies often fail to recognize real error consider: fi (b) x = 1; else x = 2; Will report 2 errors: missing ; after fi(b) unexpected else But: real problem is mis-spelled if CS 202-9 recovery limitations 17 Some research in global error recovery: Try various strategies in various combinations, including looking back Compare results of different strategies, pick best solution (usually shortest fix) Interesting theoretically, currently too inefficient in practice CS 202-9 global error recovery 18 6

So far we are working on a recognizer Only silently accepts or reports an error message Not very useful for compilation Need to do work with each derivation step Need to add parser actions CS 202-9 on to a parser 19 Every parser generator gives an option to perform actions on each reduction java_cup is much like yacc or bison Insert actions in the grammar as code java_cup actions in {: :} CS 202-9 parser actions 20 Cup actions anywhere in grammar Unlike lex Usually at the end Any code can go in actions Not usually return; would end parse! expr ::= expr PLUS factor {: System.out.println( x+f ); :}; CS 202-9 defining actions 21 7

printing x+f still not very useful Want to get values from rhs elements Can name and reference each piece Use :id to name pieces refer to as id in action expr ::= INT_CONST:l PLUS INT_CONST:r {: System.out.println( sum = + (l.intvalue()+r.intvalue())); :}; CS 202-9 values from production 22 Can use value of any rhs element Token or non terminal What is its type? Must declare types when declaring terminals and non terminals terminal Integer INT_CONST; non terminal Integer expr; CS 202-9 typing the values 23 For tokens: Value is value assigned during lexing Third argument to LakeSym() What is value of non terminal? Set by each production rule Assign to RESULT (implicitly labels lhs) expr ::= expr:l PLUS expr: r {: RESULT =new Integer (l.intvalue() + r.intvalue()); :}; CS 202-9 values on non terminals 24 8

Can now build working calculator: terminal Integer NUMBER; terminal PLUS,MINUS,TIMES,DIVIDE; terminal LPAREN, RPAREN; non terminal Integer expr,term,factor; non terminal calculation; CS 202-9 calculator example 25 calculation ::= expr :e {: System.out.println(e); :}; expr ::= expr:l PLUS term:r {: RESULT = new Integer (l.intvalue()+r.intvalue()); :}; expr ::= expr:l MINUS term:r {: RESULT = new Integer (l.intvalue()-r.intvalue()); :}; expr ::= term:t {: RESULT = t; :}; CS 202-9 calculator example continued 26 term ::= term:l TIMES factor:r {: RESULT = new Integer (l.intvalue()*r.intvalue()); :}; term ::= term:l DIVIDE factor:r {: RESULT = new Integer (l.intvalue()/r.intvalue()); :}; term ::= factor:f {: RESULT = f; :}; factor ::= NUMBER:n {: RESULT = n; :}; factor ::= LPAREN expr:e RPAREN {: RESULT = e; :}; CS 202-9 calculator example end 27 9

This was example of an interpreter Behavior of source language defined in terms of another language Compiler transforms source code into machine language; no mediation CS 202-9 embedded actions 28 Most actions are at end of production they need not be Sometimes embedded actions useful block ::= LBRACE {: startblock(); :} declarations statements RBRACE {: endblock(); :}; startblock will run before any declarations reduced CS 202-9 embedded actions 29 What if two rules for block? block ::= LBRACE {: startblock(); :} declarations statements RBRACE {: endblock(); :}; block ::= LBRACE RBRACE; Parser reads token past LBRACE before calling startblock Can cause unexpected behavior if action affects next token more problems in yacc/lex CS 202-9 problems 30 10

Each action is assumed distinct on parse stack block ::= LBRACE {: startblock(); :} declarations statements RBRACE {: endblock(); :}; block ::= LBRACE {: startblock(); :} declarations RBRACE {: endblock(); :}; Introduces conflict don t know which action until RBRACE CS 202-9 ambiguous actions 31 Fix with common rule block ::= LBRACE startblock declarations statements RBRACE {: endblock(); :}; block ::= LBRACE startblock declarations RBRACE {: endblock(); :}; startblock ::= {: startblock(); :}; CS 202-9 resolving internal actions 32 Many parse trees thus far Most of them: have tokens on leaves have non terminals on interior nodes Reading the leaves gives the entire program as a sequence of tokens Called a concrete parse tree CS 202-9 reading the leaves 33 11

Assume the grammar expr ::= expr PLUS mult_expr; expr ::= expr MINUS mult_expr; expr ::= mult_expr; mult_expr ::= mult_expr TIMES factor; mult_expr ::= mult_expr DIV factor; mult_expr ::= factor; factor ::= ID; factor ::= NUMBER; factor ::= LPAREN expr RPAREN; CS 202-9 example grammar 34 What is concrete parse tree for a * (b - c) CS 202-9 concrete example 35 mult_expr factor ID expr mult_expr * a * (b - c) factor expr ( ) expr + mult_expr mult_expr factor ID factor ID CS 202-9 concrete parse tree example 36 12

Concrete parse trees are messy Include specifics of language Include specifics of grammar Want the semantics of program Leave behind all the clutter Called an abstract parse tree Occasionally used abstract trees CS 202-9 abstract parse trees 37 Abstract parse tree for a * ( b - c): * ID - ID ID CS 202-9 abstract example 38 Obviously much simpler And much easier to work with Abstracted away grammar Don t care about mult_expr Abstracted away language Don t care infix/prefix/rpn/ CS 202-9 abstract advantage 39 13

Compilers use abstract parse trees Assume abstract parse trees now Unless specified otherwise But remember ambiguity is defined on concrete parse trees: If distinct concrete parse trees exist for same sentence, grammar ambiguous CS 202-9 compiler parse trees 40 Create parse trees with actions Type of non terminals is ParseTree: non terminal ParseTree expr,stmt; stmt ::= expr:e SEMI {: RESULT = e; :}; expr ::= expr:l PLUS term:r {: RESULT = new BinParseTree(l,r, BinParseTree.Plus); :}; CS 202-9 creating parse trees 41 14