POLITECNICO DI TORINO. (01JEUHT) Formal Languages and Compilers. Laboratory N 3. Lab 3. Cup Advanced Use

Similar documents
Parser and syntax analyzer. Context-Free Grammar Definition. Scanning and parsing. How bottom-up parsing works: Shift/Reduce tecnique.

Compilation 2013 Parser Generators, Conflict Management, and ML-Yacc

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

For example, we might have productions such as:

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Yacc Yet Another Compiler Compiler

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing )

Introduction to Parsing Ambiguity and Syntax Errors

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

Using JFlex. "Linux Gazette...making Linux just a little more fun!" by Christopher Lopes, student at Eastern Washington University April 26, 1999

Compilers and Language Processing Tools

Introduction to Parsing Ambiguity and Syntax Errors

Last Time. What do we want? When do we want it? An AST. Now!

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Compilers. Bottom-up Parsing. (original slides by Sam

LECTURE 3. Compiler Phases

Syntax-Directed Translation

CSE 401 Midterm Exam Sample Solution 11/4/11

POLITECNICO DI TORINO. Formal Languages and Compilers. Laboratory N 1. Laboratory N 1. Languages?

Formal Languages and Compilers

Lexical and Syntax Analysis

Syntax Analysis Part IV

A clarification on terminology: Recognizer: accepts or rejects strings in a language. Parser: recognizes and generates parse trees (imminent topic)

Introduction to Parsing

Bottom-Up Parsing. Lecture 11-12

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Recursive Descent Parsers

Parsing Techniques. AST Review. AST Data Structures. Implicit AST Construction. AST Construction CS412/CS413. Introduction to Compilers Tim Teitelbaum

Conflicts in LR Parsing and More LR Parsing Types

Bottom-Up Parsing. Lecture 11-12

LECTURE 11. Semantic Analysis and Yacc

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Parser Tools: lex and yacc-style Parsing

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Agenda. Previously. Tentative syllabus. Fall Compiler Principles Lecture 5: Parsing part 4 12/2/2015. Roman Manevich Ben-Gurion University

Abstract Syntax. Mooly Sagiv. html://

CSCI Compiler Design

Introduction to Programming Using Java (98-388)

Configuration Sets for CSX- Lite. Parser Action Table

(01JEUHT) Formal Languages and Compilers

Fall Compiler Principles Lecture 4: Parsing part 3. Roman Manevich Ben-Gurion University of the Negev

CSE 3302 Programming Languages Lecture 2: Syntax

Compiler Construction Assignment 3 Spring 2018

A Simple Syntax-Directed Translator

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:

Ambiguity and Errors Syntax-Directed Translation

EECS 6083 Intro to Parsing Context Free Grammars

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Lecture 8: Deterministic Bottom-Up Parsing

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Properties of Regular Expressions and Finite Automata

Fall Compiler Principles Lecture 5: Parsing part 4. Roman Manevich Ben-Gurion University

CSE 401 Midterm Exam Sample Solution 2/11/15

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Lecture 7: Deterministic Bottom-Up Parsing

Context-free grammars

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Let s look at the CUP specification for CSX-lite. Recall its CFG is

Principles of Programming Languages

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

How do LL(1) Parsers Build Syntax Trees?

shift-reduce parsing

In One Slide. Outline. LR Parsing. Table Construction

CS 11 Ocaml track: lecture 6

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Compiler Construction: Parsing

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

CUP. Lecture 18 CUP User s Manual (online) Robb T. Koether. Hampden-Sydney College. Fri, Feb 27, 2015

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Introduction to Compiler Design

Context-free grammars (CFG s)

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

COMP 181. Prelude. Prelude. Summary of parsing. A Hierarchy of Grammar Classes. More power? Syntax-directed translation. Analysis

3. Parsing. Oscar Nierstrasz

Using an LALR(1) Parser Generator

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

Parsing II Top-down parsing. Comp 412

G53CMP: Lecture 4. Syntactic Analysis: Parser Generators. Henrik Nilsson. University of Nottingham, UK. G53CMP: Lecture 4 p.1/32

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

Parsing Techniques. AST Review. AST Data Structures. LL AST Construction. AST Construction CS412/CS413. Introduction to Compilers Tim Teitelbaum

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Plan for Today. Regular Expressions: repetition and choice. Syntax and Semantics. Context Free Grammars

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

CSE 401/M501 18au Midterm Exam 11/2/18. Name ID #

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Exercise 1: Balanced Parentheses

Defining syntax using CFGs

CSE 582 Autumn 2002 Exam Sample Solution

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Introduction to Lexical Analysis

Building a Parser Part III

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Transcription:

POLITCNICO DI TORINO (01JUHT) Laboratory N 3 tefano canzio Mail: Web: http://www.skenz.it/compilers 1 Cup Advanced Use Grammars with ambiguities s Operator precedence Handling syntax errors 2

Ambiguous grammars in CUP Conflicts can arise when the grammar is ambiguous This implies that the parser must choose between two or more alternative actions. The problem can be solved by modifying the grammar (in order to make it non-ambiguous) or by instructing the parser on how to handle ambiguity. The latter option requires that the parsing algorithm is fully understood, in order to avoid unwanted / wrong behaviors. 3 Ambiguous Grammar A grammar is ambiguous if there is at least one sequence of symbols for which two or more distinct parse trees exist. xercise: find all parse trees for if (i==1) if (j==2) a=0 else a=1 given the grammar: ::= M M ::= if C M M ::= if C M else' M M ::= ID = NUM ID = ID C ::= ( VAR == NUM ) 4

Non-ambiguous grammar: if-then-else statement It is possible to write a non-ambiguous grammar for the if-else statements, as follows: ::= M U U ::= if C U ::= if C M else U M ::= if C M else M M ::= ID = NUM ID = ID C ::= ( ID == NUM ) if (i==1) if (j==2) a=0 else a=1 5 Non-ambiguous grammar : Algebraic expressions The non-ambiguous grammar that describes algebraic expressions is: ::= ::= + T ::= - T ::= T T ::= T * F T ::= T / F T ::= F F ::= ( ) F ::= NUM The symbols T and F are used to solve the ambiguity given by the priority of operators * and / over the operators + e -. 6

Ambiguous grammars in Cup: shift-reduce conflict (I) 1) ::= IF THN 2) ::= IF THN L 3) ::= V Input: IF THN IF THN (*) L The next token is L 2 possible actions: THN IF THN IF Top of tack HIFT L token into the tack => Rule 2 IF THN IF THN V L V RDUC the first 4 top elements of the tack => Rule 1 IF THN IF THN V L V 7 Ambiguous grammars in Cup: shift-reduce conflict (II) 1) ::= IF THN 2) ::= IF THN L 3) ::= V Input IF THN IF THN V L V *** hift/reduce conflict found in state #8 between ::= IF THN (*) and ::= IF THN (*) L under symbol L Resolved in favor of shifting. IF THN IF THN V L V Cup performs a shift action. 8

Ambiguous grammars in Cup: reduce-reduce conflict (I) Input a b The next token is OF 2 possible actions: b Top of tack RDUC the first 2 top elements of the tack => Rule 3 RDUC the first top element of the tack => Rule 4 a 1) ::= a B B B 2) ::= B 3) B ::= a b a b a b 4) B ::= b 9 Ambiguous grammars in Cup: reduce-reduce conflict (II) b a Top of tack *** Reduce/Reduce conflict found in state #7 between B ::= b (*) and B ::= a b (*) under symbols: {OF} 1) ::= a B 2) ::= B 3) B ::= a b 4) B ::= b Resolved in favor of the second production. Cup performs a reduction using the first defined rule (3). 10

s (I) xamples of lists: with at least one element, separated with commas C: ::= //without C ::= C Parse tree of 3 (without C) of elements, possibly empty (first example): ::= ε ::= Parse tree mpty list of 3 ε 11 xamples of lists: of elements, possibly empty (second example): s (II) ame sequence of input tokens, 2 different parse trees => AMBIGUOU GRAMMAR of elements, possibly empty (WRONG example): ::= ε ::= ε Parse tree mpty list of 3 ε ε mpty list ε Parse tree of 3 (I) of 3 (II) ε 12

s (III) xamples of lists: of at least 3 elements: of at least 3 elements in an odd number: ::= ::= Parse tree of 4 Parse tree of 5 13 Precedence ection: Ambiguous grammars Ambiguous grammars can result in fewer, simpler rules, and hence can be sometimes preferred. It is necessary to provide disambiguating rules in those cases. A typical example is given by algebraic expressions: Non-ambiguous grammar ::= ::= + T ::= - T ::= T T ::= T * F T ::= T / F T ::= F F ::= ( ) F ::= INTGR Ambiguous grammar ::= + ::= - ::= * ::= / ::= ( ) ::= INTGR 14

Associativity Left-associative operator ( ::= + ) 1+2+3+4 3+3+4 6+4 10 Right-associative operator ( ::= + ) 1+2+3+4 1+2+7 1+9 10 The assignment operator = is right-associative: a = b = 3 The power operator is also right-associative 3^2^2 3^4 81 15 Precedence ection: Operators Rule #1 (as well as Rule #2) is ambiguous Associativity of the + ( * ) operator is not specified Moreover, the precedence of the + and * is not specified by Rules #1 and #2 It is possible to make these rules non-ambiguous by adding information in the precedence section. The keyword precedence left defines a left-associative operator, precedence right a right-associative operator, whereas precedence nonassoc defines a nonassociative operators. The order in which precedence keywords are declared is inversely proportional to their priority. 1) ::= + 2) ::= * 3) ::= ( ) 4) ::= INT 16

Precedence ection: Disambiguating rules To each production that contains at least one terminal defined as operator, Cup associates the precedence and associativity of the rightmost operator. If the rule is followed by the keyword %prec, the precedence and associativity are those of the specified operator. In the case of a shift-reduce conflict, the action corresponding to the highest precedence production is executed. If the precedence is the same, associativity is used: leftassociativity results in a reduce action, right-associativity in a shift action. 17 Precedence ection: xample terminal uminus precedence left +, - /* Low priority */ precedence left *, / precedence left uminus /* High priority */ start with ::= + - * / - %prec uminus ( ) INTGR 18

User code Directives are available to insert user code directly in the parser. They are useful for Personalizing the parser behavior Adding code directly in the class that implements the parser Using a scanner generator different from the default one (JFlex) They are: init with {: :} This code is executed before calling any scanner method, hence before any terminal symbol is passed to the parser It is used to inizialize variables or to initialize the scanner in the case JFlex is not used. 19 User code (II) scan with {: :} Indicates to the parser which procedure to use to request the next terminal to the scanner It must return an object of the class java_cup.runtime.ymbol It is used for non-default scanner generators (different than JFlex) scan with {: return scanner.next_token() :} When CUP generates the java file that implements the parser, two classes are defined: public class parser extends java_cup.runtime.lr_parser parser is the java class that implements the parser and inherits different methods from the java_cup.runtime.lr_parser class class CUP$parser$actions CUP$parser$actions is the class where declared grammar rules are translated into a java program. Here, also semantic actions (i.e., the java code related to each rule) are reported 20

User code (III) The java_cup.runtime.lr_parser class is implemented in the file java_cup/runtime/lr_parser.java, in the CUP installation directory parser code {: :} The code is included in the parser class It is used to include scanning methods within the parser but usually to override parser methods (e.g. to override methods for error handling) action code {: :} The code included in this directive is copied as is in the CUP$parser$actions class The code is reachable only in the semantic actions associated with grammar rules It is used to define procedures and variables to be used in the actions associated to the grammar (e.g., symbol table) 21 scanner.flex rrors: Printing line and column import java_cup.runtime.* %% %cup %line %column ymbol constructors: public ymbol( int sym_id) public ymbol( int sym_id, int left, int right) public ymbol( int sym_id, Object o) public ymbol( int sym_id, int left, int right, Object o) %{ private ymbol symbol(int type){ return new ymbol(type, yyline, yycolumn) } private ymbol symbol(int type, Object value){ //emantic analysis return new ymbol(type, yyline, yycolumn,value) } %} %% [a-z] { return symbol(sym.l) }, { return symbol(sym.cm) } 22

parser.cup rrors: Printing line and column import java_cup.runtime.* parser code {: public void report_error(tring message, Object info) { tringbuffer m = new tringbuffer(message) if (info instanceof ymbol) { if (((ymbol)info).left!= -1 && ((ymbol)info).right!= -1) { int line = (((ymbol)info).left)+1 int column = (((ymbol)info).right)+1 m.append(" (line "+line+", column "+column+")") } } ystem.err.println(m) } :} 23 Handling syntax error (I) Generally speaking, when a parser finds an error it should not immediately terminate the execution A compiler usually tries to recover from the error in order to analyze the rest of the input and signal the highest possible number of errors As default, a CUP-generated parser when an error is detected: ignals by means of the method public void syntax_error(ymbol cur_token) defined in the java_cup.runtime.lr_parser class a syntax error, writing yntax error in stderr. If the error is not managed by the parser through the predefined error symbol, the parser call the public void unrecovered_syntax_error(ymbol cur_token) method, also defined in java_cup.runtime.lr_parser. This function, after writing Couldn't repair and continue parse in stderr (to notify the user of an unrecoverable syntax error), stops the execution of the parser. 24

Handling syntax error (II) Analyzing the two functions in detail: public void syntax_error(ymbol cur_token) Calls the function report_error with the following parameters report_error("yntax error", cur_token) Where, when an error occurs, cur_token is the currently looahead symbol public void unrecovered_syntax_error(ymbol cur_token) Calls the function report_fatal_error, with the following parameters report_fatal_error("couldn't repair and continue parse", cur_token) The report_fatal_error function calls with the same parameters report_error and it launches an exception that causes the end of the parser A suitable redefinition, in parser code {: :}, of the listed functions, allow to customize errors management 25 error predefined symbol The error predefined symbol signals an error condition. It can be used within the grammar in order to enable the parser to continue execution when an error is encountered. xample: ass ::= ID Q ID Q error 26

How does Cup handle the error symbol? When an error occurs, the parser will start emptying the stack until a state is found in which the error symbol is allowed In the previous example, uncorrect (i.e. symbol sequences that cannot be reduced as ) are removed from the stack, until the terminal Q is found on the top of the stack. The error token is shifted in the stack If the next token is acceptable, the parser resumes syntax analysis. Otherwise the parser will continue to read and discard tokens, until an acceptable one is found In the prevoius example, the parser will read and discard all tokens until is found. 27 ome general rules A simple strategy for error handling is skipping the current statement: stmt ::= error ometimes it can be useful to find a closing symbol corresponding to an opening symbol: expr ::= ( expr ) ( error ) Note: to limit the generation of spurious error messages, after an error occurs, error signaling is suspended until at least three consecutive tokens are shifted. 28

Grammar file ::= funcs stmts ::= /* empty */ stmts stmt funcs ::= /* empty */ funcs func func ::= ID '(' ')' compound compound ::= '{' stmts '}' stmt ::= exp '' compound exp ::= NUM exp '+' exp exp '-' exp exp '*' exp exp '/' exp '-' exp %prec NG ( exp ) 29 tatements and expressions stmt ::= exp '' compound error ' {: ystem.err.println("syntax error in statement ) :} compound ::= '{' stmts '}' '{' stmts error '} {: ystem.err.println("missing before '} ) :} exp ::= '(' error ') {: ystem.err.println(" syntax error in expression") :} 30