Exercise 1: Balanced Parentheses

Similar documents
Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Compilers Course Lecture 4: Context Free Grammars

Syntax Analysis Check syntax and construct abstract syntax tree

Introduction to Parsing

Normal Forms for CFG s. Eliminating Useless Variables Removing Epsilon Removing Unit Productions Chomsky Normal Form

Chapter 3 Parsing. time, arrow, banana, flies, like, a, an, the, fruit

Principles of Programming Languages COMP251: Syntax and Grammars

Defining syntax using CFGs

Habanero Extreme Scale Software Research Project

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

EECS 6083 Intro to Parsing Context Free Grammars

Part 3. Syntax analysis. Syntax analysis 96

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

CMPT 755 Compilers. Anoop Sarkar.

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

CONTEXT FREE GRAMMAR. presented by Mahender reddy

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Compiler Construction

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Chapter 3. Describing Syntax and Semantics

Ambiguous Grammars and Compactification

COL728 Minor1 Exam Compiler Design Sem II, Answer all 5 questions Max. Marks: 20

CPS 506 Comparative Programming Languages. Syntax Specification

CSE 401 Midterm Exam Sample Solution 2/11/15

Parsing II Top-down parsing. Comp 412

2.2 Syntax Definition

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS525 Winter 2012 \ Class Assignment #2 Preparation

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Chapter 3. Describing Syntax and Semantics ISBN

Syntax. In Text: Chapter 3

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

a. Determine the EPS, FIRST and FOLLOWs set for this grammar.

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

ITEC2620 Introduction to Data Structures

Context-Free Grammars

Principles of Programming Languages COMP251: Syntax and Grammars

LL(1) predictive parsing

Formal Languages. Formal Languages

MIDTERM EXAMINATION - CS130 - Spring 2003

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

CSE 582 Autumn 2002 Exam Sample Solution

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

CSE 413 Final Exam Spring 2011 Sample Solution. Strings of alternating 0 s and 1 s that begin and end with the same character, either 0 or 1.

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

3. Context-free grammars & parsing

CSE431 Translation of Computer Languages

Lecture Notes on Shift-Reduce Parsing

CSE P 501 Exam Sample Solution 12/1/11

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

ASTs, Objective CAML, and Ocamlyacc

This book is licensed under a Creative Commons Attribution 3.0 License

Describing Syntax and Semantics

ECE251 Midterm practice questions, Fall 2010

Lecture 12: Cleaning up CFGs and Chomsky Normal

3. Parsing. Oscar Nierstrasz

Natural Language Processing

CS /534 Compiler Construction University of Massachusetts Lowell. NOTHING: A Language for Practice Implementation

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

S Y N T A X A N A L Y S I S LR

Chapter 3. Describing Syntax and Semantics ISBN

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Chapter 4: Syntax Analyzer

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

EDA180: Compiler Construc6on. More Top- Down Parsing Abstract Syntax Trees Görel Hedin Revised:

CSCI312 Principles of Programming Languages!

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:

INFOB3TC Solutions for the Exam

3. DESCRIBING SYNTAX AND SEMANTICS

A simple syntax-directed

Context free grammars and predictive parsing

CS 406/534 Compiler Construction Parsing Part I

CMSC 330: Organization of Programming Languages

CSE P 501 Exam 12/1/11

Chapter 3. Describing Syntax and Semantics

Programming Languages

Eng. Maha Talaat Page 1

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Compilers and computer architecture From strings to ASTs (2): context free grammars

CS 314 Principles of Programming Languages

Introduction to Parsing. Lecture 5

Homework 3 Dragon: 4.6.2, 4.6.3, 4.6.5, 4.6.6, , 4.7.4, , 4.8.2

Chapter 18: Decidability

A Simple Syntax-Directed Translator

Related Course Objec6ves

Lexical and Syntax Analysis. Top-Down Parsing

QUESTION BANK. Unit 1. Introduction to Finite Automata

CSE 413 Final Exam. June 7, 2011

CSE 401/M501 18au Midterm Exam 11/2/18. Name ID #

Programming Languages Third Edition

Transcription:

Exercise 1: Balanced Parentheses Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence) and find unambiguous grammar for the same language. B ::= ( B ) B B

Remark The same parse tree can be derived using two different derivations, e.g. B -> (B) -> (BB) -> ((B)B) -> ((B)) -> (()) B -> (B) -> (BB) -> ((B)B) -> (()B) -> (()) this correspond to different orders in which nodes in the tree are expanded Ambiguity refers to the fact that there are actually multiple parse trees, not just multiple derivations.

Towards Solution (Note that we must preserve precisely the set of strings that can be derived) This grammar: B ::= A A ::= ( ) A A (A) solves the problem with multiple symbols generating different trees, but it is still ambiguous: string ( ) ( ) ( ) has two different parse trees

Proposed solution: B ::= B (B) Solution this is very smart! How to come up with it? Clearly, rule B::= B B generates any sequence of B's. We can also encode it like this: B ::= C* C ::= (B) Now we express sequence using recursive rule that does not create ambiguity: B ::= C B C ::= (B) but now, look, we "inline" C back into the rules for so we get exactly the rule B ::= B (B) This grammar is not ambiguous and is the solution. We did not prove this fact (we only tried to find ambiguous trees but did not find any).

Exercise 2: Dangling Else The dangling-else problem happens when the conditional statements are parsed using the following grammar. S ::= S ; S S ::= id := E S ::= if E then S S ::= if E then S else S Find an unambiguous grammar that accepts the same conditional statements and matches the else statement with the nearest unmatched if.

if (x > 0) then Discussion of Dangling Else if (y > 0) then z = x + y else x = - x This is a real problem languages like C, Java resolved by saying else binds to innermost if Can we design grammar that allows all programs as before, but only allows parse trees where else binds to innermost if?

Sources of Ambiguity in this Example Ambiguity arises in this grammar here due to: dangling else binary rule for sequence (;) as for parentheses priority between if-then-else and semicolon (;) if (x > 0) if (y > 0) z = x + y; u = z + 1 // last assignment is not inside if Wrong parse tree -> wrong generated code

How we Solved It We identified a wrong tree and tried to refine the grammar to prevent it, by making a copy of the rules. Also, we changed some rules to disallow sequences inside if-then-else and make sequence rule non-ambiguous. The end result is something like this: S::= A S A ::= id := E A ::= if E then A A ::= if E then A' else A A' ::= id := E A' ::= if E then A' else A' At some point we had a useless rule, so we deleted it. // a way to write S::=A* We also looked at what a practical grammar would have to allow sequences inside if-then-else. It would add a case for blocks, like this: A ::= { S } A' ::= { S } We could factor out some common definitions (e.g. define A in terms of A'), but that is not important for this problem.

Steps: Transforming Grammars into Chomsky Normal Form 1. remove unproductive symbols 2. remove unreachable symbols 3. remove epsilons (no non-start nullable symbols) 4. remove single non-terminal productions X::=Y 5. transform productions w/ more than 3 on RHS 6. make terminals occur alone on right-hand side

1) Unproductive non-terminals How to compute them? What is funny about this grammar: stmt ::= identifier := identifier while (expr) stmt if (expr) stmt else stmt expr ::= term + term term term term ::= factor * factor factor ::= ( expr ) There is no derivation of a sequence of tokens from expr Why? In every step will have at least one expr, term, or factor If it cannot derive sequence of tokens we call it unproductive

1) Unproductive non-terminals Productive symbols are obtained using these two rules (what remains is unproductive) Terminals (tokens) are productive If X::= s 1 s 2 s n is rule and each s i is productive then X is productive stmt ::= identifier := identifier while (expr) stmt if (expr) stmt else stmt expr ::= term + term term term term ::= factor * factor factor ::= ( expr ) program ::= stmt stmt program Delete unproductive symbols. Will the meaning of top-level symbol (program) change?

2) Unreachable non-terminals What is funny about this grammar with starting terminal program program ::= stmt stmt program stmt ::= assignment whilestmt assignment ::= expr = expr ifstmt ::= if (expr) stmt else stmt whilestmt ::= while (expr) stmt expr ::= identifier No way to reach symbol ifstmt from program

2) Computing unreachable What is the general algorithm? non-terminals What is funny about this grammar with starting terminal program program ::= stmt stmt program stmt ::= assignment whilestmt assignment ::= expr = expr ifstmt ::= if (expr) stmt else stmt whilestmt ::= while (expr) stmt expr ::= identifier

2) Unreachable non-terminals Reachable terminals are obtained using the following rules (the rest are unreachable) starting non-terminal is reachable (program) If X::= s 1 s 2 s n is rule and X is reachable then each non-terminal among s 1 s 2 s n is reachable Delete unreachable symbols. Will the meaning of top-level symbol (program) change?

3) Removing Empty Strings Ensure only top-level symbol can be nullable program ::= stmtseq stmtseq ::= stmt stmt ; stmtseq stmt ::= assignment whilestmt blockstmt blockstmt ::= { stmtseq } assignment ::= expr = expr whilestmt ::= while (expr) stmt expr ::= identifier How to do it in this example?

3) Removing Empty Strings - Result program ::= stmtseq stmtseq ::= stmt stmt ; stmtseq ; stmtseq stmt ; ; stmt ::= assignment whilestmt blockstmt blockstmt ::= { stmtseq } { } assignment ::= expr = expr whilestmt ::= while (expr) stmt whilestmt ::= while (expr) expr ::= identifier

3) Removing Empty Strings - Algorithm Compute the set of nullable non-terminals Add extra rules If X::= s 1 s 2 s n is rule then add new rules of form X::= r 1 r 2 r n where r i is either s i or, if s i is nullable then r i can also be the empty string (so it disappears) Remove all empty right-hand sides If starting symbol S was nullable, then introduce a new start symbol S instead, and add rule S ::= S

3) Removing Empty Strings Since stmtseq is nullable, the rule blockstmt ::= { stmtseq } gives blockstmt ::= { stmtseq } { } Since stmtseq and stmt are nullable, the rule stmtseq ::= stmt stmt ; stmtseq gives stmtseq ::= stmt stmt ; stmtseq ; stmtseq stmt ; ;