Ambiguous Grammars and Compactification

Similar documents
Context-Free Languages and Parse Trees

Normal Forms for CFG s. Eliminating Useless Variables Removing Epsilon Removing Unit Productions Chomsky Normal Form

Context-Free Grammars

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

JNTUWORLD. Code No: R

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

Final Course Review. Reading: Chapters 1-9

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

CSE 105 THEORY OF COMPUTATION

Homework. Context Free Languages. Before We Start. Announcements. Plan for today. Languages. Any questions? Recall. 1st half. 2nd half.

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed


Chapter Seven: Regular Expressions

UNIT I PART A PART B

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Optimizing Finite Automata

QUESTION BANK. Unit 1. Introduction to Finite Automata

Context-Free Grammars and Languages (2015/11)

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

(a) R=01[((10)*+111)*+0]*1 (b) ((01+10)*00)*. [8+8] 4. (a) Find the left most and right most derivations for the word abba in the grammar

CS525 Winter 2012 \ Class Assignment #2 Preparation

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

(Refer Slide Time: 0:19)

Multiple Choice Questions

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CMSC 330: Organization of Programming Languages

Theory of Computation, Homework 3 Sample Solution

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

CT32 COMPUTER NETWORKS DEC 2015

Compiler Construction

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

9.5 Equivalence Relations

Automata Theory TEST 1 Answers Max points: 156 Grade basis: 150 Median grade: 81%

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Languages and Compilers

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Formal Languages and Automata

T.E. (Computer Engineering) (Semester I) Examination, 2013 THEORY OF COMPUTATION (2008 Course)

Homework. Announcements. Before We Start. Languages. Plan for today. Chomsky Normal Form. Final Exam Dates have been announced

Learn Smart and Grow with world

CMSC 330: Organization of Programming Languages

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Decision Properties for Context-free Languages

CIT3130: Theory of Computation. Regular languages

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

University of Nevada, Las Vegas Computer Science 456/656 Fall 2016

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

CHAPTER 8. Copyright Cengage Learning. All rights reserved.

Skyup's Media. PART-B 2) Construct a Mealy machine which is equivalent to the Moore machine given in table.

CMSC 330: Organization of Programming Languages

CS154 Midterm Examination. May 4, 2010, 2:15-3:30PM

Post's Correspondence Problem. An undecidable, but RE, problem that. appears not to have anything to do with. TM's.

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

1. [5 points each] True or False. If the question is currently open, write O or Open.

CMSC 330: Organization of Programming Languages. Context Free Grammars

Closure Properties of CFLs; Introducing TMs. CS154 Chris Pollett Apr 9, 2007.

Context-Free Grammars

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

Slides for Faculty Oxford University Press All rights reserved.

TOPIC PAGE NO. UNIT-I FINITE AUTOMATA

CMSC 330: Organization of Programming Languages. Context Free Grammars

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Regular Languages and Regular Expressions

CSE 431S Scanning. Washington University Spring 2013

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

14.1 Encoding for different models of computation

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Context Free Languages and Pushdown Automata

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Foundations of Computer Science Spring Mathematical Preliminaries

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2

Non-deterministic Finite Automata (NFA)

Finite Automata Part Three

SLR parsers. LR(0) items

Theory of Computation

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Zhizheng Zhang. Southeast University

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

Syntax Analysis Check syntax and construct abstract syntax tree

Finite Automata Part Three

Answer All Questions. All Questions Carry Equal Marks. Time: 20 Min. Marks: 10.

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Introduction to Syntax Analysis. The Second Phase of Front-End

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

4. Lexical and Syntax Analysis

Section 17. Closed Sets and Limit Points

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Properties of Regular Expressions and Finite Automata

Decision Properties of RLs & Automaton Minimization

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

Transcription:

Ambiguous Grammars and Compactification Mridul Aanjaneya Stanford University July 17, 2012 Mridul Aanjaneya Automata Theory 1/ 44

Midterm Review Mathematical Induction and Pigeonhole Principle Finite Automata (DFA s, NFA s, ε-nfa s) Equivalence of DFA s, NFA s and ε-nfa s Regular Languages and Closure Properties Union, Concatenation, Kleene Closure, Intersection, Difference, Complement, Reversal, (Inverse) Homomorphisms Regular Expressions and Equivalence with Automata Pumping Lemma for DFA s Efficient State Minimization for DFA s Context-Free Languages and Parse Trees Mridul Aanjaneya Automata Theory 2/ 44

Homework #1 Problem 1 Given 8 distinct natural numbers, none greater than 15, show that at least three pairs of them have the same positive difference. The pairs need not be disjoint as sets. 8 distinct natural numbers 28 pairs. 14 possible differences. The difference 14 can only be achieved by 1 pair PHP! Common Mistakes: 1 Not mentioning PHP. Mridul Aanjaneya Automata Theory 3/ 44

Homework #1 Problem 2 Prove that the number 111... 11 (243 ones) is divisible by 243. Let α n denote the number 111... 11 (3 n ones). Proof by induction. α n+1 = α n 10 2 3n + α n 10 3n + α n 10 2 3n and 10 3n leave remainder 1 modulo 3. α n+1 3 α n (mod 3) 0(mod 3 n+1 ) Common Mistakes: 1 Using divisibility of sum of digits modulo 3 without proof. 2 Not stating general claim implies n = 5. Mridul Aanjaneya Automata Theory 4/ 44

Homework #1 Problem 3 Show that the language L consisting of runs of even numbers of 0 s and runs of odd numbers of 1 s is regular. 1 0,1 0 A B 1 C 0 A 1 0 0,1 1 B C 0 D 0 1 Common Mistakes: 1 Not having a reject state that loops on all inputs. 2 Inconsistent use of empty string. Mridul Aanjaneya Automata Theory 5/ 44

Homework #1 Problem 4 Construct an NFA for the language L over {0,1} such that each string has two 0 s separated by a number of positions that is a nonzero multiple of 5. 0,1 A 0 0,1 0,1 B C D 0,1 1 0,1 H 0 G 0,1 F 0,1 E Common Mistakes: 1 Looping back incorrectly. 2 Not enforcing that the first and last transitions are zeroes. Mridul Aanjaneya Automata Theory 6/ 44

Homework #1 Problem 5 Define an NFA N 2 from a given NFA N 1 by reversing the final/nonfinal states. Is L(N 2 ) the complement of L(N 1 )? A Common Mistakes: 1 Overcomplicated thoughts. 2 Proving its true! Mridul Aanjaneya Automata Theory 7/ 44

Recap: Context-Free Grammars Terminals: symbols of the alphabet of the language being defined. Variables (nonterminals): a finite set of other symbols, each of which represents a language. Start symbol: the variable whose language is the one being defined. Mridul Aanjaneya Automata Theory 8/ 44

Example: Formal CFG Here is a formal CFG for {0 n 1 n n 1}. Terminals = {0,1}. Variables = {S}. Start symbol = S. Productions = S 01 S 0S1 Mridul Aanjaneya Automata Theory 9/ 44

Recap: Leftmost Derivations Say waα lm wβα if w is a string of terminals only and A β is a production. Also, α lm β if α becomes β by a sequence of zero or more lm steps. Balanced parantheses grammar: S SS (S) () S lm SS lm (S)S lm (())S lm (())() Thus, S lm (())() Mridul Aanjaneya Automata Theory 10/ 44

Recap: Rightmost Derivations Say αaw rm αβw if w is a string of terminals only and A β is a production. Also, α rm β if α becomes β by a sequence of zero or more rm steps. Balanced parantheses grammar: S SS (S) () S rm SS rm S() rm (S)() rm (())() Thus, S rm (())() Mridul Aanjaneya Automata Theory 11/ 44

Recap: Parse Trees Parse trees are trees labeled by symbols of a particular CFG. Leaves: labeled by a terminal or ε. Interior nodes: labeled by a variable. Children are labeled by the right side of a production for the parent. Root: must be labeled by the start symbol. Mridul Aanjaneya Automata Theory 12/ 44

Example: Parse Tree S SS (S) () S S S ( S ) ( ) ( ) Mridul Aanjaneya Automata Theory 13/ 44

Recap: Ambiguous Grammars A CFG is ambiguous is there is a string in the language that is the yield of two or more parse trees. Example: S SS (S) () Two parse trees for ()()()! Mridul Aanjaneya Automata Theory 14/ 44

Ambiguous Grammars Ambiguity is a property of grammars not languages. For the balanced parentheses language, here is another CFG which is unambiguous: B (RB ε R ) (RR Note: R generates strings that have one more right parentheses than left. Mridul Aanjaneya Automata Theory 15/ 44

Unambiguous Grammars B (RB ε R ) (RR Construct a unique leftmost derivation for a given balanced string of parentheses by scanning the string from left to right. If we need to expand B, then use B (RB if the next symbol is ( and ε if at the end. If we need to expand R, use R ) if the next symbol is ) and (RR if it is (. Mridul Aanjaneya Automata Theory 16/ 44

The Parsing Process (())() B (RB ((RRB (()RB (())B (())(RB (())()B (())() Mridul Aanjaneya Automata Theory 17/ 44

LL(1) Grammars As an aside, a grammar such as B (RB ε R ) (RR where you can always figure out the production to use in a leftmost derivation by scanning the given string left-to-right and looking only at the next one symbol is called LL(1). Leftmost derivation, left-to-right scan, one symbol of lookahead. Most programming languages have LL(1) grammars. LL(1) grammars are never ambiguous. Mridul Aanjaneya Automata Theory 18/ 44

Inherent Ambiguity It would be nice if for every ambiguous grammar, there was some way to fix the ambiguity, as we did for the balanced parentheses grammar. Unfortunately, certain CFL s are inherently ambiguous, meaning that every grammar for the language is ambiguous. Mridul Aanjaneya Automata Theory 19/ 44

Example: Inherent Ambiguity The language L = {0 i 1 j 2 k i = j or j = k} Intuitively, at least some of the strings of the form 0 n 1 n 2 n must be generated by two different parse trees, one based on checking the 0 s and 1 s, the other based on checking the 1 s and 2 s. Mridul Aanjaneya Automata Theory 20/ 44

One Possible Ambiguous Grammar S AB CD A 0A1 01 B 2B 2 C 0C 0 D 1D2 12 A generates equal numbers 0 s and 1 s. B generates any number of 2 s. C generates any number of 0 s. D generates equal numbers 1 s and 2 s. Mridul Aanjaneya Automata Theory 21/ 44

One Possible Ambiguous Grammar S AB CD A 0A1 01 B 2B 2 C 0C 0 D 1D2 12 And there are two derivations of every string with equal numbers of 0 s, 1 s and 2 s. e.g.: S AB 01B 012 S CD 0D 012 Mridul Aanjaneya Automata Theory 22/ 44

Example #1 Problem Show that the language of all palindromes over {0,1} is context-free. Mridul Aanjaneya Automata Theory 23/ 44

Example #1 Problem Show that the language of all palindromes over {0,1} is context-free. P ε 0 1 0P0 1P1 Mridul Aanjaneya Automata Theory 24/ 44

Example #2 Problem Give a context-free language for the regular expression 0 1(0+1). Mridul Aanjaneya Automata Theory 25/ 44

Example #2 Problem Give a context-free language for the regular expression 0 1(0+1). S A1B A 0A ε B 0B 1B ε Mridul Aanjaneya Automata Theory 26/ 44

Variables that derive nothing Consider: S AB, A aa, B AB Although A derives all strings of a s, B derives no terminal strings. Thus, S derives nothing, and the language is empty! Mridul Aanjaneya Automata Theory 27/ 44

Testing if a variable derives a terminal string Basis: If there is a production A w, where w has no variables, then A derives a terminal string. Induction: If there is a production A α, where α consists only of terminals and variables known to derive a terminal string, then A derives a terminal string. Mridul Aanjaneya Automata Theory 28/ 44

Testing Eventually, we can find no more variables. An easy induction on the order in which variables are discovered shows that each one truly derives a terminal string. Conversely, any variable that derives a terminal string will be discovered by this algorithm. Mridul Aanjaneya Automata Theory 29/ 44

Proof of Converse The proof is an induction on the height of the least-height parse tree by which a variable A derives a terminal string. Basis: Height = 1. Tree looks like: a A... a 1 n Then the basis of the algorithm tells us that A will be discovered. Mridul Aanjaneya Automata Theory 30/ 44

Induction for Converse Assume IH for parse trees of height < h, and suppose A derives a terminal string via a parse tree of height h: X A... X 1 n w 1 w n By IH, those X i s that are variables are discovered. Thus, A will also be discovered, because it has a right side of terminals and/or discovered variables. Mridul Aanjaneya Automata Theory 31/ 44

Algorithm to Eliminate Variables that Derive Nothing 1 Discover all variables that derive terminal strings. 2 For all other variables, remove all productions in which they appear, either on the left or on the right. Mridul Aanjaneya Automata Theory 32/ 44

Example: Eliminate Variables S AB C, A aa a, B bb, C c Basis: A and c are identified because of A a and C c. Induction: S is identified because of S C. Nothing else can be identified. Result: S C, A aa a, C c Mridul Aanjaneya Automata Theory 33/ 44

Unreachable Symbols Another way a terminal or a variable deserves to be eliminated is if it cannot appear in any derivation from the start symbol. Basis: We can reach S (the start symbol). Induction: If we can reach A, and there is a production A α, then we can reach all symbols of α. Mridul Aanjaneya Automata Theory 34/ 44

Unreachable Symbols Easy inductions in both directions show that when we can discover no more symbols, then we have all and only the symbols that appear in derivations from S. Left as exercises. Algorithm: Remove from the grammar all symbols not discovered reachable from S and all productions that involve these symbols. Mridul Aanjaneya Automata Theory 35/ 44

Eliminating Useless Symbols A symbol is useful if it appears in some derivation of some terminal string from the start symbol. Otherwise it is useless. Eliminate all useless symbols by: 1 Eliminating symbols that derive no terminal string. 2 Eliminating unreachable symbols. Mridul Aanjaneya Automata Theory 36/ 44

Useless Symbols S AB, A C, C c, B bb If we eliminated unreachable symbols first, we would find everything is reachable. A, C and c would never get eliminated. Mridul Aanjaneya Automata Theory 37/ 44

Why It Works After step (1), every symbol remaining derives some terminal string. After step (2), the only symbols remaining are all derivable from S. In addition, they still derive a terminal string, because such a derivation can only involve symbols reachable from S. Mridul Aanjaneya Automata Theory 38/ 44

Epsilon Productions We can almost avoid using productions of the form A ε (called ε-productions). The problem is that ε cannot be in the language of any grammar that has no ε-productions. Theorem If L is a CFL, then L {ε} has a CFG with no ε-productions. Mridul Aanjaneya Automata Theory 39/ 44

Nullable Symbols To eliminate ε-productions, we first need to discover the nullable symbols = variables A such that A ε. Basis: If there is a production A ε, then A is nullable. Induction: If there is a production A α, and all symbols of α are nullable, then A is nullable. Mridul Aanjaneya Automata Theory 40/ 44

Example: Nullable Symbols S AB, A aa ε, B bb A Basis: A is nullable because of A ε. Induction: B is nullable because of B A. Then, S is nullable because of S AB. Mridul Aanjaneya Automata Theory 41/ 44

Proof of Algorithm: Nullable Symbols Proof is very much like that for the algorithm for testing variables that derive terminal strings. Left to the imagination! Mridul Aanjaneya Automata Theory 42/ 44

Eliminating ε-productions Key idea: turn each production A X 1...X n into a family of productions. For each subset of nullable X s, there is one production with those eliminated from the right side in advance. Except, if all X s are nullable, do not make a production with ε as the right hand side. Mridul Aanjaneya Automata Theory 43/ 44

Example: Eliminating ε-productions S ABC, A aa ε, B bb ε, C ε A, B, C and S are all nullable. New grammar: S ABC AB AC BC A B C A aa a B bb b Note: C is now useless, eliminate its productions. Mridul Aanjaneya Automata Theory 44/ 44