Decision Properties for Context-free Languages


CMPU 240 Language Theory and Computation, Fall 2018

Previously: context-free languages; the Pumping Lemma for CFLs; closure properties for CFLs.

Today: Assignment 5 due. Decision properties for CFLs, e.g., is a string in the language? Election special! Later: I'll post practice problems for Exam 2 tonight; Exam 2 review on Thursday.

Decision properties for context-free languages (CFLs)

Start with a representation of a CFL, i.e., a context-free grammar (CFG) or a pushdown automaton (PDA). Since we can convert between CFGs and PDAs, we can use whichever is more convenient.

Spoiler: very little is decidable about CFLs!

Testing emptiness of a CFL

Given a representation of some context-free language $L$, ask whether it represents $\emptyset$. We've already seen how to do this when we were converting to CNF: check whether the start symbol is useless, i.e., whether it fails to derive at least one terminal string. So we can decide whether a language is empty. We can also decide whether a string is in a language.
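The emptiness test can be sketched in Python as a fixed-point computation of the "generating" variables. This is a minimal sketch under a hypothetical encoding: a grammar is a list of `(head, body)` pairs, `body` is a tuple of symbols, every variable has at least one production, and any symbol that never appears as a head is treated as a terminal. The language is empty exactly when the start variable never becomes generating.

```python
def generating_variables(rules):
    """Variables that derive at least one terminal string (fixed-point iteration)."""
    variables = {head for head, _ in rules}
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # A body is productive if each symbol is a terminal
            # or a variable already known to be generating.
            if head not in generating and all(
                sym not in variables or sym in generating for sym in body
            ):
                generating.add(head)
                changed = True
    return generating


def is_empty(rules, start):
    """L(G) is empty exactly when the start variable is not generating."""
    return start not in generating_variables(rules)


# S -> AB | a, A -> aA, B -> b: nonempty (S derives "a"; A never terminates).
g1 = [("S", ("A", "B")), ("S", ("a",)), ("A", ("a", "A")), ("B", ("b",))]
# T -> aT: every sentential form still contains T, so L is empty.
g2 = [("T", ("a", "T"))]
print(is_empty(g1, "S"))  # False
print(is_empty(g2, "T"))  # True
```

The loop repeats until a full pass adds nothing new, so it runs at most once per variable plus one final pass.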

Testing finiteness of a CFL

Let $L$ be a CFL. Then there is some Pumping Lemma constant $n$ for $L$. Test all strings of length between $n$ and $2n-1$ for membership in $L$. If any such string is in $L$, it can be pumped, and the language is infinite. If there is no such string, then $n-1$ is an upper limit on the length of strings in $L$, so the language is finite.

Trick: If there were a string $s = uvxyz$ in $L$ of length $2n$ or longer, you could find a shorter string $uxz$ in $L$, but it is at most $n$ shorter. (Why? The Pumping Lemma guarantees $|vxy| \le n$ and $|vy| \ge 1$, so $1 \le |vy| \le n$.) Thus, if there are any strings of length $2n$ or more, you can repeatedly cut out $vy$ to get, eventually, a string whose length is in the range $n$ to $2n-1$.

Testing membership of a string in a CFL

Important result: given a context-free grammar $G$ and a word $w$, we can tell whether $G$ generates $w$! This can be done algorithmically, in finite time.

What if we only considered PDAs? It's not obvious that this could be done in finite time. Why can't we just simulate a PDA on $w$ and, whenever it stops, read off our answer? Simulating a PDA for $L$ on $w$ doesn't quite work, because the PDA can grow its stack indefinitely on $\varepsilon$ input, so the simulation may never finish, even if the PDA is deterministic.

The approach to recognizing whether a grammar $G$ generates a string $w$ has two steps:
1. Convert $G$ to Chomsky normal form (CNF).
2. Use the CYK algorithm.

The Cocke-Younger-Kasami (CYK) algorithm is an $O(n^3)$ algorithm ($n = |w|$) that uses a dynamic programming technique.
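The finiteness test can be written as a direct (and deliberately naive) Python sketch. Assumptions, all ours rather than the lecture's: the grammar is in CNF with no $\varepsilon$-rules, encoded as a list of `(head, body)` pairs with tuple bodies; the pumping constant is taken to be $n = 2^m$ for a CNF grammar with $m$ variables (the bound used in the standard proof of the Pumping Lemma); and membership is checked with a compact CYK test. Enumerating every string of length $n$ to $2n-1$ is exponential, so this only illustrates decidability on tiny grammars.

```python
from itertools import product


def cyk_member(rules, start, w):
    """Compact CYK membership test for a CNF grammar (no epsilon rules)."""
    n = len(w)
    if n == 0:
        return False  # this sketch does not handle an S -> epsilon rule
    # X[(i, j)]: variables deriving w[i..j] (0-based, inclusive).
    X = {(i, i): {h for h, body in rules if body == (w[i],)} for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            X[(i, j)] = {
                h
                for k in range(i, j)
                for h, body in rules
                if len(body) == 2 and body[0] in X[(i, k)] and body[1] in X[(k + 1, j)]
            }
    return start in X[(0, n - 1)]


def is_infinite(rules, start, alphabet):
    """Brute-force finiteness test: look for a member of length n .. 2n - 1."""
    m = len({h for h, _ in rules})  # number of variables
    n = 2 ** m  # a pumping constant for a CNF grammar with m variables
    for length in range(n, 2 * n):
        for letters in product(alphabet, repeat=length):
            if cyk_member(rules, start, "".join(letters)):
                return True
    return False


# S -> AS | a, A -> a generates a+ (infinite).
g_inf = [("S", ("A", "S")), ("S", ("a",)), ("A", ("a",))]
# S -> AA, A -> a generates only "aa" (finite).
g_fin = [("S", ("A", "A")), ("A", ("a",))]
print(is_infinite(g_inf, "S", "a"))   # True
print(is_infinite(g_fin, "S", "ab"))  # False
```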

Aside: CNF

Recall that in Chomsky normal form, every rule in the grammar is of the form $A \to BC$ or $A \to a$, where $a$ is a terminal, $A$ is any variable, and $B$ and $C$ are variables other than the start variable. (Exception: we allow $S \to \varepsilon$.)

Aside: Big O notation

We said the algorithm is $O(n^3)$. If you haven't seen this notation before, it means that, for large enough $n$, it takes at most some constant multiple of $n^3$ steps of computation (loosely defined) to process an input of length $n$. Big O notation is used in complexity analysis, which we may spend some time on at the end of the course, and it is used extensively in CMPU 241, Algorithms.

Aside: Dynamic programming

Dynamic programming is a class of methods that avoid duplicate computation at the expense of memory. Values that may be used in future computations are stored in a table. Think of computing the Fibonacci sequence: each value depends on the two previous ones, so we save them after computing them. In dynamic programming, you may have many previous calculations that you want to reuse.

CYK algorithm

Start with a CNF grammar for $L$. Build a two-dimensional table:
- Row = length of a substring of $w$
- Column = beginning position of the substring
- Entry in row $i$ and column $j$ = set of variables that generate the substring of $w$ beginning at position $j$ and extending for $i$ positions

These entries are denoted $X_{j,\,i+j-1}$, i.e., the subscripts are the first and last positions of the substring represented, so the first row is $X_{1,1}, X_{2,2}, \ldots, X_{n,n}$; the second row is $X_{1,2}, X_{2,3}, \ldots, X_{n-1,n}$; and so on.
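Since CYK requires its input grammar to be in CNF, a small shape check makes the definition above concrete. This is a sketch under a hypothetical encoding (a list of `(head, body)` pairs, tuple bodies, an empty tuple for $\varepsilon$, and every symbol that is not a head treated as a terminal); it checks the form of the rules only and does not convert anything to CNF.

```python
def is_cnf(rules, start):
    """Check the CNF shape: A -> BC or A -> a, plus an optional start -> epsilon."""
    variables = {head for head, _ in rules}
    for head, body in rules:
        if len(body) == 0:
            ok = head == start  # only the start variable may derive epsilon
        elif len(body) == 1:
            ok = body[0] not in variables  # A -> a: a single terminal
        elif len(body) == 2:
            # A -> BC: two variables, neither of them the start variable
            ok = all(sym in variables and sym != start for sym in body)
        else:
            ok = False  # bodies longer than two symbols are never CNF
        if not ok:
            return False
    return True


cnf = [("S", ("A", "B")), ("S", ("B", "C")), ("A", ("B", "A")), ("A", ("a",)),
       ("B", ("C", "C")), ("B", ("b",)), ("C", ("A", "B")), ("C", ("a",))]
print(is_cnf(cnf, "S"))                                 # True
print(is_cnf([("S", ("a", "B")), ("B", ("b",))], "S"))  # False: terminal in a pair
print(is_cnf([("S", ("S", "A")), ("A", ("a",))], "S"))  # False: start on the right
```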

The table

The horizontal axis corresponds to the positions of the string $w = a_1 a_2 \cdots a_n$. Table entry $X_{i,j}$ is the set of non-terminals $A$ such that $A \Rightarrow^* a_i a_{i+1} \cdots a_j$. We are particularly interested in whether $S \in X_{1,n}$, because that is the same as saying $S \Rightarrow^* w$ (that is, $w \in L$).

Basis (row 1): $X_{i,i}$ = the set of variables $A$ such that $A \to a$ is a production and $a$ is the symbol at position $i$ of $w$. The grammar is in CNF, so the only way to derive a terminal is with a production of the form $A \to a$; hence $X_{i,i}$ is the set of non-terminals $A$ such that $A \to a_i$ is a production of $G$.

Induction: Suppose we want to compute $X_{i,j}$, which is in row $j-i+1$, and we have computed all the $X$s in the rows for shorter strings. We can derive $a_i a_{i+1} \cdots a_j$ from $A$ if there is a production $A \to BC$ such that $B$ derives some proper prefix $a_i \cdots a_k$ and $C$ derives the rest, $a_{k+1} \cdots a_j$. Thus, we must ask whether there is any value of $k$ such that
- $i \le k < j$,
- $B \in X_{i,k}$, and
- $C \in X_{k+1,j}$.

Example

We'll use the algorithm to determine whether the string $w = aabbb$ is in the language generated by the grammar
$S \to AB$, $A \to BB \mid a$, $B \to AB \mid b$.

Note that $a_1 = a$, so $X_{1,1}$ is the set of all variables that immediately derive $a$, that is, $X_{1,1} = \{A\}$. Since $a_2 = a$, we also have $X_{2,2} = \{A\}$, and so on, to get $X_{1,1} = \{A\}$, $X_{2,2} = \{A\}$, $X_{3,3} = \{B\}$, $X_{4,4} = \{B\}$, $X_{5,5} = \{B\}$.

Compute $X_{1,2}$: since $X_{1,1} = \{A\}$ and $X_{2,2} = \{A\}$, $X_{1,2}$ consists of all variables on the left side of a production whose right side is $AA$. There are none, so $X_{1,2} = \emptyset$.

Next, $X_{2,3} = \{V : V \to YZ,\ Y \in X_{2,2},\ Z \in X_{3,3}\}$, so the required right side is $AB$; thus $X_{2,3} = \{S, B\}$.

The rest is easy:
Row 1: $X_{1,1} = \{A\}$, $X_{2,2} = \{A\}$, $X_{3,3} = \{B\}$, $X_{4,4} = \{B\}$, $X_{5,5} = \{B\}$
Row 2: $X_{1,2} = \emptyset$, $X_{2,3} = \{S, B\}$, $X_{3,4} = \{A\}$, $X_{4,5} = \{A\}$
Row 3: $X_{1,3} = \{S, B\}$, $X_{2,4} = \{A\}$, $X_{3,5} = \{S, B\}$
Row 4: $X_{1,4} = \{A\}$, $X_{2,5} = \{S, B\}$
Row 5: $X_{1,5} = \{S, B\}$

Since $S \in X_{1,5}$, $w \in L(G)$.
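The basis and induction steps above translate almost line for line into Python. This sketch assumes the example grammar $S \to AB$, $A \to BB \mid a$, $B \to AB \mid b$ and a hypothetical encoding as a list of `(head, body)` pairs with tuple bodies; it uses 1-based keys so that `X[(i, j)]` matches $X_{i,j}$ in the text.

```python
def cyk_table(rules, w):
    """Fill the CYK table. X[(i, j)] = variables deriving a_i ... a_j (1-based)."""
    n = len(w)
    X = {}
    # Basis (row 1): X[(i, i)] holds every A with a production A -> a_i.
    for i in range(1, n + 1):
        X[(i, i)] = {head for head, body in rules if body == (w[i - 1],)}
    # Induction: rows for substrings of length 2, 3, ..., n.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            X[(i, j)] = {
                head
                for k in range(i, j)  # split after position k, i <= k < j
                for head, body in rules
                if len(body) == 2
                and body[0] in X[(i, k)]  # B derives a_i ... a_k
                and body[1] in X[(k + 1, j)]  # C derives a_{k+1} ... a_j
            }
    return X


# Assumed example grammar: S -> AB, A -> BB | a, B -> AB | b.
grammar = [("S", ("A", "B")), ("A", ("B", "B")), ("A", ("a",)),
           ("B", ("A", "B")), ("B", ("b",))]
w = "aabbb"
X = cyk_table(grammar, w)
for length in range(1, len(w) + 1):  # print row by row, shortest substrings first
    row = [sorted(X[(i, i + length - 1)]) for i in range(1, len(w) - length + 2)]
    print(length, row)
print("S" in X[(1, len(w))])  # True
```

The three nested loops (row, column, split point) are where the $O(n^3)$ bound comes from for a fixed grammar.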

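Second example

Consider the grammar
$S \to AB \mid BC$, $A \to BA \mid a$, $B \to CC \mid b$, $C \to AB \mid a$
and the string $w = baaba$.

Row 1: Which variables have a production body $b$? Which have body $a$? So $X_{1,1} = X_{4,4} = \{B\}$ and $X_{2,2} = X_{3,3} = X_{5,5} = \{A, C\}$.

Row 2, e.g., $X_{1,2}$: break $ba$ into two nonempty substrings, $b$ and $a$. The rule must have a body $\alpha\beta$ where $\alpha \in X_{1,1}$ and $\beta \in X_{2,2}$, i.e., $BA$ or $BC$, so $X_{1,2} = \{S, A\}$. Likewise $X_{2,3} = \{B\}$, $X_{3,4} = \{S, C\}$, and $X_{4,5} = \{S, A\}$.

Row 3, e.g., $X_{2,4}$: we can break the string $aab$ (positions 2 to 4) after position 2 or after position 3, i.e., $k = 2$ or $k = 3$. We need to consider the bodies in
$X_{2,2}X_{3,4} \cup X_{2,3}X_{4,4} = \{A, C\}\{S, C\} \cup \{B\}\{B\} = \{AS, AC, CS, CC, BB\}$.
Only $CC$ shows up as a body (of $B$), so $X_{2,4} = \{B\}$.

The completed table:
Row 1: $X_{1,1} = \{B\}$, $X_{2,2} = \{A, C\}$, $X_{3,3} = \{A, C\}$, $X_{4,4} = \{B\}$, $X_{5,5} = \{A, C\}$
Row 2: $X_{1,2} = \{S, A\}$, $X_{2,3} = \{B\}$, $X_{3,4} = \{S, C\}$, $X_{4,5} = \{S, A\}$
Row 3: $X_{1,3} = \emptyset$, $X_{2,4} = \{B\}$, $X_{3,5} = \{B\}$
Row 4: $X_{1,4} = \emptyset$, $X_{2,5} = \{S, A, C\}$
Row 5: $X_{1,5} = \{S, A, C\}$

Since $S \in X_{1,5}$, $baaba \in L(G)$.

CYK as a parsing algorithm

The applicability of the CYK algorithm as a parser is limited by the computation needed to find a derivation. For an input string of length $n$, $(n^2+n)/2$ sets need to be constructed to complete the dynamic programming table, and each of these sets may require considering several decompositions (up to $n-1$ split points) of the associated substring.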
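To use CYK as a parser rather than just a recognizer, each table entry can also remember *how* a variable was obtained. The sketch below stores one back-pointer (split point and body) per variable per cell and then rebuilds a parse tree. It assumes the second example's grammar and the same hypothetical `(head, body)` list encoding; keeping only the first split found is a design choice that yields one parse tree, not all of them.

```python
def cyk_parse(rules, start, w):
    """Return a parse tree (nested tuples) for w, or None if w is not in L(G)."""
    n = len(w)
    # back[(i, j)][A] records one way A derives a_i ... a_j (1-based):
    # the terminal itself when i == j, else (k, B, C) for A -> BC split after k.
    back = {}
    for i in range(1, n + 1):
        back[(i, i)] = {head: w[i - 1] for head, body in rules if body == (w[i - 1],)}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = {}
            for k in range(i, j):
                for head, body in rules:
                    if (len(body) == 2 and head not in cell
                            and body[0] in back[(i, k)]
                            and body[1] in back[(k + 1, j)]):
                        cell[head] = (k, body[0], body[1])  # keep the first split found
            back[(i, j)] = cell

    def build(var, i, j):
        entry = back[(i, j)][var]
        if i == j:
            return (var, entry)  # leaf: var -> terminal
        k, left, right = entry
        return (var, build(left, i, k), build(right, k + 1, j))

    if n == 0 or start not in back[(1, n)]:
        return None
    return build(start, 1, n)


def yield_of(tree):
    """Concatenate the leaves of a parse tree, left to right."""
    if isinstance(tree[1], str):
        return tree[1]
    return yield_of(tree[1]) + yield_of(tree[2])


grammar = [("S", ("A", "B")), ("S", ("B", "C")), ("A", ("B", "A")), ("A", ("a",)),
           ("B", ("C", "C")), ("B", ("b",)), ("C", ("A", "B")), ("C", ("a",))]
tree = cyk_parse(grammar, "S", "baaba")
print(tree)
print(yield_of(tree))  # baaba
```

The extra memory is one small dictionary per cell; the tree reconstruction itself only visits $2n-1$ nodes.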

Preview of undecidable CFL problems

- Is a given CFG $G$ ambiguous?
- Is a given CFL inherently ambiguous?
- Is the intersection of two CFLs empty?
- Are two CFLs the same?
- Is a given CFL equal to $\Sigma^*$, where $\Sigma$ is the alphabet of the language?

The Chomsky hierarchy

Language classes, from largest to smallest, with the machines that accept them:
- Recursively enumerable languages: Turing machines
- Context-sensitive languages: linear-bounded automata
- Context-free languages: pushdown automata
- Regular languages: finite automata

Context-sensitive grammars

The next grammar type, more powerful than CFGs, is a restricted form of the unrestricted grammar. A grammar is context-sensitive if all productions are of the form $x \to y$, where $x, y \in (V \cup T)^+$ and $|x| \le |y|$.

Fundamental property: the grammar is non-contracting, i.e., the length of successive sentential forms can never decrease.

Why "context-sensitive"? All productions can be rewritten in a normal form $xAy \to xvy$. Effectively, $A$ can be replaced by $v$ only in the context of a preceding $x$ and a following $y$.

Example: a CSG for $\{a^n b^n c^n : n \ge 1\}$:
$S \to abc \mid aAbc$
$Ab \to bA$
$Ac \to Bbcc$
$bB \to Bb$
$aB \to aa \mid aaA$

Try to derive $a^3 b^3 c^3$:
$S \Rightarrow aAbc \Rightarrow abAc \Rightarrow abBbcc \Rightarrow aBbbcc \Rightarrow aaAbbcc \Rightarrow aabAbcc \Rightarrow aabbAcc \Rightarrow aabbBbccc \Rightarrow aabBbbccc \Rightarrow aaBbbbccc \Rightarrow aaabbbccc$

$A$ and $B$ are messengers: an $A$ is created on the left and travels right to the first $c$, where it creates another $b$ and $c$. Then $B$ is sent back to create the corresponding $a$. This is similar to the way one would program a TM to accept the language.
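The derivation of $a^3 b^3 c^3$ can be checked mechanically: each step must rewrite exactly one occurrence of some left-hand side. In this sketch, rules are pairs of plain strings (our own encoding, with uppercase letters as variables and every symbol a single character); it also confirms that the grammar is non-contracting.

```python
# CSG for {a^n b^n c^n : n >= 1}; uppercase letters are variables.
CSG = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
       ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]


def is_noncontracting(rules):
    """Every right-hand side is at least as long as its left-hand side."""
    return all(len(lhs) <= len(rhs) for lhs, rhs in rules)


def one_step(rules, u, v):
    """True if v results from u by rewriting one occurrence of some lhs."""
    for lhs, rhs in rules:
        pos = u.find(lhs)
        while pos != -1:
            if u[:pos] + rhs + u[pos + len(lhs):] == v:
                return True
            pos = u.find(lhs, pos + 1)
    return False


def check_derivation(rules, forms):
    """Check every consecutive pair of sentential forms."""
    return all(one_step(rules, u, v) for u, v in zip(forms, forms[1:]))


derivation = ["S", "aAbc", "abAc", "abBbcc", "aBbbcc", "aaAbbcc", "aabAbcc",
              "aabbAcc", "aabbBbccc", "aabBbbccc", "aaBbbbccc", "aaabbbccc"]
print(is_noncontracting(CSG))             # True
print(check_derivation(CSG, derivation))  # True
```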

Linear-bounded automata

A linear-bounded automaton (LBA) is a limited Turing machine in which tape use is restricted: it may use only the part of the tape occupied by the input. In a looser formulation the tape is unbounded, but the amount that can be used is a function of the input; here we restrict the usable part of the tape to exactly the cells taken by the input. An LBA is assumed to be nondeterministic.

Relation between CSLs and LBAs

If a language $L$ is accepted by some linear-bounded automaton, then there is a context-sensitive grammar that generates $L$. Conversely, every sentential form in a derivation of $w$ from a CSG $G$ has length at most $|w|$, because any CSG is non-contracting, so a derivation can be simulated within the space occupied by the input.

That is all.
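The length bound in the last paragraph is exactly what makes membership decidable for a non-contracting grammar: a search never needs to look at sentential forms longer than $w$, and there are only finitely many of those. This is a minimal breadth-first sketch under the same string-pair rule encoding as an assumption (single-character symbols, uppercase variables); it explores every reachable form of length at most $|w|$, so it is exponential in general and is meant only to illustrate why the procedure terminates.

```python
from collections import deque


def csg_member(rules, start, w):
    """Decide whether w is in L(G) for a non-contracting grammar by breadth-first search."""
    if any(len(lhs) > len(rhs) for lhs, rhs in rules):
        raise ValueError("grammar is not non-contracting")
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == w:
            return True
        for lhs, rhs in rules:
            pos = u.find(lhs)
            while pos != -1:
                v = u[:pos] + rhs + u[pos + len(lhs):]
                # No rule shrinks a form, so anything longer than w is a dead end.
                if len(v) <= len(w) and v not in seen:
                    seen.add(v)
                    queue.append(v)
                pos = u.find(lhs, pos + 1)
    return False


CSG = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
       ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]
print(csg_member(CSG, "S", "aaabbbccc"))  # True
print(csg_member(CSG, "S", "aabbc"))      # False
```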