Limits of Computation p.1/?? Limits of Computation p.2/??

Similar documents
CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

CT32 COMPUTER NETWORKS DEC 2015

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Definition 2.8: A CFG is in Chomsky normal form if every rule. only appear on the left-hand side, we allow the rule S ǫ.

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Multiple Choice Questions

CMSC 330: Organization of Programming Languages

Languages and Compilers

CMSC 330: Organization of Programming Languages

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

PS3 - Comments. Describe precisely the language accepted by this nondeterministic PDA.

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Decision Properties for Context-free Languages

JNTUWORLD. Code No: R

CMSC 330: Organization of Programming Languages. Context Free Grammars

Models of Computation II: Grammars and Pushdown Automata

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

Regular Languages and Regular Expressions

Ambiguous Grammars and Compactification


Closure Properties of CFLs; Introducing TMs. CS154 Chris Pollett Apr 9, 2007.

Final Course Review. Reading: Chapters 1-9

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Context-Free Grammars

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

UNIT I PART A PART B

Actually talking about Turing machines this time

Yet More CFLs; Turing Machines. CS154 Chris Pollett Mar 8, 2006.

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Context-Free Languages and Parse Trees

Theory of Computation p.1/?? Theory of Computation p.2/?? A Turing machine can do everything that any computing

Syntax Analysis Part I

Lecture 8: Context Free Grammars

COP 3402 Systems Software Syntax Analysis (Parser)

CSE 105 THEORY OF COMPUTATION

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3)

Introduction to Syntax Analysis

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Homework. Context Free Languages. Before We Start. Announcements. Plan for today. Languages. Any questions? Recall. 1st half. 2nd half.

Definition: A context-free grammar (CFG) is a 4- tuple. variables = nonterminals, terminals, rules = productions,,

Reflection in the Chomsky Hierarchy

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

Context Free Languages and Pushdown Automata

ECS 120 Lesson 7 Regular Expressions, Pt. 1

ECS 120 Lesson 16 Turing Machines, Pt. 2

Normal Forms and Parsing. CS154 Chris Pollett Mar 14, 2007.

Introduction to Syntax Analysis. The Second Phase of Front-End

Syntax Analysis Check syntax and construct abstract syntax tree

University of Nevada, Las Vegas Computer Science 456/656 Fall 2016

Compiler Construction

Week 2: Syntax Specification, Grammars

Pushdown Automata. A PDA is an FA together with a stack.

Introduction to Computers & Programming

Non-deterministic Finite Automata (NFA)

CS402 - Theory of Automata Glossary By

CS525 Winter 2012 \ Class Assignment #2 Preparation

QUESTION BANK. Unit 1. Introduction to Finite Automata

Optimizing Finite Automata

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

Introduction to Lexing and Parsing

Formal Languages and Automata

Skyup's Media. PART-B 2) Construct a Mealy machine which is equivalent to the Moore machine given in table.

From Theorem 8.5, page 223, we have that the intersection of a context-free language with a regular language is context-free. Therefore, the language

CS311 / MATH352 - AUTOMATA AND COMPLEXITY THEORY

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

I have read and understand all of the instructions below, and I will obey the Academic Honor Code.

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Introduction to Parsing. Lecture 5

Context-Free Grammars

COMP 330 Autumn 2018 McGill University

Chapter 3: Lexing and Parsing

Theory of Computation

A Characterization of the Chomsky Hierarchy by String Turing Machines

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2

where is the transition function, Pushdown Automata is the set of accept states. is the start state, and is the set of states, is the input alphabet,

Formal Grammars and Abstract Machines. Sahar Al Seesi

Homework. Announcements. Before We Start. Languages. Plan for today. Chomsky Normal Form. Final Exam Dates have been announced

14.1 Encoding for different models of computation

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Introduction to Parsing. Lecture 5

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

CS143 Midterm Fall 2008

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

1. [5 points each] True or False. If the question is currently open, write O or Open.

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

Solutions to Homework 10

CSE 105 THEORY OF COMPUTATION

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Transcription:

Context-Free Grammars (CFG) There are languages, such as {0 n 1 n n 0} that cannot be described (specified) by finite automata or regular expressions. Context-free grammars provide a more powerful mechanism for language specification. Context-free grammars can describe features that have a recursive structure making them useful beyond finite automata. Limits of Computation p.2/?? Example CFG G1 has the following specification rules: A 0A1 A B B # Nonterminals of CFG G1 are {A,B} and A is the start symbol. Terminals of CFG G1 are {0, 1, #}. Limits of Computation p.4/?? 22c:135 Theory of Computation Hantao Zhang http://www.cs.uiowa.edu/ hzhang/c135 The University of Iowa Department of Computer Science Limits of Computation p.1/?? Formal Definition of a CFG A context-free grammar is a 4-tuple (V, Σ,R,S) where: 1. V is a finite set of symbols called the variables or nonterminals. 2. Σ is a finite set of symbols, disjoint from V, called terminals. 3. R is a finite set of rules (or specification rules) of the form lhs rhs, where lhs V, rhs (V Σ). 4. S V is the start variable. Limits of Computation p.3/??

Example derivation tree The derivation tree of the string 000#111 using CFG G1 is below: A A A A B 0 0 0 # 1 1 1 Limits of Computation p.6/?? CFG G2 The CFG G2 specifies a fragment of English SENT ENCE NOUN P HRASE V ERB P HRASE NOUN PHRASE CP NOUN CP NOUN PREP PHRASE V ERB PHRASE CP V ERB CP V ERB PREP PHRASE PREP PHRASE PREP CP NOUN CP NOUN ART ICLE NOUN CP V ERB V ERB V ERB NOUN PHRASE ART ICLE a the NOUN boy girl flower V ERB touches likes sees P REP with Limits of Computation p.8/?? Language specification A grammar is used for a language specification by generating each string of the language in following manner: 1. Write down the start variable; it is the lhs of the first specification rules, unless specified otherwise. 2. Find a variable that is written down and a rule whose lhs is that variable. Replace the written down variable with the rhs of that rule. 3. Repeat step 2 until no variables remain in the string thus generated Note: The sequence of substitutions used to obtain a string using a CFG is called a derivation and may be represented by a tree called a derivation tree or a parse tree. Limits of Computation p.5/?? Note All strings of terminals generated in this way constitute the language specified by the grammar We write L(G) for the language generated by the grammar G. Thus, L(G1) = {0 n #1 n n 0}. A language generated by a context-free grammar (CFG) is called a Context-Free Language, CFL. Limits of Computation p.7/??

Direct derivation If u,v,w (V Σ) (i.e., are strings of variables and terminals) and A w R (i.e., is a rule of the grammar) then we say that uav yields uwv, written uav uwv We may also say that uwv is directly derived from uav using the rule A w Limits of Computation p.10/?? Language Specified by G If G = (V, Σ,R,S) is a CFG then the language specified by G (or the language of G) is a CFL L(G) = {w Σ S w} Limits of Computation p.12/?? Example derivation with G2 SENTENCE NOUN PHRASE V ERB PHRASE CP NOUN V ERB PHRASE ARTICLE NOUN V ERB PHRASE a NOUN V ERB PHRASE a boy V ERB PHRASE a boy CP V ERB a boy V ERB a boy sees Limits of Computation p.9/?? Derivation We write u v if u = v or if a sequence u1,u2,...,uk (V Σ) exists, for k 0, and u1 u2... uk v We may also say that u1,u2,...,uk,v is a derivation of v from u1 Limits of Computation p.11/??

Important application Context-free grammars are used as basis for compiler design and implementation. Context-free grammars are used as specification mechanisms for programming languages Designers of compilers use such grammars to implement compiler s components, such a scanners, parsers, and code generators The implementation of almost any programming language is preceded by a context-free grammar that specifies it Limits of Computation p.14/?? Example Derivation with G4 E E + T T T * F F F a a a Limits of Computation p.16/?? More examples of CFGs Consider the grammar G3 = ({S}, {a,b}, {S asb SS ǫ},s) L(G3) contains strings such as abab, aaabbb, aababb; Note: if one think of a and b as ( and ) then we can see that L(G3) is the language of all strings of properly nested parentheses Limits of Computation p.13/?? Arithmetic Expressions Consider the grammar G4 = ({E, T, F }, {a, +,, (, )}, R, E) where R is: E E + T T T T F F F (E) a L(G4) is the language of arithmetic expressions Limits of Computation p.15/??

Design Techniques Many CFG are unions of simpler CFGs. Hence the suggestion is to construct smaller, simpler grammars first and then to join them into a larger grammar The mechanism of grammar combination consists of putting all their rules together and adding the new rules S S1 S2... Sk where the variables Si,1 i k, are the start variables of the individual grammars and S is a new variable Limits of Computation p.18/?? Second Design Technique Constructing a CFG for a regular language is easy if one can first construct a DFA for that language Conversion procedure: 1. Make a variable Ri for each state qi of DFA 2. Add the rule Ri arj to the CFG if δ(qi, a) = qj is a transition in the DFA 3. Add the rule Ri ǫ if qi is an accept state of the DFA 4. If q0 is the start state of the DFA make R0 the start variable of the CFG. Theorem: Every regular language is context-free. Limits of Computation p.20/?? Designing CFGs As with the design of automata, the design of CFGs requires creativity. CFGs are even trickier to construct than finite automata because we are more accustomed to programming a machine than we are to specify programming languages". Limits of Computation p.17/?? Example Grammar Design Design a grammar for the language {0 n 1 n n 0} {1 n 0 n n 0} 1. Construct the grammar S1 0S11 ǫ for the language {0 n 1 n n 0} 2. Construct the grammar S2 1S20 ǫ for the language {1 n 0 n n 0} 3. Put them together adding the rule S S1 S2 thus getting S S1 S2 S1 0S11 ǫ S2 1S20 ǫ Limits of Computation p.19/??

Example Application Consider the CFG G = ({S, B}, {a, b}, {S asb B ǫ, B bb b}, S). The following are derivations with G: S asb aasbb aasbbb, S asb aasbb aasbbb, S asb aasbb aasb, S asb aasbb aabb which show that derivations in this grammar are quite complex When rewriting the string aasbb we can consider further derivations of each of its symbols in isolation Derivations from B are B bb bbb b k 1 B b k, k 1 Therefore S asb asb k, k 1 Hence, L(G) = {a n b m n m}. Limits of Computation p.22/?? Ambiguous expressions Two different derivation trees for a+a*a E E E * E E + E E + E a a E * E a a a a Limits of Computation p.24/?? Third Design Technique Certain CFLs contain strings with two related substrings such as 0 n and 1 n in {0 n 1 n n 0}. Example of relationship: to recognize such a language a machine would need to remember an unbounded amount of info about one of the substrings A CFG that handles this situation uses a rule of the form R urv which generates strings wherein the portion containing u s corresponds to the portion containing v s. Limits of Computation p.21/?? Ambiguity If a CFG generates the same string in several different ways, we say that the string is derived ambiguously in that grammar. If a CFG generates some string ambiguously we say that the grammar is ambiguous. Example: the grammar G5, whose rules are: E E + E E E (E) a generate ambiguously some arithmetic expressions. Limits of Computation p.23/??

Note When a grammar generates a string ambiguously it means that the string has two different derivation trees. Two different derivations however, may produce the same derivation tree because they may differ in the order in which they replace nonterminals not in the rules they use. To concentrate on the structure of derivations we need to fix the order of rule application. Limits of Computation p.26/?? Inherent Ambiguity Some CFL can have both ambiguous and unambiguous grammars. Some CFL, however, can be generated only by an ambiguous grammar. A CFL that can be generated only by ambiguous grammars is called inherently ambiguous. Example of inherently ambiguous language: {0 i 1 j 2 k i = j j = k}. Limits of Computation p.28/?? Note The grammar G5 does not capture the usual precedence relations and so groups the + before * and vice versa. In contrast, the grammar G4 generates the same language, but every generated string has a unique derivation tree. Hence, G5 is ambiguous and G4 is not, i.e., G4 is unambiguous. G4 = ({E, T, F }, {a, +,, (, )}, R, E) where R is: E E + T T T T F F F (E) a Limits of Computation p.25/?? Fixing rule application order Leftmost derivation: a derivation of a string w in a grammar G is a leftmost derivation if at every step the leftmost nonterminal is replaced. Rightmost derivation: a derivation of a string w in a grammar G is a rightmost derivation if at every step the rightmost nonterminal is replaced. Note The leftmost and rightmost derivations of a string w are unique, so they are equivalent to the derivation tree. Limits of Computation p.27/??

Definition A context-free grammar G is in Chomsky normal form if every rule is of the form: A BC A a where a is a terminal, A,B,C are nonterminals, and B,C may not be the start variable. Note: the rule S ǫ, where S is the start variable, is not excluded. Limits of Computation p.30/?? Proof Add a new start symbol S0 and the rule S0 S where S was the original start symbol. Note: this change guarantees that the start symbol does not occur on the rhs of any rule. Limits of Computation p.32/?? Chomsky Normal Form It is often convenient to simplify CFG. One of the simplest and most useful simplified forms of CFG is called the Chomsky normal form Another normal form usually used in algebraic specifications is Greibach normal form. Limits of Computation p.29/?? Theorem 2.9 Any context-free language is generated by a context-free grammar in Chomsky normal form. Proof idea: Show that any grammar G can be converted into Chomsky normal form. Conversion procedure has several stages where the rules that violate Chomsky normal form conditions are replaced with equivalent rules that satisfy these conditions. Order of transformations:(1) add a new start variable, (2) eliminate all ǫ-rules, (3) eliminate unit-rules, (4) convert rules. Check that the obtained grammar define the same language as the initial. Limits of Computation p.31/??

Remove unit rules Repeat 1. Remove a unit rule A B. 2. For each rule B u that appears, add the rule A u, unless it was a unit rule previously removed. until all unit rules are eliminated. Note: u is a string of variables and terminals. Limits of Computation p.34/?? Example CFG conversion Consider the grammar G6 whose rules are: S ASA ab A B S B b ǫ After first step of transformation we get: S0 S S ASA ab A B S B b ǫ Limits of Computation p.36/?? Eliminate ǫ-rules Repeat 1. Eliminate the ǫ rule A ǫ where A is not the start symbol. 2. For each occurrence of A on the rhs of a rule, add a new rule with that occurrence of A deleted Example: To delete A ǫ, replace B uav by B uav uv; replace R uavaw by B uavaw uvaw aavw uvw. 3. Replace the rule B A, (if it is present) by B A ǫ unless the rule B ǫ has not been previously eliminated. until all ǫ rules are eliminated. Limits of Computation p.33/?? Convert all remaining rules Repeat 1. Replace a rule A u1u2...uk, k 3, where each ui, 1 i k, is a variable or a terminal, by: A u1a1, A1 u2a2,..., Ak 2 uk 1uk where A1,A2,..., Ak 2 are new variables. 2. If k 2 replace any terminal ui with a new variable Ui and add the rule Ui ui. until no rules of the form A u1u2...uk with k 3 remain. Limits of Computation p.35/??

Removing unit rule Removing S S: S0 S S ASA ab a SA AS A B S B b Removing S0 S: S0 ASA ab a SA AS S ASA ab a SA AS A B S B b Limits of Computation p.38/?? Converting the remaining rules S0 AA1 UB a SA AS S AA1 UB a SA AS A b AA1 UB a SA AS A1 SA U a B b Limits of Computation p.40/?? Removing ǫ rules Removing B ǫ: S0 S S ASA ab a A B S ǫ B b Removing A ǫ: S0 S S ASA ab a SA AS S A B S B b More unit rules Removing A B: S0 ASA ab a SA AS S ASA ab a SA AS A S b B b Removing A S: S0 ASA ab a SA AS S ASA ab a SA AS A b ASA ab a SA AS B b Limits of Computation p.37/?? Limits of Computation p.39/??

PDA and CFG PDA are equivalent in specification power with CFG This is useful because it gives us two options for proving that a language is context-free: 1. construct a CFG that generates the language or 2. construct a PDA that recognizes the language Limits of Computation p.42/?? Schematic representation of a PDA The schematic representations of PDA and NFA: Schematic of a NFA Finite Input a a b b Control Schematic of a PDA Finite Control Input a a b b Stack x y z... Limits of Computation p.44/?? A new type of computation model Pushdown automata, PDA, are a new type of computation model PDAs are like NFAs but have an extra component called a stack The stack provides additional memory beyond the finite amount available in the control The stack allows PDA to recognize some nonregular languages Limits of Computation p.41/?? Note Some CFL are more easily described in terms of their generators, whereas others are more easily described in terms of their recognizers Limits of Computation p.43/??

Benefits of the stack A stack can hold an unlimited amount of information A PDA can recognize a language like {0 n 1 n n 0} because it can use the stack to remember the number of 0s it has seen Informal algorithm: 1. Read symbols from the input. As each 0 is read push it onto the stack 2. As soon as a 1 is read, pop a 0 off the stack for each 1 read 3. If input finishes when stack becomes empty accept; if stack becomes empty while there is still input or input finishes while stack is not empty reject Limits of Computation p.46/?? Toward PDA formalization Formal definition of a PDA is similar to a NFA, except the stack PDA may use different alphabets for input and stack, we will denote them by Σ and Γ Nondeterminism allows PDA to make transitions on empty input. Hence we will use Σǫ = Σ {ǫ} and Γǫ = Γ {ǫ} Domain of the PDA transition function is Q Σǫ Γǫ where Q is the set of states Since a PDA can write on the stack while performing nondeterministic transitions the range of the PDA transition function is P(Q Γǫ) In conclusion: δ : Q Σǫ Γǫ P(Q Γǫ) Limits of Computation p.48/?? Terminology Writing a symbol on the stack is referred to as pushing the symbol Removing a symbol from the stack is referred to as popping the symbol All access to the stack may be done only at the top In other words: a stack is a last-in first-out" storage device Limits of Computation p.45/?? Nature of PDA PDA may be nondeterministic and this is a crucial feature because nondeterminism adds power Some languages, such as {0 n 1 n n 0} do not require nondeterminism, but other do. Such a language is {ww R w {0, 1} } Limits of Computation p.47/??

PDA computation A PDA M = (Q, Σ, Γ,δ,q0,F) computes as follows: M inputs w = w1w2...wm, where each wi Σǫ There are a sequence of states r0, r1,...,rm Q and a sequence of strings s0, s1,...,sn Γ that satisfy the conditions: 1. r0 = q0,s0 = ǫ, i.e., M starts in the start state with empty stack 2. For i = 0, 1,...,m 1 we have (ri+1, b) δ(ri, wi+1, a) where si = at and si+1 = bt for some a, b Γǫ and t Γ, i.e., M moves properly according to the state, stack, and input symbol 3. rm F, i.e., an accept state occurs at the input end Limits of Computation p.50/?? Transition diagrams of PDA We can use state transition diagrams to describe PDA Such diagrams are similar to state transition diagrams used to describe finite automata To show how PDA uses the stack we write a,b c" to signify that the machine: 1. is reading a from input 2. may replace the symbol b on top of the stack 3. with the symbol c where any of a, b, c may be ǫ Limits of Computation p.52/?? Definition 2.13 A pushdown automaton is a 6-tuple (Q, Σ, Γ,δ,q0,F) where Q, Σ, Γ, and F are finite sets, and: 1. Q is a set of states 2. Σ is the input alphabet 3. Γ is the stack alphabet 4. δ : Q Σǫ Γǫ P(Q Γǫ) is the transition function 5. q0 Q is the start state 6. F Q is the set of accept states Limits of Computation p.49/?? Example PDA The PDA that recognizes the language {0 n 1 n n 0} is M1 = (Q, Σ, Γ,δ,q1,F) where: Q = {q1, q2, q3, q4}, Σ = {0, 1}, Γ = {0, $}, F = {q1, q4} δ is defined by the table: Σǫ 0 1 ǫ Q/Γǫ 0 $ ǫ 0 $ ǫ 0 $ ǫ q1 {q2, $} q2 {(q2, 0)} {(q3, ǫ)} q3 {(q3, ǫ)} {(q4, ǫ)} q4 Limits of Computation p.51/??

Transition diagram of PDA M1 The state transition diagram of M1 recognizing the language {0 n 1 n n 0} ǫ, ǫ q1 $ 0, ǫ 0 q2 1, 0 ǫ 1, 0 ǫ q3 q4 ǫ,$ ǫ Limits of Computation p.54/?? Example 2.14 This example provides the PDA M2 that recognizes the language {a i b j c k i, j, k 0 i = j i = k} Informal description: The PDA first reads and push a s on the stack When this is done, it cam match as with bs or cs Since machine does not know in advance whether as are matched with bs or cs, nondeterminism helps Using nondeterminism the machine can guess: the machine has two branches, one for each possible guess If either of these branches match, that branch accepts and the entire machine accepts Limits of Computation p.56/?? Interpretation If a is ǫ, the machine may make this transaction without reading any symbol from the input If b is ǫ, the machine may make this transaction without reading and popping any symbol from the stack If c is ǫ, the machine may make this transaction without writing any symbol on the stack Limits of Computation p.53/?? Note 1 The formal definition of a PDA contains no explicit mechanism for testing the empty stack PDA M1 tests empty stack by initially placing a special symbol, $, on the stack If ever it sees $ again on the stack, it knows that the stack is effectively empty This is the procedure we use to test empty stack Limits of Computation p.55/??

Example 2.18 This example provides the PDA M3 that recognizes the language {ww R w {0, 1} }. Recall that w R means w written backwards. Limits of Computation p.58/?? Transition diagram of PDA M3 The state transition diagram of M3 recognizing the language {ww R w {0, 1}} ǫ, ǫ q1 $ 0, ǫ 0 q2 1, ǫ 1 ǫ, ǫ ǫ q3 q4 ǫ,$ ǫ 0, 0 ǫ 1, 1 ǫ Limits of Computation p.60/?? M2 recognizing {a i b j c k i, j, k 0 i = j i = k} The transition diagram q1 ǫ, ǫ $ q2 ǫ, ǫ ǫ a, ǫ a ǫ, ǫ ǫ b, a ǫ q3 q5 b, ǫ ǫ ǫ, $ ǫ q4 ǫ, ǫ ǫ c, ǫ ǫ q6 c, a ǫ ǫ, $ ǫ q7 Limits of Computation p.57/?? Informal description 1. M3 begins by pushing the symbols that are read onto the stack 2. At each point M3 nondeterministically guesses that the middle of the input has been reached 3. When middle of the word has reached the machine starts popping off the stack for each symbol read, checking to see that what is read and what is popped off is the same symbol 4. If the symbol read from the input and popped out from the stack is the same and the stack empties as the same time as input is finished, accept; otherwise reject The transition diagram of M3 is next. Limits of Computation p.59/??

Theorem 2.20 A language is context-free iff some pushdown automaton recognizes it. Note: This means that: 1. if a language L is context-free then there is a PDA ML that recognizes it 2. if a language L is recognized by a PDA ML then there is a CFG GL that generates L. Limits of Computation p.62/?? What is a derivation? Recall: For G = (V, Σ,R,S), w L(G): A derivation of w is a sequence of substitutions S S α1 w1 A α2... B α k w, S α1, A α2,...,b αk R, S, A,..., B not necessarily distinguished, made as the grammar generates w Each step in the derivation yields an intermediate string of variables and terminals Hence, P will determine whether some series of substitutions using rules in R can lead from start variable S to w Limits of Computation p.64/?? Motivation CFG and PDA are equivalent in power: both specify context-free languages We show here how to convert a CFG into a PDA that recognizes the language specified by the CFG and vice versa Application: this equivalence allows a CFG to be used to specify a programming language and the equivalent PDA to be used to specify and implement its compiler. Limits of Computation p.61/?? Lemma 2.21 If a language is context-free then some pushdown automaton recognizes it Proof idea: 1. Let A be a CFL. From the definition we know that A has a CFG G, generating it 2. We will show how to convert G into a PDA P that accepts strings w if G generates w 3. P will work by determining a derivation of w. Limits of Computation p.63/??

Note If while consuming w, P arrives at a string of terminals that equals w then accept; otherwise reject Limits of Computation p.66/?? The way around Try to reconstruct the leftmost derivation of w Keep only part of the intermediate string on the stack starting with the first variable in the intermediate string. Any terminal symbol appearing before the first variable can be matched with symbols in the input. The graphic image of P is next. Limits of Computation p.68/?? Difficulties expected How should we figure out which substitution to make? PDA nondeterminism allows us to guess At each step of the derivation one of the rules for a particular variable is selected nondeterministically How does P starts? P begins by writing the start variable on the stack and then continues working this string Limits of Computation p.65/?? More questions The initial string (the start variable) is on the stack. How does P stores the other intermediate strings? Using the stack doesn t quite work because the PDA needs to find the variables in the intermediate string and make substitution. Note: stack does not support this because only the top is accessible Limits of Computation p.67/??

Informal description of P Place the marker symbol $ and the start variable on the stack Repeat 1. If the top of the stack is a variable symbol A, nondeterministically select a rule r such that lhs(r) = A and substitute A by the string rhs(r). 2. If the top of the stack is a terminal symbol, a, read the next input symbol and compare it with a. If they match pop the stack; if they don t match reject on this branch of nondeterminism 3. If the top of the stack is the symbol $, enter the accept state: if all text has been read accept, otherwise reject. until accept or reject Limits of Computation p.70/?? Formal construction Let q,r Q, a Σǫ and s Γǫ. Assume that we want P to go from q to r when it reads a, and replaces s by the string u = u1...uk in the stack: δ(q,a,s) has (r,u) That is, we want to extend the transition function δ from δ : Q Σǫ Γǫ P(Q Γǫ) to δ : Q Σǫ Γǫ P(Q Γ ) Limits of Computation p.72/?? An intermediate string P representing 01A1A0 stack $ 0 A 1 A Intermediate string 0 1 A 1 A 0 matched A1A0 R Input Control 0 1 1 0 0 1 Limits of Computation p.69/?? Proof of lemma 2.21 Now we can give formal details of the constriction of the PDA P = (Q, Σ, Γ,δ,q1,F) First we introduce an appropriate notation for transition function that provides a way to write an entire string rhd(r) on the stack in one step of the machine Simulation: this action can be simulated by introducing additional states to write the string symbol by symbol Limits of Computation p.71/??

Graphic Implementing the shorthand (r, xyz) δ(q, a, s) q a, s xyz r q a, s z q1 ǫ, ǫ y ǫ, ǫ x r q2 Limits of Computation p.74/?? Main loop transitions 1. First we handle the case where the top of the stack is a variable, by setting: δ(qloop, ǫ, A) = {(qloop, w) A w R} where R is the set of rules of CFG generating the language 2. Then we handle the case where the top of the stack is a terminal, setting: δ(qloop, a, a) = {(qloop, ǫ)} 3. Finally, if the top of stack is $ we set: δ(qloop, ǫ, $) = {(qaccept, ǫ)} The state diagram of P is next. Limits of Computation p.76/?? Implementation This construction can be implemented by introducing the new states q1,...,qk 1 and setting transition function as follows: δ(q, a, s) has (q1, uk), δ(q1, ǫ, ǫ) = {(q2, uk 1)}, δ(q2, ǫ, ǫ) = {(q3, uk 2)},...... δ(qk 1, ǫ, ǫ) = {(r, u1)} Limits of Computation p.73/?? Construction of P The states of P are Q = {qstart,qloop,qaccept} E where E is the set of states that we need to implement the shorthand The transition function is defined as follows: Initialize the stack to contain $ and S, i.e., δ(qstart, ǫ, ǫ) = {(qloop, S$)} Construct transitions for the main loop Limits of Computation p.75/??

Example 2.25 We use procedure developed during the proof of Lemma 2.21 to construct the PDA PG that recognizes the language generated by the CFG G with the rules: S atb b T Ta ǫ A direct application of the construction leads us to the PDA in the slide. Limits of Computation p.78/?? Lemma 2.27 If a language is recognized by a pushdown automaton then that language is a context-free language. The proof is omitted. Limits of Computation p.80/?? State diagram of P qstart ǫ, ǫ S$ qloop ǫ,$ ǫ qaccept ǫ, A a w ǫ for rule A w a, for terminal a State diagram of PG qstart ǫ, ǫ S$ ǫ,$ ǫ qloop qaccept 2 ǫ, S b ǫ, T ǫ a, a ǫ b, b ǫ ǫ, S b ǫ, ǫ T ǫ, ǫ a ǫ, T a ǫ, ǫ T Limits of Computation p.77/?? Limits of Computation p.79/??

Informal Pumping lemma for context-free languages states that every CFL has a specific value called pumping length such that all longer strings in the language can be pumped However, the meaning of pumping is a bit more complex than in case of regular languages Here pumping means that a string can be divided into five parts so that the second and fourth parts may be repeated any number of times and the resulting string is in the language Limits of Computation p.82/?? Interpretation Condition 1 states that length of strings of A can be unlimited but have a fixed structure uvxyz Condition 2 says that in the structure uvxyz either v or y is not empty; otherwise theorem would be trivially true Condition 3 states that pieces v, x, y together have length at most p; this is useful for proving that some languages are not context-free Limits of Computation p.84/?? Proving context-freeness We develop here a mechanism for proving that a given language is or not context-free This mechanism starts by recalling that pumping lemma was used for proving that a given language is or not regular. Here we present a similar lemma for context-free languages Limits of Computation p.81/?? Pumping lemma for CFL Theorem 2.34: If A is a context-free language, then there is a number p (the pumping length) where, if s A and s p, then s may be divided into five pieces, s = uvxyz satisfying the conditions: 1. For each i 0, uv i xy i z A 2. vy 0 3. vxy p Limits of Computation p.83/??

Note Repetition of X in Ds noticed before allows us to replace the subtree under the second occurrence of X with the subtree under the first occurrence of X and still get a legal derivation tree. Hence, we may cut s into five pieces, as shown in the next slide, and we may repeat the second and fourth piece and obtain a string uv i xy i z A, for any i 0 Limits of Computation p.86/?? Formal proof Let G = (V, Σ,R,S) be a CFG that generates A. Let b 2 be the maximum number of symbols in the rhs of rules in R, i.e. b = max{ rhs(r) r R}. In any derivation tree D using G we know that a node cannot have more than b children; i.e., at most b leaves are in step 1 from S; at most b 2 leaves in step 2; at most b h in step h. If the hight of D is at most h the length of the string generated is at most b h If V is the number of variables in G, then we set p to b V +2 ; since b 2, p > b V +1. Consequently, a derivation tree of any string from A of length at least p requires a hight at least V + 2 Limits of Computation p.88/?? Proof idea Let A be a CFL and G be the CFG generating A. We must show that any sufficiently long s A can be pumped and remains in A. The idea behind this proof is: 1. Because s A it is derivable from G and has a derivation tree Ds 2. The derivation tree Ds of s must be very tall because s is very long 3. This means that Ds contains some long path from start variable at the root to one of terminal symbols at a leaf 4. On this long path some variable symbol X must be repeated because of pigeonhole principle Limits of Computation p.85/?? Vary tall trees T... X... X u v x y z T... X... X... u v X y z v x y T... X x u z Limits of Computation p.87/??

Formal proof, continuation Now we divide s into uvxyz. Each occurrence of X has a subtree under it, generating a portion of s The upper occurrence of X has the larger subtree and generates vxy, whereas the lower occurrence generates just x Since both of the trees mentioned above are generated by X we may substitute one for the other and still obtain a valid derivation tree Replacing the smaller tree by the larger repeatedly, gives derivation trees for the string uv i xy i z, at each i > 1. Replacing the larger tree by the smaller one generates the string uxz. This establishes condition 1 of the lemma. Limits of Computation p.90/?? Condition 3 Now we need to be sure that vxy p In the derivation tree Ds the upper occurrence of X generates vxy We chose X so that both occurrences fall within the bottom V + 1 variables on the path and we chose the longest path in Ds Hence, the subtree where X generates vxy is at most V + 2 high. A tree of this height can generates a string of length at most b V +2 = p Limits of Computation p.92/?? Formal proof, continuation Suppose s A and s p. We will show how to pump s Let Ds be the derivation tree of s; if s has several derivation trees we choose Ds to be the tree with the smallest number of nodes Since s p we know that Ds has hight at least V + 2, so the longest path in Ds is also V + 2 The longest path in Ds has at least V + 1 nodes labeled by variables because only the leaf is a terminal Since there are only V variables in V, some variable X appear more than once on the longest path in Ds; we may assume that X is the closest variables to the leaf that repeat on this path. Limits of Computation p.89/?? Condition 2 To get condition 2 we must ensure that not both v and y are ǫ. If both v and y were ǫ the derivation tree obtained by substituting the smaller tree for the larger tree would have fewer nodes than Ds and would still generates s. This is impossible because we have chosen Ds to be the derivation tree for s with the smallest number of nodes Limits of Computation p.91/??

Checking condition 2 Condition 2 stipulates that either v or y is not empty. 1. When both v and y contain only one type of symbols (a,b,c) v does not contain both a s and b s or both b s and c s; the same hold for y. In this case uv 2 xy 2 z cannot contain equal number of a s, b s and c s., hence it cannot be in B which violates condition 1 of the lemma 2. When either v of y contain more than one type of symbols (a,b,c) uv 2 xy 2 z may contain equal numbers of a s, b s, c s but they don t come in the right order. Hence, it cannot be in B either One of these cases must occur. Because both cases result in contradiction, a contradiction is unavoidable so the assumption B is a CFL must be false Limits of Computation p.94/?? Case 1 Because v or y contains one symbol, one of the symbols a, b, or c does not appear in v or y. 1. The a s do not appear in v and y. Consider the string uv 0 xy 0 z = uxz which contains the same number of a s as s but it contain fewer b s or fewer c s. Therefore uxz is not a member of C. 2. The b s do not appear in v and y. Since not both v and y may be empty strings, a s or c s must appear in v or y. If a s appear the string uv 2 xy 2 z contains more a s than b s so it is not in C; if c s appear, the string uv 0 xy 0 z contains more b s than c s so it is not in C. Either way we obtain a contradiction 3. The c s do not appear. Then uv 2 xy 2 z contains more a s or b s than c s so it is not in C, and a contradiction occurs. Limits of Computation p.96/?? Example 2.36 Use pumping lemma to show that the language B = {a n b n c n n 0} is not context-free Proof: assume that B is context-free to obtain a contradiction Let p be the pumping length for B that is guaranteed to exists by pumping lemma Consider the string s = a p b p c p B of length at least p The pumping lemma states that s can be pumped and here we show that it cannot be pumped I.e, we will show that no matter how we divide s into uvxyz one of the three conditions of the pumping lemma is violated Limits of Computation p.93/?? Example 2.37 Use pumping lemma to show that C = {a i b j c k 0 i j k} is not a context-free language Proof: assume that C is CFL and obtain a contradiction. Let p be the pumping length and s = a p b p c p. We will try to pump it down and pump up. Let s = uvxyz and again consider two cases 1. Both v and y contain only one of the symbols a,b,c 2. Either v or y contain more than one of symbols a,b,c Limits of Computation p.95/??

Example 2.38 Use pumping lemma to show that D = {ww w {0, 1} } is not a CFL Proof: assume that D is CFL and obtain a contradiction. Let p be the pumping length given by pumping lemma. However, the choosing of s is less obvious 1. Try s = 0 p 10 p 1. But we may chose: u = 0 p 1, v = 0, x = 1, y = 0, z = 0 p 1 1 and we can see that it can be pumped. 2. Another candidate is s = 0 p 1 p 0 p 1 p. We shows that this string cannot be pumped using condition 3 of the pumping lemma. Limits of Computation p.98/?? Closure Properties of CFL Closed under union, Closed under concatenation Closed under star operation Closed under intersection with a regular language Limits of Computation p.100/?? Case 2 When either v or y contain more than one of a, b, c, then uv 2 xy 2 z will not contain the symbols a, b, c in the correct order. Hence it cannot be a member of C and a contradiction occurs. Conclusion : s cannot be pumped in violation of pumping lemma and C is not a CFL. Limits of Computation p.97/?? Using s = 0 p 1 p 0 p 1 p Assume that s can be pumped and set s = uvxyz where vxy p. String vxy must contain the midpoint of s. If vxy would occur only in the first half of s, pumping uv 2 xy 2 z moves a 1 into the first position of the second half and so it cannot be of the form ww. Similarly if vxy is in the second part of s. pumping uv 2 xy 2 z moves a 0 into the last position of the first half of s so it cannot be of the form ww. If vxz contains the midpoint of s, when we pump s down to uxz, it has the form 0 p 1 i 0 j 1 p where i and j cannot be both p. Hence this string is not of the form ww either. Hence, s cannot be pumped and D is not a CFL Limits of Computation p.99/??