Context-Free Grammars 1
Informal Comments A context-free grammar is a notation for describing languages. It is more powerful than finite automata or RE s, but still cannot define all possible languages. Useful for nested structures, e.g., parentheses in programming languages. 2
Informal Comments (2) Basic idea is to use variables to stand for sets of strings (i.e., languages). These variables are defined recursively, in terms of one another. Recursive rules ( productions ) involve only concatenation. Alternative rules for a variable allow union. 3
Context Free Grammar G = (V,, P, S) V: Alphabet of non-terminals : Alphabet of terminals P: set of productions of the form A -> where A V and (V U )* S: an element of V, called the start symbol 4
Example: CFG for { 0 n 1 n n > 1} Productions: S -> 01 S -> 0S1 Basis: 01 is in the language. Induction: if w is in the language, then so is 0w1. 5
CFG Formalism Terminals = symbols of the alphabet of the language being defined. Variables = nonterminals = a finite set of other symbols, each of which represents a language. Start symbol = the variable whose language is the one being defined. 6
Productions A production has the form variable -> string of variables and terminals. Convention: A, B, C, are variables. a, b, c, are terminals., X, Y, Z are either terminals or variables., w, x, y, z are strings of terminals only.,,, are strings of terminals and/or variables. 7
Example: Formal CFG Here is a formal CFG for { 0 n 1 n n > 1}. Terminals = {0, 1}. Variables = {S}. Start symbol = S. Productions = S -> 01 S -> 0S1 8
Derivations Intuition We derive strings in the language of a CFG by starting with the start symbol, and repeatedly replacing some variable A by the right side of one of its productions. That is, the productions for A are those that have A on the left side of the ->. 9
Derivations Formalism We say A => if A -> is a production. A binary relation over Example: S -> 01; S -> 0S1. S => 0S1 => 00S11 => 000111. 10
Iterated Derivation =>* means zero or more derivation steps. A binary relation over, which relates 2 strings Basis: =>* for any string. Induction: if =>* and =>, then =>*. 11
Example: Iterated Derivation S -> 01; S -> 0S1. S => 0S1 => 00S11 => 000111. So S =>* S; S =>* 0S1; S =>* 00S11; S =>* 000111. 12
Another Example Terminals = {a, b}. Variables = {S}. Start symbol = S. Productions = S -> asa S -> bsb S -> 13
Sentential Forms Any string of variables and/or terminals derived from the start symbol is called a sentential form. Formally, is a sentential form iff S =>*. 14
Language of a Grammar If G is a CFG, then L(G), the language of G, is {w S =>* w}. Note: w must be a terminal string, S is the start symbol. Example: G has productions S -> ε and S -> 0S1. L(G) = {0 n 1 n n > 0}. Note: ε is a legitimate right side. 15
Context Free Grammar A grammar G = (V,, P, S) is a context free grammar if each element of P is of the form: X -> where X V and (V U )* 16
Context-Free Languages A language L is a context-free language if there is a context-free grammar G such that L = L (G) L(G) is the language generated by the grammar G There are CFL s that are not regular languages, such as the example just given. But not all languages are CFL s. Intuitively: CFL s can count two things, not three. 17
Another Example S -> ab S -> ba A -> as A baa A -> a B -> bs B -> abb B -> b V = {S, A, B}, = {a,b}, P, S 18
Derivation Example S => ab => aabb => aabbs => aabbba => aabbba Questions: Choice between terminals in the RHS to replace. Also, which rule to use 19
Leftmost and Rightmost Derivations Derivations allow us to replace any of the variables in a string. Leads to many different derivations of the same string. By forcing the leftmost variable (or alternatively, the rightmost variable) to be replaced, we avoid these distinctions without a difference. 20
S => ab => aabb => aabsb => aababb => aababb => aababb Derivation Example 21
Derivation Example Any string x such that S =>* x, x *, will always have a lm derivation Every NT is rewritten independent of its context A -> x does not constrain us on which context A occurs in Context free 22
Leftmost Derivations Say wa => lm w if w is a string of terminals only and A -> is a production. Also, =>* lm if becomes by a sequence of 0 or more => lm steps. 23
Example: Leftmost Derivations Balanced-parentheses grammmar: S -> SS (S) () S => lm SS => lm (S)S => lm (())S => lm (())() Thus, S =>* lm (())() S => SS => S() => (S)() => (())() is a derivation, but not a leftmost derivation. 24
Rightmost Derivations Say Aw => rm w if w is a string of terminals only and A -> is a production. Also, =>* rm if becomes by a sequence of 0 or more => rm steps. 25
Example: Rightmost Derivations Balanced-parentheses grammmar: S -> SS (S) () S => rm SS => rm S() => rm (S)() => rm (())() Thus, S =>* rm (())() S => SS => SSS => S()S => ()()S => ()()() is neither a rightmost nor a leftmost derivation. 26
Examining a grammar G1 = {{S}, {0, 1}, {S -> 01; S -> 0S1}, S} Claim: G1 generates precisely all strings of the form 0 n 1 n, n >= 1 To show: L(G1) {0 n 1 n, n >= 1} and {0 n 1 n, n >= 1} L(G1) 27
Examining a grammar Basis: n = 1, S = 01 IH: G1 generates all strings of the form 0 n 1 n To show 0 n+1 1 n+1 can be generated. We can prove S =>* 0 n S1 n always Assume S =>* 0 n S1 n and show S =>* 0 n+1 S1 n+1 Other side: This grammar generates strings only of this form 28
S -> ab S -> ba A -> as A baa A -> a B -> bs B -> abb B -> b The other grammar G: V = {S, A, B}, = {a,b}, P, S 29
A word about the grammar L(G) = {x {a, b}*, x >= 2 such that x has equal number of a s and b s} If S =>* x {a, b}*, then x has equal number of a s and b s and any string with equal number of a s and b s can be generated from S If A =>* x {a, b}*, then x has one more a than b and any string with one more a than b can be generated from A If B =>* x {a, b}*, then x has one more b than a and any string with one more b than a can be generated from B 30
P1: L(G) = {x {a, b}* S =>* x} P2: {x {a, b}* no. of a s and b s are equal} Q1: L(G) = {x {a, b}* A =>* x} Q2: {x {a, b}* no. of a s is one more} R1: L(G) = {x {a, b}* B =>* x} R2: {x {a, b}* no. of b s is one more} To show P1 = P2, Q1 = Q2 and R1=R2 Induction over the length of the string helps us show: P2 P1, Q2 Q1, R2 R1 Induction over the length of the derivation helps us show: P1 P2, Q1 Q2, R1 R2 31
Proof outline for P2 P1 Simultaneous induction Base case: S =>* ab and S =>* ba IH: Any string w {a, b}* with n a s and n b s can be generated by S, i.e. S =>* w, w has n a s and n b s To show: Any string x with n+1 a s and n+1 b s can be generated by S, i.e. S =>* x, x has n+1 a s and n+1 b s Examine the structure of x, will start with either a or b Case 1: x = ay y has one more b than a B can generate it, hence B =>* y [IH] Hence, we can use S =>* ab to generate x Case 2: x = bz z has one more a than b A can generate it, hence A =>* z [IH] Hence, we can use S =>* ba to generate x Hence, proved. 32
Proof outline for Q2 Q1 Base case: A =>* a IH: Any string w {a, b}* with n a s and n-1 b s can be generated by A, i.e. S =>* w, w has n a s and n-1 b s To show: Any string x with n+1 a s and n b s can be generated by A, i.e. A =>* x, x has n+1 a s and n b s Examine the structure of x, will start with either a or b Case 1: x = ay y has equal number of a s and b s, which can be generated by S To generate x, we can use A =>* as and thus, A =>* as =>* ay Case 2: x = bz z has 2 more a s than b s Z can be written as z = vw, where each of v and w have one more a than B => Both v and w can be generated by A Therefore, to generate x, we can use A =>* baa =>* bvw =>* bz 33
To show: P1 P2 Q1 Q2 R1 R2 1. Every string over {a,b}* derived from S has equal no. of a s and b s 2. Every string over {a,b}* derived from A has one more a 3. Every string over {a,b}* derived from B has one more b Induction over the length of the derivation S => 1 => 2. => n {a, b}* Base case: do not get a terminal string in one step from S For S, base case will be 2-step derivation: ab and ba (can verify both can be derived from S) For A, can derive a in 1 step but no other terminal string in 2 steps For B, can derive b in 1 step but no other terminal string in 2 steps 34
IH: All derivations up-to length k satisfies (1)-(3) Consider a derivation of length k + 1, starting with S S => 1 => 2. => k+1 {a, b}* First step: it can be either S => ab or ba Assume we take S => ab => 2. => k+1 Argue about k+1 being generated from B 35
Parse Trees Definitions Relationship to Left- and Rightmost Derivations Ambiguity in Grammars 36
Parse Trees Parse trees are trees labeled by symbols of a particular CFG. Leaves: labeled by a terminal or ε. Interior nodes: labeled by a variable. Children are labeled by the right side of a production for the parent. Root: must be labeled by the start symbol. 37
Example: Parse Tree S -> SS (S) () S S S ( S ) ( ) ( ) 38
Yield of a Parse Tree The concatenation of the labels of the leaves in left-to-right order That is, in the order of a preorder traversal. is called the yield of the parse tree. Example: yield of S is (())() S S ( S ) ( ) ( ) 39