CSE 311 Lecture 21: Context-Free Grammars
Emina Torlak and Kevin Zatloukal
Topics
Regular expressions: a brief review of Lecture 20.
Context-free grammars: syntax, semantics, and examples.
Regular expressions
A brief review of Lecture 20.
Sets of strings as languages
A language is a set of strings with specific syntax, e.g.:
Syntactically correct Java/C/C++ programs.
The set Σ* of all strings over the alphabet Σ.
Palindromes over Σ.
Binary strings with no 1s before 0s.
Regular expressions let us specify regular languages, e.g.:
All binary strings.
The strings {0000, 0010, 1000, 1010}.
All strings that contain the string CSE311.
Regular expressions over Σ: syntax
Basis step: ∅ and ε are regular expressions. a is a regular expression for any a ∈ Σ.
Recursive step: If A and B are regular expressions, then so are AB, A ∪ B, and A*.
Examples: regular expressions over Σ = {0, 1}
Basis: ∅, ε, 0, 1.
Recursive: (0 ∪ 1)0(0 ∪ 1)0, etc.
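The recursive syntax definition maps naturally onto a recursive data type. The sketch below uses a hypothetical nested-tuple encoding of our own choosing (the tags "empty", "eps", "sym", "cat", "alt", "star" are not from the lecture). `is_regex` follows the basis and recursive steps case by case, and `show` prints an expression back in the slide notation.

```python
# Hypothetical encoding (not from the lecture): a regular expression is a nested tuple.
#   ("empty",)      ∅
#   ("eps",)        ε
#   ("sym", a)      a, for a ∈ Σ
#   ("cat", A, B)   AB
#   ("alt", A, B)   A ∪ B
#   ("star", A)     A*

def is_regex(r, sigma):
    """Check r against the recursive syntax definition over the alphabet sigma (a set)."""
    if not isinstance(r, tuple) or not r:
        return False
    tag, args = r[0], r[1:]
    # Basis step: ∅, ε, and a for any a ∈ Σ.
    if tag in ("empty", "eps"):
        return len(args) == 0
    if tag == "sym":
        return len(args) == 1 and args[0] in sigma
    # Recursive step: AB, A ∪ B, and A* built from smaller regular expressions.
    if tag in ("cat", "alt"):
        return len(args) == 2 and all(is_regex(x, sigma) for x in args)
    if tag == "star":
        return len(args) == 1 and is_regex(args[0], sigma)
    return False

def show(r):
    """Render r in the slide notation; unions are always parenthesized."""
    tag = r[0]
    if tag == "empty":
        return "∅"
    if tag == "eps":
        return "ε"
    if tag == "sym":
        return r[1]
    if tag == "cat":
        return show(r[1]) + show(r[2])
    if tag == "alt":
        return "(" + show(r[1]) + " ∪ " + show(r[2]) + ")"
    inner = show(r[1])  # remaining case: tag == "star"
    return inner + "*" if r[1][0] in ("sym", "alt") else "(" + inner + ")*"

SIGMA = {"0", "1"}
zero_or_one = ("alt", ("sym", "0"), ("sym", "1"))
r = ("cat", ("cat", ("cat", zero_or_one, ("sym", "0")), zero_or_one), ("sym", "0"))
print(is_regex(r, SIGMA), show(r))    # True (0 ∪ 1)0(0 ∪ 1)0
print(is_regex(("sym", "2"), SIGMA))  # False: 2 ∉ Σ
```

Using plain tuples keeps the sketch short; a larger program would more likely use one class per case.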
Regular expressions over Σ: semantics
A regular expression over Σ represents a set of strings over Σ.
∅ represents the set with no strings.
ε represents the set {ε}.
a represents the set {a}.
AB represents the concatenation of the sets represented by A and B: {x • y : x ∈ A, y ∈ B}.
A ∪ B represents the union of the sets represented by A and B: A ∪ B.
A* represents the concatenation of the set represented by A with itself zero or more times: A* = {ε} ∪ A ∪ AA ∪ AAA ∪ AAAA ∪ …
This just defines a recursive function for computing the meaning of a regular expression:
language(∅) = {}
language(ε) = {ε}
language(a) = {a}
language(AB) = {x • y : x ∈ language(A), y ∈ language(B)}
language(A ∪ B) = language(A) ∪ language(B)
language(A*) = {ε} ∪ language(A) ∪ language(AA) ∪ …
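Because language(A*) is infinite whenever A contains a nonempty string, a program cannot return it in full. The sketch below computes only the strings of length at most `max_len`; that cap, the Python function `language`, and the nested-tuple encoding (listed in the opening comment) are assumptions for illustration. Each branch mirrors one line of the recursive definition above.

```python
# Hypothetical nested-tuple encoding of regular expressions:
#   ("empty",) ∅   ("eps",) ε   ("sym", a) a   ("cat", A, B) AB
#   ("alt", A, B) A ∪ B   ("star", A) A*

def language(r, max_len):
    """The strings of length <= max_len in the set represented by r."""
    tag = r[0]
    if tag == "empty":
        return set()                                     # language(∅) = {}
    if tag == "eps":
        return {""}                                      # language(ε) = {ε}; ε is ""
    if tag == "sym":
        return {r[1]} if len(r[1]) <= max_len else set()  # language(a) = {a}
    if tag == "cat":
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if tag == "alt":
        return language(r[1], max_len) | language(r[2], max_len)
    if tag == "star":
        # {ε} ∪ A ∪ AA ∪ ...: keep appending strings of A until no new short strings appear.
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regex tag: {tag!r}")

zero_or_one = ("alt", ("sym", "0"), ("sym", "1"))
r = ("cat", ("cat", ("cat", zero_or_one, ("sym", "0")), zero_or_one), ("sym", "0"))
print(sorted(language(r, 4)))                   # ['0000', '0010', '1000', '1010']
print(len(language(("star", zero_or_one), 3)))  # 15, i.e. 1 + 2 + 4 + 8
```

The star loop terminates because there are only finitely many strings of length at most `max_len`, and each round adds only strings not seen before.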
Examples of regular expressions
001*: binary strings with 00 followed by any number of 1s.
0*1*: binary strings with any number of 0s followed by any number of 1s.
(0 ∪ 1)0(0 ∪ 1)0: the strings {0000, 0010, 1000, 1010}.
(0 ∪ 1)*: all binary strings.
(0 ∪ 1)*0110(0 ∪ 1)*: binary strings that contain 0110.
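Python's standard `re` module writes the union A ∪ B as A|B and otherwise uses the same concatenation and star notation, so these examples can be checked mechanically. `re.fullmatch` requires the whole string to match, which is the set-membership question the slides ask; the helper names `matches` and `all_binary` are hypothetical.

```python
import re
from itertools import product

def matches(pattern: str, s: str) -> bool:
    """True if the whole string s is in the language of pattern (not just a substring)."""
    return re.fullmatch(pattern, s) is not None

def all_binary(n: int) -> list[str]:
    """All binary strings of length n, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

# Slide notation to Python re syntax: the union A ∪ B is written A|B.
print(matches("001*", "00111"))                # True
print(matches("0*1*", "0101"))                 # False: a 1 comes before a 0
print(matches("(0|1)*0110(0|1)*", "1011010"))  # True: contains 0110
print([s for s in all_binary(4) if matches("(0|1)0(0|1)0", s)])
# ['0000', '0010', '1000', '1010']
```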
Regular expressions in practice
Used to define the tokens in a programming language: legal variable names, keywords, etc.
Used in grep, a Unix program that searches for patterns in a set of files. For example, grep "311" *.md searches for the string 311 in all Markdown files in the current directory.
Used in programs to process strings.
These slides are generated with the help of regular expressions :)
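As a sketch of the first use, here is a toy tokenizer built on Python's `re` module. The token classes and their patterns (`NUMBER`, `IDENT`, `OP`, `SKIP`) are invented for illustration and do not describe any real language's lexical rules; the point is only that each token class is a regular expression.

```python
import re

# Hypothetical token classes for a toy language; each one is a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),  # legal variable names
    ("OP",     r"[+*()=]"),
    ("SKIP",   r"\s+"),                     # whitespace, discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src: str) -> list[tuple[str, str]]:
    """Split src into (kind, text) pairs, raising SyntaxError on an unknown character."""
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)  # try every token class at the current position
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x1 = (2 * y) + 10"))
```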
Context-free grammars
Syntax, semantics, and examples.
Regular expressions can specify only regular languages
But many languages aren't regular, including simple ones such as palindromes and strings with an equal number of 0s and 1s.
Many programming-language constructs are also irregular, such as expressions with matched parentheses and properly formed arithmetic expressions.
Context-free grammars are a more powerful formalism that lets us specify all of these example languages (i.e., sets of strings)!
Context-free grammars over Σ: syntax
A context-free grammar (CFG) is a finite set of production rules over:
An alphabet Σ of terminal symbols.
A finite set V of nonterminal symbols.
A start symbol from V, usually denoted by S (i.e., S ∈ V).
A production rule for a nonterminal A ∈ V takes the form A → w₁ | w₂ | … | wₖ, where each wᵢ ∈ (V ∪ Σ)* is a string of nonterminals and terminals.
Only nonterminals can appear on the left-hand side of a production rule.
Context-free grammars over Σ: semantics
A CFG over Σ represents a set of strings over Σ.
Compute (or generate) a string from this set as follows:
1. Begin with the start symbol S as the current string.
2. If the current string contains a nonterminal A, apply the rule A → w₁ | … | wₖ to replace A in the current string with one of the wᵢ.
3. Repeat step 2 until the current string contains only terminals.
A CFG represents the set of all strings over Σ that can be generated in this way.
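The three-step procedure above can be run directly, as long as it is bounded, since most grammars generate infinitely many strings. This sketch uses the same hypothetical one-character-per-symbol encoding ("" for ε); `generate`, `max_steps`, and `max_len` are illustrative names. It always rewrites the leftmost nonterminal: for a CFG that choice changes only the order of the steps, not which strings can be generated. Forms with more than `max_len` terminals are dropped, which is safe because terminals are never rewritten.

```python
def generate(rules, start, max_steps, max_len):
    """Terminal strings with at most max_len symbols that can be derived from start
    in at most max_steps rule applications, always rewriting the leftmost nonterminal."""
    results = set()
    current = {start}                    # step 1: begin with the start symbol
    for _ in range(max_steps + 1):
        next_forms = set()
        for form in current:
            # Locate the leftmost nonterminal, if there is one.
            i = next((k for k, c in enumerate(form) if c in rules), None)
            if i is None:                # step 3: only terminals remain
                results.add(form)
                continue
            for w in rules[form[i]]:     # step 2: replace A with one of the w_i
                new = form[:i] + w + form[i + 1:]
                if sum(c not in rules for c in new) <= max_len:
                    next_forms.add(new)
        current = next_forms
    return results

parens = {"S": ["(S)", "SS", ""]}        # S → (S) | SS | ε
print(sorted(generate(parens, "S", max_steps=6, max_len=4)))
# ['', '(())', '()', '()()']
```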
Example context-free grammars
S → 0S0 | 1S1 | 0 | 1 | ε: the set of all binary palindromes.
S → 0S | S1 | ε: the set of strings denoted by the regular expression 0*1*.
S → (S) | SS | ε: the set of all strings of matched parentheses.
S → 0S1 | ε: the set {0ⁿ1ⁿ : n ≥ 0}, i.e., strings of 0s followed by an equal number of 1s.
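Each grammar can also be read in reverse as a recursive recognizer, with one branch per alternative on the right-hand side. The sketch below (all function names are hypothetical) does this for the four grammars; `lru_cache` memoizes the recognizers so that trying every SS split in the parentheses grammar stays cheap. The loop at the end compares each recognizer with a direct description of its language on all short strings, a brute-force sanity check rather than a proof.

```python
import re
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def palindrome(s):         # S → 0S0 | 1S1 | 0 | 1 | ε
    if s in ("", "0", "1"):
        return True
    return len(s) >= 2 and s[0] == s[-1] and s[0] in "01" and palindrome(s[1:-1])

@lru_cache(maxsize=None)
def zeros_then_ones(s):    # S → 0S | S1 | ε
    if s == "":
        return True
    return ((s[0] == "0" and zeros_then_ones(s[1:]))
            or (s[-1] == "1" and zeros_then_ones(s[:-1])))

@lru_cache(maxsize=None)
def matched(s):            # S → (S) | SS | ε
    if s == "":
        return True
    if s[0] == "(" and s[-1] == ")" and matched(s[1:-1]):
        return True
    # SS: try every split into two nonempty parts.
    return any(matched(s[:k]) and matched(s[k:]) for k in range(1, len(s)))

@lru_cache(maxsize=None)
def zero_n_one_n(s):       # S → 0S1 | ε
    return s == "" or (len(s) >= 2 and s[0] == "0" and s[-1] == "1"
                       and zero_n_one_n(s[1:-1]))

def balanced(s):
    """Direct description of matched parentheses: depth never negative, ends at 0."""
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

for n in range(9):
    for bits in product("01", repeat=n):
        s = "".join(bits)
        assert palindrome(s) == (s == s[::-1])
        assert zeros_then_ones(s) == (re.fullmatch("0*1*", s) is not None)
        assert zero_n_one_n(s) == (s == "0" * (n // 2) + "1" * (n // 2))
    for chars in product("()", repeat=n):
        p = "".join(chars)
        assert matched(p) == balanced(p)
print("all four grammars agree with their descriptions up to length 8")
```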
Another example CFG: simple arithmetic expressions
E → E + E | E * E | (E) | x | y | z | 0 | 1 | … | 9
Can this CFG generate (2 * x) + y?
E ⇒ E + E ⇒ (E) + E ⇒ (E * E) + E ⇒ (2 * E) + E ⇒ (2 * x) + E ⇒ (2 * x) + y
Can this CFG generate x + y * z in two entirely different ways?
E ⇒ E + E ⇒ x + E ⇒ x + E * E ⇒ x + y * E ⇒ x + y * z
E ⇒ E * E ⇒ E + E * E ⇒ x + E * E ⇒ x + y * E ⇒ x + y * z
This is perfectly valid according to the CFG rules, but the second derivation groups the expression as (x + y) * z, which violates operator precedence for arithmetic!
How can we write our grammar to enforce operator precedence?
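To see why the two derivations matter, evaluate the expression tree each one builds. In this sketch a tree is a nested tuple (operator, left, right) with variable names as leaves; the encoding and the values x = 1, y = 2, z = 3 are arbitrary choices for illustration.

```python
def evaluate(tree, env):
    """Evaluate a tree (op, left, right) whose leaves are variable names in env."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b

env = {"x": 1, "y": 2, "z": 3}        # arbitrary values for illustration
first = ("+", "x", ("*", "y", "z"))   # from E ⇒ E + E ⇒ ...: x + (y * z)
second = ("*", ("+", "x", "y"), "z")  # from E ⇒ E * E ⇒ ...: (x + y) * z
print(evaluate(first, env), evaluate(second, env))  # 7 9
```

The same string yields two different values, so a compiler using this grammar would not know which meaning was intended.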
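Building precedence in simple arithmetic expressions
E → T | E + T
T → F | F * T
F → (E) | I | N
I → x | y | z
N → …
We use multiple production rules to encode precedence:
E generates expressions; it's the start symbol.
T generates terms.
F generates factors.
I generates identifiers.
N generates numbers.
Because * appears only in the rule for T, a sum can occur inside a factor only within parentheses, so x + y * z can be derived only with the grouping x + (y * z).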
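The layered grammar translates almost line for line into a recursive-descent parser, with one method per nonterminal. This is a sketch under stated assumptions: the input contains no spaces, identifiers are x, y, z, and N is taken to be a single digit because the rule for N is not given in full here. The rule E → E + T is left-recursive, and a naive recursive method for it would call itself forever, so `E` is written as a loop that accepts the same strings and still groups + to the left.

```python
class Parser:
    """Recursive-descent parser for E → T | E + T, T → F | F * T, F → (E) | I | N.
    Assumes no spaces, I ∈ {x, y, z}, and N a single digit (hypothetical)."""

    def __init__(self, text: str):
        self.s, self.i = text, 0

    def peek(self):
        return self.s[self.i] if self.i < len(self.s) else None

    def eat(self, ch: str):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.i}")
        self.i += 1

    def parse(self):
        tree = self.E()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at position {self.i}")
        return tree

    def E(self):
        # E → T | E + T, with the left recursion rewritten as a loop.
        tree = self.T()
        while self.peek() == "+":
            self.eat("+")
            tree = ("+", tree, self.T())
        return tree

    def T(self):
        # T → F | F * T
        left = self.F()
        if self.peek() == "*":
            self.eat("*")
            return ("*", left, self.T())
        return left

    def F(self):
        # F → (E) | I | N
        c = self.peek()
        if c == "(":
            self.eat("(")
            tree = self.E()
            self.eat(")")
            return tree
        if c is not None and c in "xyz0123456789":
            self.i += 1
            return c
        raise SyntaxError(f"unexpected {c!r} at position {self.i}")

print(Parser("x+y*z").parse())    # ('+', 'x', ('*', 'y', 'z'))
print(Parser("(2*x)+y").parse())  # ('+', ('*', '2', 'x'), 'y')
```

Unlike the ambiguous grammar, this one leaves the parser no choice: x + y * z always comes out as x + (y * z).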
Visualizing CFG derivations with parse trees
Suppose that a grammar G generates a string x. The sequence of steps (rule applications) that generates x is called a derivation.
We represent derivations as parse trees:
The root of the tree is the start symbol.
The internal nodes are the nonterminal symbols in the derivation.
The leaves are the terminal symbols in the derivation.
Palindrome grammar: S → 0S0 | 1S1 | 0 | 1 | ε
Derivation: S ⇒ 0S0 ⇒ 01S10 ⇒ …
[Figure: parse tree for this derivation. The root S has children 0, S, 0; the inner S expands by S → 1S1.]
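For the palindrome grammar, the outer characters of a string determine which rule was applied, so its parse tree can be built directly. The sketch uses a hypothetical (label, children) tuple encoding with ε as the leaf "". Reading the leaves left to right recovers the string, and `derivation` lists the sentential forms one rule application at a time; the example string 01110 is our own choice.

```python
def parse_palindrome(s):
    """Parse tree for s under S → 0S0 | 1S1 | 0 | 1 | ε, as (label, children).
    Leaves are terminal strings; ε appears as the empty string ""."""
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "01":
        return ("S", [s[0], parse_palindrome(s[1:-1]), s[-1]])  # S → 0S0 or 1S1
    if s in ("0", "1", ""):
        return ("S", [s])                                         # S → 0 | 1 | ε
    raise ValueError(f"{s!r} is not a binary palindrome")

def leaves(tree):
    """Concatenate the leaves left to right; this recovers the derived string."""
    if isinstance(tree, str):
        return tree
    _, children = tree
    return "".join(leaves(c) for c in children)

def derivation(s):
    """Sentential forms S ⇒ ... ⇒ s, one rule application per step."""
    parse_palindrome(s)  # raises ValueError if the grammar cannot generate s
    half = len(s) // 2
    forms = [s[:k] + "S" + s[len(s) - k:] for k in range(half + 1)]
    return forms + [s]

print(" ⇒ ".join(derivation("01110")))   # S ⇒ 0S0 ⇒ 01S10 ⇒ 01110
print(leaves(parse_palindrome("01110")))  # 01110
```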
In practice, CFGs are often given in Backus-Naur Form
Backus-Naur Form (BNF) is a notation for CFGs developed for specifying the syntax of programming languages.
Production rules use ::= instead of →.
Nonterminals are denoted by names enclosed in angle brackets, e.g., <identifier>, <digit>, <expression>, etc.
E → T | E + T        <expression> ::= <term> | <expression> + <term>
T → F | F * T        <term> ::= <factor> | <factor> * <term>
F → (E) | I | N      <factor> ::= (<expression>) | <identifier> | <number>
I → x | y | z        <identifier> ::= x | y | z
N → …                <number> ::= …
Summary
A regular expression defines a set of strings over an alphabet Σ.
∅, ε, and a (for any a ∈ Σ) are regular expressions.
If A and B are regular expressions, then so are (AB), (A ∪ B), and A*.
Many practical applications, from grep to everyday programming.
Context-free grammars (CFGs) are a more expressive formalism for specifying sets of strings over an alphabet.
A CFG consists of a set of terminal symbols, a set of nonterminal symbols including the distinguished start symbol, and a set of production rules that specify how to rewrite nonterminals in a string.
Used for specifying programming-language syntax and for parsing.
