Context-Free Languages. Wen-Guey Tzeng, Department of Computer Science, National Chiao Tung University


2 Context-Free Grammars
Some languages are not regular, e.g. L = {a^n b^n : n ≥ 0}.
A grammar G = (V, T, S, P) is context-free if all productions are of the form A → x, where A ∈ V and x ∈ (V ∪ T)*.
A language L is context-free if and only if there is a context-free grammar G such that L = L(G).
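To make the definition concrete, a grammar can be held as plain data and its short strings enumerated. The sketch below assumes the grammar S → aSb | λ for the example language {a^n b^n : n ≥ 0} (this grammar is an illustration, not taken from the slide), writes variables as uppercase letters and λ as the empty string, and uses a hypothetical helper `language_up_to` that iterates to a fixed point.

```python
def language_up_to(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.

    grammar maps each variable to a list of right-hand sides (strings).
    Symbols that are keys of grammar are variables; all others are terminals.
    The empty string "" stands for lambda.
    """
    lang = {var: set() for var in grammar}
    changed = True
    while changed:                      # repeat until no new string appears
        changed = False
        for var, right_sides in grammar.items():
            for rhs in right_sides:
                partial = {""}          # strings built from rhs so far
                for sym in rhs:
                    choices = lang[sym] if sym in grammar else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                new = partial - lang[var]
                if new:
                    lang[var] |= new
                    changed = True
    return lang[start]


# Assumed grammar for {a^n b^n : n >= 0}: S -> aSb | lambda
G = {"S": ["aSb", ""]}
print(sorted(language_up_to(G, "S", 6), key=len))
# prints ['', 'ab', 'aabb', 'aaabbb']
```

The length bound keeps every set finite, so the loop terminates even for grammars with λ-productions or rules such as S → SS.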
3 Examples
G = ({S}, {a, b}, S, P), with P = {S → aSa | bSb | λ}
S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
L(G) = {ww^R : w ∈ {a, b}*}
S → abB, A → aaBb | λ, B → bbAa
L(G) = {ab(bbaa)^n bba(ba)^n : n ≥ 0}?
4 Design cfg's
Give a cfg for L = {a^n b^m : n > m}.
5 Design cfg's
Give a cfg for L = {a^n b^m : n ≠ m ≥ 0}.
Idea 1: split L into two cases (not necessarily disjoint), L_1 = {a^n b^m : n > m} and L_2 = {a^n b^m : n < m}, then construct productions for L_1 and L_2 separately.
Idea 2: for L_1, produce the same number of a's and b's, then the extra a's.
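One possible grammar that follows the two ideas is sketched below. The variable names S_1, S_2, E, A and B are chosen here for illustration: E produces equal numbers of a's and b's, A produces one or more extra a's, and B produces one or more extra b's.

```latex
\begin{align*}
S   &\to S_1 \mid S_2 \\
S_1 &\to A E   && \text{(extra $a$'s, then equal numbers of $a$'s and $b$'s)} \\
S_2 &\to E B   && \text{(equal numbers, then extra $b$'s)} \\
E   &\to a E b \mid \lambda \\
A   &\to a A \mid a \\
B   &\to b B \mid b
\end{align*}
% Sample derivation of aaab (n = 3, m = 1):
% S => S_1 => AE => aAE => aaE => aaaEb => aaab
```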
6 Give a cfg for L = {a^n b^m c^k : m = n + k}.
Give a cfg for L = {a^n b^m c^k : n − m = k}.
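For the first language (m = n + k), one way to approach it is to split the block of b's into two parts, one matched against the a's and one matched against the c's. The grammar below is a sketch under that decomposition; the variable names X and Y are chosen here for illustration.

```latex
\begin{align*}
a^n b^{n+k} c^k &= (a^n b^n)(b^k c^k) \\
S &\to X Y \\
X &\to a X b \mid \lambda \\
Y &\to b Y c \mid \lambda
\end{align*}
% Sample derivation of abbbcc (n = 1, k = 2, m = 3):
% S => XY => aXbY => abY => abbYc => abbbYcc => abbbcc
```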
7 Give a cfg for L = {a^n b^m c^k : m > n + k}.
8 Give a cfg for L = {a^n b^m c^k : m ≠ n + k}.
9 Give a cfg for L = {w ∈ {a, b}* : n_a(w) = n_b(w)}.
10 Give a cfg for L = {w ∈ {a, b}* : n_a(w) > n_b(w)}.
11 What is L(G)?
S → aSb | SS | λ
L(G) = {w ∈ {a, b}* : n_a(w) = n_b(w) and n_a(v) ≥ n_b(v), where v is any prefix of w}
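The stated answer can be checked by brute force on short strings. The sketch below assumes the productions S → aSb | SS | λ and compares a span-based recognizer for the grammar with the prefix condition on every string over {a, b} of length at most 8. The function names are hypothetical, and the check is evidence for the claim, not a proof.

```python
from functools import lru_cache
from itertools import product


def in_grammar(w):
    """True if S =>* w for S -> aSb | SS | lambda."""
    @lru_cache(maxsize=None)
    def derives(i, j):
        # Does S derive the substring w[i:j]?
        if i == j:                                   # S -> lambda
            return True
        if w[i] == "a" and w[j - 1] == "b" and derives(i + 1, j - 1):
            return True                              # S -> aSb
        # S -> SS: only splits into two non-empty parts need to be tried
        return any(derives(i, k) and derives(k, j) for k in range(i + 1, j))
    return derives(0, len(w))


def in_claimed_language(w):
    """n_a(w) = n_b(w) and n_a(v) >= n_b(v) for every prefix v of w."""
    balance = 0
    for ch in w:
        balance += 1 if ch == "a" else -1
        if balance < 0:
            return False
    return balance == 0


for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert in_grammar(w) == in_claimed_language(w), w
print("claim holds for all strings of length <= 8")
```

Reading a as "(" and b as ")", this is the language of properly nested parentheses.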
12 Leftmost and rightmost derivation
G = ({A, B, S}, {a, b}, S, P), where P contains S → AB, A → aaA, A → λ, B → Bb, B → λ
L(G) = {a^(2n) b^m : n, m ≥ 0}
For the string aab:
Rightmost derivation: S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
Leftmost derivation: S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
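The two derivation orders can be replayed mechanically. The sketch below assumes the productions S → AB, A → aaA | λ, B → Bb | λ, treats uppercase letters as variables, and uses a hypothetical helper `derive` that always rewrites the leftmost (or rightmost) variable and refuses a step that names a different variable.

```python
def derive(start, steps, leftmost=True):
    """Apply productions in order to the leftmost (or rightmost) variable.

    steps is a list of (variable, right_side) pairs; "" stands for lambda.
    Returns the list of sentential forms visited.
    """
    form = start
    forms = [form]
    for var, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym.isupper()]
        pos = positions[0] if leftmost else positions[-1]
        assert form[pos] == var, f"{var} is not the variable to expand in {form}"
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


left = derive("S", [("S", "AB"), ("A", "aaA"), ("A", ""), ("B", "Bb"), ("B", "")])
right = derive("S", [("S", "AB"), ("B", "Bb"), ("B", ""), ("A", "aaA"), ("A", "")],
               leftmost=False)
print("leftmost: ", " => ".join(left))    # S => AB => aaAB => aaB => aaBb => aab
print("rightmost:", " => ".join(right))   # S => AB => ABb => Ab => aaAb => aab
```

Both runs use the same five productions and differ only in the order of application.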
13 Derivation (parse) tree
A → abABc
14 S → aAB, A → bBb, B → A | λ
15 Some comments
A derivation tree does not record the order in which productions are applied.
Leftmost and rightmost derivations correspond to depth-first traversals of the tree.
Derivation trees and derivation order are very important in programming language and compiler design.
16 Grammar for C
17 Parsing and ambiguity
Parsing of w ∈ L(G): find a sequence of productions by which w is derived.
Questions, given G and w:
Is w ∈ L(G)? (membership problem)
Is there an efficient way to determine whether w ∈ L(G)?
How is w ∈ L(G) parsed?
Is the parsing unique?
18 Exhaustive search / top-down parsing
S → SS | aSb | bSa | λ
Determine whether aabb ∈ L(G).
1st round: (1) S ⇒ SS; (2) S ⇒ aSb; (3) S ⇒ bSa; (4) S ⇒ λ
2nd round:
From (1): S ⇒ SS ⇒ SSS, S ⇒ SS ⇒ aSbS, S ⇒ SS ⇒ bSaS, S ⇒ SS ⇒ S
From (2): S ⇒ aSb ⇒ aSSb, S ⇒ aSb ⇒ aaSbb, S ⇒ aSb ⇒ abSab, S ⇒ aSb ⇒ ab
3rd round: ...
Drawback: inefficiency. Other ways?
19 If there are no productions of the form A → λ or A → B, the exhaustive search for w ∈ L(G) examines at most |P| + |P|^2 + ... + |P|^(2|w|) = O(|P|^(2|w|+1)) sentential forms.
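A minimal sketch of the exhaustive search follows. It assumes a grammar with no λ-productions and no unit productions, so a sentential form never gets shorter and any form longer than w can be discarded. The grammar S → SS | aSb | bSa | ab | ba is a λ-free variant chosen here for illustration, and `exhaustive_parse` is a hypothetical name.

```python
from collections import deque


def exhaustive_parse(productions, start, w):
    """Breadth-first search over leftmost derivations.

    Assumes no lambda-productions and no unit productions.
    Returns the list of sentential forms leading to w, or None.
    """
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        form, history = queue.popleft()
        if form == w:
            return history
        # position of the leftmost variable (uppercase letter), if any
        pos = next((i for i, s in enumerate(form) if s.isupper()), None)
        if pos is None:
            continue                     # all terminals, but not w
        if form[:pos] != w[:pos]:
            continue                     # terminal prefix already disagrees with w
        for rhs in productions[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            if len(new_form) <= len(w) and new_form not in seen:
                seen.add(new_form)
                queue.append((new_form, history + [new_form]))
    return None


P = {"S": ["SS", "aSb", "bSa", "ab", "ba"]}
print(exhaustive_parse(P, "S", "aabb"))   # ['S', 'aSb', 'aabb']
print(exhaustive_parse(P, "S", "aab"))    # None
```

The length cut-off is what guarantees termination; with λ-productions present it would no longer be valid.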
20 Bottom-up parsing
Reduce a string w to the start variable S.
S → aSb | λ
w = aabb ⇐ aaSbb ⇐ aSb ⇐ S
Efficiency: O(|w|^3)
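The slide does not name an algorithm for the cubic bound. One well-known bottom-up method with that running time is CYK, which requires the grammar to be in Chomsky normal form. The sketch below is an illustration under that assumption: it uses a normal-form grammar for {a^n b^n : n ≥ 1} (S → AB | AT, T → SB, A → a, B → b) chosen here, not the grammar on the slide.

```python
def cyk(unit_rules, pair_rules, start, w):
    """CYK membership test for a grammar in Chomsky normal form.

    unit_rules: terminal -> set of variables A with A -> terminal
    pair_rules: (B, C)   -> set of variables A with A -> BC
    """
    n = len(w)
    if n == 0:
        return False          # this normal form has no lambda rule
    # table[i][l] holds the variables that derive w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = set(unit_rules.get(ch, ()))
    for length in range(2, n + 1):                 # substring length
        for i in range(n - length + 1):            # start position
            for split in range(1, length):         # length of the left part
                for b in table[i][split]:
                    for c in table[i + split][length - split]:
                        table[i][length] |= pair_rules.get((b, c), set())
    return start in table[0][n]


unit = {"a": {"A"}, "b": {"B"}}
pair = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
print(cyk(unit, pair, "S", "aabb"))   # True
print(cyk(unit, pair, "S", "aab"))    # False
```

The three nested loops over length, start position, and split point give the O(|w|^3) behaviour for a fixed grammar.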
21 Linear-time parsing
Simple grammar (s-grammar):
All productions are of the form A → ax, where a ∈ T and x ∈ (V ∪ T)*.
Any pair (A, a) occurs at most once in P.
Example: S → aS | bSS | c
Parsing for ababccc: S ⇒ aS ⇒ abSS ⇒ abaSS ⇒ ababSSS ⇒ ababcSS ⇒ ababccS ⇒ ababccc
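Because each pair (A, a) selects at most one production, a parser for an s-grammar never has to guess. The sketch below assumes the example grammar S → aS | bSS | c, stored as a table from (variable, terminal) to the remainder x of the production A → ax. `parse_s_grammar` is a hypothetical name.

```python
def parse_s_grammar(rules, start, w):
    """Parse w with an s-grammar in one left-to-right pass.

    rules maps (variable, terminal) to the string x of the production A -> a x.
    Returns the productions used, in order, or None if w is rejected.
    """
    stack = [start]                      # symbols still to be matched
    used = []
    for ch in w:
        if not stack:
            return None                  # input left over
        top = stack.pop()
        if top.isupper():                # variable: the pair (top, ch) picks the rule
            if (top, ch) not in rules:
                return None
            x = rules[(top, ch)]
            used.append(f"{top} -> {ch}{x}")
            stack.extend(reversed(x))    # leftmost symbol of x ends up on top
        elif top != ch:                  # terminal on the stack must match the input
            return None
    return used if not stack else None


rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(parse_s_grammar(rules, "S", "ababccc"))
print(parse_s_grammar(rules, "S", "abc"))     # None
```

Each input symbol causes exactly one table lookup, so the number of lookups is |w|.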
22 Ambiguous grammars
G is ambiguous if some w ∈ L(G) has two or more distinct derivation trees.
Example: S → aSb | SS | λ
23 Example from programming languages
C-like grammar for arithmetic expressions:
G = ({E, I}, {a, b, c, +, x, (, )}, E, P), where P contains
E → I
E → E+E
E → ExE
E → (E)
I → a | b | c
w = a+bxc has two derivation trees.
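The number of derivation trees can be counted directly. The sketch below assumes the productions E → I | E+E | ExE | (E) and I → a | b | c, with the letter x as the multiplication terminal, and counts the trees for each substring by dynamic programming. `count_trees` is a hypothetical name.

```python
from functools import lru_cache


def count_trees(w):
    """Number of derivation trees for w from E in
    E -> I | E+E | ExE | (E),  I -> a | b | c."""
    @lru_cache(maxsize=None)
    def count(i, j):
        span = w[i:j]
        total = 0
        if span in ("a", "b", "c"):                  # E -> I -> a | b | c
            total += 1
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            total += count(i + 1, j - 1)             # E -> (E)
        for k in range(i + 1, j - 1):                # top-level operator at position k
            if w[k] in "+x":
                total += count(i, k) * count(k + 1, j)   # E -> E+E or E -> ExE
        return total
    return count(0, len(w))


print(count_trees("a+bxc"))     # 2
print(count_trees("(a+b)xc"))   # 1
```

The two trees for a+bxc correspond to grouping the string as a+(bxc) and as (a+b)xc. Writing the parentheses explicitly leaves a single tree.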
25 Ambiguous languages
A cfl L is inherently ambiguous if every cfg G with L(G) = L is ambiguous. Otherwise, it is unambiguous.
Note: an unambiguous language may have an ambiguous grammar.
Example: L = {a^n b^n c^m} ∪ {a^n b^m c^m} is inherently ambiguous. This is hard to prove.
26 CFG and Programming Languages
Programming language: syntax + semantics
Syntax is defined by a grammar G:
<expression> ::= <term> | <expression> + <term>
<term> ::= <factor> | <term> * <factor>
<while_statement> ::= while <expression> <statement>
Syntax checking in compilers is done by a parser.
Is a program p correct? Is p ∈ L(G)?
We need efficient parsers.
27 Restricted grammars for Programming Languages
Goal: the expressive power is sufficient, and efficient parsers exist.
C: LR(1)
PASCAL: LL(1)
Hierarchy of classes of context-free languages: LL(1), LR(0), LR(1) = DCFL, LR(2), CFL
28 Syntactic Correctness
The lexical analyzer produces a stream of tokens.
The parser (syntactic analyzer) verifies that this token stream is syntactically correct by constructing a valid parse tree for the entire program.
There is a unique parse tree for each language construct.
Program = collection of parse trees rooted at the top by a special start symbol.
The parser can be built automatically from the BNF description of the language's CFG.
Example tools: yacc, Bison
29 CFG For Floating Point Numbers
::= stands for a production rule; < > enclose nonterminals; | represents alternatives for the right-hand side of a production rule.
[Figure: sample parse tree]
30 CFG For Balanced Parentheses
Could we write this grammar using regular expressions or a DFA? Why?
Sample derivation: <balanced> ⇒ ( <balanced> ) ⇒ (( <balanced> )) ⇒ (( <empty> )) ⇒ (( ))
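The grammar itself did not survive on this slide. The sample derivation suggests the rule <balanced> ::= ( <balanced> ) | <empty>, and the sketch below assumes exactly that rule. It recognizes the language by recursive descent, with one function call per nesting level. `is_balanced` is a hypothetical name.

```python
def is_balanced(s):
    """Recognize <balanced> ::= ( <balanced> ) | <empty> by recursive descent."""
    def balanced(pos):
        # Returns the position just after a <balanced> starting at pos,
        # or None if an opened parenthesis is never closed.
        if pos < len(s) and s[pos] == "(":
            after_inner = balanced(pos + 1)
            if (after_inner is not None and after_inner < len(s)
                    and s[after_inner] == ")"):
                return after_inner + 1
            return None
        return pos                        # <empty>
    return balanced(0) == len(s)


print(is_balanced("(())"))   # True
print(is_balanced("(()"))    # False
print(is_balanced("()()"))   # False: this rule only nests, it never concatenates
```

The recursion depth grows with the nesting depth, which is the unbounded counting that a DFA, with its fixed number of states, cannot perform.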
31 CFG For Decimal Numbers (Redux)
This grammar is right-recursive.
Sample top-down leftmost derivation: <num> ⇒ <digit> <num> ⇒ 7 <num> ⇒ 7 <digit> <num> ⇒ 7 8 <num> ⇒ 7 8 <digit>