Context-Free Languages. Wen-Guey Tzeng, Department of Computer Science, National Chiao Tung University
Context-Free Grammars: A grammar $G = (V, T, S, P)$ is context-free if all productions in $P$ are of the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$; that is, the left side consists of exactly one variable. A language $L$ is context-free if and only if there is a context-free grammar $G$ such that $L = L(G)$. Context-free?
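Examples: $G = (\{S\}, \{a, b\}, S, P)$, with $P = \{S \to aSa \mid bSb \mid \lambda\}$. Derivation: $S \Rightarrow aSa \Rightarrow aaSaa \Rightarrow aabSbaa \Rightarrow aab\lambda baa = aabbaa$. $L(G) = \{ww^R : w \in \{a, b\}^*\}$.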
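The claim about L(G) for the palindrome grammar can be checked on short strings. The following is a minimal sketch, assuming the productions S → aSa | bSb | λ; the names `generate` and `even_palindromes` are illustrative and do not come from the slides.

```python
from collections import deque
from itertools import product

# Right-hand sides of the productions for S; "" stands for lambda.
PRODUCTIONS = ["aSa", "bSb", ""]


def generate(max_len):
    """Return every terminal string of length <= max_len derivable from S."""
    found = set()
    seen = {"S"}
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        i = form.find("S")  # a sentential form of this grammar has at most one S
        if i == -1:
            found.add(form)
            continue
        for rhs in PRODUCTIONS:
            new = form[:i] + rhs + form[i + 1:]
            # Terminals never disappear, so forms with too many terminals are pruned.
            if len(new.replace("S", "")) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


def even_palindromes(max_len):
    """All strings w + reverse(w) over {a, b} with total length <= max_len."""
    result = set()
    for n in range(max_len // 2 + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            result.add(w + w[::-1])
    return result


print(generate(6) == even_palindromes(6))
```

The comparison only covers strings up to the chosen length, so it is a sanity check and not a proof.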
$S \to abB$, $A \to aaBb$, $B \to bbAa$, $A \to \lambda$. Is $L(G) = \{ab(bbaa)^n bba(ba)^n : n \ge 0\}$?
Designing CFGs: Give a CFG for $L = \{a^n b^m : n > m\}$.
Designing CFGs: Give a CFG for $L = \{a^n b^m : n \neq m,\ n, m \ge 0\}$. Idea 1: split $L$ into two cases (not necessarily disjoint), $L_1 = \{a^n b^m : n > m\}$ and $L_2 = \{a^n b^m : n < m\}$, so that $L = L_1 \cup L_2$. Then construct productions for $L_1$ and $L_2$ separately.
Give a CFG for $L = \{a^n b^m : n \neq m,\ n, m \ge 0\}$. Idea 2: produce the same number of a's and b's, then add extra a's or extra b's.
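One possible grammar that follows these ideas is S → AE | EB, E → aEb | λ, A → aA | a, B → bB | b. This grammar is an assumption made for illustration and is not given on the slides. The sketch below enumerates its short strings and compares them with the intended language; `language_up_to` is a hypothetical helper name.

```python
from collections import deque

# Assumed grammar (not from the slides); "" stands for lambda.
GRAMMAR = {
    "S": ["AE", "EB"],  # extra a's in front, or extra b's at the end
    "E": ["aEb", ""],   # equal numbers of a's and b's
    "A": ["aA", "a"],   # one or more a's
    "B": ["bB", "b"],   # one or more b's
}


def language_up_to(grammar, start, max_len):
    """Terminal strings of length <= max_len, found by leftmost expansion."""
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        pos = next((i for i, c in enumerate(form) if c in grammar), None)
        if pos is None:
            found.add(form)
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for c in new if c not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


expected = {
    "a" * n + "b" * m
    for n in range(6)
    for m in range(6)
    if n != m and n + m <= 5
}
print(language_up_to(GRAMMAR, "S", 5) == expected)
```

The search terminates for this grammar because no production increases the number of variables in a sentential form; a grammar with a production such as S → SS would need an additional bound.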
Give a CFG for $L = \{a^n b^m c^k : m = n + k\}$. Match a's with b's, then b's with c's.
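The matching hint suggests a grammar such as S → AB, A → aAb | λ, B → bBc | λ. This grammar is an assumption for illustration, not the slide's stated answer. The sketch below carries out a derivation by string rewriting and checks the result against the definition of the language; `derive` and `in_language` are hypothetical names.

```python
import re


def derive(n, k):
    """Derive a string using A -> aAb n times and B -> bBc k times."""
    form = "AB"                          # S => AB
    for _ in range(n):
        form = form.replace("A", "aAb")  # A => aAb
    form = form.replace("A", "")         # A => lambda
    for _ in range(k):
        form = form.replace("B", "bBc")  # B => bBc
    return form.replace("B", "")         # B => lambda


def in_language(w):
    """Check w = a^n b^m c^k with m = n + k directly from the definition."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return m is not None and len(m.group(2)) == len(m.group(1)) + len(m.group(3))


print(derive(2, 1), in_language(derive(2, 1)))
```

Each a is paired with a b generated by the same production, and each c likewise, which is why the number of b's equals n + k.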
Give a CFG for $L = \{a^n b^m c^k : m > n + k\}$.
Give a CFG for $L = \{w \in \{a, b\}^* : n_a(w) = n_b(w)\}$. Find the recursion.
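A grammar that captures the recursion is S → SS | aSb | bSa | λ, the same grammar that appears in the exhaustive-search slide. The sketch below, with the hypothetical function name `derivable`, turns each production into one case of a memoized recursive test and compares the result with a direct count of a's and b's.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derivable(w):
    """True if w can be derived from S with S -> SS | aSb | bSa | lambda."""
    if w == "":
        return True                                    # S -> lambda
    ends = (w[0], w[-1])
    if len(w) >= 2 and ends in (("a", "b"), ("b", "a")) and derivable(w[1:-1]):
        return True                                    # S -> aSb or S -> bSa
    # S -> SS: try every split into two non-empty parts.
    return any(derivable(w[:k]) and derivable(w[k:]) for k in range(1, len(w)))


# Compare with the definition of the language on all strings up to length 6.
agree = all(
    derivable("".join(t)) == (t.count("a") == t.count("b"))
    for n in range(7)
    for t in product("ab", repeat=n)
)
print(agree)
```

Splits with an empty part are skipped because they would only repeat the same question and make the recursion loop.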
Give a CFG for $L = \{w \in \{a, b\}^* : n_a(w) > n_b(w)\}$. Find the relation to other languages. Consider strings starting with a and strings starting with b separately.
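Leftmost and rightmost derivation: $G = (\{A, B, S\}, \{a, b\}, S, P)$, where $P$ contains $S \to AB$, $A \to aaA$, $A \to \lambda$, $B \to Bb$, $B \to \lambda$. $L(G) = \{a^{2n} b^m : n, m \ge 0\}$. For the string $aab$: Rightmost derivation: $S \Rightarrow AB \Rightarrow ABb \Rightarrow Ab \Rightarrow aaAb \Rightarrow aab$. Leftmost derivation: $S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaB \Rightarrow aaBb \Rightarrow aab$.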
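The two derivation orders for aab can be replayed mechanically. The following is a minimal sketch, assuming the productions S → AB, A → aaA | λ, B → Bb | λ; the function `derive` and the step lists are illustrative names, not part of the slides.

```python
VARIABLES = set("SAB")


def derive(steps, leftmost=True):
    """Apply (variable, right-hand side) steps to the leftmost or rightmost variable."""
    form = "S"
    forms = [form]
    for var, rhs in steps:
        positions = [i for i, c in enumerate(form) if c in VARIABLES]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != var:
            raise ValueError(f"step {var} -> {rhs!r} does not fit form {form!r}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms


# "" stands for lambda.
LEFTMOST_STEPS = [("S", "AB"), ("A", "aaA"), ("A", ""), ("B", "Bb"), ("B", "")]
RIGHTMOST_STEPS = [("S", "AB"), ("B", "Bb"), ("B", ""), ("A", "aaA"), ("A", "")]

print(" => ".join(derive(LEFTMOST_STEPS, leftmost=True)))
print(" => ".join(derive(RIGHTMOST_STEPS, leftmost=False)))
```

Both step lists use the same five productions; only the order in which the variables are rewritten differs.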
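Derivation (parse) tree: a production such as $A \to abABc$ appears in the tree as a node labeled $A$ whose children, from left to right, are $a$, $b$, $A$, $B$, $c$.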
$S \to aAB$, $A \to bBb$, $B \to A \mid \lambda$.
Some comments: A derivation tree does not record the order in which productions are applied. Leftmost and rightmost derivations correspond to depth-first traversals of the tree. Derivation trees and derivation order are very important in programming language and compiler design.
Grammar for C
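#include <stdio.h>

/* Assumed definition: the slide calls add() without showing it. */
int add(int a, int b) { return a + b; }

int main(void) {
    int i = 1;
    printf("i starts out life as %d.", i);
    i = add(1, 1);  /* Function call */
    printf(" And becomes %d after function is executed.\n", i);
    return 0;
}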
Parsing and ambiguity: Parsing of $w \in L(G)$: find a sequence of productions by which $w$ is derived. Questions, given $G$ and $w$: Is $w \in L(G)$? (membership problem) Is there an efficient way to determine whether $w \in L(G)$? How is $w \in L(G)$ parsed? (build the parse tree) Is the parse unique?
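Exhaustive search (top-down parsing): $S \to SS \mid aSb \mid bSa \mid \lambda$. Is $aabb \in L(G)$? 1st round: (1) $S \Rightarrow SS$; (2) $S \Rightarrow aSb$; (3) $S \Rightarrow bSa$; (4) $S \Rightarrow \lambda$. 2nd round: From (1), $S \Rightarrow SS \Rightarrow SSS$, $S \Rightarrow SS \Rightarrow aSbS$, $S \Rightarrow SS \Rightarrow bSaS$, $S \Rightarrow SS \Rightarrow S$. From (2), $S \Rightarrow aSb \Rightarrow aSSb$, $S \Rightarrow aSb \Rightarrow aaSbb$, $S \Rightarrow aSb \Rightarrow abSab$, $S \Rightarrow aSb \Rightarrow ab$. 3rd round: ... Drawback: inefficiency. Other ways?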
If there are no productions of the form $A \to \lambda$ or $A \to B$, the exhaustive search for $w \in L(G)$ can be done in $|P| + |P|^2 + \cdots + |P|^{2|w|} = O(|P|^{2|w|+1})$ steps. Consider the leftmost parsing method: $w$ can be obtained within $2|w|$ derivation steps.
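The bounded search can be sketched as a round-by-round expansion of leftmost derivations. The grammar used here, S → SS | aSb | bSa | ab | ba, is an assumed variant without λ-productions and unit-productions, chosen for illustration; `exhaustive_parse` is a hypothetical name.

```python
# Assumed grammar without lambda- or unit-productions (not from the slides).
GRAMMAR = {"S": ["SS", "aSb", "bSa", "ab", "ba"]}


def exhaustive_parse(grammar, start, w):
    """Search leftmost derivations round by round; return one for w, or None."""
    frontier = [[start]]             # each entry is a list of sentential forms
    for _ in range(2 * len(w)):      # at most 2|w| rounds are needed
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            pos = next((i for i, c in enumerate(form) if c in grammar), None)
            if pos is None:
                continue             # terminal string different from w: dead end
            for rhs in grammar[form[pos]]:
                new = form[:pos] + rhs + form[pos + 1:]
                if new == w:
                    return derivation + [new]
                # Without lambda-productions a form never shrinks, so any form
                # longer than w can be discarded.
                if len(new) <= len(w):
                    next_frontier.append(derivation + [new])
        frontier = next_frontier
    return None


print(exhaustive_parse(GRAMMAR, "S", "aabb"))
print(exhaustive_parse(GRAMMAR, "S", "aab"))
```

The length-based pruning is what makes the search finite; it relies on the absence of λ-productions, which is the condition stated in the text above.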
Bottom-up parsing: reduce a string $w$ to the start variable $S$. Example: with $S \to aSb \mid \lambda$, the string $w = aabb$ is reduced to $aaSbb$, then to $aSb$, then to $S$. Efficiency: $O(|w|^3)$.
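One well-known bottom-up method with cubic running time is the CYK algorithm; whether the slide has this particular algorithm in mind is an assumption. CYK needs a grammar in Chomsky normal form, so the sketch below uses the assumed grammar S → AB | AC, C → SB, A → a, B → b for the non-empty strings $a^n b^n$. The names `cyk`, `UNIT`, and `BINARY` are illustrative.

```python
# Assumed Chomsky-normal-form grammar for {a^n b^n : n >= 1} (not from the slides).
UNIT = [("A", "a"), ("B", "b")]                                   # A -> a, B -> b
BINARY = [("S", ("A", "B")), ("S", ("A", "C")), ("C", ("S", "B"))]


def cyk(w, unit=UNIT, binary=BINARY, start="S"):
    """Return True if w is derivable from start, using the CYK table."""
    n = len(w)
    if n == 0:
        return False  # a CNF grammar without S -> lambda cannot derive ""
    # table[i][length] = set of variables that derive w[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {var for var, terminal in unit if terminal == ch}
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # substring start
            for split in range(1, length):    # size of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for var, (first, second) in binary:
                    if first in left and second in right:
                        table[i][length].add(var)
    return start in table[0][n]


print(cyk("aabb"), cyk("aab"))
```

The three nested loops over length, start, and split point give the cubic dependence on $|w|$; the innermost loop over productions adds a factor that depends only on the grammar.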
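Linear-time parsing: Simple grammar (s-grammar): all productions are of the form $A \to ax$, where $a \in T$ and $x \in (V \cup T)^*$, and any pair $(A, a)$ occurs at most once in $P$. Example: $S \to aS \mid bSS \mid c$. Parsing for $ababccc$: $S \Rightarrow aS \Rightarrow abSS \Rightarrow abaSS \Rightarrow ababSSS \Rightarrow ababcSS \Rightarrow ababccS \Rightarrow ababccc$.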
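Because each pair (A, a) selects at most one production, an s-grammar can be parsed in a single left-to-right pass with a stack. The sketch below assumes that the part x of every production consists of variables only, which holds for the example S → aS | bSS | c; `RULES` and `parse_s_grammar` are hypothetical names.

```python
# (variable, next input symbol) -> variables that follow the symbol in the production
RULES = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}


def parse_s_grammar(w, rules=RULES, start="S"):
    """Return the leftmost derivation of w as a list of forms, or None."""
    stack = [start]       # variables still to be expanded; the top is the last item
    consumed = ""
    derivation = [start]
    for ch in w:
        if not stack:
            return None   # input left over, but nothing left to expand
        var = stack.pop()
        rest = rules.get((var, ch))
        if rest is None:
            return None   # no production for this (variable, symbol) pair
        stack.extend(reversed(rest))  # first variable of rest ends up on top
        consumed += ch
        derivation.append(consumed + "".join(reversed(stack)))
    return derivation if not stack else None


print(" => ".join(parse_s_grammar("ababccc")))
```

Each input symbol triggers exactly one table lookup and one production, so the number of steps equals $|w|$.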
Ambiguous grammars: $G$ is ambiguous if some $w \in L(G)$ has at least two distinct derivation trees. Example: $S \to aSb \mid SS \mid \lambda$.
Example from programming languages: C-like grammar for arithmetic expressions. $G = (\{E, I\}, \{a, b, c, +, \times, (, )\}, E, P)$, where $P$ contains $E \to I$, $E \to E + E$, $E \to E \times E$, $E \to (E)$, $I \to a \mid b \mid c$. The string $w = a + b \times c$ has two derivation trees.
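The number of derivation trees of an expression under this grammar can be counted with a short dynamic program. In the sketch below the character `*` stands in for the multiplication symbol, and `count_trees` is a hypothetical name; the code is an illustration under these assumptions.

```python
from functools import lru_cache


def count_trees(w):
    """Count derivation trees of w for E -> I | E+E | E*E | (E), I -> a | b | c."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of derivation trees for E deriving w[i:j].
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if w[i] in "abc" else 0       # E -> I, I -> a | b | c
        total = 0
        if w[i] == "(" and w[j - 1] == ")":
            total += count(i + 1, j - 1)           # E -> (E)
        for k in range(i + 1, j - 1):
            if w[k] in "+*":                       # E -> E+E or E -> E*E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(w))


print(count_trees("a+b*c"), count_trees("(a+b)*c"))
```

A count of 2 for `a+b*c` reflects the two groupings, (a+b)*c and a+(b*c); with explicit parentheses only one tree remains.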
Ambiguous languages: A CFL $L$ is inherently ambiguous if every CFG $G$ with $L(G) = L$ is ambiguous. Otherwise, it is unambiguous. Note: an unambiguous language may have an ambiguous grammar. Example: $L = \{a^n b^n c^m\} \cup \{a^n b^m c^m\}$ is inherently ambiguous. This is hard to prove.
CFG and Programming Languages: Programming language = syntax + semantics. Syntax is defined by a grammar $G$:
<expression> ::= <term> | <expression> + <term>
<term> ::= <factor> | <term> * <factor>
<while_statement> ::= while <expression> <statement>
Syntax checking in compilers is done by a parser. Is a program $p$ grammatically correct? Is $p \in L(G)$? We need efficient parsers.
Goal: a restricted CFG for programming languages. Its expressive power is sufficient. It has no ambiguity; for example, in "if then if then else" the else must attach to exactly one of the two ifs. There exists an efficient parser.
C -- LR(1). PASCAL -- LL(1). Hierarchy of classes of context-free languages: LL(1), LR(0), LR(1) = DCFL, LR(2), CFL.
Syntactic Correctness: The lexical analyzer produces a stream of tokens: x = y +2.1 becomes <id> <op> <id> <op> <real>. The parser (syntactic analyzer) verifies that this token stream is syntactically correct by constructing a valid parse tree for the entire program. There is a unique parse tree for each language construct. Program = collection of parse trees rooted at the top by a special start symbol.
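The lexical analysis step can be sketched with regular expressions. The token classes `id`, `op`, and `real` follow the example above, but their exact patterns are assumptions made for illustration, as are the names `TOKEN_SPEC` and `tokenize`.

```python
import re

# Assumed token patterns; the slides only name the token classes.
TOKEN_SPEC = [
    ("real", r"\d+\.\d+"),       # digits, a dot, digits
    ("id", r"[A-Za-z_]\w*"),     # identifiers
    ("op", r"[=+\-*/]"),         # single-character operators
    ("skip", r"\s+"),            # whitespace, dropped from the output
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(" ".join(f"<{kind}>" for kind, _ in tokenize("x = y +2.1")))
```

A parser would then work on the sequence of token classes and ignore the individual characters of each lexeme.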
CFG For Floating Point Numbers: ::= stands for production rule; < > are non-terminals; | represents alternatives for the right-hand side of a production rule. [Figure: sample parse tree]
CFG For Balanced Parentheses: Could we write this grammar using regular expressions or a DFA? Why? Sample derivation: <balanced> $\Rightarrow$ ( <balanced> ) $\Rightarrow$ (( <balanced> )) $\Rightarrow$ (( <empty> )) $\Rightarrow$ (( ))
CFG For Decimal Numbers (Redux): This grammar is right-recursive. Sample top-down leftmost derivation: <num> $\Rightarrow$ <digit> <num> $\Rightarrow$ 7 <num> $\Rightarrow$ 7 <digit> <num> $\Rightarrow$ 7 8 <num> $\Rightarrow$ 7 8 <digit> $\Rightarrow$ 7 8 9
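The sample derivation can be reproduced for any digit string. The sketch below assumes the right-recursive rules <num> ::= <digit> | <digit> <num> and <digit> ::= 0 | ... | 9, which are inferred from the derivation and are not shown in the text; `derive_number` is a hypothetical name.

```python
def derive_number(digits):
    """Return the leftmost derivation of a digit string from <num> as a list of forms."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    forms = ["<num>"]
    prefix = []  # digits produced so far
    for i, d in enumerate(digits):
        is_last = i == len(digits) - 1
        tail = [] if is_last else ["<num>"]
        # <num> -> <digit> <num>, or <num> -> <digit> for the final digit
        forms.append(" ".join(prefix + ["<digit>"] + tail))
        prefix.append(d)
        # <digit> -> d
        forms.append(" ".join(prefix + tail))
    return forms


print(" => ".join(derive_number("789")))
```

Because the recursion is on the right, the leftmost variable is always a <digit> or the single trailing <num>, so the digits appear strictly from left to right.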
Compiler-compiler: A compiler-compiler is a program that generates a compiler from a defined grammar. The parser can be built automatically from the BNF description of the language's CFG. Tools: yacc, Bison.
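Compiler-compiler workflow: the programming language grammar $G = (V, T, S, P)$ is the input to the compiler-compiler, which produces a compiler (parser + code generator). The compiler translates a program into execution code, which runs on the input data to produce the result.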