1 Context-Free Languages
Wen-Guey Tzeng, Department of Computer Science, National Chiao Tung University

2 Context-Free Grammars
Some languages are not regular, e.g. $L = \{a^n b^n : n \ge 0\}$.
A grammar $G = (V, T, S, P)$ is context-free if all productions are of the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$.
A language $L$ is context-free if and only if there is a context-free grammar $G$ such that $L = L(G)$.
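The definition can be made concrete with a small sketch that enumerates the short strings of a grammar by leftmost derivation. The function name `generate`, the dictionary encoding, and the convention that uppercase letters are variables and lowercase letters are terminals are assumptions made for this illustration, not part of the slides. The empty string is written `""`, and the cap on the length of a sentential form is a simplification that is sufficient for this grammar.

```python
from collections import deque

def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Variables are uppercase letters, terminals are lowercase letters,
    and "" is the empty right-hand side.
    """
    seen = {start}
    queue = deque([start])
    result = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        pos = next((i for i, s in enumerate(form) if s.isupper()), None)
        if pos is None:
            result.add(form)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for s in new if s.islower())
            # Terminals never disappear, so forms with too many can be dropped.
            if terminals <= max_len and len(new) <= 2 * max_len + 2 and new not in seen:
                seen.add(new)
                queue.append(new)
    return result

# Grammar for {a^n b^n : n >= 0}: S -> aSb | (empty string)
anbn = {"S": ["aSb", ""]}
print(sorted(generate(anbn, "S", 6), key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```

Every production has a single variable on its left-hand side, which is exactly the context-free restriction.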

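3 Examples
$G = (\{S\}, \{a, b\}, S, P)$, with $P = \{S \to aSa \mid bSb \mid \lambda\}$.
$S \Rightarrow aSa \Rightarrow aaSaa \Rightarrow aabSbaa \Rightarrow aab\lambda baa = aabbaa$
$L(G) = \{ww^R : w \in \{a, b\}^*\}$
$S \to abB$, $A \to aaBb \mid \lambda$, $B \to bbAa$
Is $L(G) = \{ab(bbaa)^n bba(ba)^n : n \ge 0\}$?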

4 Designing CFGs
Give a CFG for $L = \{a^n b^m : n > m\}$.

5 Designing CFGs
Give a CFG for $L = \{a^n b^m : n \ne m,\ n, m \ge 0\}$.
Idea 1: split $L$ into two cases (not necessarily disjoint), $L_1 = \{a^n b^m : n > m\}$ and $L_2 = \{a^n b^m : n < m\}$, then construct productions for $L_1$ and $L_2$ separately.
Idea 2: for $L_1$, produce the same number of a's and b's, then the extra a's.
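One candidate grammar that follows the two ideas can be checked by brute force. The grammar below is one possible answer, not necessarily the one intended by the slides, and the variable names `T`, `A`, `B` and the helper `strings_up_to` are hypothetical. The helper computes, by fixed-point iteration, every terminal string of bounded length that each variable can derive.

```python
def strings_up_to(productions, max_len):
    """Map each variable to the set of terminal strings of length <= max_len it derives."""
    lang = {A: set() for A in productions}
    changed = True
    while changed:
        changed = False
        for A, alternatives in productions.items():
            for rhs in alternatives:
                partial = {""}
                for sym in rhs:
                    # A variable contributes what it is known to derive so far,
                    # a terminal contributes itself.
                    options = lang[sym] if sym in productions else {sym}
                    partial = {p + o for p in partial for o in options
                               if len(p) + len(o) <= max_len}
                if not partial <= lang[A]:
                    lang[A] |= partial
                    changed = True
    return lang

# Assumed candidate: T makes equally many a's and b's (Idea 2),
# A adds at least one extra a in front, B adds at least one extra b behind.
grammar = {
    "S": ["AT", "TB"],      # L1 or L2 (Idea 1)
    "T": ["aTb", ""],
    "A": ["aA", "a"],
    "B": ["bB", "b"],
}

expected = {"a" * n + "b" * m
            for n in range(6) for m in range(6)
            if n != m and n + m <= 5}
print(strings_up_to(grammar, 5)["S"] == expected)  # True
```

A check of this kind only covers strings up to the chosen length, so it can refute a wrong grammar but does not prove a grammar correct.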

6 Give a CFG for $L = \{a^n b^m c^k : m = n + k\}$.
Give a CFG for $L = \{a^n b^m c^k : |n - m| = k\}$.

7 Give a CFG for $L = \{a^n b^m c^k : m > n + k\}$.

8 Give a CFG for $L = \{a^n b^m c^k : m \ne n + k\}$.

9 Give a CFG for $L = \{w \in \{a, b\}^* : n_a(w) = n_b(w)\}$.

10 Give a CFG for $L = \{w \in \{a, b\}^* : n_a(w) > n_b(w)\}$.

11 What is L(G)?
$S \to aSb \mid SS \mid \lambda$
$L(G) = \{w \in \{a, b\}^* : n_a(w) = n_b(w) \text{ and } n_a(v) \ge n_b(v), \text{ where } v \text{ is any prefix of } w\}$
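The characterization of L(G) on this slide can be read as a single left-to-right scan: count a's minus b's, never let the count go negative, and end at zero. A minimal sketch, with the hypothetical name `in_language` and the assumption that the input contains only the letters a and b:

```python
def in_language(w):
    """Check that w has equally many a's and b's and that no prefix has more b's than a's."""
    balance = 0                      # number of a's minus number of b's seen so far
    for ch in w:
        balance += 1 if ch == "a" else -1
        if balance < 0:              # some prefix has more b's than a's
            return False
    return balance == 0

print(in_language("aabb"), in_language("abab"), in_language("abba"))  # True True False
```

Reading a as an opening and b as a closing parenthesis, this is the language of properly nested parentheses.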

12 Leftmost and rightmost derivation
$G = (\{A, B, S\}, \{a, b\}, S, P)$, where $P$ contains $S \to AB$, $A \to aaA$, $A \to \lambda$, $B \to Bb$, $B \to \lambda$.
$L(G) = \{a^{2n} b^m : n, m \ge 0\}$
For the string $aab$, compare the rightmost derivation and the leftmost derivation.
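Both derivations of aab can be read off one derivation tree by choosing which variable to expand next. The sketch below assumes the productions S -> AB, A -> aaA, A -> empty, B -> Bb, B -> empty, and encodes a tree node as a `(variable, children)` tuple with terminals as plain strings. The encoding and the function names are illustrative choices.

```python
def show(frontier):
    """Render a list of tree nodes as a sentential form."""
    return "".join(n if isinstance(n, str) else n[0] for n in frontier) or "λ"

def derivation(tree, leftmost=True):
    """List the sentential forms obtained by always expanding the
    leftmost (or rightmost) variable of the given derivation tree."""
    frontier = [tree]
    forms = [show(frontier)]
    while True:
        variables = [i for i, n in enumerate(frontier) if isinstance(n, tuple)]
        if not variables:
            return forms
        i = variables[0] if leftmost else variables[-1]
        frontier[i:i + 1] = frontier[i][1]   # replace the node by its children
        forms.append(show(frontier))

# Derivation tree of aab: S -> AB, A -> aaA, A -> (empty), B -> Bb, B -> (empty)
tree = ("S", [("A", ["a", "a", ("A", [])]),
              ("B", [("B", []), "b"])])

print(" => ".join(derivation(tree, leftmost=True)))
# S => AB => aaAB => aaB => aaBb => aab
print(" => ".join(derivation(tree, leftmost=False)))
# S => AB => ABb => Ab => aaAb => aab
```

The two derivations apply the same productions in a different order, which is why a single tree stands for both.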

13 Derivation (parse) tree
$A \to abABc$

14 $S \to aAB$, $A \to bBb$, $B \to A \mid \lambda$

15 Some comments
A derivation tree does not record the order in which productions are applied.
Leftmost and rightmost derivations correspond to depth-first traversals of the tree.
Derivation trees and derivation order are very important in programming language and compiler design.

16 Grammar for C

17 Parsing and ambiguity
Parsing of $w \in L(G)$: find a sequence of productions by which $w$ is derived.
Questions, given $G$ and $w$:
Is $w \in L(G)$? (membership problem)
Is there an efficient way to determine whether $w \in L(G)$?
How is $w \in L(G)$ parsed?
Is the parsing unique?

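18 Exhaustive search / top-down parsing
$S \to SS \mid aSb \mid bSa \mid \lambda$
Determine whether $aabb \in L(G)$.
1st round: (1) $S \Rightarrow SS$; (2) $S \Rightarrow aSb$; (3) $S \Rightarrow bSa$; (4) $S \Rightarrow \lambda$
2nd round:
From (1): $S \Rightarrow SS \Rightarrow SSS$, $S \Rightarrow SS \Rightarrow aSbS$, $S \Rightarrow SS \Rightarrow bSaS$, $S \Rightarrow SS \Rightarrow S$
From (2): $S \Rightarrow aSb \Rightarrow aSSb$, $S \Rightarrow aSb \Rightarrow aaSbb$, $S \Rightarrow aSb \Rightarrow abSab$, $S \Rightarrow aSb \Rightarrow ab$
3rd round: applying $S \to \lambda$ to $aaSbb$ gives $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aabb$.
Drawback: inefficiency. Are there other ways?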
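The round-by-round search can be sketched as a breadth-first search over leftmost derivations. The function name `exhaustive_parse`, the `max_rounds` cutoff, and the pruning rules (drop a form once its terminal prefix disagrees with w or it holds more terminals than w) are choices made for this sketch. The cutoff is needed because a grammar with empty productions gives no natural bound on the number of rounds.

```python
def exhaustive_parse(productions, start, w, max_rounds):
    """Breadth-first search over leftmost derivations, one round per step.

    Returns (round, derivation) for the first derivation of w found,
    or None if max_rounds rounds pass without success.
    """
    def first_variable(form):
        return next((i for i, s in enumerate(form) if s.isupper()), None)

    frontier = {start: [start]}      # sentential form -> derivation reaching it
    seen = {start}
    for round_no in range(1, max_rounds + 1):
        next_frontier = {}
        for form, steps in frontier.items():
            pos = first_variable(form)
            if pos is None:
                continue             # all terminals, nothing left to expand
            for rhs in productions[form[pos]]:
                new = form[:pos] + rhs + form[pos + 1:]
                if new == w:
                    return round_no, steps + [new]
                cut = first_variable(new)
                prefix = new if cut is None else new[:cut]
                terminals = sum(1 for s in new if s.islower())
                # Prune forms that can no longer lead to w.
                if new in seen or terminals > len(w) or not w.startswith(prefix):
                    continue
                seen.add(new)
                next_frontier[new] = steps + [new]
        frontier = next_frontier
    return None

grammar = {"S": ["SS", "aSb", "bSa", ""]}   # "" is the empty right-hand side
print(exhaustive_parse(grammar, "S", "aabb", max_rounds=5))
# (3, ['S', 'aSb', 'aaSbb', 'aabb'])
```

Even with pruning, the number of sentential forms can grow quickly from round to round, which is the inefficiency the slide points out.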

19 If there are no productions of the form $A \to \lambda$ or $A \to B$, the exhaustive search for $w \in L(G)$ can be done in
$|P| + |P|^2 + \cdots + |P|^{2|w|} = O(|P|^{2|w|+1})$
steps.
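A short sketch of where this bound comes from, following the standard textbook argument rather than anything stated on the slide: without empty or unit productions, every derivation step increases either the length of the sentential form or the number of terminals in it, and neither quantity may exceed $|w|$, so at most $2|w|$ rounds are needed. Round $i$ of a leftmost search produces at most $|P|^i$ sentential forms, and the total is a geometric sum (assuming $|P| \ge 2$):

```latex
\sum_{i=1}^{2|w|} |P|^{i}
  = \frac{|P|^{2|w|+1} - |P|}{|P| - 1}
  \le |P|^{2|w|+1}
  \qquad (|P| \ge 2).
```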

20 Bottom-up parsing
Reduce a string $w$ to the start variable $S$.
$S \to aSb \mid \lambda$
$w = aabb \Leftarrow aaSbb \Leftarrow aSb \Leftarrow S$
Efficiency: $O(|w|^3)$
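The slide does not name the algorithm behind the cubic bound. One well-known bottom-up method with that running time is the CYK algorithm, sketched here as an illustration. CYK needs a grammar in Chomsky normal form, so the grammar below is an assumed normal-form equivalent of S -> aSb for nonempty strings: S -> AT | AB, T -> SB, A -> a, B -> b. The variable names and the function name `cyk` are hypothetical.

```python
def cyk(word, unit, binary, start="S"):
    """Decide membership for a grammar in Chomsky normal form.

    unit   : list of (A, a) for productions A -> a
    binary : list of (A, (B, C)) for productions A -> BC
    """
    n = len(word)
    if n == 0:
        return False                 # this normal form cannot derive the empty string
    # table[i][l - 1] holds the variables that derive word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {A for A, t in unit if t == ch}
    for length in range(2, n + 1):               # substring length
        for i in range(n - length + 1):          # substring start
            for split in range(1, length):       # size of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for A, (B, C) in binary:
                    if B in left and C in right:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

unit = [("A", "a"), ("B", "b")]
binary = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B"))]
print(cyk("aabb", unit, binary), cyk("abab", unit, binary))  # True False
```

The three nested loops over length, start, and split point give the cubic dependence on the length of the word.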

21 Linear-time parsing
Simple grammar (s-grammar):
All productions are of the form $A \to ax$, where $A \in V$, $a \in T$, and $x \in (V \cup T)^*$.
Any pair $(A, a)$ occurs at most once in $P$.
Example: $S \to aS \mid bSS \mid c$
Parsing for $ababccc$
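Because each pair (A, a) selects at most one production, the parser never has to guess. A minimal sketch for the example grammar, reading it as S -> aS | bSS | c and storing the productions in a dictionary keyed by (variable, terminal); the function name `parse_s_grammar` and the encoding are assumptions for illustration.

```python
def parse_s_grammar(rules, start, word):
    """Parse word with an s-grammar, returning the productions used or None.

    rules maps (variable, terminal) to the rest of the right-hand side.
    """
    stack = [start]                  # top of the stack is the leftmost pending symbol
    used = []
    for ch in word:
        if not stack:
            return None              # input left over after the derivation finished
        top = stack.pop()
        if top.isupper():
            rest = rules.get((top, ch))
            if rest is None:
                return None          # no production for this (variable, terminal) pair
            used.append(f"{top} -> {ch}{rest}")
            stack.extend(reversed(rest))
        elif top != ch:
            return None              # pending terminal does not match the input
    return used if not stack else None

rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(parse_s_grammar(rules, "S", "ababccc"))
# ['S -> aS', 'S -> bSS', 'S -> aS', 'S -> bSS', 'S -> c', 'S -> c', 'S -> c']
```

Each input symbol triggers exactly one dictionary lookup, so the number of steps is proportional to the length of the word.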

22 Ambiguous grammars
$G$ is ambiguous if some $w \in L(G)$ has two or more distinct derivation trees.
Example: $S \to aSb \mid SS \mid \lambda$

23 Example from programming languages
C-like grammar for arithmetic expressions.
$G = (\{E, I\}, \{a, b, c, +, \times, (, )\}, E, P)$, where $P$ contains
$E \to I$
$E \to E + E$
$E \to E \times E$
$E \to (E)$
$I \to a \mid b \mid c$
$w = a + b \times c$ has two derivation trees.
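The ambiguity can be checked mechanically by counting derivation trees. The sketch below is specific to this expression grammar: a span of the input is an E if it is a single identifier, a parenthesized E, or two E's joined by an operator, and the counts of the parts multiply. The multiplication terminal is written as the ASCII letter `x`, and the function name `count_trees` is hypothetical.

```python
from functools import lru_cache

def count_trees(w):
    """Count derivation trees of w from E in the grammar
    E -> I | E+E | ExE | (E),  I -> a | b | c."""
    @lru_cache(maxsize=None)
    def trees(i, j):                 # number of trees for w[i:j]
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if w[i] in "abc" else 0       # E -> I -> a, b, or c
        total = 0
        if w[i] == "(" and w[j - 1] == ")":
            total += trees(i + 1, j - 1)           # E -> (E)
        for k in range(i + 1, j - 1):
            if w[k] in "+x":                       # E -> E+E or E -> ExE, split at k
                total += trees(i, k) * trees(k + 1, j)
        return total
    return trees(0, len(w))

print(count_trees("a+bxc"))      # 2
print(count_trees("(a+b)xc"))    # 1
```

The two trees for a+bxc correspond to grouping the sum first or the product first. Parentheses select one of the two groupings.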


25 Ambiguous languages
A CFL $L$ is inherently ambiguous if every CFG $G$ with $L(G) = L$ is ambiguous. Otherwise, it is unambiguous.
Note: an unambiguous language may still have an ambiguous grammar.
Example: $L = \{a^n b^n c^m\} \cup \{a^n b^m c^m\}$ is inherently ambiguous. This is hard to prove.

26 CFG and Programming Languages
Programming language: syntax + semantics.
Syntax is defined by a grammar $G$:
<expression> ::= <term> | <expression> + <term>
<term> ::= <factor> | <term> * <factor>
<while_statement> ::= while <expression> <statement>
Syntax checking in compilers is done by a parser.
Is a program $p$ correct? That is, is $p \in L(G)$?
We need efficient parsers.

27 Restricted grammars for Programming Languages
Goal: the expressive power is sufficient, and efficient parsers exist.
C: LR(1)
PASCAL: LL(1)
Hierarchy of classes of context-free languages: LL(1), LR(0), LR(1) = DCFL, LR(2), CFL

28 Syntactic Correctness
The lexical analyzer produces a stream of tokens.
The parser (syntactic analyzer) verifies that this token stream is syntactically correct by constructing a valid parse tree for the entire program.
There is a unique parse tree for each language construct.
Program = collection of parse trees rooted at the top by a special start symbol.
The parser can be built automatically from the BNF description of the language's CFG.
Example tools: yacc, Bison

29 CFG For Floating Point Numbers
::= stands for production rule; < > are non-terminals; | represents alternatives for the right-hand side of a production rule.
[Figure: sample parse tree]

30 CFG For Balanced Parentheses
Could we write this grammar using regular expressions or a DFA? Why?
Sample derivation:
<balanced> ⇒ ( <balanced> ) ⇒ (( <balanced> )) ⇒ (( <empty> )) ⇒ (( ))

31 CFG For Decimal Numbers (Redux)
This grammar is right-recursive.
Sample top-down leftmost derivation:
<num> ⇒ <digit> <num> ⇒ 7 <num> ⇒ 7 <digit> <num> ⇒ 7 8 <num> ⇒ 7 8 <digit>
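The sample derivation can be reproduced for any digit string. The grammar itself is not shown in the text, so the sketch assumes the right-recursive rules `<num> ::= <digit> | <digit> <num>` with `<digit>` ranging over 0 to 9. The function name `leftmost_derivation` is hypothetical.

```python
def leftmost_derivation(digits):
    """List the sentential forms of the leftmost derivation of a digit string,
    assuming <num> ::= <digit> | <digit> <num>."""
    if not digits or not digits.isdigit():
        raise ValueError("expected a nonempty string of digits")
    forms = ["<num>"]
    done = []                                        # digits derived so far
    for k, d in enumerate(digits):
        last = k == len(digits) - 1
        tail = [] if last else ["<num>"]             # right recursion on <num>
        forms.append(" ".join(done + ["<digit>"] + tail))   # expand <num>
        done.append(d)
        forms.append(" ".join(done + tail))                 # expand <digit>
    return forms

for form in leftmost_derivation("789"):
    print(form)
```

For the input 789 the first six forms printed match the derivation on the slide, and a final step replaces the last `<digit>` by 9.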
