CFG [1] The CYK Algorithm

We now present an algorithm to decide whether $w \in L(G)$, assuming $G$ is in Chomsky Normal Form.

This is an example of the technique of dynamic programming.

Let $n$ be $|w|$. The natural algorithm (trying all derivations of length $< 2n$) may be exponential.

This technique gives an $O(n^3)$ algorithm!
CFG [2] dynamic programming

fib 0 = fib 1 = 1
fib (n + 2) = fib n + fib (n + 1)

fib 5? It calls fib 4 and fib 3, and fib 4 calls fib 3 again.

So in a top-down computation there is duplication of work (if one does not use memoization).
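The duplication can be made concrete with a minimal sketch. The names `fibNaive`, `calls` and `fibUp` are illustrative and not part of the slides: `calls` counts how many calls the naive definition makes, and `fibUp` is one possible linear-time variant that builds the values upward by carrying the last two results.

```haskell
-- Naive top-down definition, with the slide's convention fib 0 = fib 1 = 1.
fibNaive :: Int -> Integer
fibNaive 0 = 1
fibNaive 1 = 1
fibNaive n = fibNaive (n - 2) + fibNaive (n - 1)

-- Number of calls made when evaluating fibNaive n (hypothetical helper).
calls :: Int -> Integer
calls 0 = 1
calls 1 = 1
calls n = 1 + calls (n - 2) + calls (n - 1)

-- Upward computation: carry the pair (fib k, fib (k + 1)) and step n times.
fibUp :: Int -> Integer
fibUp n = fst (iterate step (1, 1) !! n)
  where
    step (a, b) = (b, a + b)
```

Here `calls 5` is 15 although only six distinct values fib 0, ..., fib 5 are needed, while `fibUp 5` performs five steps.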
CFG [3] dynamic programming

For a bottom-up computation: fib 2 = 2, fib 3 = 3, fib 4 = 5, fib 5 = 8.

What is going on in the CYK algorithm or the Earley algorithm is similar.

$S \to AB \mid BC, \quad A \to BA \mid a, \quad B \to CC \mid b, \quad C \to AB \mid a$

$bab \in L(G)$? And $aba \in L(G)$?
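CFG [4] dynamic programming

The idea is to represent $bab$ as the collection of facts b(0, 1), a(1, 2), b(2, 3).

We then compute the facts $X(i, k)$ for $i < k$ by induction on $k - i$.

The base case $k - i = 1$ uses the productions $A \to a$: if a(i, i+1) is a fact, then $A$ is in $X(i, i+1)$.

Only one rule: if we have a production $C \to AB$, and $A$ is in $X(i, j)$ and $B$ is in $X(j, k)$, then $C$ is in $X(i, k)$.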
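The induction on $k - i$ can be sketched as a table indexed by $(i, k)$. This is a minimal sketch, not the slides' own program: the names `binaryRules`, `terminalRules`, `cykTable` and `cyk` are assumptions, the grammar is hard-coded as $S \to AB \mid BC$, $A \to BA \mid a$, $B \to CC \mid b$, $C \to AB \mid a$, and the table relies on the lazy elements of `Data.Array` so that each cell is computed once, when first needed.

```haskell
import Data.Array (Array, array, (!))
import Data.List (nub)

type NT = Char

-- Productions C -> A B, written (C, (A, B)).
binaryRules :: [(NT, (NT, NT))]
binaryRules =
  [ ('S', ('A', 'B')), ('S', ('B', 'C'))
  , ('A', ('B', 'A'))
  , ('B', ('C', 'C'))
  , ('C', ('A', 'B')) ]

-- Productions A -> a, written (A, a).
terminalRules :: [(NT, Char)]
terminalRules = [('A', 'a'), ('B', 'b'), ('C', 'a')]

-- table ! (i, k) lists the nonterminals X with X(i, k).
cykTable :: String -> Array (Int, Int) [NT]
cykTable w = table
  where
    n = length w
    table = array ((0, 0), (n, n))
              [ ((i, k), cell i k) | i <- [0 .. n], k <- [0 .. n] ]
    cell i k
      | k == i + 1 = [ x | (x, c) <- terminalRules, c == w !! i ]
      | k >  i + 1 = nub [ c | j <- [i + 1 .. k - 1]
                             , (c, (a, b)) <- binaryRules
                             , a `elem` (table ! (i, j))
                             , b `elem` (table ! (j, k)) ]
      | otherwise  = []

-- w is accepted when S(0, n) is a fact (the empty word is rejected here).
cyk :: String -> Bool
cyk w = not (null w) && 'S' `elem` (cykTable w ! (0, length w))
```

With this grammar, `cyk "bab"` holds and `cyk "aba"` does not: for `aba` the only nonterminal in the cell (0, 3) is `B`.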
CFG [5] The CYK Algorithm

The algorithm is best understood in terms of production systems.

Example: the grammar

$S \to AB \mid BA \mid SS \mid AC \mid BD, \quad A \to a, \quad B \to b, \quad C \to SB, \quad D \to SA$

becomes the production system
CFG [6] The CYK Algorithm

$A(x, y),\ B(y, z) \to S(x, z)$
$B(x, y),\ A(y, z) \to S(x, z)$
$S(x, y),\ S(y, z) \to S(x, z)$
$A(x, y),\ C(y, z) \to S(x, z)$
$B(x, y),\ D(y, z) \to S(x, z)$
$S(x, y),\ B(y, z) \to C(x, z)$
$S(x, y),\ A(y, z) \to D(x, z)$
$a(x, y) \to A(x, y)$
$b(x, y) \to B(x, y)$
CFG [7] The CYK Algorithm

The problem of whether one can derive $S \Rightarrow^* aabbab$ is transformed into the problem: can one produce S(0, 6) in this production system, given the facts

a(0, 1), a(1, 2), b(2, 3), b(3, 4), a(4, 5), b(5, 6)
CFG [8] The CYK Algorithm

For this we apply a forward-chaining (bottom-up) sequence of productions:

A(0, 1), A(1, 2), B(2, 3), B(3, 4), A(4, 5), B(5, 6)
S(1, 3), S(3, 5), S(4, 6)
S(1, 5), C(1, 4), C(3, 6)
S(0, 4), ...
S(0, 6)
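The forward-chaining sequence can be reproduced with a small saturation loop. This is a naive sketch under stated assumptions: the names `Fact`, `binary`, `unary`, `initial`, `step`, `saturate` and `derivable` are illustrative, the rules are those of the production system for $S \to AB \mid BA \mid SS \mid AC \mid BD$, and each round rescans all pairs of facts, so it does not follow the $O(n^3)$ schedule of CYK.

```haskell
import qualified Data.Set as Set

-- ('S', 0, 6) stands for the fact S(0, 6).
type Fact = (Char, Int, Int)

-- Rules X(x, y), Y(y, z) -> Z(x, z), written (X, Y, Z).
binary :: [(Char, Char, Char)]
binary =
  [ ('A', 'B', 'S'), ('B', 'A', 'S'), ('S', 'S', 'S')
  , ('A', 'C', 'S'), ('B', 'D', 'S')
  , ('S', 'B', 'C'), ('S', 'A', 'D') ]

-- Rules a(x, y) -> A(x, y), written (a, A).
unary :: [(Char, Char)]
unary = [('a', 'A'), ('b', 'B')]

-- One fact c(i, i + 1) per letter of the word.
initial :: String -> Set.Set Fact
initial w = Set.fromList [ (c, i, i + 1) | (i, c) <- zip [0 ..] w ]

-- One round: add every fact obtainable by a single rule application.
step :: Set.Set Fact -> Set.Set Fact
step fs = Set.unions [fs, Set.fromList us, Set.fromList bs]
  where
    l  = Set.toList fs
    us = [ (z, i, k) | (c, i, k) <- l, (a, z) <- unary, a == c ]
    bs = [ (z, i, k) | (x, i, j) <- l, (y, j', k) <- l, j == j'
                     , (x', y', z) <- binary, x == x', y == y' ]

-- Iterate to a fixed point; the set of possible facts is finite.
saturate :: Set.Set Fact -> Set.Set Fact
saturate fs =
  let fs' = step fs
  in if fs' == fs then fs else saturate fs'

derivable :: String -> Bool
derivable w = ('S', 0, length w) `Set.member` saturate (initial w)
```

For `aabbab` the saturated set contains C(3, 6), S(0, 4) and S(0, 6), so `derivable "aabbab"` holds.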
CFG [9] The CYK Algorithm

For instance, the fact that C(3, 6) is produced corresponds to the derivation

$C \Rightarrow SB \Rightarrow BAB \Rightarrow bAB \Rightarrow baB \Rightarrow bab$

In this way, we get a solution in $O(n^3)$!
CFG [10] Forward-chaining inference

This idea actually works for any grammar. For instance

$S \to SS \mid aSb \mid \epsilon$

is represented by the production system

$S(x, x)$
$S(x, y),\ S(y, z) \to S(x, z)$
$a(x, y),\ S(y, z),\ b(z, t) \to S(x, t)$

and the problem of deciding $S \Rightarrow^* aabb$ is replaced by the problem of deriving S(0, 4) from the facts

a(0, 1), a(1, 2), b(2, 3), b(3, 4)
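Since this grammar has a single nonterminal, the facts S(x, y) can be stored as a set of pairs. The following is a minimal sketch, assuming the hypothetical names `sFacts` and `balanced`; it starts from the axioms S(x, x) and applies the two remaining rules until nothing new appears.

```haskell
import qualified Data.Set as Set

-- All pairs (x, y) such that S(x, y) is derivable over the word w,
-- for the grammar S -> SS | aSb | epsilon.
sFacts :: String -> Set.Set (Int, Int)
sFacts w = go (Set.fromList [ (x, x) | x <- [0 .. n] ])   -- axioms S(x, x)
  where
    n = length w
    -- letter c i holds exactly when c(i, i + 1) is an input fact.
    letter c i = i >= 0 && i < n && w !! i == c
    go s =
      let l    = Set.toList s
          -- S(x, y), S(y, z) -> S(x, z)
          cat  = [ (x, z) | (x, y) <- l, (y', z) <- l, y == y' ]
          -- a(x, y), S(y, z), b(z, t) -> S(x, t) with x = y - 1, t = z + 1
          wrap = [ (y - 1, z + 1) | (y, z) <- l
                                  , letter 'a' (y - 1), letter 'b' z ]
          s'   = Set.union s (Set.fromList (cat ++ wrap))
      in if s' == s then s else go s'

balanced :: String -> Bool
balanced w = (0, length w) `Set.member` sFacts w
```

For `aabb` the loop produces S(1, 3) from S(2, 2) and then S(0, 4) from S(1, 3), so `balanced "aabb"` holds.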
CFG [11] Forward-chaining inference

This is the main idea behind the Earley algorithm.

It is mainly used for parsing in computational linguistics.

Earley parsers are interesting because they can parse all context-free languages.
CFG [12] Pumping Lemma for CFL

We prove that $\{a^n b^n c^m \mid n\ m\}$ is not context-free using the pumping lemma.

Similar problem for $\{a^n b^m \mid m\ n\}$: one can show that it is not regular using the pumping lemma.
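CFG [13] Complement of a CFL

We have seen that CFLs are not closed under intersection, but are closed under union.

It follows that they are not closed under complement: otherwise, by De Morgan's law $L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$, they would be closed under intersection.

Here is an explicit example: we show that the complement of $\{a^n b^n c^n \mid n \geq 0\}$ is a CFL.

For this we prove that the complement of $L(a^* b^* c^*)$ is regular.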
CFG [14] Undecidable Problems

We have given algorithms to decide $L(G) \neq \emptyset$ and $w \in L(G)$.

What is surprising is that it can be shown that there are no algorithms for the following problems:

Given $G_1$ and $G_2$, do we have $L(G_1) \subseteq L(G_2)$? Do we have $L(G_1) = L(G_2)$?

Given $G$ and a regular expression $R$, do we have $L(G) = L(R)$? $L(R) \subseteq L(G)$?

Do we have $L(G) = T^*$, where $T$ is the alphabet of $G$? (Compare to the case of regular languages.)

Given $G$, is $G$ ambiguous?
CFG [15] Undecidable Problems

One reduces these problems to the Post Correspondence Problem.

Given $u_1, \dots, u_n$ and $v_1, \dots, v_n$ in $\{0, 1\}^*$, is it possible to find $i_1, \dots, i_k$ such that

$u_{i_1} \dots u_{i_k} = v_{i_1} \dots v_{i_k}$

Example: 1, 10, 011 and 101, 00, 11

Challenge example: 001, 01, 01, 10 and 0, 011, 101, 001
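Checking a proposed index sequence is easy; it is finding one that is hard. A minimal sketch, assuming the hypothetical names `isSolution` and `example`, with 1-based indices and the usual convention (an assumption here, not stated on the slide) that the sequence must be non-empty:

```haskell
-- A PCP instance is a list of pairs (u_i, v_i).
-- isSolution checks that the 1-based indices give the same word on both sides.
isSolution :: [(String, String)] -> [Int] -> Bool
isSolution ps is =
  not (null is)
    && all (\i -> i >= 1 && i <= length ps) is
    && concatMap (fst . pick) is == concatMap (snd . pick) is
  where
    pick i = ps !! (i - 1)

-- The first example: u = 1, 10, 011 and v = 101, 00, 11.
example :: [(String, String)]
example = [("1", "101"), ("10", "00"), ("011", "11")]
```

For the first example, `isSolution example [1,3,2,3]` holds: both sides spell 101110011.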
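CFG [16] Haskell Program

    -- isprefix xs ys: is xs a prefix of ys?
    isprefix [] ys = True
    isprefix (x:xs) (y:ys) = x == y && isprefix xs ys
    isprefix xs ys = False

    -- a pair is compatible if one component is a prefix of the other
    iscomp (xs,ys) = isprefix xs ys || isprefix ys xs

    exists p [] = False
    exists p (x:xs) = p x || exists p xs

    -- first element satisfying p (assumes that one exists)
    exhibit p (x:xs) = if p x then x else exhibit p xs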
CFG [17] Haskell Program

    addnum k [] = []
    addnum k (x:xs) = (k,x):(addnum (k+1) xs)

    nextstep xs ys =
      concat (map (\ (n,(s,t)) ->
                map (\ (ns,(u,v)) -> (ns++[n],(u ++ s,v ++ t))) ys) xs)
CFG [18] Haskell Program

    mainloop xs ys =
      let bs = filter (iscomp . snd) ys
          prop (_,(u,v)) = u == v
      in if exists prop bs then exhibit prop bs
         else if bs == [] then error "no SOLUTION"
         else mainloop xs (nextstep xs bs)
CFG [19] Haskell Program

    post xs =
      let as = addnum 1 xs
      in mainloop as (map (\ (n,z) -> ([n],z)) as)

    xs1 = [("1","101"),("10","00"),("011","11")]
    xs2 = [("001","0"),("01","011"),("01","101"),("10","001")]
CFG [20] Haskell Program

    Main> post xs1
    ([1,3,2,3],("101110011","101110011"))
    Main> post xs2
    ERROR - Garbage collection fails to reclaim sufficient space

    [2,2,2,3,2,2,2,3,3,4,4,6,8,8,15, 21,15,17,18,24,15,12,12,18,18,24,24,45, 63,66,84,91,140,182,201,346,418,324,330,321,423,459,780
CFG [21] Post Correspondence Problem and CFL

To the sequence $u_1, \dots, u_n$ we associate the following grammar $G_A$.

The alphabet is $\{0, 1, a_1, \dots, a_n\}$.

The productions are

$A \to u_1 a_1 \mid \dots \mid u_n a_n \mid u_1 A a_1 \mid \dots \mid u_n A a_n$

This grammar is unambiguous.
CFG [22] Post Correspondence Problem and CFL

To the sequence $v_1, \dots, v_n$ we associate the following grammar $G_B$.

The alphabet is the same, $\{0, 1, a_1, \dots, a_n\}$.

The productions are

$B \to v_1 a_1 \mid \dots \mid v_n a_n \mid v_1 B a_1 \mid \dots \mid v_n B a_n$

This grammar is unambiguous.
CFG [23] Post Correspondence Problem and CFL

Theorem: We have $L(G_A) \cap L(G_B) \neq \emptyset$ iff the Post Correspondence Problem for $u_1, \dots, u_n$ and $v_1, \dots, v_n$ has a solution.
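To see why, note that a word of $L(G_A)$ has the shape $u_{i_1} \dots u_{i_k} a_{i_k} \dots a_{i_1}$, and similarly for $L(G_B)$ with the $v_i$, so a common word forces the same indices and equal concatenations. A minimal sketch of this shape, where `wordOf` is a hypothetical helper and the marker $a_i$ is rendered as the text `a<i>` (an encoding chosen only for this illustration):

```haskell
-- The word generated from the 1-based index sequence i1, ..., ik:
-- w_i1 ... w_ik followed by the markers a_ik ... a_i1.
wordOf :: [String] -> [Int] -> String
wordOf ws is =
  concatMap (\i -> ws !! (i - 1)) is
    ++ concatMap (\i -> 'a' : show i) (reverse is)

-- The two sequences of the first PCP example.
us, vs :: [String]
us = ["1", "10", "011"]
vs = ["101", "00", "11"]
```

With the indices 1, 3, 2, 3, both `wordOf us [1,3,2,3]` and `wordOf vs [1,3,2,3]` give `101110011a3a2a3a1`, a word lying in both languages.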
CFG [24] Post Correspondence Problem and CFL

Finally we have the grammar $G$ with productions $S \to A \mid B$ (together with the productions of $G_A$ and $G_B$).

Theorem: The grammar $G$ is ambiguous iff the Post Correspondence Problem for $u_1, \dots, u_n$ and $v_1, \dots, v_n$ has a solution.
CFG [25] Post Correspondence Problem and CFL

The complement of $L(G_A)$ is context-free.

We see this on one example: $u_1 = 0$, $u_2 = 10$.

The complement of $L(G_B)$ is context-free.

Hence we have a grammar $G_C$ for the union of the complement of $L(G_A)$ and the complement of $L(G_B)$.