Lecture 8: Context Free Grammars
Dr Kieran T. Herley
Department of Computer Science, University College Cork
2017-2018
KH (12/10/17)

Specifying Non-Regular Languages
Recall: not every language is regular, e.g.
Language: L = {a^n b^n : n a non-negative integer}
Observations: consider the following recursive rules defining L:
  1. ɛ ∈ L
  2. If α ∈ L, then so is a α b
Every string derived by repeated application of the above rules is in L.
Every string a^n b^n in L can be formed by these rules: start from ɛ and apply the second rule n times.

Context Free Grammars
Idea: capture the above idea using a context-free grammar (CFG).
  <S> → ɛ
  <S> → a <S> b
Intuitive Explanation
- Symbol <S> is a substitutable placeholder
- Productions are substitution rules
- The language consists of the strings derivable by
    starting with <S>,
    repeatedly applying rules,
    continuing until no placeholders are left

CFG cont'd
  <S> → ɛ
  <S> → a <S> b
Examples of Derivations
  <S> ⇒ ɛ
  <S> ⇒ a <S> b ⇒ ab
  <S> ⇒ a <S> b ⇒ aa <S> bb ⇒ aabb
So ɛ, ab, aabb, ... belong to the language.

Some Terminology
A (context-free) grammar consists of one or more productions.
Production, e.g. <S> → a <S> b
- LHS: a single nonterminal (here <S>)
- RHS: a sequence of one or more symbols (here a <S> b) composed of terminals, nonterminals and ɛs
  (→ separates the two; sometimes ::= etc. is used instead)
where
- Terminals are symbols from the underlying alphabet, e.g. {a, b}
- Nonterminals are placeholder symbols, e.g. <S> (here enclosed in angle brackets for clarity)
- Start symbol: a nonterminal (<S>); the first nonterminal by default

Derivations
  <S> → ɛ
  <S> → a <S> b
Derivation: transformation of the start symbol into a sentence (a sequence of terminals) by repeated application of grammar productions, i.e. substitution of the RHS of some production for the nonterminal in its LHS.
Example: <S> ⇒ a <S> b ⇒ aa <S> bb ⇒ aabb
The intermediate stages, e.g. a <S> b, are known as sentential forms.
Definition: the sentences derivable from the start symbol constitute the language defined by the grammar.

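The derivation process described above can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the dictionary `GRAMMAR`, the function `derive` and the idea of selecting productions by index are hypothetical conveniences, not notation from the lecture. Each step replaces the leftmost nonterminal in the current sentential form.

```python
# Hypothetical representation: each nonterminal maps to a list of right-hand
# sides, and each right-hand side is a list of symbols ([] stands for the
# empty string).
GRAMMAR = {
    "S": [[], ["a", "S", "b"]],  # rule 0: S -> (empty), rule 1: S -> a S b
}

def derive(rule_choices, start="S"):
    """Apply the chosen rules, one per step, to the leftmost nonterminal.

    Returns every sentential form produced, beginning with the start symbol.
    """
    form = [start]
    steps = [form]
    for choice in rule_choices:
        # Find the leftmost nonterminal in the current sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(form)
    return steps

def show(form):
    """Render a sentential form as text."""
    return " ".join(form) if form else "(empty)"

# Rule 1 twice, then rule 0, reproduces the example derivation of aabb.
print(" => ".join(show(f) for f in derive([1, 1, 0])))
```

The sketch assumes the list of choices never outlasts the nonterminals; once the form is a sentence, a further choice would raise `StopIteration`.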
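More Examples and Counterexamples
  <S> → ɛ
  <S> → a <S> b
- aaabbb? Yes: apply the second rule three times, then the first.
- aaab? No: every application of the second rule adds one a and one b, so the counts must match.
- abba? No: in every derived sentence all the a's precede all the b's.
Upshot: the grammar specifies the language L = {a^n b^n : n ≥ 0}.
Note: if we interpret a as ( and b as ), this captures the set of nested parentheses.
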
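The three candidate strings above can also be checked mechanically. The following is a minimal sketch that mirrors the two productions directly; the function name `in_language` is our own, and the approach only works because this particular grammar is so simple.

```python
def in_language(s):
    """Return True if s is derivable from S -> (empty) | a S b."""
    if s == "":
        return True  # first rule: S -> (empty)
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        # second rule: strip the outer a ... b and check what is left
        return in_language(s[1:-1])
    return False

for candidate in ["aaabbb", "aaab", "abba"]:
    print(candidate, in_language(candidate))
```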
Another Grammar
  <S> → <N>
  <S> → <S> <N>
  <N> → ɛ
  <N> → ( <S> )
Features
- Start symbol: <S>
- Nonterminals: <S>, <N>
- Terminals: the left and right parenthesis symbols ( and )

Left Recursion
  <S> → <N>
  <S> → <S> <N>
  <N> → ɛ
  <N> → ( <S> )
Note
- The grammar is left recursive: it embodies rules of the form X → X α ¹
- This is one of the standard grammar idioms used to express repetition
- Some parsing techniques disfavour left recursion; the grammar can usually be recast to avoid it
¹ More indirect forms of left recursion are also possible.

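One common way to recast a directly left-recursive rule (a textbook transformation, not one given in these slides) replaces X → X α | β by X → β X' and X' → α X' | ɛ. The sketch below applies it to a single nonterminal; the function name, the list-of-lists representation and the naming of the fresh nonterminal are all assumptions made for illustration, and indirect left recursion is not handled.

```python
def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite X -> X a | b as X -> b X' and X' -> a X' | (empty).

    `alternatives` is a list of right-hand sides, each a list of symbols.
    Returns a dict of productions for X and for a fresh nonterminal X'.
    """
    # Split the alternatives into left-recursive ones (minus the leading X)
    # and the rest.
    recursive = [alt[1:] for alt in alternatives if alt[:1] == [nonterminal]]
    others = [alt for alt in alternatives if alt[:1] != [nonterminal]]
    if not recursive:
        return {nonterminal: alternatives}
    fresh = nonterminal + "'"  # assumed not to clash with an existing name
    return {
        nonterminal: [alt + [fresh] for alt in others],
        fresh: [alt + [fresh] for alt in recursive] + [[]],  # [] is the empty string
    }

# <S> -> <N> | <S> <N> becomes S -> N S' and S' -> N S' | (empty)
print(remove_immediate_left_recursion("S", [["N"], ["S", "N"]]))
```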
Another Grammar cont'd
  <S> → <N>
  <S> → <S> <N>
  <N> → ɛ
  <N> → ( <S> )
Observation: the first two rules imply
  <S> ⇒ <N>
  <S> ⇒ <S> <N> ⇒ <N> <N>
  <S> ⇒ <S> <N> ⇒ <S> <N> <N> ⇒ <N> <N> <N>
i.e. <S> can roll out a sequence of one or more <N>s, depending on the number of applications of Rule 2. This is a standard CFG idiom to specify repetition.

Some More Derivations
  <S> → <N>
  <S> → <S> <N>
  <N> → ɛ
  <N> → ( <S> )
Some Derivations
  <S> ⇒ <N> ⇒ ɛ
  <S> ⇒ <N> ⇒ ( <S> ) ⇒ ( <N> ) ⇒ ()
  <S> ⇒ <N> ⇒ ( <S> ) ⇒ ( <N> ) ⇒ (( <S> )) ⇒ (( <N> )) ⇒ (())

Some More Derivations
  <S> → <N>
  <S> → <S> <N>
  <N> → ɛ
  <N> → ( <S> )
More Derivations
  <S> ⇒ <S> <N> ⇒ <N> <N> ⇒ ( <S> ) <N> ⇒ ( <N> ) <N> ⇒ () <N> ⇒ () ( <S> ) ⇒ ()( <N> ) ⇒ ()()
Upshot: the grammar captures the set of balanced parentheses, as found in validly formatted arithmetic expressions.

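For comparison, balanced parentheses can also be recognised without any grammar machinery, using a simple depth counter. This is a different technique from derivation (it is not taken from the lecture) and is shown only as a quick way to test strings against the language; the function name `is_balanced` is hypothetical.

```python
def is_balanced(text):
    """Return True if text is a balanced string over the alphabet ( and )."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ) with no matching ( before it
        else:
            return False  # symbol outside the alphabet
    return depth == 0  # every ( must have been closed

for sample in ["", "()", "(())", "()()", ")(", "(()"]:
    print(repr(sample), is_balanced(sample))
```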
Parse Trees
  <S> → <N>
  <S> → <S> <N>
  <N> → ɛ
  <N> → ( <S> )
Sentence/Source: ()()()
Parse tree: a tree representation of a derivation
- start symbol at the root
- terminals at the leaves
- each non-leaf reflects a production
- inorder (left-to-right) traversal of the leaves only yields the sentence

Parse Trees
[Figure: parse tree for the sentence ()()() under the grammar <S> → <N> | <S> <N>, <N> → ɛ | ( <S> )]

Parse Trees cont'd
- The tree representation encodes the connection between source and grammar
- Compilers often use such trees to model the detailed structure of the source, for example to drive code generation

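A parse tree is straightforward to model as a data structure. The sketch below is an assumption-laden illustration: the `Node` class, the use of an empty string for an ɛ leaf and the helper `empty_pair` are our own choices. It builds by hand the tree for ()() under the parentheses grammar and recovers the sentence by reading the leaves from left to right.

```python
class Node:
    """A parse-tree node: a grammar symbol plus its children, left to right."""

    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []

def leaves(node):
    """Collect the leaf symbols in left-to-right order."""
    if not node.children:
        return [node.symbol]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result

def empty_pair():
    """Subtree for N -> ( S ), with the inner S -> N and N -> (empty)."""
    inner = Node("S", [Node("N", [Node("")])])  # "" marks the empty-string leaf
    return Node("N", [Node("("), inner, Node(")")])

# S -> S N at the root, the left S -> N, and each N expands to ()
tree = Node("S", [Node("S", [empty_pair()]), empty_pair()])
print("".join(leaves(tree)))
```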
Notational Note
Productions sharing the same LHS can be combined using the symbol | (read "or"). So
  X → α
  X → β
  X → γ
can be abbreviated to
  X → α | β | γ

CFGs and Programming Language Syntax
Grammar for Simple Arithmetic Expressions
  <expr> → <expr> + <term> | <expr> - <term> | <term>
  <term> → <term> * <factor> | <term> / <factor> | <factor>
  <factor> → NUM | ( <expr> )
Terminal NUM stands for a number (i.e. a sequence of digits).
- CFGs can be used to specify the syntax of arithmetic expressions and of most programming languages
- CFG-based tools allow us to generate automatically a parser capable of recognizing expressions

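To make the grammar concrete, here is a hand-written recognizer for it. It is a sketch under stated assumptions, not the output of any CFG-based tool: the input is assumed to be already tokenized into the strings "NUM", "+", "-", "*", "/", "(" and ")", the function names are our own, and the left-recursive rules are recast as loops, since a naive recursive-descent routine for <expr> → <expr> + <term> would call itself forever.

```python
def accepts(tokens):
    """Return True if the token list forms a valid <expr>."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        # <factor> -> NUM | ( <expr> )
        if peek() == "NUM":
            advance()
            return True
        if peek() == "(":
            advance()
            if expr() and peek() == ")":
                advance()
                return True
        return False

    def term():
        # <term> -> <factor> followed by zero or more (* or /) <factor>
        if not factor():
            return False
        while peek() in ("*", "/"):
            advance()
            if not factor():
                return False
        return True

    def expr():
        # <expr> -> <term> followed by zero or more (+ or -) <term>
        if not term():
            return False
        while peek() in ("+", "-"):
            advance()
            if not term():
                return False
        return True

    # The whole input must be consumed for the expression to be valid.
    return expr() and pos == len(tokens)

print(accepts("NUM + NUM * NUM".split()))
print(accepts(["NUM", "NUM"]))
```

The loops accept the same strings as the left-recursive rules; a parser that also built trees would need extra care to keep the left-associative grouping.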
CFGs and Programming Language Syntax
Some Examples of Valid Expressions
  1. NUM
  2. NUM * NUM
  3. NUM + NUM
  4. NUM + NUM * NUM
  5. NUM * (NUM + NUM)

CFGs and Programming Language Syntax
Examples 1-4: Expressions and Parse Trees
  <expr> → <expr> + <term> | <expr> - <term> | <term>
  <term> → <term> * <factor> | <term> / <factor> | <factor>
  <factor> → NUM | ( <expr> )
Example 1. Expression: NUM
Example 2. Expression: NUM * NUM
Example 3. Expression: NUM + NUM
Example 4. Expression: NUM + NUM * NUM
[Figure: parse tree for each expression]

CYK Algorithm
Parsing Algorithm
  <expr> → <expr> + <term> | <expr> - <term> | <term>
  <term> → <term> * <factor> | <term> / <factor> | <factor>
  <factor> → NUM | ( <expr> )
For a CFG G and string s, how do we determine whether s ∈ L(G)?
Could try enumerating all possible derivations but TGBABW...

CYK Algorithm ²
  for i ← 1 to n do
      V[i, 1] ← {A | A → a is a production and the ith symbol of x is a}
  for j ← 2 to n do
      for i ← 1 to n - j + 1 do
          V[i, j] ← {}
          for k ← 1 to j - 1 do
              V[i, j] ← V[i, j] ∪ {A | A → BC is a production, B is in V[i, k] and C is in V[i+k, j-k]}
Computes (in V[i, j]) the set of nonterminals <X> for which a derivation <X> ⇒* x_i x_{i+1} ... x_{i+j-1} exists, where x_i x_{i+1} ... x_{i+j-1} denotes the substring of the source x beginning at x_i and of length j; n is the length of x.
² See J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979 (pp. 139-141).

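The pseudocode above translates almost line for line into Python. The sketch below keeps the 1-based indices of the pseudocode by storing the table V in a dictionary keyed by (i, j). The function name `cyk`, the tuple encoding of productions and the small grammar for {a^n b^n : n ≥ 1} are assumptions made for the example; the grammar must consist only of productions A → a and A → BC.

```python
def cyk(word, unit_rules, pair_rules, start="S"):
    """Return True if `word` is derivable from `start`.

    unit_rules: set of (A, a) pairs, one for each production A -> a
    pair_rules: set of (A, B, C) triples, one for each production A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # productions of these two forms cannot derive the empty string
    V = {}
    for i in range(1, n + 1):
        V[i, 1] = {A for (A, a) in unit_rules if a == word[i - 1]}
    for j in range(2, n + 1):              # substring length
        for i in range(1, n - j + 2):      # start position, 1 to n - j + 1
            V[i, j] = set()
            for k in range(1, j):          # length of the left part
                V[i, j] |= {A for (A, B, C) in pair_rules
                            if B in V[i, k] and C in V[i + k, j - k]}
    return start in V[1, n]

# Assumed example grammar: S -> A B | A X, X -> S B, A -> a, B -> b
UNIT = {("A", "a"), ("B", "b")}
PAIR = {("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")}

for w in ["ab", "aabb", "aab", "abab"]:
    print(w, cyk(w, UNIT, PAIR))
```

The three nested loops give a running time proportional to n^3 times the number of productions.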
Chomsky Normal Form
Chomsky Normal Form (CNF): any grammar without ɛ can be recast to use only productions of the form
  A → B C
  A → a
where A, B, C are nonterminals and a is a terminal.
The transformation is reasonably straightforward, but is not discussed here.

CYK Algorithm
- Determines, for any CNF grammar G and string s, whether s ∈ L(G)
- (Can be modified to produce a derivation/parse tree)
- (Dynamic Programming!)