Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018


Lecture 11, Ana Bove, April 26th 2018

Recap: Regular Languages

Decision properties of regular languages:
- Is it empty? Does it contain this word?
- Does it contain ε? Does it contain at most ε? Is it infinite?
- Equivalence of languages using the table-filling algorithm;
- Minimisation of a DFA using the table-filling algorithm.

April 26th 2018, Lecture 11, TMV027/DIT321 1/30

Overview of Today's Lecture

Context-free grammars; derivations; parse trees; proofs about grammars.

Contributes to the following learning outcomes:
- Explain and manipulate the different concepts in automata theory and formal languages;
- Prove properties of languages, grammars and automata with rigorous formal mathematical methods;
- Design automata, regular expressions and context-free grammars accepting or generating a certain language;
- Describe the language accepted by an automaton or generated by a regular expression or a context-free grammar;
- Determine if a certain word belongs to a language;
- Differentiate and manipulate formal descriptions of languages, automata and grammars.

Context-free Grammars

We have seen that not all languages are regular. Context-free grammars (CFG) define the so-called context-free languages. They were developed in the mid-1950s by Noam Chomsky.

We will see (next lecture) that regular languages ⊊ context-free languages.

CFG play an important role in the description and design of programming languages and compilers, for example, for the implementation of parsers.

Example: Palindromes

Let us consider the language P of palindromes over Σ = {0, 1}, that is, P = {w ∈ Σ* | w = rev(w)}.
Examples of palindromes over Σ are ε, 0110, 00011000 and 010.
We can use the pumping lemma for regular languages to show that P is not regular.
How can P be defined? Consider the inductive definition of a (very similar) language L:
- ε, 0 and 1 are in L;
- if w ∈ L then 0w0 and 1w1 are also in L.

Example: CFG for Palindromes

P → ε
P → 0
P → 1
P → 0P0
P → 1P1

The variable P represents the set of strings that are palindromes. The rules explain how to construct the strings in the language.
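The five productions above translate directly into a membership test. The following is a minimal sketch of our own (not part of the course material): the base productions become base cases, and the two recursive productions become a recursive call on the inner string.

```python
def in_P(w: str) -> bool:
    """Decide w ∈ L(P) for the grammar P → ε | 0 | 1 | 0P0 | 1P1."""
    if w in ("", "0", "1"):            # base productions P → ε, P → 0, P → 1
        return True
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return in_P(w[1:-1])           # recursive productions P → 0P0, P → 1P1
    return False
```

For instance, in_P("0110") and in_P("00011000") hold, while in_P("01") does not.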

Example: CFG for Simple Expressions

We can define several sets in a grammar.
Example: Here we define 2 sets: the simple numerical expressions (denoted by E) and the Boolean expressions (denoted by B). Note that they depend on each other!

E → 0
E → 1
E → E + E
E → if B then E else E
B → True
B → False
B → E < E
B → E == E

Compact Notation

We can use a more compact notation for the productions defining elements of a certain set.
Example: Palindromes can be defined as
P → ε | 0 | 1 | 0P0 | 1P1
Example: The expressions can be defined as
E → 0 | 1 | E + E | if B then E else E
B → True | False | E < E | E == E

Context-Free Grammars

Definition: A context-free grammar is a 4-tuple G = (V, T, R, S) where:
- V is a finite set of variables or non-terminals: each variable represents a language or set of strings;
- T is a finite set of terminals: think of T as the alphabet of the language we are defining;
- R is a finite set of rules or productions which recursively define the language. Each production consists of: a variable being defined in the production; the symbol →; and a string of 0 or more terminals and variables called the body of the production;
- S is the start variable and represents the language we are defining; the other variables define the auxiliary classes needed to define our language.

Example: C++ Compound Statements

A context-free grammar for statements:
({S, LC, C, E, Id, LLoD, L, D}, {a, ..., A, ..., 0, ..., (, ), ...}, R, S)
with R as follows:
S → { LC }
LC → ε | C LC
C → S | if (E) C | if (E) C else C | while (E) C | do C while (E); | for (C E; E) C | case E : C | switch (E) C | return E; | goto Id; | break; | continue;
E → ...
Id → L LLoD
LLoD → L LLoD | D LLoD | ε
L → A | B | ... | Z | a | b | ... | z
D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
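The 4-tuple can be mirrored one-to-one in code. This is only a sketch of one possible representation (the names CFG, rules, etc. are our own, not from the slides); the palindrome grammar of slide 5 then reads:

```python
from typing import NamedTuple

class CFG(NamedTuple):
    variables: frozenset   # V, the non-terminals
    terminals: frozenset   # T, the alphabet of the language being defined
    rules: dict            # R, mapping each variable to a list of bodies
    start: str             # S, the start variable

# The palindrome grammar: each production body is a tuple of terminals
# and variables; the empty tuple encodes the ε-production.
palindromes = CFG(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    rules={"P": [(), ("0",), ("1",), ("0", "P", "0"), ("1", "P", "1")]},
    start="P",
)
```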

Notation for Context-Free Grammars

We use the following conventions when working with CFG:
- a, b, c, ..., 0, 1, ..., (, ), +, %, ... are terminal symbols;
- A, B, C, ... are variables (non-terminals);
- w, x, y, z, ... are strings of terminals;
- X, Y, ... are either a terminal or a variable;
- α, β, γ, ... are strings of terminals and/or variables; in particular, they can also represent strings containing only variables.

Working with Grammars

We use the productions of a CFG to infer that a string w is in the language of a given variable.
Example: Let us see if 0110110 is a palindrome. We can do this in 2 (equivalent) ways:

Recursive inference: from body to head. (Similar to how we would construct the element in the inductive set.)
1) 0 is a palindrome by rule P → 0
2) 101 is a palindrome using 1) and rule P → 1P1
3) 11011 is a palindrome using 2) and rule P → 1P1
4) 0110110 is a palindrome using 3) and rule P → 0P0

Derivation (notation ⇒): from head to body, using the same rules in the opposite order.
P ⇒ 0P0 ⇒ 01P10 ⇒ 011P110 ⇒ 0110110
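The derivation P ⇒ 0P0 ⇒ 01P10 ⇒ 011P110 ⇒ 0110110 can be replayed mechanically. The helper below is our own illustration (sentential forms are represented as lists of symbols); it performs one derivation step at a time:

```python
def step(form, var, body):
    """One derivation step: replace the first occurrence of `var` in the
    sentential form `form` by the production body `body`."""
    i = form.index(var)
    return form[:i] + list(body) + form[i + 1:]

form = ["P"]
# Apply P → 0P0, then P → 1P1 twice, then P → 0, as in the slide.
for body in (["0", "P", "0"], ["1", "P", "1"], ["1", "P", "1"], ["0"]):
    form = step(form, "P", body)

print("".join(form))  # prints 0110110
```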

Formal Definition of a Derivation

Let G = (V, T, R, S) be a CFG.
Definition: Let A ∈ V and α, β ∈ (V ∪ T)*. Let A → γ ∈ R. Then αAβ ⇒ αγβ is one derivation step (alternative notation: αAβ ⇒_G αγβ).
Example: B ⇒ E == E ⇒ 0 == E ⇒ 0 == E + E ⇒ 0 == E + 1 ⇒ 0 == 0 + 1

Reflexive-transitive Closure of a Derivation

We can define a relation performing zero or more derivation steps.
Definition: The reflexive-transitive closure of ⇒ is the relation ⇒* (alternative notation: ⇒*_G) defined as follows:
- α ⇒* α;
- if α ⇒* β and β ⇒ γ then α ⇒* γ.
Example: B ⇒* B, B ⇒* 0 + E < E, B ⇒* 0 == 0 + 1.
Note: We write A ⇒ⁿ α when α is derived from A in n steps.
Example: B ⇒⁰ B.

Leftmost and Rightmost Derivations

At every derivation step we can choose to replace any variable by the right-hand side of one of its productions. Two particular derivations are:

Leftmost derivation (notations: ⇒_lm and ⇒*_lm): at each step we replace the leftmost variable.
Example: B ⇒*_lm 0 == 0 + 1 since
B ⇒_lm E == E ⇒_lm 0 == E ⇒_lm 0 == E + E ⇒_lm 0 == 0 + E ⇒_lm 0 == 0 + 1

Rightmost derivation (notations: ⇒_rm and ⇒*_rm): at each step we replace the rightmost variable.
Example: B ⇒*_rm 0 == 0 + 1 since
B ⇒_rm E == E ⇒_rm E == E + E ⇒_rm E == E + 1 ⇒_rm E == 0 + 1 ⇒_rm 0 == 0 + 1

Sentential Forms

Derivations from the start variable play a special role.
Definition: Let G = (V, T, R, S) be a CFG and let α ∈ (V ∪ T)*. α is called a sentential form if S ⇒* α. We say left sentential form if S ⇒*_lm α, and right sentential form if S ⇒*_rm α.
Example: 010P010 is a sentential form since P ⇒ 0P0 ⇒ 01P10 ⇒ 010P010
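A leftmost derivation is easy to mechanise: always expand the first variable from the left. The sketch below is our own, using a pared-down fragment of the expression grammar; it replays the leftmost derivation of 0 == 0 + 1 from the slide:

```python
RULES = {"E": [("0",), ("1",), ("E", "+", "E")],
         "B": [("E", "==", "E")]}  # fragment of the expression grammar

def leftmost_step(form, body):
    """Expand the leftmost variable of the sentential form with `body`."""
    i = next(k for k, sym in enumerate(form) if sym in RULES)
    return form[:i] + list(body) + form[i + 1:]

form = ["B"]
# The production bodies chosen at each step of the leftmost derivation.
for body in (("E", "==", "E"), ("0",), ("E", "+", "E"), ("0",), ("1",)):
    form = leftmost_step(form, body)
# form is now ["0", "==", "0", "+", "1"]
```

A rightmost derivation is obtained the same way by scanning for the last variable instead of the first.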

Language of a Grammar

Definition: Let G = (V, T, R, S) be a CFG. The language of G, denoted L(G), is the set of terminal strings that can be derived from the start variable:
L(G) = {w ∈ T* | S ⇒*_G w}
Definition: If L is the language of some context-free grammar, then L is said to be a context-free language.

Examples of Context-Free Grammars

Example: Construct a CFG generating {0^i 1^j | j ≥ i ≥ 0}.
S → ε | 0S1 | S1
Example: Construct a CFG generating {w ∈ {0, 1}* | #0(w) = #1(w)}.
S → ε | 0S1 | 1S0 | SS
Example: Construct a grammar for the language {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}.
S → AB | C
A → aAb | ab
B → cBd | cd
C → aCd | aDd
D → bDc | bc
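One can sanity-check such a grammar by exhaustively expanding sentential forms. The sketch below is our own; it relies on the observation that these productions never erase terminals, so any form already carrying too many terminals can be pruned. It enumerates all words of the first grammar up to a given length and compares them with {0^i 1^j | j ≥ i ≥ 0}:

```python
from collections import deque

def words_upto(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`,
    found by breadth-first expansion of the leftmost variable."""
    out, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        i = next((k for k, s in enumerate(form) if s in rules), None)
        if i is None:                 # no variables left: a terminal string
            out.add("".join(form))
            continue
        for body in rules[form[i]]:
            new = form[:i] + body + form[i + 1:]
            # Terminals are never erased by these productions, so a form with
            # more than max_len terminals cannot derive a short enough word.
            if sum(s not in rules for s in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return out

rules = {"S": [(), ("0", "S", "1"), ("S", "1")]}
got = words_upto(rules, "S", 4)
want = {"0" * i + "1" * j for i in range(5) for j in range(5) if j >= i and i + j <= 4}
```

Here `got == want` holds, which is evidence (not a proof!) that the grammar generates the intended language.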

Observations on Derivations

Observe: If we have A ⇒* γ then we also have αAβ ⇒* αγβ: the same sequence of steps that took us from A to γ will also take us from αAβ to αγβ.
Example: We have E ⇒* 0 + 1 since E ⇒ E + E ⇒ 0 + E ⇒ 0 + 1. This same derivation justifies E == E ⇒* E == 0 + 1 since E == E ⇒ E == E + E ⇒ E == 0 + E ⇒ E == 0 + 1.

Observations on Derivations

Observe: If we have A ⇒ X1 X2 ... Xk ⇒* w, then we can break w into pieces w1, w2, ..., wk such that Xi ⇒* wi. If Xi is a terminal then Xi = wi.
This can be shown by proving (by induction on the length of the derivation) that if X1 X2 ... Xk ⇒* α then all positions of α that come from the expansion of Xi are to the left of all positions that come from the expansion of Xj whenever i < j.
To obtain Xi ⇒* wi from A ⇒* w we remove all positions of the sentential form (see next slide) that are to the left and to the right of the positions derived from Xi, and all steps not relevant for the derivation of wi from Xi.
Example: Let B ⇒ E == E ⇒ E == E + E ⇒ E == E + 0 ⇒ E == E + E + 0 ⇒ E == 0 + E + 0 ⇒ 1 == 0 + E + 0 ⇒ 1 == 0 + 1 + 0. The derivation from the middle E in the sentential form E == E + E is E ⇒ E + E ⇒ 0 + E ⇒ 0 + 1.

Parse Trees

Parse trees are a way to represent derivations. A parse tree reflects the internal structure of a word of the language.
Using parse trees it becomes very clear which variable was replaced at each step. In addition, it becomes clear which terminal symbols were generated/derived from a particular variable.
Parse trees are very important in compiler theory: in a compiler, a parser turns the source code into an abstract syntax tree, which is an abstract version of the parse tree. This tree represents the structure of the program.

Parse Trees

Definition: Let G = (V, T, R, S) be a CFG. The parse trees for G are trees satisfying the following conditions:
- Interior nodes are labelled with a variable;
- Leaves are either variables, terminals or ε;
- If a node is labelled A and its children are labelled X1, X2, ..., Xn respectively from left to right, then there must exist in R a production of the form A → X1 X2 ... Xn.
Note: If a leaf is ε it should be the only child of its parent A, and there should be a production A → ε.
Note: Of particular importance are the parse trees with root S.
Exercise: Construct the parse trees for 0 == E + 1 and for 001100.

Height of a Parse Tree

Definition: The height of a parse tree is the maximum length of a path from the root of the tree to one of its leaves.
Observe: We count the edges along the path, not the number of nodes.
Example: The height of the parse tree for 0 == E + 1 is 3, and that of the parse tree for 001100 is 4.

Yield of a Parse Tree

Definition: The yield of a parse tree is the string resulting from concatenating all the leaves of the tree from left to right.
Observations:
- We will show that the yield of a tree is a string derived from the root of the tree;
- Of particular importance are the yields that consist of only terminals, that is, where the leaves are either terminals or ε;
- When, in addition, the root is S, we have a parse tree for a string in the language of the grammar;
- We will see that yields can be used to describe the language of a grammar.
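Both notions translate directly into code. Below is a minimal sketch of our own (the names `Node`, `height` and `yield_` are ours, not from the slides), shown on the parse tree of 010 built with P → 0P0 at the root and P → 1 below it:

```python
class Node:
    """A parse-tree node: a label, plus a list of children for interior
    nodes (children is None for a leaf)."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children

    def height(self):
        """Maximum number of edges from this node down to a leaf."""
        if not self.children:
            return 0
        return 1 + max(c.height() for c in self.children)

    def yield_(self):
        """Concatenate the leaves from left to right (ε contributes nothing)."""
        if self.children is None:
            return "" if self.label == "ε" else self.label
        return "".join(c.yield_() for c in self.children)

# Parse tree for 010: root uses P → 0P0, the inner P uses P → 1.
tree = Node("P", [Node("0"), Node("P", [Node("1")]), Node("0")])
```

Here `tree.height()` is 2 (edges, not nodes) and `tree.yield_()` is "010", matching the definitions above.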

Context-Free Languages and Inductive Definitions

For each CFG there is a corresponding inductive definition (see slide 4).
Example: Consider the grammar for palindromes on slide 5 (and 7). Now consider the following inductive definition:
Base cases: the empty string is a palindrome; 0 and 1 are palindromes.
Inductive steps: if w is a palindrome, then so are 0w0 and 1w1.
A natural way to do proofs on context-free languages defined by an inductive set is to follow this inductive structure. How do we prove properties when the language is defined with a grammar?

Proofs About a Grammar

When we want to prove something about a grammar we usually need to prove a statement for each of the variables in the grammar. See the notes on induction for an example! (Compare this with proofs about FA, where we needed a statement for each state.)
Proofs about grammars are in general done by:
- (course-of-values) induction on the length of a certain string of the language;
- (course-of-values) induction on the length (number of steps) of a derivation of a certain string;
- induction on the structure of the strings/words in the language (recall that words are either empty or not empty!);
- (course-of-values) induction on the height of the parse tree.

Example: Proof About a Grammar

Exercise: Consider the grammar S → ε | 0S1 | 1S0 | SS. Prove that for all w, if S ⇒* w then #0(w) = #1(w).
Proof: Our P(n) is: for all w, if S ⇒ⁿ w then #0(w) = #1(w). We proceed by course-of-values induction on the length n of the derivation.
Base case: If the length is 1 then we have S ⇒ ε, and #0(ε) = #1(ε) holds.
Inductive step: Our IH is: for all i with 1 ≤ i ≤ n, and all w', if S ⇒ⁱ w' then #0(w') = #1(w'). Let S ⇒ⁿ⁺¹ w with n > 0. Then we have 3 cases:
- S ⇒ 0S1 ⇒ⁿ w: here w = 0w₁1 with S ⇒ⁿ w₁. By IH #0(w₁) = #1(w₁), hence #0(w) = #0(0w₁1) = 1 + #0(w₁) = 1 + #1(w₁) = #1(w);
- S ⇒ 1S0 ⇒ⁿ 1w₁0: similar;
- S ⇒ SS ⇒ⁿ w: here w = w₁w₂ with S ⇒ⁱ w₁ and S ⇒ʲ w₂ for i, j < n. By IH #0(w₁) = #1(w₁) and #0(w₂) = #1(w₂). Hence #0(w) = #0(w₁) + #0(w₂) = #1(w₁) + #1(w₂) = #1(w).

Example: Proof About a Grammar

Lemma: Let G be the grammar presented on slide 5. Then L(G) is the set of palindromes over {0, 1}.
Proof: We prove that for all w ∈ {0, 1}*, w ∈ L(G) iff w = rev(w).
If) Our P(n) is: for all w, if w is a palindrome and |w| = n then w ∈ L(G), that is, P ⇒* w. We proceed by course-of-values induction on n (that is, on |w|).
Base cases: If |w| = 0 or |w| = 1 then w is ε, 0 or 1. We have the productions P → ε, P → 0 and P → 1, so P ⇒ w and w ∈ L(G).
Inductive step: Assume w with |w| > 1. Our IH is that the property holds for any w' with |w'| < |w|. Since w = rev(w), we have w = 0w'0 or w = 1w'1 with w' = rev(w'). As |w'| < |w|, by IH P ⇒* w'. If w = 0w'0 we have P ⇒ 0P0 ⇒* 0w'0 = w, so w ∈ L(G). Similarly if w = 1w'1.
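The claim of the exercise can also be tested empirically before proving it: generate random words from S → ε | 0S1 | 1S0 | SS and count symbols. The sketch below is our own (the depth bound merely forces the random expansion to terminate); it is no substitute for the induction, but it catches typos in a grammar quickly:

```python
import random

BODIES = [(), ("0", "S", "1"), ("1", "S", "0"), ("S", "S")]

def sample(depth=0):
    """Expand S randomly; beyond depth 8, force the ε-production so the
    recursion terminates."""
    body = BODIES[0] if depth > 8 else random.choice(BODIES)
    return "".join(sample(depth + 1) if s == "S" else s for s in body)

random.seed(0)
words = [sample() for _ in range(500)]
# Every generated word has as many 0s as 1s, as the lemma predicts.
```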

Example: Proof About a Grammar (Cont.)

Only-if) We prove that for all w, if w ∈ L(G) then w is a palindrome. Our P(n) is: for all w, if P ⇒ⁿ w then w = rev(w). We proceed by mathematical induction on n (the length of the derivation of w).
Base case: If the derivation has one step then it must use P → ε, P → 0 or P → 1. In all cases we have w = rev(w).
Inductive step: Our IH is: for all w', if P ⇒ⁿ w' with n > 0 then w' = rev(w'). Assume P ⇒ⁿ⁺¹ w. Then we have 2 cases:
- P ⇒ 0P0 ⇒ⁿ 0w'0 = w;
- P ⇒ 1P1 ⇒ⁿ 1w'1 = w.
Observe that in both cases P ⇒ⁿ w' with n > 0. Hence by IH w' = rev(w'), so w = rev(w).

Overview of Next Lecture

Sections 5.2.3–5.2.6, 5.4:
- Inference, derivations and parse trees;
- Ambiguity in grammars;
- More about proofs on grammars and languages.

Overview of Next Week

Mon 30 Tue 1 Wed 2 Thu 3 Fri 4 | 15-17 EL41 Consultation | Lec 13-15 HB3 CFG.