NFAs and Myhill-Nerode. CS154 Chris Pollett Feb. 22, 2006.

Similar documents
CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Regular Languages and Regular Expressions

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

lec3:nondeterministic finite state automata

Converting a DFA to a Regular Expression JP

CS154. Streaming Algorithms and Communication Complexity

CS 181 B&C EXAM #1 NAME. You have 90 minutes to complete this exam. You may assume without proof any statement proved in class.

CSE 105 THEORY OF COMPUTATION

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

ECS 120 Lesson 16 Turing Machines, Pt. 2

CMPSCI 250: Introduction to Computation. Lecture 20: Deterministic and Nondeterministic Finite Automata David Mix Barrington 16 April 2013

CS154 Midterm Examination. May 4, 2010, 2:15-3:30PM

(Refer Slide Time: 0:19)

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Finite Automata Part Three

Lexical Analysis. Implementation: Finite Automata

Decision Properties of RLs & Automaton Minimization

CMPE 4003 Formal Languages & Automata Theory Project Due May 15, 2010.

Implementation of Lexical Analysis

CS 125 Section #10 Midterm 2 Review 11/5/14

Theory of Computation, Homework 3 Sample Solution

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

CS103 Handout 35 Spring 2017 May 19, 2017 Problem Set 7

Closure Properties of CFLs; Introducing TMs. CS154 Chris Pollett Apr 9, 2007.

Languages and Compilers

Non-deterministic Finite Automata (NFA)

Solutions to Homework 10

Automating Construction of Lexers

I have read and understand all of the instructions below, and I will obey the Academic Honor Code.

2. Lexical Analysis! Prof. O. Nierstrasz!

14.1 Encoding for different models of computation

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Regular Expression Constrained Sequence Alignment

More Closure Properties, Algorithms, and the Pumping Lemma. CS154 Chris Pollett Feb 21, 2007.

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Lexical Analysis. Chapter 2

Midterm I review. Reading: Chapters 1-4

Lexical Analysis. Lecture 2-4

Ambiguous Grammars and Compactification

CS 310: State Transition Diagrams

Finite Automata Part Three

Implementation of Lexical Analysis

Implementation of Lexical Analysis

Chapter 1 AUTOMATA & LANGUAGES

Finite Automata Part Three

Chapter Seven: Regular Expressions

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

6 NFA and Regular Expressions

CS103 Handout 14 Winter February 8, 2013 Problem Set 5

Lexical Analysis. Lecture 3-4

CSE 431S Scanning. Washington University Spring 2013

We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

CIT3130: Theory of Computation. Regular languages

Introduction to Lexical Analysis

CMSC 350: COMPILER DESIGN

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CSE Theory of Computing Spring 2018 Project 2-Finite Automata

CS52 - Assignment 10

Lexical Analysis - 2

Chapter 1 AUTOMATA & LANGUAGES. Roger Wattenhofer. ETH Zurich Distributed Computing

CSE Theory of Computing Spring 2018 Project 2-Finite Automata

Lexical Analysis. Prof. James L. Frankel Harvard University

Lecture 2 Finite Automata

Formal Languages and Automata

CS2 Practical 2 CS2Ah

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

Theory of Computation

Outline. Language Hierarchy

Languages and Models of Computation

Computer Sciences Department

Actually talking about Turing machines this time

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

Final Course Review. Reading: Chapters 1-9

Limits of Computation p.1/?? Limits of Computation p.2/??

Variants of Turing Machines

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

CS2 Language Processing note 3

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2

Front End: Lexical Analysis. The Structure of a Compiler

where is the transition function, Pushdown Automata is the set of accept states. is the start state, and is the set of states, is the input alphabet,

Computer Science 236 Fall Nov. 11, 2010

Lecture 9 CIS 341: COMPILERS

CSE 105 THEORY OF COMPUTATION

FAdo: Interactive Tools for Learning Formal Computational Models


CS103 Handout 39 Winter 2018 February 23, 2018 Problem Set 7

Implementation of Lexical Analysis. Lecture 4

CS308 Compiler Principles Lexical Analyzer Li Jiang

Formal Definition of Computation. Formal Definition of Computation p.1/28

Yet More CFLs; Turing Machines. CS154 Chris Pollett Mar 8, 2006.

Learn Smart and Grow with world

Theory of Computations Spring 2016 Practice Final Exam Solutions

Transcription:

NFAs and Myhill-Nerode CS154 Chris Pollett Feb. 22, 2006.

Outline Bonus Questions Equivalence with Finite Automata Myhill-Nerode Theorem.

Bonus Questions These questions are open to anybody. I will only accept solutions up to the class day after the second midterm. The points you receive from doing them can be added to your midterm score. You can do either problem or both. Your midterm score can be raised to a maximum of 15pts. To receive points on the bonus problems you must come see me during my office hours or after class and demo your code. I will ask you questions and review your code to determine how much your code is worth.

Bonus Question I (page1) Problem 1 (2pts). Write a regular expression to NFA program in Java called rex2nfa. This should be run from the command line with a line like: java rex2nfa regular_expression Here a regular_expression is built up out of: the lower case alphabet symbols: a,b,c,,z; 0 -empty set; E -empty string; (rex1.rex2) -- concatenation; (rex1 U rex2) --union; and (rex1)* -- star. The output should be a sequence of rows of the form state; symbol > state

Bonus Question I (page2) state - of the form A followed by some string of digits (for instance, A02345) or R followed by some string of digits (for example, R99). The former are supposed to accept state the latter are reject states. Symbol is supposed to be either an alphabet symbol or E for the empty string. The first state listed in the first row output is supposed to be the start state.

Bonus Question II (page1) Problem 2(3pts). Write an NFA to DFA program in Java called nfa2dfa. This should be run from the command line with a line like: java nfa2dfa filename string Here filename is the name of a file containing lines of NFA instructions in the format of Problem1; string is an string that the output DFA will then try to scan and either accept or reject. Your program should output a sequence of rows for the DFA, then skip a line and output accept or reject based on whether the original NFA would have accepted or rejected that string.

Bonus Question II (page2) The format of the output rows should be: seq_of_states1; symbol > seq_of_states2 By a sequence of states, I mean a comma separated list of states where states are in the format of Problem 1. Again, the first row should contain the start state of the DFA. I only want you to output rows which are reachable from this start state.

Proof that regular implies the language of some regular expression We will again split the proof into two parts: We first define a new kind of finite automata called a generalized nondeterministic finite automata (GNFA) and show how to convert any DFA into a GNFA. Then we show how to convert any GNFA into a regular expression. To begin we define a GNFA to be an NFA where we allow transition arrows to have any regular expression as labels: a * b The transition function δ now takes a pair of states q,r and outputs a regular expression, R. The intended meaning is in state q reading a substring of the input of form R we transition to state r.

Converting DFAs to GNFAs We will be interested in GNFAs that have the following special form: The start state has transition arrows to every other state but no arrows coming in from other states. There is a single accept state, and it has arrows coming in from every other state but no arrows going to any other state. Except for the start and accepts state, one arrow goes from from every state to every other state and also from each state to itself. To convert a DFA into a GNFA, we add a new start state with and ε arrow to the old start state and a new accept state with ε arrows from the old accept states. If any arrows have multiple labels (or if we have two or more arrows between the same two states) we replace each with a single label whose label is the union labels of the these arrows. Finally, we add arrows with labels between states which had no labels so as to satisfy the remaining conditions of our special form.

Converting GNFAs to Regular expressions Our conversion above gives a GNFA with k >= 2 states. If k > 2, we will construct an equivalent GNFA with k-1 states. To do this we pick some state q rip other than the start or accept state, and we will rip it out of the machine. To compensate for the loss of this state, for any pair of states q i, q j. in this new machine we replace δ(q i, q j ) with: δ (q i, q j ) =(R 1 )(R 2 )*(R 3 ) (R 4 ) where δ(q i, q rip ) = R 1 ; δ(q rip, q rip ) = R 2 ; δ(q rip, q j ) = R 3 ; δ(q i, q j ) = R 4 This machine will be equivalent to the old machine. Further, by repeatedly ripping out states in this fashion we can get down to the 2-state machine with just a regular expression on the single transition between these two states. This regular expression will be equivalent to the original NFA.

Making DFAs as Small as Possible We have given a process for going from regular expressions (a widely used language for pattern matching) to DFAs. We would now like to optimize out DFAs and make them as small as possible. The Myhill-Nerode Theorem allows us to do this.

Pairwise Distinguishability Definition: (1) We say two strings x,y are pairwise distinguishable by L, if some string z exists such that exactly one of xz or yz is in L. (2) We say two strings x,y are pairwise indistinguishable by L, if all strings z both of xz and yz is in L or both are not in L. Fact: Pairwise Indistinguishability is an equivalence relation on strings. Definition: Let the index of L be the number of equivalence classes of L with respect to pairwise indistinguishability.

Intuition Suppose we have a DFA for some regular language. We pick two states in this DFA, say q1 and q2 and we ask whether we could collapse them into one state or not. If it is the case that given any string z, that when we are state q1 when we begin reading z we do exactly the same as far as accepting/rejecting as we would do if we were in state q2 when we began reading a z, then we could collapse these states into one. Call two states q1 and q2 distinguishable if there is some string z which such that beginning in state q1 reading a z we reject; whereas, in state q2 reading a z we accept or vice-versa. Similarly, we can define state indistinguishability. This is also an equivalence relation with equivalence classes contained in those of language indistinguishability. So indistinguishability is measuring whether or not we can collapse states.

Myhill-Nerode Theorem (start) A language is regular iff it is of finite index. Proof: Give any DFA for a language L, state indistinguishability for this DFA will have more equivalence classes then language indistinguishability for L. So if the number of language indistinguishable equivalence classes is not finite, the DFA can t have a finite number of states giving a contradiction.