CMPSCI 250: Introduction to Computation. Lecture 20: Deterministic and Nondeterministic Finite Automata David Mix Barrington 16 April 2013

Similar documents
Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

CMPSCI 250: Introduction to Computation. Lecture #36: State Elimination David Mix Barrington 23 April 2014

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Lecture 2 Finite Automata

Definition: A context-free grammar (CFG) is a 4- tuple. variables = nonterminals, terminals, rules = productions,,

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014

Automating Construction of Lexers

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

CS2 Language Processing note 3

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

CSE 105 THEORY OF COMPUTATION

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

CS 125 Section #10 Midterm 2 Review 11/5/14

lec3:nondeterministic finite state automata

Chapter Seven: Regular Expressions

CMPSCI 250: Introduction to Computation. Lecture #1: Things, Sets and Strings David Mix Barrington 22 January 2014

ECS 120 Lesson 16 Turing Machines, Pt. 2

NFAs and Myhill-Nerode. CS154 Chris Pollett Feb. 22, 2006.

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

Regular Languages and Regular Expressions

CS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm

CMPSCI 250: Introduction to Computation. Lecture #30: Properties of the Regular Languages David Mix Barrington 7 April 2014

Regular Expression Constrained Sequence Alignment

Formal Languages and Automata

1. Draw the state graphs for the finite automata which accept sets of strings composed of zeros and ones which:

(Refer Slide Time: 0:19)

CMPSCI 250: Introduction to Computation. Lecture #30: Proving Properties of the Regular Languages David Mix Barrington 9 April 2012

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

CS402 - Theory of Automata Glossary By

CIT3130: Theory of Computation. Regular languages

ECS 120 Lesson 7 Regular Expressions, Pt. 1


CS52 - Assignment 10

CSE 431S Scanning. Washington University Spring 2013

CS 181 EXAM #1 NAME. You have 90 minutes to complete this exam. You may state without proof any fact taught in class or assigned as homework.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

1 Parsing (25 pts, 5 each)

Theory Bridge Exam Example Questions Version of June 6, 2008

Solutions to Homework 10

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

CT32 COMPUTER NETWORKS DEC 2015

CS 310: State Transition Diagrams

JNTUWORLD. Code No: R

14.1 Encoding for different models of computation

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CMPSCI 250: Introduction to Computation. Lecture #14: Induction and Recursion (Still More Induction) David Mix Barrington 14 March 2013

Definition 2.8: A CFG is in Chomsky normal form if every rule. only appear on the left-hand side, we allow the rule S ǫ.

Learn Smart and Grow with world

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Ambiguous Grammars and Compactification

6 NFA and Regular Expressions

Midterm I (Solutions) CS164, Spring 2002

Homework 1 Due Tuesday, January 30, 2018 at 8pm

Converting a DFA to a Regular Expression JP

1. Which of the following regular expressions over {0, 1} denotes the set of all strings not containing 100 as a sub-string?

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

UNIT I PART A PART B

CSE 105 THEORY OF COMPUTATION

Theory of Computation

Material from Recitation 1

CS164: Midterm I. Fall 2003

I have read and understand all of the instructions below, and I will obey the Academic Honor Code.

CSE 105 THEORY OF COMPUTATION

Introduction to Computer Architecture

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CMPSCI 187: Programming With Data Structures. Lecture 6: The StringLog ADT David Mix Barrington 17 September 2012

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Front End: Lexical Analysis. The Structure of a Compiler

Multiple Choice Questions

Formal Definition of Computation. Formal Definition of Computation p.1/28

Implementation of Lexical Analysis

CMPSCI 250: Introduction to Computation. Lecture #22: Graphs, Paths, and Trees David Mix Barrington 12 March 2014

Languages and Compilers

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Compiler Construction

Exercises Computational Complexity

Chapter 3: Lexing and Parsing

Implementation of Lexical Analysis. Lecture 4

Recursively Enumerable Languages, Turing Machines, and Decidability

Languages and Finite Automata

Lexical Error Recovery

CSC-461 Exam #2 April 16, 2014

Finite Automata Part Three

Decidable Problems. We examine the problems for which there is an algorithm.

CMPSCI 250: Introduction to Computation. Lecture #7: Quantifiers and Languages 6 February 2012

Actually talking about Turing machines this time

Name: Finite Automata

CS103 Handout 14 Winter February 8, 2013 Problem Set 5

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Compiler Construction

Turing Machines. A transducer is a finite state machine (FST) whose output is a string and not just accept or reject.

Finite Automata Part Three

To illustrate what is intended the following are three write ups by students. Diagonalization

COMP Logic for Computer Scientists. Lecture 25

CPSC 121 Some Sample Questions for the Final Exam Tuesday, April 15, 2014, 8:30AM

Transcription:

CMPSCI 250: Introduction to Computation Lecture 20: Deterministic and Nondeterministic Finite Automata David Mix Barrington 16 April 2013

Deterministic and Nondeterministic Finite Automata Deterministic Finite Automata Formal Definition of DFA s Examples of DFA s Characterizing the Strings For Each State Nondeterministic Finite Automata Interpretations of Nondeterminism The Model of λ-nfa s

Deterministic Finite Automata We now turn to finite-state machines, a model of computation that captures the idea of reading a file of text with a fixed limit on the memory we can use to remember what we have seen. In particular, the memory used must be constant, independent of the length of the file. (We called this O(1) in CMPSCI 187.) We ensure this by requiring our machine to have a finite state set, so that at any time during the computation all that it knows is which state it is in. The initial state is fixed. When the machine sees a new letter, it changes to a new state based on a fixed transition function. When it finishes the string, it gives a yes or no answer based on whether it is in a final state. Because the new state depends only on the old state and the letter seen, the computation is deterministic and the machine is called a deterministic finite automaton or DFA.

Where We Are Going A DFA decides a language -- it says yes or no after reading any string over its alphabet, and its language is the set of strings for which it says yes. The Myhill-Nerode Theorem will give us a way to take an arbitrary language and determine whether there is a DFA that decides it. We ll define a particular equivalence relation on strings, based only on the language. If this relation has a finite set of equivalence classes, there is a DFA for the language, and there is a minimal DFA with as many states as there are classes. We ll see how to compute the minimal DFA from any DFA for the language. As we ve mentioned, there is a DFA for a language if and only if the language is regular (is the language denoted by some regular expression). We ll prove this important result, called Kleene s Theorem, over several lectures. Our proofs will show us how to convert a DFA to a regular expression and vice versa.

Formal Definition of DFA s Formally a DFA is defined by its state set S, its initial state i S, its final state set F S, its input alphabet Σ, and its transition function δ from (S Σ) to S. We usually represent DFA s by diagrams (labeled directed multigraphs) with a node for each state, a special mark for the initial state, a double circle on each final state, and an arrow labeled a from node p to node q whenever δ(p, a) = q. The behavior function of a particular DFA is a function called δ* from (S Σ*) to S, such that δ(p, w) is the state of the DFA after it starts in state p and reads the string w. Formally, we say that δ(p, λ) = p and that δ*(p, wa) = δ(δ*(p, w), a). The language of a DFA is defined to be the set of strings w such that δ*(i, w) is a final state. For a DFA M, we call this language L(M).

An Example of a DFA This DFA has four states, 1, 2, 3, and 4. Its input alphabet is {0, 1}. Its start state is 1, and its final state set is {1, 2, 3}. Examples of strings in the language of this DFA are λ (because 1 is final), 0 (because 2 is final), and 00100 (because 3 is final). Any string with three 0 s in a row is not in the language of this DFA, because 000 takes you from anywhere to the nonfinal death state 4. In fact the only way to get to 4 is by three 0 s in a row, so the language of this DFA is the complement of the regular language Σ*000Σ*.

iclicker Question #1: DFA Diagrams Which statement about the pictured DFA is false? (a) The alphabet of the DFA is {a, b}. (b) The start state is not a final state. (c) The string bbaa is in the language of the DFA. (d) The string aaabb is not in the language of the DFA.

Examples of DFA s One of the simplest possible DFA s decides the language of binary strings with an odd number of ones. It has two states E and O, representing whether the machine has seen an even or odd number of ones so far. The initial state is E, and the final state set is {O}. The transition function has δ(e, 0) = E, δ(e, 1) = O, δ(o, 0) = O, and δ(o, 1) = E. We can build a four-state DFA for the language EE from Discussion #11 tomorrow. Its states are EE, EO, OE, and OO, where for example δ*(ee, w) = EO if w has an even number of a s and an odd number of b s. The initial state is EE and the final state set is {EE}. An a changes the first letter of the state, a b the second. Another four-state DFA can decide whether the next to last letter of a binary string w exists and is 1. The state set is {00, 01, 10, 11} and the state after reading w represents the last two letters seen. The initial state is 00 and the final state set is {10, 11}.

iclicker Question #2: Language of a DFA What is the language of the pictured DFA? (a) all strings with an even number of 1 s (b) all strings of even length (c) all strings with an odd number of 0 s (d) all strings with an even number of 0 s

DFA s in Pseudo-Java We consider the input to be given like a file, with a method to give the next letter and one to tell when the input is done. We relabel the state set and the alphabet to be {0,..., states - 1} and {0,..., letters - 1} respectively. public class DFA { natural states; natural letters; natural start; boolean [ ] isfinal = new boolean[states]; natural [ ] [ ] delta = new natural [states] [letters]; natural getnext( ) {code omitted} boolean inputdone( ) {code omitted} boolean decide ( ) {// returns whether input string is in language of DFA natural current = start; while (!inputdone( )) current = delta [current] [getnext( )]; return isfinal [current];}}

Characterizing Strings With Given Behavior How do we prove that a particular DFA has a particular language? With the even-odd DFA, we can say that δ*(e, w) = E if w has an even number of ones, and δ*(e, w) = O if it has an odd number of ones. Letting P(w) be the entire statement in the bullet above, we can prove w:p(w) by induction on all binary strings. P(λ) says that δ*(e, λ) = E, because λ has no ones and 0 is even, and δ*(e, λ) = E is true by definition of δ*. Now assume that P(w) is true, so that δ*(e, w) is E if w has an even number of ones and O otherwise. Then w0 has the same number of ones as w, so δ*(e, w0) should be the same state as δ*(e, w). And w1 has one more one than w, so δ*(e, w1) should be the other state from δ*(e, w). In each of the four cases, the new state is the one given by the δ function of the DFA.

Another Characterization Example The language No-aba is the set of strings that never have an aba substring. We can build a DFA M for No-aba with state set {1, 2, 3, 4}, start state 1, final state set {1, 2, 3}, and transition function δ(1, a) = 2, δ(1, b) = 1, δ(2, a) = 2, δ(2, b) = 3, δ(3, a) = 4, δ(3, b) = 1, and δ(4, a) = δ(4, b) = 4. (We call 4 a death state.) We can see that an aba will take us from any state to 4. Let L1 be the set of strings that have no aba and don t end in a or ab. Let L2 be the set of strings that don t have an aba and end in a. L3 is the set of strings that don t have an aba and end in ab, and L4 is the set that have aba. We can make eight checks, one for each value of δ. If δ(i, x) = j, we check that any string in Li, followed by the letter x, yields a string in Lj. Then we can do an inductive proof, where our statement P(w) is the entire statement in the bullet above: For all states i, δ*(1, w) = i if and only if w Li where each Li is as defined. This proves that w is in L(M) if and only if w is in No-aba.

Nondeterministic Finite Automata DFA s are deterministic in that the same input always leads to the same output. Some algorithms are not deterministic because they are randomized, but here we will consider algorithms that are not deterministic because they are underdefined -- given a single input, more than one output is possible. We had an example of such an algorithm with our generic search, which didn t say which element came off the open list when we needed a new one. Formally, a nondeterministic finite automaton or NFA has an alphabet, state set, start state, and final state just like a DFA. But instead of the transition function δ, it has a transition relation Q Σ Q. If (p, a, q), the NFA may move to state q if it sees the letter a while in state p. We draw an NFA like a DFA, with an a-arrow from p to q whenever (p, a, q). The NFA no longer has the rule that there must be exactly one arrow for each letter out of each state -- there may be more than one, or none.

The Language of an NFA We can no longer say what the NFA will do when reading a string, only what it might do. The language of an NFA N is defined to be the set {w: w might be accepted by N}. More formally, we define a relation * Q Σ* Q so that the triple (p, w, q) is in * if and only if N might go from p to q while reading w. Then w L(N) (i, w, f) * for some final state f F. Consider the NFA N with state set {i, p, q}, start state i, final state set {i}, alphabet {a, b, c}, and = {(i, a, i), (i, a, p), (p, b, i), (i, b, q), (q, c, i)}. This is nondeterministic because there are two a-moves out of i, and several situations with no move at all. Here L(N) is the regular language (a + ab+ bc)*, because any path from i to itself must consist of pieces labeled a, ab, or bc. It is not immediately clear how, for a larger NFA, we could determine whether a particular string was in L(N). Our method will be to turn N into a DFA.

iclicker Question #3: The Language of an NFA What is the language of this NFA? (a) Exactly those strings ending in abb. (b) The set {abb}. (c) Exactly those strings containing abb as a substring. (d) The language is empty.

Interpretations of Nondeterminism Because we can t speak clearly of what happens when we run N on w, we need other ways to think of the action of an NFA. In our proofs, we will just replace w L(N) by f: (i, w, f) * and argue about the possible w-paths in the graph of N. We can think of N as being randomized, so that whenever it has a choice of moves it selects one of them uniformly at random. (This essentially makes N a Markov process, as studied in CMPSCI 240.) Then we could speak of the probability that N accepts w, and w L(N) if and only if this probability is greater than 0. We can think of the action of N on w as a one-player game where White, who want N to accept w, chooses each move from the set of legal options. Then White has a winning strategy for this game if and only if w L(N).

iclicker Question #4: A Randomized NFA Consider running the pictured NFA randomly on the string 101. When there is one arrow available we take it, when there are two we flip a coin, and when there are none we die. What is the probability that we will accept this string? (a) 0 (b) 1/4 (c) 1/2 (d) 3/4

The Model of λ-nfa s The main reason to use NFA s is that they are easier to design in many situations when we have some other definition of the language. Often we will find it convenient to give the NFA the option to jump from one state to another without reading a letter. A λ-move is a transition (p, λ, q) that allows a λ-nfa to do just that. We need to redefine the type of, so that it is a subset of Q (Σ {λ}) Q. In the diagram, this transition is represented by an arrow from p to q labeled with λ. Formally * is now more complicated to define. We say that (p, λ, q) * if there is a path of λ-moves from p to q. Then we define *(p, wa, q) to be true if and only if there exist states r, s, and t such that (p, w, r), (r, λ, s) and (t, λ, q) are all in *, and (s, a, t) is in. What this means is that *(p, w, q) is true if and only if there exists a path from p to q such that the letters on the path, read in order, spell out w. There may be any number of λ-moves in the path as well. (Thus we don t even know how long the path from p to q might be.)