Lecture 2 Finite Automata

Similar documents
CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

(Refer Slide Time: 0:19)

CMPSCI 250: Introduction to Computation. Lecture 20: Deterministic and Nondeterministic Finite Automata David Mix Barrington 16 April 2013

I have read and understand all of the instructions below, and I will obey the Academic Honor Code.

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

To illustrate what is intended the following are three write ups by students. Diagonalization

lec3:nondeterministic finite state automata

Alternation. Kleene Closure. Definition of Regular Expressions

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

Computer Sciences Department

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

6.080 / Great Ideas in Theoretical Computer Science Spring 2008

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

CSE 105 THEORY OF COMPUTATION

Regular Languages and Regular Expressions

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

Material from Recitation 1

CS402 - Theory of Automata Glossary By

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Computer Science 236 Fall Nov. 11, 2010

Formal Definition of Computation. Formal Definition of Computation p.1/28

Grade 6 Math Circles November 6 & Relations, Functions, and Morphisms

CS21 Decidability and Tractability

Equivalence of NTMs and TMs

CIT3130: Theory of Computation. Regular languages

14.1 Encoding for different models of computation

CS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm

Theory of Computations Spring 2016 Practice Final Exam Solutions

Recursively Enumerable Languages, Turing Machines, and Decidability

1. Draw the state graphs for the finite automata which accept sets of strings composed of zeros and ones which:

Cardinality of Sets MAT231. Fall Transition to Higher Mathematics. MAT231 (Transition to Higher Math) Cardinality of Sets Fall / 15

CS2 Language Processing note 3

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Homework 1 Due Tuesday, January 30, 2018 at 8pm

Languages and Automata

COMP 330 Autumn 2018 McGill University

9. MATHEMATICIANS ARE FOND OF COLLECTIONS

Definition: A context-free grammar (CFG) is a 4- tuple. variables = nonterminals, terminals, rules = productions,,

Chapter 3: Lexing and Parsing

How convincing is our Halting Problem proof? Lecture 36: Modeling Computing. Solutions. DrScheme. What is a model? Modeling Computation

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Introduction to Computers & Programming

Lecture 5: Duality Theory

Decidable Problems. We examine the problems for which there is an algorithm.

Theory of Programming Languages COMP360

Lecture 4: examples of topological spaces, coarser and finer topologies, bases and closed sets

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Theory of Computation, Homework 3 Sample Solution

6 NFA and Regular Expressions

CS103 Handout 13 Fall 2012 May 4, 2012 Problem Set 5

Have the students look at the editor on their computers. Refer to overhead projector as necessary.

Pushdown Automata. A PDA is an FA together with a stack.

1 Parsing (25 pts, 5 each)

Order from Chaos. University of Nebraska-Lincoln Discrete Mathematics Seminar

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

COMP Logic for Computer Scientists. Lecture 25

ECS 120 Lesson 16 Turing Machines, Pt. 2

Part I: Multiple Choice Questions (40 points)

Lexical Analysis - 2

Finite Automata and Scanners

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

A Scanner should create a token stream from the source code. Here are 4 ways to do this:

Ambiguous Grammars and Compactification

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

Regular Languages (14 points) Solution: Problem 1 (6 points) Minimize the following automaton M. Show that the resulting DFA is minimal.

Lecture 4: Petersen Graph 2/2; also, Friendship is Magic!

CS 181 EXAM #1 NAME. You have 90 minutes to complete this exam. You may state without proof any fact taught in class or assigned as homework.

CS2 Algorithms and Data Structures Note 10. Depth-First Search and Topological Sorting

MITOCW watch?v=4dj1oguwtem

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2

CSE 105 THEORY OF COMPUTATION

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES

Finite State Machines

Lecture 20 : Trees DRAFT

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5

G52LAC Languages and Computation Lecture 6

Today, we will dispel the notion that Java is a magical language that allows us to solve any problem we want if we re smart enough.

Name: CS 341 Practice Final Exam. 1 a 20 b 20 c 20 d 20 e 20 f 20 g Total 207

Turing Machines. A transducer is a finite state machine (FST) whose output is a string and not just accept or reject.

Ramsey s Theorem on Graphs

Chapter Seven: Regular Expressions

MATH 22 MORE ABOUT FUNCTIONS. Lecture M: 10/14/2003. Form follows function. Louis Henri Sullivan

Formal Languages and Compilers Lecture VI: Lexical Analysis

CSE 105 THEORY OF COMPUTATION

Lecture 3: Lexical Analysis

Basic Reliable Transport Protocols

Theory of Computation

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Semantic Analysis. Lecture 9. February 7, 2018

CSE 401 Midterm Exam Sample Solution 2/11/15

4.7 Approximate Integration

CSE 105 THEORY OF COMPUTATION

Final Course Review. Reading: Chapters 1-9

CT32 COMPUTER NETWORKS DEC 2015

Transcription:

Lecture 2 Finite Automata August 31, 2007 This lecture is intended as a kind of road map to Chapter 1 of the text just the informal examples that I ll present to motivate the ideas. 1 Expressions without Parentheses In the last set of notes we looked at algebraic expressions and removed everything but the parentheses. We asked how we could tell whether a string of parentheses was correctly nested. Here we ll remove the parentheses and ask how we can determine if the resulting expression is correctly formed. That is, we ll look at expressions like 345-2789-27+675+43-2 Let s suppose for now that the only operation symbols that appear are + and, and that the terms are all unsigned decimal integers. 2 A State-transition Graph for the Problem We can draw a kind of flow chart for an algorithm that determines whether such an expression. 1

The incoming arrow at the left-hand node means that the algorithm starts in this state. The double circle at the right-hand node means the input is accepted if the algorithm is in this state when it finishes scanning the input string. If the algorithm is in the other state when the end of the input is reached, the input string is rejected. Here are two pseudocode versions of the algorithm. In the first one, the state is literally a place in the program code: state0: if there is no more input, reject and terminate; else if the next input symbol is a digit, go to state1; state1: if there is no more input accept and terminate; else if the next input symbol is a digit, go to state1; else if the next input symbol is + or -, go to state0; In the second, the variable representing the state plays an analogous role to the variable excess in the parenthesis-testing algorithm. while there is still more input if next input symbol is a digit state=1; else if next input symbol is + or - and (state==1) if(state==1) accept else reject These reflect two different interpretations of what is meant by the state of a computer program, an issue that will be important to us when we try to describe formally exactly what an algorithm is. 3 DFA s But I digress. The gizmo illustrated in the diagram above, and realized in the pseudocode implementations, is called a deterministic finite automaton. Au- 2

tomaton is an old word, traditionally used to mean something along the lines of robot. The plural of automaton is automata. Two relatively common words of English that form their plurals in the same way are phenomenon and criterion. People who should know better habitually misuse all of these words and used the plural form for both the singular and the plural. I can get fussy about this sort of thing, but I know that I m fighting a losing battle. But I digress. The word deterministic in the definition means that when you know the present state and the next input, then the next state is completely determined. Strictly speaking, the commonly-used definition of DFA requires that the state transition function of the automaton be completely specified: that is, for every choice of state and input symbol, there is a next state defined. The diagram above violates this condition, so we have to fix things up by adding a new state: The set of strings accepted by a DFA is called a regular language. 4 NFA s Let s suppose we wanted to an algorithm for recognizing the set of expressions in which the last operation symbol is + rather than. We can redraw our state-transition graph to obtain: 3

Now there s something really fishy going on we ve lost the determinism, since when we re in state 1 and the next input symbol is + we re allowed to guess whether or not this is the last operation symbol and change state accordingly. It s as though you were allowed to rewrite the pseudocode as while there is still more input if next input symbol is a digit and (state==0) or (state==1) state=1; else if next input symbol is a digit and (state==2) or (state==3) state=3; else if next input symbol is - and (state==1) else if next symbol is + and (state==1) either or state=2; if(state==3) accept else reject The either-or statement is not a standard feature in programming languages, for obvious reasons! The nondeterministic finite automaton pictured above accepts its input as long as there is some sequence of guesses that leads to an accepting state at the end of the input. Nondeterministic algorithms don t look like very reasonable things to be 4

talking about, yet the concept is surprisingly useful. In the case of finite automaton, nondeterminism is useful because (a) it s often much easier to devise a nondeterministic solution for a problem; and (b) it s always possible to eliminate nondeterminism: Theorem 1 The set of strings accepted by an NFA is a regular language. (That is, given an NFA, there is a DFA that accepts exactly the same set of strings.) 5 Regular Expressions Here is a very different kind of description of the set of correct expressions. Remember that when we discussed correct bracketing, we gave two rather different descriptions: One that told how to generate correct patterns, and the other that told how to test a string for correctness. For the expressions under consideration here, we ve already described how to test them, here is a description of how to generate them: A digit is one of the symbols 0,1,2,3,4,5,6,7,8,9. A number is a sequence of 1 or more digits. An operator is one of the symbols + or -. An expression is a number, followed by a sequence of 0 or more occurrences of the pattern: operator followed by a number. This description builds sets of strings by starting with the simplest pattern an individual symbol and combining them using the operations or, followed by and a sequence 0 or more occurrences of. We use sybmols,, and to denote these operations. For instance, we would denote a digit by a sequence of 0 or more digits by (0 1 9) (0 1 9) and a sequence of one or more digits by We denote the complete pattern by (0 1 9)(0 1 9). (0 1 9)(0 1 9) ((+ )(0 1 9)(0 1 9) ). Pattern specifications built using these three operations are called regular expressions. The regular expression denotes the set of all strings that match the specified pattern. The crucial fact about these is: 5

Theorem 2 A language is denoted by a regular expression if and only if it is a regular language. The practical import is this: There is a lot of software out there that let s you search for patterns in text where you specify the pattern by a regular expression. In many cases such programs build a finite automaton from the input regular expression, and then simulate the finite automaton scanning the text. As we shall see in class, it is much easier to build an NFA from a regular expression than a DFA. The NFA can then either be converted into a DFA or simulated directly. I will have you play around with the UNIX utility grep which allows you to search for instances of regular expressions in a file. 6 Nonregular Languages Is the language P of correct arrangements of brackets a regular language? The following intuition suggests that it is not: When we scan a string from left to right, we need to keep track of the excess of left brackets over right brackets. The set of possible values for this excess is infinite, so we cannot do this with a finite set of state values. Here is a formal proof that P is not a regular language. Suppose that it is, and suppose that it is accepted by a DFA with n states. Now let us read the the string a n (n occurrences of a) starting from the initial state q 0 of the automaton, and look at the sequence of states we encounter: q 0, q 1,..., q n. There are n+1 states on this list, but only n distinct states in the automaton, so two of these states must be the same: That is q i = q j for some i < j. Now q i is the state we reach when we have read i occurrences of a, so if we now proceed to read i b s we will wind up in an accepting state q acc of the automaton, since a i b i P. But q i is also the state we reach when we have read j occurrences of a, and thus we wind up in an accepting state when we read a j b i. But then our DFA cannot accept the language P, because a j b i / P, a contradiction. (Another way to state this argument is that any regular language that contains all the strings a i b i must also contain a string of the form a j b i where j > i, so P cannot be a regular language.) In class we will formalize this reasoning in a general theorem called the pumping lemma that allows one to prove that a great many languages are non-regular. 6