Finite automata

We have looked at using Lex to build a scanner on the basis of regular expressions. Now we begin to consider the results from automata theory that make Lex possible.

Recall: An alphabet Σ is a finite set of symbols. A string over Σ is a finite sequence of symbols from Σ. A language over Σ is a set of strings over Σ. A recognizer for a language L over Σ takes as input a string x over Σ and answers yes if x is in L and no otherwise.

Lex scanners are based on an implementation of Kleene's Theorem: the regular languages are exactly the languages that can be recognized by a finite automaton.

BTW: The textbook gives a nonstandard definition of the set of regular languages, neglecting to include the empty language. So a correct statement of Kleene's Theorem in the context of the textbook is: the regular languages are exactly the nonempty languages that can be recognized by a finite automaton.

Regular languages can be recognized by finite automata. In fact, for every regular language there is a finite automaton that recognizes it, and, moreover, every finite automaton recognizes a regular language. (Well, as I mentioned, there's an unfortunate exception for us, because your book does not count the empty language among the regular languages.)

Finite automata (FAs) can be deterministic or not. We look first at nondeterministic finite automata (NFAs), because it is particularly easy to transform regular expressions into NFAs, and we can understand deterministic finite automata (DFAs) as a special case of NFAs.
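As a quick illustration of a recognizer (mine, not the lecture's), Python's standard re module can already answer the yes/no question for a regular language such as (a|b)*abb:

  import re

  # A recognizer, in the sense above, for the regular language (a|b)*abb:
  # answer yes exactly when the whole input string matches the expression.
  def recognizes(x: str) -> bool:
      return re.fullmatch(r"(a|b)*abb", x) is not None

  print(recognizes("ababb"))   # True
  print(recognizes("abab"))    # False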

Here is a diagram (a "transition graph") representing an NFA that recognizes the language (a|b)*abb (fig 3.19).

NFAs and their transition graph representations

Definition: An NFA is a 5-tuple (S, Σ, move, s_0, F) where
  S is a finite set (of states),
  Σ is an alphabet (the input alphabet),
  move is a function from S × (Σ ∪ {ε}) to 2^S (the powerset of S, that is, the set of all subsets of S),
  s_0 ∈ S (the start state),
  F ⊆ S (the set of final, or accepting, states).

An NFA is often represented as a transition graph in which:
  states are the nodes, represented as circles,
  the start state is indicated by an incoming arrow with no source,
  final states are indicated by a second, concentric circle,
  there is an arrow from state s to state t, labeled σ, if t ∈ move(s, σ).

For the NFA of fig 3.19: the set of states is {0, 1, 2, 3}, the input alphabet is {a, b}, the start state is 0, and there is only one accepting state, 3. Its transition function is represented by the table

                   INPUT
  STATE      a         b        ε
    0      {0, 1}     {0}       ∅
    1        ∅        {2}       ∅
    2        ∅        {3}       ∅
    3        ∅         ∅        ∅
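To make the definition concrete, here is the fig 3.19 NFA written down as plain data in Python (a sketch of my own, not from the lecture); the dictionary move plays the role of the transition function, with missing entries standing for the empty set:

  # The fig 3.19 NFA for (a|b)*abb, written down as data.
  S     = {0, 1, 2, 3}       # states
  SIGMA = {"a", "b"}         # input alphabet
  s0    = 0                  # start state
  F     = {3}                # accepting states

  # move : S x (SIGMA ∪ {ε}) -> 2^S; entries not listed are the empty set.
  move = {
      (0, "a"): {0, 1},
      (0, "b"): {0},
      (1, "b"): {2},
      (2, "b"): {3},
  }

  def step(state, symbol):
      """move(state, symbol), defaulting to the empty set."""
      return move.get((state, symbol), set())

  print(step(0, "a"))   # {0, 1}
  print(step(1, "a"))   # set()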

String acceptance, language recognition, and an example

An NFA M accepts a string x if there is a path in the transition graph of M from the start state to an accepting state such that the labels along this path spell out x. (That is, the concatenation of the labels is x.) An NFA M accepts, or recognizes, a language L if it accepts all, and only, strings from L. So, how many languages can a given NFA recognize?

Here's a diagram of an NFA accepting the language a+|b+: (fig 3.21)

Deterministic Finite Automata (DFAs)

A deterministic finite automaton is an NFA in which
  no state has an ε-transition (that is, in the transition graph, no node has an outgoing edge labeled ε), and
  for each state s and input symbol a there is at most one outgoing edge labeled a (in the transition graph).

Here is a DFA that accepts (a|b)*abb (fig 3.23).
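Both DFA conditions are easy to check mechanically. The sketch below is my own (not the lecture's), for an NFA whose transition function is stored as in the earlier sketch, as a dictionary from (state, symbol) pairs to sets of states:

  def is_deterministic(move, eps="ε"):
      """Check the two DFA conditions for a transition table given as a
      dictionary from (state, symbol) to a set of successor states."""
      for (state, symbol), targets in move.items():
          if symbol == eps and targets:   # condition 1: no ε-transitions
              return False
          if len(targets) > 1:            # condition 2: at most one edge per symbol
              return False
      return True

  # The fig 3.19 NFA is not deterministic, since move(0, "a") = {0, 1}.
  nfa_move = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
  print(is_deterministic(nfa_move))   # False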

A DFA has at most one transition from each state on any input, so it is easy to simulate. Let's look at a way to do this...

First, if there is any state s and input symbol a for which s does not have an outgoing edge labeled a, add a new state s_d to S, and for every s and a for which move(s, a) = ∅, let move(s, a) = {s_d}. (And let move(s_d, a) = {s_d} for all a.) Now for every s and a, move(s, a) is a set with one state, so let's instead understand move as the corresponding function that takes each state and input symbol to a state.

With this slight adjustment to the DFA and its transition function, we can decide whether an input string x (terminated with eof) belongs to the language of the DFA, as follows:

  s := s_0;
  c := nextchar;
  while c ≠ eof do begin
      s := move(s, c);
      c := nextchar
  end;
  if s is in F then return yes else return no

Converting an NFA into a DFA

While DFAs are easy to simulate, NFAs are easier to obtain:
  1. Easier to write directly.
  2. Easy to construct on the basis of regular expressions.

So we'll want an algorithm for converting any NFA into a DFA recognizing the same language... Let's start with a special case: NFAs without ε-transitions. And let's begin with an example. (fig 3.19)
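Returning to the simulation loop above, here is a minimal runnable Python version (my own sketch, not the lecture's code), using one complete DFA for (a|b)*abb in which every state already has an outgoing edge on both symbols, so no dead state s_d is needed:

  # One complete DFA for (a|b)*abb.  State k records how much of the
  # pattern "abb" the most recent input symbols have matched.
  start = 0
  F = {3}
  move = {
      (0, "a"): 1, (0, "b"): 0,
      (1, "a"): 1, (1, "b"): 2,
      (2, "a"): 1, (2, "b"): 3,
      (3, "a"): 1, (3, "b"): 0,
  }

  def simulate(x: str) -> str:
      """Mirror of the pseudocode loop: read x symbol by symbol, then answer."""
      s = start
      for c in x:               # assumes x is a string over {a, b}
          s = move[(s, c)]
      return "yes" if s in F else "no"

  print(simulate("babb"))   # yes
  print(simulate("abba"))   # no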

Algorithm for reducing an NFA with no ε-transitions to a DFA:

Dstates will be a set of subsets of S. (So each state in the DFA corresponds to a set of states in the NFA.)

The start state is {s_0}.

Dfinal = {T ∈ Dstates : T ∩ F ≠ ∅}.

For each T ∈ Dstates and each input symbol a,

  Dmove(T, a) = ⋃_{s ∈ T} move(s, a)

It remains only to compute Dstates, as follows:

  initially, {s_0} is the only element of Dstates, and it is unmarked;
  while there is an unmarked state T in Dstates do begin
      mark T;
      for each input symbol a do begin
          U := Dmove(T, a);
          if U ∉ Dstates then
              add U as an unmarked element of Dstates
      end
  end

The reduction is slightly more complicated when the NFA has ε-transitions. Let's try an example first: an NFA for a(ab|a)*b.
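Before moving on to ε-transitions, here is a minimal Python sketch of the special-case construction just described (my own code and names, not the lecture's); it also records the DFA transition table as it goes, anticipating the Dtrans function introduced below:

  def nfa_to_dfa_no_eps(sigma, move, s0, F):
      """Subset construction for an NFA with no ε-transitions.
      move maps (state, symbol) to a set of states; missing entries mean ∅."""
      def dmove(T, a):
          # Dmove(T, a) = union of move(s, a) over s in T
          return frozenset().union(*(move.get((s, a), set()) for s in T))

      start = frozenset({s0})
      dstates = {start}        # DFA states found so far
      unmarked = [start]       # worklist of unmarked DFA states
      dtran = {}               # DFA transition table (recorded as we go)
      while unmarked:
          T = unmarked.pop()   # mark T
          for a in sigma:
              U = dmove(T, a)
              if U not in dstates:
                  dstates.add(U)
                  unmarked.append(U)
              dtran[(T, a)] = U
      dfinal = {T for T in dstates if T & F}
      return dstates, dtran, start, dfinal

  # The fig 3.19 NFA for (a|b)*abb has no ε-transitions, so it qualifies.
  nfa_move = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
  dstates, dtran, start, dfinal = nfa_to_dfa_no_eps({"a", "b"}, nfa_move, 0, {3})
  print(len(dstates), sorted(map(sorted, dfinal)))   # 4 [[0, 3]]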

For each state s of an NFA, let's write ε-closure(s) to denote the set of all states reachable from s by a path with each transition labeled ε. Notice that, for all s, s ∈ ε-closure(s), since you can reach s from s by the empty path (the path with no transitions; trivially, all of its transitions are labeled ε).

For every set T of states of an NFA, let

  ε-closure(T) = ⋃_{s ∈ T} ε-closure(s).

Now we can specify the general reduction of NFAs to DFAs, much as before...

Algorithm for reducing an NFA to a DFA:

Dstates will again be a set of subsets of S.

The start state is ε-closure({s_0}). (Notice the use of ε-closure.)

Dfinal = {T ∈ Dstates : T ∩ F ≠ ∅}. (As before.)

For each T ∈ Dstates and each input symbol a,

  Dmove(T, a) = ⋃_{s ∈ T} move(s, a)   (Also as before.)

We'll end up computing another function, Dtrans, as the transition function for the DFA. It remains to compute Dstates and Dtrans, as follows:

  initially, ε-closure({s_0}) is the only element of Dstates, and it is unmarked;
  while there is an unmarked state T in Dstates do begin
      mark T;
      for each input symbol a do begin
          U := ε-closure(Dmove(T, a));
          if U ∉ Dstates then
              add U as an unmarked element of Dstates;
          Dtrans(T, a) := U
      end
  end
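Continuing the sketch from the no-ε case, here is a hedged Python version of ε-closure and the general construction (again my own code and naming, not the lecture's; EPS is the label I use for ε):

  EPS = "ε"   # label used for ε-transitions in the move table

  def eps_closure(T, move):
      """ε-closure of a set T of NFA states: all states reachable by ε-paths."""
      stack, closure = list(T), set(T)
      while stack:
          s = stack.pop()
          for t in move.get((s, EPS), set()):
              if t not in closure:
                  closure.add(t)
                  stack.append(t)
      return frozenset(closure)

  def nfa_to_dfa(sigma, move, s0, F):
      """General subset construction: NFA with ε-transitions to DFA."""
      def dmove(T, a):
          return set().union(*(move.get((s, a), set()) for s in T))

      start = eps_closure({s0}, move)
      dstates, unmarked, dtrans = {start}, [start], {}
      while unmarked:
          T = unmarked.pop()                     # mark T
          for a in sigma:
              U = eps_closure(dmove(T, a), move)
              if U not in dstates:
                  dstates.add(U)
                  unmarked.append(U)
              dtrans[(T, a)] = U                 # Dtrans(T, a) := U
      dfinal = {T for T in dstates if T & F}
      return dstates, dtrans, start, dfinal

  # Tiny made-up example with an ε-transition: an NFA for a*b, with
  #   move(0, a) = {0},  move(0, ε) = {1},  move(1, b) = {2},  accepting state 2.
  move = {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {2}}
  dstates, dtrans, start, dfinal = nfa_to_dfa({"a", "b"}, move, 0, {2})
  print(sorted(map(sorted, dstates)))   # [[], [0, 1], [2]]

In this sketch the empty set shows up as a DFA state of its own, playing exactly the role of the dead state s_d from the simulation discussion earlier.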

Let's try it: apply the algorithm above to the NFA of fig 3.27.

For next time: read Section 3.7.