Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages


Regular Languages
MACM 300 Formal Languages and Automata
Anoop Sarkar
http://www.cs.sfu.ca/~anoop

Formal Languages: Recap
Symbols: a, b, c
Alphabet Σ: a finite set of symbols, e.g. Σ = {a, b}
String: a sequence of symbols, e.g. bab
Empty string: ε
Define: Σ_ε = Σ ∪ {ε}
Set of all strings: Σ*
(Formal) Language: a set of strings, e.g. { a^n b^n : n > 0 }
cf. The Library of Babel, Jorge Luis Borges

The set of regular languages: each element is a regular language
Each regular language is an example of a (formal) language, i.e. a set of strings, e.g. { a^m b^n : m, n are positive integers }

Regular Languages
Defining the set of all regular languages:
The empty set and {a} for all a in Σ_ε are regular languages
If L1, L2 and L are regular languages, then
  L1 · L2 (concatenation)
  L1 ∪ L2 (union)
  L* (Kleene closure)
are also regular languages
There are no other regular languages

Formal Grammars
A formal grammar is a concise description of a formal language
A formal grammar uses a specialized syntax
For example, a regular expression is a concise description of a regular language
(a|b)*abb is the set of all strings over the alphabet {a, b} which end in abb

Regular Expressions: Definition
Every symbol of Σ ∪ {ε} is a regular expression
The empty language ∅ is a regular expression
Note that ε*∅ = ∅
If r1 and r2 are regular expressions, so are
  Concatenation: r1 r2
  Alternation: r1 | r2
  Repetition: r1*
Nothing else is. But grouping regular expressions with parentheses is allowed: e.g. aa|bc vs. ((aa)|b)c

Regular Expressions: Examples
Alphabet {0, 1}
All strings that represent binary numbers divisible by 4 (but accept 0): ((0|1)*00)|0
All strings that do not contain 01 as a substring: 1*0*

Finite Automata: Recap
A set of states S
One start state q0, zero or more final states F
An alphabet Σ of input symbols
A transition function δ: S × Σ → S
Example: δ(1, a) = 2
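The claims made for two of the example expressions can be checked by brute force. The sketch below is only an illustration: it assumes Python's `re` module as the regexp engine, writes alternation as `|`, and uses a hypothetical helper `all_strings` and an arbitrary length bound, none of which come from the slides.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# (a|b)*abb : all strings over {a, b} that end in abb
ends_in_abb = re.compile(r"(a|b)*abb")

# ((0|1)*00)|0 : binary numerals divisible by 4, plus the single string "0"
div_by_4 = re.compile(r"((0|1)*00)|0")

def check_examples(max_len=8):
    """Compare each regexp with a direct test on every short string."""
    for s in all_strings("ab", max_len):
        assert bool(ends_in_abb.fullmatch(s)) == s.endswith("abb")
    for s in all_strings("01", max_len):
        # The empty string is not a numeral, so it must be rejected.
        expected = s != "" and int(s, 2) % 4 == 0
        assert bool(div_by_4.fullmatch(s)) == expected
    return True
```

Calling `check_examples()` only confirms agreement up to the chosen length; it is a sanity check, not a proof.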

Finite Automata: Example
What regular expression does this automaton accept? [state diagram not reproduced]
Answer: a regular expression over {0, 1} beginning (0|1)*

NFAs
NFA: like a DFA, except
A transition can lead to more than one state, that is, δ: S × Σ → 2^S
One state is chosen non-deterministically
Transitions can be labeled with ε, meaning states can be reached without reading any input, that is, δ: S × (Σ ∪ {ε}) → 2^S

Thompson's construction
Converts regexps to NFA
Six simple rules:
  Empty language
  Symbols
  Empty string
  Alternation (r1 or r2)
  Concatenation (r1 followed by r2)
  Repetition (r1*)

Thompson Rule 0
For the empty language ∅ (optionally include a sinkhole state)

Thompson Rule 1
For each symbol x of the alphabet, there is an NFA that accepts it (optionally include a sinkhole state)

Thompson Rule 2
There is an NFA that accepts only ε

Thompson Rule 3
Given two NFAs for r1, r2, there is an NFA that accepts r1 | r2

Thompson Rule 4
Given two NFAs for r1, r2, there is an NFA that accepts r1 r2 (the final state of the NFA for r1 is no longer a final state after concatenation)

Thompson Rule 5
Given an NFA for r1, there is an NFA that accepts r1*

Example
Set of all binary strings that are divisible by four (include 0 in this set)
Defined by the regexp: ((0|1)*00)|0
Apply Thompson's Rules to create an NFA

Basic Blocks 0 and 1
(this version does not report errors: no sinkholes)
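The six rules can be sketched as small combinators that build NFA fragments. This is one possible rendering, not the slides' own construction: the names `Frag`, `symbol`, `alt`, `cat`, `star` and `accepts` are hypothetical, ε is represented by `None`, and concatenation links the two fragments with an ε-edge rather than merging states.

```python
from itertools import count

EPS = None          # label used for ε-transitions (an assumption of this sketch)
_ids = count()      # source of fresh state numbers

class Frag:
    """An NFA fragment with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def empty_language():               # Rule 0: no path from start to accept
    return Frag(next(_ids), next(_ids), [])

def symbol(x):                      # Rule 1 (Rule 2 when x is EPS)
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, x, f)])

def alt(a, b):                      # Rule 3: a | b
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, b.start),
             (a.accept, EPS, f), (b.accept, EPS, f)]
    return Frag(s, f, a.edges + b.edges + extra)

def cat(a, b):                      # Rule 4: a followed by b
    # a.accept stops being accepting; only b.accept accepts.
    return Frag(a.start, b.accept,
                a.edges + b.edges + [(a.accept, EPS, b.start)])

def star(a):                        # Rule 5: a*
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, f),
             (a.accept, EPS, a.start), (a.accept, EPS, f)]
    return Frag(s, f, a.edges + extra)

def closure(states, edges):
    """States reachable from `states` through ε-edges only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for (p, x, r) in edges:
            if p == q and x is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, text):
    """Simulate the NFA on `text` by tracking the set of live states."""
    current = closure({nfa.start}, nfa.edges)
    for c in text:
        moved = {r for (p, x, r) in nfa.edges if p in current and x == c}
        current = closure(moved, nfa.edges)
    return nfa.accept in current

def zero_or_one():
    return alt(symbol("0"), symbol("1"))

# NFA for the example regexp ((0|1)*00)|0
nfa = alt(cat(cat(star(zero_or_one()), symbol("0")), symbol("0")),
          symbol("0"))
```

For instance, `accepts(nfa, "100")` follows the branch for (0|1)*00, while `accepts(nfa, "0")` uses the second alternative.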

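[Figures, state diagrams not reproduced: step-by-step application of Thompson's rules to the example regexp, starting from the sub-expression (0|1)* and ending with an NFA for ((0|1)*00)|0]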

NFA to DFA Conversion
Subset construction
Idea: a subset of the NFA states whose members are equivalent becomes one DFA state
Algorithm simulates movement through the NFA
Key problem: how to treat ε-transitions?

ε-Closure
Start state: q0
ε-closure(S): S is a set of states

NFA Simulation
After computing the ε-closure move, we get a set of states
On some input, extend all these states to get a new set of states

NFA Simulation
Start state: q0
Input: c1, ..., ck
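The ε-closure and simulation steps can be written down directly. The sketch below assumes a dictionary encoding of the transition function, with the empty string `""` standing for ε; the example NFA for (a|b)*abb and all function names are hypothetical and are not taken from the slides' figures.

```python
def eps_closure(states, delta):
    """All states reachable from `states` using only ε-moves (label "")."""
    stack = list(states)
    closure = set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def move(states, symbol, delta):
    """States reachable from `states` by reading exactly one `symbol`."""
    return {r for q in states for r in delta.get((q, symbol), ())}

def simulate(delta, start, finals, text):
    """Run the NFA on c1, ..., ck, keeping the set of possible states."""
    current = eps_closure({start}, delta)
    for c in text:
        current = eps_closure(move(current, c, delta), delta)
    return bool(current & finals)

# Hypothetical NFA for (a|b)*abb with one ε-move from state 0 to state 1.
delta = {
    (0, ""): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
finals = {4}
```

Here `simulate(delta, 0, finals, "aabb")` tracks the sets {0, 1}, {1, 2}, {1, 2}, {1, 3}, {1, 4} and accepts because the last set contains the final state 4.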

Conversion from NFA to DFA
Conversion method closely follows the NFA simulation algorithm
Instead of simulating, we can collect those NFA states that behave identically on the same input
Group this set of states to form one state in the DFA

Example: subset construction
[Figures, state diagrams not reproduced: the example NFA with ε-closure(q0) highlighted, then move(ε-closure(q0), c) for an input symbol c]
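A worklist version of the subset construction is sketched below. It assumes the same dictionary encoding of an NFA with `""` as the ε label; the example NFA for (a|b)*abb and the names `subset_construction` and `dfa_accepts` are illustrative choices rather than anything defined in the slides.

```python
def eps_closure(states, delta):
    """All states reachable from `states` using only ε-moves (label "")."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def subset_construction(delta, start, finals, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = eps_closure({start}, delta)
    dfa_delta = {}
    seen, worklist = {start_set}, [start_set]
    while worklist:
        S = worklist.pop()
        for c in alphabet:
            moved = {r for q in S for r in delta.get((q, c), ())}
            T = eps_closure(moved, delta)   # may be empty: a dead state
            dfa_delta[(S, c)] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    dfa_finals = {S for S in seen if S & finals}
    return seen, start_set, dfa_delta, dfa_finals

def dfa_accepts(dfa, text):
    """Run the DFA; `text` must use only symbols from the alphabet."""
    _, current, dfa_delta, dfa_finals = dfa
    for c in text:
        current = dfa_delta[(current, c)]
    return current in dfa_finals

# Hypothetical NFA for (a|b)*abb with one ε-move from state 0 to state 1.
nfa_delta = {
    (0, ""): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
dfa = subset_construction(nfa_delta, 0, {4}, "ab")
```

For this NFA the construction reaches five DFA states: {0, 1}, {1}, {1, 2}, {1, 3} and {1, 4}, of which only {1, 4} is accepting.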

[Figures, state diagrams not reproduced: move(ε-closure(q0), c) and ε-closure(move(ε-closure(q0), c)) for each input symbol c]

DFA (partial)
[Figure not reproduced: the first DFA states, each labelled with the set of NFA states it represents, for example [3, 4, 6, 7, 8, 9]]

DFA for ((0|1)*00)|0
[Figure not reproduced: the complete DFA, each state labelled with a set of NFA states]

Minimization (I), Minimization (II)
[Figures not reproduced: the same DFA with equivalent states grouped]

Minimization of DFAs
Algorithm for minimizing the number of states in a DFA
Step 1: partition states into 2 groups: accepting and non-accepting
[Figure not reproduced: a DFA with states A, B, C, D, E]

Minimization of DFAs
Step 2: in each group, find a sub-group of states having property P
P: The states have transitions on each symbol (in the alphabet) to the same group
  State   on 0     on 1
  A       blue     yellow
  B       blue     yellow
  C       blue     yellow
  D       yellow   yellow
  E       blue     yellow

Minimization of DFAs
Step 3: if a sub-group does not obey P, split it off into a separate group
Go back to step 2. If no further sub-groups emerge then continue to step 4
  State   on 0     on 1
  A       blue     green
  B       blue     green
  C       blue     green
  D       yellow   green
  E       blue     green

Minimization of DFAs
Step 4: each group becomes a state in the minimized DFA
Transitions to individual states are mapped to a single state representing the group of states

NFA to DFA
Subset construction converts NFA to DFA
Complexity: in programs we measure time complexity in number of steps
For FSAs, we measure complexity in terms of the number of states
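The four minimization steps amount to repeated partition refinement. The sketch below assumes a complete DFA given as a dictionary from (state, symbol) to state; the five-state example for (a|b)*abb is hypothetical and is not the A to E automaton drawn on the slides, whose transitions did not survive.

```python
def minimize(states, alphabet, delta, finals):
    """Return the groups of equivalent states of a complete DFA."""
    # Step 1: accepting vs non-accepting states (drop an empty group).
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        def group_of(q):
            return next(i for i, g in enumerate(partition) if q in g)

        new_partition = []
        for group in partition:
            # Steps 2 and 3: states stay together only if, on every
            # symbol, they move into the same group.
            buckets = {}
            for q in group:
                signature = tuple(group_of(delta[(q, c)]) for c in alphabet)
                buckets.setdefault(signature, set()).add(q)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):
            # Step 4: no group was split; each group is one DFA state.
            return [frozenset(g) for g in new_partition]
        partition = new_partition

# Hypothetical complete DFA for (a|b)*abb; E is the only accepting state.
example_delta = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
groups = minimize("ABCDE", "ab", example_delta, {"E"})
```

On this example the first pass splits D away from {A, B, C}, the second splits B away from {A, C}, and the third pass finds nothing left to split, so A and C collapse into one state of the minimized DFA.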

NFA to DFA
Problem: An n state NFA can sometimes become a 2^n state DFA, an exponential increase in complexity
Try the subset construction on the NFA built for the regexp A*aA^(n-1) where A is the regexp (a|b)
Minimization can reduce the number of states
But minimization requires determinization
[Figures not reproduced: the NFA for one value of n and the DFA obtained from it]
2^5 = 32 states
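The blow-up can be observed by counting the subsets the construction actually reaches. The sketch below assumes a hand-built, ε-free NFA for (a|b)*a(a|b)^(n-1) with states 0..n, which is smaller than the NFA Thompson's rules would produce; the function name `count_dfa_states` is hypothetical.

```python
def count_dfa_states(n):
    """Count reachable subset-construction states for (a|b)*a(a|b)^(n-1)."""
    # NFA states 0..n: state 0 loops on a and b and may guess that the
    # current a is the n-th symbol from the end; state n is accepting.
    def step(q, c):
        targets = set()
        if q == 0:
            targets.add(0)
            if c == "a":
                targets.add(1)
        elif q < n:
            targets.add(q + 1)
        return targets

    start = frozenset({0})
    seen, worklist = {start}, [start]
    while worklist:
        S = worklist.pop()
        for c in "ab":
            T = frozenset(r for q in S for r in step(q, c))
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    return len(seen)
```

With n = 5 the NFA has 6 states and `count_dfa_states(5)` returns 32, since the DFA has to remember which of the last five symbols were a.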

Equivalence of Regexps
(R|S)|T == R|(S|T) == R|S|T
(RS)T == R(ST)
(R|S) == (S|R)
R*R* == (R*)* == R* == ε|RR*
R** == R*
(R|S)T == RT|ST
R(S|T) == RS|RT
(R|S)* == (R*S*)* == (R*S)*R* == (R*|S*)*
RR* == R*R
(RS)*R == R(SR)*
R == R|R == Rε == εR

Equivalence of Regexps
[Worked example not reproduced: a chain of rewrites over 0 and 1, justified in turn by the rules below]
(RS)*R == R(SR)*
RS == (RS)
R* == ε|RR*
R == R|R
R* == ε|RR*

NFA to RegExp
[Figure not reproduced: an NFA with states A, B, C, D and edges labelled a and b]
A = a B
B = b D | b C
D = a B | ε
C = a D

NFA to RegExp
Three steps in the algorithm (apply in any order):
1. Substitution: for B = X pick every A = B | T and replace to get A = X | T
2. Factoring: (R S) | (R T) = R (S | T) and (R T) | (S T) = (R | S) T
3. Arden's Rule: For any set of strings S and T, the equation X = (S X) | T has X = (S*) T as a solution.
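Arden's Rule and the factoring identity can be checked on finite approximations of languages. The sketch below represents a language by the set of its strings of length at most n; the helper names `concat`, `star`, `arden_holds` and `factoring_holds`, and the sample sets, are assumptions made for illustration.

```python
def concat(A, B, n):
    """Concatenation of two languages, truncated to strings of length <= n."""
    return {a + b for a in A for b in B if len(a) + len(b) <= n}

def star(A, n):
    """Kleene closure of A, truncated to strings of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, A, n) - result
        result |= frontier
    return result

def arden_holds(S, T, n=8):
    """Check that X = S* T satisfies X = (S X) | T up to length n."""
    T_n = {t for t in T if len(t) <= n}
    X = concat(star(S, n), T_n, n)            # candidate solution
    return X == (concat(S, X, n) | T_n)

def factoring_holds(R, S, T, n=8):
    """Check (R S) | (R T) = R (S | T) up to length n."""
    return (concat(R, S, n) | concat(R, T, n)) == concat(R, S | T, n)

# The equation D = (ab | aba) D | ε has the solution D = (ab | aba)*.
D = concat(star({"ab", "aba"}, 8), {""}, 8)
```

Truncating at length n is safe here because every string of length at most n in a concatenation is built from pieces that are themselves no longer than n.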

NFA to RegExp
A = a B
B = b D | b C
D = a B | ε
C = a D
Substitute:
A = a B
B = b D | b a D
D = a B | ε
Factor:
A = a B
B = (b | b a) D
D = a B | ε
Substitute:
A = a (b | b a) D
D = a (b | b a) D | ε

NFA to RegExp
A = a (b | b a) D
D = a (b | b a) D | ε
Factor:
A = (a b | a b a) D
D = (a b | a b a) D | ε
Arden:
A = (a b | a b a) D
D = (a b | a b a)* ε
Remove epsilon:
A = (a b | a b a) D
D = (a b | a b a)*
Substitute:
A = (a b | a b a) (a b | a b a)*
Simplify:
A = (a b | a b a)+

Summary
Recognition of a string in a regular language: is a string accepted by an NFA?
Conversion of regular expressions to NFAs
Determinization: converting NFA to DFA
Converting an NFA into a regular expression
Other useful closure properties: union, concatenation, Kleene closure, intersection
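As a sanity check on the NFA to RegExp derivation, the NFA described by the equations can be compared with the final expression on all short strings. The sketch below assumes that the equations encode the edges A -a-> B, B -b-> D, B -b-> C, C -a-> D and D -a-> B, with D accepting (from D = a B | ε), and it uses Python's `re` module for the regexp side; the names `EDGES`, `nfa_accepts` and `agree_up_to` are hypothetical.

```python
import re
from itertools import product

# Edges read off the equations; D is accepting because D = a B | ε.
EDGES = {
    ("A", "a"): {"B"},
    ("B", "b"): {"C", "D"},
    ("C", "a"): {"D"},
    ("D", "a"): {"B"},
}
FINALS = {"D"}

def nfa_accepts(text, start="A"):
    """Simulate the ε-free NFA by tracking the set of possible states."""
    current = {start}
    for c in text:
        current = {r for q in current for r in EDGES.get((q, c), ())}
    return bool(current & FINALS)

def agree_up_to(max_len=10):
    """True if the NFA and (ab|aba)+ agree on every string of length <= max_len."""
    pattern = re.compile(r"(ab|aba)+")
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if nfa_accepts(s) != bool(pattern.fullmatch(s)):
                return False
    return True
```

Agreement on all strings up to a fixed length does not prove the two are equivalent, but a mistake in any substitution or factoring step would very likely show up as a mismatch on some short string.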