Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

Regular Languages
A regular language L is described by a regular expression and accepted by a finite state machine.

Regular Expressions
The regular expressions over an alphabet Σ are all and only the strings that can be obtained as follows:
1. ∅ is a regular expression.
2. ε is a regular expression.
3. Every element of Σ is a regular expression.
4. If α, β are regular expressions, then so is αβ.
5. If α, β are regular expressions, then so is α ∪ β (also written α + β).
6. If α is a regular expression, then so is α*.
7. If α is a regular expression, then so is α⁺.
8. If α is a regular expression, then so is (α).

Regular Expressions: Examples
If Σ = {a, b}, the following are regular expressions: a, (a ∪ b)*, abba

Regular Expressions Define Languages
Define L, a semantic interpretation function for regular expressions:
1. L(∅) = ∅.
2. L(ε) = {ε}.
3. L(c), where c ∈ Σ, = {c}.
4. L(αβ) = L(α) L(β).
5. L(α ∪ β) = L(α) ∪ L(β).
6. L(α*) = (L(α))*.
7. L(α⁺) = L(αα*) = L(α) (L(α))*. If L(α) is equal to ∅, then L(α⁺) is also equal to ∅. Otherwise L(α⁺) is the language that is formed by concatenating together one or more strings drawn from L(α).
8. L((α)) = L(α).
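The eight rules above can be turned into a small evaluator. The following is a minimal sketch, not part of the slides: the tuple encoding of expressions and the function name `lang` are assumptions made for illustration, and because L(α*) is usually infinite the function only returns the strings of length at most n.

```python
# Hypothetical encoding (an assumption, not from the slides):
#   ("empty",)  ("eps",)  ("sym", c)  ("cat", r, s)  ("union", r, s)  ("star", r)  ("plus", r)
# Rule 8, parentheses, needs no node of its own: grouping is implicit in the nesting.

def lang(r, n):
    """Return the set of strings in L(r) whose length is at most n."""
    kind = r[0]
    if kind == "empty":                      # rule 1: L(∅) = ∅
        return set()
    if kind == "eps":                        # rule 2: L(ε) = {ε}
        return {""}
    if kind == "sym":                        # rule 3: L(c) = {c}
        return {r[1]} if n >= 1 else set()
    if kind == "cat":                        # rule 4: L(αβ) = L(α) L(β)
        return {x + y
                for x in lang(r[1], n)
                for y in lang(r[2], n)
                if len(x) + len(y) <= n}
    if kind == "union":                      # rule 5: L(α ∪ β) = L(α) ∪ L(β)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "star":                       # rule 6: L(α*) = (L(α))*
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:                      # keep concatenating until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= n} - result
            result |= frontier
        return result
    if kind == "plus":                       # rule 7: L(α⁺) = L(αα*)
        return lang(("cat", r[1], ("star", r[1])), n)
    raise ValueError(f"unknown node kind: {kind}")
```

For example, with `a = ("sym", "a")` and `b = ("sym", "b")`, evaluating `lang(("cat", ("star", ("union", a, b)), b), 2)` yields the strings of length at most 2 that end in b.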

Regular Sets
Definition:
1. Basis: ∅, {ε} and {a}, for every a ∈ Σ, are regular sets over Σ.
2. Recursive step: Let X and Y be regular sets over Σ. Then the sets X ∪ Y, XY, and X* are regular sets over Σ.
3. Closure: X is a regular set over Σ only if it can be obtained from the basis elements by a finite number of applications of the recursive step.
Examples: Let Σ = {0, 1}. The following are some regular sets over Σ: ∅, {ε}, {0}, {1}, {00, 11}, {0}*, {0, 1}*

Analyzing a Regular Expression
L((a ∪ b)*b) = L((a ∪ b)*) L(b)
= (L((a ∪ b)))* L(b)
= (L(a) ∪ L(b))* L(b)
= ({a} ∪ {b})* {b}
= {a, b}* {b}

Constructing Regular Expressions
L = {w ∈ {a, b}*: |w| is even}
((a ∪ b)(a ∪ b))*
(aa ∪ ab ∪ ba ∪ bb)*
L = {w ∈ {a, b}*: w contains an odd number of a's}
b* (ab*ab*)* a b*
b* a b* (ab*ab*)*
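One way to gain confidence in expressions like these is to compare them with the defining property on every short string. The sketch below uses Python's `re` module, where union is written `|`; the helper names `strings` and `agrees` and the length bound are assumptions chosen for illustration, and a bounded check is evidence rather than a proof.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agrees(pattern, predicate, alphabet="ab", max_len=8):
    """True if `pattern` matches exactly the strings satisfying `predicate` (up to max_len)."""
    return all((re.fullmatch(pattern, w) is not None) == predicate(w)
               for w in strings(alphabet, max_len))

# The two expressions for even-length strings.
EVEN_1 = r"((a|b)(a|b))*"
EVEN_2 = r"(aa|ab|ba|bb)*"

# The two expressions for an odd number of a's.
ODD_A_1 = r"b*(ab*ab*)*ab*"
ODD_A_2 = r"b*ab*(ab*ab*)*"

def is_even_length(w):
    return len(w) % 2 == 0

def has_odd_as(w):
    return w.count("a") % 2 == 1
```

Calling `agrees(EVEN_1, is_even_length)` or `agrees(ODD_A_2, has_odd_as)` checks one expression against its specification for all strings up to the chosen length.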

Analyzing a Regular Expression
a* ∪ b* ≠ (a ∪ b)*
(ab)* ≠ a*b*

Operator Precedence in Regular Expressions
            Regular Expressions    Arithmetic Expressions
Highest     Kleene star            exponentiation
            concatenation          multiplication
Lowest      union                  addition
Example     a b* ∪ c d*            x y² + i j²
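Python's `re` module follows the same precedence, with `|` for union, so the rule can be checked directly. This is a small sketch; the names `IMPLICIT`, `EXPLICIT`, and `matches` are illustrative assumptions.

```python
import re

# Star binds tightest, then concatenation, then union.
IMPLICIT = re.compile(r"ab*|cd*")
# The same expression with every grouping written out.
EXPLICIT = re.compile(r"(a(b*))|(c(d*))")

def matches(pattern, w):
    """True if the whole string w is in the language of the compiled pattern."""
    return pattern.fullmatch(w) is not None
```

Under this reading "abbb" and "cdd" are accepted, while "abab" is rejected because the star applies only to b, and "abcd" is rejected because union splits the expression into two separate alternatives.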

Kleene's Theorem
Finite state machines and regular expressions define the same class of languages. To prove this, we must show:
Theorem: Any language that can be defined with a regular expression can be accepted by some FSM and so is regular.
Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression.

For Every Regular Expression There is a Corresponding FSM
Proof by construction: for each regular expression α, construct an FSM that accepts L(α). The slides give one FSM diagram per case:
1. α = ∅.
2. α is a single element of Σ.
3. α = ε.
4. α is the regular expression β ∪ γ, and both L(β) and L(γ) are regular.
5. α is the regular expression βγ, and both L(β) and L(γ) are regular.
6. α is the regular expression β*, and L(β) is regular.

For Every Regular Expression There is a Corresponding FSM
Example: (b ∪ ab)*
1. Build an FSM for b, an FSM for a, and an FSM for b.
2. Combine the last two into an FSM for ab.
3. Combine the FSMs for b and ab into an FSM for (b ∪ ab).
4. Build the FSM for (b ∪ ab)* from the FSM for (b ∪ ab).
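The construction for this example can be sketched in code. The version below is a Thompson-style variant with one start state and one accepting state per machine and ε-transitions stored as label `None`; it is an assumption made for illustration and may differ in detail from the diagrams in the slides. All names (`NFA`, `symbol`, `union`, `concat`, `star`, `accepts`) are hypothetical.

```python
from itertools import count

_fresh = count()  # source of fresh state numbers

class NFA:
    """An NFA with one start state, one accepting state, and edges (src, label, dst)."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(c):
    """FSM for a single element c of the alphabet."""
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, [(s, c, f)])

def union(m1, m2):
    """FSM for β ∪ γ: a new start state branches to both machines via ε."""
    s, f = next(_fresh), next(_fresh)
    extra = [(s, None, m1.start), (s, None, m2.start),
             (m1.accept, None, f), (m2.accept, None, f)]
    return NFA(s, f, m1.edges + m2.edges + extra)

def concat(m1, m2):
    """FSM for βγ: ε-transition from the end of the first machine to the start of the second."""
    return NFA(m1.start, m2.accept,
               m1.edges + m2.edges + [(m1.accept, None, m2.start)])

def star(m):
    """FSM for β*: a new state that is both start and accepting, looping through m."""
    s = next(_fresh)
    return NFA(s, s, m.edges + [(s, None, m.start), (m.accept, None, s)])

def eps_closure(states, edges):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, w):
    """Simulate the NFA on w by tracking the set of currently possible states."""
    current = eps_closure({m.start}, m.edges)
    for c in w:
        moved = {dst for src, label, dst in m.edges
                 if src in current and label == c}
        current = eps_closure(moved, m.edges)
    return m.accept in current

# (b ∪ ab)* assembled in the same order as the example.
machine = star(union(symbol("b"), concat(symbol("a"), symbol("b"))))
```

With this machine, `accepts(machine, "bab")` holds, while `accepts(machine, "ba")` does not, since every a must be followed by a b.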

The Algorithm fsmtoregexheuristic
fsmtoregexheuristic(M: FSM) =
1. Remove unreachable states from M.
2. If M has no accepting states then return ∅.
3. If the start state of M is part of a loop, create a new start state s and connect s to M's start state via an ε-transition.
4. If there is more than one accepting state of M or there are any transitions out of any of them, create a new accepting state and connect each of M's accepting states to it via an ε-transition. The old accepting states no longer accept.
5. If M has only one state then return ε.
6. Until only the start state and the accepting state remain do:
   6.1 Select rip (not s or an accepting state).
   6.2 Remove rip from M.
   6.3 Modify the transitions among the remaining states so M accepts the same strings.
7. Return the regular expression that labels the one remaining transition from the start state to the accepting state.

The Algorithm fsmtoregexheuristic: Example
1. Create a new initial state and a new, unique accepting state, neither of which is part of a loop.
2. Remove states and arcs and replace them with arcs labelled with larger and larger regular expressions: remove state 3, then state 2, then state 1.
[FSM diagrams for each step of this example]

Further Modifications to M
We require that, from every state other than the accepting state, there must be exactly one transition to every state (including itself) except the start state. And into every state other than the start state there must be exactly one transition from every state (including itself) except the accepting state.
1. If there is more than one transition between states p and q, collapse them into a single transition labelled with the union of their labels.
2. If any of the required transitions are missing, add them with label ∅.
3. Choose a state. Rip it out. Restore functionality.

Defining R(p, q) (The Algorithm fsmtoregex)
After removing rip, the new regular expression that should label the transition from p to q is:
R(p, q)          /* Go directly from p to q */
∪                /* or */
R(p, rip)        /* Go from p to rip, then */
R(rip, rip)*     /* Go from rip back to itself any number of times, then */
R(rip, q)        /* Go from rip to q */
Without the comments, we have: R′ = R(p, q) ∪ R(p, rip) R(rip, rip)* R(rip, q)

The Algorithm fsmtoregex
fsmtoregex(M: FSM) =
1. M′ = standardize(M: FSM).
2. Return buildregex(M′).

standardize(M: FSM) =
1. Remove unreachable states from M.
2. If necessary, create a new start state.
3. If necessary, create a new accepting state.
4. If there is more than one transition between states p and q, collapse them.
5. If any transitions are missing, create them with label ∅.

The Algorithm fsmtoregex
buildregex(M: FSM) =
1. If M has no accepting states then return ∅.
2. If M has only one state, then return ε.
3. Until only the start and accepting states remain do:
   3.1 Select some state rip of M.
   3.2 For every transition from p to q, if both p and q are not rip then do:
       Compute the new label R′ for the transition from p to q:
       R′(p, q) = R(p, q) ∪ R(p, rip) R(rip, rip)* R(rip, q)
   3.3 Remove rip and all transitions into and out of it.
4. Return the regular expression that labels the transition from the start state to the accepting state.
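The steps of standardize and buildregex can be sketched as follows. This is an illustrative version under several assumptions: the input is a DFSM given as a dictionary from (state, symbol) to state, ∅ is represented by `None` and ε by the empty string, the output uses Python `re` syntax (`|` for union), and the sketch always adds a new start and accepting state rather than doing so only when necessary. Unreachable states are not removed first, since eliminating them contributes nothing to the result. The function names are hypothetical.

```python
import re

def union(r, s):
    """R ∪ S, with None standing for ∅ and "" for ε."""
    if r is None:
        return s
    if s is None:
        return r
    if r == s:
        return r
    if r == "":
        return f"(?:{s})?"
    if s == "":
        return f"(?:{r})?"
    return f"{r}|{s}"

def concat(r, s):
    """RS: ∅ is a zero and ε an identity for concatenation."""
    if r is None or s is None:
        return None
    if r == "":
        return s
    if s == "":
        return r
    return f"(?:{r})(?:{s})"

def star(r):
    """R*: both ∅* and ε* equal ε."""
    if r is None or r == "":
        return ""
    return f"(?:{r})*"

def fsm_to_regex(states, start, accepting, delta):
    """Return a regex for the DFSM, or None if its language is empty."""
    new_start, new_accept = object(), object()        # fresh states
    R = {}
    for (p, c), q in delta.items():                   # collapse parallel transitions
        R[(p, q)] = union(R.get((p, q)), re.escape(c))
    R[(new_start, start)] = ""                        # ε-transition into the old start
    for a in accepting:
        R[(a, new_accept)] = ""                       # ε-transitions out of old accepting states
    live = list(states)
    while live:
        rip = live.pop()                              # select some state rip
        loop = star(R.get((rip, rip)))
        for p in [new_start] + live:
            for q in live + [new_accept]:
                # R'(p, q) = R(p, q) ∪ R(p, rip) R(rip, rip)* R(rip, q)
                via = concat(concat(R.get((p, rip)), loop), R.get((rip, q)))
                R[(p, q)] = union(R.get((p, q)), via)
        # remove rip and all transitions into and out of it
        R = {(p, q): r for (p, q), r in R.items() if rip not in (p, q)}
    return R.get((new_start, new_accept))
```

Missing transitions are looked up with `R.get`, which returns `None`, so they behave as if labelled ∅ without being stored explicitly. The order in which states are ripped out affects the size of the resulting expression but not the language it defines.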

Regular Expression Construction
Construction of a regular expression from an FSM:
R_ij^(k): regular expression representing the set of labels of all paths from state i to state j going through intermediate states {1, 2, 3, …, k} only (defined recursively).
R_1f^(n): regular expression representing the strings accepted at state f, where f is in F, state 1 is the start state, and n is the number of states.
The equivalent regular expression is the union of all R_1f^(n).
Construction:
BASE: R_ij^(0) = a_1 + a_2 + … + a_k where i ≠ j and the a_m are the labels of arcs from state i to state j.
R_ij^(0) = ε + a_1 + a_2 + … + a_k where i = j and the a_m are the labels of arcs from state i to state j.
INDUCTION: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
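The base case and induction step translate directly into a loop over k. The sketch below assumes states are numbered 1..n with state 1 as the start state, represents the empty set by `None` and ε by the empty string, and emits Python `re` syntax (`|` where the slide writes +). The function names and the `arcs` dictionary format are assumptions for illustration.

```python
import re

def union(r, s):
    """R + S, with None standing for the empty set and "" for ε."""
    if r is None:
        return s
    if s is None:
        return r
    if r == s:
        return r
    if r == "":
        return f"(?:{s})?"
    if s == "":
        return f"(?:{r})?"
    return f"{r}|{s}"

def concat(r, s):
    """RS, where the empty set is a zero and ε is an identity."""
    if r is None or s is None:
        return None
    if r == "":
        return s
    if s == "":
        return r
    return f"(?:{r})(?:{s})"

def star(r):
    """R*; the star of the empty set and of ε are both ε."""
    if r is None or r == "":
        return ""
    return f"(?:{r})*"

def kleene_regex(n, accepting, arcs):
    """arcs maps (i, j) to the list of symbols labelling arcs from state i to state j."""
    indices = range(1, n + 1)
    # BASE: R_ij^(0) is the union of the arc labels, plus ε when i = j.
    R = {}
    for i in indices:
        for j in indices:
            r = "" if i == j else None
            for a in arcs.get((i, j), []):
                r = union(r, re.escape(a))
            R[(i, j)] = r
    # INDUCTION: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
    for k in indices:
        R = {(i, j): union(R[(i, j)],
                           concat(concat(R[(i, k)], star(R[(k, k)])), R[(k, j)]))
             for i in indices for j in indices}
    # The answer is the union of R_1f^(n) over all accepting states f.
    result = None
    for f in accepting:
        result = union(result, R[(1, f)])
    return result
```

The expressions produced this way grow quickly with n because no simplification beyond the ∅ and ε cases is attempted, so the sketch is best tried on machines with a handful of states.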

Example: FSM to Regular Expressions
R′ = R(p, q) ∪ R(p, rip) R(rip, rip)* R(rip, q)
Let rip be state 2. Then:
R′(1, 3) = R(1, 3) ∪ R(1, rip) R(rip, rip)* R(rip, 3)
= R(1, 3) ∪ R(1, 2) R(2, 2)* R(2, 3)
= ∅ ∪ a b* a
= ab*a

Simplifying Regular Expressions
Regexes describe sets:
- Union is commutative: α ∪ β = β ∪ α.
- Union is associative: (α ∪ β) ∪ γ = α ∪ (β ∪ γ).
- ∅ is the identity for union: α ∪ ∅ = ∅ ∪ α = α.
- Union is idempotent: α ∪ α = α.
Concatenation:
- Concatenation is associative: (αβ)γ = α(βγ).
- ε is the identity for concatenation: αε = εα = α.
- ∅ is a zero for concatenation: α∅ = ∅α = ∅.
Concatenation distributes over union:
- (α ∪ β)γ = (αγ) ∪ (βγ).
- γ(α ∪ β) = (γα) ∪ (γβ).
Kleene star:
- ∅* = ε.
- ε* = ε.
- (α*)* = α*.
- α*α* = α*.
- (α ∪ β)* = (α*β*)*.
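Individual instances of these identities can be spot-checked by comparing the two sides on every short string. The sketch below substitutes the single symbols a, b, c for α, β, γ and uses Python `re` syntax (`|` for union); the names `language`, `same_language`, and `IDENTITIES` are assumptions, and agreement up to a length bound illustrates an identity without proving it.

```python
import re
from itertools import product

def language(pattern, alphabet="abc", max_len=5):
    """The set of strings of length <= max_len that the pattern matches in full."""
    compiled = re.compile(pattern)
    return {"".join(t)
            for n in range(max_len + 1)
            for t in product(alphabet, repeat=n)
            if compiled.fullmatch("".join(t))}

def same_language(left, right):
    """True if both patterns accept exactly the same short strings."""
    return language(left) == language(right)

# Each pair is one instance of an identity, with α = a, β = b, γ = c.
IDENTITIES = [
    ("a|b", "b|a"),              # union is commutative
    ("(a|b)|c", "a|(b|c)"),      # union is associative
    ("a|a", "a"),                # union is idempotent
    ("(ab)c", "a(bc)"),          # concatenation is associative
    ("(a|b)c", "(ac)|(bc)"),     # right distribution over union
    ("c(a|b)", "(ca)|(cb)"),     # left distribution over union
    ("(a*)*", "a*"),             # star of a star
    ("a*a*", "a*"),              # concatenated stars
    ("(a|b)*", "(a*b*)*"),       # star of a union
]
```

Running `all(same_language(l, r) for l, r in IDENTITIES)` checks every listed pair. The identities involving ∅ are omitted here because Python's `re` syntax has no direct symbol for the empty language.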

Applications of Regular Expressions
- UNIX
- Keyword search
- Given a protein or DNA sequence, find others that are likely to be evolutionarily close to it

Using Regular Expressions in the Real World
- Matching numbers
- Matching IP addresses
- Finding doubled words
- Identifying spam
- Trawling for email addresses
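A few of these uses can be sketched with Python's `re` module. The patterns below are illustrative assumptions rather than the ones from the original slides, and each makes simplifying choices: numbers are plain signed decimals, and IP addresses are IPv4 dotted quads. The doubled-word pattern relies on the backreference `\1`, a practical extension that goes beyond what regular expressions in the formal sense can express.

```python
import re

# Matching numbers: optional sign, digits, optional fractional part.
NUMBER = re.compile(r"[-+]?\d+(\.\d+)?")

# Matching IP addresses: four decimal fields in the range 0..255, separated by dots.
OCTET = r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"{OCTET}(\.{OCTET}){{3}}")

# Finding doubled words: a word, whitespace, then the same word again.
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_doubled_words(text):
    """Return each word that appears twice in a row in `text`."""
    return [m.group(1) for m in DOUBLED.finditer(text)]
```

For instance, `IPV4.fullmatch("192.168.0.1")` succeeds while `IPV4.fullmatch("256.1.1.1")` returns `None`, and `find_doubled_words("this is is a test")` reports the repeated word.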