Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Similar documents
Chapter Seven: Regular Expressions

CSE 105 THEORY OF COMPUTATION

Regular Languages and Regular Expressions

ECS 120 Lesson 7 Regular Expressions, Pt. 1

CSE 431S Scanning. Washington University Spring 2013

CSE 105 THEORY OF COMPUTATION

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

Lexical Analysis. Prof. James L. Frankel Harvard University

Formal Languages and Automata

Languages and Strings. Chapter 2

Regular Expressions. Chapter 6

Ambiguous Grammars and Compactification

Regular Expressions & Automata

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Finite Automata Part Three

Welcome! CSC445 Models Of Computation Dr. Lutz Hamel Tyler 251

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Learn Smart and Grow with world

Lecture 6,

CMSC 132: Object-Oriented Programming II

Regular Expressions. Regular Expressions. Regular Languages. Specifying Languages. Regular Expressions. Kleene Star Operation

TOPIC PAGE NO. UNIT-I FINITE AUTOMATA

Lexical Analysis. Chapter 2

8 ε. Figure 1: An NFA-ǫ

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Decision, Computation and Language

1 Finite Representations of Languages

Actually talking about Turing machines this time

Name: Finite Automata

Notes for Comp 454 Week 2

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014

Lexical Analysis. Lecture 3-4

CMPSCI 250: Introduction to Computation. Lecture 20: Deterministic and Nondeterministic Finite Automata David Mix Barrington 16 April 2013

DVA337 HT17 - LECTURE 4. Languages and regular expressions

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Proof Techniques Alphabets, Strings, and Languages. Foundations of Computer Science Theory

Chapter Eight: Regular Expression Applications. Formal Language, chapter 8, slide 1

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Lexical Analysis. Lecture 2-4

Alphabets, strings and formal. An introduction to information representation

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

(Refer Slide Time: 0:19)

Implementation of Lexical Analysis. Lecture 4

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

Languages and Compilers

Integers and Mathematical Induction

Languages and Finite Automata

An Interesting Way to Combine Numbers

LECTURE NOTES THEORY OF COMPUTATION

6 NFA and Regular Expressions

Slides for Faculty Oxford University Press All rights reserved.

COMP Logic for Computer Scientists. Lecture 25

LECTURE NOTES THEORY OF COMPUTATION

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Front End: Lexical Analysis. The Structure of a Compiler

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010

CS308 Compiler Principles Lexical Analyzer Li Jiang

Automata Theory CS S-FR Final Review

CIT3130: Theory of Computation. Regular languages

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018

(a) R=01[((10)*+111)*+0]*1 (b) ((01+10)*00)*. [8+8] 4. (a) Find the left most and right most derivations for the word abba in the grammar

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Regular Expressions. A Language for Specifying Languages. Thursday, September 20, 2007 Reading: Stoughton 3.1

CHAPTER TWO LANGUAGES. Dr Zalmiyah Zakaria

Regular Expressions. Lecture 10 Sections Robb T. Koether. Hampden-Sydney College. Wed, Sep 14, 2016

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Theory of Programming Languages COMP360

Compilers CS S-01 Compiler Basics & Lexical Analysis

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

I have read and understand all of the instructions below, and I will obey the Academic Honor Code.

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

A Small Interpreted Language

B The SLLGEN Parsing System

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Zhizheng Zhang. Southeast University

JNTUWORLD. Code No: R

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

14.1 Encoding for different models of computation

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

University of Nevada, Las Vegas Computer Science 456/656 Fall 2016

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

A Formal Study of Practical Regular Expressions

Multiple Choice Questions

Power Set of a set and Relations

Structure of Programming Languages Lecture 3

Finite Automata Part Three

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

Transcription:

Chapter Seven: Regular Expressions Formal Language, chapter 7, slide

The first time a young student sees the mathematical constant π, it looks like just one more school artifact: one more arbitrary symbol whose definition to memorize for the next test. Later, if he or she persists, this perception changes. In many branches of mathematics and with many practical applications, π keeps on turning up. "There it is again!" says the student, thus joining the ranks of mathematicians for whom mathematics seems less like an artifact invented and more like a natural phenomenon discovered. So it is with regular languages. We have seen that DFAs and NFAs have equal definitional power. It turns out that regular expressions also have exactly that same definitional power: they can be used to define all the regular languages, and only the regular languages. There it is again! Formal Language, chapter 7, slide 2 2

Outline 7. Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 3 3

Concatenation of Languages The concatenation of two languages L and L 2 is L L 2 = {xy x L and y L 2 } The set of all strings that can be constructed by concatenating a string from the first language with a string from the second For example, if L = {a, b} and L 2 = {c, d} then L L 2 = {ac, ad, bc, bd} Formal Language, chapter 7, slide 4 4

Kleene Closure of a Language The Kleene closure of a language L is L* = {x x 2... x n n, with all x i L} The set of strings that can be formed by concatenating any number of strings, each of which is an element of L Not the same as {x n n and x L} In L*, each x i may be a different element of L For example, {ab, cd}* = {ε, ab, cd, abab, abcd, cdab, cdcd, ababab,...} For all L, ε L* For all L containing at least one string other than ε, L* is infinite Formal Language, chapter 7, slide 5 5

Regular Expressions A regular expression is a string r that denotes a language L(r) over some alphabet Σ Regular expressions make special use of the symbols ε,, +, *, and parentheses We will assume that these special symbols are not included in Σ There are six kinds of regular expressions Formal Language, chapter 7, slide 6 6

The Six Regular Expressions The six kinds of regular expressions, and the languages they denote, are: Three kinds of atomic regular expressions: Any symbol a Σ, with L(a) = {a} The special symbol ε, with L(ε) = {ε} The special symbol, with L( ) = {} Three kinds of compound regular expressions built from smaller regular expressions, here called r, r, and r 2 : (r + r 2 ), with L(r + r 2 ) = L(r ) L(r 2 ) (r r 2 ), with L(r r 2 ) = L(r )L(r 2 ) (r)*, with L((r)*) = (L(r))* The parentheses may be omitted, in which case * has highest precedence and + has lowest Formal Language, chapter 7, slide 7 7

Other Uses of the Name These are classical regular expressions Many modern programs use text patterns also called regular expressions: Tools like awk, sed and grep Languages like Perl, Python, Ruby, and PHP Language libraries like those for Java and the.net languages All slightly different from ours and each other More about them in a later chapter Formal Language, chapter 7, slide 8 8

Outline 7. Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 9 9

ab Denotes the language {ab} Our formal definition permits this because a is an atomic regular expression denoting {a} b is an atomic regular expression denoting {b} Their concatenation (ab) is a compound Unnecessary parentheses can be omitted Thus any string x in Σ* can be used by itself as a regular expression, denoting {x} Formal Language, chapter 7, slide

ab+c Denotes the language {ab,c} We omitted parentheses from the fully parenthesized form ((ab)+c) The inner pair is unnecessary because + has lower precedence than concatenation Thus any finite language can be defined using a regular expression Just list the strings, separated by + Formal Language, chapter 7, slide

ba* Denotes the language {ba n }: the set of strings consisting of b followed by zero or more as Not the same as (ba)*, which denotes {(ba) n } * has higher precedence than concatenation The Kleene star is the only way to define an infinite language using regular expressions Formal Language, chapter 7, slide 2 2

(a+b)* Denotes {a,b}*: the whole language of strings over the alphabet {a,b} The parentheses are necessary here, because * has higher precedence than + a+b* denotes {a} {b}* Reminder: not "zero or more copies " That would be a*+b*, which denotes {a}* {b}* Formal Language, chapter 7, slide 3 3

ab+ε Denotes the language {ab,ε} Occasionally, we need to use the atomic regular expression ε to include ε in the language But it's not needed in (a+b)*+ε, because ε is already part of every Kleene star Formal Language, chapter 7, slide 4 4

Denotes {} There is no other way to denote the empty set with regular expressions That's all you should ever use for It is not useful in compounds: L(r ) = L( r) = {} L(r+ ) = L( +r) = L(r) L( *) = {ε} Formal Language, chapter 7, slide 5 5

More Examples (a+b)(c+d) Denotes {ac, ad, bc, bd} (abc)* Denotes {(abc) n } = {ε, abc, abcabc, } a*b* Denotes {a n b m } = {xy x {a}* and y {b}*} Formal Language, chapter 7, slide 6 6

More Examples (a+b)*aa(a+b)* Denotes {x {a,b}* x contains at least 2 consecutive as} (a+b)*a(a+b)*a(a+b)* Denotes {x {a,b}* x contains at least 2 as} (a*b*)* Denotes {a,b}*, same as the simpler (a+b)* Because L(a*b*) contains both a and b, and that's enough: we already have L((a+b)*) = {a,b}* In general, whenever Σ L(r), then L((r)*) = Σ* Formal Language, chapter 7, slide 7 7

Outline 7. Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 8 8

Regular Expression to NFA Goal: to show that every regular expression defines a regular language Approach: give a way to convert any regular expression to an NFA for the same language Advantage: large NFAs can be composed from smaller ones using ε-transitions Formal Language, chapter 7, slide 9 9

Standard Form To make them easier to compose, our NFAs will all have the same standard form: Exactly one accepting state, not the start state That is, for any regular expression r, we will show how to construct an NFA N with L(N) = L(r), pictured like this: r Formal Language, chapter 7, slide 2 2

Composing Example That form makes composition easy For example, given NFAs for L(r ) and L(r 2 ), we can easily construct one for L(r +r 2 ): r r 2 This new NFA still has our special form Formal Language, chapter 7, slide 2 2

Lemma 7.3 If r is any regular expression, there is some NFA N that has a single accepting state, not the same as the start state, with L(N) = L(r). Proof sketch: There are six kinds of regular expressions We will show how to build a suitable NFA for each kind Formal Language, chapter 7, slide 22 22

Proof Sketch: Atomic Expressions There are three kinds of atomic regular expressions Any symbol a Σ, with L(a) = {a} a a : The special symbol ε, with L(ε) = {ε} : The special symbol, with L( ) = {} : Formal Language, chapter 7, slide 23 23

Proof: Compound Expressions There are three kinds of compound regular expressions: (r + r 2 ), with L(r + r 2 ) = L(r ) L(r 2 ) r (r + r 2 ): r 2 Formal Language, chapter 7, slide 24 24

(r r 2 ), with L(r r 2 ) = L(r ) L(r 2 ) (r r 2 ): (r )*, with L((r )*) = (L(r ))* r r 2 (r )*: r Formal Language, chapter 7, slide 25 25

Outline 7. Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 26 26

Sketchy Proof That proof left out a number of details To make it more rigorous, we would have to Give the 5-tuple form for each NFA Show that it each NFA accepts the right language More fundamentally, we would have to organize the proof as an induction: a structural induction Formal Language, chapter 7, slide 27 27

Structural Induction Induction on a recursively-defined structure Here: the structure of regular expressions Base cases: the bases of the recursive definition Here: the atomic regular expressions Inductive cases: the recursive cases of the definition Here: the compound regular expressions Inductive hypothesis: the assumption that the proof has been done for structurally simpler cases Here: for a compound regular expression r, the assumption that the proof has been done for r's subexpressions Formal Language, chapter 7, slide 28 28

Lemma 7.3, Proof Outline Proof is by induction on the structure of r Base cases: when r is an atomic expression, it has one of these three forms: For each, give NFA N and show L(N) correct Recursive cases: when r is a compound expression, it has one of these three forms: For each, give NFA N, using the NFAs for r's subexpressions as guaranteed by the inductive hypothesis, and show L(N) correct QED Formal Language, chapter 7, slide 29 29

Outline 7. Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 3 3

NFA to Regular Expression There is a way to take any NFA and construct a regular expression for the same language Lemma 7.5: if N is any NFA, there is some regular expression r with L(r) = L(N) A tricky construction, covered in Appendix A For now, just an example of the construction Formal Language, chapter 7, slide 3 3

2 Recall this NFA (which is also a DFA) from chapter 3 L(M) = the set of strings that are binary representation of numbers divisible by 3 We'll construct an equivalent regular expression Not as hard as it looks Ultimately, we want the set of strings that take it from to, passing through any of the other states But we'll start with some easy pieces Formal Language, chapter 7, slide 32 32

2 What is a regular expression for the language of strings that take it from 2 back to 2, any number of times, without passing through or? Formal Language, chapter 7, slide 33 33

2 What is a regular expression for the language of strings that take it from 2 back to 2, any number of times, without passing through or? Easy: * Formal Language, chapter 7, slide 34 34

2 What is a regular expression for the language of strings that take it from 2 back to 2, any number of times, without passing through or? Easy: * Then what is a regular expression for the language of strings that take it from back to, any number of times, without passing through? Formal Language, chapter 7, slide 35 35

2 What is a regular expression for the language of strings that take it from 2 back to 2, any number of times, without passing through or? Easy: * Then what is a regular expression for the language of strings that take it from back to, any number of times, without passing through? That would be (*)*: Go to 2 (the first ) Go from 2 to 2 any number of times (we already got * for that) Go back to (the last ) Repeat any number of times (the outer (..)*) Formal Language, chapter 7, slide 36 36

2 Then what is a regular expression for the language of strings that take it from back to, any number of times, without passing through? That would be (*)* Then what is a regular expression for the language of strings that take it from back to, any number of times? Formal Language, chapter 7, slide 37 37

2 Then what is a regular expression for the language of strings that take it from back to, any number of times, without passing through? That would be (*)* Then what is a regular expression for the language of string that take it from back to, any number of times? That would be ( + (*)*)*: One way to go from to once is with a Another is with a, then (*)*, then a final That makes + (*)* Repeat any number of times (the outer (..)*) Formal Language, chapter 7, slide 38 38

2 So the regular expression is ( + (*)*)* The full construction in Appendix A uses a similar approach, and works on any NFA It defines the regular expression in terms of smaller regular expressions that correspond to restricted paths through the NFA Putting Lemmas 7.3 and 7.5 together, we have... Formal Language, chapter 7, slide 39 39

Theorem 7.5 (Kleene's Theorem) A language is regular if and only if it is L(r) for some regular expression r. Proof: follows from Lemmas 7.3 and 7.5 This makes our third way of defining the regular languages: By DFA By NFA By regular expression These three have equal power for defining languages Formal Language, chapter 7, slide 4 4