Regular Languages and Regular Expressions

Similar documents
Formal Languages and Automata

Chapter Seven: Regular Expressions

ECS 120 Lesson 7 Regular Expressions, Pt. 1

CSE 105 THEORY OF COMPUTATION

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Dr. D.M. Akbar Hussain

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

NFAs and Myhill-Nerode. CS154 Chris Pollett Feb. 22, 2006.

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

CSE 105 THEORY OF COMPUTATION

CSE 431S Scanning. Washington University Spring 2013

Learn Smart and Grow with world

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Name: Finite Automata

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Automating Construction of Lexers

TOPIC PAGE NO. UNIT-I FINITE AUTOMATA

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

UNION-FREE DECOMPOSITION OF REGULAR LANGUAGES

Formal Languages and Compilers Lecture VI: Lexical Analysis

1.3 Functions and Equivalence Relations 1.4 Languages

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

LECTURE NOTES THEORY OF COMPUTATION

Lexical Analysis. Implementation: Finite Automata

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Zhizheng Zhang. Southeast University

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

LECTURE NOTES THEORY OF COMPUTATION

lec3:nondeterministic finite state automata

CMSC 132: Object-Oriented Programming II

Answer All Questions. All Questions Carry Equal Marks. Time: 20 Min. Marks: 10.

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Converting a DFA to a Regular Expression JP

Lexical Analysis. Chapter 2

Lexical Analysis. Prof. James L. Frankel Harvard University

Ambiguous Grammars and Compactification

Regular Expressions & Automata

Implementation of Lexical Analysis

CT32 COMPUTER NETWORKS DEC 2015


Lexical Analysis. Lecture 3-4

2. Lexical Analysis! Prof. O. Nierstrasz!

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Implementation of Lexical Analysis

Limits of Computation p.1/?? Limits of Computation p.2/??

Multiple Choice Questions

Compiler Construction

UNIT -2 LEXICAL ANALYSIS

CS308 Compiler Principles Lexical Analyzer Li Jiang

Front End: Lexical Analysis. The Structure of a Compiler

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Languages and Compilers

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Lexical Analysis. Lecture 2-4

CMPSCI 250: Introduction to Computation. Lecture 20: Deterministic and Nondeterministic Finite Automata David Mix Barrington 16 April 2013

QUESTION BANK. Unit 1. Introduction to Finite Automata

Compiler Construction LECTURE # 3

Implementation of Lexical Analysis

FAdo: Interactive Tools for Learning Formal Computational Models

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Decision Properties of RLs & Automaton Minimization

(Refer Slide Time: 0:19)

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014

ECS 120 Lesson 16 Turing Machines, Pt. 2

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

University of Nevada, Las Vegas Computer Science 456/656 Fall 2016

CMPE 4003 Formal Languages & Automata Theory Project Due May 15, 2010.

Lexical Analysis - 2

Skyup's Media. PART-B 2) Construct a Mealy machine which is equivalent to the Moore machine given in table.

Theory of Computation

Lecture 2 Finite Automata

Lexical Analysis (ASU Ch 3, Fig 3.1)

Regular Expressions. Lecture 10 Sections Robb T. Koether. Hampden-Sydney College. Wed, Sep 14, 2016

Decision, Computation and Language

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles

Optimizing Finite Automata

Closure Properties of CFLs; Introducing TMs. CS154 Chris Pollett Apr 9, 2007.

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

CS 310: State Transition Diagrams

Formal Languages. Formal Languages

Automata Theory CS S-FR Final Review

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2

Lexical Analyzer Scanner

Final Course Review. Reading: Chapters 1-9

Midterm I (Solutions) CS164, Spring 2002

Regular Expressions. A Language for Specifying Languages. Thursday, September 20, 2007 Reading: Stoughton 3.1

Lexical Analyzer Scanner

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CS 181 B&C EXAM #1 NAME. You have 90 minutes to complete this exam. You may assume without proof any statement proved in class.

Introduction to Lexical Analysis

Regular Expression Constrained Sequence Alignment

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

A Characterization of the Chomsky Hierarchy by String Turing Machines

Transcription:

Regular Languages and Regular Expressions According to our definition, a language is regular if there exists a finite state automaton that accepts it. Therefore every regular language can be described by some nfa or dfa. Regular expressions: One way of describing regular languages is via the notation of regular expressions. This notation involves a combination of strings of symbols from some alphabet, parentheses and the operators +,., and *. If is an alphabet, the regular expression describes the language consisting of all strings of length 1 over this alphabet and * describes the language consisting of all strings over that alphabet. Formal Definition of a Regular Expression: Let be a given alphabet, then:,, and a are all regular expressions. These are called primitive regular expressions. if R1 and R2 are regular expressions, then o (R1 R2) or R1 + R2, o R1.R2, o R1* and o (R1) are also regular expressions A string is a regular expression iff it can be derived from primitive regular expressions by a finite number of applications of the rules in 2 In item 1 the regular expressions a and represent the languages {a}, and {}, respectively; the regular expression represents the empty language.

In items 2, the expressions represent the languages obtained by taking the union, the concatenation of the languages R1 and R2 or the star of the language R1, respectively. Regular expressions are useful tools in the design of compilers for programming languages. Languages associated with Regular Expressions: Regular expressions can be used to describe some simple languages. If R is a regular expression, we will let L(R) denote the language associated with R. 1. is a regular expression denoting the empty set 2. is a regular expression denoting {} 3. For every a, a is a regular expression denoting {a}, if r1 and r2 are regular expressions, then: a. L(r1 + r2) = L(r1) L(r2) b. L(r1.r2) = L(r1)L(r2) c. L((r1))=L(r1) d. L(r1*)=(L(r1))* Precedence of Regular Expressions Operators: Like any algebra, the regular-expression operators have an assumed order of precedence, which means that operators are associated with the operands in a particular order. The following is the order of precedence: 1. The star operator is of the highest precedence. 2. Next comes the dot or concatenation operator 3. finally all the unions (+) are grouped with their operands. The expression 01* +1 is grouped (0(1)*) +1 Write the language L(a*.(a+b)) in set notation L(a*.(a+b)) = L(a*)L(a+b) = (L(a))*( L(a) L(b)) = {, a,aa,aaa,.}{a,b}

= {a, aa,aaa,, b, ab, aab,aaab,..} Write a regular expression for the set of strings that consists of alternating 0 s and 1 s. Let us first develop a regular expression for the language consisting of a single string 01. We can then use the star operator to get an expression of all strings of the form 0101 01 The basis rule for RE tells us that 0 and 1 are expressions denoting the languages {0} and {1}, respectively. If we concatenate the two expressions, we get a regular expression for the language {01}; this is the expression 01. Now, to get all strings consisting of zero or more occurrences of 01, we use the regular expression (01)* and get L(01)*. However this is not exactly what we want, since it only includes strings beginning with 0. To account for strings that start with 1 and strings that end with 0 or with 1: (01)* +(10)*+ 0 (10)*+ 1(01)* Regular expressions and Regular Languages The class R of regular languages over an alphabet, has the following properties: i) The emptyset R, R and if a, then {a} R (ii) If s2 and s2 R, then s1 s2, s1.s2, s1* R (iii) Only sets formed using (i) and (ii) R Let S and T be subsets of *, if S= T* then S is generated by T. If S is generated by T, then every element of S is the finite concatenation of elements of T. The elements in the concatenation forming a given word are not necessarily unique. Example:

= {a, b}, * consists of the empty word and all possible finite strings of the symbols a and b. T ={a, ab, b}, then *= T *. However, although every string in * can be expressed uniquely as the concatenation of elements of, this is not true of T, since the expression abab can be expressed as: (a).(b)(ab), (a).(b).(a).(b) and (ab).(ab) Let S and C be subsets of *. if S=C*, then every string in S can be expressed as the concatenation of elements of in C and we say that C is a code. A code C is uniquely decipherable if every string in S can be uniquely expressed as the concatenation of elements of C. Is C ={a,b,c,de} uniquely decipherable? same for C= {ba, ab,c}, C = {a, ab, bc, c} Let be an alphabet. A code C subset of * is called a prefix code if for all words u, v, in C, if u=vw for w in *, then u=v or u=. This means that one word in a code cannot be the beginning of another word in the same code. Equivalence with finite automata: Regular expressions and finite automata are equivalent in their descriptive power. That is any regular expression can be converted into a finite automaton that recognizes the language and vice versa. Theorem:

A language is regular if and only if some regular expression describes it. Lemma: If a language is described by a regular expression, then it is regular. Proof: We consider the six cases in the formal definition of regular expressions and build an nfa that accepts them based on the rules below: 1 2 3 a q0 q1 q0 q1 q0 q1 M(r) M(r1) 5 4 1-nfa accepts 2-nfa accepts {} 3-nfa accepts {a} 4-nfa accepts L(r) M(r2) 5 nfa for L(r1 + r2) 6 nfa for L(r1.r2) 7-L(r1*) M(r1) M(r2) 7 6 Lemma:

If a language is regular, then it is described by a regular expression Algorithm for converting a DFA into an equivalent regular expression: 1) Add two states to the DFA: a start state called s and an accept state called a. This defines a new automaton called G CONVERT (G): Let k be the number of states of G If k=2, then G must consist of a start state, an accept state and a single arrow connecting them and labeled with a regular expression R. Return R. If k >2, we select any state q rip Q different from s and a and let G = (Q,, δ,s, a), where: Q = Q-{q rip } and for any q i Q -{a} and any q j Q -{s} let: δ (q i, q j ) = (R1)(R2)*(R3) R4 for R1= δ(q i, q rip ), R2= δ( q rip, q rip ), R3= δ( q rip, q j ) and R4= δ(q i, q j ) Compute CONVERT(G ) and return this value Simplification Rules: Let r be the transition symbol from q i to q j, the following simplification rules apply: 1. r + = r 2. r. = 3. *=