Chapter Seven: Regular Expressions

Regular Expressions
We have seen that DFAs and NFAs have equal definitional power.
It turns out that regular expressions also have exactly that same definitional power: they can be used to define all the regular languages, and only the regular languages.

Outline
7.1 Regular Expressions, Formally Defined
7.2 Regular Expression Examples
7.3 For Every Regular Expression, a Regular Language
7.4 Regular Expressions and Structural Induction
7.5 For Every Regular Language, a Regular Expression

Regular Expression
In order to define regular expressions we need two additional operators on languages:
  Concatenation
  Kleene closure

Concatenation of Languages
The concatenation of two languages L₁ and L₂ is L₁L₂ = {xy | x ∈ L₁ and y ∈ L₂}.
This is the set of all strings that can be constructed by concatenating a string from the first language with a string from the second.
For example, if L₁ = {a, b} and L₂ = {c, d}, then L₁L₂ = {ac, ad, bc, bd}.
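The definition of concatenation can be tried out directly on small finite languages. The following is a minimal sketch, assuming languages are represented as Python sets of strings; the function name `concat` is chosen here for illustration only.

```python
def concat(l1, l2):
    """Return L1L2 = {xy | x in L1 and y in L2} for finite languages."""
    return {x + y for x in l1 for y in l2}


L1 = {"a", "b"}
L2 = {"c", "d"}
print(sorted(concat(L1, L2)))  # ['ac', 'ad', 'bc', 'bd']
```

Note that concatenating with the empty language gives the empty language, because there is no y to choose.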

Kleene Closure of a Language
The Kleene closure of a language L is L* = {x₁x₂...xₙ | n ≥ 0, with all xᵢ ∈ L}.
This is the set of strings that can be formed by concatenating any number of strings, each of which is an element of L.
In L*, each xᵢ may be a different element of L.
For example, {ab, cd}* = {ε, ab, cd, abab, abcd, cdab, cdcd, ababab, ...}.
For all L, ε ∈ L*.
For all L containing at least one string other than ε, L* is infinite.
Note: this is very similar to the set of all strings Σ* over an alphabet Σ. In fact, Σ* is sometimes called the Kleene closure of the alphabet.
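Since L* is usually infinite, a program can only list part of it. The sketch below, with the hypothetical helper name `kleene_up_to`, enumerates the members of L* up to a chosen length bound; the bound is an assumption of the example, not part of the definition.

```python
def kleene_up_to(language, max_len):
    """Return the strings of L* whose length is at most max_len."""
    result = {""}      # n = 0 always contributes the empty string
    frontier = {""}    # strings found in the previous round
    while frontier:
        next_frontier = set()
        for prefix in frontier:
            for word in language:
                candidate = prefix + word
                if len(candidate) <= max_len and candidate not in result:
                    next_frontier.add(candidate)
        result |= next_frontier
        frontier = next_frontier
    return result


closure = kleene_up_to({"ab", "cd"}, 4)
print(sorted(closure, key=lambda s: (len(s), s)))
# ['', 'ab', 'cd', 'abab', 'abcd', 'cdab', 'cdcd']
```

The empty string shows up even for the empty language, matching the fact that ε ∈ L* for all L.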

Regular Expressions
A regular expression is a string r that denotes a language L(r) over some alphabet Σ.
Regular expressions make special use of the symbols ε, ∅, +, *, and parentheses.
We will assume that these special symbols are not included in Σ.
There are six kinds of regular expressions.

The Six Regular Expressions
The six kinds of regular expressions, and the languages they denote, are:
Three kinds of atomic regular expressions:
  Any symbol a ∈ Σ, with L(a) = {a}
  The special symbol ε, with L(ε) = {ε}
  The special symbol ∅, with L(∅) = {}
Three kinds of compound regular expressions built from smaller regular expressions, here called r, r₁, and r₂:
  (r₁ + r₂), with L(r₁ + r₂) = L(r₁) ∪ L(r₂)
  (r₁r₂), with L(r₁r₂) = L(r₁)L(r₂)
  (r)*, with L((r)*) = (L(r))*
The parentheses may be omitted, in which case * has highest precedence and + has lowest.
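The six cases translate almost line for line into a recursive function. The sketch below is one possible encoding, not the textbook's: the class names (`Sym`, `Eps`, `Empty`, `Union`, `Concat`, `Star`) and the function `lang` are hypothetical, and because L(r) may be infinite the function only returns the strings of L(r) up to an assumed length bound `max_len`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:      # a single symbol a from the alphabet
    char: str

@dataclass(frozen=True)
class Eps:      # the special symbol epsilon
    pass

@dataclass(frozen=True)
class Empty:    # the special symbol for the empty set
    pass

@dataclass(frozen=True)
class Union:    # (r1 + r2)
    left: object
    right: object

@dataclass(frozen=True)
class Concat:   # (r1 r2)
    left: object
    right: object

@dataclass(frozen=True)
class Star:     # (r)*
    inner: object


def lang(r, max_len):
    """Return the strings of L(r) whose length is at most max_len (max_len >= 0)."""
    if isinstance(r, Sym):
        return {r.char} if max_len >= 1 else set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Union):
        return lang(r.left, max_len) | lang(r.right, max_len)
    if isinstance(r, Concat):
        return {x + y
                for x in lang(r.left, max_len)
                for y in lang(r.right, max_len)
                if len(x + y) <= max_len}
    if isinstance(r, Star):
        base = lang(r.inner, max_len)
        result, frontier = {""}, {""}
        while frontier:  # keep appending pieces until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


ab_or_c = Union(Concat(Sym("a"), Sym("b")), Sym("c"))   # ab+c
print(sorted(lang(ab_or_c, 3)))                          # ['ab', 'c']
print(lang(Star(Empty()), 3))                            # {''}
```

Each branch corresponds to one of the six kinds, so the function is also a small illustration of structural recursion over regular expressions.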

ab
Denotes the language {ab}.
Our formal definition permits this because:
  a is an atomic regular expression denoting {a}
  b is an atomic regular expression denoting {b}
  Their concatenation (ab) is a compound regular expression
Unnecessary parentheses can be omitted.
Thus any string x in Σ* can be used by itself as a regular expression, denoting {x}.

ab+c
Denotes the language {ab, c}.
We omitted parentheses from the fully parenthesized form ((ab)+c).
The inner pair is unnecessary because + has lower precedence than concatenation.
Thus any finite language can be defined using a regular expression: just list the strings, separated by +.
Hint: when you see a + in a regular expression, read it as "or".

ba*
Denotes the language {baⁿ | n ≥ 0}: the set of strings consisting of b followed by zero or more a's.
Not the same as (ba)*, which denotes {(ba)ⁿ | n ≥ 0}.
* has higher precedence than concatenation.
The Kleene star is the only way to define an infinite language using regular expressions.
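The precedence difference is easy to observe with a practical regex engine. The sketch below uses Python's standard `re` module, where these two expressions happen to be written exactly as in the formal notation; `fullmatch` is used so that the whole string must belong to the language.

```python
import re

b_then_as = re.compile(r"ba*")      # b followed by zero or more a's
ba_repeated = re.compile(r"(ba)*")  # zero or more copies of ba

for s in ["b", "baaa", "baba", ""]:
    print(repr(s),
          bool(b_then_as.fullmatch(s)),
          bool(ba_repeated.fullmatch(s)))
# 'b' True False
# 'baaa' True False
# 'baba' False True
# '' False True
```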

(a+b)*
Denotes {a,b}*: the whole language of strings over the alphabet {a,b}.
The parentheses are necessary here, because * has higher precedence than +.
Kleene closure does not distribute over +; that is, (a+b)* ≠ a*+b*.
a*+b* denotes {a}* ∪ {b}*.
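A short brute-force search can find strings that separate the two languages. This is a sketch using Python's `re` module, which writes union as `|` rather than `+`; the helper `strings_up_to` is a name introduced here for illustration.

```python
import re
from itertools import product

star_of_union = re.compile(r"(a|b)*")   # (a+b)* in the notation used here
union_of_stars = re.compile(r"a*|b*")   # a*+b*


def strings_up_to(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


witnesses = [s for s in strings_up_to("ab", 2)
             if star_of_union.fullmatch(s) and not union_of_stars.fullmatch(s)]
print(witnesses)  # ['ab', 'ba']
```

Any string that mixes a's and b's is in (a+b)* but not in a*+b*, so the shortest witnesses have length 2.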

∅
Denotes {}.
There is no other way to denote the empty set with regular expressions.
That's all you should ever use ∅ for.
It is not useful in compounds:
  L(r∅) = L(∅r) = {}
  L(r+∅) = L(∅+r) = L(r)
  L(∅*) = {ε}

From Languages to RE
Give the regular expressions for the following languages:
  {x | x is a string that starts with three 0s followed by arbitrary 0s and 1s and then ends with three 0s}
  {x | x is a string that starts with a 0 followed by an arbitrary number of 1s and ends with a 0, OR x is a string that starts with a 1 followed by an arbitrary number of 0s and ends with a 1}
  {xⁿ | x is either the string ab or the string c, and n ≥ 0}

Regular Expression to NFA
Approach: convert any regular expression to an NFA for the same language.

Standard Form
To make them easier to compose, our NFAs will all have the same standard form: exactly one accepting state, which is not the start state.
That is, for any regular expression r, we will show how to construct an NFA N with L(N) = L(r).
[Figure: an NFA for r in standard form]

Atomic REs

Compound REs where r₁ and r₂ are REs
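The diagrams for the atomic and compound cases did not survive in this transcript. As a stand-in, the sketch below shows one common way to build such NFAs (a Thompson-style construction with ε-transitions). It is not necessarily identical to the slides' figures, but every NFA it builds has exactly one accepting state that differs from the start state. All names (`NFA`, `symbol`, `union`, `accepts`, and so on) are hypothetical, and each sub-NFA is assumed to be used only once.

```python
from itertools import count

_ids = count()  # source of fresh state numbers


class NFA:
    """Start state, single accepting state, and (src, label, dst) edges.
    A label of None stands for an epsilon-transition."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(a):
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, a, f)])

def epsilon():
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, None, f)])

def empty():
    return NFA(next(_ids), next(_ids), [])  # accepting state is unreachable

def union(n1, n2):
    s, f = next(_ids), next(_ids)
    glue = [(s, None, n1.start), (s, None, n2.start),
            (n1.accept, None, f), (n2.accept, None, f)]
    return NFA(s, f, n1.edges + n2.edges + glue)

def concat(n1, n2):
    glue = [(n1.accept, None, n2.start)]
    return NFA(n1.start, n2.accept, n1.edges + n2.edges + glue)

def star(n):
    s, f = next(_ids), next(_ids)
    glue = [(s, None, n.start), (s, None, f),
            (n.accept, None, n.start), (n.accept, None, f)]
    return NFA(s, f, n.edges + glue)


def _closure(nfa, states):
    """All states reachable from `states` using only epsilon-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    """Simulate the NFA by tracking the set of currently possible states."""
    current = _closure(nfa, {nfa.start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = _closure(nfa, moved)
    return nfa.accept in current


ab_or_c = union(concat(symbol("a"), symbol("b")), symbol("c"))  # ab+c
print(accepts(ab_or_c, "ab"), accepts(ab_or_c, "c"), accepts(ab_or_c, "a"))
# True True False
```

Because every piece has the same shape, the compound constructions only need to add a few ε-transitions between existing start and accepting states.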

NFA to Regular Expression
There is a way to take any NFA and construct a regular expression for the same language.
This gives us our next lemma:
Lemma: if N is any NFA, there is some regular expression r with L(r) = L(N).
The construction is tricky. It is covered in Appendix A, which is very difficult to follow; the hand-out has a more intuitive proof.

Theorem (Kleene's Theorem)
A language is regular if and only if it is L(r) for some regular expression r.
Proof: follows from the previous two lemmas.

Defining Regular Languages
We can define the regular languages:
  By DFA
  By NFA
  By regular expression
These three have equal power for defining languages.

Alphabets
An alphabet is any finite set of symbols:
  {0,1}: binary alphabet
  {0,1,2,3,4,5,6,7,8,9}: decimal alphabet
  ASCII, Unicode: machine-text alphabets
  Or just {a,b}: enough for many examples
  {}: a legal but not usually interesting alphabet
We will usually use Σ as the name of the alphabet we're considering, as in Σ = {a,b}.

Strings
A string is a finite sequence of zero or more symbols.
Length of a string: |abbb| = 4.
A string over the alphabet Σ means a string all of whose symbols are in Σ.
The set of all strings of length 2 over the alphabet {a,b} is {aa, ab, ba, bb}.
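The set of strings of a given length over an alphabet can be generated mechanically. A minimal sketch, assuming the alphabet is given as a Python set of one-character strings and using the hypothetical helper name `strings_of_length`:

```python
from itertools import product


def strings_of_length(alphabet, n):
    """Return all strings of length n over the alphabet, in sorted order."""
    return ["".join(symbols) for symbols in product(sorted(alphabet), repeat=n)]


print(strings_of_length({"a", "b"}, 2))  # ['aa', 'ab', 'ba', 'bb']
print(len("abbb"))                       # 4
```

For n = 0 the result is the single empty string, which is the string the slides write as ε.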

Languages
A language is a set of strings over some fixed alphabet.
Languages are not restricted to finite sets: in fact, finite sets are not usually interesting languages.
All our alphabets are finite, and all our strings are finite, but most of the languages we're interested in are infinite.

The Quest
Using set formers to describe complex languages is challenging.
They can often be vague, ambiguous, or self-contradictory.
A big part of our quest in the study of formal language is to develop better tools for defining and classifying languages.

The Quest
We went from this:
  {x | x is a string that starts with three 0s followed by arbitrary 0s and 1s and then ends with three 0s}
to this:
  000(0+1)*000
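The regular expression can be checked against sample strings with Python's `re` module. This is only a sketch: Python writes union as `|` instead of `+`, and the helper name `in_language` is introduced here for illustration.

```python
import re

# 000(0+1)*000 written in Python's regex syntax
pattern = re.compile(r"000(0|1)*000")


def in_language(s):
    """True if the whole string s is matched by 000(0+1)*000."""
    return pattern.fullmatch(s) is not None


for s in ["000000", "0001000", "00000", "0001"]:
    print(s, in_language(s))
# 000000 True
# 0001000 True
# 00000 False
# 0001 False
```

The string 00000 is rejected because the regular expression requires separate leading and trailing blocks of three 0s. The informal set former does not settle that case as clearly, which is the kind of ambiguity the regular expression removes.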

The Quest
We just defined a major class of languages: the regular languages.
The hallmark of these languages is that their structure allows simple computational models (DFAs and NFAs) to recognize them, and that they can be defined using regular expressions.

The Quest
The idea that the structure of languages is connected to computational models is important.
Later on we will see that the structure of languages is tightly coupled with the idea of algorithms and classes of computational problems.

See website Assignment #4