Strings, Languages, and Regular Expressions

Similar documents
CHAPTER TWO LANGUAGES. Dr Zalmiyah Zakaria

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014

Notes for Comp 454 Week 2

Chapter 4: Regular Expressions

Finite Automata Part Three

Glynda, the good witch of the North

Finite Automata Part Three

Automata Theory TEST 1 Answers Max points: 156 Grade basis: 150 Median grade: 81%

Proof Techniques Alphabets, Strings, and Languages. Foundations of Computer Science Theory

Finite Automata Part Three

Chapter Seven: Regular Expressions

Languages and Strings. Chapter 2

Welcome! CSC445 Models Of Computation Dr. Lutz Hamel Tyler 251

CS402 - Theory of Automata FAQs By

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

CMPSCI 250: Introduction to Computation. Lecture #36: State Elimination David Mix Barrington 23 April 2014

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

Assignment No.4 solution. Pumping Lemma Version I and II. Where m = n! (n-factorial) and n = 1, 2, 3

Compiler Construction

Regular Languages and Regular Expressions

KHALID PERVEZ (MBA+MCS) CHICHAWATNI

Languages and Finite Automata

MA/CSSE 474. Today's Agenda

Regular Expressions & Automata

JNTUWORLD. Code No: R

Dr. D.M. Akbar Hussain

CMSC 132: Object-Oriented Programming II

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

CSE 431S Scanning. Washington University Spring 2013

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Automata Theory CS S-FR Final Review

Learn Smart and Grow with world

Turing Machine Languages

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Tilings of the Sphere by Edge Congruent Pentagons

Slides for Faculty Oxford University Press All rights reserved.

1 Finite Representations of Languages

Regular Expressions. Chapter 6

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Formal languages and computation models

Name: Finite Automata

CMPSCI 250: Introduction to Computation. Lecture #7: Quantifiers and Languages 6 February 2012

1.3 Functions and Equivalence Relations 1.4 Languages

TOPIC PAGE NO. UNIT-I FINITE AUTOMATA

Multiple Choice Questions

Key to Homework #8. (a) S aa A bs bbb (b) S AB aaab A aab aaaaaaab B bba bbbb

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science

UNION-FREE DECOMPOSITION OF REGULAR LANGUAGES

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

1. Lexical Analysis Phase

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

ITEC2620 Introduction to Data Structures

CS 3100 Models of Computation Fall 2011 This assignment is worth 8% of the total points for assignments 100 points total.

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Lecture 6,

AUBER (Models of Computation, Languages and Automata) EXERCISES

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

SWEN 224 Formal Foundations of Programming

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

ECS 20 Lecture 9 Fall Oct 2013 Phil Rogaway

Lec-5-HW-1, TM basics

Regular Expressions. A Language for Specifying Languages. Thursday, September 20, 2007 Reading: Stoughton 3.1

LECTURE NOTES THEORY OF COMPUTATION

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

UNIT I PART A PART B

LECTURE NOTES THEORY OF COMPUTATION

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

2.1 Sets 2.2 Set Operations

CSE 105 THEORY OF COMPUTATION

CMSC 330: Organization of Programming Languages

Lexical Analysis - 1. A. Overview A.a) Role of Lexical Analyzer

This book is licensed under a Creative Commons Attribution 3.0 License

Automating Construction of Lexers

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Ambiguous Grammars and Compactification

CS195H Homework 1 Grid homotopies and free groups. Due: February 5, 2015, before class

Languages and Compilers

CMSC 330: Organization of Programming Languages

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CS308 Compiler Principles Lexical Analyzer Li Jiang

Regular Expressions. Lecture 10 Sections Robb T. Koether. Hampden-Sydney College. Wed, Sep 14, 2016

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

CMSC 330: Organization of Programming Languages. Context Free Grammars

Skyup's Media. PART-B 2) Construct a Mealy machine which is equivalent to the Moore machine given in table.

Solutions to Homework 10

CS375: Logic and Theory of Computing

Lexical Analysis (ASU Ch 3, Fig 3.1)

On Learning Regular Expressions and Patterns via Membership and Correction Queries. Efim Kinber Sacred Heart University Fairfield, CT, USA

CS 314 Principles of Programming Languages. Lecture 3

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Lesson 12: Angles Associated with Parallel Lines

1. Draw the state graphs for the finite automata which accept sets of strings composed of zeros and ones which:

Transcription:

Strings, Languages, and Regular Expressions Strings An alphabet sets is simply a nonempty set of things. We will call these things symbols. A finite string of lengthnover an alphabets is a total function from{0,..n} tos. Examples. SupposeS={ a, b, c }. Then the following function is a finite string of length3overs. ({0,1,2},S,{0 a,1 b,2 b,}) We ll also write this string as[ a, b, b ]. SupposeS={ a, b, c }. Then the following function is a finite string of length0overs. (,S, ) We will also write this string (and any other string of length0) as[]or asɛ. WhenS is a set of characters, I ll use double quotes to indicate strings Example: So[ a, b, b ]= abb. March 1, 2017 1

For eachn N the set( of strings of lengthnovers ) is S n = {0,..n} S tot Ifs S n, we say that its length isnand write s =n. Examples SupposeS={0,1} thens 2 is {[0,0],[0,1],[1,0],[1,1]} aabb =4 The set of finite strings overs is S = n N S n Example SupposeS={ a, b } thens is {ɛ, a, b, aa, ab, ba, bb, aaa, } Note that while the size ofs is infinite (even for finites), the length of each element ofs is finite. We can concatenate two stringssandtto get a stringsˆt. Such that sˆt S s + t and thus sˆt = s + t. Furthermore (sˆt)(i)=s(i), if0 i< s, and (sˆt)( s +i)=t(i), if0 i< t Note theɛis the identity element of catenationsˆɛ=s= ɛˆs, for alls. March 1, 2017 2

Letter conventions for this section of the course I ll use variables as follows S an alphabet a,b,c S s,t,w S M,N S x,y regular expressions overs Q a set of states p,q,r Q R,F Q T a set of transitions March 1, 2017 3

Languages A language overs is any subset ofs. We can extend the operation of concatenation to languages. IfM andn are languages overs. MˆN ={s M,t N sˆt} For example if M={ a, aa, ab } andn={ɛ, b } then MˆN={ a, aa, ab, aab, abb } For eachi N we definem i to bemm M where there areims. For example if M ={ a, aa, ab } then M 1 ={ a, aa, ab } M 2 ={ aa, aaa, aab, aaaa, aaab, aba, abaa, abab } M 3 = { aaa, aaaa, aaab, aaaaa, aaaab, aaba, aabaa, aabab, aaaaaa, aaaaab, aaaba, aaabaa, aaabab, abaa, abaaa, abaab, abaaaa, abaaab, ababa, ababaa, ababab } By convention,m 0 is{ɛ}. So we can definem i by M 0 = {ɛ} M i+1 = MˆM i, for alli N We can define the Kleene closurem of a set of strings as the set of all finite strings that can be generated from its March 1, 2017 4

members by catenation. So ifm ={ a, bb } then M = {ɛ, a, bb, aa, abb, bba, bbbb, aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb, } Formally we can define M = i N M i To put it differently, a finite stringsis a member ofm iff there is a sequence of 0 or more strings s 0,s 1,,s n M such that s=s 0ˆs 1ˆ ˆs n March 1, 2017 5

Regular languages Given an alphabets, we can make the following simple regular languages: the empty language; contains no strings {ɛ} the language that contains only the empty string For eacha S,{[a]} the languages of single, length 1 strings. Example: IfS = 0,1} we have the following4simple regular languages,{ɛ},{[0]},{[1]} Given thatm andn are languages overs, consider three ways to make languages Union:M N the language that contains all strings either inm or inn Concatenation:MˆN the language of stringssthat can be split into two partss=tˆw, wheret Mand w N. Kleene closure:m the language of stringssthat can be split into some number (including0) of parts, each of which is inm. March 1, 2017 6

Examples: IfS is{0,1} we can make the following languages out of the simple regular languages: {[0]} {[1]} {ɛ}={ɛ,[0],[1]} {[0]}ˆ{[1]}={[0,1]} {[0]} ={ɛ,[0],[0,0],[0,0,0],...} {[0]} {[1]} ={ɛ,[0],[1],[0,0],[1,1],...} {[0]} ˆ{[1]} ={ɛ [0],[1], [0,0],[0,1],[1,1], [0,0,0],[0,0,1],[0,1,1],[1,1,1],...} A regular language overs is any language that can be formed from the simple regular languages over S using the three operations of union, concatenation, and Kleene closure. Next we look at a nice syntax for describing regular languages. March 1, 2017 7

Regular expressions A regular expression overs defines a language overs. The set of regular expressions overs is itself a language overs {ɛ,,,;,,(,)} Syntax: For eacha S,ais a regular expression ɛ is a regular expression. is a regular expression. Ifxandy are regular expressions then (x y) is a regular expression (alternation) (x;y) is a regular expression (concatenation) (x ) is a regular expression (repetition) Underlining: I ve underlined regular expressions to make it clear that that is what they are. For exampleɛis a regular expression and is a string of length 1 in the language of regular expressions, whereasɛis a string of length 0. March 1, 2017 8

Parentheses: Parentheses can be omitted with the understanding that has higher precedence than;and that;has higher precedence than. Thus. (a;(b )) can be abbreviated bya;b whereas (a;b) needs its parentheses. And ((a;b) (c;d)) can be abbreviated bya;b c;d whereas a;(b c);d needs its parentheses. We can writex;y;z to mean(x;y);z Similarly, we can writex y zto mean(x y) z. Redundant parentheses can be added. E.g.((( b ) (( a ) ))). (The operators, ; and are analogous to+,, and exponentiation both in precedence and in some algebraic properties.) March 1, 2017 9

Semantics: We can define the meaning of regular expressions over an alphabet S by defining a function L from regular expressions to languages overs. L(a)={[a]}, for eacha S. L(ɛ)={ɛ} L( )= Ifxandy are regular expressions then L((x;y))=L(x)ˆL(y) L((x y))=l(x) L(y) L((x ))=(L(x)) Ifs L(x) we say thatxdescribessor thatxrecognizess. Two regular expression are equivalent if they describe the same language. March 1, 2017 10

Examples Money Suppose that S = { $, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,,,., - } We want to describe strings representing amounts of money such as $0.05 $-12,345.67 We can describe a set of strings representing amounts of money as follows. Letxrepresent the regular expression ( 0 1 2 3 4 5 6 7 8 9 ) The regular expression y= $ ;( - ɛ);(x;x;x x;x x);(, ;x;x;x) ;. ;x;x is such that $0.05 L(y) and $-12,345.67 L(y), whereas $12345.67 / L(y) and $12.6 / L(y), etc. Conventions and abbreviations We leave off the underline when it is clear we are talking about a regular expression March 1, 2017 11

R.E. Abbreviates Describes... x y x; y catenation s s(0)s(1) s( s 1) {s} x + xx any catenation of 1 or more strings, each described byx. x n xx...x }{{} any catenation of n, each n times described byx. x n m x m x m+1 x n any catenation of frommton strings, each described byx. x? ɛ x a string described byxorɛ.. a b any string of length one. [a b] any string of length one with a character alphabetically between a and b inclusive. With these abbreviations $ ;( - ɛ);(x;x;x x;x x);(, ;x;x;x) ;. ;x;x wherexis( 0 1 2 3 4 5 6 7 8 9 ) can be written as $ -? [ 0 9 ] 3 1 (, [ 0 9 ]3 ). [ 0 9 ] 2 The abbreviations are useful in application of regular expressions. However, any regular expression can be expanded to one that does not use the abbreviations. When dealing with the theory of regular expressions, we can safely ignore the abbreviations. March 1, 2017 12

Identifiers Suppose that S = { _, 0, 1,..., 9, a, b,..., z, A, B,..., Z } then letabe the regular expression ([ a z ] [ A Z ] _ )([ a z ] [ A Z ] [ 0 9 ] _ ) describes C++ identifiers. Example Parity LetS ={0,1}. Define a regular expression that matches only strings with an even number of 1s. ( 0 1 0 1 ) 0 String search LetS = { _, 0, 1,..., 9, a, b,..., z, A, B,..., Z }. Define an r.e. that matches all strings that contain the substring engi. Now the regular expression is. engi. More string searches Does a document contain any of the strings woman, women, man, men?. ( woman women man men ). or, equivalently,. wo? m ( a e ) n. March 1, 2017 13

Does a document contain John followed by Smith?. ( John ). ( Smith ). Challenges A C comment is a string that follows the following rules. C comments have at least 4 characters. The first two are /* and the last two are */. The sequence */ does not occur in between the first two and last two characters. Examples include /**/, /*/*/, and /*o*o/*/. Counter examples include o/**/, /**/o, and /**/*/. Supposexis a regular expression whose language is the set of all strings of length 1 other than / and *. Devise a regular expression whose language is the set of all C comments. Show that if a languagem is represented by a regular expression x, then there is a regular expression that represents the reversal ofm, i.e., the languagen that consists of all the strings inm written backwards. Represent an addition such as2+3=5 by 235 and 12 + 35 = 47 by 134257. Pad the operands with 0s so that both operands and the result are the same number of digits; e.g., represent23+777=800 by 078270370. Thus strings in the language are always of lengths that are multiples of3.create a regular expression that represents correct additions in binary. Hint: it might be easier to create a regular expression for the reversed language. Examples March 1, 2017 14

11+101 = 1000 is represented by a string 001010100110 which is in the language. 11+11=110 is represented by a string001111110 which is in the language 1+1 1 and so111 is not in the language. 00000 is not in the language, as its length is not a multiple of 3. Regular languages (again) A languagem is a regular language exactly if there exists a regular expressionxsuch thatm =L(x). So far we have seen that some languages are regular. Perhaps not surprisingly, there are also languages that are not regular. In the next section of the course we will see that a language is regular exactly if it can be recognized by a program that has a fixed amount of memory. Consider the following language M = i N { a } iˆ{ b } i = {ɛ, ab, aabb, aaabbb, } A program to recognized this language would somehow need to count the number of a s and b s. This would take an amount of memory that depends on the size of the input; i.e. it is not fixed. So this language is not regular. The preceding argument is not a formal proof; after learning a bit more, we will be in a position to be able to formally prove that some languages are not regular. March 1, 2017 15