Notes for Comp 454 Week 2


Comp 454 Notes, September 3, 2013

This week we look at the material in Chapters 3 and 4. Homework on Chapters 2, 3 and 4 is assigned (see end of notes). Answers to the homework problems are due by September 10th.

Errata in Chapter 3: None known.

Chapter 3

You are probably familiar with the concept of recursion from programming and/or math classes. In Chapter 3, Cohen shows how recursion is a powerful tool for defining languages. Recall from last week that a language is a set of strings.

A typical recursive definition of a set has three parts:
1. A base set of objects,
2. Rules specifying how additional objects are defined in terms of existing ones,
3. A rule stating that the only objects in the set are those required by rules 1 and 2.

Example 1. EVEN: the set of (positive) even numbers.
1. The numbers 2 and 4 are in EVEN,
2. If x is in EVEN then so is x + 4,
3. Only those numbers required to be in EVEN by rules 1 and 2 are in EVEN.

Example 2. TRIPLEX: the set of strings of x's whose length is a multiple of 3.
1. Λ, the empty string, is in TRIPLEX,
2. If the string w is in TRIPLEX then so is the string wxxx,
3. Only strings required to be in TRIPLEX by rules 1 and 2 are in TRIPLEX.

The third rule is usually assumed, so we do not bother including it in our definitions.

There are often many ways to define a particular set. Here is another definition of EVEN, from Cohen page 22.

Example 3. EVEN: the set of positive even numbers (a different definition from Example 1).
1. The number 2 is in EVEN,
2. If both x and y are in EVEN then so is x + y (x can be the same as y).

Verify for yourself that this includes every positive even number.
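One way to carry out that verification by experiment is to compute each recursive definition as a closure: start from the base set and keep applying the rule until nothing new appears. The sketch below is only an illustration, not part of Cohen's text. The function names and the cut-off LIMIT are assumptions, introduced so that the computation terminates.

```python
def even_by_add_four(limit):
    """Example 1: base {2, 4}; if x is in EVEN then so is x + 4."""
    members = {2, 4}
    frontier = [2, 4]
    while frontier:
        x = frontier.pop()
        y = x + 4
        if y <= limit and y not in members:
            members.add(y)
            frontier.append(y)
    return members


def even_by_sums(limit):
    """Example 3: base {2}; if x and y are in EVEN then so is x + y."""
    members = {2}
    changed = True
    while changed:
        changed = False
        for x in list(members):
            for y in list(members):
                s = x + y
                if s <= limit and s not in members:
                    members.add(s)
                    changed = True
    return members


LIMIT = 40  # arbitrary cut-off (an assumption) so the closure is finite
expected = set(range(2, LIMIT + 1, 2))
print(even_by_add_four(LIMIT) == expected)  # True
print(even_by_sums(LIMIT) == expected)      # True
```

Agreement up to a bound is evidence rather than a proof; the proof still needs an induction argument.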

Example 4. INTEGERS: the set of integers.
1. The number 1 is in INTEGERS,
2. If x and y are in INTEGERS then so are x + y and x - y (x can be the same as y).

Verify for yourself that this defines a set that includes 0, +3 and -4.

Note the difficulty of using this recursive approach to define the set of real numbers. There is no smallest real number on which to base a definition.

AE: The Language of Arithmetic Expressions.

It is interesting to define the language of arithmetic expressions as they appear in most programming languages. Limiting ourselves to constants for the moment, here are three examples of arithmetic expressions:

(3 + 4) / (12 * (8 - 3))
-7 * ((4 + 21) / (-8 + 23))
23 / 0

Note that we are only concerned with whether expressions are syntactically correct, so we do not care that the value of the last example is mathematically undefined.

Here is Cohen's definition of AE.
1. Any number is in AE,
2. If x is in AE then so are (x) and -x (the latter only if x does not already start with a minus sign),
3. If x and y are in AE then so are x + y (y must not begin with a sign character), x - y (y must not begin with a sign character), x * y, x / y, x^y, x**y (whatever operators your language supports).

The language defined in this way permits an expression like 34/24/8 but is not concerned with whether it means (34/24)/8 or 34/(24/8).

Question: could you extend the definition of AE to include (a) identifiers and (b) function calls?

THEOREM 2 No string in AE can contain $.

$ is not part of any number. None of our recursive rules introduces $. Therefore there is no way that $ can appear in a string in AE.
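To get a feel for how the recursive rules build AE, the following sketch applies rules 2 and 3 for a fixed number of rounds to a small base set and then inspects the result. It is only an illustration under stated assumptions: the operator list is limited to + - * /, the base numbers are chosen arbitrarily, and the function names grow and generate are hypothetical.

```python
OPERATORS = ["+", "-", "*", "/"]  # assumed operator set for this sketch
SIGNS = ("+", "-")


def grow(ae):
    """Apply rules 2 and 3 once to every string (and pair of strings) in ae."""
    new = set(ae)
    for x in ae:
        new.add("(" + x + ")")            # rule 2: (x)
        if not x.startswith("-"):
            new.add("-" + x)              # rule 2: -x, unless x starts with -
        for y in ae:
            for op in OPERATORS:
                # rule 3: for + and -, y must not begin with a sign character
                if op in SIGNS and y.startswith(SIGNS):
                    continue
                new.add(x + op + y)
    return new


def generate(base, rounds):
    """Strings of AE obtainable from `base` in at most `rounds` rule applications."""
    ae = set(base)
    for _ in range(rounds):
        ae = grow(ae)
    return ae


sample = generate({"3", "12"}, 2)
print(sorted(sample, key=len)[:10])
print(all("$" not in s for s in sample))  # True, as Theorem 2 predicts
```

The same sample can be used to spot-check other claims about AE, for example that no generated string begins or ends with / or contains //. A finite sample cannot replace the proofs, but it is a quick way to catch a misread rule.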

THEOREM 3 No string in AE can begin or end with /.

/ is not part of any number. Any string formed by a recursive rule must start with a parenthesis, a number or -. Any string formed by a recursive rule must end with a parenthesis or a number. Therefore there is no way that a member of AE begins or ends with /.

THEOREM 4 No string in AE can contain //.

Read Cohen's proof of this on page 27. Rather than simply use a variation on the proof of Theorem 3, he uses a proof by contradiction. He assumes that there is a string in AE containing // and then shows that this must contradict Theorem 3. Proof by contradiction is another tool that Cohen uses in later chapters.

The language of well-formed formulae (WFF)

See page 28. The structure of this definition is similar to the one for AE, just with different operators.

Chapter 4

Errata in Chapter 4:
Page 41, 13 lines from the end: delete either "whether" or "if".
Page 41, penultimate line: delete.

Regular expressions.

Regular expressions might already be familiar to those of you who have used Unix shell commands. The name of the Unix pattern-matching utility grep comes from the ed editor command g/re/p: globally search for a regular expression and print. The regular expression notation used by Cohen is a little different from Unix and is also somewhat different from that used in most other computer theory textbooks.

In regular expressions we have the notions of repetition, sequence and choice.

Repetition.

Using the idea of the Kleene star, x* represents a sequence of zero or more x's. Thus,

XSTRING = language(x*) = { Λ x xx xxx xxxx xxxxx … }
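The listing of language(x*) can be reproduced mechanically. The short sketch below enumerates the first few strings of the language of a starred word and confirms each one with Python's re module, where x* means the same thing as in Cohen's notation. The helper name star_prefix is an assumption made for this illustration, and Python's empty string '' plays the role of Λ.

```python
import re


def star_prefix(word, count):
    """First `count` strings of language(word*), shortest first."""
    return [word * n for n in range(count)]


xstring = star_prefix("x", 6)
print(xstring)  # ['', 'x', 'xx', 'xxx', 'xxxx', 'xxxxx']
print(all(re.fullmatch(r"x*", s) for s in xstring))  # True
```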

Sequence

We use concatenation. For example, wx represents w followed by x. Note that pq* represents all strings where p is followed by any number of q's

pq* = { p pq pqq pqqq pqqqq … }

and is different from (pq)*, which represents strings that are zero or more repetitions of pq

(pq)* = { Λ pq pqpq pqpqpq pqpqpqpq … }

If each string in our language is a sequence of one or more x's rather than zero or more x's then we can represent it as xx* or x*x. To simplify matters we can define the ⁺ operator to represent one or more repetitions; thus x⁺ is the same language as xx*.

There are likely to be many ways to represent an infinite language. Each of the following regular expressions represents the language of strings consisting of one or more x's:

xx*    x⁺    xx*x*    x*xx*    x⁺x*    x*x⁺    x*x*x*xx*

Convince yourself that each expression does define the language of one or more x's. What if the language were strings consisting of two or more x's? Are there correspondingly many ways to represent a finite language?

Choice

On page 34 Cohen introduces his "or" operator. Some books and Web sites use the Unix pipe operator for this. Thus, to represent either x or y:

Cohen: x + y
Others: x | y

Now we have choice, sequence and iteration, giving us a complete set of operators for regular expressions. Here are some examples of expressions and the set of strings that each represents.

(x + y)z* = { x y xz yz xzz yzz … }
(a+b)* = { Λ a b aa ab ba bb aaa … }
a(a+b)*b = { ab aab abb aaab aabb abab abbb aaaab … }
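Claims like "these expressions all define the same language" can be explored by brute force: list every string over the alphabet up to some length and see which ones each expression matches. The sketch below does this with Python's re module. Two notational assumptions apply: Cohen's choice operator + is written | in Python, and Python's postfix + means "one or more", like the superscript plus above. The function name language and the length bounds are choices made for this illustration.

```python
import re
from itertools import product


def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len fully matched by `pattern`."""
    compiled = re.compile(pattern)
    found = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if compiled.fullmatch(s):
                found.add(s)
    return found


# Python spellings of the seven expressions for "one or more x's"
one_or_more = ["xx*", "x+", "xx*x*", "x*xx*", "x+x*", "x*x+", "x*x*x*xx*"]
reference = language(one_or_more[0], "x", 8)
print(all(language(p, "x", 8) == reference for p in one_or_more))  # True

# Cohen's (x + y)z* becomes (x|y)z* in Python
print(sorted(language("(x|y)z*", "xyz", 3), key=lambda s: (len(s), s)))
# ['x', 'y', 'xz', 'yz', 'xzz', 'yzz']
```

Agreement on all strings up to a bound does not prove two expressions equivalent, but a single disagreement does prove them different, which makes this a handy sanity check.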

Formal Definition of Regular Expression (R.E.)

We can use our ideas from recursive definitions to define a valid regular expression as follows.
1. Λ is a regular expression and every character in Σ is a regular expression.
2. If w is a regular expression then so are (w) and w*.
3. If w and v are regular expressions then so are wv and w+v.

Example regular expressions.

Assume that our alphabet is {a b}. There are often many different possible regular expressions for a particular language.

The language of all words containing at least one a can be represented by (a+b)*a(a+b)*.
The language of all words containing at least two a's can be represented by (a+b)*ab*ab*.
The language of all words containing exactly two a's can be represented by b*ab*ab*.

See the example on page 39 of how relatively tricky it turns out to be to devise an R.E. representing the set of strings that have at least one a and at least one b. The a might appear before the b or it might appear after it.

Product set

If S and T are both sets of strings then we define the product set of S and T, denoted ST, as the set of strings formed by concatenating a member of S with a member of T.

Example 1
S = { cat dog }
T = { fish house }
ST = { catfish cathouse dogfish doghouse }

Example 2
S = { Λ a aa }
T = { Λ b bbb bbbbb }
ST = { Λ a b aa ab aab bbb abbb … }

Languages associated with regular expressions.

Suppose we have regular expression R1 defining a language and regular expression R2 defining a language. What is the language defined by R1+R2 and by R1R2? Again, Cohen takes a recursive approach.

(1) If an R.E. is a single letter, the corresponding language consists of just that one-letter word.
(2) If R.E. R1 defines language L1 and R.E. R2 defines language L2, then (R1)(R2) defines the product of L1 and L2, that is, a language in which each string is a string from L1 followed by a string from L2.

(R1) + (R2) defines the union of L1 and L2, that is, a language in which each string is either in L1 or in L2. The language associated with (R1)* is the Kleene closure of L1, that is, a language in which each string is a sequence of zero or more strings drawn from L1.

This means that every regular expression defines a language; that is, it represents a set of strings.

It is often quite tricky to determine the characteristics of the language given the regular expression. Cohen's example on page 39, slightly modified, is

( (a+b)*a(a+b)*b(a+b)* ) + bb*aa*

This represents the set of strings that have both a and b in them. The expression to the left of the top-level + represents the strings where an a precedes a b, and the expression to the right of the + represents the strings omitted by the first term.

THEOREM 5 All finite languages are regular.

This is clearly true because we can list the strings in the set and devise the appropriate R.E. So if our language is { cat dog frog mouse } the R.E. is

R = cat + dog + frog + mouse

The understandability of regular expressions.

The ease with which we can determine the language represented by an R.E. is highly variable. Given that (a+b)* represents any string of a's and b's, clearly (a+b)*(aa+bb)(a+b)* represents strings that are guaranteed to have at least one instance of either aa or bb. But trying to define an expression for the complement, that is, strings that contain neither aa nor bb, is tricky. See page 45.

Try the extended example on page 46. See if you can determine what language is represented by the expression at the top of the page before reading Cohen's analysis.

Note the observation on page 47. It is unknown whether an algorithm exists that can transform an arbitrary regular expression to another equivalent one.
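The three language operations behind regular expressions (product, union and Kleene closure) can be tried out directly on small sets of strings. The sketch below uses Python sets, with the empty string "" standing for Λ. Because a Kleene closure is usually infinite, the closure helper only returns strings up to a chosen length. Both function names and that length bound are assumptions made for this illustration.

```python
def product_set(S, T):
    """ST: every string of S followed by every string of T."""
    return {s + t for s in S for t in T}


def closure_up_to(L, max_len):
    """Strings of the Kleene closure L* whose length is at most max_len."""
    result = {""}      # "" stands for Λ, the concatenation of zero strings
    frontier = {""}
    while frontier:
        # extend the newest strings by one more member of L, keep only new ones
        frontier = {w + s for w in frontier for s in L
                    if len(w + s) <= max_len} - result
        result |= frontier
    return result


S = {"cat", "dog"}
T = {"fish", "house"}
print(sorted(product_set(S, T)))  # ['catfish', 'cathouse', 'dogfish', 'doghouse']
print(sorted(S | T))              # union: ['cat', 'dog', 'fish', 'house']
print(sorted(closure_up_to({"aa", "b"}, 3), key=lambda w: (len(w), w)))
```

The union needs no helper because it is ordinary set union; a finite language such as { cat dog frog mouse } is just the union of its one-word languages, which is the idea behind Theorem 5.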

The language EVEN-EVEN (see page 48).

The language EVEN-EVEN will appear at several points in the book. It is simply the set of strings where each string has an even number of a's and an even number of b's.

Strings in EVEN-EVEN include Λ, abba, bbbbbb, abbaababaa.
Strings not in EVEN-EVEN include baa, bababa, bab, aaaaa, aaabbbb.

The language is easily specified, but what about devising a regular expression? Let us work backwards to the expression that Cohen gives. Every string in EVEN-EVEN can be split up into syllables, where each syllable is one of three types.

1: aa.
2: bb.
3: a mismatch, followed by any number of aa/bb strings, followed by a mismatch, where a mismatch means ab or ba.

Note that in each syllable there is an even number of a's and an even number of b's (zero is even).

Examples of strings in EVEN-EVEN broken into syllables as defined above:

aaabaabbbabbaaaa : aa abaabbba bb aa aa
abbaabbbabaa : abba abbbab aa

This leads us to the following R.E. for EVEN-EVEN:

EVEN-EVEN = [ aa + bb + (ab+ba)(aa+bb)*(ab+ba) ]*

Homework 1

There are 5 homework problems drawn from Chapters 2, 3 and 4. Each answer is worth a maximum of 20 points. Answers due 9/10/13.

1. Consider the language PALINDROME over the alphabet { a b }.
(a) Prove that if x is in PALINDROME, then so is x^n for any n > 0.
(b) Prove that if y^3 is in PALINDROME then so is y.
(c) Prove that PALINDROME has as many words of length 4 as it does of length 3.
(d) Prove that PALINDROME has as many words of length 2n as it does of length 2n-1.

2. Show that the following is another recursive definition of the set EVEN:
Rule 1: 2 and 4 are in EVEN.
Rule 2: If x is in EVEN, then so is x+4.

3. Using the second recursive definition of EVEN (page 22), what is the smallest number of steps required to prove that 100 is in EVEN? Describe a good method for showing that 2*n (for positive integer n) is in EVEN.

4. Construct a regular expression defining each of the following languages over the alphabet {a b}.
(i) all words that do not have the substring ab
(ii) all words that do not have both the substrings bba and abb.

5. Show that the following pairs of regular expressions define the same language over the alphabet Σ = {a b}.
(a) ((a + bb)*aa)* and Λ + (a + bb)*aa
(b) a(aa)*(Λ + a)b + b and a*b

Reading Assignment

Read Chapters 3 and 4. Next week's class notes will cover Chapters 5 and 6.
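As a closing illustration, the regular expression for EVEN-EVEN given in the Chapter 4 notes above can be compared against the plain counting definition of the language. The sketch below translates Cohen's expression into Python's re syntax (with | for Cohen's +) and checks that the two agree on every string of a's and b's up to a chosen length. The names and the length bound are assumptions made for this illustration, and agreement up to a bound is a sanity check rather than a proof.

```python
import re
from itertools import product

# Cohen's [aa + bb + (ab+ba)(aa+bb)*(ab+ba)]* in Python syntax
EVEN_EVEN = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")


def by_counting(s):
    """Direct definition: an even number of a's and an even number of b's."""
    return s.count("a") % 2 == 0 and s.count("b") % 2 == 0


def agree_up_to(max_len):
    """True if the R.E. and the counting test agree on all strings up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            if bool(EVEN_EVEN.fullmatch(s)) != by_counting(s):
                return False
    return True


print(agree_up_to(10))  # True
```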