CS103 Handout 39 Winter 2018 February 23, 2018 Problem Set 7

Similar documents
CS103 Handout 35 Spring 2017 May 19, 2017 Problem Set 7

CS103 Handout 14 Winter February 8, 2013 Problem Set 5

CS103 Handout 13 Fall 2012 May 4, 2012 Problem Set 5

CS103 Handout 50 Fall 2018 November 30, 2018 Problem Set 9

CS103 Handout 42 Spring 2017 May 31, 2017 Practice Final Exam 1

Problem One: A Quick Algebra Review

CS103 Handout 29 Winter 2018 February 9, 2018 Inductive Proofwriting Checklist

Context-Free Grammars

Skill 1: Multiplying Polynomials

CS103 Spring 2018 Mathematical Vocabulary

Finite Automata Part Three

Range Minimum Queries Part Two

CS103 Handout 16 Winter February 15, 2013 Problem Set 6

Graph Theory Part Two

Chapter Seven: Regular Expressions

(Refer Slide Time: 0:19)

Context-Free Grammars

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Range Minimum Queries Part Two

In our first lecture on sets and set theory, we introduced a bunch of new symbols and terminology.

Recursively Enumerable Languages, Turing Machines, and Decidability

CMPSCI 250: Introduction to Computation. Lecture 20: Deterministic and Nondeterministic Finite Automata David Mix Barrington 16 April 2013

Extra Practice Problems 2

Proofwriting Checklist

Lecture 2 Finite Automata

14.1 Encoding for different models of computation

NFAs and Myhill-Nerode. CS154 Chris Pollett Feb. 22, 2006.

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

6.001 Notes: Section 8.1

Notebook Assignments

CSE 21 Spring 2016 Homework 5. Instructions

Ambiguous Grammars and Compactification

MITOCW watch?v=zm5mw5nkzjg

Finite Automata Part Three

This is already grossly inconvenient in present formalisms. Why do we want to make this convenient? GENERAL GOALS

Chapter 3. Set Theory. 3.1 What is a Set?

CS 125 Section #4 RAMs and TMs 9/27/16

CS52 - Assignment 10

ECS 120 Lesson 7 Regular Expressions, Pt. 1

MITOCW watch?v=4dj1oguwtem

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 4: examples of topological spaces, coarser and finer topologies, bases and closed sets

Formal Methods of Software Design, Eric Hehner, segment 24 page 1 out of 5

COMP 330 Autumn 2018 McGill University

CPSC 320 Sample Solution, Playing with Graphs!

Context-Free Grammars

CSE 105 THEORY OF COMPUTATION

Proof Techniques Alphabets, Strings, and Languages. Foundations of Computer Science Theory

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Implementation of Lexical Analysis. Lecture 4

CS143 Handout 20 Summer 2012 July 18 th, 2012 Practice CS143 Midterm Exam. (signed)

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Formal Methods of Software Design, Eric Hehner, segment 1 page 1 out of 5

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

CSE 105 THEORY OF COMPUTATION

MITOCW watch?v=0jljzrnhwoi

First-Order Translation Checklist

CS2 Practical 2 CS2Ah

PROFESSOR: Last time, we took a look at an explicit control evaluator for Lisp, and that bridged the gap between

Ling/CSE 472: Introduction to Computational Linguistics. 4/6/15: Morphology & FST 2

Turing Machines Part Two

Reflection in the Chomsky Hierarchy

Assignment 1: grid. Due November 20, 11:59 PM Introduction

p x i 1 i n x, y, z = 2 x 3 y 5 z

CSE 105 THEORY OF COMPUTATION

Binary Trees. This recursive defnition prefgures the patern of most algorithms we use on this data structure, as we will see below.

6.001 Notes: Section 6.1

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

MITOCW watch?v=kz7jjltq9r4

MITOCW ocw f99-lec07_300k

University of Nevada, Las Vegas Computer Science 456/656 Fall 2016

Topology and Topological Spaces

Computational Complexity and Implications for Security DRAFT Notes on Infeasible Computation for MA/CS 109 Leo Reyzin with the help of Nick Benes

CS3102 Theory of Computation Problem Set 2, Spring 2011 Department of Computer Science, University of Virginia

III. Supplementary Materials

Foundations, Reasoning About Algorithms, and Design By Contract CMPSC 122

CS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm

1. [5 points each] True or False. If the question is currently open, write O or Open.

Discrete Mathematics for CS Spring 2008 David Wagner Note 13. An Introduction to Graphs

MITOCW watch?v=hverxup4cfg

CS402 - Theory of Automata Glossary By

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

6.001 Notes: Section 15.1

mk-convert Contents 1 Converting to minikanren, quasimatically. 08 July 2014

MITOCW watch?v=yarwp7tntl4

Languages and Compilers

CSE 21 Spring 2016 Homework 5. Instructions

Turing Machine Languages

Order from Chaos. University of Nebraska-Lincoln Discrete Mathematics Seminar

CPSC W2 Midterm #2 Sample Solutions


Introduction; Parsing LL Grammars

Binary, Hexadecimal and Octal number system

Theory of Computation

CPSC 536N: Randomized Algorithms Term 2. Lecture 5

Midterm 2 Solutions. CS70 Discrete Mathematics for Computer Science, Fall 2007

An Interesting Way to Combine Numbers

CS 125 Section #10 Midterm 2 Review 11/5/14

Transcription:

CS103 Handout 39 Winter 2018 February 23, 2018 Problem Set 7 What can you do with regular expressions? What are the limits of regular languages? In this problem set, you'll explore the answers to these questions along with their practical consequences. As always, please feel free to drop by ofce hours, ask on Piazza, or send us emails if you have any questions. We'd be happy to help out. Good luck, and have fun! Due Friday, March 2 nd at 2:30PM

Problem One: Designing Regular Expressions Below are a list of alphabets and languages over those alphabets. For each language, write a regular expression for that language. 2 / 8 Please use our online tool to design, test, and submit your regular expressions. Typed or handwritten solutions will not be accepted. To use it, visit the CS103 website and click the Regex Editor link under the Resources header. As before, if you submit in a pair, please make a note in your GradeScope submission of which partner submitted your answers to this question so that we know where to look. Also, as a reminder, please test your submissions thoroughly, since we'll be grading them with an autograder. i. Let Σ = {a, b} and let L = { w Σ* w does not contain ba as a substring }. Write a regular expression for L. ii. Let Σ = {a, b} and let L = { w Σ* w does not contain bb as a substring }. Write a regular expression for L. You may want to draw out some sample strings in this language and look for a pattern. iii. Suppose you are taking a walk with your dog on a leash of length two. Let Σ = {y, d} and let L = { w Σ* w represents a walk with your dog on a leash where you and your dog both end up at the same location }. For example, we have yyddddyy L because you and your dog are never more than two steps apart and both of you end up four steps ahead of where you started; similarly, ddydyy L. However, yyyyddd L, since halfway through your walk you re three steps ahead of your dog; ddyd L, because your dog ends up two steps ahead of you; and ddyddyyy L, because at one point your dog is three steps ahead of you. Write a regular expression for L. iv. Let Σ = {a, b} and let L = { w Σ* w ab }. Write a regular expression for L. v. Let Σ = {M, D, C, L, X, V, I} and let L = { w Σ* w is number less than 2,000 represented in Roman numerals }. For example, CMXCIX L, since it represents the number 999, as are the strings L (50), VIII (8), DCLXVI (666), CXXXVII (137), CDXII (412), and MDCXVIII (1,618). However, we have VIIII L (you'll never have four I's in a row; use IX or IV instead), that MM L (it's a Roman numeral, but it's for 2,000, which is too large), that VX L (this isn't a valid Roman numeral), and that IM L (the notation of using a smaller digit to subtract from a larger one only lets you use I to prefx V and X, or X to prefx L and C, or C to prefx D and M). The Romans didn't have a way of expressing the number 0, so to make your life easier we'll say that ε L and that the empty string represents 0. (Oh, those silly Romans.) Write a regular expression for L. (As a note, we re using the standard form of Roman numerals. You can see a sample of numbers written out this way via this link.) Problem Two: Finite and Cofnite Languages A language L is called fnite if L contains fnitely many strings (that is, L is a natural number.) A language L is cofnite if its complement is a fnite language; that is, L is cofnite if L is a natural number. i. Given a fnite language L, explain how to write a regular expression for L. Briefy justify your answer; no formal proof is necessary. This shows that all fnite languages are regular. ii. Look up the trie data structure on Wikipedia. Explain, in your own words, how a trie can be thought of as a fnite automaton. This gives another view on why fnite languages are regular. iii. Prove that any cofnite language is regular.

Problem Three: State Elimination The state elimination algorithm gives a way to transform a fnite automaton (DFA or NFA) into a regular expression. It's a really beautiful algorithm once you get the hang of it, so we thought that we'd let you try it out on a particular example. Let Σ = {a, b} and let L = { w Σ* w has an even number of a's and an even number of b's}. Below is a fnite automaton for L that we've prepared for the state elimination algorithm by adding in a new start state q start and a new accept state q acc : 3 / 8 We'd like you to use the state elimination algorithm to produce a regular expression for L. i. Run two steps of the state elimination algorithm on the above automaton. Specifcally, frst remove state q₁, then remove state q₂. Show your result at this point. Go slowly. Remember that to eliminate a state q, you should identify all pairs of states q in and q out where there s a transition from q in to q and from q to q out, then add shortcut edges from q in to q out to bypass state q. Remember that q in and q out can be the same state. If you ve done everything right at the end of this stage, none of the transitions you have at this point should have Kleene stars in them. ii. Finish the state elimination algorithm. What regular expression do you get for L? You should start seeing Kleene stars appearing as you remove the remaining states. iii. Without making reference to the original automaton given above, give an intuitive explanation for how the regular expression you found in part (ii) works.

Problem Four: Distinguishable Strings The Myhill-Nerode theorem is one of the trickier and more nuanced theorems we've covered this quarter. This question explores what the theorem means and, importantly, what it doesn't mean. Let Σ = {a, b} and let L = { w Σ* w is even }. i. Show that L is a regular language. There are lots of ways you can go about doing this. Pick whatever is easiest. ii. Prove that there is a infnite set S Σ* where there are infnitely many pairs of distinct strings x, y S such that x L y. An analogy that might help: do you understand the diference between a group of six people, any two of which are friends versus a group of six people, of which six pairs of those people are friends? iii. Prove that there is no infnite set S Σ* where all pairs of distinct strings x, y S satisfy x L y. 4 / 8 The distinction between parts (ii) and (iii) is important for the Myhill-Nerode theorem. A language is nonregular not if you can fnd infnitely many pairs of distinguishable strings, but rather if you can fnd infnitely many strings that are all pairwise distinguishable. Problem Five: At Least Three Point Five The Myhill-Nerode theorem is a powerful tool for fnding nonregular languages, but it can take some adjusting to get used to. Let Σ = {a, b} and consider the following language: L = { w Σ* there are at least as many a s as b s in w }. Below is a purported proof that L is not a regular language. Theorem: L is not a regular language. Proof: Consider the set S = { a n n N }. This set is infnite because it contains one string for each natural number. We will prove that any two distinct strings in S are distinguishable relative to L. To do so, consider any distinct strings a m, a n S, and assume without loss of generality that m > n. Then b m a m L because this string contains the same number of a s and b s, but b m a n L because it contains m b s and n a s and m > n. Therefore, we see that a m L a n. This means that S is an infnite set of strings that are pairwise distinguishable relative to L. Therefore, by the Myhill-Nerode theorem, L is not regular. Unfortunately, this proof is incorrect. i. What s wrong with this proof? Be as specifc as possible. The best way to identify a faw in a proof is to point to a specifc claim that s being made that s not true or not properly substantiated and to explain why. ii. Is L a regular language? If so, show why it s regular. If not, prove that L is not a regular language. Think about the intuition behind what makes a language regular or not regular. If your intuition tells you that this language is regular, then use that intuition to guide your justifcation. If your intuition tells you that this language is not regular, use that intuition to guide your disproof. And if you don t have an intuition, take a look back at the lecture slides and feel free to stop by ofce hours!

Problem Six: Balanced Parentheses Consider the following language over Σ = {(, )}: L₁ = { w Σ* w is a string of balanced parentheses } For example, we have () L₁, (()) L₁, (()())() L₁, ε L₁, and (())((()())) L₁, but )( L₁, (() L₁, and ((()))) L₁. This question explores properties of this language. 5 / 8 i. Prove that L₁ is not a regular language. One consequence of this result which you don't need to prove is that most languages that support some sort of nested parentheses, such as most programming languages and HTML, aren't regular and so can't be parsed using regular expressions. As with all problems involving nonregular languages, proceed with this one in stages. First, ask yourself: if you were reading an input string from left to right, what information would you necessarily have to keep track of? The Myhill-Nerode theorem asks you to fnd an infnite set of strings that are all pairwise distinguishable, so try creating an infnite set of strings, one for each possible value that this information could take on. Let's say that the nesting depth of a string of balanced parentheses is the maximum number of unmatched open parentheses at any point inside the string. For example, the string ((())) has nesting depth three, the string (()())() has nesting depth two, and the string ε has nesting depth zero. Consider the language L₂ = { w Σ* w is a string of balanced parentheses and w's nesting depth is at most four }. For example, ((())) L₂, (()()) L₂, and (((())))(()) L₂, but ((((())))) L₂ because although it's a string of balanced parentheses, the nesting goes fve levels deep. ii. Design a DFA for L₂, showing that L₂ is regular. A consequence of this result is that while you can't parse all programs or HTML with regular expressions, you can parse programs with low nesting depth or HTML documents without deeply-nested tags using regexes. Please submit this DFA using the DFA editor on the course website and tell us on GradeScope who submitted it. iii. Look back at your proof from part (i) of this problem. Imagine that you were to take that exact proof and blindly replace every instance of L₁ with L₂. This would give you a (incorrect) proof that L₂ is nonregular (which we know has to be wrong because L₂ is indeed regular.) Where would the error be in that proof? Be as specifc as possible. Again, you should be able to point at a specifc spot in the proof that contains a logic error and explain exactly why the statement in question is not true or not supported by the preceding statements. If you can t do this, it likely means you have an error in your proof from part (i)! iv. Intuitively, regular languages correspond to problems that can be solved using only fnite memory. Using this intuition and without making reference to DFAs, NFAs, or the Myhill-Nerode theorem, explain why L₁ is nonregular while L₂ is regular.

Problem Seven: Tautonyms A tautonym is a word that consists of the same string repeated twice. For example, caracara and dikdik are all tautoynms (the frst is a bird, the second the cutest animal you'll ever see), as is hotshots (people who aren't very fun to be around). Let Σ = {a, b} and consider the following language: L = { ww w Σ* } This is the language of all tautonyms over Σ. Below is an incorrect proof that L is not regular: Proof: Let S = { a n n N }. This set is infnite because it contains one string for each natural number. We claim that any two strings in S are distinguishable relative to L. To see this, consider any two distinct strings a m and a n in the set S, where m n. Then a m a m L but a n a m L, so a m L a n. This means that S is an infnite set of strings that are pairwise distinguishable relative to L. Therefore, by the Myhill-Nerode theorem, L is not regular. i. What's wrong with this proof? Be as specifc as possible. This one is subtle, and it s not the same error that you identifed earlier. As before, point out a specifc claim that is made that isn t true or isn t supported by the preceding statements and justify why. ii. Although the above proof is incorrect, the language L isn't regular. Prove this. 6 / 8 The most common mistake we see people make in part (ii) of this problem is to make the exact same sort of mistake that they identifed in part (i) of this problem (oops!) So go slowly here and double-check your proof to make sure it s airtight, especially in light of what you saw in part (i). You may want to take a sentence or two to precisely articulate why each claim that you re making is true, since as you saw above it s easy for an erroneous statement to slip in under the radar if you re not careful!

Problem Eight: State Lower Bounds 7 / 8 The Myhill-Nerode theorem we proved in lecture is actually a special case of a more general theorem about regular languages that can be used to prove lower bounds on the number of states necessary to construct a DFA for a given language. i. Let L be a language over Σ. Suppose there's a fnite set S such that any two distinct strings x, y S are distinguishable relative to L (that is, x L y for any two strings x, y S where x y.) Prove that any DFA for L must have at least S states. (You sometimes hear this referred to as lower-bounding the size of any DFA for L.) According to old-school Twitter rules, all tweets need to be 140 characters or less. Let Σ be the alphabet of characters that can legally appear in a tweet and consider the following language: TWEETS = { w Σ* w 140 }. This is the language of all legal tweets, assuming the empty string is a legal tweet. The good news is that this language is regular. The bad news is that any DFA for it has to be pretty large. ii. Using your result from part (i), prove that any DFA for TWEETS must have at least 142 states. It might be easier to tackle this problem if you consider replacing 140 and 142 with some smaller numbers (say, 2 and 4) to build up an intuition. And work backwards what will you need to do to invoke part (i)? iii. Refer back to the formal, 5-tuple defnition of a DFA given in Problem Set Six. Defne an 142- state DFA for TWEETS using the formal 5-tuple defnition. Briefy explain how your DFA works. No formal proof is necessary. Again, this might be a lot easier to do if you frst reduce 140 and 142 to 2 and 4, respectively, and see what you come up with. Start by drawing out what the DFA would look like, then think about how you d formalize your idea as a 5-tuple. Your results from parts (ii) and (iii) collectively establish that the smallest possible DFA for TWEETS has exactly 142 states. This approach to fnding the smallest object of some type using some theorem to prove a lower bound ( we need at least this many states ) combined with a specifc object of the given type ( we certainly can t do worse than this ) is a common strategy in discrete mathematics, algorithm design, and computational complexity theory. If you take classes like CS161, CS254, etc., you ll likely see similar sorts of approaches!

Problem Nine: Closure Properties Revisited When building up the regular expressions, we explored several closure properties of the regular languages. This problem explores some of their nuances. The regular languages are closed under complementation: If L is regular, so is L. i. Prove or disprove: the nonregular languages are closed under complementation. In other words, if you have a nonregular language L, is L necessarily nonregular? The regular languages are closed under union: If L₁ and L₂ are regular, so is L₁ L₂. ii. Prove or disprove: the nonregular languages are closed under union. 8 / 8 We know that the union of any two regular languages is regular. Using induction, we can show that the union of any fnite number of regular languages is also regular. As a result, we say that the regular languages are closed under fnite union. An infnite union is the union of infnitely many sets. Formally speaking, imagine that you have a sequence of infnitely many sets S₀, S₁, S₂,, one for each natural number. Then the infnite union of those sets is defned as follows: S i = {w n N. w S n }. i=0 For example, if L is a language, then L*, which we defned as { w Σ* n N. w L n }, can be thought of as the infnite union L i. i=0 Similarly, the set of all rational numbers, Q, can be thought of as x Q = { i=0 i +1 x Z }. iii. Prove or disprove: the regular languages are closed under infnite union. In other words, if you have an infnite list of regular languages L₀, L₁, L₂,, is their infnite union necessarily a regular language? Optional Fun Problem: Generalized Fooling Sets (1 Point Extra Credit) In Problem Eight, you used distinguishability to lower-bound the size of DFAs for a particular language. Unfortunately, distinguishability is not a powerful enough technique to lower-bound the sizes of NFAs. In fact, it's in general quite hard to bound NFA sizes; there's a $1,000,000 prize for anyone who fnds a polynomial-time algorithm that, given an arbitrary NFA, converts it to the smallest possible equivalent NFA! Although it's generally difcult to lower-bound the sizes of NFAs, there are some techniques we can use to fnd lower bounds on the sizes of NFAs. Let L be a language over Σ. A generalized fooling set for L is a set F Σ* Σ* is a set with the following properties: For any (x, y) F, we have xy L. For any distinct pairs (x₁, y₁), (x₂, y₂) F, we have x₁y₂ L or x₂y₁ L (this is an inclusive OR.) Prove that if L is a language and there is a generalized fooling set F for L that contains n pairs of strings, then any NFA for L must have at least n states.