CS 36 Meeting 8 9/4/8

Announcements
1. Homework 3 due Friday.

Review
1. The closure properties of regular languages provide a way to describe regular languages by building them out of simpler regular languages using the operations union, product and closure.
2. The notation called regular expressions is based on this fact. Definition: Given some finite alphabet Σ, we define e to be a regular expression if e is
     a for some a ∈ Σ
     ∅
     ε
     e1 ∪ e2, where e1 and e2 are regular expressions
     e1 · e2 = e1e2, where e1 and e2 are regular expressions
     e1*, where e1 is a regular expression
     (e1), where e1 is a regular expression.
3. We view regular expressions as another formalism for describing languages. If e is a regular expression, the language defined by e is denoted by L(e) and defined recursively/inductively as follows:
   Base clauses
     L(a) for some a ∈ Σ is just {a}
     L(∅) is ∅
     L(ε) is {ε}
   Recursive clauses
     L(e1 ∪ e2) is L(e1) ∪ L(e2)
     L(e1e2) is L(e1)L(e2)
     L(e1*) is L(e1)*
     L((e1)) is L(e1)
4. Given the closure properties we have just shown, it is clear that all regular expressions describe regular languages. It is also true, though far from clear, that every regular language can be described by some regular expression. Our first goal today will be to justify this claim.
5. Here are some languages one might want to describe with regular expressions:
     Binary strings of length 2 or less: (0 ∪ 1 ∪ ε)(0 ∪ 1 ∪ ε).
     Binary strings that end in : ( ).
     Binary strings that don't end in : ( ɛ)( ɛ) ( ) ( )
     Binary strings that are multiples of 3: ?
6. The last example raises the question of whether or not every regular language can be described by a regular expression.

Generalized Nondeterministic Finite Automata
1. The book presents an algorithm that translates the description of a DFA into a regular expression describing the same language. The existence and correctness of this algorithm prove that all regular languages are described by some regular expression.
2. To introduce this algorithm, let's think about how we would convert the divisible-by-3 DFA we have considered previously into a regular expression:
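The diagram for this machine is not reproduced in the text. As a stand-in, here is a minimal Python sketch of a divisible-by-3 DFA. It rests on assumptions: the state names ("start", 0, 1, 2) and the separate start state are inferred from the later remark that the start state and the final state are different states, and from the usual rule that reading bit b in remainder state q leads to (2q + b) mod 3.

```python
# Hypothetical encoding of the divisible-by-3 DFA (names are assumptions).
# States 0, 1, 2 track the remainder mod 3 of the bits read so far;
# "start" is a separate start state, so the empty string is rejected.
DELTA = {
    ("start", "0"): 0, ("start", "1"): 1,
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 0,
    (2, "0"): 1, (2, "1"): 2,
}
FINAL = {0}


def accepts(w):
    """Run the DFA on the binary string w and report whether it accepts."""
    state = "start"
    for bit in w:
        state = DELTA[(state, bit)]
    return state in FINAL
```

For instance, `accepts("110")` follows start, 1, 0, 0 and accepts (110 is 6 in binary), while `accepts("111")` ends in state 1 and rejects.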
3. Looking at the diagram for this machine, it is clear that for the machine to go from state 1 to state 2 and then get back to state 1 again, it must encounter an input substring described by the regular expression 01*0. Given this fact, if we don't really want to have to think about state 2, we could use the following diagram to capture the behavior of the machine.
4. This diagram is an example of what the text calls a generalized nondeterministic finite automaton or GNFA. It is basically an NFA where instead of labeling transitions with simple symbols, we allow ourselves to use regular expressions. The idea is that the machine may move from one state to another if it finds a sequence of input symbols that matches the regular expression on the edge connecting the states. The word "may" is critical here. Like an NFA, we assume this machine is very clever at guessing which strings to match with the regular expressions labeling its edges.
5. Just as we were able to eliminate state 2 in our diagram by adding an edge labeled with a regular expression to account for its absence, we can also eliminate state 1. In the reduced version of the machine, there is a path from the start state to state 0 through state 1 and there is also a path from state 0 back to itself through state 1. We will need to account for both paths with new edges. To follow the path from the start state to state 0 we must see a string that matches 1(01*0)*1. To follow the path from state 0 back to itself, we similarly must see a string that matches 1(01*0)*1. This leads to the GNFA shown below.
   [Diagram: GNFA with new edges labeled 1(01*0)*1]
6. We now have multiple edges from the start state to state 0 and from state 0 back to itself. In an NFA, this would not bother us. In a GNFA, however, since we have the power to use regular expressions as labels, we can eliminate such edges by creating a single edge labeled with the union of the regular expressions on the existing edges. Doing this to our machine yields:
   [Diagram: GNFA with merged edges, each labeled 0 ∪ (1(01*0)*1)]
7. At this point, you should be able to tell what regular expression describes the language of this machine. But it would be nice if we could continue the approach of removing nodes from the machine until we got to the point where we had a single edge labeled with the desired regular expression. This is hard to do when the machine reaches the point that all we have left is the only start state and the only final state and they are different states.
8. Given that we have ɛ-transitions, we can fix this by creating a single, external final state with ɛ-transitions going from what would normally be our final states to this new state.
9. Now we can use the same approach we used to eliminate states 2 and 1 to eliminate state 0, giving: (0 ∪ (1(01*0)*1))(0 ∪ (1(01*0)*1))*
10. Amazingly, if you think about it you will (may?) realize that (0 ∪ (1(01*0)*1))(0 ∪ (1(01*0)*1))* actually does describe the language of binary numbers divisible by 3.

1. The algorithm in the book takes the basic approach that we just followed, but streamlines things in several ways. First, rather than waiting to add a single, separate final state when they get in trouble, the algorithm starts by adding both a new start state and a new final state and connecting these new states to the original start state and final states with epsilon-transitions. Second, so that they don't have to handle the merging of edges as a special case, they immediately add edges between all states not connected directly by edges (except for their new start and final state) that are labeled with the regular expression ∅. They can get away with this because such edges act as if they are not there. In class and when doing your homework, it is not worth adding these edges. Just merge or add edges when appropriate.
2. With this in mind, let's consider the multiples of 3 machine again. To make it interesting, we can remove states in a different order. Also, I will ask you to help by telling me which edges I will have to add or augment when I remove an existing state and by telling me what the labels on these edges should be.
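The state-removal step described above can be sketched in Python. This is a rough illustration, not the book's formal procedure: it skips the ∅ edges (a missing edge is simply absent from the dictionary), writes labels in Python `re` syntax with `|` standing for ∪ and the empty string standing for ε, and uses hypothetical state names ("S", "start", "q0", "q1", "q2", "F") for an assumed multiples-of-3 machine with a separate start state.

```python
from itertools import product


def union(a, b):
    """Label for two parallel edges; None means there is no edge."""
    if a is None:
        return b
    if b is None:
        return a
    return f"{a}|{b}"


def concat(*parts):
    """Concatenate labels, wrapping each so that '|' stays grouped."""
    return "".join(f"(?:{p})" for p in parts if p != "")


def star(a):
    """Label for zero or more trips around a self-loop."""
    return "" if a in (None, "") else f"(?:{a})*"


def eliminate(edges, order):
    """Remove the states in `order` one at a time from a GNFA.

    `edges` maps (source, destination) to a regex label. Each path that
    ran in -> q -> out is replaced by a direct edge labeled
    in (loop)* out, merged with any edge already present.
    """
    edges = dict(edges)
    for q in order:
        loop = star(edges.get((q, q)))
        ins = [(s, r) for (s, d), r in edges.items() if d == q and s != q]
        outs = [(d, r) for (s, d), r in edges.items() if s == q and d != q]
        edges = {k: r for k, r in edges.items() if q not in k}
        for (s, r_in), (d, r_out) in product(ins, outs):
            edges[(s, d)] = union(edges.get((s, d)), concat(r_in, loop, r_out))
    return edges


# Assumed multiples-of-3 machine, already augmented with a new start
# state "S" and a new final state "F" joined by epsilon ("") edges.
GNFA = {
    ("S", "start"): "",
    ("start", "q0"): "0", ("start", "q1"): "1",
    ("q0", "q0"): "0", ("q0", "q1"): "1",
    ("q1", "q2"): "0", ("q1", "q0"): "1",
    ("q2", "q1"): "0", ("q2", "q2"): "1",
    ("q0", "F"): "",
}

regex_a = eliminate(GNFA, ["q2", "q1", "start", "q0"])[("S", "F")]
regex_b = eliminate(GNFA, ["q1", "q2", "start", "q0"])[("S", "F")]
```

The two orders produce different-looking strings, which can then be passed to `re.fullmatch` to try them on sample inputs. Wrapping every piece in a non-capturing group keeps the sketch short at the cost of very cluttered output.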
3. As our first step, rather than waiting until we get in trouble, we immediately augment the machine with a new start state and a new final state. We add epsilon-transitions from the new start state to the old one and from all old final states to the new one.
4. Now, instead of removing state 2, let's try removing state 1 first. This will require adding edges from the old start state to state 2, from state 0 to state 2, and from state 2 to state 0. We will also have to update the label of the edge from the old start state to state 0, and the loops from state 0 to itself and from state 2 to itself. The result looks like:
   [Diagram: GNFA with updated labels 0 ∪ 11, 0 ∪ 11 and 1 ∪ 00]
5. Next we will remove state 2. This requires updating the labels of the edges from the old start state to state 0 and from state 0 to itself to reflect the paths between these sources and destinations that currently pass through state 2.
   [Diagram: GNFA with both remaining edges labeled 0 ∪ 11 ∪ 10(1 ∪ 00)*01]
6. Finally, since the only edge from to is an epsilon-transition, it is clear that we can remove both states and to obtain:
   (0 ∪ 11 ∪ 10(1 ∪ 00)*01)(0 ∪ 11 ∪ 10(1 ∪ 00)*01)*
7. If you have a really good memory, you will have already noticed that we obtained a different regular expression by performing this sequence of state eliminations than we did last time.
     Eliminating state 2 before state 1 gave us: (0 ∪ (1(01*0)*1))(0 ∪ (1(01*0)*1))*
     Eliminating state 1 before state 2 gave us: (0 ∪ 11 ∪ 10(1 ∪ 00)*01)(0 ∪ 11 ∪ 10(1 ∪ 00)*01)*
   Hopefully, these two regular expressions describe the same sets!
8. My hope is that this practice gives you a clear enough understanding of how to use GNFAs to extract a regular expression that describes the language of a DFA. The book gives a more formal presentation (almost a proof). You should reread (or read) that section now to solidify your understanding and convince yourselves that the algorithm can be applied to any DFA.

Languages that are not regular
1. We have seen two distinct examples of regular expressions that (should/might) describe the same language, binary representations of numbers divisible by 3: Jason's and (0 ∪ 11 ∪ 10(1 ∪ 00)*01)(0 ∪ 11 ∪ 10(1 ∪ 00)*01)*
2. After spending hours making up the slides showing how to extract a regular expression for a language from a DFA that recognizes the language, I could not help thinking that it would be nice to have some sort of regular expression checker that would tell me for sure that two regular expressions actually do describe the same language.
3. If you think about it, you will realize that what I really wanted was a decider for the language:
     L_EQ-RE = { e1 = e2 | e1 and e2 are regular expressions over Σ and L(e1) = L(e2) }
4. This language is a bit more interesting than most of the examples we have been talking about so far this semester. Certainly, it would be harder for you to write a program that decided whether an input belonged to this language than it would be to decide if a binary string represented a number divisible by 3.
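A true decider for this language takes more work, but a bounded sanity check is easy to sketch. The Python below compares two expressions on every binary string up to a chosen length. It is only a test, not a decider: agreement up to `max_len` proves nothing about longer strings. The two patterns are my transcription of the expressions from the two elimination orders into Python `re` syntax (`|` for ∪); both the transcription and the function name are assumptions.

```python
import re
from itertools import product

# Assumed transcriptions of the two expressions into Python re syntax.
R1 = r"(0|(1(01*0)*1))(0|(1(01*0)*1))*"
R2 = r"(0|11|10(1|00)*01)(0|11|10(1|00)*01)*"


def first_disagreement(r1, r2, alphabet="01", max_len=10):
    """Return the shortest string matched by exactly one of r1 and r2,
    or None if they agree on every string of length <= max_len."""
    p1, p2 = re.compile(r1), re.compile(r2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(p1.fullmatch(w)) != bool(p2.fullmatch(w)):
                return w
    return None
```

Calling `first_disagreement(R1, R2)` returns `None` when no counterexample of length 10 or less exists, and returns a witness string otherwise, which is often the fastest way to catch a slip made while eliminating states by hand.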
5. If this problem does not impress you, consider the similar problem for a language somewhat richer than the language of regular expressions:
     L_EQ-Java = { j1 = j2 | j1 and j2 are Java programs that behave identically }
   Those in the know might even suspect that writing a program to recognize strings that belong to this language is more than difficult.
6. For now, let's stick to regular languages and ask whether a set like
     L_EQ-RE = { e1 = e2 | e1 and e2 are regular expressions over Σ and L(e1) = L(e2) }
   is regular.
7. In fact, let's start with something even easier. In your last homework assignment I mentioned that
     { a+b = c | a, b, c ∈ {0, 1}* and the sum of the numbers represented by a and b in binary notation is the number represented by c }
   was not regular. Let's consider an even simpler representation of addition:
     { a + b = c | a, b, c ∈ {1}* and the sum of the numbers represented by a and b in unary notation is the number represented by c }
8. That is, we would like to determine whether the language
     L_UnaryAdd = { 1^a + 1^b = 1^c | 1^k refers to a string of k 1s and a + b = c }
   is regular.

Getting Loopy
1. One approach to showing that a particular language is not regular involves recognizing that strings of sufficient length will encounter loops of states as they are processed by a DFA. I want to make this notion very concrete for you before using it in a more abstract way to show languages are not regular.
2. Consider the following machine (which happens to be the N = 5 version of the "binary numbers that are multiples of N" DFA we presented using our formal notation earlier).
   [Diagram: the multiples-of-5 DFA]
3. What I would like to do with this machine is look for simple cycles. To be more precise, I would like to look for strings of varying lengths that take the machine from state to state while repeating at least one but as few as possible of the states in the path. Here are some examples:
   [Table with columns "input" and "path"]
   I haven't included inputs that caused simple cycles of length 5 and 6 because there are no such simple cycles in this machine's state graph.
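The search for simple cycles can be tried out with a short Python sketch. It rests on assumptions: the machine is modeled as five remainder states 0 to 4 with transition (2q + b) mod 5, any separate start state is ignored, and a "simple cycle" is taken to mean a path that leaves state 0 and returns to state 0 without visiting any state twice along the way. The function names are hypothetical.

```python
from itertools import product

N = 5  # assumed modulus of the "multiples of N" machine


def step(q, bit):
    """Assumed transition: appending bit b to a number with remainder q."""
    return (2 * q + int(bit)) % N


def simple_cycles(start=0, max_len=N + 1):
    """Map each length n to the (input, path) pairs that leave `start`
    and return to it after n steps without repeating a state first."""
    found = {}
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            path = [start]
            for b in bits:
                path.append(step(path[-1], b))
            # The first n states must all differ, and the last must close the loop.
            if path[-1] == start and len(set(path[:-1])) == n:
                found.setdefault(n, []).append(("".join(bits), path))
    return found
```

Under these assumptions the search turns up cycles of only a few lengths and none of length 5 or 6, which agrees with the remark above. A cycle of length 6 is impossible in any case, since only five distinct states are available.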