CS2 Practical 2
Finite automata

This practical is based on material in the language processing thread. It is made up of two parts. Part A consists of four paper and pencil exercises, designed to test your understanding of the lecture material. Your answers to this part should be submitted to your tutor, by the deadline below, using whatever mechanism your tutor arranges for receiving submissions. For part B you must write and submit part of a Java application, which can be used to convert NFAs into equivalent DFAs and to simulate the execution of DFAs. Your answer to this part should be submitted electronically, using the handin command, as described below.

Each part of the practical is worth 50% of the mark. Note that there are no dependencies between parts A and B of the practical. You are strongly encouraged to start on both parts as soon as possible. In particular, it is not advisable to leave part B until you have completed part A.

This practical was issued on Monday 5th November, and the deadline for submitting answers to both parts is 5pm on Monday 19th November. Late submissions will only be accepted in the case of genuine mitigating circumstances, for example illness, and with the agreement of the course organiser. Your marked work for this practical will be returned to you by your tutor during the tutorial in the week starting Monday 3rd December.

Remember that you will need at least a grade C (i.e. at least 50%) in your final CS2 mark to proceed to any of the Computer Science or Software Engineering single or joint honours courses. You should also bear in mind the guidelines on plagiarism, which can be found via a link on the CS2 Web page.

Resources

The practical handout refers to various files. These can all be found on the webpage http://www.dcs.ed.ac.uk/teaching/cs2/www/practicals.html.
Part A [50]

1. (a) Design a DFA that recognises the language L over the alphabet {a, b, c} consisting of all strings that contain the substring ababc, i.e., the language
       L = { xababcy | x, y ∈ {a, b, c}* }.
   Draw a picture of this DFA and write out the formal specification of the DFA including its transition table.
   (b) Design a DFA that accepts precisely those strings over {a, b, c} that are not contained in the language L defined in (a), i.e., a DFA that recognises the complement of L,
       { x ∈ {a, b, c}* | x ∉ L }.
   Draw a picture of this DFA.

2. Which of the following languages over the alphabet {0, 1, 2} are regular?
       L1 = { 0^n 11 2^(2n) | n ∈ N },
       L2 = { (01122)^n | n ∈ N },
       L3 = { 0^m 11 2^(2n) | m, n ∈ N }.
   Justify your answers, either by providing a DFA, an NFA, or a regular expression for the language to show that it is regular, or by proving that the language is not regular. Use the Pumping Lemma to prove that a language is not regular.

3. (a) Give regular expressions for the following two languages over the alphabet {0, 1}:
       L1 = { x ∈ {0, 1}* | x contains an even number of 0s },
       L2 = { x ∈ {0, 1}* | x contains an even number of 0s and an odd number of 1s }.
   Hint: To find the regular expressions, you may first want to design DFAs recognising the languages and then convert them to regular expressions using the method described in Lecture Note 4.
   (b) Construct an NFA with ε-transitions that recognises the language L(R) of the following regular expression R over the alphabet {a, b}:
       R = (a + b(ab a) b)

4. For a language L ⊆ Σ*, let
       SH(L) = { y ∈ Σ* | there is an x ∈ Σ* with |x| = |y| and xy ∈ L },
   the language consisting of the second halves of all strings of even length in L. For example, if L = {ε, a, abba, aba, aaba, aa} then SH(L) = {ε, ba, a}. Show that if L is regular then SH(L) is also regular. (Note: This problem is quite difficult; it is recommended that you complete the rest of the practical before attempting to solve it.)
Problem 1 is worth 10% of your mark, Problems 2 and 3 are worth 15% each, and Problem 4 is worth 10%. Submit your answers to Part A following the instructions given to you by your tutor. You must submit before the deadline of 5pm on Monday 19th November.

Part B [50]

The directory for this practical contains a partially implemented Java application, Prac2, for simulating finite automata. Your task is to complete this application and apply it to a pattern matching problem.

The files and Java classes

The directory contains the following files:

- A file FA.java, which contains the code of an abstract class FA implementing most of the features of finite automata. States of finite automata are represented by String objects, and letters of the alphabet by chars. The most important public methods of the class are:
    - the constructor public FA(), which creates an empty automaton object,
    - the constructor public FA(String filename), which reads an automaton from a file (the format of this file is explained below),
    - a number of public methods for modifying an existing automaton object, for example a method public void addState(String q) that adds a state to an automaton, or a method public void setStartState(String q) that defines the start state,
    - a number of methods for obtaining information about an existing automaton object, for example a method public boolean isState(String q) that tests whether the string q represents a state of the automaton,
    - a method public String toString() that writes the whole automaton into a string in the same format as the file used in the constructor.
  For details, look at the source code.
- Files DFA.java and NFA.java, which contain the code of two subclasses DFA and NFA of FA implementing the remaining features of DFAs and NFAs, respectively. The reason for this design is that DFAs and NFAs share most of their basic functionality, but have different types of transition functions.
  So most features related to the transitions have to be implemented separately. Most notably, the method
      public String getNextState(String q, char c)
  of the class DFA returns an object of type String representing the next state of a DFA when reading c in state q, and the method
      public LLSet getNextStates(String q, char c)
  of the class NFA returns an object of type LLSet representing the set of possible next states of an NFA when reading c in state q.
- A file LLSet.java, which contains the implementation of a class LLSet that can be used to represent sets. The implementation is based on linked lists; technically, LLSet is a subclass of the class LinkedList. The important property of sets, as opposed to simple linked lists, is that no object can occur twice in a set. We use LLSets to represent sets of next states in an NFA.
- A file FAGui.java, which contains the code of the graphical user interface.
- A file Prac2.java, which contains the main method of the application.
- Files RunDFA.java and NFA2DFA.java, in which you are supposed to implement the following three methods:
      public static boolean accepts(DFA M, String s),
      public static LinkedList extract(DFA M, String filename),
      public static DFA convert(NFA N).
- A file TokExample.java, which contains a class illustrating the use of the Java class StreamTokenizer.
- Example files nfa1, nfa1a, dfa1, dfa2, which contain a few example automata.

You should begin by copying all the files to a new directory in your own filespace (and remember to read-protect the directory; look at the CS2 Practical 1 handout if you don't remember how to do these things). You can then compile the currently incomplete application:
    javac Prac2.java
To execute it, type
    java Prac2
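The set property described for LLSet (no object occurs twice) can be illustrated with a small stand-alone class. This is only a sketch under assumptions: ListSetSketch is a hypothetical name, it is not the LLSet class supplied with the practical, and it is written with generics for a current Java version. It guards only the add method; other LinkedList methods such as addFirst or addAll are left unchanged and could still introduce duplicates.

```java
import java.util.LinkedList;

/**
 * Hypothetical stand-in for a list-backed set; NOT the LLSet class
 * shipped with the practical, whose methods may differ.
 */
class ListSetSketch<E> extends LinkedList<E> {

    /** Appends the element only if it is not already present. */
    @Override
    public boolean add(E element) {
        if (contains(element)) {
            return false;           // duplicate: the set is unchanged
        }
        return super.add(element);  // new element: append to the list
    }

    public static void main(String[] args) {
        ListSetSketch<String> states = new ListSetSketch<>();
        states.add("q0");
        states.add("q1");
        states.add("q0");           // ignored, "q0" is already a member
        System.out.println(states); // prints [q0, q1]
    }
}
```

Because membership is checked with contains, each insertion takes time linear in the size of the set, which is acceptable for the small state sets that occur in this practical.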
[Figure 1: The NFA stored in file nfa1, with states q0, q1, q2, q3, q4]

The graphical user interface

Starting the application produces a GUI containing an input field, a result field, and eight buttons with the following functionalities:

- Load NFA: Asks you to enter a filename in the input field, then loads the NFA specified in this file and displays it in the result field.
- Load DFA: Same for a DFA instead of an NFA.
- Save DFA: Asks you to enter a filename in the input field and then saves the currently loaded DFA into a file with that name.
- Save Result: Asks you to enter a filename in the input field and then saves the current content of the result field into a file with that name.
- Convert: Converts the NFA currently loaded into a DFA accepting the same language and displays this DFA in the result field. This functionality is not yet implemented.
- Run DFA: Asks you to enter a string in the input field and tests whether the DFA currently loaded accepts this string. This functionality is not yet implemented.
- Extract: Asks you to enter a filename in the input field and displays, in the result field, all words in the file with this name that are accepted by the DFA currently loaded. This functionality is not yet implemented.
- Quit: Quits the application.

The file format

The format in which finite automata are stored for this application is largely self-explanatory. Open the file nfa1. It contains a description of the automaton in Figure 1. Since it is often tedious to write out the full alphabet, intervals of consecutive letters may be represented using a dash, as in A-Z. This is allowed both in the alphabet section and in the transitions section of the file. Open the file nfa1a to see how this allows a more compact representation of the automaton in Figure 1.
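The dash notation can be read as shorthand for listing every character in the interval. The following sketch shows one way such an entry could be expanded. It is an illustration only: IntervalSketch and expand are hypothetical names, the parser in FA.java may work quite differently, and the sketch assumes that both end points belong to the interval, as A-Z suggests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the practical's own parser in FA.java may differ. */
class IntervalSketch {

    /**
     * Expands one alphabet entry: a single character such as "x",
     * or an interval such as "A-Z" (both ends assumed to be included).
     */
    static List<Character> expand(String entry) {
        List<Character> letters = new ArrayList<>();
        if (entry.length() == 3 && entry.charAt(1) == '-') {
            // Interval: walk through the character codes from start to end.
            for (char c = entry.charAt(0); c <= entry.charAt(2); c++) {
                letters.add(c);
            }
        } else if (entry.length() == 1) {
            letters.add(entry.charAt(0));
        } else {
            throw new IllegalArgumentException("not a letter or interval: " + entry);
        }
        return letters;
    }

    public static void main(String[] args) {
        System.out.println(expand("0-3")); // prints [0, 1, 2, 3]
        System.out.println(expand("x"));   // prints [x]
    }
}
```

A transition line that uses an interval can be understood in the same way: it stands for one transition per character in the expanded list.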
In general, a file representing a DFA or an NFA consists of:

- A field enclosed by tags <states> and </states>, which contains strings representing the states of the automaton, separated by whitespace. Strings representing states may consist of all printable ASCII symbols except the blank, or more precisely, of all ASCII symbols whose code is between 33 and 126.
- A field enclosed by tags <alphabet> and </alphabet>, which contains either single characters (whose ASCII code is between 33 and 126) representing letters of the alphabet of the automaton, or expressions of the form c1-c2, where c1 and c2 are characters, representing all characters whose ASCII code is between that of c1 and that of c2. Note that no whitespace is allowed between the dash and the enclosing letters.
- A field enclosed by tags <startstate> and </startstate>, which contains a string representing the start state of the automaton.
- A field enclosed by tags <finalstates> and </finalstates>, which contains strings representing the final states of the automaton, separated by whitespace.
- A field enclosed by tags <transitions> and </transitions>, which contains the transitions of the automaton. Transitions are represented by lines
      q1 c q2
  saying that if the automaton is in state q1 and reads character c, it may proceed to state q2. Similarly to the alphabet section, transitions may also be represented in the form
      q1 c1-c2 q2
  saying that if the automaton is in state q1 and reads any character c whose ASCII code is between that of c1 and that of c2, it may proceed to state q2.

As a final example, look at the file dfa1, which contains a DFA accepting the same language as the NFA in nfa1.

Your Tasks

1. Implement the method
       public static boolean accepts(DFA M, String s)
   of the class RunDFA (in the file RunDFA.java). Given an object M of type DFA and an object s of type String, this method is supposed to return true if the DFA represented by M accepts the string s and false otherwise.

2. Implement the method
       public static LinkedList extract(DFA M, String filename)
   of the class RunDFA (in the file RunDFA.java). Given an object M of type DFA, representing a DFA whose alphabet is {A, ..., Z, a, ..., z}, and an object filename of type String, this method is supposed to read the text file specified by filename, split it into words that contain only letters (i.e., characters between A and Z and between a and z), and return a linked list consisting of all those words that are accepted by the DFA represented by M.

3. Implement the method
       public static DFA convert(NFA N)
   of the class NFA2DFA (in the file NFA2DFA.java). Given an object N of type NFA, this method is supposed to return an object M of type DFA such that the NFA represented by N and the DFA represented by M recognise the same language.

4. Specify an NFA N that accepts the language L consisting of all strings over the alphabet Σ = {A, B, ..., Z, a, b, ..., z} that contain the letters a, b, c in any order. (Thus, for example, tobacco and subtraction are in L, and char is not.) Write N into a file abc-nfa in the format specified above. Use the convert function to transform it into an equivalent DFA, and save this DFA in a file abc-dfa. Then use the extract function to extract all words that contain the letters a, b, c from the file /usr/dict/words. Save the result in a file abc-words.

When implementing the methods, it is advisable to split the tasks into subtasks handled by separate methods (which you can implement as private methods of the respective classes). It is important that you comment your code appropriately.

A complete solution consists of the (modified) files RunDFA.java and NFA2DFA.java and the files abc-nfa and abc-words. Ensure that all your files contain your name, email address, tutor's name, and submission date. Submit your solution using the commands
    handin 2 RunDFA.java
    handin 2 NFA2DFA.java
    handin 2 abc-nfa
    handin 2 abc-words
You may do this any time after 5pm on Monday 12th November, up until the deadline of 5pm on Monday 19th November.
Bear in mind that the computer labs are likely to be busy just before the practical deadline. It is up to you to ensure that you plan your work in order to submit in time. (Note that if you modify your code once you have submitted then you can resubmit and your earlier submission will be overwritten.)
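For the input processing in task 2, splitting a text into words that contain only letters might look roughly like the following. This is a sketch under assumptions, not the code in TokExample.java: WordSplitSketch and letterWords are hypothetical names, and the example reads from a StringReader instead of a file so that it can be run on its own. It relies on the StreamTokenizer methods resetSyntax and wordChars to declare A to Z and a to z as the only word characters, so every other character ends the current word.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.LinkedList;

/** Hypothetical helper, independent of the practical's classes. */
class WordSplitSketch {

    /** Returns the maximal runs of letters A-Z and a-z found in the input. */
    static LinkedList<String> letterWords(Reader in) throws IOException {
        StreamTokenizer tokens = new StreamTokenizer(in);
        tokens.resetSyntax();          // make every character "ordinary"
        tokens.wordChars('A', 'Z');    // then allow only letters in words
        tokens.wordChars('a', 'z');

        LinkedList<String> words = new LinkedList<>();
        while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokens.ttype == StreamTokenizer.TT_WORD) {
                words.add(tokens.sval); // sval holds the text of a word token
            }
            // Any other token is a single non-letter character and is skipped.
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(letterWords(new StringReader("Hello, world! x2y")));
        // prints [Hello, world, x, y]
    }
}
```

To read from a file instead, the StringReader could be replaced by a java.io.FileReader wrapped in a BufferedReader; each returned word would then be tested against the DFA.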
Assessment

Assessment for part B will include marks assigned on the basis of how your program behaves on a test suite, and also marks for good programming style (including commenting the code). To get reasonable marks:

- You must submit code that compiles.
- You should test your code yourself on a number of suitably chosen test cases to ensure that it works.
- You must submit following the instructions above. (I.e. you must give the files the correct filenames, and you must use the handin command.)

Tasks 1, 2, and 4 are worth 10% of your mark each, and task 3 is worth 20%.

Hints

- After you have implemented task 1, test the accepts method by running the example DFAs, for instance those in the files dfa1 and dfa2, on a couple of example strings.
- To process the input file for task 2, you may want to use a StreamTokenizer. The file TokExample.java gives an example of how this can be done. Also recall CS1 Lecture Notes 25 and 27. Consult the Java API documentation to learn more about StreamTokenizers.
- For task 3, use the method described in Lecture Note 3. Recall that in this method, the states of the DFA you obtain are sets of states of the original NFA. Sets can be represented as objects of the class LLSet (defined in the file LLSet.java). Since the states of an automaton are represented by strings, you have to turn the LLSets into strings using the method LLSet.toString(). Familiarise yourselves with the Java class java.util.LinkedList and its subclass LLSet. Recall CS1 Lecture Note 26 on Lists and Iterators and also look at the Java API documentation.
- Note that we are exclusively dealing with NFAs without ε-transitions. This simplifies the conversion to DFAs, because in this case, for every state q of an NFA and every letter a of its alphabet, the set s with q --a--> s (in the notation of Lecture Note 3) is simply the (q, a)-entry in the transition table, that is, the set returned by the method getNextStates of the class NFA.
- To see the result of an example conversion, consider the file nfa1 describing an NFA and the file dfa1 showing the result of converting this NFA into a DFA recognising the same language.
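The idea behind these conversion hints, that each DFA state is a set of NFA states, can be sketched independently of the practical's classes. The code below is a hedged illustration of the usual subset construction for an NFA without ε-transitions, which is assumed here to be the method the hints refer to. SubsetSketch, convert and name are hypothetical names, the NFA is given as plain nested maps instead of an NFA object, and the sketch needs Java 9 or later for Map.of and Set.of. It only builds the transition table; final states are not computed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; it does not use the practical's NFA, DFA or LLSet classes. */
class SubsetSketch {

    /** Names a set of NFA states without blanks, e.g. {q0,q1}. */
    static String name(Set<String> states) {
        return "{" + String.join(",", states) + "}";
    }

    /**
     * nfa.get(q).get(c) is the set of possible next states (absent means empty).
     * Returns the DFA transition table over the reachable state sets.
     */
    static Map<String, Map<Character, String>> convert(
            Map<String, Map<Character, Set<String>>> nfa, String start, char[] alphabet) {
        Map<String, Map<Character, String>> dfa = new HashMap<>();
        Deque<TreeSet<String>> todo = new ArrayDeque<>();  // sets still to be processed
        TreeSet<String> first = new TreeSet<>();
        first.add(start);
        dfa.put(name(first), new HashMap<>());
        todo.add(first);

        while (!todo.isEmpty()) {
            TreeSet<String> current = todo.remove();
            for (char c : alphabet) {
                // The successor set is the union of the NFA's next-state sets.
                TreeSet<String> next = new TreeSet<>();
                for (String q : current) {
                    Map<Character, Set<String>> row = nfa.get(q);
                    if (row != null && row.containsKey(c)) {
                        next.addAll(row.get(c));
                    }
                }
                if (!dfa.containsKey(name(next))) {        // newly discovered DFA state
                    dfa.put(name(next), new HashMap<>());
                    todo.add(next);
                }
                dfa.get(name(current)).put(c, name(next));
            }
        }
        return dfa;
    }

    public static void main(String[] args) {
        // Example NFA: q0 loops on 0 and 1, q0 -0-> q1, q1 -1-> q2.
        Map<Character, Set<String>> q0 = Map.of('0', Set.of("q0", "q1"), '1', Set.of("q0"));
        Map<Character, Set<String>> q1 = Map.of('1', Set.of("q2"));
        Map<String, Map<Character, Set<String>>> nfa = Map.of("q0", q0, "q1", q1);

        Map<String, Map<Character, String>> dfa = convert(nfa, "q0", new char[] {'0', '1'});
        System.out.println(dfa.size());                 // prints 3
        System.out.println(dfa.get("{q0,q1}").get('1')); // prints {q0,q2}
    }
}
```

In a full conversion, a DFA state would be made final exactly when its set contains at least one final state of the NFA. Using a sorted set for each DFA state ensures that the same set always receives the same name.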
- To test whether your conversion algorithm works properly, you can (but do not have to) use the method public boolean isDeterministic() of the class NFA.
- Do not make changes to any classes except RunDFA and NFA2DFA, or your submitted code may not work properly when we compile it and test it with the original classes.

Martin Grohe