CS18 Integrated Introduction to Computer Science
Fisler, Nelson

Homework 4: Hash Tables
Due: 5:00 PM, Mar 9, 2018

Contents
1 DIY Grep
2 Chaining Hash Tables
3 Hash Table Iterator

Objectives

By the end of this homework, you will know:
• how to iterate over the elements of a hash table

By the end of this homework, you will be able to:
• implement a hash table
• use hash tables to represent sets (as well as dictionaries)

How to Hand In

For this (and all) homework assignments, you should hand in answers for all the non-practice questions. For this homework specifically, this entails answering the Hash Tables, Hash Table Iterator, and DIY Grep questions.

In order to hand in your solutions to these problems, they must be stored in appropriately-named files with the appropriate package header in an appropriately-named directory. The source code files should comprise the hw04.src package, and your solution code files, the hw04.sol package.

Begin by copying the source code from the course directory to your own personal directory. That is, copy the following files from /course/cs0180/src/hw04/src/*.java to /course/cs0180/workspace/javaproject/src/hw04/src:
• IDictionary.java, containing public interface IDictionary<K,V>
• KeyNotFoundException.java, containing public class KeyNotFoundException
• KeyAlreadyExistsException.java, containing public class KeyAlreadyExistsException
• AbsHashTable.java, containing public abstract class AbsHashTable<K,V>
• IGrep.java, containing public interface IGrep

Do not alter these files!

After completing this assignment, the following solution files should be in your /course/cs0180/workspace/javaproject/sol/hw04/sol directory:
• DIY Grep
  - Grep.java, containing public class Grep, which implements the interface IGrep.
  - GrepTest.txt, which contains documentation of all of your tests for your Grep class.
• Hash Tables
  - Chaining.java, containing public class Chaining<K,V>, which extends AbsHashTable<K,V>.
  - HashTableTester.java, containing public class HashTableTester, which tests your hash table implementation.
• Hash Table Iterator
  - Iterator.tex, which contains your answers to the Hash Table Iterator question.

To hand in your files, navigate to the /course/cs0180/workspace/javaproject/ directory, and run the command cs018 handin hw04. This will automatically hand in all of the above files. Once you have handed in your homework, you should receive an email, more or less immediately, confirming that fact. If you don't receive this email, try handing in again, or ask the TAs what went wrong.

Java's Built-in Hash Tables

In this homework, you will be completing an implementation of a chaining hash table. For the problem that does not involve implementing hash tables,[1] you can (and should) make use of one of Java's built-in hash tables, specifically, HashMap or HashSet. These classes differ in that the former uses a hash table to represent a dictionary,[2] while the latter uses a hash table to represent a set.

Observe that a set is a special case of a dictionary; the keys are the elements of the set, and there are no values.[3] Consequently, it is straightforward to use a hash table to represent a set.

You can use Java's HashMap class by importing java.util.HashMap. Likewise, you can use Java's HashSet class by importing java.util.HashSet. Documentation on how to use both classes can be found in the official Java API documentation.

[1] DIY Grep
[2] Also called a map; however, we refrain from using this term in CS 18, since it means something entirely different in CS 17/CS 19.
[3] Recall the invariant that dictionaries do not allow duplicate keys. That is why it makes sense to view sets as dictionaries.
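
To make the dictionary-versus-set distinction concrete, here is a minimal sketch of both built-in classes in use. The class name BuiltInTablesDemo, its method names, and the sample words are made up for illustration and are not part of the homework's source files.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; not part of the hw04 packages.
class BuiltInTablesDemo {

    // Uses a HashMap as a dictionary from each word to its number of occurrences.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // getOrDefault supplies 0 the first time a word is seen
            counts.put(word, counts.getOrDefault(word, 0) + 1);
        }
        return counts;
    }

    // Uses a HashSet to collect the distinct words: keys only, no values.
    static Set<String> distinctWords(String[] words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            seen.add(word); // adding a duplicate leaves the set unchanged
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] words = {"sing", "goddess", "sing"};
        Map<String, Integer> counts = countWords(words);
        Set<String> distinct = distinctWords(words);

        System.out.println(counts.get("sing"));               // 2
        System.out.println(distinct.size());                  // 2
        // The dictionary's keys are exactly the elements of the set.
        System.out.println(counts.keySet().equals(distinct)); // true
    }
}
```

The last line of main illustrates the observation above: discarding the values of a dictionary leaves a set of its keys.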

Problems

1 DIY Grep

Suppose you want to search a file for a word and retrieve all the line numbers on which that word appears. If you know that you'll have to support lots of queries, you'll probably want to preprocess your file to make it easier to look words up.

Task: Explain how you can preprocess a file to make it easy to look up the line numbers associated with each word in the file.

Hint: Use a dictionary.

Note: Write your answer to this question at the top of the Grep class, which the next task asks you to write.

Task: Write a class Grep with a constructor and a single method, lookup. Your constructor should take as input a filename and perform any necessary preprocessing. The lookup method should take as input a word and return the line numbers on which that word appears in the file. It should operate in expected constant time. As noted above, you can (and should) make use of Java's built-in hash table data structure, which is called HashMap, to solve this problem.

Notes:
• If the same word appears more than once on the same line, you should include the line number only once in your output.
• You should treat words as sequences of characters separated by whitespace; so "flow" and "flow!" are distinct words. Also, you can assume words are case sensitive; so "flower" and "Flower" are distinct words.

Hints:
• As part of the preprocessing step, you may want to use the split method in the String class, which splits up a string into pieces each time it encounters a given delimiter (specified as a regular expression), and stores those pieces in an array of strings.
• You might find the LineNumberReader class, which extends BufferedReader, useful. It has a method getLineNumber that gets the current line number.
• The Java syntax to create a HashMap that maps, for example, from a String to a Set of Integers is new HashMap<String, Set<Integer>>(). Likewise, the syntax to create a HashSet of Integers is new HashSet<Integer>().

Task: Write a main method for the Grep class.
The String[] args should correspond to a file name, and then at least one other word to look for, in that order. For example, running

    java hw04.sol.Grep /course/cs018/src/hw04/iliad tree water cats

should print something like:

    tree found on lines: 1353
    water found on lines: 686 7260 9731 15877 17749 20584
    cats is not found

Non-DIY Grep

What you've just written is very similar to one of the functions of the UNIX command grep, an extremely powerful and useful tool. You can use grep to search a file for a given pattern, and report where that pattern appears in the file, as follows:

    grep -n <pattern> <file>

The -n is an optional parameter that tells grep that we want it to report line numbers. It prints out each matching line of text next to its line number. This is but one of many, many grep features, which are fully documented on its man page (which you can access, if you want to learn more, by typing man grep into a terminal).

Here's an example of how you'd use grep with this line number option:

    grep -n polyp myfile

This command would print out something like:

    1: Lyn likes the word polyp.
    4: polyp
    9: Lyn needs to stop using the word polyp all the time.

Task: Test your Grep class. Document all of your tests in a file called GrepTest.txt. This file should have what you input to your Grep class, what the output was, and a short explanation (a sentence or less) of what you were trying to test with that input.

Note: As usual, you should make sure your testing is exhaustive. Test edge cases, and test unexpected program inputs.

You cannot easily test your implementation of grep using the Tester library, as you have become accustomed to doing in CS 18. But you can use grep to test your program, simply by comparing the output of your program with the output of grep.

Hint: We've included several test files (thankfully, none of which is the file shown above) in /course/cs018/src/poems/. But do not be afraid to create your own text files for testing, especially to try to catch edge cases! If you do this, make sure to hand in these files with the rest of your code.

2 Chaining Hash Tables

In this problem, you will implement a hash table using chaining.
Recall that the internal data structure of a hash table implemented using chaining is an array of lists. Although you have built your very own implementation of mutable linked lists by now, we recommend (insist, actually) that you use Java's built-in mutable lists, so that you get some practice using Java's mutable list interface. Java's LinkedList class lives in the java.util package.
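
For readers who have not used java.util.LinkedList before, the following is a small sketch of the operations a chaining bucket is likely to need: adding at either end, traversing, and removing an element during traversal. The class LinkedListDemo and the helper removeFirstMatch are hypothetical names chosen for this illustration, and the list holds plain strings rather than key-value pairs.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical demo class; not part of the hw04 packages.
class LinkedListDemo {

    // Removes the first element equal to target.
    // Returns true if an element was removed, false if none matched.
    static boolean removeFirstMatch(LinkedList<String> list, String target) {
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                // Iterator.remove is the safe way to delete while traversing
                it.remove();
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinkedList<String> bucket = new LinkedList<>();
        bucket.addFirst("b");
        bucket.addFirst("a"); // list is now [a, b]
        bucket.addLast("c");  // list is now [a, b, c]
        System.out.println(bucket);                        // [a, b, c]

        System.out.println(removeFirstMatch(bucket, "b")); // true
        System.out.println(bucket);                        // [a, c]
        System.out.println(removeFirstMatch(bucket, "z")); // false
    }
}
```

Removing through the iterator, rather than calling a removal method on the list inside a for-each loop, avoids a ConcurrentModificationException.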

In Java, you cannot create an array of a generic type. In order to circumvent this limitation, you should do the following to create your hash table:

    this.data = (LinkedList<KVPair<K,V>>[]) new LinkedList[size];

This situation is one of a few exceptions (equals is another) when casting is acceptable; in general, however, it's still discouraged.

This line of code will generate a warning that there is an unchecked cast. To get rid of the warning, above any methods that cast like this, you should write:

    @SuppressWarnings("unchecked")

Typically, warnings such as these indicate a problem with your code, so you should not suppress them; you should pay attention to them! In this specific instance, however, we justify its use as we are trying to get around a Java limitation.

Task: Extend AbsHashTable<K,V> with a class Chaining<K,V>. The Chaining constructor should take as input a size variable, which specifies the size of the table.

Hint: The key method to write in Chaining is findKVPair. Write this method first, and use it as a helper when you write insert and delete.

Task: Test your Chaining class thoroughly. Since your class is generic, be sure to use multiple types in your testing. Additionally, since we are working with a mutable data structure, be sure to have setup methods. Put your tests in the HashTableTester.java file.

3 Hash Table Iterator

Note: For this problem, you need not submit any Java code. A high-level description of each algorithm is enough.

Whenever you implement a class, you should always override equals, toString, and hashCode. Whenever you implement a collection, in addition, you should override iterator. We did this successfully for linked lists, and in this problem, you will think about how to do this for hash tables. (You may assume that the hash table does not change while you are iterating; no items are inserted or deleted.)

Note: While not part of the assignment, take a moment to think about how you would use an iterator to write equals and toString for hash tables.

First, let's think about how an iterator for a chaining hash table might work. As a first attempt, you could try iterating over all the slots in the hash table: if a slot is empty, you can skip right over it; but if it is not empty, you would then iterate over the bucket stored at that slot.

Task: The iterator we just described would examine all n slots in the chaining hash table, even though there might not be data stored at all, or even most, of them. Explain how to implement a more efficient chaining iterator that only examines slots which store data. Your iterator should not affect the run time of the hash table's basic operations; that is, the run times of lookup, insert, update, and delete should not change.
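
As a point of reference for the task above, the naive slot-by-slot iterator might be sketched roughly as follows. This is only an illustration of the first attempt, not the more efficient design the task asks for. The class name NaiveBucketIterator is hypothetical, and the sketch assumes the table is an array of LinkedList buckets in which an unused slot may be either null or an empty list.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical sketch of the naive iterator: it visits every slot,
// whether or not the slot stores any data.
class NaiveBucketIterator<E> implements Iterator<E> {
    private final LinkedList<E>[] slots; // the table's array of buckets
    private int slotIndex = 0;           // next slot to examine
    private Iterator<E> current = null;  // iterator over the current bucket

    NaiveBucketIterator(LinkedList<E>[] slots) {
        this.slots = slots;
    }

    @Override
    public boolean hasNext() {
        // Skip null, empty, and exhausted buckets until an element is found
        // or the slots run out.
        while (current == null || !current.hasNext()) {
            if (slotIndex >= slots.length) {
                return false;
            }
            LinkedList<E> bucket = slots[slotIndex++];
            current = (bucket == null) ? null : bucket.iterator();
        }
        return true;
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

A full traversal with this sketch takes time proportional to the number of slots plus the number of stored items, which is the cost the task asks you to improve on when most slots are empty.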

Hint: Consider augmenting the key-value pairs stored in your dictionary with additional fields.

Task: Discuss the trade-offs between the naive iterator we proposed, and your iterator design.

Please let us know if you find any mistakes, inconsistencies, or confusing language in this or any other CS18 document by filling out the anonymous feedback form: http://cs.brown.edu/courses/cs018/feedback.