CSCI E-119 Section Notes: Section 11 Solutions

1. Double Hashing

Suppose we have a 7-element hash table, and we wish to insert the following words: apple, cat, anvil, boy, bag, dog, cup, down.

We use the hash functions:
h1(key) = position of the first letter of the word in the alphabet (a = 0, b = 1, ...)
h2(key) = length of the word (e.g., h2("apple") = 5)

With double hashing, probe number i (starting from i = 0) for a key examines position (h1(key) + i*h2(key)) % 7. Let's go through inserting the elements and count the total length of the probes:

First we insert apple. h1("apple") = 0 is not occupied, so apple goes in position 0. Total probe length = 1.

Next we insert cat. h1("cat") = 2 is not occupied, so cat goes in position 2. Total probe length = 1 + 1 = 2.

Next we insert anvil. h1("anvil") = 0, which is occupied. h2("anvil") = 5, and position 0 + 5 = 5 is not occupied, so anvil goes there. Total probe length = 2 + 2 = 4.

Next we insert boy. h1("boy") = 1, which is not occupied, so boy goes in position 1. Total probe length = 4 + 1 = 5.
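The procedure followed in these steps can be replayed with a short Java sketch. It is a hypothetical illustration, not the course's HashTable class: the class and method names are made up, h1 is assumed to be the first letter's alphabet position taken mod 7, and h2 (the word length) is assumed never to be a multiple of 7, since otherwise every probe would land on the same position. It inserts all eight words, including the ones handled in the steps that follow.

```java
import java.util.Arrays;

// Hypothetical sketch of double hashing on a 7-element table.
// Assumptions: h1 = alphabet position of the first letter (a = 0, b = 1, ...)
// taken mod the table size; h2 = word length, never a multiple of SIZE.
class DoubleHashingDemo {
    static final int SIZE = 7;

    static int h1(String key) {
        return (key.charAt(0) - 'a') % SIZE;
    }

    static int h2(String key) {
        return key.length();
    }

    // Probes (h1 + i*h2) % SIZE for i = 0, 1, ..., SIZE - 1 and stores the key
    // in the first empty position. Returns the number of positions probed;
    // if every probe hits an occupied cell, the key is not stored.
    static int insert(String[] table, String key) {
        int probes = 0;
        for (int i = 0; i < SIZE; i++) {
            int pos = (h1(key) + i * h2(key)) % SIZE;
            probes++;
            if (table[pos] == null) {
                table[pos] = key;
                return probes;
            }
        }
        return probes;   // table full along this probe sequence
    }

    // Inserts the words in order and returns the total probe length.
    static int insertAll(String[] table, String... words) {
        int total = 0;
        for (String w : words) {
            total += insert(table, w);
        }
        return total;
    }

    public static void main(String[] args) {
        String[] table = new String[SIZE];
        int total = insertAll(table, "apple", "cat", "anvil", "boy",
                              "bag", "dog", "cup", "down");
        System.out.println(Arrays.toString(table));
        System.out.println("Total probe length = " + total);
    }
}
```

Running main prints [apple, boy, cat, dog, bag, anvil, cup] and a total probe length of 22, which agrees with the hand computation.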

Next we insert bag. h1("bag") = 1, which is occupied. h2("bag") = 3, and position 1 + 3 = 4 is unoccupied, so bag goes there. Total probe length = 5 + 2 = 7.

Next we insert dog. h1("dog") = 3, which is not occupied, so dog goes in position 3. Total probe length = 7 + 1 = 8.

Next we insert cup. h1("cup") = 2, which is occupied, and h2("cup") = 3:
(2 + 3) % 7 = 5 is occupied.
(2 + 2*3) % 7 = 8 % 7 = 1 is occupied.
(2 + 3*3) % 7 = 11 % 7 = 4 is occupied.
(2 + 4*3) % 7 = 14 % 7 = 0 is occupied.
(2 + 5*3) % 7 = 17 % 7 = 3 is occupied.
(2 + 6*3) % 7 = 20 % 7 = 6 is not occupied, so cup goes there.
Total probe length = 8 + 7 = 15.

The table now looks like this:
0: apple
1: boy
2: cat
3: dog
4: bag
5: anvil
6: cup

Finally, we cannot insert down because the table is full. h1("down") = 3 and h2("down") = 4, so the probe sequence visits positions 3, 0, 4, 1, 5, 2, 6, and all seven are occupied. Total probe length = 15 + 7 = 22.

2. The probe() method in our HashTable class (REVISITED)

The return value of the probe() method is an integer. In some cases, it represents the index of the key that we're searching for. In other cases, it represents the index of the first empty or removed cell encountered during the search for the specified key.

0: aardvark
1: (removed)
2: cat
3: bear
4: (empty)
5: dog
6: (removed)

The hash table above has been partially filled using linear probing and the hash function h1 from problem 1. A cell marked (removed) indicates that an item has been removed from it.

One of the items in the table has been inserted incorrectly. Which one, and how do you know?

dog is misplaced. Its hash code is 3, because it begins with d. Position 3 may have been filled when dog was inserted, which explains why it wasn't put there. However, because position 4 is empty, dog should have been inserted there, and it wasn't. Note that position 4 could not have been occupied previously, because it isn't marked as removed.

For each of the keys below, determine:
i. the probe length
ii. the return value of the probe() method
Assume that each key is only searched for: none of these keys is actually inserted in the table, so every search starts from the table shown above.

a. bear
h1("bear") = 1. Position 1 is a removed cell, so the probe() method takes note of that and continues probing. Position 2 is filled with a different key, so it moves on to position 3, which contains the key we are searching for. Thus, the method returns 3. Probe length = 3 (positions 1, 2, and 3).

b. cow
h1("cow") = 2. Position 2 is filled with a different key, so the probe() method moves on to position 3, which is also filled with a different key. Position 4 is empty, so the probe() method breaks out of the while loop and returns 4. Probe length = 3 (positions 2, 3, and 4).

c. buffalo
h1("buffalo") = 1. Position 1 is a removed cell, so the probe() method takes note of that and continues probing. Position 2 is filled with a different key, so it moves on to position 3, which is also filled with a different key. Position 4 is empty, so the probe() method breaks out of the while loop. Because it encountered a removed cell (position 1), it returns that position, so that a newly inserted value could be put there. Return value = 1. Probe length = 4 (positions 1, 2, 3, and 4).

d. giraffe
h1("giraffe") = 6. Position 6 is a removed cell, so the probe() method takes note of that and moves on to position (6 + 1) % 7 = 0, which is filled with a different key, so it moves on to position 1.
Position 1 is also a removed cell, but it is not the first one encountered, so the probe() method does not record its position; it simply moves on to position 2. Position 2 is filled with a different key, so it moves on to position 3, which is also filled with a different key. Position 4 is empty, so the probe() method breaks out of the while loop and returns the position of the first removed cell it encountered. Return value = 6. Probe length = 6 (positions 6, 0, 1, 2, 3, and 4).

What is the largest probe length that we could have for this table, regardless of its contents?

7, the length of the table. After 7 positions, the probe sequence repeats, so the probe() method gives up after trying 7 positions.
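The probe() behavior described in parts a through d can be sketched as follows. This is a hypothetical linear-probing version, not the actual HashTable code from lecture: the table is a plain String[] with null for an empty cell and an assumed REMOVED sentinel for a removed one, and the method returns the probe length alongside the index so that the answers above can be checked.

```java
// Hypothetical sketch of a linear-probing probe() method with removed cells.
// null marks an empty cell; the REMOVED sentinel marks a removed item.
class LinearProbeDemo {
    // A distinct object compared by identity (==), so no real key can match it.
    static final String REMOVED = new String("<removed>");

    static int h1(String key, int size) {
        return (key.charAt(0) - 'a') % size;
    }

    // Returns {returnValue, probeLength}. The return value is the key's index
    // if it is found; otherwise the first removed cell seen, if any; otherwise
    // the empty cell that ended the search. If all cells are checked without
    // finding the key or an empty cell, returns the first removed cell (or -1).
    static int[] probe(String[] table, String key) {
        int i = h1(key, table.length);
        int firstRemoved = -1;
        int numChecked = 0;
        while (numChecked < table.length) {
            numChecked++;
            String cell = table[i];
            if (cell == null) {
                // Empty cell: the key cannot appear later in the sequence.
                int result = (firstRemoved != -1) ? firstRemoved : i;
                return new int[] {result, numChecked};
            } else if (cell == REMOVED) {
                if (firstRemoved == -1) {
                    firstRemoved = i;   // remember only the first removed cell
                }
            } else if (cell.equals(key)) {
                return new int[] {i, numChecked};
            }
            i = (i + 1) % table.length;   // linear probing: try the next cell
        }
        return new int[] {firstRemoved, numChecked};
    }

    public static void main(String[] args) {
        String[] table = {"aardvark", REMOVED, "cat", "bear", null, "dog", REMOVED};
        for (String key : new String[] {"bear", "cow", "buffalo", "giraffe", "dog"}) {
            int[] r = probe(table, key);
            System.out.println(key + ": return value = " + r[0] + ", probe length = " + r[1]);
        }
    }
}
```

For bear, cow, buffalo, and giraffe it reports (3, 3), (4, 3), (1, 4), and (6, 6), matching the answers above. Probing for dog stops at the empty position 4 and never reaches position 5, so a lookup would not find dog where it is stored, which is another symptom of dog being misplaced. The loop also caps the probe length at table.length, which is 7 here.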

3. Comparing data structures

A local retailer wants to implement a simple in-memory database that can be used to access information about products. Although a snapshot of this database will be periodically copied to disk, the entire contents fit in memory, and your component of the application will operate only on data stored in memory. Here are the requirements specified by the retailer:
She wants to be able to retrieve product records by specifying the name of the product.
She wants to be able to specify the first n characters of a product name and to retrieve all records that begin with those characters.
She wants record retrieval to be as efficient as possible: on the order of 20 operations per retrieval, given a database of approximately one million records.
She wants to be able to increase the size of the database, adding large sets of new records, without taking the system offline.

Given this list of requirements, which data structure would be the better choice for this application: a binary search tree or a hash table? Or would these two data structures work equally well?

Let's consider each of the criteria in turn:

1) Retrieving product records by specifying the name of the product
Search tree: Assuming we use a balanced search tree, this takes O(log n). If the tree is unbalanced, this could take O(n).
Hash table: This should take constant time as long as there are not too many collisions, but it could in theory be O(n) if the hash function doesn't work well or the table becomes too full.

2) Specifying the first n characters of a product name and retrieving all records that begin with those characters
Search tree: Because the keys are kept in sorted order, all names with a given prefix are consecutive in an inorder traversal. In a balanced tree, we can descend to the first matching name in O(log n) steps and then visit the k matching records in order, pruning the rest of the tree, for O(log n + k) in total. The worst case is still O(n), for example if nearly every name matches or if the tree is unbalanced.
Hash table: This is difficult, since we probably have to go through the entire table (depending on the hash function used); names that share a prefix are typically scattered across the table. Most likely O(n).
3) Required time to retrieve
Search tree: One million ≈ 2^20, so log2 n ≈ 20, and a balanced tree needs about 20 comparisons per retrieval, which is within the specifications.
Hash table: O(1), but the number of operations could approach or exceed 20 if the hash function doesn't work well or the table becomes too full, that is, if there are many collisions.

4) Increasing the size of the database
Search tree: O(m log n) if the tree is kept balanced, where m is the number of records being added; O(mn) in the worst case (an unbalanced tree). This can be done without taking the system offline.
Hash table: Potentially O(m + n), because you may need to resize the hash table, which means copying the n existing records into a new table and then adding the m new ones. Additionally, this may require taking the system offline while the existing records are copied over to the new table.

Therefore, given the criteria, a search tree seems to work best, because it can retrieve all records whose names begin with a given prefix without going through the entire tree, and because an arbitrary number of records can be added without resizing or going offline. While the hash table has the potential for constant-time insertion and lookup, this is not much better than O(log n): for n = one million, log n is only about 20.

4. Graph Terminology and Representation

Consider the highway graph from lecture:

[Figure: the highway graph from lecture. Edges (distances): Albany-Worcester 134, Boston-Concord 74, Boston-Portsmouth 54, Boston-Providence 49, Boston-Worcester 44, Concord-Portland 84, Concord-Worcester 63, New York-Providence 185, Portland-Portsmouth 39, Portsmouth-Worcester 83, Providence-Worcester 42]

What are Worcester's neighbors in the graph?
Albany, Boston, Concord, Portsmouth, and Providence, because it is connected to each of them by a single edge.

Is the graph connected? Why or why not?
Yes, because there is a path between every pair of vertices.

Is it complete? Why or why not?
No, because there isn't an edge between every pair of vertices. For example, there is no edge between Albany and Boston.

Is it acyclic? If not, what is one example of a cycle in the graph?
No. One example of a cycle is the path Worcester, Boston, Providence, Worcester.

If we used an adjacency matrix to represent this graph, what would it look like? Assume that the vertices are numbered alphabetically: 0 = Albany, 1 = Boston, 2 = Concord, 3 = New York, 4 = Portland, 5 = Portsmouth, 6 = Providence, 7 = Worcester.

      0    1    2    3    4    5    6    7
0     -    -    -    -    -    -    -  134
1     -    -   74    -    -   54   49   44
2     -   74    -    -   84    -    -   63
3     -    -    -    -    -    -  185    -
4     -    -   84    -    -   39    -    -
5     -   54    -    -   39    -    -   83
6     -   49    -  185    -    -    -   42
7   134   44   63    -    -   83   42    -

The cells marked - are empty; in an actual implementation, all of the empty cells would hold a special value indicating the absence of an edge.
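A minimal Java sketch of this adjacency matrix, with a few of the questions above answered by scanning it. The names are hypothetical, NO_EDGE = 0 is an assumed choice for the "special value" (safe here because every distance is positive), and the distances are those of the lecture graph, including Portsmouth-Worcester (83), which is what makes Portsmouth one of Worcester's neighbors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: the highway graph as an adjacency matrix, with the
// vertices numbered alphabetically. NO_EDGE = 0 is an assumed sentinel.
class HighwayMatrix {
    static final String[] CITIES = {"Albany", "Boston", "Concord", "New York",
            "Portland", "Portsmouth", "Providence", "Worcester"};
    static final int NO_EDGE = 0;
    static final int N = CITIES.length;
    static final int[][] DIST = new int[N][N];   // Java zero-fills: all NO_EDGE

    static {
        // {vertex, vertex, distance}; each undirected edge fills two cells
        int[][] edges = {{0, 7, 134}, {1, 2, 74}, {1, 5, 54}, {1, 6, 49}, {1, 7, 44},
                {2, 4, 84}, {2, 7, 63}, {3, 6, 185}, {4, 5, 39}, {5, 7, 83}, {6, 7, 42}};
        for (int[] e : edges) {
            DIST[e[0]][e[1]] = e[2];
            DIST[e[1]][e[0]] = e[2];
        }
    }

    // Neighbors of v: scan row v for cells that hold an edge.
    static List<String> neighbors(int v) {
        List<String> result = new ArrayList<>();
        for (int u = 0; u < N; u++) {
            if (DIST[v][u] != NO_EDGE) result.add(CITIES[u]);
        }
        return result;
    }

    // Complete: every pair of distinct vertices is joined by an edge.
    static boolean isComplete() {
        for (int a = 0; a < N; a++)
            for (int b = 0; b < N; b++)
                if (a != b && DIST[a][b] == NO_EDGE) return false;
        return true;
    }

    // Connected: a traversal from vertex 0 reaches all N vertices.
    static boolean isConnected() {
        boolean[] seen = new boolean[N];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        seen[0] = true;
        int count = 1;
        while (!stack.isEmpty()) {
            int v = stack.pop();
            for (int u = 0; u < N; u++) {
                if (DIST[v][u] != NO_EDGE && !seen[u]) {
                    seen[u] = true;
                    count++;
                    stack.push(u);
                }
            }
        }
        return count == N;
    }

    public static void main(String[] args) {
        System.out.println("Worcester's neighbors: " + neighbors(7));
        System.out.println("connected? " + isConnected() + ", complete? " + isComplete());
    }
}
```

Running it lists Worcester's neighbors as [Albany, Boston, Concord, Portsmouth, Providence] and reports that the graph is connected but not complete. With a matrix, finding one vertex's neighbors always means scanning a full row of 8 cells.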

5. Graph Traversals

Let's try some additional traversals on the highway graph from lecture.

a. What order would the cities be visited in if we performed a depth-first traversal from Boston, and what is the resulting spanning tree? (Draw the spanning tree below.)

Order visited: Boston, Worcester, Providence, New York, Concord, Portland, Portsmouth, Albany.

Steps:
1) dftrav(Boston, null): visit Boston, set its parent reference to null, and make a recursive call on the unvisited neighbor that is the smallest distance away (Worcester).
2) dftrav(Worcester, Boston): visit Worcester, set its parent reference to Boston, and make a recursive call on the unvisited neighbor that is the smallest distance away (Providence).
3) dftrav(Providence, Worcester): visit Providence, set its parent reference to Worcester, and make a recursive call on the unvisited neighbor that is the smallest distance away (New York).
4) dftrav(New York, Providence): visit New York and set its parent reference to Providence. It has no unvisited neighbors, so we return.
5) Providence has no other unvisited neighbors, so we return.
6) Worcester still has unvisited neighbors. Make a recursive call on the unvisited neighbor that is the smallest distance away (Concord).
7) dftrav(Concord, Worcester): visit Concord, set its parent reference to Worcester, and make a recursive call on the unvisited neighbor that is the smallest distance away (Portland).
8) dftrav(Portland, Concord): visit Portland, set its parent reference to Concord, and make a recursive call on the unvisited neighbor that is the smallest distance away (Portsmouth).
9) dftrav(Portsmouth, Portland): visit Portsmouth and set its parent reference to Portland. It has no unvisited neighbors, so we return.
10) Portland has no other unvisited neighbors, so we return.
11) Concord has no other unvisited neighbors, so we return.
12) Worcester still has one unvisited neighbor, Albany. Make a recursive call on it.
13) dftrav(Albany, Worcester): visit Albany and set its parent reference to Worcester. It has no unvisited neighbors, so we return.
14) Worcester has no other unvisited neighbors, so we return.
15) Boston has no other unvisited neighbors, so we return from the original invocation.

Resulting spanning tree: Boston is the root, and Worcester is its only child. Worcester's children are Providence, Concord, and Albany. New York is the child of Providence, Portland is the child of Concord, and Portsmouth is the child of Portland.
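The depth-first traversal above can be sketched in Java as follows. The graph is rebuilt as an adjacency matrix (0 meaning no edge, an assumed sentinel), and dfTrav is a hypothetical stand-in for the dftrav method in the steps. It follows the convention used here of exploring unvisited neighbors nearest-first, which sorting each city's neighbors by distance achieves: after each recursive call returns, the next still-unvisited entry in the sorted list is the nearest remaining one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the depth-first traversal on the highway graph.
// Assumption (from the steps above): the nearest unvisited neighbor is explored first.
class HighwayDFS {
    static final String[] CITIES = {"Albany", "Boston", "Concord", "New York",
            "Portland", "Portsmouth", "Providence", "Worcester"};
    static final int N = CITIES.length;
    static final int[][] DIST = new int[N][N];   // 0 means "no edge"

    static {
        int[][] edges = {{0, 7, 134}, {1, 2, 74}, {1, 5, 54}, {1, 6, 49}, {1, 7, 44},
                {2, 4, 84}, {2, 7, 63}, {3, 6, 185}, {4, 5, 39}, {5, 7, 83}, {6, 7, 42}};
        for (int[] e : edges) {
            DIST[e[0]][e[1]] = e[2];
            DIST[e[1]][e[0]] = e[2];
        }
    }

    // Neighbors of v, nearest first.
    static List<Integer> neighborsByDistance(int v) {
        List<Integer> nbrs = new ArrayList<>();
        for (int u = 0; u < N; u++) {
            if (DIST[v][u] > 0) nbrs.add(u);
        }
        nbrs.sort(Comparator.comparingInt(u -> DIST[v][u]));
        return nbrs;
    }

    // Mirrors dftrav(v, parent): visit v, record its parent, then recurse on
    // each still-unvisited neighbor in order of increasing distance.
    static void dfTrav(int v, int p, boolean[] visited, int[] parent, List<String> order) {
        visited[v] = true;
        parent[v] = p;
        order.add(CITIES[v]);
        for (int u : neighborsByDistance(v)) {
            if (!visited[u]) dfTrav(u, v, visited, parent, order);
        }
    }

    // Returns the visit order; parent[] receives each city's parent (-1 for the start).
    static List<String> dfs(int start, int[] parent) {
        List<String> order = new ArrayList<>();
        dfTrav(start, -1, new boolean[N], parent, order);
        return order;
    }

    public static void main(String[] args) {
        int[] parent = new int[N];
        System.out.println(dfs(1, parent));   // start from Boston (vertex 1)
        for (int v = 0; v < N; v++) {
            if (parent[v] >= 0) System.out.println(CITIES[v] + " - " + CITIES[parent[v]]);
        }
    }
}
```

dfs(1, parent) returns [Boston, Worcester, Providence, New York, Concord, Portland, Portsmouth, Albany], and the parent array encodes the spanning tree, with -1 playing the role of null for Boston.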

b. What order would the cities be visited in if we performed a breadth-first traversal from Boston, and what is the resulting spanning tree? (Draw the spanning tree below.)

Order visited: Boston, Worcester, Providence, Portsmouth, Concord, Albany, New York, Portland.

Evolution of the queue:

remove       insert                             queue contents
-            Bos                                Bos
Bos          Worc, Prov, Portsmouth, Conc       Worc, Prov, Portsmouth, Conc
Worc         Alb                                Prov, Portsmouth, Conc, Alb
Prov         NY                                 Portsmouth, Conc, Alb, NY
Portsmouth   Portland                           Conc, Alb, NY, Portland
Conc         none (no unencountered neighbors)  Alb, NY, Portland
Alb          none                               NY, Portland
NY           none                               Portland
Portland     none                               empty

Cities are marked as encountered before they are inserted in the queue, and their parent reference is set to the city that was just removed from the queue. Cities are visited upon removal from the queue.

Resulting spanning tree: Boston is the root, with children Worcester, Providence, Portsmouth, and Concord. Albany is the child of Worcester, New York is the child of Providence, and Portland is the child of Portsmouth.
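A corresponding breadth-first sketch, under the same kind of assumptions: hypothetical names, 0 as the no-edge value, and each city's neighbors handled nearest-first, which matches the insertion order in the queue table above. It marks a city as encountered and sets its parent before adding it to the queue, and visits cities as they are removed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the breadth-first traversal on the highway graph.
class HighwayBFS {
    static final String[] CITIES = {"Albany", "Boston", "Concord", "New York",
            "Portland", "Portsmouth", "Providence", "Worcester"};
    static final int N = CITIES.length;
    static final int[][] DIST = new int[N][N];   // 0 means "no edge"

    static {
        int[][] edges = {{0, 7, 134}, {1, 2, 74}, {1, 5, 54}, {1, 6, 49}, {1, 7, 44},
                {2, 4, 84}, {2, 7, 63}, {3, 6, 185}, {4, 5, 39}, {5, 7, 83}, {6, 7, 42}};
        for (int[] e : edges) {
            DIST[e[0]][e[1]] = e[2];
            DIST[e[1]][e[0]] = e[2];
        }
    }

    // Neighbors of v, nearest first (assumed ordering).
    static List<Integer> neighborsByDistance(int v) {
        List<Integer> nbrs = new ArrayList<>();
        for (int u = 0; u < N; u++) {
            if (DIST[v][u] > 0) nbrs.add(u);
        }
        nbrs.sort(Comparator.comparingInt(u -> DIST[v][u]));
        return nbrs;
    }

    // Returns the visit order; parent[] receives each city's parent (-1 for the start).
    static List<String> bfs(int start, int[] parent) {
        boolean[] encountered = new boolean[N];
        List<String> order = new ArrayList<>();
        Queue<Integer> queue = new ArrayDeque<>();
        encountered[start] = true;
        parent[start] = -1;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            order.add(CITIES[v]);              // visit upon removal
            for (int u : neighborsByDistance(v)) {
                if (!encountered[u]) {
                    encountered[u] = true;     // mark before inserting
                    parent[u] = v;             // parent = city just removed
                    queue.add(u);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        int[] parent = new int[N];
        System.out.println(bfs(1, parent));   // start from Boston (vertex 1)
        for (int v = 0; v < N; v++) {
            if (parent[v] >= 0) System.out.println(CITIES[v] + " - " + CITIES[parent[v]]);
        }
    }
}
```

bfs(1, parent) returns [Boston, Worcester, Providence, Portsmouth, Concord, Albany, New York, Portland], matching the order derived from the queue table.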