9.2 Hash Tables. When implementing a map with a hash table, the goal is to store item (k, e) at index i = h(k)

Constance James
5 years ago
1 9.2 Hash Tables
- A hash function h() maps keys of a given type to integers in a fixed interval [0, N - 1].
- Example: h(x) = x mod N is a hash function for integer keys.
- The integer h(x) is called the hash value of key x.
- A hash table for a given key type consists of:
  - a hash function h()
  - an array (called the table) of size N
- When implementing a map with a hash table, the goal is to store item (k, e) at index i = h(k).

2 Bucket Array
- A bucket array for a hash table is an array A of size N, where each cell of A is thought of as a bucket (a collection of key-value pairs) and the integer N defines the capacity of the array.

3 Hash Functions
- A hash function is usually specified as the composition of two functions:
  - Hash code: $h_1$: keys → integers
  - Compression function: $h_2$: integers → [0, N - 1]
- The hash code is applied first, and the compression function is applied next on the result, i.e., $h(x) = h_2(h_1(x))$.
- The goal of the hash function is to disperse the keys in an apparently random way.

4 Hash Codes
- Memory address: we reinterpret the memory address of the key object as an integer. Good in general, except for numeric and string keys.
- Integer cast: we reinterpret the bits of the key as an integer. Suitable for keys whose length is less than or equal to the number of bits of the integer type (e.g., char, short, int and float in C++).
- Component sum: we partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and sum the components, ignoring overflows. Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in C++).
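The component-sum idea can be sketched for a double key as follows. This is only an illustration: the function name componentSumHashCode is hypothetical, and the code assumes a 64-bit double split into two 32-bit components.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Component-sum hash code of a double (hypothetical helper, not from the slides):
// view its 64 bits as two 32-bit components and add them.
// Unsigned arithmetic wraps around, which is the "ignoring overflows" rule.
std::uint32_t componentSumHashCode(double key) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &key, sizeof bits);                 // copy out the raw bit pattern
    std::uint32_t low = static_cast<std::uint32_t>(bits);  // lower 32 bits
    std::uint32_t high = static_cast<std::uint32_t>(bits >> 32);
    return low + high;
}
```

Copying the bits with std::memcpy avoids the undefined behavior that a pointer cast between unrelated types could cause.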

5 Polynomial Hash Codes
- Polynomial accumulation: we partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $x_0, x_1, \ldots, x_{k-1}$
- We evaluate the polynomial
  $p(a) = x_0 a^{k-1} + x_1 a^{k-2} + \cdots + x_{k-2} a + x_{k-1} = x_{k-1} + a(x_{k-2} + a(x_{k-3} + \cdots + a(x_2 + a(x_1 + a x_0)) \cdots ))$
  at a fixed value a, ignoring overflows.
- Especially suitable for strings (e.g., the choice a = 33 gives at most 6 collisions on a set of 50,000 English words).
- Polynomial p(a) can be evaluated in O(k) time using Horner's rule. The following polynomials are successively computed, each from the previous one in O(1) time:
  $p_0(a) = x_0$
  $p_i(a) = x_i + a\,p_{i-1}(a)$  (i = 1, 2, ..., k - 1)
  so that $p(a) = p_{k-1}(a)$.
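A minimal sketch of polynomial accumulation with Horner's rule, assuming each character of a string is one component and using a = 33 as the default. The function name polynomialHashCode and the choice of a 32-bit unsigned accumulator are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Polynomial hash code of a string, evaluated with Horner's rule.
// Each character is one component; `a` is the fixed evaluation point.
// Unsigned arithmetic wraps modulo 2^32, which plays the role of "ignoring overflows".
std::uint32_t polynomialHashCode(const std::string& key, std::uint32_t a = 33) {
    assert(a != 0);                 // a = 0 would discard all but the last component
    std::uint32_t p = 0;
    for (unsigned char x : key) {
        p = a * p + x;              // one Horner step: new p = x + a * (previous p)
    }
    return p;
}
```

For example, with a = 33 the key "ab" hashes to 97 * 33 + 98 = 3299, while "ba" hashes to 98 * 33 + 97 = 3331, so the order of the components matters, unlike with a plain component sum.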

6 Cyclic Shift Hash Codes
    int hashcode(const char* p, int len) {
        unsigned int h = 0;
        for (int i = 0; i < len; i++) {
            h = (h << 5) | (h >> 27);     // 5-bit cyclic shift of a 32-bit value
            h += (unsigned int) p[i];     // add in the next character
        }
        return h;
    }
- Experimental results: comparison of the number of collisions (total and max) for various shift amounts, for 25,000 English words.
  [Table: total and maximum collisions per shift amount]

7 Compression Functions
- Division: $h_2(k) = k \bmod N$
  - The size N of the hash table is usually chosen to be a prime.
  - The reason has to do with number theory and is beyond the scope of this course.
- Multiply, Add and Divide (MAD): $h_2(k) = (ak + b) \bmod N$
  - a and b are nonnegative integers such that $a \bmod N \neq 0$.
  - Otherwise, every integer would map to the same value b.
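The MAD compression function can be sketched as below. The name madCompress and the 64-bit parameter types are assumptions; the sketch also assumes N < 2^32 so that the intermediate product cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-Add-and-Divide compression: maps a hash code into [0, N - 1].
// N is the table size (ideally prime); a and b are fixed, with a mod N != 0.
// Assumes N < 2^32 so that (a mod N) * (hashCode mod N) fits in 64 bits.
std::uint64_t madCompress(std::uint64_t hashCode, std::uint64_t a,
                          std::uint64_t b, std::uint64_t N) {
    assert(N > 0 && a % N != 0);
    // Reducing each operand modulo N first does not change the result modulo N.
    return ((a % N) * (hashCode % N) + (b % N)) % N;
}
```

With N = 11, a = 3 and b = 5, the hash code 10 is compressed to (3 * 10 + 5) mod 11 = 2.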

8 Collision Handling
- Collisions occur when different elements are mapped to the same cell.
- Separate chaining: let each cell in the table point to a linked list of entries that map there.
- Separate chaining is simple, but requires additional memory outside the table.

9 Map with Separate Chaining
- Delegate operations to a list-based map at each cell:
    Algorithm find(k):
        Output: the position of the matching entry of the map, or end if there is no key k in the map
        return A[h(k)].find(k)      // delegate the find(k) to the list-based map at A[h(k)]
    Algorithm put(k, v):
        p = A[h(k)].put(k, v)       // delegate the put(k,v) to the list-based map at A[h(k)]
        n = n + 1                   // applies only when k was not already in the map
        return p
    Algorithm erase(k):
        Output: none
        A[h(k)].erase(k)            // delegate the erase(k) to the list-based map at A[h(k)]
        n = n - 1

10 Linear Probing
- Open addressing: the colliding item is placed in a different cell of the table.
- Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.
- Each table cell inspection is referred to as a probe.
- Colliding items lump together, so future collisions cause longer sequences of probes.
- Example: h(x) = x mod 11; insert a new element with key 15 (home cell 15 mod 11 = 4).
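Linear probing can be sketched with a small table of int keys. The class name LinearProbeSet is hypothetical, removal is left out because it needs extra care with open addressing, and the keys in the usage note are made up for illustration rather than taken from the slide's figure.

```cpp
#include <cassert>
#include <vector>

// Open-addressing table of int keys with linear probing (insert and find only).
class LinearProbeSet {
public:
    explicit LinearProbeSet(int capacity)
        : keys_(capacity, 0), used_(capacity, false) { assert(capacity > 0); }

    // Returns the cell index where key is stored, or -1 if the table is full.
    int insert(int key) {
        const int N = static_cast<int>(keys_.size());
        int i = home(key);
        for (int probes = 0; probes < N; ++probes) {
            if (!used_[i] || keys_[i] == key) {   // free cell, or key already here
                keys_[i] = key;
                used_[i] = true;
                return i;
            }
            i = (i + 1) % N;                      // next cell, wrapping around
        }
        return -1;
    }

    // Returns the cell index holding key, or -1 if key is absent.
    int find(int key) const {
        const int N = static_cast<int>(keys_.size());
        int i = home(key);
        for (int probes = 0; probes < N; ++probes) {
            if (!used_[i]) return -1;             // an empty cell ends the probe sequence
            if (keys_[i] == key) return i;
            i = (i + 1) % N;
        }
        return -1;
    }

private:
    // h(x) = x mod N, adjusted so negative keys also land in [0, N - 1].
    int home(int key) const {
        const int N = static_cast<int>(keys_.size());
        return ((key % N) + N) % N;
    }
    std::vector<int> keys_;
    std::vector<bool> used_;
};
```

With capacity 11, inserting 4, then 15, then 26 places them in cells 4, 5 and 6, since all three keys have home cell 4. This is the lumping effect described above.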

11 Quadratic Probing
- Iteratively try the buckets A[(i + f(j)) mod N], for j = 0, 1, 2, ..., where $f(j) = j^2$.
- As with linear probing, the quadratic-probing strategy complicates the removal operation, but it does avoid the kinds of clustering patterns that occur with linear probing.
- It creates its own kind of clustering, called secondary clustering.

12 Double Hashing
- Double hashing uses a secondary hash function h'(k).
- If h maps some key k to a bucket A[i], with i = h(k), that is already occupied, then iteratively try the buckets A[(i + f(j)) mod N] for j = 1, 2, 3, ..., where f(j) = j · h'(k).
- The secondary hash function h'(k) is not allowed to evaluate to zero.
- Common choice: h'(k) = q - (k mod q), for some prime number q < N.
- The table size N must be a prime number to allow probing of all the cells.
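The probe sequence of double hashing can be sketched as follows, assuming the primary hash is h(k) = k mod N and non-negative integer keys. The function names secondaryHash and doubleHashProbes are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Secondary hash h'(k) = q - (k mod q); the result lies in [1, q], so it is never zero.
int secondaryHash(int k, int q) {
    assert(k >= 0 && q > 0);
    return q - (k % q);
}

// First `count` cells probed for key k under double hashing:
// A[(i + j * h'(k)) mod N] for j = 0, 1, 2, ..., with i = k mod N (j = 0 is the home cell).
std::vector<int> doubleHashProbes(int k, int N, int q, int count) {
    assert(k >= 0 && N > 0 && q > 0 && q < N);
    std::vector<int> cells;
    const int i = k % N;
    const int step = secondaryHash(k, q);
    for (int j = 0; j < count; ++j) {
        cells.push_back((i + j * step) % N);
    }
    return cells;
}
```

For k = 18, N = 13 and q = 7, the home cell is 5 and the step is 7 - 4 = 3, so the probes visit cells 5, 8, 11, 1, and so on. Because 13 is prime, the 13 probes for j = 0, ..., 12 reach every cell exactly once.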

13 Load Factor
- In the worst case, searches, insertions and removals on a hash table take O(n) time.
- The worst case occurs when all the keys inserted into the map collide.
- The load factor λ = n/N affects the performance of a hash table.
- Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 - λ).
- The expected running time of all the dictionary ADT operations in a hash table is O(1).
- In practice, hashing is very fast provided the load factor is not close to 100%.
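As a quick worked example of the expected probe count 1 / (1 - λ), under the same random-hashing assumption (the three load factors are chosen only for illustration):

```latex
\[
\lambda = 0.5 \;\Rightarrow\; \frac{1}{1-\lambda} = 2, \qquad
\lambda = 0.75 \;\Rightarrow\; \frac{1}{1-\lambda} = 4, \qquad
\lambda = 0.9 \;\Rightarrow\; \frac{1}{1-\lambda} = 10
\]
```

The count grows quickly as λ approaches 1, which is why the load factor should be kept well below 100%.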

14 Re-hashing
- Resizing: if the load factor of a hash table goes significantly above a specified threshold, it is common to resize the table, with the new array at least double the previous size.
- Once we have allocated a new bucket array, we must define a new hash function to go with it.
- Given this new hash function, we then reinsert every item from the old array into the new array.
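A minimal sketch of rehashing on a separately chained set of non-negative int keys. The class name RehashingIntSet, the threshold of 0.75 and the growth rule 2N + 1 are all assumptions chosen for illustration; the slides do not prescribe them.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Chained hash set of non-negative ints that grows its bucket array
// whenever an insertion would push the load factor n/N above 0.75.
class RehashingIntSet {
public:
    explicit RehashingIntSet(std::size_t capacity = 5) : n_(0), buckets_(capacity) {
        assert(capacity > 0);
    }

    void insert(int key) {
        assert(key >= 0);
        if (contains(key)) return;                        // keys are distinct
        const double kMaxLoad = 0.75;                     // assumed threshold
        if (n_ + 1 > kMaxLoad * buckets_.size()) {
            rehash(2 * buckets_.size() + 1);              // at least double the old size
        }
        buckets_[index(key, buckets_.size())].push_back(key);
        ++n_;
    }

    bool contains(int key) const {
        const std::list<int>& b = buckets_[index(key, buckets_.size())];
        for (int x : b) if (x == key) return true;
        return false;
    }

    std::size_t size() const { return n_; }
    std::size_t capacity() const { return buckets_.size(); }

private:
    // The hash function depends on the table size, so a new size means a new function.
    static std::size_t index(int key, std::size_t N) {
        return static_cast<std::size_t>(key) % N;
    }

    // Allocate the new bucket array and reinsert every item using the new hash function.
    void rehash(std::size_t newCapacity) {
        std::vector<std::list<int> > fresh(newCapacity);
        for (const std::list<int>& b : buckets_)
            for (int x : b)
                fresh[index(x, newCapacity)].push_back(x);
        buckets_.swap(fresh);
    }

    std::size_t n_;
    std::vector<std::list<int> > buckets_;
};
```

Starting from capacity 5 and inserting the keys 0 through 9, the table grows to 11 buckets on the fourth insertion and to 23 on the ninth.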

15 HashMap (1): C++ Hash Map Implementation
/** HashMap.h (1) */
#ifndef HASHMAP_H
#define HASHMAP_H
#include <list>
#include <vector>
#include "Entry.h"
#include "Exceptions.h"

template <typename K, typename V, typename H>
class HashMap {
public:                                     // public types
    typedef Entry<const K,V> Entry;         // a (key, value) pair
    class Iterator;
public:                                     // public functions
    HashMap(int capacity = 101);            // constructor
    int size() const;                       // number of entries
    bool empty() const;                     // is the map empty?

16 HashMap (2)
/** HashMap.h (2) */
    Iterator find(const K& k);                  // find entry with key k
    Iterator put(const K& k, const V& v);       // insert/replace (k,v)
    void erase(const K& k);                     // remove entry with key k
    void erase(const Iterator& p);              // erase entry at p
    Iterator begin();                           // iterator to first entry
    Iterator end();                             // iterator to end entry
protected:                                      // protected types
    typedef std::list<Entry> Bucket;            // a bucket of entries for separate chaining
    typedef std::vector<Bucket> BktArray;       // a bucket array for hash table
    // HashMap utilities here
    Iterator finder(const K& k);                            // find utility
    Iterator inserter(const Iterator& p, const Entry& e);   // insert utility
    void eraser(const Iterator& p);                         // remove utility
    typedef typename BktArray::iterator BItor;  // bucket iterator
    typedef typename Bucket::iterator EItor;    // entry iterator
    static void nextentry(Iterator& p)          // bucket's next entry
        { ++p.ent; }
    static bool endofbkt(const Iterator& p)     // end of bucket?
        { return p.ent == p.bkt->end(); }

17 HashMap (3)
/** HashMap.h (3) */
private:
    int n;                                      // number of entries
    H hash;                                     // the hash comparator
    BktArray B;                                 // bucket array (hash table)
public:                                         // public types
    // Iterator class declaration
    class Iterator {                            // a HashMap::Iterator (& position)
    private:
        const BktArray* ba;                     // which bucket array in the hash table
        BItor bkt;                              // which bucket in the bucket array (hash table)
        EItor ent;                              // which entry in the bucket
    public:
        Iterator(const BktArray& a, const BItor& b, const EItor& q = EItor())
            : ba(&a), bkt(b), ent(q) { }
        Iterator() {}                           // default constructor
        Entry& operator*() const;               // get entry
        bool operator==(const Iterator& p) const;   // are iterators equal?
        bool operator!=(const Iterator& p) const;   // are iterators different?
        Iterator& operator++();                 // advance to next entry
        friend class HashMap;                   // give HashMap access
    };
};
#endif

18 Bucket and Bucket Array
- typedef std::list<Entry> Bucket;          (EItor iterates over the entries of one bucket)
- typedef std::vector<Bucket> BktArray;     (BItor iterates over the buckets B[0], B[1], ..., B[99])

19 HashMap (4)
/** HashMap.cpp (1) */
#include <iostream>
#include "HashMap.h"
#include "Entry.h"
using namespace std;

template <typename K, typename V, typename H>       // constructor
HashMap<K,V,H>::HashMap(int capacity) : n(0), B(capacity) { }

template <typename K, typename V, typename H>       // number of entries
int HashMap<K,V,H>::size() const { return n; }

template <typename K, typename V, typename H>       // is the map empty?
bool HashMap<K,V,H>::empty() const { return size() == 0; }

20 HashMap (5)
/** HashMap.cpp (2) */
template <typename K, typename V, typename H>       // iterator to front
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::begin() {
    if (empty()) return end();                      // empty - return end
    BItor bkt = B.begin();                          // else search for an entry
    while (bkt->empty()) ++bkt;                     // find nonempty bucket
    return Iterator(B, bkt, bkt->begin());          // return first of bucket
}

template <typename K, typename V, typename H>       // iterator to end
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::end()
    { return Iterator(B, B.end()); }

template <typename K, typename V, typename H>       // get entry
typename HashMap<K,V,H>::Entry& HashMap<K,V,H>::Iterator::operator*() const
    { return *ent; }

template <typename K, typename V, typename H>       // are iterators equal?
bool HashMap<K,V,H>::Iterator::operator==(const Iterator& p) const {
    if (ba != p.ba || bkt != p.bkt) return false;   // ba or bkt differ?
    else if (bkt == ba->end()) return true;         // both at the end?
    else return (ent == p.ent);                     // else use entry to decide
}

21 HashMap (6)
/** HashMap.cpp (3) */
template <typename K, typename V, typename H>       // are iterators different?
bool HashMap<K,V,H>::Iterator::operator!=(const Iterator& p) const {
    if (ba != p.ba || bkt != p.bkt) return true;    // ba or bkt differ?
    else if (bkt == ba->end()) return false;        // both at the end?
    else return (ent != p.ent);                     // else use entry to decide
}

template <typename K, typename V, typename H>       // advance to next entry
typename HashMap<K,V,H>::Iterator& HashMap<K,V,H>::Iterator::operator++() {
    ++ent;                                          // next entry in bucket
    if (endofbkt(*this)) {                          // at end of bucket?
        ++bkt;                                      // go to next bucket
        while (bkt != ba->end() && bkt->empty())    // find nonempty bucket
            ++bkt;
        if (bkt == ba->end()) return *this;         // end of bucket array?
        ent = bkt->begin();                         // first nonempty entry
    }
    return *this;                                   // return self
}

22 HashMap (7)
/** HashMap.cpp (4) */
template <typename K, typename V, typename H>       // find utility
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::finder(const K& k) {
    int i = hash(k) % B.size();                     // get hash index i
#ifdef HASH_TEST
    cout << i;
#endif
    BItor bkt = B.begin() + i;                      // the ith bucket
    Iterator p(B, bkt, bkt->begin());               // start of ith bucket
    while (!endofbkt(p) && (*p).key() != k)         // search for k
        nextentry(p);
    return p;                                       // return final position
}

template <typename K, typename V, typename H>       // find key
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::find(const K& k) {
    Iterator p = finder(k);                         // look for k
    if (endofbkt(p))                                // didn't find it?
        return end();                               // return end iterator
    else
        return p;                                   // return its position
}

23 HashMap (8)
/** HashMap.cpp (5) */
template <typename K, typename V, typename H>       // insert utility
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::inserter(const Iterator& p, const Entry& e) {
    EItor ins = p.bkt->insert(p.ent, e);            // insert before p using insert() of list<Entry>
    n++;                                            // one more entry
    return Iterator(B, p.bkt, ins);                 // return this position
}

template <typename K, typename V, typename H>       // insert/replace (k,v)
typename HashMap<K,V,H>::Iterator HashMap<K,V,H>::put(const K& k, const V& v) {
    Iterator p = finder(k);                         // search for k
    if (endofbkt(p)) {                              // k not found?
        return inserter(p, Entry(k, v));            // insert at end of bucket
    }
    else {                                          // found it?
        p.ent->setvalue(v);                         // replace value with v
        return p;                                   // return this position
    }
}

24 HashMap (9)
/** HashMap.cpp (6) */
template <typename K, typename V, typename H>       // remove utility
void HashMap<K,V,H>::eraser(const Iterator& p) {
    p.bkt->erase(p.ent);                            // remove entry from bucket
    n--;                                            // one fewer entry
}

template <typename K, typename V, typename H>       // remove entry at p
void HashMap<K,V,H>::erase(const Iterator& p)
    { eraser(p); }

template <typename K, typename V, typename H>       // remove entry with key k
void HashMap<K,V,H>::erase(const K& k) {
    Iterator p = finder(k);                         // find k
    if (endofbkt(p))                                // not found?
        throw NonexistentElement("Erase of nonexistent");   // ...error
    eraser(p);                                      // remove it
}

25 Entry.h
/** Entry.h */
#include <iostream>
using namespace std;

template <typename K, typename V>
class Entry {                                       // a (key, value) pair
    friend ostream& operator<<(ostream& fout, const Entry& ent) {
        fout << "Entry [key: " << ent.key() << ", value : " << ent.value() << "]";
        return fout;
    }
public:                                             // public functions
    Entry(const K& k = K(), const V& v = V())       // constructor
        : _key(k), _value(v) { }
    const K& key() const { return _key; }           // get key
    const V& value() const { return _value; }       // get value
    void setkey(const K& k) { _key = k; }           // set key
    void setvalue(const V& v) { _value = v; }       // set value
private:                                            // private data
    K _key;                                         // key
    V _value;                                       // value
};

26 class Task
/** Task.h */
#include <iostream>
#include <string>
using namespace std;

class Task {
    friend ostream& operator<<(ostream&, const Task&);
public:
    Task();                                         // constructor
    Task(string tname, int pri, int dur);
    string gettaskname() { return taskname; }
    int getpriority() { return priority; }
    int getduration() { return duration; }
    void settaskname(string n) { taskname = n; }
    void setpriority(int);
    void setduration(int);
    bool operator>(Task& t) { return (priority > t.priority); }
    bool operator<(Task& t) { return (priority < t.priority); }
private:
    string taskname;                                // name of task
    int priority;                                   // priority of task in scheduling: 0 ~ 49
    int duration;                                   // duration of processing time, [ms]
};

27 class LessThan
/** LessThan.h */
#ifndef LESSTHAN_H
#define LESSTHAN_H
using namespace std;

template<class T>
class LessThan {
public:
    bool operator()(T& p, T& q) const { return p < q; }
};
#endif

28 CyclicShiftHashCode
/** CyclicShiftHashCode.h */
#include <string>
using namespace std;

class CyclicShiftHashCode {
public:
    int operator() (const string key);
};

/** CyclicShiftHashCode.cpp */
#include <string>
#include "CyclicShiftHashCode.h"

int CyclicShiftHashCode::operator() (const string key) {
    int len = key.length();
    unsigned int h = 0;
    for (int i = 0; i < len; i++) {
        h = (h << 5) | (h >> 27);                   // 5-bit cyclic shift
        h += (unsigned int)key.at(i);               // add in the next character
    }
    return h;
}

29 Exceptions
/** Exceptions.h */
#include <string>
#include <iostream>
using namespace std;

class RuntimeException {
private:
    string errormsg;
public:
    RuntimeException(const string& err) { errormsg = err; }
    string getmessage() const { return errormsg; }
};

class NonexistentElement : public RuntimeException {
public:
    NonexistentElement(const string& err) : RuntimeException(err) {}
};

30 main.cpp (1)
/** main.cpp (1) */
#include <iostream>
#include <string>
#include <cstdlib>                  // rand()
#include "Task.h"
#include "HashMap.h"
#include "HashMap.cpp"
#include "LessThan.h"
#include "CyclicShiftHashCode.h"
#include "Entry.h"
#define NUM_TASKS 20

int main()
{
    HashMap<string, Task, CyclicShiftHashCode> *pthm;
    pthm = new HashMap<string, Task, CyclicShiftHashCode>;
    Task* ptask[NUM_TASKS];
    string tname = "";
    char tmpstr[10], idstr[10];
    int priority = -1;
    int duration = 0;
    int task_id = 0;
    LessThan<Task> lessthan_task;

31 main.cpp (2)
/** main.cpp (2) */
    cout << "\n Generate " << NUM_TASKS << " tasks and put them into TaskHashMap" << endl;
    for (int i = 0; i < NUM_TASKS; i++) {
        task_id = rand();
        _itoa_s(task_id, tmpstr, 10);               // MSVC-specific integer-to-string conversion
        tname = "t_" + string(tmpstr);
        priority = rand() % 50;
        duration = rand() % 100;
        ptask[i] = new Task(tname, priority, duration);
        cout << " Generated : " << *ptask[i];
        cout << ", put task into ";
        pthm->put(tname, *ptask[i]);                // with HASH_TEST defined, finder() prints the bucket index
        cout << "-th bucket" << endl;
    } // end for

    cout << " Print HashMap for Tasks: " << endl;
    HashMap<string, Task, CyclicShiftHashCode>::Iterator p;
    for (p = pthm->begin(); p != pthm->end(); ++p) {
        cout << *p << endl;
    }
    cout << endl;

    for (int i = 0; i < NUM_TASKS; i++)
        delete ptask[i];                            // each task was allocated with new
    delete pthm;
    return 0;
}

32 Execution Result
[Screenshot of program output; legible fragments: "is stored in 3-rd bucket", "is stored in 97-th bucket"]

33 9.3 Ordered Map
- An ordered map keeps the entries of a map sorted according to some total order and can look up keys and values based on this ordering.
- The ordered map includes all the functions of the standard map ADT plus the following:
  - firstentry(): return an iterator to the entry with the smallest key value; if the map is empty, it returns end
  - lastentry(): return an iterator to the entry with the largest key value; if the map is empty, it returns end
  - ceilingentry(k): return an iterator to the entry with the least key value greater than or equal to k; if there is no such entry, it returns end
  - floorentry(k): return an iterator to the entry with the greatest key value less than or equal to k; if there is no such entry, it returns end
  - lowerentry(k): return an iterator to the entry with the greatest key value less than k; if there is no such entry, it returns end
  - higherentry(k): return an iterator to the entry with the least key value greater than k; if there is no such entry, it returns end
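The four key-relative queries can be sketched on top of the standard std::map, which is an ordered container. This is a swapped-in illustration, not the slides' implementation: the typedefs OrderedMap and It and the free-function form are assumptions.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

typedef std::map<int, std::string> OrderedMap;
typedef OrderedMap::const_iterator It;

// Least key >= k, or end() if there is none.
It ceilingEntry(const OrderedMap& m, int k) { return m.lower_bound(k); }

// Least key > k, or end() if there is none.
It higherEntry(const OrderedMap& m, int k) { return m.upper_bound(k); }

// Greatest key <= k, or end() if there is none.
It floorEntry(const OrderedMap& m, int k) {
    It it = m.upper_bound(k);                       // first key > k
    return it == m.begin() ? m.end() : std::prev(it);
}

// Greatest key < k, or end() if there is none.
It lowerEntry(const OrderedMap& m, int k) {
    It it = m.lower_bound(k);                       // first key >= k
    return it == m.begin() ? m.end() : std::prev(it);
}
```

For a map with keys 10, 20 and 30, ceilingEntry(m, 21) refers to key 30, floorEntry(m, 19) refers to key 10, and lowerEntry(m, 10) returns end().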

34 Ordered Search Table and Binary Search
- Ordered search table: if the keys in a map come from a total order, we can store the map's entries in a vector L in increasing order of the keys.
- Space requirement of an ordered search table: O(n)
- Accessing an element of L by its index: O(1)
- Update operations in a search table (insert(k, v), erase(k)): O(n) in the worst case

35 Binary Search (1)
    Algorithm BinarySearch(L, k, low, high)
        Input: an ordered vector L storing n entries and integers low and high
        Output: an entry of L with key equal to k and index between low and high, if such an entry exists, and otherwise the special sentinel end
        if (low > high) then
            return end
        else
            mid ← ⌊(low + high) / 2⌋
            entry ← L.at(mid)
            if (k = entry.key()) then
                return entry
            else if (k < entry.key()) then
                return BinarySearch(L, k, low, mid - 1)
            else
                return BinarySearch(L, k, mid + 1, high)
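The pseudocode above translates to C++ roughly as follows. This sketch assumes entries are (int, string) pairs and returns an index, with -1 standing in for the end sentinel; both choices are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, std::string> Entry;          // (key, value)

// Returns the index of the entry with key k in L[low..high], or -1
// (standing in for the `end` sentinel) if no such entry exists.
// L must be sorted by increasing key.
int binarySearch(const std::vector<Entry>& L, int k, int low, int high) {
    if (low > high) return -1;                      // empty range: k is absent
    int mid = low + (high - low) / 2;               // same as (low + high) / 2, without overflow
    if (k == L[mid].first) return mid;
    if (k < L[mid].first) return binarySearch(L, k, low, mid - 1);
    return binarySearch(L, k, mid + 1, high);
}
```

A search over the whole table is started with binarySearch(L, k, 0, L.size() - 1), with the last argument converted to int; each call halves the range, so the search takes O(log n) time.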

36 Binary Search (2)

37 Comparing Map Implementations
- Map implementation with un-ordered list
- Map implementation with hash table
- Map implementation with ordered search table
[Table with columns: Method | Un-ordered List | Hash Table | Ordered Search Table]

38 Applications of Ordered Maps - Flight Databases
- Flight objects: k = (origin, destination, date, time)
- Although a user typically wants to match the origin and destination cities and the departure date exactly, the user will probably be content with any departure time close to the requested one.
- ceilingentry(k): returns the flight between the desired cities on the desired date, with departure time at the desired time or after
- floorentry(k): returns the flight with departure time at the desired time or before

39 9.4 Skip List
- A skip list for a set S of distinct (key, element) items is a series of lists $S_0, S_1, \ldots, S_h$ such that:
  - Each list $S_i$ contains the special keys $+\infty$ and $-\infty$.
  - List $S_0$ contains the keys of S in nondecreasing order.
  - Each list is a subsequence of the previous one, i.e., $S_0 \supseteq S_1 \supseteq \cdots \supseteq S_h$.
  - List $S_h$ contains only the two special keys.
- We show how to use a skip list to implement the dictionary ADT.
[Figure: a skip list with lists S_0 to S_3, with a tower and a level labeled]

40 Search and Update Operations in a Skip List
- We search for a key x in a skip list as follows: we start at the first position of the top list.
- Operations for traversal:
  - after(p): return the position following p on the same level
  - before(p): return the position preceding p on the same level
  - below(p): return the position below p in the same tower
  - above(p): return the position above p in the same tower
- If we try to drop down past the bottom list, we return null.
    Algorithm SkipSearch(k)
        Input: a search key k
        Output: position p in the bottom list S_0 such that the entry at p has the largest key less than or equal to k
        p ← s                                   // start position
        while (below(p) ≠ NULL) do
            p ← below(p)                        // drop down
            while (k ≥ key(after(p))) do
                p ← after(p)                    // scan forward
        return p

41 Insertion
- To insert an entry (x, e) into a skip list, we use a randomized algorithm:
  - We repeatedly toss a coin until we get tails, and we denote with i the number of times the coin came up heads.
  - If i ≥ h, we add to the skip list new lists $S_{h+1}, \ldots, S_{i+1}$, each containing only the two special keys.
  - We search for x in the skip list and find the positions $p_0, p_1, \ldots, p_i$ of the items with largest key less than x in each list $S_0, S_1, \ldots, S_i$.
  - For j ← 0, ..., i, we insert item (x, e) into list $S_j$ after position $p_j$.
- Example: insert key 15, with i = 2.
[Figure: the skip list before and after inserting key 15, with positions p_0, p_1, p_2 marked]

42 SkipInsert
    Algorithm SkipInsert(k, v)
        Input: key k and value v
        Output: topmost position of the entry inserted in the skip list
        p ← SkipSearch(k)
        q ← NULL
        e ← (k, v)
        i ← -1
        repeat
            i ← i + 1
            if (i ≥ h) then
                h ← h + 1                                   // add a new level
                t ← after(s)
                s ← insertafterabove(NULL, s, (-∞, NULL))   // level-h left node
                insertafterabove(s, t, (+∞, NULL))          // level-h right node
            q ← insertafterabove(p, q, e)                   // add a position to the new entry's tower
            while (above(p) = NULL) do
                p ← before(p)                               // scan backward
            p ← above(p)                                    // jump up to the higher level
        until coinflip() = tails
        n ← n + 1
        return q
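A compact sketch of skip-list search and insertion for distinct int keys. To stay short it deliberately differs from the pseudocode above: each node stores its whole tower as a vector of forward pointers instead of quad-nodes, a head node and null pointers replace the two sentinels, no empty top list is kept, and the search stops at the largest key strictly less than k. The class name SkipList and the injected coinFlip callback are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <functional>
#include <vector>

class SkipList {
public:
    // coinFlip returns true for heads; it is injected so that runs can be made repeatable.
    explicit SkipList(std::function<bool()> coinFlip)
        : head_(new Node(INT_MIN, 1)), coinFlip_(coinFlip), n_(0) {}
    ~SkipList() {
        for (Node* p = head_; p != nullptr; ) { Node* nx = p->next[0]; delete p; p = nx; }
    }
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;

    bool contains(int k) const {
        const Node* p = head_;
        for (int i = height() - 1; i >= 0; --i)             // drop down one list at a time
            while (p->next[i] && p->next[i]->key < k)       // scan forward within list S_i
                p = p->next[i];
        const Node* q = p->next[0];
        return q != nullptr && q->key == k;
    }

    void insert(int k) {
        assert(k > INT_MIN);                                // INT_MIN is reserved for the head
        int levels = 1;                                     // S_0, plus one list per head tossed
        while (coinFlip_()) ++levels;
        if (levels > height()) head_->next.resize(levels, nullptr);
        std::vector<Node*> pred(height(), head_);           // predecessor of k in each list
        Node* p = head_;
        for (int i = height() - 1; i >= 0; --i) {
            while (p->next[i] && p->next[i]->key < k) p = p->next[i];
            pred[i] = p;
        }
        if (p->next[0] && p->next[0]->key == k) return;     // keys are distinct
        Node* node = new Node(k, levels);
        for (int i = 0; i < levels; ++i) {                  // splice the tower into S_0..S_{levels-1}
            node->next[i] = pred[i]->next[i];
            pred[i]->next[i] = node;
        }
        ++n_;
    }

    int size() const { return n_; }
    int height() const { return static_cast<int>(head_->next.size()); }

private:
    struct Node {
        int key;
        std::vector<Node*> next;                            // next[i] is the successor in list S_i
        Node(int k, int levels) : key(k), next(levels, nullptr) {}
    };
    Node* head_;
    std::function<bool()> coinFlip_;
    int n_;
};
```

In practice the callback would wrap a random bit source; a fixed pattern of flips is convenient for checking the structure by hand.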

43 Randomized Algorithms
- A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution.
- It contains statements of the type
    b ← random()
    if b = 0
        do A ...
    else { b = 1 }
        do B ...
- Its running time depends on the outcomes of the coin tosses.
- We analyze the expected running time of a randomized algorithm under the following assumptions:
  - the coins are unbiased, and
  - the coin tosses are independent.
- The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give "heads").
- We use a randomized algorithm to insert items into a skip list.

44 Insertion of an Entry (key 42)
[Figure: the skip list before insertion and after insertion]

45 Deletion
- To remove an entry with key x from a skip list, we proceed as follows:
  - We search for x in the skip list and find the positions $p_0, p_1, \ldots, p_i$ of the items with key x, where position $p_j$ is in list $S_j$.
  - We remove positions $p_0, p_1, \ldots, p_i$ from the lists $S_0, S_1, \ldots, S_i$.
  - We remove all but one list containing only the two special keys.
- Example: remove key 34.
[Figure: the skip list before and after removing key 34]

46 Removal in a Skip List (key = 25)

47 Implementation
- We can implement a skip list with quad-nodes.
- A quad-node stores:
  - entry
  - link to the node prev
  - link to the node next
  - link to the node below
  - link to the node above
- Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them.
[Figure: a quad-node holding entry x with links pprev, pnext, pbelow and pabove]

48 Space Usage
- The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm.
- We use the following two basic probabilistic facts:
  - Fact 1: The probability of getting i consecutive heads when flipping a coin is $1/2^i$.
  - Fact 2: If each of n entries is present in a set with probability p, the expected size of the set is np.
- Consider a skip list with n entries:
  - By Fact 1, we insert an entry in list $S_i$ with probability $1/2^i$.
  - By Fact 2, the expected size of list $S_i$ is $n/2^i$.
- The expected number of nodes used by the skip list is
  $\sum_{i=0}^{h} \frac{n}{2^i} = n \sum_{i=0}^{h} \frac{1}{2^i} < 2n$
- Thus, the expected space usage of a skip list with n items is O(n).
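The bound can be checked empirically with a small simulation of the coin tosses. The function names, the use of std::mt19937 and the seed are assumptions; the simulation does not cap the tower height at h, so its expected average is exactly 2 nodes per entry rather than strictly below it.

```cpp
#include <cassert>
#include <random>

// Number of lists S_0, S_1, ... that one inserted entry joins:
// 1 for S_0, plus one for every head tossed before the first tail.
int towerLevels(std::mt19937& rng) {
    std::bernoulli_distribution heads(0.5);         // unbiased, independent tosses
    int levels = 1;
    while (heads(rng)) ++levels;
    return levels;
}

// Average number of nodes per entry over n simulated insertions.
double averageNodesPerEntry(int n, unsigned seed) {
    assert(n > 0);
    std::mt19937 rng(seed);
    long long total = 0;
    for (int i = 0; i < n; ++i) total += towerLevels(rng);
    return static_cast<double>(total) / n;
}
```

For a large n, for example 200,000 insertions, the average should come out close to 2, matching the O(n) space bound.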

49 Height
- The running time of the search and insertion algorithms is affected by the height h of the skip list.
- We show that with high probability, a skip list with n items has height O(log n).
- We use the following additional probabilistic fact:
  - Fact 3: If each of n events has probability p, the probability that at least one event occurs is at most np.
- Consider a skip list with n entries:
  - By Fact 1, we insert an entry in list $S_i$ with probability $1/2^i$.
  - By Fact 3, the probability that list $S_i$ has at least one item is at most $n/2^i$.
- By picking $i = 3\log n$, we have that the probability that $S_{3\log n}$ has at least one entry is at most $n/2^{3\log n} = n/n^3 = 1/n^2$.
- Thus a skip list with n entries has height at most $3\log n$ with probability at least $1 - 1/n^2$.

50 Search and Update Times
- The search time in a skip list is proportional to the number of drop-down steps plus the number of scan-forward steps.
- The drop-down steps are bounded by the height of the skip list and thus are O(log n) with high probability.
- To analyze the scan-forward steps, we use yet another probabilistic fact:
  - Fact 4: The expected number of coin tosses required in order to get tails is 2.
- When we scan forward in a list, the destination key does not belong to a higher list.
  - A scan-forward step is associated with a former coin toss that gave tails.
- By Fact 4, in each list the expected number of scan-forward steps is 2.
- Thus, over the O(log n) lists, the expected number of scan-forward steps is O(log n).
- We conclude that a search in a skip list takes O(log n) expected time.
- The analysis of insertion and deletion gives similar results.

51 Summary of Skip List
- A skip list is a data structure for dictionaries that uses a randomized insertion algorithm.
- In a skip list with n entries:
  - The expected space used is O(n).
  - The expected search, insertion and deletion time is O(log n).
- Using a more complex probabilistic analysis, one can show that these performance bounds also hold with high probability.
- Skip lists are fast and simple to implement in practice.

52 9.5 Dictionary
- The dictionary ADT models a searchable collection of key-element entries.
- The main operations of a dictionary are searching, inserting, and deleting items.
- Multiple items with the same key are allowed.
- Applications:
  - word-definition pairs
  - credit card authorizations
  - DNS mapping of host names (e.g., datastructures.net) to Internet IP addresses

53 Dictionary ADT
- Dictionary ADT methods:
  - size(): return the number of entries in D
  - empty(): return true if D is empty and false otherwise
  - find(k): if there is an entry with key k, return an iterator p referring to any such entry, else return the special iterator end
  - findall(k): return a pair of iterators (b, e) such that all entries with key value k lie in the range [b, e), starting at b and ending just prior to e
  - insert(k, v): insert an entry with key k and value v into D, returning an iterator referring to the newly created entry
  - erase(k): remove from D an arbitrary entry with key equal to k; an error condition occurs if D has no such entry
  - erase(p): remove from D the entry referenced by iterator p; an error condition occurs if D has no such entry
  - begin(): return an iterator to the first entry of the dictionary
  - end(): return an iterator to a position just beyond the end of the dictionary

54 Example
Operation      Output        Dictionary
empty()        true          {}
insert(5,A)    p1: (5,A)     {(5,A)}
insert(7,B)    p2: (7,B)     {(5,A),(7,B)}
insert(2,C)    p3: (2,C)     {(5,A),(7,B),(2,C)}
insert(8,D)    p4: (8,D)     {(5,A),(7,B),(2,C),(8,D)}
insert(2,E)    p5: (2,E)     {(5,A),(7,B),(2,C),(2,E),(8,D)}
find(7)        p2: (7,B)     {(5,A),(7,B),(2,C),(2,E),(8,D)}
find(4)        end           {(5,A),(7,B),(2,C),(2,E),(8,D)}
find(2)        p3: (2,C)     {(5,A),(7,B),(2,C),(2,E),(8,D)}
findall(2)     (p3, p4)      {(5,A),(7,B),(2,C),(2,E),(8,D)}
size()         5             {(5,A),(7,B),(2,C),(2,E),(8,D)}
erase(5)                     {(7,B),(2,C),(2,E),(8,D)}
erase(p3)                    {(7,B),(2,E),(8,D)}
find(2)        p5: (2,E)     {(7,B),(2,E),(8,D)}
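The duplicate-key behavior in this example can be reproduced with the standard std::multimap, whose equal_range returns exactly the kind of half-open range [b, e) that findall describes. Note the swap: std::multimap is an ordered tree-based container, not the hash-based dictionary of these slides, and the names Dict, findAll and countRange are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <utility>

typedef std::multimap<int, char> Dict;
typedef Dict::const_iterator DictIt;

// findAll(k): the range [b, e) of all entries whose key equals k.
std::pair<DictIt, DictIt> findAll(const Dict& d, int k) {
    return d.equal_range(k);
}

// Number of entries in the half-open range [b, e).
int countRange(std::pair<DictIt, DictIt> r) {
    int c = 0;
    for (DictIt it = r.first; it != r.second; ++it) ++c;
    return c;
}
```

After inserting (5,A), (7,B), (2,C), (8,D) and (2,E), findAll(d, 2) spans two entries and findAll(d, 4) is an empty range.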

55 C++ Dictionary Implementation: class HashDict (1)
template <typename K, typename V, typename H>
class HashDict : public HashMap<K,V,H> {
public:                                             // public types
    typedef typename HashMap<K,V,H>::Iterator Iterator;
    typedef typename HashMap<K,V,H>::Entry Entry;
    // Range class declaration
    class Range {                                   // an iterator range
    private:
        Iterator _begin;                            // front of range
        Iterator _end;                              // end of range
    public:
        Range(const Iterator& b, const Iterator& e) // constructor
            : _begin(b), _end(e) { }
        Iterator& begin() { return _begin; }        // get beginning
        Iterator& end() { return _end; }            // get end
    };
public:                                             // public functions
    HashDict(int capacity = 100);                   // constructor
    Range findall(const K& k);                      // find all entries with k
    Iterator insert(const K& k, const V& v);        // insert pair (k,v)
};

56 class HashDict (2)
template <typename K, typename V, typename H>       // constructor
HashDict<K,V,H>::HashDict(int capacity) : HashMap<K,V,H>(capacity) { }

template <typename K, typename V, typename H>       // insert pair (k,v)
typename HashDict<K,V,H>::Iterator HashDict<K,V,H>::insert(const K& k, const V& v) {
    Iterator p = this->finder(k);                   // find key (this-> is needed for dependent base members)
    Iterator q = this->inserter(p, Entry(k, v));    // insert it here
    return q;                                       // return its position
}

template <typename K, typename V, typename H>       // find all entries with k
typename HashDict<K,V,H>::Range HashDict<K,V,H>::findall(const K& k) {
    Iterator b = this->finder(k);                   // look up k
    Iterator p = b;
    while (!this->endofbkt(p) && (*p).key() == (*b).key()) {   // find next unequal key
        ++p;
    }
    return Range(b, p);                             // return range of positions
}

57 Implementations with Location-aware Entries
- Location-aware entry: augmented by a private location variable
  - protected functions: location(), setlocation(p)
- Several possible ways to implement the Dictionary ADT with location-aware entries:
  - unordered list: maintain the location variable of each entry e to point to e's position in the underlying linked list for unordered list L
  - hash table: use the location variable of each entry e to point to e's position in the list L implementing the list A[h(k)]
  - ordered search table: maintain the location variable of each entry e to be e's index in ordered table T
  - skip list: maintain the location variable of each entry e to point to e's position in the bottom level of skip list S

58 Homework DS-9
- DS-9.1: Projects P-9.3 (map ADT using a vector)
- DS-9.2: Projects P-9.11 (implementation of the skip list, and of the map and dictionary ADTs with a skip list)

CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS

ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++, GOODRICH,