Open Addressing: Linear Probing (cont.)

Open Addressing: Linear Probing (cont.) Cons of Linear Probing () more complex insert, find, remove methods () primary clustering phenomenon items tend to cluster together in the bucket array, as clustering gets worse, insert(k,e) takes longer because it must step all the way through a cluster to find a vacant bucket as clustering gets worse, find(k) takes longer since elements get placed further and further from their correct hashed index Example [ consecutive insertion of k=4567, 069, 073 ] Different keys that hash into different addresses compete with each other in successive rehashes. 3 4 5 7496 4567 069 073 should be at 3 should be at 4

Open Addressing: Linear Probing (cont.) Average RT in a non-full hash table, with no previous removals, the average running time of insert, find, remove is RT average (n) = + + - λ, ( - λ) for, successful search for unsuccessful search NOTE: ) What would be the worst case RT of all three methods?! ) Both formulas collapse for λ=. Why?!

Open Addressing: Linear Probing (cont.) 3 Example 4 [ linear probing ] h(k) = k mod 3 Insert keys: 8, 4,, 44, 59, 3, 3, 73, in this order. h(8) = 5 h(4) = h() = 9 h(44) = 5+ h(59) = 7 h(3) = 6++ h(3) = 5+++++ h(73) = 8+++ 4 8 44 59 3 3 73 0 3 4 5 6 7 8 9 0 cluster

Open Addressing: Quadratic Probing 4 Quadratic Probing involves iteratively trying the following buckets A [( h(k) j ) modn] + i i+ i+ i+3 where j=0,,,, until finding an empty bucket eliminates primary clusters by probing far-away buckets however, creates secondary clusters if two keys have the same initial probe position, then then they share the same probe sequence can be avoided by making the probe sequence a function of the key not just the home location if N is not prime, quadratic probing may not find an empty bucket even if one exists even if N is prime, may not find an empty slot if the bucket array is half-full

Open Addressing: Double Hashing 5 Double Hashing uses a secondary hash function h (k) and places the colliding item in st available cell of the series k k i i i A [( k + j h'(k) ) modn] where j=0,,,, until finding an empty bucket linear & quadratic probing are key independent; double hashing defines key dependant probing sequence h (k) determines the size of steps h cannot have 0 values; common choice: h (k) = q (k mod q) i 3 q < N, and q is a prime number possible values for h (k) are,,.., q

Open Addressing: Double Hashing (cont.) 6 Example 5 [ double hashing ] h(k) = k mod 3, h (k) = 7 k mod 7 Insert keys: 8, 4,, 44, 59, 3, 3, 73, in this order. Probes: [ k + j*h (k) ] mod 3, j=0,,, h(k) h(8) = 5 h(4) = h() = 9 h(44) = 5 h(59) = 7 h(3) = 6 h(3) = 5 h(73) = 8 h (k) h (44) = 5 h (3) = 4 5 9 5, 0 7 6 5, 9, 0 8 wraparound 3 4 8 3 59 73 44 0 3 4 5 6 7 8 9 0

Open Addressing: Double Hashing (cont.) 7 Pros of drastically reduces clustering and requires fewer Double Hashing comparisons than linear probing as a result, smaller hash tables can be used Cons of similar as in the case of linear/quadratic probing, Double Hashing the performance degrades as the table fills up Average RT in a non-full hash table, with no previous removals, the average running time of insert, find, remove is RT average (n) = - ln, - λ ( - λ) λ, for successful search for unsuccessful search Using more than one hash function is called rehashing. While having more than hash functions can be desirable, such schemes are difficult to implement.

Collision Handling Schemes: Comparison 8 NOTE: Quadratic Probing and Double Hashing have identical performance.

Collision Handling Schemes: Comparison (cont.) 9 () In all 4 cases, hashing efficiency decreases as load factor increases. () When λ=0.5, i.e. hash table is half-full, all 4 methods have nearly equal efficiency. (3) For Linear Probing, Quadratic Probing and Double Hashing λ should be kept below /3 to maintain reasonably good performance. if λ goes above /3 we could improve performance by resizing bucket array and using new hash function (4) Quadratic and Double Hashing perform better than Linear Probing on average, but they also suffer when table fills up. all open addressing schemes have linear O(n) worst case RT when λ! Would you ever use open addressing instead of Chaining?! (When the hash function is one-to-one on the set of all possible keys perfect hashing.) Would you ever use Linear instead of Quadratic Prob. and Double Hashing?! (When fast rehashing is required.)

Hash Tables: Conclusions 0 Hash Tables when we can afford a large tablesize (i.e. small load should be factor λ, below /3) and occasionally slow search used when Hash Tables should NOT be used if () if traversal in sorted order is required since hashing function distributes the elements to table positions somewhat randomly, order cannot be guaranteed () if range queries are required (3) if a guaranteed (better than O(n)) search time needs to be provided!!! In all three cases use binary search tree instead!