Hash Tables
Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.
Image: xkcd, "Random Number", http://xkcd.com/221/. Used with permission under Creative Commons 2.5 License.
2015 Goodrich and Tamassia

Recall the Map Operations
get(k): if the map M has an entry with key k, return its associated value; else, return null
put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return the old value associated with k
remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null
size(), isEmpty()

Intuitive Notion of a Map
Intuitively, a map M supports the abstraction of using keys as indices with a syntax such as M[k].
As a mental warm-up, consider a restricted setting in which a map with n items uses keys that are known to be integers in a range from 0 to $N - 1$, for some $N \ge n$.

More General Kinds of Keys
But what should we do if our keys are not integers in the range from 0 to $N - 1$?
Use a hash function to map general keys to corresponding indices in a table.
For instance, the last four digits of a Social Security number:
  Index  Key
  0      (empty)
  1      025-612-0001
  2      981-101-0002
  3      (empty)
  4      451-229-0004

Hash Functions and Hash Tables
A hash function h maps keys of a given type to integers in a fixed interval $[0, N - 1]$
Example: $h(x) = x \bmod N$ is a hash function for integer keys
The integer $h(x)$ is called the hash value of key x
A hash table for a given key type consists of:
  Hash function h
  Array (called table) of size N
When implementing a map with a hash table, the goal is to store item (k, o) at index $i = h(k)$

Example
We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer
Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x
  Index  Key
  0      (empty)
  1      025-612-0001
  2      981-101-0002
  3      (empty)
  4      451-229-0004
  ...
  9997   (empty)
  9998   200-751-9998
  9999   (empty)

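The SSN example above can be sketched in Python as follows. This is only an illustration: the function name `h_ssn`, the string format of the keys, and the placeholder names stored as values are assumptions, not part of the slides.

```python
N = 10_000  # table size from the example


def h_ssn(ssn: str) -> int:
    """Hash an SSN written as 'ddd-ddd-dddd' to its last four digits."""
    digits = ssn.replace("-", "")
    return int(digits) % N  # the last four decimal digits


# Placeholder names (hypothetical); only the keys come from the slide.
entries = [
    ("025-612-0001", "name-1"),
    ("981-101-0002", "name-2"),
    ("451-229-0004", "name-3"),
    ("200-751-9998", "name-4"),
]

table = [None] * N
for ssn, name in entries:
    table[h_ssn(ssn)] = (ssn, name)  # no collisions occur for these four keys

for index in (1, 2, 4, 9998):
    print(index, table[index])
```

Taking the last four digits is the same as computing x mod 10,000, so this is the division example with N = 10,000.
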
Hash Functions
A hash function is usually specified as the composition of two functions:
  Hash code: $h_1 : \text{keys} \to \text{integers}$
  Compression function: $h_2 : \text{integers} \to [0, N - 1]$
The hash code is applied first, and the compression function is applied next on the result, i.e., $h(x) = h_2(h_1(x))$
The goal of the hash function is to disperse the keys in an apparently random way

Hash Codes
Memory address:
  We reinterpret the memory address of the key object as an integer
  Good in general, except for numeric and string keys
Integer cast:
  We reinterpret the bits of the key as an integer
  Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float)
Component sum:
  We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
  Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type

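As a rough illustration of the component-sum idea, the sketch below splits a 64-bit floating-point key into two 32-bit components and adds them, discarding any overflow. The function name `component_sum_hash` and the choice of a 64-bit key with 32-bit components are assumptions made for this example.

```python
import struct


def component_sum_hash(x: float) -> int:
    """Component-sum hash code of a 64-bit float using 32-bit components."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    high = bits >> 32           # upper 32-bit component
    low = bits & 0xFFFFFFFF     # lower 32-bit component
    # Masking to 32 bits models "ignoring overflows".
    return (high + low) & 0xFFFFFFFF


print(component_sum_hash(1.0))
print(component_sum_hash(3.14))
```
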
Hash Codes (cont.)
Polynomial accumulation:
  We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $a_0 a_1 \ldots a_{n-1}$
  We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$ at a fixed value $z$, ignoring overflows
  Especially suitable for strings (e.g., the choice $z = 33$ gives at most 6 collisions on a set of 50,000 English words)
Polynomial $p(z)$ can be evaluated in $O(n)$ time using Horner's rule:
  The following polynomials are successively computed, each from the previous one in $O(1)$ time:
  $p_0(z) = a_{n-1}$
  $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z)$ for $i = 1, 2, \ldots, n-1$
  We have $p(z) = p_{n-1}(z)$

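Horner's rule as stated above can be sketched in Python. The sketch assumes the components are the character codes of a string and models "ignoring overflows" by keeping only the low 32 bits; the function name `polynomial_hash` and the 32-bit width are assumptions for illustration.

```python
def polynomial_hash(key: str, z: int = 33, bits: int = 32) -> int:
    """Evaluate p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1} with Horner's rule."""
    mask = (1 << bits) - 1
    p = 0
    # Process a_{n-1} first, then a_{n-2}, ..., a_0, so that each step
    # computes p_i(z) = a_{n-i-1} + z * p_{i-1}(z) in O(1) time.
    for ch in reversed(key):
        p = (ord(ch) + z * p) & mask  # the mask discards overflow bits
    return p


for word in ("hash", "table", "probe"):
    print(word, polynomial_hash(word))
```

Each loop iteration uses one multiplication and one addition, which gives the O(n) total for a key of n components.
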
Tabulation-Based Hashing
Suppose each key can be viewed as a tuple, $k = (x_1, x_2, \ldots, x_d)$, for a fixed d, where each $x_i$ is in the range $[0, M - 1]$.
There is a class of hash functions we can use, which involve simple table lookups, known as tabulation-based hashing.
We can initialize d tables, $T_1, T_2, \ldots, T_d$, of size M each, so that each $T_i[j]$ is a uniformly chosen independent random number in the range $[0, N - 1]$.
We then can compute the hash function, $h(k)$, as $h(k) = T_1[x_1] \oplus T_2[x_2] \oplus \cdots \oplus T_d[x_d]$, where $\oplus$ denotes the bitwise exclusive-or function.
Because the values in the tables are themselves chosen at random, such a function is itself fairly random. For instance, it can be shown that such a function will cause two distinct keys to collide at the same hash value with probability $1/N$, which is what we would get from a perfectly random function.

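A minimal Python sketch of tabulation-based hashing follows. It assumes N is a power of two, so that the exclusive-or of values in $[0, N - 1]$ stays in that range; the class name `TabulationHash`, the `seed` parameter, and the sample key are illustrative assumptions.

```python
import random


class TabulationHash:
    """Tabulation hashing for keys that are d-tuples over [0, M - 1]."""

    def __init__(self, d: int, M: int, N: int, seed=None):
        if N <= 0 or N & (N - 1):
            raise ValueError("this sketch assumes N is a power of two")
        rng = random.Random(seed)
        # d tables of size M, each entry drawn uniformly from [0, N - 1].
        self.tables = [[rng.randrange(N) for _ in range(M)] for _ in range(d)]

    def __call__(self, key) -> int:
        if len(key) != len(self.tables):
            raise ValueError("key must have exactly d components")
        h = 0
        for table, x in zip(self.tables, key):
            h ^= table[x]  # T_1[x_1] xor T_2[x_2] xor ... xor T_d[x_d]
        return h


# Example: a 32-bit key viewed as four bytes (d = 4, M = 256).
h = TabulationHash(d=4, M=256, N=1 << 16, seed=42)
key = (192, 168, 0, 1)
print(h(key))
```
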
Compression Functions
Division:
  $h_2(y) = y \bmod N$
  The size N of the hash table is usually chosen to be a prime
  The reason has to do with number theory and is beyond the scope of this course
Random linear hash function:
  $h_2(y) = (ay + b) \bmod N$
  a and b are random nonnegative integers such that $a \bmod N \ne 0$
  Otherwise, every integer would map to the same value b

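Both compression functions can be sketched as small Python factories. The names `make_division_compressor` and `make_random_linear_compressor` are hypothetical, and drawing a from $[1, N - 1]$ is just one simple way to guarantee $a \bmod N \ne 0$.

```python
import random


def make_division_compressor(N: int):
    """Return h2(y) = y mod N."""
    return lambda y: y % N


def make_random_linear_compressor(N: int, rng=None):
    """Return h2(y) = (a*y + b) mod N with random a, b and a mod N != 0."""
    rng = rng or random.Random()
    a = rng.randrange(1, N)  # 1 <= a <= N - 1, so a mod N is never 0
    b = rng.randrange(0, N)
    return lambda y: (a * y + b) % N


N = 13  # a prime table size
divide = make_division_compressor(N)
linear = make_random_linear_compressor(N, random.Random(0))
for y in (18, 41, 22, 44):
    print(y, divide(y), linear(y))
```
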
Collision Handling
Collisions occur when different elements are mapped to the same cell
Separate Chaining: let each cell in the table point to a linked list of entries that map there
  Index  Chain
  0      (empty)
  1      025-612-0001
  2      (empty)
  3      (empty)
  4      451-229-0004 -> 981-101-0004
Separate chaining is simple, but requires additional memory outside the table

Map with Separate Chaining
Delegate operations to a list-based map at each cell:
  Algorithm get(k):
    return A[h(k)].get(k)
  Algorithm put(k, v):
    t = A[h(k)].put(k, v)
    if t = null then    {k is a new key}
      n = n + 1
    return t
  Algorithm remove(k):
    t = A[h(k)].remove(k)
    if t ≠ null then    {k was found}
      n = n - 1
    return t

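The delegation above might look like this in Python. In this sketch, Python lists of (key, value) pairs stand in for the list-based maps, Python's built-in `hash` plays the role of the hash code, and the class name `ChainedHashMap` and default capacity are assumptions. As in the pseudocode, a return value of `None` signals "no previous entry", so storing `None` as a value would be ambiguous.

```python
class ChainedHashMap:
    """Map with separate chaining; each bucket is a list of (key, value)."""

    def __init__(self, capacity: int = 11):
        self._buckets = [[] for _ in range(capacity)]
        self._n = 0  # number of entries

    def _bucket(self, k):
        return self._buckets[hash(k) % len(self._buckets)]

    def get(self, k):
        for key, value in self._bucket(k):
            if key == k:
                return value
        return None

    def put(self, k, v):
        bucket = self._bucket(k)
        for i, (key, old) in enumerate(bucket):
            if key == k:            # k was already present: overwrite
                bucket[i] = (k, v)
                return old
        bucket.append((k, v))       # k is a new key
        self._n += 1
        return None

    def remove(self, k):
        bucket = self._bucket(k)
        for i, (key, old) in enumerate(bucket):
            if key == k:            # k was found
                del bucket[i]
                self._n -= 1
                return old
        return None

    def size(self):
        return self._n

    def is_empty(self):
        return self._n == 0


m = ChainedHashMap(capacity=5)
m.put(1, "one")
m.put(6, "six")  # 1 and 6 share bucket 1 when the capacity is 5
print(m.get(1), m.get(6), m.size())
```
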
Performance of Separate Chaining
Let us assume that our hash function, h, maps keys to independent uniform random values in the range $[0, N - 1]$.
Thus, if we let X be a random variable representing the number of items that map to a bucket, i, in the array A, then the expected value of X is $E(X) = n/N$, where n is the number of items in the map, since each of the N locations in A is equally likely for each item to be placed.
This parameter, $n/N$, which is the ratio of the number of items in a hash table, n, and the capacity of the table, N, is called the load factor of the hash table.
If it is $O(1)$, then the above analysis says that the expected time for hash table operations is $O(1)$ when collisions are handled with separate chaining.

Linear Probing
Open addressing: the colliding item is placed in a different cell of the table
Linear probing: handles collisions by placing the colliding item in the next (circularly) available table cell
Each table cell inspected is referred to as a "probe"
Colliding items lump together, causing future collisions to cause a longer sequence of probes
Example:
  $h(x) = x \bmod 13$
  Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
  Index  0   1   2   3   4   5   6   7   8   9   10  11  12
  Key    -   -   41  -   -   18  44  59  32  22  31  73  -

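The insertion example above can be replayed with a few lines of Python. This is a sketch only: it stores bare integer keys, and the function name `linear_probe_insert` is an assumption.

```python
def linear_probe_insert(table, key):
    """Insert an integer key using h(x) = x mod N and linear probing."""
    N = len(table)
    i = key % N
    for _ in range(N):           # probe at most N cells
        if table[i] is None:     # empty cell found
            table[i] = key
            return i
        i = (i + 1) % N          # next cell, wrapping around circularly
    raise RuntimeError("hash table is full")


table = [None] * 13
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    linear_probe_insert(table, key)

print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```

Keys 44, 32, 31 and 73 all land in the run of occupied cells starting at index 5, which is the lumping effect the slide describes.
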
Search with Linear Probing
Consider a hash table A that uses linear probing
get(k):
  We start at cell h(k)
  We probe consecutive locations until one of the following occurs:
    An item with key k is found, or
    An empty cell is found, or
    N cells have been unsuccessfully probed
Algorithm get(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return null
    else if c.getKey() = k
      return c.getValue()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null

Updates with Linear Probing
To handle insertions and deletions, we introduce a special object, called DEFUNCT, which replaces deleted elements
remove(k):
  We search for an entry with key k
  If such an entry, (k, v), is found, we move elements to fill the hole created by its removal
put(k, v):
  We throw an exception if the table is full
  We start at cell h(k)
  We probe consecutive cells until a cell i is found that is empty
  We store (k, v) in cell i

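One way to put search and updates together is sketched below. Note that this sketch uses the DEFUNCT-marker variant of removal, in which a deleted cell is overwritten with a sentinel, rather than shifting later entries back to fill the hole. The class name `LinearProbingMap`, the helper `_find`, and the use of Python's built-in `hash` are assumptions.

```python
class LinearProbingMap:
    """Open-addressing map with linear probing and a DEFUNCT sentinel."""

    _DEFUNCT = object()  # marks a cell whose entry was removed

    def __init__(self, capacity: int = 13):
        self._cells = [None] * capacity
        self._n = 0

    def _find(self, k):
        """Return (True, index of k) or (False, first reusable index or -1)."""
        N = len(self._cells)
        i = hash(k) % N
        free = -1
        for _ in range(N):
            c = self._cells[i]
            if c is None:                 # empty cell: k is not in the table
                return False, (i if free == -1 else free)
            if c is self._DEFUNCT:
                if free == -1:
                    free = i              # remember the first reusable cell
            elif c[0] == k:
                return True, i
            i = (i + 1) % N
        return False, free

    def get(self, k):
        found, i = self._find(k)
        return self._cells[i][1] if found else None

    def put(self, k, v):
        found, i = self._find(k)
        if found:
            old = self._cells[i][1]
            self._cells[i] = (k, v)
            return old
        if i == -1:
            raise RuntimeError("hash table is full")
        self._cells[i] = (k, v)
        self._n += 1
        return None

    def remove(self, k):
        found, i = self._find(k)
        if not found:
            return None
        old = self._cells[i][1]
        self._cells[i] = self._DEFUNCT    # keep later probe sequences intact
        self._n -= 1
        return old


m = LinearProbingMap(13)
for key, value in [(18, "a"), (44, "b"), (31, "c")]:  # all hash to cell 5
    m.put(key, value)
m.remove(44)
print(m.get(31))  # still reachable: the search probes past the DEFUNCT cell
```

The sentinel is needed because simply emptying cell 6 would make the search for key 31 stop early at the empty cell.
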
Pseudo-code for get and put

Pseudo-code for remove

Performance of Linear Probing
In the worst case, searches, insertions and removals on a hash table take $O(n)$ time
The worst case occurs when all the keys inserted into the map collide
The load factor $\alpha = n/N$ affects the performance of a hash table
Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is $1 / (1 - \alpha)$
The expected running time of all the dictionary ADT operations in a hash table is $O(1)$ with constant load factor less than 1
In practice, hashing is very fast provided the load factor is not close to 100%
Applications of hash tables:
  small databases
  compilers
  browser caches

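To get a feel for the $1 / (1 - \alpha)$ bound, the short sketch below evaluates it for a few load factors. The function name `expected_insert_probes` and the sample values of $\alpha$ are chosen only for illustration.

```python
def expected_insert_probes(alpha: float) -> float:
    """Expected probes for an insertion at load factor alpha: 1 / (1 - alpha)."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    return 1 / (1 - alpha)


for alpha in (0.25, 0.5, 0.75, 0.9, 0.99):
    print(f"alpha = {alpha:.2f}: about {expected_insert_probes(alpha):.1f} probes")
```

At a load factor of 1/2 the formula gives 2 probes, while at 0.99 it gives roughly 100, which is why the load factor should stay well below 100%.
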
A More Careful Analysis of Linear Probing
Recall that, in the linear-probing scheme for handling collisions, whenever an insertion at a cell i would cause a collision, then we instead insert the new item in the first cell of i+1, i+2, and so on, until we find an empty cell.
For this analysis, let us assume that we are storing n items in a hash table of size $N = 2n$, that is, our hash table has a load factor of 1/2.

A More Careful Analysis of Linear Probing, 2
Thus, if we can bound the expected value of the sum of the $Y_i$'s, then we can bound the expected time for a search or update operation in a linear-probing hashing scheme.

A More Careful Analysis of Linear Probing, 3

A More Careful Analysis of Linear Probing, 4

Double Hashing
Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series $(i + j\,d(k)) \bmod N$ for $j = 0, 1, \ldots, N - 1$
The secondary hash function d(k) cannot have zero values
The table size N must be a prime to allow probing of all the cells
Common choice of compression function for the secondary hash function: $d_2(k) = q - k \bmod q$, where $q < N$ and q is a prime
The possible values for $d_2(k)$ are $1, 2, \ldots, q$

Example of Double Hashing
Consider a hash table storing integer keys that handles collision with double hashing
  N = 13
  $h(k) = k \bmod 13$
  $d(k) = 7 - k \bmod 7$
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
  k    h(k)  d(k)  Probes
  18   5     3     5
  41   2     1     2
  22   9     6     9
  44   5     5     5, 10
  59   7     4     7
  32   6     3     6
  31   5     4     5, 9, 0
  73   8     4     8
Resulting table:
  Index  0   1   2   3   4   5   6   7   8   9   10  11  12
  Key    31  -   41  -   -   18  32  59  73  22  44  -   -

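The example above can be checked with a short Python sketch. The function name `double_hash_insert` and its return value (the list of probed cells) are assumptions made for illustration; the hash functions are the ones given on the slide.

```python
def double_hash_insert(table, key, q=7):
    """Insert key with h(k) = k mod N and d(k) = q - k mod q; return the probes."""
    N = len(table)
    i = key % N
    d = q - key % q                 # secondary hash, always in 1..q
    probes = []
    for j in range(N):
        cell = (i + j * d) % N
        probes.append(cell)
        if table[cell] is None:     # first available cell in the series
            table[cell] = key
            return probes
    raise RuntimeError("no free cell found")


table = [None] * 13
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    print(key, key % 13, 7 - key % 7, double_hash_insert(table, key))

print(table)
# [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]
```

Because N = 13 is prime and d(k) is never zero, the series visits every cell before repeating, so an insertion fails only when the table is full.
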