Dictionaries (Maps). Hash tables.

ADT Dictionary or Map
Has the following operations:
- INSERT: inserts a new element, associated with a unique value of a field (the key)
- SEARCH: searches for an element with a given value of the key; if it exists, returns it
- DELETE: removes the element with the given key, if it exists

Uses of dictionaries
- Symbol table in a compiler
  - Key: name of identifier
  - Values: types, context
- Citizens in a country
  - Key: social security number
  - Values: name, surname, age, address

Associative array
A dictionary could easily be implemented with an associative array (an array indexed by key instead of by position).
Ex:
- Citizens = { {"jr50", "john", "red"}, {"bg40", "bill", "green"}, ... }
- Citizens["jr50"] = {"jr50", "john", "red"}

Goal
Complexity of insert/search/delete:
- O(1) in the average case
- Θ(n) in the worst case

Hash tables
An implementation of associative arrays: an array containing the elements, where the address of each element is computed by a hash function in time O(1).
Ex:
- Hash("jr50") = 117: the element (john, red) is at position 117 of the array

Associative array
[Figure: universe U of all keys {0, ..., 9}; used keys K = {2, 3, 5, 8}; table T with slots 0 to 9, where slots 2, 3, 5 and 8 each hold a key and its value.]

Dictionary implemented with an associative array
- T: associative array, key: key, x: value
- Search(T, key)
    return T[key]
- Insert(T, x)
    T[key[x]] ← x
- Delete(T, x)
    T[key[x]] ← NIL
- Time complexity O(1), memory O(|U|)
- |U|: number of different values of the key
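The three operations above can be sketched in Python as follows. This is only an illustration under stated assumptions: keys are taken to be integers in 0..universe_size-1, Python's None stands in for NIL, and the class and method names are hypothetical, not part of the slides.

```python
class DirectAddressTable:
    """Dictionary with one slot per possible key (keys are 0..universe_size-1)."""

    def __init__(self, universe_size):
        # One slot for every possible key: memory is O(|U|).
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value      # T[key[x]] <- x

    def search(self, key):
        return self.slots[key]       # T[key]; None plays the role of NIL

    def delete(self, key):
        self.slots[key] = None       # T[key[x]] <- NIL


# Usage with the used keys of the figure, K = {2, 3, 5, 8}, out of U = {0..9}.
table = DirectAddressTable(10)
for k in (2, 3, 5, 8):
    table.insert(k, f"value-{k}")
table.delete(5)
found = table.search(3)      # present
missing = table.search(5)    # deleted, so None
```

Every operation is a single array access, which is where the O(1) time comes from; the price is the array of size |U|.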

Assumptions
Two assumptions are needed:
- No two elements have the same key (keys are unique)
- The size of T equals the number of possible values of the key, |U|
  - This is critical: if |U| is large, the array is infeasible
  - Ex: key = SSN, 10 characters, |U| = 24^10 ≈ 6 × 10^13
    - Assuming an alphabet of 24 symbols
  - But the citizens of a country number on the order of 10^7 to 10^9
  - It is essential that the size of the array be O(|K|) and not O(|U|)

Hash tables
- A kind of associative array with size O(|K|) and not O(|U|)
- Insert/search/delete are O(1) on average
- However, the index must be computed from the key in a different way: with a hash function

Hash function
- A hash table is an array of size m (m << |U|)
- The hash function h maps a key to a position (index) in the array
  - h: U → {0, 1, ..., m-1}
- Element x is stored in
  - T[h(key[x])]

Hash function
[Figure: keys k1, ..., k5 in the universe U are mapped by h to slots of the table T (indices 0 to m-1): h(k1), h(k4), h(k2) = h(k5), h(k3).]

Collision
- A collision occurs when h(k_i) = h(k_j) and k_i ≠ k_j
- It is essential to:
  - Minimize the number of collisions
    - This depends on the hash function
  - Manage collisions

Example
The key is a string of characters.
Hash function: h(k) = (Σ_i c_i) mod m, with
- c_i: ASCII code of the i-th character of string k
- m: number of elements (size) of array T

Ex (II)
m = 15.
- h("pippo") = (112+105+112+112+111) mod 15 = 552 mod 15 = 12
- h("pluto") = (112+108+117+116+111) mod 15 = 564 mod 15 = 9
- h("paperino") = (112+97+112+101+114+105+110+111) mod 15 = 862 mod 15 = 7
- h("topolino") = (116+111+112+111+108+105+110+111) mod 15 = 884 mod 15 = 14
- h("paperoga") = (112+97+112+101+114+111+103+97) mod 15 = 847 mod 15 = 7
Collision between the strings "paperino" and "paperoga" (both hash to 7).

Ex (III)
m = 15. The codes 77 and 68 are those of the capital letters M and D.
- h("Mickey") = (77+105+99+107+101+121) mod 15 = 610 mod 15 = 10
- h("Minnie") = (77+105+110+110+105+101) mod 15 = 608 mod 15 = 8
- h("Donald") = (68+111+110+97+108+100) mod 15 = 594 mod 15 = 9
- h("Daisy") = (68+97+105+115+121) mod 15 = 506 mod 15 = 11
- h("foo") = (102+111+111) mod 15 = 324 mod 15 = 9
- h("bar") = (98+97+114) mod 15 = 309 mod 15 = 9
Collision between the strings "foo" and "bar" (both hash to 9, as does "Donald").
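The sums above can be checked with a few lines of Python. This is a minimal sketch of the additive hash h(k) = (Σ c_i) mod m; the function name additive_hash is an assumption made for illustration. Since the codes 77 and 68 in the second example belong to capital M and D, the names are written capitalised here.

```python
def additive_hash(key: str, m: int) -> int:
    """Sum the character codes of key and reduce modulo the table size m."""
    return sum(ord(c) for c in key) % m


M = 15
words = ["pippo", "pluto", "paperino", "topolino", "paperoga",
         "Mickey", "Minnie", "Donald", "Daisy", "foo", "bar"]

# Group the words by slot to make the collisions visible.
slots = {}
for word in words:
    slots.setdefault(additive_hash(word, M), []).append(word)

for index in sorted(slots):
    print(index, slots[index])
```

Any slot listing more than one word is a collision. A weakness of this particular function is that it ignores character order, so anagrams always collide.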

Collision mitigation
The best hash functions distribute the |K| elements as uniformly (randomly) as possible among the m available positions.
Typical strategies:
- pick m as a prime number
- manipulate the bits of k

Collision management
- Chaining
- Open addressing

Chaining (I)
Position i can contain more than one element.
This can be implemented with a linked list.

Chaining (II)
[Figure: keys k1, ..., k6 from the universe U hashed into table T (slots 0 to m-1); keys that hash to the same slot are stored together in that slot's linked list.]

Chaining (III)
- T[i] is a pointer to a list, initially NIL
- CHAINED-HASH-INSERT(T, x)
    insert x at the head of list T[h(key[x])]
- CHAINED-HASH-SEARCH(T, k)
    search for an element with key k in list T[h(k)]
- CHAINED-HASH-DELETE(T, x)
    delete x from list T[h(key[x])]

Chaining - Complexity
- Assumption: unordered, singly linked lists
- Insert: O(1)
- Search: O(length of list)
- Delete: O(length of list)
  - Requires a search
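A possible Python rendering of the three chained operations is sketched below. It is an illustration, not the slides' implementation: the Node and ChainedHashTable names are hypothetical, string keys are hashed with the additive function of the earlier example, and insert assumes the key is not already present (as the unique-key assumption requires).

```python
class Node:
    """One cell of a singly linked chain."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m=15):
        self.m = m
        self.heads = [None] * m          # T[i] is a list head, initially NIL

    def _slot(self, key):
        return sum(ord(c) for c in key) % self.m

    def insert(self, key, value):
        # O(1): the new node becomes the head of its chain.
        i = self._slot(key)
        self.heads[i] = Node(key, value, self.heads[i])

    def search(self, key):
        # O(length of chain): walk the chain until the key is found.
        node = self.heads[self._slot(key)]
        while node is not None and node.key != key:
            node = node.next
        return node.value if node is not None else None

    def delete(self, key):
        # Needs a search first, because the list is singly linked.
        i = self._slot(key)
        prev, node = None, self.heads[i]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.heads[i] = node.next
        else:
            prev.next = node.next
        return True


table = ChainedHashTable(15)
table.insert("paperino", "first")
table.insert("paperoga", "second")   # same slot (7): goes to the head of the chain
```

Both keys end up in the chain of slot 7, and each remains reachable through search.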

Search (hash + chaining) - complexity
- We have
  - n: number of elements in hash table T
  - m: size of hash table T
  - α = n/m: load factor of hash table T
- With chaining, α can be greater than 1
- What if m, n → ∞ (with the same α)?

Search (hash + chaining) - complexity (II)
- Search
  - Worst case: all elements in a single unordered linked list
    - Time to compute h(k) +
    - Time to traverse the list, Θ(n)
  - Average case: depends on how uniformly h(k) distributes the elements
  - Let us assume that h(k) achieves simple uniform hashing, i.e. every key is equally likely to hash to any of the m slots (keeping α constant requires that the table grow with the number of elements)

Search (hash + chaining) - complexity (III)
Search:
- Time to compute h(k): O(1)
- Time to traverse the list: depends on the length of list T[h(k)] and on whether the element is found or not
- In both cases the expected complexity is Θ(1+α)
- Summing up: O(1) + Θ(1+α) = Θ(1+α), which is O(1) when α is bounded by a constant

Open Addressing
T[i] can contain only one element.
In case of collision another free cell is searched for: the next one, the one after that, etc.
The load factor must satisfy α < 1.

Hash-Insert
HASH-INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return
            else i ← i + 1
    until i = m
    error "hash table overflow"

Hash-Search
HASH-SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
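The two procedures translate almost line by line into Python. The sketch below makes a few assumptions that are not in the slides: the table is a Python list with None as NIL, the probe function h(k, i) is passed in as a parameter, the demo uses integer keys with a linear probe (k + i) mod m, and hash_insert returns the slot it used purely as a convenience.

```python
def hash_insert(table, key, probe):
    """Place key in the first free slot of its probe sequence."""
    m = len(table)
    for i in range(m):
        j = probe(key, i)
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def hash_search(table, key, probe):
    """Return the slot holding key, or None if key is absent."""
    m = len(table)
    for i in range(m):
        j = probe(key, i)
        if table[j] == key:
            return j
        if table[j] is None:      # a free slot ends the probe sequence
            return None
    return None


# Demo with m = 7 and a linear probe; 10, 17, 24 and 31 all start at slot 3.
m = 7
table = [None] * m

def linear(key, i):
    return (key + i) % m

positions = [hash_insert(table, k, linear) for k in (10, 17, 24)]
```

Here the three keys land in slots 3, 4 and 5. A search for 31 reads slots 3, 4, 5 and then stops at the empty slot 6.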

Re-hash functions
- Linear probing
  - h(k, i) = (h'(k) + i) mod m
- Quadratic probing
  - h(k, i) = (h'(k) + c1·i + c2·i^2) mod m
- Double hashing
  - h(k, i) = (h1(k) + i·h2(k)) mod m

Ex - insert
- m = 10
- Open addressing with linear probing (cells numbered 1 to 10)
- Hash values sequence:
  - h(a)=5, h(b)=4, h(c)=9, h(d)=4, h(e)=8, h(f)=8, h(g)=10

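Ex - insert (II)
Table contents after each insertion (cells 1 to 10, "." = empty):

Insert  h(key)   1  2  3  4  5  6  7  8  9  10
A       5        .  .  .  .  A  .  .  .  .  .
B       4        .  .  .  B  A  .  .  .  .  .
C       9        .  .  .  B  A  .  .  .  C  .
D       4        .  .  .  B  A  D  .  .  C  .
E       8        .  .  .  B  A  D  .  E  C  .
F       8        .  .  .  B  A  D  .  E  C  F
G       10       G  .  .  B  A  D  .  E  C  F

Ex - search (III)
Search:
- D (h(d) = 4)
  - Read 4
  - Read 5
  - Read 6: found
- G (h(g) = 10)
  - Read 10
  - Read 1: found
- M (h(m) = 4)
  - Read 4
  - Read 5
  - Read 6
  - Read 7: not found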
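The insert and search example can be replayed in Python. This sketch assumes, as the example suggests, that cells are numbered 1 to 10 and that probing wraps from cell 10 back to cell 1; the HOME dictionary simply hard-codes the hash values given in the example, and the function names are illustrative.

```python
M = 10
HOME = {"A": 5, "B": 4, "C": 9, "D": 4, "E": 8, "F": 8, "G": 10, "M": 4}


def probe(key, i):
    # Linear probing over cells 1..M: shift to 0-based, wrap, shift back.
    return (HOME[key] - 1 + i) % M + 1


def insert(table, key):
    for i in range(M):
        j = probe(key, i)
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def search(table, key):
    """Return (cells read, found?) for key."""
    reads = []
    for i in range(M):
        j = probe(key, i)
        reads.append(j)
        if table[j] == key:
            return reads, True
        if table[j] is None:
            return reads, False
    return reads, False


table = [None] * (M + 1)          # index 0 is unused
for key in "ABCDEFG":
    insert(table, key)

print(table[1:])
print(search(table, "D"), search(table, "G"), search(table, "M"))
```

The final table holds G in cell 1, B, A, D in cells 4 to 6, and E, C, F in cells 8 to 10. The search for M stops at the empty cell 7.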

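Delete
Very complex, because it changes the rehash/collision sequence.
In practice open addressing is used only if no deletes are needed.

Complexity
Assuming uniform hashing (an idealization that linear probing only approximates):
- The expected number of probes in an unsuccessful search is at most $\frac{1}{1-\alpha}$, and the complexity of insert is the same
- The expected number of probes in a successful search is at most $\frac{1}{\alpha} \ln \frac{1}{1-\alpha} + \frac{1}{\alpha}$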
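To get a feeling for these bounds, the short Python sketch below evaluates them for a few load factors. It reads the two expressions on the slide as 1/(1-α) for an unsuccessful search (and insert) and (1/α) ln(1/(1-α)) + 1/α for a successful search; the function names are made up for this illustration.

```python
import math


def unsuccessful_probes(alpha):
    """Bound on expected probes for an unsuccessful search or an insert."""
    return 1 / (1 - alpha)


def successful_probes(alpha):
    """Bound on expected probes for a successful search."""
    return (1 / alpha) * math.log(1 / (1 - alpha)) + 1 / alpha


for alpha in (0.25, 0.5, 0.75, 0.9):
    print(f"alpha={alpha:.2f}  "
          f"unsuccessful<={unsuccessful_probes(alpha):.2f}  "
          f"successful<={successful_probes(alpha):.2f}")
```

The unsuccessful-search bound grows quickly as α approaches 1 (2 probes at α = 0.5, 10 at α = 0.9), which is why open-addressing tables are usually kept well below full.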

Hash functions

Uniform hashing
The best hash functions perform uniform hashing: if all keys are equally likely, then all values of h(k) should be equally likely too.

$\sum_{k \,:\, h(k) = j} P(k) = \frac{1}{m}, \quad j = 0, 1, \ldots, m-1$

Keys are not uniform
However, keys are often not equally distributed (e.g. words in a language, names and surnames). The hash function should therefore:
- use all characters
- amplify the differences

Keys as numbers
Usually keys are strings of characters.
The easiest thing is to treat them as integers.
- Ex: "abc" becomes 'a'·256^2 + 'b'·256 + 'c'
However, with very long strings this is impractical, and variants have to be used.
In the following, the key is an integer.
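The conversion of a string to an integer in base 256 can be sketched as below. The first function follows the "abc" example directly. The second is one common variant for long strings, shown here as an assumption since the slides do not say which variant they have in mind: it reduces modulo m at every step so the intermediate value never grows large. Both function names are hypothetical.

```python
def string_to_int(s):
    """Interpret s as a base-256 number, one digit per character code."""
    value = 0
    for ch in s:
        value = value * 256 + ord(ch)
    return value


def string_to_int_mod(s, m):
    """Same value modulo m, computed without building the huge integer."""
    value = 0
    for ch in s:
        value = (value * 256 + ord(ch)) % m
    return value


abc = string_to_int("abc")          # 97*256**2 + 98*256 + 99
slot = string_to_int_mod("abc", 701)
```

Because the modulo can be applied at each step, string_to_int_mod(s, m) always equals string_to_int(s) % m.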

Hash function = mod m
- k is an integer:
  - h(k) = k mod m
- Requires m ≈ n/α
  - m: size of the table, n: number of elements

Choice of m
- Avoid
  - Powers of 2
    - With m = 2^p, k mod m keeps only the p low-order bits and loses the high bits of k
  - Powers of 10
    - Same as above, if k is a decimal number
- Use
  - A prime number
  - Far from powers of 2
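The effect of the choice of m can be seen on a small set of keys. In this Python sketch the keys (multiples of 16) and the two table sizes (16 and 13) are chosen purely for illustration: the keys differ only in their high bits, so a power-of-2 modulus sends them all to the same slot, while a prime modulus spreads them out.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


keys = [16 * i for i in range(1, 9)]      # 16, 32, ..., 128

with_power_of_two = [division_hash(k, 16) for k in keys]
with_prime = [division_hash(k, 13) for k in keys]

print(with_power_of_two)
print(with_prime)
```

With m = 16 only the four low-order bits of each key are used, and they are all zero here, so every key collides. With m = 13 the same eight keys occupy eight different slots.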

Ex
- n = 2000
- On average 3 comparisons per search
- m = 701 is a prime, close to 2000/3 but far from powers of 2
- h(k) = k mod 701

Hash function = multiply
- k is an integer
- A is a constant, 0 < A < 1
- frac(x) = x - ⌊x⌋
- h(k) = ⌊m · frac(k · A)⌋
- k · A shuffles the bits of k
- Multiplying by m expands [0, 1) to [0, m)

Choice of m and A
- m is not critical. Using a power of 2 simplifies the multiplication
- The best A depends on how the keys are statistically distributed
- A = (√5 - 1) / 2 = 0.6180339887... is a good choice
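A minimal Python sketch of the multiplication method with this value of A follows. It uses ordinary floating-point arithmetic, which is an assumption made for brevity and is adequate for moderately sized keys; production code usually works with fixed-width integer arithmetic instead. The name multiplication_hash and the sample key and table size are illustrative.

```python
import math

A = (math.sqrt(5) - 1) / 2            # 0.6180339887...


def multiplication_hash(k, m, a=A):
    """h(k) = floor(m * frac(k * a)), with frac(x) = x - floor(x)."""
    product = k * a
    frac = product - math.floor(product)
    return math.floor(m * frac)


m = 2 ** 14                            # a power of 2 is fine for this method
sample = multiplication_hash(123456, m)
slots = [multiplication_hash(k, m) for k in range(1, 1001)]
```

Every result lies in 0..m-1, because frac(k·A) is in [0, 1) and multiplying by m stretches that interval to [0, m).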