Data Structures and Algorithms. Roberto Sebastiani

Similar documents
Data Structures and Algorithms. Chapter 7. Hashing

DS ,21. L11-12: Hashmap

Hash Table and Hashing

Worst-case running time for RANDOMIZED-SELECT

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

Today s Outline. CS 561, Lecture 8. Direct Addressing Problem. Hash Tables. Hash Tables Trees. Jared Saia University of New Mexico

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

Hash Tables. Reading: Cormen et al, Sections 11.1 and 11.2

9/24/ Hash functions

Data Structures and Algorithms. Roberto Sebastiani

Hash Tables. Hashing Probing Separate Chaining Hash Function

CS 241 Analysis of Algorithms

Hashing 1. Searching Lists

COMP171. Hashing.

III Data Structures. Dynamic sets

Hashing Techniques. Material based on slides by George Bebis

Hashing. Hashing Procedures

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

Algorithms and Data Structures

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

DATA STRUCTURES AND ALGORITHMS

Fundamental Algorithms

CS 270 Algorithms. Oliver Kullmann. Generalising arrays. Direct addressing. Hashing in general. Hashing through chaining. Reading from CLRS for week 7

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

BBM371& Data*Management. Lecture 6: Hash Tables

Week 9. Hash tables. 1 Generalising arrays. 2 Direct addressing. 3 Hashing in general. 4 Hashing through chaining. 5 Hash functions.

Practical Session 8- Hash Tables

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

CSC263 Week 5. Larry Zhang.

Outline. Computer Science 331. Desirable Properties of Hash Functions. What is a Hash Function? Hash Functions. Mike Jacobson.

CSE 214 Computer Science II Searching

Lecture 7: Efficient Collections via Hashing

Data Structures and Algorithm Analysis (CSC317) Hash tables (part2)

Data Structures and Algorithms. Roberto Sebastiani

Hash Tables and Hash Functions

Understand how to deal with collisions

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

CSE 2320 Notes 13: Hashing

TABLES AND HASHING. Chapter 13

HASH TABLES.

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Hashing. 7- Hashing. Hashing. Transform Keys into Integers in [[0, M 1]] The steps in hashing:

Introduction to. Algorithms. Lecture 6. Prof. Constantinos Daskalakis. CLRS: Chapter 17 and 32.2.

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Topic HashTable and Table ADT

HASH TABLES. Goal is to store elements k,v at index i = h k

We assume uniform hashing (UH):

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

Cpt S 223. School of EECS, WSU

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

Fast Lookup: Hash tables

Lecture 6: Hashing Steven Skiena

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

CSCI Analysis of Algorithms I

Hash[ string key ] ==> integer value

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

AAL 217: DATA STRUCTURES

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

CS 303 Design and Analysis of Algorithms

Dictionaries and Hash Tables

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Priority Queue Sorting

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

Lecture 18. Collision Resolution

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Data Structures. Topic #6

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Outline. hash tables hash functions open addressing chained hashing

Comp 335 File Structures. Hashing

Dictionaries (Maps) Hash tables. ADT Dictionary or Map. n INSERT: inserts a new element, associated to unique value of a field (key)

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

Open Addressing: Linear Probing (cont.)

SFU CMPT Lecture: Week 8

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Hashing Algorithms. Hash functions Separate Chaining Linear Probing Double Hashing

CMSC 341 Hashing. Based on slides from previous iterations of this course

lecture23: Hash Tables

CS 350 Algorithms and Complexity

CS 350 : Data Structures Hash Tables

Data Structures and Algorithms. Werner Nutt

Hashing 1. Searching Lists

Chapter 9: Maps, Dictionaries, Hashing

( D. Θ n. ( ) f n ( ) D. Ο%

Data and File Structures Chapter 11. Hashing

Data Structure. Measuring Input size. Composite Data Structures. Linear data structures. Data Structure is: Abstract Data Type 1/9/2014

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

CSCI 104 Hash Tables & Functions. Mark Redekopp David Kempe

UNIT III BALANCED SEARCH TREES AND INDEXING

Transcription:

Data Structures and Algorithms Roberto Sebastiani roberto.sebastiani@disi.unitn.it http://www.disi.unitn.it/~rseba - Week 07 - B.S. In Applied Computer Science Free University of Bozen/Bolzano academic year 2010-2011 1

Acknowledgements & Copyright Notice These slides are built on top of slides developed by Michael Boehlen. Moreover, some material (text, figures, examples) displayed in these slides is courtesy of Kurt Ranalter. Some examples displayed in these slides are taken from [Cormen, Leiserson, Rivest and Stein, ``Introduction to Algorithms'', MIT Press], and their copyright is detained by the authors. All the other material is copyrighted by Roberto Sebastiani. Every commercial use of this material is strictly forbidden by the copyright laws without the authorization of the authors. No copy of these slides can be displayed in public or be publicly distributed without containing this copyright notice. 2

Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis 3

Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis 4

Dictionary Dictionary a dynamic data structure with methods: Search(S, k) an access operation that returns a pointer x to an element where x.key = k Insert(S, x) a manipulation operation that adds the element pointed to by x to S Delete(S, x) a manipulation operation that removes the element pointed to by x from S An element has a key part and a satellite data part. 5

Dictionaries Dictionaries store elements so that they can be located quickly using keys. A dictionary may hold bank accounts. Each account is an object that is identified by an account number. Each account stores a lot of additional information. An application wishing to operate on an account would have to provide the account number as a search key. 6

Dictionaries/2 If order (methods such as min, max, successor, predecessor) is not required it is enough to check for equality. Operations that require ordering are still possible but cannot use the dictionary access structure. Usually all elements must be compared, which is slow. Can be OK if it is rare enough 7

Dictionaries/3 Different data structures to realize dictionaries arrays linked lists Hash tables Binary trees Red/Black trees B trees In Java: java.util.map interface defining Dictionary ADT 8

Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis 9

The Problem AT&T is a large phone company, and they want to provide caller ID capability: given a phone number, return the caller s name phone numbers range from 0 to r = 10 8 1 want to do this as efficiently as possible 10

The Problem/2 Two suboptimal ways to design this dictionary direct addressing: an array indexed by key: Requires O(1) time, Requires O(r) space huge amount of wasted space (null) (null) Jens (null) (null) 0000 0000 0000 0001 9635 8904 9635 8905 9999 9999 a linked list: requires O(n) time, O(n) space Jens 9635 8904 Ole 9635 9999 11

Another Solution: Hashing We can do better, with a Hash table of size m. Like an array, but with a function to map the large range into one which we can manage. e.g., take the original key, modulo the (relatively small) size of the table, and use that as an index Insert (9635 8904, Jens) into a hash table with, say, five slots (m = 5) 96358904 mod 5 = 4 (null) (null) (null) (null) 0 1 2 3 4 O(1) expected time, O(n+m) space Jens 12

Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis 13

Hash Functions Need to choose a good hash function (HF) quick to compute distributes keys uniformly throughout the table How to deal with hashing non integer keys: find some way of turning the keys into integers in our example, remove the hyphen in 9635 8904 to get 96358904 for a string, add up the ASCII values of the characters of your string (e.g., java.lang.string.hashcode()) then use a standard hash function on the integers. 14

HF: Division Method Use the remainder: h(k) = k mod m k is the key, m the size of the table Need to choose m m = b e (bad) if m is a power of 2, h(k) gives the e least significant bits of k all keys with the same ending go to the same place m prime (good) helps ensure uniform distribution primes not too close to exact powers of 2 are best 15

HF: Division Method/2 Example 1 hash table for n = 2000 character strings, ok to investigate an average of three attempts/search m = 701 a prime near 2000/3 but not near any power of 2 Further examples m = 13 h(3) = 3 h(12) = 12 h(13) = 0 16

HF: Multiplication Method Use h(k) = m (k A mod 1) k is the key m the size of the table A is a constant 0 < A < 1 (k A mod 1): the fractional part of k A The steps involved map 0...k max into 0...k max A take the fractional part (mod 1) map it into 0...m 1 17

HF: Multiplication Method/2 Choice of m and A Value of m is not critical (typically use for some p m = 2 p Optimal choice of A depends on the characteristics of the data 5 1 2 Knuth says use A = = 0.618033988 18

HF: Multiplication Method/3 Assume 7 bit binary keys, 0 k < 128 m = 64 = 26, p = 6 A = 89/128 =.1011001, k = 107 = 1101011 Computation of h(k):.1011001 A 1101011 k 1001010.0110011 ka.0110011 ka mod 1 011001.1 m(ka mod 1) Thus, h(k) = 25 19

Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis 20

Collisions Assume a key is mapped to an already occupied table location what to do? Use a collision handling technique 3 techniques to deal with collisions: chaining open addressing/linear probing open addressing/double hashing 21

Chaining Chaining maintains a table of links, indexed by the keys, to lists of items with the same key. 0 1 2 3 4 22

Open Addressing All elements are stored in the hash table (can fill up), i.e., n m Each table entry contains either an element or null When searching for an element, systematically probe table slots Modify hash function to take probe number i as second parameter h: U x { 0, 1,..., m 1 } -> { 0, 1,..., m 1 } 23

Open Addressing/2 Hash function, h, determines the sequence of slots examined for a given key Probe sequence for a given key k given by ( h(k,0), h(k,1),..., h(k,m 1) ) a permutation of ( 0, 1,..., m 1 ) 24

Linear Probing LinearProbingInsert(k) 01 if (table is full) error 02 probe = h(k) 03 while (table[probe] occupied) 04 probe = (probe+1) mod m 05 table[probe] = k If the current location is used, try the next table location: h(key,i) = (h1(key)+i) mod m Lookups walk along the table until the key or an empty slot is found Uses less memory than chaining one does not have to store all those links Slower than chaining one might have to probe the table for a long time 25

Linear Probing/2 Problem primary clustering : long lines of occupied slots A slot preceded by i full slots has a high probability of getting filled: (i+1)/m Alternatives: (quadratic probing,) double hashing Example: h(k) = k mod 13 insert keys: 18 41 22 44 59 32 31 73 26

Double Hashing Use two hash functions: h(key,i) = (h1(key) + i*h2(key)) mod m, i=0,1,... DoubleHashingInsert(k) 01 if (table is full) error 02 probe = h1(k) 03 offset = h2(k) 03 while (table[probe] occupied) 04 probe = (probe+offset) mod m 05 table[probe] = k Distributes keys much more uniformly than linear probing. 27

Double Hashing/2 h2(k) must be relative prime to m to search the entire hash table. Suppose h2(k) = k*a and m = w*a, a > 1 Two ways to ensure this: m is power of 2, h2(k) is odd m: prime, h2(k): positive integer < m Example h1(k) = k mod 13, h2(k) = 8 (k mod 8) insert keys: 18 41 22 44 59 32 31 73 28

Open addressing: delete Complex to delete from A slot may be reached from different points We cannot simply store NIL : we'd loose the information necessary to retrieve other keys Possible solution: mark the deleted slot as deleted, insert also on deleted Drawback: retrieval time no more depending by load factor: potentially lots of jumps on deleted slots When deletion admitted/frequent, chaining preferred 29

Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis 30

Analysis of Hashing: An element with key k is stored in slot h(k) (instead of slot k without hashing) The hash function h maps the universe U of keys into the slots of hash table T[0...m 1] h: U { 0, 1,..., m 1 } Assumption: Each key is equally likely to be hashed into any slot (bucket); simple uniform hashing Given hash table T with m slots holding n elements, the load factor is defined as α=n/m 31

Analysis of Hashing/2 Assume time to compute h(k) is Θ(1) To find an element using h, look up its position in table T search for the element in the linked list of the hashed slot uniform hashing yields an average list length α = n/m expected number of elements to be examined α search time O(1+α) 32

Analysis of Hashing/3 Assuming the number of hash table slots is proportional to the number of elements in the table n = O(m) = n/m = O(m)/m = O(1) searching takes constant time on average insertion takes O(1) worst case time deletion takes O(1) worst case time (pass the element not key, lists are doubly linked) 33

Expected Number of Probes Load factor α < 1 for probing Analysis of probing uses uniform hashing assumption any permutation is equally likely Chaining Probing Unsuccessful O 1 O Successful O 1 1 1 O 1 ln 1 1 Chaining: 1( =0%), 1.5( =50%), 2( =100%), n( =n) Probing, unsucc: 1.25( =20%), 2( =50%), 5( =80%), 10( =90%) Probing, succ: 0.28( =20%), 1.39( =50%), 2.01( =80%), 2.56( =90%) 34

Expected Number of Probes/2 35

Summary Hashing is very efficient (not obvious, probability theory). Its functionality is limited (printing elements sorted according to key is not supported). The size of the hash table may not be easy to determine. A hash table is not really a dynamic data structure. 36

Suggested exercises Implement a Hash Table with the different techniques With paper & pencil, draw the evolution of a hash table when inserting, deleting and searching for new element, with the different techniques See also exercises of CLRS 37

Next Week Graphs: Representation in memory Breadth first search Depth first search Topological sort 38