Hashing 1: Searching Lists

There are many instances in which one wants to store and search a list:
- A phone company wants to provide caller ID: given a phone number, a name is returned.
- Somebody who plays the Lotto thinks they can increase their chance of winning by keeping track of the winning numbers from the last 3 years.
Both of these have simple solutions using arrays, linked lists, binary trees, etc. Unfortunately, none of these solutions is adequate, as we will see.

Hashing 2: Problematic List Searching

Here are some solutions to the problems. For caller ID:
- Use an array indexed by phone number. One operation returns the name, but an array of size 1,000,000,000 is needed.
- Use a linked list. This requires O(n) time and space, where n is the number of actual phone numbers. This is the best we can do space-wise, but the time required is horrendous.
- Use a balanced binary tree: O(n) space and O(log n) time. Better, but still not good enough.
For the Lotto we could use the same solutions. The number of possible lotto numbers is 15,625,000 for a 6/50 lotto, while the number of entries stored is on the order of 1,000, assuming a daily lotto.

Hashing 3: The Hash Table Solution

A hash table is similar to an array:
- It has a fixed size m.
- An item with key k goes into index h(k), where h is a function from the keyspace to {0, 1, ..., m-1}.
- The function h is called a hash function, and we call h(k) the hash value of k.
A hash table allows us to store values from a large set in a small array in such a way that searching is fast. The space required to store n numbers in a hash table of size m is O(m + n); notice that this does not depend on the size of the keyspace. The average time required to insert, delete, and search for an element in a hash table is O(1). Sounds perfect. Then what's the problem? Let's look at an example.

Hashing 4: Hash Table Example

I have 5 friends whose phone numbers I want to store in a hash table of size 8. Their names and numbers are:
- Susy Olson: 555-1212
- Sarah Skillet: 555-4321
- Ryan Hamster: 545-3241
- Justin Case: 555-6798
- Chris Lindmeyer: 535-7869
As a hash function I use h(k) = k mod 8, where k is the phone number viewed as a 7-digit decimal number. Notice that 5554321 mod 8 = 5453241 mod 8 = 1, but I can't put both Sarah Skillet and Ryan Hamster in position 1. Can we fix this problem?
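
A minimal sketch (in Python; not part of the original slide) that reproduces the collision in this example:

    # Phone numbers viewed as 7-digit decimal numbers, hashed with h(k) = k mod 8.
    friends = {
        "Susy Olson": 5551212,
        "Sarah Skillet": 5554321,
        "Ryan Hamster": 5453241,
        "Justin Case": 5556798,
        "Chris Lindmeyer": 5357869,
    }
    m = 8  # table size from the slide

    for name, number in friends.items():
        print(f"{name}: {number} -> slot {number % m}")
    # Sarah Skillet (5554321) and Ryan Hamster (5453241) both land in slot 1,
    # so a plain array of size 8 cannot hold both entries at index 1.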

Hashing 5: Hash Table Problems

The problem with hash tables is that two keys can have the same hash value. This is called a collision. There are several ways to deal with this problem:
- Pick the hash function h to minimize the number of collisions.
- Implement the hash table in a way that allows all keys with the same hash value to be stored.
The second method is almost always needed, even for very good hash functions. Why? We will talk about 2 collision resolution techniques:
- Chaining
- Open Addressing
But first, a bit more on hash functions.

Hashing 6: Hash Functions

Most hash functions assume the keys come from the set {0, 1, ...}. If the keys are not natural numbers, some method must be used to convert them:
- Phone numbers and ID numbers can be converted by removing the hyphens.
- Characters can be converted using ASCII.
- Strings can be converted by converting each character using ASCII and then interpreting the resulting sequence of numbers as if it were a number written in base 128.
We need to choose a good hash function. A hash function is good if it can be computed quickly and it distributes the keys uniformly throughout the table.
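
A minimal sketch (Python; not on the original slide) of the string-to-number conversion just described:

    # Treat the ASCII codes of the characters as the digits of a base-128 number.
    def string_to_key(s: str) -> int:
        key = 0
        for ch in s:
            key = key * 128 + ord(ch)  # shift earlier digits left, append this one
        return key

    print(string_to_key("pt"))  # 'p' = 112, 't' = 116, so 112*128 + 116 = 14452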

Hashing 7: Some Good Hash Functions

The division method: h(k) = k mod m.
- The modulus m must be chosen carefully. Powers of 2 and 10 can be bad. Why?
- Prime numbers not too close to powers of 2 are a good choice. We can pick m by choosing an appropriate prime number that is close to the table size we want.
The multiplication method: let A be a constant with 0 < A < 1. Then we use h(k) = ⌊m (kA mod 1)⌋, where kA mod 1 means the fractional part of kA.
- The choice of m is not as critical here. We can choose m to make the implementation easy and/or fast.
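
A sketch (Python; not on the original slide) of both methods. The constant A = (sqrt(5) - 1)/2 is an assumption here, a commonly suggested value; the slide only requires 0 < A < 1:

    import math

    def division_hash(k: int, m: int) -> int:
        # Division method: h(k) = k mod m, with m ideally a prime
        # not too close to a power of 2.
        return k % m

    def multiplication_hash(k: int, m: int, A: float = (math.sqrt(5) - 1) / 2) -> int:
        # Multiplication method: h(k) = floor(m * (k*A mod 1)).
        frac = (k * A) % 1.0   # fractional part of k*A
        return int(m * frac)

    print(division_hash(5554321, 13), multiplication_hash(5554321, 13))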

Hashing 8: Universal Hashing

Let x and y be distinct keys and let H be a set of hash functions. H is called universal if the number of functions h ∈ H for which h(x) = h(y) is precisely |H|/m. In other words, if we pick a random h ∈ H, the probability that x and y collide under h is 1/m.

Universal hashing can be useful in many situations. Suppose you are asked to come up with a hashing technique for your boss. Your boss tells you that after you are done, he will pick some keys to hash, and if you get too many collisions, he will fire you. If you use universal hashing, the only way he can succeed is by getting lucky. Why?
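
The slide does not name a concrete universal family; one standard construction (a Python sketch, with the prime p and table size m chosen here as assumptions) picks h(k) = ((a*k + b) mod p) mod m with random a and b, where p is a prime larger than every key:

    import random

    p = 2_147_483_647   # a prime larger than any key we will hash (assumption)
    m = 13              # table size (assumption)

    def random_hash():
        # Drawing (a, b) at random picks a random member of the family.
        a = random.randrange(1, p)
        b = random.randrange(0, p)
        return lambda k: ((a * k + b) % p) % m

    h = random_hash()
    print(h(5554321), h(5453241))  # distinct keys collide with probability at most about 1/m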

Hashing 9: Collision Resolution: Chaining

With chaining, we set up an array of links, indexed by the hash values, to lists of items with the same hash value.

[Figure: a small chained hash table with slots 0 through 4; each occupied slot points to a linked list of the keys (32, 5, 13, 4, 65, 21) that hash to it.]

Let n be the number of keys and m the size of the hash table, and define the load factor α = n/m. Successful and unsuccessful searching both take time Θ(1 + α) on average, assuming simple uniform hashing. By simple uniform hashing we mean that a key has equal probability to hash into any of the m slots, independent of the other elements of the table.
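
A minimal chaining sketch (Python; not on the original slide, the class name and layout are illustrative), using one list per slot as the chain:

    class ChainedHashTable:
        def __init__(self, m: int):
            self.m = m
            self.slots = [[] for _ in range(m)]   # one chain per slot

        def _hash(self, key: int) -> int:
            return key % self.m                   # division method, as above

        def insert(self, key: int) -> None:
            self.slots[self._hash(key)].append(key)

        def search(self, key: int) -> bool:
            return key in self.slots[self._hash(key)]   # walk the chain

    table = ChainedHashTable(8)
    for k in (5554321, 5453241):   # both hash to slot 1; chaining keeps both
        table.insert(k)
    print(table.search(5453241))   # True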

Hashing 10: Collision Resolution: Open Addressing

With open addressing, all elements are stored in the hash table itself. This means several things:
- Less memory is used than with chaining, since we don't have to store pointers.
- The hash table has an absolute size limit of m. Thus, planning ahead is important when using open addressing.
- We must have a way to store multiple elements with the same hash value. Instead of a single hash function, we use a probe sequence, that is, a sequence of hash values. We go through the sequence one by one until we find an empty position. For searching, we do the same thing, skipping over occupied slots that do not match our key.

Hashing 11: Probe Sequences

We define our hash functions with an extra parameter, the probe number i, so our hash functions look like h(k, i). We compute h(k, 0), h(k, 1), ... until slot h(k, i) is empty. The hash function h must be such that the probe sequence h(k, 0), ..., h(k, m-1) is a permutation of 0, 1, ..., m-1. Why?

Insertion and searching are fairly quick, assuming a good implementation. Deletion can be a problem because the probe sequence can be complex. We will discuss 2 ways of defining probe sequences:
- Linear Probing
- Double Hashing
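
A sketch (Python; not on the original slide) of open-addressing insertion and search driven by an arbitrary probe sequence h(k, i), passed in as a function:

    EMPTY = None

    def oa_insert(table, key, probe):
        m = len(table)
        for i in range(m):                 # try h(k,0), h(k,1), ..., h(k,m-1)
            slot = probe(key, i, m)
            if table[slot] is EMPTY:
                table[slot] = key
                return slot
        raise RuntimeError("hash table overflow")

    def oa_search(table, key, probe):
        m = len(table)
        for i in range(m):
            slot = probe(key, i, m)
            if table[slot] is EMPTY:       # an empty slot means the key is absent
                return None
            if table[slot] == key:
                return slot
        return None

Because the probe sequence visits every slot exactly once, the insertion loop is guaranteed to find a free slot whenever the table is not full, which is why the permutation requirement matters.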

Hashing 12: Linear Probing

Let g be a hash function. With linear probing, the probe sequence is computed by h(k, i) = (g(k) + i) mod m. When we insert an element, we start at the hash value and proceed slot by slot until we find an empty one. For searching, we start at the hash value and proceed slot by slot until we find the key we are looking for.

Example: let g(k) = k mod 13, and insert the following keys into the hash table: 18, 41, 22, 44, 59, 32, 31, 73.

[Figure: an empty 13-slot table, indices 0 through 12, to be filled in.]

Problem: the values in the table tend to cluster.
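
Working through the insertions by hand (a worked continuation, not shown on the original slide): 18 -> slot 5, 41 -> slot 2, 22 -> slot 9, 44 -> slot 6 (5 is taken), 59 -> slot 7, 32 -> slot 8 (6 and 7 are taken), 31 -> slot 10 (5 through 9 are taken), 73 -> slot 11 (8 through 10 are taken). The keys pile up in the contiguous run of slots 5-11, which is exactly the clustering problem noted above.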

Hashing 13: Double Hashing

With double hashing we use two hash functions. Let h1 and h2 be hash functions. We define our probe sequence by h(k, i) = (h1(k) + i*h2(k)) mod m. We often pick m = 2^n for some n and ensure that h2(k) is always odd. Double hashing tends to distribute keys more uniformly than linear probing.

Hashing 14: Double Hashing Example

Let h1(k) = k mod 13, let h2(k) = 1 + (k mod 8), and let the hash table have size 13. Then our probe sequence is defined by
h(k, i) = (h1(k) + i*h2(k)) mod 13 = (k mod 13 + i*(1 + (k mod 8))) mod 13.
Insert the following keys into the table: 18, 41, 22, 44, 59, 32, 31, 73.

[Figure: an empty 13-slot table, indices 0 through 12, to be filled in.]
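
Working through the insertions by hand (a worked continuation, not shown on the original slide): 18 -> slot 5, 41 -> slot 2, 22 -> slot 9, 59 -> slot 7, 32 -> slot 6, 73 -> slot 8; 44 hashes to the occupied slot 5, and with h2(44) = 5 its next probe is (5 + 5) mod 13 = 10, so it goes to slot 10; 31 also hashes to slot 5, and with h2(31) = 8 its next probe is (5 + 8) mod 13 = 0, so it goes to slot 0. Unlike with linear probing, the two keys that collided at slot 5 end up far apart, illustrating the more uniform spread mentioned on the previous slide.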

Hashing 15: Open Addressing: Performance

Again, we set α = n/m; this is the average number of keys per array index. Note that α ≤ 1 for open addressing. We assume a good probe sequence has been used, that is, for any key, each permutation of 0, 1, ..., m-1 is equally likely as a probe sequence. Then:
- The average number of probes for insertion or an unsuccessful search is at most 1/(1 − α).
- The average number of probes for a successful search is at most (1/α) ln(1/(1 − α)) + 1/α.
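
A worked illustration (not on the original slide): at load factor α = 0.9, an insertion or unsuccessful search takes at most 1/(1 − 0.9) = 10 probes on average, and a successful search takes at most (1/0.9) ln(1/0.1) + 1/0.9 ≈ 2.56 + 1.11 ≈ 3.7 probes, whereas chaining at the same load factor averages about 1 + α = 1.9 per search.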

Hashing 16: Chaining or Open Addressing?

[Figure: "Hashing Performance: Chaining Versus Open Addressing". The plot shows the average number of probes (0 to 20) against the load factor (0 to 1) for four curves: chaining insertion (1), chaining search (1 + α), open-addressing insertion/unsuccessful search (1/(1 − α)), and open-addressing successful search ((1/α) ln(1/(1 − α)) + 1/α). The open-addressing curves grow without bound as the load factor approaches 1, while the chaining curves stay small.]