Hash Tables: Handling Collisions

Similar documents
CSE 373: Data Structures and Algorithms. Hash Tables. Autumn Shrirang (Shri) Mare

CSE 373: Data Structures and Algorithms. Binary Search Trees. Autumn Shrirang (Shri) Mare

CSE 373: Data Structures and Algorithms. Asymptotic Analysis. Autumn Shrirang (Shri) Mare

Floyd s buildheap, Sorting

CSE 373: Data Structures and Algorithms. Disjoint Sets. Autumn Shrirang (Shri) Mare

Testing and Debugging

Binary Heaps. Autumn Shrirang (Shri) Mare CSE 373: Data Structures and Algorithms

Lecture 10: Introduction to Hash Tables

CSE 373: Data Structures and Algorithms. Memory and Locality. Autumn Shrirang (Shri) Mare

Minimum Spanning Tree

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

Lecture 1: CSE 373. Data Structures and Algorithms

CSE 373: Data Structures and Algorithms. Final Review. Autumn Shrirang (Shri) Mare

Hash Table and Hashing

COMP171. Hashing.

CSE 373: Data Structures and Algorithms. Graphs. Autumn Shrirang (Shri) Mare

CSE 373: Data Structures and Algorithms. Graph Traversals. Autumn Shrirang (Shri) Mare

Algorithms and Data Structures

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

CS 350 : Data Structures Hash Tables

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

CSI33 Data Structures

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

Hashing. Hashing Procedures

AAL 217: DATA STRUCTURES

Graphs. Data Structures and Algorithms CSE 373 SU 18 BEN JONES 1

Randomized Algorithms: Element Distinctness

HASH TABLES.

Hash Tables. Hashing Probing Separate Chaining Hash Function

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Hashing Techniques. Material based on slides by George Bebis

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

DATA STRUCTURES AND ALGORITHMS

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

HASH TABLES. Goal is to store elements k,v at index i = h k

CSE 214 Computer Science II Searching

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

TABLES AND HASHING. Chapter 13

BBM371& Data*Management. Lecture 6: Hash Tables

Fundamental Algorithms

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

III Data Structures. Dynamic sets

Randomized Algorithms, Hash Functions

CPSC 259 admin notes

Lecture 18. Collision Resolution

1 CSE 100: HASH TABLES

CMSC 341 Hashing. Based on slides from previous iterations of this course

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Implementing Hash and AVL

Lecture 3: Queues, Testing, and Working with Code

Data Structures and Algorithms. Roberto Sebastiani

CS 241 Analysis of Algorithms

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

Understand how to deal with collisions

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

CS 261 Data Structures

Topic HashTable and Table ADT

CS 270 Algorithms. Oliver Kullmann. Generalising arrays. Direct addressing. Hashing in general. Hashing through chaining. Reading from CLRS for week 7

Data Structures and Algorithms. Chapter 7. Hashing

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

The dictionary problem

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

Hashing. October 19, CMPE 250 Hashing October 19, / 25

Hashing Algorithms. Hash functions Separate Chaining Linear Probing Double Hashing

Cpt S 223. School of EECS, WSU

Week 9. Hash tables. 1 Generalising arrays. 2 Direct addressing. 3 Hashing in general. 4 Hashing through chaining. 5 Hash functions.

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Lecture 16 More on Hashing Collision Resolution

Hash table basics mod 83 ate. ate

Hash table basics. ate à. à à mod à 83

Worst-case running time for RANDOMIZED-SELECT

Comp 335 File Structures. Hashing

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Hash Tables. Gunnar Gotshalks. Maps 1

You must print this PDF and write your answers neatly by hand. You should hand in the assignment before recitation begins.

Lecture 1: Welcome! CSE 373: Data Structures and Algorithms CSE WI - KASEY CHAMPION 1

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

Symbol Tables. ASU Textbook Chapter 7.6, 6.5 and 6.3. Tsan-sheng Hsu.

Ch 3.4 The Integers and Division

Lecture 7: Efficient Collections via Hashing

2 Fundamentals of data structures

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

! A set is a collection of objects of the same. ! {6,9,11,-5} and {11,9,6,-5} are equivalent. ! There is no first element, and no successor. of 9.

Unit #5: Hash Functions and the Pigeonhole Principle

Übungen zur Vorlesung Informatik II (D-BAUG) FS 2017 D. Sidler, F. Friedrich

Lecture 3: Maps and Iterators

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

ASSIGNMENTS. Progra m Outcom e. Chapter Q. No. Outcom e (CO) I 1 If f(n) = Θ(g(n)) and g(n)= Θ(h(n)), then proof that h(n) = Θ(f(n))

Data and File Structures Chapter 11. Hashing

Transcription:

CSE 373: Data Structures and Algorithms Hash Tables: Handling Collisions Autumn 208 Shrirang (Shri) Mare shri@cs.washington.edu Thanks to Kasey Champion, Ben Jones, Adam Blank, Michael Lee, Evan McCarty, Robbie Weber, Whitaker Brand, Zora Fung, Stuart Reges, Justin Hsia, Ruth Anderson, and many others for sample slides and materials...

Announcements - HW3 due Friday Noon - Office hours for next week have changed. Please see the calendar for the correct info - We made a mistake in a comment in HW4. We ll push a commit to your repo to correct that. (So expect one more git commit from us.) CSE 373 AU 8 SHRI MARE 2

Today - Review Hashing - Separate Chaining - Open addressing with linear probing - Open addressing with quadratic probing CSE 373 AU 8 SHRI MARE 3

Problem (Motivation for hashing) How can we implement a dictionary such that dictionary operations are efficient? Idea : Create a giant array and use keys as indices. (This approach is called direct-access table or direct-access map) Two main problems:. Can only work with integer keys? 2. Too much wasted space Idea 2: What if we (a) convert any type of key into a non-negative integer key (b) map the entire key space into a small set of keys (so we can use just the right size array) CSE 373 AU 8 SHRI MARE 4

Solution to problem : Can only work with integer keys? Idea: Use functions that convert a non-integer key into a non-negative integer key CSE 373 AU 8 SHRI MARE 5

Solution to problem : Can only work with integer keys? Idea: Use functions that convert a non-integer key into a non-negative integer key - Everything is stored as bits in memory and can be represented as an integer. - But the representation can be much simpler (nothing to do with memory). - For example (just for illustration; this is not how strings, images, and videos are hashed in practice): - Strings can be represented with number of characters in the string, ascii value of the first char, last char - Image can be represented with resolution, size of image, value of the 5 th pixel in the image, 00 th pixel - Similarly, video can be represented resolution, size, frame rate, size of the 0 th frame CSE 373 AU 8 SHRI MARE 6

Solution to problem : Can only work with integer keys? Idea: Use functions that convert a non-integer key into a non-negative integer key - Everything is stored as bits in memory and can be represented as an integer. - But the representation can be much simpler (nothing to do with memory). - For example (just for illustration; this is not how strings, images, and videos are hashed in practice): - Strings can be represented with number of characters in the string, ascii value of the first char, last char - Image can be represented with resolution, size of image, value of the 5 th pixel in the image, 00 th pixel - Similarly, video can be represented resolution, size, frame rate, size of the 0 th frame Question: What are some good strategies to pick a hash function? (This is important). Quick: Computing hash should be quick (constant time). 2. Deterministic: Hash value of a key should be the same hash table. 3. Random: A good hash function should distribute the keys uniformly into the slots in the table. CSE 373 AU 8 SHRI MARE 7

Solution to problem 2: Too much wasted space Idea: Map the entire key space into a small set of keys (so we can use just the right sized array) 5000 202 900007.. 202 5000.... 900007.. indices 0 2 202 5000 900007 CSE 373 AU 8 SHRI MARE 8

Solution to problem 2: Too much wasted space Idea: Map the entire key space into a small set of keys (so we can use just the right sized array) indices 5000 202 900007 0 2 7 5000 202 900007 0 2 3 4 5 6 7 8 9 CSE 373 AU 8 SHRI MARE 9

Review: The modulus (mod) operation The modulus (mod) operation The modulus (or mod) operation gives the remainder of a division of one number by another. Written as x mod n or x % n. Examples: % 0 = % 0 = 0 % 0 = 0 5746 % 0 = 6 7 % 7 = For more review/practice, check out https://www.khanacademy.org/computing/computer-science/cryptography/modarithmetic/a/what-is-modular-arithmetic 0

Review: The modulus (mod) operation The modulus (mod) operation The modulus (or mod) operation gives the remainder of a division of one number by another. Written as x mod n or x % n. Examples: % 0 = % 0 = 0 % 0 = 0 5746 % 0 = 6 7 % 7 = Common applications of the mod operation: - finding last digit ( % 0) - whether a number is odd/even (% 2) - wrap around behavior (% wrap limit) The application we are interested in is the wrap around behavior. It lets us map any large integer into an index in our array of size m (using % m) For more review/practice, check out https://www.khanacademy.org/computing/computer-science/cryptography/modarithmetic/a/what-is-modular-arithmetic

Implementing a simple hash table (assume no collisions) public V get(int key) { } return this.array[key].value; public void put(int key, V value) { } this.array[key] = value; public void remove(int key) { } this.array[key] = null; CSE 373 AU 8 SHRI MARE 2

Implementing a simple hash table (assume no collisions) public V get(int key) { key = gethash(key) return this.array[key].value; } public int gethash(int a) { return a % this.array.length; } public void put(int key, V value) { key = gethash(key) this.array[key] = value; } public void remove(int key) { key = gethash(key) this.array[key] = null; } CSE 373 AU 8 SHRI MARE 3

Our simple hash table: insert (000) indices 5000 202 900007 0 2 7 5000 202 900007 0 2 3 4 5 6 7 8 9 CSE 373 AU 8 SHRI MARE 4

Our simple hash table: insert (000) 000 5000 202 900007 Hash collision 0 2 7 Some other value exists in slot at index 0 5000 202 900007 indices 0 2 3 4 5 6 7 8 9 CSE 373 AU 8 SHRI MARE 5

Hash collision What is a hash collision? It s a case when two different keys have the same hash value. Mathematically, h(k) = h(k2) when k k2 CSE 373 AU 8 SHRI MARE 6

Hash collision What is a hash collision? It s a case when two different keys have the same hash value. Mathematically, h(k) = h(k2) when k k2 Why is this a problem? - We put keys in slots determined by the hash function. That is, we put k at index h(k), - A collision means the natural choice slot is taken - We cannot replace k with k2 (because the keys are different) - So the problem is where do we put k2? CSE 373 AU 8 SHRI MARE 7

Strategies to handle hash collision CSE 373 AU 8 SHRI MARE 8

Strategies to handle hash collision There are multiple strategies. In this class, we ll cover the following three:. Separate chaining 2. Open addressing - Linear probing - Quadratic probing 3. Double hashing CSE 373 AU 8 SHRI MARE 9

Separate chaining - Separate chaining is a collision resolution strategy where collisions are resolved by storing all colliding keys in the same slot (using linked list or some other data structure) - Each slot stores a pointer to another data structure (usually a linked list or an AVL tree) put(2, value2) put(44, value44) indices 0 2 3 4 5 6 7 8 9 22 3 7 Note: For simplicity, the table shows only keys, but in each slot/node both, key and value, are stored. CSE 373 AU 8 SHRI MARE 20

Separate chaining - Separate chaining is a collision resolution strategy where collisions are resolved by storing all colliding keys in the same slot (using linked list or some other data structure) - Each slot stores a pointer to another data structure (usually a linked list or an AVL tree) put(2, value2) put(44, value44) indices 0 2 3 4 5 6 7 8 9 22 3 44 7 2 Note: For simplicity, the table shows only keys, but in each slot/node both, key and value, are stored. CSE 373 AU 8 SHRI MARE 2

Separate chaining: Running Times What are the running times for: insert find Best: Worst: Best: Worst: delete Best: Worst: CSE 332 SU 8 ROBBIE WEBER

Separate chaining: Running Times What are the running times for: insert find delete Best:!() Worst:!(%) (if insertions are always at the end of the linked list) Best:!() Worst:!(%) Best:!() Worst:!(%) CSE 332 SU 8 ROBBIE WEBER

<latexit sha_base64="4bt5nydz+jealkidqlo5gdq0+m=">aaacnhicbvdlsgmxfm3uvxfvzduglvwiwwmilorsl0oukliw6ftsiatsagzzjbklblmo9z4iw5eckgiw7/btbeqwcuhm659+bm+amjujnoswymz2bxygu2kvlk6trpfwnloxtguktxywwz6shffomooqrq4tqvdkm9l2bycjv3lhkqxvldhhqjdmnpsdfsruqvzr2ahgz2venfzfrytj7p6ogzndm7+zy2vv2pwawbyidx9alqogw5png2aitvyo7fwcm+je4osmdhie6delypxghcvmkjqd0luvyohkgyks7ukgthabohhum5iojs6vepgdwsgddwjjico7vnxmarvioi990rkj5bq3ev/zoqkkj7qa8irvhopjq2hkoirhkeeyuegwykndebbu3apxh5kklmnznig40/+srvimv4rbvcq+dxfmew2az7wawhoaboqam0aqb34am8gjfrwxqx3q2pswvbymc2ws9yn/qyamz</latexit> <latexit sha_base64="4bt5nydz+jealkidqlo5gdq0+m=">aaacnhicbvdlsgmxfm3uvxfvzduglvwiwwmilorsl0oukliw6ftsiatsagzzjbklblmo9z4iw5eckgiw7/btbeqwcuhm659+bm+amjujnoswymz2bxygu2kvlk6trpfwnloxtguktxywwz6shffomooqrq4tqvdkm9l2bycjv3lhkqxvldhhqjdmnpsdfsruqvzr2ahgz2venfzfrytj7p6ogzndm7+zy2vv2pwawbyidx9alqogw5png2aitvyo7fwcm+je4osmdhie6delypxghcvmkjqd0luvyohkgyks7ukgthabohhum5iojs6vepgdwsgddwjjico7vnxmarvioi990rkj5bq3ev/zoqkkj7qa8irvhopjq2hkoirhkeeyuegwykndebbu3apxh5kklmnznig40/+srvimv4rbvcq+dxfmew2az7wawhoaboqam0aqb34am8gjfrwxqx3q2pswvbymc2ws9yn/qyamz</latexit> <latexit sha_base64="4bt5nydz+jealkidqlo5gdq0+m=">aaacnhicbvdlsgmxfm3uvxfvzduglvwiwwmilorsl0oukliw6ftsiatsagzzjbklblmo9z4iw5eckgiw7/btbeqwcuhm659+bm+amjujnoswymz2bxygu2kvlk6trpfwnloxtguktxywwz6shffomooqrq4tqvdkm9l2bycjv3lhkqxvldhhqjdmnpsdfsruqvzr2ahgz2venfzfrytj7p6ogzndm7+zy2vv2pwawbyidx9alqogw5png2aitvyo7fwcm+je4osmdhie6delypxghcvmkjqd0luvyohkgyks7ukgthabohhum5iojs6vepgdwsgddwjjico7vnxmarvioi990rkj5bq3ev/zoqkkj7qa8irvhopjq2hkoirhkeeyuegwykndebbu3apxh5kklmnznig40/+srvimv4rbvcq+dxfmew2az7wawhoaboqam0aqb34am8gjfrwxqx3q2pswvbymc2ws9yn/qyamz</latexit> <latexit sha_base64="4bt5nydz+jealkidqlo5gdq0+m=">aaacnhicbvdlsgmxfm3uvxfvzduglvwiwwmilorsl0oukliw6ftsiatsagzzjbklblmo9z4iw5eckgiw7/btbeqwcuhm659+bm+amjujnoswymz2bxygu2kvlk6trpfwnloxtguktxywwz6shffomooqrq4tqvdkm9l2bycjv3lhkqxvldhhqjdmnpsdfsruqvzr2ahgz2venfzfrytj7p6ogzndm7+zy2vv2pwawbyidx9alqogw5png2aitvyo7fwcm+je4osmdhie6delypxghcvmkjqd0luvyohkgyks7ukgthabohhum5iojs6vepgdwsgddwjjico7vnxmarvioi990rkj5bq3ev/zoqkkj7qa8irvhopjq2hkoirhkeeyuegwykndebbu3apxh5kklmnznig40/+srvimv4rbvcq+dxfmew2az7wawhoaboqam0aqb34am8gjfrwxqx3q2pswvbymc2ws9yn/qyamz</latexit> Load Factor Load Factor (λ) Ratio of number of entries in the table to table size. If n is the total number of (key, value) pairs stored in the table and c is capacity of the table (i.e., array), Load factor = n c CSE 373 AU 8 SHRI MARE 24

Worksheet Q-Q3 CSE 373 AU 8 SHRI MARE 25

Worksheet Q3 CSE 373 AU 8 SHRI MARE 26

Open Addressing - Open addressing is a collision resolution strategy where collisions are resolved by storing the colliding key in a different location when the natural choice is full. CSE 373 AU 8 SHRI MARE 27

Open Addressing - Open addressing is a collision resolution strategy where collisions are resolved by storing the colliding key in a different location when the natural choice is full. indices 0 put(2, value2) 2 22 3 3 4 5 6 7 7 8 9 Note: For simplicity, the table shows only keys, but in each slot both, key and value, are stored. CSE 373 AU 8 SHRI MARE 28

Open Addressing: Linear probing - Open addressing is a collision resolution strategy where collisions are resolved by storing the colliding key in a different location when the natural choice is full. indices Linear probing put(2, value2) 0 2 3 4 5 6 7 8 9 22 3 7 Index = hash(k) + 0 (if occupied, try next i) = hash(k) + (if occupied, try next i) = hash(k) + 2 (if occupied, try next i) =.. =.. =.. Note: For simplicity, the table shows only keys, but in each slot both, key and value, are stored. CSE 373 AU 8 SHRI MARE 29

Open Addressing: Quadratic probing - Open addressing is a collision resolution strategy where collisions are resolved by storing the colliding key in a different location when the natural choice is full. indices Quadratic probing put(2, value2) 0 2 3 4 5 6 7 8 9 22 3 7 Index = hash(k) + 0 (if occupied, try next i^2) = hash(k) + ^2 (if occupied, try next i^2) = hash(k) + 2^2 (if occupied, try next i^2) = hash(k) + 3^2 (if occupied, try next i^2) =.. =.. Note: For simplicity, the table shows only keys, but in each slot both, key and value, are stored. CSE 373 AU 8 SHRI MARE 30

Worksheet Q4 3

Worksheet Q5 CSE 373 AU 8 SHRI MARE 32