Lecture 16 More on Hashing Collision Resolution

Similar documents
Lecture 18. Collision Resolution

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

CS 350 : Data Structures Hash Tables

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

AAL 217: DATA STRUCTURES

Hash Tables. Hashing Probing Separate Chaining Hash Function

More on Hashing: Collisions. See Chapter 20 of the text.

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Algorithms and Data Structures

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

CPSC 259 admin notes

Cpt S 223. School of EECS, WSU

Hash Tables. Gunnar Gotshalks. Maps 1

Hash[ string key ] ==> integer value

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

HASH TABLES. Goal is to store elements k,v at index i = h k

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

COMP171. Hashing.

You must print this PDF and write your answers neatly by hand. You should hand in the assignment before recitation begins.

The dictionary problem

Outline. hash tables hash functions open addressing chained hashing

CS 310 Advanced Data Structures and Algorithms

Lecture Notes on Interfaces

Understand how to deal with collisions

Hash Table and Hashing

Introduction to Hashing

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

Why do we need hashing?

Hashing Techniques. Material based on slides by George Bebis

HASH TABLES.

TABLES AND HASHING. Chapter 13

Topic HashTable and Table ADT

Adapted By Manik Hosen

Computer Science Foundation Exam

Comp 335 File Structures. Hashing

Lecture 10 March 4. Goals: hashing. hash functions. closed hashing. application of hashing

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Announcements. Submit Prelim 2 conflicts by Thursday night A6 is due Nov 7 (tomorrow!)

Hash table basics mod 83 ate. ate. hashcode()

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

Comp Sci 1MD3 Mid-Term II 2004 Dr. Jacques Carette

MIDTERM EXAM THURSDAY MARCH

DATA STRUCTURES AND ALGORITHMS

HASH TABLE BY AKARSH KUMAR

Hash table basics mod 83 ate. ate

Priority Queue. 03/09/04 Lecture 17 1

Hash table basics. ate à. à à mod à 83

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

Hashing as a Dictionary Implementation

Announcements. Hash Functions. Hash Functions 4/17/18 HASHING

Algorithms and Data Structures

Fast Lookup: Hash tables

Dictionaries and Hash Tables

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Lecture 13 Notes Sets

ECE 242 Data Structures and Algorithms. Hash Tables II. Lecture 25. Prof.

Most of this PDF is editable. You can either type your answers in the red boxes/lines, or you can write them neatly by hand.

CS 3410 Ch 20 Hash Tables

ECE 2035 Programming HW/SW Systems Fall problems, 5 pages Exam Three 28 November 2012

Hash table basics mod 83 ate. ate

COMP 103 RECAP-TODAY. Hashing: collisions. Collisions: open hashing/buckets/chaining. Dealing with Collisions: Two approaches

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

Hashing. mapping data to tables hashing integers and strings. aclasshash_table inserting and locating strings. inserting and locating strings

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

Lecture 4: Advanced Data Structures

Data Structures. Topic #6

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Lecture 13 Hash Dictionaries

Basic Idea of Hashing Hashing

Dictionaries and Hash Tables

Lecture 13 Notes Sets

200 points total. Start early! Update March 27: Problem 2 updated, Problem 8 is now a study problem.

ECE 242 Data Structures and Algorithms. Hash Tables I. Lecture 24. Prof.

Übungen zur Vorlesung Informatik II (D-BAUG) FS 2017 D. Sidler, F. Friedrich

STRUKTUR DATA. By : Sri Rezeki Candra Nursari 2 SKS

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

ECE 2035 Programming HW/SW Systems Fall problems, 5 pages Exam Three 19 November 2014

Logic, Algorithms and Data Structures ADT:s & Hash tables. By: Jonas Öberg, Lars Pareto

You must include this cover sheet. Either type up the assignment using theory5.tex, or print out this PDF.

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

Part I Anton Gerdelan

Chapter 20 Hash Tables

Data Structures (CS 1520) Lecture 23 Name:

Fundamentals of Database Systems Prof. Arnab Bhattacharya Department of Computer Science and Engineering Indian Institute of Technology, Kanpur

Fundamental Algorithms

a) State the need of data structure. Write the operations performed using data structures.

Structures, Algorithm Analysis: CHAPTER 5: HASHING

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Programming in C/C Lecture 2

CPSC 211, Sections : Data Structures and Implementations, Honors Final Exam May 4, 2001

CS61B Lecture #24: Hashing. Last modified: Wed Oct 19 14:35: CS61B: Lecture #24 1

Hashing. Hashing Procedures

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

CSE 332 Spring 2014: Midterm Exam (closed book, closed notes, no calculators)

Transcription:

Lecture 16 More on Hashing Collision Resolution Introduction In this lesson we will discuss several collision resolution strategies. The key thing in hashing is to find an easy to compute hash function. However, collisions cannot be avoided. Here we discuss three strategies of dealing with collisions, linear probing, quadratic probing and separate chaining. Linear Probing Suppose that a key hashes into a position that is already occupied. The simplest strategy is to look for the next available position to place the item. Suppose we have a set of hash codes consisting of {89, 18, 49, 58, 9} and we need to place them into a table of size 10. The following table demonstrates this process. Table Courtesy of Weiss Data Structures Book The first collision occurs when 49 hashes to the same location with index 9. Since 89 occupies the A[9], we need to place 49 to the next available position. Considering the array as circular, the next available position is 0. That is (9+1) mod 10. So we place 49 in A[0]. Several more collisions occur in this simple example and in each case we keep looking to find the next available location in the array to place the element. Now if we need to find the element, say for example, 49, we first compute the hash code (9), and look in A[9]. Since we do not find it there, we look in A[(9+1) % 10] = A[0], we find it there and we are done. So what if we are looking for 79? First we compute hashcode of

79 = 9. We probe in A[9], A[(9+1)%10]=A[0], A[(9+2)%10]=A[1], A[(9+3)%10]=A[2], A[(9+4)%10]=A[3] etc. Since A[3] = null, we do know that 79 could not exists in the set. Lazy Deletion When collisions are resolved using linear probing, we need to be careful about removing elements from the table as it may leave holes in the table. One strategy is to do what s called lazy deletion. That is, not to delete the element, but place a marker in the place to indicate that an element that was there is now removed. In other words, we leave the dead body there, even though it is not part of the data set anymore. So when we are looking for things, we jump over the dead bodies until we find the element or we run into a null cell. One drawback in this approach is that, if there are many removals (many dead bodies ), we leave a lot of places marked as unavailable in the array. So this could lead to a lot of wasted spaces. Also we may have to frequently resize the table to find more space. However, considering space is cheap, lazy deletion is still a good strategy. The following figure shows the lazy deletion process. Clustering in Linear Probing One problem in linear probing is that clustering could develop if many of the objects have hashed into places that are closer to each other. If the linear probing process takes long due to clustering, any advantage gained by O(1) lookups and updates can be erased. One strategy is to resize the table, when the load factor of the table exceeds 0.7. The load factor of the table is defined as number of occupied places in the table divided by the table size. The following image shows a good key distribution with little clustering and clustering developed when linear probing is used for a table of load factor 0.7.

Table Courtesy of Weiss Data Structures Book Quadratic Probing Although linear probing is a simple process where it is easy to compute the next available location, linear probing also leads to some clustering when keys are computed to closer values. Therefore we define a new process of Quadratic probing that provides a better distribution of keys when collisions occur. In quadratic probing, if the hash value is K, then the next location is computed using the sequence K + 1, K + 4, K + 9 etc.. The following table shows the collision resolution using quadratic probing.

Separate Chaining The last strategy we discuss is the idea of separate chaining. The idea here is to resolve a collision by creating a linked list of elements as shown below. In the picture above the objects, As, foo, and bar all hash to the same location in the table, that is A[0]. So we create a list of all the elements that hash into that location. Similarly, all other lists indicate keys that were hashed into the same location. Obviously a good hash function is needed so that keys can be evenly distributed. Because any uneven distribution of keys will neutralize any advantage gained by the concept of hashing. Also we must note that separate chaining requires dynamic memory management (using pointers) that may not be available in some programming languages. Also manipulating a list using pointers is generally more complicated than using a simple array. Implementation Now we will implement a hash table in C where separate chaining is used for collision resolution. First we define the node that can contain any type of a data object as follows. typedef struct node { void* data; struct node* next; } node; Next we define a struct that can contain a hash table as follows.

typedef struct hashtable { void** table; long size, capacity; double loadfactor; } hashtable; Define a pointer to a hash table. hashtable* ptr = NULL; Allocate memory for the hash table structure. ptr = malloc(sizeof(hashtable)); initialize the hash table as follows. ptr table = malloc(sizeof(void*) * INITIAL_CAPACITY); ptr capacity = INITIAL_CAPACITY; ptr size = 0; ptr loadfactor = (double)size/capacity; Assuming that function hashcode is defined, we can insert a node into the hash table as follows. First we will initialize the node. node* nodeptr = malloc(sizeof(node)); nodeptr data = malloc(strlen( guna )+1); strcpy(nodeptr data, guna\0 ); nodeptr next = NULL; now insert the node into the hashtable. ptr table[hashcode(nodeptr data)] = nodeptr;

Exercises 1. Suppose you are planning to hash 10,000 words, each of length 5 into a table of size 15,000. What percentage of the table will be filled if you use the hash function H(S) = sum of the characters of the string S, and any string that collides with an existing key will be ignored? 2. Assuming the structs Node and hashtable defined above, what is the size of node and hashtable structs assuming unix.andrew.cmu.edu machines 3. Assuming structs Node and hashtable defined above, complete the function, resolve that returns the next available position using Quadratic Probing. int resolve(hashtable* H, node* N); 4. List some of the advantages and disadvantages of using separate chaining as a strategy to resolve collisions 5. Given the keys {634, 567, 890, 534, 355, 137} find a perfect hash function that produces no collisions. 6. Implement the delete(hashtable* H, node* N) function using the lazy delete strategy.