! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

Similar documents
CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

COMP171. Hashing.

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Hash Table and Hashing

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

AAL 217: DATA STRUCTURES

Open Addressing: Linear Probing (cont.)

Chapter 20 Hash Tables

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

! A set is a collection of objects of the same. ! {6,9,11,-5} and {11,9,6,-5} are equivalent. ! There is no first element, and no successor. of 9.

Algorithms and Data Structures

CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions. Kevin Quinn Fall 2015

CSE 214 Computer Science II Searching

Fundamental Algorithms

CS 310 Advanced Data Structures and Algorithms

Hash[ string key ] ==> integer value

Comp 335 File Structures. Hashing

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

HASH TABLES. Hash Tables Page 1

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Hashing. Hashing Procedures

CS 3410 Ch 20 Hash Tables

Hashing Techniques. Material based on slides by George Bebis

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Cpt S 223. School of EECS, WSU

Topic HashTable and Table ADT

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

Unit #5: Hash Functions and the Pigeonhole Principle

Hash Tables. Gunnar Gotshalks. Maps 1

TABLES AND HASHING. Chapter 13

HASH TABLES.

Understand how to deal with collisions

Priority Queue. 03/09/04 Lecture 17 1

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Introduction to Hashing

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

Data Structures And Algorithms

Hash Tables. Hashing Probing Separate Chaining Hash Function

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

Data Structures and Algorithms(10)

Chapter 10. Sorting and Searching Algorithms. Fall 2017 CISC2200 Yanjun Li 1. Sorting. Given a set (container) of n elements

Algorithms and Data Structures

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Lecture 18. Collision Resolution

CS 2412 Data Structures. Chapter 10 Sorting and Searching

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

More on Hashing: Collisions. See Chapter 20 of the text.

CS 350 : Data Structures Hash Tables

HASH TABLES. Goal is to store elements k,v at index i = h k

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

Hash Tables and Hash Functions

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

Successful vs. Unsuccessful

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

CSCD 326 Data Structures I Hashing

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

CS 206 Introduction to Computer Science II

BBM371& Data*Management. Lecture 6: Hash Tables

Hashing. October 19, CMPE 250 Hashing October 19, / 25

UNIT III BALANCED SEARCH TREES AND INDEXING

Why do we need hashing?

Data and File Structures Chapter 11. Hashing

Hashing Algorithms. Hash functions Separate Chaining Linear Probing Double Hashing

Hashing 1. Searching Lists

Topic 22 Hash Tables

Fast Lookup: Hash tables

CPSC 259 admin notes

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

CMSC 341 Hashing. Based on slides from previous iterations of this course

Hash table basics mod 83 ate. ate. hashcode()

HASH TABLES. CSE 332 Data Abstractions: B Trees and Hash Tables Make a Complete Breakfast. An Ideal Hash Functions.

Chapter 27 Hashing. Objectives

Data Structures & File Management

CSC 321: Data Structures. Fall 2016

Hash-Based Indexes. Chapter 11

Lecture 16 More on Hashing Collision Resolution

Hashing for searching

Chapter 17. Disk Storage, Basic File Structures, and Hashing. Records. Blocking

Hash table basics. ate à. à à mod à 83

lecture23: Hash Tables

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

Dictionaries and Hash Tables

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

DS ,21. L11-12: Hashmap

CSE 373 Autumn 2012: Midterm #2 (closed book, closed notes, NO calculators allowed)

BINARY HEAP cs2420 Introduction to Algorithms and Data Structures Spring 2015

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

Transcription:

Hash Tables Chapter 20 CS 3358 Summer II 2013 Jill Seaman Sections 201, 202, 203, 204 (not 2042), 205 1 What are hash tables?! A Hash Table is used to implement a set, providing basic operations in constant time: - insert - remove (optional) - find - make (need not be constant time)! The table uses a function that maps an object in the set to its location in the table! The function is called a hash function 2 Using a hash function Placing elements in the array 8903 8 10 HandyParts company makes no more than 100 different parts But the parts all have four digit numbers This hash function can be used to store and retrieve parts in an array Hash(partNum) = partnum % 100 8903 8 10 Use the hash function Hash(partNum) = partnum % 100 to place the element with part number in the array 41 42

Placing elements in the array How to resolve the collision? Next place part number 6702 in the array One way is by linear probing This uses the following function Hash(partNum) = partnum % 100 (HashValue + 1) % 100 6702 % 100 = 2 But [2] is already occupied COLLISION OCCURS repeatedly until an empty location is found for part number 6702 43 44 Resolving the collision Collision resolved Still looking for a place for 6702 using the function Part 6702 can be placed at the location with index 4 (HashValue + 1) % 100 45 46

Collision resolved Hashing concepts 6702 Part 6702 is placed at the location with index 4 Where would the part with number 4598 be placed using linear probing?! Hash Table: where objects are stored by according to their key (usually an array) - key: attribute of an object used for searching/ sorting - number of valid keys usually greater than number of slots in the table - number of keys in use usually much smaller than table size! Hash function: maps keys to a Table index! Collision: when two separate keys hash to the same location 10 47 Hashing concepts! Collision resolution: method for finding an open spot in the table for a key that has collided with another key already in the table! Load Factor: the fraction of the hash table that is full - may be given as a percentage: 50% - may be given as a fraction in the range from 0 to 1, as in: 5! Goals: Hash Function - computation should be fast - should minimize collisions (good distribution)! Some issues: - should depend on ALL of the key (not just the last 2 digits or first 3 characters, which may not themselves be well distributed) 11 12

Hash Function! Final step of hash function is usually: temp % size - temp is some intermediate result - size is the hash table size - ensures the value is a valid location in the table! Picking a value for size: - Bad choices: a power of 2: then the result is only the lowest order bits of temp (not based on whole key) a power of 10: result is only lowest order digits of decimal number - Good choices: prime numbers 13 Hash Function: string keys! If the key is not a number, hash function must transform it to a number, to mod by the size! Method 1: Add up ascii int hash (string key, int tablesize) { int hashval = 0; for (int i=0; i<keylength(); i++) hashval = hashval + key[i]; return hashval % tablesize; } - different permutations of same chars have same hash value ( cat and act have same value) - large tablesize and short key length do not distribute well: if tablesize is 10007 and keys are 8 characters long: since ascii are <=127, hash produces between 0 and 127*8 = 1016 14 all falling in first 1/10th of the table Hash Function: string keys! Method 2: Multiply each char by a power of 128 hash = k[0]*128 3 + k[1]*128 2 + k[2]*128 1 + k[3]*128 0 This equivalent to (which avoids large intermediate results): hash = (((k[0]*128 + k[1])*128 + k[2])*128 + k[3])*128 int hash (string key, int tablesize) { int hashval = 0; for (int i=0; i<keylength(); i++) hashval = (hashval * 128 + key[i]) % tablesize; return hashval; } - now cat and act map to different (most likely) - but now we get really big numbers (overflow) - we take mod of intermediate results to reduce overflow - but mod is expensive 15 Hash Function: string keys - could just allow overflow, take mod at the end - but repeated multiplying by 128 tends to shift early chars out to the left (potentially leading to more collisions)! Method 3: Multiply each char by a power of 37 hash = (((k[0]*37 + k[1])*37 + k[2])*37 + k[3])*37 int hash (string key, int tablesize) { int hashval = 0; for (int i=0; i<keylength(); i++) hashval = hashval * 37 + key[i]; return hashval % tablesize; } - compromise 37 is prime Has good distribution - au and bp map to the same value, but collisions are 16less common than in method 1

Collision Resolution: 1 Linear Probing! Insert: When there is a collision, search sequentially for the next available slot! Find: if the key is not at the hashed location, keep searching sequentially for it - if it reaches an empty slot, the key is not found! Problem: if the the table is somewhat full, it may take a long time to find the open slot - May not be O(1) any more! Problem: Removing an element in the middle of a chain 17 Linear Probing: Example! Insert: 89, 18, 49, 58, 69, hash(k) = k mod 10 Probing function (attempt i): hi(k) = (hash(k) + i) % tablesize 49 is in 0 because 9 was full 58 is in 1 because 8, 9, 0 were full 69 is in 1 because 9, 0 were full 18 Linear Probing: delete problem 6702 Part 6702 was placed at the location with index 4, after colliding with Now remove Now find 6702 (hash(6702)=2): not at [2] [3] is empty, so not found Linear Probing: Lazy deletion! Don t remove the deleted object, just mark as deleted! During find, marked deletions don t stop the searching! During insert, the spot may be reused! If there are a lot of deletions, searching may still take a long time in a sparse table 20 46

Linear Probing: Primary Clustering! Cluster: a large, sequential block of occupied slots in the table! Any key that hashes into the cluster requires excessive attempts to resolve the collision! If it s during an insert operation, the cluster gets bigger! If two clusters are separated by one slot, a single insertion will drastically degrade the future performance! Primary clustering is a problem at high load factors (90%), not at 50% or less 21 Collision Resolution: 2 Quadratic Probing! An attempt to eliminate primary clustering! If the hash function returns H, and H is occupied, try H+1, then H+4, then H+9, - for each attempt i, try H+i 2 next Probing function (attempt i): hi(k) = (hash(k) + i 2 ) % tablesize! Is it guaranteed to find an empty slot if there is one (like linear probing)? - Yes IF: the table size is prime and the load is <= 50% 22 Quadratic Probing: Example! Insert: 89, 18, 49, 58, 69, hash(k) = k mod 10 Probing function (attempt i): hi(k) = (hash(k) + i 2 ) % tablesize 49 is in 0 because 9 was full 58 is in 2 because 8, 8+1=9 were full, (8+4)%10=2 wasn t 69 is in 3 because 9, (9+1)%10=0 were full, (9+4)%10=3 wasn t Note: smaller clusters 23 Quadratic probing: expansion of table! Since the table should be less than 50% full:! Can the table be expanded if the load factor gets more than 50%?! Yes - Find the next prime number greater than 2*tableSize, resize to that - Don t just copy all the elements (new tablesize => new hash function) - Scan old table for non-empties, and use insert function to add them to new table 24! This is called rehashing

Collision Resolution: 3 Separate chaining! Use an array of linked lists for the hash table! Each linked list contains all objects that hashed to that location - no collisions Hash function is still: h(k) = k % 10! To insert a an object: Separate Chaining - compute hash(k) - insert at front of list at that location (if empty, make first node)! To find an object: - compute hash(k) - search the linked list there for the key of the object! To delete an object: 25 - compute hash(k) - search the linked list there for the key of the object - if found, remove it 26 Separate Chaining! The load can be 1 or more - more than 1 node at each location, still O(1) inserts and finds - smaller loads do not improve performance - moderately larger loads do not hurt performance! Disadvantages - Memory allocation could be expensive - too many nodes at one position can slow operations! Advantages: - deletion is easy - don t have to resize/rehash 27