Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1



Review: hash table purpose
We want rapid access to a dictionary entry based on a search key. The key comes from an extremely large key space, but the array that stores the entries holds only a limited number of elements. There should therefore be a mathematical relation between the search key and the array index in our table: a hash function.

Hash functions
A hash function is a function that maps key values to array indexes. Hashing is performed in two steps: first map the key value to an integer, then map that integer to a legal array index. Hash functions should have the following properties: they should be fast, deterministic, and uniform.

Hash function speed
Hash functions should be fast and easy to calculate, because access to a hash table should be nearly instantaneous and take constant time. Most common hash functions require a single division on the representation of the key. Converting the key to a number should also be quick.

Deterministic hash functions
A hash function must be deterministic: for a given input it must always return the same value. Otherwise it will not generate the same array index, and the item will not be found in the hash table. Hash functions should therefore not be determined by system time, memory location, or pseudo-random numbers.

Scattering data
A typical hash function usually results in some collisions, where two different search keys map to the same index. A perfect hash function avoids collisions entirely: each search key value maps to a different index. The goal is to reduce the number and effect of collisions. To achieve this, the data should be distributed evenly over the table.

Possible values
Any set of values stored in a hash table is an instance of the universe of possible values, i.e. the key space. The universe of possible values may be much larger than the instance we wish to store. There are many possible combinations of 10 letters (26^10, roughly 1.4 x 10^14), but we might want a hash table to store only 1,000 names.

Uniformity
A good hash function generates each value in the output range with the same probability. That is, each legal hash table index has the same chance of being generated. This property should hold both for the universe of possible values and for the expected inputs: the expected inputs should also be scattered evenly over the hash table.

A bad hash function
A hash table is to store 1,000 numeric estimates that can range from 1 to 1,000,000. The hash function is h(estimate) = estimate % n, where n = array size = 1,000. Is the distribution of values from the universe of all possible values uniform? What about the distribution of expected values?

Another bad hash function
A hash table is to store 676 names. The hash function considers just the first two letters of a name. Each letter is given a value where a = 1, b = 2, and so on. The function is h(name) = (value of 1st letter * 26 + value of 2nd letter) % 676. Is the distribution of values from the universe of all possible values uniform? What about the distribution of expected values?
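A minimal Python sketch of the two-letter hash described above may make the weakness easier to see. The function name `two_letter_hash` and the sample names are hypothetical, and the sketch assumes names made of ASCII letters only.

```python
def two_letter_hash(name: str, table_size: int = 676) -> int:
    """Hash a name using only its first two letters (a = 1, b = 2, ...)."""
    first = ord(name[0].lower()) - ord("a") + 1
    second = ord(name[1].lower()) - ord("a") + 1
    return (first * 26 + second) % table_size


# Hypothetical sample names: any names sharing their first two letters
# land on the same index, however different the rest of the name is.
for name in ["john", "joan", "joseph", "mary"]:
    print(name, two_letter_hash(name))
```

Every name beginning with "jo" maps to one index, so common prefixes in real names would pile up in a few slots while indexes for rare letter pairs stay empty.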

General principles
Use the entire search key in the hash function. If the hash function uses modulo arithmetic, make the table size a prime number. A simple and (usually) effective hash function is: convert the key value to an integer x, then compute h(x) = x mod tablesize, where tablesize is the first prime number larger than twice the number of expected values. But be aware that designing a good hash function is a complex subject and beyond the scope of this course!
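The table-size rule above can be sketched in Python as follows. The helper names `is_prime`, `choose_table_size` and `simple_hash` are illustrative assumptions, and trial division is used only because it is short, not because it is the best primality test.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for table-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def choose_table_size(expected_items: int) -> int:
    """Return the first prime larger than twice the expected item count."""
    candidate = 2 * expected_items + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def simple_hash(x: int, table_size: int) -> int:
    """Map an integer key to a legal array index with modulo arithmetic."""
    return x % table_size


print(choose_table_size(1000))  # 2003
print(choose_table_size(10))    # 23
```

For 1,000 expected values this gives a table of 2,003 slots, so the table stays less than half full.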

Converting strings to integers
In the previous examples, we had a convenient numeric key which could be easily converted to an array index. What about non-numeric keys (e.g. strings)? Strings are already numbers (in a way), e.g. under 7/8-bit ASCII encoding. For "cat": 'c' = 0110 0011, 'a' = 0110 0001, 't' = 0111 0100, so "cat" becomes 6,513,012.
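As a quick check of the "cat" example, the following Python sketch reads the ASCII bytes of a string as one big-endian integer. The function name `string_to_int` is a hypothetical helper, and the sketch assumes ASCII-only input.

```python
def string_to_int(s: str) -> int:
    """Interpret the ASCII bytes of s as a single big-endian integer."""
    return int.from_bytes(s.encode("ascii"), byteorder="big")


# Each character contributes one 8-bit group.
for ch in "cat":
    print(ch, format(ord(ch), "08b"))

print(string_to_int("cat"))  # 6513012
```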

Strings to integers
If each letter of a string is represented as an 8-bit number, then for a string of length n:

$$\text{value} = ch_0 \cdot 256^{n-1} + \dots + ch_{n-2} \cdot 256^{1} + ch_{n-1} \cdot 256^{0}$$

For large strings, this value will be very large and may result in overflow (e.g. with a 64-bit integer, a 9-character string will overflow). The expression can be factored:

$$(\dots((ch_0 \cdot 256 + ch_1) \cdot 256 + ch_2) \cdot 256 + \dots) \cdot 256 + ch_{n-1}$$

This technique is called Horner's method, and it minimizes the number of arithmetic operations. Overflow can then be prevented by applying the modulo operator after each expression in parentheses.

Horner's method example
Consider the integer representation of some string, e.g. "Grom":

$$71 \cdot 256^{3} + 114 \cdot 256^{2} + 111 \cdot 256^{1} + 109 \cdot 256^{0} = 1,191,182,336 + 7,471,104 + 28,416 + 109 = 1,198,681,965$$

Factoring this expression results in:

$$((71 \cdot 256 + 114) \cdot 256 + 111) \cdot 256 + 109 = 1,198,681,965$$

Assume that this key is to be hashed to an index using the hash function key % 23:

$$1,198,681,965 \bmod 23 = 4$$

$$((((((71 \bmod 23) \cdot 256 + 114) \bmod 23) \cdot 256 + 111) \bmod 23) \cdot 256 + 109) \bmod 23 = 4$$
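The worked example can be reproduced with a short Python sketch of Horner's method with a modulo step after every character. The name `horner_hash` is a hypothetical helper and ASCII-only keys are assumed. Python integers do not overflow, so the modulo here only mirrors what a fixed-width language would need.

```python
def horner_hash(key: str, table_size: int) -> int:
    """Hash an ASCII string with Horner's method, reducing at each step."""
    h = 0
    for byte in key.encode("ascii"):
        # Shift the running value one base-256 digit, add the next
        # character, and reduce so the value never grows large.
        h = (h * 256 + byte) % table_size
    return h


full_value = int.from_bytes(b"Grom", byteorder="big")
print(full_value)              # 1198681965
print(full_value % 23)         # 4
print(horner_hash("Grom", 23)) # 4
```

The intermediate values are 2, 5, 11 and 4, and none of them ever exceeds 22 * 256 + 255 before reduction.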

Open addressing

Collision handling
A collision occurs when two different keys are mapped to the same index. Collisions may occur even when the hash function is good: because the key space is larger than the table, they are inevitable by the pigeonhole principle. There are two main ways of dealing with collisions: open addressing and separate chaining.

Open addressing
Idea: when an insertion results in a collision, look for an empty array element. Start at the index to which the hash function mapped the inserted item, then look for a free space in the array following a particular search pattern, known as probing. There are three major open addressing schemes: linear probing, quadratic probing, and double hashing.

Linear probing
The hash table is searched sequentially, starting with the original hash location. Each time the table is probed (for a free location), add one to the index: search h(search key) + 1, then h(search key) + 2, and so on until an available location is found. If the sequence of probes reaches the last element of the array, wrap around to arr[0].
Linear probing leads to primary clustering: the table contains groups of consecutively occupied locations. These clusters tend to get larger as time goes on, reducing the efficiency of the hash table.
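The probe order, including the wrap-around to index 0, can be sketched as a small Python generator. The name `linear_probe_sequence` and the parameter `home` (the index returned by the hash function) are assumptions for illustration.

```python
from typing import Iterator


def linear_probe_sequence(home: int, table_size: int) -> Iterator[int]:
    """Yield every table index once, starting at home and wrapping around."""
    for step in range(table_size):
        yield (home + step) % table_size


# Starting near the end of a 23-slot table shows the wrap-around.
print(list(linear_probe_sequence(21, 23))[:4])  # [21, 22, 0, 1]
```

Taking the sum modulo the table size is what sends the probe from the last index back to arr[0].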

Linear probing example
The hash table has size 23 and the hash function is h(x) = x mod 23, where x is the search key value. The search key values are shown in the table; indexes 0 to 22 that are not listed are empty.

index:   6    9   12   21
key:    29   32   58   21

Linear probing example
Insert 81: h = 81 mod 23 = 12, which collides with 58, so use linear probing to find a free space. First look at 12 + 1, which is free, so insert the item at index 13.

index:   6    9   12   13   21
key:    29   32   58   81   21

Linear probing example
Insert 35: h = 35 mod 23 = 12, which collides with 58, so use linear probing to find a free space. First look at 12 + 1, which is occupied, so look at 12 + 2 and insert the item at index 14.

index:   6    9   12   13   14   21
key:    29   32   58   81   35   21

Linear probing example
Insert 60: h = 60 mod 23 = 14. Note that even though the key doesn't hash to 12, it still collides with an item that did. First look at 14 + 1, which is free, so insert the item at index 15.

index:   6    9   12   13   14   15   21
key:    29   32   58   81   35   60   21

Linear probing example
Insert 12: h = 12 mod 23 = 12. Indexes 12 to 15 are all occupied, so the item will be inserted at index 16. Notice that primary clustering is beginning to develop, making insertions less efficient.

index:   6    9   12   13   14   15   16   21
key:    29   32   58   81   35   60   12   21
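The whole example can be replayed with a minimal Python sketch of insertion under linear probing. The function name `insert` and the use of `None` for an empty slot are illustrative assumptions. Deletion and search are not covered here.

```python
from typing import List, Optional


def insert(table: List[Optional[int]], key: int) -> int:
    """Insert key with linear probing and return the index where it landed."""
    size = len(table)
    home = key % size  # h(x) = x mod tablesize
    for step in range(size):
        index = (home + step) % size  # wrap around past the last slot
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table: List[Optional[int]] = [None] * 23
for key in [29, 32, 58, 21]:  # the initial contents of the example table
    insert(table, key)

placements = {key: insert(table, key) for key in [81, 35, 60, 12]}
print(placements)  # {81: 13, 35: 14, 60: 15, 12: 16}
```

The last insertion needs five probes (indexes 12 to 16), which is the cost of the cluster that has formed.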

Try It!
Insert the items 61, 19, 32, 72, 3, 76, 5, 34 into a hash table of 29 elements using linear probing, first using the hash function h(x) = x mod 29, then using the hash function h(x) = (x 17) mod 29.

Readings for this lesson
Thareja Chapter 15.5.1 (linear probing).
Next class: Thareja Chapter 15.5.1 (quadratic probing, double hashing) and Chapter 15.5.2 (chaining).
Midterm 2 solution posted to the course website! See Piazza for the document password, which is the same as for the midterm 1 solution. Please bring a pencil to class next Monday for TA evaluation forms!