Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.


Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.
Hash Tables
xkcd, "Random Number", http://xkcd.com/221/. Used with permission under Creative Commons 2.5 License.
© 2015 Goodrich and Tamassia

Recall the Map Operations
get(k): if the map M has an entry with key k, return its associated value; else, return null.
put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return the old value associated with k.
remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null.
size(), isEmpty()

Intuitive Notion of a Map
Intuitively, a map M supports the abstraction of using keys as indices with a syntax such as M[k].
As a mental warm-up, consider a restricted setting in which a map with n items uses keys that are known to be integers in a range from 0 to N - 1, for some N ≥ n.
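The restricted setting above can be sketched as an array indexed directly by the key. This is only an illustration: the class name DirectAddressMap is hypothetical, and the sketch assumes that None is never stored as a value, so that None can mark an empty slot.

```python
class DirectAddressMap:
    """Map whose keys are integers in the range 0..N-1 (hypothetical name)."""

    def __init__(self, N):
        self._table = [None] * N   # slot k holds the value stored under key k
        self._n = 0                # number of entries currently stored

    def _check(self, k):
        # Reject keys outside 0..N-1 (Python would otherwise wrap negatives).
        if not 0 <= k < len(self._table):
            raise KeyError(k)

    def get(self, k):
        self._check(k)
        return self._table[k]

    def put(self, k, v):
        self._check(k)
        old = self._table[k]
        if old is None:
            self._n += 1           # k is a new key
        self._table[k] = v
        return old

    def remove(self, k):
        self._check(k)
        old = self._table[k]
        if old is not None:
            self._n -= 1           # k was found
            self._table[k] = None
        return old

    def size(self):
        return self._n
```

Every operation is a single array access, which is the behavior a hash table tries to approximate for more general keys.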

More General Kinds of Keys
But what should we do if our keys are not integers in the range from 0 to N - 1?
Use a hash function to map general keys to corresponding indices in a table.
For instance, the last four digits of a Social Security number.

  Index  Entry
  0      (empty)
  1      025-612-0001
  2      981-101-0002
  3      (empty)
  4      451-229-0004

Hash Functions and Hash Tables
A hash function h maps keys of a given type to integers in a fixed interval [0, N - 1].
Example: h(x) = x mod N is a hash function for integer keys.
The integer h(x) is called the hash value of key x.
A hash table for a given key type consists of:
  a hash function h
  an array (called table) of size N
When implementing a map with a hash table, the goal is to store item (k, o) at index i = h(k).

Example
We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer.
Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.

  Index  Entry
  0      (empty)
  1      025-612-0001
  2      981-101-0002
  3      (empty)
  4      451-229-0004
  ...
  9997   (empty)
  9998   200-751-9998
  9999   (empty)
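As a small sketch of this example, the function below computes the last four digits of an SSN given as text. Accepting the SSN as a dash-separated string is an assumption made here for convenience; the slide treats the SSN as an integer.

```python
N = 10_000  # table size from the example


def h(ssn: str) -> int:
    """Return the table index for an SSN: its last four decimal digits."""
    digits = int(ssn.replace("-", ""))  # drop the dashes, read as an integer
    return digits % N                   # x mod 10,000 keeps the last four digits


for ssn in ("025-612-0001", "981-101-0002", "451-229-0004", "200-751-9998"):
    print(ssn, "->", h(ssn))
```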

Hash Functions
A hash function is usually specified as the composition of two functions:
  Hash code: $h_1$: keys → integers
  Compression function: $h_2$: integers → [0, N - 1]
The hash code is applied first, and the compression function is applied next on the result, i.e., $h(x) = h_2(h_1(x))$.
The goal of the hash function is to disperse the keys in an apparently random way.

Hash Codes
Memory address:
  We reinterpret the memory address of the key object as an integer.
  Good in general, except for numeric and string keys.
Integer cast:
  We reinterpret the bits of the key as an integer.
  Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float).
Component sum:
  We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows).
  Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type.
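The component-sum idea can be sketched as follows. The function name, the 32-bit component width, and the 64-bit default key width are illustrative assumptions; Python integers do not overflow, so the sketch masks the running sum to imitate "ignoring overflows".

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, imitating 32-bit overflow


def component_sum_hash(key: int, key_bits: int = 64) -> int:
    """Sum the 32-bit components of a non-negative integer key."""
    total = 0
    for shift in range(0, key_bits, 32):
        component = (key >> shift) & MASK32   # next 32-bit chunk of the key
        total = (total + component) & MASK32  # add, discarding any overflow
    return total


print(component_sum_hash(0x00000001_00000002))  # components 2 and 1 sum to 3
```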

Hash Codes (cont.)
Polynomial accumulation:
  We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $a_0 a_1 \ldots a_{n-1}$
  We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$ at a fixed value z, ignoring overflows.
  Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words).
Polynomial p(z) can be evaluated in O(n) time using Horner's rule:
  The following polynomials are successively computed, each from the previous one in O(1) time:
    $p_0(z) = a_{n-1}$
    $p_i(z) = a_{n-i-1} + z \, p_{i-1}(z)$  (i = 1, 2, ..., n - 1)
  We have $p(z) = p_{n-1}(z)$.
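Horner's rule as stated above can be sketched in a few lines. The function name and the 32-bit mask are assumptions for illustration; the mask stands in for "ignoring overflows", since Python integers grow without bound.

```python
def polynomial_hash(components, z=33, bits=32):
    """Evaluate a_0 + a_1*z + ... + a_{n-1}*z^(n-1) with Horner's rule."""
    mask = (1 << bits) - 1
    p = 0
    for a in reversed(components):  # start from a_{n-1}, finish with a_0
        p = (a + z * p) & mask      # p_i(z) = a_{n-i-1} + z * p_{i-1}(z)
    return p


# The bytes of a string serve as 8-bit components.
print(polynomial_hash(b"ab"))  # 97 + 98*33 = 3331
```

Each loop iteration does one multiplication and one addition, so the whole evaluation takes O(n) time for n components.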

Tabulation-Based Hashing
Suppose each key can be viewed as a tuple, $k = (x_1, x_2, \ldots, x_d)$, for a fixed d, where each $x_i$ is in the range [0, M - 1].
There is a class of hash functions we can use, which involve simple table lookups, known as tabulation-based hashing.
We can initialize d tables, $T_1, T_2, \ldots, T_d$, of size M each, so that each $T_i[j]$ is a uniformly chosen independent random number in the range [0, N - 1].
We then can compute the hash function, h(k), as $h(k) = T_1[x_1] \oplus T_2[x_2] \oplus \cdots \oplus T_d[x_d]$, where $\oplus$ denotes the bitwise exclusive-or function.
Because the values in the tables are themselves chosen at random, such a function is itself fairly random. For instance, it can be shown that such a function will cause two distinct keys to collide at the same hash value with probability 1/N, which is what we would get from a perfectly random function.
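A minimal sketch of tabulation-based hashing follows. The factory function and its seed parameter are hypothetical conveniences, and the sketch assumes N is a power of two so that the exclusive-or of values in [0, N - 1] stays in that range.

```python
import random


def make_tabulation_hash(d, M, N, seed=None):
    """Build h(k) = T_1[x_1] xor ... xor T_d[x_d] for keys that are d-tuples."""
    rng = random.Random(seed)
    # d tables of size M, each entry drawn uniformly from [0, N-1].
    tables = [[rng.randrange(N) for _ in range(M)] for _ in range(d)]

    def h(key):
        value = 0
        for table, x in zip(tables, key):
            value ^= table[x]  # one lookup and one xor per component
        return value

    return h


# Example: hash a 32-bit key viewed as four bytes (d = 4, M = 256).
h = make_tabulation_hash(d=4, M=256, N=1024, seed=7)
print(h((192, 168, 0, 1)))
```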

Compression Functions
Division:
  $h_2(y) = y \bmod N$
  The size N of the hash table is usually chosen to be a prime.
  The reason has to do with number theory and is beyond the scope of this course.
Random linear hash function:
  $h_2(y) = (ay + b) \bmod N$
  a and b are random nonnegative integers such that a mod N ≠ 0.
  Otherwise, every integer would map to the same value b.
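The random linear compression function might be set up as below. Drawing a from 1..N-1 is one simple way (an assumption of this sketch) to guarantee a mod N ≠ 0; the factory name and the rng parameter are hypothetical.

```python
import random


def make_linear_compression(N, rng=random):
    """Return h2(y) = (a*y + b) mod N with random a, b and a mod N != 0."""
    a = rng.randrange(1, N)  # 1..N-1, so a mod N is never zero
    b = rng.randrange(N)     # 0..N-1

    def h2(y):
        return (a * y + b) % N

    return h2


h2 = make_linear_compression(13, random.Random(1))
print([h2(y) for y in range(13)])
```

With N prime and a mod N ≠ 0, the 13 consecutive inputs above land in 13 different cells.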

Collision Handling
Collisions occur when different elements are mapped to the same cell.
Separate Chaining: let each cell in the table point to a linked list of entries that map there.

  Index  Chain
  0      (empty)
  1      025-612-0001
  2      (empty)
  3      (empty)
  4      451-229-0004 → 981-101-0004

Separate chaining is simple, but requires additional memory outside the table.

Map with Separate Chaining
Delegate operations to a list-based map at each cell:

Algorithm get(k):
  return A[h(k)].get(k)

Algorithm put(k, v):
  t = A[h(k)].put(k, v)
  if t = null then    {k is a new key}
    n = n + 1
  return t

Algorithm remove(k):
  t = A[h(k)].remove(k)
  if t ≠ null then    {k was found}
    n = n - 1
  return t
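The three algorithms above can be sketched as a runnable class. This is an illustrative version, not the textbook's code: the class name is hypothetical, each chain is a Python list of (key, value) pairs rather than a separate list-based map object, and Python's built-in hash() followed by division stands in for h.

```python
class ChainHashMap:
    """Hash map with separate chaining (hypothetical name)."""

    def __init__(self, N=13):
        self._A = [[] for _ in range(N)]  # one chain per table cell
        self._n = 0                       # number of entries in the map

    def _chain(self, k):
        return self._A[hash(k) % len(self._A)]

    def get(self, k):
        for key, value in self._chain(k):
            if key == k:
                return value
        return None

    def put(self, k, v):
        chain = self._chain(k)
        for i, (key, old) in enumerate(chain):
            if key == k:
                chain[i] = (k, v)  # replace the value, return the old one
                return old
        chain.append((k, v))
        self._n += 1               # k is a new key
        return None

    def remove(self, k):
        chain = self._chain(k)
        for i, (key, old) in enumerate(chain):
            if key == k:
                del chain[i]
                self._n -= 1       # k was found
                return old
        return None

    def size(self):
        return self._n
```

Unlike the pseudocode, the sketch decides whether a key is new by searching the chain rather than by testing the returned value against null, so a stored value of None does not confuse the entry count.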

Performance of Separate Chaining
Let us assume that our hash function, h, maps keys to independent uniform random values in the range [0, N - 1].
Thus, if we let X be a random variable representing the number of items that map to a bucket, i, in the array A, then the expected value of X is E(X) = n/N, where n is the number of items in the map, since each of the N locations in A is equally likely for each item to be placed.
This parameter, n/N, which is the ratio of the number of items in a hash table, n, and the capacity of the table, N, is called the load factor of the hash table.
If it is O(1), then the above analysis says that the expected time for hash table operations is O(1) when collisions are handled with separate chaining.
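The expected value above follows from linearity of expectation. As a short worked step, let $X_j$ (a helper symbol introduced here, not in the slides) be 1 if item j maps to bucket i and 0 otherwise, so that each $X_j$ equals 1 with probability 1/N:

\[
X = \sum_{j=1}^{n} X_j,
\qquad
E(X) = \sum_{j=1}^{n} E(X_j) = \sum_{j=1}^{n} \frac{1}{N} = \frac{n}{N}.
\]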

Linear Probing
Open addressing: the colliding item is placed in a different cell of the table.
Linear probing: handles collisions by placing the colliding item in the next (circularly) available table cell.
Each table cell inspected is referred to as a probe.
Colliding items lump together, causing future collisions to cause a longer sequence of probes.
Example:
  h(x) = x mod 13
  Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.

  Index  0   1   2   3   4   5   6   7   8   9   10  11  12
  Key    -   -   41  -   -   18  44  59  32  22  31  73  -
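The example can be replayed with a short sketch. The function name is hypothetical and the table stores bare integer keys, using None for an empty cell.

```python
def linear_probe_insert(table, key):
    """Insert an integer key with h(x) = x mod N and linear probing."""
    N = len(table)
    i = key % N
    for _ in range(N):         # at most N probes
        if table[i] is None:   # first free cell, scanning circularly
            table[i] = key
            return i
        i = (i + 1) % N
    raise OverflowError("table is full")


table = [None] * 13
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    linear_probe_insert(table, key)
print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```

Key 31 hashes to cell 5 but is stored in cell 10 after probing cells 5 through 9, which shows how the cluster starting at cell 5 lengthens later probe sequences.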

Search with Linear Probing
Consider a hash table A that uses linear probing.
get(k):
  We start at cell h(k).
  We probe consecutive locations until one of the following occurs:
    An item with key k is found, or
    An empty cell is found, or
    N cells have been unsuccessfully probed.

Algorithm get(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return null
    else if c.getKey() = k
      return c.getValue()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null
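A direct transcription of the get algorithm might look like this. It assumes, for illustration only, that A is a Python list whose cells hold either None (empty) or a (key, value) tuple, and that hash(k) mod N plays the role of h(k).

```python
def get(A, k):
    """Search table A for key k using linear probing; return its value or None."""
    N = len(A)
    i = hash(k) % N
    for _ in range(N):   # give up after N unsuccessful probes
        c = A[i]
        if c is None:    # an empty cell ends the search: k is absent
            return None
        if c[0] == k:    # found the entry with key k
            return c[1]
        i = (i + 1) % N  # move to the next cell, circularly
    return None


A = [None] * 13
A[5] = (18, "a")
A[6] = (44, "b")   # 44 collided with 18 at cell 5 and moved to cell 6
print(get(A, 44))  # b
print(get(A, 31))  # None (probes cells 5, 6, then the empty cell 7)
```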

Updates with Linear Probing
To handle insertions and deletions, we introduce a special object, called DEFUNCT, which replaces deleted elements.
remove(k):
  We search for an entry with key k.
  If such an entry, (k, v), is found, we move elements to fill the hole created by its removal.
put(k, v):
  We throw an exception if the table is full.
  We start at cell h(k).
  We probe consecutive cells until a cell i is found that is empty.
  We store (k, v) in cell i.
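The slide mentions two ways of handling a removal: marking the cell with DEFUNCT, or moving elements to fill the hole. The sketch below takes the DEFUNCT route only. The class name and helper _find are hypothetical, and letting put reuse DEFUNCT cells is an assumption of this sketch rather than something the slide states.

```python
DEFUNCT = object()  # sentinel marking a cell whose entry was removed


class ProbeHashMap:
    """Linear-probing map with DEFUNCT markers (hypothetical name)."""

    def __init__(self, N=13):
        self._A = [None] * N
        self._n = 0

    def _find(self, k):
        """Return the index of the cell holding key k, or None if absent."""
        N = len(self._A)
        i = hash(k) % N
        for _ in range(N):
            c = self._A[i]
            if c is None:                        # empty cell ends the search
                return None
            if c is not DEFUNCT and c[0] == k:
                return i
            i = (i + 1) % N                      # skip DEFUNCT and other keys
        return None

    def get(self, k):
        i = self._find(k)
        return None if i is None else self._A[i][1]

    def put(self, k, v):
        i = self._find(k)
        if i is not None:                        # existing key: replace value
            old = self._A[i][1]
            self._A[i] = (k, v)
            return old
        N = len(self._A)
        i = hash(k) % N
        for _ in range(N):
            if self._A[i] is None or self._A[i] is DEFUNCT:
                self._A[i] = (k, v)
                self._n += 1
                return None
            i = (i + 1) % N
        raise OverflowError("table is full")

    def remove(self, k):
        i = self._find(k)
        if i is None:
            return None
        old = self._A[i][1]
        self._A[i] = DEFUNCT                     # keep probe chains intact
        self._n -= 1
        return old
```

Leaving a marker instead of an empty cell matters because a later search for a key stored further along the same probe sequence would otherwise stop too early.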

Pseudo-code for get and put
[Pseudo-code figure not captured in this transcription.]

Pseudo-code for remove
[Pseudo-code figure not captured in this transcription.]

Performance of Linear Probing
In the worst case, searches, insertions and removals on a hash table take O(n) time.
The worst case occurs when all the keys inserted into the map collide.
The load factor α = n/N affects the performance of a hash table.
Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 - α).
The expected running time of all the dictionary ADT operations in a hash table is O(1) with constant load < 1.
In practice, hashing is very fast provided the load factor is not close to 100%.
Applications of hash tables:
  small databases
  compilers
  browser caches
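To get a feel for the 1 / (1 - α) estimate, the following sketch simply evaluates the formula at a few load factors. The function name is hypothetical, and the numbers are values of the formula, not measurements.

```python
def expected_insert_probes(alpha):
    """Evaluate 1 / (1 - alpha), the expected probes for an insertion."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    return 1 / (1 - alpha)


for alpha in (0.25, 0.5, 0.75, 0.9, 0.99):
    print(f"alpha = {alpha:4.2f}  expected probes ~ {expected_insert_probes(alpha):6.1f}")
```

The estimate is 2 probes at a load factor of 1/2 and grows to about 100 at a load factor of 0.99, which is why the load factor should stay well below 100%.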

A More Careful Analysis of Linear Probing
Recall that, in the linear-probing scheme for handling collisions, whenever an insertion at a cell i would cause a collision, then we instead insert the new item in the first cell of i+1, i+2, and so on, until we find an empty cell.
For this analysis, let us assume that we are storing n items in a hash table of size N = 2n, that is, our hash table has a load factor of 1/2.

A More Careful Analysis of Linear Probing, 2
Thus, if we can bound the expected value of the sum of the $Y_i$'s, then we can bound the expected time for a search or update operation in a linear-probing hashing scheme.


A More Careful Analysis of Linear Probing, 3
[Slide content not captured in this transcription.]

A More Careful Analysis of Linear Probing, 4
[Slide content not captured in this transcription.]

Double Hashing
Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series (i + j·d(k)) mod N for j = 0, 1, ..., N - 1.
The secondary hash function d(k) cannot have zero values.
The table size N must be a prime to allow probing of all the cells.
Common choice of compression function for the secondary hash function: $d_2(k) = q - (k \bmod q)$, where q < N and q is a prime.
The possible values for $d_2(k)$ are 1, 2, ..., q.

Example of Double Hashing
Consider a hash table storing integer keys that handles collisions with double hashing:
  N = 13
  h(k) = k mod 13
  d(k) = 7 - k mod 7
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.

  k    h(k)  d(k)  Probes
  18   5     3     5
  41   2     1     2
  22   9     6     9
  44   5     5     5, 10
  59   7     4     7
  32   6     3     6
  31   5     4     5, 9, 0
  73   8     4     8

  Index  0   1   2   3   4   5   6   7   8   9   10  11  12
  Key    31  -   41  -   -   18  32  59  73  22  44  -   -
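The table above can be reproduced with a short sketch. The function name is hypothetical, the table stores bare integer keys with None for an empty cell, and q = 7 is the prime used by d(k) in this example.

```python
def double_hash_insert(table, key, q=7):
    """Insert key using h(k) = k mod N and d(k) = q - k mod q; return the probes."""
    N = len(table)
    i = key % N          # primary hash h(k)
    d = q - key % q      # secondary hash d(k), always in 1..q
    probes = []
    for j in range(N):
        cell = (i + j * d) % N
        probes.append(cell)
        if table[cell] is None:
            table[cell] = key
            return probes
    raise OverflowError("no free cell found")


table = [None] * 13
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    print(key, double_hash_insert(table, key))
print(table)
# [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]
```

Keys 44 and 31 both start at cell 5 but follow different step sizes (5 and 4), so they do not trail each other the way colliding keys do under linear probing.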