SFU CMPT Lecture: Week 8

Similar documents
SFU CMPT Lecture: Week 9

Worst-case running time for RANDOMIZED-SELECT

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È.

Scan Scheduling Specification and Analysis

9/24/ Hash functions

RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È.

From Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols

Unit #5: Hash Functions and the Pigeonhole Principle

We assume uniform hashing (UH):

CS 241 Analysis of Algorithms

Hashing. Hashing Procedures

CS 270 Algorithms. Oliver Kullmann. Generalising arrays. Direct addressing. Hashing in general. Hashing through chaining. Reading from CLRS for week 7

AAL 217: DATA STRUCTURES

Week 9. Hash tables. 1 Generalising arrays. 2 Direct addressing. 3 Hashing in general. 4 Hashing through chaining. 5 Hash functions.

Fundamental Algorithms

Hashing Techniques. Material based on slides by George Bebis

Response Time Analysis of Asynchronous Real-Time Systems

9.5 Equivalence Relations

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

III Data Structures. Dynamic sets

Understand how to deal with collisions

Data Structures And Algorithms

Data Structures and Algorithm Analysis (CSC317) Hash tables (part2)

TABLES AND HASHING. Chapter 13

HASH TABLES. Hash Tables Page 1

DATA STRUCTURES AND ALGORITHMS

Online Facility Location

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

BBM371& Data*Management. Lecture 6: Hash Tables

HASH TABLES. Goal is to store elements k,v at index i = h k

Hash Table and Hashing

Successful vs. Unsuccessful

Algorithms and Data Structures

Optimal Time Bounds for Approximate Clustering

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Dictionaries and Hash Tables

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Hash Tables. Reading: Cormen et al, Sections 11.1 and 11.2

Hash Tables and Hash Functions

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

Hashing. Introduction to Data Structures Kyuseok Shim SoEECS, SNU.

COMP171. Hashing.

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Today s Outline. CS 561, Lecture 8. Direct Addressing Problem. Hash Tables. Hash Tables Trees. Jared Saia University of New Mexico

Open Addressing: Linear Probing (cont.)

Designing Networks Incrementally

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Probabilistic analysis of algorithms: What s it good for?

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

CHAPTER 8. Copyright Cengage Learning. All rights reserved.

Hashing 1. Searching Lists

Module 3: Hashing Lecture 9: Static and Dynamic Hashing. The Lecture Contains: Static hashing. Hashing. Dynamic hashing. Extendible hashing.

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

Mobile Agent Rendezvous in a Ring

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

Directed Single Source Shortest Paths in Linear Average Case Time

CSE 2320 Notes 13: Hashing

CS 350 : Data Structures Hash Tables

Computing optimal linear layouts of trees in linear time

Comp 335 File Structures. Hashing

On the Performance of Greedy Algorithms in Packet Buffering

Randomized Algorithms, Hash Functions

Disjoint, Partition and Intersection Constraints for Set and Multiset Variables

Cpt S 223. School of EECS, WSU

A Modality for Recursion

Fast Lookup: Hash tables

Hash Tables. Hashing Probing Separate Chaining Hash Function

Data and File Structures Laboratory

Midterm 2. Read all of the following information before starting the exam:

Hashing 1. Searching Lists

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Worst-Case Utilization Bound for EDF Scheduling on Real-Time Multiprocessor Systems

CSE200: Computability and complexity Interactive proofs

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

Expected Runtimes of Evolutionary Algorithms for the Eulerian Cycle Problem

CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS

CSCI Analysis of Algorithms I

Information-Theoretic Private Information Retrieval: A Unified Construction (Extended Abstract)

Ulysses: A Robust, Low-Diameter, Low-Latency Peer-to-Peer Network

Competitive Analysis of On-line Algorithms for On-demand Data Broadcast Scheduling

Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

A Note on Karr s Algorithm

CSC263 Week 5. Larry Zhang.

External-Memory Breadth-First Search with Sublinear I/O

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

Hash table basics. ate à. à à mod à 83

Dictionaries and Hash Tables

Theorem 2.9: nearest addition algorithm

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

More on Hashing: Collisions. See Chapter 20 of the text.

CS/COE 1501

Hashing for searching

Transcription:

SFU CMPT-307 2008-2 1 Lecture: Week 8 SFU CMPT-307 2008-2 Lecture: Week 8 Ján Maňuch E-mail: jmanuch@sfu.ca Lecture on June 24, 2008, 5.30pm-8.20pm

SFU CMPT-307 2008-2 2 Lecture: Week 8 Universal hashing Overcomes problem of bad inputs by using randomization Mentioned: if adversary chooses keys to be hashed by fixed hash function, he can choose Ò keys that all hash into same slot All fixed hash functions have this problem Universal hashing: design class of parametrized hash functions À pick random ¾ À at beginning of execution (select at random parameters)

SFU CMPT-307 2008-2 3 Lecture: Week 8 À Let be a finite collection of hash functions that map given Í universe ¼ ½ Ñ ½ into À is said to be universal if for each pair ¾ Í of keys, the number of hash functions ¾ À with µ µ is at most À Ñ, i.e., the chance of a collision is at most À Ñ ½ Ñ À In other words: with randomly chosen from À, the chance of collision between arbitrary distinct keys and is just the same (½ Ñ) as a chance of collision µ if µ and were randomly and independently chosen ¼ Ñ ½ from

SFU CMPT-307 2008-2 4 Lecture: Week 8 Properties: average-case Let À be a universal collection of hash functions Theorem. Choose ¾ À randomly to hash Ò keys (set Ã) into table Ì of size Ñ, using chaining to resolve collisions. 1. If key is not in the table, then expected length Ò µ of list that hashes to is at most «. 2. If key is in the table, then expected length Ò µ of list containing is at most ½ «. Proof. Important: expectations over the choice of a hash function, do not depend on assumptions regarding distribution of keys! For each pair of distinct keys define a random variable with ½ iff µ µ, i.e, collision By definition, È ½µ ½ Ñ, and thus ½ Ñ

¾Ã Ð ¾Ã SFU CMPT-307 2008-2 5 Lecture: Week 8 Now define for each key a random variable i.e. the number of keys other than that hash to same slot as. Clearly, ¾ ½ ¾ Ã Ñ ¾Ã ¾Ã Ñ

SFU CMPT-307 2008-2 6 Lecture: Week 8 It remains to count the number of ¾ Ã with Case 1: If ¾ Ã, then Ò µ and ¾ Ã Ò Thus Ò µ Ò Ñ «Case 2: If ¾ Ã, key appears in Ì µ. Also, count does not include itself. Thus Ò µ ½ and Thus ¾ Ã Ò ½ Ò µ ½ Ò ½µ Ñ ½ ½ «½ Ñ ½ «

SFU CMPT-307 2008-2 7 Lecture: Week 8 Universal hashing: sequence of operations Corollary. Using the universal hashing and chaining in a table Ñ with slots, it takes expected µ time to handle any sequence of Insert, Search, and Delete operations, Ç Ñµ containing Insert operations. Proof. the number of Insertions is Ç Ñµ thus the number of elements in the hash table at any time is Ò Ç Ñµ and «Ç ½µ. Insert and Delete take constant time, Search takes expected constant time (due «to Ç ½µ), by linearity of expectation, expected time for sequence of operations is Ç µ. Notice: Impossible for adversary to pick sequence of operations that forces the worst-case time This is now without the assumption of simple uniform hashing

SFU CMPT-307 2008-2 8 Lecture: Week 8 Construction of a universal class of hash functions Simple solution. take all functions mapping Í into ¼ ½ Ñ disadvantages: ½ size of the collection: Ñ Í number of random bits needed to pick a random function: Í ÐÓ Ñ a lot of preprocessing needed how to encode functions? we have to store the chosen function space needed: Í ÐÓ Ñ we need a much smaller collection of hash functions

SFU CMPT-307 2008-2 9 Lecture: Week 8 Assignment Problem 8.1. (deadline: July 8, 5:30pm) Prove that the class of all functions from Í to ¼ ½ Ñ ½ is universal.

SFU CMPT-307 2008-2 10 Lecture: Week 8 Number-theoretical solution. Let Ô be a prime number and let Ô ¼ ½ Ô ½ Ô ½ ¾ Ô ½ Lemma. For any ¾ Ô, ¾ Ô, the equation Ü ÑÓ Ô has a unique solution Ü ¾ Ô. (see Chapter 31.4 for a proof)

SFU CMPT-307 2008-2 11 Lecture: Week 8 Let Ô be a prime, large enough such that every possible key is in ¼ Ô ½, inclusive. for all ¾ Ô and ¾ Ô define hash functions : µ µ ÑÓ Ôµ ÑÓ Ñ (linear transformation followed by reductions modulo Ô and then modulo Ñ) Note: Ô Ñ since by the assumption, the size of universe of keys Í is greater than the number of slots the function maps Í Ô into Ñ Consider the universal class of hash functions: Note: size of Ô Ô ½µ class: can be chosen such that Ô ¾ Í (Bertrand s postulate) Ô picking randomly À from Ô Ñ means picking random number of bits needed to pick a function À from Ô Ñ Ç ÐÓ Í µ : À Ô Ñ ¾ Ô ¾ Ô

SFU CMPT-307 2008-2 12 Lecture: Week 8 Theorem. À Class Ô Ñ of hash functions is universal. Proof. Take two distinct keys and. How many functions are there À in Ô Ñ mapping and to the same slot? Consider ¾ À Ô Ñ. Let Ö µ ÑÓ Ô µ ÑÓ Ô Is it possible that Ö? By Lemma: No. If Ö then and are solutions to equation Ü Ö ÑÓ Ô and by uniqueness of solution,. no collisions at modulo Ô level

SFU CMPT-307 2008-2 13 Lecture: Week 8 There Ô Ô ½µ are functions À in Ô Ñ and there Ô Ô ½µ are possible Ö µ pairs such Ö that. Is there a one-to-one correspondence? Yes Values Ö and determine the function (remember and are still fixed): µ ¼ ÑÓ Ô Ö by Lemma (unknown ), is unique Ö ½ ÑÓ Ô by Lemma (unknown ), is unique implies uniquely determined by Ö and Conclusion: If we pick ¾ À Ô Ñ uniformly at random, the resulting pair Ö µ is equally likely to be any pair of distinct values from Ô.

Ô Ñ ½ Ô Ñ SFU CMPT-307 2008-2 14 Lecture: Week 8 When a collision between and happens? Hence, probability that and collide is equal to probability Ö that Ñ when Ö and are randomly chosen as distinct values from Ô. ÑÓ and collide when Ö ÑÓ Ñ Note: doesn t depend on and anymore pick any Ö, there are at most Ô Ñ choices of such that Ö ÑÓ Ñ, one of them is Ö hence, there are at most Ô Ñ ½ choices for such that Ö and hence, probability that Ö ÑÓ Ñ and Ö is at most Ö ÑÓ Ñ ½µ Ñ ½ Ô ½ ½ Ô Ô ½µ Ñ ½ Ñ Ô Final conclusion: À Ô Ñ is universal. ½

SFU CMPT-307 2008-2 15 Lecture: Week 8 Open addressing the second method for dealing with collisions recall, the first method was chaining elements are stored directly in hash table (not in linked lists) hence: the load «is always at most ½ (but the table can overflow) for every key we have a sequence of slots, called the probe sequence when inserting we try (probe) the first slot, if used, the second slot, etc., until we find an empty slot when searching we go over slots in sequence until we either find the searched element or empty slot no pointers used we can use more memory for the hash table instead, which results in fewer collisions, and so, faster searching

SFU CMPT-307 2008-2 16 Lecture: Week 8 Hash functions have now two parameters: the key ¾ Í the probe number ¾ ¼ ½ Ñ ½ Furthermore, the hash function Í ¼ ½ Ñ ½ ¼ ½ Ñ ½ must satisfy that for every key, the probe sequence ¼µ ½µ Ñ ½µ is a permutation of ¼ ½ Ñ ½ Now: every position is eventually considered as a slot for a new key (the algorithm fails with overflow only when there is no place in the entire table to insert a key)

SFU CMPT-307 2008-2 17 Lecture: Week 8 Assume that an empty slot contains the value NIL µ Hash-Insert Ì ¼ 1: 2: repeat 3: µ 4: if Ì NIL then 5: Ì 6: return 7: else ½ 8: 9: end if 10: until Ñ 11: error hash table overflow Idea: probing the probe sequence of until we find an empty slot

SFU CMPT-307 2008-2 18 Lecture: Week 8 Hash-Search Ì µ probes the same sequence, therefore the search can be terminated when an empty slot is found assuming that we didn t delete anything from the table µ Hash-Search Ì ¼ 1: 2: repeat 3: µ 4: if Ì then 5: return 6: else ½ 7: 8: end if 9: until Ì NIL or Ñ 10: return NIL returns where slot contains the key, or NIL, if is not in the table

SFU CMPT-307 2008-2 19 Lecture: Week 8 Hash-Delete Ì µ difficult operation: when we delete a key from the table, we cannot just mark the slot containing the key by NIL (empty) we could disconnect a sequence of occupied elements leading to the key Solution. mark the place with the special value DELETED (instead of NIL) modify Hash-Insert Ì µ so that it treats slot containing DELETED as empty Hash-Search Ì µ will treat such a slot as occupied, which we want, so no modification is needed Problem. the search time doesn t depend on «load #elements #slots, but rather on #elements #DELETED #slots, which can be quite bad Conclusion. when Delete operation is allowed, prefer to use hashing by chaining

SFU CMPT-307 2008-2 20 Lecture: Week 8 Assignment Problem 8.2. (deadline: July 8, 5:30pm) Write a pseudo-code for the modified Hash-Insert able to handle also the special value DELETED, and for the procedure Hash-Delete.

SFU CMPT-307 2008-2 21 Lecture: Week 8 Uniform hashing generalization of simple uniform hashing very strong conditions, in practice just some approximations are used useful for analysis assumption: each key is equally likely to have any Ñ of permutations of as its probe sequence ½ requires that all Ñ possible permutations (probe sequence) could be generated ¼ ½ Ñ Common techniques used are much worse: linear probing (only Ñ distinct probe sequences) quadratic probing (only Ñ distinct probe sequences) double (Ñ hashing distinct probe sequences) the best ¾ results

SFU CMPT-307 2008-2 22 Lecture: Week 8 Linear probing needs an ordinary hash function ¼ Í ¼ ½ Ñ ½, called an auxiliary hash function hash function: µ ¼ µ µ ÑÓ Ñ hence, for a key, the probe sequence is ¼ µ ¼ µ ½ Ò ½ ¼ ¼ µ ½ we start at the slot given by auxiliary hash function, and then probe consecutive slots first probe determines the entire sequence µ only Ñ distinct probe sequences

SFU CMPT-307 2008-2 23 Lecture: Week 8 Problem. primary clustering long clusters (sequences) of occupied slots increasing average search time why? consider a cluster with occupied slots, if the consecutive slot is empty, then it has a ½µ Ñ probability that it will be filled during next Insert operation

SFU CMPT-307 2008-2 24 Lecture: Week 8 Quadratic probing hash function: µ ¼ µ ½ ¾ ¾ µ ÑÓ Ñ where ¼ is again an auxiliary hash function, ¾ ¼ Problems: doesn t work for all values of ½ ¾ and Ñ still: a probe sequence for one key is shifted version of the probe sequence for another µ key Ñ only distinct probe sequences secondary clustering milder form of clustering (the keys mapped to the same slot will form a cluster)

SFU CMPT-307 2008-2 25 Lecture: Week 8 Assignment Problem 8.3. (deadline: July 8, 5:30pm) Consider a quadratic probing scheme with constants Ñ ¾ Ø for some integer Ø (i.e., Ñ is a power of ¾) and ½ ¾ ½ ¾. Prove that the function µ ¼ µ ½ ¾ ¾ µ ÑÓ Ñ is indeed a hash function, i.e., prove that all probe sequences are permutations ¼ ½ Ñ of ½. Hint: ¾ Congruence equivalent to ¾ ÑÓ ¾ congruence. Ø ½ ÑÓ ¾ Ø makes sense only if is even, and is

SFU CMPT-307 2008-2 26 Lecture: Week 8 Double hashing hash function: µ ½ µ ¾ µµ ÑÓ Ñ where ½ and ¾ are auxiliary hash functions first probe doesn t determine the entire sequence, depends also on the value of the second hash function (assuming we don t use ¾ ½ ) approximates behavior of uniform hashing, generated permutations have some characteristics of randomly chosen permutation

SFU CMPT-307 2008-2 27 Lecture: Week 8 Requires. The values of ¾ µ have to be relatively prime (coprime) to the size of hash Ñ table two easy ways: Ñ 1. is power of ¾, and ¾ produce only odd numbers Ñ 2. is a prime, and ¾ returns positive values smaller Ñ than each pair ½ µ ¾ µµ yields a distinct probe sequence, hence Ñ ¾ µ probing sequences are used

SFU CMPT-307 2008-2 28 Lecture: Week 8 Analysis of open-address hashing assume that we use uniform hashing : a new key has a probe sequence chosen uniformly at random from all possible permutations when we are looking for a key which is already in the table, we assume that every key is equally likely to be searched for we will express the expected number of probes in terms of the load «Ò Ñ ½

SFU CMPT-307 2008-2 29 Lecture: Week 8 Unsuccessful search Theorem. Given an open-address hash table with load «Ò Ñ ½, the expected number of probes in an unsuccessful search is at most ½ ½ «µ, assuming uniform hashing. Example. if «½ ¾ ( ¼± of table occupied) then the expected number of probes for unsuccessful search will be ½ ½ ½ ¾µ ¾ if «¼ ( ¼± of table occupied), then ½ ½ ½¼µ ½¼

SFU CMPT-307 2008-2 30 Lecture: Week 8 Proof. let be a random variable denoting the number of probes made in an unsuccessful search for ½ Ñ, let be the event that the -th probe to an occupied slot iff events ½ ½ happened and event didn t event is equivalent to event ½ ¾ ½, hence ½µ is È µ È ½ ¾ ½ µ we are looking for why do we need È µ? answer: Assignment Problem 8.4

½ ½ SFU CMPT-307 2008-2 31 Lecture: Week 8 Assignment Problem 8.4. (deadline: July 8, 5:30pm) Consider a random variable having only positive integer values (i.e., it s mapping the probability space to the set of natural numbers). Prove that È ½µ È ¾µ È µ Hint: events ½, ¾,,..., are mutually exclusive (disjoint), hence ½ È µ È µ

SFU CMPT-307 2008-2 32 Lecture: Week 8 We will use the formula from Assignment Problem 5.2 again: È ½ ¾ ½ µ È ½ µ È ¾ ½ µ È ½ ¾ ¾ ½ µ Hence, to find out È µ it s enough to evaluate conditional probabilities È ½ ½ µ. case ½: È ½ µ Ò Ñ slots, Ò are occupied, uniform hashing, the first probe can be to any of (Ñ slots) Ñ

SFU CMPT-307 2008-2 33 Lecture: Week 8 case ½: assume ½ ½ occurred hence, first ½ probes were to occupied slots -th probe cannot be to any of these slots (the probe sequence is a permutation) hence, we have Ñ ½µ slots to choose from, and Ò ½µ of them are occupied conclusion: probability È ½ ½ µ is Ò ½µ Ñ ½µ

«½ ½ ½ «½ SFU CMPT-307 2008-2 34 Lecture: Week 8 We can now calculate È µ: ¾ µ Ò È Ò ½ Ñ ½ Ò Ñ Ñ ¾ Ò ½ Ñ Using Assignment Problem 8.4, È µ ½ ½ ½ ½ ««½ ¼ if «is constant, then ½ ½ «µ is also constant, and so an unsuccessful search runs in Ç ½µ time

the same expected number of probes: ½ ½ «µ SFU CMPT-307 2008-2 35 Lecture: Week 8 Insert Corollary. Inserting an element into an open-address hash table with load «requires at most ½ ½ «µ probes on average, assuming uniform hashing. Proof. there must be space in the table, «½ inserting a key is equivalent to unsuccessful search, where we insert the key to the final (empty) slot of unsuccessful search

SFU CMPT-307 2008-2 36 Lecture: Week 8 Successful search Theorem. Given an open-address hash table with load «½, the expected number of probes of successful search is at most ½ ÐÒ «½ «Ç ½µ ½ assuming uniform hashing and assuming that each key in the table is equally likely to be searched for. Example. 50% table full: expected number of probes in successful search: at most ½ 90% table full: ¾

SFU CMPT-307 2008-2 37 Lecture: Week 8 Proof. when looking for a key, we have to repeat the same probe sequence as when inserting into the table assume that keys were inserted to the table in the order ½ ¾ Ò hence, if we are looking for the key, the number of probe is the same as the number of probes we needed when inserting to the table at that time, table contained ½ elements, and the load factor was «½µ Ñ search for requires at most ½ «½ ½ ½µ Ñ Ñ ½ Ñ ½ probes on average

#probes ½ Ò Ñ Ò ½ «SFU CMPT-307 2008-2 38 Lecture: Week 8 as it can be any of Ò keys in the table with the same probability, we have to take the average of those expected numbers of probes: Ò Ñ Ñ ½ ½ ½ Ò ½ Ñ ¼ ½ À Ñ À Ñ Ò µ «À using ÐÒ Ç ½µ ÐÒ Ñ ÐÒ Ñ Òµ Ç ½µµ ½ «ÐÒ Ñ Ò Ç ½µ Ñ ½ «ÐÒ ½ «Ç ½µ ½