CSE 2320 Notes 13: Hashing

Similar documents
CSE 5311 Notes 5: Hashing

CSE 5311 Notes 9: Hashing

Hash Tables. Hashing Probing Separate Chaining Hash Function

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

Hash Table and Hashing

Fast Lookup: Hash tables

Hashing 1. Searching Lists

BBM371& Data*Management. Lecture 6: Hash Tables

Understand how to deal with collisions

Hashing Techniques. Material based on slides by George Bebis

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Data Structures and Algorithms. Chapter 7. Hashing

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Data Structures and Algorithm Analysis (CSC317) Hash tables (part2)

Data Structures and Algorithms. Roberto Sebastiani

Hashing. Hashing Procedures

Hashing 1. Searching Lists

CS 241 Analysis of Algorithms

9/24/ Hash functions

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

TABLES AND HASHING. Chapter 13

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

Successful vs. Unsuccessful

Worst-case running time for RANDOMIZED-SELECT

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

( D. Θ n. ( ) f n ( ) D. Ο%

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

SFU CMPT Lecture: Week 8

Open Addressing: Linear Probing (cont.)

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

IS 2610: Data Structures

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

Fundamental Algorithms

AAL 217: DATA STRUCTURES

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

III Data Structures. Dynamic sets

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

DATA STRUCTURES AND ALGORITHMS

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

Outline. hash tables hash functions open addressing chained hashing

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

DS ,21. L11-12: Hashmap

Practical Session 8- Hash Tables

Comp 335 File Structures. Hashing

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

COMP171. Hashing.

CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions. Kevin Quinn Fall 2015

Algorithms and Data Structures

Cpt S 223. School of EECS, WSU

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

Hashing. Biostatistics 615 Lecture 12

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

CPSC 259 admin notes

Today s Outline. CS 561, Lecture 8. Direct Addressing Problem. Hash Tables. Hash Tables Trees. Jared Saia University of New Mexico

Hash Tables and Hash Functions

CS 3114 Data Structures and Algorithms READ THIS NOW!

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Hashing. It s not just for breakfast anymore! hashing 1

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Part I Anton Gerdelan

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

More on Hashing: Collisions. See Chapter 20 of the text.

( ). Which of ( ) ( ) " #& ( ) " # g( n) ( ) " # f ( n) Test 1

Chapter 20 Hash Tables

CS 350 Algorithms and Complexity

CSI33 Data Structures

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

Randomized Algorithms: Element Distinctness

HASH TABLES. Goal is to store elements k,v at index i = h k

CSE 214 Computer Science II Searching

Topic HashTable and Table ADT

Question Bank Subject: Advanced Data Structures Class: SE Computer

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

Algorithms and Data Structures

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

DATA STRUCTURES/UNIT 3

Data Structures And Algorithms

Implementation of Linear Probing (continued)

Chapter 27 Hashing. Objectives

CSc 120. Introduc/on to Computer Programming II. 15: Hashing

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

CSCI 104 Hash Tables & Functions. Mark Redekopp David Kempe

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Hashing. October 19, CMPE 250 Hashing October 19, / 25

Dictionaries and Hash Tables

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

CS 10: Problem solving via Object Oriented Programming Winter 2017

Unit #5: Hash Functions and the Pigeonhole Principle

CS 350 : Data Structures Hash Tables

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

Hashing (Κατακερματισμός)

HASH TABLES.

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Transcription:

CSE 30 Notes 3: Hashing (Last updated 0/7/7 : PM) CLRS.-. (skip.3.3) 3.A. CONCEPTS Goal: Achieve faster operations than balanced trees (nearly Ο( ) epected time) by using randomness in key sets by sacrificing ) generality and ) ordered retrieval. Set of Potential Keys Mapping (h) Table Subscripts Regardless of the hash function, a dynamic set of keys will lead to collisions. Birthday parado 3 different birthdays available How many (random) persons are needed to have at least even odds of two persons with the same birthday? 3 Probability of k persons having k different birthdays is k i= 3 i 3 probability of unique birthdays among 0 people is probability of unique birthdays among people is probability of unique birthdays among people is 0.9978 probability of unique birthdays among 3 people is 0.9988 probability of unique birthdays among people is 0.557 probability of unique birthdays among people is 0.559 probability of unique birthdays among 3 people is 0.9377 probability of unique birthdays among people is 0.5 probability of unique birthdays among 57 people is 0.0000 probability of unique birthdays among 58 people is 0.0085

3.B. HASH FUNCTIONS Modular (AKA remaindering or division method) Multiplicative h(key) = key % m m is the table size Folklore: Make m prime, regardless of collision handling technique. Double hashing requires. hash = m * (0.703587*key - (int)(0.703587*key)); Universal Hashing - aside Use parameterized hash function to minimize chance of getting collisions beyond epectation. Parameters are randomly generated when hash structure is initialized. Tet Strings as Key scanf("%s",str); hash=0; for (i=0; str[i]!=0; i++) hash = (hash*0 + str[i]) % m; printf("%s => %d\n",str,hash); A string s signature may be stored in a data structure, even if hashing is not used. 3.C. COLLISION HANDLING BY CHAINING Concept Use table of pointers to unordered linked lists. Elements of a list have the same signature. Load Factor = α = # elements stored # in table Often stated as a per cent. For some methods, such as chaining, α can eceed 00%. Epected s is n m = α for hits and n = α for misses. m

3 3.D. COLLISION HANDLING BY OPEN ADDRESSING Saves space when records are small, so chaining would waste a large fraction of space for links. Collisions are handled by using a sequence for each key a permutation of the table s subscripts. Hash function is h(key, i) where i is the number of re attempts tried. Two special key values (or flags) are often used: never-used (-) and recycled (-). Searches stop on never-used, but continue on recycled. (For linear probing, but not double hashing - can also reinsert records past the emptied slot for a deletion.) Linear Probing - h(key, i) = (key + i) % m Properties:. Probe sequences eventually hit all.. Probe sequences wrap back to beginning of table. 3. Long clusters of contiguous occupied are costly for misses.. There are only m sequences. Two keys hashing to same initial slot have the same sequence. What about using h(key, i) = (key + *i) % 0 or h(key, i) = (key + 50*i) % 000?

Suppose all keys are equally likely to be accessed. Is there a best order for inserting keys? Insert keys: 0, 7, 0, 03, 0, 05, 0 0 3 5 0 3 5 Double Hashing h(key, i) = (h (key) + i*h (key)) % m Properties:. Probe sequences will hit all only if m is prime.. m(m ) sequences. Unlikely that two keys hashing to the same initial slot will have the same sequence. 3. Minimizes effect of clustering. Typical Hash Functions: h = key % m h = + key % (m )

5 0 3 5 7 8 9 0 Key h h 33 0 0 9 8 3 5 0 0 9 3 0 9 3 00 0 9 305 5 0 3.E. UPPER BOUNDS ON EXPECTED PERFORMANCE FOR OPEN ADDRESSING Double hashing comes very close to these results, but analysis assumes that hash function provides all m! permutations of subscripts.. Unsuccessful search when load factor is α = n. Each successive has the effect of decreasing m both the number of in the table and the number of occupied by one. Before first first second third fourth n n- n- n-3 n- m m- m- m-3 m-

a. Probability that a search has a first b. Probability that search goes on to a second α = n m c. Probability that search goes on to a third α n m < α n m < α d. Probability that search goes on to a fourth α n n m m < α n m < α3... Suppose the table is large. Sum the probabilities for s to get upper bound on epected number of s: α i = α i=0 (much worse than chaining). Inserting a key when load factor is α a. Eactly like unsuccessful search b. Upper bound of 3. Successful search α s a. Searching for a key takes as many s as inserting that particular key. b. Each inserted key increases the load factor, so the inserted key number i + is epected to take no more than i m = m m i s

c. Find epected s for n keys inserted into an empty table (each key is equally likely to be requested): 7 n n i=0 m m i = m n Sum is n i=0 m i m + m +...+ m n + = m m m n i=m n+ i n = m n m d Upper bound on sum for decreasing function. m n ( ln m ln ( m n )) = α ln m m n = α ln α alpha 0.00 unsuccessful (insert).50 successful. alpha 0.50 unsuccessful (insert).333 successful.5 alpha 0.300 unsuccessful (insert).9 successful.89 alpha 0.350 unsuccessful (insert).538 successful.3 alpha 0.00 unsuccessful (insert).7 successful.77 alpha 0.50 unsuccessful (insert).88 successful.39 alpha 0.500 unsuccessful (insert).000 successful.38 alpha 0.550 unsuccessful (insert). successful.5 alpha 0.00 unsuccessful (insert).500 successful.57 alpha 0.50 unsuccessful (insert).857 successful.5 alpha 0.700 unsuccessful (insert) 3.333 successful.70 alpha 0.750 unsuccessful (insert).000 successful.88 alpha 0.800 unsuccessful (insert) 5.000 successful.0 alpha 0.850 unsuccessful (insert).7 successful.3 alpha 0.900 unsuccessful (insert) 0.000 successful.558 alpha 0.90 unsuccessful (insert). successful. alpha 0.90 unsuccessful (insert).500 successful.75 alpha 0.930 unsuccessful (insert).8 successful.859 alpha 0.90 unsuccessful (insert). successful.993 alpha 0.950 unsuccessful (insert) 0.000 successful 3.53 alpha 0.90 unsuccessful (insert) 5.000 successful 3.353 alpha 0.970 unsuccessful (insert) 33.333 successful 3.5 alpha 0.980 unsuccessful (insert) 9.998 successful 3.99 alpha 0.990 unsuccessful (insert) 99.993 successful.5 Fast and Powerful Hashing Using Tabulation, CACM 0 (7), July 07, https://dl-acm-org.ezproy.uta.edu/citation.cfm?id=30877