
Module 7: Hashing
Dr. Natarajan Meghanathan, Professor of Computer Science
Jackson State University, Jackson, MS 39217
E-mail: natarajan.meghanathan@jsums.edu

Hashing
A very efficient method for implementing a dictionary, i.e., a set with the operations find, insert, and delete. It is based on the representation-change and space-for-time tradeoff ideas. We consider the problem of implementing a dictionary of n records with keys K1, K2, ..., Kn.

Hashing is based on the idea of distributing keys among a one-dimensional array H[0..m-1] called a hash table. The distribution is done by computing, for each key, the value of some pre-defined function h called the hash function. The hash function assigns to a key an integer between 0 and m-1, called the hash address. The size m of the hash table is typically a prime integer.

Typical hash functions:
- For non-negative integer keys, a hash function could be h(K) = K mod m.
- If the keys are letters of some alphabet, the position of the letter in the alphabet (for example, A is at position 1 in the alphabet A-Z) could be used as the key for the hash function defined above.
- If the key is a character string c0 c1 ... c(s-1) of characters from such an alphabet, the hash function could sum the positions of the characters and reduce the sum mod m: h = (ord(c0) + ord(c1) + ... + ord(c(s-1))) mod m, where ord(ci) denotes the position of ci in the alphabet.
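The typical hash functions above can be sketched in Python. This is an illustrative sketch, not part of the original module; the function names and the example table size M are assumptions.

```python
M = 13  # example table size (typically a prime), an assumed value

def hash_int(k, m=M):
    """Hash a non-negative integer key: h(K) = K mod m."""
    return k % m

def hash_string(s, m=M):
    """Hash a character string c0 c1 ... c(s-1) by summing the
    positions of its letters in the alphabet A-Z (A = 1, ..., Z = 26)
    and taking the sum mod m."""
    total = sum(ord(c) - ord('A') + 1 for c in s.upper())
    return total % m
```

For instance, hash_string("AB") computes (1 + 2) mod 13 = 3, and hash_string("HASH") computes (8 + 1 + 19 + 8) mod 13 = 10.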

Collisions and Collision Resolution
If h(K1) = h(K2) for two distinct keys K1 and K2, there is a collision. Good hash functions result in fewer collisions, but some collisions should be expected. In this module, we will look at open hashing, which works for arrays of any size, irrespective of the hash function.

Open Hashing
[Example hash table figure not preserved in this transcription.] For the example hash table, the number of comparisons for an unsuccessful search is 3.

Hashing: Space-Time Tradeoff
Hashing is another example of a space-for-time tradeoff. A hash table takes space proportional to the size of the array, but the average time complexities for a successful search and an unsuccessful search are much lower than those of an array. In the previous example, with an array, the average number of comparisons for a successful search is (n+1)/2 = (6+1)/2 = 3.5, and the number of comparisons for an unsuccessful search is 6.

Open Hashing
Inserting into and deleting from the hash table have the same complexity as searching. If the hash function distributes the keys uniformly, the average length of a linked list will be α = n/m; this ratio is called the load factor. The average-case number of key comparisons is α/2 for a successful search and α for an unsuccessful search. The worst-case number of key comparisons is Θ(n), which occurs if all n elements hash to the same index, producing a single linked list of n elements. To avoid this, we need to be careful in selecting a proper hash function: mod-based hash functions with a prime integer as the divisor are more likely to produce hash values that are evenly distributed across the keys. Open hashing still works if the number of keys n exceeds the size m of the hash table.
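Open hashing with chaining, as described above, can be sketched as follows. The class and method names are illustrative, not from the module; Python lists stand in for the linked lists.

```python
class ChainedHashTable:
    """Open hashing: each of the m slots holds a chain (here, a Python
    list) of the keys that hash to that slot. Load factor = n / m."""

    def __init__(self, m=5):
        self.m = m                        # table size (ideally prime)
        self.slots = [[] for _ in range(m)]
        self.n = 0                        # number of stored keys

    def _hash(self, key):
        return key % self.m               # h(K) = K mod m

    def insert(self, key):
        chain = self.slots[self._hash(key)]
        if key not in chain:              # keys are assumed distinct
            chain.append(key)
            self.n += 1

    def search(self, key):
        """True iff key is present; the cost is the number of key
        comparisons along a single chain."""
        return key in self.slots[self._hash(key)]

    @property
    def load_factor(self):
        return self.n / self.m
```

Note that the table keeps working when n > m: inserting the six keys 11, 1, 13, 21, 3, 7 with m = 5 simply yields a load factor of 6/5 = 1.2.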

Applications of Hashing (1): Finding Whether an Array is a Subset of Another Array
Given two arrays A_L (larger array) and A_S (smaller array) of distinct elements, we want to find whether A_S is a subset of A_L. Example: A_L = {11, 1, 13, 21, 3, 7}; A_S = {11, 3, 7, 1}; A_S is a subset of A_L.
Solution: Use (open) hashing. Hash the elements of the larger array; then, for each element of the smaller array, search for it in the hash table. If even one element of the smaller array is not in the larger array, we can stop: A_S is not a subset of A_L.
Time complexity: Θ(n) to construct the hash table on the larger array of size n, and another Θ(n) to search for the elements of the smaller array. A brute-force approach would have taken Θ(n^2) time. Space complexity: Θ(n) with the hash-table approach and Θ(1) with the brute-force approach.
Note: The same solution can also be used to find whether two arrays are disjoint or not; in that case, if even one element of the smaller array is found in the larger array, we can stop (they are not disjoint).

Applications of Hashing (1): Finding Whether an Array is a Subset of Another Array
Example 1: A_L = {11, 1, 13, 21, 3, 7}; A_S = {11, 3, 7, 1}; A_S is a subset of A_L. Let H(K) = K mod 5. The hash table for A_L:
0:
1: 11 -> 1 -> 21
2: 7
3: 13 -> 3
4:
Hash-table approach: # comparisons = 1 (for 11) + 2 (for 3) + 1 (for 7) + 2 (for 1) = 6.
Brute-force approach: pick every element of the smaller array and do a linear search for it in the larger array. # comparisons = 1 (for 11) + 5 (for 3) + 6 (for 7) + 2 (for 1) = 14.
Example 2: A_L = {11, 1, 13, 21, 3, 7}; A_S = {11, 3, 7, 4}; A_S is NOT a subset of A_L. Let H(K) = K mod 5. The hash-table approach would take just 1 (for 11) + 2 (for 3) + 1 (for 7) + 0 (for 4) = 4 comparisons. The brute-force approach would take 1 (for 11) + 5 (for 3) + 6 (for 7) + 6 (for 4) = 18 comparisons.

Applications of Hashing (1): Finding Whether Two Arrays Are Disjoint or Not
Example 1: A_L = {11, 1, 13, 21, 3, 7}; A_S = {22, 25, 27, 28}; they are disjoint. Let H(K) = K mod 5. The hash table for A_L:
0:
1: 11 -> 1 -> 21
2: 7
3: 13 -> 3
4:
Hash-table approach: # comparisons = 1 (for 22) + 0 (for 25) + 1 (for 27) + 2 (for 28) = 4.
Brute-force approach: pick every element of the smaller array and do a linear search for it in the larger array. # comparisons = 6 comparisons for each element × 4 elements = 24.
Example 2: A_L = {11, 1, 13, 21, 3, 7}; A_S = {22, 25, 27, 1}; they are NOT disjoint. Let H(K) = K mod 5. The hash-table approach would take just 1 (for 22) + 0 (for 25) + 1 (for 27) + 2 (for 1) = 4 comparisons. The brute-force approach would take 6 (for 22) + 6 (for 25) + 6 (for 27) + 2 (for 1) = 20 comparisons.
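Both checks above follow the same pattern: hash the larger array, then probe the table with each element of the smaller array, stopping early once the answer is decided. A sketch in Python (a built-in set stands in for the hash table; the function names are illustrative, not from the module):

```python
def is_subset(a_l, a_s):
    """Return True iff every element of a_s occurs in a_l.
    Building the hash table is Theta(n); each probe is Theta(1)
    on average, so the whole check is Theta(n) expected time."""
    table = set(a_l)                        # hash the larger array
    return all(x in table for x in a_s)     # stops at the first miss

def are_disjoint(a_l, a_s):
    """Return True iff a_l and a_s share no element;
    stops as soon as one common element is found."""
    table = set(a_l)
    return not any(x in table for x in a_s)
```

On the module's examples: is_subset([11, 1, 13, 21, 3, 7], [11, 3, 7, 1]) is True, while are_disjoint([11, 1, 13, 21, 3, 7], [22, 25, 27, 1]) is False because of the shared element 1.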

Applications of Hashing (2): Finding Consecutive Subsequences in an Array
Given an array A of unique integers, we want to find the contiguous subsequences (runs of consecutive integer values) of length 2 or above, as well as the length of the largest such subsequence. Assume it takes Θ(1) time to insert or search for an element in the hash table.
Example: A = [36, 41, 56, 35, 44, 33, 34, 92, 43, 32, 42], with H(K) = K mod 7. The hash table:
0: 56 -> 35 -> 42
1: 36 -> 92 -> 43
2: 44
3:
4: 32
5: 33
6: 41 -> 34
The consecutive subsequences found are 41 42 43 44 and 32 33 34 35 36; the largest length is 5.

Applications of Hashing (2): Finding Consecutive Subsequences in an Array — Algorithm

Insert the elements of A in a hash table H
Largest_Length = 0
for i = 0 to n-1 do
    if (A[i] - 1 is not in H) then  // A[i] is the first element of a possible cont. subsequence
        j = A[i] + 1
        while (j is in H) do
            j = j + 1
        end while
        if (j - A[i] > 1) then      // we have found a cont. subsequence of length > 1
            Print all integers from A[i] to (j - 1)
            if (Largest_Length < j - A[i]) then
                Largest_Length = j - A[i]
            end if
        end if
    end if
end for
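The algorithm above translates directly to Python. This is an illustrative sketch (a built-in set stands in for the hash table, and the function returns the runs instead of printing them; the name is an assumption):

```python
def consecutive_subsequences(a):
    """Return (runs, largest) where runs lists each maximal run of
    consecutive integers of length >= 2 found among the distinct
    elements of a, and largest is the length of the longest run."""
    table = set(a)                  # Theta(n) expected time to build
    runs = []
    largest = 0
    for x in a:
        if x - 1 not in table:      # x starts a possible run
            j = x + 1
            while j in table:       # extend the run upward
                j += 1
            if j - x > 1:           # run of length >= 2
                runs.append(list(range(x, j)))
                largest = max(largest, j - x)
    return runs, largest
```

On the module's example array [36, 41, 56, 35, 44, 33, 34, 92, 43, 32, 42], this finds the runs 41 42 43 44 and 32 33 34 35 36 and reports a largest length of 5.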

Applications of Hashing (2): Finding Consecutive Subsequences in an Array — Time-Complexity Analysis
For each element at index i in the array A, we do at least one search (for the element A[i] - 1) in the hash table. For every element that is the first element of a subsequence of length 1 or above (say, of length L), we do L additional searches in the hash table; the sum of all such Ls is n. Hence, for an array of size n, we do n + n = 2n = Θ(n) hash searches: the first n corresponds to the sum of the lengths of all the contiguous subsequences, and the second n is the sum of all the 1s (one search for A[i] - 1 for each element of the array).