CS 241 Analysis of Algorithms


CS 241 Analysis of Algorithms
Professor Eric Aaron
Lecture: T Th 9:00am
Lecture Meeting Location: OLB 205

Business
- HW5 extended, due November 19
- HW6 to be out Nov. 14, due November 26
- Make-up lecture: Wed, Nov. 13, 4:30pm, OLB 205 (tentative)
- Exam back today
- Reading: CLRS Ch. 12.1-12.3, and the unstarred parts of Ch. 11

Business pt. 2
A Note From Your (Vassar CS) Majors Committee:
- What: Java Review Session!
- When: Tonight! 8pm!
- Where: OLB 104!

Hashing; Hash Tables
- Consider a case where there are many possible keys (or elements) to be stored, but relatively few of them are used
  - U: universe of keys; K: keys being used; |K| << |U|
  - Then, look for an option with efficient operations but space on the order of |K|, not the order of |U|
- Important approach to dictionaries in this case: hashing
  - Elements are stored (and searched for) in a hash table, an array T that's typically of size related to |K|, not to |U|
  - Instead of using key value k as the index into T, compute h(k) using a hash function h, and use h(k) as the array index
- Goals:
  - Fast operations (O(1) time)
  - Space-efficient data structure (O(n) space to store n elements)
- Things to think about:
  - How do we define h?
  - What if there's a collision, i.e., h(x) == h(y) for some x != y?

Hash Functions and Collisions
- Hash table: array T[0..m-1], where m is prime and m << |U|
- Hash function h: maps any key value into [0..m-1], so every element can be stored / referenced in T
- Things to think about: How do we define h? And what if there's a collision, i.e., h(x) == h(y) for some x != y?
- Because hash functions compute search keys, it is essential that hash functions map equal keys to the same table index
- Good hash functions minimize the chance of collisions
- Managing collisions is essential for hashing
  - What kinds of problems do collisions cause for hashing as a method to implement a dictionary?
  - What approaches can you think of to handle collisions when they occur?
  - Remember, hashing needs to support the insert, delete, and search operations
- Two approaches:
  - Chaining: each cell in the table is a linked list of the elements mapped to that index
  - Open addressing: when there's a collision, move along through the table until an open index, or the element being searched for, is found

Hash Functions
- A good hash function is efficient to compute
- A really good hash function satisfies (or almost satisfies) the assumption of simple uniform hashing: each key is equally likely to hash to any slot in the hash table
  - In practice, it's generally not possible to achieve this, since it's not known in advance how likely each key is to be chosen
  - Heuristics or other intelligent choices, however, can yield good performance
- Hash functions essentially compute an integer summary of the object (e.g., the object being searched for)
  - Thus, hash functions return natural numbers, and they often presume that their input keys are natural numbers
  - If keys aren't numbers, they must somehow be mapped to numbers
  - For example: how could a character string be represented as an integer?

Hash Function Examples: Strings
Some possible hash functions for String objects. Treat each String as a sequence of Unicode characters s_0 s_1 ... s_(n-1), then:
- Sum of Unicode codes: hash(s) = s_0 + s_1 + ... + s_(n-1)
  - e.g., hash("now") = 110 + 111 + 119 = 340
  - Not a great choice: few distinct codes, uneven distribution
- Shifted sum of Unicode codes: hash(s) = s_0*b^(n-1) + s_1*b^(n-2) + ... + s_(n-2)*b + s_(n-1)
  - e.g., hash("now") = 110*b^2 + 111*b^1 + 119*b^0
  - Choices for b: 2^16, prime numbers
  - But wouldn't this result in very big hash values? How would we get them to smaller values for a space-efficient table?
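The two string hash codes above can be sketched in a few lines of Java. This is a minimal illustration, not the course's reference code; the class name, method names, and the base b = 31 are assumptions chosen only for the example.

    // Minimal sketch of the two string hash codes above (illustrative only).
    public class StringHashDemo {

        // Sum of Unicode codes: order-insensitive, values cluster badly.
        static long sumHash(String s) {
            long h = 0;
            for (int i = 0; i < s.length(); i++) {
                h += s.charAt(i);
            }
            return h;
        }

        // Shifted sum: hash(s) = s_0*b^(n-1) + ... + s_(n-1), accumulated Horner-style.
        // Values grow quickly; compressing them to a table index comes on the next slide.
        static long shiftedSumHash(String s, long b) {
            long h = 0;
            for (int i = 0; i < s.length(); i++) {
                h = h * b + s.charAt(i);
            }
            return h;
        }

        public static void main(String[] args) {
            System.out.println(sumHash("now"));            // 110 + 111 + 119 = 340
            System.out.println(shiftedSumHash("now", 31)); // 110*31^2 + 111*31 + 119
        }
    }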

Compression; and The Division Method
- For an object, its index into a hash table could be computed by a two-step hash function:
  - A hash-code function h_1 (not the same as hash function h) computes a value, such as the shifted Unicode sum
  - Then, to compress the range of hash codes into the range of table indices, compute the remainder mod m of that hash code
  - Thus, h(k) = h_1(k) mod m, for an m-sized hash table
- Notes:
  - Simple case: h_1(k) = k, so h(k) = k mod m. CLRS calls this the division method
  - Typically, m is chosen to be a prime number; this helps spread out hash values and avoid collisions

Digression: Horner's Method
- Recall the hash function: shifted sum of Unicode codes
  hash(s) = s_0*b^(n-1) + s_1*b^(n-2) + ... + s_(n-2)*b + s_(n-1)
- Horner's method can simplify calculation of such a code:
  a_0*b^(n-1) + a_1*b^(n-2) + ... + a_(n-2)*b + a_(n-1) = (...((a_0*b + a_1)*b + ...)*b + a_(n-2))*b + a_(n-1)
- The right-hand side is efficient to compute!

Horner's Method (continued)
- For hashing, we often want the hash value mod M. By properties of the % operator:
  (a*b) % n == ((a % n) * (b % n)) % n
  (a+b) % n == ((a % n) + (b % n)) % n
- Can use these facts in Horner's method:

  Horner's method:
    x = 0
    for i = 0 to n-1:
        x = x*b + a[i]

  Horner's method with % M:
    x = 0
    for i = 0 to n-1:
        x = (x*b + a[i]) % M
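Putting the pieces together, one way the Horner-plus-compression step could look in Java is sketched below. It is a hedged illustration, not the slides' code; the parameters b and m are assumptions supplied by the caller.

    // Sketch of h(k) = h_1(k) mod m, where h_1 is the shifted Unicode sum and
    // "% m" is applied inside Horner's loop so intermediate values stay small.
    static int compressedStringHash(String s, int b, int m) {
        long x = 0;
        for (int i = 0; i < s.length(); i++) {
            x = (x * b + s.charAt(i)) % m;   // (x*b + a[i]) mod m, stays in [0, m-1]
        }
        return (int) x;                      // a valid index into T[0..m-1]
    }

Applying % m on every iteration, rather than once at the end, gives the same result by the modular identities above while avoiding overflow of the running value.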

Collisions
Recall: two approaches to managing hash-value collisions
- Chaining: use a general hash function and put all elements that hash to the same location in a linked list at that location.
- Open addressing: use a general hash function as in chaining, and then increment the original position until an empty slot (or the element you are looking for) is found. One index position for each element in the table.

Chaining
- Idea: each cell T[k] in the hash table is a linked list (chain)
  - T[k] is the head node of a linked list containing all hashed objects x with h(x) = k
  - Default: unsorted, singly-linked list
- List lengths after storing n elements in a table of size m:
  - Load factor (average list length) is α = n / m
  - With a good hash function, each list is likely to have length close to α
- What are the running times of the Insert, Delete, Search operations?
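For concreteness, a minimal chained hash table might look like the sketch below. It assumes integer keys and the division method h(k) = k mod m; the class and method names are made up for this illustration and are not the course's reference implementation.

    import java.util.LinkedList;

    // Minimal sketch of hashing with chaining (illustrative only).
    public class ChainedHashTable {
        private final LinkedList<Integer>[] table;   // T[0..m-1], each cell a chain
        private final int m;

        @SuppressWarnings("unchecked")
        public ChainedHashTable(int m) {             // m: table size, ideally prime
            this.m = m;
            this.table = new LinkedList[m];
            for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
        }

        private int h(int key) { return Math.floorMod(key, m); }   // division method

        public void insert(int key)    { table[h(key)].addFirst(key); }          // O(1)
        public boolean search(int key) { return table[h(key)].contains(key); }   // expected O(1 + α)
        public void delete(int key)    { table[h(key)].remove((Integer) key); }  // expected O(1 + α)
    }

Insertion at the head of a chain is O(1); search and deletion must scan the chain, so under simple uniform hashing their expected cost is O(1 + α).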

Open Addressing
- Instead of chaining with linked lists, collisions could be resolved by storing every element directly in the table
  - Each slot in the table contains either NIL or an element (or a reference to that element)
  - Note: the table may fill up! But α will never be greater than 1
- Idea: compute the hash value as with chaining, but if a collision occurs, successively index into (i.e., probe) T until the operation can complete
  - When inserting an element, systematically find an open slot into which the element can be placed
  - When searching, systematically examine slots until either finding the element or determining it's not in the table
  - (How about deleting? Is deleting simple? Think about it as it relates to searching.)
- The probe sequence depends both on the key and on the probe increment

Open Addressing; Probe Sequence
- With table size m, the probe sequence (the sequence of table indices examined for a key value) must be a permutation of <0, 1, ..., m-1>
  - If it weren't, then some slots in the table might never be considered as the table becomes full
- Recall also that the probe sequence depends in part on the key being hashed (as well as other factors that determine the probe increment)
- For theoretical analysis, we assume uniform hashing: the probe sequence of each key is equally likely to be any of the m! permutations of <0, 1, ..., m-1>
  - (In practice, we try to approach that performance with approximations)
- Summary: hashing with open addressing
  - Compute hash index h(k,i) for the i-th index in the probe sequence, where k is the key for hashing
  - Thus, the probe sequence for key k is h(k,0), h(k,1), ..., h(k,m-1)
  - In the worst case, every slot in T is examined before finding an empty slot (or the element being searched for)

An Example: Linear Probing
- Three common techniques generate probe sequences that are guaranteed to be permutations of <0, 1, ..., m-1>:
  - Linear probing, quadratic probing, and double hashing
  - Each of these uses auxiliary hash functions as part of the full hash function, i.e., hash function h is defined in terms of some h'(k) or h_1(k), etc.
- Linear probing is the simplest of the three
  - Probe sequence: if a slot is full, go to the next; wrap around if needed
  - Function: h(k, i) = (h'(k) + i) mod m
    [k is the element's key; i is the probe number, which goes from 0 to m-1]
  - Trade-offs: simple to implement, but suffers from primary clustering: long runs of consecutive filled slots build up, making probe sequences longer
    - If an empty slot is preceded by i filled slots, a new key hashing to any of those i+1 positions will fill that slot next, so it is the next slot filled with probability (i+1)/m

Hash-Search with Open Addressing
- An example of a function on a hash table with open addressing: searching in the hash table T (see the sketch below)
  - Note that the index probed at the i-th probe is h(k,i)
  - Because each key has a unique probe sequence, the sequence followed when searching will be the same as the one followed when inserting the element
- How would this work with deleting from the table? What if deleting an element were simply replacing it by NIL in T?
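As a concrete illustration of the search just described, here is a hedged Java sketch of insertion and search with linear probing. It assumes integer keys and h'(k) = k mod m; the class and method names are invented for the example, and deletion is deliberately left out, per the question above.

    // Sketch of open addressing with linear probing (illustrative only).
    // null plays the role of NIL.
    public class LinearProbingTable {
        private final Integer[] slots;
        private final int m;

        public LinearProbingTable(int m) {
            this.m = m;
            this.slots = new Integer[m];
        }

        // h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m
        private int h(int key, int i) {
            return (Math.floorMod(key, m) + i) % m;
        }

        // Insert: probe h(k,0), h(k,1), ... until an empty slot is found.
        public void insert(int key) {
            for (int i = 0; i < m; i++) {
                int j = h(key, i);
                if (slots[j] == null) { slots[j] = key; return; }
            }
            throw new IllegalStateException("hash table overflow");
        }

        // Search follows the same probe sequence as insertion; reaching an
        // empty slot means the key cannot be in the table.
        public boolean search(int key) {
            for (int i = 0; i < m; i++) {
                int j = h(key, i);
                if (slots[j] == null) return false;
                if (slots[j] == key) return true;
            }
            return false;   // probed every slot without finding the key
        }
    }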