Today s Outline. CS 561, Lecture 8. Direct Addressing Problem. Hash Tables. Hash Tables Trees. Jared Saia University of New Mexico

Similar documents
CS 361, Lecture 21. Outline. Things you can do. Things I will do. Evaluation Results

CS 561, Lecture 2 : Randomization in Data Structures. Jared Saia University of New Mexico

CS 561, Lecture 2 : Hash Tables, Skip Lists, Bloom Filters, Count-Min sketch. Jared Saia University of New Mexico

CSCI Analysis of Algorithms I

Data Structures and Algorithms. Chapter 7. Hashing

DATA STRUCTURES AND ALGORITHMS

Data Structures and Algorithms. Roberto Sebastiani

9/24/ Hash functions

Hashing Techniques. Material based on slides by George Bebis

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

CS 241 Analysis of Algorithms

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

Worst-case running time for RANDOMIZED-SELECT

Algorithms and Data Structures

Dictionaries (Maps) Hash tables. ADT Dictionary or Map. n INSERT: inserts a new element, associated to unique value of a field (key)

Practical Session 8- Hash Tables

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

CSE 214 Computer Science II Searching

Hashing 1. Searching Lists

III Data Structures. Dynamic sets

Fundamental Algorithms

We assume uniform hashing (UH):

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

DATA STRUCTURES AND ALGORITHMS

Chapter 9: Maps, Dictionaries, Hashing

Algorithms and Data Structures

HASH TABLES. Goal is to store elements k,v at index i = h k

Disk Accesses. CS 361, Lecture 25. B-Tree Properties. Outline

lecture23: Hash Tables

Binary Search Trees, etc.

Binary Search Trees. Motivation. Binary search tree. Tirgul 7

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

Fast Lookup: Hash tables

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Hash Tables. Reading: Cormen et al, Sections 11.1 and 11.2

CS 270 Algorithms. Oliver Kullmann. Generalising arrays. Direct addressing. Hashing in general. Hashing through chaining. Reading from CLRS for week 7

Hashing. Hashing Procedures

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Week 9. Hash tables. 1 Generalising arrays. 2 Direct addressing. 3 Hashing in general. 4 Hashing through chaining. 5 Hash functions.

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Binary Trees. Recursive definition. Is this a binary tree?

COMP171. Hashing.

SFU CMPT Lecture: Week 8

Hash Tables and Hash Functions

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

Hash Table and Hashing

IS 2610: Data Structures

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

Announcements. Biostatistics 615/815 Lecture 8: Hash Tables, and Dynamic Programming. Recap: Example of a linked list

Data Structures and Algorithm Analysis (CSC317) Hash tables (part2)

BBM371& Data*Management. Lecture 6: Hash Tables

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

Searching: Introduction

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

CSC263 Week 5. Larry Zhang.

Binary search trees. Support many dynamic-set operations, e.g. Search. Minimum. Maximum. Insert. Delete ...

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

CS 350 Algorithms and Complexity

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

Lecture 7: Efficient Collections via Hashing

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Design and Analysis of Algorithms Lecture- 9: Binary Search Trees

x-fast and y-fast Tries

CS 303 Design and Analysis of Algorithms

Binary Tree. Preview. Binary Tree. Binary Tree. Binary Search Tree 10/2/2017. Binary Tree

x-fast and y-fast Tries

CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS

Lec 17 April 8. Topics: binary Trees expression trees. (Chapter 5 of text)

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g)

Understand how to deal with collisions

HASH TABLES.

Lecture 5: Scheduling and Binary Search Trees

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Friday Four Square! 4:15PM, Outside Gates

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

CS : Data Structures

Binary search trees. Binary search trees are data structures based on binary trees that support operations on dynamic sets.

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

Lecture 6: Analysis of Algorithms (CS )

CS 350 : Data Structures Hash Tables

Hashing Algorithms. Hash functions Separate Chaining Linear Probing Double Hashing

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

Hashing. 7- Hashing. Hashing. Transform Keys into Integers in [[0, M 1]] The steps in hashing:

Lecture 17. Improving open-addressing hashing. Brent s method. Ordered hashing CSE 100, UCSD: LEC 17. Page 1 of 19

CSE2331/5331. Topic 6: Binary Search Tree. Data structure Operations CSE 2331/5331

Announcements. Problem Set 2 is out today! Due Tuesday (Oct 13) More challenging so start early!

Solutions to Exam Data structures (X and NV)

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

CPSC 331 Term Test #2 March 26, 2007

Transcription:

Today s Outline CS 561, Lecture 8 Jared Saia University of New Mexico Hash Tables Trees 1 Direct Addressing Problem Hash Tables If universe U is large, storing the array T may be impractical Also much space can be wasted in T if number of objects stored is small Q: Can we do better? A: Yes we can trade time for space Key Idea: An element with key k is stored in slot h(k), where h is a hash function mapping U into the set {0,..., m 1 Main problem: Two keys can now hash to the same slot Q: How do we resolve this problem? A1: Try to prevent it by hashing keys to random slots and making the table large enough A2: Chaining A3: Open Addressing 2 3

Chained Hash Analysis In chaining, all elements that hash to the same slot are put in a linked list. CH-Insert(T,x){Insert x at the head of list T[h(key(x))]; CH-Search(T,k){search for elem with key k in list T[h(k)]; CH-Delete(T,x){delete x from the list T[h(key(x))]; CH-Insert and CH-Delete take O(1) time if the list is doubly linked and there are no duplicate keys Q: How long does CH-Search take? A: It depends. In particular, depends on the load factor, α = n/m (i.e. average number of elems in a list) 4 5 CH-Search Analysis CH-Search Analysis Worst case analysis: everyone hashes to one slot so Θ(n) For average case, make the simple uniform hashing assumption: any given elem is equally likely to hash into any of the m slots, indep. of the other elems Let n i be a random variable giving the length of the list at the i-th slot Then time to do a search for key k is 1 + n h(k) Q: What is E(n h(k) )? A: We know that h(k) is uniformly distributed among {0,.., m 1 Thus, E(n h(k) ) = m 1 i=0 (1/m)n i = n/m = α 6 7

Hash Functions Division Method Want each key to be equally likely to hash to any of the m slots, independently of the other keys Key idea is to use the hash function to break up any patterns that might exist in the data We will always assume a key is a natural number (can e.g. easily convert strings to naturaly numbers) h(k) = k mod m Want m to be a prime number, which is not too close to a power of 2 Why? 8 9 Multiplication Method Open Addressing h(k) = m (ka mod 1) ka mod 1 means the fractional part of ka Advantage: value of m is not critical, need not be a prime A = ( 5 1)/2 works well in practice All elements are stored in the hash table, there are no separate linked lists When we do a search, we probe the hash table until we find an empty slot Sequence of probes depends on the key Thus hash function maps from a key to a probe sequence (i.e. a permutation of the numbers 0,.., m 1) 10 11

Open Addressing Open Addressing In general, for open addressing, the hash function depends on both the key to be inserted and the probe number Thus for a key k, we get the probe sequence h(k, 0), h(k, 1),..., h(k, m 1) If we use open addressing, the hash table can never fill up i.e. the load factor α can never exceed 1 An advantage of open addressing is that it avoids pointers and the overhead of storing lists in each slot of the table This freed up memory can be used to create more slots in the table which can reduce the load-factor and potentially speed up retrieval time A disadvantage is that deletion is difficult. If deletions occur in the hash table, chaining is usually used 12 13 OA-Insert OA-Search OA-Insert(T,k){ i = 0; repeat { j = h(k,i); if (T[j] = nil){ T[j] = k; return j; else i++; until (i==m); OA-Insert(T,k){ i = 0; repeat { j = h(k,i); if (T[j] = k){ return j; else i++; until (T[j]==nil or i==m); 14 15

OA-Delete OA-Delete Deletion from an open-address hash table is difficult When we delete a key from slot i, we can t just mark that slot as empty by storing nil there The problem is that this would make it impossible to find some key k during whose insertion we probed slot i and found it occupied One solution is to mark the slot by storing in it the value DELETED Then we modify OA-Insert to treat such a slot as if it were empty so that something can be stored in it OA-Search passes over these special slots while searching Note that if we use this trick, search times are no longer dependent on the load-factor α (for this reason, chaining is more commonly used when keys must be deleted) 16 17 Implementation Implementations To analyze open-address hashing, we make the assumption of uniform hashing: we assume that each key is equally likely to have any of the m! permutations of {0, 1,..., m 1 as its probe sequence True uniform hashing is difficult to implement, so in practice, we generally use one of three approximations on the next slide All positions are taken modulo m, and i ranges from 1 to m 1 Linear Probing: Initial probe is to position h(k), successive probes are to positions h(k) + i, Quadratic Probing: Initial probes is to position h(k), successive probes are to position h(k) + c 1 i + c 2 i 2 Double Hashing: Initial probe is to position h(k), successive probes are to positions h(k) + ih 2 (k) 18 19

Analysis Hash Tables Wrapup Recall that the load factor, α, is the number of elements stored in the hash table, n, divided by the total number of slots m In open-address hashing, we have at most one element per slot so α < 1 We assume uniform hashing i.e. each probe maps to essentially a random slot in the table. We can show that the expected time for insertions is at most 1/(1 α), the expected time for an unsuccessful search is 1/(1 α) and the expected time for a successful search is (1/α) ln[1/(1 α)] Hash Tables implement the Dictionary ADT, namely: Insert(x) - O(1) expected time, Θ(n) worst case Lookup(x) - O(1) expected time, Θ(n) worst case Delete(x) - O(1) expected time, Θ(n) worst case 20 21 Binary Search Trees Red-Black Trees Binary Search Trees are another data structure for implementing the dictionary ADT Red-Black trees (a kind of binary tree) also implement the Dictionary ADT, namely: Insert(x) - O(log n) time Lookup(x) - O(log n) time Delete(x) - O(log n) time 22 23

Why BST? Search Tree Operations Q: When would you use a Search Tree? A1: When need a hard guarantee on the worst case run times (e.g. mission critical code) A2: When want something more dynamic than a hash table (e.g. don t want to have to enlarge a hash table when the load factor gets too large) A3: Search trees can implement some other important operations... Insert Lookup Delete Minimum/Maximum Predecessor/Successor 24 25 What is a BST? Binary Search Tree Property It s a binary tree Each node holds a key and record field, and a pointer to left and right children Binary Search Tree Property is maintained Let x be a node in a binary search tree. If y is a node in the left subtree of x, then key(y) key(x). If y is a node in the right subtree of x then key(x) key(y) 26 27

Example BST Inorder Walk BSTs are arranged in such a way that we can print out the elements in sorted order in Θ(n) time Inorder Tree-Walk does this 28 29 Inorder Tree-Walk Example Tree-Walk Inorder-TW(x){ if (x is not nil){ Inorder-TW(left(x)); print key(x); Inorder-TW(right(x)); 30 31

Analysis Search in BT Correctness? Run time? Tree-Search(x,k){ if (x=nil) or (k = key(x)){ return x; if (k<key(x)){ return Tree-Search(left(x),k); else{ return Tree-Search(right(x),k); 32 33 Analysis In-Class Exercise Let h be the height of the tree The run time is O(h) Correctness??? Q1: What is the loop invariant for Tree-Search? Q2: What is Initialization? Q3: Maintenance? Q4: Termination? 34 35