Fundamental Algorithms


Fundamental Algorithms
Chapter 7: Hash Tables
Michael Bader, Winter 2011/12

Generalised Search Problem

Definition (Search Problem)
Input: a sequence or set A of n elements, and an element x.
Output: an index i ∈ {1, ..., n} with x = A[i], or NIL if x ∉ A.
- complexity depends on the data structure
- complexity of the operations to set up the data structure? (insert/delete)

Definition (Generalised Search Problem)
Store a set of objects consisting of a key and additional data:
    Object := ( key : Integer, record : Data );
- search/insert/delete objects in this set

Direct-Address Tables

Definition (table as data structure)
- similar to an array: access elements via an index
- usually contains elements only for some of the indices

Direct-Address Table:
- assume a limited number of values for the keys: U = {0, 1, ..., m − 1}
- allocate a table of size m
- use keys directly as index

Direct-Address Tables (2)

    DirAddrInsert(T : Table, x : Object) {
        T[x.key] := x;
    }
    DirAddrDelete(T : Table, x : Object) {
        T[x.key] := NIL;
    }
    DirAddrSearch(T : Table, key : Integer) {
        return T[key];
    }
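The three operations can be sketched in Python as follows. This is only an illustration: the class name DirectAddressTable and the use of None for NIL are assumptions, not part of the slides.

```python
class DirectAddressTable:
    """Direct-address table for integer keys from U = {0, ..., m-1}."""

    def __init__(self, m):
        self.slots = [None] * m          # None plays the role of NIL

    def insert(self, key, record):
        self.slots[key] = (key, record)  # the key itself is the index

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]


table = DirectAddressTable(8)
table.insert(3, "three")
table.insert(5, "five")
table.delete(5)
print(table.search(3))  # (3, 'three')
print(table.search(5))  # None
```

Each operation is a single array access, which is where the Θ(1) bound comes from.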

Direct-Address Tables (3)

Advantage:
- very fast: search/delete/insert is Θ(1)

Disadvantages:
- m has to be small; otherwise the table has to be very large
- if only few elements are stored, lots of table elements are unused (waste of memory)
- all keys need to be distinct (they should be, anyway)

Hash Tables

Idea: compute the index from the key.
Wanted: a function h that
- maps a given key to an index,
- has a relatively small range of values, and
- can be computed efficiently.

Definition (hash function, hash table)
Such a function h is called a hash function. The respective table is called a hash table.

Hash Tables: Insert, Delete, Search

    HashInsert(T : Table, x : Object) {
        T[h(x.key)] := x;
    }
    HashDelete(T : Table, x : Object) {
        T[h(x.key)] := NIL;
    }
    HashSearch(T : Table, x : Object) {
        return T[h(x.key)];
    }

So Far: Naive Hashing

Advantages:
- still very fast: search/delete/insert is Θ(1), if h is Θ(1)
- size of the table can be chosen freely, provided there is an appropriate hash function h

Disadvantages:
- values of h have to be distinct for all keys
- however: it is impossible to find a hash function that produces distinct values for any set of stored data (if there are more possible keys than table slots, at least two keys must share a hash value)

To do: deal with collisions. Objects with different keys that share a common hash value have to be stored in the same table element.
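A small Python sketch makes the collision problem concrete. The table size m = 7 and the hash function h(k) = k mod m are assumptions chosen only for this illustration.

```python
m = 7
table = [None] * m

def h(key):
    return key % m              # assumed hash function for this sketch

def hash_insert(key, record):
    table[h(key)] = (key, record)

def hash_search(key):
    return table[h(key)]

hash_insert(3, "a")
hash_insert(10, "b")            # 10 mod 7 = 3: same slot as key 3
print(hash_search(3))           # (10, 'b'): the entry for key 3 was overwritten
```

The naive scheme silently loses the first object, which is why a collision strategy is needed.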

Resolve Collisions by Chaining

Idea:
- use a table of containers
- containers can hold an arbitrarily large amount of data
- lists as containers: chaining

    ChainHashInsert(T : Table, x : Object) {
        insert x into T[h(x.key)];
    }
    ChainHashDelete(T : Table, x : Object) {
        delete x from T[h(x.key)];
    }

Resolve Collisions by Chaining

    ChainHashSearch(T : Table, x : Object) {
        return ListSearch(x, T[h(x.key)]);
        ! result: reference to x or NIL, if x not found
    }

Advantages:
- hash function no longer has to return distinct values
- still very fast, if the lists are short

Disadvantages:
- delete/search is Θ(k), if k elements are in the accessed list
- worst case: all elements stored in one single list (very unlikely)
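A minimal Python sketch of chaining, assuming integer keys, h(k) = k mod m, and Python lists as the containers (all of these are illustrative choices, not prescribed by the slides):

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one list (chain) per slot

    def _h(self, key):
        return key % self.m                   # assumed hash function

    def insert(self, key, record):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already stored: replace record
                chain[i] = (key, record)
                return
        chain.append((key, record))

    def search(self, key):
        for k, record in self.slots[self._h(key)]:
            if k == key:
                return record
        return None                           # NIL: key not found

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, r) for k, r in self.slots[idx] if k != key]


t = ChainedHashTable(7)
for key, rec in [(3, "a"), (10, "b"), (17, "c")]:   # all three hash to slot 3
    t.insert(key, rec)
t.delete(10)
print(t.search(17))  # c
print(t.search(10))  # None
```

All three keys end up in the same chain, so search and delete cost time proportional to the chain length, as stated above.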

Chaining: Average Search Complexity

Assumptions:
- hash table has m slots (table of m lists)
- contains n elements; load factor: α = n/m
- h(k) can be computed in O(1) for all k
- all values of h are equally likely to occur

Search complexity:
- on average, the list corresponding to the requested key will have α elements
- unsuccessful search: compare the requested key with all objects in the list, i.e. O(α) operations
- successful search: in the worst case, the requested key is last in the list; also O(α) operations

Expected average complexity: O(1 + α) operations
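The claim that a list holds α elements on average follows from linearity of expectation. As a short worked step, writing k_1, ..., k_n for the stored keys (a notation introduced here only for this derivation) and using the assumption that each hash value occurs with probability 1/m:

```latex
E[\text{length of list } j]
  = \sum_{i=1}^{n} P\bigl(h(k_i) = j\bigr)
  = \sum_{i=1}^{n} \frac{1}{m}
  = \frac{n}{m}
  = \alpha
```

Adding the O(1) cost of computing h(k) to the O(α) cost of scanning the list gives the O(1 + α) bound.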

Hash Functions

A good hash function should:
- satisfy the assumption of even distribution: each key is equally likely to be hashed to any of the slots:
      Σ_{k : h(k) = j} P(key = k) = 1/m   for all j = 0, ..., m − 1
- be easy to compute
- be "non-smooth": keys that are close together should not produce hash values that are close together (to avoid clustering)

Simplest choice: h(k) = k mod m (m a prime number)
- easy to compute; even distribution if keys are evenly distributed
- however: not "non-smooth"
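The lack of "non-smoothness" of the division method is easy to see in a small Python sketch. The table size m = 13 and the sample keys are arbitrary choices for illustration.

```python
def h_div(k, m):
    """Division method: h(k) = k mod m."""
    return k % m

m = 13
hashes = [h_div(k, m) for k in range(100, 106)]
print(hashes)  # [9, 10, 11, 12, 0, 1]
```

Consecutive keys land in consecutive slots (with wrap-around), so runs of neighbouring keys produce runs of neighbouring hash values.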

The Multiplication Method for Integer Keys

Two-step method:
1. multiply k by a constant 0 < γ < 1, and extract the fractional part of kγ
2. multiply by m, and use the integer part as hash value:
      h(k) := ⌊m (γk mod 1)⌋ = ⌊m (γk − ⌊γk⌋)⌋

Remarks:
- value of m uncritical; e.g. m = 2^p
- value of γ needs to be chosen well
- in practice: use fixed-point arithmetic
- non-integer keys: use an encoding to integers (ASCII, byte encoding, ...)
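One possible fixed-point realisation is sketched below in Python. Everything specific here is an assumption for illustration: a word size of 32 bits, the table size m = 2^p, and the constant γ = (√5 − 1)/2, a commonly cited choice that the slides do not prescribe. γ is stored as the integer S = ⌊γ · 2^32⌋.

```python
W = 32                 # assumed machine word size in bits
S = 2654435769         # floor(gamma * 2**W) for gamma = (sqrt(5) - 1) / 2

def h_mul(k, p):
    """Multiplication method for a table of size m = 2**p, integers only."""
    frac = (k * S) & ((1 << W) - 1)   # low W bits: fractional part of k*gamma, scaled by 2**W
    return frac >> (W - p)            # top p bits: floor(m * fractional part)

print(h_mul(123456, 14))  # 67
```

Masking with 2^W − 1 corresponds to "mod 1" in the formula, and the right shift corresponds to multiplying by m = 2^p and taking the integer part.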

Open Addressing

Definition:
- no containers: the table contains the objects themselves
- each slot of the hash table either contains an object or NIL
- to resolve collisions, more than one position is allowed for a specific key

Hash function: generates a sequence of hash table indices:
      h : U × {0, ..., m − 1} → {0, ..., m − 1}

General approach:
- store the object in the first empty slot specified by the probe sequence
- as long as the table is not full, an empty slot is guaranteed to be found if the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1) is a permutation of 0, 1, ..., m − 1

Open Addressing: Algorithms

    OpenHashInsert(T : Table, x : Object) : Integer {
        for i from 0 to m − 1 do {
            j := h(x.key, i);
            if T[j] = NIL then { T[j] := x; return j; }
        }
        cast error "hash table overflow"
    }

    OpenHashSearch(T : Table, k : Integer) : Object {
        i := 0;
        while i < m and T[h(k, i)] <> NIL {
            if k = T[h(k, i)].key then return T[h(k, i)];
            i := i + 1;
        }
        return NIL;
    }
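The two routines translate almost line by line into Python. In this sketch the probe function is passed in as a parameter; the class name, the parameter layout probe(key, i, m), and the simple example probe function are assumptions made for illustration.

```python
class OpenAddressTable:
    def __init__(self, m, probe):
        self.m = m
        self.probe = probe              # probe(key, i, m) -> slot index
        self.slots = [None] * m         # None plays the role of NIL

    def insert(self, key, record):
        for i in range(self.m):
            j = self.probe(key, i, self.m)
            if self.slots[j] is None:   # first empty slot in the probe sequence
                self.slots[j] = (key, record)
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            entry = self.slots[self.probe(key, i, self.m)]
            if entry is None:           # empty slot: key cannot be further along
                return None
            if entry[0] == key:
                return entry
        return None


def simple_probe(key, i, m):
    return (key % m + i) % m            # assumed example probe sequence

t = OpenAddressTable(7, simple_probe)
print(t.insert(3, "a"))    # 3
print(t.insert(10, "b"))   # 4
print(t.search(10))        # (10, 'b')
print(t.search(24))        # None
```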

Open Addressing: Linear Probing

Hash function: h(k, i) := (h0(k) + i) mod m
- first slot to be checked is T[h0(k)]
- second probe slot is T[h0(k) + 1], then T[h0(k) + 2], etc.
- wrap around to T[0] after T[m − 1] has been checked

Main problem: clustering
- continuous sequences of occupied slots ("clusters") cause lots of checks during searching and inserting
- clusters tend to grow, because all objects that are hashed to a slot inside the cluster will increase it
- slight (but minor) improvement: h(k, i) := (h0(k) + c·i) mod m
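The growth of a cluster can be observed by counting probes per insertion. The sketch below assumes h0(k) = k mod m with m = 11 and a hand-picked key sequence; both are illustrative choices.

```python
def linear_probe_insert(table, key):
    """Insert key with linear probing; return the number of slots checked."""
    m = len(table)
    for i in range(m):
        j = (key % m + i) % m
        if table[j] is None:
            table[j] = key
            return i + 1
    raise OverflowError("hash table overflow")

table = [None] * 11
probes = [linear_probe_insert(table, k) for k in [0, 11, 22, 1, 2]]
print(probes)     # [1, 2, 3, 3, 3]
print(table[:5])  # [0, 11, 22, 1, 2]
```

Keys 1 and 2 do not share a primary hash value with 0, 11, 22 or with each other, yet each needs three probes because it is hashed into the existing cluster and extends it.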

Open Addressing: Quadratic Probing

Hash function: h(k, i) := (h0(k) + c1·i + c2·i²) mod m
- how to choose the constants c1 and c2?
- objects with identical h0(k) still have the same sequence of hash values ("secondary clustering")

Idea: double hashing
      h(k, i) := (h0(k) + i·h1(k)) mod m
- if h0 is identical for two keys, h1 will usually differ and thus generate different probe sequences
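Why the choice of c1 and c2 matters can be seen by listing the slots a probe sequence actually visits. The sketch below compares two choices for m = 8; the constants are examples picked for illustration and are not taken from the slides.

```python
def quadratic_sequence(h0, m, offset):
    """Slots visited for i = 0, ..., m-1, where offset(i) = c1*i + c2*i*i."""
    return [(h0 + offset(i)) % m for i in range(m)]

m = 8
triangular = quadratic_sequence(0, m, lambda i: i * (i + 1) // 2)  # c1 = c2 = 1/2
squares = quadratic_sequence(0, m, lambda i: i * i)                # c1 = 0, c2 = 1

print(triangular)            # [0, 1, 3, 6, 2, 7, 5, 4]
print(sorted(set(squares)))  # [0, 1, 4]
```

With c1 = c2 = 1/2 the sequence is a permutation of all 8 slots, while with c1 = 0, c2 = 1 only 3 distinct slots are ever probed, so an insert can fail although the table is mostly empty.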

Open Addressing: Double Hashing

      h(k, i) := (h0(k) + i·h1(k)) mod m

How to choose h0 and h1:
- range of h0: U → {0, ..., m − 1} (cover entire table)
- h1(k) must never be 0 (no probe sequence generated)
- h1(k) should be relatively prime to m for all k ⇒ probe sequence will try all slots
- if d is the greatest common divisor of h1(k) and m, only 1/d of the hash slots will be probed

Possible choices:
- m = 2^M, and let h1 generate odd numbers only
- m a prime number, and h1 : U → {1, ..., m1} with m1 < m
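The effect of the greatest common divisor can be checked directly. The sketch below fixes m = 12 and h0(k) = 0 and tries two step sizes standing in for h1(k); these values are illustrative assumptions.

```python
from math import gcd

def probed_slots(h0, h1, m):
    """Set of distinct slots visited by (h0 + i*h1) mod m for i = 0, ..., m-1."""
    return {(h0 + i * h1) % m for i in range(m)}

m = 12
for step in (4, 5):
    slots = probed_slots(0, step, m)
    print(step, gcd(step, m), len(slots))
# prints "4 4 3" and then "5 1 12"
```

With step 4 the divisor is d = 4 and only 12/4 = 3 slots are reachable; with step 5 (relatively prime to 12) all 12 slots are probed.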

Collisions and Clustering

Scenarios for collisions:
- keys share the same primary hash value: h(k1, 0) = h(k2, 0)
    - same sequence of hash values for linear and quadratic probing
- keys share a value of the hash sequence: h(k1, i) = h(k2, j)
    - same sequence of hash values for linear probing
    - different hash values for the next try: h(k1, i + 1) ≠ h(k2, j + 1)

Example: multiple keys that share the same hash values
- linear probing will cause a primary cluster
- the cluster will also grow by all keys mapped to a hash value within this cluster

Open Addressing: Deletion

Remaining problem: how to delete?
- search the entry, remove it: does not work
- example: insert 3, 7, 8 having the same hash value, then delete 7: how to find 8?
- do not delete, just mark as deleted

Next problem:
- searching stops only when the first empty entry is found
- after many deletions: lots of unnecessary comparisons!
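Marking entries as deleted can be sketched as follows. The sketch assumes linear probing with h0(k) = k mod m, stores bare keys instead of objects, and assumes that a key is not inserted again while it is still in the table. The DELETED marker and the class name are hypothetical. The keys 3, 10, 17 play the role of the slide's three keys with equal hash value.

```python
DELETED = object()   # marker for "was occupied, now deleted"

class TombstoneTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def _index(self, key, i):
        return (key % self.m + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._index(key, i)
            if self.slots[j] is None or self.slots[j] is DELETED:
                self.slots[j] = key          # deleted slots may be reused
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self._index(key, i)
            if self.slots[j] is None:        # only a truly empty slot stops the search
                return None
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED          # mark, do not empty


t = TombstoneTable(7)
for k in (3, 10, 17):                        # all hash to slot 3
    t.insert(k)
t.delete(10)
print(t.search(17))  # 5
print(t.search(10))  # None
```

The search for 17 walks over the marked slot 4 instead of stopping there, which is exactly the extra comparison cost mentioned above.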

Open Addressing: Deletion (2)

Deletion:
- general problem for open addressing
- only "solution": new construction of the table after some deletions
- hash tables with open addressing therefore commonly don't support deletion

Inserting:
- inserting is efficient, but too many inserts ⇒ not enough space
- if the ratio α is too big: new construction of the table with larger size

Still: searching faster than O(log n) is possible
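Rebuilding the table when α gets too large might look like the following sketch. The threshold of 0.5, the doubling of the size, the linear probing, and the class name are all assumptions for illustration; keys are assumed to be distinct integers.

```python
class GrowingTable:
    MAX_LOAD = 0.5                     # assumed threshold for the load factor

    def __init__(self, m=4):
        self.slots = [None] * m
        self.n = 0                     # number of stored keys

    @staticmethod
    def _place(slots, key):
        m = len(slots)
        for i in range(m):
            j = (key % m + i) % m      # linear probing
            if slots[j] is None:
                slots[j] = key
                return
        raise OverflowError("hash table overflow")

    def insert(self, key):
        if (self.n + 1) / len(self.slots) > self.MAX_LOAD:
            bigger = [None] * (2 * len(self.slots))
            for k in self.slots:       # re-insert every key into the new table
                if k is not None:
                    self._place(bigger, k)
            self.slots = bigger
        self._place(self.slots, key)
        self.n += 1


t = GrowingTable(4)
for k in range(5):
    t.insert(k)
print(len(t.slots), t.n)  # 16 5
```

Every key has to be hashed again because its slot depends on the table size m. The same rebuild could also be used to drop entries that are only marked as deleted.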