Topic HashTable and Table ADT


Hashing, Hash Function & Hashtable

Search, Insertion & Deletion of elements based on Keys So far: by comparing keys, in both linear and non-linear data structures. Time complexity?

Search, Insertion & Deletion of elements based on Keys A different approach: by calculating the location from keys! Time complexity?

Search, Insertion & Deletion of elements by calculating the location from keys [Diagram: a search key is passed to a location calculator, which selects one of the array positions 0, 1, 2, 3, ..., N-1.]

Hashing The technique used for ordering and accessing elements in an array of some fixed size N (the hashtable), in a relatively constant amount of time, by manipulating the key of an element to identify its location in the array. Each key is mapped to an array position (0..N-1) by a hash function.

Hashtable The array of elements based on hashing: an unordered, sparse table with positions 0, 1, 2, 3, ..., N-1, where N is the hashtable size (the fixed size of the array).

Keys of Elements K = the set of keys of elements. The size of K, written |K|, is relatively large or even unbounded. If K = 9-digit numbers, then |K| = 1,000,000,000. If K = arbitrary character strings of arbitrary length, then |K| is unbounded.

Keys Depending on the application, the keys might be integers, letters, strings, and so on.

Elements E = the number of elements to be stored. E is significantly less than |K|.

Size of Hashtable N = the size of the hashtable. N is at least as great as the maximum number of elements to be stored, i.e. N >= E.

Hash Function A function used to manipulate the key of an element to identify its location in the array (hashtable).

Hash Function Hashing a key to an array index: h takes a key value in the set of possible keys K and returns an integer value between 0 and N-1. h: K → {0, 1, ..., N-1}

Example Employees using their SSNs as a key: |K| = the number of possible keys = 10^9 = 1,000,000,000. Let E = the number of employees to be stored = 10,000, so E << |K|. Let N = 10,000. Hash function h: SSN → {0, 1, ..., 9999}. Let h(key) = key % 10000.

Example Employees using their five-digit ID numbers as a key: |K| = the number of possible keys = 10^5 = 100,000. Let E = the number of employees to be stored = 100, so E << |K|. Let N = 100. Hash function h: ID → {0, 1, ..., 99}. Let h(key) = key % 100.
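The two examples above both use the division method. A minimal C++ sketch of that hash function might look like the following; the name divisionHash and the choice of integer types are assumptions made for illustration, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Division-method hash: maps a non-negative integer key to an index
// in the range [0, tableSize). The function name is illustrative only.
std::size_t divisionHash(std::uint64_t key, std::size_t tableSize) {
    assert(tableSize > 0);  // a hashtable needs at least one location
    return static_cast<std::size_t>(key % tableSize);
}
```

With N = 10,000 an SSN such as 999991234 maps to location 1234, and with N = 100 the ID 12345 maps to location 45.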

Hash Function Methods Selecting digits; folding method (add digits); middle-square method; multiplication method; division method (modulo arithmetic).
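The slides only name these methods. As one possible reading of "folding by adding digits", the sketch below sums the decimal digits of the key and then reduces the sum with the division method. The function name foldDigitsHash and the decision to finish with a modulo step are assumptions, not something the slides specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Folding (one interpretation): add up the decimal digits of the key,
// then reduce the sum modulo the table size so it is a valid index.
std::size_t foldDigitsHash(std::uint64_t key, std::size_t tableSize) {
    assert(tableSize > 0);
    std::uint64_t digitSum = 0;
    while (key > 0) {
        digitSum += key % 10;  // take the lowest decimal digit
        key /= 10;             // and drop it
    }
    return static_cast<std::size_t>(digitSum % tableSize);
}
```

For the key 91234 the digit sum is 9 + 1 + 2 + 3 + 4 = 19, so with N = 101 the key maps to location 19. Digit sums are small, so this variant only reaches a narrow range of locations in a large table.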

Access using Hash Function A hash function has two uses: as a method of determining where to store the element, and as a method of accessing the element.

Access using Hash Function [Diagram: a key is passed through h to select one of the hashtable positions 0, 1, 2, 3, ..., N-1.]

Perfect Hash Function Transforms different keys to different numbers.

Collision

Collisions The condition resulting when two or more keys produce the same hash location. When two or more items should be kept in the same location, esp. in hash tables, that is, when two or more different keys hash to the same value.

Why Collisions? In general |K| >> N, so the mapping defined by the hash function H: K → {0, 1, ..., N-1} is a many-to-one mapping! There will exist many pairs of two distinct keys K1 and K2 s.t. H(K1) = H(K2).

Example Employees using their SSNs as a key: |K| = 10^9 = 1,000,000,000. Let N = 10,000. Since |K| >> N, the hash function h: SSN → {0, 1, ..., 9999} is a many-to-one mapping! There will exist many pairs of two distinct keys SSN1 and SSN2 s.t. h(SSN1) = h(SSN2). With h(key) = key % 10000, for example, h(999991234) = 1234 and h(111111234) = 1234.

Example Employees using their five-digit ID numbers as a key: |K| = 10^5 = 100,000. Let N = 100. Since |K| >> N, the hash function h: ID → {0, 1, ..., 99} is a many-to-one mapping! There will exist many pairs of two distinct keys ID1 and ID2 s.t. h(ID1) = h(ID2). With h(key) = key % 100, for example, h(91234) = 34 and h(11234) = 34.

Collisions? How often they occur depends on the hash function and on the hash table size.

Collision Resolution Schemes

How to Resolve Collisions? Two approaches to collision resolution: through open addressing (closed hashing), or through restructuring the hash table (chained addressing, also called open hashing).

Example N = 101 & h(key) = key mod 101. 7597 mod 101 = 22, so 7597 is stored at location 22 of the hashtable (locations 0 to 100).

Example N = 101 & h(key) = key mod 101. 4567 mod 101 = 22 as well, but location 22 already holds 7597. Where should 4567 go?

Collision Resolution through Open Addressing A method of finding an open location for insertion into a hashtable after a collision has occurred. How to find an open location? By probing: linear probing, quadratic probing, double hashing, or random probing.

1. Linear Probing for Open Addressing An open addressing technique in which we continue from the hash location on, looking for the next available position sequentially. The size of step = 1. The probe sequence: HT[ h(SearchKey) ], HT[ h(SearchKey) + 1 ], HT[ h(SearchKey) + 2 ], HT[ h(SearchKey) + 3 ], ... with each index taken mod N, so the probe wraps around from location N-1 to location 0.

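Example: Insertion h(key) = key mod 101. 7597, 4567, 0628 and 3658 all hash to 22. 7597 is stored at location 22; 4567, 0628 and 3658 probe forward and are stored at locations 23, 24 and 25.

Example: Insertion h(key) = key mod 101. 1110 and 1211 both hash to 100. 1110 is stored at location 100; 1211 probes past the end of the table, wraps around, and is stored at location 0.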
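The two insertion examples can be reproduced with a short sketch of linear-probing insertion. It assumes non-negative integer keys and uses -1 as an "unused" marker; the names kEmptySlot and linearInsert are illustrative and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmptySlot = -1;  // marker for an unused location (keys must be >= 0)

// Inserts key with linear probing, h(key) = key mod N.
// Returns the location used, or -1 if every location is occupied.
int linearInsert(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = (home + i) % n;  // step of 1, wrapping past N-1
        if (table[idx] == kEmptySlot) {
            table[idx] = key;
            return static_cast<int>(idx);
        }
    }
    return -1;  // table is full
}
```

Starting from `std::vector<int> table(101, kEmptySlot)` and inserting 7597, 4567, 628, 3658, 1110, 1211 in that order fills locations 22, 23, 24, 25, 100 and 0. The key 0628 is written as 628 because a leading zero would make it an octal literal in C++.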

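Example: Deletion 4567 h(key) = key mod 101. 4567 hashes to 22 and is found by probing at location 23. Table contents: 0: 1211, 22: 7597, 23: 4567, 24: 0628, 25: 3658, 100: 1110.

Example: Search 3658 h(key) = key mod 101. 3658 hashes to 22; the search probes locations 22, 23 and 24 before finding 3658 at location 25.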

Status of Each Location Each location has one of three states: valid, empty, or deleted.

Example: Deletion 4567 & Search 3658 h(key) = key mod 101. Location 22 holds 7597 (valid). Location 23, which held 4567, is marked deleted rather than empty. Unused locations are empty. The search for 3658 starts at location 22, continues past the deleted location 23, and finds 3658 at location 25.

Example: Insert 4567 Again h(key) = key mod 101. 4567 hashes to 22, which is valid (7597). The next location, 23, is deleted, so 4567 is stored there and location 23 becomes valid again.
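The three states can be sketched as a small linear-probing table in which a search stops only at an empty location and skips deleted ones. The class and member names below (ProbingTable, SlotState, and so on) are hypothetical, keys are assumed to be non-negative and distinct, and insert does not check for duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class SlotState { Empty, Valid, Deleted };

struct Slot {
    int key = 0;
    SlotState state = SlotState::Empty;
};

class ProbingTable {
public:
    explicit ProbingTable(std::size_t size) : slots_(size) { assert(size > 0); }

    // Returns the location of key, or -1 if the key is not stored.
    int find(int key) const {
        const std::size_t n = slots_.size();
        const std::size_t home = homeOf(key);
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t idx = (home + i) % n;
            const Slot& s = slots_[idx];
            if (s.state == SlotState::Empty) return -1;  // never used: stop
            if (s.state == SlotState::Valid && s.key == key)
                return static_cast<int>(idx);
            // Deleted locations and other keys are skipped.
        }
        return -1;
    }

    // Stores key in the first empty or deleted location on its probe path.
    // Returns that location, or -1 if no such location exists.
    int insert(int key) {
        const std::size_t n = slots_.size();
        const std::size_t home = homeOf(key);
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t idx = (home + i) % n;
            if (slots_[idx].state != SlotState::Valid) {
                slots_[idx].key = key;
                slots_[idx].state = SlotState::Valid;
                return static_cast<int>(idx);
            }
        }
        return -1;
    }

    // Marks the location as deleted instead of empty so later searches
    // continue past it.
    bool remove(int key) {
        const int idx = find(key);
        if (idx < 0) return false;
        slots_[static_cast<std::size_t>(idx)].state = SlotState::Deleted;
        return true;
    }

private:
    std::size_t homeOf(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % slots_.size();
    }
    std::vector<Slot> slots_;
};
```

With a table of size 101 holding 7597, 4567, 628 and 3658, removing 4567 leaves location 23 deleted, a search for 3658 still reaches location 25, and inserting 4567 again reuses location 23.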

Clustering Problem with Linear Probing The tendency of elements to become unevenly distributed in the hashtable, with many elements clustering around a single hash location. Clustering causes long probe searches!

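2. Quadratic Probing for Open Addressing An open addressing technique in which we continue from the hash location on, looking for the next available position at quadratically growing offsets. The size of step, measured from the hash location = 1^2, 2^2, 3^2, ... The probe sequence: HT[ h(SearchKey) ], HT[ h(SearchKey) + 1^2 ], HT[ h(SearchKey) + 2^2 ], HT[ h(SearchKey) + 3^2 ], ... with each index taken mod N.

Example: Insertion h(key) = key mod 101. 7597, 4567, 0628 and 3658 all hash to 22. They are stored at locations 22, 23 (22 + 1^2), 26 (22 + 2^2) and 31 (22 + 3^2).

Example: Insertions h(key) = key mod 7. 9, 23, 16 and 2 all hash to 2. They are stored at locations 2, 3 (2 + 1^2), 6 (2 + 2^2) and 4 ((2 + 3^2) mod 7).

Example: Insertions h(key) = key mod 7. 30 also hashes to 2, but its probe sequence only ever visits locations 2, 3, 6 and 4, which are all occupied. 30 cannot be inserted even though locations 0, 1 and 5 are empty.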
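A minimal sketch of quadratic-probing insertion shows the failure in the last example directly. It assumes non-negative integer keys and a -1 marker for unused locations; kFreeSlot and quadraticInsert are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kFreeSlot = -1;  // marker for an unused location (keys must be >= 0)

// Inserts key with quadratic probing: the i-th probe examines location
// (h(key) + i*i) mod N. Returns the location used, or -1 if none of the
// probed locations is free.
int quadraticInsert(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % n;
    // i*i mod N repeats with period N, so N probes cover every location
    // this probe sequence can ever reach.
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = (home + i * i) % n;
        if (table[idx] == kFreeSlot) {
            table[idx] = key;
            return static_cast<int>(idx);
        }
    }
    return -1;
}
```

With a table of size 7, inserting 9, 23, 16 and 2 uses locations 2, 3, 6 and 4, and a following insertion of 30 returns -1 although three locations are still free.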

Quadratic Probing Virtually eliminates clustering! Cannot guarantee successful insertion if the hash table is half full or more.

3. Double Hashing for Open Addressing An open addressing technique in which we continue from the hash location on, looking for the next available position in steps given by a second hash function h'. The size of step = h'(SearchKey). The probe sequence: HT[ h(SearchKey) ], HT[ h(SearchKey) + h'(SearchKey) ], HT[ h(SearchKey) + 2 * h'(SearchKey) ], HT[ h(SearchKey) + 3 * h'(SearchKey) ], ... with each index taken mod N. The probe sequence is key-dependent.

Example: Insertion h(key) = key mod 11, h'(key) = 7 - (key mod 7). 58 is already stored at location 3. 14 hashes to 3 as well; h'(14) = 7, so 14 is stored at location (3 + 7) mod 11 = 10.

Example: Insertion h(key) = key mod 11, h'(key) = 7 - (key mod 7). 91 hashes to 3 and h'(91) = 7. Location (3 + 7) mod 11 = 10 is occupied by 14, so 91 is stored at location (3 + 2 * 7) mod 11 = 6.
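The double-hashing examples can be traced with the sketch below, which hard-codes the two functions from the slides: h(key) = key mod N and a step of 7 - (key mod 7). The names kUnused and doubleHashInsert are assumptions, and keys are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kUnused = -1;  // marker for an unused location (keys must be >= 0)

// Inserts key with double hashing. The i-th probe examines location
// (h(key) + i * step) mod N, where step = 7 - (key mod 7) is always
// between 1 and 7, so the probe never stands still.
// Returns the location used, or -1 if no probed location is free.
int doubleHashInsert(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % n;
    const std::size_t step = static_cast<std::size_t>(7 - (key % 7));
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = (home + i * step) % n;
        if (table[idx] == kUnused) {
            table[idx] = key;
            return static_cast<int>(idx);
        }
    }
    return -1;
}
```

With N = 11, inserting 58, 14 and 91 in that order uses locations 3, 10 and 6. Because 11 is prime and the step is smaller than 11, each probe sequence can reach every location of this table.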

Rehashing What happens when the hash table is full or nearly full? Rehashing! Enlarge the hashtable: create a new, larger hash table, then insert each element in the old hash table into the new hash table. How much larger? Double the hashtable size.
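As a rough sketch of these steps, the code below rehashes a linear-probing table of non-negative integers into a table of exactly twice the size. The helper names (kVacant, placeLinear, rehashDoubled) are hypothetical, and plain doubling is used only because the slide suggests it; a real table might round the new size to a prime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kVacant = -1;  // marker for an unused location (keys must be >= 0)

// Linear-probing insertion with h(key) = key mod N; returns false if full.
bool placeLinear(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = (home + i) % n;
        if (table[idx] == kVacant) {
            table[idx] = key;
            return true;
        }
    }
    return false;
}

// Creates a table twice as large and re-inserts every stored key.
// Each key is hashed again because h depends on the table size.
std::vector<int> rehashDoubled(const std::vector<int>& oldTable) {
    assert(!oldTable.empty());
    std::vector<int> bigger(oldTable.size() * 2, kVacant);
    for (int key : oldTable) {
        if (key != kVacant) {
            const bool placed = placeLinear(bigger, key);
            assert(placed);  // cannot fail: the new table has spare room
            (void)placed;    // silences the unused warning when NDEBUG is set
        }
    }
    return bigger;
}
```

Keys that collided in the old table may separate in the new one: 5, 10 and 15 all hash to 0 in a table of size 5, while in the doubled table 10 hashes to 0 and only 5 and 15 still share a hash location.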

Collision Resolution through Restructuring the Hashtable Change the structure of the hashtable so that it can accommodate more than one element in the same location! Two ways: using buckets, or using separate chaining.

1. Using Buckets A technique to resolve collisions by implementing a hashtable as an array of arrays. A bucket is an element of a hashtable that is itself an array.

Example: Insertion h(key) = key mod 101, bucket size = 3. 7597, 4567 and 0628 all hash to 22 and share the bucket at location 22. 1110 and 1211 both hash to 100 and share the bucket at location 100.
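A bucketed table can be sketched as an array of small arrays with a fixed capacity. The struct below is an illustration under assumed names (BucketTable, bucketSize); it simply rejects an insertion when the target bucket is full, since the slides do not say how overflow should be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A hashtable whose locations are buckets holding up to bucketSize keys.
struct BucketTable {
    std::size_t bucketSize;
    std::vector<std::vector<int> > buckets;

    BucketTable(std::size_t tableSize, std::size_t perBucket)
        : bucketSize(perBucket), buckets(tableSize) {
        assert(tableSize > 0 && perBucket > 0);
    }

    // Appends key to the bucket at h(key) = key mod N.
    // Returns false when that bucket is already full.
    bool insert(int key) {
        assert(key >= 0);
        std::vector<int>& bucket =
            buckets[static_cast<std::size_t>(key) % buckets.size()];
        if (bucket.size() >= bucketSize) return false;
        bucket.push_back(key);
        return true;
    }
};
```

With 101 buckets of size 3, the keys 7597, 4567 and 628 fill bucket 22, so a fourth key hashing to 22 (such as 3658) is rejected, which is the "too small" case. A larger bucket size would accept it at the cost of more unused space.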

Using Buckets The size of bucket Too small? Too big?

2. Using Separate Chaining A technique to resolve collisions by implementing a hashtable as an array of pointers, where each pointer is the head of a linked list of records with keys that hash to that location. Each linked list is called a chain.

Example: Insertion h(key) = key mod 101. 7597, 4567 and 0628 all hash to 22; with each new record inserted at the front, the chain at location 22 is 0628, 4567, 7597. 1110 and 1211 both hash to 100; the chain at location 100 is 1211, 1110.
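The chaining example can be sketched with a vector of std::list, inserting each key at the front of its chain. ChainTable, chainInsert and chainContains are illustrative names, and keys are assumed to be non-negative integers with no duplicate check on insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

typedef std::vector<std::list<int> > ChainTable;

// Inserts key at the front of the chain at h(key) = key mod N.
void chainInsert(ChainTable& table, int key) {
    assert(key >= 0 && !table.empty());
    table[static_cast<std::size_t>(key) % table.size()].push_front(key);
}

// Searches only the one chain the key can be in.
bool chainContains(const ChainTable& table, int key) {
    assert(key >= 0 && !table.empty());
    const std::list<int>& chain =
        table[static_cast<std::size_t>(key) % table.size()];
    for (int stored : chain) {
        if (stored == key) return true;
    }
    return false;
}
```

After inserting 7597, 4567 and 628 into a `ChainTable` of size 101, the chain at location 22 reads 628, 4567, 7597 from front to back, matching the order in the example.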

Separate Chaining Using a linked list: an unsorted linked list, or a sorted linked list.

Good Hash Functions? A good hash function avoids collisions. A good hash function tends to spread keys evenly. A good hash function is easy to compute. The running time should be O(1).

Good Hash Functions The calculation of the hash function should involve the entire search key. If a hash function uses modulo arithmetic, the base (the hashtable size) should be prime. f(key) % table_size

Size of Hashtable? Too big? Memory waste. Too small? More collisions & rehashing. Should be as large as practical & a prime number!

Hash Table Implementation

Hash Table ADT

    template < class DT, class KT >
    class HashTbl {
      public:
        // Constructor and destructor
        HashTbl ( int initTableSize );
        ~HashTbl ();

        // Hashtable manipulation operations
        void insert ( const KT &searchKey, const DT &newDataItem );
        bool remove ( KT searchKey );
        bool retrieve ( KT searchKey, DT &dataItem );
        void clear ();

        // Hashtable status operations
        bool isEmpty () const;
        bool isFull () const;
        void showStructure () const;

      private:
        int tableSize;                               // number of chains
        vector< list< pair<KT, DT> > > dataTable;    // one chain per location
    };
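One way the declared interface could be implemented with separate chaining is sketched below. It is not the course's implementation: it assumes std::hash<KT> is available for the key type, overwrites the data item when a key is inserted twice, and leaves out the destructor, isFull and showStructure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <utility>
#include <vector>

template <class DT, class KT>
class HashTbl {
public:
    explicit HashTbl(int initTableSize)
        : tableSize(initTableSize),
          dataTable(static_cast<std::size_t>(initTableSize > 0 ? initTableSize : 1)) {
        assert(initTableSize > 0);
    }

    // Inserts at the front of the chain, or overwrites an existing key.
    void insert(const KT& searchKey, const DT& newDataItem) {
        auto& chain = dataTable[indexOf(searchKey)];
        for (auto& entry : chain) {
            if (entry.first == searchKey) {
                entry.second = newDataItem;
                return;
            }
        }
        chain.push_front(std::make_pair(searchKey, newDataItem));
    }

    bool remove(KT searchKey) {
        auto& chain = dataTable[indexOf(searchKey)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == searchKey) {
                chain.erase(it);
                return true;
            }
        }
        return false;
    }

    // Copies the stored item into dataItem and returns true if the key exists.
    bool retrieve(KT searchKey, DT& dataItem) {
        for (const auto& entry : dataTable[indexOf(searchKey)]) {
            if (entry.first == searchKey) {
                dataItem = entry.second;
                return true;
            }
        }
        return false;
    }

    void clear() {
        for (auto& chain : dataTable) chain.clear();
    }

    bool isEmpty() const {
        for (const auto& chain : dataTable) {
            if (!chain.empty()) return false;
        }
        return true;
    }

private:
    // Assumption: std::hash<KT> exists; its result is reduced modulo the size.
    std::size_t indexOf(const KT& key) const {
        return std::hash<KT>()(key) % dataTable.size();
    }

    int tableSize;
    std::vector<std::list<std::pair<KT, DT> > > dataTable;
};
```

Note the template parameter order: `HashTbl<int, int> table(101);` declares a table whose data items and keys are both int, with the data type first as in the declaration above.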

Analysis of Hashing

Search - Time Analysis Search operation: The worst-case time is O(N), linear time, which occurs when every key gets hashed to the same array index. The average-case time is O(1), constant time (typically about 2 to 5 probes).

Insert - Time Analysis Insert operation: The worst-case time is O(N), linear time, which occurs when every key gets hashed to the same array index. With separate chaining the worst case is O(1) if the new element is always inserted at the front of the chain. The average-case time is O(1), constant time (typically about 2 to 5 probes).

Delete - Time Analysis Delete operation: The worst-case time is O(N), linear time, which occurs when every key gets hashed to the same array index. The average-case time is O(1), constant time (typically about 2 to 5 probes).

Load Factor of a Hashtable In general, hashtables have some unused locations. Load factor = The number of occupied hashtable locations (entries) / The size of the hashtable!

Search - Average Case The average number of hashtable elements examined for Search with linear probing, where lf is the load factor: (1 + 1/(1 - lf)) / 2 for successful search; (1 + 1/(1 - lf)^2) / 2 for unsuccessful search.

Search - Average Case The average number of hashtable elements examined for Search with quadratic probing and double hashing: -ln(1 - lf) / lf for successful search; 1/(1 - lf) for unsuccessful search.

Search - Average Case The average number of hashtable elements examined for Search with separate chaining: 1 + lf / 2 for successful search; lf for unsuccessful search.

Search - Average Case (successful search)

load factor   linear probing   double hashing   separate chaining
0.5           1.50             1.39             1.25
0.6           1.75             1.53             1.30
0.7           2.17             1.72             1.35
0.8           3.00             2.01             1.40
0.9           5.50             2.56             1.45
1.0           -                -                1.50
2.0           -                -                2.00
3.0           -                -                2.50
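The table entries follow from the successful-search formulas given earlier. The sketch below evaluates all six formulas; the function names are made up for illustration, and lf stands for the load factor as in the slides.

```cpp
#include <cassert>
#include <cmath>

// Linear probing (requires 0 <= lf < 1).
double linearProbingHit(double lf) {
    assert(lf >= 0.0 && lf < 1.0);
    return (1.0 + 1.0 / (1.0 - lf)) / 2.0;
}
double linearProbingMiss(double lf) {
    assert(lf >= 0.0 && lf < 1.0);
    return (1.0 + 1.0 / ((1.0 - lf) * (1.0 - lf))) / 2.0;
}

// Quadratic probing and double hashing (requires 0 < lf < 1).
double doubleHashingHit(double lf) {
    assert(lf > 0.0 && lf < 1.0);
    return -std::log(1.0 - lf) / lf;  // std::log is the natural logarithm
}
double doubleHashingMiss(double lf) {
    assert(lf >= 0.0 && lf < 1.0);
    return 1.0 / (1.0 - lf);
}

// Separate chaining (lf may exceed 1 because chains can grow).
double chainingHit(double lf) {
    assert(lf >= 0.0);
    return 1.0 + lf / 2.0;
}
double chainingMiss(double lf) {
    assert(lf >= 0.0);
    return lf;
}
```

At a load factor of 0.9, for instance, a successful search examines about 5.5 locations with linear probing, about 2.56 with double hashing, and 1.45 with separate chaining. The open-addressing formulas are undefined at a load factor of 1 or more, which is why only separate chaining has entries in the last three rows.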

FindMin & FindMax - Time Analysis FindMin & FindMax operations: O(N) linear time

Traverse - Time Analysis Ordered traversal operation: O(N) linear time