Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

Similar documents
General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Understand how to deal with collisions

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Hashing Techniques. Material based on slides by George Bebis

Hash Tables. Hashing Probing Separate Chaining Hash Function

TABLES AND HASHING. Chapter 13

Topic HashTable and Table ADT

Data Structures And Algorithms

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Introduction to Hashing

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

CS 350 : Data Structures Hash Tables

Outline. hash tables hash functions open addressing chained hashing

UNIT III BALANCED SEARCH TREES AND INDEXING

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

AAL 217: DATA STRUCTURES

Open Addressing: Linear Probing (cont.)

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

STRUKTUR DATA. By : Sri Rezeki Candra Nursari 2 SKS

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

HASH TABLES.

Algorithms and Data Structures

Chapter 20 Hash Tables

CSE 214 Computer Science II Searching

Data and File Structures Chapter 11. Hashing

Data Structures. Topic #6

Adapted By Manik Hosen

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Fast Lookup: Hash tables

HASH TABLES. Goal is to store elements k,v at index i = h k

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

THINGS WE DID LAST TIME IN SECTION

Hashing. October 19, CMPE 250 Hashing October 19, / 25

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

COMP171. Hashing.

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

Today s Outline. CS 561, Lecture 8. Direct Addressing Problem. Hash Tables. Hash Tables Trees. Jared Saia University of New Mexico

Introduction To Hashing

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

CMSC 341 Hashing. Based on slides from previous iterations of this course

Question Bank Subject: Advanced Data Structures Class: SE Computer

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

Priority Queue. 03/09/04 Lecture 17 1

CS 251, LE 2 Fall MIDTERM 2 Tuesday, November 1, 2016 Version 00 - KEY

HASH TABLES. Hash Tables Page 1

Comp 335 File Structures. Hashing

Hash table basics mod 83 ate. ate

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

lecture23: Hash Tables

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Cpt S 223. School of EECS, WSU

Lecture 4. Hashing Methods

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

Chapter 27 Hashing. Liang, Introduction to Java Programming, Eleventh Edition, (c) 2017 Pearson Education, Inc. All rights reserved.

Chapter 27 Hashing. Objectives

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

CPSC 259 admin notes

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

DATA STRUCTURES/UNIT 3

CS 2412 Data Structures. Chapter 10 Sorting and Searching

Hash Tables. Gunnar Gotshalks. Maps 1

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Chapter 9: Maps, Dictionaries, Hashing

CSCD 326 Data Structures I Hashing

FINAL EXAM REVIEW CS 200 RECITATIOIN 14

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Hashing. Hashing Procedures

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Hash table basics mod 83 ate. ate. hashcode()

Hashing for searching

Module 2: Classical Algorithm Design Techniques

Hashing file organization

CS302 - Data Structures using C++

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

Hash table basics mod 83 ate. ate

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Dictionaries-Hashing. Textbook: Dictionaries ( 8.1) Hash Tables ( 8.2)

Hash Table and Hashing

Fundamental Algorithms

Hash table basics. ate à. à à mod à 83

9/24/ Hash functions

Hash[ string key ] ==> integer value

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

CS 350 Algorithms and Complexity

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

CSE 373 Autumn 2012: Midterm #2 (closed book, closed notes, NO calculators allowed)

Hashing IV and Course Overview

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

Dictionaries and Hash Tables

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

Successful vs. Unsuccessful

Transcription:

Hash Tables Outline Definition Hash functions Open hashing Closed hashing collision resolution techniques Efficiency EECS 268 Programming II 1

Overview Implementation style for the Table ADT that is good in a wide range of situations is the hash table efficient Insert, Delete, and Search operations difficult Sorted Traversal efficient unsorted traversal Good approach as long as sorted output comparatively rare in the total set of hash table operations EECS 268 Programming II 2

Definition Hash table is defined by: set of records R = { r 1, r 2,..., r n } stored by the table set of input keys K = { k 1, k 2,..., k n }, n >= 0 that can be associated with records (k x, r y ) Array of buckets B[0... m-1]: each array element is capable of holding one or more (k x, r y ) pairs Hash Function H: K {0, 1,..., m-1} for any given (k x, r y ), B[H(k x )] is the designated storage location for (k x, r y ) Collision resolution scheme when (k x, r y ) and (k a, r b ) map to the same bucket under H, this scheme determines where the second record is stored EECS 268 Programming II 3

Definitions An Array of buckets B[0... m-1] holds all data managed by the hash table Open or External Hashing bucket locations store pointers (references) to record pairs (k x, r y ) colliding records stored in a linked list Closed or Internal Hashing buckets store actual objects colliding records stored in other bucket locations Note that the associated keys may be implicit rather than explicitly stored EECS 268 Programming II 4

Hash Functions H(i) = i reduces the hash table to an array Selecting digits choose some subset of digits in a large number specific slice or positions Folding take digits or slices of a number and add them together with roll-over H(i) = i modulo m where m is Hash Table size choosing m as a prime number is popular for an even distribution of keys EECS 268 Programming II 5

Hash Function 2 Strings are a common search key in many cases convert string to an integer H(string) integer Approaches add characters or slices of characters together as n-bit unsigned numbers with the sum rolling over within x- bits bit shifting to form numbers possible x-bits chose for table size or x modulo m several other options possible EECS 268 Programming II 6

Open Hashing Example: take a hash table size of 7 (prime) and a hash function h(x) = x mod 7 insert 64, 26, 56, 72, 8, 36, 42 If data set is large compared to hash table size, or the hash function clusters data, then length of the list holding the bucket contents can be significant sorted list will reduce the average failure time can identify failure before the end of the list use binary search tree instead of list why not a BST for the whole data set? use second Hash table EECS 268 Programming II 7

Open Hashing 2 Advantages of Open Hashing with chaining simple in concept and implementation insertion is always possible Disadvantages of hashing with chaining unbalanced distribution decreases efficiency O(n) for a linked list, O(log n) for a BST greater memory overhead higher execution overhead of stepping through pointers EECS 268 Programming II 8

Closed Hashing Closed hashing with Open addressing storing all data items within single hash table, but open up the address assigned to item on collision Hash table of size m can hold at most m items Only a perfect hash function will distribute m items to m different table elements collisions will generally occur before table is full Collision resolution is thus crucial to efficient use of closed hash tables EECS 268 Programming II 9

Closed Hashing Collision Resolution Create a sequence of collision resolution functions h 0 (x) is base hash function h 1 (x) used to find first alternate storage location after a collision h 2 (x) used to find the next alternate if first alternate is occupied Each h i (x) must be guaranteed to choose different table locations Hash function series should ideally check all table locations EECS 268 Programming II 10

Collision Resolution Linear Probing Search hash table sequentially starting from the original location specified by the hash function h i x = h 0 x + i mod m, i > 0 Insert 64, 26, 56, 72, 8, 36, 42 in an empty table of size 7 Fragile causes primary clusters by occupying adjacent table locations similar to long chains in open hashing EECS 268 Programming II 11

Collision Resolution Quadratic Probing Spread probed locations across the table h i x = h 0 x + i 2 mod m, i > 0 Example: Insert 64, 26, 56, 72, 8, 36, 42 Series of probed locations is not guaranteed to cover the whole table without duplication Closed hashing schemes can fail even though the table is not full and secondary clusters may form if the probing scheme will not visit all table locations and distribute probes evenly over 0..m EECS 268 Programming II 12

Collision Resolution Linear Probing with Fixed Increment h i x = h 0 x + (i FI) mod m, i > 0 FI is relatively prime to m linear probing will visit all table locations without repeats X is relatively prime to Y iff GCD(X,Y) = 1 EECS 268 Programming II 13

Collision Resolution Double Hashing Use a second hash function (h'(x)) to generate the probe sequence used after a collision h i x = h 0 x + (ih (x)) mod m, i > 0 Use h (x)=r (x mod R), where R < m is prime Example: m=7, R=5, insert 64,26,56,72,8,36,42 EECS 268 Programming II 14

Closed Hashing -- Deletions Example: Insert 64, 56, 72, 8 using linear probling delete 64; delete 8 Deletion along the probing path from A B creates a problem because the empty cell could be there for two reasons no further elements exist along this probing sequence deletion of an item along the sequence took place Two types of empty buckets bucket has always been empty (AE) (flag 0) bucket emptied by deletion (ED) (flag 1) EECS 268 Programming II 15

Closed Hashing -- Deletions During a probing sequence, if an AE bucket is found, searching can stop if an ED bucket is found, searching must continue Closed Hashing is thus subject to a form of fatigue as cells are deleted, probing sequences generally lengthen as the probability of encountering ED cells increases failed searches get more expensive because they cannot terminate until an AE cell is found all cells of the table can be visited EECS 268 Programming II 16

Closed Hashing Advantages of Closed Hashing with Open Addressing lower execution overhead as addresses are calculated rather than read from pointers in memory lower memory overhead as pointers are not stored Disadvantages more complex than chaining can degenerate into linear search due to primary or secondary clustering Delete and Find operations are more complex Insert is not always possible even though the table is not full Delete can increase probe sequence length by making search termination conditions ambiguous EECS 268 Programming II 17

The Efficiency of Hashing An analysis of the average-case efficiency Load factor ratio of the current number of items in the table to the maximum size of the array table measures how full a hash table is should not exceed 2/3 Hashing efficiency for a particular search also depends on whether the search is successful unsuccessful searches generally require more time than successful searches EECS 268 Programming II 18

The Efficiency of Hashing EECS 268 Programming II 19

Summary Hash Tables are useful and efficient data structures in a wide range of applications Open hashing with chaining is simple, easy to implement, and usually efficient length of the chains is key to performance Closed hashing with various approaches to generating a probe sequence can also be efficient lower space and computation overhead more complex implementation performance is sensitive to probe sequence Monitoring load factor and other hash-table behavior parameters is important in maintaining performance EECS 268 Programming II 20