CS 310 Hash Tables


[Figure: keys, a hashing function, and a table of key/value pairs.]

The hash-table data structure achieves (near) constant-time searching by wasting memory space: the memory we reserve for a hash table is typically much larger than the number of data items stored in it.

Each item in the table comprises a key and a value. In each GMU student record, for example, we could use the GMU ID as the key and all other data combined as the value.

We use a hashing function to determine where to store items in the table. Given an entry x, it will be stored at location h(x.key), where h is the hashing function.

The verb "to hash" means to chop something up or to make a mess out of it (the "something" here being the key).

Example

The table contains 80 entries, and the key of each entry is an integer. We use the hash function h(key) = key % 80.
An entry with key 140 will be stored at location 140 % 80 = 60.
An entry with key 425 will be stored at location 425 % 80 = 25.
An entry with key 825 will also be stored at location 825 % 80 = 25!

Key-to-Index Mapping

A collision occurs when two keys are hashed to the same location in the table.

Birthday paradox: if 23 people are present in a room, chances are good (> 0.5) that two of them will have the same month/day birthday. Equivalently, if we randomly map 23 keys into a table of size 365, the probability that no two keys map to the same location is less than 0.5.

A complete key-to-index mapping consists of
1. a hashing function that minimizes collisions, and
2. a technique for handling collisions.
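The two claims above (the colliding keys and the birthday bound) can be checked with a few lines of C++. This is only a sketch: the function names `h` and `prob_no_collision` are made up for illustration, and the probability calculation assumes each key is equally likely to land in any location.

```cpp
#include <cassert>

// Division hash from the example: a table of 80 entries.
int h(int key) {
    assert(key >= 0);
    return key % 80;
}

// Probability that n keys, each placed uniformly at random into a table
// of table_size locations, all land in distinct locations.
double prob_no_collision(int n, int table_size) {
    assert(n >= 0 && table_size > 0);
    double p = 1.0;
    for (int i = 0; i < n; ++i) {
        // The (i+1)-th key must avoid the i locations already taken.
        p *= static_cast<double>(table_size - i) / table_size;
    }
    return p;
}
```

Here h(425) and h(825) both return 25, and prob_no_collision(23, 365) comes out at about 0.49, just under one half.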

Hashing Function I: Division

Let CAPACITY be the number of entries in the table. Then h(key) = key % CAPACITY. Note that this hash function is called division, although what we actually use is the remainder.

Certain CAPACITY values are better than others at avoiding collisions. Studies show that a good choice of CAPACITY is a prime number of the form 4c + 3, where c is an integer.
811 is a prime number equal to (4 × 202) + 3.
1019 is a prime number equal to (4 × 254) + 3.

Hashing Function II: Mid-Square

The key is multiplied by itself, and the hashing function returns the middle r digits of the result. CAPACITY is set to 2^r (with binary digits, r of them index exactly 2^r locations).
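One possible reading of the mid-square method, sketched in C++. The slide does not say how wide the key is or exactly which digits count as the "middle"; the version below assumes a 32-bit unsigned key, a 64-bit square, and a window of r bits centred in that square. The name `mid_square_hash` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mid-square hash: square the key and return the middle r bits of the
// 64-bit product, giving an index in the range [0, 2^r).
std::uint32_t mid_square_hash(std::uint32_t key, unsigned r) {
    assert(r >= 1 && r <= 32);
    std::uint64_t square = static_cast<std::uint64_t>(key) * key;
    unsigned shift = (64 - r) / 2;                     // bits discarded below the window
    std::uint64_t mask = (std::uint64_t{1} << r) - 1;  // keeps the low r bits after shifting
    return static_cast<std::uint32_t>((square >> shift) & mask);
}
```

For small keys the middle bits of a 64-bit square are all zero, so in practice the window would be placed relative to the actual size of the keys in use.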

Hashing Function III: Multiplication

The key is multiplied by a constant less than one, and the hash function returns the first r digits of the result. Again, CAPACITY is set to 2^r.

How to Handle Keys of Type String

Just treat a string as a long binary number. Example: suppose the key is the string "KEY". In ASCII, this is 1001011 1000101 1011001. Treat this as a binary number and use the previous hashing functions.

Folding: suppose we are given a key comprising 11 letters a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_10 a_11. Split the key into pieces and add them:
a_1 a_2 a_3 a_4 + a_5 a_6 a_7 a_8 + a_9 a_10 a_11
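The "long binary number" idea for strings can be combined with the division method without ever building the huge number. The sketch below assumes 7-bit ASCII characters (so each character shifts the running value by a factor of 128) and reduces modulo the capacity at every step; `hash_string` is an illustrative name, not something from the slides.

```cpp
#include <cassert>
#include <string>

// Treats key as a base-128 number (7 bits per ASCII character) and
// returns that number modulo capacity, reducing after every character
// so the intermediate value never overflows.
unsigned hash_string(const std::string& key, unsigned capacity) {
    assert(capacity > 0);
    unsigned long long h = 0;
    for (unsigned char c : key) {
        h = (h * 128 + (c & 0x7F)) % capacity;
    }
    return static_cast<unsigned>(h);
}
```

For the string "KEY" the binary number 1001011 1000101 1011001 is 1237721 in decimal, so hash_string("KEY", 811) returns 1237721 % 811 = 135.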

Collision Resolution Method I: Chaining

For chaining, we put entries hashed to the same index into a linked list.

The advantages of chaining are:
It is easy to locate a key when there are collisions.
The hash table itself needs space only for pointers.
Deletion of an item is straightforward.

The disadvantage of chaining is that we need additional space for pointers along with the data.

Example: suppose h(key) = key % 100, the table has locations 00 through 99, and the keys are 4203, 3606, 1200, 1302, 8802, 4106. Each key joins the chain at index key % 100:
00: 1200
02: 1302, 8802
03: 4203
06: 3606, 4106
All other locations are empty.
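A minimal sketch of chaining, assuming non-negative integer keys with no associated values and using `std::list` for each chain. The class name `ChainedTable` and its member names are hypothetical; a real table would store whole entries rather than bare keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t capacity) : buckets(capacity) {
        assert(capacity > 0);
    }
    // Searches only the one chain the key hashes to.
    bool contains(int key) const {
        const std::list<int>& chain = buckets[hash(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }
    // Appends the key to its chain unless it is already there.
    void insert(int key) {
        if (!contains(key)) buckets[hash(key)].push_back(key);
    }
    // Deletion just unlinks the key from its chain.
    void remove(int key) { buckets[hash(key)].remove(key); }
    std::size_t chain_length(std::size_t index) const {
        return buckets.at(index).size();
    }
private:
    std::vector<std::list<int>> buckets;   // one linked list per table location
    std::size_t hash(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % buckets.size();
    }
};
```

With a capacity of 100 and the six keys from the example, the chains at locations 02 and 06 each hold two keys, and removing 1302 leaves 8802 in place.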

Collision Resolution Method II: Open Addressing

If h(key) = h_0 and location h_0 is occupied, then we try, or probe, positions h_1, h_2, h_3, ..., until we find an open slot.

Common probing techniques: linear, quadratic, random, double hashing.

Linear Probing

h_i = (h_0 + i) % CAPACITY

That is, we try h_0 + 1, h_0 + 2, h_0 + 3, ... until an empty slot is found. The % CAPACITY part ensures wrapping around after unsuccessfully probing the last location in the table.

When the table becomes about half full, linear probing has a tendency toward clustering; that is, data occupy long strings of adjacent positions with gaps between the strings.
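Linear probing can be sketched as a single insertion routine. The version below assumes non-negative integer keys, the division hash h(key) = key % CAPACITY, and a hypothetical marker `EMPTY` for unused slots; the function name is also made up for illustration.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // hypothetical marker for a slot that holds no key

// Inserts key by linear probing and returns the index where it was
// stored, or -1 if every slot is occupied.
int linear_probe_insert(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const int capacity = static_cast<int>(table.size());
    const int h0 = key % capacity;
    for (int i = 0; i < capacity; ++i) {
        int slot = (h0 + i) % capacity;   // h_i = (h_0 + i) % CAPACITY
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;  // table is full
}
```

As sample input, take a table of size 10 and insert 211, 143, 160, 821, 223, 221 in that order. The keys land at locations 1, 3, 0, 2, 4, 5: three keys hash to 1 and two hash to 3, so they pile up into one cluster covering locations 0 through 5.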

Example (linear probing): a table of size M = 10, with locations 0 through 9. Keys: 211, 143, 160, 821, 223, 221.

Quadratic Probing

h_i = (h_0 + i^2) % CAPACITY

That is, we try h_0 + 1, h_0 + 4, h_0 + 9, ... until an empty slot is found.

Typically, quadratic probing will not probe all locations. For prime CAPACITY values, the probe sequence reaches only (CAPACITY+1)/2 distinct locations; after that many probes it starts revisiting locations already tried, so there is no point in going further. (See pages 402 to 403 of Kruse for a proof.)

Example

CAPACITY = 7, key = 702, so h_0 = 702 % 7 = 2.
h_1 = (h_0 + 1^2) % 7 = 3
h_2 = (h_0 + 2^2) % 7 = 6
h_3 = (h_0 + 3^2) % 7 = 4
h_4 = (h_0 + 4^2) % 7 = 4
h_5 = (h_0 + 5^2) % 7 = 6
h_6 = (h_0 + 6^2) % 7 = 3
h_7 = (h_0 + 7^2) % 7 = 2
We have completed a cycle, and about half of the table entries (locations 0, 1, and 5) are never probed.

Consider inserting 2, 72, 702, 7002, and 70002 into an empty, size-7 hash table that uses quadratic probing. For all these keys, h_0 = 2, h_1 = 3, h_2 = 6, and h_3 = 4.

Location  Probe
0
1
2         h_0
3         h_1
4         h_3
5
6         h_2

The first four keys fill locations 2, 3, 6, and 4; the fifth key, 70002, then finds no free slot even though locations 0, 1, and 5 are empty. This is typically acceptable with hash tables, for probing half of the table most probably indicates flaws in system design: a hash table is typically large to minimize the chances of collisions.
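The size-7 example can be reproduced with a short quadratic-probing insert. This is a sketch under the same assumptions as the example (non-negative integer keys, h_0 = key % CAPACITY, prime CAPACITY); the `EMPTY` marker and the function name are hypothetical. It gives up after (CAPACITY+1)/2 probes, since later probes only revisit earlier locations.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // hypothetical marker for a slot that holds no key

// Inserts key by quadratic probing and returns the index where it was
// stored, or -1 if none of the reachable locations is free.
int quadratic_probe_insert(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const int capacity = static_cast<int>(table.size());
    const int h0 = key % capacity;
    const int max_probes = (capacity + 1) / 2;  // distinct locations for prime capacity
    for (int i = 0; i < max_probes; ++i) {
        // h_i = (h_0 + i^2) % CAPACITY, computed in 64 bits to avoid overflow
        int slot = static_cast<int>((h0 + static_cast<long long>(i) * i) % capacity);
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;  // every reachable location is occupied
}
```

Inserting 2, 72, 702, and 7002 into an empty size-7 table returns 2, 3, 6, and 4; inserting 70002 afterwards returns -1 although three locations are still empty.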

Double Hashing

For double hashing, we use a second hashing function h'(key). If location h(key) = h_0 is occupied, we probe h_0 + h'(key), h_0 + 2h'(key), h_0 + 3h'(key), ... (each taken % CAPACITY) until an empty space is found.

Rehashing

For rehashing, we also use two hashing functions, h and h'. If h(key) = h_0 is occupied, then we try h_1 = h'(h_0), then h_2 = h'(h_1), then h_3 = h'(h_2), and so forth.

A good choice of h and h' is as follows:
h(key) = key % CAPACITY
h'(x) = 1 + (x % (CAPACITY - 2))
Both CAPACITY and CAPACITY - 2 should be prime (for example, 811 and 809).
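A sketch of the double-hashing probe sequence. The double-hashing slide does not name a particular second function, so this example borrows the pair h and h' suggested above and applies h' to the key itself; that pairing is an assumption made for illustration, as are the names `h2` and `probe`.

```cpp
#include <cassert>

const int CAPACITY = 811;   // prime, and CAPACITY - 2 = 809 is also prime

// Primary hash: the home position of the key.
int h(int key) { return key % CAPACITY; }

// Secondary hash: the step size, always between 1 and CAPACITY - 2,
// so the probe sequence never stands still.
int h2(int key) { return 1 + (key % (CAPACITY - 2)); }

// Location examined on the i-th probe (i = 0 is the home position).
int probe(int key, int i) {
    assert(key >= 0 && i >= 0);
    return static_cast<int>((h(key) + static_cast<long long>(i) * h2(key)) % CAPACITY);
}
```

Keys 1000 and 189 share the home position 189, but their step sizes differ (192 and 190), so their second probes go to 381 and 379. Unlike linear probing, colliding keys do not follow each other through the table.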

Implementing Open-Address Hashing with Linear Probing

    template <class Entry>
    class Table {
    public:
        static const int CAPACITY = 811;
        Table();
        ~Table();
        void insert(const Entry& entry);
        void remove(int key);
        bool is_present(int key) const;
        void find(int key, bool& found, Entry& result) const;
        int size() const { return used; }
    private:
        // Key values that mark a slot as holding no entry.
        static const int NEVER_USED = -1;
        static const int PREVIOUSLY_USED = -2;
        Entry data[CAPACITY];
        int used;   // number of occupied slots
        int hash(int key) const;
        void find(int key, bool& found, int& index) const;
    };

    // Locate key by linear probing. On return, found tells whether the key
    // is in the table; if it is, i is the index of its slot.
    template <class Entry>
    void Table<Entry>::find(int key, bool& found, int& i) const {
        int count = 0;   // number of slots examined so far
        i = hash(key);
        while (count < CAPACITY && data[i].key != key && data[i].key != NEVER_USED) {
            count++;
            i = (i + 1) % CAPACITY;
        }
        found = (data[i].key == key);
    }

    template <class Entry>
    void Table<Entry>::insert(const Entry& entry) {
        bool already_present;
        int i;
        find(entry.key, already_present, i);
        if (!already_present) {
            // Probe from the home position for a vacant slot
            // (assumes the table is not full).
            i = hash(entry.key);
            while (data[i].key != NEVER_USED && data[i].key != PREVIOUSLY_USED)
                i = (i + 1) % CAPACITY;
            used++;
        }
        data[i] = entry;   // store a new entry, or overwrite the one with the same key
    }

    template <class Entry>
    void Table<Entry>::remove(int key) {
        bool present;
        int i;
        find(key, present, i);
        if (present) {
            // Mark the slot as previously used so that later searches
            // keep probing past it.
            data[i].key = PREVIOUSLY_USED;
            used--;
        }
    }

Performance

load factor α = (# of occupied table entries) / CAPACITY

In open-address hashing with linear probing, a nonfull hash table, and no deletions, the average number of locations examined in a successful search is approximately

$\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$

Given a table with CAPACITY = 811 and 611 entries occupied, we have α = 611/811 ≈ 0.75, or roughly 0.8. Taking α = 0.8, we expect successful searches to examine $\frac{1}{2}\left(1 + \frac{1}{1-0.8}\right) = 3.0$ slots.

In open-address hashing with double hashing, a nonfull hash table, and no deletions, the average number of table entries examined in a successful search is approximately

$\frac{-\ln(1-\alpha)}{\alpha}$

Given a table with CAPACITY = 811 and 611 entries occupied (α ≈ 0.8), we expect successful searches to examine $\frac{-\ln(1-0.8)}{0.8} \approx 2$ slots.

In chained hashing, the average number of table entries examined in a successful search is approximately

$1 + \frac{\alpha}{2}$

Note that, with chaining, the table can be overloaded (that is, α > 1).

An Empirical Comparison

Successful search, average # of probes
Load factor               0.1    0.5    0.8    0.9    0.99   2.0
Chaining                  1.04   1.2    1.4    1.4    1.5    2.0
Open, quadratic probes    1.04   1.5    2.1    2.7    5.2    NA
Open, linear probes       1.05   1.6    3.4    6.2    21.3   NA

Unsuccessful search, average # of probes
Load factor               0.1    0.5    0.8    0.9    0.99   2.0
Chaining                  1.1    1.5    1.8    1.9    1.99   2.0
Open, quadratic probes    1.13   2.2    5.2    11.9   126    NA
Open, linear probes       1.13   2.7    15.4   59.8   430    NA
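The approximations given for successful searches under linear probing, double hashing, and chaining are easy to evaluate for any load factor. The sketch below simply codes the three formulas; the function names are invented for illustration, and the results are estimates, not measurements like those in the empirical table.

```cpp
#include <cassert>
#include <cmath>

// Linear probing: (1/2) * (1 + 1 / (1 - alpha)), valid for a nonfull table.
double linear_probing_probes(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

// Double hashing: -ln(1 - alpha) / alpha, valid for a nonempty, nonfull table.
double double_hashing_probes(double alpha) {
    assert(alpha > 0.0 && alpha < 1.0);
    return -std::log(1.0 - alpha) / alpha;
}

// Chaining: 1 + alpha / 2; alpha may exceed 1 because chains can grow.
double chaining_probes(double alpha) {
    assert(alpha >= 0.0);
    return 1.0 + alpha / 2.0;
}
```

At α = 0.8 these give 3.0, about 2.01, and 1.4 probes respectively, and at α = 2.0 the chaining estimate is 2.0.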