Implementation of Linear Probing (continued)

Similar documents
CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions. Kevin Quinn Fall 2015

Open Addressing: Linear Probing (cont.)

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Data Structures And Algorithms

Hash[ string key ] ==> integer value

Hash Tables. Hashing Probing Separate Chaining Hash Function

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Cpt S 223. School of EECS, WSU

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

Topic HashTable and Table ADT

Adapted By Manik Hosen

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

Chapter 27 Hashing. Liang, Introduction to Java Programming, Eleventh Edition, (c) 2017 Pearson Education, Inc. All rights reserved.

HASH TABLES. Hash Tables Page 1

Hashing Techniques. Material based on slides by George Bebis

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Hash table basics mod 83 ate. ate. hashcode()

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Comp 335 File Structures. Hashing

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

HASH TABLES.

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

DATA STRUCTURES/UNIT 3

Hash table basics mod 83 ate. ate

Data and File Structures Chapter 11. Hashing

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

Hashing. mapping data to tables hashing integers and strings. aclasshash_table inserting and locating strings. inserting and locating strings

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

Hashing. inserting and locating strings. MCS 360 Lecture 28 Introduction to Data Structures Jan Verschelde, 27 October 2010.

HASH TABLES. Goal is to store elements k,v at index i = h k

CSCD 326 Data Structures I Hashing

Hashing as a Dictionary Implementation

CSE 2320 Notes 13: Hashing

TABLES AND HASHING. Chapter 13

Algorithms and Data Structures

6.7 b. Show that a heap of eight elements can be constructed in eight comparisons between heap elements. Tournament of pairwise comparisons

Data Structures and Object-Oriented Design VIII. Spring 2014 Carola Wenk

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

CS 310: Hash Table Collision Resolution Strategies

Outline. hash tables hash functions open addressing chained hashing

Understand how to deal with collisions

Hash table basics mod 83 ate. ate

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Hash table basics. ate à. à à mod à 83

Lecture 10 March 4. Goals: hashing. hash functions. closed hashing. application of hashing

Hash Tables. Gunnar Gotshalks. Maps 1

DATA STRUCTURES AND ALGORITHMS

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

UNIT III BALANCED SEARCH TREES AND INDEXING

CSE 214 Computer Science II Searching

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Hashing. It s not just for breakfast anymore! hashing 1

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

More on Hashing: Collisions. See Chapter 20 of the text.

Hashing. October 19, CMPE 250 Hashing October 19, / 25

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

Chapter 27 Hashing. Objectives

Successful vs. Unsuccessful

COMP171. Hashing.

HASH TABLES. CSE 332 Data Abstractions: B Trees and Hash Tables Make a Complete Breakfast. An Ideal Hash Functions.

Module 3: Hashing Lecture 9: Static and Dynamic Hashing. The Lecture Contains: Static hashing. Hashing. Dynamic hashing. Extendible hashing.

Fundamental Algorithms

CS 2412 Data Structures. Chapter 10 Sorting and Searching

COMP 103 RECAP-TODAY. Hashing: collisions. Collisions: open hashing/buckets/chaining. Dealing with Collisions: Two approaches

Fast Lookup: Hash tables

Introduction to Hashing

STRUKTUR DATA. By : Sri Rezeki Candra Nursari 2 SKS

Announcements. Submit Prelim 2 conflicts by Thursday night A6 is due Nov 7 (tomorrow!)

Standard ADTs. Lecture 19 CS2110 Summer 2009

Symbol Table Information. Symbol Tables. Symbol table organization. Hash Tables. What kind of information might the compiler need?

CS2210 Data Structures and Algorithms

Chapter 20 Hash Tables

Hash Table and Hashing

Lecture 4. Hashing Methods

Hash Tables. Johns Hopkins Department of Computer Science Course : Data Structures, Professor: Greg Hager

CS 206 Introduction to Computer Science II

ECE 242 Data Structures and Algorithms. Hash Tables I. Lecture 24. Prof.

Building Java Programs

Priority Queue. 03/09/04 Lecture 17 1

Symbol Tables. For compile-time efficiency, compilers often use a symbol table: associates lexical names (symbols) with their attributes

CS 310 Advanced Data Structures and Algorithms

CMSC 341 Hashing. Based on slides from previous iterations of this course

Worst-case running time for RANDOMIZED-SELECT

Abstract Data Types (ADTs) Queues & Priority Queues. Sets. Dictionaries. Stacks 6/15/2011

Hashing. Hashing Procedures

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

CMSC 132: Object-Oriented Programming II. Hash Tables

Dictionaries and Hash Tables

ECE 242 Data Structures and Algorithms. Hash Tables II. Lecture 25. Prof.

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

Question Bank Subject: Advanced Data Structures Class: SE Computer

Transcription:

Implementation of Linear Probing (continued) Helping method for locating index: private int findindex(long key) // return -1 if the item with key 'key' was not found int index = h(key); int probe = index; int k = 1; // probe number do if (table[probe]==null) // probe sequence has ended break; if (table[probe].getkey()==key) return probe; probe = (index + step(k)) % table.length; // check next slot k++; while (probe!=index); return -1; // not found

Implementation of Linear Probing (continued) Find and Deleting the item: public T find(long key) int index = findindex(key); if (index>=0) return (T) table[index]; else return null; // not found public T delete(long key) int index = findindex(key); if (index>=0) T item = (T) table[index]; table[index] = AVAILABLE; // mark available return item; else return null; // not found

Open addressing: Quadratic Probing Designed to prevent primary clustering. It does this by increasing the step by increasingly large amounts as more probes are required to insert a record. This prevents clusters from building up. In quadratic probing the step is equal to the square of the probe number. With linear probing the step values for a sequence of probes would be 1, 2, 3, 4, etc. For quadratic probing the step values would be 1, 2 2, 3 2, 4 2, etc, i.e. 1, 4, 9, 16, etc. Disadvantage of this method: After a number of probes the sequence of steps repeats itself (remember that the step will be probe number 2 mod the size of the hash table). This repetition occurs when the probe number is roughly half the size of the hash table. Secondary clustering.

Open addressing: Quadratic Probing Disadvantage of this method: After a number of probes the sequence of steps repeats itself. => It fails to insert a new item even if there is still a space in the array. Secondary clustering: the sequence of probe steps is the same for any insertion. Secondary clustering refers to the increase in the probe length (that is the number of probes required to find a record) for records where collisions have occurred (the keys are mapped to the same value). Note that this is not as large a problem as primary clustering. However, it is important to realize that in practice these two issues are not significant, given a large hash table and a good hash function it is extremely unlikely that these issues will affect the performance of the hash table, unless it becomes nearly full.

Figure: Quadratic probing with h(x) = x mod 101

Implementation It s enough to modify the step helping method: private int step(int k) // step function return k*k // quadratic probing

Open addressing: Double Hashing Double hashing aims to avoid both primary and secondary clustering and is guaranteed to find a free element in a hash table as long as the table is not full. It achieves these goals by calculating the step value using a second hash function h. step(k) = k.h (key) This new hash function h should: be different from the original hash function (remember that it was the original hash function that resulted in the collision in the first place) and, not result in zero (as original index + 0 = original index)

Open addressing: Double Hashing The second hash function is usually chosen as follows: h (key) = q (key%q), where q is a prime number q<n (N is the size of the array). Remark: It is important that the size of the hash table is a prime number if double hashing is to be used. This guarantees that successive probes will (eventually) try every index in the hash table before an index is repeated (which would indicate that the hash table is full). For other hashings (and for q) we want to use prime numbers to eliminate existing patterns in the data.

Figure: Double hashing during the insertion of 58, 14, and 91

Double Hashing Implementation public class DoubleHashTable<T extends KeyedItem> implements HashTableInterface<T> private KeyedItem[] table; // special values: null = EMPTY, T with key=0 = AVAILABLE private static KeyedItem AVAILABLE = new KeyedItem(0); private int q; // should be a prime number public DoubleHashTable(int size,int q) // size: should be a prime number; // recommended roughly twice bigger // than the expected number of elements // q: recommended to be a prime number, should be smaller than size table = new KeyedItem[size]; this.q=q;

Double Hashing Implementation private int h(long key) // hash function // return index return (int)(key % table.length); // typecast to int private int hh(long key) // second hash function // return step multiplicative constant return (int)(q - key%q); private int step(int k,long key) // step function return k*hh(key);

Double Hashing Implementation Call step(k,item.getkey())instead of step(k) public void insert(t item) throws HashTableFullException int index = h(item.getkey()); int probe = index; int k = 1; do if (table[probe]==null table[probe]==available) // this slot is available table[probe] = item; return; probe = (index + step(k,item.getkey())) % table.length; // check next slot k++; while (probe!=index); throw new HashTableFullException("Hash table is full.");

Open addressing performance The performance of a hash table depends on the load factor of the table. The load factorαis the ratio of the number of data items to the size of the array. Of the three types of open addressing double hashing gives the best performance. Overall, open addressing works very well up to load factors of around 0.5 (when 2 probes on average are required to find a record). For load factors greater than 0.6 performance declines dramatically.

Rehashing If the load factor goes over the safe limit, we should increase the size of the hash table (as for dynamic arrays). This process is called rehashing. Comments: we cannot just double the size of the table, as the size should be a prime number; it will change the main hash function it s not enough to just copy items Rehashing will take time O(N)

Dealing with Collisions (2 nd approach): Separate Chaining In separate chaining the hash table consists of an array of lists. When a collision occurs the new record is added to the list. Deletion is straightforward as the record can simply be removed from the list. Finally, separate chaining is less sensitive to load factor and it is normal to aim for a load factor of around 1 (but it will work also for load factors over 1).

Figure: Separate chaining (using linked lists). If array-based implementation of list is used: they are called buckets.