ECE 242 Data Structures and Algorithms. Hash Tables I. Lecture 24. Prof.

Similar documents
ECE 242 Data Structures and Algorithms. Hash Tables II. Lecture 25. Prof.

Final- ECE 242 Fall 2016 Closed book/notes- no calculator- no phone- no computer

Topic HashTable and Table ADT

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Hash Tables. Hashing Probing Separate Chaining Hash Function

AAL 217: DATA STRUCTURES

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

Understand how to deal with collisions

Chapter 27 Hashing. Liang, Introduction to Java Programming, Eleventh Edition, (c) 2017 Pearson Education, Inc. All rights reserved.

Hashing. It s not just for breakfast anymore! hashing 1

Lecture Notes on Hash Tables

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Open Addressing: Linear Probing (cont.)

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

HASH TABLES. Hash Tables Page 1

Introduction to Hashing

The dictionary problem

Hash Table and Hashing

CSE 214 Computer Science II Searching

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Lecture 18. Collision Resolution

Adapted By Manik Hosen

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

COMP171. Hashing.

Chapter 27 Hashing. Objectives

CS 350 : Data Structures Hash Tables

9/24/ Hash functions

Hash Tables. A hash table is a data structure that offers very fast insertion IN THIS CHAPTER. Introduction to Hashing.

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

ECE 242 Data Structures and Algorithms. Heaps I. Lecture 22. Prof. Eric Polizzi

Module 3: Hashing Lecture 9: Static and Dynamic Hashing. The Lecture Contains: Static hashing. Hashing. Dynamic hashing. Extendible hashing.

Standard ADTs. Lecture 19 CS2110 Summer 2009

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

TABLES AND HASHING. Chapter 13

Data Structures And Algorithms

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Lecture 12 Notes Hash Tables

ECE 242 Data Structures and Algorithms. Advanced Sorting I. Lecture 16. Prof.

ECE 242 Data Structures and Algorithms. Advanced Sorting II. Lecture 17. Prof.

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Chapter 20 Hash Tables

CSCD 326 Data Structures I Hashing

Hash table basics mod 83 ate. ate. hashcode()

Priority Queue. 03/09/04 Lecture 17 1

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

CSE 373 Autumn 2012: Midterm #2 (closed book, closed notes, NO calculators allowed)

UNIT III BALANCED SEARCH TREES AND INDEXING

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

DATA STRUCTURES/UNIT 3

Hashing Techniques. Material based on slides by George Bebis

Data Structures. Topic #6

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Hash table basics mod 83 ate. ate

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

Introduction To Hashing

ECE 242 Data Structures and Algorithms. Linked List I. Lecture 10. Prof.

Lecture 16 More on Hashing Collision Resolution

Hash table basics. ate à. à à mod à 83

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

Data Structures and Object-Oriented Design VIII. Spring 2014 Carola Wenk

BINARY HEAP cs2420 Introduction to Algorithms and Data Structures Spring 2015

Lecture 12 Hash Tables

Hashing. Biostatistics 615 Lecture 12

Comp 335 File Structures. Hashing

AP Programming - Chapter 20 Lecture page 1 of 17

Question Bank Subject: Advanced Data Structures Class: SE Computer

lecture23: Hash Tables

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

STRUKTUR DATA. By : Sri Rezeki Candra Nursari 2 SKS

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

More on Hashing: Collisions. See Chapter 20 of the text.

THINGS WE DID LAST TIME IN SECTION

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Hash table basics mod 83 ate. ate

ECE 242 Data Structures and Algorithms. Trees IV. Lecture 21. Prof.

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

DATA STRUCTURES AND ALGORITHMS

Algorithms and Data Structures

Lecture 4. Hashing Methods

Hash Tables. Gunnar Gotshalks. Maps 1

CSC 321: Data Structures. Fall 2016

Part I Anton Gerdelan

CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions. Kevin Quinn Fall 2015

csci 210: Data Structures Maps and Hash Tables

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

Abstract Data Types (ADTs) Queues & Priority Queues. Sets. Dictionaries. Stacks 6/15/2011

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

COMP 103 RECAP-TODAY. Hashing: collisions. Collisions: open hashing/buckets/chaining. Dealing with Collisions: Two approaches

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

CSCI 104 Hash Tables & Functions. Mark Redekopp David Kempe

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

CS 2412 Data Structures. Chapter 10 Sorting and Searching

CS 3410 Ch 20 Hash Tables

Transcription:

ECE 242 Data Structures and Algorithms http//www.ecs.umass.edu/~polizzi/teaching/ece242/ Hash Tables I Lecture 24 Prof. Eric Polizzi

Motivations We have to store some records and perform the following Insert new record Search a record by key Delete a record What is the optimal data structure that allows fast insertion and searching? Unordered array insert optimal O(1), search slow O(N) Ordered array insert slow O(N), search fast O(logN) Linked list insert optimal O(1), search slow O(N) Binary Search Tree (balanced) insert fast O(logN), search fast O(logN) Hash Table insert optimal O(1), search optimal O(1)...? 2

Motivations More comparisons between general purpose data structures Some disadvantages for Hash tables They are based on arrays, so difficult to expand once created Performances may degrade significantly if the hash table is too full There is no convenient way to traverse a hash table in any kind of order Hash table is an ideal choice if you do not need to visit items and you can predict in advance the size of the database 3

Hash Tables Introduction to Hashing Example Access students records by ID (here 1000 students in the database) ID NAME SCORE 000 001 002... 303 304... andy betty david peter mary 81.5 90 56.8 998 tom 73 999 bill 49 Best Strategy use an array of student objects, ID numbers (called keys) can be chosen as the index numbers. O(1) for insert, remove, search In real situations, however, keys are not distributed in such orderly fashion 20 100 4

Hash Tables Introduction to Hashing Example Access students records by ID (1000 students in the database) ID NAME SCORE Obviously, we cannot create an array of ID of size 10,000,000 where almost all cells will also be empty ( waste of memory) 0012345 0033333 0056789... 9801010 9802020... andy betty david peter mary 81.5 90 56.8 20 100 9903030 tom 73 9908080 bill 49 ID NAME SCORE 0 12345 33333 56789 9908080 9999999 andy 81.5 betty 90 david 56.8 bill 49 5

Hash Tables Introduction to Hashing Goal Find a way to squeeze a very large range of keys into a table of much smaller range We then introduce the function int hash(key) key can be an integer, String, etc. hash represents a mapping, the method converts a number in a large range into a number that belongs to a smaller manageable range A simple approach with key being an integer is to use the modulo operator (%), which finds the remainder of the division smallnumber=largenumber%smallrange Example smallrange=1000 and key are students large ID hash(0012345)=345 hash(0033333)=333 hash(0056789)=789 hash(9908080)=80 6

Hash Tables Introduction to Hashing Example To store a record, we compute hash(id), and store it at the new location in the smaller range array 0 80 333 345 789 999 ID NAME SCORE 9908080 0033333 0012345 0056789 bill 49 betty 90 andy 81.5 david 56.8 To search for a student, we just need to peek at the location hash(targetid) 7

Hash Tables Comments Some desirable properties of a hash function Simple and quick to calculate Even distribution for the return indexes However, there is no guarantee that two keys or more won't hash to the same array index This situtation is called a collision 0012345 andy 81.5 345 ID1322345, Luke 61.5 It is not possible to avoid collisions while hashing, however there exist two main strategies for addressing this problem 1 Open addressing 2 Separate Chaining 8

Open Addressing Definition First we need to consider an array with a size greater than the number of expected data items ( x2 size) If no collision happens, the array will be half full. Using open addressing, if the new data can't be placed at the index calculated by the hash function (if the cell is already occupied), it must find another location in the array. There exist 3 main methods of open addressing which depend on the way to determine the next vacant cell Linear probing (easy, but problem of clustering may appear) Quadratic probing (effective) Double Hashing (preferred) 9

Open Addressing Linear Probing We search sequentially for vacant cells Insertion example compute hash value for a given key e.g. 345 If 345 cell is occupied when we try to insert a new data item, we go to 346, and then 347 and so on, until we find an empty cell Search example Compute hash value for a given key e.g. 345 Start probing at 345, and increment the probe until the key of a new probe cell equal to the search key the new probe cell is empty (search has failed) Comments on Delete Deleted items should be marked with a special key (example 1) The insert method should consider the deleted cell as empty The search method should treat the deleted cell as filled (and search further along) 10

Open Addressing Linear Probing More examples with search Todo Java applet hash.html 11

Open Addressing Linear Probing class HashTable{ private DataItem table; private int size; public HashTable(int size){ this.size=size; table = new DataItem[size]; } public int hash(int key){return key%size; } public void insert(dataitem item){ int h = hash(item.getkey()); // key while (table[h]!=null && table[h].getkey()!=-1) { // until empty spot or -1 h=(h+1)%size;//if occupied,increment by 1 } table[h]=item; // enter item into table } public DataItem find(int key){ int h = hash(key); // compute hash of item while (table[h]!=null && table[h].getkey()!=key) { // find matching item or null h=(h+1)%size; // if no match, increment by 1 } return table[h]; // return item or null } } 12

Open Addressing Linear Probing Execution example (complete code Textbook p535) 13

Open Addressing Linear Probing limitations As the hash table gets more full, clustering grows larger; It can result in very long probe length and degrade performance from O(1) to O(M), M being the width of the cluster. The load factor is defined as the ratio between the number of items in the table and the size of the table. If load increases, the probe length P grows larger. After some calculations (see Knuth, and textbook p567), one can obtain some formula for P relative to successful or unsuccessful searches. It is critical to design a hash table to ensure that it never becomes more than half or at most two third full Expanding the array is also problematic (cannot use a direct copy), it involves a time consuming rehashing process 14