Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

HASHING
By Durgesh B Garikipati and Ishrath Munir

Hashing
- Sorting was putting things in nice, neat order
- Hashing is the opposite: put things in a seemingly random order
- Algorithmically determine the position of any given item
- Improve search from O(log2 N) to O(1)

HOW?
- Place everything in an array based on the search key value
- Run the search key through an address calculator (the hash function)
- Insert/retrieve from the location specified by the hash function
- This scheme is termed hashing
- The function is the hash function
- The array is called a hash table

Why not discard other data structures?
- The mapping is not always one-to-one
- A major drawback: this results in collisions (two items mapped to the same location), which adds complexity
- Ordered operations do not work well
- Making the array large enough to avoid collisions is seldom practical

Ordered Operations
- We said earlier that hash tables don't work well for ordered operations
- Now you should know enough to understand why
- Problems:
  - Displaying the users stored in a hash table alphabetically
  - Finding the lowest/highest values in a database
- To do these operations, we need to retrieve all the data in the hash table and sort or compare as we retrieve

Typical Hash Function
- Must be fast and easy to compute
- Must place items evenly throughout the hash table
  - This is the more important of the two, because the biggest performance loss is due to resolving collisions
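As a rough illustration of the scheme described above, the following Python sketch places items in an array through an address calculator and then shows why an ordered operation needs a full scan plus a sort. The names `TABLE_SIZE`, `address`, `insert`, `retrieve` and `values_in_order` are hypothetical, the table size of 11 is an assumption, and collisions are only detected here, not resolved.

```python
TABLE_SIZE = 11  # assumed: a small prime, chosen only for illustration


def address(key: int) -> int:
    """The 'address calculator': map a search key to an array index."""
    return key % TABLE_SIZE


def insert(table: list, key: int, value: str) -> None:
    """Store (key, value) at the slot the hash function picks."""
    slot = address(key)
    if table[slot] is not None and table[slot][0] != key:
        # Two different keys mapped to the same slot; this sketch does not resolve it.
        raise ValueError(f"collision at slot {slot}")
    table[slot] = (key, value)


def retrieve(table: list, key: int):
    """Look in exactly one slot: O(1), no searching."""
    entry = table[address(key)]
    return entry[1] if entry is not None and entry[0] == key else None


def values_in_order(table: list) -> list:
    """An ordered operation: every slot is visited, then the results are sorted."""
    return sorted(entry[1] for entry in table if entry is not None)


users = [None] * TABLE_SIZE
insert(users, 25, "carol")  # 25 % 11 == 3
insert(users, 15, "alice")  # 15 % 11 == 4
insert(users, 7, "bob")     # 7 % 11 == 7
print(retrieve(users, 15))     # alice
print(values_in_order(users))  # ['alice', 'bob', 'carol']
```

Retrieval touches a single slot, while the alphabetical listing has to walk the whole array and sort, which is the cost the slides attribute to ordered operations.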

Different Hash Functions
- Division and remainder method
  - hash(key) = key % ARRAY_SIZE
  - Can distribute array items evenly by choosing a prime number for ARRAY_SIZE
- Folding
  - Add the digits: hash(001364825) = 1 + 3 + 6 + 4 + 8 + 2 + 5 = 29
  - Produces a roughly normal distribution, so the results are too bunched up

Different Hash Functions (continued)
- Universal hashing
  - Select the hash function at random from a designed class of functions
  - The same input can get a different address from one run to the next
- Digit rearrangement
  - Take part of the original value or key, reverse its order, and use that sequence of digits

Collision

Resolving Collision
- 4567 % 101 = 22 = 7597 % 101: a collision
- Two approaches to resolve it:
  - Alter the hash table to allow multiple entries per location
  - Allocate another location within the hash table

Restructure Hash Table
- Multiple items stored in a single array location
- Two schemes to do this:
  - Buckets
  - Separate chaining

Buckets
- Each array location is itself an array, called a bucket
- The problem is choosing the size of these arrays
  - If too small, a bucket will fill up and we must probe
  - If too large, space is wasted

Chaining
- The hash table is an array of linked lists
- Each array element is a pointer to a linked list, the chain
- Insertion: place at the beginning of the chain
- Deletion/retrieval: search the appropriate chain
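The division method, the digit-folding example and separate chaining can be sketched together in Python as below. This is a minimal sketch under assumptions, not a reference implementation: the names `folding_hash`, `Node` and `ChainedHashTable` are hypothetical, the table size of 101 is taken from the collision example, and `insert` does not check for duplicate keys.

```python
def folding_hash(digits: str) -> int:
    """Folding: add the decimal digits of the key."""
    return sum(int(d) for d in digits)


class Node:
    """One link in a chain."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size: int = 101):  # 101 is prime, as in the collision example
        self.size = size
        self.slots = [None] * size  # each slot holds the head of a chain

    def _hash(self, key: int) -> int:
        return key % self.size  # division and remainder method

    def insert(self, key: int, value) -> None:
        """Place the new node at the beginning of the chain: O(1)."""
        i = self._hash(key)
        self.slots[i] = Node(key, value, self.slots[i])

    def retrieve(self, key: int):
        """Search only the chain the key hashes to."""
        node = self.slots[self._hash(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key: int) -> bool:
        """Unlink the first node with this key; return False if absent."""
        i = self._hash(key)
        prev, node = None, self.slots[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.slots[i] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False


table = ChainedHashTable()
table.insert(4567, "first")   # 4567 % 101 == 22
table.insert(7597, "second")  # 7597 % 101 == 22, same chain
print(folding_hash("001364825"))                    # 29
print(table.retrieve(4567), table.retrieve(7597))   # first second
```

Both colliding keys end up in the chain at slot 22, and each is still found by walking that one chain.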

Chaining
- Insertion is O(1)
- Deletion/retrieval depend on chain length
- Searching takes, on average, Θ(1 + α) for a successful search and Θ(1 + α) for an unsuccessful search, where α is the load factor
- Best case: O(1); worst case: O(N)

Open Addressing
- Probe for another empty (open) location
  - Linear probing
  - Quadratic probing
  - Double hashing

Linear Probing
- Probe sequentially until an empty spot is found
- Given key k, the first slot probed is T[h'(k)]; on a collision, continue with T[h'(k) + 1], and so on up to T[m - 1]
- Wrap around if you get to the end
- Additions are straightforward
- Deletions pose a problem

Problems of Linear Probing
- Items tend to cluster together in consecutively occupied locations; this is termed primary clustering
- A cluster tends to increase in size
- Clusters can merge into larger clusters
- Clustering increases the average search time
- Primary clustering makes linear probing inefficient

Quadratic Probing
- Uses a hash function $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$
  - $c_1$, $c_2$: auxiliary constants
  - $i = 0, \ldots, m - 1$
- Positions are offset in a quadratic manner
- Virtually eliminates primary clustering
- Suffers from secondary clustering, since the same probe sequence is used for every key that collides at the same original location

Double Hashing
- One of the best methods of open addressing
- Uses a hash function $h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$
  - $h_1$, $h_2$: auxiliary hash functions
- Drastically reduces clustering
- In the previous methods, the probe offsets are key-independent
- Double hashing uses key-dependent probe sequences
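The three probing schemes can be compared side by side with a small Python sketch. Everything named here is an assumption made for illustration: the function names, the table size m = 11, the constants c1 = c2 = 1, and the choice h2(k) = 1 + (k mod (m - 1)), which is one common way to keep the step nonzero when m is prime. Deletion is left out, since it needs extra care under open addressing.

```python
def linear_probe(k: int, i: int, m: int) -> int:
    """h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m."""
    return (k % m + i) % m


def quadratic_probe(k: int, i: int, m: int, c1: int = 1, c2: int = 1) -> int:
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m."""
    return (k % m + c1 * i + c2 * i * i) % m


def double_hash_probe(k: int, i: int, m: int) -> int:
    """h(k, i) = (h1(k) + i*h2(k)) mod m, with a key-dependent step h2(k)."""
    h1 = k % m
    h2 = 1 + (k % (m - 1))  # assumed choice; never 0
    return (h1 + i * h2) % m


def insert(table: list, key: int, probe) -> int:
    """Follow the probe sequence until an empty slot is found."""
    m = len(table)
    for i in range(m):
        slot = probe(key, i, m)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found along the probe sequence")


def search(table: list, key: int, probe):
    """Return the slot holding key, or None if an empty slot is reached first."""
    m = len(table)
    for i in range(m):
        slot = probe(key, i, m)
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None


# 22, 33 and 44 all have h'(k) = 0 when m = 11, so every scheme must resolve collisions.
for probe in (linear_probe, quadratic_probe, double_hash_probe):
    table = [None] * 11
    print(probe.__name__, [insert(table, k, probe) for k in (22, 33, 44)])
```

With linear probing the three keys land in adjacent slots (the start of a cluster), while quadratic probing and double hashing spread them out; only double hashing gives each key its own step size.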

Double Hashing (continued)
- Choose the first function hash() as usual
- Follow these guidelines for hash2():
  - hash2(key) != 0
  - hash2() != hash()

Hashing Efficiency
- Involves the load factor α
- α = average number of elements per table location (α = n/m for n elements in a table of m locations)
- α is a measure of how full the table is
- As the hash table fills:
  - α increases
  - Chance of collision increases
  - Search time increases
  - Hashing efficiency decreases

Linear Probing
- As collisions increase:
  - Probe sequences get longer
  - Search time increases
- α should not exceed 2/3 for linear probing
- Best case: O(1); worst case: O(N), when a probe has to pass through one long cluster

Quadratic Probing, Double Hashing
- On average, both require fewer comparisons than linear probing
- For α = 2/3, an average unsuccessful search might require at most 3 compares, and a successful one 2
- All three open addressing schemes suffer when the number of insertions and deletions cannot be predicted, as the table may be too small

Good Hash Function
- Easy and fast to compute
- Scatters data evenly through the table
- Check it against various types of data:
  - Random data
  - Nonrandom data
- Two general principles for even distribution:
  - Use the whole key
  - If using modulo arithmetic, the base should be prime
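The α = 2/3 figures quoted for quadratic probing and double hashing can be checked against the textbook estimates of expected probe counts. The Python sketch below is only that, a check under stated assumptions: the `uniform_*` functions use the uniform-hashing bounds (which double hashing approximates), the `linear_*` functions use the classical approximations for linear probing, and none of these formulas or function names appear in the slides themselves.

```python
import math


def load_factor(n_items: int, n_slots: int) -> float:
    """alpha = n / m: how full the table is."""
    return n_items / n_slots


def uniform_unsuccessful(alpha: float) -> float:
    """Expected probes, unsuccessful search, uniform hashing: 1 / (1 - alpha)."""
    return 1 / (1 - alpha)


def uniform_successful(alpha: float) -> float:
    """Expected probes, successful search, uniform hashing: (1/alpha) * ln(1/(1 - alpha))."""
    return (1 / alpha) * math.log(1 / (1 - alpha))


def linear_unsuccessful(alpha: float) -> float:
    """Approximate expected probes, unsuccessful search, linear probing."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)


def linear_successful(alpha: float) -> float:
    """Approximate expected probes, successful search, linear probing."""
    return 0.5 * (1 + 1 / (1 - alpha))


alpha = load_factor(200, 300)  # a table that is two-thirds full
print(round(uniform_unsuccessful(alpha), 2))  # 3.0
print(round(uniform_successful(alpha), 2))    # 1.65
print(round(linear_unsuccessful(alpha), 2))   # 5.0
print(round(linear_successful(alpha), 2))     # 2.0
```

At α = 2/3 the uniform-hashing estimates come out at about 3 probes for an unsuccessful search and under 2 for a successful one, in line with the figures above, while linear probing needs roughly 5 and 2.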
