Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Agenda. Motivation; Prehash; Hashing; Hash Functions; Collisions; Separate Chaining; Open Addressing.

Motivation. The hash table is one of the most important data structures in computer science. Applications include databases, authentication systems, spell checking, network routers, cryptography, compilers, and file/directory synchronization.

Hash Tables. We are familiar with direct-access structures and linear-access structures. Both have their advantages and disadvantages.

Hash Tables. The main reason one might avoid direct-access structures is that their size must be allocated in advance. We tend to assume that the number of keys actually stored is equal to the size of the universe of possible keys.

Hash Tables. 1. In some cases we cannot access an element directly because the key is non-trivial: it is not necessarily an index in an array or something that can easily be used as an index. 2. In some problems the number of keys to be stored is (much) smaller than the number of keys in the universe. In this case, if we use an array, we might waste a lot of space.

Prehash. Map non-trivial keys to nonnegative integers. In theory, keys are finite and discrete: anything on a computer can be written down as a string of bits, and a string of bits represents an integer. In practice, it is slightly different. Ideally, prehash(x) = prehash(y) if and only if x = y.
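As a rough illustration of the "string of bits" idea, the sketch below prehashes a text key by reading its UTF-8 bytes as one big integer. The function name `prehash` and the leading 0x01 marker byte are assumptions of this sketch, not part of the slides; the marker is there only so that keys beginning with zero bytes do not collapse onto shorter keys.

```python
def prehash(key: str) -> int:
    """Map a string to a nonnegative integer by reading its bytes as a number.

    Assumption of this sketch: a leading 0x01 byte is prepended so that
    distinct strings always give distinct integers.
    """
    data = b"\x01" + key.encode("utf-8")
    return int.from_bytes(data, byteorder="big")


print(prehash("a"))      # bytes 0x01 0x61 -> 353
print(prehash("hash"))   # a larger integer; equal keys always give equal values
```

The resulting integers can be very large, which is why a hash function is still needed to reduce them to table indices.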

Hash Functions. Reduce the universe U of all keys (integers) down to a reasonable size m for the table: $h : U \to \{0, 1, \ldots, m-1\}$, where m is the size of the table.

Hash Functions. Pigeonhole principle: if n pigeons (items) are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one item.

Hash Functions. Most hash functions assume that keys are natural numbers. What makes a good hash function? One that satisfies the assumption of uniform hashing.

Simple Uniform Hashing. Uniformity: each key is equally likely to be hashed to any slot of the table. Independence: each key is hashed to a slot independently of the slots to which other keys were hashed. Unfortunately, this is rarely achievable, since it requires knowing the probability distribution of the keys.

Common Hash Functions. The division method is based on $h(k) = k \bmod m$, where m is the size of the hash table. Good values of m are crucial. These are normally prime numbers close to n divided by the desired average number of probes. For instance, if we want to store 4000 numbers and we don't mind doing 4 probes, choose m to be a prime close to 1000; in this case, 997.
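The rule of thumb above can be sketched in a few lines. The helper names (`is_prime`, `choose_table_size`, `division_hash`) are hypothetical, and picking the prime nearest to n divided by the number of probes is just one reasonable reading of "a prime close to" that value.

```python
def is_prime(x: int) -> bool:
    """Trial division; adequate for table sizes of modest magnitude."""
    if x < 2:
        return False
    d = 2
    while d * d <= x:
        if x % d == 0:
            return False
        d += 1
    return True


def choose_table_size(n_keys: int, avg_probes: int) -> int:
    """Return the prime nearest to n_keys / avg_probes (lower prime wins ties)."""
    target = max(2, n_keys // avg_probes)
    offset = 0
    while True:
        if is_prime(target - offset):
            return target - offset
        if is_prime(target + offset):
            return target + offset
        offset += 1


def division_hash(key: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return key % m


m = choose_table_size(4000, 4)
print(m)                       # 997
print(division_hash(4000, m))  # 4000 mod 997 = 12
```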

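Common Hash Functions. The multiplication method is based on $h(k) = \lfloor m \, (kA \bmod 1) \rfloor$, where A is a constant between 0 and 1. This is a good choice because it does not depend on m alone. Knuth suggested that a good general A is $(\sqrt{5} - 1)/2 \approx 0.6180339887$. This comes from the golden ratio, which is given (approximately) by $\phi = (1 + \sqrt{5})/2 \approx 1.6180339887$; A is its fractional part. Example: storing the number 765 into a table of size 45 gives us $h(765) = \lfloor 45 \cdot (765 \cdot 0.6180339887 \bmod 1) \rfloor = \lfloor 45 \cdot 0.7960 \rfloor = 35$.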
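A minimal sketch of the multiplication method follows, assuming the usual textbook form floor(m * (k * A mod 1)) with Knuth's constant for A. The function name `multiplication_hash` is hypothetical, and floating-point arithmetic is used for brevity; a production version would more likely use fixed-width integer arithmetic.

```python
import math

# Knuth's suggested constant: the fractional part of the golden ratio.
A = (math.sqrt(5) - 1) / 2  # approximately 0.6180339887


def multiplication_hash(key: int, m: int, a: float = A) -> int:
    """Multiplication method: take the fractional part of key * a, scale by m."""
    fractional = (key * a) % 1.0
    return math.floor(m * fractional)


print(multiplication_hash(765, 45))  # 35
```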

[Figure: pictorial view of a hash table, with keys k1, k2, k3, and k4 mapped to slots.]

[Figure: pictorial view of a collision among the keys k1, k2, k3, k4, and k5.]

Order Preservation. Order preservation of hash functions is similar to the stability property in sorting. Given an ordering of the keys to be hashed, we should expect their hash values to follow the same ordering. What is the importance of this characteristic? Let's discuss it when we talk about collision resolution.

Collision Resolution. Because we are mapping elements to a normally smaller domain of slots, collisions are likely to happen. There are two classes of collision resolution. Separate chaining: the table points to structures holding the elements that collide. Open addressing: the elements being hashed are stored in the table itself and not in a separate structure.

Separate Chaining. The most common resolution mechanism is called separate chaining, or just chaining. It consists of mixing the concepts of linked lists and direct-access structures like arrays. Each slot of a hash table is a pointer to a dynamic structure (say, a linked list or a binary search tree).

Collision Resolution. When hashing a key, if a collision happens, the new key is stored in the linked list at that location. Let's see a concrete example. Suppose that we are mapping the universe of integers into a hash table of size 10. Our hash function may be based on the division method for creating hash values: $h(k) = k \bmod \text{size}$.

Hashing(103). $h(103) = 103 \bmod 10 = 3$. Slot 3: 103.

Hashing(69). $h(69) = 69 \bmod 10 = 9$. Slot 3: 103. Slot 9: 69.

Hashing(20). $h(20) = 20 \bmod 10 = 0$. Slot 0: 20. Slot 3: 103. Slot 9: 69.

Hashing(13). $h(13) = 13 \bmod 10 = 3$. Slot 0: 20. Slot 3: 103, 13. Slot 9: 69.

Hashing(110). $h(110) = 110 \bmod 10 = 0$. Slot 0: 20, 110. Slot 3: 103, 13. Slot 9: 69.

Hashing(53). $h(53) = 53 \bmod 10 = 3$. Slot 0: 20, 110. Slot 3: 103, 13, 53. Slot 9: 69.

Final Hash Table. Slot 0: 20, 110. Slot 3: 103, 13, 53. Slot 9: 69. All other slots are empty.
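The insertions walked through above can be reproduced with a short sketch. Python lists stand in for the linked lists here, which is a simplification of this sketch; `make_table` and `chain_insert` are hypothetical names.

```python
def make_table(size):
    """Create a table whose slots each hold an (initially empty) chain."""
    return [[] for _ in range(size)]


def chain_insert(table, key):
    """Append key to the chain in slot key mod size (the division method)."""
    index = key % len(table)
    table[index].append(key)


table = make_table(10)
for key in [103, 69, 20, 13, 110, 53]:
    chain_insert(table, key)

for index, chain in enumerate(table):
    if chain:
        print(index, chain)
# 0 [20, 110]
# 3 [103, 13, 53]
# 9 [69]
```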

Searching in a Hash Table (assuming chaining). Like any other structure, searching is a common task with hash tables. Searching works as follows. 1. Hash the target. 2. Go to the slot given by the hash value; if the target exists, it must be in that slot. 3. Search the list in that slot using a linear search (assuming linked lists).

Searching for 53. [Figure sequence: $h(53) = 3$ selects slot 3, and a pointer temp walks the chain 103, 13, 53 until it reaches 53.]

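hashsearch(n)
NodeType hashsearch(NodeType table[], int target) {
    int index = hash(target);           /* slot where target must be, if present */
    NodeType temp = table[index];       /* head of the chain stored in that slot */
    return linearsearch(temp, target);  /* walk the chain looking for target */
}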
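The slide's routine relies on helpers it does not define. The sketch below is one possible self-contained rendering in Python, where `Node`, `hash_key`, `hash_insert`, and `linear_search` are assumed stand-ins for the undefined `NodeType`, `hash`, and `linearsearch`, and the hash function is assumed to be the division method used in the running example.

```python
class Node:
    """A singly linked list node holding one key."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


def hash_key(key, size):
    """Division method, as in the running example."""
    return key % size


def hash_insert(table, key):
    """Append key at the tail of the chain in its slot."""
    index = hash_key(key, len(table))
    new_node = Node(key)
    if table[index] is None:
        table[index] = new_node
        return
    tail = table[index]
    while tail.next is not None:
        tail = tail.next
    tail.next = new_node


def linear_search(node, target):
    """Walk the chain; return the matching node, or None if absent."""
    while node is not None:
        if node.key == target:
            return node
        node = node.next
    return None


def hash_search(table, target):
    """Hash the target, go to its slot, then search that chain linearly."""
    index = hash_key(target, len(table))
    temp = table[index]
    return linear_search(temp, target)


table = [None] * 10
for key in [103, 69, 20, 13, 110, 53]:
    hash_insert(table, key)

print(hash_search(table, 53).key)  # 53
print(hash_search(table, 33))      # None
```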

Analysis of Hash Search. The expected length of a chain for n keys and m slots is $n/m = \alpha$, the load factor.

Analysis of Hash Search. Discussion: using big-O notation, express the performance of hash search in the worst case and in the best case.

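Analysis of Hash Search. Discussion: using big-O notation, express the performance of hash search in the average case. $T(n) = O(1 + \alpha) = O(1)$, where the last step holds when the load factor $\alpha$ is bounded by a constant.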
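To make the load factor concrete, the sketch below computes it for the running example and checks that it equals the average chain length. The helper names are hypothetical, and Python lists again stand in for linked lists.

```python
def build_chained_table(keys, size):
    """Build a chained table using the division method."""
    table = [[] for _ in range(size)]
    for key in keys:
        table[key % size].append(key)
    return table


def load_factor(n_keys, n_slots):
    """alpha = n / m."""
    return n_keys / n_slots


def average_chain_length(table):
    """Mean number of keys per slot, counting empty slots."""
    return sum(len(chain) for chain in table) / len(table)


keys = [103, 69, 20, 13, 110, 53]
table = build_chained_table(keys, 10)

print(load_factor(len(keys), len(table)))  # 0.6
print(average_chain_length(table))         # 0.6
```

An unsuccessful search examines a whole chain, so on average it inspects about alpha keys after the constant-time hash computation.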

Collision Techniques. The techniques based on open addressing are as follows. Linear probing: if position h(key) is occupied, do a linear search in the table until you find an empty slot. The slots are searched in this order: $h(\text{key}), h(\text{key}) + 1, h(\text{key}) + 2, \ldots$, with each value taken modulo the table size. Linear probing suffers from primary clustering.
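Linear probing can be sketched as below, reusing the keys from the chaining example so the two strategies can be compared. The function names are hypothetical, the hash function is assumed to be the division method, and deletion is deliberately left out because it needs extra care under open addressing.

```python
def linear_probe_insert(table, key):
    """Store key in the first free slot at or after its home slot (wrapping)."""
    m = len(table)
    home = key % m
    for c in range(m):
        index = (home + c) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


def linear_probe_search(table, target):
    """Follow the same probe order; an empty slot means the target is absent."""
    m = len(table)
    home = target % m
    for c in range(m):
        index = (home + c) % m
        if table[index] is None:
            return None
        if table[index] == target:
            return index
    return None


table = [None] * 10
for key in [103, 69, 20, 13, 110, 53]:
    linear_probe_insert(table, key)

print(table)
# [20, 110, None, 103, 13, 53, None, None, None, 69]
print(linear_probe_search(table, 53))  # 5
```

The run of occupied slots 3, 4, 5 is a small instance of primary clustering: every key whose home slot falls in that run extends it.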

More Open Addressing Techniques. Quadratic probing is a variant of the above where the term added to the hash result is squared: $h(\text{key}) + c^2$. It suffers from secondary clustering, a milder version of primary clustering. Random probing is another variant where the term added to the hash result is a random number: $h(\text{key}) + \text{random}()$. It causes no clustering, but collisions rapidly lead to O(n) search. Rehashing is a technique where a sequence of hash functions $(h_1, h_2, \ldots, h_k)$ is defined; if a collision occurs, the functions are tried in this order.
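For comparison with linear probing, here is a minimal sketch of quadratic probing on the same keys, assuming the probe sequence h(key) + c^2 for c = 0, 1, 2, ... taken modulo the table size. The function name is hypothetical, and the sketch simply gives up after m probes, since a quadratic sequence is not guaranteed to visit every slot.

```python
def quadratic_probe_insert(table, key):
    """Probe slots home, home + 1, home + 4, home + 9, ... (mod table size)."""
    m = len(table)
    home = key % m
    for c in range(m):
        index = (home + c * c) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 10
for key in [103, 69, 20, 13, 110, 53]:
    quadratic_probe_insert(table, key)

print(table)
# [20, 110, None, 103, 13, None, None, 53, None, 69]
```

Key 53 finds slots 3 and 4 taken and jumps to slot 7, so the occupied run around slot 3 grows more slowly than it does under linear probing.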