HASH TABLES.

Similar documents
5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

AAL 217: DATA STRUCTURES

Comp 335 File Structures. Hashing

COMP171. Hashing.

Hashing. October 19, CMPE 250 Hashing October 19, / 25

TABLES AND HASHING. Chapter 13

Topic HashTable and Table ADT

Hashing Techniques. Material based on slides by George Bebis

Hash Table and Hashing

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Algorithms and Data Structures

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Hash Tables. Hashing Probing Separate Chaining Hash Function

Successful vs. Unsuccessful

Hashing for searching

Adapted By Manik Hosen

Fundamental Algorithms

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

Data Structures And Algorithms

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

CS 241 Analysis of Algorithms

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

Lecture 4. Hashing Methods

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

Part I Anton Gerdelan

CSI33 Data Structures

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

Data and File Structures Chapter 11. Hashing

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

CMSC 341 Hashing. Based on slides from previous iterations of this course

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

HASH TABLES. Goal is to store elements k,v at index i = h k

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

Understand how to deal with collisions

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

CS 350 : Data Structures Hash Tables

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

Dictionaries and Hash Tables

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

DATA STRUCTURES AND ALGORITHMS

The dictionary problem

Open Addressing: Linear Probing (cont.)

Cpt S 223. School of EECS, WSU

CS 310 Advanced Data Structures and Algorithms

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Outline. hash tables hash functions open addressing chained hashing

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Worst-case running time for RANDOMIZED-SELECT

CSE 214 Computer Science II Searching

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Hash[ string key ] ==> integer value

1. Attempt any three of the following: 15

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

BBM371& Data*Management. Lecture 6: Hash Tables

Quiz 3 CSCI 333 Fall December, 2001

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

Hashing 1. Searching Lists

Fast Lookup: Hash tables

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

Data Structures and Algorithms. Chapter 7. Hashing

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

Dictionaries-Hashing. Textbook: Dictionaries ( 8.1) Hash Tables ( 8.2)

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Data Structures and Algorithms. Roberto Sebastiani

Hashing. Hashing Procedures

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

III Data Structures. Dynamic sets

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

Chapter 27 Hashing. Liang, Introduction to Java Programming, Eleventh Edition, (c) 2017 Pearson Education, Inc. All rights reserved.

Introduction to Hashing

2 Fundamentals of data structures

Hashing. Prof. Dr. Debora Weber-Wulff Informatik 2

b) Describe a procedure visitors should follow to find a free parking space, when the space they are assigned is occupied.

Chapter 20 Hash Tables

HASH TABLES. Hash Tables Page 1

Quiz 1 Practice Problems

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

1 / 22. Inf 2B: Hash Tables. Lecture 4 of ADS thread. Kyriakos Kalorkoti. School of Informatics University of Edinburgh

Lecture 16 More on Hashing Collision Resolution

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Hash Tables Hash Tables Goodrich, Tamassia

Data Structures and Algorithm Analysis (CSC317) Hash tables (part2)

AdvanceDataStructures-Unit-1(Dictionaries)

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

Dictionaries and Hash Tables

Data Structures and Object-Oriented Design VIII. Spring 2014 Carola Wenk

Question Bank Subject: Advanced Data Structures Class: SE Computer

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

of characters from an alphabet, then, the hash function could be:

Chapter 27 Hashing. Objectives

Advanced Algorithmics (6EAP) MTAT Hashing. Jaak Vilo 2016 Fall

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Transcription:

1 HASH TABLES http://en.wikipedia.org/wiki/hash_table

2 Hash Table A hash table (or hash map) is a data structure that maps keys (identifiers) into a certain location (bucket) A hash function changes the key into an index value (or hash value)

3 Hash Tables The Hash Table has a fixed length. We'll see how to add space dynamically later. Keys Ryan, D Williams, C Khoja, S hash(ryan, D) = 01 hash(williams, C) = 06 hash(khoja, S) = 03 Hash Function Bucket Data 01 Ryan, D 02 03 Khoja, S 04 05 06 Williams, C 07

4 A Problem We want to create a fast dictionary that we can use to look up the definition of 2-letter words (e.g. on, to, so, no). The number of possible words is 26 * 26 = 676. Create an array with 676 entries. Write a function to map each 2-letter word to a unique integer number within the array range [0, 675]. Use the integer as an index to the array, and place the definition of the word into that array position.

5 A Solution no 0 Function: 26 * ( c 1 a ) + ( c 2 a ) on to 352 def no Result for no : 26 * ( n a ) + ( o a ) = 377 def on 26 * ( 14 1 ) + ( 15 1 ) = 352 Result for on = 377 508 def to Result for to = 508 675

6 Another Problem Q. How big would the hash table be if we wanted to store every word in the English language? Keep in mind that the longest word in the dictionary has 45 letters. The longest word in the dictionary is pneumonoultramicroscopicsilicovolcanoconioisis and it s a type of lung disease. The size of the table needed to store all English words of 45 letters or less is 26 45. Of course it is impossible to create and store such a large table, and it would also be wasteful since fewer than a million of those words would be valid English words. What is the solution? Use a different hash function!

7 A Solution Create an array large enough to hold all the words in the English language. Map the words to the array indexes using a hash function. H ( key ) = key mod table-size A hash function should: Be easy to implement and quick to compute. Achieve an even distribution over the hash table. The table-size is normally chosen to be a prime number to avoid uneven distribution

8 A Simple Example 91 0 1 36 36 2 3 10 4 5 77 83 77 9 6 7 8 9 10 11 91 9 10 83

9 Hash Functions Hash function: key % table-size = array index 10 % 12 = 10 91 % 12 = 7 83 % 12 = 11 36 % 12 = 0 77 % 12 = 5 9 % 12 = 9 0 1 2 3 4 5 6 7 8 9 10 11 36 77 91 9 10 83

10 Collision What happens when we try and map the following keys onto the same array? This is called a collision! 48 65 0 1 2 3 4 5 6 7 8 9 10 11 36 27 77 91 9 10 83

11 Avoiding Collision We could try to avoid this collision by using a different hash function. Choosing a good hash functions is hard, so instead, we concentrate on how to resolve collisions.

12 Collisions Perfect Hash - each key maps to an empty bucket Rare! Collisions occur where two different keys map to the same bucket Solution?

13 Hash Function Hash function compute the key's bucket address from the key some function h(k) maps the domain of keys K into a range of addresses 0, 1, 2, M-1 The Problem Finding a suitable function h Determining a suitable M Handling collisions

14 Hash Function Mid Square (turn the key into an integer) square the key take some number of bits from the center to form the bucket address Advantages? Disadvantages?

15 Example Problem: Let's assume that the key value is simply the sum of the ASCII values squared. If the key value is 16-bits and we take the middle 8-bits: How big is the hash table? What is the range of bucket addresses? Where does the key AB map to in the hash table?

16 Implementation section 2.9 of K&R How do we access the middle 8 in an integer? // assume 4 byte integers unsigned int key = 0x1231a456; unsigned int middle; middle = (key & 0x000ff000) >> 12; printf("%08x %08x\n", key, middle); pad with zero 8 wide hexadecimal output http://www.eskimo.com/~scs/cclass/int/sx4ab.html One Hex-digit is 4 bits

17 Hash Function Division Hashing bucket = key % N N is the length of the hash table AND a prime number a) How big is the hash table? b) What is the range of bucket addresses? c) Where does the key AB map to in the hash table? Advantages? Disadvantages?

18 Collision Handling Techniques Open Addressing: if a collision occurs, then an empty slot is used to store the record. Techniques for selecting the empty slot include: Linear Probing. Quadratic Probing. Re-hashing (double hashing). Separate Chaining: if a collision occurs, then this new record is attached to the existing record in the form of a chain (linked list). This way, several records will share a slot.

19 Collision Handling Open Addressing If both K and C map to the same bucket we have a collision K and C are distinct K is inserted first To resolve using OA, find another unoccupied space for C BUT: We must do this systematically so we can find C again easily! Analysis: (summation of the # of probes to locate each key in the table) / # of keys in the table

20 Open Addressing Find another open bucket bucket = ( h(k) + f(i) ) % N N is the length of the table h(k) : original hash of key K f(i) : i is the number of times you have hashed and failed to find an empty slot First hash is: bucket = ( h(k) + f(0) ) % N f(0) = 0

21 Linear Probing If a collision occurs, then the next empty slot that is found is used to store the data item. If position h(n) is occupied, then try ( h(n) + 1 ) mod table-size, ( h(n) + 2 ) mod table-size, and so on until an empty slot is found. Problems with linear probing include: Formation of bad clusters. Since gaps are being filled, the collision problem is exacerbated.

22 Linear Probing f(i) = i Example: h(kn) = n % 11 Insert M13 G7 Q17 Y25 R18 Z26 F6 Bucket 0 1 2 3 4 5 6 7 8 9 10 Data Primary Clustering!