Topic 22 Hash Tables

"hash collision n. [from the techspeak] (var. `hash clash') When used of people, signifies a confusion in associative memory or imagination, especially a persistent one (see thinko). True story: One of us was once on the phone with a friend about to move out to Berkeley. When asked what he expected Berkeley to be like, the friend replied: 'Well, I have this mental picture of naked women throwing Molotov cocktails, but I think that's just a collision in my hash tables.'" -The Hacker's Dictionary

Programming Pearls by Jon Bentley
Jon was a senior programmer on a large programming project. Senior programmers spend a lot of time helping junior programmers.
Junior programmer to Jon: "I need help writing a sorting algorithm."
CS314 Hash Tables 2

A Problem From Programming Pearls
Jon: Why do you want to write your own sort at all? Why not use a sort provided by your system?
Junior programmer: I need the sort in the middle of a large system, and for obscure technical reasons, I can't use the system file-sorting program.
Jon: What exactly are you sorting? How many records are in the file? What is the format of each record?
Junior programmer: The file contains at most ten million records; each record is a seven-digit integer.
Jon: Wait a minute. If the file is that small, why bother going to disk at all? Why not just sort it in main memory?
Junior programmer: Although the machine has many megabytes of main memory, this function is part of a big system. I expect that I'll have only about a megabyte free at that point.
Jon: Is there anything else you can tell me about the records?
Junior programmer: Each one is a seven-digit positive integer with no other associated data, and no integer can appear more than once.

System Sort

Questions
When did this conversation take place?
What were they sorting?
How do you sort data when it won't all fit into main memory?
Speed of file I/O?

A Solution
/* phase 1: initialize set to empty */
for i = [0, n)
    bit[i] = 0
/* phase 2: insert present elements into the set */
for each i in the input file
    bit[i] = 1
/* phase 3: write sorted output */
for i = [0, n)
    if bit[i] == 1
        write i on the output file
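The three phases above can be sketched in Java roughly as follows. This is a minimal illustration, not the code from Programming Pearls: the class name `BitmapSortDemo` and the method `bitmapSort` are made up for this example, and an in-memory array stands in for the input and output files. It assumes every value is distinct and lies in [0, n). One bit per possible value means ten million values need about 1.25 million bytes.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class BitmapSortDemo {
    // Sorts distinct ints in the range [0, n) using one bit per possible value.
    static List<Integer> bitmapSort(int[] input, int n) {
        BitSet bits = new BitSet(n);      // phase 1: every bit starts out 0
        for (int value : input) {
            bits.set(value);              // phase 2: mark each value as present
        }
        List<Integer> sorted = new ArrayList<>();
        // phase 3: visit the set bits in increasing order
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            sorted.add(i);
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] input = {9_999_999, 42, 1_234_567, 7};
        System.out.println(bitmapSort(input, 10_000_000)); // [7, 42, 1234567, 9999999]
    }
}
```

The value itself serves as the array index, which is the idea hash tables generalize to keys that are not small integers.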

Some Structures so Far
ArrayLists
  O(1) access
  O(N) insertion (average case), better at end
  O(N) deletion (average case)
LinkedLists
  O(N) access
  O(N) insertion (average case), better at front and back
  O(N) deletion (average case), better at front and back
Binary Search Trees
  O(log N) access if balanced
  O(log N) insertion if balanced
  O(log N) deletion if balanced

Why are Binary Trees Better?
Divide and conquer: reducing work by a factor of 2 each time
Can we reduce the work by a bigger factor? 10? 1000?
An ArrayList does this in a way when accessing elements
  but must use an integer value
  each position holds a single element

Hash Tables
Hash tables overcome the problems of ArrayList while maintaining fast access, insertion, and deletion in terms of N (the number of elements already in the structure).
Hash tables use an array and hash functions to determine the index for each element.

Hash Functions
Hash: "From the French hacher, which means 'to chop'."
to hash: to mix randomly or shuffle (to cut up, to slash or hack about; to mangle)
Hash function: take a large piece of data and reduce it to a smaller piece of data, usually a single integer.
  A function or algorithm
  The input need not be integers!

Hash Function
[Figure: sample keys (5/5/1967, 555389085, 5122466556, "Mike Scott", scottm@gmail.net, "Isabelle") go into a hash function, which produces an integer such as 12.]

Simple Example
Assume we are using names as our key.
Take the 3rd letter of the name, take the int value of the letter (a = 0, b = 1, ...), divide by 6 and take the remainder.
What does "Bellers" hash to?
L -> 11 -> 11 % 6 = 5

Result of Hash Function
Mike = (10 % 6) = 4
Kelly = (11 % 6) = 5
Olivia = (8 % 6) = 2
Isabelle = (0 % 6) = 0
David = (21 % 6) = 3
Margaret = (17 % 6) = 5 (uh oh)
Wendy = (13 % 6) = 1
This is an imperfect hash function. A perfect hash function yields a one-to-one mapping from the keys to the hash values.
What is the maximum number of values this function can hash perfectly?
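The third-letter rule can be tried out with a short Java sketch. The class and method names here (`ThirdLetterHashDemo`, `thirdLetterHash`) are invented for illustration, and the code assumes every name has at least three characters and that the third one is an English letter.

```java
// Hypothetical demo class; names are illustrative only.
class ThirdLetterHashDemo {
    // Hash = (position of the 3rd letter in the alphabet, with a = 0) mod 6.
    // Assumes the name has at least 3 characters and the 3rd is a letter a-z.
    static int thirdLetterHash(String name) {
        char third = Character.toLowerCase(name.charAt(2));
        return (third - 'a') % 6;
    }

    public static void main(String[] args) {
        String[] names = {"Mike", "Kelly", "Olivia", "Isabelle", "David", "Margaret", "Wendy"};
        for (String name : names) {
            System.out.println(name + " -> " + thirdLetterHash(name));
        }
    }
}
```

Kelly and Margaret both land on 5, which is the collision flagged with "uh oh" in the list.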

Another Hash Function
Assume the hash function for String adds up the Unicode value for each character.
public int hashCode(String s) {
    int result = 0;
    for (int i = 0; i < s.length(); i++)
        result += s.charAt(i);
    return result;
}
Hashcode for "DAB" and "BAD"?
A. 301 103
B. 4 4
C. 412 214
D. 5 5
E. 199 199

More on Hash Functions
Transform the key (which may not be an integer) into an integer value.
The transformation can use one of four techniques:
  Mapping
  Folding
  Shifting
  Casting

Hashing Techniques
Mapping
  As seen in the example
  integer values or things that can be easily converted to integer values in key
Folding
  partition key into several parts and the integer values for the various parts are combined
  the parts may be hashed first
  combine using addition, multiplication, shifting, logical exclusive OR

Shifting
More complicated with shifting
int hashval = 0;
int i = str.length() - 1;
while (i >= 0) {   // visit every character, including index 0
    hashval = (hashval << 1) + (int) str.charAt(i);
    i--;
}
different answers for "dog" and "god"
Shifting may give a better range of hash values when compared to just folding
Casts
Very simple: essentially casting as part of fold and shift when working with chars.
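To see why shifting separates "dog" from "god" while plain addition does not, the two approaches can be compared side by side. This is a sketch with made-up names (`ShiftHashDemo`, `additiveHash`, `shiftHash`); the shift loop here runs all the way down to index 0 so that every character contributes.

```java
// Hypothetical demo class; names are illustrative only.
class ShiftHashDemo {
    // Folding by addition only: the order of the characters is lost.
    static int additiveHash(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            result += s.charAt(i);
        }
        return result;
    }

    // Shift-and-add, walking from the last character down to index 0.
    static int shiftHash(String s) {
        int hashval = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            hashval = (hashval << 1) + s.charAt(i);
        }
        return hashval;
    }

    public static void main(String[] args) {
        System.out.println(additiveHash("dog") + " " + additiveHash("god")); // 314 314
        System.out.println(shiftHash("dog") + " " + shiftHash("god"));       // 734 725
    }
}
```

Because each shift doubles the running total before the next character is added, a character's position changes its weight, so anagrams no longer hash to the same value automatically.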

The Java String class hashCode method
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char[] val = value;
        int len = count;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
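The core of that method is the loop `h = 31 * h + val[i]`. A standalone version, without the caching in the `hash` field, might look like the sketch below; `PolynomialHashDemo` and `polynomialHash` are names invented for this example. It should agree with `String.hashCode()`, whose documented formula is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].

```java
// Hypothetical demo class; names are illustrative only.
class PolynomialHashDemo {
    // Same arithmetic as the loop in String.hashCode(), without the cached field.
    // int overflow wraps around silently, which is intended here.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("ab"));                           // 3105
        System.out.println(polynomialHash("dog") == "dog".hashCode());      // true
        System.out.println(polynomialHash("dog") == polynomialHash("god")); // false
    }
}
```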

Mapping Results
Transform the hashed key value into a legal index in the hash table.
A hash table normally uses an array as its underlying storage container.
Normally get the location in the table by taking the result of the hash function, dividing by the size of the table, and taking the remainder:
  index = key mod n
  n is the size of the hash table
  empirical evidence shows a prime number is best
  for a 1000 element hash table, make it 997 or 1009 elements

Mapping Results
"Isabelle" -> hashCode method -> 230492619
230492619 % 997 = 177
[Figure: array with indices 0 to 996; "Isabelle" is stored at index 177.]
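In Java the mapping step needs one extra precaution: `hashCode()` may return a negative int, and `%` keeps the sign of the dividend, so a plain remainder can produce a negative index. One common way around this is `Math.floorMod`, as in the sketch below. The names `IndexMappingDemo` and `indexFor` are made up for illustration, and the 230492619 value is simply the number shown on the slide.

```java
// Hypothetical demo class; names are illustrative only.
class IndexMappingDemo {
    // Maps any key to a legal index in [0, tableSize).
    // floorMod always returns a non-negative result for a positive tableSize.
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        // An Integer's hashCode is its own value, so it is easy to check by hand.
        System.out.println(indexFor(230492619, 997)); // 177
        System.out.println(-7 % 997);                 // -7, not a legal index
        System.out.println(indexFor(-7, 997));        // 990
    }
}
```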

Handling Collisions
What to do when inserting an element and something is already present at that index?

Open Addressing
Could search forward or backwards for an open space
Linear probing: move forward 1 spot. Open? If not, 2 spots, 3 spots, ...
  reach the end? Wrap around to the start of the array
When removing, insert a blank marker
  a slot is null if never occupied, blank if once occupied
Quadratic probing: 1 spot, 4 spots, 9 spots, 16 spots, ... from the original index (the i-th probe is i * i spots away)
Resize when load factor reaches some limit
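A small linear-probing set illustrates the null versus blank distinction. This is a sketch under stated assumptions, not the course's HashTable class: the name `LinearProbingSetDemo`, the starting capacity of 11, the load limit of one half, and the growth rule are all choices made for this example.

```java
// Hypothetical demo class; capacity, load limit, and names are assumptions.
class LinearProbingSetDemo {
    private static final Object DELETED = new Object(); // the "blank" marker
    private Object[] slots = new Object[11];            // null = never occupied
    private int size = 0;                                // live elements
    private int used = 0;                                // live elements + blanks

    private int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    boolean contains(Object key) {
        int i = indexFor(key);
        // A null slot ends the search; a blank slot does not.
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i] != DELETED && slots[i].equals(key)) return true;
            i = (i + 1) % slots.length;                  // wrap around at the end
        }
        return false;
    }

    boolean add(Object key) {
        if (contains(key)) return false;
        if ((used + 1) * 2 > slots.length) resize();     // keep load factor <= 1/2
        int i = indexFor(key);
        while (slots[i] != null && slots[i] != DELETED) {
            i = (i + 1) % slots.length;
        }
        if (slots[i] == null) used++;                    // reusing a blank adds no load
        slots[i] = key;
        size++;
        return true;
    }

    boolean remove(Object key) {
        int i = indexFor(key);
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i] != DELETED && slots[i].equals(key)) {
                slots[i] = DELETED;                      // leave a blank, not null
                size--;
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    private void resize() {
        Object[] old = slots;
        slots = new Object[old.length * 2 + 1];
        size = 0;
        used = 0;
        for (Object o : old) {
            if (o != null && o != DELETED) add(o);       // rehash into the new array
        }
    }

    int size() { return size; }
}
```

With 11 slots, the keys 3, 14, and 25 all start at index 3 and end up in slots 3, 4, and 5. Removing 14 leaves a blank in slot 4, so a later search for 25 keeps probing past it instead of stopping early.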

Closed Addressing: Chaining
Each element of the hash table is another data structure
  linked list, balanced binary tree
More space, but somewhat easier
  everything goes in its spot
What happens when resizing? Why don't things just collide again?
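One way to explore the resizing question is a small chained set of Integer keys. The sketch below is illustrative only: `ChainedSetDemo`, the starting capacity of 7, the load limit of 1.0, and the growth rule are assumptions made for this example. The point to notice is that the index is recomputed against the new array length during a resize.

```java
import java.util.LinkedList;

// Hypothetical demo class; capacity, load limit, and names are assumptions.
class ChainedSetDemo {
    private LinkedList<Integer>[] buckets = newTable(7);
    private int size = 0;

    @SuppressWarnings("unchecked")
    private static LinkedList<Integer>[] newTable(int capacity) {
        LinkedList<Integer>[] table = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            table[i] = new LinkedList<>();
        }
        return table;
    }

    static int indexFor(Integer key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    boolean contains(Integer key) {
        return buckets[indexFor(key, buckets.length)].contains(key);
    }

    boolean add(Integer key) {
        if (contains(key)) return false;
        if (size + 1 > buckets.length) resize();          // load factor limit of 1.0
        buckets[indexFor(key, buckets.length)].add(key);  // everything goes in its spot
        size++;
        return true;
    }

    private void resize() {
        LinkedList<Integer>[] old = buckets;
        buckets = newTable(old.length * 2 + 1);
        for (LinkedList<Integer> chain : old) {
            for (Integer key : chain) {
                // Index is recomputed with the new length, so old chains split up.
                buckets[indexFor(key, buckets.length)].add(key);
            }
        }
    }

    int size() { return size; }
    int capacity() { return buckets.length; }

    public static void main(String[] args) {
        System.out.println(indexFor(3, 7) + " " + indexFor(10, 7));   // 3 3
        System.out.println(indexFor(3, 15) + " " + indexFor(10, 15)); // 3 10
    }
}
```

Keys 3 and 10 share a chain when there are 7 buckets but land in different chains with 15, because the remainder is taken with respect to a different table size. Some keys will still collide after a resize, but the chains are shorter on average.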

Hash Tables in Java
hashCode method in Object
hashCode and equals
  "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result."
  if you override equals you need to override hashCode
Overriding one of equals and hashCode, but not the other, can cause logic errors that are difficult to track down.
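A minimal example of overriding both methods together is sketched below. `Point` is a hypothetical class invented for this illustration; `Objects.hash` is one convenient way to combine fields, not the only one.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo class; Point is an invented example type.
class EqualsHashCodeDemo {
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Point)) return false;
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        // Built from the same fields as equals, so equal points get equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        // A different object with the same coordinates is found.
        System.out.println(points.contains(new Point(1, 2))); // true
    }
}
```

If `hashCode` were left at the version inherited from Object, the two equal points would usually get different hash codes, and the `contains` call would most likely look in the wrong bucket and return false.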

Hash Tables in Java
HashTable class
HashSet class
  implements Set interface with internal storage container that is a HashTable
  compare to TreeSet class, internal storage container is a Red Black Tree
HashMap class
  implements the Map interface, internal storage container for keys is a hash table
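A short usage sketch of these standard classes follows. The class `JavaCollectionsDemo` and its helper methods are made up for this example; only `HashSet`, `TreeSet`, `HashMap`, and the methods called on them come from `java.util`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; helper names are illustrative only.
class JavaCollectionsDemo {
    // TreeSet keeps its elements in sorted order, so iteration is ascending.
    static List<Integer> sortedDistinct(int[] values) {
        Set<Integer> tree = new TreeSet<>();
        for (int v : values) tree.add(v);
        return new ArrayList<>(tree);
    }

    // HashSet gives fast membership tests but no ordering guarantee.
    static boolean hasDuplicate(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int v : values) {
            if (!seen.add(v)) return true; // add returns false if already present
        }
        return false;
    }

    // HashMap stores key-value pairs; here each word maps to its count.
    static Map<String, Integer> wordCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(sortedDistinct(new int[]{5, 1, 3, 1}));            // [1, 3, 5]
        System.out.println(hasDuplicate(new int[]{5, 1, 3, 1}));              // true
        System.out.println(wordCounts(new String[]{"a", "b", "a"}).get("a")); // 2
    }
}
```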

Comparison
Compare these data structures for speed:
  Java HashSet
  Java TreeSet
  our naïve Binary Search Tree
  our HashTable
Insert random ints

Clicker Question
What will be the order from fastest to slowest?
A. HashSet TreeSet HashTable BST
B. HashSet HashTable TreeSet BST
C. TreeSet HashSet BST HashTable
D. HashTable HashSet BST TreeSet
E. None of these