Hash tables: the hashing idea, collision resolution, hash functions, Java hashCode() for HashMap and HashSet, big-O time bounds, applications


Hash tables [Bono]
- hashing: the idea
- collision resolution
  - closed addressing (chaining)
  - open addressing techniques
- hash function
- Java hashCode() for HashMap and HashSet
- big-O time bounds
- applications

Announcements
- No labs this week
- MT2: Tues. 4/3, 9:30-10:50am; see the Piazza announcement for location information
- PA2 has been published: due 4/11

Review: Map ADT (from Java maps and sets [Bono])
A map stores a collection of (key, value) pairs.
Keys are unique: a pair can be identified by its key.
Operations:
- add a new (key, value) pair (called an entry)
- look up an entry, given its key
- update the value of an entry, given its key
- remove an entry, given its key
- list all the entries (the order of visiting depends on the kind of map created)

Hashing idea
Suppose we had a map where all keys were in the range [0..99]. What specialized representation could we use?
Hashing is a generalization of this idea...
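
One possible answer to the question above is an array indexed directly by the key. The sketch below assumes int keys in [0..99] and String values; the class name DirectAddressMap and its method names are illustrative and do not come from the lecture.

```java
// Hypothetical direct-address map for int keys in the range [0..99].
class DirectAddressMap {
    private static final int RANGE = 100;

    // values[k] holds the value stored for key k, or null if k is absent.
    private final String[] values = new String[RANGE];

    // Adds or updates the entry for key in O(1) time.
    void put(int key, String value) {
        values[key] = value;
    }

    // Returns the value stored for key, or null if there is no such entry.
    String get(int key) {
        return values[key];
    }

    // Removes the entry for key, if present.
    void remove(int key) {
        values[key] = null;
    }

    public static void main(String[] args) {
        DirectAddressMap map = new DirectAddressMap();
        map.put(42, "Ann");
        System.out.println(map.get(42)); // Ann
        System.out.println(map.get(7));  // null
    }
}
```

Every operation is a single array access, which is the O(1) behavior hashing tries to keep when the keys are not small integers.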

Hashing idea (cont.)
In the Map ADT, keys might be strings, ints, ...
[Figure: a key goes into an address calculator (the hash function, O(1)), which produces an address (array index) into an array of hash bins numbered 0, 1, 2, ..., 996, ...]

Problems
More key values than indices in the array.
How big is the set of one-word names of up to 20 characters?
[Figure: h(key) maps the set of possible keys (e.g., Ryan, John, Susie, Ann, Jose) to the set of array indices.]

Collisions
Even if the set of actual key values is small, the set of possible key values is large.
The hash function maps the whole set of possible values, so you need to worry about collisions.
But if the set of actual values is small, how soon do we really need to worry about this?

The Birthday Problem
How many people must be in a room before the chance that any two of them have the same birthday is > 50%?
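
The question can be checked numerically. The sketch below assumes 365 equally likely birthdays and a threshold strictly between 0 and 1; the class and method names are made up for illustration.

```java
// Hypothetical helper that answers the birthday question by direct computation.
class BirthdayProblem {
    // Returns the smallest group size for which the probability that at least
    // two people share a birthday exceeds threshold (assumed to be in [0, 1)).
    static int smallestGroup(double threshold) {
        double allDistinct = 1.0; // P(the first n people all have distinct birthdays)
        int n = 0;
        while (1.0 - allDistinct <= threshold) {
            // The next person must avoid the n birthdays already taken.
            allDistinct *= (365.0 - n) / 365.0;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(smallestGroup(0.5)); // 23
    }
}
```

With 365 "buckets", a collision becomes more likely than not after only 23 "keys", which is why collisions matter even when a table is far from full.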

In general
What's the implication?
- No matter what the table size
- No matter what the hash function
- No matter how many actual entries are currently stored (> 1)
the hash address for a key does not uniquely identify the map entry <key, value>; i.e., collisions can happen.
- The hash address just identifies a hash bin.
- Assume you can store multiple entries in a bin.

Collision resolution
Recall: the hash value does not uniquely identify the entry.
Collision resolution: where do we put the entries when two of them map to the same location (i.e., hash(key1) = hash(key2))?

Collision resolution strategies
Where do we put the entries when two of them map to the same location?
- Open addressing: put them in different hash buckets
  - linear probing: put the entry in the next empty bucket
- Closed addressing: each bucket can hold multiple entries
  - chaining: use a linked list for all entries in a bucket
We'll only discuss chaining in detail.

Collision resolution by chaining
Each bucket stores multiple items, using a linked list.
HSIZE = 11 (buckets 0 through 10)
Keys and their hash addresses:
  John --> 3
  Sam --> 6
  Mary --> 3
  Beth --> 4
  Joe --> 4
  Bob --> 3
Operations: insert, lookup, delete
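
A minimal sketch of chaining, assuming a set of String keys and HSIZE = 11 as on the slide. The class name ChainedStringSet is hypothetical, and the bucket indices come from String.hashCode(), so they will not match the example hash addresses listed above.

```java
import java.util.LinkedList;

// Hypothetical chained hash set of strings: one linked list per bucket.
class ChainedStringSet {
    private static final int HSIZE = 11;
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedStringSet() {
        buckets = new LinkedList[HSIZE];
        for (int i = 0; i < HSIZE; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Maps a key to a bucket index in [0, HSIZE); floorMod handles negative hash codes.
    private int bucketIndex(String key) {
        return Math.floorMod(key.hashCode(), HSIZE);
    }

    // Adds key to its bucket's chain; returns false if it was already present.
    boolean insert(String key) {
        LinkedList<String> chain = buckets[bucketIndex(key)];
        if (chain.contains(key)) {
            return false;
        }
        chain.addFirst(key);
        return true;
    }

    // Searches only the one chain that key hashes to.
    boolean lookup(String key) {
        return buckets[bucketIndex(key)].contains(key);
    }

    // Removes key from its chain; returns false if it was not present.
    boolean delete(String key) {
        return buckets[bucketIndex(key)].remove(key);
    }

    public static void main(String[] args) {
        ChainedStringSet names = new ChainedStringSet();
        for (String name : new String[] {"John", "Sam", "Mary", "Beth", "Joe", "Bob"}) {
            names.insert(name);
        }
        System.out.println(names.lookup("Mary")); // true
        names.delete("Mary");
        System.out.println(names.lookup("Mary")); // false
    }
}
```

All three operations hash the key once and then walk a single chain, so their cost depends on the chain length rather than on the total number of entries.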

Hash functions
Required properties:
- deterministic: the value depends only on the key
- has to map to the indices of the array: achieve this with hashValue % HASH_SIZE
Desirable properties:
- easy to compute (i.e., fast)
- promotes a uniform distribution of key values over the whole range of addresses

POLL: hash functions (mark all that apply)
Notes:
- don't worry about the range of the output (assume it will get converted with h % HASHSIZE)
- chars have an int code: they can be treated as numbers

Hash functions
General methods:
- truncation (e.g., key is an SSN)
- folding
- modulus (%): % HASHSIZE (prime)

Hash functions: example
Simple example of the folding technique, for a key that is a string $c_1 \ldots c_n$:
$\left( \sum_{i=1}^{n} c_i \right) \bmod \text{HASHSIZE}$
How many distinct values (addresses) can we get with this hash function (before the modulus)?
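
The folding formula can be written as a short loop. This is a sketch only: HASHSIZE = 83 is an arbitrary prime chosen for illustration, and the class and method names are hypothetical.

```java
// Hypothetical folding hash: add up the character codes, then reduce mod HASHSIZE.
class FoldingHash {
    static final int HASHSIZE = 83; // assumed table size (a prime)

    // Sum of the int codes of all characters in key (the value before the modulus).
    static int sumOfChars(String key) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum;
    }

    // Hash address in [0, HASHSIZE).
    static int hash(String key) {
        return sumOfChars(key) % HASHSIZE;
    }

    public static void main(String[] args) {
        // 'a' = 97, 't' = 116, 'e' = 101, so each of these sums to 314.
        System.out.println(hash("ate")); // 65
        System.out.println(hash("eat")); // 65
        System.out.println(hash("tea")); // 65
    }
}
```

Because addition ignores order, every anagram of a key lands in the same bucket, and short keys can only produce a small range of sums.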

Better string hash function
Takes into account the position of each character. $a$ is a constant (a prime is better).
$h_0 = 0$
$h_i = a \, h_{i-1} + c_i, \quad 1 \le i \le k$
Implement with a loop, but to see what's happening, here's the whole computation for three chars:
$a^2 c_1 + a c_2 + c_3$
To find good hash functions: The Art of Computer Programming, volume on Sorting and Searching, by Don Knuth.
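
The recurrence translates directly into a loop. In this sketch the multiplier a is passed as a parameter, and int overflow is simply allowed to wrap; the class and method names are illustrative.

```java
// Hypothetical implementation of the recurrence h_i = a * h_{i-1} + c_i.
class PolynomialHash {
    // Returns h_k for the k characters of key; the arithmetic wraps on int overflow.
    static int hash(String key, int a) {
        int h = 0; // h_0
        for (int i = 0; i < key.length(); i++) {
            h = a * h + key.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Unlike a plain sum of character codes, anagrams get different values.
        System.out.println(hash("ate", 31)); // 96914
        System.out.println(hash("tea", 31)); // 114704
    }
}
```

The result still has to be reduced to a bucket index (for example with % HASHSIZE) before it can be used as an array address.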

Using Java HashMap (or HashSet)
Recall: for HashMap<KeyType, ValueType>, KeyType must have equals and hashCode defined.
- Already defined for Java types such as String, Integer, Double
- How do we use our own type as the KeyType?

Java hashCode() method
Java HashMap / HashSet use the hashCode() method as the hash function.
Many Java classes override it to do the right thing.
- For example, String's hashCode() uses the algorithm from the previous slide (a = 31)
- Also defined for LinkedList and ArrayList, for example (see documentation for details)

Writing your own hashCode()
- Can be overridden from Object
- Doesn't depend on the size of the hash table: % HASH_SIZE is handled by the HashMap/HashSet class
- Contract for hashCode(): if a.equals(b) then a.hashCode() == b.hashCode()
- Warning: the Object version does not do the right thing, so don't rely on the inherited one.
  - The Object version returns a code based on the address of the object (i.e., object identity)
  - We want it to use the value of the object

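Ex: Student class
    public class Student {
        private String theName;
        private int score;

        // Returns true iff other is a Student with the same name and the same score.
        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Student)) { return false; }
            Student that = (Student) other;
            return theName.equals(that.theName) && score == that.score;
        }

        // Uses the same fields as equals(), so equal students get equal hash codes.
        @Override
        public int hashCode() {
            return theName.hashCode() + score;
        }
        ...
    }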

Danger of inheriting hashCode()
Consider a Term class: (coefficient, exponent).
Suppose we override equals() so that two terms are equal when their coefficients and exponents are the same, but inherit hashCode() from Object.
Reminder: Contract for hashCode(): if a.equals(b) then a.hashCode() == b.hashCode()
    Term t = new Term(3, 4);
    Term t2 = new Term(3, 4);
    t.equals(t2)                      // true or false?
    t.hashCode() == t2.hashCode()     // true or false?

Why does this matter?
What happens if you use these terms in a HashSet?
    Term t = new Term(3, 4);
    ...
    hashSet.add(t);
    ...
    Term target = new Term(3, 4);
    if (hashSet.contains(target)) ...
    t.equals(target)                      // true or false?
    t.hashCode() == target.hashCode()     // true or false?
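
The scenario can be tried out with two variants of the class. This is a sketch under assumptions: the slides do not show the Term source, so BrokenTerm, FixedTerm, and their field names are hypothetical stand-ins, and Objects.hash is just one convenient way to write a value-based hashCode().

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical term that overrides equals() but inherits hashCode() from Object.
class BrokenTerm {
    final int coeff;
    final int expon;

    BrokenTerm(int coeff, int expon) {
        this.coeff = coeff;
        this.expon = expon;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof BrokenTerm)) {
            return false;
        }
        BrokenTerm that = (BrokenTerm) other;
        return coeff == that.coeff && expon == that.expon;
    }
}

// Same class, but hashCode() is computed from the same fields that equals() compares.
class FixedTerm {
    final int coeff;
    final int expon;

    FixedTerm(int coeff, int expon) {
        this.coeff = coeff;
        this.expon = expon;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof FixedTerm)) {
            return false;
        }
        FixedTerm that = (FixedTerm) other;
        return coeff == that.coeff && expon == that.expon;
    }

    @Override
    public int hashCode() {
        return Objects.hash(coeff, expon);
    }
}

class TermDemo {
    public static void main(String[] args) {
        Set<BrokenTerm> broken = new HashSet<>();
        broken.add(new BrokenTerm(3, 4));
        // equals() says the two terms match...
        System.out.println(new BrokenTerm(3, 4).equals(new BrokenTerm(3, 4))); // true
        // ...but the lookup starts from an identity-based hash code, so the
        // stored term is almost certainly not found.
        System.out.println(broken.contains(new BrokenTerm(3, 4)));

        Set<FixedTerm> fixed = new HashSet<>();
        fixed.add(new FixedTerm(3, 4));
        System.out.println(fixed.contains(new FixedTerm(3, 4))); // true
    }
}
```

HashSet uses the hash code to pick a bucket before it ever calls equals(), so an equal object with a different hash code is looked for in the wrong place.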

Converse to contract not true
Reminder: Contract for hashCode(): if a.equals(b) then a.hashCode() == b.hashCode()
But if a.hashCode() == b.hashCode(), what do we know about a and b?

Big-O time bounds
Suppose chaining.
- n = number of items in the table
- b = size of the table
- λ = n / b = average length of a list (assuming a uniform distribution)
- O(λ) search time; on average O(1), provided λ is bounded by a constant
- Inserts and deletes also take O(λ)

Performance in practice
Performance depends on the hash function and the load factor.
- Good hash function: uniform distribution of keys over hash addresses
- Bad hash function: in the worst case, all values could be in one bucket
- Load factor (numEntries / HASH_SIZE): HashMap / HashSet make the array bigger if the load factor gets above 0.75. This involves rehashing everything.
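
A rough sketch of what "rehashing everything" means for a chained table of strings. It is not the HashMap source: the helper names are hypothetical, and doubling the table size is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helpers for growing a chained hash table of strings.
class RehashSketch {
    static final double MAX_LOAD = 0.75; // threshold quoted on the slide

    // Creates a table of the given size with an empty chain in every bucket.
    static List<LinkedList<String>> newTable(int size) {
        List<LinkedList<String>> table = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            table.add(new LinkedList<>());
        }
        return table;
    }

    // True when numEntries / tableSize has risen above the maximum load factor.
    static boolean needsResize(int numEntries, int tableSize) {
        return (double) numEntries / tableSize > MAX_LOAD;
    }

    // Builds a table twice as large (an assumption of this sketch) and re-inserts
    // every key, because the bucket index depends on the table size.
    static List<LinkedList<String>> rehash(List<LinkedList<String>> old) {
        List<LinkedList<String>> bigger = newTable(2 * old.size());
        for (LinkedList<String> chain : old) {
            for (String key : chain) {
                int index = Math.floorMod(key.hashCode(), bigger.size());
                bigger.get(index).add(key);
            }
        }
        return bigger;
    }
}
```

A resize touches every entry, so it costs O(n), but it keeps the chains short for all the lookups that follow.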

Big-O for traversal
- Traversal (in no special order) takes O(n), but really n + size of the array
- Ordered traversal involves sorting. The best sorts are O(n log n)
- Hashing is an excellent Map representation, provided that you aren't going to be doing the traverseInOrder operation much
- What representations are well suited for ordered traversal?

Applications of hash tables
First, the properties desired:
- lookup
- insert
- (remove)
- don't care about the order of keys
Examples of applications:
- Database (master customer file)
- Compilers: symbol tables
- Games: look up a board configuration to find the move that goes with it (e.g., chess, tic-tac-toe)
- UNIX shell: quick command lookup