COMP 103 RECAP-TODAY. Hashing: collisions. Collisions: open hashing/buckets/chaining. Dealing with Collisions: Two approaches

Transcription:

COMP 103 2017-T1 Lecture 31
Hashing: collisions
Marcus Frean, Lindsay Groves, Peter Andreae and Thomas Kuehne, VUW
Lindsay Groves, School of Engineering and Computer Science, Victoria University of Wellington

2 RECAP-TODAY
RECAP
  Fastsets: sets with O(1) contains, add, remove
  Bitsets: use a Boolean array with one cell for each possible value that could be in the set
  Extending the bitset idea to bags and maps
  Hashing: use a hash function to calculate the position
TODAY
  Dealing with collisions: where do you put colliding values?
    Put them in the same array: closed hashing/probing
    Put them somewhere else: open hashing/chaining

3 Dealing with Collisions: Two approaches
Put colliding values in the same array:
  Look for an empty place in the hashtable
  Called closed hashing, open addressing, probing
Put them somewhere else:
  Use a collection (e.g. a list) at each place
  Called open hashing, closed addressing, buckets, chaining
[Figure: example strings are passed through HASH to pick a cell in an array indexed 0 to 9.]

4 Collisions: open hashing/buckets/chaining
Store a Set in each cell: hash value → which set
Open hashing: not everything is in the same table
Closed addressing: hash code takes you to the right place
Array is the top-level index into a larger structure
This is what Java's HashMap does.
If the sets get too big... Resize and rehash!
What kind of set?
[Figure: an array whose cells each hold a small set of three-letter words (ant, fox, hen, dog, bee, kea, cow, elk, owl, pig, sow, tui, eel, gnu, ape, bat, bug, cat, jay, ray, yak, nit, roe, cod).]
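The chaining approach (a small collection in each cell, with resize and rehash when the collections get too big) can be sketched in Java roughly as follows. This is a minimal illustration, not the lecture's implementation: the class name ChainedHashSet, the initial 8 buckets, and the threshold MAX_LOAD_FACTOR = 2.0 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class: a sketch of open hashing (chaining) with a list in each cell.
class ChainedHashSet<E> {
    // Assumed threshold: resize once the average bucket holds more than 2 items.
    private static final double MAX_LOAD_FACTOR = 2.0;

    private List<List<E>> buckets = newBuckets(8);
    private int size = 0;

    private static <T> List<List<T>> newBuckets(int n) {
        List<List<T>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) b.add(new LinkedList<>());
        return b;
    }

    // The hash value chooses which bucket (small list) to look in.
    private List<E> bucketFor(E value, List<List<E>> table) {
        return table.get(Math.floorMod(value.hashCode(), table.size()));
    }

    public boolean contains(E value) {
        return value != null && bucketFor(value, buckets).contains(value);
    }

    public boolean add(E value) {
        if (value == null) throw new NullPointerException();
        List<E> bucket = bucketFor(value, buckets);
        if (bucket.contains(value)) return false;   // already there
        bucket.add(value);
        size++;
        if (size > MAX_LOAD_FACTOR * buckets.size()) resize();
        return true;
    }

    public boolean remove(E value) {
        if (value == null) return false;
        boolean removed = bucketFor(value, buckets).remove(value);
        if (removed) size--;
        return removed;
    }

    public int size() { return size; }

    // Resize and rehash: the bucket index depends on the table length,
    // so every item has to be placed again in the bigger table.
    private void resize() {
        List<List<E>> bigger = newBuckets(buckets.size() * 2);
        for (List<E> bucket : buckets) {
            for (E v : bucket) bucketFor(v, bigger).add(v);
        }
        buckets = bigger;
    }
}
```

A linked list is used for each bucket only because it is the simplest choice; the slide's question "What kind of set?" is left open here.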

5 Collisions: open hashing/buckets/chaining
Performance?
  If the array is of size k, each subset will hold about 1/k of the size() items.
  cost ≈ cost of hashCode + cost of the method applied to the subset
  E.g., using linked lists and an array of size 100: this is about 100 times faster than a simple linked list
  Good when the subsets are mostly small
Trie:
  Needs dynamic memory management
  For strings, index on first character, then on second, ...
  Lookup time is proportional to length of key!

6 Collisions: closed hashing/probing
Closed hashing: all data is stored in the same array
  If the location given by the hash function is occupied, look for another location
Open addressing: the hash value tells us where to start looking
Probing: looking at successive locations until we find the value we're looking for, or an empty location
Where do we look? Next location? One further away? What will give the best performance?

7 Linear Probing: Look in next location
The hash value tells us where to start looking.
  If value.hashCode() → p, start at index p
  If the cell is used, try p+1, p+2, p+3, ...
  Wrap round to 0 at the end of the array.
Example values (hash): Stu (2), Sven (5), Sam (4), Steve (2), Stig (2), Sun (3)

8 Linear Probing: contains
Search for: Stu (2), Sven (5), Sam (4), Steve (2), Sun (3)
  index:  0    1    2    3      4    5     6
  value:  Sun       Stu  Steve  Sam  Sven  Stig
Problem: remove Sam!!
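The linear probing example can be traced with a short program. This is a sketch under stated assumptions: the table has 7 cells (indices 0 to 6, as in the figure), the hash values are the ones given in brackets on the slide, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper that replays the Stu/Sven/Sam example with hand-picked hash values.
class LinearProbingTrace {
    // Places each name by linear probing; hashes[i] is the given hash value of names[i].
    // Assumes there are fewer names than cells, so an empty cell always exists.
    static String[] place(String[] names, int[] hashes, int capacity) {
        String[] data = new String[capacity];
        for (int i = 0; i < names.length; i++) {
            int p = hashes[i] % capacity;        // where to start looking
            while (data[p] != null) {
                p = (p + 1) % capacity;          // try the next cell, wrapping round to 0
            }
            data[p] = names[i];
        }
        return data;
    }

    public static void main(String[] args) {
        String[] names = {"Stu", "Sven", "Sam", "Steve", "Stig", "Sun"};
        int[] hashes = {2, 5, 4, 2, 2, 3};
        System.out.println(Arrays.toString(place(names, hashes, 7)));
        // prints [Sun, null, Stu, Steve, Sam, Sven, Stig]
    }
}
```

Sun hashes to 3 but ends up at index 0 because cells 3 to 6 are already taken, which is why a later search for Sun has to walk through Sam's cell.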

9 Linear Probing: contains
    public boolean contains(Object value) {
        if (value == null) return false;                  // or error
        int hash = Math.abs(value.hashCode() % data.length);
        int p = hash;
        while (true) {
            if (data[p] == null) return false;            // not there
            if (data[p].equals(value)) return true;       // found
            p = (p + 1) % data.length;
            if (p == hash) return false;                  // not there
        }
    }
If p == hash, you've gone right around to where you started... How can this happen?

10 Linear Probing: add
    public boolean add(E value) {
        if (value == null) throw new NullPointerException();   // better!
        ensureCapacity();
        int hash = Math.abs(value.hashCode() % data.length);
        int p = hash;
        while (true) {
            if (data[p] == null) {
                data[p] = value;
                size++;
                return true;                              // added
            }
            if (data[p].equals(value)) return false;      // already there
            p = (p + 1) % data.length;
            if (p == hash) return false;                  // ummm...????
        }
    }

11 ensureCapacity
If the table is full (or nearly full), double its size and copy: how do you copy?

12 Hash Tables and Load Factor
When is the hashtable too full?
The index depends on the hashCode and the array length (division method)! And it depends on previous collisions... so when the array is resized, you have to rehash everything!
When the number of items is close to the array size:
  May have to probe a large number of cells to find an empty cell
  Performance becomes very slow.
  Linear probing is particularly bad!
Should not let the table get more than 70%-80% full (the maximum "load factor")
With a low load factor, cost is O(1); with a high load factor, cost approaches O(N)
[Figure: arrays of different lengths holding cat, bee, fox, pig, owl, hen.]
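The effect of the load factor on linear probing can be measured with a small experiment. This is only a sketch: the class name, the use of random Integer values, and the fixed seed are assumptions made so that the run is repeatable; it is not part of the lecture's code.

```java
import java.util.Random;

// Hypothetical experiment: average number of cells examined per add() under linear probing.
class LoadFactorDemo {
    // Fills a table of the given capacity up to the given load factor (assumed 0 < loadFactor < 1)
    // and returns the average number of cells examined per successful insertion.
    static double averageProbes(int capacity, double loadFactor, long seed) {
        Integer[] data = new Integer[capacity];
        Random random = new Random(seed);
        int target = (int) (capacity * loadFactor);
        long totalProbes = 0;
        int size = 0;
        while (size < target) {
            Integer value = random.nextInt();
            int p = Math.abs(value.hashCode() % capacity);
            int probes = 1;
            while (data[p] != null && !data[p].equals(value)) {
                p = (p + 1) % capacity;   // linear probing: look in the next cell
                probes++;
            }
            if (data[p] == null) {        // new value: store it and count the work
                data[p] = value;
                size++;
                totalProbes += probes;
            }                             // otherwise it was a duplicate; ignore it
        }
        return (double) totalProbes / target;
    }

    public static void main(String[] args) {
        for (double lf : new double[] {0.25, 0.5, 0.7, 0.9, 0.99}) {
            System.out.printf("load %.2f: %.2f probes per add%n",
                    lf, averageProbes(1 << 16, lf, 1L));
        }
    }
}
```

The exact numbers depend on the seed, but the average should stay close to 1 at low load factors and rise sharply as the table approaches full.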

13 ensureCapacity
    public void ensureCapacity() {
        if (size <= maximumLoad) return;
        E[] oldData = data;
        data = (E[]) new Object[data.length * 2];
        maximumLoad = data.length * MAX_LOAD_FACTOR;
        size = 0;                 // add() counts each item again
        for (E v : oldData) {
            if (v != null) {
                add(v);           // rehash
            }
        }
    }
maximumLoad: a field, initially set to data.length * MAX_LOAD_FACTOR
Within ensureCapacity, calling add is unnecessarily expensive:
  it checks capacity each time (we know it is OK)
  it checks if the item is present already (we know it isn't)

14 ensureCapacity (more efficient)
    public void ensureCapacity() {
        if (size <= maximumLoad) return;
        E[] oldData = data;
        data = (E[]) new Object[data.length * 2];
        maximumLoad = maximumLoad * 2;
        for (E v : oldData) {
            if (v != null) {
                // rehash
                int p = Math.abs(v.hashCode() % data.length);
                while (true) {
                    if (data[p] == null) {
                        data[p] = v;
                        break;
                    }
                    p = (p + 1) % data.length;
                }
            }
        }
    }

15 Linear Probing: remove
Inserted: Stu (2), Sven (5), Sam (4), Steve (2), Sun (4)
  index:  0    1    2    3      4    5     6
  value:  Sun       Stu  Steve  Sam  Sven  Stig
Now remove: Sam (4)
What's the problem? contains(Sun) will return false!
To remove, need to leave a "tombstone" (not null, not a value!), ignored by add, etc.
How do we count tombstones in ensureCapacity?

16 Linear Probing: Runs and Clustering
Linear probing is particularly bad:
  Repeated collisions at one index create runs
  Runs → linear performance
  With linear probing, runs join up, so they grow fast: the bigger the run, the faster it grows
  This is called "clustering"
Can we do better by increasing the step size?
[Figure: an array holding cat, bee, fox, hen, owl, pig, gnu, emu, rat, tui, with numbered probes showing runs joining up.]
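The tombstone idea for remove can be sketched as follows. Everything here is an illustrative assumption rather than the lecture's code: the Name class fixes each hash value by hand to mirror the example, TombstoneSet has a fixed capacity and never resizes, and add reuses the first tombstone it passes, which is one possible policy among several.

```java
// Hypothetical key type whose hash value is chosen by hand, to mirror the slide's example.
final class Name {
    private final String text;
    private final int hash;
    Name(String text, int hash) { this.text = text; this.hash = hash; }
    @Override public int hashCode() { return hash; }
    @Override public boolean equals(Object o) {
        return o instanceof Name && ((Name) o).text.equals(text);
    }
    @Override public String toString() { return text; }
}

// Hypothetical fixed-size set using linear probing with tombstones (no resizing).
class TombstoneSet {
    private static final Object TOMBSTONE = new Object();   // not null, not a value
    private final Object[] data;

    TombstoneSet(int capacity) { data = new Object[capacity]; }

    private int start(Object value) { return Math.abs(value.hashCode() % data.length); }

    boolean contains(Object value) {
        int p = start(value);
        for (int i = 0; i < data.length; i++) {
            if (data[p] == null) return false;         // only a truly empty cell ends the search
            if (data[p].equals(value)) return true;    // a tombstone never equals a value
            p = (p + 1) % data.length;
        }
        return false;                                  // went right around
    }

    boolean add(Object value) {
        int p = start(value);
        int free = -1;                                 // first reusable cell seen so far
        for (int i = 0; i < data.length; i++) {
            if (data[p] == null) { if (free < 0) free = p; break; }
            if (data[p] == TOMBSTONE) { if (free < 0) free = p; }
            else if (data[p].equals(value)) return false;   // already there
            p = (p + 1) % data.length;
        }
        if (free < 0) throw new IllegalStateException("table full (this sketch does not resize)");
        data[free] = value;
        return true;
    }

    boolean remove(Object value) {
        int p = start(value);
        for (int i = 0; i < data.length; i++) {
            if (data[p] == null) return false;
            if (data[p].equals(value)) { data[p] = TOMBSTONE; return true; }
            p = (p + 1) % data.length;
        }
        return false;
    }

    public static void main(String[] args) {
        TombstoneSet set = new TombstoneSet(7);
        Name sam = new Name("Sam", 4), sun = new Name("Sun", 3);
        set.add(new Name("Stu", 2)); set.add(new Name("Sven", 5)); set.add(sam);
        set.add(new Name("Steve", 2)); set.add(new Name("Stig", 2)); set.add(sun);
        set.remove(sam);
        System.out.println(set.contains(sun));   // prints true: the search steps over the tombstone
    }
}
```

With a plain null in Sam's cell the search for Sun would stop early; the tombstone keeps the run intact.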

17 Quadratic Probing
Make the sequence of probes have increasing steps: runs don't join up so fast
  Probes: h, h+1, h+4, h+9, h+16, ...
  i.e. p = h, p += 1, p += 3, p += 5, p += 7, p += 9, ...
Quadratic probing uses a quadratic formula:
  $\mathrm{probe}_i = \mathrm{hash} + a\,i + b\,i^2 \quad (b \neq 0)$
  E.g., with a = b = ½, the step sizes become 1, 2, 3, ... instead of 1, 3, 5, ...

18 Quadratic Probing
Another problem, perhaps? The sequence might wrap back on itself before checking each cell.
If we choose a = b = ½, and length is a power of 2... guaranteed not to wrap until it has checked every cell!
  $\mathrm{probe}_i = \mathrm{hash} + \tfrac{1}{2}(i + i^2)$
  probes are hash, hash+1, hash+3, hash+6, hash+10, hash+15, ...
  step sizes are 1, 2, 3, 4, 5, ...
[Figure: an array holding hen, bee, cat, fox, owl.]

19 Quadratic Probing: contains
    private static final int INITIAL_CAPACITY = 16;   // a power of 2
    ...
    public boolean contains(Object value) {
        if (value == null) return false;
        int hash = Math.abs(value.hashCode() % data.length);
        int p = hash;
        int step = 1;
        while (true) {
            if (data[p] == null) return false;            // not there
            if (data[p].equals(value)) return true;       // found
            p = (p + (step++)) % data.length;
        }
    }
This does not check for cycles! It relies on:
  the array not being full, and
  the probe sequence checking every cell

20 Iterator
Iterating through a hash table is not simple:
  there will be nulls to skip over
  the order in which items are returned appears random (and may change when the array is doubled!)
At each call to next(), the Iterator must advance the index to the next non-null cell.
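The claim on the Quadratic Probing slides, that step sizes 1, 2, 3, ... reach every cell when the length is a power of 2, can be checked by brute force. The class and method names below are hypothetical; the check simply replays the probe sequence and counts the distinct cells it touches.

```java
// Hypothetical check: do the probes hash, hash+1, hash+3, hash+6, ... cover the whole array?
class TriangularProbeCheck {
    // Follows the first `length` probes (step sizes 1, 2, 3, ...) starting from `hash`
    // and reports whether every cell of an array of the given length was visited.
    static boolean visitsEveryCell(int hash, int length) {
        boolean[] seen = new boolean[length];
        int p = hash % length;
        int distinct = 0;
        for (int step = 1; step <= length; step++) {
            if (!seen[p]) {
                seen[p] = true;
                distinct++;
            }
            p = (p + step) % length;   // same update as p = (p + (step++)) % data.length
        }
        return distinct == length;
    }

    public static void main(String[] args) {
        System.out.println(visitsEveryCell(0, 16));   // prints true: 16 is a power of 2
        System.out.println(visitsEveryCell(0, 7));    // prints false: only cells 0, 1, 3, 6 are reached
    }
}
```

This is why the contains method can skip the cycle check only if the capacity stays a power of 2 and the table is never allowed to fill up.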

21, 22 Hash Table with Probing: iterator
    private class HashSetIterator implements Iterator<E> {
        private E[] data;
        private int nextIndex = 0;

        private HashSetIterator(E[] d) {
            data = d;
            // skip ahead to the first non-null cell
            while (nextIndex < data.length && data[nextIndex] == null) nextIndex++;
        }

        public E next() {
            if (nextIndex >= data.length) throw new NoSuchElementException();
            E ans = data[nextIndex++];
            // advance to the next non-null cell
            while (nextIndex < data.length && data[nextIndex] == null) nextIndex++;
            return ans;
        }

        public boolean hasNext() {
            return (nextIndex < data.length);
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

23 Other Probing Techniques
Quadratic probing: step sizes 1, 2, 3, ...; still suffers from "secondary clustering"
Double hashing: use a second hash function to compute the next probe index:
    p = hash2(value, p);
  the new hash depends on the value as well (unlike with probing)
  less clustering, but more expensive
Cuckoo hashing: use two hash functions. Try both indexes. If both are full, kick out one of the values, and put it in its alternate place (kicking out a value if necessary, ...)

24 Extending the Bitset idea
Bitsets use a Boolean array with one cell for each possible value that could be in the set
Can we extend this idea to bags and maps?
  For a bag, store the number of times the value is in the bag
  For a map, store the value that the key maps to: oh, that's just an ordinary array!
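The double hashing idea from the Other Probing Techniques slide can be illustrated with a short sketch. It uses one common formulation, in which a second hash of the value picks a fixed step size, rather than the slide's hash2(value, p); the class name, the particular second hash, and the prime table length are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of double hashing: the step size depends on the value, not just the start.
class DoubleHashingProbe {
    // Returns the order in which cells would be probed for a value with the given hashCode.
    // Assumes length is a prime >= 2, so every step in 1..length-1 visits all cells.
    static int[] probeSequence(int hashCode, int length) {
        int start = Math.floorMod(hashCode, length);
        // Assumed second hash function: uses the quotient so it differs from the first.
        int step = 1 + Math.floorMod(hashCode / length, length - 1);
        int[] sequence = new int[length];
        int p = start;
        for (int i = 0; i < length; i++) {
            sequence[i] = p;
            p = (p + step) % length;
        }
        return sequence;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(2, 7)));   // prints [2, 3, 4, 5, 6, 0, 1]
        System.out.println(Arrays.toString(probeSequence(9, 7)));   // prints [2, 4, 6, 1, 3, 5, 0]
    }
}
```

Both values collide at index 2, but they then follow different probe sequences, which is what reduces clustering compared with linear or quadratic probing.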

Comment on slide 24 (Microsoft Office User, 10/17/2017): Now covered in lecture 30