stacks operation array/vector linked list push amortized O(1) Θ(1) pop Θ(1) Θ(1) top Θ(1) Θ(1) isempty Θ(1) Θ(1)

Similar documents
CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

Hash[ string key ] ==> integer value

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

Cpt S 223. School of EECS, WSU

Section 05: Solutions

Standard ADTs. Lecture 19 CS2110 Summer 2009

Topic HashTable and Table ADT

Hash Tables. Hash Tables

UNIT III BALANCED SEARCH TREES AND INDEXING

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

CSE373 Fall 2013, Second Midterm Examination November 15, 2013

Practice Midterm Exam Solutions

Section 05: Solutions

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

Linked lists (6.5, 16)

Hash table basics mod 83 ate. ate. hashcode()

Open Addressing: Linear Probing (cont.)

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

Lecture 7: Efficient Collections via Hashing

Unit #5: Hash Functions and the Pigeonhole Principle

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

Hashing Techniques. Material based on slides by George Bebis

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

1 CSE 100: HASH TABLES

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

Computer Science Foundation Exam

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

ECE250: Algorithms and Data Structures Midterm Review

You must print this PDF and write your answers neatly by hand. You should hand in the assignment before recitation begins.

CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions. Kevin Quinn Fall 2015

Abstract Data Types (ADTs) Queues & Priority Queues. Sets. Dictionaries. Stacks 6/15/2011

STANDARD ADTS Lecture 17 CS2110 Spring 2013

CSCI 104 Hash Tables & Functions. Mark Redekopp David Kempe

Fast Lookup: Hash tables

CSC 321: Data Structures. Fall 2016

Section 05: Midterm Review

Questions. 6. Suppose we were to define a hash code on strings s by:

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

B+ Tree Review. CSE332: Data Abstractions Lecture 10: More B Trees; Hashing. Can do a little better with insert. Adoption for insert

CS 61B Midterm 2 Guerrilla Section Spring 2018 March 17, 2018

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Outline. hash tables hash functions open addressing chained hashing

Understand how to deal with collisions

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

DATA STRUCTURES AND ALGORITHMS

Chapter 20 Hash Tables

Course Review for Finals. Cpt S 223 Fall 2008

Computer Science 302 Spring 2007 Practice Final Examination: Part I

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

COMP171. Hashing.

CS 206 Introduction to Computer Science II

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

Hash table basics mod 83 ate. ate

Announcements. Submit Prelim 2 conflicts by Thursday night A6 is due Nov 7 (tomorrow!)

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

CSC 427: Data Structures and Algorithm Analysis. Fall 2004

HASH TABLES. Goal is to store elements k,v at index i = h k

DATA STRUCTURES AND ALGORITHMS

Hashing. Hashing Procedures

Part I Anton Gerdelan

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

Lecture 17: Hash Tables, Maps, Finish Linked Lists

UNIVERSITY OF WATERLOO DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING E&CE 250 ALGORITHMS AND DATA STRUCTURES

HASH TABLES.

MIDTERM EXAM THURSDAY MARCH

lecture23: Hash Tables

CSE373: Data Structures & Algorithms Lecture 28: Final review and class wrap-up. Nicki Dell Spring 2014

Data Structure. A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

Prefix Trees Tables in Practice

Lecture 18. Collision Resolution

CSE 326: Data Structures Splay Trees. James Fogarty Autumn 2007 Lecture 10

University of Illinois at Urbana-Champaign Department of Computer Science. Second Examination

Hashing. mapping data to tables hashing integers and strings. aclasshash_table inserting and locating strings. inserting and locating strings

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

Final Examination CSE 100 UCSD (Practice)

Priority Queues and Heaps (continues) Chapter 13: Heaps, Balances Trees and Hash Tables Hash Tables In-class Work / Suggested homework.

Hash table basics mod 83 ate. ate

CS 2150 (fall 2010) Midterm 2

COS 226 Algorithms and Data Structures Fall Midterm

HASH TABLES. CSE 332 Data Abstractions: B Trees and Hash Tables Make a Complete Breakfast. An Ideal Hash Functions.

Course Review. Cpt S 223 Fall 2009

Cpt S 223 Fall Cpt S 223. School of EECS, WSU

CSE 332 Spring 2014: Midterm Exam (closed book, closed notes, no calculators)

CSE 100: UNION-FIND HASH

Course Review. Cpt S 223 Fall 2010

Hash table basics. ate à. à à mod à 83

Solutions to Exam Data structures (X and NV)

Programming Abstractions

About this exam review

Hash Table and Hashing

Hash Tables and Hash Functions

Transcription:

Hashes 1

lists 2 operation array/vector linked list find (by value) Θ(n) Θ(n) insert (end) amortized O(1) Θ(1) insert (beginning/middle) Θ(n) Θ(1) remove (by value) Θ(n) Θ(n) find (by index) Θ(1) Θ(1)

stacks 3 operation array/vector linked list push amortized O(1) Θ(1) pop Θ(1) Θ(1) top Θ(1) Θ(1) isempty Θ(1) Θ(1)

queues 4 operation array/vector linked list enqueue amortized O(1) Θ(1) dequeue Θ(1) Θ(1)

sets 5 abstract data type with subset of list operations: find (by value) insert (unspecified location) remove (by value) omits: find (by index) insert at particular location

sets operation BST AVL or vector hash table red-black find (by value) Θ(height)* Θ(log n) Θ(n) O(1) insert Θ(height)* Θ(log n) amortized O(1) O(1) remove Θ(height)* Θ(log n) Θ(1) O(1) find max/min Θ(height)* Θ(log n) Θ(n) Θ(n)

sets operation BST AVL or vector hash table red-black find (by value) Θ(height)* Θ(log n) Θ(n) O(1) insert Θ(height)* Θ(log n) amortized O(1) O(1) remove Θ(height)* Θ(log n) Θ(1) O(1) find max/min Θ(height)* Θ(log n) Θ(n) Θ(n) *BST: height is often Θ(log n), but can be Θ(n) hash table O(1) usually, but Θ(n) worst case

sets operation BST AVL or vector hash table red-black find (by value) Θ(height)* Θ(log n) Θ(n) O(1) insert Θ(height)* Θ(log n) amortized O(1) O(1) remove Θ(height)* Θ(log n) Θ(1) O(1) find max/min Θ(height)* Θ(log n) Θ(n) Θ(n) *BST: height is often Θ(log n), but can be Θ(n) how to get worst case: insert in sorted order hash table O(1) usually, but Θ(n) worst case

sets operation BST AVL or vector hash table red-black find (by value) Θ(height)* Θ(log n) Θ(n) O(1) insert Θ(height)* Θ(log n) amortized O(1) O(1) remove Θ(height)* Θ(log n) Θ(1) O(1) find max/min Θ(height)* Θ(log n) Θ(n) Θ(n) *BST: height is often Θ(log n), but can be Θ(n) how to get worst case: insert in sorted order hash table O(1) usually, but Θ(n) worst case how to get worst case: insert specially chosen set of items can design hash table to make this really rare

maps 7 abstract data type with key-value pairs examples: key=computing ID, value=grade key=word, value=definition key=user ID, value=object with many fields operations: find value by key insert(key, value) remove by key

map with vector 8 class KeyValuePair { public: string key; int value; }; class VectorMap { public: void insert(const string& key, int value); int find(const string& key); // XXX value if not found? void remove(const string& key); private: vector<keyvaluepair> data; };

maps 9 operation BST AVL or vector hash table red-black find (by key) Θ(height)* Θ(log n) Θ(n) O(1) insert Θ(height)* Θ(log n) amortized O(1) O(1) remove (by key) Θ(height)* Θ(log n) Θ(1) O(1) *BST: height is often Θ(log n), but can be Θ(n) hash table O(1) usually, but Θ(n) worst case

aside: standard library std::map balanced tree-based map std::unordered_map hashtable-based map unordered_map<string, double> grades; grades["cr4bd"] = 85.0;... if (grades.count("mst3k") > 0) { cout << "mst3k has a grade assigned\n"; } for (unordered_map<string, double>::iterator it = grades.begin(); it!= grades.end(); ++it) { cout << it >first << " " << it >second << "\n"; } std::set balanced tree-based set std::unordered_set hashtable-based set 10

key-value pairs 11 sets are special maps map where values are ignored

hashtable 12 array of some size larger than # of total elements usually prime size hash function: map keys to array indices possible keys hash function hashtable 0 1 2 3 size-1

hash function properties (1) 13 possible keys hash function + mod size hashtable 0 1 2 3 size-1 input: key type (e.g. string) output: unsigned integer then take typically then take mod of the table size result is the bucket used to store info for that key

hash function properties (2) 14 possible keys hash function + mod size hashtable 0 1 2 3 size-1 must be deterministic each key assigned to exactly one bucket

hash function properties (2) 14 possible keys hash function + mod size hashtable 0 1 2 3 size-1 must be deterministic each key assigned to exactly one bucket should be evenly distributed two keys unlikely to share bucket each bucket about as used as each other bucket

hash function properties (2) possible keys hash function + mod size hashtable 0 1 2 3 size-1 must be deterministic each key assigned to exactly one bucket should be evenly distributed two keys unlikely to share bucket each bucket about as used as each other bucket should be fast 14

activity 15 hash students here by birthday or choose arbitrary date just be consistent four options: decade of birth year ((year/10)%10) last digit of birth year (year%10) last digit of birth month (month%10) last digit of birth day (day%10)

exercise 1 hashtable: birthdate info about person w/birthdate which option is best? A. birth year (year) B. birth day (day) C. days between now and birthdate ((date today()).days()) D. year*128 + month*32 + day E. year+month+day F. year*(month 1)*(day 1) recall: deterministic, evenly distributed, fast

17 example (1) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34 index keys 0 1 2 3 4 5 7 8 9

17 example (1) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34 7, h(7) mod 10 = 7 use bucket 7 index keys 0 1 2 3 4 5 7 7 8 9

17 example (1) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34 7, h(7) mod 10 = 7 use bucket 7 18, h(18) mod 10 = 8 use bucket 8 index keys 0 1 2 3 4 5 7 7 8 18 9

17 example (1) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34 7, h(7) mod 10 = 7 use bucket 7 18, h(18) mod 10 = 8 use bucket 8 index keys 0 1 41 2 3 4 5 7 7 8 18 9

17 example (1) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34 7, h(7) mod 10 = 7 use bucket 7 18, h(18) mod 10 = 8 use bucket 8 index keys 0 1 41 2 3 4 34 5 7 7 8 18 9

17 example (1) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34 7, h(7) mod 10 = 7 use bucket 7 18, h(18) mod 10 = 8 use bucket 8 find 34, 28, 90 34, h(34) mod 10 = 4 use bucket 4 found index keys 0 1 41 2 3 4 34 5 7 7 8 18 9

17 example (1) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34 7, h(7) mod 10 = 7 use bucket 7 18, h(18) mod 10 = 8 use bucket 8 find 34, 28, 90 34, h(34) mod 10 = 4 use bucket 4 found 28, h(28) mod 10 = 8 use bucket 8 not a match index keys 0 1 41 2 3 4 34 5 7 7 8 18 9

17 example (1) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34 7, h(7) mod 10 = 7 use bucket 7 18, h(18) mod 10 = 8 use bucket 8 find 34, 28, 90 34, h(34) mod 10 = 4 use bucket 4 found 28, h(28) mod 10 = 8 use bucket 8 not a match 90, h(90) mod 10 = 0 use bucket 0 nothing there index keys 0 1 41 2 3 4 34 5 7 7 8 18 9

hashtable algorithms 18 find (by key k): compute i = h(k) mod table size, check bucket at index i need to check key other keys may use same bucket insert/remove (by key k): compute i = h(k) mod table size, use bucket at index i but what if bucket is used by another key? find max/min: check all buckets (linear time)

hashing strings 19 unsigned long hashtableindex(const string &s, unsigned long ta return hash(s) % tablesize; } unsigned long hash(const string &s) {??? }

some proposals (1) 20 unsigned long hash(const string &s) { return s[0]; } unsigned long hash(const string &s) { unsigned long sum = 0; for (int i = 0; i < s.size(); ++i) { sum += s[i]; } return sum; }

some proposals (2) 21 unsigned long hash(const string &s) { unsigned long sum = 0; for (int i = 0; i < s.size(); ++i) { // deliberate use of wraparound on overflow sum *= 37; sum += s[i]; } return sum; }

example (2) key: strings table size: 11 hash function: h(k) = k i (ASCII codes) i hash+mod: h(k) mod 11 = k i mod 11 i insert foo, bar, baz index keys 0 1 2 3 4 5 7 8 9 10 find baz, quux 22

example (2) key: strings table size: 11 hash function: h(k) = i hash+mod: h(k) mod 11 = i k i (ASCII codes) k i mod 11 insert foo, bar, baz h( foo ) = 324 bucket 324 mod 11 = 5 index keys 0 1 2 3 4 5 foo 7 8 9 10 find baz, quux 22

example (2) key: strings table size: 11 hash function: h(k) = i hash+mod: h(k) mod 11 = i k i (ASCII codes) k i mod 11 insert foo, bar, baz h( foo ) = 324 bucket 324 mod 11 = 5 h( bar ) = 309 bucket 309 mod 11 = 1 index keys 0 1 bar 2 3 4 5 foo 7 8 9 10 find baz, quux 22

example (2) key: strings table size: 11 hash function: h(k) = i hash+mod: h(k) mod 11 = i k i (ASCII codes) k i mod 11 insert foo, bar, baz h( foo ) = 324 bucket 324 mod 11 = 5 h( bar ) = 309 bucket 309 mod 11 = 1 h( baz ) = 317 bucket 317 mod 11 = 9 index keys 0 1 bar 2 3 4 5 foo 7 8 9 baz 10 find baz, quux 22

example (2) key: strings table size: 11 hash function: h(k) = i hash+mod: h(k) mod 11 = i k i (ASCII codes) k i mod 11 insert foo, bar, baz h( foo ) = 324 bucket 324 mod 11 = 5 h( bar ) = 309 bucket 309 mod 11 = 1 h( baz ) = 317 bucket 317 mod 11 = 9 index keys 0 1 bar 2 3 4 5 foo 7 8 9 baz 10 find baz, quux 22

example (2) key: strings table size: 11 hash function: h(k) = i hash+mod: h(k) mod 11 = i k i (ASCII codes) k i mod 11 insert foo, bar, baz h( foo ) = 324 bucket 324 mod 11 = 5 h( bar ) = 309 bucket 309 mod 11 = 1 h( baz ) = 317 bucket 317 mod 11 = 9 index keys 0 1 bar 2 3 4 5 foo 7 8 9 baz 10 find baz, quux h( quux ) = 317 bucket 47 mod 11 = 5 22

23 example (1b) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34, 11 index keys 0 1 41 2 3 4 34 5 7 7 8 18 9

23 example (1b) key: integers table size: 10 hash function: h(k) = k; hash+mod: k mod 10 insert 7, 18, 41, 34, 11 11, h(11) mod 10 = 1 index keys 0 1 41, 11 2 3 4 34 5 7 7 8 18 9

hashtable algorithms 24 find (by key k): compute i = h(k) mod table size, check bucket at index i need to check key other keys may use same bucket insert/remove (by key k): compute i = h(k) mod table size, use bucket at index i but what if bucket is used by another key? find max/min: check all buckets (linear time)

25 option 1: separate chaining class HashTableBucket { int key; HashTableBucket *next; //... + value? }; class HashTable {...; private: vector<hashtablebucket> data; // could also use // vector<hashtablebucket*> }; index 0 1 2 3 4 5 7 8 9 10 // insert {2 (bucket 4), 7 (bucket 7), // 22 (bucket 0), 59 (bucket 4)} key: 22 next: NULL key: 2 next: key: 7 next: NULL key: 59 next: NULL

option 1: separate chaining (alt) class HashTableBucket { int key; HashTableBucket *next; //... + value? }; class HashTable {...; private: vector<hashtablebucket*> data; // could also use // vector<hashtablebucket> }; index 0 1 2 3 4 5 7 8 9 10 // insert {2 (bucket 4), 7 (bucket 7), // 22 (bucket 0), 59 (bucket 4)} key: 22 next: NULL key: 2 next: key: 7 next: NULL key: 59 next: NULL 2

27 load factors and chaining load factor: λ = # elements table size average number of elements per bucket: λ

find performance 28 average* time for find: unsuccessful: check λ items successful: check 1 + λ/2 items (half of list) *assuming we choose random keys?

find performance 28 average* time for find: unsuccessful: check λ items successful: check 1 + λ/2 items (half of list) *assuming we choose random keys?

maybe our keys aren t average 29 h(k) = k index 0 size = 10 1 2 3 index = h(k) mod 10 4 5 λ = 0.3 7 8 9 but if we usually lookup existing keys key: 10 next: key: 20 next: key: 30 next: NULL

why use a linked list? 30 one item/bucket usually if not we should use a balanced tree (or change hash functions?) when not one, probably two or three linked list probably most efficient typical space overhead: one NULL pointer typical time overhead: check the one pointer

linked list alternatives 31 vector way too much extra space size, capacity extra space reserved in array remember: typically just one element balanced trees two pointers about same comparisons as linked list for size 2, 3

find performance revisited 32 with ideal hash function: Θ(λ) (load factor) typically: adjust hashtable size so λ remains approximately constant actual worst case: Θ(n) (I choose all the wrong keys)

insert performance 33 Θ(1) assuming we don t care about checking for a duplicate don t care about sorting the linked list insert at head

delete performance 34 need to do a find to get the bucket then linked list removal Θ(1) (if singly linked list track previous while finding)

rehashing how big should the table be? C number of items typical C 1 2 to 1 35

rehashing how big should the table be? C number of items typical C 1 2 to 1 as number of change: want to resize it! called rehashing because we recompute every key s hash 35

when to rehash? 3 load factor λ = elements/table size typical policy: resize table when λ > threshold java.util policy: when λ > 0.75 alternatives: only when insert fails?

rehashing big-oh 37 worst case: everything hashes to same bucket Θ(n) time per insert Θ(n) inserts Θ(n 2 ) total time if keys are well spread out between buckets about linear time

avoiding linked lists 38 class HashTableBucket {... int key; // 4 bytes int value; // 4 bytes HashTableBucket *next; // 8 bytes }; gosh, that s a lot of overhead even though usually one item/bucket

an alternative index 0 1 2 3 4 5 7 8 9 10 key: 13 next: key: 25 next: key: 37 next: index 0 1 2 key=13 3 key=25 4 key=37 5 7 8 9 10 39

an alternative index 0 1 2 3 4 5 7 8 9 10 this example: all keys hash to bucket 2 (h(k) = k, choose h(k) mod 11) always start searching there index 0 key: 13 1 2 key=13 next: 3 key=25 4 key=37 5 key: 25 next: 7 8 9 key: 37 10 next: 39

an alternative both ways might search all keys with same hash mod size index index 0 0 1 key: 13 1 2 2 key=13 3 next: 3 key=25 4 4 key=37 5 5 key: 25 7 8 next: 7 9 8 10 9 key: 37 10 next: 39

an alternative index 0 1 2 3 4 5 7 8 9 10 difference: new way no next pointers just go to next bucket index 0 key: 13 1 2 key=13 next: 3 key=25 4 key=37 key: 25 5 key=none next: 7 8 9 key: 37 10 next: 39

but what if index 0 1 2 3 4 5 7 8 9 10 key: 13 next: key: 25 next: NULL key: 38 next: index 0 1 2 key=13 3 key=25 4 key=38 5 key=none 7 8 9 10 index 0 1 2 key=13 3 key=38 4 key=25 5 key=none 7 8 9 10 NULL 40

open addressing generally 41 search h(k) + f(0) mod size then h(k) + f(1) mod size then h(k) + f(2) mod size linear probing: f(i) = i

probing possibilities 42 h(k) + f(i) mod size linear: f(i) = i previous diagram quadratic: f(i) = i 2 double hashing f(i) = i h 2 (k) (second hash function)

probing possibilities 43 h(k) + f(i) mod size linear: f(i) = i previous diagram quadratic: f(i) = i 2 double hashing f(i) = i h 2 (k) (second hash function)

linear probing example (2) 44 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 mod 10, h(k) + 2 mod 10, etc. insert 4, 27, 37, 14, 21 h(k) = 19, 88, 118, 49, 70 index 0 1 2 3 4 5 7 8 9

linear probing example (2) 44 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 mod 10, h(k) + 2 mod 10, etc. insert 4, 27, 37, 14, 21 h(k) = 19, 88, 118, 49, 70 index 0 1 2 3 4 5 7 8 9 4

linear probing example (2) 44 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 mod 10, h(k) + 2 mod 10, etc. insert 4, 27, 37, 14, 21 h(k) = 19, 88, 118, 49, 70 index 0 1 2 3 4 5 7 8 27 9 4

linear probing example (2) 44 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 mod 10, h(k) + 2 mod 10, etc. insert 4, 27, 37, 14, 21 h(k) = 19, 88, 118, 49, 70 index 0 37 1 2 3 4 5 7 8 27 9 4

linear probing example (2) 44 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 mod 10, h(k) + 2 mod 10, etc. insert 4, 27, 37, 14, 21 h(k) = 19, 88, 118, 49, 70 index 0 37 1 14 2 3 4 5 7 8 27 9 4

linear probing example (2) 44 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 mod 10, h(k) + 2 mod 10, etc. insert 4, 27, 37, 14, 21 h(k) = 19, 88, 118, 49, 70 index 0 37 1 14 2 21 3 4 5 7 8 27 9 4

the clumping 45 we tend to get clumps of used buckets reason why linear probing isn t the only way

probing possibilities 4 h(k) + f(i) mod size linear: f(i) = i previous diagram quadratic: f(i) = i 2 double hashing f(i) = i h 2 (k) (second hash function)

quadratic probing example 47 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 2 mod 10, h(k) + 2 2 mod 10, etc. insert 4, 27, 14, 37, 22, 34 h(k) = 19, 88, 49, 118, 73, 109 index 0 1 2 3 4 5 7 8 9

quadratic probing example 47 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 2 mod 10, h(k) + 2 2 mod 10, etc. insert 4, 27, 14, 37, 22, 34 h(k) = 19, 88, 49, 118, 73, 109 index 0 1 2 3 4 5 7 8 9 4

quadratic probing example 47 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 2 mod 10, h(k) + 2 2 mod 10, etc. insert 4, 27, 14, 37, 22, 34 h(k) = 19, 88, 49, 118, 73, 109 index 0 1 2 3 4 5 7 8 27 9 4

quadratic probing example 47 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 2 mod 10, h(k) + 2 2 mod 10, etc. insert 4, 27, 14, 37, 22, 34 h(k) = 19, 88, 49, 118, 73, 109 index 0 14 1 2 3 4 5 7 8 27 9 4

quadratic probing example 47 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 2 mod 10, h(k) + 2 2 mod 10, etc. insert 4, 27, 14, 37, 22, 34 h(k) = 19, 88, 49, 118, 73, 109 index 0 14 1 2 37 3 4 5 7 8 27 9 4

quadratic probing example 47 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 2 mod 10, h(k) + 2 2 mod 10, etc. insert 4, 27, 14, 37, 22, 34 h(k) = 19, 88, 49, 118, 73, 109 index 0 14 1 2 37 3 22 4 5 7 8 27 9 4

quadratic probing example 47 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 2 mod 10, h(k) + 2 2 mod 10, etc. insert 4, 27, 14, 37, 22, 34 h(k) = 19, 88, 49, 118, 73, 109 index 0 14 1 2 37 3 22 4 5 34 7 8 27 9 4

quadratic probing example 47 h(k) = 3k + 7 index = h(k) mod 10 then check h(k) + 1 2 mod 10, h(k) + 2 2 mod 10, etc. insert 4, 27, 14, 37, 22, 34 h(k) = 19, 88, 49, 118, 73, 109 index 0 14 1 2 37 3 22 4 5 34 7 8 27 9 4

probing possibilities 48 h(k) + f(i) mod size linear: f(i) = i previous diagram quadratic: f(i) = i 2 double hashing f(i) = i h 2 (k) (second hash function)

double hashing example 49 h(k) = k index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. where h 2 (k) = 7 (k mod 7) insert 89, 18, 58, 49, 9, 0 index 0 1 2 3 4 5 7 8 9

double hashing example 49 h(k) = k index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. where h 2 (k) = 7 (k mod 7) insert 89, 18, 58, 49, 9, 0 index 0 1 2 3 4 5 7 8 9 89

double hashing example 49 h(k) = k index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. where h 2 (k) = 7 (k mod 7) insert 89, 18, 58, 49, 9, 0 index 0 1 2 3 4 5 7 8 18 9 89

double hashing example 49 h(k) = k index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. where h 2 (k) = 7 (k mod 7) insert 89, 18, 58, 49, 9, 0 index 0 1 2 3 58 4 5 7 8 18 9 89

double hashing example 49 h(k) = k index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. where h 2 (k) = 7 (k mod 7) insert 89, 18, 58, 49, 9, 0 index 0 1 2 3 58 4 5 49 7 8 18 9 89

double hashing example 49 h(k) = k index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. where h 2 (k) = 7 (k mod 7) insert 89, 18, 58, 49, 9, 0 index 0 9 1 2 3 58 4 5 49 7 8 18 9 89

double hashing example 49 h(k) = k index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. where h 2 (k) = 7 (k mod 7) insert 89, 18, 58, 49, 9, 0 index 0 9 1 2 0 3 58 4 5 49 7 8 18 9 89

double hashing thrashing 50 h(k) = k; h 2 (k) = (k mod 5) + 1 index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. insert 10, 12, 14, 1, 18, 3 index 0 1 2 3 4 5 7 8 9

double hashing thrashing 50 h(k) = k; h 2 (k) = (k mod 5) + 1 index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. insert 10, 12, 14, 1, 18, 3 index 0 10 1 2 12 3 4 14 5 1 7 8 18 9

double hashing thrashing h(k) = k; h 2 (k) = (k mod 5) + 1 index = h(k) mod 10 then check h(k) + h 2 (k) mod 10, h(k) + 2h 2 (k) mod 10, etc. insert 10, 12, 14, 1, 18, 3 h(3) mod 10 = h(3) + h 2 (3) mod 10 = 3 + 2 mod 10 = 8 h(3) + 2h 2 (3) mod 10 = 0; h(3) + 3h 2 (3) mod 10 = 2 h(3) + 4h 2 (3) mod 10 = 4; h(3) + 5h 2 (3) mod 10 = index 0 10 1 2 12 3 4 14 5 1 7 8 18 9 50

why prime sizes 51 prime sizes prevent this problem h 2 (k) (2) was not relatively prime to table size (10) result: didn t use all elements of table similar issues with i 2, etc. (but not as bad)

why prime sizes (2) 52 often use prime sizes w/o open addressing why more forgiving for not great hash functions example: h(k) = k are most keys even? oops if table size is even are most keys 10k? oops if table size is multiple of 5

handling removal 53 with chaining: easy remove from linked list with open addressing: hard need to not disrupt searches option 1: rehash every time (super-expensive) option 2: placeholder value + rehash eventually option 3: disallow deletion (lab )

cryptographic hashes 54 example: SHA-25 input: any string of bits output: 25 bits have security properties normal hashes don t: collision resistence preimage resistence

cryptographic hashes 54 example: SHA-25 input: any string of bits output: 25 bits have security properties normal hashes don t: collision resistence preimage resistence

collision resistence 55 security property of a cryptographic hash it s very hard to find keys k 1 and k 2 so h(k 1 ) = h(k 2 ) note: why SHA-25 s output is so big (25 bits) otherwise, just generate lots of hashes example application: verify download with hash of file contents it s very hard to find two files with the same hash even if you re trying

exercise: collision non-resistence 5 unsigned int hash(const string &s) { unsigned int sum = 0; for (int i = 0; i < s.size(); ++i) { // deliberate use of wraparound on overflow sum *= 37; sum += s[i]; } return sum; } exercise: how to construct two strings with same hash?

exercise: collision non-resistence unsigned int hash(const string &s) { unsigned int sum = 0; for (int i = 0; i < s.size(); ++i) { // deliberate use of wraparound on overflow sum *= 37; sum += s[i]; } return sum; } exercise: how to construct two strings with same hash? one idea: {0, x} and {59, x + 37} have the same hash 5

cryptographic hashes 57 example: SHA-25 input: any string of bits output: 25 bits have security properties normal hashes don t: collision resistence preimage resistence

preimage resistence 58 security property of a cryptographic hash if given V, very hard to find k so h(k) = V collision resistence usually implies preimage resistence

some cryptographic hash applications 59 verifying downloads get short hash from trusted source get big file from less trusted source use hash to make sure it s the right big file message authentication did message X really come from where I thought? usually: magic math operation that works on small amount of data hash turns big message into small amount of data