CSE373: Data Structures & Algorithms Lecture 6: Hash Tables

Lecture 6: Hash Tables
Hunter Zahn
Summer 2016

Motivating Hash Tables

For a dictionary with n key, value pairs:

                        insert      find        delete
Unsorted linked list    O(1)        O(n)        O(n)
Unsorted array          O(1)        O(n)        O(n)
Sorted linked list      O(n)        O(n)        O(n)
Sorted array            O(n)        O(log n)    O(n)
Balanced tree           O(log n)    O(log n)    O(log n)
Magic array             O(1)        O(1)        O(1)

Sufficient "magic":
- Use the key to compute the array index for an item in O(1) time [doable]
- Have a different index for every item [magic]

Motivating Hash Tables

Let's say you are tasked with counting the frequency of integers in a text file. You are guaranteed that only the integers 0 through 100 will occur.

For example: 5, 7, 8, 9, 9, 5, 0, 0, 1, 12
Result: 0 -> 2, 1 -> 1, 5 -> 2, 7 -> 1, 8 -> 1, 9 -> 2, 12 -> 1

What structure is appropriate? Tree? List? Array?

[Figure: array with slots 0 through 9; slots 0, 1, 5, 7, 8, and 9 hold the counts 2, 1, 2, 1, 1, and 2]
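A minimal sketch of the array answer in Java: because the values are guaranteed to lie in 0 through 100, each integer can serve directly as its own array index. The names `FrequencyCount` and `countFrequencies` are illustrative and do not come from the lecture.

```java
class FrequencyCount {
    // Assumption taken from the slide: every value lies in 0..100 inclusive.
    static final int MAX_VALUE = 100;

    // Returns an array where counts[v] is the number of times v occurs.
    static int[] countFrequencies(int[] values) {
        int[] counts = new int[MAX_VALUE + 1];
        for (int v : values) {
            counts[v]++; // the key is its own index, so each update is O(1)
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] input = {5, 7, 8, 9, 9, 5, 0, 0, 1, 12};
        int[] counts = countFrequencies(input);
        // Print only the values that actually occurred.
        for (int v = 0; v < counts.length; v++) {
            if (counts[v] > 0) {
                System.out.println(v + " -> " + counts[v]);
            }
        }
    }
}
```

This only works because the key space is small and known in advance, which is exactly the property that names or arbitrary strings lack.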

Motivating Hash Tables

Now what if we want to associate a name with a phone number?
- Suppose keys are first, last names: how big is the key space?
- Maybe we only care about students

Hash Tables

- Aim for constant-time (i.e., O(1)) find, insert, and delete
  - On average, under some often-reasonable assumptions
- A hash table is an array of some fixed size
- Basic idea: a hash function maps each key to a table index, index = h(key)

[Figure: key space (e.g., integers, strings) -> hash function: index = h(key) -> hash table with slots 0 through TableSize - 1]

Hash Tables vs. Balanced Trees

- In terms of a Dictionary ADT for just insert, find, delete, hash tables and balanced trees are just different data structures
  - Hash tables: O(1) on average (assuming we follow good practices)
  - Balanced trees: O(log n) worst-case
- Constant-time is better, right?
  - Yes, but you need hashing to behave (must avoid collisions)
  - Yes, but findmin, findmax, predecessor, and successor go from O(log n) to O(n), and printsorted from O(n) to O(n log n)
    - Why your textbook considers this to be a different ADT

Hash Tables

- There are m possible keys (m typically large, even infinite)
- We expect our table to have only n items
- n is much less than m (often written n << m)

Many dictionaries have this property:
- Compiler: all possible identifiers allowed by the language vs. those used in some file of one program
- Database: all possible student names vs. students enrolled
- AI: all possible chess-board configurations vs. those considered by the current player

Hash functions

An ideal hash function:
- Fast to compute
- Rarely hashes two used keys to the same index
  - Often impossible in theory but easy in practice
  - Will handle collisions later

[Figure: key space (e.g., integers, strings) -> hash function: index = h(key) -> hash table with slots 0 through TableSize - 1]

Simple Integer Hash Functions

- key space K = integers
- TableSize = 7
- h(K) = K % 7
- Insert: 7, 18, 41

Resulting table:
  0: 7
  1:
  2:
  3:
  4: 18
  5:
  6: 41
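The TableSize = 7 example can be sketched in Java as follows. The class and method names are illustrative, and the sketch assumes non-negative keys and no collisions, as in the example.

```java
class ModHash {
    // h(K) = K % tableSize, for non-negative keys as on the slide.
    static int hash(int key, int tableSize) {
        return key % tableSize;
    }

    // Places each key at index hash(key). Assumes no two keys share a slot;
    // a later key would silently overwrite an earlier one.
    static Integer[] insertAll(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            table[hash(key, tableSize)] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(new int[]{7, 18, 41}, 7);
        for (int i = 0; i < table.length; i++) {
            System.out.println(i + ": " + (table[i] == null ? "" : table[i]));
        }
    }
}
```

Here 7 % 7 = 0, 18 % 7 = 4, and 41 % 7 = 6, so the three keys occupy slots 0, 4, and 6.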

Simple Integer Hash Functions

- key space K = integers
- TableSize = 10
- h(K) = ??
- Insert: 7, 18, 41, 34
- What happens when we insert 44?

Resulting table:
  0:
  1: 41
  2:
  3:
  4: 34
  5:
  6:
  7: 7
  8: 18
  9:
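To see what happens with 44, here is a small Java sketch that assumes h(K) = K % 10 (the slide leaves h as a question, so this choice is an assumption). It reports every key whose slot is already taken and applies no resolution strategy. The names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class CollisionDemo {
    // Tries to place each key at key % tableSize and returns the keys
    // whose slot was already occupied. Colliding keys are not stored.
    static List<Integer> findCollisions(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        List<Integer> collided = new ArrayList<>();
        for (int key : keys) {
            int index = key % tableSize; // assumed hash function
            if (table[index] == null) {
                table[index] = key;
            } else {
                collided.add(key);
            }
        }
        return collided;
    }

    public static void main(String[] args) {
        // 34 and 44 both map to index 4 under K % 10.
        System.out.println(findCollisions(new int[]{7, 18, 41, 34, 44}, 10));
    }
}
```

Under this assumption the first four keys land in distinct slots, and 44 is the only key reported because 34 already occupies index 4.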

Aside: Properties of Mod

To keep hashed values within the size of the table, we will generally do:
h(K) = function(K) % TableSize
(In the previous examples, function(K) = K.)

Useful properties of mod:
- $(a + b) \% c = [(a \% c) + (b \% c)] \% c$
- $(a \cdot b) \% c = [(a \% c) \cdot (b \% c)] \% c$
- $a \% c = b \% c \implies (a - b) \% c = 0$
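A short Java sketch of why these properties are useful in practice: reducing operands before adding or multiplying keeps intermediate values small. The helper names are illustrative and the helpers assume non-negative operands. The sketch also shows one Java-specific caveat: `%` keeps the sign of the dividend, whereas `Math.floorMod` returns a non-negative result for a positive modulus, which is what a table index needs.

```java
class ModProperties {
    // (a + b) % c computed from operands that are reduced first.
    static long addMod(long a, long b, long c) {
        return ((a % c) + (b % c)) % c;
    }

    // (a * b) % c computed from operands that are reduced first, so the
    // intermediate product stays below c * c.
    static long mulMod(long a, long b, long c) {
        return ((a % c) * (b % c)) % c;
    }

    public static void main(String[] args) {
        long a = 1_000_000_007L, b = 998_244_353L, c = 10;
        System.out.println(addMod(a, b, c) == (a + b) % c);
        System.out.println(mulMod(a, b, c) == (a * b) % c);

        // Third property: equal remainders imply the difference is a multiple of c.
        System.out.println(47 % 10 == 17 % 10 && (47 - 17) % 10 == 0);

        // Java caveat: % can be negative, Math.floorMod is not (for c > 0).
        System.out.println(-7 % 10);
        System.out.println(Math.floorMod(-7, 10));
    }
}
```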

Designing Hash Functions

Often based on modular hashing: h(k) = f(k) % P
- P is typically the TableSize
- P is often chosen to be prime:
  - Reduces likelihood of collisions due to patterns in data
  - Is useful for guarantees on certain hashing strategies (as we'll see)
- Equivalent objects MUST hash to the same location

Designing Hash Functions:

h(k) = f(k) % P
f(k) = ??

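Some String Hash Functions

key space = strings
$K = s_0 s_1 s_2 \ldots s_{m-1}$ (where the $s_i$ are chars: $s_i \in [0, 128]$)

1. $h(K) = s_0 \% \text{TableSize}$
2. $h(K) = \left( \sum_{i=0}^{m-1} s_i \right) \% \text{TableSize}$
3. $h(K) = \left( \sum_{i=0}^{m-1} s_i \cdot 37^i \right) \% \text{TableSize}$

Collisions to watch for:
- Function 1: H("batman") = H("ballgame"), since only the first character counts
- Function 2: H("spot") = H("pots"), since the sum ignores character order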
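The three string hash functions can be sketched in Java as below. Method names are illustrative, the sketch assumes non-empty strings, and the third function reduces modulo the table size at every step (using the mod properties) so the running values never overflow. The table size 101 in `main` is an arbitrary prime picked for the demonstration.

```java
class StringHashes {
    // 1. First character only. Assumes key is non-empty.
    static int hashFirstChar(String key, int tableSize) {
        return key.charAt(0) % tableSize;
    }

    // 2. Sum of all characters; the order of the characters is ignored.
    static int hashCharSum(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum % tableSize;
    }

    // 3. Sum of s_i * 37^i, reduced mod tableSize at every step.
    static int hashPolynomial(String key, int tableSize) {
        long sum = 0;
        long power = 1; // holds 37^i % tableSize
        for (int i = 0; i < key.length(); i++) {
            sum = (sum + key.charAt(i) * power) % tableSize;
            power = (power * 37) % tableSize;
        }
        return (int) sum;
    }

    public static void main(String[] args) {
        int size = 101; // arbitrary prime table size for this demo
        System.out.println(hashFirstChar("batman", size) == hashFirstChar("ballgame", size));
        System.out.println(hashCharSum("spot", size) == hashCharSum("pots", size));
        System.out.println(hashPolynomial("spot", size) == hashPolynomial("pots", size));
    }
}
```

With this table size the first two comparisons collide and the third does not, because weighting each character by a power of 37 makes the hash depend on position.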

What to hash?

We will focus on the two most common things to hash: ints and strings

For objects with several fields, it is usually best to have most of the identifying fields contribute to the hash to avoid collisions

Example:
    class Person {
        String first; String middle; String last;
        Date birthdate;
    }

An inherent trade-off: hashing-time vs. collision-avoidance
- Bad idea(?): Use only first name
- Good idea(?): Use only middle initial? Combination of fields?
- Admittedly, what-to-hash-with is often unprincipled
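One way to let every identifying field contribute is sketched below in Java, using the standard `java.util.Objects.hash` helper. This is an illustration rather than the lecture's own solution, and it swaps in `java.time.LocalDate` for the slide's `Date` type as an assumption. The same fields drive `equals`, so equal objects always produce the same hash code.

```java
import java.time.LocalDate;
import java.util.Objects;

class Person {
    final String first;
    final String middle;
    final String last;
    final LocalDate birthdate; // stand-in for the slide's Date type

    Person(String first, String middle, String last, LocalDate birthdate) {
        this.first = first;
        this.middle = middle;
        this.last = last;
        this.birthdate = birthdate;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        Person p = (Person) other;
        return Objects.equals(first, p.first)
            && Objects.equals(middle, p.middle)
            && Objects.equals(last, p.last)
            && Objects.equals(birthdate, p.birthdate);
    }

    @Override
    public int hashCode() {
        // All four fields contribute; this costs more time than hashing one
        // field but separates people who share, say, a first name.
        return Objects.hash(first, middle, last, birthdate);
    }

    public static void main(String[] args) {
        Person a = new Person("Ada", "K", "Lovelace", LocalDate.of(1815, 12, 10));
        Person b = new Person("Ada", "K", "Lovelace", LocalDate.of(1815, 12, 10));
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```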

Deep Breath

Recap

Hash Tables: Review

- Aim for constant-time (i.e., O(1)) find, insert, and delete
  - On average, under some reasonable assumptions
- A hash table is an array of some fixed size
  - But growable as we'll see

[Figure: pipeline spanning the client and the hash table library: E -> int -> table-index -> collision? -> collision resolution, ending in a hash table with slots 0 through TableSize - 1]

Collision resolution

Collision: When two keys map to the same location in the hash table

- We try to avoid it, but number-of-keys exceeds table size
- So hash tables should support collision resolution
  - Ideas?

Separate Chaining

Chaining: All keys that map to the same table location are kept in a list (a.k.a. a "chain" or "bucket")

As easy as it sounds

Example: insert 10, 22, 107, 12, 42 with mod hashing and TableSize = 10

Table after each step (buckets 0 through 9; empty buckets omitted):
  start:       all buckets empty
  insert 10:   0: 10
  insert 22:   0: 10    2: 22
  insert 107:  0: 10    2: 22                7: 107
  insert 12:   0: 10    2: 12 -> 22          7: 107
  insert 42:   0: 10    2: 42 -> 12 -> 22    7: 107
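A minimal Java sketch of separate chaining for integer keys. It inserts at the front of each chain, which appears to be what the slide's final picture shows (42, then 12, then 22 in bucket 2). The class name and the choice to ignore duplicate keys are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainingTable {
    // One linked list (chain) per table slot.
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainingTable(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Mod hashing; floorMod keeps the index non-negative for negative keys.
    private int index(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Adds the key at the front of its chain unless it is already present.
    void insert(int key) {
        LinkedList<Integer> chain = buckets.get(index(key));
        if (!chain.contains(key)) {
            chain.addFirst(key);
        }
    }

    boolean find(int key) {
        return buckets.get(index(key)).contains(key);
    }

    // Removes the key if present and reports whether anything was removed.
    boolean delete(int key) {
        return buckets.get(index(key)).remove(Integer.valueOf(key));
    }

    // Returns a copy of the chain stored at the given slot.
    List<Integer> chainAt(int slot) {
        return new ArrayList<>(buckets.get(slot));
    }

    public static void main(String[] args) {
        ChainingTable table = new ChainingTable(10);
        for (int key : new int[]{10, 22, 107, 12, 42}) {
            table.insert(key);
        }
        for (int slot = 0; slot < 10; slot++) {
            System.out.println(slot + ": " + table.chainAt(slot));
        }
    }
}
```

Every operation hashes to one bucket and then walks only that bucket's chain, so the cost depends on chain length rather than on the total number of keys.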

More rigorous chaining analysis

Definition: The load factor, λ, of a hash table is

$\lambda = \frac{N}{\text{TableSize}}$, where N is the number of elements

Under chaining, the average number of elements per bucket is λ

So if some inserts are followed by random finds, then on average:
- Each unsuccessful find compares against λ items

So we like to keep λ fairly low (e.g., 1 or 1.5 or 2) for chaining
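As a quick check of the definition, the Java sketch below computes λ for the chaining example with five keys in a table of size 10, and confirms that the average chain length over all buckets is the same number. The chain lengths in `main` come from that example (buckets 0, 2, and 7 hold 1, 3, and 1 keys). The names are illustrative.

```java
class LoadFactor {
    // lambda = N / TableSize
    static double loadFactor(int numElements, int tableSize) {
        return (double) numElements / tableSize;
    }

    // Average number of elements per bucket, counting empty buckets too.
    static double averageChainLength(int[] chainLengths) {
        int total = 0;
        for (int length : chainLengths) {
            total += length;
        }
        return (double) total / chainLengths.length;
    }

    public static void main(String[] args) {
        // Chain lengths for keys 10, 22, 107, 12, 42 with TableSize = 10.
        int[] chainLengths = {1, 0, 3, 0, 0, 0, 0, 1, 0, 0};
        System.out.println(loadFactor(5, 10));
        System.out.println(averageChainLength(chainLengths));
    }
}
```

Both values are 0.5 here. Note that λ is only an average: bucket 2 still holds three keys, so an unsuccessful find that lands there compares against three items.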