Hash Table. Ric Glassey

Hash Table
Ric Glassey (glassey@kth.se)

Overview: Hash Table
Aim: Describe the map abstract data type, with efficient insertion, deletion and search operations.
Motivation: List data structures are divided by their underlying implementation (array or linked list), and combining their respective best properties is desirable.
Maps and hash tables, key concepts:
- Hashing and compression
- Collisions and chaining
- Load and efficiency

THE MISSING LIST

B & C search the Web
B: I'd like to search for cats...
B: Hej, C, what's the IP address for Google?
C: Why, it's 216.58.209.142...
C: Or type google in the search box.
This is not a user-friendly system. Rather:
We need a simple addressing scheme for websites (URLs).
We want a URL to map reliably to an IP address.
In general, a scheme in which some arbitrary key k maps to some value v is a useful construct for many applications.
So far, the only keys we have used have been integers.

List Limitations
Recall that behind the List abstract data type are two implementations (array and linked list), each with advantages and disadvantages:

Operation   Array   Linked List
Search*     O(1)    O(n)
Insert      O(n)    O(1)
Delete      O(n)    O(1)

* Index-based retrieval. A doubly linked list is assumed.
The array is slow to update; the linked list is slow to search.

Desirable Properties
Many applications require both efficient search and efficient update:
- Cheap to insert and delete items
- Fast to search for items
This leads to a classic engineer's dilemma: Cheap, Fast, Simple. You can only pick two :( (perhaps combined with Reliable, Secure, or some other property).

MAP ABSTRACT DATA TYPE

Abstract Data Type: Map
A map efficiently stores and retrieves values based upon a unique search key. A map is said to store key-value pairs (k, v):
- Keys must be unique, such that k maps only to v.
- The key acts like an index.
- The key can be of arbitrary type (not just numeric).
For example:
Key Blue maps to value RGB(0, 0, 255)
Key Red maps to value RGB(255, 0, 0)
Key Fuchsia maps to value RGB(255, 0, 255)
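As a minimal sketch, the colour example can be written with java.util.HashMap, assuming the RGB values are stored as plain strings. The class name ColorMap, its build() helper and the replacement value RGB(200, 0, 0) are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ColorMap {
    // Hypothetical helper: colour names are the keys, "RGB(r, g, b)" strings the values.
    static Map<String, String> build() {
        Map<String, String> colors = new HashMap<>();
        colors.put("Blue", "RGB(0, 0, 255)");
        colors.put("Red", "RGB(255, 0, 0)");
        colors.put("Fuchsia", "RGB(255, 0, 255)");
        return colors;
    }

    public static void main(String[] args) {
        Map<String, String> colors = build();
        System.out.println(colors.get("Fuchsia")); // RGB(255, 0, 255)

        // Keys are unique: putting an existing key replaces its value
        // instead of adding a second entry.
        colors.put("Red", "RGB(200, 0, 0)");
        System.out.println(colors.get("Red"));     // RGB(200, 0, 0)
        System.out.println(colors.size());         // 3
    }
}
```

The last two lines show the uniqueness rule in practice: re-inserting "Red" overwrites the old pair, so the size stays at 3.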

Map Operations
Primary operations: Insert(key, value), Delete(key), Search(key).
Depending upon the specific implementation, many more operations are included (see later), mostly utility functions.
Maps are also commonly referred to as a hash table, dictionary or associative array.
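One way to pin down the three primary operations is as a small Java interface. The sketch below is hypothetical: SimpleMap, HashSimpleMap and SimpleMapDemo are illustrative names, not part of any library, and the implementation simply delegates to java.util.HashMap.

```java
import java.util.HashMap;
import java.util.Map;

class SimpleMapDemo {
    // Hypothetical interface naming the slide's three primary operations.
    interface SimpleMap<K, V> {
        void insert(K key, V value); // store (key, value); replaces any old value for key
        void delete(K key);          // remove the entry for key, if present
        V search(K key);             // value stored for key, or null if absent
    }

    // Minimal adapter delegating each operation to java.util.HashMap.
    static final class HashSimpleMap<K, V> implements SimpleMap<K, V> {
        private final Map<K, V> table = new HashMap<>();

        @Override public void insert(K key, V value) { table.put(key, value); }
        @Override public void delete(K key) { table.remove(key); }
        @Override public V search(K key) { return table.get(key); }
    }

    public static void main(String[] args) {
        SimpleMap<String, Integer> m = new HashSimpleMap<>();
        m.insert("k1", 1);
        m.insert("k2", 2);
        m.delete("k1");
        System.out.println(m.search("k1")); // null
        System.out.println(m.search("k2")); // 2
    }
}
```

Keeping the interface separate from the implementation mirrors the ADT/implementation split: the same three operations could later be backed by a hand-written hash table.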

Simple Direct Addressing
[Figure: a universe of keys U = {0, ..., 9}, of which the actual keys K = {2, 3, 5, 8} are in use. Table T has one slot per key in U; slots 2, 3, 5 and 8 hold entries (key, value), and the remaining slots are empty.]
This essentially acts as a random-access array. However, what happens if every key in U has to be anticipated and given a slot in T?
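A minimal direct-address table can be sketched as an array with one slot per key in the universe, assuming keys are the integers 0 to |U| - 1. The class name DirectAddressTable is hypothetical; the keys {2, 3, 5, 8} are taken from the figure.

```java
class DirectAddressTable<V> {
    // One slot per possible key in the universe U = {0, ..., universeSize - 1}.
    private final Object[] slots;

    DirectAddressTable(int universeSize) {
        slots = new Object[universeSize];
    }

    void insert(int key, V value) { slots[key] = value; } // O(1)
    void delete(int key) { slots[key] = null; }          // O(1)

    @SuppressWarnings("unchecked")
    V search(int key) { return (V) slots[key]; }          // O(1)

    public static void main(String[] args) {
        // Universe U = {0, ..., 9}; actual keys K = {2, 3, 5, 8} as in the figure.
        DirectAddressTable<String> t = new DirectAddressTable<>(10);
        for (int k : new int[] {2, 3, 5, 8}) {
            t.insert(k, "v" + k);
        }
        System.out.println(t.search(5)); // v5
        System.out.println(t.search(4)); // null: slot reserved but unused
    }
}
```

Every operation is a single array access, but the array must be as large as U even though only four of its ten slots are used.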

Accommodating all keys?
1) If the universe of keys U becomes large,
2) whilst the set of actual keys used, K, is relatively small,
3) then the amount of wasted space in T becomes a resource concern.
Direct addressing is therefore not a space-efficient approach.
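To put a number on the waste, assume (hypothetically) that keys are arbitrary 32-bit integers and that only 1000 of them are actually stored. A direct-address table then needs one slot per possible key:

```latex
\begin{aligned}
|U| &= 2^{32} = 4\,294\,967\,296 \text{ slots} \\
|K| &= 1000 \text{ keys actually stored} \\
\frac{|K|}{|U|} &= \frac{1000}{4\,294\,967\,296} \approx 2.3 \times 10^{-7}
\end{aligned}
```

Under these assumptions, well under one slot in a million would ever be occupied.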

HASHING AND COMPRESSION

Hashing: Chopping and Mixing
Ideally, we want to avoid direct addressing:
- Maintain a more space-efficient table T of size N.
- Allow arbitrary types as keys (not just integers).
We can design some function h(k) that converts k into an integer i (to index a position in T) that falls within the range [0, N-1].
Hash function = hash code + compression function: the hash code turns k into an integer, and the compression function maps that integer into [0, N-1].
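The two-stage structure of h(k) can be sketched in a few lines of Java. This assumes Java's built-in hashCode() as the hash code and the division method (covered below) as the compression function; TwoStageHash, hashCodeOf, compress and h are hypothetical names.

```java
class TwoStageHash {
    // Stage 1: hash code. Every Java object provides one via hashCode();
    // it is an arbitrary int and may be negative.
    static int hashCodeOf(Object key) {
        return key.hashCode();
    }

    // Stage 2: compression into [0, n-1]. Math.floorMod is used instead of %
    // because % returns a negative result for a negative hash code.
    static int compress(int code, int n) {
        return Math.floorMod(code, n);
    }

    // h(k) = compress(hashCode(k)): an index into a table of size n.
    static int h(Object key, int n) {
        return compress(hashCodeOf(key), n);
    }

    public static void main(String[] args) {
        int n = 8;
        for (String k : new String[] {"Blue", "Red", "Fuchsia"}) {
            System.out.println(k + " -> slot " + h(k, n));
        }
    }
}
```

Separating the stages means the same hash code can be reused with tables of different sizes; only the compression step depends on N.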

Hash Code
The aim is to generate an integer from the input key.
- It need not be bounded by the table size, and it can be negative.
- But it should avoid collisions, i.e. h(k1) == h(k2) for distinct keys k1 and k2, as much as possible.
Bit representation strategy: if the data type uses no more bits than the hash-code integer, its bits can be used directly. For example, Java uses 32-bit hash codes, so byte, char, short and int values can simply be cast to int: h(13) = 0...1101.
Other schemes: polynomial hash codes and cyclic-shift hash codes.
You can override Java's hashCode() method and make your own.
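The two "other schemes" can be sketched for strings as follows. The polynomial version is evaluated with Horner's rule; with a = 31 it produces the same value as Java's String.hashCode(), which is documented as s[0]*31^(n-1) + ... + s[n-1] in int arithmetic. The cyclic-shift variant assumes a 5-bit rotation, an illustrative choice. The class and method names are hypothetical.

```java
class HashCodes {
    // Polynomial hash code: x0*a^(n-1) + x1*a^(n-2) + ... + x(n-1),
    // evaluated with Horner's rule. int overflow wraps around, which is
    // acceptable for a hash code.
    static int polynomialHash(String s, int a) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = a * h + s.charAt(i);
        }
        return h;
    }

    // Cyclic-shift hash code: rotate the running value left by `shift` bits
    // before adding each character.
    static int cyclicShiftHash(String s, int shift) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = Integer.rotateLeft(h, shift) + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Bit representation: a small int key is its own hash code.
        System.out.println(Integer.hashCode(13));       // 13
        System.out.println(Integer.toBinaryString(13)); // 1101
        // With a = 31 the polynomial hash matches String.hashCode().
        System.out.println(polynomialHash("cats", 31) == "cats".hashCode()); // true
        System.out.println(cyclicShiftHash("cats", 5));
    }
}
```

Both schemes make every character influence the result, so anagrams such as "cats" and "acts" usually get different hash codes, unlike a plain sum of character codes.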

Compression Function
A hash code may not lie within the bounds [0, N-1] of a table of size N, so it needs to be converted to fall within this range. A good compression function should also seek to minimise the number of collisions.
Division method: i mod N, where i is the hash code. A simple approach, but it suffers from repeated patterns of hash codes being copied through to hash values.
MAD method (Multiply-Add-Divide): [(ai + b) mod p] mod N, where p is a prime > N and a, b are random integers from [0, p-1], with a > 0.
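Both compression functions can be sketched in Java. The sketch assumes p = 2^31 - 1 (a known prime, larger than any table size used here) and draws a and b from java.util.Random; the class names Compression and Mad are hypothetical. Products are computed in long so that a*i cannot overflow.

```java
import java.util.Random;

class Compression {
    // Division method: i mod N. floorMod keeps the result in [0, N-1] even
    // when the hash code i is negative.
    static int division(int i, int n) {
        return Math.floorMod(i, n);
    }

    // MAD method: [(a*i + b) mod p] mod N, with p prime and p > N,
    // and a, b chosen at random from [0, p-1] with a > 0.
    static final class Mad {
        private static final long P = 2_147_483_647L; // 2^31 - 1, a prime
        private final long a;
        private final long b;
        private final int n;

        Mad(int n, Random rnd) {
            this.n = n;
            this.a = 1 + rnd.nextInt((int) (P - 1)); // a in [1, p-1]
            this.b = rnd.nextInt((int) P);           // b in [0, p-1]
        }

        int compress(int i) {
            // a*i fits in a long: a < 2^31 and |i| <= 2^31.
            long modP = Math.floorMod(a * i + b, P);
            return (int) (modP % n);
        }
    }

    public static void main(String[] args) {
        int n = 8;
        Mad mad = new Mad(n, new Random(42));
        // 13, 21, 29 and -3 are all congruent mod 8, so the division method
        // sends every one of them to slot 5; MAD usually spreads them out.
        for (int code : new int[] {13, 21, 29, -3}) {
            System.out.println(code + ": division -> " + division(code, n)
                    + ", MAD -> " + mad.compress(code));
        }
    }
}
```

The hash codes in main differ by multiples of N, which is exactly the kind of repeated pattern that the division method copies straight through into the table.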

COLLISIONS AND CHAINING

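Managing Collisions
Collisions are a consequence of using hash functions: whenever the universe of keys is larger than the table (|U| > N), some distinct keys must share a slot, so eventually h(k2) == h(k5) for some k2 ≠ k5.
[Figure: actual keys k1, k2, k4, k5 and k6 from the universe U are hashed into T at positions h(k1), h(k4), h(k6) and h(k2) == h(k5).]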
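Collisions appear quickly even with Java's own hash codes. The sketch below shows both kinds: two distinct strings with identical hash codes ('A'*31 + 'a' and 'B'*31 + 'B' both equal 2112), and two integers with different hash codes that the division method compresses to the same slot. CollisionDemo and slot are hypothetical names.

```java
class CollisionDemo {
    // h(k): hash code followed by division-method compression.
    static int slot(Object key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    public static void main(String[] args) {
        // Distinct strings, equal hash codes: a collision before compression.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112

        // Distinct hash codes (3 and 11) that collide after compression mod 8.
        int n = 8;
        System.out.println(slot(3, n) + " " + slot(11, n));          // 3 3
    }
}
```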

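Separate Chaining
To deal with collisions, we can simply extend the capacity of a slot so that it has its own doubly-linked list (DL-list): where collisions occur, the colliding entries are stored in that slot's list.
[Figure: actual keys k1 to k8 hashed into T; colliding keys are chained in the same slot (k1 and k4; k5 and k2; k6 and k8), while k3 and k7 each occupy a slot alone; the other slots are empty.]
Why a DL-list?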
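A minimal separate-chaining map might look like the sketch below. It assumes non-null keys, a fixed capacity (no resizing yet) and division-method compression; ChainedHashMap is a hypothetical name. Each slot holds a java.util.LinkedList, which is a doubly-linked list, so a located node can be unlinked in O(1) through its iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Minimal separate-chaining map (fixed capacity, no resizing, non-null keys).
class ChainedHashMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry<K, V>>> table = new ArrayList<>();
    private int size = 0;

    ChainedHashMap(int capacity) {
        for (int i = 0; i < capacity; i++) table.add(new LinkedList<>());
    }

    // h(k): the key's hash code, compressed with the division method.
    private LinkedList<Entry<K, V>> bucket(K key) {
        return table.get(Math.floorMod(key.hashCode(), table.size()));
    }

    V get(K key) {
        for (Entry<K, V> e : bucket(key)) { // search: O(length of chain)
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = bucket(key);
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // keys are unique
        }
        chain.addFirst(new Entry<>(key, value)); // O(1) once the key is known to be new
        size++;
    }

    V remove(K key) {
        Iterator<Entry<K, V>> it = bucket(key).iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (e.key.equals(key)) {
                it.remove(); // unlinking the located node is O(1)
                size--;
                return e.value;
            }
        }
        return null;
    }

    int size() { return size; }

    public static void main(String[] args) {
        ChainedHashMap<String, Integer> m = new ChainedHashMap<>(8);
        m.put("Aa", 1);
        m.put("BB", 2); // "BB" has the same hash code as "Aa": same chain
        System.out.println(m.get("Aa") + " " + m.get("BB")); // 1 2
        m.remove("Aa");
        System.out.println(m.get("Aa") + " " + m.size());    // null 1
    }
}
```

Because put first scans the chain to keep keys unique, its cost, like search, grows with the chain length; only the final link or unlink step is constant time.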

Back to lists?
Ideally, the size of a bucket should never become too large, because operations within a bucket take time proportional to its size:
- Insert and Remove are still O(1) list operations (adding at the head, unlinking a located node).
- Search is O(n) in the length of the bucket.
The pathological case is a single active slot whose bucket contains every entry in the hash table :(
As more collisions occur, the load on the table increases and efficiency begins to decrease.
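The pathological case is easy to reproduce by measuring the longest chain a hash function produces. The sketch assumes division-method compression and compares Integer's hash code with a deliberately bad constant hash code; BucketLoad and longestChain are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class BucketLoad {
    // Length of the longest chain when the given keys are hashed into n slots.
    static <K> int longestChain(Iterable<K> keys, ToIntFunction<K> hashCode, int n) {
        int[] chainLength = new int[n];
        int longest = 0;
        for (K k : keys) {
            int slot = Math.floorMod(hashCode.applyAsInt(k), n);
            longest = Math.max(longest, ++chainLength[slot]);
        }
        return longest;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int k = 0; k < 1000; k++) keys.add(k);

        // Integer's hash code spreads keys 0..999 evenly over 100 slots: 10 per chain.
        System.out.println(longestChain(keys, Object::hashCode, 100)); // 10
        // A (deliberately bad) constant hash code puts every key in one chain.
        System.out.println(longestChain(keys, k -> 42, 100));          // 1000
    }
}
```

With the constant hash code, searching for a key means walking a single chain of all 1000 entries, so the "hash table" degenerates into a linked list.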

LOAD AND SIZE

Load Factor
A simple measure of a table's health: $\alpha = n/N$, where n is the number of entries and N is the number of slots.
For example, $\alpha = n/N = 3/8 = 0.375$ and $\alpha = n/N = 8/8 = 1.0$.
As $\alpha \to 1$, what problems can we expect to occur? What is the solution?

Resizing
To maintain efficiency and limit collisions, we set a threshold on α (with α < 1) and resize the table when it is reached: use a dynamic table that doubles its size once the threshold is reached, then rehash all keys*.
[Figure: a table of size N = 8 holding k1, k2, k3 and k4 reaches the threshold at $\alpha = n/N = 4/8 = 0.5$; the table is doubled and k1, k2, k3 and k4 are rehashed into it.]
* We may only have to re-compress, since the keys' hash codes do not change.
We may also want to shrink (contract) the table... why?
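The doubling strategy, including the "only re-compress" footnote, can be sketched as a chained hash set that caches each key's hash code. The threshold 0.5 and initial size 8 mirror the figure; ResizingHashSet and its methods are hypothetical names, and shrinking is left out.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Chained hash set that doubles its table once n/N reaches maxLoad.
class ResizingHashSet<K> {
    private static final class Entry<K> {
        final K key;
        final int hash; // cached hash code: resizing only needs to re-compress
        Entry(K key, int hash) { this.key = key; this.hash = hash; }
    }

    private final double maxLoad;
    private List<LinkedList<Entry<K>>> table;
    private int size = 0;

    ResizingHashSet(int capacity, double maxLoad) {
        this.maxLoad = maxLoad;
        this.table = emptyTable(capacity);
    }

    private static <K> List<LinkedList<Entry<K>>> emptyTable(int capacity) {
        List<LinkedList<Entry<K>>> t = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) t.add(new LinkedList<>());
        return t;
    }

    private LinkedList<Entry<K>> bucketFor(int hash) {
        return table.get(Math.floorMod(hash, table.size())); // compression only
    }

    boolean contains(K key) {
        for (Entry<K> e : bucketFor(key.hashCode())) {
            if (e.key.equals(key)) return true;
        }
        return false;
    }

    void add(K key) {
        if (contains(key)) return;
        bucketFor(key.hashCode()).addFirst(new Entry<>(key, key.hashCode()));
        size++;
        if ((double) size / table.size() >= maxLoad) {
            resize(2 * table.size()); // threshold reached: double the table
        }
    }

    private void resize(int newCapacity) {
        List<LinkedList<Entry<K>>> old = table;
        table = emptyTable(newCapacity);
        for (LinkedList<Entry<K>> bucket : old) {
            for (Entry<K> e : bucket) {
                bucketFor(e.hash).addFirst(e); // re-compress the cached hash code
            }
        }
    }

    int capacity() { return table.size(); }
    double loadFactor() { return (double) size / table.size(); }

    public static void main(String[] args) {
        ResizingHashSet<String> s = new ResizingHashSet<>(8, 0.5);
        for (String k : new String[] {"k1", "k2", "k3"}) s.add(k);
        System.out.println(s.capacity() + " " + s.loadFactor()); // 8 0.375
        s.add("k4"); // alpha = 4/8 = 0.5: threshold reached, table doubles
        System.out.println(s.capacity() + " " + s.loadFactor()); // 16 0.25
        System.out.println(s.contains("k1"));                    // true
    }
}
```

Doubling keeps the average cost of add constant in the amortized sense, because each expensive resize is paid for by the many cheap insertions that preceded it.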

PERFORMANCE

Summary of Hash Table Performance

Operation   Array   Linked List   Hash Table (average)   Hash Table (worst)
Search*     O(1)    O(n)          O(1)                   O(n)
Insert      O(n)    O(1)          O(1)                   O(n)
Delete      O(n)    O(1)          O(1)                   O(n)

* Index-based search for arrays and lists, key-based search for hash tables. A doubly linked list is assumed.

JAVA'S MAP INTERFACE & IMPLEMENTATIONS

Java's Map Interface
A subset of its operations:
boolean containsKey(Object key)
boolean containsValue(Object value)
V get(Object key)
V put(K key, V value)
V remove(Object key)
int size()   (the number n of key-value mappings)
Set<K> keySet()
Collection<V> values()
Set<Map.Entry<K, V>> entrySet()
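A short sketch exercising these java.util.Map methods on a HashMap follows. MapInterfaceDemo and example() are hypothetical names; the methods called are the standard Map operations listed above.

```java
import java.util.HashMap;
import java.util.Map;

class MapInterfaceDemo {
    // Performs a few Map operations and returns the resulting map.
    static Map<String, Integer> example() {
        Map<String, Integer> m = new HashMap<>();
        m.put("one", 1);
        m.put("two", 2);
        Integer previous = m.put("one", 11); // put returns the value it replaced
        System.out.println("replaced " + previous); // replaced 1
        m.remove("two");                     // remove returns the old value (2)
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = example();
        System.out.println(m.containsKey("one")); // true
        System.out.println(m.containsValue(2));   // false: "two" was removed
        System.out.println(m.get("three"));       // null: no mapping
        System.out.println(m.size());             // 1
        // Iterate over the key-value pairs through the entrySet() view.
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()); // one -> 11
        }
        System.out.println(m.keySet() + " " + m.values()); // [one] [11]
    }
}
```

Note that get returns null both for a missing key and for a key explicitly mapped to null; containsKey distinguishes the two cases.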

Implementation and Usage of Map
Implementations include, e.g., Hashtable, HashMap and TreeMap.

```java
import java.util.*;

public class Freq {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<String, Integer>();

        // Initialize frequency table from command line
        for (String a : args) {
            Integer freq = m.get(a);
            // First occurrence: get returns null, so start the count at 1
            m.put(a, (freq == null) ? 1 : freq + 1);
        }

        System.out.println(m.size() + " distinct words:");
        System.out.println(m);
    }
}
```

Source: http://docs.oracle.com/javase/tutorial/collections/interfaces/map.html
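A variant of the frequency program, sketched under two changes: TreeMap replaces HashMap so the words print in sorted key order, and Map.merge (available since Java 8) replaces the explicit null check. SortedFreq and count are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

class SortedFreq {
    // Counts word frequencies; TreeMap keeps the keys in sorted order.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> m = new TreeMap<>();
        for (String w : words) {
            m.merge(w, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = count(args);
        System.out.println(m.size() + " distinct words:");
        System.out.println(m);
    }
}
```

Run as `java SortedFreq.java if it is to be it is up to me to delegate`, this prints `8 distinct words:` followed by `{be=1, delegate=1, if=1, is=2, it=2, me=1, to=3, up=1}`. HashMap gives no ordering guarantee, whereas TreeMap trades O(1) average operations for O(log n) ones in exchange for sorted keys.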

Readings
- Algorithms and Data Structures: Stefan Nilsson's text on Hash Tables, http://www.nada.kth.se/~snilsson/algoritmer/hashtabell/
- Introduction to Algorithms, 3rd Edition, Chapter 11: Hash Tables. Full text available via KTH Library: http://kth-primo.hosted.exlibrisgroup.com/ KTH:KTH_SFX2560000000068328
- Data Structures and Algorithms in Java, 6th Edition, Goodrich et al., Chapter 10: Maps, Hash Tables and Skip Lists. Full text available via KTH Library: http://kth-primo.hosted.exlibrisgroup.com/ KTH:KTH_SFX3710000000333147