Hash Tables. CS 321 Spring 2015

Today's Topics. HW1 available on the web site. PA 2 (CPU Scheduling) due Fri Feb 13th. PA 1 grades out today; all codes ran. Max Heap Methods. Full Radix Sort. Hash Tables.

Max-Heap Methods. Max(), Build-MaxHeap(), Max-Heapify(), Insert(), replacekey(i, key), parheapify(): useful convenience method. findkey(key).

Full Radix Sort. A combination sort that runs in linear time. Uses multiple passes to sort. Running time = O(p*n), where p is the number of passes.

Counting Sort

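Running Time of Radix Sort. O((w/d) * (n + b^d)), where w = number of digits in the numbers, d = number of digits handled in each counting sort pass, and b = the base of the digits. There are w/d passes, and each counting sort pass costs O(n + b^d). Can be done for any base, for example base 100.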
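The pass structure behind this running time can be sketched as a least-significant-digit radix sort. This is a minimal sketch, not the course's code: it assumes non-negative int keys and one base-100 digit per counting sort pass, and the class and method names are made up for illustration.

```java
import java.util.Arrays;

final class RadixSortSketch {
    // Sorts non-negative ints with LSD radix sort, one base-100 digit per pass (assumed setup).
    static void radixSort(int[] a) {
        final int base = 100;
        int max = 0;
        for (int v : a) max = Math.max(max, v);
        int[] out = new int[a.length];
        // One stable counting sort pass per base-100 digit of the largest key.
        for (long place = 1; max / place > 0; place *= base) {
            int[] count = new int[base + 1];
            for (int v : a) count[(int) (v / place % base) + 1]++;
            // Prefix sums turn digit counts into starting positions.
            for (int d = 0; d < base; d++) count[d + 1] += count[d];
            for (int v : a) out[count[(int) (v / place % base)]++] = v;
            System.arraycopy(out, 0, a, 0, a.length);
        }
    }

    public static void main(String[] args) {
        int[] data = {170453, 45, 7512, 90, 802, 24, 2, 66};
        radixSort(data);
        System.out.println(Arrays.toString(data)); // [2, 24, 45, 66, 90, 802, 7512, 170453]
    }
}
```

With base 100, a six-digit decimal key needs three passes instead of six, at the cost of a 100-entry count array per pass.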

Trouble with Arrays. What are some troubles with arrays? Arrays can only store data by a numeric index. Arrays can waste a lot of space, but they are fast. How can I store data by a more general key? What is a real-world example of an object sorted by a non-numeric key with associated data? Why? Example: keeping track of a player's equipment in a game. Shirt: diamond armor. Legs: chainmail. Head: gold helmet.

Specific Goals for our Solution. Lookups should be very quick: O(1) if at all possible, or as close as possible, with as few steps as possible to find an item. Inserts and deletes should be fast, like arrays. We will assume that objects use unique keys. A key may be a single value, or it may be created from multiple values. We will only consider single-value keys.

Common Solution: Hash Table. A data structure that holds values indexed by keys. Keys are usually strings. The location of the value for a given key is found by passing the key to a hash function that returns an index to the correct value. Hash tables are often called dictionaries. They are also often called tables of key/value pairs. Standard implementations are extremely efficient: close to O(1) for all operations.
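As an illustration of the dictionary view, the player-equipment example from the arrays slide can be stored with Java's built-in java.util.HashMap. This is only a usage sketch; the class name and the build() helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class EquipmentDictionary {
    // Builds a small key/value table keyed by equipment slot (string keys).
    static Map<String, String> build() {
        Map<String, String> equipment = new HashMap<>();
        equipment.put("Shirt", "diamond armor");
        equipment.put("Legs", "chainmail");
        equipment.put("Head", "gold helmet");
        return equipment;
    }

    public static void main(String[] args) {
        Map<String, String> equipment = build();
        System.out.println(equipment.get("Head"));         // gold helmet
        equipment.remove("Legs");                          // delete by key
        System.out.println(equipment.containsKey("Legs")); // false
    }
}
```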

What About Other Data Structures? Must have Insert(), Delete(), and Find(). Arrays: can accomplish these in O(1) time but are not space efficient (this assumes we leave empty space for keys not currently in the dictionary). Binary search trees: can accomplish these in O(log n) time when balanced (we want faster) and are space efficient. Hash tables: with constraints, ~O(1) for Insert/Delete/Find.

Example Array. Use the SSN for the key. Use an array with range 0-999,999,999 to hold the person objects. Using the SSN as a key, you have O(1) access to any person object. Unfortunately, the number of active keys (Social Security Numbers) is much less than the array size (1 billion entries). Est. US population, Oct. 20th 2004: 294,564,209. Over 70% of the array would be unused (1 - 294,564,209/1,000,000,000 is about 70.5%). But it would be fast and fit in memory.

Hash Table Solution. Hashing on your SSN yields an index into a table. The hash function must choose a good index. Very useful when ID numbers are widely spread out and when you don't need access in ID order. Fits our SSN example.

Hash Table abstract data type. Core methods for a hash table: Insert(key, value), ~O(1): add key and value. Delete(key), ~O(1): remove key and value. Search/Find(key), ~O(1): find key and value in the table. Critical internal method: Hash(key), O(1): compute an index for the given key.

Hash Tables Conceptual View. [Figure: a table of buckets indexed by hash value 0 to 7, holding obj1 (key=15), Obj2 (key=30), Obj3 (key=4), Obj4 (key=2), and Obj5 (key=1). Index = hash(key); for example, 7 = hash(15).]

Hash index/value. A hash value or hash index is used to index the hash table (array). A hash function takes a key and returns a hash value/index. The hash index is an integer (to index an array). The key is a specific value associated with a specific object being stored in the hash table. It is important that the key remain constant for the lifetime of the object.

Hash Functions & insert(). Usage summary: int hashvalue = hashfunction(int key); or hashvalue = hashfunction(String key); or hashvalue = hashfunction(itemtype item). Insert method:

public void insert(int key, itemtype item) {
    int hashvalue = hashfunction(key);   // compute the index for this key
    table[hashvalue] = item;             // store the item at that index
}
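A self-contained version of the insert method might look like the following. It is a minimal sketch under stated assumptions: items are Strings, the hash function is simply key mod table size, and nothing handles two keys landing on the same index yet. The class name and find() helper are hypothetical.

```java
final class SimpleHashTable {
    private final String[] table;

    SimpleHashTable(int size) {
        table = new String[size];
    }

    // Assumed hash function: key mod table size (floorMod keeps negative keys in range).
    int hashfunction(int key) {
        return Math.floorMod(key, table.length);
    }

    // Stores item at the hashed index; a colliding key would overwrite it.
    void insert(int key, String item) {
        table[hashfunction(key)] = item;
    }

    // Returns whatever is stored at the key's index (no key check in this sketch).
    String find(int key) {
        return table[hashfunction(key)];
    }

    public static void main(String[] args) {
        SimpleHashTable t = new SimpleHashTable(8);
        t.insert(15, "obj1");
        System.out.println(t.hashfunction(15)); // 7
        System.out.println(t.find(15));         // obj1
    }
}
```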

Hash Function Requirements. You want a hash function/algorithm that is fast and distributes keys throughout the table. Hash functions can use as input: integer key values, string key values, or multipart key values (multipart fields and/or multiple fields).

Simple Hash Function: Mod. Mod stands for modulo: the remainder of X/Y in integer arithmetic. Example mod results: 8 mod 5 = 3; 9 mod 5 = 4; 10 mod 5 = 0; 15 mod 5 = 0. Key mod M = 0 if key = M*c for some integer c. What if M is prime and key != M*c?
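The mod examples can be checked directly in Java. One detail worth knowing: Java's % operator returns a negative remainder for a negative left operand, so this sketch uses Math.floorMod to keep the result usable as an array index. The class and method names are illustrative only.

```java
final class ModExamples {
    // Maps any int key into the range 0..m-1.
    static int hash(int key, int m) {
        return Math.floorMod(key, m);
    }

    public static void main(String[] args) {
        System.out.println(8 % 5);       // 3
        System.out.println(9 % 5);       // 4
        System.out.println(10 % 5);      // 0
        System.out.println(15 % 5);      // 0
        System.out.println(-7 % 5);      // -2, not a valid array index
        System.out.println(hash(-7, 5)); // 3
    }
}
```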

Hash Tables: Insert Example. For example, if we hash keys 0 to 1000 into a hash table with 5 entries and use h(key) = key mod 5, we get the following sequence of events: Insert 2, Insert 21, Insert 34, Insert 54.

Index   After insert 2   After insert 21   After insert 34
0
1                        21                21
2       2                2                 2
3
4                                          34

Insert 54: 54 mod 5 = 4, but entry 4 already holds 34. There is a collision at array entry #4.

Dealing with Collisions. A problem arises when we have two keys that hash to the same array entry; this is called a collision. There are two ways to resolve collisions. Hashing with chaining (a.k.a. separate chaining): every hash table entry contains a pointer to a linked list of keys that hash to the same entry. Hashing with open addressing: every hash table entry contains only one key. If a new key hashes to a table entry that is filled, systematically examine other table entries until you find an empty entry in which to place the new key.

Hashing with Chaining. The problem is that keys 34 and 54 hash to the same entry (4). We solve this collision by placing all keys that hash to the same hash table entry in a chain (linked list) or bucket (array) pointed to by this entry.

Index   After insert 54   After insert 101
0
1       21                21, 101
2       2                 2
3
4       54, 34            54, 34   (chain)
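The chaining example can be replayed with a small table of linked lists. This is a sketch under stated assumptions: it stores bare int keys with no associated data, uses h(key) = key mod 5, and always inserts at the head of the chain, so the order within a chain may differ from the slide's figure. All names are hypothetical.

```java
import java.util.LinkedList;

final class ChainedHashTable {
    private final LinkedList<Integer>[] table;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        table = new LinkedList[size];
        for (int i = 0; i < size; i++) table[i] = new LinkedList<>();
    }

    int hash(int key) {
        return Math.floorMod(key, table.length);
    }

    // O(1): compute the hash and add at the head of that entry's chain.
    void insert(int key) {
        table[hash(key)].addFirst(key);
    }

    // Proportional to the length of the chain for this key's entry.
    boolean search(int key) {
        return table[hash(key)].contains(key);
    }

    // Same cost as search: walk the chain, then unlink the key if present.
    boolean delete(int key) {
        return table[hash(key)].remove(Integer.valueOf(key));
    }

    int chainLength(int index) {
        return table[index].size();
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(5);
        for (int key : new int[]{2, 21, 34, 54, 101}) t.insert(key);
        System.out.println(t.chainLength(4)); // 2 (keys 34 and 54)
        System.out.println(t.chainLength(1)); // 2 (keys 21 and 101)
        System.out.println(t.search(54));     // true
    }
}
```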

Hashing with Chaining. What is the running time for insert/search/delete? Insert: it takes O(1) time to compute the hash function and insert at the head of the linked list. Search: it is proportional to the maximum linked list length. Delete: same as search. Therefore, in the unfortunate event that we have a bad hash function, all n keys may hash to the same table entry, giving an O(n) running time! So how can we create a good hash function?

Choosing a Hash Function 1. Uniform hashing = keys distributed evenly throughout the table. Choosing a good hash function requires taking into account the kind of data that will be used. The statistics of the key distribution need to be accounted for. E.g., choosing the first letter of a last name will likely cause lots of collisions, depending on the nationality of the population. Many programming systems have hash functions built in.

Choosing a Hash Function 2. Division/modulo method: key mod m, where m is the array size; in general, it should be prime. Multiplication method: floor(((key * somefraction) mod 1) * arraySize), where somefraction is typically 0.618. Java HashMap method: create a hash by performing a series of shifts, adds, and xors on the key, then index = hash mod arraysize.
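The division and multiplication methods can be written out as follows. This is a rough sketch assuming non-negative int keys and the fraction 0.618 from the slide; the class and method names are illustrative, and the HashMap-style bit mixing is deliberately left out.

```java
final class HashMethods {
    // Division/modulo method: key mod m, with m the array size (ideally prime).
    static int divisionHash(int key, int m) {
        return Math.floorMod(key, m);
    }

    // Multiplication method: floor(((key * 0.618) mod 1) * arraySize), for non-negative keys.
    static int multiplicationHash(int key, int arraySize) {
        double product = key * 0.618;
        double fractional = product - Math.floor(product); // "mod 1": keep the fractional part
        return (int) Math.floor(fractional * arraySize);
    }

    public static void main(String[] args) {
        System.out.println(divisionHash(34, 5));       // 4
        System.out.println(multiplicationHash(1, 8));  // 4, since 0.618 * 8 = 4.944
        System.out.println(multiplicationHash(10, 8)); // 1, since 0.18 * 8 = 1.44
    }
}
```

Unlike the division method, the multiplication method does not depend on the array size being prime, because the spreading comes from the fractional part of the product.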

Prime Number Distribution. For example, assume the keys (key values) are multiples of 5: 5, 10, 15, 20, 25, and so on. The keys are evenly distributed from 5 to 245. With an M (the divisor) of 7, the hash values will be evenly distributed from 0 to 6 for the keys. See the table below. If M were 5, what kind of distribution would you have?

Key mod M     Total
0             7
1             7
2             7
3             7
4             7
5             7
6             7
Grand Total   49

hash value = key mod m (m is typically the table size)
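The table can be reproduced, and the M = 5 question answered, with a short count. This is a sketch using the slide's key set (multiples of 5 from 5 to 245); the class and method names are made up for illustration.

```java
import java.util.Arrays;

final class PrimeModulusDemo {
    // Counts how many of the keys 5, 10, ..., 245 land in each bucket for divisor m.
    static int[] bucketCounts(int m) {
        int[] counts = new int[m];
        for (int key = 5; key <= 245; key += 5) counts[key % m]++;
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bucketCounts(7))); // [7, 7, 7, 7, 7, 7, 7]
        System.out.println(Arrays.toString(bucketCounts(5))); // [49, 0, 0, 0, 0]
    }
}
```

With M = 5 every key is a multiple of the divisor, so all 49 keys collide in bucket 0; with M = 7, which shares no factor with 5, the keys spread evenly.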

Choosing a Hash Function 3. If keys are non-random (e.g., part numbers): Use all of the data to contribute to the hash function to get a better distribution. Consider folding: sum the natural (or arbitrary) groups of digits in the key. Don't use redundant or non-data values (e.g., checksum values). Do not use information that might change! Analyze your expected key values (or some representative subset) to make sure your hash function gives a good distribution!
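Folding can be sketched as follows. The assumptions here are not from the slides: the key is a decimal number, the groups are taken from the right in fixed-size chunks, and the sum is reduced with mod at the end. The names are hypothetical.

```java
final class FoldingHash {
    // Sums groups of groupSize decimal digits (from the right), then reduces mod tableSize.
    static int fold(long key, int groupSize, int tableSize) {
        long divisor = 1;
        for (int i = 0; i < groupSize; i++) divisor *= 10;
        long sum = 0;
        for (long rest = Math.abs(key); rest > 0; rest /= divisor) {
            sum += rest % divisor; // add the lowest remaining group of digits
        }
        return (int) (sum % tableSize);
    }

    public static void main(String[] args) {
        // Groups 123 | 456 | 789 sum to 1368, and 1368 mod 101 = 55.
        System.out.println(fold(123456789L, 3, 101)); // 55
    }
}
```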

Hashing with Open Addressing. So far we have studied hashing with chaining, using a list to store the items that hash to the same location. Another option is to store all the items (references to single items) directly in the table. With open addressing, collisions are resolved by systematically examining other table indexes, i0, i1, i2, ..., until an empty slot is located.

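Hash Tables: Open Addressing. [Figure: a table indexed by hash value 0 to 7, with index = key mod 8, holding obj1 (key=15), Obj2 (key=28), Obj3 (key=4), Obj4 (key=2), and Obj5 (key=1). Two objects are labeled Index=4: keys 28 and 4 both hash to index 4.]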

Open Addressing. The key is first mapped to an array cell using the hash function (e.g., key % array-size). If there is a collision, find an available array cell. There are different algorithms to find (to probe for) the next array cell. Linear: H+1, H+2, H+3, ... until an empty slot is found. Quadratic: H+1*1, H+2*2, H+3*3, H+4*4, ... Double hashing: hash again with a different hash function.
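The linear and quadratic probe sequences can be compared side by side. This is a small sketch that assumes the sequences wrap around with mod, which the slide spells out only for linear probing; the class and method names are illustrative.

```java
import java.util.Arrays;

final class ProbeSequences {
    // First n linear probes after home index h: h+1, h+2, ... (wrapped).
    static int[] linear(int h, int size, int n) {
        int[] probes = new int[n];
        for (int i = 1; i <= n; i++) probes[i - 1] = (h + i) % size;
        return probes;
    }

    // First n quadratic probes after home index h: h+1*1, h+2*2, ... (wrapped).
    static int[] quadratic(int h, int size, int n) {
        int[] probes = new int[n];
        for (int i = 1; i <= n; i++) probes[i - 1] = (h + i * i) % size;
        return probes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linear(4, 11, 3)));    // [5, 6, 7]
        System.out.println(Arrays.toString(quadratic(4, 11, 3))); // [5, 8, 2]
    }
}
```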

Probe Algorithms (Collision Resolution). Linear probing: choose the next available array cell. First try arrayindex = hash value + 1, then try arrayindex = hash value + 2, and so on. Be sure to wrap around the end of the array: arrayindex = (arrayindex + 1) % arraysize. Stop when you have tried all possible array indices. If the array is full, you need to throw an exception or, better yet, resize the array. Quadratic probing: a variation of linear probing that uses a more complex function to calculate the next cell to try.
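Linear probing with wrap-around might be sketched like this. The assumptions: the table stores bare int keys, the hash is key mod array size, a full table throws rather than resizes, and deletion is not handled. The names are hypothetical. The main method replays the keys from the open addressing figure (index = key mod 8).

```java
final class LinearProbingTable {
    private final Integer[] keys; // null marks an empty slot

    LinearProbingTable(int size) {
        keys = new Integer[size];
    }

    // Returns the index where key was stored; throws if every slot is occupied.
    int insert(int key) {
        int index = Math.floorMod(key, keys.length);
        for (int tried = 0; tried < keys.length; tried++) {
            if (keys[index] == null) {
                keys[index] = key;
                return index;
            }
            index = (index + 1) % keys.length; // wrap around the end of the array
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the index holding key, or -1 if an empty slot is reached first.
    int find(int key) {
        int index = Math.floorMod(key, keys.length);
        for (int tried = 0; tried < keys.length; tried++) {
            if (keys[index] == null) return -1;
            if (keys[index] == key) return index;
            index = (index + 1) % keys.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(8);
        System.out.println(t.insert(15)); // 7
        System.out.println(t.insert(28)); // 4
        System.out.println(t.insert(4));  // 5 (index 4 is taken, so probe the next cell)
        System.out.println(t.find(4));    // 5
    }
}
```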

Double Hashing. Apply a second hash function after the first. The second hash function, like the first, is dependent on the key. The secondary hash function must be different from the first and, obviously, must not generate a zero. Good algorithm: arrayindex = (arrayindex + stepsize) % arraysize, where stepsize = constant - (key % constant) and constant is a prime less than the array size.
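The step-size formula can be tried out in a small open-addressing table. This is a sketch with assumed parameters: array size 11 (prime, so every step size eventually visits every slot), constant 7, non-negative int keys, and a primary hash of key mod array size. The class and method names are hypothetical.

```java
final class DoubleHashingTable {
    private final Integer[] keys; // null marks an empty slot
    private final int constant;   // a prime less than the array size

    DoubleHashingTable(int size, int constant) {
        this.keys = new Integer[size];
        this.constant = constant;
    }

    // Secondary hash: always in the range 1..constant, so it is never zero.
    int stepSize(int key) {
        return constant - Math.floorMod(key, constant);
    }

    // Returns the index where key was stored; gives up after arraysize probes.
    int insert(int key) {
        int index = Math.floorMod(key, keys.length);
        int step = stepSize(key);
        for (int tried = 0; tried < keys.length; tried++) {
            if (keys[index] == null) {
                keys[index] = key;
                return index;
            }
            index = (index + step) % keys.length;
        }
        throw new IllegalStateException("no empty slot found");
    }

    public static void main(String[] args) {
        DoubleHashingTable t = new DoubleHashingTable(11, 7);
        // 15, 26, and 37 all hash to index 4, but their step sizes differ (6, 2, 5).
        System.out.println(t.insert(15)); // 4
        System.out.println(t.insert(26)); // 6
        System.out.println(t.insert(37)); // 9
    }
}
```

Because colliding keys take different step sizes, they follow different probe paths instead of piling up in the same run of cells.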

Problems. Linear probing yields clusters. Quadratic probing yields secondary clusters. Double hashing can avoid both, depending on the secondary hash function.

The End