Hashing. October 19, CMPE 250 Hashing October 19, / 25

Similar documents
Lecture 10 March 4. Goals: hashing. hash functions. closed hashing. application of hashing

Successful vs. Unsuccessful

Hash Table and Hashing

Hashing for searching

HASH TABLES.

Hash Tables. Hashing Probing Separate Chaining Hash Function

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

COMP171. Hashing.

CMSC 341 Hashing. Based on slides from previous iterations of this course

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Data Structures And Algorithms

Topic HashTable and Table ADT

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

AAL 217: DATA STRUCTURES

Hash Tables. Johns Hopkins Department of Computer Science Course : Data Structures, Professor: Greg Hager

HASH TABLES. Hash Tables Page 1

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

BBM 201 DATA STRUCTURES

HASH TABLES. Goal is to store elements k,v at index i = h k

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS

Hashing. Hashing Procedures

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

Hashing Techniques. Material based on slides by George Bebis

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Adapted By Manik Hosen

CSE 214 Computer Science II Searching

CS 261 Data Structures

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Hashing. Introduction to Data Structures Kyuseok Shim SoEECS, SNU.

Hashing. It s not just for breakfast anymore! hashing 1

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

Dictionaries-Hashing. Textbook: Dictionaries ( 8.1) Hash Tables ( 8.2)

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

Lecture 4. Hashing Methods

Introduction to Hashing

Worst-case running time for RANDOMIZED-SELECT

Cpt S 223. School of EECS, WSU

CS 310 Advanced Data Structures and Algorithms

Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

TABLES AND HASHING. Chapter 13

UNIT 5. Sorting and Hashing

Algorithms and Data Structures

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

DATA STRUCTURES AND ALGORITHMS

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

Hashing (Κατακερματισμός)

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

Dictionaries and Hash Tables

Data and File Structures Chapter 11. Hashing

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Dictionaries and Hash Tables

lecture23: Hash Tables

Hash Tables Hash Tables Goodrich, Tamassia

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

csci 210: Data Structures Maps and Hash Tables

Part I Anton Gerdelan

CS 3410 Ch 20 Hash Tables

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

III Data Structures. Dynamic sets

Chapter 27 Hashing. Objectives

Hash[ string key ] ==> integer value

DS ,21. L11-12: Hashmap

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

Open Addressing: Linear Probing (cont.)

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

Hash Table. Ric Glassey

CS 241 Analysis of Algorithms

Chapter 20 Hash Tables

Data Structures & File Management

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

Topics to Learn. Important concepts. Tree-based index. Hash-based index

Counting Words Using Hashing

Understand how to deal with collisions

IS 2610: Data Structures

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

CSE 100: HASHING, BOGGLE

Maps, Hash Tables and Dictionaries. Chapter 10.1, 10.2, 10.3, 10.5

Unit #5: Hash Functions and the Pigeonhole Principle

9/24/ Hash functions

Fast Lookup: Hash tables

1/9/2014. Unary operation: min, max, sort, makenull, Set + Operations define an ADT.

CS2210 Data Structures and Algorithms

Comp 335 File Structures. Hashing

Use PageUp and PageDown to move from screen to screen. Click on speaker to play sound.

Data Structures and Object-Oriented Design VIII. Spring 2014 Carola Wenk

Data Structures and Algorithms(10)

Transcription:

Hashing October 19, 2016 CMPE 250 Hashing October 19, 2016 1 / 25

Dictionary ADT Data structure with just three basic operations: finditem (i): find item with key (identifier) i insert (i): insert i into the dictionary remove (i): delete i Just like words in a Dictionary Where do we use them: Symbol tables for compiler Customer records (access by name) Games (positions, configurations) Spell checkers,etc. CMPE 250 Hashing October 19, 2016 2 / 25

How to Implement a Dictionary? We can use: Sequences ordered unordered Binary Search Trees Hashtables CMPE 250 Hashing October 19, 2016 3 / 25

Hashing An important and widely useful technique for implementing dictionaries Constant time per operation (on the average) Worst case time proportional to the size of the set for each operation (just like array based and linked implementation) CMPE 250 Hashing October 19, 2016 4 / 25

Basic Idea Use the hash function to map keys into positions in a hash table. Ideally If element e has key k and h is the hash function, then e is stored in position h(k) of the table To search for e, compute h(k) to locate the position. If no element has key k, dictionary does not contain e. CMPE 250 Hashing October 19, 2016 5 / 25

Example Dictionary Student Records Keys are ID numbers (951000-952000), no more than 1000 students Hash function: h(k) = k 951000 maps ID into distinct table positions 0-1000 array table[1001] CMPE 250 Hashing October 19, 2016 6 / 25

Analysis (Ideal Case) O(b) time to initialize the hash table (b is the number of positions or buckets in the hash table) O(1) time to perform insert, remove, finditem CMPE 250 Hashing October 19, 2016 7 / 25

Ideal Case is Unrealistic Works for implementing dictionaries, but many applications have key ranges that are too large to have one-to-one mapping between buckets and keys! Example: Suppose key can take on values from 0.. 65,535 (2 byte unsigned int) Expect 1,000 records at any given time Impractical to use hash table with 65,536 slots! CMPE 250 Hashing October 19, 2016 8 / 25

Hash Functions If the key range is too large, use a hash table with fewer buckets and a hash function which maps multiple keys to the same bucket: h(k 1 ) = β = h(k 2 ): k 1 and k 2 have collision at slot β Popular hash functions: hashing by division h(k) = k%d, where D is the number of buckets in the hash table Example: hash table with 11 buckets h(k) = k%11 80 3 (80%11 = 3), 40 7, 65 10 58 3 collision! CMPE 250 Hashing October 19, 2016 9 / 25

Collision Resolution Policies Two approaches: 1 Open hashing, or separate chaining 2 Closed hashing, or open addressing The difference is about whether collisions are stored outside the table (open hashing) or whether collisions result in storing one of the records at another slot in the table (closed hashing) CMPE 250 Hashing October 19, 2016 10 / 25

Closed Hashing Associated with closed hashing is a rehash strategy: If we try to place x in bucket h(x) and find it occupied, find an alternative location h 1 (x), h 2 (x), etc. Try each in order, if none of them is empty then the table is full, h(x) is called home bucket The simplest rehash strategy is called linear hashing h i (x) = (h(x) + i)%d In general, our collision resolution strategy is to generate a sequence of hash table slots (probe sequence) that can hold the record; test each slot until find the empty one (probing) CMPE 250 Hashing October 19, 2016 11 / 25

Example Linear (Closed) Hashing D = 8, keys a, b, c, d have hash values h(a) = 3, h(b) = 0, h(c) = 4, h(d) = 3 Where do we insert d? 3 is already filled! Probe sequence using linear hashing: h1(d) = (h(d) + 1)%8 = 4%8 = 4 h2(d) = (h(d) + 2)%8 = 5%8 = 5 h3(d) = (h(d) + 3)%8 = 6%8 = 6 etc. 7, 0, 1, 2 Wraps around the beginning of the table! CMPE 250 Hashing October 19, 2016 12 / 25

Operations Using Linear Hashing Test for membership: finditem Examine h(k), h 1 (k), h 2 (k),..., until we find k or an empty bucket or home bucket If no deletions are possible, the strategy works! What if there are deletions? If we reach empty bucket, we cannot be sure that k is not somewhere else and the empty bucket was occupied when k was inserted Need a special placeholder deleted, to distinguish the bucket that was never used from the one that once held a value May need to reorganize the table after many deletions CMPE 250 Hashing October 19, 2016 13 / 25

Performance Analysis - Worst Case Initialization: O(b), b is the # of buckets Insert and search: O(n), n is the number of the elements in the table; all n key values have the same home bucket Not better than a linear list for maintaining dictionary! CMPE 250 Hashing October 19, 2016 14 / 25

Improved Collision Resolution Linear probing: h i (x) = (h(x) + i)%d all buckets in th etable will be candidates for inserting a new record before the probe sequence returns to home position clustering of records, leads to long probing sequences Linear probing with skipping: h i (x) = (h(x) + ic)%d c is a constant other than 1 records with adjacent home buckets will not follow the same probe sequence (Pseudo)Random probing: h i (x) = (h(x) + r i )%D r i is the i th value in a random permutation of numbers from 1 to D 1 insertions and searches use the same sequence of random numbers CMPE 250 Hashing October 19, 2016 15 / 25

Example 1 What if the next element has home bucket 0? go to bucket 3 Same for elements with home bucket 1 or 2! Only a record with home position 3 will stay. p = 4/11 that next record will go to bucket 3 2 Similarly, records hashing to 7,8,9 will end up in 10 3 Only records hashing to 4 will end up in 4 (p=1/11); same for 5 and 6 CMPE 250 Hashing October 19, 2016 16 / 25

Example (Cont.) insert 1052 (home bucket: 7) next element in bucket 3 with p = 8/11 CMPE 250 Hashing October 19, 2016 17 / 25

Hash Functions - Numerical Values Consider: h(x) = x%16 poor distribution, not very random depends solely on least significant four bits of key Better, mid-square method if keys are integers in range 0, 1..., K, pick integer C such that DC 2 about equal to K 2, then h(x) = x 2 /C %D extracts middle r bits of x 2, where 2 r = D (a base-d digit) better, because most or all of bits of key contribute to result CMPE 250 Hashing October 19, 2016 18 / 25

Hash Function Strings of Characters Folding Method int h(string x, int D) { int i, sum; for (sum=0, i=0; i<x.length(); i++) sum+= (int)x.charat(i); return (sum%d); } sums the ASCII values of the letters in the string ASCII value for A =65; sum will be in range 650-900 for 10 upper-case letters; good when D around 100, for example order of chars in string has no effect CMPE 250 Hashing October 19, 2016 19 / 25

Hash Function Strings of Characters Polynomial hash codes static long hashcode(string key, int D) { int h = key.charat(0); } for (int i=1, i<key.length(); i++) { h = h*a; h += (int) key.charat(i); } return h % D; They have found that a = 33, 37, and 41 have less than 7 collisions. CMPE 250 Hashing October 19, 2016 20 / 25

Hash Function Strings of Characters Much better: Cyclic Shift static long hashcode(string key, int D) { int h=0; for (int i=0, i<key.length(); i++) { h = (h << 4) ( h >> 27); h += (int) key.charat(i); } return h%d; } CMPE 250 Hashing October 19, 2016 21 / 25

Open Hashing Each bucket in the hash table is the head of a linked list All elements that hash to a particular bucket are placed on that bucket s linked list Records within a bucket can be ordered in several ways such as by order of insertion, by key value order, or by frequency of access order CMPE 250 Hashing October 19, 2016 22 / 25

Open Hashing Data Organization CMPE 250 Hashing October 19, 2016 23 / 25

Analysis Open hashing is most appropriate when the hash table is kept in the main memory, implemented with a standard in-memory linked list We hope that the number of elements per bucket is roughly equal in size, so that the lists will be short If there are n elements in the set, then each bucket will have roughly n/d members If we can estimate n and choose D to be roughly as large, then the average bucket will have only one or two members CMPE 250 Hashing October 19, 2016 24 / 25

Analysis (Cont.) Average time per dictionary operation: D buckets, n elements in dictionary average n/d elements per bucket insert, finditem, remove operations take O(1 + n/d) time each If we can choose D to be about n, constant time Assuming each element is likely to be hashed to any bucket, running time constant, independent of n CMPE 250 Hashing October 19, 2016 25 / 25