CSCD 326 Data Structures I Hashing


Hashing Background

Goal: provide a method of searching for stored data with constant (O(1)) average time complexity.
The best time complexity of traditional searching is O(log₂ n), achieved by binary search. Binary search also requires the data to be stored in sorted order.
Hashing approach to data storage and retrieval: stored items are not kept in contiguous memory locations, so memory is traded for speed.
Hashing is often used for symbol table management in compilers, assemblers, and linker/loaders.

Hashing - Basic Ideas

Data storage: hashing relies primarily on arrays for data storage, but stored items do not occupy contiguous positions within the array.
Data storage/retrieval method: use a mathematical function that takes the key or data value to be stored and returns the array index at which to store it. This is called a "hash function." The same function is used later to retrieve the value.

Simple Example of Hashing

Employee data is to be stored using the employee number as the key. Employee numbers are unique and run from 10,000 to 19,999.
Storage: use an array of size 10,000.
Hash function: employee number - 10,000 gives a unique index (0 to 9,999) into the array. That array location is used to store and retrieve the information for this employee.
Problem: in other situations, key values are often not unique, or they do not fall into a range small enough for a reasonably sized array.
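The employee example is a direct-address table. The Python sketch below is illustrative only. The class name `EmployeeTable` and its methods `index_for`, `store`, and `retrieve` are hypothetical, and each record is just a string.

```python
class EmployeeTable:
    """Direct-address table for employee numbers 10,000 to 19,999 (illustrative sketch)."""

    LOW = 10_000
    HIGH = 19_999

    def __init__(self):
        # One slot per possible employee number: 10,000 slots in total.
        self.slots = [None] * (self.HIGH - self.LOW + 1)

    def index_for(self, emp_number: int) -> int:
        """The slide's hash function: employee number minus 10,000."""
        if not self.LOW <= emp_number <= self.HIGH:
            raise ValueError(f"employee number {emp_number} is out of range")
        return emp_number - self.LOW

    def store(self, emp_number: int, record: str) -> None:
        self.slots[self.index_for(emp_number)] = record

    def retrieve(self, emp_number: int):
        # Retrieval uses the same function, so the record is found in one step.
        return self.slots[self.index_for(emp_number)]


table = EmployeeTable()
table.store(12345, "Ada Lovelace")
print(table.retrieve(12345))
```

This scheme works only because the keys are unique and fall in a small, dense range. The remaining methods deal with keys that are not.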

Goals for Hashing Functions

The same key value (the value used for insertion) must always produce the same index. Otherwise the data cannot be retrieved later.
As far as possible, different key values should not hash to the same index. The hash function achieves this by mixing up the key, so that common patterns in key values do not map to the same locations.
When there are more possible keys than table slots, such collisions can never be prevented entirely. Collision handling therefore becomes an issue.

Hash Function Construction Methods

Using the numeric ASCII values of characters:
Example key: JUNK
Add the ASCII values of the characters (74 + 85 + 78 + 75) to produce a single integer (312). This may be good enough, but the integer is not unique to "JUNK": any anagram, such as "KNUJ", produces the same sum.
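Here is a minimal Python sketch of the character-sum method, assuming plain ASCII keys. `ascii_sum_hash` is a hypothetical name. It also shows the weakness described above: rearranging the letters does not change the sum.

```python
def ascii_sum_hash(key: str) -> int:
    """Add the numeric (ASCII) codes of the characters in the key."""
    return sum(ord(ch) for ch in key)


print(ascii_sum_hash("JUNK"))  # 74 + 85 + 78 + 75 = 312
print(ascii_sum_hash("KNUJ"))  # same letters in a different order: also 312
```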

Hash Function Construction Methods (2)

Concatenation of character values: represent A - Z as the integers 0 - 25 (each fits in 5 bits) and concatenate these 5-bit values. With J = 9, U = 20, N = 13, K = 10, JUNK becomes:
01001 10100 01101 01010 (binary) = 315818
Since $2^{15} = 32768 = 32^3$, $2^{10} = 1024 = 32^2$, and $2^5 = 32 = 32^1$, the concatenation can be expressed as:
$9 \cdot 32^3 + 20 \cdot 32^2 + 13 \cdot 32^1 + 10 = 315818$
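The concatenation can be computed with shifts. Shifting left by 5 bits multiplies by 32, so the loop below evaluates exactly the sum above. This sketch assumes keys contain only uppercase A - Z, and `concat_hash` is a hypothetical name.

```python
def concat_hash(key: str) -> int:
    """Concatenate 5-bit letter codes (A = 0, ..., Z = 25) into one integer."""
    value = 0
    for ch in key:
        code = ord(ch) - ord("A")
        if not 0 <= code <= 25:
            raise ValueError(f"expected an uppercase letter A-Z, got {ch!r}")
        # Shifting left 5 bits multiplies by 32 and makes room for the next code.
        value = (value << 5) | code
    return value


v = concat_hash("JUNK")
print(v)                  # 315818
print(format(v, "020b"))  # 01001101000110101010 (four 5-bit groups)
print(v % 10000)          # 5818 when reduced to a table of size 10000
```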

Hash Function Construction Methods (3)

Using the mod operator: this reduces large values into the range of actual hash table indices. In the example above, if the table is an array of size 10000, then 315818 % 10000 = 5818. Here the mod operator simply removes the first two digits and keeps only the last four. This makes the hashed value less unique to the string that generated it.

Hash Function Construction Methods (4)

Problems with the mod operator: the exact table size matters a great deal. If the table size shares common factors with many key values, the mod operator can generate many collisions.
e.g. table size 15, key values 10, 20, 30, 40, 50, 60, 70. Here the 7 values hash to only three indices: 30 and 60 to 0; 20 and 50 to 5; and 10, 40, and 70 to 10.
Solution: use an array size that is prime. A prime has no factors other than 1 and itself, so it shares a common factor only with keys that are exact multiples of the table size.
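Grouping the keys by index makes the effect of the table size easy to see. The hypothetical helper `bucket_map` below compares the composite size 15 from the slide with the prime size 13, which is chosen here for illustration.

```python
def bucket_map(keys, table_size):
    """Group the keys by the index key % table_size that they hash to."""
    buckets = {}
    for k in keys:
        buckets.setdefault(k % table_size, []).append(k)
    return buckets


keys = [10, 20, 30, 40, 50, 60, 70]
print(bucket_map(keys, 15))  # {10: [10, 40, 70], 5: [20, 50], 0: [30, 60]}
print(bucket_map(keys, 13))  # seven keys, seven different indices
```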

Hash Function Construction Methods (5)

Using pseudo-random number generators: given the same starting seed, a pseudo-random number generator always produces the same sequence of values. Here, use a number generated from the key string as the seed, and use the first value of the resulting pseudo-random sequence to generate the hash table index.
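Here is a sketch of this idea in Python, using Python's `random.Random` (a seedable Mersenne Twister generator) as the pseudo-random number generator. Two parts are illustrative assumptions: the way the seed is derived from the string (multiply by 31 and add each character code) and the name `prng_hash`. Creating a generator for every lookup is slow, so this shows the principle rather than a practical design.

```python
import random


def prng_hash(key: str, table_size: int) -> int:
    """Seed a private pseudo-random generator from the key and use its first value."""
    seed = 0
    for ch in key:
        seed = seed * 31 + ord(ch)    # illustrative way to turn the string into a number
    rng = random.Random(seed)         # same seed -> same sequence, every time
    return rng.randrange(table_size)  # first value of the sequence, in 0..table_size-1


print(prng_hash("JUNK", 101))
print(prng_hash("JUNK", 101))  # identical to the line above
```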

Hash Function Construction Methods (6)

Folding: scrambles numeric values to remove the effects of recurring patterns, e.g. by adding the numeric values together.
Boundary folding: breaks the number into segments and adds the segments. e.g. for social security numbers, 534-65-9234 is broken at the dashes, and the hash value is 534 + 65 + 9234 = 9833.
Fan folding: like boundary folding, but reverses the digits of every other segment.
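Both folding variants can be sketched in Python for dash-separated numbers like the social security number above. The choice of which segments fan folding reverses (the second, fourth, and so on) is an assumption, and `boundary_fold` and `fan_fold` are hypothetical names. Terminology also varies between textbooks: some call the plain sum shift folding and reserve boundary folding for the variant that reverses alternate segments.

```python
def boundary_fold(number: str) -> int:
    """Split at the dashes and add the segments as numbers."""
    return sum(int(part) for part in number.split("-"))


def fan_fold(number: str) -> int:
    """Like boundary_fold, but reverse the digits of every other segment."""
    total = 0
    for i, part in enumerate(number.split("-")):
        if i % 2 == 1:
            part = part[::-1]  # reverse the 2nd, 4th, ... segment (assumed choice)
        total += int(part)
    return total


print(boundary_fold("534-65-9234"))  # 534 + 65 + 9234 = 9833
print(fan_fold("534-65-9234"))       # 534 + 56 + 9234 = 9824
```

In practice the folded value would then be reduced with the mod operator to fit the table.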

Hash Function Construction Methods (7)

Digit or character extraction: another way to scramble similar patterns across multiple keys. It can be used in two ways:
1) Simply remove characters likely to be similar in many keys (that is, keep only the dissimilar characters).
2) Mid-square technique: represent the key as a number, square the number, and extract enough bits from the middle of the squared value to form an array index.
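Here is a sketch of the mid-square technique. It assumes keys fit in 16 bits, so the square fits in 32 bits, and that the index is 8 bits wide, giving a table of 2^8 = 256 slots. Both widths and the name `mid_square` are illustrative assumptions.

```python
def mid_square(key: int, key_bits: int = 16, index_bits: int = 8) -> int:
    """Square the key and keep index_bits bits from the middle of the square.

    Assumes 0 <= key < 2**key_bits, so the square fits in 2 * key_bits bits.
    """
    if not 0 <= key < (1 << key_bits):
        raise ValueError("key does not fit in key_bits bits")
    square = key * key
    shift = key_bits - index_bits // 2                  # discard the low-order bits
    return (square >> shift) & ((1 << index_bits) - 1)  # keep the middle bits


print(mid_square(1234))  # 1234 squared is 1522756; bits 12-19 of it give 115
```

The middle bits depend on every bit of the key, which is why this scrambles patterns better than taking only the low-order bits.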

Linked Collision Processing

The linked method of collision overflow handling divides memory into two parts:
One part for primary storage (the hash table itself).
A separate secondary part for collision overflow (either dynamically allocated or a separate fixed allocation area).

Linked Collision Processing (2)

Linked collision overflow handling: assume the hash table is an array of objects, each containing an instance variable that is a reference to an object of the same type.
On collision: dynamically allocate a new node, place the data into it, and link the new node in through the reference.
Overflowed items are thus stored in a linked list hanging off the original table item.
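Here is a minimal Python sketch of linked overflow handling, assuming integer keys and the hash function key % size. As a simplification, each array slot holds a reference to the head of its chain rather than a full record. New nodes are linked in at the head of the chain, which is one common choice. `Node`, `ChainedHashTable`, and `chain_length` are hypothetical names.

```python
class Node:
    """One stored item plus a reference to the next node in its overflow chain."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    """Linked overflow with integer keys and hash(key) = key % size (assumed)."""

    def __init__(self, size: int = 11):
        self.size = size
        self.table = [None] * size  # primary storage: the head of each chain

    def _index(self, key: int) -> int:
        return key % self.size

    def insert(self, key: int, value) -> None:
        node = self.table[self._index(key)]
        while node is not None:  # key already present: update it in place
            if node.key == key:
                node.value = value
                return
            node = node.next
        # New key: allocate a node and link it in at the head of the chain.
        idx = self._index(key)
        self.table[idx] = Node(key, value, self.table[idx])

    def find(self, key: int):
        node = self.table[self._index(key)]
        while node is not None:  # sequential search through the chain
            if node.key == key:
                return node.value
            node = node.next
        return None

    def chain_length(self, idx: int) -> int:
        count, node = 0, self.table[idx]
        while node is not None:
            count += 1
            node = node.next
        return count


t = ChainedHashTable(11)
for key, name in [(3, "a"), (14, "b"), (25, "c")]:  # all three hash to index 3
    t.insert(key, name)
print(t.find(14), t.chain_length(3))  # b 3
```

The `find` loop is the sequential search whose cost grows with the number of collisions at an index.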

Linked Collision Processing (3)

[Figure: Primary Memory (Hash Table), with overflow items linked into Secondary Memory (Overflow)]

Linked Collision Processing (4)

Search time with linked overflow: if there have been many collisions, the search no longer takes constant time, because the linked list must be searched sequentially. The time complexity becomes O(n), where n is the number of items in the chain being searched (the number of collisions at that index).

Linear Collision Processing

Also called linear probing. There is no separate primary and secondary memory: the original array holds everything.
When a collision occurs:
Start at the hashed location (the site of the first collision).
Proceed sequentially through the array until an available slot is found, and store the item there.
The array must be treated circularly, since a probe could reach the end and need to start again at the beginning.
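Here is linear probing in Python, assuming integer keys and key % size as the hash function. `LinearProbingTable` is a hypothetical name. The `% self.size` in the probe loop makes the array circular. A search can stop at the first empty slot because an insert of that key would have stopped there too.

```python
class LinearProbingTable:
    """Open addressing with linear probing; keys are ints, hash(key) = key % size."""

    def __init__(self, size=7):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size

    def _probe(self, key):
        start = key % self.size                 # the hashed location
        for step in range(self.size):
            yield (start + step) % self.size    # wrap around: the array is circular

    def insert(self, key, value):
        for idx in self._probe(key):
            if self.keys[idx] is None or self.keys[idx] == key:
                self.keys[idx] = key
                self.values[idx] = value
                return idx
        raise OverflowError("hash table is full")

    def find(self, key):
        for idx in self._probe(key):
            if self.keys[idx] is None:
                return None                     # empty slot: key is not in the table
            if self.keys[idx] == key:
                return self.values[idx]
        return None


t = LinearProbingTable(7)
print(t.insert(6, "x"))   # 6
print(t.insert(13, "y"))  # 0 (13 % 7 = 6 is taken, so wrap around to 0)
print(t.find(13))         # y
```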

Linear Collision Processing

Problem with linear probing: clustering. If the hash function produces some values more often than others, parts of the table fill up quickly while other parts stay empty. Clustering causes further collisions later, because any key that hashes into a cluster must probe past it.

Analysis of Linear Probing

Performance depends on the loading density D of the hash table:
D = (number of records in hash table) / (size of hash table array); D = 1 indicates maximum density.
The average number of probes is approximately:
For a successful search: $\frac{1}{2}\left(1 + \frac{1}{1-D}\right)$
For an unsuccessful search: $\frac{1}{2}\left(1 + \frac{1}{(1-D)^2}\right)$

   D     Successful   Unsuccessful
  0.1      1.06           1.12
  0.5      1.50           2.50
  0.8      3.00          13.00
  0.9      5.50          50.50

This is why linear probing is called a density-dependent search technique.
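The two approximations are easy to evaluate, which gives a quick check on the table above. This sketch only computes the formulas and does not simulate a table. `expected_probes_linear` is a hypothetical name.

```python
def expected_probes_linear(d):
    """Approximate average probes for linear probing at loading density d.

    Returns (successful, unsuccessful); valid only for 0 <= d < 1.
    """
    if not 0 <= d < 1:
        raise ValueError("loading density must satisfy 0 <= d < 1")
    successful = 0.5 * (1 + 1 / (1 - d))
    unsuccessful = 0.5 * (1 + 1 / (1 - d) ** 2)
    return successful, unsuccessful


for d in (0.1, 0.5, 0.8, 0.9):
    s, u = expected_probes_linear(d)
    print(f"D = {d}: successful {s:.2f}, unsuccessful {u:.2f}")
```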

Rehashing

An alternative to linear probing that avoids clustering. After a collision occurs, apply a different hash function to get a completely new location. If the new location is also taken, either fall back to linear probing from there or apply a third or fourth hash function. Each extra hash function can also collide, so eventually some probing method must be used.
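Here is a sketch of rehashing with a fallback to linear probing. It assumes a non-empty list of hash functions tried in order. The two hash functions in the usage lines are hypothetical examples for a table of size 7, not recommended choices. Searching follows the same probe sequence as insertion.

```python
def probe_sequence(key, size, hash_fns):
    """Yield one location per hash function, then linear probing from the last one.

    Assumes hash_fns is non-empty.
    """
    idx = 0
    for h in hash_fns:
        idx = h(key) % size
        yield idx
    for step in range(1, size):  # fallback: linear probing, wrapping around
        yield (idx + step) % size


def rehash_insert(table, key, hash_fns):
    for idx in probe_sequence(key, len(table), hash_fns):
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    raise OverflowError("hash table is full")


def rehash_find(table, key, hash_fns):
    for idx in probe_sequence(key, len(table), hash_fns):
        if table[idx] is None:
            return None  # empty slot: the key was never inserted
        if table[idx] == key:
            return idx
    return None


# Hypothetical pair of hash functions for a table of size 7.
fns = [lambda k: k, lambda k: k // 7]
table = [None] * 7
print(rehash_insert(table, 10, fns))  # 3 (first hash function: 10 % 7)
print(rehash_insert(table, 17, fns))  # 2 (17 % 7 = 3 is taken; 17 // 7 = 2)
print(rehash_insert(table, 24, fns))  # 4 (3 is taken twice; linear probe to 4)
```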

Quadratic Probing

Another alternative to linear probing. If a collision occurs at the initial index k, try to store at index k + 1. For each further collision, try index $k + r^2$, where r is a count of how many collisions have occurred. The sequence is therefore k + 1, k + 4, k + 9, ..., with every index taken mod the table size.
Variation on rehashing: double hashing. Use a second hash function to determine a fixed increment by which to move through the array.
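Both probe sequences can be written as Python generators under some illustrative assumptions: integer keys and a first hash function of key % size. For double hashing, the second hash function is assumed to be h2(key) = R - (key % R), with a prime R smaller than the table size (R = 7 here). That choice of h2 is a common textbook pattern, not the slide's. With a prime table size and a step between 1 and R, double hashing visits every slot. Quadratic probing may not, so `probe_insert` gives up after `size` attempts even if some slot is still free.

```python
def quadratic_probes(key, size):
    """Yield k, k + 1, k + 4, k + 9, ... (mod size), where k = key % size."""
    k = key % size
    for r in range(size):
        yield (k + r * r) % size


def double_hash_probes(key, size, r_prime=7):
    """Yield h1, h1 + h2, h1 + 2*h2, ... (mod size) with a fixed step h2."""
    h1 = key % size
    h2 = r_prime - (key % r_prime)  # always between 1 and r_prime, never 0
    for i in range(size):
        yield (h1 + i * h2) % size


def probe_insert(table, key, probes):
    for idx in probes(key, len(table)):
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    raise OverflowError("no free slot found along this probe sequence")


print(list(quadratic_probes(3, 11))[:5])     # [3, 4, 7, 1, 8]
print(list(double_hash_probes(14, 11))[:5])  # [3, 10, 6, 2, 9]
```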