Data Structures And Algorithms

Data Structures And Algorithms: Hashing
Eng. Anis Nazer
First Semester 2017-2018

Searching
Search: find whether a key exists in a given set.
Searching algorithms: linear (sequential) search, binary search, and search based on a hash function.

Linear/sequential Search
Algorithm: go through the elements one by one; if the key is found, return true.
Code:
    bool linearsearch(int A[], int size, int key) {
        // check every element in turn
        for (int i = 0; i < size; i++)
            if (A[i] == key)
                return true;
        return false;
    }
What is the complexity?

Binary Search
Assumption: the array elements are sorted in ascending order.
Algorithm: compare the key with the element at the middle. If key == element, return true. If key > element, search the right subarray; otherwise search the left subarray.
Question: when to stop? How do we determine that the key is not found?
What is the complexity?

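Binary Search
Code:
    bool binarysearch(int A[], int size, int key) {
        int L = 0, R = size - 1;
        int M = (L + R) / 2;
        while (L <= R) {
            if (key == A[M])
                return true;
            else if (key > A[M])
                L = M + 1;      // key can only be in the right half
            else
                R = M - 1;      // key can only be in the left half
            M = (L + R) / 2;
        }
        return false;           // L > R: the range is empty, key not found
    }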

Hash function
A hash function computes a value from the input, or from part of the input.
Example of a hash function: f(x) = x % 10
Assume we store the elements in an array based on the hash function: the index of value x is f(x), so A[ f(x) ] = x.

Hash function
Example: store the following in an array of size 10, given that the hash function is f(x) = x % 10:
1, 18, 15, 930, 77, 29
    index:  0    1    2    3    4    5    6    7    8    9
    value:  930  1    -    -    -    15   -    77   18   29
(- marks an empty position)
Is 44 in the array? f(44) = 44 % 10 = 4, and A[4] is empty, so 44 is not in the array.
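
The store-and-look-up idea on this slide can be sketched as follows. This is a minimal illustration, not code from the lecture: the names EMPTY, buildTable and contains are hypothetical, keys are assumed to be non-negative, and collisions are deliberately ignored at this stage.

```cpp
#include <vector>

const int EMPTY = -1;  // assumed marker for an unused position (keys are non-negative)

// The hash function from the slide.
int f(int x) { return x % 10; }

// Store each key at index f(key). Collisions are not handled in this sketch:
// a later key simply overwrites an earlier one with the same index.
std::vector<int> buildTable(const std::vector<int>& keys) {
    std::vector<int> A(10, EMPTY);
    for (int x : keys)
        A[f(x)] = x;
    return A;
}

// A key is present only if it sits at its own hash index.
bool contains(const std::vector<int>& A, int key) {
    return A[f(key)] == key;
}
```

With the keys 1, 18, 15, 930, 77, 29, contains(A, 77) is true and contains(A, 44) is false, because A[4] is still EMPTY.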

Hash function
What is the advantage of using a hash function?
What is the problem when using a hash function? Two inputs can hash to the same value, e.g. with f(x) = x % 10: f(15) = 5 and f(225) = 5.
What to do if two values hash to the same index?

Collision
Collision: when two distinct values v1 and v2 hash to the same index.
How to deal with collisions?
Use a perfect hash function, i.e. one where no two values hash to the same index. This is practically impossible, since the data is not known in advance.
A good hash function is one that keeps collisions rare.

Hash functions
Some examples of hash functions: Division, Folding, Mid-Square, Extraction, Radix transformation

Hash functions
Division: based on the modulo operator: h(x) = x % (array size)
It is better for the array size to be a prime number.

Hash functions
Folding: the key is divided into parts, and the parts are combined to generate the index (address).
Example: divide the key into parts of three digits, add the parts, then take the sum modulo the array size.
ID = 199805535, array size = 101
h(199805535) = (199 + 805 + 535) % 101 = 1539 % 101 = 24
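
A possible sketch of the folding example above. The function name foldHash is hypothetical, and the key is split into three-digit groups starting from the right, which gives the same groups as the slide for a nine-digit ID.

```cpp
// Folding: cut the key into three-digit parts, add the parts,
// and reduce the sum modulo the table size.
int foldHash(long long key, int tableSize) {
    long long sum = 0;
    while (key > 0) {
        sum += key % 1000;  // take the lowest three digits as one part
        key /= 1000;        // drop them and continue with the rest
    }
    return static_cast<int>(sum % tableSize);
}
```

foldHash(199805535, 101) adds 535 + 805 + 199 = 1539 and returns 24.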

Hash functions
Mid-Square: the key is squared and the middle part of the result is taken.
Example: key = 3121, size = 1000
3121^2 = 9740641, middle = 406
It is better to use a size that is a power of 2 and take the middle of the binary representation.
Example: key = 3121, size = 1024
3121^2 = 9740641 = 100101001010000101100001 (binary)
h(3121) = 0101000010 (the middle 10 bits) = 322
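
The binary variant can be sketched with a shift and a mask. This is only an illustration under stated assumptions: the name midSquareHash is hypothetical, the table size is fixed at 1024, and the key is assumed to fit in 12 bits so that its square has at most 24 bits, as in the slide's example. Other key widths would need a different shift.

```cpp
#include <cstdint>

// Mid-square for a table of size 1024 = 2^10.
// A 24-bit square has 7 bits on each side of its middle 10 bits,
// so shifting right by 7 and masking with 1023 keeps bits 7..16.
unsigned midSquareHash(std::uint32_t key) {
    std::uint64_t square = static_cast<std::uint64_t>(key) * key;
    return static_cast<unsigned>((square >> 7) & 0x3FF);
}
```

midSquareHash(3121) returns 322, matching the hand computation.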

Hash functions
Extraction: take only a part of the key.
Example: take the last 4 digits of the ID number: h(199805535) = 5535
This method is useful when part of the key is common across the data; ID numbers usually start with the same digits.

Hash functions
Radix transformation: the key is converted to another number system, and the result is taken modulo the array size.
Example: size = 100, base 9
key = 345: 345 in base 9 is 423, so h(345) = 423 % 100 = 23
key = 245: 245 in base 9 is 302, so h(245) = 302 % 100 = 2
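
A small sketch of radix transformation, assuming the base is between 2 and 10 so that each digit of the converted number is a single decimal digit. The names toBaseAsDecimal and radixHash are hypothetical.

```cpp
// Write `key` in the given base and read the resulting digit string
// as an ordinary decimal number (e.g. 345 in base 9 becomes 423).
long long toBaseAsDecimal(int key, int base) {
    long long result = 0;
    long long place = 1;  // decimal weight of the next digit: 1, 10, 100, ...
    while (key > 0) {
        result += (key % base) * place;
        place *= 10;
        key /= base;
    }
    return result;
}

// Radix transformation: convert, then reduce modulo the table size.
int radixHash(int key, int base, int tableSize) {
    return static_cast<int>(toBaseAsDecimal(key, base) % tableSize);
}
```

radixHash(345, 9, 100) converts 345 to 423 and returns 23.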

Collision resolution
Collision: two keys hash to the same address (index).
How to deal with collisions:
Use a perfect hash function: not practical.
Open addressing: find an available position to place the colliding key, using linear probing, quadratic probing, or double hashing.
Chaining: use a linked list to store the keys.

Collision resolution
Linear probing: look for the next available position, wrapping around at the end of the array.
Ex. h(x) = x % 10, size = 10
16, 22, 77, 48, 35, 62, 47, 99
    index:  0  1  2  3  4  5  6  7  8  9
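
Insertion with linear probing might look like the sketch below. The names EMPTY and insertLinear are hypothetical, keys are assumed to be non-negative, and the hash function is taken to be key % (table size) as in the example.

```cpp
#include <vector>

const int EMPTY = -1;  // assumed marker for an unused position

// Insert `key` with linear probing and return the index where it was stored,
// or -1 if every position is already occupied.
int insertLinear(std::vector<int>& table, int key) {
    int n = static_cast<int>(table.size());
    int home = key % n;
    for (int i = 0; i < n; ++i) {
        int pos = (home + i) % n;  // wrap around the end of the array
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Inserting the slide's keys in order into ten EMPTY positions puts 62 at index 3 (index 2 is taken by 22), 47 at index 9 (indices 7 and 8 are taken), and 99 wraps around to index 0.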

Collision resolution
Linear probing tends to create clusters: elements tend to group near each other.
The empty position following a cluster has a higher chance of being filled. This chance grows with the cluster size: the bigger the cluster, the higher the probability.

Collision resolution
Quadratic probing: look for positions using a quadratic formula: h(x) + i, where i = 1, -1, 4, -4, 9, -9, ... (that is, +j^2 and -j^2 for j = 1, 2, 3, ...), wrapping around the array.
Ex. h(x) = x % 10, size = 10
16, 22, 77, 48, 35, 62, 47, 99
    index:  0  1  2  3  4  5  6  7  8  9

Collision resolution
Assume key = 9, h(x) = x % 19, and the array is full except A[3]. What is the sequence of indices (probes) that are tried?
Quadratic probing avoids the clustering seen with linear probing but generates secondary clusters, since two elements that hash to the same index generate the same probe sequence.

Collision resolution
How to know when to stop if the key is not in the array?
If the size of the array is a prime number of the form 4j + 3, where j is an integer, the probing sequence is guaranteed to cover all the indices.
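
The quadratic probe sequence can be generated with a short loop. This is a sketch rather than the lecture's code: the name quadraticProbes is hypothetical, keys are assumed non-negative, and the hash function is taken to be key % size.

```cpp
#include <vector>

// Probe sequence h, h+1, h-1, h+4, h-4, h+9, h-9, ... taken modulo `size`.
// For a prime size of the form 4j + 3 the returned list visits every index once.
std::vector<int> quadraticProbes(int key, int size) {
    std::vector<int> probes;
    int home = key % size;
    probes.push_back(home);
    for (int j = 1; j <= (size - 1) / 2; ++j) {
        int offset = (j * j) % size;
        probes.push_back((home + offset) % size);
        probes.push_back((home - offset + size) % size);  // keep the index non-negative
    }
    return probes;
}
```

For key = 9 and size = 19 (19 = 4*4 + 3) the sequence begins 9, 10, 8, 13, 5, 18, 0, 6, 12, 15, 3, so the free position A[3] is reached on the eleventh probe, and the full list of 19 probes contains every index from 0 to 18.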

Collision resolution
Double hashing: if a collision occurs, use a second hash function to determine the step.
Probe sequence: h(x), h(x) + h2(x), h(x) + 2h2(x), h(x) + 3h2(x), ...
Example: h(x) = x % 19, h2(x) = x % 13
What are the probe sequences for x = 3 and x = 22?
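
The double-hashing probe sequence can be tabulated with a few lines of code. The function name doubleHashProbes is hypothetical, and the sketch assumes the table has 19 positions (the modulus of h), so every probe is reduced modulo 19.

```cpp
#include <vector>

int h(int x)  { return x % 19; }  // primary hash from the example
int h2(int x) { return x % 13; }  // secondary hash: the step between probes

// First `count` probes of the sequence h(x), h(x) + h2(x), h(x) + 2*h2(x), ...
// Caveat: when x is a multiple of 13, h2(x) is 0 and every probe lands on h(x).
std::vector<int> doubleHashProbes(int x, int count) {
    std::vector<int> probes;
    for (int i = 0; i < count; ++i)
        probes.push_back((h(x) + i * h2(x)) % 19);
    return probes;
}
```

Both keys start at index 3, but their steps differ: x = 3 gives 3, 6, 9, 12, 15 and x = 22 gives 3, 12, 2, 11, 1 for the first five probes.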

Comparison

Collision resolution
Chaining: store a pointer to a linked list in the array, and store the data in the linked list.
The list can be kept sorted for efficiency.
Chaining requires more space to store the pointers.

Collision resolution Separate chaining:
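
One way to sketch separate chaining is an array of linked lists. The class name ChainedTable is hypothetical, std::list stands in for a hand-written linked list, keys are assumed non-negative, and the chains are left unsorted for simplicity.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Separate chaining: each array position holds a linked list of the keys
// that hash to that index.
class ChainedTable {
public:
    explicit ChainedTable(int size) : buckets(size) {}

    void insert(int key) {
        std::list<int>& chain = buckets[index(key)];
        if (std::find(chain.begin(), chain.end(), key) == chain.end())
            chain.push_back(key);  // store each key at most once
    }

    bool contains(int key) const {
        const std::list<int>& chain = buckets[index(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

private:
    std::size_t index(int key) const {
        return static_cast<std::size_t>(key) % buckets.size();
    }
    std::vector<std::list<int>> buckets;
};
```

With a table of size 10, the colliding keys 15 and 225 both end up in the chain at index 5, and both can still be found.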

Collision resolution
Coalesced chaining: use a 2D array of size x 2, A[size][2]; the second column stores the index of the next element in the chain.
In the second column, -2 means the position is available and -1 means the element is the last in its chain.
Example: store the following data, h(x) = x % 10, collision resolution: linear probing
12, 23, 15, 72, 49, 35, 9, 22

Example
12, 23, 15, 72, 49, 35, 9, 22
    index:  0  1  2  3  4  5  6  7  8  9

Example
12, 23, 15, 72, 49, 35, 9, 22
    index  value  next
    0      9      -1
    1             -2
    2      12      4
    3      23     -1
    4      72      7
    5      15      6
    6      35     -1
    7      22     -1
    8             -2
    9      49      0
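
The table above can be reproduced with the following sketch of coalesced chaining. The names Slot, AVAILABLE, LAST and insertCoalesced are hypothetical, a struct replaces the two-column array, and the free position is searched by linear probing from the key's home index, which is one possible reading of the slide.

```cpp
#include <vector>

const int AVAILABLE = -2;  // position is free
const int LAST = -1;       // element is the last one in its chain

struct Slot {
    int value = 0;
    int next = AVAILABLE;  // index of the next element in the chain
};

// Insert `key` and return the index where it was stored, or -1 if the table is full.
int insertCoalesced(std::vector<Slot>& table, int key) {
    int n = static_cast<int>(table.size());
    int home = key % n;
    if (table[home].next == AVAILABLE) {  // home position is free
        table[home].value = key;
        table[home].next = LAST;
        return home;
    }
    int tail = home;                      // follow the chain to its last element
    while (table[tail].next != LAST)
        tail = table[tail].next;
    for (int i = 1; i < n; ++i) {         // linear probing for a free position
        int pos = (home + i) % n;
        if (table[pos].next == AVAILABLE) {
            table[pos].value = key;
            table[pos].next = LAST;
            table[tail].next = pos;       // link the new element to the chain
            return pos;
        }
    }
    return -1;
}
```

For example, 72 collides with 12 at index 2, is placed at index 4, and A[2] gets next = 4; later 22 follows the chain 2 -> 4, is placed at index 7, and A[4] gets next = 7.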

Deletion
What happens if you delete a value from a hash table?
Example: arrange the data 11, 34, 62, 4, 91 using h(x) = x % 10 and linear probing, then delete 34 and 62, then search for 4.
    index:  0  1  2  3  4  5  6  7  8  9

Deletion
The position of the deleted item should not be marked as empty. Why?
Can we reuse the position of the deleted element?
If you have many delete operations and few insert operations, you should rehash the table after a number of deletions.
Rehash: rearrange the data using a different table size and/or a different hash function.
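
A common way to handle deletion under linear probing is to mark removed positions with a special "deleted" value instead of "empty". The sketch below illustrates that idea under stated assumptions: the names EMPTY, DELETED, insertKey, findKey and removeKey are hypothetical, keys are non-negative, duplicate keys are not inserted, and the hash function is key % (table size).

```cpp
#include <vector>

const int EMPTY = -1;    // position was never used
const int DELETED = -2;  // position held a key that was removed

// Insert with linear probing; a DELETED position may be reused.
bool insertKey(std::vector<int>& table, int key) {
    int n = static_cast<int>(table.size());
    for (int i = 0; i < n; ++i) {
        int pos = (key % n + i) % n;
        if (table[pos] == EMPTY || table[pos] == DELETED) {
            table[pos] = key;
            return true;
        }
    }
    return false;  // table is full
}

// Return the index of `key`, or -1 if it is not in the table.
int findKey(const std::vector<int>& table, int key) {
    int n = static_cast<int>(table.size());
    for (int i = 0; i < n; ++i) {
        int pos = (key % n + i) % n;
        if (table[pos] == EMPTY) return -1;  // only a never-used position ends the search
        if (table[pos] == key) return pos;   // DELETED positions are skipped
    }
    return -1;
}

// Remove `key` by marking its position DELETED rather than EMPTY.
bool removeKey(std::vector<int>& table, int key) {
    int pos = findKey(table, key);
    if (pos < 0) return false;
    table[pos] = DELETED;
    return true;
}
```

With the data 11, 34, 62, 4, 91 in a table of size 10, the key 4 is stored at index 5 because index 4 holds 34. After removing 34 and 62, findKey(table, 4) still returns 5, since the search passes over the DELETED position at index 4; had that position been reset to EMPTY, the search would have stopped there and reported 4 as missing.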

THE END