TABLES AND HASHING. Chapter 13

1 Data Structures Dr Ahmed Rafat Abas Computer Science Dept, Faculty of Computer and Information, Zagazig University

2 TABLES AND HASHING Chapter 13

3 13.1 Alternative methods of storing data Hashing is a technique for storing data so that the amount of work required to retrieve a particular item is independent of the length of the list.

4 An example of hashing is the way arrays are stored and used. The array data type stores data at a location given by the array index. The location at which element i of an array is stored is calculated by starting at the base address of the array and adding to it the size of each element multiplied by i. This method of base plus offset means that the time required to locate any array element is constant, independent of its location in the array and of the size of the array.
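The base-plus-offset rule can be sketched in C++ as follows. This is only an illustration: the helper name elementAddress is hypothetical, and in practice the compiler performs this arithmetic whenever a[i] is written.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: locates element i "by hand" as
// base address + i * (size of one element), measured in bytes.
template <typename T>
const T* elementAddress(const T* base, std::size_t i) {
    assert(base != nullptr);
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(base);
    return reinterpret_cast<const T*>(bytes + i * sizeof(T));
}
```

For an array declared as int a[5], elementAddress(a, 3) yields the same address as &a[3], and the cost of the calculation does not depend on i or on the array length.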

5 13.2 The table data structure The idea behind storing array elements can be generalized to allow any data to be stored in a one-dimensional form. Example: suppose we wish to count the number of times each word occurs in a file, producing a list with the columns Word and Frequency. We could define an array in which each element stores the count for a particular word. In doing so, we are faced with two problems: 1. The character string forming a word is not an integer and, in C++, cannot be used as an array index. 2. There are more than 400,000 words in the English language, only a small fraction of which will be used in any file of moderate length. To cope with these problems, a new data structure called a table is defined.

6 Tables are similar to arrays, but the word array refers to the actual data structure found in the C++ language. A table consists of a function or formula which maps members of one data type D (for example, the words in the word-counting problem) onto another type, called the index I (usually non-negative integers), which is used to store and access the original data.

7 Properties of a table data type 1. A function which calculates the value of the index I given the data D. (Such a function in the word-counting problem would calculate the array index at which a particular word is stored.) 2. Table insertion: a new data item (for example, a word) may be inserted into the table. 3. Table retrieval: a table may be searched for a data item and, if present, it and its associated values may be retrieved. (Given a word, the table is searched to see if that word is present and, if so, the count of the number of times it has been used may be retrieved.) 4. Table deletion: a data item may be deleted from the table.

8 Provided the function which converts the original data into the table index I is efficient, a table can represent a considerable increase in efficiency over the searching routines studied earlier. Given the data for which you are searching: its location (if present) is calculated directly from the data, so you need only look in one place in the table to see if the word is there.

9 Example: if the word thing was calculated to have index 39, you need only look at location 39 in the table. If this location is empty, you know immediately that the word thing is not in the table, while if location 39 is occupied, it will be by the word thing and its associated count will be found at that location.

10 In practice, such clean access to the table is rarely found. This is because: You are usually trying to map data from a very large set (such as the 400,000 words in English) into a much smaller space (say, 1000 array elements). Since you don't know in advance which data items will occur, it is very difficult to find a transformation function which will map all the different data items that actually occur into different locations in the array.

11 Example: if the word thing maps into location 39, another word, such as computer, might also map into location 39. When this happens, a collision is said to occur. A method is needed to handle collisions in such a way that both these words can be stored in the array and, of course, can both be retrieved.

12 13.3 Hashing Principles The process of mapping a large amount of data into a smaller table is called hashing. The function which provides the map between the original data and the smaller table in which it is finally stored is a hash function. The table itself is called a hash table.

13 Fig Mapping items into a hash table

14 Operations of the table data type are implemented in hashing as follows: 1. The hash function provides the map which translates the data D into the index I. 2. A new data item D is inserted into the table by using the hash function to calculate its index I. If this location is free, the item is inserted into the table. If not, a procedure for resolving the collision must be given.

15 3. An item of data D may be retrieved by using the hash function to calculate its index I. Position I in the table is checked. If it is empty, item D is not present. If it is occupied, its contents must be tested: if they match item D, then D has been found. If not, there are two possibilities: (i) item D is not present in the table; (ii) item D is present in the table, but when it was inserted, the other item at index I was already there, causing a collision. In either case, the same procedure used for resolving a collision during insertion of an item must be followed to see if D is located somewhere else in the table.

16 4. Item deletion may proceed in a similar manner to insertion. The hash function is called to determine the location of the item; if the item is present, it may then be deleted. If a collision occurred when the item was originally inserted, care must be taken in deleting it.

17 Choosing a hash function A good hash function should satisfy two criteria: 1. It should be quick to compute. 2. It should minimize the number of collisions.

18 1. Speed of computation The hash function should be simple, and should minimize time-consuming operations such as multiplication, division, or more complex functions such as square roots. Speed is important, because the hash function is used every time the table is accessed.

19 2. Minimization of collisions A hash function should spread the incoming data as evenly as possible over the hash table. Example of a bad hash function for counting words: suppose we have a hash table of 1000 elements, and we choose a hash function that takes the ASCII code of the first character in the word and uses that as an array index. For words written in a single letter case, this method would provide only 26 different indexes, so that 974 sites in the table are not directly accessible by the hash function. Any two words beginning with the same letter would result in a collision.
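A minimal sketch of this poor hash function, assuming words are held as std::string (the function name firstCharHash is hypothetical):

```cpp
#include <cassert>
#include <string>

// The poor hash function from the example: the index is simply the
// ASCII code of the first character of the word.
int firstCharHash(const std::string& word) {
    assert(!word.empty());
    return static_cast<unsigned char>(word[0]);
}
```

The words "thing" and "table" both receive index 116 (the ASCII code of 't'), so they collide even though almost all of the 1000-element table is empty.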

20 Examples of commonly used hash functions 1. Truncation. 2. Folding. 3. Modular arithmetic.

21 1. Truncation Part of the key is ignored, with the remainder truncated or concatenated to form the index. Example: if we are storing 7-digit phone numbers in a hash table with 1000 elements, we may ignore all but the 2nd, 4th, and 7th digits in the phone number, so that a number whose 2nd, 4th, and 7th digits are 3, 3, and 8 would be indexed at location 338. This method is quick, as it involves accessing only a few digits in the input data. The number of collisions it produces depends on how uniform the input data are.

22 If the table contains phone numbers for people living within a small area, then the first three digits may be the same for all the numbers. In this case (taking the shared 2nd digit to be 3), all phone numbers would be hashed into indexes beginning with 3 in the table, so that 900 locations would remain unused. This problem could be solved by choosing the last three digits in the phone number instead. In general, you should consider what regularities may be present in the data before deciding on a hash function.
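The truncation scheme and the last-three-digits alternative might be sketched as below. The function names are hypothetical, the phone number is assumed to arrive as a string of exactly seven digits, and the sample number 1323458 is invented for illustration.

```cpp
#include <cassert>
#include <string>

// Truncation: keep only the 2nd, 4th and 7th digits of a 7-digit number.
int truncationHash(const std::string& phone) {
    assert(phone.size() == 7);
    int d2 = phone[1] - '0';
    int d4 = phone[3] - '0';
    int d7 = phone[6] - '0';
    return d2 * 100 + d4 * 10 + d7;
}

// Alternative suggested in the text: use the last three digits instead.
int lastThreeHash(const std::string& phone) {
    assert(phone.size() == 7);
    return std::stoi(phone.substr(4, 3));
}
```

With the invented number 1323458, truncationHash gives 338 and lastThreeHash gives 458. If every number in the table shared the prefix 132, the first function could only ever produce indexes 300 to 399, while the second would still use the whole table.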

23 2. Folding The data can be split up into smaller chunks which are then folded together in some form. Example: a 7-digit phone number could be split into three groups of 2, 2, and 3 digits, which are then added together and truncated to produce an index in the range 0 to 999. For a number that splits into the groups 73, 13, and 018, the sum is 104, which may be used as the index. Another number that splits into 89, 96, and 989 gives the sum 1174. Since this number is larger than the highest allowed index in the hash table, we truncate it by saving only the last three digits, giving an index of 174.
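The folding procedure can be sketched as follows. The function name foldingHash is hypothetical, and the digit strings 7313018 and 8996989 used to exercise it are assumed reconstructions obtained by joining the groups quoted in the example.

```cpp
#include <cassert>
#include <string>

// Folding: split a 7-digit number into groups of 2, 2 and 3 digits,
// add the groups, and keep only the last three digits of the sum.
int foldingHash(const std::string& phone) {
    assert(phone.size() == 7);
    int a = std::stoi(phone.substr(0, 2));
    int b = std::stoi(phone.substr(2, 2));
    int c = std::stoi(phone.substr(4, 3));
    return (a + b + c) % 1000;  // index in the range 0 to 999
}
```

Here foldingHash("7313018") adds 73 + 13 + 18 to give 104, and foldingHash("8996989") adds 89 + 96 + 989 to give 1174, which is reduced to 174.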

24 3. Modular arithmetic Convert the data into an integer (using truncation, folding, or some other method), divide by the size of the hash table, and take the remainder as the index (for example, by using the % operator in C++). Example: modular arithmetic is used in the second example under folding: the phone number produced the sum 1174 under the folding procedure, so this number is taken modulo the hash table size (1000) to produce the final index of 174.
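The same idea applies to the word-counting problem. The sketch below is one possible choice, not the only one: it assumes the word is first converted to an integer by summing its character codes, and the name modularHash is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Modular arithmetic: convert the word to an integer (here, by summing
// its character codes) and take the remainder modulo the table size.
std::size_t modularHash(const std::string& word, std::size_t tableSize) {
    assert(tableSize > 0);
    std::size_t value = 0;
    for (unsigned char c : word) {
        value += c;
    }
    return value % tableSize;
}
```

Whatever integer the conversion step produces, the % operator guarantees that the result is a valid index between 0 and tableSize - 1.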

25 Collision resolution with open addressing There are two main ways in which collisions may be resolved: Open addressing: the amount of space available for storing data is fixed at compile time by declaring a fixed array for the hash table. Chaining: an array is also declared for the hash table, but each element in the array is a pointer to a linked list which holds all data with the same index.

26 In open addressing, when a collision occurs, another unoccupied location in the array must be found. The method of choosing an alternative location should be fast, and it should minimize the number of additional collisions that will occur as more data are added to the table. Collision resolution methods in open addressing: 1. Linear probing 2. Quadratic probing 3. Item-dependent probe distance 4. Pseudorandom number generator

27 The figure shows an item being inserted into a hash table using linear probing to resolve the collision. The item is mapped to location 4 by the hash function, but locations 4, 5, and 6 are already full, so the collision resolution method eventually places the item in location 7.

28 1. Linear probing If a collision occurs when inserting a new item into the table, we probe forward in the array, one step at a time, until an empty slot is found to store the new data item. When retrieving this data: calculate the hash function, then test the location given by the index to see if the required data item is there. If not, examine each array element in turn from the index location until the item is found, or until we encounter an empty site or have examined all locations in the table, at which point we know the item is not in the table. When using linear probing, the array is treated as circular, so that if the search runs past the end of the array, it continues at element 0.
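Insertion and retrieval with linear probing can be sketched as below. This is a minimal illustration under stated assumptions: the keys are non-negative integers, the hash function is simply key % size, and the class name LinearProbeTable is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class LinearProbeTable {
public:
    explicit LinearProbeTable(std::size_t size) : slots_(size) {
        assert(size > 0);
    }

    // Returns the site where the key is stored, or -1 if the table is full.
    int insert(int key) {
        std::size_t home = hash(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            std::size_t pos = (home + step) % slots_.size();  // circular array
            if (!slots_[pos].used || slots_[pos].key == key) {
                slots_[pos].used = true;
                slots_[pos].key = key;
                return static_cast<int>(pos);
            }
        }
        return -1;
    }

    // Returns the site holding the key, or -1 if it is not in the table.
    int find(int key) const {
        std::size_t home = hash(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            std::size_t pos = (home + step) % slots_.size();
            if (!slots_[pos].used) return -1;  // empty site ends the search
            if (slots_[pos].key == key) return static_cast<int>(pos);
        }
        return -1;  // every site examined
    }

private:
    struct Slot {
        bool used = false;
        int key = 0;
    };

    // Assumed hash function for illustration: key modulo table size.
    std::size_t hash(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

    std::vector<Slot> slots_;
};
```

In a table of size 10, inserting 4, 14, 24, and 34 places them at sites 4, 5, 6, and 7, which mirrors the situation where locations 4, 5, and 6 are full and the new item lands in location 7. Inserting 9 and then 19 shows the wrap-around: 19 is stored at site 0.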

29 The disadvantage of linear probing is that data tend to cluster around certain points in the table, leaving other parts of the table not used. This results in lengthy sequential searches through the table when retrieving data and therefore, the search efficiency is reduced.

30 How does clustering appear? Suppose a hash function distributes data uniformly over a hash table of size n. After the first element is inserted at location i, the next element that is hashed to location i is placed in location i + 1. Site i + 1 therefore has twice the chance of any other site of being filled by the second element. If sites i and i + 1 are filled by the first two elements, then site i + 2 has three times the chance of any other site of being filled by the third element, and so on. In general, an empty site at the end of a sequence of filled sites will receive any item that is hashed to any of the filled sites or to that site directly. This results in long chains or clusters, which require long sequences of comparisons during retrieval and thus reduce search efficiency.

31 2. Quadratic probing One way of resolving the clustering problem is to use a collision resolution function that depends on both the index value and the number of previous attempts made to resolve the collision. If a collision occurs at position i, locations i + 1^2, i + 2^2, i + 3^2, and so on (each taken modulo the table size), are tested until an empty site is found.

32 Although this method reduces clustering, it does not probe every site in the table. If the table size n is a prime number, the maximum number of distinct probed sites is (n+1)/2, so that approximately half the table is probed. Example: if the table size is n = 11, then for an element mapped to location 0, the six sites 0, 1, 4, 9, 5 (16 mod 11), and 3 (25 mod 11) will be probed. The next location to be probed by the quadratic probing algorithm would be site 3 again (36 mod 11), and all further sites produced by this algorithm will be one of the six already visited.

33 For table sizes that are not prime numbers, the number of different sites probed by the quadratic probing algorithm can be fewer or more than (n+1)/2. Example: if the table size is n = 10, six sites are probed (sites 0, 1, 4, 9, 6, 5). For a table size that is a perfect square, only a few sites will be probed. Example: if the table size is 16, only the four sites 0, 1, 4, and 9 are probed.

34 To maximize the number of probed sites with the same hash function value: avoid choosing table sizes that are perfect squares (or are divisible by perfect squares); choose your table size as either a prime number or a product of two different prime numbers.
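The site counts quoted in the examples for table sizes 11, 10, and 16 can be checked with a short sketch. The function name quadraticProbeSites is hypothetical; it collects the distinct sites reached by probing home + k^2 modulo n. Trying k = 0 to n - 1 is enough, because (k + n)^2 leaves the same remainder as k^2 when divided by n.

```cpp
#include <cassert>
#include <set>

// Returns the distinct sites visited by quadratic probing from `home`
// in a table of size n, i.e. (home + k*k) mod n for k = 0, 1, ..., n-1.
std::set<int> quadraticProbeSites(int home, int n) {
    assert(n > 0 && home >= 0 && home < n);
    std::set<int> sites;
    for (long long k = 0; k < n; ++k) {
        sites.insert(static_cast<int>((home + k * k) % n));
    }
    return sites;
}
```

Starting from location 0, this reaches six sites for n = 11, six for n = 10, and only the four sites 0, 1, 4, and 9 for n = 16.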

35 3. Item-dependent probe distance The data item is truncated and the truncated form is used to calculate the increment. Example: the last digit of a phone number is used as the increment. 4. Pseudorandom number generator A pseudorandom number generator is used to generate a random increment. It uses a seed value to generate a sequence of integers that appear random, but are actually calculated using a deterministic rule. Provided the same seed is used for successive runs, the same sequence of numbers will be generated. As long as we keep track of the seed and where we are in the sequence of numbers, we will always know where to probe next.
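Both increment rules might be sketched as follows. The function names are hypothetical, and two details are assumptions made only for this illustration: a last digit of 0 is replaced by 1 so that the probe always moves, and std::mt19937 from the C++ standard library is used as the pseudorandom number generator.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Item-dependent probe distance: the last digit of the phone number.
// Assumption: a last digit of 0 is replaced by 1 so the probe always moves.
std::size_t itemDependentStep(const std::string& phone) {
    assert(!phone.empty());
    std::size_t step = static_cast<std::size_t>(phone.back() - '0');
    return step == 0 ? 1 : step;
}

// Pseudorandom probing: the same seed always reproduces the same
// sequence of increments, so a later search can retrace the probes.
std::vector<std::size_t> pseudorandomProbes(unsigned seed, std::size_t home,
                                            std::size_t tableSize,
                                            std::size_t count) {
    assert(tableSize > 1 && home < tableSize);
    std::mt19937 gen(seed);
    std::vector<std::size_t> sites;
    std::size_t pos = home;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t step = 1 + gen() % (tableSize - 1);  // 1 .. tableSize-1
        pos = (pos + step) % tableSize;
        sites.push_back(pos);
    }
    return sites;
}
```

Calling pseudorandomProbes twice with the same seed, home site, and table size returns identical sequences, which is what makes retrieval possible.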

36 Deleting elements from hash tables Deletion is difficult to do efficiently in a hash table where open addressing is used. The reason is that in any table where collisions have occurred during the insertion of data, there is a chain of items with the same index. If we delete any item that is not at the end of the chain, we remove a link in the chain, thus disconnecting the elements beyond that link.

37 Example: suppose we have stored four items with the same index at sites i, j, k, l, and we wish to delete item j. First, locate the item by using the hash function to calculate its index. This will direct us to site i, where the first item with that index is stored. This is not the correct item, so we apply whatever collision resolution system we are using to locate the next site, which contains item j, the one we are looking for. If we delete j from that site, then the site will be empty. A subsequent search for item k or l will start by using the hash function to find its index, which will lead to site i. Applying the collision resolution system will lead us to the site formerly occupied by j. However, since j has been deleted, we will be confronted with an empty site, which is the signal that no more items of that index are present, so the search will terminate with the conclusion that k and l are not present in the table.

38 There are several solutions to this problem: 1. shifting the remaining items forward in the list when an item is deleted, 2. using a special flag which marks an empty cell as deleted rather than just empty so that searches will continue through this cell to see if any more items with that index are present. However, all these methods are rather slow and cumbersome.
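The second solution, a special "deleted" flag, can be sketched as below. This is a minimal illustration under stated assumptions: integer keys, the hash function key % size, linear probing as the collision resolution method, and hypothetical names throughout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class FlaggedTable {
public:
    explicit FlaggedTable(std::size_t size) : slots_(size) {
        assert(size > 0);
    }

    // Returns the site holding the key, or -1 if it is not present.
    int find(int key) const {
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            std::size_t pos = probe(key, step);
            const Slot& s = slots_[pos];
            if (s.state == State::Empty) return -1;  // truly empty: stop
            if (s.state == State::Occupied && s.key == key) {
                return static_cast<int>(pos);
            }
            // A Deleted site is skipped, so the search continues past it.
        }
        return -1;
    }

    // Stores the key in the first site that is not occupied.
    bool insert(int key) {
        if (find(key) >= 0) return true;  // already stored
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            Slot& s = slots_[probe(key, step)];
            if (s.state != State::Occupied) {  // Empty or Deleted is reusable
                s.state = State::Occupied;
                s.key = key;
                return true;
            }
        }
        return false;  // table full
    }

    // Marks the site as deleted rather than empty.
    bool erase(int key) {
        int pos = find(key);
        if (pos < 0) return false;
        slots_[static_cast<std::size_t>(pos)].state = State::Deleted;
        return true;
    }

private:
    enum class State { Empty, Occupied, Deleted };
    struct Slot {
        State state = State::Empty;
        int key = 0;
    };

    // Assumed hash (key modulo size) combined with linear probing.
    std::size_t probe(int key, std::size_t step) const {
        return (static_cast<std::size_t>(key) + step) % slots_.size();
    }

    std::vector<Slot> slots_;
};
```

With a table of size 10 holding 4, 14, 24, and 34 at sites 4 to 7, erasing 14 leaves a flagged site at 5, and searches for 24 and 34 still succeed because they step over it. The cost is that flagged sites keep lengthening searches until they are reused.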

39 Collision resolution with chaining The second method of resolving collisions involves using dynamic data allocation and linked lists. The hash table and associated hash function are defined in the usual manner, except that now the array is an array of pointers to linked lists, one list for each index.

40 In Fig. 13.3, the array of pointers is shown as the vertical column of boxes on the left, with each box labelled with its hash function value. When a data item maps to a particular location, an extra node is allocated and added to the corresponding list. Note that a chained hash table can store more data items than the number of cells in the table. In this case, seven items are stored in a table with six cells.

41 When an item is inserted: If no data are stored at an index site, the corresponding pointer is set to 0. If an item is to be inserted, the hash function is used to find the list to which the item is to be added, and the standard insertion procedures for a linked list are used to insert the item. If a collision occurs, we simply add another node to the end of this list at the corresponding index.

42 When an item is to be retrieved: we use the hash function to calculate its index, and look at the corresponding pointer. If the pointer is 0, the item is not present. If the pointer points to a list, that list is traversed sequentially to see if the desired item is present. With a properly designed hash function, none of these lists should contain more than a few items, so sequential search is an efficient way to search them.

43 Deletion of an item from a table: The hash function is called to determine the index of the item to be deleted. The linked list at that index is searched and, if the item is present, its node is spliced out of the list in the usual way. We need not worry about isolating other parts of the table.

44 The disadvantage of using chaining is that a linked list requires extra storage space for the pointers connecting the list elements.
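The chained table operations described above can be sketched as follows. This is only an illustration under stated assumptions: integer keys, the hash function key % cells, and std::list from the C++ standard library standing in for the hand-built linked lists and null pointers (an empty list plays the role of a pointer set to 0). The class name ChainedTable is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t cells) : chains_(cells) {
        assert(cells > 0);
    }

    // Adds the key to the end of the list at its index (no duplicates).
    void insert(int key) {
        if (!contains(key)) {
            chains_[index(key)].push_back(key);
        }
    }

    // Sequentially searches the single list selected by the hash function.
    bool contains(int key) const {
        const std::list<int>& chain = chains_[index(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Splices the node out of its list; other lists are unaffected.
    bool erase(int key) {
        std::list<int>& chain = chains_[index(key)];
        std::list<int>::iterator it =
            std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

    // Total number of stored items, which may exceed the number of cells.
    std::size_t size() const {
        std::size_t n = 0;
        for (const std::list<int>& chain : chains_) {
            n += chain.size();
        }
        return n;
    }

private:
    // Assumed hash function for illustration: key modulo number of cells.
    std::size_t index(int key) const {
        return static_cast<std::size_t>(key) % chains_.size();
    }

    std::vector<std::list<int>> chains_;
};
```

A table with six cells can hold the seven keys 6, 12, 1, 7, 2, 9, and 5. Erasing 12 leaves 6, which shares its list, still retrievable, in contrast to deletion under open addressing.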


More information

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation Announcements HW1 PAST DUE HW2 online: 7 questions, 60 points Nat l Inst visit Thu, ok? Last time: Continued PA1 Walk Through Dictionary ADT: Unsorted Hashing Today: Finish up hashing Sorted Dictionary

More information

CS 350 : Data Structures Hash Tables

CS 350 : Data Structures Hash Tables CS 350 : Data Structures Hash Tables David Babcock (courtesy of James Moscola) Department of Physical Sciences York College of Pennsylvania James Moscola Hash Tables Although the various tree structures

More information

CSE 214 Computer Science II Searching

CSE 214 Computer Science II Searching CSE 214 Computer Science II Searching Fall 2017 Stony Brook University Instructor: Shebuti Rayana shebuti.rayana@stonybrook.edu http://www3.cs.stonybrook.edu/~cse214/sec02/ Introduction Searching in a

More information

Fundamental Algorithms

Fundamental Algorithms Fundamental Algorithms Chapter 7: Hash Tables Michael Bader Winter 2011/12 Chapter 7: Hash Tables, Winter 2011/12 1 Generalised Search Problem Definition (Search Problem) Input: a sequence or set A of

More information

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech Hashing Dr. Ronaldo Menezes Hugo Serrano Agenda Motivation Prehash Hashing Hash Functions Collisions Separate Chaining Open Addressing Motivation Hash Table Its one of the most important data structures

More information

Cpt S 223. School of EECS, WSU

Cpt S 223. School of EECS, WSU Hashing & Hash Tables 1 Overview Hash Table Data Structure : Purpose To support insertion, deletion and search in average-case constant t time Assumption: Order of elements irrelevant ==> data structure

More information

Direct File Organization Hakan Uraz - File Organization 1

Direct File Organization Hakan Uraz - File Organization 1 Direct File Organization 2006 Hakan Uraz - File Organization 1 Locating Information Ways to organize a file for direct access: The key is a unique address. The key converts to a unique address. The key

More information

CS 2412 Data Structures. Chapter 10 Sorting and Searching

CS 2412 Data Structures. Chapter 10 Sorting and Searching CS 2412 Data Structures Chapter 10 Sorting and Searching Some concepts Sorting is one of the most common data-processing applications. Sorting algorithms are classed as either internal or external. Sorting

More information

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2 Hashing Hashing A hash function h maps keys of a given type to integers in a fixed interval [0,N-1]. The goal of a hash function is to uniformly disperse keys in the range [0,N-1] 5/1/2006 Algorithm analysis

More information

Algorithms and Data Structures

Algorithms and Data Structures Algorithms and Data Structures Open Hashing Ulf Leser Open Hashing Open Hashing: Store all values inside hash table A General framework No collision: Business as usual Collision: Chose another index and

More information

CPSC 259 admin notes

CPSC 259 admin notes CPSC 9 admin notes! TAs Office hours next week! Monday during LA 9 - Pearl! Monday during LB Andrew! Monday during LF Marika! Monday during LE Angad! Tuesday during LH 9 Giorgio! Tuesday during LG - Pearl!

More information

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

HO #13 Fall 2015 Gary Chan. Hashing (N:12) HO #13 Fall 2015 Gary Chan Hashing (N:12) Outline Motivation Hashing Algorithms and Improving the Hash Functions Collisions Strategies Open addressing and linear probing Separate chaining COMP2012H (Hashing)

More information

BBM371& Data*Management. Lecture 6: Hash Tables

BBM371& Data*Management. Lecture 6: Hash Tables BBM371& Data*Management Lecture 6: Hash Tables 8.11.2018 Purpose of using hashes A generalization of ordinary arrays: Direct access to an array index is O(1), can we generalize direct access to any key

More information

Dictionaries and Hash Tables

Dictionaries and Hash Tables Dictionaries and Hash Tables Nicholas Mainardi Dipartimento di Elettronica e Informazione Politecnico di Milano nicholas.mainardi@polimi.it 14th June 2017 Dictionaries What is a dictionary? A dictionary

More information

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys). Tirgul 7 Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys belong to a universal group of keys, U = {1... M}.

More information

SFU CMPT Lecture: Week 8

SFU CMPT Lecture: Week 8 SFU CMPT-307 2008-2 1 Lecture: Week 8 SFU CMPT-307 2008-2 Lecture: Week 8 Ján Maňuch E-mail: jmanuch@sfu.ca Lecture on June 24, 2008, 5.30pm-8.20pm SFU CMPT-307 2008-2 2 Lecture: Week 8 Universal hashing

More information

CSI33 Data Structures

CSI33 Data Structures Outline Department of Mathematics and Computer Science Bronx Community College November 30, 2016 Outline Outline 1 Chapter 13: Heaps, Balances Trees and Hash Tables Hash Tables Outline 1 Chapter 13: Heaps,

More information

Outline. hash tables hash functions open addressing chained hashing

Outline. hash tables hash functions open addressing chained hashing Outline hash tables hash functions open addressing chained hashing 1 hashing hash browns: mixed-up bits of potatoes, cooked together hashing: slicing up and mixing together a hash function takes a larger,

More information

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018 HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018 Acknowledgement The set of slides have used materials from the following resources Slides for textbook by Dr. Y.

More information

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell Hash Tables CS 311 Data Structures and Algorithms Lecture Slides Wednesday, April 22, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks CHAPPELLG@member.ams.org 2005

More information

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017 Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017 Acknowledgement The set of slides have used materials from the following resources Slides

More information

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Spring 2017 Acknowledgement The set of slides have used materials from the following resources Slides for

More information

Hash table basics. ate à. à à mod à 83

Hash table basics. ate à. à à mod à 83 Hash table basics After today, you should be able to explain how hash tables perform insertion in amortized O(1) time given enough space ate à hashcode() à 48594983à mod à 83 82 83 ate 84 } EditorTrees

More information

CS 261 Data Structures

CS 261 Data Structures CS 261 Data Structures Hash Tables Open Address Hashing ADT Dictionaries computer kəәmˈpyoōtəәr noun an electronic device for storing and processing data... a person who makes calculations, esp. with a

More information

Lecture 4. Hashing Methods

Lecture 4. Hashing Methods Lecture 4 Hashing Methods 1 Lecture Content 1. Basics 2. Collision Resolution Methods 2.1 Linear Probing Method 2.2 Quadratic Probing Method 2.3 Double Hashing Method 2.4 Coalesced Chaining Method 2.5

More information

Part I Anton Gerdelan

Part I Anton Gerdelan Hash Part I Anton Gerdelan Review labs - pointers and pointers to pointers memory allocation easier if you remember char* and char** are just basic ptr variables that hold an address

More information

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Spring 2018 The set of slides have used materials from the following resources Slides for textbook by Dr.

More information

Hashing 1. Searching Lists

Hashing 1. Searching Lists Hashing 1 Searching Lists There are many instances when one is interested in storing and searching a list: A phone company wants to provide caller ID: Given a phone number a name is returned. Somebody

More information

Data Structures & File Management

Data Structures & File Management The Cost of Searching 1 Given a collection of N equally-likely data values, any search algorithm that proceeds by comparing data values to each other must, on average, perform at least Θ(log N) comparisons

More information

HASH TABLES. Goal is to store elements k,v at index i = h k

HASH TABLES. Goal is to store elements k,v at index i = h k CH 9.2 : HASH TABLES 1 ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++, GOODRICH, TAMASSIA AND MOUNT (WILEY 2004) AND SLIDES FROM JORY DENNY AND

More information

Hash Tables. Gunnar Gotshalks. Maps 1

Hash Tables. Gunnar Gotshalks. Maps 1 Hash Tables Maps 1 Definition A hash table has the following components» An array called a table of size N» A mathematical function called a hash function that maps keys to valid array indices hash_function:

More information

Hash Tables and Hash Functions

Hash Tables and Hash Functions Hash Tables and Hash Functions We have seen that with a balanced binary tree we can guarantee worst-case time for insert, search and delete operations. Our challenge now is to try to improve on that...

More information

Hashing. October 19, CMPE 250 Hashing October 19, / 25

Hashing. October 19, CMPE 250 Hashing October 19, / 25 Hashing October 19, 2016 CMPE 250 Hashing October 19, 2016 1 / 25 Dictionary ADT Data structure with just three basic operations: finditem (i): find item with key (identifier) i insert (i): insert i into

More information

DATA STRUCTURES AND ALGORITHMS

DATA STRUCTURES AND ALGORITHMS LECTURE 11 Babeş - Bolyai University Computer Science and Mathematics Faculty 2017-2018 In Lecture 9-10... Hash tables ADT Stack ADT Queue ADT Deque ADT Priority Queue Hash tables Today Hash tables 1 Hash

More information

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table CS1020 Data Structures and Algorithms I Lecture Note #15 Hashing For efficient look-up in a table Objectives 1 To understand how hashing is used to accelerate table lookup 2 To study the issue of collision

More information

Data Structures. Topic #6

Data Structures. Topic #6 Data Structures Topic #6 Today s Agenda Table Abstract Data Types Work by value rather than position May be implemented using a variety of data structures such as arrays (statically, dynamically allocated)

More information

Hash[ string key ] ==> integer value

Hash[ string key ] ==> integer value Hashing 1 Overview Hash[ string key ] ==> integer value Hash Table Data Structure : Use-case To support insertion, deletion and search in average-case constant time Assumption: Order of elements irrelevant

More information

Fast Lookup: Hash tables

Fast Lookup: Hash tables CSE 100: HASHING Operations: Find (key based look up) Insert Delete Fast Lookup: Hash tables Consider the 2-sum problem: Given an unsorted array of N integers, find all pairs of elements that sum to a

More information

CMSC 341 Hashing. Based on slides from previous iterations of this course

CMSC 341 Hashing. Based on slides from previous iterations of this course CMSC 341 Hashing Based on slides from previous iterations of this course Hashing Searching n Consider the problem of searching an array for a given value q If the array is not sorted, the search requires

More information

Hash table basics mod 83 ate. ate

Hash table basics mod 83 ate. ate Hash table basics After today, you should be able to explain how hash tables perform insertion in amortized O(1) time given enough space ate hashcode() 82 83 84 48594983 mod 83 ate 1. Section 2: 15+ min

More information

Hash table basics mod 83 ate. ate. hashcode()

Hash table basics mod 83 ate. ate. hashcode() Hash table basics ate hashcode() 82 83 84 48594983 mod 83 ate Reminder from syllabus: EditorTrees worth 10% of term grade See schedule page Exam 2 moved to Friday after break. Short pop quiz over AVL rotations

More information

Module 2: Classical Algorithm Design Techniques

Module 2: Classical Algorithm Design Techniques Module 2: Classical Algorithm Design Techniques Dr. Natarajan Meghanathan Associate Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Module

More information

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada CS102 Hash Tables Prof. Tejada 1 Vectors, Linked Lists, Stack, Queues, Deques Can t provide fast insertion/removal and fast lookup at the same time The Limitations of Data Structure Binary Search Trees,

More information

Data Structures and Algorithms. Chapter 7. Hashing

Data Structures and Algorithms. Chapter 7. Hashing 1 Data Structures and Algorithms Chapter 7 Werner Nutt 2 Acknowledgments The course follows the book Introduction to Algorithms, by Cormen, Leiserson, Rivest and Stein, MIT Press [CLRST]. Many examples

More information

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015 HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015 1 administrivia 2 -assignment 9 is due on Monday -assignment 10 will go out on Thursday -midterm on Thursday 3 last time 4

More information

Introduction to Hashing

Introduction to Hashing Lecture 11 Hashing Introduction to Hashing We have learned that the run-time of the most efficient search in a sorted list can be performed in order O(lg 2 n) and that the most efficient sort by key comparison

More information

Data Structures and Algorithm Analysis (CSC317) Hash tables (part2)

Data Structures and Algorithm Analysis (CSC317) Hash tables (part2) Data Structures and Algorithm Analysis (CSC317) Hash tables (part2) Hash table We have elements with key and satellite data Operations performed: Insert, Delete, Search/lookup We don t maintain order information

More information

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n) Hash-Tables Introduction Dictionary Dictionary stores key-value pairs Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n) Balanced BST O(log n) O(log n) O(log n) Dictionary

More information

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40 Lecture 16 Hashing Hash table and hash function design Hash functions for integers and strings Collision resolution strategies: linear probing, double hashing, random hashing, separate chaining Hash table

More information

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials) CSE100 Advanced Data Structures Lecture 21 (Based on Paul Kube course materials) CSE 100 Collision resolution strategies: linear probing, double hashing, random hashing, separate chaining Hash table cost

More information

CS 241 Analysis of Algorithms

CS 241 Analysis of Algorithms CS 241 Analysis of Algorithms Professor Eric Aaron Lecture T Th 9:00am Lecture Meeting Location: OLB 205 Business HW5 extended, due November 19 HW6 to be out Nov. 14, due November 26 Make-up lecture: Wed,

More information