Hashing for searching

Consider searching a database of records on a given key. There are three standard techniques:
Searching sequentially: start at the first record and look at each record in turn until you find the right one (the one with the matching key). The advantage of this method is that it is simple to understand, implement and prove correct; in particular, changes to the database of records cannot stop the search from working correctly. The disadvantage is that it is horribly inefficient for large databases (lots of records).
Structured (ordered) searching: the database is structured in a way that makes the searching process more efficient. For example, we have already seen (or will see) ordered lists, binary trees and balanced binary trees, which use an ordering between the keys to structure the data being stored. The advantage is that searching can be made much more efficient. The disadvantage is that updating the database is much more complicated and can disrupt the search process; consequently, correct implementation is more difficult.
Searching by hashing: a completely different approach, which does not depend on being able to sort the database records (by key) but still provides structure to improve the efficiency of searches.
J Paul Gibson, NUI Maynooth 2004/2005: CS211 Hashing.

Hashing: some terminology
Hashing: the process of accessing a record by mapping a key value to a position in a (database) table.
Hash function: the function that maps any given key to some table position. This is usually denoted by h.
Hash table: the data structure (usually an array) that holds the records. This is usually denoted by T.
Slot: a position in a hash table.
Hash modulus: the number of slots in the hash table. This is usually denoted by M, with slots numbered 0 to M - 1.
The goal when hashing is to arrange things so that, for any key value K and some hash function h, we have:
0 <= h(K) <= M - 1, and T[h(K)].key() = K
In this way, the hash function tells us where in the hash table the record being searched for (by the given key K) can be found. But, as usual, it is more complicated than this, as we shall see in later slides!

When to use hashing
Hashing is generally used only for sets: in most cases it should not be used where multiple records with the same key are permitted.
It is not normally used for range searches: for example, finding a record with a key in a certain alphabetic range is not easy to do by hashing.
Hashing should not be used if the ordering of the elements is important: for example, finding the largest, smallest or next value is not easy with hash tables.
Hashing is best for answering questions like: what record, if any, has the key value K? For databases that are used to answer only questions of this type, hashing is the preferred method, since it is very efficient (when done correctly!).
It is very easy to choose and implement a bad hashing strategy. But standard (good) strategies have been developed, and we will look at some of the most important issues. First, we should look at a simple example...

The simplest hashing: array indexing
In the simplest of cases, hashing is already provided in the form of array indexing. For example, when there are n records with unique key values in the range 0 to n - 1, we can use the hash function h(k) = k. Thus, a record with key i can be stored in T[i]. In fact, we don't even need to store the key value as part of the record, since it is the same as the array index being used! To find a record with key value i, simply look in T[i].
Unfortunately, this case is the exception rather than the rule: there are usually many more values in the key range than there are slots in the hash table. We need to examine more realistic examples.
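A minimal Java sketch of the direct-addressing case, assuming unique integer keys in the range 0 to n - 1 and records represented as plain Strings (the class name DirectAddressTable is hypothetical):

```java
// Direct addressing: h(k) = k, so the record with key i is stored in T[i].
// Assumptions for this sketch: keys are unique ints in 0..n-1 and records are Strings.
class DirectAddressTable {
    private final String[] table; // T: slot i holds the record whose key is i

    DirectAddressTable(int n) {
        table = new String[n];
    }

    // The key itself is not stored: it is the array index.
    void insert(int key, String record) {
        table[key] = record;
    }

    // Returns the record with this key, or null if no such record has been stored.
    String search(int key) {
        return table[key];
    }

    public static void main(String[] args) {
        DirectAddressTable t = new DirectAddressTable(10);
        t.insert(3, "record three");
        System.out.println(t.search(3)); // record three
        System.out.println(t.search(4)); // null
    }
}
```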

An introduction to collisions
Suppose we have a key range of 0 to 65,535 (a 16-bit unsigned integer), and we expect to store 1000 records, on average, at any one time. It is impractical to use a table with 65,536 slots, most of which would be empty. Instead, we design a hash function that will store records in a much smaller table. (To store them in a table of size 2000, we could just take the key modulo 2000.)
Since the possible key range is much larger than the size of the table, many key values must map to the same slot (for example, keys 2000 and 4000 both map to slot 0), so unless we are very lucky, some of the records we store will share a slot:
Given a hash function h and keys k1 and k2, if h(k1) = h(k2) = s, then we say that k1 and k2 have a collision at slot s under hash function h.
We need to decide upon a collision policy for resolving collisions. Then, to find a record with key k, we compute the table position h(k) and, starting at slot h(k), locate the record using knowledge of the collision policy.
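The collision between keys 2000 and 4000 can be checked directly. This is a small sketch, assuming the table size of 2000 from the example; the helper keysMappingTo is a hypothetical name that counts how many of the 65,536 possible keys share one slot:

```java
// h(k) = k mod 2000 for 16-bit keys 0..65,535, as in the example above.
class ModuloCollisionDemo {
    static final int M = 2000;          // table size
    static final int MAX_KEY = 65_535;  // largest 16-bit unsigned key

    static int h(int key) {
        return key % M;                 // always in 0..M-1 for non-negative keys
    }

    // How many keys in the whole key range share the given slot?
    static int keysMappingTo(int slot) {
        int count = 0;
        for (int k = 0; k <= MAX_KEY; k++) {
            if (h(k) == slot) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(h(2000) + " " + h(4000)); // 0 0: keys 2000 and 4000 collide at slot 0
        System.out.println(keysMappingTo(0));         // 33 keys: 0, 2000, ..., 64000
    }
}
```

Each of the 2000 slots is shared by 32 or 33 of the possible keys.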

Hash functions
We should choose a hash function that helps to minimise the number of collisions. Even in the best cases, collisions are practically unavoidable.
Perfect hashing, where there are never any collisions, can be coded when all the records to be stored are known in advance. This is extremely efficient, but design and implementation can be difficult. A typical use of perfect hashing is on CD-ROMs, where the database will never change but access time can be expensive.
An imperfect hash example: a head teacher wishes to keep a database of student information, where the school has 200 students. (S)he decides to use a hash function that uses the birthday of each student to map them into a table of 365 elements. This will almost certainly lead to collisions, because it takes only 23 students to give better than even odds that 2 of the students share the same birthday.
The first guideline when deciding whether a hash function is suitable is whether it keeps the hash table at least half full at any one time. In the example above, this property is met (200 of 365 slots). A second guideline concerns the number of collisions that are acceptable; this is usually something that designers must decide on a problem-by-problem basis.
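The "23 students" figure can be reproduced with a short calculation. This sketch assumes, as the usual birthday argument does, that all 365 birthdays are equally likely and ignores leap years; probSharedBirthday is a hypothetical helper name:

```java
// P(at least two of n people share a birthday) with `slots` equally likely birthdays.
class BirthdayCollision {
    static double probSharedBirthday(int n, int slots) {
        double allDistinct = 1.0;
        for (int i = 0; i < n; i++) {
            // the (i+1)-th student must avoid the i birthdays already taken
            allDistinct *= (double) (slots - i) / slots;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        System.out.printf("22 students: %.3f%n", probSharedBirthday(22, 365)); // just under 0.5
        System.out.printf("23 students: %.3f%n", probSharedBirthday(23, 365)); // just over 0.5
    }
}
```

The same function with n = 200 gives a probability indistinguishable from 1, which is why the head teacher's table will almost certainly contain collisions.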

Key data distribution
In general, we would like to pick a hash function that distributes the records with equal probability to all hash table slots. How well this works depends on how well the key data is distributed. For example, if the keys are generated as a set of random numbers selected uniformly from the key range, then any hash function that gives each slot an equal share of the key range will also distribute the records evenly throughout the table.
When input records are not well distributed throughout the key range, it can be difficult to devise a hash function that distributes them well through the table. The problem becomes even more complex if we do not know in advance the list of keys to be stored, or if it changes dynamically during execution!

Poorly distributed keys and distribution-dependent hashing
There are many reasons why data values may be poorly distributed:
Many natural distributions are asymptotic curves.
Collected data is likely to be skewed in some way.
For example, if the keys are a collection of English words, then the initial letters will not be evenly distributed, so a hash function that maps words to a table of 26 slots (one for each letter of the alphabet that an initial character can take) would probably not be a very good idea.
In the above example, we should use a distribution-dependent hash function, which uses knowledge of the distribution of the keys in order to avoid collisions. Distribution-independent hash functions are used when we have no knowledge of the distribution of the key values being used.

A simple example
The following hash function is used to hash integers to a table of 64 slots:

    public static int h(int x) {
        return x % 64; // for non-negative x, the result is in 0..63
    }

The value returned by this hash function depends only on the least significant 6 bits of the key. These bits are very likely to be poorly distributed, so the hash function is likely to produce a table that is unevenly filled (increasing the number of collisions to be resolved).
A better classic example is the mid-square method: square the numerical (integer) key and take the middle r bits, for a table of size 2^r.
Question: try programming the mid-square method in Java. Test it in comparison with the first hash function (h, above).
Question: why do you think the mid-square method is deemed to be better than h (for hash tables of the same size)?
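One possible answer to the first question, offered as a sketch rather than the intended solution. It assumes 16-bit keys (0 to 65,535), so the square fits in 32 bits and "the middle r bits" are taken to be the r bits starting (32 - r) / 2 positions from the right; the class and method names are hypothetical:

```java
// Mid-square hashing for a table of size 2^r, compared with the low-bits hash x % 64.
// Assumption: keys are 16-bit values 0..65,535, so key * key fits in 32 bits.
class MidSquare {
    static int lowBits(int x) {              // the slide's h: only the low 6 bits matter
        return x % 64;
    }

    static int midSquare(int key, int r) {
        long square = (long) key * key;      // at most 32 significant bits for 16-bit keys
        int shift = (32 - r) / 2;            // discard the low-order bits below the middle
        return (int) ((square >>> shift) & ((1L << r) - 1)); // keep the r middle bits
    }

    public static void main(String[] args) {
        int[] keys = {128, 192, 256, 320};   // all multiples of 64
        for (int k : keys) {
            // x % 64 gives 0 for every one of these keys; midSquare spreads them out
            System.out.println(k + ": x % 64 = " + lowBits(k) + ", mid-square = " + midSquare(k, 6));
        }
    }
}
```

For these four keys, which all land in slot 0 under x % 64, the mid-square method produces slots 2, 4, 8 and 12, because the middle bits of the square are influenced by most of the bits of the key rather than just the lowest six.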

Hashing for strings
Consider the following hash function:

    public static int h(String x, int M) {
        int i, sum;
        for (sum = 0, i = 0; i < x.length(); i++)
            sum += (int) x.charAt(i);
        return (sum % M);
    }

Question: what would be a good size for M if the average length of key strings is 10 characters?
Here we sum the ASCII values of the letters in the string. Provided M (the size of the hash table) is small, this should provide a good distribution, because it gives equal weight to all characters. This is an example of a folding method: the hash function folds up the sequence of characters using the plus operator. Note: changing the order of the characters does not change the slot calculated.
The last step is common to most hash functions: apply the modulus operator to make sure the value generated is within the table range.
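The repaired function can be exercised to confirm the note about character order. In this sketch the table size 101 is an arbitrary, hypothetical choice:

```java
// The folding hash from the slide: sum the character codes, then reduce modulo M.
class FoldingStringHash {
    static int h(String x, int M) {
        int sum = 0;
        for (int i = 0; i < x.length(); i++) {
            sum += x.charAt(i);         // char widens to its code (ASCII for plain English text)
        }
        return sum % M;                 // bring the value into the table range 0..M-1
    }

    public static void main(String[] args) {
        int M = 101;                                          // hypothetical table size
        System.out.println(h("stop", M) == h("pots", M));     // true: same characters, same sum
        // Ten lowercase letters sum to between 10 * 'a' = 970 and 10 * 'z' = 1220.
        System.out.println(h("zzzzzzzzzz", 100_000));          // 1220
    }
}
```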

Open hashing
There are two classes of collision resolution techniques:
Open hashing, also known as separate chaining.
Closed hashing, also known as open addressing.
With open hashing, collisions lead to records being stored outside the hash table. With closed hashing, collisions lead to records being stored in the table, at a position different from that calculated by the hash function.
Open hashing implementation
A simple design for open hashing defines each slot in the table to be the head of a linked list. All records hashed to that slot are then placed in this linked list. Now we need only decide how records are ordered in each list; the most popular orderings are by insertion time, by key value, or by frequency of access.
When to use open hashing: it is most appropriate when the hash table is to be kept in main memory; storing such a table on disk would be very inefficient.
Note: open hashing is based on the same idea as the bin sort (which we will see later).
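A minimal separate-chaining sketch in Java, assuming int keys, String values and h(k) = k mod M; it keeps each chain in insertion-time order, the first of the orderings listed above. ChainedHashTable is a hypothetical name, and java.util.LinkedList stands in for a hand-written linked list:

```java
import java.util.LinkedList;

// Open hashing (separate chaining): each slot is the head of a linked list holding every
// record hashed to it. Assumptions: int keys not already present, String values,
// h(k) = k mod M, and each chain kept in insertion-time order.
class ChainedHashTable {
    private static final class Entry {
        final int key;
        final String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] slots;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int m) {
        slots = new LinkedList[m];
        for (int i = 0; i < m; i++) slots[i] = new LinkedList<>();
    }

    private int h(int key) { return Math.floorMod(key, slots.length); }

    void insert(int key, String value) {
        slots[h(key)].addLast(new Entry(key, value)); // a collision just extends the chain
    }

    String search(int key) {
        for (Entry e : slots[h(key)]) {               // only one chain is examined
            if (e.key == key) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(10);
        t.insert(3, "three");
        t.insert(13, "thirteen");                     // also hashes to slot 3
        System.out.println(t.search(13));             // thirteen
    }
}
```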

Closed hashing
Closed hashing stores all records directly in the hash table. Each record has a home position defined by h(k), where k is the record's key. If a record R is to be inserted and another record already occupies R's home position, then R will be stored at some other slot. The collision resolution policy determines which slot. The same policy must also be followed when searching the database.
A simple example: if a collision occurs, just move on to the next position in the table. If the end of the table is reached, just loop around to the front.
Question: this policy, although simple and correct, is not often used in real programs; can you see why?
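The simple next-slot policy can be sketched as follows, assuming non-negative int keys, h(k) = k mod M and no deletions. The class name ClosedHashTable is hypothetical:

```java
// Closed hashing with the simple policy above: on a collision, move to the next slot and
// loop around to the front at the end. Assumptions: non-negative int keys, h(k) = k mod M,
// no deletions, and a null entry marks an empty slot.
class ClosedHashTable {
    private final Integer[] keys;

    ClosedHashTable(int m) { keys = new Integer[m]; }

    private int home(int key) { return key % keys.length; }

    // Returns the slot the key was stored in, or -1 if the table is full.
    int insert(int key) {
        for (int i = 0; i < keys.length; i++) {
            int slot = (home(key) + i) % keys.length; // next position, wrapping to the front
            if (keys[slot] == null) { keys[slot] = key; return slot; }
        }
        return -1;
    }

    // Follows the same policy as insert; reaching an empty slot means the key is absent.
    int search(int key) {
        for (int i = 0; i < keys.length; i++) {
            int slot = (home(key) + i) % keys.length;
            if (keys[slot] == null) return -1;
            if (keys[slot] == key) return slot;
        }
        return -1;
    }

    public static void main(String[] args) {
        ClosedHashTable t = new ClosedHashTable(10);
        System.out.println(t.insert(9));  // 9: its home slot
        System.out.println(t.insert(19)); // 0: slot 9 is taken, so it wraps to the front
    }
}
```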

Closed hashing continued...
Bucket hashing: group hash table slots into buckets.
Hash table: an array of M slots divided into B buckets (M/B slots per bucket).
The hash function maps each key to a bucket, and the record is placed in the first free slot of that bucket.
Overflow bucket: when a bucket is full, the record is stored here. All buckets share the same overflow bucket. Goal: minimise use of the overflow!
Searching: hash the key to determine the bucket.
If the key is not found in the bucket and the bucket is not full, then the search is complete (the key is absent).
If the key is not found and the bucket is full, then check the overflow bucket.
Note: searching the overflow can be expensive if it contains many records.
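A sketch of bucket hashing with a single shared overflow bucket. It assumes non-negative int keys, that M is a multiple of B, that the bucket is chosen as key mod B, and that records are never deleted (so an empty slot inside a bucket means the bucket is not full). All names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Bucket hashing: M slots split into B buckets of M/B slots, plus one shared overflow list.
// Assumptions: non-negative int keys, bucket = key mod B, M divisible by B, no deletions.
class BucketHashTable {
    private final Integer[] slots;
    private final int numBuckets, bucketSize;
    private final List<Integer> overflow = new ArrayList<>();

    BucketHashTable(int m, int b) {
        slots = new Integer[m];
        numBuckets = b;
        bucketSize = m / b;
    }

    private int bucketStart(int key) { return (key % numBuckets) * bucketSize; }

    void insert(int key) {
        int start = bucketStart(key);
        for (int j = start; j < start + bucketSize; j++) {
            if (slots[j] == null) { slots[j] = key; return; } // first free slot in the bucket
        }
        overflow.add(key);                                  // bucket full: use the overflow
    }

    boolean search(int key) {
        int start = bucketStart(key);
        for (int j = start; j < start + bucketSize; j++) {
            if (slots[j] == null) return false;  // bucket not full and key not seen: done
            if (slots[j] == key) return true;
        }
        return overflow.contains(key);          // bucket full: check the overflow
    }

    int overflowSize() { return overflow.size(); }

    public static void main(String[] args) {
        BucketHashTable t = new BucketHashTable(8, 4); // 4 buckets of 2 slots
        t.insert(0); t.insert(4); t.insert(8);          // all map to bucket 0; 8 overflows
        System.out.println(t.overflowSize());           // 1
        System.out.println(t.search(8));                // true (found in the overflow)
    }
}
```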

A simple variation
Hash the key value to its home position as if no bucketing were used.
If the home position is taken, then push the record down until the end of the bucket is reached.
If the bottom of the bucket is reached, then move the record to the top of the bucket.
Example: assume 8-record buckets. If a key is hashed to 5 and slot 5 is full, then we try to find an empty slot in the following order: 6, 7, 0, 1, 2, 3, 4. If all slots are taken, then the record is assigned to the overflow bucket.
Advantage: collisions are reduced.
This is used for storing on disk, where bucket size = block size. Goal: maximise the likelihood that a record is stored in the same disk block as its home position.
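The order in which slots are tried within a bucket can be generated as follows. This sketch uses absolute slot numbers, so the slide's example corresponds to the first 8-slot bucket (slots 0 to 7); probeOrder is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

// Probe order within a bucket for the variation above: start after the home slot, move down
// to the end of the bucket, then wrap to the top of the same bucket (never leaving it).
class InBucketProbe {
    static List<Integer> probeOrder(int home, int bucketSize) {
        int start = (home / bucketSize) * bucketSize;     // first slot of home's bucket
        int offset = home - start;                        // home's position inside the bucket
        List<Integer> order = new ArrayList<>();
        for (int j = 1; j < bucketSize; j++) {
            order.add(start + (offset + j) % bucketSize); // wrap within the bucket only
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(probeOrder(5, 8));  // [6, 7, 0, 1, 2, 3, 4]
        System.out.println(probeOrder(13, 8)); // [14, 15, 8, 9, 10, 11, 12]
    }
}
```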

Simple linear probing
Classic form: closed hashing with no bucketing, and a collision resolution policy that can potentially use any slot in the table.
Collision resolution: generate a sequence of hash table slots that can potentially hold the record: the probe sequence.
Assume p(k, i) is the probe function, returning an offset from the home position for the i-th slot in the probe sequence, so the i-th slot tried is (h(k) + p(k, i)) mod M.
Linear probing: just keep moving down (wrapping around to the top when the bottom is reached) until an empty slot is found. The probe function for linear probing is very simple: p(k, i) = i.
Advantage: simple, and the probe sequence reaches every slot, so an insertion succeeds as long as any slot is free.
Disadvantage: it leads to clustering and uneven distribution, which is inefficient!
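The following sketch applies the probe function p(k, i) = i and counts how many slots are examined on each insertion, which makes the clustering problem visible. It assumes non-negative int keys and h(k) = k mod M; the names are hypothetical:

```java
// Linear probing: the i-th slot tried for key k is (h(k) + p(k, i)) mod M with p(k, i) = i.
// Assumptions: non-negative int keys, h(k) = k mod M. probesToInsert returns how many slots
// were examined before an empty one was found.
class LinearProbing {
    static int p(int k, int i) { return i; }            // the slide's probe function

    static int probesToInsert(Integer[] table, int key) {
        int m = table.length;
        int home = key % m;
        for (int i = 0; i < m; i++) {
            int slot = (home + p(key, i)) % m;
            if (table[slot] == null) { table[slot] = key; return i + 1; }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        Integer[] t = new Integer[10];
        for (int k = 0; k < 4; k++) probesToInsert(t, k); // slots 0..3 now form a cluster
        // A key whose home lies anywhere in the cluster must walk to its end:
        System.out.println(probesToInsert(t, 10)); // home 0 -> slots 0,1,2,3,4: 5 probes
        System.out.println(probesToInsert(t, 11)); // home 1 -> slots 1,2,3,4,5: 5 probes
    }
}
```

Once slots 0 to 3 form a cluster, any key whose home is in that run has to walk to its end, and every such insertion makes the cluster longer.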

Improved collision resolution methods
How to avoid clustering: use linear probing, but skip slots by a constant c other than 1. Probe function: p(K, i) = ic, so the i-th slot tried is (h(K) + ic) mod M. Records with adjacent home positions will then not follow the same probe sequence.
Completeness: the probe sequence should cycle through all slots in the hash table before returning to the home position.
Not all probe functions are complete: if c = 2 and the table contains an even number of slots, then any key whose home position is an even slot will have a probe sequence that cycles only through even slots. Similarly, an odd home position will cycle only through odd slots.
Completeness requirement: c must be relatively prime to M. For example, if M = 10, then we can choose c from 1, 3, 7 and 9.
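The completeness condition can be checked mechanically. This sketch lists the slots reached by the sequence (h(K) + ic) mod M from a given home position and compares the outcome with gcd(c, M); the names are hypothetical:

```java
import java.util.TreeSet;

// Which slots does the probe sequence (home + i*c) mod M reach? It reaches all M slots
// exactly when gcd(c, M) == 1, i.e. when c is relatively prime to M.
class ProbeCompleteness {
    static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

    static TreeSet<Integer> reachable(int home, int c, int m) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int i = 0; i < m; i++) seen.add((home + i * c) % m); // M steps cover a full cycle
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(4, 2, 10));        // [0, 2, 4, 6, 8]: even slots only
        System.out.println(reachable(4, 3, 10).size()); // 10: complete
        for (int c = 1; c < 10; c++) {
            if (gcd(c, 10) == 1) System.out.print(c + " "); // 1 3 7 9
        }
        System.out.println();
    }
}
```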

Pseudo-random probing
Ideal probe function: select the next position in the probe sequence at random! But how would we search if we did this? We need to be able to reproduce the same probe sequence when searching for the key.
Pseudo-random probing: use a common sequence of random numbers for adding and searching.
Advantage: eliminates primary clustering.
Disadvantages: complex and less efficient (and the choice of sequence is crucial).
Note: there are many other techniques for eliminating primary clustering.
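A sketch of pseudo-random probing in which the common sequence is a fixed random permutation of the offsets 1 to M - 1, built from a fixed seed with java.util.Collections.shuffle. Using a permutation (so every slot is reached) and the particular seed are assumptions of this sketch, as are the class and method names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Pseudo-random probing. One random permutation of the offsets 1..M-1 is built once from a
// fixed seed and shared by insert and search, so the i-th probe (i >= 1) is
// (home + perm[i - 1]) mod M. Assumes non-negative int keys and h(k) = k mod M.
class PseudoRandomProbing {
    private final Integer[] table;
    private final int[] perm;

    PseudoRandomProbing(int m, long seed) {
        table = new Integer[m];
        List<Integer> offsets = new ArrayList<>();
        for (int d = 1; d < m; d++) offsets.add(d);
        Collections.shuffle(offsets, new Random(seed)); // same seed -> same sequence
        perm = offsets.stream().mapToInt(Integer::intValue).toArray();
    }

    // Slot for the i-th probe; i = 0 is the home position. Every slot is reached once.
    private int slot(int key, int i) {
        int home = key % table.length;
        return i == 0 ? home : (home + perm[i - 1]) % table.length;
    }

    boolean insert(int key) {
        for (int i = 0; i < table.length; i++) {
            int s = slot(key, i);
            if (table[s] == null) { table[s] = key; return true; }
        }
        return false; // table full
    }

    boolean search(int key) {
        for (int i = 0; i < table.length; i++) { // replays the insert sequence exactly
            int s = slot(key, i);
            if (table[s] == null) return false;
            if (table[s] == key) return true;
        }
        return false;
    }
}
```

Keys that share a home position still share a probe sequence, but keys with different home positions no longer march through the same run of adjacent slots.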

Analysis of closed hashing
Primary operations: insert, delete and search.
The key property is how full the table is (on average): the load factor, i.e. the number of records stored divided by the number of slots M.
Typically, the cost of hashing is close to 1 record access, which is super efficient (a binary search takes about log n accesses, on average!).
Mathematical analysis shows that the best policy is to aim for the hash table being, on average, half full. But this requires the implementor to have some idea of record usage.
Question: what are the extra considerations for deletion?
Answer: don't hinder later searches, and don't make slots unusable.
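One standard way to meet both deletion requirements, not named on the slide, is to mark deleted slots with a tombstone instead of emptying them. A minimal sketch on a linear-probing table, assuming non-negative int keys, h(k) = k mod M and no duplicate inserts; the names are hypothetical:

```java
import java.util.Arrays;

// Deletion with tombstones in a linear-probing table (a standard technique, not one the slide
// names). Assumptions: non-negative int keys, h(k) = k mod M, no key inserted twice.
class TombstoneTable {
    private static final int EMPTY = -1;   // never used: ends a search
    private static final int DELETED = -2; // tombstone: searches skip it, inserts may reuse it
    private final int[] slots;
    private int count = 0;

    TombstoneTable(int m) {
        slots = new int[m];
        Arrays.fill(slots, EMPTY);
    }

    double loadFactor() { return (double) count / slots.length; } // records stored / slots

    private int probe(int key, int i) { return (key % slots.length + i) % slots.length; }

    boolean insert(int key) {
        for (int i = 0; i < slots.length; i++) {
            int s = probe(key, i);
            if (slots[s] == EMPTY || slots[s] == DELETED) { slots[s] = key; count++; return true; }
        }
        return false; // table full
    }

    int find(int key) { // slot holding key, or -1
        for (int i = 0; i < slots.length; i++) {
            int s = probe(key, i);
            if (slots[s] == EMPTY) return -1; // only a never-used slot ends the search
            if (slots[s] == key) return s;     // tombstones are simply probed past
        }
        return -1;
    }

    boolean delete(int key) {
        int s = find(key);
        if (s < 0) return false;
        slots[s] = DELETED; // mark the slot rather than emptying it
        count--;
        return true;
    }
}
```

After deleting 13 from a table of 10 slots holding 3, 13 and 23 in slots 3, 4 and 5, a search for 23 still probes past slot 4, and a later insertion of 33 reuses slot 4.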

Other hashing concerns: file processing and external sorting
Differences between primary memory and secondary storage, as they affect algorithm and data structure designers: speed of access, quantity of data, persistence of data.
Primary problem: access to disk and tape drives is much slower than access to primary memory.
Persistent storage: disk and tape files do not lose data when the power is switched off. Volatile storage: all information is lost when the power goes.
Access ratio: primary storage is roughly 1 million times faster to access (in general).
Goal: minimise the number of disk accesses. Use a good file structure, and read more information than is needed and store it in a cache!

Other hashing concerns: disk access costs and caching
Primary cost: the seek time (moving the read/write head to the track holding the required sector). But seeks are not always necessary (for example, when reading sequentially).
Note: there are other delay factors: rotational delay, startup time, second seeks.
Disk fragmentation problems: use disk defragmenters as often as possible.
Sector buffering: keep at least 2 buffers, for input and output, so that reads and writes to the same disk sector do not interfere. (Standard on most modern computers.)
More advanced techniques for parallelism: e.g. double buffering.
Caching techniques: how to keep the most-used records in the local cache (the fastest available memory).
Caching layers: caching is such a good idea that there are often multiple layers.
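As an illustration of keeping the most-used records in a small, fast cache, here is a least-recently-used (LRU) cache built on java.util.LinkedHashMap, whose access-order constructor and removeEldestEntry hook support this directly. LRU is only one possible policy and is an assumption of this sketch; the slide does not prescribe one, and the class name LruCache is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A fixed-capacity cache that evicts the least recently used record when it overflows.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the least recently accessed record when over capacity
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "record 1");
        cache.put(2, "record 2");
        cache.get(1);                                // record 1 is now the most recently used
        cache.put(3, "record 3");                    // evicts record 2
        System.out.println(cache.keySet());          // [1, 3]
    }
}
```

A lookup with get() marks a record as recently used, so with a capacity of 2, putting keys 1 and 2, reading key 1 and then putting key 3 evicts key 2.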


More information

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved Introducing Hashing Chapter 21 Contents What Is Hashing? Hash Functions Computing Hash Codes Compressing a Hash Code into an Index for the Hash Table A demo of hashing (after) ARRAY insert hash index =

More information

Hashing 1. Searching Lists

Hashing 1. Searching Lists Hashing 1 Searching Lists There are many instances when one is interested in storing and searching a list: A phone company wants to provide caller ID: Given a phone number a name is returned. Somebody

More information

Understand how to deal with collisions

Understand how to deal with collisions Understand the basic structure of a hash table and its associated hash function Understand what makes a good (and a bad) hash function Understand how to deal with collisions Open addressing Separate chaining

More information

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs Algorithms in Systems Engineering ISE 172 Lecture 12 Dr. Ted Ralphs ISE 172 Lecture 12 1 References for Today s Lecture Required reading Chapter 5 References CLRS Chapter 11 D.E. Knuth, The Art of Computer

More information

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function. Hash Tables Chapter 20 CS 3358 Summer II 2013 Jill Seaman Sections 201, 202, 203, 204 (not 2042), 205 1 What are hash tables?! A Hash Table is used to implement a set, providing basic operations in constant

More information

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Chapter 13 Disk Storage, Basic File Structures, and Hashing. Chapter 13 Disk Storage, Basic File Structures, and Hashing. Copyright 2004 Pearson Education, Inc. Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files

More information

Chapter 20 Hash Tables

Chapter 20 Hash Tables Chapter 20 Hash Tables Dictionary All elements have a unique key. Operations: o Insert element with a specified key. o Search for element by key. o Delete element by key. Random vs. sequential access.

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

CS 270 Algorithms. Oliver Kullmann. Generalising arrays. Direct addressing. Hashing in general. Hashing through chaining. Reading from CLRS for week 7

CS 270 Algorithms. Oliver Kullmann. Generalising arrays. Direct addressing. Hashing in general. Hashing through chaining. Reading from CLRS for week 7 Week 9 General remarks tables 1 2 3 We continue data structures by discussing hash tables. Reading from CLRS for week 7 1 Chapter 11, Sections 11.1, 11.2, 11.3. 4 5 6 Recall: Dictionaries Applications

More information

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms Hashing CmSc 250 Introduction to Algorithms 1. Introduction Hashing is a method of storing elements in a table in a way that reduces the time for search. Elements are assumed to be records with several

More information

Logical File Organisation A file is logically organised as follows:

Logical File Organisation A file is logically organised as follows: File Handling The logical and physical organisation of files. Serial and sequential file handling methods. Direct and index sequential files. Creating, reading, writing and deleting records from a variety

More information

CS 2412 Data Structures. Chapter 10 Sorting and Searching

CS 2412 Data Structures. Chapter 10 Sorting and Searching CS 2412 Data Structures Chapter 10 Sorting and Searching Some concepts Sorting is one of the most common data-processing applications. Sorting algorithms are classed as either internal or external. Sorting

More information

Kathleen Durant PhD Northeastern University CS Indexes

Kathleen Durant PhD Northeastern University CS Indexes Kathleen Durant PhD Northeastern University CS 3200 Indexes Outline for the day Index definition Types of indexes B+ trees ISAM Hash index Choosing indexed fields Indexes in InnoDB 2 Indexes A typical

More information

Questions. 6. Suppose we were to define a hash code on strings s by:

Questions. 6. Suppose we were to define a hash code on strings s by: Questions 1. Suppose you are given a list of n elements. A brute force method to find duplicates could use two (nested) loops. The outer loop iterates over position i the list, and the inner loop iterates

More information

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management Hashing Symbol Table Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management In general, the following operations are performed on

More information

Cpt S 223. School of EECS, WSU

Cpt S 223. School of EECS, WSU Hashing & Hash Tables 1 Overview Hash Table Data Structure : Purpose To support insertion, deletion and search in average-case constant t time Assumption: Order of elements irrelevant ==> data structure

More information

Hash Tables and Hash Functions

Hash Tables and Hash Functions Hash Tables and Hash Functions We have seen that with a balanced binary tree we can guarantee worst-case time for insert, search and delete operations. Our challenge now is to try to improve on that...

More information

Week 9. Hash tables. 1 Generalising arrays. 2 Direct addressing. 3 Hashing in general. 4 Hashing through chaining. 5 Hash functions.

Week 9. Hash tables. 1 Generalising arrays. 2 Direct addressing. 3 Hashing in general. 4 Hashing through chaining. 5 Hash functions. Week 9 tables 1 2 3 ing in ing in ing 4 ing 5 6 General remarks We continue data structures by discussing hash tables. For this year, we only consider the first four sections (not sections and ). Only

More information

Hashing. Introduction to Data Structures Kyuseok Shim SoEECS, SNU.

Hashing. Introduction to Data Structures Kyuseok Shim SoEECS, SNU. Hashing Introduction to Data Structures Kyuseok Shim SoEECS, SNU. 1 8.1 INTRODUCTION Binary search tree (Chapter 5) GET, INSERT, DELETE O(n) Balanced binary search tree (Chapter 10) GET, INSERT, DELETE

More information

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2 Hashing Hashing A hash function h maps keys of a given type to integers in a fixed interval [0,N-1]. The goal of a hash function is to uniformly disperse keys in the range [0,N-1] 5/1/2006 Algorithm analysis

More information

Lecture 8 Index (B+-Tree and Hash)

Lecture 8 Index (B+-Tree and Hash) CompSci 516 Data Intensive Computing Systems Lecture 8 Index (B+-Tree and Hash) Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 HW1 due tomorrow: Announcements Due on 09/21 (Thurs),

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23 FILE SYSTEMS CS124 Operating Systems Winter 2015-2016, Lecture 23 2 Persistent Storage All programs require some form of persistent storage that lasts beyond the lifetime of an individual process Most

More information

On my honor I affirm that I have neither given nor received inappropriate aid in the completion of this exercise.

On my honor I affirm that I have neither given nor received inappropriate aid in the completion of this exercise. CS 2413 Data Structures EXAM 2 Fall 2015, Page 1 of 10 Student Name: Student ID # OU Academic Integrity Pledge On my honor I affirm that I have neither given nor received inappropriate aid in the completion

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

CS 310 Advanced Data Structures and Algorithms

CS 310 Advanced Data Structures and Algorithms CS 310 Advanced Data Structures and Algorithms Hashing June 6, 2017 Tong Wang UMass Boston CS 310 June 6, 2017 1 / 28 Hashing Hashing is probably one of the greatest programming ideas ever. It solves one

More information

1. What is the difference between primary storage and secondary storage?

1. What is the difference between primary storage and secondary storage? 1. What is the difference between primary storage and secondary storage? Primary Storage is - Limited - Volatile - Expensive - Fast (May be accessed directly from the CPU) - Retrieving a single character

More information

Selection Queries. to answer a selection query (ssn=10) needs to traverse a full path.

Selection Queries. to answer a selection query (ssn=10) needs to traverse a full path. Hashing B+-tree is perfect, but... Selection Queries to answer a selection query (ssn=) needs to traverse a full path. In practice, 3-4 block accesses (depending on the height of the tree, buffering) Any

More information

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited Unit 9, Part 4 Hash Tables Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited We've considered several data structures that allow us to store and search for data

More information

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015 HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015 1 administrivia 2 -assignment 9 is due on Monday -assignment 10 will go out on Thursday -midterm on Thursday 3 last time 4

More information

1. Attempt any three of the following: 15

1. Attempt any three of the following: 15 (Time: 2½ hours) Total Marks: 75 N. B.: (1) All questions are compulsory. (2) Make suitable assumptions wherever necessary and state the assumptions made. (3) Answers to the same question must be written

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6) CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary

More information

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech Hashing Dr. Ronaldo Menezes Hugo Serrano Agenda Motivation Prehash Hashing Hash Functions Collisions Separate Chaining Open Addressing Motivation Hash Table Its one of the most important data structures

More information

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques Hashing Manolis Koubarakis 1 The Symbol Table ADT A symbol table T is an abstract storage that contains table entries that are either empty or are pairs of the form (K, I) where K is a key and I is some

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L08: B + -trees and Dynamic Hashing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,

More information

4. SEARCHING AND SORTING LINEAR SEARCH

4. SEARCHING AND SORTING LINEAR SEARCH 4. SEARCHING AND SORTING SEARCHING Searching and sorting are fundamental operations in computer science. Searching refers to the operation of finding the location of a given item in a collection of items.

More information

CS122 Lecture 3 Winter Term,

CS122 Lecture 3 Winter Term, CS122 Lecture 3 Winter Term, 2017-2018 2 Record-Level File Organization Last time, finished discussing block-level organization Can also organize data files at the record-level Heap file organization A

More information

Classifying Physical Storage Media. Chapter 11: Storage and File Structure. Storage Hierarchy (Cont.) Storage Hierarchy. Magnetic Hard Disk Mechanism

Classifying Physical Storage Media. Chapter 11: Storage and File Structure. Storage Hierarchy (Cont.) Storage Hierarchy. Magnetic Hard Disk Mechanism Chapter 11: Storage and File Structure Overview of Storage Media Magnetic Disks Characteristics RAID Database Buffers Structure of Records Organizing Records within Files Data-Dictionary Storage Classifying

More information

Classifying Physical Storage Media. Chapter 11: Storage and File Structure. Storage Hierarchy. Storage Hierarchy (Cont.) Speed

Classifying Physical Storage Media. Chapter 11: Storage and File Structure. Storage Hierarchy. Storage Hierarchy (Cont.) Speed Chapter 11: Storage and File Structure Overview of Storage Media Magnetic Disks Characteristics RAID Database Buffers Structure of Records Organizing Records within Files Data-Dictionary Storage Classifying

More information

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1] Exercise # 8- Hash Tables Hash Tables Hash Function Uniform Hash Hash Table Direct Addressing A hash function h maps keys of a given type into integers in a fixed interval [0,m-1] 1 Pr h( key) i, where

More information

Data Structures. Topic #6

Data Structures. Topic #6 Data Structures Topic #6 Today s Agenda Table Abstract Data Types Work by value rather than position May be implemented using a variety of data structures such as arrays (statically, dynamically allocated)

More information

Hashing file organization

Hashing file organization Hashing file organization These slides are a modified version of the slides of the book Database System Concepts (Chapter 12), 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan. Original slides

More information

The physical database. Contents - physical database design DATABASE DESIGN I - 1DL300. Introduction to Physical Database Design

The physical database. Contents - physical database design DATABASE DESIGN I - 1DL300. Introduction to Physical Database Design DATABASE DESIGN I - 1DL300 Fall 2011 Introduction to Physical Database Design Elmasri/Navathe ch 16 and 17 Padron-McCarthy/Risch ch 21 and 22 An introductory course on database systems http://www.it.uu.se/edu/course/homepage/dbastekn/ht11

More information

Hashing 1. Searching Lists

Hashing 1. Searching Lists Hashing 1 Searching Lists There are many instanced when one is interested in storing and searching a list: phone company wants to provide caller ID: Given a phone number a name is returned. Somebody who

More information

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017 Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017 Acknowledgement The set of slides have used materials from the following resources Slides

More information

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Spring 2017 Acknowledgement The set of slides have used materials from the following resources Slides for

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018 HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018 Acknowledgement The set of slides have used materials from the following resources Slides for textbook by Dr. Y.

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

CSCD 326 Data Structures I Hashing

CSCD 326 Data Structures I Hashing 1 CSCD 326 Data Structures I Hashing Hashing Background Goal: provide a constant time complexity method of searching for stored data The best traditional searching time complexity available is O(log2n)

More information