Hashing. inserting and locating strings. MCS 360 Lecture 28 Introduction to Data Structures Jan Verschelde, 27 October 2010.

Similar documents
Hashing. mapping data to tables hashing integers and strings. aclasshash_table inserting and locating strings. inserting and locating strings

Topic HashTable and Table ADT

Templates and Vectors

Design Patterns for Data Structures. Chapter 9.5. Hash Tables

Queue Implementations

Lists. linking nodes. constructors. chasing pointers. MCS 360 Lecture 11 Introduction to Data Structures Jan Verschelde, 17 September 2010.

Hash Tables. Gunnar Gotshalks. Maps 1

Iterators. node UML diagram implementing a double linked list the need for a deep copy. nested classes for iterator function objects

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

Graphs. directed and undirected graphs weighted graphs adjacency matrices. abstract data type adjacency list adjacency matrix

Data Structures And Algorithms

Hash[ string key ] ==> integer value

Hashing Techniques. Material based on slides by George Bebis

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

HASH TABLES. Goal is to store elements k,v at index i = h k

CS/COE 1501

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

Cpt S 223. School of EECS, WSU

B-Trees. nodes with many children a type node a class for B-trees. an elaborate example the insertion algorithm removing elements

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

CSCI-1200 Data Structures Fall 2009 Lecture 20 Hash Tables, Part II

Chapter 20 Hash Tables

Module 3: Hashing Lecture 9: Static and Dynamic Hashing. The Lecture Contains: Static hashing. Hashing. Dynamic hashing. Extendible hashing.

Chapter 27 Hashing. Liang, Introduction to Java Programming, Eleventh Edition, (c) 2017 Pearson Education, Inc. All rights reserved.

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

HASH TABLES.

the Stack stack ADT using the STL stack are parentheses balanced? algorithm uses a stack adapting the STL vector class adapting the STL list class

CSCI-1200 Data Structures Fall 2018 Lecture 21 Hash Tables, part 1

Understand how to deal with collisions

selectors, methodsinsert() andto_string() the depth of a tree and a membership function

Elements of C in C++ data types if else statement for loops. random numbers rolling a die

CSc Introduc/on to Compu/ng. Lecture 8 Edgardo Molina Fall 2011 City College of New York

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

Inheritance and Polymorphism

Unified Modeling Language a case study

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

EP241 Computing Programming

Lecture 10 March 4. Goals: hashing. hash functions. closed hashing. application of hashing

TABLES AND HASHING. Chapter 13

ECE 242 Data Structures and Algorithms. Hash Tables II. Lecture 25. Prof.

Implementation of Linear Probing (continued)

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Review of the Lectures 21-26, 30-32

Open Addressing: Linear Probing (cont.)

Successful vs. Unsuccessful

Hash Table and Hashing

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Comp 335 File Structures. Hashing

the Queue queue ADT using the STL queue designing the simulation simulation with STL queue using STL list as queue using STL vector as queue

Lecture 18. Collision Resolution

CSCI-1200 Data Structures Fall 2015 Lecture 24 Hash Tables

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Lecture 12. Monday, February 7 CS 215 Fundamentals of Programming II - Lecture 12 1

CS 310 Advanced Data Structures and Algorithms

Local and Global Variables

Object-oriented Programming for Automation & Robotics Carsten Gutwenger LS 11 Algorithm Engineering

Lecture 16 More on Hashing Collision Resolution

Priority Queue. 03/09/04 Lecture 17 1

Expression Trees and the Heap

stacks operation array/vector linked list push amortized O(1) Θ(1) pop Θ(1) Θ(1) top Θ(1) Θ(1) isempty Θ(1) Θ(1)

You must include this cover sheet. Either type up the assignment using theory5.tex, or print out this PDF.

MIDTERM EXAM THURSDAY MARCH

CS 350 : Data Structures Hash Tables

Midterm 3 practice problems

Übungen zur Vorlesung Informatik II (D-BAUG) FS 2017 D. Sidler, F. Friedrich

Quadratic Sorting Algorithms

HASH TABLES. Hash Tables Page 1

(6) The specification of a name with its type in a program. (7) Some memory that holds a value of a given type.

AAL 217: DATA STRUCTURES

CSCI-1200 Data Structures Spring 2014 Lecture 19 Hash Tables

Hash table basics mod 83 ate. ate. hashcode()

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

CSC 427: Data Structures and Algorithm Analysis. Fall 2004

Outline. hash tables hash functions open addressing chained hashing

Chapter 27 Hashing. Objectives

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

DATA STRUCTURES AND ALGORITHMS

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

The dictionary problem

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions. Kevin Quinn Fall 2015

Computer Programming : C++

CS302 - Data Structures using C++

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

Basic program The following is a basic program in C++; Basic C++ Source Code Compiler Object Code Linker (with libraries) Executable

CSCI-1200 Computer Science II Fall 2008 Lecture 15 Associative Containers (Maps), Part 2

SETS AND MAPS. Chapter 9

EP578 Computing for Physicists

Introduction to Hashing

You must print this PDF and write your answers neatly by hand. You should hand in the assignment before recitation begins.

Simulations. buffer of fixed capacity using the STL deque. modeling arrival times generating and processing jobs

CSE 214 Computer Science II Searching

C++ Basics. Data Processing Course, I. Hrivnacova, IPN Orsay

Transcription:

ing 1 2 3 MCS 360 Lecture 28 Introduction to Data Structures Jan Verschelde, 27 October 2010

ing 1 2 3

We covered STL sets and maps for a frequency table. The STL implementation uses a balanced search tree. An alternative to using trees is hashing. As running application, consider frequency. We map counted objects first to a positive integer: the key, and then we take key % n, for of size n. Using a hash function h to store objects in table of size n: object h( ) key % n index The index is used in an array of n elements.

We covered STL sets and maps for a frequency table. The STL implementation uses a balanced search tree. An alternative to using trees is hashing. As running application, consider frequency. We map counted objects first to a positive integer: the key, and then we take key % n, for of size n. Using a hash function h to store objects in table of size n: object h( ) key % n index The index is used in an array of n elements.

The mapping of an object (e.g.: integer or string) to a key is done by a hash function h. Desirable properties: 1 h is fast and easy to evaluate; 2 few collisions: x y h(x) h(y). Some similarity with uniform random number generator, although distribution of data is often not uniform, we want distribution of the keys to be uniform.

The mapping of an object (e.g.: integer or string) to a key is done by a hash function h. Desirable properties: 1 h is fast and easy to evaluate; 2 few collisions: x y h(x) h(y). Some similarity with uniform random number generator, although distribution of data is often not uniform, we want distribution of the keys to be uniform.

unsigned int The type used to index arrays is size_t, which is 8 bytes long on 64-bit configurations, or otherwise 4 bytes long. Including <cmath> allows int n = 8*sizeof(unsigned int); unsigned int m = pow(2.0,n-1); cout << " m = " << m << endl; m = 2*m; cout << "2*m = " << m << endl; m = m - 1; cout << "2*m - 1 = " << m << endl; shows the modular arithmetic (modulo 2 32 ) m = 2147483648 2*m = 0 2*m - 1 = 4294967295

unsigned int The type used to index arrays is size_t, which is 8 bytes long on 64-bit configurations, or otherwise 4 bytes long. Including <cmath> allows int n = 8*sizeof(unsigned int); unsigned int m = pow(2.0,n-1); cout << " m = " << m << endl; m = 2*m; cout << "2*m = " << m << endl; m = m - 1; cout << "2*m - 1 = " << m << endl; shows the modular arithmetic (modulo 2 32 ) m = 2147483648 2*m = 0 2*m - 1 = 4294967295

unsigned int The type used to index arrays is size_t, which is 8 bytes long on 64-bit configurations, or otherwise 4 bytes long. Including <cmath> allows int n = 8*sizeof(unsigned int); unsigned int m = pow(2.0,n-1); cout << " m = " << m << endl; m = 2*m; cout << "2*m = " << m << endl; m = m - 1; cout << "2*m - 1 = " << m << endl; shows the modular arithmetic (modulo 2 32 ) m = 2147483648 2*m = 0 2*m - 1 = 4294967295

ing 1 2 3

ing Integers Take a prime p, h(i) =(p i) mod 2 32. Start with i = 1, find n so p n mod 2 32 = 1? unsigned int x = 1; unsigned int i; while(true) x = prime*x; i++; if(x == 1) cout << "cycle detected at i = " << i << endl; break; cycle detected at i = 536870912

ing Integers Take a prime p, h(i) =(p i) mod 2 32. Start with i = 1, find n so p n mod 2 32 = 1? unsigned int x = 1; unsigned int i; while(true) x = prime*x; i++; if(x == 1) cout << "cycle detected at i = " << i << endl; break; cycle detected at i = 536870912

ing Integers Take a prime p, h(i) =(p i) mod 2 32. Start with i = 1, find n so p n mod 2 32 = 1? unsigned int x = 1; unsigned int i; while(true) x = prime*x; i++; if(x == 1) cout << "cycle detected at i = " << i << endl; break; cycle detected at i = 536870912

For a string s = s 0 s 1 s 2 s n 1, compute hashing k = s 0 31 n 1 + s 1 31 n 2 + s 2 31 n 3 + + s n 1 = ( ((s 0 31 + s 1 )31 + s 2 )31 + )31 + s n 1. Characters are 8 bits long; 31 is prime. size_t hash ( string s ) size_t r = 0; for(size_t i=0; i<s.length(); i++) r = r*31 + s[i]; return r;

For a string s = s 0 s 1 s 2 s n 1, compute hashing k = s 0 31 n 1 + s 1 31 n 2 + s 2 31 n 3 + + s n 1 = ( ((s 0 31 + s 1 )31 + s 2 )31 + )31 + s n 1. Characters are 8 bits long; 31 is prime. size_t hash ( string s ) size_t r = 0; for(size_t i=0; i<s.length(); i++) r = r*31 + s[i]; return r;

open addressing and chaining The two ways to organize hash differ in the ways of resolving collisions. For some index, we consider 1 open addressing: if index occupied take next one; Locating an element is called probing. 2 chaining: keep list of elements at position index. The list is called a bucket, we speak of bucket hashing.

ing 1 2 3

We define in the namespace mcs360_open_hash_table. The hash table stores a frequency table: type of the key is a string, value type is an integer. typedef std::pair<std::string,int> entry; We create hash table with number of elements n, the capacity of the table.

mcs360_open_hash_table.h #ifndef MCS360_OPEN_HASH_TABLE_H #define MCS360_OPEN_HASH_TABLE_H #include <utility> #include <string> #include <vector> namespace mcs360_open_hash_table class _Table public: typedef std::pair<std::string,int> entry; _Table( size_t n ); bool insert( const entry& e ); // returns true if e is inserted well int value( std::string s ); // returns -1 if s is not found

private members private: size_t hash ( std::string s ); // returns an index in the table // applying a hash function to the string s std::vector<entry*> table; // data member Our hash table will have a fixed capacity, determined at instantiation. Enlarging the hash table requires rehashing.

private members private: size_t hash ( std::string s ); // returns an index in the table // applying a hash function to the string s std::vector<entry*> table; // data member Our hash table will have a fixed capacity, determined at instantiation. Enlarging the hash table requires rehashing.

_Table T(n); test code snippets while(true) // inserting string w; cin >> w; _Table::entry e; e.first = w; cin >> e.second; if(t.insert(e)) // rest omitted while(true) // searching string w; cin >> w; int v = T.value(w); if(v == -1) // rest omitted

_Table T(n); test code snippets while(true) // inserting string w; cin >> w; _Table::entry e; e.first = w; cin >> e.second; if(t.insert(e)) // rest omitted while(true) // searching string w; cin >> w; int v = T.value(w); if(v == -1) // rest omitted

_Table T(n); test code snippets while(true) // inserting string w; cin >> w; _Table::entry e; e.first = w; cin >> e.second; if(t.insert(e)) // rest omitted while(true) // searching string w; cin >> w; int v = T.value(w); if(v == -1) // rest omitted

constructor and hash function _Table::_Table( size_t n ) for(int i=0; i<n; i++) table.push_back(null); size_t _Table::hash ( std::string s ) size_t r = 0; for(size_t i=0; i<s.length(); i++) r = r*31 + s[i]; return (r % table.size());

ing 1 2 3

insert() method bool _Table::insert( const entry& e ) size_t i = hash(e.first); if(table[i] == NULL) table[i] = new entry(e); return true; else if(table[i]->first == e.first) table[i]->second = e.second; return true;

insert() continued else for(int j=1; j<table.size(); j++) size_t k = (i+j) % table.size(); if(table[k] == NULL) table[k] = new entry(e); return true; else if(table[k]->first == e.first) table[k]->second = e.second; return true; return false;

search the value int _Table::value( std::string s ) size_t i = hash(s); if(table[i] == NULL) return -1; else if(table[i]->first == s) return table[i]->second; else for(int j=1; j<table.size(); j++) size_t k = (i+j) % table.size(); if(table[k] == NULL) return -1; else if(table[k]->first == s) return table[k]->second; return -1;

evaluation of open addressing Advantage: no other data structure needed to deal with collisions. Disadvantages: clusters may overlap; alternative to linear probing: quadratic probing in quadratic probing, increments form quadratic series: 1 2 + 2 2 + 3 2 + + i 2 deleted items must be marked and cannot be deallocated because of open addressing.

evaluation of open addressing Advantage: no other data structure needed to deal with collisions. Disadvantages: clusters may overlap; alternative to linear probing: quadratic probing in quadratic probing, increments form quadratic series: 1 2 + 2 2 + 3 2 + + i 2 deleted items must be marked and cannot be deallocated because of open addressing.

evaluation of open addressing Advantage: no other data structure needed to deal with collisions. Disadvantages: clusters may overlap; alternative to linear probing: quadratic probing in quadratic probing, increments form quadratic series: 1 2 + 2 2 + 3 2 + + i 2 deleted items must be marked and cannot be deallocated because of open addressing.

ing 1 2 3

In the namespace mcs360_chain_hash_table we define a _Table. The hash table stores a frequency table: type of the key is a string, value type is an integer. typedef std::pair<std::string,int> entry; We create a hash table with number of elements n, the capacity of the table.

mcs360_chain_hash_table.h #ifndef MCS360_CHAIN_HASH_TABLE_H #define MCS360_CHAIN_HASH_TABLE_H #include <utility> #include <string> #include <vector> #include <list> namespace mcs360_chain_hash_table class _Table public: typedef std::pair<std::string,int> entry; _Table( size_t n ); bool insert( const entry& e ); // returns true if e is inserted well int value( std::string s ); // returns -1 if s is not found

private members The public part of this _Table is the same as with open addressing use the same interactive test private: size_t hash ( std::string s ); // returns an index in the table // applying a hash function to the string s std::vector< std::list<entry> > table; // data member

private members The public part of this _Table is the same as with open addressing use the same interactive test private: size_t hash ( std::string s ); // returns an index in the table // applying a hash function to the string s std::vector< std::list<entry> > table; // data member

constructor and hash function _Table::_Table( size_t n ) for(int i=0; i<n; i++) std::list<entry> L; table.push_back(l); size_t _Table::hash ( std::string s ) size_t r = 0; for(size_t i=0; i<s.length(); i++) r = r*31 + s[i]; return (r % table.size());

ing 1 2 3

the insert() method bool _Table::insert( const entry& e ) size_t i = hash(e.first); table[i].push_back(e); return true;

search a value int _Table::value( std::string s ) size_t i = hash(s); if(table[i].size() == 0) return -1; else std::list<entry> L = table[i]; for(std::list<entry>::const_iterator k = L.begin(); k!= L.end(); k++) if(k->first == s) return k->second; return -1;

evaluation of hashing Resolving collisions with chaining: collisions affect only collided keys, flexible insert and effective delete, at expensive of list data structure. The STL list takes care of bucket management. We covered very basic implementations, left to do: 1 write delete methods 2 templates for key and value type 3 iterators on the

evaluation of hashing Resolving collisions with chaining: collisions affect only collided keys, flexible insert and effective delete, at expensive of list data structure. The STL list takes care of bucket management. We covered very basic implementations, left to do: 1 write delete methods 2 templates for key and value type 3 iterators on the

Summary + Assignments Covered more of chapter 9, elements of 9.3, 9.4, and 9.5. Assignments: 1 Download a text from www.gutenberg.org and write code to read the words from file. Use the STL map to make a frequency table of the keys applying on the the function hash of this lecture. Study the number of collisions for various sizes of the table. 2 To hash, would it help to apply the hash function h(i) =(p i) mod 2 32 on the key returned by the hash function on? Justify your answer, eventually referring to your answer to the previous exercise.

more assignments 3 Run the code on a small table with 3 entries to simulate collisions. Add printing statements and sketch the evolution of the hash table as more elements are inserted. Compare open addressing with chaining. 4 For the hash table with open addressing, modify the code of insert() to use quadratic probing. Illustrate with an example, choosing appropriate table size and sequence of insertions, how quadratic probing may reduce the effect of clustering. Homework collection on Friday 29 October, at noon: #1,2 of L-21; #1,2 of L-22; and #4 of L-23.