CSCI-1200 Data Structures Fall 2018 Lecture 22 Hash Tables, part 2 & Priority Queues, part 1

Similar documents
CSCI-1200 Data Structures Fall 2017 Lecture 23 Functors & Hash Tables, part II

CSCI-1200 Data Structures Spring 2016 Lecture 22 Priority Queues, part II (& Functors)

CSCI-1200 Data Structures Fall 2015 Lecture 24 Hash Tables

CSCI-1200 Data Structures Fall 2009 Lecture 20 Hash Tables, Part II

CSCI-1200 Data Structures Fall 2018 Lecture 23 Priority Queues II

CSCI-1200 Data Structures Spring 2014 Lecture 19 Hash Tables

CSCI-1200 Data Structures Spring 2016 Lecture 25 Hash Tables

CSCI-1200 Data Structures Fall 2018 Lecture 21 Hash Tables, part 1

CIS 190: C/C++ Programming. Lecture 12 Student Choice

CSCI-1200 Data Structures Fall 2017 Lecture 10 Vector Iterators & Linked Lists

CSCI-1200 Data Structures Spring 2018 Lecture 10 Vector Iterators & Linked Lists

CSCI-1200 Data Structures Spring 2018 Lecture 14 Associative Containers (Maps), Part 1 (and Problem Solving Too)

CSCI-1200 Data Structures Fall 2013 Lecture 9 Iterators & Lists

CSCI-1200 Data Structures Fall 2017 Lecture 17 Trees, Part I

Priority Queues. 1 Introduction. 2 Naïve Implementations. CSci 335 Software Design and Analysis III Chapter 6 Priority Queues. Prof.

Course Review. Cpt S 223 Fall 2009

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

University of Illinois at Urbana-Champaign Department of Computer Science. Final Examination

Prefix Trees Tables in Practice

Course Review. Cpt S 223 Fall 2010

CSE 332 Winter 2015: Midterm Exam (closed book, closed notes, no calculators)

CSCI-1200 Data Structures Spring 2018 Lecture 16 Trees, Part I

CSCI-1200 Data Structures Spring 2015 Lecture 18 Trees, Part I

BINARY HEAP cs2420 Introduction to Algorithms and Data Structures Spring 2015

CSCI-1200 Data Structures Spring 2018 Lecture 7 Order Notation & Basic Recursion

Cpt S 223 Fall Cpt S 223. School of EECS, WSU

Summer Final Exam Review Session August 5, 2009

TEMPLATES AND ITERATORS

HEAPS & PRIORITY QUEUES

Course Review for Finals. Cpt S 223 Fall 2008

University of Illinois at Urbana-Champaign Department of Computer Science. Final Examination

CE221 Programming in C++ Part 1 Introduction

CSCI-1200 Data Structures Test 3 Practice Problem Solutions

CMSC 341 Priority Queues & Heaps. Based on slides from previous iterations of this course

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

Binary Search Trees Treesort

CS 104 (Spring 2014) Final Exam 05/09/2014

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials)

CSCI-1200 Data Structures Fall 2017 Lecture 5 Pointers, Arrays, & Pointer Arithmetic

list<t>::const_iterator it1 = lst.begin(); // points to first element list<t>::const_iterator it2 = lst.begin(); // points to second element it2++;

About this exam review

Tables, Priority Queues, Heaps

CSCI-1200 Data Structures Fall 2017 Lecture 25 C++ Inheritance and Polymorphism

CSCI-1200 Computer Science II Fall 2006 Lecture 23 C++ Inheritance and Polymorphism

You must include this cover sheet. Either type up the assignment using theory5.tex, or print out this PDF.

Lecture 23: Priority Queues 10:00 AM, Mar 19, 2018

CSCI-1200 Data Structures Spring 2018 Lecture 8 Templated Classes & Vector Implementation

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Priority Queues and Heaps (continues) Chapter 13: Heaps, Balances Trees and Hash Tables Hash Tables In-class Work / Suggested homework.

Lecture 26: Graphs: Traversal (Part 1)

We don t have much time, so we don t teach them [students]; we acquaint them with things that they can learn. Charles E. Leiserson

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Draw the resulting binary search tree. Be sure to show intermediate steps for partial credit (in case your final tree is incorrect).

Recitation 9. Prelim Review

l Determine if a number is odd or even l Determine if a number/character is in a range - 1 to 10 (inclusive) - between a and z (inclusive)

CSCI-1200 Data Structures Fall 2017 Lecture 7 Order Notation & Basic Recursion

CSCE 210/2201 Data Structures and Algorithms. Prof. Amr Goneid

CSCE 2014 Final Exam Spring Version A

CSCE 210/2201 Data Structures and Algorithms. Prof. Amr Goneid. Fall 2018

Abstract vs concrete data structures HEAPS AND PRIORITY QUEUES. Abstract vs concrete data structures. Concrete Data Types. Concrete data structures

use static size for this buffer

CSCI-1200 Data Structures Fall 2018 Lecture 7 Templated Classes & Vector Implementation

Lecture Notes on Priority Queues

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

CMSC 341 Lecture 14: Priority Queues, Heaps

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

CSE 373: Data Structures and Algorithms

CSCI-1200 Data Structures Fall 2018 Lecture 19 Trees, Part III

COMP 103 Introduction to Data Structures and Algorithms

Slide Set 14. for ENCM 339 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary

PRIORITY QUEUES AND HEAPS

What will happen if we try to compile, link and run this program? Do you have any comments to the code?

CS302 Data Structures using C++

Cpt S 223. School of EECS, WSU

CSE 373 Autumn 2012: Midterm #2 (closed book, closed notes, NO calculators allowed)

CSCI-1200 Data Structures Spring 2016 Lecture 7 Iterators, STL Lists & Order Notation

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

QUIZ. What is wrong with this code that uses default arguments?

Lesson 13 - Vectors Dynamic Data Storage

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Course Review for. Cpt S 223 Fall Cpt S 223. School of EECS, WSU

CSCI 102L - Data Structures Midterm Exam #2 Spring 2011

CSCI-1200 Data Structures Fall 2017 Lecture 18 Trees, Part II

Standard ADTs. Lecture 19 CS2110 Summer 2009

EXAMINATIONS 2005 END-YEAR. COMP 103 Introduction to Data Structures and Algorithms

+ Abstract Data Types

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

University of Illinois at Urbana-Champaign Department of Computer Science. Second Examination

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

Practice Midterm Exam Solutions

Announcements HEAPS & PRIORITY QUEUES. Abstract vs concrete data structures. Concrete data structures. Abstract data structures

Assignment 1: grid. Due November 20, 11:59 PM Introduction

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

6.7 b. Show that a heap of eight elements can be constructed in eight comparisons between heap elements. Tournament of pairwise comparisons

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

CS301 - Data Structures Glossary By

CSE 332: Data Structures & Parallelism Lecture 7: Dictionaries; Binary Search Trees. Ruth Anderson Winter 2019

Algorithms and Data Structures

University of Illinois at Urbana-Champaign Department of Computer Science. Second Examination

Transcription:

Review from Lecture 21 CSCI-1200 Data Structures Fall 2018 Lecture 22 Hash Tables, part 2 & Priority Queues, part 1 the single most important data structure known to mankind Hash Tables, Hash Functions, and Collision Resolution Performance of: Hash Tables vs. Binary Search Trees Collision resolution: separate chaining Using a hash table to implement a set/map Iterators, find, insert, and erase Today s Lecture Using STL s for_each Something weird & cool in C++... Function Objects, a.k.a. Functors Hash Tables, part II STL s unordered_set (and unordered_map) Hash functions as functors/function objects (or non-type template parameters, or function pointers) Collision resolution: separate chaining vs. open addressing STL Queue and STL Stack Definition of a Binary Heap What s a Priority Queue? A Priority Queue as a Heap 22.1 Using STL s for each First, here s a tiny helper function: void float_print (float f) { std::cout << f << std::endl; Let s make an STL vector of floats: std::vector<float> my_data; my_data.push_back(3.14); my_data.push_back(1.41); my_data.push_back(6.02); my_data.push_back(2.71); Now we can write a loop to print out all the data in our vector: std::vector<float>::iterator itr; for (itr = my_data.begin(); itr!= my_data.end(); itr++) { float_print(*itr); Alternatively we can use it with STL s for_each function to visit and print each element: std::for_each(my_data.begin(), my_data.end(), float_print); Wow! That s alot less to type. Can I stop using regular for and while loops altogether?

We can actually also do the same thing without creating & explicitly naming the float_print function. We create an anonymous function using lambda: std::for_each(my_data.begin(), my_data.end(), [](float f){ std::cout << f << std::end; ); Lambda is new to the C++ language (part of C++11). But lambda is a core piece of many classic, older programming languages including Lisp and Scheme. Python lambdas and Perl anonymous subroutines are similar. (In fact lambda dates back to the 1930 s, before the first computers were built!) You ll learn more about lambda more in later courses like CSCI 4430 Programming Languages! 22.2 Function Objects, a.k.a. Functors In addition to the basic mathematical operators + - * / < >, another operator we can overload for our C++ classes is the function call operator. Why do we want to do this? This allows instances or objects of our class, to be used like functions. It s weird but powerful. Here s the basic syntax. Any specific number of arguments can be used. class my_class_name { //... normal class stuff... my_return_type operator() ( /* my list of args */ ); ; 22.3 Why are Functors Useful? One example is the default 3rd argument for std::sort. We know that by default STL s sort routines will use the less than comparison function for the type stored inside the container. How exactly do they do that? First let s define another tiny helper function: bool float_less(float x, float y) { return x < y; Remember how we can sort the my_data vector defined above using our own homemade comparison function for sorting: std::sort(my_data.begin(),my_data.end(),float_less); If we don t specify a 3rd argument: std::sort(my_data.begin(),my_data.end()); This is what STL does by default: std::sort(my_data.begin(),my_data.end(),std::less<float>()); What is std::less? It s a templated class. Above we have called the default constructor to make an instance of that class. Then, that instance/object can be used like it s a function. Weird! How does it do that? std::less is a teeny tiny class that just contains the overloaded function call operator. template <class T> class less { bool operator() (const T& x, const T& y) const { return x < y; ; You can use this instance/object/functor as a function that expects exactly two arguments of type T (in this example float) that returns a bool. That s exactly what we need for std::sort! This ultimately does the same thing as our tiny helper homemade compare function! 2

22.4 Another more Complicated Functor Example Constructors of function objects can be used to specify internal data for the functor that can then be used during computation of the function call operator! For example: class between_values { private: float low, high; between_values(float l, float h) : low(l), high(h) { bool operator() (float val) { return low <= val && val <= high; ; The range between low & high is specified when a functor/an instance of this class is created. We might have multiple different instances of the between_values functor, each with their own range. Later, when the functor is used, the query value will be passed in as an argument. The function call operator accepts that single argument val and compares against the internal data low & high. This can be used in combination with STL s find_if construct. For example: between_values two_and_four(2,4); if (std::find_if(my_data.begin(), my_data.end(), two_and_four)!= my_data.end()) { std::cout << "Found a value greater than 2 & less than 4!" << std::endl; Alternatively, we could create the functor without giving it a variable name. And in the use below we also capture the return value to print out the first item in the vector inside this range. Note that it does not print all values in the range. std::vector<float>::iterator itr; itr = std::find_if(my_data.begin(), my_data.end(), between_values(2,4)); if (itr!= my_data.end()) { std::cout << "my_data contains " << *itr << ", a value greater than 2 & less than 4!" << std::endl; Weird Things we can do in C++ Finished Now back to Hash Tables! 22.5 Hash Table in STL? The Standard Template Library standard and implementation of hash table have been slowly evolving over many years. Unfortunately, the names hashset and hashmap were spoiled by developers anticipating the STL standard, so to avoid breaking or having name clashes with code using these early implementations... STL s agreed-upon standard for hash tables: unordered set and unordered map Depending on your OS/compiler, you may need to add the -std=c++11 flag to the compile line (or other configuration tweaks) to access these more recent pieces of STL. (And this will certainly continue to evolve in future years!) For many types STL has a good default hash function, so in those cases you do not need to provide your own hash function. But sometimes we do want to write our own... 22.6 Writing our own Hash Functions or Hash Functors Often the programmer/designer for the program using a hash function has the best understanding of the distribution of data to be stored in the hash function. Thus, they are in the best position to define a custom hash function (if needed) for the data & application. Here s an example of a (generically) good hash function for STL strings: Note: This implementation comes from http://www.partow.net/programming/hashfunctions/ unsigned int MyHashFunction(std::string const& key) { unsigned int hash = 1315423911; for(unsigned int i = 0; i < key.length(); i++) hash ^= ((hash << 5) + key[i] + (hash >> 2)); return hash; 3

Alternately, this same string hash code can be written as a functor which is just a class wrapper around a function, and the function is implemented as the overloaded function call operator for the class. class MyHashFunctor { unsigned int operator() (std::string const& key) const { unsigned int hash = 1315423911; for(unsigned int i = 0; i < key.length(); i++) hash ^= ((hash << 5) + key[i] + (hash >> 2)); return hash; ; Once our new type containing the hash function is defined, we can create instances of our hash set object containing std::string by specifying the type MyHashFunctor as the second template parameter to the declaration of a ds_hashset. E.g., ds_hashset<std::string, MyHashFunctor> my_hashset; 22.7 Using STL s Associative Hash Table (Map) Using the default std::string hash function. With no specified initial table size. std::unordered_map<std::string,foo> m; Optionally specifying initial (minimum) table size. std::unordered_map<std::string,foo> m(1000); Using a home-made std::string hash function. Note: We are required to specify the initial table size. Manually specifying the hash function type. std::unordered_map<std::string,foo,std::function<unsigned int(std::string)> > m(1000, MyHashFunction); Using the decltype specifier to get the declared type of an entity. std::unordered_map<std::string,foo,decltype(&myhashfunction)> m(1000, MyHashFunction); Using a home-made std::string hash functor or function object. With no specified initial table size. std::unordered_map<std::string,foo,myhashfunctor> m; Optionally specifying initial (minimum) table size. std::unordered_map<std::string,foo,myhashfunctor> m(1000); Note: In the above examples we re creating a association between two types (STL strings and custom Foo object). If you d like to just create a set (no associated 2nd type), simply switch from unordered_map to unordered_set and remove the Foo from the template type in the examples above. 22.8 How do we Resolve Collisions? METHOD 1: Separate Chaining NOTE: We used this method in the last lecture & in lab! Each table location stores a linked list of keys (and values) hashed to that location. Thus, the hashing function really just selects which list to search or modify. This works well when the number of items stored in each list is small, e.g., an average of 1. Other data structures, such as binary search trees, may be used in place of the list, but these have even greater overhead considering the (hopefully, very small) number of items stored per bin. 22.9 How do we Resolve Collisions? METHOD 2: Open Addressing Let s eliminate the individual memory allocations and pointer indirection/dereferencing that are necessary for separate chaining. This will improve memory / data access performance. We will directly store the data (key/key-value pair) in the the top level vector, and store at most one item per index/location. When the chosen table index/location already stores a key (or key-value pair), we will seek a different table location to store the new value (or pair). 4

Here are three different open addressing variations to handle a collision during an insert operation: Linear probing: If i is the chosen hash location then the following sequence of table locations is tested ( probed ) until an empty location is found: (i+1)%n, (i+2)%n, (i+3)%n,... Quadratic probing: If i is the hash location then the following sequence of table locations is tested: (i+1)%n, (i+2*2)%n, (i+3*3)%n, (i+4*4)%n,... More generally, the j th probe of the table is (i + c 1 j + c 2 j 2 ) mod N where c 1 and c 2 are constants. Secondary hashing: When a collision occurs a second hash function is applied to compute a new table location. If that location is also full, we go to a third hash function, etc. This is repeated until an empty location is found. We can generate a sequence/family of hash functions by swapping in a fixed random-like sequence of big (prime?) constants values into the same general function structure. For each of these approaches, the find operation follows the same sequence of locations as the insert operation. The key value is determined to be absent from the table only when an empty location is found. When using open addressing to resolve collisions, the erase function must mark a location as formerly occupied. If a location is instead marked empty, find may fail to return elements that are actually in the table. Formerly-occupied locations may (and should) be reused, but only after the find search operation has been run to completion to determine the item is definitely not in the table. Problems with open addressing: Fails completely when the table is full. So we MUST resize before it gets full! Slows dramatically when the table is nearly full (e.g. about 80% or higher). This is particularly problematic for linear probing. Memory cache performance can be poor when we are jumping around unpredictably in the top level array. Cost of computing new hash values (linear < quadratic < secondary hashing). Careful testing and parameter tuning is necessary to achieve optimal memory/speed performance. 22.10 Additional STL Container Classes: Stacks and Queues We ve studied STL vectors, lists, maps, and sets. These data structures provide a wide range of flexibility in terms of operations. One way to obtain computational efficiency is to consider a simplified set of operations or functionality. For example, with a hash table we give up the notion of a sorted table and gain in find, insert, & erase efficiency. 2 additional examples are: Stacks allow access, insertion and deletion from only one end called the top There is no access to values in the middle of a stack. Stacks may be implemented efficiently in terms of vectors and lists, although vectors are preferable. All stack operations are O(1) Queues allow insertion at one end, called the back and removal from the other end, called the front There is no access to values in the middle of a queue. Queues may be implemented efficiently in terms of a list. Using vectors for queues is also possible, but requires more work to get right. All queue operations are O(1) 22.11 Suggested Exercises: Tree Traversal using a Stack and Queue Given a pointer to the root node in a binary tree: Use an STL stack to print the elements with a pre-order traversal ordering. This is straightforward. Use an STL stack to print the elements with an in-order traversal ordering. This is more complicated. Use an STL queue to print the elements with a breadth-first traversal ordering. 5

22.12 What s a Priority Queue? Priority queues are used in prioritizing operations. Examples include a personal to do list, what order to do homework assignments, jobs on a shop floor, packet routing in a network, scheduling in an operating system, or events in a simulation. Among the data structures we have studied, their interface is most similar to a queue, including the idea of a front or top and a tail or a back. Each item is stored in a priority queue using an associated priority and therefore, the top item is the one with the lowest value of the priority score. The tail or back is never accessed through the public interface to a priority queue. The main operations are push (a.k.a. insert), and pop (a.k.a. delete_min). 22.13 Some Data Structure Options for Implementing a Priority Queue Vector or list, either sorted or unsorted At least one of the operations, push or pop, will cost linear time, at least if we think of the container as a linear structure. Binary search trees If we use the priority as a key, then we can use a combination of finding the minimum key and erase to implement pop. An ordinary binary-search-tree insert may be used to implement push. This costs logarithmic time in the average case (and in the worst case as well if balancing is used). The latter is the better solution, but we would like to improve upon it for example, it might be more natural if the minimum priority value were stored at the root. We will achieve this with binary heap, giving up the complete ordering imposed in the binary search tree. 22.14 Definition: Binary Heaps A binary heap is a complete binary tree such that at each internal node, p, the value stored is less than the value stored at either of p s children. A complete binary tree is one that is completely filled, except perhaps at the lowest level, and at the lowest level all leaf nodes are as far to the left as possible. Binary heaps will be drawn as binary trees, but implemented using vectors! Alternatively, the heap could be organized such that the value stored at each internal node is greater than the values at its children. 22.15 Exercise: Drawing Binary Heaps Draw two different binary heaps with these values: 52 13 48 7 32 40 18 25 4 Draw several other trees with these values that not binary heaps. And we will continue with Priority Queues, Part 2 on Tuesday. Yes! We will have lecture on the Tuesday before Thanksgiving! 6