CPS222 Lecture: Sets. 1. Projectable of random maze creation example 2. Handout of union/find code from program that does this

Similar documents
Lecture 13. Reading: Weiss, Ch. 9, Ch 8 CSE 100, UCSD: LEC 13. Page 1 of 29

CS211 Lecture: Database Querying and Updating

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

Disjoint Sets. The obvious data structure for disjoint sets looks like this.

COS 226 Midterm Fall 2007

CPS221 Lecture: Relational Database Querying and Updating

CPS221 Lecture: Relational Database Querying and Updating

CSE 373 MAY 10 TH SPANNING TREES AND UNION FIND

LECTURE 26 PRIM S ALGORITHM

Union/Find Aka: Disjoint-set forest. Problem definition. Naïve attempts CS 445

Trees in java.util. A set is an object that stores unique elements In Java, two implementations are available:

Final Examination CSE 100 UCSD (Practice)

Outline Purpose Disjoint Sets Maze Generation. Disjoint Sets. Seth Long. March 24, 2010

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Set and Set Operations

Disjoint Sets. Based on slides from previous iterations of this course.

Computational Geometry

Disjoint Sets. (Disjoint Sets) Data Structures and Programming Spring / 50

Sets. Mukulika Ghosh. Fall Based on slides by Dr. Hyunyoung Lee

1.2 Venn Diagrams and Partitions

Lecture Notes on Tries

Sets MAT231. Fall Transition to Higher Mathematics. MAT231 (Transition to Higher Math) Sets Fall / 31

B-Trees. Introduction. Definitions

CSCI-1200 Data Structures Fall 2018 Lecture 23 Priority Queues II

CPS352 Lecture - Indexing

Binary Heaps in Dynamic Arrays

CPS122 Lecture: From Python to Java last revised January 4, Objectives:

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

CSE 332 Data Abstractions: Disjoint Set Union-Find and Minimum Spanning Trees

CSCI E 119 Section Notes Section 11 Solutions

CMSC 341 Lecture 20 Disjointed Sets

CSE 530A. B+ Trees. Washington University Fall 2013

Topic 22 Hash Tables

COUNTING AND PROBABILITY

Course Review. Cpt S 223 Fall 2009

2.2 Set Operations. Introduction DEFINITION 1. EXAMPLE 1 The union of the sets {1, 3, 5} and {1, 2, 3} is the set {1, 2, 3, 5}; that is, EXAMPLE 2

Implementation of Lexical Analysis. Lecture 4

13 A: External Algorithms II; Disjoint Sets; Java API Support

Exercises: Disjoint Sets/ Union-find [updated Feb. 3]

CSE 100 Disjoint Set, Union Find

Slides for Faculty Oxford University Press All rights reserved.

Data Structure. A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

Disjoint Sets and the Union/Find Problem

CSE100 Practice Final Exam Section C Fall 2015: Dec 10 th, Problem Topic Points Possible Points Earned Grader

COSC 2007 Data Structures II Final Exam. Part 1: multiple choice (1 mark each, total 30 marks, circle the correct answer)

CMSC 341 Lecture 20 Disjointed Sets

CSE 100: GRAPH ALGORITHMS

CSE 100 Advanced Data Structures

Parallel and Sequential Data Structures and Algorithms Lecture (Spring 2012) Lecture 16 Treaps; Augmented BSTs

Database Applications (15-415)

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Lecture Notes. char myarray [ ] = {0, 0, 0, 0, 0 } ; The memory diagram associated with the array can be drawn like this

Eric Roberts Handout #39 CS 106B February 20, 2015 Sets

Minimum spanning trees

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

COMP Data Structures

CMSC 341 Lecture 20 Disjointed Sets

Solved by: Syed Zain Ali Bukhari (BSCS)

Lecture 32. No computer use today. Reminders: Homework 11 is due today. Project 6 is due next Friday. Questions?

Data Structure. IBPS SO (IT- Officer) Exam 2017

COS 226 Midterm Exam, Spring 2011

STUDENT LESSON AB30 Binary Search Trees

Course Review for Finals. Cpt S 223 Fall 2008

So on the survey, someone mentioned they wanted to work on heaps, and someone else mentioned they wanted to work on balanced binary search trees.

SOLUTIONS. COMP103 Introduction to Data Structures and Algorithms

Taibah University College of Computer Science & Engineering Course Title: Discrete Mathematics Code: CS 103. Chapter 2. Sets

CS100: DISCRETE STRUCTURES

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42

EXAMINATIONS 2005 END-YEAR. COMP 103 Introduction to Data Structures and Algorithms

Topic 10: The Java Collections Framework (and Iterators)

COS 226 Midterm Exam, Spring 2009

Horn Formulae. CS124 Course Notes 8 Spring 2018

Principles of Computer Game Design and Implementation. Lecture 13

Today s Topics. What is a set?

Prelim 2. CS 2110, November 20, 2014, 7:30 PM Extra Total Question True/False Short Answer

CS2110: Software Development Methods. Maps and Sets in Java

Operations on Heap Tree The major operations required to be performed on a heap tree are Insertion, Deletion, and Merging.

COMP : Trees. COMP20012 Trees 219

Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network

COMS 1003 Fall Introduction to Computer Programming in C. Bits, Boolean Logic & Discrete Math. September 13 th

2.2 Syntax Definition

Sorting and Selection

CSC 421: Algorithm Design Analysis. Spring 2017

Graphs and Network Flows ISE 411. Lecture 7. Dr. Ted Ralphs

Chapter 13: Query Processing

Disjoint set (Union-Find)

Computational Geometry

Spring 2018 Mentoring 13: April 25, New World Order. return a list of all the words that start with a given prefix.

NAME: c. (true or false) The median is always stored at the root of a binary search tree.

Lecture Notes on Binary Decision Diagrams

ext Total Score /20 /20 /15 /20 /25 /5 Grader

CSE202 Greedy algorithms. Fan Chung Graham

Computing intersections in a set of line segments: the Bentley-Ottmann algorithm

managing an evolving set of connected components implementing a Union-Find data structure implementing Kruskal s algorithm

Algebra of Sets. Aditya Ghosh. April 6, 2018 It is recommended that while reading it, sit with a pen and a paper.

Priority Queues. 04/10/03 Lecture 22 1

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

DATA STRUCTURES AND ALGORITHMS

or else take their intersection. Now define

Transcription:

CPS222 Lecture: Sets Objectives: last revised April 16, 2015 1. To introduce representations for sets that can be used for various problems a. Array or list of members b. Map-based representation c. Bit vectors d. Union-Find structure 2. To introduce basic algorithms for working with sets a. Merge-based algorithms for intersection, disjunction, difference b. Union-Find Materials: 1. Projectable of random maze creation example 2. Handout of union/find code from program that does this I. Introduction A. One issue that arises in many kinds of problems is the need to represent and work with a type of information that can be understood as a mathematical set - i.e. each potential element is present exactly once or not at all, and order is not important. 1. The fundamental operation on a set is the contains test - does the set contain a specific element? 2. But other operations such as union, intersection, difference etc. are often important. B. There are a variety of different ways of representing sets, with the choice depending on the nature of the information and how it is to be manipulated. 1

1. A simple representation is to use a container such as an array or list that holds the set elements, which can, in this case, be of any type. a) Example: the set { Aardvark, Buffalo, Cat, Dog, Elephant } might be represented by a sequence such as an array or list whose elements are the strings "Aardvark", "Buffalo" etc. b) Of course, this representation implicitly creates an order for the elements, but the order is generally regarded as unimportant c) Of more concern is the fact that the contains() operation will involved iterating over all the elements in the set - which is O(set size) 2. Both the Java library and the C++ STL incorporate set containers, which are actually implemented as maps, with each key simply being mapped to itself. a) As you recall, Java offers two implementations: HashSet and TreeSet, which are actually realized by HashMap and TreeMap, respectively. Of course, the latter does, in fact, keep elements in an order and thus requires that the element type support comparison (e.g. that it is something like an Integer or a String.) b) C++ draw the same distinction, but the uses the name unordered_set for the version that uses hashing and reserves the name set for the version that maintains order (and hence must have an element type for which comparison is defined, either builtin by < or user-defined. c) Either representation makes a quick implementation of contains() easy (its just key lookup in the map, hence O(1) or O(log set size). 3. For representing sets whose elements are drawn from a fixed (and usually small) universe, a representation known as a bit vector can be used. 2

a) Example: suppose we are interested in sets of letters of the alphabet - e.g. vowels = { A, E, I, O, U }, labial consonants (consonants made with the lips) = { B, F, M, P, V } etc. In this case, our universe has 26 members. b) To represent sets as a bit vector, we use a vector of bits equal in length to the size of the universe (e.g. a vector of 26 bits for letters of the alphabet), with one position explicitly corresponding to each element of the universe (e.g. the leftmost bit might represent A, the one next to it B... the rightmost bit Z). The representation for a set has a 1 in a given position if the corresponding element is in the set, and 0 if it is not - e.g. the set vowels might be represented by the bit vector 10001000100000100000100000. c) This supports a fast implementation of contains - simply look to see if the bit corresponding to the desired value is set. d) But, of course, this is limited to cases where the set of possible values is small and can easily be mapped to bit positions. 4. We will look at one more structure - the union-find structure - later. II. Standard Set Operations A. Many operations on sets make use of the standard set operations union, intersection, difference and - more rarely - negation (which is only meaningful if the set is based on a finite universe.) Example: consider the search operations offered in many contexts. Operations such as and, or, and not map to set operations - e.g. the set of documents containing two different keywords is the intersection of the two sets of documents containing each keyword. B. If the sets are represented as sequences or maps, one (computationally expensive) way to perform these operations would be to make use of the contains operation. 3

1. Example: A B could be implemented as follows: iterate through the elements of set A. For each element, test if B contains it, and include it in the result only if it does. 2. This approach could be as bad as O(n 2 ) if the contains operation is implemented by iterating through all the elements of a set to see if it is there. If a tree-based representation for sets is used, it would be O(n log n)), and if a hashtable is used for sets, it might be O(n), though possibly with a large constant of proportionality. 3. If some sort of ordered representation is used (a tree or if an array or list is kept in some order), a much faster implementation is possible based on merging. a) What we do is iterate through both sets in order. Let currenta be the current element from set A, and currentb the current element from set B. If we have run out of elements in a set, let current for that set be a "manufactured" element that tests greater than any real element. Then while not past the end of both sets: if currenta == currentb : include the element in the result if the operation is union or intersection, but not if it is difference advance to the next element in both sets else if currenta < currentb include currenta in the result if the operation is union or difference, but not if it is intersection advance to the next element in set A only else (it must be that currenta > currentb) include currentb in the result if the operation is union, but not if it is intersection or difference advance to the next element in set B only 4

b) Example: calculate the intersection of { Aardvark, Buffalo, Zebra } and { Buffalo, Cat } currenta is Aardvark currentb is Buffalo Since currenta < currentb, advance to the next element in A, (so currenta is now Buffalo) Since currenta == currentb, include Buffalo in the result and advance to the next element in both A and B. (so now currenta is Zebra and is currentb Cat) Since currenta > currentb, advance to the next element in B, (so currentb is now ) Since currenta < currentb, advance to the next element in A, (so currenta is now ) Stop - result is { Buffalo } c) Of course, this approach is O(n) 4. If the set is represented by a bit vector, basic set operations can be done even more efficiently by using bitwise logical operations - e.g. set union is bitwise or; set intersection is bitwise and, set negation (possible with a finite universe) is bitwise complement, and difference is bitwise and with the complement. III.Union-Find Example: Represent the consonants and vowels by bit vectors; { 01110111011111011111011111 and 1000100010000100000100010 } show how to find letter than is both by set intersection. A. One category of operations that shows up in a number of places is based on the notion of a partition of a set. A partition is a set of subsets of the original set such that each element of the original set shows up in exactly one subset. 1. Example: create a partition of the class based on academic year 5

2. Example: because of double majors, it would not be possible to create a partition of a class based on academic major (since some people would show up in two subsets) - though it would be possible if we used first major as the basis for partitioning. B. For a partition, two operations are especially useful: 1. Find the partition a particular element belongs to. 2. Take the union of two subsets so that all the elements in either subset now belong to united subset. C. An example where this is useful: suppose we wanted to randomly generate a solvable maze. One way to do this would be to start with a maze structure in which all the possible walls were present, and then begin to remove walls at random until there was a path from start to finish. 1. Example. Starting with the simple maze on the left - with start cell A, we can create the maze on the right by generating random number, using the assignment of numbers to walls shown in the middle, and removing walls until there is a path from start to finish. (The example required removal of walls numbered 4, 10, 21, 22, 18, 9, 23, 11, 20, 13, 14 in that order.) PROJECT 2. Of course, if one were to do this, one would need some way of testing, at each step, whether a path exists. 6

a) A good way to do this would be to form a partition, with each cell initially in its own set. Each time a wall is removed: (1)Find the subset for each of the two cells that were separated by the wall. (2)If the subsets differ, replace them by their union. (3)If the two cells were already in the same subset, and we don't want to have cycles in the maze, we can not remove the wall. b) The result is that, at any time, each cell is in a set consisting of the cells that are reachable from it. c) When both the start and the finish cells are in the same set, we can stop. 3. Can you think of another place where we have seen something similar? ASK In Kruskal's shortest path algorithm, we need to test when we're adding an edge to see whether it would create a cycle, which occurs if it joins two vertices that are part of the same connected component. D. With the set representations we have considered thus far, the find operation is computationally expensive because we need to test each of the sets to see if the element in question is in it. 1. A much faster solution arises if we represent sets by trees in which the pointers go the other way - i.e. each element node points to its parent, rather than the other way around. a) We could have a special node to label the set, or we could just let one of the elements serve that purpose, with a pointer to itself. Example: The we could picture the cells after the removal of walls 4, 10, 21, 22 this way: 7

A B C D F G I J K L M O E H P N Note that find now requires just one step. b) If the element node in question has no parent, it is the name of the set; otherwise the name of the set is the node it points to. Example: A belongs to set A; H belongs to set G. 2. To perform union, we can simply make one set a subset of the other by making the root of the subset point to the root of the other. a) Of course, if we do union this way we may have to iterate up a tree to find the root of a set - e.g. if we take the union of sets A and G, we would get A / \ E G H H belongs to set A, but we had to iterate through G to find this out. b) A heuristic called union by size says to make the set containing the smaller number of elements the subset. (This minimizes the need for iteration) Example: our partition after removing all but the last 2 walls (13, 14), using union by size, is A B C D J L / / \ \ E I F K P / \ M O G N H (1)Removing wall 13 has no effect, since it the cells it separates (G and K) are already part of the same group. 8

(2)Removing wall 14 (which separates H and L) produces this result: A B C D J / / \ \ \ E I F K L / \ M O G P N H c) It is also possible to do path compression during iteration - so that all nodes we pass through are made direct children of the root. If we did this, when we were done, set A would look like this A / / / /\ \ \ \ E I O F G H K L M P N This reduces - but does not eliminate - the need for iteration in the find operation. 3. A program that uses this approach a) DEMO b) HANDOUT Union/Find code 9