Data Structures in Java. Session 18 Instructor: Bert Huang

Similar documents
Data Structures in Java. Session 18 Instructor: Bert Huang

Data Structures in Java. Session 17 Instructor: Bert Huang

Union-Find: A Data Structure for Disjoint Set Operations

Data Structures and Algorithms

Union-Find: A Data Structure for Disjoint Set Operations

Course Review. Cpt S 223 Fall 2010

Union-find algorithms. (1) How do you determine whether a vertex x is connected to a vertex y in a graph?

Lecture Summary CSC 263H. August 5, 2016

Course Review for Finals. Cpt S 223 Fall 2008

Lecture 13. Reading: Weiss, Ch. 9, Ch 8 CSE 100, UCSD: LEC 13. Page 1 of 29

CSci 231 Final Review

Representations of Weighted Graphs (as Matrices) Algorithms and Data Structures: Minimum Spanning Trees. Weighted Graphs

Cpt S 223 Fall Cpt S 223. School of EECS, WSU

Disjoint Sets. (Disjoint Sets) Data Structures and Programming Spring / 50

6.1 Minimum Spanning Trees

CSE 100: GRAPH ALGORITHMS

Outline Purpose Disjoint Sets Maze Generation. Disjoint Sets. Seth Long. March 24, 2010

CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find. Lauren Milne Spring 2015

Course Review. Cpt S 223 Fall 2009

CSE 373 MAY 10 TH SPANNING TREES AND UNION FIND

Advanced Set Representation Methods

Minimum-Spanning-Tree problem. Minimum Spanning Trees (Forests) Minimum-Spanning-Tree problem

Greedy Approach: Intro

2.1 Greedy Algorithms. 2.2 Minimum Spanning Trees. CS125 Lecture 2 Fall 2016

tree follows. Game Trees

Graphs and Network Flows ISE 411. Lecture 7. Dr. Ted Ralphs

Cpt S 223 Course Overview. Cpt S 223, Fall 2007 Copyright: Washington State University

Disjoint Sets and the Union/Find Problem

CMSC 341 Lecture 20 Disjointed Sets

Minimum Spanning Trees My T. UF

Data Structures in Java. Session 23 Instructor: Bert Huang

CIS 121 Data Structures and Algorithms Minimum Spanning Trees

CS 561, Lecture 9. Jared Saia University of New Mexico

Complexity of Prim s Algorithm

Today s Outline. Motivation. Disjoint Sets. Disjoint Sets and Dynamic Equivalence Relations. Announcements. Today s Topics:

Minimum Spanning Trees

CSE 202: Design and Analysis of Algorithms Lecture 3

Announcements Problem Set 5 is out (today)!

CSE 332 Data Abstractions: Disjoint Set Union-Find and Minimum Spanning Trees

CSE373: Data Structures & Algorithms Lecture 28: Final review and class wrap-up. Nicki Dell Spring 2014

CS 310 Advanced Data Structures and Algorithms

Building a network. Properties of the optimal solutions. Trees. A greedy approach. Lemma (1) Lemma (2) Lemma (3) Lemma (4)

Chapter 5. Greedy algorithms

The Shortest Path Problem

Disjoint Sets. The obvious data structure for disjoint sets looks like this.

Chapter 6 Heaps. Introduction. Heap Model. Heap Implementation

looking ahead to see the optimum

Chapter 9. Greedy Technique. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Minimum Spanning Trees

22c:31 Algorithms. Kruskal s Algorithm for MST Union-Find for Disjoint Sets

UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 8 Lecturer: David Wagner February 20, Notes 8 for CS 170

managing an evolving set of connected components implementing a Union-Find data structure implementing Kruskal s algorithm

Priority Queues Heaps Heapsort

CSE373: Data Structures & Algorithms Lecture 9: Priority Queues. Aaron Bauer Winter 2014

Algorithms (IX) Yijia Chen Shanghai Jiaotong University

Disjoint Sets. Based on slides from previous iterations of this course.

Disjoint Sets. Maintaining Disjoint Sets Complexity Analysis

Priority Queues. T. M. Murali. January 23, T. M. Murali January 23, 2008 Priority Queues

CMSC 341 Lecture 20 Disjointed Sets

CS 5321: Advanced Algorithms Minimum Spanning Trees. Acknowledgement. Minimum Spanning Trees

22 Elementary Graph Algorithms. There are two standard ways to represent a

Priority Queues. T. M. Murali. January 29, 2009

Notes on Minimum Spanning Trees. Red Rule: Given a cycle containing no red edges, select a maximum uncolored edge on the cycle, and color it red.

Priority Queues. 04/10/03 Lecture 22 1

Soft Heaps And Minimum Spanning Trees

Algorithms and Theory of Computation. Lecture 7: Priority Queue

CSE202 Greedy algorithms. Fan Chung Graham

CMSC 341 Lecture 20 Disjointed Sets

CS 161: Design and Analysis of Algorithms

CSE 373: Data Structures and Algorithms

22 Elementary Graph Algorithms. There are two standard ways to represent a

CSE373: Data Structures & Algorithms Lecture 17: Minimum Spanning Trees. Dan Grossman Fall 2013

CSC 1700 Analysis of Algorithms: Minimum Spanning Tree

CSE 373 Sample Midterm #2 (closed book, closed notes, calculators o.k.)

CS 4349 Lecture October 18th, 2017

CE 221 Data Structures and Algorithms

Heap Model. specialized queue required heap (priority queue) provides at least

Union/Find Aka: Disjoint-set forest. Problem definition. Naïve attempts CS 445

Solutions to Exam Data structures (X and NV)

1 Shortest Paths. 1.1 Breadth First Search (BFS) CS 124 Section #3 Shortest Paths and MSTs 2/13/2018

Greedy Algorithms Part Three

Lecture 4: Graph Algorithms

Minimum Spanning Trees

UNIT III BALANCED SEARCH TREES AND INDEXING

Algorithm Design (8) Graph Algorithms 1/2

CSE332: Data Abstractions Lecture 25: Minimum Spanning Trees. Ruth Anderson via Conrad Nied Winter 2015

Review of course COMP-251B winter 2010

Priority queue ADT part 1

Optimization I : Brute force and Greedy strategy

Exercises: Disjoint Sets/ Union-find [updated Feb. 3]

Data Structures and Algorithms

Data Structures and Algorithms

CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms. Lecturer: Shi Li

Heaps. 2/13/2006 Heaps 1

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

CSE 373 Autumn 2010: Midterm #2 (closed book, closed notes, NO calculators allowed)

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics

Binary Heaps. COL 106 Shweta Agrawal and Amit Kumar

Chapter 2: Basic Data Structures

Today s Outline. The One Page Cheat Sheet. Simplifying Recurrences. Set Notation. Set Notation. Priority Queues. O( 2 n ) O( n 3 )

Transcription:

Data Structures in Java Session 18 Instructor: Bert Huang http://www.cs.columbia.edu/~bert/courses/3134

Announcements Homework posted, due 11/24 Old homeworks, midterm exams

Review Shortest Path algorithms Breadth first search Dijkstraʼs Algorithm All-Pairs Shortest Path

Todayʼs Plan Minimum Spanning Tree Primʼs Algorithm Kruskalʼs Algorithm Disjoint Sets

Minimum Spanning Tree Problem Definition Given connected graph G, find the connected, acyclic subgraph T with minimum edge weight A tree that includes every node is called a spanning tree The method to find the MST is another example of a greedy algorithm

Motivation for Greed Consider any spanning tree Adding another edge to the tree creates exactly one cycle Removing an edge from that cycle restores the tree structure

Primʼs Algorithm Grow the tree like Dijkstraʼs Algorithm Dijkstraʼs: grow the set of vertices to which we know the shortest path Primʼs: grow the set of vertices we have added to the minimum tree Store shortest edge D[ ] from each node to tree

Primʼs Algorithm Start with a single node tree, set distance of adjacent nodes to edge weights, infinite elsewhere Repeat until all nodes are in tree: Add the node v with shortest known distance Update distances of adjacent nodes w: D[w] = min( D[w], weight(v,w))

Primʼs Example 3 4 9 7 6

Primʼs Example 3 4 9 7 6

Primʼs Example 3 3 4 9 7 9 6

Primʼs Example 3 4 9 7 9 6

Primʼs Example 3 4 4 9 9 7 6 7

Primʼs Example 3 4 9 9 7 6 7

Primʼs Example 3 4 9 9 7 6 7

Primʼs Example 3 4 9 7 9 7 6

Primʼs Example 3 4 9 7 9 6 6

Primʼs Example 3 4 9 9 7 6

Primʼs Example 3 4 9 7 6

Primʼs Example 3 4 9 7 6

Implementation Details Store previous node like Dijkstraʼs Algorithm; backtrack to construct tree after completion Of course, use a priority queue to keep track of edge weights. Either keep track of nodes inside heap & decreasekey or just add a new copy of the node when key decreases, and call deletemin until you see a node not in the tree

Primʼs Algorithm Justification At any point, we can consider the set of nodes in the tree T and the set outside the tree Q Whatever the MST structure of the nodes in Q, at least one edge must connect the MSTs of T and Q The greedy edge is just as good structurally as any other edge, and has minimum weight

Primʼs Running Time Each stage requires one deletemin O(log V ), and there are exactly V stages We update keys for each edge, updating the key costs O(log V ) (either an insert or a decreasekey) Total time: O( V log V + E log V ) = O( E log V )

Kruskalʼs Algorithm Somewhat simpler conceptually, but more challenging to implement Algorithm: repeatedly add the shortest edge that does not cause a cycle until no such edges exist Each added edge performs a union on two trees; perform unions until there is only one tree Need special ADT for unions (Disjoint Set)

Kruskalʼs Example 3 4 9 7 6

Kruskalʼs Example 3 4 9 7 2 6

Kruskalʼs Example 3 4 9 7 2 6

Kruskalʼs Example 3 4 9 7 2 6

Kruskalʼs Example 3 4 9 7 2 6

Kruskalʼs Example 3 4 9 7 2 6

Kruskalʼs Example 3 4 9 7 2 6

Kruskalʼs Example 3 4 9 7 2 6

Kruskalʼs Justification At each stage, the greedy edge e connects two nodes v and w Eventually those two nodes must be connected; we must add an edge to connect trees including v and w We can always use e to connect v and w, which must have less weight since it's the greedy choice

Kruskalʼs Running Time First, buildheap costs O( E ) In the worst case, we have to call E deletemins E V 2 Total running time O( E log E ); but O( E log V 2 )=O(2 E log V ) =O( E log V )

MST Summary Connect all nodes in graph using minimum weight tree Two greedy algorithms: Primʼs: similar to Dijkstraʼs. Easier to code Kruskalʼs: easy on paper

Disjoint Sets

Motivating Example One interpretation of Kruskalʼs Algorithm: Merge sets by connecting nodes Think of trees as sets of connected nodes Never merge nodes that are in the same set Simple idea, but how can we implement it?

Equivalence Relations An equivalence relation is a relation operator that observes three properties: Reflexive: (a R a), for all a Symmetric: (a R b) if and only if (b R a) Transitive: (a R b) and (b R c) implies (a R c) Put another way, equivalence relations check if operands are in the same equivalence class

Equivalence Classes Equivalence class: the set of elements that are all related to each other via an equivalence relation Due to transitivity, each member can only be a member of one equivalence class Thus, equivalence classes are disjoint sets Choose any distinct sets S and T, S T =

Disjoint Set ADT Collection of objects, each in an equivalence class find(x) returns the class of the object union(x,y) puts x and y in the same class as well as every other relative of x and y Even less information than hash; no keys, no ordering

Implementation Observations One simple implementation would be to store the class label for each element in an array O(1) lookup for find, O(N) for union If we store equivalent elements in linked lists, we avoid scanning the whole set during union We can change the labels of the smaller class

Data Structure Store elements in equivalence (general) trees Use the treeʼs root as equivalence class label find returns root of containing tree union merges tree Since all operations only search up the tree, we can store in an array

Implementation Index all objects from 0 to N-1 Store a parent array such that s[i] is the index of iʼs parent If i is a root, store the negative size of its tree* find follows s[i] until negative, returns index union(x,y) points the root of xʼs tree to the root of yʼs tree

Analysis find costs the depth of the node union costs O(1) after finding the roots Both operations depend on the height of the tree Since these are general trees, the trees can be arbitrarily shallow

Union by Size Claim: if we union by pointing the smaller tree to the larger treeʼs root, the height is at most log N Each union increases the depths of nodes in the smaller trees Also puts nodes from the smaller tree into a tree at least twice the size We can only double the size log N times

Union by Size Figure 3 b 2 e 3 b a c d a c e d

Union by Height Similar method, attach the tree with less height to the taller tree overall height only increases if trees are equal height

Union by Height Figure 1 b 2 f 2 f a c e b e d g a c d g

Union by Height proof Induction: tree of height h has at least nodes Let T be tree of height h with least nodes possible via union operations At last union, T must have had height h-1, because otherwise, it would have been a smaller tree of height h Since the height was updated, T unioned with 2 h another tree of height h-1, each had at least nodes resulting in at least 2 h nodes for T 2 h 1

Path Compression Even if we have log N tall trees, we can keep calling find on the deepest node repeatedly, costing O(M log N) for M operations Additionally, we will perform path compression during each find call Point every node along the find path to root

Path Compression Figure 3 e 3 e d b b c d a c a

Union by Rank Path compression messes up union-by-height because we reduce the height when we compress We could fix the height, but this turns out to gain little, and costs find operations more Instead, rename to union by rank, where rank is just an overestimate of height Since heights change less often than sizes, rank/height is usually the cheaper choice

Worst Case Bound The algorithms described have been proven to have worst case Θ(Mα(M,N)) where α is the inverse of Ackermannʼs function: A(1,j) = 2 j A(i, 1) = A(i 1, 2) A(i, j) = A(i 1,A(i, j 1)) α(m,n) = min{i 1 A(i, M/N) > log N}

Worst Case Bound A slightly looser, but easier to prove/understand bound is that any sequence of M = Ω(N) operations will cost O(M log* N) running time log* N is the number of times the logarithm needs to be applied to N until the result is e.g., log*(636) = 4 because log(log(log(log(636)))) = 1 1

Log* Plots 4 log*(x) 3 2 1 0 0 10 20 30 40 0 60 70 80 90 100 x 6 4 2 0 0 2 4 6 8 10 12 x 10 29

Log* Steps N log* N = 1 log* N = 2 log* N = 3 log* N = 4 log* N = (1, 2] (2, 4] (4,16] (16, 636] (636, 2 636 ]

Note about Kruskalʼs With this bound, Kruskalʼs algorithm needs N-1 unions, so it should cost almost linear time to perform unions Unfortunately the algorithm is still dominated by heap deletemin calls, so asymptotic running time is still O(E log V)

Reading Weiss 9. (MST) Weiss 8.1-8. (Disjoint Sets)