Disjoint Sets (Disjoint Sets) Data Structures and Programming Spring 2017 1 / 50
Making a Good Maze

What's a good maze?
- Connected
- Just one path between any two rooms
- Random

The Maze Construction Problem

Given:
- collection of rooms: $V$
- connections between rooms (initially all closed): $E$

Construct a maze:
- collection of rooms: $V' = V$
- designated rooms in, $i \in V$, and out, $o \in V$
- collection of connections to knock down: $E' \subseteq E$

such that one unique path connects every two rooms.
Maze Construction Algorithm

While edges remain in $E$:
1. Remove a random edge $e = (u, v)$ from $E$. (How can we do this efficiently?)
2. If $u$ and $v$ have not yet been connected:
   - add $e$ to $E'$
   - mark $u$ and $v$ as connected. (How can we check connectedness efficiently?)
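The loop above can be sketched in Python as follows. This is a minimal sketch, not the course's reference code: the names `build_maze` and `grid_edges` are hypothetical, a single shuffle stands in for "remove a random edge", and connectedness is checked with a plain (unoptimized) parent map.

```python
import random


def build_maze(rooms, edges, seed=None):
    """Return the connections to knock down (E') for the given rooms and edges."""
    rng = random.Random(seed)
    parent = {room: room for room in rooms}  # every room starts in its own set

    def find(x):
        # Follow parent links up to the representative of x's set.
        while parent[x] != x:
            x = parent[x]
        return x

    order = list(edges)
    rng.shuffle(order)  # one shuffle replaces repeated random removals from E
    knocked_down = []
    for u, v in order:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:         # u and v are not yet connected
            parent[root_u] = root_v  # union: mark them as connected
            knocked_down.append((u, v))
    return knocked_down


def grid_edges(rows, cols):
    """All walls between horizontally or vertically adjacent rooms of a grid."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < rows:
                edges.append(((r, c), (r + 1, c)))
    return edges
```

For a connected set of rooms, the result always has one connection fewer than there are rooms, which is exactly the "one unique path" condition.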
Equivalence Relations

An equivalence relation $R \subseteq A \times A$ must have three properties:
- reflexive: $\forall x \in A,\ (x, x) \in R$
- symmetric: $(x, y) \in R \Rightarrow (y, x) \in R$
- transitive: $(x, y) \in R \wedge (y, z) \in R \Rightarrow (x, z) \in R$

Connection between rooms is an equivalence relation:
- any room is connected to itself
- if room a is connected to room b, then room b is connected to room a
- if room a is connected to room b and room b is connected to room c, then room a is connected to room c
Disjoint Set Union/Find ADT

Union/Find operations:
- create
- destroy
- union
- find

Disjoint set partition property: every element of a DS U/F structure belongs to exactly one set, and each set has a unique name.

Dynamic equivalence property: Union(a, b) creates a new set which is the union of the sets containing a and b.
Example

Construct the maze on the right.
Initial (the name of each set is in boldface): {a}{b}{c}{d}{e}{f}{g}{h}{i}
Randomly select edge 1.
Example, First Step

{a}{b}{c}{d}{e}{f}{g}{h}{i}
find(b) returns b
find(e) returns e
find(b) $\neq$ find(e), so:
- add 1 to $E'$
- union(b, e)

{a}{b, e}{c}{d}{f}{g}{h}{i}
Up-Tree Intuition

Finding the representative member of a set is somewhat like the opposite of finding whether a given key exists in a set. So, instead of using trees with pointers from each node to its children, let's use trees with a pointer from each node to its parent.
- Each subset is an up-tree with its root as its representative member.
- All members of a given set are nodes in that set's up-tree.
- A hash table maps input data to the node associated with that data.
Find
Union
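Find and union on up-trees can be sketched as below. This is an illustrative sketch under assumptions: the class names `Node` and `UpTreeForest` are hypothetical, a Python dict plays the role of the hash table from data to node, and union arbitrarily makes the first argument's root the new root.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.parent = None  # None marks the root of an up-tree


class UpTreeForest:
    def __init__(self, items):
        # Hash table: input data -> node associated with that data.
        self.node_of = {item: Node(item) for item in items}

    def _root(self, item):
        node = self.node_of[item]
        while node.parent is not None:  # walk parent pointers up to the root
            node = node.parent
        return node

    def find(self, item):
        """Return the name (representative member) of item's set."""
        return self._root(item).data

    def union(self, a, b):
        """Merge the sets containing a and b."""
        root_a, root_b = self._root(a), self._root(b)
        if root_a is not root_b:
            root_b.parent = root_a  # arbitrary choice of new root


forest = UpTreeForest("abcdefghi")
forest.union("b", "e")  # the first step of the maze example
```

After the call, `forest.find("b")` and `forest.find("e")` return the same name, so edge 1 would not be knocked down a second time.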
The Whole Example
Nifty storage trick

A forest of up-trees can easily be stored in an array. Also, if the node names are integers or characters, we can use a very simple, perfect hash.
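A minimal sketch of the array representation, assuming node names are the lowercase letters 'a' to 'i'. The helper `index_of` and the convention that `-1` marks a root are assumptions made for illustration.

```python
def index_of(name):
    """Perfect hash for single lowercase letters: 'a' -> 0, 'b' -> 1, ..."""
    return ord(name) - ord("a")


def find(up, x):
    """Return the index of the root of x's up-tree."""
    while up[x] != -1:  # -1 marks a root
        x = up[x]
    return x


def union(up, x, y):
    """Point the root of y's tree at the root of x's tree."""
    root_x, root_y = find(up, x), find(up, y)
    if root_x != root_y:
        up[root_y] = root_x


up = [-1] * 9  # nine singleton sets {a} ... {i}
union(up, index_of("b"), index_of("e"))
# up is now [-1, -1, -1, -1, 1, -1, -1, -1, -1]: e's parent is b
```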
Room for Improvement: Weighted Union

- Always make the root of the larger tree the new root.
- This often cuts down on the height of the new up-tree.
Weighted Union Find Analysis

- Finds with weighted union are $O(\text{max up-tree height})$.
- But an up-tree of height $h$ built with weighted union must have at least $2^h$ nodes.
- Hence $2^{\text{max height}} \le n$ and $\text{max height} \le \log n$.
- So, find takes $O(\log n)$.
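A sketch of weighted union on the array representation. Storing the negated tree size in each root's slot is one common convention and is an assumption here, as are the function names. The `depth` helper exists only to check the height bound empirically.

```python
import math
import random


def find(up, x):
    while up[x] >= 0:  # negative entries mark roots
        x = up[x]
    return x


def weighted_union(up, x, y):
    """Union by size: the root of the larger tree becomes the new root."""
    root_x, root_y = find(up, x), find(up, y)
    if root_x == root_y:
        return
    if up[root_x] > up[root_y]:  # sizes are negated, so root_x's tree is smaller
        root_x, root_y = root_y, root_x
    up[root_x] += up[root_y]     # new size = sum of both sizes
    up[root_y] = root_x


def depth(up, x):
    """Number of parent links between x and its root."""
    d = 0
    while up[x] >= 0:
        x = up[x]
        d += 1
    return d


n = 64
up = [-1] * n  # n singleton trees, each of size 1
rng = random.Random(0)
for _ in range(200):
    weighted_union(up, rng.randrange(n), rng.randrange(n))
max_depth = max(depth(up, x) for x in range(n))
# By the analysis above, max_depth can never exceed log2(n) = 6.
```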
Alternatives to Weighted Union

- Union by height
- Ranked union (cheaper approximation to union by height)
Room for Improvement: Path Compression

- Points every node along the path of a find directly at the root.
- Reduces the height of the entire access path to 1.
Path Compression Example
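A small sketch of find with path compression on the array representation, assuming `-1` marks a root. The two-pass structure (locate the root, then repoint the path) is one straightforward way to implement it; the name `find_compress` is hypothetical.

```python
def find_compress(up, x):
    """Find the root of x and point every node on the path directly at it."""
    root = x
    while up[root] != -1:  # first pass: locate the root
        root = up[root]
    while x != root:       # second pass: repoint each node on the path
        next_node = up[x]
        up[x] = root
        x = next_node
    return root


# A chain 3 -> 2 -> 1 -> 0, where 0 is the root.
up = [-1, 0, 1, 2]
root = find_compress(up, 3)
# root is 0 and up is now [-1, 0, 0, 0]: the access path has height 1
```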
Two Heuristics

1. Union by Rank
   - Store the rank of the tree in its representative. Rank $\approx$ tree size.
   - Make the root with smaller rank point to the root with larger rank.
2. Path Compression
   - During Find-Set, flatten the tree.
Operations
Find Set
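The operations Make-Set, Union, and Find-Set with both heuristics can be sketched as follows. This follows the usual textbook formulation and is not necessarily the exact pseudocode from the lecture. The class name `DisjointSets` and the helper `link` are assumptions.

```python
class DisjointSets:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x  # a root is its own parent
        self.rank[x] = 0

    def find_set(self, x):
        if self.parent[x] != x:
            # Path compression: point x directly at the root on the way back.
            self.parent[x] = self.find_set(self.parent[x])
        return self.parent[x]

    def link(self, x, y):
        """Union by rank on two distinct roots x and y."""
        if self.rank[x] > self.rank[y]:
            self.parent[y] = x
        else:
            self.parent[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1  # rank grows only when the ranks tie

    def union(self, x, y):
        root_x, root_y = self.find_set(x), self.find_set(y)
        if root_x != root_y:
            self.link(root_x, root_y)


ds = DisjointSets()
for item in range(8):
    ds.make_set(item)
ds.union(0, 1)
ds.union(2, 3)
ds.union(0, 2)
```

After these three unions, elements 0 to 3 share one representative of rank 2, while elements 4 to 7 remain singletons.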
A Slow Growing Function
Complex Complexity of Weighted Union + Path Compression

Tarjan (1984) proved that $m$ weighted union and find operations with path compression on a set of $n$ elements have worst case complexity $O(m \log^*(n))$ (actually even a little better!).

For all practical purposes this is amortized constant time ($\log^* n \le 5$).
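To see why the iterated logarithm $\log^* n$ is at most 5 in practice, here is a small sketch that counts how many times $\log_2$ must be applied before the value drops to 1 or below. The function name `log_star` is an assumption for illustration.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n until the result is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


# log_star(16) is 3, log_star(65536) is 4, and log_star(2 ** 65536) is 5.
```

Any input whose size fits in a physical computer is far below $2^{65536}$, so the factor stays at 5 or less.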
One-pass path compression

We can modify the structure of the tree as each edge is processed to attempt to save on the overall run-time of the algorithm. When we do a Find, we can do a bit of maintenance work along the path traversed.
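One well-known one-pass variant is path halving, where each visited node is pointed at its grandparent during the single walk to the root. The slide does not say which variant it has in mind, so treat the choice of path halving and the name `find_halving` as assumptions.

```python
def find_halving(parent, x):
    """One-pass find: each visited node is repointed to its grandparent."""
    while parent[x] != x:                # a root is its own parent
        parent[x] = parent[parent[x]]    # skip over the parent
        x = parent[x]
    return x


# A chain 4 -> 3 -> 2 -> 1 -> 0, where 0 is the root.
parent = [0, 0, 1, 2, 3]
root = find_halving(parent, 4)
# root is 0 and parent is now [0, 0, 0, 2, 2]: the path length is halved
```

Unlike full path compression, this needs no second pass or recursion, at the cost of flattening the path more gradually.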
Analysis

- $n$ elements ($n$ Make-Sets, at most $n - 1$ Unions)
- $m$ Make-Set, Union, and Find-Set operations, with $m = \Omega(n)$
- worst case $= O(m\,\alpha(m, n)) = O(m \log^* n)$
- $\alpha(m, n)$ is the inverse of Ackermann's function, a very, very, very slowly growing function

$A(1, j) = 2^j$ for $j \ge 1$
$A(i, 1) = A(i - 1, 2)$ for $i \ge 2$
$A(i, j) = A(i - 1, A(i, j - 1))$ for $i, j \ge 2$
$\alpha(m, n) = \min\{\, i \ge 1 : A(i, \lfloor m/n \rfloor) > \log n \,\}$
Ackermann's Function
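To get a feel for how fast this variant of Ackermann's function grows, the recurrence from the analysis ($A(1, j) = 2^j$, $A(i, 1) = A(i-1, 2)$, $A(i, j) = A(i-1, A(i, j-1))$) can be evaluated directly for very small arguments. This is a sketch for illustration only: anything beyond the few values noted in the comments is far too large to compute.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ackermann(i, j):
    """A(i, j) as defined by the recurrence above; feasible only for tiny inputs."""
    if i == 1:
        return 2 ** j
    if j == 1:
        return ackermann(i - 1, 2)
    return ackermann(i - 1, ackermann(i, j - 1))


# ackermann(2, 1) is 4, ackermann(2, 2) is 16, ackermann(2, 3) is 65536,
# and ackermann(3, 1) is 16. ackermann(2, 4) already equals 2 ** 65536.
```

Because $A(3, 2) = A(2, 16)$ is a tower of 2s far beyond any realistic $\log n$, the inverse $\alpha(m, n)$ stays a very small constant in practice.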
Properties of Ranks: #1

For all tree roots $x$, $\text{size}(x) \ge 2^{\text{rank}(x)}$.
- Basis: rank $= 0$, and $2^0 = 1$: only one root node.
- Induction: consider Union(a, b). The rank is increased by 1 only if rank[a] = rank[b], in which case
  $\text{size}(a \cup b) = \text{size}(a) + \text{size}(b) \ge 2^{\text{rank}[a]} + 2^{\text{rank}[b]} = 2^{\text{rank}[a] + 1} = 2^{\text{rank}[a \cup b]}$
Properties of Ranks: #2

The ranks of the nodes on a path from a leaf to a root increase monotonically.
Properties of Ranks: #3

The number of nodes of rank $r$ is at most $n / 2^r$.
- Each node of rank $r$ is the root of a subtree of at least $2^r$ nodes (property #1).
- All such subtrees are disjoint.
- Hence, there are at most $n / 2^r$ disjoint subtrees of rank $r$, so there are at most $n / 2^r$ nodes of rank $r$.
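The three rank properties can be checked empirically. The sketch below runs random unions (by rank, with path compression in find) and then asserts each property. The function name `check_rank_properties`, the number of operations, and the random seed are arbitrary choices for illustration.

```python
import random


def check_rank_properties(n, seed=0):
    rng = random.Random(seed)
    parent = list(range(n))  # a root is its own parent
    rank = [0] * n

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while x != root:  # path compression
            next_node = parent[x]
            parent[x] = root
            x = next_node
        return root

    for _ in range(4 * n):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a == b:
            continue
        if rank[a] < rank[b]:
            a, b = b, a
        parent[b] = a  # smaller rank points to larger rank
        if rank[a] == rank[b]:
            rank[a] += 1

    # Count the nodes in each tree without modifying the forest.
    size = [0] * n
    for x in range(n):
        r = x
        while parent[r] != r:
            r = parent[r]
        size[r] += 1

    # Property 1: every root x has size(x) >= 2^rank(x).
    assert all(size[x] >= 2 ** rank[x] for x in range(n) if parent[x] == x)
    # Property 2: ranks increase along every path toward the root.
    assert all(rank[parent[x]] > rank[x] for x in range(n) if parent[x] != x)
    # Property 3: at most n / 2^r nodes have rank r.
    for r in range(max(rank) + 1):
        assert sum(1 for x in range(n) if rank[x] == r) <= n / 2 ** r
    return True
```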
Analysis

- The actual cost of Find-Set(x) is the number of nodes on the path from x to the root.
- Accounting method: deposit 1 dollar for every node on the path from x to the root.
- For $m$ operations, how many dollars did we spend?

Partition ranks into groups:
- group 0 contains only elements of rank 0
- rank $r$ goes to group $G(r)$
- the largest rank in any group $g$ is $F(g)$, where $F = G^{-1}$
- the number of ranks in any group $g > 0$ is $F(g) - F(g - 1)$
Rank Grouping: Example
Money Deposit
Money Deposit Rules

For each node $v$ on the path:
- Deposit one dollar into the central account if
  - $v$ is the root,
  - the parent of $v$ is the root, or
  - the parent of $v$ is in a different group from $v$.
- Otherwise, deposit one dollar into the node.

Each operation deposits at most $G(n) + 2$ dollars into the central account, so after $m$ operations the total deposit at the central account is at most $m(G(n) + 2)$.
Central Account Deposit
Node Accounts

Let node $v$ be in group $g$. How many dollars can be deposited into node $v$?
- If one dollar is deposited into a node $v$, $v$ will get a new parent of higher rank.
- How many times can $v$ be moved before its parent gets pushed out of group $g$?
- Answer: at most $F(g) - F(g - 1) - 1$ times, bounded by the number of ranks in group $g$ ($F(g)$ is the largest rank in group $g$).
The Number of Nodes in a Group

The number of nodes, $V(g)$, in group $g > 0$ is at most $n / 2^{F(g - 1)}$.
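The bound on $V(g)$ follows from property #3 of ranks. A short derivation, assuming (as the grouping implies) that group $g > 0$ holds exactly the ranks $F(g-1) + 1$ through $F(g)$:

```latex
\begin{align*}
V(g) &\le \sum_{r = F(g-1) + 1}^{F(g)} \frac{n}{2^{r}}
      && \text{(at most } n/2^{r} \text{ nodes of rank } r\text{)} \\
     &\le \sum_{r = F(g-1) + 1}^{\infty} \frac{n}{2^{r}}
      && \text{(extend the sum)} \\
     &= \frac{n}{2^{F(g-1)}}
      && \text{(geometric series)}
\end{align*}
```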
Total Node Deposit

The maximum number of dollars deposited to all nodes in group $g$ is at most $n F(g) / 2^{F(g - 1)}$:
- the number of nodes in group $g > 0$ is at most $n / 2^{F(g - 1)}$
- each node gets at most $F(g) - F(g - 1) - 1$ dollars
- $F(g) - F(g - 1) - 1 < F(g)$

The total node deposit is at most $\sum_{g=1}^{G(n)} n F(g) / 2^{F(g - 1)}$.
Total Deposit

$m(G(n) + 2) + n \sum_{g=1}^{G(n)} F(g) / 2^{F(g - 1)}$
(central account deposit + total node deposit)

What are $G(n)$ and $F(g)$?
Total Deposit: A Loose Bound

Let $F(g) = g$, $G(r) = r$. Then
$m(G(n) + 2) + n \sum_{g=1}^{G(n)} F(g) / 2^{F(g - 1)}$
becomes
$m(n + 2) + n \sum_{g=1}^{n} g / 2^{g - 1} < m(n + 2) + 4n = O(mn)$,
since $\sum_{g \ge 1} g / 2^{g - 1} = 4$.

This is a very loose upper bound, worse than $O(m \log n)$.
Total Deposit

Let $F(g) = 2^{F(g - 1)}$, $G(r) = 1 + \log^* r$. Note that $m = \Omega(n)$. Then
$m(G(n) + 2) + n \sum_{g=1}^{G(n)} F(g) / 2^{F(g - 1)}$
becomes
$m(1 + \log^* n + 2) + n \sum_{g=1}^{1 + \log^* n} 2^{F(g - 1)} / 2^{F(g - 1)} = m(3 + \log^* n) + n(1 + \log^* n) = O(m \log^* n)$
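The rank groups produced by this choice of $F$ can be tabulated with a short sketch. The base case $F(0) = 0$ is an assumption (the slides do not state it, but it matches group 0 holding only rank 0), and $G$ is computed here as the smallest $g$ with $F(g) \ge r$ rather than through a closed form.

```python
def F(g):
    """Largest rank in group g, with the assumed base case F(0) = 0."""
    value = 0
    for _ in range(g):
        value = 2 ** value  # F(g) = 2^F(g-1)
    return value


def G(r):
    """Group of rank r: the smallest g whose largest rank F(g) is at least r."""
    g = 0
    while F(g) < r:
        g += 1
    return g


# F(0..5) is 0, 1, 2, 4, 16, 65536, so the groups of ranks are:
# group 0: {0}, group 1: {1}, group 2: {2}, group 3: {3, 4},
# group 4: {5, ..., 16}, group 5: {17, ..., 65536}
```

Since ranks never exceed $\log n$, only the first handful of groups can ever be occupied, which is where the near-constant factor in the final bound comes from.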