Algorithms and Theory of Computation Lecture 5: Minimum Spanning Tree Xiaohui Bei MAS 714 August 31, 2017 Nanyang Technological University MAS 714 August 31, 2017 1 / 30
Minimum Spanning Trees (MST)
A weighted graph is a graph in which each edge e is associated with a numerical weight c_e.
Spanning Tree: A spanning tree of an undirected connected graph G = (V, E) is a set of edges T ⊆ E such that T forms a tree and (V, T) is connected.
In a weighted graph G, a minimum spanning tree is a spanning tree with the smallest sum of edge weights.
MST problem: Find a minimum spanning tree of a weighted graph G.
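The definition above can be checked mechanically. What follows is a minimal Python sketch, assuming edges are stored as (u, v, cost) triples; the names `is_spanning_tree`, `total_cost` and the five-vertex graph `EDGES` are illustrative and do not come from the lecture.

```python
# Hypothetical example graph with distinct edge costs (not the graph from the slides).
VERTICES = ["a", "b", "c", "d", "e"]
EDGES = [
    ("a", "b", 4), ("a", "c", 1), ("b", "c", 3), ("b", "d", 2),
    ("c", "d", 5), ("d", "e", 6), ("c", "e", 7),
]


def is_spanning_tree(vertices, tree_edges):
    """Return True if tree_edges connects all vertices using exactly n - 1 edges."""
    vertices = set(vertices)
    if not vertices or len(tree_edges) != len(vertices) - 1:
        return False
    adjacency = {v: [] for v in vertices}
    for u, v, _cost in tree_edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Depth-first search from an arbitrary vertex; a connected graph
    # with n - 1 edges is necessarily a tree.
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == vertices


def total_cost(edges):
    """Sum of the edge weights c_e."""
    return sum(cost for _u, _v, cost in edges)


candidate = [("a", "c", 1), ("b", "d", 2), ("b", "c", 3), ("d", "e", 6)]
print(is_spanning_tree(VERTICES, candidate), total_cost(candidate))
```

The check relies on the fact that a connected graph on n vertices with exactly n − 1 edges contains no cycle.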
Example
[Figure: a weighted undirected graph on vertices a through g with edge weights 5, 5, 6, 7, 7, 8, 8, 9, 9, 11 and 15.]
Applications
A basic and fundamental problem in graph theory, often used to model the cost of establishing connections between vertices.
1 Network Design: designing networks of minimum cost while guaranteeing connectivity.
2 Approximation Algorithms: an MST can be used to approximate hard problems such as the Traveling Salesman Problem and Steiner Tree.
Greedy Template
Algorithm: SomeGreedyMSTAlgorithm(G):
    Initialize T = ∅;  // T will store edges of a MST
    while T is not a spanning tree of G do
        choose e ∈ E that satisfies condition;
        add e to T;
    return T
Which edges, and in what order, should be processed and added to the spanning tree?
Kruskal's Algorithm
Process edges in the order of their costs (in increasing order), and add edges to T as long as they don't form a cycle.
[Figure: step-by-step run of Kruskal's algorithm on the example graph with vertices a through g.]
Prim's Algorithm
T maintained by the algorithm will be a tree, starting from a single vertex. In each iteration, pick the edge with the least attachment cost to T.
[Figure: step-by-step run of Prim's algorithm on the example graph with vertices a through g.]
Correctness of MST algorithms
There are many different MST algorithms. All of them rely on some basic properties of MSTs.
Assumption: For simplicity, we assume that edge costs are distinct, that is, no two edge costs are equal.
Cut
Given a graph G = (V, E), a cut is a partition of the vertices into two disjoint subsets S and V \ S. Edges that have one endpoint in each subset of the partition are the edges of the cut, and these edges are said to cross the cut.
[Figure: a cut (S, V \ S).]
Safe Edges
An edge e is a safe edge if there exists a partition of V into S and V \ S such that e is the unique minimum cost edge crossing this partition.
[Figure: a cut (S, V \ S) crossed by edges of cost 13, 7, 3, 5 and 11, with the safe edge in the cut highlighted.]
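To make the definitions of a cut and a safe edge concrete, here is a small Python sketch. It assumes edges are (u, v, cost) triples; `crossing_edges`, `safe_edge_for_cut` and the graph `EDGES` are hypothetical names and data chosen for illustration.

```python
# Hypothetical example graph with distinct edge costs (not the graph from the slides).
EDGES = [
    ("a", "b", 4), ("a", "c", 1), ("b", "c", 3), ("b", "d", 2),
    ("c", "d", 5), ("d", "e", 6), ("c", "e", 7),
]


def crossing_edges(edges, S):
    """Edges with exactly one endpoint in S, i.e. the edges crossing the cut (S, V \\ S)."""
    S = set(S)
    return [(u, v, c) for u, v, c in edges if (u in S) != (v in S)]


def safe_edge_for_cut(edges, S):
    """Return the unique minimum cost edge crossing the cut, or None if there is none."""
    crossing = sorted(crossing_edges(edges, S), key=lambda e: e[2])
    if not crossing:
        return None  # no edge crosses this cut
    if len(crossing) > 1 and crossing[0][2] == crossing[1][2]:
        return None  # the minimum is not unique, so this cut certifies no safe edge
    return crossing[0]


print(crossing_edges(EDGES, {"a", "c"}))
print(safe_edge_for_cut(EDGES, {"a", "c"}))
```

For the cut with S = {a, c}, the crossing edges have costs 4, 3, 5 and 7, so the edge of cost 3 is safe.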
Cut Property
Theorem: Given an undirected connected graph G with distinct edge costs, the set of safe edges in G forms the unique MST of G.
Proof. Prove the theorem in two steps:
1 If e is a safe edge, then every MST contains e.
2 The set of safe edges forms a connected graph that covers every vertex.
By step 1 the safe edges are contained in every MST, and by step 2 they already connect all of V, so no MST can contain any other edge. Hence every MST equals the set of safe edges.
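The theorem can be sanity-checked by brute force on a tiny graph. The sketch below enumerates every cut, which takes exponential time and is meant only as an illustration; `all_safe_edges` and the graph data are hypothetical, and distinct edge costs are assumed so that the minimum crossing edge is always unique.

```python
from itertools import combinations

# Hypothetical example graph with distinct edge costs (not the graph from the slides).
VERTICES = ["a", "b", "c", "d", "e"]
EDGES = [
    ("a", "b", 4), ("a", "c", 1), ("b", "c", 3), ("b", "d", 2),
    ("c", "d", 5), ("d", "e", 6), ("c", "e", 7),
]


def all_safe_edges(vertices, edges):
    """Collect the minimum cost crossing edge of every cut (S, V \\ S).

    Assumes distinct edge costs. Runs in O(2^n * m) time: illustration only.
    """
    vertices = list(vertices)
    safe = set()
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            S = set(subset)
            crossing = [e for e in edges if (e[0] in S) != (e[1] in S)]
            if crossing:
                safe.add(min(crossing, key=lambda e: e[2]))
    return safe


safe = all_safe_edges(VERTICES, EDGES)
print(sorted(safe, key=lambda e: e[2]))
print(len(safe) == len(VERTICES) - 1)
```

On this graph the safe edges number exactly n − 1 = 4 and have total cost 12, matching what a spanning tree requires.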
Cut Property
Lemma 1: If e is a safe edge, then every MST contains e.
Proof. Assume for contradiction that e is not in some MST T.
1 e = {u, v} is safe ⟹ there is an S ⊂ V such that e is the unique minimum cost edge crossing the cut (S, V \ S).
2 Adding e to T creates a cycle C in T ∪ {e}.
3 A cycle crosses any cut an even number of times, so there must exist another edge e′ ≠ e in C (hence in T) that crosses this cut.
4 T′ = (T \ {e′}) ∪ {e} is a spanning tree of lower cost than T, a contradiction.
Cut Property
Lemma 2: The set of safe edges forms a connected graph that covers every vertex.
Proof.
1 Assume not. Let S be a connected component of the graph induced by the safe edges; then S ≠ V.
2 Since G is connected, some edge crosses the cut (S, V \ S). Let e be the smallest cost edge crossing it ⟹ e is a safe edge.
3 But e joins S to a vertex outside S, contradicting that S is a connected component of the safe edges.
Kruskal's Algorithm
Process edges in the order of their costs (in increasing order), and add edges to T as long as they don't form a cycle.
Algorithm: Kruskal(G):
    Initialize T = ∅;  // T will store edges of a MST
    while T is not a spanning tree of G do
        choose e ∈ E of minimum cost among the edges not yet considered;
        if T ∪ {e} does not contain cycles then
            add e to T;
    return T
Running time, with n = |V| and m = |E|:
Presort edges by cost. Choosing the minimum then takes O(1) time.
Check for a cycle with a DFS on T ∪ {e}. This takes O(n) time.
Total time: O(m log m) + O(mn) = O(mn).
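The O(mn) version described above can be sketched in Python as follows. This is one possible rendering, not the lecture's code: `creates_cycle` and `kruskal_naive` are hypothetical names, edges are assumed to be (u, v, cost) triples, and the loop simply scans all sorted edges instead of testing whether T is already a spanning tree.

```python
# Hypothetical example graph with distinct edge costs (not the graph from the slides).
EDGES = [
    ("a", "b", 4), ("a", "c", 1), ("b", "c", 3), ("b", "d", 2),
    ("c", "d", 5), ("d", "e", 6), ("c", "e", 7),
]


def creates_cycle(tree_edges, u, v):
    """Return True if v is already reachable from u via tree_edges (DFS, O(n) on a forest)."""
    adjacency = {}
    for a, b, _cost in tree_edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    seen = {u}
    stack = [u]
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adjacency.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def kruskal_naive(edges):
    """Kruskal's algorithm with a DFS cycle check per edge: O(m log m + mn)."""
    tree = []
    for u, v, cost in sorted(edges, key=lambda e: e[2]):  # presort by cost
        if not creates_cycle(tree, u, v):
            tree.append((u, v, cost))
    return tree


print(kruskal_naive(EDGES))
```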
Correctness
Process edges in the order of their costs (in increasing order), and add edges to T as long as they don't form a cycle.
We only need to show that all edges added are safe.
Proof of Correctness. When e = (u, v) is added to T, let S and S′ be the connected components of T containing u and v respectively. Any edge crossing the cut (S, V \ S) joins two different components of T, so it has not been processed yet (otherwise it would have been added). Since e is the cheapest unprocessed edge, e has minimum cost among all edges crossing the cut (S, V \ S) (and also the cut (S′, V \ S′)) ⟹ e is a safe edge.
More Efficient Implementation
Algorithm: SmarterKruskal(G):
    Initialize T = ∅;  // T will store edges of a MST
    Put each vertex u ∈ V into a set by itself;
    foreach e = {u, v} ∈ E in the order of increasing costs do
        if u and v belong to different sets then
            add e to T;
            merge the two sets containing u and v;
    return T
Need a data structure to:
- check if two elements belong to the same set
- merge two sets
Data Structure: Union-Find
Union-Find: Store a collection of disjoint sets with the following operations:
1 Make-Set(V): generate a set {v} for each vertex v ∈ V. The name of set {v} is v.
2 Find(u): find the name of the set containing vertex u.
3 Union(u, v): merge the sets named u and v. The name of the new set is either u or v.
The running time of Kruskal's algorithm will depend on the implementation of the data structure.
Union-Find: Implementation
Sets are represented as trees, with pointers towards the roots. All elements in one tree belong to the set named after the root.
- Find(u): Traverse from u to the root.
- Union(u, v): Make the root of u (the smaller set) point to the root of v. Takes O(1) time.
Each vertex u has a pointer parent(u) to its parent in the tree; a root points to itself.
[Figure: the trees on vertices s, v, w and u, before and after Union(Find(v), Find(u)).]
New Implementation
Algorithm: Make-Set(G):
    foreach u ∈ V do
        parent(u) = u; component(u) = 1;
Algorithm: Find(u):
    while parent(u) ≠ u do
        u = parent(u);
    return u
Algorithm: Union(u, v):  // requires parent(u) = u and parent(v) = v
    if component(u) ≤ component(v) then
        parent(u) = v
    else
        parent(v) = u
    set new component size to component(u) + component(v).
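The three operations above translate fairly directly into Python. The class below is a minimal sketch under the assumption that elements are hashable; the class name `UnionFind` and the attribute `size` (standing in for `component`) are illustrative choices. As in the pseudocode, `union` expects two set names, that is, two roots.

```python
class UnionFind:
    """Disjoint sets as parent-pointer trees with union by size (no path compression)."""

    def __init__(self, elements):
        # Make-Set: every element starts as the root of its own one-element tree.
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, u):
        # Walk parent pointers until reaching a root (a vertex that points to itself).
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, u, v):
        # Precondition: u and v are roots, i.e. results of find().
        if u == v:
            return u
        if self.size[u] <= self.size[v]:
            self.parent[u] = v  # the smaller tree hangs under the larger one
            self.size[v] += self.size[u]
            return v
        self.parent[v] = u
        self.size[u] += self.size[v]
        return u


uf = UnionFind("abcd")
uf.union(uf.find("a"), uf.find("b"))
uf.union(uf.find("c"), uf.find("d"))
print(uf.find("a") == uf.find("b"), uf.find("a") == uf.find("c"))
```

Callers typically write `uf.union(uf.find(x), uf.find(y))`, mirroring the Union(Find(v), Find(u)) call shown in the figure.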
Analysis
Make-Set: O(n) time.
Union: O(1) time.
Find: O(depth of the tree) time.
Proposition: The maximum depth of trees in union-find is O(log n).
Proof. The depth of u increases, by 1, only when the set containing u changes its name. Whenever this happens, the set containing u is merged into a set that is at least as large, so the size of the set containing u at least doubles. The maximum set size is n, so the depth of any vertex is at most log₂ n = O(log n).
Speed up!
When calling Find(u), we traverse the path from u to the root.
Consecutive calls of Find(u) traverse the same path.
Idea: Path Compression. Make all vertices on the path in Find(u) point to the root directly.
Path Compression: Example
Algorithm: Find(u):
    if parent(u) ≠ u then
        parent(u) = Find(parent(u));
    return parent(u)
[Figure: a tree with root r containing v, w and u, before and after Find(u).]
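The recursive Find above can be tried out on a small chain. In this sketch the parent pointers live in a plain dictionary, and the chain u → w → v → r is an assumed shape for illustration; the exact tree in the figure is not recoverable from the text.

```python
def find(parent, u):
    """Find with path compression: every vertex on the path ends up pointing at the root."""
    if parent[u] != u:
        parent[u] = find(parent, parent[u])
    return parent[u]


# Hypothetical chain u -> w -> v -> r, with r as the root.
parent = {"r": "r", "v": "r", "w": "v", "u": "w"}
root = find(parent, "u")
print(root, parent)
```

After the call, u, w and v all point directly at r, so a later Find on any of them follows a single pointer. Note that the recursion depth equals the path length, which could matter in Python for very deep trees.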
Path Compression
Question: Does Path Compression help? Yes!
Theorem: With Path Compression (combined with union by size as above), the amortized running time of a Find operation is O(α(n)), where α(n) is the inverse of the Ackermann function A(n, n).
Ackermann and Inverse Ackermann Functions
Ackermann function A(m, n) defined for m, n ≥ 0:
$$
A(m, n) =
\begin{cases}
n + 1 & \text{if } m = 0 \\
A(m - 1, 1) & \text{if } m > 0 \text{ and } n = 0 \\
A(m - 1, A(m, n - 1)) & \text{if } m > 0 \text{ and } n > 0
\end{cases}
$$
$A(3, n) = 2^{n+3} - 3$
$A(4, 3) = 2^{2^{65536}} - 3$
α(n) is the inverse of A(n, n).
For all practical purposes, α(n) ≤ 5.
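To get a feel for how quickly A(m, n) grows, the recurrence can be evaluated for very small arguments. The sketch below is an illustrative Python version that replaces the recursion with an explicit stack, since the direct recursive form can exceed Python's recursion limit; the function name `ackermann` is an assumption, and anything beyond m = 3 is not practical to compute.

```python
def ackermann(m, n):
    """Evaluate the Ackermann function A(m, n) using an explicit stack of pending m values."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n = n + 1                # A(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)      # A(m, 0) = A(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)      # outer call A(m - 1, .) waits for the inner result
            stack.append(m)          # inner call A(m, n - 1) is evaluated first
            n = n - 1
    return n


# Diagonal values A(k, k) for k = 0..3; alpha(n) is the inverse of this sequence.
print([ackermann(k, k) for k in range(4)])
```

The diagonal values for k = 0, 1, 2, 3 are 1, 3, 7 and 61, while A(4, 4) is already far beyond anything that could be stored, which is why α(n) stays tiny for every realistic n.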
Running time of Kruskal's Algorithm
Using the Union-Find data structure, Kruskal's Algorithm takes
- O(m) Find operations (two for each edge)
- O(n) Union operations (one for each edge added to T)
- 1 sorting operation
Total time = O(mα(n) + n + m log m) = O(m log m)
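Putting the pieces together, the following Python sketch runs Kruskal's algorithm on top of a union-find structure with union by size and path compression. It is a sketch under stated assumptions rather than the lecture's implementation: the function name `kruskal`, the (u, v, cost) edge format and the example graph are all hypothetical, and path compression is written iteratively.

```python
# Hypothetical example graph with distinct edge costs (not the graph from the slides).
VERTICES = ["a", "b", "c", "d", "e"]
EDGES = [
    ("a", "b", 4), ("a", "c", 1), ("b", "c", 3), ("b", "d", 2),
    ("c", "d", 5), ("d", "e", 6), ("c", "e", 7),
]


def kruskal(vertices, edges):
    """Kruskal's algorithm with union-find; the sort dominates at O(m log m)."""
    parent = {v: v for v in vertices}  # Make-Set
    size = {v: 1 for v in vertices}

    def find(u):
        root = u
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while parent[u] != root:
            next_u = parent[u]
            parent[u] = root
            u = next_u
        return root

    tree = []
    for u, v, cost in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)      # two Find operations per edge
        if ru != rv:                   # different sets, so no cycle is created
            tree.append((u, v, cost))
            if size[ru] > size[rv]:    # union by size: smaller root goes under larger
                ru, rv = rv, ru
            parent[ru] = rv
            size[rv] += size[ru]
    return tree


mst = kruskal(VERTICES, EDGES)
print(mst, sum(c for _u, _v, c in mst))
```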
Prim's Algorithm
T maintained by the algorithm will be a tree, starting from a single vertex. In each iteration, pick the edge with the least attachment cost to T.
Algorithm: Prim(G):
    Initialize T = ∅;  // T will store edges of a MST
    Initialize S = {1};  // start from an arbitrary vertex, here vertex 1
    while T is not a spanning tree of G do
        choose e = (u, v) ∈ E of minimum cost such that u ∈ S and v ∈ V \ S;
        T = T ∪ {e};
        S = S ∪ {v};
    return T
O(n) iterations.
O(m) time to pick edge e in each iteration.
Total running time = O(mn).
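The O(mn) version of Prim's algorithm above can be written down almost line for line. The sketch below assumes an undirected graph given as (u, v, cost) triples and scans all edges in each iteration; `prim_naive` and the example graph are hypothetical names and data.

```python
# Hypothetical example graph with distinct edge costs (not the graph from the slides).
VERTICES = ["a", "b", "c", "d", "e"]
EDGES = [
    ("a", "b", 4), ("a", "c", 1), ("b", "c", 3), ("b", "d", 2),
    ("c", "d", 5), ("d", "e", 6), ("c", "e", 7),
]


def prim_naive(vertices, edges, start):
    """Prim's algorithm scanning all m edges in each of the n - 1 iterations: O(mn)."""
    S = {start}
    tree = []
    while len(S) < len(vertices):
        # All edges crossing the cut (S, V \ S), as (cost, u, v) so min() compares cost first.
        crossing = [(c, u, v) for u, v, c in edges if (u in S) != (v in S)]
        if not crossing:
            raise ValueError("graph is not connected")
        c, u, v = min(crossing)
        tree.append((u, v, c))
        S.update((u, v))  # exactly one of u, v is new; adding both is harmless
    return tree


print(prim_naive(VERTICES, EDGES, "a"))
```

Starting from a, the edges are picked in the order of cost 1, 3, 2, 6, which differs from Kruskal's order but yields the same tree.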
Correctness
T maintained by the algorithm will be a tree, starting from a single vertex. In each iteration, pick the edge with the least attachment cost to T.
Proof of correctness.
1 If e is added to the tree, then e is safe: let S be the set of vertices connected by edges in T when e is added. Then e is the minimum cost edge crossing the cut (S, V \ S).
2 S is connected in each iteration and eventually S = V.
More Efficient Implementation
Algorithm: SmarterPrim(G):
    Initialize T = ∅;  // T will store edges of a MST
    Initialize S = {1};
    for u ∉ S, a(u) = arg min_{e = (u, v), v ∈ S} c_e;
    while T is not a spanning tree of G do
        pick u ∉ S whose edge a(u) = (u, v) has minimum cost;
        T = T ∪ {a(u)};
        S = S ∪ {u};
        update array a;
    return T
Maintain vertices in V \ S in a priority queue.
Priority Queue
Priority Queues: Store a set S of n elements, where each element v ∈ S has an associated real/integer key k(v), with the following operations:
1 Make-Queue: create an empty queue
2 Find-Min: find the minimum key in S
3 Extract-Min: remove v ∈ S with the smallest key and return it
4 Decrease-Key(v, k′(v)): decrease key of v from k(v) to k′(v)
5 Add(v, k(v)): add new element v with key k(v) to S
A very useful data structure; it will be discussed in detail in later lectures.
Prim requires O(n) Extract-Min and O(m) Decrease-Key operations.
Using standard heaps, total time = O((m + n) log n).
Using Fibonacci heaps, total time = O(n log n + m).
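A heap-based Prim can be sketched with Python's standard `heapq` module. Note the swapped-in technique: `heapq` offers no Decrease-Key, so this version pushes a new entry for every candidate edge and skips stale entries when they are popped (lazy deletion). That gives O(m log m) = O(m log n) time, the same order as the standard-heap bound above, but it is not the Decrease-Key formulation itself. The names `build_adjacency` and `prim_heap` and the example graph are hypothetical.

```python
import heapq

# Hypothetical example graph with distinct edge costs (not the graph from the slides).
EDGES = [
    ("a", "b", 4), ("a", "c", 1), ("b", "c", 3), ("b", "d", 2),
    ("c", "d", 5), ("d", "e", 6), ("c", "e", 7),
]


def build_adjacency(edges):
    """Map each vertex to a list of (neighbor, cost) pairs for an undirected graph."""
    adjacency = {}
    for u, v, cost in edges:
        adjacency.setdefault(u, []).append((v, cost))
        adjacency.setdefault(v, []).append((u, cost))
    return adjacency


def prim_heap(adjacency, start):
    """Prim's algorithm with a binary heap and lazy deletion instead of Decrease-Key."""
    in_tree = {start}
    tree = []
    heap = [(cost, start, v) for v, cost in adjacency[start]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adjacency):
        cost, u, v = heapq.heappop(heap)  # Extract-Min
        if v in in_tree:
            continue  # stale entry: v was attached earlier via a cheaper edge
        in_tree.add(v)
        tree.append((u, v, cost))
        for w, c in adjacency[v]:
            if w not in in_tree:
                heapq.heappush(heap, (c, v, w))  # Add a new candidate edge
    return tree


tree = prim_heap(build_adjacency(EDGES), "a")
print(tree, sum(c for _u, _v, c in tree))
```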
More about MST
- There is an algorithm that runs in O(n + mα(n)) time.
- There is a randomized algorithm that runs in O(m + n) expected time.
- There is an algorithm using bit operations in the RAM model that runs in O(m + n) time.
- Still open: Is there an O(n + m) time deterministic algorithm in the comparison model?