DD2458, Problem Solving and Programming Under Pressure
Lecture 5: Graph algorithms 1
Date: 2008-10-01
Scribe(s): Mikael Auno and Magnus Andermo
Lecturer: Douglas Wikström

This lecture presents some common problems in graph theory and how to solve them.

1 Graphs

1.1 Definitions

Definition 1.1 An undirected graph consists of a set of vertices V and a set of edges E, where an edge is an unordered pair of vertices.

Definition 1.2 A directed graph consists of a set of vertices V and a set of edges E, where an edge is an ordered pair of vertices.

1.2 Notation

A graph is often denoted either G = (V, E) or G = ⟨V, E⟩. An edge is denoted e = {u, v} or e = (u, v) in the undirected case and e = (u, v) in the directed case.

1.3 Data Structures

When working with graphs it is important to think through the choice of data structure to store the graph in, as the suitability of each data structure depends highly on the characteristics of the graph and the algorithm used. Adjacency lists and adjacency matrices are two of the most common data structures for graphs.

1.3.1 Adjacency Lists

When using adjacency lists, each vertex has a list of its neighbours (i.e. the vertices it is connected to). This structure has good memory performance (especially when compared to an adjacency matrix) for a sparse graph, and good time performance for algorithms where iterating over a vertex's neighbours is a common operation. For dense graphs, however, this structure has poor time performance for testing the existence of an edge between two given vertices.
Figure 1: The minimum spanning tree of a graph with weighted edges.

1.3.2 Adjacency Matrix

An adjacency matrix is a |V| × |V| matrix A, where the value at A_{i,j} indicates whether or not there is an edge between vertices v_i and v_j. In practice it can be useful to let the values of the adjacency matrix be pointers into some supplemental information structure: the null pointer indicates the absence of an edge, while any other pointer indicates the presence of an edge and at the same time gives access to information about that particular edge.

The adjacency matrix data structure has poor memory performance for sparse graphs, and for a sparse graph it has poor time performance for algorithms where iterating over a vertex's neighbours is a common operation. Testing for the existence of an edge between two given vertices, on the other hand, is a constant time operation.

1.4 A Word of Caution

In mathematics and theoretical computer science, it is often clear from the context which type of graph is implied. Be careful to know which type of graph an algorithm refers to, as a problem can be easy to solve for an undirected graph but hard to solve for a directed graph. Also, give careful thought to how the graph will be handled during the course of an algorithm. Will the vertices or edges be relabeled? Will vertices or edges be removed or added?

2 Minimum Spanning Tree

Definition 2.1 Given an undirected, weighted and connected graph G, the tree T that contains all vertices of G and has the lowest possible total edge weight is called the minimum spanning tree, or MST, of G.

See Figure 1 for an example of a minimum spanning tree.
2.1 A Useful Lemma

This lemma will prove useful in understanding and proving some of the algorithms used to solve this problem.

Lemma 2.2 Let G = (V, E) be a graph, let W ⊂ V, and let e = (u, v) be an edge of minimum weight such that u ∈ W and v ∈ V \ W. Then there exists an MST that contains e.

Proof If e is not included in an MST (V, E_T), a cycle would form if e were added. Since u ∈ W and v ∈ V \ W, the cycle includes another edge f = (u′, v′) such that u′ ∈ W and v′ ∈ V \ W. Since e has minimum weight over the cut, w(e) ≤ w(f), and therefore (V, {e} ∪ (E_T \ {f})) is an MST.

2.2 Kruskal's Algorithm

Algorithm 1 shows Kruskal's algorithm for finding a minimum spanning tree of a graph. The correctness follows from Lemma 2.2. It is unclear from Algorithm 1 how to test for acyclicity, but it can be done so that the sort is the expensive operation (i.e. the running time is O(m log m)).

Algorithm 1: Kruskal's algorithm for finding the MST
Input: An undirected graph G = (V, E).
Output: A minimum spanning tree of G.
Kruskal(G)
(1) Sort edges e_1, ..., e_m ∈ E so that w(e_i) ≤ w(e_{i+1}), i = 1, ..., m-1.
(2) E_T ← ∅
(3) for i = 1 to m
(4)     if (V, E_T ∪ {e_i}) is acyclic
(5)         E_T ← E_T ∪ {e_i}
(6) return (V, E_T)

2.3 Prim's Algorithm

Algorithm 2 shows an abstract version of Prim's algorithm for finding the minimum spanning tree of a graph. Algorithm 3 shows a more concrete version of the same.

Algorithm 2: Prim's algorithm for finding the MST
Input: An undirected graph G = (V, E).
Output: A minimum spanning tree of G.
Prim(G)
(1) V_T ← {1}, E_T ← ∅
(2) while V_T ≠ V
(3)     Find (u, v) ∈ E ∩ (V_T × V \ V_T) with minimal w(u, v).
(4)     E_T ← E_T ∪ {(u, v)}
(5)     V_T ← V_T ∪ {v}
(6) return (V, E_T)
Algorithm 3: Prim's algorithm for finding the MST
Input: An undirected graph G = (V, E).
Output: A minimum spanning tree of G.
Prim(G)
(1) V_T ← {1}
(2) p(u) ← ⊥, c(u) ← ∞ for u ∈ V
(3) c(1) ← 0, H ← MinHeap_c(V)
(4) while V_T ≠ V
(5)     v ← GetHeapMin(H)
(6)     V_T ← V_T ∪ {v}
(7)     foreach (v, u) ∈ E with u ∉ V_T
(8)         if w(v, u) < c(u)
(9)             c(u) ← w(v, u), p(u) ← v
(10)            UpdateHeap(H, u)
(11) return (V, E_T) where E_T is induced by p(·)

2.3.1 Performance

We do n extractions from the heap and 2m updates of values in the heap. With an ordinary (binary) heap, the complexity is O((n + m) log n), because extractions and updates take O(log n) time. The time to create the heap is O(n). With a Fibonacci heap, the heap can be updated (an element moved up) in O(1) amortized time¹, while extraction of the minimum element takes O(log n) amortized time. So the running time with a Fibonacci heap is O(m + n log n).

2.3.2 Remarks on Algorithm 3

In C++ a binary heap can be implemented using std::vector together with the heap algorithms obtained by including the <algorithm> header. MinHeap is equivalent to std::make_heap together with std::greater<costtype>, and GetHeapMin is equivalent to std::pop_heap together with std::greater<costtype>. UpdateHeap moves the element up one level at a time and is a simple algorithm.

3 Topological Sorting

Problem Given is a directed acyclic graph G = (V, E). Order the vertices so that each vertex comes before the neighbours that can be reached via its out-edges.
Algorithm 4: Kahn's algorithm for finding the topological order.
Input: A directed acyclic graph G = (V, E).
Output: The vertices in V sorted in topological order.
Kahn(G)
(1) Q ← Queue({u ∈ V : deg_in(u) = 0})
(2) i ← 1
(3) while Q ≠ ∅
(4)     u ← Pop(Q)
(5)     ord(u) ← i
(6)     i ← i + 1
(7)     foreach (u, v) ∈ E
(8)         E ← E \ {(u, v)}
(9)         if deg_in(v) = 0
(10)            Push(Q, v)
(11) return ord

3.1 Kahn's Algorithm

Algorithm 4 shows Kahn's algorithm for solving this problem.

3.2 DFS Algorithm

Algorithm 5 shows how to solve this problem using DFS and Algorithm 6 shows its recursion helper. Algorithm 7 shows the recursion helper with an added test for cycles.

Algorithm 5: DFS algorithm for finding the topological order.
Input: A directed acyclic graph G = (V, E).
Output: The vertices in V sorted in topological order.
TopoDFS(G)
(1) foreach u ∈ V
(2)     vis(u) ← false
(3)     ord(u) ← ⊥
(4) i ← |V|
(5) foreach u ∈ V
(6)     if vis(u) = false
(7)         i ← InnerDFS(u, i, vis, ord)
(8) return ord

4 Strongly Connected Components

Problem Let G = (V, E) be a directed graph. Two vertices u, v ∈ V are in the same strongly connected component if there exists a path from u to v and a path from v to u. Figure 2 illustrates strongly connected components in a simple directed graph.

¹ Amortized time is the average running time of a sequence of worst case operations.
Algorithm 6: Recursion helper for TopoDFS.
InnerDFS(u, i, vis, ord)
(1) vis(u) ← true
(2) foreach (u, v) ∈ E
(3)     if vis(v) = false
(4)         i ← InnerDFS(v, i, vis, ord)
(5) ord(u) ← i
(6) return i - 1

Algorithm 7: Recursion helper for TopoDFS, with test for cycles.
InnerDFS(u, i, vis, ord)
(1) vis(u) ← true
(2) foreach (u, v) ∈ E
(3)     if vis(v) = false
(4)         i ← InnerDFS(v, i, vis, ord)
(5)     else if ord(v) = ⊥
(6)         return ⊥ (a cycle has been found)
(7) ord(u) ← i
(8) return i - 1

4.1 Solution

1. Sort the vertices topologically (using Algorithm 5 without the test for cycles).
2. Reverse the direction of the edges (which is the same thing as transposing the adjacency matrix).
3. Run DFS on all vertices in topological order, remembering visited markings between runs. All vertices visited in a run of DFS are in the same strongly connected component.

Figure 2: A directed graph grouped by strongly connected components.

Why does this solution work?

Lemma 4.1 Let C and C′ be two different strongly connected components in a
graph G = (V, E) and let (u, v) ∈ E′ ∩ (C × C′), where E′ = {(v, u) : (u, v) ∈ E}. Then min_{w ∈ C} {ord(w)} ≤ min_{w′ ∈ C′} {ord(w′)}. (Note that the adjacency matrix of E′ is the transpose of the adjacency matrix of E.)

The lemma says that if there is an edge from a vertex in the strongly connected component C to a vertex in the strongly connected component C′ (with the edges already reversed), then the order (as calculated before reversing the edges) of the vertex with the lowest order in C is less than or equal to the order of the vertex with the lowest order in C′. This solution is illustrated in pseudocode in Algorithm 8 and Algorithm 9.

Algorithm 8: DFS algorithm for finding strongly connected components.
Input: A directed graph G = (V, E).
Output: The strongly connected components of G.
SCC(G)
(1) vis(u) ← false for u ∈ V
(2) S ← ∅
(3) ord ← TopoDFS(G)
(4) for i = 1 to |V|
(5)     S ← S ∪ {CollectDFS(ord⁻¹(i), vis)}
(6) return S \ {∅}

Algorithm 9: Recursion helper for Algorithm 8.
CollectDFS(u, vis)
(1) if vis(u) = true
(2)     return ∅
(3) else
(4)     vis(u) ← true
(5)     C ← {u}
(6)     foreach (u, v) ∈ E′
(7)         C ← C ∪ CollectDFS(v, vis)
(8)     return C

5 Single-Source Shortest Path

Problem Let G = (V, E) be a directed weighted graph with non-negative weights and s ∈ V. For every v ∈ V, find the path from s to v with lowest total weight.

5.1 Solution

Abstract Idea 1 If edge weights are discrete values, then find for every d = 0, 1, 2, 3, ... the vertices that can be reached through a path from s with a weight of d. Let p(v) be the vertex just before v in such a path.
Concrete Idea 1 Think of an edge of weight a as a chain of length a and run BFS from s, where every vertex gets a pointer to its parent. (This can produce a graph with many edges, which can be slow when searching. This idea is thus not good enough.)

Abstract Idea 2 Start with a set S = {s}. Repeat n - 1 times: add to S the vertex v ∉ S that lies closest to s. Let p(v) be a neighbour of v in S that recursively gives a path that satisfies the guarantee.

Concrete Idea 2 It is enough to check the neighbours of S. Choose as v the neighbour of S that has the lowest D(s, u) + w(u, v) for some u ∈ S and set p(v) = u, where D(s, u) is the distance from s to u. Update the neighbours of S with the neighbours of v.

5.2 Dijkstra

A vertex u with a minimal distance d(u) can be found with an algorithm resembling Algorithm 3, Prim's algorithm. The running time depends on the way it is implemented, in the same way as for Algorithm 3. This algorithm is Dijkstra's algorithm (see Algorithm 10).

Algorithm 10: Dijkstra's algorithm for solving the single-source shortest path problem.
Input: A directed graph G = (V, E) with non-negative weights w(·) and a vertex s ∈ V.
Output: The shortest path from s to each u ∈ V.
Dijkstra(G, w(·), s)
(1) foreach u ∈ V
(2)     p(u) ← ⊥
(3)     d(u) ← ∞
(4)     vis(u) ← false
(5) d(s) ← 0
(6) for i = 1 to n
(7)     Find a vertex u with vis(u) = false and minimal d(u)
(8)     vis(u) ← true
(9)     if d(u) = ∞
(10)        return
(11)    foreach (u, v) ∈ E with vis(v) = false
(12)        if d(u) + w(u, v) < d(v)
(13)            d(v) ← d(u) + w(u, v)
(14)            p(v) ← u
(15) return p(·)

5.2.1 Why is Algorithm 10 Correct?

Claim 5.1 Let S = {x ∈ V : vis(x) = true}. For each vertex x ∈ S, d(x) denotes the shortest distance from s to x. Let P_s be the set of paths starting at s and ending
at nodes outside of S, where all other nodes on the path lie in S. Let P denote the shortest path in P_s. In each iteration of the outer loop we add the ending node, u, of P to the set S. There cannot exist a shorter path to u than P.

Proof Suppose the opposite: that there exists such a shorter path P′. Let w be the first node on P′ that is not in S, and let L be the distance along P′ to w. Since edges cannot have negative weight, L must be less than the distance to u through P. But in that case the algorithm would have picked w instead of u. This is a contradiction, so P is indeed a shortest path to u, and the algorithm is correct.