CS2 Algorithms and Data Structures Note 10

Depth-First Search and Topological Sorting

In this lecture, we will analyse the running time of DFS and discuss a few applications.

10.1 A recursive implementation of DFS

In the last lecture, we saw an iterative implementation of DFS that used a stack to store the vertices that have been visited but not yet fully processed. Algorithms 10.1 and 10.2 show a recursive implementation of DFS. It is somewhat closer to our intuitive understanding of depth-first search: to visit all vertices reachable from the start vertex v, it visits v, then the first neighbour of v and all vertices reachable from this first neighbour (remember that the search is depth-first). Then it visits the next neighbour and all vertices reachable from it, except those that have already been visited, and so on. If we write this down in pseudo-code, we end up with Algorithm 10.2. It is not really surprising that we can replace the recursive implementation by one using a stack, because after all stacks are used by the compiler to implement recursion.

Algorithm dfs(G)
    1. Initialise Boolean array visited by setting all entries to FALSE
    2. for all v ∈ V do
    3.     if visited[v] = FALSE then
    4.         dfsFromVertex(G, v)
Algorithm 10.1

Algorithm dfsFromVertex(G, v)
    1. visited[v] ← TRUE
    2. for all w adjacent to v do
    3.     if visited[w] = FALSE then
    4.         dfsFromVertex(G, w)
Algorithm 10.2
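To make the recursion concrete, here is a minimal Python sketch of Algorithms 10.1 and 10.2. It assumes the graph is given as a dictionary of adjacency lists; the names `dfs`, `dfs_from_vertex` and the small `example` graph are illustrative choices, not part of the note. Unlike the pseudo-code, the sketch also records the order in which vertices are first visited, so that there is something to inspect.

```python
from typing import Dict, List

# Assumed representation: vertex -> list of out-neighbours.
Graph = Dict[int, List[int]]


def dfs(graph: Graph) -> List[int]:
    """Run a full DFS and return the vertices in order of first visit."""
    visited = {v: False for v in graph}  # line 1 of Algorithm 10.1
    order: List[int] = []

    def dfs_from_vertex(v: int) -> None:
        visited[v] = True                # line 1 of Algorithm 10.2
        order.append(v)                  # extra bookkeeping, not in the pseudo-code
        for w in graph[v]:               # all w adjacent to v
            if not visited[w]:
                dfs_from_vertex(w)

    for v in graph:                      # lines 2 to 4 of Algorithm 10.1
        if not visited[v]:
            dfs_from_vertex(v)
    return order


# Hypothetical example graph, not one of the figures in this note.
example: Graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
print(dfs(example))  # [0, 1, 3, 2, 4]
```

Note that Python limits the recursion depth (about 1000 frames by default), so for very deep graphs the stack-based version from the last lecture is the more robust choice.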
Let us analyse the running time of dfs. At first sight, this seems quite complicated because of the recursive calls inside the loops. Let us disregard the recursive calls for a moment. Let n be the number of vertices of the input graph G = (V, E) and m the number of edges. Then dfs(G) itself requires time Θ(n). Moreover, dfsFromVertex(G, v) requires time Θ(1 + out-degree(v)), because the loop is iterated out-degree(v) times.

Now the crucial observation is that dfsFromVertex(G, v) is invoked exactly once for every vertex v. To see that it is invoked at least once, note that visited[v] is set to TRUE only if dfsFromVertex(G, v) is invoked. So if the method were never invoked, visited[v] would remain FALSE. But this cannot happen, because for all v with visited[v] = FALSE, dfsFromVertex(G, v) is invoked in Line 4 of dfs (in the iteration of the loop for v). To see that dfsFromVertex(G, v) is invoked at most once, note that it is only invoked if visited[v] = FALSE. However, after its first execution visited[v] is TRUE and can never become FALSE again. Thus dfsFromVertex(G, v) is indeed invoked exactly once for every vertex v. Therefore, we get the following expression for the running time of dfs:

$$T_{\text{dfs}}(n, m) = \Theta(n) + \sum_{v \in V} \Theta\bigl(1 + \text{out-degree}(v)\bigr) = \Theta\Bigl(n + \sum_{v \in V} \text{out-degree}(v)\Bigr).$$

Since every edge leaves exactly one vertex, $\sum_{v \in V} \text{out-degree}(v) = m$, and we get

$$T_{\text{dfs}}(n, m) = \Theta(n + m).$$

Note that here we count an undirected edge as two edges, one in each direction. If we do not want to do this in the case of an undirected graph, we get $\sum_{v \in V} \text{degree}(v) = 2m$ (for undirected graphs), but since Θ(n + 2m) = Θ(n + m), this does not really make a difference.

10.2 DFS Forests

It is worth noting that a DFS starting at some vertex v explores the graph by building up a tree that contains all vertices that are reachable from v and all edges that are used to enter these vertices. We call this tree a DFS-tree. A complete DFS exploring the full graph (and not only the part reachable from a vertex v) builds up a collection of trees, or forest, called a DFS-forest. Suppose, for example, that we explore the graph in Figure 10.3 by a DFS that starts at a particular vertex and visits the vertices in a particular order. The corresponding DFS-forest is shown in Figure 10.4.
Note that, just like the order in which the vertices are visited during a DFS, a DFS-forest is not unique. Figure 10.5 shows another DFS-forest for the graph in Figure 10.3.
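The non-uniqueness of DFS-forests can be seen in a small Python sketch. It assumes a DFS-forest is stored as a parent map (roots have parent `None`) and that the outer loop of the DFS tries the vertices in a caller-supplied order. The function name `dfs_forest` and the four-vertex graph are hypothetical and are not the graph of Figure 10.3.

```python
from typing import Dict, List, Optional

Graph = Dict[int, List[int]]


def dfs_forest(graph: Graph, order: List[int]) -> Dict[int, Optional[int]]:
    """Return the DFS-forest as a map vertex -> parent (None for tree roots).

    `order` is the order in which the outer loop tries start vertices.
    """
    parent: Dict[int, Optional[int]] = {}

    def visit(v: int, p: Optional[int]) -> None:
        parent[v] = p              # the edge (p, v) was used to enter v
        for w in graph[v]:
            if w not in parent:    # w has not been visited yet
                visit(w, v)

    for v in order:
        if v not in parent:
            visit(v, None)         # v becomes the root of a new DFS-tree
    return parent


graph: Graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
forest_a = dfs_forest(graph, [0, 1, 2, 3])
forest_b = dfs_forest(graph, [3, 2, 1, 0])
print(forest_a)  # two trees, rooted at 0 and at 3
print(forest_b)  # a single tree rooted at 3
```

Both results are valid DFS-forests of the same graph. They differ only because the outer loop considered the vertices in a different order.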
Figure 10.3. (Directed graph on the vertices 0 to 6.)

Figure 10.4. (A DFS-forest for the graph in Figure 10.3.)

10.3 Connected components

As a first application of DFS, we want to compute the connected components of an undirected graph. Recall the definition of connected components first. Let G = (V, E) be an undirected graph. A subset C of V is connected if for all v, w ∈ C there is a path from v to w. (This includes the case v = w, because in this case there is a path of length 0 from v to w.) A connected component of an undirected graph G = (V, E) is a maximal connected subset C of V. Here maximal connected subset means that there is no connected subset of V that strictly contains C (and not that C is a connected subset of V with the maximum number of vertices). An undirected graph G is connected if it has only one connected component, that is, if for all vertices v, w there is a path from v to w.

Our algorithm is based on the following two simple observations:
(1) Each vertex of an undirected graph is contained in exactly one connected component.
(2) For each vertex v of an undirected graph, the connected component that contains v is precisely the set of all vertices that are reachable from v.

Thus, by (2), dfsFromVertex(G, v) visits exactly the vertices in the connected component of v. We modify dfs as follows: we add a statement print v after line 1 of dfsFromVertex. After this modification, dfsFromVertex(G, v) prints exactly the vertices in the connected component of v. Then we add a statement print "New Component" before each call of dfsFromVertex in line 4 of dfs. The modified algorithm will print "New Component", followed by the vertices of a connected component, and repeat that until all components have been printed. Of course, depending on what we want to do with the components, we may modify the algorithm in such a way that it returns the connected components in some convenient format. For example, it may put the vertices of each component into a linked list and return a linked list of these linked lists. The asymptotic running time of this algorithm for computing the connected components is clearly the same as the running time of dfs, i.e., Θ(n + m).

Figure 10.5. (Another DFS-forest for the graph in Figure 10.3.)

Components of directed graphs

It is not so clear what connectivity means in directed graphs. For example, is the graph in Figure 10.3 connected? It looks fairly connected, but vertex 6 is not reachable from vertex 0. We say that a directed graph G is weakly connected if the undirected graph we obtain from G by disregarding the direction of the edges is connected. A directed graph G is strongly connected if for all vertices v, w
there is a path from v to w and a path from w to v. Strong connectivity is the more important notion. For example, the graph in Figure 10.3 is weakly connected, but not strongly connected: there is no path from vertex 0 to vertex 6. Derived from the notions of weak and strong connectivity, we have weakly connected components and strongly connected components. For example, the digraph in Figure 10.3 has only one weakly connected component (containing all vertices), and it has three strongly connected components. Computing the weakly connected components of a directed graph is easy (writing an algorithm doing this is a good exercise and asking for one a good exam question). Computing the strongly connected components is much harder. It can also be done by an algorithm based on DFS, but this application of DFS is much more sophisticated than those discussed in CS2.

10.4 Classifying vertices during a DFS

Let G be a graph. Recall that during an execution of dfs(G), the subroutine dfsFromVertex(G, v) is invoked exactly once for each vertex v. Let us call vertex v finished after dfsFromVertex(G, v) is completed. During the execution of dfs(G), a vertex can be in three states: not yet visited (let us call a vertex in this state white); visited, but not yet finished (grey); finished (black). We can modify our DFS algorithm so that it keeps track of the states of the vertices (see Algorithms 10.6 and 10.7).

Algorithm dfs(G)
    1. Initialise array state by setting all entries to white
    2. for all v ∈ V do
    3.     if state[v] = white then
    4.         dfsFromVertex(G, v)
Algorithm 10.6
Algorithm dfsFromVertex(G, v)
    1. state[v] ← grey
    2. for all w adjacent to v do
    3.     if state[w] = white then
    4.         dfsFromVertex(G, w)
    5. state[v] ← black
Algorithm 10.7

Lemma 10.8. Let G be a graph and v a vertex of G. Consider the moment during the execution of dfs(G) when dfsFromVertex(G, v) is started. Then for all vertices w we have:
(1) If w is white and reachable from v, then w will be black before v.
(2) If w is grey, then v is reachable from w.

We will not give a formal proof here. Intuitively, (1) follows from the fact that all vertices w that are reachable from v are either black before dfsFromVertex(G, v) is started (and thus black before v), or they will be visited during the execution of dfsFromVertex(G, v), because a DFS starting at v visits all vertices reachable from v that have not been visited earlier. Thus they will become black while v is still grey. (2) follows from the fact that if w is still grey, dfsFromVertex(G, w) is not yet completed. However, a DFS starting at w only visits vertices reachable from w. Thus v must be reachable from w.

10.5 Topological Sorting

Suppose you have a list of tasks to do, some of which depend on others to be completed first. For example, a practical may involve 10 tasks (numbered 0 to 9 for simplicity).

Task 0 must be completed before Task 1 can be started.
Task 1 and Task 2 must be completed before Task 3 can be started.
Task 4 must be completed before Task 0 or Task 2 can be started.
Task 5 must be completed before Task 0 or Task 4 can be started.
Task 6 must be completed before Task 4 or Task 5 can be started.
Task 7 must be completed before Task 0 or Task 9 can be started.
Task 8 must be completed before Task 7 or Task 9 can be started.

A good way to arrange all this information is in a dependency graph. The vertices of this directed graph are the tasks to be performed, and there is an edge from task v to task w if v must be completed before w can be started. Figure 10.9 shows the dependency graph of our example. (Fortunately, the dependency graph of CS2 Practical 8 is simpler.)
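As a rough illustration, the dependency graph of this example could be built in Python as follows. The edge list transcribes the task list above, with Task 3 as the task that depends on Tasks 1 and 2. The adjacency-list representation and the names `DEPENDENCIES` and `build_dependency_graph` are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# (a, b) means: task a must be completed before task b can be started.
DEPENDENCIES: List[Tuple[int, int]] = [
    (0, 1),
    (1, 3), (2, 3),
    (4, 0), (4, 2),
    (5, 0), (5, 4),
    (6, 4), (6, 5),
    (7, 0), (7, 9),
    (8, 7), (8, 9),
]


def build_dependency_graph(
    num_tasks: int, dependencies: List[Tuple[int, int]]
) -> Dict[int, List[int]]:
    """Return adjacency lists with an edge a -> b for every pair (a, b)."""
    graph: Dict[int, List[int]] = {task: [] for task in range(num_tasks)}
    for before, after in dependencies:
        graph[before].append(after)
    return graph


dependency_graph = build_dependency_graph(10, DEPENDENCIES)
for task, successors in dependency_graph.items():
    print(task, "->", successors)
```

Tasks 3 and 9 end up with empty successor lists, because no other task waits for them.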
Before we can carry out the tasks, we have to arrange them in an order that respects all the dependencies. This is what we call a topological order of the
dependency graph.

Figure 10.9. (The dependency graph of the tasks 0 to 9.)

Definition 10.10. Let G = (V, E) be a directed graph. A topological order of G is a total order ≺ of the vertex set V such that for all edges (v, w) ∈ E we have v ≺ w.

For example, one topological order of the digraph in Figure 10.9 is 8 ≺ 7 ≺ 9 ≺ 6 ≺ 5 ≺ 4 ≺ 2 ≺ 0 ≺ 1 ≺ 3.

It is not obvious how to find a topological order of a digraph efficiently. As a matter of fact, it is not even clear whether every digraph has a topological order. A moment's thought reveals that there are digraphs that do not have a topological order: if there are vertices v, w such that there is both an edge from v to w and an edge from w to v, then no topological order of the graph exists, because neither v ≺ w nor w ≺ v would comply with the definition. More generally, if the graph has a cycle then there is no topological order. Let us call a directed graph that does not have a cycle a directed acyclic graph, or DAG for short. Does every DAG have a topological order? The answer to this question is yes. To prove this, we give an algorithm that computes a topological order of a given DAG. Before we explain the algorithm, let us just record the result:

Theorem 10.11. A directed graph has a topological order if, and only if, it is a DAG.

Our algorithm is based on DFS. Let G = (V, E) be a directed graph. Consider the execution of dfs(G). We define an order ≺ of the vertices of G by saying that
v ≺ w if w becomes black before v (i.e., vertices that finish later are smaller in the order). I claim that if G is a DAG, then ≺ is a topological order of G. To prove this, let us assume that G is a DAG. Let (v, w) ∈ E. We have to prove that v ≺ w, i.e., that w becomes black before v. Consider the moment in the DFS when dfsFromVertex(G, v) is called. If w is already black at this moment, there is nothing to prove. If w is white, then by Lemma 10.8(1), w will be black before v. If w is grey, then by Lemma 10.8(2) v is reachable from w. Thus there is a path from w to v, and together with the edge (v, w), this path forms a cycle. But we assumed that G is acyclic, so this cannot happen.

Note that our argument gives us some additional information: if we find, during the execution of dfsFromVertex(G, v) for some vertex v, an edge from v to a grey vertex w, then we know that G contains a cycle.

We can now modify our basic DFS-algorithm in order to get an algorithm for computing the order ≺ and printing the vertices in this order. If the input graph G is not a DAG, our algorithm will simply print "G has a cycle". The algorithm adds all vertices to the front of a linked list when they become black. Thus vertices becoming black earlier appear later in the list, which means that the list is in order ≺. If, during the execution of sortFromVertex(G, v) for some vertex v, an edge from v to a grey vertex w is found, then there must be a cycle, and the algorithm reports this and stops.

Algorithm topSort(G)
    1. Initialise array state by setting all entries to white
    2. Initialise linked list L
    3. for all v ∈ V do
    4.     if state[v] = white then
    5.         sortFromVertex(G, v)
    6. print all vertices in L in the order in which they appear
Algorithm 10.12

The running time of topSort is the same as that of dfs, Θ(n + m).

Exercises

1. Give different DFS-forests for the graph in Figure 10.14.
Algorithm sortFromVertex(G, v)
    1. state[v] ← grey
    2. for all w adjacent to v do
    3.     if state[w] = white then
    4.         sortFromVertex(G, w)
    5.     else if state[w] = grey then
    6.         print "G has a cycle"
    7.         halt
    8. state[v] ← black
    9. L.insertFirst(v)
Algorithm 10.13

Figure 10.14. (Graph on the vertices n, o, p, q, r, s, t, u, v, w, x, y, z.)

2. The reflexive transitive closure of a directed graph G is the graph with the same vertex set as G and an edge from vertex v to vertex w if there is a path (possibly of length 0) from v to w. Describe an algorithm that computes the reflexive transitive closure of a graph G in time …, where n is the number of vertices and m the number of edges of G. Represent the output in adjacency matrix representation.

Martin Grohe
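For reference, the topSort and sortFromVertex procedures of Section 10.5 could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the note's own code: the graph is a dictionary of adjacency lists, a `collections.deque` stands in for the linked list, and instead of printing "G has a cycle" and halting, the function returns `None`. The names `top_sort`, `sort_from_vertex` and `tasks` are illustrative.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

WHITE, GREY, BLACK = "white", "grey", "black"


def top_sort(graph: Dict[int, List[int]]) -> Optional[List[int]]:
    """Return a topological order of the graph, or None if it has a cycle."""
    state = {v: WHITE for v in graph}
    result: Deque[int] = deque()  # plays the role of the linked list L

    def sort_from_vertex(v: int) -> bool:
        """Return False as soon as an edge to a grey vertex is found."""
        state[v] = GREY
        for w in graph[v]:
            if state[w] == WHITE:
                if not sort_from_vertex(w):
                    return False  # pass the cycle report upwards
            elif state[w] == GREY:
                return False      # edge to a grey vertex: there is a cycle
        state[v] = BLACK
        result.appendleft(v)      # corresponds to L.insertFirst(v)
        return True

    for v in graph:
        if state[v] == WHITE and not sort_from_vertex(v):
            return None
    return list(result)


# The task dependencies of Section 10.5 as adjacency lists.
tasks = {0: [1], 1: [3], 2: [3], 3: [], 4: [0, 2], 5: [0, 4],
         6: [4, 5], 7: [0, 9], 8: [7, 9], 9: []}
print(top_sort(tasks))             # [8, 7, 9, 6, 5, 4, 2, 0, 1, 3]
print(top_sort({0: [1], 1: [0]}))  # None, since 0 -> 1 -> 0 is a cycle
```

Returning a value rather than printing keeps the function reusable. A caller that wants the behaviour of the pseudo-code can print the list, or the cycle message when the result is `None`.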