Week 5 General remarks

We consider the simplest graph-search algorithm, breadth-first search (BFS). We apply BFS to compute shortest paths. Then we consider the second main graph-search algorithm, depth-first search (DFS). And we consider one application of DFS, topological sorting of graphs.

Reading from CLRS for week 5

Chapter 22, Sections 22.2, 22.3, 22.4.

The BFS algorithm

Searching through a graph is one of the most fundamental of all algorithmic tasks. Breadth-first search (BFS) is a simple but very important technique for searching a connected graph: Such a search starts from a given source vertex s and constructs a rooted spanning tree for the graph, called the breadth-first tree (BFS tree; the root is s). It uses a (first-in first-out) queue as its main data structure. BFS computes the parent π[u] of each vertex u in the breadth-first tree (with the parent of the source being nil), and its distance d[u] from the source s (initialised to ∞), which is the length of the path from s to u in the breadth-first tree. Thus the BFS tree contains the shortest paths from s to any other vertex, and is called an SPT (shortest-paths tree).

Input: A graph G with vertex set V(G) and edges represented by adjacency lists Adj.

Queue: A queue is a first-in-first-out data structure.

BFS(G, s)
 1  for each u ∈ V(G)
 2      d[u] = ∞
 3  π[s] = nil
 4  d[s] = 0
 5  Q = (s)
 6  while Q ≠ ()
 7      u = Dequeue[Q]
 8      for each v ∈ Adj[u]
 9          if d[v] = ∞
10              d[v] = d[u] + 1
11              π[v] = u
12              Enqueue(Q, v)
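As a concrete illustration, the pseudocode above can be transcribed into Python. This is a sketch, not the course's official code; representing the adjacency lists as a dictionary is an assumption.

```python
from collections import deque

def bfs(adj, s):
    """Breadth-first search from source s.

    adj: dict mapping each vertex to a list of its neighbours.
    Returns (d, parent): distances from s (inf for unreachable vertices)
    and breadth-first-tree parents (None for s and unreached vertices).
    """
    d = {u: float("inf") for u in adj}
    parent = {u: None for u in adj}
    d[s] = 0
    q = deque([s])                     # FIFO queue, initially (s)
    while q:
        u = q.popleft()                # Dequeue
        for v in adj[u]:
            if d[v] == float("inf"):   # v not yet encountered
                d[v] = d[u] + 1
                parent[v] = u
                q.append(v)            # Enqueue
    return d, parent
```

For example, on the 4-cycle-like graph with edges {1,2}, {1,3}, {2,4}, calling `bfs(adj, 1)` gives d[4] = 2 via the tree edge π[4] = 2.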
BFS illustrated

[Figure: a BFS run on an example graph, each vertex labelled d[u]/π[u], with the queue contents Q shown after each step.]

Analysis of BFS

Correctness Analysis: At termination of BFS(G, s), for every vertex v reachable from s:
- v has been encountered;
- d[v] holds the length of the shortest path from s to v;
- π[v] represents an edge on a shortest path from v to s.

Time Analysis:
- The initialisation takes time Θ(V).
- Each vertex is Enqueued once and Dequeued once; these queueing operations each take constant time, so the queue manipulation takes time Θ(V) (altogether).
- The adjacency list of each vertex is scanned only when the vertex is Dequeued, so scanning adjacency lists takes time Θ(E) (altogether).
- The overall time of BFS is thus Θ(V + E).

Background: Why do we get shortest paths?

Is it really true that we always get shortest paths (that is, using the minimum number of edges)? Let's assume that to some vertex v there exists a shorter path P in G from s to v than found by BFS. Let this length be d < d[v].
1. v ≠ s, since the distance from s to s is zero (using the path without an edge), and this is correctly computed by BFS.
2. Consider the predecessor u of v on that shorter path P. If also d[u] were wrong (that is, too big), then we could use u instead of v.
3. Thus w.l.o.g. d[u] is correct, namely d[u] = d − 1.
4. Now when exploring the neighbours of u, in case v is still unexplored, it would get the correct distance d = d[u] + 1.
5. So v must have been explored already earlier (than u).
6. So at the time of determining the distance d[u] ≤ d[v] − 2, the distance d[v] must have been already set.
7. However the distances d[w] set by BFS are non-decreasing! So d[v] ≤ d[u] ≤ d[v] − 2, a contradiction.
The role of the queue

A crucial property of BFS is that the distances set in step 10 of the algorithm are non-decreasing, that is, if we imagine a watch-point set at step 10, and monitor the stream of values d[v], then we will see 0, 1, ..., 1, 2, ..., 2, 3, ..., 3, .... Why is this the case? This must be due to the queue used, whose special properties we haven't yet exploited! Since in step 12, directly after setting the distance, we put the vertex into the queue, the above assertion is equivalent to the statement that the distances of the vertices put into the queue are non-decreasing (in the order in which they enter the queue). Now why is this the case?
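One can observe this stream directly by instrumenting the algorithm at the watch-point. A small experiment in Python; the function name and graphs are just for illustration.

```python
from collections import deque

def bfs_distance_stream(adj, s):
    """Record every d-value in the order it is set (equivalently, in the
    order the vertices enter the queue). By the argument above, the
    resulting list is always non-decreasing."""
    d = {u: float("inf") for u in adj}
    d[s] = 0
    stream = [0]                     # d[s] = 0 is set first
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if d[v] == float("inf"):
                d[v] = d[u] + 1      # the watch-point: d-values appear here
                stream.append(d[v])
                q.append(v)
    return stream
```

On the path graph 1 - 2 - 3 - 4 started at s = 1 the stream is [0, 1, 2, 3]; on any graph, the stream equals its own sorted version.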
The role of the queue (cont.)

First we need to notice that once a distance d[v] is set, it is never changed again. The vertices are taken off the queue ("dequeued") from the left, and are added ("enqueued") to the right ("first in, first out"). What is added has a distance one more than what was taken away:
- We start with 0, and add some 1s.
- Then we take those 1s, and add 2s.
- Once the 1s are finished, we take the 2s, and add 3s.
- And so on: that is why the sequence of distances is non-decreasing.

Running BFS on a disconnected graph

A graph G with at least one vertex s is connected if and only if BFS(G, s) yields no d[v] = ∞. The vertices v with d[v] = ∞ are precisely the vertices not reachable from s. So for a disconnected G, running BFS(G, s) does not yield a spanning tree for G, but a spanning tree for the component of s. For example, running BFS on G := ({1, 2, 3, 4}, { {1, 2}, {3, 4} }) with s = 1 we get
π[1] = NIL, π[2] = 1, π[3], π[4] not set;
d[1] = 0, d[2] = 1, d[3] = d[4] = ∞.

Restarting BFS

Now what if we need to reach all vertices? Then we need a spanning forest! This means we need to restart BFS on a yet unreached vertex, maintaining the old π and d values, repeating this until all vertices have been reached. Running BFS on the previous example, first with s = 1, and then restarting with s = 3, yields
π[1] = NIL, π[2] = 1, π[3] = NIL, π[4] = 3;
d[1] = 0, d[2] = 1, d[3] = 0, d[4] = 1.
Note that now d[v] is the distance of v to its root.

Running BFS on directed graphs

We can run BFS also on a digraph G, with start-vertex s. For digraphs we have directed spanning trees/forests:
1. Only the vertices reachable from s following the directions of the edges are in that directed tree (directed from s towards the leaves).
2. Still the paths in the directed tree, from s to any other vertex, are shortest possible (given that we obey the given directions of the edges).
For a digraph there is a much higher chance that we might need to restart BFS if we wish to cover all vertices.
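The restart procedure can be sketched as a wrapper that keeps d and π across runs. A Python sketch, using the same dictionary representation as before; this is not the course's official code.

```python
from collections import deque

def bfs_forest(adj):
    """BFS with restarts: rerun BFS from every yet-unreached vertex,
    keeping the old d and parent values, until all vertices are reached.
    Works for graphs and digraphs alike; d[v] is then the distance of v
    to the root of its tree in the resulting spanning forest."""
    d = {u: float("inf") for u in adj}
    parent = {u: None for u in adj}
    for s in adj:                     # consider all vertices as start vertices
        if d[s] == float("inf"):      # s not reached so far: restart here
            d[s] = 0
            q = deque([s])
            while q:
                u = q.popleft()
                for v in adj[u]:
                    if d[v] == float("inf"):
                        d[v] = d[u] + 1
                        parent[v] = u
                        q.append(v)
    return d, parent
```

On the disconnected example G = ({1, 2, 3, 4}, { {1, 2}, {3, 4} }) this yields d[1] = 0, d[2] = 1, d[3] = 0, d[4] = 1 with π[1] = π[3] = NIL, matching the values above.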
Restarts needed to have all vertices

Consider the simple digraph G := ({1, 2, 3}, { (2, 1), (3, 1) }). In order to cover all vertices, obtaining a directed spanning forest, one needs to run BFS at least two times, while keeping d and π (and if the first time you start it with s = 1, then you need to run it three times).
1. Running BFS(G, 1), we obtain the directed tree ({1}, ∅) (no arc). Not covered are vertices 2, 3; this is fine, since from vertex 1 you can't reach them.
2. Running BFS(G, 2), we obtain the directed tree ({1, 2}, { (2, 1) }) (one arc). Not covered is vertex 3; again fine, since from vertex 2 you can't reach vertex 3.

Restarts needed to have all vertices (cont.)

We have three possible directed spanning forests here:
1. ({1}, ∅), ({2}, ∅), ({3}, ∅).
2. ({1, 2}, { (2, 1) }), ({3}, ∅).
3. ({1, 3}, { (3, 1) }), ({2}, ∅).
Note that due to keeping the d-array between the different runs, there is no overlap.

Background: Arcs of different lengths

The edges of (di-)graphs have (implicitly) a length of one unit. If arbitrary non-negative lengths are allowed, then we have to generalise BFS to Dijkstra's algorithm. This generalisation must keep the essential properties that the distances encountered are the final ones and are non-decreasing. But now not all edges have unit-length, and thus instead of a simple queue we need to employ a priority queue. A priority queue returns the vertex in it with the smallest value (distance).

Depth-first search

Depth-first search (DFS) is another simple but very important technique for searching a graph. Such a search constructs a spanning forest for the graph, called the depth-first forest, composed of several depth-first trees, which are rooted spanning trees of the connected components. DFS recursively visits the next unvisited vertex, thus extending the current path as far as possible; when the search gets stuck in a corner it backtracks up along the path until a new avenue presents itself.
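To make the earlier remark on Dijkstra's algorithm concrete, here is a minimal sketch using Python's heapq module as the priority queue. The weighted adjacency representation (lists of (neighbour, length) pairs) is an assumption; Dijkstra's algorithm itself is treated later in the course, not this week.

```python
import heapq

def dijkstra(adj, s):
    """Dijkstra's algorithm: BFS generalised to non-negative edge lengths.

    adj: dict mapping each vertex to a list of (neighbour, length) pairs.
    The priority queue always yields the vertex with the smallest
    tentative distance, so distances are finalised in non-decreasing
    order, just as with the plain FIFO queue in BFS."""
    d = {u: float("inf") for u in adj}
    parent = {u: None for u in adj}
    d[s] = 0
    pq = [(0, s)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d[u]:
            continue                  # stale entry; u was already finalised
        for v, w in adj[u]:
            if d[u] + w < d[v]:       # relax the arc (u, v)
                d[v] = d[u] + w
                parent[v] = u
                heapq.heappush(pq, (d[v], v))
    return d, parent
```

With unit lengths on every edge, this behaves exactly like BFS (up to tie-breaking within one distance level).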
DFS computes the parent π[u] of each vertex u in the depth-first tree (with the parent of initial vertices being nil), as well as its discovery time d[u] (when the vertex is first encountered, initialised to 0) and its finishing time f[u] (when the search has finished visiting its adjacent vertices).
The DFS algorithm

DFS(G)
1  for each u ∈ V(G)
2      d[u] = 0
3  time = 0
4  for each u ∈ V(G)
5      if d[u] = 0
6          π[u] = nil
7          DFS-Visit(u)

DFS-Visit(u)
1  time = time + 1
2  d[u] = time
3  for each v ∈ Adj[u]
4      if d[v] = 0
5          π[v] = u
6          DFS-Visit(v)
7  time = time + 1
8  f[u] = time

Analysis: DFS-Visit(u) is invoked exactly once for each vertex, during which we scan its adjacency list once. Hence, like BFS, DFS runs in time Θ(V + E).

DFS illustrated

[Figure: a DFS run on an example graph, each vertex labelled d[u] f[u]/π[u], with the recursion stack shown after each step.]

Running DFS on directed graphs

Again (as with BFS), we can run DFS on a digraph G. Again, no longer do we obtain spanning trees of the connected components of the start vertex, but we obtain a directed spanning tree with exactly all vertices reachable from the root (when following the directions of the edges). Different from BFS, the root (or start vertex) normally does not play a prominent role for DFS:
1. Thus we did not provide the form of DFS with a start vertex as input.
2. But we provided the forest-version, which tries all vertices (in the given order) as start vertices. This will always cover all vertices, via a directed spanning forest.

What are the times good for?

DFS (directed) trees do not contain shortest paths; to that end their way of exploring a graph is too "adventuresome" (while BFS is very "cautious"). Nevertheless, the information gained through the computation of discovery and finishing times is very valuable for many tasks. We consider the example of scheduling later. In order to understand this example, we first have to gain a better understanding of the meaning of discovery and finishing times.
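A direct Python transcription of the two procedures above, as a sketch. The dictionary adjacency representation and the use of recursion are assumptions; for large graphs one would raise Python's recursion limit or use an explicit stack.

```python
def dfs(adj):
    """Depth-first search of the whole graph (forest version).

    adj: dict mapping each vertex to a list of (out-)neighbours.
    Returns (d, f, parent): discovery times, finishing times and
    depth-first-forest parents. As in the pseudocode, d[u] == 0 marks
    an unvisited vertex."""
    d = {u: 0 for u in adj}
    f = {}
    parent = {u: None for u in adj}
    time = 0

    def visit(u):                    # DFS-Visit(u)
        nonlocal time
        time += 1
        d[u] = time                  # u is discovered
        for v in adj[u]:
            if d[v] == 0:
                parent[v] = u
                visit(v)
        time += 1
        f[u] = time                  # u is finished

    for u in adj:                    # try all vertices as start vertices
        if d[u] == 0:
            visit(u)
    return d, f, parent
```

Note that every vertex u satisfies d[u] < f[u], and the intervals [d[u], f[u]] are either nested or disjoint (the "parenthesis structure" of DFS).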
Existence of a path versus finishing times

Lemma 1
Consider a digraph G and nodes u, v ∈ V(G) with d[u] < d[v]. Then there is a path from u to v in G if and only if f[u] > f[v] holds.

In words: if node u is discovered earlier than node v, then we can reach v from u iff v finishes earlier than u.

Background: Proof

If there is a path from u to v, then, since v was discovered later than u, the recursive call of DFS-Visit(v) must happen inside the recursive call of DFS-Visit(u) (by construction of the discovery-loop), and thus v must finish before u finishes. In the other direction, if v finishes earlier than u, then the recursive call of DFS-Visit(v) must happen inside the recursive call of DFS-Visit(u) (since the recursion for u must still be running). Thus, when discovering v, we must be on a path from u to v.

Directed acyclic graphs

An important application of digraphs G is with scheduling: the vertices are the jobs (actions) to be scheduled, and a directed edge from vertex u to vertex v means a dependency, that is, action u must be performed before action v. Now consider the situation where we have three jobs a, b, c and a dependency digraph forming a directed cycle, say G = ({a, b, c}, { (a, b), (b, c), (c, a) }). Clearly this can not be scheduled! In general we require G to be acyclic, that is, G must not contain a directed cycle. A directed acyclic graph is also called a dag.

Topological sorting

Given a dag G modelling a scheduling task, a basic task is to find a linear ordering of the vertices ("actions") such that all dependencies are respected. This is modelled by the notion of topological sorting: a topological sort of a dag is an ordering of its vertices such that for every edge (u, v), u appears before v in the ordering. For example, consider G = ({a, b, c}, { (a, b), (c, b) }). The two possible topological sorts of G are a, c, b and c, a, b.
Finishing times in DAGs

Lemma 2
After calling DFS on a dag, for every edge (u, v) we have f[u] > f[v].

Background: Proof

There are two cases regarding the discovery times of u and v:
1. If d[u] < d[v], then by Lemma 1 we have f[u] > f[v] (since there is a path from u to v (of length 1)).
2. Now assume d[u] > d[v]. If f[u] > f[v], then we are done, and so assume that f[u] < f[v]. But then by Lemma 1 there is a path from v to u in G, and this together with the edge from u to v establishes a cycle in G, contradicting that G is a dag.

Topological sorting via DFS

Corollary
To topologically sort a dag G, we run DFS on G and print the vertices in reverse order of finishing times. Reverse order can be obtained by putting each vertex on the front of a list as it is finished.

Topological sorting illustrated

Consider the result of running the DFS algorithm on the following dag.

[Figure: a dag on the vertices m, n, o, p, q, r, s, t, u, v, w, x, y, z, each labelled with its times d[u]/f[u].]

Listing the vertices in reverse order of finishing time gives the following topological sort of the vertices:
p, n, o, s, m, r, y, v, w, z, x, u, q, t
(with the finishing times descending, from f[p] = 28 downwards).
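The Corollary translates directly into code. A Python sketch; the vertex names and the dictionary representation are just for illustration.

```python
def topological_sort(adj):
    """Topological sort of a dag via DFS: put each vertex on the front
    of the output list as it finishes, so the list ends up in reverse
    order of finishing times."""
    discovered = {u: False for u in adj}
    order = []

    def visit(u):
        discovered[u] = True
        for v in adj[u]:
            if not discovered[v]:
                visit(v)
        order.insert(0, u)       # u finishes: put it on the front

    for u in adj:
        if not discovered[u]:
            visit(u)
    return order
```

On the example dag with edges (a, b) and (c, b), this returns one of the two valid topological sorts listed above (which one depends on the order in which the vertices are tried as start vertices).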
The extracted directed spanning forest

[Figure: the directed spanning DFS-forest (with 2 restarts) extracted from the previous run, on the vertices m, ..., z.]

Replay the discovery and finishing times of DFS on this forest! Note how this directed forest is developed.

Deciding acyclicity

For a graph G (i.e., undirected), detecting the existence of a cycle is simple, via BFS or DFS: G has a cycle (i.e., is not a forest) if and only if BFS resp. DFS will discover a vertex twice. One has to be a bit more careful here, since the parent vertex will always be discovered twice, and thus has to be excluded, but that's it. However for digraphs it is not that simple; do you see why?

Revisiting the lemma for topological sorting

In Lemma 2 we said: if G has no cycles, then along every edge the finishing times strictly decrease. So if we discover in digraph G an edge (u, v) with f[u] < f[v], then we know there is a cycle in G. Is this criterion also sufficient? YES: just consider a cycle, and what must happen with the finishing times on it.

Lemma 3
Consider a digraph G. Then G has a cycle if and only if every run of DFS on G yields some edge (u, v) ∈ E(G) with f[u] < f[v].
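The criterion of Lemma 3 can be checked mechanically. A Python sketch; in practice one would detect such "back edges" during the search itself rather than in a second pass afterwards.

```python
def has_cycle(adj):
    """Decide whether the digraph adj has a directed cycle, via the
    finishing-times criterion: run DFS (forest version), then G has a
    cycle iff some edge (u, v) has f[u] < f[v]."""
    d = {u: 0 for u in adj}
    f = {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        for v in adj[u]:
            if d[v] == 0:
                visit(v)
        time += 1
        f[u] = time

    for u in adj:
        if d[u] == 0:
            visit(u)
    # check every edge (u, v) against the finishing times
    return any(f[u] < f[v] for u in adj for v in adj[u])
```

For a dag this always returns False (Lemma 2); for the cyclic job digraph a, b, c from before it returns True.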