Algorithm Design and Analysis
LECTURE 4: Graphs (Definitions, Traversals)
Adam Smith, 9/8/10
Exercise
How can you simulate an array with two unbounded stacks and a small amount of memory? (Hint: think of a tape machine with two reels.) What is the cost per operation?
What if you only have one stack and constant memory? Can you still simulate arbitrary access to an array? (Hint: think about pushdown automata.) No. For example, you can't check three unary numbers for equality.
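One possible reading of the two-reel hint, sketched in Python. The class name TwoStackArray and its methods are hypothetical, and Python lists stand in for the two unbounded stacks (only push, pop, and a peek at the top are used). The one extra integer is the head position. Moving the head from cell j to cell i takes |i - j| pop/push pairs, so under this scheme a single access can cost O(n) in the worst case.

```python
class TwoStackArray:
    """Simulate an array of n cells with two stacks and one integer of extra memory."""

    def __init__(self, n, fill=0):
        self.left = []            # cells 0 .. head-1; top of stack is cell head-1
        self.right = [fill] * n   # cells head .. n-1; top of stack is cell head
        self.head = 0             # index of the cell currently on top of `right`

    def _seek(self, i):
        # Wind the "tape" one cell at a time between the two reels.
        while self.head < i:
            self.left.append(self.right.pop())
            self.head += 1
        while self.head > i:
            self.right.append(self.left.pop())
            self.head -= 1

    def get(self, i):
        """Read cell i (assumes 0 <= i < n)."""
        self._seek(i)
        return self.right[-1]     # peek at the top of the right stack

    def set(self, i, value):
        """Write cell i (assumes 0 <= i < n)."""
        self._seek(i)
        self.right.pop()
        self.right.append(value)
```

Sequential scans are cheap under this scheme (one stack move per step); only long jumps pay the linear cost.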
Graphs
Graphs (KT Chapter 3)
Definition. A directed graph (digraph) $G = (V, E)$ is an ordered pair consisting of a set $V$ of vertices (synonym: nodes) and a set $E \subseteq V \times V$ of edges.
An edge $e = (u, v)$ goes from $u$ to $v$ (may or may not allow $u = v$).
In an undirected graph $G = (V, E)$, the edge set $E$ consists of unordered pairs of vertices. Sometimes write $e = \{u, v\}$.
How many edges can a graph have? In either case, $|E| = O(|V|^2)$.
Graphs are everywhere
Example | Nodes | Edges
Transportation network: airline routes | airports | nonstop flights
Communication networks | computers, hubs, routers | physical wires
Information network: web | pages | hyperlinks
Information network: scientific papers | articles | references
Social networks | people | "u is v's friend", "u sends email to v", "u's MySpace page links to v"
Paths and Connectivity
Path = sequence of consecutive edges in $E$: $(u, w_1), (w_1, w_2), (w_2, w_3), \ldots, (w_{k-1}, v)$.
Write $u \leadsto v$ when there is a path from $u$ to $v$. (Note: in a directed graph, direction matters.)
An undirected graph $G$ is connected if for every two vertices $u, v$, there is a path from $u$ to $v$ in $G$.
Cycles
Def. A cycle is a path $v_1, v_2, \ldots, v_{k-1}, v_k$ in which $v_1 = v_k$, $k > 2$, and the first $k-1$ nodes are all distinct.
Example: cycle C = 1-2-4-5-3-1.
Exercises
Suppose an undirected graph G is connected. True or false? G has at least n-1 edges.
Suppose that an undirected graph G has exactly n-1 edges (and no self-loops). True or false? G is connected.
What if G has n-1 edges and no cycles?
Trees
Def. An undirected graph is a tree if it is connected and does not contain a cycle.
Theorem. Let G be an undirected graph on n nodes. Any two of the following statements imply the third.
  G is connected.
  G does not contain a cycle.
  G has n-1 edges.
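The theorem suggests a simple tree test: check two of the three properties and the third follows. A minimal sketch in Python, assuming nodes are numbered 0..n-1 with n >= 1 and edges are given as pairs; the function name is_tree is hypothetical. It checks "n-1 edges" and "connected", and never looks for cycles directly.

```python
from collections import deque

def is_tree(n, edges):
    """Return True if the undirected graph on nodes 0..n-1 is a tree."""
    # Property 3: exactly n-1 edges.
    if len(edges) != n - 1:
        return False
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Property 1: connected, checked by exploring everything reachable from node 0.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # By the theorem, connected + n-1 edges implies no cycle.
    return len(seen) == n
```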
Rooted Trees
Rooted tree: Given a tree T, choose a root node r and orient each edge away from r.
Models hierarchical structure.
[Figure: a tree, and the same tree rooted at 1, with labels for the root r, the parent of v, the node v, and a child of v.]
Phylogeny Trees
Phylogeny trees. Describe evolutionary history of species.
Parse Trees
Internal representation used by compiler, e.g.: if (A[x]==2) then (32^2 + (a*64 + 12)/8) else fibonacci(n)
if-then-else
  ==
    array ref
      A
      x
    2
  +
    power
      32
      2
    /
      +
        *
          a
          64
        12
      8
  fn-call
    fibonacci
    n
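Paths and Connectivity
Directed graph? Strongly connected if for every pair of vertices u, v: $u \leadsto v$ and $v \leadsto u$ (there are paths in both directions).
[Figure: example digraph on vertices s, a, b, c, e, f, g, h, i, with numeric labels 0 to 3.]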
Exploring a graph
Classic problem: Given vertices $s, t \in V$, is there a path from s to t?
Idea: explore all vertices reachable from s.
Two basic techniques:
  Breadth-first search (BFS): Explore children in order of distance to start node.
  Depth-first search (DFS): Recursively explore a vertex's children before exploring its siblings.
How to convert these descriptions to precise algorithms?
Breadth First Search
BFS intuition. Explore outward from s in all possible directions, adding nodes one "layer" at a time.
BFS algorithm.
  $L_0 = \{s\}$.
  $L_1$ = all neighbors of $L_0$.
  $L_2$ = all nodes that do not belong to $L_0$ or $L_1$, and that have an edge to a node in $L_1$.
  $L_{i+1}$ = all nodes that do not belong to an earlier layer, and that have an edge to a node in $L_i$.
[Figure: s followed by layers $L_1, L_2, \ldots, L_{n-1}$.]
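The layer definition can be turned into code almost word for word. A minimal sketch in Python, assuming the graph is a dict mapping each node to a list of its neighbors; the name bfs_layers is hypothetical.

```python
def bfs_layers(adj, s):
    """Return the BFS layers [L_0, L_1, ...] of the nodes reachable from s."""
    layers = [[s]]          # L_0 = {s}
    seen = {s}              # nodes that already belong to some layer
    while layers[-1]:
        next_layer = []
        for u in layers[-1]:
            for v in adj[u]:
                # v joins L_{i+1} if it is in no earlier layer and has an edge to L_i.
                if v not in seen:
                    seen.add(v)
                    next_layer.append(v)
        layers.append(next_layer)
    layers.pop()            # drop the final empty layer
    return layers
```

The order of nodes inside a layer depends on the order of the neighbor lists; the set of nodes in each layer does not.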
Breadth First Search
[Figure: an example graph drawn with its BFS layers $L_0$, $L_1$, $L_2$, $L_3$.]
Breadth First Search
Distance(u,v): number of edges on a shortest path from u to v.
Properties. Let T be a BFS tree of G = (V, E).
  Nodes in layer i have distance i from the root s.
  Let (x, y) be an edge of G. Then the levels of x and y differ by at most 1.
[Figure: BFS tree with layers $L_0$, $L_1$, $L_2$, $L_3$.]
Following material not covered in lecture but important to review
BFS App: Connected Component
Connected component. Find all nodes reachable from s.
Flood Fill
Flood fill. Given a lime green pixel in an image, change the color of the entire blob of neighboring lime pixels to blue.
  Node: pixel.
  Edge: two neighboring lime pixels.
  Blob: connected component of lime pixels.
[Figure: recolor lime green blob to blue.]
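Flood fill is a traversal of the blob's connected component. A minimal sketch in Python, assuming the image is a list of rows of color values and that "neighboring" means the four pixels sharing a side (the slide does not say whether diagonals count); the name flood_fill is hypothetical.

```python
from collections import deque

def flood_fill(image, row, col, new_color):
    """Recolor the connected blob containing (row, col), in place, and return the image."""
    old_color = image[row][col]
    if old_color == new_color:
        return image                      # nothing to do; also avoids an endless loop
    rows, cols = len(image), len(image[0])
    image[row][col] = new_color           # recoloring doubles as the "visited" mark
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc] == old_color:
                image[nr][nc] = new_color
                queue.append((nr, nc))
    return image
```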
Connected Component
Connected component. Find all nodes reachable from s.
[Figure: the set R containing s, with an edge from u inside R to v outside R; it's safe to add v.]
BFS example in a directed graph
[Figure: example digraph on vertices s, a, b, c, e, f, g, h, i, with numeric labels 0 to 3.]
BFS also works in directed graphs.
Implementing Traversals
Generic traversal algorithm
  1. R = {s}
  2. While there is an edge (u,v) where $u \in R$ and $v \notin R$: add v to R
To implement this, need to choose:
  Graph representation
  Data structures to track
    vertices already explored
    edge to be followed next
These choices affect the order of traversal.
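Before choosing any data structures, the generic algorithm can be run literally. The Python sketch below assumes the graph is a plain list of directed edge pairs and rescans it until no edge leaves R; the name reachable is hypothetical. It is deliberately naive (up to O(mn) time) and is meant only to show what BFS and DFS refine.

```python
def reachable(edges, s):
    """Return the set R of vertices reachable from s, using the generic traversal."""
    R = {s}
    grew = True
    while grew:                 # stop once a full scan adds nothing
        grew = False
        for u, v in edges:
            if u in R and v not in R:
                R.add(v)        # edge (u, v) crosses out of R, so v is safe to add
                grew = True
    return R
```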
Adjacency-matrix representation
The adjacency matrix of a graph $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$, is the matrix $A[1..n, 1..n]$ given by
$A[i, j] = 1$ if $(i, j) \in E$, and $A[i, j] = 0$ if $(i, j) \notin E$.
Example (digraph on vertices 1, 2, 3, 4):
A   1  2  3  4
1   0  1  1  0
2   0  0  1  0
3   0  0  0  0
4   0  0  1  0
Storage: $\Theta(|V|^2)$. Good for dense graphs.
Lookup: O(1) time.
List all neighbors: $O(|V|)$.
Adjacency list representation
An adjacency list of a vertex $v \in V$ is the list Adj[v] of vertices adjacent to v.
Example (digraph on vertices 1, 2, 3, 4): Adj[1] = {2, 3}, Adj[2] = {3}, Adj[3] = {}, Adj[4] = {3}.
For undirected graphs, |Adj[v]| = degree(v). For digraphs, |Adj[v]| = out-degree(v).
How many entries in lists? $2|E|$ for an undirected graph, since each edge appears in two lists.
Total $\Theta(|V| + |E|)$ storage: good for sparse graphs.
Typical notation: n = |V| = # vertices, m = |E| = # edges.
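Both representations can be built from an edge list in a few lines. A minimal sketch in Python, assuming vertices are numbered 1..n and edges are directed pairs; the function names are hypothetical. The matrix wastes row and column 0 so that indices match the slide's 1-based notation.

```python
def adjacency_matrix(n, edges):
    """Return an (n+1) x (n+1) 0/1 matrix; index 0 is unused padding."""
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for u, v in edges:
        A[u][v] = 1
    return A

def adjacency_list(n, edges):
    """Return a dict mapping each vertex 1..n to its list of out-neighbors."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
    return adj

# The example digraph: Adj[1] = {2, 3}, Adj[2] = {3}, Adj[3] = {}, Adj[4] = {3}.
edges = [(1, 2), (1, 3), (2, 3), (4, 3)]
A = adjacency_matrix(4, edges)
adj = adjacency_list(4, edges)
```

Testing whether (i, j) is an edge is a single lookup A[i][j] in the matrix, but a scan of adj[i] in the list form; listing the neighbors of i is the reverse trade-off.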
BFS with adjacency lists
d[1..n]: array of integers initialized to infinity; used to track distance from the root (infinity = vertex not yet explored)
Queue Q initialized to empty
Tree T initialized to empty
BFS pseudocode
BFS(s):
  1. Set d[s] = 0
  2. Add s to Q
  3. While (Q not empty)
     a) Dequeue (u)
     b) For each edge (u,v) adjacent to u
          If d[v] == infinity then
            i) Set d[v] = d[u]+1
            ii) Add edge (u,v) to tree T
            iii) Enqueue v onto Q
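The pseudocode translates to Python nearly line for line. This is a sketch under two assumptions: the graph is a dict of adjacency lists in which every vertex appears as a key, and collections.deque plays the role of the queue Q. The name bfs is hypothetical.

```python
import math
from collections import deque

def bfs(adj, s):
    """Return (d, T): distances from s and the list of BFS tree edges."""
    d = {v: math.inf for v in adj}   # infinity = vertex not yet explored
    T = []
    d[s] = 0
    Q = deque([s])
    while Q:
        u = Q.popleft()              # dequeue
        for v in adj[u]:             # each edge (u, v) leaving u
            if d[v] == math.inf:
                d[v] = d[u] + 1
                T.append((u, v))
                Q.append(v)          # enqueue
    return d, T
```

Vertices that are not reachable from s keep the distance infinity.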
Theorem: BFS takes O(m+n) time
BFS(s):
  1. Set Discovered[s] = true
  2. Add s to Q
     (steps 1-2: O(1) time, run once overall)
  3. While (Q not empty)
     a) Dequeue (u)
        (O(1) time, run once per vertex)
     b) For each edge (u,v) adjacent to u
          If Discovered[v] = false then
            i) Set Discovered[v] = true
            ii) Add edge (u,v) to tree T
            iii) Add v to Q
        (O(1) time per execution, run at most twice per edge)
Total: O(m+n) time (linear in input size)
Notes
If s is the root of the BFS tree:
  For every vertex u, the path in the BFS tree from s to u is a shortest path in G.
  Depth of u in the BFS tree = distance from s to u.
Proof of BFS correctness: see KT, Chapter 3.
BFS Review
Recall: Digraph G is strongly connected if for every pair of vertices s, t: $s \leadsto t$ and $t \leadsto s$ (there are paths in both directions).
Question: Give an algorithm for determining if a graph is connected. What is the running time?
Strong Connectivity: Algorithm
Lemma: G is strongly connected if and only if for any node s, every other node t has paths to and from s.
Theorem. Can determine if G is strongly connected in O(m + n) time.
Pf.
  Pick any node s.
  Run BFS from s in G.
  Run BFS from s in $G^{rev}$ (G with the orientation of every edge reversed).
  Return true iff all nodes are reached in both BFS executions.
  Correctness follows immediately from the previous lemma.
[Figure: two example digraphs, one strongly connected and one not strongly connected.]
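The proof is already an algorithm: two searches from the same node, one in G and one in the reversed graph. A minimal sketch in Python, assuming the digraph is a dict of adjacency lists with every vertex present as a key; the function names are hypothetical.

```python
from collections import deque

def reachable_from(adj, s):
    """Return the set of vertices reachable from s by BFS."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(adj):
    """Return True iff every vertex has paths to and from an arbitrary start vertex."""
    if not adj:
        return True                      # convention chosen here for the empty graph
    rev = {v: [] for v in adj}           # G^rev: every edge (u, v) becomes (v, u)
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    s = next(iter(adj))                  # pick any node s
    n = len(adj)
    return len(reachable_from(adj, s)) == n and len(reachable_from(rev, s)) == n
```

Building the reversed graph and running both searches each take O(m + n) time, matching the theorem.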
Directed Acyclic Graphs
Def. A DAG is a directed graph that contains no directed cycles.
Ex. Precedence constraints: edge $(v_i, v_j)$ means $v_i$ must precede $v_j$.
Def. A topological order of a directed graph $G = (V, E)$ is an ordering of its nodes as $v_1, v_2, \ldots, v_n$ so that for every edge $(v_i, v_j)$ we have $i < j$.
[Figure: a DAG on nodes $v_1, \ldots, v_7$, and a topological ordering $v_1, v_2, v_3, v_4, v_5, v_6, v_7$ of it.]
Precedence Constraints
Precedence constraints. Edge $(v_i, v_j)$ means task $v_i$ must occur before $v_j$.
Applications.
  Course prerequisite graph: course $v_i$ must be taken before $v_j$.
  Compilation: module $v_i$ must be compiled before $v_j$.
  Pipeline of computing jobs: output of job $v_i$ needed to determine input of job $v_j$.
Directed Acyclic Graphs
Lemma. If G has a topological order, then G is a DAG.
Pf. (by contradiction)
  Suppose that G has a topological order $v_1, \ldots, v_n$ and that G also has a directed cycle C. Let's see what happens.
  Let $v_i$ be the lowest-indexed node in C, and let $v_j$ be the node just before $v_i$ in C; thus $(v_j, v_i)$ is an edge.
  By our choice of i, we have $i < j$.
  On the other hand, since $(v_j, v_i)$ is an edge and $v_1, \ldots, v_n$ is a topological order, we must have $j < i$, a contradiction.
[Figure: the directed cycle C drawn on the supposed topological order $v_1, \ldots, v_i, \ldots, v_j, \ldots, v_n$.]
Directed Acyclic Graphs
Lemma. If G has a topological order, then G is a DAG.
Q. Does every DAG have a topological ordering?
Q. If so, how do we compute one?
Directed Acyclic Graphs
Lemma. If G is a DAG, then G has a node with no incoming edges.
Pf. (by contradiction)
  Suppose that G is a DAG and every node has at least one incoming edge. Let's see what happens.
  Pick any node v, and begin following edges backward from v. Since v has at least one incoming edge (u, v), we can walk backward to u.
  Then, since u has at least one incoming edge (x, u), we can walk backward to x.
  Repeat until we visit a node, say w, twice.
  Let C denote the sequence of nodes encountered between successive visits to w. C is a cycle.
[Figure: the backward walk v, u, x, ... returning to w.]
Directed Acyclic Graphs
Lemma. If G is a DAG, then G has a topological ordering.
Pf. (by induction on n)
  Base case: true if n = 1.
  Given a DAG on n > 1 nodes, find a node v with no incoming edges.
  G - { v } is a DAG, since deleting v cannot create cycles.
  By inductive hypothesis, G - { v } has a topological ordering.
  Place v first in the topological ordering; then append the nodes of G - { v } in topological order. This is valid since v has no incoming edges.
[Figure: a DAG with the node v that has no incoming edges.]
Topological Sorting Algorithm: Running Time
Theorem. Algorithm finds a topological order in O(m + n) time.
Proof. Maintain the following information:
  count[w] = remaining number of incoming edges
  S = set of remaining nodes with no incoming edges
Initialization: O(m + n) via single scan through graph.
Update: to delete v,
  remove v from S;
  decrement count[w] for all edges from v to w, and add w to S if count[w] hits 0;
  this is O(1) per edge.
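The bookkeeping in the proof (count[w] and the set S) is enough to write the algorithm down. A minimal sketch in Python, assuming the digraph is a dict of adjacency lists with every node present as a key; the name topological_order is hypothetical, and S is kept as a list used as a stack, which is one of several valid choices. Raising an error on a cyclic input is an addition for safety, not part of the slide.

```python
def topological_order(adj):
    """Return the nodes of a DAG in a topological order; raise ValueError on a cycle."""
    # Initialization: one scan of the graph, O(m + n).
    count = {v: 0 for v in adj}          # count[w] = remaining number of incoming edges
    for u in adj:
        for w in adj[u]:
            count[w] += 1
    S = [v for v in adj if count[v] == 0]  # remaining nodes with no incoming edges
    order = []
    while S:
        v = S.pop()                      # delete v from the graph
        order.append(v)
        for w in adj[v]:                 # O(1) work per outgoing edge of v
            count[w] -= 1
            if count[w] == 0:
                S.append(w)
    if len(order) != len(adj):
        raise ValueError("graph contains a directed cycle")
    return order
```

A DAG can have many topological orders; which one comes out depends on the order in which nodes leave S.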