Lecture 10. Finding strongly connected components

Similar documents
CS2 Algorithms and Data Structures Note 10. Depth-First Search and Topological Sorting

(Refer Slide Time: 05:25)

CS 161 Lecture 11 BFS, Dijkstra s algorithm Jessica Su (some parts copied from CLRS) 1 Review

Strongly Connected Components

Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 9 Notes. Class URL:

CS 4349 Lecture October 18th, 2017

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Finding Strongly Connected Components

Strongly connected: A directed graph is strongly connected if every pair of vertices are reachable from each other.

Design and Analysis of Algorithms

Graph Algorithms. Andreas Klappenecker. [based on slides by Prof. Welch]

Strongly Connected Components. Andreas Klappenecker

Lecture 20 : Trees DRAFT

Graph Representations and Traversal

CSE 331: Introduction to Algorithm Analysis and Design Graphs

CS4800: Algorithms & Data Jonathan Ullman

Fundamental Graph Algorithms Part Four

CSE 101. Algorithm Design and Analysis Miles Jones Office 4208 CSE Building Lecture 5: SCC/BFS

Lecture 7. Binary Search Trees and Red-Black Trees

1 Connected components in undirected graphs

CS 361 Data Structures & Algs Lecture 15. Prof. Tom Hayes University of New Mexico

15-451/651: Design & Analysis of Algorithms October 5, 2017 Lecture #11: Depth First Search and Strong Components last changed: October 17, 2017

Lecture 3: Graphs and flows

Info 2950, Lecture 16

Last week: Breadth-First Search

Computer Science & Engineering 423/823 Design and Analysis of Algorithms

Algorithms: Lecture 10. Chalmers University of Technology

Lecture 4 Median and Selection

CS483 Design and Analysis of Algorithms

Lecture 6. Binary Search Trees and Red-Black Trees

(Re)Introduction to Graphs and Some Algorithms

Graph Algorithms. Chapter 22. CPTR 430 Algorithms Graph Algorithms 1

Lecture 10: Strongly Connected Components, Biconnected Graphs

Figure 1: A directed graph.

Lecture 22 Tuesday, April 10

We will show that the height of a RB tree on n vertices is approximately 2*log n. In class I presented a simple structural proof of this claim:

Trees. Arash Rafiey. 20 October, 2015

CS 125 Section #6 Graph Traversal and Linear Programs 10/13/14

Lecture 4: Trees and Art Galleries

Announcements. HW3 is graded. Average is 81%

More Graph Algorithms

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

CSE 417: Algorithms and Computational Complexity. 3.1 Basic Definitions and Applications. Goals. Chapter 3. Winter 2012 Graphs and Graph Algorithms

Kuratowski Notes , Fall 2005, Prof. Peter Shor Revised Fall 2007

CSE 417: Algorithms and Computational Complexity

Algorithm Design and Analysis

CSE 373: Data Structures and Algorithms

Goals! CSE 417: Algorithms and Computational Complexity!

Depth First Search (DFS)

Logic and Computation Lecture 20 CSU 290 Spring 2009 (Pucella) Thursday, Mar 12, 2009

22.3 Depth First Search

(Refer Slide Time: 02.06)

Professor: Padraic Bartlett. Lecture 9: Trees and Art Galleries. Week 10 UCSB 2015

Graph Algorithms Using Depth First Search

Algorithm Design and Analysis

Graph Algorithms: Chapters Part 1: Introductory graph concepts

MC 302 GRAPH THEORY 10/1/13 Solutions to HW #2 50 points + 6 XC points

CIS 121 Data Structures and Algorithms Midterm 3 Review Solution Sketches Fall 2018

Solutions to relevant spring 2000 exam problems

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017

15-451/651: Design & Analysis of Algorithms November 4, 2015 Lecture #18 last changed: November 22, 2015

The Konigsberg Bridge Problem

Undirected Graphs. V = { 1, 2, 3, 4, 5, 6, 7, 8 } E = { 1-2, 1-3, 2-3, 2-4, 2-5, 3-5, 3-7, 3-8, 4-5, 5-6 } n = 8 m = 11

Definition For vertices u, v V (G), the distance from u to v, denoted d(u, v), in G is the length of a shortest u, v-path. 1

2. CONNECTIVITY Connectivity

CSE 421 Applications of DFS(?) Topological sort

Computer Science & Engineering 423/823 Design and Analysis of Algorithms

CSE 101. Algorithm Design and Analysis Miles Jones and Russell Impagliazzo Miles Office 4208 CSE Building

MITOCW watch?v=zm5mw5nkzjg

Shortest Path Routing Communications networks as graphs Graph terminology Breadth-first search in a graph Properties of breadth-first search

W4231: Analysis of Algorithms

DFS & STRONGLY CONNECTED COMPONENTS

Copyright 2000, Kevin Wayne 1

Problem Set 2 Solutions

CS781 Lecture 2 January 13, Graph Traversals, Search, and Ordering

Today s Outline. CSE 326: Data Structures. Topic #15: Cool Graphs n Pretty Pictures. A Great Mathematician. The Bridges of Königsberg

(Refer Slide Time: 06:01)

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition.

Algorithm Design and Analysis

Graphs and trees come up everywhere. We can view the internet as a graph (in many ways) Web search views web pages as a graph

Graph Representation

Figure 4.1: The evolution of a rooted tree.

CS261: Problem Set #1

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14

Chapter 3. Graphs CLRS Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Graphs. Part I: Basic algorithms. Laura Toma Algorithms (csci2200), Bowdoin College

CS 473: Algorithms. Chandra Chekuri 3228 Siebel Center. Fall University of Illinois, Urbana-Champaign

CS 270 Algorithms. Oliver Kullmann. Breadth-first search. Analysing BFS. Depth-first. search. Analysing DFS. Dags and topological sorting.

Graphs. A graph is a data structure consisting of nodes (or vertices) and edges. An edge is a connection between two nodes

Fundamental Graph Algorithms Part Three

INF2220: algorithms and data structures Series 7

Week 5. 1 Analysing BFS. 2 Depth-first search. 3 Analysing DFS. 4 Dags and topological sorting. 5 Detecting cycles. CS 270 Algorithms.

Applied Algorithm Design Lecture 3

CS 270 Algorithms. Oliver Kullmann. Analysing BFS. Depth-first search. Analysing DFS. Dags and topological sorting.

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Graph. Vertex. edge. Directed Graph. Undirected Graph

Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 10 Notes. Class URL:

Graph representation

COL351: Analysis and Design of Algorithms (CSE, IITD, Semester-I ) Name: Entry number:

Chapter 3. Graphs. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Transcription:

Lecture 10 Finding strongly connected components

Announcements HW4 due Friday Nothing assigned Friday because MIDTERM in class, Monday 10/30. Please show up. During class, 1:30-2:50 If your last name is A-M: 370-370 (here) If your last name is N-V: 160-124 If your last name is W-Z: 160-323 You may bring one double-sided letter-size page of notes, that you have prepared yourself. Any material through Hashing (Lecture 8) is fair game. Practice exams on the website

More midterm info There will be four sections: 1. Multiple choice Tests basic knowledge 2. Short answer Tests your ability to apply basic knowledge 3. Algorithm Design Similar to a alg. design HW problem (a bit easier) 4. Proving Stuff Similar to a proving-stuff HW problem (a bit easier) This may be a hard exam If it is, that means it s okay if you don t get all the questions. (Please don t freak out).

Last time Breadth-first and depth-first search Plus, applications! Topological sorting In-order traversal of BSTs Shortest path in unweighted graphs Testing bipartite-ness The key was paying attention to the structure of the tree that these search algorithms implicitly build.

Today One more application: Finding strongly connected components But first! Let s briefly recap DFS

Recall: DFS Today, all graphs are directed! Check that the things we did on Monday still all work! It s how you d explore a labyrinth with chalk and a piece of string. 2 5 6 7 1 8 3 4

Depth First Search Exploring a labyrinth with chalk and a piece of string start=7 leave=8 Not been there yet start=0 leave=15 start=13 leave=14 start=4 leave=5 Been there, haven t explored all the paths out. Been there, have explored all the paths out. start=1 leave=11 start=3 leave=9 start=2 leave=10

Depth first search implicitly creates a tree on everything you can reach YOINK! D E A A F B E B C G Call this the DFS tree G C F D

When you can t reach everything Run DFS repeatedly to get a depth-first forest D E A F H B G I J C

When you can t reach everything Run DFS repeatedly to get a depth-first forest D E A F H B G I J C

When you can t reach everything Run DFS repeatedly to get a depth-first forest YOINK! D E YOINK! A F H B G I J C

When you can t reach everything Run DFS repeatedly to get a depth-first forest A H B E I C J F G D The DFS forest is made up of DFS trees

Recall: the parentheses theorem (Works the same with DFS forests) DFS tree If v is a descendent of w in this tree: timeline w.start v.start v.finish w.finish If w is a descendent of v in this tree: v.start w.start w.finish v.finish If neither are descendents of each other: v.start v.finish w.start w.finish If v and w are in different trees, it s always this last one. (or the other way around)

A great question from Monday Why don t start times work for Topological Sorting? I mean, demonstrably they don t (we saw some examples) but what goes wrong in the proof?

SLIDE FROM LAST TIME So to prove this -> If A B Then B.finishTime < A.finishTime Suppose the underlying graph has no cycles Since the graph has no cycles, B must be a descendant of A in that tree. All edges go down the tree. I MESSED UP! This is false. Then This is true. B.startTime A.finishTime A.startTime B.finishTime aka, B.finishTime < A.finishTime.

What it should have been So to prove this -> If A If we got to B, then either: B is a descendant of A in the DFS tree (Same argument as before) Or! B is not a descendant of A in the DFS tree Then we must have gotten to B before we got to A. Otherwise we would have explored B from A, and B would have been a descendant of A in the DFS tree. B Then B.finishTime < A.finishTime Suppose the underlying graph has no cycles B.startTime A.finishTime A.startTime B.finishTime B.finishTime A.finishTime B.startTime A.startTime A either way, B.finishTime < A.finishTime. B

What it should have been So to prove this -> If A If we got to B, then either: B is a descendant of A in the DFS tree (Same argument as before) Or! B is not a descendant of A in the DFS tree Then we must have gotten to B before we got to A. Otherwise we would have explored B from A, and B would have been a descendant of A in the DFS tree. B Then B.finishTime < A.finishTime Suppose the underlying graph has no cycles B.startTime A.finishTime A.startTime B.finishTime B.finishTime A.finishTime B.startTime A.startTime A either way, B.finishTime < A.finishTime. B

Enough of review (and enough of my shortcomings) Strongly connected components

Strongly connected components A directed graph G = (V,E) is strongly connected if: for all v,w in V: there is a path from v to w and there is a path from w to v. strongly connected not strongly connected

We can decompose a graph into strongly connected components (SCCs) At least, we can in theory. How do we do this algorithmically? Note: it s not immediately obvious that we can even do this in theory! The reason why is because two vertices are reachable from each other is an equivalence relation, and the SCCs are equivalence classes.

Why do we care about SCCs? stanford.edu Consider the internet: wikipedia.org nytimes.com 4chan.org berkeley.edu reddit.com Let s ignore this corner of the internet for now but everything today works fine if the graph is disconnected. google image search for puppies Google terms and conditions

Why do we care about SCCs? stanford.edu Consider the internet: wikipedia.org nytimes.com berkeley.edu google image search for puppies Google terms and conditions

What are the SCCs of the internet? In real life, turns out there s one giant one. and then a bunch of tendrils. More generally: Strongly connected components tell you about communities. Lots of graph algorithms only make sense on SCCs. (So some times we want to find the SCCs as a first step) Eg: I was talking to an economist the other day who has to first break up his labor market data into SCCs in order to make sense of it.

Try 1: Try 2: How to find SCCs? Consider all possible decompositions and check. For each pair (u,v), use DFS to find if there are paths u to v and v to u. Aggregate accordingly. Running time: [on board] (Definitely not any better than O(n 2 ))

Pre-Lecture exercise Run DFS starting at D: That will identify SCCs Issues: How do we know where to start DFS? It wouldn t have found the SCCs if we started from A.

Algorithm Running time: O(n + m) Do DFS to create a DFS forest. Choose starting vertices in any order. Keep track of finishing times. Reverse all the edges in the graph. Do DFS again to create another DFS forest. This time, order the nodes in the reverse order of the finishing times that they had from the first DFS run. The SCCs are the different trees in the second DFS forest.

Look, it works! (See IPython notebook) But let s break that down a bit

Example Here s another DFS tree in the DFS forest! Start:0 Finish:10 Start:1 Finish:9 Start:2 Finish:5 Start:11 Finish:12 This is one DFS tree in the DFS forest! Start:7 Finish:8 Start:3 Finish:4 3. Do DFS again, but this time, start with the vertices with the largest finish time.

One question

The SCC graph Pretend that each SCC is a vertex in a new graph.

The SCC graph Lemma 1: The SCC graph is a Directed Acyclic Graph (DAG). Proof idea: if not, then two SCCs would collapse into one.

all times are with respect to the first DFS run Starting and finishing times in a SCC The finishing time of a SCC is the largest finishing time of any element of that SCC. The starting time of a SCC is the smallest starting time of any element of that SCC. Start:0 Finish:10 Start: 0 Finish: 10 Start:1 Finish:9 Start:7 Finish:8

Our SCC DAG with start and finish times Last time we saw that: Finishing times allowed us to topologically sort of the vertices. Notice that works in this example too Start: 0 Finish: 10 Start: 2 Finish: 5 Start: 11 Finish: 12

This is the main idea. Let s reverse the edges. Start: 0 Finish: 10 Start: 2 Finish: 5 Start: 11 Finish: 12

This is the main idea. Let s reverse the edges. Now, the SCC with the largest finish time has no edges going out. So if I run DFS there, I ll find exactly that component. Remove and repeat. Start: 0 Finish: 10 Start: 2 Finish: 5 Start: 11 Finish: 12

Let s make this idea formal.

Back the the parentheses theorem If v is a descendent of w in this tree: timeline w.start v.start v.finish w.finish If w is a descendent of v in this tree: w v.start w.start w.finish v.finish If neither are descendents of each other: v.start v.finish w.start w.finish (or the other way around) v

As we saw (correctly this time ) Claim: In a DAG, we ll always have: A finish: [larger] B finish: [smaller]

Same thing, in the SCC DAG. Claim: we ll always have finish: [larger] finish: [smaller]

Proof idea A B Want to show A.finish > B.finish. Two cases: We reached A before B in our first DFS. We reached B before A in our first DFS.

Proof idea Case 1: We reached A before B in our first DFS. Say that: x has the largest finish time in A; y has the largest finish in B; z was discovered first in A; Then: Reach A before B => we will discover y via z => y is a descendant of z in the DFS forest. Then z.start y.start A B z.finish y.finish x.finish Want to show A.finish > B.finish. So A.finish = x.finish B.finish = y.finish x.finish >= z.finish

Proof idea Case 2: We reached B before A in our first DFS. A There are no paths from B to A because the SCC graph has no cycles B Want to show A.finish > B.finish. So we completely finish exploring B and never reach A. A is explored later after we restart DFS.

Proof idea Two cases: We reached A before B in our first DFS. We reached B before A in our first DFS. In either case: A B Want to show A.finish > B.finish. which is what we wanted to show. Notice: this is exactly the same two-case argument that we did earlier, just with the SCC DAG!

This establishes: Lemma 2 If there is an edge like this: finish: [larger] A Then A.finish > B.finish. finish: [smaller] B

This establishes: Corollary 1 If there is an edge like this in the reversed graph: finish: [larger] A Then A.finish > B.finish. finish: [smaller] B

Now we see why this works. The Corollary says that all blue arrows point towards larger finish times. So if we start with the largest finish time, all blue arrows lead in. Thus, that connected component, and only that connected component, are reachable by the second round of DFS Remember that after the first round of DFS, and after we reversed all the edges, we ended up with this SCC DAG: Start: 0 Finish: 10 Now, we ve deleted that first component. The next one has the next biggest finishing time. So all remaining blue arrows lead in. Repeat. Start: 2 Finish: 5 Start: 11 Finish:12

Formally, we prove it by induction Theorem: The algorithm we saw before will correctly identify strongly connected components. Inductive hypothesis: The first t trees found in the second (reversed) DFS forest are the t SCCs with the largest finish times. Moreover, what s left unvisited after these t trees have been explored is a DAG on the un-found SCCs. Base case: (t=0) The first 0 trees found in the reversed DFS forest are the 0 SCCs with the largest finish times. (TRUE) Moreover, what s left unvisited after 0 trees have been explored is a DAG on all the SCCs. (TRUE by Lemma 1.)

Inductive step [drawing on board to supplement] Assume by induction that the first t trees are the last-finishing SCCs, and the remaining SCCs form a DAG. Consider the (t+1) st tree produced, suppose the root is x. Suppose that x lives in the SCC A. Then A.finish > B.finish for all remaining SCCs B. This is because we chose x to have the largest finish time. Then there are no edges leaving A in the remaining SCC DAG. This follows from the Corollary. Then DFS started at x recovers exactly A. It doesn t recover any more since nothing else is reachable. It doesn t recover any less since A is strongly connected. (Notice that we are using that A is still strongly connected when we reverse all the edges). So the (t+1) st tree is the SCC with the (t+1) st biggest finish time.

Formally, we prove it by induction Theorem: The algorithm we saw before will correctly identify strongly connected components. Inductive hypothesis: The first t trees found in the second (reversed) DFS forest are the t SCCs with the largest finish times. Moreover, what s left unvisited after these t trees have been explored is a DAG on the un-found SCCs. Base case: [done] Inductive step: [done] Conclusion: The second (reversed) DFS forest contains all the SCCs as its trees! (This is the first bullet of IH when t = #SCCs)

Punchline: we can find SCCs in time O(n + m) Algorithm: Do DFS to create a DFS forest. Choose starting vertices in any order. Keep track of finishing times. Reverse all the edges in the graph. Do DFS again to create another DFS forest. This time, order the nodes in the reverse order of the finishing times that they had from the first DFS run. The SCCs are the different trees in the second DFS forest. (Clearly it wasn t obvious since it took all class to do! But hopefully it is less mysterious now.)

Recap Depth First Search reveals a very useful structure! We saw Monday that this structure can be used to do Topological Sorting in time O(n+m) Today we saw that it can also find Strongly Connected Components in time O(n + m) This was pretty non-trivial.

Next time MIDTERM BEFORE Next time Study for the midterm!