Scalable Big Graph Processing in Map Reduce


1 Scalable Big Graph Processing in Map Reduce. Lu Qin, Jeffrey Xu Yu, Lijun Chang, Hong Cheng, Chengqi Zhang, Xuemin Lin. Presented by Megan Bryant, College of William and Mary, February 11, 2015.

2 Overview In this presentation, we introduce methods for scalable big graph processing in MapReduce. Specifically, we introduce a new class, SGC, which has the potential to guide the development of scalable graph processing algorithms in MapReduce. Two new graph join operators are also introduced, which greatly enhance the capabilities of the SGC class. Finally, we compare the performance of the three classes (MRC, MMC, and SGC) on several scalable graph algorithms.

3 Computational Complexity Computational complexity theory provides a framework and a set of analysis tools for gauging the work performed by an algorithm, as measured by the elementary (i.e., basic) operations it performs. The basic operations that an algorithm typically performs are:
- Assignment (e.g., assigning a value to a variable)
- Arithmetic (e.g., addition, subtraction, multiplication, and division)
- Logical (e.g., comparison of two numbers)

4 Big-O Notation We use Big-O notation to describe the complexity of an algorithm. Definition: An algorithm is said to run in $O(f(n))$ time if, for some constants $c > 0$ and $n_0$, the time taken by the algorithm is at most $c f(n)$ for all $n \ge n_0$. This is an example of worst-case analysis, which is independent of the computing environment, relatively easy to perform, and provides an upper bound on the number of steps (the running time) an algorithm can take.
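As a small worked instance of the definition (the step-count function below is an illustrative choice, not taken from the slides), consider an algorithm that performs $T(n) = 3n^2 + 5n$ basic operations:

```latex
\begin{align*}
T(n) &= 3n^2 + 5n \\
     &\le 3n^2 + n^2 && \text{for all } n \ge 5, \text{ since } 5n \le n^2 \text{ when } n \ge 5 \\
     &= 4n^2 .
\end{align*}
```

Hence $T(n)$ is in $O(n^2)$, witnessed by the constants $c = 4$ and $n_0 = 5$.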

5 Big-O Complexity

6 Common Complexities The following table contains the complexities of common algorithms. Graphs have n nodes and m edges; arrays have n elements.
Algorithm                                   Data structure                    Time complexity   Space complexity
Depth First Search                          Graph with n nodes and m edges    O(n + m)          O(n)
Breadth First Search                        Graph with n nodes and m edges    O(n + m)          O(n)
Binary Search                               Sorted array                      O(log(n))         O(1)
Dijkstra's Shortest Path (unsorted array)   Graph with n nodes and m edges    O(n^2)            O(n)

7 Algorithm Classes in MapReduce There are currently two main algorithm classes in the MapReduce paradigm: the MapReduce Class (MRC) and the Minimal MapReduce Class (MMC). These classes are defined in terms of disk usage, memory usage, communication cost, CPU cost, and number of MapReduce rounds. There is also the popular Parallel Random-Access Machine (PRAM) model, against which performance studies were run.

8 MapReduce Class Let S be the set of objects in the problem and let t be the number of machines in the system. For a fixed $\epsilon > 0$, a MapReduce algorithm in MRC should have the following properties:
                    Each machine                           Total
Disk:               $O(|S|^{1-\epsilon})$                  $O(|S|^{2-2\epsilon})$
Memory:             $O(|S|^{1-\epsilon})$                  $O(|S|^{2-2\epsilon})$
Communication:      $O(|S|^{1-\epsilon})$ per round        $O(|S|^{2-2\epsilon})$
CPU:                $O(\mathrm{poly}(|S|))$ per round
Number of rounds:   $O(\log^i |S|)$, $i \ge 0$

9 Minimal MapReduce Class Let S be the set of objects in the problem and let t be the number of machines in the system. A MapReduce algorithm in MMC should have the following properties:
                    Each machine                           Total
Disk:               $O(\frac{|S|}{t})$                     $O(|S|)$
Memory:             $O(\frac{|S|}{t})$                     $O(|S|)$
Communication:      $O(\frac{|S|}{t})$ per round           $O(|S|)$
CPU:                $O(\frac{T_{seq}}{t})$                 $O(T_{seq})$
Number of rounds:   $O(1)$
$T_{seq}$ is the time to solve the same problem on a single sequential machine.

10 Parallel Random Access Machine The Parallel Random Access Machine (PRAM) is a model of parallel computation. It is an extension of the RAM model of sequential computation. In this model, there are p processors connected to a single shared memory, and each processor has a unique index $1 \le i \le p$ called the processor id. A single program is executed in single-instruction stream, multiple-data stream fashion, meaning that each instruction is carried out by all processors simultaneously and requires unit time, regardless of the number of processors. Finally, each processor has a private flag that controls whether it is active in the execution of an instruction. Inactive processors do not participate in the execution of instructions, except for instructions that reset the flag. We will later compare algorithms based on this model with those in MRC, MMC, and SGC.

11 MRC vs. MMC MRC defines the basic requirements for an algorithm to execute in MapReduce, whereas MMC requires a MapReduce algorithm to achieve optimality in several aspects simultaneously. We begin by analyzing the problems that MRC and MMC run into in graph processing.

12 Defining a Graph Let's consider a graph G = (V, E), where V is the set of vertices (nodes) and E is the set of edges (arcs). Further, let $n = |V|$ be the number of nodes and $m = |E|$ be the number of edges. A graph can be directed or undirected, cyclic or acyclic, connected or disconnected. We can represent a graph as either an adjacency matrix or an adjacency list.

13 Adjacency Matrix

14 Adjacency List
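The two representations can be illustrated with a minimal Python sketch. It assumes an undirected graph whose nodes are numbered 0..n-1; the function names and the sample edge list are hypothetical and not part of the slides.

```python
def to_adjacency_matrix(n, edges):
    """Build an n x n 0/1 matrix for an undirected graph on nodes 0..n-1."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: mirror every edge
    return matrix


def to_adjacency_list(n, edges):
    """Build a dict mapping each node to the list of its neighbours."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


# Illustrative graph: 4 nodes, 3 edges.
edges = [(0, 1), (0, 2), (2, 3)]
matrix = to_adjacency_matrix(4, edges)
adj = to_adjacency_list(4, edges)
```

The matrix always takes O(n^2) space, while the list takes O(n + m), which is why sparse graphs are usually stored as adjacency lists.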

15 Scalable Graph Processing in MMC For a graph G(V, E), a common graph operation is to exchange data among all adjacent nodes (nodes that share a common edge) in the graph. The memory constraint in MMC requires that all edges and nodes be distributed evenly among all machines in the system. This can be formalized as follows: let $E_{i,j}$ be the set of edges (u, v) in G such that u is in machine i and v is in machine j.

16 Scalable Graph Processing in MMC The communication constraint in MMC can be formalized as follows:
$\max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\, j \ne i} |E_{i,j}| \Big) \le O\big(\frac{n+m}{t}\big)$
where, once again, $E_{i,j}$ is the set of edges (u, v) in G such that u is in machine i and v is in machine j. To satisfy this inequality, we must minimize the maximum, i.e., solve
$\min \max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\, j \ne i} |E_{i,j}| \Big)$.
However, this problem is NP-hard, meaning that it is at least as hard as the hardest problems in NP.
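To make the quantity inside the max concrete, here is a minimal Python sketch that, for a given placement of nodes on machines, counts the cross-machine edges leaving each machine. The function name, the toy graph, and the placement are hypothetical illustrations; the sketch only evaluates one placement and does not attempt the NP-hard minimization.

```python
from collections import defaultdict


def cross_machine_load(edges, machine_of):
    """For each machine i, count edges (u, v) with u on i and v on another machine."""
    load = defaultdict(int)
    for u, v in edges:
        i, j = machine_of[u], machine_of[v]
        if i != j:
            load[i] += 1
    return dict(load)


# Undirected 4-cycle a-b-c-d-a, stored as two directed edges per undirected edge.
undirected = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
edges = undirected + [(v, u) for u, v in undirected]

# Assumed placement on t = 2 machines.
machine_of = {"a": 0, "b": 0, "c": 1, "d": 1}

load = cross_machine_load(edges, machine_of)
worst = max(load.values(), default=0)  # the value the MMC bound constrains
```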

17 Scalable Graph Processing in MMC In addition to the problem being NP-hard, even if the optimal solution to
$\min \max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\, j \ne i} |E_{i,j}| \Big)$
is computed successfully, we cannot guarantee that it satisfies the bound $O(\frac{n+m}{t})$, since it might be as large as O(n + m). Therefore, MMC is not a suitable class for scalable graph processing.

18 Scalable Graph Processing in MRC MRC has fewer constraints than MMC, as it simply defines the basic conditions that a MapReduce algorithm should satisfy; a graph algorithm in MapReduce is no exception. As with MMC, however, a class better suited to graph processing can be defined. Given a graph G(V, E) with n nodes and m edges, and assuming that $m \ge n^{1+c}$, a class based on MRC is defined for graph processing in MapReduce, in which a MapReduce algorithm has the following properties:
                    Each machine                        Total
Disk:               $O(n^{1+c/2})$                      $O(m^{1+c/2})$
Memory:             $O(n^{1+c/2})$                      $O(m^{1+c/2})$
Communication:      $O(n^{1+c/2})$ per round            $O(m^{1+c/2})$
CPU:                $O(\mathrm{poly}(m))$ per round
Number of rounds:   $O(1)$

19 Scalable Graph Processing in MRC This class has the good property that the algorithm runs in a constant number of rounds. However, the memory constraint can cause difficulty, as the required memory is large even for a dense graph. (Note: Dense graphs are generally easier to solve than sparse graphs.) Furthermore, if the memory of each machine cannot hold $O(n^{1+c/2})$ data, then the algorithm will always fail. Thus, the class is not scalable and cannot handle large n.

20 Scalable Graph Processing Class We now formulate a new algorithm class that addresses these deficiencies. First, we weaken the bound on the communication cost per machine from $O(\frac{m+n}{t})$ to $\tilde{O}(\frac{m}{t}, D(G,t))$. This accounts for the fact that graphs, especially large graphs, can have a skewed degree distribution. This is seen in graphs such as social networks, which often have a few nodes of very high degree (users with many subscribers or followers) alongside many users with only a few connections.

21 Skewed Degree Distribution

22 Scalable Graph Processing Class Suppose the nodes are uniformly distributed among all machines. Denote by $V_i$ the set of nodes stored in machine i for $1 \le i \le t$, and let $d_j$ be the degree of node $v_j$ in the input graph. Then $\tilde{O}(\frac{m}{t}, D(G,t))$ is defined as:
$\tilde{O}\big(\frac{m}{t}, D(G,t)\big) = O\Big(\max_{1 \le i \le t} \Big( \sum_{v_j \in V_i} d_j \Big)\Big)$
$D(G,t) = \frac{t-1}{t^2} \sum_{v_j \in V} d_j^2$

23 Scalable Graph Processing Class This leads us to the following lemma, the proof of which is omitted.
Lemma 3.1: Let $x_i$ ($1 \le i \le t$) be the communication cost upper bound for machine i, i.e., $x_i = \sum_{v_j \in V_i} d_j$. Then the expected value of $x_i$ is $E(x_i) = \frac{2m}{t}$, and the variance of $x_i$ is $Var(x_i) = D(G,t)$.
The important thing to note here is that the variance of the degree distribution of G, denoted $Var(G)$, is
$Var(G) = \frac{1}{n} \sum_{v_j \in V} \big(d_j - \frac{2m}{n}\big)^2 = \frac{n \sum_{v_j \in V} d_j^2 - 4m^2}{n^2}$.
For fixed t, n, and m, minimizing $D(G,t)$ is equivalent to minimizing $Var(G)$. In other words, the variance of the communication cost for each machine is minimized if all nodes in the graph have the same degree.
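The relationship between $D(G,t)$ and $Var(G)$ can be checked numerically. The sketch below evaluates the formulas from the lemma for two hypothetical degree sequences with the same n = 4 and m = 4 (a 4-cycle, and a triangle with one pendant node); the function name and the sample data are assumptions made for illustration only.

```python
def degree_stats(degrees, t):
    """Return (E(x_i), D(G, t), Var(G)) for a degree sequence and t machines."""
    n = len(degrees)
    two_m = sum(degrees)                  # the degrees of an undirected graph sum to 2m
    sum_sq = sum(d * d for d in degrees)  # sum of d_j^2 over all nodes
    expected_cost = two_m / t             # E(x_i) = 2m / t
    d_g_t = (t - 1) / t**2 * sum_sq       # D(G, t) = (t - 1) / t^2 * sum of d_j^2
    var_g = (n * sum_sq - two_m**2) / n**2
    return expected_cost, d_g_t, var_g


regular = degree_stats([2, 2, 2, 2], t=2)  # 4-cycle: every node has degree 2
skewed = degree_stats([3, 2, 2, 1], t=2)   # triangle plus one pendant node
```

Both graphs have the same expected cost per machine, but the regular graph has the smaller $D(G,t)$ and zero degree variance, in line with the statement that equal degrees minimize the variance of the per-machine communication cost.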

24 Scalable Graph Processing Class Thus, we define the Scalable Graph Processing Class (SGC) as follows.
                    Each machine                                   Total
Disk:               $O(\frac{m+n}{t})$                             $O(m+n)$
Memory:             $O(1)$                                         $O(t)$
Communication:      $\tilde{O}(\frac{m}{t}, D(G,t))$ per round     $O(m+n)$
CPU:                $\tilde{O}(\frac{m}{t}, D(G,t))$ per round
Number of rounds:   $O(\log(n))$

25 Comparison Between Classes We examine the upper bounds of the three classes to see how SGC compares.
                         MRC                     MMC                      SGC
Disk/machine             $O(n^{1+c/2})$          $O(\frac{n+m}{t})$       $O(\frac{n+m}{t})$
Disk/total               $O(m^{1+c/2})$          $O(n+m)$                 $O(n+m)$
Memory/machine           $O(n^{1+c/2})$          $O(\frac{n+m}{t})$       $O(1)$
Memory/total             $O(m^{1+c/2})$          $O(n+m)$                 $O(t)$
Communication/machine    $O(n^{1+c/2})$          $O(\frac{n+m}{t})$       $\tilde{O}(\frac{m}{t}, D(G,t))$
Communication/total      $O(m^{1+c/2})$          $O(n+m)$                 $O(n+m)$
CPU/machine              $O(\mathrm{poly}(m))$   $O(\frac{T_{seq}}{t})$   $\tilde{O}(\frac{m}{t}, D(G,t))$
CPU/total                $O(\mathrm{poly}(m))$   $O(T_{seq})$             $O(n+m)$
Number of rounds         $O(1)$                  $O(1)$                   $O(\log(n))$

26 Comparison Between Classes We see that SGC requires each machine to use only constant memory, meaning that even if the total memory of the system is smaller than the input data, the algorithm can still be processed successfully. This is an even stronger constraint than that defined in MMC. Given the constraints on memory, communication, and CPU, it is nearly impossible for a wide range of graph algorithms to be processed in a constant number of rounds in MapReduce. Thus, we relax the O(1) rounds defined in MMC to O(log(n)) rounds. Since Ω(log(n)) is the processing time lower bound for a large number of parallel graph algorithms in the PRAM model, O(log(n)) rounds is a reasonable requirement, and it is practical for the MapReduce framework, as evidenced by our experiments.

27 Big-O Complexity

28 Graph Operators in SGC In addition to the usual set of graph operators, such as union and intersection, we introduce two graph operators in SGC, namely NE join and EN join, with which a wide range of graph algorithms can be designed.

29 Graph Operators in SGC We assume that a graph G(V, E) is stored in a distributed file system as a node table V and an edge table E. Each node in the node table has a unique id and some other information, such as a label and keywords. Each edge in the edge table has $id_1$ and $id_2$, the ids of its source and target nodes, and some other information, such as a weight and a label. We use the node id to refer to the node when the meaning is obvious. G can be either directed or undirected. For an undirected graph, each edge is stored as two edges, $(id_1, id_2)$ and $(id_2, id_1)$.

30 Graph Operators in SGC Before we go any further, let's examine the natural join operation, $\bowtie$, acting on two sets of data. Here we see a graphical representation of Employee $\bowtie$ Dept.
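Since the figure is not reproduced here, a natural join can be sketched in a few lines of Python. The tables below are invented sample rows, and `natural_join` is a hypothetical helper that assumes every row of a table has the same attributes; it pairs rows that agree on all shared attribute names.

```python
def natural_join(left, right):
    """Join two lists of dict rows on every attribute name the tables share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if all(l_row[k] == r_row[k] for k in shared)
    ]


# Invented sample data; "dept" is the only shared attribute.
employee = [
    {"name": "Ann", "dept": "Sales"},
    {"name": "Bob", "dept": "IT"},
    {"name": "Cy", "dept": "HR"},
]
dept = [
    {"dept": "Sales", "manager": "Dee"},
    {"dept": "IT", "manager": "Eli"},
]

joined = natural_join(employee, dept)
```

Rows without a partner (here the HR employee) do not appear in the result.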

31 NE Join An NE join propagates the information on nodes into edges. For each edge $(v_i, v_j) \in E$, an NE join outputs an edge $(v_i, v_j, F(v_i))$ (or $(v_i, v_j, F(v_j))$), where $F(v_i)$ (or $F(v_j)$) is a set of functions operated on $v_i$ (or $v_j$) in the node table V.

32 NE Join Given a node table $V_i$ and an edge table $E_j$, an NE join of $V_i$ and $E_j$ is represented in SQL as:
    select id_1, id_2, f_1(c_1) as p_1, f_2(c_2) as p_2, ...
    from V_i as V NE join E_j as E on V.id = E.id'
    where cond(c)
    count cond'(c') as cnt
with the following definitions:
- c, c': subsets of fields in the two tables $V_i$ and $E_j$
- c_1, c_2, ...: subsets of fields in the two tables $V_i$ and $E_j$
- f_k: a function operated on the fields c_k
- cond: a function, defined on the fields in c, that returns true or false
- cond': a function, defined on the fields in c', that returns true or false
- id': either id_1 or id_2
- count: counts the number of trues returned by cond'(c') and assigns the result to cnt
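The NE join above can be sketched in memory as follows. This is only an illustration under stated assumptions, not the MapReduce implementation: `ne_join`, its parameters, and the sample tables are hypothetical, the node table is a dict keyed by node id, and cnt is assumed to be counted over the rows that pass `cond`.

```python
def ne_join(node_table, edge_table, on, funcs,
            cond=lambda node, edge: True,
            count_cond=lambda node, edge: False):
    """Attach node-derived fields to each edge; return (rows, cnt).

    on         -- "id1" or "id2": which endpoint is matched against V.id
    funcs      -- output field name -> function f_k(node, edge)
    cond       -- filter playing the role of cond(c)
    count_cond -- predicate playing the role of cond'(c'), counted into cnt
    """
    rows, cnt = [], 0
    for edge in edge_table:
        node = node_table[edge[on]]
        if not cond(node, edge):
            continue
        row = {"id1": edge["id1"], "id2": edge["id2"]}
        for name, f in funcs.items():
            row[name] = f(node, edge)
        rows.append(row)
        if count_cond(node, edge):
            cnt += 1
    return rows, cnt


# Invented data: push each source node's rank share onto its outgoing edges.
node_table = {
    1: {"rank": 0.5, "out_degree": 2},
    2: {"rank": 0.3, "out_degree": 1},
    3: {"rank": 0.2, "out_degree": 0},
}
edge_table = [{"id1": 1, "id2": 2}, {"id1": 1, "id2": 3}, {"id1": 2, "id2": 3}]

rows, cnt = ne_join(
    node_table, edge_table, on="id1",
    funcs={"share": lambda node, edge: node["rank"] / node["out_degree"]},
)
```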

33 EN Join An EN join aggregates the information on edges into nodes. For each node $v_i \in V$, an EN join outputs a node $(v_i, G(adj(v_i)))$, where $adj(v_i) = \{(v_i, v_j) \in E\}$ and G is a set of decomposable aggregate functions on the edge set $adj(v_i)$. An aggregate function $g_k$ is decomposable if, for any dataset s and any two subsets $s_1$ and $s_2$ of s with $s_1 \cap s_2 = \emptyset$ and $s_1 \cup s_2 = s$, $g_k(s)$ can be computed using $g_k(s_1)$ and $g_k(s_2)$.

34 EN Join An EN join can be defined in SQL form as:
    select id, g_1(c_1) as p_1, g_2(c_2) as p_2, ...
    from V_i as V EN join E_j as E on V.id = E.id'
    where cond(c)
    group by id
    count cond'(c') as cnt
with the following definitions:
- c, c': subsets of fields in the two tables $V_i$ and $E_j$
- c_1, c_2, ...: subsets of fields in the two tables $V_i$ and $E_j$
- id': either id_1 or id_2
- g_k: a decomposable aggregate function operated on the fields in c_k, with the results grouped by node id
- count: counts the number of trues returned by cond'(c') and assigns the result to cnt
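A matching in-memory sketch of the EN join with a single decomposable aggregate follows. `en_join`, its parameters, and the sample tables are hypothetical; the sketch also assumes that a node with no matching edge keeps the aggregate's initial value.

```python
def en_join(node_table, edge_table, on, value, combine, initial,
            cond=lambda edge: True):
    """Aggregate edge values into the node at endpoint `on`, grouped by node id.

    combine(a, b) merges two partial aggregates, so results computed on
    disjoint parts of the edge table can be merged afterwards (decomposability).
    """
    result = {node_id: initial for node_id in node_table}
    for edge in edge_table:
        if cond(edge):
            node_id = edge[on]
            result[node_id] = combine(result[node_id], value(edge))
    return result


def add(a, b):
    return a + b


# Invented data: sum the weights arriving at each target node.
node_table = {1: {}, 2: {}, 3: {}}
edge_table = [
    {"id1": 1, "id2": 2, "w": 0.25},
    {"id1": 1, "id2": 3, "w": 0.25},
    {"id1": 2, "id2": 3, "w": 0.5},
]

totals = en_join(node_table, edge_table, "id2", lambda e: e["w"], add, 0.0)

# Decomposability: aggregate two disjoint halves, then merge the partial results.
part1 = en_join(node_table, edge_table[:2], "id2", lambda e: e["w"], add, 0.0)
part2 = en_join(node_table, edge_table[2:], "id2", lambda e: e["w"], add, 0.0)
merged = {v: add(part1[v], part2[v]) for v in node_table}
```

Sum, count, min, and max are decomposable in this sense; the median is a standard example of an aggregate that is not.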

35 Basic Graph Algorithms The combination of NE join and EN join can solve a wide range of graph problems in SGC. In this section, we introduce some basic graph algorithms:
- PageRank
- Breadth First Search
- Graph Keyword Search
We will use MRC, MMC, and SGC versions of these algorithms for performance testing, which is covered later.

36 Page Rank PageRank is a key graph operation that computes the rank of each node based on the links (directed edges) among the nodes. Given a directed graph G(V, E) and a page x with inlinks $t_1, \ldots, t_n$, the PageRank of x can be calculated iteratively as
$PR(x) = \alpha \left(\frac{1}{|V|}\right) + (1 - \alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$
with the following definitions:
- $C(t)$: out-degree of t
- $\alpha$: probability of a random jump
- $|V|$: total number of nodes
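The iteration can be sketched sequentially in Python as below. This is a plain single-machine illustration of the formula, not the MapReduce or SGC implementation; the function name, the default values of `alpha` and `iterations`, and the toy graph are assumptions, and every node is assumed to have at least one out-link (dangling nodes would need extra handling).

```python
def pagerank(out_links, alpha=0.15, iterations=20):
    """Iterate PR(x) = alpha / |V| + (1 - alpha) * sum over inlinks t of PR(t) / C(t)."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        incoming = {v: 0.0 for v in nodes}
        for t, targets in out_links.items():
            for x in targets:
                incoming[x] += rank[t] / len(targets)  # PR(t) / C(t)
        rank = {v: alpha / n + (1 - alpha) * incoming[v] for v in nodes}
    return rank


# Toy graph: a links to b and c, b links to c, c links back to a.
out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = pagerank(out_links)
```

In join terms, the inner loop corresponds to propagating PR(t)/C(t) from nodes to edges, and the update corresponds to summing those shares at each target node.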

37 Page Rank Algorithm Graphical overview of the Page Rank algorithm.

38 Page Rank in MapReduce Graphical overview of Page Rank in MapReduce.

39 Breadth First Search Breadth First Search (BFS) is a fundamental graph operation. Given an undirected graph G(V, E) and a source node s, a BFS computes, for every node $v \in V$, the shortest distance (i.e., the minimum number of hops) from s to v in G. Define:
- b is reachable from a if b is on the adjacency list of a
- DistanceTo(s) = 0
- For all nodes p reachable from s, DistanceTo(p) = 1
- For all nodes n reachable from some other set of nodes M, DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ M)
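The recurrence above can be sketched as a level-by-level (frontier) BFS in Python, where each pass over the frontier plays the role of one round. The function name and the sample adjacency lists are hypothetical, and this is a sequential illustration rather than the MapReduce version.

```python
def bfs_distances(adj, source):
    """Return the hop distance from source to every node reachable from it."""
    distance = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for m in frontier:
            for n in adj[m]:
                if n not in distance:  # first visit gives the minimum distance
                    distance[n] = distance[m] + 1
                    next_frontier.append(n)
        frontier = next_frontier
    return distance


# Invented undirected graph; node "e" is isolated and therefore unreachable.
adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": [],
}
dist = bfs_distances(adj, "s")
```

The number of passes equals the largest distance from the source, so a graph with a large diameter needs many rounds.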

40 Breadth First Search Graphical overview of the Breadth First Search algorithm.

41 Graph Keyword Search We now investigate a more complex algorithm, namely, keyword search in an undirected graph G(V, E). Suppose that, for each $v \in V$, $t(v)$ is the text information included in v. Given a keyword query $Q = \{k_1, k_2, \ldots, k_l\}$, a set of l keywords, the answer is a set of rooted trees of the form $(r, \{(p_1, d(r, p_1)), (p_2, d(r, p_2)), \ldots, (p_l, d(r, p_l))\})$, where:
- r is the root node
- $p_i$ is a node that contains keyword $k_i$ in $t(p_i)$
- $d(r, p_i)$ is the shortest distance from r to $p_i$ in G, for $1 \le i \le l$
Each answer is uniquely determined by its root node r, and rmax is the maximum distance allowed from the root r to a keyword node in an answer, i.e., $d(r, p_i) \le rmax$ for $1 \le i \le l$.

42 Connected Component Given an undirected graph G(V, E) with n nodes and m edges, a Connected Component (CC) is a maximal set of nodes that can reach each other through paths in G. Computing all CCs of G is a fundamental graph problem and can be solved efficiently on a sequential machine in O(n + m) time. However, it is non-trivial to solve the problem in MapReduce.

43 Existing Algorithms We present three existing algorithms for Connected Components computation in MapReduce, against which we compare CC in SGC:
- HashToMin
- HashGToMin
- PRAM-Simulation

44 HashToMin HashToMin and HashGToMin are two MapReduce algorithms built on a similar idea: use the smallest node in each CC as the representative of the CC, assuming that there is a total order among all nodes in G. The HashToMin algorithm finishes in O(log(n)) rounds, with O(log(n)(m + n)) total communication cost in each round. The algorithm can be optimized to use O(1) memory on each machine using secondary sort in MapReduce.

45 HashGToMin The HashGToMin algorithm finishes in $\tilde{O}(\log(n))$ rounds, meaning that it is expected to finish in O(log(n)) rounds, with O(m + n) total communication cost in each round. However, it needs O(n) memory for a single machine to hold a whole CC in memory. Thus, HashGToMin is not suitable for handling a graph with large n.

46 PRAM Simulation PRAM-Simulation simulates, in MapReduce, an algorithm designed for the Parallel Random Access Machine (PRAM) model. The PRAM model allows multiple processors to compute in parallel using a shared memory. A theoretical result shows that a CREW PRAM algorithm running in O(t) time can be simulated in MapReduce in O(t) rounds. For the CC computation problem, the best result in the literature computes CCs in O(log(n)) time. However, it needs to compute the 2-hop node pairs, which requires $O(n^2)$ communication cost in the worst case in each round. Thus, the simulation algorithm is impractical.

47 Connected Component in SGC We introduce our algorithm to compute CCs in SGC. Conceptually, the algorithm shares similar ideas with most deterministic O(log(n)) PRAM algorithms, but it is non-trivial. Our algorithm maintains a forest using a parent pointer p(v) for each $v \in V$. Each rooted tree in the forest represents a partial CC. A singleton is a tree with one node, and a star is a tree of height 1. A tree is an isolated tree if there are no edges in E that connect the tree to another tree. The forest is iteratively updated using two operations: hooking and pointer jumping. Hooking merges several trees into a larger tree, and pointer jumping changes the parent of each node to its grandparent in each tree.
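The two operations can be illustrated with a simplified sequential sketch. This is not the SGC algorithm from the paper, whose details are not given on the slide; it is a generic variant that assumes comparable node ids, always hooks the larger root under the smaller one, and jumps pointers until every tree is flat. The function name and the sample graph are hypothetical.

```python
def connected_components(nodes, edges):
    """Return a parent map in which every node points to its component's smallest id."""
    parent = {v: v for v in nodes}  # every node starts as a singleton tree
    changed = True
    while changed:
        changed = False
        # Hooking: an edge between two different trees hooks the larger root
        # under the smaller one (parent ids only ever decrease, so no cycles).
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if ru == rv:
                continue
            hi, lo = max(ru, rv), min(ru, rv)
            if lo < parent[hi]:
                parent[hi] = lo
                changed = True
        # Pointer jumping: replace each parent by the grandparent until every
        # node points directly at the root of its tree.
        jumping = True
        while jumping:
            jumping = False
            for v in nodes:
                grandparent = parent[parent[v]]
                if grandparent != parent[v]:
                    parent[v] = grandparent
                    jumping = True
    return parent


# Invented graph with three components: {1, 2, 3}, {4, 5}, and {6}.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (4, 5)]
parent = connected_components(nodes, edges)
```

On termination, two nodes are in the same CC exactly when they share a parent.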

48 Comparison We can now compare the bounds of these algorithms. We omit PRAM-Simulation since it is impractical. Note that the CC algorithm in the SGC class has the best bounds in each category. This indicates the significant improvement that SGC represents for scalable big graph processing.

49 Minimum Spanning Forest Given a weighted undirected graph G(V, E) with n nodes and m edges, with each edge $(u, v) \in E$ assigned a weight $w((u, v))$, a Minimum Spanning Forest (MSF) is a spanning forest of G with the minimum total edge weight. We also use $(u, v, w((u, v)))$ to denote an edge. Although MSF can be computed efficiently on a sequential machine in O(m + n log(n)) time, it is non-trivial to solve the problem in MapReduce.

50 Minimum Spanning Forest The following is an example of a Minimum Spanning Tree. A forest is made up of many trees.

51 MSF Algorithm in SGC Suppose there is a total order among all edges, defined as follows. For any two edges $e_1 = (u_1, v_1, w_1)$ and $e_2 = (u_2, v_2, w_2)$, $e_1 < e_2$ iff one of the following conditions holds:
1. $w_1 < w_2$
2. $w_1 = w_2$ and $\min(u_1, v_1) < \min(u_2, v_2)$
3. $w_1 = w_2$, $\min(u_1, v_1) = \min(u_2, v_2)$, and $\max(u_1, v_1) < \max(u_2, v_2)$
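In Python, this total order can be expressed as a sort key, since tuples compare lexicographically. The helper name and the sample edges below are hypothetical; edges are assumed to be (u, v, w) triples with comparable node ids.

```python
def edge_key(edge):
    """Sort key for the total order: weight, then smaller endpoint, then larger endpoint."""
    u, v, w = edge
    return (w, min(u, v), max(u, v))


# Invented edges as (u, v, w); three of them tie on weight 2.0.
edges = [(3, 1, 2.0), (2, 4, 1.0), (1, 2, 2.0), (5, 1, 2.0)]
ordered = sorted(edges, key=edge_key)
```

Because the key depends only on the unordered endpoint pair and the weight, the two stored copies of an undirected edge, (u, v, w) and (v, u, w), receive the same key.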

52 MSF Comparisons Two existing algorithms, OneRoundMSF and MultiRoundMSF, are compared with our algorithm MSF in terms of memory consumption per machine, total communication cost per round, and the number of rounds. As we will show in our performance testing, the high memory requirement of OneRoundMSF and MultiRoundMSF becomes the bottleneck that prevents these algorithms from scaling to graphs with large n.

53 Performance Testing We tested the performance of the aforementioned algorithms on a cluster of 17 computing nodes, including one master node and 16 slave nodes, each of which has four Intel Xeon 2.4GHz CPUs and 15GB RAM and runs 64-bit Ubuntu Linux. We implemented all algorithms using Hadoop (version 1.2.1) with Java 1.6. We allowed each node to run three mappers and three reducers concurrently.

54 Data Sets We use two web-scale graphs with different graph characteristics for testing, Twitter-2010 and Friendster.
- Twitter-2010 contains 41,652,230 nodes and 1,468,365,182 edges, with an average degree of 71. The maximum degree is 3,081,112 and the diameter is around 24.
- Friendster contains 65,608,366 nodes and 1,806,067,135 edges, with an average degree of 55. The maximum degree is 5,214 and the diameter is around 32.

55 Algorithms Besides the five algorithms PageRank (Algorithm 1), BFS (Algorithm 2), KWS (Algorithm 3), CC (Algorithm 4), and MSF (Algorithm 5), we also implemented PageRank, BFS, and graph keyword search using the join operations supported by Pig on Hadoop, denoted PageRank-Pig, BFS-Pig, and KWS-Pig, respectively.

56 PageRank Algorithm

57 BFS Algorithm

58 CC Algorithm

59 MSF Algorithm Lu Qin, Jeffrey Xu Yu, Lijun Chang, Hong Cheng, Scalable Chengqi Big Graph Zhang, Processing Xuemin Lin, Map Presented Reduceby Megan Bryant February (College 11, 2015 of William59 and / 60Ma

60 Conclusions In this paper, we studied scalable big graph processing in MapReduce. We reviewed previous MapReduce classes and proposed a new class, SGC, to guide the development of scalable graph processing algorithms in MapReduce. We introduced two graph join operators with which a large range of graph algorithms can be designed in SGC. In particular, for two fundamental graph algorithms, CC computation and MSF computation, we improved on the state-of-the-art algorithms both in theory and in practice. We conducted extensive performance studies using real web-scale graphs to show the high scalability achieved by our algorithms in SGC.

University of Maryland. Tuesday, March 2, 2010

University of Maryland. Tuesday, March 2, 2010 Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Algorithms for Grid Graphs in the MapReduce Model

Algorithms for Grid Graphs in the MapReduce Model University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department

More information

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011 Data-Intensive Information Processing Applications! Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Lecture Summary CSC 263H. August 5, 2016

Lecture Summary CSC 263H. August 5, 2016 Lecture Summary CSC 263H August 5, 2016 This document is a very brief overview of what we did in each lecture, it is by no means a replacement for attending lecture or doing the readings. 1. Week 1 2.

More information

Parallel Graph Algorithms

Parallel Graph Algorithms Parallel Graph Algorithms Design and Analysis of Parallel Algorithms 5DV050/VT3 Part I Introduction Overview Graphs definitions & representations Minimal Spanning Tree (MST) Prim s algorithm Single Source

More information

CSI 604 Elementary Graph Algorithms

CSI 604 Elementary Graph Algorithms CSI 604 Elementary Graph Algorithms Ref: Chapter 22 of the text by Cormen et al. (Second edition) 1 / 25 Graphs: Basic Definitions Undirected Graph G(V, E): V is set of nodes (or vertices) and E is the

More information

Graph Data Processing with MapReduce

Graph Data Processing with MapReduce Distributed data processing on the Cloud Lecture 5 Graph Data Processing with MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, 2015 (licensed under Creation Commons Attribution

More information

Graph Algorithms. Revised based on the slides by Ruoming Kent State

Graph Algorithms. Revised based on the slides by Ruoming Kent State Graph Algorithms Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/

More information

Finding Connected Components in Map-Reduce in Logarithmic Rounds

Finding Connected Components in Map-Reduce in Logarithmic Rounds Finding Connected Components in Map-Reduce in Logarithmic Rounds Vibhor Rastogi Ashwin Machanavajjhala Laukik Chitnis Anish Das Sarma {vibhor.rastogi, ashwin.machanavajjhala, laukik, anish.dassarma}@gmail.com

More information

2. True or false: even though BFS and DFS have the same space complexity, they do not always have the same worst case asymptotic time complexity.

2. True or false: even though BFS and DFS have the same space complexity, they do not always have the same worst case asymptotic time complexity. 1. T F: Consider a directed graph G = (V, E) and a vertex s V. Suppose that for all v V, there exists a directed path in G from s to v. Suppose that a DFS is run on G, starting from s. Then, true or false:

More information

CS521 \ Notes for the Final Exam

CS521 \ Notes for the Final Exam CS521 \ Notes for final exam 1 Ariel Stolerman Asymptotic Notations: CS521 \ Notes for the Final Exam Notation Definition Limit Big-O ( ) Small-o ( ) Big- ( ) Small- ( ) Big- ( ) Notes: ( ) ( ) ( ) ( )

More information

CS 161 Lecture 11 BFS, Dijkstra s algorithm Jessica Su (some parts copied from CLRS) 1 Review

CS 161 Lecture 11 BFS, Dijkstra s algorithm Jessica Su (some parts copied from CLRS) 1 Review 1 Review 1 Something I did not emphasize enough last time is that during the execution of depth-firstsearch, we construct depth-first-search trees. One graph may have multiple depth-firstsearch trees,

More information

Link Analysis in the Cloud

Link Analysis in the Cloud Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)

More information

Parallel Graph Algorithms

Parallel Graph Algorithms Parallel Graph Algorithms Design and Analysis of Parallel Algorithms 5DV050 Spring 202 Part I Introduction Overview Graphsdenitions, properties, representation Minimal spanning tree Prim's algorithm Shortest

More information

Graphs and Network Flows ISE 411. Lecture 7. Dr. Ted Ralphs

Graphs and Network Flows ISE 411. Lecture 7. Dr. Ted Ralphs Graphs and Network Flows ISE 411 Lecture 7 Dr. Ted Ralphs ISE 411 Lecture 7 1 References for Today s Lecture Required reading Chapter 20 References AMO Chapter 13 CLRS Chapter 23 ISE 411 Lecture 7 2 Minimum

More information

22 Elementary Graph Algorithms. There are two standard ways to represent a

22 Elementary Graph Algorithms. There are two standard ways to represent a VI Graph Algorithms Elementary Graph Algorithms Minimum Spanning Trees Single-Source Shortest Paths All-Pairs Shortest Paths 22 Elementary Graph Algorithms There are two standard ways to represent a graph

More information

Elementary Graph Algorithms. Ref: Chapter 22 of the text by Cormen et al. Representing a graph:

Elementary Graph Algorithms. Ref: Chapter 22 of the text by Cormen et al. Representing a graph: Elementary Graph Algorithms Ref: Chapter 22 of the text by Cormen et al. Representing a graph: Graph G(V, E): V set of nodes (vertices); E set of edges. Notation: n = V and m = E. (Vertices are numbered

More information

22 Elementary Graph Algorithms. There are two standard ways to represent a

22 Elementary Graph Algorithms. There are two standard ways to represent a VI Graph Algorithms Elementary Graph Algorithms Minimum Spanning Trees Single-Source Shortest Paths All-Pairs Shortest Paths 22 Elementary Graph Algorithms There are two standard ways to represent a graph

More information

CS 341: Algorithms. Douglas R. Stinson. David R. Cheriton School of Computer Science University of Waterloo. February 26, 2019

CS 341: Algorithms. Douglas R. Stinson. David R. Cheriton School of Computer Science University of Waterloo. February 26, 2019 CS 341: Algorithms Douglas R. Stinson David R. Cheriton School of Computer Science University of Waterloo February 26, 2019 D.R. Stinson (SCS) CS 341 February 26, 2019 1 / 296 1 Course Information 2 Introduction

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 7 Greedy Graph Algorithms Topological sort Shortest paths Adam Smith The (Algorithm) Design Process 1. Work out the answer for some examples. Look for a general principle

More information

Introduction to Parallel & Distributed Computing Parallel Graph Algorithms

Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Lecture 16, Spring 2014 Instructor: 罗国杰 gluo@pku.edu.cn In This Lecture Parallel formulations of some important and fundamental

More information

Clustering Using Graph Connectivity

Clustering Using Graph Connectivity Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the

More information

Minimum-Spanning-Tree problem. Minimum Spanning Trees (Forests) Minimum-Spanning-Tree problem

Minimum-Spanning-Tree problem. Minimum Spanning Trees (Forests) Minimum-Spanning-Tree problem Minimum Spanning Trees (Forests) Given an undirected graph G=(V,E) with each edge e having a weight w(e) : Find a subgraph T of G of minimum total weight s.t. every pair of vertices connected in G are

More information

CS369G: Algorithmic Techniques for Big Data Spring

CS369G: Algorithmic Techniques for Big Data Spring CS369G: Algorithmic Techniques for Big Data Spring 2015-2016 Lecture 11: l 0 -Sampling and Introduction to Graph Streaming Prof. Moses Charikar Scribe: Austin Benson 1 Overview We present and analyze the

More information

Undirected Graphs. DSA - lecture 6 - T.U.Cluj-Napoca - M. Joldos 1

Undirected Graphs. DSA - lecture 6 - T.U.Cluj-Napoca - M. Joldos 1 Undirected Graphs Terminology. Free Trees. Representations. Minimum Spanning Trees (algorithms: Prim, Kruskal). Graph Traversals (dfs, bfs). Articulation points & Biconnected Components. Graph Matching

More information

Algorithms Sequential and Parallel: A Unified Approach; R. Miller and L. Boxer 3rd Graph Algorithms

Algorithms Sequential and Parallel: A Unified Approach; R. Miller and L. Boxer 3rd Graph Algorithms Algorithms Sequential and Parallel: A Unified Approach; R. Miller and L. Boxer rd Edition @ 0 www.thestudycampus.com Graph Algorithms Terminology Representations Fundamental Algorithms Computing the Transitive

More information

Chapter 9 Graph Algorithms

Chapter 9 Graph Algorithms Chapter 9 Graph Algorithms 2 Introduction graph theory useful in practice represent many real-life problems can be slow if not careful with data structures 3 Definitions an undirected graph G = (V, E)

More information

Design and Analysis of Algorithms - - Assessment

Design and Analysis of Algorithms - - Assessment X Courses» Design and Analysis of Algorithms Week 1 Quiz 1) In the code fragment below, start and end are integer values and prime(x) is a function that returns true if x is a prime number and false otherwise.

More information

Graph Connectivity in MapReduce...How Hard Could it Be?

Graph Connectivity in MapReduce...How Hard Could it Be? Graph Connectivity in MapReduce......How Hard Could it Be? Sergei Vassilvitskii +Karloff, Kumar, Lattanzi, Moseley, Roughgarden, Suri, Vattani, Wang August 28, 2015 Google NYC Maybe Easy...Maybe Hard...

More information

11/22/2016. Chapter 9 Graph Algorithms. Introduction. Definitions. Definitions. Definitions. Definitions

11/22/2016. Chapter 9 Graph Algorithms. Introduction. Definitions. Definitions. Definitions. Definitions Introduction Chapter 9 Graph Algorithms graph theory useful in practice represent many real-life problems can be slow if not careful with data structures 2 Definitions an undirected graph G = (V, E) is

More information

Fast Clustering using MapReduce

Fast Clustering using MapReduce Fast Clustering using MapReduce Alina Ene, Sungjin Im, Benjamin Moseley UIUC KDD 2011 Clustering Massive Data Group web pages based on their content Group users based on their online behavior Finding communities

More information

MapReduce Patterns. MCSN - N. Tonellotto - Distributed Enabling Platforms

MapReduce Patterns. MCSN - N. Tonellotto - Distributed Enabling Platforms MapReduce Patterns 1 Intermediate Data Written locally Transferred from mappers to reducers over network Issue - Performance bottleneck Solution - Use combiners - Use In-Mapper Combining 2 Original Word

More information

Graph Algorithms using Map-Reduce. Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web

Graph Algorithms using Map-Reduce. Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some

More information

2.3 Algorithms Using Map-Reduce

2.3 Algorithms Using Map-Reduce 28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure

More information

Lecture 4: Graph Algorithms

Lecture 4: Graph Algorithms Lecture 4: Graph Algorithms Definitions Undirected graph: G =(V, E) V finite set of vertices, E finite set of edges any edge e = (u,v) is an unordered pair Directed graph: edges are ordered pairs If e

More information

Scribe from 2014/2015: Jessica Su, Hieu Pham Date: October 6, 2016 Editor: Jimmy Wu

Scribe from 2014/2015: Jessica Su, Hieu Pham Date: October 6, 2016 Editor: Jimmy Wu CS 267 Lecture 3 Shortest paths, graph diameter Scribe from 2014/2015: Jessica Su, Hieu Pham Date: October 6, 2016 Editor: Jimmy Wu Today we will talk about algorithms for finding shortest paths in a graph.

More information

CS 6783 (Applied Algorithms) Lecture 5

CS 6783 (Applied Algorithms) Lecture 5 CS 6783 (Applied Algorithms) Lecture 5 Antonina Kolokolova January 19, 2012 1 Minimum Spanning Trees An undirected graph G is a pair (V, E); V is a set (of vertices or nodes); E is a set of (undirected)

More information

Efficient Subgraph Matching by Postponing Cartesian Products

Efficient Subgraph Matching by Postponing Cartesian Products Efficient Subgraph Matching by Postponing Cartesian Products Computer Science and Engineering Lijun Chang Lijun.Chang@unsw.edu.au The University of New South Wales, Australia Joint work with Fei Bi, Xuemin

More information

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Introduction to Algorithms Preface xiii 1 Introduction 1 1.1 Algorithms 1 1.2 Analyzing algorithms 6 1.3 Designing algorithms 1 1 1.4 Summary 1 6

More information

Algorithms for Finding Dominators in Directed Graphs

Algorithms for Finding Dominators in Directed Graphs Department of Computer Science Aarhus University Master s Thesis Algorithms for Finding Dominators in Directed Graphs Author: Henrik Knakkegaard Christensen 20082178 Supervisor: Gerth Støling Brodal January

More information

Parallel Graph Algorithms. Richard Peng Georgia Tech

Parallel Graph Algorithms. Richard Peng Georgia Tech Parallel Graph Algorithms Richard Peng Georgia Tech OUTLINE Model and problems Graph decompositions Randomized clusterings Interface with optimization THE MODEL The `scale up approach: Have a number of

More information

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms Edgar Solomonik University of Illinois at Urbana-Champaign October 12, 2016 Defining

More information

University of Illinois at Urbana-Champaign Department of Computer Science. Final Examination

University of Illinois at Urbana-Champaign Department of Computer Science. Final Examination University of Illinois at Urbana-Champaign Department of Computer Science Final Examination CS 225 Data Structures and Software Principles Spring 2010 7-10p, Wednesday, May 12 Name: NetID: Lab Section

More information

A NETWORK SIMPLEX ALGORITHM FOR SOLVING THE MINIMUM DISTRIBUTION COST PROBLEM. I-Lin Wang and Shiou-Jie Lin. (Communicated by Shu-Cherng Fang)

A NETWORK SIMPLEX ALGORITHM FOR SOLVING THE MINIMUM DISTRIBUTION COST PROBLEM. I-Lin Wang and Shiou-Jie Lin. (Communicated by Shu-Cherng Fang) JOURNAL OF INDUSTRIAL AND doi:10.3934/jimo.2009.5.929 MANAGEMENT OPTIMIZATION Volume 5, Number 4, November 2009 pp. 929 950 A NETWORK SIMPLEX ALGORITHM FOR SOLVING THE MINIMUM DISTRIBUTION COST PROBLEM

More information

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader A New Parallel Algorithm for Connected Components in Dynamic Graphs Robert McColl Oded Green David Bader Overview The Problem Target Datasets Prior Work Parent-Neighbor Subgraph Results Conclusions Problem

More information

ACO Comprehensive Exam March 19 and 20, Computability, Complexity and Algorithms

ACO Comprehensive Exam March 19 and 20, Computability, Complexity and Algorithms 1. Computability, Complexity and Algorithms Bottleneck edges in a flow network: Consider a flow network on a directed graph G = (V,E) with capacities c e > 0 for e E. An edge e E is called a bottleneck

More information

CSE 100: GRAPH ALGORITHMS

CSE 100: GRAPH ALGORITHMS CSE 100: GRAPH ALGORITHMS Dijkstra s Algorithm: Questions Initialize the graph: Give all vertices a dist of INFINITY, set all done flags to false Start at s; give s dist = 0 and set prev field to -1 Enqueue

More information

Distributed Algorithms 6.046J, Spring, Nancy Lynch

Distributed Algorithms 6.046J, Spring, Nancy Lynch Distributed Algorithms 6.046J, Spring, 205 Nancy Lynch What are Distributed Algorithms? Algorithms that run on networked processors, or on multiprocessors that share memory. They solve many kinds of problems:

More information

Direct Addressing Hash table: Collision resolution how handle collisions Hash Functions:

Direct Addressing Hash table: Collision resolution how handle collisions Hash Functions: Direct Addressing - key is index into array => O(1) lookup Hash table: -hash function maps key to index in table -if universe of keys > # table entries then hash functions collision are guaranteed => need

More information

CSE 332 Autumn 2016 Final Exam (closed book, closed notes, no calculators)

CSE 332 Autumn 2016 Final Exam (closed book, closed notes, no calculators) Name: Sample Solution Email address (UWNetID): CSE 332 Autumn 2016 Final Exam (closed book, closed notes, no calculators) Instructions: Read the directions for each question carefully before answering.

More information

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs Yishi Lin, Xiaowei Chen, John C.S. Lui The Chinese University of Hong Kong 9/4/15 EXACT DISTANCE QUERIES ON DYNAMIC

More information

A6-R3: DATA STRUCTURE THROUGH C LANGUAGE

A6-R3: DATA STRUCTURE THROUGH C LANGUAGE A6-R3: DATA STRUCTURE THROUGH C LANGUAGE NOTE: 1. There are TWO PARTS in this Module/Paper. PART ONE contains FOUR questions and PART TWO contains FIVE questions. 2. PART ONE is to be answered in the TEAR-OFF

More information

Lecture 10. Elementary Graph Algorithm Minimum Spanning Trees

Lecture 10. Elementary Graph Algorithm Minimum Spanning Trees Lecture 10. Elementary Graph Algorithm Minimum Spanning Trees T. H. Cormen, C. E. Leiserson and R. L. Rivest Introduction to Algorithms, 3rd Edition, MIT Press, 2009 Sungkyunkwan University Hyunseung Choo

More information

Graphs and Graph Algorithms. Slides by Larry Ruzzo

Graphs and Graph Algorithms. Slides by Larry Ruzzo Graphs and Graph Algorithms Slides by Larry Ruzzo Goals Graphs: defns, examples, utility, terminology Representation: input, internal Traversal: Breadth- & Depth-first search Three Algorithms: Connected

More information

Computational Geometry

Computational Geometry Windowing queries Windowing Windowing queries Zoom in; re-center and zoom in; select by outlining Windowing Windowing queries Windowing Windowing queries Given a set of n axis-parallel line segments, preprocess

More information

MapReduce Algorithms. Barna Saha. March 28, 2016

MapReduce Algorithms. Barna Saha. March 28, 2016 MapReduce Algorithms Barna Saha March 28, 2016 Complexity Model for MapReduce Minimum Spanning Tree in MapReduce Computing Dense Subgraph in MapReduce Complexity Model for MapReduce:MRC i Input: finite

More information

Theory of Computing. Lecture 4/5 MAS 714 Hartmut Klauck

Theory of Computing. Lecture 4/5 MAS 714 Hartmut Klauck Theory of Computing Lecture 4/5 MAS 714 Hartmut Klauck How fast can we sort? There are deterministic algorithms that sort in worst case time O(n log n) Do better algorithms exist? Example [Andersson et

More information

Course Review for Finals. Cpt S 223 Fall 2008

Course Review for Finals. Cpt S 223 Fall 2008 Course Review for Finals Cpt S 223 Fall 2008 1 Course Overview Introduction to advanced data structures Algorithmic asymptotic analysis Programming data structures Program design based on performance i.e.,

More information

Parallel Breadth First Search

Parallel Breadth First Search CSE341T/CSE549T 11/03/2014 Lecture 18 Parallel Breadth First Search Today, we will look at a basic graph algorithm, breadth first search (BFS). BFS can be applied to solve a variety of problems including:

More information

Week 12: Minimum Spanning trees and Shortest Paths

Week 12: Minimum Spanning trees and Shortest Paths Agenda: Week 12: Minimum Spanning trees and Shortest Paths Kruskal s Algorithm Single-source shortest paths Dijkstra s algorithm for non-negatively weighted case Reading: Textbook : 61-7, 80-87, 9-601

More information

Chapter 9 Graph Algorithms

Chapter 9 Graph Algorithms Introduction graph theory useful in practice represent many real-life problems can be if not careful with data structures Chapter 9 Graph s 2 Definitions Definitions an undirected graph is a finite set

More information

Trees Rooted Trees Spanning trees and Shortest Paths. 12. Graphs and Trees 2. Aaron Tan November 2017

Trees Rooted Trees Spanning trees and Shortest Paths. 12. Graphs and Trees 2. Aaron Tan November 2017 12. Graphs and Trees 2 Aaron Tan 6 10 November 2017 1 10.5 Trees 2 Definition Definition Definition: Tree A graph is said to be circuit-free if, and only if, it has no circuits. A graph is called a tree

More information

Graph Algorithms Using Depth First Search

Graph Algorithms Using Depth First Search Graph Algorithms Using Depth First Search Analysis of Algorithms Week 8, Lecture 1 Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Graph Algorithms Using Depth

More information

Approximation Algorithms: The Primal-Dual Method. My T. Thai

Approximation Algorithms: The Primal-Dual Method. My T. Thai Approximation Algorithms: The Primal-Dual Method My T. Thai 1 Overview of the Primal-Dual Method Consider the following primal program, called P: min st n c j x j j=1 n a ij x j b i j=1 x j 0 Then the

More information

Graph Theory. ICT Theory Excerpt from various sources by Robert Pergl

Graph Theory. ICT Theory Excerpt from various sources by Robert Pergl Graph Theory ICT Theory Excerpt from various sources by Robert Pergl What can graphs model? Cost of wiring electronic components together. Shortest route between two cities. Finding the shortest distance

More information

DS UNIT 4. Matoshri College of Engineering and Research Center Nasik Department of Computer Engineering Discrete Structutre UNIT - IV

DS UNIT 4. Matoshri College of Engineering and Research Center Nasik Department of Computer Engineering Discrete Structutre UNIT - IV Sr.No. Question Option A Option B Option C Option D 1 2 3 4 5 6 Class : S.E.Comp Which one of the following is the example of non linear data structure Let A be an adjacency matrix of a graph G. The ij

More information

Databases 2 (VU) ( / )

Databases 2 (VU) ( / ) Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:

More information

Simple Parallel Biconnectivity Algorithms for Multicore Platforms

Simple Parallel Biconnectivity Algorithms for Multicore Platforms Simple Parallel Biconnectivity Algorithms for Multicore Platforms George M. Slota Kamesh Madduri The Pennsylvania State University HiPC 2014 December 17-20, 2014 Code, presentation available at graphanalysis.info

More information

Chapter 9 Graph Algorithms

Chapter 9 Graph Algorithms Chapter 9 Graph Algorithms 2 Introduction graph theory useful in practice represent many real-life problems can be if not careful with data structures 3 Definitions an undirected graph G = (V, E) is a

More information

CSE 613: Parallel Programming. Lecture 11 ( Graph Algorithms: Connected Components )

CSE 613: Parallel Programming. Lecture 11 ( Graph Algorithms: Connected Components ) CSE 61: Parallel Programming Lecture ( Graph Algorithms: Connected Components ) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 01 Graph Connectivity 1 1 1 6 5 Connected Components:

More information

Solutions to relevant spring 2000 exam problems

Solutions to relevant spring 2000 exam problems Problem 2, exam Here s Prim s algorithm, modified slightly to use C syntax. MSTPrim (G, w, r): Q = V[G]; for (each u Q) { key[u] = ; key[r] = 0; π[r] = 0; while (Q not empty) { u = ExtractMin (Q); for

More information

2 A Template for Minimum Spanning Tree Algorithms

2 A Template for Minimum Spanning Tree Algorithms CS, Lecture 5 Minimum Spanning Trees Scribe: Logan Short (05), William Chen (0), Mary Wootters (0) Date: May, 0 Introduction Today we will continue our discussion of greedy algorithms, specifically in

More information

Lecture 4: Primal Dual Matching Algorithm and Non-Bipartite Matching. 1 Primal/Dual Algorithm for weighted matchings in Bipartite Graphs

Lecture 4: Primal Dual Matching Algorithm and Non-Bipartite Matching. 1 Primal/Dual Algorithm for weighted matchings in Bipartite Graphs CMPUT 675: Topics in Algorithms and Combinatorial Optimization (Fall 009) Lecture 4: Primal Dual Matching Algorithm and Non-Bipartite Matching Lecturer: Mohammad R. Salavatipour Date: Sept 15 and 17, 009

More information

Solution for Homework set 3

Solution for Homework set 3 TTIC 300 and CMSC 37000 Algorithms Winter 07 Solution for Homework set 3 Question (0 points) We are given a directed graph G = (V, E), with two special vertices s and t, and non-negative integral capacities

More information

MapReduce and Friends

MapReduce and Friends MapReduce and Friends Craig C. Douglas University of Wyoming with thanks to Mookwon Seo Why was it invented? MapReduce is a mergesort for large distributed memory computers. It was the basis for a web

More information

CSE 100 Minimum Spanning Trees Prim s and Kruskal

CSE 100 Minimum Spanning Trees Prim s and Kruskal CSE 100 Minimum Spanning Trees Prim s and Kruskal Your Turn The array of vertices, which include dist, prev, and done fields (initialize dist to INFINITY and done to false ): V0: dist= prev= done= adj:

More information

Lecture 11: Graph algorithms! Claudia Hauff (Web Information Systems)!

Lecture 11: Graph algorithms! Claudia Hauff (Web Information Systems)! Lecture 11: Graph algorithms!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the scenes of MapReduce:

More information

Dr. Amotz Bar-Noy s Compendium of Algorithms Problems. Problems, Hints, and Solutions

Dr. Amotz Bar-Noy s Compendium of Algorithms Problems. Problems, Hints, and Solutions Dr. Amotz Bar-Noy s Compendium of Algorithms Problems Problems, Hints, and Solutions Chapter 1 Searching and Sorting Problems 1 1.1 Array with One Missing 1.1.1 Problem Let A = A[1],..., A[n] be an array

More information

Weighted Graph Algorithms Presented by Jason Yuan

Weighted Graph Algorithms Presented by Jason Yuan Weighted Graph Algorithms Presented by Jason Yuan Slides: Zachary Friggstad Programming Club Meeting Weighted Graphs struct Edge { int u, v ; int w e i g h t ; // can be a double } ; Edge ( int uu = 0,

More information

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1 Asymptotics, Recurrence and Basic Algorithms 1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1 1. O(logn) 2. O(n) 3. O(nlogn) 4. O(n 2 ) 5. O(2 n ) 2. [1 pt] What is the solution

More information

Seminar on. Edge Coloring Series Parallel Graphs. Mohammmad Tawhidul Islam. Masters of Computer Science Summer Semester 2002 Matrikel Nr.

Seminar on. Edge Coloring Series Parallel Graphs. Mohammmad Tawhidul Islam. Masters of Computer Science Summer Semester 2002 Matrikel Nr. Seminar on Edge Coloring Series Parallel Graphs Mohammmad Tawhidul Islam Masters of Computer Science Summer Semester 2002 Matrikel Nr. 9003378 Fachhochschule Bonn-Rhein-Sieg Contents 1. Introduction. 2.

More information

Jeffrey D. Ullman Stanford University/Infolab

Jeffrey D. Ullman Stanford University/Infolab Jeffrey D. Ullman Stanford University/Infolab 3 Why Care? 1. Density of triangles measures maturity of a community. As communities age, their members tend to connect. 2. The algorithm is actually an example

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L08: B + -trees and Dynamic Hashing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,

More information

Minimum Spanning Trees My T. UF

Minimum Spanning Trees My T. UF Introduction to Algorithms Minimum Spanning Trees @ UF Problem Find a low cost network connecting a set of locations Any pair of locations are connected There is no cycle Some applications: Communication

More information

CS 4407 Algorithms Lecture 5: Graphs an Introduction

CS 4407 Algorithms Lecture 5: Graphs an Introduction CS 4407 Algorithms Lecture 5: Graphs an Introduction Prof. Gregory Provan Department of Computer Science University College Cork 1 Outline Motivation Importance of graphs for algorithm design applications

More information

COMP Parallel Computing. PRAM (3) PRAM algorithm design techniques

COMP Parallel Computing. PRAM (3) PRAM algorithm design techniques COMP 633 - Parallel Computing Lecture 4 August 30, 2018 PRAM algorithm design techniques Reading for next class PRAM handout section 5 1 Topics Parallel connected components algorithm representation of

More information

Elements of Graph Theory

Elements of Graph Theory Elements of Graph Theory Quick review of Chapters 9.1 9.5, 9.7 (studied in Mt1348/2008) = all basic concepts must be known New topics we will mostly skip shortest paths (Chapter 9.6), as that was covered

More information
