Scalable Big Graph Processing in Map Reduce
1 Scalable Big Graph Processing in Map Reduce
Lu Qin, Jeffrey Xu Yu, Lijun Chang, Hong Cheng, Chengqi Zhang, Xuemin Lin
Presented by Megan Bryant, College of William and Mary, February 11, 2015
2 Overview In this presentation, we introduce methods for scalable big graph processing in MapReduce. Specifically, we present a new algorithm class, SGC, which has the potential to guide the development of scalable graph processing algorithms in MapReduce. We also introduce two new graph join operators, which greatly extend what can be expressed in SGC. Finally, we compare the three classes MRC, MMC, and SGC on several scalable graph algorithms.
3 Computational Complexity Computational complexity theory provides a framework and a set of analysis tools for gauging the work performed by an algorithm as measured by the elementary (i.e., basic) operations it performs. The different basic steps (operations) that an algorithm typically takes are:
- Assignment (e.g., assigning some value to a variable)
- Arithmetic (e.g., addition, subtraction, multiplication, and division)
- Logical (e.g., comparison of two numbers)
4 Big-O Notation We use Big-O notation to describe the complexity of an algorithm.
Definition: An algorithm is said to run in O(f(n)) time if there exist constants $c > 0$ and $n_0$ such that the time taken by the algorithm is at most $c \cdot f(n)$ for all $n \ge n_0$.
This is an example of worst-case analysis, which is independent of the computing environment, is relatively easy to perform, and provides an upper bound on the number of steps (and hence the running time) an algorithm can take.
5 Big-O Complexity [figure]
6 Common Complexities The following table contains the complexities of common algorithms. Graphs have n nodes and m edges.
Algorithm                                 Data structure                   Time complexity   Space complexity
Depth First Search                        Graph with n nodes and m edges   O(n + m)          O(n)
Breadth First Search                      Graph with n nodes and m edges   O(n + m)          O(n)
Binary Search                             Sorted array                     O(log(n))         O(1)
Dijkstra's Shortest Path (unsorted array) Graph with n nodes and m edges   O(n^2)            O(n)
7 Algorithm Classes in Map Reduce There are currently two main algorithm classes in the MapReduce paradigm:
- The MapReduce Class (MRC)
- The Minimal MapReduce Class (MMC)
These classes are defined in terms of disk usage, memory usage, communication cost, CPU cost, and number of MapReduce rounds. There is also the popular Parallel Random-Access Machine (PRAM) model, against which performance studies were run.
8 Map Reduce Class Let S be the set of objects in the problem and let t be the number of machines in the system. Fix an $\epsilon > 0$. A MapReduce algorithm in MRC should have the following properties:
                    Each machine                       Total
Disk:               $O(|S|^{1-\epsilon})$              $O(|S|^{2-2\epsilon})$
Memory:             $O(|S|^{1-\epsilon})$              $O(|S|^{2-2\epsilon})$
Communication:      $O(|S|^{1-\epsilon})$ per round    $O(|S|^{2-2\epsilon})$
CPU:                $O(\mathrm{poly}(|S|))$ per round
Number of rounds:   $O(\log^i |S|)$, $i \ge 0$
9 Minimal Map Reduce Class Let S be the set of objects in the problem and let t be the number of machines in the system. A MapReduce algorithm in MMC should have the following properties:
                    Each machine                Total
Disk:               $O(|S|/t)$                  $O(|S|)$
Memory:             $O(|S|/t)$                  $O(|S|)$
Communication:      $O(|S|/t)$ per round        $O(|S|)$
CPU:                $O(T_{seq}/t)$              $O(T_{seq})$
Number of rounds:   $O(1)$
$T_{seq}$ is the time to solve the same problem on a single sequential machine.
10 Parallel Random Access Machine The Parallel Random Access Machine (PRAM) is a model of parallel computation. It is an extension of the RAM model of sequential computation. In this model, there are p processors connected to a single shared memory, and each processor has a unique index $1 \le i \le p$ called the processor id. A single program is executed in single-instruction stream, multiple-data stream fashion, meaning that each instruction is carried out by all processors simultaneously and requires unit time, regardless of the number of processors. Finally, each processor has a private flag that controls whether it is active in the execution of an instruction. Inactive processors do not participate in the execution of instructions, except for instructions to reset the flag. We will later compare algorithms based on this model with those in MRC, MMC, and SGC.
11 MRC vs. MMC MRC defines the basic requirements for an algorithm to execute in MapReduce, whereas MMC requires several aspects to achieve optimality simultaneously in a MapReduce algorithm. We will begin by analyzing the problems involved in MRC and MMC in graph processing.
12 Defining a Graph Let's consider a graph G = (V, E), where V represents the set of vertices (nodes) and E represents the set of edges (arcs). Further, let n = |V| be the number of nodes and m = |E| be the number of edges. A graph can be either directed or undirected, cyclic or acyclic, connected or disconnected. We can represent a graph as either:
- an adjacency matrix
- an adjacency list
13 Adjacency Matrix [figure]
14 Adjacency List [figure]
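The two representations named on the slides can be sketched in Python as follows. This is a minimal illustration, assuming an undirected graph whose nodes are numbered 0 to n-1; the function names are ours and do not come from the presentation.

```python
def adjacency_matrix(n, edges):
    """Return an n x n 0/1 matrix for an undirected graph on nodes 0..n-1."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: mirror the entry
    return matrix


def adjacency_list(n, edges):
    """Return a list whose i-th entry holds the sorted neighbors of node i."""
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return [sorted(s) for s in neighbors]


example_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(adjacency_matrix(4, example_edges))
print(adjacency_list(4, example_edges))
```

The matrix always takes O(n^2) space, while the list takes O(n + m), which is why the list form is usually preferred for large sparse graphs.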
15 Scalable Graph Processing in MMC For a graph G(V, E), a common graph operation is to exchange data among all adjacent nodes (nodes that share a common edge) in the graph. The memory constraint in MMC requires that all edges/nodes are distributed evenly among all machines in the system. To formalize the resulting communication cost, let $E_{i,j}$ be the set of edges (u, v) in G such that u is on machine i and v is on machine j.
16 Scalable Graph Processing in MMC The communication constraint in MMC can be formalized as follows:
$$\max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\; j \ne i} |E_{i,j}| \Big) \le O\Big(\frac{n+m}{t}\Big)$$
where once again $E_{i,j}$ is the set of edges $(u, v) \in E$ such that u is on machine i and v is on machine j. In order to achieve this inequality, we must minimize the maximum, i.e., compute
$$\min \max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\; j \ne i} |E_{i,j}| \Big).$$
However, this problem is actually NP-hard, meaning that it is at least as hard as the hardest problems in NP.
17 Scalable Graph Processing in MMC In addition to the problem being NP-hard, even if the optimal solution to
$$\max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\; j \ne i} |E_{i,j}| \Big) \le O\Big(\frac{n+m}{t}\Big)$$
is successfully computed, we cannot guarantee that the bound $O(\frac{n+m}{t})$ holds, since the optimum itself might be as large as O(n + m). Therefore, MMC is not a suitable class for scalable graph processing.
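To make the quantity in the MMC communication constraint concrete, the sketch below evaluates it for a given placement of nodes on machines. It is only an illustration: the function name and the `machine_of` mapping are our own, and each undirected edge is assumed to be stored as two directed edges.

```python
def max_cross_machine_edges(edges, machine_of, t):
    """Return the maximum over machines i of the sum over j != i of |E_{i,j}|.

    E_{i,j} holds the edges (u, v) with u on machine i and v on machine j.
    """
    outgoing = [0] * t
    for u, v in edges:
        i, j = machine_of[u], machine_of[v]
        if i != j:
            outgoing[i] += 1  # edge leaves machine i, so it costs communication
    return max(outgoing)


# A star graph: hub 0 connected to leaves 1..6, stored in both directions.
star = [(0, k) for k in range(1, 7)] + [(k, 0) for k in range(1, 7)]
placement = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
print(max_cross_machine_edges(star, placement, t=3))
```

In this example the machine that holds the hub carries the largest share of cross-machine edges, which is the kind of imbalance a high-degree node causes no matter how the remaining nodes are placed.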
18 Scalable Graph Processing in MRC MRC has fewer constraints than MMC, as it simply defines the basic conditions that a MapReduce algorithm should satisfy, and a graph algorithm in MapReduce is no exception. As with MMC, however, we can consider a class tailored to scalable graph processing. Given a graph G(V, E) with n nodes and m edges, assume that $m \ge n^{1+c}$. We can then define a class based on MRC for graph processing in MapReduce, in which a MapReduce algorithm has the following properties:
                    Each machine                   Total
Disk:               $O(n^{1+c/2})$                 $O(m^{1+c/2})$
Memory:             $O(n^{1+c/2})$                 $O(m^{1+c/2})$
Communication:      $O(n^{1+c/2})$ per round       $O(m^{1+c/2})$
CPU:                $O(\mathrm{poly}(m))$ per round
Number of rounds:   $O(1)$
19 Scalable Graph Processing in MRC This class has a good property in that the algorithm runs in a constant number of rounds. However, the memory constraint can cause difficulty, as it is large even for a dense graph. (Note: Dense graphs are generally easier to solve than sparse graphs.) Furthermore, if the memory of each machine cannot hold $O(n^{1+c/2})$ data, then the algorithm will always fail. Thus, the class is not scalable and cannot handle large n.
20 Scalable Graph Processing Class We will now formulate a new algorithm class which counters this deficiency. First, we weaken the bound on the communication cost per machine from $O(\frac{m+n}{t})$ to $\tilde{O}(\frac{m}{t}, D(G, t))$. This is done to account for the fact that graphs, especially large graphs, can have a skewed degree distribution. This is seen in graphs such as social networks, which often have a few nodes with very high degree (many subscribers, followers, etc.) alongside many lower-level users with only a few connections.
21 Skewed Degree Distribution [figure]
22 Scalable Graph Processing Class Suppose the nodes are uniformly distributed among all machines. Denote by $V_i$ the set of nodes stored on machine i for $1 \le i \le t$, and let $d_j$ be the degree of node $v_j$ in the input graph. Then $\tilde{O}(\frac{m}{t}, D(G, t))$ is defined as:
$$\tilde{O}\Big(\frac{m}{t}, D(G, t)\Big) = O\Big(\max_{1 \le i \le t} \sum_{v_j \in V_i} d_j\Big)$$
$$D(G, t) = \frac{t-1}{t^2} \sum_{v_j \in V} d_j^2$$
23 Scalable Graph Processing Class This leads us to the following lemma, the proof of which has been omitted.
Lemma 3.1: Let $x_i$ ($1 \le i \le t$) be the communication cost upper bound for machine i, i.e., $x_i = \sum_{v_j \in V_i} d_j$. Then the expected value of $x_i$ is $E(x_i) = \frac{2m}{t}$, and the variance of $x_i$ is $Var(x_i) = D(G, t)$.
The important thing to note here is that the variance of the degree distribution of G, denoted $Var(G)$, is
$$Var(G) = \frac{\sum_{v_j \in V} (d_j - \frac{2m}{n})^2}{n} = \frac{n \sum_{v_j \in V} d_j^2 - 4m^2}{n^2}.$$
For fixed t, n, and m values, minimizing D(G, t) is equivalent to minimizing Var(G). In other words, the variance of the communication cost for each machine is minimized if all nodes in the graph have the same degree.
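The quantities in Lemma 3.1 can be computed directly from a degree sequence. The sketch below is a small illustration under the lemma's assumption that nodes are placed uniformly at random on t machines, using $D(G, t) = \frac{t-1}{t^2} \sum d_j^2$; the function name and the two example graphs are ours.

```python
def degree_stats(degrees, t):
    """Return (E(x_i), D(G, t), Var(G)) for a degree sequence and t machines."""
    n = len(degrees)
    two_m = sum(degrees)                    # the degrees of a graph sum to 2m
    sum_sq = sum(d * d for d in degrees)
    expected_load = two_m / t               # E(x_i) = 2m / t
    load_variance = (t - 1) / t**2 * sum_sq     # Var(x_i) = D(G, t)
    degree_variance = (n * sum_sq - two_m**2) / n**2  # Var(G)
    return expected_load, load_variance, degree_variance


# Two trees on 5 nodes with 4 edges each: a path and a star.
path_degrees = [1, 2, 2, 2, 1]
star_degrees = [4, 1, 1, 1, 1]
print(degree_stats(path_degrees, t=2))
print(degree_stats(star_degrees, t=2))
```

Both graphs have the same n and m, so the expected load per machine is identical, but the star's skewed degrees give it a larger D(G, t) and a larger Var(G), in line with the closing remark of the lemma.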
24 Scalable Graph Processing Class Thus, we define the Scalable Graph Processing Class (SGC) as follows.
                    Each machine                              Total
Disk:               $O(\frac{m+n}{t})$                        $O(m + n)$
Memory:             $O(1)$                                    $O(t)$
Communication:      $\tilde{O}(\frac{m}{t}, D(G, t))$ per round   $O(m + n)$
CPU:                $\tilde{O}(\frac{m}{t}, D(G, t))$ per round
Number of rounds:   $O(\log(n))$
25 Comparison Between Classes We examine the upper bounds of the three classes to see how SGC compares.
                         MRC                     MMC                    SGC
Disk/machine             $O(n^{1+c/2})$          $O(\frac{n+m}{t})$     $O(\frac{n+m}{t})$
Disk/total               $O(m^{1+c/2})$          $O(n+m)$               $O(n+m)$
Memory/machine           $O(n^{1+c/2})$          $O(\frac{n+m}{t})$     $O(1)$
Memory/total             $O(m^{1+c/2})$          $O(n+m)$               $O(t)$
Communication/machine    $O(n^{1+c/2})$          $O(\frac{n+m}{t})$     $\tilde{O}(\frac{m}{t}, D(G, t))$
Communication/total      $O(m^{1+c/2})$          $O(n+m)$               $O(n+m)$
CPU/machine              $O(\mathrm{poly}(m))$   $O(\frac{T_{seq}}{t})$ $\tilde{O}(\frac{m}{t}, D(G, t))$
CPU/total                $O(\mathrm{poly}(m))$   $O(T_{seq})$           $O(n+m)$
Number of rounds         $O(1)$                  $O(1)$                 $O(\log(n))$
26 Comparison Between Classes We see that SGC requires each machine to use only constant memory. This means that even if the total memory of the system is smaller than the input data, the algorithm can still be processed successfully. This is an even stronger constraint than that defined in MMC. Given the constraints on memory, communication, and CPU, it is nearly impossible for a wide range of graph algorithms to be processed in a constant number of rounds in MapReduce. Thus, we relax the O(1) rounds defined in MMC to O(log(n)) rounds. Since Ω(log(n)) is the processing time lower bound for a large number of parallel graph algorithms in the PRAM model, this is practical for the MapReduce framework, as evidenced by our experiments.
27 Big-O Complexity [figure]
28 Graph Operators in SGC In addition to the normal set of graph operators, such as union and intersection, we introduce two graph operators in SGC, namely NE join and EN join, with which a wide range of graph algorithms can be designed.
29 Graph Operators in SGC We assume that a graph G(V, E) is stored in a distributed file system as a node table V and an edge table E. Each node in the table has a unique id and some other information such as label and keywords. Each edge in the table has id_1 and id_2, defining the source and target node ids of the edge, and some other information such as weight and label. When it is clear from context, we use the node id to refer to the node. G can be either directed or undirected. For an undirected graph, each edge is stored as two edges, (id_1, id_2) and (id_2, id_1).
30 Graph Operators in SGC Before we go any further, let's examine the natural join operation, ⋈, acting on two sets of data. Here we see a graphical representation of Employee ⋈ Dept. [figure]
31 NE Join An NE join aims to propagate the information on nodes into edges. For each edge $(v_i, v_j) \in E$, an NE join outputs an edge $(v_i, v_j, F(v_i))$ (or $(v_i, v_j, F(v_j))$), where $F(v_i)$ (or $F(v_j)$) is a set of functions operated on $v_i$ (or $v_j$) in the node table V.
32 NE Join Given a node table V_i and an edge table E_j, an NE join of V_i and E_j is represented in SQL as:
    select id_1, id_2, f_1(c_1) as p_1, f_2(c_2) as p_2, ...
    from V_i as V NE join E_j as E on V.id = E.id'
    where cond(c)
    count cond'(c') as cnt
with the following definitions:
- c, c': subsets of fields in the two tables V_i and E_j
- c_1, c_2, ...: subsets of fields in the two tables V_i and E_j
- f_k: a function operated on the fields c_k
- cond: a function that returns true or false, defined on the fields in c
- cond': a function that returns true or false, defined on the fields in c'
- id': either id_1 or id_2
- count: counts the number of trues returned by cond'(c') and assigns it to cnt
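The effect of an NE join can be sketched in plain Python. This is an in-memory illustration only, not the MapReduce implementation from the paper: the function `ne_join`, its parameters, and the table layout (a dict of node rows, a list of edge rows) are our assumptions, and the `count` clause is left out for brevity.

```python
def ne_join(node_table, edge_table, on, funcs, cond=lambda node, edge: True):
    """Attach values computed from a node row to every edge that references it.

    node_table: dict mapping node id -> dict of node fields
    edge_table: list of dicts, each with at least "id1" and "id2"
    on:         "id1" or "id2", the edge field matched against the node id
    funcs:      dict mapping output field name -> function(node, edge)
    """
    result = []
    for edge in edge_table:
        node = node_table[edge[on]]
        if cond(node, edge):
            row = {"id1": edge["id1"], "id2": edge["id2"]}
            for name, f in funcs.items():
                row[name] = f(node, edge)
            result.append(row)
    return result


nodes = {
    1: {"rank": 0.5, "out_degree": 2},
    2: {"rank": 0.5, "out_degree": 1},
    3: {"rank": 0.0, "out_degree": 0},
}
edges = [{"id1": 1, "id2": 2}, {"id1": 1, "id2": 3}, {"id1": 2, "id2": 3}]
shares = ne_join(nodes, edges, on="id1",
                 funcs={"share": lambda n, e: n["rank"] / n["out_degree"]})
print(shares)
```

Here each edge receives the rank share of its source node, which is the kind of node-to-edge propagation the NE join is meant to express.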
33 EN Join An EN join aims to aggregate the information on edges into nodes. For each node $v_i \in V$, an EN join outputs a node $(v_i, G(adj(v_i)))$, where $adj(v_i) = \{(v_i, v_j) \in E\}$ and G is a set of decomposable aggregate functions on the edge set $adj(v_i)$. An aggregate function $g_k$ is decomposable if, for any dataset s and any two subsets $s_1$ and $s_2$ of s with $s_1 \cap s_2 = \emptyset$ and $s_1 \cup s_2 = s$, $g_k(s)$ can be computed using $g_k(s_1)$ and $g_k(s_2)$.
34 EN Join An EN join can be defined in SQL form as:
    select id, g_1(c_1) as p_1, g_2(c_2) as p_2, ...
    from V_i as V EN join E_j as E on V.id = E.id'
    where cond(c)
    group by id
    count cond'(c') as cnt
with the following definitions:
- c, c': subsets of fields in the two tables V_i and E_j
- c_1, c_2, ...: subsets of fields in the two tables V_i and E_j
- id': either id_1 or id_2
- g_k: a decomposable aggregate function operated on the fields in c_k by grouping the results using node id
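An EN join can be sketched in the same in-memory style. The names below (`en_join`, the `aggregates` layout) are hypothetical, the `count` clause is again omitted, and each aggregate is given as a per-edge value function plus a pairwise combine function, which is one simple way to model a decomposable aggregate.

```python
def en_join(node_table, edge_table, on, aggregates,
            cond=lambda node, edge: True):
    """Aggregate edge information into the node each edge references.

    aggregates: dict mapping output name -> (value_fn(node, edge), combine(a, b)).
    Because combine merges two partial results, the aggregate can be computed
    on disjoint subsets of the edges and merged afterwards.
    """
    result = {}
    for edge in edge_table:
        node_id = edge[on]
        node = node_table[node_id]
        if not cond(node, edge):
            continue
        row = result.setdefault(node_id, {})
        for name, (value_fn, combine) in aggregates.items():
            value = value_fn(node, edge)
            row[name] = combine(row[name], value) if name in row else value
    return result


nodes = {1: {}, 2: {}, 3: {}}
edges = [
    {"id1": 1, "id2": 2, "share": 0.25},
    {"id1": 1, "id2": 3, "share": 0.25},
    {"id1": 2, "id2": 3, "share": 0.5},
]
totals = en_join(nodes, edges, on="id2", aggregates={
    "total": (lambda n, e: e["share"], lambda a, b: a + b),
    "min_source": (lambda n, e: e["id1"], min),
})
print(totals)
```

Sum and min are both decomposable, so in a MapReduce setting partial results could be merged by a combiner before the reduce step. Note that in this sketch a node with no matching edge simply does not appear in the output.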
35 Basic Graph Algorithms The combination of NE join and EN join can solve a wide range of graph problems in SGC. In this section, we introduce some basic graph algorithms:
- PageRank
- Breadth First Search
- Graph Keyword Search
We will use MRC, MMC, and SGC versions of these algorithms for performance testing, which will be covered later.
36 Page Rank PageRank is a key graph operation which computes the rank of each node based on the links (directed edges) among them. Given a directed graph G(V, E) and a page x with inlinks $t_1, \ldots, t_n$, the PageRank of x can be calculated iteratively as follows:
$$PR(x) = \alpha \Big(\frac{1}{|V|}\Big) + (1 - \alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$$
with the following definitions:
- C(t): out-degree of t
- α: probability of a random jump
- |V|: total number of nodes
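The iterative formula $PR(x) = \alpha \frac{1}{|V|} + (1 - \alpha) \sum_i \frac{PR(t_i)}{C(t_i)}$ can be sketched as a short sequential loop. The value α = 0.15, the fixed iteration count, and the uniform starting ranks are assumptions made for this illustration, and nodes without outgoing edges are not given any special treatment.

```python
def pagerank(nodes, edges, alpha=0.15, iterations=20):
    """Iterate PR(x) = alpha / |V| + (1 - alpha) * sum(PR(t) / C(t)) over inlinks t."""
    n = len(nodes)
    out_degree = {v: 0 for v in nodes}
    for u, _ in edges:
        out_degree[u] += 1
    rank = {v: 1.0 / n for v in nodes}  # assumed uniform starting point
    for _ in range(iterations):
        incoming = {v: 0.0 for v in nodes}
        for u, v in edges:
            # Each page passes an equal share of its rank along every outlink.
            incoming[v] += rank[u] / out_degree[u]
        rank = {v: alpha / n + (1 - alpha) * incoming[v] for v in nodes}
    return rank


# In a directed 3-cycle every page is symmetric, so all ranks stay at 1/3.
print(pagerank(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))
```

In terms of the join operators, the inner loop over edges plays the role of an NE join (rank share onto edges) followed by an EN join (sum of shares per target node).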
37 Page Rank Algorithm [Figure: graphical overview of the PageRank algorithm]
38 Page Rank in MapReduce [Figure: graphical overview of PageRank in MapReduce]
39 Breadth First Search Breadth First Search (BFS) is a fundamental graph operation. Given an undirected graph G(V, E) and a source node s, a BFS computes for every node $v \in V$ the shortest distance (i.e., the minimum number of hops) from s to v in G. Define:
- b is reachable from a if b is on the adjacency list of a
- DistanceTo(s) = 0
- For all nodes p reachable from s, DistanceTo(p) = 1
- For all nodes n reachable from some other set of nodes M, DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ M)
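The recurrence DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ M) lends itself to a round-based formulation, where every round relaxes all edges once, much as one MapReduce iteration would. The sketch below is a sequential illustration with names of our choosing; undirected edges are assumed to be stored in both directions.

```python
def bfs_distances(nodes, edges, source):
    """Compute hop distances from source by relaxing all edges once per round."""
    inf = float("inf")
    dist = {v: inf for v in nodes}
    dist[source] = 0
    changed = True
    while changed:
        changed = False
        best = dict(dist)  # distances proposed in this round
        for u, v in edges:
            if dist[u] + 1 < best[v]:
                best[v] = dist[u] + 1
                changed = True
        dist = best
    return dist


# Path 0-1-2-3 plus an isolated node 4.
undirected = [(0, 1), (1, 2), (2, 3)]
both_ways = undirected + [(v, u) for u, v in undirected]
print(bfs_distances(range(5), both_ways, source=0))
```

The loop stops in the first round that changes nothing, so the number of rounds is bounded by the largest finite distance plus one. Unreachable nodes keep an infinite distance.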
40 Breadth First Search [Figure: graphical overview of the Breadth First Search algorithm]
41 Graph Key Word Search We now investigate a more complex algorithm, namely, keyword search in an undirected graph G(V, E). Suppose that for each $v \in V$, t(v) is the text information included in v. Given a keyword query with a set of l keywords $Q = \{k_1, k_2, \ldots, k_l\}$, a keyword search finds a set of rooted trees of the form $(r, \{(p_1, d(r, p_1)), (p_2, d(r, p_2)), \ldots, (p_l, d(r, p_l))\})$, where:
- r is the root node
- $p_i$ is a node that contains keyword $k_i$ in $t(p_i)$
- $d(r, p_i)$ is the shortest distance from r to $p_i$ in G, for $1 \le i \le l$
Each answer is uniquely determined by its root node r. rmax is the maximum distance allowed from the root node r to a keyword node in an answer, i.e., $d(r, p_i) \le rmax$ for $1 \le i \le l$.
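One straightforward way to compute such answers sequentially is to run, for each keyword, a breadth-first search from all nodes containing it, limited to rmax hops, and then keep the nodes reached for every keyword. The sketch below takes that approach; it is an illustration only, not the paper's MapReduce algorithm, and the function name and the input layout (`adj`, `text`) are assumptions.

```python
from collections import deque


def keyword_search(adj, text, query, rmax):
    """Return {root: [(p_1, d_1), ..., (p_l, d_l)]} for roots within rmax of all keywords.

    adj:  dict mapping node -> list of neighbors (undirected graph)
    text: dict mapping node -> set of keywords contained in that node
    """
    per_keyword = []
    for keyword in query:
        # Multi-source BFS: nearest node containing the keyword, and its distance.
        found = {v: (v, 0) for v in adj if keyword in text[v]}
        queue = deque(found)
        while queue:
            u = queue.popleft()
            p, d = found[u]
            if d == rmax:
                continue  # do not expand beyond the allowed distance
            for w in adj[u]:
                if w not in found:
                    found[w] = (p, d + 1)
                    queue.append(w)
        per_keyword.append(found)
    return {
        r: [found[r] for found in per_keyword]
        for r in adj
        if all(r in found for found in per_keyword)
    }


# Path 0-1-2-3-4 where node 0 contains "x" and node 4 contains "y".
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
text = {0: {"x"}, 1: set(), 2: set(), 3: set(), 4: {"y"}}
print(keyword_search(adj, text, ["x", "y"], rmax=2))
```

With rmax = 2 only the middle node qualifies as a root, since it is the only node within two hops of both keyword nodes. When several keyword nodes are equally close, this sketch keeps whichever one the search reaches first.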
42 Connected Component Given an undirected graph G(V, E) with n nodes and m edges, a Connected Component (CC) is a maximal set of nodes that can reach each other through paths in G. Computing all CCs of G is a fundamental graph problem and can be solved efficiently on a sequential machine in O(n + m) time. However, it is non-trivial to solve the problem in MapReduce.
43 Existing Algorithms We present three existing algorithms for Connected Components computation in MapReduce, against which we compare CC in SGC:
- HashToMin
- HashGToMin
- PRAM-Simulation
44 HashToMin HashToMin and HashGToMin are two MapReduce algorithms that share the idea of using the smallest node in each CC as the representative of the CC, assuming that there is a total order among all nodes in G. The HashToMin algorithm finishes in O(log(n)) rounds, with O(log(n)(m + n)) total communication cost in each round. The algorithm can be optimized to use O(1) memory on each machine using secondary sort in MapReduce.
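The slide does not spell out the HashToMin steps, so the sketch below follows the Hash-to-Min scheme as it is commonly described: every node keeps a cluster of node ids, sends the whole cluster to the smallest id in it, and sends that smallest id to every member. It is a sequential simulation for illustration, with names of our choosing, and makes no attempt to reproduce the secondary-sort optimization.

```python
from collections import defaultdict


def hash_to_min(nodes, edges):
    """Label every node with the smallest node id in its connected component."""
    # Initial cluster of v: v itself plus its neighbors.
    clusters = {v: {v} for v in nodes}
    for u, v in edges:
        clusters[u].add(v)
        clusters[v].add(u)
    while True:
        inbox = defaultdict(set)
        for v, cluster in clusters.items():
            v_min = min(cluster)
            inbox[v_min] |= cluster      # whole cluster goes to the minimum
            for u in cluster:
                inbox[u].add(v_min)      # every member learns the minimum
        new_clusters = {v: inbox[v] for v in nodes}
        if new_clusters == clusters:     # nothing changed in this round
            return {v: min(c) for v, c in clusters.items()}
        clusters = new_clusters


print(hash_to_min([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (3, 4), (5, 6)]))
```

When the loop stops, the smallest node of each component holds the whole component in its cluster and every other node holds just that representative, so taking the minimum of each cluster yields the component label.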
45 HashGToMin The HashGToMin algorithm finishes in Õ(log(n)) rounds, meaning that it is expected to finish in O(log(n)) rounds, with O(m + n) total communication cost in each round. However, it needs O(n) memory for a single machine to hold a whole CC in memory. Thus, HashGToMin is not suitable for handling a graph with large n.
46 PRAM Simulation PRAM-Simulation simulates an algorithm designed for the Parallel Random Access Machine (PRAM) model in MapReduce. The PRAM model allows multiple processors to compute in parallel using a shared memory. A theoretical result shows that a CREW PRAM algorithm running in O(t) time can be simulated in MapReduce in O(t) rounds. For the CC computation problem, the best result in the literature computes CCs in O(log(n)) time. However, it needs to compute the 2-hop node pairs, which requires O(n^2) communication cost in the worst case in each round. Thus, the simulation algorithm is impractical.
47 Connected Component in SGC We introduce our algorithm to compute CCs in SGC. Conceptually, the algorithm shares similar ideas with most deterministic O(log(n)) PRAM algorithms, but it is non-trivial. Our algorithm maintains a forest using a parent pointer p(v) for each v ∈ V. Each rooted tree in the forest represents a partial CC. A singleton is a tree with one node, and a star is a tree of height 1. A tree is an isolated tree if there are no edges in E that connect the tree to another tree. The forest is iteratively updated using two operations: hooking and pointer jumping. Hooking merges several trees into a larger tree, and pointer jumping changes the parent of each node to its grandparent in each tree.
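The two operations can be illustrated with a deliberately simplified sequential sketch. It is not the SGC algorithm from the paper and makes no claim about its round or memory bounds: here every root hooks onto the smallest neighboring root, and pointer jumping is repeated until all trees are stars before the next hooking step. All names are ours.

```python
def connected_components(nodes, edges):
    """Return a parent map in which p(v) is the smallest node of v's component."""
    parent = {v: v for v in nodes}  # start with a forest of singletons
    while True:
        # Hooking: every tree is a star here, so parent[v] is the root of v's tree.
        target = {}
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if ru != rv:
                high, low = max(ru, rv), min(ru, rv)
                if low < target.get(high, high):
                    target[high] = low  # hook the larger root under the smaller
        if not target:
            return parent  # every tree is isolated: the forest is final
        for root, new_parent in target.items():
            parent[root] = new_parent
        # Pointer jumping: replace each parent by the grandparent until stars remain.
        while any(parent[parent[v]] != parent[v] for v in nodes):
            parent = {v: parent[parent[v]] for v in nodes}


print(connected_components([1, 2, 3, 4, 5, 6, 7],
                           [(1, 2), (2, 3), (3, 4), (5, 6)]))
```

Because a root only ever hooks onto a strictly smaller root, hooking cannot create a cycle, and each pointer-jumping pass halves the depth of every tree.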
48 Comparison We can now compare the bounds of these algorithms. We omit PRAM-Simulation since it is impractical. Note that the CC algorithm in the SGC class has the best bounds in each category. This indicates the significant improvement that SGC represents for scalable big graph processing.
49 Minimum Spanning Forest Given a weighted undirected graph G(V, E) of n nodes and m edges, with each edge $(u, v) \in E$ assigned a weight w((u, v)), a Minimum Spanning Forest (MSF) is a spanning forest of G with the minimum total edge weight. We also use (u, v, w((u, v))) to denote an edge. Although an MSF can be efficiently computed on a sequential machine in O(m + n log(n)) time, it is non-trivial to solve the problem in MapReduce.
50 Minimum Spanning Forest The following is an example of a Minimum Spanning Tree. A forest is made up of many trees. [figure]
51 MSF Algorithm in SGC Suppose there is a total order among all edges as follows. For any two edges $e_1 = (u_1, v_1, w_1)$ and $e_2 = (u_2, v_2, w_2)$, $e_1 < e_2$ iff one of the following conditions holds:
1. $w_1 < w_2$
2. $w_1 = w_2$ and $\min(u_1, v_1) < \min(u_2, v_2)$
3. $w_1 = w_2$ and $\min(u_1, v_1) = \min(u_2, v_2)$, and $\max(u_1, v_1) < \max(u_2, v_2)$
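The three conditions amount to a lexicographic comparison of (weight, smaller endpoint, larger endpoint), which in Python is a tuple sort key. The sketch below pairs that key with a sequential Kruskal-style pass using union-find, purely as a reference point; it is not the MSF algorithm in SGC, and the function names are ours.

```python
def edge_key(edge):
    """Sort key realizing the total order: weight, then min endpoint, then max endpoint."""
    u, v, w = edge
    return (w, min(u, v), max(u, v))


def minimum_spanning_forest(nodes, edges):
    """Sequential Kruskal-style reference: scan edges in the total order above."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    forest = []
    for u, v, w in sorted(edges, key=edge_key):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two different trees
            parent[root_u] = root_v
            forest.append((u, v, w))
    return forest


weighted = [(2, 3, 1), (4, 5, 2), (1, 3, 1), (1, 2, 1)]
print(minimum_spanning_forest([1, 2, 3, 4, 5], weighted))
```

Because the order is total, ties between equal-weight edges are always broken the same way, so the resulting forest is unique for a given graph.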
52 MSF Comparisons The two existing algorithms, OneRoundMSF and MultiRoundMSF, and our algorithm MSF are compared below in terms of memory consumption per machine, total communication cost per round, and the number of rounds. [table] As we will show in our performance testing, the high memory requirement of OneRoundMSF and MultiRoundMSF becomes the bottleneck that keeps these algorithms from achieving high scalability when handling graphs with large n.
53 Performance Testing We tested the performance of the aforementioned algorithms on a cluster of 17 computing nodes, including one master node and 16 slave nodes, each of which has four Intel Xeon 2.4GHz CPUs and 15GB RAM and runs 64-bit Ubuntu Linux. We implemented all algorithms using Hadoop (version 1.2.1) with Java 1.6. We allow each node to run three mappers and three reducers concurrently.
54 Data Sets We use two web-scale graphs Twitter-2010 and Friendster with different graph characteristics for testing. Twitter-2010 contains 41,652,230 nodes and 1,468,365,182 edges with an average degree of 71. The maximum degree is 3,081,112 and the diameter of Twitter-2010 is around 24. Friendster contains 65,608,366 nodes and 1,806,067,135 edges with an average degree of 55. The maximum degree is 5,214 and the diameter of Friendster is around 32.
55 Algorithms Besides the five algorithms PageRank (Algorithm 1), BFS (Algorithm 2), KWS (Algorithm 3), CC (Algorithm 4), and MSF (Algorithm 5), we also implement PageRank, BFS, and graph keyword search using the join operations supported by Pig on Hadoop, denoted PageRank-Pig, BFS-Pig, and KWS-Pig respectively.
56 PageRank Algorithm [figure]
57 BFS Algorithm [figure]
58 CC Algorithm [figure]
59 MSF Algorithm [figure]
60 Conclusions In this paper, we studied scalable big graph processing in MapReduce. We reviewed previous MapReduce classes and proposed a new class, SGC, to guide the development of scalable graph processing algorithms in MapReduce. We introduced two graph join operators with which a large range of graph algorithms can be designed in SGC. In particular, for two fundamental graph problems, CC computation and MSF computation, we improved on the state-of-the-art algorithms both in theory and in practice. We conducted extensive performance studies using real web-scale graphs to show the high scalability achieved by our algorithms in SGC.