Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing

1 Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing Laxman Dhulipala Joint work with Guy Blelloch and Julian Shun SPAA 2017

2 Giant graph datasets
Graph                 |V|      |E| (symmetrized)
com-Orkut             3M       234M
Twitter               41M      1.46B
Friendster            124M     3.6B
Hyperlink2012-Host    101M     2.04B
Facebook (2011)       721M     68.4B
Hyperlink2014         1.7B     124B
Hyperlink2012         3.5B     225B
Facebook (2017)       > 1B     > 100B
Google (2017)         ??       ??
Legend: publicly available graphs used in our experiments / private graph datasets

3 Traditional approaches One possible way to solve large graph problems: Hand-write MPI/OpenMP/Cilk codes Run on a powerful machine or a large cluster

4 Traditional approaches One possible way to solve large graph problems: Hand-write MPI/OpenMP/Cilk codes Run on a powerful machine or a large cluster Benefits Good performance Can hand-code custom optimizations

5 Traditional approaches One possible way to solve large graph problems: Hand-write MPI/OpenMP/Cilk codes Run on a powerful machine or a large cluster Benefits Good performance Can hand-code custom optimizations Downsides Usually require a lot of code Need lots of expertise to write and understand codes Not everyone has a supercomputer

6 Graph processing frameworks High level goals Simple set of primitives (interface) Implementations easy to write and understand Algorithms can handle very large graphs

7 Graph processing frameworks High level goals Simple set of primitives (interface) Implementations easy to write and understand Algorithms can handle very large graphs Ex: Pregel, GraphLab, Ligra, GraphX, GraphChi

8 Graph processing frameworks High level goals Simple set of primitives (interface) Implementations easy to write and understand Algorithms can handle very large graphs Ex: Pregel, GraphLab, Ligra, GraphX, GraphChi Additional goals: Primitives have theoretical guarantees Support common optimizations under the hood Implementations competitive with non-framework codes

9 Graph processing frameworks High level goals Simple set of primitives (interface) Implementations easy to write and understand Algorithms can handle very large graphs Ex: Pregel, GraphLab, Ligra, GraphX, GraphChi Additional goals: Primitives have theoretical guarantees Support common optimizations under the hood Implementations competitive with non-framework codes Our goals: All of the above on a single affordable shared memory machine

10 An affordable machine

11-12 An affordable machine Dell PowerEdge R930 72 cores (4 x 2.4GHz 18-core E7 v4 Xeon processors) 1TB of main memory Costs less than a mid-range BMW

13 Ligra Shared memory graph processing framework [1] [1] Shun and Blelloch, 2013, Ligra: A Lightweight Graph Processing Framework for Shared Memory [2] Shun, Dhulipala and Blelloch, 2015, Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

14 Ligra Shared memory graph processing framework [1] Benefits Designed to express frontier-based algorithms Primitives and implementations have theoretical guarantees Optimizations (direction-optimizing, compression [2]) Implementations are simple to write and understand Competitive with hand-tuned codes [1] Shun and Blelloch, 2013, Ligra: A Lightweight Graph Processing Framework for Shared Memory [2] Shun, Dhulipala and Blelloch, 2015, Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

15 Ligra Shared memory graph processing framework [1] Benefits Designed to express frontier-based algorithms Primitives and implementations have theoretical guarantees Optimizations (direction-optimizing, compression [2]) Implementations are simple to write and understand Competitive with hand-tuned codes Downsides Some algorithms may not be efficiently implementable [1] Shun and Blelloch, 2013, Ligra: A Lightweight Graph Processing Framework for Shared Memory [2] Shun, Dhulipala and Blelloch, 2015, Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

16 Ligra Shared memory graph processing framework [1] Benefits Designed to express frontier-based algorithms Primitives and implementations have theoretical guarantees Optimizations (direction-optimizing, compression [2]) Implementations are simple to write and understand Competitive with hand-tuned codes Downsides Some algorithms may not be efficiently implementable This work: Made Ligra codes run on the largest publicly available graphs on a single machine [1] Shun and Blelloch, 2013, Ligra: A Lightweight Graph Processing Framework for Shared Memory [2] Shun, Dhulipala and Blelloch, 2015, Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

17 Ligra: Frontier-based algorithms Primitives Frontier data-structure (vertexSubset) Map over vertices in a frontier Map over out-edges of a frontier

18 Ligra: Frontier-based algorithms Primitives Frontier data-structure (vertexSubset) Map over vertices in a frontier Map over out-edges of a frontier Example: Breadth-First Search

19-23 Ligra: Frontier-based algorithms Primitives Frontier data-structure (vertexSubset) Map over vertices in a frontier Map over out-edges of a frontier Example: Breadth-First Search [animation: the BFS frontier expands over rounds 1-4; legend: in frontier, unvisited, visited] Some useful graph algorithms cannot be efficiently implemented in frontier-based frameworks
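
To make the frontier pattern concrete, here is a minimal sequential C++ sketch of the BFS above; the explicit frontier vector plays the role of the vertexSubset, and the two nested loops correspond to the map-over-vertices and map-over-out-edges primitives (in Ligra these run in parallel; the names below are illustrative, not Ligra's API).

#include <vector>

// Frontier-based BFS (illustrative sketch): each round maps over the out-edges
// of the current frontier and builds the next frontier from newly visited vertices.
std::vector<int> bfs_levels(const std::vector<std::vector<int>>& G, int s) {
  std::vector<int> level(G.size(), -1);
  std::vector<int> frontier = {s};
  level[s] = 0;
  int round = 1;
  while (!frontier.empty()) {
    std::vector<int> next;
    for (int u : frontier) {          // map over vertices in the frontier
      for (int v : G[u]) {            // map over their out-edges
        if (level[v] == -1) {
          level[v] = round;
          next.push_back(v);          // v joins the next frontier
        }
      }
    }
    frontier.swap(next);
    round++;
  }
  return level;
}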

24 Example: Weighted Breadth-First Search Given: G = (V, E, w) with positive integer edge weights, s ∈ V Problem: Compute the shortest path distances from s

25 Example: Weighted Breadth-First Search Given: G = (V, E, w) with positive integer edge weights, s ∈ V Problem: Compute the shortest path distances from s Frontier-based: On each step, visit all neighbors whose distance decreased

26-32 Example: Weighted Breadth-First Search [animation: running the frontier-based approach on the example graph from s, round by round; vertices whose distances decrease re-enter the frontier] Not work-efficient!

33 Sequential Weighted Breadth-First Search Idea: Run Dijkstra's algorithm, but use buckets instead of a PQ Represent buckets using dynamic arrays Simple, efficient implementation running in O(D + |E|) work

34-44 Sequential Weighted Breadth-First Search [animation: processing the buckets of the example graph from s in increasing distance order, one vertex at a time] O(D + |E|) work where D is the graph diameter
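
A minimal sequential C++ sketch of this bucketed variant (illustrative, not the paper's implementation): buckets are dynamic arrays indexed by tentative distance, and entries whose distance has since improved are skipped lazily when their bucket is reached.

#include <cstdint>
#include <limits>
#include <vector>

struct Edge { int to; uint32_t w; };   // positive integer edge weights

// Bucketed weighted BFS: O(D + |E|) work beyond the O(|V|) initialization.
std::vector<uint32_t> wbfs(const std::vector<std::vector<Edge>>& G, int s) {
  const uint32_t INF = std::numeric_limits<uint32_t>::max();
  std::vector<uint32_t> dist(G.size(), INF);
  std::vector<std::vector<int>> buckets(1);
  dist[s] = 0;
  buckets[0].push_back(s);
  for (uint32_t d = 0; d < buckets.size(); d++) {     // buckets in increasing distance
    for (size_t i = 0; i < buckets[d].size(); i++) {
      int u = buckets[d][i];
      if (dist[u] != d) continue;                     // stale entry, skip
      for (const Edge& e : G[u]) {
        uint32_t nd = d + e.w;
        if (nd < dist[e.to]) {
          dist[e.to] = nd;                            // relax the edge
          if (nd >= buckets.size()) buckets.resize(nd + 1);
          buckets[nd].push_back(e.to);                // insert into a future bucket
        }
      }
    }
  }
  return dist;
}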

45-46 Bucketing The algorithm uses buckets to organize work for future iterations

47 Bucketing The algorithm uses buckets to organize work for future iterations This algorithm is actually parallelizable In each step: 1. Process all vertices in the next bucket in parallel 2. Update buckets of neighbors in parallel

48 Sequential Weighted Breadth-First Search Sequential: process vertices one by one

49 Parallel Weighted Breadth-First Search (1) Process vertices in the same bucket in parallel

50 Parallel Weighted Breadth-First Search (2) Insert neighbors into buckets in parallel

51 Parallel Weighted Breadth-First Search Resulting algorithm performs: O(D + |E|) work and O(D log |V|) depth (assuming efficient bucketing) (2) Insert neighbors into buckets in parallel

52 Parallel bucketing Bucketing is useful for more than just wbfs k-core (coreness) Delta-Stepping Parallel Approximate Set Cover

53 Parallel bucketing Bucketing is useful for more than just wbfs k-core (coreness) Delta-Stepping Parallel Approximate Set Cover Goals Simplify expressing algorithms using an interface Theoretically efficient, reusable implementation

54 Parallel bucketing Bucketing is useful for more than just wbfs k-core (coreness) Delta-Stepping Parallel Approximate Set Cover Goals Simplify expressing algorithms using an interface Theoretically efficient, reusable implementation Difficulties 1. Multiple vertices insert into the same bucket in parallel 2. Possible to make work-efficient parallel implementations?

55 Results: Julienne Shared memory framework for bucketing-based algorithms

56 Results: Julienne Shared memory framework for bucketing-based algorithms Extend Ligra with an interface for bucketing Theoretical bounds for primitives Fast implementations of primitives

57 Results: Julienne Shared memory framework for bucketing-based algorithms Extend Ligra with an interface for bucketing Theoretical bounds for primitives Fast implementations of primitives Can implement a bucketing algorithm with n vertices, T total buckets, U updates over K calls to UpdateBuckets, and L calls to NextBucket in O(n + T + U) expected work and O((K + L) log n) depth w.h.p.

58 Results: Julienne Shared memory framework for bucketing-based algorithms Extend Ligra with an interface for bucketing Theoretical bounds for primitives Fast implementations of primitives Can implement a bucketing algorithm with n vertices, T total buckets, U updates over K calls to UpdateBuckets, and L calls to NextBucket in O(n + T + U) expected work and O((K + L) log n) depth w.h.p. Bucketing implementation is work-efficient

59 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover

60 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover Codes are simple All implementations < 00 LoC

61 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover Codes are simple All implementations < 00 LoC Codes competitive with, or outperform existing implementations

62 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover Codes are simple All implementations < 00 LoC Codes competitive with, or outperform existing implementations First work-efficient k-core algorithm with non-trivial parallelism

63 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover Codes are simple All implementations < 00 LoC Codes competitive with, or outperform existing implementations First work-efficient k-core algorithm with non-trivial parallelism Compute k-cores of largest publicly available graph (~200B edges) in ~ minutes and approximate set-cover in ~ minutes

64-65 Julienne: Interface [diagram: Julienne adds a Bucketing Interface on top of Ligra (vertexSubset, Graph)] Bucketing Interface: (1) Create bucket structure (2) Get the next bucket (vertexSubset) (3) Update buckets of a subset of identifiers

66 Julienne: Interface MakeBuckets(n : int, D : identifier → bucket id, O : bucket order) : buckets Initialize bucket structure

67-68 Julienne: Interface [example: an identifier-to-bucket mapping D, e.g. D(...) = 0, D(...) = ..., D(...) = 4, ...] MakeBuckets(n : int, D : identifier → bucket id, O : bucket order) : buckets Initialize bucket structure

69-72 Julienne: Interface NextBucket() : bucket Extract identifiers in the next non-empty bucket (Order: increasing) [animation: successive calls extract the next non-empty bucket]

73-77 Julienne: Interface UpdateBuckets(k : int, F : int → (identifier, bucket dest)) Update buckets for k identifiers [example: an update list of (identifier, destination) pairs, e.g. [( , ), (7, ), (6, )], moves each listed identifier to its destination bucket]
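
To make the three operations concrete, here is a small sequential C++ sketch of a bucket structure with this interface; the names, types, and signatures are illustrative assumptions rather than the framework's actual API, and it already uses the dynamic-array and lazy-update strategy described on the next slides.

#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using identifier = size_t;
using bucket_id  = size_t;

// Illustrative sequential bucket structure (increasing bucket order).
// Buckets are dynamic arrays; an identifier's authoritative bucket is cur[i],
// so entries left behind by updates are filtered out lazily on extraction.
struct buckets {
  std::vector<std::vector<identifier>> bkts;  // one dynamic array per bucket id
  std::vector<bucket_id> cur;                 // current bucket of each identifier
  bucket_id next = 0;                         // smallest bucket not yet exhausted

  // MakeBuckets(n, D): identifier i starts in bucket D(i).
  buckets(size_t n, std::function<bucket_id(identifier)> D) : cur(n) {
    for (identifier i = 0; i < n; i++) { cur[i] = D(i); insert(i, cur[i]); }
  }

  void insert(identifier i, bucket_id b) {
    if (b >= bkts.size()) bkts.resize(b + 1);
    bkts[b].push_back(i);
  }

  // UpdateBuckets: move the given identifiers to new destination buckets
  // (simplified to an explicit list of pairs instead of a generator F).
  void update_buckets(const std::vector<std::pair<identifier, bucket_id>>& moves) {
    for (auto [i, b] : moves) { cur[i] = b; insert(i, b); }  // old entries go stale
  }

  // NextBucket: the identifiers in the next non-empty bucket (a bucket refilled
  // by UpdateBuckets since its last extraction can be returned again).
  std::pair<bucket_id, std::vector<identifier>> next_bucket() {
    for (bucket_id b = next; b < bkts.size(); b++) {
      std::vector<identifier> out;
      for (identifier i : bkts[b])
        if (cur[i] == b) out.push_back(i);    // drop stale entries
      bkts[b].clear();
      if (!out.empty()) { next = b; return {b, out}; }
    }
    next = bkts.size();
    return {next, {}};
  }
};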

78 Sequential Bucketing Can implement sequential bucketing with: n identifiers, T total buckets, K calls to UpdateBuckets, where the i-th call updates the ids in S_i, in O(n + T + Σ_i |S_i|) work

79 Sequential Bucketing Can implement sequential bucketing with: n identifiers, T total buckets, K calls to UpdateBuckets, where the i-th call updates the ids in S_i, in O(n + T + Σ_i |S_i|) work Implementation: Use dynamic arrays Update lazily

80 Parallel Bucketing Can implement parallel bucketing with: n identifiers, T total buckets, K calls to UpdateBuckets, where the i-th call updates the ids in S_i, L calls to NextBucket, in O(n + T + Σ_i |S_i|) expected work and O((K + L) log n) depth w.h.p.

81 Parallel Bucketing Can implement parallel bucketing with: n identifiers, T total buckets, K calls to UpdateBuckets, where the i-th call updates the ids in S_i, L calls to NextBucket, in O(n + T + Σ_i |S_i|) expected work and O((K + L) log n) depth w.h.p. Implementation: Use dynamic arrays MakeBuckets: call UpdateBuckets. NextBucket: parallel filter

82-86 Parallel Bucketing UpdateBuckets: Use work-efficient semisort [Gu et al. 2015] Given k (key, value) pairs, semisorts in O(k) expected work and O(log k) depth w.h.p. [example: the (identifier, destination bucket) pairs are semisorted so that all ids going to the same bucket become contiguous] Prefix sum to compute #ids going to each bucket Resize buckets and inject all ids in parallel Please see paper for details on practical implementation and optimizations
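
A sequential C++ sketch of this bulk UpdateBuckets step (illustrative, not the paper's code): std::stable_sort by destination bucket stands in for the work-efficient semisort, and the per-group counting, single resize, and copy are the pieces the parallel version does with a prefix sum and parallel writes.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using identifier = size_t;
using bucket_id  = size_t;

// Bulk insertion of (identifier, destination bucket) pairs into per-bucket arrays.
void update_buckets_bulk(std::vector<std::vector<identifier>>& bkts,
                         std::vector<std::pair<identifier, bucket_id>> pairs) {
  // 1. Group the pairs by destination bucket (stand-in for the semisort).
  std::stable_sort(pairs.begin(), pairs.end(),
                   [](const auto& a, const auto& b) { return a.second < b.second; });

  // 2. For each group: count its size, grow the destination bucket once, then
  //    copy the group's identifiers in (the "inject" step).
  size_t i = 0;
  while (i < pairs.size()) {
    bucket_id b = pairs[i].second;
    size_t j = i;
    while (j < pairs.size() && pairs[j].second == b) j++;   // group is [i, j)
    if (b >= bkts.size()) bkts.resize(b + 1);
    std::vector<identifier>& dest = bkts[b];
    size_t off = dest.size();
    dest.resize(off + (j - i));                             // one resize per bucket
    for (size_t t = i; t < j; t++) dest[off + (t - i)] = pairs[t].first;
    i = j;
  }
}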

87-93 Example: k-core and coreness k-core: maximal connected subgraph of G s.t. all vertices have degree ≥ k coreness(v): the largest k-core that v participates in [figure: an example graph with nested k-cores and a highlighted vertex a, whose coreness is given by the innermost core containing it] Can efficiently compute k-cores after computing coreness

94 k-core and Coreness Sequential Peeling: Bucket sort vertices by degree Remove the minimum degree vertex, set its core number Update the buckets of its neighbors

95 k-core and Coreness Sequential Peeling: Bucket sort vertices by degree Remove the minimum degree vertex, set its core number Update the buckets of its neighbors Each vertex and edge is processed exactly once: W = O(|E| + |V|)
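
A minimal sequential C++ sketch of this peeling loop (illustrative, not the paper's code): vertices sit in buckets indexed by their remaining degree, buckets are scanned in increasing order, and stale entries left behind by lazy re-bucketing are skipped.

#include <cstddef>
#include <vector>

// Sequential peeling: repeatedly remove a vertex of minimum remaining degree,
// record its core number, and decrement its unremoved neighbors' degrees.
std::vector<size_t> coreness_seq(const std::vector<std::vector<size_t>>& G) {
  size_t n = G.size();
  std::vector<size_t> deg(n), core(n, 0);
  std::vector<bool> removed(n, false);
  std::vector<std::vector<size_t>> bkts;         // bucket d holds vertices of degree d
  for (size_t v = 0; v < n; v++) {
    deg[v] = G[v].size();
    if (deg[v] >= bkts.size()) bkts.resize(deg[v] + 1);
    bkts[deg[v]].push_back(v);
  }
  for (size_t d = 0; d < bkts.size(); d++) {
    for (size_t i = 0; i < bkts[d].size(); i++) {
      size_t v = bkts[d][i];
      if (removed[v] || deg[v] != d) continue;   // stale entry, skip
      core[v] = d;                               // removal degree = core number
      removed[v] = true;
      for (size_t u : G[v]) {
        if (removed[u] || deg[u] <= d) continue; // neighbor already at/below this core
        deg[u]--;                                // neighbor loses one edge
        bkts[deg[u]].push_back(u);               // lazy re-bucketing
      }
    }
  }
  return core;
}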

96 k-core and Coreness Sequential Peeling: Bucket sort vertices by degree Remove the minimum degree vertex, set its core number Update the buckets of its neighbors Each vertex and edge is processed exactly once: W = O(|E| + |V|) Existing parallel algorithms: Scan all remaining vertices when computing each core

97 k-core and Coreness Sequential Peeling: Bucket sort vertices by degree Remove the minimum degree vertex, set its core number Update the buckets of its neighbors Each vertex and edge is processed exactly once: W = O(|E| + |V|) Existing parallel algorithms: Scan all remaining vertices when computing each core ρ = number of peeling steps done by the parallel algorithm W = O(|E| + ρ|V|) D = O(ρ log |V|)

98-111 Work-efficient Peeling Insert vertices in bucket structure by degree While not all vertices have been processed yet: 1. Extract the next bucket, set core numbers 2. Sum edges removed from each neighbor of this frontier 3. Compute the new buckets for the neighbors 4. Update the bucket structure with the (neighbor, bucket) pairs [animation: peeling the example graph bucket by bucket, with per-neighbor counts of removed edges and new bucket assignments]
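
Expressed against the bucketing interface, the peeling loop looks roughly as follows. This sketch reuses the illustrative buckets structure (and its identifier/bucket_id types) from the sketch after slides 73-77, so it is sequential; in Julienne the frontier and neighbor loops run in parallel and the per-neighbor updates are aggregated into a single UpdateBuckets call.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Work-efficient peeling on top of the sketched bucket structure (not the
// framework's real code): a vertex's bucket is its remaining degree, clamped
// below by the current core number k.
std::vector<size_t> coreness(const std::vector<std::vector<size_t>>& G) {
  size_t n = G.size(), finished = 0, k = 0;
  std::vector<size_t> deg(n), core(n, 0);
  std::vector<bool> peeled(n, false);
  for (size_t v = 0; v < n; v++) deg[v] = G[v].size();

  buckets B(n, [&](identifier v) { return (bucket_id)deg[v]; });   // insert by degree
  while (finished < n) {
    // 1. Extract the next bucket and set core numbers (skipping duplicate entries).
    auto [b, items] = B.next_bucket();
    k = b;
    std::vector<identifier> frontier;
    for (identifier v : items)
      if (!peeled[v]) { peeled[v] = true; core[v] = k; finished++; frontier.push_back(v); }

    // 2.-3. Count the edges removed from each neighbor of this frontier and
    //        compute its new bucket, clamped to the current core number k.
    std::vector<std::pair<identifier, bucket_id>> moves;
    for (identifier v : frontier)
      for (size_t u : G[v])
        if (!peeled[u] && deg[u] > k) {
          deg[u]--;
          moves.push_back({u, std::max(deg[u], k)});
        }

    // 4. Update the bucket structure with the (neighbor, bucket) pairs.
    B.update_buckets(moves);
  }
  return core;
}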

112-121 Work-efficient Peeling We process each edge at most once in each direction: # updates = O(|E|) # buckets ≤ |V| # calls to NextBucket = ρ # calls to UpdateBuckets = ρ Therefore the algorithm runs in: O(|E| + |V|) expected work and O(ρ log |V|) depth w.h.p. On the largest graph we test on, ρ = 0, 78 On 72 cores, our code finishes in a few minutes, but the work-inefficient algorithm does not terminate within hours Efficient peeling using Julienne
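
The bounds follow by instantiating the general bucketing result from slides 57 and 80: here n = |V| identifiers, T ≤ |V| buckets, U = O(|E|) updates, and K = L = ρ calls, so the work is O(n + T + U) = O(|V| + |V| + |E|) = O(|E| + |V|) in expectation and the depth is O((K + L) log n) = O(2ρ log |V|) = O(ρ log |V|) w.h.p.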

122 Summary of results
Algorithm              Work              Depth
k-core                 O(|E| + |V|)      O(ρ log |V|)
wbfs                   O(D + |E|)        O(D log |V|)
Delta-stepping [1]     O(w_Δ)            O(d_Δ log |V|)
Approx Set Cover [2]   O(M)              O(log³ M)
ρ : number of rounds of parallel peeling
D : diameter
w_Δ, d_Δ : work and number of rounds of the delta-stepping algorithm
M : sum of sizes of sets
[1] Meyer, Sanders: Δ-stepping: a parallelizable shortest path algorithm
[2] Blelloch, Peng, Tangwongsan: Linear-work greedy parallel approximate set cover and variants
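
Delta-stepping plugs into the same bucket structure; instead of one bucket per distance, tentative distances are grouped into width-Δ ranges, so a bucket covers a range of distances and may be extracted several times as distances inside it improve. A hypothetical helper for that mapping (not the paper's code):

#include <cstdint>

// Delta-stepping bucket mapping: bucket i covers distances [i*delta, (i+1)*delta).
inline uint64_t delta_bucket(uint64_t tentative_dist, uint64_t delta) {
  return tentative_dist / delta;
}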

123 Experiments: k-core [plot: running time (seconds) vs number of threads for Julienne (work-efficient) and Ligra (work-inefficient) on Friendster, |V| = 124M, |E| = 3.6B] Across all inputs: Between 4-4x speedup over sequential peeling Speedups are smaller on small graphs with large ρ -9x faster than work-inefficient implementation

124 Experiments: Delta-stepping [plot: running time (seconds) vs number of threads for Julienne, Galois, GAP, and Ligra (Bellman-Ford) on Friendster, |V| = 124M, |E| = 3.6B] Across all inputs: 8-x self-relative speedup, 7-0x speedup over DIMACS solver .-.7x faster than best existing implementation of Delta-Stepping .8-5.x faster than (work-inefficient) Bellman-Ford

125 Experiments: Hyperlink Graphs Hyperlink graphs extracted from Common Crawl Corpus
Graph     |V|      |E|      |E| (symmetrized)
HL2014    1.7B     64B      124B
HL2012    3.5B     128B     225B
Previous analyses use supercomputers [1] or external memory [2] HL2012-Sym requires ~TB of memory uncompressed [1] Slota et al., 2015, Supercomputing for Web Graph Analytics [2] Zheng et al., 2015, FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

126 Experiments: Hyperlink Graphs [table: running time in seconds on 72 cores with hyperthreading for k-core, wbfs, and Set Cover on HL2014 and HL2012] Able to process in main-memory of a 1TB machine by compressing -4x speedup across applications Compression is crucial Julienne/Ligra codes run without any modifications Can't run other codes on these graphs without significant effort

127 Conclusion Julienne: framework for bucketing-based algorithms k-core Delta-stepping wbfs Parallel Approximate Set Cover

128 Conclusion Julienne: framework for bucketing-based algorithms Codes: Simple (< 00 lines each) Theoretically efficient Good performance in practice Code will be included as part of github.com/jshun/ligra Future work: Trusses, Nucleus Decomposition, Densest Subgraph k-core Delta-stepping wbfs Parallel Approximate Set Cover

129 Thank you! Please feel free to reach out to k-core Delta-stepping wbfs Parallel Approximate Set Cover
