Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing

1 Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing Laxman Dhulipala Joint work with Guy Blelloch and Julian Shun SPAA 2017

2 Giant graph datasets
Graph                 |V|      |E| (symmetrized)
com-Orkut             3M       234M
Twitter               41M      1.46B
Friendster            124M     3.6B
Hyperlink2012-Host    101M     2.04B
Facebook (2011)       721M     68.4B
Hyperlink2014         1.7B     124B
Hyperlink2012         3.5B     225B
Facebook (2017)       > 1B     > 100B
Google (2017)         ??       ??
Legend: publicly available graphs used in our experiments / private graph datasets

3 Traditional approaches One possible way to solve large graph problems: Hand-write MPI/OpenMP/Cilk codes Run on a powerful machine or a large cluster

4 Traditional approaches One possible way to solve large graph problems: Hand-write MPI/OpenMP/Cilk codes Run on a powerful machine or a large cluster Benefits Good performance Can hand-code custom optimizations

5 Traditional approaches One possible way to solve large graph problems: Hand-write MPI/OpenMP/Cilk codes Run on a powerful machine or a large cluster Benefits Good performance Can hand-code custom optimizations Downsides Usually require a lot of code Need lots of expertise to write and understand codes Not everyone has a supercomputer

6 Graph processing frameworks High level goals Simple set of primitives (interface) Implementations easy to write and understand Algorithms can handle very large graphs

7 Graph processing frameworks High level goals Simple set of primitives (interface) Implementations easy to write and understand Algorithms can handle very large graphs Ex: Pregel, GraphLab, Ligra, GraphX, GraphChi

8 Graph processing frameworks High level goals Simple set of primitives (interface) Implementations easy to write and understand Algorithms can handle very large graphs Ex: Pregel, GraphLab, Ligra, GraphX, GraphChi Additional goals: Primitives have theoretical guarantees Support common optimizations under the hood Implementations competitive with non-framework codes

9 Graph processing frameworks High level goals Simple set of primitives (interface) Implementations easy to write and understand Algorithms can handle very large graphs Ex: Pregel, GraphLab, Ligra, GraphX, GraphChi Additional goals: Primitives have theoretical guarantees Support common optimizations under the hood Implementations competitive with non-framework codes Our goals: All of the above on a single affordable shared memory machine

10 An affordable machine

11-12 An affordable machine Dell PowerEdge R930 72 cores (4 x 2.4GHz 18-core E7 v4 Xeon processors) 1TB of main memory Costs less than a mid-range BMW

13 Ligra Shared memory graph processing framework [1] [1] Shun and Blelloch, 2013, Ligra: A Lightweight Graph Processing Framework for Shared Memory [2] Shun, Dhulipala and Blelloch, 2015, Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

14 Ligra Shared memory graph processing framework [1] Benefits Designed to express frontier-based algorithms Primitives and implementations have theoretical guarantees Optimizations (direction-optimizing, compression [2]) Implementations are simple to write and understand Competitive with hand-tuned codes [1] Shun and Blelloch, 2013, Ligra: A Lightweight Graph Processing Framework for Shared Memory [2] Shun, Dhulipala and Blelloch, 2015, Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

15 Ligra Shared memory graph processing framework [1] Benefits Designed to express frontier-based algorithms Primitives and implementations have theoretical guarantees Optimizations (direction-optimizing, compression [2]) Implementations are simple to write and understand Competitive with hand-tuned codes Downsides Some algorithms may not be efficiently implementable [1] Shun and Blelloch, 2013, Ligra: A Lightweight Graph Processing Framework for Shared Memory [2] Shun, Dhulipala and Blelloch, 2015, Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

16 Ligra Shared memory graph processing framework [1] Benefits Designed to express frontier-based algorithms Primitives and implementations have theoretical guarantees Optimizations (direction-optimizing, compression [2]) Implementations are simple to write and understand Competitive with hand-tuned codes Downsides Some algorithms may not be efficiently implementable This work: Made Ligra codes run on the largest publicly available graphs on a single machine [1] Shun and Blelloch, 2013, Ligra: A Lightweight Graph Processing Framework for Shared Memory [2] Shun, Dhulipala and Blelloch, 2015, Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

17 Ligra: Frontier-based algorithms Primitives Frontier data-structure (vertexSubset) Map over vertices in a frontier Map over out-edges of a frontier

18 Ligra: Frontier-based algorithms Primitives Frontier data-structure (vertexSubset) Map over vertices in a frontier Map over out-edges of a frontier Example: Breadth-First Search

19-23 Ligra: Frontier-based algorithms Primitives Frontier data-structure (vertexSubset) Map over vertices in a frontier Map over out-edges of a frontier Example: Breadth-First Search [animation: the BFS frontier expands over rounds 1-4; legend: in frontier, unvisited, visited] Some useful graph algorithms cannot be efficiently implemented in frontier-based frameworks
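
To make the frontier pattern concrete, here is a minimal sequential C++ sketch of the BFS above; the explicit frontier vector plays the role of the vertexSubset, and the two nested loops correspond to the map-over-vertices and map-over-out-edges primitives (in Ligra these run in parallel; the names below are illustrative, not Ligra's API).

#include <vector>

// Frontier-based BFS (illustrative sketch): each round maps over the out-edges
// of the current frontier and builds the next frontier from newly visited vertices.
std::vector<int> bfs_levels(const std::vector<std::vector<int>>& G, int s) {
  std::vector<int> level(G.size(), -1);
  std::vector<int> frontier = {s};
  level[s] = 0;
  int round = 1;
  while (!frontier.empty()) {
    std::vector<int> next;
    for (int u : frontier) {          // map over vertices in the frontier
      for (int v : G[u]) {            // map over their out-edges
        if (level[v] == -1) {
          level[v] = round;
          next.push_back(v);          // v joins the next frontier
        }
      }
    }
    frontier.swap(next);
    round++;
  }
  return level;
}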

24 Example: Weighted Breadth-First Search Given: G = (V, E, w) with positive integer edge weights, s ∈ V Problem: Compute the shortest path distances from s

25 Example: Weighted Breadth-First Search Given: G = (V, E, w) with positive integer edge weights, s ∈ V Problem: Compute the shortest path distances from s Frontier-based: On each step, visit all neighbors whose distance decreased

26-32 Example: Weighted Breadth-First Search [animation: running the frontier-based approach on the example graph from s, round by round; vertices whose distances decrease re-enter the frontier] Not work-efficient!

33 Sequential Weighted Breadth-First Search Idea: Run Dijkstra's algorithm, but use buckets instead of a PQ Represent buckets using dynamic arrays Simple, efficient implementation running in O(D + |E|) work

34-44 Sequential Weighted Breadth-First Search [animation: processing the buckets of the example graph from s in increasing distance order, one vertex at a time] O(D + |E|) work where D is the graph diameter
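
A minimal sequential C++ sketch of this bucketed variant (illustrative, not the paper's implementation): buckets are dynamic arrays indexed by tentative distance, and entries whose distance has since improved are skipped lazily when their bucket is reached.

#include <cstdint>
#include <limits>
#include <vector>

struct Edge { int to; uint32_t w; };   // positive integer edge weights

// Bucketed weighted BFS: O(D + |E|) work beyond the O(|V|) initialization.
std::vector<uint32_t> wbfs(const std::vector<std::vector<Edge>>& G, int s) {
  const uint32_t INF = std::numeric_limits<uint32_t>::max();
  std::vector<uint32_t> dist(G.size(), INF);
  std::vector<std::vector<int>> buckets(1);
  dist[s] = 0;
  buckets[0].push_back(s);
  for (uint32_t d = 0; d < buckets.size(); d++) {     // buckets in increasing distance
    for (size_t i = 0; i < buckets[d].size(); i++) {
      int u = buckets[d][i];
      if (dist[u] != d) continue;                     // stale entry, skip
      for (const Edge& e : G[u]) {
        uint32_t nd = d + e.w;
        if (nd < dist[e.to]) {
          dist[e.to] = nd;                            // relax the edge
          if (nd >= buckets.size()) buckets.resize(nd + 1);
          buckets[nd].push_back(e.to);                // insert into a future bucket
        }
      }
    }
  }
  return dist;
}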

45-46 Bucketing The algorithm uses buckets to organize work for future iterations

47 Bucketing The algorithm uses buckets to organize work for future iterations This algorithm is actually parallelizable In each step: 1. Process all vertices in the next bucket in parallel 2. Update buckets of neighbors in parallel

48 Sequential Weighted Breadth-First Search Sequential: process vertices one by one

49 Parallel Weighted Breadth-First Search (1) Process vertices in the same bucket in parallel

50 Parallel Weighted Breadth-First Search (2) Insert neighbors into buckets in parallel

51 Parallel Weighted Breadth-First Search Resulting algorithm performs: O(D + |E|) work and O(D log |V|) depth (assuming efficient bucketing) (2) Insert neighbors into buckets in parallel

52 Parallel bucketing Bucketing is useful for more than just wbfs k-core (coreness) Delta-Stepping Parallel Approximate Set Cover

53 Parallel bucketing Bucketing is useful for more than just wbfs k-core (coreness) Delta-Stepping Parallel Approximate Set Cover Goals Simplify expressing algorithms using an interface Theoretically efficient, reusable implementation

54 Parallel bucketing Bucketing is useful for more than just wbfs k-core (coreness) Delta-Stepping Parallel Approximate Set Cover Goals Simplify expressing algorithms using an interface Theoretically efficient, reusable implementation Difficulties 1. Multiple vertices insert into the same bucket in parallel 2. Possible to make work-efficient parallel implementations?

55 Results: Julienne Shared memory framework for bucketing-based algorithms

56 Results: Julienne Shared memory framework for bucketing-based algorithms Extend Ligra with an interface for bucketing Theoretical bounds for primitives Fast implementations of primitives

57 Results: Julienne Shared memory framework for bucketing-based algorithms Extend Ligra with an interface for bucketing Theoretical bounds for primitives Fast implementations of primitives Can implement a bucketing algorithm with n vertices, T total buckets, U updates over K calls to UpdateBuckets, and L calls to NextBucket in O(n + T + U) expected work and O((K + L) log n) depth w.h.p.

58 Results: Julienne Shared memory framework for bucketing-based algorithms Extend Ligra with an interface for bucketing Theoretical bounds for primitives Fast implementations of primitives Can implement a bucketing algorithm with n vertices, T total buckets, U updates over K calls to UpdateBuckets, and L calls to NextBucket in O(n + T + U) expected work and O((K + L) log n) depth w.h.p. Bucketing implementation is work-efficient

59 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover

60 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover Codes are simple All implementations < 00 LoC

61 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover Codes are simple All implementations < 00 LoC Codes competitive with, or outperform existing implementations

62 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover Codes are simple All implementations < 00 LoC Codes competitive with, or outperform existing implementations First work-efficient k-core algorithm with non-trivial parallelism

63 Results: Julienne Work-efficient implementations of 4 bucketing-based algorithms: k-core (coreness) Weighted Breadth-First Search Delta-Stepping Parallel Approximate Set Cover Codes are simple All implementations < 00 LoC Codes competitive with, or outperform existing implementations First work-efficient k-core algorithm with non-trivial parallelism Compute k-cores of largest publicly available graph (~200B edges) in ~ minutes and approximate set-cover in ~ minutes

64-65 Julienne: Interface [diagram: Julienne adds a Bucketing Interface on top of Ligra (vertexSubset, Graph)] Bucketing Interface: (1) Create bucket structure (2) Get the next bucket (vertexSubset) (3) Update buckets of a subset of identifiers

66 Julienne: Interface MakeBuckets(n : int, D : identifier → bucket id, O : bucket order) : buckets Initialize bucket structure

67-68 Julienne: Interface [example: an identifier-to-bucket mapping D, e.g. D(...) = 0, D(...) = ..., D(...) = 4, ...] MakeBuckets(n : int, D : identifier → bucket id, O : bucket order) : buckets Initialize bucket structure

69-72 Julienne: Interface NextBucket() : bucket Extract identifiers in the next non-empty bucket (Order: increasing) [animation: successive calls extract the next non-empty bucket]

73-77 Julienne: Interface UpdateBuckets(k : int, F : int → (identifier, bucket dest)) Update buckets for k identifiers [example: an update list of (identifier, destination) pairs, e.g. [( , ), (7, ), (6, )], moves each listed identifier to its destination bucket]
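
To make the three operations concrete, here is a small sequential C++ sketch of a bucket structure with this interface; the names, types, and signatures are illustrative assumptions rather than the framework's actual API, and it already uses the dynamic-array and lazy-update strategy described on the next slides.

#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using identifier = size_t;
using bucket_id  = size_t;

// Illustrative sequential bucket structure (increasing bucket order).
// Buckets are dynamic arrays; an identifier's authoritative bucket is cur[i],
// so entries left behind by updates are filtered out lazily on extraction.
struct buckets {
  std::vector<std::vector<identifier>> bkts;  // one dynamic array per bucket id
  std::vector<bucket_id> cur;                 // current bucket of each identifier
  bucket_id next = 0;                         // smallest bucket not yet exhausted

  // MakeBuckets(n, D): identifier i starts in bucket D(i).
  buckets(size_t n, std::function<bucket_id(identifier)> D) : cur(n) {
    for (identifier i = 0; i < n; i++) { cur[i] = D(i); insert(i, cur[i]); }
  }

  void insert(identifier i, bucket_id b) {
    if (b >= bkts.size()) bkts.resize(b + 1);
    bkts[b].push_back(i);
  }

  // UpdateBuckets: move the given identifiers to new destination buckets
  // (simplified to an explicit list of pairs instead of a generator F).
  void update_buckets(const std::vector<std::pair<identifier, bucket_id>>& moves) {
    for (auto [i, b] : moves) { cur[i] = b; insert(i, b); }  // old entries go stale
  }

  // NextBucket: the identifiers in the next non-empty bucket (a bucket refilled
  // by UpdateBuckets since its last extraction can be returned again).
  std::pair<bucket_id, std::vector<identifier>> next_bucket() {
    for (bucket_id b = next; b < bkts.size(); b++) {
      std::vector<identifier> out;
      for (identifier i : bkts[b])
        if (cur[i] == b) out.push_back(i);    // drop stale entries
      bkts[b].clear();
      if (!out.empty()) { next = b; return {b, out}; }
    }
    next = bkts.size();
    return {next, {}};
  }
};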

78 Sequential Bucketing Can implement sequential bucketing with: n identifiers, T total buckets, K calls to UpdateBuckets, where the i-th call updates the ids in S_i, in O(n + T + Σ_i |S_i|) work

79 Sequential Bucketing Can implement sequential bucketing with: n identifiers, T total buckets, K calls to UpdateBuckets, where the i-th call updates the ids in S_i, in O(n + T + Σ_i |S_i|) work Implementation: Use dynamic arrays Update lazily

80 Parallel Bucketing Can implement parallel bucketing with: n identifiers, T total buckets, K calls to UpdateBuckets, where the i-th call updates the ids in S_i, L calls to NextBucket, in O(n + T + Σ_i |S_i|) expected work and O((K + L) log n) depth w.h.p.

81 Parallel Bucketing Can implement parallel bucketing with: n identifiers, T total buckets, K calls to UpdateBuckets, where the i-th call updates the ids in S_i, L calls to NextBucket, in O(n + T + Σ_i |S_i|) expected work and O((K + L) log n) depth w.h.p. Implementation: Use dynamic arrays MakeBuckets: call UpdateBuckets. NextBucket: parallel filter

82-86 Parallel Bucketing UpdateBuckets: Use work-efficient semisort [Gu et al. 2015] Given k (key, value) pairs, semisorts in O(k) expected work and O(log k) depth w.h.p. [example: the (identifier, destination bucket) pairs are semisorted so that all ids going to the same bucket become contiguous] Prefix sum to compute #ids going to each bucket Resize buckets and inject all ids in parallel Please see paper for details on practical implementation and optimizations
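
A sequential C++ sketch of this bulk UpdateBuckets step (illustrative, not the paper's code): std::stable_sort by destination bucket stands in for the work-efficient semisort, and the per-group counting, single resize, and copy are the pieces the parallel version does with a prefix sum and parallel writes.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using identifier = size_t;
using bucket_id  = size_t;

// Bulk insertion of (identifier, destination bucket) pairs into per-bucket arrays.
void update_buckets_bulk(std::vector<std::vector<identifier>>& bkts,
                         std::vector<std::pair<identifier, bucket_id>> pairs) {
  // 1. Group the pairs by destination bucket (stand-in for the semisort).
  std::stable_sort(pairs.begin(), pairs.end(),
                   [](const auto& a, const auto& b) { return a.second < b.second; });

  // 2. For each group: count its size, grow the destination bucket once, then
  //    copy the group's identifiers in (the "inject" step).
  size_t i = 0;
  while (i < pairs.size()) {
    bucket_id b = pairs[i].second;
    size_t j = i;
    while (j < pairs.size() && pairs[j].second == b) j++;   // group is [i, j)
    if (b >= bkts.size()) bkts.resize(b + 1);
    std::vector<identifier>& dest = bkts[b];
    size_t off = dest.size();
    dest.resize(off + (j - i));                             // one resize per bucket
    for (size_t t = i; t < j; t++) dest[off + (t - i)] = pairs[t].first;
    i = j;
  }
}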

87-93 Example: k-core and coreness k-core: maximal connected subgraph of G s.t. all vertices have degree ≥ k coreness(v): the largest k-core that v participates in [figure: an example graph with nested k-cores and a highlighted vertex a, whose coreness is given by the innermost core containing it] Can efficiently compute k-cores after computing coreness

94 k-core and Coreness Sequential Peeling: Bucket sort vertices by degree Remove the minimum degree vertex, set its core number Update the buckets of its neighbors

95 k-core and Coreness Sequential Peeling: Bucket sort vertices by degree Remove the minimum degree vertex, set its core number Update the buckets of its neighbors Each vertex and edge is processed exactly once: W = O(|E| + |V|)
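
A minimal sequential C++ sketch of this peeling loop (illustrative, not the paper's code): vertices sit in buckets indexed by their remaining degree, buckets are scanned in increasing order, and stale entries left behind by lazy re-bucketing are skipped.

#include <cstddef>
#include <vector>

// Sequential peeling: repeatedly remove a vertex of minimum remaining degree,
// record its core number, and decrement its unremoved neighbors' degrees.
std::vector<size_t> coreness_seq(const std::vector<std::vector<size_t>>& G) {
  size_t n = G.size();
  std::vector<size_t> deg(n), core(n, 0);
  std::vector<bool> removed(n, false);
  std::vector<std::vector<size_t>> bkts;         // bucket d holds vertices of degree d
  for (size_t v = 0; v < n; v++) {
    deg[v] = G[v].size();
    if (deg[v] >= bkts.size()) bkts.resize(deg[v] + 1);
    bkts[deg[v]].push_back(v);
  }
  for (size_t d = 0; d < bkts.size(); d++) {
    for (size_t i = 0; i < bkts[d].size(); i++) {
      size_t v = bkts[d][i];
      if (removed[v] || deg[v] != d) continue;   // stale entry, skip
      core[v] = d;                               // removal degree = core number
      removed[v] = true;
      for (size_t u : G[v]) {
        if (removed[u] || deg[u] <= d) continue; // neighbor already at/below this core
        deg[u]--;                                // neighbor loses one edge
        bkts[deg[u]].push_back(u);               // lazy re-bucketing
      }
    }
  }
  return core;
}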

96 k-core and Coreness Sequential Peeling: Bucket sort vertices by degree Remove the minimum degree vertex, set its core number Update the buckets of its neighbors Each vertex and edge is processed exactly once: W = O(|E| + |V|) Existing parallel algorithms: Scan all remaining vertices when computing each core

97 k-core and Coreness Sequential Peeling: Bucket sort vertices by degree Remove the minimum degree vertex, set its core number Update the buckets of its neighbors Each vertex and edge is processed exactly once: W = O(|E| + |V|) Existing parallel algorithms: Scan all remaining vertices when computing each core ρ = number of peeling steps done by the parallel algorithm W = O(|E| + ρ|V|) D = O(ρ log |V|)

98-111 Work-efficient Peeling Insert vertices in bucket structure by degree While not all vertices have been processed yet: 1. Extract the next bucket, set core numbers 2. Sum edges removed from each neighbor of this frontier 3. Compute the new buckets for the neighbors 4. Update the bucket structure with the (neighbor, bucket) pairs [animation: peeling the example graph bucket by bucket, with per-neighbor counts of removed edges and new bucket assignments]
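
Expressed against the bucketing interface, the peeling loop looks roughly as follows. This sketch reuses the illustrative buckets structure (and its identifier/bucket_id types) from the sketch after slides 73-77, so it is sequential; in Julienne the frontier and neighbor loops run in parallel and the per-neighbor updates are aggregated into a single UpdateBuckets call.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Work-efficient peeling on top of the sketched bucket structure (not the
// framework's real code): a vertex's bucket is its remaining degree, clamped
// below by the current core number k.
std::vector<size_t> coreness(const std::vector<std::vector<size_t>>& G) {
  size_t n = G.size(), finished = 0, k = 0;
  std::vector<size_t> deg(n), core(n, 0);
  std::vector<bool> peeled(n, false);
  for (size_t v = 0; v < n; v++) deg[v] = G[v].size();

  buckets B(n, [&](identifier v) { return (bucket_id)deg[v]; });   // insert by degree
  while (finished < n) {
    // 1. Extract the next bucket and set core numbers (skipping duplicate entries).
    auto [b, items] = B.next_bucket();
    k = b;
    std::vector<identifier> frontier;
    for (identifier v : items)
      if (!peeled[v]) { peeled[v] = true; core[v] = k; finished++; frontier.push_back(v); }

    // 2.-3. Count the edges removed from each neighbor of this frontier and
    //        compute its new bucket, clamped to the current core number k.
    std::vector<std::pair<identifier, bucket_id>> moves;
    for (identifier v : frontier)
      for (size_t u : G[v])
        if (!peeled[u] && deg[u] > k) {
          deg[u]--;
          moves.push_back({u, std::max(deg[u], k)});
        }

    // 4. Update the bucket structure with the (neighbor, bucket) pairs.
    B.update_buckets(moves);
  }
  return core;
}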

112-121 Work-efficient Peeling We process each edge at most once in each direction: # updates = O(|E|) # buckets ≤ |V| # calls to NextBucket = ρ # calls to UpdateBuckets = ρ Therefore the algorithm runs in: O(|E| + |V|) expected work and O(ρ log |V|) depth w.h.p. On the largest graph we test on, ρ = 0, 78 On 72 cores, our code finishes in a few minutes, but the work-inefficient algorithm does not terminate within hours Efficient peeling using Julienne
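
The bounds follow by instantiating the general bucketing result from slides 57 and 80: here n = |V| identifiers, T ≤ |V| buckets, U = O(|E|) updates, and K = L = ρ calls, so the work is O(n + T + U) = O(|V| + |V| + |E|) = O(|E| + |V|) in expectation and the depth is O((K + L) log n) = O(2ρ log |V|) = O(ρ log |V|) w.h.p.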

122 Summary of results
Algorithm              Work              Depth
k-core                 O(|E| + |V|)      O(ρ log |V|)
wbfs                   O(D + |E|)        O(D log |V|)
Delta-stepping [1]     O(w_Δ)            O(d_Δ log |V|)
Approx Set Cover [2]   O(M)              O(log³ M)
ρ : number of rounds of parallel peeling
D : diameter
w_Δ, d_Δ : work and number of rounds of the delta-stepping algorithm
M : sum of sizes of sets
[1] Meyer, Sanders: Δ-stepping: a parallelizable shortest path algorithm
[2] Blelloch, Peng, Tangwongsan: Linear-work greedy parallel approximate set cover and variants
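
Delta-stepping plugs into the same bucket structure; instead of one bucket per distance, tentative distances are grouped into width-Δ ranges, so a bucket covers a range of distances and may be extracted several times as distances inside it improve. A hypothetical helper for that mapping (not the paper's code):

#include <cstdint>

// Delta-stepping bucket mapping: bucket i covers distances [i*delta, (i+1)*delta).
inline uint64_t delta_bucket(uint64_t tentative_dist, uint64_t delta) {
  return tentative_dist / delta;
}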

123 Experiments: k-core [plot: running time (seconds) vs number of threads for Julienne (work-efficient) and Ligra (work-inefficient) on Friendster, |V| = 124M, |E| = 3.6B] Across all inputs: Between 4-4x speedup over sequential peeling Speedups are smaller on small graphs with large ρ -9x faster than work-inefficient implementation

124 Experiments: Delta-stepping [plot: running time (seconds) vs number of threads for Julienne, Galois, GAP, and Ligra (Bellman-Ford) on Friendster, |V| = 124M, |E| = 3.6B] Across all inputs: 8-x self-relative speedup, 7-0x speedup over DIMACS solver .-.7x faster than best existing implementation of Delta-Stepping .8-5.x faster than (work-inefficient) Bellman-Ford

125 Experiments: Hyperlink Graphs Hyperlink graphs extracted from Common Crawl Corpus
Graph     |V|      |E|      |E| (symmetrized)
HL2014    1.7B     64B      124B
HL2012    3.5B     128B     225B
Previous analyses use supercomputers [1] or external memory [2] HL2012-Sym requires ~TB of memory uncompressed [1] Slota et al., 2015, Supercomputing for Web Graph Analytics [2] Zheng et al., 2015, FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

126 Experiments: Hyperlink Graphs [table: running time in seconds on 72 cores with hyperthreading for k-core, wbfs, and Set Cover on HL2014 and HL2012] Able to process in main-memory of a 1TB machine by compressing -4x speedup across applications Compression is crucial Julienne/Ligra codes run without any modifications Can't run other codes on these graphs without significant effort

127 Conclusion Julienne: framework for bucketing-based algorithms k-core Delta-stepping wbfs Parallel Approximate Set Cover

128 Conclusion Julienne: framework for bucketing-based algorithms Codes: Simple (< 00 lines each) Theoretically efficient Good performance in practice Code will be included as part of github.com/jshun/ligra Future work: Trusses, Nucleus Decomposition, Densest Subgraph k-core Delta-stepping wbfs Parallel Approximate Set Cover

129 Thank you! Please feel free to reach out to k-core Delta-stepping wbfs Parallel Approximate Set Cover
