Exploring the Hidden Dimension in Graph Processing

Size: px

Start display at page:

Download "Exploring the Hidden Dimension in Graph Processing"

Cordelia Lorena Thornton
5 years ago
Views:

1 Exploring the Hidden Dimension in Graph Processing Mingxing Zhang, Yongwei Wu, Kang Chen, *Xuehai Qian, Xue Li, and Weimin Zheng Tsinghua University *University of Shouthern California

4 k nodes (k = 0,, 40) Biological Network

2 Graph is Ubiquitous Google: > trillion indexed pages Facebook: > 800 million active users Web Graph Social Graph 3 billion RDF triples in 0 Information Network De Bruijn: 4 k nodes (k = 0,, 40) Biological Network Acknowledgement: Arijit Khan, Sameh Elnikety [VLDB '4]

3 Graph Computing is Even More Ubiquitous Traditional Graph Analysis Many graph applications aim at analyzing the property of graphs Examples Shortest Path Triangle Counting PageRank Connecting Component

4 Graph Computing is Even More Ubiquitous Traditional Graph Analysis Many graph applications aim at analyzing the property of graphs MLDM as Graph Computing Many MLDM applications can be modeled and computed as graph problems Examples Shortest Path Triangle Counting PageRank Connecting Component Examples Collaborative Filtering SpMM Neural Network Matrix Factorization

5 D Partitioning D Partitioning: each vertex is assigned to one worker with all its incoming/outcoming edges attached Original Graph Partition into 4 nodes with D partitioning Node 0 Node V 3 V Node Node 3 4 Sub- graphs & 6 Replicas (denoted by dashed circle) V 3 V

D Each Vertex is a Task Vertex Program Vertex

Granularity of task dispatching is vertex,

6 D Each Vertex is a Task Vertex Program Vertex Program Each vertex program can read and modify the neighborhood of a vertex Communication by messages (e.g. Pregel) by shared state (e.g. GraphLab) Granularity of task dispatching is vertex, which is the SAME as the granularity of data dispatching Parallelism by running multiple vertex programs simultaneously Problem SKEW!!!

7 D Partitioning D Partitioning: each edge is assigned to one worker certain heuristics can be used for the assigning Original Graph Partition into 4 nodes with D partitioning Node 0 Node V 3 V V Node Node 3 4 Sub- graphs & 6 Replicas (denoted by dashed circle) V

8 D à Each Edge is a Task PowerGraph Gather User Defined: Gather( Σ + Σ à Σ 3 Y ) à Σ Apply User Defined: Apply( Y, Σ) à Y Scatter User Defined: Scaber( Y ) à Y Σ Y Y Y Y Granularity of task dispatching is edge, which is also the SAME as the granularity of data dispatching

9 Is this the END of task partitioning? YES Vertex can be attached with multiple edges But an edge is connected to only two vertices Indivisible!

10 Is this the END of task partitioning? YES Vertex can be attached with multiple edges But an edge is connected to only two vertices Indivisible! ß NOT true for many problems NO Each vertex/edge can be further partitioned!

11 Example: Collaborative Filtering Matrix View N: # of users M: # of items R R[u,v] Graph View D P * P[u] P 0 P P P u Q 0 Q Q Q v Q T Q[v] D R[u,v] <P[u], Q[v]> R[u,v] Q M P N Collaborative Filtering estimating missing ratings based on a given incomplete set of (user, item) ratings In graph view Each vertex is attached with a feature vector with size D. A typical pattern of modelling MLDM problem as graph problem.

12 3D Partitioning 3D Partitioning: each vertex is split into sub- vertices and a D partitioner is used in each layer Original Graph Partition into 4 nodes with 3D partitioning Node 0,0 Node 0, V Sub- graphs & 3 Replicas in each layer. Node,0 Node, V

13 Motivation We found a NEW dimension, shouldn t we partition it? Observation Ø These vector properties are usually operated by element- wise operators Ø Easy to parallelize v With no communication!

14 Two Kinds of Communications Node 0,0 Node 0, Intra- layer Communication Less sub- graphs Less replicas V Reduced! Node,0 Node, V

15 Two Kinds of Communications Node 0,0 Node 0, Intra- layer Communication Less sub- graphs Less replicas V Reduced! Node,0 Node, Inter- layer Communication V Not exist before Increased!

16 Two Kinds of Communications Node 0,0 Node 0, Intra- layer Communication Less sub- graphs Less replicas V Reduced! Node,0 Node, Inter- layer Communication V Not exist before Increased!

17 New System: CUBE Existing models does not formalize the inter- layer communication CUBE Adopts 3D partitioning Implements a new programming model UPPS Use a matrix backend

18 Data Model of CUBE CUBE The data graph model is also used in CUBE, but each vertex/edge can be attached with two kinds of data. Two classes of data Ø DShare :- - an indivisible property v represented by a single variable. Ø DColle :- - a divisible collection of property vector v stored as a vector of variables v len(dcolle) == collection size SC

19 3D Partitioner CUBE 3D Partitioner (L, P): where L is the number of layers, and P is a D partitioner. Node 0,0 Node,0 S S S S DShare is replicated L S V S S = Node 0, Node, S S V S S S S S DColle is equally partitioned Partitioned with P

20 Programming Model CUBE Update :- - inter- layer communication v v UpdateVertex: read/write all the data of a vertex UpdateEdge: read/write all the data of an edge Push, Pull, Sink :- - intra- layer communication Ø Variations of GAS, for an edge edg that connects (src, dst) v Push: use src and edg to update dst. v Pull: use dst and edg to update src. v Sink: use src and dst to update edg.

21 Matrix Backend CUBE Vertex Frontend ü Easy to program ü Similar to existing works Less efficiency MAP Matrix Backend Hard to program ü High efficiency v Simple indexing v Hilbert order Mapping from vertex frontend to matrix backend can significantly speedup the system; a useful method pioneered by GraphMat.

22 Evaluation CUBE Baseline: PowerGraph and PowerLyra Ø the default partitioner oblivious is used for PowerGraph Ø every possible partitioner is tested for PowerLyra and the best is used Ø P of CUBE is the same as the partitioner used by PowerLyra Platform: 8- node system; connected with Gb Ethernet Dataset Name U V E Best D Partitioner Libimseti 35,359 68,79 7,359,346 Hybrid- cut Last.fm 359,349,067 7,559,530 Bi- cut Netflix 7, ,89 00,480,507 Bi- cut

23 Micro Benchmarks: SpMM SpMM P[u] P[w] D P * Q[v] M Q = R[u,v] R R[w,v] D N Q[v] = R[u,v]*P[u] + R[w,v]*P[w] Execution Time of SpMM Libimseti, Sc = 56 Lastfm, Sc = 56 Libimseti, Sc = # of layers proportional to Communication cost # of replicas # of layers # of subgraphs inversely proportional to negative relation

24 Micro Benchmarks: SumV & SumE Execution Time of SumV Libimseti, Sc = 56 Lastfm, Sc = 56 Libimseti, Sc = # of layers Execution Time of SumE Libimseti, Sc = 56 Lastfm, Sc = 56 Libimseti, Sc = # of layers SumV: summing up each vertex s vector SumE: summing up each edge s vector Communication cost proportional to!"#!, L is # of layers

25 Compare to PowerLyra & PowerGraph CUBE Dataset D Execution Time GD ALS PowerGraph PowerLyra CUBE PowerGraph PowerLyra CUBE Libimseti Lastfm Netflix (4) (64) (8) (64) (4) (64) (8) Failed Failed 30 (64) () (8) () Failed 39 8 (8) ü Up to 4.7x speedup (V.S. PowerLyra) ü Up to 7.3x speedup (V.S. PowerGraph)

Not useful for Netflix. v Can achieve.5x speedup on Netflix if D is set to 048.

26 Piecewise Breakdown CUBE Network Reduction on GD Network Reduction on ALS ü About half of the speedup is from 3D partitioning (up to x) for Lastfm and Libimseti. Not useful for Netflix. v Can achieve.5x speedup on Netflix if D is set to 048. ü Almost all the speedup is from network reduction. v Computation of ALS is dominated by DSYSV, an O(N 3 ) computation kernel.

27 Memory Consumption CUBE Total Memory Consumption # of layers Memory Consumption better than L=, i.e., D only best for time is not always best for memory consumption Total memory consumption for running ALS on 64 workers with D=3, which means that SC=D +D=056 Partitioning Time similar curve as memory consumption

28 Conclusion We found that, for certain graph problems, vertices and edges are not indivisible. We propose CUBE, which Ø adopts 3D partitioning; Ø uses matrix backend; Ø achieves up to 4.7x speedup.

29 Thank You!!

Exploring the Hidden Dimension in Graph Processing

Exploring the Hidden Dimension in Graph Processing Exploring the Hidden imension in Graph Processing Mingxing Zhang, Yongwei Wu, and Kang hen, Tsinghua University; Xuehai Qian, University of Southern alifornia; Xue Li and Weimin Zheng, Tsinghua University