PREGEL: A SYSTEM FOR LARGE-SCALE GRAPH PROCESSING
Grzegorz Malewicz, Matthew Austern, Aart Bik, James Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.)
SIGMOD 2010
Presented by: Xiu Zhang, 2016-10-11
Motivation
Computation Model
System Architecture
Fault Tolerance
Applications
Experiments
MOTIVATION
Motivation
Computation over large graphs is needed in many domains:
- Social media
- Transportation
Motivation
The Web graph: documents = vertices, links = edges
Graph Algorithms
- Pattern matching: search through the entire graph; identify similar components
- Traversals: define a specific start point; iteratively explore the graph
- Global measurements: compute one value for the graph, based on all its vertices or edges
Challenges for Graph Algorithms
- Poor locality of memory access
- Very little computation per vertex, but many iterations (e.g., shortest path)
- Changing degree of parallelism over the course of execution (e.g., connected component analysis)
Possible Solutions
- A custom distributed framework for each algorithm: substantial implementation effort
- Existing distributed computing platforms such as MapReduce: unnecessarily slow, hard to express graph algorithms
- Single-computer graph algorithm libraries: limited scale
- Existing parallel graph systems (e.g., Parallel BGL and CGMgraph): no fault tolerance
Inspired by Valiant's Bulk Synchronous Parallel (BSP) model
Vertex-centric computation
COMPUTATION MODEL
Computation Model (BSP)
[Figure: a BSP superstep consists of local computation, communication, and barrier synchronization]
Source: http://en.wikipedia.org/wiki/bulk_synchronous_parallel
Pregel: Message Passing Model
Vertex:
- A unique identifier
- A modifiable, user-defined value
Edge:
- Source and target vertex identifiers
- A modifiable, user-defined value
Basic Organization
Supersteps (iterations): a user-defined function is invoked for each vertex V in superstep S; it can
- Read messages sent to V in superstep S-1
- Send messages that will be received in superstep S+1
- Modify the state of V and of V's outgoing edges
- Make topology changes: introduce/delete/modify edges and vertices
- Vote to halt if there is no further work to do
State machine for a vertex
[Figure: active/inactive vertex states; an incoming message reactivates an inactive vertex]
Termination condition:
- All vertices are simultaneously inactive
- There are no messages in transit
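The paper expresses this contract as a C++ template class. Below is an abridged sketch of that API (declarations only, iterator types simplified) showing the calls a Compute() implementation relies on.

// Abridged sketch of Pregel's C++ vertex API, adapted from the paper;
// declarations only, with iterator details simplified.
template <typename VertexValue, typename EdgeValue, typename MessageValue>
class Vertex {
 public:
  // Invoked once per active vertex in every superstep S. Reads messages
  // sent in superstep S-1; messages sent here arrive in superstep S+1.
  virtual void Compute(MessageIterator* msgs) = 0;

  const string& vertex_id() const;       // unique identifier
  int64 superstep() const;               // current superstep number
  const VertexValue& GetValue();         // read the vertex value
  VertexValue* MutableValue();           // modify the vertex value
  OutEdgeIterator GetOutEdgeIterator();  // iterate over outgoing edges

  void SendMessageTo(const string& dest_vertex, const MessageValue& message);
  void VoteToHalt();                     // deactivate until a message arrives
};

A user only subclasses Vertex and overrides Compute(); message delivery and superstep synchronization are handled by the framework.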
Example: Single-Source Shortest Path (SSSP)
Find the shortest path from a source node to all other nodes
Example taken from a talk by Taewhi Lee, 2010: http://zhenxiao.com/read/.ppt
Example: SSSP Parallel BFS in Pregel
[Figure sequence, one diagram per superstep; legend: inactive vertex, active vertex, edge weight, message. The source starts at distance 0 and all other vertices at infinity. In each superstep, every vertex that receives messages takes their minimum, updates its value if that is an improvement, sends (its value + edge weight) along each out-edge, and votes to halt; incoming messages reactivate vertices, and the run ends when all vertices are inactive with no messages in transit.]
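Each superstep in this walkthrough corresponds to one Compute() call per active vertex. The sketch below is adapted from the paper's single-source shortest-paths example; INF (a value larger than any path length) and IsSource() are assumed helpers.

// Adapted from the paper's shortest-paths example. Vertex<int, int, int>
// means the vertex value, edge value, and message value are all distances.
class ShortestPathVertex : public Vertex<int, int, int> {
  void Compute(MessageIterator* msgs) {
    // Start from INF everywhere except the source, then fold in candidates.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    if (mindist < GetValue()) {
      *MutableValue() = mindist;
      // Relax all out-edges: each neighbor learns a candidate distance.
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();  // deactivate; a new message will reactivate this vertex
  }
};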
SYSTEM ARCHITECTURE
System Architecture
Pregel uses the master/worker model
Master:
- Coordinates the workers
- Recovers workers from faults
Worker:
- Processes its task
- Communicates with the other workers
Persistent data lives in a distributed storage system; temporary data is stored on local disk
Execution
[Figure sequence: step-by-step execution of a Pregel program, shown over five slides]
FAULT TOLERANCE
Fault Tolerance
Checkpointing:
- The master periodically instructs the workers to save the state of their partitions to persistent storage
- e.g., vertex values, edge values, incoming messages
Failure detection:
- The master uses regular "ping" messages to detect worker failures
Fault Tolerance
Recovery:
- The master reassigns graph partitions to the currently available workers
- All workers reload their partition state from the most recent available checkpoint
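A self-contained toy of the control flow these two slides describe, checkpointing every few supersteps and rolling back on a detected failure; all names and numbers are invented for illustration, this is not the paper's code.

#include <cstdio>

// Hypothetical toy of Pregel-style checkpoint/recovery control flow;
// the interval, failure point, and bound below are illustrative only.
int main() {
  const int kCheckpointInterval = 3;
  int superstep = 0, checkpointed_superstep = 0;
  bool failure_injected = false;
  while (superstep < 8) {
    if (superstep % kCheckpointInterval == 0)
      checkpointed_superstep = superstep;  // workers save partition state
    // ... one superstep of computation would run here ...
    superstep++;
    if (superstep == 5 && !failure_injected) {
      failure_injected = true;  // a ping to some worker times out
      // Recovery: reassign that worker's partitions, then all workers
      // reload the most recent checkpoint and recompute from there.
      std::printf("failure at superstep %d, rolling back to %d\n",
                  superstep, checkpointed_superstep);
      superstep = checkpointed_superstep;
    }
  }
  std::printf("finished\n");
  return 0;
}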
APPLICATIONS
PageRank
The importance of a document depends on the number of references to it and on the importance of the source documents themselves.
A = a given page
T_1 ... T_n = pages that point to page A (citations)
d = damping factor between 0 and 1 (usually kept at 0.85)
C(T) = number of links going out of T
PR(A) = the PageRank of page A

PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \frac{PR(T_2)}{C(T_2)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
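A quick worked instance of the formula, with illustrative numbers (not from the slides): suppose A is cited by T_1 with PR(T_1) = 1, C(T_1) = 2, and by T_2 with PR(T_2) = 1, C(T_2) = 1, and d = 0.85. Then

PR(A) = (1 - 0.85) + 0.85 \left( \frac{1}{2} + \frac{1}{1} \right) = 0.15 + 1.275 = 1.425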
PageRank
[Figure: PageRank illustration, courtesy of Wikipedia]
PageRank: iterative loop till convergence

initialize PageRank of all pages to 1.0;
while (|sum of PageRank of all pages - numPages| > epsilon) {
  for each page Pi in list {
    PageRank(Pi) = (1 - d);
    for each page Pj linking to page Pi {
      PageRank(Pi) += d * (PageRank(Pj) / numOutLinks(Pj));
    }
  }
}
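A minimal runnable C++ version of this loop on a tiny hard-coded three-page graph; the graph, epsilon, and the in-link representation are illustrative choices, not from the slides.

#include <cmath>
#include <cstdio>
#include <vector>

// Minimal serial PageRank matching the slide's loop; the three-page graph
// below is an illustrative choice (every page has at least one out-link).
int main() {
  const double d = 0.85, epsilon = 1e-6;
  // in_links[i] lists the pages that link to page i;
  // out_degree[j] is the number of links going out of page j.
  std::vector<std::vector<int>> in_links = {{1, 2}, {0}, {0, 1}};
  std::vector<int> out_degree = {2, 2, 1};
  const int n = 3;
  std::vector<double> pr(n, 1.0);  // initial PageRank of all pages = 1.0

  // The sum starts exactly at numPages, so use do-while to run at least once.
  double sum;
  do {
    for (int i = 0; i < n; ++i) {
      pr[i] = (1.0 - d);
      for (int j : in_links[i])
        pr[i] += d * (pr[j] / out_degree[j]);
    }
    sum = 0;
    for (double v : pr) sum += v;
  } while (std::fabs(sum - n) > epsilon);  // total rank converges to numPages

  for (int i = 0; i < n; ++i) std::printf("PR(%d) = %f\n", i, pr[i]);
  return 0;
}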
PageRank in Pregel
[Figure: the paper's C++ PageRank vertex implementation]
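The vertex program is only a few lines; the sketch below is adapted from the paper's PageRank example (the fixed 30-superstep cutoff is the paper's simplification; a real run would test convergence instead).

// Adapted from the paper's PageRank example. Vertex<double, void, double>
// means a double vertex value, no edge value, and double messages.
class PageRankVertex : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      // Sum the rank contributions sent by in-neighbors last superstep.
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      // Spread this vertex's rank evenly over its out-edges.
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();  // run a fixed 30 supersteps, then stop
    }
  }
};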
EXPERIMENTS
Experiments (Shortest Paths)
[Chart: 1-billion-vertex binary tree, varying number of worker tasks]
Experiments
[Chart: binary trees, varying graph sizes on 800 worker tasks]
Experiments
[Chart: log-normal random graphs, mean out-degree 127.1 (thus over 127 billion edges in the largest case), varying graph sizes on 800 worker tasks]
Conclusion
- A distributed system for large-scale graph processing
- "Think like a vertex" computation model (intuitive API)
Limitations
- Inefficient if different regions of the graph converge at different speeds
- Each superstep waits for the slowest machine
- Dense graphs generate heavy message traffic
THANK YOU ANY QUESTIONS?