Pregel: A System for Large-Scale Graph Processing

Pregel: A System for Large-Scale Graph Processing. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Google, Inc. SIGMOD 2010. Presented by Taewhi Lee.

Outline: Introduction; Computation Model; Writing a Pregel Program; System Implementation; Experiments; Conclusion & Future Work

Introduction (1/2) [figure only. Source: SIGMETRICS '09 tutorial "MapReduce: The Programming Model and Practice," by Jerry Zhao]

Introduction (2/2) Many practical computing problems concern large graphs. Large graph data: web graphs, transportation routes, citation relationships, social networks. Graph algorithms: PageRank, shortest paths, connected components, clustering techniques. MapReduce is ill-suited for graph processing: parallel graph algorithms need many iterations, and materializing the intermediate results at every MapReduce iteration harms performance.

Single Source Shortest Path (SSSP) Problem: find the shortest path from a source node to all other nodes. Solution on a single-processor machine: Dijkstra's algorithm.

Example: SSSP Dijkstra's Algorithm [a sequence of figures stepping through Dijkstra's algorithm on the five-node weighted example graph, settling one node per step until every tentative distance is final]
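For reference, the single-machine baseline these figures walk through can be sketched as follows. This is a minimal priority-queue Dijkstra; the adjacency-list encoding is an assumption for illustration, not something from the slides.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbor, edge weight) pairs; returns the shortest
// distance from `source` to every vertex, as in the figures above.
std::vector<int> Dijkstra(
    const std::vector<std::vector<std::pair<int, int>>>& adj, int source) {
  const int kInf = std::numeric_limits<int>::max();
  std::vector<int> dist(adj.size(), kInf);
  // Min-heap of (tentative distance, vertex).
  using Entry = std::pair<int, int>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
  dist[source] = 0;
  pq.push({0, source});
  while (!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if (d > dist[u]) continue;  // stale queue entry, already improved
    for (auto [v, w] : adj[u]) {
      if (dist[u] + w < dist[v]) {  // relax edge (u, v)
        dist[v] = dist[u] + w;
        pq.push({dist[v], v});
      }
    }
  }
  return dist;
}
```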

Single Source Shortest Path (SSSP) Problem: find the shortest path from a source node to all other nodes. Solutions: on a single-processor machine, Dijkstra's algorithm; in MapReduce or Pregel, parallel breadth-first search (BFS).

MapReduce Execution Overview [figure]

Example: SSSP Parallel BFS in MapReduce. The example graph, as an adjacency matrix and an adjacency list:

Adjacency matrix (row = source, column = destination):
     A   B   C   D   E
A        10      5
B             1   2
C                     4
D        3   9        2
E    7        6

Adjacency list:
A: (B, 10), (D, 5)
B: (C, 1), (D, 2)
C: (E, 4)
D: (B, 3), (C, 9), (E, 2)
E: (A, 7), (C, 6)

Example: SSSP Parallel BFS in MapReduce.
Map input: <node ID, <dist, adj list>>
<A, <0, <(B, 10), (D, 5)>>>
<B, <inf, <(C, 1), (D, 2)>>>
<C, <inf, <(E, 4)>>>
<D, <inf, <(B, 3), (C, 9), (E, 2)>>>
<E, <inf, <(A, 7), (C, 6)>>>
Map output: <dest node ID, dist>
<B, 10> <D, 5> <C, inf> <D, inf> <E, inf> <B, inf> <C, inf> <E, inf> <A, inf> <C, inf>
plus each node's own <node ID, <dist, adj list>> record, passed through unchanged. Flushed to local disk!!
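The map step shown above can be sketched like this. It is a minimal illustration of the logic, not an actual MapReduce API; NodeRecord and the emit callback are assumptions made for this sketch.

```cpp
#include <functional>
#include <limits>
#include <string>
#include <utility>
#include <vector>

const int kInf = std::numeric_limits<int>::max();  // "inf" in the slides

// One input record: a node's tentative distance plus its adjacency list.
struct NodeRecord {
  int dist = kInf;
  std::vector<std::pair<std::string, int>> adj;  // (neighbor ID, edge weight)
};

// Emits a (destination node ID, candidate distance) pair for shuffling.
using EmitFn = std::function<void(const std::string&, int)>;

// Map: send a candidate distance to every neighbor. (The slides also
// re-emit the node's own <dist, adj list> record under its own key so
// the reducer can rebuild the graph structure; that pass-through is
// elided here.)
void MapSSSP(const std::string& id, const NodeRecord& node,
             const EmitFn& emit) {
  for (const auto& [neighbor, weight] : node.adj) {
    emit(neighbor, node.dist == kInf ? kInf : node.dist + weight);
  }
}
```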

Example: SSSP Parallel BFS in MapReduce.
Reduce input: <node ID, dist> (each node's passed-through record plus the candidate distances sent to it)
<A, <0, <(B, 10), (D, 5)>>> <A, inf>
<B, <inf, <(C, 1), (D, 2)>>> <B, 10> <B, inf>
<C, <inf, <(E, 4)>>> <C, inf> <C, inf> <C, inf>
<D, <inf, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, inf>
<E, <inf, <(A, 7), (C, 6)>>> <E, inf> <E, inf>

Example: SSSP Parallel BFS in MapReduce.
Reduce output: <node ID, <dist, adj list>> = Map input for next iteration
<A, <0, <(B, 10), (D, 5)>>>
<B, <10, <(C, 1), (D, 2)>>>
<C, <inf, <(E, 4)>>>
<D, <5, <(B, 3), (C, 9), (E, 2)>>>
<E, <inf, <(A, 7), (C, 6)>>>
Flushed to DFS!!
Map output for the next iteration: <dest node ID, dist>
<B, 10> <D, 5> <C, 11> <D, 12> <E, inf> <B, 8> <C, 14> <E, 7> <A, inf> <C, inf>
plus the passed-through records. Flushed to local disk!!
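The matching reduce step takes, for each node, its passed-through record plus all candidate distances and keeps the minimum. A minimal sketch, reusing the hypothetical NodeRecord type from the map-side sketch above:

```cpp
#include <algorithm>
#include <vector>

// Reduce: the node's new tentative distance is the minimum of its old
// distance and every candidate received; the adjacency list from the
// passed-through record is reattached to form the next iteration's input.
NodeRecord ReduceSSSP(const NodeRecord& passed_through,
                      const std::vector<int>& candidates) {
  NodeRecord out = passed_through;
  for (int d : candidates) out.dist = std::min(out.dist, d);
  return out;
}
```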

Example: SSSP Parallel BFS in MapReduce.
Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, inf>
<B, <10, <(C, 1), (D, 2)>>> <B, 10> <B, 8>
<C, <inf, <(E, 4)>>> <C, 11> <C, 14> <C, inf>
<D, <5, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, 12>
<E, <inf, <(A, 7), (C, 6)>>> <E, inf> <E, 7>

Example: SSSP Parallel BFS in MapReduce.
Reduce output: <node ID, <dist, adj list>> = Map input for next iteration
<A, <0, <(B, 10), (D, 5)>>>
<B, <8, <(C, 1), (D, 2)>>>
<C, <11, <(E, 4)>>>
<D, <5, <(B, 3), (C, 9), (E, 2)>>>
<E, <7, <(A, 7), (C, 6)>>>
Flushed to DFS!! (the rest of the iterations omitted)

Computation Model (1/3) [figure: Input, then Supersteps (a sequence of iterations), then Output]

Computation Model (2/3) "Think like a vertex." Inspired by Valiant's Bulk Synchronous Parallel model (1990). Source: http://en.wikipedia.org/wiki/bulk_synchronous_parallel

Computation Model (3/3) Superstep: the vertices compute in parallel. Each vertex: receives messages sent in the previous superstep; executes the same user-defined function; modifies its value or the values of its outgoing edges; sends messages to other vertices (to be received in the next superstep); may mutate the topology of the graph; and votes to halt if it has no further work to do. Termination condition: all vertices are simultaneously inactive and there are no messages in transit. (A driver-loop sketch of these semantics follows.)
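As a rough illustration of these semantics, a toy single-machine superstep loop might look like the following. All names and types here are illustrative assumptions, not the Pregel API; the point is the barrier between supersteps and the halt/reactivation rule.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy vertex state: a value, an active flag, and last superstep's inbox.
struct VertexState {
  int value = 0;
  bool active = true;      // cleared when the vertex votes to halt
  std::vector<int> inbox;  // messages received in the previous superstep
};

using Graph = std::unordered_map<std::string, VertexState>;
using Outbox = std::vector<std::pair<std::string, int>>;  // (dest, message)

// Runs supersteps until every vertex is inactive and no messages are in
// transit -- the termination condition above. `compute` is the user
// function: it may update the vertex, clear `active` (vote to halt), and
// return the messages the vertex sends this superstep.
template <typename ComputeFn>
void RunSupersteps(Graph& graph, ComputeFn compute) {
  for (;;) {
    bool done = true;
    for (auto& [id, v] : graph)
      if (v.active || !v.inbox.empty()) done = false;
    if (done) return;  // all vertices halted, nothing in flight

    Outbox in_transit;
    for (auto& [id, v] : graph) {
      if (!v.active && v.inbox.empty()) continue;  // stays halted
      v.active = true;  // an incoming message reactivates a halted vertex
      Outbox sent = compute(id, v);
      in_transit.insert(in_transit.end(), sent.begin(), sent.end());
      v.inbox.clear();
    }
    // Barrier: all messages are delivered before the next superstep starts.
    for (auto& [dest, msg] : in_transit) graph[dest].inbox.push_back(msg);
  }
}
```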

Example: SSSP Parallel BFS in Pregel [a sequence of figures on the same five-node graph: in each superstep every vertex takes the minimum of the candidate distances it received, updates its value if that minimum is smaller, sends value + edge weight to each neighbor, and votes to halt; after a few supersteps all distances are final (A=0, B=8, C=9, D=5, E=7)]

Differences from MapReduce Graph algorithms can be written as a series of chained MapReduce invocations, but the two models differ. Pregel: keeps vertices and edges on the machine that performs the computation, and uses network transfers only for messages. MapReduce: passes the entire state of the graph from one stage to the next, and needs to coordinate the steps of a chained MapReduce.

C++ API Writing a Pregel program means subclassing the predefined Vertex class and overriding its Compute() method ("Override this!" on the slide), which consumes the incoming messages and produces the outgoing ones. A sketch of the class follows.
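Abridged from the SIGMOD 2010 paper, the Vertex class looks roughly like this (declarations only, using the paper's own type names such as MessageIterator and OutEdgeIterator, which are Pregel-internal):

```cpp
template <typename VertexValue, typename EdgeValue, typename MessageValue>
class Vertex {
 public:
  virtual void Compute(MessageIterator* msgs) = 0;  // "Override this!"

  const string& vertex_id() const;
  int64 superstep() const;

  const VertexValue& GetValue();
  VertexValue* MutableValue();
  OutEdgeIterator GetOutEdgeIterator();

  void SendMessageTo(const string& dest_vertex,
                     const MessageValue& message);
  void VoteToHalt();
};
```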

Example: Vertex Class for SSSP (the code slide is reproduced below).
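The screenshot on this slide did not survive transcription; the paper's SSSP vertex class (Figure 5 of the SIGMOD 2010 paper, which uses the paper's helper IsSource() and constant INF) is:

```cpp
class ShortestPathVertex : public Vertex<int, int, int> {
  void Compute(MessageIterator* msgs) {
    // Minimum distance seen so far: 0 at the source, infinity elsewhere.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    if (mindist < GetValue()) {
      // Found a shorter path: update and propagate to all neighbors.
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();  // reactivated only if a new message arrives
  }
};
```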

System Architecture The Pregel system also uses the master/worker model. Master: maintains the workers, recovers workers from faults, and provides a Web-UI monitoring tool for job progress. Worker: processes its task and communicates with the other workers. Persistent data is stored as files on a distributed storage system (such as GFS or BigTable); temporary data is stored on local disk.

Execution of a Pregel Program 1. Many copies of the program begin executing on a cluster of machines. 2. The master assigns a partition of the input to each worker; each worker loads its vertices and marks them as active. 3. The master instructs each worker to perform a superstep; each worker loops through its active vertices and runs Compute() for each one; messages are sent asynchronously but are delivered before the end of the superstep. This step repeats as long as any vertices are active or any messages are in transit. 4. After the computation halts, the master may instruct each worker to save its portion of the graph.

Fault Tolerance Checkpointing: the master periodically instructs the workers to save the state of their partitions to persistent storage (e.g., vertex values, edge values, incoming messages). Failure detection: regular ping messages. Recovery: the master reassigns graph partitions to the currently available workers, and all workers reload their partition state from the most recent available checkpoint.

Experiments Environment H/W: a cluster of 300 multicore commodity PCs. Data: binary trees and log-normal random graphs (general graphs). Naïve SSSP implementation: all edge weights equal 1; no checkpointing.

Experiments SSSP: 1-billion-vertex binary tree, varying the number of worker tasks [chart]

Experiments SSSP: binary trees, varying graph sizes on 800 worker tasks [chart]

Experiments SSSP: random graphs, varying graph sizes on 800 worker tasks [chart]

Conclusion & Future Work Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms. Future work: relaxing the synchronicity of the model, so as not to wait for slower workers at inter-superstep barriers; assigning vertices to machines to minimize inter-machine communication; supporting dense graphs in which most vertices send messages to most other vertices.

Thank You! Any questions or comments?