Apache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM

Motivation

Apache Giraph: Inspired by Google's Pregel, but runs on Hadoop. "Think like a vertex." Maximum value vertex example. [figure: two processors exchange vertex values across supersteps over time until every vertex holds the maximum value, 5]
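
The "think like a vertex" maximum value example can be made concrete with a short sketch. This is a minimal, hypothetical version (not code from the talk), assuming the same BasicComputation API that the PageRank slide later in the deck uses.

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// Sketch only: each vertex keeps the largest value seen so far and
// re-broadcasts whenever that value grows; halted vertices wake up on messages.
public class MaxValueComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, LongWritable> {
  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<LongWritable> messages) {
    boolean changed = getSuperstep() == 0;  // announce the initial value once
    for (LongWritable message : messages) {
      if (message.get() > vertex.getValue().get()) {
        vertex.getValue().set(message.get());
        changed = true;
      }
    }
    if (changed) {
      sendMessageToAllEdges(vertex, vertex.getValue());
    }
    vertex.voteToHalt();
  }
}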

Giraph on Hadoop / YARN: Giraph runs on top of MapReduce or YARN. [figure: compatibility stack spanning Hadoop 0.20.x, 0.20.203, 1.x, and 2.0.x]

Page rank in Giraph. Calculate the updated page rank value from neighbors; send your page rank value to neighbors for 30 iterations.

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public class PageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) {
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      vertex.getValue().set((0.15d / getTotalNumVertices()) + 0.85d * sum);
    }
    if (getSuperstep() < 30) {
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();
    }
  }
}

Apache Giraph data flow: three stages. (1) Loading the graph: the master assigns input splits (via the input format) to workers, which load the graph and send vertices to their owning workers. (2) Compute / iterate: each worker computes over its in-memory graph partitions and sends messages, then sends stats to the master, which decides whether to iterate again. (3) Storing the graph: workers write their partitions through the output format. [figure: master, Workers 0 and 1, input splits 0-3, in-memory parts 0-3, and the output format across the three stages]
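
As a rough illustration of how the three stages are wired together, here is a hedged job-setup sketch using Giraph's GiraphConfiguration and GiraphJob classes; the MyVertexInputFormat/MyVertexOutputFormat class names and the worker count are hypothetical placeholders, not from the talk.

import org.apache.giraph.conf.GiraphConfiguration;
import org.apache.giraph.job.GiraphJob;

public class PageRankRunner {
  public static void main(String[] args) throws Exception {
    GiraphConfiguration conf = new GiraphConfiguration();
    conf.setComputationClass(PageRankComputation.class);         // compute / iterate
    conf.setVertexInputFormatClass(MyVertexInputFormat.class);   // loading the graph (hypothetical class)
    conf.setVertexOutputFormatClass(MyVertexOutputFormat.class); // storing the graph (hypothetical class)
    conf.setWorkerConfiguration(2, 2, 100.0f);                   // min/max workers, percent responded
    GiraphJob job = new GiraphJob(conf, "page rank");
    System.exit(job.run(true) ? 0 : -1);
  }
}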

Use case: k-means clustering. Cluster input vectors into k clusters: assign each input vector to the closest centroid, then update centroid locations based on the assignments. [figure: random centroid locations, assignment to centroids, centroid updates for clusters c0, c1, c2]

k-means in Giraph: partitioning the problem. Input vectors become vertices, partitioned across machines; centroids become aggregators, shared data across all machines. Problem solved... right? [figure: clusters c0, c1, c2 split across Worker 0 and Worker 1]
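
To make "centroids as aggregators" concrete, here is a deliberately simplified sketch of the assignment step, assuming one-dimensional input vectors; the aggregator names ("centroid/i", "sum/i", "count/i") and cluster count are hypothetical, and a real job would use vector-valued aggregators.

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// Sketch: each vertex reads the current centroids from aggregators, picks the
// closest one, and aggregates its own value into that centroid's running sum.
public class KMeansAssignComputation extends BasicComputation<
    LongWritable, DoubleWritable, NullWritable, NullWritable> {
  public static final int K = 3;  // hypothetical number of clusters

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
      Iterable<NullWritable> messages) {
    double value = vertex.getValue().get();
    int closest = 0;
    double best = Double.MAX_VALUE;
    for (int i = 0; i < K; i++) {
      DoubleWritable centroid = getAggregatedValue("centroid/" + i);
      double distance = Math.abs(value - centroid.get());
      if (distance < best) {
        best = distance;
        closest = i;
      }
    }
    // Shared data: these partial sums flow to the master through aggregators.
    aggregate("sum/" + closest, new DoubleWritable(value));
    aggregate("count/" + closest, new LongWritable(1));
  }
}

The matching master-side centroid update is sketched after the master computation slide below.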

Problem 1: Massive dimensions. Cluster Facebook members by friendships? That means 1 billion members, i.e. 1 billion dimensions, and k clusters. Each worker sends the master up to 1B dimensions * 2 bytes per dimension (max 5K friends) * k = 2k GB, and the master receives up to 2k * (number of workers) GB. The result: a saturated network link and OOM.
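
For reference, the back-of-envelope numbers above work out as follows; this is a hedged illustration, and the choice of k = 10 clusters and 200 workers is hypothetical.

public class AggregatorTrafficEstimate {
  public static void main(String[] args) {
    long dimensions = 1_000_000_000L; // 1B members = 1B dimensions per centroid
    long bytesPerDim = 2;             // values fit in 2 bytes (max 5,000 friends)
    long k = 10;                      // hypothetical cluster count
    long workers = 200;               // hypothetical worker count
    double perWorkerGB = dimensions * bytesPerDim * k / 1e9;  // 2 * k = 20 GB
    double atMasterGB = perWorkerGB * workers;                // 2 * k * workers = 4,000 GB
    System.out.printf("per worker: %.0f GB, at master: %.0f GB%n", perWorkerGB, atMasterGB);
  }
}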

Sharded aggregators. Before: the master alone handles all aggregators, receiving every worker's partial aggregations and producing the final values. After: aggregators are sharded across workers; each worker computes partial aggregations, owns a subset of the final aggregators, and distributes those final values to the master and the other workers. This shares the aggregator load across workers. Future work: tree-based optimizations (not yet a problem).

Problem 2: Edge cut metric. Clusters should reduce the number of cut edges. Two phases: (1) send your cluster id along all of your out edges; (2) aggregate the number of incoming edges whose cluster id differs from your own. Calculate no more than once an hour?
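
A hedged sketch of those two phases, assuming the vertex value holds the cluster id and a sum aggregator registered under the hypothetical name "cut-edges"; in practice each class would live in its own file.

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// Phase 1: send your cluster id along all out edges.
class StartEdgeCutComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, LongWritable> {
  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<LongWritable> messages) {
    sendMessageToAllEdges(vertex, vertex.getValue());
  }
}

// Phase 2: count incoming cluster ids that differ from your own and aggregate.
// Note that each cut edge is counted once from each endpoint.
class EndEdgeCutComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, LongWritable> {
  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<LongWritable> messages) {
    long cut = 0;
    for (LongWritable clusterId : messages) {
      if (clusterId.get() != vertex.getValue().get()) {
        cut++;
      }
    }
    aggregate("cut-edges", new LongWritable(cut));
  }
}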

Master computation. Serial computation on the master, which communicates with workers via aggregators. Added to Giraph by the Stanford GPS team. [figure: timeline of k-means supersteps on Workers 0 and 1, with the master inserting start-cut and end-cut phases]
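
A sketch of what such a master computation might look like for the k-means loop, assuming Giraph's MasterCompute API and its stock sum/overwrite aggregator classes; the aggregator names and cluster count match the hypothetical ones in the earlier k-means sketch, and centroid initialization is omitted.

import org.apache.giraph.aggregators.DoubleOverwriteAggregator;
import org.apache.giraph.aggregators.DoubleSumAggregator;
import org.apache.giraph.aggregators.LongSumAggregator;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

// Sketch: serial code on the master folds the per-worker partial sums into
// new centroid positions and broadcasts them back through aggregators.
public class KMeansMasterCompute extends DefaultMasterCompute {
  private static final int K = 3;               // hypothetical number of clusters
  private static final int MAX_ITERATIONS = 20; // hypothetical iteration cap

  @Override
  public void initialize() throws InstantiationException, IllegalAccessException {
    for (int i = 0; i < K; i++) {
      registerAggregator("sum/" + i, DoubleSumAggregator.class);
      registerAggregator("count/" + i, LongSumAggregator.class);
      registerPersistentAggregator("centroid/" + i, DoubleOverwriteAggregator.class);
    }
  }

  @Override
  public void compute() {
    for (int i = 0; i < K; i++) {
      LongWritable count = getAggregatedValue("count/" + i);
      if (count.get() > 0) {
        DoubleWritable sum = getAggregatedValue("sum/" + i);
        setAggregatedValue("centroid/" + i, new DoubleWritable(sum.get() / count.get()));
      }
    }
    if (getSuperstep() >= MAX_ITERATIONS) {
      haltComputation();
    }
  }
}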

Problem 3: More phases, more problems. Add a stage to initialize the centroids: add random input vectors to the centroids, and add a few random friends as well. This takes two phases: (1) randomly sample input vertices to add; (2) send messages to a few random neighbors.

Problem 3 (continued). Cannot easily support different message types and combiners per phase, and the vertex compute code gets messy:

if (phase == INITIALIZE_SELF) {
  // Randomly add to centroid
} else if (phase == INITIALIZE_FRIEND) {
  // Add my vector to centroid if a friend selected me
} else if (phase == K_MEANS) {
  // Do k-means
} else if (phase == START_EDGE_CUT) {
  ...
}

Composable computation. Decouple the vertex from the computation: the master sets the computation and combiner classes, making computations reusable and composable.

Computation                          | In message       | Out message      | Combiner
Add random centroid / random friends | Null             | Centroid message | N/A
Add to centroid                      | Centroid message | Null             | N/A
K-means                              | Null             | Null             | N/A
Start edge cut                       | Null             | Cluster          | Cluster combiner
End edge cut                         | Cluster          | Null             | N/A
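
A hedged sketch of how a master might compose phases this way, assuming the MasterCompute.setComputation() hook; it reuses the hypothetical phase classes from the earlier sketches, and the schedule (an edge cut every ten supersteps) is illustrative only.

import org.apache.giraph.master.DefaultMasterCompute;

// Sketch: the master picks which Computation class runs in the next superstep,
// composing k-means supersteps with the two edge-cut phases.
public class PhasedMasterCompute extends DefaultMasterCompute {
  @Override
  public void compute() {
    long step = getSuperstep();
    if (step % 10 == 8) {
      setComputation(StartEdgeCutComputation.class);  // send cluster ids
    } else if (step % 10 == 9) {
      setComputation(EndEdgeCutComputation.class);    // aggregate cut edges
    } else {
      setComputation(KMeansAssignComputation.class);  // regular k-means step
    }
  }
}

A per-phase message combiner (the "Cluster combiner" column in the table) could be swapped the same way from the master, via the setMessageCombiner() hook.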

Composable computation (cont). Balanced label propagation composes as: (1) compute candidates to move to other partitions; (2) probabilistically move vertices; continue if the halting condition is not met (e.g., fewer than n vertices moved?).

Composable computation (cont). Alongside balanced label propagation (previous slide), affinity propagation composes as: (1) calculate and send responsibilities; (2) calculate and send availabilities; (3) update exemplars; continue if the halting condition is not met (e.g., fewer than n vertices changed exemplars?).
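
For the halting conditions mentioned in both pipelines, here is a hedged master-side check, assuming workers aggregate the number of vertices that moved (or changed exemplars) each round into a sum aggregator under the hypothetical name "moved".

import org.apache.giraph.aggregators.LongSumAggregator;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.LongWritable;

public class HaltingMasterCompute extends DefaultMasterCompute {
  private static final long MIN_MOVED = 1000L;  // hypothetical threshold n

  @Override
  public void initialize() throws InstantiationException, IllegalAccessException {
    registerAggregator("moved", LongSumAggregator.class);
  }

  @Override
  public void compute() {
    if (getSuperstep() == 0) {
      return;  // nothing aggregated yet
    }
    LongWritable moved = getAggregatedValue("moved");
    if (moved.get() < MIN_MOVED) {
      haltComputation();  // halting condition met: fewer than n vertices moved
    }
  }
}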

Faster than Hive?

Application                  | Graph size  | CPU time speedup | Elapsed time speedup
Page rank (single iteration) | 400B+ edges | 26x              | 120x
Friends of friends score     | 71B+ edges  | 12.5x            | 48x

Apache Giraph scalability. [charts: scalability of workers (200B edges), seconds (0-500) vs. number of workers (50-300); scalability of edges (50 workers), seconds (0-500) vs. number of edges (1E+09 to 2E+11); each chart compares Giraph against ideal scaling]

Page rank on 200 machines with 1 trillion (1,000,000,000,000) edges <4 minutes / iteration! * Results from 6/30/2013 with one-to-all messaging + request processing improvements

Why balanced partitioning? Random partitioning gives good balance BUT ignores entity affinity. [figure: vertices 0-11 scattered across partitions with no regard for their connections]

Balanced partitioning application. Results from one service: cache hit rate grew from 70% to 85%, and bandwidth was cut in half. [figure: the same vertices regrouped into partitions by affinity]

Balanced label propagation results * Loosely based on Ugander and Backstrom. Balanced label propagation for partitioning massive graphs, WSDM '13

Avoiding out-of-core. Example: mutual friends calculation between neighbors. (1) Send your friends a list of your friends; (2) intersect each received list with your own friend list. At Facebook scale: 1.23B members (as of 1/2014), 200+ average friends (2011 S1), 8-byte ids (longs) = 394 TB of messages; at 100 GB per machine, that is 3,940 machines (not including the graph). [figure: vertices A-E exchanging and intersecting friend lists]
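
A hedged sketch of the two-step mutual friends computation described above; the FriendList message class is a hypothetical helper (not from the talk), and the vertex value is assumed to be a MapWritable holding the per-neighbor mutual friend counts.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;

// Hypothetical message type: the sender's id plus the sender's friend ids.
public class FriendList implements Writable {
  public long sender;
  public long[] friends = new long[0];

  public FriendList() {}

  public FriendList(long sender, long[] friends) {
    this.sender = sender;
    this.friends = friends;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(sender);
    out.writeInt(friends.length);
    for (long f : friends) {
      out.writeLong(f);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    sender = in.readLong();
    friends = new long[in.readInt()];
    for (int i = 0; i < friends.length; i++) {
      friends[i] = in.readLong();
    }
  }
}

class MutualFriendsComputation extends BasicComputation<
    LongWritable, MapWritable, NullWritable, FriendList> {
  @Override
  public void compute(Vertex<LongWritable, MapWritable, NullWritable> vertex,
      Iterable<FriendList> messages) {
    if (getSuperstep() == 0) {
      // Step 1: send your friends a list of your friends.
      long[] friends = new long[vertex.getNumEdges()];
      int i = 0;
      for (Edge<LongWritable, NullWritable> edge : vertex.getEdges()) {
        friends[i++] = edge.getTargetVertexId().get();
      }
      sendMessageToAllEdges(vertex, new FriendList(vertex.getId().get(), friends));
    } else {
      // Step 2: intersect each incoming list with your own friend list and
      // record the mutual friend count per neighbor in the vertex value.
      Set<Long> mine = new HashSet<>();
      for (Edge<LongWritable, NullWritable> edge : vertex.getEdges()) {
        mine.add(edge.getTargetVertexId().get());
      }
      for (FriendList msg : messages) {
        long mutual = 0;
        for (long f : msg.friends) {
          if (mine.contains(f)) {
            mutual++;
          }
        }
        vertex.getValue().put(new LongWritable(msg.sender), new LongWritable(mutual));
      }
      vertex.voteToHalt();
    }
  }
}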

Superstep splitting. Send over only a subset of the source and destination edges per superstep. * Currently manual; future work: automatic! [figure: four sub-supersteps covering each on/off combination of source sets {A, B} and destination sets {A, B}]
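
A hedged sketch of manual superstep splitting applied to the message-heavy send phase of the mutual friends example, reusing the hypothetical FriendList helper; the split factor of 2 (giving the 2 x 2 = 4 sub-supersteps pictured) and the modulo-based subset choice are illustrative assumptions, and the matching receive/intersect phase is omitted.

import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;

// Sketch: only a hash-selected half of the sources send per sub-superstep,
// and only to a hash-selected half of the destinations, so each sub-superstep
// carries roughly a quarter of the messages.
public class SplitMutualFriendsSend extends BasicComputation<
    LongWritable, MapWritable, NullWritable, FriendList> {
  private static final long SPLITS = 2;  // 2 source subsets x 2 destination subsets

  @Override
  public void compute(Vertex<LongWritable, MapWritable, NullWritable> vertex,
      Iterable<FriendList> messages) {
    long step = getSuperstep();            // sub-supersteps 0..3 cover all combinations
    long sourceSubset = (step / SPLITS) % SPLITS;
    long destSubset = step % SPLITS;
    if (vertex.getId().get() % SPLITS != sourceSubset) {
      return;                              // this source is "off" in this sub-superstep
    }
    long[] friends = new long[vertex.getNumEdges()];
    int i = 0;
    for (Edge<LongWritable, NullWritable> edge : vertex.getEdges()) {
      friends[i++] = edge.getTargetVertexId().get();
    }
    FriendList msg = new FriendList(vertex.getId().get(), friends);
    for (Edge<LongWritable, NullWritable> edge : vertex.getEdges()) {
      if (edge.getTargetVertexId().get() % SPLITS == destSubset) {
        sendMessage(edge.getTargetVertexId(), msg);  // only "on" destinations
      }
    }
  }
}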

Debugging with GiraphicJam

Giraph in production. Over 1.5 years in production; over 100 jobs processed a week; 30+ applications in our internal application repository; sample production job: 700B+ edges.

Future work. Investigate alternative computing models: Giraph++ (IBM Research), Giraphx (University at Buffalo, SUNY). Performance. Lower the barrier to entry. Applications.

Our team: Pavan Athivarapu, Avery Ching, Maja Kabiljo, Greg Malewicz, Sambavi Muthukrishnan.