One Trillion Edges. Graph processing at Facebook scale

1 One Trillion Edges Graph processing at Facebook scale

2 Introduction, Platform improvements, Compute model extensions, Experimental results, Operational experience

3 How Facebook improved Apache Giraph
Facebook's contributions amount to: usability, performance and scalability improvements; extensions to the Pregel model; real-world applications and their performance; shared operational experiences; and contribution of code to the open-source Apache Giraph project.
Excerpt: "While Giraph initially did not scale to our needs at Facebook with over 1.39B users and hundreds of billions of social connections, we improved the platform in a variety of ways to support our workloads and implement our production applications."

4 How large is one trillion?

5 Motivation: the Facebook Social Graph
Facebook has 1.39B active users with more than 400B edges. Adding all social connections brings this to over 1 trillion edges. Open Graph allows application developers to connect objects in their application with real-world actions.
Many traditional graph frameworks fail due to inefficient memory usage. Asynchronous engines have challenges due to unbounded message queues. In practice, graph processing is often built on MapReduce. Graph processing poses different challenges than traditional data analytics.

6 Related systems
MapReduce: model for processing and generating large data sets with a parallel, distributed algorithm. Composed of a Map() procedure and a Reduce() method. Tasks must be written as acyclic programs, making iterative algorithms difficult.
Apache Hadoop: open-source implementation of MapReduce, written in Java.
Apache Hive: data warehouse built on Hadoop for querying and managing large datasets residing in distributed storage.
Pregel: graph processing system built on the Bulk Synchronous Parallel (BSP) model. "Think like a vertex."

7 Apache Giraph
Iterative graph processing system, designed for scalability.
Requirements: iterative computing model; graph-based API; easy access to Facebook data.
Promising platforms: Hive, GraphLab, Giraph.
Reasons for selecting Giraph: written in Java; runs as a MapReduce job; faster debugging.

8 Introduction, Platform improvements, Compute model extensions, Experimental results, Operational experience

9 Platform improvements: flexible vertex/edge input
Giraph is a computing platform, so it needs external storage to read its input from. The original input model required a rigid layout, which caused extra overhead. The new model allows loading from several sources, and workers can read an arbitrary subset of the edges. This reduces or eliminates pre-processing.

10 Platform improvements: parallelization support
Originally, parallelism came from increasing the number of workers, but several workers on the same machine find it hard to share resources. Giraph mitigates this with multithreading: a single worker per machine monopolizes the machine and maximizes resource utilization.

11 Platform improvements: memory optimization
Giraph has a flexible model that allows arbitrary classes for vertex id, vertex value, edge, and message. The potentially large overhead of this flexibility is avoided by serializing edges into byte arrays using native direct methods. The OutEdges interface allows developers to use Java primitives for edge storage.
Memory usage = bytes per edge * number of edges * 1.5
Overhead is reduced from 10x.
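The memory formula on this slide can be tried out with a few lines of Java. This is only a worked-arithmetic sketch: the class and method names are hypothetical, and the 8 bytes per edge (a single long target id with no edge value) is an assumption, not a figure from the slides.

```java
final class EdgeMemoryEstimate {
    // Factor taken from the slide: bytes per edge * number of edges * 1.5.
    static final double OVERHEAD_FACTOR = 1.5;

    /** Estimated bytes needed to hold the edges, following the slide's formula. */
    static double estimateBytes(long bytesPerEdge, long edgeCount) {
        return bytesPerEdge * (double) edgeCount * OVERHEAD_FACTOR;
    }

    public static void main(String[] args) {
        long bytesPerEdge = 8L;              // assumption: one long target id, no edge value
        long edgeCount = 1_000_000_000_000L; // one trillion edges
        double bytes = estimateBytes(bytesPerEdge, edgeCount);
        System.out.println("Estimated edge memory in TB: " + bytes / 1e12);
    }
}
```

Under these assumptions the estimate comes to about 12 TB for a trillion edges; a different edge representation changes only the bytesPerEdge input.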

12 Platform improvements: sharded aggregators
ZooKeeper-based aggregators were not scalable because of the znode write constraint (1 MB). Sharded aggregators are now limited only by the total memory on each worker.

13 Platform improvements: sharded aggregators
Each worker is in charge of gathering the values of its aggregators, performing the aggregation, and distributing the final values to the master and the other workers.
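The ownership idea can be sketched as a small in-memory simulation. Everything here is illustrative and not Giraph's implementation: the class name, the rule that picks an owner by hashing the aggregator name, and the use of a sum as the aggregation are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShardedAggregatorSketch {
    /** Picks the worker that owns an aggregator (hypothetical rule: hash of its name). */
    static int ownerOf(String aggregatorName, int workerCount) {
        return Math.floorMod(aggregatorName.hashCode(), workerCount);
    }

    /**
     * partials.get(w) holds worker w's locally computed partial sums, keyed by
     * aggregator name. Each partial value is routed to the owning worker, which
     * combines it. The result maps each owner to the final values it would then
     * distribute to the master and the other workers.
     */
    static Map<Integer, Map<String, Long>> aggregate(List<Map<String, Long>> partials) {
        int workerCount = partials.size();
        Map<Integer, Map<String, Long>> finalsByOwner = new HashMap<>();
        for (Map<String, Long> workerPartials : partials) {
            for (Map.Entry<String, Long> entry : workerPartials.entrySet()) {
                int owner = ownerOf(entry.getKey(), workerCount);
                finalsByOwner
                    .computeIfAbsent(owner, k -> new HashMap<>())
                    .merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return finalsByOwner;
    }
}
```

Because every aggregator has exactly one owner, no single node has to hold or write all aggregated values, which is the property the slide attributes to sharding.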

14 Introduction, Platform improvements, Compute model extensions (CME), Experimental results, Operational experience

15 CME: worker phases
Four worker phase methods are added: preSuperstep(), postSuperstep(), preApplication(), postApplication().
For example, preApplication() can be used to initialize the centroid locations, and preSuperstep() can be used to calculate the new centroid locations.
Worker phases add functionality with reduced overhead and easier implementation, but require special consideration for application-specific techniques.

16 CME: master computation
Master computation was added to perform centralized computation prior to each superstep. The master computation calculates when to switch tasks and communicates the decision via aggregators. Workers only need to check the aggregator, rather than perform the check themselves.

17 CME: composable computation
Decouples the computation from the vertex. Different types of computation can be used for the current superstep. The master can choose the computation at any time, keeping the data in memory. There are two message types: M1 (incoming) and M2 (outgoing).

    // Before: the computation is tied to the vertex class.
    public abstract class Vertex<I, V, E, M> {
        public abstract void compute(Iterable<M> messages);
    }

    // After: the computation is a separate interface with
    // incoming (M1) and outgoing (M2) message types.
    public interface Computation<I, V, E, M1, M2> {
        void compute(Vertex<I, V, E> vertex, Iterable<M1> messages);
    }

    // The master selects the computation, combiner and message types
    // (signatures only, as listed on the slide).
    public abstract class MasterCompute {
        public abstract void compute();
        public void setComputation(Class<? extends Computation> computation);
        public void setMessageCombiner(Class<? extends MessageCombiner> combiner);
        public void setIncomingMessage(Class<? extends Writable> incomingMessage);
        public void setOutgoingMessage(Class<? extends Writable> outgoingMessage);
    }

18 CME: superstep splitting
Used when messaging patterns can exceed the available memory on the destination vertex. In each iteration, send only a fragment of the messages, determined by a hash function on the destination vertex id, and do a partial computation that updates the state. The same superstep is run for a fixed number of iterations.
Limitations: the update must be commutative and associative, and no single message can overflow the memory buffer of a single vertex.
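A minimal single-process sketch of the idea follows, assuming a sum as the commutative and associative update. The class, the Message type and the modulo-of-hash fragment rule are hypothetical choices for illustration, not Giraph code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SuperstepSplittingSketch {
    /** A message addressed to one destination vertex (hypothetical type). */
    static final class Message {
        final long destinationId;
        final long value;

        Message(long destinationId, long value) {
            this.destinationId = destinationId;
            this.value = value;
        }
    }

    /** True if the destination falls into the given fragment (assumed rule: hash modulo count). */
    static boolean belongsToFragment(long destinationId, int fragment, int fragmentCount) {
        return Math.floorMod(Long.hashCode(destinationId), fragmentCount) == fragment;
    }

    /**
     * Runs one logical superstep as fragmentCount sub-iterations. Each sub-iteration
     * delivers only the messages of one fragment and folds them into the destination
     * state with a sum, so at most one fragment is in flight at a time.
     */
    static Map<Long, Long> runSplitSuperstep(List<Message> outgoing, int fragmentCount) {
        Map<Long, Long> stateByVertex = new HashMap<>();
        for (int fragment = 0; fragment < fragmentCount; fragment++) {
            for (Message message : outgoing) {
                if (belongsToFragment(message.destinationId, fragment, fragmentCount)) {
                    stateByVertex.merge(message.destinationId, message.value, Long::sum);
                }
            }
        }
        return stateByVertex;
    }
}
```

Because the update is commutative and associative, the final state does not depend on how many fragments are used; only the peak number of buffered messages does.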

19 Introduction, Platform improvements, Compute model extensions, Experimental results, Operational experience

20 Experimental results: PageRank PageRank algorithm
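For readers who want the algorithm spelled out, here is a small vertex-centric sketch of unweighted PageRank in plain Java. It is not the Giraph implementation used in the experiments: the class name, the array-based graph layout and the damping factor passed by the caller are assumptions, and vertices without out-edges simply lose their rank mass in this simplified version.

```java
import java.util.Arrays;

final class PageRankSketch {
    /**
     * outEdges[v] lists the target vertices of v. Each loop iteration plays the role
     * of one superstep: every vertex sends rank / outDegree along its edges, then
     * every vertex updates its rank from the sum of the messages it received.
     */
    static double[] pageRank(int[][] outEdges, int supersteps, double damping) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int step = 0; step < supersteps; step++) {
            double[] incoming = new double[n];
            // "Send" phase: distribute each vertex's rank evenly over its out-edges.
            for (int v = 0; v < n; v++) {
                int degree = outEdges[v].length;
                for (int target : outEdges[v]) {
                    incoming[target] += rank[v] / degree;
                }
            }
            // "Compute" phase: combine the received contributions with the damping term.
            for (int v = 0; v < n; v++) {
                rank[v] = (1.0 - damping) / n + damping * incoming[v];
            }
        }
        return rank;
    }
}
```

A call such as `PageRankSketch.pageRank(new int[][] {{1}, {2}, {0}}, 10, 0.85)` runs ten supersteps on a three-vertex cycle, where every vertex keeps the same rank by symmetry.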

21 Experimental results
Large speedups vs. Hive (5x-100x for CPU time and elapsed time).
Applications compared: label propagation; friends-of-friends score.
Hive queries: double join queries.
Trillion-edge PageRank: unweighted PageRank on 1.39B users with more than 1 trillion social connections ran in less than 3 minutes per iteration with only 200 machines.

22 Introduction, Platform improvements, Compute model extensions, Experimental results, Operational experience

23 Operational experience
Experiences after running Giraph at Facebook.
Graph preparation: Facebook warehouse data is stored in Hive tables and read with HiveIO.
Scheduling: the method for scheduling was changed.
Error handling: checkpointing is turned off; failures are handled by restarts.
Production application workflow: write the application and unit test it; run the application on a test dataset; run the application at scale; deploy to production.

26 Further reading The original Google system: Pregel
