King Abdullah University of Science and Technology. CS348: Cloud Computing. Large-Scale Graph Processing


1 King Abdullah University of Science and Technology
CS348: Cloud Computing
Large-Scale Graph Processing
Zuhair Khayyat, 10/March/2013

2 The Importance of Graphs
A graph is a mathematical structure that represents pairwise relations between entities or objects. Examples include physical communication networks, web page links, social interaction graphs, and protein-to-protein interactions. Graphs are used to abstract application-specific features into a generic problem, which makes Graph Algorithms *

3 Graph Algorithm Characteristics*
Data-driven computations: the computation in a graph algorithm depends on the structure of the graph, so the algorithm's behavior is hard to predict.
Unstructured problems: different graph distributions require distinct load-balancing techniques.
Poor data locality.
High data-access-to-computation ratio: runtime can be dominated by waiting for memory fetches.
*Lumsdaine et al., Challenges in Parallel Graph Processing

4 Challenges in Graph Processing
Graphs grow fast. A single computer either cannot fit a large graph into memory, or can fit it only at a huge cost.
Custom implementations of a single graph algorithm take time and effort, and they cannot be reused for other algorithms.
Scientific parallel applications (e.g., parallel PDE solvers) cannot fully adapt to the computational requirements of graph algorithms*.
Fault tolerance is required to support large-scale processing.
*Lumsdaine et al., Challenges in Parallel Graph Processing

5 Why Cloud in Graph Processing
It is easy to scale up and down: you provision machines according to the size of your graph.
It is cheaper than buying a large physical cluster.
Graph processing can be offered in the cloud as Software as a Service to support online social networks.

6 Large-Scale Graph Processing
Several systems try to solve the problem of processing large graphs in parallel:
MapReduce (automatic task scheduling, distributed disk-based computation): Pegasus, X-Rime.
Pregel (Bulk Synchronous Parallel graph processing): Giraph, GPS, Mizan.
GraphLab (asynchronous parallel graph processing).

7 Pregel* Graph Processing
Pregel runs a series of synchronized iterations called supersteps. It is based on the Bulk Synchronous Parallel (BSP) computing model. Each superstep consists of concurrent computation, communication, and a synchronization barrier.
Computation is vertex-centric. The user's compute() function is applied individually to each vertex. Within compute(), a vertex can receive the messages sent to it in the previous superstep and send messages that other vertices will receive in the next superstep.
*Malewicz et al., Pregel: A System for Large-Scale Graph Processing
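The superstep structure above can be illustrated with a small single-process simulation. This is a sketch, not Pregel's API. The names `Graph`, `SendFn`, `ComputeFn` and `runSupersteps` are hypothetical. Vertices run one after another rather than in parallel, and vote-to-halt is left out. The sketch does show the BSP contract: a message sent during superstep s reaches its receiver only in superstep s + 1, after the barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Out-neighbors of each vertex, indexed by vertex id.
using Graph = std::vector<std::vector<int>>;
// Callback a vertex uses to send a message (destination id, payload).
using SendFn = std::function<void(int, double)>;
// User-supplied vertex function: (vertex id, superstep, incoming messages,
// mutable vertex value, send callback). Hypothetical signature.
using ComputeFn = std::function<void(int, int, const std::vector<double>&,
                                     double&, const SendFn&)>;

// Runs a fixed number of supersteps. Messages sent in superstep s are buffered
// in `next` and become visible to compute() only in superstep s + 1.
std::vector<double> runSupersteps(const Graph& g, std::vector<double> values,
                                  int numSupersteps, const ComputeFn& compute) {
  assert(values.size() == g.size());
  const std::size_t n = g.size();
  std::vector<std::vector<double>> current(n), next(n);
  const SendFn send = [&next](int dst, double msg) { next[dst].push_back(msg); };
  for (int s = 0; s < numSupersteps; ++s) {
    // Computation: every vertex runs compute() on the previous superstep's
    // messages. A real system runs this loop in parallel across workers.
    for (std::size_t v = 0; v < n; ++v)
      compute(static_cast<int>(v), s, current[v], values[v], send);
    // Synchronization barrier: deliver the buffered messages, then empty the outbox.
    std::swap(current, next);
    for (auto& box : next) box.clear();
  }
  return values;
}
```

As a usage example, take a compute() that sends 1.0 along every out-edge in superstep 0 and sums its incoming messages in superstep 1. After two supersteps, each vertex holds its in-degree. After only one superstep, every value is still 0, because no message has crossed a barrier yet.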

8-11 Pregel Messaging Example 1 [figure: messages exchanged among vertices A, B, C and D across Supersteps 0 to 3]

12 Vertex's State
All vertices are active in the first superstep. Every active vertex runs the user function compute() in each superstep. A vertex deactivates itself by voting to halt, and it becomes active again if it receives a message. Pregel terminates when all vertices are inactive and no messages are in transit.
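The vertex state rules above can be exercised with a toy run, written as a minimal sketch under stated assumptions. The function `passTokenAlongPath` and the struct `RunStats` are hypothetical and are not part of Pregel. The run passes a token along the path 0 -> 1 -> ... -> n-1. Every vertex votes to halt at the end of each compute() call, so the only thing that wakes a vertex is an incoming message. The run stops when all vertices are halted and no messages are pending.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Result of the toy run (hypothetical type).
struct RunStats {
  int supersteps = 0;            // supersteps executed before termination
  std::vector<int> tokenSeenAt;  // superstep in which each vertex held the token
};

// Passes a token along a path of n vertices, using Pregel-style
// active/halted states and reactivation by messages.
RunStats passTokenAlongPath(int n) {
  assert(n > 0);
  std::vector<bool> active(n, true);         // all vertices start active
  std::vector<int> inbox(n, 0), next(n, 0);  // pending token messages per vertex
  RunStats stats;
  stats.tokenSeenAt.assign(n, -1);
  for (int s = 0;; ++s) {
    // A halted vertex with pending messages becomes active again.
    bool anyActive = false;
    for (int v = 0; v < n; ++v) {
      if (inbox[v] > 0) active[v] = true;
      anyActive = anyActive || active[v];
    }
    // Terminate: every vertex has halted and no messages are in transit.
    if (!anyActive) break;
    for (int v = 0; v < n; ++v) {
      if (!active[v]) continue;
      bool hasToken = (s == 0 && v == 0) || inbox[v] > 0;
      if (hasToken) {
        stats.tokenSeenAt[v] = s;
        if (v + 1 < n) next[v + 1] += 1;  // forward the token
      }
      active[v] = false;  // vote to halt
    }
    stats.supersteps = s + 1;
    inbox.swap(next);  // barrier: deliver messages for the next superstep
    std::fill(next.begin(), next.end(), 0);
  }
  return stats;
}
```

On a path of 4 vertices, the token reaches vertex k in superstep k, and the run ends after 4 supersteps. At that point every vertex has halted and nothing is in transit.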

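13 Pregel Example 2
The graph data is distributed across Worker 1, Worker 2 and Worker 3 using hash-based partitioning. Each superstep then performs computation, communication, and a synchronization barrier. Afterwards the system checks whether it is done: if yes, it terminates; if no, it starts the next superstep.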
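In the Pregel paper, the default partitioning function is hash(ID) mod N, where N is the number of partitions. The sketch below applies that rule. The use of `std::hash` and the helper names `workerFor` and `partitionByHash` are assumptions for illustration; a real system picks its own hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hash-based assignment of a vertex to a worker: hash(id) mod numWorkers.
int workerFor(std::int64_t vertexId, int numWorkers) {
  assert(numWorkers > 0);
  std::size_t h = std::hash<std::int64_t>{}(vertexId);
  return static_cast<int>(h % static_cast<std::size_t>(numWorkers));
}

// Groups vertex ids by the worker that owns them.
std::vector<std::vector<std::int64_t>> partitionByHash(
    const std::vector<std::int64_t>& ids, int numWorkers) {
  std::vector<std::vector<std::int64_t>> parts(numWorkers);
  for (std::int64_t id : ids) parts[workerFor(id, numWorkers)].push_back(id);
  return parts;
}
```

Any worker can compute a vertex's owner from its id alone, with no lookup table. The drawback is that the placement ignores both the graph structure and the runtime load.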

14-17 Pregel Example 3: Max [figure]

18 Pregel Example 4: Max Code

    // Template arguments: vertex value class, edge value class, message class
    class MaxFindVertex : public Vertex<double, void, double> {
     public:
      virtual void Compute(MessageIterator* msgs) {
        double currmax = GetValue();
        // Check messages and keep the largest value received
        for (; !msgs->Done(); msgs->Next()) {
          if (msgs->Value() > currmax)
            currmax = msgs->Value();
        }
        if (superstep() == 0 || currmax > GetValue()) {
          *MutableValue() = currmax;           // Store new max
          SendMessageToAllNeighbors(currmax);  // Send current max
        } else {
          VoteToHalt();                        // Nothing new learned
        }
      }
    };
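The Pregel class above needs a Pregel runtime. To try the idea without one, here is a standalone single-process sketch of the usual max-value scheme. Every vertex sends its value in superstep 0. After that, a vertex forwards a value only when it has learned a larger one, and otherwise it votes to halt. The function `propagateMax` and its adjacency-list input are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Runs max-value propagation with vote-to-halt on an in-memory graph.
// outEdges[v] lists the vertices that v sends to. If superstepsRun is
// non-null, it receives the number of supersteps executed.
std::vector<double> propagateMax(const std::vector<std::vector<int>>& outEdges,
                                 std::vector<double> values, int* superstepsRun) {
  const int n = static_cast<int>(outEdges.size());
  assert(static_cast<int>(values.size()) == n);
  std::vector<std::vector<double>> inbox(n), next(n);
  std::vector<bool> active(n, true);
  int s = 0;
  for (;; ++s) {
    bool anyActive = false;
    for (int v = 0; v < n; ++v) {
      if (!inbox[v].empty()) active[v] = true;  // messages reactivate a vertex
      anyActive = anyActive || active[v];
    }
    if (!anyActive) break;  // all halted, no messages in transit
    for (int v = 0; v < n; ++v) {
      if (!active[v]) continue;
      double currmax = values[v];
      for (double m : inbox[v]) currmax = std::max(currmax, m);
      if (s == 0 || currmax > values[v]) {
        values[v] = currmax;                                 // store new max
        for (int u : outEdges[v]) next[u].push_back(currmax);  // send it on
      } else {
        active[v] = false;  // vote to halt
      }
    }
    std::swap(inbox, next);  // barrier
    for (auto& box : next) box.clear();
  }
  if (superstepsRun != nullptr) *superstepsRun = s;
  return values;
}
```

On a bidirectional chain with values 3, 6, 2, 1, every vertex ends up holding 6, and the run stops after 4 supersteps.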

19 Pregel Message Optimizations
Message combiners: a special function that combines the incoming messages for a vertex before compute() runs. It can run on either the sending or the receiving worker.
Global aggregators: a shared object, accessible to all vertices, that is synchronized at the end of each superstep, e.g., max and min aggregators.
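Both optimizations can be sketched in a few lines, assuming double-valued messages. `combineSum` and `MaxAggregator` are hypothetical names, not Pregel classes. The combiner merges all messages bound for the same destination into one message. This is valid only because summation is commutative and associative. The aggregator collects contributions during a superstep and publishes a single reduced value at the barrier.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Sum combiner: collapses outgoing (destination, value) messages so that each
// destination receives a single message carrying the sum.
std::map<long, double> combineSum(const std::vector<std::pair<long, double>>& outgoing) {
  std::map<long, double> combined;
  for (const auto& msg : outgoing) combined[msg.first] += msg.second;
  return combined;
}

// Max aggregator: vertices contribute values during a superstep; the result
// is read at the barrier and the aggregator is reset for the next superstep.
class MaxAggregator {
 public:
  void aggregate(double value) { current_ = std::max(current_, value); }
  double finishSuperstep() {
    double result = current_;
    current_ = -std::numeric_limits<double>::infinity();
    return result;
  }

 private:
  double current_ = -std::numeric_limits<double>::infinity();
};
```

Running the combiner on the sending worker reduces network traffic, because three messages to two destinations become two messages.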

20 Pregel Guarantees
Scalability: vertices are processed in parallel, and computation overlaps with communication.
Messages are received without duplication, but in no guaranteed order.
Fault tolerance through checkpoints.
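Checkpoint-based recovery can be sketched as follows. The Pregel paper describes saving vertex values, edge values, and incoming messages at the start of a superstep. The sketch simplifies the state to one value per vertex and injects a single failure. `State`, `StepFn` and `runWithCheckpoints` are hypothetical names. On failure, the last checkpoint is restored and the supersteps since then are recomputed. The final result therefore matches a failure-free run.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Simplified state: one value per vertex. A real checkpoint would also save
// edge values and the messages pending for the next superstep.
using State = std::vector<long>;
// Advances the state by one superstep, given the superstep number.
using StepFn = std::function<State(const State&, int)>;

// Checkpoints at the start of every `interval`-th superstep. If failAt >= 0,
// one failure is injected at the start of that superstep: in-memory state is
// lost, the last checkpoint is restored, and execution resumes from there.
State runWithCheckpoints(State state, int numSupersteps, int interval, int failAt,
                         const StepFn& step, int* recomputed) {
  assert(interval > 0);
  State saved = state;
  int savedAt = 0;
  bool failed = false;
  int redone = 0;
  for (int s = 0; s < numSupersteps; ++s) {
    if (s % interval == 0) {
      saved = state;  // take a checkpoint
      savedAt = s;
    }
    if (s == failAt && !failed) {
      failed = true;
      redone += s - savedAt;  // supersteps lost since the checkpoint
      state = saved;          // restore from the checkpoint
      s = savedAt;            // resume at the checkpointed superstep
    }
    state = step(state, s);
  }
  if (recomputed != nullptr) *recomputed = redone;
  return state;
}
```

The checkpoint interval sets the trade-off. Frequent checkpoints cost storage and I/O on every run, while rare ones mean more recomputation after a failure.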

21 Pregel's Limitations
Each Pregel superstep waits for all workers to reach the synchronization barrier. In other words, every superstep waits for the slowest worker to finish.
Smart partitioning can solve the load-balancing problem for static algorithms. However, not all algorithms are static. Some have variable execution behavior, which leads to unbalanced supersteps.
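The cost of waiting at the barrier is simple arithmetic. A superstep lasts as long as its slowest worker, and every other worker sits idle for the difference. The helper `superstepCost` below is a hypothetical illustration, and the timings in the usage note are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Cost of one BSP superstep given each worker's compute time.
struct SuperstepCost {
  double duration;  // wall time of the superstep = slowest worker's time
  double idle;      // total worker time spent waiting at the barrier
};

SuperstepCost superstepCost(const std::vector<double>& workerTimes) {
  assert(!workerTimes.empty());
  double slowest = *std::max_element(workerTimes.begin(), workerTimes.end());
  double busy = std::accumulate(workerTimes.begin(), workerTimes.end(), 0.0);
  return {slowest, slowest * workerTimes.size() - busy};
}
```

With worker times of 2, 3 and 9 seconds, the superstep takes 9 seconds and 13 worker-seconds are spent idle. If the same 14 seconds of work were spread evenly, the superstep would take about 4.7 seconds.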

22 Mizan* Graph Processing
Mizan is an open-source graph processing system, similar to Pregel, developed locally at KAUST. Mizan employs dynamic graph repartitioning to rebalance the execution of supersteps for all types of workloads, without affecting the correctness of graph processing.
*Khayyat et al., Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

23-24 Source of Imbalance in BSP [figure]

25 Types of Graph Algorithms
Stationary graph algorithms have a fixed message distribution across supersteps. In these algorithms, all vertices are either active or inactive at the same time, e.g., PageRank, diameter estimation, and weakly connected components.
Non-stationary graph algorithms have a variable message distribution across supersteps. In these algorithms, vertices can become active or inactive independently of one another.

26 Mizan Architecture
Each Mizan worker contains three main components: the BSP processor, the communicator, and the storage manager. A distributed hash table (DHT) maintains the location of each vertex. The migration planner interacts with the other components during the BSP barrier.
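One way to picture the vertex-location lookup is a single-process stand-in for the directory. Mizan's real DHT is distributed across workers, so this is only an illustration. The class `VertexDirectory` and its methods are hypothetical. A vertex's default location comes from hash partitioning. A migration records an override, so that senders can route messages to the new owner.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical vertex-location directory: default placement by hash,
// overridden for vertices that have been migrated.
class VertexDirectory {
 public:
  explicit VertexDirectory(int numWorkers) : numWorkers_(numWorkers) {
    assert(numWorkers > 0);
  }

  // Returns the worker that currently owns the vertex.
  int locate(std::int64_t vertexId) const {
    auto it = moved_.find(vertexId);
    if (it != moved_.end()) return it->second;
    return static_cast<int>(std::hash<std::int64_t>{}(vertexId) %
                            static_cast<std::size_t>(numWorkers_));
  }

  // Records that the vertex now lives on toWorker.
  void migrate(std::int64_t vertexId, int toWorker) {
    assert(toWorker >= 0 && toWorker < numWorkers_);
    moved_[vertexId] = toWorker;
  }

 private:
  int numWorkers_;
  std::unordered_map<std::int64_t, int> moved_;  // migrated vertices only
};
```

Storing only the migrated vertices keeps the directory small when few vertices move. Unmoved vertices cost nothing to look up beyond a hash.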

27 Mizan's Barriers

28 Dynamic Migration: Statistics
Mizan monitors the following statistics for every vertex: response time, remote outgoing messages, and incoming messages.
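Two of these counters can be derived from one superstep's message log, given a map from vertex to owning worker. The sketch below does that; it is not Mizan's code. `Message`, `VertexStats` and `tallyMessages` are hypothetical names. Response time would come from timing each compute() call, which the sketch does not show.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct Message {
  long src;
  long dst;
};

struct VertexStats {
  long incoming = 0;        // messages received by the vertex
  long remoteOutgoing = 0;  // messages sent to vertices on other workers
  double responseTime = 0;  // compute() time, filled in by a timer (not shown)
};

// Tallies per-vertex message counters from one superstep's message log.
std::unordered_map<long, VertexStats> tallyMessages(
    const std::vector<Message>& msgLog, const std::unordered_map<long, int>& owner) {
  std::unordered_map<long, VertexStats> stats;
  for (const Message& m : msgLog) {
    assert(owner.count(m.src) == 1 && owner.count(m.dst) == 1);
    stats[m.dst].incoming += 1;
    if (owner.at(m.src) != owner.at(m.dst)) stats[m.src].remoteOutgoing += 1;
  }
  return stats;
}
```

A vertex with many remote outgoing messages is a candidate to move closer to its neighbors. A vertex with many incoming messages adds receive-side load to its worker.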

29 Dynamic Migration: Planning
Mizan's migration planner runs after the BSP barrier and creates a new barrier. Planning includes the following steps: identifying the unbalanced workers, and identifying the migration objective (response time, incoming messages, or outgoing messages).
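For the first planning step, one plausible test flags a worker when its superstep time is far above the mean of all workers. The sketch below uses a z-score for this. Both the z-score rule and the threshold value are assumptions for illustration, and `overloadedWorkers` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns the indices of workers whose superstep time has a z-score above
// zThreshold. A perfectly balanced run (zero spread) flags nobody.
std::vector<int> overloadedWorkers(const std::vector<double>& times, double zThreshold) {
  assert(!times.empty());
  double mean = 0.0;
  for (double t : times) mean += t;
  mean /= times.size();
  double var = 0.0;
  for (double t : times) var += (t - mean) * (t - mean);
  double stddev = std::sqrt(var / times.size());
  std::vector<int> result;
  if (stddev == 0.0) return result;
  for (int w = 0; w < static_cast<int>(times.size()); ++w)
    if ((times[w] - mean) / stddev > zThreshold) result.push_back(w);
  return result;
}
```

Consider worker times of 10, 10, 10, 10 and 30 seconds. The mean is 14 and the standard deviation is 8, so the last worker has z = 2 and is flagged at a threshold of 1.5.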

30 Mizan's Migration Work-flow

31 Mizan PageRank Compute() Example

    void compute(messageIterator<mDouble> * messages,
                 userVertexObject<mLong, mDouble, mDouble, mLong> * data,
                 messageManager<mLong, mDouble, mDouble, mLong> * comm) {
        double newVal = 0;
        double c = 0.85;  // damping factor

        // Processing messages: sum the rank shares sent by in-neighbors
        while (messages->hasNext()) {
            double tmp = messages->getNext().getValue();
            newVal = newVal + tmp;
        }
        // vertexTotal (number of vertices) and maxSuperStep are assumed to be
        // defined elsewhere, e.g., as members of the enclosing class
        newVal = newVal * c + (1.0 - c) / ((double) vertexTotal);
        mDouble outVal(newVal / ((double) data->getOutEdgeCount()));

        // Termination condition
        if (data->getCurrentSS() <= maxSuperStep) {
            // Sending to neighbors
            for (int i = 0; i < data->getOutEdgeCount(); i++) {
                comm->sendMessage(data->getOutEdgeID(i), outVal);
            }
        } else {
            data->voteToHalt();
        }
        data->setVertexValue(mDouble(newVal));
    }
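The update rule in this compute() can be checked without Mizan, using the standalone sketch below. The rule is newval = c * (sum of incoming shares) + (1 - c) / N, followed by sending newval / outdegree to each neighbor. The function `pageRank` is hypothetical. It sums incoming messages per vertex, which is what the sum combiner does. It also assumes every vertex has at least one out-edge.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Supersteps 0..maxSuperstep: update the value and send shares to neighbors.
// Superstep maxSuperstep + 1: update the value and halt without sending.
std::vector<double> pageRank(const std::vector<std::vector<int>>& outEdges,
                             int maxSuperstep, double c = 0.85) {
  const int n = static_cast<int>(outEdges.size());
  assert(n > 0);
  std::vector<double> value(n, 0.0), inbox(n, 0.0), next(n, 0.0);
  for (int s = 0; s <= maxSuperstep + 1; ++s) {
    for (int v = 0; v < n; ++v) {
      // Processing messages: damped sum of shares plus the (1 - c) / N term.
      value[v] = c * inbox[v] + (1.0 - c) / n;
      if (s <= maxSuperstep) {
        assert(!outEdges[v].empty());  // sketch assumes no dangling vertices
        double share = value[v] / static_cast<double>(outEdges[v].size());
        for (int u : outEdges[v]) next[u] += share;  // sending to neighbors
      }
    }
    inbox.swap(next);  // barrier: summed messages for the next superstep
    std::fill(next.begin(), next.end(), 0.0);
  }
  return value;
}
```

Because every vertex starts with an empty inbox, the total rank after k updates is 1 - c^k. With no dangling vertices, it therefore approaches 1. On a 3-vertex cycle, every vertex converges to 1/3.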

32 Mizan PageRank Combiner Example

    // Sums all messages destined for dst into a single message
    void combineMessages(mLong dst, messageIterator<mDouble> * messages,
                         messageManager<mLong, mDouble, mDouble, mLong> * mManager) {
        double newVal = 0;
        while (messages->hasNext()) {
            double tmp = messages->getNext().getValue();
            newVal = newVal + tmp;
        }
        mDouble messageOut(newVal);
        mManager->sendMessage(dst, messageOut);
    }

33 Mizan Max Aggregator Example

    class maxAggregator : public IAggregator<mLong> {
    public:
        mLong aggValue;
        maxAggregator() {
            aggValue.setValue(0);  // starting at 0 assumes non-negative inputs
        }
        void aggregate(mLong value) {
            if (value > aggValue) {
                aggValue = value;
            }
        }
        mLong getValue() {
            return aggValue;
        }
        void setValue(mLong value) {
            this->aggValue = value;
        }
        virtual ~maxAggregator() {}
    };

34 Class Assignment
Your assignment is to configure, install, and run Mizan on a single Linux machine by following this tutorial:
By the end of the tutorial, you should be able to execute the following command on your machine:

    mpirun -np 2 ./Mizan-0.1b -u ubuntu -g web-Google.txt -w 2

Deliverables: save the output of the above command and submit it by Wednesday's class. For any questions regarding the tutorial, or to get an account on an Ubuntu machine, contact me at zuhair.khayyat@kaust.edu.sa


More information

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org I ++ Mapreduce: Incremental Mapreduce for

More information

Management and Analysis of Big Graph Data: Current Systems and Open Challenges

Management and Analysis of Big Graph Data: Current Systems and Open Challenges Management and Analysis of Big Graph Data: Current Systems and Open Challenges Martin Junghanns 1, André Petermann 1, Martin Neumann 2 and Erhard Rahm 1 1 Leipzig University, Database Research Group 2

More information

Distributed Graph Storage. Veronika Molnár, UZH

Distributed Graph Storage. Veronika Molnár, UZH Distributed Graph Storage Veronika Molnár, UZH Overview Graphs and Social Networks Criteria for Graph Processing Systems Current Systems Storage Computation Large scale systems Comparison / Best systems

More information

Research challenges in data-intensive computing The Stratosphere Project Apache Flink

Research challenges in data-intensive computing The Stratosphere Project Apache Flink Research challenges in data-intensive computing The Stratosphere Project Apache Flink Seif Haridi KTH/SICS haridi@kth.se e2e-clouds.org Presented by: Seif Haridi May 2014 Research Areas Data-intensive

More information

Scalable Analytics over Distributed Time-series Graphs using GoFFish

Scalable Analytics over Distributed Time-series Graphs using GoFFish Scalable Analytics over Distributed Time-series Graphs using GoFFish Yogesh Simmhan, Charith Wickramaarachchi, Alok Kumbhare, Marc Frincu, Soonil Nagarkar, Santosh Ravi, Cauligi Raghavendra, Viktor Prasanna

More information

Report. X-Stream Edge-centric Graph processing

Report. X-Stream Edge-centric Graph processing Report X-Stream Edge-centric Graph processing Yassin Hassan hassany@student.ethz.ch Abstract. X-Stream is an edge-centric graph processing system, which provides an API for scatter gather algorithms. The

More information

Graph-Processing Systems. (focusing on GraphChi)

Graph-Processing Systems. (focusing on GraphChi) Graph-Processing Systems (focusing on GraphChi) Recall: PageRank in MapReduce (Hadoop) Input: adjacency matrix H D F S (a,[c]) (b,[a]) (c,[a,b]) (c,pr(a) / out (a)), (a,[c]) (a,pr(b) / out (b)), (b,[a])

More information

Parallel Computing: MapReduce Jin, Hai

Parallel Computing: MapReduce Jin, Hai Parallel Computing: MapReduce Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology ! MapReduce is a distributed/parallel computing framework introduced by Google

More information

DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing

DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing SCHOOL OF COMPUTER SCIENCE AND ENGINEERING DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing Bo Suo, Jing Su, Qun Chen, Zhanhuai Li, Wei Pan 2016-08-19 1 ABSTRACT Many systems

More information

Apache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis

Apache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis Apache Giraph for applications in Machine Learning & Recommendation Systems Maria Stylianou @marsty5 Novartis Züri Machine Learning Meetup #5 June 16, 2014 Apache Giraph for applications in Machine Learning

More information

hyperx: scalable hypergraph processing

hyperx: scalable hypergraph processing hyperx: scalable hypergraph processing Jin Huang November 15, 2015 The University of Melbourne overview Research Outline Scalable Hypergraph Processing Problem and Challenge Idea Solution Implementation

More information

Abstract. Keywords: Graph processing; cloud computing; quality of service; resource provisioning. 1. Introduction

Abstract. Keywords: Graph processing; cloud computing; quality of service; resource provisioning. 1. Introduction Quality of Service (QoS)-driven Resource Provisioning for Large-scale Graph Processing in Cloud Computing Environments: Graph Processing-as-a-Service (GPaaS) Safiollah Heidari and Rajkumar Buyya Cloud

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

A Study of Skew in MapReduce Applications

A Study of Skew in MapReduce Applications A Study of Skew in MapReduce Applications YongChul Kwon Magdalena Balazinska, Bill Howe, Jerome Rolia* University of Washington, *HP Labs Motivation MapReduce is great Hides details of distributed execution

More information

Case Study 4: Collaborative Filtering. GraphLab

Case Study 4: Collaborative Filtering. GraphLab Case Study 4: Collaborative Filtering GraphLab Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin March 14 th, 2013 Carlos Guestrin 2013 1 Social Media

More information

An efficient graph data processing system for large-scale social network service applications

An efficient graph data processing system for large-scale social network service applications CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. (214) Published online in Wiley Online Library (wileyonlinelibrary.com)..3393 SPECIAL ISSUE PAPER An efficient

More information

PAGE: A Partition Aware Graph Computation Engine

PAGE: A Partition Aware Graph Computation Engine PAGE: A Graph Computation Engine Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma Department of Computer Science Key Lab of High Confidence Software Technologies (Ministry of Education) Peking University {simon227,

More information

Efficient Graph Computation on Hybrid CPU and GPU Systems

Efficient Graph Computation on Hybrid CPU and GPU Systems Noname manuscript No. (will be inserted by the editor) Efficient Graph Computation on Hybrid CPU and GPU Systems Tao Zhang Jingjie Zhang Wei Shu Min-You Wu Xiaoyao Liang* Received: date / Accepted: date

More information

Parallel Processing of Large Graphs. Tomasz Kajdanowicz, Przemyslaw Kazienko, Wojciech Indyk. Wroclaw University of Technology, Poland

Parallel Processing of Large Graphs. Tomasz Kajdanowicz, Przemyslaw Kazienko, Wojciech Indyk. Wroclaw University of Technology, Poland Parallel Processing of Large Graphs Tomasz Kajdanowicz, Przemyslaw Kazienko, Wojciech Indyk Wroclaw University of Technology, Poland arxiv:1306.0326v1 [cs.dc] 3 Jun 2013 Abstract More and more large data

More information

Graphs! December 1, 2014

Graphs! December 1, 2014 Graphs! December 1, 2014 Announcements This is our last technical lecture! Thank you for all your great ques@ons and interes@ng interac@ons Next lecture is our final review Send ques@ons!!! All exam logis@cs

More information

arxiv: v1 [cs.dc] 21 Jan 2016

arxiv: v1 [cs.dc] 21 Jan 2016 Efficient Processing of Very Large Graphs in a Small Cluster Da Yan 1, Yuzhen Huang 2, James Cheng 3, Huanhuan Wu 4 Department of Computer Science and Engineering, The Chinese University of Hong Kong {

More information

MapReduce Spark. Some slides are adapted from those of Jeff Dean and Matei Zaharia

MapReduce Spark. Some slides are adapted from those of Jeff Dean and Matei Zaharia MapReduce Spark Some slides are adapted from those of Jeff Dean and Matei Zaharia What have we learnt so far? Distributed storage systems consistency semantics protocols for fault tolerance Paxos, Raft,

More information

Memory-Optimized Distributed Graph Processing. through Novel Compression Techniques

Memory-Optimized Distributed Graph Processing. through Novel Compression Techniques Memory-Optimized Distributed Graph Processing through Novel Compression Techniques Katia Papakonstantinopoulou Joint work with Panagiotis Liakos and Alex Delis University of Athens Athens Colloquium in

More information

A Hierarchical Synchronous Parallel Model for Wide-Area Graph Analytics

A Hierarchical Synchronous Parallel Model for Wide-Area Graph Analytics A Hierarchical Synchronous Parallel Model for Wide-Area Graph Analytics Shuhao Liu*, Li Chen, Baochun Li, Aiden Carnegie University of Toronto April 17, 2018 Graph Analytics What is Graph Analytics? 2

More information

Efficient and Simplified Parallel Graph Processing over CPU and MIC

Efficient and Simplified Parallel Graph Processing over CPU and MIC Efficient and Simplified Parallel Graph Processing over CPU and MIC Linchuan Chen Xin Huo Bin Ren Surabhi Jain Gagan Agrawal Department of Computer Science and Engineering The Ohio State University Columbus,

More information

CS 5220: Parallel Graph Algorithms. David Bindel

CS 5220: Parallel Graph Algorithms. David Bindel CS 5220: Parallel Graph Algorithms David Bindel 2017-11-14 1 Graphs Mathematically: G = (V, E) where E V V Convention: V = n and E = m May be directed or undirected May have weights w V : V R or w E :

More information

Praynaa Rawlani. at the. August 2014 Fseovevber 20H L4-RARIES. Department of Electrical Engineering and Computer Science August 22, 2014

Praynaa Rawlani. at the. August 2014 Fseovevber 20H L4-RARIES. Department of Electrical Engineering and Computer Science August 22, 2014 Graph Analytics on Relational Databases by Praynaa Rawlani S.B., Electrical Engineering and Computer Science, MIT (2013) Submitted to the Department of Electrical Engineering and Computer Science in Partial

More information

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage Arijit Khan Nanyang Technological University (NTU), Singapore Gustavo Segovia ETH Zurich, Switzerland Donald Kossmann Microsoft

More information

Batch Processing Basic architecture

Batch Processing Basic architecture Batch Processing Basic architecture in big data systems COS 518: Distributed Systems Lecture 10 Andrew Or, Mike Freedman 2 1 2 64GB RAM 32 cores 64GB RAM 32 cores 64GB RAM 32 cores 64GB RAM 32 cores 3

More information