PREGEL: A SYSTEM FOR LARGE-SCALE GRAPH PROCESSING


1 PREGEL: A SYSTEM FOR LARGE-SCALE GRAPH PROCESSING
Grzegorz Malewicz, Matthew Austern, Aart Bik, James Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.)
SIGMOD 2010
Presented by: Xiu Zhang

2 Motivation, Computation Model, System Architecture, Fault Tolerance, Applications, Experiments

3 MOTIVATION

4 Motivation
Computation on large graphs is needed:
- Social media
- Transportation

5 Motivation
Web graph: documents = vertices, links = edges

6 Graph Algorithms
- Pattern matching: search through the entire graph; identify similar components
- Traversals: define a specific start point; iteratively explore the graph
- Global measurements: compute one value for the graph, based on all its vertices or edges

7 Challenges for Graph Algorithms
- Poor locality of memory access
- Very little computation required per vertex, but many iterations
  - Shortest path
- Changing degree of parallelism over the course of execution
  - Connected component analysis

8 Possible Solutions
- Custom distributed framework for each algorithm
- Existing distributed computing platforms
  - MapReduce: unnecessarily slow, hard to implement
- Single-computer graph algorithm libraries
  - Scale limitation
- Existing parallel graph systems
  - Parallel BGL and CGMgraph
  - Fault tolerance is not addressed

9 Inspired by Valiant's Bulk Synchronous Parallel (BSP) model
Vertex-centric computation

10 COMPUTATION MODEL

11 Computation Model(BSP) asynchronization Source: 11

12 : Message Passing Model
Vertex:
- A unique identifier
- A modifiable, user-defined value
Edge:
- Source vertex and target vertex identifiers
- A modifiable, user-defined value

13 Basic Organization
Supersteps (iterations): in each superstep, a user-defined function is invoked for each vertex V. The function can:
- Read messages sent to V in superstep S-1
- Send messages that will be received in superstep S+1
- Modify the state of V and its outgoing edges
- Make topology changes: introduce, delete, or modify edges and vertices
- Vote to halt if there is no further work to do

14 State machine for a vertex
Termination condition:
- All vertices are simultaneously inactive
- There are no messages in transit

15 Example: Single-Source Shortest Path (SSSP)
Find the shortest path from a source node to all target nodes
Example taken from a talk by Taewhi Lee,
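The superstep loop and the vertex state machine described above can be sketched in a few lines of Python. This is a single-process illustration only, not Pregel's implementation: the names `Vertex`, `run_supersteps`, and `max_value`, and the four-vertex ring used as input, are assumptions made for this sketch. Every vertex starts active, may vote to halt, and is reactivated when a message arrives; the loop stops when all vertices are inactive and no messages are in transit.

```python
from collections import defaultdict


class Vertex:
    """A vertex with an id, a modifiable value, and outgoing edges {target_id: edge_value}."""

    def __init__(self, vid, value, edges=None):
        self.id = vid
        self.value = value
        self.edges = dict(edges or {})
        self.active = True  # every vertex starts in the active state

    def vote_to_halt(self):
        self.active = False


def run_supersteps(vertices, compute, max_supersteps=50):
    """Call compute(vertex, messages, send, superstep) once per active vertex per superstep.

    Returns the number of supersteps executed.
    """
    inbox = defaultdict(list)
    for superstep in range(max_supersteps):
        outbox = defaultdict(list)
        for v in vertices.values():
            messages = inbox.get(v.id, [])
            if messages:
                v.active = True  # an incoming message reactivates a halted vertex
            if v.active:
                compute(v, messages, lambda t, m: outbox[t].append(m), superstep)
        inbox = outbox  # barrier: messages sent in S are delivered in S+1
        # Termination: all vertices inactive and no messages in transit.
        if not outbox and not any(v.active for v in vertices.values()):
            return superstep + 1
    return max_supersteps


def max_value(v, messages, send, superstep):
    """Hypothetical user function: propagate the largest value seen so far."""
    new_value = max([v.value] + messages)
    if superstep == 0 or new_value > v.value:
        v.value = new_value
        for target in v.edges:
            send(target, v.value)
    v.vote_to_halt()


# Illustrative input: a directed ring a -> b -> c -> d -> a.
vertices = {
    "a": Vertex("a", 3, {"b": 1}),
    "b": Vertex("b", 6, {"c": 1}),
    "c": Vertex("c", 2, {"d": 1}),
    "d": Vertex("d", 1, {"a": 1}),
}
steps = run_supersteps(vertices, max_value)
print({vid: v.value for vid, v in vertices.items()}, "after", steps, "supersteps")
```

Swapping the message buffers at the end of each superstep plays the role of the BSP barrier: nothing sent in superstep S is visible before S+1.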

16-23 Example: SSSP Parallel BFS (sequence of figures; legend: inactive vertex, active vertex, edge weight, message)
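The SSSP example can be written as a vertex-centric computation: each vertex takes the minimum of its incoming messages, and if that improves its current distance it stores the new distance and sends "distance + edge weight" along each outgoing edge. The sketch below simulates this in one process. The function name `sssp`, the adjacency-dictionary input, and the small example graph are assumptions for illustration, and edge weights are assumed to be non-negative.

```python
import math
from collections import defaultdict


def sssp(edges, source, max_supersteps=100):
    """edges: {vertex: {neighbor: weight}}, with every vertex present as a key.

    Returns {vertex: shortest distance from source}; unreachable vertices keep math.inf.
    """
    dist = {v: math.inf for v in edges}
    active = set(edges)  # all vertices are active in superstep 0
    inbox = defaultdict(list)
    for superstep in range(max_supersteps):
        outbox = defaultdict(list)
        # A vertex runs if it is still active or has received a message.
        for v in active | set(inbox):
            candidate = min(inbox.get(v, []), default=math.inf)
            if superstep == 0 and v == source:
                candidate = 0
            if candidate < dist[v]:
                dist[v] = candidate
                for target, weight in edges[v].items():
                    outbox[target].append(candidate + weight)
            # The vertex votes to halt here; only a new message reactivates it.
        active = set()
        inbox = outbox  # barrier between supersteps
        if not inbox:   # no active vertices and no messages in transit
            break
    return dist


# Illustrative graph (not the one from the slides).
graph = {
    "s": {"a": 5, "b": 2},
    "a": {"c": 7},
    "b": {"a": 1},
    "c": {},
}
distances = sssp(graph, "s")
print(distances)
```

Note how vertex `a` first receives 5 directly from `s` and only one superstep later learns the shorter distance 3 via `b`; the corrected value then propagates to `c`.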

24 SYSTEM ARCHITECTURE

25 System Architecture
The system uses the master/worker model
- Master: coordinates workers; recovers from worker faults
- Worker: processes its task; communicates with the other workers
Persistent data is kept in a distributed storage system
Temporary data is stored on local disk

26-30 Execution (sequence of figures)

31 FAULT TOLERANCE

32 Fault Tolerance
Checkpointing
- The master periodically instructs the workers to save the state of their partitions to persistent storage (e.g., vertex values, edge values, incoming messages)
Failure detection
- The master uses regular ping messages to detect worker failures

33 Fault Tolerance
Recovery
- The master reassigns graph partitions to the currently available workers
- All workers reload their partition state from the most recent available checkpoint
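The checkpoint, failure-detection, and recovery steps can be illustrated with a toy, in-memory model. Everything here is an assumption made for the sketch: the class name `Master`, the round-robin assignment, the `ping` callback, and the use of `copy.deepcopy` in place of a real persistent store. It only mirrors the sequence described on the slides, not Pregel's actual mechanism.

```python
import copy


class Master:
    """Toy master that tracks partition state, worker assignment, and one checkpoint."""

    def __init__(self, partitions, workers):
        # partitions: {partition_id: {vertex_id: value}}; workers: list of worker names
        self.state = copy.deepcopy(partitions)
        self.workers = list(workers)
        self.assignment = {}
        self.checkpoint = None
        self._assign()

    def _assign(self):
        # Round-robin over the currently available workers (assumes at least one is left).
        for i, pid in enumerate(sorted(self.state)):
            self.assignment[pid] = self.workers[i % len(self.workers)]

    def save_checkpoint(self, superstep):
        # Stand-in for "workers save partition state to persistent storage".
        self.checkpoint = (superstep, copy.deepcopy(self.state))

    def detect_failures(self, ping):
        # ping(worker) -> bool; a worker that does not answer is considered failed.
        return [w for w in self.workers if not ping(w)]

    def recover(self, failed):
        # Reassign partitions to the remaining workers, then reload the checkpoint.
        self.workers = [w for w in self.workers if w not in failed]
        self._assign()
        superstep, saved = self.checkpoint
        self.state = copy.deepcopy(saved)
        return superstep  # computation resumes from the checkpointed superstep


master = Master({0: {"a": 1}, 1: {"b": 2}, 2: {"c": 3}}, ["w1", "w2", "w3"])
master.save_checkpoint(superstep=4)
master.state[1]["b"] = 99  # progress made after the checkpoint, lost on failure
failed = master.detect_failures(lambda w: w != "w2")  # pretend w2 stops answering pings
resume_from = master.recover(failed)
print(resume_from, master.assignment, master.state)
```

The work done after the last checkpoint (the update to vertex `b`) is discarded and has to be recomputed, which is the cost of this style of recovery.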

34 APPLICATIONS

35 PageRank
The importance of a document depends on:
- the number of references to it
- the importance of the source documents themselves
Notation:
- A = a given page
- T1 ... Tn = pages that point to page A (citations)
- d = damping factor between 0 and 1 (usually kept as 0.85)
- C(T) = number of links going out of T
- PR(A) = the PageRank of page A

$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \frac{PR(T_2)}{C(T_2)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)$$
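As a quick numeric check of the PageRank formula, assume a hypothetical page A with two in-linking pages: T1 with PR(T1) = 1.0 and C(T1) = 2, and T2 with PR(T2) = 1.0 and C(T2) = 1, using d = 0.85. These values are illustrative and do not come from the slides:

$$PR(A) = (1 - 0.85) + 0.85 \left( \frac{1.0}{2} + \frac{1.0}{1} \right) = 0.15 + 0.85 \times 1.5 = 1.425$$

T1 splits its rank over two outgoing links and therefore contributes only half as much to A as T2 does.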

36 PageRank (figure; courtesy: Wikipedia)

37 PageRank
Iterative loop until convergence:
Initial value of PageRank of all pages = 1.0;
while (sum of PageRank of all pages - numpages > epsilon) {
    for each page Pi in list {
        PageRank(Pi) = (1 - d);
        for each page Pj linking to page Pi {
            PageRank(Pi) += d * (PageRank(Pj) / numOutLinks(Pj));
        }
    }
}
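A runnable Python version of this loop might look as follows. It is a sketch under stated assumptions rather than the presenter's code: the function name `pagerank` and the example graph are made up, ranks are updated synchronously (all new values are computed from the previous iteration), and convergence is tested on the total absolute change between iterations instead of the condition written on the slide.

```python
def pagerank(out_links, d=0.85, epsilon=1e-6, max_iterations=100):
    """out_links: {page: [pages it links to]}, with every page present as a key.

    Returns {page: rank} using PR(P) = (1 - d) + d * sum(PR(Q) / C(Q)) over pages Q linking to P.
    """
    pages = list(out_links)
    rank = {p: 1.0 for p in pages}  # initial PageRank of all pages = 1.0

    # Invert the adjacency lists once so each page knows who links to it.
    in_links = {p: [] for p in pages}
    for src, targets in out_links.items():
        for t in targets:
            in_links[t].append(src)

    for _ in range(max_iterations):
        new_rank = {
            p: (1 - d) + d * sum(rank[q] / len(out_links[q]) for q in in_links[p])
            for p in pages
        }
        # Assumed convergence test: total absolute change below epsilon.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < epsilon:
            break
    return rank


# Illustrative graph: A -> B, A -> C, B -> C, C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(ranks)
```

In this graph every page has at least one outgoing link, so the ranks keep summing to the number of pages. Page C ends up above B because it receives rank from both A and B.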

38 Page Rank In

39 EXPERIMENTS

40 Experiments (shortest paths): 1-billion-vertex binary tree, varying number of worker tasks

41 Experiments: binary trees, varying graph sizes on 800 worker tasks

42 Experiments: log-normal random graphs, mean out-degree (thus over 127 billion edges in the largest case), varying graph sizes on 800 worker tasks

43 Conclusion
- Distributed system for large-scale graph processing
- "Think like a vertex" computation model (intuitive API)

44 Limitations
- Inefficient if different regions of the graph converge at different speeds
- Slowest machine
- Dense graphs

45 THANK YOU. ANY QUESTIONS?

Pregel: A System for Large-Scale Graph Proces sing

Pregel: A System for Large-Scale Graph Proces sing Pregel: A System for Large-Scale Graph Proces sing Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkwoski Google, Inc. SIGMOD July 20 Taewhi

More information

Authors: Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, L., Leiser, N., Czjkowski, G.

Authors: Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, L., Leiser, N., Czjkowski, G. Authors: Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, L., Leiser, N., Czjkowski, G. Speaker: Chong Li Department: Applied Health Science Program: Master of Health Informatics 1 Term

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 60 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Pregel: A System for Large-Scale Graph Processing

More information

PREGEL: A SYSTEM FOR LARGE- SCALE GRAPH PROCESSING

PREGEL: A SYSTEM FOR LARGE- SCALE GRAPH PROCESSING PREGEL: A SYSTEM FOR LARGE- SCALE GRAPH PROCESSING G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, G. Czajkowski Google, Inc. SIGMOD 2010 Presented by Ke Hong (some figures borrowed from

More information

PREGEL AND GIRAPH. Why Pregel? Processing large graph problems is challenging Options

PREGEL AND GIRAPH. Why Pregel? Processing large graph problems is challenging Options Data Management in the Cloud PREGEL AND GIRAPH Thanks to Kristin Tufte 1 Why Pregel? Processing large graph problems is challenging Options Custom distributed infrastructure Existing distributed computing

More information

Big Graph Processing. Fenggang Wu Nov. 6, 2016

Big Graph Processing. Fenggang Wu Nov. 6, 2016 Big Graph Processing Fenggang Wu Nov. 6, 2016 Agenda Project Publication Organization Pregel SIGMOD 10 Google PowerGraph OSDI 12 CMU GraphX OSDI 14 UC Berkeley AMPLab PowerLyra EuroSys 15 Shanghai Jiao

More information

Pregel: A System for Large- Scale Graph Processing. Written by G. Malewicz et al. at SIGMOD 2010 Presented by Chris Bunch Tuesday, October 12, 2010

Pregel: A System for Large- Scale Graph Processing. Written by G. Malewicz et al. at SIGMOD 2010 Presented by Chris Bunch Tuesday, October 12, 2010 Pregel: A System for Large- Scale Graph Processing Written by G. Malewicz et al. at SIGMOD 2010 Presented by Chris Bunch Tuesday, October 12, 2010 1 Graphs are hard Poor locality of memory access Very

More information

modern database systems lecture 10 : large-scale graph processing

modern database systems lecture 10 : large-scale graph processing modern database systems lecture 1 : large-scale graph processing Aristides Gionis spring 18 timeline today : homework is due march 6 : homework out april 5, 9-1 : final exam april : homework due graphs

More information

Pregel. Ali Shah

Pregel. Ali Shah Pregel Ali Shah s9alshah@stud.uni-saarland.de 2 Outline Introduction Model of Computation Fundamentals of Pregel Program Implementation Applications Experiments Issues with Pregel 3 Outline Costs of Computation

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 14: Distributed Graph Processing Motivation Many applications require graph processing E.g., PageRank Some graph data sets are very large

More information

Large-Scale Graph Processing 1: Pregel & Apache Hama Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC

Large-Scale Graph Processing 1: Pregel & Apache Hama Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC Large-Scale Graph Processing 1: Pregel & Apache Hama Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC Lecture material is mostly home-grown, partly taken with permission and courtesy from Professor Shih-Wei

More information

Distributed Systems. 21. Graph Computing Frameworks. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems. 21. Graph Computing Frameworks. Paul Krzyzanowski. Rutgers University. Fall 2016 Distributed Systems 21. Graph Computing Frameworks Paul Krzyzanowski Rutgers University Fall 2016 November 21, 2016 2014-2016 Paul Krzyzanowski 1 Can we make MapReduce easier? November 21, 2016 2014-2016

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 14: Distributed Graph Processing Motivation Many applications require graph processing E.g., PageRank Some graph data sets are very large

More information

PREGEL. A System for Large-Scale Graph Processing

PREGEL. A System for Large-Scale Graph Processing PREGEL A System for Large-Scale Graph Processing The Problem Large Graphs are often part of computations required in modern systems (Social networks and Web graphs etc.) There are many graph computing

More information

COSC 6339 Big Data Analytics. Graph Algorithms and Apache Giraph

COSC 6339 Big Data Analytics. Graph Algorithms and Apache Giraph COSC 6339 Big Data Analytics Graph Algorithms and Apache Giraph Parts of this lecture are adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Large Scale Graph Processing Pregel, GraphLab and GraphX

Large Scale Graph Processing Pregel, GraphLab and GraphX Large Scale Graph Processing Pregel, GraphLab and GraphX Amir H. Payberah amir@sics.se KTH Royal Institute of Technology Amir H. Payberah (KTH) Large Scale Graph Processing 2016/10/03 1 / 76 Amir H. Payberah

More information

CS /21/2016. Paul Krzyzanowski 1. Can we make MapReduce easier? Distributed Systems. Apache Pig. Apache Pig. Pig: Loading Data.

CS /21/2016. Paul Krzyzanowski 1. Can we make MapReduce easier? Distributed Systems. Apache Pig. Apache Pig. Pig: Loading Data. Distributed Systems 1. Graph Computing Frameworks Can we make MapReduce easier? Paul Krzyzanowski Rutgers University Fall 016 1 Apache Pig Apache Pig Why? Make it easy to use MapReduce via scripting instead

More information

Pregel: A System for Large-Scale Graph Processing

Pregel: A System for Large-Scale Graph Processing Pregel: A System for Large-Scale Graph Processing Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski Google, Inc. {malewicz,austern,ajcbik,dehnert,ilan,naty,gczaj@google.com

More information

King Abdullah University of Science and Technology. CS348: Cloud Computing. Large-Scale Graph Processing

King Abdullah University of Science and Technology. CS348: Cloud Computing. Large-Scale Graph Processing King Abdullah University of Science and Technology CS348: Cloud Computing Large-Scale Graph Processing Zuhair Khayyat 10/March/2013 The Importance of Graphs A graph is a mathematical structure that represents

More information

Memory-Optimized Distributed Graph Processing. through Novel Compression Techniques

Memory-Optimized Distributed Graph Processing. through Novel Compression Techniques Memory-Optimized Distributed Graph Processing through Novel Compression Techniques Katia Papakonstantinopoulou Joint work with Panagiotis Liakos and Alex Delis University of Athens Athens Colloquium in

More information

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem I J C T A, 9(41) 2016, pp. 1235-1239 International Science Press Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem Hema Dubey *, Nilay Khare *, Alind Khare **

More information

Handling limits of high degree vertices in graph processing using MapReduce and Pregel

Handling limits of high degree vertices in graph processing using MapReduce and Pregel Handling limits of high degree vertices in graph processing using MapReduce and Pregel Mostafa Bamha, Mohamad Al Hajj Hassan To cite this version: Mostafa Bamha, Mohamad Al Hajj Hassan. Handling limits

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 8: Analyzing Graphs, Redux (1/2) March 20, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

One Trillion Edges. Graph processing at Facebook scale

One Trillion Edges. Graph processing at Facebook scale One Trillion Edges Graph processing at Facebook scale Introduction Platform improvements Compute model extensions Experimental results Operational experience How Facebook improved Apache Giraph Facebook's

More information

USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR LARGE-SCALE TEMPORAL GRAPH PROCESSING

USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR LARGE-SCALE TEMPORAL GRAPH PROCESSING USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR LARGE-SCALE TEMPORAL GRAPH PROCESSING Matthias Steinbauer, Gabriele Anderst-Kotsis Institute of Telecooperation TALK OUTLINE Introduction and Motivation Preliminaries

More information

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org I ++ Mapreduce: Incremental Mapreduce for

More information

Lecture 8, 4/22/2015. Scribed by Orren Karniol-Tambour, Hao Yi Ong, Swaroop Ramaswamy, and William Song.

Lecture 8, 4/22/2015. Scribed by Orren Karniol-Tambour, Hao Yi Ong, Swaroop Ramaswamy, and William Song. CME 323: Distributed Algorithms and Optimization, Spring 2015 http://stanford.edu/~rezab/dao. Instructor: Reza Zadeh, Databricks and Stanford. Lecture 8, 4/22/2015. Scribed by Orren Karniol-Tambour, Hao

More information

Part II: Software Infrastructure in Data Centers: Distributed Execution Engines

Part II: Software Infrastructure in Data Centers: Distributed Execution Engines CSE 6350 File and Storage System Infrastructure in Data centers Supporting Internet-wide Services Part II: Software Infrastructure in Data Centers: Distributed Execution Engines 1 MapReduce: Simplified

More information

DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing

DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing SCHOOL OF COMPUTER SCIENCE AND ENGINEERING DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing Bo Suo, Jing Su, Qun Chen, Zhanhuai Li, Wei Pan 2016-08-19 1 ABSTRACT Many systems

More information

PREGEL. A System for Large Scale Graph Processing

PREGEL. A System for Large Scale Graph Processing PREGEL A System for Large Scale Graph Processing The Problem Large Graphs are often part of computations required in modern systems (Social networks and Web graphs etc.) There are many graph computing

More information

Report. X-Stream Edge-centric Graph processing

Report. X-Stream Edge-centric Graph processing Report X-Stream Edge-centric Graph processing Yassin Hassan hassany@student.ethz.ch Abstract. X-Stream is an edge-centric graph processing system, which provides an API for scatter gather algorithms. The

More information

Graph Processing. Connor Gramazio Spiros Boosalis

Graph Processing. Connor Gramazio Spiros Boosalis Graph Processing Connor Gramazio Spiros Boosalis Pregel why not MapReduce? semantics: awkward to write graph algorithms efficiency: mapreduces serializes state (e.g. all nodes and edges) while pregel keeps

More information

BSP, Pregel and the need for Graph Processing

BSP, Pregel and the need for Graph Processing BSP, Pregel and the need for Graph Processing Patrizio Dazzi, HPC Lab ISTI - CNR mail: patrizio.dazzi@isti.cnr.it web: http://hpc.isti.cnr.it/~dazzi/ National Research Council of Italy A need for Graph

More information

GraphHP: A Hybrid Platform for Iterative Graph Processing

GraphHP: A Hybrid Platform for Iterative Graph Processing GraphHP: A Hybrid Platform for Iterative Graph Processing Qun Chen, Song Bai, Zhanhuai Li, Zhiying Gou, Bo Suo and Wei Pan Northwestern Polytechnical University Xi an, China {chenbenben, baisong, lizhh,

More information

Turning NoSQL data into Graph Playing with Apache Giraph and Apache Gora

Turning NoSQL data into Graph Playing with Apache Giraph and Apache Gora Turning NoSQL data into Graph Playing with Apache Giraph and Apache Gora Team Renato Marroquín! PhD student: Interested in: Information retrieval. Distributed and scalable data management. Apache Gora:

More information

Distributed Systems. 20. Other parallel frameworks. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 20. Other parallel frameworks. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 20. Other parallel frameworks Paul Krzyzanowski Rutgers University Fall 2017 November 20, 2017 2014-2017 Paul Krzyzanowski 1 Can we make MapReduce easier? 2 Apache Pig Why? Make it

More information

CS November 2017

CS November 2017 Distributed Systems 0. Other parallel frameworks Can we make MapReduce easier? Paul Krzyzanowski Rutgers University Fall 017 November 0, 017 014-017 Paul Krzyzanowski 1 Apache Pig Apache Pig Why? Make

More information

Graph Processing Frameworks

Graph Processing Frameworks Graph Processing Frameworks Lecture 24 CSCI 4974/6971 5 Dec 2016 1 / 13 Today s Biz 1. Reminders 2. Review 3. Graph Processing Frameworks 4. 2D Partitioning 2 / 13 Reminders Assignment 6: due date Dec

More information

CS6200 Information Retreival. The WebGraph. July 13, 2015

CS6200 Information Retreival. The WebGraph. July 13, 2015 CS6200 Information Retreival The WebGraph The WebGraph July 13, 2015 1 Web Graph: pages and links The WebGraph describes the directed links between pages of the World Wide Web. A directed edge connects

More information

Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Access

Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Access Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Access Yongzhe Zhang National Institute of Informatics 3rd Spring Festival Workshop March 21, 2017 Outline Background of vertex-centric

More information

Giraph: Large-scale graph processing infrastructure on Hadoop. Qu Zhi

Giraph: Large-scale graph processing infrastructure on Hadoop. Qu Zhi Giraph: Large-scale graph processing infrastructure on Hadoop Qu Zhi Why scalable graph processing? Web and social graphs are at immense scale and continuing to grow In 2008, Google estimated the number

More information

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Presented by: Dishant Mittal Authors: Juwei Shi, Yunjie Qiu, Umar Firooq Minhas, Lemei Jiao, Chen Wang, Berthold Reinwald and Fatma

More information

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011 Data-Intensive Information Processing Applications! Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Drakkar: a graph based All-Nearest Neighbour search algorithm for bibliographic coupling

Drakkar: a graph based All-Nearest Neighbour search algorithm for bibliographic coupling Drakkar: a graph based All-Nearest Neighbour search algorithm for bibliographic coupling Bart Thijs KU Leuven, FEB, ECOOM; Leuven; Belgium Bart.thijs@kuleuven.be Abstract Drakkar is a novel algorithm for

More information

Graph Processing & Bulk Synchronous Parallel Model

Graph Processing & Bulk Synchronous Parallel Model Graph Processing & Bulk Synchronous Parallel Model CompSci 590.03 Instructor: Ashwin Machanavajjhala Lecture 14 : 590.02 Spring 13 1 Recap: Graph Algorithms Many graph algorithms need iterafve computafon

More information

GPS: A Graph Processing System

GPS: A Graph Processing System GPS: A Graph Processing System Semih Salihoglu and Jennifer Widom Stanford University {semih,widom}@cs.stanford.edu Abstract GPS (for Graph Processing System) is a complete open-source system we developed

More information

Optimizing CPU Cache Performance for Pregel-Like Graph Computation

Optimizing CPU Cache Performance for Pregel-Like Graph Computation Optimizing CPU Cache Performance for Pregel-Like Graph Computation Songjie Niu, Shimin Chen* State Key Laboratory of Computer Architecture Institute of Computing Technology Chinese Academy of Sciences

More information

TI2736-B Big Data Processing. Claudia Hauff

TI2736-B Big Data Processing. Claudia Hauff TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Ctd. Graphs Pig Design Patterns Hadoop Ctd. Giraph Zoo Keeper Spark Spark Ctd. Learning objectives

More information

CIM/E Oriented Graph Database Model Architecture and Parallel Network Topology Processing

CIM/E Oriented Graph Database Model Architecture and Parallel Network Topology Processing CIM/E Oriented Graph Model Architecture and Parallel Network Topology Processing Zhangxin Zhou a, b, Chen Yuan a, Ziyan Yao a, Jiangpeng Dai a, Guangyi Liu a, Renchang Dai a, Zhiwei Wang a, and Garng M.

More information

Distributed Systems. 21. Other parallel frameworks. Paul Krzyzanowski. Rutgers University. Fall 2018

Distributed Systems. 21. Other parallel frameworks. Paul Krzyzanowski. Rutgers University. Fall 2018 Distributed Systems 21. Other parallel frameworks Paul Krzyzanowski Rutgers University Fall 2018 1 Can we make MapReduce easier? 2 Apache Pig Why? Make it easy to use MapReduce via scripting instead of

More information

CS November 2018

CS November 2018 Distributed Systems 1. Other parallel frameworks Can we make MapReduce easier? Paul Krzyzanowski Rutgers University Fall 018 1 Apache Pig Apache Pig Why? Make it easy to use MapReduce via scripting instead

More information

AN INTRODUCTION TO GRAPH COMPRESSION TECHNIQUES FOR IN-MEMORY GRAPH COMPUTATION

AN INTRODUCTION TO GRAPH COMPRESSION TECHNIQUES FOR IN-MEMORY GRAPH COMPUTATION AN INTRODUCTION TO GRAPH COMPRESSION TECHNIQUES FOR IN-MEMORY GRAPH COMPUTATION AMIT CHAVAN Computer Science Department University of Maryland, College Park amitc@cs.umd.edu ABSTRACT. In this work we attempt

More information

GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics

GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics Yogesh Simmhan 1, Alok Kumbhare 2, Charith Wickramaarachchi 2, Soonil Nagarkar 2, Santosh Ravi 2, Cauligi Raghavendra 2, and Viktor

More information

Investigating Graph Algorithms in the BSP Model on the Cray XMT

Investigating Graph Algorithms in the BSP Model on the Cray XMT 2013 IEEE 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum Investigating Graph Algorithms in the BSP Model on the Cray XMT David Ediger David A. Bader Georgia Institute

More information

[CoolName++]: A Graph Processing Framework for Charm++

[CoolName++]: A Graph Processing Framework for Charm++ [CoolName++]: A Graph Processing Framework for Charm++ Hassan Eslami, Erin Molloy, August Shi, Prakalp Srivastava Laxmikant V. Kale Charm++ Workshop University of Illinois at Urbana-Champaign {eslami2,emolloy2,awshi2,psrivas2,kale}@illinois.edu

More information

Giraphx: Parallel Yet Serializable Large-Scale Graph Processing

Giraphx: Parallel Yet Serializable Large-Scale Graph Processing Giraphx: Parallel Yet Serializable Large-Scale Graph Processing Serafettin Tasci and Murat Demirbas Computer Science & Engineering Department University at Buffalo, SUNY Abstract. Bulk Synchronous Parallelism

More information

Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems

Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems ABSTRACT Minyang Han David R. Cheriton School of Computer Science University of Waterloo m25han@uwaterloo.ca

More information

Frameworks for Graph-Based Problems

Frameworks for Graph-Based Problems Frameworks for Graph-Based Problems Dakshil Shah U.G. Student Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering, Mumbai, India Chetashri Bhadane Assistant Professor Computer Engineering

More information

Project Number: Start Date of Project: 01/12/2012 Duration: 36 months

Project Number: Start Date of Project: 01/12/2012 Duration: 36 months Collaborative Project GeoKnow - Making the Web an Exploratory Place for Geospatial Knowledge Project Number: 318159 Start Date of Project: 01/12/2012 Duration: 36 months Deliverable 2.6.1 Prototype of

More information

Oolong: Asynchronous Distributed Applications Made Easy

Oolong: Asynchronous Distributed Applications Made Easy Oolong: Asynchronous Distributed Applications Made Easy Christopher Mitchell Russell Power Jinyang Li New York University {cmitchell, power, jinyang}@cs.nyu.edu Abstract We present Oolong, a distributed

More information

Implementing Graph Transformations in the Bulk Synchronous Parallel Model

Implementing Graph Transformations in the Bulk Synchronous Parallel Model Implementing Graph Transformations in the Bulk Synchronous Parallel Model Christian Krause 1, Matthias Tichy 2, and Holger Giese 3 1 SAP Innovation Center, Potsdam, Germany, christian.krause01@sap.com

More information

Putting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21

Putting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21 Big Processing -Parallel Computation COS 418: Distributed Systems Lecture 21 Michael Freedman 2 Ex: Word count using partial aggregation Putting it together 1. Compute word counts from individual files

More information

Distributed Graph Algorithms

Distributed Graph Algorithms Distributed Graph Algorithms Alessio Guerrieri University of Trento, Italy 2016/04/26 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Contents 1 Introduction

More information

Graphs (Part II) Shannon Quinn

Graphs (Part II) Shannon Quinn Graphs (Part II) Shannon Quinn (with thanks to William Cohen and Aapo Kyrola of CMU, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Parallel Graph Computation Distributed computation

More information

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing /34 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of

More information

igiraph: A Cost-efficient Framework for Processing Large-scale Graphs on Public Clouds

igiraph: A Cost-efficient Framework for Processing Large-scale Graphs on Public Clouds 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing igiraph: A Cost-efficient Framework for Processing Large-scale Graphs on Public Clouds Safiollah Heidari, Rodrigo N. Calheiros

More information

Optimizing Memory Performance for FPGA Implementation of PageRank

Optimizing Memory Performance for FPGA Implementation of PageRank Optimizing Memory Performance for FPGA Implementation of PageRank Shijie Zhou, Charalampos Chelmis, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles,

More information

Today s content. Resilient Distributed Datasets(RDDs) Spark and its data model

Today s content. Resilient Distributed Datasets(RDDs) Spark and its data model Today s content Resilient Distributed Datasets(RDDs) ------ Spark and its data model Resilient Distributed Datasets: A Fault- Tolerant Abstraction for In-Memory Cluster Computing -- Spark By Matei Zaharia,

More information

Spark. In- Memory Cluster Computing for Iterative and Interactive Applications

Spark. In- Memory Cluster Computing for Iterative and Interactive Applications Spark In- Memory Cluster Computing for Iterative and Interactive Applications Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker,

More information




