Frameworks for Graph-Based Problems

Dakshil Shah, U.G. Student, Computer Engineering Department, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
Chetashri Bhadane, Assistant Professor, Computer Engineering Department, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
Aishwarya Sinh, U.G. Student, Computer Engineering Department, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India

ABSTRACT
There has been tremendous growth of graph-structured data in modern applications. Social networks and knowledge bases are examples of such applications, and they create a crucial need for architectures that can process dense graphs. MapReduce is a programming model used in large-scale data-parallel applications. Despite its notable role in big data analytics, MapReduce is not well suited to large-scale graph processing. Because of these limitations, GraphLab is used as an alternative for applications that process large, dense graphs. In this paper, the PageRank algorithm is explained with respect to both implementations. The two are then compared, and a conclusion is drawn about the capabilities of both methods in solving graph-based problems.

Keywords
Big Data, MapReduce, PageRank, GraphLab, Graph-Based Problems, Pregel

1. INTRODUCTION
Graphs are an abstract way of representing connectivity using nodes and links, also known as vertices and edges. Edges can be one-directional or bi-directional. Nodes and edges may carry auxiliary information. Many problems are formulated and solved in terms of graphs. Some graph-based problems are: shortest path problems, network flow problems, matching problems, the 2-SAT problem, the graph coloring problem, the Traveling Salesman Problem (TSP) and PageRank.

2. PAGERANK
The Web can be considered as a directed graph. The nodes are the pages, and there is an arc from page 1 to page 2 if there are one or more links from page 1 to page 2 [1]. PageRank is a probability distribution over the nodes in the graph, representing the likelihood that a random walk over the link structure will arrive at a particular node. Nodes with high in-degrees tend to have a high PageRank. The PageRank P of a page n is

$$P(n) = \alpha \left( \frac{1}{|G|} \right) + (1 - \alpha) \sum_{m \in L(n)} \frac{P(m)}{C(m)}$$

where |G| is the total number of pages in the graph, α is the random jump factor, L(n) is the set of pages that link to n, and C(m) is the out-degree of page m. PageRank is defined recursively. This leads to an iterative algorithm that is similar in structure to a parallel breadth-first search. There are a large number of webpages on the Internet today; Google has estimated the number of unique web URLs to be over a trillion [2]. Calculating the PageRank of this collection requires a Big Data approach.

3. MAPREDUCE
3.1 Introduction to MapReduce
MapReduce was introduced by Google, and Apache has implemented its own version as part of Apache Hadoop. MapReduce processes petabytes of data on clusters of thousands of commodity machines, which makes it suitable for programs that can be decomposed into parallel tasks. Hadoop consists of an execution runtime and the Hadoop Distributed File System (HDFS). The execution runtime handles partitioning the input data, scheduling the program's execution, managing inter-machine communication and dealing with machine failures. There are two primary methods: map and reduce. A job is partitioned into multiple tasks, which are then executed in parallel on a cluster. In the map stage, every task fetches data from HDFS and splits it into records using a RecordReader. These records are then processed by a user-defined map() function. The results are stored in temporary files containing multiple data partitions, which are processed by the reduce tasks. In the reduce stage, every task goes through the following three phases [3]:
1) The data partitions are copied from the remote map tasks.
2) The data partitions are sorted.
3) The sorted data partitions are processed by the user-defined reduce() function, and the results are written back to disk.

The user sets the maximum number of parallel tasks in a configuration file. Slots can be considered as tokens: only tasks that obtain a slot are allowed to run. Slot distribution is overseen by the scheduler; the default scheduler is First In, First Out (FIFO).

3.2 PageRank Implementation in MapReduce
The PageRank algorithm is simplified here by ignoring the random jump factor and assuming that there are no dangling nodes (nodes without outgoing links). Pseudocode for the MapReduce implementation:

    class Mapper
        method Map(nid n, node N)
            p ← N.PageRank / |N.AdjacencyList|
            Emit(nid n, N)                          // pass along graph structure
            for all nodeid m ∈ N.AdjacencyList do
                Emit(nid m, p)                      // pass PageRank mass to neighbours

    class Reducer
        method Reduce(nid m, [p1, p2, ...])
            M ← ∅
            s ← 0
            for all p ∈ [p1, p2, ...] do
                if IsNode(p) then
                    M ← p                           // recover graph structure
                else
                    s ← s + p                       // sum incoming PageRank contributions
            M.PageRank ← s
            Emit(nid m, node M)

In the map phase, each node's PageRank mass is divided evenly, and each piece is passed along an outgoing edge to a neighbour. In the reduce phase, the PageRank contributions are summed at each destination node. Each MapReduce job corresponds to one iteration of the algorithm above.

Figure 1: Sample graph [4]
Figure 2: After one iteration on MapReduce [4]

The size of each box in Figure 2 is proportional to its PageRank value after the mass of each node has been distributed evenly over its adjacency list and all partial PageRank contributions have been summed to obtain the updated values.
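The map, shuffle and reduce steps above can be simulated in memory to check one iteration by hand. The Python sketch below is illustrative only and is not Hadoop code: a dictionary stands in for the shuffle/sort phase, node records are plain dicts with hypothetical `adj` and `rank` fields, and, as in the simplified algorithm, there is no random jump and every node is assumed to have at least one outgoing link.

```python
from collections import defaultdict


def pagerank_map(nid, node):
    """Mapper: re-emit the node record, then split its PageRank mass over its out-links."""
    yield nid, node  # pass along graph structure
    p = node["rank"] / len(node["adj"])  # assumes no dangling nodes (len(adj) > 0)
    for m in node["adj"]:
        yield m, p  # pass PageRank mass to neighbour m


def pagerank_reduce(nid, values):
    """Reducer: recover the node record and sum the incoming PageRank contributions."""
    node, s = None, 0.0
    for value in values:
        if isinstance(value, dict):  # IsNode(p): this value carries the graph structure
            node = value
        else:
            s += value
    return nid, {**node, "rank": s}  # new record; the previous iteration's record is untouched


def pagerank_iteration(graph):
    """One MapReduce job, i.e. one PageRank iteration; grouping by key plays the shuffle/sort role."""
    groups = defaultdict(list)
    for nid, node in graph.items():
        for key, value in pagerank_map(nid, node):
            groups[key].append(value)
    return dict(pagerank_reduce(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    # Hypothetical five-node graph with a uniform initial PageRank of 0.2.
    graph = {
        1: {"adj": [2, 4], "rank": 0.2},
        2: {"adj": [3, 5], "rank": 0.2},
        3: {"adj": [4], "rank": 0.2},
        4: {"adj": [5], "rank": 0.2},
        5: {"adj": [1, 2, 3], "rank": 0.2},
    }
    for nid, node in pagerank_iteration(graph).items():
        print(nid, round(node["rank"], 3))
```

For this hypothetical graph, one iteration gives nodes 1 to 5 ranks of about 0.067, 0.167, 0.167, 0.3 and 0.3. The total mass stays 1 because there are no dangling nodes, and the structure record emitted by each mapper is what lets the reducer rebuild the graph for the next job.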
The graph structure of Figure 1 is passed along from one iteration to the next.

3.3 Issues
The MapReduce model provides no built-in mechanism for communicating global state. Because the graph is represented as adjacency lists, nodes can communicate only via direct links or through intermediate nodes, so information can be passed only within the local graph structure. A local computation is carried out at each node, and its result is passed on to the neighbouring nodes; convergence over the global graph is reached only after multiple iterations. The shuffle and sort steps of MapReduce carry out the passing of these partial results. The amount of intermediate data generated is on the order of the number of edges. A dense graph has O(n^2) edges, where n is the number of nodes, so in the worst case run time is dominated by copying O(n^2) intermediate data across the network. Thus, MapReduce algorithms are not feasible on large, dense graphs. Given the large number of graph-based applications, such as social networks, a more efficient approach is needed.
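To make the edge-proportional growth concrete, the short Python sketch below counts the key-value pairs that the simplified PageRank mapper of Section 3.2 emits in one iteration: one structure record per node plus one PageRank-mass record per outgoing edge. The ring and complete graphs are hypothetical examples of a sparse and a dense graph, and the count is a record count, not a byte count.

```python
def map_output_records(adjacency):
    """Key-value pairs emitted by the PageRank mapper in one iteration:
    one record re-emitting each node's structure plus one record per outgoing edge."""
    return sum(1 + len(neighbours) for neighbours in adjacency.values())


def ring(n):
    """Sparse graph: node i links only to node (i + 1) mod n, giving n edges."""
    return {i: [(i + 1) % n] for i in range(n)}


def complete(n):
    """Dense graph: every node links to every other node, giving n * (n - 1) edges."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(n, map_output_records(ring(n)), map_output_records(complete(n)))
    # 10 20 100
    # 100 200 10000
    # 500 1000 250000
```

The sparse ring produces 2n records per iteration, while the complete graph produces n + n(n - 1) = n^2 records, all of which must be shuffled across the network in every iteration.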

4. GRAPHLAB
4.1 Introduction to GraphLab
MapReduce is unable to express statistical inference algorithms efficiently. GraphLab compactly expresses asynchronous algorithms with sparse computational dependencies while ensuring data consistency and a high degree of parallelism [5]. GraphLab can express complex computational dependencies using a data graph, and it can express iterative parallel algorithms with dynamic scheduling. The data model consists of two parts: a directed data graph and a shared data table. The graph is represented as G = (V, E). The data associated with vertex v is denoted D_v, and the data associated with edge (u → v) is denoted D_{u→v}. The shared data table (SDT) is an associative map, T[Key] → Value, between keys and arbitrary blocks of data.

Computation in GraphLab is performed either through an update function, which defines a local computation, or through the sync mechanism, which defines a global aggregation. The update function is analogous to Map in MapReduce, but unlike the map function, update functions are permitted to access and modify overlapping contexts in the graph. The sync mechanism is analogous to the Reduce operation, but unlike in MapReduce, it runs concurrently with the update functions. The GraphLab runtime engine relaxes the execution-order requirements of the shared memory, which lets it determine the best order in which to run vertices and enables efficient distributed execution. However, it imposes the restriction that every vertex must eventually be run.

GraphLab eliminates messages and isolates the user-defined algorithm from data movement, so the system can choose how and when to move program state. Because mutable data can be associated with both vertices and edges, the algorithm designer can distinguish between data that is shared with all neighbours (vertex data) and data that is shared with a particular neighbour (edge data). GraphLab does not differentiate between edge directions.
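The data model, update function and sync mechanism described above can be imitated in a single-process Python sketch. It is a hypothetical illustration, not the GraphLab API: the class and function names are invented, tasks are processed in FIFO order (the real runtime may choose any order, provided every scheduled vertex eventually runs), and the update function follows the PageRank update of Section 4.2 under three assumptions: the sum runs over in-neighbours, the edge weight is w_{u,v} = 1/out-degree(u), and after a large change only the out-neighbours (the vertices that read R(v)) are rescheduled.

```python
import operator
from collections import deque


class DataGraph:
    """Hypothetical sketch of GraphLab's data model, specialised to PageRank."""

    def __init__(self, n, edges):
        self.vertex_data = {v: 1.0 / n for v in range(n)}  # D_v holds the rank R(v)
        self.in_nbrs = {v: [] for v in range(n)}
        self.out_nbrs = {v: [] for v in range(n)}
        for u, v in edges:
            self.in_nbrs[v].append(u)
            self.out_nbrs[u].append(v)
        # D_{u->v} holds the link weight w_{u,v}; assumes every vertex has an out-link.
        self.edge_data = {(u, v): 1.0 / len(self.out_nbrs[u]) for u, v in edges}
        self.sdt = {}  # shared data table: T[Key] -> Value


def pagerank_update(g, v, alpha=0.15, eps=1e-8):
    """Update function: recompute R(v) from its scope and return the vertices to reschedule."""
    old = g.vertex_data[v]
    n = len(g.vertex_data)
    new = alpha / n + (1 - alpha) * sum(
        g.edge_data[(u, v)] * g.vertex_data[u] for u in g.in_nbrs[v]
    )
    g.vertex_data[v] = new
    return g.out_nbrs[v] if abs(new - old) > eps else []  # adaptive rescheduling


def run_sync(g, key, fold, merge, finalize, acc0, partitions=2):
    """Sync: Fold each partition of the vertex data from acc0, Merge the partials, Finalize."""
    vertices = sorted(g.vertex_data)
    partials = []
    for i in range(partitions):
        acc = acc0
        for v in vertices[i::partitions]:
            acc = fold(acc, g.vertex_data[v])
        partials.append(acc)
    result = partials[0]
    for partial in partials[1:]:
        result = merge(result, partial)
    g.sdt[key] = finalize(result)  # update functions could read this value by key


def run(g, tau=10):
    """Sequential execution model: remove a task, apply the update, add new tasks, run due syncs."""
    tasks, queued = deque(g.vertex_data), set(g.vertex_data)  # initially schedule every vertex
    updates = 0
    while tasks:
        v = tasks.popleft()
        queued.discard(v)
        for u in pagerank_update(g, v):
            if u not in queued:  # keep at most one pending task per vertex
                tasks.append(u)
                queued.add(u)
        updates += 1
        if updates % tau == 0:  # the sync becomes "ready" every tau updates
            run_sync(g, "total_rank", operator.add, operator.add, lambda acc: acc, 0.0)
    run_sync(g, "total_rank", operator.add, operator.add, lambda acc: acc, 0.0)
    return updates


if __name__ == "__main__":
    edges = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 3), (3, 4), (4, 0), (4, 1), (4, 2)]
    g = DataGraph(5, edges)
    print("updates:", run(g))
    print({v: round(r, 4) for v, r in g.vertex_data.items()})
    print("total rank from sync:", round(g.sdt["total_rank"], 6))
```

The design choice worth noting is the adaptive rescheduling: a vertex whose rank barely changes schedules no further work, so the task set drains on its own instead of running a fixed number of global passes, while the sync keeps a global aggregate (here the total rank) in the shared data table.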
GraphLab's asynchronous execution behaviour depends on the number of machines and the availability of network resources. This leads to nondeterminism that can complicate algorithm design and debugging. The sequential model of the GraphLab abstraction is translated into parallel execution by allowing multiple processors to run the same loop on the same graph, removing and running different vertices simultaneously. To retain the semantics of sequential execution, overlapping computations must not run simultaneously. GraphLab therefore enforces serializability, so that every parallel execution of vertex-oriented programs has a corresponding sequential execution. To do this, it prevents adjacent vertex programs from running concurrently by using a fine-grained locking protocol that requires sequentially acquiring locks on all neighbouring vertices [6].

4.2 PageRank Implementation in GraphLab
1) Data graph
Every vertex v corresponds to a webpage.
Every edge (u, v) corresponds to a link (u → v).
Vertex data D_v stores the rank of the webpage, R(v).
Edge data D_{u→v} stores the weight of the link (u → v).

2) The update function
The update function for PageRank assigns to the current vertex the weighted sum of the current ranks of its neighbouring vertices (plus the random jump term). Only if the value of the current vertex changes by more than a predefined threshold are its neighbours scheduled for update, which makes the algorithm adaptive. The GraphLab update function takes as input a vertex v and its scope S_v, and returns the new version of the scope together with a set of tasks T that encodes future task executions:

Update: f(v, S_v) → (S_v, T)

Algorithm: PageRank update function
Input: Vertex data R(v) from S_v
Input: Edge data {w_{u,v} : u ∈ N[v]} from S_v
Input: Neighbour vertex data {R(u) : u ∈ N[v]} from S_v
    R_old(v) ← R(v)                              // save the old PageRank
    R(v) ← α/n
    for each u ∈ N[v] do                         // loop over neighbours
        R(v) ← R(v) + (1 − α) · w_{u,v} · R(u)
    // if the PageRank changes sufficiently, schedule the neighbours to be updated
    if |R(v) − R_old(v)| > ε then
        return {(PageRankFun, u) : u ∈ N[v]}
Output: Modified scope S_v with new R(v)

Here n is the number of vertices, α is the random jump factor and ε is the change threshold.

3) The sync operation
The sync operation is defined as a tuple (Key, Fold, Merge, Finalize, acc(0), τ). It consists of a unique key, three user-defined functions, an initial accumulator value acc(0), and an integer τ representing the interval between sync operations. The sync operation uses the Fold and Merge functions to perform a global synchronous reduce: Fold aggregates vertex data, and Merge combines the intermediate Fold results. The Finalize function performs a transformation on the final value, and the result is stored. Update functions use the Key to access the most recent result of the sync operation.

Algorithm: GraphLab execution model
Input: Data graph G = (V, E, D)
Input: Initial task set T = {(f, v1), (g, v2), ...}
Input: Initial set of syncs: (Name, Fold, Merge, Finalize, acc(0), τ)
    while T is not empty do
        (f, v) ← RemoveNext(T)
        (T′, S_v) ← f(v, S_v)
        T ← T ∪ T′
        Run all sync operations which are ready
Output: Modified data graph G = (V, E, D′)
Output: Result of sync operations

5. PREGEL
5.1 Introduction to Pregel
Pregel [7] is a scalable infrastructure developed by Google to mine data contained in a variety of graphs. To solve graph-based problems, programs are developed as a sequence of iterations. In every iteration, a vertex can receive the messages sent to it in the previous iteration and send messages to other vertices, independently of the other vertices. A vertex may also modify the state of its edges and alter the graph's topology. Pregel programs scale automatically on a cluster [8]. Vertices carry out the graph computation, while edges carry the computed results between vertices; edges do not participate in computation [9].
The input is a directed graph. After the input is initialized, the computation proceeds as a sequence of supersteps and terminates when the algorithm ends. In each superstep, the same user-defined function executes in parallel at every vertex. Termination depends on every vertex voting to halt [8]. When a vertex votes to halt, it deactivates itself, and the framework does not execute that vertex in subsequent supersteps unless it is reactivated by a message. The algorithm terminates when all vertices are simultaneously inactive and no messages are in transit.

5.2 PageRank Implementation in Pregel
Algorithm [10]:

    class PageRank
        function ComputeAtVertex(Message msg)
            if (superstep() >= 1)
                sum = 0
                while (!msg->Done())
                    sum += msg->Value()
                    msg->Next()
                MutableValue() = 0.15 / NumVertices() + dampingfactor * sum
            if (superstep() > maxset)
                VoteToHalt()
            else
                n = GetOutEdgeIterator().size()
                SendMessageToAllNeighbours(GetValue() / n)

Here maxset is the maximum number of supersteps, and dampingfactor is 0.85, complementing the 0.15 random-jump term. Each vertex stores its intermediate PageRank value. Once the maximum number of supersteps is exceeded, no further messages are sent and the vertices vote to halt. The algorithm is generally run until the values converge.
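The superstep structure and the vertex program above can be simulated in a single Python process to see how messages, the barrier between supersteps and VoteToHalt() interact. This is a hypothetical sketch, not Pregel's C++ API: message passing is modelled with per-vertex inbox lists, vertex values start at 1/N (an assumption, since the pseudocode does not show initialization), every vertex is assumed to have at least one out-edge, and `max_supersteps` plays the role of maxset.

```python
def pregel_pagerank(out_edges, max_supersteps=30, damping=0.85):
    """Bulk-synchronous simulation of the PageRank vertex program; returns (values, supersteps run)."""
    n = len(out_edges)
    value = {v: 1.0 / n for v in out_edges}  # assumed initial value 1/N
    active = {v: True for v in out_edges}
    inbox = {v: [] for v in out_edges}  # messages sent during the previous superstep
    superstep = 0
    # Terminate when every vertex has voted to halt and no messages are in transit.
    while any(active.values()) or any(inbox.values()):
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            if inbox[v]:
                active[v] = True  # an incoming message reactivates a halted vertex
            if not active[v]:
                continue
            # ComputeAtVertex
            if superstep >= 1:
                value[v] = (1 - damping) / n + damping * sum(inbox[v])
            if superstep > max_supersteps:
                active[v] = False  # VoteToHalt()
            else:
                share = value[v] / len(targets)  # assumes at least one out-edge
                for w in targets:
                    outbox[w].append(share)  # SendMessageToAllNeighbours
        inbox = outbox  # barrier: messages become visible in the next superstep
        superstep += 1
    return value, superstep


if __name__ == "__main__":
    out_edges = {0: [1, 3], 1: [2, 4], 2: [3], 3: [4], 4: [0, 1, 2]}
    ranks, steps = pregel_pagerank(out_edges, max_supersteps=50)
    print(steps, {v: round(r, 4) for v, r in ranks.items()})
```

Because values computed in superstep s only use messages sent in superstep s - 1, each superstep is one synchronous power-iteration step; once superstep maxset has passed, every vertex halts and no messages remain, so the loop ends.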

6. COMPARISON
Most advanced machine learning and data mining algorithms focus on modelling dependencies between data. Data-parallel abstractions like MapReduce, however, fail when there are computational dependencies in the data. Graph-parallel abstractions like GraphLab and Pregel can express such dependencies, and they simplify the design and implementation of graph-parallel algorithms by freeing the user to focus on sequential computation rather than on the parallel movement of data. Although a MapReduce job can be run repeatedly, MapReduce provides no mechanism to directly carry out iterative computation. As a result, it cannot express sophisticated scheduling, automatically assess termination, or leverage basic data persistence. Unlike MapReduce, GraphLab provides sophisticated scheduling primitives that can express iterative parallel algorithms with dynamic scheduling.

Table 1. Comparison of MapReduce, GraphLab and Pregel

| | MapReduce | GraphLab | Pregel |
|---|---|---|---|
| Messages | Yes | Eliminated | Yes |
| File system | Hadoop Distributed File System | Shared Data Table | Google File System |
| Programming model | Shared memory | Shared memory | Message passing |
| Parallelism | Data-parallel | Graph-parallel | Graph-parallel |
| Architecture model | Master-slave | Peer-to-peer | Master-slave |
| Task/vertex scheduling model | Pull-based | Push-based | Push-based |
| Application suitability | Loosely-connected applications | Strongly-connected applications | Strongly-connected applications |
| Computational model | Synchronous | Asynchronous | Synchronous |
| Sparse dependency | No | Yes | Yes |
| Iterative | With help of extensions [5] | Yes | Yes |

GraphLab offers faster and more efficient runtime performance than MapReduce and eliminates messages. The update function in GraphLab is analogous to Map in MapReduce, but unlike in MapReduce, update functions are permitted to access and modify overlapping contexts in the graph.
The sync mechanism is analogous to the Reduce operation, but unlike in MapReduce, it runs concurrently with the update functions [5]. Pregel handles iterative processing more effectively than MapReduce [10]. Pregel keeps vertices and edges on the machine that performs the computation and uses the network to transfer only messages, whereas MapReduce passes the entire state of the graph from one stage to the next and needs to coordinate the steps of a chained MapReduce.

7. CONCLUSION
MapReduce and Pregel are both synchronous computation models, while GraphLab is asynchronous. GraphLab and Pregel both support iterative algorithms and are graph-parallel; MapReduce supports iterative algorithms only with the help of extensions and is data-parallel. In this paper we reviewed three frameworks by comparing their mechanisms for solving graph-based problems, and conclude that GraphLab and Pregel are better suited to graph-based problems than MapReduce.

8. REFERENCES
[1] A. Rajaraman and J. Ullman, Mining of Massive Datasets. New York, N.Y.: Cambridge University Press.
[2] J. Alpert and N. Hajaj, "We knew the web was big...", Google Blog. [Online]. [Accessed: 08-Sep-2015].
[3] Dong, Xicheng, Ying Wang, and Huaming Liao. "Scheduling mixed real-time and non-real-time applications in MapReduce environment." Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on. IEEE.
[4] Lin, Jimmy, and Chris Dyer. "Data-intensive text processing with MapReduce." Synthesis Lectures on Human Language Technologies 3.1 (2010).
[5] S. Sakr, "Processing large-scale graph data: A guide to current technology", IBM.com. [Online]. [Accessed: 16-Sep-2015].
[6] Hindman, Benjamin, et al. "Nexus: A common substrate for cluster computing." Workshop on Hot Topics in Cloud Computing.
[7] Low, Yucheng, et al. "GraphLab: A new framework for parallel machine learning." arXiv preprint arXiv: (2014).
[8] G. Czajkowski, "Large-scale graph computing at Google", Google Research Blog. [Online]. Available: research.blogspot.in/2009/06/large-scale-graph-computing-atgoogle.html. [Accessed: 15-Sep-2015].
[9] Malewicz, Grzegorz, et al. "Pregel: a system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM.
[10] Han, Minyang, et al. "An experimental comparison of Pregel-like graph processing systems." Proceedings of the VLDB Endowment 7.12 (2014).
[11] Jiang, Dawei, et al. "epiC: an extensible and scalable system for processing big data." Proceedings of the VLDB Endowment 7.7 (2014).
[12] "The GraphLab Abstraction".


More information

Survey Paper on Traditional Hadoop and Pipelined Map Reduce

Survey Paper on Traditional Hadoop and Pipelined Map Reduce International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,

More information

GraphHP: A Hybrid Platform for Iterative Graph Processing

GraphHP: A Hybrid Platform for Iterative Graph Processing GraphHP: A Hybrid Platform for Iterative Graph Processing Qun Chen, Song Bai, Zhanhuai Li, Zhiying Gou, Bo Suo and Wei Pan Northwestern Polytechnical University Xi an, China {chenbenben, baisong, lizhh,

More information

Harp-DAAL for High Performance Big Data Computing

Harp-DAAL for High Performance Big Data Computing Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big

More information

MapReduce: Algorithm Design for Relational Operations

MapReduce: Algorithm Design for Relational Operations MapReduce: Algorithm Design for Relational Operations Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec Projection π Projection in MapReduce Easy Map over tuples, emit

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 14: Distributed Graph Processing Motivation Many applications require graph processing E.g., PageRank Some graph data sets are very large

More information

Distributed Computation Models

Distributed Computation Models Distributed Computation Models SWE 622, Spring 2017 Distributed Software Engineering Some slides ack: Jeff Dean HW4 Recap https://b.socrative.com/ Class: SWE622 2 Review Replicating state machines Case

More information

CS 5220: Parallel Graph Algorithms. David Bindel

CS 5220: Parallel Graph Algorithms. David Bindel CS 5220: Parallel Graph Algorithms David Bindel 2017-11-14 1 Graphs Mathematically: G = (V, E) where E V V Convention: V = n and E = m May be directed or undirected May have weights w V : V R or w E :

More information

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011 Data-Intensive Information Processing Applications! Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Survey on MapReduce Scheduling Algorithms

Survey on MapReduce Scheduling Algorithms Survey on MapReduce Scheduling Algorithms Liya Thomas, Mtech Student, Department of CSE, SCTCE,TVM Syama R, Assistant Professor Department of CSE, SCTCE,TVM ABSTRACT MapReduce is a programming model used

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b International Conference on Artificial Intelligence and Engineering Applications (AIEA 2016) Implementation of Parallel CASINO Algorithm Based on MapReduce Li Zhang a, Yijie Shi b State key laboratory

More information

Apache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM

Apache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM Apache Giraph: Facebook-scale graph processing infrastructure 3/31/2014 Avery Ching, Facebook GDM Motivation Apache Giraph Inspired by Google s Pregel but runs on Hadoop Think like a vertex Maximum value

More information

One Trillion Edges. Graph processing at Facebook scale

One Trillion Edges. Graph processing at Facebook scale One Trillion Edges Graph processing at Facebook scale Introduction Platform improvements Compute model extensions Experimental results Operational experience How Facebook improved Apache Giraph Facebook's

More information

Programming Models MapReduce

Programming Models MapReduce Programming Models MapReduce Majd Sakr, Garth Gibson, Greg Ganger, Raja Sambasivan 15-719/18-847b Advanced Cloud Computing Fall 2013 Sep 23, 2013 1 MapReduce In a Nutshell MapReduce incorporates two phases

More information

Link Analysis in the Cloud

Link Analysis in the Cloud Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)

More information

FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN. School of Computing, SASTRA University, Thanjavur , India

FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN. School of Computing, SASTRA University, Thanjavur , India Volume 115 No. 7 2017, 105-110 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN Balaji.N 1,

More information

arxiv: v1 [cs.db] 26 Apr 2012

arxiv: v1 [cs.db] 26 Apr 2012 Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud Yucheng Low Carnegie Mellon University ylow@cs.cmu.edu Joseph Gonzalez Carnegie Mellon University jegonzal@cs.cmu.edu

More information

Graph-Parallel Problems. ML in the Context of Parallel Architectures

Graph-Parallel Problems. ML in the Context of Parallel Architectures Case Study 4: Collaborative Filtering Graph-Parallel Problems Synchronous v. Asynchronous Computation Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 20 th, 2014

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 14: Distributed Graph Processing Motivation Many applications require graph processing E.g., PageRank Some graph data sets are very large

More information

Distributed computing: index building and use

Distributed computing: index building and use Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput

More information

Map-Reduce and Adwords Problem

Map-Reduce and Adwords Problem Map-Reduce and Adwords Problem Map-Reduce and Adwords Problem Miłosz Kadziński Institute of Computing Science Poznan University of Technology, Poland www.cs.put.poznan.pl/mkadzinski/wpi Big Data (1) Big

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems

Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems ABSTRACT Minyang Han David R. Cheriton School of Computer Science University of Waterloo m25han@uwaterloo.ca

More information

Hadoop Map Reduce 10/17/2018 1

Hadoop Map Reduce 10/17/2018 1 Hadoop Map Reduce 10/17/2018 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind of functional programming We focus on the MapReduce execution engine of Hadoop through YARN 10/17/2018

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

Implementation of Aggregation of Map and Reduce Function for Performance Improvisation 2016 IJSRSET Volume 2 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja

More information

CS-2510 COMPUTER OPERATING SYSTEMS

CS-2510 COMPUTER OPERATING SYSTEMS CS-2510 COMPUTER OPERATING SYSTEMS Cloud Computing MAPREDUCE Dr. Taieb Znati Computer Science Department University of Pittsburgh MAPREDUCE Programming Model Scaling Data Intensive Application Functional

More information

Distributed computing: index building and use

Distributed computing: index building and use Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput

More information

A Parallel Community Detection Algorithm for Big Social Networks

A Parallel Community Detection Algorithm for Big Social Networks A Parallel Community Detection Algorithm for Big Social Networks Yathrib AlQahtani College of Computer and Information Sciences King Saud University Collage of Computing and Informatics Saudi Electronic

More information

Hadoop/MapReduce Computing Paradigm

Hadoop/MapReduce Computing Paradigm Hadoop/Reduce Computing Paradigm 1 Large-Scale Data Analytics Reduce computing paradigm (E.g., Hadoop) vs. Traditional database systems vs. Database Many enterprises are turning to Hadoop Especially applications

More information

Massive Online Analysis - Storm,Spark

Massive Online Analysis - Storm,Spark Massive Online Analysis - Storm,Spark presentation by R. Kishore Kumar Research Scholar Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Kharagpur-721302, India (R

More information

Memory-Optimized Distributed Graph Processing. through Novel Compression Techniques

Memory-Optimized Distributed Graph Processing. through Novel Compression Techniques Memory-Optimized Distributed Graph Processing through Novel Compression Techniques Katia Papakonstantinopoulou Joint work with Panagiotis Liakos and Alex Delis University of Athens Athens Colloquium in

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Presented by: Dishant Mittal Authors: Juwei Shi, Yunjie Qiu, Umar Firooq Minhas, Lemei Jiao, Chen Wang, Berthold Reinwald and Fatma

More information

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 22: MapReduce We are talking about parallel query processing There exist two main types of engines: Parallel DBMSs (last lecture + quick review)

More information

Dept. Of Computer Science, Colorado State University

Dept. Of Computer Science, Colorado State University CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [HADOOP/HDFS] Trying to have your cake and eat it too Each phase pines for tasks with locality and their numbers on a tether Alas within a phase, you get one,

More information

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Clustering Lecture 8: MapReduce

Clustering Lecture 8: MapReduce Clustering Lecture 8: MapReduce Jing Gao SUNY Buffalo 1 Divide and Conquer Work Partition w 1 w 2 w 3 worker worker worker r 1 r 2 r 3 Result Combine 4 Distributed Grep Very big data Split data Split data

More information

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 8: Analyzing Graphs, Redux (1/2) March 20, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on

More information

SQL-to-MapReduce Translation for Efficient OLAP Query Processing

SQL-to-MapReduce Translation for Efficient OLAP Query Processing , pp.61-70 http://dx.doi.org/10.14257/ijdta.2017.10.6.05 SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce Hyeon Gyu Kim Department of Computer Engineering, Sahmyook University,

More information

Distributed Computations MapReduce. adapted from Jeff Dean s slides

Distributed Computations MapReduce. adapted from Jeff Dean s slides Distributed Computations MapReduce adapted from Jeff Dean s slides What we ve learnt so far Basic distributed systems concepts Consistency (sequential, eventual) Fault tolerance (recoverability, availability)

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

STATS Data Analysis using Python. Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns

STATS Data Analysis using Python. Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns STATS 700-002 Data Analysis using Python Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns Unit 3: parallel processing and big data The next few lectures will focus on big

More information

Introduction To Graphs and Networks. Fall 2013 Carola Wenk

Introduction To Graphs and Networks. Fall 2013 Carola Wenk Introduction To Graphs and Networks Fall 203 Carola Wenk On the Internet, links are essentially weighted by factors such as transit time, or cost. The goal is to find the shortest path from one node to

More information

DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing

DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing SCHOOL OF COMPUTER SCIENCE AND ENGINEERING DFA-G: A Unified Programming Model for Vertex-centric Parallel Graph Processing Bo Suo, Jing Su, Qun Chen, Zhanhuai Li, Wei Pan 2016-08-19 1 ABSTRACT Many systems

More information

The amount of data increases every day Some numbers ( 2012):

The amount of data increases every day Some numbers ( 2012): 1 The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect

More information

Outline. Graphs. Divide and Conquer.

Outline. Graphs. Divide and Conquer. GRAPHS COMP 321 McGill University These slides are mainly compiled from the following resources. - Professor Jaehyun Park slides CS 97SI - Top-coder tutorials. - Programming Challenges books. Outline Graphs.

More information

2/26/2017. The amount of data increases every day Some numbers ( 2012):

2/26/2017. The amount of data increases every day Some numbers ( 2012): The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect to

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

University of Maryland. Tuesday, March 2, 2010

University of Maryland. Tuesday, March 2, 2010 Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Graph Algorithms. Revised based on the slides by Ruoming Kent State

Graph Algorithms. Revised based on the slides by Ruoming Kent State Graph Algorithms Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs

More information