Graph Algorithms using Map-Reduce. Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web

Similar documents
G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu

Graph Data Processing with MapReduce

Lecture 11: Graph algorithms! Claudia Hauff (Web Information Systems)!

MapReduce Design Patterns

University of Maryland. Tuesday, March 2, 2010

Graph Algorithms. Revised based on the slides by Ruoming Kent State

Data-Intensive Distributed Computing

Data-Intensive Computing with MapReduce

TI2736-B Big Data Processing. Claudia Hauff

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Link Analysis in the Cloud

Map-Reduce Applications: Counting, Graph Shortest Paths

Map-Reduce Applications: Counting, Graph Shortest Paths

Graph Algorithms. Chapter 5

Parallel Dijkstra s Algorithm

Introduction To Graphs and Networks. Fall 2013 Carola Wenk

Copyright 2000, Kevin Wayne 1

CCF: Fast and Scalable Connected Component Computation in MapReduce

COSC 6339 Big Data Analytics. Graph Algorithms and Apache Giraph

Implementing Mapreduce Algorithms In Hadoop Framework Guide : Dr. SOBHAN BABU

Parallel Breadth First Search

Laboratory Session: MapReduce

Parallel Pointers: Graphs

This course is intended for 3rd and/or 4th year undergraduate majors in Computer Science.

Graphs. Tessema M. Mengistu Department of Computer Science Southern Illinois University Carbondale Room - Faner 3131

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader

Homework Assignment #3 Graph

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

CS4800: Algorithms & Data Jonathan Ullman

Algorithm Design and Analysis

Elementary Graph Algorithms. Ref: Chapter 22 of the text by Cormen et al. Representing a graph:

CS 220: Discrete Structures and their Applications. graphs zybooks chapter 10

Wedge A New Frontier for Pull-based Graph Processing. Samuel Grossman and Christos Kozyrakis Platform Lab Retreat June 8, 2018

Figure 1: A directed graph.

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33

Graph. Vertex. edge. Directed Graph. Undirected Graph

Introduction To Graphs and Networks. Fall 2013 Carola Wenk

Graphs. Edges may be directed (from u to v) or undirected. Undirected edge eqvt to pair of directed edges

22.1 Representations of graphs

2. True or false: even though BFS and DFS have the same space complexity, they do not always have the same worst case asymptotic time complexity.

Graph Theory for Network Science

Distance Estimation for Very Large Networks using MapReduce and Network Structure Indices

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

CS 361 Data Structures & Algs Lecture 13. Prof. Tom Hayes University of New Mexico

Introduction to MapReduce

LECTURE 17 GRAPH TRAVERSALS

Efficient and Scalable Friend Recommendations

Graph definitions. There are two kinds of graphs: directed graphs (sometimes called digraphs) and undirected graphs. An undirected graph

Routing Algorithms. CS158a Chris Pollett Apr 4, 2007.

CSE 373 NOVEMBER 20 TH TOPOLOGICAL SORT

An undirected graph G can be represented by G = (V, E), where

Lecture 3, Review of Algorithms. What is Algorithm?

CS 103 Six Degrees of Kevin Bacon

Gearing Hadoop towards HPC sytems

CSI 604 Elementary Graph Algorithms

Antonio Fernández Anta

MapReduce for Graph Algorithms

Graphs. Data Structures and Algorithms CSE 373 SU 18 BEN JONES 1

B490 Mining the Big Data. 5. Models for Big Data

Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing

modern database systems lecture 10 : large-scale graph processing


DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI. Department of Computer Science and Engineering CS6301 PROGRAMMING DATA STRUCTURES II

1. Introduction to MapReduce

Graphs. Graph G = (V, E) Types of graphs E = O( V 2 ) V = set of vertices E = set of edges (V V)

MapReduce Patterns, Algorithms, and Use Cases

CSE 373: Data Structures and Algorithms

Algorithm Design and Analysis

Unit 2: Algorithmic Graph Theory

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs

Graphs. CSE 2320 Algorithms and Data Structures Alexandra Stefan and Vassilis Athitsos University of Texas at Arlington

Undirected Graphs. V = { 1, 2, 3, 4, 5, 6, 7, 8 } E = { 1-2, 1-3, 2-3, 2-4, 2-5, 3-5, 3-7, 3-8, 4-5, 5-6 } n = 8 m = 11

Frameworks for Graph-Based Problems

Clustering Lecture 8: MapReduce

Graph representation

GRAPHS Lecture 17 CS2110 Spring 2014

Graph Algorithms. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.

Improvement of path analysis algorithm in social networks based on HBase

Review: Graph Theory and Representation

CS 3410 Ch 14 Graphs and Paths

Analyzing the Programmability and the Efficiency of Large-Scale Graph Processing on. A Project Report

COP 4531 Complexity & Analysis of Data Structures & Algorithms

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

RDFPath. Path Query Processing on Large RDF Graphs with MapReduce. 29 May 2011

Map Reduce & Hadoop Recommended Text:

Jure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah

CSE 417: Algorithms and Computational Complexity. 3.1 Basic Definitions and Applications. Goals. Chapter 3. Winter 2012 Graphs and Graph Algorithms

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Graphs - I CS 2110, Spring 2016

MapReduce Algorithms

Hadoop Map Reduce 10/17/2018 1

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Sampling Large Graphs: Algorithms and Applications

CS/COE 1501 cs.pitt.edu/~bill/1501/ Graphs

Biological Networks Analysis

Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments

Graph Mining and Social Network Analysis

Lecture 9 Graph Traversal

Transcription:

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter)

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter) Transportation networks (roads, trains, flights etc)

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter) Transportation networks (roads, trains, flights etc) Human body can be seen as a graph of genes, proteins, cells etc

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter) Transportation networks (roads, trains, flights etc) Human body can be seen as a graph of genes, proteins, cells etc Typical graph problems and algorithms: Graph search and path planning

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter) Transportation networks (roads, trains, flights etc) Human body can be seen as a graph of genes, proteins, cells etc Typical graph problems and algorithms: Graph search and path planning Graph clustering

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter) Transportation networks (roads, trains, flights etc) Human body can be seen as a graph of genes, proteins, cells etc Typical graph problems and algorithms: Graph search and path planning Graph clustering Minimum spanning trees

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter) Transportation networks (roads, trains, flights etc) Human body can be seen as a graph of genes, proteins, cells etc Typical graph problems and algorithms: Graph search and path planning Graph clustering Minimum spanning trees Bipartite graph matching

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter) Transportation networks (roads, trains, flights etc) Human body can be seen as a graph of genes, proteins, cells etc Typical graph problems and algorithms: Graph search and path planning Graph clustering Minimum spanning trees Bipartite graph matching Maximum flow

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Social networks on social networking sites like Facebook, IMDB, email, text messages and tweet flows (like Twitter) Transportation networks (roads, trains, flights etc) Human body can be seen as a graph of genes, proteins, cells etc Typical graph problems and algorithms: Graph search and path planning Graph clustering Minimum spanning trees Bipartite graph matching Maximum flow Finding special nodes

Big Graphs Big graphs are typically sparse so adjacency list representation is much more space efficient. Typical value may be m = O(n), where m is the number of links and n is the number of nodes in the graph. Example: Facebook has around 1 billion users but each user, on an average, may only have few hundred friends, so the graph is very sparse.

Parallel Depth-First Search Assume that all links have unit distance for simplification. Assume that graph is connected. We are interested in finding the shortest distance from a source vertex to all other vertices. Since the distances are all one, this is the same as a breadth-first search. The largest shortest distance (starting from any node) is known as the diameter. Each node is represented by a node id n (an integer), its current distance (initialized to ) and its adjacency list data structure N.

Parallel Depth-First Search Assume that all links have unit distance for simplification. Assume that graph is connected. We are interested in finding the shortest distance from a source vertex to all other vertices. Since the distances are all one, this is the same as a breadth-first search. The largest shortest distance (starting from any node) is known as the diameter. Each node is represented by a node id n (an integer), its current distance (initialized to ) and its adjacency list data structure N. Each mapper emits a key-value pair for each neighbor on the nodes adjacency list. The key contains the node id of the neighbor, and the value is the current distance to the node plus one.

Parallel Depth-First Search Assume that all links have unit distance for simplification. Assume that graph is connected. We are interested in finding the shortest distance from a source vertex to all other vertices. Since the distances are all one, this is the same as a breadth-first search. The largest shortest distance (starting from any node) is known as the diameter. Each node is represented by a node id n (an integer), its current distance (initialized to ) and its adjacency list data structure N. Each mapper emits a key-value pair for each neighbor on the nodes adjacency list. The key contains the node id of the neighbor, and the value is the current distance to the node plus one. After shuffle and sort, reducers will receive keys corresponding to the destination node ids and distances corresponding to all paths leading to that node.the reducer will select the shortest of these distances and then update the distance in the node data structure.

Parallel Depth-First Search Assume that all links have unit distance for simplification. Assume that graph is connected. We are interested in finding the shortest distance from a source vertex to all other vertices. Since the distances are all one, this is the same as a breadth-first search. The largest shortest distance (starting from any node) is known as the diameter. Each node is represented by a node id n (an integer), its current distance (initialized to ) and its adjacency list data structure N. Each mapper emits a key-value pair for each neighbor on the nodes adjacency list. The key contains the node id of the neighbor, and the value is the current distance to the node plus one. After shuffle and sort, reducers will receive keys corresponding to the destination node ids and distances corresponding to all paths leading to that node.the reducer will select the shortest of these distances and then update the distance in the node data structure. Each iteration of the map-reduce algorithm expands the search frontier by one hop, and, eventually, all nodes will be discovered with their shortest distances (assuming a fully-connected graph).

Parallel Breadth-First Search Pseudo-code

Implementation Tips Note that in this algorithm we are overloading the value type, which can either be a distance (integer) or a complex data structure representing a node. This can be done in Hadoop by creating a wrapper object with an indicator variable specifying the type of the content. Or by creating an abstract class with to sub-classes.

Implementation Tips Note that in this algorithm we are overloading the value type, which can either be a distance (integer) or a complex data structure representing a node. This can be done in Hadoop by creating a wrapper object with an indicator variable specifying the type of the content. Or by creating an abstract class with to sub-classes. Since the graph is connected, all nodes are reachable, and since all edge distances are one, all discovered nodes are guaranteed to have the shortest distances (i.e., there is not a shorter path that goes through a node that hasnt been discovered).

Implementation Tips Note that in this algorithm we are overloading the value type, which can either be a distance (integer) or a complex data structure representing a node. This can be done in Hadoop by creating a wrapper object with an indicator variable specifying the type of the content. Or by creating an abstract class with to sub-classes. Since the graph is connected, all nodes are reachable, and since all edge distances are one, all discovered nodes are guaranteed to have the shortest distances (i.e., there is not a shorter path that goes through a node that hasnt been discovered). Global Hadoop counters can be defined to count the number of nodes that have distances of. At the end of the job, the driver program can access the final counter value and check to see if another iteration is necessary.

Implementation Tips Note that in this algorithm we are overloading the value type, which can either be a distance (integer) or a complex data structure representing a node. This can be done in Hadoop by creating a wrapper object with an indicator variable specifying the type of the content. Or by creating an abstract class with to sub-classes. Since the graph is connected, all nodes are reachable, and since all edge distances are one, all discovered nodes are guaranteed to have the shortest distances (i.e., there is not a shorter path that goes through a node that hasnt been discovered). Global Hadoop counters can be defined to count the number of nodes that have distances of. At the end of the job, the driver program can access the final counter value and check to see if another iteration is necessary. Most real-life graphs have a small diameter. Search for six degrees of freedom or Facebook s experiment that showed 4.74 degrees of freedom over a large set of users.

Shortest Paths in Weighted Graphs Instead of d + 1, the mapper now emits d + w, where w is the weight of the edge. Termination will be different: The algorithm can terminate when shortest distances at every node no longer change. The worst-case is that the number of iterations equals the number of nodes. But most real-life graph will terminate for iterations somewhere close to the diameter of the graph.

References Chapter 5 in Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer. Hadoop: The Definitive Guide (3rd ed.). Tom White, 2012, O Reilly.