Graph-Parallel Problems. ML in the Context of Parallel Architectures

Case Study 4: Collaborative Filtering
Graph-Parallel Problems
Synchronous v. Asynchronous Computation

Machine Learning for Big Data, CSE547/STAT548, University of Washington
Emily Fox, February 20th, 2014

ML in the Context of Parallel Architectures
GPUs, multicore, clusters, clouds, supercomputers
But scalable ML in these systems is hard, especially in terms of:
1. Programmability
2. Data distribution
3. Failures

Move Towards Higher-Level Abstraction

Distributed computing challenges are hard and annoying!
1. Programmability
2. Data distribution
3. Failures

High-level abstractions try to simplify distributed programming by hiding these challenges: they provide different levels of robustness to failures, optimize data movement and communication, and protect against race conditions. Generally, you are still on your own with respect to designing parallel algorithms.

Some common parallel abstractions:
Lower-level:
- Pthreads: abstraction for threads on a single machine
- MPI: abstraction for distributed communication in a cluster of computers
Higher-level:
- Map-Reduce (Hadoop: open-source version): mostly data-parallel problems
- GraphLab: for graph-structured distributed problems

Simplest Type of Parallelism: Data Parallel Problems

You have already learned a classifier. What's the test error? You have 10B labeled documents and 1000 machines.
Problems that can be broken into independent subproblems are called data-parallel (or embarrassingly parallel). Map-Reduce is a great tool for this, and is the focus of today's lecture. But first, a simple example.

Data Parallelism (MapReduce)

[Figure: independent subproblems distributed across CPUs 1-4.]
Solve a huge number of independent subproblems, e.g., extract features in images.

Map-Reduce Abstraction

Map:
- Data-parallel over elements, e.g., documents
- Generates (key, value) pairs; value can be any data type
Reduce:
- Aggregates the values for each key; must be a commutative-associative operation
- Data-parallel over keys
- Generates (key, value) pairs

Map-Reduce has a long history in functional programming, but was popularized by Google, and subsequently by the open-source Hadoop implementation from Yahoo!
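The abstraction above can be sketched in a few lines. This is a minimal in-memory word-count sketch, not Hadoop's actual API; the helper names (map_fn, reduce_fn, map_reduce) are ours, and the "shuffle" is simulated by grouping pairs in a dictionary.

```python
from collections import defaultdict

def map_fn(doc):
    # Map: data-parallel over elements (documents); emit (key, value) pairs.
    for word in doc.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce: aggregate all values for one key.
    # Addition is commutative-associative, as the abstraction requires.
    return key, sum(values)

def map_reduce(docs):
    groups = defaultdict(list)
    for doc in docs:                      # map phase (parallel in practice)
        for k, v in map_fn(doc):
            groups[k].append(v)           # shuffle: route each pair by its key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())  # reduce phase

counts = map_reduce(["the cat sat", "the cat ran"])
# counts == {"the": 2, "cat": 2, "sat": 1, "ran": 1}
```

In a real cluster, the map calls run on different machines and the shuffle hashes each key k to machine h[k]; the grouping dictionary here stands in for that step.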

Map-Reduce Execution Overview

Map Phase: split the big data across machines M1..M1000; each mapper emits (k_i, v_i) pairs.
Shuffle Phase: assign each tuple (k_i, v_i) to machine h[k_i].
Reduce Phase: each machine aggregates the values for the keys assigned to it.

Issues with Map-Reduce Abstraction
- Often all data gets moved around the cluster; very bad for iterative settings
- Definition of Map & Reduce functions can be unintuitive in many apps
- Graphs are challenging
- Computation is synchronous

SGD for Matrix Factorization in Map-Reduce?

\[
\begin{bmatrix} L_u^{(t+1)} \\ R_v^{(t+1)} \end{bmatrix}
\leftarrow
\begin{bmatrix} (1-\eta_t\lambda_u)\,L_u^{(t)} - \eta_t\,\epsilon_t\,R_v^{(t)} \\ (1-\eta_t\lambda_v)\,R_v^{(t)} - \eta_t\,\epsilon_t\,L_u^{(t)} \end{bmatrix},
\qquad
\epsilon_t = L_u^{(t)} \cdot R_v^{(t)} - r_{uv}
\]

Map and Reduce functions??? Map-Reduce is data-parallel over all mappers, and data-parallel over reducers with the same key. Here, we make one update at a time!

Matrix Factorization as a Graph

[Figure: bipartite user-movie graph with rating edges (e.g., ratings 4, 3, 2, 5) on movies such as "Women on the Verge of a Nervous Breakdown", "The Celebration", "City of God", "Wild Strawberries", "La Dolce Vita".]
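The update rule above can be sketched for a single observed rating. This is an illustrative sketch, not code from the lecture: the step size eta and regularization lam are names we introduce, and the loop repeatedly applies the one-rating update to show that it is inherently sequential.

```python
import numpy as np

def sgd_update(L_u, R_v, r_uv, eta=0.05, lam=0.01):
    # One SGD step on a single rating r_uv, following the slide's rule:
    # eps_t = L_u . R_v - r_uv, then both factors move against the gradient.
    eps = L_u @ R_v - r_uv
    L_new = (1 - eta * lam) * L_u - eta * eps * R_v
    R_new = (1 - eta * lam) * R_v - eta * eps * L_u
    return L_new, R_new

rng = np.random.default_rng(0)
L_u, R_v = rng.normal(size=3), rng.normal(size=3)
for _ in range(200):                      # updates happen one rating at a time
    L_u, R_v = sgd_update(L_u, R_v, r_uv=4.0)
# L_u @ R_v now approximates the observed rating 4.0
```

Note that each step reads the factors the previous step just wrote; this sequential dependency is exactly what makes the algorithm awkward to express as data-parallel Map and Reduce functions.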

Flashback to 1998
Why? The first Google advantage: a graph algorithm, and a system to support it!

Social media, science, advertising, the web:
Graphs encode the relationships between people, facts, products, interests, and ideas.
Big: hundreds of billions of vertices and edges, with rich metadata.
- Facebook (10/2012): 1B users, 144B friendships
- Twitter (2011): 15B follower edges

Facebook Graph
[Slide from a Facebook Engineering presentation.]

Label a Face and Propagate

Pairwise similarity is not enough: not similar enough to be sure.

Propagate Similarities & Co-occurrences for Accurate Predictions
[Figure: similarity edges and co-occurring faces provide further evidence.]

Example: Estimate Political Bias
[Figure: graph of users labeled along a liberal-conservative spectrum.]

Topic Modeling (e.g., LDA)
[Figure: bipartite graph of documents and words such as "cat", "apple", "growth", "hat", "plant".]

ML Tasks Beyond Data-Parallelism

Data-Parallel (Map Reduce):
- Feature Extraction
- Cross Validation
- Computing Sufficient Statistics

Graph-Parallel:
- Graphical Models: Gibbs Sampling, Belief Propagation, Variational Optimization
- Collaborative Filtering: Tensor Factorization
- Semi-Supervised Learning: Label Propagation, CoEM
- Graph Analysis: PageRank, Triangle Counting

Example of a Graph-Parallel Algorithm

PageRank

What's the rank of this user? It depends on the rank of those who follow her, which depends on the rank of those who follow them... Loops in the graph mean we must iterate!

PageRank Iteration

\[
R[i] = \alpha + (1-\alpha) \sum_{(j,i)\in E} w_{ji}\, R[j]
\]

- α is the random reset probability
- w_ji is the probability of transitioning (similarity) from j to i
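The iteration above can be sketched directly. This is a toy sketch, not the lecture's code: edge weights w_ji are taken to be uniform over each vertex's out-edges (an assumption), and the graph is a tiny three-vertex example.

```python
def pagerank(out_edges, alpha=0.15, iters=50):
    # R[i] = alpha + (1 - alpha) * sum over in-edges (j,i) of w_ji * R[j],
    # with w_ji = 1 / out-degree(j) here.
    vertices = list(out_edges)
    R = {v: 1.0 for v in vertices}
    for _ in range(iters):                 # loops in the graph => iterate
        new_R = {}
        for i in vertices:
            s = sum(R[j] / len(out_edges[j])
                    for j in vertices if i in out_edges[j])
            new_R[i] = alpha + (1 - alpha) * s
        R = new_R                          # synchronous update of all ranks
    return R

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
# "c" receives rank from both "a" and "b", so it ends up ranked highest
```

Note that every vertex's new rank is computed from the previous round's ranks; this is the synchronous (Jacobi-style) schedule discussed later in the lecture.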

Properties of Graph-Parallel Algorithms
- Dependency graph
- Local updates ("my rank" depends on "friends' ranks")
- Iterative computation

Addressing Graph-Parallel ML

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics.
Graph-Parallel (graph-parallel abstraction): Graphical Models (Gibbs Sampling, Belief Propagation, Variational Optimization), Semi-Supervised Learning (Label Propagation, CoEM), Collaborative Filtering (Tensor Factorization), Data-Mining (PageRank, Triangle Counting).

Graph Computation: Synchronous v. Asynchronous

Bulk Synchronous Parallel Model: Pregel (Giraph) [Valiant '90]
Compute → Communicate → Barrier

Map-Reduce Execution Overview (recap): split data across machines; map, shuffle each tuple (k_i, v_i) to machine h[k_i], reduce.

BSP Execution Overview

Compute Phase: split the big graph across machines M1..M1000; each machine updates its vertices, producing (vid, vid, val) tuples.
Communicate Phase: message the owning machine for every edge (vid, vid, val); then barrier.
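The compute/communicate/barrier loop can be sketched as a toy BSP runner. The names below (bsp_run, compute, inbox/outbox) are illustrative, not the Pregel or Giraph API; the barrier is simulated by swapping the message sets between supersteps.

```python
def bsp_run(out_edges, init, compute, supersteps):
    state = dict(init)
    inbox = {v: [] for v in out_edges}
    for _ in range(supersteps):
        outbox = {v: [] for v in out_edges}
        for v in out_edges:                  # compute phase: every vertex runs
            state[v], msg = compute(v, state[v], inbox[v])
            for u in out_edges[v]:           # communicate phase: send on edges
                outbox[u].append(msg)
        inbox = outbox                       # barrier: next superstep begins
    return state

# Example vertex program: propagate the maximum value to every vertex.
def max_compute(v, value, messages):
    new_value = max([value] + messages)
    return new_value, new_value

final = bsp_run({"a": ["b"], "b": ["c"], "c": ["a"]},
                {"a": 3, "b": 7, "c": 1}, max_compute, supersteps=3)
# After enough supersteps, every vertex holds the global maximum, 7
```

Every vertex computes in every superstep and all messages are delivered only at the barrier, which is exactly what makes the model simple to build and fault-tolerant, but synchronous.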

The bulk synchronous parallel model is provably inefficient for some ML tasks.

Analyzing Belief Propagation [Gonzalez, Low, Guestrin '09]
[Figure: chain from A to B; a priority queue enables smart scheduling, focusing computation on the most important (influential) updates.]

Asynchronous Belief Propagation
Challenge = boundaries.
[Figure: synthetic noisy image and its graphical model; cumulative vertex updates show many updates at region boundaries, few elsewhere.]
The algorithm identifies and focuses on hidden sequential structure.

BSP ML Problem: Synchronous Algorithms can be Inefficient
[Figure: runtime in seconds vs. number of CPUs (1-8) for Bulk Synchronous BP (e.g., Pregel) and Asynchronous Splash BP; the synchronous runtime remains far higher.]
Theorem: Bulk Synchronous BP is O(#vertices) slower than Asynchronous BP.

Synchronous v. Asynchronous

Bulk synchronous processing:
- Computation proceeds in phases; all vertices participate in a phase (though a vertex may no-op), and all messages are sent
- Simpler to build, like Map-Reduce: no worries about race conditions; the barrier guarantees data consistency
- Simpler to make fault-tolerant: save state at the barrier
- Slower convergence for many ML problems
- In matrix-land, called Jacobi iteration
- Implemented by Google Pregel, 2010

Asynchronous processing:
- Vertices see the latest information from their neighbors; most closely related to sequential execution
- Harder to build: race conditions can happen all the time and must be protected against; more complex fault tolerance; when are you done? Must implement a scheduler over vertices
- Faster convergence for many ML problems
- In matrix-land, called Gauss-Seidel iteration
- Implemented by GraphLab, 2010, 2012

Acknowledgements
Slides based on Carlos Guestrin's GraphLab talk.
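The Jacobi/Gauss-Seidel distinction can be seen in a few lines. This is an illustrative sketch on a three-vertex ring with PageRank-style updates (our own toy setup): the Jacobi step reads only last round's values, while the Gauss-Seidel step applies updates in place so later vertices see fresh values.

```python
def jacobi_step(R, in_edge, alpha=0.15):
    # Synchronous: every vertex reads the OLD vector, like one BSP superstep.
    return {i: alpha + (1 - alpha) * R[j] for i, j in in_edge.items()}

def gauss_seidel_step(R, in_edge, alpha=0.15):
    # Asynchronous flavor: updates land immediately, so vertices updated
    # later in the sweep already see their neighbors' new values.
    R = dict(R)
    for i, j in in_edge.items():
        R[i] = alpha + (1 - alpha) * R[j]
    return R

in_edge = {"a": "c", "b": "a", "c": "b"}   # each vertex has one in-neighbor
R0 = {"a": 0.0, "b": 0.0, "c": 5.0}
# In one sweep, Gauss-Seidel propagates c's value two hops (c -> a -> b),
# while Jacobi moves it only one hop (c -> a), so Gauss-Seidel makes more
# progress per sweep; this mirrors the faster convergence claimed above.
```

The fixed point of both iterations is the same; the asynchronous schedule just reaches it in fewer sweeps on problems with long dependency chains.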