Distributed Machine Learning: An Intro. Chen Huang

Distributed Machine Learning: An Intro. Chen Huang, Feature Engineering Group, Data Mining Lab, Big Data Research Center, UESTC

Contents: Background; Some Examples; Model Parallelism & Data Parallelism; Parallelization Mechanisms (Synchronous / Asynchronous); Parallelization Frameworks (MPI / AllReduce / MapReduce / Parameter Server, GraphLab / Spark GraphX).

Background: Why Distributed ML? For the big data problem, efficient algorithms and online learning / data streams are feasible. But what about high dimensionality? Distributed machines: the more, the merrier.

Background: Why Distributed ML? Big data can be handled with efficient algorithms, online learning / data streams, or distributed machines. A big model, in turn, has to be split and distributed across machines.

Background: Distributed Machine Learning. A big model over big data.

Background: Overview. Distributed Machine Learning. Motivation: a big model over big data. DML: multiple workers cooperate with each other through communication. Targets: get the job done (convergence, ...); minimize communication cost (I/O, ...); maximize effect (time, performance, ...).

Example: K-means

Example: Distributed K-means?

Example: Spark K-means
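The slides walk through the Spark implementation with figures; as a rough illustration, a K-means job in PySpark might look like the sketch below. The file path, column names, and k=3 are assumptions for the example, not values from the slides.

```python
# Hypothetical sketch: K-means with PySpark's MLlib (spark.ml API).
# Input path, feature columns, and k are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("kmeans-example").getOrCreate()

# Load the points; assume each row has numeric feature columns "x" and "y".
df = spark.read.csv("points.csv", header=True, inferSchema=True)
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

# Fit K-means with k=3. Spark distributes the assignment and centroid-update
# steps across the cluster; only the small set of centroids returns to the driver.
model = KMeans(k=3, seed=42, featuresCol="features").fit(features)
print(model.clusterCenters())

spark.stop()
```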

Example: Item filter. Given two files, output the key-value pairs in file B whose key exists in file A. File B is super large (e.g., 100 GB). File A holds keys (Key1, Key12, Key3, Key5, ...); file B holds key-value pairs (Key1: val1, Key2: val2, Key4: val4, Key3: val3, Key5: val5, ...). What if A is also super large?

Example: Item filter. If A fits in memory, one approach is to give every worker a full copy of A alongside its chunk of B, so each worker can filter its own records of B against A.

Example: Item filter. When A is also too large to replicate: hash-partition both A and B by key, so records with the same key are routed to the same worker; each worker then filters its partition of B against its partition of A.
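To make the hash-partitioning idea concrete, here is a toy single-process sketch in plain Python. The file names, record format, and partition count are invented for illustration; a real job would rely on a MapReduce or Spark shuffle rather than in-memory dictionaries.

```python
# Toy sketch of the hash-partitioned item filter.
# "a.txt" holds one key per line; "b.txt" holds key<TAB>value lines (assumed formats).
from collections import defaultdict

NUM_PARTITIONS = 4
part = lambda key: hash(key) % NUM_PARTITIONS

# Route the keys of A and the records of B to partitions with the same hash
# function, so matching keys always land in the same partition.
a_parts, b_parts = defaultdict(set), defaultdict(list)
with open("a.txt") as fa:
    for line in fa:
        key = line.strip()
        a_parts[part(key)].add(key)
with open("b.txt") as fb:
    for line in fb:
        key, value = line.rstrip("\n").split("\t")
        b_parts[part(key)].append((key, value))

# Each "worker" only filters its own partition of B against its partition of A.
for p in range(NUM_PARTITIONS):
    for key, value in b_parts[p]:
        if key in a_parts[p]:
            print(key, value)
```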

Overview. * See the AAAI 2017 Workshop on Distributed Machine Learning for more information.

How To Distribute: Key Problems. How to split: data parallelism / model parallelism; data / parameter dependency. How to aggregate messages: parallelization mechanisms; consensus between local & global parameters; does the algorithm converge? Other concerns: communication cost, ...

How To Split: Data Parallelism. 1. Partition the data. 2. Train in parallel. 3. Combine the local updates. 4. Refresh each local model with the new parameters.

How To Split: Model Parallelism. 1. Partition the model across multiple local workers. 2. The workers collaborate with each other to perform the optimization.

How To Split: Model Parallelism & Data Parallelism. Example: distributed logistic regression (a sketch of the two splits follows below).
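As a rough numpy sketch of the two splits for logistic regression: the gradient can be computed either by sharding the rows (data parallelism) or by sharding the feature columns together with the matching slice of the weights (model parallelism). The data, shapes, and worker counts below are synthetic.

```python
# Synthetic sketch: the same logistic-regression gradient, split two ways.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))          # samples x features (made-up data)
y = rng.integers(0, 2, size=1000)
w = np.zeros(20)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Data parallelism: each worker holds a shard of the rows, computes a partial
# gradient with the full parameter vector, and the partial gradients are summed.
def data_parallel_grad(num_workers=4):
    grads = []
    for X_i, y_i in zip(np.array_split(X, num_workers),
                        np.array_split(y, num_workers)):
        grads.append(X_i.T @ (sigmoid(X_i @ w) - y_i))
    return sum(grads) / len(X)

# Model parallelism: each worker holds a slice of the feature columns (and the
# matching slice of w); partial scores are summed to form the full margin, then
# each worker produces the gradient entries for its own columns.
def model_parallel_grad(num_workers=4):
    col_splits = np.array_split(np.arange(X.shape[1]), num_workers)
    margin = sum(X[:, cols] @ w[cols] for cols in col_splits)
    residual = sigmoid(margin) - y
    return np.concatenate([X[:, cols].T @ residual for cols in col_splits]) / len(X)

assert np.allclose(data_parallel_grad(), model_parallel_grad())
```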

How To Split: Categories. Data parallelism: split the data into many sample sets; workers compute the same parameter(s) on different sample sets. Model parallelism: split the model/parameters; workers compute different parameter(s) on the same data set. Hybrid parallelism: both.

How To Split: Data / Parameter Split. Data allocation: random selection (shuffling); partitioning (e.g., item filter, word count); sampling; parallel graph computation (for non-i.i.d. data). Parameter split: most algorithms assume parameters are independent and split them randomly (cf. Petuum, KDD 15, Eric Xing).

How To Aggregate Messages: Parallelization Mechanisms. Given the feedback g_i(W) from worker i, how can we update the model parameter W? W = f(g_1(W), g_2(W), ..., g_m(W))

Parallelization Mechanism: Bulk Synchronous Parallel (BSP). Synchronous update: parameters are updated only after all workers have finished their job. Examples: Sync SGD (mini-batch SGD), Hadoop.

Sync SGD: Perceptron

Sync SGD (perceptron example): each worker i computes a local update ΔW_i = Σ_{x_j ∈ M_i} y_j x_j on its mini-batch M_i, and the master combines them into one global step, W ← W + (ΔW_1 + ΔW_2 + ΔW_3) (summed or averaged across the workers).
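A toy, single-process simulation of this synchronous pattern (the perceptron rule, data, and worker count below are made up for illustration): all workers finish their mini-batch before the global parameters are updated once.

```python
# Toy Sync SGD sketch for a perceptron: barrier, then one global update.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 10))
y = np.where(X @ rng.normal(size=10) > 0, 1, -1)   # synthetic labels in {-1, +1}
w = np.zeros(10)
NUM_WORKERS = 3

def local_update(X_i, y_i, w):
    """Perceptron update on one worker's mini-batch: sum y_j * x_j over the
    examples that the current w misclassifies."""
    mistakes = (y_i * (X_i @ w)) <= 0
    return (y_i[mistakes, None] * X_i[mistakes]).sum(axis=0)

for step in range(20):
    # Barrier semantics: collect every worker's delta, then update once.
    deltas = [local_update(X_i, y_i, w)
              for X_i, y_i in zip(np.array_split(X, NUM_WORKERS),
                                  np.array_split(y, NUM_WORKERS))]
    w += sum(deltas) / NUM_WORKERS          # combine local updates (averaged here)
    print(step, np.mean(np.sign(X @ w) == y))
```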

Parallelization Mechanism: Asynchronous Parallel. Asynchronous update: the parameters are updated whenever feedback arrives from any worker. Example: Downpour SGD (NIPS 12).

Downpour SGD
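A very rough threaded imitation of the asynchronous pattern behind Downpour SGD, not the actual system: workers push updates whenever they finish a step, with no barrier, so the shared parameters they read may already be stale. The model, data, and learning rate are invented.

```python
# Toy async-SGD sketch: no barrier; updates are applied as workers finish.
import threading
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.01 * rng.normal(size=2000)

w = np.zeros(5)                     # shared parameters (stand-in for the servers)
lock = threading.Lock()
LR, NUM_WORKERS = 0.01, 4

def worker(shard_X, shard_y, steps=200):
    global w
    local_rng = np.random.default_rng()
    for _ in range(steps):
        i = local_rng.integers(len(shard_y))
        with lock:
            w_local = w.copy()      # read possibly stale parameters
        grad = (shard_X[i] @ w_local - shard_y[i]) * shard_X[i]
        with lock:
            w -= LR * grad          # push the update without waiting for others

threads = [threading.Thread(target=worker, args=(Xs, ys))
           for Xs, ys in zip(np.array_split(X, NUM_WORKERS),
                             np.array_split(y, NUM_WORKERS))]
for t in threads: t.start()
for t in threads: t.join()
print("distance to true weights:", np.linalg.norm(w - true_w))
```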

Async vs. Sync. Sync: a single point of failure; it has to wait until all workers have finished their jobs, so the overall efficiency of the algorithm is determined by the slowest worker; nice convergence. Async: very fast! But it can affect the convergence of the algorithm (e.g., stale gradients); use it if the model is not sensitive to asynchronous updates.

Parallelization Mechanism: ADMM for DML. Alternating Direction Method of Multipliers = augmented Lagrangian + dual decomposition. For the DML case: replace x^{k+1/2} with mean(x^{k+1/2}) and x^{k+1} with mean(x^{k+1}) when updating. A well-known optimization algorithm in both industry and academia (e.g., computational advertising).
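For reference, the standard global-consensus form of ADMM (following Boyd et al.'s formulation; the notation with N workers and penalty ρ is added here, not taken from the slide) makes the averaging step explicit:

```latex
% Global-consensus ADMM for  min_x  \sum_i f_i(x_i)   s.t.  x_i = z  for all i
\begin{aligned}
x_i^{k+1} &= \arg\min_{x_i}\; f_i(x_i) + \tfrac{\rho}{2}\,\lVert x_i - z^k + u_i^k \rVert_2^2
  && \text{(solved in parallel on each worker $i$)} \\
z^{k+1}   &= \frac{1}{N}\sum_{i=1}^{N}\bigl(x_i^{k+1} + u_i^k\bigr)
  && \text{(global averaging step)} \\
u_i^{k+1} &= u_i^k + x_i^{k+1} - z^{k+1}
  && \text{(dual update on each worker)}
\end{aligned}
```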

Parallelization Mechanisms: Overview. Sync; Async; ADMM; model averaging: Elastic Averaging SGD (NIPS 15); lock-free: Hogwild! (NIPS 11).

Distributed ML Frameworks

Frameworks. (This slide is a joke, please laugh.)

Frameworks: Message Passing Interface (MPI). A parallel computing architecture with many operations: send, receive, broadcast, scatter, gather.

Frameworks: Message Passing Interface (MPI). Among its many operations: AllReduce = reduce + broadcast. Hard to write code!
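As a small illustration of how AllReduce is used for gradient aggregation, here is a sketch with the mpi4py binding (assuming mpi4py is installed and the script is launched under mpirun); the local gradient is just a placeholder.

```python
# Sketch: AllReduce-style gradient aggregation with mpi4py.
# Run with, e.g.:  mpirun -np 4 python allreduce_grad.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local_grad = np.full(8, float(rank))      # stand-in for this worker's gradient
global_grad = np.empty_like(local_grad)

# AllReduce = reduce (sum across all ranks) + broadcast of the result,
# so every worker ends up holding the same aggregated gradient.
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size                       # average instead of plain sum

if rank == 0:
    print("averaged gradient:", global_grad)
```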

Frameworks: MapReduce. Well-encapsulated code, user-friendly! A dedicated scheduler, integration with HDFS, fault tolerance, ...

Frameworks: MapReduce. Synchronous parallel, single point of failure, and intermediate data is spilled to disk. Not so suitable for machine learning tasks: many ML models are solved iteratively, and Hadoop/MapReduce does not naturally support iterative computation (Spark does). Iterative MapReduce-style machine learning toolkits: Hadoop Mahout, Spark MLlib.
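A sketch of why iteration is more natural in Spark: the parsed training data can be cached in memory and reused on every gradient step, instead of being re-read from disk by a fresh MapReduce job each pass. The input path, file format, and learning rate are assumptions.

```python
# Sketch: iterative batch gradient descent for linear regression on Spark.
# cache() keeps the parsed RDD in memory across the 50 iterations.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iterative-gd").getOrCreate()
sc = spark.sparkContext

def parse(line):
    # Assumed format: comma-separated features followed by the label.
    *xs, label = [float(v) for v in line.split(",")]
    return np.array(xs), label

data = sc.textFile("train.csv").map(parse).cache()   # reused every iteration
n = data.count()
w = np.zeros(len(data.first()[0]))

for step in range(50):
    grad = (data.map(lambda xy: (xy[0] @ w - xy[1]) * xy[0])
                .reduce(lambda a, b: a + b))
    w -= 0.01 * grad / n

print(w)
spark.stop()
```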

Frameworks: GraphLab (UAI 10, VLDB 12). A distributed computing framework for graphs. Splits the graph into sub-graphs by vertex (node) cut. Asynchronous parallel.

Frameworks: GraphLab (UAI 10, VLDB 12) = Data Graph + Update Function + Sync Operation. Update function: a user-defined function that works on scopes. Sync: global parameter update. Scopes are allowed to overlap.

Frameworks: GraphLab (UAI 10, VLDB 12). Data Graph + Update Function + Sync Operation. An update runs in three steps = Gather (read only) + Apply (write the node only) + Scatter (write edges only).
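To make the Gather / Apply / Scatter split concrete, here is a toy single-machine PageRank written in that style. This only mimics the programming model; it is not GraphLab's actual API, and the graph and damping factor are made up.

```python
# Toy Gather-Apply-Scatter (GAS) style PageRank on a tiny in-memory graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]            # made-up directed graph
out_deg = {v: sum(1 for s, _ in edges if s == v) for v in range(4)}
in_nbrs = {v: [s for s, d in edges if d == v] for v in range(4)}
rank = {v: 1.0 for v in range(4)}
DAMPING = 0.85

for _ in range(20):
    new_rank = {}
    for v in range(4):
        # Gather: read-only pass over in-neighbors, accumulating their contributions.
        acc = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
        # Apply: write only this vertex's own value.
        new_rank[v] = (1 - DAMPING) + DAMPING * acc
        # Scatter: would write adjacent edges / signal neighbors to re-run;
        # omitted here because this toy loop simply re-runs every vertex.
    rank = new_rank

print(rank)
```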

Frameworks: GraphLab consistency control. A trade-off between conflicts and parallelism. Edge consistency: within the scope, nothing except the neighboring vertices may be read or written by other updates. Full consistency: nothing within the scope may be read or written by other updates. Vertex consistency: during an update, other operations may not read or write that vertex.

Frameworks: Spark GraphX. Avoids the cost of moving sub-graphs among workers by combining the table view and the graph view of the same data.

Frameworks: Parameter Server. Asynchronous parallel. 1. Workers query the servers for the current parameters. 2. The parameters are stored in a distributed way among the server nodes. 3. Workers compute partial parameter updates. A minimal sketch of the pattern follows below.
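A bare-bones, single-process imitation of the pattern (the key sharding across two servers, the linear model, and the data are all invented for illustration):

```python
# Bare-bones parameter-server sketch: servers hold key-value shards of the
# parameters; workers pull, compute gradients, and push back asynchronously.
import threading
import numpy as np

class ParameterServer:
    """One logical server node holding a shard of the parameters."""
    def __init__(self, keys, lr=0.05):
        self.params = {k: 0.0 for k in keys}
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return dict(self.params)

    def push(self, grads):
        with self.lock:
            for k, g in grads.items():
                self.params[k] -= self.lr * g

# Two server nodes, each responsible for half of the parameter keys.
shards = [range(0, 5), range(5, 10)]
servers = [ParameterServer(ks) for ks in shards]

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 10))
y = X @ np.arange(10, dtype=float) + 0.01 * rng.normal(size=400)

def worker(rows):
    for i in rows:
        w = np.array([v for s in servers for v in s.pull().values()])   # pull
        grad = (X[i] @ w - y[i]) * X[i]
        for s, ks in zip(servers, shards):                              # push shards
            s.push({k: grad[k] for k in ks})

threads = [threading.Thread(target=worker, args=(rows,))
           for rows in np.array_split(np.arange(400), 4)]
for t in threads: t.start()
for t in threads: t.join()
print([s.pull() for s in servers])
```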

DML Trends: Overview. For more information, please see the AAAI-17 Tutorial on Distributed Machine Learning.

Take-Home Message. How to split: data parallelism / model parallelism; data / parameter dependency. How to aggregate messages: parallelization mechanisms; consensus between local & global parameters; does the algorithm converge? Frameworks.

Thanks