More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server

1 More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server Q. Ho, J. Cipar, H. Cui, J.K. Kim, S. Lee, *P.B. Gibbons, G.A. Gibson, G.R. Ganger, E.P. Xing Carnegie Mellon University *Intel Labs

2 Distributed ML: one machine to many. Setting: we have an iterative, parallel ML algorithm, e.g. optimization or MCMC algorithms, for topic models, regression, matrix factorization, SVMs, DNNs, etc. Critical updates are executed on one machine, in parallel: worker threads share global model parameters θ via RAM. for (t = 1 to T) { dothings() parallelupdate(x,θ) dootherthings() } Parallelize over worker threads; share global model parameters θ via RAM.
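A minimal single-machine sketch of this pattern (toy update rule; all names below are mine, not the authors' code): several worker threads loop for t = 1 to T and apply additive updates to a parameter vector θ shared through RAM.

import threading
import random

THETA_DIM = 8
NUM_THREADS = 4
NUM_ITERS = 5                      # the "for (t = 1 to T)" loop from the slide

theta = [0.0] * THETA_DIM          # global model parameters, shared via RAM
lock = threading.Lock()            # serialize concurrent writes to theta

def worker(shard):
    for t in range(1, NUM_ITERS + 1):
        # stand-in for the real update rule: derive a small delta from this shard
        delta = [0.01 * sum(shard) / len(shard)] * THETA_DIM
        with lock:                 # parallelupdate(x, theta): write back via RAM
            for i in range(THETA_DIM):
                theta[i] += delta[i]

shards = [[random.random() for _ in range(100)] for _ in range(NUM_THREADS)]
threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for th in threads: th.start()
for th in threads: th.join()
print(theta)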

3 Distributed ML: one machine to many. Want: scale up by distributing the ML algorithm. Must now share parameters over a network. Seems like a simple task; many distributed tools are available, so just pick one and go? (Figure: a single machine with multiple threads vs multiple machines running the distributed algorithm, communicating over network switches.)

4 Distributed ML Challenges. Not quite that easy. Two distributed challenges: networks are slow (low bandwidth, high delay), and identical machines rarely perform equally (unequal performance).

5 Networks are (relatively) slow. Low network bandwidth: 0.1-1 GB/s (inter-machine) vs 10 GB/s (CPU-RAM), so fewer parameters are transmitted per second. High network latency (messaging time): 10,000-100,000 ns (inter-machine) vs 100 ns (CPU-RAM), so we wait much longer to receive parameters.

6 Networks are (relatively) slow. Parallel ML requires frequent synchronization: exchange 10-1000K scalars per second, per thread. Parameters are not shared quickly enough, creating a communication bottleneck. A significant bottleneck over a network!

7 Networks are (relatively) slow. Time breakdown: compute vs network (seconds), LDA on 32 machines (256 cores), 10% data per iter, under BSP; chart compares network waiting time against compute time. This is for a clean setting with full control over the machines and full network capacity; real clusters with many users have even worse network:compute ratios!

8 Machines don't perform equally, even when configured identically. Variety of reasons: vibrating hard drive; background programs (e.g. part of a distributed filesystem); other users; the machine is a VM/cloud service. Result: occasional, random slowdowns in different machines.

9 Consequence: scaling up ML is hard! Going from 1 to N machines, naïve implementations rarely yield an N-fold speedup: convergence is slower due to machine slowdowns and network bottlenecks. If not careful, it is even worse than a single machine: the algorithm diverges due to errors from slowdowns!

10 0 Existing general-purpose scalable ML Theory-oriented Focus on algorithm correctness/convergence Examples: Cyclic fixed-delay schemes (Langford et al., Agarwal & Duchi) Single-machine asynchronous (Niu et al.) Naively-parallel SGD (Zinkevich et al.) Partitioned SGD (Gemulla et al.) May oversimplify systems issues e.g. need machines to perform consistently e.g. need lots of synchronization e.g. or even try not to communicate at all Systems-oriented Focus on high iteration throughput Examples: MapReduce: Hadoop and Mahout Spark Graph-based: GraphLab, Pregel May oversimplify ML issues e.g. assume algorithms just work in distributed setting, without proof e.g. must convert programs to new programming model; nontrivial effort

11 Existing general-purpose scalable ML Theory-oriented Focus on algorithm correctness/convergence Examples: Cyclic fixed-delay schemes (Langford et al., Agarwal & Duchi) Single-machine asynchronous (Niu et al.) Naively-parallel SGD (Zinkevich et al.) Partitioned SGD (Gemulla et al.) May oversimplify systems issues e.g. need machines to perform consistently e.g. need lots of synchronization e.g. or even try not to communicate at all Systems-oriented Focus on high iteration throughput Examples: MapReduce: Hadoop and Mahout Spark Graph-based: GraphLab, Pregel May oversimplify ML issues e.g. assume algorithms just work in distributed setting, without proof e.g. must convert programs to new programming model; nontrivial effort Can we take both sides into account?

12 Middle of the road approach Want: ML algorithms converge quickly under imperfect systems conditions e.g. slow network performance e.g. random machine slowdowns Parameters are not communicated consistently Existing work: mostly use one of two communication models Bulk Synchronous Parallel (BSP) Asynchronous (Async) First, understand pros and cons of BSP and Async

13 Bulk Synchronous Parallel. Synchronization barrier (parameters read/updated here). Threads synchronize (wait for each other) every iteration, so all threads are on the same iteration #. Parameters are read/updated at the synchronization barriers.
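A minimal sketch of the BSP pattern (toy computation and names of my own, not any particular framework's API): no thread may start iteration t+1 until every thread has finished iteration t, so parameter updates are only exchanged at the barrier.

import threading

NUM_THREADS = 4
NUM_ITERS = 5
PARAM_DIM = 8

shared_params = [0.0] * PARAM_DIM
lock = threading.Lock()
barrier = threading.Barrier(NUM_THREADS)   # the end-of-iteration sync point

def worker(tid):
    for t in range(NUM_ITERS):
        local_delta = 0.1 * (tid + 1)      # stand-in for this thread's computation
        with lock:                         # read/update the global parameters
            for i in range(PARAM_DIM):
                shared_params[i] += local_delta
        barrier.wait()                     # BSP: wait for all threads every iteration

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for th in threads: th.start()
for th in threads: th.join()
print(shared_params)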

14 The cost of synchronicity. (a) Machines perform unequally; (b) the algorithmic workload is imbalanced. So threads must wait for each other, and the end-of-iteration sync gets longer with larger clusters (due to the slow network).

15 The cost of synchronicity. Wasted computing time! Threads must wait for each other, and the end-of-iteration sync gets longer with larger clusters, so precious computing time is wasted.

16 Asynchronous. Parameters are read/updated at any time. Threads proceed to the next iteration without waiting, so threads are not on the same iteration #; parameters are read/updated at any time.

17 Slowdowns and Async. A difference in iterations leads to parameter error. A machine suddenly slows down (hard drive, background process, etc.), causing an iteration difference between threads and leading to error in the parameters.

18 Async worst-case situation. A difference in iterations leads to parameter error. Large clusters have arbitrarily large slowdowns: machines become inaccessible for extended periods, and the error becomes unbounded!

19 What we really want. Partial synchronicity: spread network comms evenly (don't sync unless needed); threads usually shouldn't wait, but mustn't drift too far apart! Straggler tolerance: slow threads must somehow catch up. Is there a middle ground between BSP and Async?

20 That middle ground. Partial synchronicity: spread network comms evenly (don't sync unless needed); threads usually shouldn't wait, but mustn't drift too far apart! Straggler tolerance: slow threads must somehow catch up. Idea: force threads to sync up when needed, and make the slow thread catch up.

21 That middle ground. How do we realize this? (Force threads to sync up when needed; make the slow thread catch up.)

22 Stale Synchronous Parallel. Staleness threshold: e.g. a fast thread waits until the slowest thread has reached iteration 4 before proceeding. Note: the x-axis is now iteration count, not time! Allow threads to usually run at their own pace, but the fastest and slowest threads are not allowed to drift more than S iterations apart. Threads cache local (stale) versions of the parameters, to reduce network syncing.

23 Stale Synchronous Parallel. With staleness threshold S, a thread at iteration T is guaranteed to see all parameter updates from before iteration T-S, and may or may not see later updates (the possible error). Protocol: check the local cache first; if it is too old, get the latest version from the network. Consequence: fast threads must check the network every iteration, while slow threads only check every S iterations; fewer network accesses means they catch up!
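A hedged sketch of that cache protocol (toy classes of my own, not the actual SSPTable implementation; the real system also makes a too-fast reader wait for stragglers, which is omitted here): a reader at clock c may use its cached copy only if that copy already reflects server clock at least c - s.

class ToyServer:
    """Stand-in for the parameter server: returns the current value together with
    the clock up to which all workers' updates are guaranteed to be included."""
    def __init__(self):
        self.value = 0.0
        self.min_worker_clock = 0

    def fetch(self, row):
        return self.value, self.min_worker_clock

class SSPCache:
    def __init__(self, server, staleness):
        self.server = server
        self.s = staleness
        self.cache = {}                  # row -> (value, server clock when fetched)

    def read_row(self, row, my_clock):
        value, seen_clock = self.cache.get(row, (None, float("-inf")))
        if seen_clock < my_clock - self.s:               # cached copy is too stale
            value, seen_clock = self.server.fetch(row)   # network round trip
            self.cache[row] = (value, seen_clock)
        return value                     # may omit up to the last s clocks of updates

server = ToyServer()
cache = SSPCache(server, staleness=3)
cache.read_row("theta", my_clock=0)      # first read always goes to the server
cache.read_row("theta", my_clock=2)      # cache hit: fetched clock 0 >= 2 - 3
cache.read_row("theta", my_clock=5)      # cache too old (0 < 5 - 3): fetch again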

24 SSP provides the best of both worlds. SSP combines the best properties of BSP and Async. BSP-like convergence guarantees: threads cannot drift more than S iterations apart, and every thread sees all updates from before iteration T-S. Asynchronous-like speed: threads usually don't wait (unless there is drift), and slower threads read from the network less often, thus catching up. SSP is a spectrum of choices: it can be fully synchronous (S = 0) or very asynchronous (S → ∞), or just take the middle ground and benefit from both!

25 Why does SSP converge? Instead of x_true, SSP sees x_stale = x_true + error. The error caused by staleness is bounded, and over many iterations the average error goes to zero.

26 Why does SSP converge? SSP approximates sequential execution: compare the actual update order against an ideal sequential execution (measured in clocks).

27 Why does SSP converge? SSP approximates sequential execution. Possible error window for an update with staleness S: SSP may lose up to S iterations of updates to the left (earlier clocks)...

28 Why does SSP converge? SSP approximates sequential execution. Possible error window for an update: ...as well as gain up to S iterations of updates to the right (later clocks).

29 Why does SSP converge? SSP approximates sequential execution, with an error window of 2S-1 iterations (5 iterations in the figure's example, staleness 3). Thus there are at most 2S-1 iterations of erroneous updates, hence the numeric error in the parameters is also bounded: a partial, but bounded, loss of serializability.

30 Convergence Theorem. Want: minimize a convex objective (example: stochastic gradient descent), with L-Lipschitz components and problem diameter bounded by F. Staleness s, using P threads across all machines. Use a step size that decreases over iterations.

31 Convergence Theorem. Want: minimize a convex objective (example: stochastic gradient descent), with L-Lipschitz components and problem diameter bounded by F. Staleness s, using P threads across all machines, and a decreasing step size. Then the difference between the SSP estimate and the true optimum vanishes: SSP converges at the rate given below, where T is the number of iterations. Note: the right-hand-side bound contains both (L, F) and (s, P), capturing the interaction between theory and systems parameters.
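The slide's formulas did not survive transcription; the following is a sketch of the bound it refers to, following the statement in the accompanying SSP paper (the exact constants come from that paper's theorem, not from this transcript):

% With f(x) = \sum_{t=1}^{T} f_t(x) convex, each f_t L-Lipschitz, problem diameter
% bounded by F, staleness s, P threads, and step size
%   \eta_t = \frac{F}{L\sqrt{2(s+1)P}} \cdot \frac{1}{\sqrt{t}},
% the regret of the noisy (stale) iterates \tilde{x}_t satisfies
\[
  R[X] \;:=\; \sum_{t=1}^{T} f_t(\tilde{x}_t) \;-\; \min_{x} \sum_{t=1}^{T} f_t(x)
  \;\;\le\;\; 4 F L \sqrt{2(s+1)\,P\,T},
\]
% so the average regret R[X]/T = O(\sqrt{(s+1)P/T}) goes to zero as T grows:
% staleness s and thread count P only inflate the constant, not the rate.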

32 SSP solves the distributed ML challenges. SSP is a synchronization model for fast and correct distributed ML, for abelian parameter updates of the form θ_new = θ_old + Δ. SSP reduces network traffic: threads use a stale local cache whenever possible, which addresses the slow network and occasional machine slowdowns.

33 SSP + Parameter Server = Easy Distributed ML. We implement SSP as a parameter server (PS), called SSPTable. It provides all machines with convenient access to the global model parameters, and it can be run on multiple machines to reduce the load per machine. SSPTable allows easy conversion of single-machine parallel ML algorithms: a distributed-shared-memory programming style, no need for complicated message passing; just replace local memory access with PS access. (Parameter servers: Ahmed et al. (WSDM 2012), Power and Li (OSDI 2010).)
Single-machine parallel: UpdateVar(i) { old = y[i]; delta = f(old); y[i] += delta }
Distributed with SSPTable: UpdateVar(i) { old = PS.read(y,i); delta = f(old); PS.inc(y,i,delta) }

34 SSPTable Programming. Easy, table-based programming with just 3 commands! No message passing, barriers, locks, etc.:
read_row(table, row, s): retrieve a table row with staleness s
inc(table, row, col, value): increment the table's (row, col) entry by value
clock(): inform the PS that this thread is advancing to the next iteration
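A hypothetical sketch of how a worker might use just those three calls (FakePS below is a single-process stand-in I made up, not the real SSPTable client, and the update rule is a toy):

from collections import defaultdict

class FakePS:
    def __init__(self, num_cols):
        self.rows = defaultdict(lambda: [0.0] * num_cols)   # (table, row) -> values
        self.clock_val = 0

    def read_row(self, table, row, s):
        # The real SSPTable may return a cached copy up to s clocks stale;
        # this toy version always returns the latest values.
        return list(self.rows[(table, row)])

    def inc(self, table, row, col, value):
        self.rows[(table, row)][col] += value    # additive (abelian) update

    def clock(self):
        self.clock_val += 1                      # this worker's clock advances

def run_worker(ps, num_iters=3, num_rows=2, num_cols=4, staleness=3):
    for _ in range(num_iters):
        for i in range(num_rows):
            row = ps.read_row("theta", i, staleness)
            for j, old in enumerate(row):
                ps.inc("theta", i, j, 0.1 - 0.01 * old)   # toy update rule
        ps.clock()    # tell the PS this worker is moving to the next iteration

ps = FakePS(num_cols=4)
run_worker(ps)
print(ps.rows[("theta", 0)])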

35 SSPTable Programming. Just put the global parameters in SSPTable! Examples: topic modeling (MCMC): the topic-word table; matrix factorization (SGD): the factor matrices L, R; Lasso regression (CD): the coefficients β. SSPTable supports generic classes of algorithms, with these models as examples.

36 SSPTable uses networks efficiently. Time breakdown: compute vs network (seconds), LDA on 32 machines (256 cores), 10% data per iter; chart compares network waiting time against compute time for BSP.

37 SSPTable uses networks efficiently. Time breakdown: compute vs network (seconds), LDA on 32 machines (256 cores), 10% data per iter; bars for BSP and for increasing staleness. Network communication is a huge bottleneck with many machines; SSP balances network and compute time.

38 SSPTable vs BSP and Async. Log-likelihood versus seconds, LDA on the NYTimes dataset (32 machines / 256 cores, 10% docs per iter); curves: BSP (stale 0) and async. NYTimes data: N = 100M tokens, K = 100 topics, V = 100K terms. BSP has strong convergence guarantees but is slow; asynchronous is fast but has weak convergence guarantees.

39 SSPTable vs BSP and Async. Log-likelihood versus seconds, LDA on the NYTimes dataset (32 machines / 256 cores, 10% docs per iter); curves: BSP (stale 0), async, and SSP with nonzero staleness. NYTimes data: N = 100M tokens, K = 100 topics, V = 100K terms. BSP has strong convergence guarantees but is slow; asynchronous is fast but has weak convergence guarantees; SSPTable is fast and has strong convergence guarantees.

40 The Quality vs Quantity tradeoff. Quantity: iterations versus time (LDA, 32 machines, 10% data). Quality: objective (log-likelihood) versus iterations. Curves: BSP (stale 0) and several larger staleness values. Progress per time is (iters/sec) * (progress/iter): high staleness yields more iters/sec, but lowers progress/iter. Find the sweet spot (staleness > 0) for maximum progress per second.

41 The Quality vs Quantity tradeoff (with increasing staleness). Progress per time is (iters/sec) * (progress/iter): high staleness yields more iters/sec, but lowers progress/iter. Find the sweet spot (staleness > 0) for maximum progress per second.
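As a worked illustration of that tradeoff (the numbers below are made up for illustration, not taken from the experiments):

\[
  \frac{\text{progress}}{\text{second}}
  \;=\; \frac{\text{iterations}}{\text{second}} \times \frac{\text{progress}}{\text{iteration}}.
\]
% Illustrative only: if raising the staleness lifts throughput from 2 to 3 iters/sec
% but cuts progress per iteration from 1.0 to 0.8 units, progress per second still
% rises from 2.0 to 2.4; push staleness far enough and the loss in progress/iter
% eventually outweighs the extra iterations, which is why the sweet spot is interior.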

42 Matrix Factorization (Netflix). Objective function versus time, MF on 32 machines (256 threads); curves: BSP (stale 0) and nonzero staleness. Netflix data: 100M nonzeros, 480K rows, 18K columns, rank 100.

43 LASSO (Synthetic). Objective function versus time, Lasso on 16 machines (128 threads); curves: BSP (stale 0), stale 10, stale 20, stale 40, stale 80. Synthetic data: N = 500 samples, P = 400K features.

44 SSPTable scaling with # machines. Log-likelihood versus seconds, LDA on the NYTimes dataset (staleness = 0, k docs per core per iteration), for 32 machines (256 cores), 16 machines (128 cores), 8 machines (64 cores), 4 machines (32 cores), 2 machines (16 cores), and 1 machine (8 cores). Doubling the # machines gives a 78% speedup, i.e. convergence in 56% of the time. Inverse time to convergence versus # machines compares SSP against ideal scaling. The SSP computational model scales with increasing # machines (given a fixed dataset).

45 Recent Results. Using 8 machines * 16 cores = 128 threads, 8GB RAM per machine. Latent Dirichlet Allocation: NYTimes dataset (100M tokens, 100K words, 10K topics): SSP 00K tokens/s, GraphLab 80K tokens/s. PubMed dataset (7.5B tokens, 4K words, 00 topics): SSP .M tokens/s, GraphLab .8M tokens/s. Network latent space role modeling: Friendster network sample (9M nodes, 80M edges), 50 roles: SSP takes 4h to converge (vs 5 days on one machine).

46 46 Future work Theory SSP for MCMC Automatic staleness tuning Average-case analysis for better bounds Systems Load balancing Fault tolerance Prefetching Other consistency schemes Applications Hard-to-parallelize ML models DNNs, Regularized Bayes, Network Analysis models

47 47 Coauthors James Cipar Henggang Cui Jin Kyu Kim Seunghak Lee Phillip B. Gibbons Garth A. Gibson Gregory R. Ganger Eric P. Xing

48 Workshop Demo. SSP is part of a bigger system, Petuum: the SSP parameter server, the STRADS dynamic variable scheduler, and more features in the works. We have a demo! Topic modeling (8.2M docs, 7.5B tokens, 4K words, 0K topics); Lasso regression (00K samples, 00M dimensions, 5 billion nonzeros); network latent space modeling (9M nodes, 80M edges, 50 roles). At the BigLearning 2013 workshop (Monday).

49 49 Summary Distributed ML is nontrivial Slow network Unequal machine performance SSP addresses those problems Efficiently use network resources; reduces waiting time Allows slow machines to catch up Fast like Async, converges like BSP SSPTable parameter server provides easy table interface Quickly convert single-machine parallel ML algorithms to distributed Slides:
