Toward Scalable Deep Learning

1 Korean Institute of Information Scientists and Engineers (KIISE), Artificial Intelligence Society, Machine Learning Research Group, Second Deep Learning Workshop. Toward Scalable Deep Learning. Sungroh Yoon (윤성로), Electrical and Computer Engineering, Seoul National University

2 Breakthrough: Big Data + Machine Learning (Daphne Koller, Andrew Ng)

3 Shadow beyond the Revolution: training challenges. <MNIST test> Ciresan, Dan Claudiu, et al., "Deep, big, simple neural nets for handwritten digit recognition," Neural Computation (2010); T. Chilimbi et al. (OSDI 2014)

4 Machine Learning = Representation + Training. Representations: sparse structured input/output regression, nonparametric Bayesian models, graphical models, deep learning.

5 Parallelism in Machine Learning. Basic form of ML: F(D, θ) = L(D, θ) + r(θ). Iterative-convergent update: θ^(t+1) = θ^(t) + Δθ(D). Data parallel: partition the data and compute Δθ(D_1), Δθ(D_2), ..., Δθ(D_n). Model parallel: partition the model and compute Δθ_1(D), Δθ_2(D), ..., Δθ_m(D). E. Xing & Q. Ho, 2015 KDD Tutorial
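
Below is a minimal sketch of this iterative-convergent, data-parallel form, assuming a toy least-squares loss of my own choosing (the slide's F(D, θ) is generic): each shard D_i yields a partial update Δθ(D_i), and the aggregated update advances θ.

```python
import numpy as np

def delta_theta(theta, D_shard, lr=0.1):
    # Per-shard update for a toy least-squares loss; stands in for the generic Δθ(D_i).
    X, y = D_shard
    grad = 2 * X.T @ (X @ theta - y) / len(y)
    return -lr * grad

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
shards = [(X[i::4], y[i::4]) for i in range(4)]            # D_1, ..., D_4

theta = np.zeros(5)
for t in range(100):                                        # iterative-convergent loop
    updates = [delta_theta(theta, s) for s in shards]       # Δθ(D_i), computed in parallel in practice
    theta = theta + np.mean(updates, axis=0)                # θ^(t+1) = θ^(t) + aggregated Δθ
```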

6 Deep Learning: A New Learning Paradigm. Far more complex and larger than conventional ML models; a large number of model parameters to learn; many (mostly simple) computations with latent variables. Needs scaling up/out of computation & numerical optimization.

7 Dealing with the Challenges (1): Minimize computation [Bengio, 2014]. Improve (reduce) the ratio #computations / #parameters. Extreme success story (but poor generalization): decision trees, O(n) computations for O(2^n) parameters. Extreme unlucky story: deep neural nets, O(n) computations for O(n) parameters. Example: conditional computation (Bengio, 2014)
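
As an illustration only (the gating scheme, block sizes, and top-k rule below are my assumptions, not Bengio's exact proposal), conditional computation activates just a few parameter blocks per input, so the work per example grows with the active blocks rather than with the total parameter count:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_blocks, block_size = 32, 8, 16
W = rng.normal(size=(n_blocks, d, block_size)) * 0.1   # parameters, grouped into blocks
gate_W = rng.normal(size=(d, n_blocks)) * 0.1          # small gating network

def forward(x, k=2):
    """Compute only the k blocks the gate selects, so per-example work grows
    with k * block_size rather than with all n_blocks * block_size parameters."""
    scores = x @ gate_W
    active = np.argsort(scores)[-k:]                    # indices of blocks to compute
    h = np.zeros((n_blocks, block_size))
    for b in active:                                    # skipped blocks cost nothing
        h[b] = np.maximum(x @ W[b], 0.0)
    return h.reshape(-1)

out = forward(rng.normal(size=d))
```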

8 Dealing with the Challenges (2): Scale-up approaches. Move the learning workload onto a co-processor/accelerator (SIMD, GPGPU, ...) for enhanced single-machine performance. Organized in SIMD blocks; 10-fold to 100-fold speed-up. But stuck with memory constraints!

9 Dealing with the Challenges (3): Scale-out approaches. Distribute the learning workload over a distributed system (GraphLab, Hadoop, Spark, ...). Can handle an enormous amount of data or a huge model by splitting the entire workload using data parallelism and model parallelism. But raises parameter communication issues!

10 Notable ML Platforms. Spark: RDD-based programming model; ML library (includes deep learning). GraphLab: Gather-Apply-Scatter programming model; large-scale graph mining. Petuum: key-value store + scheduler; general-purpose large-scale ML

11 Notable ML Platforms. GPU-based (scale-up): Keras. Distributed (scale-out): DistBelief [J. Dean et al., "Large scale distributed deep networks," NIPS 2012], Project Adam [T. Chilimbi et al., "Project Adam: Building an efficient and scalable deep learning training system," OSDI 2014]

12 Recent Technological Trends DistBelief: supports both data and model parallelism J. Dean et al. "Large scale distributed deep networks," NIPS 2012

13 Recent Technological Trends. cuDNN: GPU-accelerated library of primitives for DNNs, used by frameworks such as Caffe and Theano. Ex) cuDNN (v3) vs. cuDNN (v2) on Caffe

14 Recent Technological Trends. DeepLearning4j: open-source, distributed, commercial-grade DL framework. ND4J (scientific computing library for the JVM). Scalable backends: Apache Hadoop and Spark, GPUs. Partners

15 Recent Technological Trends. Petuum: large-scale distributed machine learning; considers both data and model parallelism; key-value store + dynamic scheduler

16 Recent Technological Trends. REEF (Retainable Evaluator Execution Framework): an Apache incubator project that packages a variety of data-processing libraries in a reusable form: MapReduce, query, graph processing, and stream data processing. REEF introduction

17 Scalable Deep Learning Techniques (examples of distributed schemes): 1) Data parallelism: Hogwild! (B. Recht et al., NIPS 2011), Downpour SGD (J. Dean et al., NIPS 2012), Dogwild! (C. Noel et al., 2014). 2) Parameter server (M. Li et al., NIPS 2013). 3) Model parallelism: STRADS (S. Lee et al., NIPS 2014). 4) Acceleration with GPUs: cuda-convnet

18 Data Parallelism. Based on independence among data samples; leads to concurrent execution on each data partition and hence speed-up. (Figure: the samples-by-attributes data matrix is split across Worker 1, Worker 2, and Worker 3, followed by model aggregation.)
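
A small sketch of the pattern on this slide, with a toy quadratic loss and a thread pool standing in for real workers: each worker computes a partial gradient on its own shard concurrently, and the results are aggregated into a single model update.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(1)
X, y = rng.normal(size=(1200, 10)), rng.normal(size=1200)
theta = np.zeros(10)

def worker_grad(shard):
    """Partial gradient of a least-squares loss on one worker's shard."""
    Xs, ys = shard
    return 2 * Xs.T @ (Xs @ theta - ys) / len(ys)

shards = [(X[i::3], y[i::3]) for i in range(3)]        # Worker 1, 2, 3
with ThreadPoolExecutor(max_workers=3) as pool:
    for _ in range(50):
        grads = list(pool.map(worker_grad, shards))    # concurrent per-shard gradients
        theta -= 0.05 * np.mean(grads, axis=0)         # model aggregation step
```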

19 Data Parallelism: Hogwild! Asynchronous execution; don't lock, don't communicate! Each processor calculates gradients independently; processors can overwrite each other's work. Y. Nishioka et al., "Scalable Task-Parallel SGD on Matrix Factorization in Multicore Architectures," IPDPS 2015
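
A lock-free sketch in the spirit of Hogwild!, on a toy sparse least-squares problem of my own (real implementations run on shared-memory cores in C/C++): threads update the shared weight vector without any lock, so writes may occasionally race.

```python
import numpy as np
import threading

rng = np.random.default_rng(2)
n, d = 2000, 50
X = rng.normal(size=(n, d)) * (rng.random((n, d)) < 0.05)   # sparse features
w_true = rng.normal(size=d)
y = X @ w_true
w = np.zeros(d)                                             # shared, unprotected parameters

def hogwild_worker(indices, lr=0.01, epochs=20):
    for _ in range(epochs):
        for i in indices:
            xi = X[i]
            nz = np.nonzero(xi)[0]                           # touch only nonzero coordinates
            grad = 2 * (xi[nz] @ w[nz] - y[i]) * xi[nz]
            w[nz] -= lr * grad                               # no lock: racy, but tolerable when updates are sparse

threads = [threading.Thread(target=hogwild_worker, args=(range(t, n, 4),)) for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```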

20 Data Parallelism: Hogwild! Guarantees a reasonable convergence rate; exploits sparsity; performs better than traditional synchronized techniques even on non-sparse examples (e.g., SVM). B. Recht et al., "Hogwild!: A lock-free approach to parallelizing stochastic gradient descent," NIPS 2011

21 Data Parallelism: Downpour SGD, Dogwild! Hogwild! was designed for shared-memory machines, so its scalability is limited. These methods extend the concept of Hogwild! to distributed systems: gradients are asynchronously pushed to a master or parameter server. Ex) Downpour SGD & Dogwild! (= distributed Hogwild!). J. Dean et al., "Large scale distributed deep networks," NIPS 2012

22 Parameter Server. A widely used concept for distributed machine learning: separate servers hold the model parameters. Key features (Li et al., 2013): efficient communication, flexible consistency models, elastic scalability, fault tolerance and durability, ease of use. M. Li et al., "Parameter server for distributed machine learning," Big Learning NIPS Workshop, 2013

23 Parameter Server: Key-Value Vector. The model is usually expressed as a vector or an array. With sparse data and a linear model, not all parameters are needed to calculate gradients. Key-value vector: (w_1, w_2, ..., w_n) is represented as {(i, w_i) : i a feature index, w_i a weight}, and is used to transmit only the parameters a worker needs. Example: (w_1, w_2, w_3, w_4) becomes (1, w_1), (2, w_2), (3, w_3), (4, w_4).
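
A minimal sketch of the key-value idea using plain Python dicts (the names are my own): a worker requests only the keys that actually appear in its sparse examples.

```python
# Dense parameter vector (w_1, ..., w_n) held on the server side ...
dense_w = {i: 0.1 * i for i in range(1, 9)}        # {key: weight}

# ... while a worker with sparse inputs only needs a few coordinates.
def needed_keys(sparse_examples):
    """Collect the feature indices that actually occur in the worker's data."""
    keys = set()
    for example in sparse_examples:
        keys.update(example)                        # example: {feature_index: value}
    return sorted(keys)

worker_data = [{2: 1.0, 5: -0.5}, {2: 0.3, 7: 2.0}]
keys = needed_keys(worker_data)                     # [2, 5, 7]
pulled = {k: dense_w[k] for k in keys}              # transmit only these (key, weight) pairs
print(pulled)
```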

24 Parameter Server: Interface. Server node: holds a partition of the globally shared parameters. Worker node: holds a portion of the training data and performs local computation. Push (worker to server): sends the calculated update values. Pull (server to worker): returns the updated parameters. M. Li et al., "Parameter server for distributed machine learning," Big Learning NIPS Workshop, 2013
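
A single-process sketch of the push/pull interface (class and method names are my own; a real parameter server is sharded and communicates over the network): workers pull current values, compute updates locally, and push only the updates back.

```python
import numpy as np

class ParameterServer:
    """Toy in-memory server holding the globally shared parameters."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
    def pull(self, keys):
        return self.w[keys].copy()          # server -> worker: current parameter values
    def push(self, keys, delta):
        self.w[keys] += delta               # worker -> server: apply calculated updates

def worker_step(server, X, y, keys, lr=0.1):
    w_local = server.pull(keys)             # pull only the needed coordinates
    grad = 2 * X[:, keys].T @ (X[:, keys] @ w_local - y) / len(y)
    server.push(keys, -lr * grad)           # push the update, not the raw data

rng = np.random.default_rng(3)
X, y = rng.normal(size=(200, 6)), rng.normal(size=200)
ps = ParameterServer(dim=6)
for _ in range(20):
    worker_step(ps, X, y, keys=np.array([0, 2, 4]))
```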

25 Parameter Server: Data & Model Partition. (Figure: the model is partitioned across server nodes and the data across worker nodes; workers exchange parameters with the servers via push and pull.)

26 STRADS. STRADS (Lee et al., 2014): STRucture-Aware Dynamic Scheduler; a parameter server with a dynamic scheduler that chooses sets of parameters which can be updated in parallel. Parameters are not transmitted between masters and workers. S. Lee et al., "On model parallelization and scheduling strategies for distributed machine learning," NIPS 2014

27 STRADS: Execution. Basic execution order: Schedule, Push, Pull. Schedule (by the master): pick sets of model parameters that can be safely updated in parallel. Push (by the master): dispatch computation jobs via the coordinator to the workers, which execute push to compute partial updates for each parameter. Pull (by the key-value store): aggregate the partial updates and keep the newly updated parameters.
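
A toy sketch of the schedule/push/pull cycle, using coordinate updates on a quadratic objective and a randomly chosen "safe" set (the real STRADS scheduler uses model structure to pick non-conflicting parameters):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 12
A = rng.normal(size=(d, d))
A = A @ A.T + 10 * d * np.eye(d)     # heavy diagonal keeps parallel coordinate updates stable
b = rng.normal(size=d)
w = np.zeros(d)                      # parameters kept in the key-value store

def schedule(k=3):
    """Master: pick a set of parameters assumed safe to update in parallel."""
    return rng.choice(d, size=k, replace=False)

def push(params):
    """Workers: compute a partial update for each scheduled parameter."""
    return {j: (b[j] - A[j] @ w) / A[j, j] for j in params}

def pull(partial_updates):
    """Key-value store: aggregate partial updates and keep the new parameters."""
    for j, delta in partial_updates.items():
        w[j] += delta

for _ in range(300):                 # Schedule -> Push -> Pull, repeated until convergence
    pull(push(schedule()))
```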

28 STRADS: Performance. Performance advantages of STRADS: faster convergence and larger model sizes (benchmarks: Latent Dirichlet Allocation, Matrix Factorization). S. Lee et al., "On model parallelization and scheduling strategies for distributed machine learning," NIPS 2014

29 CUDA-convnet. Fast C++/CUDA implementation of convolutional neural networks; supports multi-GPU training. (Figure: convolutional and fully-connected layers split across GPU1 and GPU2; A. Krizhevsky, 2012)

30 CUDA-convnet2. New features (w.r.t. cuda-convnet): improved training time; enhanced data parallelism, model parallelism, and hybrids. Possible parallelizing schemes: (a) computing fully-connected activities after assembling a big batch from last-stage conv-layer activities; (b) each worker sending its last-stage conv-layer activities to all the other workers in turn, with the next worker updating its activities in parallel with the feedforward & backprop computation; (c) all of the workers sending (#examples / K) of their conv-layer activities to all other workers, then proceeding as in (b). A. Krizhevsky, "One weird trick for parallelizing convolutional neural networks," 2014
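
A numpy-only sketch of scheme (a), with arbitrary shapes of my own: each of K data-parallel workers produces last-stage conv-layer activities for its minibatch, these are assembled into one big batch, and the fully-connected stage then runs model-parallel over that batch.

```python
import numpy as np

rng = np.random.default_rng(6)
K, batch_per_worker, conv_out, fc_out = 4, 8, 64, 10

# Stand-in for each worker's last-stage convolutional activities on its own minibatch
conv_activities = [rng.normal(size=(batch_per_worker, conv_out)) for _ in range(K)]

# Scheme (a): assemble one big batch before the fully-connected computation
big_batch = np.concatenate(conv_activities, axis=0)        # (K * batch_per_worker, conv_out)

# Model-parallel FC layer: columns of the weight matrix split across the K workers
W = rng.normal(size=(conv_out, fc_out)) * 0.1
W_shards = np.array_split(W, K, axis=1)                    # each worker owns some output units
fc_parts = [big_batch @ Ws for Ws in W_shards]             # computed on different workers
fc_output = np.concatenate(fc_parts, axis=1)               # same result as big_batch @ W

assert np.allclose(fc_output, big_batch @ W)
```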

31 CUDA-convnet2: Model Parallelism (Fully-Connected Layers). A. Krizhevsky, "One weird trick for parallelizing convolutional neural networks," 2014

32 Caffe. Open framework, models, and worked examples for deep learning; pure C++/CUDA architecture (Python, Matlab interfaces); fast, well-tested code; tools, reference models, demos, and recipes; seamless switching between CPU and GPU. Applications: object classification, learning semantic features, object detection, sequences, reinforcement learning, speech + text.

33 Caffe: Example (LeNet). A network is a set of layers and their connections; Caffe creates and checks the net from its definition. Layers are specified in a plain-text schema, not in code. (Figure: the LeNet network definition.)

34 Caffe: Pros and Cons. Performance: 2 ms/image on a K40 GPU; <1 ms inference with Caffe + cuDNN v2 on a Titan X; 72 million images per day with batched IO. Pros: a fast way to apply deep neural networks; GPU support; many common and new functions supported; Python and Matlab bindings. Cons: only a few input formats and only one output format (HDF5).

35 DistBelief. Introduced by the Google Brain research team (J. Dean et al., "Large scale distributed deep networks," NIPS 2012). Uses a large-scale cluster to distribute training and inference; exploits both data & model parallelism. Distributed optimization algorithms using a parameter server: Downpour SGD and Sandblaster L-BFGS. Trains a deep network with billions of parameters using tens of thousands of CPU cores; capable of training a deep network 30x larger; state-of-the-art performance on ImageNet (1) (as of 2012); faster than a GPU on modestly sized deep networks. (1) An image database with 16M images, 20k categories, and a 1B-parameter model.

36 DistBelief : Partition Model Across Machines J. Dean et al., "Large scale distributed deep networks," NIPS 2012

37 DistBelief: Asynchronous Distributed SGD. Each replica computes its gradient on partial data, e.g. replica (3) computes temp_j = ∂/∂w_j Σ_{i=201}^{300} (h_w(x_i) − y_i)^2 over its partition (x_201, y_201), ..., (x_300, y_300). Asynchronous communication on partitioned data; utilization of a parameter server. J. Dean et al., "Large scale distributed deep networks," NIPS 2012

38 DistBelief: Downpour SGD. Asynchronous distributed SGD: robust to machine failures, but introduces additional stochasticity. Adagrad (adaptive learning rates) improves robustness and scalability. Steps: 1. asynchronously fetch parameters into multiple model replicas; 2. run the SGD process inside each replica; 3. asynchronously push gradients to the parameter server. J. Dean et al., "Large scale distributed deep networks," NIPS 2012
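
A self-contained sketch of the per-replica loop with Adagrad's per-coordinate learning rate (the in-process "parameter server" and the toy gradient are stand-ins, not DistBelief's actual interfaces).

```python
import numpy as np

rng = np.random.default_rng(5)
d = 10
w = np.zeros(d)                        # parameters held by the (here in-process) parameter server
G = np.zeros(d)                        # Adagrad accumulator of squared gradients
lr, eps = 0.5, 1e-8

def replica_gradient(w_snapshot):
    """Stand-in for one model replica's SGD step on its own minibatch."""
    X, y = rng.normal(size=(32, d)), rng.normal(size=32)
    return 2 * X.T @ (X @ w_snapshot - y) / 32

for step in range(100):
    w_snapshot = w.copy()                    # 1. fetch parameters (possibly slightly stale)
    g = replica_gradient(w_snapshot)         # 2. SGD computation inside the replica
    G += g * g                               # 3. push the gradient; the server applies Adagrad:
    w -= lr * g / (np.sqrt(G) + eps)         #    a per-coordinate adaptive learning rate
```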

39 Adam (Project Adam). Optimizes and balances computation and communication; exploits model parallelism; minimizes memory bandwidth and communication overhead; achieves high performance and scalability along with accuracy improvements. Multi-threaded model parameter updates without locks; asynchronous batched parameter updates. Supports training any combination of stacked convolutional and fully-connected network layers.

40 Adam: Architecture. On a single machine: multi-threaded training; fast weight updates without locks (similar to Hogwild!). Across multiple machines: model partitioning; reduced memory copies (= data transfer) using its own network library; memory-system optimization (L3 cache, cache locality); vector processing units for matrix multiplication; asynchrony to mitigate speed variance across machines; asynchronous updates through a global parameter server. T. Chilimbi et al., "Project Adam: Building an efficient and scalable deep learning training system," OSDI 2014

41 Adam: Results. Applications: MNIST / ImageNet. 120 machines: 90 (training) + 20 (parameter server) + 10 (image server). (Figures: performance of training nodes; scaling model size with more workers; accuracy on the two applications.) 30x fewer machines, 2x accuracy improvement.

42 Petuum. A data & model parallel approach that exploits three properties of general ML programs: error tolerance (robustness to limited errors in intermediate calculations), dynamic structural dependency (changing correlations between parameters), and non-uniform convergence (different parameters converge at different speeds).

43 Petuum: Architecture. Scheduler: the core of model-parallel support; the user specifies which parameters are updated via schedule(), and partial updates are aggregated via pull(). Worker: receives the scheduled parameters and computes updates via push(); any data storage system can be used. Parameter server: uses the Stale Synchronous Parallel (SSP) consistency model; table-based or key-value stores.

44 Petuum: Stale Synchronous Parallel (SSP). A parallel consistency model that bounds how far apart workers' iteration counts may drift; it reduces network synchronization and communication costs by exploiting error-tolerant convergence. Ho et al., "More effective distributed ML via a stale synchronous parallel parameter server," NIPS 2013
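
A compact sketch of the SSP rule with simulated clocks only (no real computation or networking): a worker may run ahead, but must wait whenever it would exceed the slowest worker's clock by more than the staleness bound s.

```python
import random
import threading
import time

NUM_WORKERS, ITERATIONS, STALENESS = 4, 30, 3
clocks = [0] * NUM_WORKERS            # per-worker iteration counters
cond = threading.Condition()

def ssp_worker(wid):
    for it in range(ITERATIONS):
        time.sleep(random.uniform(0.001, 0.01))       # stand-in for computing and pushing an update
        with cond:
            clocks[wid] = it + 1
            cond.notify_all()
            # SSP rule: block while this worker is more than STALENESS iterations ahead of the slowest
            while clocks[wid] - min(clocks) > STALENESS:
                cond.wait()

threads = [threading.Thread(target=ssp_worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()
print("final clocks:", clocks)        # every worker finishes, never having run more than STALENESS ahead
```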

45 Petuum: Performance. High relative speedup compared to other implementations; near-linear speed-up as machines are added.

46 SINGA. A distributed deep learning platform for big data analytics; supports CNNs, RBMs, RNNs, and others; flexible enough to run synchronous, asynchronous, and hybrid training frameworks; supports various neural-net partitioning schemes. Design goals: generality (different categories of models and training frameworks); scalability (to large models and training datasets, e.g. trained with 1 billion parameters and 10M images); ease of use (a simple programming model, built-in models, Python binding, and a web interface, usable without much awareness of the underlying distributed platform).

47 SINGA: Distributed Training. Worker group: loads a subset of the training data and computes gradients for a model replica; workers within a group run synchronously, while different worker groups run asynchronously. Server group: maintains one ParamShard, handles parameter-update requests from multiple worker groups, and synchronizes with neighboring groups.

48 SINGA: Configurations. 1 server group & 1 worker group (synchronous frameworks); 1 server group & multiple worker groups (asynchronous frameworks). Co-located worker and server: AllReduce (Baidu's DeepImage), Dogwild (distributed Hogwild). Separate worker and server groups: Sandblaster, Downpour.

49 SINGA: Pros and Cons. Pros: easy to use, supporting programming without much awareness of the underlying distributed platform; distributed architecture with synchronous, asynchronous, and hybrid updates. Cons: limited scale-up support (e.g., no support for GPUs).

50 Summary. In the era of big data, deep learning techniques show higher accuracy than traditional machine learning algorithms. However, deep learning often requires a huge amount of resources to reach state-of-the-art performance on large-scale data. This talk surveyed recent proposals for alleviating the computational challenges involved in training large-scale deep neural networks, with emphasis on examples of scale-up and scale-out techniques.
