Large Scale Distributed Deep Networks
|
|
- Maria Townsend
- 5 years ago
- Views:
Transcription
1 Large Scale Distributed Deep Networks Yifu Huang School of Computer Science, Fudan University COMP Data Intensive Computing Report, 2013 Yifu Huang (FDU CS) COMP Report 2013/11/20 1 / 21
2 Outline 1 Motivation 2 Software Framework: DistBelief 3 Distributed Algorithm Downpour SGD Sandblaster L-BFGS 4 Experiments 5 Discussion Yifu Huang (FDU CS) COMP Report 2013/11/20 2 / 21
3 Motivation Motivation Why I choose this paper [1]? Google + Stanford Jerey Dean + Andrew Y. Ng Why we need large scale distributed deep networks? Large model can dramatically improve performance Training examples + model parameters Exist methods have limitations GPU, MapReduce, GraphLab What can we learn from this paper? Best parallelism design ideas for deep networks up to now Model parallelism + data parallelism Yifu Huang (FDU CS) COMP Report 2013/11/20 3 / 21
4 Motivation Motivation Why I choose this paper [1]? Google + Stanford Jerey Dean + Andrew Y. Ng Why we need large scale distributed deep networks? Large model can dramatically improve performance Training examples + model parameters Exist methods have limitations GPU, MapReduce, GraphLab What can we learn from this paper? Best parallelism design ideas for deep networks up to now Model parallelism + data parallelism Yifu Huang (FDU CS) COMP Report 2013/11/20 3 / 21
5 Motivation Motivation Why I choose this paper [1]? Google + Stanford Jerey Dean + Andrew Y. Ng Why we need large scale distributed deep networks? Large model can dramatically improve performance Training examples + model parameters Exist methods have limitations GPU, MapReduce, GraphLab What can we learn from this paper? Best parallelism design ideas for deep networks up to now Model parallelism + data parallelism Yifu Huang (FDU CS) COMP Report 2013/11/20 3 / 21
6 Motivation Preliminaries Neural Networks [6] Deep Networks [7] Multiple hidden layers This will allow us to compute much more complex features of the input Yifu Huang (FDU CS) COMP Report 2013/11/20 4 / 21
7 Motivation Preliminaries Neural Networks [6] Deep Networks [7] Multiple hidden layers This will allow us to compute much more complex features of the input Yifu Huang (FDU CS) COMP Report 2013/11/20 4 / 21
8 Software Framework: DistBelief Software Framework: DistBelief Model parallelism Inside parallelism Multi-thread + message passing -> large scale User denes Computation in node, message upward/downward Framework manages Synchronization, data transfer Performance depends on Connectivity structure, computational needs Yifu Huang (FDU CS) COMP Report 2013/11/20 5 / 21
9 Software Framework: DistBelief Software Framework: DistBelief Model parallelism Inside parallelism Multi-thread + message passing -> large scale User denes Computation in node, message upward/downward Framework manages Synchronization, data transfer Performance depends on Connectivity structure, computational needs Yifu Huang (FDU CS) COMP Report 2013/11/20 5 / 21
10 Software Framework: DistBelief Software Framework: DistBelief Model parallelism Inside parallelism Multi-thread + message passing -> large scale User denes Computation in node, message upward/downward Framework manages Synchronization, data transfer Performance depends on Connectivity structure, computational needs Yifu Huang (FDU CS) COMP Report 2013/11/20 5 / 21
11 Software Framework: DistBelief Software Framework: DistBelief Model parallelism Inside parallelism Multi-thread + message passing -> large scale User denes Computation in node, message upward/downward Framework manages Synchronization, data transfer Performance depends on Connectivity structure, computational needs Yifu Huang (FDU CS) COMP Report 2013/11/20 5 / 21
12 Software Framework: DistBelief Software Framework: DistBelief (cont.) An example of model parallelism in DistBelief Yifu Huang (FDU CS) COMP Report 2013/11/20 6 / 21
13 Distributed Algorithm Distributed Algorithm Data parallelism Outside parallelism Multiple model instances optimize a single objective -> high speed A centralized sharded parameter server Dierent model replicas retrieve/update their own parameters Load balance, robust Tolerate variance in the processing speed of dierent model replicas The wholesale failure may be taken oine or restart at random Yifu Huang (FDU CS) COMP Report 2013/11/20 7 / 21
14 Distributed Algorithm Distributed Algorithm Data parallelism Outside parallelism Multiple model instances optimize a single objective -> high speed A centralized sharded parameter server Dierent model replicas retrieve/update their own parameters Load balance, robust Tolerate variance in the processing speed of dierent model replicas The wholesale failure may be taken oine or restart at random Yifu Huang (FDU CS) COMP Report 2013/11/20 7 / 21
15 Distributed Algorithm Distributed Algorithm Data parallelism Outside parallelism Multiple model instances optimize a single objective -> high speed A centralized sharded parameter server Dierent model replicas retrieve/update their own parameters Load balance, robust Tolerate variance in the processing speed of dierent model replicas The wholesale failure may be taken oine or restart at random Yifu Huang (FDU CS) COMP Report 2013/11/20 7 / 21
16 Distributed Algorithm Downpour SGD Outline 1 Motivation 2 Software Framework: DistBelief 3 Distributed Algorithm Downpour SGD Sandblaster L-BFGS 4 Experiments 5 Discussion Yifu Huang (FDU CS) COMP Report 2013/11/20 8 / 21
17 Distributed Algorithm Downpour SGD Downpour SGD SGD [8] Minimize the object function F (ω) Update parameters ω = ω η ω asynchronous SGD [3] Downpour Massive parameters retrieved and updated Adagrad learning rate [4] η i,k = γ/ K j=1 ω i,j 2 Improve both robust and scale Yifu Huang (FDU CS) COMP Report 2013/11/20 9 / 21
18 Distributed Algorithm Downpour SGD Downpour SGD SGD [8] Minimize the object function F (ω) Update parameters ω = ω η ω asynchronous SGD [3] Downpour Massive parameters retrieved and updated Adagrad learning rate [4] η i,k = γ/ K j=1 ω i,j 2 Improve both robust and scale Yifu Huang (FDU CS) COMP Report 2013/11/20 9 / 21
19 Distributed Algorithm Downpour SGD Downpour SGD SGD [8] Minimize the object function F (ω) Update parameters ω = ω η ω asynchronous SGD [3] Downpour Massive parameters retrieved and updated Adagrad learning rate [4] η i,k = γ/ K j=1 ω i,j 2 Improve both robust and scale Yifu Huang (FDU CS) COMP Report 2013/11/20 9 / 21
20 Distributed Algorithm Downpour SGD Downpour SGD (cont.) Algorithm visualization Yifu Huang (FDU CS) COMP Report 2013/11/20 10 / 21
21 Distributed Algorithm Downpour SGD Downpour SGD (cont.) Algorithm pseudo code Yifu Huang (FDU CS) COMP Report 2013/11/20 11 / 21
22 Distributed Algorithm Sandblaster L-BFGS Outline 1 Motivation 2 Software Framework: DistBelief 3 Distributed Algorithm Downpour SGD Sandblaster L-BFGS 4 Experiments 5 Discussion Yifu Huang (FDU CS) COMP Report 2013/11/20 12 / 21
23 Distributed Algorithm Sandblaster L-BFGS Sandblaster L-BFGS BFGS [9] An iterative method for solving unconstrained nonlinear optimization Compute an approximation to the Hessian matrix B Limited-memory BFGS [10] Sandblaster Massive commands issued by coordinator Load banancing scheme Dynamic work assigned by coordinator Yifu Huang (FDU CS) COMP Report 2013/11/20 13 / 21
24 Distributed Algorithm Sandblaster L-BFGS Sandblaster L-BFGS BFGS [9] An iterative method for solving unconstrained nonlinear optimization Compute an approximation to the Hessian matrix B Limited-memory BFGS [10] Sandblaster Massive commands issued by coordinator Load banancing scheme Dynamic work assigned by coordinator Yifu Huang (FDU CS) COMP Report 2013/11/20 13 / 21
25 Distributed Algorithm Sandblaster L-BFGS Sandblaster L-BFGS BFGS [9] An iterative method for solving unconstrained nonlinear optimization Compute an approximation to the Hessian matrix B Limited-memory BFGS [10] Sandblaster Massive commands issued by coordinator Load banancing scheme Dynamic work assigned by coordinator Yifu Huang (FDU CS) COMP Report 2013/11/20 13 / 21
26 Distributed Algorithm Sandblaster L-BFGS Sandblaster L-BFGS (cont.) Algorithm visualization Yifu Huang (FDU CS) COMP Report 2013/11/20 14 / 21
27 Distributed Algorithm Sandblaster L-BFGS Sandblaster L-BFGS (cont.) Algorithm pseudo code Yifu Huang (FDU CS) COMP Report 2013/11/20 15 / 21
28 Experiments Experiments Setup Object recognition in still images [5] Acoustic processing for speech recognition [2] Model parallelism benchmarks Yifu Huang (FDU CS) COMP Report 2013/11/20 16 / 21
29 Experiments Experiments Setup Object recognition in still images [5] Acoustic processing for speech recognition [2] Model parallelism benchmarks Yifu Huang (FDU CS) COMP Report 2013/11/20 16 / 21
30 Experiments Experiments (cont.) Optimization method comparisons Yifu Huang (FDU CS) COMP Report 2013/11/20 17 / 21
31 Experiments Experiments (cont.) Optimization method comparisons Application to ImageNet [2] This network achieved a cross-validated classication accuracy of over 15%, a relative improvement over 60% from the best performance we are aware of on the 21k category ImageNet classication task Yifu Huang (FDU CS) COMP Report 2013/11/20 18 / 21
32 Discussion Discussion Contributions Increase the scale and speed of deep networks training Drawbacks There is no guarantee that parameters are consistent Yifu Huang (FDU CS) COMP Report 2013/11/20 19 / 21
33 Discussion Discussion Contributions Increase the scale and speed of deep networks training Drawbacks There is no guarantee that parameters are consistent Yifu Huang (FDU CS) COMP Report 2013/11/20 19 / 21
34 Appendix References I [1] Large Scale Distributed Deep Networks. NIPS [2] Building High-level Features Using Large Scale Unsupervised Learning. ICML [3] Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. NIPS [4] Adaptive subgradient methods for online learning and stochastic optimization. JMLR [5] Improving the speed of neural networks on cpus. NIPS [6] [7] Deep_Networks:_Overview [8] Gradient_checking_and_advanced_optimization Yifu Huang (FDU CS) COMP Report 2013/11/20 20 / 21
35 Appendix References II [9] [10] Yifu Huang (FDU CS) COMP Report 2013/11/20 21 / 21
Toward Scalable Deep Learning
한국정보과학회 인공지능소사이어티 머신러닝연구회 두번째딥러닝워크샵 2015.10.16 Toward Scalable Deep Learning 윤성로 Electrical and Computer Engineering Seoul National University http://data.snu.ac.kr Breakthrough: Big Data + Machine Learning
More informationCS 179 Lecture 16. Logistic Regression & Parallel SGD
CS 179 Lecture 16 Logistic Regression & Parallel SGD 1 Outline logistic regression (stochastic) gradient descent parallelizing SGD for neural nets (with emphasis on Google s distributed neural net implementation)
More informationParallel Implementation of Deep Learning Using MPI
Parallel Implementation of Deep Learning Using MPI CSE633 Parallel Algorithms (Spring 2014) Instructor: Prof. Russ Miller Team #13: Tianle Ma Email: tianlema@buffalo.edu May 7, 2014 Content Introduction
More informationDL4J Components. Open Source DeepLearning Library CPU & GPU support Hadoop-Yarn & Spark integration Futre additions
DeepLearning4j DL4J Components Open Source DeepLearning Library CPU & GPU support Hadoop-Yarn & Spark integration Futre additions DL4J Components Open Source Deep Learning Library What is DeepLearning
More informationDecentralized and Distributed Machine Learning Model Training with Actors
Decentralized and Distributed Machine Learning Model Training with Actors Travis Addair Stanford University taddair@stanford.edu Abstract Training a machine learning model with terabytes to petabytes of
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 4, 2016 Outline Multi-core v.s. multi-processor Parallel Gradient Descent Parallel Stochastic Gradient Parallel Coordinate Descent Parallel
More informationAsynchronous Parallel Stochastic Gradient Descent. A Numeric Core for Scalable Distributed Machine Learning Algorithms
Asynchronous Parallel Stochastic Gradient Descent A Numeric Core for Scalable Distributed Machine Learning Algorithms J. Keuper and F.-J. Pfreundt Competence Center High Performance Computing Fraunhofer
More informationScalable Search Engine Solution
Scalable Search Engine Solution A Case Study of BBS Yifu Huang School of Computer Science, Fudan University huangyifu@fudan.edu.cn COMP620028 Information Retrieval Project, 2013 Yifu Huang (FDU CS) COMP620028
More informationDeep Learning and Its Applications
Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent
More informationCS281 Section 3: Practical Optimization
CS281 Section 3: Practical Optimization David Duvenaud and Dougal Maclaurin Most parameter estimation problems in machine learning cannot be solved in closed form, so we often have to resort to numerical
More informationParallelization in the Big Data Regime: Model Parallelization? Sham M. Kakade
Parallelization in the Big Data Regime: Model Parallelization? Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for Big data 1 / 12 Announcements...
More informationECE 901: Large-scale Machine Learning and Optimization Spring Lecture 13 March 6
ECE 901: Large-scale Machine Learning and Optimization Spring 2018 Lecture 13 March 6 Lecturer: Dimitris Papailiopoulos Scribe: Yuhua Zhu & Xiaowu Dai Note: These lecture notes are still rough, and have
More informationParallel and Distributed Deep Learning
Parallel and Distributed Deep Learning Vishakh Hegde Stanford University vishakh@stanford.edu Sheema Usmani Stanford University sheema@stanford.edu Abstract The goal of this report is to explore ways to
More informationTraining Deep Neural Networks (in parallel)
Lecture 9: Training Deep Neural Networks (in parallel) Visual Computing Systems How would you describe this professor? Easy? Mean? Boring? Nerdy? Professor classification task Classifies professors as
More informationParallel Deep Network Training
Lecture 26: Parallel Deep Network Training Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Tunes Speech Debelle Finish This Album (Speech Therapy) Eat your veggies and study
More informationParallel Unsupervised Feature Learning with Sparse AutoEncoder
Parallel Unsupervised Feature Learning with Sparse AutoEncoder Milinda Lakkam, Siranush Sarkizova {milindal, sisis}@stanford.edu December 10, 2010 1 Introduction and Motivation Choosing the right input
More informationCase Study 1: Estimating Click Probabilities
Case Study 1: Estimating Click Probabilities SGD cont d AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade March 31, 2015 1 Support/Resources Office Hours Yao Lu:
More informationAnalyzing Stochastic Gradient Descent for Some Non- Convex Problems
Analyzing Stochastic Gradient Descent for Some Non- Convex Problems Christopher De Sa Soon at Cornell University cdesa@stanford.edu stanford.edu/~cdesa Kunle Olukotun Christopher Ré Stanford University
More informationParallelization in the Big Data Regime 5: Data Parallelization? Sham M. Kakade
Parallelization in the Big Data Regime 5: Data Parallelization? Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for Big data 1 / 23 Announcements...
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing
More informationNeural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina
Neural Network and Deep Learning Early history of deep learning Deep learning dates back to 1940s: known as cybernetics in the 1940s-60s, connectionism in the 1980s-90s, and under the current name starting
More informationDeep Learning Training System: Adam & Google Cloud ML. CSE 5194: Introduction to High- Performance Deep Learning Qinghua Zhou
Deep Learning Training System: Adam & Google Cloud ML CSE 5194: Introduction to High- Performance Deep Learning Qinghua Zhou Outline Project Adam: Building an Efficient and Scalable Deep Learning Training
More informationParallel Deep Network Training
Lecture 19: Parallel Deep Network Training Parallel Computer Architecture and Programming How would you describe this professor? Easy? Mean? Boring? Nerdy? Professor classification task Classifies professors
More informationDeep Learning. Volker Tresp Summer 2014
Deep Learning Volker Tresp Summer 2014 1 Neural Network Winter and Revival While Machine Learning was flourishing, there was a Neural Network winter (late 1990 s until late 2000 s) Around 2010 there
More informationTensorFlow: A System for Learning-Scale Machine Learning. Google Brain
TensorFlow: A System for Learning-Scale Machine Learning Google Brain The Problem Machine learning is everywhere This is in large part due to: 1. Invention of more sophisticated machine learning models
More informationCNN Basics. Chongruo Wu
CNN Basics Chongruo Wu Overview 1. 2. 3. Forward: compute the output of each layer Back propagation: compute gradient Updating: update the parameters with computed gradient Agenda 1. Forward Conv, Fully
More informationCS 6453: Parameter Server. Soumya Basu March 7, 2017
CS 6453: Parameter Server Soumya Basu March 7, 2017 What is a Parameter Server? Server for large scale machine learning problems Machine learning tasks in a nutshell: Feature Extraction (1, 1, 1) (2, -1,
More informationNear-Data Processing for Differentiable Machine Learning Models
Near-Data Processing for Differentiable Machine Learning Models Hyeokjun Choe 1, Seil Lee 1, Hyunha Nam 1, Seongsik Park 1, Seijoon Kim 1, Eui-Young Chung 2 and Sungroh Yoon 1,3 1 Electrical and Computer
More informationShmCaffe: A Distributed Deep Learning Platform with Shared Memory Buffer for HPC Architecture
ShmCaffe: A Distributed Deep Learning Platform with Shared Memory Buffer for HPC Architecture Shinyoung Ahn, Joongheon Kim, Eunji Lim, Wan Choi, Aziz Mohaisen, and Sungwon Kang Dept. Info. & Comm. Engineering,
More informationDistributed Machine Learning: An Intro. Chen Huang
: An Intro. Chen Huang Feature Engineering Group, Data Mining Lab, Big Data Research Center, UESTC Contents Background Some Examples Model Parallelism & Data Parallelism Parallelization Mechanisms Synchronous
More informationOptimization in the Big Data Regime 5: Parallelization? Sham M. Kakade
Optimization in the Big Data Regime 5: Parallelization? Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for Big data 1 / 21 Announcements...
More informationCOMP6237 Data Mining Data Mining & Machine Learning with Big Data. Jonathon Hare
COMP6237 Data Mining Data Mining & Machine Learning with Big Data Jonathon Hare jsh2@ecs.soton.ac.uk Contents Going to look at two case-studies looking at how we can make machine-learning algorithms work
More informationDeep Learning Workshop. Nov. 20, 2015 Andrew Fishberg, Rowan Zellers
Deep Learning Workshop Nov. 20, 2015 Andrew Fishberg, Rowan Zellers Why deep learning? The ImageNet Challenge Goal: image classification with 1000 categories Top 5 error rate of 15%. Krizhevsky, Alex,
More informationTENSORFLOW: LARGE-SCALE MACHINE LEARNING ON HETEROGENEOUS DISTRIBUTED SYSTEMS. by Google Research. presented by Weichen Wang
TENSORFLOW: LARGE-SCALE MACHINE LEARNING ON HETEROGENEOUS DISTRIBUTED SYSTEMS by Google Research presented by Weichen Wang 2016.11.28 OUTLINE Introduction The Programming Model The Implementation Single
More informationCS-541 Wireless Sensor Networks
CS-541 Wireless Sensor Networks Lecture 14: Big Sensor Data Prof Panagiotis Tsakalides, Dr Athanasia Panousopoulou, Dr Gregory Tsagkatakis 1 Overview Big Data Big Sensor Data Material adapted from: Recent
More informationDeep Learning with Tensorflow AlexNet
Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017
3/0/207 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/0/207 Perceptron as a neural
More informationCOMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 7. Image Processing COMP9444 17s2 Image Processing 1 Outline Image Datasets and Tasks Convolution in Detail AlexNet Weight Initialization Batch Normalization
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More information27: Hybrid Graphical Models and Neural Networks
10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look
More informationFTSGD: An Adaptive Stochastic Gradient Descent Algorithm for Spark MLlib
FTSGD: An Adaptive Stochastic Gradient Descent Algorithm for Spark MLlib Hong Zhang, Zixia Liu, Hai Huang, Liqiang Wang Department of Computer Science, University of Central Florida, Orlando, FL, USA IBM
More informationParallel Stochastic Gradient Descent: The case for native GPU-side GPI
Parallel Stochastic Gradient Descent: The case for native GPU-side GPI J. Keuper Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Mark Silberstein Accelerated Computer
More informationSINGA: Putting Deep Learning in the Hands of Multimedia Users
SINGA: Putting Deep Learning in the Hands of Multimedia Users Wei Wang, Gang Chen, Tien Tuan Anh Dinh, Jinyang Gao Beng Chin Ooi, Kian-Lee Tan, Sheng Wang School of Computing, National University of Singapore,
More informationAsynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features
Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features Xu SUN ( 孙栩 ) Peking University xusun@pku.edu.cn Motivation Neural networks -> Good Performance CNN, RNN, LSTM
More informationGradient Descent. Michail Michailidis & Patrick Maiden
Gradient Descent Michail Michailidis & Patrick Maiden Outline Mo4va4on Gradient Descent Algorithm Issues & Alterna4ves Stochas4c Gradient Descent Parallel Gradient Descent HOGWILD! Mo4va4on It is good
More informationParallelism. CS6787 Lecture 8 Fall 2017
Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does
More informationMachine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari
Machine Learning Basics: Stochastic Gradient Descent Sargur N. srihari@cedar.buffalo.edu 1 Topics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation Sets
More informationarxiv: v2 [cs.ne] 26 Apr 2014
One weird trick for parallelizing convolutional neural networks Alex Krizhevsky Google Inc. akrizhevsky@google.com April 29, 2014 arxiv:1404.5997v2 [cs.ne] 26 Apr 2014 Abstract I present a new way to parallelize
More informationCafeGPI. Single-Sided Communication for Scalable Deep Learning
CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More informationAsynchronous Stochastic Gradient Descent on GPU: Is It Really Better than CPU?
Asynchronous Stochastic Gradient Descent on GPU: Is It Really Better than CPU? Florin Rusu Yujing Ma, Martin Torres (Ph.D. students) University of California Merced Machine Learning (ML) Boom Two SIGMOD
More informationGUNREAL: GPU-accelerated UNsupervised REinforcement and Auxiliary Learning
GUNREAL: GPU-accelerated UNsupervised REinforcement and Auxiliary Learning Koichi Shirahata, Youri Coppens, Takuya Fukagai, Yasumoto Tomita, and Atsushi Ike FUJITSU LABORATORIES LTD. March 27, 2018 0 Deep
More informationClass 6 Large-Scale Image Classification
Class 6 Large-Scale Image Classification Liangliang Cao, March 7, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Visual
More informationMetric Learning for Large-Scale Image Classification:
Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka
More informationA More Robust Asynchronous SGD
A More Robust Asynchronous SGD Said Al Faraby Artificial Intelligence University of Amsterdam This thesis is submitted for the degree of Master of Science February 2015 Abstract With the massive amount
More informationSCALABLE DISTRIBUTED DEEP LEARNING
SEOUL Oct.7, 2016 SCALABLE DISTRIBUTED DEEP LEARNING Han Hee Song, PhD Soft On Net 10/7/2016 BATCH PROCESSING FRAMEWORKS FOR DL Data parallelism provides efficient big data processing: data collecting,
More informationDistributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability Janis Keuper Itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern,
More informationScaling Distributed Machine Learning
Scaling Distributed Machine Learning with System and Algorithm Co-design Mu Li Thesis Defense CSD, CMU Feb 2nd, 2017 nx min w f i (w) Distributed systems i=1 Large scale optimization methods Large-scale
More informationCS489/698: Intro to ML
CS489/698: Intro to ML Lecture 14: Training of Deep NNs Instructor: Sun Sun 1 Outline Activation functions Regularization Gradient-based optimization 2 Examples of activation functions 3 5/28/18 Sun Sun
More informationLearning Visual Semantics: Models, Massive Computation, and Innovative Applications
Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center Evolvement of Visual Features
More informationLarge-scale Deep Unsupervised Learning using Graphics Processors
Large-scale Deep Unsupervised Learning using Graphics Processors Rajat Raina Anand Madhavan Andrew Y. Ng Stanford University Learning from unlabeled data Classify vs. car motorcycle Input space Unlabeled
More informationLarge-Scale Deep Learning
Large-Scale Deep Learning IPAM SUMMER SCHOOL Marc'Aurelio ranzato@google.com www.cs.toronto.edu/~ranzato UCLA, 24 July 2012 Two Approches to Deep Learning Deep Neural Nets: - usually more efficient (at
More informationApache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis
Apache Giraph for applications in Machine Learning & Recommendation Systems Maria Stylianou @marsty5 Novartis Züri Machine Learning Meetup #5 June 16, 2014 Apache Giraph for applications in Machine Learning
More informationDistributed Deep Q-Learning
Distributed Deep Q-Learning Kevin Chavez 1, Hao Yi Ong 1, and Augustus Hong 1 Abstract We propose a distributed deep learning model to successfully learn control policies directly from highdimensional
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form
More informationDeep Learning for Computer Vision II
IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L
More informationConflict Graphs for Parallel Stochastic Gradient Descent
Conflict Graphs for Parallel Stochastic Gradient Descent Darshan Thaker*, Guneet Singh Dhillon* Abstract We present various methods for inducing a conflict graph in order to effectively parallelize Pegasos.
More informationMachine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,
Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationAsynchronous Distributed Data Parallelism for Machine Learning
Asynchronous Distributed Data Parallelis for Machine Learning Zheng Yan, Yunfeng Shao Shannon Lab, Huawei Technologies Co., Ltd. Beijing, China, 100085 {yanzheng, shaoyunfeng}@huawei.co Abstract Distributed
More informationPractice Questions for Midterm
Practice Questions for Midterm - 10-605 Oct 14, 2015 (version 1) 10-605 Name: Fall 2015 Sample Questions Andrew ID: Time Limit: n/a Grade Table (for teacher use only) Question Points Score 1 6 2 6 3 15
More informationCharacterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
More informationOptimization for Machine Learning
with a focus on proximal gradient descent algorithm Department of Computer Science and Engineering Outline 1 History & Trends 2 Proximal Gradient Descent 3 Three Applications A Brief History A. Convex
More informationObject Recognition Under Complex Environmental Conditions on Remote Sensing Image
International Journal of Applied Environmental Sciences ISSN 0973-6077 Volume 11, Number 3 (016), pp. 751-758 Research India Publications http://www.ripublication.com Object Recognition Under Complex Environmental
More informationImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky Ilya Sutskever Geoffrey Hinton University of Toronto Canada Paper with same name to appear in NIPS 2012 Main idea Architecture
More informationEmpirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Authors: Junyoung Chung, Caglar Gulcehre, KyungHyun Cho and Yoshua Bengio Presenter: Yu-Wei Lin Background: Recurrent Neural
More informationAdaptive Regularization. in Neural Network Filters
Adaptive Regularization in Neural Network Filters Course 0455 Advanced Digital Signal Processing May 3 rd, 00 Fares El-Azm Michael Vinther d97058 s97397 Introduction The bulk of theoretical results and
More informationDeep Learning. Volker Tresp Summer 2015
Deep Learning Volker Tresp Summer 2015 1 Neural Network Winter and Revival While Machine Learning was flourishing, there was a Neural Network winter (late 1990 s until late 2000 s) Around 2010 there
More information291 Programming Assignment #3
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationBHNN: a Memory-Efficient Accelerator for Compressing Deep Neural Network with Blocked Hashing Techniques
BHNN: a Memory-Efficient Accelerator for Compressing Deep Neural Network with Blocked Hashing Techniques Jingyang Zhu 1, Zhiliang Qian 2*, and Chi-Ying Tsui 1 1 The Hong Kong University of Science and
More informationLogistic Regression. Abstract
Logistic Regression Tsung-Yi Lin, Chen-Yu Lee Department of Electrical and Computer Engineering University of California, San Diego {tsl008, chl60}@ucsd.edu January 4, 013 Abstract Logistic regression
More informationTENSORFLOW: LARGE-SCALE MACHINE LEARNING ON HETEROGENEOUS DISTRIBUTED SYSTEMS. By Sanjay Surendranath Girija
TENSORFLOW: LARGE-SCALE MACHINE LEARNING ON HETEROGENEOUS DISTRIBUTED SYSTEMS By Sanjay Surendranath Girija WHAT IS TENSORFLOW? TensorFlow is an interface for expressing machine learning algorithms, and
More informationDEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla
DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple
More informationAsynchronous Multi-Task Learning
Asynchronous Multi-Task Learning Inci M. Baytas, Ming Yan, Anil K. Jain and Jiayu Zhou December 14th, 2016 ICDM 2016 Inci M. Baytas, Ming Yan, Anil K. Jain and Jiayu Zhou 1 Outline 1 Introduction 2 Solving
More informationLecture 13. Deep Belief Networks. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen
Lecture 13 Deep Belief Networks Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com 12 December 2012
More informationDistributed Hessian-Free Optimization for Deep Neural Network
The AAAI-17 Workshop on Distributed Machine Learning WS-17-08 Distributed Hessian-Free Optimization for Deep Neural Network Xi He Industrial and Systems Engineering Lehigh University, USA xih314@lehigh.edu
More informationMULTICORE LEARNING ALGORITHM
MULTICORE LEARNING ALGORITHM CHENG-TAO CHU, YI-AN LIN, YUANYUAN YU 1. Summary The focus of our term project is to apply the map-reduce principle to a variety of machine learning algorithms that are computationally
More informationEfficient Deep Learning Optimization Methods
11-785/ Spring 2019/ Recitation 3 Efficient Deep Learning Optimization Methods Josh Moavenzadeh, Kai Hu, and Cody Smith Outline 1 Review of optimization 2 Optimization practice 3 Training tips in PyTorch
More informationThe Future of High Performance Computing
The Future of High Performance Computing Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant Comparing Two Large-Scale Systems Oakridge Titan Google Data Center 2 Monolithic supercomputer
More informationUnifying Adversarial Training Algorithms with Data Gradient Regularization
ARTICLE Communicated by Anh Nguyen Unifying Adversarial Training Algorithms with Data Gradient Regularization Alexander G. Ororbia II agol09@ist.psu.edu Daniel Kifer dkifer@cse.psu.edu C. Lee Giles giles@ist.psu.edu
More informationMachine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center
Machine Learning With Python Bin Chen Nov. 7, 2017 Research Computing Center Outline Introduction to Machine Learning (ML) Introduction to Neural Network (NN) Introduction to Deep Learning NN Introduction
More informationLecture 22 : Distributed Systems for ML
10-708: Probabilistic Graphical Models, Spring 2017 Lecture 22 : Distributed Systems for ML Lecturer: Qirong Ho Scribes: Zihang Dai, Fan Yang 1 Introduction Big data has been very popular in recent years.
More informationIndex. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,
A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,
More informationStacked Denoising Autoencoders for Face Pose Normalization
Stacked Denoising Autoencoders for Face Pose Normalization Yoonseop Kang 1, Kang-Tae Lee 2,JihyunEun 2, Sung Eun Park 2 and Seungjin Choi 1 1 Department of Computer Science and Engineering Pohang University
More informationWhy DNN Works for Speech and How to Make it More Efficient?
Why DNN Works for Speech and How to Make it More Efficient? Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering, York University, CANADA Joint work with Y.
More informationStructured Perceptron. Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen
Structured Perceptron Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen 1 Outline 1. 2. 3. 4. Brief review of perceptron Structured Perceptron Discriminative Training Methods for Hidden Markov Models: Theory and
More informationHuman-level Control Through Deep Reinforcement Learning (Deep Q Network) Peidong Wang 11/13/2015
Human-level Control Through Deep Reinforcement Learning (Deep Q Network) Peidong Wang 11/13/2015 Content Demo Framework Remarks Experiment Discussion Content Demo Framework Remarks Experiment Discussion
More informationCS 229 Final Report: Artistic Style Transfer for Face Portraits
CS 229 Final Report: Artistic Style Transfer for Face Portraits Daniel Hsu, Marcus Pan, Chen Zhu {dwhsu, mpanj, chen0908}@stanford.edu Dec 16, 2016 1 Introduction The goal of our project is to learn the
More informationDeep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES
Deep Learning Practical introduction with Keras Chapter 3 27/05/2018 Neuron A neural network is formed by neurons connected to each other; in turn, each connection of one neural network is associated
More informationMALT: Distributed Data-Parallelism for Existing ML Applications
MALT: Distributed -Parallelism for Existing ML Applications Hao Li, Asim Kadav, Erik Kruus NEC Laboratories America {asim, kruus@nec-labs.com} Abstract We introduce MALT, a machine learning library that
More information