Dropout Auston Sterling January 26, 2016

Outline: Motivation, Dropout, Fast Dropout, Maxout

Co-adaptation Each unit in a neural network should ideally compute one complete feature. Since units are trained together, multiple units may co-adapt, becoming dependent on one another to compute a feature. This is sub-optimal, requiring more computation and causing overfitting.

Co-adaptation [1] Which is preferable? [1] Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting.

Model Combination We can reduce overfitting by combining the outputs of many different neural nets. It is best to train each on a different subset of the data so that, while each may overfit to its subset, the combined models have a broader view. This can be prohibitively expensive and requires large amounts of data.

Sexual Reproduction Genes are taken from either of two parents. Each gene must be useful by itself; there is no guarantee that dependent genes will also make it through. Specialized genes make it easy to incorporate beneficial new ones.

Dropout [2] For each step of training, keep each unit's output with probability p (setting it to 0 otherwise). Best results with p = 0.5 for hidden units and p close to 1 for inputs. When testing, use all units but multiply the weights by p. That's it! [2] Hinton et al., Improving neural networks by preventing co-adaptation of feature detectors.
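
A minimal NumPy sketch of this recipe (the keep probabilities, array shapes, and function names are illustrative assumptions, not taken from the slides):

import numpy as np

rng = np.random.default_rng(0)

def dropout_train(x, p):
    # During training, keep each activation with probability p, zero it otherwise.
    mask = rng.random(x.shape) < p
    return x * mask

def dropout_test_weights(w, p):
    # At test time, use all units but scale the outgoing weights by p.
    return w * p

# Hidden activations kept with p = 0.5; inputs would use p close to 1 (e.g. 0.8).
h = np.array([0.3, -1.2, 0.7, 2.1])
h_train = dropout_train(h, p=0.5)          # roughly half the units silenced
W = rng.standard_normal((4, 3))
W_test = dropout_test_weights(W, p=0.5)    # averaged "mean network" used at test time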

Dropout Notes Constrain the L2 norm of each unit's weight vector (max-norm regularization) and use a large learning rate. The final trained network (with a single hidden layer of N units and a softmax output) is exactly equivalent to the geometric mean of the probability distributions over labels predicted by all 2^N dropout networks.
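
A hedged sketch of the max-norm constraint mentioned above, applied after each weight update (the cap c = 3.0 and the array layout are assumptions, not from the slides):

import numpy as np

def max_norm(W, c=3.0):
    # Project each unit's incoming weight vector back onto the ball ||w||_2 <= c.
    # W has shape (n_inputs, n_units); one column of weights per unit.
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, c / (norms + 1e-12))
    return W * scale

# Typical use after every SGD step:
#   W -= learning_rate * grad_W
#   W = max_norm(W, c=3.0)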

Dropout results

Fast Dropout Training [3] Dropout is a Monte Carlo process, sampling among the 2^N possible masks. Can the process be approximated without requiring so much sampling? If z is the mask and w the weights, $Y(z) = w^T D_z x = \sum_{i=1}^{m} w_i x_i z_i$ tends to a normal distribution. Approximate Y(z) with a Gaussian and sample from it to compute gradients. [3] Wang and Manning, Fast dropout training.
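
A rough sketch of that approximation, assuming independent Bernoulli mask entries z_i with keep probability p (function name and shapes are mine, not from the paper): the mean of Y is p * sum_i w_i x_i and its variance is p(1-p) * sum_i (w_i x_i)^2, so Y can be sampled directly from that Gaussian instead of enumerating masks.

import numpy as np

rng = np.random.default_rng(0)

def fast_dropout_sample(w, x, p=0.5, n_samples=1):
    # Sample Y(z) = sum_i w_i * x_i * z_i from its Gaussian approximation
    # instead of drawing Bernoulli masks z.
    a = w * x                                # per-unit contributions w_i * x_i
    mean = p * a.sum()                       # E[Y] with z_i ~ Bernoulli(p)
    var = p * (1.0 - p) * (a ** 2).sum()     # Var[Y] for independent z_i
    return rng.normal(mean, np.sqrt(var), size=n_samples)

w = rng.standard_normal(1000)
x = rng.standard_normal(1000)
y = fast_dropout_sample(w, x, p=0.5, n_samples=10)   # no 2^N mask enumeration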

Fast Dropout Results

Maxout Networks [4] Alternative activation function: $h_i(x) = \max_{j \in [1,k]} x^T W_{:,i,j} + b_{i,j}$. Can approximate other activations. Universal approximator. [4] Goodfellow et al., Maxout networks.
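
A small NumPy sketch of the maxout unit above (the tensor shapes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

def maxout(x, W, b):
    # h_i(x) = max over j of (x^T W[:, i, j] + b[i, j]).
    # W has shape (d, m, k): d inputs, m maxout units, k linear pieces per unit.
    z = np.einsum('d,dmk->mk', x, W) + b     # all k affine pieces for each unit
    return z.max(axis=1)                     # max over the k pieces per unit

d, m, k = 5, 3, 4
x = rng.standard_normal(d)
W = rng.standard_normal((d, m, k))
b = rng.standard_normal((m, k))
h = maxout(x, W, b)                          # shape (m,): one output per maxout unit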

Maxout and Dropout Dropout's weight scaling is exact model averaging for a single softmax layer, and also for networks made of multiple linear layers. The authors claim that combining linear operations with the max nonlinearity works particularly well with dropout.

Bibliography
Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." CoRR abs/1207.0580 (2012). URL: http://arxiv.org/abs/1207.0580.
Srivastava, Nitish, et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research 15 (2014), pp. 1929-1958. URL: http://jmlr.org/papers/v15/srivastava14a.html.
Wang, Sida, and Christopher Manning. "Fast dropout training." In Proceedings of the 30th International Conference on Machine Learning (ICML-13), ed. Sanjoy Dasgupta and David McAllester, vol. 28, no. 2, JMLR Workshop and Conference Proceedings, May 2013, pp. 118-126. URL: http://jmlr.csail.mit.edu/proceedings/papers/v28/wang13a.pdf.