CS 6501: Deep Learning for Computer Graphics. Training Neural Networks II. Connelly Barnes

Overview: Preprocessing; Initialization; Vanishing/exploding gradients problem; Batch normalization; Dropout; Additional neuron types: Softmax

Preprocessing. Common: zero-center the data; can also normalize the variance. Slide from Stanford CS231n
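A minimal NumPy sketch of zero-centering and variance normalization, assuming a hypothetical data matrix X with one example per row:

import numpy as np

X = np.random.randn(100, 3072)                       # placeholder data, shape (N, D)

X_centered = X - X.mean(axis=0)                      # zero-center each feature
X_normalized = X_centered / X_centered.std(axis=0)   # unit variance per feature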

Preprocessing. Can also decorrelate the data using PCA, or whiten the data. Slide from Stanford CS231n
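A minimal NumPy sketch of PCA decorrelation and whitening, assuming the data matrix X has already been zero-centered; the small epsilon is an assumed safeguard against division by zero.

import numpy as np

X = np.random.randn(100, 50)
X = X - X.mean(axis=0)                # zero-center first

cov = X.T @ X / X.shape[0]            # (D, D) covariance matrix
U, S, _ = np.linalg.svd(cov)          # eigenvectors U, eigenvalues S

X_decorrelated = X @ U                           # rotate into the eigenbasis (PCA)
X_whitened = X_decorrelated / np.sqrt(S + 1e-5)  # scale each component to unit variance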

Preprocessing for Images. Center the data only: compute a mean image (e.g. mean faces), either in grayscale or with a separate mean per channel (RGB), and subtract the mean from your dataset.
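A minimal NumPy sketch of mean-image subtraction for an assumed image array of shape (N, H, W, 3):

import numpy as np

images = np.random.rand(1000, 32, 32, 3).astype(np.float32)  # placeholder dataset

mean_image = images.mean(axis=0)                 # per-pixel mean image, shape (H, W, 3)
mean_per_channel = images.mean(axis=(0, 1, 2))   # or one mean per RGB channel, shape (3,)

images_centered = images - mean_per_channel      # subtract the mean from the dataset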

Overview: Preprocessing; Initialization; Vanishing/exploding gradients problem; Batch normalization; Dropout; Additional neuron types: Softmax

Initialization. We need to start gradient descent at an initial guess. What happens if we initialize all weights to zero? Slide from Stanford CS231n

Initialization. Idea: random numbers (e.g. drawn from a normal distribution), w_ij ~ N(μ, σ) with μ = 0 and σ = const. OK for shallow networks, but what about deep networks?

Initialization, σ = 0.01. Simulation: multilayer perceptron, 10 fully-connected hidden layers, tanh() activation function. Plots of hidden-layer activation statistics (Hidden Layer 1 through Hidden Layer 10). Are there any problems with this? Slide from Stanford CS231n

Initialization, σ = 1. Simulation: multilayer perceptron, 10 fully-connected hidden layers, tanh() activation function. Plots of hidden-layer activation statistics (Hidden Layer 1 through Hidden Layer 10). Are there any problems with this? Slide from Stanford CS231n
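A minimal NumPy sketch that reproduces this kind of experiment: a 10-layer tanh network on random data, printing activation statistics per layer. The layer width of 500 is an assumption, not necessarily the lecture's exact setup.

import numpy as np

hidden = np.random.randn(1000, 500)      # random input batch
sigma = 0.01                             # try 0.01 and 1.0

for layer in range(10):
    W = sigma * np.random.randn(500, 500)       # w_ij ~ N(0, sigma)
    hidden = np.tanh(hidden @ W)
    print("layer %d: mean %+.4f, std %.4f" % (layer + 1, hidden.mean(), hidden.std()))

# With sigma = 0.01 the activations shrink toward zero in later layers;
# with sigma = 1.0 they saturate near -1 and +1.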

Xavier Initialization: σ = 1/√n_in, where n_in is the number of neurons feeding into a given neuron (actually, Xavier used a uniform distribution). Plots of hidden-layer activation statistics (Hidden Layer 1 through Hidden Layer 10). A reasonable initialization for the tanh() activation function. But what happens with ReLU? Slide from Stanford CS231n

Xavier Initialization, ReLU: σ = 1/√n_in, where n_in is the number of neurons feeding into a given neuron. Plots of hidden-layer activation statistics (Hidden Layer 1 through Hidden Layer 10). Slide from Stanford CS231n

He et al. 2015 Initialization, ReLU: σ = √(2/n_in), where n_in is the number of neurons feeding into a given neuron. Plots of hidden-layer activation statistics (Hidden Layer 1 through Hidden Layer 10). Slide from Stanford CS231n
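A minimal NumPy sketch of the three initialization scales discussed on the last few slides, for a fully-connected layer with n_in inputs and n_out outputs (variable names are assumptions):

import numpy as np

n_in, n_out = 512, 512

W_naive = 0.01 * np.random.randn(n_in, n_out)               # fixed sigma = 0.01
W_xavier = np.random.randn(n_in, n_out) / np.sqrt(n_in)     # sigma = 1/sqrt(n_in), for tanh
W_he = np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)   # sigma = sqrt(2/n_in), for ReLU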

Other Ways to Initialize? Start with an existing pre-trained neural network s weights, fine tune its weights by re-running gradient descent This is really transfer learning, since it also transfers knowledge from the previously trained network Previously, people used unsupervised pre-training with autoencoders But we have better initializations now

Overview: Preprocessing; Initialization; Vanishing/exploding gradients problem; Batch normalization; Dropout; Additional neuron types: Softmax

Vanishing/exploding gradient problem. Recall from the backpropagation algorithm (last class's slides):

∂E/∂w_ij = δ_j o_i

δ_j = φ′(net_j) (o_j − t_j)        if j is an output neuron
δ_j = φ′(net_j) Σ_k δ_k w_jk       if j is an interior neuron

Take the norm of δ over all neurons in a layer; we can call this a learning speed.
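A minimal NumPy sketch of these formulas for a tiny network with one hidden layer, one output layer, sigmoid activations, and squared error; all sizes and values are placeholders.

import numpy as np

def phi(x):  return 1.0 / (1.0 + np.exp(-x))    # sigmoid activation
def dphi(x): return phi(x) * (1.0 - phi(x))     # its derivative

x = np.random.randn(4)            # input
W1 = np.random.randn(4, 5)        # input -> hidden weights
W2 = np.random.randn(5, 3)        # hidden -> output weights
t = np.array([0.0, 1.0, 0.0])     # target

net1 = x @ W1
o1 = phi(net1)                    # hidden-layer outputs
net2 = o1 @ W2
o2 = phi(net2)                    # output-layer outputs

delta_out = dphi(net2) * (o2 - t)             # delta_j for output neurons
delta_hidden = dphi(net1) * (W2 @ delta_out)  # delta_j for interior neurons: sum_k delta_k w_jk

grad_W2 = np.outer(o1, delta_out)             # dE/dw_ij = delta_j * o_i
grad_W1 = np.outer(x, delta_hidden)

# Per-layer "learning speed": the norm of delta over all neurons in that layer.
print(np.linalg.norm(delta_hidden), np.linalg.norm(delta_out))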

Vanishing/exploding gradient problem. Vanishing gradients problem: neurons in earlier layers learn more slowly than those in later layers. Image from Nielsen 2015

Vanishing/exploding gradient problem. Vanishing gradients problem: neurons in earlier layers learn more slowly than those in later layers. Exploding gradients problem: gradients are significantly larger in earlier layers than in later layers. How to avoid these? Use a good initialization. Do not use sigmoid for deep networks. Use momentum with carefully tuned learning-rate schedules. Image from Nielsen 2015
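Since the schedule shown on the slide is not reproduced here, below is only a minimal sketch of a single SGD-with-momentum update in NumPy; the learning rate and momentum values are assumptions, not the lecture's values.

import numpy as np

def sgd_momentum_step(w, grad, velocity, learning_rate=0.01, momentum=0.9):
    # Accumulate an exponentially decaying velocity, then move the weights along it.
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

w, v = np.zeros(10), np.zeros(10)
grad = np.random.randn(10)          # placeholder gradient
w, v = sgd_momentum_step(w, grad, v)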

Overview: Preprocessing; Initialization; Vanishing/exploding gradients problem; Batch normalization; Dropout; Additional neuron types: Softmax

Batch normalization. It would be great if we could just whiten the inputs to all neurons in a layer, i.e. give them zero mean and variance 1: avoid the vanishing gradients problem and improve learning rates! For each input k to the next layer, normalize: x̂_k = (x_k − E[x_k]) / √Var[x_k]. Slight problem: this reduces the representational ability of the network. Why?

Batch normalization. Answer: the whitened inputs get stuck in one part of the activation function (for a sigmoid or tanh, the roughly linear region around zero), which reduces the representational ability of the network.

Batch normalization. First whiten each input k independently, using statistics from the mini-batch: x̂_k = (x_k − μ_k) / √(σ_k² + ε). Then introduce parameters to scale and shift each input: y_k = γ_k x̂_k + β_k. These parameters are learned by the optimization.
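A minimal NumPy sketch of this training-time forward pass, assuming a mini-batch x of shape (N, D); gamma and beta are the learned scale and shift, and the running statistics needed at test time are omitted.

import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # Whiten each input k independently with mini-batch statistics.
    mu = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                     # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    # Then scale and shift with the learned parameters.
    return gamma * x_hat + beta

x = np.random.randn(64, 100)                          # hypothetical mini-batch
y = batchnorm_forward(x, np.ones(100), np.zeros(100))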

Batch normalization

Dropout: regularization. Randomly zero the outputs of a fraction p of the neurons during training. Can we learn representations that are robust to the loss of neurons? Intuition: learn and remember useful information even if there are some errors in the computation (biological connection?). Slide from Stanford CS231n

Dropout Another interpretation: we are learning a large ensemble of models that share weights. Slide from Stanford CS231n

Dropout. Another interpretation: we are learning a large ensemble of models that share weights. What can we do during testing to correct for the dropout process? Multiply all neurons' outputs by p. Or equivalently (inverted dropout), simply divide all neurons' outputs by p during training. Slide from Stanford CS231n
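A minimal NumPy sketch of inverted dropout, treating p as the keep probability so that the test-time pass needs no correction; the array shapes are assumptions.

import numpy as np

def dropout_forward(activations, p=0.5, train=True):
    if not train:
        return activations                   # inverted dropout: no change at test time
    # Keep each neuron with probability p, and divide by p during training.
    mask = (np.random.rand(*activations.shape) < p) / p
    return activations * mask

h = np.random.randn(64, 100)                 # hypothetical hidden-layer outputs
h_train = dropout_forward(h, p=0.5, train=True)
h_test = dropout_forward(h, p=0.5, train=False)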

Overview: Preprocessing; Initialization; Vanishing/exploding gradients problem; Batch normalization; Dropout; Additional neuron types: Softmax

Softmax. Often used in the final output layer to convert neuron outputs into class probability scores that sum to 1. For example, we might want to convert the final network output to: P(dog) = 0.2, P(cat) = 0.8 (each probability is in the range [0, 1], and the sum of all probabilities is 1).

Softmax. Softmax takes a vector z and outputs a vector of the same length: softmax(z)_i = exp(z_i) / Σ_j exp(z_j).
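A minimal NumPy sketch of this mapping; subtracting max(z) before exponentiating is a standard numerical-stability trick and does not change the result.

import numpy as np

def softmax(z):
    # Map raw scores z to probabilities in [0, 1] that sum to 1.
    shifted = z - np.max(z)      # avoid overflow in exp for large scores
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

scores = np.array([1.0, 2.0, 3.0])
print(softmax(scores))           # approximately [0.09, 0.24, 0.67]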