DEEP LEARNING IN PYTHON. The need for optimization

DEEP LEARNING IN PYTHON The need for optimization

A baseline neural network (diagram): input values feed a hidden layer, which produces a prediction of 9. Actual Value of Target: 13. Error: Actual - Predicted = 4

A baseline neural network (diagram, after changing the weights): the prediction becomes 13. Actual Value of Target: 13. Error: Actual - Predicted = 0

Predictions with multiple points: Making accurate predictions gets harder with more points. At any set of weights, there are many values of the error, corresponding to the many points we make predictions for.

Loss function: Aggregates the errors in predictions from many data points into a single number. It is a measure of the model's predictive performance.

Squared error loss function
Prediction  Actual  Error  Squared Error
10          20      -10    100
8           3       5      25
6           1       5      25
Total Squared Error: 150
Mean Squared Error: 50
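As a quick illustration (not part of the original slides), the same table can be checked with NumPy; the variable names are just for this sketch:

import numpy as np
predictions = np.array([10, 8, 6])
actuals = np.array([20, 3, 1])
errors = predictions - actuals          # [-10, 5, 5]
squared_errors = errors ** 2            # [100, 25, 25]
print(squared_errors.sum())             # total squared error: 150
print(squared_errors.mean())            # mean squared error: 50.0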

Loss function (plot): the loss surface shown as a function of Weight1 and Weight2.

Loss function: A lower loss function value means a better model. Goal: find the weights that give the lowest value for the loss function. The method for doing this is gradient descent.

Gradient descent: Imagine you are in a pitch-dark field and want to find the lowest point. Feel the ground to see how it slopes. Take a small step downhill. Repeat until it is uphill in every direction.

Gradient descent steps: Start at a random point. Until you are somewhere flat: find the slope, then take a step downhill.
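To make these steps concrete, here is a minimal sketch (not from the slides) of gradient descent on a single weight w; the quadratic loss (w - 4)**2 and the starting point are assumptions chosen only for illustration:

w = 0.0                              # start at an arbitrary point
learning_rate = 0.1
for step in range(50):
    slope = 2 * (w - 4)              # slope of the illustrative loss (w - 4)**2
    w = w - learning_rate * slope    # take a small step downhill
print(w)                             # close to 4, where this loss is flat (lowest)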

Optimizing a model with a single weight (plot): Loss(w) as a function of w, with the minimum value marked.

DEEP LEARNING IN PYTHON Let's practice!

DEEP LEARNING IN PYTHON Gradient descent

Gradient descent (plot): Loss(w) as a function of w.

Gradient descent: If the slope is positive, going opposite the slope means moving to lower numbers, so subtract the slope from the current value. Too big a step might lead us astray. Solution: a learning rate. Update each weight by subtracting learning rate * slope.

Slope calculation example: weight = 2, input node value = 3, prediction = 6; Actual Target Value = 10. To calculate the slope for a weight, we need to multiply: the slope of the loss function w.r.t. the value at the node we feed into; the value of the node that feeds into our weight; and the slope of the activation function w.r.t. the value we feed into.

Slope calculation example (continued): the slope of the mean squared error loss function w.r.t. the prediction is 2 * (Predicted Value - Actual Value) = 2 * Error = 2 * (-4).

Slope calculation example (continued): slope for the weight = 2 * (-4) * 3 = -24. If the learning rate is 0.01, the new weight would be 2 - 0.01 * (-24) = 2.24.
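In plain Python (a sketch, not the course code), the same single-weight update looks like this, using the slide's numbers (weight 2, input 3, target 10, learning rate 0.01) and no activation function:

weight = 2
input_value = 3
target = 10
learning_rate = 0.01
prediction = weight * input_value            # 6
error = prediction - target                  # -4
slope = 2 * error * input_value              # 2 * (-4) * 3 = -24
new_weight = weight - learning_rate * slope  # 2 - 0.01 * (-24)
print(new_weight)                            # 2.24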

Network with two inputs affecting the prediction (diagram): inputs 3 and 4, with weights 1 and 2, feeding a single output node.

Code to calculate slopes and update weights
In [1]: import numpy as np
In [2]: weights = np.array([1, 2])
In [3]: input_data = np.array([3, 4])
In [4]: target = 6
In [5]: learning_rate = 0.01
In [6]: preds = (weights * input_data).sum()
In [7]: error = preds - target
In [8]: print(error)
5

Code to calculate slopes and update weights
In [9]: gradient = 2 * input_data * error
In [10]: gradient
Out[10]: array([30, 40])
In [11]: weights_updated = weights - learning_rate * gradient
In [12]: preds_updated = (weights_updated * input_data).sum()
In [13]: error_updated = preds_updated - target
In [14]: print(error_updated)
2.5
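Repeating the same update in a loop (a sketch, not on the slides) shows the error shrinking towards zero:

import numpy as np
weights = np.array([1.0, 2.0])
input_data = np.array([3, 4])
target = 6
learning_rate = 0.01
for i in range(10):
    preds = (weights * input_data).sum()
    error = preds - target
    print(i, error)                              # 5.0, 2.5, 1.25, ... (roughly halves each step here)
    gradient = 2 * input_data * error
    weights = weights - learning_rate * gradient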

DEEP LEARNING IN PYTHON Let's practice!

DEEP LEARNING IN PYTHON Backpropagation

Backpropagation: Allows gradient descent to update all the weights in a neural network (by getting gradients for all the weights). It comes from the chain rule of calculus. It is important to understand the process, but in practice you will generally use a library that implements it. (Diagram: gradients flow backwards from the prediction error through the network.)

Backpropagation process: We are trying to estimate the slope of the loss function w.r.t. each weight. First, do forward propagation to calculate predictions and errors.

Backpropagation process (diagram): a small network with ReLU activation functions; Actual Target Value = 10.

Backpropagation process (diagram): forward propagation through the same network gives a prediction of 7; Actual Target Value = 10, so Error = 3.

Backpropagation process: Go back one layer at a time. The gradient for a weight is the product of: 1. the node value feeding into that weight; 2. the slope of the loss function w.r.t. the node it feeds into; 3. the slope of the activation function at the node it feeds into.

ReLU Activation Function (plot): relu(x) = max(0, x), so its slope is 0 when the input is negative and 1 when the input is positive.
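As a small helper sketch (my own, not from the slides), ReLU and its slope can be written as:

def relu(x):
    # pass positive values through, clip negative values to zero
    return max(0, x)

def relu_slope(x):
    # slope of ReLU: 1 for positive inputs, 0 for negative inputs
    return 1 if x > 0 else 0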

Backpropagation process: We also need to keep track of the slopes of the loss function w.r.t. the node values. The slope for a node value is the sum of the slopes for all the weights that come out of that node.

DEEP LEARNING IN PYTHON Let's practice!

DEEP LEARNING IN PYTHON Backpropagation in practice

Backpropagation (diagram): ReLU activation function; Actual Target Value = 10, prediction = 7, Error = 3. Top weight's slope = 1 * 6. Bottom weight's slope = 3 * 6.

Backpropagation (diagram): the same gradient products applied to the weights one layer further back in the network.

Calculating slopes associated with any weight: The gradient for a weight is the product of: 1. the node value feeding into that weight; 2. the slope of the activation function for the node being fed into; 3. the slope of the loss function w.r.t. the output node.

Backpropagation (diagram with a table listing each weight's current value and its gradient).
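The three-factor product can be checked with a short sketch (mine, not the course's) using the numbers from the slide above: hidden node values 1 and 3, prediction 7, target 10, and an identity activation at the output node (an assumption for this example):

node_values = [1, 3]           # values of the nodes feeding into the two weights
prediction = 7
target = 10
error = target - prediction    # 3, matching the slide's Error = 3
loss_slope = 2 * error         # slope of the squared-error loss at the output: 6
activation_slope = 1           # identity activation at the output node (assumption)
gradients = [v * activation_slope * loss_slope for v in node_values]
print(gradients)               # [6, 18]: top weight's slope = 1 * 6, bottom = 3 * 6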

Backpropagation: Recap. Start at some random set of weights. Use forward propagation to make a prediction. Use backpropagation to calculate the slope of the loss function w.r.t. each weight. Multiply that slope by the learning rate, and subtract the result from the current weights. Keep going with that cycle until we get to a flat part.

Stochastic gradient descent: It is common to calculate slopes on only a subset of the data (a "batch"). Use a different batch of data to calculate the next update. Start over from the beginning once all the data has been used. Each pass through the training data is called an epoch. When slopes are calculated on one batch at a time, this is stochastic gradient descent.
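A minimal sketch (not from the course) of this batching loop for the simple two-weight model from earlier; the data, batch size, and number of epochs are made-up values for illustration:

import numpy as np
X = np.array([[3, 4], [2, 1], [5, 2], [1, 3], [4, 4], [2, 2]])   # illustrative inputs
y = np.array([6, 3, 7, 4, 8, 4])                                  # illustrative targets
weights = np.array([1.0, 2.0])
learning_rate = 0.01
batch_size = 2
for epoch in range(5):                                # one epoch = one pass through the data
    for start in range(0, len(X), batch_size):        # one batch at a time
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        errors = X_batch @ weights - y_batch          # forward pass and errors for the batch
        gradient = 2 * (X_batch * errors[:, None]).mean(axis=0)
        weights = weights - learning_rate * gradient  # update from this batch only
    print(epoch, ((X @ weights - y) ** 2).mean())     # mean squared error after each epoch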

DEEP LEARNING IN PYTHON Let's practice!