A Quick Guide on Training a Neural Network Using Keras.


TensorFlow and Keras
Keras: open source; high level, less flexible; easy to learn; perfect for quick implementations. Started by François Chollet as a personal project and now developed by many contributors. Written in Python, a wrapper for Theano, TensorFlow, and CNTK. https://keras.io/
TensorFlow: open source; low level, you can do everything! Complete documentation; suited to deep learning research and complex networks. Developed by the Google Brain team. Written mostly in C++ and CUDA, with Python bindings. https://www.tensorflow.org/

Keras examples: feed-forward network on MNIST; recurrent network on the cosine function; convolutional network on MNIST. https://en.wikipedia.org/wiki/mnist_database

Train Feed-Forward Network (diagram: a small network with example inputs 2.5, 4.1, 3.0, small initial weights, and target 1.0). Weight initialization and loading the data. More on weight initialization: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf

Train Feed-Forward Network (diagram: each neuron computes f(Σ wᵢ·Iᵢ + b) on the example inputs). Going forward (regularization, dropout, batch normalization, ...). More on dropout: http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf

Train Feed-Forward Network (diagram: the forward pass continues and the hidden-layer activations appear). Going forward (regularization, dropout, batch normalization, ...).

Train Feed-Forward Network (diagram: the network outputs 0.55 against a target of 1.0). Loss function: loss = T - O = 0.45. More on loss functions: https://arxiv.org/abs/1702.05659

Train Feed-Forward Network (diagram: output 0.55, error 0.45). Updating the weights using backpropagation: compute ∂E/∂w for every weight. More on backpropagation: http://cs231n.github.io/optimization-2/ http://arxiv.org/abs/1502.05767

Train Feed-Forward Network (diagram: computational graph for backpropagation; local gradients ∂z/∂x and ∂z/∂y are chained with the upstream gradient ∂L/∂z). Backpropagation

Train Feed-Forward Network (diagram: the weights after one update step). By choosing the optimizer, new weights will be computed: w_new = w - η · ∂E/∂w. A quick explanation of optimizers: Optimizers
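As a minimal numeric sketch of that update rule (not code from the slides), here is one plain-NumPy SGD step; the gradient values and the learning rate are made-up placeholders:

import numpy as np

# Hypothetical current weights and their gradients dE/dw (placeholder values)
w = np.array([0.003, 0.3, 0.5, 0.4])
grad_E = np.array([0.02, -0.01, 0.05, 0.03])
eta = 0.01  # learning rate

# Vanilla SGD update: w_new = w - eta * dE/dw
w_new = w - eta * grad_E
print(w_new)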

Now Let's Code!

Design the network architecture (Sequential model → add layers → define all operations → define the output layer). Instantiate an object of the keras.models.Sequential class. All information about your network, such as weights, layers, and operations, will be stored in this object.

Design the network architecture (Sequential model → add layers → define all operations → define the output layer). The magic add method makes your life easier! You can create any layer of your choice in the network with model.add(layer). This method preserves the order in which you add layers.

Design the network architecture (Sequential model → add layers → define operations per layer → define the output layer). There are lots of layers implemented in Keras. When you add a layer to your model, a gradient operation is created in the background that takes care of computing the backward gradient automatically!

Design the network architecture (Sequential model → add layers → define all operations → define the output layer). In this example we use the Dense layer, which is the basic feed-forward, fully connected layer. All options of a layer can be passed as arguments to the Dense object: 1. number of hidden units, 2. activation function, 3. bias, 4. weight/bias initialization, 5. weight/bias regularization, 6. activation regularization.
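A hedged sketch of how those six options map onto keyword arguments of keras.layers.Dense (standalone Keras 2 API; the layer size, regularization strength, and the flattened 784-dimensional MNIST input are illustrative choices, not prescribed by the slides):

from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers

model = Sequential()
model.add(Dense(
    units=512,                                  # 1. number of hidden units
    activation='relu',                          # 2. activation function
    use_bias=True,                              # 3. bias
    kernel_initializer='glorot_uniform',        # 4. weight initialization
    bias_initializer='zeros',                   #    bias initialization
    kernel_regularizer=regularizers.l2(1e-4),   # 5. weight regularization
    activity_regularizer=None,                  # 6. activation regularization
    input_shape=(784,),                         # flattened 28x28 MNIST image
))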

Activation Function sigmoid: f(x) = 1 / (1 + e^(-x)). The output is not zero-centered! exp() is a relatively expensive operation. Small active region.

Activation Function tanh: f(x) = tanh(x). Output is zero-centered (good). exp() is an expensive operation. Smaller active region!

Activation Function Sigmoid is not a good choice! Think about the backward path when all the inputs to a neuron are positive numbers.

Activation Function ReLU (Rectified Linear Unit): f(x) = 0 for x < 0, and f(x) = x for x ≥ 0. A neuron will be active during backprop if its output was a positive number.

Activation Function Other activation functions
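For reference, a small sketch of selecting these activations in Keras 2, either through the activation argument or, for the "other" activations, as separate layers such as LeakyReLU; the layer widths here are arbitrary:

from keras.models import Sequential
from keras.layers import Dense, LeakyReLU

model = Sequential()
model.add(Dense(64, activation='sigmoid', input_shape=(784,)))  # sigmoid
model.add(Dense(64, activation='tanh'))                         # tanh
model.add(Dense(64, activation='relu'))                         # ReLU
model.add(Dense(64))
model.add(LeakyReLU(alpha=0.1))  # advanced activations ship as their own layers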

Weight initialization options: initialize all the weights at zero; small random numbers; Xavier initialization; other ideas.

Weight initialization What happens when you initialize all the weights at zero? Hint: think about backpropagation.

Weight initialization Small random numbers? 0.001 * np.random.randn(M, N). Not good for deep networks! What happens during backpropagation?

Weight initialization Xavier initialization [Glorot et al., 2010]: np.random.randn(fan_in, fan_out) / np.sqrt(fan_in). Fixes the problem of small random numbers, assuming linear activations.

Weight initialization You can use other available initializations, e.g. [He et al., 2015], or write your own. Further reading: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks; Random walk initialization for training very deep feedforward networks; Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification; Data-dependent Initializations of Convolutional Neural Networks; All you need is a good init.
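A hedged sketch of these options in Keras 2: the built-in Glorot/Xavier and He initializers selected by name, plus a custom initializer of the small-random-numbers kind the slides warn about (the 0.001 scale and layer sizes are placeholders):

from keras.layers import Dense
from keras import backend as K

# Xavier/Glorot initialization [Glorot et al., 2010]
layer_xavier = Dense(256, kernel_initializer='glorot_normal')

# He initialization [He et al., 2015], usually paired with ReLU
layer_he = Dense(256, kernel_initializer='he_normal', activation='relu')

# A custom initializer: small random numbers (the scheme that breaks deep networks)
def tiny_random(shape, dtype=None):
    return K.random_normal(shape, stddev=0.001, dtype=dtype)

layer_tiny = Dense(256, kernel_initializer=tiny_random)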

Regularization L1 regularization: loss += λ Σ |w|. L2 regularization: loss += λ Σ w². Sanity check: your loss should become larger when you use regularization.
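A minimal sketch of attaching those penalties to a layer's weights in Keras 2; the λ = 1e-4 strengths and layer sizes are placeholder values:

from keras.layers import Dense
from keras import regularizers

# L2 penalty: adds lambda * sum(w^2) to the loss
dense_l2 = Dense(128, kernel_regularizer=regularizers.l2(1e-4))

# L1 penalty: adds lambda * sum(|w|) to the loss
dense_l1 = Dense(128, kernel_regularizer=regularizers.l1(1e-4))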

Dropout keras.layers.Dropout(rate) randomly sets a fraction rate of the input units to 0 at each update during training time. More about dropout: Dropout
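A minimal sketch of placing a Dropout layer between two Dense layers; the 0.5 rate and layer sizes are illustrative:

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.5))  # zeroes 50% of the incoming activations at each training update
model.add(Dense(512, activation='relu'))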

Design the network architecture (Sequential model → add layers → define all operations → define the output layer). Based on the prediction task, you need to define your output layer properly. For classification on the MNIST dataset we have ten possible classes, so it is a multiclass classification problem; it is best to use a softmax activation on a ten-unit output layer.

Compile your model model.compile(loss, optimizer, metrics). Losses: mean_squared_error, squared_hinge, categorical_hinge, binary_crossentropy, ... Optimizers: SGD, Adam, RMSprop, ... Metrics: binary_accuracy, categorical_accuracy, top_k_categorical_accuracy, ...

Loss Function Categorical crossentropy is the appropriate loss function for a softmax output. For linear outputs use mean_squared_error. For logistic (sigmoid) outputs use binary crossentropy. Some notes on the math behind the loss: Peter's note
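Tying the last few slides together, a hedged sketch (standalone Keras 2 API) of a small MNIST model with a ten-unit softmax output, compiled with categorical crossentropy; the hidden-layer size and the choice of RMSprop are illustrative, not prescribed by the slides:

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))  # flattened 28x28 images
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))                    # one unit per digit class

model.compile(
    loss='categorical_crossentropy',   # pairs with the softmax output and one-hot labels
    optimizer='rmsprop',               # 'sgd', 'adam', ... also work
    metrics=['categorical_accuracy'],
)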

Optimizers

Train the model history = model.fit(x_train, Y_train, batch_size, ...). Useful arguments: batch_size, epochs, validation_split / validation_data, callbacks, shuffle, class_weight.

Train the model history = model.fit(x_train, Y_train, batch_size, ...). The returned History object's history attribute (history.history) stores the training loss, validation loss, metrics, and validation metric values at successive epochs in a Python dictionary.

Evaluate the performance model.evaluate(x_test, Y_test, verbose). Returns a list of scores: the first element is the loss and the rest are the metrics you specified when compiling your model.

Do the prediction! model.predict_classes(data) You can use your trained model to predict the class of a given input.
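An end-to-end sketch of the fit / evaluate / predict calls from the last few slides, assuming the compiled MNIST model from the sketch above; the batch size, epoch count, and validation split are example values:

from keras.datasets import mnist
from keras.utils import to_categorical

# Load MNIST, flatten to 784-dim vectors, scale to [0, 1], one-hot encode the labels
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
Y_train = to_categorical(y_train, 10)
Y_test = to_categorical(y_test, 10)

# Train: history.history is a dict of per-epoch loss / metric values
history = model.fit(x_train, Y_train,
                    batch_size=128, epochs=10,
                    validation_split=0.1, shuffle=True)

# Evaluate: returns [loss, metric_1, ...] in the order set at compile time
score = model.evaluate(x_test, Y_test, verbose=0)
print('test loss:', score[0], 'test accuracy:', score[1])

# Predict the class of new inputs (Sequential models in Keras 2 provide predict_classes)
predicted_digits = model.predict_classes(x_test[:5])
print(predicted_digits)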

How can we make it better?

Keras examples: feed-forward MNIST; recurrent networks, learning the cosine function; convolutional MNIST. More on recurrent neural networks: Andrej Karpathy's note, Colah's blog

Recurrent Network What is the digit written in the image? What is the next character? Hell_

Recurrent Network What is the digit written in the image? Input: an image (a matrix of numbers). Output: one of the 10 possible classes. What is the next character? Hell_ Input: ? Output: one of the 26 possible classes.

LSTM Cell (diagram: the previous cell state C_{t-1} and hidden state h_{t-1}, together with the input x_t, pass through sigmoid (σ) and tanh gates to produce the new cell state C_t and output h_t).

Learn the cosine function by looking at previous inputs.
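A hedged sketch of that setup: sliding windows of past cosine values are fed to a single LSTM layer that regresses the next value. The window length, layer size, and training settings below are assumptions, not the exact configuration from the slides:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

window = 20                     # how many previous samples the network sees
t = np.arange(0, 200, 0.1)
series = np.cos(t)

# Build (samples, timesteps, features) windows and next-value targets
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.reshape(-1, window, 1)

model = Sequential()
model.add(LSTM(32, input_shape=(window, 1)))  # "vanilla" LSTM: one recurrent layer
model.add(Dense(1))                           # linear output for regression
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=10, batch_size=64, verbose=0)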

Recurrent Network, LSTMs: Vanilla LSTM, Stateful LSTM, Wider Window, Stacked LSTM. How can we make it better?
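For the variants listed above, a rough sketch of how each one changes the model or the data (layer sizes are placeholders): a wider window only changes the number of timesteps per sample, while stacked and stateful LSTMs change the architecture:

from keras.models import Sequential
from keras.layers import LSTM, Dense

window = 20  # timesteps per sample; a "wider window" simply increases this

# Stacked LSTM: lower layers must return the full sequence for the next LSTM layer
stacked = Sequential()
stacked.add(LSTM(32, return_sequences=True, input_shape=(window, 1)))
stacked.add(LSTM(32))
stacked.add(Dense(1))

# Stateful LSTM: cell state carries over between batches; requires a fixed batch size
# and an explicit model.reset_states() call when the sequence actually ends
stateful = Sequential()
stateful.add(LSTM(32, stateful=True, batch_input_shape=(1, window, 1)))
stateful.add(Dense(1))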

Recurrent Network, LSTMs: Vanilla LSTM, Stateful LSTM, Wider Window, Stacked LSTM. Let's try a feed-forward network!