Index

© Umberto Michelucci 2018
U. Michelucci, Applied Deep Learning, https://doi.org/10.1007/978-1-4842-3790-8

A
Acquisition function, 298, 301
Adam optimizer, 175–178
Anaconda navigator
    conda command, 3
    Create button, 5
    download and install, 1
    installing packages, 8
    Jupyter Notebook, 11–13
    left navigation pane, 3
    middle navigation pane, 4
    Not installed, drop-down menu, 6
    numpy, 6–7
    Python packages, 2
    screen, 1–2
    TensorFlow (see TensorFlow)
ArcTan, 47
Average pooling, 345

B
Batch gradient descent, 114–115
Bayes error, 219, 222
Bayesian optimization
    acquisition function, 298, 301
    black-box function, 302–303
    Gaussian processes, 291
    Nadaraya-Watson regression, 290
    prediction with Gaussian processes, 292–298
    stationary process, 292
    surrogate function, 302
    trigonometric function, 300
    UCB, 299
Black-box functions
    acquisition function, 301–309
    classes, 273
    global optimization, 271
    hyperparameters, 273
    neural network model, 272
    sample problem, 275–276
Boston Standard Metropolitan Statistical Area (SMSA), 59
Broadcasting, 41

C
Convolutional neural networks (CNNs)
    building blocks
        convolutional layers, 347–348
        pooling layers, 349
        stacking layers, 349
    convolution operation
        chessboard, 334–341
        examples, 332
        formal definition, 328
        image recognition, 325
        matrix formalism, 325–327
        Python, 333–334
        strides, 328, 330
        tensors, 325
        visual explanation, 329, 331
    cost function, 354
    fully connected layer, 353
    hyperparameter tuning, 350
    kernels and filters, 323–324
    mini-batch gradient descent, 354–355
    padding, 345–346
    pooling, 342–345
    ReLU, 353
    RGB, 352
    TensorFlow, 351–353
    Zalando dataset, 350

D
Dynamic learning rate decay
    exponential decay, 148–150
    gradient descent algorithm, 137–138
    inverse time decay, 145–148
    iterations/epochs, 139
    natural exponential decay, 150–156
    staircase decay, 140–142
    step decay, 142–145
    TensorFlow implementation, 158–161
    Zalando dataset, 162–163

E
Exponential decay, 148–150
Exponential Linear Unit (ELU), 47

F
Feedforward neural networks
    adding layers, 127–130
    architecture (see Network architecture)
    description, 83
    hidden layers, 130–131
    network comparison, 131–135
    overfitting (see Overfitting)
    practical example, 89
    TensorFlow (see TensorFlow)
    weight initialization, 125–127
    wrong predictions, 123–124
    Zalando dataset (see Zalando dataset)

G
Gaussian processes, 291
    prediction, 292–298
Gradient descent variations
    batch, 114–115
    cost function
        mini-batch sizes, 120
        running time, 120–121
        100 epochs, 119
    hyperparameters, 121
    mini-batch, 117–119
    model() function and parameters, 122–123
    SGD, 116–117

H
Human-level performance
    accuracy, 218, 220
    Bayes error, 219
    definition, 218–219
    Karpathy, blog post, 221–222
    MNIST dataset, 223
    techniques, 220
Hyperbolic tangent function, 41–42
Hyperparameter tuning
    activation function, 274
    Bayesian optimization (see Bayesian optimization)
    black-box optimization (see Black-box functions)
    categories, 275
    choice of optimizer, 274
    coarse-to-fine optimization, 285–289
    grid search, 277–281
    layers and neurons, 275
    learning rate decay methods, 275
    logarithmic scale, 310–312
    mini-batch size, 275
    number of epochs, 274
    radial basis function, 321–322
    random search, 282–285
    regularization method, 274
    weight initialization methods, 275
    Zalando dataset (see Zalando dataset)

I
Identity function, 38–39
Inverse time decay, 145–148

J
Jupyter Notebook
    description, 11
    documentation, 11, 13
    empty page, 13
    New button, 12
    open with, 11–12

K
K-fold cross-validation
    Adam optimizer, 259
    arrays, 256
    balanced dataset, 257
    libraries, 255
    logistic regression, 255, 258
    MNIST dataset, 254–255
    normalize data, 257
    observations, 255
    pseudo-code, 254, 259–260
    sklearn, 254, 256, 262
    standard deviation, 262
    train set and dev set, accuracy values, 261–262
    Xinputfold and yinputfold, 256

L
Leaky ReLU, 45
LeNet-5 network, 349–350
Linear regression
    cost function, 69
    dataset, 59, 61–62
    features and observations, 58
    neuron and cost function
        Boston dataset, 67
        identity function, 63–64
        learning rate, 64–65
        MSE, 62
        number of observations, 63
        output of command, 66
        predicted target value vs. measured target value, 67, 68
        TensorFlow code, 62
    numpy, 57
    observations, 57
    optimizing metric, 69
    satisficing metric, 69
    single number evaluation metric, 68
    vectors and matrices, 58
Logistic regression
    activation function, 71
    computational graph construction, Python code, 391–392
    cost function, 70–71
    dataset, 71–75
    dataset preparation, 398–399
    gradient descent algorithm, 395
    iterations, 400
    MNIST dataset, 391, 398
    prediction, 392
    Python implementation, 395–398
    sigmoid activation function, 392
    TensorFlow, 395
    weights and bias, cost function, 392–394
Long short-term memory (LSTM), 364
Lorentzian function, 370
lp norm, 192
l1 regularization
    cost function, 206
    percentage of weights less than 1e-3, 207–208
    TensorFlow implementation, 206, 207
    weights vs. epochs, 208–210
l2 regularization
    cost function, 192
    gradient descent algorithm, 193
    TensorFlow implementation
        cost function, 194, 202
        decision boundary, 202–205
        effects of, 201
        lambda, 195
        number of learnable parameters, 198
        overfitting regime, 196–197
        percentage of weights less than 1e-3, 199
        training and dev datasets, 196, 200
        weights distribution, 197

M
Manual metric analysis
    accuracy, 263
    characteristics of data, 267
    one-dimensional array, gray values, 263–267
    trained network, 268–269
Metric analysis
    bias, 223–224
    datasets
        arrays, 247
        build the model, 249
        MAD diagram, 252
        matrices, 247
        MNIST, 246
        observations, 246, 248
        professional DSLR and smartphone, 245
        random image and shifted version, 248
        single neuron, 249
        sources, 246
        techniques, data mismatch, 253
        training and dev, 251
        train the model, 249–250
        Xtrain, Xdev, and Xtraindev, 250
    dataset splitting
        dev and test datasets, 230, 232
        MNIST dataset, 231
        observations, 230, 233–234
        training and dev datasets, 233
    description, 217
    error analysis, 217
    human-level performance (see Human-level performance)
    MAD, 225, 227
    precision, recall, and F1 metrics, 239–244
    test set, 228–229
    training set overfitting, 225–227
    unbalanced class distribution (see Unbalanced class distribution)
Metric analysis diagram (MAD), 225, 227, 251–252
Mini-batch gradient descent, 117–120

N
Nadaraya-Watson regression, 290
Natural exponential decay, 150–156
Network architecture
    bias matrix, 87
    generic network, 85–86
    graphical representation, 84–85
    hyperparameters, 90
    input and output layers, 84
    matrix dimensions, 88
    output of neurons, 87–88
    softmax function, 84, 90–91
    weight matrix, 86
Neuron
    activation functions
        ArcTan, 47
        ELU, 47
        identity, 38–39
        Leaky ReLU, 45
        ReLU, 42–44
        sigmoid, 39–41
        Softplus, 47
        Swish, 46
        tanh (hyperbolic tangent), 41–42
    computational graph, 33
    cost function and gradient descent, 47–50
    gradient descent optimization, 31
    learning rate
        cost function vs. number of iterations, 55–56
        cost functions, 50, 52
        gradient descent algorithm, 51, 53–55
    linear regression (see Linear regression)
    logistic regression (see Logistic regression)
    loops and numpy, 36–37
    matrix notation, 35–36
    representation, 34–35
    structure, 31–35
    TensorFlow implementation, 75–80

O
Optimizers
    Adam, 175–177
    exponentially weighted averages, 163–167
    momentum
        cost function vs. number of epochs, 170
        3D surface plot, cost function, 171
        exponentially weighted averages, 168
        gradient descent, 167
        path, 172
        TensorFlow, 169
    RMSProp, 172–175
    self-developed, 179–182, 184
    Zalando dataset, 178
Optimizing metric, 69
Overfitting
    bias and variance, 97–98
    curve_fit function, 92
    data, 93–94
    21-degree polynomial, 95–96
    error analysis, 99–100
    linear model, 94, 96–97
    mean square error, 92
    numpy array, 93
    parameters, 92
    second-degree polynomial, 93
    two-degree polynomial, 94–95
    two-dimensional points, 92

P, Q
Padding, 345–346
Pooling, 342–345, 349

R
Radial basis function (RBF), 290, 321–322
Rectified Linear Unit (ReLU), 42–44
Recurrent neural networks (RNNs)
    chatbots, 356
    description, 355
    fully connected networks, 359, 364
    generating image labels, 356
    generating text, 356
    internal memory state, 358–359
    LSTM, 364
    metric analysis, 360
    MNIST dataset, 360
    notation, 357–358
    ReLU, 359
    schematic representation, 358
    speech recognition, 356
    target variables, 362
    TensorFlow, 360
    training and dev sets, 360, 362
    translation, 356
Regularization
    complex networks
        Adam optimizer, 188
        Boston housing price dataset, 185
        error analysis, 189
        MSE, training and dev dataset, 188–189
        packages, 185
        ReLU activation functions, 187
        target numpy array, 186
        training and dev dataset, 186
    definition, 190–191
    dropout
        construction code, 212
        cost function, 213
        keep_prob parameter, 211
        predictions, dev dataset, 211
        training and dev datasets, MSE, 214
        training phase, 211
    lp norm, 192
    methods, 216
    network complexity, 191
    overfitting, 189–190, 216
    training and dev datasets, MSE, 215
Research project
    dataset preparation
        angular frequencies, 381–383
        data frames, 379–380
        file records, 378
        interpolation functions, 382
        mathematical function, 383
        neural networks, 375
        nonlinear fitting, 380
        official documentation page, 380
        random examples, 383
        temperature and oxygen concentration, 375–377
        training dataset, 382
    gas concentration, 365
    luminescence quenching, 366–368
    mathematical models, 369
    model training
        absolute error, oxygen concentration, 388
        Adam optimizer, 385
        cost function vs. epochs, 386, 387
        mini-batches of size, 385
        neurons, 384
        predicted value for O2 vs. measured value, 387, 388
        sigmoid activation function, 385
    regression problem
        cost function, 373
        dev dataset, 371, 374–375
        Lorentzian function, 370
        mini-batch gradient descent, 373
        neural network, 369
        observations, 370–371
        predicted vs. real values, 374
        random examples, functions, 371–372
        random value, 371
        simple network, 372
        training dataset, 370
    sensor devices, 365
RMSProp, 172–175

S
Satisficing metric, 69
Self-developed optimizer, 179–182, 184
Sigmoid function, 39–41
Single number evaluation metric, 68
softmax function, 84, 90–91
Softplus, 47
Staircase decay, 140–142
Stationary process, 292
Step decay, 142–145
Stochastic gradient descent (SGD), 116–117, 119, 139
Swish activation function, 46

T
TensorFlow
    build model, 110–113
    computational graphs
        assigning values, 16
        build and evaluate, nodes, 26–27
        create and close, session, 27–28
        input quantities, 15–16
        neural network, 16
        run and evaluate, 25–26
        sum of two tensors, 19
        sum of two variables, 15
        tf.constant, 19–20
        tf.placeholder, 22–25
        tf.variable, 20–21
        variables, 14
    installation, 9–11
    linear regression (see Linear regression)
    network architecture
        hidden layer, 106
        softmax function, 106–107
        tf.nn.softmax(), 108
    one-hot encoding, 108–110
    tensors, 17–18
Training set overfitting, 225–227

U, V, W, X, Y
Unbalanced class distribution, 234
    accuracy, 237
    change metric, 239
    logistic regression, 235
    matrix for labels, 238
    MNIST dataset, 235
    observations, 239
    oversampled dataset, 239
    run the model, 236
    single neuron, 236
    training and dev dataset, 235
    undersampled dataset, 239
Upper confidence bound (UCB), 299

Z
Zalando dataset, 162–163, 178
    classes, 102
    CSV files, 103
    data_train.head(), 104
    data_train['label'], 104
    hyperparameter tuning
        accuracy, train and test datasets, 318
        build_model(number_neurons), 314–316
        cost tensor, 315
        CSV files, 313
        data_train array, 313
        dev dataset, 314
        functions, 314
        grid search, 317
        libraries, 313
        numpy array, 314
        random search, 319–320
        run the model, 317
        test dataset vs. number of neurons, 319
    kaggle, 100, 103
    MIT License, 103
    MNIST, 100
    NumPy functions, 103
    tensor labels, 105
    training and test sample, 101