Index

© Umberto Michelucci 2018
U. Michelucci, Applied Deep Learning, https://doi.org/10.1007/978-1-4842-3790-8

A
Acquisition function, 298, 301
Adam optimizer, 175–178
Anaconda navigator
    conda command, 3
    Create button, 5
    download and install, 1
    installing packages, 8
    Jupyter Notebook, 11–13
    left navigation pane, 3
    middle navigation pane, 4
    Not installed, drop-down menu, 6
    numpy, 6–7
    Python packages, 2
    screen, 1–2
    TensorFlow (see TensorFlow)
ArcTan, 47
Average pooling, 345

B
Batch gradient descent, 114–115
Bayes error, 219, 222
Bayesian optimization
    acquisition function, 298, 301
    black-box function, 302–303
    Gaussian processes, 291
    Nadaraya-Watson regression, 290
    prediction with Gaussian processes, 292–298
    stationary process, 292
    surrogate function, 302
    trigonometric function, 300
    UCB, 299
Black-box functions
    acquisition function, 301–309
    classes, 273
    global optimization, 271
    hyperparameters, 273
    neural network model, 272
    sample problem, 275–276
Boston Standard Metropolitan Statistical Area (SMSA), 59
Broadcasting, 41

C
Convolutional neural networks (CNNs)
    building blocks
        convolutional layers, 347–348
        pooling layers, 349
        stacking layers, 349
    convolution operation
        chessboard, 334–341
        examples, 332
        formal definition, 328
        image recognition, 325
        matrix formalism, 325–327
        Python, 333–334
        strides, 328, 330
        tensors, 325
        visual explanation, 329, 331
    cost function, 354
    fully connected layer, 353
    hyperparameter tuning, 350
    kernels and filters, 323–324
    mini-batch gradient descent, 354–355
    padding, 345–346
    pooling, 342–345
    ReLU, 353
    RGB, 352
    TensorFlow, 351–353
    Zalando dataset, 350

D
Dynamic learning rate decay
    exponential decay, 148–150
    gradient descent algorithm, 137–138
    inverse time decay, 145–148
    iterations/epochs, 139
    natural exponential decay, 150–156
    staircase decay, 140–142
    step decay, 142–145
    TensorFlow implementation, 158–161
    Zalando dataset, 162–163

E
Exponential decay, 148–150
Exponential Linear Unit (ELU), 47

F
Feedforward neural networks
    adding layers, 127–130
    architecture (see Network architecture)
    description, 83
    hidden layers, 130–131
    network comparison, 131–135
    overfitting (see Overfitting)
    practical example, 89
    TensorFlow (see TensorFlow)
    weight initialization, 125–127
    wrong predictions, 123–124
    Zalando dataset (see Zalando dataset)

G
Gaussian processes, 291
    prediction, 292–298
Gradient descent variations
    batch, 114–115
    cost function
        mini-batch sizes, 120
        running time, 120–121
        100 epochs, 119
    hyperparameters, 121
    mini-batch, 117–119
    model() function and parameters, 122–123
    SGD, 116–117

H
Human-level performance
    accuracy, 218, 220
    Bayes error, 219
    definition, 218–219
    Karpathy, blog post, 221–222
    MNIST dataset, 223
    techniques, 220
Hyperbolic tangent function, 41–42
Hyperparameter tuning
    activation function, 274
    Bayesian optimization (see Bayesian optimization)
    black-box optimization (see Black-box functions)
    categories, 275
    choice of optimizer, 274
    coarse-to-fine optimization, 285–289
    grid search, 277–281
    layers and neurons, 275
    learning rate decay methods, 275
    logarithmic scale, 310–312
    mini-batch size, 275
    number of epochs, 274
    radial basis function, 321–322
    random search, 282–285
    regularization method, 274
    weight initialization methods, 275
    Zalando dataset (see Zalando dataset)

I
Identity function, 38–39
Inverse time decay, 145–148

J
Jupyter Notebook
    description, 11
    documentation, 11, 13
    empty page, 13
    New button, 12
    open with, 11–12

K
K-fold cross-validation
    Adam optimizer, 259
    arrays, 256
    balanced dataset, 257
    libraries, 255
    logistic regression, 255, 258
    MNIST dataset, 254–255
    normalize data, 257
    observations, 255
    pseudo-code, 254, 259–260
    sklearn, 254, 256, 262
    standard deviation, 262
    train set and dev set, accuracy values, 261–262
    Xinputfold and yinputfold, 256

L
Leaky ReLU, 45
LeNet-5 network, 349–350
Linear regression
    cost function, 69
    dataset, 59, 61–62
    features and observations, 58
    neuron and cost function
        Boston dataset, 67
        identity function, 63–64
        learning rate, 64–65
        MSE, 62
        number of observations, 63
        output of command, 66
        predicted target value vs. measured target value, 67, 68
        TensorFlow code, 62
    numpy, 57
    observations, 57
    optimizing metric, 69
    satisficing metric, 69
    single number evaluation metric, 68
    vectors and matrices, 58
Logistic regression
    activation function, 71
    computational graph construction, Python code, 391–392
    cost function, 70–71
    dataset, 71–75
    dataset preparation, 398–399
    gradient descent algorithm, 395
    iterations, 400
    MNIST dataset, 391, 398
    prediction, 392
    Python implementation, 395–398
    sigmoid activation function, 392
    TensorFlow, 395
    weights and bias, cost function, 392–394
Long short-term memory (LSTM), 364
Lorentzian function, 370
lp norm, 192
l1 regularization
    cost function, 206
    percentage of weights less than 1e-3, 207–208
    TensorFlow implementation, 206, 207
    weights vs. epochs, 208–210
l2 regularization
    cost function, 192
    gradient descent algorithm, 193
    TensorFlow implementation
        cost function, 194, 202
        decision boundary, 202–205
        effects of, 201
        lambda, 195
        number of learnable parameters, 198
        overfitting regime, 196–197
        percentage of weights less than 1e-3, 199
        training and dev datasets, 196, 200
        weights distribution, 197

M
Manual metric analysis
    accuracy, 263
    characteristics of data, 267
    one-dimensional array, gray values, 263–267
    trained network, 268–269
Metric analysis
    bias, 223–224
    datasets
        arrays, 247
        build the model, 249
        MAD diagram, 252
        matrices, 247
        MNIST, 246
        observations, 246, 248
        professional DSLR and smartphone, 245
        random image and shifted version, 248
        single neuron, 249
        sources, 246
        techniques, data mismatch, 253
        training and dev, 251
        train the model, 249–250
        Xtrain, Xdev, and Xtraindev, 250
    dataset splitting
        dev and test datasets, 230, 232
        MNIST dataset, 231
        observations, 230, 233–234
        training and dev datasets, 233
    description, 217
    error analysis, 217
    human-level performance (see Human-level performance)
    MAD, 225, 227
    precision, recall, and F1 metrics, 239–244
    test set, 228–229
    training set overfitting, 225–227
    unbalanced class distribution (see Unbalanced class distribution)
Metric analysis diagram (MAD), 225, 227, 251–252
Mini-batch gradient descent, 117–120

N
Nadaraya-Watson regression, 290
Natural exponential decay, 150–156
Network architecture
    bias matrix, 87
    generic network, 85–86
    graphical representation, 84–85
    hyperparameters, 90
    input and output layers, 84
    matrix dimensions, 88
    output of neurons, 87–88
    softmax function, 84, 90–91
    weight matrix, 86
Neuron
    activation functions
        ArcTan, 47
        ELU, 47
        identity, 38–39
        Leaky ReLU, 45
        ReLU, 42–44
        sigmoid, 39–41
        Softplus, 47
        Swish, 46
        tanh (hyperbolic tangent), 41–42
    computational graph, 33
    cost function and gradient descent, 47–50
    gradient descent optimization, 31
    learning rate
        cost function vs. number of iterations, 55–56
        cost functions, 50, 52
        gradient descent algorithm, 51, 53–55
    linear regression (see Linear regression)
    logistic regression (see Logistic regression)
    loops and numpy, 36–37
    matrix notation, 35–36
    representation, 34–35
    structure, 31–35
    TensorFlow implementation, 75–80

O
Optimizers
    Adam, 175–177
    exponentially weighted averages, 163–167
    momentum
        cost function vs. number of epochs, 170
        3D surface plot, cost function, 171
        exponentially weighted averages, 168
        gradient descent, 167
        path, 172
        TensorFlow, 169
    RMSProp, 172–175
    self-developed, 179–182, 184
    Zalando dataset, 178
Optimizing metric, 69
Overfitting
    bias and variance, 97–98
    curve_fit function, 92
    data, 93–94
    21-degree polynomial, 95–96
    error analysis, 99–100
    linear model, 94, 96–97
    mean square error, 92
    numpy array, 93
    parameters, 92
    second-degree polynomial, 93
    two-degree polynomial, 94–95
    two-dimensional points, 92

P, Q
Padding, 345–346
Pooling, 342–345, 349

R
Radial basis function (RBF), 290, 321–322
Rectified Linear Unit (ReLU), 42–44
Recurrent neural networks (RNNs)
    chatbots, 356
    description, 355
    fully connected networks, 359, 364
    generating image labels, 356
    generating text, 356
    internal memory state, 358–359
    LSTM, 364
    metric analysis, 360
    MNIST dataset, 360
    notation, 357–358
    ReLU, 359
    schematic representation, 358
    speech recognition, 356
    target variables, 362
    TensorFlow, 360
    training and dev sets, 360, 362
    translation, 356
Regularization
    complex networks
        Adam optimizer, 188
        Boston housing price dataset, 185
        error analysis, 189
        MSE, training and dev dataset, 188–189
        packages, 185
        ReLU activation functions, 187
        target numpy array, 186
        training and dev dataset, 186
    definition, 190–191
    dropout
        construction code, 212
        cost function, 213
        keep_prob parameter, 211
        predictions, dev dataset, 211
        training and dev datasets, MSE, 214
        training phase, 211
    lp norm, 192
    methods, 216
    network complexity, 191
    overfitting, 189–190, 216
    training and dev datasets, MSE, 215
Research project
    dataset preparation
        angular frequencies, 381–383
        data frames, 379–380
        file records, 378
        interpolation functions, 382
        mathematical function, 383
        neural networks, 375
        nonlinear fitting, 380
        official documentation page, 380
        random examples, 383
        temperature and oxygen concentration, 375–377
        training dataset, 382
    gas concentration, 365
    luminescence quenching, 366–368
    mathematical models, 369
    model training
        absolute error, oxygen concentration, 388
        Adam optimizer, 385
        cost function vs. epochs, 386, 387
        mini-batches of size, 385
        neurons, 384
        predicted value for O2 vs. measured value, 387, 388
        sigmoid activation function, 385
    regression problem
        cost function, 373
        dev dataset, 371, 374–375
        Lorentzian function, 370
        mini-batch gradient descent, 373
        neural network, 369
        observations, 370–371
        predicted vs. real values, 374
        random examples, functions, 371–372
        random value, 371
        simple network, 372
        training dataset, 370
    sensor devices, 365
RMSProp, 172–175

S
Satisficing metric, 69
Self-developed optimizer, 179–182, 184
Sigmoid function, 39–41
Single number evaluation metric, 68
softmax function, 84, 90–91
Softplus, 47
Staircase decay, 140–142
Stationary process, 292
Step decay, 142–145
Stochastic gradient descent (SGD), 116–117, 119, 139
Swish activation function, 46

T
TensorFlow
    build model, 110–113
    computational graphs
        assigning values, 16
        build and evaluate, nodes, 26–27
        create and close, session, 27–28
        input quantities, 15–16
        neural network, 16
        run and evaluate, 25–26
        sum of two tensors, 19
        sum of two variables, 15
        tf.constant, 19–20
        tf.placeholder, 22–25
        tf.variable, 20–21
        variables, 14
    installation, 9–11
    linear regression (see Linear regression)
    network architecture
        hidden layer, 106
        softmax function, 106–107
        tf.nn.softmax(), 108
    one-hot encoding, 108–110
    tensors, 17–18
Training set overfitting, 225–227

U, V, W, X, Y
Unbalanced class distribution, 234
    accuracy, 237
    change metric, 239
    logistic regression, 235
    matrix for labels, 238
    MNIST dataset, 235
    observations, 239
    oversampled dataset, 239
    run the model, 236
    single neuron, 236
    training and dev dataset, 235
    undersampled dataset, 239
Upper confidence bound (UCB), 299

Z
Zalando dataset, 162–163, 178
    classes, 102
    CSV files, 103
    data_train.head(), 104
    data_train['label'], 104
    hyperparameter tuning
        accuracy, train and test datasets, 318
        build_model(number_neurons), 314–316
        cost tensor, 315
        CSV files, 313
        data_train array, 313
        dev dataset, 314
        functions, 314
        grid search, 317
        libraries, 313
        numpy array, 314
        random search, 319–320
        run the model, 317
        test dataset vs. number of neurons, 319
    kaggle, 100, 103
    MIT License, 103
    MNIST, 100
    NumPy functions, 103
    tensor labels, 105
    training and test sample, 101