Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,



A
Acquisition function, 298, 301
Adam optimizer
Anaconda navigator
  conda command, 3
  Create button, 5
  download and install, 1
  installing packages, 8
  Jupyter Notebook
  left navigation pane, 3
  middle navigation pane, 4
  Not installed, drop-down menu, 6
  numpy, 6–7
  Python packages, 2
  screen, 1–2
  TensorFlow (see TensorFlow)
ArcTan, 47
Average pooling, 345

B
Batch gradient descent
Bayes error, 219, 222
Bayesian optimization
  acquisition function, 298, 301
  black-box function
  Gaussian processes, 291
  Nadaraya-Watson regression, 290
  prediction with Gaussian processes
  stationary process, 292
  surrogate function, 302
  trigonometric function, 300
  UCB, 299
Black-box functions
  acquisition function
  classes, 273
  global optimization, 271
  hyperparameters, 273
  neural network model, 272
  sample problem
Boston Standard Metropolitan Statistical Area (SMSA), 59
Broadcasting, 41

C
Convolutional neural networks (CNNs)
  building blocks
    convolutional layers
    pooling layers, 349
    stacking layers, 349
  convolution operation
    chessboard
    examples, 332
    formal definition, 328
    image recognition, 325
    matrix formalism
    Python
    strides, 328, 330

Convolutional neural networks (CNNs) (cont.)
  convolution operation (cont.)
    tensors, 325
    visual explanation, 329, 331
  cost function, 354
  fully connected layer, 353
  hyperparameter tuning, 350
  kernels and filters
  mini-batch gradient descent
  padding
  pooling
  ReLU, 353
  RGB, 352
  TensorFlow
  Zalando dataset, 350

D
Dynamic learning rate decay
  exponential decay
  gradient descent algorithm
  inverse time decay
  iterations/epochs, 139
  natural exponential decay
  staircase decay
  step decay
  TensorFlow implementation
  Zalando dataset

E
Exponential decay
Exponential Linear unit (ELU), 47

F
Feedforward neural networks
  adding layers
  architecture (see Network architecture)
  description, 83
  hidden layers
  network comparison
  overfitting (see Overfitting)
  practical example, 89
  TensorFlow (see TensorFlow)
  weight initialization
  wrong predictions
  Zalando dataset (see Zalando dataset)

G
Gaussian processes, 291
  prediction
Gradient descent variations
  batch
  cost function
    mini-batch sizes, 120
    running time, epochs, 119
  hyperparameters, 121
  mini-batch
  model() function and parameters
  SGD

H
Human-level performance
  accuracy, 218, 220
  Bayes error, 219
  definition
  Karpathy, blog post
  MNIST dataset, 223
  techniques, 220
Hyperbolic tangent function
Hyperparameter tuning
  activation function, 274
  Bayesian optimization (see Bayesian optimization)

Hyperparameter tuning (cont.)
  black-box optimization (see Black-box functions)
  categories, 275
  choice of optimizer, 274
  coarse-to-fine optimization
  grid search
  layers and neurons, 275
  learning rate decay methods, 275
  logarithmic scale
  mini-batch size, 275
  number of epochs, 274
  radial basis function
  random search
  regularization method, 274
  weight initialization methods, 275
  Zalando dataset (see Zalando dataset)

I
Identity function
Inverse time decay

J
Jupyter Notebook
  description, 11
  documentation, 11, 13
  empty page, 13
  New button, 12
  open with

K
K-fold cross-validation
  Adam optimizer, 259
  arrays, 256
  balanced dataset, 257
  libraries, 255
  logistic regression, 255, 258
  MNIST dataset
  normalize data, 257
  observations, 255
  pseudo-code, 254
  sklearn, 254, 256, 262
  standard deviation, 262
  train set and dev set, accuracy values
  Xinputfold and yinputfold, 256

L
Leaky ReLU, 45
LeNet-5 network
Linear regression
  cost function, 69
  dataset, 59
  features and observations, 58
  neuron and cost function
    Boston dataset, 67
    identity function
    learning rate
    MSE, 62
    number of observations, 63
    output of command, 66
    predicted target value vs. measured target value, 67, 68
    TensorFlow code, 62
  numpy, 57
  observations, 57
  optimizing metric, 69
  satisficing metric, 69
  single number evaluation metric, 68
  vectors and matrices, 58
Logistic regression
  activation function, 71
  computational graph
    construction
    Python code

Logistic regression (cont.)
  cost function
  dataset
  dataset preparation
  gradient descent algorithm, 395
  iterations, 400
  MNIST dataset, 391, 398
  prediction, 392
  Python implementation
  sigmoid activation function, 392
  TensorFlow, 395
  weights and bias
  cost function
Long short-term memory (LSTM), 364
Lorentzian function, 370
l_p norm, 192
l_1 regularization
  cost function, 206
  percentage of weights less than 1e-3
  TensorFlow implementation, 206, 207
  weights vs. epochs
l_2 regularization
  cost function, 192
  gradient descent algorithm, 193
  TensorFlow implementation
    cost function, 194, 202
    decision boundary
    effects of, 201
    lambda, 195
    number of learnable parameters, 198
    overfitting regime
    percentage of weights less than 1e-3, 199
    training and dev datasets, 196, 200
    weights distribution

M
Manual metric analysis
  accuracy, 263
  characteristics of data, 267
  one-dimensional array, gray values
  trained network
Metric analysis
  bias
  datasets
    arrays, 247
    build the model, 249
    MAD diagram, 252
    matrices, 247
    MNIST, 246
    observations, 246, 248
    professional DSLR and smartphone, 245
    random image and shifted version, 248
    single neuron, 249
    sources, 246
    techniques, data mismatch, 253
    training and dev, 251
    train the model
    Xtrain, Xdev, and Xtraindev, 250
  dataset splitting
    dev and test datasets, 230, 232
    MNIST dataset, 231
    observations, 230
    training and dev datasets, 233
  description, 217
  error analysis, 217
  human-level performance (see Human-level performance)
  MAD, 225, 227
  precision, recall, and F1 metrics
  test set

Metric analysis (cont.)
  training set overfitting
  unbalanced class distribution (see Unbalanced class distribution)
Metric analysis diagram (MAD), 225, 227
Mini-batch gradient descent

N
Nadaraya-Watson regression, 290
Natural exponential decay
Network architecture
  bias matrix, 87
  generic network
  graphical representation
  hyperparameters, 90
  input and output layers, 84
  matrix dimensions, 88
  output of neurons
  softmax function, 84
  weight matrix, 86
Neuron
  activation functions
    ArcTan, 47
    ELU, 47
    identity
    Leaky ReLU, 45
    ReLU
    sigmoid
    Softplus, 47
    Swish, 46
    tanh (hyperbolic tangent)
  computational graph, 33
  cost function and gradient descent
  gradient descent optimization, 31
  learning rate
    cost function vs. number of iterations
    cost functions, 50, 52
    gradient descent algorithm, 51
  linear regression (see Linear regression)
  logistic regression (see Logistic regression)
  loops and numpy
  matrix notation
  representation
  structure
  TensorFlow implementation

O
Optimizers
  Adam
  exponentially weighted averages
  momentum
    cost function vs. number of epochs, 170
    3D surface plot, cost function, 171
    exponentially weighted averages, 168
    gradient descent, 167
    path, 172
    TensorFlow, 169
  RMSProp
  self-developed, 184
  Zalando dataset, 178
Optimizing metric, 69
Overfitting
  bias and variance
  curve_fit function, 92
  data
  degree polynomial
  error analysis
  linear model, 94
  mean square error

Overfitting (cont.)
  numpy array, 93
  parameters, 92
  second-degree polynomial, 93
  two-degree polynomial
  two-dimensional points, 92

P, Q
Padding
Pooling, 349

R
Radial basis function (RBF), 290
Rectified Linear Unit (ReLU)
Recurrent neural networks (RNNs)
  chatbots, 356
  description, 355
  fully connected networks, 359, 364
  generating image labels, 356
  generating text, 356
  internal memory state
  LSTM, 364
  metric analysis, 360
  MNIST dataset, 360
  notation
  ReLU, 359
  schematic representation, 358
  speech recognition, 356
  target variables, 362
  TensorFlow, 360
  training and dev sets, 360, 362
  translation, 356
Regularization
  complex networks
    Adam optimizer, 188
    Boston housing price dataset, 185
    error analysis, 189
    MSE, training and dev dataset
    packages, 185
    ReLU activation functions, 187
    target numpy array, 186
    training and dev dataset, 186
  definition
  dropout
    construction code, 212
    cost function, 213
    keep_prob parameter, 211
    predictions, dev dataset, 211
    training and dev datasets, MSE, 214
    training phase, 211
  l_p norm, 192
  methods, 216
  network complexity, 191
  overfitting, 216
  training and dev datasets, MSE, 215
Research project
  dataset preparation
    angular frequencies
    data frames
    file records, 378
    interpolation functions, 382
    mathematical function, 383
    neural networks, 375
    nonlinear fitting, 380
    official documentation page, 380
    random examples, 383
    temperature and oxygen concentration
    training dataset, 382
  gas concentration, 365
  luminescence quenching
  mathematical models, 369
  model training

Research project (cont.)
  model training (cont.)
    absolute error, oxygen concentration, 388
    Adam optimizer, 385
    cost function vs. epochs, 386, 387
    mini-batches of size, 385
    neurons, 384
    predicted value for O2 vs. measured value, 387, 388
    sigmoid activation function, 385
  regression problem
    cost function, 373
    dev dataset, 371
    Lorentzian function, 370
    mini-batch gradient descent, 373
    neural network, 369
    observations
    predicted vs. real values, 374
    random examples, functions
    random value, 371
    simple network, 372
    training dataset, 370
  sensor devices, 365
RMSProp

S
Satisficing metric, 69
Self-developed optimizer, 184
Sigmoid function
Single number evaluation metric, 68
softmax function, 84
Softplus, 47
Staircase decay
Stationary process, 292
Step decay
Stochastic gradient descent (SGD), 119, 139
Swish activation function, 46

T
TensorFlow
  build model
  computational graphs
    assigning values, 16
    build and evaluate, nodes
    create and close, session
    input quantities
    neural network, 16
    run and evaluate
    sum of two tensors, 19
    sum of two variables, 15
    tf.constant
    tf.placeholder
    tf.variable
    variables, 14
  installation, 9–11
  linear regression (see Linear regression)
  network architecture
    hidden layer, 106
    softmax function
    tf.nn.softmax(), 108
  one-hot encoding
  tensors
Training set overfitting

U, V, W, X, Y
Unbalanced class distribution, 234
  accuracy, 237
  change metric, 239
  logistic regression, 235
  matrix for labels, 238
  MNIST dataset, 235
  observations, 239
  oversampled dataset, 239
  run the model

Unbalanced class distribution (cont.)
  single neuron, 236
  training and dev dataset, 235
  undersampled dataset, 239
Upper confidence bound (UCB), 299

Z
Zalando dataset, 178
  classes, 102
  CSV files, 103
  data_train.head(), 104
  data_train['label'], 104
  hyperparameter tuning
    accuracy, train and test datasets, 318
    build_model(number_neurons)
    cost tensor, 315
    CSV files, 313
    data_train array, 313
    dev dataset, 314
    functions, 314
    grid search, 317
    libraries, 313
    numpy array, 314
    random search
    run the model, 317
    test dataset vs. number of neurons, 319
  kaggle, 100, 103
  MIT License, 103
  MNIST, 100
  NumPy functions, 103
  tensor labels, 105
  training and test sample
