Practical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow


1 Practical Methodology Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow

2 What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains of data? Knowing how to apply 3-4 standard techniques?

3 Example: Street View Address Number Transcription (Goodfellow et al., 2014)

4 Three Step Process Use needs to define metric-based goals Build an end-to-end system Data-driven refinement

5 Identify Needs High accuracy or low accuracy? Surgery robot: high accuracy Celebrity look-a-like app: low accuracy

6 Choose Metrics Accuracy? (% of examples correct) Coverage? (% of examples processed) Precision? (% of detections that are right) Recall? (% of objects detected) Amount of error? (For regression problems)
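The metrics listed on this slide can be made concrete with a small sketch. The function name `classification_metrics`, the binary 0/1 labels, and the convention that `None` means "the system declined to process this example" are assumptions made for illustration, not part of the slides. Accuracy here is measured only on the examples the system chose to process, which is one common way to pair it with coverage.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, coverage, precision, and recall for binary labels.

    y_true: list of 0/1 ground-truth labels.
    y_pred: list of 0/1 predictions, or None where the system declined.
    """
    # Keep only the examples the system actually processed.
    answered = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]

    # Coverage: fraction of examples processed.
    coverage = len(answered) / len(y_true)

    # Accuracy: fraction of processed examples that are correct.
    correct = sum(1 for t, p in answered if t == p)
    accuracy = correct / len(answered) if answered else 0.0

    true_pos = sum(1 for t, p in answered if t == 1 and p == 1)
    false_pos = sum(1 for t, p in answered if t == 0 and p == 1)
    # A positive example that was declined counts as missed.
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)

    # Precision: fraction of detections that are right.
    detections = true_pos + false_pos
    precision = true_pos / detections if detections else 0.0

    # Recall: fraction of true objects that were detected.
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0

    return {
        "accuracy": accuracy,
        "coverage": coverage,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]
    predictions = [1, 0, 0, None, 1, 1]
    print(classification_metrics(labels, predictions))
```

Which of these numbers to optimize depends on the needs identified in the previous step: a system that may decline hard cases can trade coverage for accuracy.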

7 End-to-end System Get up and running ASAP Build the simplest viable system first What baseline to start with though? Copy state-of-the-art from related publication

8 Deep or Not? Lots of noise, little structure -> not deep Little noise, complex structure -> deep Good shallow baseline: Use what you know Logistic regression, SVM, boosted tree are all good

9 Choosing Architecture Family No structure -> fully connected Spatial structure -> convolutional Sequential structure -> recurrent

10 Fully Connected Baseline 2-3 hidden layer feed-forward neural network AKA multilayer perceptron Rectified linear units Batch normalization Adam Maybe dropout

11 Convolutional Network Baseline Download a pretrained network Or copy-paste an architecture from a related task Or: Deep residual network Batch normalization Adam

12 Recurrent Network Baseline LSTM SGD Gradient clipping High forget gate bias
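Gradient clipping appears in the recurrent baseline without further detail. The sketch below shows one common variant, rescaling the gradient when its norm exceeds a threshold. The slide does not say which variant is meant, so the choice of norm clipping, the function name `clip_by_norm`, and the flat list-of-floats gradient are all assumptions for illustration.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale a gradient vector so its L2 norm is at most max_norm.

    grads: list of floats (a flattened gradient; hypothetical layout).
    Returns the (possibly rescaled) gradient and the original norm.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        # Small gradients pass through unchanged.
        return list(grads), norm
    # Large gradients keep their direction but are shrunk to max_norm.
    scale = max_norm / norm
    return [g * scale for g in grads], norm


if __name__ == "__main__":
    clipped, original_norm = clip_by_norm([3.0, 4.0], max_norm=1.0)
    print(original_norm, clipped)
```

Because only the length of the update changes and not its direction, an occasional very large gradient does not throw the parameters far from their current values.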

13 Data-driven Adaptation Choose what to do based on data Don't believe hype Measure train and test error Overfitting versus underfitting

14 High Train Error Inspect data for defects Inspect software for bugs Don't roll your own unless you know what you're doing Tune learning rate (and other optimization settings) Make model bigger

15 Checking Data for Defects Can a human process it? (Figure: example input image; surviving text: 26624)

16 Increasing Depth (Figure: Effect of Depth; test accuracy (%) versus number of hidden layers)

17 High Test Error Add dataset augmentation Add dropout Collect more data
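The data-driven loop on the "High Train Error" and "High Test Error" slides can be sketched as a simple decision rule. The suggestions are taken from the slides; the function name `suggest_next_steps`, the single `target_error` threshold, and the rule of checking train error before test error are assumptions made for illustration.

```python
HIGH_TRAIN_ERROR_STEPS = [
    "Inspect data for defects",
    "Inspect software for bugs",
    "Tune learning rate (and other optimization settings)",
    "Make model bigger",
]

HIGH_TEST_ERROR_STEPS = [
    "Add dataset augmentation",
    "Add dropout",
    "Collect more data",
]


def suggest_next_steps(train_error, test_error, target_error):
    """Return the slide's suggestions that match the measured errors.

    All three arguments are error rates on the same scale (e.g. 0.05 = 5%).
    target_error is a hypothetical goal derived from the chosen metric.
    """
    if train_error > target_error:
        # Underfitting: the model cannot even fit the training set.
        return list(HIGH_TRAIN_ERROR_STEPS)
    if test_error > target_error:
        # Overfitting: training is fine but generalization is not.
        return list(HIGH_TEST_ERROR_STEPS)
    # Both errors meet the goal; nothing further is suggested here.
    return []


if __name__ == "__main__":
    for step in suggest_next_steps(0.01, 0.20, target_error=0.05):
        print(step)
```

Train error is checked first because fixes for high test error are of little use while the model still fails on the data it was trained on.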

18 Increasing Training Set Size (Figure, two panels: error (MSE) versus # train examples, with curves for Bayes error, Train (quadratic), Test (quadratic), Test (optimal capacity), Train (optimal capacity); and optimal capacity (polynomial degree) versus # train examples)

19 Tuning the Learning Rate (Figure 11.1: training error versus learning rate, logarithmic scale)
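A sweep over learning rates on a logarithmic scale can be illustrated on a toy problem. The sketch below is not the experiment behind Figure 11.1: the two-parameter quadratic loss, the curvatures, the step count, and the candidate rates are all assumptions chosen so that plain gradient descent is too slow at small rates and diverges at large ones.

```python
def final_loss(learning_rate, steps=50):
    """Run gradient descent on a toy quadratic and return the final loss.

    Loss: sum_i 0.5 * c_i * w_i**2 with assumed curvatures c = [1, 10].
    """
    curvatures = [1.0, 10.0]
    weights = [1.0, 1.0]
    for _ in range(steps):
        # The gradient of 0.5 * c * w**2 with respect to w is c * w.
        weights = [w - learning_rate * c * w
                   for w, c in zip(weights, curvatures)]
    return sum(0.5 * c * w * w for w, c in zip(weights, curvatures))


# Candidate rates spaced by factors of 10 (a logarithmic scale).
learning_rates = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = {lr: final_loss(lr) for lr in learning_rates}
best_lr = min(losses, key=losses.get)

if __name__ == "__main__":
    for lr in learning_rates:
        print(f"lr={lr:g}  final loss={losses[lr]:.3g}")
    print("best:", best_lr)
```

On this toy problem the loss is high at both ends of the sweep: the smallest rates barely move the weights in 50 steps, while the largest rate overshoots along the high-curvature direction and diverges.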

20 Reasoning about Hyperparameters (Table 11.1)
Hyperparameter | Increases capacity when... | Reason | Caveats
Number of hidden units | increased | Increasing the number of hidden units increases the representational capacity of the model. | Increasing the number of hidden units increases both the time and memory cost of essentially every operation on the model.
Learning rate | tuned op- | An improper learning rate, |

21 Hyperparameter Search Grid versus Random (Figure 11.2)
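The contrast between grid and random search can be sketched by generating candidate configurations both ways with the same budget. The two hyperparameters (learning rate and number of hidden units), their ranges, and the function names are assumptions for illustration; training and evaluation are left out.

```python
import itertools
import random


def grid_search_points(lr_values, unit_values):
    """All combinations of the listed values (a full grid)."""
    return list(itertools.product(lr_values, unit_values))


def random_search_points(n, rng):
    """n configurations drawn independently for each hyperparameter."""
    points = []
    for _ in range(n):
        # Learning rate sampled uniformly on a log scale in [1e-5, 1e-1].
        lr = 10 ** rng.uniform(-5, -1)
        # Hidden units sampled uniformly from an assumed integer range.
        units = rng.randint(32, 512)
        points.append((lr, units))
    return points


# Same budget of 9 trials for both strategies.
grid = grid_search_points([1e-4, 1e-3, 1e-2], [64, 128, 256])
rand = random_search_points(9, random.Random(0))

# Count how many distinct learning rates each strategy tries.
distinct_grid_lrs = len({lr for lr, _ in grid})
distinct_rand_lrs = len({lr for lr, _ in rand})

if __name__ == "__main__":
    print("grid tries", distinct_grid_lrs, "distinct learning rates")
    print("random tries", distinct_rand_lrs, "distinct learning rates")
```

With nine trials the grid visits only three values of each hyperparameter, whereas random sampling visits a different value on every trial, which helps when only one of the hyperparameters matters much.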
