Resampling methods (Ch. 5 Intro)


1 (Note: "zavádějící faktor" = confounding factor, also called a 'simultaneously acting factor'.) Resampling Methods (Ch. 5 Intro) Key terms: train/validation/test data, cross-validation, leave-one-out (LOOCV), bootstrap. Key slides for the exam are denoted: ! References: the ISL and ESL books + a paper.

2 Model Assessment and Selection (Ch. 7, Elements of SL) We aim for generalization on independent test data. Assessing model goodness requires many choices: the distance measure between true and predicted values; the size of the train/test data, or an indirect method of error estimation (cross-validation, bootstrap); analytical scores (AIC, BIC, ...).

3 Error Measure for Regression Most common is the squared error $L(y, \hat f(x)) = (y - \hat f(x))^2$. The absolute error $L(y, \hat f(x)) = |y - \hat f(x)|$ is less common for practical reasons, but a reasonable choice otherwise. The Huber loss combines the good properties of both: quadratic for small residuals, linear for large ones.

4 Error Measure for Classification Prediction error (0-1 loss). Log-likelihood loss (= -log likelihood, cross-entropy, deviance). Asymptotically, the log-likelihood loss leads to correct probability estimates, not only to the most probable class.

5 Error Measures for Classification [Figure: comparison of classification losses — 0-1 loss (gray), log-likelihood loss (orange), SVM hinge loss (green).]

6 [Figure: a fixed train dataset.]

7 Training Error vs. Generalization Error Training error is defined as $\overline{err} = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat f(x_i))$; generalization error $Err = E[L(Y, \hat f(X))]$ is the expected error on an independent test sample. The error on test data is a direct estimate of Err. !

8 Model Assessment What do we want to assess? One domain vs. more domains (here always one domain). A model/classifier vs. an algorithm. Accuracy vs. comparison of models/algorithms. Do we have enough data: yes/no.

9 Model Accuracy, Enough Data We split the dataset into two or three pieces: train — train the model; validation — needed by some algorithms (pruning a tree, selecting the degree of a polynomial, ...); test — for the error estimate. Recommended ratios: ! without a validation set, train:test 2:1; with a validation set, 2:1:1 (for artificial data, make the test set 'almost infinite'). A split sketch in R follows.
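A minimal sketch of such a random 2:1:1 split in R, using the Auto data from the ISLR package as a stand-in dataset; the group labels are illustrative.

library(ISLR)
set.seed(1)
grp <- sample(c("train", "valid", "test"), nrow(Auto),
              replace = TRUE, prob = c(0.5, 0.25, 0.25))   # 2:1:1 in expectation
train <- Auto[grp == "train", ]
valid <- Auto[grp == "valid", ]
test  <- Auto[grp == "test", ]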

10 Stratified Sampling We have seen the Default dataset, with only 3.33% defaulting clients. A random train/test split may not preserve this ratio. We may perform the split in two steps: split the data into the Default=Yes and Default=No groups, then split each group into train/test. The Default ratio in the train and test data is then preserved. Similarly: male/female, spam/non-spam e-mail, disease/healthy. A sketch in R follows.
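A minimal sketch of the two-step stratified split in R, assuming the Default data frame from the ISLR package with its factor column default:

library(ISLR)
set.seed(1)
idx <- unlist(lapply(split(seq_len(nrow(Default)), Default$default),
                     function(g) sample(g, round(2/3 * length(g)))))  # 2/3 of each class
train <- Default[idx, ]
test  <- Default[-idx, ]
prop.table(table(train$default))   # the ~3.33% 'Yes' ratio is preserved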

11 Other Splits Consider a company with many branch offices. We may model the overall data with a standard train/test split, or select one or several branches as the test data — more appropriate for estimating the error on a new branch. Robot position estimation: for error estimation, use FUTURE positions; you may also consider a time lag between train and test data (forward chaining).

12 Algorithm Accuracy, Enough Data This never happens. Select one test dataset and M train datasets, one for each run of the algorithm. Calculate the average test error over the M models.

13 Model Assessment, Few Data Assess the algorithm, train the classifier, and report the algorithm error as the estimated error. Algorithm Assessment, Few Data That is what we usually do: cross-validation (CV), leave-one-out (LOOCV), bootstrap.

14 Cross-validation 2x5 Repeat 2 times: split the data into 5 equally sized folds; for each fold 1..5, keep this fold for the error estimate, train a model on the remaining (5-1) folds, and calculate the error of the model on the held-out fold. Calculate the average error over the 2x5 estimates. Finally, train the model on the full data and report the error estimated by the 2x5 cross-validation. 10x10 CV is often used. ! A manual sketch in R follows.
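A manual sketch of the 2x5 scheme in R for a simple regression model, assuming the Auto data from the ISLR package:

library(ISLR)
set.seed(1)
n <- nrow(Auto)
errs <- c()
for (r in 1:2) {                                   # 2 repetitions
  folds <- sample(rep(1:5, length.out = n))        # 5 random folds
  for (k in 1:5) {
    fit  <- lm(mpg ~ horsepower, data = Auto[folds != k, ])
    pred <- predict(fit, newdata = Auto[folds == k, ])
    errs <- c(errs, mean((Auto$mpg[folds == k] - pred)^2))
  }
}
mean(errs)                                         # average over the 2x5 = 10 estimates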

15 Code Example
# 10-fold CV of polynomial regressions with cv.glm (boot package);
# Auto data from the ISLR package
library(ISLR)
library(boot)
cv.error <- rep(0, 5)
for (i in 1:5) {
  glm.fit <- glm(mpg ~ poly(horsepower, i), data = Auto)
  cv.error[i] <- cv.glm(Auto, glm.fit, K = 10)$delta[1]   # CV error estimate
}
cv.error
# alternative cost functions for cv.glm:
cost  <- function(r, pi = 0) mean((r - pi)^2)             # squared error
cost1 <- function(r, pi = 0) mean(abs(r - pi) > 0.5)      # 0-1 error at threshold 0.5
cost2 <- function(r, pi = 0.5) -2 * sum(r * log(pi, 2) + (1 - r) * log(1 - pi, 2))  # deviance

16 Cross-validation for Model Selection Model complexity is often chosen by CV: the degree of a polynomial in regression, k in k-NN regression or classification, the size of a decision tree, and many others.

17 Experiment: CV Evaluation [Figure: true test error (orange), train error (blue), 1x10 CV estimate (black).]

18 Leave-One-Out Error Estimate 1xn CV, where n is the size of the data. It is deterministic, and the train set is as large as possible, but it may be computationally intensive. Sometimes it fails: imagine a random goal variable with 25 'Yes' and 25 'No', predicted by majority vote. Leaving out a 'Yes' sample makes 'No' the training majority (and vice versa), so every held-out prediction is wrong: LOOCV estimates 100% error, while the true error is 50%. ! A small demonstration follows.
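A small demonstration of this failure in R; the majority-vote 'model' and the balanced random labels follow the setup above.

set.seed(1)
y <- sample(rep(c("Yes", "No"), 25))               # 25 'Yes', 25 'No', random order
loo_wrong <- sapply(seq_along(y), function(i) {
  maj <- names(which.max(table(y[-i])))            # majority class without sample i
  maj != y[i]                                      # the held-out class is always the minority
})
mean(loo_wrong)                                    # 1.0, although the true error is 0.5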

19 Comparison of Leave-One-Out and CV Usually both methods give reasonable results. [Figure: true error (blue), leave-one-out (dashed), CV (orange).]

20 (Im)proper Use of CV Let us have n = 50 data samples; the goal variable is 25 'Yes' and 25 'No'; there are 500 features, normally distributed N(0,1) and independent of the goal. Build a model: select 'good' predictors (by p-value in logistic regression, or by correlation with the goal), build a model using just these predictors, and estimate its error by CV. This CV estimates a 3% error rate! What is wrong?

21 Proper Use of CV Divide the data at the very beginning: 1. Divide the data into K cross-validation folds at random. 2. For each fold k = 1..K: find good predictors using all samples except fold k; based on these predictors, build a classifier using all samples except fold k; use the classifier to predict the class labels for the samples in fold k. 3. Build a model on the full data and report the average CV error (50% here). A sketch of this procedure follows.
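A minimal sketch of the proper procedure in R under the setup of the previous slide (50 samples, 500 pure-noise features); the screening step — keeping the 20 features most correlated with the goal — is an illustrative choice and is redone inside every fold.

set.seed(1)
n <- 50; p <- 500
X <- matrix(rnorm(n * p), n, p)                    # noise, independent of the goal
y <- sample(rep(0:1, n / 2))                       # 25 'Yes' (1), 25 'No' (0)
folds <- sample(rep(1:10, length.out = n))
err <- sapply(1:10, function(k) {
  tr  <- folds != k
  sel <- order(abs(cor(X[tr, ], y[tr])), decreasing = TRUE)[1:20]  # select INSIDE the fold
  fit <- glm(y[tr] ~ X[tr, sel], family = binomial)
  prd <- cbind(1, X[!tr, sel, drop = FALSE]) %*% coef(fit) > 0
  mean(prd != y[!tr])
})
mean(err)                                          # close to 50%, the honest error rate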

22 Bootstrap Generating more datasets of size n by resampling reflects the estimation variance.

23 Bootstrap For b = 1..B (the required number of datasets): for i = 1..n (the number of original data samples), select a data sample at random with replacement; train the model on this bootstrap sample. Some original data samples were not selected (approx. 37%) — they will be used for the error estimate. Estimate the error: ! for each original data sample i = 1..n, calculate the average error over the models where it was not used for training; then average these averages over the different samples. A sketch in R follows.
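A sketch of this leave-one-out bootstrap error estimate in R, again assuming the Auto data from the ISLR package:

library(ISLR)
set.seed(1)
n <- nrow(Auto); B <- 100
errs <- matrix(NA, n, B)                           # error of sample i under model b
for (b in 1:B) {
  idx <- sample(n, replace = TRUE)                 # bootstrap sample of size n
  fit <- lm(mpg ~ horsepower, data = Auto[idx, ])
  out <- setdiff(1:n, idx)                         # the ~37% left-out samples
  errs[out, b] <- (Auto$mpg[out] - predict(fit, Auto[out, ]))^2
}
mean(rowMeans(errs, na.rm = TRUE))                 # average of the per-sample averages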

24 Bootstrap Example: Estimating α [Figure: independent samples (orange), bootstrap samples (blue).]

25 Bootstrap Error Estimates Probability of a sample not being in the train set: $(1 - 1/n)^n \approx e^{-1} \approx 0.368$. The easiest error estimate is the leave-one-out bootstrap $\widehat{Err}^{(1)}$ from the previous slide. Since each bootstrap model sees only about 0.632n distinct samples, a correction for the true data size mixes in the training error $\overline{err}$ (which underestimates due to overfitting): $\widehat{Err}^{(.632)} = 0.368\,\overline{err} + 0.632\,\widehat{Err}^{(1)}$. With the overfitting weight considered, we obtain the .632+ estimator of the next slide.

26 Relative Overfitting Rate Let us assume inputs and response independent. The average (no-information) error rate is $\hat\gamma = \frac{1}{n^2}\sum_{i}\sum_{i'} L(y_i, \hat f(x_{i'}))$. We compare the bootstrap error to this: the relative overfitting rate is $\hat R = (\widehat{Err}^{(1)} - \overline{err}) / (\hat\gamma - \overline{err})$, and it settles the weight $\hat w = 0.632 / (1 - 0.368\,\hat R)$ in $\widehat{Err}^{(.632+)} = (1 - \hat w)\,\overline{err} + \hat w\,\widehat{Err}^{(1)}$.

27 McNemar's Test for Model Selection Based on the 2x2 table of agreements/disagreements between two classifiers on the same test set. An exact test is important if a disagreement count is less than 5! Usually a less specific procedure is done: estimate the errors using CV or the bootstrap and test the hypothesis that the estimates differ.
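A hedged sketch in R: mcnemar.test operates on the 2x2 table of agreements/disagreements between classifiers A and B; the counts below are purely illustrative.

tab <- matrix(c(40, 12,    # A right & B right, A right & B wrong
                 5, 13),   # A wrong & B right, A wrong & B wrong
              nrow = 2, byrow = TRUE)
mcnemar.test(tab)          # tests H0: A and B have the same error rate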

28 Algorithm Comparison: Paired t-test More specific than the usual t-test: on each data sample (or fold) i, measure the difference $d_i = err_A(i) - err_B(i)$, then test the null hypothesis $H_0: \bar d = 0$ with $t = \bar d / (s_d / \sqrt{n})$.
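A one-line sketch in R, with illustrative per-fold error vectors for algorithms A and B:

errA <- c(0.12, 0.15, 0.11, 0.14, 0.13)            # per-fold errors, illustrative
errB <- c(0.16, 0.17, 0.15, 0.18, 0.14)
t.test(errA, errB, paired = TRUE)                  # tests H0: mean difference is 0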

29 Analytical Criteria Let d denote the number of model parameters; then: Akaike Information Criterion $AIC = -2\,\mathrm{loglik} + 2d$; Bayesian Information Criterion $BIC = -2\,\mathrm{loglik} + d \log N$ (corresponds to minimum description length).
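In R both criteria are available directly for fitted (generalized) linear models; a sketch on the Auto data from the ISLR package:

library(ISLR)
fits <- lapply(1:5, function(d) glm(mpg ~ poly(horsepower, d), data = Auto))
sapply(fits, AIC)   # -2 loglik + 2d
sapply(fits, BIC)   # -2 loglik + d log(n); stronger penalty for large n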

