2017 ITRON EFG Meeting. Abdul Razack. Specialist, Load Forecasting NV Energy


1 2017 ITRON EFG Meeting Abdul Razack Specialist, Load Forecasting NV Energy

2 Topics
1. Concepts
2. Model (Variable) Selection Methods
3. Cross-Validation
4. Cross-Validation: Time Series
5. Example 1
6. Example 2
7. Model Fit
8. Summary
9. Appendix

3 Concepts
Test Error: The average error that results from using a statistical learning method to predict the response on a new observation, that is, a measurement that was not used in training the method.
Training Error: The same average error, computed on the observations in the training set.
Bias-Variance Trade-off: Including many predictors leads to low bias and high variance; including few predictors leads to high bias and low variance. Balancing these two extremes is the bias-variance trade-off.
Overfitting: When a given statistical method yields a small training MSE (Mean Squared Error) but a large test MSE, we are said to be overfitting the data.
Variable Selection: The task of determining which predictors are associated with the response, in order to fit a single model involving only those predictors.
Model Selection: The process of selecting the proper level of flexibility (number of predictors) for a model.
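The gap between training error and test error can be seen in a small experiment. The sketch below is not from the slides: it swaps in k-nearest-neighbor regression as the flexible method (smaller k means more flexibility) and uses synthetic data, so every name and number in it is an assumption made for illustration.

```python
import random

def knn_predict(train, x, k):
    """Average the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of k-NN predictions over the points in `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

random.seed(0)

def sample(n):
    # Hypothetical data: a linear signal plus Gaussian noise.
    points = []
    for _ in range(n):
        x = random.uniform(0, 10)
        points.append((x, 2.0 * x + random.gauss(0, 3)))
    return points

train, test = sample(60), sample(60)

# For each k, store (training MSE, test MSE).
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 20)}
for k, (train_mse, test_mse) in results.items():
    print(f"k={k:2d}  training MSE={train_mse:7.2f}  test MSE={test_mse:7.2f}")
```

With k = 1 each training point is its own nearest neighbor, so the training MSE is zero while the test MSE is not, which is the overfitting pattern defined above.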

4 Model (Variable) Selection Methods
1. Forward Selection
2. Backward Elimination
3. Stepwise Regression
4. Best Subset Selection
5. Test Error Rate
   Mathematical adjustments to the training error rate to estimate the test error rate:
   1. AIC (Akaike Information Criterion)
   2. BIC (Bayesian Information Criterion)
   3. Mallows's Cp
   4. Adjusted R-squared
   Direct estimation of the test error by holding out data:
   1. Cross-Validation
6. Lasso
7. Ridge Regression
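The four adjustment-based criteria can be computed from quantities a least-squares fit already provides. The sketch below follows one common textbook convention for linear models with Gaussian errors; other references scale AIC and BIC by different constants, so treat the exact forms, the function name, and the argument names as assumptions rather than the presenter's definitions.

```python
import math

def selection_criteria(rss, tss, n, d, sigma2):
    """Penalized estimates of test error for a least-squares model.

    rss    residual sum of squares of the fitted model
    tss    total sum of squares of the response
    n      number of observations
    d      number of predictors (excluding the intercept)
    sigma2 estimate of the error variance
    """
    cp = (rss + 2 * d * sigma2) / n
    aic = (rss + 2 * d * sigma2) / (n * sigma2)
    bic = (rss + math.log(n) * d * sigma2) / (n * sigma2)
    adj_r2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1))
    return {"Cp": cp, "AIC": aic, "BIC": bic, "adj_R2": adj_r2}

# Hypothetical inputs, chosen only to exercise the formulas.
crit = selection_criteria(rss=50.0, tss=200.0, n=100, d=3, sigma2=0.5)
for name, value in crit.items():
    print(f"{name:7s} {value:.4f}")
```

Lower Cp, AIC, and BIC indicate a better model, while a higher adjusted R-squared does. Because log(n) exceeds 2 once n is larger than 7, BIC penalizes extra predictors more heavily than AIC in this form.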

5 Cross-Validation
Cross-Validation (CV) Methods
1. Validation Set Approach
2. LOOCV (Leave-One-Out Cross-Validation)
3. k-fold CV
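The three methods differ only in how the observations are partitioned. A minimal sketch, assuming a placeholder model that predicts the mean of the training responses (the helper names `k_fold_indices` and `cv_mse` are hypothetical):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV over n observations."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # random assignment to folds
    folds = [idx[i::k] for i in range(k)]  # k folds of nearly equal size
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def cv_mse(y, k):
    """k-fold CV estimate of test MSE for a placeholder model that
    predicts the mean response of the training folds."""
    fold_mses = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        mean = sum(y[j] for j in train_idx) / len(train_idx)
        fold_mses.append(sum((y[j] - mean) ** 2 for j in test_idx) / len(test_idx))
    return sum(fold_mses) / k

y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
print("3-fold CV MSE:", cv_mse(y, k=3))
print("LOOCV MSE:    ", cv_mse(y, k=len(y)))  # LOOCV is k-fold CV with k = n
```

The validation set approach corresponds to using a single (train, test) pair instead of averaging over all k of them.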

6 Time Series CV
For ordered data, randomly splitting the observations into training and test sets does not apply, because the observations are dependent.
Four variants (and many more) exist, depending on whether the length of the training data is fixed or varying and on the length of the test data.
No future observation can be used as part of the training set.
Sometimes called evaluation on a rolling forecasting origin.
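The constraint that no future observation enters the training set can be expressed as a split generator. This is a sketch under assumed parameter names (`initial`, `horizon`, `step`, `fixed_window` are all hypothetical); it covers both a growing and a fixed-length training window.

```python
def rolling_origin_splits(n, initial, horizon=1, step=1, fixed_window=False):
    """Yield (train_idx, test_idx) so that every test index follows
    every training index.

    n            length of the series
    initial      size of the first training window
    horizon      number of observations forecast from each origin
    step         how far the forecasting origin advances each time
    fixed_window if True, the training window slides instead of growing
    """
    end = initial
    while end + horizon <= n:
        start = end - initial if fixed_window else 0
        yield list(range(start, end)), list(range(end, end + horizon))
        end += step

# Growing window: the training set expands by `step` at each origin.
for train_idx, test_idx in rolling_origin_splits(10, initial=4, horizon=2, step=2):
    print("train", train_idx, "test", test_idx)

# Fixed window: the training set keeps its initial length.
for train_idx, test_idx in rolling_origin_splits(10, initial=4, horizon=2, step=2,
                                                 fixed_window=True):
    print("train", train_idx, "test", test_idx)
```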

7 Time Series CV: Example 1
1. Choosing among three competing models
2. Data: Monthly count of hotel rooms in Las Vegas from 1998 to ...
3. The training set grows by 12 months in each draw
4. Total of 3 x 9 model fits

8 Example 1: Heat Map of ARIMA(0,2,1)

9 Time Series CV: Example 2
1. The training set grows by 1 month in each sample
2. Total of 3 x 97 model fits
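Examples 1 and 2 differ only in how far the forecasting origin advances between fits. The sketch below illustrates this with a synthetic monthly series and three simple stand-in forecasters (naive, seasonal naive, drift), not the models from the slides. The series length (132), initial window (24), and horizon (12) are assumptions, chosen only because they yield 9 origins with a 12-month step and 97 origins with a 1-month step; the source does not state the actual sizes.

```python
import math

def naive(train, h):
    """Repeat the last observed value."""
    return [train[-1]] * h

def seasonal_naive(train, h, m=12):
    """Repeat the value observed m months before each forecast month."""
    return [train[-m + (i % m)] for i in range(h)]

def drift(train, h):
    """Extend the straight line through the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (i + 1) for i in range(h)]

MODELS = {"naive": naive, "seasonal_naive": seasonal_naive, "drift": drift}

def rolling_origin_errors(series, initial, horizon, step):
    """Return ({model: rows of absolute errors}, number of model fits).

    Each row holds one origin's errors, one column per forecast horizon.
    """
    errors = {name: [] for name in MODELS}
    fits = 0
    end = initial
    while end + horizon <= len(series):
        train, test = series[:end], series[end:end + horizon]
        for name, model in MODELS.items():
            forecast = model(train, horizon)
            errors[name].append([abs(f - a) for f, a in zip(forecast, test)])
            fits += 1
        end += step
    return errors, fits

# Synthetic series: linear trend plus an annual cycle (an assumption).
series = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(132)]

for step in (12, 1):
    errors, fits = rolling_origin_errors(series, initial=24, horizon=12, step=step)
    print(f"step={step:2d}: {fits} model fits")
    for name, rows in errors.items():
        mae = sum(sum(row) for row in rows) / (len(rows) * 12)
        print(f"  {name:15s} mean absolute error = {mae:.2f}")
```

The origin-by-horizon error matrix returned for each model is one plausible input for a heat map, though the slides do not say what their heat maps plot.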

10 Example 2: Heat Map of ETS

11 Model Fit: Entire Dataset

12 Summary
Good model fit statistics do not necessarily mean a good statistical model.
The training error will almost always be lower than the test error. R-squared and RSS (residual sum of squares) are therefore not suitable for selecting the best model from a collection of models with different numbers of predictors (such as those obtained from methods 1 through 4).
AIC, BIC, Cp, and adjusted R-squared estimate the test error rate by making mathematical adjustments to the training error rate.
Alternatively, use CV to choose the model with the lowest estimated test error.
CV has advantages over the penalty-based criteria: it estimates the test error directly, makes fewer assumptions about the true underlying model, and can be applied to a wider range of model selection tasks.

13 Appendix

14 Best Subsets: Example 1
Response Variable: Labor force participation rate
Number of Predictor Variables: 19
Subset selection restricted to 11 variables, excluding the intercept
All models include an intercept
Predictors: totnfm, nonman, srvc, const, man, tran, whole, retail, ware, info, finan, prof, edu, leisure, other, govt, fedgov, state, mili
[Table: best subset of each size from 1 to 11, one row per size, with the selected predictors marked by *; the column positions of the marks are not recoverable.]
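Best subset selection fits every combination of predictors of each size and keeps the one with the lowest RSS. The sketch below shows that search on synthetic data with four hypothetical predictors (`x1` to `x4`); it does not use the labor force data from the slide, and the small least-squares solver is included only to keep the example self-contained.

```python
from itertools import combinations
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def rss(X, y, cols):
    """RSS of a least-squares fit on an intercept plus the given columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def best_subsets(X, y, names, max_size):
    """For each size 1..max_size, return the predictor subset with lowest RSS."""
    best = {}
    for size in range(1, max_size + 1):
        cols = min(combinations(range(len(names)), size),
                   key=lambda c: rss(X, y, c))
        best[size] = [names[c] for c in cols]
    return best

# Synthetic data: only x1 and x3 influence the response (an assumption).
random.seed(1)
names = ["x1", "x2", "x3", "x4"]
X = [[random.gauss(0, 1) for _ in names] for _ in range(80)]
y = [3.0 * r[0] - 2.0 * r[2] + random.gauss(0, 0.5) for r in X]

best = best_subsets(X, y, names, max_size=3)
for size, chosen in best.items():
    print(size, chosen)
```

RSS alone identifies the best model within each size, but comparing models across sizes calls for a penalized criterion or cross-validation, since RSS never increases as predictors are added. The number of candidate subsets grows quickly with the number of predictors, which is one reason to cap the subset size.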

15 Training Error vs Test Error: Overfitting

16 Best Model using Adjusted R-Squared

17 Example 2: Heat Map of ETS (MSE)
