22s:152 Applied Linear Regression


Chapter 22: Model Selection

In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider the available explanatory variables as candidate variables (some candidates may be transformations of others).

Model selection can be challenging. If we have k candidate variables, there are potentially 2^k models to consider (i.e. each term being in or out of a given model). There are many methods for model selection, and we will only talk about a few in this class.

One way to avoid looking at all possible subsets (potentially a very large number of models) is to use a stepwise procedure. For example, consider a backward stepwise method:

1. Start with the largest possible model.
2. Choose a measure that quantifies what makes a good model (R^2 is not a good choice; it will just choose the largest model every time).
3. Remove the term whose removal most improves the chosen measure.
4. Continue to remove terms one at a time while each removal still provides a better model.
5. When removing the next term would give you a worse model, stop the procedure. You've found the best model (by that measure).

The measure we use to make our choice should consider:

1. The number of explanatory variables in the model (we'll penalize models with too many).
2. The goodness of fit that the model provides.

These express our conflicting interests: describing the data reasonably well (pushes toward more variables), and building a model simple enough to be interpretable (pushes toward fewer variables).

Some model selection measures (or criteria)

Adjusted R^2, written R̄^2:

    R̄^2 = 1 - [RSS/(n - k - 1)] / [TSS/(n - 1)]

We prefer a model with a large R̄^2.

Cross-validation criterion:

    CV = (1/n) * sum_{i=1}^{n} (Y_i - Ŷ(-i))^2

where Ŷ(-i) is the prediction for observation i from the model fitted without using observation i. If you use a lot of parameters, you tend to over-fit the data, and the model will do poorly at predicting a new Y not in the model-fitting (or training) data set. We prefer a model with a small CV.
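Computing CV is mechanical: refit the model n times, each time holding out one observation and predicting it. Here is a minimal Python sketch for a one-predictor model (the data and function names are made up for illustration, not from the lecture):

```python
def ols_fit(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    return ybar - b * xbar, b

def loocv(xs, ys):
    """CV = (1/n) * sum_i (Y_i - Yhat_(-i))^2, where Yhat_(-i)
    comes from the fit that leaves observation i out."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        a, b = ols_fit(xs[:i] + xs[i+1:], ys[:i] + ys[i+1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / n

# made-up data, roughly y = x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
print(loocv(xs, ys))
```

When comparing candidate models, the one with the smallest CV is preferred, matching the criterion above.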

Akaike information criterion (AIC), assuming normal errors:

    AIC = n log_e(σ̂^2) + 2(k + 1)

We prefer a model with a small AIC.

Bayesian information criterion (BIC), assuming normal errors:

    BIC = n log_e(σ̂^2) + (k + 1) log_e(n)

We prefer a model with a small BIC.

For both AIC and BIC, more parameters will provide a smaller σ̂^2, but the last term adds on a penalty related to the number of parameters in the model.

Choose a best model using AIC in a backward stepwise algorithm

Example: Crime rate data set. Crime-related and demographic statistics for 47 US states in 1960. The data were collected from the FBI's Uniform Crime Report and other government agencies to determine how the variable crime rate depends on the other variables measured in the study.

VARIABLES
RATE: Crime rate as # of offenses reported to police per million population
Age: The number of males of age 14-24 per 1000 population
S: Indicator variable for Southern states (0 = No, 1 = Yes)
Ed: Mean # of years of schooling x 10 for persons of age 25 or older
Ex0: 1960 per capita expenditure on police by state and local government
Ex1: 1959 per capita expenditure on police by state and local government
LF: Labor force participation rate per 1000 civilian urban males age 14-24
M: The number of males per 1000 females
N: State population size in hundred thousands
NW: The number of non-whites per 1000 population
U1: Unemployment rate of urban males per 1000 of age 14-24
U2: Unemployment rate of urban males per 1000 of age 35-39
W: Median value of transferable goods and assets or family income in tens of $
Pov: The number of families per 1000 earning below 1/2 the median income

Use the step procedure in R to choose a good subset of predictors by subtracting terms one at a time.
> crime.data=read.delim("crime.txt",sep="\t",header=FALSE)
> dimnames(crime.data)[[2]]=c("RATE","Age","S","Ed","Ex0",
     "Ex1","LF","M","N","NW","U1","U2","W","Pov")
> attach(crime.data)
> head(crime.data)
   RATE Age S  Ed Ex0 Ex1  LF    M   N  NW  U1 U2   W Pov
1  79.1 151 1  91  58  56 510  950  33 301 108 41 394 261
2 163.5 143 0 113 103  95 583 1012  13 102  96 36 557 194
3  57.8 142 1  89  45  44 533  969  18 219  94 33 318 250
4 196.9 136 0 121 149 141 577  994 157  80 102 39 673 167
5 123.4 141 0 121 109 101 591  985  18  30  91 20 578 174
6  68.2 121 0 110 118 115 547  964  25  44  84 29 689 126

## Fit the model including all candidate variables:
> lm.full.out=lm(RATE ~ Age + S + Ed + Ex0 + Ex1 + LF + M + N
                        + NW + U1 + U2 + W + Pov)

## vif() is in the car library:
> vifs=vif(lm.full.out)
> round(vifs,2)
 Age    S   Ed   Ex0   Ex1   LF
2.70 4.88 5.05 94.63 98.64 3.68
   M    N   NW   U1   U2    W  Pov
3.66 2.32 4.12 5.94 5.00 9.97 8.41
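The huge VIFs for Ex0 and Ex1 (94.63 and 98.64) reflect that the two police-expenditure variables are nearly collinear. Recall VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors; with only two predictors this reduces to 1/(1 - r^2) for their correlation r. A Python sketch on made-up numbers (not the crime data; function names are mine):

```python
def corr(xs, ys):
    """Sample correlation between two predictors."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF = 1/(1 - r^2) in the special case of exactly two predictors."""
    r = corr(x1, x2)
    return 1.0 / (1.0 - r * r)

# strongly related toy predictors -> inflated VIF
print(vif_two_predictors([1, 2, 3, 4], [1, 2, 4, 8]))  # 12.5
```

As the predictors approach perfect collinearity (r near 1), the VIF blows up, which is exactly the pattern Ex0 and Ex1 show above.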

## The starting AIC is 301.66...
## Remove variables one at a time.
> model.selection=step(lm.full.out)
Start:  AIC=301.66
RATE ~ Age + S + Ed + Ex0 + Ex1 + LF + M + N + NW + U1 + U2 + W + Pov

       Df Sum of Sq     RSS   AIC
- NW    1       6.1 15884.8 299.7
- LF    1      34.4 15913.1 299.8
- N     1      48.9 15927.6 299.8
- S     1     149.4 16028.1 300.1
- Ex1   1     162.3 16041.0 300.1
- M     1     296.5 16175.2 300.5
<none>            15878.7 301.7
- W     1     810.6 16689.3 302.0
- U1    1     911.5 16790.2 302.3
- Ex0   1    1109.8 16988.5 302.8
- U2    1    2108.8 17987.5 305.5
- Age   1    2911.6 18790.3 307.6
- Ed    1    3700.5 19579.2 309.5
- Pov   1    5474.2 21352.9 313.6

## Remove NW and check if we should remove another.
Step:  AIC=299.68
RATE ~ Age + S + Ed + Ex0 + Ex1 + LF + M + N + U1 + U2 + W + Pov

       Df Sum of Sq     RSS   AIC
- LF    1      28.7 15913.4 297.8
- N     1      48.6 15933.4 297.8
- Ex1   1     156.3 16041.0 298.1
- S     1     158.0 16042.8 298.1
- M     1     294.1 16178.9 298.5
<none>            15884.8 299.7
- W     1     820.2 16705.0 300.0
- U1    1     913.1 16797.9 300.3
- Ex0   1    1104.3 16989.1 300.8
- U2    1    2107.1 17991.9 303.5
- Age   1    3365.8 19250.5 306.7
- Ed    1    3757.1 19641.9 307.7
- Pov   1    5503.6 21388.3 311.7

## Remove LF and check if we should remove another.
Step:  AIC=297.76
RATE ~ Age + S + Ed + Ex0 + Ex1 + M + N + U1 + U2 + W + Pov

       Df Sum of Sq     RSS   AIC
- N     1      62.2 15975.6 295.9
- S     1     129.4 16042.8 296.1
- Ex1   1     134.8 16048.2 296.2
- M     1     276.8 16190.2 296.6
<none>            15913.4 297.8
- W     1     801.9 16715.3 298.1
- U1    1     941.8 16855.2 298.5
- Ex0   1    1075.9 16989.4 298.8
- U2    1    2088.5 18001.9 301.6
- Age   1    3407.9 19321.3 304.9
- Ed    1    3895.3 19808.7 306.1
- Pov   1    5621.3 21534.7 310.0

## Remove N and check if we should remove another.
Step:  AIC=295.95
RATE ~ Age + S + Ed + Ex0 + Ex1 + M + U1 + U2 + W + Pov

       Df Sum of Sq     RSS   AIC
- S     1     104.4 16080.0 294.3
- Ex1   1     123.3 16098.9 294.3
- M     1     533.8 16509.4 295.5
<none>            15975.6 295.9
- W     1     748.7 16724.4 296.1
- U1    1     997.7 16973.4 296.8
- Ex0   1    1021.3 16996.9 296.9
- U2    1    2082.3 18057.9 299.7
- Age   1    3425.9 19401.6 303.1
- Ed    1    3887.6 19863.3 304.2
- Pov   1    5896.9 21872.6 308.7

## Remove S and check if we should remove another.
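The AIC values printed by step() follow the slide formula with σ̂^2 = RSS/n (R drops the same additive constants here, so the numbers can be reproduced directly). A Python sketch checking two of the values above; the function names are mine:

```python
import math

def aic(rss, n, k):
    # AIC = n * log(RSS/n) + 2*(k+1); additive constants are dropped,
    # matching what R's step() prints for these models
    return n * math.log(rss / n) + 2 * (k + 1)

def bic(rss, n, k):
    # BIC swaps the per-parameter penalty 2 for log(n)
    return n * math.log(rss / n) + (k + 1) * math.log(n)

# Full 13-predictor model: RSS = 15878.7, n = 47 -> "Start: AIC=301.66"
print(round(aic(15878.7, 47, 13), 2))  # 301.66
# After dropping NW (12 predictors, RSS = 15884.8) -> "Step: AIC=299.68"
print(round(aic(15884.8, 47, 12), 2))  # 299.68
```

Since log(47) ≈ 3.85 > 2, the BIC version of either value would be larger, and the gap grows with the number of parameters.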
Step:  AIC=294.25
RATE ~ Age + Ed + Ex0 + Ex1 + M + U1 + U2 + W + Pov

       Df Sum of Sq     RSS   AIC
- Ex1   1     171.5 16251.5 292.8
- M     1     563.4 16643.4 293.9
<none>            16080.0 294.3
- W     1     734.7 16814.7 294.4
- U1    1     906.0 16986.0 294.8
- Ex0   1    1162.0 17241.9 295.5
- U2    1    1978.0 18058.0 297.7
- Age   1    3354.5 19434.4 301.2
- Ed    1    4139.1 20219.1 303.0
- Pov   1    6094.8 22174.8 307.4

## Remove Ex1 and check if we should remove another.
Step:  AIC=292.75
RATE ~ Age + Ed + Ex0 + M + U1 + U2 + W + Pov

       Df Sum of Sq     RSS   AIC
- M     1     691.0 16942.5 292.7
<none>            16251.5 292.8
- W     1     759.0 17010.5 292.9
- U1    1     921.8 17173.2 293.3
- U2    1    2018.1 18269.5 296.3
- Age   1    3323.1 19574.5 299.5
- Ed    1    4005.1 20256.5 301.1
- Pov   1    6402.7 22654.2 306.4
- Ex0   1   11818.8 28070.2 316.4

## Remove M and check if we should remove another.
Step:  AIC=292.71
RATE ~ Age + Ed + Ex0 + U1 + U2 + W + Pov

       Df Sum of Sq     RSS   AIC
- U1    1     408.6 17351.1 291.8
<none>            16942.5 292.7
- W     1    1016.9 17959.3 293.4
- U2    1    1548.6 18491.1 294.8
- Age   1    4511.6 21454.1 301.8
- Ed    1    6430.6 23373.0 305.8
- Pov   1    8147.7 25090.1 309.2
- Ex0   1   12019.6 28962.1 315.9

## Remove U1 and check if we should remove another.
Step:  AIC=291.83
RATE ~ Age + Ed + Ex0 + U2 + W + Pov

       Df Sum of Sq   RSS AIC
<none>            17351 292
- W     1      1253 18604 293
- U2    1      1629 18980 294
- Age   1      4461 21812 301
- Ed    1      6215 23566 304
- Pov   1      8932 26283 309
- Ex0   1     15597 32948 320

#########################################
## Procedure stops because removing    ##
## any of the remaining variables      ##
## only increases AIC.                 ##
#########################################

## Get the output from the final chosen model:
> summary(model.selection)

Call: lm(formula = RATE ~ Age + Ed + Ex0 + U2 + W + Pov)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -618.5028   108.2456  -5.714 1.19e-06 ***
Age            1.1252     0.3509   3.207 0.002640 **
Ed             1.8179     0.4803   3.785 0.000505 ***
Ex0            1.0507     0.1752   5.996 4.78e-07 ***
U2             0.8282     0.4274   1.938 0.059743 .
W              0.1596     0.0939   1.699 0.097028 .
Pov            0.8236     0.1815   4.538 5.10e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.83 on 40 degrees of freedom
Multiple R-squared: 0.7478, Adjusted R-squared: 0.71
F-statistic: 19.77 on 6 and 40 DF, p-value: 1.441e-10

> lm.out=lm(RATE ~ Age + Ed + Ex0 + U2 + W + Pov)
> vif(lm.out)
     Age       Ed      Ex0       U2        W      Pov
2.061942 3.061153 2.875709 1.381671 8.705602 5.559788

You can use the step function with the BIC instead through an option in the step() statement. Setting k = log(n) in the statement changes the criterion to the BIC rather than the AIC (even though some of the output still says AIC).

> model.selection.bic=step(lm.full.out,k=log(47))
##### similar output to previous example... ######
> summary(model.selection.bic)

Call: lm(formula = RATE ~ Age + Ed + Ex0 + U2 + Pov)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -524.3743    95.1156  -5.513 2.13e-06 ***
Age            1.0198     0.3532   2.887 0.006175 **
Ed             2.0308     0.4742   4.283 0.000109 ***
Ex0            1.2331     0.1416   8.706 7.26e-11 ***
U2             0.9136     0.4341   2.105 0.041496 *
Pov            0.6349     0.1468   4.324 9.56e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.3 on 41 degrees of freedom
Multiple R-squared: 0.7296, Adjusted R-squared: 0.6967
F-statistic: 22.13 on 5 and 41 DF, p-value: 1.105e-10

BIC tends to favor smaller models than the AIC; it has a heavier penalty for using more parameters. The only difference between the two chosen models in this example is that W (for wealth) is also removed from the BIC-chosen model.

You can also use the option direction = "forward" to build a best model, but starting with the full model is generally more reliable.
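The backward search that step() carries out can be sketched from scratch: fit the current model, try deleting each variable in turn, and keep the deletion that lowers AIC the most. The Python sketch below is illustrative only; the toy data and helper names are made up, and R's step() handles many details (factors, scope, forward moves) that this skips:

```python
import math

def ols_rss(X, y):
    """RSS from least squares of y on the columns of X plus an intercept,
    via the normal equations (A'A)b = A'y, solved by Gaussian elimination
    with partial pivoting."""
    n = len(X)
    A = [[1.0] + list(row) for row in X]   # prepend intercept column
    p = len(A[0])
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         for r in range(p)]
    v = [sum(A[i][r] * y[i] for i in range(n)) for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (v[r] - sum(M[r][c] * beta[c]
                              for c in range(r + 1, p))) / M[r][r]
    return sum((y[i] - sum(a * b for a, b in zip(A[i], beta))) ** 2
               for i in range(n))

def aic(rss, n, k):
    # AIC = n*log(RSS/n) + 2*(k+1), as on the slides
    return n * math.log(rss / n) + 2 * (k + 1)

def backward_step(X, y, names):
    """Drop, one at a time, the variable whose removal most lowers AIC;
    stop when no single removal lowers it.  (Unlike R's step(), this
    sketch never considers the intercept-only model.)"""
    n = len(y)
    keep = list(range(len(names)))
    while len(keep) > 1:
        best = aic(ols_rss([[row[j] for j in keep] for row in X], y),
                   n, len(keep))
        drop = None
        for j in keep:
            sub = [m for m in keep if m != j]
            cand = aic(ols_rss([[row[m] for m in sub] for row in X], y),
                       n, len(sub))
            if cand < best:
                best, drop = cand, j
        if drop is None:
            break
        keep.remove(drop)
    return [names[j] for j in keep]

# Toy data (made up): y depends on x1 and x2; x3 is irrelevant
names = ["x1", "x2", "x3"]
X = [[i, i % 3, (2 * i) % 5] for i in range(12)]
y = [2 * row[0] - row[1] + 0.1 * math.sin(i)
     for i, row in enumerate(X)]
print(backward_step(X, y, names))
```

On this toy data the procedure retains the informative predictors x1 and x2; a pure-noise column like x3 is usually (though not always) dropped, since removing it saves a penalty of 2 at little cost in fit.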

We mentioned stepwise procedures as a way to get around looking at every single possible model (there are 2^k possibilities). What about actually considering every possible model? Is this feasible? It depends on the total number of variables.

We can use the regsubsets function in the leaps library to consider the best model of each possible size (1 predictor, 2 predictors, 3 predictors, etc.).

> library(leaps)

## The . below means we'll use all the other variables
## besides RATE as predictors in the largest model.
## For each size of model, we'll only record the single
## best model (nbest=1), and we'll consider models up
## to a 13-variable model (nvmax=13).
> crime.subsets=regsubsets(RATE ~ ., nbest=1, nvmax=13, data=crime.data)
> summary(crime.subsets)
Subset selection object
Call: regsubsets.formula(RATE ~ ., nbest = 1, nvmax = 13, data = crime.data)
13 Variables (and intercept)
1 subsets of each size up to 13
Selection Algorithm: exhaustive
          Age S   Ed  Ex0 Ex1 LF  M   N   NW  U1  U2  W   Pov
1  ( 1 )  " " " " " " "*" " " " " " " " " " " " " " " " " " "
2  ( 1 )  " " " " " " "*" " " " " " " " " " " " " " " " " "*"
3  ( 1 )  " " " " "*" "*" " " " " " " " " " " " " " " " " "*"
4  ( 1 )  "*" " " "*" "*" " " " " " " " " " " " " " " " " "*"
5  ( 1 )  "*" " " "*" "*" " " " " " " " " " " " " "*" " " "*"
6  ( 1 )  "*" " " "*" "*" " " " " " " " " " " " " "*" "*" "*"
7  ( 1 )  "*" " " "*" "*" " " " " " " " " " " "*" "*" "*" "*"
8  ( 1 )  "*" " " "*" "*" " " " " "*" " " " " "*" "*" "*" "*"
9  ( 1 )  "*" " " "*" "*" "*" " " "*" " " " " "*" "*" "*" "*"
10 ( 1 )  "*" "*" "*" "*" "*" " " "*" " " " " "*" "*" "*" "*"
11 ( 1 )  "*" "*" "*" "*" "*" " " "*" "*" " " "*" "*" "*" "*"
12 ( 1 )  "*" "*" "*" "*" "*" "*" "*" "*" " " "*" "*" "*" "*"
13 ( 1 )  "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"

Since Ex0, the 1960 police expenditures, is chosen first and appears in every best model, it may be the most important explanatory variable of crime rate.
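For the crime data, "every possible model" is a large but manageable space. A small Python sketch of the counting, using the 13 predictor names from the data above (regsubsets searches this space cleverly rather than fitting each subset naively):

```python
from itertools import combinations
from math import comb

predictors = ["Age", "S", "Ed", "Ex0", "Ex1", "LF", "M",
              "N", "NW", "U1", "U2", "W", "Pov"]

# Every variable is either in or out: 2^13 candidate models
total_models = 2 ** len(predictors)
print(total_models)  # 8192

# Number of candidate models at each subset size m is C(13, m)
sizes = {m: comb(len(predictors), m) for m in range(len(predictors) + 1)}
assert sum(sizes.values()) == total_models

# e.g. the 3-variable subsets, among which regsubsets reports the
# single best one (the "3 ( 1 )" row in the summary above)
subsets_of_3 = list(combinations(predictors, 3))
print(len(subsets_of_3))  # 286
```

With only 13 candidates the exhaustive search is cheap; the count doubles with every added variable, which is why stepwise shortcuts matter once k grows.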
The 5-variable model matches the stepwise BIC best model we saw earlier, and the 6-variable model matches the stepwise AIC best model we saw earlier.

If you use regsubsets to record more than just the single best model of each size, you can see how different the BIC values are for the top X best models of each size in a visual plot. Here we'll keep the best 4 models of each size.

> crime.subsets.2=regsubsets(RATE ~ ., nbest=4, nvmax=8, data=crime.data)

## The next line will give you the graphic for models
## of size 3 to 6. The subsets() function is in the
## car library.
> subsets(crime.subsets.2, min.size=3, max.size=6, legend=F)

[Plot: BIC (y-axis, roughly -38 to -32) of the best 4 models at each subset size from 3 to 6 (x-axis); each model is labeled by its included variables, e.g. A-Ed-E0-U2-P.]

This may be useful if you're deciding between models with similar BIC values, and some models seem better for you in terms of your research and which variables are included. The plot also shows the model with the smallest BIC (if you show all subset sizes).

Here we keep track of the best 4 models of each size, up to the model with all 13 variables included. The graphic becomes a bit hard to read when you look at all recorded models, so subsetting the picture (as on the previous page) is useful.

> crime.subsets.3=regsubsets(RATE ~ ., nbest=4, nvmax=13, data=crime.data)
> subsets(crime.subsets.3, legend=F)

[Plot: BIC (y-axis, roughly -30 to 0) of the best 4 models at each subset size from 1 to 13 (x-axis); each model is labeled by its included variables.]

AIC function

To compare model 1 to model 2 using AIC, you can just use the AIC function directly.

> model.1=lm(RATE ~ Ex0 + Ex1 + LF + M + N)
> model.2=lm(RATE ~ Age + S + Ex0 + Ex1 + U1 + Pov)
> AIC(model.1)
[1] 454.5741
> AIC(model.2)
[1] 445.9138

*smaller AIC is better.

To compare the two models using BIC, set k = log(n):

> AIC(model.1, k=log(47))
[1] 467.5252
> AIC(model.2, k=log(47))
[1] 460.715

*smaller BIC is better.
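For a given fitted model, AIC(fit, k=log(n)) differs from AIC(fit) only through the per-parameter penalty (log n instead of 2). A Python sketch (the function name is mine) checking this against the values above; note that AIC() for an lm fit counts the error variance as an estimated parameter, so model.1 has 7 parameters (5 slopes, intercept, σ̂^2) and model.2 has 8:

```python
import math

def bic_from_aic(aic_value, n_params, n):
    """AIC(fit, k=log(n)) = AIC(fit) + n_params * (log(n) - 2):
    only the per-parameter penalty changes from 2 to log(n).
    n_params counts every estimated parameter (slopes, intercept,
    and the error variance for an lm fit)."""
    return aic_value + n_params * (math.log(n) - 2)

# model.1 above: 5 predictors + intercept + sigma^2 -> 7 parameters
print(bic_from_aic(454.5741, 7, 47))   # close to the reported 467.5252
# model.2 above: 6 predictors + intercept + sigma^2 -> 8 parameters
print(bic_from_aic(445.9138, 8, 47))   # close to the reported 460.715
```

This also makes concrete why BIC favors smaller models here: with n = 47, each extra parameter costs log(47) ≈ 3.85 instead of 2.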