Penalized regression
Statistical Learning, 2011
Niels Richard Hansen
September 19, 2011

Penalized regression is implemented in several different R packages. Ridge regression can, in principle, be carried out using lm or lm.fit, but it is also implemented in MASS as the lm.ridge function. A main package for penalized regression involving the 1-norm penalty is glmnet, which is an extremely fast and well-implemented package. We illustrate the usage on the prostate cancer data set.

> require(glmnet)   ### elastic net (and lasso ...)
> require(MASS)     ### lm.ridge for ridge regression
> require(leaps)    ### best subset selection
> require(ggplot2)

Below we construct a data set where we center all the variables and, in addition, scale the predictors to have unit variance. This is mostly done explicitly here for reasons of comparison. Some functions implement standardization, as a default or an option, before the parameters are estimated. This brings the coefficients onto a comparable scale. The resulting estimated parameters are usually transformed back to the original scale before they are returned, which eases future predictions. Below we are careful to standardize the test data in the same way as the training data.

> prostate <- read.table("http://www-statclass.stanford.edu/%7etibs/elemstatlearn/datasets/prostate.data")
> ### Selection of the training data only
> ### Doubt about whether the original selection was random ...
> set.seed(1)
> trainindex <- sample(prostate$train)
> prostatetest <- prostate[!trainindex, 1:9]
> prostate <- prostate[trainindex, 1:9]
> ### Centering and scaling
> xbar <- colMeans(prostate)
> vbar <- colSums(scale(prostate, center = xbar, scale = FALSE)^2) / (dim(prostate)[1] - 1)
> prostatescaled <- scale(prostate,
+                         center = xbar,
+                         scale = c(sqrt(vbar)[-9], 1))
> prostatetestscaled <- scale(prostatetest,
+                             center = xbar,
+                             scale = c(sqrt(vbar)[-9], 1))
> Ntrain <- dim(prostate)[1]
> Ntest <- dim(prostatetest)[1]
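As a quick sanity check of the standardization (a small addition; it is not needed for the analysis), the columns of the training data should now have mean zero, and the eight predictors unit standard deviation.

> round(colMeans(prostatescaled), 10)   ### all columns centered
> apply(prostatescaled[, -9], 2, sd)    ### predictors scaled to unit standard deviation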

1 Best subset

First, we revisit best subset selection. We compute the best model for each dimension and predict on the test data set. We compute and plot the average prediction error on the test and training data, respectively. The best model is found, in this case, with 3 predictors (lcavol, lweight and svi).

> prostatesubset <- regsubsets(x = prostatescaled[, -9],
+                              y = prostatescaled[, 9],
+                              intercept = FALSE,
+                              nbest = 1)
> bestcoef <- coefficients(prostatesubset, 1:8)
> rsspredict <- numeric()
> for(i in 1:8) {
+   X <- prostatetestscaled[, names(bestcoef[[i]]), drop = FALSE]
+   yhat <- X %*% bestcoef[[i]]
+   rsspredict[i] <- mean((prostatetestscaled[, 9] - yhat)^2)
+ }
> plot(1:8, prostatesubset$ress[, 1]/Ntrain, ylim = c(0.35, 0.7),
+      type = "b", col = "red", ylab = "Prediction error", xlab = "Dimension")
> points(1:8, rsspredict, type = "b", col = "blue")

[Figure: average prediction error on the training data (red) and test data (blue) against model dimension.]
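Which variables enter the best model of each size can also be read off from the summary of the regsubsets fit (a small addition for illustration; the prostatesummary object and this check are not part of the original code).

> prostatesummary <- summary(prostatesubset)
> prostatesummary$which   ### logical matrix of selected predictors per model size
> prostatesummary$rss     ### training residual sums of squares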

2 Ridge

The ridge regression models are fitted by a call similar to an lm call, except that we have to specify a λ sequence. From the result we can, for instance, produce a coefficient profile plot of the estimated parameters as a function of the penalty parameter λ. We clearly see the shrinkage effect: as a function of λ, all the parameters shrink towards 0.

> lambda <- seq(0, 50, 1)
> prostateridge <- lm.ridge(lpsa ~ . - 1,
+                           data = as.data.frame(prostatescaled),
+                           lambda = lambda)
> plot(prostateridge)

[Figure: ridge coefficient profiles (t(x$coef)) against the penalty parameter (x$lambda).]
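For a single value of λ the ridge estimate can also be computed directly from the penalized normal equations, (X'X + λI)⁻¹X'y (a minimal sketch for illustration; Xtrain, ytrain and betaridge are helper objects introduced here, and since lm.ridge uses its own internal scaling its coefficients will only agree approximately with this direct computation).

> Xtrain <- prostatescaled[, -9]
> ytrain <- prostatescaled[, 9]
> betaridge <- solve(crossprod(Xtrain) + 10 * diag(8),
+                    crossprod(Xtrain, ytrain))    ### ridge solution for lambda = 10
> drop(betaridge)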

As for best subset selection, we can also plot the average prediction error for the test and the training data as a function of λ, which controls the model complexity in this case.

> rssridge <- colMeans((prostatescaled[, 9] -
+                       prostatescaled[, -9] %*%
+                       t(coefficients(prostateridge)))^2)
> rssridgepredict <- colMeans((prostatetestscaled[, 9] -
+                              prostatetestscaled[, -9] %*%
+                              t(coefficients(prostateridge)))^2)
> plot(lambda, rssridge, type = "l", col = "red", ylim = c(0.35, 0.7))
> lines(lambda, rssridgepredict, col = "blue")

[Figure: average prediction error on the training data (red, rssridge) and test data (blue) against lambda.]
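The penalty parameter can also be chosen without a test set, for instance by generalized cross-validation on the training data alone (a small addition; the select method for ridgelm objects is from MASS and is not used in the original analysis).

> select(prostateridge)                  ### HKB, L-W and GCV choices of lambda
> lambda[which.min(prostateridge$GCV)]   ### lambda minimizing GCV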

3 Lasso

For lasso we use glmnet, which has an interface more similar to lm.fit than to lm. You have to specify the matrix of predictors and the vector of response values. You don't need to specify a λ sequence; in fact, you should rely on the sequence of λ values computed automatically unless you know what you are doing. There is also an α argument to the function, which by default is 1, giving the lasso. Values in (0, 1) produce the elastic net penalties, where the penalty function is a combination of the 1-norm, as in the pure lasso, and the 2-norm, as in ridge regression. Note that, though α = 0 formally gives ridge regression, this choice of α is not possible with glmnet.

> prostateglmnet <- glmnet(x = prostatescaled[, 1:8],
+                          y = prostatescaled[, 9])
> plot(prostateglmnet)

[Figure: lasso coefficient profiles against the L1 norm of the coefficient vector; the numbers along the top axis (0, 2, 5, 6) give the number of nonzero coefficients.]
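As an illustration of the elastic net penalty mentioned above, the same model can, for instance, be fitted with α = 0.5 (a minimal sketch; the prostateenet fit is not part of the original analysis).

> prostateenet <- glmnet(x = prostatescaled[, 1:8],
+                        y = prostatescaled[, 9],
+                        alpha = 0.5)   ### equal mix of 1-norm and 2-norm penalties
> plot(prostateenet)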

We compute and plot again the average prediction error for the test and the training data as a function of λ.

> rssglmnet <- colMeans((prostatescaled[, 9] -
+                        predict(prostateglmnet, prostatescaled[, -9]))^2)
> rssglmnetpredict <- colMeans((prostatetestscaled[, 9] -
+                               predict(prostateglmnet, prostatetestscaled[, -9]))^2)
> plot(prostateglmnet$lambda, rssglmnet, type = "l", col = "red",
+      ylim = c(0.35, 0.7))
> lines(prostateglmnet$lambda, rssglmnetpredict, col = "blue")

[Figure: average prediction error on the training data (red, rssglmnet) and test data (blue) against prostateglmnet$lambda.]
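In practice, when no separate test set is available, λ is typically chosen by cross-validation, which glmnet supports directly through cv.glmnet (a minimal sketch added here; the prostatecv object is not part of the original analysis).

> prostatecv <- cv.glmnet(x = prostatescaled[, 1:8],
+                         y = prostatescaled[, 9])
> prostatecv$lambda.min   ### lambda with the smallest cross-validated error
> plot(prostatecv)        ### cross-validation curve with error bars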

We can, finally, compare the three resulting models and the average prediction errors estimated on the test data.

> compmat <- matrix(0, 8, 3)
> colnames(compmat) <- c("Best subset", "Ridge", "Lasso")
> rownames(compmat) <- names(prostate)[1:8]
> compmat[names(bestcoef[[3]]), "Best subset"] <- bestcoef[[3]]
> compmat[, "Ridge"] <- coefficients(prostateridge)[which.min(rssridgepredict), ]
> compmat[, "Lasso"] <- coefficients(prostateglmnet)[-1, which.min(rssglmnetpredict)]
> compmat

        Best subset        Ridge       Lasso
lcavol    0.6126496  0.643644414  0.56903878
lweight   0.2490049  0.260581059  0.20928299
age       0.0000000 -0.230861289 -0.09862584
lbph      0.0000000  0.147012639  0.07496092
svi       0.2921355  0.291408117  0.24104899
lcp       0.0000000 -0.097911853  0.00000000
gleason   0.0000000 -0.002976996  0.00000000
pgg45     0.0000000  0.207799316  0.12078733

> c(min(rsspredict), min(rssridgepredict), min(rssglmnetpredict))
[1] 0.4708751 0.5080935 0.5009177
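For completeness, the penalty parameters behind the Ridge and Lasso columns above can be read off from the corresponding minima of the test errors (a small addition; these commands are not part of the original document).

> lambda[which.min(rssridgepredict)]
> prostateglmnet$lambda[which.min(rssglmnetpredict)]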