Lasso
November 14, 2017

Contents

1 Case Study: Least Absolute Shrinkage and Selection Operator (LASSO)
  1.1 The Lasso Estimator
  1.2 Computation of the Lasso Solution
    1.2.1 Single Predictor: Soft Thresholding
  1.3 $\ell_q$ Penalties
  1.4 Advantages of the $\ell_1$-penalty

1 Case Study: Least Absolute Shrinkage and Selection Operator (LASSO)

There are two reasons why we might consider an alternative to the least-squares estimate.

- Prediction accuracy: The least-squares estimate often has low bias but large variance, and prediction accuracy can sometimes be improved by shrinking the values of the regression coefficients. By doing so, we introduce some bias but reduce the variance of the predicted values, and hence may improve the overall prediction accuracy.
- Purposes of interpretation: With a large number of predictors, we often would like to identify a smaller subset of these predictors that exhibit the strongest effects.

In this section, we discuss the various penalty functions $p_\lambda(\cdot)$ used in the penalized problem
\[
\arg\min_{\beta}\;\bigl\{L(\beta) + p_\lambda(\beta)\bigr\}
\]
for some loss function $L(\beta)$. We mainly use the least-squares loss function throughout our discussion.

1.1 The Lasso Estimator

Definition 1 (The lasso estimator). The lasso estimator, denoted by $\hat\beta^{\mathrm{lasso}}$, is defined as
\[
\hat\beta^{\mathrm{lasso}} = \arg\min_{\beta_0,\,\beta}\;\left\{\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^\top\beta\bigr)^2 + \lambda\sum_{j=1}^{p}|\beta_j|\right\}, \qquad \lambda > 0,
\]

or equivalently,
\[
\hat\beta^{\mathrm{lasso}} = \arg\min_{\beta_0,\,\beta}\;\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^\top\beta\bigr)^2 \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t, \qquad t > 0,
\]
or equivalently,
\[
\hat\beta^{\mathrm{lasso}} = \arg\min_{\beta}\;\left\{\frac{1}{2n}\,\|y - \beta_0\mathbf{1} - X\beta\|_2^2 + \lambda\|\beta\|_1\right\}, \qquad \lambda > 0,
\]
where $y = (y_1, \ldots, y_n)$ denotes the $n$-vector of responses, $X$ is an $n \times p$ matrix with $x_i \in \mathbb{R}^p$ in its $i$th row, $\mathbf{1}$ is the vector of $n$ ones, $\|\cdot\|_1$ is the $\ell_1$-norm, and $\|\cdot\|_2$ is the usual Euclidean norm.

Why do we use the $\ell_1$ norm? Why not use the $\ell_2$ norm, or any $\ell_q$ norm?

- The lasso yields sparse solution vectors.
- The value $q = 1$ is the smallest value that yields a convex problem.
- Theoretical guarantees.

Note: Typically, we first standardize the predictors $X$ so that each column is centered, $\frac{1}{n}\sum_{i=1}^{n} x_{ij} = 0$, and has unit variance, $\frac{1}{n}\sum_{i=1}^{n} x_{ij}^2 = 1$. Without standardization, the lasso solutions would depend on the units. For convenience, we also assume that the outcome values $y_i$ have been centered, meaning that $\frac{1}{n}\sum_{i=1}^{n} y_i = 0$. These centering conditions are convenient, since they mean that we can omit the intercept term $\beta_0$ in the lasso optimization. Given an optimal lasso solution $\hat\beta$ on the centered data, we can recover the optimal solutions for the uncentered data: $\hat\beta$ is the same, and the intercept $\hat\beta_0$ is given by
\[
\hat\beta_0 = \bar{y} - \sum_{j=1}^{p} \bar{x}_j \hat\beta_j,
\]
where $\bar{y}$ and $\{\bar{x}_j\}_1^p$ are the original means. (This is typically only true for linear regression with squared-error loss; it is not true, for example, for lasso logistic regression.) For this reason, we omit the intercept $\beta_0$ from the lasso for the remainder of this chapter.
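To make the centering argument concrete, here is a minimal R sketch (added for illustration, not part of the original notes; the simulated data, the value s = 0.1, and all object names are arbitrary choices, and it uses the glmnet package demonstrated later in these notes). It fits the lasso on column-centered predictors and a centered response with the intercept suppressed, then recovers the intercept for the uncentered data as $\hat\beta_0 = \bar{y} - \sum_j \bar{x}_j \hat\beta_j$; fitting glmnet directly on the uncentered data at the same $\lambda$ should give essentially the same slopes and intercept.

library(glmnet)

set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p, mean = 2), n, p)        # predictors with nonzero means
y <- 3 + X[, 1] - 2 * X[, 3] + rnorm(n)

Xc <- scale(X, center = TRUE, scale = FALSE)     # center each column of X
yc <- y - mean(y)                                # center the response

# Lasso on the centered data, with the intercept omitted from the optimization
fit_c <- glmnet(Xc, yc, intercept = FALSE, standardize = FALSE)
bhat  <- as.matrix(coef(fit_c, s = 0.1))[-1, 1]  # slopes at lambda = 0.1

# Recover the intercept for the uncentered data: ybar - sum(xbar_j * betahat_j)
beta0_hat <- mean(y) - sum(colMeans(X) * bhat)

# Fit on the uncentered data at the same lambda for comparison
fit_u <- glmnet(X, y, standardize = FALSE)
c(beta0_hat, as.matrix(coef(fit_u, s = 0.1))[1, 1])  # the two intercepts should agree closely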

Figure 1: The $\ell_1$ ball.

Table 2 shows the results of applying three fitting procedures to the crime data. The lasso bound $t$ was chosen by cross-validation. The left panel corresponds to the full least-squares fit. The middle panel shows the lasso fit. On the right, we have applied least-squares estimation to the subset of three predictors with nonzero coefficients in the lasso (the relaxed lasso). The standard errors for the least-squares estimates come from the usual formulas. No such simple formula exists for the lasso, so we have used the bootstrap to obtain the estimates of standard errors in the middle panel. Overall it appears that funding has a large effect, probably indicating that police resources have been focused on higher-crime areas. The other predictors have small to moderate effects.

Note that the lasso sets two of the five coefficients to zero, and tends to shrink the coefficients of the others toward zero relative to the full least-squares estimate. In turn, the least-squares fit on the subset of the three predictors tends to expand the lasso estimates away from zero. The nonzero estimates from the lasso tend to be biased toward zero, so the debiasing in the right panel can often improve the prediction error of the model. This two-stage process is also known as the relaxed lasso (Meinshausen 2007).

Figure 2: Results from analysis of the crime data. The left panel shows the least-squares estimates, standard errors, and their ratio (Z-score). The middle and right panels show the corresponding results for the lasso, and for the least-squares estimates applied to the subset of predictors chosen by the lasso.

# To obtain glmnet, install it directly from CRAN:
# install.packages("glmnet")
# Load the glmnet package:
library(glmnet)

# The default model used in the package is the Gaussian linear
# model or "least squares", which we will demonstrate in this
# section. We load a set of data created beforehand for
# illustration. Users can either load their own data or use
# those saved in the workspace.
getwd()

## [1] "/Users/yiyang/Dropbox/Teaching/MATH680/Topic4/note"

load("bardet.rda")

# The command loads an input matrix x and a response
# vector y from this saved R data archive.
#
# We fit the model using the most basic call to glmnet.
fit = glmnet(x, y)

# "fit" is an object of class glmnet that contains all the
# relevant information of the fitted model for further use.
# We do not encourage users to extract the components directly.
# Instead, various methods are provided for the object, such
# as plot, print, coef and predict, that enable us to execute
# those tasks more elegantly.

# We can visualize the coefficients by executing the plot function:
plot(fit)

(Figure: coefficient profile plot; y-axis "Coefficients", x-axis "L1 Norm".)

# Each curve corresponds to a variable. It shows the path of
# its coefficient against the l1-norm of the whole
# coefficient vector as lambda varies.

# The axis above indicates the number of nonzero coefficients at the
# current lambda, which is the effective degrees of freedom
# (df) for the lasso. Users may also wish to annotate
# the curves; this can be done by setting label = TRUE
# in the plot command.

# A summary of the glmnet path at each step is displayed
# if we just enter the object name or use
# the print function:
print(fit)

##
## Call:  glmnet(x = x, y = y)
##
##       Df  %Dev  Lambda
##  ...  (one row for each of the 100 values of lambda along the path)

# It shows from left to right the number of nonzero
# coefficients (Df), the fraction of (null) deviance
# explained (%Dev) and the value of lambda (Lambda).
# Although by default glmnet calls for 100 values of
# lambda, the program stops early if %Dev does not
# change sufficiently from one lambda to the next
# (typically near the end of the path).

# We can obtain the actual coefficients at one or more lambda's
# within the range of the sequence:
coef0 = coef(fit, s = 0.1)

# The function glmnet returns a sequence of models
# for the users to choose from. In many cases, users
# may prefer the software to select one of them.
# Cross-validation is perhaps the simplest and most
# widely used method for that task.

# cv.glmnet is the main function to do cross-validation
# here, along with various supporting methods such as
# plotting and prediction. We still act on the sample
# data loaded before.
cvfit = cv.glmnet(x, y)

# cv.glmnet returns a cv.glmnet object, which is "cvfit"
# here, a list with all the ingredients of the
# cross-validation fit. As for glmnet, we do not
# encourage users to extract the components directly,
# except for viewing the selected values of lambda.
# The package provides well-designed functions
# for potential tasks.

# We can plot the object.
plot(cvfit)

(Figure: cross-validation plot; y-axis "Mean Squared Error", x-axis "log(Lambda)".)

# It includes the cross-validation curve (red dotted line),
# and upper and lower standard deviation curves along the
# lambda sequence (error bars). Two selected lambda's are
# indicated by the vertical dotted lines (see below).

# We can view the selected lambda's and the corresponding
# coefficients. For example,
cvfit$lambda.min

## [1]

# lambda.min is the value of lambda that gives minimum
# mean cross-validated error.

# The other lambda saved is lambda.1se, which gives the most
# regularized model such that the error is within one standard
# error of the minimum. To use that, we only need to replace
# lambda.min with lambda.1se above.
coef1 = coef(cvfit, s = "lambda.min")

# Note that the coefficients are represented in the
# sparse matrix format. The reason is that the
# solutions along the regularization path are
# often sparse, and hence it is more efficient
# in time and space to use a sparse format.
# If you prefer the non-sparse format,
# pipe the output through as.matrix().

# Predictions can be made based on the fitted
# cv.glmnet object. Let's see a toy example.
predict(cvfit, newx = x[1:5,], s = "lambda.min")

## (a one-column matrix of predicted values for the first five rows)

# newx is for the new input matrix and s,
# as before, is the value(s) of lambda at which
# predictions are made.

1.2 Computation of the Lasso Solution

The lasso prefers sparse solutions. To see this, notice that with ridge regression the prior cost of a sparse solution such as $\beta = (1, 0)$ is the same as the cost of a dense solution such as $\beta = (1/\sqrt{2}, 1/\sqrt{2})$, as long as they have the same $\ell_2$ norm: $\|(1, 0)\|_2 = \|(1/\sqrt{2}, 1/\sqrt{2})\|_2 = 1$. However, for the lasso, setting $\beta = (1, 0)$ is cheaper than setting $\beta = (1/\sqrt{2}, 1/\sqrt{2})$, since $\|(1, 0)\|_1 = 1 < \|(1/\sqrt{2}, 1/\sqrt{2})\|_1 = \sqrt{2}$. The most rigorous way to see that $\ell_1$ regularization results in sparse solutions is to examine the conditions that hold at the optimum.
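As a quick numeric check of the comparison above (an illustrative sketch, not part of the original notes), the following R lines confirm that the sparse and dense vectors have the same $\ell_2$ norm while the sparse one has the strictly smaller $\ell_1$ norm:

b_sparse <- c(1, 0)
b_dense  <- c(1 / sqrt(2), 1 / sqrt(2))

sqrt(sum(b_sparse^2))   # l2 norm of the sparse vector: 1
sqrt(sum(b_dense^2))    # l2 norm of the dense vector: 1
sum(abs(b_sparse))      # l1 norm of the sparse vector: 1
sum(abs(b_dense))       # l1 norm of the dense vector: sqrt(2), about 1.414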

1.2.1 Single Predictor: Soft Thresholding

Consider a single-predictor setting based on samples $\{(z_i, y_i)\}_{i=1}^{n}$, where the predictor $z_i$ has been centered (for convenience we have renamed $x_{i1}$ to $z_i$). The problem then is to solve
\[
\arg\min_{\beta}\;\left\{\frac{1}{2n}\sum_{i=1}^{n}(y_i - z_i\beta)^2 + \lambda|\beta|\right\}. \tag{1}
\]
We cannot obtain the optimality condition directly, since $|\beta|$ does not have a derivative at $\beta = 0$. By direct inspection of the function (1), we find that
\[
\hat\beta =
\begin{cases}
\frac{1}{n}\langle z, y\rangle - \lambda & \text{if } \frac{1}{n}\langle z, y\rangle > \lambda,\\
0 & \text{if } \bigl|\frac{1}{n}\langle z, y\rangle\bigr| \le \lambda,\\
\frac{1}{n}\langle z, y\rangle + \lambda & \text{if } \frac{1}{n}\langle z, y\rangle < -\lambda,
\end{cases}
\]
which can be written as
\[
\hat\beta = S_\lambda\!\left(\tfrac{1}{n}\langle z, y\rangle\right),
\]
where $S_\lambda(x) = \mathrm{sign}(x)(|x| - \lambda)_+$ is the soft-thresholding operator. When the data are standardized so that $\frac{1}{n}\sum_i z_i^2 = 1$, it translates the usual least-squares estimate $\hat\beta^{\mathrm{OLS}} = \langle z, y\rangle/\langle z, z\rangle = \frac{1}{n}\langle z, y\rangle$ toward zero by the amount $\lambda$. This is demonstrated in Figure 3.

Figure 3: The soft-thresholding function $S_\lambda(x) = \mathrm{sign}(x)(|x| - \lambda)_+$ is shown in blue (broken lines), along with the $45^\circ$ line in black.
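The following short R sketch (added for illustration, not part of the original notes; the simulated data and the value lambda = 0.2 are arbitrary choices) implements the soft-thresholding operator and evaluates the closed-form single-predictor solution for a centered, standardized predictor:

soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

set.seed(2)
n <- 200
z <- rnorm(n)
z <- z - mean(z)             # center the predictor
z <- z / sqrt(mean(z^2))     # scale so that (1/n) * sum(z^2) = 1
y <- 0.5 * z + rnorm(n)
y <- y - mean(y)             # center the response
lambda <- 0.2

beta_ols   <- sum(z * y) / n                   # least-squares estimate (1/n) <z, y>
beta_lasso <- soft_threshold(beta_ols, lambda) # soft-thresholded lasso estimate
c(beta_ols, beta_lasso)                        # the lasso estimate is pulled toward zero by lambda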

1.3 $\ell_q$ Penalties

For a fixed real number $q \ge 0$, consider the criterion
\[
\min_{\beta}\;\frac{1}{2n}\sum_{i=1}^{n}(y_i - x_i^\top\beta)^2 + \lambda\sum_{j=1}^{p}|\beta_j|^q. \tag{2}
\]
This is the lasso for $q = 1$ and ridge regression for $q = 2$. For $q = 0$, the term $\sum_{j=1}^{p}|\beta_j|^0$ counts the number of nonzero elements in $\beta$, and thus amounts to best-subset selection. Figure 4 displays the constraint regions corresponding to these penalties for the case of two predictors ($p = 2$).

Figure 4: Constraint regions $\sum_{j=1}^{p}|\beta_j|^q \le 1$ for different values of $q$. For $q < 1$, the constraint region is nonconvex.

In the special case of an orthonormal model matrix $X$, all three procedures have explicit solutions. Each method applies a simple coordinate-wise transformation to the least-squares estimate $\hat\beta$, as detailed in Table 1. The lasso is special in that the choice $q = 1$ is the smallest value of $q$ (closest to best subset) that leads to a convex constraint region, and hence a convex optimization problem. In this sense, it is the closest convex relaxation of the best-subset selection problem.

Table 1: Estimators of $\beta_j$ from (2) in the case of an orthonormal model matrix $X$.

1.4 Advantages of the $\ell_1$-penalty

- Interpretation of the final model: the $\ell_1$-penalty provides a natural way to encourage or enforce sparsity and simplicity in the solution.
- Statistical efficiency: the bet-on-sparsity principle. Assume that the underlying true signal is sparse and we use an $\ell_1$-penalty to try to recover it. If our assumption is correct, we can do a good job in recovering the true signal. But if we are wrong, and the underlying truth is not sparse in the chosen basis, then the $\ell_1$-penalty will not work well.

However, in that instance, no method can do well relative to the Bayes error. There is now a large body of theoretical support for these loose statements. We can think of this in terms of the amount of information $n/p$ per parameter. If $p \gg n$ and the true model is not sparse (i.e., the number of nonzero parameters $k$ is not small), then the number of samples $n$ is too small to allow for accurate estimation of the parameters. But if the true model is sparse, so that only $k < n$ parameters are actually nonzero in the true underlying model, then it turns out that we can estimate the parameters effectively, using the lasso. This may come as somewhat of a surprise, because we are able to do this even though we are not told which $k$ of the $p$ parameters are actually nonzero. Of course we cannot do as well as we could if we had that information, but it turns out that we can still do reasonably well (see the simulation sketch after this list).

- Computational efficiency: $\ell_1$-based penalties are convex, and this fact together with the assumed sparsity can lead to significant computational advantages.
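To illustrate the bet-on-sparsity principle numerically, here is a small simulation sketch (added for illustration, not part of the original notes; the dimensions, signal strength, and seed are arbitrary choices). With $p = 500 > n = 100$ but only $k = 5$ truly nonzero coefficients, the cross-validated lasso typically selects a small model containing the true signals and estimates the coefficients reasonably well:

library(glmnet)

set.seed(3)
n <- 100; p <- 500; k <- 5
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(rep(2, k), rep(0, p - k))   # sparse truth: only the first k coefficients are nonzero
y <- as.numeric(X %*% beta_true + rnorm(n))

cvfit    <- cv.glmnet(X, y)
beta_hat <- as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1]

sum(beta_hat != 0)                         # number of variables selected
sum(beta_hat[1:k] != 0)                    # how many of the k true signals are retained
sqrt(sum((beta_hat - beta_true)^2))        # l2 estimation error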
