[POLS 8500] Stochastic Gradient Descent, Linear Model Selection and Regularization


1 [POLS 8500] Stochastic Gradient Descent, Linear Model Selection and Regularization L. Jason Anastasopoulos February 2, 2017

2 Gradient descent

Let's begin with our simple problem of estimating the parameters of a linear regression model with gradient descent, where the gradient $\nabla J(\theta)$ is in general defined as

$$\nabla J(\theta) = \left[ \frac{\partial J}{\partial \theta_0}, \frac{\partial J}{\partial \theta_1}, \ldots, \frac{\partial J}{\partial \theta_p} \right]$$

and in the case of linear regression is

$$\nabla J(\theta) = -\frac{1}{N}\left(y^T - \theta^T X^T\right)X$$

3 Gradient descent for linear regression

The gradient descent algorithm finds the parameters in the following manner:

repeat while ( $\|\eta \nabla J(\theta)\| > \epsilon$ ) {
    $\theta := \theta - \eta \nabla J(\theta) = \theta + \eta \frac{1}{N}\left(y^T - \theta^T X^T\right)X$
}

4 Gradient descent R function

As it turns out, this is quite easy to implement in R as a function, which we call gradientr below.
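
The function body itself is not shown; the following is a minimal sketch consistent with how gradientr is called on the later slides. It assumes an added intercept column, zero-initialized parameters, and a returned list with coef and the per-epoch L2 loss:

gradientr <- function(y, X, eta, iters) {
  # Add an intercept column and coerce to a matrix
  X <- as.matrix(cbind(rep(1, length(y)), X))
  N <- length(y)
  theta <- rep(0, ncol(X))   # initialize all parameters at zero
  loss <- numeric(iters)     # L2 loss recorded at each epoch
  for (i in 1:iters) {
    grad <- -(1 / N) * t(X) %*% (y - X %*% theta)  # gradient of the squared-error loss
    theta <- theta - eta * grad                    # gradient descent update
    loss[i] <- sum((y - X %*% theta)^2)            # L2 loss after the update
  }
  list(coef = theta, loss = loss)
}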

5 Normal equations in R

Let's also make a function that estimates the parameters with the normal equations:

$$\hat{\theta} = (X^T X)^{-1} X^T y$$

normalest <- function(y, X){
  X = data.frame(rep(1, length(y)), X)
  X = as.matrix(X)
  theta = solve(t(X) %*% X) %*% t(X) %*% y
  return(theta)
}

6 Running gradient descent

Now let's make up some fake data and see gradient descent in action with $\eta = 10$ and 1000 epochs:

y  = rnorm(n = 1000, mean = 0, sd = 1)
x1 = rnorm(n = 1000, mean = 0, sd = 1)
x2 = rnorm(n = 1000, mean = 0, sd = 1)
x3 = rnorm(n = 1000, mean = 0, sd = 1)
x4 = rnorm(n = 1000, mean = 0, sd = 1)
x5 = rnorm(n = 1000, mean = 0, sd = 1)

gdec.eta1 = gradientr(y = y, X = data.frame(x1, x2, x3, x4, x5),
                      eta = 10, iters = 1000)

7 Did we get the correct parameter values?

Let's check whether we got the correct parameter values:

Explicit.Coef <- normalest(y = y, X = data.frame(x1, x2, x3, x4, x5))
Gradient.Coef <- gdec.eta1$coef
data.frame(Explicit.Coef, Gradient.Coef)

##                  Explicit.Coef Gradient.Coef
## rep.1..length.y.
## x1
## x2
## x3
## x4
## x5

8 L2 loss for each epoch

Let's take a look at the L2 loss for each epoch:

[Figure: L2 loss by epoch]

9 What if we decreased η?

What happens if we decrease to $\eta = 1$?

[Figure: L2 loss by epoch]

10 Stochastic Gradient Descent

Gradient descent can often have slow convergence because each iteration requires calculating the gradient for every single training example. If we instead update the parameters as we iterate through the training examples one at a time, we can get excellent estimates despite having done far less work per update.

11 Stochastic Gradient Descent

For stochastic gradient descent, thus:

$$\nabla J(\theta) = -\frac{1}{N}\left(y^T - \theta^T X^T\right)X$$

becomes:

$$\nabla J(\theta)_i = -\frac{1}{N}\left(y_i - \theta^T X_i\right)X_i$$

where $i$ indexes the rows of the data set.

12 Stochastic Gradient Descent Algorithm

The stochastic gradient descent algorithm proceeds as follows for the case of linear regression:

Step 1: Randomly shuffle the data.
Step 2: repeat for $i := 1, \ldots, N$ {
    $\theta := \theta - \eta \nabla J(\theta)_i$
}

Part of the homework assignment will be to write an R function that performs stochastic gradient descent; one possible shape is sketched below.
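
A minimal sketch, assuming the same conventions as gradientr above (intercept column, zero initialization); this is only one possible shape, not the assigned solution:

sgdr <- function(y, X, eta, epochs) {
  X <- as.matrix(cbind(rep(1, length(y)), X))
  N <- length(y)
  theta <- rep(0, ncol(X))
  for (e in 1:epochs) {
    for (i in sample(1:N)) {          # Step 1: shuffle; Step 2: sweep the rows
      # gradient contribution of the single example i
      grad_i <- -(1 / N) * (y[i] - sum(theta * X[i, ])) * X[i, ]
      theta <- theta - eta * grad_i   # update after every example
    }
  }
  theta
}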

13 Linear model selection and regularization

Prediction accuracy. Recall the standard linear model:

$$Y = \theta_0 + \sum_{i=1}^{p} \theta_i x_i + \epsilon$$

If $n \gg p$, least squares will do well on test observations.
If $n$ is not much larger than $p$, there is a lot of variability in the least-squares fit, and overfitting.
If $p > n$, there is no unique solution that the normal equations can explicitly solve.

14 Linear model selection and regularization

Model interpretability. Inclusion of many irrelevant variables increases the chances of overfitting. We often need ways to perform feature selection for linear regression and other models.

15 Three ways to accomplish this

Subset Selection: find the subset of the $p$ predictors that best fits $Y$.
Shrinkage/Regularization: fit a model with all $p$ predictors, but shrink the values of some coefficients $\theta$ toward 0.
Dimension Reduction: project the $p$ predictors onto an $M$-dimensional subspace with $M < p$, and use the $M$ projections as predictors.

16 Best subset selection

$$Y = \theta_0 + \theta_1 x_1 + \cdots + \theta_p x_p$$

Choose the best subset of predictors for the model according to some criterion: $C_p$, AIC, BIC, or adjusted $R^2$.

17 Best subset selection algorithm

For the model $Y = \theta_0 + \theta_1 x_1 + \cdots + \theta_p x_p$:

1. Let $M_0$ denote the null model.
2. For $i = 1, \ldots, p$:
   Fit each of the $\binom{p}{i}$ models containing exactly $i$ predictors.
   Choose the best among these $\binom{p}{i}$ models, calling it $M_i$: the one with the lowest RSS (highest $R^2$).
3. Select the best model among $M_0, \ldots, M_p$ using some criterion: cross-validated prediction error, AIC, BIC, or adjusted $R^2$.

18 Example in R using bestglm

library(bestglm)

## Loading required package: leaps
## Warning: package 'leaps' was built under R version ...

19 Example in R using bestglm

str(zprostate)

## 'data.frame': 97 obs. of 10 variables:
##  $ lcavol : num
##  $ lweight: num
##  $ age    : num
##  $ lbph   : num
##  $ svi    : num
##  $ lcp    : num
##  $ gleason: num
##  $ pgg45  : num
##  $ lpsa   : num
##  $ train  : logi TRUE TRUE TRUE TRUE TRUE TRUE ...

20 Example in R using bestglm

train <- (zprostate[zprostate[, 10], ])[, -10]
X <- train[, 1:8]
y <- train[, 9]
out <- summary(regsubsets(x = X, y = y, nvmax = ncol(X)))
Subsets <- out$which
RSS <- out$rss

##   (Intercept) lcavol lweight   age  lbph   svi   lcp gleason
## 1        TRUE   TRUE   FALSE FALSE FALSE FALSE FALSE   FALSE
## 2        TRUE   TRUE    TRUE FALSE FALSE FALSE FALSE   FALSE
## 3        TRUE   TRUE    TRUE FALSE FALSE  TRUE FALSE   FALSE
## 4        TRUE   TRUE    TRUE FALSE  TRUE  TRUE FALSE   FALSE
## 5        TRUE   TRUE    TRUE FALSE  TRUE  TRUE FALSE   FALSE
##   RSS

21 Example in R using bestglm

Let bestglm() find the best model using the Bayesian information criterion (the default, BIC):

Xy <- cbind(as.data.frame(X), lpsa = y)
out <- bestglm(Xy, IC = "BIC")
out$BestModel

##
## Call:
## lm(formula = y ~ ., data = data.frame(Xy[, c(bestset[-1],
##     drop = FALSE], y = y))
##
## Coefficients:
## (Intercept)       lcavol      lweight
##

22 Example in R using bestglm

Let bestglm() find the best model using the Akaike information criterion (AIC):

Xy <- cbind(as.data.frame(X), lpsa = y)
out <- bestglm(Xy, IC = "AIC")
out$BestModel

##
## Call:
## lm(formula = y ~ ., data = data.frame(Xy[, c(bestset[-1],
##     drop = FALSE], y = y))
##
## Coefficients:
## (Intercept)       lcavol      lweight          age
##
##         svi          lcp        pgg45
##

23 Example in R using bestglm

Let bestglm() find the best model using cross-validated prediction error:

Xy <- cbind(as.data.frame(X), lpsa = y)
out <- bestglm(Xy, IC = "CV", CVArgs = list(Method = "HTF", K = 10, REP = 1))
out$BestModel

##
## Call:
## lm(formula = y ~ ., data = data.frame(Xy[, c(bestset[-1],
##     drop = FALSE], y = y))
##
## Coefficients:
## (Intercept)       lcavol      lweight          svi
##

24 Problems with best subset selection

What if you have $p = 50$ predictors and want to choose a model with a variable subset of $i = 10$? That would require you to estimate $\binom{50}{10} = 10{,}272{,}278{,}170$ models! Very computationally intensive.
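
You can verify this count directly in R:

choose(50, 10)   # number of distinct 10-variable subsets of 50 predictors
## [1] 10272278170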

25 Forward stepwise selection

Does not consider all possible subsets of models.
Starts with the null model $M_0$, adds predictors one at a time, and at each step chooses the predictor that gives the greatest additional improvement in fit.

26 Forward stepwise selection algorithm

For the model $Y = \theta_0 + \theta_1 x_1 + \cdots + \theta_p x_p$:

1. Let $M_0$ denote the null model.
2. For $i = 0, \ldots, p - 1$:
   Consider all $p - i$ models that augment the predictors in $M_i$ with one additional predictor.
   Choose the best among these $p - i$ models, calling it $M_{i+1}$: the one with the lowest RSS (highest $R^2$).
3. Select the best model among $M_0, \ldots, M_p$ using some criterion: $C_p$, AIC, BIC, or adjusted $R^2$.

27 Forward stepwise selection algorithm

Cuts down the number of models you need to estimate significantly: you only estimate the null model plus $p - i$ models on the $i$th iteration.

$$\text{Total models} = 1 + \sum_{i=0}^{p-1}(p - i) = 1 + \frac{p(p+1)}{2}$$
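
For the $p = 50$ example above, the contrast is stark:

p <- 50
1 + p * (p + 1) / 2   # total models fit by forward stepwise
## [1] 1276

Forward stepwise fits 1,276 models where best subset selection would fit more than 10 billion.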

28 Forward stepwise selection in R using step()

USJudgeRatings dataset:

# CONT  Number of contacts of lawyer with judge.
# INTG  Judicial integrity.
# DMNR  Demeanor.
# DILG  Diligence.
# CFMG  Case flow managing.
# DECI  Prompt decisions.
# PREP  Preparation for trial.
# FAMI  Familiarity with law.
# ORAL  Sound oral rulings.
# WRIT  Sound written rulings.
# PHYS  Physical ability.
# RTEN  Worthy of retention.

29 Forward stepwise selection in R using step()

What if we wanted to train a model to predict whether a judge would be worthy of retention (RTEN)?

library(datasets)
data(USJudgeRatings)

30 Forward stepwise selection in R using step()

null.model = lm(RTEN ~ 1, data = USJudgeRatings)
largest.model = lm(RTEN ~ ., data = USJudgeRatings[, 1:12])
forward.stepwise = step(null.model, direction = 'forward',
                        scope = formula(largest.model))

## Start:  AIC=9.26
## RTEN ~ 1

31 Model selection criteria

While $R^2$ provides a measure of fit, it always increases as the number of predictors increases:

$$R^2 = 1 - \frac{\sum_i \left(y_i - \left(\theta_0 + \sum_{j=1}^{p} \theta_j x_{ij}\right)\right)^2}{\sum_i (y_i - \bar{y})^2}$$

32 Model selection criteria

Thus, in order to avoid estimating models that tend to overfit the data, we need criteria that penalize models with more features. Here we consider $C_p$, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and adjusted $R^2$.

33 $C_p$

$$C_p = \frac{1}{n}\left(\mathrm{RSS} + 2d\hat{\sigma}^2\right)$$

In the equation above, a penalty of $2d\hat{\sigma}^2$ is added to the residual sum of squares, where $d$ is the number of parameters and $\hat{\sigma}^2$ is an estimate of the error variance.

34 AIC

$$\mathrm{AIC} = \frac{1}{n\hat{\sigma}^2}\left(\mathrm{RSS} + 2d\hat{\sigma}^2\right)$$

35 BIC

$$\mathrm{BIC} = \frac{1}{n}\left(\mathrm{RSS} + \log(n)\,d\hat{\sigma}^2\right)$$

36 Adjusted $R^2$

$$\text{Adjusted } R^2 = 1 - \frac{\mathrm{RSS}/(n - d - 1)}{\mathrm{TSS}/(n - 1)}$$
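
As a quick illustration, all four criteria can be computed by hand from a fitted lm object. A sketch using the simulated data from earlier, taking $\hat{\sigma}^2$ from the full model's residuals (one common convention):

fit <- lm(y ~ x1 + x2 + x3 + x4 + x5)
n <- length(y)
d <- length(coef(fit)) - 1        # number of predictors
RSS <- sum(resid(fit)^2)
TSS <- sum((y - mean(y))^2)
sigma2.hat <- RSS / (n - d - 1)   # estimated error variance

Cp.val  <- (RSS + 2 * d * sigma2.hat) / n
AIC.val <- (RSS + 2 * d * sigma2.hat) / (n * sigma2.hat)
BIC.val <- (RSS + log(n) * d * sigma2.hat) / n
adjR2   <- 1 - (RSS / (n - d - 1)) / (TSS / (n - 1))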

37 Validation set approach

In order to understand cross-validation we have to take a step back. We've already discussed the training set and the test set, but we haven't discussed the validation set.

38 Validation set approach

Training set: the data that you train your model with (i.e., estimate parameters).
Test set: data that you use to test how well your trained model predicts new data.
Validation set: data that is used to provide a more accurate estimate of the performance of one or several models, often to avoid overfitting.

39 Validation set approach

If the goal is to estimate the test error of a number of models, which gives us a sense of how accurately each model makes predictions, we might take the following steps:

Step 1: Divide the data into a training set and a validation set.
Step 2: Train the model on the training set; estimate the test error rate on the validation set.

40 Validation set approach: example

Bible and Quran: $n \approx 1100$ randomly sampled verses.

Step 1: Randomly divide the data into a training set ($n = 1050$) and a validation set ($n = 50$).
Step 2: Estimate the parameters on the training set; estimate the test error on the validation set.
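
A minimal sketch of such a split, assuming a hypothetical data frame verses with an outcome column y:

set.seed(1)
train.idx <- sample(nrow(verses), 1050)   # 1050 verses for training
train <- verses[train.idx, ]
valid <- verses[-train.idx, ]             # remaining verses for validation
fit <- lm(y ~ ., data = train)
val.mse <- mean((valid$y - predict(fit, newdata = valid))^2)  # validation error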

41 Validation set approach

Why not just estimate the error on a test set? We want to reserve the test set to verify the final accuracy of the model, but we want another data set to provide us with information about how we should adjust our model.

42 Problems with the validation set approach

1. The validation-set test error estimate is highly variable: it depends on which validation set is chosen.
2. The validation-set test error estimate may overestimate the error, because not enough training data is used to train the model.

43 Leave-One-Out Cross-Validation

Training set: $\{(x_2, y_2), \ldots, (x_n, y_n)\}$
Validation set: $\{(x_1, y_1)\}$

LOOCV solves both of these problems. The validation set is a single observation, and the training set is the rest of the data.

44 Leave-One-Out Cross-Validation Algorithm

for $i = 1, \ldots, n$:
    select the validation set $\{(x_i, y_i)\}$
    select the training set (the remaining $n - 1$ observations) and train on it
    estimate $\mathrm{MSE}_i = (y_i - \hat{y}_i)^2$

Estimate the overall MSE:

$$CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{MSE}_i$$
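
A minimal sketch of LOOCV for a linear model, assuming a data frame df whose response column is named y:

loocv <- function(df) {
  n <- nrow(df)
  mse <- numeric(n)
  for (i in 1:n) {
    fit <- lm(y ~ ., data = df[-i, ])   # train on all but observation i
    pred <- predict(fit, newdata = df[i, , drop = FALSE])
    mse[i] <- (df$y[i] - pred)^2        # squared error on the held-out point
  }
  mean(mse)                             # CV_(n)
}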

45 Leave-One-Out Cross-Validation Algorithm

Benefits:
- Provides an excellent and stable estimate of the MSE.
- Can be used for any kind of model.

Drawbacks:
- Computationally intensive.

46 k-fold Cross-Validation

The k-fold method solves the problem of computational complexity.
The data is divided into $k$ groups, or folds.
The first fold is used as a validation set and the model is trained on the remaining $k - 1$ folds; the procedure is repeated with each fold serving once as the validation set.

47 k-fold Cross-Validation Algorithm

Choose $k$.
Randomly divide the data $\Theta$ into $\Theta_1, \ldots, \Theta_k$ sets of size $n/k$.
for $i = 1, \ldots, k$:
    Hold out fold $\Theta_i$; train the model on the remaining $k - 1$ folds.
    Calculate $\mathrm{MSE}_i = \frac{k}{n}\sum_{j \in \Theta_i} (y_j - \hat{y}_j)^2$
Calculate the overall MSE:

$$CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k} \mathrm{MSE}_i$$
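
A minimal k-fold sketch in the same style as the LOOCV function above (again assuming a data frame df with response column y):

kfold.cv <- function(df, k = 10) {
  n <- nrow(df)
  folds <- sample(rep(1:k, length.out = n))    # randomly assign each row to a fold
  mse <- numeric(k)
  for (i in 1:k) {
    held.out <- folds == i
    fit <- lm(y ~ ., data = df[!held.out, ])   # train on the other k - 1 folds
    pred <- predict(fit, newdata = df[held.out, ])
    mse[i] <- mean((df$y[held.out] - pred)^2)  # validation MSE on fold i
  }
  mean(mse)                                    # CV_(k)
}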

48 For next time

- Regularization
- Logistic Regression
- Linear Discriminant Analysis
- Naïve Bayes
