Chapter 10: Variable Selection

November 12, 2018

1 Introduction

1.1 The Model-Building Problem

The variable selection problem is to find an appropriate subset of regressors. It involves two conflicting objectives. (1) We would like the model to include as many regressors as possible, so that the information content in these factors can influence the predicted value of $y$. (2) We would like the model to include as few regressors as possible, because the variance of the prediction $\hat y$ increases as the number of regressors increases; moreover, the more regressors a model contains, the greater the costs of data collection and model maintenance. The process of finding a model that compromises between these two objectives is called selecting the best regression equation.

Unfortunately, there is no unique definition of "best," and different procedures may select different subsets of the candidate regressors as best. In fact, there is usually not a single best equation but rather several equally good ones.

The variable selection problem is often discussed in an idealized setting: it is usually assumed that the correct functional specification of the regressors is known, and that no outliers or influential observations are present. In practice, these assumptions are rarely met, so an iterative approach is often employed: (1) a particular variable selection strategy is applied; (2) the resulting subset model is checked for correct functional specification, outliers, and influential observations. Several iterations may be required to produce an adequate model.

1.2 Consequences of Model Misspecification

Let
$$ y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \epsilon_i, \quad i = 1, 2, \ldots, n, $$
or equivalently $y = X\beta + \epsilon$, denote the full model, which contains all $K$ candidate regressors. Let
$$ y_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j x_{ij} + \epsilon_i, \quad i = 1, 2, \ldots, n, $$
or equivalently $y = X_p \beta_p + \epsilon$, denote a subset model with $p - 1$ regressors. In this chapter, we assume that all subset models include the intercept term. The properties of the estimates $\hat\beta_p$ and $\hat\sigma_p^2$ can be summarized as follows:

1. $E(\hat\beta_p) = \beta_p + (X_p'X_p)^{-1} X_p' X_r \beta_r = \beta_p + A\beta_r$, where $X_r$ is the matrix of all regressors not included in $X_p$, $\beta_r$ contains the regression coefficients corresponding to $X_r$, and $A = (X_p'X_p)^{-1} X_p' X_r$ is sometimes called the alias matrix. Thus $\hat\beta_p$ is a biased estimate of $\beta_p$ unless $\beta_r = 0$ or $X_p' X_r = 0$.

2. The variances of $\hat\beta_p$ and $\hat\beta$ are $\mathrm{Var}(\hat\beta_p) = \sigma^2 (X_p'X_p)^{-1}$ and $\mathrm{Var}(\hat\beta) = \sigma^2 (X'X)^{-1}$, respectively. The matrix $\mathrm{Var}(\hat\beta_{f,p}) - \mathrm{Var}(\hat\beta_p)$, where $\hat\beta_{f,p}$ denotes the components of the full-model estimate $\hat\beta$ corresponding to $\beta_p$, is positive semidefinite; that is, the variances of the least squares estimates of the parameters in the full model are greater than or equal to the variances of the corresponding estimates in the subset model.

3. Since $\hat\beta_p$ is possibly a biased estimate of $\beta_p$, its precision should be measured in terms of mean square error. Recall that if $\hat\theta$ is an estimate of the parameter $\theta$, the mean square error of $\hat\theta$ is $\mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) + [E(\hat\theta) - \theta]^2$. The mean square error of $\hat\beta_p$ is
$$ \mathrm{MSE}(\hat\beta_p) = \sigma^2 (X_p'X_p)^{-1} + A\beta_r\beta_r'A'. $$

4. The full-model estimate $\hat\sigma_f^2$ is an unbiased estimate of $\sigma^2$. However, for the subset model,
$$ E(\hat\sigma_p^2) = \sigma^2 + \frac{\beta_r' X_r'\,[I - X_p(X_p'X_p)^{-1}X_p']\,X_r \beta_r}{n - p}. $$
That is, $\hat\sigma_p^2$ is generally biased upward as an estimate of $\sigma^2$.
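To see the bias in item 1 concretely, here is a minimal simulation sketch (assuming NumPy; all names and numbers are illustrative, not from the text). It drops a regressor that is correlated with a retained one and compares the average of $\hat\beta_p$ over repeated samples with $\beta_p + A\beta_r$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
beta_p, beta_r = np.array([1.0, 2.0]), np.array([1.5])  # illustrative true coefficients

# Fixed design: intercept, x1 (kept), x2 (omitted, correlated with x1)
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)
Xp = np.column_stack([np.ones(n), x1])      # subset design (intercept + x1)
Xr = x2[:, None]                            # omitted regressor

A = np.linalg.solve(Xp.T @ Xp, Xp.T @ Xr)   # alias matrix (X_p'X_p)^{-1} X_p'X_r
estimates = []
for _ in range(reps):
    y = Xp @ beta_p + Xr @ beta_r + rng.normal(size=n)
    estimates.append(np.linalg.lstsq(Xp, y, rcond=None)[0])

print("mean of beta_p-hat:", np.mean(estimates, axis=0))
print("beta_p + A beta_r :", beta_p + A @ beta_r)
```

The two printed vectors should agree closely, confirming that the subset estimate centers on $\beta_p + A\beta_r$ rather than $\beta_p$.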

1.3 Criteria for Evaluating Subset Regression Models

Coefficient of Multiple Determination

Let $R_p^2$ denote the coefficient of multiple determination for a subset regression model with $p$ terms. Computationally,
$$ R_p^2 = 1 - \frac{SS_{Res}(p)}{SS_T}. $$
Note that there are $\binom{K}{p-1}$ values of $R_p^2$ for each value of $p$, one for each possible subset model of size $p$. $R_p^2$ increases as $p$ increases and is a maximum when $p = K + 1$. Therefore, the analyst uses this criterion by adding regressors to the model up to the point where an additional variable is not useful, in that it provides only a small increase in $R_p^2$.

Aitkin (1974) proposed one solution to this problem: a test by which all subset regression models whose $R^2$ is not significantly different from the $R^2$ of the full model can be identified. Let
$$ R_0^2 = 1 - (1 - R_{K+1}^2)(1 + d_{\alpha,n,K}), \quad \text{where} \quad d_{\alpha,n,K} = \frac{K\,F_{\alpha,K,n-K-1}}{n - K - 1}. $$
Aitkin calls any subset of regressor variables producing an $R^2$ greater than $R_0^2$ an $R^2$-adequate($\alpha$) subset.
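A minimal sketch of Aitkin's cutoff (assuming SciPy; the numbers in the example are illustrative):

```python
from scipy.stats import f

def r2_adequate_cutoff(r2_full, n, K, alpha=0.05):
    """R^2 threshold: subsets with R^2 above this are R^2-adequate(alpha)."""
    d = K * f.ppf(1 - alpha, K, n - K - 1) / (n - K - 1)
    return 1 - (1 - r2_full) * (1 + d)

# Example: full model with K = 10 regressors, n = 40 cases, R^2 = 0.92
print(r2_adequate_cutoff(0.92, n=40, K=10))
```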

Adjusted $R^2$

$$ \mathrm{Adj}R_p^2 = 1 - \frac{SS_{Res}(p)/(n-p)}{SS_T/(n-1)}. $$
The $\mathrm{Adj}R_p^2$ statistic does not necessarily increase as additional regressors are introduced into the model. In fact, it can be shown (Seber, 1977) that if $s$ regressors are added to the model, $\mathrm{Adj}R_{p+s}^2$ will exceed $\mathrm{Adj}R_p^2$ if and only if the partial $F$-statistic for testing the significance of the $s$ additional regressors exceeds 1.

One criterion for selecting an optimum subset model is to choose the model that has a maximum $\mathrm{Adj}R_p^2$.

Residual Mean Square

The residual mean square for a subset regression model is defined as
$$ MS_{Res}(p) = \frac{SS_{Res}(p)}{n - p}. $$
Comparing with $\mathrm{Adj}R_p^2$, we can write $\mathrm{Adj}R_p^2 = 1 - MS_{Res}(p)/[SS_T/(n-1)]$; since $SS_T/(n-1)$ does not depend on the subset, the subset regression model that minimizes $MS_{Res}(p)$ also maximizes $\mathrm{Adj}R_p^2$.
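A quick numeric check of this equivalence, with illustrative numbers (a sketch assuming NumPy):

```python
import numpy as np

n, ss_t = 30, 500.0
p = np.array([2, 3, 4, 5])                       # terms (incl. intercept), illustrative
ss_res = np.array([220.0, 150.0, 148.0, 147.5])  # illustrative SS_Res(p) values

ms_res = ss_res / (n - p)
adj_r2 = 1 - ms_res / (ss_t / (n - 1))
print(np.argmin(ms_res) == np.argmax(adj_r2))    # True: the two criteria agree
```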

Mallows $C_p$ Statistic

The mean squared error of a fitted value is
$$ E[\hat y_i - E(y_i)]^2 = [E(y_i) - E(\hat y_i)]^2 + \mathrm{Var}(\hat y_i), \tag{1} $$
where $E(y_i)$ is the expected response from the true regression equation and $E(\hat y_i)$ is the expected response from the $p$-term subset model. The two terms on the right-hand side of (1) are the squared bias and variance components, respectively, of the mean square error. Since (Appendix C.3)
$$ E\Big[\sum_{i=1}^n (\hat y_i - y_i)^2\Big] = E[SS_{Res}(p)] = SS_B(p) + (n - p)\sigma^2, $$
where
$$ SS_B(p) = \sum_{i=1}^n [E(y_i) - E(\hat y_i)]^2 \quad \text{and} \quad \sum_{i=1}^n \mathrm{Var}(\hat y_i) = p\sigma^2, $$
minimizing the total mean squared error is equivalent to minimizing
$$ \Gamma_p = \frac{\sum_{i=1}^n E[\hat y_i - E(y_i)]^2}{\sigma^2} = \frac{1}{\sigma^2}\big\{E[SS_{Res}(p)] - (n-p)\sigma^2 + p\sigma^2\big\} = \frac{E[SS_{Res}(p)]}{\sigma^2} + 2p - n. $$
Suppose that $\hat\sigma^2$ is a good estimate of $\sigma^2$. Then replacing $E[SS_{Res}(p)]$ by the observed value $SS_{Res}(p)$ produces an estimate of $\Gamma_p$, namely
$$ C_p = \frac{SS_{Res}(p)}{\hat\sigma^2} + 2p - n. $$
If the $p$-term model has negligible bias, then $SS_B(p) = 0$; consequently $E[SS_{Res}(p)] = (n-p)\sigma^2$, and
$$ E[C_p \mid \mathrm{Bias} = 0] = \frac{(n-p)\sigma^2}{\sigma^2} + 2p - n = p. $$
Hence, a good model should have $C_p \approx p$. However, when $K$ is large, an exhaustive search for all models with $C_p \approx p$ is impractical, and the minimum-$C_p$ model is often used for model selection. Mallows (1973) warned that minimizing $C_p$ may lead to the selection of a model that has a larger mean squared prediction error than the full model. Hocking (1976) suggested also considering other models falling below the line $C_p = p$, and Mallows (1995) further suggested that any candidate model with $C_p < p$ should be carefully examined as a potential best model.
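As a sketch of how $C_p$ is computed in practice, the following enumerates all subsets of a small simulated problem, estimating $\sigma^2$ from the full model (assuming NumPy; the data and names are illustrative). Subsets containing both active regressors should show $C_p \approx p$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, K = 60, 4
X = rng.normal(size=(n, K))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)  # x3, x4 inactive

def ss_res(cols):
    """Residual sum of squares for the model with intercept plus columns `cols`."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return resid @ resid

sigma2 = ss_res(range(K)) / (n - K - 1)         # full-model estimate of sigma^2
for size in range(K + 1):
    for cols in itertools.combinations(range(K), size):
        p = len(cols) + 1                       # terms, including the intercept
        cp = ss_res(cols) / sigma2 + 2 * p - n  # Mallows C_p
        print(cols, round(cp, 2))
```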

Information Criteria

Criteria for comparing candidate subsets are based on the lack of fit of a model and its complexity. Lack of fit for a candidate subset of $X$ is measured by its residual sum of squares $SS_{Res,p}$; complexity for multiple linear regression models is measured by the number of terms $p$ in the subset, including the intercept. The most common criterion, useful in multiple linear regression and many other problems where model comparison is at issue, is the Akaike Information Criterion, or AIC:
$$ AIC_p = n\log(SS_{Res,p}) + 2p. $$
Small values of AIC are preferred. An alternative to AIC is the Bayes Information Criterion, or BIC:
$$ BIC_p = n\log(SS_{Res,p}) + p\log(n). $$
Once again, smaller values are preferred.
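The two criteria translate directly into a small helper (a sketch assuming NumPy; additive constants that do not affect the ranking of subsets are omitted, following the definitions above):

```python
import numpy as np

def aic_bic(ss_res_p, n, p):
    """AIC and BIC for a p-term subset with residual sum of squares ss_res_p."""
    aic = n * np.log(ss_res_p) + 2 * p
    bic = n * np.log(ss_res_p) + p * np.log(n)
    return aic, bic

print(aic_bic(148.0, n=30, p=4))  # illustrative values
```

Note that BIC penalizes each additional term by $\log(n)$ rather than 2, so for $n > e^2 \approx 7.4$ it favors smaller subsets than AIC.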

Cross-Validation

Cross-validation can also be used to compare candidate subset mean functions. The most straightforward type of cross-validation splits the data into two parts at random: a construction set and a validation set. The construction set is used to estimate the parameters in the mean function; fitted values from this fit are then computed for the cases in the validation set, and the average of the squared differences between the response and the fitted values on the validation set is used as a summary of fit for the candidate subset. Good candidates will have small cross-validation errors.

Another version of cross-validation uses predicted residuals for the subset mean function based on the candidate $X_p$. For this criterion, compute the fitted value for each case $i$ from a regression that does not include case $i$. The sum of squares of these values is the predicted residual sum of squares, or PRESS:
$$ PRESS_p = \sum_{i=1}^n \big(y_i - x_{p,i}'\hat\beta_{p(i)}\big)^2 = \sum_{i=1}^n \Big(\frac{e_{p,i}}{1 - h_{p,ii}}\Big)^2, $$
where $\hat\beta_{p(i)}$ is estimated with case $i$ deleted, $e_{p,i}$ is the ordinary residual, and $h_{p,ii}$ is the $i$th leverage for the subset model.
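The leverage identity above means PRESS requires only a single fit per subset rather than $n$ refits. A minimal sketch (assuming NumPy; the data are illustrative):

```python
import numpy as np

def press(Xp, y):
    """PRESS for the subset design Xp (intercept column included)."""
    H = Xp @ np.linalg.solve(Xp.T @ Xp, Xp.T)   # hat matrix
    e = y - H @ y                               # ordinary residuals
    return np.sum((e / (1 - np.diag(H))) ** 2)  # squared predicted residuals

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
print(press(np.column_stack([np.ones(n), x]), y))
```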

2 Computational Techniques for Variable Selection

Highway Accident Data

We will use the highway accident data to demonstrate the variable selection procedures. The response variable, log(rate), and the predictors are described in the following table.

Variable     Description
log(rate)    Base-two logarithm of the 1973 accident rate per million vehicle miles; the response
log(len)     Base-two logarithm of the length of the segment in miles
log(adt)     Base-two logarithm of the average daily traffic count in thousands
log(trks)    Base-two logarithm of truck volume as a percent of the total volume
Slim         1973 speed limit
Lwid         Lane width in feet
Shld         Width in feet of the outer shoulder on the roadway
Itg          Number of freeway-type interchanges per mile in the segment
log(sigs1)   Base-two logarithm of (number of signalized interchanges per mile in the segment + 1)/(length of segment)
Acpt         Number of access points per mile in the segment
Hwy          A factor coded 0 for a federal interstate highway, 1 for a principal arterial highway, 2 for a major arterial, and 3 otherwise

2.1 All Possible Regressions

This procedure requires the analyst to fit all possible candidate models: if there are $K$ candidate regressors, there are $2^K$ candidate models in total.

2.2 Stepwise Regression Methods

Forward selection. The procedure begins with a null model (only the intercept term is included) and attempts to insert regressors one by one according to their partial $F$-statistics. The procedure terminates either when the partial $F$-statistic at a particular step does not exceed the pre-specified cutoff value $F_{IN}$, or when the last candidate regressor has been added to the model. At each step, with regressors $x_1, \ldots, x_i$ already in the model, the candidate regressor $x_{i+1}$ with the largest partial $F$-statistic
$$ F = \frac{SS_R(x_{i+1} \mid x_1, \ldots, x_i)}{MS_{Res}(x_1, \ldots, x_i, x_{i+1})} $$
is chosen for insertion.
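A compact sketch of forward selection with the partial $F$-statistic above (assuming NumPy; the helper names and the default $F_{IN}$ are illustrative):

```python
import numpy as np

def ss_res(cols, y):
    """Residual sum of squares for a model with intercept plus the given columns."""
    Z = np.column_stack([np.ones(len(y))] + cols)
    r = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return r @ r

def forward_selection(X, y, f_in=4.0):
    n, K = X.shape
    chosen, remaining = [], list(range(K))
    while remaining:
        rss_cur = ss_res([X[:, j] for j in chosen], y)
        best_f, best_j = -np.inf, None
        for j in remaining:
            rss_new = ss_res([X[:, k] for k in chosen + [j]], y)
            ms_res = rss_new / (n - len(chosen) - 2)   # n - p for the enlarged model
            f_stat = (rss_cur - rss_new) / ms_res      # partial F for adding x_j
            if f_stat > best_f:
                best_f, best_j = f_stat, j
        if best_f <= f_in:            # no candidate exceeds F_IN: stop
            break
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen
```

Run on the simulated data from the $C_p$ sketch, `forward_selection(X, y)` should recover the two active regressors.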

Backward elimination. Backward elimination begins with a model that includes all $K$ regressors and attempts to eliminate regressors one by one according to their partial $F$-statistics. At each step, the partial $F$-statistic is computed for each regressor in the current model as if it were the last variable to enter the model. If the smallest of these $F$ values is less than the pre-specified cutoff value $F_{OUT}$, the corresponding regressor is removed from the model; the procedure terminates when the smallest partial $F$ value is not less than $F_{OUT}$.
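A matching sketch of backward elimination (assuming NumPy; names and the default $F_{OUT}$ are illustrative):

```python
import numpy as np

def ss_res(cols, y):
    """Residual sum of squares for a model with intercept plus the given columns."""
    Z = np.column_stack([np.ones(len(y))] + cols)
    r = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return r @ r

def backward_elimination(X, y, f_out=4.0):
    n, K = X.shape
    chosen = list(range(K))
    while chosen:
        p = len(chosen) + 1                        # terms, including the intercept
        rss_full = ss_res([X[:, j] for j in chosen], y)
        ms_res = rss_full / (n - p)
        # partial F for each regressor, as if it entered the model last
        f_stats = {j: (ss_res([X[:, k] for k in chosen if k != j], y) - rss_full)
                      / ms_res
                   for j in chosen}
        j_min = min(f_stats, key=f_stats.get)
        if f_stats[j_min] >= f_out:                # smallest F not below F_OUT: stop
            break
        chosen.remove(j_min)
    return chosen
```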

Stepwise regression. Stepwise regression is a modification of forward selection in which, at each step, all regressors previously entered into the model are reassessed via their partial $F$-statistics. A regressor added at an earlier step may now be redundant because of its relationships with the regressors currently in the equation. If the partial $F$-statistic for a variable is less than $F_{OUT}$, that variable is dropped from the model.

Stepwise regression therefore requires two cutoff values, $F_{IN}$ and $F_{OUT}$. Some analysts prefer to choose $F_{IN} = F_{OUT}$, although this is not necessary. Frequently we choose $F_{IN} > F_{OUT}$, making it relatively more difficult to add a regressor than to delete one. A popular setting is $F_{IN} = F_{OUT} = 4.0$, which corresponds roughly to the upper 5% point of the $F$ distribution.
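The rough correspondence between $F = 4.0$ and the upper 5% point of the $F$ distribution with one numerator degree of freedom can be checked directly (assuming SciPy):

```python
from scipy.stats import f

# Upper 5% point of F(1, df2) for a few residual degrees of freedom
for df2 in (20, 60, 120):
    print(df2, round(f.ppf(0.95, 1, df2), 2))   # roughly 4.35, 4.00, 3.92
```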
