Chapter 6: Linear Model Selection and Regularization


1 Chapter 6: Linear Model Selection and Regularization As p (the number of predictors) approaches or exceeds n (the sample size), standard linear regression runs into problems: the variance of the estimates becomes large, and when p > n there is no unique least-squares solution. Reducing the number of predictors both improves the statistical properties of the regression estimates and simplifies the model, making it easier to interpret.

2 Main Topics Subset Selection: several approaches reduce the number of predictor variables and then fit an ordinary linear regression. Shrinkage: if we use all p predictors, some methods shrink (also called regularization) the magnitudes of the coefficients. This may entail a small increase in bias in exchange for a large reduction in variance. Dimension reduction: we may create linear combinations of the p predictors, i.e. project them onto a subspace of smaller dimension. Either way, the number of predictors is effectively reduced before ordinary linear regression is applied.

3 Subset Selection Best subset selection examines all 2^p models using the following algorithm. (1) Let M_0 be the null model with no predictors. (2) For k = 1, 2, ..., p: fit all (p choose k) = p!/(k!(p-k)!) models containing exactly k predictors, and pick the best one (M_k) based on the smallest RSS or largest R^2. (3) Select the best among M_0, ..., M_p using cross-validation (test MSE), C_p, AIC, BIC, or adjusted R^2. Using RSS or R^2 is fine at step (2) since all models compared there have the same number of predictors.
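
A minimal sketch of best subset selection in R, using the leaps package and the Credit data from the ISLR package (the data set and variable names are assumptions for illustration, not part of the original notes):

library(ISLR)
library(leaps)
best.fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11)  # all subsets up to 11 predictors
best.sum <- summary(best.fit)
which.min(best.sum$cp)     # model size minimizing C_p
which.min(best.sum$bic)    # model size minimizing BIC
which.max(best.sum$adjr2)  # model size maximizing adjusted R^2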

4 Subset Selection For logistic regression we can use the deviance in place of RSS. The deviance is -2 times the log-likelihood of the model; the smaller the better. The main drawback is the number of models that must be examined: for p = 20 it is over one million. For least-squares models there are some shortcuts that avoid fitting every possible model, but the search still becomes difficult for large p. Stepwise selection is computationally more efficient.

5 Stepwise Selection: forward selection Forward stepwise selection starts with no predictors and adds them one at a time. (1) Let M_0 be the null model with no predictors. (2) For k = 0, ..., p-1: consider all p-k models that add one predictor to M_k, and choose the best one (M_{k+1}) based on the smallest RSS or largest R^2. (3) Select the single best model among M_0, ..., M_p using cross-validation (test MSE), C_p, AIC, BIC, or adjusted R^2. As before, all the models compared at step (2) have the same number of predictors, so using RSS or R^2 is fine there.

6 Stepwise Selection: forward selection The total number of models fitted is now only 1 + p(p+1)/2, so when p = 20 we fit 211 models, not one million! We are not guaranteed to find the best model: if p = 3, the best single-variable model might use X_1, while the best two-variable model uses X_2 and X_3, which forward selection will miss. Although we can start the forward selection algorithm even when p > n, we can only go up to M_{n-1}.
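
A sketch of forward stepwise selection with the same leaps/Credit assumptions as above; only the method argument changes:

library(leaps)
fwd.fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11, method = "forward")
summary(fwd.fit)$bic  # compare model sizes on an adjusted criterion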

7 Stepwise Selection: backward selection Algorithm: (1) Let M_p be the full model with all p predictors. (2) For k = p, p-1, ..., 1: fit all k models that contain all but one of the predictors in M_k, and choose the best one (M_{k-1}) based on the smallest RSS or largest R^2. (3) Select the single best model among M_0, ..., M_p using cross-validation (test MSE), C_p, AIC, BIC, or adjusted R^2. The same number of models is fit as with forward selection; however, we must have n > p to fit the full model.

8 Choosing the Optimal Model We know the training MSE is an underestimate of the test MSE. There are two different approaches: (1) adjust the training error to correct for this bias, or (2) directly estimate the test error with a validation set or cross-validation.

9 C_p, AIC, BIC, and Adjusted R^2 Mallow's C_p = (RSS + 2dσ̂²)/n, where d is the number of predictors and σ̂² estimates the error variance. For least squares, AIC = (RSS + 2dσ̂²)/(nσ̂²), so AIC and C_p are proportional to each other. BIC = (RSS + log(n)dσ̂²)/n; since log(n) > 2 whenever n > 7, the BIC penalty is heavier, making BIC more conservative than C_p and AIC. Adjusted R^2 = 1 - [RSS/(n-d-1)] / [TSS/(n-1)]; unlike R^2, the adjusted R^2 will not always increase with d. Except for the adjusted R^2, these measures have a strong theoretical basis.
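
These criteria are easy to compute by hand for a least-squares fit. A small sketch (the function name and the use of the full-model error variance for σ̂² are illustrative assumptions):

model.criteria <- function(fit, sigma2) {
  n   <- length(residuals(fit))
  d   <- length(coef(fit)) - 1                      # predictors, excluding the intercept
  rss <- sum(residuals(fit)^2)
  y   <- model.response(model.frame(fit))
  tss <- sum((y - mean(y))^2)
  c(Cp    = (rss + 2 * d * sigma2) / n,
    BIC   = (rss + log(n) * d * sigma2) / n,
    adjR2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1)))
}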

10 C_p, AIC, BIC, and Adjusted R^2 The best model is at the minimum of C_p and BIC (and AIC) and at the maximum of the adjusted R^2. For the Credit data, BIC indicates an optimum with fewer predictors than C_p.

11 Validation Set and Cross-Validation For the same Credit data, the validation set and cross-validation happen to give the same optimum. James et al. propose the one-standard-error rule: calculate the standard error of the estimated test MSE for each model size, and after identifying the minimum, choose the smallest model whose estimated test MSE is within one standard error of that minimum.

12 Shrinkage Methods: Ridge Regression Ridge regression minimizes RSS + λ Σ_j β_j². Here λ ≥ 0 is called the tuning parameter and λ Σ_j β_j² is called the shrinkage penalty. When λ = 0, the ridge estimates are just the ordinary least squares estimates. As λ grows the penalty grows and the ridge estimates approach 0. For each λ there is a different set of regression coefficients, β̂_λ^R. The penalty does not include the intercept, β_0. James et al. do not discuss this directly, but when p > n ordinary least squares (λ = 0) has no unique solution, whereas adding the ridge penalty with λ > 0 restores a unique minimizer.
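
A sketch of ridge regression in R with the glmnet package (alpha = 0 selects the ridge penalty); the Credit data and the lambda grid are assumptions for illustration:

library(ISLR)
library(glmnet)
x <- model.matrix(Balance ~ ., Credit)[, -1]  # predictor matrix without the intercept column
y <- Credit$Balance
grid <- 10^seq(4, -2, length = 100)           # lambda values from heavy shrinkage to almost none
ridge.fit <- glmnet(x, y, alpha = 0, lambda = grid)
coef(ridge.fit, s = 1)                        # coefficients at lambda = 1; none are exactly zero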

13 Shrinkage Methods: Ridge Regression The l2 norm of β, written ||β||_2, equals sqrt(Σ_j β_j²). The x-axis of the coefficient plot, ||β̂_λ^R||_2 / ||β̂||_2, can be thought of as a measure of the relative amount of shrinkage: the shrinkage decreases as you move to the right, until the ratio reaches 1, which corresponds to no shrinkage.

14 Why does ridge regression work? Using simulated data with n = 50 and p = 45, the test MSE (purple line), squared bias (black), and variance (green) of the ridge estimator are shown as functions of λ. The least-squares estimates have a very large variance, which the ridge estimator reduces substantially. Ridge regression does not eliminate predictors; at best they are assigned very small coefficients.

15 The Lasso The lasso can set some coefficients exactly to 0 and thus effectively performs variable selection. The penalty uses an l1 norm instead of an l2 norm: the lasso coefficients minimize RSS + λ Σ_j |β_j|. As with the ridge estimates, as λ gets larger the coefficients shrink towards 0, but now some may equal exactly 0; thus we say the lasso yields sparse models. By convex duality it can be shown that when p > n there can be at most n non-zero lasso coefficients (see Rosset & Zhu, Piecewise linear regularized solution paths, Ann. Stat. 35).
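
A sketch of the lasso with glmnet (alpha = 1), reusing the x and y defined in the ridge sketch above; cv.glmnet picks lambda by 10-fold cross-validation, and lambda.1se implements the one-standard-error rule mentioned earlier:

library(glmnet)
cv.lasso <- cv.glmnet(x, y, alpha = 1)
coef(cv.lasso, s = "lambda.min")                 # sparse: some coefficients are exactly zero
sum(coef(cv.lasso, s = "lambda.1se") != 0) - 1   # number of predictors kept under the 1-SE rule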

16 The Lasso Credit data. The number of predictors in the final model is a function of λ. In the right-hand figure, as you move to the right, Rating is the first variable to enter the model, followed by Student and Limit.

17 The Lasso An alternative way to write the lasso and ridge problems is as constrained optimizations: lasso: minimize RSS subject to Σ_j |β_j| ≤ s; ridge: minimize RSS subject to Σ_j β_j² ≤ s. For every value of λ there is a corresponding value of s.

18 The Lasso The regions demarcated by s for the lasso (left) and ridge (right) estimators are where the solutions must lie. β̂ is the unconstrained least squares estimate. The ellipses are contours of constant RSS, and the RSS grows as you move away from β̂. The lasso solution often lands on a vertex of its constraint region, which sets one or more coefficients to 0.

19 The Lasso This simulation again has p = 45 and n = 50, but now only 2 of the predictors are related to the response. On the right, the properties of the lasso (solid) and ridge (dashed) estimators are compared.

20 Lasso and Ridge Soft thresholding Consider a simple model with no intercept, n = p, and X equal to the identity matrix. Then the ridge solution is β̂_j = y_j/(1+λ), while the lasso solution is β̂_j = y_j - λ/2 if y_j > λ/2; β̂_j = y_j + λ/2 if y_j < -λ/2; and β̂_j = 0 if |y_j| ≤ λ/2 (soft thresholding).
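
These two solutions are easy to code directly; a sketch of the n = p, X = I special case (the function names are illustrative):

ridge.soln <- function(y, lambda) y / (1 + lambda)                       # uniform shrinkage
lasso.soln <- function(y, lambda) sign(y) * pmax(abs(y) - lambda/2, 0)   # soft thresholding
y <- c(-3, -0.2, 0.1, 2)
rbind(ridge = ridge.soln(y, 1), lasso = lasso.soln(y, 1))  # the lasso zeroes the two small entries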

21 Choosing λ Using leave-one-out cross-validation, ridge regression was applied to the Credit data. The optimal λ is small and yields only a modest reduction in the MSE and in the magnitude of the coefficients; perhaps the original least squares estimates are not that bad here.

22 Choosing λ The lasso applied to the simulated data with p = 45 but only two predictors that affect the outcome. Now the optimal λ results in two non-zero coefficients, and they are the two that affect the outcome.

23 Principal Components Suppose we have a data matrix X of n independent samples and p features, with sample covariance matrix S. Principal components analysis finds p linear combinations of the features that are orthogonal (uncorrelated) with each other and ranked by variance, so the first principal component has the largest variance. If the features differ dramatically in scale it is best to center and scale the raw data, i.e. convert to z-scores. The principal components of the centered and scaled data will differ from those of the unscaled data; there is no simple transformation between them. The first principal component is Y_1 = a_1'X = a_11 X_1 + a_12 X_2 + ... + a_1p X_p, with Var(Y_1) = a_1' S a_1.

24 Principal Components We want to find a_1 such that Var(Y_1) is the largest among all normalized linear combinations satisfying a_1'a_1 = 1. Because of the constraint, finding the maximum is a little more involved but can be done with a Lagrange multiplier. Skipping the details, the Lagrange conditions give p simultaneous equations, (S - l_1 I) a_1 = 0, where l_1 is the Lagrange multiplier. The only way for this system to have a non-trivial solution is if det(S - l_1 I) = 0. This means that l_1 is a characteristic root (eigenvalue) of S and a_1 is its associated characteristic vector (eigenvector).

25 Principal Components If we pre-multiply (S - l_1 I) a_1 = 0 by a_1' we get l_1 = a_1' S a_1 = Var(Y_1). Since the first principal component should have the largest variance, l_1 is the largest of the p eigenvalues. The second principal component satisfies a_2'a_2 = 1 and a_2'a_1 = 0; its variance is the second largest eigenvalue of S, and so on. It is also the case that l_1 + l_2 + ... + l_p = tr(S), the total variance, so dividing the variance of each principal component by tr(S) gives the proportion of the total variance it explains.
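
In R this eigen-decomposition is handled by prcomp; a sketch assuming a numeric data matrix X (a hypothetical name here):

pc <- prcomp(X, center = TRUE, scale. = TRUE)
pc$rotation[, 1]              # a_1, the first eigenvector (loadings)
pc$sdev^2                     # eigenvalues l_1 >= ... >= l_p, the component variances
pc$sdev^2 / sum(pc$sdev^2)    # proportion of the total variance explained by each component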

26 Supervised Principal Components For details see chapter 18 of The Elements of Statistical Learning or Bair et al., JASA 101:119. This technique is designed for the p > n case. We do not want to use all the features, only those that are correlated with the outcome, hence the supervision. The technique was originally designed for survival data, but it can also be used with ordinary regression problems. The method can be run with the superpc package written by Bair and Tibshirani; the package website has a reasonably good tutorial. Note that the superpc.listfeatures command shown there is incorrect or outdated; see the help pages instead.

27 Supervised Principal Components Algorithm 1. Compute the standardized univariate regression coefficient of the outcome on each feature (standardized using v_j, the jth diagonal element of (X^T X)^{-1}). 2. For each value of the threshold θ from a list θ_1 < ... < θ_K: (a) form a reduced data matrix consisting of only those features whose standardized univariate coefficient exceeds θ in absolute value, and compute the first m principal components of this matrix; (b) use these principal components in a regression model to predict the outcome. 3. Pick θ (and m) by cross-validation.
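
A hedged sketch of this workflow with the superpc package (the list layout of the data object, with features in the rows of x, follows the package documentation; the object names x.train and y.train are assumptions):

library(superpc)
train <- list(x = t(x.train), y = y.train, featurenames = colnames(x.train))
fit <- superpc.train(train, type = "regression")   # step 1: univariate coefficients
cv.fit <- superpc.cv(fit, train)                   # step 3: CV over a grid of thresholds
superpc.plotcv(cv.fit)                             # choose the threshold from this plot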

28 Example: Supervised Principal Components We use a simulated pooled genomic allele frequency database. In this database, loci 1-30 have some effect on a phenotype; a second block of loci shows the same level of allele frequency differentiation as loci 1-30 but has NO effect on the phenotype; and the remaining loci show random variation between populations and also do not affect the phenotype. There are 40 populations in total (so n = 40 and p = 2000). Before the analysis we pre-filter loci by testing for allele frequency differences at a false discovery rate of 5%; with this database the pre-filtering reduced p to 43.

29 Example: Supervised Principal Components In this simulated database the allele frequency variation among populations is shown in panel (a); these are the mean allele frequencies. The database shows binomial sampling variation about these means.

30 Example: Supervised Principal Components The data were randomly divided into a training set of 32 populations and a test set of 8 populations. After training, the threshold value 6.49 was chosen from the cross-validation curve below; only the first principal component is shown.

31 Example: Supervised Principal Components We can test the significance of the first three principal components with
> superpc.fit.to.outcome(sim401.train, data.test, sim401.fit$v.pred)
which fits lm(formula = data.test$y ~ ., data = temp.list) on the test set. In the resulting coefficient table the intercept and the first principal-component score are highly significant (p-values on the order of 1e-09 and 1e-07), the second score is significant at the 0.05 level, and the third score is not significant. The overall fit has 4 residual degrees of freedom, with F-statistic 1849 on 3 and 4 DF and p-value 9.736e-07.

32 Example: Supervised Principal Components Results The output lists the importance score and raw score for each feature retained by the supervised principal components fit. The input list included all of these plus features 39, 453, and 1560. Thus supervised principal components was only able to eliminate three loci, and it retained almost all of the non-causative loci. These results were based on only the first principal component.

33 FLAM Applying FLAM to the same artificial database yields the following sparse list (features retained by the 50% criterion, with the frequency, out of 100 runs, in which each was selected):
Feature    Frequency/100
feature1   73
feature2   100
feature4   98
feature6   90
feature7   89
feature8   85
feature11  95
feature12  71
feature14  98
feature16  80
feature17  70
feature21  90
feature22  79
feature26  94
feature27  72
feature28  92
feature30  97
feature32  76

34 Partial Least Squares This technique can be used for dimension reduction like principal components regression. Up to p new directions are created which are linear functions of the original features. Unlike principal components, the new directions are based on both X and y, not just X. In principal components we choose each direction to maximize variance (the first direction has the largest variance, the second the next largest, and so on); partial least squares instead chooses directions that have both high variance and high correlation with the outcome y. Partial least squares software can be found in the pls R package; the detailed algorithm is on page 81 of The Elements of Statistical Learning.
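
A sketch of partial least squares with the pls package, again assuming the Credit data for illustration; validation = "CV" produces the cross-validated prediction error used to pick the number of directions:

library(pls)
pls.fit <- plsr(Balance ~ ., data = Credit, scale = TRUE, validation = "CV")
validationplot(pls.fit, val.type = "MSEP")   # choose the number of PLS directions
summary(pls.fit)                             # variance explained in X and in the outcome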

35 FLAM Fused Lasso Additive Model Assume n observations. For one feature j, let the ordered values be x_j(1), ..., x_j(n). The regression model fits a piecewise-constant function of each feature, E(y_i | x_i) = θ_0 + Σ_j θ_j,i, and the difference matrix D inside the l1 norm penalty ||D θ_j||_1 encourages adjacent (sorted) parameters to be equal, e.g. θ_j,(i+1) - θ_j,(i) = 0. Knots (jumps) will only appear if they reduce the sum of squares.

36 FLAM The full minimization problem also includes a group-lasso style penalty (an l2 norm, not squared) on each θ_j, which encourages discarding whole features. P_j is a permutation matrix which orders the x_j from least to greatest. The tuning parameters λ and α have to be chosen from a grid. Luckily there is a finite value of λ above which the fit is completely sparse, and α ranges from 0 to 1. If the outcomes y vary widely in magnitude, consider a transformation such as log(y); since the test MSE determines the model parameters, very small y values may otherwise have only a minor effect on the final model.

37 FLAM: Example One simulated database with 40 populations; the underlying patterns are shown in panel (c). FLAM is in the flam package. After initial filtering, FLAM was run 100 times on permuted genetic databases and a sparse list of features was formed using the 50% rule. FLAM was then run on just those features after the best α was determined. The final call was
best.flam <- flamCV(sparse.gen, pheno.data, alpha = 0.4, n.fold = 5, seed = 1, method = "BCD")

38 FLAM: Example
> summary(best.flam)
Call: flamCV(x = sparse.gen, y = pheno.data, alpha = 0.4, method = "BCD", n.fold = 5, seed = 1)
FLAM was fit using the tuning parameters lambda and alpha = 0.4. Cross-validation with K = 5 folds was used to choose lambda, which was taken to be the largest value with CV error within one standard error of the minimum CV error. The chosen lambda corresponds to 7 predictors having non-sparse fits; the summary also lists which predictors these are. The CV curve can be displayed with plot(best.flam).

39 FLAM: Example
plot(best.flam$flam.out, best.flam$index.cv)
best.predict <- cbind(pheno.data, best.flam$flam.out$y.hat.mat[best.flam$index.cv, ])
best.predict <- as.data.frame(best.predict)
colnames(best.predict) <- c("observed", "predicted")
library(ggplot2)
ggplot(best.predict, aes(observed, predicted)) + geom_point() + ylab("Predicted Phenotype") + xlab("Observed Phenotype") + geom_abline()
In the plot of the fitted functions, curve 5 = gene 7, 6 = gene 8, 7 = gene 10, 8 = gene 13, and 14 = gene 27.

40 Feature Assessment and Multiple Testing Problem: determine whether there are significant differences in the mean feature value between two groups. If p >> n this involves many hypothesis tests. If the type-I error rate (the chance of rejecting the null hypothesis when it is true) is 5%, then we expect many type-I errors when p is very large. The family-wise error rate (FWER) controls the type-I error over a collection of hypothesis tests: if we do a total of M independent tests each with type-I error rate α, the chance that any of the M tests produces a type-I error is FWER = 1 - (1 - α)^M. If there is positive dependence between the tests, the FWER will be smaller.
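
A two-line illustration of how quickly the FWER grows, and the corresponding Bonferroni correction (alpha and M are arbitrary example values):

alpha <- 0.05; M <- 1000
1 - (1 - alpha)^M   # FWER for M independent tests: essentially 1
alpha / M           # Bonferroni per-test threshold that keeps the FWER at or below alpha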

41 Feature Assessment and Multiple Testing To test each feature j for a significant difference, first calculate a two-sample t-statistic t_j = (x̄_2j - x̄_1j)/se_j, where x̄_lj = Σ_{i in C_l} x_ij / N_l and C_l is the set of indices of group l, with sample size N_l. The standard error is se_j = σ̂_j sqrt(1/N_1 + 1/N_2), with pooled variance σ̂_j² = [Σ_{i in C_1}(x_ij - x̄_1j)² + Σ_{i in C_2}(x_ij - x̄_2j)²]/(N_1 + N_2 - 2). We can approximate the null distribution of t_j with a t-distribution or construct a permutation distribution.

42 Permutation Distribution Here we permute the group labels of the samples many times and recompute the t-statistics for each permutation. In theory we could enumerate all possible permutations: the number of distinct ways to assign labels to group 1 is K = (N_1 + N_2 choose N_1). For permutation k the t-statistic for feature j is t_j^k, and the p-value for feature j is p_j = (1/K) Σ_k I(|t_j^k| ≥ |t_j|). If the features are very similar, the sum can be pooled over all features to give a better averaged null distribution. The Bonferroni method gives a FWER of at most α by simply dividing the individual error rate α by the number of tests; however, this can be overly conservative for large numbers of tests.
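
A sketch of the permutation p-value for a single feature (an illustrative function; t.test is used here in place of the pooled-variance statistic above, for brevity):

perm.pvalue <- function(x, group, K = 1000) {
  t.obs  <- t.test(x ~ group)$statistic                        # observed statistic
  t.perm <- replicate(K, t.test(x ~ sample(group))$statistic)  # statistics under permuted labels
  mean(abs(t.perm) >= abs(t.obs))                              # two-sided permutation p-value
}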

43 False Discovery Rate A second approach is to control the expected fraction of false significance calls. If R is the number of rejected hypotheses and V the number of those that are false rejections, the false discovery rate is FDR = E(V/R).

44 Benjamini and Hochberg Method See Benjamini and Hochberg, 1995, J. Royal Stat. Soc. Series B 57. Algorithm: 1. Fix the false discovery rate at α and let p_(1) ≤ p_(2) ≤ ... ≤ p_(M) denote the ordered p-values. 2. Define L = max{ j : p_(j) < α j/M }. 3. Reject all hypotheses for which p_j ≤ p_(L), the BH rejection threshold.
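
In R the BH procedure is available directly through p.adjust; a sketch, where pvals is a hypothetical vector of the M raw p-values:

alpha <- 0.15
rejected <- which(p.adjust(pvals, method = "BH") <= alpha)  # hypotheses rejected at FDR level alpha
length(rejected)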

45 Benjamini and Hochberg Method For the microarray example, the Bonferroni threshold with α = 0.15 is an order of magnitude smaller than the BH rejection threshold.

46 Plug-in estimate of false discovery rate Algorithm: 1. Create K permutations of the data, producing t-statistics t_j^k for features j = 1, ..., M and permutations k = 1, ..., K. 2. For a range of values of the cut-point C, let R = Σ_j I(|t_j| > C) and EV = (1/K) Σ_k Σ_j I(|t_j^k| > C). 3. Estimate the FDR by FDR-hat = EV/R. For the microarray data, using the cut-point corresponding to the BH threshold gives R_obs = 11 and EV = 1.518, so FDR-hat ≈ 0.14. The plug-in method rejects a greater number of hypotheses while controlling the same error rate, which leads to greater power (Storey, 2002, J. Roy. Stat. Soc. B 64:479).
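
A sketch of the plug-in estimate, assuming t.obs is the vector of M observed statistics and t.perm an M x K matrix of permuted statistics (both hypothetical names):

plugin.fdr <- function(t.obs, t.perm, C) {
  R  <- sum(abs(t.obs) >= C)              # observed number of rejections at cut-point C
  EV <- mean(colSums(abs(t.perm) >= C))   # average number of "rejections" across permutations
  EV / R                                  # plug-in FDR estimate
}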
