Multiple Linear Regression: Global tests and Multiple Testing

Similar documents
Simulating power in practice

Simulation and resampling analysis in R

Model Selection and Inference

Statistics Lab #7 ANOVA Part 2 & ANCOVA

Section 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business

Analysis of variance - ANOVA

A Knitr Demo. Charles J. Geyer. February 8, 2017

Stat 411/511 MULTIPLE COMPARISONS. Charlotte Wickham. stat511.cwick.co.nz. Nov

Chapter 6: Linear Model Selection and Regularization

Salary 9 mo : 9 month salary for faculty member for 2004

Introduction to Statistical Analyses in SAS

ITSx: Policy Analysis Using Interrupted Time Series

Regression on the trees data with R

Generalized Additive Models

Section 2.2: Covariance, Correlation, and Least Squares

ST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false.

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:

Yelp Star Rating System Reviewed: Are Star Ratings inline with textual reviews?

The theory of the linear model 41. Theorem 2.5. Under the strong assumptions A3 and A5 and the hypothesis that

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening

Factorial ANOVA with SAS

MS&E 226: Small Data

Regression Analysis and Linear Regression Models

36-402/608 HW #1 Solutions 1/21/2010

Non-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel

Lab #9: ANOVA and TUKEY tests

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture IV: Quantitative Comparison of Models

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

Additional Issues: Random effects diagnostics, multiple comparisons

Week 5: Multiple Linear Regression II

Robust Linear Regression (Passing- Bablok Median-Slope)

Introduction to Data Visualization

610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison

Section 2.3: Simple Linear Regression: Predictions and Inference

The Statistical Sleuth in R: Chapter 10

R-Square Coeff Var Root MSE y Mean

E-Campus Inferential Statistics - Part 2

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

Introduction to hypothesis testing

9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10

Stat 5303 (Oehlert): Response Surfaces 1

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business

Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC.

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

Correctly Compute Complex Samples Statistics

Centering and Interactions: The Training Data

Orange Juice data. Emanuele Taufer. 4/12/2018 Orange Juice data (1)

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)

Contents Cont Hypothesis testing

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY

Linear Models in Medical Imaging. John Kornak MI square February 22, 2011

Recall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation:

Logical operators: R provides an extensive list of logical operators. These include

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

Splines. Patrick Breheny. November 20. Introduction Regression splines (parametric) Smoothing splines (nonparametric)

Week 4: Simple Linear Regression III

Section 2.1: Intro to Simple Linear Regression & Least Squares

Bootstrapped and Means Trimmed One Way ANOVA and Multiple Comparisons in R

22s:152 Applied Linear Regression DeCook Fall 2011 Lab 3 Monday October 3

PSY 9556B (Feb 5) Latent Growth Modeling

Multiple Linear Regression

More data analysis examples

Multiple Regression White paper

BayesFactor Examples

Week 4: Simple Linear Regression II

Correction for multiple comparisons. Cyril Pernet, PhD SBIRC/SINAPSE University of Edinburgh

Factorial ANOVA. Skipping... Page 1 of 18

Experiments in Imputation

IQR = number. summary: largest. = 2. Upper half: Q3 =

Linear Models in Medical Imaging. John Kornak MI square February 19, 2013

Subset Selection in Multiple Regression

SPSS INSTRUCTION CHAPTER 9

Comparison of Means: The Analysis of Variance: ANOVA

Bernt Arne Ødegaard. 15 November 2018

Correctly Compute Complex Samples Statistics

Controlling for multiple comparisons in imaging analysis. Wednesday, Lecture 2 Jeanette Mumford University of Wisconsin - Madison

Goals of the Lecture. SOC6078 Advanced Statistics: 9. Generalized Additive Models. Limitations of the Multiple Nonparametric Models (2)

Lab 07: Multiple Linear Regression: Variable Selection

Lecture on Modeling Tools for Clustering & Regression

A. Using the data provided above, calculate the sampling variance and standard error for S for each week s data.

Litter Sizes in Two Strains of Inbred Mice

Compare Linear Regression Lines for the HP-67

LECTURE 12: LINEAR MODEL SELECTION PT. 3. October 23, 2017 SDS 293: Machine Learning

Introduction to Mixed Models: Multivariate Regression

Practice in R. 1 Sivan s practice. 2 Hetroskadasticity. January 28, (pdf version)

Bivariate (Simple) Regression Analysis

Poisson Regression and Model Checking

22s:152 Applied Linear Regression

Practical 4: Mixed effect models

Set up of the data is similar to the Randomized Block Design situation. A. Chang 1. 1) Setting up the data sheet

SAS data statements and data: /*Factor A: angle Factor B: geometry Factor C: speed*/

CSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM

Analyzing traffic source impact on returning visitors ratio in information provider website

STATS PAD USER MANUAL

Non-linear Modelling Solutions to Exercises

BIOSTAT640 R Lab1 for Spring 2016

Enter your UID and password. Make sure you have popups allowed for this site.

A Basic Example of ANOVA in JAGS Joel S Steele

Hypermarket Retail Analysis Customer Buying Behavior. Reachout Analytics Client Sample Report

Transcription:

Multiple Linear Regression: Global tests and Multiple Testing Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en US

Today s Lecture Multiple testing - preserving your Type I error rate.

Inference about multiple coefficients Our model contains multiple parameters; often we want ask a question about multiple coefficients simultaneously. I.e. are any of these k coefficients significantly different from 0? This is equivalent to performing multiple tests (or looking at confidence ellipses): H 01 : β 1 = 0 H 02 : β 2 = 0. =. H 0k : β k = 0 where each test has a size of α For any individual test, P(reject H 0i H 0i ) = α

Inference about multiple coefficients For any individual test, P(reject H 0i H 0i ) = α. But it DOES NOT FOLLOW that P(reject at least one H 0i all H 0i are true) = α. This is called the Family-wise error rate (FWER). Ignore it at your own peril!

Family-wise error rate To calculate the FWER First note P(no rejections all H 0i are true) = (1 α) k It follows that FWER = P(at least one rejection all H 0i are true) = 1 (1 α) k

Family-wise error rate FWER = 1 (1 α) k alpha <-.05 k <- 1:100 FWER <- 1-(1-alpha)^k qplot(k, FWER, geom="line") + geom_hline(yintercept = 1, lty=2) 1.00 0.75 FWER 0.50 0.25 0 25 50 75 100 k

Addressing multiple comparisons Three general approaches Do nothing in a reasonable way Don t trust scientifically implausible results Don t over-emphasize isolated findings Correct for multiple comparisons Often, use the Bonferroni correction and use αi = α/k for each test Thanks to the Bonferroni inequality, this gives an overall FWER α Use a global test

Global tests Compare a smaller null model to a larger alternative model Smaller model must be nested in the larger model That is, the smaller model must be a special case of the larger model For both models, the RSS gives a general idea about how well the model is fitting In particular, something like RSS S RSS L RSS L compares the relative RSS of the models

Nested models These models are nested: Smaller = Regression of Y on X 1 Larger = Regression of Y on X 1, X 2, X 3, X 4 These models are not: Smaller = Regression of Y on X 2 Larger = Regression of Y on X 1, X 3

Global F tests Compute the test statistic F obs = (RSS S RSS L )/(df S df L ) RSS L /df L If H 0 (the null model) is true, then F obs F dfs df L,df L Note df s = n p S 1 and df L = n p L 1 We reject the null hypothesis if the p-value is above α, where p-value = P(F dfs df L,df L > F obs )

Global F tests There are a couple of important special cases for the F test The null model contains the intercept only When people say ANOVA, this is often what they mean (although all F tests are based on an analysis of variance) The null model and the alternative model differ only by one term Gives a way of testing for a single coefficient Turns out to be equivalent to a two-sided t-test: t 2 dfl F 1,dfL

Lung data: multiple coefficients simultaneously You can test multiple coefficients simultaneously using the F test mlr_null <- lm(disease ~ nutrition, data=dat) mlr1 <- lm(disease ~ nutrition+ airqual + crowding + smoking, data=dat) anova(mlr_null, mlr1) ## Analysis of Variance Table ## ## Model 1: disease ~ nutrition ## Model 2: disease ~ nutrition + airqual + crowding + smoking ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 97 9192.7 ## 2 94 1248.0 3 7944.7 199.47 < 2.2e-16 *** ## --- ## Signif. codes: ## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 This test shows that airqual, crowding, and smoking together significantly improve the fit of our model (assuming model diagnostics look good). Further analyses may be warranted to determine which, if any, coefficients are not different from 0.

Lung data: single coefficient test The F test is equivalent to the t test when there s only one parameter of interest mlr_null <- lm(disease ~ nutrition, data=dat) mlr1 <- lm(disease ~ nutrition + airqual, data=dat) anova(mlr_null, mlr1) ## Analysis of Variance Table ## ## Model 1: disease ~ nutrition ## Model 2: disease ~ nutrition + airqual ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 97 9192.7 ## 2 96 5969.5 1 3223.1 51.833 1.347e-10 *** ## --- ## Signif. codes: ## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 summary(mlr1)$coef ## Estimate Std. Error t value Pr(> t ) ## (Intercept) 37.62538251 2.43946243 15.423637 9.946294e-28 ## nutrition -0.03469855 0.01692446-2.050202 4.307101e-02 ## airqual 0.36114435 0.05016218 7.199535 1.346935e-10

Today s Big Ideas F tests can control for multiple comparisons! hands-on example