Week 5: Multiple Linear Regression II

Marcelo Coca Perraillon
University of Colorado Anschutz Medical Campus
Health Services Research Methods I, HSMP 7607
2017

Outline

- Adjusted R^2
- More on testing hypotheses in linear models

R^2 versus R^2_a (adjusted)

- We saw before that the goodness of fit of a linear regression can be measured by
  R^2 = SSR/SST = 1 - SSE/SST
- This is equivalent to [cor(ŷ, y)]^2
- We can still use this measure, but when we compare models with different numbers of predictors it is better to take the number of predictors into account
- In the linear model, R^2 will always increase (or at least not decrease) when we add more parameters, regardless of whether they are relevant or not
- The adjusted (for the number of parameters) version is
  R^2_a = 1 - [SSE/(n - p - 1)] / [SST/(n - 1)]
- Note that the more parameters we estimate, the larger p is and the more SSE is penalized
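As a quick check, a minimal Stata sketch (assuming the college GPA data used on these slides are in memory): after regress, the stored scalars e(rss), e(mss), e(N), and e(df_r) hold SSE, SSR, n, and n - p - 1, so both measures can be reproduced by hand.

qui reg colgpa hsgpa male skipped
di "R2   = " 1 - e(rss)/(e(rss) + e(mss))                          // SST = SSR + SSE
di "R2_a = " 1 - (e(rss)/e(df_r)) / ((e(rss) + e(mss))/(e(N) - 1))
di "Stata's own: r2 = " e(r2) ", r2_a = " e(r2_a)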

Example

Stata shows these quantities in the ANOVA table:

. reg colgpa hsgpa male skipped

      Source |       SS           df       MS      Number of obs   =       141
-------------+----------------------------------   F(3, 137)       =     13.30
       Model |  4.37665441         3   1.4588848   Prob > F        =    0.0000
    Residual |   15.029445       137  .109703978   R-squared       =    0.2255
-------------+----------------------------------   Adj R-squared   =    0.2086
       Total |  19.4060994       140  .138614996   Root MSE        =    .33122

. di 1 - .109703978/.138614996
.20857064

- But why does an extra parameter reduce SSE? Note that this implies that the unexplained variance is always reduced, or at least never increased
- This is because SST = SSR + SSE

Example

Add (literally) random noise to the regression:

gen noise = uniform()
qui reg colgpa hsgpa male skipped
est sto m1
reg colgpa hsgpa male skipped noise

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsgpa |   .4710294   .0898855     5.24   0.000     .2932754    .6487834
        male |   .0409904   .0584738     0.70   0.484    -.0746451    .1566259
     skipped |   -.080972   .0265302    -3.05   0.003    -.1334371   -.0285069
       noise |  -.0026149   .0984739    -0.03   0.979     -.197353    .1921232
       _cons |   1.521224   .3209863     4.74   0.000     .8864544    2.155994
------------------------------------------------------------------------------

est sto m2
est table m1 m2, star stats(N r2 r2_a) b(%7.3f)

----------------------------------------
    Variable |    m1          m2
-------------+--------------------------
       hsgpa |  0.471***    0.471***
        male |  0.041       0.041
     skipped | -0.081**    -0.081**
       noise |             -0.003
       _cons |  1.520***    1.521***
-------------+--------------------------
           N |    141         141
          r2 |  0.226       0.226
        r2_a |  0.209       0.203
----------------------------------------
      legend: * p<0.05; ** p<0.01; *** p<0.001

A couple of things to notice

- The parameter for noise is not significant, which makes sense
- None of the other coefficients were affected at all because noise is not correlated with any of them (verify this; see the sketch below)
- The R^2_a went down, which is somewhat reassuring
- R^2 did not change at 3 decimals (the actual numbers are 0.225530 vs 0.225534)
- One more time: remember the context. This is true in linear models. In other models, adding irrelevant variables may make the fit of the model worse
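A minimal way to verify the claim (a sketch, assuming noise was generated as on the previous slide):

* the correlations of noise with each regressor should be close to zero and insignificant
pwcorr noise hsgpa male skipped, sig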

Small digression

What if we add random noise that is correlated with one of the covariates?

gen noise2 = skipped*noise + rnormal(0,5)

             |   noise2  skipped   colgpa    hsgpa     male
-------------+---------------------------------------------
      noise2 |   1.0000
     skipped |   0.2573   1.0000
      colgpa |  -0.1022  -0.2618   1.0000
       hsgpa |  -0.0357  -0.0897   0.4146   1.0000
        male |   0.0422   0.2010  -0.0765  -0.2075   1.0000

qui reg colgpa hsgpa male skipped
est sto m1
qui reg colgpa hsgpa male skipped noise2
est sto m2
est table m1 m2, star stats(N r2 r2_a) b(%7.6f)

------------------------------------------
    Variable |     m1            m2
-------------+----------------------------
       hsgpa |  0.471037***   0.470484***
        male |  0.040885      0.040586
     skipped | -0.080895**   -0.078127**
      noise2 |               -0.002154
       _cons |  1.519816***   1.521428***
-------------+----------------------------
           N |       141           141
          r2 |  0.225530      0.226446
        r2_a |  0.208571      0.203694
------------------------------------------
      legend: * p<0.05; ** p<0.01; *** p<0.001

Hypothesis testing

- Not much has changed with respect to Wald tests, but the degrees of freedom for the t distribution are now different
- For confidence intervals: β̂_j ± t_(n-p-1, α/2) * se(β̂_j)
- We need to take into account that we are now estimating p + 1 parameters. t_(n-p-1, α/2) is still close to 2 and, with large samples, closer to 1.96 (like the z from the standard normal)
- We could do the same simulations we did before because we know that β̂_j is normally distributed
- If we wanted to use simulations for tests or probabilities involving two or more parameters at the same time, we would need to take their covariance into account
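As a quick illustration, a by-hand 95% confidence interval for hsgpa (a sketch, assuming the same regression as before; invttail() returns the upper-tail critical value and e(df_r) equals n - p - 1):

qui reg colgpa hsgpa male skipped
di "critical t: " invttail(e(df_r), 0.025)
di "95% CI for hsgpa: " _b[hsgpa] - invttail(e(df_r), 0.025)*_se[hsgpa] ///
   "  to  " _b[hsgpa] + invttail(e(df_r), 0.025)*_se[hsgpa]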

Simulating from multivariate normals

It used to be a bit of a hassle to do this simulation, but Stata now has a command for it:

qui reg colgpa hsgpa skipped
* Save coefficients and variance-covariance matrix
matrix M = e(b)
matrix V = e(V)
clear            // clear won't delete matrices
matrix list M
matrix list V
* Simulate 10,000 draws from a multivariate normal with mean M and var-covar V
drawnorm b_hsgpa b_skip b_cons, n(10000) cov(V) means(M)
sum

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     b_hsgpa |     10,000    .4604699     .088031   .1144291    .807568
      b_skip |     10,000   -.0771271    .0258411  -.1671696    .017728
      b_cons |     10,000    1.573506    .3040728   .3955161   2.747055

. corr b_hsgpa b_skip b_cons

             |  b_hsgpa   b_skip   b_cons
-------------+----------------------------
     b_hsgpa |   1.0000
      b_skip |   0.0837   1.0000
      b_cons |  -0.9918  -0.1727   1.0000

Simulating from multivariate normals

Each β̂ has a normal marginal distribution too
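A simple visual check (a sketch, assuming the 10,000 draws from the previous slide are still in memory):

hist b_hsgpa, normal      // histogram with a normal density overlaid
kdensity b_skip, normal   // kernel density compared with a normal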

Simulating from multivariate normals

What is the probability that b_hsgpa > 0.4 and b_skip < -0.05?

count if b_hsgpa > 0.4 & b_skip < -0.05
  6,410
di 6410/10000
.641

- Fairly likely
- Remember, we need to take their joint probability into account
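To see the point about joint probabilities, a sketch comparing the joint frequency with the product of the marginal frequencies (here b_hsgpa and b_skip are only weakly correlated, so the two numbers come out close; for strongly correlated draws, such as b_hsgpa and b_cons, they would not):

count if b_hsgpa > 0.4
local p1 = r(N)/10000
count if b_skip < -0.05
local p2 = r(N)/10000
count if b_hsgpa > 0.4 & b_skip < -0.05
di "joint: " r(N)/10000 "   product of marginals: " `p1'*`p2'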

Comparing nested models

- Models are nested if one can be obtained as a special case of the other
- a) y = β0 + β1 x1 + β2 x2 is nested within b) y = β0 + β1 x1 + β2 x2 + β3 x3
- If β3 = 0, then we can obtain a) from b)
- Two models are non-nested when neither can be obtained from the other by restricting parameters; for example, a model that includes a predictor not appearing in b) is NOT nested within b) y = β0 + β1 x1 + β2 x2 + β3 x3
- We often call the smaller model the reduced or restricted model and the larger model the full model
- There is a lot of theory behind the above statements; it's a paradigm for doing statistical tests

Comparing nested models

- The intuition for comparing nested models is fairly simple: we compare their SSEs
- Recall that SSE is the sum of squares of the residuals, which measures the variance not explained by our model
- Comparing SSEs is similar to comparing R^2_a. We are essentially trying to figure out what improvement in R^2_a is good enough (is the improvement due to chance?)
- Define SSE(RM) as the sum of squares of the residuals of the reduced model and SSE(FM) as the sum of squares of the residuals of the full model
- We will use the ratio
  F = {[SSE(RM) - SSE(FM)] / (p + 1 - k)} / {SSE(FM) / (n - p - 1)}

Comparing nested models

  F = {[SSE(RM) - SSE(FM)] / (p + 1 - k)} / {SSE(FM) / (n - p - 1)}

- The above expression is essentially the unexplained variance removed when going from the reduced to the full model, relative to the unexplained variance of the full model
- We divide by the degrees of freedom to take into account the number of parameters estimated. The full model has p + 1 parameters, while the number of parameters in the reduced model is denoted by k
- What is the sign of [SSE(RM) - SSE(FM)]? (It can never be negative: the full model cannot fit worse than a model nested within it)
- The smaller F is, the more convinced we should be that the full model is not that great: we are estimating more parameters but not reducing the unexplained variation
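A small sketch of this formula in Stata (an illustrative choice of models, not from the slides: reduced model with hsgpa only, full model with hsgpa, male, and skipped, so p + 1 = 4, k = 2, and p + 1 - k = 2):

qui reg colgpa hsgpa
scalar sse_rm = e(rss)
qui reg colgpa hsgpa male skipped
scalar sse_fm = e(rss)
scalar df_fm = e(df_r)                       // n - p - 1
di "F = " ((sse_rm - sse_fm)/2) / (sse_fm/df_fm)
test male skipped                            // should produce the same F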

Null hypotheses

- When comparing models, our null hypothesis is that the reduced model is adequate
- The alternative is that the full model is adequate
- By now you can probably guess that the ratio F follows an F distribution with some degrees of freedom
- We reject the null if F >= F(p + 1 - k, n - p - 1; α)
- F(p + 1 - k, n - p - 1; α) is the critical value
- Note that p + 1 - k is just the number of additional parameters in the full model

Digression about χ² and F distributions

set obs 10000
gen ychin = rchi2(2)
gen ychid = rchi2(139)
gen f = (ychin/2)/(ychid/139)   // an F(2,139) variate: ratio of chi-squares, each divided by its df

- See a pattern here? A chi-square (χ²) with df = 139 converges to a normal
- Why is χ² positive?
- The rejection region for F is the right tail, so large values of F lead to rejection; also, F = (Student's t)² (yes, they are all related and all have something to do with the normal distribution)
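A quick check of the simulated ratio against the theoretical distribution (a sketch, assuming the draws above, with each chi-square divided by its degrees of freedom):

_pctile f, p(95)
di "simulated 95th percentile: " r(r1)
di "invFtail(2, 139, .05):     " invFtail(2, 139, .05)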

Test that all parameters are equal to zero

- Reduced model: colgpa = β0
- Full model: colgpa = γ0 + γ1 hsgpa + γ2 skipped
- Recall that the null is that the reduced model is adequate
- Since the reduced model is just the mean of colgpa, SSE = SST
- This test is essentially testing γ1 = γ2 = 0

F test by hand

Stata stores SSE in the returned scalar e(rss):

qui reg colgpa
* ereturn list
scalar sse_r = e(rss)
qui reg colgpa hsgpa skipped
scalar sse_f = e(rss)
di ((sse_r - sse_f)/2)/(sse_f/(141-3))
19.77258
di invFtail(2, 138, 0.05)
3.0617157

reg colgpa hsgpa skipped

      Source |       SS           df       MS      Number of obs   =       141
-------------+----------------------------------   F(2, 138)       =     19.77
       Model |  4.32237812         2  2.16118906   Prob > F        =    0.0000
    Residual |  15.0837213       138  .109302328   R-squared       =    0.2227
-------------+----------------------------------   Adj R-squared   =    0.2115
       Total |  19.4060994       140  .138614996   Root MSE        =    .33061

- It matches the regression output: 19.77
- Note that the critical value is usually around 3, and larger for smaller samples (see the Stata code for this class)

Digression: Be curious

How is the rejection region affected by sample size in an F test?

forvalues i = 10(10)300 {
    di `i' " " invFtail(2, `i', 0.05)
}

10 4.102821
20 3.4928285
30 3.3158295
40 3.231727
50 3.1826099
60 3.1504113
70 3.1276756
80 3.1107662
90 3.097698
100 3.0872959
110 3.0788195
120 3.0717794
130 3.0658391
140 3.0607595
150 3.0563663
160 3.0525291
170 3.0491486
...

Why? Remember what happened to the t critical value as the sample size increased

But we can also use the test command

The test command is quite flexible:

qui reg colgpa hsgpa skipped
test hsgpa skipped

 ( 1)  hsgpa = 0
 ( 2)  skipped = 0

       F(  2,   138) =   19.77
            Prob > F =    0.0000

* Remember, the above is a shortcut for
test _b[hsgpa] = _b[skipped] = 0

 ( 1)  hsgpa - skipped = 0
 ( 2)  hsgpa = 0

       F(  2,   138) =   19.77
            Prob > F =    0.0000

- You can do much more with test, but don't forget the logic of the test
- Terminology and software can be confusing; the above F test can also be called a Wald test; (Student's t)² = F

More

Your textbook has more examples that you can easily do with the test command. They are extensions of the idea of comparing reduced and full models:

test hsgpa = skipped

 ( 1)  hsgpa - skipped = 0

       F(  1,   138) =   36.18
            Prob > F =    0.0000

test hsgpa + skipped = 1

 ( 1)  hsgpa + skipped = 1

       F(  1,   138) =   43.69
            Prob > F =    0.0000

Summary

- The idea of partitioning the variance and using SSE = Σ(y_i - ŷ_i)² as a measure of the variation in y not explained by the model leads to a general method for comparing models
- The models must be nested
- We want our models to be parsimonious ("unwilling to spend money or use resources; stingy or frugal; sparing, restrained")
- We haven't covered the inferential theory, but it all starts with the assumption of normally distributed iid error terms
- Next, we will cover maximum likelihood estimation and show that we can use the likelihood function in a similar way to how we used SSE or SSR
- The advantage of focusing on MLE is that the method applies to many other models, not just linear regression