Applied Statistics and Econometrics Lecture 6

Applied Statistics and Econometrics, Lecture 6. Giuseppe Ragusa, Luiss University. gragusa@luiss.it, http://gragusa.org/. March 6, 2017.

Empirical application. Data: Italian Labour Force Survey, ISTAT (2015Q3). wage: wage of full-time workers; education: years of education.

Wage and Education. Italian Labour Force Survey - ISTAT, 2015. 3 variables, 26127 observations.

RETRIC (wage): n = 26127, missing = 0, distinct = 275, Info = 0.999, Mean = 1307, Gmd = 567.5; quantiles .05/.10/.25/.50/.75/.90/.95 = 500, 680, 1000, 1300, 1550, 1950, 2290; lowest: 250 260 270 280 290; highest: 2960 2970 2980 2990 3000.

EDULEV (education level): n = 26127, missing = 0, distinct = 6.
  No education: 142 (0.005); elementary school: 700 (0.027); middle school: 7510 (0.287); prof. high school diploma: 2289 (0.088); high school diploma: 10530 (0.403); college degree: 4956 (0.190).

SG11 (gender): n = 26127, missing = 0, distinct = 2, Info = 0.748, Mean = 1.473, Gmd = 0.4985.
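
The layout of this summary matches the output of describe() from the Hmisc package. A minimal sketch that would produce a comparable summary, assuming Hmisc is installed and the lfs data frame holds these variables:

library(Hmisc)
# Descriptive summary of wage (RETRIC), education level (EDULEV) and gender (SG11)
describe(lfs[, c("RETRIC", "EDULEV", "SG11")])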

Wage and education: data. We recode education in terms of years of education.

years <- c(0, 5, 8, 11, 13, 18)
cnt <- 1
for (j in levels(lfs$EDULEV)) {
  lfs$educ[lfs$EDULEV == j] <- years[cnt]
  cnt <- cnt + 1
}
table(lfs$educ)
##
##     0     5     8    11    13    18
##   142   700  7510  2289 10530  4956
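
An equivalent, more idiomatic recode uses a named lookup vector rather than a counter. A minimal sketch, assuming the factor levels are spelled exactly as in the summary table above:

# Map each education level to years of schooling
years_map <- c("No education" = 0, "elementary school" = 5, "middle school" = 8,
               "prof. high school diploma" = 11, "high school diploma" = 13,
               "college degree" = 18)
lfs$educ <- unname(years_map[as.character(lfs$EDULEV)])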

Wage and education: data. [Figure: barplots of the number of observations by education level and by years of education (0, 5, 8, 11, 13, 18).]

Regression: wages and education

lm1 <- lm(RETRIC ~ educ, data = lfs)
summary(lm1)
##
## Call:
## lm(formula = RETRIC ~ educ, data = lfs)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1312.6  -312.6   -32.5   252.4  1867.5
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  788.421     10.368    76.0   <2e-16 ***
## educ          43.012      0.822    52.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 497 on 26125 degrees of freedom
## Multiple R-squared: 0.0949, Adjusted R-squared: 0.0949

Regression when X is Binary (Section 5.3). X = 1 if small class size, = 0 if not; X = 1 if female, = 0 if not; etc. Binary regressors are sometimes called dummy variables. So far, β1 has been called a slope, but that doesn't make sense if X is binary. How do we interpret a regression with a binary regressor?

Interpretation when X is binary. Consider Yi = β0 + β1 Xi + ui, where X is binary. Then, E[Yi | Xi = 0] = β0 and E[Yi | Xi = 1] = β0 + β1. Thus, β1 = E[Yi | Xi = 1] − E[Yi | Xi = 0] = the population difference in group means.
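
A quick numerical check of this identity on simulated data (not from the lecture): the in-sample OLS slope on a binary regressor equals the difference in sample means.

set.seed(1)
x <- rbinom(200, 1, 0.5)            # binary regressor
y <- 2 + 3 * x + rnorm(200)         # outcome
coef(lm(y ~ x))[["x"]]              # OLS slope on x
mean(y[x == 1]) - mean(y[x == 0])   # difference in group means: identical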

Example. Let Di = 1 if STRi ≤ 20, and Di = 0 if STRi > 20. The linear model: TestScorei = β0 + β1 Di + ui.

library(ase)
data(CASchools)
CASchools["D"] <- ifelse(CASchools[["str"]] <= 20, 1, 0)
## OLS
lm(testscore ~ D, data = CASchools)
##
## Call:
## lm(formula = testscore ~ D, data = CASchools)
##
## Coefficients:
## (Intercept)            D
##      650.00         7.19

Difference in means / regression

Summary of testscore by D:
  D     n     mean      sd
  0   177   649.999  17.966
  1   243   657.185  19.286
  All 420   654.157  19.053

Ȳsmall − Ȳlarge = 657.185 − 649.999 = 7.186 ≈ 7.19

SE(Ȳsmall − Ȳlarge) = sqrt(s²small/nsmall + s²large/nlarge) = 1.83

summary(lm(testscore ~ D, data = CASchools))

Call:
lm(formula = testscore ~ D, data = CASchools)

Residuals:
   Min     1Q Median     3Q    Max
-50.43 -14.07  -0.28  12.78  49.57

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   650.00       1.41  461.41  < 2e-16 ***
D               7.19       1.85    3.88  0.00012 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19 on 418 degrees of freedom
Multiple R-squared: 0.0348, Adjusted R-squared: 0.0324
F-statistic: 15.1 on 1 and 418 DF,  p-value: 0.000121
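
A minimal sketch reproducing the group statistics and the standard error of the difference by hand, assuming CASchools and D as defined on the previous slide:

# Group sizes, means and standard deviations of testscore by D
n <- tapply(CASchools$testscore, CASchools$D, length)
m <- tapply(CASchools$testscore, CASchools$D, mean)
s <- tapply(CASchools$testscore, CASchools$D, sd)
m[["1"]] - m[["0"]]                                  # difference in means
sqrt(s[["1"]]^2 / n[["1"]] + s[["0"]]^2 / n[["0"]])  # its standard error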

Difference in wages: males / females

## SG11 denotes the gender of the individual
## SG11 is coded as:
##   == 1, male;
##   == 2, female
## `female` is == 1 if female; == 0 otherwise
lfs$female <- ifelse(lfs$SG11 == 2, 1, 0)
lm(RETRIC ~ female, data = lfs)
##
## Call:
## lm(formula = RETRIC ~ female, data = lfs)
##
## Coefficients:
## (Intercept)       female
##        1445         -291

Heteroskedasticity and Homoskedasticity

Heteroskedasticity-robust standard errors (Section 5.4). What are they? Consequences of heteroskedasticity/homoskedasticity, and implications for computing standard errors. What do these two terms mean? If var(u | X = x) is constant, that is, if the variance of the conditional distribution of u given X does not depend on X, then u is said to be homoskedastic. Otherwise, u is heteroskedastic.

Consider wagei = β0 + β1 educi + ui. Homoskedasticity means that the variance of ui does not change with the education level. Of course, we do not know anything about var(ui | educi), but we can use the data to get an idea. We plot the boxplot of wage for each educational category: if ui is homoskedastic, the boxes should be of approximately the same size.
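
A minimal sketch of such a check in base R, assuming the lfs data frame from before:

# One box per education level; roughly equal box heights (interquartile
# ranges) are consistent with homoskedastic errors
boxplot(RETRIC ~ EDULEV, data = lfs,
        xlab = "Education level", ylab = "Net wages")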

Homoskedasticity in a picture. [Figure: boxplots of net wages by education level, from No education up to college degree.]

Homoskedasticity in a table

Table 1: Sample variance of wage per education level

  EDULEV                        VAR     SD
  No education               104210    323
  elementary school          179437    424
  middle school              184808    430
  prof. high school diploma  184084    429
  high school diploma        235692    485
  college degree             401217    633
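
A sketch of how such a table can be computed, again assuming the lfs data frame:

# Sample variance and standard deviation of wages within each education level
aggregate(RETRIC ~ EDULEV, data = lfs,
          FUN = function(x) c(VAR = var(x), SD = sd(x)))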

So far we have (without saying so) assumed that u might be heteroskedastic. Recall the three least squares assumptions:
1. E(u | X = x) = 0;
2. (Xi, Yi), i = 1, ..., n, are i.i.d.;
3. large outliers are rare.
Heteroskedasticity and homoskedasticity concern var(u | X = x). Because we have not explicitly assumed homoskedastic errors, we have implicitly allowed for heteroskedasticity.

Standard Errors. We now have two formulas for standard errors for β̂1: (1) homoskedasticity-only standard errors, which are valid only if the errors are homoskedastic; (2) heteroskedasticity-robust standard errors, which are valid whether or not the errors are heteroskedastic. The main advantage of the homoskedasticity-only standard errors is that the formula is simpler; the disadvantage is that the formula is correct only if the errors are homoskedastic.

Practical implications. The homoskedasticity-only formula for the standard error of β̂1 and the heteroskedasticity-robust formula differ, so in general you get different standard errors from the two formulas. Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g. Excel). To get the general heteroskedasticity-robust standard errors you must override the default. If you don't override the default and there is in fact heteroskedasticity, your standard errors, and with them your t-statistics and confidence intervals, will be wrong; typically, homoskedasticity-only SEs are too small.
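
In R, one widely used way to override the default is via the sandwich and lmtest packages (a sketch; the course's ase package provides summary_rob() for the same purpose, shown below):

library(sandwich)
library(lmtest)
fit <- lm(RETRIC ~ educ, data = lfs)
# Heteroskedasticity-robust (HC1) standard errors for the OLS coefficients
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))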

The bottom line... If the errors are either homoskedastic or heteroskedastic and you use heteroskedasticity-robust standard errors, you are OK. If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, your standard errors will be wrong (the homoskedasticity-only estimator of the variance of β̂1 is inconsistent if there is heteroskedasticity). The two formulas coincide (when n is large) in the special case of homoskedasticity. So, you should always use heteroskedasticity-robust standard errors.

In R, to obtain heteroskedasticity-robust standard errors, use summary_rob().

summary(lm(testscore ~ str, data = CASchools))

Call:
lm(formula = testscore ~ str, data = CASchools)

Residuals:
   Min     1Q Median     3Q    Max
-47.73 -14.25   0.48  12.82  48.54

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   698.93       9.47   73.82  < 2e-16 ***
str            -2.28       0.48   -4.75  2.8e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19 on 418 degrees of freedom
Multiple R-squared: 0.0512, Adjusted R-squared: 0.049
F-statistic: 22.6 on 1 and 418 DF,  p-value: 2.78e-06

## This only works if `ase` has been loaded
summary_rob(lm(testscore ~ str, data = CASchools))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  698.933     10.364   67.44  < 2e-16
str           -2.280      0.519   -4.39  1.1e-05
---
Heteroskedasticity robust standard errors used

Residual standard error: 19 on 418 degrees of freedom
Multiple R-squared: 0.0512, Adjusted R-squared: 0.049
F-statistic: 19.3 on 1 and Inf DF,  p-value: 1.14e-05

Difference in means / regression

Summary of testscore by D:
  D     n    mean     sd
  0   177   650.00  17.97
  1   243   657.18  19.29
  All 420   654.16  19.05

Ȳsmall − Ȳlarge = 657.18 − 650.00 = 7.18

SE(Ȳsmall − Ȳlarge) = sqrt(s²small/nsmall + s²large/nlarge) = 1.83

summary_rob(lm(testscore ~ D, data = CASchools))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   650.00       1.35  481.55  < 2e-16
D               7.19       1.83    3.92  8.7e-05
---
Heteroskedasticity robust standard errors used

Residual standard error: 19 on 418 degrees of freedom
Multiple R-squared: 0.0348, Adjusted R-squared: 0.0324
F-statistic: 15.4 on 1 and Inf DF,  p-value: 8.73e-05

Some Additional Theoretical Foundations of OLS

Some Additional Theoretical Foundations of OLS (Section 5.5). We have already learned a very great deal about OLS: (i) OLS is unbiased and consistent; (ii) we have a formula for heteroskedasticity-robust standard errors; and (iii) we can construct confidence intervals and test statistics. Also, a very good reason to use OLS is that everyone else does; so, by using it, others will understand what you are doing. In effect, OLS is the language of regression analysis, and if you use a different estimator, you will be speaking a different language.

Further questions you may have: Is this really a good reason to use OLS? Aren't there other estimators that might be better, in particular ones that might have a smaller variance? So we will now answer this question, but to do so we will need to make some stronger assumptions than the three least squares assumptions already presented.

The Extended Least Squares Assumptions
1. E(ui | Xi = x) = 0, for all x;
2. (Xi, Yi), i = 1, ..., n, are i.i.d.;
3. large outliers are rare (E(Y⁴) < ∞, E(X⁴) < ∞);
4. ui is homoskedastic;
5. ui is N(0, σ²u).

Efficiency of OLS: The Gauss-Markov Theorem. Gauss-Markov theorem, Part I. Under extended LS assumptions 1-4 (1-3, plus homoskedasticity): OLS has the smallest variance among all linear (conditionally unbiased) estimators of β1.

Efficiency of OLS: The Gauss-Markov Theorem. Gauss-Markov theorem, Part II. Under extended LS assumptions 1-5 (1-3, plus homoskedasticity and normality): OLS has the smallest variance among all consistent estimators. This is a pretty amazing result: it says that, if (in addition to LSA 1-3) the errors are homoskedastic and normally distributed, then OLS is a better choice than any other consistent estimator. And because an estimator that isn't consistent is a poor choice, this says that OLS really is the best you can do if the extended LS assumptions hold.

Some not-so-good things about OLS. The foregoing results are impressive, but these results and the OLS estimator have important limitations. The GM theorem really isn't that compelling:
- The condition of homoskedasticity often doesn't hold (homoskedasticity is special).
- The result is only for linear estimators, which are only a small subset of estimators.
- The strongest optimality result (Part II above) requires homoskedastic normal errors, which is not plausible in applications.
- OLS is more sensitive to outliers than some other estimators.
In virtually all applied regression analysis OLS is used, and that is what we will do in this course too.