Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors (Section 5.4)
What? Consequences of homoskedasticity; implications for computing standard errors.
What do these two terms mean? If var(u|X = x) is constant, that is, if the variance of the conditional distribution of u given X does not depend on X, then u is said to be homoskedastic. Otherwise, u is heteroskedastic.
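To make the distinction concrete, here is a minimal simulation sketch in Stata (the variable names and parameter values are hypothetical, chosen only for illustration):

clear
set obs 500
set seed 12345
generate x = 10*runiform()
* Homoskedastic errors: var(u|X=x) is the same for every x
generate u_homo = rnormal(0, 1)
* Heteroskedastic errors: the conditional standard deviation grows with x
generate u_het = rnormal(0, 0.1 + 0.3*x)
generate y = 1 + 2*x + u_het
* A fan-shaped scatter of y against x is the visual signature of heteroskedasticity
twoway scatter y x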

Homoskedasticity in a picture: E(u|X = x) = 0 (u satisfies Least Squares Assumption #1), and the variance of u does not depend on x.

Heteroskedasticity in a picture: E(u|X = x) = 0 (u satisfies Least Squares Assumption #1), but the variance of u depends on x: u is heteroskedastic.

A real-data example from labor economics: average hourly earnings vs. years of education (data source: Current Population Survey). Heteroskedastic or homoskedastic?

The class size data: heteroskedastic or homoskedastic?

So far we have (without saying so) assumed that u might be heteroskedastic. Recall the three least squares assumptions:
1. E(u|X = x) = 0
2. (X_i, Y_i), i = 1, ..., n, are i.i.d.
3. Large outliers are rare
Heteroskedasticity and homoskedasticity concern var(u|X = x). Because we have not explicitly assumed homoskedastic errors, we have implicitly allowed for heteroskedasticity.

We now have two formulas for standard errors for β̂1:
- Homoskedasticity-only standard errors: these are valid only if the errors are homoskedastic.
- The "usual" standard errors: to differentiate the two, it is conventional to call these heteroskedasticity-robust standard errors, because they are valid whether or not the errors are heteroskedastic.
The main advantage of the homoskedasticity-only standard errors is that the formula is simpler. The disadvantage is that the formula is correct in general only if the errors are homoskedastic.
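For reference, the two variance estimators can be written out explicitly; this is a sketch following the formulas in SW Chapter 5 (exact degrees-of-freedom conventions vary across texts and software). The standard error is the square root of the variance estimator in each case.

\text{Homoskedasticity-only:}\quad
\tilde{\sigma}^2_{\hat{\beta}_1} = \frac{s_{\hat{u}}^2}{\sum_{i=1}^n (X_i - \bar{X})^2},
\qquad s_{\hat{u}}^2 = \frac{1}{n-2}\sum_{i=1}^n \hat{u}_i^2

\text{Heteroskedasticity-robust (Eicker-Huber-White):}\quad
\hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n}\cdot
\frac{\frac{1}{n-2}\sum_{i=1}^n (X_i - \bar{X})^2\,\hat{u}_i^2}
{\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\right]^2}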

Practical implications
The homoskedasticity-only formula for the standard error of β̂1 and the heteroskedasticity-robust formula differ, so in general you get different standard errors using the different formulas. Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g. Excel). To get the general heteroskedasticity-robust standard errors you must override the default. If you don't override the default and there is in fact heteroskedasticity, your standard errors (and hence your t-statistics and confidence intervals) will be wrong: typically, homoskedasticity-only SEs are too small.

Heteroskedasticity-robust standard errors in STATA

regress testscr str, robust

Regression with robust standard errors       Number of obs =     420
                                             F(  1,   418) =   19.26
                                             Prob > F      =  0.0000
                                             R-squared     =  0.0512
                                             Root MSE      =  18.581

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
       _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
------------------------------------------------------------------------------

If you use the ", robust" option, STATA computes heteroskedasticity-robust standard errors. Otherwise, STATA computes homoskedasticity-only standard errors.
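For comparison, omitting the option reproduces the homoskedasticity-only (default) standard errors; the coefficient estimates are identical, and only the reported standard errors, t-statistics, and confidence intervals change:

* Same regression, homoskedasticity-only (default) standard errors
regress testscr str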

The bottom line:
- If the errors are either homoskedastic or heteroskedastic and you use heteroskedasticity-robust standard errors, you are OK.
- If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, your standard errors will be wrong (the homoskedasticity-only estimator of the variance of β̂1 is inconsistent if there is heteroskedasticity).
- The two formulas coincide (when n is large) in the special case of homoskedasticity.
So, you should always use heteroskedasticity-robust standard errors.

Some Additional Theoretical Foundations of OLS (Section 5.5)
We have already learned a great deal about OLS: OLS is unbiased and consistent; we have a formula for heteroskedasticity-robust standard errors; and we can construct confidence intervals and test statistics. Also, a very good reason to use OLS is that everyone else does, so by using it, others will understand what you are doing. In effect, OLS is the language of regression analysis, and if you use a different estimator, you will be speaking a different language.

The Extended Least Squares Assumptions
These consist of the three LS assumptions, plus two more:
1. E(u|X = x) = 0.
2. (X_i, Y_i), i = 1, ..., n, are i.i.d.
3. Large outliers are rare (E(Y^4) < ∞, E(X^4) < ∞).
4. u is homoskedastic.
5. u is distributed N(0, σ²).
Assumptions 4 and 5 are more restrictive, so they apply to fewer cases in practice. However, if you make these assumptions, then certain mathematical calculations simplify and you can prove strong results, results that hold if these additional assumptions are true. We start with a discussion of the efficiency of OLS.

Efficiency of OLS, part I: The Gauss-Markov Theorem
Under extended LS assumptions 1-4 (the basic three, plus homoskedasticity), β̂1 has the smallest variance among all linear estimators (estimators that are linear functions of Y_1, ..., Y_n). This is the Gauss-Markov theorem.
Comments: The GM theorem is proven in SW Appendix 5.2.

Efficiency of OLS, part II:
Under all five extended LS assumptions, including normally distributed errors, β̂1 has the smallest variance of all consistent estimators (linear or nonlinear functions of Y_1, ..., Y_n), as n → ∞. This is a pretty amazing result: it says that, if (in addition to LSA 1-3) the errors are homoskedastic and normally distributed, then OLS is a better choice than any other consistent estimator. And because an estimator that isn't consistent is a poor choice, this says that OLS really is the best you can do, if all five extended LS assumptions hold. (The proof of this result is beyond the scope of this course and isn't in SW; it is typically done in graduate courses.)

Some not-so-good things about OLS
The foregoing results are impressive, but these results and the OLS estimator have important limitations.
1. The GM theorem really isn't that compelling:
- The condition of homoskedasticity often doesn't hold (homoskedasticity is special).
- The result is only for linear estimators, only a small subset of estimators (more on this in a moment).
2. The strongest optimality result ("part II" above) requires homoskedastic normal errors, which is not plausible in applications (think about the hourly earnings data!).

Limitations of OLS, ctd.
3. OLS is more sensitive to outliers than some other estimators. In the case of estimating the population mean, if there are big outliers, then the median is preferred to the mean because the median is less sensitive to outliers: it has a smaller variance than OLS when there are outliers. Similarly, in regression, OLS can be sensitive to outliers, and if there are big outliers other estimators can be more efficient (have a smaller variance). One such estimator is the least absolute deviations (LAD) estimator:

min_{b0, b1} Σ_{i=1}^n |Y_i − (b0 + b1 X_i)|

In virtually all applied regression analysis, OLS is used, and that is what we will do in this course too.
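In Stata, the LAD estimator is available as median regression via qreg, whose default quantile is 0.5, so it minimizes the sum of absolute deviations; a minimal sketch using the class size data:

* LAD (median) regression of test scores on the student-teacher ratio
qreg testscr str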

Summary and Assessment (Section 5.7)
The initial policy question: Suppose new teachers are hired so the student-teacher ratio falls by one student per class. What is the effect of this policy intervention ("treatment") on test scores?
Does our regression analysis answer this convincingly? Not really: districts with low STR tend to be ones with lots of other resources and higher-income families, which provide kids with more learning opportunities outside school. This suggests that corr(u_i, STR_i) < 0, so E(u_i|X_i) ≠ 0. So, we have omitted some factors, or variables, from our analysis, and this has biased our results.