Example 1 of panel data: Data for 6 airlines (groups) over 15 years (time periods)


Panel data set: consists of n entities or subjects (e.g., firms, states), each of which is observed over T time periods (t = 1 through T). Total number of observations: nT. Panel data have a cross-section (entity or subject) variable and a time-series variable.

Fixed group effect model
- examines group differences in intercepts; assumes the same slopes and constant variance across entities or subjects
- a group (individual-specific) effect is time invariant and considered part of the intercept
- u_i is allowed to be correlated with the other regressors
- Estimation methods: least squares dummy variable (LSDV) and within effect estimation, e.g., ordinary least squares (OLS) regression with dummies

Random effect model
- estimates variance components for groups (or times) and error; assumes the same intercept and slopes
- u_i is part of the error term and should not be correlated with any regressor (why? correlation would make the GLS estimator inconsistent)
- the difference among groups (or time periods) lies in the variance of the error term, not in the intercepts
- Estimation method: generalized least squares (GLS) when the variance structure among groups is known

Model selection
- Fixed effects are tested by the (incremental) F test, while random effects are examined by the Lagrange Multiplier (LM) test. If the null hypothesis is not rejected, the pooled OLS regression is favored.
- The Hausman specification test compares the fixed effect and random effect models. If the null hypothesis that the individual effects are uncorrelated with the other regressors is not rejected, the random effect model is preferred to its fixed counterpart.
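The contrast between the two estimators can be sketched numerically. The following simulation is mine, not from the slides (all variable names and data are hypothetical; numpy is assumed): when the group effects are correlated with a regressor, pooled OLS is biased, while the fixed-effect LSDV estimator recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_periods = 6, 15                      # mirrors 6 airlines x 15 years
group = np.repeat(np.arange(n_groups), n_periods)

alpha = np.linspace(-3.0, 3.0, n_groups)         # time-invariant group effects
x = alpha[group] + rng.normal(size=group.size)   # regressor correlated with effects
y = alpha[group] + 0.5 * x + rng.normal(scale=0.1, size=group.size)

def ols(X, y):
    """Least-squares coefficients via numpy's lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Pooled OLS: one common intercept, ignoring the group effects.
b_pooled = ols(np.column_stack([np.ones(group.size), x]), y)

# LSDV: one dummy per group in place of the common intercept.
D = (group[:, None] == np.arange(n_groups)).astype(float)
b_lsdv = ols(np.column_stack([D, x]), y)

slope_pooled, slope_lsdv = b_pooled[1], b_lsdv[-1]
```

Here the LSDV slope lands near the true 0.5, while the pooled slope absorbs the group effects and is pulled well away from it.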

Example 1 of panel data: Data for 6 airlines (groups) over 15 years (time periods)

Example 2 of cross-sectional data: research and development (R&D) expenditure data of the top 50 information technology firms

Example 2 Regression analysis: Regress R&D expenditure in 2002 on 2000 net income and firm type. Dummy variable coding for firm types: d1 is set to 1 for equipment and software firms and 0 for telecommunication and electronics firms; d2 is coded the opposite way. A dummy variable is a binary variable coded as either 1 or 0.

Example 2 Least Squares Dummy Variable Regression: Model 1 without a dummy variable (pooled OLS) vs. Model 2 with a dummy variable

Example 2, Model 1 without a Dummy Variable: Pooled OLS. Constant intercept and slope regardless of firm type: beta0 is the intercept; beta1 is the slope of net income in 2000. The pooled model fits the data at the .05 significance level (F=7.07, p=.0115). R2: this model accounts for 16 percent of the total variance. Intercept 1,482.697 and slope .2231: for a $1 million increase in net income, a firm is expected to increase R&D expenditure by $.2231 million (p<.012). Pooled model: R&D = 1,482.697 + .2231*income. Despite moderate goodness-of-fit statistics such as F and t, this is a naive model: R&D investment tends to vary across industries.
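Plugging the slide's pooled-model estimates into the fitted equation confirms the slope interpretation (the helper name `predict_rd` is mine, for illustration only):

```python
b0, b1 = 1482.697, 0.2231   # pooled-OLS estimates from the slide

def predict_rd(income):
    """Predicted R&D expenditure (in $ millions) under the pooled model."""
    return b0 + b1 * income

# A $1 million increase in net income raises predicted R&D by $.2231 million.
effect = predict_rd(101.0) - predict_rd(100.0)
```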

Example 2, Model 2 with a Dummy Variable. Assume that equipment & software firms have more R&D expenditure than other types of companies. How do we take this group difference into account? delta1: the coefficient on the dummy for equipment, service, and software companies. We have to drop one of the two dummy variables. Why? Because d1 + d2 = 1 for every firm, keeping both dummies along with the intercept creates perfect multicollinearity (the dummy variable trap), so OLS cannot estimate a model with both dummies.
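The dummy variable trap can be seen directly in the rank of the design matrix. A minimal check on hypothetical data (numpy assumed): with both dummies and an intercept, the columns are linearly dependent, so X loses full column rank.

```python
import numpy as np

# Three firms of each type, hypothetical ordering.
d1 = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # equipment & software
d2 = 1.0 - d1                                   # telecom & electronics
const = np.ones(6)

X_both = np.column_stack([const, d1, d2])  # intercept + both dummies
X_one = np.column_stack([const, d1])       # intercept + one dummy

rank_both = np.linalg.matrix_rank(X_both)  # deficient: const = d1 + d2
rank_one = np.linalg.matrix_rank(X_one)    # full column rank
```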

Example 2. This model yields two different regression equations for the two groups (different intercepts; the slope remains unchanged):
d1=1: R&D = 2,140.2050 + .2180*income = 1,133.5790 + 1,006.6260*1 + .2180*income
d1=0: R&D = 1,133.5790 + .2180*income = 1,133.5790 + 1,006.6260*0 + .2180*income
- The slope .2180 indicates a positive impact of two-year-lagged net income on a firm's R&D expenditure.
- Equipment and software firms on average spend $1,007 million (=2,140 - 1,134) more on R&D than telecommunication and electronics companies.
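The two group equations follow mechanically from the single dummy-variable model. A short check using the estimates reported on the slide (the function name `rd_hat` is mine):

```python
b0, delta1, b1 = 1133.5790, 1006.6260, 0.2180   # estimates from the slide

def rd_hat(income, d1):
    """Fitted R&D expenditure given net income and the dummy value."""
    return b0 + delta1 * d1 + b1 * income

intercept_equipment = rd_hat(0.0, 1)  # d1 = 1: equipment & software
intercept_telecom = rd_hat(0.0, 0)    # d1 = 0: telecom & electronics
```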

Visualization of Models 1 and 2 (Example 2)

                                      Model 1       Model 2
Intercept                             1,483
 - Equipment & software companies                   2,140
 - Telecommunications & electronics                 1,134
Slope                                 .2231         .2180
p-value of the F test                 .0115         .0054
R2                                    .1604         .2520
SSE (sum of squared residuals)        83,261,299    74,175,757
SEE (square root of MSE)              1,500         1,435

Model 2 fits the data better than Model 1.

Example 2. Model 1 ignores the group difference and reports a misleading intercept. The difference in intercepts between the two groups of firms looks substantial, while the two models have similar slopes. Model 2, which considers a fixed group effect (i.e., firm type), seems better than the simple Model 1.
- red line (pooled): regression line of Model 1
- dotted blue line: equipment & software companies (d1=1) in Model 2
- dotted green line: telecommunication and electronics firms (d2=1 or d1=0) in Model 2

Example 3: panel data. Cross-sectional variable: airline; time-series variable: year.


Example 3, Pooled OLS Regression. Fit the pooled regression model without any dummy variable: regress cost output fuel load. Regression equation: cost = 9.5169 + .8827*output + .4540*fuel - 1.6275*load. This model fits the data well (F=2419.34, p=.0000, R2=.9883).

Example 3, Fixed Effects Least Squares Dummy Variable Model. We incorporate the information that the intercepts of the six airlines may vary: Airline 1: cost = 9.7059 +.9193*output +.4175*fuel -1.0704*load Airline 2: cost = 9.6647 +.9193*output +.4175*fuel -1.0704*load Airline 3: cost = 9.4970 +.9193*output +.4175*fuel -1.0704*load Airline 4: cost = 9.8905 +.9193*output +.4175*fuel -1.0704*load Airline 5: cost = 9.7300 +.9193*output +.4175*fuel -1.0704*load Airline 6: cost = 9.7930 +.9193*output +.4175*fuel -1.0704*load We drop one dummy variable to get the model identified. Let us drop the last dummy, g6, and use airline 6 as the reference group. Due to the dummies included, this model loses five degrees of freedom (from 86 to 81).

Example 3. The parameter estimate for g6 appears as the intercept (9.7930). E.g., the actual intercept of airline 1 is 9.7059 = 9.7930 + (-.0871)*1 + (-.1283)*0 + (-.2960)*0 + (.0975)*0 + (-.0630)*0, or simply 9.7930 + (-.0871), where 9.7930 is the reference point, the intercept of this model. The coefficient -.0871 says that the Y-intercept of airline 1 (9.7059) is .0871 smaller than that of airline 6 (the reference point).
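The arithmetic above applies to all six airlines: each intercept is the base intercept (airline 6) plus that airline's dummy coefficient. A quick check using the slide's estimates:

```python
base = 9.7930                                                # intercept = airline 6
dummy_coefs = [-0.0871, -0.1283, -0.2960, 0.0975, -0.0630]   # g1..g5
intercepts = [round(base + c, 4) for c in dummy_coefs] + [base]
# Reproduces the six airline intercepts listed in the LSDV equations.
```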

Example 3. What if we drop a different dummy variable, say g1 instead of g6? Since a different reference point is applied, you will get different dummy coefficients. The intercept now equals the Y-intercept of airline 1, 9.7059 (whose dummy is excluded from the model). Y-intercept of airline 2: 9.6647 = 9.7059 - .0412; i.e., airline 2's intercept is .0412 smaller than the reference point of 9.7059. Other statistics (parameter estimates of the regressors and goodness-of-fit measures) remain unchanged: the choice of which dummy variable to drop does not change the model.
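That the dropped dummy is only a reparameterization can be confirmed numerically. A hypothetical three-group example (data and names are mine; numpy assumed): dropping the last versus the first dummy changes the coefficients but leaves the fitted values, and hence all goodness-of-fit measures, identical.

```python
import numpy as np

rng = np.random.default_rng(1)
group = np.repeat(np.arange(3), 5)
x = rng.normal(size=15)
y = np.array([1.0, 2.0, 3.0])[group] + 0.7 * x + rng.normal(scale=0.1, size=15)

const = np.ones(15)
D = (group[:, None] == np.arange(3)).astype(float)

def fitted(X, y):
    """Fitted values from an OLS regression of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return X @ b

yhat_drop_last = fitted(np.column_stack([const, D[:, :-1], x]), y)
yhat_drop_first = fitted(np.column_stack([const, D[:, 1:], x]), y)
```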

Example 3, Within Group Effect Model. The within effect model does not use dummy variables and thus has larger degrees of freedom, smaller MSE, and smaller standard errors of parameters than LSDV. The values of the dependent and explanatory variables for each airline are mean-corrected, i.e., expressed as deviations from their group means. Transform the dependent and independent variables to compute deviations from group means:
. gen gw_cost = cost - gm_cost
. gen gw_output = output - gm_output
. gen gw_fuel = fuel - gm_fuel
. gen gw_load = load - gm_load
Then run the within effect model, suppressing the intercept.
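The demeaning step can be sketched outside Stata as well. A hypothetical example (numpy assumed, data invented): subtracting each group's mean from every variable and running OLS without an intercept reproduces the LSDV slope exactly, which is why the within model gives the correct parameter estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
group = np.repeat(np.arange(4), 6)
x = rng.normal(size=24)
y = np.array([2.0, -1.0, 0.5, 3.0])[group] + 1.2 * x + rng.normal(scale=0.1, size=24)

def demean(v, group):
    """Subtract each observation's group mean (the gw_* transformation)."""
    means = np.array([v[group == g].mean() for g in np.unique(group)])
    return v - means[group]

gw_x, gw_y = demean(x, group), demean(y, group)
slope_within = (gw_x @ gw_y) / (gw_x @ gw_x)   # OLS with no intercept

# LSDV benchmark: group dummies in place of the common intercept.
D = (group[:, None] == np.arange(4)).astype(float)
slope_lsdv = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)[0][-1]
```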

Example 3. The within effect model gives the correct SSE and parameter estimates of the regressors, but incorrect R2 and standard errors of the parameter estimates. Degrees of freedom increase from 81 (LSDV) to 87, since the six dummy variables are not used. How to compute the intercepts? Intercept for airline i: gm_cost - beta2_hat*gm_output - beta3_hat*gm_fuel - beta4_hat*gm_load (using airline i's group means). E.g., the intercept of airline 5: 9.730 = 12.3630 - {.9193*(-2.2857) + .4175*12.7921 + (-1.0704)*.5665}.
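The airline-5 arithmetic can be checked directly with the group means and slope estimates reported on the slide:

```python
# Slope estimates and airline-5 group means from Example 3.
b_output, b_fuel, b_load = 0.9193, 0.4175, -1.0704
gm_cost, gm_output, gm_fuel, gm_load = 12.3630, -2.2857, 12.7921, 0.5665

# intercept_i = gm_cost - b_output*gm_output - b_fuel*gm_fuel - b_load*gm_load
intercept_airline5 = gm_cost - (b_output * gm_output
                                + b_fuel * gm_fuel
                                + b_load * gm_load)
```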

Example 3. To correct the standard errors, adjust them using the ratio of the degrees of freedom of the within effect model and LSDV. E.g., corrected standard error of logged output: .0299 = .0288*sqrt(87/81).
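The degrees-of-freedom correction is a one-line computation:

```python
import math

se_within = 0.0288           # within-model standard error of logged output
df_within, df_lsdv = 87, 81  # degrees of freedom of the two models

# Scale by sqrt(df_within / df_lsdv) to recover the LSDV standard error.
se_corrected = se_within * math.sqrt(df_within / df_lsdv)
```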

Example 3, Random Effect Model. A random effect model examines how group and/or time affect error variances. This model is appropriate for n individuals drawn randomly from a large population. Generalized least squares (GLS) method: run OLS on the transformed variables, suppressing the intercept.
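One common way the GLS transformation is implemented (not shown on the slides; the variance components here are hypothetical) is quasi-demeaning: each variable is transformed as y_it - theta*ybar_i with theta = 1 - sqrt(sigma_e^2 / (T*sigma_u^2 + sigma_e^2)), and OLS is run on the transformed data. Note that theta = 0 reduces to pooled OLS and theta -> 1 approaches the within estimator.

```python
import math

# Assumed (hypothetical) variance components: sigma_u^2 for the group
# effect, sigma_e^2 for the idiosyncratic error, T periods per group.
sigma_u2, sigma_e2, T = 2.0, 1.0, 15

theta = 1.0 - math.sqrt(sigma_e2 / (T * sigma_u2 + sigma_e2))
# Transform y*_it = y_it - theta * ybar_i (same for each regressor), then run OLS.
```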

Panel data: investigate group and time effects. How? Fixed effect and random effect models. Differences: the fixed effect model asks how group and/or time affect the intercept, while the random effect model analyzes error variance structures affected by group and/or time. Slopes are assumed unchanged in both fixed effect and random effect models. Estimation methods: fixed effect models use least squares dummy variable (LSDV) regression or the within effect model (deviations from group means); random effect models use generalized least squares (GLS).

Random effect models are estimated by generalized least squares (GLS). Fixed effects are tested by the F test, and random effects by the Breusch-Pagan Lagrange multiplier test.
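The Breusch-Pagan LM test can be sketched from pooled-OLS residuals. This is my illustration on invented data (numpy assumed), using the standard statistic LM = nT/(2(T-1)) * [sum_i (sum_t e_it)^2 / sum e_it^2 - 1]^2, which is chi-square with 1 degree of freedom under the null of no random group effect; sizable group effects should push it well past the 5% critical value of 3.84.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 6, 15
group = np.repeat(np.arange(n), T)
u = np.linspace(-2.0, 2.0, n)                    # sizable group effects
x = rng.normal(size=n * T)
y = 1.0 + 0.5 * x + u[group] + rng.normal(scale=0.5, size=n * T)

# Pooled OLS residuals e_it.
X = np.column_stack([np.ones(n * T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# Breusch-Pagan LM statistic for random effects.
group_sums = np.array([e[group == g].sum() for g in range(n)])
lm = (n * T) / (2 * (T - 1)) * ((group_sums**2).sum() / (e**2).sum() - 1) ** 2
```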