Are We Really Doing What We think We are doing? A Note on Finite-Sample Estimates of Two-Way Cluster-Robust Standard Errors

Mark (Shuai) Ma
Kogod School of Business
American University
Shuaim@american.edu

Comments and suggestions are welcome!

Summary

Archival researchers rely heavily on statistical software to carry out large-sample analyses. To produce valid empirical results, an author needs to know both the best statistical solution to the research question and the software code that carries out exactly that test. Unfortunately, researchers rarely explain exactly what they do in their empirical analyses, so it is difficult for a reader to assess the validity of the results. This note uses two-way cluster-robust standard errors as an example to illustrate these points.

Two-way cluster-robust standard errors are increasingly used in the accounting and finance literature. There are several alternative specifications of two-way cluster-robust standard errors, which can yield very different significance levels from the unadjusted asymptotic estimates. However, researchers rarely explain which estimate they use, even though they all call their standard errors "two-way cluster-robust standard errors."

Specifically, I first provide a short discussion of alternative estimates of two-way cluster-robust standard errors. Second, I discuss two common mistakes in calculating two-way cluster-robust standard errors. Third, I show that popular statistical software (SAS and STATA) has options that generate several alternative estimates of two-way cluster-robust standard errors. Therefore, unless the choice is explained, no reader can know which estimate is used. Finally, I suggest that future empirical research carefully explain how it implements estimates of two-way cluster-robust standard errors in finite samples.

In addition, a SAS macro for two-way clustered standard errors is available at my website. If you use this code, please add the following footnote: "To obtain unbiased estimates, the clustered standard errors are adjusted by (N-1)/(N-P) × G/(G-1), where N is the sample size, P is the number of independent variables, and G is the number of clusters."

Key Words: statistical software, two-way cluster-robust standard errors, finite sample.

* I emphasize that this note is written purely and solely for communication purposes. I thank Sutirtha Bagchi, Wayne Thomas, Louis Ederington, Xin Huang, Wenbin Cao, Lisa Yang, and Ted Moorman for answering my questions regarding two-way clustered SEs; all errors are my own. If you find any error in this note, please let me know immediately.

1. Asymptotic estimate of two-way cluster-robust standard errors

Here, I provide a brief summary of studies on two-way cluster-robust standard errors. For details, please see Petersen (2009), Gow et al. (2010), Thompson (2011), and Cameron and Miller (2011).

According to the literature (Petersen 2009, page 458; Thompson 2011, page 2), the estimate of the variance-covariance (VAR-COV) matrix for two-way cluster-robust standard errors (SEs) is

$$V_{\text{firm\&time}} = V_{\text{firm}} + V_{\text{time}} - V_{\text{white}} \qquad (1)$$

where $V_{\text{firm\&time}}$ is the VAR-COV matrix clustered in two dimensions (i.e., firm and time), $V_{\text{firm}}$ is the VAR-COV matrix clustered by one dimension (i.e., firm), $V_{\text{time}}$ is the VAR-COV matrix clustered by the other dimension (i.e., time), and $V_{\text{white}}$ is the White (1980) heteroskedasticity-robust VAR-COV matrix. $V_{\text{white}}$ is subtracted because it is included in both $V_{\text{firm}}$ and $V_{\text{time}}$; subtracting it solves the double-counting problem.

Importantly, if there are multiple observations in each firm-time intersection, then $V_{\text{white}}$ should be replaced by the VAR-COV matrix clustered by firm-time intersection, $V_{\text{firm-time}}$, as suggested in footnote 19 on page 458 of Petersen (2009). For example, when an author wants to cluster by firm and year in a sample of firm-month observations, there may be 12 observations in each firm-year. The equation for two-way clustered SEs is then

$$V_{\text{firm\&time}} = V_{\text{firm}} + V_{\text{time}} - V_{\text{firm-time}} \qquad (2)$$

To estimate two-way clustered SEs, we can follow three steps:

Step 1. Estimate the firm-clustered VAR-COV matrix, $V_{\text{firm}}$.
Step 2. Estimate the time-clustered VAR-COV matrix, $V_{\text{time}}$.
Step 3. Estimate the heteroskedasticity-robust White VAR-COV matrix ($V_{\text{white}}$) when there is only one observation in each firm-time intersection, or estimate the firm-time intersection clustered VAR-COV matrix ($V_{\text{firm-time}}$) when there is more than one observation in each firm-time intersection.
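The three steps above can be sketched in a few lines of Python for the simplest possible case: a single regressor with no intercept, so that every VAR-COV "matrix" is a scalar. This is only an illustration under those assumptions, not the SAS macro mentioned in this note. The function names (`ols_slope`, `cluster_variance`, `two_way_variance`) and the toy data are made up for the example, and no finite-sample adjustment is applied. Clustering on the (firm, time) pair reproduces V_white when every intersection holds one observation and gives V_firm-time otherwise.

```python
from collections import defaultdict


def ols_slope(x, y):
    """OLS slope for y = beta * x + u (single regressor, no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def cluster_variance(x, resid, groups):
    """Unadjusted sandwich variance of the slope, clustered on `groups`."""
    sxx = sum(a * a for a in x)
    score_by_group = defaultdict(float)
    for a, u, g in zip(x, resid, groups):
        score_by_group[g] += a * u  # sum the scores x_i * u_i within each cluster
    meat = sum(s * s for s in score_by_group.values())
    return meat / (sxx * sxx)


def two_way_variance(x, y, firm, time):
    """V_firm + V_time - V_intersection, with no finite-sample adjustment."""
    beta = ols_slope(x, y)
    resid = [b - beta * a for a, b in zip(x, y)]
    v_firm = cluster_variance(x, resid, firm)                    # Step 1
    v_time = cluster_variance(x, resid, time)                    # Step 2
    v_inter = cluster_variance(x, resid, list(zip(firm, time)))  # Step 3
    return v_firm + v_time - v_inter, (v_firm, v_time, v_inter)


# Hypothetical toy panel: 3 firms x 2 years, one observation per firm-year.
firm = [1, 1, 2, 2, 3, 3]
time = [2001, 2002, 2001, 2002, 2001, 2002]
x = [1.0, 2.0, 1.5, 2.5, 0.5, 3.0]
y = [1.1, 2.3, 1.2, 2.9, 0.4, 3.3]

v_two_way, (v_firm, v_time, v_inter) = two_way_variance(x, y, firm, time)

# White (1980) variance: every observation is its own cluster.
beta = ols_slope(x, y)
resid = [b - beta * a for a, b in zip(x, y)]
v_white = cluster_variance(x, resid, range(len(x)))

print(v_firm, v_time, v_inter, v_white, v_two_way)
```

In the toy panel each firm-year holds one observation, so the Step 3 term equals the White estimate, as in equation 1. With firm-month data and year clusters, the same call would subtract the firm-year clustered term instead, as in equation 2.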

2. Finite Sample Estimates for Clustered Standard Errors

Prior studies carefully discuss two-way clustered SEs, which are becoming widely used in the accounting and finance literature. In finite samples, there are several alternative estimates of two-way cluster-robust standard errors. These finite-sample estimates can yield different significance levels from the unadjusted asymptotic estimates. However, researchers rarely explain whether they use finite-sample adjusted estimates or unadjusted asymptotic estimates. To my knowledge, no prior work has discussed how popular statistical software (e.g., SAS and STATA) implements clustered standard errors in finite samples. This note tries to fill this void in the literature.

Asymptotic estimates of clustered SEs (as in equations 1 and 2) are likely to be downward biased in finite samples with a limited number of clusters. Therefore, Cameron and Miller (2011) suggest three possible ways to adjust the clustered VAR-COV matrix in finite samples (see Cameron and Miller 2011 for details). One of these specifications multiplies the VAR-COV matrix by $G/(G-1)$, where $G$ is the number of clusters. STATA and SAS both offer an approach similar to this specification, though SAS also provides an estimate that is not finite-sample adjusted and can be misused by researchers (see details below). The other two specifications are alternative jackknife estimates of the clustered standard error. [1] Specifically, one specification uses the adjustment factor $[I_N - H]$, where $I_N$ is an identity matrix and $H = X(X'X)^{-1}X'$; the other uses the adjustment factor $\frac{G}{G-1}[I_N - H]^{-1}$.

My note focuses on the first specification, $G/(G-1)$, because it is similar to what several popular software packages use (see discussions below) and is used in several programs shared online by other professors.

[1] These are analogs of HC2 and HC3 in MacKinnon and White (1985), who discuss alternative specifications of White standard errors in finite samples, known as HC, HC1, HC2, and HC3. See MacKinnon and White (1985) for details.
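To get a sense of how much the G/(G-1) adjustment matters when there are few clusters, the short Python sketch below tabulates the factor and the implied overstatement of a t-statistic when the factor is omitted. The function names are hypothetical, and the calculation assumes the adjustment applies to a single clustering dimension.

```python
import math


def cluster_factor(g):
    """Finite-sample factor G/(G-1) applied to the VAR-COV matrix."""
    if g < 2:
        raise ValueError("need at least two clusters")
    return g / (g - 1)


def t_stat_overstatement(g):
    """Relative overstatement of a t-statistic when the factor is omitted.

    The standard error scales with the square root of the variance, so
    dropping the factor inflates the t-statistic by sqrt(G/(G-1)) - 1.
    """
    return math.sqrt(cluster_factor(g)) - 1.0


for g in (5, 10, 20, 50, 100):
    print(g, round(cluster_factor(g), 4), f"{t_stat_overstatement(g):.1%}")
```

The overstatement shrinks as the number of clusters grows, which is why the adjustment matters most for samples with many observations but only a handful of clusters.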

However, no study has examined whether $G/(G-1)$ provides better finite-sample adjustments than the other specifications.

2.1 Two Common Mistakes in Calculating Clustered Standard Errors

Mistake 1: Not using a finite-sample adjustment. Because of the finite-sample adjustment $G/(G-1)$, finite-sample estimates of one- or two-way clustered standard errors can differ substantially from those based on the asymptotic estimate. For example, if an author clusters by year (or by both firm and year) in a sample with 5 years ($G=5$), the adjustment factor for $V_{\text{time}}$ is $5/(5-1)=1.25$. If this adjustment factor is not used, the t-statistics could be over-estimated by 11.8%, [2] potentially leading to completely different conclusions. Even if the number of years increases to 20, the t-statistics could still be over-estimated by 2.6% ($\sqrt{20/19} - 1$) if the finite-sample adjustment is not used. Therefore, it is very important to explain whether and how the author implemented finite-sample adjustments, especially when the number of clusters is limited even though the sample contains many observations. [3]

[2] Without the adjustment, $V$ is under-stated by a factor of 1.25. Thus, the standard error is under-estimated by a factor of 1.118, which is the square root of 1.25, and the t-statistic is over-estimated by the same factor.

[3] For example, if there are 1,000 firms each year for 5 years, the sample is still large (5,000 observations), but the number of year clusters is only 5. This type of sample is very common in the literature.

Mistake 2: Using $V_{\text{firm-time}}$ when there is only one observation per firm-time. As discussed above and in Petersen (2009) and Thompson (2011), $V_{\text{white}}$ rather than $V_{\text{firm-time}}$ should be used when there is only one observation per firm-time. If a researcher always uses $V_{\text{firm-time}}$, regardless of whether there are one or more observations per firm-time, the result is inaccurate. This is because $V_{\text{firm-time}}$ is adjusted by $G/(G-1)$, but $V_{\text{white}}$ is not. Because $G/(G-1)>1$, $V_{\text{firm-time}}$ is generally larger than $V_{\text{white}}$. So, when $V_{\text{firm-time}}$ instead of $V_{\text{white}}$ is subtracted off, as in equation 2, $V_{\text{firm\&time}}$ is under-estimated. The t-statistics are then over-estimated, potentially leading to wrong conclusions. Therefore, it is also very important to explain whether $V_{\text{white}}$ or $V_{\text{firm-time}}$ is used in calculating $V_{\text{firm\&time}}$.

In the discussion below, I show how clustered and White standard errors are calculated in STATA and SAS. I hope this note helps researchers understand what their software actually gives them.

2.2 STATA Finite Samples Adjustment Formula

In STATA, the following code from Professor Mitchell Petersen's website estimates one-way cluster-robust standard errors:

    regress dependent_variable independent_variables, robust cluster(cluster_variable)

According to page 54 of the STATA User's Guide ([U] 20 Estimation and postestimation commands), STATA uses the following adjustments for finite samples. [4]

$V_{\text{firm}}$ is multiplied by $\frac{N-1}{N-P}\cdot\frac{G_1}{G_1-1}$, where $N$ is the sample size, $P$ is the number of independent variables, and $G_1$ is the number of firm clusters. For example, if there are 100 firms, then $G_1$ is 100. When $N$ becomes large relative to $P$, this adjustment factor is approximately $G_1/(G_1-1)$.

$V_{\text{time}}$ is multiplied by $\frac{N-1}{N-P}\cdot\frac{G_2}{G_2-1}$, where $G_2$ is the number of time clusters.

$V_{\text{firm-time}}$ is multiplied by $\frac{N-1}{N-P}\cdot\frac{G_3}{G_3-1}$, where $G_3$ is the number of firm-time intersections (clusters).

[4] The STATA manual uses M instead of G in its notation. To be consistent in my discussion, I keep using G.

White SEs can be obtained using the following STATA code:

    regress dependent_variable independent_variables, robust

Importantly, the $V_{\text{white}}$ matrix given by this code is not the unadjusted matrix of White (1980): STATA's robust option applies a degrees-of-freedom correction of $N/(N-k)$, where $k$ is the number of estimated coefficients, which corresponds to HC1. If a user wants HC2 or HC3 instead, the code needs to be modified by replacing robust with vce(hc2) or vce(hc3).

2.3 SAS Finite Samples Adjustment Formula

In SAS, there are two ways to estimate clustered SEs: proc surveyreg and proc genmod. There is one major difference between these two procedures: [5] proc genmod does not provide a finite-sample adjustment, while proc surveyreg adjusts the VAR-COV matrix by (N-1)/(N-P) * G/(G-1). Therefore, the SEs given by proc genmod are likely to be under-estimated in finite samples. The following proc genmod code is from Professor Stoffman's website:

    proc genmod;
      class identifier;
      model depvar = indvars;
      repeated subject=identifier / type=ind;
    run;
    quit;

To adjust the proc genmod results, multiply the VAR-COV matrix by G/(G-1) or by (N-1)/(N-P) * G/(G-1).

The other SAS procedure for clustered standard errors is proc surveyreg, which does provide a finite-sample adjustment. The following code is from Professor Petersen's website:

    proc surveyreg data=mydata;
      cluster cluster_variable;
      model dependent_variable = independent_variables;
    run;

According to the SAS/STAT 9.2 User's Guide (page 6556), the matrix is as follows: [equation not reproduced here]

[5] Professor Noah Stoffman's website also explains this.

proc surveyreg is designed to analyze survey data. In this matrix, $h$ is the stratum index, $n_h$ is the number of clusters in stratum $h$, and $f_h$ is the sampling rate for stratum $h$. The value of $f_h$ is negligible unless a sampling rate is specified, so it is generally negligible when using the code above. To simplify this matrix and make it comparable to my discussion of the STATA matrix, I translate the adjustment as follows: the finite-sample adjustment factor in SAS is also approximately (N-1)/(N-P) * G/(G-1).

In addition, for the VAR-COV matrix to be weighted by G/(G-1) rather than (N-1)/(N-P) * G/(G-1), one must specify VADJUST=NONE in the model statement.

$V_{\text{white}}$ in SAS is the same as in White (1980) unless the user chooses an alternative specification of White SEs. Specifically, set the option HCCMETHOD to 0, 1, 2, or 3 to get the White HC, HC1, HC2, or HC3 SEs as described in MacKinnon and White (1985). The White SEs can be obtained using the following code:

    proc reg;
      model y=x / hcc HCCMETHOD=0;
    run;
    quit;

3. Suggestions

I offer two suggestions to authors using two-way clustered SEs. First, I suggest that authors explicitly state how they calculate two-way cluster-robust standard errors. Specifically, the author needs to explain 1) whether $V_{\text{white}}$ or $V_{\text{firm-time}}$ is used in estimating $V_{\text{firm\&time}}$ and 2) whether and how the estimate is finite-sample adjusted.

Second, SAS and STATA have options that provide a finite-sample adjustment to the clustered SEs in a way similar to one specification suggested by Cameron and Miller (2011). However, SAS also has another option that does not do so. Therefore, unless the choice is explained, no reader can know which estimate is used. Also, no studies in accounting or finance have compared alternative finite-sample estimates of two-way clustered SEs. I suggest that future research carefully examine which alternative specification gives the best finite-sample adjustment. In addition, similar issues of finite-sample adjustment exist for one-way cluster-robust standard errors.

Finally, a SAS macro for two-way clustered standard errors is available at my personal website. To use this code, please add the following footnote: "To obtain unbiased estimates in finite sample, the clustered standard errors are adjusted by (N-1)/(N-P) × G/(G-1), where N is the sample size, P is the number of independent variables, and G is the number of clusters." A reader can then better interpret the empirical results. When other researchers share their code online, they should also explain to users how their code performs finite-sample adjustments.

References

Cameron, A. C., and D. L. Miller, 2011, Robust Inference with Clustered Data, in A. Ullah and D. E. Giles, eds., Handbook of Empirical Economics and Finance, CRC Press.

Gow, I., G. Ormazabal, and D. Taylor, 2010, Correcting for Cross-Sectional and Time-Series Dependence in Accounting Research, The Accounting Review 85(2).

MacKinnon, J., and H. White, 1985, Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties, Journal of Econometrics 29(3).

Petersen, M., 2009, Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches, Review of Financial Studies 22(1).

Thompson, S., 2011, Simple formulas for standard errors that cluster by both firm and time, Journal of Financial Economics 99(1).

White, H., 1980, A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity, Econometrica 48.


More information

piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 September 8, 2017

piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 September 8, 2017 piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 1 Heriot-Watt University, Edinburgh, UK Center for Energy Economics Research and Policy (CEERP) 2 The Hebrew University

More information

Data: a collection of numbers or facts that require further processing before they are meaningful

Data: a collection of numbers or facts that require further processing before they are meaningful Digital Image Classification Data vs. Information Data: a collection of numbers or facts that require further processing before they are meaningful Information: Derived knowledge from raw data. Something

More information

Reals 1. Floating-point numbers and their properties. Pitfalls of numeric computation. Horner's method. Bisection. Newton's method.

Reals 1. Floating-point numbers and their properties. Pitfalls of numeric computation. Horner's method. Bisection. Newton's method. Reals 1 13 Reals Floating-point numbers and their properties. Pitfalls of numeric computation. Horner's method. Bisection. Newton's method. 13.1 Floating-point numbers Real numbers, those declared to be

More information

Dual-Frame Weights (Landline and Cell) for the 2009 Minnesota Health Access Survey

Dual-Frame Weights (Landline and Cell) for the 2009 Minnesota Health Access Survey Dual-Frame Weights (Landline and Cell) for the 2009 Minnesota Health Access Survey Kanru Xia 1, Steven Pedlow 1, Michael Davern 1 1 NORC/University of Chicago, 55 E. Monroe Suite 2000, Chicago, IL 60603

More information

STATA Tutorial. Introduction to Econometrics. by James H. Stock and Mark W. Watson. to Accompany

STATA Tutorial. Introduction to Econometrics. by James H. Stock and Mark W. Watson. to Accompany STATA Tutorial to Accompany Introduction to Econometrics by James H. Stock and Mark W. Watson STATA Tutorial to accompany Stock/Watson Introduction to Econometrics Copyright 2003 Pearson Education Inc.

More information

Algebra 2 Common Core Summer Skills Packet

Algebra 2 Common Core Summer Skills Packet Algebra 2 Common Core Summer Skills Packet Our Purpose: Completion of this packet over the summer before beginning Algebra 2 will be of great value to helping students successfully meet the academic challenges

More information

A User Manual for the Multivariate MLE Tool. Before running the main multivariate program saved in the SAS file Part2-Main.sas,

A User Manual for the Multivariate MLE Tool. Before running the main multivariate program saved in the SAS file Part2-Main.sas, A User Manual for the Multivariate MLE Tool Before running the main multivariate program saved in the SAS file Part-Main.sas, the user must first compile the macros defined in the SAS file Part-Macros.sas

More information

Computing Optimal Strata Bounds Using Dynamic Programming

Computing Optimal Strata Bounds Using Dynamic Programming Computing Optimal Strata Bounds Using Dynamic Programming Eric Miller Summit Consulting, LLC 7/27/2012 1 / 19 Motivation Sampling can be costly. Sample size is often chosen so that point estimates achieve

More information

Methods for Estimating Change from NSCAW I and NSCAW II

Methods for Estimating Change from NSCAW I and NSCAW II Methods for Estimating Change from NSCAW I and NSCAW II Paul Biemer Sara Wheeless Keith Smith RTI International is a trade name of Research Triangle Institute 1 Course Outline Review of NSCAW I and NSCAW

More information

Working Paper No. 782

Working Paper No. 782 Working Paper No. 782 Feasible Estimation of Linear Models with N-fixed Effects by Fernando Rios-Avila* Levy Economics Institute of Bard College December 2013 *Acknowledgements: This paper has benefited

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 13: The bootstrap (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 30 Resampling 2 / 30 Sampling distribution of a statistic For this lecture: There is a population model

More information

Floating-point numbers. Phys 420/580 Lecture 6

Floating-point numbers. Phys 420/580 Lecture 6 Floating-point numbers Phys 420/580 Lecture 6 Random walk CA Activate a single cell at site i = 0 For all subsequent times steps, let the active site wander to i := i ± 1 with equal probability Random

More information

Markscheme May 2017 Mathematical studies Standard level Paper 1

Markscheme May 2017 Mathematical studies Standard level Paper 1 M17/5/MATSD/SP1/ENG/TZ/XX/M Markscheme May 017 Mathematical studies Standard level Paper 1 3 pages M17/5/MATSD/SP1/ENG/TZ/XX/M This markscheme is the property of the International Baccalaureate and must

More information

An Iterative Approach to Estimation with Multiple High-Dimensional Fixed Effects

An Iterative Approach to Estimation with Multiple High-Dimensional Fixed Effects An Iterative Approach to Estimation with Multiple High-Dimensional Fixed Effects Abstract Siyi Luo, Wenjia Zhu, Randall P. Ellis March 23, 2017 Department of Economics, Boston University Controlling for

More information

Epidemiological analysis PhD-course in epidemiology

Epidemiological analysis PhD-course in epidemiology Epidemiological analysis PhD-course in epidemiology Lau Caspar Thygesen Associate professor, PhD 9. oktober 2012 Multivariate tables Agenda today Age standardization Missing data 1 2 3 4 Age standardization

More information

Package endogenous. October 29, 2016

Package endogenous. October 29, 2016 Package endogenous October 29, 2016 Type Package Title Classical Simultaneous Equation Models Version 1.0 Date 2016-10-25 Maintainer Andrew J. Spieker Description Likelihood-based

More information

Epidemiological analysis PhD-course in epidemiology. Lau Caspar Thygesen Associate professor, PhD 25 th February 2014

Epidemiological analysis PhD-course in epidemiology. Lau Caspar Thygesen Associate professor, PhD 25 th February 2014 Epidemiological analysis PhD-course in epidemiology Lau Caspar Thygesen Associate professor, PhD 25 th February 2014 Age standardization Incidence and prevalence are strongly agedependent Risks rising

More information

Adjusting for Unequal Selection Probability in Multilevel Models: A Comparison of Software Packages

Adjusting for Unequal Selection Probability in Multilevel Models: A Comparison of Software Packages Adusting for Unequal Selection Probability in Multilevel Models: A Comparison of Software Packages Kim Chantala Chirayath Suchindran Carolina Population Center, UNC at Chapel Hill Carolina Population Center,

More information

5.5 Regression Estimation

5.5 Regression Estimation 5.5 Regression Estimation Assume a SRS of n pairs (x, y ),..., (x n, y n ) is selected from a population of N pairs of (x, y) data. The goal of regression estimation is to take advantage of a linear relationship

More information

Panel Data 4: Fixed Effects vs Random Effects Models

Panel Data 4: Fixed Effects vs Random Effects Models Panel Data 4: Fixed Effects vs Random Effects Models Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised April 4, 2017 These notes borrow very heavily, sometimes verbatim,

More information

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:

More information

The SAS interface is shown in the following screen shot:

The SAS interface is shown in the following screen shot: The SAS interface is shown in the following screen shot: There are several items of importance shown in the screen shot First there are the usual main menu items, such as File, Edit, etc I seldom use anything

More information

The method of rationalizing

The method of rationalizing Roberto s Notes on Differential Calculus Chapter : Resolving indeterminate forms Section The method of rationalizing What you need to know already: The concept of it and the factor-and-cancel method of

More information

User s guide to R functions for PPS sampling

User s guide to R functions for PPS sampling User s guide to R functions for PPS sampling 1 Introduction The pps package consists of several functions for selecting a sample from a finite population in such a way that the probability that a unit

More information

Planting the Seeds Exploring Cubic Functions

Planting the Seeds Exploring Cubic Functions 295 Planting the Seeds Exploring Cubic Functions 4.1 LEARNING GOALS In this lesson, you will: Represent cubic functions using words, tables, equations, and graphs. Interpret the key characteristics of

More information

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business Section 3.4: Diagnostics and Transformations Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions

More information

Monte Carlo Integration

Monte Carlo Integration Lab 18 Monte Carlo Integration Lab Objective: Implement Monte Carlo integration to estimate integrals. Use Monte Carlo Integration to calculate the integral of the joint normal distribution. Some multivariable

More information

The Importance of Modeling the Sampling Design in Multiple. Imputation for Missing Data

The Importance of Modeling the Sampling Design in Multiple. Imputation for Missing Data The Importance of Modeling the Sampling Design in Multiple Imputation for Missing Data Jerome P. Reiter, Trivellore E. Raghunathan, and Satkartar K. Kinney Key Words: Complex Sampling Design, Multiple

More information

The method of rationalizing

The method of rationalizing Roberto s Notes on Differential Calculus Chapter : Resolving indeterminate forms Section The method of rationalizing What you need to know already: The concept of it and the factor-and-cancel method of

More information

Course Number 432/433 Title Algebra II (A & B) H Grade # of Days 120

Course Number 432/433 Title Algebra II (A & B) H Grade # of Days 120 Whitman-Hanson Regional High School provides all students with a high- quality education in order to develop reflective, concerned citizens and contributing members of the global community. Course Number

More information

Working Paper. Spatial Differencing: Estimation and Inference. Highlights. Federico Belotti, Edoardo Di Porto & Gianluca Santoni

Working Paper. Spatial Differencing: Estimation and Inference. Highlights. Federico Belotti, Edoardo Di Porto & Gianluca Santoni No 2017-10 June Working Paper Spatial Differencing: Estimation and Inference Federico Belotti, Edoardo Di Porto & Gianluca Santoni Highlights Spatial differencing is a spatial data transformation pioneered

More information

Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling

Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling Siobhan Everson-Stewart, F. Jay Breidt, Jean D. Opsomer January 20, 2004 Key Words: auxiliary information, environmental surveys,

More information

Sampling Statistics Guide. Author: Ali Fadakar

Sampling Statistics Guide. Author: Ali Fadakar Sampling Statistics Guide Author: Ali Fadakar An Introduction to the Sampling Interface Sampling interface is an interactive software package that uses statistical procedures such as random sampling, stratified

More information

JMASM 46: Algorithm for Comparison of Robust Regression Methods In Multiple Linear Regression By Weighting Least Square Regression (SAS)

JMASM 46: Algorithm for Comparison of Robust Regression Methods In Multiple Linear Regression By Weighting Least Square Regression (SAS) Journal of Modern Applied Statistical Methods Volume 16 Issue 2 Article 27 December 2017 JMASM 46: Algorithm for Comparison of Robust Regression Methods In Multiple Linear Regression By Weighting Least

More information

State Approved Calculators for Standards of Learning Testing: Guidelines and Preparation Instructions for Testing

State Approved Calculators for Standards of Learning Testing: Guidelines and Preparation Instructions for Testing Standards of Learning (SOL) Assessment Grade 4 Mathematics Grade 4 Plain English Mathematics Grade 5 Mathematics Grade 5 Plain English Mathematics Grade 5 Science Grade 6 Mathematics Grade 6 Plain English

More information

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010 THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE

More information

Training Intelligent Stoplights

Training Intelligent Stoplights Training Intelligent Stoplights Thomas Davids, Michael Celentano, and Luke Knepper December 14, 2012 1 Introduction Traffic is a huge problem for the American economy. In 2010, the average American commuter

More information

Table of Laplace Transforms

Table of Laplace Transforms Table of Laplace Transforms 1 1 2 3 4, p > -1 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Heaviside Function 27 28. Dirac Delta Function 29 30. 31 32. 1 33 34. 35 36. 37 Laplace Transforms

More information

Conditional Volatility Estimation by. Conditional Quantile Autoregression

Conditional Volatility Estimation by. Conditional Quantile Autoregression International Journal of Mathematical Analysis Vol. 8, 2014, no. 41, 2033-2046 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijma.2014.47210 Conditional Volatility Estimation by Conditional Quantile

More information

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time Today Lecture 4: We examine clustering in a little more detail; we went over it a somewhat quickly last time The CAD data will return and give us an opportunity to work with curves (!) We then examine

More information

Chapter 1 Introduction. Chapter Contents

Chapter 1 Introduction. Chapter Contents Chapter 1 Introduction Chapter Contents OVERVIEW OF SAS/STAT SOFTWARE................... 17 ABOUT THIS BOOK.............................. 17 Chapter Organization............................. 17 Typographical

More information

Chapter 7: Dual Modeling in the Presence of Constant Variance

Chapter 7: Dual Modeling in the Presence of Constant Variance Chapter 7: Dual Modeling in the Presence of Constant Variance 7.A Introduction An underlying premise of regression analysis is that a given response variable changes systematically and smoothly due to

More information

REGLERTEKNIK AUTOMATIC CONTROL LINKÖPING

REGLERTEKNIK AUTOMATIC CONTROL LINKÖPING An Iterative Method for Identification of ARX Models from Incomplete Data Ragnar Wallin, Alf Isaksson, and Lennart Ljung Department of Electrical Engineering Linkping University, S-81 8 Linkping, Sweden

More information

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors Week 10: Autocorrelated errors This week, I have done one possible analysis and provided lots of output for you to consider. Case study: predicting body fat Body fat is an important health measure, but

More information

Package OLScurve. August 29, 2016

Package OLScurve. August 29, 2016 Type Package Title OLS growth curve trajectories Version 0.2.0 Date 2014-02-20 Package OLScurve August 29, 2016 Maintainer Provides tools for more easily organizing and plotting individual ordinary least

More information

The SAS %BLINPLUS Macro

The SAS %BLINPLUS Macro The SAS %BLINPLUS Macro Roger Logan and Donna Spiegelman April 10, 2012 Abstract The macro %blinplus corrects for measurement error in one or more model covariates logistic regression coefficients, their

More information

Econometrics Economics 345

Econometrics Economics 345 1 Econometrics Economics 345 David M. Levy Carow Hall 2pm Tuesday & Thursday Virtual Office: DavidMLevy@gmail.com Course Goal. We shall look upon econometrics as something practiced by optimizing agents.

More information

Vignette of the JoSAE package

Vignette of the JoSAE package Vignette of the JoSAE package Johannes Breidenbach 6 October 2011: JoSAE 0.2 1 Introduction The aim in the analysis of sample surveys is frequently to derive estimates of subpopulation characteristics.

More information

An algorithm for censored quantile regressions. Abstract

An algorithm for censored quantile regressions. Abstract An algorithm for censored quantile regressions Thanasis Stengos University of Guelph Dianqin Wang University of Guelph Abstract In this paper, we present an algorithm for Censored Quantile Regression (CQR)

More information

Multiple-imputation analysis using Stata s mi command

Multiple-imputation analysis using Stata s mi command Multiple-imputation analysis using Stata s mi command Yulia Marchenko Senior Statistician StataCorp LP 2009 UK Stata Users Group Meeting Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi

More information

Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs

Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs 4.1 Introduction In Chapter 1, an introduction was given to the species and color classification problem of kitchen

More information

11 1. Introductory part In order to follow the contents of this book with full understanding, certain prerequisites from high school mathematics will

11 1. Introductory part In order to follow the contents of this book with full understanding, certain prerequisites from high school mathematics will 11 1. Introductory part In order to follow the contents of this book with full understanding, certain prerequisites from high school mathematics will be necessary. This firstly pertains to the basic concepts

More information