Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017

Similar documents
HLM versus SEM Perspectives on Growth Curve Modeling. Hsueh-Sheng Wu CFDR Workshop Series August 3, 2015

Bivariate (Simple) Regression Analysis

Growth Curve Modeling in Stata. Hsueh-Sheng Wu CFDR Workshop Series February 12, 2018

DETAILED CONTENTS. About the Editor About the Contributors PART I. GUIDE 1

Panel Data 4: Fixed Effects vs Random Effects Models

Performing Cluster Bootstrapped Regressions in R

The linear mixed model: modeling hierarchical and longitudinal data

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Generalized least squares (GLS) estimates of the level-2 coefficients,

Week 4: Simple Linear Regression III

Week 10: Heteroskedasticity II

PSY 9556B (Feb 5) Latent Growth Modeling

Week 5: Multiple Linear Regression II

Missing Data Techniques

Week 11: Interpretation plus

. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education)

Lab 2: OLS regression

schooling.log 7/5/2006

Week 4: Simple Linear Regression II

piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 September 8, 2017

Linear regression Number of obs = 6,866 F(16, 326) = Prob > F = R-squared = Root MSE =

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90%

Getting Correct Results from PROC REG

MIXED_RELIABILITY: A SAS Macro for Estimating Lambda and Assessing the Trustworthiness of Random Effects in Multilevel Models

Introduction to Mixed Models: Multivariate Regression

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian. Panel Data Analysis: Fixed Effects Models

ST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false.

Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures

Week 9: Modeling II. Marcelo Coca Perraillon. Health Services Research Methods I HSMP University of Colorado Anschutz Medical Campus

Missing Data Missing Data Methods in ML Multiple Imputation

INTRODUCTION TO PANEL DATA ANALYSIS

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

PubHlth 640 Intermediate Biostatistics Unit 2 - Regression and Correlation. Simple Linear Regression Software: Stata v 10.1

An Introduction to Growth Curve Analysis using Structural Equation Modeling

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

Introduction to Mplus

Health Disparities (HD): It s just about comparing two groups

CSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM

ANNOUNCING THE RELEASE OF LISREL VERSION BACKGROUND 2 COMBINING LISREL AND PRELIS FUNCTIONALITY 2 FIML FOR ORDINAL AND CONTINUOUS VARIABLES 3

CH5: CORR & SIMPLE LINEAR REFRESSION =======================================

Two-Stage Least Squares

Modeling Categorical Outcomes via SAS GLIMMIX and STATA MEOLOGIT/MLOGIT (data, syntax, and output available for SAS and STATA electronically)

Stata versions 12 & 13 Week 4 Practice Problems

Using HLM for Presenting Meta Analysis Results. R, C, Gardner Department of Psychology

Stat 5100 Handout #19 SAS: Influential Observations and Outliers

Heteroscedasticity-Consistent Standard Error Estimates for the Linear Regression Model: SPSS and SAS Implementation. Andrew F.

Newsom Psy 510/610 Multilevel Regression, Spring

Introduction to Mixed-Effects Models for Hierarchical and Longitudinal Data

25 Working with categorical data and factor variables

A Beginner's Guide to. Randall E. Schumacker. The University of Alabama. Richard G. Lomax. The Ohio State University. Routledge

Stata Session 2. Tarjei Havnes. University of Oslo. Statistics Norway. ECON 4136, UiO, 2012

An Econometric Study: The Cost of Mobile Broadband

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY

Centering and Interactions: The Training Data

5.5 Regression Estimation

Statistics Lab #7 ANOVA Part 2 & ANCOVA

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model

Example Using Missing Data 1

Study Guide. Module 1. Key Terms

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors

/23/2004 TA : Jiyoon Kim. Recitation Note 1

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening

Introduction to Stata: An In-class Tutorial

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

Robust Linear Regression (Passing- Bablok Median-Slope)

Implementing Resampling Methods for Design-based Variance Estimation in Multilevel Models: Using HLM6 and SAS together

Hierarchical Generalized Linear Models

Statistical Analysis of List Experiments

Chapters 5-6: Statistical Inference Methods

Compare Linear Regression Lines for the HP-67

A SAS MACRO for estimating bootstrapped confidence intervals in dyadic regression

PRI Workshop Introduction to AMOS

Stat 401 B Lecture 26

texdoc 2.0 An update on creating LaTeX documents from within Stata Example 2

An introduction to SPSS

Online Supplementary Appendix for. Dziak, Nahum-Shani and Collins (2012), Multilevel Factorial Experiments for Developing Behavioral Interventions:

Source:

ECON Introductory Econometrics Seminar 4

Creating LaTeX and HTML documents from within Stata using texdoc and webdoc. Example 2

Analyzing Correlated Data in SAS

JMASM 46: Algorithm for Comparison of Robust Regression Methods In Multiple Linear Regression By Weighting Least Square Regression (SAS)

An imputation approach for analyzing mixed-mode surveys

Introduction to Statistical Analyses in SAS

ECON Stata course, 3rd session

ST Lab 1 - The basics of SAS

Multiple Regression White paper

Handbook of Statistical Modeling for the Social and Behavioral Sciences

LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2

Model Selection Using Information Criteria (Made Easy in SAS )

An Introductory Guide to Stata

Missing Data. SPIDA 2012 Part 6 Mixed Models with R:

Intermediate Stata Workshop. Hsueh-Sheng Wu CFDR Workshop Series Spring 2009

Missing Data: What Are You Missing?

Introduction to Programming in Stata

Introduction to STATA 6.0 ECONOMICS 626

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Transcription:

Introduction to Hierarchical Linear Model Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017 1

Outline What is Hierarchical Linear Model? Why do nested data create analytic problems? Graphic presentation Building HLM models Basic HLM models Unconditional Random Intercept Model Random Intercept Model with a level 2 predictor Random-coefficient Model with a Level 1 predictor Random-coefficient Model with predictors from two different levels Unconditional Growth Curve Model without predictors Growth Curve Model with a level 1 predictor SAS codes for basic HLM models Stata codes for basic HLM models Conclusions

What Is Hierarchical Linear Model? A statistical technique that takes account the nested structure of the data when modeling the linear relations among parameters. Social scientists often deal with nested data because individuals are embedded within their environments. Two types of nested structures: In cross-sectional data, individuals nested within groups. In panel data, individual data collected at different time points are viewed as nested within individuals. Applications Multilevel analysis Growth curve analysis Meta-Analysis Software: SAS, Stata, HLM, SPSS, R, LISREL, and Mplus 3

Why Do Nested Data Create Analytic Problems? Statistically, nested data structure violates thee assumptions of regression: Independent observations Independent error terms Equal variances of errors for all observations Empirically, when one of these assumptions is violated, the estimates are biased. Nested data structure can also leads to different results, depending on which level on which the analysis is conducted. 4

Why Do Nested Data Create Analytic Problems? (Cont.) Hypothetical Data Example from Snijders and Bosker (1999) Table 1. Sample data from 10 respondents group id x y 1 1 1 5 1 2 3 7 2 3 2 4 2 4 4 6 3 5 3 3 3 6 5 5 4 7 4 2 4 8 6 4 5 9 5 1 5 10 7 3 5

Why Do Nested Data Create Analytic Problems? (Cont.) Use information from both individual and group levels, that is, we regress Y on X within each group. Y = 4 + X (group 1) Y = 2 + X (group 2) Y = 0 + X (group 3) Y = -2 + X (group 4) Y = -4 + X (group 5) The finding shows a positive association between X and Y for all five group, suggesting that X should be positively associated with Y when all of the five groups were aggregated. 6

Why Do Nested Data Create analytic Problems? (Cont.) Disaggregation analysis (analyzing the data at the individual level and ignoring the fact that respondents are from different groups). reg y x Source SS df MS Number of obs = 10 -------------+------------------------------ F( 1, 8) = 1.00 Model 3.33333333 1 3.33333333 Prob > F = 0.3466 Residual 26.6666667 8 3.33333333 R-squared = 0.1111 -------------+------------------------------ Adj R-squared = 0.0000 Total 30 9 3.33333333 Root MSE = 1.8257 ------------------------------------------------------------------------------ y Coef. Std. Err. t P> t [95% Conf. Interval] -------------+---------------------------------------------------------------- x -.3333333.3333333-1.00 0.347-1.102001.4353347 _cons 5.333333 1.452966 3.67 0.006 1.982787 8.68388 ------------------------------------------------------------------------------ Y= 5.33 -.33X + e One unit increases in X leads to 0.33 unit decrease in Y. The result is different from what we would have expected from the previous analysis. 7

Why Do Nested Data Create Analytic Problems? (Cont.) Aggregate analysis (focusing on group-level variables and ignore individual-level variables; that is, using the average of X to predict the average of Y at the group level): reg mean_y mean_x Source SS df MS Number of obs = 5 -------------+------------------------------ F( 1, 3) =. Model 10 1 10 Prob > F =. Residual 0 3 0 R-squared = 1.0000 -------------+------------------------------ Adj R-squared = 1.0000 Total 10 4 2.5 Root MSE = 0 ------------------------------------------------------------------------------ mean_y Coef. Std. Err. t P> t [95% Conf. Interval] -------------+---------------------------------------------------------------- mean_x -1..... _cons 8..... ------------------------------------------------------------------------------ Y = 8 X + e One unit increases in X leads to 1 unit decrease in Y. This method also reduce the power, because groups are the unit of analysis. In addition, the result is different from what we would have expected from the previous analyses.

Graphic Presentation of Fixed or Random Intercepts or Slopes A. Groups with same intercept and slope B. Groups with different intercepts and same slope C. Groups with same intercept and different slopes D. Groups with different intercepts and slopes 9

Find a research topic Building HLM Models Find a data set with variables from different levels Contemplate whether the intercept or slope(s) can vary at the higher level Always write the equations for the full model. You start with the equation for the lower level, then with the ones for the higher level, and finally combines these two sets of equations. Choose software to run the models 10

Multi-level models: Basic HLM Models Unconditional Random Intercept Model Random Intercept Model with a level 2 predictor Random-coefficient Model with a Level 1 predictor Random-coefficient Model with predictors from two different levels Growth Curve models: Unconditional Growth Curve Model without predictors Growth Curve Model with a level 1 predictor 11

Unconditional Random Intercept Model Research Question: We would like to know whether students from different schools have different levels of mathematics achievement. Level 1 Model: MATHACH ij = β 0j + r ij Level 2 Model: β 0j = γ 00 + u 0j Full Model: MATHACH ij = γ 00 + u 0j + r ij 12

Intra-class association Intra-class tells you whether the variance attributable to the higher-level units, relative to that from the lower-level units, is significant If it is significant, it justifies the use of HLM models. Intra-class association = u 0 /(u 0 + r ij ) =8.6097/(8.6097 + 39.1487) =0.18 13

MATHACH ij = γ 00 + γ 01 (MEANSES) + u 0j + r ij 14 Random Intercept Model with a Level 2 Predictor Research question is whether the SES level of schools are associated with the mathematic achievements of students. Level 1 Model: MATHACH ij = β 0j + r ij Level 2 Model: β 0j = γ 00 + γ 01 (MEANSES) + u 0j Full model:

Random-coefficient Model with a Level 1 Predictor Research Questions is whether student s SES level, relative to that of other students in their schools, is associated with the mathematics achievement levels, and if so, whether such associations differ for schools. Level 1 Model: MATHACH ij = β 0j + β 1j (SES - MEANSES) + r ij Level 2 Model: β 0j = γ 00 + u 0j β 1j = γ 10 + u 1j Full Model: MATHACH ij = γ 00 + γ 10 (SES - MEANSES) + u 0j + u 1j (SES - MEANSES) + r ij 15

Random-coefficient Model with Predictors from Two Different Levels Research Questions is whether student s SES level, relative to that of other students in their schools, is associated with the mathematics achievement levels, and if so, whether the associations differ for public schools than for private schools. Level 1 model: MATHACH ij = β 0j + β 1j (SES - MEANSES) + r ij Level 2 model: β 0j = γ 00 + γ 01 (MEANSES) + γ 02 (SECTOR) + u 0j β 1j = γ 10 + γ 11 (MEANSES) + γ 12 (SECTOR) + u 1j Full Model: MATHACH ij = γ 00 + γ 01 (MEANSES) + γ 02 (SECTOR) + γ 10 (SES - MEANSES) + γ 11 (MEANSES)* (SES - MEANSES) + γ 12 (SECTOR)* (SES - MEANSES) + u 0j +u 1j (SES-MEANSES) + r ij 16

Unconditional Growth Curve Models Research Question is whether the change of dependent variable (Y) has a linear association with time. Level 1 model: Y ij =p 0j + p 1j (TIME) ij + r ij Level 2 model: p 0j =b 00 + u 0j p 1j =b 10 + u 1j Full Model: Y ij =[b 00 + b 10 TIME ij ] + [u 0j + u 1j TIME ij + r ij ] 17

Growth Curve Model with a Level 2 Predictor Research Question is whether the change of dependent variable (i.e., the change trajectory of Y) has a linear association with time and whether the individuals characteristics will influence this trajectory. Level 1 Model: Y ij =p 0j + p 1j (TIME) ij + r ij Level 2 Model: p 0j =b 00 + b 01 COVAR j + u 0j p 1j =b 10 + b 11 COVAR j + u 1j Full Model: Y ij =b 00 + b 10 (TIME) ij + b 01 (COVAR) ij + b 11 (COVAR)(TIME) ij + u 0j + u 1j (TIME) ij + r ij 18

SAS Codes for Multi-Level Models Unconditional Random Intercept Model: proc mixed data = in.hsb12 covtest noclprint; class school; model mathach = / solution; random intercept / subject = school; run; Random Intercept Model with a level 2 predictor: proc mixed data = in.hsb12 covtest noclprint; class school; model mathach = meanses / solution ddfm = bw; random intercept / subject = school; run; 19

SAS Codes for Multi-Level Models (cont.) Random-coefficient Model with a Level 1 predictor: data hsbc; set in.hsb12; cses = ses - meanses; run; proc mixed data = hsbc noclprint covtest noitprint; class school; model mathach = cses / solution ddfm = bw notest; random intercept cses / subject = school type = un gcorr; run; Random-coefficient Model with predictors from two different levels: proc mixed data = hsbc noclprint covtest noitprint; class school; model mathach = meanses sector cses meanses*cses sector*cses / solution ddfm = bw notest; random intercept cses / subject = school type = un; run; 20

SAS Codes for Growth Curve Models Unconditional Growth Curve Model without predictors: proc mixed data = in.willett noclprint covtest; class id; model y = time /solution ddfm = bw notest; random intercept time / subject = id type = un; run; Growth Curve Model with a level 1 predictor: data in.willett2; set in.willett; wave = time; ccovar = covar - 113.4571429; run; proc mixed data = in.willett2 noclprint covtest; class id; model y = time ccovar time*ccovar /solution ddfm = bw notest; random intercept time / subject = id type = un gcorr; run; 21

Stata Codes for Multi-Level Models Unconditional Random Intercept Model mixed mathach school:, variance Random Intercept Model with a level 2 predictor mixed mathach meanses school:, variance Random-coefficient Model with a Level 1 predictor mixed mathach cses school: cses, variance cov(un) Random-coefficient Model with predictors from two different levels generate msesxcses = meanses*cses generate secxcses = sector*cses mixed mathach meanses sector cses msesxcses secxcses school: cses, variance cov(un) 22

Stata Codes for Growth Curve Models Unconditional Growth Curve Model without predictors mixed y time id: time, variance cov(un) Growth Curve Model with a level 1 predictor mixed y time ccovar timebyccovar id: time, variance cov(un) 23

Conclusions Social scientists often deal with nested data because individuals are embedded within their environments. Hierarchical modeling allows researchers to take into account the associations among variables from different levels. You can do simple HLM models with SAS, Stata, HLM, SPSS, R, LISREL, or Mplus. However, some complex models may only be analyzed with a certain software. If you have any questions, please send me a note. 24

Journal Articles and book: References Judith Singer (1998) Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models. Journal of Educational and Behavioral Statistics,23, 323-355. Maimon, David and Danielle Kuhl (2008) Social Control and Youth Suicidality: Situating Durkheim s Ideas in A Multilevel Framework. American Sociological Review, 73, 921-943. Multilevel Models in Family Research (2002) Some Conceptual and Methodological Issues. Journal of Marriage and Family, 64,280-294. Snijders, T and Bosker, (2000) Multilevel Analysis: An introduction to basic and advanced multilevel modeling, London, Sage. Webpages for Multilevel models: Princeton University http://data.princeton.edu/pop510/default.html UCLA http://www.ats.ucla.edu/stat/sas/topics/mlm.htmhttp://www.ats.ucla.edu/stat/stata/topics/ml M.htm 25