Performance of Latent Growth Curve Models with Binary Variables

Similar documents
CHAPTER 1 INTRODUCTION

Multiple Imputation with Mplus

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS

Binary IFA-IRT Models in Mplus version 7.11

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects

Introduction to Mplus

The Mplus modelling framework

CHAPTER 18 OUTPUT, SAVEDATA, AND PLOT COMMANDS

Comparison of computational methods for high dimensional item factor analysis

SEM 1: Confirmatory Factor Analysis

SEM 1: Confirmatory Factor Analysis

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

Estimation of Item Response Models

RESEARCH ARTICLE. Growth Rate Models: Emphasizing Growth Rate Analysis through Growth Curve Modeling

PSY 9556B (Feb 5) Latent Growth Modeling

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models

MPLUS Analysis Examples Replication Chapter 10

Enterprise Miner Tutorial Notes 2 1

Bagging & System Combination for POS Tagging. Dan Jinguji Joshua T. Minor Ping Yu

Missing Data Analysis for the Employee Dataset

Approximation methods and quadrature points in PROC NLMIXED: A simulation study using structured latent curve models

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing

SLStats.notebook. January 12, Statistics:

CHAPTER 12 ASDA ANALYSIS EXAMPLES REPLICATION-MPLUS 5.21 GENERAL NOTES ABOUT ANALYSIS EXAMPLES REPLICATION

Economics of Cybercrime The Influence of Perceived Cybercrime Risk on Online Service Adoption of European Internet Users

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer

Also, for all analyses, two other files are produced upon program completion.

Study Guide. Module 1. Key Terms

Missing Not at Random Models for Latent Growth Curve Analyses

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding

Example Using Missing Data 1

Generalized least squares (GLS) estimates of the level-2 coefficients,

Monte Carlo Simulation. Ben Kite KU CRMDA 2015 Summer Methodology Institute

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016

STATS PAD USER MANUAL

SAS Graphics Macros for Latent Class Analysis Users Guide

Recitation Supplement: Creating a Neural Network for Classification SAS EM December 2, 2002

Lecture 1: Statistical Reasoning 2. Lecture 1. Simple Regression, An Overview, and Simple Linear Regression

An Introduction to Growth Curve Analysis using Structural Equation Modeling

Supervised vs unsupervised clustering

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018

Bayesian Model Averaging over Directed Acyclic Graphs With Implications for Prediction in Structural Equation Modeling

Lecture 25: Review I

Last updated January 4, 2012

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY

Multicollinearity and Validation CIVL 7012/8012

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

Expectation-Maximization Methods in Population Analysis. Robert J. Bauer, Ph.D. ICON plc.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Adjusting for Unequal Selection Probability in Multilevel Models: A Comparison of Software Packages

STATISTICS (STAT) Statistics (STAT) 1

Regression. Dr. G. Bharadwaja Kumar VIT Chennai

2016 Stat-Ease, Inc. & CAMO Software

Feature selection. LING 572 Fei Xia

Introduction to Mixed Models: Multivariate Regression

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Statistical Matching using Fractional Imputation

Monte Carlo 1. Appendix A

MPLUS Analysis Examples Replication Chapter 13

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

Chapters 5-6: Statistical Inference Methods

Applying Supervised Learning

Missing Data Missing Data Methods in ML Multiple Imputation

Analysis of Complex Survey Data with SAS

Dealing with Categorical Data Types in a Designed Experiment

Latent Curve Models. A Structural Equation Perspective WILEY- INTERSCIENΠKENNETH A. BOLLEN

#NULL! Appears most often when you insert a space (where you should have comma) to separate cell references used as arguments for functions.

Missing Data Analysis with SPSS

An Introduction to the Bootstrap

1. Determine the population mean of x denoted m x. Ans. 10 from bottom bell curve.

ITSx: Policy Analysis Using Interrupted Time Series

Multiple Regression White paper

Implementing a Simulation Study Using Multiple Software Packages for Structural Equation Modeling

Correctly Compute Complex Samples Statistics

ABSTRACT ESTIMATING UNKNOWN KNOTS IN PIECEWISE LINEAR- LINEAR LATENT GROWTH MIXTURE MODELS. Nidhi Kohli, Doctor of Philosophy, 2011

Statistical Methods for the Analysis of Repeated Measurements

IQR = number. summary: largest. = 2. Upper half: Q3 =

A Beginner's Guide to. Randall E. Schumacker. The University of Alabama. Richard G. Lomax. The Ohio State University. Routledge

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Dual-Frame Weights (Landline and Cell) for the 2009 Minnesota Health Access Survey

Learn What s New. Statistical Software

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa

Categorical Data in a Designed Experiment Part 2: Sizing with a Binary Response

Evaluation Metrics. (Classifiers) CS229 Section Anand Avati

Data Statistics Population. Census Sample Correlation... Statistical & Practical Significance. Qualitative Data Discrete Data Continuous Data

CS 229 Midterm Review

LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

Lecture 9: Support Vector Machines

Product Catalog. AcaStat. Software

Linear Methods for Regression and Shrinkage Methods

CS229 Lecture notes. Raphael John Lamarre Townshend

3. Data Analysis and Statistics

Thus far, the examples used to motivate the utility of structural equation

Predict Outcomes and Reveal Relationships in Categorical Data

Transcription:

Performance of Latent Growth Curve Models with Binary Variables Jason T. Newsom & Nicholas A. Smith Department of Psychology Portland State University 1

Goal Examine estimation of latent growth curve models with diagonalized weighted least squares and categorical ML estimation approaches for binary variables Effects of sample size, number of time points, base rate proportion Examine convergence failures, parameter bias, standard error bias, Type I error, and coverage 2

Background Simulation of SEMs with binary/ordinal estimators suggest: Weighted least squares (WLS) estimation with polychoric correlations has better small sample performance than regular ADL/WLS approach (Flora & Curran, 2004; Maydeu-Olivares, 2001; Muthén, du Toit, & Spisic, 1997) Diagonalized WLS more computationally practical than inversion of full weight matrix (Muthén, 1993) Limited information method, which may be less optimal when values are NMAR (Asparouhov & Muthén, 2010) 3

Background Maximum likelihood for categorical variables Sometimes referred to as marginal maximum likelihood, uses expectation maximization (EM) algorithm with numeric integration to analyze full multiway frequency tables (Bock & Atkin, 1981; Christofersson, 1975; Muthén & Christofferson, 1981) Commonly employed with item response models (Demars, 2012; Kamata & Bauer, 2008) Parameter estimates that are interpretable as logistic regression coefficients Full-information estimation approach. Can be computationally intensive, but may be preferable to WLSMV when values are NMAR 4

Background DWLS method with robust adjustments outperforms unadjusted counterpart (Bandalos, 2014; DeMars, 2012), henceforth referred to as WLSMV Maximum likelihood (ML) for binary variables using robust standard errors also outperforms unadjusted ML counterpart, henceforth referred to as MLR (Bandalos, 2014; DeMars, 2012) Sample sizes of 100-150 have poor Type I error or coverage rates for WLSMV (Rhemtulla, Brosseau-Liard, & Savalei, 2012) 5

Background Several sources describe details of testing LGC models with binary or ordinal variables (e.g., Lee, Wickrama, & O'Neal, 2018; Masyn, Petras, & Liu, 2014; Mehta, Neale, & Flay, 2004; Newsom, 2015) Not widely used presently but popularity is likely to increase 6

Background Bandalos (2014) compared robust binary ML and DWLS and found that Robust ML and WLSMV performed comparably but had high Type I error rates with low N (150) and asymmetric data (up to N = 300) Robust ML performed better with asymmetric variables (rarer event) Concluded WLSMV was better all-around choice because of less bias in parameter estimates 7

Background Comparisons of categorical estimation methods have been based on factor models and some standard predictive models Latent growth curve models (LGC) differ in the large number of parameter constraints, parameters of interest (factor means and variances), and potential impact of the number of time points Few existing studies of performance of LGC models with categorical indicators, and more work is needed 8

Background Muthén (1996) conducted a small simulation study with binary variables in latent growth curve models Using LISCOMP (unadjusted DWLS) Low parameter bias for N=250 and N=1000 Fairly symmetric outcome, with baseline proportion of.64 Type I error rates were approximately nominal for N=250 (5.4%) 9

Background Finch (2017) conducted a simulation study with binary variables in latent growth curve models Examined WLSMV 4, 6, and 8 time points N = 200, 500, and N=1000 (but smaller N may be common for studies) Examined different baseline proportions, but neither condition particularly rare (.5 an.69) 10

Background Finch (2017) Better convergence with more time points Found low bias and good coverage rates in fixed effects (i.e., average intercept and slope estimates) even for N = 200 No evaluation of random effects Did not examine MLR estimation 11

Background The latent growth curve model ψ 01 α 0,ψ 00 α 1,ψ 11 1 η 0 1 1 1 0 1 η 1 2 3 y 1 y 2 y 3 y 4 ε 1 ε 2 ε 3 ε 4 12

Method Simulations Data were generated in SAS 9.4 using the RandomMVBinary macro (Wicklin, 2013) to produce proportions and change in proportions comparable to applied longitudinal studies (LLSE: Sorkin & Rook, 2004; HRS: Heeringa & Connor, 1995) Simulations were analyzed using the Mplus Version 8 (Muthén & Muthén, 1998-2017) MONTECARLO feature 13

Design Dependent Measures % bias slope means (fixed effects) % bias slope variances (random effects) Bias greater than 5% was considered unacceptable (Hoogland & Boomsma, 1998) 14

Design Bias Computation for Average Slopes Percent relative bias ˆ θ θ % Bias( ˆ θ ) = 100 θ Computation when parameter value equals 0 1 ( ) ( ) 1 ˆ =Φ Φ ( ) %Bias θ z z θ where Φ -1 is the cumulative standard normal probability, z ˆ = ( ˆ θ θ) / SD estimated from standard score values for the θ ˆ θ parameter, z θ, with the percentile for the population value for θ i equal to.5 ˆ θ 15

Dependent Measures Bias Computation for Variances Design ( ) E SE SD % Bias( SE) = 100 SD 16

Design Dependent Measures Type I error (% samples significant) for conditions in which the slope is zero 95% Coverage (% samples in which population value falls within confidence interval) for all other conditions 17

Design Independent Variables Sample size: 100, 200, 500, 1000 # time points: T = 3, 5, 7 84,000 replications, 500 per cell (e.g., Paxton et al., 2001) Average intercept: small proportion (.11) vs. medium proportion (.45) Average slope: 0 vs. mod effect (increase of p =.025 per wave) Slope variance: small vs. medium effect 18

Design Summary Average Intercept, α I Average Slope, α S SSS Small Small(Zero) Small SMS Small Medium Small Slope Variance,ψ s SMM Small Medium Medium MSS Medium Small(Zero) Small MSM Medium Small(Zero) Medium MMS Medium Medium Small MMM Medium Medium Medium 19

Results Convergence Models converged successfully for models in all conditions for both WLSMV and MLR Improper solutions Improper solutions (e.g., negative error variances) were common, sometimes as high as 80% of samples with small N and T Especially high for the low baseline proportion conditions (SSS, SMS, and SMM) 20

Improper Solutions SSS MSS MSM 21

SMS Improper Solutions SMM MMS MMM 22

Average Slope % Bias (zero slope) SSS MSS MSM 23

SMS Average Slope % Bias (medium slope) SMM MMS MMM 24

Average Slope SE % Bias (zero slope) SSS MSS MSM 25

SMS Average Slope SE % Bias (medium slope) SMM MMS MMM 26

Average Slope % Type I Error (zero slope) SSS MSS MSM 27

SMS Average Slope 95% Coverage (medium slope) SMM MMS MMM 28

SSS Slope Variance % Bias (small variance) SMS MSS MMS 29

SMM Slope Variance % Bias (medium variance) MSM MMM 30

SSS Slope Variance SE % Bias (small variance) SMS MSS MMS 31

SMM Slope Variance SE % Bias (medium variance) MSM MMM 32

SSS Slope Variance 95% Coverage (small variance) SMS MMS MSS 33

Slope Variance 95% Coverage (medium variance) SMM MSM MMM 34

Summary Convergence and Improper Solutions All models converged for all conditions with WLSMV and MLR Improper solutions decreased with larger N and more time points Improper solutions were generally uncommon for MLR for all conditions and far more common for WLSMV, particularly for N < 500 Even with N = 1000, WLSMV 30% of samples had improper solutions when T = 3 35

Average Slope Summary % bias was not problematic for either estimation method when N > 200 for both WLSMV and MLR, but was unacceptable for N = 100 when T = 3 % bias for SEs was generally acceptable except for N = 100 and T = 3 with MLR Type I errors were near nominal levels in all conditions, particularly when T > 3 95% coverage rates were appropriate for N > 200 for all T and both estimators 36

Slope Variance Summary % bias was erratic when T=3 for both MLR and WLSMV With T > 3, % bias was acceptable for WLSMV if N > 200 and MLR if N > 500 % bias for SEs was poor for T=3 for both MLR and WLSMV, with SEs better estimated by MLR than WLSMV when N = 200 95% coverage was poor for T = 3 for both WLSMV and MLR, and better for MLR when T = 5 37

Strengths Discussion Parameter estimates and standard errors in context of LGC models, where means and variances are of principal interest Variance estimates and robust ML for LGC models with binary variables has not be investigated Asymmetric distributions, comparing a low baseline proportion of.1 to larger baseline proportion.45. Included smaller sample size condition, N = 100. Smallest sample size studied by Finch (2017) was N = 200 38

Recommendations Discussion Convergence and improper solutions provide information of importance for practicing researchers: More than three time points needed MLR has fewer improper solutions N = 100 generally unacceptable when T = 3 for accurate slope SEs, Type I error, and coverage T= 3 insufficient for slope variance estimates 39

Limitations Discussion Limited to binary variables, more ordinal categories may improve estimation problems (Finch, 2007) Correctly specified models only, misspecifications should be examined (e.g., nonlinear models) Quadratic effects may require N = 1,000 or more (Finch, 2007) Estimation improved for fixed effects with ordinal variables and more time points (T = 6) 40

Thank you! Please contact Jason Newsom, newsomj@pdx.edu, with comments or questions. Acknowledgements: We appreciate helpful comments from Todd Bodner, Joel Steele, Liu-Qin Yang, and the Portland State University Stats Lunch group, and financial support from Portland State University. 41