Introduction to Mixed Models: Multivariate Regression

Similar documents
Missing Data Missing Data Methods in ML Multiple Imputation

Hierarchical Generalized Linear Models

Example Using Missing Data 1

Study Guide. Module 1. Key Terms

The linear mixed model: modeling hierarchical and longitudinal data

PRI Workshop Introduction to AMOS

Path Analysis using lm and lavaan

Performing Cluster Bootstrapped Regressions in R

Week 4: Simple Linear Regression II

ANNOUNCING THE RELEASE OF LISREL VERSION BACKGROUND 2 COMBINING LISREL AND PRELIS FUNCTIONALITY 2 FIML FOR ORDINAL AND CONTINUOUS VARIABLES 3

610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison

An introduction to SPSS

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background

Correctly Compute Complex Samples Statistics

SPSS INSTRUCTION CHAPTER 9

Conducting a Path Analysis With SPSS/AMOS

Generalized least squares (GLS) estimates of the level-2 coefficients,

Regression. Dr. G. Bharadwaja Kumar VIT Chennai

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Week 11: Interpretation plus

Beta-Regression with SPSS Michael Smithson School of Psychology, The Australian National University

An Introduction to Growth Curve Analysis using Structural Equation Modeling

Recall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation:

Missing Data Analysis with SPSS

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa

Goals of the Lecture. SOC6078 Advanced Statistics: 9. Generalized Additive Models. Limitations of the Multiple Nonparametric Models (2)

Applied Regression Modeling: A Business Approach

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Repeated Measures Part 4: Blood Flow data

Two-Stage Least Squares

DETAILED CONTENTS. About the Editor About the Contributors PART I. GUIDE 1

Week 5: Multiple Linear Regression II

Week 4: Simple Linear Regression III

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS

Laboratory for Two-Way ANOVA: Interactions

Panel Data 4: Fixed Effects vs Random Effects Models

Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

Splines and penalized regression

HLM versus SEM Perspectives on Growth Curve Modeling. Hsueh-Sheng Wu CFDR Workshop Series August 3, 2015

Generalized additive models I

STATISTICS (STAT) Statistics (STAT) 1

Introduction to Mixed-Effects Models for Hierarchical and Longitudinal Data

Missing Data Techniques

Applied Statistics and Econometrics Lecture 6

Multiple Group CFA in AMOS (And Modification Indices and Nested Models)

Section 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business

Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC.

CHAPTER 1 INTRODUCTION

Missing Data Analysis for the Employee Dataset

Statistical Pattern Recognition

LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2

Binary IFA-IRT Models in Mplus version 7.11

Robust Linear Regression (Passing- Bablok Median-Slope)

8. MINITAB COMMANDS WEEK-BY-WEEK

predict and Friends: Common Methods for Predictive Models in R , Spring 2015 Handout No. 1, 25 January 2015

10601 Machine Learning. Model and feature selection

Lecture 16: High-dimensional regression, non-linear regression

Generalized Least Squares (GLS) and Estimated Generalized Least Squares (EGLS)

Correctly Compute Complex Samples Statistics

Week 10: Heteroskedasticity II

Applied Multivariate Analysis

Multiple Linear Regression

UNIVERSITY OF CALIFORNIA, SANTA CRUZ BOARD OF STUDIES IN COMPUTER ENGINEERING

Missing Data. SPIDA 2012 Part 6 Mixed Models with R:

PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects

Missing Data Analysis for the Employee Dataset

Statistical Pattern Recognition

Linear Methods for Regression and Shrinkage Methods

THE ANALYSIS OF CONTINUOUS DATA FROM MULTIPLE GROUPS

May 24, Emil Coman 1 Yinghui Duan 2 Daren Anderson 3

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening

CSE 158. Web Mining and Recommender Systems. Midterm recap

Statistical Analysis of List Experiments

GETTING STARTED WITH THE STUDENT EDITION OF LISREL 8.51 FOR WINDOWS

Latent Curve Models. A Structural Equation Perspective WILEY- INTERSCIENΠKENNETH A. BOLLEN

Bayesian Model Averaging over Directed Acyclic Graphs With Implications for Prediction in Structural Equation Modeling

Analysis of Complex Survey Data with SAS

Health Disparities (HD): It s just about comparing two groups

Multidimensional Latent Regression

xxm Reference Manual

Estimating DCMs Using Mplus. Chapter 9 Example Data

Course Business. types of designs. " Week 4. circle back to if we have time later. ! Two new datasets for class today:

Section 2.3: Simple Linear Regression: Predictions and Inference

Salary 9 mo : 9 month salary for faculty member for 2004

. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education)

Lecture 7: Linear Regression (continued)

Brief Guide on Using SPSS 10.0

Standard Errors in OLS Luke Sonnet

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

**************************************************************************************************************** * Single Wave Analyses.

PSY 9556B (Feb 5) Latent Growth Modeling

- 1 - Fig. A5.1 Missing value analysis dialog box

ANSWERS -- Prep for Psyc350 Laboratory Final Statistics Part Prep a

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors

Package endogenous. October 29, 2016

Chapter 15 Mixed Models. Chapter Table of Contents. Introduction Split Plot Experiment Clustered Data References...

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Transcription:

Introduction to Mixed Models: Multivariate Regression EPSY 905: Multivariate Analysis Spring 2016 Lecture #9 March 30, 2016 EPSY 905: Multivariate Regression via Path Analysis

Today s Lecture Multivariate regression via mixed models Comparing and contrasting path analysis with mixed models Ø Differences in model fit measures Ø Differences in software estimation methods Ø Model comparisons via multivariate Wald tests (instead of LRTs) Ø How to compute R 2 EPSY 905: Multivariate Regression via Path Analysis 2

Today s Data Example Data are simulated based on the results reported in: Pajares, F., & Miller, M. D. (1994). Role of self-efficacy and self-concept beliefs in mathematical problem solving: a path analysis. Journal of Educational Psychology, 86, 193-203. Sample of 350 undergraduates (229 women, 121 men) Ø In simulation, 10% of variables were missing (using missing completely at random mechanism) Note: simulated data characteristics differ from actual data (some variables extend beyond their official range) Ø Simulated using Multivariate Normal Distribution w Some variables had boundaries that simulated data exceeded Ø Results will not match exactly due to missing data and boundaries EPSY 905: Multivariate Regression via Path Analysis 3

Variables of Data Example Sex (1 = male; 0 = female) Math Self-Efficacy (MSE) Ø Reported reliability of.91 Ø Assesses math confidence of college students Perceived Usefulness of Mathematics (USE) Ø Reported reliability of.93 Math Anxiety (MAS) Ø Reported reliability ranging from.86 to.90 Math Self-Concept (MSC) Ø Reported reliability of.93 to.95 Prior Experience at High School Level (HSL) Ø Self report of number of years of high school during which students took mathematics courses Prior Experience at College Level (CC) Ø Self report of courses taken at college level Math Performance (PERF) Ø Reported reliability of.788 Ø 18-item multiple choice instrument (total of correct responses) EPSY 905: Multivariate Regression via Path Analysis 4

Multivariate Linear Regression Path Diagram σ ' ( σ ( σ ',** ** σ ( '.** σ ','.** σ **,'.** Female (F) College Math Experience (CC) Female x College Math Experience (FxCC) β,-% ** β,-% '.** β ' $%&' β ',-% β $%&' ** β $%&' '.** Mathematics Performance (PERF) Mathematics Usefulness (USE) Direct Effect σ ( ":$%&' σ ":$%&',,-% σ ( ":,-% Residual (Endogenous) Variance Exogenous Variances Exogenous Covariances EPSY 905: Multivariate Regression via Path Analysis 5

The Big Picture Mixed models are used for many types of analyses: Ø Analogous to MANOVA and M-Regression (so repeated measures analyses) Ø Multilevel models for clustered, longitudinal, and crossed-effects data The biggest difference between mixed models software and path analysis software is in the assumed distribution of the exogenous variables Ø Mixed models: no distribution assumed Ø Path analysis: most software assumes multivariate normal (Mplus does not by default) Ø This affects how missing data are managed mixed models cannot have any missing IVs Mixed models also do not allow endogenous variables to predict other endogenous variables Ø No indirect effects are possible from a single analysis (multiple analyses needed) Mixed models software also often needs variables to be stored in socalled stacked or long-format data (one row per DV) Ø We used wide-format data for lavaan (one row per person) EPSY 905: Multivariate Regression via Path Analysis 6

Multivariate Linear Models Software Scorecard Feature Need Classical MANOVA/ M-Regression Dependent variables can predict other dependent variables simultaneously Variable-specific modeling options Robust Maximum Likelihood Estimation Residual Maximum Likelihood Estimation Ability to Incorporate Random Effects Multiple Covariance Structures (beyond Compound Symmetry and Unstructured) Approximate Model Fit Indices Different denominator df methods Path Analysis (lavaan unless stated) Path analysis definition NO YES NO Precise model development Provides lepto/platykurtic data protection Provides unbiased variances, covariance, and (some) path estimates Additional levels of dependency Model parsimony ( Curse of multidimensionality) Determining How Bad a Model May Be NO YES (directly) Mixed Models (nlme/lme4 unless stated) YES (with dummy codes) NO YES YES (most without LRT) YES (analogously) NO YES NO YES (two-level nested; Mplus does more) YES NO YES (some) YES (many in R; many more in SAS) NO YES NO NO NO YES PRE 906, SEM: Estimation 7

MULTIVARIATE REGRESSION/ANOVA/ANCOVA VIA PATH ANALYSIS EPSY 905: Multivariate Regression via Path Analysis 8

Multivariate Regression Before we dive into mixed models, we will begin with a multivariate regression model: Ø Predicting mathematics performance (PERF) with sex (F), college math experience (CC), and the interaction between sex and college math experience (FxCC) Ø Predicting perceived usefulness (USE) with sex (F), college math experience (CC), and the interaction between sex and college math experience (FxCC) PERF 3 = β $%&' 5 + β $%&' ' F 3 + β $%&' ** CC 3 + β $%&' $%&' '.** F 3 CC 3 + e 3 USE 3 = β,-% 5 + β,-% ' F 3 + β,-% ** CC 3 + β,-%,-% '.** F 3 CC 3 + e 3 We denote the residual for PERF as e 3 $%&' and the residual for USE as e 3,-% Ø We also assume the residuals are Multivariate Normal: e 3 $%&' ( σ ":$%&',,-% 0,-% N ( e 3 0, σ ":$%&' σ ":$%&',,-% σ ( ":,-% EPSY 905: Multivariate Regression via Path Analysis 9

Before Continuing: We will Center CC at 10 EPSY 905: Multivariate Regression via Path Analysis 10

Types of Variables in the Analysis An important distinction in path analysis is between endogenous and exogenous variables Endogenous variable(s): variables whose variability is explained by one or more variables in a model Ø In linear regression, the dependent variable is the only endogenous variable in an analysis w Mathematics Performance (PERF) and Mathematics Usefulness (USE) Exogenous variable(s): variables whose variability is not explained by any variables in a model Ø In linear regression, the independent variable(s) are the exogenous variables in the analysis w Female (F), college experience (CC), and the interaction (FxCC) These distinctions are not commonly used in mixed models: Ø Endogenous variables are called dependent variables or outcomes Ø Exogenous variables are called independent variables or predictors EPSY 905: Multivariate Regression via Path Analysis 11

Multivariate Linear Regression Path Diagram σ > ( σ ( σ ',** ** σ ( '.** σ ','.** σ **,'.** Male (M) College Math Experience (CC) Female x College Math Experience (FxCC) β,-% ** β,-% '.** β > $%&' β >,-% β $%&' ** β $%&' '.** Mathematics Performance (PERF) Mathematics Usefulness (USE) Direct Effect σ ( ":$%&' σ ":$%&',,-% σ ( ":,-% Residual (Endogenous) Variance Exogenous Variances Exogenous Covariances EPSY 905: Multivariate Regression via Path Analysis 12

CONVERTING DATA FROM WIDE- TO LONG-FORMAT USING THE RESHAPE FUNCTION EPSY 905: Multivariate Regression via Path Analysis 13

First Step: Convert Data from Wide to Long Original wide-format data (all DVs for a person on one row) Reshape commands: Resulting data: EPSY 905: Multivariate Regression via Path Analysis 14

Second Step: Remove any Missing Values Because all variables are not put into the likelihood function, any missing values (on any DV or IV) are not permitted Ø For IVs, omitting variables has serious implications for the analysis (stay tuned we ll talk about this more in the multiple imputation portion) Ø For DVs, because the DVs are part of the likelihood, they can be missing without problem (well without problems that aren t easy to handle) So, we must remove all missing values from all variables: EPSY 905: Multivariate Regression via Path Analysis 15

Third Step: Create Dummy Code for DV In stacked data, the dependent variable values are all put into one column (variable) of the data set As mixed model software has only one spot for a single outcome, we will trick the software into modeling more than one by using a dummy coded variable that will be part of our analysis, making each effect conditional on the correct DV We ve decided to put USE as the reference variable Ø Like reference group, but now for DV EPSY 905: Multivariate Regression via Path Analysis 16

BUILDING THE EMPTY MULTIVARIATE MODEL IN MIXED MODELS SOFTWARE EPSY 905: Multivariate Regression via Path Analysis 17

Multivariate Regression from the Path Analysis Method If we were to put our empty model into a path analysis framework, we would end up with the following: PERF 3 = β 5 $%&' + e 3 $%&' USE 3 = β 5,-% + e 3,-% We denote the residual for PERF as e 3 $%&' and the residual for USE as e 3,-% Ø We also assume the residuals are Multivariate Normal: e 3 $%&' ( σ ":$%&',,-% 0,-% N ( e 3 0, σ ":$%&' σ ":$%&',,-% σ ( ":,-% EPSY 905: Multivariate Regression via Path Analysis 18

Building the Empty Model: Not So Empty A multivariate model using mixed model software uses the dummy code for DV to make all effects conditional on the specific DV in the model Ø Here I use the symbol δ to represent each fixed effect in the multivariate model from the mixed model perspective Ø I will compare/contrast these with the symbols β from the fixed effects in path analysis For instance, our empty model is thus: Y A,BC = δ 5 + δ D dperf A + e A,BC The prediction is conditional on the value of dperf: Ø When dperf = 0 à DV = Use à USE A = δ 5 + e A,H" Ø When dperf = 1 à DV = Perf à PERF A = δ 5 + δ D + e A $"IJ EPSY 905: Multivariate Regression via Path Analysis 19

Estimating the Empty Model From the nlme library, we use the gls() function Ø Be sure the library is installed and loaded before trying this! model = score ~ 1 + dperf: Ø DV ~ à put the name of the DV on the left of the ~ Ø 1 à Unnecessary but indicates the intercept is estimated Ø dperf à The rest of the IVs go between plus signs after the ~ method = REML à Don t change (more details next) correlation = corsymm(form = ~1 id) Ø Provides estimates of all unique correlations Ø Needs id variable name after for program to know which data comes from which person weights = varident(form = ~1 DV) Ø Estimates a different (residual) variance for each DV Ø With correlation line ensures an unstructured model is estimated EPSY 905: Multivariate Regression via Path Analysis 20

Empty Model Results: Fixed Effects and Model Fit EPSY 905: Multivariate Regression via Path Analysis 21

Empty Model Results: Residual Covariance Matrix The residual covariance matrix comes from the getvarcov() function: EPSY 905: Multivariate Regression via Path Analysis 22

Mapping Multivariate Mixed Models onto Path Models To compare this result with the path analyses we conducted previously, we ll have to use this data set Ø Omit the same observations So, we ll need to take our long-format data and reshape it into wide-format: EPSY 905: Multivariate Regression via Path Analysis 23

Lavaan Model Syntax EPSY 905: Multivariate Regression via Path Analysis 24

Comparing and Contrasting Results: Model Fit EPSY 905: Multivariate Regression via Path Analysis 25

Comparing and Contrasting Results: Parameter Estimates EPSY 905: Multivariate Regression via Path Analysis 26

RESIDUAL MAXIMUM LIKELIHOOD ESTIMATION (UNBIASED VARIANCE COMPONENTS) EPSY 905: Multivariate Regression via Path Analysis 27

Residual Maximum Likelihood Estimation The ML estimator is nice, but the variance estimate is downward biased (too small) Ø Remember it divides by N for the residual covariance matrix In small samples, this is likely to lead to biased estimates and incorrect p-values Ø The variance goes into the SE, which goes into the Wald test, which dictates the p-value for the beta Instead, another maximum likelihood technique has been developed: REML Ø Ø Ø Maximizes the likelihood of the residuals rather than the data Has unbiased estimates of the residual covariance matrix Is the default method of estimation for most mixed model estimation packages There is one catch to REML: you cannot use a LRT to compare nested models with differing fixed effects Ø Ø Ø Because the algorithm uses residuals, if the residuals change, the likelihood changes Residuals come from the fixed effects à if fixed effects are different, then residuals change, causing the likelihood to change Can use multivariate Wald test for fixed effects Don t mix ML and REML for the same analysis PRE 905: Lecture 8 - Multivariate Empty Models 28

ADDING PREDICTORS TO THE MODEL EPSY 905: Multivariate Regression via Path Analysis 29

Adding Predictors To The Model Adding predictors to the model is similar to adding predictors in regular regression models Ø No 0* terms in syntax to remove By using REML we cannot compare models using likelihood ratio tests Ø REML LRTs must have same fixed effects Ø Adding predictors adds new fixed effects to the empty model We are predicting each DV with male, cc10, and male*cc10 EPSY 905: Multivariate Regression via Path Analysis 30

Model with Predictors: Syntax Interpret the Parameters: β 5 : β K$"IJ dperf A : β >LM" Male A : β K$"IJ >LM" dperf A Male A : β **D5 CC10 A : β K$"IJ **D5 dperf A CC10 A : β >LM" **D5 CC10 A Male A : β K$"IJ >LM" **D5 dperf A CC10 A Male A : EPSY 905: Multivariate Regression via Path Analysis 31

First Question: Which Model Fits Better? After adding the predictors (estimating their betas) to the model, we must first ask which model fits better A likelihood ratio test (LRT) cannot be performed as we are using REML Which model is the null model? Ø Model01 Which model is the alternative model? Ø Model02 What is the null hypothesis? Ø H 5 : What is the alterative hypothesis? EPSY 905: Multivariate Regression via Path Analysis 32

Syntax for Multivariate Wald Test for Model Fit EPSY 905: Multivariate Regression via Path Analysis 33

Results for Model Fit Note: R s nlme function doesn t do a good job with df.residual and provides a Chi-square test Ø SAS is way more advanced on this as there are multiple options Also note there are 6 degrees of freedom (one for each additional beta weight in the model) EPSY 905: Multivariate Regression via Path Analysis 34

Up Next: Inspect Parameters and Make Interpretations Using the summary() function for the model: But, there is a way to directly code these beta weights into path model analogs using the glht() function EPSY 905: Multivariate Regression via Path Analysis 35

Direct Betas using Linear Combinations and glht() EPSY 905: Multivariate Regression via Path Analysis 36

Questions to Answer about this Model What is the effect of college experience on usefulness for males? What is the effect of college experience on usefulness for females? What is the difference between males and females ratings of usefulness when college experience = 10? How did the difference between males and females ratings change for each additional hour of college experience? EPSY 905: Multivariate Regression via Path Analysis 37

Questions to Answer about this Model What is the effect of college experience on performance for males? What is the effect of college experience on performance for females? What is the difference between males and females performance when college experience = 10? How did the difference between males and females performance change for each additional hour of college experience? EPSY 905: Multivariate Regression via Path Analysis 38

Model R-squared To determine the model R-squared, we have to compare the variance/covariance matrix from model01 and model02 and make the statistics ourselves: EPSY 905: Multivariate Regression via Path Analysis 39

WRAPPING UP AND REFOCUSING EPSY 905: Multivariate Regression via Path Analysis 40

Differences Between Mixed and Path Model Results Things we get directly from path models that we do not get directly in mixed models: Ø Tests for approximate model fit Ø Scaled Chi-square for some types of non-normal data Ø Standardized parameter coefficients Ø Tests for indirect effects Ø R-squared statistics Things we get directly in mixed models that we do not get in path models: Ø REML (unbiased estimates of variances/covariances) EPSY 905: Multivariate Regression via Path Analysis 41

Wrapping Up In this lecture we discussed the basics of mixed model analyses for multivariate models Ø Model specification/identification Ø Model estimation Ø Model modification and re-estimation Ø Final model parameter interpretation There is a lot to the analysis but what is important to remember is the over-arching principal of multivariate analyses: covariance between variables is important Ø Mixed models imply very specific covariance structures Ø The validity of the results still hinge upon accurately finding an approximation to the covariance matrix EPSY 905: Multivariate Regression via Path Analysis 42