Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca
OVERVIEW Mplus modeling framework Mplus language Examples of research using Mplus
WHAT IS MPLUS? Statistical modeling program for structural equation modeling... and more Extremely flexible modeling framework: Multiple types of data formats (variables) Multiple types of statistical models (relationships) Code-based path-centric specification
MODELING FRAMEWORK Describes structure of the data (rectangles and circles) Describes relationships between variables (arrows) Acknowledges complex data structures: Multilevel and multiple population data
DATA STRUCTURE Observed variables (rectangles) Latent variables (circles) Combinations
OBSERVED OUTCOME VARIABLES Continuous (y) Categorical (u): Censored Binary Ordered categorical (ordinal) Unordered categorical (nominal) Counts Combinations: Single model (y with u)
LATENT VARIABLES Continuous latent variables (f): Continuous indicators Categorical indicators Categorical latent variables (c) Measurement model Group membership
RELATIONSHIPS: OBSERVED VARIABLES Linear regression for continuous outcomes Probit or logistic regression for binary outcomes Poisson or zero-inflated regression for count outcomes Simultaneous modeling of several related relationships: Path analysis
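As a rough illustration of the regression types listed above, a logistic regression for a binary outcome might be set up as follows. This is only a sketch: the file name and variable names (binary.dat, u1, x1, x2) are hypothetical, and it assumes that declaring the outcome as CATEGORICAL and requesting ML estimation yields a logit link.

```mplus
TITLE: Sketch of a logistic regression for a binary outcome
DATA: FILE IS binary.dat;        ! hypothetical file name
VARIABLE: NAMES ARE u1 x1 x2;    ! hypothetical variable names
  CATEGORICAL IS u1;             ! u1 is treated as binary/ordinal
ANALYSIS: ESTIMATOR = ML;        ! assumed to give a logit link
MODEL: u1 ON x1 x2;              ! u1 regressed on x1 and x2
! For a count outcome, COUNT IS u1; would replace CATEGORICAL.
```

Listing several ON statements in the same MODEL command turns this single regression into a path analysis.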
RELATIONSHIPS: LATENT VARIABLES Continuous latent variables: Structural equation modeling Categorical latent variables: Mixture modeling Latent class analysis Latent variable interactions
RELATIONSHIPS: ALL VARIABLES Ability to combine all types of variables and all types of relationships into a single analytical framework
MODELING POSSIBILITIES: EXAMPLES Complex survey data Multiple group analysis Multilevel modeling Mixture modeling Latent class analysis Longitudinal data analysis Modeling with missing data Monte Carlo simulations And more
COMPLEX SURVEY DATA Adjustment of standard errors: Takes into account stratification and/or nonindependence of observations Unequal probabilities of selection (sampling weights) Multilevel framework: Specify a separate model for each level of the multilevel data Both approaches can be combined
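The standard-error adjustment described above can be sketched as follows. The file name and variable names (survey.dat, strat, psu, wt) are hypothetical placeholders for the stratum, cluster, and sampling weight variables.

```mplus
TITLE: Sketch of a regression with complex survey adjustments
DATA: FILE IS survey.dat;          ! hypothetical file name
VARIABLE: NAMES ARE y x1 x2 strat psu wt;
  USEVARIABLES ARE y x1 x2;        ! variables in the model
  STRATIFICATION IS strat;         ! stratum identifier
  CLUSTER IS psu;                  ! primary sampling unit
  WEIGHT IS wt;                    ! sampling weight
ANALYSIS: TYPE = COMPLEX;          ! adjusts standard errors
MODEL: y ON x1 x2;
```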
MULTILEVEL MODELING Multilevel models separate the overall variance into two sources: Within (individual-level variation) Between (group-level variation) Allows random intercepts and random slopes Random effects can be specified for any relationship
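A two-level regression with a random intercept and a random slope might look like the sketch below. The file name and variable names (twolevel.dat, y, x, w, clus) are hypothetical; x is an individual-level predictor and w a group-level predictor.

```mplus
TITLE: Sketch of a two-level model with a random slope
DATA: FILE IS twolevel.dat;      ! hypothetical file name
VARIABLE: NAMES ARE y x w clus;
  WITHIN = x;                    ! individual-level predictor
  BETWEEN = w;                   ! group-level predictor
  CLUSTER = clus;                ! group identifier
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
  %WITHIN%
  s | y ON x;                    ! s is the random slope of y on x
  %BETWEEN%
  y s ON w;                      ! intercept and slope regressed on w
  y WITH s;                      ! intercept-slope covariance
```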
MIXTURE MODELING Modeling with categorical latent variables Represent subpopulations where population membership is not known but inferred from the data
LATENT CLASS ANALYSIS A special case of mixture modeling Explains relationships among observed dependent variables Provides classification of individuals into more homogeneous sub-groups
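A latent class analysis with four binary indicators and two classes could be sketched as follows. The file names and variable names are hypothetical, and the example relies on the default mixture model rather than spelling out a MODEL command.

```mplus
TITLE: Sketch of a latent class analysis with binary indicators
DATA: FILE IS lca.dat;             ! hypothetical file name
VARIABLE: NAMES ARE u1-u4;
  CLASSES = c (2);                 ! one latent class variable, 2 classes
  CATEGORICAL = u1-u4;             ! binary indicators
ANALYSIS: TYPE = MIXTURE;
SAVEDATA: FILE IS lca_classes.dat; ! hypothetical output file
  SAVE = CPROBABILITIES;           ! class probabilities and membership
```

The saved file is where the classification of individuals into sub-groups ends up.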
LONGITUDINAL DATA ANALYSIS Broad class of statistical methods for longitudinal data Latent growth curve analysis Resembles classic confirmatory factor analysis Multilevel modeling
MODELING WITH MISSING DATA Several options for estimating models with missing data Estimation based on two assumptions: Missing completely at random Missing at random Non-ignorable missing data modeling: Categorical outcomes as indicators of missingness Generates and analyzes multiple data sets using multiple imputation Computes bootstrapped standard errors
MONTE CARLO SIMULATIONS Extensive Monte Carlo facilities for data generation and data analysis Generates several types of data based on specified parameters Can be used for power analysis Other Monte Carlo features: Saving generated data and parameter estimates Analytical results from each replication can be saved in an external file
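A small simulation for power analysis might be set up along these lines. All values are illustrative assumptions: a standardized slope of 0.3, a sample size of 200, and 500 replications. The file name mc_results.dat is hypothetical.

```mplus
TITLE: Sketch of a Monte Carlo study for a simple regression
MONTECARLO:
  NAMES ARE y x;
  NOBSERVATIONS = 200;           ! sample size per replication
  NREPS = 500;                   ! number of replications
  SEED = 12345;
  RESULTS = mc_results.dat;      ! estimates from each replication
MODEL POPULATION:                ! data-generating values
  [x@0]; x@1;                    ! x has mean 0 and variance 1
  y ON x*0.3;
  y*0.91;                        ! residual variance, so var(y) = 1
MODEL:                           ! analysis model for each data set
  y ON x*0.3;
  y*0.91;
```

In the output, the proportion of replications in which the slope is significant serves as the power estimate.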
OTHER USEFUL FEATURES Indirect effects (specific paths) Bootstrap standard errors and confidence intervals Robust estimation of standard errors and chi-square tests for model fit And more
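Specific indirect effects with bootstrapped confidence intervals can be requested roughly as follows. The sketch reuses the variable layout of the path analysis example from the Mplus User's Guide (ex3.11.dat); the number of bootstrap draws is an arbitrary choice.

```mplus
TITLE: Sketch of indirect effects with bootstrap intervals
DATA: FILE IS ex3.11.dat;
VARIABLE: NAMES ARE y1-y3 x1-x3;
ANALYSIS: BOOTSTRAP = 1000;      ! number of bootstrap draws (arbitrary)
MODEL: y1 y2 ON x1 x2 x3;
       y3 ON y1 y2 x2;
MODEL INDIRECT:
  y3 IND y1 x1;                  ! x1 -> y1 -> y3
  y3 IND y2 x1;                  ! x1 -> y2 -> y3
OUTPUT: CINTERVAL (BCBOOTSTRAP); ! bias-corrected bootstrap intervals
```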
COMMAND STRUCTURE Mplus is a command-based program There are ten Mplus commands: TITLE: DATA: VARIABLE: DEFINE: ANALYSIS: MODEL: OUTPUT: SAVEDATA: PLOT: MONTECARLO:
GENERAL RULES All commands must begin on a new line and must be followed by a colon (:) Some commands have numerous subcommands Semicolons (;) separate subcommands Individual lines of code cannot exceed 80 characters Not case sensitive (only variable names are case sensitive) Exclamation mark in front (!) serves as a comment character
TITLE COMMAND Specifies a title that will be printed on each page of the output file No limit on length
DATA COMMAND Specifies where the data file is located and the format of the data Records may be in free format or fixed format Accepts covariance or correlation matrices Data files from other statistical packages have to be converted: SAS and SPSS: fixed format ASCII file STATA: stata2mplus function
DEFINE COMMAND Allows for transformation and creation of new variables Supports a large number of transformation functions Allows for conditional statements Selection of observations
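The transformations and conditional statements mentioned above might be used as in this sketch. The file name and variable names are hypothetical. It assumes that variables created in DEFINE are listed at the end of USEVARIABLES.

```mplus
TITLE: Sketch of variable creation with DEFINE
DATA: FILE IS define.dat;            ! hypothetical file name
VARIABLE: NAMES ARE y1-y3 x1 age;
  USEVARIABLES ARE x1 total senior;  ! new variables listed last
DEFINE:
  total = y1 + y2 + y3;              ! sum score
  IF (age GE 65) THEN senior = 1;    ! conditional statements
  IF (age LT 65) THEN senior = 0;
MODEL: total ON x1 senior;
```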
ANALYSIS COMMAND Specifies analysis type(s) and estimation procedure Many estimation options are available Some analyses require additional commands
MODEL COMMAND: OVERVIEW Specifies the parameters of the model Models are built in terms of relationships between variables: Variable RELATIONSHIP Variable
MODEL COMMAND: RELATIONSHIPS BY keyword ("measured by"): Defines the latent variables ON keyword ("regressed on"): Structural path between variables WITH keyword ("correlated with"): Correlation between two variables
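The three keywords can be combined in a single MODEL command. The fragment below is a sketch with hypothetical variable names (y1-y6, f1, f2); it is not a complete input file.

```mplus
MODEL:
  f1 BY y1-y3;    ! f1 is measured by y1, y2, y3
  f2 BY y4-y6;    ! f2 is measured by y4, y5, y6
  f2 ON f1;       ! f2 is regressed on f1
  y1 WITH y4;     ! residuals of y1 and y4 are correlated
```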
MODEL COMMAND: PARAMETERS Variances: Variable name without brackets [Means] or thresholds [catvar$1]: Variable name inside square brackets {Scale factors}: Variable name in curly brackets
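The bracket conventions can be illustrated with a short fragment. Variable names are hypothetical: y1 is assumed to be continuous and u1 to be declared CATEGORICAL. The fragment only shows the notation and is not meant as an identified model.

```mplus
MODEL:
  y1;          ! variance (or residual variance) of y1
  [y1];        ! mean (or intercept) of y1
  [u1$1];      ! first threshold of the categorical variable u1
  {u1@1};      ! scale factor of u1, here fixed at 1
```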
OUTPUT COMMAND Specifies optional outputs to be generated Mplus creates an output file using the extension .out (text file) Specific elements of output can be included or suppressed
SAVEDATA COMMAND Determines what to save in new text files Analysis dependent outputs Datasets Parameter estimates Latent class memberships Cook's distances or influence statistics
PLOT COMMAND Provides graphical displays of observed data and results: Histograms / scatterplots Individual observed and estimated values Sample and estimated means and proportions/probabilities Available for: Total sample By group / class Adjusted for covariates Editing and exporting of plots
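The OUTPUT, SAVEDATA, and PLOT commands are typically appended to an analysis. The sketch below adds them to a small two-factor model; the file names and variable names are hypothetical, and the particular options are only examples of what can be requested.

```mplus
TITLE: Sketch of optional output, saved data, and plots
DATA: FILE IS cfa.dat;             ! hypothetical file name
VARIABLE: NAMES ARE y1-y6;
MODEL: f1 BY y1-y3;
       f2 BY y4-y6;
OUTPUT: SAMPSTAT STANDARDIZED;     ! sample statistics, std. estimates
SAVEDATA: FILE IS cfa_scores.dat;  ! hypothetical output file
  SAVE = FSCORES;                  ! save factor scores
PLOT: TYPE IS PLOT1;               ! histograms and scatterplots
```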
DEFAULTS The command language is set up with defaults to minimize the amount of text Version specific defaults Example: Missing data Depending on the version, Mplus assumes either that there are no missing values or that missing values are missing at random (FIML estimation) Listwise deletion must be specified under the DATA command
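Overriding the missing data default might look like the following sketch. The file name, variable names, and the missing value code -999 are hypothetical, and the LISTWISE option is assumed to be available in the installed version.

```mplus
TITLE: Sketch of a regression with listwise deletion
DATA: FILE IS missing.dat;       ! hypothetical file name
  LISTWISE = ON;                 ! request listwise deletion
VARIABLE: NAMES ARE y x1 x2;
  MISSING ARE ALL (-999);        ! -999 flags a missing value
MODEL: y ON x1 x2;
```

Dropping the LISTWISE line leaves the version's default handling in place.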
SUMMARY: PROS Many great features not available in other packages Ability to combine various data types Path-centric specification: Relatively intuitive and easy to learn Extensions to larger models are easy to implement Commitment to development Excellent support
SUMMARY: CONS Cost: Mplus is a commercial package Annual fee: Support and updates Matrix specification is not supported No data management beyond Monte Carlo capabilities, transformations, and selection of observations
ADDITIONAL RESOURCES Technical and theoretical support: Homepage: www.statmodel.com Discussion forum: www.statmodel.com/cgi-bin/discus/discus.cgi Online manuals and tutorials Other websites: http://www.ats.ucla.edu/stat/mplus/
MPLUS COMMERCIAL VERSION Current version: 6.0 (new!) Base Program: 595 USD Mixture "add-on": 745 USD Multilevel "add-on": 745 USD Combination "add-on": 895 USD
MPLUS DEMO VERSION Free version of the software www.statmodel.com/demo.shtml Limit on the number of variables 2 independent variables 6 dependent variables
CONCLUSION Advantages and disadvantages of using only one program Each program has strengths and weaknesses Use the correct one for the problem at hand
EXAMPLES Path analysis (3.11) Structural equation modeling (5.11) Latent growth curve analysis Quadratic growth (6.9) Parallel processes (6.13) Mixture modeling (7.1) Advanced models Latent class growth curve analysis Complier average causal effect
PATH ANALYSIS
TITLE: Path analysis with continuous dependent variables
DATA: FILE IS ex3.11.dat;
VARIABLE: NAMES ARE y1-y3 x1-x3;
MODEL: y1 y2 ON x1 x2 x3;
       y3 ON y1 y2 x2;
STRUCTURAL EQUATION MODEL
TITLE: SEM with continuous indicators
DATA: FILE IS ex5.11.dat;
VARIABLE: NAMES ARE y1-y12;
MODEL: f1 BY y1-y3;
       f2 BY y4-y6;
       f3 BY y7-y9;
       f4 BY y10-y12;
       f4 ON f3;
       f3 ON f1 f2;
LATENT GROWTH MODEL
TITLE: Quadratic growth model
DATA: FILE IS ex6.9.dat;
VARIABLE: NAMES ARE y11-y14;
MODEL: i s q | y11@0 y12@1 y13@2 y14@3;
PLOT: TYPE IS PLOT3;
      SERIES = y11 (0) y12 (1) y13 (2) y14 (3);
LATENT GROWTH MODEL
TITLE: Growth model for two parallel processes
DATA: FILE IS ex6.13.dat;
VARIABLE: NAMES ARE y11-y14 y21-y24;
MODEL: i1 s1 | y11@0 y12@1 y13@2 y14@3;
       i2 s2 | y21@0 y22@1 y23@2 y24@3;
       s1 ON i2;
       s2 ON i1;
MIXTURE MODEL
TITLE: Mixture regression analysis
DATA: FILE IS ex7.1.dat;
VARIABLE: NAMES ARE y x1 x2;
          CLASSES = c (2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  y ON x1 x2;
  c ON x1;
  %c#2%
  y ON x2;
  y;
GROWTH MIXTURE MODEL
COMPLIER AVERAGE CAUSAL EFFECT [Path diagram with nodes: Outcome, Compliance, Covariates, Treatment]