An Introduction to Multiple Imputation and its Application


Craig K. Enders
University of California, Los Angeles, Department of Psychology
cenders@psych.ucla.edu

Background

Work supported by Institute of Education Sciences award R35D

Missing Data Theory

Rubin's (1976) missing data mechanisms describe different relations between the probability of nonresponse and the data. Each incomplete variable consists of observed scores and hypothetical values for the nonresponders, and the mechanisms describe different relations between nonresponse and these observed and hypothetical scores.

Motivating Example

Participants from a smoking cessation study report the number of years smoking and the number of cigarettes smoked. 25% of respondents do not report the number of cigarettes smoked. (Data excerpt with columns Years and Cigarettes; NA marks a missing cigarettes score.)

Missing Completely At Random (MCAR)

The probability of missing data on Y is unrelated to other variables and to the hypothetical values of Y; there are no systematic determinants of missingness. Examples: a data collection app fails to capture or transmit data, or participants forget to respond for idiosyncratic reasons.

Missing At Random (MAR)

The probability of missing data on Y is related to observed scores but not to the hypothetical values of Y. Differences between the complete cases and the nonresponders vanish after controlling for observed variables. Example: long-time smokers have a greater tendency toward nonresponse, but the number of cigarettes smoked carries no additional information about who responds.

Not Missing At Random (NMAR)

The probability of missing data on Y is related to the hypothetical values of Y itself. The complete cases and the nonresponders differ systematically even after controlling for observed variables. Example: participants who smoke more are less likely to respond, even after adjusting for years smoking.

Why Mechanisms Matter

Mechanisms function as analysis assumptions, and estimates are biased when those assumptions are violated. Older approaches such as deletion require MCAR, and some other older methods make no attempt to satisfy any mechanism. Multiple imputation, maximum likelihood, and Bayesian estimation assume MAR.
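The three mechanisms can be made concrete with a small simulation. The Python sketch below decides whether a cigarettes score goes missing under each mechanism. The 25% MCAR rate mirrors the motivating example, but the logistic coefficients (-3.0 and 0.15) and the function names are hypothetical choices for illustration, not values from the smoking study.

```python
import math
import random


def logistic(x: float) -> float:
    """Convert a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def is_missing(mechanism: str, years: float, cigs: float, rng: random.Random) -> bool:
    """Return True if the cigarettes score is missing under the given mechanism.

    The coefficients are hypothetical values chosen for illustration only.
    """
    if mechanism == "MCAR":
        p = 0.25  # same probability for everyone: no systematic determinants
    elif mechanism == "MAR":
        p = logistic(-3.0 + 0.15 * years)  # depends only on observed years smoking
    elif mechanism == "NMAR":
        p = logistic(-3.0 + 0.15 * cigs)  # depends on the cigarettes score itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random() < p
```

Under MAR, two smokers with the same years smoking face the same probability of nonresponse no matter how much they smoke; under NMAR, that probability changes with the (possibly unseen) cigarettes score.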

Missing at Random Example

Data: number of years smoking and number of cigarettes smoked. 25% of respondents do not report the number of cigarettes smoked, and the likelihood of nonresponse increases with years smoking.

Figure: scatterplot of observed and hypothetical scores, distinguishing complete cases from incomplete cases.

Figure: impact of deletion, comparing the full data (hypothetical), the complete cases, and the incomplete cases (observed data and missing values).
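To illustrate why deletion misbehaves under MAR, the hypothetical simulation below generates years smoking and cigarettes, makes nonresponse more likely for long-time smokers, and compares the full-data mean of cigarettes with the complete-case mean. All population values (means, slopes, noise levels, the seed) are assumptions chosen for illustration.

```python
import math
import random
import statistics
from typing import List, Tuple


def simulate_smokers(n: int, seed: int = 2024) -> List[Tuple[float, float, bool]]:
    """Simulate (years, cigs, observed) triples with MAR nonresponse on cigs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        years = rng.gauss(15.0, 5.0)
        cigs = 5.0 + 0.5 * years + rng.gauss(0.0, 2.0)
        # MAR: probability of nonresponse rises with years, not with cigs itself
        p_missing = 1.0 / (1.0 + math.exp(-(-3.0 + 0.15 * years)))
        data.append((years, cigs, rng.random() >= p_missing))
    return data


def deletion_bias(data: List[Tuple[float, float, bool]]) -> Tuple[float, float]:
    """Return (full-data mean, complete-case mean) of cigarettes."""
    full_mean = statistics.fmean(c for _, c, _ in data)
    cc_mean = statistics.fmean(c for _, c, obs in data if obs)
    return full_mean, cc_mean
```

In this setup the complete-case mean falls below the full-data mean, because the long-time (and therefore heavier) smokers are the ones most likely to be deleted.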

Multiple Imputation and Maximum Likelihood

Multiple imputation and maximum likelihood are MAR-based methods that are widely available in software. Multiple imputation fills in the data prior to analysis, whereas maximum likelihood estimates parameters directly from the observed data. The methods are equivalent with normally distributed data.

Why Imputation?

Imputation is often better because it allows researchers to tailor the missing data handling procedure to honor the features of the data and/or a particular analysis, e.g., mixtures of continuous and categorical variables, scale scores computed from questionnaire items, and multilevel data.

Multiple Imputation Overview

Multiple imputation generates several complete data sets, each with different imputations, and unique regression coefficients generate each data set. Analyzing multiple complete data sets provides a mechanism to adjust standard errors for missing data.

Multiple Imputation Steps: Imputation

The imputation phase creates multiple copies of the data, each with different replacement values. (Figure: a small data set with variables X, Y, and Z containing NA values, followed by several filled-in copies.)

Multiple Imputation Steps: Analysis

In the analysis phase, the researcher analyzes each complete data set and obtains estimates from each (e.g., estimates θ1, θ2, and θ3 from regressing Y on X in three imputed data sets).

Multiple Imputation Steps: Pooling

The pooling phase combines estimates and standard errors into a single set of results, for example:

$$\hat{\theta} = \frac{\hat{\theta}_1 + \hat{\theta}_2 + \hat{\theta}_3}{3}$$

Multiple Imputation: Imputation Phase

The imputation phase uses a regression model to define a distribution of plausible replacement scores, and imputations are randomly sampled from this distribution. An iterative Bayesian estimation algorithm provides unique parameter estimates for each round of imputation.

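Imputation Example

Imputation = predicted score + noise. The regression of cigarettes on years smoking produces a predicted score for each nonresponder, and the imputation adds a random residual to it:

$$\hat{Y}_{Cigs} = \beta_0 + \beta_1(\text{Years})$$

$$\text{Cigs}_{mis} = \hat{Y}_{Cigs} + \varepsilon, \qquad \text{Cigs}_{mis} \sim N\left(\hat{Y}_{Cigs}, \sigma^2_\varepsilon\right)$$

For example, a participant whose predicted score is 9.77 receives an imputation drawn at random from a normal distribution centered at 9.77 with variance $\sigma^2_\varepsilon$. A participant with a different number of years smoking receives a draw centered at his or her own predicted score.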
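The "predicted score + noise" rule translates directly into code. In this sketch the function name and the coefficient and residual-variance inputs are hypothetical; in practice those parameters come from the current round of the Bayesian estimation algorithm.

```python
import random


def impute_cigs(years: float, b0: float, b1: float, resid_var: float,
                rng: random.Random) -> float:
    """Draw one imputation: predicted score plus a normal residual."""
    predicted = b0 + b1 * years                # Y-hat for cigarettes
    noise = rng.gauss(0.0, resid_var ** 0.5)   # epsilon ~ N(0, sigma^2_epsilon)
    return predicted + noise
```

Calling impute_cigs repeatedly for the same participant yields different values scattered around the same predicted score, which is what lets each imputed data set differ.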

Imputation Example (continued)

The same steps apply to every nonresponder. For a participant whose predicted score is 11.59, the imputation is $\text{Cigs}_{mis} = \hat{Y}_{Cigs} + \varepsilon$, a random draw from $N\left(11.59, \sigma^2_\varepsilon\right)$.

Figure: imputation scatterplot.

Updating Parameter Values

The next round of imputation requires new regression parameters. A Bayesian estimation algorithm samples new estimates from a distribution of plausible values, which is akin to estimating the regression from the filled-in data and randomly perturbing the estimates.

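Alternate Regression Lines

Figure: alternate (perturbed) regression lines plotted around the complete-data regression.

Imputation Example with the Updated Regression

The next round of imputations uses the updated regression line: the predicted score $\hat{Y}_{Cigs} = \beta_0 + \beta_1(\text{Years})$ is computed from the newly sampled coefficients, and $\text{Cigs}_{mis} \sim N\left(\hat{Y}_{Cigs}, \sigma^2_\varepsilon\right)$ uses the newly sampled residual variance.

Bayesian Estimation Steps for a Single Iteration

Each iteration t performs three steps:

1. Update the residual variance: $\sigma^{2(t)}_\varepsilon \sim p\left(\sigma^2_\varepsilon \mid \boldsymbol{\beta}^{(t-1)}, \text{Cigs}^{(t-1)}_{imp}, \text{Years}\right)$
2. Update the coefficients: $\boldsymbol{\beta}^{(t)} \sim p\left(\boldsymbol{\beta} \mid \sigma^{2(t)}_\varepsilon, \text{Cigs}^{(t-1)}_{imp}, \text{Years}\right)$
3. Impute the missing values: $\text{Cigs}^{(t)}_{mis} \sim N\left(\beta^{(t)}_0 + \beta^{(t)}_1(\text{Years}), \sigma^{2(t)}_\varepsilon\right)$

Iterate these steps through the burn-in interval, then save a data set.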
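The three steps above can be sketched as a Gibbs sampler for the single-predictor regression. This is an illustration, not Blimp's implementation: it assumes the standard noninformative prior p(β, σ²) proportional to 1/σ², under which σ² is drawn as the sum of squared residuals divided by a chi-square variate with n degrees of freedom, and β is drawn from a bivariate normal centered at the least-squares estimates. The function name and argument layout are hypothetical.

```python
import math
import random
from typing import List, Tuple


def gibbs_iteration(years: List[float], cigs: List[float], missing: List[bool],
                    beta: Tuple[float, float], rng: random.Random):
    """One iteration: update sigma^2, update beta, then re-impute missing cigs.

    cigs holds observed scores plus the current imputations; missing flags the
    originally missing cases. Assumes the prior p(beta, sigma^2) proportional to 1 / sigma^2.
    """
    n = len(years)
    b0, b1 = beta
    # Step 1: sigma^2 | beta^(t-1), filled-in data  ->  SSR / chi-square(n)
    ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(years, cigs))
    sigma2 = ssr / rng.gammavariate(n / 2.0, 2.0)
    # Step 2: beta | sigma^2(t), filled-in data  ->  N(OLS, sigma^2 (X'X)^-1)
    sx, sxx = sum(years), sum(x * x for x in years)
    sy, sxy = sum(cigs), sum(x * y for x, y in zip(years, cigs))
    det = n * sxx - sx * sx
    i00, i01, i11 = sxx / det, -sx / det, n / det  # entries of (X'X)^-1
    ols0, ols1 = i00 * sy + i01 * sxy, i01 * sy + i11 * sxy
    # Cholesky factor of sigma^2 (X'X)^-1 turns two N(0, 1) draws into correlated ones
    l00 = math.sqrt(sigma2 * i00)
    l10 = sigma2 * i01 / l00
    l11 = math.sqrt(sigma2 * i11 - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    b0, b1 = ols0 + l00 * z0, ols1 + l10 * z0 + l11 * z1
    # Step 3: Cigs_mis ~ N(b0 + b1 * Years, sigma^2); observed scores are kept
    sd = math.sqrt(sigma2)
    new_cigs = [b0 + b1 * x + rng.gauss(0.0, sd) if m else y
                for x, y, m in zip(years, cigs, missing)]
    return (b0, b1), sigma2, new_cigs
```

Calling gibbs_iteration in a loop, starting from crude fill-in values, reproduces the burn-in idea: early iterations still reflect the starting values, and later ones wander around the posterior.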

Thinning (Between-Imputation) Interval

After the burn-in interval ends and the first data set is saved, the algorithm keeps iterating. Each additional block of iterations, the thinning interval, ends by saving another data set: thinning interval 2 ends with saving data set 2, thinning interval 3 ends with saving data set 3, and so on.

Multivariate Missing Data

Joint model imputation uses multivariate regression to impute the incomplete variables in a single step. Fully conditional specification (FCS; also called chained equations or sequential regression imputation) instead imputes incomplete variables one at a time in a sequence, using a series of univariate regression models. Variable-by-variable and multivariate imputation are equivalent with normally distributed data.

FCS Imputation Scheme

Each iteration updates the parameters for, and imputes, one incomplete variable at a time conditional on the others: Y1 given Y2 and Y3, then Y2 given Y1 and Y3, then Y3 given Y1 and Y2. A data set is saved at the end of the burn-in and each thinning interval.
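The burn-in and thinning schedule amounts to a simple loop. In this sketch, step is a hypothetical placeholder for one full iteration of the imputation algorithm, and the interval lengths are arbitrary inputs rather than Blimp defaults.

```python
from typing import Callable, List, Tuple, TypeVar

State = TypeVar("State")


def run_chain(step: Callable[[State], State], state: State, burn_in: int,
              thin: int, n_imputations: int) -> List[Tuple[int, State]]:
    """Run one chain; save a state after the burn-in and after each thinning interval."""
    saved = []
    iteration = 0
    for m in range(n_imputations):
        # The first data set waits for the burn-in; later ones for the thinning interval
        n_steps = burn_in if m == 0 else thin
        for _ in range(n_steps):
            state = step(state)
            iteration += 1
        saved.append((iteration, state))
    return saved
```

With a burn-in of 1000 and a thinning interval of 100, three data sets would be saved at iterations 1000, 1100, and 1200; the separation keeps consecutive data sets from being near-copies of each other.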

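Smoking Data

A smoking cessation study in which the number of cigarettes smoked and efficacy to quit are incomplete (variables Years, Cigs, and Efficacy; NA marks a missing value). The data contain four missing data patterns (O = observed, M = missing):

Pattern   Years   Cigs   Efficacy
1         O       O      O
2         O       M      O
3         O       O      M
4         O       M      M

Algorithmic Steps for a Single Iteration of FCS

Step 1: Cigarettes. Update the parameters $\beta_0, \beta_1, \beta_2, \sigma^2_\varepsilon$, then impute

$$\text{Cigs}^{(t)}_{mis} \sim N\left(\hat{Y}_{Cigs}, \sigma^2_\varepsilon\right), \qquad \hat{Y}_{Cigs} = \beta_0 + \beta_1(\text{Years}) + \beta_2\left(\text{SE}^{(t-1)}_{imp}\right)$$

Step 2: Self-Efficacy to Quit. Update the parameters $\gamma_0, \gamma_1, \gamma_2, \sigma^2_e$, then impute

$$\text{SE}^{(t)}_{mis} \sim N\left(\hat{Y}_{SE}, \sigma^2_e\right), \qquad \hat{Y}_{SE} = \gamma_0 + \gamma_1(\text{Years}) + \gamma_2\left(\text{Cigs}^{(t)}_{imp}\right)$$

Step 1 conditions on the efficacy imputations from the previous iteration (t - 1), whereas step 2 conditions on the cigarette imputations just created in the current iteration (t). Each completed iteration yields a filled-in data set with Years, Cigs, and Efficacy.

Multiple Imputation: Analysis and Pooling Phases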
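A compact sketch of the FCS sequence for the smoking example follows. To keep it short, it deliberately simplifies the parameter-update step: each regression is re-estimated by least squares from the filled-in data and imputations add normal noise, but the coefficients are not randomly perturbed as the Bayesian algorithm would do. The function names and the simulated data in the test are hypothetical.

```python
import random
from typing import List, Sequence


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def impute_variable(target, predictors, missing, rng):
    """Regress target on predictors (filled-in data) and redraw its missing values."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(target))]
    coef = ols(rows, target)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    df = len(target) - len(coef)
    sd = (sum((t - f) ** 2 for t, f in zip(target, fitted)) / df) ** 0.5
    return [f + rng.gauss(0.0, sd) if m else t
            for t, f, m in zip(target, fitted, missing)]


def fcs_iteration(years, cigs, eff, miss_cigs, miss_eff, rng):
    """One FCS iteration: Cigs given Years and Efficacy, then Efficacy given Years and Cigs."""
    cigs = impute_variable(cigs, [years, eff], miss_cigs, rng)  # step 1
    eff = impute_variable(eff, [years, cigs], miss_eff, rng)    # step 2 uses the new Cigs
    return cigs, eff
```

The ordering inside fcs_iteration is the point of the example: efficacy is imputed from the cigarette values created moments earlier in the same iteration.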

Analysis and Pooling

In the analysis phase, the researcher analyzes each complete data set and obtains estimates from each. The pooling phase combines the estimates and standard errors into a single set of results, and significance tests are performed on the pooled values.

Pooling Estimates

The multiple imputation point estimate is the arithmetic average of the M complete-data estimates:

$$\hat{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m$$

where $\hat{\theta}_m$ is the estimate from imputed data set m and M is the number of data sets.

Example: Descriptives and Correlations

Tables: means, standard deviations, sample sizes, and correlations for Years, Cigs, and SE computed separately in data sets 1, 2, and 3, followed by the pooled estimates obtained by averaging each quantity across the data sets with the formula above.
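The pooling formula is a one-line average per parameter. The sketch below assumes each imputed data set's results are stored as a dictionary keyed by parameter name; that layout, the function name, and the numbers in the usage line are hypothetical.

```python
from typing import Dict, List


def pool_estimates(per_dataset: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each parameter's estimate across the M imputed data sets."""
    if not per_dataset:
        raise ValueError("need at least one imputed data set")
    m = len(per_dataset)
    return {name: sum(d[name] for d in per_dataset) / m for name in per_dataset[0]}
```

For example, pool_estimates([{"mean": 10.0}, {"mean": 11.0}, {"mean": 12.0}]) returns {"mean": 11.0}.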

Pooling Standard Errors

Averaging standard errors underestimates sampling variability because the component standard errors are computed from complete data sets. The imputation standard error instead combines complete-data sampling error and missing data uncertainty.

Standard Error Decomposition

Imputation standard errors consist of two components. Within-imputation variance estimates complete-data sampling error, and between-imputation variance captures additional noise from the missing data:

$$SE = \sqrt{V_T} = \sqrt{V_W + V_B + \frac{V_B}{M}}$$

where $V_W$ is the average squared standard error, $V_B$ is the variance of the estimates across imputed data sets, and $V_B/M$ is a correction for using a finite number of imputations.

Significance Test

A test statistic is based on the pooled estimate and the pooled standard error:

$$t \; (\text{or } z) = \frac{\hat{\theta} - \theta_0}{SE}$$

where $\theta_0$ is the hypothesized value.

Single-Level Imputation with the Blimp Graphical Interface
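The standard error decomposition and the test statistic take only a few lines of Python. This sketch follows the formulas above and computes the between-imputation variance with the usual M - 1 denominator; the function name and the inputs in the usage example are hypothetical, and the sketch returns only the test statistic, not a p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def rubin_pool(estimates: Sequence[float], std_errors: Sequence[float],
               null_value: float = 0.0) -> Tuple[float, float, float]:
    """Pool M estimates and standard errors; return (estimate, SE, test statistic)."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two estimates with matching standard errors")
    theta = statistics.fmean(estimates)                   # pooled estimate
    v_w = statistics.fmean(se ** 2 for se in std_errors)  # within: average squared SE
    v_b = statistics.variance(estimates)                  # between: variance of estimates
    v_t = v_w + v_b + v_b / m                             # total variance
    se = math.sqrt(v_t)
    return theta, se, (theta - null_value) / se
```

For instance, rubin_pool([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]) gives a pooled estimate of 2.0, a standard error of about 1.53 rather than the naive average of 1.0, and a test statistic of about 1.31.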

Blimp Software and Data

The Blimp application for Mac OS and Windows was developed with support from Institute of Education Sciences award R35D1556. Blimp can accommodate mixtures of categorical (nominal or ordinal) and continuous variables in data sets with up to three levels. Software, raw data, and analysis scripts are available at appliedmissingdata.com/multilevel-imputation.html.

Motivating Example

A math problem-solving intervention randomly assigns students to an intervention or a control curriculum. The analysis is a regression model that predicts problem-solving scores from the intervention code and covariates:

$$\text{Probsolv} = \beta_0 + \beta_1(\text{Efficacy}) + \beta_2(\text{Disab2}) + \beta_3(\text{Disab3}) + \beta_4(\text{Teachexp}) + \beta_5(\text{Txcode}) + \varepsilon$$

Input Data

Variable   Description                                        Metric
school     School identifier variable                         Nominal
txcode     Treatment code (0 = control, 1 = intervention)     Nominal
pctminor   Percentage of minority students                    Numeric
teachexp   Teacher experience                                 Numeric
stanmath   Standardized math scores                           Numeric
probsolv   End-of-year problem-solving scores                 Numeric
efficacy   Math self-efficacy (6-point rating scale)          Ordinal
disab      Disability classification (three groups)           Nominal

Import Data

Choose Import Data from the File menu, then select the location of the input text file.

Data View

From the Data View tab, specify the delimiter (space or comma), enter the missing value code, and click Import.

Variable View

From the Variable View tab, assign names and scales to the variables, then click Done.

Specifying an Imputation Model

From the pull-down menu, select Specify Model. An interface appears that allows you to specify the variables to include in the imputation model as well as various algorithmic options.

Model Tab

From the Model tab, click Single-Level Imputation and use the right (left) arrow to select (remove) variables from the imputation model.

MCMC Tab

From the MCMC tab, specify the algorithmic options: the number of preliminary (burn-in) iterations, the number of iterations separating each data set, the number of imputed data sets, and the seed for the random number generator. The radio buttons at the bottom of the page control the default estimation settings and can be left at their default values in most cases.

Output Tab

From the Output tab, specify a name and format for the imputed data sets. Imputations can be written to comma- or space-delimited files, either as a single stacked file (for R, SPSS, or SAS) or as separate files (for Mplus). Select the PSR option to request potential scale reduction (PSR) factor diagnostic tables for checking convergence.

Blimp Command Script

Clicking the Done button on the Output tab generates a Blimp command script that reflects the options selected in the graphical interface.

Running Blimp

From the pull-down menu, select Run. A dialog box prompts you to save the Blimp command script.
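The potential scale reduction factor compares variation within and between parallel MCMC chains; values near 1 suggest the chains are sampling from the same distribution. The sketch below implements the classic Gelman-Rubin formula for chains of equal length. Blimp's own PSR tables may be computed differently, so treat this as an illustration of the idea rather than a reproduction of Blimp's output; the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence


def psr(chains: Sequence[Sequence[float]]) -> float:
    """Potential scale reduction factor for equal-length chains of one parameter."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    means = [statistics.fmean(c) for c in chains]
    w = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain variance
    b = n * statistics.variance(means)                           # between-chain variance
    var_plus = (n - 1) / n * w + b / n                           # pooled variance estimate
    return math.sqrt(var_plus / w)
```

Two chains stuck in different regions, such as [1, 2, 3, 4] and [11, 12, 13, 14], produce a value far above 1, signaling that more burn-in iterations are needed.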

Blimp Output Window

Blimp begins running immediately after the file is saved. An output window displays computational progress, the variable order for the imputed data set(s), and diagnostic tables (if selected).

Mplus Analysis of Blimp Data

data: file = imputationslist.csv;
  type = imputation;
variable: names = school txcode pctminor teachexp stanmath probsolv efficacy disab;
  usevariables = probsolv efficacy teachexp txcode disab2 disab3;
define:
  if (disab eq 1) then disab2 = 0;
  if (disab eq 1) then disab3 = 0;
  if (disab eq 2) then disab2 = 1;
  if (disab eq 2) then disab3 = 0;
  if (disab eq 3) then disab2 = 0;
  if (disab eq 3) then disab3 = 1;
  center efficacy disab2 disab3 teachexp (grandmean);
model: probsolv on efficacy disab2 disab3 teachexp txcode;
output: stdyx;

R Analysis of Blimp Data

# load libraries
library(mitml)
library(nlme)
# read stacked blimp file
path <- c("~/desktop/example/imputations.csv")
impdata <- read.csv(file = path, header = FALSE, sep = ",")
names(impdata) = c("imp", "school", "txcode", "pctminor", "teachexp", "stanmath", "probsolv", "efficacy", "disab")
# dummy codes for the three disability groups (group 1 is the reference)
impdata$disab2[impdata$disab == 1] <- 0
impdata$disab3[impdata$disab == 1] <- 0
impdata$disab2[impdata$disab == 2] <- 1
impdata$disab3[impdata$disab == 2] <- 0
impdata$disab2[impdata$disab == 3] <- 0
impdata$disab3[impdata$disab == 3] <- 1
# split stacked data into separate files
implist <- split(impdata, impdata$imp)
implist <- as.mitml.list(implist)
# regression with lm
model <- with(implist, lm(probsolv ~ efficacy + disab2 + disab3 + teachexp + txcode))
# complete-data degrees of freedom: n - number of predictors - 1
n <- nrow(implist[[1]])
numpredictors <- 5
dfdenom <- n - numpredictors - 1
testestimates(model, df.com = dfdenom)

SAS Analysis of Blimp Data

* read data and compute dummy codes;
data imputations;
  infile '/folders/myfolders/imputations.csv' delimiter = ',';
  input _imputation_ school txcode pctminor teachexp stanmath probsolv efficacy disab;
  disab2 = 0;
  disab3 = 0;
  if disab = 2 then disab2 = 1;
  if disab = 3 then disab3 = 1;
run;
* estimate regression model;
proc reg data = imputations outest = estimates covout;
  model probsolv = efficacy disab2 disab3 teachexp txcode;
  by _imputation_;
run;
* pool estimates;
proc mianalyze data = estimates edf = 99;
  modeleffects Intercept efficacy disab2 disab3 teachexp txcode;
run;
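For readers working in Python rather than R, SAS, or SPSS, the same preprocessing (splitting the stacked file by imputation number and creating the disab2 and disab3 dummy codes) can be sketched with the standard library. The column order is taken from the R script above; the function name and the sample rows in the test are hypothetical, and the sketch stops before model fitting and pooling. In practice, text would be the contents of the stacked imputation file read from disk.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Column order of the stacked Blimp file, as listed in the R script
COLUMNS = ["imp", "school", "txcode", "pctminor", "teachexp",
           "stanmath", "probsolv", "efficacy", "disab"]


def read_stacked(text: str) -> Dict[int, List[Dict[str, float]]]:
    """Split headerless, comma-delimited stacked imputations into one list per data set."""
    by_imp: Dict[int, List[Dict[str, float]]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, (float(v) for v in row)))
        # Dummy codes with disability group 1 as the reference category
        rec["disab2"] = 1.0 if rec["disab"] == 2 else 0.0
        rec["disab3"] = 1.0 if rec["disab"] == 3 else 0.0
        by_imp[int(rec["imp"])].append(rec)
    return dict(by_imp)
```

Keying the result by imputation number mirrors split(impdata, impdata$imp) in the R script: each value is one complete data set ready for the analysis phase.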

SPSS Analysis of Blimp Data

* read data and compute dummy codes.
data list free file = '/users/craig/desktop/example/imputations.csv'
  /imputation_ school txcode pctminor teachexp stanmath probsolv efficacy disab.
compute disab2 = 0.
compute disab3 = 0.
if (disab = 2) disab2 = 1.
if (disab = 3) disab3 = 1.
exe.
* split file into separate data sets.
sort cases by imputation_.
split file layered by imputation_.
* analysis and pooling.
regression
  /descriptives mean stddev corr sig n
  /dependent probsolv
  /method=enter efficacy disab2 disab3 teachexp txcode.

Two-Level Imputation with the Blimp Graphical Interface

Motivating Example

A math problem-solving intervention randomly assigns schools to an intervention or a control curriculum. The analysis is a random slope regression model that predicts problem-solving scores from the intervention code and covariates:

$$\text{Probsolv}_{ij} = \beta_0 + \beta_1(\text{Efficacy}_{ij}) + \beta_2(\text{Disab2}_{ij}) + \beta_3(\text{Disab3}_{ij}) + \beta_4(\text{Teachexp}_j) + \beta_5(\text{Txgrp}_j) + u_{0j} + u_{1j}(\text{Efficacy}_{ij}) + \varepsilon_{ij}$$

Input Data

The input data contain the same variables described for the single-level example: school, txcode (0 = control, 1 = intervention), pctminor, teachexp, stanmath, probsolv, efficacy, and disab.

Import Data

Choose Import Data from the File menu, then select the location of the input text file.

Data View

From the Data View tab, specify the delimiter (space or comma), enter the missing value code, and click Import.

Variable View

From the Variable View tab, assign names and scales to the variables, then click Done.

Specifying an Imputation Model

From the pull-down menu, select Specify Model. An interface appears that allows you to specify the variables to include in the imputation model as well as various algorithmic options.

Model Tab

From the Model tab, move the level-2 identifier variable to the Cluster-Level Identifier Variables box, and use the right (left) arrow to select (remove) variables from the imputation model.

Specifying a Random Slope

Select Random Slopes from the Build Terms drop-down menu, then select the pair of variables that have a random association.

MCMC Tab

From the MCMC tab, specify the algorithmic options: the number of preliminary (burn-in) iterations, the number of iterations separating each data set, the number of imputed data sets, and the seed for the random number generator. The radio buttons at the bottom of the page control the default estimation settings and can be left at their default values in most cases.

Output Tab

From the Output tab, specify a name and format for the imputed data sets. Imputations can be written to comma- or space-delimited files, either as a single stacked file (for R, SPSS, or SAS) or as separate files (for Mplus). Select the PSR option to request potential scale reduction (PSR) factor diagnostic tables for checking convergence.

Blimp Command Script

Clicking the Done button on the Output tab generates a Blimp command script that reflects the options selected in the graphical interface.

Running Blimp

From the pull-down menu, select Run. A dialog box prompts you to save the Blimp command script.

Blimp Output Window

Blimp begins running immediately after the file is saved. An output window displays computational progress, the variable order for the imputed data set(s), and diagnostic tables (if selected).

Mplus Analysis of Blimp Data

data: file = imputationslist.csv;
  type = imputation;
variable: names = school txgrp pctminor teachexp stanmath probsolv efficacy disab;
  usevariables = probsolv efficacy teachexp txgrp disab2 disab3;
  cluster = school;
  within = efficacy disab2 disab3;
  between = teachexp txgrp;
define:
  if (disab eq 1) then disab2 = 0;
  if (disab eq 1) then disab3 = 0;
  if (disab eq 2) then disab2 = 1;
  if (disab eq 2) then disab3 = 0;
  if (disab eq 3) then disab2 = 0;
  if (disab eq 3) then disab3 = 1;
  center efficacy teachexp (grandmean);
analysis: type = twolevel random;
model:
  %within%
  effslope | probsolv on efficacy;
  probsolv on disab2 disab3;
  %between%
  probsolv on teachexp txgrp;
  probsolv; effslope;
  probsolv with effslope;

R Analysis of Blimp Data

# load libraries
library(mitml)
library(lme4)
# read stacked blimp file
path <- c("~/desktop/example/imputations.csv")
impdata <- read.csv(file = path, header = FALSE, sep = ",")
names(impdata) = c("imp", "school", "txcode", "pctminor", "teachexp", "stanmath", "probsolv", "efficacy", "disab")
# dummy codes for the three disability groups (group 1 is the reference)
impdata$disab2[impdata$disab == 1] <- 0
impdata$disab3[impdata$disab == 1] <- 0
impdata$disab2[impdata$disab == 2] <- 1
impdata$disab3[impdata$disab == 2] <- 0
impdata$disab2[impdata$disab == 3] <- 0
impdata$disab3[impdata$disab == 3] <- 1
# split stacked data into separate files
implist <- split(impdata, impdata$imp)
implist <- as.mitml.list(implist)
# multilevel regression with lmer (random intercept and random efficacy slope)
model <- with(implist, lmer(probsolv ~ efficacy + disab2 + disab3 + teachexp + txcode + (efficacy | school), REML = TRUE))
restricted <- with(implist, lmer(probsolv ~ (efficacy | school), REML = TRUE))
# pooled estimates
testestimates(model, var.comp = TRUE, df.com = NULL)
# wald test
testmodels(model, restricted, method = "D1")

SAS Analysis of Blimp Data

* read data and compute dummy codes;
data imputations;
  infile '/folders/myfolders/imputations.csv' delimiter = ',';
  input _imputation_ school txcode pctminor teachexp stanmath probsolv efficacy disab;
  disab2 = 0;
  disab3 = 0;
  if disab = 2 then disab2 = 1;
  if disab = 3 then disab3 = 1;
run;
* estimate mlm;
ods _all_ close;
proc mixed data = imputations noclprint;
  class school;
  model probsolv = efficacy disab2 disab3 teachexp txcode / solution covb;
  random intercept efficacy / subject = school type = un;
  by _imputation_;
  ods output SolutionF = estimates CovB = covb;
run;
ods listing;
* pool estimates;
proc mianalyze parms = estimates;
  modeleffects efficacy disab2 disab3 teachexp txcode;
run;
* wald test with mult option;
proc mianalyze parms = estimates mult covb(effectvar = rowcol) = covb;
  modeleffects efficacy disab2 disab3 teachexp txcode;
run;

SPSS Analysis of Blimp Data

* read data and compute dummy codes.
data list free file = '/users/craig/desktop/example/imputations.csv'
  /imputation_ school txcode pctminor teachexp stanmath probsolv efficacy disab.
compute disab2 = 0.
compute disab3 = 0.
if (disab = 2) disab2 = 1.
if (disab = 3) disab3 = 1.
exe.
* split file into separate data sets.
sort cases by imputation_.
split file layered by imputation_.
* analysis and pooling.
mixed probsolv with efficacy disab2 disab3 teachexp txcode
  /print = solution testcov
  /fixed = intercept efficacy disab2 disab3 teachexp txcode
  /random = intercept efficacy | subject(school) covtype(un).


More information

Multiple-imputation analysis using Stata s mi command

Multiple-imputation analysis using Stata s mi command Multiple-imputation analysis using Stata s mi command Yulia Marchenko Senior Statistician StataCorp LP 2009 UK Stata Users Group Meeting Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi

More information

Missing Data. Where did it go?

Missing Data. Where did it go? Missing Data Where did it go? 1 Learning Objectives High-level discussion of some techniques Identify type of missingness Single vs Multiple Imputation My favourite technique 2 Problem Uh data are missing

More information

PASW Missing Values 18

PASW Missing Values 18 i PASW Missing Values 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412

More information

Missing data analysis. University College London, 2015

Missing data analysis. University College London, 2015 Missing data analysis University College London, 2015 Contents 1. Introduction 2. Missing-data mechanisms 3. Missing-data methods that discard data 4. Simple approaches that retain all the data 5. RIBG

More information

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 6 SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE

More information

PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects

PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects Let s create a data for a variable measured repeatedly over five occasions We could create raw data (for each

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Bengt Muth en University of California, Los Angeles Tihomir Asparouhov Muth en & Muth en Mplus

More information

An imputation approach for analyzing mixed-mode surveys

An imputation approach for analyzing mixed-mode surveys An imputation approach for analyzing mixed-mode surveys Jae-kwang Kim 1 Iowa State University June 4, 2013 1 Joint work with S. Park and S. Kim Ouline Introduction Proposed Methodology Application to Private

More information

in this course) ˆ Y =time to event, follow-up curtailed: covered under ˆ Missing at random (MAR) a

in this course) ˆ Y =time to event, follow-up curtailed: covered under ˆ Missing at random (MAR) a Chapter 3 Missing Data 3.1 Types of Missing Data ˆ Missing completely at random (MCAR) ˆ Missing at random (MAR) a ˆ Informative missing (non-ignorable non-response) See 1, 38, 59 for an introduction to

More information

Handling Data with Three Types of Missing Values:

Handling Data with Three Types of Missing Values: Handling Data with Three Types of Missing Values: A Simulation Study Jennifer Boyko Advisor: Ofer Harel Department of Statistics University of Connecticut Storrs, CT May 21, 2013 Jennifer Boyko Handling

More information

R software and examples

R software and examples Handling Missing Data in R with MICE Handling Missing Data in R with MICE Why this course? Handling Missing Data in R with MICE Stef van Buuren, Methodology and Statistics, FSBS, Utrecht University Netherlands

More information

DETAILED CONTENTS. About the Editor About the Contributors PART I. GUIDE 1

DETAILED CONTENTS. About the Editor About the Contributors PART I. GUIDE 1 DETAILED CONTENTS Preface About the Editor About the Contributors xiii xv xvii PART I. GUIDE 1 1. Fundamentals of Hierarchical Linear and Multilevel Modeling 3 Introduction 3 Why Use Linear Mixed/Hierarchical

More information

HANDLING MISSING DATA

HANDLING MISSING DATA GSO international workshop Mathematic, biostatistics and epidemiology of cancer Modeling and simulation of clinical trials Gregory GUERNEC 1, Valerie GARES 1,2 1 UMR1027 INSERM UNIVERSITY OF TOULOUSE III

More information

Performance of Sequential Imputation Method in Multilevel Applications

Performance of Sequential Imputation Method in Multilevel Applications Section on Survey Research Methods JSM 9 Performance of Sequential Imputation Method in Multilevel Applications Enxu Zhao, Recai M. Yucel New York State Department of Health, 8 N. Pearl St., Albany, NY

More information

From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here.

From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here. From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here. Contents About this Book...ix About the Authors... xiii Acknowledgments... xv Chapter 1: Item Response

More information

Handling missing data in cluster randomized trials: A demonstration of multiple imputation with PAN through SAS

Handling missing data in cluster randomized trials: A demonstration of multiple imputation with PAN through SAS Handling missing data in cluster randomized trials: A demonstration of multiple imputation with AN through SAS Jiangxiu Zhou a, Lauren E. Connell a, John W. Graham,a a Department of Biobehavioral Health,

More information

Missing Data? A Look at Two Imputation Methods Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA

Missing Data? A Look at Two Imputation Methods Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA Missing Data? A Look at Two Imputation Methods Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA ABSTRACT Statistical analyses can be greatly hampered by missing

More information

Missing Data in Orthopaedic Research

Missing Data in Orthopaedic Research in Orthopaedic Research Keith D Baldwin, MD, MSPT, MPH, Pamela Ohman-Strickland, PhD Abstract Missing data can be a frustrating problem in orthopaedic research. Many statistical programs employ a list-wise

More information

Multiple Imputation for Multilevel Models with Missing Data Using Stat-JR

Multiple Imputation for Multilevel Models with Missing Data Using Stat-JR Multiple Imputation for Multilevel Models with Missing Data Using Stat-JR Introduction In this document we introduce a Stat-JR super-template for 2-level data that allows for missing values in explanatory

More information

BIOL 458 BIOMETRY Lab 10 - Multiple Regression

BIOL 458 BIOMETRY Lab 10 - Multiple Regression BIOL 458 BIOMETRY Lab 0 - Multiple Regression Many problems in biology science involve the analysis of multivariate data sets. For data sets in which there is a single continuous dependent variable, but

More information

arxiv: v1 [stat.me] 29 May 2015

arxiv: v1 [stat.me] 29 May 2015 MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis Vincent Audigier 1, François Husson 2 and Julie Josse 2 arxiv:1505.08116v1 [stat.me] 29 May 2015 Applied Mathematics

More information

Last updated January 4, 2012

Last updated January 4, 2012 Last updated January 4, 2012 This document provides a description of Mplus code for implementing mixture factor analysis with four latent class components with and without covariates described in the following

More information

11.0 APPENDIX-B: COMPUTATION

11.0 APPENDIX-B: COMPUTATION 11.0 APPENDIX-B: COMPUTATION Computational details and the pseudo codes of the time-varying ARX(p t ) model and the MI-SRI composite imputation method will be given by using different statistical packages

More information

Introduction to Mplus

Introduction to Mplus Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus

More information

Statistical Methods for the Analysis of Repeated Measurements

Statistical Methods for the Analysis of Repeated Measurements Charles S. Davis Statistical Methods for the Analysis of Repeated Measurements With 20 Illustrations #j Springer Contents Preface List of Tables List of Figures v xv xxiii 1 Introduction 1 1.1 Repeated

More information

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL SPSS QM II SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details around a certain function/analysis method not covered

More information

The linear mixed model: modeling hierarchical and longitudinal data

The linear mixed model: modeling hierarchical and longitudinal data The linear mixed model: modeling hierarchical and longitudinal data Analysis of Experimental Data AED The linear mixed model: modeling hierarchical and longitudinal data 1 of 44 Contents 1 Modeling Hierarchical

More information

Performing Cluster Bootstrapped Regressions in R

Performing Cluster Bootstrapped Regressions in R Performing Cluster Bootstrapped Regressions in R Francis L. Huang / October 6, 2016 Supplementary material for: Using Cluster Bootstrapping to Analyze Nested Data with a Few Clusters in Educational and

More information

Tools for Imputing Missing Data

Tools for Imputing Missing Data ABSTRACT Tools for Imputing Missing Data Taylor Lewis, University of Maryland, College Park, MD Missing data frequently pose a problem to applied researchers and statisticians. Although a common approach

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients

More information

Opening Windows into the Black Box

Opening Windows into the Black Box Opening Windows into the Black Box Yu-Sung Su, Andrew Gelman, Jennifer Hill and Masanao Yajima Columbia University, Columbia University, New York University and University of California at Los Angels July

More information

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010 THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Types of missingness and common strategies

Types of missingness and common strategies 9 th UK Stata Users Meeting 20 May 2003 Multiple imputation for missing data in life course studies Bianca De Stavola and Valerie McCormack (London School of Hygiene and Tropical Medicine) Motivating example

More information

ST Lab 1 - The basics of SAS

ST Lab 1 - The basics of SAS ST 512 - Lab 1 - The basics of SAS What is SAS? SAS is a programming language based in C. For the most part SAS works in procedures called proc s. For instance, to do a correlation analysis there is proc

More information

Repeated Measures Part 4: Blood Flow data

Repeated Measures Part 4: Blood Flow data Repeated Measures Part 4: Blood Flow data /* bloodflow.sas */ options linesize=79 pagesize=100 noovp formdlim='_'; title 'Two within-subjecs factors: Blood flow data (NWK p. 1181)'; proc format; value

More information

A Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models

A Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models A Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models Eva de Jong, Nino Mushkudiani and Barry Schouten ASD workshop, November 6-8, 2017 Outline Bayesian

More information

Smoking and Missingness: Computer Syntax 1

Smoking and Missingness: Computer Syntax 1 Smoking and Missingness: Computer Syntax 1 Computer Syntax SAS code is provided for the logistic regression imputation described in this article. This code is listed in parts, with description provided

More information

Online Supplementary Appendix for. Dziak, Nahum-Shani and Collins (2012), Multilevel Factorial Experiments for Developing Behavioral Interventions:

Online Supplementary Appendix for. Dziak, Nahum-Shani and Collins (2012), Multilevel Factorial Experiments for Developing Behavioral Interventions: Online Supplementary Appendix for Dziak, Nahum-Shani and Collins (2012), Multilevel Factorial Experiments for Developing Behavioral Interventions: Power, Sample Size, and Resource Considerations 1 Appendix

More information

Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data

Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data Donsig Jang, Amang Sukasih, Xiaojing Lin Mathematica Policy Research, Inc. Thomas V. Williams TRICARE Management

More information

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by Paper CC-016 A macro for nearest neighbor Lung-Chang Chien, University of North Carolina at Chapel Hill, Chapel Hill, NC Mark Weaver, Family Health International, Research Triangle Park, NC ABSTRACT SAS

More information

WELCOME! Lecture 3 Thommy Perlinger

WELCOME! Lecture 3 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important

More information

REALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol

REALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol REALCOM-IMPUTE: multiple imputation using MLwin. Modified September 2014 by Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol This description is divided into two sections. In the

More information

Lecture 26: Missing data

Lecture 26: Missing data Lecture 26: Missing data Reading: ESL 9.6 STATS 202: Data mining and analysis December 1, 2017 1 / 10 Missing data is everywhere Survey data: nonresponse. 2 / 10 Missing data is everywhere Survey data:

More information

Reducing the Effects of Careless Responses on Item Calibration in Item Response Theory

Reducing the Effects of Careless Responses on Item Calibration in Item Response Theory Reducing the Effects of Careless Responses on Item Calibration in Item Response Theory Jeffrey M. Patton, Ying Cheng, & Ke-Hai Yuan University of Notre Dame http://irtnd.wikispaces.com Qi Diao CTB/McGraw-Hill

More information

CHAPTER 18 OUTPUT, SAVEDATA, AND PLOT COMMANDS

CHAPTER 18 OUTPUT, SAVEDATA, AND PLOT COMMANDS OUTPUT, SAVEDATA, And PLOT Commands CHAPTER 18 OUTPUT, SAVEDATA, AND PLOT COMMANDS THE OUTPUT COMMAND OUTPUT: In this chapter, the OUTPUT, SAVEDATA, and PLOT commands are discussed. The OUTPUT command

More information

LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2

LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2 LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2 3 MODELS FOR GROUPED- AND DISCRETE-TIME SURVIVAL DATA 5 4 MODELS FOR ORDINAL OUTCOMES AND THE PROPORTIONAL

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Creation & Description of a Data Set * 4 Levels of Measurement * Nominal, ordinal, interval, ratio * Variable Types

More information

Path Analysis using lm and lavaan

Path Analysis using lm and lavaan Path Analysis using lm and lavaan Grant B. Morgan Baylor University September 10, 2014 First of all, this post is going to mirror a page on the Institute for Digital Research and Education (IDRE) site

More information

StatCalc User Manual. Version 9 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved.

StatCalc User Manual. Version 9 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved. StatCalc User Manual Version 9 for Mac and Windows Copyright 2018, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents Introduction... 4 Getting Help... 4 Uninstalling StatCalc...

More information

AMELIA II: A Program for Missing Data

AMELIA II: A Program for Missing Data AMELIA II: A Program for Missing Data Amelia II is an R package that performs multiple imputation to deal with missing data, instead of other methods, such as pairwise and listwise deletion. In multiple

More information

Florida State University Libraries

Florida State University Libraries Florida State University Libraries Electronic Theses, Treatises and Dissertations The Graduate School 2013 Use of Item Parceling in Structural Equation Modeling with Missing Data Fatih Orcan Follow this

More information

The Mplus modelling framework

The Mplus modelling framework The Mplus modelling framework Continuous variables Categorical variables 1 Mplus syntax structure TITLE: a title for the analysis (not part of the syntax) DATA: (required) information about the data set

More information

Generalized Additive Models

Generalized Additive Models :p Texts in Statistical Science Generalized Additive Models An Introduction with R Simon N. Wood Contents Preface XV 1 Linear Models 1 1.1 A simple linear model 2 Simple least squares estimation 3 1.1.1

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN SAS for HLM Edps/Psych/Stat/ 587 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN SAS for HLM Slide 1 of 16 Outline SAS & (for Random

More information

Generalized Least Squares (GLS) and Estimated Generalized Least Squares (EGLS)

Generalized Least Squares (GLS) and Estimated Generalized Least Squares (EGLS) Generalized Least Squares (GLS) and Estimated Generalized Least Squares (EGLS) Linear Model in matrix notation for the population Y = Xβ + Var ( ) = In GLS, the error covariance matrix is known In EGLS

More information

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975.

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975. Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975. SPSS Statistics were designed INTRODUCTION TO SPSS Objective About the

More information

PSY 9556B (Feb 5) Latent Growth Modeling

PSY 9556B (Feb 5) Latent Growth Modeling PSY 9556B (Feb 5) Latent Growth Modeling Fixed and random word confusion Simplest LGM knowing how to calculate dfs How many time points needed? Power, sample size Nonlinear growth quadratic Nonlinear growth

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

An Introduction to Growth Curve Analysis using Structural Equation Modeling

An Introduction to Growth Curve Analysis using Structural Equation Modeling An Introduction to Growth Curve Analysis using Structural Equation Modeling James Jaccard New York University 1 Overview Will introduce the basics of growth curve analysis (GCA) and the fundamental questions

More information

Robust Linear Regression (Passing- Bablok Median-Slope)

Robust Linear Regression (Passing- Bablok Median-Slope) Chapter 314 Robust Linear Regression (Passing- Bablok Median-Slope) Introduction This procedure performs robust linear regression estimation using the Passing-Bablok (1988) median-slope algorithm. Their

More information