Missing Data Analysis with SPSS


1 Missing Data Analysis with SPSS
Meng-Ting Lo
Department of Educational Studies
Quantitative Research, Evaluation and Measurement Program (QREM)
Research Methodology Center (RMC)

2 Outline
- Missing data patterns and mechanisms
- Traditional techniques: listwise and pairwise deletion, mean substitution, regression and stochastic regression, hot deck imputation, averaging the available items, last observation carried forward
- Maximum likelihood (ML) and multiple imputation (MI)
- SPSS with multiple imputation (demonstration and practice)
- Practical issues/myths

3 Data and material
- High School Longitudinal Study of 2009, public-use data: one of the NCES secondary longitudinal studies, with more than 21,000 9th graders in 944 schools
- Data files: Hsls09_MissingDataWorkshop_demo, Hsls09_MissingDataWorkshop_demo2_imputed5, Hsls09_MissingDataWorkshop_demo2_IterationHistory, Hsls09_MissingDataWorkshop_practice
- SPSS modules: Missing Value Analysis, Multiple Imputation

4 The importance of dealing with missing data
- A complete, clean dataset is rare.
- Traditional techniques rely on strict assumptions about the missing data mechanism, and these assumptions are rarely met in practice.
- Handling missing data inappropriately yields unreliable, biased estimates and incorrect conclusions.
- Some methods (e.g., listwise deletion) reduce the statistical power of a test to detect a significant effect.

5 Missing data patterns
- Where is the missing data in your dataset? A pattern describes the location of the missing values (the shaded areas in the figure).
- In the past, specific missing data handling methods were developed for specific missing data patterns.
- Now, MI and ML work well with any missing data pattern.
- Figures from p. 4 in Enders, C. K. (2010). Applied missing data analysis. Guilford Press.

6 Missing data mechanisms (Rubin, 1976)
- Mechanisms describe the relationships between measured variables and the probability of missing data, and essentially function as assumptions for missing data analysis (Enders, 2010, p. 2).
- Three mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
- Why are the data missing? Propose a possible explanation and find evidence to justify the claim.
- The mechanism is much more important than the percentage of missing data. The percentage only describes the scope of the problem; the mechanism governs the performance of the different analytic techniques.

7 Missing data mechanisms
- As introduced by Rubin (1976), missingness is a binary variable that has a probability distribution.
- [Example table with columns Race, DV (Reading Achievement), and R. Race is completely observed; the DV is missing for some students; R is the missing data indicator, coded 0 where the DV is missing and 1 where it is observed (e.g., Caucasian, 66, R = 1).]
- Key question: is the probability of missing data on a variable (R) related to other variables in the dataset?
- The relationship between the probability of missingness and the other variables in the dataset is used to determine the missing data mechanism.
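
A minimal sketch of how the missing data indicator R can be built, following the slide's coding (0 = missing, 1 = observed). The scores below are made up for illustration and are not the slide's data; `None` stands in for a blank cell.

```python
# Hypothetical reading scores; None marks a missing value.
reading = [None, None, 66, 88, None, 95]


def missing_indicator(values):
    """Return R for each value: 1 if observed, 0 if missing."""
    return [0 if v is None else 1 for v in values]


r = missing_indicator(reading)
print(r)  # [0, 0, 1, 1, 0, 1]
```

Once R exists as a variable, it can be related to the other measured variables (for example with t-tests or chi-square tests) to look for evidence about the mechanism.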

8 Missing not at random (MNAR)
- The probability of missing data on a variable Y is related to the values of Y itself, even after controlling for other variables (Enders, 2010, p. 8).
- Example:
- Whether data are MNAR cannot be verified without knowing the actual values of Y. In some situations you may have a sense of the actual values, for example if you are in the field monitoring the data collection process.
- MNAR data require other techniques for handling missing data.

9 Missing at random (MAR)
- The probability of missing data on a variable Y is related to some other measured variable(s), but not to the values of Y itself (Enders, 2010, p. 6).
- Example:
- Because we do not know the actual values of Y, MAR is a theoretical judgment that has to be supported with evidence.
- ML and MI assume MAR.

10 Missing completely at random (MCAR)
- The probability of missing data on a variable Y is unrelated to other measured variables and unrelated to the values of Y itself (Enders, 2010, p. 7).
- Example:
- The observed data are just a simple random sample of the hypothetically complete dataset.
- Look for evidence of MCAR. For example, compare the cases with and without missing values on a variable with respect to the other measured variables; under MCAR the two groups should not differ.

11 Finding evidence for MCAR or MAR: t-tests
- Perform a series of independent-samples t-tests that compare the group with missing values on a variable against the group without missing values on the means of the other variables in the dataset (for categorical variables, use chi-square tests).
- [Example table with columns Self-efficacy, DV (Reading Achievement), and R]
- Available in the SPSS Missing Values Analysis module.
- No significant difference is consistent with MCAR; a significant difference points to MAR (good).
- This is a good way to identify variables that are related to missingness, which can then be used in MI to provide information for imputing the missing values.
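
The comparison described above can be sketched in a few lines. This is an illustration only, not the SPSS procedure: it computes a separate-variance (Welch) t statistic by hand, and the variable names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Separate-variance t statistic for the difference in two group means."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical data: self-efficacy is complete, reading has missing values (None).
self_efficacy = [2.0, 2.5, 3.0, 3.5, 4.0, 1.5, 2.0, 3.0]
reading = [None, 60, 75, 88, 95, None, None, 70]

# Split self-efficacy by whether reading is observed or missing.
observed = [x for x, y in zip(self_efficacy, reading) if y is not None]
missing = [x for x, y in zip(self_efficacy, reading) if y is None]

t = welch_t(observed, missing)
print(round(mean(observed), 2), round(mean(missing), 2), round(t, 2))
```

A large absolute t value suggests that self-efficacy is related to missingness on reading, which makes it a candidate auxiliary variable for the imputation model.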

12 Testing MCAR: Little's (1988) MCAR test
- A multivariate extension of the t-test approach: it performs all the t-tests simultaneously.
- A global test of MCAR, available in the SPSS Missing Values Analysis module under the EM procedure.
- Null hypothesis: the data are MCAR.
- A significant MCAR test and/or significant t-tests are an indication of MAR.
- Issues: (1) it does not identify which variables violate MCAR; (2) it has low statistical power (risk of Type II error) when few variables violate MCAR or when the relationship between missingness and the data is weak.

13 Traditional methods for handling missing data
- Listwise deletion
- Pairwise deletion
- Mean substitution
- Regression and stochastic regression
- Hot deck imputation
- Averaging available items
- Last observation carried forward

14 Listwise deletion (complete-case analysis): include only cases with complete data
- Easy, convenient, and available in all statistical software.
- Wastes data and resources.
- Reduces sample size and statistical power.
- Assumes MCAR (otherwise it produces biased estimates).

15 Listwise deletion (complete-case analysis)
- Problems in the example data (assumed to be MAR): (1) the remaining cases do not represent the entire sample well; (2) the mean estimate is higher; (3) the variability of the data is reduced.
- [Example table: mean and variance of GPA for the complete data versus listwise deletion]

16 Pairwise deletion (available-case analysis): analyses (e.g., correlation, regression) are conducted on different subsets of cases
- Assumes MCAR.
- Correlation: $r = \sigma_{XY} / \sqrt{\sigma_X^2 \sigma_Y^2}$. The covariance is computed from (1) the cases with complete data on both X and Y, while the variances may be computed from (2) the cases with data on X or Y alone (separate subsamples).
- Estimation problem: r can fall out of bounds (r > 1 or r < -1).
- Lack of a consistent sample size: because different subsets of cases are used to estimate the parameters, standard errors are difficult to compute.

17 Arithmetic mean imputation (mean substitution): use the mean of the available cases to fill in the missing values
- When Y has missing values, replace each one with the mean of Y computed from the cases with observed Y.
- Reduces the variability of the data and attenuates correlations.
- Severely biases parameter estimates, even under MCAR.
- [Figure from Schafer & Graham (2002); axes X and Y]
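
A small sketch of why mean substitution shrinks variability. The values are invented for illustration; `None` marks a missing value.

```python
from statistics import mean, variance

# Hypothetical variable Y with two missing values.
y = [10.0, None, 14.0, None, 18.0, 22.0]

observed = [v for v in y if v is not None]
y_mean = mean(observed)

# Mean substitution: every missing value becomes the observed mean.
imputed = [y_mean if v is None else v for v in y]

var_observed = variance(observed)  # based on the 4 observed cases
var_imputed = variance(imputed)    # based on all 6 cases after imputation
print(y_mean, round(var_observed, 2), round(var_imputed, 2))  # 16.0 26.67 16.0
```

The imputed cases sit exactly at the mean, so they add nothing to the sum of squared deviations while increasing the sample size, which is why the variance drops.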

18 Regression imputation (conditional mean imputation): use the predicted scores from a regression equation estimated on the complete cases to fill in the missing values
- Predicted score: $Y_i^* = \beta_0 + \beta_1 X_i$
- Reduces variability and overestimates the correlations between variables and $R^2$, even under MCAR.
- [Figure from Schafer & Graham (2002)]

19 Stochastic regression imputation: use the predicted scores from a regression equation estimated on the complete cases, plus a normally distributed error term, to fill in the missing values
- Predicted score: $Y_i^* = \beta_0 + \beta_1 X_i + z_i$, where $z_i \sim N(0, \sigma^2)$
- Adding residual terms to the predicted values restores the variability of the imputed data and eliminates the biases.
- Provides unbiased estimates under MAR, just like ML and MI.
- However, it underestimates the standard errors, which inflates the Type I error rate.
- [Figure from Schafer & Graham (2002)]
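
The difference between the two regression-based methods can be sketched as follows. This is a simplified, single-predictor illustration with invented data, not the SPSS implementation; the residual standard deviation of the complete-case regression is used as the σ of the added noise.

```python
import random
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x, plus the residual SD."""
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sigma = sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))
    return b0, b1, sigma


def regression_impute(xs, ys, stochastic, rng):
    """Fill missing ys (None) from the complete-case regression of y on x."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    b0, b1, sigma = fit_line([x for x, _ in complete], [y for _, y in complete])
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            noise = rng.gauss(0.0, sigma) if stochastic else 0.0
            y = b0 + b1 * x + noise  # predicted score, optionally with a random residual
        filled.append(y)
    return filled


# Hypothetical data: X is complete, Y has two missing values.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.2]

rng = random.Random(2009)  # fixed seed so the stochastic draws are reproducible
deterministic = regression_impute(xs, ys, stochastic=False, rng=rng)
stochastic = regression_impute(xs, ys, stochastic=True, rng=rng)
```

In the deterministic version the imputed points fall exactly on the regression line, which is what inflates correlations; the stochastic version scatters them around the line.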

20 Hot-deck imputation: impute the missing values from similar respondents
- Procedure: suppose some respondents did not report their income. Classify respondents into cells (groups) based on demographic information such as age, gender, and marital status, then randomly draw an income value from a similar respondent in the same cell.
- Reduces variability to some extent and produces biased correlation estimates and regression coefficients.
- [Figure from Schafer & Graham (2002)]
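
The hot-deck procedure can be sketched as below. The records, the cell variables, and the function name `hot_deck` are all hypothetical; real implementations differ in how cells and donors are chosen.

```python
import random


def hot_deck(rows, cell_keys, target, rng):
    """Fill missing `target` values with a random donor value from the same cell."""
    donors = {}
    for row in rows:
        if row[target] is not None:
            cell = tuple(row[k] for k in cell_keys)
            donors.setdefault(cell, []).append(row[target])

    completed = []
    for row in rows:
        new_row = dict(row)  # copy so the original records stay untouched
        if new_row[target] is None:
            pool = donors.get(tuple(row[k] for k in cell_keys))
            if pool:  # the value stays missing if the cell has no donors
                new_row[target] = rng.choice(pool)
        completed.append(new_row)
    return completed


# Hypothetical respondents; None marks an unreported income.
rows = [
    {"age": "18-34", "gender": "F", "income": 42000},
    {"age": "18-34", "gender": "F", "income": 38000},
    {"age": "18-34", "gender": "F", "income": None},
    {"age": "35-54", "gender": "M", "income": 61000},
    {"age": "35-54", "gender": "M", "income": None},
]

completed = hot_deck(rows, ["age", "gender"], "income", random.Random(2009))
```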

21 Averaging the available items (multiple-item questionnaires)
- Researchers typically compute a scale score by summing or averaging the item responses that measure the same construct.
- Example: with 5 items measuring well-being, a respondent who answered only 3 of the items receives a scale score equal to the average of those 3 items. This is also called person mean substitution.
- Potential problems: Cronbach's alpha is incorrect, and variances and correlations may be biased.
- Use with caution, especially with a high rate of item nonresponse. ML and MI are better approaches.
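
A minimal sketch of person mean substitution for the five-item example above. The item responses are invented; `None` marks an unanswered item.

```python
def scale_score(items):
    """Average the answered items; return None if no item was answered."""
    answered = [v for v in items if v is not None]
    return sum(answered) / len(answered) if answered else None


# Hypothetical respondent who answered 3 of 5 well-being items.
responses = [4, None, 3, None, 5]
score = scale_score(responses)
print(score)  # 4.0
```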

22 Last observation carried forward: longitudinal designs
- [Example tables: observed data by ID across waves W1, W2, W3, ..., before and after carrying the last observation forward]
- Replace each missing value with the observation immediately before dropout.
- Assumes the scores do not change from the previous measurement.
- Likely to produce biased estimates, even when the data are MCAR.
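
A short sketch of last observation carried forward for one participant. The wave scores are hypothetical; `None` marks a missed wave.

```python
def locf(waves):
    """Replace each missing wave with the most recent observed value."""
    filled, last = [], None
    for value in waves:
        if value is not None:
            last = value
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


print(locf([12, 15, None, None]))  # [12, 15, 15, 15]
```

The filled-in values are flat by construction, which is the "no change since the previous measurement" assumption stated above.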

23 Recommended methods for handling missing data
- Maximum likelihood (full information maximum likelihood, FIML)
- Multiple imputation

24 Why FIML or multiple imputation (MI)?
- Traditional methods each have their own limitations, and some make strict assumptions about the missing data mechanism.
- FIML and MI provide better and more trustworthy parameter estimates.
- They support more appropriate conclusions from your statistical tests.
- They add rigor to your study.

25 Full information maximum likelihood (FIML)
- Assumes MAR and multivariate normality.
- Implemented in structural equation modeling programs such as Mplus (the default when the outcome is continuous).
- In the missing data context, FIML uses all the information in the dataset to estimate the parameters and standard errors directly, handling missing data in one step.
- Does not drop any cases with missing values and does not produce imputed datasets.
- FIML reads in the raw data one case at a time and evaluates the likelihood function one case at a time.

26 Full information maximum likelihood (FIML)
- The computations for a case use only the information from the variables, and the corresponding parameters, for which the case has complete data (Enders, 2010, p. 89).
- This implies that the computations differ slightly depending on the missing data pattern of the case (the likelihood function is customized to each missing data pattern).
- Estimation is iterative: each iteration tries different parameter estimates until the procedure finds the set of parameter values that maximizes the likelihood function (Enders, 2010), i.e., the values under which the observed data are most probable and the model fits the data best.
- ML has converged when the parameter estimates no longer change across successive iterations.

27 Full information maximum likelihood (FIML)
- An iterative process: the distribution is tried in many possible locations until the program finds the set of parameters for which the distribution best fits the data (the highest probability/likelihood of observing the data).
- [Figure: distribution of reading achievement]

28 Multiple imputation (MI)
- Assumes MAR; also described as multiple stochastic regression imputation (an iterative procedure).
- Available in Mplus, SAS, Stata, Blimp, SPSS, R, and other software.
- Involves three steps: the imputation phase, the analysis phase, and the pooling phase.
- [Diagram: a dataset with missing data -> imputed datasets 1, 2, ..., m -> results 1, 2, ..., m -> pooled (overall) results]

29 Multiple imputation: imputation phase
- SPSS uses fully conditional specification (FCS), also known as chained equations imputation or multivariate imputation by chained equations (MICE), a Markov chain Monte Carlo (MCMC) algorithm.
- Does not rely on the assumption of multivariate normality.
- Flexible in handling different types of variables. Scale variables: linear regression. Categorical variables: logistic regression.
- [Example table with columns ID, Age, Income, Gender]
- The imputation model is specified on a variable-by-variable basis. For each variable with missing data, a univariate (single dependent variable) imputation model is fitted using all other available variables in the model as predictors, and missing values are then imputed for the variable being fit (IBM SPSS Missing Values 24).

30 Multiple imputation: imputation phase
- The imputation process cycles through all variables with missing values iteratively (e.g., Age, Income, Gender), each time using the new, updated imputed values.
- This process is repeated several times. When the maximum number of iterations is reached (specified by the researcher or by default), the imputed values at the final iteration are saved and one imputed dataset is created.
- Requesting 5 imputations with 200 maximum iterations means that SPSS runs the MCMC algorithm 5 times and saves the imputed values at the 200th iteration each time.
- Generally, 5 to 10 iterations are sufficient, but it is advisable to be conservative. You may need to increase the number of iterations if the model has not converged (save the iteration history data in SPSS and plot it to assess convergence).

31 Multiple imputation: imputation phase
What variables should be included in the imputation model?
- (1) At a minimum, include the variables you are going to use in the subsequent analysis. For example, if you will run a regression model that uses gender and SES to predict freshman GPA, then gender, SES, and GPA should be included in the imputation model.
- (2) Include auxiliary variables: variables that are either correlates of missingness or correlates of an incomplete variable (Enders, 2010, p. 17). These variables may not be of substantive interest, but they help improve the quality of the imputations and increase the plausibility of MAR. Examples are parents' education level, ACT, SAT, and other variables in the dataset that are correlated with the variables of interest or with their missingness.

32 Multiple imputation: imputation phase
How many imputed datasets are needed?
- There is a strong association between statistical power and the number of imputations.
- Conventional wisdom recommends 3 to 5 imputed datasets; however, research has shown that with only 3 or 5 imputed datasets, power is below its optimal level (Graham et al., 2007).
- According to Enders (2011), generating a minimum of 20 imputed datasets seems to be a good rule of thumb for many situations.
- If the proportion of missing data is greater than 50%, increase the number of imputations to more than 40 and be thoughtful about the variables included in the imputation model.

33 Multiple imputation: analysis phase
- The imputation phase generates m imputed datasets.
- In the analysis phase, each imputed dataset is analyzed using the normal analysis procedure.
- Example: a researcher generates 20 datasets and would like to analyze the data with multiple regression. She or he repeats the multiple regression analysis 20 times, once for each dataset.
- [Example tables for Dataset 1 and Dataset 2, each listing the parameter, β, and SE for the intercept and SES]

34 Multiple imputation: pooling phase
- Pooling point estimates: $\bar{\theta} = \frac{1}{m} \sum_{t=1}^{m} \hat{\theta}_t$, where m = number of imputed datasets and $\hat{\theta}_t$ = parameter estimate from dataset t. In words: take the average of the parameter estimates across the m datasets.
- Pooling standard errors: $V_T = V_W + V_B + \frac{V_B}{m}$ and $SE = \sqrt{V_T}$, where
  - $V_T$ = total sampling variance
  - $V_W$ = within-imputation variance (the mean of the squared SEs across the m datasets)
  - $V_B$ = between-imputation variance (the variability of the parameter estimate across the m datasets; additional variance that is due to missing data)
  - $V_B / m$ = correction factor for a finite number m of imputations
- The statistical significance of $\bar{\theta}$ can be assessed in the usual way, from the ratio $\bar{\theta} / \sqrt{V_T}$.
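
The pooling formulas can be checked with a short worked example. This is a sketch under the definitions on this slide (average of the estimates; total variance = within + between + between/m, with the between-imputation variance computed as a sample variance dividing by m - 1). The five estimates and standard errors are made up.

```python
from math import sqrt
from statistics import mean, variance


def pool(estimates, std_errors):
    """Pool one parameter across m imputed datasets."""
    m = len(estimates)
    theta_bar = mean(estimates)                 # pooled point estimate
    v_w = mean([se ** 2 for se in std_errors])  # within-imputation variance
    v_b = variance(estimates)                   # between-imputation variance (divides by m - 1)
    v_t = v_w + v_b + v_b / m                   # total sampling variance
    se = sqrt(v_t)
    return {"estimate": theta_bar, "v_w": v_w, "v_b": v_b, "v_t": v_t,
            "se": se, "ratio": theta_bar / se}


# Hypothetical regression coefficients and SEs from m = 5 imputed datasets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
std_errors = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled = pool(estimates, std_errors)
print(round(pooled["estimate"], 3), round(pooled["v_t"], 4), round(pooled["se"], 4))
# 0.5 0.0112 0.1058
```

Here V_W = 0.01, V_B = 0.001, and the correction term V_B/m = 0.0002, so the pooled SE (about 0.106) is larger than the SE from any single imputed dataset (0.10).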

35 Using SPSS to Deal with Missing Data

36 The example data
- High School Longitudinal Study of 2009, public-use data: one of the NCES secondary longitudinal studies, with more than 21,000 9th graders in 944 schools
- Selected sample: a subsample of 500 students who took math and science courses in 2009
- Selected measures:
  - 9th grade sex (0 = male), race/ethnicity (0 = white), socioeconomic status
  - 9th and 11th grade math IRT scores
  - 9th grade math interest (3 items; 4-point Likert scale)
  - 9th grade math self-efficacy (4 items; 4-point Likert scale)
- Demonstration dataset: Hsls09_MissingDataWorkshop_demo

37 Using SPSS to deal with missing data
- Delete cases with no data on any of the variables.
- All missing values need to be represented either as system-missing (a blank cell) or as user-defined missing (a value assigned by the researcher, such as 999 or -8888).

38 Using SPSS to deal with missing data Change all missing values (either system missing or user-defined missing value) to a common value Transform-> click Recode into Same Variables -> Select all of the variables into the selection box-> click Old and New Values->

39 Using SPSS to deal with missing data
- Assign missing values for all the variables: in Variable View, click a cell in the Missing column and assign -999 as a discrete missing value -> click OK.
- To apply it to the other variables: right-click the cell and choose Copy -> select the Missing cells of all numeric variables -> click Paste.

40 Using SPSS to deal with missing data
- Define the variables: in Variable View, under the Measure column, assign the measurement level for each variable.

41 Using SPSS to deal with missing data
- Analyze the pattern of missing data: go to Analyze -> Multiple Imputation -> Analyze Patterns.
- Move all variables except the ID into Analyze Across Variables.
- Change "Minimum percentage missing for variable to be displayed" to 0 so that everything that is missing is shown -> click OK.

42 Using SPSS to deal with missing data
Only 1.83% of the individual values are missing.
- Variables: 9 of the 12 variables contain missing values (green).
- Cases: 409 cases (81.8%) have complete data (blue); 91 cases have at least one missing value.
- Values: 110 of the 6,000 individual values (12 × 500) are missing (1.83%) (green).
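
The three summaries reported here (variables, cases, values) can be reproduced for any rectangular dataset. The sketch below uses a tiny invented dataset with `None` for missing cells, then rechecks the arithmetic quoted on this slide; `missing_summary` is a hypothetical helper, not an SPSS function.

```python
def missing_summary(rows):
    """Summarize missingness for equal-length records (None = missing)."""
    n_cases, n_vars = len(rows), len(rows[0])
    vars_with_missing = sum(any(row[j] is None for row in rows) for j in range(n_vars))
    complete_cases = sum(all(v is not None for v in row) for row in rows)
    missing_values = sum(v is None for row in rows for v in row)
    return {
        "vars_with_missing": vars_with_missing,
        "pct_complete_cases": 100 * complete_cases / n_cases,
        "pct_missing_values": 100 * missing_values / (n_cases * n_vars),
    }


# Tiny made-up dataset: 4 cases, 3 variables.
toy = [
    [1, 2.0, 3],
    [1, None, 3],
    [0, 2.5, None],
    [1, 3.0, 4],
]
summary = missing_summary(toy)

# Arithmetic from the slide: 110 of 12 * 500 values missing, 409 of 500 cases complete.
slide_pct_values = round(100 * 110 / (12 * 500), 2)  # 1.83
slide_pct_cases = round(100 * 409 / 500, 1)          # 81.8
```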

43 Using SPSS to deal with missing data
- The output shows the number and percentage of missing values for each variable. The variables are ordered by the percentage missing.
- Examine the percentage missing for each variable and make sure it makes sense given your knowledge of the dataset.

44 Using SPSS to deal with missing data
- The pattern here is arbitrary.
- Each pattern (row) reflects a group of cases with the same pattern of missing values (15 patterns of missing and nonmissing data).
- The variables along the bottom (x-axis) are ordered by the amount of missing values each contains, from least to highest.
- The second chart shows the percentage of cases for the 10 most common patterns. Pattern 1 (no missing values, 81%) is the most prevalent pattern; Pattern 10 is missing on MATH11 (10%).

45 Using SPSS to deal with missing data
- Request Little's MCAR test and independent-samples t-tests for MAR: go to Analyze -> Missing Value Analysis -> Descriptive.
- Report Student's t-tests for each pair of continuous variables to examine MAR.

46 Using SPSS to deal with missing data
- Request Little's MCAR test and separate variance t-tests: go to Analyze -> Missing Value Analysis.
- Note: if the SPSS output shows a warning that the EM algorithm failed to converge in 25 iterations, you can increase the maximum number of iterations by clicking the EM button.

47 Using SPSS to deal with missing data
- Little's MCAR test and separate variance t-tests: scroll down in the SPSS Output window to the EM Means table. The result of Little's MCAR test is reported under this table.
- A non-significant result (here p = .054) means the null hypothesis of MCAR is not rejected, so the data are consistent with missing completely at random (MCAR).

48 Examine the independent-samples t-tests
- A significant t-test indicates that the probability of missingness is a function of the values of another variable. This is an indication of MAR.
- It also means we have variables that can be used in the imputation model.

49 Analysis model
- Research question: can students' SES and math self-efficacy predict their 11th grade math score?
- Dependent variable: MATH11
- Independent variables: SES and EFF_total (sum of 4 items)
- Auxiliary variables (for imputation): SEX, RACE, MATH09, math interest items
  - Correlation analysis: these variables are correlated to some extent with the variables of interest.
  - Independent-samples t-tests: some of them are related to missingness on the variables of interest.

50 Before imputation, set a random seed
- Transform -> Random Number Generators -> select Set Active Generator -> click Mersenne Twister -> select Set Starting Point and Fixed Value -> click OK.

51 Using SPSS to deal with missing data
- Conduct the multiple imputation: Analyze -> Multiple Imputation -> Impute Missing Data Values -> move the variables of interest to the Variables in Model box.

52 Variables tab
- 5 imputations are requested for demonstration purposes: the missing values will be imputed 5 times and stored.
- Name the dataset under the Create a new dataset option.

53 Method tab
- Since the missing data pattern is arbitrary, select FCS.
- Set the maximum number of iterations to 200 (default = 10). Increase the number of iterations if the Markov chain Monte Carlo algorithm has not converged.
- PMM (predictive mean matching) still uses regression, but each imputed value is adjusted to match the nearest actual value in the dataset (taken from an observation with a similar predicted value and no missing value on that variable). If the original variable is bounded by 0 and 40, the imputed values will also be bounded by 0 and 40.
- According to Paul Allison, there are some drawbacks to PMM in SPSS.

54 Constraints tab
- (1) Click Scan Data to examine the variable summary.
- (2) You can specify the role of each variable during imputation and constrain the range of the imputed values (min, max, rounding) so that they are plausible.
- (3) The rounding column lets you specify the smallest denomination to accept. To obtain integer values, specify 1 as the rounding denomination (6.648 -> 7); to obtain values rounded to the nearest cent, specify 0.01 (6.648 -> 6.65).

55 Constraints tab
- If you specify a minimum and maximum, the maximum draws procedure is activated: it attempts to draw values for a case until it finds a set of values that fall within the specified ranges.
- An error occurs if no set of values within the ranges is obtained; in that case, increase the maximum draws.
- Demonstration: no constraints are placed on the range of the variables.
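
The idea behind a maximum-draws limit can be illustrated with a simple rejection loop. This is a generic sketch, not SPSS's internal algorithm: it redraws a single value rather than a set of values for a case, and the function name, the normal draw, and its parameters are assumptions made for the example.

```python
import random


def draw_within_range(draw, minimum, maximum, max_draws):
    """Redraw until a value falls inside [minimum, maximum], up to max_draws tries."""
    for _ in range(max_draws):
        value = draw()
        if minimum <= value <= maximum:
            return value
    raise ValueError(f"no value within [{minimum}, {maximum}] after {max_draws} draws")


rng = random.Random(2009)  # fixed seed for reproducibility

# Hypothetical imputation draw for a GPA that must stay between 0 and 4.
gpa = draw_within_range(lambda: rng.gauss(2.5, 1.0), 0.0, 4.0, max_draws=50)
```

If the range is narrow relative to the spread of the draws, the loop can exhaust its attempts, which corresponds to the error case described above and is why raising the maximum draws helps.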

56 Output options
- Imputation model: univariate model type, model effects, and number of values imputed
- Descriptive statistics: basic information before and after imputation
- Iteration history: information on convergence performance

57 Outputs
- Hsls09_MissingDataWorkshop_demo2_imputed5

58 Datasets with imputed values are numbered 1 through M, where M is the number of imputations. Select the imputation from the drop-down list in the edit bar in Data View.

59 You can distinguish imputed values from observed values by the cell background color.

60 Create a composite score: Transform -> Compute Variable
- Compute the scale score (composite score) for self-efficacy in the stacked dataset.
- The computation applies to all the imputed datasets.

61 Before the analysis: Data -> Split File
- Split the file by imputation number.
- This invokes the analysis and pooling phases for multiply imputed datasets.

62 Analyze the data as usual
- SPSS provides pooled estimates for some analyses but not all.
- Analyses marked with the multiple imputation icon have a corresponding procedure that accommodates multiply imputed datasets.
- Let's perform a multiple regression.

63 SPSS outputs for multiple regression: descriptive statistics

64 SPSS outputs for multiple regression: correlation matrix

65 SPSS outputs for multiple regression: coefficient estimates
- [Coefficients table. Rows: (Constant), X1 Socio-economic status composite, and EFF_total, listed by imputation number for the original data and the pooled results. Columns: unstandardized coefficients (B, Std. Error), standardized coefficients (Beta), t, and Sig.; the pooled rows also report Fraction Missing Info., Relative Increase Variance, and Relative Efficiency. Dependent variable: X2 Mathematics IRT-estimated number right score.]
- Results differ slightly across the imputed datasets.
- SPSS provides pooled estimates for the unstandardized regression coefficients.

66 Imputation Diagnostics

67 SPSS outputs for multiple regression: coefficient estimates
- Fraction missing info: the proportion of the total sampling variance of a parameter estimate that is due to missing data, $(V_B + V_B/m) / V_T$. It is related to the percentage missing for that variable.
- Example for SES: 8.7% of the sampling variance is due to missing data.
- It is a measure of the impact of missing data on the parameter estimates.

68 SPSS outputs for multiple regression: coefficient estimates
- Relative increase variance: how much the sampling variance is increased (inflated) because of missingness, $(V_B + V_B/m) / V_W$.
- Example for EFF_total: compared with the sampling variance EFF_total would have with complete data, the estimated sampling variance with missing data is 14.1% larger.
- Variables with a larger percentage of missingness tend to have a larger relative increase in variance.

69 SPSS outputs for multiple regression: coefficient estimates
- Relative efficiency: the efficiency of an estimate based on m imputations relative to one based on an infinite number of imputations, $1 / (1 + F/M)$, where F = fraction of missing information and M = number of imputations.
- Values close to 1 are more efficient and produce proper SEs (the SE will not be too large).
- A large percentage of missing data needs more imputations to achieve sufficient efficiency for the parameter estimates.
- Example: the SE obtained from an infinite number of imputations is 98.3% of the SE obtained from 5 imputations.
- Source: SAS documentation for multiple imputation (Horton & Lipsitz, 2001, p. 246).
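
The three diagnostics can be computed directly from the simplified formulas given on these slides. This is a sketch only: SPSS may apply additional adjustments when it reports these quantities, and the variance components passed to `mi_diagnostics` below are invented. The relative efficiency check uses the slide's own numbers for SES (F = 8.7%, M = 5).

```python
def relative_efficiency(fmi, m):
    """Efficiency of m imputations relative to infinitely many: 1 / (1 + F / M)."""
    return 1 / (1 + fmi / m)


def mi_diagnostics(v_w, v_b, m):
    """Fraction of missing information, relative increase in variance, relative efficiency."""
    extra = v_b + v_b / m   # variance attributable to missing data
    v_t = v_w + extra       # total sampling variance
    fmi = extra / v_t
    riv = extra / v_w
    return fmi, riv, relative_efficiency(fmi, m)


# Slide example: F = 0.087 with 5 imputations.
print(round(relative_efficiency(0.087, 5), 3))  # 0.983

# Hypothetical variance components: V_W = 0.01, V_B = 0.001, m = 5.
fmi, riv, re = mi_diagnostics(0.01, 0.001, 5)
print(round(fmi, 3), round(riv, 3), round(re, 3))  # 0.107 0.12 0.979
```

The first result reproduces the 98.3% quoted on the slide, which suggests that figure refers to the SES coefficient.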

70 Iteration history
- Provides the mean and standard deviation by iteration and imputation for each continuous imputed variable.
- Build a plot from it to examine the convergence of the model.

71 Assessing the performance of imputations
- Graphs -> Chart Builder -> select a line chart.

72 Assessing the performance of imputations

73 Assessing the performance of imputations
- In Element Properties, select Value as the statistic to display.

74 Assessing the performance of imputations

75 The plot shows the mean and standard deviation of the imputed values of SES at each of the 200 iterations for each of the 5 requested imputations (it can be requested for every continuous imputed variable).
- The purpose of this plot is to look for trends or patterns.
- The model has converged when the parameter values bounce around randomly with no trend (here this phase is reached immediately) and the lines for the different imputations mix with each other.

76 Assessing the performance of imputations using trace plots (using Enders's macro)
- Plots of the mean and SD of the imputed continuous variables can be requested using Enders's SPSS macro. They give an indication of the performance of the imputations.
- Settings used with this macro: 1000 iterations with 2 imputed datasets.
- The macro provides an additional convergence criterion, the potential scale reduction (PSR), for every 100 iterations: the MCMC algorithm is regarded as converged when the PSR <

77 Problematic or pathological case of non-convergence: Figure from Buuren, S. V., & Groothuis-Oudshoorn, K. (2010). mice: Multivariate imputation by chained equations in R. Journal of statistical software,

78 Healthy case of convergence: Figure from Buuren, S. V., & Groothuis-Oudshoorn, K. (2010). mice: Multivariate imputation by chained equations in R. Journal of statistical software,

79 Practice time!

80 The practice data
- High School Longitudinal Study of 2009: public-use data
- Selected sample: a subsample of 490 participants who took math and science courses in 2009
- Selected measures:
  - 9th grade sex (0 = male), race/ethnicity (0 = white), SES
  - 9th and 11th grade math and science GPA
  - 9th grade science utility (3 items; 4-point Likert scale)
  - 9th grade science self-efficacy (4 items; 4-point Likert scale)
- Nominal variables: SEX, RACE
- Scale variables: SES, MGPA12, SGPA12
- Ordinal variables: science utility and self-efficacy items

81 Analysis model
- Research question: can students' race, SES, and science self-efficacy predict their 12th grade science GPA?
- Dependent variable: SGPA12
- Independent variables: Race, SES, and SEFF_total (sum of 4 items)
- Auxiliary variables for the imputation model: Sex, MGPA12, science utility items
- Examine the correlation analysis and univariate t-tests.

82 Tasks: you can do it!
- Change all missing values (either system-missing or user-defined missing) to a common value, e.g., 999.
- Assign missing values for all the variables in Variable View.
- Define the variables: in Variable View, under the Measure column, assign the measurement level for each variable.
- Analyze the pattern of missing data and examine the percentage missing (what percentage is missing?).
- Request Little's MCAR test (EM) and separate variance t-tests.
- Conduct the multiple imputation: 10 datasets, 100 iterations. Remember to set the minimum and maximum values of science and math GPA to 0 and 4.
- Create a composite score for science self-efficacy.
- Run a regression model to answer the research question.
- Examine the convergence of the model using the iteration history.

83 Practical Issues/Myths

84 Practical issues/myths
Is imputation making up the data?
- Not really! The goal of imputation is not to produce individual values and treat them as real data, but to estimate the population parameters and preserve important characteristics of the dataset as a whole (Graham, 2008).
- MI accounts for the uncertainty associated with the missing data, so unbiased estimates can be obtained.

85 Practical issues/myths
Should both independent and dependent variables be included in the imputation model (MI)?
- At a minimum, all the variables you will use in your analysis should be included.
- Why? When the DV is not included, its correlations with the IVs are assumed to be 0, so excluding it weakens its relationships with the other variables.
- Take a liberal approach to variable selection in the imputation phase.
- The programs do not distinguish whether a variable is an IV or a DV.

86 Practical issues
Why include auxiliary variables?
- Inclusive analysis strategy: ML and MI require MAR, and since there is no test for MAR, we need to find ways to make MAR more plausible.
- Schafer and Graham (2002, p. 173): collecting data on the potential causes of missingness may effectively convert an MNAR situation to MAR.
- Incorporating a number of auxiliary variables helps increase statistical power or reduce bias in the parameter estimates.
- Use as many as you can; most useful are those with correlations

87 Practical issues
When working with multiple-item questionnaires, should you impute the individual items or the scale scores?
- If feasible, impute the individual items, since this maximizes the information available for creating the imputations and gives more statistical power than imputing scale scores (Enders, 2010, p ).

88 Practical issues
What if my missing data are MNAR?
- Use selection modeling and pattern mixture modeling (Chapter 10 in Enders's Applied Missing Data Analysis).
- These two models deal with the MNAR situation by statistically modeling the missing data mechanism.
- Enders, C. K. (2011). Missing not at random models for latent growth curve analyses. Psychological Methods, 16(1), 1.

89 What should I report when I write it up?
- Missing data mechanisms
- Percentage missing for each variable and overall percentage missing
- Software used for missing data imputation
- Imputation method and algorithm
- Number of imputed datasets
- The variables used in the imputation model

90 References
- Enders, C. K. (2010). Applied missing data analysis. Guilford Press.
- Graham, J. W. (2012). Missing data: Analysis and design. Springer.
- Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60.
- Pigott, T. D. (2001). A review of methods for missing data. Educational Research and Evaluation, 7(4).
- Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147.
- Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how does it work? International Journal of Methods in Psychiatric Research, 20(1).
- Puma, M. J., Olsen, R. B., Bell, S. H., & Price, C. (2009). What to do when data are missing in group randomized controlled trials. NCEE National Center for Education Evaluation and Regional Assistance.
- IBM SPSS Missing Values 21 & 24 (user manual).
- Buuren, S. V., & Groothuis-Oudshoorn, K. (2010). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software.

91 Recommended websites UCLA: idre SAS: Stata: 1_new/ Craig Enders's website: Mplus: Blimp:

92 Thank you Don't be afraid of missing data!

- 1 - Fig. A5.1 Missing value analysis dialog box

- 1 - Fig. A5.1 Missing value analysis dialog box WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation

More information

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background An Introduction to Multiple Imputation and its Application Craig K. Enders University of California - Los Angeles Department of Psychology cenders@psych.ucla.edu Background Work supported by Institute

More information

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa Ronald H. Heck 1 In this handout, we will address a number of issues regarding missing data. It is often the case that the weakest point of a study is the quality of the data that can be brought to bear

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1

More information

Missing Data and Imputation

Missing Data and Imputation Missing Data and Imputation NINA ORWITZ OCTOBER 30 TH, 2017 Outline Types of missing data Simple methods for dealing with missing data Single and multiple imputation R example Missing data is a complex

More information

Missing Data: What Are You Missing?

Missing Data: What Are You Missing? Missing Data: What Are You Missing? Craig D. Newgard, MD, MPH Jason S. Haukoos, MD, MS Roger J. Lewis, MD, PhD Society for Academic Emergency Medicine Annual Meeting San Francisco, CA May 006 INTRODUCTION

More information

Missing Data Techniques

Missing Data Techniques Missing Data Techniques Paul Philippe Pare Department of Sociology, UWO Centre for Population, Aging, and Health, UWO London Criminometrics (www.crimino.biz) 1 Introduction Missing data is a common problem

More information

Missing Data Missing Data Methods in ML Multiple Imputation

Missing Data Missing Data Methods in ML Multiple Imputation Missing Data Missing Data Methods in ML Multiple Imputation PRE 905: Multivariate Analysis Lecture 11: April 22, 2014 PRE 905: Lecture 11 Missing Data Methods Today s Lecture The basics of missing data:

More information

Multiple Imputation for Missing Data. Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health

Multiple Imputation for Missing Data. Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health Multiple Imputation for Missing Data Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health Outline Missing data mechanisms What is Multiple Imputation? Software Options

More information

WELCOME! Lecture 3 Thommy Perlinger

WELCOME! Lecture 3 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important

More information

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES UNIVERSITY OF GLASGOW MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES by KHUNESWARI GOPAL PILLAY A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in

More information

Missing Data. SPIDA 2012 Part 6 Mixed Models with R:

Missing Data. SPIDA 2012 Part 6 Mixed Models with R: The best solution to the missing data problem is not to have any. Stef van Buuren, developer of mice SPIDA 2012 Part 6 Mixed Models with R: Missing Data Georges Monette 1 May 2012 Email: georges@yorku.ca

More information

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS Examples: Missing Data Modeling And Bayesian Analysis CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS Mplus provides estimation of models with missing data using both frequentist and Bayesian

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,

More information

Multiple imputation using chained equations: Issues and guidance for practice

Multiple imputation using chained equations: Issues and guidance for practice Multiple imputation using chained equations: Issues and guidance for practice Ian R. White, Patrick Royston and Angela M. Wood http://onlinelibrary.wiley.com/doi/10.1002/sim.4067/full By Gabrielle Simoneau

More information

Simulation of Imputation Effects Under Different Assumptions. Danny Rithy

Simulation of Imputation Effects Under Different Assumptions. Danny Rithy Simulation of Imputation Effects Under Different Assumptions Danny Rithy ABSTRACT Missing data is something that we cannot always prevent. Data can be missing due to subjects' refusing to answer a sensitive

More information

NORM software review: handling missing values with multiple imputation methods 1

NORM software review: handling missing values with multiple imputation methods 1 METHODOLOGY UPDATE I Gusti Ngurah Darmawan NORM software review: handling missing values with multiple imputation methods 1 Evaluation studies often lack sophistication in their statistical analyses, particularly

More information

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL SPSS QM II SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details around a certain function/analysis method not covered

More information

PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects

PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects Let s create a data for a variable measured repeatedly over five occasions We could create raw data (for each

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

Introduction to Mixed Models: Multivariate Regression

Introduction to Mixed Models: Multivariate Regression Introduction to Mixed Models: Multivariate Regression EPSY 905: Multivariate Analysis Spring 2016 Lecture #9 March 30, 2016 EPSY 905: Multivariate Regression via Path Analysis Today s Lecture Multivariate

More information

in this course) ˆ Y =time to event, follow-up curtailed: covered under ˆ Missing at random (MAR) a

in this course) ˆ Y =time to event, follow-up curtailed: covered under ˆ Missing at random (MAR) a Chapter 3 Missing Data 3.1 Types of Missing Data ˆ Missing completely at random (MCAR) ˆ Missing at random (MAR) a ˆ Informative missing (non-ignorable non-response) See 1, 38, 59 for an introduction to

More information

The Performance of Multiple Imputation for Likert-type Items with Missing Data

The Performance of Multiple Imputation for Likert-type Items with Missing Data Journal of Modern Applied Statistical Methods Volume 9 Issue 1 Article 8 5-1-2010 The Performance of Multiple Imputation for Likert-type Items with Missing Data Walter Leite University of Florida, Walter.Leite@coe.ufl.edu

More information

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International Part A: Comparison with FIML in the case of normal data. Stephen du Toit Multivariate data

More information

Missing data a data value that should have been recorded, but for some reason, was not. Simon Day: Dictionary for clinical trials, Wiley, 1999.

Missing data a data value that should have been recorded, but for some reason, was not. Simon Day: Dictionary for clinical trials, Wiley, 1999. 2 Schafer, J. L., Graham, J. W.: (2002). Missing Data: Our View of the State of the Art. Psychological methods, 2002, Vol 7, No 2, 47 77 Rosner, B. (2005) Fundamentals of Biostatistics, 6th ed, Wiley.

More information

Missing Data. Where did it go?

Missing Data. Where did it go? Missing Data Where did it go? 1 Learning Objectives High-level discussion of some techniques Identify type of missingness Single vs Multiple Imputation My favourite technique 2 Problem Uh data are missing

More information

IBM SPSS Missing Values 21

IBM SPSS Missing Values 21 IBM SPSS Missing Values 21 Note: Before using this information and the product it supports, read the general information under Notices on p. 87. This edition applies to IBM SPSS Statistics 21 and to all

More information

MISSING DATA AND MULTIPLE IMPUTATION

MISSING DATA AND MULTIPLE IMPUTATION Paper 21-2010 An Introduction to Multiple Imputation of Complex Sample Data using SAS v9.2 Patricia A. Berglund, Institute For Social Research-University of Michigan, Ann Arbor, Michigan ABSTRACT This

More information

Example Using Missing Data 1

Example Using Missing Data 1 Ronald H. Heck and Lynn N. Tabata 1 Example Using Missing Data 1 Creating the Missing Data Variable (Miss) Here is a data set (achieve subset MANOVAmiss.sav) with the actual missing data on the outcomes.

More information

PRI Workshop Introduction to AMOS

PRI Workshop Introduction to AMOS PRI Workshop Introduction to AMOS Krissy Zeiser Pennsylvania State University klz24@pop.psu.edu 2-pm /3/2008 Setting up the Dataset Missing values should be recoded in another program (preferably with

More information

PASW Missing Values 18

PASW Missing Values 18 i PASW Missing Values 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412

More information

Handling missing data for indicators, Susanne Rässler 1

Handling missing data for indicators, Susanne Rässler 1 Handling Missing Data for Indicators Susanne Rässler Institute for Employment Research & Federal Employment Agency Nürnberg, Germany First Workshop on Indicators in the Knowledge Economy, Tübingen, 3-4

More information

Epidemiological analysis PhD-course in epidemiology

Epidemiological analysis PhD-course in epidemiology Epidemiological analysis PhD-course in epidemiology Lau Caspar Thygesen Associate professor, PhD 9. oktober 2012 Multivariate tables Agenda today Age standardization Missing data 1 2 3 4 Age standardization

More information

Epidemiological analysis PhD-course in epidemiology. Lau Caspar Thygesen Associate professor, PhD 25 th February 2014

Epidemiological analysis PhD-course in epidemiology. Lau Caspar Thygesen Associate professor, PhD 25 th February 2014 Epidemiological analysis PhD-course in epidemiology Lau Caspar Thygesen Associate professor, PhD 25 th February 2014 Age standardization Incidence and prevalence are strongly agedependent Risks rising

More information

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening Variables Entered/Removed b Variables Entered GPA in other high school, test, Math test, GPA, High school math GPA a Variables Removed

More information

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent

More information

HANDLING MISSING DATA

HANDLING MISSING DATA GSO international workshop Mathematic, biostatistics and epidemiology of cancer Modeling and simulation of clinical trials Gregory GUERNEC 1, Valerie GARES 1,2 1 UMR1027 INSERM UNIVERSITY OF TOULOUSE III

More information

Introduction to Mplus

Introduction to Mplus Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus

More information

An imputation approach for analyzing mixed-mode surveys

An imputation approach for analyzing mixed-mode surveys An imputation approach for analyzing mixed-mode surveys Jae-kwang Kim 1 Iowa State University June 4, 2013 1 Joint work with S. Park and S. Kim Ouline Introduction Proposed Methodology Application to Private

More information

Multiple-imputation analysis using Stata s mi command

Multiple-imputation analysis using Stata s mi command Multiple-imputation analysis using Stata s mi command Yulia Marchenko Senior Statistician StataCorp LP 2009 UK Stata Users Group Meeting Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

SOS3003 Applied data analysis for social science Lecture note Erling Berge Department of sociology and political science NTNU.

SOS3003 Applied data analysis for social science Lecture note Erling Berge Department of sociology and political science NTNU. SOS3003 Applied data analysis for social science Lecture note 04-2009 Erling Berge Department of sociology and political science NTNU Erling Berge 2009 1 Missing data Literature Allison, Paul D 2002 Missing

More information

Handling Data with Three Types of Missing Values:

Handling Data with Three Types of Missing Values: Handling Data with Three Types of Missing Values: A Simulation Study Jennifer Boyko Advisor: Ofer Harel Department of Statistics University of Connecticut Storrs, CT May 21, 2013 Jennifer Boyko Handling

More information

Generalized least squares (GLS) estimates of the level-2 coefficients,

Generalized least squares (GLS) estimates of the level-2 coefficients, Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

Blimp User s Guide. Version 1.0. Brian T. Keller. Craig K. Enders.

Blimp User s Guide. Version 1.0. Brian T. Keller. Craig K. Enders. Blimp User s Guide Version 1.0 Brian T. Keller bkeller2@ucla.edu Craig K. Enders cenders@psych.ucla.edu September 2017 Developed by Craig K. Enders and Brian T. Keller. Blimp was developed with funding

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Estimation of Item Response Models

Estimation of Item Response Models Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients

More information

AMELIA II: A Program for Missing Data

AMELIA II: A Program for Missing Data AMELIA II: A Program for Missing Data Amelia II is an R package that performs multiple imputation to deal with missing data, instead of other methods, such as pairwise and listwise deletion. In multiple

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Multiple Imputation with Mplus

Multiple Imputation with Mplus Multiple Imputation with Mplus Tihomir Asparouhov and Bengt Muthén Version 2 September 29, 2010 1 1 Introduction Conducting multiple imputation (MI) can sometimes be quite intricate. In this note we provide

More information

Missing Data Analysis with the Mahalanobis Distance

Missing Data Analysis with the Mahalanobis Distance Missing Data Analysis with the Mahalanobis Distance by Elaine M. Berkery, B.Sc. Department of Mathematics and Statistics, University of Limerick A thesis submitted for the award of M.Sc. Supervisor: Dr.

More information

An Introduction to Growth Curve Analysis using Structural Equation Modeling

An Introduction to Growth Curve Analysis using Structural Equation Modeling An Introduction to Growth Curve Analysis using Structural Equation Modeling James Jaccard New York University 1 Overview Will introduce the basics of growth curve analysis (GCA) and the fundamental questions

More information

Data analysis using Microsoft Excel

Data analysis using Microsoft Excel Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data

More information

[/TTEST [PERCENT={5}] [{T }] [{DF } [{PROB }] [{COUNTS }] [{MEANS }]] {n} {NOT} {NODF} {NOPROB}] {NOCOUNTS} {NOMEANS}

[/TTEST [PERCENT={5}] [{T }] [{DF } [{PROB }] [{COUNTS }] [{MEANS }]] {n} {NOT} {NODF} {NOPROB}] {NOCOUNTS} {NOMEANS} MVA MVA [VARIABLES=] {varlist} {ALL } [/CATEGORICAL=varlist] [/MAXCAT={25 ** }] {n } [/ID=varname] Description: [/NOUNIVARIATE] [/TTEST [PERCENT={5}] [{T }] [{DF } [{PROB }] [{COUNTS }] [{MEANS }]] {n}

More information

Lecture 26: Missing data

Lecture 26: Missing data Lecture 26: Missing data Reading: ESL 9.6 STATS 202: Data mining and analysis December 1, 2017 1 / 10 Missing data is everywhere Survey data: nonresponse. 2 / 10 Missing data is everywhere Survey data:

More information

Statistical Matching using Fractional Imputation

Statistical Matching using Fractional Imputation Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:

More information

Missing Data in Orthopaedic Research

Missing Data in Orthopaedic Research in Orthopaedic Research Keith D Baldwin, MD, MSPT, MPH, Pamela Ohman-Strickland, PhD Abstract Missing data can be a frustrating problem in orthopaedic research. Many statistical programs employ a list-wise

More information

A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY

A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY Norman Solomon School of Computing and Technology University of Sunderland A thesis submitted in partial fulfilment of the requirements of the University

More information

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Creation & Description of a Data Set * 4 Levels of Measurement * Nominal, ordinal, interval, ratio * Variable Types

More information

Performance of Sequential Imputation Method in Multilevel Applications

Performance of Sequential Imputation Method in Multilevel Applications Section on Survey Research Methods JSM 9 Performance of Sequential Imputation Method in Multilevel Applications Enxu Zhao, Recai M. Yucel New York State Department of Health, 8 N. Pearl St., Albany, NY

More information

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data Introduction About this Document This manual was written by members of the Statistical Consulting Program as an introduction to SPSS 12.0. It is designed to assist new users in familiarizing themselves

More information

Creating a data file and entering data

Creating a data file and entering data 4 Creating a data file and entering data There are a number of stages in the process of setting up a data file and analysing the data. The flow chart shown on the next page outlines the main steps that

More information

Bootstrap and multiple imputation under missing data in AR(1) models

Bootstrap and multiple imputation under missing data in AR(1) models EUROPEAN ACADEMIC RESEARCH Vol. VI, Issue 7/ October 2018 ISSN 2286-4822 www.euacademic.org Impact Factor: 3.4546 (UIF) DRJI Value: 5.9 (B+) Bootstrap and multiple imputation under missing ELJONA MILO

More information

Longitudinal Modeling With Randomly and Systematically Missing Data: A Simulation of Ad Hoc, Maximum Likelihood, and Multiple Imputation Techniques

Longitudinal Modeling With Randomly and Systematically Missing Data: A Simulation of Ad Hoc, Maximum Likelihood, and Multiple Imputation Techniques 10.1177/1094428103254673 ORGANIZATIONAL Newman / LONGITUDINAL RESEARCH MODELS METHODS WITH MISSING DATA ARTICLE Longitudinal Modeling With Randomly and Systematically Missing Data: A Simulation of Ad Hoc,

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

SPSS INSTRUCTION CHAPTER 9

SPSS INSTRUCTION CHAPTER 9 SPSS INSTRUCTION CHAPTER 9 Chapter 9 does no more than introduce the repeated-measures ANOVA, the MANOVA, and the ANCOVA, and discriminant analysis. But, you can likely envision how complicated it can

More information

Robust Linear Regression (Passing- Bablok Median-Slope)

Robust Linear Regression (Passing- Bablok Median-Slope) Chapter 314 Robust Linear Regression (Passing- Bablok Median-Slope) Introduction This procedure performs robust linear regression estimation using the Passing-Bablok (1988) median-slope algorithm. Their

More information

Missing Data Part 1: Overview, Traditional Methods Page 1

Missing Data Part 1: Overview, Traditional Methods Page 1 Missing Data Part 1: Overview, Traditional Methods Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 17, 2015 This discussion borrows heavily from: Applied

More information

Missing data analysis. University College London, 2015

Missing data analysis. University College London, 2015 Missing data analysis University College London, 2015 Contents 1. Introduction 2. Missing-data mechanisms 3. Missing-data methods that discard data 4. Simple approaches that retain all the data 5. RIBG

More information

Types of missingness and common strategies

Types of missingness and common strategies 9 th UK Stata Users Meeting 20 May 2003 Multiple imputation for missing data in life course studies Bianca De Stavola and Valerie McCormack (London School of Hygiene and Tropical Medicine) Motivating example

More information

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values Chapter 500 Introduction This procedure produces tables of frequency counts and percentages for categorical and continuous variables. This procedure serves as a summary reporting tool and is often used

More information

Improving Imputation Accuracy in Ordinal Data Using Classification

Improving Imputation Accuracy in Ordinal Data Using Classification Improving Imputation Accuracy in Ordinal Data Using Classification Shafiq Alam 1, Gillian Dobbie, and XiaoBin Sun 1 Faculty of Business and IT, Whitireia Community Polytechnic, Auckland, New Zealand shafiq.alam@whitireia.ac.nz

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

IBM SPSS Categories 23

IBM SPSS Categories 23 IBM SPSS Categories 23 Note Before using this information and the product it supports, read the information in Notices on page 55. Product Information This edition applies to version 23, release 0, modification

More information

Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches

Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches Jonathan Kropko University of Virginia Ben Goodrich Columbia University Andrew Gelman Columbia University

More information

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Bengt Muth en University of California, Los Angeles Tihomir Asparouhov Muth en & Muth en Mplus

More information

Approaches to Missing Data

Approaches to Missing Data Approaches to Missing Data A Presentation by Russell Barbour, Ph.D. Center for Interdisciplinary Research on AIDS (CIRA) and Eugenia Buta, Ph.D. CIRA and The Yale Center of Analytical Studies (YCAS) April

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

Package midastouch. February 7, 2016

Package midastouch. February 7, 2016 Type Package Version 1.3 Package midastouch February 7, 2016 Title Multiple Imputation by Distance Aided Donor Selection Date 2016-02-06 Maintainer Philipp Gaffert Depends R (>=

More information

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by Paper CC-016 A macro for nearest neighbor Lung-Chang Chien, University of North Carolina at Chapel Hill, Chapel Hill, NC Mark Weaver, Family Health International, Research Triangle Park, NC ABSTRACT SAS

More information

Faculty of Sciences. Holger Cevallos Valdiviezo

Faculty of Sciences. Holger Cevallos Valdiviezo Faculty of Sciences Handling of missing data in the predictor variables when using Tree-based techniques for training and generating predictions Holger Cevallos Valdiviezo Master dissertation submitted

More information

Week 4: Simple Linear Regression II

Week 4: Simple Linear Regression II Week 4: Simple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Algebraic properties

More information

Enterprise Miner Tutorial Notes 2 1

Enterprise Miner Tutorial Notes 2 1 Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender

More information

Canadian National Longitudinal Survey of Children and Youth (NLSCY)

Canadian National Longitudinal Survey of Children and Youth (NLSCY) Canadian National Longitudinal Survey of Children and Youth (NLSCY) Fathom workshop activity For more information about the survey, see: http://www.statcan.ca/ Daily/English/990706/ d990706a.htm Notice

More information

Multidimensional Latent Regression

Multidimensional Latent Regression Multidimensional Latent Regression Ray Adams and Margaret Wu, 29 August 2010 In tutorial seven, we illustrated how ConQuest can be used to fit multidimensional item response models; and in tutorial five,

More information

Multivariate Capability Analysis

Multivariate Capability Analysis Multivariate Capability Analysis Summary... 1 Data Input... 3 Analysis Summary... 4 Capability Plot... 5 Capability Indices... 6 Capability Ellipse... 7 Correlation Matrix... 8 Tests for Normality... 8

More information
