R software and examples
- Henry Hutchinson
Transcription
Handling Missing Data in R with MICE

Stef van Buuren
Methodology and Statistics, FSBS, Utrecht University
Netherlands Organization for Applied Scientific Research TNO, Leiden
Winnipeg, June, 7

Why this course?
- Missing data are everywhere
- Ad-hoc fixes often do not work
- Multiple imputation is broadly applicable, yields correct statistical inferences, and there is good software
- Goal of the course: get comfortable with a modern and powerful way of solving missing data problems

Course materials

Reading materials
- Van Buuren, S. and Groothuis-Oudshoorn, C.G.M. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3).
- Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL. Chapters 6,.

Flexible Imputation of Missing Data (FIMD)

R software and examples
- R: Install from
- RStudio: Install from
- R package mice. or higher: from CRAN or from
- More examples:

Time table (morning)
Session | L/P | Description
--- | --- | ---
  | L | Overview
I | L | Introduction to missing data
I | P | Ad hoc methods + MICE
  |   | PAUSE
II | L | Multiple imputation
II | P | Boys data
  |   | PAUSE

Time table (afternoon)
Session | L/P | Description
--- | --- | ---
III | L | Generating plausible imputations
III | P | Algorithmic convergence and pooling
  |   | PAUSE
IV | L | Imputation in practice
IV | P | Post-processing and passive imputation
V | L | Guidelines for reporting
SESSION I

Problem of missing data

Why are missing data interesting?
- "Obviously the best way to treat missing data is not to have them." (Orchard and Woodbury 1972)
- "Sooner or later (usually sooner), anyone who does statistical analysis runs into problems with missing data" (Allison, 2002)
- Missing data problems are the heart of statistics

Causes of missing data
- Respondent skipped the item
- Data transmission/coding error
- Drop out in longitudinal research
- Refusal to cooperate
- Sample from population
- Question not asked, different forms
- Censoring

Consequences of missing data
- Less information than planned
- Enough statistical power?
- Different analyses, different n's
- Cannot calculate even the mean
- Systematic biases in the analysis
- Appropriate confidence interval, P-values?
In general, missing data can severely complicate interpretation and analysis.

Listwise deletion
- Analyze only the complete records
- Also known as Complete Case Analysis (CCA)
Advantages
- Simple (default in most software)
- Unbiased under MCAR
- Correct standard errors, significance levels
- Two special properties in regression
Disadvantages
- Wasteful
- Large standard errors
- Biased under MAR, even for simple statistics like the mean
- Inconsistencies in reporting

Mean imputation
- Replace the missing values by the mean of the observed data
Advantages
- Simple
- Unbiased for the mean, under MCAR
[Figure: histogram of Ozone (ppb); scatterplot of Ozone (ppb) against Solar Radiation (lang)]
Mean imputation
Disadvantages
- Disturbs the distribution
- Underestimates the variance
- Biases correlations to zero
- Biased under MAR
AVOID (unless you know what you are doing)

Regression imputation
- Also known as prediction
- Fit model for Y_obs under listwise deletion
- Predict Y_mis for records with missing Y's
- Replace missing values by prediction
Advantages
- Unbiased estimates of regression coefficients (under MAR)
- Good approximation to the (unknown) true data if explained variance is high
- Prediction is the favorite among non-statisticians
[Figure: histogram of Ozone (ppb); scatterplot of Ozone (ppb) against Solar Radiation (lang)]
Disadvantages
- Artificially increases correlations
- Systematically underestimates the variance
- Too optimistic P-values and too short confidence intervals
AVOID. Harmful to statistical inference

Stochastic regression imputation
- Like regression imputation, but adds appropriate noise to the predictions to reflect uncertainty
Advantages
- Preserves the distribution of Y_obs
- Preserves the correlation between Y and X in the imputed data
[Figure: histogram of Ozone (ppb); scatterplot of Ozone (ppb) against Solar Radiation (lang)]
Disadvantages
- Symmetric and constant error restrictive
- Single imputation does not take uncertainty imputed data into account, and incorrectly treats them as real
- Not so simple anymore

Single imputation methods, wrapup
- Underestimate uncertainty caused by the missing data
- Unbiased only under restrictive assumptions
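The three single-imputation methods above can be compared side by side. The following is a minimal base-R sketch, assuming the `airquality` data that ships with R (the Ozone and Solar Radiation labels in the figures suggest it, but that is an assumption), a normal residual model for the noise, and an arbitrary seed. It is meant as an illustration, not as the code behind the slides.

```r
# airquality ships with base R; Ozone and Solar.R both contain NAs.
# Assumption: keep only rows with observed Solar.R so it can act as predictor.
set.seed(1)
aq  <- airquality[!is.na(airquality$Solar.R), c("Ozone", "Solar.R")]
mis <- is.na(aq$Ozone)

# Mean imputation: replace each NA by the observed mean
mean_imp <- aq
mean_imp$Ozone[mis] <- mean(aq$Ozone, na.rm = TRUE)

# Regression imputation: lm() fits on the complete rows, then we predict
fit  <- lm(Ozone ~ Solar.R, data = aq)
pred <- predict(fit, newdata = aq[mis, ])
reg_imp <- aq
reg_imp$Ozone[mis] <- pred

# Stochastic regression imputation: prediction plus normal residual noise
sto_imp <- aq
sto_imp$Ozone[mis] <- pred + rnorm(sum(mis), mean = 0, sd = sigma(fit))

# Compare spread and correlation across the methods
summarise <- function(d) {
  c(sd  = sd(d$Ozone, na.rm = TRUE),
    cor = cor(d$Ozone, d$Solar.R, use = "complete.obs"))
}
round(rbind(observed   = summarise(aq),
            mean       = summarise(mean_imp),
            regression = summarise(reg_imp),
            stochastic = summarise(sto_imp)), 3)
```

Mean imputation leaves the mean unchanged but shrinks the standard deviation and pulls the correlation toward zero. The stochastic variant can produce negative ozone values, which illustrates why a symmetric, constant error is called restrictive.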
Alternatives
- Maximum Likelihood, Direct Likelihood
- Weighting
- Multiple Imputation
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data. Second Edition. John Wiley & Sons, New York.

SESSION II

What is multiple imputation

Rising popularity of multiple imputation
[Figure: number of publications (log) by year; series: early publications, 'multiple imputation' in abstract, 'multiple imputation' in title]

Main steps used in multiple imputation
Incomplete data -> Imputed data -> Analysis results -> Pooled results

Steps in mice
Stage | Function | Object class
--- | --- | ---
incomplete data |  | data frame
imputed data | mice() | mids
analysis results | with() | mira
pooled results | pool() | mipo

Goal

Estimand
- Q is a quantity of scientific interest in the population.
- Q can be a vector of population means, population regression weights, population variances, and so on.
- Q may not depend on the particular sample, thus Q cannot be a standard error, sample mean, p-value, and so on.

Goal of multiple imputation
- Estimate Q by $\hat{Q}$ or $\bar{Q}$ accompanied by a valid estimate of its uncertainty.
- What is the difference between $\hat{Q}$ and $\bar{Q}$?
  - $\hat{Q}$ and $\bar{Q}$ both estimate Q
  - $\hat{Q}$ accounts for the sampling uncertainty
  - $\bar{Q}$ accounts for the sampling and missing data uncertainty

Multiple imputation theory

Pooled estimate $\bar{Q}$
- $\hat{Q}_\ell$ is the estimate of the $\ell$-th repeated imputation
- $\hat{Q}_\ell$ contains k parameters and is represented as a k × 1 column vector
- The pooled estimate $\bar{Q}$ is simply the average
$$\bar{Q} = \frac{1}{m}\sum_{\ell=1}^{m} \hat{Q}_\ell \qquad (1)$$
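The mice() / with() / pool() chain can be sketched as follows. This assumes the mice package is installed and uses its bundled `nhanes` data; the analysis model `chl ~ age + bmi`, the seed and m = 5 are illustrative choices, not taken from the slides.

```r
# Guarded so the script still runs where mice is not installed
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  # data frame -> mids: create m = 5 imputed data sets
  imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

  # mids -> mira: fit the complete-data model on each imputed data set
  fit <- with(imp, lm(chl ~ age + bmi))

  # mira -> mipo: pool the m sets of estimates
  est <- pool(fit)

  print(class(imp))
  print(class(fit))
  print(class(est))
  print(summary(est))
}
```

Each step returns an object of the class listed in the steps table, so the class of the intermediate result shows which stage of the workflow has been reached.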
Within-imputation variance
- Average of the complete-data variances as
$$\bar{U} = \frac{1}{m}\sum_{\ell=1}^{m} \bar{U}_\ell \qquad (2)$$
  where $\bar{U}_\ell$ is the variance-covariance matrix of $\hat{Q}_\ell$ obtained for the $\ell$-th imputation
- $\bar{U}_\ell$ is the variance of the estimate, not the variance in the data
- The within-imputation variance is large if the sample is small

Between-imputation variance
- Variance between the m complete-data estimates is given by
$$B = \frac{1}{m-1}\sum_{\ell=1}^{m} (\hat{Q}_\ell - \bar{Q})(\hat{Q}_\ell - \bar{Q})' \qquad (3)$$
  where $\bar{Q}$ is the pooled estimate (c.f. equation 1)
- The between-imputation variance is large if there are many missing data

Total variance
- The total variance is not simply $T = \bar{U} + B$
- The correct formula is
$$T = \bar{U} + B + B/m = \bar{U} + \left(1 + \frac{1}{m}\right) B \qquad (4)$$
  for the total variance of $\bar{Q}$, and hence of $(Q - \bar{Q})$ if $\bar{Q}$ is unbiased
- The term B/m is the simulation error

Three sources of variation
In summary, the total variance T stems from three sources:
1. $\bar{U}$, the variance caused by the fact that we are taking a sample rather than the entire population. This is the conventional statistical measure of variability;
2. B, the extra variance caused by the fact that there are missing values in the sample;
3. B/m, the extra simulation variance caused by the fact that $\bar{Q}$ itself is based on finite m.

Variance ratios (1)
- Proportion of the variation attributable to the missing data
$$\lambda = \frac{B + B/m}{T} \qquad (5)$$
- Relative increase in variance due to nonresponse
$$r = \frac{B + B/m}{\bar{U}} \qquad (6)$$
- These are related by $r = \lambda/(1-\lambda)$.

Variance ratios (2)
- Fraction of information about Q missing due to nonresponse
$$\gamma = \frac{r + 2/(\nu+3)}{1+r} \qquad (7)$$
- This measure needs an estimate of the degrees of freedom $\nu$.
- Relation between $\gamma$ and $\lambda$
$$\gamma = \frac{\nu+1}{\nu+3}\,\lambda + \frac{2}{\nu+3} \qquad (8)$$
- The literature often confuses $\gamma$ and $\lambda$.

Statistical inference

Statistical inference for $\bar{Q}$ (1)
- The $100(1-\alpha)\%$ confidence interval of a $\bar{Q}$ is calculated as
$$\bar{Q} \pm t_{(\nu,\,1-\alpha/2)}\sqrt{T} \qquad (9)$$
  where $t_{(\nu,\,1-\alpha/2)}$ is the quantile corresponding to probability $1-\alpha/2$ of $t_\nu$.
- For example, use t(10, 0.975) = 2.23 for the 95% confidence interval for $\nu = 10$.

Statistical inference for $\bar{Q}$ (2)
- Suppose we test the null hypothesis $Q = Q_0$ for some specified value $Q_0$. We can find the p-value of the test as the probability
$$P_s = \Pr\left[F_{1,\nu} > \frac{(Q_0 - \bar{Q})^2}{T}\right] \qquad (10)$$
  where $F_{1,\nu}$ is an F distribution with 1 and $\nu$ degrees of freedom.
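For a scalar Q, the pooling rules above reduce to a few lines of arithmetic. The sketch below uses made-up estimates and squared standard errors from m = 5 hypothetical imputed data sets; the numbers are purely illustrative.

```r
# Hypothetical complete-data results from m = 5 imputed data sets
Qhat <- c(10.2, 9.8, 10.5, 10.1, 9.9)     # estimates of Q
U    <- c(0.40, 0.42, 0.38, 0.41, 0.39)   # squared standard errors

m    <- length(Qhat)
Qbar <- mean(Qhat)                 # pooled estimate
Ubar <- mean(U)                    # within-imputation variance
B    <- var(Qhat)                  # between-imputation variance (divides by m - 1)
Tvar <- Ubar + (1 + 1 / m) * B     # total variance

lambda <- (1 + 1 / m) * B / Tvar   # proportion of variance due to missing data
r      <- (1 + 1 / m) * B / Ubar   # relative increase in variance

round(c(Qbar = Qbar, Ubar = Ubar, B = B, T = Tvar, lambda = lambda, r = r), 4)
```

With these inputs the pooled estimate is 10.1, B is 0.075 and T is 0.49, and the identity r = λ/(1 − λ) can be checked directly from the last two values.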
Degrees of freedom (1)
- With missing data, n is effectively lower. Thus, the degrees of freedom in statistical tests need to be adjusted.
- The old formula assumes $n = \infty$:
$$\nu_{\mathrm{old}} = (m-1)\left(1 + \frac{1}{r}\right)^2 = \frac{m-1}{\lambda^2} \qquad (11)$$

Degrees of freedom (2)
- The new formula is
$$\nu = \frac{\nu_{\mathrm{old}}\,\nu_{\mathrm{obs}}}{\nu_{\mathrm{old}} + \nu_{\mathrm{obs}}} \qquad (12)$$
  where the estimated observed-data degrees of freedom that accounts for the missing information is
$$\nu_{\mathrm{obs}} = \frac{\nu_{\mathrm{com}} + 1}{\nu_{\mathrm{com}} + 3}\,\nu_{\mathrm{com}}\,(1 - \lambda) \qquad (13)$$
  with $\nu_{\mathrm{com}} = n - k$.

How many imputations?

How large should m be?
The legacy
- Classic advice: m = 3, 5, 10.
- More recently: set m higher: 20-100.
Some advice
- Use m = 5 or m = 10 if the fraction of missing information is low, $\gamma < 0.2$.
- Develop your model with m = 5. Do final run with m equal to percentage of incomplete cases.
- Repeat the analysis with m = 5 with different seeds. If there are large differences for some parameters, this means that the data contain little information about them.

Introductions to multiple imputation
- Schafer, J.L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8(1), 3-15.
- Sterne et al. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ, 338, b2393.
- Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL.

SESSION III

[Figure: relation between temperature and gas consumption, with the deleted observation marked]
(a) We delete gas consumption of observation 7
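Returning briefly to the Session II formulas for the degrees of freedom, confidence interval and p-value: they can be wrapped in one small helper. This is a sketch in base R; the function name `pool_inference` and the input values (n = 25, k = 1, and the pooled quantities) are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: inference for a scalar pooled estimate
pool_inference <- function(Qbar, Tvar, lambda, m, n, k, Q0 = 0, alpha = 0.05) {
  nu_old <- (m - 1) / lambda^2                   # old formula, assumes n = Inf
  nu_com <- n - k                                # complete-data degrees of freedom
  nu_obs <- (nu_com + 1) / (nu_com + 3) * nu_com * (1 - lambda)
  nu     <- nu_old * nu_obs / (nu_old + nu_obs)  # adjusted degrees of freedom

  half <- qt(1 - alpha / 2, df = nu) * sqrt(Tvar)          # CI half-width
  p    <- pf((Q0 - Qbar)^2 / Tvar, df1 = 1, df2 = nu,
             lower.tail = FALSE)                            # test of Q = Q0

  c(df = nu, lower = Qbar - half, upper = Qbar + half, p.value = p)
}

# Illustrative inputs: pooled estimate 10.1, total variance 0.49
res <- pool_inference(Qbar = 10.1, Tvar = 0.49, lambda = 0.09 / 0.49,
                      m = 5, n = 25, k = 1)
res
```

The adjusted degrees of freedom are always smaller than both the old value and the observed-data value, and they approach the old value as n grows.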
[Figure panels: temperature against gas consumption]
(b) Predict imputed value from regression line
(c) Predicted value + noise
(d) Predicted value + noise + parameter uncertainty
(e) Imputation based on two predictors

Predictive mean matching: Y given X
- Add two regression lines
- Predicted given 5 °C
- Define a matching range ŷ ±
- Select potential donors

Bayesian PMM: Draw a line
- Define a matching range ŷ ±
- Select potential donors

Imputation of a binary variable
- logistic regression
$$\Pr(y_i = 1 \mid X_i, \beta) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)} \qquad (14)$$
- Fit logistic model
- Draw parameter estimate
- Read off the probability
[Figure: probability against linear predictor]
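The predictive mean matching steps (predict, define a matching range, select donors, draw one) can be sketched in base R. This is a simplified, non-Bayesian variant: it does not draw the regression line from its posterior, it replaces the matching range by the d = 5 closest donors, and it uses `airquality` rather than the gas consumption data. All of these are assumptions made for the illustration.

```r
set.seed(2)
aq  <- airquality[, c("Ozone", "Temp")]   # Temp is complete, Ozone is not
mis <- is.na(aq$Ozone)

# Predicted means for all rows, from a model fitted on the observed rows
fit  <- lm(Ozone ~ Temp, data = aq)
yhat <- predict(fit, newdata = aq)

donor_pool <- which(!mis)

# For one incomplete row: take the d donors with the closest predicted
# mean and return the observed value of one randomly chosen donor
impute_one <- function(i, d = 5) {
  dist   <- abs(yhat[donor_pool] - yhat[i])
  donors <- donor_pool[order(dist)[seq_len(d)]]
  as.numeric(aq$Ozone[sample(donors, 1)])
}

imp <- aq
imp$Ozone[mis] <- vapply(which(mis), impute_one, numeric(1))

summary(imp$Ozone)
```

Because every imputed value is borrowed from an observed case, the imputations stay within the range of the observed data.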
Impute ordered categorical variable
- K ordered categories k = 1, ..., K
- ordered logit model, or proportional odds model
$$\Pr(y_i = k \mid X_i, \beta) = \frac{\exp(\tau_k + X_i\beta)}{\sum_{k=1}^{K}\exp(\tau_k + X_i\beta)} \qquad (15)$$
- Fit ordered logit model
- Read off the probability
[Figure: probability against linear predictor]

Other types of variables
- Count data
- Semi-continuous data
- Censored data
- Truncated data
- Rounded data

Univariate imputation in mice
Method | Description | Scale type
--- | --- | ---
pmm | Predictive mean matching | numeric
norm | Bayesian linear regression | numeric
norm.nob | Linear regression, non-Bayesian | numeric
norm.boot | Linear regression with bootstrap | numeric
mean | Unconditional mean imputation | numeric
2L.norm | Two-level linear model | numeric
logreg | Logistic regression | factor, 2 levels
logreg.boot | Logistic regression with bootstrap | factor, 2 levels
polyreg | Multinomial logit model | factor, > 2 levels
polr | Ordered logit model | ordered, > 2 levels
lda | Linear discriminant analysis | factor
sample | Simple random sample | any

Problems in multivariate imputation
- Predictors themselves can be incomplete
- Mixed measurement levels
- Order of imputation can be meaningful
- Too many predictor variables
- Relations could be nonlinear
- Higher order interactions
- Impossible combinations

Three general strategies
- Monotone data imputation
- Joint modeling
- Fully conditional specification (FCS)

Imputation of monotone pattern
[Figure: monotone missing data pattern over X and the Y columns]
Imputation of monotone pattern
[Figure: monotone missing data pattern over X and the Y columns, two further steps]

Joint Modeling (JM)
- Specify joint model P(Y, X, R)
- Derive P(Y_mis | Y_obs, X, R)
- Use MCMC techniques to draw imputations Y_mis

Joint modeling: Software
Platform | Software
--- | ---
R/S Plus | norm, cat, mix, pan, Amelia
SAS | proc MI, proc MIANALYZE
STATA | MI command
Stand-alone | Amelia, solas, norm, pan

Joint Modeling: Pro's
- Yield correct statistical inference under the assumed JM
- Efficient parametrization (if the model fits)
- Known theoretical properties
- Works very well for parameters close to the center
- Many applications

Joint Modeling: Con's
- Lack of flexibility
- May lead to large models
- Can assume more than the complete data problem
- Can impute impossible data

Fully Conditional Specification (FCS)
- Specify P(Y_mis | Y_obs, X, R)
- Use MCMC techniques to draw imputations Y_mis

Multivariate Imputation by Chained Equations (MICE)
- MICE algorithm
- Specify imputation model for each incomplete column
- Fill in starting imputations
- And iterate
- Model: Fully Conditional Specification (FCS)
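The three MICE steps (one model per incomplete column, starting imputations, iterate) can be illustrated with a bare-bones loop in base R. This sketch is not the mice implementation: it assumes normal linear models with stochastic regression draws, ignores parameter uncertainty, produces a single imputed data set, and uses `airquality` with an arbitrary seed and five iterations.

```r
set.seed(3)
dat  <- airquality[, c("Ozone", "Solar.R", "Temp", "Wind")]
miss <- is.na(dat)                      # remember where the holes are
incomplete <- c("Ozone", "Solar.R")     # columns that need imputation

# Fill in starting imputations: random draws from the observed values
for (v in incomplete) {
  dat[miss[, v], v] <- sample(dat[!miss[, v], v], sum(miss[, v]), replace = TRUE)
}

# Iterate: re-impute each incomplete column given all other columns
for (it in 1:5) {
  for (v in incomplete) {
    obs <- !miss[, v]
    fit <- lm(reformulate(setdiff(names(dat), v), response = v),
              data = dat[obs, ])
    mu  <- predict(fit, newdata = dat[!obs, ])
    dat[!obs, v] <- mu + rnorm(sum(!obs), mean = 0, sd = sigma(fit))
  }
}

colSums(is.na(dat))
```

Each pass uses the most recent imputations of the other column as predictor values, which is what chains the conditional models together. Observed values are never overwritten.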
Fully Conditional Specification: Pro's
- Easy and flexible
- Imputes close to the data, prevents impossible data
- Subset selection of predictors
- Modular, can preserve valuable work
- Works well, both in simulations and practice

Fully Conditional Specification: Con's
- Theoretical properties only known in special cases
- Cannot use computational shortcuts, like sweep-operator
- Joint distribution may not exist (incompatibility)

Fully Conditional Specification (FCS): Software
Platform | Software
--- | ---
R | mice, transcan, mi, VIM, baboon
SPSS | V17 procedure multiple imputation
SAS | IVEware, SAS 9.3
STATA | ice command, multiple imputation command
Stand-alone | Solas, Mplus

How many iterations?
- Quick convergence
- 5 iterations is adequate for most problems
- More iterations if λ is high
- Inspect the generated imputations
- Monitor convergence to detect anomalies
[Figure: trace plots of the mean and sd of imputed hgt and wgt by iteration; panels: Non-convergence, Convergence]

SESSION IV

Modeling choices

Imputation model choices
1. MAR or MNAR
2. Form of the imputation model
3. Which predictors
4. Derived variables
5. What is m?
6. Order of imputation
7. Diagnostics, convergence
Which predictors?
1. Include all variables that appear in the complete-data model
2. In addition, include the variables that are related to the nonresponse
3. In addition, include variables that explain a considerable amount of variance
4. Remove from the variables selected in steps 2 and 3 those variables that have too many missing values within the subgroup of incomplete cases.
Function quickpred() and flux()

Derived variables
- ratio of two variables
- sum score
- index variable
- quadratic relations
- interaction term
- conditional imputation
- compositions

How to impute a ratio?
- weight/height ratio: whr = wgt/hgt kg/m.
- Easy if only one of wgt or hgt or whr is missing
- Methods
  - POST: Impute wgt and hgt, and calculate whr after imputation
  - JAV: Impute whr as just another variable
  - PASSIVE1: Impute wgt and hgt, and calculate whr during imputation
  - PASSIVE2: As PASSIVE1 with adapted predictor matrix

Method POST
> imp <- mice(boys)
> long <- complete(imp, "long", inc = TRUE)
> long$whr <- with(long, wgt/(hgt/100))
> imp <- long2mids(long)

Method JAV: Just another variable
> boys$whr <- boys$wgt/(boys$hgt/100)
> imp.jav <- mice(boys, m =, seed = 9, maxit = )
[Figure: Weight/Height (kg/m) against Height (cm); panels: JAV, passive 1, passive 2]

Method PASSIVE1
> meth["whr"] <- "~I(wgt/(hgt/100))"

Method PASSIVE1, predictor matrix
[Figure: predictor matrix with rows and columns age, hgt, wgt, hc, gen, phb, tv, reg, whr]
[Figure: Weight/Height (kg/m) against Height (cm); panels: JAV, passive 1, passive 2]

Method PASSIVE2
> pred[c("wgt", "hgt", "hc", "reg"), ""] <- 0
> pred[c("gen", "phb", "tv"), c("hgt", "wgt", "hc")] <- 0
> pred[, "whr"] <- 0

Method PASSIVE2, predictor matrix
[Figure: predictor matrix with rows and columns age, hgt, wgt, hc, gen, phb, tv, reg, whr]

Derived variables: summary
- Derived variables pose special challenges
- Plausible values respect data dependencies
- If you can, create derived variables after imputation
- If you cannot, use passive imputation
- Break up direct feedback loops using the predictor matrix

Diagnostics

Standard diagnostic plots in mice
Since mice 2.5, plots for imputed data:
- one-dimensional scatter: stripplot
- box-and-whisker plot: bwplot
- densities: densityplot
- scattergram: xyplot

Stripplot
> library(mice)
> imp <- mice(nhanes, seed = 998)
> stripplot(imp, pch = c(, 9))
[Figure: stripplot(imp, pch=c(,9)); observed and imputed values of the nhanes variables by imputation number]
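The passive-imputation recipe for the whr ratio from earlier in this session can be assembled into one runnable sketch. It assumes a current version of the mice package, a subset of the `boys` columns, and illustrative values for m, maxit and seed; none of these settings come from the slides, and only the direct feedback from whr to wgt and hgt is switched off here.

```r
# Guarded so the script still runs where mice is not installed
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  # Assumption: a small subset of boys keeps the example fast
  dat <- boys[, c("age", "hgt", "wgt", "hc", "reg")]
  dat$whr <- dat$wgt / (dat$hgt / 100)          # kg/m, NA where wgt or hgt is NA

  # Dry run (maxit = 0) to obtain default methods and predictor matrix
  ini  <- mice(dat, maxit = 0)
  meth <- ini$method
  pred <- ini$predictorMatrix

  # Passive imputation: recompute whr from the current wgt and hgt
  meth["whr"] <- "~I(wgt / (hgt / 100))"

  # Break the feedback loop: whr must not predict its own components
  pred[c("wgt", "hgt"), "whr"] <- 0

  imp  <- mice(dat, method = meth, predictorMatrix = pred,
               m = 2, maxit = 5, seed = 1, printFlag = FALSE)
  comp <- complete(imp, 1)

  # The ratio should be consistent with its components in the completed data
  print(all.equal(comp$whr, comp$wgt / (comp$hgt / 100)))
}
```

In the predictor matrix, rows are the variables being imputed and columns are the predictors, so setting an entry to 0 removes that column as a predictor for that row.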
A larger data set
> imp <- mice(boys, seed =, maxit = )
> bwplot(imp)
[Figure: bwplot(imp); panels age, hgt, hc, wgt, tv by imputation number]

densityplot(imp)
[Figure: densityplot(imp); density panels for hgt, wgt, hc, tv]

SESSION V

Reporting guidelines
1. Amount of missing data
2. Reasons for missingness
3. Differences between complete and incomplete data
4. Method used to account for missing data
5. Software
6. Number of imputed datasets
7. Imputation model
8. Derived variables
9. Diagnostics
10. Pooling
11. Listwise deletion
12. Sensitivity analysis
WP.XX ENGLISH ONLY UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing (Paris, France, 28-30 April 204) Topic (v): International
More informationAmelia multiple imputation in R
Amelia multiple imputation in R January 2018 Boriana Pratt, Princeton University 1 Missing Data Missing data can be defined by the mechanism that leads to missingness. Three main types of missing data
More informationMissing Data? A Look at Two Imputation Methods Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA
Missing Data? A Look at Two Imputation Methods Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA ABSTRACT Statistical analyses can be greatly hampered by missing
More informationHandling Missing Data
Handling Missing Data Estie Hudes Tor Neilands UCSF Center for AIDS Prevention Studies Part 2 December 10, 2013 1 Contents 1. Summary of Part 1 2. Multiple Imputation (MI) for normal data 3. Multiple Imputation
More informationCorrectly Compute Complex Samples Statistics
SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample
More informationPSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects
PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects Let s create a data for a variable measured repeatedly over five occasions We could create raw data (for each
More informationStatistical Analysis Using Combined Data Sources: Discussion JPSM Distinguished Lecture University of Maryland
Statistical Analysis Using Combined Data Sources: Discussion 2011 JPSM Distinguished Lecture University of Maryland 1 1 University of Michigan School of Public Health April 2011 Complete (Ideal) vs. Observed
More informationIBM SPSS Categories 23
IBM SPSS Categories 23 Note Before using this information and the product it supports, read the information in Notices on page 55. Product Information This edition applies to version 23, release 0, modification
More informationPackage micemd. August 24, 2018
Type Package Package micemd August 24, 2018 Title Multiple Imputation by Chained Equations with Multilevel Data Version 1.4.0 Date 2018-08-23 Addons for the 'mice' package to perform multiple imputation
More informationFrequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS
ABSTRACT Paper 1938-2018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion
More informationarxiv: v1 [stat.ap] 8 Jan 2014
The Annals of Applied Statistics 2013, Vol. 7, No. 4, 1983 2006 DOI: 10.1214/13-AOAS664 c Institute of Mathematical Statistics, 2013 CALIBRATED IMPUTATION OF NUMERICAL DATA UNDER LINEAR EDIT RESTRICTIONS
More informationHandbook of Statistical Modeling for the Social and Behavioral Sciences
Handbook of Statistical Modeling for the Social and Behavioral Sciences Edited by Gerhard Arminger Bergische Universität Wuppertal Wuppertal, Germany Clifford С. Clogg Late of Pennsylvania State University
More information- 1 - Fig. A5.1 Missing value analysis dialog box
WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation
More informationA STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY
A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY Norman Solomon School of Computing and Technology University of Sunderland A thesis submitted in partial fulfilment of the requirements of the University
More informationAn introduction to SPSS
An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible
More informationGeneralized least squares (GLS) estimates of the level-2 coefficients,
Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical
More informationCorrectly Compute Complex Samples Statistics
PASW Complex Samples 17.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample
More informationPackage miceext. March 6, 2018
Title Extension Package to 'mice' Version 1.1.0 Package miceext March 6, 2018 Maintainer Tobias Schumacher Description Extends and builds on the 'mice' package by adding
More informationTypes of missingness and common strategies
9 th UK Stata Users Meeting 20 May 2003 Multiple imputation for missing data in life course studies Bianca De Stavola and Valerie McCormack (London School of Hygiene and Tropical Medicine) Motivating example
More informationIBM SPSS Missing Values 21
IBM SPSS Missing Values 21 Note: Before using this information and the product it supports, read the general information under Notices on p. 87. This edition applies to IBM SPSS Statistics 21 and to all
More informationPredict Outcomes and Reveal Relationships in Categorical Data
PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,
More informationProduct Catalog. AcaStat. Software
Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,
More informationREALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol
REALCOM-IMPUTE: multiple imputation using MLwin. Modified September 2014 by Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol This description is divided into two sections. In the
More informationESTIMATING THE MISSING VALUES IN ANALYSIS OF VARIANCE TABLES BY A FLEXIBLE ADAPTIVE ARTIFICIAL NEURAL NETWORK AND FUZZY REGRESSION MODELS
ESTIMATING THE MISSING VALUES IN ANALYSIS OF VARIANCE TABLES BY A FLEXIBLE ADAPTIVE ARTIFICIAL NEURAL NETWORK AND FUZZY REGRESSION MODELS Ali Azadeh - Zahra Saberi Hamidreza Behrouznia-Farzad Radmehr Peiman
More informationCross-validation and the Bootstrap
Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. 1/44 Cross-validation and the Bootstrap In the section we discuss two resampling
More informationIn this chapter, we present how to use the multiple imputation methods
MULTIPLE IMPUTATION WITH PRINCIPAL COMPONENT METHODS: A USER GUIDE In this chapter, we present how to use the multiple imputation methods described previously: the BayesMIPCA method, allowing multiple
More informationMoving Beyond Linearity
Moving Beyond Linearity The truth is never linear! 1/23 Moving Beyond Linearity The truth is never linear! r almost never! 1/23 Moving Beyond Linearity The truth is never linear! r almost never! But often
More informationMissing data analysis: - A study of complete case analysis, single imputation and multiple imputation. Filip Lindhfors and Farhana Morko
Bachelor thesis Department of Statistics Kandidatuppsats, Statistiska institutionen Nr 2014:5 Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation Filip
More informationUsing Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models
Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Bengt Muth en University of California, Los Angeles Tihomir Asparouhov Muth en & Muth en Mplus
More informationMultiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches
Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches Jonathan Kropko University of Virginia Ben Goodrich Columbia University Andrew Gelman Columbia University
More informationSTATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1
Statistics (STAT) 1 STATISTICS (STAT) 200 Level Courses STAT 250: Introductory Statistics I. 3 credits. Elementary introduction to statistics. Topics include descriptive statistics, probability, and estimation
More informationPASW Missing Values 18
i PASW Missing Values 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412
More informationA Monotonic Sequence and Subsequence Approach in Missing Data Statistical Analysis
Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 1 (2016), pp. 1131-1140 Research India Publications http://www.ripublication.com A Monotonic Sequence and Subsequence Approach
More informationMHPE 494: Data Analysis. Welcome! The Analytic Process
MHPE 494: Data Analysis Alan Schwartz, PhD Department of Medical Education Memoona Hasnain,, MD, PhD, MHPE Department of Family Medicine College of Medicine University of Illinois at Chicago Welcome! Your
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More informationOpening Windows into the Black Box
Opening Windows into the Black Box Yu-Sung Su, Andrew Gelman, Jennifer Hill and Masanao Yajima Columbia University, Columbia University, New York University and University of California at Los Angels July
More informationNotes on Simulations in SAS Studio
Notes on Simulations in SAS Studio If you are not careful about simulations in SAS Studio, you can run into problems. In particular, SAS Studio has a limited amount of memory that you can use to write
More informationPaper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by
Paper CC-016 A macro for nearest neighbor Lung-Chang Chien, University of North Carolina at Chapel Hill, Chapel Hill, NC Mark Weaver, Family Health International, Research Triangle Park, NC ABSTRACT SAS
More informationWeek 5: Multiple Linear Regression II
Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R
More informationStatistical modelling with missing data using multiple imputation. Session 2: Multiple Imputation
Statistical modelling with missing data using multiple imputation Session 2: Multiple Imputation James Carpenter London School of Hygiene & Tropical Medicine Email: james.carpenter@lshtm.ac.uk www.missingdata.org.uk
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationA noninformative Bayesian approach to small area estimation
A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported
More informationSENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY
Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 6 SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE
More informationStatistical Methods for the Analysis of Repeated Measurements
Charles S. Davis Statistical Methods for the Analysis of Repeated Measurements With 20 Illustrations #j Springer Contents Preface List of Tables List of Figures v xv xxiii 1 Introduction 1 1.1 Repeated
More informationFHDI: An R Package for Fractional Hot Deck Imputation by Jongho Im, In Ho Cho, and Jae Kwang Kim
CONTRIBUTED RESEARCH ARTICLE 140 FHDI: An R Package for Fractional Hot Deck Imputation by Jongho Im, In Ho Cho, and Jae Kwang Kim Abstract Fractional hot deck imputation (FHDI), proposed by Kalton and
More information