R software and examples


Handling Missing Data in R with MICE

Stef van Buuren
Methodology and Statistics, FSBS, Utrecht University
Netherlands Organization for Applied Scientific Research TNO, Leiden
Winnipeg, June, 7

Why this course?
- Missing data are everywhere.
- Ad hoc fixes often do not work.
- Multiple imputation is broadly applicable, yields correct statistical inferences, and is supported by good software.
- Goal of the course: get comfortable with a modern and powerful way of solving missing data problems.

Course materials: reading
- Van Buuren, S. and Groothuis-Oudshoorn, C.G.M. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67.
- Van Buuren, S. (2012). Flexible Imputation of Missing Data (FIMD). Chapman & Hall/CRC, Boca Raton, FL. Chapters 6,.

R software and examples
- Install R.
- Install RStudio.
- Install the R package mice (a sufficiently recent version) from CRAN.

Time table (morning)

Session  L/P  Description
         L    Overview
I        L    Introduction to missing data
I        P    Ad hoc methods + MICE
              PAUSE
II       L    Multiple imputation
II       P    Boys data
              PAUSE

Time table (afternoon)

Session  L/P  Description
III      L    Generating plausible imputations
III      P    Algorithmic convergence and pooling
              PAUSE
IV       L    Imputation in practice
IV       P    Post-processing and passive imputation
V        L    Guidelines for reporting

SESSION I

Problem of missing data: why are missing data interesting?
- "Obviously the best way to treat missing data is not to have them." (Orchard and Woodbury 1972)
- "Sooner or later (usually sooner), anyone who does statistical analysis runs into problems with missing data" (Allison, 2002)
- Missing data problems are the heart of statistics.

Causes of missing data
- Respondent skipped the item
- Data transmission/coding error
- Drop out in longitudinal research
- Refusal to cooperate
- Sample from population
- Question not asked, different forms
- Censoring

Consequences of missing data
- Less information than planned
- Enough statistical power?
- Different analyses, different n's
- Cannot calculate even the mean
- Systematic biases in the analysis
- Appropriate confidence interval, P-values?
In general, missing data can severely complicate interpretation and analysis.

Listwise deletion
- Analyze only the complete records
- Also known as Complete Case Analysis (CCA)
Advantages
- Simple (default in most software)
- Unbiased under MCAR
- Correct standard errors, significance levels
- Two special properties in regression
Disadvantages
- Wasteful
- Large standard errors
- Biased under MAR, even for simple statistics like the mean
- Inconsistencies in reporting

Mean imputation
- Replace the missing values by the mean of the observed data
Advantages
- Simple
- Unbiased for the mean, under MCAR
[Figure: histogram of Ozone (ppb) and scatterplot of Ozone (ppb) against Solar Radiation (lang) after mean imputation.]

Mean imputation (continued)
Disadvantages
- Disturbs the distribution
- Underestimates the variance
- Biases correlations to zero
- Biased under MAR
AVOID (unless you know what you are doing)

Regression imputation
- Also known as prediction
- Fit model for Y_obs under listwise deletion
- Predict Y_mis for records with missing Y's
- Replace missing values by prediction
Advantages
- Unbiased estimates of regression coefficients (under MAR)
- Good approximation to the (unknown) true data if explained variance is high
- Prediction is the favorite among non-statisticians
Disadvantages
- Artificially increases correlations
- Systematically underestimates the variance
- Too optimistic P-values and too short confidence intervals
AVOID. Harmful to statistical inference.
[Figure: histogram of Ozone (ppb) and scatterplot of Ozone (ppb) against Solar Radiation (lang) after regression imputation.]

Stochastic regression imputation
- Like regression imputation, but adds appropriate noise to the predictions to reflect uncertainty
Advantages
- Preserves the distribution of Y_obs
- Preserves the correlation between Y and X in the imputed data
Disadvantages
- Symmetric and constant error is restrictive
- Single imputation does not take the uncertainty of the imputed data into account, and incorrectly treats them as real
- Not so simple anymore
[Figure: histogram of Ozone (ppb) and scatterplot of Ozone (ppb) against Solar Radiation (lang) after stochastic regression imputation.]

Single imputation methods, wrap-up
- Underestimate uncertainty caused by the missing data
- Unbiased only under restrictive assumptions
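The contrast between the three single imputation methods can be seen in a small simulation. The sketch below is an illustration in base R under assumed settings (simulated data, a linear model, MAR missingness that depends on x); none of the variable names or parameter values come from the slides.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.8 * x + rnorm(n, sd = 0.6)

# Assumed MAR mechanism: the probability that y is missing increases with x
miss <- runif(n) < plogis(x - 0.5)
y_obs <- ifelse(miss, NA, y)

# 1. Mean imputation: every missing value gets the observed mean
y_mean <- ifelse(miss, mean(y_obs, na.rm = TRUE), y_obs)

# 2. Regression imputation: predictions from the complete-case fit
fit <- lm(y_obs ~ x)                       # lm() drops incomplete rows
pred <- predict(fit, newdata = data.frame(x = x))
y_reg <- ifelse(miss, pred, y_obs)

# 3. Stochastic regression imputation: prediction plus residual noise
y_sto <- ifelse(miss, pred + rnorm(n, sd = summary(fit)$sigma), y_obs)

# Compare spread and correlation with the (normally unknown) complete data
res <- data.frame(
  method = c("complete data", "mean", "regression", "stochastic"),
  sd_y   = c(sd(y), sd(y_mean), sd(y_reg), sd(y_sto)),
  cor_xy = c(cor(x, y), cor(x, y_mean), cor(x, y_reg), cor(x, y_sto))
)
print(res, digits = 3)
```

Comparing the rows of `res` shows how each method changes the standard deviation of y and its correlation with x relative to the complete data. The stochastic variant here still ignores the uncertainty in the regression coefficients, which is one reason a single imputed data set understates uncertainty.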

SESSION II

Alternatives
- Maximum Likelihood, Direct Likelihood
- Weighting
- Multiple Imputation
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data. Second Edition. John Wiley & Sons, New York.

Rising popularity of multiple imputation
[Figure: number of publications (log scale) by year; series: early publications, 'multiple imputation' in abstract, 'multiple imputation' in title.]

Main steps used in multiple imputation
Incomplete data -> Imputed data -> Analysis results -> Pooled results

Steps in mice

Stage             Produced by  R class
incomplete data                data frame
imputed data      mice()       mids
analysis results  with()       mira
pooled results    pool()       mipo

Estimand
- $Q$ is a quantity of scientific interest in the population.
- $Q$ can be a vector of population means, population regression weights, population variances, and so on.
- $Q$ may not depend on the particular sample, thus $Q$ cannot be a standard error, sample mean, p-value, and so on.

Goal of multiple imputation
Estimate $Q$ by $\hat{Q}$ or $\bar{Q}$, accompanied by a valid estimate of its uncertainty.
What is the difference between $\hat{Q}$ and $\bar{Q}$?
- $\hat{Q}$ and $\bar{Q}$ both estimate $Q$.
- $\hat{Q}$ accounts for the sampling uncertainty.
- $\bar{Q}$ accounts for the sampling and missing data uncertainty.

Pooled estimate $\bar{Q}$
- $\hat{Q}_\ell$ is the estimate of the $\ell$-th repeated imputation.
- $\hat{Q}_\ell$ contains $k$ parameters and is represented as a $k \times 1$ column vector.
- The pooled estimate $\bar{Q}$ is simply the average

$$\bar{Q} = \frac{1}{m} \sum_{\ell=1}^{m} \hat{Q}_\ell \qquad (1)$$
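The chain data frame -> mids -> mira -> mipo can be traced in a few lines. This is a minimal sketch that assumes the mice package is installed; the choice of the `nhanes` example data (shipped with mice), the analysis model `chl ~ age + bmi`, and the values of `m` and `seed` are illustrative assumptions, not taken from the slides.

```r
# Assumes the mice package is available: install.packages("mice")
library(mice)

# Step 1: impute. Returns an object of class "mids" holding m completed data sets
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Step 2: analyze each completed data set. Returns an object of class "mira"
fit <- with(imp, lm(chl ~ age + bmi))

# Step 3: pool the m sets of estimates. Returns an object of class "mipo"
pooled <- pool(fit)

class(imp)
class(fit)
class(pooled)
summary(pooled)
```

The analysis in step 2 is written exactly as it would be for complete data; only the data argument is replaced by the imputed object.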

Multiple imputation theory

Within-imputation variance
Average of the complete-data variances:

$$\bar{U} = \frac{1}{m} \sum_{\ell=1}^{m} \bar{U}_\ell \qquad (2)$$

where $\bar{U}_\ell$ is the variance-covariance matrix of $\hat{Q}_\ell$ obtained for the $\ell$-th imputation.
- $\bar{U}_\ell$ is the variance of the estimate, not the variance in the data.
- The within-imputation variance is large if the sample is small.

Between-imputation variance
Variance between the $m$ complete-data estimates:

$$B = \frac{1}{m-1} \sum_{\ell=1}^{m} (\hat{Q}_\ell - \bar{Q})(\hat{Q}_\ell - \bar{Q})' \qquad (3)$$

where $\bar{Q}$ is the pooled estimate (cf. equation (1)).
- The between-imputation variance is large if there are many missing data.

Total variance
- The total variance is not simply $T = \bar{U} + B$.
- The correct formula is

$$T = \bar{U} + B + B/m = \bar{U} + \left(1 + \frac{1}{m}\right) B \qquad (4)$$

for the total variance of $\bar{Q}$, and hence of $(Q - \bar{Q})$ if $\bar{Q}$ is unbiased.
- The term $B/m$ is the simulation error.

Three sources of variation
In summary, the total variance $T$ stems from three sources:
- $\bar{U}$, the variance caused by the fact that we are taking a sample rather than the entire population. This is the conventional statistical measure of variability;
- $B$, the extra variance caused by the fact that there are missing values in the sample;
- $B/m$, the extra simulation variance caused by the fact that $\bar{Q}$ itself is based on finite $m$.

Variance ratios (1)
Proportion of the variation attributable to the missing data:

$$\lambda = \frac{B + B/m}{T} \qquad (5)$$

Relative increase in variance due to nonresponse:

$$r = \frac{B + B/m}{\bar{U}} \qquad (6)$$

These are related by $r = \lambda / (1 - \lambda)$.

Variance ratios (2)
Fraction of information about $Q$ missing due to nonresponse:

$$\gamma = \frac{r + 2/(\nu + 3)}{1 + r} \qquad (7)$$

This measure needs an estimate of the degrees of freedom $\nu$.
Relation between $\gamma$ and $\lambda$:

$$\gamma = \frac{\nu + 1}{\nu + 3} \lambda + \frac{2}{\nu + 3} \qquad (8)$$

The literature often confuses $\gamma$ and $\lambda$.

Statistical inference for $\bar{Q}$ (1)
The $100(1 - \alpha)\%$ confidence interval of a $\bar{Q}$ is calculated as

$$\bar{Q} \pm t_{(\nu, 1 - \alpha/2)} \sqrt{T} \qquad (9)$$

where $t_{(\nu, 1 - \alpha/2)}$ is the quantile corresponding to probability $1 - \alpha/2$ of $t_\nu$. For example, use $t(10, 0.975) = 2.23$ for the 95% confidence interval for $\nu = 10$.

Statistical inference for $\bar{Q}$ (2)
Suppose we test the null hypothesis $Q = Q_0$ for some specified value $Q_0$. We can find the p-value of the test as the probability

$$P_s = \Pr\left[ F_{1,\nu} > \frac{(Q_0 - \bar{Q})^2}{T} \right] \qquad (10)$$

where $F_{1,\nu}$ is an F distribution with 1 and $\nu$ degrees of freedom.
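For a scalar estimand, the pooling formulas above reduce to a few lines of arithmetic. The sketch below is an illustrative base R implementation, not the code used inside mice; the function name `pool_scalar` and the input numbers are made up for the example, and the degrees of freedom are taken as the large-sample value $(m-1)/\lambda^2$, without any small-sample adjustment.

```r
# Pool m scalar estimates q with their squared standard errors u
# (hypothetical helper, written for illustration only)
pool_scalar <- function(q, u, q0 = 0, alpha = 0.05) {
  m <- length(q)
  stopifnot(m >= 2, length(u) == m)
  qbar   <- mean(q)                  # pooled estimate, equation (1)
  ubar   <- mean(u)                  # within-imputation variance, (2)
  b      <- var(q)                   # between-imputation variance, (3)
  total  <- ubar + b + b / m         # total variance, (4)
  lambda <- (b + b / m) / total      # proportion due to missing data, (5)
  r      <- (b + b / m) / ubar       # relative increase in variance, (6)
  nu     <- (m - 1) / lambda^2       # large-sample degrees of freedom (assumption)
  half   <- qt(1 - alpha / 2, df = nu) * sqrt(total)
  list(
    qbar = qbar, ubar = ubar, b = b, total = total,
    lambda = lambda, r = r, nu = nu,
    ci = c(qbar - half, qbar + half),                        # equation (9)
    p  = pf((q0 - qbar)^2 / total, 1, nu, lower.tail = FALSE) # equation (10)
  )
}

# Three imputations with estimates 10, 12, 11, each with variance 4
res <- pool_scalar(q = c(10, 12, 11), u = c(4, 4, 4))
str(res)
```

With these inputs the pooled estimate is 11, the within variance is 4, the between variance is 1, and the total variance is 4 + 1 + 1/3 = 16/3. This gives $\lambda$ = 0.25 and r = 1/3, which agrees with $r = \lambda/(1-\lambda)$, and 32 degrees of freedom.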

Degrees of freedom (1)
- With missing data, $n$ is effectively lower. Thus, the degrees of freedom in statistical tests need to be adjusted.
- The old formula assumes $n = \infty$:

$$\nu_{\rm old} = (m - 1)\left(1 + \frac{1}{r}\right)^2 = \frac{m - 1}{\lambda^2} \qquad (11)$$

Degrees of freedom (2)
The new formula is

$$\nu = \frac{\nu_{\rm old}\, \nu_{\rm obs}}{\nu_{\rm old} + \nu_{\rm obs}} \qquad (12)$$

where the estimated observed-data degrees of freedom that accounts for the missing information is

$$\nu_{\rm obs} = \frac{\nu_{\rm com} + 1}{\nu_{\rm com} + 3}\, \nu_{\rm com} (1 - \lambda) \qquad (13)$$

with $\nu_{\rm com} = n - k$.

How large should m be?
- The legacy, classic advice: a small number of imputations, such as m = 5.
- More recently: set m higher.

Some advice
- Use a small m (m = 5 or so) if the fraction of missing information $\gamma$ is low.
- Develop your model with m = 5. Do the final run with m equal to the percentage of incomplete cases.
- Repeat the analysis with m = 5 with different seeds. If there are large differences for some parameters, this means that the data contain little information about them.

Introductions to multiple imputation
- Schafer, J.L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8(1), 3-15.
- Sterne et al. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ, 338, b2393.
- Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL.

SESSION III

[Figure, panel a: relation between temperature and gas consumption. The gas consumption of one observation is deleted; the deleted observation is marked.]

[Figure, panels b to e:]
- b: Predict imputed value from regression line
- c: Predicted value + noise
- d: Predicted value + noise + parameter uncertainty
- e: Imputation based on two predictors

Predictive mean matching: Y given X
- Add two regression lines
- Predicted given 5 °C
- Define a matching range around $\hat{y}$

- Select potential donors

Bayesian PMM
- Draw a line
- Define a matching range around $\hat{y}$
- Select potential donors

Imputation of a binary variable
- Fit logistic model (logistic regression):

$$\Pr(y_i = 1 \mid X_i, \beta) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)} \qquad (14)$$

- Draw parameter estimate
- Read off the probability
[Figures: probability plotted against the linear predictor.]
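The two recipes above, predictive mean matching and logistic regression imputation, can be sketched in base R. The functions below are simplified illustrations and not the routines used by mice: `impute_pmm` matches on least-squares predictions without the parameter draw of the Bayesian variant, `impute_logreg` draws the coefficients from a normal approximation around the fitted values, and the function names, the number of donors and the simulated data are all assumptions made for the example.

```r
set.seed(2)

# Predictive mean matching (simplified, no parameter draw):
# each missing y borrows an observed y from one of the closest donors
impute_pmm <- function(y, x, donors = 5) {
  mis <- is.na(y)
  fit <- lm(y ~ x, subset = !mis)
  yhat <- predict(fit, newdata = data.frame(x = x))
  obs_idx <- which(!mis)
  y_imp <- y
  for (i in which(mis)) {
    d <- abs(yhat[obs_idx] - yhat[i])            # distance in predicted value
    k <- min(donors, length(obs_idx))
    pool <- obs_idx[order(d)[seq_len(k)]]        # potential donors
    y_imp[i] <- y[pool[sample.int(length(pool), 1)]]
  }
  y_imp
}

# Logistic regression imputation for a 0/1 variable:
# fit the model, draw a parameter vector, read off the probability, draw y
impute_logreg <- function(y, x) {
  mis <- is.na(y)
  fit <- glm(y ~ x, family = binomial, subset = !mis)
  beta_hat <- coef(fit)
  z <- rnorm(length(beta_hat))
  beta_star <- beta_hat + drop(t(chol(vcov(fit))) %*% z)  # draw from N(beta_hat, V)
  p <- plogis(beta_star[1] + beta_star[2] * x[mis])
  y_imp <- y
  y_imp[mis] <- rbinom(sum(mis), size = 1, prob = p)
  y_imp
}

# Simulated example data with 30 missing values in each outcome
x <- rnorm(100)
y_num <- 2 + x + rnorm(100)
y_bin <- rbinom(100, 1, plogis(x))
y_num[sample(100, 30)] <- NA
y_bin[sample(100, 30)] <- NA

num_imp <- impute_pmm(y_num, x)
bin_imp <- impute_logreg(y_bin, x)
```

Because predictive mean matching copies observed values, every imputed `num_imp` is a value that actually occurs in the data, which is why the method cannot produce impossible values.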

Impute ordered categorical variable
- Fit ordered logit model
- $K$ ordered categories $k = 1, \ldots, K$
- Ordered logit model, or proportional odds model:

$$\Pr(y_i = k \mid X_i, \beta) = \frac{\exp(\tau_k + X_i \beta)}{\sum_{k=1}^{K} \exp(\tau_k + X_i \beta)} \qquad (15)$$

- Read off the probability
[Figures: probability plotted against the linear predictor.]

Other types of variables
- Count data
- Semi-continuous data
- Censored data
- Truncated data
- Rounded data

Univariate imputation in mice

Method       Description                         Scale type
pmm          Predictive mean matching            numeric
norm         Bayesian linear regression          numeric
norm.nob     Linear regression, non-Bayesian     numeric
norm.boot    Linear regression with bootstrap    numeric
mean         Unconditional mean imputation       numeric
2L.norm      Two-level linear model              numeric
logreg       Logistic regression                 factor, 2 levels
logreg.boot  Logistic regression with bootstrap  factor, 2 levels
polyreg      Multinomial logit model             factor, >2 levels
polr         Ordered logit model                 ordered, >2 levels
lda          Linear discriminant analysis        factor
sample       Simple random sample                any

Problems in multivariate imputation
- Predictors themselves can be incomplete
- Mixed measurement levels
- Order of imputation can be meaningful
- Too many predictor variables
- Relations could be nonlinear
- Higher order interactions
- Impossible combinations

Three general strategies
- Monotone data imputation
- Joint modeling
- Fully conditional specification (FCS)

Imputation of monotone pattern
[Figure: monotone missing data pattern over the columns X and the Y variables.]

Imputation of monotone pattern (continued)
[Figures: monotone missing data pattern over the columns X and the Y variables, imputed column by column.]

Joint Modeling (JM)
- Specify joint model $P(Y, X, R)$
- Derive $P(Y_{\rm mis} \mid Y_{\rm obs}, X, R)$
- Use MCMC techniques to draw imputations $\dot{Y}_{\rm mis}$

Joint modeling: Software

Platform     Software
R/S-Plus     norm, cat, mix, pan, Amelia
SAS          proc MI, proc MIANALYZE
STATA        MI command
Stand-alone  Amelia, solas, norm, pan

Joint Modeling: Pros
- Yields correct statistical inference under the assumed JM
- Efficient parametrization (if the model fits)
- Known theoretical properties
- Works very well for parameters close to the center
- Many applications

Joint Modeling: Cons
- Lack of flexibility
- May lead to large models
- Can assume more than the complete data problem
- Can impute impossible data

Fully Conditional Specification (FCS)
- Specify $P(Y_{\rm mis} \mid Y_{\rm obs}, X, R)$
- Use MCMC techniques to draw imputations $\dot{Y}_{\rm mis}$

Multivariate Imputation by Chained Equations (MICE)
MICE algorithm
- Specify imputation model for each incomplete column
- Fill in starting imputations
- And iterate
- Model: Fully Conditional Specification (FCS)
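The three steps of the MICE algorithm (one model per incomplete column, starting imputations, iteration) can be made concrete with a toy version in base R. This is a sketch under strong simplifying assumptions: all columns are numeric, every column is imputed by stochastic linear regression on all other columns, and parameter uncertainty is ignored. The function names `impute_column` and `chained_equations` and the simulated data are hypothetical and are not part of mice.

```r
set.seed(3)

# Simulated data: x complete, y1 and y2 each with 60 missing values
n  <- 300
x  <- rnorm(n)
y1 <- 1 + x + rnorm(n)
y2 <- 2 + 0.5 * x + 0.5 * y1 + rnorm(n)
dat <- data.frame(x, y1, y2)
dat$y1[sample(n, 60)] <- NA
dat$y2[sample(n, 60)] <- NA

# Step 1: imputation model for one column given all other columns
# (stochastic regression: prediction plus residual noise)
impute_column <- function(data, target, mis) {
  form <- reformulate(setdiff(names(data), target), response = target)
  fit  <- lm(form, data = data[!mis, ])
  pred <- predict(fit, newdata = data[mis, ])
  pred + rnorm(sum(mis), sd = summary(fit)$sigma)
}

chained_equations <- function(data, maxit = 5) {
  where <- is.na(data)                          # TRUE where a value is missing
  incomplete <- names(which(colSums(where) > 0))
  # Step 2: starting imputations, random draws from the observed values
  for (v in incomplete) {
    obs <- data[[v]][!where[, v]]
    data[[v]][where[, v]] <- sample(obs, sum(where[, v]), replace = TRUE)
  }
  # Step 3: iterate, re-imputing each incomplete column in turn
  for (it in seq_len(maxit)) {
    for (v in incomplete) {
      data[[v]][where[, v]] <- impute_column(data, v, where[, v])
    }
  }
  data
}

completed <- chained_equations(dat)
colSums(is.na(completed))
```

Running `chained_equations()` several times with different seeds would give several completed data sets, which is the starting point for multiple imputation. A proper implementation would also draw the regression parameters, as the Bayesian methods in mice do.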

Fully Conditional Specification: Pros
- Easy and flexible
- Imputes close to the data, prevents impossible data
- Subset selection of predictors
- Modular, can preserve valuable work
- Works well, both in simulations and practice

Fully Conditional Specification: Cons
- Theoretical properties only known in special cases
- Cannot use computational shortcuts, like the sweep operator
- Joint distribution may not exist (incompatibility)

Fully Conditional Specification (FCS): Software

Platform     Software
R            mice, transcan, mi, VIM, baboon
SPSS         V17 procedure multiple imputation
SAS          IVEware, SAS 9.
STATA        ice command, multiple imputation command
Stand-alone  Solas, Mplus

How many iterations?
- Quick convergence
- 5 iterations is adequate for most problems
- More iterations if the amount of missing information is high
- Inspect the generated imputations
- Monitor convergence to detect anomalies
[Figures: trace plots of the mean and sd of hgt and wgt against iteration, one set showing non-convergence and one showing convergence.]

SESSION IV

Imputation model choices
1. MAR or MNAR
2. Form of the imputation model
3. Which predictors
4. Derived variables
5. What is m?
6. Order of imputation
7. Diagnostics, convergence

Which predictors?
1. Include all variables that appear in the complete-data model
2. In addition, include the variables that are related to the nonresponse
3. In addition, include variables that explain a considerable amount of variance
4. Remove from the variables selected in steps 2 and 3 those variables that have too many missing values within the subgroup of incomplete cases.
Functions quickpred() and flux()

Derived variables
- ratio of two variables
- sum score
- index variable
- quadratic relations
- interaction term
- conditional imputation
- compositions

How to impute a ratio?
- weight/height ratio: whr = wgt/hgt in kg/m
- Easy if only one of wgt or hgt or whr is missing
Methods
- POST: Impute wgt and hgt, and calculate whr after imputation
- JAV: Impute whr as just another variable
- PASSIVE1: Impute wgt and hgt, and calculate whr during imputation
- PASSIVE2: As PASSIVE1 with adapted predictor matrix

Method POST
    > library(mice)
    > imp <- mice(boys)
    > long <- complete(imp, "long", inc = TRUE)
    > long$whr <- with(long, wgt/(hgt/100))
    > imp <- long2mids(long)

Method JAV: Just another variable
    > boys$whr <- boys$wgt/(boys$hgt/100)
    > imp.jav <- mice(boys, seed = 9)
[Figure: Weight/Height (kg/m) against Height (cm), with panels for JAV and the two passive methods.]

Method PASSIVE
    > meth["whr"] <- "~I(wgt/(hgt/100))"

Method PASSIVE, predictor matrix
[Figure: predictor matrix with rows and columns age, hgt, wgt, hc, gen, phb, tv, reg, whr.]

Method PASSIVE (continued)
    > pred[c("wgt", "hgt", "hc", "reg"), ""] <- 0
    > pred[c("gen", "phb", "tv"), c("hgt", "wgt", "hc")] <- 0
    > pred[, "whr"] <- 0
[Figures: Weight/Height (kg/m) against Height (cm), with panels for JAV and the two passive methods; predictor matrix with rows and columns age, hgt, wgt, hc, gen, phb, tv, reg, whr.]

Derived variables: summary
- Derived variables pose special challenges
- Plausible values respect data dependencies
- If you can, create derived variables after imputation
- If you cannot, use passive imputation
- Break up direct feedback loops using the predictor matrix

Diagnostics

Standard diagnostic plots in mice
Since mice 2.5, plots for imputed data:
- one-dimensional scatter: stripplot
- box-and-whisker plot: bwplot
- densities: densityplot
- scattergram: xyplot

Stripplot
    > library(mice)
    > imp <- mice(nhanes, seed = 998)
    > stripplot(imp, pch = c(1, 19))
[Figure: stripplot of the nhanes variables (panels include age, chl and hyp) by imputation number.]

A larger data set
    > imp <- mice(boys)
    > bwplot(imp)
[Figure: box-and-whisker plots of age, hgt, hc, wgt and tv by imputation number.]

densityplot(imp)
[Figure: density plots of hgt, wgt, hc and tv.]

SESSION V

Reporting guidelines
1. Amount of missing data
2. Reasons for missingness
3. Differences between complete and incomplete data
4. Method used to account for missing data
5. Software
6. Number of imputed datasets
7. Imputation model
8. Derived variables
9. Diagnostics
10. Pooling
11. Listwise deletion
12. Sensitivity analysis


More information

Simulation Study: Introduction of Imputation. Methods for Missing Data in Longitudinal Analysis

Simulation Study: Introduction of Imputation. Methods for Missing Data in Longitudinal Analysis Applied Mathematical Sciences, Vol. 5, 2011, no. 57, 2807-2818 Simulation Study: Introduction of Imputation Methods for Missing Data in Longitudinal Analysis Michikazu Nakai Innovation Center for Medical

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Generalized Additive Model

Generalized Additive Model Generalized Additive Model by Huimin Liu Department of Mathematics and Statistics University of Minnesota Duluth, Duluth, MN 55812 December 2008 Table of Contents Abstract... 2 Chapter 1 Introduction 1.1

More information

Missing Data in Orthopaedic Research

Missing Data in Orthopaedic Research in Orthopaedic Research Keith D Baldwin, MD, MSPT, MPH, Pamela Ohman-Strickland, PhD Abstract Missing data can be a frustrating problem in orthopaedic research. Many statistical programs employ a list-wise

More information

Package CALIBERrfimpute

Package CALIBERrfimpute Type Package Package CALIBERrfimpute June 11, 2018 Title Multiple Imputation Using MICE and Random Forest Version 1.0-1 Date 2018-06-05 Functions to impute using Random Forest under Full Conditional Specifications

More information

Statistical matching: conditional. independence assumption and auxiliary information

Statistical matching: conditional. independence assumption and auxiliary information Statistical matching: conditional Training Course Record Linkage and Statistical Matching Mauro Scanu Istat scanu [at] istat.it independence assumption and auxiliary information Outline The conditional

More information

An Algorithm for Creating Models for Imputation Using the MICE Approach:

An Algorithm for Creating Models for Imputation Using the MICE Approach: An Algorithm for Creating Models for Imputation Using the MICE Approach: An application in Stata Rose Anne rosem@ats.ucla.edu Statistical Consulting Group Academic Technology Services University of California,

More information

Missing data analysis. University College London, 2015

Missing data analysis. University College London, 2015 Missing data analysis University College London, 2015 Contents 1. Introduction 2. Missing-data mechanisms 3. Missing-data methods that discard data 4. Simple approaches that retain all the data 5. RIBG

More information

Approaches to Missing Data

Approaches to Missing Data Approaches to Missing Data A Presentation by Russell Barbour, Ph.D. Center for Interdisciplinary Research on AIDS (CIRA) and Eugenia Buta, Ph.D. CIRA and The Yale Center of Analytical Studies (YCAS) April

More information

Multiple Imputation with Mplus

Multiple Imputation with Mplus Multiple Imputation with Mplus Tihomir Asparouhov and Bengt Muthén Version 2 September 29, 2010 1 1 Introduction Conducting multiple imputation (MI) can sometimes be quite intricate. In this note we provide

More information

Estimation of Item Response Models

Estimation of Item Response Models Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:

More information

Work Session on Statistical Data Editing (Paris, France, April 2014) Topic (v): International Collaboration and Software & Tools

Work Session on Statistical Data Editing (Paris, France, April 2014) Topic (v): International Collaboration and Software & Tools WP.XX ENGLISH ONLY UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing (Paris, France, 28-30 April 204) Topic (v): International

More information

Amelia multiple imputation in R

Amelia multiple imputation in R Amelia multiple imputation in R January 2018 Boriana Pratt, Princeton University 1 Missing Data Missing data can be defined by the mechanism that leads to missingness. Three main types of missing data

More information

Missing Data? A Look at Two Imputation Methods Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA

Missing Data? A Look at Two Imputation Methods Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA Missing Data? A Look at Two Imputation Methods Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA ABSTRACT Statistical analyses can be greatly hampered by missing

More information

Handling Missing Data

Handling Missing Data Handling Missing Data Estie Hudes Tor Neilands UCSF Center for AIDS Prevention Studies Part 2 December 10, 2013 1 Contents 1. Summary of Part 1 2. Multiple Imputation (MI) for normal data 3. Multiple Imputation

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects

PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects PSY 9556B (Jan8) Design Issues and Missing Data Continued Examples of Simulations for Projects Let s create a data for a variable measured repeatedly over five occasions We could create raw data (for each

More information

Statistical Analysis Using Combined Data Sources: Discussion JPSM Distinguished Lecture University of Maryland

Statistical Analysis Using Combined Data Sources: Discussion JPSM Distinguished Lecture University of Maryland Statistical Analysis Using Combined Data Sources: Discussion 2011 JPSM Distinguished Lecture University of Maryland 1 1 University of Michigan School of Public Health April 2011 Complete (Ideal) vs. Observed

More information

IBM SPSS Categories 23

IBM SPSS Categories 23 IBM SPSS Categories 23 Note Before using this information and the product it supports, read the information in Notices on page 55. Product Information This edition applies to version 23, release 0, modification

More information

Package micemd. August 24, 2018

Package micemd. August 24, 2018 Type Package Package micemd August 24, 2018 Title Multiple Imputation by Chained Equations with Multilevel Data Version 1.4.0 Date 2018-08-23 Addons for the 'mice' package to perform multiple imputation

More information

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS ABSTRACT Paper 1938-2018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion

More information

arxiv: v1 [stat.ap] 8 Jan 2014

arxiv: v1 [stat.ap] 8 Jan 2014 The Annals of Applied Statistics 2013, Vol. 7, No. 4, 1983 2006 DOI: 10.1214/13-AOAS664 c Institute of Mathematical Statistics, 2013 CALIBRATED IMPUTATION OF NUMERICAL DATA UNDER LINEAR EDIT RESTRICTIONS

More information

Handbook of Statistical Modeling for the Social and Behavioral Sciences

Handbook of Statistical Modeling for the Social and Behavioral Sciences Handbook of Statistical Modeling for the Social and Behavioral Sciences Edited by Gerhard Arminger Bergische Universität Wuppertal Wuppertal, Germany Clifford С. Clogg Late of Pennsylvania State University

More information

- 1 - Fig. A5.1 Missing value analysis dialog box

- 1 - Fig. A5.1 Missing value analysis dialog box WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation

More information

A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY

A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY Norman Solomon School of Computing and Technology University of Sunderland A thesis submitted in partial fulfilment of the requirements of the University

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

Generalized least squares (GLS) estimates of the level-2 coefficients,

Generalized least squares (GLS) estimates of the level-2 coefficients, Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics PASW Complex Samples 17.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Package miceext. March 6, 2018

Package miceext. March 6, 2018 Title Extension Package to 'mice' Version 1.1.0 Package miceext March 6, 2018 Maintainer Tobias Schumacher Description Extends and builds on the 'mice' package by adding

More information

Types of missingness and common strategies

Types of missingness and common strategies 9 th UK Stata Users Meeting 20 May 2003 Multiple imputation for missing data in life course studies Bianca De Stavola and Valerie McCormack (London School of Hygiene and Tropical Medicine) Motivating example

More information

IBM SPSS Missing Values 21

IBM SPSS Missing Values 21 IBM SPSS Missing Values 21 Note: Before using this information and the product it supports, read the general information under Notices on p. 87. This edition applies to IBM SPSS Statistics 21 and to all

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

Product Catalog. AcaStat. Software

Product Catalog. AcaStat. Software Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,

More information

REALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol

REALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol REALCOM-IMPUTE: multiple imputation using MLwin. Modified September 2014 by Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol This description is divided into two sections. In the

More information

ESTIMATING THE MISSING VALUES IN ANALYSIS OF VARIANCE TABLES BY A FLEXIBLE ADAPTIVE ARTIFICIAL NEURAL NETWORK AND FUZZY REGRESSION MODELS

ESTIMATING THE MISSING VALUES IN ANALYSIS OF VARIANCE TABLES BY A FLEXIBLE ADAPTIVE ARTIFICIAL NEURAL NETWORK AND FUZZY REGRESSION MODELS ESTIMATING THE MISSING VALUES IN ANALYSIS OF VARIANCE TABLES BY A FLEXIBLE ADAPTIVE ARTIFICIAL NEURAL NETWORK AND FUZZY REGRESSION MODELS Ali Azadeh - Zahra Saberi Hamidreza Behrouznia-Farzad Radmehr Peiman

More information

Cross-validation and the Bootstrap

Cross-validation and the Bootstrap Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. 1/44 Cross-validation and the Bootstrap In the section we discuss two resampling

More information

In this chapter, we present how to use the multiple imputation methods

In this chapter, we present how to use the multiple imputation methods MULTIPLE IMPUTATION WITH PRINCIPAL COMPONENT METHODS: A USER GUIDE In this chapter, we present how to use the multiple imputation methods described previously: the BayesMIPCA method, allowing multiple

More information

Moving Beyond Linearity

Moving Beyond Linearity Moving Beyond Linearity The truth is never linear! 1/23 Moving Beyond Linearity The truth is never linear! r almost never! 1/23 Moving Beyond Linearity The truth is never linear! r almost never! But often

More information

Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation. Filip Lindhfors and Farhana Morko

Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation. Filip Lindhfors and Farhana Morko Bachelor thesis Department of Statistics Kandidatuppsats, Statistiska institutionen Nr 2014:5 Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation Filip

More information

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Bengt Muth en University of California, Los Angeles Tihomir Asparouhov Muth en & Muth en Mplus

More information

Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches

Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches Jonathan Kropko University of Virginia Ben Goodrich Columbia University Andrew Gelman Columbia University

More information

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) 200 Level Courses STAT 250: Introductory Statistics I. 3 credits. Elementary introduction to statistics. Topics include descriptive statistics, probability, and estimation

More information

PASW Missing Values 18

PASW Missing Values 18 i PASW Missing Values 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412

More information

A Monotonic Sequence and Subsequence Approach in Missing Data Statistical Analysis

A Monotonic Sequence and Subsequence Approach in Missing Data Statistical Analysis Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 1 (2016), pp. 1131-1140 Research India Publications http://www.ripublication.com A Monotonic Sequence and Subsequence Approach

More information

MHPE 494: Data Analysis. Welcome! The Analytic Process

MHPE 494: Data Analysis. Welcome! The Analytic Process MHPE 494: Data Analysis Alan Schwartz, PhD Department of Medical Education Memoona Hasnain,, MD, PhD, MHPE Department of Family Medicine College of Medicine University of Illinois at Chicago Welcome! Your

More information

Lecture 25: Review I

Lecture 25: Review I Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

More information

Opening Windows into the Black Box

Opening Windows into the Black Box Opening Windows into the Black Box Yu-Sung Su, Andrew Gelman, Jennifer Hill and Masanao Yajima Columbia University, Columbia University, New York University and University of California at Los Angels July

More information

Notes on Simulations in SAS Studio

Notes on Simulations in SAS Studio Notes on Simulations in SAS Studio If you are not careful about simulations in SAS Studio, you can run into problems. In particular, SAS Studio has a limited amount of memory that you can use to write

More information

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by Paper CC-016 A macro for nearest neighbor Lung-Chang Chien, University of North Carolina at Chapel Hill, Chapel Hill, NC Mark Weaver, Family Health International, Research Triangle Park, NC ABSTRACT SAS

More information

Week 5: Multiple Linear Regression II

Week 5: Multiple Linear Regression II Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R

More information

Statistical modelling with missing data using multiple imputation. Session 2: Multiple Imputation

Statistical modelling with missing data using multiple imputation. Session 2: Multiple Imputation Statistical modelling with missing data using multiple imputation Session 2: Multiple Imputation James Carpenter London School of Hygiene & Tropical Medicine Email: james.carpenter@lshtm.ac.uk www.missingdata.org.uk

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

A noninformative Bayesian approach to small area estimation

A noninformative Bayesian approach to small area estimation A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported

More information

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 6 SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE

More information

Statistical Methods for the Analysis of Repeated Measurements

Statistical Methods for the Analysis of Repeated Measurements Charles S. Davis Statistical Methods for the Analysis of Repeated Measurements With 20 Illustrations #j Springer Contents Preface List of Tables List of Figures v xv xxiii 1 Introduction 1 1.1 Repeated

More information

FHDI: An R Package for Fractional Hot Deck Imputation by Jongho Im, In Ho Cho, and Jae Kwang Kim

FHDI: An R Package for Fractional Hot Deck Imputation by Jongho Im, In Ho Cho, and Jae Kwang Kim CONTRIBUTED RESEARCH ARTICLE 140 FHDI: An R Package for Fractional Hot Deck Imputation by Jongho Im, In Ho Cho, and Jae Kwang Kim Abstract Fractional hot deck imputation (FHDI), proposed by Kalton and

More information