Opening Windows into the Black Box


Opening Windows into the Black Box
Yu-Sung Su, Andrew Gelman, Jennifer Hill and Masanao Yajima
Columbia University, Columbia University, New York University and University of California at Los Angeles
July 8, 2009

Most MI packages are black boxes! Since Rubin's pioneering work (1987) on multiple imputation methods for nonresponse in surveys, the general statistical theory and framework for managing missing information has been well developed. Numerous software packages have been developed; yet, to a certain degree, they are black boxes: users must trust the imputation procedure without much control over what goes into it and without much understanding of what comes out.

Modeling and model checking shall not be neglected in MI! Model checking and other diagnostics are an important part of any statistical procedure. These examinations are particularly important because of an inherent tension of multiple imputation: the model used for the imputations is not, in general, the same as the model used for the analysis. Our mi is an open-ended, open-source package, intended not only to solve imputation problems but also to develop and implement new ideas in modeling and model checking. mi uses the chained equations algorithm.
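As a language-agnostic sketch of the chained equations idea, the following Python code imputes each incomplete variable in turn from a conditional model fit to the other variables, iterating until the imputations stabilize. Plain OLS draws stand in for mi's richer type-specific conditional models; all names here are illustrative, not the package's API.

```python
import numpy as np

def chained_equations(X, n_iter=10, rng=None):
    """One chain of iterative ('chained equations') imputation.

    X: 2-D array with np.nan marking missing entries.
    Each variable with missing values is regressed (OLS, a stand-in
    for mi's richer conditional models) on all the others, and its
    missing entries are redrawn from the fitted conditional model.
    """
    rng = np.random.default_rng(rng)
    X = X.astype(float).copy()
    miss = np.isnan(X)
    # Start from column means (a simple initialization).
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid_sd = np.std(X[obs, j] - A[obs] @ beta)
            pred = A[miss[:, j]] @ beta
            # Draw imputations, not just plug in the conditional mean.
            X[miss[:, j], j] = pred + rng.normal(0, resid_sd, pred.shape)
    return X
```

Drawing from the conditional distribution rather than plugging in the mean is what preserves variability across the multiple imputed datasets.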

Novel Features in our MI package
Bayesian regression models to address problems with separation;
Imputation steps that deal with structural correlation;
Functions that check the convergence of the imputations;
Plotting functions that visually check the imputation models.

Outline
1 Introduction
2 Basic Setup of mi
3 Novel Features
4 Concluding Remarks

The Basic Procedure of Doing a Sensible MI
Setup: display of missing-data patterns; identification of structural problems in the data, and preprocessing; specification of the conditional models.
Imputation: iterative imputation based on the conditional models; checking the fit of the conditional models; checking the convergence of the procedure; checking whether the imputed values are reasonable.
Analysis: obtaining completed datasets; pooling the analyses of the multiply imputed datasets.
Validation: sensitivity analysis; cross-validation.

Imputation Information Matrix

  names           include  order  number.mis  all.mis  type                 collinear
1 h39b.w1         Yes      1      179         No       nonnegative          No
2 age.w1          Yes      2      24          No       positive-continuous  No
3 c28.w1          Yes      3      38          No       positive-continuous  No
4 pcs.w1          Yes      4      24          No       positive-continuous  No
5 mcs37.w1        Yes      5      24          No       binary               No
6 b05.w1          Yes      6      63          No       ordered-categorical  No
7 haartadhere.w1  Yes      7      24          No       ordered-categorical  No

Default conditional models (each variable regressed on all the others):
h39b.w1        ~ age.w1 + c28.w1 + pcs.w1 + mcs37.w1 + b05.w1 + haartadhere.w1
age.w1         ~ h39b.w1 + c28.w1 + pcs.w1 + mcs37.w1 + b05.w1 + haartadhere.w1
c28.w1         ~ h39b.w1 + age.w1 + pcs.w1 + mcs37.w1 + b05.w1 + haartadhere.w1
pcs.w1         ~ h39b.w1 + age.w1 + c28.w1 + mcs37.w1 + b05.w1 + haartadhere.w1
mcs37.w1       ~ h39b.w1 + age.w1 + c28.w1 + pcs.w1 + b05.w1 + haartadhere.w1
b05.w1         ~ h39b.w1 + age.w1 + c28.w1 + pcs.w1 + mcs37.w1 + haartadhere.w1
haartadhere.w1 ~ h39b.w1 + age.w1 + c28.w1 + pcs.w1 + mcs37.w1 + b05.w1

Variable Types
mi determines each variable's type from the data. With L = length(unique(x)):
L == 1: fixed
L == 2: binary
is.character(x): unordered categorical
is.ordered(x): ordered categorical
is.numeric(x): continuous, refined further as
  all(0 < x < 1): proportion
  2 < L <= 5: ordered categorical
  L > 5 & all(x >= 0): nonnegative
  L > 5 & all(x > 0): positive continuous
mi.preprocess() then recodes nonnegative variables into a binary indicator plus a log-continuous variable, and positive-continuous variables into a log-continuous variable.
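The classification rules above can be sketched as a small function. This is a hypothetical Python rendering of the slide's rules, not the package's actual implementation: the branch order (e.g. checking strictly positive before nonnegative) is my assumption, and R's is.ordered() has no plain-Python analogue, so ordered factors are not handled.

```python
def classify(x):
    """Sketch of mi's variable-type rules, as described on the slide.
    Thresholds mirror the slide; branch order is assumed."""
    L = len(set(x))                     # L = length(unique(x))
    if L == 1:
        return "fixed"
    if L == 2:
        return "binary"
    if all(isinstance(v, str) for v in x):
        return "unordered-categorical"  # is.character(x)
    if all(0 < v < 1 for v in x):
        return "proportion"
    if 2 < L <= 5:
        return "ordered-categorical"
    if all(v > 0 for v in x):           # L > 5 & all(x > 0)
        return "positive-continuous"
    if all(v >= 0 for v in x):          # L > 5 & all(x >= 0)
        return "nonnegative"
    return "continuous"
```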

Imputation Models
List of regression models corresponding to variable types:
Variable Type             Regression Model
binary                    mi.binary
continuous                mi.continuous
count                     mi.count
fixed                     mi.fixed
log-continuous            mi.continuous
nonnegative               mi.continuous
ordered-categorical       mi.polr
unordered-categorical     mi.categorical
positive-continuous       mi.continuous
proportion                mi.continuous
predictive-mean-matching  mi.pmm

Bayesian Models to Address Problems with Separation The problem of separation occurs whenever the outcome variable is perfectly predicted by a predictor or by a linear combination of several predictors. The problem is exacerbated as the number of predictors increases. However, multiple imputation is generally strengthened by including many variables, which helps to satisfy the missing-at-random assumption. To address the problem of separation, we have augmented mi to allow for Bayesian versions of generalized linear models (Gelman, Jakulin, Pittau and Su 2008).

Bayesian Models to Address Problems with Separation
List of Bayesian generalized linear models used in mi regression functions:
mi Functions      Bayesian Functions
mi.continuous()   bayesglm() with gaussian family
mi.binary()       bayesglm() with binomial family (default uses logit link)
mi.count()        bayesglm() with quasi-poisson family (overdispersed poisson)
mi.polr()         bayespolr()

Imputing Semi-Continuous Data with Transformation
The positive-continuous, nonnegative, and proportion variable types are, in general, not modeled in a reasonable way in other imputation software. These kinds of data have bounds or truncations and do not follow standard distributions. Our algorithm models these data via transformation:
nonnegative: create two ancillary variables, an indicator for values greater than 0 and the log of the values greater than 0;
positive-continuous: the log of the variable;
proportion: logit(x).

Imputing Semi-Continuous Data with Transformation
Example: mi.preprocess() recodes the original dataset (x1 nonnegative, x2 positive-continuous, x3 a proportion) into a transformed dataset in which x1 is split into an indicator x1.ind = 1 if x1 > 0 plus log(x1) where x1 > 0 (e.g. x1 = 19 becomes x1.ind = 1 and log(19) = 2.94, while x1 = 0 becomes x1.ind = 0), x2 becomes log(x2), and x3 becomes logit(x3).
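The transformations themselves are one-liners. A minimal sketch follows; the function names are mine for illustration, not mi's API.

```python
import math

def preprocess_nonnegative(x):
    """mi.preprocess-style recoding of a nonnegative variable:
    an indicator for x > 0, plus log(x) where x > 0 (None otherwise)."""
    ind = [1.0 if v > 0 else 0.0 for v in x]
    logx = [math.log(v) if v > 0 else None for v in x]
    return ind, logx

def logit(p):
    """Transform a proportion in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Back-transform an imputed value to the (0, 1) scale."""
    return 1 / (1 + math.exp(-z))
```

After imputation on the transformed scale, the inverse transforms (exp, inv_logit) map the imputed values back to the original bounded scale, which is how the bounds are respected.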

Imputing Data with Collinearity
Perfect correlation, e.g. x1 = 10x2 - 5: if the two variables have the same missing-data pattern, mi duplicates the imputed values from the correlated variable; if they have different patterns, mi adds noise to the iterative imputation process (see next point).
Additive constraints, e.g. x1 = x2 + x3 + x4 + x5 + 10:
Reshuffling noise: mi randomly imputes missing data from the marginal distribution rather than from the conditional models.
Fading empirical noise: mi augments the data with a draw from the observed data amounting to 10% of the completed data. After one batch of imputations is finished, mi runs 20 more iterations without noise to alleviate the influence of the noise.

Imputing Data with Collinearity
[Table: a structured, correlated example dataset of counts by class (1-10) with total, female, and male columns, linked by an additive constraint.]
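A simple way to surface such structural problems before imputing is to scan for perfectly correlated column pairs and for rank deficiency of the design matrix (a symptom of additive constraints). This Python sketch is illustrative only; mi's own detection and handling are richer.

```python
import numpy as np

def find_structural_collinearity(X, tol=1e-8):
    """Flag pairs of perfectly correlated columns, and report whether the
    centered data matrix is rank-deficient (as it is when one column is
    an exact linear combination of others, e.g. an additive constraint)."""
    C = np.corrcoef(X, rowvar=False)
    n = X.shape[1]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if abs(C[i, j]) > 1 - tol]
    centered = X - X.mean(axis=0)
    rank_deficient = np.linalg.matrix_rank(centered) < n
    return pairs, rank_deficient
```

Note that an additive constraint produces rank deficiency without any single pair being perfectly correlated, which is why both checks are needed.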

Checking the Convergence of the Imputation mi monitors the mixing of each variable by the variance of its mean and standard deviation within and between different chains of the imputation (R-hat < 1.1; Gelman et al. 2004). By default, mi monitors the convergence of the mean and standard deviation of the imputed data. More strictly, mi can also monitor the convergence of the parameters of the conditional models.
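The R-hat check can be sketched as follows. This is the classic Gelman-Rubin potential scale reduction factor computed from within- and between-chain variances; mi's exact computation may differ in detail.

```python
import numpy as np

def r_hat(chains):
    """Gelman-Rubin potential scale reduction factor for m chains of
    length n (rows = chains). Values near 1 (e.g. < 1.1) suggest the
    chains are mixing; large values suggest non-convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_plus / W)
```

In the imputation setting, the monitored quantity per chain would be, e.g., the sequence of means of the imputed values for one variable across iterations.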

Model Checking and Other Diagnostics for the Imputation Using Graphics
We offer three strategies:
Imputations are typically generated using models, such as regressions or multivariate distributions, that are fit to observed data; the fit of these models can therefore be checked (Gelman 2005).
Imputations can be checked against a standard of reasonability: the differences between observed and missing values, and the distribution of the completed data as a whole, can be examined to see whether they make sense in the context of the problem being studied (Abayomi et al 2008).
We can use cross-validation to perform sensitivity analysis with respect to violations of our assumptions.

Model Checking and Other Diagnostics for the Imputation Using Graphics
[Figure: imputation diagnostics for h39b.w1, age.w1, and c28.w1: histograms of the completed data, residuals vs. predicted values, binned average residuals vs. expected values, and observed vs. predicted values.]

Model Checking and Other Diagnostics for the Imputation Using Graphics
[Figure: complete subset data for rvote by eth, comparing the 'actual' values, all available cases, and the imputed values, with the means of the hidden values and of all values after imputation.]

The Contributions of our MI package
1 Graphical diagnostics of the imputation models and of the convergence of the imputation process.
2 Use of Bayesian versions of regression models to handle the issue of separation.
3 Imputation-model specification that is similar to the way you would fit a regression model in R.
4 Automatic detection of problematic characteristics of the data, followed by either a fix or an alert to the user. In particular, mi adds noise into the imputation process to solve the problem of additive constraints.

The Future Plan
Speed up!
Modeling time-series cross-sectional data, and hierarchical or clustered data.

Link to the Full Paper Yu-Sung Su, Andrew Gelman, Jennifer Hill, and Masanao Yajima. Multiple imputation with diagnostics (mi) in R: Opening windows into the black box. Journal of Statistical Software. Available at: http://www.stat.columbia.edu/~gelman/research/published/mipaper.rev04.pdf