Missing Data Techniques

Paul Philippe Pare
Department of Sociology, UWO
Centre for Population, Aging, and Health, UWO
London Criminometrics

Introduction
Missing data is a common problem in quantitative social research:
- Multivariate analysis of large sample surveys: even a small proportion of missing data on each of many variables quickly adds up to a large number of lost cases if cases are deleted.
- Analysis of small sample datasets (e.g., clinical data, cross-national data, quantification of qualitative data): every case counts!
- Analysis of variables involving sensitive topics.

Introduction (continued)
Main problems with missing data:
- If missing cases are deleted: reduced sample size and lower statistical power; biased estimates due to sample selection.
- If missing cases are treated (imputation): biased estimates from inadequate imputation; biased standard errors and significance tests from overfitted imputation.
Publication of results:
- Journal editors and reviewers are increasingly strict about missing data problems and solutions.
- Gone are the days when one could just add a footnote: "missing data were deleted or replaced with the variable's average."

My background
I am not:
- An inventor of new techniques to deal with missing data
- Someone who publishes studies on the substantive topic of missing data
I am:
- A reader (or consumer) of studies on missing data
- A user of missing data techniques in my own studies

Missing data theory: Paradise, Purgatory, and Hell
Paradise: Missing Completely at Random (MCAR)
- The missing cases are unrelated to any variable in the analysis (including the variable with missing data itself).
- The pattern of missing cases is random in the common-sense meaning of random, like a roll of the dice.
- Example: A survey asked respondents about their age. 3% did not answer the question. No pattern is observed with other variables.
Why is MCAR paradise? Most missing data techniques will work well and provide unbiased and adequate estimates.

Missing data theory: Paradise, Purgatory, and Hell
Purgatory: Missing at Random (MAR)
- The missing cases are associated with other variables in the analysis, but not with the values of the variable with missing data itself.
- The pattern of missing cases is initially not random, but more sophisticated missing data techniques take this into account: the missing data are random once the relevant correlates are included.
- Example: A survey asked respondents about their age. 2% of men and 6% of women did not answer the question. Gender, however, is used as a variable in both the missing data technique and the analysis.
Why is MAR purgatory? There is an initial bias involving the missing data, but the problem can be solved with more sophisticated missing data techniques.

Missing data theory: Paradise, Purgatory, and Hell
Hell: Non-Random (NR), also called Missing Not at Random (MNAR)
- The missing cases are associated with the values of the variable with missing data itself.
- The pattern of missing cases is not random and there is no complete solution. More sophisticated missing data techniques provide some improvement, but the bias remains.
- The problem should be reported as a limitation of the study.
- Example: A survey asked respondents about their income. 10% of respondents did not answer the question. Respondents with particularly high or low incomes were less likely to answer the income question.
Why is NR hell? Non-random missing data create biases that cannot be fully removed. At best, more sophisticated missing data techniques will reduce the bias.
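To make the three mechanisms concrete, here is a small simulation sketch in Python. Everything in it is hypothetical: the income distribution, the gender gap, the nonresponse rates, and the function name `simulate` are made up for illustration. For each mechanism it compares the true mean income, the complete-case mean, and a mean that conditions on gender (the observed correlate). Under MCAR all three should agree; under MAR only the gender-adjusted mean should recover the truth; under NR neither should.

```python
import random
import statistics


def simulate(mechanism, n=50_000, seed=1):
    """Return (true mean, complete-case mean, gender-adjusted mean) of a
    hypothetical income variable under one missingness mechanism."""
    rng = random.Random(seed)
    everyone = []     # (woman, income) for every case, as if nothing were missing
    respondents = []  # (woman, income) for the cases whose income was observed
    for _ in range(n):
        woman = rng.random() < 0.5
        income = rng.gauss(40 if woman else 60, 15)  # made-up gender gap
        if mechanism == "MCAR":
            p_missing = 0.10                           # same chance for everyone
        elif mechanism == "MAR":
            p_missing = 0.40 if woman else 0.05        # depends on gender (observed)
        elif mechanism == "NR":
            p_missing = 0.60 if income > 70 else 0.05  # depends on income itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        everyone.append((woman, income))
        if rng.random() >= p_missing:
            respondents.append((woman, income))

    true_mean = statistics.fmean(inc for _, inc in everyone)
    cc_mean = statistics.fmean(inc for _, inc in respondents)
    # Condition on the observed correlate: mean observed income within each
    # gender, weighted by that gender's share of the full sample.
    adjusted = 0.0
    for g in (False, True):
        share = sum(1 for w, _ in everyone if w == g) / n
        adjusted += share * statistics.fmean(inc for w, inc in respondents if w == g)
    return true_mean, cc_mean, adjusted


for mech in ("MCAR", "MAR", "NR"):
    t, cc, adj = simulate(mech)
    print(f"{mech:5s} true={t:6.2f} complete-case={cc:6.2f} adjusted={adj:6.2f}")
```

With these settings the MAR complete-case mean is pulled upward because women, who earn less in this made-up data, are underrepresented among respondents; adjusting for gender removes that bias. The NR estimates stay too low even after adjusting for gender, because the high earners who skipped the question are missing within both groups.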

Missing Data as a Substantive Dependent Variable

Missing data as DV
- A first step in dealing with missing data is to understand the nature of the missing data pattern.
- This can be done empirically: cases with missing values are coded 1, cases without missing values are coded 0.
- This indicator is treated as the DV in a logistic regression or similar model, with other variables used as predictors.
- This analysis will reveal the characteristics of cases that are more likely to have missing values.
- It is similar to attrition analysis in longitudinal studies, but can also be done with non-longitudinal data.
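A minimal sketch of the missing-data-as-DV idea, assuming a single binary predictor and made-up counts. To stay dependency-free it fits the logistic regression with a hand-written Newton-Raphson loop; `logistic_fit` is a hypothetical helper, not a library function. In practice you would use your statistical package's logit routine with several predictors.

```python
import math


def logistic_fit(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson (one predictor)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical survey: 200 men (10 skipped the income question) and
# 200 women (30 skipped it). Missing-data indicator: 1 = income missing.
woman = [0] * 200 + [1] * 200
income_missing = [1] * 10 + [0] * 190 + [1] * 30 + [0] * 170

b0, b1 = logistic_fit(woman, income_missing)
print(f"odds ratio (women vs men) for missing income: {math.exp(b1):.2f}")
```

In this made-up data the estimated odds ratio is about 3.35: women's odds of skipping the income question are roughly three times men's, which suggests gender should be included in whatever missing data technique is used.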

Missing data as DV (continued)
Why is analyzing missing data as a DV useful?
- It provides the researcher with a substantive understanding of the missing data pattern.
- It can help with selecting the best technique to address the missing data problem.
- It can help with applying the technique: creating weights, creating imputation data.
- Depending on space, it can be part of your story: a section of a master's thesis, doctoral dissertation, or book; an appendix or major footnote in a journal article.

Missing Data Techniques: Traditional and Modern Approaches

Traditional techniques: Listwise deletion
Deleting any case with missing data.
Advantages:
- Simple (default option in many statistical programs)
- Acceptable with a small amount of missing data (i.e., less than 5% for the full sample)
Disadvantages:
- Quickly reduces sample size and statistical power when many variables have missing data
- Undetected selection biases
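The arithmetic behind "quickly reduces sample size" can be sketched in a few lines of Python. The helper names are hypothetical, and the expected-share formula assumes each variable is missing independently with the same probability (MCAR), which real data rarely satisfy exactly.

```python
def listwise_delete(rows):
    """Keep only the rows with no missing value (None) on any variable."""
    return [row for row in rows if all(v is not None for v in row)]


def expected_share_kept(p_missing, n_variables):
    """Expected share of cases surviving listwise deletion, assuming each of
    n_variables is missing independently with probability p_missing."""
    return (1.0 - p_missing) ** n_variables


rows = [(25, 1, 3.2), (None, 0, 2.8), (41, None, 3.9), (37, 1, 3.1)]
print(listwise_delete(rows))                     # [(25, 1, 3.2), (37, 1, 3.1)]
print(round(expected_share_kept(0.05, 10), 3))   # 0.599
```

With 5% missing on each of 10 variables, listwise deletion is expected to keep only about 60% of cases, even though no single variable exceeds the 5% guideline.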

Traditional techniques: Pairwise deletion
Using all available data on each pair of variables to calculate covariances; only the pairs with missing data are deleted, not the whole case.
Advantages:
- Simple
- All available data are used
Disadvantages:
- Mathematical problems: covariances are based on different sample sizes, and different parts of the model have different degrees of freedom
- False advantage: given these mathematical problems, the fact that "all available data are used" is more a source of bias than a source of validity
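A sketch of pairwise deletion in plain Python, with hypothetical helper names and toy data. Each covariance is computed from the cases observed on both variables, so the entries of the resulting covariance matrix rest on different numbers of cases, which is the mathematical problem described above.

```python
from itertools import combinations


def sample_cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pairwise_covariances(data):
    """Pairwise deletion: for each pair of variables, use only the cases that
    are observed on both. Returns {(a, b): (covariance, n_used)}."""
    result = {}
    for a, b in combinations(data, 2):
        pairs = [(x, y) for x, y in zip(data[a], data[b])
                 if x is not None and y is not None]
        xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
        result[(a, b)] = (sample_cov(xs, ys), len(pairs))
    return result


data = {
    "x": [1, 2, 3, None, 5],
    "y": [2, None, 6, 8, 10],
    "z": [None, None, 2, 3, 4],
}
for pair, (cov, n) in pairwise_covariances(data).items():
    print(pair, round(cov, 2), "n =", n)
```

Here the (x, z) covariance uses 2 cases while the other two use 3, so no single sample size describes the matrix.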

Traditional techniques: Mean imputation
Substituting the variable's mean for missing data.
Advantages:
- Simple
- Full sample size is preserved
Disadvantages:
- Theoretically illogical
- Sometimes overestimates coefficients, sometimes underestimates them
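A short illustration of one reason mean imputation is problematic, using made-up ages and a hypothetical helper `mean_impute`. Every imputed value sits exactly at the mean, so the imputed variable's variance shrinks, which in turn distorts variances, covariances, and the coefficients estimated from them.

```python
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


ages = [20, 30, None, 40, 50, None]
imputed = mean_impute(ages)
observed = [a for a in ages if a is not None]
print(imputed)
print(statistics.variance(observed), statistics.variance(imputed))
```

The sample variance falls from about 166.7 on the four observed ages to 100 after imputation, although no new information was added.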

Traditional techniques: Hot deck imputation
Substituting the answer from a randomly selected similar unit for missing data.
Advantages:
- Full sample size is preserved
- Logical
Disadvantages:
- Overfitting: artificially increases statistical power by assuming that similar units are actually identical; lower SEs and generous significance tests
- The method for selecting similar units must be justified (how many variables? how similar? etc.)
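A minimal hot deck sketch in Python. Defining "similar units" as an exact match on one variable (here gender) is an assumption made for brevity; it is exactly the kind of choice the slide says must be justified. The function name and data are hypothetical.

```python
import random


def hot_deck(records, target, by, seed=0):
    """Fill missing `target` values with the value of a randomly chosen donor
    that has the same value of `by`. Matching on a single variable is a
    simplifying assumption; real applications may match on several."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[target] is not None:
            donors.setdefault(r[by], []).append(r[target])
    completed = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r[target] is None:
            # Raises KeyError if no donor shares this value of `by`.
            r[target] = rng.choice(donors[r[by]])
        completed.append(r)
    return completed


people = [
    {"gender": "man", "age": 34}, {"gender": "man", "age": 51},
    {"gender": "man", "age": None}, {"gender": "woman", "age": 29},
    {"gender": "woman", "age": 46}, {"gender": "woman", "age": None},
]
for person in hot_deck(people, target="age", by="gender"):
    print(person)
```

Because the donor is drawn at random, a different `seed` can give a different completed dataset; a single hot deck imputation still treats that one draw as if it had been observed.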

Traditional techniques: Single imputation
Substituting model-predicted values for missing data, typically based on a multivariate model using the same variables as the main analysis.
Advantages:
- Logical
- Full sample size is preserved
- Powerful yet simple method
Disadvantage:
- Overfitting: artificially increases statistical power by treating predicted values as if they were observed values; lower SEs and generous significance tests
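A sketch of deterministic single regression imputation with one predictor, using ordinary least squares fitted to the complete cases. The helper name and data are hypothetical. The point to notice is that the imputed values carry no residual noise.

```python
def regression_impute(x, y):
    """Single (deterministic) regression imputation: fit OLS of y on x using
    complete cases, then replace each missing y with its predicted value.
    x is assumed fully observed."""
    cc = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = sum(xi for xi, _ in cc) / len(cc)
    my = sum(yi for _, yi in cc) / len(cc)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in cc)
             / sum((xi - mx) ** 2 for xi, _ in cc))
    intercept = my - slope * mx
    filled = [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]
    return filled, intercept, slope


x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, None, 8.2, None, 11.8]
filled, a, b = regression_impute(x, y)
# The imputed points sit exactly on the fitted line (zero residual), whereas
# the observed points scatter around it: this is the source of the overfitting.
print(filled)
```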

Traditional techniques: Extra dummy variable
Adding to a set of dummy variables an extra dummy coded 1 for all missing values and 0 otherwise.
Example: Education: University graduate (reference); High school graduate (dummy); Less than high school graduate (dummy); Unknown education (dummy)
Advantages:
- Full sample size is preserved
- The effect of the missing data dummy variable is empirically measured
Disadvantages:
- Heterogeneity: the missing data dummy variable may combine very different cases together
- Requires many extra dummy variables if missing data occur on multiple variables
- Requires the original variables to be dummy coded
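The education example above can be written as a small dummy-coding function. The category labels follow the slide; the function and dummy names are hypothetical.

```python
def education_dummies(education):
    """Dummy-code education with 'University graduate' as the reference
    category, plus an extra dummy flagging unknown (missing) education."""
    return {
        "high_school": int(education == "High school graduate"),
        "less_than_high_school": int(education == "Less than high school graduate"),
        "unknown_education": int(education is None),
    }


for value in ["University graduate", "High school graduate", None]:
    print(value, education_dummies(value))
```

Every respondent with missing education shares the single `unknown_education` dummy, whatever their true education, which is the heterogeneity problem listed above.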

Traditional techniques: Weighting
Creating a statistical weight to compensate for the pattern of missing data (i.e., more weight to observed cases that resemble the cases most likely to be missing).
Advantages:
- Logical
- Not based on artificial (imputed) data
Disadvantages:
- Decisions must be made about how to create the weight
- May conflict with other recommended weights (e.g., survey sampling weights)
- Some scholars are skeptical of weighted analysis
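A minimal weighting sketch, assuming that response depends only on a grouping variable (gender here) and that respondents within each group resemble the nonrespondents in that group. Each respondent is weighted by the inverse of its group's response rate. Names and data are hypothetical.

```python
from collections import Counter


def response_weights(groups, responded):
    """Inverse response-rate weights: each respondent gets
    (cases in its group) / (respondents in its group); nonrespondents get 0."""
    n_cases = Counter(groups)
    n_resp = Counter(g for g, r in zip(groups, responded) if r)
    return [n_cases[g] / n_resp[g] if r else 0.0 for g, r in zip(groups, responded)]


groups = ["man"] * 4 + ["woman"] * 4
responded = [True] * 4 + [True, True, False, False]
income = [60, 60, 60, 60, 40, 40, None, None]

w = response_weights(groups, responded)
obs = [(wi, yi) for wi, yi, r in zip(w, income, responded) if r]
unweighted = sum(yi for _, yi in obs) / len(obs)
weighted = sum(wi * yi for wi, yi in obs) / sum(wi for wi, _ in obs)
print(round(unweighted, 2), weighted)
```

Unweighted, the respondents' mean income is about 53.3; giving the responding women a weight of 2 brings it to 50, the value you would get if the two missing women also earned 40.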

Modern techniques: Multiple imputation
- Substitute model-based predicted values, including random error (noise), for missing data.
- Repeat 5 to 10 times, creating 5 to 10 imputed datasets.
- Analyze each dataset (regression, SEM, etc.).
- Pool the results across these 5 to 10 datasets: average the coefficients, and combine the standard errors so that they reflect both within-dataset and between-dataset variability.
Advantages:
- Same as single imputation: logical, full sample size is preserved
- Because random error is included in the repeated imputations, the imputed values vary across datasets, and this variation is carried into the pooled standard errors; overfitting is thus prevented
Disadvantages:
- Requires more statistical work and knowledge; might require the use of different statistical programs
- Not necessarily an available option for all kinds of models (e.g., propensity score matching within a probit Heckman selection model)
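The pooling step is usually done with Rubin's rules: the pooled coefficient is the average of the m estimates, and the pooled variance is T = W + (1 + 1/m)B, where W is the average within-imputation variance and B the between-imputation variance of the estimates. A minimal Python sketch follows, with made-up estimates and a hypothetical helper name; it omits the degrees-of-freedom adjustment used for significance tests, which packages such as Stata's `mi estimate` or SAS PROC MIANALYZE handle for you.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules.
    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)                      # average coefficient
    within = statistics.fmean(se ** 2 for se in std_errors)  # average sampling variance
    between = statistics.variance(estimates)                 # spread across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical results for one coefficient from m = 5 imputed datasets
est, se = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.1] * 5)
print(round(est, 3), round(se, 3))
```

Each dataset reports a standard error of 0.1, but the pooled standard error is 0.2 because the estimates disagree across imputations; simply averaging the standard errors would understate the uncertainty.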

Modern techniques: Maximum likelihood estimation
Using advanced mathematics and computer estimation (maximum likelihood estimation, Expectation-Maximization algorithms) to estimate a model's coefficients and standard errors with missing data.
- Missing data are not imputed. The best-fitting parameter estimates are selected through iterations that maximize the probability of observing the data that were actually collected.
- With full information maximum likelihood (FIML), parameters are estimated directly from all available data, including incomplete cases.
- With other ML methods, missing data are mathematically removed from the estimation, under the MAR assumption.
Advantages:
- Powerful and mathematically beautiful
- Not based on artificial (imputed) data
Disadvantages:
- Requires more statistical work and knowledge; might require the use of different statistical programs
- Not necessarily an available option for all kinds of models; more common in recent SEM programs
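Full ML machinery is best left to SEM software, but the core idea can be shown in the simplest case. The sketch below assumes a bivariate normal model in which x is fully observed and y is missing at random given x. The likelihood then factors, and the ML estimate of the mean of y combines the complete-case regression of y on x with the mean of x from all cases, without imputing anything. This is the classic factored-likelihood result, not a general FIML implementation; the function name and data are hypothetical.

```python
def ml_mean_y(x, y):
    """ML estimate of the mean of y for a bivariate normal model in which x is
    fully observed and y is missing at random given x.

    The likelihood factors into (i) the distribution of x, estimated from all
    cases, and (ii) the regression of y on x, estimated from complete cases.
    No values are imputed."""
    cc = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx_cc = sum(xi for xi, _ in cc) / len(cc)
    my_cc = sum(yi for _, yi in cc) / len(cc)
    slope = (sum((xi - mx_cc) * (yi - my_cc) for xi, yi in cc)
             / sum((xi - mx_cc) ** 2 for xi, _ in cc))
    mx_all = sum(x) / len(x)  # incomplete cases contribute here
    return my_cc + slope * (mx_all - mx_cc)


x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, None, None, None]  # y missing for the cases with high x (MAR)
print(ml_mean_y(x, y))           # 7.0
```

The complete-case mean of y is 4.0, but the ML estimate is 7.0 because the incomplete cases, with their higher x values, still contribute through the distribution of x.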

Other issues: General approaches vs. model-specific approaches
- Today's presentation focused on general approaches to missing data for basic multivariate models: OLS regression, logistic regression, ANOVA, etc.
- There is also a literature on model-specific techniques to deal with missing data:
  - Multilevel models
  - Structural equation models
  - Scales / factor analysis / principal component analysis
  - Panel models / discrete time series / event history
- If using these models, it is recommended to read the specific literature in addition to the general missing data literature. Some results and conclusions are different.

Other issues: Software
General statistical programs:
- SPSS: limited. Expectation-Maximization (EM) imputation is the best function. Controversy: it is more like a sophisticated single imputation than a maximum likelihood estimation.
  - AMOS: can estimate models with missing data by full information maximum likelihood (extra cost)
  - MI (Multiple Imputation): new SPSS upgrade (extra cost)
- Stata: multiple imputation: mi, ice, micombine
- SAS: multiple imputation: PROC MI, PROC MIANALYZE
- R: Amelia II (multiple imputation)
- Mplus: many maximum likelihood estimation options are available
- HLM: some multiple imputation and maximum likelihood options
Missing data program:
- NORM: multiple imputation

Conclusion
- Many techniques are available to deal with missing data.
- Multiple imputation and maximum likelihood estimation are the most powerful techniques, but also the most complicated.
- Traditional techniques are sometimes adequate, but not always (MI and ML are popular for a reason).
- Some techniques should be avoided: mean imputation and pairwise deletion.
- It is typically a good idea to analyze your missing data as a DV to better understand their patterns.
- If you plan on using more sophisticated missing data techniques, take time to understand the options and limitations of your statistical software.

References
General literature:
- Allison, P. (2002). Missing data. Sage (one of Sage's green books on methodology).
- Johnson, D. & R. Young (2011). Toward best practices in analyzing datasets with missing data: Comparisons and recommendations. Journal of Marriage and Family 73:
- Acock, A. (2005). Working with missing values. Journal of Marriage and Family 67:
- Raghunathan, T. (2004). What to do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health 25:
- Graham, J. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology 60:
Model-specific literature:
- Roth et al. (1999). Missing data in multiple item scales: Monte Carlo analysis of missing data techniques. Organizational Research Methods 2:
- Honaker, J. & G. King (2010). What to do about missing values in time series cross-section data. American Journal of Political Science 54:


More information

DETAILED CONTENTS. About the Editor About the Contributors PART I. GUIDE 1

DETAILED CONTENTS. About the Editor About the Contributors PART I. GUIDE 1 DETAILED CONTENTS Preface About the Editor About the Contributors xiii xv xvii PART I. GUIDE 1 1. Fundamentals of Hierarchical Linear and Multilevel Modeling 3 Introduction 3 Why Use Linear Mixed/Hierarchical

More information

Amelia multiple imputation in R

Amelia multiple imputation in R Amelia multiple imputation in R January 2018 Boriana Pratt, Princeton University 1 Missing Data Missing data can be defined by the mechanism that leads to missingness. Three main types of missing data

More information

Single missing data imputation in PLS-based structural equation modeling

Single missing data imputation in PLS-based structural equation modeling Single imputation in PLS-based structural equation modeling Ned Kock Full reference: Kock (2018). Single imputation in PLS-based structural equation modeling. Journal of Modern Applied Statistical Methods,

More information

MISSING DATA AND MULTIPLE IMPUTATION

MISSING DATA AND MULTIPLE IMPUTATION Paper 21-2010 An Introduction to Multiple Imputation of Complex Sample Data using SAS v9.2 Patricia A. Berglund, Institute For Social Research-University of Michigan, Ann Arbor, Michigan ABSTRACT This

More information

A Monotonic Sequence and Subsequence Approach in Missing Data Statistical Analysis

A Monotonic Sequence and Subsequence Approach in Missing Data Statistical Analysis Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 1 (2016), pp. 1131-1140 Research India Publications http://www.ripublication.com A Monotonic Sequence and Subsequence Approach

More information

SC708: Hierarchical Linear Modeling Instructor: Natasha Sarkisian. Missing data

SC708: Hierarchical Linear Modeling Instructor: Natasha Sarkisian. Missing data SC708: Hierarchical Linear Modeling Instructor: Natasha Sarkisian Missing data In most datasets, we will encounter the problem of item non-response -- for various reasons respondents often leave particular

More information

Bootstrap and multiple imputation under missing data in AR(1) models

Bootstrap and multiple imputation under missing data in AR(1) models EUROPEAN ACADEMIC RESEARCH Vol. VI, Issue 7/ October 2018 ISSN 2286-4822 www.euacademic.org Impact Factor: 3.4546 (UIF) DRJI Value: 5.9 (B+) Bootstrap and multiple imputation under missing ELJONA MILO

More information

Estimation of Item Response Models

Estimation of Item Response Models Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:

More information

R software and examples

R software and examples Handling Missing Data in R with MICE Handling Missing Data in R with MICE Why this course? Handling Missing Data in R with MICE Stef van Buuren, Methodology and Statistics, FSBS, Utrecht University Netherlands

More information

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS Examples: Missing Data Modeling And Bayesian Analysis CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS Mplus provides estimation of models with missing data using both frequentist and Bayesian

More information

Missing Data Analysis with the Mahalanobis Distance

Missing Data Analysis with the Mahalanobis Distance Missing Data Analysis with the Mahalanobis Distance by Elaine M. Berkery, B.Sc. Department of Mathematics and Statistics, University of Limerick A thesis submitted for the award of M.Sc. Supervisor: Dr.

More information

Using Amos For Structural Equation Modeling In Market Research

Using Amos For Structural Equation Modeling In Market Research Using Amos For Structural Equation Modeling In Market Research We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it on your

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent

More information

Single missing data imputation in PLS-SEM. Ned Kock

Single missing data imputation in PLS-SEM. Ned Kock Single imputation in PLS-SEM Ned Kock December 2014 ScriptWarp Systems Laredo, Texas USA 1 Single imputation in PLS-SEM Ned Kock Full reference: Kock, N. (2014). Single imputation in PLS-SEM. Laredo, TX:

More information

Approaches to Missing Data

Approaches to Missing Data Approaches to Missing Data A Presentation by Russell Barbour, Ph.D. Center for Interdisciplinary Research on AIDS (CIRA) and Eugenia Buta, Ph.D. CIRA and The Yale Center of Analytical Studies (YCAS) April

More information

A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY

A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY A STOCHASTIC METHOD FOR ESTIMATING IMPUTATION ACCURACY Norman Solomon School of Computing and Technology University of Sunderland A thesis submitted in partial fulfilment of the requirements of the University

More information

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL SPSS QM II SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details around a certain function/analysis method not covered

More information

Handling Data with Three Types of Missing Values:

Handling Data with Three Types of Missing Values: Handling Data with Three Types of Missing Values: A Simulation Study Jennifer Boyko Advisor: Ofer Harel Department of Statistics University of Connecticut Storrs, CT May 21, 2013 Jennifer Boyko Handling

More information

Analysis of Panel Data. Third Edition. Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS

Analysis of Panel Data. Third Edition. Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS Analysis of Panel Data Third Edition Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS Contents Preface to the ThirdEdition Preface to the Second Edition Preface to the First Edition

More information

Panel Data 4: Fixed Effects vs Random Effects Models

Panel Data 4: Fixed Effects vs Random Effects Models Panel Data 4: Fixed Effects vs Random Effects Models Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised April 4, 2017 These notes borrow very heavily, sometimes verbatim,

More information

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International Part A: Comparison with FIML in the case of normal data. Stephen du Toit Multivariate data

More information

Example Using Missing Data 1

Example Using Missing Data 1 Ronald H. Heck and Lynn N. Tabata 1 Example Using Missing Data 1 Creating the Missing Data Variable (Miss) Here is a data set (achieve subset MANOVAmiss.sav) with the actual missing data on the outcomes.

More information

Latent Curve Models. A Structural Equation Perspective WILEY- INTERSCIENΠKENNETH A. BOLLEN

Latent Curve Models. A Structural Equation Perspective WILEY- INTERSCIENΠKENNETH A. BOLLEN Latent Curve Models A Structural Equation Perspective KENNETH A. BOLLEN University of North Carolina Department of Sociology Chapel Hill, North Carolina PATRICK J. CURRAN University of North Carolina Department

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

Florida State University Libraries

Florida State University Libraries Florida State University Libraries Electronic Theses, Treatises and Dissertations The Graduate School 2013 Use of Item Parceling in Structural Equation Modeling with Missing Data Fatih Orcan Follow this

More information

IBM SPSS Missing Values 21

IBM SPSS Missing Values 21 IBM SPSS Missing Values 21 Note: Before using this information and the product it supports, read the general information under Notices on p. 87. This edition applies to IBM SPSS Statistics 21 and to all

More information

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017 Machine Learning in the Wild Dealing with Messy Data Rajmonda S. Caceres SDS 293 Smith College October 30, 2017 Analytical Chain: From Data to Actions Data Collection Data Cleaning/ Preparation Analysis

More information

Hierarchical Mixture Models for Nested Data Structures

Hierarchical Mixture Models for Nested Data Structures Hierarchical Mixture Models for Nested Data Structures Jeroen K. Vermunt 1 and Jay Magidson 2 1 Department of Methodology and Statistics, Tilburg University, PO Box 90153, 5000 LE Tilburg, Netherlands

More information

Missing Not at Random Models for Latent Growth Curve Analyses

Missing Not at Random Models for Latent Growth Curve Analyses Psychological Methods 20, Vol. 6, No., 6 20 American Psychological Association 082-989X//$2.00 DOI: 0.037/a0022640 Missing Not at Random Models for Latent Growth Curve Analyses Craig K. Enders Arizona

More information

Types of missingness and common strategies

Types of missingness and common strategies 9 th UK Stata Users Meeting 20 May 2003 Multiple imputation for missing data in life course studies Bianca De Stavola and Valerie McCormack (London School of Hygiene and Tropical Medicine) Motivating example

More information

PASW Missing Values 18

PASW Missing Values 18 i PASW Missing Values 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412

More information

The Use of Sample Weights in Hot Deck Imputation

The Use of Sample Weights in Hot Deck Imputation Journal of Official Statistics, Vol. 25, No. 1, 2009, pp. 21 36 The Use of Sample Weights in Hot Deck Imputation Rebecca R. Andridge 1 and Roderick J. Little 1 A common strategy for handling item nonresponse

More information

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) 200 Level Courses STAT 250: Introductory Statistics I. 3 credits. Elementary introduction to statistics. Topics include descriptive statistics, probability, and estimation

More information

Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data

Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data Donsig Jang, Amang Sukasih, Xiaojing Lin Mathematica Policy Research, Inc. Thomas V. Williams TRICARE Management

More information

Analysis of Imputation Methods for Missing Data. in AR(1) Longitudinal Dataset

Analysis of Imputation Methods for Missing Data. in AR(1) Longitudinal Dataset Int. Journal of Math. Analysis, Vol. 5, 2011, no. 45, 2217-2227 Analysis of Imputation Methods for Missing Data in AR(1) Longitudinal Dataset Michikazu Nakai Innovation Center for Medical Redox Navigation,

More information

Description Remarks and examples References Also see

Description Remarks and examples References Also see Title stata.com intro 4 Substantive concepts Description Remarks and examples References Also see Description The structural equation modeling way of describing models is deceptively simple. It is deceptive

More information

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors (Section 5.4) What? Consequences of homoskedasticity Implication for computing standard errors What do these two terms

More information

" ( )* + #$ 1$( M$!.%3), e( * F ] M, #$ 3 F. Downloaded from journals.tums.ac.ir at 8:39 IRST on Sunday February 17th 2019 SEM

 ( )* + #$ 1$( M$!.%3), e( * F ] M, #$ 3 F. Downloaded from journals.tums.ac.ir at 8:39 IRST on Sunday February 17th 2019 SEM .83-89 #$%& :3! 3 396!!'!"&$%!# )!" " )* + 3 %& ' #$!" ' ' ' # )*+,-& $% &!"#!"# $% & ' ' ' # )*+,-& $% &!"#!"# $% & -& 3 ' ' ' # )*+,-& $% &!"#!"# $54-56 ' 3 34!! $ -& $ / * -. * % &' =3 93443948 ::;*

More information

Statistical Matching using Fractional Imputation

Statistical Matching using Fractional Imputation Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany Problem Set #7: Handling Egocentric Network Data Adapted from original by Peter V. Marsden, Harvard University Egocentric network data sometimes known as personal

More information

Development of Synthetic Microdata for Educational Use in Japan

Development of Synthetic Microdata for Educational Use in Japan 2013 Joint IASE / IAOS Satellite Conference, Macau Tower, Macau, China, 22nd-24th August, 2013 Development of Synthetic Microdata for Educational Use in Japan Naoki Makita Shinsuke Ito* National Statistics

More information