Epidemiological analysis. PhD course in epidemiology. Lau Caspar Thygesen, Associate professor, PhD. 25th February 2014


2 Age standardization
Incidence and prevalence are strongly age-dependent
Risks rise (e.g. chronic diseases) or decline (e.g. measles) with age
Comparisons between populations and over time may therefore be very misleading
A single age-independent index representing a set of age-specific rates may be more appropriate

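3 Mortality in Denmark and Greenland, men, 1975
Direct standardization
[Table of age-specific mortality: values not preserved in the transcript]
Please interpret this table.
IR(DK, standardized to the Greenlandic age distribution) = 0.016* … *66.5 = 3.8 [intermediate terms lost in the transcript]
Indirect standardization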
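To make the direct standardization on this slide concrete, here is a minimal Python sketch. The rates and population counts are invented for illustration (they are not the Denmark/Greenland figures, which were lost from the transcript), and the function name `directly_standardized_rate` is an assumption, not something from the lecture.

```python
def directly_standardized_rate(age_specific_rates, standard_population):
    """Weighted average of age-specific rates.

    The weights are the shares of each age group in the chosen
    standard population.
    """
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("one rate per age group is required")
    total = sum(standard_population)
    weighted = sum(r * n for r, n in zip(age_specific_rates, standard_population))
    return weighted / total


# Hypothetical deaths per 1,000 person-years in three age groups.
rates_a = [1.0, 5.0, 40.0]   # population A: lower rate in every age group
rates_b = [2.0, 8.0, 50.0]   # population B: higher rate in every age group
pop_a = [200, 300, 500]      # A is an old population
pop_b = [600, 300, 100]      # B is a young population

# A crude rate is the rate standardized to the population's own age structure.
crude_a = directly_standardized_rate(rates_a, pop_a)      # 21.7
crude_b = directly_standardized_rate(rates_b, pop_b)      # 8.6
# Rates of A applied to the age distribution of B.
a_standardized_to_b = directly_standardized_rate(rates_a, pop_b)  # 6.1

print(f"crude A = {crude_a:.1f}, crude B = {crude_b:.1f}")
print(f"A standardized to B's age distribution = {a_standardized_to_b:.1f}")
```

In this invented example the crude comparison suggests that A has the higher mortality, while the standardized rate (6.1 against 8.6) shows the opposite once the age distributions are made equal.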

4 Example: Trend study of lung cancer incidence among women, Denmark
[Figure: Lung Cancer, Denmark, Women; plotted series: ratecrude, segi, scand]
Example 2: Incidence of multiple sclerosis, Denmark (European Standard Population)

5 Example: indirect standardization
19,185 subjects (3,817 women) who attended outpatient clinics for alcohol abusers in Copenhagen
Compare their incidence of heart disease with the incidence rate in the greater Copenhagen area
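The comparison described here is usually summarized as a ratio of observed to expected cases. The Python sketch below shows that arithmetic under stated assumptions: the person-years, reference rates and observed count are invented, not taken from the Copenhagen study, and `standardized_incidence_ratio` is a hypothetical helper name.

```python
def standardized_incidence_ratio(observed_cases, person_years, reference_rates):
    """Indirect standardization.

    Expected cases are obtained by applying the reference population's
    age-specific rates to the study cohort's person-years in each age group.
    Returns (ratio, expected_cases).
    """
    if len(person_years) != len(reference_rates):
        raise ValueError("one reference rate per age group is required")
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed_cases / expected, expected


# Hypothetical cohort follow-up by age group (person-years).
cohort_person_years = [10_000, 20_000, 5_000]
# Hypothetical reference incidence rates per person-year in the same age groups.
reference_rates = [0.001, 0.004, 0.010]
# Hypothetical number of cases observed in the cohort.
observed = 210

sir, expected = standardized_incidence_ratio(
    observed, cohort_person_years, reference_rates
)
print(f"expected = {expected:.0f}, observed/expected = {sir:.2f}")  # 140 and 1.50
```

A ratio above 1 means more cases were observed in the cohort than the reference rates would predict for a population with the same age structure.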

6 Problems
Direct standardisation can produce unreliable estimates when the calculations are based on small numbers
Indirectly standardised measures from different populations cannot be compared directly with each other, only with the standard

Compared to regression methods
Regression-based methods are available but are rarely applied in practice
When individual data are available (presence/absence of disease, age and sex), a logistic regression can be used to estimate the standardized rate
The main advantage is that it allows adjustment for continuous variables in addition to categorical variables

Missing data
What does missing mean?
The pattern of missingness (nomenclature)
How and why is it missing?

Missing values
Common in research
  Nonresponse
  Loss to follow-up
  Lack of overlap between linked data sets (not so common)
Methods for handling

7 What is item nonresponse?
Unit nonresponse vs. item nonresponse
[Two example data tables with columns ID, Q1, Q2, ... and missing entries marked "?"; cell values not preserved in the transcript]

Unit nonresponse: examples
  Person who is not at home
  Person who does not pick up the phone
  Person who hangs up on you
  Rat that dies before the study
  The country you could not get data on
  etc.

Item nonresponse
  "I don't know"
  Refusals to respond
  Questions left blank
  Failed measurement
  etc.

The best way to deal with missing data is not to have any

8 Minimizing unit nonresponse
  Call back if not home
  Refusal conversion
  Don't mess up
  Clear and understandable questionnaire
  Polite request
  Incentives

Minimizing item nonresponse
  Well-written questions
  Minimize misunderstandings (cross-cultural example)
  Standardized vs. non-standardized
  Minimize skip patterns

What kind of missing data should be modeled?
  Model an item if it is missing from your dataset but you suspect that it has a true value
  "I don't know" might simply mean "I don't know": don't model it as if there were a true value
  Dead people (attrition)

The pattern of missingness (nomenclature)
  Ignorable
    MCAR - Missing Completely at Random
    MAR - Missing at Random
  Non-ignorable
    NMAR - Not Missing at Random

9 Missing completely at random
MCAR: if the data are missing completely at random, the missing values cannot be predicted any better from the other information in the data
Cause of missingness is a completely random process (like a coin flip)
Cause is uncorrelated with the variables of interest
Example: parents move
No bias if the cause is omitted
In the unlikely event that the data are missing completely at random, inferences based on complete cases are unbiased but inefficient, because we have lost some cases

Missing at random
Missingness may be related to measured variables
But there is no residual relationship with unmeasured variables
No bias if you control for the measured variables
For example, if highly educated persons are more likely to participate in a survey, then the process is missing at random as long as we know the educational level of all persons
If data are missing at random, inferences based on complete cases are inefficient and will be biased unless the analysis controls for the measured variables that predict missingness

Missing not at random
Non-ignorable / NMAR: the probability that a cell is missing depends on the unobserved value itself
For example, responses to income questions, where high-income people are more likely to refuse to answer survey questions about income, and other variables in the data set cannot predict which respondents have high income
If your missing data are non-ignorable, inferences based on complete cases will be biased and inefficient

Classical missing data treatments
Whatever you do, you are doing something
Case deletion
  Listwise (complete case analysis)
  Pairwise (available case analysis)
Indicator variable (dummy variable)
Single imputation
  (Unconditional) mean imputation
  Conditional mean imputation (expected value)
Weighting
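The three mechanisms can be illustrated with a small simulation. This is a sketch under invented assumptions: the population (education fully observed, income depending on education), the response probabilities, and all function names are hypothetical and chosen only to mirror the education and income examples on the slide.

```python
import random
from statistics import mean

rng = random.Random(2014)  # fixed seed so the sketch is reproducible

# Hypothetical population: education (0 = low, 1 = high) is always observed;
# income depends on education plus noise. The true mean income is about 12.5.
n = 20000
education = [rng.randint(0, 1) for _ in range(n)]
income = [10 + 5 * e + rng.gauss(0, 2) for e in education]


def observed_under(mechanism):
    """Return the (education, income) pairs whose income stays observed."""
    kept = []
    for e, y in zip(education, income):
        if mechanism == "MCAR":
            p_observed = 0.7                       # coin flip, unrelated to anything
        elif mechanism == "MAR":
            p_observed = 0.9 if e == 1 else 0.3    # depends on observed education only
        else:  # "NMAR"
            p_observed = 0.3 if y > 12.5 else 0.9  # depends on income itself
        if rng.random() < p_observed:
            kept.append((e, y))
    return kept


def complete_case_mean(pairs):
    """Mean income among cases with observed income."""
    return mean(y for _, y in pairs)


def education_adjusted_mean(pairs):
    """Stratum means weighted by the education distribution of all persons."""
    total = 0.0
    for level in (0, 1):
        stratum = [y for e, y in pairs if e == level]
        total += (education.count(level) / n) * mean(stratum)
    return total


true_mean = mean(income)
cc_mcar = complete_case_mean(observed_under("MCAR"))
mar_pairs = observed_under("MAR")
cc_mar = complete_case_mean(mar_pairs)
adj_mar = education_adjusted_mean(mar_pairs)
cc_nmar = complete_case_mean(observed_under("NMAR"))

print(f"true mean              {true_mean:.2f}")
print(f"MCAR complete cases    {cc_mcar:.2f}")
print(f"MAR  complete cases    {cc_mar:.2f}")
print(f"MAR  education-adjusted {adj_mar:.2f}")
print(f"NMAR complete cases    {cc_nmar:.2f}")
```

In this setup the MCAR complete-case mean stays close to the truth, the MAR complete-case mean is too high until it is adjusted for education, and the NMAR mean is too low with no observed variable available to repair it.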

10 Listwise deletion and multi-item
Excludes the whole case
Default in most software
Works if the mechanism is MCAR and if pattern and sample size allow (need to have enough complete cases)
Can be biased

Pairwise deletion
An option for using all available information (correlation/covariance matrices)
Different calculations may be based on different populations
Very unpredictable bias

Indicator method
For each variable with missing values, create a missing-value indicator to accompany the variable in all analyses
Assumes MCAR
Even if the stratum with missing values is just a random sample of all subjects, that stratum will yield a confounded estimate of the exposure effect

Mean imputation
Technique
  Calculate the mean over cases that have values for Y
  Impute this mean where Y is missing
  Ditto for X1, X2, etc.
Problems
  Ignores relationships among X and Y
  Underestimates covariances

11 (Unconditional) mean imputation
Standard errors too low
CI difficult to calculate
[Scatterplots from Joe Schafer's website]

Conditional mean imputation
Technique and implicit models
  If Y is missing, impute the mean of cases with similar values for X1, X2:
    $Y = b_0 + X_1 b_1 + X_2 b_2$
  Likewise, if X2 is missing, impute the mean of cases with similar values for X1, Y:
    $X_2 = g_0 + X_1 g_1 + Y g_2$
  If both Y and X2 are missing, impute the means of cases with similar values for X1:
    $Y = d_0 + X_1 d_1$
    $X_2 = f_0 + X_1 f_1$
Problem
  Ignores random components (no error term e) → underestimates variances and SEs

Imputation of expected value
Good for creating expected values
Bad for multivariate analysis
  Decreases standard errors
  Creates overconfident outcomes
  Increases probability of Type I error
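Why mean imputation makes standard errors too low can be seen in a few lines of Python. The data here are simulated and entirely hypothetical; the sketch only shows that filling in the observed mean leaves the mean unchanged while shrinking the variance.

```python
import random
from statistics import mean, variance

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical variable Y with roughly 40% of values missing completely
# at random; None marks a missing value.
y_full = [rng.gauss(50, 10) for _ in range(200)]
y = [value if rng.random() > 0.4 else None for value in y_full]

# Unconditional mean imputation: replace every missing value by the
# mean of the observed values.
observed = [value for value in y if value is not None]
y_bar = mean(observed)
imputed = [value if value is not None else y_bar for value in y]

var_observed = variance(observed)   # sample variance of the observed values
var_imputed = variance(imputed)     # sample variance after imputation

print(f"observed n = {len(observed)}, imputed n = {len(imputed)}")
print(f"mean: observed {y_bar:.2f}, after imputation {mean(imputed):.2f}")
print(f"variance: observed {var_observed:.1f}, after imputation {var_imputed:.1f}")
```

The imputed values add nothing to the sum of squared deviations but increase the sample size, so the variance (and any standard error computed from it) is understated.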

12 Problem with single imputation
Underestimates SEs!
Treats imputed values like observed values when they are actually less certain
Ignores imputation variation

Sampling variation
If you take a different sample you get different parameter estimates
Standard errors reflect this
One way to estimate sampling variation: measure variation across multiple samples (called bootstrapping)

Imputation variation
If you impute different values you get different parameter estimates
Standard errors should reflect this, too
One way to estimate imputation variation: measure variation across multiple imputed data sets (called multiple imputation)

Multiple imputation: example
Models both expected value and uncertainty
Using the missing-data model you specify, it simulates and imputes the missing values multiple times, creating M complete datasets (M=5 is usually OK, but it is a good idea to simulate more)
Analyze each dataset independently
Combine the results to get unbiased estimates
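The combining step is commonly done with Rubin's rules, which add the between-imputation variance to the average within-imputation variance. The slides do not spell the formulas out, so the Python sketch below is an illustration with invented estimates; `pool_rubin` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M estimates and their squared standard errors (Rubin's rules).

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, one variance per estimate")
    pooled = mean(estimates)                  # average of the M estimates
    within = mean(variances)                  # sampling variation
    between = variance(estimates)             # imputation variation (divisor M - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical regression coefficient from M = 5 imputed datasets, each
# with standard error 0.2 (variance 0.04).
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04] * 5

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled_estimate:.2f}, pooled SE = {pooled_se:.3f}")
```

Here the pooled standard error (about 0.265) is larger than the 0.2 reported within each imputed dataset, which is exactly the imputation variation that single imputation ignores.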

13 Multiple imputation: simple procedure
1. Impute using PROC MI
2. Do analysis: PROC REG, LOGISTIC, etc., using by _imputation_; in the procedure
3. Combine results using PROC MIANALYZE

PROC MI
[PROC MI sample output: not preserved in the transcript]
Typical syntax:
    proc mi data=bmx out=impdat seed=33155;
      var bmxbmi bmxht bmxwt bmxarmc bmxarml;
    run;
data=  1 copy of the data with missing values
out=   5 copies of the data with imputed values (will differ across copies)
seed=  random seed; keep it the same to be able to reproduce your results
var    variables with missing values that you need imputed, variables in the model, and those that may be helpful for imputation

PROC MI options
nimpute=5   number of imputations (default=5); nimpute=0 gives the missing-data patterns only
minimum=, maximum=   set min and max for imputed values; sometimes doesn't converge as well
round=   round-off option

14 Output dataset

Regression
Fit your model as if the data had no missing values, using by _imputation_;
    proc reg data=impdat outest=parmcov covout;
      model bmxbmi=bmxht bmxwt bmxarmc bmxarml;
      by _imputation_;
    run;
You'll get nimpute (usually 5) sets of output
Estimates, covariances and errors will be combined in MIANALYZE
Need to generate a data set of parameter estimates and covariances (varies by procedure)

Parameter estimates and covariance matrix
    proc logistic data=impdat descending;
      model bmxbmi=bmxht bmxwt bmxarmc bmxarml /covb;
      by _imputation_;
      ods output ParameterEstimates=parmsdat CovB=covbdat;
    run;

    proc mixed data=impdat;
      model bmxbmi=bmxht bmxwt bmxarmc bmxarml /solution covb;
      by _imputation_;
      ods output SolutionF=parmsdat CovB=covbdat;
    run;

    proc genmod data=impdat;
      model bmxbmi=bmxht bmxwt bmxarmc bmxarml /covb;
      by _imputation_;
      ods output ParameterEstimates=parmsdat CovB=covbdat;
    run;

15 PROC MIANALYZE
Syntax depends on which procedure you used in the previous step:
    proc mianalyze data=parmcov;
(or)
    proc mianalyze parms=parmsdat covb=covbdat;
(or)
    proc mianalyze parms=parmsdat xpxi=xpxidat;
(then type this:)
      modeleffects intercept bmxht bmxwt bmxarmc bmxarml;
    run;
Note that the var statement is now modeleffects
Note that the dependent variable is omitted
[PROC MIANALYZE output: not preserved in the transcript]

STATA
    * preparing the dataset for multiple imputation
    mi query
    mi set mlong
    mi describe, detail
    mi register imputed total
    set seed
    mi impute mvn total = i.smoking i.isced4 i.samliv3 i.s57a_ i.alder4 i.gender, add(20) force
    mi describe, detail
    * rounding the imputed binary values to the nearest integer
    *replace bingedrinking = 0 if bingedrinking <0.5
    *replace bingedrinking = 1 if bingedrinking >0.5
    *replace change_new = round(change_new)
    * examination of imputations: comparing main descriptive statistics from some imputations to those from the observed data
    mi xeq : summarize total
    mi estimate: xtmixed total i.gender group##month || username:, mle
    mi estimate: mean total, over(sex group month)

Weighted regression
Suppose that a national survey sampled 2000 subjects: 1000 men and 1000 women
Responses were obtained from 500 men and 750 women
If there are large differences between men and women, a simple average of the 1250 responses will be a distorted representation of the population mean
By down-weighting women and up-weighting men we could obtain a more accurate picture of the population
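The weighting idea in the survey example can be sketched numerically. The sample and response counts below come from the slide; the outcome means for men and women are invented for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    sampled: int            # number of persons drawn into the sample
    responded: int          # number of persons who responded
    respondent_mean: float  # hypothetical outcome mean among respondents

    @property
    def weight(self) -> float:
        """Nonresponse weight: each respondent stands in for this many sampled persons."""
        return self.sampled / self.responded


def unweighted_mean(strata):
    """Simple average over all respondents."""
    total = sum(s.responded * s.respondent_mean for s in strata)
    return total / sum(s.responded for s in strata)


def weighted_mean(strata):
    """Average over respondents after applying the nonresponse weights."""
    total = sum(s.responded * s.weight * s.respondent_mean for s in strata)
    return total / sum(s.responded * s.weight for s in strata)


# Counts from the slide; the outcome means (10 and 4) are assumptions.
strata = [
    Stratum("men", sampled=1000, responded=500, respondent_mean=10.0),
    Stratum("women", sampled=1000, responded=750, respondent_mean=4.0),
]

naive = unweighted_mean(strata)     # 6.4: women are over-represented
adjusted = weighted_mean(strata)    # 7.0: men and women count equally again

for s in strata:
    print(f"{s.name}: weight = {s.weight:.2f}")
print(f"unweighted mean = {naive:.1f}, weighted mean = {adjusted:.1f}")
```

Men receive weight 2 and women weight 4/3, so the weighted respondents again represent 1000 men and 1000 women. This recovers the population mean only if, within each sex, respondents resemble nonrespondents.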

16 Values not missing at random (NMAR)
The probability that values are missing depends on the missing values themselves
e.g., the probability that weight Y is missing
  is higher for the overweight (depends on Y)
  is higher for women (depends on X1)
  and sometimes X1 is missing, too
Methods are available, but not covered today!


More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Kosuke Imai Princeton University Joint work with Graeme Blair October 29, 2010 Blair and Imai (Princeton) List Experiments NJIT (Mathematics) 1 / 26 Motivation

More information

The Importance of Modeling the Sampling Design in Multiple. Imputation for Missing Data

The Importance of Modeling the Sampling Design in Multiple. Imputation for Missing Data The Importance of Modeling the Sampling Design in Multiple Imputation for Missing Data Jerome P. Reiter, Trivellore E. Raghunathan, and Satkartar K. Kinney Key Words: Complex Sampling Design, Multiple

More information

Statistical Matching using Fractional Imputation

Statistical Matching using Fractional Imputation Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:

More information

Using Monetary incentives in face-to-face surveys:

Using Monetary incentives in face-to-face surveys: Using Monetary incentives in face-to-face surveys: Are prepaid incentives more effective than promised incentives? Michael Blohm & Achim Koch Q2016 - European Conference on Quality in Official Statistics

More information

Individual Covariates

Individual Covariates WILD 502 Lab 2 Ŝ from Known-fate Data with Individual Covariates Today s lab presents material that will allow you to handle additional complexity in analysis of survival data. The lab deals with estimation

More information

Using SAS for Multiple Imputation and Analysis of Longitudinal Data

Using SAS for Multiple Imputation and Analysis of Longitudinal Data Paper 1738-2018 Using SAS for Multiple Imputation and Analysis of Longitudinal Data Patricia A. Berglund, Institute for Social Research-University of Michigan ABSTRACT Using SAS for Multiple Imputation

More information

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent

More information

Cross-validation and the Bootstrap

Cross-validation and the Bootstrap Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. 1/44 Cross-validation and the Bootstrap In the section we discuss two resampling

More information

REALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol

REALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol REALCOM-IMPUTE: multiple imputation using MLwin. Modified September 2014 by Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol This description is divided into two sections. In the

More information

Teaching students quantitative methods using resources from the British Birth Cohorts

Teaching students quantitative methods using resources from the British Birth Cohorts Centre for Longitudinal Studies, Institute of Education Teaching students quantitative methods using resources from the British Birth Cohorts Assessment of Cognitive Development through Childhood CognitiveExercises.doc:

More information

Week 11: Interpretation plus

Week 11: Interpretation plus Week 11: Interpretation plus Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline A bit of a patchwork

More information

Missing Data Part II: Multiple Imputation & Maximum Likelihood

Missing Data Part II: Multiple Imputation & Maximum Likelihood Missing Data Part II: Multiple Imputation & Maximum Likelihood Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 12, 2017 Warning: I teach about Multiple

More information

STATA 13 INTRODUCTION

STATA 13 INTRODUCTION STATA 13 INTRODUCTION Catherine McGowan & Elaine Williamson LONDON SCHOOL OF HYGIENE & TROPICAL MEDICINE DECEMBER 2013 0 CONTENTS INTRODUCTION... 1 Versions of STATA... 1 OPENING STATA... 1 THE STATA

More information

Handling missing data in cluster randomized trials: A demonstration of multiple imputation with PAN through SAS

Handling missing data in cluster randomized trials: A demonstration of multiple imputation with PAN through SAS Handling missing data in cluster randomized trials: A demonstration of multiple imputation with AN through SAS Jiangxiu Zhou a, Lauren E. Connell a, John W. Graham,a a Department of Biobehavioral Health,

More information

Cross-validation and the Bootstrap

Cross-validation and the Bootstrap Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. These methods refit a model of interest to samples formed from the training set,

More information

- 1 - Fig. A5.1 Missing value analysis dialog box

- 1 - Fig. A5.1 Missing value analysis dialog box WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation

More information

WORKSHOP: Using the Health Survey for England, 2014

WORKSHOP: Using the Health Survey for England, 2014 WORKSHOP: Using the Health Survey for England, 2014 There are three sections to this workshop, each with a separate worksheet. The worksheets are designed to be accessible to those who have no prior experience

More information

Answer keys for Assignment 16: Principles of data collection

Answer keys for Assignment 16: Principles of data collection Answer keys for Assignment 16: Principles of data collection (The correct answer is underlined in bold text) 1. Supportive supervision is essential for a good data collection process 2. Which one of the

More information

Modelling Personalized Screening: a Step Forward on Risk Assessment Methods

Modelling Personalized Screening: a Step Forward on Risk Assessment Methods Modelling Personalized Screening: a Step Forward on Risk Assessment Methods Validating Prediction Models Inmaculada Arostegui Universidad del País Vasco UPV/EHU Red de Investigación en Servicios de Salud

More information

Longitudinal Modeling With Randomly and Systematically Missing Data: A Simulation of Ad Hoc, Maximum Likelihood, and Multiple Imputation Techniques

Longitudinal Modeling With Randomly and Systematically Missing Data: A Simulation of Ad Hoc, Maximum Likelihood, and Multiple Imputation Techniques 10.1177/1094428103254673 ORGANIZATIONAL Newman / LONGITUDINAL RESEARCH MODELS METHODS WITH MISSING DATA ARTICLE Longitudinal Modeling With Randomly and Systematically Missing Data: A Simulation of Ad Hoc,

More information

SAS/STAT 14.2 User s Guide. The SURVEYIMPUTE Procedure

SAS/STAT 14.2 User s Guide. The SURVEYIMPUTE Procedure SAS/STAT 14.2 User s Guide The SURVEYIMPUTE Procedure This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute

More information

ST Lab 1 - The basics of SAS

ST Lab 1 - The basics of SAS ST 512 - Lab 1 - The basics of SAS What is SAS? SAS is a programming language based in C. For the most part SAS works in procedures called proc s. For instance, to do a correlation analysis there is proc

More information

ESTIMATING THE MISSING VALUES IN ANALYSIS OF VARIANCE TABLES BY A FLEXIBLE ADAPTIVE ARTIFICIAL NEURAL NETWORK AND FUZZY REGRESSION MODELS

ESTIMATING THE MISSING VALUES IN ANALYSIS OF VARIANCE TABLES BY A FLEXIBLE ADAPTIVE ARTIFICIAL NEURAL NETWORK AND FUZZY REGRESSION MODELS ESTIMATING THE MISSING VALUES IN ANALYSIS OF VARIANCE TABLES BY A FLEXIBLE ADAPTIVE ARTIFICIAL NEURAL NETWORK AND FUZZY REGRESSION MODELS Ali Azadeh - Zahra Saberi Hamidreza Behrouznia-Farzad Radmehr Peiman

More information

Analysis of Complex Survey Data with SAS

Analysis of Complex Survey Data with SAS ABSTRACT Analysis of Complex Survey Data with SAS Christine R. Wells, Ph.D., UCLA, Los Angeles, CA The differences between data collected via a complex sampling design and data collected via other methods

More information

IBM SPSS Categories 23

IBM SPSS Categories 23 IBM SPSS Categories 23 Note Before using this information and the product it supports, read the information in Notices on page 55. Product Information This edition applies to version 23, release 0, modification

More information

Approaches to Missing Data

Approaches to Missing Data Approaches to Missing Data A Presentation by Russell Barbour, Ph.D. Center for Interdisciplinary Research on AIDS (CIRA) and Eugenia Buta, Ph.D. CIRA and The Yale Center of Analytical Studies (YCAS) April

More information

Enterprise Miner Tutorial Notes 2 1

Enterprise Miner Tutorial Notes 2 1 Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender

More information

Missing Data Part II: Multiple Imputation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 24, 2015

Missing Data Part II: Multiple Imputation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Missing Data Part II: Multiple Imputation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Warning: I teach about Multiple Imputation with some trepidation.

More information

Gov 50: 8. Measurement: Survey Sampling

Gov 50: 8. Measurement: Survey Sampling Gov 50: 8. Measurement: Survey Sampling Matthew Blackwell Harvard University Fall 2018 1 / 30 1. Today s agenda 2. The role of randomization 3. The power of randomization 4. Missing data in R 2 / 30 1/

More information

Survival Analysis with PHREG: Using MI and MIANALYZE to Accommodate Missing Data

Survival Analysis with PHREG: Using MI and MIANALYZE to Accommodate Missing Data Survival Analysis with PHREG: Using MI and MIANALYZE to Accommodate Missing Data Christopher F. Ake, SD VA Healthcare System, San Diego, CA Arthur L. Carpenter, Data Explorations, Carlsbad, CA ABSTRACT

More information

Simple Model Selection Cross Validation Regularization Neural Networks

Simple Model Selection Cross Validation Regularization Neural Networks Neural Nets: Many possible refs e.g., Mitchell Chapter 4 Simple Model Selection Cross Validation Regularization Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February

More information

Repository of Healthcare Databases in Thailand and Japan: Potential Sources for Health Technology Assessment Research

Repository of Healthcare Databases in Thailand and Japan: Potential Sources for Health Technology Assessment Research Repository of Healthcare Databases in Thailand and Japan: Potential Sources for Health Technology Assessment Research Surasak Saokaew, PharmD Takashi Sugimoto, RN, PHN Isao Kamae, MD, DrPH Nathorn Chaiyakunapruk,

More information

Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation. Filip Lindhfors and Farhana Morko

Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation. Filip Lindhfors and Farhana Morko Bachelor thesis Department of Statistics Kandidatuppsats, Statistiska institutionen Nr 2014:5 Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation Filip

More information

The Use of Sample Weights in Hot Deck Imputation

The Use of Sample Weights in Hot Deck Imputation Journal of Official Statistics, Vol. 25, No. 1, 2009, pp. 21 36 The Use of Sample Weights in Hot Deck Imputation Rebecca R. Andridge 1 and Roderick J. Little 1 A common strategy for handling item nonresponse

More information