Multiple-imputation analysis using Stata s mi command

Size: px
Start display at page:

Download "Multiple-imputation analysis using Stata s mi command"

Transcription

1 Multiple-imputation analysis using Stata s mi command Yulia Marchenko Senior Statistician StataCorp LP 2009 UK Stata Users Group Meeting Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

2 Outline 1 Brief overview of multiple imputation 2 Stata 11 s implementation of MI Brief overview of mi Heart-attack data example Creating multiply-imputed data Managing multiply-imputed data Analyzing multiply-imputed data GUI MI control panel Relation to existing user-written commands 3 Summary 4 References Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

3 What is multiple imputation? Multiple imputation (MI) is a simulation-based approach for analyzing incomplete data. MI replaces missing values with multiple sets of simulated values to complete the data, applies standard analyses to each completed dataset, and adjusts the obtained parameter estimates for missing-data uncertainty. The objective of MI is not to predict missing values as close as possible to the true ones but to handle missing data in a way resulting in valid statistical inference (Rubin 1996). Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

4 Why use multiple imputation? It is more flexible than fully-parametric methods, e.g. maximum likelihood, purely Bayesian analysis. It can be more efficient than listwise deletion (complete-cases analysis) and can correct for potential bias. It accounts for missing-data uncertainty and, thus, does not underestimate the variance of estimates unlike single imputation methods. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

5 MI technique The MI technique consists of three steps: 1 Imputation. Replace missing values with M sets of plausible values according to an imputation model (e.g., Rubin 1987; Schafer 1997) to create M completed datasets. 2 Completed-data analysis. Perform primary analysis on each imputed (completed) dataset to obtain a set of completed-data estimates ˆq i and their respective VCEs Û i, i = 1,...,M. 3 Pooling. Consolidate results from the completed-data analyses {ˆq i,û i } M i=1 into one MI inference using Rubin s combination rules (e.g. Rubin 1987, 76). Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

6 Statistical validity of MI MI yields statistically valid inference if an imputation method used in step 1 is proper per Rubin (1987, ). Rubin recommends drawing imputations from a Bayesian posterior predictive distribution of missing data to ensure that imputations are proper. the primary, completed-data analysis used in step 2 is statistically valid in the absence of missing data; see Rubin (1987, ) for details. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

7 Notation and some terminology Original data data containing missing values. With a slight abuse of terminology, by an imputation we mean a copy of the original data in which missing values are imputed. M denotes the number of imputations. m (= 0,..., M) refers to the original or imputed data: m = 0 means original data and m > 0 means imputed data. m = 1 means the first imputation, m = 2 means the second imputation, etc. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

8 Main features Stata 11 s mi command provides full support for all three steps of the MI technique: mi impute performs imputation (step 1); mi estimate performs individual analyses, collects estimates of coefficients and their VCEs, applies Rubin s combination rules to the collected estimates, and reports final results (steps 2 and 3). In addition, mi offers full data management of multiply-imputed data: you can create or drop variables, observations as if you were working with one dataset mi will replicate the changes correctly across the imputed datasets. Other unique features of mi: the ability to store multiply-imputed data in different formats mi data styles; the ability to verify consistency of the data across multiple copies. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

9 mi data system The mi command tracks information about the data which allows full integration of statistical and data-management components. mi records: the format (style) in which MI data will be stored: wide, mlong, flong, or flongsep; the number of imputations, M; the types of registered variables: imputed, passive, regular; information about complete and incomplete observations; see [MI] technical for more detail. mi views system missing values (.) as missing values to be imputed soft missing. All other, extended, missing values are viewed as missing values not to be imputed hard missing values. The mi set and mi register commands are used to set up mi data. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

10 mi data styles To use mi you must declare the storage style. mi supports 4 styles (formats) for storing MI data: flongsep full long and separate imputed data are in separate files, one per imputation; flong full long original and imputed data are in one file, imputations are saved as extra observations; mlong marginal long original and imputed data are in one file, only observations containing imputed values are saved as extra observations. mlong is a memory-efficient version of flong; wide wide original and imputed data are in one file, imputations are saved as extra variables. Some tasks are easier in one style than another. You can switch from one style to another during your mi session by using mi convert. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

11 The role of registered variables in mi mi uses a variable s status to verify its consistency across imputations. You can register variables by using mi register. Registering variables is, in general, not required but highly recommended. mi distinguishes 3 types of variables: imputation (imputed) variables containing soft missings (system missing values), i.e. missing values to be filled in; passive (passive) variables which are functions of imputation and or other passive variables; regular (regular) variables which are the same across imputations; other variables are treated as unregistered. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

12 Always register imputation variables! Important: the status of each observation, complete or incomplete, is determined based on registered imputation variables. An observation in which at least one imputation variable contains a soft missing is marked as incomplete. If no variables are registered as imputed, all observations are treated as complete. You should always register imputation variables. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

13 Example Consider fictional data recording heart attacks. The objective is to examine a relationship between smoking and heart attacks adjusting for age, body mass index, gender, and educational status.. webuse mheart0 (Fictional heart attack data; bmi missing). describe attack smokes age bmi female hsgrad storage display value variable name type format label variable label attack byte %9.0g Outcome (heart attack) smokes byte %9.0g Current smoker age float %9.0g Age, in years bmi float %9.0g Body Mass Index, kg/m^2 female byte %9.0g Gender hsgrad byte %9.0g High school graduate Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

14 We examine data for missing values using misstable.. misstable summarize Obs<. Unique Variable Obs=. Obs>. Obs<. values Min Max bmi Variable bmi contains 22 missing values. We use multiple imputation to perform analysis of the heart-attack data. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

15 Following the MI technique, we need to impute missing values of bmi and then analyze the resulting multiply-imputed data. We will use mi impute and mi estimate to do this but first we need to declare our data to be mi data.. /* check if data are -mi set- */. mi query (data not mi set). /* declare MI data to be stored in the marginal long style */. mi set mlong. /* register bmi as imputation variable */. mi register imputed bmi (22 m=0 obs. now marked as incomplete). /* register other variables as regular */. mi register regular attack smokes age female hsgrad Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

16 We use the regression imputation method to fill in missing values of bmi. We arbitrarily create 5 imputations. We also set the random-number seed for reproducibility.. mi impute regress bmi attack smokes age female hsgrad, add(5) rseed(123) Univariate imputation Imputations = 5 Linear regression added = 5 Imputed: m=1 through m=5 updated = 0 Observations per m Variable complete incomplete imputed total bmi (complete + incomplete = total; imputed is the minimum across m of the number of filled in observations.) Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

17 We perform logistic analysis of the multiply-imputed data using mi estimate: logit.. mi estimate: logit attack smokes age bmi female hsgrad Multiple-imputation estimates Imputations = 5 Logistic regression Number of obs = 154 Average RVI = DF adjustment: Large sample DF: min = avg = max = Model F test: Equal FMI F( 5, ) = 3.39 Within VCE type: OIM Prob > F = attack Coef. Std. Err. t P> t [95% Conf. Interval] smokes age bmi female hsgrad _cons Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

18 Imputation methods Univariate imputation: linear regression for continuous variables mi impute regress predictive mean matching for continuous variables mi impute pmm logistic regression for binary variables mi impute logit ordinal logistic regression for ordinal variables mi impute ologit multinomial logistic regression for nominal variables mi impute mlogit Multivariate imputation: monotone method for multiple variables of different types mi impute monotone multivariate normal regression for multiple continuous variables mi impute mvn Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

19 Imputation methods assumption mi impute assumes that missing data are missing at random; that is, missing values do not carry any extra information about why they are missing than what is already available in the observed data. mi impute creates imputations by simulating from a (approximate) Bayesian posterior predictive distribution of the missing data, following Rubin s recommendation. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

20 mi impute s general syntax mi impute method model spec [, common options method options ] The two main common options are add() and replace. These options allow you to perform the following actions: 1 Create imputations or add new imputations to the existing ones: mi impute..., add(#)... 2 Replace existing imputations with new ones: mi impute..., replace... 3 Replace existing imputations and add new ones: mi impute..., add(#) replace... See [MI] mi impute for more details. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

21 Multivariate imputation: heart-attack data. webuse mheart5s0 (Fictional heart attack data; bmi and age missing). mi describe Style: mlong last mi update 19jun :50:18, 30 days ago Obs.: complete 126 incomplete 28 (M = 0 imputations) total 154 Vars.: imputed: 2; bmi(28) age(12) passive: 0 regular: 4; attack smokes female hsgrad system: 3; _mi_m _mi_id _mi_miss (there are no unregistered variables) Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

22 Data are already set in the mlong style. Two variables, bmi and age, contain missing values and are registered as imputed. Other variables are registered as regular. Data contain no imputations. System variable mi miss records the status of observations (1 means incomplete) based on imputation variables age and bmi. mi m records imputation numbers and mi id records observation identifiers; see [MI] technical for details. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

23 Multivariate imputation: monotone missing pattern. mi misstable nested 1. age(12) -> bmi(28). mi impute monotone (pmm) bmi (regress) age = attack smokes hsgrad female, add(5) Conditional models: age: regress age attack smokes hsgrad female bmi: pmm bmi age attack smokes hsgrad female Multivariate imputation Imputations = 5 Monotone method added = 5 Imputed: m=1 through m=5 updated = 0 age: linear regression bmi: predictive mean matching Observations per m Variable complete incomplete imputed total age bmi (complete + incomplete = total; imputed is the minimum across m of the number of filled in observations.) Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

24 Note that because the data are already mi set, we used mi misstable rather than misstable. We used mi misstable nested to check if variables are nested with respect to missing values. From the output of mi misstable, missing values of age and bmi form a monotone-missing pattern: age is missing only in observations where bmi is missing. bmi does not have any observations with nonmissing values for which age is missing. Therefore, we used mi impute monotone to impute bmi and age. We can also use mi impute mvn to impute bmi and age (as shown on the next slide) but using mi impute monotone is faster because it does not require iteration. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

25 Multivariate imputation: normal regression. mi impute mvn age bmi = attack smokes hsgrad female, replace Performing EM optimization: note: 12 observations omitted from EM estimation because of all imputation variables missing observed log likelihood = at iteration 7 Performing MCMC data augmentation... Multivariate imputation Imputations = 5 Multivariate normal regression added = 0 Imputed: m=1 through m=5 updated = 5 Prior: uniform Iterations = 500 burn-in = 100 between = 100 Observations per m Variable complete incomplete imputed total age bmi (complete + incomplete = total; imputed is the minimum across m of the number of filled in observations.) Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

26 mi impute mvn uses data augmentation, an iterative MCMC method, to impute missing values under a multivariate normal model. mi impute mvn uses estimates from the EM algorithm as starting values for the MCMC procedure. You can supply your own initial values, if needed, using option initmcmc(). The default prior is uniform under which posterior mode estimates and maximum-likelihood estimates are equivalent. You can change the default prior specification using option prior(). The first imputation is drawn after an initial default burn-in period of 100 iterations. You can use option burnin() to choose a different burn-in period. The subsequent imputations are drawn every 100 (the default) iterations apart. You can change the number of iterations between imputations using option burnbetween(). Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

27 Importing existing imputations to mi In the heart-attack example we created imputations using mi impute. What if you need to analyze multiply-imputed data created outside of Stata? 1 Read file(s) containing multiply-imputed data into Stata; see, for example, [D] infile. 2 Use mi import to set up the multiply-imputed data in mi. mi import supports various styles in which multiply-imputed data can be recorded. In particular, mi import nhanes1 imports MI data recorded in the format used by NHANES; see mi import ice imports MI data recorded in the format used by the user-written command ice performing imputation via chained equations. see [MI] mi import for details. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

28 Describing mi data To describe the mi data use mi query to get a short summary of the mi settings; mi describe to get a more detailed report about mi data. mi varying is useful to identify variables that vary over imputations. For example, you can use it to identify imputation and passive variables and then register them using mi register. This command also helps to detect potential problems. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

29 Manipulating mi data Manipulation of mi data can be done in one of two ways: repeating the same data-management command on each imputed dataset; using a data-management routine specialized for multiply-imputed data. For example, specialized routines are needed to append or merge multiply-imputed data. Stata offers both: Use mi xeq:command to perform command on each imputed dataset. Use, e.g. mi append, mi merge, mi reshape to append, merge, and reshape mi data; see [MI] intro (or type help mi) for a list of all mi-specific data-management commands. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

30 Verifying consistency of the mi data mi update mi update is a unique feature of mi. mi update verifies that: the number of observations is consistent in m 0; the number and instances of variables are consistent in m 0; complete/incomplete observations are correctly identified by the imputation variables; regular variables contain the same values in imputed data as in the original data; imputation variables contain the same nonmissing values in imputed data as in the original data; passive variables contain the same values in complete observations in imputed data as in the original data;... ; see [MI] mi update for more detail. mi update is executed automatically each time an mi command is run. It can also be run manually. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

31 Managing observations, variables a quick example Example mi data contain 1 imputation and are saved in the flongsep style. 1. Replace a value:. mi xeq: replace age = 20 in 30 m=0 data: -> replace age = 20 in 30 (1 real change made) m=1 data: -> replace age = 20 in 30 (1 real change made) 2. Drop a variable:. mi xeq: drop female m=0 data: -> drop female m=1 data: -> drop female 3. Alternatively to 1 and 2 above,. replace age = 20 in 30 (1 real change made). drop female. mi update (regular variable female unregistered because not in m=0) (imputed variables updated in 1 obs. in m>0 in order to match m=0 data) Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

32 Managing imputations You can use mi impute to create or add new imputations; mi set m to delete selected imputations; mi add to add imputations from a separate file; mi set M to reset the number of imputations (or create empty imputations in which missing data are not filled in). When performing data manipulation on mi data, remember to use the mi versions of the data-management routines, if they exist; to use mi xeq with routines for which there is no mi prefix; to run mi update periodically to ensure consistency of the mi data. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

33 Estimation using multiple imputation mi estimate is designed to perform analysis of the multiply-imputed data; the data must be mi set and must have at least 2 imputations. Basic syntax: mi estimate: estimation command The above runs estimation command separately on each imputed dataset, and reports the MI estimates of coefficients and their standard errors based on results saved by estimation command. estimation command is one of the supported estimation commands as listed in [MI] estimation. Extended syntax: mi estimate [, options ] : estimation command mi estimate also provides options allowing to select how many and which imputations to use in the computation, to report additional information about the MI estimates, and more. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

34 Other features of mi estimate Other features of mi estimate include the ability to obtain estimates of functions of coefficients; the ability to save individual estimates for later use with mi estimate using without the need of refitting the models on each imputed dataset. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

35 Other features of mi estimate For example, using our earlier heart-attack data example, we save individual estimates to a myest.ster estimation file:. mi estimate, saving(myest): logit attack smokes age bmi female hsgrad We then compute MI estimates of the ratio of coefficients for age and bmi using saved individual estimation results rather than refitting the completed-data models:. mi estimate (ratio: b[age]/ b[bmi]) using myest See [MI] mi estimate and [MI] mi estimate using for more detail. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

36 Analyzing complex multiply-imputed data You can use mi estimate to analyze complex data such as, for example, survey data. In Stata, prior to analyzing complex data, it must be declared. For example, survey data must be svyset, survival data must be stset, and so on. To declare the complex mi data, you should use the corresponding set command with the mi prefix. For example, to declare mi survey data, use. mi svyset... Then, to fit a model on mi survey data, use. mi estimate: svy:... Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

37 Using mi estimate with user-written commands mi estimate provides a way (option cmdok) of applying combination rules to results from estimation commands not officially supported by mi estimate, such as user-written estimation commands.. mi estimate, cmdok : user command It is the user s responsibility to verify that the combination rules are applicable to the results reported by the used command. The user command should also satisfy technical requirements from Writing programs for use with mi in [P] program properties. Authors of user-written commands can also modify their estimation commands to be accepted by mi estimate without specifying cmdok as described in the section mentioned above. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

38 Testing hypotheses After mi estimate, you can test subsets of coefficients, linear or nonlinear hypotheses using mi test and mi testtransform. See [MI] postestimation for examples. mi test and mi testtransform provide the conditional (equal fraction-missing-information, equal FMI) test of Li et al. (1991); the unconditional test of Rubin (1987, 77 78). This test may be preferable when the number of imputations is large and the equal FMI assumption is suspect. small-sample adjustments for the tests as described in Marchenko and Reiter (2009). Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

39 GUI MI control panel The MI control panel, which can be invoked from the Statistics > Multiple imputation menu or by typing. db mi guides you through all the phases of MI. (NEXT SLIDE) Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

40 Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

41 Relation to existing user-written commands User-written commands ice and mim are widely used to perform multiple-imputation analysis in Stata. ice creates imputations using the chained-equation approach of van Buuren et al. (1999). mim analyzes multiply-imputed data. mi estimate and the mi data-management routines cover most estimation and all data-management capabilities of mim. mi does not provide imputation via chained equations and thus ice remains the only implementation of the chained-equation approach. See for more detail. A more detailed FAQ is coming soon. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

42 Summary mi accommodates all steps of the MI technique: mi impute provides univariate and multivariate methods for filling in missing values; mi estimate performs completed-data analysis and combines estimates using Rubin s pooling rules. mi provides full data-management support. mi provides 4 styles for storing MI data and can import from 5 styles. mi verifies consistency of your data at every opportunity. mi offers postestimation features: testing linear or nonlinear hypotheses. mi provides elaborate GUI support MI control panel. mi offers extensive documentation, manual [MI] Multiple imputation. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

43 References Li, K.-H., T. E. Raghunathan, and D. B. Rubin Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association 86: Marchenko, Y. V. and J. P. Reiter Improved degrees of freedom for multivariate significance tests obtained from multiply imputed, small-sample data. Stata Journal. Forthcoming. Rubin, D. B Multiple Imputation for Nonresponse in Surveys. New York: Wiley. Rubin, D. B Multiple imputation after 18+ years. Journal of the American Statistical Association 91: Schafer, J. L Analysis of Incomplete Multivariate Data. Boca Raton, FL: Chapman & Hall/CRC. van Buuren, S., H. C. Boshuizen, and D. L. Knook Multiple imputation of missing blood pressure covariates in survival analysis. Yulia Marchenko (StataCorp) Multiple-imputation analysis using mi September 10, / 43

Missing Data Part II: Multiple Imputation & Maximum Likelihood

Missing Data Part II: Multiple Imputation & Maximum Likelihood Missing Data Part II: Multiple Imputation & Maximum Likelihood Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 12, 2017 Warning: I teach about Multiple

More information

Missing Data Part II: Multiple Imputation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 24, 2015

Missing Data Part II: Multiple Imputation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Missing Data Part II: Multiple Imputation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Warning: I teach about Multiple Imputation with some trepidation.

More information

Compute MI estimates of coefficients by fitting estimation command to mi data

Compute MI estimates of coefficients by fitting estimation command to mi data Title mi estimate Estimation using multiple imputations Syntax Compute MI estimates of coefficients by fitting estimation command to mi data mi estimate [, options ] : estimation command... Compute MI

More information

Handling missing data for indicators, Susanne Rässler 1

Handling missing data for indicators, Susanne Rässler 1 Handling Missing Data for Indicators Susanne Rässler Institute for Employment Research & Federal Employment Agency Nürnberg, Germany First Workshop on Indicators in the Knowledge Economy, Tübingen, 3-4

More information

Compute MI estimates of coefficients using previously saved estimation results

Compute MI estimates of coefficients using previously saved estimation results Title mi estimate using Estimation using previously saved estimation results Syntax Compute MI estimates of coefficients using previously saved estimation results mi estimate using miestfile [, options

More information

Missing Data: What Are You Missing?

Missing Data: What Are You Missing? Missing Data: What Are You Missing? Craig D. Newgard, MD, MPH Jason S. Haukoos, MD, MS Roger J. Lewis, MD, PhD Society for Academic Emergency Medicine Annual Meeting San Francisco, CA May 006 INTRODUCTION

More information

Multiple Imputation for Missing Data. Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health

Multiple Imputation for Missing Data. Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health Multiple Imputation for Missing Data Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health Outline Missing data mechanisms What is Multiple Imputation? Software Options

More information

MISSING DATA AND MULTIPLE IMPUTATION

MISSING DATA AND MULTIPLE IMPUTATION Paper 21-2010 An Introduction to Multiple Imputation of Complex Sample Data using SAS v9.2 Patricia A. Berglund, Institute For Social Research-University of Michigan, Ann Arbor, Michigan ABSTRACT This

More information

Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables

Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables The Stata Journal (2009) 9, Number 3, pp. 466 477 Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables Patrick Royston Hub for Trials Methodology Research

More information

An Algorithm for Creating Models for Imputation Using the MICE Approach:

An Algorithm for Creating Models for Imputation Using the MICE Approach: An Algorithm for Creating Models for Imputation Using the MICE Approach: An application in Stata Rose Anne rosem@ats.ucla.edu Statistical Consulting Group Academic Technology Services University of California,

More information

Multiple imputation using chained equations: Issues and guidance for practice

Multiple imputation using chained equations: Issues and guidance for practice Multiple imputation using chained equations: Issues and guidance for practice Ian R. White, Patrick Royston and Angela M. Wood http://onlinelibrary.wiley.com/doi/10.1002/sim.4067/full By Gabrielle Simoneau

More information

R software and examples

R software and examples Handling Missing Data in R with MICE Handling Missing Data in R with MICE Why this course? Handling Missing Data in R with MICE Stef van Buuren, Methodology and Statistics, FSBS, Utrecht University Netherlands

More information

Analysis of Incomplete Multivariate Data

Analysis of Incomplete Multivariate Data Analysis of Incomplete Multivariate Data J. L. Schafer Department of Statistics The Pennsylvania State University USA CHAPMAN & HALL/CRC A CR.C Press Company Boca Raton London New York Washington, D.C.

More information

Missing Data Missing Data Methods in ML Multiple Imputation

Missing Data Missing Data Methods in ML Multiple Imputation Missing Data Missing Data Methods in ML Multiple Imputation PRE 905: Multivariate Analysis Lecture 11: April 22, 2014 PRE 905: Lecture 11 Missing Data Methods Today s Lecture The basics of missing data:

More information

Statistical Analysis Using Combined Data Sources: Discussion JPSM Distinguished Lecture University of Maryland

Statistical Analysis Using Combined Data Sources: Discussion JPSM Distinguished Lecture University of Maryland Statistical Analysis Using Combined Data Sources: Discussion 2011 JPSM Distinguished Lecture University of Maryland 1 1 University of Michigan School of Public Health April 2011 Complete (Ideal) vs. Observed

More information

The Importance of Modeling the Sampling Design in Multiple. Imputation for Missing Data

The Importance of Modeling the Sampling Design in Multiple. Imputation for Missing Data The Importance of Modeling the Sampling Design in Multiple Imputation for Missing Data Jerome P. Reiter, Trivellore E. Raghunathan, and Satkartar K. Kinney Key Words: Complex Sampling Design, Multiple

More information

Bootstrap and multiple imputation under missing data in AR(1) models

Bootstrap and multiple imputation under missing data in AR(1) models EUROPEAN ACADEMIC RESEARCH Vol. VI, Issue 7/ October 2018 ISSN 2286-4822 www.euacademic.org Impact Factor: 3.4546 (UIF) DRJI Value: 5.9 (B+) Bootstrap and multiple imputation under missing ELJONA MILO

More information

Missing Data and Imputation

Missing Data and Imputation Missing Data and Imputation NINA ORWITZ OCTOBER 30 TH, 2017 Outline Types of missing data Simple methods for dealing with missing data Single and multiple imputation R example Missing data is a complex

More information

Handling Data with Three Types of Missing Values:

Handling Data with Three Types of Missing Values: Handling Data with Three Types of Missing Values: A Simulation Study Jennifer Boyko Advisor: Ofer Harel Department of Statistics University of Connecticut Storrs, CT May 21, 2013 Jennifer Boyko Handling

More information

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa Ronald H. Heck 1 In this handout, we will address a number of issues regarding missing data. It is often the case that the weakest point of a study is the quality of the data that can be brought to bear

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1

More information

in this course) ˆ Y =time to event, follow-up curtailed: covered under ˆ Missing at random (MAR) a

in this course) ˆ Y =time to event, follow-up curtailed: covered under ˆ Missing at random (MAR) a Chapter 3 Missing Data 3.1 Types of Missing Data ˆ Missing completely at random (MCAR) ˆ Missing at random (MAR) a ˆ Informative missing (non-ignorable non-response) See 1, 38, 59 for an introduction to

More information

NORM software review: handling missing values with multiple imputation methods 1

NORM software review: handling missing values with multiple imputation methods 1 METHODOLOGY UPDATE I Gusti Ngurah Darmawan NORM software review: handling missing values with multiple imputation methods 1 Evaluation studies often lack sophistication in their statistical analyses, particularly

More information

A new framework for managing and analyzing multiply imputed data in Stata

A new framework for managing and analyzing multiply imputed data in Stata The Stata Journal (2008) 8, Number 1, pp. 49 67 A new framework for managing and analyzing multiply imputed data in Stata John B. Carlin Clinical Epidemiology & Biostatistics Unit Murdoch Children s Research

More information

Epidemiological analysis PhD-course in epidemiology

Epidemiological analysis PhD-course in epidemiology Epidemiological analysis PhD-course in epidemiology Lau Caspar Thygesen Associate professor, PhD 9. oktober 2012 Multivariate tables Agenda today Age standardization Missing data 1 2 3 4 Age standardization

More information

Instrumental variables, bootstrapping, and generalized linear models

Instrumental variables, bootstrapping, and generalized linear models The Stata Journal (2003) 3, Number 4, pp. 351 360 Instrumental variables, bootstrapping, and generalized linear models James W. Hardin Arnold School of Public Health University of South Carolina Columbia,

More information

Show how the LG-Syntax can be generated from a GUI model. Modify the LG-Equations to specify a different LC regression model

Show how the LG-Syntax can be generated from a GUI model. Modify the LG-Equations to specify a different LC regression model Tutorial #S1: Getting Started with LG-Syntax DemoData = 'conjoint.sav' This tutorial introduces the use of the LG-Syntax module, an add-on to the Advanced version of Latent GOLD. In this tutorial we utilize

More information

Epidemiological analysis PhD-course in epidemiology. Lau Caspar Thygesen Associate professor, PhD 25 th February 2014

Epidemiological analysis PhD-course in epidemiology. Lau Caspar Thygesen Associate professor, PhD 25 th February 2014 Epidemiological analysis PhD-course in epidemiology Lau Caspar Thygesen Associate professor, PhD 25 th February 2014 Age standardization Incidence and prevalence are strongly agedependent Risks rising

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients

More information

Title. Description. Quick start. stata.com. misstable Tabulate missing values

Title. Description. Quick start. stata.com. misstable Tabulate missing values Title stata.com misstable Tabulate missing values Description Quick start Menu Syntax Options Remarks and examples Stored results Also see Description misstable makes tables that help you understand the

More information

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES UNIVERSITY OF GLASGOW MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES by KHUNESWARI GOPAL PILLAY A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,

More information

REALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol

REALCOM-IMPUTE: multiple imputation using MLwin. Modified September Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol REALCOM-IMPUTE: multiple imputation using MLwin. Modified September 2014 by Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol This description is divided into two sections. In the

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software December 2011, Volume 45, Issue 5. http://www.jstatsoft.org/ REALCOM-IMPUTE Software for Multilevel Multiple Imputation with Mixed Response Types James R. Carpenter

More information

Missing Data Analysis with SPSS

Missing Data Analysis with SPSS Missing Data Analysis with SPSS Meng-Ting Lo (lo.194@osu.edu) Department of Educational Studies Quantitative Research, Evaluation and Measurement Program (QREM) Research Methodology Center (RMC) Outline

More information

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background An Introduction to Multiple Imputation and its Application Craig K. Enders University of California - Los Angeles Department of Psychology cenders@psych.ucla.edu Background Work supported by Institute

More information

Power of the power command in Stata 13

Power of the power command in Stata 13 Power of the power command in Stata 13 Yulia Marchenko Director of Biostatistics StataCorp LP 2013 UK Stata Users Group meeting Yulia Marchenko (StataCorp) September 13, 2013 1 / 27 Outline Outline Basic

More information

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International Part A: Comparison with FIML in the case of normal data. Stephen du Toit Multivariate data

More information

Panel Data 4: Fixed Effects vs Random Effects Models

Panel Data 4: Fixed Effects vs Random Effects Models Panel Data 4: Fixed Effects vs Random Effects Models Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised April 4, 2017 These notes borrow very heavily, sometimes verbatim,

More information

Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches

Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches Jonathan Kropko University of Virginia Ben Goodrich Columbia University Andrew Gelman Columbia University

More information

Performance of Sequential Imputation Method in Multilevel Applications

Performance of Sequential Imputation Method in Multilevel Applications Section on Survey Research Methods JSM 9 Performance of Sequential Imputation Method in Multilevel Applications Enxu Zhao, Recai M. Yucel New York State Department of Health, 8 N. Pearl St., Albany, NY

More information

Hierarchical Mixture Models for Nested Data Structures

Hierarchical Mixture Models for Nested Data Structures Hierarchical Mixture Models for Nested Data Structures Jeroen K. Vermunt 1 and Jay Magidson 2 1 Department of Methodology and Statistics, Tilburg University, PO Box 90153, 5000 LE Tilburg, Netherlands

More information

Application of multiple imputation using the two-fold fully conditional specification algorithm in longitudinal clinical data

Application of multiple imputation using the two-fold fully conditional specification algorithm in longitudinal clinical data The Stata Journal (yyyy) vv, Number ii, pp. 1 13 Application of multiple imputation using the two-fold fully conditional specification algorithm in longitudinal clinical data Catherine Welch University

More information

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent

More information

Bivariate (Simple) Regression Analysis

Bivariate (Simple) Regression Analysis Revised July 2018 Bivariate (Simple) Regression Analysis This set of notes shows how to use Stata to estimate a simple (two-variable) regression equation. It assumes that you have set Stata up on your

More information

A Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models

A Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models A Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models Eva de Jong, Nino Mushkudiani and Barry Schouten ASD workshop, November 6-8, 2017 Outline Bayesian

More information

The mice Package. June 25, 2007

The mice Package. June 25, 2007 The mice Package June 25, 2007 Version 1.16 Date 2007-06-25 Title Multivariate Imputation by Chained Equations Author S. Van Buuren & C.G.M. Oudshoorn Maintainer Roel de Jong Depends

More information

Package midastouch. February 7, 2016

Package midastouch. February 7, 2016 Type Package Version 1.3 Package midastouch February 7, 2016 Title Multiple Imputation by Distance Aided Donor Selection Date 2016-02-06 Maintainer Philipp Gaffert Depends R (>=

More information

Lab 2: OLS regression

Lab 2: OLS regression Lab 2: OLS regression Andreas Beger February 2, 2009 1 Overview This lab covers basic OLS regression in Stata, including: multivariate OLS regression reporting coefficients with different confidence intervals

More information

Generalized least squares (GLS) estimates of the level-2 coefficients,

Generalized least squares (GLS) estimates of the level-2 coefficients, Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical

More information

Opening Windows into the Black Box

Opening Windows into the Black Box Opening Windows into the Black Box Yu-Sung Su, Andrew Gelman, Jennifer Hill and Masanao Yajima Columbia University, Columbia University, New York University and University of California at Los Angels July

More information

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS Examples: Missing Data Modeling And Bayesian Analysis CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS Mplus provides estimation of models with missing data using both frequentist and Bayesian

More information

Missing Data Analysis with the Mahalanobis Distance

Missing Data Analysis with the Mahalanobis Distance Missing Data Analysis with the Mahalanobis Distance by Elaine M. Berkery, B.Sc. Department of Mathematics and Statistics, University of Limerick A thesis submitted for the award of M.Sc. Supervisor: Dr.

More information

25 Working with categorical data and factor variables

25 Working with categorical data and factor variables 25 Working with categorical data and factor variables Contents 25.1 Continuous, categorical, and indicator variables 25.1.1 Converting continuous variables to indicator variables 25.1.2 Converting continuous

More information

Statistical matching: conditional. independence assumption and auxiliary information

Statistical matching: conditional. independence assumption and auxiliary information Statistical matching: conditional Training Course Record Linkage and Statistical Matching Mauro Scanu Istat scanu [at] istat.it independence assumption and auxiliary information Outline The conditional

More information

The mice Package. April 3, 2006

The mice Package. April 3, 2006 Version 1.14 Date 9/16/2005 The mice Package April 3, 2006 Title Multivariate Imputation by Chained Equations Author S. Van Buuren & C.G.M. Oudshoorn [R: peter.malewski@gmx.de] Maintainer Roel de Jong

More information

Example Using Missing Data 1

Example Using Missing Data 1 Ronald H. Heck and Lynn N. Tabata 1 Example Using Missing Data 1 Creating the Missing Data Variable (Miss) Here is a data set (achieve subset MANOVAmiss.sav) with the actual missing data on the outcomes.

More information

arxiv: v1 [stat.me] 29 May 2015

arxiv: v1 [stat.me] 29 May 2015 MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis Vincent Audigier 1, François Husson 2 and Julie Josse 2 arxiv:1505.08116v1 [stat.me] 29 May 2015 Applied Mathematics

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Creating a data file and entering data

Creating a data file and entering data 4 Creating a data file and entering data There are a number of stages in the process of setting up a data file and analysing the data. The flow chart shown on the next page outlines the main steps that

More information

Handbook of Statistical Modeling for the Social and Behavioral Sciences

Handbook of Statistical Modeling for the Social and Behavioral Sciences Handbook of Statistical Modeling for the Social and Behavioral Sciences Edited by Gerhard Arminger Bergische Universität Wuppertal Wuppertal, Germany Clifford С. Clogg Late of Pennsylvania State University

More information

Missing data a data value that should have been recorded, but for some reason, was not. Simon Day: Dictionary for clinical trials, Wiley, 1999.

Missing data a data value that should have been recorded, but for some reason, was not. Simon Day: Dictionary for clinical trials, Wiley, 1999. 2 Schafer, J. L., Graham, J. W.: (2002). Missing Data: Our View of the State of the Art. Psychological methods, 2002, Vol 7, No 2, 47 77 Rosner, B. (2005) Fundamentals of Biostatistics, 6th ed, Wiley.

More information

Blimp User s Guide. Version 1.0. Brian T. Keller. Craig K. Enders.

Blimp User s Guide. Version 1.0. Brian T. Keller. Craig K. Enders. Blimp User s Guide Version 1.0 Brian T. Keller bkeller2@ucla.edu Craig K. Enders cenders@psych.ucla.edu September 2017 Developed by Craig K. Enders and Brian T. Keller. Blimp was developed with funding

More information

- 1 - Fig. A5.1 Missing value analysis dialog box

- 1 - Fig. A5.1 Missing value analysis dialog box WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation

More information

. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education)

. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education) DUMMY VARIABLES AND INTERACTIONS Let's start with an example in which we are interested in discrimination in income. We have a dataset that includes information for about 16 people on their income, their

More information

Handling Missing Data

Handling Missing Data Handling Missing Data Estie Hudes Tor Neilands UCSF Center for AIDS Prevention Studies Part 2 December 10, 2013 1 Contents 1. Summary of Part 1 2. Multiple Imputation (MI) for normal data 3. Multiple Imputation

More information

Multiple imputation of missing values: Update of ice

Multiple imputation of missing values: Update of ice The Stata Journal (2005) 5, Number 4, pp. 527 536 Multiple imputation of missing values: Update of ice 1 Introduction Patrick Roston Cancer Group MRC Clinical Trials Unit 222 Euston Road London NW1 2DA

More information

May 24, Emil Coman 1 Yinghui Duan 2 Daren Anderson 3

May 24, Emil Coman 1 Yinghui Duan 2 Daren Anderson 3 Assessing Health Disparities in Intensive Longitudinal Data: Gender Differences in Granger Causality Between Primary Care Provider and Emergency Room Usage, Assessed with Medicaid Insurance Claims May

More information

Week 4: Simple Linear Regression II

Week 4: Simple Linear Regression II Week 4: Simple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Algebraic properties

More information

Week 5: Multiple Linear Regression II

Week 5: Multiple Linear Regression II Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R

More information

The Performance of Multiple Imputation for Likert-type Items with Missing Data

The Performance of Multiple Imputation for Likert-type Items with Missing Data Journal of Modern Applied Statistical Methods Volume 9 Issue 1 Article 8 5-1-2010 The Performance of Multiple Imputation for Likert-type Items with Missing Data Walter Leite University of Florida, Walter.Leite@coe.ufl.edu

More information

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian. Panel Data Analysis: Fixed Effects Models

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian. Panel Data Analysis: Fixed Effects Models SOCY776: Longitudinal Data Analysis Instructor: Natasha Sarkisian Panel Data Analysis: Fixed Effects Models Fixed effects models are similar to the first difference model we considered for two wave data

More information

A noninformative Bayesian approach to small area estimation

A noninformative Bayesian approach to small area estimation A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported

More information

Missing Data Techniques

Missing Data Techniques Missing Data Techniques Paul Philippe Pare Department of Sociology, UWO Centre for Population, Aging, and Health, UWO London Criminometrics (www.crimino.biz) 1 Introduction Missing data is a common problem

More information

Missing Data in Orthopaedic Research

Missing Data in Orthopaedic Research in Orthopaedic Research Keith D Baldwin, MD, MSPT, MPH, Pamela Ohman-Strickland, PhD Abstract Missing data can be a frustrating problem in orthopaedic research. Many statistical programs employ a list-wise

More information

Using Imputation Techniques to Help Learn Accurate Classifiers

Using Imputation Techniques to Help Learn Accurate Classifiers Using Imputation Techniques to Help Learn Accurate Classifiers Xiaoyuan Su Computer Science & Engineering Florida Atlantic University Boca Raton, FL 33431, USA xsu@fau.edu Taghi M. Khoshgoftaar Computer

More information

Improving Imputation Accuracy in Ordinal Data Using Classification

Improving Imputation Accuracy in Ordinal Data Using Classification Improving Imputation Accuracy in Ordinal Data Using Classification Shafiq Alam 1, Gillian Dobbie, and XiaoBin Sun 1 Faculty of Business and IT, Whitireia Community Polytechnic, Auckland, New Zealand shafiq.alam@whitireia.ac.nz

More information

The Use of Sample Weights in Hot Deck Imputation

The Use of Sample Weights in Hot Deck Imputation Journal of Official Statistics, Vol. 25, No. 1, 2009, pp. 21 36 The Use of Sample Weights in Hot Deck Imputation Rebecca R. Andridge 1 and Roderick J. Little 1 A common strategy for handling item nonresponse

More information

Week 10: Heteroskedasticity II

Week 10: Heteroskedasticity II Week 10: Heteroskedasticity II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Dealing with heteroskedasticy

More information

Missing Data. SPIDA 2012 Part 6 Mixed Models with R:

Missing Data. SPIDA 2012 Part 6 Mixed Models with R: The best solution to the missing data problem is not to have any. Stef van Buuren, developer of mice SPIDA 2012 Part 6 Mixed Models with R: Missing Data Georges Monette 1 May 2012 Email: georges@yorku.ca

More information

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer Ludwig Fahrmeir Gerhard Tute Statistical odelling Based on Generalized Linear Model íecond Edition. Springer Preface to the Second Edition Preface to the First Edition List of Examples List of Figures

More information

Week 4: Simple Linear Regression III

Week 4: Simple Linear Regression III Week 4: Simple Linear Regression III Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Goodness of

More information

IBM SPSS Missing Values 21

IBM SPSS Missing Values 21 IBM SPSS Missing Values 21 Note: Before using this information and the product it supports, read the general information under Notices on p. 87. This edition applies to IBM SPSS Statistics 21 and to all

More information

3.6 Sample code: yrbs_data <- read.spss("yrbs07.sav",to.data.frame=true)

3.6 Sample code: yrbs_data <- read.spss(yrbs07.sav,to.data.frame=true) InJanuary2009,CDCproducedareportSoftwareforAnalyisofYRBSdata, describingtheuseofsas,sudaan,stata,spss,andepiinfoforanalyzingdatafrom theyouthriskbehaviorssurvey. ThisreportprovidesthesameinformationforRandthesurveypackage.Thetextof

More information

PASW Missing Values 18

PASW Missing Values 18 i PASW Missing Values 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412

More information

CLAREMONT MCKENNA COLLEGE. Fletcher Jones Student Peer to Peer Technology Training Program. Basic Statistics using Stata

CLAREMONT MCKENNA COLLEGE. Fletcher Jones Student Peer to Peer Technology Training Program. Basic Statistics using Stata CLAREMONT MCKENNA COLLEGE Fletcher Jones Student Peer to Peer Technology Training Program Basic Statistics using Stata An Introduction to Stata A Comparison of Statistical Packages... 3 Opening Stata...

More information

PAM 4280/ECON 3710: The Economics of Risky Health Behaviors Fall 2015 Professor John Cawley TA Christine Coyer. Stata Basics for PAM 4280/ECON 3710

PAM 4280/ECON 3710: The Economics of Risky Health Behaviors Fall 2015 Professor John Cawley TA Christine Coyer. Stata Basics for PAM 4280/ECON 3710 PAM 4280/ECON 3710: The Economics of Risky Health Behaviors Fall 2015 Professor John Cawley TA Christine Coyer Stata Basics for PAM 4280/ECON 3710 I Introduction Stata is one of the most commonly used

More information

Analysis of Complex Survey Data with SAS

Analysis of Complex Survey Data with SAS ABSTRACT Analysis of Complex Survey Data with SAS Christine R. Wells, Ph.D., UCLA, Los Angeles, CA The differences between data collected via a complex sampling design and data collected via other methods

More information

Package sparsereg. R topics documented: March 10, Type Package

Package sparsereg. R topics documented: March 10, Type Package Type Package Package sparsereg March 10, 2016 Title Sparse Bayesian Models for Regression, Subgroup Analysis, and Panel Data Version 1.2 Date 2016-03-01 Author Marc Ratkovic and Dustin Tingley Maintainer

More information

Statistical Matching using Fractional Imputation

Statistical Matching using Fractional Imputation Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:

More information

Coding Categorical Variables in Regression: Indicator or Dummy Variables. Professor George S. Easton

Coding Categorical Variables in Regression: Indicator or Dummy Variables. Professor George S. Easton Coding Categorical Variables in Regression: Indicator or Dummy Variables Professor George S. Easton DataScienceSource.com This video is embedded on the following web page at DataScienceSource.com: DataScienceSource.com/DummyVariables

More information

Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data

Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data Comparison of Hot Deck and Multiple Imputation Methods Using Simulations for HCSDB Data Donsig Jang, Amang Sukasih, Xiaojing Lin Mathematica Policy Research, Inc. Thomas V. Williams TRICARE Management

More information

Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation. Filip Lindhfors and Farhana Morko

Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation. Filip Lindhfors and Farhana Morko Bachelor thesis Department of Statistics Kandidatuppsats, Statistiska institutionen Nr 2014:5 Missing data analysis: - A study of complete case analysis, single imputation and multiple imputation Filip

More information

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL SPSS QM II SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details around a certain function/analysis method not covered

More information

Multiple Imputation for Multilevel Models with Missing Data Using Stat-JR

Multiple Imputation for Multilevel Models with Missing Data Using Stat-JR Multiple Imputation for Multilevel Models with Missing Data Using Stat-JR Introduction In this document we introduce a Stat-JR super-template for 2-level data that allows for missing values in explanatory

More information

Generalized ordered logit/partial proportional odds models for ordinal dependent variables

Generalized ordered logit/partial proportional odds models for ordinal dependent variables The Stata Journal (2006) 6, Number 1, pp. 58 82 Generalized ordered logit/partial proportional odds models for ordinal dependent variables Richard Williams Department of Sociology University of Notre Dame

More information

Smoking and Missingness: Computer Syntax 1

Smoking and Missingness: Computer Syntax 1 Smoking and Missingness: Computer Syntax 1 Computer Syntax SAS code is provided for the logistic regression imputation described in this article. This code is listed in parts, with description provided

More information

TYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT

TYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT PRIMER FOR ACS OUTCOMES RESEARCH COURSE: TYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT STEP 1: Install STATA statistical software. STEP 2: Read through this primer and complete the

More information

Hierarchical Generalized Linear Models

Hierarchical Generalized Linear Models Generalized Multilevel Linear Models Introduction to Multilevel Models Workshop University of Georgia: Institute for Interdisciplinary Research in Education and Human Development 07 Generalized Multilevel

More information

11.0 APPENDIX-B: COMPUTATION

11.0 APPENDIX-B: COMPUTATION 11.0 APPENDIX-B: COMPUTATION Computational details and the pseudo codes of the time-varying ARX(p t ) model and the MI-SRI composite imputation method will be given by using different statistical packages

More information