AMELIA II: A Program for Missing Data

Amelia II is an R package that performs multiple imputation to deal with missing data, as an alternative to methods such as pairwise and listwise deletion. In multiple imputation, values are imputed for each missing cell in your data set and several completed data sets are created. In these completed data sets, the observed values stay the same, but the missing values are filled in with imputations generated by the EMB (expectation-maximization with bootstrapping) algorithm. After imputation, you conduct your statistical analyses on each completed data set and then combine the results.

For this example, the world95.sav file from the PASW 17.0 sample data sets is used. There are 11 variables: nine are continuous and two are categorical. Missing data are signified by blank cells.

You can download the Amelia II package from http://gking.harvard.edu/amelia/ (please make sure you have installed R first, from http://cran.r-project.org/). You can then open the AmeliaView graphical interface with the following commands in R:

    > library(Amelia)
    > AmeliaView()

Step 1. Use Amelia II to impute data for missing values.

I. Once the package is loaded and AmeliaView() has been run, AmeliaView opens in a new window.

II. Select Import CSV. Locate your file, saved as a comma-separated values (.csv) file, and click Open.

III. The data is now loaded into the program. This view provides descriptive statistics (Min, Max, Mean, SD) for your variables, as well as the number of missing data points per variable (Missing). For example, the urban variable is missing one value out of 109. The following options are available for each variable:

    Transformation. Use this option to classify the variable's measurement type and to transform variables (e.g., logistic or square root), if necessary.
    Lag. Use this option for time series data; lags are variables that take the value of another variable in the previous time period.
    Lead. Use this option for time series data; leads take the value of another variable in the next time period.
    Bounds. Use this option to place restrictions on the range of the imputed values.
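The per-variable overview that AmeliaView displays can be approximated in plain R. The sketch below is only an illustration: the `world` data frame is an invented stand-in for the imported CSV, and `summarise_missing()` is a hypothetical helper, not an Amelia function.

```r
# Invented toy data standing in for the imported CSV file.
world <- data.frame(
  country  = c("A", "B", "C", "D", "E"),
  urban    = c(18, 86, NA, 54, 72),
  literacy = c(29, 95, 98, NA, NA)
)

# Hypothetical helper: one row per numeric variable with
# Min, Max, Mean, SD and the number of missing cells.
summarise_missing <- function(df) {
  num <- df[sapply(df, is.numeric)]
  data.frame(
    Variable  = names(num),
    Min       = sapply(num, min,  na.rm = TRUE),
    Max       = sapply(num, max,  na.rm = TRUE),
    Mean      = sapply(num, mean, na.rm = TRUE),
    SD        = sapply(num, sd,   na.rm = TRUE),
    Missing   = sapply(num, function(x) sum(is.na(x))),
    row.names = NULL
  )
}

overview <- summarise_missing(world)
print(overview)
```

With real data, `world` would instead come from `read.csv()` on the same file that is imported into AmeliaView.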

IV. Transformation. The Amelia package recognized the country variable as an ID variable and classified it as such. To classify the two categorical variables (region and climate) as nominal, right-click the row of the region variable and select Nominal.

Repeat the same steps for the climate variable.

V. Bounds. Since four of the continuous variables (urban, literacy, lit_male, and lit_female) are percentages, bounds need to be added to restrict their imputed values to the range 0 to 100. The climate variable also needs to be restricted to its available values of 1 to 9 (the region variable has no missing data, so no bound is added to it). Right-click the urban variable row and select Add or Edit Bounds. The Add or Edit Bounds box appears for you to enter the minimum and maximum values; type 0 for the minimum and 100 for the maximum. Select OK. Repeat the same steps for the remaining variables that need bounds.

VI. Select Options from the top menu, then select Output File Options. The Output Options box appears; by default, the names of the imputed datasets have imp at the end of the file name and 5 imputed datasets are selected. Select OK.
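The settings from steps IV to VI can also be written as arguments to the package's `amelia()` function instead of being clicked through in AmeliaView. The following is a minimal sketch under several assumptions: `make_bounds()` and `impute_world()` are hypothetical helpers written for this example, the column names are taken from the text, and the 1 to 9 bound on climate is left out because climate is already declared nominal here. The wrapper is only defined, not run, so the block executes without Amelia installed.

```r
# Build the three-column bounds matrix used by amelia():
# column 1 = column index of the variable, 2 = lower bound, 3 = upper bound.
make_bounds <- function(df, vars, lower, upper) {
  idx <- match(vars, names(df))
  stopifnot(!anyNA(idx))            # every bounded variable must exist in df
  unname(cbind(idx, lower, upper))
}

# Hypothetical wrapper (not part of Amelia) mirroring steps IV to VI.
# Assumes the Amelia package is installed and df uses the column names below.
impute_world <- function(df, m = 5) {
  pct_vars <- c("urban", "literacy", "lit_male", "lit_female")
  Amelia::amelia(
    df,
    m      = m,                        # number of imputed datasets
    idvars = "country",                # ID variable, kept out of the model
    noms   = c("region", "climate"),   # nominal variables
    bounds = make_bounds(df, pct_vars, 0, 100),
    p2s    = 0                         # suppress console output
  )
}

# Only the column layout is needed to inspect the bounds matrix.
cols   <- c("country", "region", "climate", "urban",
            "literacy", "lit_male", "lit_female")
layout <- as.data.frame(matrix(NA_real_, nrow = 1, ncol = length(cols),
                               dimnames = list(NULL, cols)))
bounds <- make_bounds(layout, c("urban", "literacy", "lit_male", "lit_female"), 0, 100)
print(bounds)
```

With the data loaded as a data frame, `a.out <- impute_world(world)` would return an object whose `imputations` element holds the completed datasets.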

VII. Select Impute! Imputation is complete when "Successful Imputation." appears at the bottom right of the screen.

VIII. Select Output Log. The output log gives you the chain length of each imputation. For example, the chain length of Imputation 4 was 133.

IX. The 5 imputed datasets are saved in the same location as the original file.

X. Below is an example of one complete dataset from the imputed files.
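A completed dataset should contain no blank cells and should leave every originally observed value untouched. A short sanity check along those lines might look like the sketch below; the two data frames are invented toy data and `check_completed()` is a hypothetical helper.

```r
# Invented toy data: `original` has blanks (NA), `completed` is one filled-in copy.
original  <- data.frame(country  = c("A", "B", "C"),
                        urban    = c(18, NA, 54),
                        literacy = c(NA, 95, 98))
completed <- data.frame(country  = c("A", "B", "C"),
                        urban    = c(18, 61.2, 54),
                        literacy = c(40.7, 95, 98))

# Hypothetical helper: confirms that nothing is missing any more and that
# every value observed in the original is identical in the completed copy.
check_completed <- function(original, completed) {
  unchanged <- mapply(function(orig_col, filled_col) {
    seen <- !is.na(orig_col)
    isTRUE(all(orig_col[seen] == filled_col[seen]))
  }, original, completed)
  list(no_missing         = !anyNA(completed),
       observed_unchanged = all(unchanged))
}

result <- check_completed(original, completed)
print(result)
```

The same check can be repeated for each of the 5 imputed files after reading them back with `read.csv()`.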

Step 2. Pool parameter estimates and standard errors.

I. Run statistical analyses (e.g., multiple regression, canonical correlation, etc.) on the 5 imputed datasets.

II. Compute the mean of the parameter estimates of the 5 imputed datasets. For example, a multiple regression was conducted using the 5 imputed datasets, and there are 5 beta estimates for the literacy variable. The mean of the 5 beta estimates is 3.41.

    Imputation   b      SE     Variance (SE^2)
    1            3.23   0.15   0.022
    2            3.43   0.15   0.023
    3            3.35   0.11   0.011
    4            3.55   0.15   0.023
    5            3.47   0.15   0.022
    Mean         3.41

III. To pool the standard error estimates, you need to compute the within-imputation variance and the between-imputation variance.

a. The within-imputation variance is the average of the squared standard errors across the m analyses,

$$\bar{U} = \frac{1}{m} \sum_{i=1}^{m} \hat{U}_i$$

where $\hat{U}_i$ is the variance estimate from the i-th imputed data set, and m is the number of imputations. First, sum the variance estimates (0.101), then multiply by one fifth. The within-imputation variance is 0.020.

b. The between-imputation variance is the variability of the m parameter estimates around the mean estimate,

$$B = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2$$

where m is the number of imputations, $\hat{Q}_i$ is the beta estimate from the i-th imputed data set, and $\bar{Q}$ is the mean parameter estimate. First, find the deviation scores for the beta estimates, square them, and then sum the squared deviations. Next, multiply that value by one fifth. The between-imputation variance is 0.009.

c. Use the following equation to compute the total variance:

$$T = \bar{U} + \left( 1 + \frac{1}{m} \right) B$$

For the example, the total variance is 0.031. Therefore, the multiple imputation standard error is 0.18 ($\sqrt{0.031}$).

    Imputation   b      SE     Variance (SE^2)
    1            3.23   0.15   0.022
    2            3.43   0.15   0.023
    3            3.35   0.11   0.011
    4            3.55   0.15   0.023
    5            3.47   0.15   0.022
    Pooled       3.41   0.18

IV. Repeat the steps for all of the parameter estimates and standard errors in your model.
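The pooling arithmetic of Step 2 can be scripted so that it does not have to be repeated by hand for every coefficient. The sketch below follows the steps as the text describes them; `pool_estimates()` and `total_variance()` are hypothetical helpers, and the `between_divisor` argument is an assumption added because Rubin's rules are commonly stated with m - 1 rather than m in the denominator of B. The inputs are the rounded estimates and standard errors from the table. On those rounded values the between-imputation step comes out near 0.012 rather than the 0.009 quoted, so the pooled standard error from the full function (about 0.19) differs slightly from the text; plugging the quoted 0.020 and 0.009 into the total-variance formula does reproduce 0.031 and 0.18.

```r
# Estimates and standard errors for the literacy coefficient (from the table).
b  <- c(3.23, 3.43, 3.35, 3.55, 3.47)
se <- c(0.15, 0.15, 0.11, 0.15, 0.15)

# Total variance: T = U + (1 + 1/m) * B
total_variance <- function(u, b_var, m) u + (1 + 1 / m) * b_var

# Hypothetical helper following Step 2. `between_divisor` defaults to m,
# matching the "multiply by one fifth" instruction; pass length(q) - 1
# for the m - 1 form.
pool_estimates <- function(q, se, between_divisor = length(q)) {
  m       <- length(q)
  q_bar   <- mean(q)                               # pooled point estimate
  within  <- mean(se^2)                            # within-imputation variance
  between <- sum((q - q_bar)^2) / between_divisor  # between-imputation variance
  total   <- total_variance(within, between, m)
  list(estimate = q_bar, within = within, between = between,
       total = total, se = sqrt(total))
}

pooled <- pool_estimates(b, se)
print(round(unlist(pooled), 3))

# Reproducing the figures quoted in the text from its own U and B values.
quoted_total <- total_variance(0.020, 0.009, m = 5)
print(round(c(total = quoted_total, se = sqrt(quoted_total)), 3))
```

Calling `pool_estimates(b, se, between_divisor = length(b) - 1)` gives the m - 1 variant for comparison.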