Missing Data Analysis for the Employee Dataset
- Antonia McDowell
- 5 years ago
Transcription
1 Missing Data Analysis for the Employee Dataset
2 67% of the observations have missing values!
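3 Modeling Setup For our analysis goals we would like to fit $Y \mid X \sim N(X\beta, \sigma^2 I)$ and then interpret the coefficients. Random Variables: $Y_i = (Y_{i1}, \dots, Y_{ip})' = (Y_{i,obs}, Y_{i,miss})'$ and $R_i = (R_{i1}, \dots, R_{ip})'$, where $R_{ij} = 1$ if $Y_{ij}$ is missing and $R_{ij} = 0$ otherwise.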
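As a minimal sketch of the indicator notation, the Python snippet below builds the matrix $R$ from a small data set and computes the share of observations with at least one missing value. The toy rows and the use of `None` to mark a missing entry are assumptions for illustration; they are not the employee data.

```python
# Hypothetical toy data: None marks a missing entry (not the employee dataset).
rows = [
    (5.1, 2.0, None),
    (4.8, None, 1.2),
    (6.0, 3.1, 0.7),
]

def missing_indicators(data):
    """Return R with R[i][j] = 1 if data[i][j] is missing, else 0."""
    return [[1 if value is None else 0 for value in row] for row in data]

def fraction_incomplete(data):
    """Share of observations (rows) with at least one missing value."""
    R = missing_indicators(data)
    return sum(1 for r in R if any(r)) / len(R)

R = missing_indicators(rows)
share = fraction_incomplete(rows)
print(R, share)
```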
4 Missing Data Patterns Univariate: Missingness confined to a single variable. [Figure: pattern diagram over $Y_1, Y_2, Y_3, Y_4$]
5 Missing Data Patterns Unit Nonresponse: Refuse to answer some variables. [Figure: pattern diagram over $Y_1, Y_2, Y_3, Y_4$]
6 Missing Data Patterns Monotone (Longitudinal): Missing due to dropouts. [Figure: pattern diagram over $Y_1, Y_2, Y_3, Y_4$]
7 Missing Data Patterns General: Missing values spread throughout. [Figure: pattern diagram over $Y_1, Y_2, Y_3, Y_4$]
8 Missing Data Patterns Latent Variables: All values of a single variable are missing. [Figure: pattern diagram over $Y_1, Y_2, Y_3, Y_4$]
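9 Missing Data Mechanisms (Rubin 1976) Notation: $\phi$ = parameters governing the missing data, $\theta$ = parameters of interest. 1. Missing Completely at Random (MCAR): $[R, Y \mid \phi, \theta] = [R \mid \phi][Y \mid \theta]$. 2. Missing at Random (MAR): $[R, Y \mid \phi, \theta] = [R \mid Y_{obs}, \phi][Y \mid \theta]$. 3. Not Missing at Random (NMAR or MNAR): $[R, Y \mid \phi, \theta] = [R \mid Y_{obs}, Y_{miss}, \phi][Y \mid \theta]$.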
10 Missing Data Mechanisms (Rubin 1976) 1. Missing Completely at Random (MCAR). $Y = (Y_1, Y_2)$: $Y_1$ always observed. $R = (R_1, R_2)$: $R_2 \sim B(1, 0.1)$.
11 Missing Data Mechanisms (Rubin 1976) 2. Missing at Random (MAR). $Y = (Y_1, Y_2)$: $Y_1$ always observed. $R = (R_1, R_2)$: $R_2 = 1(Y_1 < 1)$.
12 Missing Data Mechanisms (Rubin 1976) 3. Not Missing at Random (NMAR). $Y = (Y_1, Y_2)$: $Y_1$ always observed. $R = (R_1, R_2)$: $R_2 = 1(Y_2 < 1)$.
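The three example mechanisms can be simulated to see how each one affects the observed values of $Y_2$. The sketch below is in Python and makes several assumptions the slides do not state: standard normal margins, a correlation `rho = 0.7`, and the sample size. The function name `simulate` is hypothetical.

```python
import random
from statistics import mean

def simulate(mechanism, n=20000, rho=0.7, cutoff=1.0, seed=1):
    """Return the mean of the observed Y2 values under one mechanism.

    Y = (Y1, Y2) is bivariate normal with zero means, unit variances and
    correlation rho (rho, n and seed are illustrative assumptions).
    R2 = 1 means Y2 is missing.
    """
    rng = random.Random(seed)
    observed_y2 = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rho * y1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            r2 = 1 if rng.random() < 0.1 else 0   # R2 ~ B(1, 0.1)
        elif mechanism == "MAR":
            r2 = 1 if y1 < cutoff else 0          # depends only on observed Y1
        elif mechanism == "NMAR":
            r2 = 1 if y2 < cutoff else 0          # depends on Y2 itself
        else:
            raise ValueError(mechanism)
        if r2 == 0:
            observed_y2.append(y2)
    return mean(observed_y2)

results = {m: simulate(m) for m in ("MCAR", "MAR", "NMAR")}
print(results)
```

Under these assumptions the mean of the observed $Y_2$ stays close to the true value 0 under MCAR and is shifted upward under both MAR and NMAR. The difference between the last two is that under MAR the shift is explained by the always-observed $Y_1$.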
13 Missing Data Mechanisms (Rubin 1976) Why do we need to understand the missing data mechanism? If the data are NMAR, then the missing data indicators, marginally, contain information about the parameters we are interested in. [Diagram: relationship among $Y_{miss}$, $R$, and $Y_{obs}$ before and after integrating out the missing observations] Take-home message: if data are NMAR, we have to model the missing data indicators.
14 Missing Data Mechanisms (Rubin 1976) Why do we need to understand the missing data mechanism? On the other hand, if data are MAR (or MCAR), then the missing data indicators won't relate to the parameters of interest. [Diagram: relationship among $Y_{miss}$, $R$, and $Y_{obs}$ before and after integrating out the missing observations] Take-home message: if data are MAR, we don't have to model the missing data indicators, but we will need to include the incomplete observations (because of correlation).
15 Missing Data Mechanisms (Rubin 1976) How can we tell what missing data mechanism is present in the data? There is no way to tell if the data are NMAR (there is no data). We can distinguish between MCAR and MAR: (a) fit a logistic regression of the missing data indicator on the observed data (if MCAR, then nothing will be significant); (b) compare the distribution (via a Kolmogorov-Smirnov test or simple t-tests) of the observed data when R=1 vs. R=0.
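The t-test comparison can be sketched as follows: split the always-observed $Y_1$ by whether $Y_2$ is missing and compute a Welch two-sample t statistic. This is a Python illustration on simulated data; the MAR rule `y1 < 0.0`, the 50% MCAR rate, and the helper names are assumptions.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means of a and b."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def t_for_mechanism(mar, n=2000, seed=7):
    """Compare observed Y1 between rows where Y2 is missing and observed."""
    rng = random.Random(seed)
    y1_missing, y1_observed = [], []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        if mar:
            missing = y1 < 0.0            # illustrative MAR rule (assumption)
        else:
            missing = rng.random() < 0.5  # MCAR: unrelated to the data
        (y1_missing if missing else y1_observed).append(y1)
    return welch_t(y1_missing, y1_observed)

t_mcar = t_for_mechanism(mar=False)
t_mar = t_for_mechanism(mar=True)
print(t_mcar, t_mar)
```

A statistic near zero is consistent with MCAR, while a large absolute value points to missingness that depends on observed data. Neither outcome rules out NMAR.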
16 Traditional Missing Data Methods Listwise Deletion: Use only the complete data. Advantages: 1. Convenient 2. OK if data are MCAR. Disadvantages: 1. Biased results (when data are not MCAR) 2. Throws away much of the data.
17 Traditional Missing Data Methods Listwise Deletion: Wastes a lot of data. Let $N$ = number of observations, $P$ = number of covariates, and $\pi$ = probability that the $p$th covariate is missing. Assume $R_{ip} \overset{iid}{\sim} B(1, \pi)$. Then $1(\text{case } i \text{ is complete}) \sim B(1, (1-\pi)^P)$, the number of complete cases is $B(N, (1-\pi)^P)$, and $E(\#\text{ complete cases}) = N(1-\pi)^P$.
18 Traditional Missing Data Methods [Figure: number of observations, E(# of CC) and E(# Thrown Out), plotted against $P$ for $\pi = 0.02$, $N = 100$]
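The expected number of complete cases is easy to tabulate. The short Python sketch below evaluates $N(1-\pi)^P$ for the slide's values $\pi = 0.02$ and $N = 100$; the particular values of $P$ are chosen only for illustration.

```python
def expected_complete_cases(n, p, pi):
    """E(# complete cases) = N (1 - pi)^P when each of P covariates is
    independently missing with probability pi."""
    return n * (1 - pi) ** p

N, PI = 100, 0.02
table = [(p, expected_complete_cases(N, p, PI)) for p in (1, 5, 10, 25, 50)]
for p, cc in table:
    print(f"P={p:3d}  E(#CC)={cc:6.1f}  E(#thrown out)={N - cc:6.1f}")
```

Even with only a 2% chance of missingness per covariate, roughly 36 of the 100 observations are expected to remain complete at $P = 50$.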
19 Traditional Missing Data Methods Listwise Deletion: Use only the complete data. [Figure: results for $\hat\mu_1$ and $\hat\mu_2$ under MCAR, MAR, and NMAR]
20 Traditional Missing Data Methods Mean Imputation: Replace missing values with the mean (or mode) of that particular variable. Advantages: 1. Convenient. Disadvantages: 1. Reduces variability of the data 2. Reduces correlations.
21 Traditional Missing Data Methods Mean Imputation: Replace missing values with the mean (or mode) of that particular variable. $Y = (Y_1, Y_2)$: $Y_1$ always observed. $R = (R_1, R_2)$: $R_2 = 1(Y_1 < 1)$.
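A small simulation makes both disadvantages visible. The Python sketch below applies mean imputation under the rule $R_2 = 1(Y_1 < 1)$; the correlation `rho = 0.7`, the sample size, and the helper `corr` are assumptions for illustration.

```python
import random
from statistics import mean, pvariance

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(3)
rho, cutoff = 0.7, 1.0   # rho is assumed; cutoff follows R2 = 1(Y1 < 1)
y1 = [rng.gauss(0, 1) for _ in range(5000)]
y2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in y1]

# Y2 is treated as missing when Y1 < cutoff and replaced by the observed mean.
observed = [b for a, b in zip(y1, y2) if a >= cutoff]
fill = mean(observed)
y2_imputed = [b if a >= cutoff else fill for a, b in zip(y1, y2)]

var_full, var_imp = pvariance(y2), pvariance(y2_imputed)
cor_full, cor_imp = corr(y1, y2), corr(y1, y2_imputed)
print(var_full, var_imp, cor_full, cor_imp)
```

Because every imputed value is the same constant, both the variance of $Y_2$ and its correlation with $Y_1$ come out smaller than in the full data.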
22 Traditional Missing Data Methods Regression Imputation: Use the complete cases to fit a regression, then replace missing values with predicted values. Advantages: 1. Convenient 2. Uses observed data to fill in missing data. Disadvantages: 1. Increases correlations 2. Biases variance estimates.
23 Traditional Missing Data Methods Regression Imputation: Use the complete cases to fit a regression, then replace missing values with predicted values. $Y = (Y_1, Y_2)$: $Y_1$ always observed. $R = (R_1, R_2)$: $R_2 = 1(Y_1 < 1)$.
24 Traditional Missing Data Methods Stochastic Regression Imputation: Use the complete cases to fit a regression, then replace each missing value with a draw from the predictive distribution. Advantages: 1. Convenient 2. Uses observed data to fill in missing data 3. Produces unbiased estimates of parameters if MAR. Disadvantages: 1. Decreases standard errors.
25 Traditional Missing Data Methods Stochastic Regression Imputation: Use the complete cases to fit a regression, then replace each missing value with a draw from the predictive distribution.
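The difference between plugging in a fitted value and drawing from the predictive distribution can be sketched in Python. The simulation settings (`rho = 0.7`, sample size, seed) and the helper `fit_line` are assumptions; the stochastic version here adds only residual noise and ignores uncertainty in the fitted coefficients.

```python
import random
from statistics import mean, pvariance

def fit_line(x, y):
    """Least-squares intercept, slope and residual SD of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return intercept, slope, sd

rng = random.Random(11)
rho, cutoff = 0.7, 1.0   # rho is assumed; cutoff follows R2 = 1(Y1 < 1)
y1 = [rng.gauss(0, 1) for _ in range(5000)]
y2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in y1]
missing = [a < cutoff for a in y1]

# Fit the regression of Y2 on Y1 using the complete cases only.
b0, b1, sd = fit_line([a for a, m in zip(y1, missing) if not m],
                      [b for b, m in zip(y2, missing) if not m])

# Regression imputation: plug in the fitted value.
reg = [b0 + b1 * a if m else b for a, b, m in zip(y1, y2, missing)]
# Stochastic regression imputation: fitted value plus a residual draw.
sto = [b0 + b1 * a + rng.gauss(0, sd) if m else b
       for a, b, m in zip(y1, y2, missing)]

var_reg, var_sto = pvariance(reg), pvariance(sto)
print(mean(reg), var_reg, var_sto)
```

The deterministic version places every imputed point exactly on the fitted line, which understates the variance of $Y_2$; adding the residual draw restores much of it.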
26 Traditional Missing Data Methods Hot Deck Imputation: Find the K nearest neighbors, then replace missing values with the mean (or mode) of these nearest neighbors. Advantages: 1. Convenient 2. Maintains univariate distributions 3. Slight decrease in standard errors. Disadvantages: 1. Overestimates correlations (particularly when K=1).
27 Traditional Missing Data Methods Hot Deck Imputation: Find the K nearest neighbors, then replace missing values with the mean (or mode) of these nearest neighbors.
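A minimal hot deck sketch in Python, assuming a single always-observed variable `y1` is used to measure closeness and that `None` marks a missing `y2`. The function name `hot_deck_impute` and the toy data are hypothetical.

```python
def hot_deck_impute(y1, y2, k=3):
    """Fill each missing y2 (None) with the mean y2 of the k donors
    whose y1 is closest. Donors are the rows with y2 observed."""
    donors = [(a, b) for a, b in zip(y1, y2) if b is not None]
    filled = []
    for a, b in zip(y1, y2):
        if b is None:
            nearest = sorted(donors, key=lambda d: abs(d[0] - a))[:k]
            b = sum(d[1] for d in nearest) / len(nearest)
        filled.append(b)
    return filled

y1 = [0.1, 0.4, 0.5, 0.9, 1.5, 2.0]
y2 = [1.0, None, 2.0, 3.0, None, 5.0]
filled = hot_deck_impute(y1, y2, k=2)
print(filled)  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```

With several variables, the absolute difference would be replaced by a multivariate distance, and for categorical variables the mean by the mode.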
28 Modeling Missing Data Key Idea to Handling Missing Data: Need a multivariate model for $Y_i = (Y_{i1}, \dots, Y_{ip})' = (Y_{i,obs}, Y_{i,miss})'$ rather than just a univariate response. A common (and extremely useful) multivariate tool is the Multivariate Normal Distribution (MVN).
29 Review of MVN Distribution Let $Y = (y_1, \dots, y_P)'$. If $Y$ follows a multivariate normal (Gaussian) distribution then $Y \sim N_P(\mu, \Sigma_Y) \Rightarrow f_Y(y) = \frac{1}{(2\pi)^{P/2}} \frac{1}{|\Sigma_Y|^{1/2}} \exp\left\{-\frac{1}{2}(y - \mu)' \Sigma_Y^{-1} (y - \mu)\right\}$, where $\mu = (\mu_1, \dots, \mu_P)'$ is the mean vector and $\Sigma_Y$ is the covariance matrix.
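30 Review of MVN Distribution Partition $Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$, $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$, $\Sigma_Y = \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{21} & \Sigma_2 \end{pmatrix}$. The marginal distribution of $Y_1$ is $Y_1 \sim N(\mu_1, \Sigma_1)$. The conditional distribution of $Y_1 \mid Y_2$ is $Y_1 \mid Y_2 \sim N(\mu_{1|2}, \Sigma_{1|2})$, where $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_2^{-1}(Y_2 - \mu_2)$ and $\Sigma_{1|2} = \Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{21}$.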
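For two scalar components the conditional mean and variance reduce to simple arithmetic. The Python sketch below is a bivariate special case of the standard formulas; the function name and the numbers are illustrative.

```python
def conditional_normal(mu1, mu2, s11, s12, s22, y2):
    """Mean and variance of Y1 | Y2 = y2 for a bivariate normal.

    Scalar version of mu_{1|2} = mu_1 + S12 S2^{-1} (y2 - mu_2)
    and S_{1|2} = S1 - S12 S2^{-1} S21.
    """
    cond_mean = mu1 + s12 / s22 * (y2 - mu2)
    cond_var = s11 - s12 * s12 / s22
    return cond_mean, cond_var

m, v = conditional_normal(mu1=0.0, mu2=0.0, s11=1.0, s12=0.5, s22=1.0, y2=2.0)
print(m, v)  # 1.0 0.75
```

Observing $Y_2 = 2$ pulls the mean of $Y_1$ from 0 to 1 and reduces its variance from 1 to 0.75, which is the information that imputation methods based on the MVN exploit.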
31 Review of MVN Distribution How to draw from $N(\mu, \Sigma)$: 1. Calculate the Cholesky decomposition $\Sigma = LL'$ 2. Draw $Z \sim N(0, I)$ 3. Set $Y = \mu + LZ$. In R: `mvn.draw <- mu + t(chol(sigma)) %*% rnorm(p)`. Can you show $E(Y) = \mu$ and $V(Y) = \Sigma$?
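The three steps can also be written out without a linear algebra library. The following Python sketch implements a textbook Cholesky factorization and one draw; the function names and the example $\Sigma$ are assumptions. Note that R's `chol` returns the upper-triangular factor, which is why the R one-liner transposes it. The moment check follows from linearity: $E(Y) = \mu + L\,E(Z) = \mu$ and $V(Y) = L\,V(Z)\,L' = LL' = \Sigma$.

```python
import random

def cholesky(sigma):
    """Lower-triangular L with L L' = sigma (sigma symmetric positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (sigma[i][i] - s) ** 0.5
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def mvn_draw(mu, sigma, rng):
    """One draw Y = mu + L Z with Z ~ N(0, I)."""
    L = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mu)]

sigma = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(sigma)
draw = mvn_draw([1.0, -1.0], sigma, random.Random(0))
print(L, draw)
```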
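32 Regression with the MVN Partition $Y_i = \begin{pmatrix} y_i \\ X_i \end{pmatrix}$, $\mu = \begin{pmatrix} \mu_y \\ \mu_X \end{pmatrix}$, $\Sigma_Y = \begin{pmatrix} \sigma_y^2 & \Sigma_{yX} \\ \Sigma_{yX}' & \Sigma_X \end{pmatrix}$. The conditional distribution of $y_i \mid X_i$ is $y_i \mid X_i \sim N(\mu_{y|X}, \sigma^2_{y|X})$, where $\mu_{y|X} = \mu_y + \Sigma_{yX}\Sigma_X^{-1}(X_i - \mu_X) = \underbrace{\mu_y - \Sigma_{yX}\Sigma_X^{-1}\mu_X}_{\beta_0} + \underbrace{\Sigma_{yX}\Sigma_X^{-1}}_{\beta_1'} X_i = X_i'\beta$ (with an intercept term included in $X_i$).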
33 Assessing MVN How do we know if data arise from a multivariate normal distribution? 1. Univariate histograms (or densities) 2. Bivariate density estimates 3. Chi-square QQ plot: $(Y_i - \mu)' \Sigma^{-1} (Y_i - \mu) \sim \chi^2_p$.
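The ingredients of the chi-square QQ plot can be computed directly. The Python sketch below handles only the bivariate case ($p = 2$), where the $\chi^2_2$ quantile function has the closed form $-2\log(1 - q)$. For simplicity it uses the true $\mu$ and $\Sigma$ of simulated data; in practice these would be replaced by estimates. All names and settings are assumptions.

```python
import math
import random

def mahalanobis_sq(y, mu, sigma):
    """(y - mu)' Sigma^{-1} (y - mu) for a 2x2 covariance matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    u, v = y[0] - mu[0], y[1] - mu[1]
    return (d * u * u - (b + c) * u * v + a * v * v) / det

def chi2_quantile_2df(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - q)

rng = random.Random(5)
n = 1000
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
identity = ((1.0, 0.0), (0.0, 1.0))
d2 = sorted(mahalanobis_sq(y, (0.0, 0.0), identity) for y in data)
theory = [chi2_quantile_2df((i + 0.5) / n) for i in range(n)]

# Plot these pairs; points near the line y = x are consistent with MVN data.
qq_pairs = list(zip(theory, d2))
print(qq_pairs[n // 2])
```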
34 Regression with the MVN Key Points: 1. If $(y_i, X_i)$ is MVN, then you get the regression coefficients from the covariance matrix. 2. It is easy to get any conditional distribution (including the distribution of x given y) via properties of the MVN. But what are the MLEs of $\mu$ and $\Sigma$? $\hat\mu = \frac{1}{N}\sum_i Y_i$ and $\hat\Sigma = \frac{1}{N}\sum_i (Y_i - \hat\mu)(Y_i - \hat\mu)'$.
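Both key points can be checked numerically: compute the MLEs of the mean vector and covariance matrix, then read the regression coefficients off them. The Python sketch below uses a hypothetical toy data set with one covariate, constructed so that $y = 1 + 2x$ exactly.

```python
def mvn_mle(rows):
    """MLEs of the mean vector and covariance matrix (divisor N, not N - 1)."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    sigma = [[sum((r[j] - mu[j]) * (r[k] - mu[k]) for r in rows) / n
              for k in range(p)] for j in range(p)]
    return mu, sigma

# Toy rows (y, x); y = 1 + 2x exactly, so the slope should be 2.
rows = [(1.0, 0.0), (3.0, 1.0), (5.0, 2.0), (7.0, 3.0)]
mu, sigma = mvn_mle(rows)
beta1 = sigma[0][1] / sigma[1][1]   # Sigma_yX Sigma_X^{-1}
beta0 = mu[0] - beta1 * mu[1]       # mu_y - Sigma_yX Sigma_X^{-1} mu_X
print(beta0, beta1)  # 1.0 2.0
```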
35 Maximum Likelihood Estimation with Missing Data Missing Data Log-likelihood: $f_Y(y \mid \theta)$ is the joint distribution of ALL data. $L(\theta) = \prod_{i=1}^n \int_{\mathcal{Y}_{i,miss}} f_Y(y_{i,obs}, y_{i,miss} \mid \theta)\, dy_{i,miss}$, where each integral is the marginal distribution of the observed data and $\mathcal{Y}_{i,miss}$ is the space of missing values for observation $i$ (might be discrete).
36 Maximum Likelihood Estimation with Missing Data Missing Data Log-likelihood (MVN Example): $Y \sim N_2(\mu, \Sigma)$ with $\mu = (0, 0)'$. $Y = (Y_1, Y_2)$: $Y_1$ always observed. $R = (R_1, R_2)$: $R_2 = 1(Y_1 < 1)$. $L(\mu) = \prod_{i: R_{i2} = 0} N(Y_i \mid \mu, \Sigma) \prod_{i: R_{i2} = 1} N(Y_{i1} \mid \mu_1, \sigma_1^2)$.
37 Maximum Likelihood Estimation with Missing Data Missing Data Log-likelihood (MVN Example):
38 Maximum Likelihood Estimation with Missing Data Missing Data Log-likelihood (MVN, No Correlation, Example):
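For the bivariate normal example with $Y_1$ always observed, the observed-data log-likelihood can be coded directly: complete rows contribute the joint density and incomplete rows contribute only the marginal density of $Y_1$. The Python sketch below assumes $\Sigma$ is known and $\mu_1 = 0$, and maximizes over $\mu_2$ by a grid search; the data values and function names are hypothetical.

```python
import math

def log_normal_pdf(y, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def observed_loglik(rows, mu, sigma):
    """Observed-data log-likelihood for bivariate normal rows (y1, y2),
    where y2 may be None (missing) and y1 is always observed."""
    (s11, s12), (_, s22) = sigma
    total = 0.0
    for y1, y2 in rows:
        # Every row contributes the marginal density of Y1 ...
        total += log_normal_pdf(y1, mu[0], s11)
        if y2 is not None:
            # ... and complete rows add the conditional density of Y2 | Y1,
            # since f(y1, y2) = f(y1) f(y2 | y1).
            cond_mean = mu[1] + s12 / s11 * (y1 - mu[0])
            cond_var = s22 - s12 ** 2 / s11
            total += log_normal_pdf(y2, cond_mean, cond_var)
    return total

rows = [(1.2, 0.8), (1.5, 1.9), (0.3, None), (-0.4, None), (2.1, 1.0)]
sigma = ((1.0, 0.5), (0.5, 1.0))   # assumed known for this sketch
grid = [i / 100 for i in range(-200, 201)]
best_mu2 = max(grid, key=lambda m2: observed_loglik(rows, (0.0, m2), sigma))
print(best_mu2)  # 0.43
```

On these toy numbers the grid maximizer is 0.43, while the plain mean of the three observed $Y_2$ values is about 1.23. The likelihood uses the correlation with $Y_1$ to adjust for the fact that $Y_2$ is observed only when $Y_1$ is large.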
39 Maximum Likelihood Estimation with Missing Data How do we maximize the missing data LL? The EM algorithm is particularly useful here. How do we calculate standard errors from the missing data LL? 1. Asymptotics: $\hat\theta \overset{d}{\to} N(\theta, I^{-1}(\hat\theta))$ 2. Bootstrap.
40 Maximum Likelihood Estimation with Missing Data Big Issues with the MLE Approach: 1. Oftentimes, the integral is hard to compute: $L(\theta) = \prod_{i=1}^n \int_{\mathcal{Y}_{i,miss}} f_Y(y_{i,obs}, y_{i,miss} \mid \theta)\, dy_{i,miss}$, where each integral is the marginal distribution of the observed data. 2. Maximizing the complete data likelihood is computationally faster (and sometimes analytically tractable). Solution: Multiple imputation (aka using Bayesian techniques without actually being Bayesian).
41 Multiple Imputation The three steps of multiple imputation: Imputation, Estimation, Pooling. [Diagram: Missing Data, then Data Set 1, Data Set 2, ..., Data Set M (imputation), then Estimate 1, Estimate 2, ..., Estimate M (estimation), then Final Results (pooling)]
42 Multiple Imputation The Imputation Step (Algorithm): 1. Choose an initial value $\theta_0$. 2. For $m = 1, \dots, M$: i. for all $i$, draw missing values from the conditional distribution $Y^{(m)}_{i,miss} \sim f(y_{miss} \mid y_{obs}, \theta_{m-1})$; ii. set $\theta_m = \arg\max_\theta L(\theta \mid Y_{obs}, Y^{(m)}_{miss})$.
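43 Multiple Imputation The Imputation Step (Algorithm): A MVN Example. $Y \sim N_2(\mu, \Sigma)$ with $\mu = (0, 0)'$. $Y = (Y_1, Y_2)$: $Y_1$ always observed. $R = (R_1, R_2)$: $R_2 = 1(Y_1 < 1)$.
44 Multiple Imputation The Imputation Step (Algorithm): A MVN Example. 1. Set $\hat\mu_0$ and $\hat\Sigma_0$ to be the complete case empirical mean and covariance matrix. 2. For $m = 1, \dots, M$: i. for all $i$ with $Y_{i2}$ missing, draw the missing value from the conditional distribution $y_2 \mid y_1 \sim N\left(\mu_2 + \frac{\sigma_{12}}{\sigma_1^2}(y_1 - \mu_1),\ \sigma_2^2 - \frac{\sigma_{12}^2}{\sigma_1^2}\right)$, evaluated at $\hat\mu_{m-1}$ and $\hat\Sigma_{m-1}$; ii. set $\hat\mu_m = \frac{1}{n}\sum_{i=1}^n y_i$ and $\hat\Sigma_m = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\mu_m)(y_i - \hat\mu_m)'$ using the completed data.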
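The two-step loop for the bivariate example can be sketched in Python as follows. The data are simulated with an assumed correlation `rho = 0.7`, and the function names are hypothetical. The sketch mirrors the algorithm as stated (draw the missing values, then re-estimate by maximum likelihood on the completed data); it is not a full Bayesian sampler that also draws the parameters.

```python
import random

def mean_cov(rows):
    """Empirical mean and covariance (divisor n) of bivariate rows."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows) / n
    s22 = sum((r[1] - m2) ** 2 for r in rows) / n
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / n
    return (m1, m2), (s11, s12, s22)

def impute_chain(rows, n_iter, rng):
    """rows: list of (y1, y2) with y2 = None when missing.
    Returns the mean estimate after each iteration and the last completed data."""
    complete = [r for r in rows if r[1] is not None]
    mu, (s11, s12, s22) = mean_cov(complete)            # step 1
    history, filled = [], rows
    for _ in range(n_iter):                             # step 2
        cond_sd = (s22 - s12 ** 2 / s11) ** 0.5
        filled = [(y1, y2 if y2 is not None
                   else rng.gauss(mu[1] + s12 / s11 * (y1 - mu[0]), cond_sd))
                  for y1, y2 in rows]                   # step 2(i)
        mu, (s11, s12, s22) = mean_cov(filled)          # step 2(ii)
        history.append(mu)
    return history, filled

rng = random.Random(2)
rho, cutoff = 0.7, 1.0   # rho is assumed; cutoff follows R2 = 1(Y1 < 1)
rows = []
for _ in range(2000):
    y1 = rng.gauss(0, 1)
    y2 = rho * y1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
    rows.append((y1, None if y1 < cutoff else y2))

history, filled = impute_chain(rows, n_iter=50, rng=rng)
observed_y2 = [r[1] for r in rows if r[1] is not None]
cc_mean_y2 = sum(observed_y2) / len(observed_y2)
print(cc_mean_y2, history[-1])
```

In this simulation the complete-case mean of $Y_2$ is far from the true value 0, while the mean from the imputation chain is much closer, because the imputed values borrow information from $Y_1$.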
45 Multiple Imputation The Imputation Step (Algorithm): Issues to Consider 1. The sequence of parameters and missing data imputations should converge.
46 Multiple Imputation The Imputation Step (Algorithm): Issues to Consider 2. How do we assess convergence? Trace plots; autocorrelation plots (Stat 651); effective sample size (Stat 651); convergence diagnostics (Stat 651).
47 Multiple Imputation The Imputation Step (Algorithm): Issues to Consider 3. What do we do if we can't draw from $Y^{(m)}_{i,miss} \sim f(y_{miss} \mid y_{obs}, \theta_{m-1})$? Use the Metropolis-Hastings algorithm (take 651 and you'll learn).
48 Multiple Imputation The Analysis Phase: Calculate the MLEs, SEs, predictions, etc. (whatever you're interested in) for each imputed dataset.
49 Multiple Imputation The Pooling Phase: Pooling parameter estimates. $\bar\theta = \frac{1}{M}\sum_{m=1}^M \hat\theta_m$. Note: this pooled estimate is most appropriate under normality of the $\hat\theta_m$'s.
50 Multiple Imputation The Pooling Phase: Pooling standard errors. $V_w = \frac{1}{M}\sum_{m=1}^M SE^2(\hat\theta_m)$, $V_b = \frac{1}{M-1}\sum_{m=1}^M (\hat\theta_m - \bar\theta)^2$, $V_T = V_w + V_b + \frac{V_b}{M} \Rightarrow SE_{pool} = \sqrt{V_T}$.
51 Multiple Imputation The Pooling Phase: Fraction of Missing Information. $FMI = \frac{V_b + V_b/M}{V_T}$. Hypothesis testing and CIs: $t = \frac{\bar\theta - \theta_0}{\sqrt{V_T}} \sim T_\nu$ with $\nu = (M-1)\frac{1}{FMI^2}$.
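The pooling formulas translate line by line into code. The Python sketch below pools $M$ estimates of a single scalar parameter; the function name `pool` and the example numbers are hypothetical. It assumes the estimates are not all identical, since $V_b = 0$ would make the degrees of freedom undefined.

```python
from statistics import mean, variance

def pool(estimates, std_errors):
    """Pool M point estimates and their standard errors."""
    M = len(estimates)
    theta_bar = mean(estimates)
    v_w = mean([se ** 2 for se in std_errors])   # within-imputation variance
    v_b = variance(estimates)                    # between-imputation, divisor M - 1
    v_t = v_w + v_b + v_b / M
    fmi = (v_b + v_b / M) / v_t
    df = (M - 1) / fmi ** 2
    return {"estimate": theta_bar, "se": v_t ** 0.5, "fmi": fmi, "df": df}

# Hypothetical results from M = 5 imputed data sets
result = pool([1.0, 1.2, 0.8, 1.1, 0.9], [0.2, 0.2, 0.2, 0.2, 0.2])
print(result)
```

For these numbers $V_w = 0.04$, $V_b = 0.025$, and $V_T = 0.07$, so the pooled standard error ($\sqrt{0.07} \approx 0.26$) is larger than any single-data-set standard error of 0.2. The extra amount reflects the uncertainty due to the missing values.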
52 Approaches for NMAR Selection Model Approach: $f(Y, R) = f(R \mid Y) f(Y)$. Challenges: 1. Need to relate the missing data to the missingness indicator, so we must have strong prior understanding.
53 Approaches for NMAR Pattern Mixture Approach: $f(Y, R) = f(Y \mid R) f(R)$. Challenges: 1. Need to relate the model parameters to the missingness indicator, so we must have strong prior understanding.
54 Expectations for Employee Analysis Expectations: 1. Carry out a regression using all the data (use missing data likelihood or multiple imputation). 2. Assume MVN for the whole observation vector.
CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.
More informationMachine Learning Lecture 3
Many slides adapted from B. Schiele Machine Learning Lecture 3 Probability Density Estimation II 26.04.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course
More informationSupplementary Figure 1. Decoding results broken down for different ROIs
Supplementary Figure 1 Decoding results broken down for different ROIs Decoding results for areas V1, V2, V3, and V1 V3 combined. (a) Decoded and presented orientations are strongly correlated in areas
More informationWeek 4: Simple Linear Regression II
Week 4: Simple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Algebraic properties
More informationMachine Learning Lecture 3
Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 26.04.206 Discriminative Approaches (5 weeks) Linear
More informationLinear Regression and K-Nearest Neighbors 3/28/18
Linear Regression and K-Nearest Neighbors 3/28/18 Linear Regression Hypothesis Space Supervised learning For every input in the data set, we know the output Regression Outputs are continuous A number,
More informationChapter 2: Looking at Multivariate Data
Chapter 2: Looking at Multivariate Data Multivariate data could be presented in tables, but graphical presentations are more effective at displaying patterns. We can see the patterns in one variable at
More informationExpectation-Maximization. Nuno Vasconcelos ECE Department, UCSD
Expectation-Maximization Nuno Vasconcelos ECE Department, UCSD Plan for today last time we started talking about mixture models we introduced the main ideas behind EM to motivate EM, we looked at classification-maximization
More informationFaculty of Sciences. Holger Cevallos Valdiviezo
Faculty of Sciences Handling of missing data in the predictor variables when using Tree-based techniques for training and generating predictions Holger Cevallos Valdiviezo Master dissertation submitted
More informationExpectation Maximization: Inferring model parameters and class labels
Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/27/2017 Jumble of unlabeled images HISTOGRAM blue
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Unsupervised Learning: Kmeans, GMM, EM Readings: Barber 20.1-20.3 Stefan Lee Virginia Tech Tasks Supervised Learning x Classification y Discrete x Regression
More informationPerformance of Sequential Imputation Method in Multilevel Applications
Section on Survey Research Methods JSM 9 Performance of Sequential Imputation Method in Multilevel Applications Enxu Zhao, Recai M. Yucel New York State Department of Health, 8 N. Pearl St., Albany, NY
More informationStat 342 Exam 3 Fall 2014
Stat 34 Exam 3 Fall 04 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed There are questions on the following 6 pages. Do as many of them as you can
More informationHomework #4 Programming Assignment Due: 11:59 pm, November 4, 2018
CSCI 567, Fall 18 Haipeng Luo Homework #4 Programming Assignment Due: 11:59 pm, ovember 4, 2018 General instructions Your repository will have now a directory P4/. Please do not change the name of this
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationEstimation of Item Response Models
Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationUnsupervised Learning with Non-Ignorable Missing Data
Unsupervised Learning with on-ignorable Missing Data Benjamin M. Marlin, Sam T. Roweis, Richard S. Zemel Department of Computer Science University of Toronto Toronto, Ontario Abstract In this paper we
More informationComparison of Alternative Imputation Methods for Ordinal Data
Comparison of Alternative Imputation Methods for Ordinal Data Federica Cugnata Silvia Salini Abstract In this paper, we compare alternative missing imputation methods in the presence of ordinal data, in
More information