Missing Data and Imputation
- Sharon Norton
Hoff Chapter 7, GH Chapter 25
April 21, 2017
Bednets and Malaria
- Y: presence or absence of parasites in a blood smear
- AGE: age of child
- BEDNET: bed net use (exposure)
- GREEN: greenness of the surrounding vegetation based on satellite photography
- PHC: whether a village is part of a primary health-care system
Bednets and Malaria
    malaria = read.csv("gambia.dat", header=TRUE)
    summary(malaria)

           Y                AGE             BEDNET            GREEN
    Min.   :0.0000   Min.   :1.000   Min.   :0.0000   Min.   :28.85
    1st Qu.:  --     1st Qu.:1.000   1st Qu.:  --     1st Qu.:40.85
    Median :0.0000   Median :2.000   Median :1.0000   Median :40.85
    Mean   :0.3093   Mean   :2.399   Mean   :0.7049   Mean   :39.84
    3rd Qu.:  --     3rd Qu.:3.000   3rd Qu.:  --     3rd Qu.:40.85
    Max.   :1.0000   Max.   :4.000   Max.   :1.0000   Max.   :47.65
                                     NA's   :317

(Entries shown as -- and the fifth column, PHC, are cut off in the source.)
39% missing
More about missingness
Consider:
- Probability of missingness: are certain groups more likely to have missing data?
- Are certain responses more likely to be missing? (e.g., individuals with high income are more likely not to report it.) Then the probability of missingness depends on the value of the outcome.
- The analysis depends on assumptions about missingness.
Mechanisms for Missingness
- Missing Completely at Random (MCAR): missingness does not depend on the outcome or on other variables
- Missing at Random (MAR): missingness does not depend on the value of the variable itself, but may depend on other variables
- Missing Not at Random (MNAR): missingness depends on the value of the variable that is missing
- Which mechanism holds cannot be told from the data
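The three mechanisms can be illustrated with a small simulation. This is a sketch on made-up data, not the Gambia data: the covariate `x`, the variable `y`, and the logistic missingness probabilities are all assumptions chosen for illustration.

```r
# Simulated illustration of MCAR, MAR and MNAR (all quantities hypothetical).
set.seed(42)
n <- 5000
x <- rnorm(n)              # fully observed covariate
y <- 2 + x + rnorm(n)      # variable that will have missing values

# Missingness indicators under the three mechanisms
miss_mcar <- runif(n) < 0.4                           # ignores x and y
miss_mar  <- runif(n) < plogis(-0.5 + 1.5 * x)        # depends on observed x only
miss_mnar <- runif(n) < plogis(-0.5 + 1.5 * (y - 2))  # depends on y itself

# Complete-case means of y under each mechanism
cc_means <- c(full = mean(y),
              MCAR = mean(y[!miss_mcar]),
              MAR  = mean(y[!miss_mar]),
              MNAR = mean(y[!miss_mnar]))
print(round(cc_means, 2))

# Under MAR, conditioning on x (here: regressing y on x among complete
# cases) recovers the slope of 1 used to simulate the data.
slope_mar <- unname(coef(lm(y ~ x, subset = !miss_mar))["x"])
print(round(slope_mar, 2))
```

In this setup the MCAR complete-case mean stays close to the full-data mean, while the MAR and MNAR complete-case means are pulled downward because large values are more likely to be dropped. Only in the MAR case is the cause of the missingness (`x`) available to adjust for.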
Modeling
- Delete subjects with any missing observations
    - This would remove 39% of the data and reduce power
    - Induces bias if data are not missing completely at random!
- Replace each missing value with an estimated mean (plug-in approach)
    - This implies that we are certain about the values of the missing cases, so any measures of uncertainty in parameter estimates are overly optimistic (too narrow)
    - Distorts the correlation structure in the data
- Work with likelihoods based on observed data; this will be a product of marginal distributions, which is difficult to work with
- Model-based methods
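The two drawbacks of the plug-in approach can be seen on simulated data. The sketch below is illustrative only: `x`, `y`, and the 39% MCAR missingness rate (borrowed from the slide) are assumptions, not the malaria data.

```r
# Mean (plug-in) imputation on simulated data: spread and correlation shrink.
set.seed(7)
n <- 1000
x <- rnorm(n)
y <- 1 + 0.8 * x + rnorm(n, sd = 0.6)

miss  <- runif(n) < 0.39                      # 39% missing completely at random
y_obs <- ifelse(miss, NA, y)
y_imp <- ifelse(miss, mean(y_obs, na.rm = TRUE), y_obs)  # fill in the observed mean

# Standard deviation and naive standard error of the mean
sd_obs <- sd(y_obs, na.rm = TRUE)
sd_imp <- sd(y_imp)
se_obs <- sd_obs / sqrt(sum(!miss))           # uses observed cases only
se_imp <- sd_imp / sqrt(n)                    # pretends all n values were observed

# Correlation with x before and after imputation
cor_obs <- cor(x[!miss], y_obs[!miss])
cor_imp <- cor(x, y_imp)

print(round(c(sd_obs = sd_obs, sd_imp = sd_imp,
              se_obs = se_obs, se_imp = se_imp,
              cor_obs = cor_obs, cor_imp = cor_imp), 3))
```

The imputed column has a smaller standard deviation, a smaller naive standard error, and a weaker correlation with `x` than the observed cases, which is the "too narrow" uncertainty and distorted correlation structure described above.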
Observed Data
$(Y_{i,1}, Y_{i,2}, Y_{i,3}, Y_{i,4}, Y_{i,5})$ and $(O_{i,1}, O_{i,2}, O_{i,3}, O_{i,4}, O_{i,5})$, where
- $O_{i,j}$ is 1 if $Y_{i,j}$ is observed
- $O_{i,j}$ is 0 if $Y_{i,j}$ is missing
Missing at Random Data:
- $O_i$ and $Y_i$ are independent given $\theta$
- the distribution for $O_i$ does not depend on $\theta$
Marginal model for observed data:
$$p(o_i, y[o_i = 1] \mid \theta) = p(o_i)\, p(y[o_i = 1] \mid \theta) = p(o_i) \int p(y_{i,1}, y_{i,2}, y_{i,3}, y_{i,4}, y_{i,5} \mid \theta) \prod_{y_{i,j}:\, o_{i,j} = 0} dy_{i,j}$$
Integrate over the missing variables to obtain the likelihood.
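For a multivariate normal model the integral over the missing coordinates has a closed form: the marginal of the observed coordinates is again normal, with the corresponding sub-vector of the mean and sub-matrix of the covariance. The sketch below assumes such a normal model (the slides do not fix one here); `ldmvn` and `obs_loglik` are hypothetical helper names, and the small data matrix is made up.

```r
# Observed-data log-likelihood for a multivariate normal with NA entries.
# Log density of N(mu, Sigma) at y, written out in base R.
ldmvn <- function(y, mu, Sigma) {
  d <- length(y)
  r <- y - mu
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -0.5 * (d * log(2 * pi) + logdet + sum(r * solve(Sigma, r)))
}

# Sum over rows of the log marginal density of the observed coordinates.
obs_loglik <- function(Y, mu, Sigma) {
  total <- 0
  for (i in seq_len(nrow(Y))) {
    o <- !is.na(Y[i, ])                       # o[j] is TRUE if Y[i, j] is observed
    if (any(o)) {
      total <- total + ldmvn(Y[i, o], mu[o], Sigma[o, o, drop = FALSE])
    }
  }
  total
}

mu    <- c(0, 1)
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
Y     <- rbind(c(0.3, NA), c(-1.2, 0.4), c(NA, 2.0))
ll    <- obs_loglik(Y, mu, Sigma)

# Check for the first row: integrating the joint density over the missing
# second coordinate gives the marginal density of the first coordinate.
by_marginal <- dnorm(0.3, mu[1], sqrt(Sigma[1, 1]))
by_integral <- integrate(
  function(v) sapply(v, function(y2) exp(ldmvn(c(0.3, y2), mu, Sigma))),
  -Inf, Inf)$value
print(c(by_marginal = by_marginal, by_integral = by_integral))
```

Outside the normal case the integral usually has no closed form, which is one reason the slides turn to simulation next.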
Use the Gibbs Sampler to Integrate
If we had complete data, we would draw $\theta$ from the conditional distribution of $\theta \mid Y$ (see class, for sampling $\mu$ and $\Sigma$).
Add a step at each iteration to generate the missing data:
- Generate $Y^{(t+1)}_{miss}$ from $p(Y_{miss} \mid Y_{obs}, \theta^{(t)})$ and fill in the missing data to obtain a complete matrix $Y$ from $Y_{obs}$ and $Y_{miss}$
- Generate $\theta^{(t+1)}$ from $p(\theta \mid Y_{obs}, Y^{(t+1)}_{miss})$
Averaging over the draws of $Y_{miss}$ marginalizes (integrates) over the missing dimensions.
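The two-step scheme can be sketched for a deliberately simplified case: a bivariate normal with known covariance, a flat prior on the mean, and values missing only in the second column. These simplifications, the simulated data, and all variable names are assumptions for illustration; the full model in the slides also samples $\Sigma$.

```r
# Gibbs sampler with a data-augmentation step (simplified sketch:
# known covariance, flat prior on mu, missing values in column 2 only).
set.seed(1)
n       <- 200
rho     <- 0.8
Sigma   <- matrix(c(1, rho, rho, 1), 2, 2)   # known, unit variances
mu_true <- c(1, 2)
L <- chol(Sigma)                             # t(L) %*% L equals Sigma
Y <- matrix(rnorm(2 * n), n, 2) %*% L + matrix(mu_true, n, 2, byrow = TRUE)

# MAR: the chance that column 2 is missing depends on the observed column 1
miss  <- runif(n) < plogis(Y[, 1] - 1)
Y_obs <- Y
Y_obs[miss, 2] <- NA

S     <- 2000
mu    <- c(0, 0)                             # starting value
Yfull <- Y_obs
draws <- matrix(NA_real_, S, 2)
for (s in 1:S) {
  # Step 1: draw Y_miss from p(Y_miss | Y_obs, mu), a conditional normal
  cond_mean <- mu[2] + rho * (Y_obs[miss, 1] - mu[1])
  Yfull[miss, 2] <- rnorm(sum(miss), cond_mean, sqrt(1 - rho^2))
  # Step 2: draw mu from p(mu | Y_obs, Y_miss) = N(ybar, Sigma / n)
  ybar <- colMeans(Yfull)
  mu   <- ybar + as.vector(t(L) %*% rnorm(2)) / sqrt(n)
  draws[s, ] <- mu
}

post_mean <- colMeans(draws[-(1:500), ])     # discard burn-in
cc_mean   <- mean(Y_obs[, 2], na.rm = TRUE)  # complete-case mean of column 2
print(round(c(post_mean_2 = post_mean[2], cc_mean_2 = cc_mean), 2))
```

The posterior mean for the second component is computed from the `mu` draws alone; the imputed values are discarded after each iteration, which is what averaging over the draws of the missing data amounts to in practice.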
JAGS Model
    model = function() {
      for (i in 1:N) {
        Y[i] ~ dbern(p[i])
        logit(p[i]) <- alpha + betaage*AGE[i] + betabednet*BEDNET[i] +
                       betagreen*GREEN[i] + betaphc*PHC[i]
      }
      # model for missing exposure variable
      for (i in 1:N) {
        BEDNET[i] ~ dbern(q)  # prior model for whether or not child sleeps under treated bednet
      }
      # uniform prior on prob of sleeping under bednet
      q ~ dbeta(1, 1)
      # vague priors on regression coefficients
      # (the precision arguments are missing in the source)
      alpha ~ dnorm(0, )
      betaage ~ dnorm(0, )
      betabednet ~ dnorm(0, )
      betagreen ~ dnorm(0, )
      betaphc ~ dnorm(0, )
      # calculate odds ratios of interest
      ORbednet <- exp(betabednet)  # OR of malaria for children using bednet
    }
Posterior Density
    theta = as.data.frame(sim$BUGSoutput$sims.matrix)
    plot(density(theta[,1]), xlab="OR Bednet", main="")
[Figure: posterior density of OR Bednet]
JAGS Model
    model2 = function() {
      for (i in 1:N) {
        Y[i] ~ dbern(p[i])
        logit(p[i]) <- alpha + betaage*AGE[i] + betabednet*BEDNET[i] +
                       betagreen*GREEN[i] + betaphc*PHC[i]
      }
      # model for missing exposure variable
      for (i in 1:N) {
        BEDNET[i] ~ dbern(q[i])                  # prior model for bednet use
        logit(q[i]) <- gamma[1] + gamma[2]*PHC[i]  # allow prob to depend on PHC
      }
      # vague priors on regression coefficients
      # (the precision arguments are missing in the source)
      gamma[1] ~ dnorm(0, )
      gamma[2] ~ dnorm(0, )
      alpha ~ dnorm(0, )
      betaage ~ dnorm(0, )
      betabednet ~ dnorm(0, )
      betagreen ~ dnorm(0, )
      betaphc ~ dnorm(0, )
      # calculate odds ratios of interest
      ORbednet <- exp(betabednet)  # OR of malaria for children using bednet
    }
Posterior Density
    thetaphc = as.data.frame(simphc$BUGSoutput$sims.matrix)
    plot(density(thetaphc[,1]), xlab="OR Malaria Bednet", main="")
[Figure: posterior density of OR Malaria Bednet]
Posterior Density
    plot(density(thetaphc[,"ORbednetPHC"]), xlab="OR BEDNET PHC", main="")
[Figure: posterior density of OR BEDNET PHC]
intervals
    exp(confint(glm(Y ~ ., data=malaria, family=binomial), parm="BEDNET"))
     2.5 %  97.5 %

    HPDinterval(as.mcmc(theta))
               lower  upper
    ORbednet
    betabednet
    deviance
    attr(,"probability")
    [1] 0.95

    HPDinterval(as.mcmc(thetaphc))
                lower  upper
    ORbednet
    ORbednetPHC
    deviance
    attr(,"probability")

(The numeric interval endpoints are missing in the source.)
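An HPD interval is the shortest interval containing a given posterior probability, so for a skewed quantity such as an odds ratio it differs from the equal-tailed quantile interval. The base-R sketch below shows the idea on hypothetical draws; `hpd_interval` is an illustrative helper, not coda's implementation, and the simulated `or_draws` are not output from the models above.

```r
# Shortest interval containing roughly `prob` of the posterior draws.
hpd_interval <- function(draws, prob = 0.95) {
  x <- sort(draws)
  n <- length(x)
  k <- max(1, min(n - 1, floor(prob * n)))   # index gap spanned by the interval
  widths <- x[(k + 1):n] - x[1:(n - k)]      # width of every candidate interval
  j <- which.min(widths)                     # the shortest one
  c(lower = x[j], upper = x[j + k])
}

# Hypothetical posterior draws of a log odds ratio, exponentiated to an OR
set.seed(3)
or_draws <- exp(rnorm(10000, mean = -0.5, sd = 0.3))

hpd <- hpd_interval(or_draws)
eti <- quantile(or_draws, c(0.025, 0.975))   # equal-tailed interval
print(round(rbind(HPD = unname(hpd), equal_tailed = unname(eti)), 3))
```

Because the draws are right-skewed, the HPD interval sits lower and is shorter than the equal-tailed interval while covering the same share of the draws.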
More than one variable with missing data
- Model each predictor (joint distribution)
- Coherent sequential model of conditional distributions
- Handle a mix of discrete and continuous variables
- Categorical: continuation ratios are easiest
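One way to write the sequential model, as a sketch under assumed notation: let $X_1, \dots, X_p$ be the predictors with missing values, $Z$ the fully observed covariates, and $\gamma_1, \dots, \gamma_p$ the parameters of the conditional models (these symbols are not in the slides). Any ordering of the predictors gives a valid joint distribution, and each factor can be a different family (logistic for binary, normal for continuous).

```latex
p(x_1, \dots, x_p \mid z, \gamma)
  = p(x_1 \mid z, \gamma_1)\,
    p(x_2 \mid x_1, z, \gamma_2)\,
    \cdots\,
    p(x_p \mid x_1, \dots, x_{p-1}, z, \gamma_p)

% Continuation ratios for a categorical X with levels 1, ..., K:
% model each level given that it has not been passed already.
\pi_k = \Pr(X = k \mid X \ge k), \qquad
\Pr(X = k) = \pi_k \prod_{l < k} (1 - \pi_l), \qquad
\pi_K = 1
```

Each $\pi_k$ is the success probability of a binary variable, so a categorical predictor reduces to a sequence of Bernoulli models of the same kind already used for BEDNET.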
Missing Not at Random
- The probability of missingness depends on the (unobserved) value of the predictor
- Need to model the missingness indicator and the outcomes jointly
- Model missingness given the variables
- Need more information!
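A common way to set up such a joint model is a selection-model factorization, sketched here with an assumed parameter $\psi$ for the missingness model (the slides do not specify the form):

```latex
p(y_i, o_i \mid \theta, \psi) = p(y_i \mid \theta)\, p(o_i \mid y_i, \psi)

% Observed-data likelihood: the missingness term now sits inside the integral
p(o_i, y[o_i = 1] \mid \theta, \psi)
  = \int p(y_i \mid \theta)\, p(o_i \mid y_i, \psi)
    \prod_{y_{i,j}:\, o_{i,j} = 0} dy_{i,j}
```

Unlike the Missing at Random case, $p(o_i \mid y_i, \psi)$ depends on the missing values and cannot be pulled out of the integral, so the missingness model cannot be ignored. Since $\psi$ is informed only through unobserved values, it generally has to come from outside information, which is the sense of "need more information".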
Summary
- Make sure you know how missing data are coded!
- Think about why they are missing; e.g., if there is no garage then there can be no garage condition
- Joint models require understanding more about the data and the reasons for missingness, and more sophisticated modelling
More informationWill Monroe July 21, with materials by Mehran Sahami and Chris Piech. Joint Distributions
Will Monroe July 1, 017 with materials by Mehran Sahami and Chris Piech Joint Distributions Review: Normal random variable An normal (= Gaussian) random variable is a good approximation to many other distributions.
More informationOrganizing data in R. Fitting Mixed-Effects Models Using the lme4 Package in R. R packages. Accessing documentation. The Dyestuff data set
Fitting Mixed-Effects Models Using the lme4 Package in R Deepayan Sarkar Fred Hutchinson Cancer Research Center 18 September 2008 Organizing data in R Standard rectangular data sets (columns are variables,
More informationISyE8843A, Brani Vidakovic Handout 14
ISyE8843A, Brani Vidakovic Handout 4 BUGS BUGS is freely available software for constructing Bayesian statistical models and evaluating them using MCMC methodology. BUGS and WINBUGS are distributed freely
More informationGraphical Models, Bayesian Method, Sampling, and Variational Inference
Graphical Models, Bayesian Method, Sampling, and Variational Inference With Application in Function MRI Analysis and Other Imaging Problems Wei Liu Scientific Computing and Imaging Institute University
More informationLecture 12. August 23, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
Lecture 12 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University August 23, 2007 1 2 3 4 5 1 2 Introduce the bootstrap 3 the bootstrap algorithm 4 Example
More informationProblem Set 4. Assigned: March 23, 2006 Due: April 17, (6.882) Belief Propagation for Segmentation
6.098/6.882 Computational Photography 1 Problem Set 4 Assigned: March 23, 2006 Due: April 17, 2006 Problem 1 (6.882) Belief Propagation for Segmentation In this problem you will set-up a Markov Random
More informationOutline. Bayesian Data Analysis Hierarchical models. Rat tumor data. Errandum: exercise GCSR 3.11
Outline Bayesian Data Analysis Hierarchical models Helle Sørensen May 15, 2009 Today: More about the rat tumor data: model, derivation of posteriors, the actual computations in R. : a hierarchical normal
More informationPredict Outcomes and Reveal Relationships in Categorical Data
PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,
More informationA Nonparametric Bayesian Approach to Detecting Spatial Activation Patterns in fmri Data
A Nonparametric Bayesian Approach to Detecting Spatial Activation Patterns in fmri Data Seyoung Kim, Padhraic Smyth, and Hal Stern Bren School of Information and Computer Sciences University of California,
More informationoptimization_machine_probit_bush106.c
optimization_machine_probit_bush106.c. probit ybush black00 south hispanic00 income owner00 dwnom1n dwnom2n Iteration 0: log likelihood = -299.27289 Iteration 1: log likelihood = -154.89847 Iteration 2:
More informationSelf Lane Assignment Using Smart Mobile Camera For Intelligent GPS Navigation and Traffic Interpretation
For Intelligent GPS Navigation and Traffic Interpretation Tianshi Gao Stanford University tianshig@stanford.edu 1. Introduction Imagine that you are driving on the highway at 70 mph and trying to figure
More informationAn Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics
Practical 1: Getting started in OpenBUGS Slide 1 An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics Dr. Christian Asseburg Centre for Health Economics Practical 1 Getting
More informationA GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM
A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM Jayawant Mandrekar, Daniel J. Sargent, Paul J. Novotny, Jeff A. Sloan Mayo Clinic, Rochester, MN 55905 ABSTRACT A general
More informationMissing Data: What Are You Missing?
Missing Data: What Are You Missing? Craig D. Newgard, MD, MPH Jason S. Haukoos, MD, MS Roger J. Lewis, MD, PhD Society for Academic Emergency Medicine Annual Meeting San Francisco, CA May 006 INTRODUCTION
More informationR Programming Basics - Useful Builtin Functions for Statistics
R Programming Basics - Useful Builtin Functions for Statistics Vectorized Arithmetic - most arthimetic operations in R work on vectors. Here are a few commonly used summary statistics. testvect = c(1,3,5,2,9,10,7,8,6)
More information- 1 - Fig. A5.1 Missing value analysis dialog box
WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationNONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR
NONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR J. D. Maca July 1, 1997 Abstract The purpose of this manual is to demonstrate the usage of software for
More informationStatistical matching: conditional. independence assumption and auxiliary information
Statistical matching: conditional Training Course Record Linkage and Statistical Matching Mauro Scanu Istat scanu [at] istat.it independence assumption and auxiliary information Outline The conditional
More information