Statistical Matching using Fractional Imputation
1 Statistical Matching using Fractional Imputation. Jae-Kwang Kim, Iowa State University. Joint work with Emily Berg and Taesung Park.
2 Outline
1 Introduction
2 Classical Approaches
3 Proposed method
4 Application: Measurement error models
5 Simulation Study
6 Conclusion
Kim (ISU) Matching 2 / 35
3 Introduction: Motivation
Combine information from several surveys. Example: two surveys.
1 Survey A: observe X and Y1
2 Survey B: observe X and Y2
We want to create a data file with X, Y1, Y2. If the Survey B sample is a subset of the Survey A sample, then we may use record linkage techniques to obtain the Y1 values for the Survey B sample. What if the two samples are independent?
4 Introduction
Table: A simple data structure for matching

           X    Y1   Y2
Sample A   o    o
Sample B   o         o
5 Introduction
Table: Data after statistical matching

           X    Y1   Y2
Sample A   o    o    o
Sample B   o    o    o

Also called data fusion or data combination.
6 Introduction: Example 1 (split questionnaire design)
Split the original sample into two groups.
In group 1, ask (x, y1).
In group 2, ask (x, y2).
Often used to reduce the response burden (and improve the quality of the survey responses).
7 Introduction: Example 2 (combining two surveys)
Survey A: health-related survey. Survey B: socio-economic survey.
x: demographic variables; y1: health status variable; y2: socio-economic variable.
Interested in fitting a regression of y1 (e.g. obesity) on x and y2 using the two surveys.
The two samples should be obtained from the same finite population.
9 Introduction: Idea
We want to create Y1 for each element in sample B by finding a statistical twin in sample A.
Often based on the assumption that Y1 and Y2 are conditionally independent given X: Y1 ⊥ Y2 | X.
Under the CI (conditional independence) assumption, we have f(y1 | x, y2) = f(y1 | x), and the statistical twin is determined solely by how close two records are in terms of their x's.
10 Introduction: Remark
Under the assumption that (X, Y1, Y2) are multivariate normal, the CI assumption means that σ12 = σ1x σ2x / σxx and ρ12 = ρ1x ρ2x. That is, σ12 is determined from the other parameters, rather than estimated from the realized samples.
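The forced value of σ12 can be checked by simulation. The sketch below (illustrative coefficients, not from the slides) generates y1 and y2 that are conditionally independent given x and verifies that their covariance comes out as σ1x σ2x / σxx:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Generate (x, y1, y2) with y1 and y2 conditionally independent given x.
x = rng.normal(0.0, 1.0, n)             # sigma_xx = 1
y1 = 2.0 * x + rng.normal(0.0, 1.0, n)  # sigma_1x = 2
y2 = 0.5 * x + rng.normal(0.0, 1.0, n)  # sigma_2x = 0.5

# Under CI, sigma_12 is forced to equal sigma_1x * sigma_2x / sigma_xx = 1.0.
sigma_12 = np.cov(y1, y2)[0, 1]
print(sigma_12)  # close to 1.0
```

Any true σ12 different from this product would be invisible to a CI-based matching procedure.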
11 Existing Methods: methods under the CI assumption
Synthetic data imputation:
1 Estimate f(y1 | x) from sample A, denoted by f̂a(y1 | x).
2 For each element in sample B, use the xi value to create imputed value(s) from f̂a(y1 | x).
Matching (two-step method): instead of using the synthetic values directly for imputation, the synthetic values are used to identify statistical twins in sample A. The identified twin in sample A supplies the imputed value.
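Synthetic data imputation can be sketched for a normal linear working model (a minimal illustration with made-up data; the slides do not prescribe a specific model for f̂a):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical samples: A observes (x, y1); B observes x (its y2 is not used here).
n_a, n_b = 500, 300
x_a = rng.normal(size=n_a)
y1_a = 1.0 + 2.0 * x_a + rng.normal(scale=0.5, size=n_a)
x_b = rng.normal(size=n_b)

# Step 1: estimate f_a(y1 | x) from sample A (normal linear model assumed).
X = np.column_stack([np.ones(n_a), x_a])
beta, *_ = np.linalg.lstsq(X, y1_a, rcond=None)
sigma = (y1_a - X @ beta).std(ddof=2)

# Step 2: for each element of sample B, draw an imputed y1 from the fitted model.
y1_imp = beta[0] + beta[1] * x_b + rng.normal(scale=sigma, size=n_b)
```

The two-step matching variant would instead use `y1_imp` only to locate the closest donor in sample A.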
12 Existing Methods: some popular methods under the CI assumption
Parametric approach: often based on a parametric regression model, ŷ1i = β̂0 + β̂1 xi.
Nonparametric approaches: random hot deck, rank hot deck, distance hot deck.
Reference: D'Orazio, Di Zio, and Scanu (2006), Statistical Matching: Theory and Practice, Wiley.
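Of the nonparametric methods, distance hot deck is the simplest to sketch: each recipient in sample B takes the y1 value of the sample A donor closest in x (toy numbers, single covariate):

```python
import numpy as np

# Donor pool: sample A with (x, y1); recipients: sample B with x only.
x_a = np.array([0.1, 0.5, 0.9, 1.4])
y1_a = np.array([10.0, 11.0, 12.0, 13.0])
x_b = np.array([0.45, 1.35])

# Distance hot deck: each B record receives y1 from its nearest A record in x.
donor_idx = np.abs(x_b[:, None] - x_a[None, :]).argmin(axis=1)
y1_matched = y1_a[donor_idx]
print(y1_matched)  # [11. 13.]
```

Random and rank hot deck differ only in how the donor is chosen (at random within a class, or by matching ranks of x).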
14 New Approach: Motivation
For data matched under CI, the regression of Y1 on X and Y2 gives an insignificant regression coefficient on Y2. That is, the p-value for β̂2 will be large in ŷ1 = β̂0 + β̂1 x + β̂2 y2.
The CI assumption is often unrealistic! For example:
1 X is often a demographic variable.
2 Y1 is a social-behavior (or public health) variable.
3 Y2 is an economic variable (e.g. household income).
In this case, we may have Corr(Y1, Y2 | X) ≠ 0.
15 New Approach: Alternative interpretation
We can view the problem as an omitted-variable regression problem:
y1 = β0^(1) + β1^(1) x + β2^(1) z + e1
y2 = β0^(2) + β1^(2) x + β2^(2) z + e2
where z, e1, e2 are never observed, and e1 and e2 are independent. Here z is an unobservable confounding factor that explains Cov(y1, y2 | x) ≠ 0. Thus, if we fit a regression of (y1, y2) on x, the error terms are still correlated.
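The residual correlation left behind by the omitted z can be seen directly in a simulation (coefficients are illustrative, all set to 1 except the x-effect on y2):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Latent confounder z induces correlation between y1 and y2 even after
# conditioning on x.
x = rng.normal(size=n)
z = rng.normal(size=n)
y1 = 1.0 * x + 1.0 * z + rng.normal(size=n)
y2 = 0.5 * x + 1.0 * z + rng.normal(size=n)

# Regress y1 and y2 on x only; the residuals remain correlated through z.
X = np.column_stack([np.ones(n), x])
r1 = y1 - X @ np.linalg.lstsq(X, y1, rcond=None)[0]
r2 = y2 - X @ np.linalg.lstsq(X, y2, rcond=None)[0]
rho = np.corrcoef(r1, r2)[0, 1]
print(rho)  # about 0.5, not 0
```

With unit variances throughout, the residual correlation is Var(z) / (Var(z) + Var(e)) = 0.5, which is exactly what CI-based matching would wrongly set to zero.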
16 New Approach: Instrumental variable
Under the CI assumption, imputed values are generated from f(y1 | x), which completely ignores the observed information in y2.
Let's try to generate imputed values from f(y1 | x, y2) instead. However, we cannot estimate the parameters in f(y1 | x, y2) directly.
Use an instrumental variable assumption for identification of the model.
17 New Approach: Idea
Decompose X = (X1, X2) such that
(i) f(y1 | x1, x2, y2) = f(y1 | x1, y2);
(ii) f(y1 | x1, x2 = a) ≠ f(y1 | x1, x2 = b) for some a ≠ b.
X2 is often called an instrumental variable (IV) for Y2.
18 New Approach: Proposed method
Under the IV assumption, f(y1 | x, y2) ∝ f(y1 | x) f(y2 | x1, y1).
The second term can be ignored under the CI assumption; it is what incorporates the observed information in y2 in sample B.
An EM algorithm can be used to perform the parameter estimation and prediction simultaneously. The E-step can be computationally heavy (Markov chain Monte Carlo).
Metropolis-Hastings algorithm:
1 Generate y1* from f̂a(y1 | x).
2 Accept y1* if f(y2 | x1, y1*; θ̂) is large at the current parameter value θ̂.
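The two MH steps above amount to an independence sampler with proposal f̂a(y1 | x): because the proposal cancels that factor of the target, the acceptance ratio involves only f(y2 | x1, y1). A minimal sketch, assuming a normal working model for f(y2 | x1, y1; θ) and made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_f_y2_given(y2, x1, y1, theta):
    # Working model f(y2 | x1, y1; theta): normal, mean b0 + b1*x1 + b2*y1, sd s.
    b0, b1, b2, s = theta
    mu = b0 + b1 * x1 + b2 * y1
    return -0.5 * ((y2 - mu) / s) ** 2 - np.log(s)

def mh_impute_y1(x, x1, y2, theta, draw_from_fa, n_iter=1000):
    """Independence Metropolis-Hastings for f(y1 | x, y2): propose from
    f_a(y1 | x); accept with ratio f(y2 | x1, y1_new) / f(y2 | x1, y1_cur)."""
    y1 = draw_from_fa(x)
    chain = []
    for _ in range(n_iter):
        y1_new = draw_from_fa(x)
        log_ratio = (log_f_y2_given(y2, x1, y1_new, theta)
                     - log_f_y2_given(y2, x1, y1, theta))
        if np.log(rng.uniform()) < log_ratio:
            y1 = y1_new
        chain.append(y1)
    return np.array(chain)

# Toy use: f_a(y1 | x) is N(x, 1); theta = (0, 0.5, 0.8, 1); one B record.
draw = lambda x: rng.normal(x, 1.0)
chain = mh_impute_y1(x=0.0, x1=0.0, y2=2.0,
                     theta=(0.0, 0.5, 0.8, 1.0), draw_from_fa=draw)
```

With y2 = 2 and b2 = 0.8, accepted draws concentrate above the prior mean 0, reflecting how the observed y2 pulls the imputed y1.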
19 New Approach: Proposed method
Parametric fractional imputation (PFI) of Kim (2011) is an alternative computational tool that does not involve MCMC but still implements the EM algorithm with an intractable E-step.
PFI uses importance sampling: when the target distribution is f(y1 | x, y2) ∝ f(y1 | x) f(y2 | x1, y1), first generate m values y1* ~ f(y1 | x) and then use a normalized version of f(y2 | x1, y1*) as the weight assigned to each y1*.
Solve the weighted score equation to update the parameters in the M-step.
20 New Approach: Proposed method — parametric fractional imputation
1 For each i ∈ B, generate m imputed values of y1, denoted y1i^(1), ..., y1i^(m), from f̂a(y1 | xi).
2 Let θ̂t be the current value of θ in f(y2 | x1, y1). For the j-th imputed value y1i^(j), assign fractional weight
w*ij ∝ f(y2i | x1i, y1i^(j); θ̂t), where Σ_{j=1}^m w*ij = 1.
3 Solve the fractionally imputed score equation for θ,
Σ_{i∈B} wib Σ_{j=1}^m w*ij S(θ; x1i, y1i^(j), y2i) = 0,
to obtain θ̂t+1, where S(θ; x1, y1, y2) = ∂ log f(y2 | x1, y1; θ)/∂θ.
4 Go to step 2 and continue until convergence.
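The four steps can be sketched end to end for a toy normal model (a minimal sketch, not the authors' implementation; sample sizes and coefficients are made up, and x2 is excluded from the y2 model so the IV assumption holds and θ is identified):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 1000, 100   # sample size per survey; imputations per B record

# Hypothetical data obeying the IV assumption: x2 enters y1 but not y2.
def gen(n):
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    y1 = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    y2 = 0.5 * x1 + 0.8 * y1 + rng.normal(size=n)
    return x1, x2, y1, y2

x1a, x2a, y1a, _ = gen(n)   # sample A observes (x, y1)
x1b, x2b, _, y2b = gen(n)   # sample B observes (x, y2)

# Step 1: estimate f_a(y1 | x) from sample A, draw m imputations per B record.
Xa = np.column_stack([np.ones(n), x1a, x2a])
a, *_ = np.linalg.lstsq(Xa, y1a, rcond=None)
s1 = (y1a - Xa @ a).std(ddof=3)
mu_b = a[0] + a[1] * x1b + a[2] * x2b
y1_imp = mu_b[:, None] + s1 * rng.normal(size=(n, m))

# Stacked design for f(y2 | x1, y1; theta): one row per (record, imputation).
Z = np.column_stack([np.ones(n * m), np.repeat(x1b, m), y1_imp.ravel()])
y2_rep = np.repeat(y2b, m)

# Steps 2-4 iterated: E-step recomputes fractional weights, M-step solves
# the weighted score equation (weighted least squares for a normal model).
b, s2 = np.zeros(3), 1.0
for _ in range(100):
    logw = (-0.5 * ((y2_rep - Z @ b) / s2) ** 2).reshape(n, m)
    logw -= logw.max(axis=1, keepdims=True)     # stabilize before exponentiating
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)           # normalize over j for each i
    sw = np.sqrt(w.ravel())
    b, *_ = np.linalg.lstsq(Z * sw[:, None], y2_rep * sw, rcond=None)
    s2 = np.sqrt(np.sum(w.ravel() * (y2_rep - Z @ b) ** 2) / n)

print(b)  # roughly (0, 0.5, 0.8)
```

Design weights wib are taken as equal here; with unequal weights they would multiply the fractional weights in the M-step.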
21 Remark
Fractional imputation can be understood as a tool for computing a Monte Carlo approximation of a conditional expectation given the observed data.
The fractionally imputed data file can be used to estimate many different parameters. That is, if a parameter η is defined as the solution to E{U(η; X, Y1, Y2)} = 0, then a consistent estimator of η can be obtained as the solution to
Σ_{i∈B} wib Σ_{j=1}^m w*ij U(η; xi, y1i^(j), y2i) = 0.
Note that this estimating equation is a Monte Carlo approximation to
Σ_{i∈B} wib E{U(η; xi, Y1i, y2i) | xi, y2i} = 0.
For variance estimation, a linearization method can be used (skipped here).
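For example, taking U(η; x, y1, y2) = y1 − η estimates the mean of y1: the weighted estimating equation reduces to a doubly weighted mean of the imputed values. A deterministic toy file (made-up weights and imputations) shows the arithmetic:

```python
import numpy as np

# Toy fractionally imputed file for two B records, m = 2 imputations each.
y1_imp = np.array([[1.0, 3.0],
                   [2.0, 4.0]])          # imputed y1^(j) values
w = np.array([[0.75, 0.25],
              [0.50, 0.50]])             # fractional weights, rows sum to 1
w_design = np.array([1.0, 1.0])          # design weights w_ib (equal here)

# Solve sum_i w_ib sum_j w*_ij (y1^(j) - eta) = 0:
# eta_hat is the doubly weighted mean of the imputed values.
eta_hat = np.sum(w_design[:, None] * w * y1_imp) / w_design.sum()
print(eta_hat)  # 2.25
```

The same file, with the same weights, could equally be used to solve any other estimating equation U, which is the practical appeal of releasing fractional weights with the data.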
23 Application to Measurement Error Models
Interested in estimating θ in f(y | x; θ). Instead of observing x, we observe z, which can be highly correlated with x.
Thus, z is an instrumental variable for x: f(y | x, z) = f(y | x) and f(y | z = a) ≠ f(y | z = b) for some a ≠ b.
In addition to the original sample, we have a separate calibration sample that observes (xi, zi).
24 Example: Measurement error model
Table: External calibration study

           Z    X    Y
Sample A   o    o
Sample B   o         o

Table: Internal calibration study

                            Z    X    Y
Validation subsample        o    o    o
Non-validation subsample    o         o
25 Remark
Internal calibration study: two-phase sampling structure.
Phase one: observe (z, y).
Phase two (validation subsample): observe x in addition to (z, y).
Imputation approach for two-phase sampling: estimate f(x | z, y) from the second-phase sample; for the elements in the phase-one sample, generate x* ~ f̂(x | z, y).
For an external calibration study, we use the proposed statistical matching technique under the assumption that f(y | x, z) = f(y | x).
26 Proposed method: Idea
In sample B, x is a latent variable (a variable that is always missing). The goal is to generate x in sample B from
f(xi | zi, yi) ∝ f(xi | zi) f(yi | xi, zi) = f(xi | zi) f(yi | xi).
Obtain a consistent estimator f̂a(x | z) from sample A.
May use a Monte Carlo EM algorithm:
E-step: generate xi^(1), ..., xi^(m) from f(xi | zi, yi; θ̂(t)) ∝ f̂a(xi | zi) f(yi | xi; θ̂(t)).
M-step: solve the imputed score equation for θ.
27 Fractional imputation for the EM algorithm
The above E-step may be computationally challenging (it often relies on an MCMC method). Parametric fractional imputation can be used for easier computation.
E-step:
1 Generate xi^(1), ..., xi^(m) from f̂a(xi | zi) for i ∈ B.
2 Compute the fractional weight associated with xi^(j):
w*ij ∝ f(yi | xi^(j); θ̂(t)), with Σ_j w*ij = 1.
M-step: solve the weighted score equation for θ.
29 Simulation Setup
Measurement error model setup:
yi ~ Bernoulli(pi), logit(pi) = γ0 + γx xi
zi = β0 + β1 xi + ui
ui ~ N(0, σ² xi^{2α}) and xi ~ N(μx, σx²).
We observe (xi, zi), i = 1, ..., nA, in sample A. In sample B, instead of observing (xi, yi), we observe (zi, yi).
For the simulation, nA = nB = 800, γ0 = 1, γx = 1, β0 = 0, β1 = 1, σ² = 0.25, α = 0.4, μx = 0, and σx² = 1.
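The setup, and the bias the Naive estimator suffers, can be reproduced in a few lines (a sketch, not the paper's code; since μx = 0 allows negative x, the variance term x^{2α} is taken as |x|^{2α} here, which is an assumption, and a larger n is used to make the attenuation visible):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
gamma0, gamma_x = 1.0, 1.0
beta0, beta1, sigma2, alpha = 0.0, 1.0, 0.25, 0.4

# Generate data following the simulation setup.
x = rng.normal(0.0, 1.0, n)
u = rng.normal(0.0, np.sqrt(sigma2 * np.abs(x) ** (2 * alpha)))
z = beta0 + beta1 * x + u
p = 1.0 / (1.0 + np.exp(-(gamma0 + gamma_x * x)))
y = rng.binomial(1, p)

def logistic_fit(X, y, iters=25):
    """Logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

# Naive estimator: logistic regression of y on the error-prone z.
naive = logistic_fit(np.column_stack([np.ones(n), z]), y)
print(naive[1])  # attenuated below the true gamma_x = 1
```

Substituting z for x attenuates the slope toward zero, which is the bias the PFI, HDFI, Bayes, and WRC methods are designed to remove.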
30 Methods
1 Parametric fractional imputation (PFI).
2 Hot deck fractional imputation (HDFI).
3 Naive: naive estimator obtained from the logistic regression of yi on zi for i ∈ B.
4 Bayes: proposed by Guo and Little (2011). Gibbs sampling is implemented with JAGS. We used 1000 iterations of a single chain for inference, after discarding the first 500 for burn-in. We specify diffuse proper prior distributions for the Bayes estimators: letting θ1 = (log(σx²), log(σ²), μx, β0, β1, γ0, γx), we assume a priori that θ1 ~ N(0, 10^6 I7), where I7 is the 7 × 7 identity matrix. The prior distribution for the power α is uniform on the interval [−5, 5].
5 Weighted regression calibration (WRC): a regression calibration method incorporating the unequal variance in the measurement error model (also considered in Guo and Little, 2011).
31 Simulation result
Table: Monte Carlo (MC) means, variances, and mean squared errors (MSE) of the point estimators of γx, comparing PFI, HDFI, Naive, Bayes, and WRC. [Numeric entries not preserved in the transcription.]
33 Concluding Remarks
Statistical matching is a tool for survey data integration. The current practice of statistical matching is based on the conditional independence assumption, which may not be realistic in practice.
A new approach based on an instrumental variable is proposed. The proposed method provides statistically valid regression coefficients for the matched data even when the CI assumption does not hold.
Variance estimation is possible (not covered here).
Directly applicable to measurement error model problems and split questionnaire design problems.
34 Future research
Semiparametric inference, by making f̂a(y1 | x) nonparametric in f(y1 | x, y2) ∝ f(y1 | x) f(y2 | x1, y1).
Application to causal inference: estimation of the average treatment effect from observational studies, where we cannot observe the counterfactual outcomes.
Combination of two data sources: one from probability sampling and the other from a non-probability sample.
35 The end
More informationDATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R
DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 24 Overview 1 Background to the book 2 Crack growth example 3 Contents
More informationVariance Estimation in Presence of Imputation: an Application to an Istat Survey Data
Variance Estimation in Presence of Imputation: an Application to an Istat Survey Data Marco Di Zio, Stefano Falorsi, Ugo Guarnera, Orietta Luzi, Paolo Righi 1 Introduction Imputation is the commonly used
More informationComputer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models
Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall
More informationIntroduction to Mplus
Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus
More informationRolling Markov Chain Monte Carlo
Rolling Markov Chain Monte Carlo Din-Houn Lau Imperial College London Joint work with Axel Gandy 4 th July 2013 Predict final ranks of the each team. Updates quick update of predictions. Accuracy control
More informationA GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM
A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM Jayawant Mandrekar, Daniel J. Sargent, Paul J. Novotny, Jeff A. Sloan Mayo Clinic, Rochester, MN 55905 ABSTRACT A general
More informationReddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011
Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions
More informationPerformance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018
Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:
More informationWeek 10: Heteroskedasticity II
Week 10: Heteroskedasticity II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Dealing with heteroskedasticy
More informationMissing data analysis. University College London, 2015
Missing data analysis University College London, 2015 Contents 1. Introduction 2. Missing-data mechanisms 3. Missing-data methods that discard data 4. Simple approaches that retain all the data 5. RIBG
More informationApproximate Bayesian Computation. Alireza Shafaei - April 2016
Approximate Bayesian Computation Alireza Shafaei - April 2016 The Problem Given a dataset, we are interested in. The Problem Given a dataset, we are interested in. The Problem Given a dataset, we are interested
More informationBootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping
Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,
More informationCIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]
CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.
More informationScalable Bayes Clustering for Outlier Detection Under Informative Sampling
Scalable Bayes Clustering for Outlier Detection Under Informative Sampling Based on JMLR paper of T. D. Savitsky Terrance D. Savitsky Office of Survey Methods Research FCSM - 2018 March 7-9, 2018 1 / 21
More informationLatent variable transformation using monotonic B-splines in PLS Path Modeling
Latent variable transformation using monotonic B-splines in PLS Path Modeling E. Jakobowicz CEDRIC, Conservatoire National des Arts et Métiers, 9 rue Saint Martin, 754 Paris Cedex 3, France EDF R&D, avenue
More informationRolling Markov Chain Monte Carlo
Rolling Markov Chain Monte Carlo Din-Houn Lau Imperial College London Joint work with Axel Gandy 4 th September 2013 RSS Conference 2013: Newcastle Output predicted final ranks of the each team. Updates
More informationBayesian Statistics Group 8th March Slice samplers. (A very brief introduction) The basic idea
Bayesian Statistics Group 8th March 2000 Slice samplers (A very brief introduction) The basic idea lacements To sample from a distribution, simply sample uniformly from the region under the density function
More informationVariability in Annual Temperature Profiles
Variability in Annual Temperature Profiles A Multivariate Spatial Analysis of Regional Climate Model Output Tamara Greasby, Stephan Sain Institute for Mathematics Applied to Geosciences, National Center
More informationMotivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background
An Introduction to Multiple Imputation and its Application Craig K. Enders University of California - Los Angeles Department of Psychology cenders@psych.ucla.edu Background Work supported by Institute
More informationHierarchical Bayesian Modeling with Ensemble MCMC. Eric B. Ford (Penn State) Bayesian Computing for Astronomical Data Analysis June 12, 2014
Hierarchical Bayesian Modeling with Ensemble MCMC Eric B. Ford (Penn State) Bayesian Computing for Astronomical Data Analysis June 12, 2014 Simple Markov Chain Monte Carlo Initialise chain with θ 0 (initial
More informationDATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R
DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 25 Overview 1 Background to the book 2 A motivating example from my own
More informationExam Issued: May 29, 2017, 13:00 Hand in: May 29, 2017, 16:00
P. Hadjidoukas, C. Papadimitriou ETH Zentrum, CTL E 13 CH-8092 Zürich High Performance Computing for Science and Engineering II Exam Issued: May 29, 2017, 13:00 Hand in: May 29, 2017, 16:00 Spring semester
More informationBayes Estimators & Ridge Regression
Bayes Estimators & Ridge Regression Readings ISLR 6 STA 521 Duke University Merlise Clyde October 27, 2017 Model Assume that we have centered (as before) and rescaled X o (original X) so that X j = X o
More information