Case Study IV: Bayesian clustering of Alzheimer patients


1 Case Study IV: Bayesian clustering of Alzheimer patients
Mike Wiper and Conchi Ausín
Department of Statistics, Universidad Carlos III de Madrid
Advanced Statistics and Data Mining Summer School, 2nd - 6th July, 2018

2 Objective
We illustrate how to use the EM algorithm, Gibbs sampling and the Variational Bayes approximation for clustering Alzheimer patients. We would like to divide the patients into subgroups according to the symptoms they present.

3 Alzheimer data
This data set is included in the BayesLCA R package.
rm(list=ls())
library(BayesLCA)
data("Alzheimer")
The data set contains information about the presence or absence of six symptoms displayed by 240 patients diagnosed with early onset Alzheimer's disease, recorded at the Mercer's Institute of St. James's Hospital in Dublin.
attach(Alzheimer)
par(mfrow=c(2,3))
plot(Hallucination)
plot(Activity)
plot(Aggression)
plot(Agitation)
plot(Diurnal)
plot(Affective)
par(mfrow = c(1, 1))
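Before fitting any model, it can be useful to look at the marginal frequency of each symptom and at how often each symptom pattern occurs. A minimal sketch, assuming the columns of Alzheimer are coded as 0/1 indicators:
# Proportion of patients presenting each of the six symptoms
colMeans(Alzheimer)
# Counts of the most frequent observed symptom patterns
head(sort(table(apply(Alzheimer, 1, paste, collapse = "")), decreasing = TRUE))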

4 Latent class analysis
We wish to obtain K groups of patients according to their symptoms. Thus, for each observation, x = (x_1, ..., x_M), we may assume a K-component mixture of multivariate binary variables with probability distribution:
Pr(x | K, w, θ) = \sum_{k=1}^{K} w_k \prod_{m=1}^{M} θ_{km}^{x_m} (1 - θ_{km})^{(1 - x_m)}
where w = {w_k} and θ = {θ_{km}} for k = 1, ..., K and m = 1, ..., M. We assume the following prior distributions,
w ~ Dirichlet(δ_1, ..., δ_K)
θ_{km} ~ Beta(α_{km}, β_{km})
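As an illustration of this mixture density, here is a minimal R sketch (for exposition only; the function name and parameter values are invented, not part of BayesLCA) that evaluates Pr(x | K, w, θ) for a single binary vector x:
dbinmix <- function(x, w, theta) {
  # x: binary vector of length M; w: vector of K mixture weights;
  # theta: K x M matrix of item probabilities.
  comp <- apply(theta, 1, function(th) prod(th^x * (1 - th)^(1 - x)))
  sum(w * comp)
}
# Example with K = 2 components and M = 3 symptoms
w <- c(0.6, 0.4)
theta <- rbind(c(0.8, 0.2, 0.5), c(0.1, 0.7, 0.3))
dbinmix(c(1, 0, 1), w, theta)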

5 Latent class analysis
Let x = {x_1, ..., x_N} be the sample of N = 240 patients. Each observation x_i = (x_{i1}, ..., x_{iM}) is a vector of M = 6 binary variables representing the presence or absence of each symptom. We assume that there are K = 3 groups of patients and that the prior probability of belonging to group k is w_k. We also assume that, within each group, the symptoms follow independent Bernoulli distributions,
Pr(x_{im} | θ_{km}) = θ_{km}^{x_{im}} (1 - θ_{km})^{(1 - x_{im})}, for m = 1, ..., M.

6 Latent class analysis
We may define a set of latent variables, z = {z_1, ..., z_N}, indicating the group of each patient. The prior probability that the i-th patient belongs to group k is:
Pr(z_i = k | w, θ) = w_k
and, given that the patient is in group k,
Pr(x_i | z_i = k, w, θ) = \prod_{m=1}^{M} θ_{km}^{x_{im}} (1 - θ_{km})^{(1 - x_{im})}
Then, the complete-data likelihood function is
f(x, z | w, θ) = \prod_{i=1}^{N} \prod_{k=1}^{K} [ w_k \prod_{m=1}^{M} θ_{km}^{x_{im}} (1 - θ_{km})^{(1 - x_{im})} ]^{I(z_i = k)}

7 Latent class analysis
Since we are assuming the following prior distributions,
f(w | δ_1, ..., δ_K) ∝ \prod_{k=1}^{K} w_k^{(δ_k - 1)}
f(θ_{km} | α_{km}, β_{km}) ∝ θ_{km}^{(α_{km} - 1)} (1 - θ_{km})^{(β_{km} - 1)}
we can obtain the log-posterior (up to an additive constant),
log f(w, θ | x, z) = \sum_{k=1}^{K} \sum_{i=1}^{N} I(z_i = k) [ log w_k + \sum_{m=1}^{M} log { θ_{km}^{x_{im}} (1 - θ_{km})^{(1 - x_{im})} } ] + \sum_{k=1}^{K} (δ_k - 1) log w_k + \sum_{k=1}^{K} \sum_{m=1}^{M} log { θ_{km}^{(α_{km} - 1)} (1 - θ_{km})^{(β_{km} - 1)} }
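A minimal R sketch of this log-posterior (the function and its arguments are written here for illustration and are not part of BayesLCA); it can be used to check the EM and Gibbs updates below, and assumes x is an N x M binary matrix, z a vector of group labels in 1:K, theta a K x M matrix, and alpha, beta K x M matrices (or scalars) of prior parameters:
log_posterior <- function(x, z, w, theta, delta, alpha, beta) {
  # Complete-data log-likelihood
  ll <- 0
  for (k in seq_along(w)) {
    idx <- which(z == k)
    if (length(idx) > 0) {
      xk <- x[idx, , drop = FALSE]
      ll <- ll + length(idx) * log(w[k]) +
        sum(xk %*% log(theta[k, ]) + (1 - xk) %*% log(1 - theta[k, ]))
    }
  }
  # Log-prior (up to additive constants)
  lp <- sum((delta - 1) * log(w)) +
    sum((alpha - 1) * log(theta) + (beta - 1) * log(1 - theta))
  ll + lp
}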

8 EM algorithm
E step: Calculate E_{z | x, w^{(t)}, θ^{(t)}} [ log f(w, θ | x, z) ], which depends on:
E[ I(z_i = k) | x_i, w^{(t)}, θ^{(t)} ] = Pr( z_i = k | x_i, w^{(t)}, θ^{(t)} )
M step: Maximize the previous expectation:
(w^{(t+1)}, θ^{(t+1)}) = arg max_{w, θ} E_{z | x, w^{(t)}, θ^{(t)}} [ log f(w, θ | x, z) ]

9 EM algorithm
Repeat for t = 0, 1, 2, ... until convergence:
E step:
z_{ik}^{(t+1)} = w_k^{(t)} \prod_{m=1}^{M} Pr(x_{im} | θ_{km}^{(t)}) / \sum_{s=1}^{K} w_s^{(t)} \prod_{m=1}^{M} Pr(x_{im} | θ_{sm}^{(t)})
M step:
w_k^{(t+1)} = ( δ_k + \sum_{i=1}^{N} z_{ik}^{(t+1)} - 1 ) / ( \sum_{s=1}^{K} δ_s + N - K )
θ_{km}^{(t+1)} = ( α_{km} + \sum_{i=1}^{N} z_{ik}^{(t+1)} x_{im} - 1 ) / ( α_{km} + β_{km} + \sum_{i=1}^{N} z_{ik}^{(t+1)} - 2 )
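A minimal R sketch of one iteration of these updates (a hypothetical helper written for illustration, independent of the blca.em implementation), assuming x is an N x M binary matrix, w a length-K vector, theta a K x M matrix, and delta, alpha, beta the prior parameters:
em_step <- function(x, w, theta, delta, alpha, beta) {
  N <- nrow(x); K <- length(w)
  # E step: posterior membership probabilities z[i, k]
  logp <- x %*% t(log(theta)) + (1 - x) %*% t(log(1 - theta))   # N x K
  logp <- sweep(logp, 2, log(w), "+")
  z <- exp(logp - apply(logp, 1, max))
  z <- z / rowSums(z)
  # M step: MAP updates of w and theta
  nk <- colSums(z)
  w_new <- (delta + nk - 1) / (sum(delta) + N - K)
  theta_new <- (alpha + t(z) %*% x - 1) / (alpha + beta + nk - 2)
  list(w = w_new, theta = theta_new, z = z)
}
Iterating em_step from several random starting values and keeping the run with the highest log-posterior mimics what blca.em does with its restarts argument.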

10 EM algorithm for Alzheimer data
We apply the EM algorithm to our Alzheimer data, assuming three groups of patients.
fit.em=blca(Alzheimer, 3, method = "em")
An important difficulty with the EM algorithm is that it may converge to a local maximum or a saddle point. Thus, the algorithm is run from a number of different starting values (5, by default), and the parameter estimates from the run which achieved the highest log-posterior are returned. From only five starts, the algorithm obtains three distinct local maxima of the log-posterior, so it seems sensible to run the algorithm more times.
fit.em=blca.em(Alzheimer, 3, restarts=20)
The algorithm provides MAP estimates of the model parameters:
print(fit.em)

11 EM algorithm for Alzheimer data
Complete information about the prior specification, the EM performance, the log-posterior and the AIC and BIC results is obtained with:
summary(fit.em)
Note that AIC and BIC can be used to select the number of patient groups. The MAP estimates of the class probabilities are:
fit.em$classprob
and the MAP estimates of the item probabilities, conditional on class membership, are:
fit.em$itemprob
These estimates can be visualised with the following plot:
par(mfrow=c(1,1))
plot(fit.em)

12 EM algorithm for Alzheimer data
We may try different prior assumptions:
fit.em=blca.em(Alzheimer, 3, restarts=20, alpha=2, beta=2)
print(fit.em)
plot(fit.em)
fit.em=blca.em(Alzheimer, 3, restarts=20, alpha=0.001, beta=0.001)
print(fit.em)
plot(fit.em)
We may wish to approximate the whole posterior distribution of the model parameters rather than obtain only their MAP values. One possibility is to use Gibbs sampling.

13 Gibbs sampling
Gibbs sampling is an MCMC method that can be used when the full conditional posterior distributions are known. In order to obtain a sample from the joint posterior distribution, f(w, θ, z | x), we sample iteratively from the conditional posterior distributions:
1. Sample θ^{(t+1)} ~ f(θ | x, w^{(t)}, z^{(t)})
2. Sample w^{(t+1)} ~ f(w | x, θ^{(t+1)}, z^{(t)})
3. Sample z^{(t+1)} ~ Pr(z | x, θ^{(t+1)}, w^{(t+1)})

14 Gibbs sampling
The conditional posterior distributions are given by:
1. For m = 1, ..., M and k = 1, ..., K,
f(θ_{km} | x, w, z) ∝ θ_{km}^{\sum_{i=1}^{N} I(z_i = k) x_{im} + α_{km} - 1} (1 - θ_{km})^{\sum_{i=1}^{N} I(z_i = k)(1 - x_{im}) + β_{km} - 1}
which is a Beta distribution.
2. And,
f(w | x, θ, z) ∝ \prod_{k=1}^{K} w_k^{\sum_{i=1}^{N} I(z_i = k) + δ_k - 1}
which is a Dirichlet distribution.
3. Finally, for i = 1, ..., N,
Pr(z_i = k | x_i, w, θ) = w_k \prod_{m=1}^{M} Pr(x_{im} | θ_{km}) / \sum_{s=1}^{K} w_s \prod_{m=1}^{M} Pr(x_{im} | θ_{sm})
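A minimal R sketch of one sweep of this Gibbs sampler (a hypothetical helper written for illustration, independent of the blca.gibbs implementation), assuming x is an N x M binary matrix, z a vector of current group labels in 1:K, and scalar prior parameters delta, alpha, beta; the Dirichlet draw is built from independent Gamma variates:
gibbs_step <- function(x, z, delta, alpha, beta, K) {
  M <- ncol(x)
  # 1. Sample theta[k, m] from its Beta full conditional
  theta <- matrix(NA, K, M)
  for (k in 1:K) {
    xk <- x[z == k, , drop = FALSE]
    theta[k, ] <- rbeta(M, alpha + colSums(xk), beta + nrow(xk) - colSums(xk))
  }
  # 2. Sample w from its Dirichlet full conditional
  g <- rgamma(K, shape = delta + tabulate(z, nbins = K))
  w <- g / sum(g)
  # 3. Sample each z_i from its multinomial full conditional
  logp <- sweep(x %*% t(log(theta)) + (1 - x) %*% t(log(1 - theta)), 2, log(w), "+")
  p <- exp(logp - apply(logp, 1, max))
  z_new <- apply(p, 1, function(pr) sample(K, 1, prob = pr))
  list(theta = theta, w = w, z = z_new)
}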

15 Gibbs sampling algorithm for Alzheimer data
We now apply the Gibbs sampling algorithm to the Alzheimer data, initially using three groups:
out=blca(Alzheimer, 3, method = "gibbs")
print(out)
plot(out)
We may also look at the plots of the density estimates for the model parameters. For the item probabilities, conditional on class membership:
par(mfrow = c(3,2))
plot(out, which=3)
And for the class probabilities:
par(mfrow = c(1,1))
plot(out, which=4)

16 Prior sensitivity
We may try different prior assumptions:
out.prior2=blca(Alzheimer, 3, method = "gibbs", alpha=2, beta=2)
print(out.prior2)
plot(out.prior2)
out.prior3=blca(Alzheimer, 3, method = "gibbs", alpha=0.001, beta=0.001)
print(out.prior3)
plot(out.prior3)

17 Model selection
We may also try different values for the number of patient groups:
out.size1=blca(Alzheimer, 1, method = "gibbs")
print(out.size1)
plot(out.size1)
out.size2=blca(Alzheimer, 2, method = "gibbs")
print(out.size2)
plot(out.size2)
We can use the DIC criterion (which will be studied in Chapter 5) to select the mixture size:
out.size1$DIC
out.size2$DIC
out$DIC
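As a usage sketch, this comparison can be wrapped in a loop over candidate numbers of groups; the name of the stored DIC element is assumed here, so check names(out) in your version of BayesLCA:
# Fit models with 1 to 4 groups and collect the DIC of each
fits <- lapply(1:4, function(g) blca(Alzheimer, g, method = "gibbs"))
sapply(fits, function(f) f$DIC)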

18 Gibbs sampling algorithm for Alzheimer data
In all cases, we have run the Gibbs sampler with its default settings: a burn-in of 100 iterations and a thinning rate of 1. Convergence diagnostics should always be carried out.
par(mfrow = c(4, 2))
plot(out, which = 5)
We may observe that the MCMC performance is not very good: the chain seems to have converged, but it is not mixing well. This can also be seen with convergence diagnostic methods such as raftery.diag, available in the coda package, which is automatically loaded by the BayesLCA package.
raftery.diag(as.mcmc(out))
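Other standard coda diagnostics can be applied to the same converted chain; for example, effective sample sizes and autocorrelations give a quick picture of how poor the mixing is (a sketch, relying on the as.mcmc conversion used above):
library(coda)
samples <- as.mcmc(out)
effectiveSize(samples)   # effective number of independent draws per parameter
autocorr.diag(samples)   # autocorrelations at the default lags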

19 Gibbs sampling algorithm for Alzheimer data
The output of the convergence diagnostic suggests that the sampler converges quickly (the burn-in values are low) but is not mixing satisfactorily (note the high dependence factor of many parameters). A Gibbs sampler with better tuned settings can then be run:
out2=blca(Alzheimer, 3, method = "gibbs", burn.in = 150, thin = 1/10, iter = 50000)
plot(out2, which = 5)

20 Gibbs sampling algorithm for Alzheimer data
One point worth mentioning is that the blca.gibbs function includes, by default, a relabelling method to reduce the label switching problem. This is a well-known problem in mixture models that arises from the lack of identifiability of the component labels. Without relabelling, the label switching problem can be observed in the trace plots:
fit.gs=blca(Alzheimer, 3, method = "gibbs", relabel=FALSE)
plot(fit.gs, which = 5)

21 Variational Bayes
The idea is to approximate the posterior distribution f(w, θ, z | x) with a variational distribution q(w, θ, z) which assumes independence among blocks of parameters:
q(w, θ, z) = q_1(w | γ) q_2(θ | ζ) q_3(z | φ)
where (γ, ζ, φ) are the variational parameters. The VB approach looks for the distributions q_j that minimize the Kullback-Leibler divergence between the variational approximation and the posterior. In mixture models, it can be shown that the form of each q_j is the same as that of the corresponding conditional posterior distribution. Then,
w | γ ~ Dirichlet(γ_1, ..., γ_K)
θ_{km} | ζ ~ Beta(ζ_{km1}, ζ_{km2})
z_i | φ ~ Multinomial(1; φ_{i1}, ..., φ_{iK})
The variational parameters are updated iteratively until the KL divergence is minimized.
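A minimal R sketch of the resulting coordinate-ascent updates for this model (a generic illustration, not the blca.vb code), assuming x is an N x M binary matrix, gamma a length-K vector, zeta1 and zeta2 K x M matrices, and scalar prior parameters delta, alpha, beta:
vb_step <- function(x, gamma, zeta1, zeta2, delta, alpha, beta) {
  # Update q(z): expected log weights and expected log item probabilities
  Elogw  <- digamma(gamma) - digamma(sum(gamma))       # length K
  Elogt  <- digamma(zeta1) - digamma(zeta1 + zeta2)    # K x M
  Elog1t <- digamma(zeta2) - digamma(zeta1 + zeta2)    # K x M
  logphi <- sweep(x %*% t(Elogt) + (1 - x) %*% t(Elog1t), 2, Elogw, "+")
  phi <- exp(logphi - apply(logphi, 1, max))
  phi <- phi / rowSums(phi)                            # N x K responsibilities
  # Update q(w) and q(theta)
  gamma <- delta + colSums(phi)
  zeta1 <- alpha + t(phi) %*% x
  zeta2 <- beta + t(phi) %*% (1 - x)
  list(gamma = gamma, zeta1 = zeta1, zeta2 = zeta2, phi = phi)
}
The updates are repeated until the change in the variational parameters (or in the lower bound on the marginal likelihood) is negligible.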

22 Variational Bayes
We now apply the VB algorithm to the Alzheimer data:
fit.vb=blca(Alzheimer, 3, method = "vb")
print(fit.vb)
Observe that the Variational Bayes method is much faster than Gibbs sampling, and it also provides posterior standard deviation estimates:
fit.vb$itemprob
fit.vb$classprob
fit.vb$itemprob.sd
fit.vb$classprob.sd
The estimates are close to those obtained with the Gibbs sampling algorithm:
fit.gs$itemprob
fit.vb$itemprob
fit.gs$classprob
fit.vb$classprob

23 Variational Bayes
However, Gibbs sampling provides a better approximation of the posterior distributions. Observe that the posterior standard deviation estimates from the Gibbs sampler are larger than those obtained with the VB method:
fit.gs$itemprob.sd
fit.gs$classprob.sd
fit.vb$itemprob.sd
fit.vb$classprob.sd

24 Variational Bayes
We may also observe these differences in the plots of the density estimates for the model parameters. For the item probabilities, conditional on class membership:
par(mfrow = c(3,2))
plot(fit.gs, which=3)
plot(fit.vb, which=3)
And for the class probabilities:
par(mfrow = c(1,1))
plot(fit.gs, which=4)
plot(fit.vb, which=4)

25 Variational Bayes
One method for determining an appropriate number of classes to fit to the Alzheimer data is to deliberately over-fit the model and then consider only the classes whose posterior mean class probability is non-negligible:
fit.vb=blca(Alzheimer, 10, method = "vb")
fit.vb$classprob
This suggests that a 2-class fit is best suited to the variational Bayes approximation.
plot(fit.vb, which = 5)
The multiple jumps in the lower bound indicate where components have emptied out.

26 Summary
We have implemented a Bayesian approach for clustering Alzheimer patients according to their symptoms. A mixture of K multivariate binary distributions has been considered for modelling the observed data. We have implemented three different computational Bayesian methods to estimate finite mixture models: EM, MCMC and VB. The VB approximation provides a very fast procedure to estimate the posterior distribution of the model parameters. However, it is well known that the independence between parameters that is enforced in VB approximations results in underestimated posterior variances. A better approximation (although usually more time consuming) is provided by MCMC methods and, in particular, by Gibbs sampling. Standard MCMC methods can be extremely time consuming for big data sets.
