Latent Variable Models for the Analysis, Visualization and Prediction of Network and Nodal Attribute Data. Isabella Gollini.


Latent Variable Models for the Analysis, Visualization and Prediction of Network and Nodal Attribute Data
Isabella Gollini, School of Engineering, University of Bristol, isabella.gollini@bristol.ac.uk, January 4th. Joint work with Prof. Brendan Murphy (University College Dublin).

About Me
With Jonty: probabilistic methods for uncertainty assessment and quantification in natural hazards (floods, volcanoes, earthquakes, etc.).
Models to cluster binary data with complex dependence structure: Gollini, I., and Murphy, T. B., Mixture of Latent Trait Analyzers for Model-Based Clustering of Categorical Data, Statistics and Computing.
Models for network data, using variational methods for fast approximate inference.

Outline
Notation; network data; latent space models (LSM) for networks; variational inference; factor analysis for nodal attributes; a joint model for network and nodal attributes; latent variable models for multiple networks.

Notation
N: number of nodes of an observed network.
Y (N x N): adjacency matrix, with y_{ij} = 1 if there is an edge between nodes i and j and y_{ij} = 0 otherwise.
X (N x M): matrix of M nodal attributes.
z_n ~ N(0, \sigma^2 I): D-dimensional continuous latent variable.

Latent Space Model (LSM) for Networks
Hoff et al. (2002) introduced a model that assumes that each node n has an unknown position z_n in a D-dimensional Euclidean latent space. The probability of a link between two nodes depends on their distance d_{ij} in that latent space:

p(Y | Z, \alpha) = \prod_{i \neq j} p(y_{ij} | z_i, z_j, \alpha) = \prod_{i \neq j} \frac{\exp(\alpha - d_{ij})^{y_{ij}}}{1 + \exp(\alpha - d_{ij})},

so that p(y_{ij} = 1) = \frac{\exp(\alpha - d_{ij})}{1 + \exp(\alpha - d_{ij})},

with p(\alpha) = N(\xi, \psi^2), p(z_n) iid N(0, \sigma^2 I_D), and \sigma^2, \xi, \psi^2 fixed parameters. The posterior distribution cannot be calculated analytically.

NOTE: Hoff et al. (2002) use the Euclidean distance d_{ij} = \|z_i - z_j\|; we propose to use the squared Euclidean distance d_{ij} = \|z_i - z_j\|^2.

Why the Squared Euclidean Distance?
It requires fewer approximations in the estimation procedure, and it allows potential clusters to be visualized more clearly, giving a higher probability of a link between two nodes that are close in the latent space and a lower probability for two nodes lying far away from each other.

[Figure: link probability p(y_{ij} = 1) as a function of the latent distance, comparing d_{ij} = \|z_i - z_j\| with d_{ij} = \|z_i - z_j\|^2.]
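To make the link function concrete, here is a minimal base-R sketch (illustrative names only, not the lvm4net implementation) that computes the LSM link probabilities with the squared Euclidean distance for simulated latent positions and draws an adjacency matrix from them.

```r
## Minimal sketch, base R: LSM link probabilities with squared Euclidean distance.
## Illustrative only; this is not the lvm4net implementation.
set.seed(1)
N <- 10; D <- 2
Z     <- matrix(rnorm(N * D), nrow = N)     # latent positions z_n in R^D
alpha <- 1.5                                # intercept

d2 <- as.matrix(dist(Z))^2                  # squared Euclidean distances ||z_i - z_j||^2
P  <- plogis(alpha - d2)                    # p(y_ij = 1) = exp(a - d2) / (1 + exp(a - d2))
diag(P) <- 0                                # no self-loops

Y <- matrix(rbinom(N * N, 1, P), nrow = N)  # simulate a (directed) adjacency matrix
diag(Y) <- 0
```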

LSM for Networks: Variational Approach
We fit the model using a variational inference approach that is considerably quicker, but less accurate, than MCMC. The posterior distribution of the unknowns (Z, \alpha) is

p(Z, \alpha | Y) = \frac{1}{C} \, p(Y | Z, \alpha) \, p(\alpha) \prod_{n=1}^{N} p(z_n),

where C is the unknown normalising constant. We propose a variational posterior q(Z, \alpha | Y), introducing variational parameters \tilde{\xi}, \tilde{\psi}^2, \tilde{z}_n, \tilde{\Sigma}:

q(Z, \alpha | Y) = q(\alpha) \prod_{n=1}^{N} q(z_n), where q(\alpha) = N(\tilde{\xi}, \tilde{\psi}^2) and q(z_n) = N(\tilde{z}_n, \tilde{\Sigma}).

Variational Approach
The basic idea behind the variational approach is to find a lower bound on the log marginal likelihood log p(Y) by introducing the variational posterior distribution q(Z, \alpha | Y). This amounts to minimizing the Kullback-Leibler divergence between the variational posterior q(Z, \alpha | Y) and the true posterior p(Z, \alpha | Y):

KL[q(Z, \alpha | Y) \| p(Z, \alpha | Y)]
= \int q(Z, \alpha | Y) \log \frac{q(Z, \alpha | Y)}{p(Z, \alpha | Y)} \, d(Z, \alpha)
= \int q(Z, \alpha | Y) \log \frac{p(Y) \, q(Z, \alpha | Y)}{p(Y, Z, \alpha)} \, d(Z, \alpha)
= \log p(Y) - \int q(Z, \alpha | Y) \log \frac{p(Y, Z, \alpha)}{q(Z, \alpha | Y)} \, d(Z, \alpha).

The KL divergence can be written as

KL[q(Z, \alpha | Y) \| p(Z, \alpha | Y)] = KL[q(\alpha) \| p(\alpha)] + \sum_{i=1}^{N} KL[q(z_i) \| p(z_i)] - E_{q(Z, \alpha | Y)}[\log p(Y | Z, \alpha)] + \log p(Y),

and E_{q(Z, \alpha | Y)}[\log p(Y | Z, \alpha)] is bounded using Jensen's inequality:

E_{q}[\log p(Y | Z, \alpha)] = \sum_{i \neq j} \big( y_{ij} E_{q}[\alpha - \|z_i - z_j\|^2] - E_{q}[\log(1 + \exp(\alpha - \|z_i - z_j\|^2))] \big)
\geq \sum_{i \neq j} \big( y_{ij} E_{q}[\alpha - \|z_i - z_j\|^2] - \log(1 + E_{q}[\exp(\alpha - \|z_i - z_j\|^2)]) \big).

EM Algorithm
At the (i+1)-th iteration:
E-step: estimate \tilde{z}_n^{(i+1)} and \tilde{\Sigma}^{(i+1)} by optimizing Q(\tilde{\Theta}; \tilde{\Theta}) = KL[q(Z, \alpha | Y) \| p(Z, \alpha | Y)], where \tilde{\Theta} = (\tilde{\xi}, \tilde{\psi}^2).
M-step: estimate \tilde{\xi} and \tilde{\psi}^2: \tilde{\Theta}^{(i+1)} = \arg\min Q(\tilde{\Theta}; \tilde{\Theta}).

Monks Network: Comparison of Estimation Methods and Distance Metrics
Sampson (1969) recorded the social interactions among a group of N = 18 monks while he was a resident in a New England monastery. The directed links of the network represent liking relationships.

[Figure: latent positions for the Sampson data estimated by MCMC with Euclidean distance (latentnet), the variational approach with Euclidean distance (VBLPCM), and the variational approach with squared Euclidean distance.]
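Returning to the variational objective: the base-R sketch below evaluates KL[q || p] - log p(Y), that is, the KL terms for \alpha and the latent positions minus E_q[log p(Y | Z, \alpha)], with that expectation estimated by Monte Carlo rather than by the Jensen-type bound used above. All names are illustrative assumptions, this is not the lvm4net implementation, and a single covariance \tilde{\Sigma} is shared across nodes for brevity.

```r
## Minimal sketch, base R: evaluate KL[q || p] - log p(Y) for the variational LSM.
## Monte Carlo is used for E_q[log p(Y | Z, alpha)] instead of the Jensen bound.
## Illustrative only; not the lvm4net implementation.
kl_gauss <- function(m1, S1, m0, S0) {   # KL[ N(m1, S1) || N(m0, S0) ]
  D <- length(m1)
  0.5 * (sum(diag(solve(S0, S1))) +
         t(m0 - m1) %*% solve(S0, m0 - m1) - D +
         determinant(S0)$modulus - determinant(S1)$modulus)
}

neg_bound <- function(Y, ztilde, Sigma_t, xi_t, psi2_t,
                      sigma2 = 1, xi0 = 0, psi20 = 10, S = 500) {
  N <- nrow(Y); D <- ncol(ztilde)
  # KL terms for alpha and for each latent position against their priors
  kl <- kl_gauss(xi_t, matrix(psi2_t), xi0, matrix(psi20))
  for (i in 1:N)
    kl <- kl + kl_gauss(ztilde[i, ], Sigma_t, rep(0, D), sigma2 * diag(D))
  # Monte Carlo estimate of E_q[log p(Y | Z, alpha)]
  ell <- 0
  R <- chol(Sigma_t)
  for (s in 1:S) {
    a   <- rnorm(1, xi_t, sqrt(psi2_t))              # alpha drawn from q(alpha)
    Z   <- ztilde + matrix(rnorm(N * D), N, D) %*% R # z_n drawn from q(z_n)
    eta <- a - as.matrix(dist(Z))^2
    ll  <- Y * eta - log1p(exp(eta)); diag(ll) <- 0
    ell <- ell + sum(ll) / S
  }
  as.numeric(kl - ell)
}

## Toy usage on a simulated network
set.seed(3)
N <- 8; D <- 2
Ztrue <- matrix(rnorm(N * D), N, D)
Y <- matrix(rbinom(N * N, 1, plogis(1 - as.matrix(dist(Ztrue))^2)), N); diag(Y) <- 0
neg_bound(Y, ztilde = Ztrue, Sigma_t = 0.1 * diag(D), xi_t = 1, psi2_t = 0.5)
```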

Variational Inference vs MCMC
Pros: closed-form posteriors; far faster than MCMC-based methods; in the absence of posterior dependence the lower bound would match the log-likelihood. As long as the posterior dependence is weak, the variational approximation can be useful for larger networks, as a starting point for MCMC algorithms, and to explore the model space.
Cons: it underestimates variances; it is difficult to assess how tight the lower bound is; it is sensitive to starting values (local minima).

Network and Nodal Attributes
The classical approach to incorporating covariates in the LSM is

p(Y | Z, X, \alpha, \beta) = \prod_{i \neq j} \frac{\exp(\alpha + \beta^T x_{ij} - \|z_i - z_j\|^2)^{y_{ij}}}{1 + \exp(\alpha + \beta^T x_{ij} - \|z_i - z_j\|^2)},

where \beta and x_{ij} are vectors of length M. This uses only link covariate information x_{ij}, so it is not designed to deal with nodal attributes directly. It also assumes that the probability of a link depends on the nodal attributes (social selection), whereas sometimes the nodal attributes depend on the network links (social influence). We present a model where the network and the nodal attribute data mutually depend on each other.

Factor Analysis (FA) for Nodal Attributes
Factor analysis (Spearman, 1904) is a useful technique for visualizing continuous data, reducing the dimensionality from M to D (where D << M) in order to explain the variability expressed by the correlation within the data. FA assumes that there is a continuous latent variable z_n underlying the behaviour of the continuous response variables in each observation x_n:

x_n = \mu + \Lambda z_n + \varepsilon_n,
z_n iid N(0, \sigma^2 I), \varepsilon_n iid N(0, \Psi), \Psi = diag(\psi_1, ..., \psi_M),

so that p(x_n) = N(\mu, (\Lambda\sigma)(\Lambda\sigma)^T + \Psi) and p(z_n | x_n) = N(\hat{z}_n, \hat{\Sigma}). The EM algorithm is used to find the maximum likelihood estimates, and everything can be calculated analytically in closed form.

The Joint Model for Network and Nodal Attributes
The probability of a node being connected to other nodes and the behaviour of its nodal attributes are explained by the same latent variable: a continuous latent variable z_n ~ N(0, \sigma^2 I_D) summarizes the information given by both the network and the nodal attributes. The network Y and the nodal attributes x_n are independent given the latent variable z_n.

[Graphical model: the nodal attributes X depend on (\Lambda, \Psi) and Z through the FA part; the network Y depends on Z and \alpha through the LSM part.]

Fitting the Model
We assume that p(z_n) = N(0, \sigma^2 I). The network data are modelled via the LSM: p(z_n | Y) \approx N(\tilde{z}_n, \tilde{\Sigma}). The nodal attributes are modelled via FA: p(z_n | x_n) = N(\hat{z}_n, \hat{\Sigma}). The joint model is

p(z_n | Y, x_n) \propto \frac{p(z_n | Y) \, p(z_n | x_n)}{p(z_n)} \approx N(\bar{z}_n, \bar{\Sigma}),

where \bar{\Sigma}^{-1} = \tilde{\Sigma}^{-1} + \hat{\Sigma}^{-1} - \sigma^{-2} I_D and \bar{z}_n = \bar{\Sigma} (\tilde{\Sigma}^{-1} \tilde{z}_n + \hat{\Sigma}^{-1} \hat{z}_n).
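To make the FA step above concrete, here is a minimal base-R sketch of the factor-score posterior p(z_n | x_n) = N(\hat{z}_n, \hat{\Sigma}) that enters the combination formula just given. The function name and toy dimensions are illustrative assumptions, not the lvm4net implementation.

```r
## Minimal sketch, base R: factor-score posterior p(z_n | x_n) = N(zhat_n, Sigma_hat)
## in the FA model x_n = mu + Lambda z_n + eps_n, with z_n ~ N(0, sigma2 I_D) and
## eps_n ~ N(0, Psi), Psi diagonal. Illustrative only; not the lvm4net implementation.
fa_posterior <- function(x, mu, Lambda, Psi_diag, sigma2 = 1) {
  D         <- ncol(Lambda)
  prec      <- diag(D) / sigma2 + t(Lambda) %*% (Lambda / Psi_diag)  # posterior precision
  Sigma_hat <- solve(prec)
  zhat      <- Sigma_hat %*% (t(Lambda) %*% ((x - mu) / Psi_diag))
  list(mean = as.numeric(zhat), cov = Sigma_hat)
}

## Toy example with M = 4 attributes and D = 2 latent dimensions
set.seed(2)
Lambda <- matrix(rnorm(4 * 2), 4, 2)
fa_posterior(x = rnorm(4), mu = rep(0, 4), Lambda = Lambda, Psi_diag = rep(0.5, 4))
```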

The Joint Model for Network and Nodal Attributes: EM Algorithm
E-step: estimate \bar{\Sigma}^{(i+1)} and \bar{z}_n^{(i+1)} from

Q(\Theta_{LSM}, \Theta_{FA}; \tilde{\Theta}_{LSM}, \tilde{\Theta}_{FA}) = E_{p(Z | Y, X; \tilde{\Theta}_{LSM}, \tilde{\Theta}_{FA})}[\log p(Y, Z | \Theta_{LSM})] + E_{p(Z | Y, X; \tilde{\Theta}_{LSM}, \tilde{\Theta}_{FA})}[\log p(X, Z | \Theta_{FA})],

therefore

[\bar{\Sigma}^{(i+1)}]^{-1} = [\tilde{\Sigma}^{(i+1)}]^{-1} + [\hat{\Sigma}^{(i+1)}]^{-1} - \sigma^{-2} I_D,
\bar{z}_n^{(i+1)} = \bar{\Sigma}^{(i+1)} \big( [\tilde{\Sigma}^{(i+1)}]^{-1} \tilde{z}_n^{(i+1)} + [\hat{\Sigma}^{(i+1)}]^{-1} \hat{z}_n^{(i+1)} \big).

The fixed parameters are the prior z_n ~ N(0, I) and fixed values of \xi and \psi^2.
M-step: update (\Theta_{LSM}^{(i+1)}, \Theta_{FA}^{(i+1)}) = \arg\max Q(\Theta_{LSM}, \Theta_{FA}; \tilde{\Theta}_{LSM}, \tilde{\Theta}_{FA}).

Yeast Proteins
The network is composed of N nodes representing the interactions between Saccharomyces cerevisiae (yeast) proteins. The nodal attributes consist of expression levels during yeast sporulation from M = 8 experiments on Saccharomyces cerevisiae proteins. Factor analysis is an appropriate tool to visualize the M-dimensional expression data in a low-dimensional latent space.

Results
[Figure: LSM positions (left) and LSJM positions (right).]

Performance
[Figure: ROC curves with the corresponding AUC values, and boxplots of the estimated probabilities of a link for the true negatives and true positives.]

Multiple Network Views
In many applications the behaviour of the nodes is strongly shaped by the complex relations arising from many interactions.
Longitudinal networks: the links represent the same relation at different time points.
Multiplex networks: the links come from different kinds of relations (e.g. genetic and physical interactions).

Joint Modelling of Multiple Network Views
We have K networks on the same nodes, and we propose a model that merges the information given by all these networks. A continuous latent variable z_n ~ N(0, \sigma^2 I_D) identifies the position of node n in a D-dimensional latent space.

[Graphical model: the latent positions Z are shared by all the network views Y_1, ..., Y_K.]
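Referring back to the E-step update above: the joint posterior of z_n simply combines the LSM posterior N(\tilde{z}_n, \tilde{\Sigma}) and the FA posterior N(\hat{z}_n, \hat{\Sigma}) while removing one copy of the shared prior. A minimal base-R sketch of that combination follows (illustrative names, not the lvm4net implementation).

```r
## Minimal sketch, base R: combine the LSM posterior N(ztilde, Sigma_t) and the
## FA posterior N(zhat, Sigma_h) over the shared prior N(0, sigma2 I_D).
## Illustrative only; not the lvm4net implementation.
combine_posteriors <- function(ztilde, Sigma_t, zhat, Sigma_h, sigma2 = 1) {
  D    <- length(ztilde)
  prec <- solve(Sigma_t) + solve(Sigma_h) - diag(D) / sigma2   # joint precision
  Sbar <- solve(prec)                                          # joint covariance
  zbar <- Sbar %*% (solve(Sigma_t, ztilde) + solve(Sigma_h, zhat))
  list(mean = as.numeric(zbar), cov = Sbar)
}

## Example with D = 2
combine_posteriors(ztilde = c(0.5, -0.2), Sigma_t = 0.3 * diag(2),
                   zhat   = c(0.7,  0.1), Sigma_h = 0.5 * diag(2))
```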

Joint Modelling of Multiple Network Views: Model
The probability of a link depends on the distance between two nodes in the latent space:

p(Y_1, ..., Y_K | Z, \alpha) = \prod_{k=1}^{K} \prod_{i \neq j} \frac{\exp(\alpha_k - \|z_i - z_j\|^2)^{y_{ijk}}}{1 + \exp(\alpha_k - \|z_i - z_j\|^2)}.

Variational Approach
For k = 1, ..., K: p(z_n | Y_k) \approx N(\tilde{z}_{nk}, \tilde{\Sigma}_k). Joining the K models:

p(z_n | Y_1, ..., Y_K; \Theta_1, ..., \Theta_K) \propto \frac{\prod_{k=1}^{K} p(z_n | Y_k; \Theta_k)}{p(z_n)^{K-1}} \approx N(\bar{z}_n, \bar{\Sigma}),

where \bar{\Sigma}^{-1} = \sum_{k=1}^{K} \tilde{\Sigma}_k^{-1} - \frac{K-1}{\sigma^2} I_D and \bar{z}_n = \bar{\Sigma} \sum_{k=1}^{K} \tilde{\Sigma}_k^{-1} \tilde{z}_{nk}.

Joint Modelling of Multiple Network Views: EM Algorithm
E-step: estimate the parameters of the joint posterior distribution, \bar{\Sigma}^{(i+1)} and \bar{z}_n^{(i+1)}, from

Q(\Theta_1, ..., \Theta_K; \tilde{\Theta}_1, ..., \tilde{\Theta}_K) = \sum_{k=1}^{K} E_{p(Z | Y_1, ..., Y_K; \tilde{\Theta}_1, ..., \tilde{\Theta}_K)}[\log p(Y_k, Z | \Theta_k)].

We estimate the parameters \tilde{z}_{nk}, \tilde{\Sigma}_k of the posterior distribution p(z_n | Y_k; \Theta_k) for each network k separately, and then merge these estimates to find the joint posterior distribution of the latent positions, N(\bar{z}_n, \bar{\Sigma}).
M-step: update the variational model parameters \tilde{\xi}_1, ..., \tilde{\xi}_K, \tilde{\psi}_1^2, ..., \tilde{\psi}_K^2:
(\Theta_1^{(i+1)}, ..., \Theta_K^{(i+1)}) = \arg\max Q(\Theta_1, ..., \Theta_K; \tilde{\Theta}_1, ..., \tilde{\Theta}_K).

LSJM: Monks Network
We analyse the networks of liking relationships at K = 3 time points, fitting the LSM to each network separately and then the LSJM to all of them jointly.

[Figure: LSM latent positions for the Sampson data at each time point, and the LSJM positions.]

LSJM: Protein-Protein Interactions
K = 2 undirected networks formed by genetic and physical protein-protein interactions between N = 67 Saccharomyces cerevisiae proteins. The complex relational structure of this dataset has led to the development of models aiming to describe the functional relationships between the observations. The data were downloaded from the Biological General Repository for Interaction Datasets (BioGRID) database.

[Figure: latent posterior distributions obtained by fitting the LSM to the genetic and physical interaction networks separately.]
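A minimal base-R sketch of the merging step just described: the K per-view posteriors N(\tilde{z}_{nk}, \tilde{\Sigma}_k) are combined into the joint posterior N(\bar{z}_n, \bar{\Sigma}), dividing out K - 1 copies of the shared prior N(0, \sigma^2 I_D). Names are illustrative assumptions, not the lvm4net implementation.

```r
## Minimal sketch, base R: merge K per-view posteriors N(ztilde_k, Sigma_k)
## into the joint LSJM posterior N(zbar, Sigmabar). Illustrative only.
merge_views <- function(ztilde_list, Sigma_list, sigma2 = 1) {
  K    <- length(ztilde_list)
  D    <- length(ztilde_list[[1]])
  prec <- Reduce(`+`, lapply(Sigma_list, solve)) - (K - 1) * diag(D) / sigma2
  Sbar <- solve(prec)
  zbar <- Sbar %*% Reduce(`+`, Map(solve, Sigma_list, ztilde_list))
  list(mean = as.numeric(zbar), cov = Sbar)
}

## Example with K = 2 views and D = 2
merge_views(ztilde_list = list(c(0.4, -0.1), c(0.6, 0.0)),
            Sigma_list  = list(0.3 * diag(2), 0.4 * diag(2)))
```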

LSJM: Protein-Protein Interactions, ROC and Box Plots
[Figure: ROC curves and boxplots of the estimated probabilities of a link for the true negatives and true positives; AUC = 95% for the genetic interaction network and 95% for the physical interaction network.]

LSJM: Protein-Protein Interactions, LSJM Positions
[Figure: left, the joint posterior p(z_n | Y_1, ..., Y_K; \Theta_1, ..., \Theta_K) \approx N(\bar{z}_n, \bar{\Sigma}) obtained by fitting the LSJM; right, the dots represent the overall positions \bar{z}_n and the arrows connect them to the estimated positions under each single-view model p(z_n | Y_k; \Theta_k) = N(\tilde{z}_{nk}, \tilde{\Sigma}_k).]

LSJM: Missing Links and Missing Nodes
Missing (unobserved) links can easily be handled by the LSJM using the information given by all the network views. To estimate the probability of the presence or absence of an edge we employ the posterior means of the \alpha_k and of the latent positions, which gives

p(y_{ijk} = 1 | \bar{z}_i, \bar{z}_j, \tilde{\xi}_k) = \frac{\exp(\tilde{\xi}_k - \|\bar{z}_i - \bar{z}_j\|^2)}{1 + \exp(\tilde{\xi}_k - \|\bar{z}_i - \bar{z}_j\|^2)}.

If we want to infer whether to assign y_{ijk} = 1 or not, we need to introduce a threshold \tau_k, and let y_{ijk} = 1 if p(y_{ijk} = 1 | \bar{z}_i, \bar{z}_j, \tilde{\xi}_k) > \tau_k.

LSJM: Protein-Protein Interactions, Missing Data
To evaluate the link prediction we applied cross-validation, setting a percentage of the links to be missing at each time point.
Missing links (cross-validation): LSJM: misclassification rate of 9% for the genetic interaction network and 6% for the physical interaction network. LSM: misclassification rate of 8% for the genetic interaction network and 7% for the physical interaction network.
[Figure: ROC curves and boxplots of the estimated probabilities of a link for the true negatives and true positives, with overall AUC values for the genetic and physical interaction networks.]

Missing nodes (cross-validation): LSJM: misclassification rate of 4% for the genetic interaction network and of % for the physical interaction network. LSM: not useful here, since it would locate the missing nodes relying only on the prior information.
[Figure: ROC curves for link prediction fitting the LSJM (left) and a single LSM (right), with the corresponding AUC values for the genetic and physical interaction networks.]

Future work: try to improve the predictions by using a higher dimension for the latent variables.
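A minimal base-R sketch of the prediction rule above, using plug-in posterior means; the function name, inputs, and default threshold are illustrative assumptions rather than the lvm4net interface.

```r
## Minimal sketch, base R: predicted probability of a link in view k from the
## posterior means, then thresholding at tau_k. Illustrative only.
predict_link <- function(zbar_i, zbar_j, xi_k, tau_k = 0.5) {
  p <- plogis(xi_k - sum((zbar_i - zbar_j)^2))  # p(y_ijk = 1 | zbar_i, zbar_j, xi_k)
  list(prob = p, link = as.integer(p > tau_k))
}

predict_link(zbar_i = c(0.2, 0.1), zbar_j = c(-0.3, 0.5), xi_k = 1.2)
```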

Conclusions
The joint models are particularly useful to locate unconnected nodes or subgraphs in the latent space and to estimate missing links, and they have a wide range of applications. Variational Bayes allows us to deal with networks of thousands of nodes.
Possible extensions: joint models for directed networks using the inner product instead of the Euclidean distance; joint models for categorical nodal attributes (LTA instead of FA); joint models with clusters (LPCM, MFA, MLTA); beyond binary networks: rank and count data.

References
Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460), 1090-1098.
Salter-Townshend, M., White, A., Gollini, I., and Murphy, T. B. (2012). Review of statistical network analysis: models, algorithms, and software. Statistical Analysis and Data Mining, 5(4).
Gollini, I., and Murphy, T. B. Joint modelling of multiple network views. arXiv:1301.3759.
