Expectation Maximization: Inferring model parameters and class labels


Emily Fox, University of Washington, February 27, 2017

Mixture of Gaussians recap

Jumble of unlabeled images: look at a histogram of blue intensity across the collection. How do we model this distribution?

Model of the jumble of unlabeled images: fit a mixture of Gaussians to the blue-intensity histogram.

What if image types are not equally represented? e.g., forest images are very likely in the collection.

Mixture of Gaussians (1D): each mixture component represents a unique cluster specified by {π_k, μ_k, σ_k²}. The vector of mixture weights π = [π_1, π_2, π_3] satisfies 0 ≤ π_k ≤ 1 and Σ_k π_k = 1, and gives the relative proportion of each class in the world from which we get data.
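To make the 1D definition concrete, here is a minimal sketch assuming Python with NumPy (the slides give no code); the parameter values are made up for illustration and the function names are mine.

```python
import numpy as np

def gaussian_pdf_1d(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def mixture_pdf_1d(x, pi, mu, sigma2):
    """1D mixture of Gaussians density: sum_k pi_k N(x | mu_k, sigma2_k)."""
    return sum(p * gaussian_pdf_1d(x, m, s2) for p, m, s2 in zip(pi, mu, sigma2))

# Illustrative (made-up) parameters for K = 3 components; the weights sum to 1.
pi = [0.47, 0.35, 0.18]
mu = [0.2, 0.5, 0.8]
sigma2 = [0.01, 0.02, 0.005]
print(mixture_pdf_1d(0.45, pi, mu, sigma2))
```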

Mixture of Gaussians (general): each mixture component represents a unique cluster specified by {π_k, μ_k, Σ_k}.

According to the model:
- Without observing the image content, what's the probability it's from cluster k (e.g., the probability of seeing a clouds image)? p(z_i = k) = π_k
- Given that observation x_i is from cluster k, what's the likelihood of seeing x_i (e.g., just look at the distribution for clouds)? p(x_i | z_i = k, μ_k, Σ_k) = N(x_i | μ_k, Σ_k)
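The two statements above describe a generative process: draw a cluster label from the proportions, then draw the observation from that cluster's Gaussian. A sketch of that process with made-up parameters, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for a K = 3, 2-dimensional mixture.
pi = np.array([0.5, 0.3, 0.2])                                   # cluster proportions, sum to 1
mu = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])             # cluster means
Sigma = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])  # cluster covariances

def sample_from_mixture(n):
    """Draw z_i ~ Categorical(pi), then x_i ~ N(mu_{z_i}, Sigma_{z_i})."""
    z = rng.choice(len(pi), size=n, p=pi)          # unobserved cluster labels
    x = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])
    return x, z

x, z = sample_from_mixture(500)
print(x.shape, np.bincount(z) / len(z))            # empirical proportions roughly match pi
```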

Inferring soft assignments with expectation maximization (EM): given the data, infer the cluster labels in the form of the desired soft assignments.

Part 1: What if we knew the cluster parameters {π_k, μ_k, Σ_k}? Then we could compute responsibilities. The responsibility cluster k takes for observation i is

r_ik = p(z_i = k | {π_j, μ_j, Σ_j}_{j=1}^K, x_i),

the probability of assignment to cluster k given the model parameters and the observed value.

Responsibilities in pictures: near one center the green cluster takes more responsibility, near the other the blue cluster takes more responsibility, and in between the assignment is uncertain and responsibility is split.

Responsibilities in pictures (continued): we need to weight by cluster probabilities, not just cluster shapes. A point may still be uncertain, but if the green cluster seems more probable, it takes more responsibility.

Responsibilities in equations: the responsibility cluster k takes for observation i is proportional to π_k N(x_i | μ_k, Σ_k), the initial probability of being from cluster k times how likely the observed value x_i is under this cluster assignment. Normalizing over all possible cluster assignments,

r_ik = π_k N(x_i | μ_k, Σ_k) / Σ_{j=1}^K π_j N(x_i | μ_j, Σ_j).

Recall, according to the model: p(z_i = k) = π_k, and p(x_i | z_i = k, μ_k, Σ_k) = N(x_i | μ_k, Σ_k).

Formally, the responsibility is an application of Bayes' rule:

r_ik = p(z_i = k | {π_j, μ_j, Σ_j}_{j=1}^K, x_i) = π_k N(x_i | μ_k, Σ_k) / Σ_{j=1}^K π_j N(x_i | μ_j, Σ_j).

Writing A for the assignment event z_i = k and B for the observation x_i, this is exactly Bayes' rule:

r_ik = p(A | B, params) = p(A | params) p(B | A, params) / p(B | params)
     = p(A | params) p(B | A, params) / Σ_C p(C | params) p(B | C, params),

where the sum in the denominator runs over all possible cluster assignments C.
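A sketch of this E-step computation, assuming NumPy and SciPy's multivariate_normal for the Gaussian densities; the function name is mine. Each row of the returned matrix sums to 1, one responsibility per cluster.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(x, pi, mu, Sigma):
    """E-step: r[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j)."""
    N, K = x.shape[0], len(pi)
    lik = np.zeros((N, K))
    for k in range(K):
        lik[:, k] = pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
    return lik / lik.sum(axis=1, keepdims=True)    # normalize over clusters
```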

Part 1 summary: the desired soft assignments (responsibilities) are easy to compute when the cluster parameters {π_k, μ_k, Σ_k} are known. But we don't know these!

Part 2a: Imagine we knew the cluster (hard) assignments z_i.

Estimating cluster parameters: imagine we know the cluster assignments. The estimation problem then decouples across clusters; is a green point informative about the fuchsia cluster's parameters? No!

Data table, decoupling over clusters: each row holds the features (R, G, B) and a hard cluster label, and the table splits into one sub-table per cluster, e.g.

R | G | B | Cluster
x_1[1] | x_1[2] | x_1[3] | 3
x_2[1] | x_2[2] | x_2[3] | 3
x_3[1] | x_3[2] | x_3[3] | 3
x_4[1] | x_4[2] | x_4[3] | 1
x_5[1] | x_5[2] | x_5[3] | 2
x_6[1] | x_6[2] | x_6[3] | 2

Maximum likelihood estimation: estimate {π_k, μ_k, Σ_k} given the data assigned to cluster k. Maximum likelihood estimation (MLE) finds the parameters that maximize the score, or likelihood, of the data.

Mean/covariance MLE, using only the rows assigned to cluster k:

μ̂_k = (1 / N_k) Σ_{i in k} x_i
Σ̂_k = (1 / N_k) Σ_{i in k} (x_i - μ̂_k)(x_i - μ̂_k)^T

In the scalar case these reduce to the sample mean and sample variance σ̂_k² of the points in cluster k.
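A minimal sketch of these hard-assignment MLEs, assuming Python with NumPy (the lecture gives no code); it also includes the cluster-proportion estimate π̂_k = N_k / N covered just below, and assumes integer labels z in {0, ..., K-1}. The function name is mine.

```python
import numpy as np

def mle_per_cluster(x, z, K):
    """MLE of {pi_k, mu_k, Sigma_k} from hard assignments z; estimation decouples by cluster."""
    N, d = x.shape
    pi_hat = np.zeros(K)
    mu_hat = np.zeros((K, d))
    Sigma_hat = np.zeros((K, d, d))
    for k in range(K):
        xk = x[z == k]                     # only the rows assigned to cluster k matter
        Nk = len(xk)
        pi_hat[k] = Nk / N                 # hat(pi)_k = N_k / N
        mu_hat[k] = xk.mean(axis=0)        # hat(mu)_k = (1/N_k) sum_{i in k} x_i
        centered = xk - mu_hat[k]
        Sigma_hat[k] = centered.T @ centered / Nk   # MLE covariance (divides by N_k, not N_k - 1)
    return pi_hat, mu_hat, Sigma_hat
```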

Cluster proportion MLE:

π̂_k = N_k / N,

where N_k is the number of observations in cluster k and N is the total number of observations. This is true for general mixtures of i.i.d. data, not just Gaussian clusters.

Part 2a summary: the cluster parameters (which we needed in order to compute soft assignments) are simple to compute if we know the cluster assignments. But we don't know these!

Part 2b: What can we do with just soft assignments r_ik? Estimating cluster parameters from soft assignments: instead of placing the full observation x_i in a single cluster k, just allocate a portion r_ik of it there; x_i is divided across all clusters, as determined by the r_ik.

Maximum likelihood estimation from soft assignments: just as in boosting with weighted observations, each row of the data table now carries weights r_i1, r_i2, r_i3 instead of a hard cluster label; r_i3, for example, is the chance that observation i is in cluster 3. The total weight in a cluster is the effective number of observations assigned to it.

Maximum likelihood estimation from soft assignments: to estimate cluster 1's parameters, use the whole data table together with the cluster-1 weights r_i1.

Maximum likelihood estimation from soft assignments: likewise, cluster 2 uses the full data table with weights r_i2, and cluster 3 with weights r_i3 (the slides show example weight columns, e.g., 0.30, 0.01, ...).

Cluster-specific location/shape MLE: compute the cluster parameter estimates with weights on each row operation,

μ̂_k = (1 / N_k^soft) Σ_{i=1}^N r_ik x_i
Σ̂_k = (1 / N_k^soft) Σ_{i=1}^N r_ik (x_i - μ̂_k)(x_i - μ̂_k)^T

where N_k^soft = Σ_{i=1}^N r_ik is the total weight in cluster k, i.e., the effective number of observations.

MLE of cluster proportions: estimate the cluster proportions from the relative weights,

π̂_k = N_k^soft / N,

where N is the number of datapoints and N_k^soft = Σ_{i=1}^N r_ik is the total weight in cluster k (the effective number of observations). This defaults to the hard-assignment case when every r_ik is in {0, 1}: hard assignments have r_ik = 1 if observation i is in cluster k and 0 otherwise, a one-hot encoding of the cluster assignment, so the total weight in a cluster is just its count of observations.

Equating the estimates, the soft-assignment MLEs are

π̂_k = N_k^soft / N
μ̂_k = (1 / N_k^soft) Σ_{i=1}^N r_ik x_i
Σ̂_k = (1 / N_k^soft) Σ_{i=1}^N r_ik (x_i - μ̂_k)(x_i - μ̂_k)^T
N_k^soft = Σ_{i=1}^N r_ik

Part 2b summary: it is still straightforward to compute cluster parameter estimates from soft assignments.
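As the summary says, these estimates are straightforward to compute. A minimal sketch of this M-step assuming NumPy, with r the N-by-K matrix of responsibilities; the function name is mine.

```python
import numpy as np

def m_step(x, r):
    """M-step: weighted MLE of {pi_k, mu_k, Sigma_k} from responsibilities r (shape N x K)."""
    N, d = x.shape
    K = r.shape[1]
    Nk_soft = r.sum(axis=0)                       # effective number of observations per cluster
    pi_hat = Nk_soft / N                          # hat(pi)_k = N_k^soft / N
    mu_hat = (r.T @ x) / Nk_soft[:, None]         # hat(mu)_k = (1/N_k^soft) sum_i r_ik x_i
    Sigma_hat = np.zeros((K, d, d))
    for k in range(K):
        centered = x - mu_hat[k]
        Sigma_hat[k] = (r[:, k, None] * centered).T @ centered / Nk_soft[k]
    return pi_hat, mu_hat, Sigma_hat
```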

Expectation maximization (EM): an iterative algorithm. Parts 1 and 2b together motivate an iterative algorithm:
1. E-step: estimate cluster responsibilities given the current parameter estimates
2. M-step: maximize the likelihood over the parameters given the current responsibilities
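A compact, self-contained sketch of the full loop, assuming NumPy and SciPy; it folds an E-step and an M-step into one function, uses a crude random initialization, adds a tiny diagonal term to each covariance (anticipating the regularization discussed later), and monitors the data log likelihood to decide when to stop. All names and default values are my own choices, not the lecture's.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(x, K, n_iter=100, tol=1e-6, seed=0):
    """EM for a mixture of Gaussians: alternate E-step (responsibilities) and M-step (weighted MLE)."""
    rng = np.random.default_rng(seed)
    N, d = x.shape
    # Crude initialization: K random data points as means, global covariance, uniform proportions.
    mu = x[rng.choice(N, size=K, replace=False)]
    Sigma = np.array([np.cov(x.T) + 1e-6 * np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: r[i, k] proportional to pi_k N(x_i | mu_k, Sigma_k).
        lik = np.column_stack([pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
                               for k in range(K)])
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: weighted MLE from the soft assignments.
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ x) / Nk[:, None]
        for k in range(K):
            centered = x - mu[k]
            Sigma[k] = (r[:, k, None] * centered).T @ centered / Nk[k] + 1e-6 * np.eye(d)
        # Convergence check on the data log likelihood (non-decreasing across iterations).
        ll = np.log(lik.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma, r
```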

EM for mixtures of Gaussians in pictures: the slides show the fitted clusters and soft assignments at initialization, after the 1st iteration, after the 2nd iteration, and at the converged solution (followed by a replay of the sequence).

The nitty gritty of EM

Convergence and initialization of EM. Convergence of EM: EM is a coordinate-ascent algorithm; the E- and M-steps can be equated with alternating maximizations of an objective function. It converges to a local mode. We will assess convergence via the (log) likelihood of the data under the current parameter and responsibility estimates.
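One way to compute that log-likelihood objective, again assuming NumPy/SciPy and names of my own choosing; monitored across iterations, it should be non-decreasing for EM.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(x, pi, mu, Sigma):
    """Data log likelihood: sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k)."""
    lik = np.column_stack([pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
                           for k in range(len(pi))])
    return np.log(lik.sum(axis=1)).sum()
```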

Initialization: there are many ways to initialize the EM algorithm, and the choice matters for convergence rates and the quality of the local mode found. Examples (the first two are sketched in the code below):
- Choose K observations at random to define K centroids; assign the other observations to the nearest centroid to form initial parameter estimates.
- Pick centers sequentially to provide good coverage of the data, as in k-means++.
- Initialize from a k-means solution.
- Grow the mixture model by splitting (and sometimes removing) clusters until K clusters are formed.
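A sketch of the first two strategies, assuming NumPy; the helper names, and how the remaining parameters would be filled in afterward, are my own choices.

```python
import numpy as np

def init_random_points(x, K, rng):
    """Choose K observations at random as initial cluster means."""
    return x[rng.choice(len(x), size=K, replace=False)]

def init_plusplus(x, K, rng):
    """k-means++-style seeding: pick centers sequentially, favoring points far from existing centers."""
    centers = [x[rng.integers(len(x))]]
    for _ in range(K - 1):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
print(init_plusplus(x, 3, rng))
```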

Potential of vanilla EM to overfit

Overfitting of MLE: maximizing likelihood can overfit to the data. Imagine a K=2 example with one observation assigned to cluster 1 and the others assigned to cluster 2. What parameter values maximize the likelihood? Set the cluster-1 center equal to that point and shrink its variance to 0; the likelihood goes to infinity!

Overfitting in high dimensions, a document-clustering example: imagine only one document assigned to cluster k has word w (or all documents in the cluster agree on the count of word w). The likelihood is maximized by setting μ_k[w] = x_i[w] and σ²_{w,k} = 0. Then the likelihood of any document with a different count on word w being in cluster k is 0!
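A tiny numeric illustration of that blow-up in the K=2 example (assuming NumPy): the density of the lone observation under a Gaussian centered exactly on it grows without bound as the variance shrinks.

```python
import numpy as np

x_i = 0.7                                          # the lone observation in cluster 1
for sigma2 in [1e-1, 1e-3, 1e-5, 1e-7]:
    density = 1.0 / np.sqrt(2 * np.pi * sigma2)    # N(x_i | mu=x_i, sigma2): exponent term equals 1
    print(sigma2, density)                         # density grows without bound as sigma2 -> 0
```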

Simple regularization of the M-step for mixtures of Gaussians. Simple fix: don't let the variances go to 0! Add a small amount to the diagonal of the covariance estimate. Alternatively, take a Bayesian approach and place a prior on the parameters; the idea is similar, but all parameter estimates are smoothed via cluster pseudo-observations.
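The first fix amounts to one extra line in the covariance update. A sketch assuming NumPy, with the ridge size eps an arbitrary illustrative value and the function name my own.

```python
import numpy as np

def regularized_cov(x, r_k, mu_k, eps=1e-6):
    """Soft-assignment covariance MLE with a small ridge added to the diagonal."""
    centered = x - mu_k
    Sigma_k = (r_k[:, None] * centered).T @ centered / r_k.sum()
    return Sigma_k + eps * np.eye(x.shape[1])      # keeps variances away from 0
```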

Formally relating k-means and EM for mixtures of Gaussians

Relationship to k-means: consider a Gaussian mixture model with spherically symmetric clusters, Σ_k = σ²I, and let the variance parameter σ go to 0.
- With spherical clusters of equal variance, the relative likelihoods are just a function of the distance to each cluster center.
- As the variances go to 0, the likelihood ratio becomes 0 or 1.
- Responsibilities weigh in the cluster proportions, but they are dominated by the likelihood disparity.
Each datapoint gets fully assigned to the nearest center, just as in k-means.

Infinitesimally small variance EM = k-means:
1. E-step: estimate cluster responsibilities given the current parameter estimates; with infinitesimally small variances this becomes a decision based on the distance to the nearest cluster center.
2. M-step: maximize the likelihood over the parameters given the current responsibilities (now hard assignments!)
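A quick numeric check of this limit (assuming NumPy, with made-up points): with spherical, equal-variance clusters, a point's responsibilities approach a one-hot vector on the nearest center as σ² shrinks.

```python
import numpy as np

x_i = np.array([1.0, 0.4])
centers = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]])
pi = np.array([1 / 3, 1 / 3, 1 / 3])

for sigma2 in [1.0, 0.1, 0.01]:
    # Spherical Gaussians: the likelihood depends only on squared distance to each center.
    d2 = ((centers - x_i) ** 2).sum(axis=1)
    unnorm = pi * np.exp(-0.5 * d2 / sigma2)
    print(sigma2, unnorm / unnorm.sum())           # tends to one-hot on the nearest center
```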

Summary for the EM algorithm. What you can do now:
- Estimate soft assignments (responsibilities) given mixture model parameters
- Solve maximum likelihood parameter estimation using soft assignments (weighted data)
- Implement an EM algorithm for inferring soft assignments and cluster parameters
  - Determine an initialization strategy
  - Implement a variant that helps avoid overfitting issues
- Compare and contrast with k-means
  - Soft vs. hard assignments
  - k-means as a limiting special case of EM for mixtures of Gaussians
