Latent Variable Models and Expectation Maximization
|
|
- April Mosley
- 5 years ago
- Views:
Transcription
1 Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9
2 small then only the white circle is seen, and there is no extrema in entropy. in entropy. There There is anis entropy an entropy extrema extrema when when the the µ, Σ) α f trema scale scale is slightly is slightly larger larger than the thanradius the radius of theofbright bright circle, circle, tative p relative to a ref- which which has has In practice In practice this method this method gives gives stable stable identification identification of fea- of fea- to a refnitydensity re arts assumed are assumed to totures turesaover variety a variety of sizes of sizes and copes and copes well with well with intra-class intra-class ckground model model asin(within a range a range r). r). ant toant scaling, to scaling, although although experimental experimental tests show tests show that this thatis this We as-variability. discussed variability. Theprobabilistic saliency saliency measure measure models is designed is designed attolength betoinvari- invari- is α f d p r f p, U p ) dp r f er. e finder. 1 p(d θ) (N, f) p(d θ) Learning and thereafter and Parameters thereafter entropy the entropy todecreases Probability theasscale scale increases. increases. Distributions detected tures detected using using The n M. second The second is is variable and the and last the last ll ible possible occlusion occlusion l. shape s the shape and ocpearancthe appearance and and compasses many many of of and oc- obabilistic istic way, this way, this constrained objects objects ll a small covariance) covariance) Given fully observed training data, setting parameters θ not entirely i not entirely the case thedue casetodue aliasing to aliasing and other and other effects. effects. Note, Note, for Bayes only monochrome only nets monochrome is straight-forward information information is used is used to detect to detect and represent features. sent infeatures. many settings not all variables are observed and repre- However, (labelled) in the training data: x i = (x i, h i ) 2.3. e.g Feature Speech Feature representation recognition: have speech signals, but not The phoneme feature The feature detector labels detector identifies identifies regions regions of interest of interest on each on each image. e.g. image. Object The coordinates Therecognition: coordinates of theof centre the have centre give object us give Xus and labels Xthe andsize (car, the size bicycle), of but theof not region thepart region gives labels gives S. Figure S. (wheel, Figure 2 illustrates 2door, illustrates this seat) on this two ontypical two typical images images from the frommotorbike the motorbike dataset. dataset. Unobserved variables are called latent variables figs from Fergus et al. Figure Figure 2: Output 2: Output of theof feature the feature detector detector
3 Latent Variables and Simplicity Latent variables can explain observed correlations with a simple model. Fewer parameters. Common in science: The heart, genes, energy, gravity, Smoking Diet Exercise Smoking Diet Exercise 54 HeartDisease Symptom 1 Symptom 2 Symptom Symptom 1 Symptom 2 Symptom 3 (a) (b) Fig. Russell and Norvig 2.1
4 Latent Variable Models: Pros Statistically powerful, often good predictions. Many applications: Learning with missing data. Clustering: missing cluster label for data points. Principal Component Analysis: data points are generated in linear fashion from a small set of unobserved components. (more later) Matrix Factorization, Recommender Systems: Assign users to unobserved user types, assign items to unobserved item types. Use similarity between user type, item type to predict preference of user for item. Winner of $1M Netflix challenge. If latent variables have an intuitive interpretation (e.g., action movies, factors ), discovers new features.
5 Latent Variable Models: Cons From a user s point of view, like a black box if latent variables don t have an intuitive interpretation. Statistically, hard to guarantee convergence to a correct model with more data (the identifiability problem). Harder computationally, usually no closed form for maximum likelihood estimates. However, the Expectation-Maximization algorithm provides a general-purpose local search algorithm for learning parameters in probabilistic models with latent variables.
6 Outline K-Means EM Example: Gaussian Mixture Models The Expectation Maximization Algorithm
7 Outline K-Means EM Example: Gaussian Mixture Models The Expectation Maximization Algorithm
8 Unsupervised Learning 2 (a) 2 We will start with an unsupervised learning (clustering) problem: Given a dataset {x 1,..., x N }, each x i R D, partition the dataset into K clusters Intuitively, a cluster is a group of points, which are close together and far from others
9 Distortion Measure 2 (a) 2 2 (i) 2 Formally, introduce prototypes (or cluster centers) µ k R D Use binary r nk, 1 if point n is in cluster k, otherwise (1-of-K coding scheme again) Find {µ k }, {r nk } to minimize distortion measure: J = N n=1 k=1 K r nk x n µ k 2
10 Minimizing Distortion Measure Minimizing J directly is hard N K J = r nk x n µ k 2 n=1 k=1 However, two things are easy If we know µ k, minimizing J wrt r nk If we know r nk, minimizing J wrt µ k This chicken-and-egg scenario suggests an iterative procedure Start with initial guess for µ k Iteration of two steps: Minimize J wrt r nk Minimize J wrt µ k Rinse and repeat until convergence
11 Minimizing Distortion Measure Minimizing J directly is hard N K J = r nk x n µ k 2 n=1 k=1 However, two things are easy If we know µ k, minimizing J wrt r nk If we know r nk, minimizing J wrt µ k This chicken-and-egg scenario suggests an iterative procedure Start with initial guess for µ k Iteration of two steps: Minimize J wrt r nk Minimize J wrt µ k Rinse and repeat until convergence
12 Minimizing Distortion Measure Minimizing J directly is hard N K J = r nk x n µ k 2 n=1 k=1 However, two things are easy If we know µ k, minimizing J wrt r nk If we know r nk, minimizing J wrt µ k This chicken-and-egg scenario suggests an iterative procedure Start with initial guess for µ k Iteration of two steps: Minimize J wrt r nk Minimize J wrt µ k Rinse and repeat until convergence
13 2 (a) Determining Membership Variables Step 1 in an iteration of K-means is to minimize distortion measure J wrt cluster membership variables r nk 2 2 (b) 2 J = N n=1 k=1 K r nk x n µ k 2 Terms for different data points x n are independent, for each data point set r nk to minimize: K r nk x n µ k 2 How? k=1 Simply set r nk = 1 for the cluster center µ with smallest distance
14 2 (a) Determining Membership Variables Step 1 in an iteration of K-means is to minimize distortion measure J wrt cluster membership variables r nk 2 2 (b) 2 J = N n=1 k=1 K r nk x n µ k 2 Terms for different data points x n are independent, for each data point set r nk to minimize: K r nk x n µ k 2 How? k=1 Simply set r nk = 1 for the cluster center µ with smallest distance
15 2 (a) Determining Membership Variables Step 1 in an iteration of K-means is to minimize distortion measure J wrt cluster membership variables r nk 2 2 (b) 2 J = N n=1 k=1 K r nk x n µ k 2 Terms for different data points x n are independent, for each data point set r nk to minimize: K r nk x n µ k 2 How? k=1 Simply set r nk = 1 for the cluster center µ with smallest distance
16 Determining Cluster Centers Step 2: fix r nk, minimize J wrt the cluster centers µ k 2 (b) K N J = r nk x n µ k 2 switch order of sums k=1 n=1 2 2 (c) 2 Minimize wrt each µ k separately (how?) Take derivative, set to zero: 2 N r nk (x n µ k ) = n=1 µ k = n r nkx n n r nk i.e. mean of datapoints x n assigned to cluster k
17 Determining Cluster Centers Step 2: fix r nk, minimize J wrt the cluster centers µ k 2 (b) K N J = r nk x n µ k 2 switch order of sums k=1 n=1 2 2 (c) 2 Minimize wrt each µ k separately (how?) Take derivative, set to zero: 2 N r nk (x n µ k ) = n=1 µ k = n r nkx n n r nk i.e. mean of datapoints x n assigned to cluster k
18 K-means Algorithm Start with initial guess for µ k Iteration of two steps: Minimize J wrt r nk Assign points to nearest cluster center Minimize J wrt µ k Set cluster center as average of points in cluster Rinse and repeat until convergence
19 K-means example 2 (a) 2
20 K-means example 2 (b) 2
21 K-means example 2 (c) 2
22 K-means example 2 (d) 2
23 K-means example 2 (e) 2
24 K-means example 2 (f) 2
25 K-means example 2 (g) 2
26 K-means example 2 (h) 2
27 K-means example 2 (i) 2 Next step doesn t change membership stop
28 K-means Convergence Repeat steps until no change in cluster assignments For each step, value of J either goes down, or we stop Finite number of possible assignments of data points to clusters, so we are guaranteed to converge eventually Note it may be a local maximum rather than a global maximum to which we converge
29 K-means Example - Image Segmentation Original image K-means clustering on pixel colour values Pixels in a cluster are coloured by cluster mean Represent each pixel (e.g. 24-bit colour value) by a cluster number (e.g. 4 bits for K = 1), compressed version
30 K-means Generalized: the set-up Let s generalize the idea. Suppose we have the following set-up. x denotes all observed variables (e.g., data points). Z denotes all latent (hidden, unobserved) variables (e.g., cluster means). J(x, Z θ) where J measures the goodness of an assignment of latent variable models given the data points and parameters θ. e.g., J = -dispersion measure. parameters = assignment of points to clusters. It s easy to maximize J(x, Z θ) wrt θ for fixed Z. It s easy to maximize J(x, Z θ) wrt Z for fixed θ.
31 Outline K-Means EM Example: Gaussian Mixture Models The Expectation Maximization Algorithm
32 Hard Assignment vs. Soft Assignment 2 (i) In the K-means algorithm, a hard 2 assignment of points to clusters is made However, for points near the decision boundary, this may not be a good idea Instead, we could think about making a soft assignment of points to clusters
33 Gaussian Mixture Model 1 (a) 1 (b) The Gaussian mixture model (or mixture of Gaussians MoG) models the data as a combination of Gaussians. a: constant density contours. b: marginal probability p(x). c: surface plot. Widely used general approximation for multi-modal distributions. See text Sec for kernel version
34 Gaussian Mixture Model 1 (b) 1 (a) Above shows a dataset generated by drawing samples from three different Gaussians
35 ag=1) θ Generative Model ag C 1 (c).5 pper Hole X.5 1 (a) (b) The mixture of Gaussians is a generative model To generate a datapoint x, we first generate a value C = i for the component/cluster C {1, 2, 3} We then generate a vector x using P(x C = i) N (x µ i, Σ i ) using the Gaussian parameters for component i
36 B) P(Bag=1) θ MoG Marginal over Observed Variables Bag C vor Wrapper Hole X (a) (b) The marginal distribution p(x θ) for this model with parameter values θ is: p(x) = = 3 3 p(x, C = i) = p(c = i)p(x C = i) i=1 3 w i N (x µ i, Σ i ) i=1 i=1 A mixture of Gaussians
37 P(Bag=1) θ Learning Gaussian Mixture Models B) Bag C vor Wrapper Hole X The (a) EM algorithm can be used (b) to learn Bayes net parameters with unobserved variables 1. Initialize parameters randomly (or using k-means) 2. Repeat the following Expectation E-step Find probability of cluster assignments Maximization M-step Update parameter values
38 =1) E-step: Probability of Cluster Assignments g C 1 (b).5 per Hole X.5 1 ) (b) How to find p ij = P(C = i x j )? Use Bayes Theorem p ij P(x j C = i)p(c = i) Bayes net inference!
39 =1) E-step: Probability of Cluster Assignments g C 1 (b).5 per Hole X.5 1 ) (b) How to find p ij = P(C = i x j )? Use Bayes Theorem p ij P(x j C = i)p(c = i) Bayes net inference!
40 EM for Gaussian Mixtures: Update Parameters Update parameters wrt soft cluster assignments p ij n i j p ij w i P(C = i) := n i /N µ i := j p ij x j /n i Σ i = j p ij (x j µ i )(x j µ i ) T /n i n i is the effective (estimated) number of data points from component i
41 MoG EM - Example 2 (a) 2 Same initialization as with K-means before Often, K-means is actually used to initialize EM
42 MoG EM - Example 2 (b) 2 Calculate cluster probabilities p ij Shown using mixtures of colours.
43 MoG EM - Example 2 (c) 2 Calculate model parameters {P(C = i), µ i, Σ i } using the cluster probabilities
44 MoG EM - Example 2 (d) 2 Iteration 2
45 MoG EM - Example 2 (e) 2 Iteration 5
46 MoG EM - Example 2 Iteration 2 - converged (f) 2
47 Log-likelihood L EM and Maximum Likelihood 1. The EM procedure is guaranteed to increase at each step, the data log-likelihood ln p(d θ). 2. Therefore converges to local log-likelihood maximum. 3. Data log-likelihood of true model is around 65 in figure Iteration number Fig. Russell and Norvig 2.12
48 Outline K-Means EM Example: Gaussian Mixture Models The Expectation Maximization Algorithm
49 Hard EM Algorithm We assume a probabilistic model, specifically the complete-data log likelihood function P(x, Z = z θ). Goodness of the model is the log-likelihood L(x, Z = z θ) ln P(x, Z = z θ). Guess the value for latent variables that is the expected value given current parameter settings: E[Z] where the distribution over latent variables is given by P(Z x, θ (i) ). Given latent variable values, the next parameter values θ i+1 are evaluated by maximizing the likelihood L(x Z = E[Z], θ). Example: fill in missing values, single variable, Gaussian model.
50 Generalized EM Algorithm In Bayesian fashion, do not guess a best value for the latent variables Z. Instead, average over the distribution P(Z x, θ (i) ) given the current hypothesis. Given a latent variable distribution, the next parameter values θ (i+1) are evaluated by taking the expected likelihood L(x, Z = z θ (i+1) ) over all possible latent variable settings. In one equation: θ (i+1) = argmax θ P(Z x, θ (i) )L(x, Z = z θ) z
51 EM Algorithm: The procedure 1. Guess an initial parameter setting θ (i). 2. Repeat until convergence: 2.1 The E-step: Evaluate P(Z = z x, θ (i) ). 2.2 The M-step: Evaluate the function Q(θ, θ (i) ) z P(Z x, θ(i) )L(x, Z = z θ) Maximize Q(θ, θ (i) ) wrt θ. Update θ (i+1) to be the maximum.
52 Bag P(F=cherry B) 1 θ F1 2 θ F2 P(Bag=1) EM θfor Gaussian Mixtures: E-step (I) Bag C Flavor Wrapper Hole X (a) (b) Want to evaluate Q(θ, θ (i) ) z P(Z x, θ(i) )L(x, Z = z θ) Let Z ij = 1 denote the event that data point j is assigned to cluster i. Then L(x, Z = z θ) = ln P(x j C = i, θ) z ij P(C = i θ) z ij i j = z ij [ln P(x j C = i, θ) z ij + ln P(C = i θ)] i j z ij l ij i j
53 P(Bag=1) EM for Gaussian Mixtures: E-step (II) θ Bag P(F=cherry B) 1 θ F1 2 θ F2 Bag C Flavor Wrapper Hole (a) (b) Want to evaluate Q(θ, θ (i) ) z P(Z x, θ(i) ) i j z ijl ij This is the expectation of a sum = sum of expectations (see assignment). Therefore P(Z x, θ (i) ) z ij l ij = z i j = P(Z ij = 1 x, θ (i) )l ij = i j p ijl ij i j X E Zij x,θ (i)[z ijl ij ] i j
54 EM for Gaussian Mixtures: E-step (III) P(Bag=1) θ Bag P(F=cherry B) Bag C 1 θ F1 2 θ F2 Flavor Wrapper Hole X (a) Want to evaluate Q(θ, θ (i) ) z P(Z x, θ(i) )L(x, Z = z θ) We showed that this is equal to i j p ijl(x j, z ij θ) Deriving the maximization solution is more or less straightforward and leads to the updates shown before. (b) Variations on this theme work for many models where the likelihood function factors (think Bayes nets).
55 EM - Summary EM finds local maximum to (incomplete) data likelihood P(x θ) = Z p(x, Z θ) Iterates two steps: E step calculates the distribution of the missing variables Z (Hard EM fills in the variables). M step maximizes expected complete log likelihood (expectation wrt E step distribution) Often the updates can be obtained in closed form E.g., for discrete Bayesian networks (see Ch.2.3.2)
56 Conclusion K-means clustering Gaussian mixture model What about K? Model selection: either cross-validation or Bayesian version (average over all values for K) Expectation-maximization, a general method for learning parameters of models when not all variables are observed
Introduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods 2 Clustering 3 K-
More informationCS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed
More informationExpectation Maximization (EM) and Gaussian Mixture Models
Expectation Maximization (EM) and Gaussian Mixture Models Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 2 3 4 5 6 7 8 Unsupervised Learning Motivation
More informationInference and Representation
Inference and Representation Rachel Hodos New York University Lecture 5, October 6, 2015 Rachel Hodos Lecture 5: Inference and Representation Today: Learning with hidden variables Outline: Unsupervised
More informationClustering web search results
Clustering K-means Machine Learning CSE546 Emily Fox University of Washington November 4, 2013 1 Clustering images Set of Images [Goldberger et al.] 2 1 Clustering web search results 3 Some Data 4 2 K-means
More information9.1. K-means Clustering
424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,
More informationMixture Models and the EM Algorithm
Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is
More informationClustering Lecture 5: Mixture Model
Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics
More informationNote Set 4: Finite Mixture Models and the EM Algorithm
Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationGaussian Mixture Models For Clustering Data. Soft Clustering and the EM Algorithm
Gaussian Mixture Models For Clustering Data Soft Clustering and the EM Algorithm K-Means Clustering Input: Observations: xx ii R dd ii {1,., NN} Number of Clusters: kk Output: Cluster Assignments. Cluster
More informationMixture Models and EM
Table of Content Chapter 9 Mixture Models and EM -means Clustering Gaussian Mixture Models (GMM) Expectation Maximiation (EM) for Mixture Parameter Estimation Introduction Mixture models allows Complex
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationK-means and Hierarchical Clustering
K-means and Hierarchical Clustering Xiaohui Xie University of California, Irvine K-means and Hierarchical Clustering p.1/18 Clustering Given n data points X = {x 1, x 2,, x n }. Clustering is the partitioning
More informationLecture 16: Object recognition: Part-based generative models
Lecture 16: Object recognition: Part-based generative models Professor Stanford Vision Lab 1 What we will learn today? Introduction Constellation model Weakly supervised training One-shot learning (Problem
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 2014-2015 Jakob Verbeek, November 28, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationMachine Learning. Unsupervised Learning. Manfred Huber
Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov
ECE521: Week 11, Lecture 20 27 March 2017: HMM learning/inference With thanks to Russ Salakhutdinov Examples of other perspectives Murphy 17.4 End of Russell & Norvig 15.2 (Artificial Intelligence: A Modern
More informationCluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008
Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a
More informationCS325 Artificial Intelligence Ch. 20 Unsupervised Machine Learning
CS325 Artificial Intelligence Cengiz Spring 2013 Unsupervised Learning Missing teacher No labels, y Just input data, x What can you learn with it? Unsupervised Learning Missing teacher No labels, y Just
More informationColorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.
Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Image Segmentation Some material for these slides comes from https://www.csd.uwo.ca/courses/cs4487a/
More informationMachine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves
Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Unsupervised Learning: Kmeans, GMM, EM Readings: Barber 20.1-20.3 Stefan Lee Virginia Tech Tasks Supervised Learning x Classification y Discrete x Regression
More informationCOMP 551 Applied Machine Learning Lecture 13: Unsupervised learning
COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationIBL and clustering. Relationship of IBL with CBR
IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed
More informationMachine Learning Lecture 3
Machine Learning Lecture 3 Probability Density Estimation II 19.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Exam dates We re in the process
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 9, 2012 Today: Graphical models Bayes Nets: Inference Learning Readings: Required: Bishop chapter
More informationLearning from Data Mixture Models
Learning from Data Mixture Models Copyright David Barber 2001-2004. Course lecturer: Amos Storkey a.storkey@ed.ac.uk Course page : http://www.anc.ed.ac.uk/ amos/lfd/ 1 It is not uncommon for data to come
More informationRandom projection for non-gaussian mixture models
Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,
More informationExpectation Maximization: Inferring model parameters and class labels
Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/26/17 Jumble of unlabeled images HISTOGRAM blue
More informationMachine Learning Lecture 3
Many slides adapted from B. Schiele Machine Learning Lecture 3 Probability Density Estimation II 26.04.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course
More informationMachine Learning Lecture 3
Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 26.04.206 Discriminative Approaches (5 weeks) Linear
More informationClustering. Image segmentation, document clustering, protein class discovery, compression
Clustering CS 444 Some material on these is slides borrowed from Andrew Moore's machine learning tutorials located at: Clustering The problem of grouping unlabeled data on the basis of similarity. A key
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationDeep Generative Models Variational Autoencoders
Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative
More informationIntroduction to Mobile Robotics
Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,
More informationCluster Analysis. Debashis Ghosh Department of Statistics Penn State University (based on slides from Jia Li, Dept. of Statistics)
Cluster Analysis Debashis Ghosh Department of Statistics Penn State University (based on slides from Jia Li, Dept. of Statistics) Summer School in Statistics for Astronomers June 1-6, 9 Clustering: Intuition
More informationAssignment 2. Unsupervised & Probabilistic Learning. Maneesh Sahani Due: Monday Nov 5, 2018
Assignment 2 Unsupervised & Probabilistic Learning Maneesh Sahani Due: Monday Nov 5, 2018 Note: Assignments are due at 11:00 AM (the start of lecture) on the date above. he usual College late assignments
More informationDiscovering Visual Hierarchy through Unsupervised Learning Haider Razvi
Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi hrazvi@stanford.edu 1 Introduction: We present a method for discovering visual hierarchy in a set of images. Automatically grouping
More informationUnsupervised Learning
Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised
More informationExpectation Maximization. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University
Expectation Maximization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University April 10 th, 2006 1 Announcements Reminder: Project milestone due Wednesday beginning of class 2 Coordinate
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University March 4, 2015 Today: Graphical models Bayes Nets: EM Mixture of Gaussian clustering Learning Bayes Net structure
More informationExpectation-Maximization. Nuno Vasconcelos ECE Department, UCSD
Expectation-Maximization Nuno Vasconcelos ECE Department, UCSD Plan for today last time we started talking about mixture models we introduced the main ideas behind EM to motivate EM, we looked at classification-maximization
More informationClustering & Dimensionality Reduction. 273A Intro Machine Learning
Clustering & Dimensionality Reduction 273A Intro Machine Learning What is Unsupervised Learning? In supervised learning we were given attributes & targets (e.g. class labels). In unsupervised learning
More informationMixture Models and EM
Mixture Models and EM Goal: Introduction to probabilistic mixture models and the expectationmaximization (EM) algorithm. Motivation: simultaneous fitting of multiple model instances unsupervised clustering
More informationMachine Learning A WS15/16 1sst KU Version: January 11, b) [1 P] For the probability distribution P (A, B, C, D) with the factorization
Machine Learning A 708.064 WS15/16 1sst KU Version: January 11, 2016 Exercises Problems marked with * are optional. 1 Conditional Independence I [3 P] a) [1 P] For the probability distribution P (A, B,
More informationClustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford
Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically
More informationCIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]
CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.
More informationCS 6140: Machine Learning Spring 2016
CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Exam
More informationCS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample
CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:
More informationClustering. Shishir K. Shah
Clustering Shishir K. Shah Acknowledgement: Notes by Profs. M. Pollefeys, R. Jin, B. Liu, Y. Ukrainitz, B. Sarel, D. Forsyth, M. Shah, K. Grauman, and S. K. Shah Clustering l Clustering is a technique
More informationMachine Learning A W 1sst KU. b) [1 P] For the probability distribution P (A, B, C, D) with the factorization
Machine Learning A 708.064 13W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence a) [1 P] For the probability distribution P (A, B, C, D) with the factorization P (A, B,
More informationThe EM Algorithm Lecture What's the Point? Maximum likelihood parameter estimates: One denition of the \best" knob settings. Often impossible to nd di
The EM Algorithm This lecture introduces an important statistical estimation algorithm known as the EM or \expectation-maximization" algorithm. It reviews the situations in which EM works well and its
More informationAn Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework
IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster
More informationCOMPUTATIONAL STATISTICS UNSUPERVISED LEARNING
COMPUTATIONAL STATISTICS UNSUPERVISED LEARNING Luca Bortolussi Department of Mathematics and Geosciences University of Trieste Office 238, third floor, H2bis luca@dmi.units.it Trieste, Winter Semester
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationSemantic Segmentation. Zhongang Qi
Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in
More informationhttp://www.xkcd.com/233/ Text Clustering David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Administrative 2 nd status reports Paper review
More informationDay 3 Lecture 1. Unsupervised Learning
Day 3 Lecture 1 Unsupervised Learning Semi-supervised and transfer learning Myth: you can t do deep learning unless you have a million labelled examples for your problem. Reality You can learn useful representations
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationAnnouncements. Image Segmentation. From images to objects. Extracting objects. Status reports next Thursday ~5min presentations in class
Image Segmentation Announcements Status reports next Thursday ~5min presentations in class Project voting From Sandlot Science Today s Readings Forsyth & Ponce, Chapter 1 (plus lots of optional references
More informationMore Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA
More Learning Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA 1 Ensembles An ensemble is a set of classifiers whose combined results give the final decision. test feature vector
More informationLogisBcs. CS 6140: Machine Learning Spring K-means Algorithm. Today s Outline 3/27/16
LogisBcs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and InformaBon Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Exam
More informationGeoff McLachlan and Angus Ng. University of Queensland. Schlumberger Chaired Professor Univ. of Texas at Austin. + Chris Bishop
EM Algorithm Geoff McLachlan and Angus Ng Department of Mathematics & Institute for Molecular Bioscience University of Queensland Adapted by Joydeep Ghosh Schlumberger Chaired Professor Univ. of Texas
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationk-means demo Administrative Machine learning: Unsupervised learning" Assignment 5 out
Machine learning: Unsupervised learning" David Kauchak cs Spring 0 adapted from: http://www.stanford.edu/class/cs76/handouts/lecture7-clustering.ppt http://www.youtube.com/watch?v=or_-y-eilqo Administrative
More informationMachine Learning Department School of Computer Science Carnegie Mellon University. K- Means + GMMs
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University K- Means + GMMs Clustering Readings: Murphy 25.5 Bishop 12.1, 12.3 HTF 14.3.0 Mitchell
More informationCS 231A Computer Vision (Fall 2012) Problem Set 3
CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14
More informationLecture 11: E-M and MeanShift. CAP 5415 Fall 2007
Lecture 11: E-M and MeanShift CAP 5415 Fall 2007 Review on Segmentation by Clustering Each Pixel Data Vector Example (From Comanciu and Meer) Review of k-means Let's find three clusters in this data These
More informationClustering Color/Intensity. Group together pixels of similar color/intensity.
Clustering Color/Intensity Group together pixels of similar color/intensity. Agglomerative Clustering Cluster = connected pixels with similar color. Optimal decomposition may be hard. For example, find
More informationLast week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints
Last week Multi-Frame Structure from Motion: Multi-View Stereo Unknown camera viewpoints Last week PCA Today Recognition Today Recognition Recognition problems What is it? Object detection Who is it? Recognizing
More informationSelf-organizing mixture models
Self-organizing mixture models Jakob Verbeek, Nikos Vlassis, Ben Krose To cite this version: Jakob Verbeek, Nikos Vlassis, Ben Krose. Self-organizing mixture models. Neurocomputing / EEG Neurocomputing,
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More information22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos
Machine Learning for Computer Vision 1 22 October, 2012 MVA ENS Cachan Lecture 5: Introduction to generative models Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris
More informationUnsupervised Learning: Clustering
Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning
More informationCS Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts
More informationFisher vector image representation
Fisher vector image representation Jakob Verbeek January 13, 2012 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.11.12.php Fisher vector representation Alternative to bag-of-words image representation
More informationHomework #4 Programming Assignment Due: 11:59 pm, November 4, 2018
CSCI 567, Fall 18 Haipeng Luo Homework #4 Programming Assignment Due: 11:59 pm, ovember 4, 2018 General instructions Your repository will have now a directory P4/. Please do not change the name of this
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationA Taxonomy of Semi-Supervised Learning Algorithms
A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Unsupervised learning Daniel Hennes 29.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Supervised learning Regression (linear
More informationGraphical Models & HMMs
Graphical Models & HMMs Henrik I. Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I. Christensen (RIM@GT) Graphical Models
More information10. MLSP intro. (Clustering: K-means, EM, GMM, etc.)
10. MLSP intro. (Clustering: K-means, EM, GMM, etc.) Rahil Mahdian 01.04.2016 LSV Lab, Saarland University, Germany What is clustering? Clustering is the classification of objects into different groups,
More informationClustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation
More informationCluster analysis formalism, algorithms. Department of Cybernetics, Czech Technical University in Prague.
Cluster analysis formalism, algorithms Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz poutline motivation why clustering? applications, clustering as
More informationComputer Vision Lecture 6
Course Outline Computer Vision Lecture 6 Segmentation Image Processing Basics Structure Extraction Segmentation Segmentation as Clustering Graph-theoretic Segmentation 12.11.2015 Recognition Global Representations
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationInvariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction
Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of
More informationIntroduction to Graphical Models
Robert Collins CSE586 Introduction to Graphical Models Readings in Prince textbook: Chapters 10 and 11 but mainly only on directed graphs at this time Credits: Several slides are from: Review: Probability
More information