Clustering. CS 444. Some material on these slides is borrowed from Andrew Moore's machine learning tutorials, located at:

Clustering The problem of grouping unlabeled data on the basis of similarity. A key component of data mining: is there useful structure hidden in this data? Applications: image segmentation, document clustering, protein class discovery, compression.

K-means Ask user how many clusters they'd like. (e.g. k=5)

K-means Ask user how many clusters they'd like. (e.g. k=5) Randomly guess k cluster Center locations

K-means Ask user how many clusters they'd like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it's closest to. (Thus each Center owns a set of datapoints)

K-means Ask user how many clusters they'd like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it's closest to. Each Center finds the centroid of the points it owns

K-means Ask user how many clusters they'd like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it's closest to. Each Center finds the centroid of the points it owns and jumps there. Repeat until terminated!
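A minimal sketch of the loop just described, in Python with NumPy; the function name kmeans, the initialization by sampling k data points, and the iteration cap are choices made here, not part of the slides.

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means on an (N, d) data matrix X."""
    rng = np.random.default_rng(seed)
    # Randomly guess k cluster Center locations (here: k distinct data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Each datapoint finds out which Center it's closest to.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        owner = dists.argmin(axis=1)
        # Each Center jumps to the centroid of the points it owns
        # (a Center that owns no points stays where it is).
        new_centers = np.array([
            X[owner == j].mean(axis=0) if np.any(owner == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # terminated: Centers stopped moving
            break
        centers = new_centers
    return centers, owner

Calling kmeans(X, k=5) mirrors the k=5 run shown on the following slides, up to the random initialization.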

K-means Start. Advance apologies: in Black and White this example will deteriorate. Example generated by Dan Pelleg's super-duper fast K-means system: Dan Pelleg and Andrew Moore, "Accelerating Exact k-means Algorithms with Geometric Reasoning," Proc. Conference on Knowledge Discovery in Databases (KDD99), 1999 (available at www.autonlab.org/pap.html).

K-means continues (a sequence of eight slides showing successive iterations; figures not reproduced here)

K-means terminates

K-Means This can be seen as attempting to minimize the total squared distance between the data points and their cluster centers. It is guaranteed to converge. It is not guaranteed to reach a global minimum. Commonly used because: it is easy to code, and it is efficient.
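The "total squared distance" above can be written as the distortion objective below; the notation c(i), for the index of the Center that owns point x_i, is introduced here for illustration and is not on the slides.

    J(c, \mu) = \sum_{i=1}^{N} \lVert x_i - \mu_{c(i)} \rVert^2

Both K-means steps (assigning each point to its nearest Center, then moving each Center to the centroid of its points) can only decrease J, which is why the algorithm converges, though possibly only to a local minimum.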

Parameterized Probability Distributions Parameterized probability distribution: P_θ(X) = P(X | θ), where θ holds the parameters of the distribution. Trivial discrete example: X is a Boolean random variable, and θ indicates the probability that it will be true. For θ = 0.6: P(X=TRUE | θ=0.6) = 0.6 and P(X=FALSE | θ=0.6) = 0.4. For θ = 0.1: P(X=TRUE | θ=0.1) = 0.1 and P(X=FALSE | θ=0.1) = 0.9.

Fitting a Distribution to Data Assume we have a set of data points x_1 to x_N. The goal is to find a distribution that fits that data, i.e. that could have generated the data. One possibility: the maximum likelihood estimate (MLE) finds the parameters that maximize the probability of the data: θ* = argmax_θ P(x_1, x_2, ..., x_N | θ).

ML Learning We will assume that x_1 to x_N are iid (independent and identically distributed), so we can rewrite our problem like this (factorization): θ* = argmax_θ ∏_{i=1}^N P(x_i | θ). Then we can apply our favorite log trick, giving us the log likelihood: θ* = argmax_θ ∑_{i=1}^N log P(x_i | θ) = argmax_θ LL(θ).

Maximizing Log Likelihood Just another instance of function maximization. One approach: set the partial derivatives to 0 and solve: ∂LL/∂θ_1 = 0, ∂LL/∂θ_2 = 0, ..., ∂LL/∂θ_K = 0. If you can't solve it: gradient descent, or your favorite search algorithm.

Silly Example Parameterized coin: θ is the probability of heads. d is the vector of toss data, h the number of heads, t the number of tails.

P(d | θ) = ∏_{i=1}^N P(d_i | θ) = θ^h (1 − θ)^t

L(d | θ) = log P(d | θ) = h log θ + t log(1 − θ)

∂L/∂θ = h/θ − t/(1 − θ) = 0  ⟹  θ = h / (h + t)

Remember: d/dx log(x) = 1/x. Example borrowed from Russell & Norvig.
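A quick numerical sanity check of the closed-form answer θ = h/(h + t), as a sketch in Python with NumPy; the counts h and t and the grid search over θ are only for illustration.

import numpy as np

h, t = 7, 3                                  # e.g. 7 heads, 3 tails observed
thetas = np.linspace(0.001, 0.999, 999)      # candidate values of theta
loglik = h * np.log(thetas) + t * np.log(1 - thetas)
print(thetas[loglik.argmax()])               # approximately 0.7
print(h / (h + t))                           # 0.7, the closed-form MLE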

EM A general approach to maximum likelihood learning in cases of missing data, e.g. clustering: each data point REALLY comes from some cluster, but that cluster assignment is missing.

EM Hidden variables are Z, observed variables are X. Guess an assignment to our parameters: θ. Expectation-Step: compute the expected value of our hidden variables, E[Z]. Maximization-Step: pretend that E[Z] is the true value of Z and use maximum likelihood to calculate a new θ.
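The alternation can be written as a short generic loop; a sketch in Python, where e_step and m_step are placeholders to be supplied by the specific model (the GMM versions appear on the slides below), and the fixed iteration count is an arbitrary choice here.

def em(X, theta0, e_step, m_step, n_iters=50):
    """Generic EM loop.
    theta0: initial guess for the parameters theta.
    e_step(X, theta): returns the expected hidden values E[Z].
    m_step(X, EZ): returns the maximum-likelihood theta, treating E[Z] as the true Z."""
    theta = theta0
    for _ in range(n_iters):
        EZ = e_step(X, theta)       # E-step: expected hidden variables
        theta = m_step(X, EZ)       # M-step: re-estimate parameters by maximum likelihood
    return theta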

Gaussian Mixture Example We have this: [figure: unlabeled data points]. Life would be easier if we had this: [figure: the same points labeled by the mixture component that generated them].

EM for GMM E-Step (p_{i,j} is the probability that point i was generated by mixture component j):

p_{i,j} = p(x_i | μ_j, Σ_j) π_j / ∑_{k=1}^K p(x_i | μ_k, Σ_k) π_k

EM for GMM M-Step: update the parameters:

μ_j = ∑_{i=1}^N p_{i,j} x_i / ∑_{i=1}^N p_{i,j}

Σ_j = ∑_{i=1}^N p_{i,j} (x_i − μ_j)(x_i − μ_j)^T / ∑_{i=1}^N p_{i,j}

π_j = (1/N) ∑_{i=1}^N p_{i,j}
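Putting the E-step and M-step together: a minimal sketch in Python, assuming NumPy and SciPy's multivariate_normal; the function name em_step, the variable names (mu, sigma, pi, mirroring μ_j, Σ_j, π_j), and the omission of initialization and the outer loop are choices made here, not part of the slides.

import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, mu, sigma, pi):
    """One EM iteration for a Gaussian mixture.
    X: (N, d) data; mu: (K, d) means; sigma: (K, d, d) covariances; pi: (K,) weights."""
    N, K = len(X), len(pi)
    # E-step: p[i, j] = responsibility of component j for point i.
    p = np.column_stack([
        pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=sigma[j])
        for j in range(K)
    ])
    p /= p.sum(axis=1, keepdims=True)
    # M-step: re-estimate means, covariances, and mixing weights.
    Nj = p.sum(axis=0)                          # effective number of points per component
    mu = (p.T @ X) / Nj[:, None]
    sigma = np.stack([
        (p[:, j, None] * (X - mu[j])).T @ (X - mu[j]) / Nj[j]
        for j in range(K)
    ])
    pi = Nj / N
    return mu, sigma, pi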

Gaussian Mixture Example: Start. Advance apologies: in Black and White this example will be incomprehensible.

The remaining slides show the fitted mixture after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations (figures not reproduced here).