Expectation Maximization!


Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University and http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt

Steps in Clustering: select features, define a proximity measure, define a clustering criterion, define a clustering algorithm, validate the results, interpret the results.

Kinds of Clustering: Sequential (fast; results depend on the data order), Cost Optimization (typically a fixed number of clusters), Hierarchical (start with many clusters and join clusters at each step).

A Sequential Clustering Method: the Basic Sequential Algorithmic Scheme (BSAS) [S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, London, 1999]. Assumption: the number of clusters is not known in advance.

m = 1; C_1 = {x_1}
For i = 2 to n
    Find C_k such that d(x_i, C_k) = min_j d(x_i, C_j)
    If d(x_i, C_k) > Θ and m < q
        m = m + 1; C_m = {x_i}
    Else
        C_k = C_k ∪ {x_i}
    End
End

Here d(x, C) is the distance between feature vector x and cluster C, Θ is the threshold of dissimilarity, q is the maximum number of clusters, and n is the number of data points.
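A minimal Python sketch of BSAS under these definitions (the function name and the choice of measuring d(x, C) as the distance to the cluster's current mean are assumptions for illustration; the scheme itself leaves the cluster distance open):

```python
import numpy as np

def bsas(X, theta, q):
    """Basic Sequential Algorithmic Scheme (sketch).

    X     : (n, d) array of feature vectors, processed in the given order
    theta : dissimilarity threshold for opening a new cluster
    q     : maximum number of clusters
    Returns a list of clusters, each a list of row indices into X.
    """
    clusters = [[0]]                      # C_1 = {x_1}
    means = [X[0].astype(float)]          # one representative (mean) per cluster

    for i in range(1, len(X)):
        # distance from x_i to every existing cluster (here: to its mean)
        dists = [np.linalg.norm(X[i] - m) for m in means]
        k = int(np.argmin(dists))
        if dists[k] > theta and len(clusters) < q:
            clusters.append([i])          # open a new cluster C_m = {x_i}
            means.append(X[i].astype(float))
        else:
            clusters[k].append(i)         # C_k = C_k ∪ {x_i}
            means[k] = X[clusters[k]].mean(axis=0)
    return clusters
```

As the previous slide notes, the result depends on the order in which the points are presented.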

The K-means algorithm 1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids (means). 2. Assign each object to the group that has the closest centroid (mean). 3. When all objects have been assigned, recalculate the positions of the K centroids (means). 4. Repeat Steps 2 and 3 until the centroids no longer move.
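A minimal NumPy sketch of these four steps, initialising the centroids with K randomly chosen data points (one common variant of step 1):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means following steps 1-4 above, on an (n, d) data matrix X."""
    rng = np.random.default_rng(seed)
    # 1. Place K points into the space (here: K distinct data points at random).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # 2. Assign each object to the group with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recalculate the positions of the K centroids.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # 4. Repeat steps 2 and 3 until the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```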

Reconstruction Error

E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \, \lVert x^t - m_i \rVert^2, \qquad b_i^t = \begin{cases} 1 & \text{if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert \\ 0 & \text{otherwise} \end{cases}
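In code, the same error can be computed by picking the nearest centroid for each point (i.e., setting b_i^t = 1 only for the closest m_i) and summing the squared distances; a short sketch:

```python
import numpy as np

def reconstruction_error(X, centroids):
    """Sum of squared distances from each point x^t to its nearest centroid m_i."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)            # index i with b_i^t = 1 for each x^t
    return float(np.sum(dists[np.arange(len(X)), nearest] ** 2))
```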

k-means Clustering

Model-based Clustering: K-means is a special case of model-based clustering.

Hard vs. soft clustering. Hard clustering: each document belongs to exactly one cluster; this is more common and easier to do. Soft clustering: a document can belong to more than one cluster, e.g., a document about Chinese cars (China and automobiles) or a document about electric cars (technology and environment).

Model-Based Clustering: k-means

EM is a general framework. Create an initial model θ (arbitrarily, randomly, or with a small set of training examples). Use the model θ to obtain another model θ′ such that Σ_i log P_θ′(y_i) > Σ_i log P_θ(y_i), i.e., θ′ models the data better. Let θ = θ′ and repeat the above step until reaching a local maximum. Each iteration is guaranteed not to decrease the data likelihood.
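A sketch of that loop in the abstract, assuming the model supplies an initialiser, an E-step, an M-step, and a log-likelihood (all function names here are illustrative, not from the slides):

```python
def em(data, init_model, e_step, m_step, log_likelihood, max_iter=100, tol=1e-6):
    """Generic EM loop: repeatedly replace theta by a model with higher log-likelihood."""
    theta = init_model(data)                   # arbitrary / random initial model
    ll = log_likelihood(theta, data)
    for _ in range(max_iter):
        expectations = e_step(theta, data)     # expected hidden quantities under theta
        theta_new = m_step(expectations, data) # model maximising the expected log-likelihood
        ll_new = log_likelihood(theta_new, data)
        if ll_new - ll < tol:                  # no further improvement: local maximum
            break
        theta, ll = theta_new, ll_new
    return theta
```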

Example - clustering documents

Inferring the Model Parameters from the Data. Similar to K-means, the algorithm alternates between an expectation step (corresponding to reassignment) and a maximization step (corresponding to recomputation of the parameters of the model).

Maximization Step: (re)compute the parameters q_mk and the priors α_k from the current soft assignments of documents to clusters.

Expectation Step: compute the soft assignment of documents to clusters given the current parameters q_mk and α_k.
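The update formulas themselves did not survive the transcription, so as an illustration only, here is one standard instantiation for documents: a multivariate Bernoulli mixture in the style of IIR chapter 16, where q_mk is the probability that term t_m occurs in a document of cluster k, alpha_k is the cluster prior, and r_nk is the soft assignment of document n to cluster k (plain maximum-likelihood estimates, no smoothing):

```python
import numpy as np

def m_step(D, R, eps=1e-9):
    """D: (N, M) binary document-term matrix; R: (N, K) soft assignments r_nk."""
    Nk = R.sum(axis=0) + eps                  # effective number of documents per cluster
    q = (R.T @ D) / Nk[:, None]               # q[k, m]: P(term t_m present | cluster k)
    alpha = R.sum(axis=0) / len(D)            # cluster priors alpha_k
    return q, alpha

def e_step(D, q, alpha, eps=1e-9):
    """r_nk proportional to alpha_k * prod_m q_mk^d_nm * (1 - q_mk)^(1 - d_nm)."""
    log_r = (np.log(alpha + eps)[None, :]
             + D @ np.log(q + eps).T
             + (1 - D) @ np.log(1 - q + eps).T)
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exponentiating
    R = np.exp(log_r)
    return R / R.sum(axis=1, keepdims=True)
```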

E and M steps. Expectation: given the current model, figure out the expected probabilities p(x | θ_c) of the documents belonging to each cluster. Maximization: given the probabilistic assignment of all the documents, estimate a new model θ_c. Each iteration increases (or at least does not decrease) the likelihood of the data, and the procedure is guaranteed to converge!

Similar to K-Means. K-means iterates: assign each document to the closest center, then recalculate the centers as the mean of the points in each cluster. EM iterates: Expectation: given the current model, figure out the expected probabilities p(x | θ_c) of the documents belonging to each cluster; Maximization: given the probabilistic assignment of all the documents, estimate a new model θ_c.
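The same contrast in code, assuming scikit-learn is available: KMeans returns one hard label per point, while a Gaussian mixture fitted by EM returns a posterior probability per cluster (the data here is synthetic and only for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),      # two synthetic "document" groups
               rng.normal(4, 1, (50, 2))])

hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # fitted by EM
soft_assignments = gmm.predict_proba(X)        # one probability per cluster per point

print(hard_labels[:3])                         # e.g. [0 0 0]
print(soft_assignments[:3].round(3))           # e.g. [[1. 0.] ...]
```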

EM example (figures from Chris Bishop)

Other algorithms. K-means and EM clustering are by far the most popular, particularly for documents. However, they can't handle all clustering tasks. What types of clustering problems can't they handle?

Non-Gaussian data: spectral clustering.
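A quick illustration of the limitation, assuming scikit-learn: on the two-moons data set K-means tends to split each moon, while spectral clustering with a nearest-neighbour affinity recovers the two non-Gaussian clusters:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                               random_state=0).fit_predict(X)
# sc_labels separates the two moons (up to a label swap); km_labels generally does not
```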

What is a good clustering? Internal criteria

What is a good clustering? Internal criteria. Example of an internal criterion: the K-means reconstruction error

E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \, \lVert x^t - m_i \rVert^2, \qquad b_i^t = \begin{cases} 1 & \text{if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert \\ 0 & \text{otherwise} \end{cases}

But an internal criterion often does not evaluate the actual utility of a clustering in the application. Alternative: external criteria, i.e., evaluate with respect to a human-defined classification.

External criteria for clustering quality. Based on a gold standard data set, e.g., the Reuters collection we also used for the evaluation of classification. Goal: the clustering should reproduce the classes in the gold standard. (But we only want to reproduce how documents are divided into groups, not the class labels.) First measure for how well we were able to reproduce the classes: purity.

External criterion: Purity. Ω = {ω_1, ω_2, ..., ω_K} is the set of clusters and C = {c_1, c_2, ..., c_J} is the set of classes. For each cluster ω_k, find the class c_j with the most members n_kj in ω_k; sum all these n_kj and divide by the total number of points.

Sec. 16.3 Purity example. Cluster I: max(5, 1, 0) = 5. Cluster II: max(1, 4, 1) = 4. Cluster III: max(2, 0, 3) = 3. Purity(clusters I, II, III; classes x, o, ⋄) = (1/17)(5 + 4 + 3) ≈ 0.71.
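The same computation as a short sketch, using the cluster-by-class counts from this example (rows are clusters I, II, III; columns are classes x, o, ⋄):

```python
def purity(counts):
    """counts[k][j] = number of members of class j in cluster k."""
    total = sum(sum(row) for row in counts)
    return sum(max(row) for row in counts) / total

counts = [[5, 1, 0],   # cluster I
          [1, 4, 1],   # cluster II
          [2, 0, 3]]   # cluster III
print(purity(counts))  # (5 + 4 + 3) / 17 ≈ 0.71
```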

Normalized Mutual Information (NMI). How much information does the clustering contain about the classification? Singleton clusters (number of clusters = number of docs) have maximum MI. Therefore: normalize by the entropy of clusters and classes.

Normalized Mutual Information:

NMI(\Omega, C) = \frac{I(\Omega; C)}{[H(\Omega) + H(C)]/2}

where I(\Omega; C) is the mutual information between the clustering Ω and the classification C, and H(\cdot) denotes entropy.
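A sketch of NMI computed from the same cluster-by-class counts, following the definition above (mutual information divided by the average of the two entropies; natural logarithms, which cancel in the ratio):

```python
from math import log

def nmi(counts):
    """counts[k][j] = number of members of class j in cluster k."""
    N = sum(sum(row) for row in counts)
    pk = [sum(row) / N for row in counts]              # cluster marginals
    pj = [sum(col) / N for col in zip(*counts)]        # class marginals
    mi = sum((n / N) * log((n / N) / (pk[k] * pj[j]))
             for k, row in enumerate(counts)
             for j, n in enumerate(row) if n > 0)
    entropy = lambda p: -sum(x * log(x) for x in p if x > 0)
    return mi / ((entropy(pk) + entropy(pj)) / 2)

counts = [[5, 1, 0], [1, 4, 1], [2, 0, 3]]             # the purity example above
print(round(nmi(counts), 2))                           # ≈ 0.36
```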

Rand index. Definition: based on a 2×2 contingency table of all pairs of documents. Each pair is either positive or negative (the clustering puts the two documents in the same or in different clusters)... and either true (correct) or false (incorrect): the clustering decision is correct or incorrect. TP + FN + FP + TN is the total number of pairs; there are N(N − 1)/2 pairs for N documents, e.g., 17 · 16/2 = 136 in the o/⋄/x example. The Rand index is RI = (TP + TN)/(TP + FP + FN + TN).

Rand Index: Example. Pair counts for the o/⋄/x example:

                      same class        different class
same cluster          TP = 20           FP = 20
different cluster     FN = 24           TN = 72

Rand measure for the o/⋄/x example: RI = (20 + 72)/(20 + 20 + 24 + 72) ≈ 0.68.

Sec. 16.3 Rand index and Cluster F-measure. Precision and recall over pairs: P = TP/(TP + FP), R = TP/(TP + FN).

Cluster F-measure: Example. P = TP/(TP + FP), R = TP/(TP + FN).
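For the o/⋄/x example the pair counts derived earlier are TP = 20, FP = 20, FN = 24, TN = 72, which give the following values (F1 is shown here; the original slides may use a different β):

```python
TP, FP, FN, TN = 20, 20, 24, 72          # pair counts from the o/⋄/x example

RI = (TP + TN) / (TP + FP + FN + TN)     # Rand index
P  = TP / (TP + FP)                      # pairwise precision
R  = TP / (TP + FN)                      # pairwise recall
F1 = 2 * P * R / (P + R)                 # balanced F-measure

print(round(RI, 2), round(P, 2), round(R, 2), round(F1, 2))   # 0.68 0.5 0.45 0.48
```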

Evaluation results for the o/⋄/x example. All four measures range from 0 (really bad clustering) to 1 (perfect clustering).