Expectation Maximization
Adapted from: Doug Downey and Bryan Pardo, Northwestern University, and http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt
Steps in Clustering
1. Select features
2. Define a proximity measure
3. Define a clustering criterion
4. Define a clustering algorithm
5. Validate the results
6. Interpret the results
Kinds of Clustering
- Sequential: fast; results depend on the order of the data
- Cost optimization: fixed number of clusters (typically)
- Hierarchical: start with many clusters; join clusters at each step
A Sequential Clustering Method: Basic Sequential Algorithmic Scheme (BSAS)
S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, London, England, 1999
Assumption: the number of clusters is not known in advance.

m = 1
C_1 = {x_1}
For i = 2 to n
    Find C_k : d(x_i, C_k) = min_j d(x_i, C_j)
    If (d(x_i, C_k) > Θ) and (m < q)
        m = m + 1
        C_m = {x_i}
    Else
        C_k = C_k ∪ {x_i}
    End
End

Notation: d(x, C) = the distance between feature vector x and cluster C; Θ = the threshold of dissimilarity; q = the maximum number of clusters; n = the number of data points.
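A minimal Python sketch of BSAS, assuming Euclidean distance to a cluster's running mean as d(x, C) (the scheme itself leaves the cluster representative unspecified); theta and q play the roles of Θ and q above.

```python
import numpy as np

def bsas(X, theta, q):
    """Basic Sequential Algorithmic Scheme: a single pass over the data,
    opening a new cluster whenever a point is farther than theta from
    every existing cluster, up to a maximum of q clusters."""
    centers = [X[0].astype(float)]   # running mean of each cluster
    members = [[0]]                  # indices of the points in each cluster
    for i in range(1, len(X)):
        dists = [np.linalg.norm(X[i] - c) for c in centers]
        k = int(np.argmin(dists))
        if dists[k] > theta and len(centers) < q:
            centers.append(X[i].astype(float))       # start a new cluster
            members.append([i])
        else:
            members[k].append(i)                     # assign to nearest cluster
            centers[k] = X[members[k]].mean(axis=0)  # update its mean
    return centers, members

# Toy usage: two well-separated groups, found in one pass
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
centers, members = bsas(X, theta=2.0, q=10)
print(members)  # [[0, 1], [2, 3]]
```

As the previous slide notes, the result of a sequential scheme like this depends on the order in which the data are presented.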
The K-means algorithm
1. Place K points into the space represented by the objects being clustered. These points represent the initial group centroids (means).
2. Assign each object to the group that has the closest centroid (mean).
3. When all objects have been assigned, recalculate the positions of the K centroids (means).
4. Repeat steps 2 and 3 until the centroids no longer move.
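A compact NumPy sketch of these four steps, using k randomly chosen data points as the initial centroids (one common way to realize step 1):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: place K initial centroids on randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each object to the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                  else centroids[j] for j in range(k)])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```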
Reconstruction Error

$$E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \,\lVert x^t - m_i \rVert^2, \qquad b_i^t = \begin{cases} 1 & \text{if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert \\ 0 & \text{otherwise} \end{cases}$$
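Given the centroids and hard assignments from the k-means sketch above, this error is a one-liner: b_i^t simply selects each point's nearest centroid, which is exactly what labels already encodes.

```python
import numpy as np

def reconstruction_error(X, centroids, labels):
    """E({m_i} | X): sum of squared distances from each point x^t
    to its assigned centroid m_i (the b_i^t = 1 terms)."""
    return float(np.sum(np.linalg.norm(X - centroids[labels], axis=1) ** 2))
```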
k-means Clustering
Model-based Clustering
K-means is a special case of model-based clustering.
Hard vs. soft clustering
- Hard clustering: each document belongs to exactly one cluster. More common and easier to do.
- Soft clustering: a document can belong to more than one cluster.
  - e.g., a document about Chinese cars (China and automobiles)
  - e.g., a document about electric cars (technology and environment)
Model-Based Clustering: k-means
EM is a general framework
- Create an initial model, θ: arbitrarily, randomly, or with a small set of training examples.
- Use the model θ to obtain a new model θ′ such that Σ_i log P_θ′(y_i) > Σ_i log P_θ(y_i), i.e., θ′ models the data better.
- Let θ = θ′ and repeat the above step until reaching a local maximum.
- Each iteration is guaranteed not to decrease the likelihood of the data.
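A minimal, self-contained instance of this loop, assuming a two-component 1-D Gaussian mixture purely for illustration (the document-clustering variant these slides build toward appears later); each pass re-estimates θ so that the data log-likelihood cannot decrease:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a 2-component 1-D Gaussian mixture; theta = (pi, mu, var)."""
    pi = np.array([0.5, 0.5])                 # initial model theta
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E step: responsibilities r[i, k] = P(component k | x_i, theta)
        dens = (pi / np.sqrt(2 * np.pi * var)) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M step: new theta' with sum_i log P_theta'(x_i) >= sum_i log P_theta(x_i)
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```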
Example - clustering documents
Inferring the Model Parameters from the Data
Similar to K-means, EM alternates between:
- an expectation step (corresponding to reassignment), and
- a maximization step (corresponding to recomputation of the parameters of the model).
Maximization Step
(Re)compute the parameters q_mk and α_k (the priors) from the current soft assignments r_nk as follows:

$$q_{mk} = \frac{\sum_{n=1}^{N} r_{nk}\, I(t_m \in d_n)}{\sum_{n=1}^{N} r_{nk}}, \qquad \alpha_k = \frac{\sum_{n=1}^{N} r_{nk}}{N}$$

where I(t_m ∈ d_n) = 1 if document d_n contains term t_m, and 0 otherwise.
Expectation Step
Compute the soft assignment of documents to clusters, given the current parameters q_mk and α_k, as follows:

$$r_{nk} = \frac{\alpha_k \left(\prod_{t_m \in d_n} q_{mk}\right) \left(\prod_{t_m \notin d_n} (1 - q_{mk})\right)}{\sum_{k'=1}^{K} \alpha_{k'} \left(\prod_{t_m \in d_n} q_{mk'}\right) \left(\prod_{t_m \notin d_n} (1 - q_{mk'})\right)}$$
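A sketch of both steps for this model, assuming a binary document-term matrix D (N×M) as in the Bernoulli mixture these formulas describe; r, q, and alpha correspond to r_nk, q_mk, and α_k, and the eps smoothing is an added assumption to keep the probabilities away from 0 and 1:

```python
import numpy as np

def e_step(D, q, alpha):
    """Soft assignments r[n, k] given term probabilities q (M x K, strictly
    in (0,1)) and priors alpha (K,); working in log space avoids underflow
    from the long products over terms."""
    log_p = D @ np.log(q) + (1 - D) @ np.log(1 - q) + np.log(alpha)
    log_p -= log_p.max(axis=1, keepdims=True)   # stabilize before exp
    r = np.exp(log_p)
    return r / r.sum(axis=1, keepdims=True)

def m_step(D, r, eps=1e-2):
    """Re-estimate q_mk and alpha_k from the soft assignments r (N x K)."""
    nk = r.sum(axis=0)                          # soft cluster sizes
    q = (D.T @ r + eps) / (nk + 2 * eps)        # smoothed term probabilities
    alpha = nk / len(D)                         # priors
    return q, alpha
```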
E and M steps
- Expectation: given the current model, figure out the expected probabilities p(x | θ_c) of the documents belonging to each cluster.
- Maximization: given the probabilistic assignment of all the documents, estimate a new model, θ_c.
Each iteration does not decrease the likelihood of the data, and the procedure is guaranteed to converge (to a local maximum)!
Similar to K-means
Iterate:
- Assign/cluster each document to the closest center. (Expectation: given the current model, figure out the expected probabilities p(x | θ_c) of the documents belonging to each cluster.)
- Recalculate the centers as the mean of the points in each cluster. (Maximization: given the probabilistic assignment of all the documents, estimate a new model, θ_c.)
EM example Figure from Chris Bishop
EM example Figure from Chris Bishop
Other algorithms
K-means and EM clustering are by far the most popular, particularly for documents. However, they can't handle all clustering tasks. What types of clustering problems can't they handle?
Non-Gaussian data: spectral clustering
What is a good clustering? Internal criteria
What is a good clustering? Internal criteria
Example of an internal criterion: reconstruction error in K-means,

$$E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \,\lVert x^t - m_i \rVert^2, \qquad b_i^t = \begin{cases} 1 & \text{if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert \\ 0 & \text{otherwise} \end{cases}$$

But an internal criterion often does not evaluate the actual utility of a clustering in the application.
Alternative: external criteria, i.e., evaluate with respect to a human-defined classification.
External criteria for clustering quality
- Based on a gold-standard data set, e.g., the Reuters collection we also used for the evaluation of classification.
- Goal: the clustering should reproduce the classes in the gold standard. (But we only want to reproduce how documents are divided into groups, not the class labels.)
- First measure of how well we were able to reproduce the classes: purity.
External criterion: Purity
Ω = {ω_1, ω_2, ..., ω_K} is the set of clusters and C = {c_1, c_2, ..., c_J} is the set of classes.
For each cluster ω_k: find the class c_j with the most members n_kj in ω_k.
Sum all the n_kj and divide by the total number of points:

$$\text{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_j |\omega_k \cap c_j|$$
Purity example (Sec. 16.3)
Cluster I: max(5, 1, 0) = 5
Cluster II: max(1, 4, 1) = 4
Cluster III: max(2, 0, 3) = 3
Purity(clusters I, II, III; classes x, o, ⋄) = (1/17) × (5 + 4 + 3) ≈ 0.71
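A small check of this example; the label vectors below are read directly off the slide's counts (5 x's and 1 o in cluster I, and so on), with 'd' standing in for the diamond class:

```python
from collections import Counter

def purity(clusters, classes):
    """purity = (1/N) * sum over clusters of the size of the
    majority class inside each cluster."""
    total = 0
    for k in set(clusters):
        members = [c for cl, c in zip(clusters, classes) if cl == k]
        total += Counter(members).most_common(1)[0][1]
    return total / len(clusters)

# The o/⋄/x example: 17 documents in 3 clusters
clusters = ['I'] * 6 + ['II'] * 6 + ['III'] * 5
classes = (['x'] * 5 + ['o'] +            # cluster I:   5 x, 1 o
           ['o'] * 4 + ['x', 'd'] +       # cluster II:  4 o, 1 x, 1 diamond
           ['d'] * 3 + ['x'] * 2)         # cluster III: 3 diamonds, 2 x
print(round(purity(clusters, classes), 2))  # 0.71
```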
Normalized Mutual Information (NMI)
- How much information does the clustering contain about the classification?
- Singleton clusters (number of clusters = number of docs) have maximum MI.
- Therefore: normalize by the entropy of the clusters and the classes.
Normalized Mutual Information

$$\text{NMI}(\Omega, C) = \frac{I(\Omega; C)}{[H(\Omega) + H(C)]/2}$$

where

$$I(\Omega; C) = \sum_k \sum_j P(\omega_k \cap c_j) \log \frac{P(\omega_k \cap c_j)}{P(\omega_k)\,P(c_j)}, \qquad H(\Omega) = -\sum_k P(\omega_k) \log P(\omega_k)$$
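A from-scratch sketch of NMI under these definitions (natural log; the base cancels between numerator and denominator), reusing the label lists from the purity sketch above:

```python
import numpy as np
from collections import Counter

def nmi(clusters, classes):
    """NMI(Omega, C) = I(Omega; C) / ((H(Omega) + H(C)) / 2)."""
    n = len(clusters)
    pk = {k: v / n for k, v in Counter(clusters).items()}   # P(omega_k)
    pj = {j: v / n for j, v in Counter(classes).items()}    # P(c_j)
    pkj = {kj: v / n for kj, v in Counter(zip(clusters, classes)).items()}
    mi = sum(p * np.log(p / (pk[k] * pj[j])) for (k, j), p in pkj.items())
    entropy = lambda dist: -sum(p * np.log(p) for p in dist.values())
    return mi / ((entropy(pk) + entropy(pj)) / 2)

print(round(nmi(clusters, classes), 2))
```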
Rand index
Definition: RI = (TP + TN) / (TP + FP + FN + TN), based on a 2×2 contingency table of all pairs of documents.
TP + FN + FP + TN is the total number of pairs. There are $\binom{N}{2}$ pairs for N documents; $\binom{17}{2} = 136$ in the o/⋄/x example.
Each pair is either positive or negative (the clustering puts the two documents in the same or in different clusters)...
... and either true (correct) or false (incorrect): the clustering decision is correct or incorrect.
Rand Index: Example
Rand measure for the o/⋄/x example: RI = (20 + 72)/(20 + 20 + 24 + 72) ≈ 0.68
Rand index and cluster F-measure (Sec. 16.3)

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}$$
Cluster F-measure: Example
Using TP = 20, FP = 20, FN = 24 from the o/⋄/x example:

$$P = \frac{TP}{TP + FP} = \frac{20}{40} = 0.5, \qquad R = \frac{TP}{TP + FN} = \frac{20}{44} \approx 0.455, \qquad F_1 = \frac{2PR}{P + R} \approx 0.48$$
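All four pair counts, and the measures built from them, can be checked by brute force over the 136 pairs, again reusing the label lists from the purity sketch:

```python
from itertools import combinations

def pair_counts(clusters, classes):
    """2x2 contingency over all document pairs: positive = same cluster,
    true = the clustering decision agrees with the gold-standard classes."""
    tp = fp = fn = tn = 0
    for (k1, c1), (k2, c2) in combinations(zip(clusters, classes), 2):
        if k1 == k2 and c1 == c2:
            tp += 1
        elif k1 == k2:
            fp += 1
        elif c1 == c2:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

tp, fp, fn, tn = pair_counts(clusters, classes)   # 20, 20, 24, 72
ri = (tp + tn) / (tp + fp + fn + tn)              # ~0.68
p, r = tp / (tp + fp), tp / (tp + fn)             # 0.5, ~0.455
f1 = 2 * p * r / (p + r)                          # ~0.48
print(tp, fp, fn, tn, round(ri, 2), round(f1, 2))
```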
Evaluation results for the o/⋄/x example
All four measures range from 0 (really bad clustering) to 1 (perfect clustering).