Memorial University of Newfoundland
Pattern Recognition
Lecture 15, June 29, 2006
http://www.engr.mun.ca/~charlesr
Office Hours: Tuesdays & Thursdays 8:30-9:30 PM, EN-3026

July 2006 Schedule:
- Week of July 2-8: Lecture 16, Lecture 17, Assignment 4 due
- Week of July 9-15: Lecture 18, Lecture 19
- Week of July 16-22: Presentations (two sessions), Assignment 5 due
- Week of July 23-29: Lecture 21, Lecture 22, Final Reports due
Last Week: Clustering (Unsupervised Classification)
- Distribution modes and minima
- Pattern grouping:
  - Group similar patterns using distance metrics.
  - Merge or split clusters based on cluster similarity measurements.
- Measures of cluster goodness

[Figure: histogram of pel (pixel) values — # occurrences vs. pel value — illustrating distribution modes and minima]

Recap: Simple Grouping Using a Threshold
1. k = 1 (number of clusters)
2. z_1 = x_1 (set first sample as class prototype)
3. For all other samples x_i:
   a. Find z_j for which d(x_i, z_j) is minimum
   b. If d(x_i, z_j) ≤ T, then assign x_i to C_j
   c. Else k = k + 1, z_k = x_i
(A code sketch follows below.)
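As a concrete illustration, here is a minimal Python sketch of the threshold-grouping recap above, assuming Euclidean distance and numpy sample vectors; the function name and arguments are illustrative, not from the lecture.

```python
import numpy as np

def threshold_grouping(samples, T):
    """Sequential grouping: assign each sample to the nearest prototype
    if it lies within distance T, otherwise start a new cluster."""
    prototypes = [samples[0]]                      # z1 = x1, k = 1
    labels = [0]
    for x in samples[1:]:
        dists = [np.linalg.norm(x - z) for z in prototypes]
        j = int(np.argmin(dists))                  # nearest prototype zj
        if dists[j] <= T:
            labels.append(j)                       # assign x to Cj
        else:
            prototypes.append(x)                   # k = k + 1, zk = x
            labels.append(len(prototypes) - 1)
    return np.array(labels), np.array(prototypes)
```

Note that, as the K-Means discussion later points out for sequential schemes, the resulting clusters depend on the order in which samples are presented.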
Recap: Hierarchical Grouping Algorithms

Successively merge clusters based on some measure of within- or between-cluster similarity. For n samples {x_1, ..., x_n}:
1. k = n, C_i = {x_i} for i = 1, ..., n
2. Merge the pair C_i, C_j that is most similar, and set k = k - 1:
   $\min_{i,j} d(C_i, C_j), \qquad C_i = C_i \cup C_j$
3. Continue until some stopping condition is met.
(A code sketch of this merging loop appears below.)

Goodness of Partitioning

Several of the stopping conditions suggest that we can use a measure of the scatter of each cluster to gauge how good the overall clustering is. In general, we would like compact clusters with a lot of space between them. We can use the measure of goodness to iteratively move samples from one cluster to another to optimize the groupings.
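A minimal sketch of the merging loop above. The choice of between-cluster similarity (Euclidean distance between cluster means) and of stopping at a target cluster count are assumptions standing in for the generic measure and stopping condition; names are illustrative.

```python
import numpy as np

def hierarchical_grouping(samples, k_target):
    """Agglomerative grouping: start with one cluster per sample (k = n)
    and repeatedly merge the most similar pair, Ci = Ci u Cj."""
    clusters = [[x] for x in samples]              # Ci = {xi}
    while len(clusters) > k_target:                # stand-in stopping condition
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # similarity measure: distance between cluster means
                d = np.linalg.norm(np.mean(clusters[i], axis=0)
                                   - np.mean(clusters[j], axis=0))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))        # merge, k = k - 1
    return clusters
```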
Clustering Criterion

Global measurements of the goodness of the clusters.

1. Representation error = summed scatter within clusters.

The representation error of a clustering is the error incurred by representing the N samples with the k cluster prototypes. We can choose each z_i to minimize

$J_i = \sum_{x \in C_i} \|x - z_i\|^2$

which is minimized by z_i = m_i, the mean of cluster C_i.

Now define the within-cluster scatter matrix

$S_{W_i} = \sum_{x \in C_i} (x - m_i)(x - m_i)^T$

and the summed scatter

$S_W = \sum_{i=1}^{k} S_{W_i}$

Thus the total representation error is $J_e = \operatorname{tr} S_W$.
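A short sketch of criterion 1, computing $J_e = \operatorname{tr} S_W$ with each prototype at its cluster mean; it assumes numpy-array inputs and non-empty clusters, and the function name is illustrative.

```python
import numpy as np

def representation_error(samples, labels, k):
    """Je = sum_i Ji, where Ji = sum_{x in Ci} ||x - m_i||^2 = tr S_Wi
    when the prototype z_i is placed at the cluster mean m_i."""
    Je = 0.0
    for i in range(k):
        Ci = samples[labels == i]        # samples assigned to cluster i
        m_i = Ci.mean(axis=0)            # z_i = m_i minimizes Ji
        Je += np.sum((Ci - m_i) ** 2)    # tr S_Wi
    return Je
```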
Clustering Criterion

2. Use the volume of the summed scatter: $|S_W|$

3. Could use the ratio of between-cluster to within-cluster scatter. With

$S_B = \sum_{i=1}^{k} N_i (m_i - m)(m_i - m)^T$

we could use $\operatorname{tr}(S_W^{-1} S_B)$ as the criterion.

Note: Any within-cluster criterion is minimized with k = n (every sample its own cluster), and thus we would need an independent criterion for k.
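The other two criteria can be computed the same way. A sketch, assuming $S_W$ is nonsingular (so the inverse exists) and numpy-array inputs:

```python
import numpy as np

def scatter_criteria(samples, labels, k):
    """Criterion 2: |S_W| (scatter volume). Criterion 3: tr(S_W^{-1} S_B)."""
    d = samples.shape[1]
    m = samples.mean(axis=0)                        # overall mean
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for i in range(k):
        Ci = samples[labels == i]
        m_i = Ci.mean(axis=0)
        dev = Ci - m_i
        S_W += dev.T @ dev                          # within-cluster scatter S_Wi
        diff = (m_i - m).reshape(-1, 1)
        S_B += len(Ci) * (diff @ diff.T)            # N_i (m_i - m)(m_i - m)^T
    volume = np.linalg.det(S_W)                     # criterion 2
    ratio = np.trace(np.linalg.solve(S_W, S_B))     # criterion 3
    return volume, ratio
```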
K-Means

Once we have a criterion, we can create an iterative clustering scheme. K-Means is the classic iterative clustering scheme:
1. Choose k prototypes {z_1, ..., z_k}
2. Assign all samples to clusters: $x \in C_i$ if $d(x, z_i) < d(x, z_j) \ \forall j \neq i$
3. Update {z_i} to minimize J_i, i = 1, ..., k:
   $z_i = \frac{1}{N_i} \sum_{x \in C_i} x = m_i$
4. Reassign the samples using the new prototypes
5. Continue until no prototypes change.
(A code sketch follows below.)

Demo: K-Means Algorithm Java Applet
http://web.sfc.keio.ac.jp/~osada/km/index.html
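A minimal sketch of the five steps above, assuming Euclidean distance and initial prototypes chosen so that no cluster goes empty (e.g. drawn from the samples); names are illustrative.

```python
import numpy as np

def kmeans(samples, k, z0):
    """Basic K-Means: assign to nearest prototype, move each prototype
    to its cluster mean, repeat until no prototype changes."""
    z = np.array(z0, dtype=float)                  # step 1: initial prototypes
    while True:
        # steps 2/4: x in Ci if d(x, z_i) < d(x, z_j) for all j != i
        dists = np.linalg.norm(samples[:, None, :] - z[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: z_i = (1/N_i) sum_{x in Ci} x = m_i
        z_new = np.array([samples[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(z_new, z):                  # step 5: prototypes settled
            return labels, z
        z = z_new
```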
K-Means

Good points:
- Conceptually simple
- Successful if k is accurate and the clusters are well separated

Problems:
- If k is incorrect, then the clusters can't be right
- Efficiency depends on sample order
- Non-spherical clusters cause problems

Extensions to K-Means

There are several ways to extend the basic k-means algorithm:
1. Global minimization of J_e, as an alternative to simply assigning samples to the closest cluster prototype.
2. Allow a variable k.
Basic Plan: Global Minimization of J_e

$J_e = \sum_{i=1}^{k} J_i$

where J_i is the representation error for the i-th cluster. Move sample x from C_i to C_j if the magnitude of the increment to the representation error J_j is less than the decrement to the representation error J_i:

$\delta_j < \delta_i$

One can show that moving x into C_j increments J_j by

$\delta_j = \frac{N_j}{N_j + 1} \|x - m_j\|^2$
Similarly, we can show that the decrement to J_i is

$\delta_i = \frac{N_i}{N_i - 1} \|x - m_i\|^2$

So the reassignment rule (step 2 of K-Means) becomes: move x from C_i to C_j if

$\frac{N_j}{N_j + 1} \|x - m_j\|^2 < \frac{N_i}{N_i - 1} \|x - m_i\|^2$

(A code sketch follows below.)
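A sketch of this reassignment test for a single sample, assuming cluster means and counts are tracked and updated incrementally; the helper name is illustrative.

```python
import numpy as np

def try_move(x, i, j, means, counts):
    """Move x from Ci to Cj if delta_j < delta_i, updating the means
    and counts incrementally. Returns True if the move was made."""
    Ni, Nj = counts[i], counts[j]
    if Ni <= 1:
        return False                                       # don't empty a cluster
    delta_j = Nj / (Nj + 1) * np.sum((x - means[j]) ** 2)  # increment to Jj
    delta_i = Ni / (Ni - 1) * np.sum((x - means[i]) ** 2)  # decrement to Ji
    if delta_j < delta_i:
        means[i] = (Ni * means[i] - x) / (Ni - 1)          # remove x from Ci
        means[j] = (Nj * means[j] + x) / (Nj + 1)          # add x to Cj
        counts[i], counts[j] = Ni - 1, Nj + 1
        return True
    return False
```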
Notes on Global Minimization

1. The rule has little impact when N_i and N_j are very large.
2. A point nearly on the MED boundary will be reassigned, since

$\frac{N_j}{N_j + 1} < 1 \quad \text{while} \quad \frac{N_i}{N_i - 1} > 1$

no matter what N_i and N_j are.
3. If x is an unassigned sample, we get the minimum increase to J_e by assigning x to C_i rather than C_j when

$\frac{N_i}{N_i + 1} \|x - m_i\|^2 < \frac{N_j}{N_j + 1} \|x - m_j\|^2$

This modifies the initial K-Means assignment by taking cluster size into account: if x is equidistant from m_i and m_j, then assign it to the smaller cluster.

Example

Consider the following set of samples:
(0,2) (1,0) (1,1) (1,2) (1,3) (1,4) (2,2) (3,1) (4,1) (5,0) (5,1) (5,2) (6,1) (7,1)

Cluster using basic K-Means and using K-Means with the global minimization method. Use k = 2. What happens if we start with k ≠ 2 (e.g. k = 3)?
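To experiment with the example, the sample set can be fed to the kmeans sketch from the earlier block; the choice of the first two samples as initial prototypes is an arbitrary illustrative choice, not from the lecture.

```python
import numpy as np

samples = np.array([(0, 2), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4),
                    (2, 2), (3, 1), (4, 1), (5, 0), (5, 1), (5, 2),
                    (6, 1), (7, 1)], dtype=float)

# basic K-Means with k = 2 (uses the kmeans sketch defined above)
labels, z = kmeans(samples, 2, samples[:2])
print(labels)   # cluster membership of each sample
print(z)        # final prototypes (cluster means)
```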
Dealing with k

We need a way of varying k in accordance with the goodness of the partitioning. Strategies for dealing with k (sketched in code below):

1. Delete clusters with too few samples.
   If N_i < T_1, drop C_i and z_i, and reassign the samples from C_i.
2. Merge clusters which are close together.
   If

   $(m_i - m_j)^T \left( \frac{S_i + S_j}{2} \right)^{-1} (m_i - m_j) < T_2$

   then replace C_i with the union of C_i and C_j, and drop z_j.
3. Split clusters which are spread out.
   If the maximum eigenvalue of S_i is greater than T_3, split C_i with a plane through m_i perpendicular to the corresponding eigenvector, and add a new cluster.

There are many possible clustering algorithms. See "Statistical Pattern Recognition: A Review" by Jain, Duin, and Mao for more possibilities.
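A rough sketch of one audit pass over an existing partition, reporting which of the three actions each cluster would trigger. The thresholds T1, T2, T3 are user-set, the merge test uses the scatter-normalized distance above (assuming the averaged scatter is invertible), and all names are illustrative.

```python
import numpy as np

def adjust_k_actions(samples, labels, T1, T2, T3):
    """Report delete/split/merge actions for the current clusters.
    Applying them (and re-clustering) is left to the surrounding loop."""
    k = labels.max() + 1
    means, scatters, counts = [], [], []
    for i in range(k):
        Ci = samples[labels == i]
        dev = Ci - Ci.mean(axis=0)
        means.append(Ci.mean(axis=0))
        scatters.append(dev.T @ dev)               # scatter matrix S_i
        counts.append(len(Ci))
    actions = []
    for i in range(k):
        if counts[i] < T1:                         # 1. too few samples
            actions.append(("delete", i))
            continue
        eigvals, eigvecs = np.linalg.eigh(scatters[i])
        if eigvals[-1] > T3:                       # 3. too spread out
            actions.append(("split", i, eigvecs[:, -1]))   # split-plane normal
        for j in range(i + 1, k):
            diff = means[i] - means[j]
            avg_S = (scatters[i] + scatters[j]) / 2
            if diff @ np.linalg.solve(avg_S, diff) < T2:   # 2. close together
                actions.append(("merge", i, j))
    return actions
```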