7/19/09 A Primer on Clustering: I. Models and Algorithms 1

Size: px

Start display at page:

Download "7/19/09 A Primer on Clustering: I. Models and Algorithms 1"

Milo Ethan Charles
6 years ago
Views:

1 7/19/9 A Primer on Clustering: I. Models and Algorithms 1

2 7/19/9 A Primer on Clustering: I. Models and Algorithms 2

3 7/19/9 A Primer on Clustering: I. Models and Algorithms 3

U only models : SAHN partitioning 3. V only models : SOM point prototypes 4.

4 A Primer on Clustering : I.Models and Algorithms Atenção! Os Estados Unidos 2 : A Espanha 1. Clustering: similarity, labels, and partitions 2. U only models : SAHN partitioning 3. V only models : SOM point prototypes 4. (U, V) models : c-means with AO 5. (U V, +) models : GMD with EM-AO 7/19/9 A Primer on Clustering: I. Models and Algorithms 4

5 Numerical Pattern Recognition 4 topics Humans : "Knowledge Tidbits" Process Description Feature Nomination X = Object Data R = Relation Data Design Test Sensors Process Description Feature Analysis Classifier Design Feature Analysis Extraction PreProcessing 2D - Display Identification Prediction Control Classifier Design This Module: Cluster Analysis Cluster Analysis Exploration Validity 2D - Display 7/19/9 A Primer on Clustering: I. Models and Algorithms 5

6 2 Kinds of Data for Clustering Objects O= {o 1,, o n } : o i = i-th physical object Object Data X = {x 1,, x n } R p : x i = feature vector for o i x ji = j-th (measured) feature of x i : 1 j p Relational R = [r ij ] = relationship (o i, o j ) ) or (x i, x j ) d ij Data s ij = pairwise similarity (o i, o j ) or (x i, x j ) d ij = pairwise dissimilarity (o i, o j ) or (x i, x j ) Tyically (R = D) d ii = d ij > d ij = d ji : 1 i n : 1 i,j n : 1 i j n (Positive-definite) (Symmetric) We often convert XD with distance d ij = x i x j 7/19/9 A Primer on Clustering: I. Models and Algorithms 6

7 Measuring (Dis)Similarity : The Usual Suspects Mahalonobis <,> Norms d Maha = x v = (x M 1 v)t M 1 (x v) Diagonal d diag = x v DM -1 = (x v)t D M -1 (x v) Euclidean d 2 = x v I = (x v) T (x v) City Block Minkowski Norms d 1 = x v 1 p = x j v j j=1 Sup or Max d = x v = max{ x j v } j 1jp 7/19/9 A Primer on Clustering: I. Models and Algorithms 7

8 Sample Parameters of the Statistical Norms Mean Vector v = n x k k=1 / n Covariance Matrix M = [m ij ] = n (x k k=1 v )(x k v ) T n Variance (Matrix) m 11 L 2 s 1 L D M = M O M = M O M L m 2 pp L s p 7/19/9 A Primer on Clustering: I. Models and Algorithms 8

9 Label the similar objects What is Clustering? (Unsupervised Learning} What is similar? How many groups (c)? 7/19/9 A Primer on Clustering: I. Models and Algorithms 9

10 mixtures Data sets usually contain of objects Clustering groups and labels objects Easy for Humans! Hard for computers! 7/19/9 A Primer on Clustering: I. Models and Algorithms 1

No : Stop P2: Clustering Find (c) clusters Most important and difficult

11 Unlabeled Data X={x 1, x n } R p (or) R=[r ij ] R nxn Clustering 3 Problems P1: Tendency X or R has clusters? No : Stop P2: Clustering Find (c) clusters Most important and difficult problem! No P3: Validity Are c clusters OK? Yes : Stop 7/19/9 A Primer on Clustering: I. Models and Algorithms 11

Unipolar Label Vectors @ c = 3 Crisp = Vertices N h3 Fuzzy or Probabilistic = Triangle

12 Unipolar Label c = 3 Crisp = Vertices N h3 Fuzzy or Probabilistic = Triangle Possibilistic = Cube-{} N p3 N f3 7/19/9 A Primer on Clustering: I. Models and Algorithms 12

13 U = row i M ship of all o k s in cluster i o 1 o k o n Objects u 11 L u 1k L u 1n M M M u i1 L u ik L u in M M M u c1 L u ck L u cn col k M ship of o k in each cluster Partition Matrices Membership Functions u i :O [,1] u i (o k )=u ik =M ship of o k in cluster i 7/19/9 A Primer on Clustering: I. Models and Algorithms 13

14 Crisp Fuzzy/Prob Possibilistic Row sums k u ik > same same Col sums i u ik = 1 same c u ik c i=1 M ships u ik {,1} u ik [,1] same Set Name M hcn M fcn M pcn Example Take a 2nd 7/19/9 A Primer on Clustering: I. Models and Algorithms 14

15 Clustering is NOT Classifier Design Data X C M pcn M fcn M hcn + Other parameters V, Clustering labels only the submitted data z N hc N pc R p D N fc Classifier D can label all the data in its domain D(z) 7/19/9 A Primer on Clustering: I. Models and Algorithms 15

16 Hardening Partitions Defuzzification (Deprobabilization = Bayes Rule for U M fcn!) U = M pcn H 1 (U) = U MM = M hcn resolve ties randomly -cut thresholding (< <1) filters chosen levels of m ship 1 H.9 (U) = M hcn H.8 (U) = 1 1 M hcn 7/19/9 A Primer on Clustering: I. Models and Algorithms 16

17 Clusters represented by partition matrix (U) or prototypes V = {v k } U c = U f = Crisp : hard boundaries Fuzzy : soft boundaries 7/19/9 A Primer on Clustering: I. Models and Algorithms 17

18 U models : find clusters U M pcn 3 Types of Basic Clustering Models Usually relational models, e.g., single linkage V=F(U) can be computed - if you want it V models : find point prototypes V R cp Usually local sequential models, e.g., SOM U=G(V) can be computed - super important for VL Data Methods (U,V) models : find (U,V) M pcn x R cp Usually global batch models, e.g., c-means (U, V, +) have other parameters, e.g., {p i }, { i } in GMD 7/19/9 A Primer on Clustering: I. Models and Algorithms 18

19 A Primer on Clustering : I.Models and Algorithms 2. U only models : SAHN partitioning Eu - homem de Budweiser do fósforo? Por que não eu? 3. V only models : point prototypes with SOM 4. (U, V) models : c-means with AO 5. (U V, +) models : GMD with EM-AO 7/19/9 A Primer on Clustering: I. Models and Algorithms 19

20 U Models : SAHN Clustering S Sequential : at most one U c M hcn for c from n to 1 A Agglomerative : begins at c=n, merges clusters until c=1 Devisive : begins at c=1, splits clusters until c=n H Hierarchichal : joined objects remain in nested clusters N Non-overlapping : {U k } M hcn are crisp partitions of O 7/19/9 A Primer on Clustering: I. Models and Algorithms 2

21 Algorithm SAHN : find U k M hcn for each k from n to 1 Object data: X R p Relational Data:R R nn Input convert to (or) Dissimilarity: D ii = i, D=D T D n n = [D ij ] = x i x j (or) Similarity: S ii =1 i, S=S T Pick Intercluster (set) distance d SAHN = d SL, d AL or d CL ONLY 1 user choice - no parameters or thresholds! A nice aspect of SAHN clustering Set k = 1 C = { {1}, {2},,{n} } C=initial clusters 7/19/9 A Primer on Clustering: I. Models and Algorithms 21

22 Dissimilarity Input Case For k=2 to n Find W,V in C to Merge W and V min d k = d SAHN,k (W,V) = WV=(WV) min S,T C S T {d SAHN,k (S,T)} Update C C*=C-{W,V} C = C*WV Update D Delete rows/cols in D for W,V Add row/col in D for new WV {d SAHN (WV, S) : S C, S WV} Outputs Crisp partitions {U SAHN,k M h(n-k+1)n : 1kn} Merger distances {min d k : 1kn} 7/19/9 A Primer on Clustering: I. Models and Algorithms 22

23 Geometry of Set Distances (dissimilarities) Single linkage (set) distance Complete linkage (set) distance V V W W d SL (W,V) = min{d ij } i W jv d CL (W,V) = max{d ij } i W jv 7/19/9 A Primer on Clustering: I. Models and Algorithms 23

24 Geometry of Set Distances (dissimilarities) Average linkage (set) distance V W d AL (W,V) = i W jv D ij W V 7/19/9 A Primer on Clustering: I. Models and Algorithms 24

25 SL Works Geometry of Linkage Clusters (dissimilarities) CL Fails AL supposedly mediates between these 2 extremes SL Fails CL Works 7/19/9 A Primer on Clustering: I. Models and Algorithms 25

26 7/19/9 A Primer on Clustering: I. Models and Algorithms 26 D = Dissimilarity Matrix of Euclidean Distances 5 points X in R Ex.1 SL Clustering

27 MST on 5 nodes shows evolution of SL clusters 1st Clusters C 5 ={ {1},{2},{3},{4},{5} } 1st Merger C 3 = {3,4,5} 2nd Merger C 2 = {1,2} 3rd Merger C 1 = {1,2,3,4,5} D = D = D = 2.24 D = [] 7/19/9 A Primer on Clustering: I. Models and Algorithms 27

28 Dendogram on 5 nodes shows evolution of SL clusters. SL Merger Distances min d k c=5 c=3 c= c=1 7/19/9 A Primer on Clustering: I. Models and Algorithms 28

Naturalmente posso fechar fora a Espanha V

29 Naturalmente posso fechar fora a Espanha V only models : SOM point prototypes A Primer on Clustering : I.Models and Algorithms 4. (U, V) models : c-means with AO 5. (U V, +) models : GMD with EM-AO 7/19/9 A Primer on Clustering: I. Models and Algorithms 29

30 V Models : 2 ways to get Point Prototypes V R cp V S SELECTED in X X V S V R BUILT from X X V R 7/19/9 A Primer on Clustering: I. Models and Algorithms 3

31 Point Prototype Algorithms : MANY choices 7/19/9 A Primer on Clustering: I. Models and Algorithms 31

32 Three Kinds of CL Models Nbhd. Updated Examples (N t ) -1 = v i Winner only SHCM, VQ (N t ) -1 V (N t ) -1 = V A subset of V All c nodes SOM GVQ-F, SCS, DR (N t ) -1 = update "neighborhood" in R p at iterate t can be imbedded in definition of learning rates The update subset of V is not a real neighborhood in R p ; the neighborhood is determined in display space 7/19/9 A Primer on Clustering: I. Models and Algorithms 32

33 x 1 Input Layer Hidden Layer x v 1 Network Architecture of SOM/VQ Models Output Layer x 2 x p x i x v j argmin{ x v } = s j j Winner v s x p x v c 7/19/9 A Primer on Clustering: I. Models and Algorithms 33

34 Input User picks Initialize Unlabeled Object data: X R p c,, T,{ ik,t }; {N 1 t }; p; err V = (v 1,,,v c, ) cp Basic CL Algorithm For t = to T For k = 1 to n s = argmin{ x k v } j,t j v i,t+1 = v i,t i,t (x k v i,t ) v i,t N 1 t (v s,t ) If V t+1 V t err STOP Output Else next k V* cp Update Nbhd is Fcn of Winner 7/19/9 A Primer on Clustering: I. Models and Algorithms 34

35 The Update Geometry of Most CL Models =1 x v new (x-v old ) + (excitation) v old = (inhibition) v new = v old + (x v old ) 7/19/9 A Primer on Clustering: I. Models and Algorithms 35

36 Old Hidden Layer in R p "Display Space" in R q New Hidden Layer in R p v 1,t v k p cell k q 11 v 1,t v 2,t 1 2 v s,t wins in R p v 2,t v i,t i cell s,t in R q v i,t s j N t (v s,t ) in R q v s,t N t 1 (v s,t ) R p v s,t v j.t c new v s N 1 t (v s,t ) v j.t v c,t SOM/VQ Update Neighborhood N 1 t (v s,t ) v c,t 7/19/9 A Primer on Clustering: I. Models and Algorithms 36

37 VQ Example : IRIS ~ Each 4-vector represents an Iris flower n = 15 vectors p = 4 features c = 3 physical subspecies n i = 5 vectors each class x k = x 1k sepal length x 2k sepal width x 3k petal length x 4k petal width c = 3 physical clusters Iris Data, Features (3,4) But only c = 2 geometric clusters 7/19/9 A Primer on Clustering: I. Models and Algorithms 37

38 VQ Prototypes : Initializations I 1, I 2 in conv(iris) I 1 = V Final V Same V* min err =.1 I 2 = m = m M Final V M + m 2 = max err =.225 M = See reference [BP95] for algorithmic parameters 7/19/9 A Primer on Clustering: I. Models and Algorithms 38

39 7/19/9 A Primer on Clustering: I. Models and Algorithms 39 Sestosa Sestosa same same -but not so hot but not so hot- for I for I 3, I, I 4 VQ Prototypes VQ Prototypes : Initializations I 3, I 4 outside outside conv(iris) V Final Final Final Final I V I 4 YIKES!!! YIKES!!! Clusters U=G(V) Clusters U=G(V) you can get with you can get with these prototypes these prototypes will will SUCK!!! SUCK!!!

40 Things you need to about SOM/VQ Kohonen does NOT advocate SOM for clustering! MANY schemes for learning rates { i,t } and neighborhood N t Learning rates { i,t } with (t) force termination artificially Learning rates often linear { i,t = i, (1-(t/T)): t=1,2,,t} Good initialization for V is crucial < k,t < 1 Guarantee {V t } (some) V* but k,t = 2 k,t < properties of V* are unknown! 7/19/9 A Primer on Clustering: I. Models and Algorithms 4

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods