Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme


Machine Learning
B. Unsupervised Learning
B.1 Cluster Analysis

Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)
Institute for Computer Science, University of Hildesheim, Germany

Outline

1. k-means & k-medoids
2. Gaussian Mixture Models
3. Hierarchical Cluster Analysis

Syllabus

Tue (1)  0. Introduction

A. Supervised Learning
Wed (2)  A.1 Linear Regression
Tue (3)  A.2 Linear Classification
Wed (4)  A.3 Regularization
Tue (5)  A.4 High-dimensional Data
Wed (6)  A.5 Nearest-Neighbor Models
Tue (7)  A.6 Decision Trees
Wed (8)  A.7 Support Vector Machines
Tue (9)  A.8 A First Look at Bayesian and Markov Networks

B. Unsupervised Learning
Wed (10) B.1 Clustering
Tue (11) B.2 Dimensionality Reduction
Wed (12) B.3 Frequent Pattern Mining

C. Reinforcement Learning
Tue (13) C.1 State Space Models
Wed (14) C.2 Markov Decision Processes

1. k-means & k-medoids

Partitions

Let $\mathcal{X}$ be a set. A set $P \subseteq \mathcal{P}(\mathcal{X})$ of subsets of $\mathcal{X}$ is called a partition of $\mathcal{X}$ if the subsets
1. are pairwise disjoint: $A \cap B = \emptyset$ for all $A, B \in P$ with $A \neq B$,
2. cover $\mathcal{X}$: $\bigcup_{A \in P} A = \mathcal{X}$, and
3. do not contain the empty set: $\emptyset \notin P$.

Partitions

Let $\mathcal{X} := \{x_1, \ldots, x_N\}$ be a finite set. A set $P := \{\mathcal{X}_1, \ldots, \mathcal{X}_K\}$ of subsets $\mathcal{X}_k \subseteq \mathcal{X}$ is called a partition of $\mathcal{X}$ if the subsets
1. are pairwise disjoint: $\mathcal{X}_k \cap \mathcal{X}_j = \emptyset$ for all $k, j \in \{1, \ldots, K\}$ with $k \neq j$,
2. cover $\mathcal{X}$: $\bigcup_{k=1}^K \mathcal{X}_k = \mathcal{X}$, and
3. do not contain the empty set: $\mathcal{X}_k \neq \emptyset$ for all $k \in \{1, \ldots, K\}$.

The sets $\mathcal{X}_k$ are also called clusters, a partition $P$ a clustering. $K \in \mathbb{N}$ is called the number of clusters. $\mathrm{Part}(\mathcal{X})$ denotes the set of all partitions of $\mathcal{X}$.

Partitions

Let $\mathcal{X}$ be a finite set. A surjective function $p : \{1, \ldots, |\mathcal{X}|\} \to \{1, \ldots, K\}$ is called a partition function of $\mathcal{X}$. The sets $\mathcal{X}_k := p^{-1}(k)$ form a partition $P := \{\mathcal{X}_1, \ldots, \mathcal{X}_K\}$.

Partitions

Let $\mathcal{X} := \{x_1, \ldots, x_N\}$ be a finite set. A binary $N \times K$ matrix $P \in \{0, 1\}^{N \times K}$ is called a partition matrix of $\mathcal{X}$ if it
1. is row-stochastic: $\sum_{k=1}^K P_{i,k} = 1$ for all $i \in \{1, \ldots, N\}$, and
2. does not contain a zero column: $P_{\cdot,k} \neq (0, \ldots, 0)^T$ for all $k \in \{1, \ldots, K\}$.

The sets $\mathcal{X}_k := \{i \in \{1, \ldots, N\} \mid P_{i,k} = 1\}$ form a partition $P := \{\mathcal{X}_1, \ldots, \mathcal{X}_K\}$. $P_{\cdot,k}$ is called the membership vector of class $k$.
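The three encodings of a partition (a set of clusters, a partition function, a partition matrix) are equivalent. A minimal NumPy sketch of converting between them and checking the two partition-matrix properties (the example data is made up):

import numpy as np

# partition function p: point index -> cluster index (0-based for convenience)
p = np.array([0, 0, 1, 1, 0])
N, K = len(p), p.max() + 1

P = np.zeros((N, K), dtype=int)      # partition matrix P in {0,1}^{N x K}
P[np.arange(N), p] = 1

assert (P.sum(axis=1) == 1).all()    # row-stochastic: each point in exactly one cluster
assert (P.sum(axis=0) > 0).all()     # no zero column: no empty cluster

clusters = [set(np.flatnonzero(P[:, k])) for k in range(K)]  # X_k := {i | P_{i,k} = 1}
print(clusters)                      # [{0, 1, 4}, {2, 3}]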

The Cluster Analysis Problem

Given
- a set $\mathcal{X}$ called data space, e.g., $\mathcal{X} := \mathbb{R}^m$,
- a set $X \subseteq \mathcal{X}$ called data, and
- a function $D : \mathcal{P}(\mathcal{X}) \times \mathrm{Part}(X) \to \mathbb{R}_0^+$ called distortion measure, where $D(P)$ measures how bad a partition $P \in \mathrm{Part}(X)$ for a data set $X \subseteq \mathcal{X}$ is,
find a partition $P = \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K\} \in \mathrm{Part}(X)$ with minimal distortion $D(P)$.

The Cluster Analysis Problem (given K)

Given
- a set $\mathcal{X}$ called data space, e.g., $\mathcal{X} := \mathbb{R}^m$,
- a set $X \subseteq \mathcal{X}$ called data,
- a function $D : \mathcal{P}(\mathcal{X}) \times \mathrm{Part}(X) \to \mathbb{R}_0^+$ called distortion measure, where $D(P)$ measures how bad a partition $P \in \mathrm{Part}(X)$ for a data set $X \subseteq \mathcal{X}$ is, and
- a number $K \in \mathbb{N}$ of clusters,
find a partition $P = \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K\} \in \mathrm{Part}_K(X)$ with $K$ clusters with minimal distortion $D(P)$.

k-means: Distortion = Sum of Squared Distances to Cluster Centers

$$D(P) := \sum_{i=1}^n \sum_{k=1}^K P_{i,k}\, \|x_i - \mu_k\|^2 = \sum_{k=1}^K \sum_{i:\, P_{i,k} = 1} \|x_i - \mu_k\|^2$$

with

$$\mu_k := \frac{\sum_{i=1}^n P_{i,k}\, x_i}{\sum_{i=1}^n P_{i,k}} = \mathrm{mean}\{x_i \mid P_{i,k} = 1,\ i = 1, \ldots, n\}$$

Minimizing $D$ over partitions with a varying number of clusters leads to the singleton clustering with distortion 0; only the cluster analysis problem with given $K$ makes sense. Minimizing $D$ is not easy, as reassigning a point to a different cluster also shifts the cluster centers.
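The distortion is straightforward to evaluate for a given partition matrix and given centers; a minimal NumPy sketch (the function name is illustrative):

import numpy as np

def distortion(X, P, mu):
    # D(P, mu) = sum_i sum_k P_{i,k} * ||x_i - mu_k||^2
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, K) squared distances
    return (P * sq).sum()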

k-means: Minimizing Distances to Cluster Centers

Add cluster centers $\mu$ as auxiliary optimization variables:

$$D(P, \mu) := \sum_{i=1}^n \sum_{k=1}^K P_{i,k}\, \|x_i - \mu_k\|^2$$

Block coordinate descent:
1. fix $\mu$, optimize $P$: reassign data points to clusters:
$$P_{i,k} := \delta(k = l_i), \quad l_i := \arg\min_{k' \in \{1, \ldots, K\}} \|x_i - \mu_{k'}\|^2$$
2. fix $P$, optimize $\mu$: recompute cluster centers:
$$\mu_k := \frac{\sum_{i=1}^n P_{i,k}\, x_i}{\sum_{i=1}^n P_{i,k}}$$

Iterate until the partition is stable.

k-means: Initialization

k-means is usually initialized by picking $K$ data points as cluster centers at random:
1. pick the first cluster center $\mu_1$ out of the data points at random, and then
2. sequentially select the data point with the largest sum of distances to the already chosen cluster centers as the next cluster center:
$$\mu_k := x_{i^*}, \quad i^* := \arg\max_{i \in \{1, \ldots, n\}} \sum_{l=1}^{k-1} \|x_i - \mu_l\|^2, \quad k = 2, \ldots, K$$

Different initializations may lead to different local minima: run k-means with different random initializations and keep only the one with the smallest distortion (random restarts).
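A minimal sketch of this farthest-point initialization in NumPy (the function name is illustrative):

import numpy as np

def init_centers(X, K, rng=np.random.default_rng(0)):
    # first center at random, then repeatedly the point with the largest
    # sum of squared distances to the already chosen centers
    centers = [X[rng.integers(len(X))]]
    while len(centers) < K:
        d = sum(((X - mu) ** 2).sum(axis=1) for mu in centers)
        centers.append(X[np.argmax(d)])
    return np.stack(centers)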

k-means

 1: procedure cluster-kmeans(D := {x_1, ..., x_N} ⊆ R^M, K ∈ N, ε ∈ R^+)
 2:   i_1 ~ unif({1, ..., N}), μ_1 := x_{i_1}
 3:   for k := 2, ..., K do
 4:     i_k := arg max_{n ∈ {1,...,N}} Σ_{l=1}^{k-1} ||x_n - μ_l||, μ_k := x_{i_k}
 5:   repeat
 6:     μ^old := μ
 7:     for n := 1, ..., N do
 8:       P_n := arg min_{k ∈ {1,...,K}} ||x_n - μ_k||
 9:     for k := 1, ..., K do
10:       μ_k := mean{x_n | P_n = k}
11:   until (1/K) Σ_{k=1}^K ||μ_k - μ_k^old|| < ε
12:   return P

Note: In implementations, the two loops over the data (lines 7 and 9) can be combined into one loop.
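A compact NumPy translation of this pseudocode, as a hedged sketch (it assumes no cluster runs empty during the iteration, which the farthest-point initialization makes unlikely but does not guarantee):

import numpy as np

def kmeans(X, K, eps=1e-6, rng=np.random.default_rng(0)):
    mu = [X[rng.integers(len(X))]]                        # line 2: random first center
    while len(mu) < K:                                    # lines 3-4: farthest-point init
        d = sum(((X - m) ** 2).sum(axis=1) for m in mu)
        mu.append(X[np.argmax(d)])
    mu = np.stack(mu)
    while True:                                           # lines 5-11
        P = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        mu_old = mu.copy()
        mu = np.stack([X[P == k].mean(axis=0) for k in range(K)])  # assumes no empty cluster
        if np.linalg.norm(mu - mu_old, axis=1).mean() < eps:       # line 11
            return P, mu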

Example

[Figure: a run of k-means on six points $x_1, \ldots, x_6$. The initial centers are $\mu_1 = x_4$ and $\mu_2 = x_1$; after assigning points and recomputing centers the distortion drops from $d = 33$ to $d = 23.7$, and after one more iteration the centers settle at $\mu_1 = x_5$ and $\mu_2 = x_2$ with $d = 16$, where the partition is stable.]

How Many Clusters K?

[Figure: the distortion $D$ as a function of the number of clusters $K$; $D$ decreases from about $n\sigma^2$ at $K = 1$ towards $0$ at $K = n$.]
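To pick $K$ in practice, one can compute this curve and look for the point where it flattens out; a short sketch using the kmeans() function from above on made-up data:

import numpy as np

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(c, 0.3, size=(50, 2)) for c in ((0, 0), (3, 0), (0, 3))])
for K in range(1, 7):
    P, mu = kmeans(X, K)
    D = sum(((X[P == k] - mu[k]) ** 2).sum() for k in range(K))
    print(K, round(D, 1))   # D drops sharply up to K = 3, then flattens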

k-medoids: k-means for General Distances

One can generalize k-means to general distances $d$:

$$D(P, \mu) := \sum_{i=1}^n \sum_{k=1}^K P_{i,k}\, d(x_i, \mu_k)$$

Step 1, assigning data points to clusters, remains the same:

$$P_{i,k} := \delta(k = l_i), \quad l_i := \arg\min_{k' \in \{1, \ldots, K\}} d(x_i, \mu_{k'})$$

but step 2, finding the best cluster representatives $\mu_k$, is not solved by the mean and may be difficult in general.

Idea of k-medoids: choose cluster representatives out of the cluster data points:

$$\mu_k := x_{j^*}, \quad j^* := \arg\min_{j \in \{1, \ldots, n\}:\, P_{j,k} = 1} \sum_{i=1}^n P_{i,k}\, d(x_i, x_j)$$

k-medoids: k-means for General Distances

k-medoids is a "kernel method": it requires no access to the variables, just to the distance measure.

For the Manhattan distance / $L_1$ distance, step 2, finding the best cluster representatives $\mu_k$, can be solved without restriction to cluster data points:

$$(\mu_k)_j := \mathrm{median}\{(x_i)_j \mid P_{i,k} = 1,\ i = 1, \ldots, n\}, \quad j = 1, \ldots, m$$
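A minimal k-medoids sketch operating only on a precomputed distance matrix, illustrating that no access to the variables is needed (a fixed iteration count stands in for a proper convergence check, and it assumes no cluster runs empty; the function name is illustrative):

import numpy as np

def kmedoids(D, K, iters=20, rng=np.random.default_rng(0)):
    medoids = rng.choice(len(D), size=K, replace=False)
    for _ in range(iters):
        P = D[:, medoids].argmin(axis=1)        # step 1: assign to the nearest medoid
        for k in range(K):                      # step 2: best representative per cluster
            members = np.flatnonzero(P == k)    # assumes the cluster is non-empty
            medoids[k] = members[D[np.ix_(members, members)].sum(axis=0).argmin()]
    return P, medoids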

2. Gaussian Mixture Models

Soft Partitions: Row-Stochastic Matrices

Let $\mathcal{X} := \{x_1, \ldots, x_N\}$ be a finite set. An $N \times K$ matrix $P \in [0, 1]^{N \times K}$ is called a soft partition matrix of $\mathcal{X}$ if it
1. is row-stochastic: $\sum_{k=1}^K P_{i,k} = 1$ for all $i \in \{1, \ldots, N\}$, and
2. does not contain a zero column: $P_{\cdot,k} \neq (0, \ldots, 0)^T$ for all $k \in \{1, \ldots, K\}$.

$P_{i,k}$ is called the membership degree of instance $i$ in class $k$ or the cluster weight of instance $i$ in cluster $k$. $P_{\cdot,k}$ is called the membership vector of class $k$. $\mathrm{SoftPart}(\mathcal{X})$ denotes the set of all soft partitions of $\mathcal{X}$.

Note: Soft partitions are also called soft clusterings and fuzzy clusterings.

The Soft Clustering Problem

Given
- a set $\mathcal{X}$ called data space, e.g., $\mathcal{X} := \mathbb{R}^m$,
- a set $X \subseteq \mathcal{X}$ called data, and
- a function $D : \mathcal{P}(\mathcal{X}) \times \mathrm{SoftPart}(X) \to \mathbb{R}_0^+$ called distortion measure, where $D(P)$ measures how bad a soft partition $P \in \mathrm{SoftPart}(X)$ for a data set $X \subseteq \mathcal{X}$ is,
find a soft partition $P \in \mathrm{SoftPart}(X)$ with minimal distortion $D(P)$.

The Soft Clustering Problem (with given K)

Given
- a set $\mathcal{X}$ called data space, e.g., $\mathcal{X} := \mathbb{R}^m$,
- a set $X \subseteq \mathcal{X}$ called data,
- a function $D : \mathcal{P}(\mathcal{X}) \times \mathrm{SoftPart}(X) \to \mathbb{R}_0^+$ called distortion measure, where $D(P)$ measures how bad a soft partition $P \in \mathrm{SoftPart}(X)$ for a data set $X \subseteq \mathcal{X}$ is, and
- a number $K \in \mathbb{N}$ of clusters,
find a soft partition $P \in \mathrm{SoftPart}_K(X) \subseteq [0, 1]^{|X| \times K}$ with $K$ clusters with minimal distortion $D(P)$.

Mixture Models

Mixture models assume that there exists an unobserved nominal variable $Z$ with $K$ levels:

$$p(X, Z) = p(Z)\, p(X \mid Z) = \prod_{k=1}^K \left(\pi_k\, p(X \mid Z = k)\right)^{\delta(Z = k)}$$

The complete data loglikelihood of the completed data $(X, Z)$ then is

$$\ell(\Theta; X, Z) := \sum_{i=1}^n \sum_{k=1}^K \delta(z_i = k)\left(\ln \pi_k + \ln p(X = x_i \mid Z = k; \theta_k)\right)$$

with $\Theta := (\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K)$.

$\ell$ cannot be computed because the $z_i$ are unobserved.

Mixture Models: Expected Loglikelihood

Given an estimate $\Theta^{(t-1)}$ of the parameters, mixtures aim to optimize the expected complete data loglikelihood:

$$Q(\Theta; \Theta^{(t-1)}) := \mathbb{E}[\ell(\Theta; X, Z) \mid \Theta^{(t-1)}] = \sum_{i=1}^n \sum_{k=1}^K \mathbb{E}[\delta(Z_i = k) \mid x_i, \Theta^{(t-1)}]\left(\ln \pi_k + \ln p(X = x_i \mid Z = k; \theta_k)\right)$$

which is relaxed to

$$Q(\Theta, r; \Theta^{(t-1)}) = \sum_{i=1}^n \sum_{k=1}^K r_{i,k}\left(\ln \pi_k + \ln p(X = x_i \mid Z = k; \theta_k)\right) + \left(r_{i,k} - \mathbb{E}[\delta(Z_i = k) \mid x_i, \Theta^{(t-1)}]\right)^2$$

Mixture Models: Expected Loglikelihood

Block coordinate descent (EM algorithm): alternate until convergence.

1. Expectation step:

$$r_{i,k}^{(t-1)} := \mathbb{E}[\delta(Z_i = k) \mid x_i, \Theta^{(t-1)}] = p(Z = k \mid X = x_i; \Theta^{(t-1)}) = \frac{p(X = x_i \mid Z = k; \Theta^{(t-1)})\, p(Z = k; \Theta^{(t-1)})}{\sum_{k'=1}^K p(X = x_i \mid Z = k'; \Theta^{(t-1)})\, p(Z = k'; \Theta^{(t-1)})} = \frac{p(X = x_i \mid Z = k; \theta_k^{(t-1)})\, \pi_k^{(t-1)}}{\sum_{k'=1}^K p(X = x_i \mid Z = k'; \theta_{k'}^{(t-1)})\, \pi_{k'}^{(t-1)}} \quad (0)$$

2. Maximization step:

$$\Theta^{(t)} := \arg\max_\Theta Q(\Theta, r^{(t-1)}; \Theta^{(t-1)}) = \arg\max_{\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K} \sum_{i=1}^n \sum_{k=1}^K r_{i,k}\left(\ln \pi_k + \ln p(X = x_i \mid Z = k; \theta_k)\right)$$

Mixture Models: Expected Loglikelihood

2. Maximization step:

$$\Theta^{(t)} = \arg\max_{\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K} \sum_{i=1}^n \sum_{k=1}^K r_{i,k}\left(\ln \pi_k + \ln p(X = x_i \mid Z = k; \theta_k)\right) \quad (1)$$

$$\pi_k^{(t)} = \frac{\sum_{i=1}^n r_{i,k}}{n}$$

$$\sum_{i=1}^n \frac{r_{i,k}}{p(X = x_i \mid Z = k; \theta_k)}\, \frac{\partial p(X = x_i \mid Z = k; \theta_k)}{\partial \theta_k} = 0, \quad \forall k \quad (*)$$

(*) needs to be solved for the specific cluster-specific distributions $p(X \mid Z)$.

Gaussian Mixtures

Gaussian mixtures use Gaussians for $p(X \mid Z)$:

$$p(X = x \mid Z = k) = \frac{1}{\sqrt{(2\pi)^m |\Sigma_k|}}\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)}, \quad \theta_k := (\mu_k, \Sigma_k)$$

$$\mu_k^{(t)} = \frac{\sum_{i=1}^n r_{i,k}^{(t-1)} x_i}{\sum_{i=1}^n r_{i,k}^{(t-1)}} \quad (2)$$

$$\Sigma_k^{(t)} = \frac{\sum_{i=1}^n r_{i,k}^{(t-1)} (x_i - \mu_k^{(t)})(x_i - \mu_k^{(t)})^T}{\sum_{i=1}^n r_{i,k}^{(t-1)}} = \frac{\sum_{i=1}^n r_{i,k}^{(t-1)} x_i x_i^T}{\sum_{i=1}^n r_{i,k}^{(t-1)}} - \mu_k^{(t)} (\mu_k^{(t)})^T \quad (3)$$

Gaussian Mixtures: EM Algorithm, Summary

1. Expectation step: for all $i, k$:

$$\tilde r_{i,k}^{(t-1)} = \pi_k^{(t-1)}\, \frac{1}{\sqrt{(2\pi)^m |\Sigma_k^{(t-1)}|}}\, e^{-\frac{1}{2}(x_i - \mu_k^{(t-1)})^T (\Sigma_k^{(t-1)})^{-1} (x_i - \mu_k^{(t-1)})} \quad (0a)$$

$$r_{i,k}^{(t-1)} = \frac{\tilde r_{i,k}^{(t-1)}}{\sum_{k'=1}^K \tilde r_{i,k'}^{(t-1)}} \quad (0b)$$

2. Maximization step: for all $k$:

$$\pi_k^{(t)} = \frac{\sum_{i=1}^n r_{i,k}^{(t-1)}}{n} \quad (1)$$

$$\mu_k^{(t)} = \frac{\sum_{i=1}^n r_{i,k}^{(t-1)} x_i}{\sum_{i=1}^n r_{i,k}^{(t-1)}} \quad (2)$$

$$\Sigma_k^{(t)} = \frac{\sum_{i=1}^n r_{i,k}^{(t-1)} x_i x_i^T}{\sum_{i=1}^n r_{i,k}^{(t-1)}} - \mu_k^{(t)} (\mu_k^{(t)})^T \quad (3)$$
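The summary above translates almost line by line into NumPy. A minimal sketch of EM for a Gaussian mixture, without the numerical safeguards a production implementation needs (log-space responsibilities, covariance regularization); the function name is illustrative:

import numpy as np

def gmm_em(X, K, iters=50, rng=np.random.default_rng(0)):
    n, m = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, size=K, replace=False)]          # centers at random data points
    Sigma = np.stack([np.eye(m)] * K)
    for _ in range(iters):
        r = np.empty((n, K))
        for k in range(K):                                # E step, (0a)
            diff = X - mu[k]
            quad = (diff @ np.linalg.inv(Sigma[k]) * diff).sum(axis=1)
            norm = np.sqrt((2 * np.pi) ** m * np.linalg.det(Sigma[k]))
            r[:, k] = pi[k] * np.exp(-0.5 * quad) / norm
        r /= r.sum(axis=1, keepdims=True)                 # (0b)
        nk = r.sum(axis=0)
        pi = nk / n                                       # (1)
        mu = (r.T @ X) / nk[:, None]                      # (2)
        for k in range(K):                                # (3)
            diff = X - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / nk[k]
    return r, pi, mu, Sigma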

Gaussian Mixtures for Soft Clustering

The responsibilities $r \in [0, 1]^{N \times K}$ are a soft partition: $P := r$.

The negative expected loglikelihood can be used as cluster distortion:

$$D(P) := -\max_\Theta Q(\Theta, r)$$

To optimize $D$, we simply can run EM.

For hard clustering: assign points to the cluster with the highest responsibility (hard EM):

$$r_{i,k}^{(t-1)} = \delta\left(k = \arg\max_{k' = 1, \ldots, K} \tilde r_{i,k'}^{(t-1)}\right) \quad (0b')$$

Gaussian Mixtures for Soft Clustering / Example

[Figures: successive EM iterations for a Gaussian mixture on a 2-dimensional data set; from Murphy [Mur12].]

Model-based Cluster Analysis

Different parametrizations of the covariance matrices $\Sigma_k$ restrict the possible cluster shapes:
- full $\Sigma$: all sorts of ellipsoid clusters,
- diagonal $\Sigma$: ellipsoid clusters with axis-parallel axes,
- unit $\Sigma$: spherical clusters.

One also distinguishes
- cluster-specific $\Sigma_k$: each cluster can have its own shape,
- shared $\Sigma_k = \Sigma$: all clusters have the same shape.

k-means: Hard EM with Spherical Clusters

1. Expectation step: for all $i, k$, with unit covariances $\Sigma_k^{(t-1)} = I$:

$$\tilde r_{i,k}^{(t-1)} = \frac{1}{\sqrt{(2\pi)^m |\Sigma_k^{(t-1)}|}}\, e^{-\frac{1}{2}(x_i - \mu_k^{(t-1)})^T (\Sigma_k^{(t-1)})^{-1} (x_i - \mu_k^{(t-1)})} = \frac{1}{\sqrt{(2\pi)^m}}\, e^{-\frac{1}{2}(x_i - \mu_k^{(t-1)})^T (x_i - \mu_k^{(t-1)})} \quad (0a)$$

$$r_{i,k}^{(t-1)} = \delta\left(k = \arg\max_{k' = 1, \ldots, K} \tilde r_{i,k'}^{(t-1)}\right) \quad (0b')$$

and

$$\arg\max_{k' = 1, \ldots, K} \tilde r_{i,k'}^{(t-1)} = \arg\max_{k' = 1, \ldots, K} \frac{1}{\sqrt{(2\pi)^m}}\, e^{-\frac{1}{2}(x_i - \mu_{k'}^{(t-1)})^T (x_i - \mu_{k'}^{(t-1)})} = \arg\min_{k' = 1, \ldots, K} \|x_i - \mu_{k'}^{(t-1)}\|^2$$
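A short numerical check of this equivalence on made-up data: with unit covariances, picking the cluster of maximal (unnormalized) responsibility is exactly nearest-center assignment.

import numpy as np

rng = np.random.default_rng(0)
X, mu = rng.normal(size=(100, 2)), rng.normal(size=(3, 2))
sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # squared distances (n, K)
assert (np.exp(-0.5 * sq).argmax(axis=1) == sq.argmin(axis=1)).all()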

3. Hierarchical Cluster Analysis

Hierarchies

Let $\mathcal{X}$ be a set. A tree $(H, E)$, $E \subseteq H \times H$ (edges pointing towards the root), with leaf nodes $h$ corresponding bijectively to elements $x_h \in \mathcal{X}$, plus a surjective map $L : H \to \{0, \ldots, d\}$, $d \in \mathbb{N}$, with $L(\mathrm{root}) = 0$, $L(h) = d$ for all leaves $h \in H$, and $L(h) \leq L(g)$ for all $(g, h) \in E$, called the level map, is called a hierarchy over $\mathcal{X}$.

$d$ is called the depth of the hierarchy. $\mathrm{Hier}(\mathcal{X})$ denotes the set of all hierarchies over $\mathcal{X}$.

Hierarchies / Example

[Figure: a dendrogram over $\mathcal{X} := \{x_1, \ldots, x_6\}$ with levels $L = 0, \ldots, 5$ and leaves ordered $x_1, x_3, x_4, x_2, x_5, x_6$.]

Hierarchies: Nodes Correspond to Subsets

Let $(H, E)$ be such a hierarchy. Nodes of a hierarchy correspond to subsets of $\mathcal{X}$:
- leaf nodes $h$ correspond to a singleton subset by definition: $\mathrm{subset}(h) := \{x_h\}$, with $x_h \in \mathcal{X}$ the element corresponding to leaf $h$,
- interior nodes $h$ correspond to the union of the subsets of their children: $\mathrm{subset}(h) := \bigcup_{g \in H,\ (g,h) \in E} \mathrm{subset}(g)$,
- thus the root node $h$ corresponds to the full set $\mathcal{X}$: $\mathrm{subset}(h) = \mathcal{X}$.

Hierarchies: Nodes Correspond to Subsets

[Figure: the example dendrogram with nodes labeled by their subsets: leaves $\{x_1\}, \{x_3\}, \{x_4\}, \{x_2\}, \{x_5\}, \{x_6\}$; interior nodes $\{x_1, x_3\}$, $\{x_2, x_5\}$, $\{x_1, x_3, x_4\}$, $\{x_2, x_5, x_6\}$; root $\{x_1, x_3, x_4, x_2, x_5, x_6\}$.]

Hierarchies: Levels Correspond to Partitions

Let $(H, E)$ be such a hierarchy. Levels $l \in \{0, \ldots, d\}$ correspond to partitions:

$$P_l(H, L) := \{h \in H \mid L(h) \leq l,\ \nexists\, g \in H: L(g) \leq l,\ g \text{ a proper descendant of } h\}$$

i.e., the deepest nodes of level at most $l$.

Hierarchies: Levels Correspond to Partitions

[Figure: the example dendrogram with the partition at each level:
$\{\{x_1, x_3, x_4, x_2, x_5, x_6\}\}$
$\{\{x_1, x_3, x_4\}, \{x_2, x_5, x_6\}\}$
$\{\{x_1, x_3\}, \{x_4\}, \{x_2, x_5, x_6\}\}$
$\{\{x_1, x_3\}, \{x_4\}, \{x_2, x_5\}, \{x_6\}\}$
$\{\{x_1, x_3\}, \{x_4\}, \{x_2\}, \{x_5\}, \{x_6\}\}$
$\{\{x_1\}, \{x_3\}, \{x_4\}, \{x_2\}, \{x_5\}, \{x_6\}\}$]

The Hierarchical Cluster Analysis Problem

Given
- a set $\mathcal{X}$ called data space, e.g., $\mathcal{X} := \mathbb{R}^m$,
- a set $X \subseteq \mathcal{X}$ called data, and
- a function $D : \mathcal{P}(\mathcal{X}) \times \mathrm{Hier}(X) \to \mathbb{R}_0^+$ called distortion measure, where $D(H)$ measures how bad a hierarchy $H \in \mathrm{Hier}(X)$ for a data set $X \subseteq \mathcal{X}$ is,
find a hierarchy $H \in \mathrm{Hier}(X)$ with minimal distortion $D(H)$.

Distortions for Hierarchies

Examples for distortions for hierarchies:

$$D(H) := \sum_{K=1}^n D(P_K(H))$$

where $P_K(H)$ denotes the partition at level $K - 1$ (with $K$ classes) and $D$ denotes a distortion for partitions.

Agglomerative and Divisive Hierarchical Clustering

Hierarchies are usually learned by greedy search, level by level.

Agglomerative clustering:
1. start with the singleton partition $P_n$: $P_n := \{\mathcal{X}_k \mid k = 1, \ldots, n\}$, $\mathcal{X}_k := \{x_k\}$, $k = 1, \ldots, n$
2. in each step $K = n, \ldots, 2$, build $P_{K-1}$ by joining the two clusters $k, l \in \{1, \ldots, K\}$ that lead to the minimal distortion
$$D(\{\mathcal{X}_1, \ldots, \widehat{\mathcal{X}_k}, \ldots, \widehat{\mathcal{X}_l}, \ldots, \mathcal{X}_K, \mathcal{X}_k \cup \mathcal{X}_l\})$$

Note: $\widehat{\mathcal{X}_k}$ denotes that the class $\mathcal{X}_k$ is omitted from the partition.

Agglomerative and Divisive Hierarchical Clustering

Hierarchies are usually learned by greedy search, level by level.

Divisive clustering:
1. start with the all partition $P_1$: $P_1 := \{X\}$
2. in each step $K = 1, \ldots, n-1$, build $P_{K+1}$ by splitting one cluster $\mathcal{X}_k$ into two clusters $\mathcal{X}'_k, \mathcal{X}'_l$ that lead to the minimal distortion
$$D(\{\mathcal{X}_1, \ldots, \widehat{\mathcal{X}_k}, \ldots, \mathcal{X}_K, \mathcal{X}'_k, \mathcal{X}'_l\}), \quad \mathcal{X}_k = \mathcal{X}'_k \cup \mathcal{X}'_l$$

Note: $\widehat{\mathcal{X}_k}$ denotes that the class $\mathcal{X}_k$ is omitted from the partition.

Class-wise Defined Partition Distortions

If the partition distortion can be written as a sum of distortions of its classes,

$$D(\{\mathcal{X}_1, \ldots, \mathcal{X}_K\}) = \sum_{k=1}^K D(\mathcal{X}_k)$$

then the change in distortion caused by joining a pair depends only on $\mathcal{X}_k$ and $\mathcal{X}_l$:

$$D(\{\mathcal{X}_1, \ldots, \widehat{\mathcal{X}_k}, \ldots, \widehat{\mathcal{X}_l}, \ldots, \mathcal{X}_K, \mathcal{X}_k \cup \mathcal{X}_l\}) - D(\{\mathcal{X}_1, \ldots, \mathcal{X}_K\}) = D(\mathcal{X}_k \cup \mathcal{X}_l) - (D(\mathcal{X}_k) + D(\mathcal{X}_l))$$

Closest Cluster Pair Partition Distortions

For a cluster distance $d : \mathcal{P}(\mathcal{X}) \times \mathcal{P}(\mathcal{X}) \to \mathbb{R}_0^+$ with

$$d(A \cup B, C) \geq \min\{d(A, C), d(B, C)\}, \quad A, B, C \subseteq \mathcal{X}$$

a partition can be judged by the closest cluster pair it contains:

$$D(\{\mathcal{X}_1, \ldots, \mathcal{X}_K\}) = \min_{k, l = 1, \ldots, K,\ k \neq l} d(\mathcal{X}_k, \mathcal{X}_l)$$

Such a distortion has to be maximized. To increase it, the closest cluster pair has to be joined.

Single Link Clustering

$$d_{\mathrm{sl}}(A, B) := \min_{x \in A,\, y \in B} d(x, y), \quad A, B \subseteq \mathcal{X}$$

Complete Link Clustering

$$d_{\mathrm{cl}}(A, B) := \max_{x \in A,\, y \in B} d(x, y), \quad A, B \subseteq \mathcal{X}$$

Average Link Clustering

$$d_{\mathrm{al}}(A, B) := \frac{1}{|A|\,|B|} \sum_{x \in A,\, y \in B} d(x, y), \quad A, B \subseteq \mathcal{X}$$
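The three linkage distances translate directly into code; a small sketch with an illustrative pointwise distance d2 (all names and the example points are made up):

import numpy as np

def d_sl(A, B, d):  # single link: closest pair
    return min(d(x, y) for x in A for y in B)

def d_cl(A, B, d):  # complete link: farthest pair
    return max(d(x, y) for x in A for y in B)

def d_al(A, B, d):  # average link: mean pairwise distance
    return sum(d(x, y) for x in A for y in B) / (len(A) * len(B))

d2 = lambda x, y: float(np.linalg.norm(np.subtract(x, y)))
A, B = [(0, 0), (1, 0)], [(3, 0), (4, 0)]
print(d_sl(A, B, d2), d_cl(A, B, d2), d_al(A, B, d2))  # 2.0 4.0 3.0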

Recursion Formulas for Cluster Distances

$$d_{\mathrm{sl}}(\mathcal{X}_i \cup \mathcal{X}_j, \mathcal{X}_k) := \min_{x \in \mathcal{X}_i \cup \mathcal{X}_j,\, y \in \mathcal{X}_k} d(x, y) = \min\{\min_{x \in \mathcal{X}_i,\, y \in \mathcal{X}_k} d(x, y),\ \min_{x \in \mathcal{X}_j,\, y \in \mathcal{X}_k} d(x, y)\} = \min\{d_{\mathrm{sl}}(\mathcal{X}_i, \mathcal{X}_k),\ d_{\mathrm{sl}}(\mathcal{X}_j, \mathcal{X}_k)\}$$

$$d_{\mathrm{cl}}(\mathcal{X}_i \cup \mathcal{X}_j, \mathcal{X}_k) := \max_{x \in \mathcal{X}_i \cup \mathcal{X}_j,\, y \in \mathcal{X}_k} d(x, y) = \max\{\max_{x \in \mathcal{X}_i,\, y \in \mathcal{X}_k} d(x, y),\ \max_{x \in \mathcal{X}_j,\, y \in \mathcal{X}_k} d(x, y)\} = \max\{d_{\mathrm{cl}}(\mathcal{X}_i, \mathcal{X}_k),\ d_{\mathrm{cl}}(\mathcal{X}_j, \mathcal{X}_k)\}$$

$$d_{\mathrm{al}}(\mathcal{X}_i \cup \mathcal{X}_j, \mathcal{X}_k) := \frac{1}{|\mathcal{X}_i \cup \mathcal{X}_j|\,|\mathcal{X}_k|} \sum_{x \in \mathcal{X}_i \cup \mathcal{X}_j,\, y \in \mathcal{X}_k} d(x, y) = \frac{|\mathcal{X}_i|}{|\mathcal{X}_i| + |\mathcal{X}_j|}\, d_{\mathrm{al}}(\mathcal{X}_i, \mathcal{X}_k) + \frac{|\mathcal{X}_j|}{|\mathcal{X}_i| + |\mathcal{X}_j|}\, d_{\mathrm{al}}(\mathcal{X}_j, \mathcal{X}_k)$$

Thus agglomerative hierarchical clustering requires computing the distance matrix $D \in \mathbb{R}^{n \times n}$, $D_{i,j} := d(x_i, x_j)$, $i, j = 1, \ldots, n$, only once; a code sketch follows below.
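A naive agglomerative clusterer built on these recursion formulas, operating on a precomputed distance matrix (a minimal O(n³) sketch with an illustrative function name; scipy.cluster.hierarchy.linkage would be the production route):

import numpy as np

def agglomerative(D, linkage="sl"):
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)
    sizes = {i: 1 for i in range(len(D))}
    active, merges = set(range(len(D))), []
    while len(active) > 1:
        i, j = min(((i, j) for i in active for j in active if i < j),
                   key=lambda ij: D[ij])                   # join the closest pair
        for k in active - {i, j}:                          # recursion formulas
            if linkage == "sl":
                D[i, k] = D[k, i] = min(D[i, k], D[j, k])
            elif linkage == "cl":
                D[i, k] = D[k, i] = max(D[i, k], D[j, k])
            else:                                          # average link, size-weighted
                wi = sizes[i] / (sizes[i] + sizes[j])
                D[i, k] = D[k, i] = wi * D[i, k] + (1 - wi) * D[j, k]
        sizes[i] += sizes[j]                               # cluster j is merged into i
        active.remove(j)
        merges.append((i, j))
    return merges                                          # merge order encodes the hierarchy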

Conclusion (1/2)

Cluster analysis aims at detecting latent groups in data, without labeled examples (cf. record linkage).

Latent groups can be described in three different granularities:
- partitions segment data into K subsets (hard clustering),
- hierarchies structure data into a hierarchy, i.e., a sequence of consistent partitions (hierarchical clustering),
- soft clusterings / row-stochastic matrices build overlapping groups to which data points can belong with some membership degree (soft clustering).

k-means finds a K-partition by finding K cluster centers with the smallest Euclidean distance to all their cluster points.

k-medoids generalizes k-means to general distances; it finds a K-partition by selecting K data points as cluster representatives with the smallest distance to all their cluster points.

Conclusion (2/2)

Hierarchical single link, complete link and average link methods find a hierarchy by greedy search over consistent partitions, starting from the singleton partition (agglomerative), and are efficient due to recursion formulas, requiring only a distance matrix.

Gaussian mixture models find soft clusterings by modeling data by a class-specific multivariate Gaussian distribution $p(X \mid Z)$ and estimating expected class memberships (expected likelihood).

The Expectation Maximization algorithm (EM) can be used to learn Gaussian mixture models via block coordinate descent.

k-means is a special case of a Gaussian mixture model with hard/binary cluster memberships (hard EM) and spherical cluster shapes.

Readings

- k-means: [HTFF05], ch. 8.5; [Bis06], ch. 9.1; [Mur12]
- hierarchical cluster analysis: [HTFF05]; [Mur12]; [PTVF07]
- Gaussian mixtures: [HTFF05]; [Bis06], ch. 9.2; [Mur12]; [PTVF07]

References

[Bis06] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[HTFF05] Trevor Hastie, Robert Tibshirani, Jerome Friedman, and James Franklin. The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2005.
[Mur12] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[PTVF07] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes. Cambridge University Press, 3rd edition, 2007.
