Clustering. Machine Learning: Jordan Boyd-Graber University of Maryland K -MEANS AND GMM. Machine Learning: Jordan Boyd-Graber UMD Clustering 1 / 1

Size: px

Start display at page:

Download "Clustering. Machine Learning: Jordan Boyd-Graber University of Maryland K -MEANS AND GMM. Machine Learning: Jordan Boyd-Graber UMD Clustering 1 / 1"

Clinton Bradford
5 years ago
Views:

1 Clustering Machine Learning: Jordan Boyd-Graber University of Maryland K -MEANS AND GMM Machine Learning: Jordan Boyd-Graber UMD Clustering 1 / 1

2 Lecture for Today What is clustering? K-Means Gaussian Mixture Models Machine Learning: Jordan Boyd-Graber UMD Clustering 2 / 1

3 Clustering Classifica(on: what is label of new point? Clustering: how should we group these points? Clustering: or is this the right grouping? Clustering: what about this? Machine Learning: Jordan Boyd-Graber UMD Clustering 3 / 1

4 Clustering Uses: genomics medical imaging social network analysis recommender systems market segmentation voter analysis Machine Learning: Jordan Boyd-Graber UMD Clustering 4 / 1

5 Microarray Gene Expression Data From: Skin layer-specific transcriptional profiles in normal and recessive yellow (Mc1re/Mc1re) mice by April and Barsh in Pigment Cell Research (2006) Machine Learning: Jordan Boyd-Graber UMD Clustering 5 / 1

6 Medical Imaging (MRIs and PET scans) From: Fluorodeoxyglucose positron emission tomography of mild cognitive impairment with clinical follow-up at 3 years by Pardo et al. in Alzheimer s and Dementia (2010) Machine Learning: Jordan Boyd-Graber UMD Clustering 6 / 1

7 Social Networks Machine Learning: Jordan Boyd-Graber UMD Clustering 7 / 1

8 Recommender Systems From: tech.hulu.com/blog/ Machine Learning: Jordan Boyd-Graber UMD Clustering 8 / 1

9 Market Segmentation From: mappinganalytics.com/map/segmentation-maps/segmentation-map.html Machine Learning: Jordan Boyd-Graber UMD Clustering 9 / 1

10 Voter Analysis soccer moms (female, middle aged, married, middle income, white, kids, suburban) Nascar dads (male, middle aged, married, middle income, white, kids, Southern, suburban or rural) security moms (... ) low information voters Ivy League Elites Machine Learning: Jordan Boyd-Graber UMD Clustering 10 / 1

11 Clustering Questions: how do we fit clusters? how many clusters should we use? how should we evaluate model fit? Machine Learning: Jordan Boyd-Graber UMD Clustering 11 / 1

12 K-Means How do we fit the clusters? simplest method: K-means requires: real-valued data idea: pick K initial cluster means associate all points closest to mean k with cluster k use points in cluster k to update mean for that cluster re-associate points closest to new mean for k with cluster k use new points in cluster k to update mean for that cluster... stop when no change between updates Machine Learning: Jordan Boyd-Graber UMD Clustering 12 / 1

13 K-Means Animation at: Machine Learning: Jordan Boyd-Graber UMD Clustering 13 / 1

14 K-Means: Example Data: x 1 x X X1 Machine Learning: Jordan Boyd-Graber UMD Clustering 14 / 1

15 K-Means: Example Pick K centers (randomly): ( 1, 1) and (0,0) X X1 Machine Learning: Jordan Boyd-Graber UMD Clustering 15 / 1

16 K-Means: Example Calculate distance between points and those centers: x 1 x 2 ( 1, 1) (0,0) > centers <- rbind(c(-1,-1),c(0,0)) > dist1 <- apply(x,1,function(x) sqrt(sum((x-centers[1,])^2 > dist2 <- apply(x,1,function(x) sqrt(sum((x-centers[2,])^2 Machine Learning: Jordan Boyd-Graber UMD Clustering 16 / 1

17 K-Means: Example Choose mean with smaller distance: x 1 x 2 ( 1, 1) (0,0) > dists <- cbind(dist1,dist2) > cluster.ind <- apply(dists,1,which.min) Machine Learning: Jordan Boyd-Graber UMD Clustering 17 / 1

18 K-Means: Example New clusters: X Machine Learning: Jordan Boyd-Graber UMD Clustering 18 / 1 X1

19 K-Means: Example Refit means for each cluster: cluster 1: ( 1.0, 2.2), ( 2.4, 2.2), ( 1.0, 1.9) new mean: ( 1.5, 2.1) cluster 2: (0.4, 1.0), ( 0.5,0.6), ( 0.1, 1.7), (1.2, 3.3), (3.1, 1.6), (1.3, 1.6), (2.0, 0.8) new mean: (1.0,1.2) X X1 Machine Learning: Jordan Boyd-Graber UMD Clustering 19 / 1

20 K-Means: Example Recalculate distances for each cluster: x 1 x 2 ( 1.5, 2.1) (1.0,1.2) Machine Learning: Jordan Boyd-Graber UMD Clustering 20 / 1

21 K-Means: Example Choose mean with smaller distance: x 1 x 2 ( 1.5, 2.1) (1.0,1.2) Machine Learning: Jordan Boyd-Graber UMD Clustering 21 / 1

22 K-Means: Example New clusters: X Machine Learning: Jordan Boyd-Graber UMD Clustering 22 / 1 X1

23 K-Means: Example Refit means for each cluster: cluster 1: (0.4, 1.0), ( 1.0, 2.2), ( 2.4, 2.2), ( 1.0, 1.9) new mean: ( 1.0, 1.8) cluster 2: ( 0.5,0.6), ( 0.1,1.7), (1.2,3.3), (3.1,1.6), (1.3,1.6), (2.0, 0.8) new mean: (1.2,1.6) X X1 Machine Learning: Jordan Boyd-Graber UMD Clustering 23 / 1

24 K-Means: Example Recalculate distances for each cluster: x 1 x 2 ( 1.0, 1.8) (1.2,1.6) Machine Learning: Jordan Boyd-Graber UMD Clustering 24 / 1

25 K-Means: Example Select smallest distance and compare these clusters with previous: Table: New Clusters x 1 x 2 ( 1.0, 1.8) (1.2,1.6) Table: Old Clusters ( 1.5, 2.1) (1.0, 1.2) Machine Learning: Jordan Boyd-Graber UMD Clustering 25 / 1

26 K-Means in Practice R has a function for K-means in the stats package; this is probably already loaded let s use this for the Old Faithful data > library(datasets) > faith.2 <- kmeans(faithful,2) > names(faith.2) > plot(faithful[,1],faithful[,2],col=faith.2$clus + pch=faith.2$cluster,lwd=3) Machine Learning: Jordan Boyd-Graber UMD Clustering 26 / 1

27 wait eruptions Machine Learning: Jordan Boyd-Graber UMD Clustering 27 / 1

28 K-Means in R K-means can be used for image segmentation partition image into multiple segments find boundaries of objects make art Machine Learning: Jordan Boyd-Graber UMD Clustering 28 / 1

29 K-Means Clustering What is our data look like this? x1 x2 Machine Learning: Jordan Boyd-Graber UMD Clustering 29 / 1

30 K-Means Clustering True clustering: x1 x2 Machine Learning: Jordan Boyd-Graber UMD Clustering 30 / 1

31 K-Means Clustering K-means clustering (K = 2): x1 x2 Machine Learning: Jordan Boyd-Graber UMD Clustering 31 / 1

32 Mixture Models K-means associates data with cluster centers. What if we actually modeled the data? real-valued data observation xi in cluster c i have K clusters model each cluster with a Gaussian distribution x i c i = k N(µ k,σ k ) µk is mean vector, Σ k is covariance matrix Machine Learning: Jordan Boyd-Graber UMD Clustering 32 / 1

33 Mixture Models Gaussian mixture model (K = 2): x1 x2 Machine Learning: Jordan Boyd-Graber UMD Clustering 33 / 1

34 Mixture Models Why mixture models? more flexible: can account for clusters with different shapes have data model (will be useful for choosing K ) less sensitive to data scaling Machine Learning: Jordan Boyd-Graber UMD Clustering 34 / 1

35 Multivariate Gaussian Multivariate Gaussian distribution for x R d : p(x µ,σ) = (2π) d 2 Σ 1 2 e 1 2 (x µ)t Σ 1 (x µ) µ is vector of means Σ is covariance matrix Credit: Wikipedia Machine Learning: Jordan Boyd-Graber UMD Clustering 35 / 1

36 Multivariate Gaussian pdf when µ = [0,0] and Σ = : p(x) X2 1 2 Machine Learning: Jordan Boyd-Graber UMD Clustering 36 / X

37 Multivariate Gaussian x1 x x1 x2 Machine Learning: Jordan Boyd-Graber UMD Clustering 37 / 1

38 Fitting a Mixture Model Mixture model: observation xi in cluster c i with K clusters model each cluster with a Gaussian distribution x i c i = k N(µ k,σ k ) How do we find c 1,...,c n (clusters) and (µ 1,Σ 1 ),...,(µ K,Σ K ) (cluster centers)? Machine Learning: Jordan Boyd-Graber UMD Clustering 38 / 1

39 Fitting a Mixture Model First, let s simplify the model: covariance matrices have only diagonal elements, σ Σ = 0 σ σ 2 K set σ 2 = 1 = σ2, suppose known K Machine Learning: Jordan Boyd-Graber UMD Clustering 39 / 1

40 Fitting a Mixture Model Next, use a method similar to K-means: start with random cluster centers associate observations to clusters by (log-)likelihood, l(x i c i = k) = d 2 log(2π) 1 d 2 log σ 2 1 d (x k,j i,j µ k,j ) 2 /σ 2 k,j 2 d log(σ k ) 1 2σ 2 k d (x i,j µ k,j ) 2 j=1 j=1 d (x i,j µ k,j ) 2 j=1 refit centers µ1,...,µ K given clusters by µ k,j = 1 n k recluster observations... stop when no change in clusters Machine Learning: Jordan Boyd-Graber UMD Clustering 40 / 1 c i =k x i,j j=1

41 Fitting a Mixture Model clustering with K-means minimize distance d(x i,µ k ) = d (x i,j µ k,j ) 2 j=1 clustering with GMM maximize likelihood d l(x i c i = k) (x i,j µ k,j ) 2 j=1 update means with K-means use average µ k,j = 1 n k c i =k x i,j update means with GMM use average µ k,j = 1 n k c i =k x i,j Machine Learning: Jordan Boyd-Graber UMD Clustering 41 / 1

42 Fitting a Mixture Model OK, now what if Σ = σ σ σ 2 K and σ 2 1,...,σ2 K can take different values? use same algorithm update µk and σ 2 with maximum likelihood estimator, k µ k,j = 1 n k c i =k x i,j σ 2 = 1 (x k,j i,j µ k,j ) 2 n k c i =k Machine Learning: Jordan Boyd-Graber UMD Clustering 42 / 1

43 Fitting a Mixture Model Data: x 1 x X X1 Machine Learning: Jordan Boyd-Graber UMD Clustering 43 / 1

44 Fitting a Mixture Model pick centers and variances, µ1 = [ 1, 1], σ 2 = 1 [1,1], µ 1 = [1,1], σ 2 = [1,1] 1 compute (proportional) log likelihoods, d l(x i c i = k) = log(σ j ) 1 2 j=1 d (x i,j µ k,j ) 2 /σ 2 k,j j=1 x 1 x 2 k = 1 k = Machine Learning: Jordan Boyd-Graber UMD Clustering 44 / 1

45 Fitting a Mixture Model fit new means and variances: µ 1 = [ 1.3, 1.2] σ 2 = [3.1,0.4] 1 µ 2 = [0.9,1.8] σ 2 = [0.2,5.4] 2 compute new distances... Machine Learning: Jordan Boyd-Graber UMD Clustering 45 / 1

46 Fitting a Mixture Model x 1 x 2 k = 1 k = No change, so clusters are final Machine Learning: Jordan Boyd-Graber UMD Clustering 46 / 1

47 Fitting a Mixture Model X X1 Machine Learning: Jordan Boyd-Graber UMD Clustering 47 / 1

48 Limitations of k-means / mixture models k-means is fast and simple, but... What if your data are discrete? What if each data point has more than one cluster? (digits vs. documents) What if you don t know the number of clusters? Machine Learning: Jordan Boyd-Graber UMD Clustering 48 / 1

Unsupervised Clustering

Unsupervised Clustering Digging into Data April 21, 2015 Slides adapted from Lauren Hannah Digging into Data Unsupervised Clustering April 21, 2015 1 / 29 Lecture for Today What is clustering? K-Means