Stat 321: Transposable Data Clustering


Stat 321: Transposable Data Clustering
Art B. Owen, Stanford Statistics

Clustering
Given n objects with d attributes, place them (the objects) into groups.
A form of unsupervised learning; unsupervised because there is no response.
Has a long history, at least as old as taxonomy.
Raises all the vexing issues of an exploratory method: the curse of exploration.
We'll look at it as a precursor to bi-clustering of objects and attributes.

Clustering
Given n points in R^d:
Do they clump together into k clusters?
If so, how to find the clusters, and the boundaries, and the cluster identities?

Outcomes
In the best case, a clustering can reveal the presence of a new categorical variable, e.g. types of diabetes.
Other times there are no clusters, just a smear.
Or we find clusters but not their meanings.

Key idea
Items within a cluster are more similar (less distant) to each other than items from different clusters.

k-means

Algorithm
Pick k points z_1, ..., z_k in R^d, then repeat:
  For i = 1, ..., n put g(i) = arg min_{1 <= j <= k} ||x_i - z_j||^2
  For j = 1, ..., k put z_j = avg{ x_i : g(i) = j }

Issues
  Handle averaging over an empty set
  Pick a stopping rule (it must converge, or at worst cycle)
  The answer depends on the starting points
  Hard to pick k (at least 30 methods proposed by 1985)

Properties
  Usually rapid convergence
  An iteration can be done in O(nkd) time. No n^2 or d^2 or k^2. (The k log(k) to sort is negligible.)
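The two steps above fit in a few lines of R. This is only a minimal sketch under assumed names (x an n x d numeric matrix, k the number of clusters, iters a crude fixed-iteration stopping rule), not the course's own code.

# Minimal k-means sketch of the algorithm above; x, k, iters are assumptions.
kmeans_sketch <- function(x, k, iters = 20) {
  z <- x[sample(nrow(x), k), , drop = FALSE]        # pick k starting centers from the data
  for (it in seq_len(iters)) {
    # assignment step: g(i) = index of the nearest center
    d2 <- as.matrix(dist(rbind(z, x)))[-(1:k), 1:k]^2
    g  <- max.col(-d2)
    # update step: each center becomes the mean of its cluster;
    # an empty cluster keeps its old center
    for (j in 1:k) if (any(g == j)) z[j, ] <- colMeans(x[g == j, , drop = FALSE])
  }
  list(centers = z, cluster = g)
}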

k-means ctd

There's a criterion
Each step minimizes SS = sum_{i=1}^n ||x_i - z_{g(i)}||^2 over its free variables. Sadly, it is not convex.
Hartigan & Wong (1979) get solutions such that no single change of g(i) reduces SS.
Maybe penalizing sum_{i<i'} ||z_i - z_{i'}||_2 will work (Y. She, F. Bach).

x-means (Pelleg and Moore)
Efficient lookups get k into the tens of thousands.
Picks k via AIC or BIC along the way.
Defining the true k is problematic (galaxies in clusters in super-clusters).
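In R, the built-in kmeans() uses the Hartigan-Wong algorithm by default; restarting from several random starts mitigates the dependence on starting points. A small usage sketch, with x and centers = 4 as arbitrary assumptions:

# kmeans() with multiple restarts; x is the assumed n x d numeric matrix.
fit <- kmeans(x, centers = 4, nstart = 25)
fit$tot.withinss    # the SS criterion above, at the returned solution
table(fit$cluster)  # cluster sizes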

Variations

Change the distance
L_2 to L_1 to L_p.
Changes the mean to the median to an arg min.

Change z: k-medoids
Require z_k = x_{i(k)} for some i(k).
Avoids using a z with 2.3 kids, 30% pregnant, 10% male.

PAM: partitioning around medoids
Minimize sum_i min_j D(x_i, z_j).
Slow. Uses only the D_{ij} values.
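The cluster package's pam() is one standard implementation of partitioning around medoids; it accepts a dissimilarity object, so the representatives are always observed rows. A sketch under the same assumed data x and an arbitrary k = 4:

# PAM from a Manhattan (L1) dissimilarity; x and k are assumptions.
library(cluster)
D   <- dist(x, method = "manhattan")
fit <- pam(D, k = 4)
fit$medoids            # which observations serve as cluster centers
table(fit$clustering)  # cluster sizes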

What is a cluster?

Defining issues
Scale of variables matters.
The subset of variables matters too:
  Do whales go with penguins or with elephants?
  Why does my breakfast cereal cluster with pure sugar?
# density bumps need not equal # mixture components.

Scaling data
z_ij = (x_ij - m_j) / s_j, with m, s the
  mean & stdev (so z_j has mean 0, sd 1), or
  min & range (so 0 <= z_ij <= 1), or
  median & MAD (for robustness).
NB: the transformation is defined column-wise [vs row-wise or simultaneous].
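A small R sketch of the column-wise scalings above, assuming x is a numeric matrix: scale() gives the mean/sd version, and sweep() gives a median/MAD version for the robust choice.

# (x_ij - m_j) / s_j, column by column; x is an assumed numeric matrix.
z_meansd <- scale(x)                                    # (x_ij - mean_j) / sd_j
z_robust <- sweep(sweep(x, 2, apply(x, 2, median)), 2,
                  apply(x, 2, mad), FUN = "/")          # (x_ij - median_j) / MAD_j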

Weighting
Weight variables via w_j >= 0:
d_{ii'} = sum_{j=1}^d w_j |x_ij - x_i'j| = sum_{j=1}^d |x~_ij - x~_i'j|,   with x~_ij = x_ij w_j.
Weighting and scaling are equivalent (for L_p distances).
Automatic scaling is not always sensible. Variables in the same units should sometimes get the same scaling even if they have different variances.
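A quick numerical check of that equivalence in R: the weighted L1 distance between two rows equals the ordinary L1 distance after rescaling the columns by w_j. The weights and the use of the first three columns of the assumed x are arbitrary.

# weighted L1 versus L1 on rescaled columns; w and the columns are assumptions.
w <- c(1, 0.5, 2)
d_weighted <- sum(w * abs(x[1, 1:3] - x[2, 1:3]))
d_rescaled <- sum(abs(x[1, 1:3] * w - x[2, 1:3] * w))
all.equal(d_weighted, d_rescaled)   # TRUE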

Distances
n(n-1)/2 interpoint distances d_{ii'} = dist(x_i, x_i'). Usually:
  1. d_{ii'} >= 0
  2. d_{ii} = 0
  3. d_{ii'} = d_{i'i}
A metric distance has d_ij <= d_ik + d_jk (triangle inequality).
A Euclidean distance has d_ij = ||z_i - z_j|| for some points z_i = z(x_i).
An ultrametric distance has d_ij <= max(d_ik, d_jk). (More later.)

Other distances
Canberra distance: sum_{j=1}^d |x_ij - x_i'j| / (x_ij + x_i'j)   (with 0/0 = 0).
Angular or cosine distance (dogs ~ cats, wolves ~ tigers).
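These distances are available through R's dist(); each call returns the n(n-1)/2 lower-triangle values for the rows of the assumed matrix x.

# common interpoint distances on the rows of x (x is an assumption)
d_euc <- dist(x, method = "euclidean")
d_l1  <- dist(x, method = "manhattan")
d_can <- dist(x, method = "canberra")    # the Canberra distance on the slide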

Similarities
Opposite of a distance: small d corresponds to large S.
E.g. S_{ii'} = 1 - d_{ii'}, or 1/d_{ii'}, or d_{ii'} = S_{ii} - S_{ii'}.

Correlation type similarities
S_{ii'} = sum_{j=1}^d x_ij x_i'j / sqrt( sum_{j=1}^d x_ij^2  sum_{j=1}^d x_i'j^2 ),   or
S_{ii'} = sum_{j=1}^d (x_ij - xbar_j)(x_i'j - xbar_j) / sqrt( sum_{j=1}^d (x_ij - xbar_j)^2  sum_{j=1}^d (x_i'j - xbar_j)^2 ).
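One common way to use a correlation-type similarity for clustering rows in R is to turn it into a distance via d = 1 - S. Note that cor() correlates columns, so the rows of the assumed x are transposed first, and cor() centers each row by its own mean, a small departure from the column-mean version written above.

# correlation similarity between rows of x, converted to a distance
S     <- cor(t(x))
D_cor <- as.dist(1 - S)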

Similarities
Some equalities are more equal than others:
  1. i and i' are both Nobel laureates (unusually strong similarity)
  2. i and i' are both over 21 years old (mild similarity)
  3. i and i' are both not Nobel laureates (barely similar at all)
We can handle 1 vs 2 by weighting the variables. But 1 vs 3 is trickier (same variable).

Binary similarity measures
For p binary features, each pair (i, i') gives a 2x2 table of counts:
              i' = 1   i' = 0
    i = 1        a        b
    i = 0        c        d
We want to count a more than d.

Generic measure (alpha > 0, delta >= 0):
  S_{ii'} = (alpha a + delta d) / (alpha a + b + c + delta d)
(alpha, delta) and (alpha', delta') give the same ranking if alpha delta' = alpha' delta.
Janowitz recommends Jaccard or Russel-Rao, with d = 1 - S.

Specific measures
  Simple matching:   S_{ii'} = (a + d) / (a + b + c + d)
  Jaccard-Tanimoto:  S_{ii'} = a / (a + b + c), taken to be 1 when a + b + c = 0
  Russel-Rao:        S_{ii'} = a / (a + b + c + d)
  Sokal-Sneath:      S_{ii'} = 2(a + d) / (2(a + d) + b + c)
  Sokal-Sneath II:   S_{ii'} = a / (a + 2(b + c))
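A small R sketch of the Jaccard-Tanimoto measure for two 0/1 rows, with B an assumed binary data matrix; R's dist(..., method = "binary") returns 1 minus Jaccard for every pair of rows.

# a, b, c counted as in the table above; B is an assumed 0/1 matrix
jaccard <- function(u, v) {
  a  <- sum(u == 1 & v == 1)
  b  <- sum(u == 1 & v == 0)
  cc <- sum(u == 0 & v == 1)
  if (a + b + cc == 0) 1 else a / (a + b + cc)
}
jaccard(B[1, ], B[2, ])
D_bin <- dist(B, method = "binary")   # 1 - Jaccard for all pairs of rows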

Agglomerative clustering

General
Start with n clusters of one element each.
Repeat:
  1. Find the closest two clusters.
  2. Merge them into a new cluster.
Until only one cluster remains.
We need point-to-cluster and cluster-to-cluster distances.

Flavors of agglomerative clustering

Single linkage
d(C_1, C_2) = min_{i in C_1} min_{j in C_2} d_ij
Gets "chaining"; friends of friends.

Complete linkage
d(C_1, C_2) = max_{i in C_1} max_{j in C_2} d_ij
Gets dense, nearly spherical clusters.

Average linkage
d(C_1, C_2) = (1 / (|C_1| |C_2|)) sum_{i in C_1} sum_{j in C_2} d_ij
A compromise.
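All three linkages are available through hclust() in R, starting from any distance object (for example the D built in the PAM sketch above); cutree() turns a fitted tree into hard cluster labels. The choice k = 4 is an assumption.

# single, complete, and average linkage from one dissimilarity D
hc_single   <- hclust(D, method = "single")
hc_complete <- hclust(D, method = "complete")
hc_average  <- hclust(D, method = "average")
plot(hc_complete)              # dendrogram
cutree(hc_complete, k = 4)     # hard labels for an assumed k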

Example: bird data
> dim(voeg)
[1] 395  34
> voeg[1:5, 1:5]
  Heron Mallard Sparrowhawk Buzzard Kestrel
1     0       0           0       0       1
2     1       0           0       0       0
3     0       0           1       0       0
4     0       0           1       0       0
5     1       1           0       0       0
A 1 means that place i has bird j. Place names are not present.
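The transcript does not include the calls behind the figures on the next few slides; one plausible way to produce dendrograms and a clustered heatmap from this 0/1 matrix is sketched below, using a binary (1 - Jaccard) distance between places.

# assumed reconstruction of the kind of calls behind the following figures
D_places <- dist(voeg, method = "binary")                      # 1 - Jaccard between places
plot(hclust(D_places, method = "complete"), labels = FALSE)    # complete linkage dendrogram
heatmap(as.matrix(voeg), scale = "none",
        distfun = function(m) dist(m, method = "binary"))      # clustered heatmap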

Complete linkage dendrogram [figure]

Single linkage dendrogram [figure]

Average linkage dendrogram [figure]

Complete vs Average linkage dendrograms [figure]

Heatmap [figure]

Heatmap [figure]

Heatmap [figure]

Dendrogram details

Ordering
n points means n - 1 splits, hence 2^(n-1) possible left-right orderings.
R puts the tightest cluster on the left.
heatmap lets you control the ordering somewhat.

Ultrametric
H_ij = height at which the clusters containing i and j merge.
It's an ultrametric: for any i, j, k the top two of H_ij, H_ik, H_jk are equal.
Need H_ij <= max(H_ik, H_jk).
Non-ultrametric measures yield dendrograms with reversals.
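In R, cophenetic() extracts the merge heights H_ij from a fitted tree; continuing with hc_complete from the earlier sketch, any triple satisfies the ultrametric inequality.

# merge heights from an hclust fit; i, j, k form an arbitrary triple
H <- as.matrix(cophenetic(hc_complete))
i <- 1; j <- 2; k <- 3
H[i, j] <= max(H[i, k], H[j, k])   # TRUE, and likewise for every triple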

More hierarchical

Centroid merging
d(C_1, C_2) = ||Xbar_{C_1} - Xbar_{C_2}||
Requires the original points, not just distances.

Ward's
d(C_1, C_2) = sqrt( 2 |C_1| |C_2| / (|C_1| + |C_2|) ) ||Xbar_{C_1} - Xbar_{C_2}||
Via the within-cluster SS (after minus before).
The goal is to minimize sum_j sum_{i in C_j} ||X_i - Xbar_{C_j}||^2.
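In R, Ward's method is hclust with method = "ward.D2", which takes ordinary Euclidean distances as input and works with the squared criterion internally; x is the assumed data matrix as before.

# Ward's method on Euclidean distances
hc_ward <- hclust(dist(x), method = "ward.D2")
plot(hc_ward)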

Costs of hierarchical splits
It takes O(n^2 d) to get all pairwise distances.
We have to take O(n) merge steps.
A naive implementation would be O(n^3 d).

Lance-Williams family of methods
The distance of the merged cluster i ∪ j to cluster k is
  alpha_i d_ki + alpha_j d_kj + beta d_ij + gamma |d_ki - d_kj|
e.g. alpha_i = alpha_j = 1/2, beta = 0, gamma = -1/2 for single linkage. The alpha_i can depend on n_i.
The updates lead to O(n^2 d) cost. Really clever data structures (F. Murtagh) avoid even n^2 log(n).
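A one-line sketch of a single Lance-Williams update, to make the coefficients concrete: with alpha_i = alpha_j = 1/2, beta = 0, gamma = -1/2 it reduces to min(d_ki, d_kj), i.e. single linkage, while gamma = +1/2 would give complete linkage. The function name and defaults are illustrative only.

# distance from cluster k to the merge of clusters i and j
lw_update <- function(d_ki, d_kj, d_ij,
                      a_i = 0.5, a_j = 0.5, beta = 0, gamma = -0.5) {
  a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)
}
lw_update(3, 5, 4)   # 3 = min(3, 5)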

Divisive clustering

Recipe
Start with one cluster of n objects.
Repeat:
  1. Select one cluster.
  2. Split it into two.
Until there are n clusters of size 1.

Choices
Which cluster to split. E.g. the one of largest diameter: arg max_j max_{i, i' in C_j} ||X_i - X_i'||.
How to split it. E.g. remove the farthest point; each X_i then goes with the nearer of that far point and Xbar_{C_j \ far}.
Apparently there is no good speedup.
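The cluster package's diana() implements divisive clustering in this spirit: at each step it splits the cluster of largest diameter by peeling off a splinter group. A usage sketch with the same assumed x and an arbitrary k:

# divisive analysis clustering (DIANA)
library(cluster)
dv <- diana(dist(x))
plot(dv)                        # banner and dendrogram
cutree(as.hclust(dv), k = 4)    # hard labels for an assumed k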

Optimization based clustering
E.g. Ward's.

Recipe
Measure the quality of a cluster: a scale estimate, like the diameter, RMS width, or median width.
Combine into a quality of the whole clustering: typically the sum (the max is also plausible).
(Attempt to) optimize it.

Isolation measures
Split: min_{i in C, j not in C} d_ij
Cut:   sum_{i in C, j not in C} d_ij
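A small sketch of the two isolation measures in R, computing the split and cut of a cluster C (a set of row indices) from a full distance matrix; Dm, C, and the example indices are assumptions.

# split and cut of cluster C from a distance matrix Dm
isolation <- function(Dm, C) {
  out <- setdiff(seq_len(nrow(Dm)), C)
  list(split = min(Dm[C, out]), cut = sum(Dm[C, out]))
}
isolation(as.matrix(dist(x)), C = 1:10)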