Clustering Algorithms
Margareta Ackerman
A sea of algorithms
As we discussed last class, there are MANY clustering algorithms, and new ones are proposed all the time. They are very different from each other.
Input/output
There are clustering algorithms for a wide variety of input and output types. Today, we will focus on the most popular one.
Input: The input is (X, d) and k, where
1. X is a set of elements (think of it as the labels of the points)
2. $d: X \times X \to \mathbb{R}^+$ is a dissimilarity function
3. k is the number of desired clusters, $1 \le k \le |X|$
Output: A partition of X into k sets $\{C_1, C_2, \ldots, C_k\}$ where
1) $C_i \cap C_j = \emptyset$ for all $i \neq j$
2) $C_1 \cup C_2 \cup \cdots \cup C_k = X$.
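As a concrete illustration (not from the slides), here is a minimal Python sketch of this input/output contract; the names `is_valid_clustering`, `X`, and `d` are ours:

```python
from itertools import combinations

def is_valid_clustering(X, clusters, k):
    """Check that `clusters` is a partition of X into exactly k sets:
    pairwise disjoint, and together covering all of X."""
    if len(clusters) != k:
        return False
    # Condition 1: C_i and C_j are disjoint for all i != j.
    if any(a & b for a, b in combinations(clusters, 2)):
        return False
    # Condition 2: the union of C_1, ..., C_k is X.
    return set().union(*clusters) == set(X)

X = {"a", "b", "c", "d"}
d = lambda x, y: 0 if x == y else 1  # a toy dissimilarity function
print(is_valid_clustering(X, [{"a", "b"}, {"c", "d"}], k=2))  # True
```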
Linkage-Based Algorithms
- Start by placing each point in its own cluster
- Then, merge the two closest clusters
- Continue to merge the two closest clusters until exactly k clusters remain
Linkage-Based Algorithms: More detail
- Start by placing each point in its own cluster
- Calculate and store the distance between each pair of clusters
- While there are more than k clusters:
  - Let A, B be the two closest clusters
  - Add cluster A ∪ B
  - Remove clusters A and B
  - Find the distance between A ∪ B and all other clusters
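A minimal Python sketch of this merge loop, under our own naming (`linkage_cluster` is not from the slides); the linkage rule is passed in as `cluster_dist`, with example rules sketched after the next slide:

```python
def linkage_cluster(X, d, k, cluster_dist):
    """Generic linkage-based clustering: repeatedly merge the two
    closest clusters until exactly k clusters remain."""
    clusters = [frozenset([x]) for x in X]  # each point starts in its own cluster
    while len(clusters) > k:
        # Find the indices of the two closest clusters. (For simplicity we
        # recompute distances every round; a real implementation would
        # cache them and only update distances involving the merged cluster.)
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]], d),
        )
        merged = clusters[i] | clusters[j]  # add cluster A ∪ B
        clusters = [C for idx, C in enumerate(clusters) if idx not in (i, j)]  # remove A and B
        clusters.append(merged)
    return clusters
```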
Examples of linkage-based algorithms
How do we define the distance between clusters? Common examples:
- Single-linkage: min between-cluster distance
- Average-linkage: average between-cluster distance
- Complete-linkage: max between-cluster distance
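Hedged sketches of these three rules, written to plug into the `linkage_cluster` sketch above (the signatures are ours):

```python
def single_linkage(A, B, d):
    """Min between-cluster distance."""
    return min(d(a, b) for a in A for b in B)

def average_linkage(A, B, d):
    """Average between-cluster distance."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

def complete_linkage(A, B, d):
    """Max between-cluster distance."""
    return max(d(a, b) for a in A for b in B)
```

For example, `linkage_cluster(X, d, k=2, cluster_dist=single_linkage)` runs the partitional single-linkage algorithm from the previous slide.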
Hierarchical algorithms
Linkage-based algorithms are often applied in the hierarchical setting, where the algorithm outputs an entire tree of clusterings. Hierarchical linkage-based algorithms are similar to the partitional versions we saw here (more about the hierarchical setting later).
K-means
Perhaps the most popular clustering algorithm. Often applied to data in Euclidean space.
K-means Objective Function
Given a clustering $\{C_1, C_2, \ldots, C_k\}$, the k-means objective function is
$$\sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|^2$$
where $\mu_i$ is the mean of $C_i$. That is,
$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$
The ideal goal is to find a clustering with the minimum k-means cost. But that can take too long (it's NP-hard). So instead, we apply a heuristic: an algorithm that, in practice, tends to find clusterings with low k-means cost.
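A direct translation of this objective into Python (a sketch; we represent points as tuples of floats, and the helper names are ours):

```python
def mean(C):
    """Coordinate-wise mean of a nonempty collection of points (tuples)."""
    return tuple(sum(coords) / len(C) for coords in zip(*C))

def kmeans_cost(clusters):
    """Sum, over all clusters, of squared Euclidean distances
    from each point to its cluster's mean."""
    cost = 0.0
    for C in clusters:
        mu = mean(C)
        cost += sum(sum((xj - mj) ** 2 for xj, mj in zip(x, mu)) for x in C)
    return cost
```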
Lloyd's method
- Pick k points (call them "centers")
- Until convergence:
  - Assign each point to its closest center. This gives us k clusters.
  - Compute the mean of each cluster. Let these means be the new centers.
The algorithm converges when the clusters don't change in two consecutive iterations.
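A minimal, self-contained sketch of Lloyd's method under these definitions (the initialization is passed in; all names are ours, and keeping the old center when a cluster empties is our assumption for a corner case the slide doesn't address):

```python
def lloyd(points, centers):
    """Lloyd's method: alternate assignment and update steps
    until the clusters stop changing."""
    def mean(C):
        return tuple(sum(coords) / len(C) for coords in zip(*C))

    def dist2(x, c):  # squared Euclidean distance
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    prev = None
    while True:
        # Assignment step: each point joins the cluster of its closest center.
        clusters = [[] for _ in centers]
        for x in points:
            clusters[min(range(len(centers)), key=lambda j: dist2(x, centers[j]))].append(x)
        # Convergence: the clusters don't change in two consecutive iterations.
        if clusters == prev:
            return clusters
        prev = clusters
        # Update step: the mean of each (nonempty) cluster becomes its new center.
        centers = [mean(C) if C else centers[j] for j, C in enumerate(clusters)]
```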
Variations of Lloyd's method
How could we initialize the centers?
Furthest centroids:
- Pick one random center $c_1$.
- Set $c_2$ to the furthest point from $c_1$.
- Set each subsequent $c_i$ to the point with the largest minimum distance to the centers already chosen.
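A sketch of this farthest-first initialization (names are ours; `points` is a list of tuples):

```python
import random

def furthest_centroids(points, k):
    """Pick one random center, then repeatedly add the point whose
    minimum distance to the already-chosen centers is largest."""
    def dist2(x, c):  # squared Euclidean distance
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    centers = [random.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda x: min(dist2(x, c) for c in centers)))
    return centers
```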
Variations of Lloyd's method
How could we initialize the centers?
Random: Pick k random initial centers.
Using this approach, we might end up in a local optimum. So, we run the algorithm many times (~100) to completion and pick the minimum-cost clustering.
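A sketch of this restart loop, reusing the `lloyd` and `kmeans_cost` sketches from above (the function name is ours; the default of 100 runs mirrors the slide's ~100):

```python
import random

def kmeans_with_restarts(points, k, runs=100):
    """Run Lloyd's method from many random initializations and
    keep the clustering with the lowest k-means cost."""
    best, best_cost = None, float("inf")
    for _ in range(runs):
        centers = random.sample(points, k)  # k distinct random points as initial centers
        clustering = lloyd(points, centers)
        cost = kmeans_cost(clustering)
        if cost < best_cost:
            best, best_cost = clustering, cost
    return best
```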
Lloyd's method with random centers
Picking random centers works VERY WELL in practice. In particular, it works much better than furthest centroids. It works so well that k-means is synonymous with this approach.
Does Lloyd's method with random centers always find the optimal k-means solution? No.
We will see other ways to initialize Lloyd's method.
K-median
Like k-means, except that we do not square the distance to the center. Given a clustering $\{C_1, C_2, \ldots, C_k\}$, the k-median objective function is
$$\sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|$$
where $\mu_i$ is the mean of $C_i$. That is,
$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$
K-medoids
Like k-means, except that the centers must be part of the data set. Given a clustering $\{C_1, C_2, \ldots, C_k\}$, the k-medoids objective function is
$$\sum_{i=1}^{k} \sum_{x \in C_i} \|x - c_i\|^2$$
where $c_i \in C_i$ is the point that minimizes the above sum.
Min-sum
Given a clustering $\{C_1, C_2, \ldots, C_k\}$, the min-sum objective function is
$$\sum_{i=1}^{k} \sum_{x, y \in C_i} d(x, y)$$
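For contrast, hedged sketches of these three objectives (helper names are ours; points are tuples of floats, and for min-sum we read the double sum as ranging over ordered pairs, with d(x, x) = 0):

```python
import math

def _mean(C):
    """Coordinate-wise mean of a nonempty collection of points (tuples)."""
    return tuple(sum(coords) / len(C) for coords in zip(*C))

def kmedian_cost(clusters):
    """k-median: unsquared Euclidean distances to each cluster's mean."""
    return sum(math.dist(x, _mean(C)) for C in clusters for x in C)

def kmedoids_cost(clusters):
    """k-medoids: the center c_i must be a point of C_i; pick the one
    that minimizes the within-cluster sum of squared distances."""
    return sum(min(sum(math.dist(x, c) ** 2 for x in C) for c in C) for C in clusters)

def minsum_cost(clusters, d):
    """Min-sum: sum of all pairwise within-cluster dissimilarities."""
    return sum(d(x, y) for C in clusters for x in C for y in C)
```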
Differences in Input/Output Behavior of Clustering Algorithms
[Figure: the same data set clustered by single-linkage vs. k-means]
Differences in Input/Output Behavior of Clustering Algorithms
[Figure: single-linkage, average-linkage, complete-linkage, and min-diameter vs. k-means, k-median, and k-medoids]
The User's Dilemma
There is a wide variety of clustering algorithms, which can produce very different clusterings. How should a user decide which algorithm to use for a given application?
Clustering Algorithm Selection
Users rely on cost-related considerations: running times, space usage, software purchasing costs, etc.
There is inadequate emphasis on input-output behaviour.
Framework for Algorithm Selection (Ackerman, Ben-David, and Loker, NIPS 2010)
- A framework that lets a user utilize prior knowledge to select an algorithm
- Identify properties that distinguish between different input-output behaviour of clustering paradigms
The properties should be:
1) Intuitive and user-friendly
2) Useful for distinguishing clustering algorithms
Framework for Algorithm Selection
The goal is to understand fundamental differences between clustering methods, and convey them formally, clearly, and as simply as possible.
Property-based classification for fixed k (Ackerman, Ben-David, and Loker, NIPS 2010)
[Table: single linkage, average linkage, complete linkage, k-means, k-medoids, min-sum, ratio-cut, and normalized-cut, classified by the properties Local, Outer Consistency, Inner Consistency, Consistency, Refinement Preserving, Order Invariance, Richness, Outer Richness, Representation Independence, and Scale Invariance]
Kleinberg's axioms for fixed k
[Table: the same property-based classification, with Kleinberg's axioms highlighted]
Kleinberg's axioms are consistent when k is given.
Single-linkage satisfies everything
Single linkage satisfies ALL of these properties. So should we just use single linkage all the time? It's not a good clustering algorithm in practice.
What's Left To Be Done?
Despite much work on clustering properties, some basic questions remained unanswered. Consider some of the most popular clustering methods: k-means, single-linkage, average-linkage, etc. How do these algorithms differ in their input-output behavior? What are the advantages of k-means over other methods? We were missing some key properties. More on that in our next class.