Clustering Algorithms. Margareta Ackerman


A sea of algorithms. As we discussed last class, there are MANY clustering algorithms, and new ones are proposed all the time. They can be very different from each other.

Input/output. There are clustering algorithms for a wide variety of input and output types. Today, we will focus on the most popular one.
Input: The input is (X, d) and k, where
1. X is a set of elements (think of it as the labels of the points)
2. d: X × X → ℝ⁺ is a dissimilarity function
3. k is the number of desired clusters, 1 ≤ k ≤ |X|

Input/output. Input: The input is (X, d) and k, where
1. X is a set of elements (think of it as the labels of the points)
2. d: X × X → ℝ⁺ is a dissimilarity function
3. k is the number of desired clusters, 1 ≤ k ≤ |X|
Output: A partition of X into k sets {C1, C2, …, Ck} where
1) Ci ∩ Cj = ∅ for all i ≠ j
2) C1 ∪ C2 ∪ … ∪ Ck = X.
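As a concrete reading of this specification, here is a minimal Python sketch that checks whether a candidate output is a valid partition of X into k clusters. The function name is mine, and the non-emptiness check is an assumption (standard for partitions, though the slide does not state it):

```python
def is_valid_clustering(X, clusters, k):
    """Check that `clusters` is a partition of X into k pairwise-disjoint,
    non-empty sets whose union is X."""
    if len(clusters) != k:
        return False
    seen = set()
    for C in clusters:
        C = set(C)
        if not C or seen & C:  # empty cluster, or overlap with an earlier cluster
            return False
        seen |= C
    return seen == set(X)
```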

Linkage-Based Algorithms.
- Start by placing each point in its own cluster.
- Then, merge the two closest clusters.
- Continue merging the two closest clusters until exactly k clusters remain.

Linkage-Based Algorithms: More detail.
- Start by placing each point in its own cluster.
- Calculate and store the distance between each pair of clusters.
- While there are more than k clusters:
  - Let A, B be the two closest clusters.
  - Add cluster A ∪ B.
  - Remove clusters A and B.
  - Find the distance between A ∪ B and all other clusters.
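A direct, unoptimized Python sketch of this loop. For clarity it recomputes cluster distances on each pass instead of caching them as the slide suggests; `cluster_dist` is whichever between-cluster distance you choose (see the next slide):

```python
def linkage_cluster(X, d, k, cluster_dist):
    """Linkage-based clustering: start with singleton clusters and repeatedly
    merge the two closest clusters until exactly k remain."""
    clusters = [frozenset([x]) for x in X]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest between-cluster distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]], d),
        )
        # Replace clusters A and B with A ∪ B.
        merged = clusters[i] | clusters[j]
        clusters = [C for t, C in enumerate(clusters) if t not in (i, j)]
        clusters.append(merged)
    return clusters
```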

Examples of linkage-based algorithms. How do we define the distance between clusters? Common examples:
- Single-linkage: minimum between-cluster distance
- Average-linkage: average between-cluster distance
- Complete-linkage: maximum between-cluster distance
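These three definitions drop straight into the `cluster_dist` parameter of the sketch above:

```python
def single_linkage(A, B, d):
    """Minimum between-cluster distance."""
    return min(d(a, b) for a in A for b in B)

def average_linkage(A, B, d):
    """Average between-cluster distance."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

def complete_linkage(A, B, d):
    """Maximum between-cluster distance."""
    return max(d(a, b) for a in A for b in B)
```

For example, `linkage_cluster(X, d, k, single_linkage)` runs single-linkage down to k clusters.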

Hierarchical algorithms. Linkage-based algorithms are often applied in the hierarchical setting, where the algorithm outputs an entire tree of clusterings. Hierarchical linkage-based algorithms are similar to the partitional versions we saw here (more about the hierarchical setting later).

K-means. Perhaps the most popular clustering algorithm. Often applied to data in Euclidean space.

K-means Objective Function. Given a clustering {C1, C2, …, Ck}, the k-means objective function is

$$\sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|^2,$$

where $\mu_i$ is the mean of $C_i$. That is, $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$. The ideal goal is to find a clustering with the minimum k-means cost. But that can take too long (it's NP-hard). So instead, we apply a heuristic: an algorithm that, in practice, tends to find clusterings with low k-means cost.
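A minimal NumPy sketch of this objective, assuming each cluster is given as an (n_i, dim) array of points (the function name is mine):

```python
import numpy as np

def kmeans_cost(clusters):
    """k-means objective: total squared distance from each point to the mean
    of its cluster. `clusters` is a list of (n_i, dim) NumPy arrays."""
    cost = 0.0
    for C in clusters:
        mu = C.mean(axis=0)            # the cluster mean μ_i
        cost += ((C - mu) ** 2).sum()  # Σ_{x ∈ C_i} ||x − μ_i||²
    return cost
```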

Lloyd's method.
- Pick k points (call them "centers").
- Until convergence:
  - Assign each point to its closest center. This gives us k clusters.
  - Compute the mean of each cluster.
  - Let these means be the new centers.
The algorithm converges when the clusters don't change in two consecutive iterations.
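Here is a sketch of that loop in NumPy, assuming X is an (n, dim) array; the `max_iter` cap and the empty-cluster guard are my additions:

```python
import numpy as np

def lloyd(X, init_centers, max_iter=100):
    """Lloyd's method: alternate between assigning points to their closest
    center and moving each center to the mean of its cluster."""
    centers = np.array(init_centers, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center (k clusters).
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: clusters unchanged in two consecutive iterations
        labels = new_labels
        # Update step: each center becomes the mean of its cluster.
        for i in range(len(centers)):
            members = X[labels == i]
            if len(members) > 0:  # keep the old center if a cluster empties out
                centers[i] = members.mean(axis=0)
    return labels, centers
```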

Variations of Lloyd's method. How could we initialize the centers?
Furthest centroids:
- Pick one random center c1.
- Set c2 to be the furthest point from c1.
- Set ci to be the point with the largest minimum distance to any center already chosen.
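A minimal NumPy sketch of this furthest-centroids (farthest-first) initialization, again assuming X is an (n, dim) array (the function name and `seed` parameter are mine):

```python
import numpy as np

def furthest_centroids(X, k, seed=None):
    """Pick one random center, then repeatedly add the point whose minimum
    distance to the already-chosen centers is largest."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        dmin = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(dmin.argmax())])
    return np.array(centers)
```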

Variations of Lloyd's method. How could we initialize the centers? Random: Pick k random initial centers. Using this approach, we might end up in a local optimum. So, we run the algorithm many times (~100) to completion and pick the minimum-cost clustering.
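A sketch of that restart loop, reusing the `lloyd` sketch above. The function name is mine, and choosing k distinct data points as the random initial centers is one common reading of "k random initial centers":

```python
import numpy as np

def best_of_random_restarts(X, k, runs=100, seed=None):
    """Run Lloyd's method from many random initializations and keep the
    lowest-cost result."""
    rng = np.random.default_rng(seed)
    best_cost, best = float("inf"), None
    for _ in range(runs):
        init = X[rng.choice(len(X), size=k, replace=False)]
        labels, centers = lloyd(X, init)           # Lloyd's-method sketch above
        cost = ((X - centers[labels]) ** 2).sum()  # k-means cost of this run
        if cost < best_cost:
            best_cost, best = cost, (labels, centers)
    return best
```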

Lloyd's method with random centers. Picking random centers works VERY WELL in practice. In particular, it works much better than furthest centroids. It works so well that k-means is synonymous with this approach. Does Lloyd's method with random centers always find the optimal k-means solution? No. We will see other ways to initialize Lloyd's method.

K-median. Like k-means, except that we do not square the distance to the center. Given a clustering {C1, C2, …, Ck}, the k-median objective function is

$$\sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|,$$

where $\mu_i$ is the mean of $C_i$, as before.

K-medoids. Like k-means, except that the centers must be part of the data set. Given a clustering {C1, C2, …, Ck}, the k-medoids objective function is

$$\sum_{i=1}^{k} \sum_{x \in C_i} \|x - c_i\|^2,$$

where $c_i \in C_i$ is the point that minimizes the above sum.

Min-sum. Given a clustering {C1, C2, …, Ck}, the min-sum objective function is

$$\sum_{i=1}^{k} \sum_{x, y \in C_i} d(x, y).$$
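To make the differences between these three objectives concrete, here is a minimal sketch of each cost. Function names are mine; the k-medoids version squares the dissimilarity to match the formula above, and `d` is the input dissimilarity function:

```python
import numpy as np

def kmedian_cost(clusters, centers):
    """k-median: like k-means, but distances to the center are not squared.
    `clusters` is a list of (n_i, dim) arrays; `centers` the matching centers."""
    return sum(np.linalg.norm(C - c, axis=1).sum()
               for C, c in zip(clusters, centers))

def kmedoids_cost(clusters, d):
    """k-medoids: the center must be a data point; for each cluster, take the
    medoid c ∈ C_i that minimizes the within-cluster cost."""
    return sum(min(sum(d(x, c) ** 2 for x in C) for c in C)
               for C in clusters)

def minsum_cost(clusters, d):
    """min-sum: total dissimilarity over all within-cluster pairs."""
    return sum(d(x, y) for C in clusters for x in C for y in C)
```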

Differences in Input/Output Behavior of Clustering Algorithms. [Figure: an example input clustered differently by single-linkage and by k-means.]

Differences in Input/Output Behavior of Clustering Algorithms. [Figure: an example input on which single-linkage, average-linkage, complete-linkage, and min-diameter produce one clustering, while k-means, k-median, and k-medoids produce another.]

The User's Dilemma. There are a wide variety of clustering algorithms, which can produce very different clusterings. How should a user decide which algorithm to use for a given application?

Clustering Algorithm Selection. Users rely on cost-related considerations: running times, space usage, software purchasing costs, etc. There is inadequate emphasis on input-output behavior.

Framework for Algorithm Selection (Ackerman, Ben-David, and Loker, NIPS 2010). A framework that lets a user utilize prior knowledge to select an algorithm: identify properties that distinguish between the input-output behaviors of clustering paradigms. The properties should be:
1) Intuitive and user-friendly
2) Useful for distinguishing between clustering algorithms

Framework for Algorithm Selection. The goal is to understand fundamental differences between clustering methods, and convey them formally, clearly, and as simply as possible.

Property-based classification for fixed k (Ackerman, Ben-David, and Loker, NIPS 2010). [Table comparing single linkage, average linkage, complete linkage, k-means, k-medoids, min-sum, ratio-cut, and normalized-cut against the properties: locality, outer consistency, inner consistency, consistency, refinement preserving, order invariance, richness, outer richness, representative independence, and scale invariance.]

Kleinberg's axioms for fixed k. [Same property table as on the previous slide; Kleinberg's axioms correspond to the consistency, richness, and scale-invariance columns.] Kleinberg's axioms are consistent when k is given.

Single-linkage satisfies everything. Single linkage satisfies ALL of these properties. So should we just use single linkage all the time? It's not a good clustering algorithm in practice.

What's Left To Be Done? Despite much work on clustering properties, some basic questions remain unanswered. Consider some of the most popular clustering methods: k-means, single-linkage, average-linkage, etc. How do these algorithms differ in their input-output behavior? What are the advantages of k-means over other methods? We were missing some key properties. More on that in our next class.