Clustering COMS 4771


1. Clustering

Unsupervised classification / clustering

Unsupervised classification
Input: $x_1, \dots, x_n \in \mathbb{R}^d$, target cardinality $k \in \mathbb{N}$.
Output: a function $f \colon \mathbb{R}^d \to \{1, \dots, k\} =: [k]$.
Typical semantics: hidden subpopulation structure.

Clustering
Input: $x_1, \dots, x_n \in \mathbb{R}^d$, target cardinality $k \in \mathbb{N}$.
Output: a partitioning of $x_1, \dots, x_n$ into $k$ groups.
Often done via unsupervised classification; "clustering" is often synonymous with unsupervised classification. Sometimes we also have a representative $c_j \in \mathbb{R}^d$ for each $j \in [k]$ (e.g., the average of the $x_i$ in the $j$th group): this is quantization.

Uses of clustering: feature representations

One-hot / dummy variable encoding of $f(x)$:
$$\phi(x) = (0, \dots, 0, 1, 0, \dots, 0)^{\mathsf{T}},$$
with the single 1 in position $f(x)$. (Often used together with other features.)
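As a minimal sketch of this encoding (not from the slides; a hypothetical numpy helper, assuming cluster assignments are given as integer indices):

```python
import numpy as np

def one_hot(assignments, k):
    """One-hot encode cluster assignments f(x_1), ..., f(x_n).

    assignments: length-n integer array with values in {0, ..., k-1}.
    Returns an (n, k) matrix whose i-th row has a single 1 in
    position assignments[i].
    """
    n = len(assignments)
    phi = np.zeros((n, k))
    phi[np.arange(n), assignments] = 1.0
    return phi

# Example: three points assigned to clusters 2, 0, 2 (with k = 3).
print(one_hot(np.array([2, 0, 2]), k=3))
```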

Uses of clustering: feature representations

Histogram representation
Cut up each $x_i \in \mathbb{R}^d$ into parts $x_{i,1}, \dots, x_{i,m} \in \mathbb{R}^p$ (e.g., small patches of an image).
Cluster all the parts $x_{i,j}$ to get $k$ representatives $c_1, \dots, c_k \in \mathbb{R}^p$.
Represent $x_i$ by a histogram over $\{1, \dots, k\}$ based on the assignments of $x_i$'s parts to the representatives.
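A minimal numpy sketch of this representation for a single example; the helper name `histogram_features` is ours, not the slides':

```python
import numpy as np

def histogram_features(parts, centers):
    """Histogram representation of a single example x_i.

    parts:   (m, p) array of the parts x_{i,1}, ..., x_{i,m}.
    centers: (k, p) array of the representatives c_1, ..., c_k.
    Returns a length-k (normalized) histogram of how many parts
    are assigned to each representative.
    """
    # Squared distance from every part to every representative.
    d2 = ((parts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)  # nearest representative per part
    return np.bincount(assignments, minlength=len(centers)) / len(parts)
```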

Uses of clustering: compression

Quantization
Replace each $x_i$ with its representative: $x_i \approx c_{f(x_i)}$.
Example: quantization at the image patch level.
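A small sketch of quantization on hypothetical data (both the data and the fitted representatives below are made up for illustration):

```python
import numpy as np

# Hypothetical data and fitted representatives (k = 2, d = 2).
X = np.array([[0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [4.9, 5.2]])
centers = np.array([[0.15, 0.05], [4.95, 5.15]])

# Assign each point to its nearest representative, then replace it.
d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
X_quantized = centers[d2.argmin(axis=1)]
print(X_quantized)  # every row is now one of the two representatives
```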

2. k-means

k-means optimization problem

Input: $x_1, \dots, x_n \in \mathbb{R}^d$, target cardinality $k \in \mathbb{N}$.
Output: $k$ means $c_1, \dots, c_k \in \mathbb{R}^d$.
Objective: choose $c_1, \dots, c_k \in \mathbb{R}^d$ to minimize
$$\sum_{i=1}^{n} \min_{j \in [k]} \|x_i - c_j\|_2^2.$$
Natural assignment function:
$$f(x) := \arg\min_{j \in [k]} \|x - c_j\|_2^2.$$
The problem is NP-hard, even if $k = 2$ or $d = 2$.
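The objective is easy to state in code; a minimal numpy sketch (the helper name `kmeans_cost` is ours):

```python
import numpy as np

def kmeans_cost(X, centers):
    """k-means objective: sum_i min_j ||x_i - c_j||_2^2."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()
```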

The easy cases

k-means clustering for $k = 1$
Problem: pick $c \in \mathbb{R}^d$ to minimize $\sum_{i=1}^{n} \|x_i - c\|_2^2$.
Solution: using the bias/variance decomposition, the best choice is $c = \frac{1}{n} \sum_{i=1}^{n} x_i$.

k-means clustering for $d = 1$
Solvable exactly by dynamic programming in time $O(n^2 k)$.

Lloyd's algorithm is a popular heuristic for general $k$ and $d$. (There are many other algorithms for k-means clustering.)
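To see why the mean is optimal for $k = 1$ (the bias/variance decomposition the slide appeals to), write $\bar{x} := \frac{1}{n} \sum_{i=1}^{n} x_i$; then
$$\sum_{i=1}^{n} \|x_i - c\|_2^2 = \sum_{i=1}^{n} \|x_i - \bar{x}\|_2^2 + n \, \|\bar{x} - c\|_2^2,$$
since the cross term $2 \sum_{i=1}^{n} \langle x_i - \bar{x}, \, \bar{x} - c \rangle$ vanishes by the definition of $\bar{x}$. The first term does not depend on $c$, so the cost is minimized exactly at $c = \bar{x}$.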

Lloyd's algorithm

Start with some initial means $c_1, \dots, c_k$. Repeat:
Partition $x_1, \dots, x_n$ into $k$ clusters $C_1, \dots, C_k$ based on distance to the current means: $x_i$ joins cluster $\arg\min_{j \in [k]} \|x_i - c_j\|_2^2$ for each $i \in [n]$, breaking ties according to any fixed rule.
Update the means: $c_j := \mathrm{mean}(C_j)$ for each $j \in [k]$.
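A minimal numpy sketch of these two alternating steps (the helper name `lloyd` and the empty-cluster handling are ours; convergence is declared when the means stop moving):

```python
import numpy as np

def lloyd(X, init_centers, n_iters=100):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest mean and recomputing each mean, until the means stop moving.
    Ties are broken toward the lowest cluster index (argmin's rule)."""
    centers = init_centers.copy()
    for _ in range(n_iters):
        # Assignment step: nearest current mean for every point.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Update step: each mean becomes the average of its cluster
        # (an empty cluster keeps its previous mean).
        new_centers = np.array([
            X[z == j].mean(axis=0) if np.any(z == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, z
```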

Sample run of Lloyd's algorithm

[Figure, panels (a)-(i): a sample run on 2-d data with two means. Panel (a): arbitrary initialization of $c_1$ and $c_2$. Iterations 1-4 then alternate between optimizing the assignments (panels b, d, f, h) and optimizing the means $c_j$ (panels c, e, g, i).]

Initializing Lloyd's algorithm

Basic idea: choose initial centers that give good coverage of the data points.

Farthest-first traversal
For $j = 1, \dots, k$: pick $c_j$ from among $x_1, \dots, x_n$ to be the point farthest from the previously chosen $c_1, \dots, c_{j-1}$. ($c_1$ is chosen arbitrarily.)
But this can be thrown off by outliers...

$D^2$ sampling (a.k.a. "k-means++")
For $j = 1, \dots, k$: randomly pick $c_j$ from among $x_1, \dots, x_n$ according to the distribution
$$P(x_i) \propto \min_{l = 1, \dots, j-1} \|x_i - c_l\|_2^2$$
(the uniform distribution when $j = 1$).
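A minimal numpy sketch of $D^2$ sampling (the helper name `dsquared_init` is ours):

```python
import numpy as np

def dsquared_init(X, k, rng=None):
    """D^2 sampling (k-means++): each new center is a data point drawn
    with probability proportional to its squared distance to the
    nearest center chosen so far; the first is uniform at random."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2) \
                 .sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)
```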

Choosing k

Usually by hold-out validation / cross-validation on an auxiliary task (e.g., a supervised learning task).
Heuristic: look for a large gap between the $(k-1)$-means cost and the $k$-means cost.
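A sketch of the gap heuristic on hypothetical data, reusing the `kmeans_cost`, `lloyd`, and `dsquared_init` sketches above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-d data: three well-separated groups of 50 points.
X = np.vstack([rng.normal(loc=m, scale=0.2, size=(50, 2))
               for m in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])

costs = {k: kmeans_cost(X, lloyd(X, dsquared_init(X, k, rng))[0])
         for k in range(1, 7)}
for k in range(2, 7):
    # Gap between (k-1)-means and k-means cost; on data like this it
    # typically drops sharply up to k = 3 and flattens afterwards.
    print(k, costs[k - 1] - costs[k])
```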

3. Hierarchical clustering

Clustering at multiple scales

$k = 2$ or $k = 3$?
Hierarchical clustering: encode clusterings for all values of $k$ in a tree.
Caveat: this is not always possible.

Example: phylogenetic tree

[Figure: a phylogenetic tree.]

Agglomerative clustering

Start with every point $x_i$ in its own cluster. Repeatedly merge the closest pair of clusters, until only one cluster remains.

Single-linkage: $\mathrm{dist}(C, C') := \min_{x \in C, \, x' \in C'} \|x - x'\|_2$.
Complete-linkage: $\mathrm{dist}(C, C') := \max_{x \in C, \, x' \in C'} \|x - x'\|_2$.
Many variants exist, e.g., Ward's average linkage:
$$\mathrm{dist}(C, C') := \frac{|C| \, |C'|}{|C| + |C'|} \, \|\mathrm{mean}(C) - \mathrm{mean}(C')\|_2^2.$$
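A minimal sketch using SciPy's hierarchical clustering routines (assuming SciPy is available; `linkage` builds the full merge tree, `fcluster` cuts it to a flat clustering):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))  # hypothetical 2-d data

# Build the full merge tree under a chosen linkage rule:
# 'single', 'complete', or 'ward' (SciPy's form of Ward's criterion).
Z = linkage(X, method='single')

# Cut the tree to recover a flat clustering for any k, e.g. k = 3.
labels = fcluster(Z, t=3, criterion='maxclust')
```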

Key takeaways

Uses of clustering: unsupervised classification ("hidden subpopulations"), quantization, ...
k-means clustering: a popular objective for clustering and quantization.
Lloyd's algorithm: alternating optimization; needs good initialization.
Hierarchical clustering: clustering at multiple levels of granularity.