Clustering: K-means clustering

Clustering Motivation: Identify clusters of data points in a multidimensional space, i.e. partition the data set {x_1, ..., x_N} into K clusters. Intuition: A cluster is a group of data points with small inter-point distances compared with the distances to points not in the cluster. Many approaches: K-means clustering (this lecture), hierarchical clustering, self-organizing maps, etc.

K-means clustering (1) Data points: N observations of a random D-dimensional Euclidean variable x, i.e. {x_1, ..., x_N}, where x_n = (x_{n,1}, x_{n,2}, ..., x_{n,D}). [Figure: the data set arranged as an N x D matrix whose n'th row is x_n and whose (n, d) entry is x_{n,d}.]
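
A minimal sketch of this representation (the array name X and the sizes below are assumptions chosen for illustration, not part of the slides): the data set can be held as an N x D NumPy array, so that X[n, d] plays the role of x_{n,d}.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    N, D = 100, 2                    # hypothetical sizes for illustration
    X = rng.normal(size=(N, D))      # row n is the data point x_n; X[n, d] is x_{n,d}
    print(X.shape)                   # (100, 2)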

K-means clustering (2) Cluster assignment: Each data point x_n is assigned to precisely one of K clusters, where K is given. The clustering is given by {r_{n,k}}, where r_{n,k} = 1 if x_n is assigned to cluster k and 0 otherwise. Center points: Each cluster k is assigned a center point μ_k, i.e. the center points are {μ_1, ..., μ_K}. [Figure: the assignments arranged as an N x K matrix with entries r_{n,k}, and the center points as a K x D matrix with entries μ_{k,d}.]

K-means clustering (3) Quality of clustering: The quality of a clustering {r_{n,k}} of data points {x_1, ..., x_N} with center points {μ_1, ..., μ_K} is J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_{n,k} ||x_n - μ_k||^2, i.e. the sum of the squared distances of each data point to the center point of its assigned cluster. Objective: Find {r_{n,k}} and {μ_1, ..., μ_K} such that J is minimized.
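
As a small illustration of the objective (a sketch assuming the data, assignments and centers are stored as NumPy arrays X, R and mu as above; the function name objective_J is invented for this example):

    import numpy as np

    def objective_J(X, R, mu):
        """J = sum_n sum_k r_{n,k} ||x_n - mu_k||^2.

        X  : (N, D) data points
        R  : (N, K) one-hot assignments, R[n, k] = 1 iff x_n is assigned to cluster k
        mu : (K, D) center points
        """
        # squared distance from every point to every center, shape (N, K)
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        return float((R * sq_dist).sum())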

Algorithm
1) Init: Select initial center points {μ_1, ..., μ_K}.
2) Update clustering: Minimize J w.r.t. the clustering {r_{n,k}} while keeping the center points {μ_1, ..., μ_K} fixed.
3) Update center points: Minimize J w.r.t. the center points {μ_1, ..., μ_K} while keeping the clustering {r_{n,k}} fixed.
Repeat 2) and 3) until convergence. The algorithm has similarities with the EM algorithm: here 2) plays the role of the E-step and 3) the role of the M-step.

Algorithm: update clustering 2) Update clustering: Minimize J w.r.t. the clustering {r_{n,k}} while keeping the center points {μ_1, ..., μ_K} fixed. Observe that J is a linear function of r_{n,k}. We can minimize for each n independently by setting r_{n,k} = 1 for the choice of k that minimizes the distance ||x_n - μ_k||^2, i.e. we assign data point x_n to the cluster k whose center point μ_k is nearest to x_n. Takes time O(NKD).
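
A sketch of this step in NumPy (same assumed array conventions as above; update_clustering is a name chosen for this example):

    import numpy as np

    def update_clustering(X, mu):
        """Step 2: assign each x_n to the cluster whose center mu_k is nearest.

        Computes all N*K squared distances, so the cost is O(NKD).
        """
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
        nearest = sq_dist.argmin(axis=1)                                # index of the closest center
        R = np.zeros_like(sq_dist)
        R[np.arange(X.shape[0]), nearest] = 1.0                         # one-hot r_{n,k}
        return R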

Algorithm: update center points 3) Update center points: Minimize J w.r.t. the center points {μ_1, ..., μ_K} while keeping the clustering {r_{n,k}} fixed. Observe that J is a quadratic function of μ_k, so we can minimize for each k independently. This yields μ_{k,d} = (Σ_{n=1}^{N} r_{n,k} x_{n,d}) / (Σ_{n=1}^{N} r_{n,k}), where the numerator is the sum of the d'th coordinate of the data points in cluster k and the denominator is the number of data points in cluster k, i.e. μ_{k,d} is the mean of the d'th coordinate of the data points in cluster k. Takes time O(NKD).
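
A sketch of the center update and of the overall loop that alternates steps 2) and 3) (same assumed conventions; it reuses the update_clustering sketch above, and the handling of empty clusters is a simplification not discussed on the slides):

    import numpy as np

    def update_centers(X, R):
        """Step 3: set mu_{k,d} to the mean of the d'th coordinate of the points in cluster k."""
        counts = R.sum(axis=0)                       # number of data points per cluster, shape (K,)
        sums = R.T @ X                               # per-cluster coordinate sums, shape (K, D)
        # Avoid division by zero; an empty cluster gets the zero vector in this simple sketch.
        return sums / np.maximum(counts, 1.0)[:, None]

    def k_means(X, mu_init, max_rounds=100):
        """Alternate the two update steps until the clustering stops changing."""
        mu = mu_init.copy()
        R = None
        for _ in range(max_rounds):
            R_new = update_clustering(X, mu)         # step 2
            if R is not None and np.array_equal(R_new, R):
                break                                # converged: assignments unchanged
            R = R_new
            mu = update_centers(X, R)                # step 3
        return R, mu

For example, k_means(X, X[:2].copy()) would run the algorithm on the array X from the earlier sketch with the first two data points as initial centers.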

Example (N=?, K=2, D=2) [Figure: a small two-dimensional data set clustered with K = 2; the panels show the initial center points and the clustering after rounds 1-4.]

Extensions Improve running time: The running time is O(NKD) per round. This might be limiting. Use data structures to e.g. speed up the determination of the closest center point (step 2). Other dissimilarity measures: Euclidean distance is not applicable to all types of data, so one might want to use another dissimilarity measure V(x, x') between data points. The algorithm remains the same, but the complexity of step 3 (minimizing the generalized objective J' w.r.t. the center points) might change depending on the dissimilarity measure. To avoid this problem one might require that each center point must be one of the data points; a sketch of such an update follows below.
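
As an illustration of the last idea (restricting the center points to be data points, a k-medoids-style update), here is a hedged sketch; the function names, the integer-label representation of the clustering, and the example L1 dissimilarity are assumptions made for this example rather than part of the slides:

    import numpy as np

    def update_centers_general(X, labels, K, V):
        """Center update when each center must be one of the data points.

        For every cluster, pick the member that minimizes the total dissimilarity
        V(x, x') to the other members of that cluster.
        Costs O(sum_k |cluster_k|^2) evaluations of V per round.
        """
        centers = []
        for k in range(K):
            members = X[labels == k]
            if len(members) == 0:
                centers.append(X[np.random.randint(len(X))])     # re-seed an empty cluster
                continue
            costs = [sum(V(a, b) for b in members) for a in members]
            centers.append(members[int(np.argmin(costs))])
        return np.vstack(centers)

    # Example dissimilarity: Manhattan (L1) distance between two points.
    def manhattan(a, b):
        return float(np.abs(a - b).sum())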

Remark about K-means clustering: How to select initial center points. Simple approach: Choose K random data points as the initial centers. Approach from the paper "k-means++: The Advantages of Careful Seeding":
1. Choose one center point uniformly at random from among the data points.
2. For each data point x, compute d(x), the Euclidean distance between x and the nearest center point that has already been chosen.
3. Choose one more data point at random as a new center point, using a weighted probability distribution where a data point x is chosen with probability proportional to d(x)^2.
4. Repeat steps 2 and 3 until K center points have been chosen.
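
A sketch of this seeding procedure (assumed to operate on an N x D array X as in the earlier sketches; kmeans_pp_init is a name invented for this example):

    import numpy as np

    def kmeans_pp_init(X, K, rng=None):
        """k-means++ seeding: spread the initial centers out using the d(x)^2 weighting."""
        rng = np.random.default_rng() if rng is None else rng
        N = X.shape[0]
        centers = [X[rng.integers(N)]]               # 1: first center uniformly at random
        while len(centers) < K:                      # 4: repeat until K centers are chosen
            # 2: squared distance from each point to its nearest already-chosen center
            d2 = ((X[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            # 3: pick the next center with probability proportional to d(x)^2
            centers.append(X[rng.choice(N, p=d2 / d2.sum())])
        return np.asarray(centers)

These seeds can then be used as the initial center points, e.g. k_means(X, kmeans_pp_init(X, K=2)) with the sketches above.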