Clustering: k-means clustering


Clustering: k-means clustering. Genome 373: Genomic Informatics. Elhanan Borenstein.

A quick review. The clustering problem: partition genes into distinct sets with high homogeneity and high separation. Clustering (unsupervised) vs. classification. Clustering methods: agglomerative vs. divisive; hierarchical vs. non-hierarchical. Hierarchical clustering algorithm: 1. Assign each object to a separate cluster. 2. Find the pair of clusters with the shortest distance and regroup them into a single cluster. 3. Repeat step 2 until a single cluster remains. Many possible distance metrics; the choice of metric matters.
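For concreteness, the three-step hierarchical procedure above can be sketched in plain Python. This is a minimal illustration, not the course's code; since the slides stress that many distance metrics are possible, the single-linkage Euclidean distance used here is my own choice:

```python
import math

def agglomerative(points):
    """Single-linkage agglomerative clustering on 2-D points.
    Returns the sequence of merges, in order."""
    clusters = [[p] for p in points]   # 1. each object in its own cluster
    merges = []
    while len(clusters) > 1:
        # 2. find the pair of clusters at the shortest distance
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: min(math.dist(p, q)
                               for p in clusters[ab[0]] for q in clusters[ab[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]   # regroup into a single cluster
        del clusters[j]
    return merges                      # 3. repeated until one cluster remains
```

With three points where two are close together, the close pair is merged first, then the remaining two clusters are joined.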

K-means clustering: divisive, non-hierarchical.

K-means clustering. An algorithm for partitioning n observations/points into k clusters such that each observation belongs to the cluster with the nearest mean/center. Isn't this a somewhat circular definition? Assignment of a point to a cluster is based on the proximity of the point to the cluster mean, but the cluster mean is calculated based on all the points assigned to the cluster.

K-means clustering: Chicken and egg. An algorithm for partitioning n observations/points into k clusters such that each observation belongs to the cluster with the nearest mean/center. The chicken-and-egg problem: I do not know the means before I determine the partitioning into clusters; I do not know the partitioning into clusters before I determine the means. Key principle, cluster around mobile centers: start with some random locations of means/centers, partition into clusters according to these centers, and then correct the centers according to the clusters (somewhat similar to the expectation-maximization algorithm).

K-means clustering algorithm. The number of centers, k, has to be specified a priori. Algorithm: 1. Arbitrarily select k initial centers. 2. Assign each element to the closest center. 3. Re-calculate the centers (mean position of the assigned elements). 4. Repeat steps 2 and 3 until one of the following termination conditions is reached: i. The clusters are the same as in the previous iteration. ii. The difference between two iterations is smaller than a specified threshold. iii. The maximum number of iterations has been reached. How can we do this efficiently?
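The four steps and three termination conditions can be sketched in a few lines of Python. This is a minimal illustration under my own choices (function name, random seeding, tolerance), not the course's code:

```python
import math
import random

def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Lloyd-style k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # 1. arbitrary initial centers
    for _ in range(max_iter):                    # iii. cap on iterations
        # 2. assign each element to the closest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # 3. re-calculate centers as the mean of the assigned elements
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        # i./ii. stop when the centers move less than the threshold
        if all(math.dist(a, b) <= tol for a, b in zip(centers, new_centers)):
            break
        centers = new_centers
    return centers, clusters
```

On two well-separated groups of points, this converges to one center per group in a handful of iterations.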

Partitioning the space. Assigning elements to the closest center: with two centers A and B, the plane splits into the region closer to A than to B and the region closer to B than to A. Adding a third center C splits the space further, so that every element falls into the region closest to A, closest to B, or closest to C.

Voronoi diagram. A decomposition of a metric space determined by the distances to a specified discrete set of centers in the space. Each colored cell represents the collection of all points in the space that are closer to one specific center than to any other center. Several algorithms exist to find the Voronoi diagram.
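Deciding which Voronoi cell a point belongs to is just a nearest-center lookup. A small sketch with three hypothetical centers, labeled A, B, C to echo the figure (the coordinates are made up):

```python
import math

# Hypothetical centers; the labels A, B, C echo the slide's figure.
centers = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (2.0, 3.0)}

def voronoi_cell(point):
    """Return the label of the center whose Voronoi cell contains the point,
    i.e. the closest center."""
    return min(centers, key=lambda name: math.dist(point, centers[name]))

print(voronoi_cell((1.0, 0.5)))  # → A  (left of the A/B bisector x = 2)
print(voronoi_cell((3.0, 0.5)))  # → B
print(voronoi_cell((2.0, 2.5)))  # → C
```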

K-means clustering algorithm. The number of centers, k, has to be specified a priori. Algorithm: 1. Arbitrarily select k initial centers. 2. Assign each element to the closest center (Voronoi). 3. Re-calculate the centers (mean position of the assigned elements). 4. Repeat steps 2 and 3 until one of the following termination conditions is reached: i. The clusters are the same as in the previous iteration. ii. The difference between two iterations is smaller than a specified threshold. iii. The maximum number of iterations has been reached.

K-means clustering example. Two sets of points randomly generated: 200 centered on (0,0), 50 centered on (1,1).

K-means clustering example. Two points are randomly chosen as centers (stars).

K-means clustering example. Each dot can now be assigned to the cluster with the closest center.

K-means clustering example. First partition into clusters.

K-means clustering example. Centers are re-calculated.

K-means clustering example. And are again used to partition the points.

K-means clustering example. Second partition into clusters.

K-means clustering example. Re-calculating the centers again.

K-means clustering example. And we can again partition the points.

K-means clustering example. Third partition into clusters.

K-means clustering example. After 6 iterations, the calculated centers remain stable.
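This walkthrough can be reproduced in a few lines of Python. Only the setup (200 points around (0,0), 50 around (1,1), two random data points as initial centers) comes from the slides; the spreads and random seed are invented for illustration:

```python
import math
import random

rng = random.Random(1)
# Two sets of points, as in the example: 200 around (0,0), 50 around (1,1).
# (The spreads, 0.4 and 0.2, and the seed are made up for illustration.)
points = [(rng.gauss(0, 0.4), rng.gauss(0, 0.4)) for _ in range(200)]
points += [(rng.gauss(1, 0.2), rng.gauss(1, 0.2)) for _ in range(50)]

centers = rng.sample(points, 2)                  # two random points as centers
for iteration in range(1, 101):
    clusters = ([], [])
    for p in points:                             # partition around the centers
        closer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
        clusters[closer].append(p)
    # re-calculate each center (keep the old one if its cluster emptied)
    new = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
           for i, cl in enumerate(clusters)]
    if new == centers:                           # centers stable: converged
        break
    centers = new

print(f"stable after {iteration} iterations")
```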

K-means clustering: Summary. The convergence of k-means is usually quite fast (sometimes a single iteration results in a stable solution). K-means is time- and memory-efficient. Strengths: simple to use; fast; can be used with very large data sets. Weaknesses: the number of clusters has to be predetermined; the results may vary depending on the initial choice of centers.

K-means clustering: Variations. Expectation-maximization (EM): maintains probabilistic assignments to clusters, instead of deterministic assignments, and multivariate Gaussian distributions instead of means. k-means++: attempts to choose better starting points. Some variations attempt to escape local optima by swapping points between clusters.
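The k-means++ idea, choosing better starting points, can be sketched as a seeding step only: each new center is picked with probability proportional to its squared distance from the centers chosen so far, which tends to spread the initial centers out. A minimal sketch under my own naming and seeding choices:

```python
import math
import random

def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding: pick each new center with probability proportional
    to its squared distance from the centers already chosen."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]       # first center: uniform at random
    while len(centers) < k:
        # squared distance from each point to its nearest chosen center
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

The centers returned here would replace step 1 of the k-means algorithm above; the rest of the iteration is unchanged.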

The take-home message: hierarchical clustering or k-means clustering? (D'haeseleer, 2005)

What else are we missing? What if the clusters are not linearly separable?

Cell cycle. Spellman et al. (1998).