
Consensus Clustering
Javier Béjar
URL - Spring 2019 CS - MAI

Consensus Clustering

Ensembles of classifiers are a well-established strategy in supervised learning. Unsupervised learning pursues the same goal with consensus clustering, also called clustering ensembles. The idea is to merge complementary perspectives of the data into a more stable partition.

Consensus Clustering

Given a set of partitions of the same data $X$:

$\mathbb{P} = \{P_1, P_2, \ldots, P_n\}$

with:

$P_1 = \{C^1_1, C^1_2, \ldots, C^1_{k_1}\}$
$\vdots$
$P_n = \{C^n_1, C^n_2, \ldots, C^n_{k_n}\}$

the goal is to obtain a new partition that uses the information of all $n$ partitions.
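As a concrete starting point, a minimal sketch of how such an ensemble is usually represented in code: each partition is a vector of cluster labels over the same examples (the label values below mirror the CSPA example later in these slides).

```python
import numpy as np

# An ensemble of n = 3 partitions of the same 5 examples, each encoded as a
# vector of cluster labels (0-indexed; values mirror the CSPA example below).
P1 = np.array([0, 0, 0, 1, 1])  # k1 = 2 clusters
P2 = np.array([1, 1, 0, 0, 2])  # k2 = 3 clusters
P3 = np.array([0, 0, 1, 1, 1])  # k3 = 2 clusters
ensemble = [P1, P2, P3]
```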

Goals

- Robustness: the combination performs better than each individual partition in some sense
- Consistency: the combination is similar to the individual partitions
- Stability: the resulting partition is less sensitive to outliers and noise
- Novelty: the combination is able to obtain partitions that cannot be obtained by the clustering methods that generated the individual partitions

Advantages

- Knowledge reuse: the consensus is computed from the partition assignments, so previous partitions built with the same or different attributes can be reused
- Distributed computing: the individual partitions can be obtained independently
- Privacy: only the assignments of the individual partitions are needed for the consensus

Consensus Process

Consensus Process

Consensus clustering is generally based on a two-step process:

1. Generate the individual partitions to be combined
2. Combine the partitions to generate the final partition

Partition Generation

- Different example representations: diversity by generating partitions with different subsets of attributes
- Different clustering algorithms: take advantage of the fact that each clustering algorithm has a different bias
- Different parameter initialization: use clustering algorithms able to produce different partitions with different parameters
- Subspace projection: use dimensionality reduction techniques
- Subsets of examples: use random subsamples of the dataset (bootstrapping)
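A hedged sketch of the generation step, combining two of these sources of diversity (random initialization and a varying number of clusters) with k-means as the base algorithm; the dataset and parameter choices are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=4, random_state=0)  # toy data

ensemble = []
rng = np.random.default_rng(0)
for seed in range(10):
    k = int(rng.integers(2, 8))             # vary the number of clusters
    km = KMeans(n_clusters=k, n_init=1,     # one randomly seeded k-means run
                random_state=seed).fit(X)
    ensemble.append(km.labels_)
```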

Consensus Generation

- Co-occurrence based methods: use the labels obtained from each individual clustering and the coincidence of the labels for the examples
  - Relabeling and voting, co-association matrix, graph and hypergraph partitioning, information theory measures, finite mixture models
- Median partition based methods: given a set of partitions $\mathbb{P}$ and a similarity function $\Gamma(P_i, P_j)$, find the partition $P_c$ that maximizes the similarity to the set:

$P_c = \arg\max_{P \in \mathcal{P}_X} \sum_{P_i \in \mathbb{P}} \Gamma(P, P_i)$

Co-occurrence based methods

Relabeling and voting

- First, solve the label correspondence problem
- Then, determine the consensus using different voting strategies

Dimitriadou, Weingessel, Hornik. "Voting-Merging: An Ensemble Method for Clustering". Lecture Notes in Computer Science, 2001, 2130

1. Generate a clustering
2. Determine the correspondence with the current consensus
3. Each example gets a vote from its cluster assignment
4. Update the consensus
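One possible implementation of the two steps, not the exact procedure of the paper: the correspondence problem is solved as an assignment problem with SciPy's Hungarian solver, followed by plain majority voting (assumes all partitions use comparable numbers of clusters).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel(reference, labels):
    """Map the cluster ids of `labels` onto those of `reference` by
    maximizing the label overlap (Hungarian algorithm)."""
    k = max(reference.max(), labels.max()) + 1
    overlap = np.zeros((k, k), dtype=int)
    for l, r in zip(labels, reference):
        overlap[l, r] += 1
    rows, cols = linear_sum_assignment(-overlap)  # maximize total overlap
    mapping = dict(zip(rows, cols))
    return np.array([mapping[l] for l in labels])

def vote(ensemble):
    """Majority vote after aligning every partition to the first one."""
    aligned = [ensemble[0]] + [relabel(ensemble[0], p) for p in ensemble[1:]]
    votes = np.stack(aligned)            # shape: (n_partitions, n_examples)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```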

Co-Association matrix

- Count how many times a pair of examples are in the same cluster
- Use the matrix as a similarity or as a new set of characteristics
- Apply a clustering algorithm to the information of the co-association matrix

[Figure: an example dataset and its co-association matrix, showing blocks of high co-association counts for examples that repeatedly cluster together]

Co-Association matrix

Fred. "Finding Consistent Clusters in Data Partitions". Multiple Classifier Systems, 2001, 309-318
Fred, Jain. "Combining Multiple Clusterings Using Evidence Accumulation". IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27, 835-850

1. Compute the co-association matrix
2. Apply hierarchical clustering (different criteria)
3. Use a heuristic to cut the resulting dendrogram
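A minimal sketch of evidence accumulation with SciPy; the dendrogram is cut at a fixed k for simplicity, whereas Fred and Jain use a cluster-lifetime heuristic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def coassociation(ensemble):
    """Fraction of partitions in which each pair of examples co-clusters."""
    n = len(ensemble[0])
    C = np.zeros((n, n))
    for labels in ensemble:
        C += labels[:, None] == labels[None, :]
    return C / len(ensemble)

def evidence_accumulation(ensemble, k):
    D = 1.0 - coassociation(ensemble)            # similarity -> distance
    condensed = D[np.triu_indices_from(D, k=1)]  # condensed form for linkage
    Z = linkage(condensed, method='average')     # hierarchical clustering
    return fcluster(Z, t=k, criterion='maxclust')
```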

Graph and hypergraph partitioning

- Define consensus as a graph partitioning problem
- Different methods to build a graph or hypergraph from the partitions

Strehl, Ghosh. "Cluster ensembles - a knowledge reuse framework for combining multiple partitions". Journal of Machine Learning Research, MIT Press, 2003, 3, 583-617

- Cluster-based Similarity Partitioning Algorithm (CSPA)
- HyperGraph-Partitioning Algorithm (HGPA)
- Meta-CLustering Algorithm (MCLA)

CSPA

- Compute a similarity matrix from the clusterings
- Hyperedges matrix: for all clusterings, compute an indicator matrix $H$ that represents the links among examples and clusters (hypergraph)
- Compute the similarity matrix as $S = \frac{1}{r} H H^T$, where $r$ is the number of clusterings
- Apply a graph partitioning algorithm to the similarity matrix (METIS)
- Drawback: quadratic cost in the number of examples, $O(n^2 k r)$

CSPA: example

Cluster labels:

       C1  C2  C3
  x1    1   2   1
  x2    1   2   1
  x3    1   1   2
  x4    2   1   2
  x5    2   3   2

Indicator matrix H:

       C1,1 C1,2 C2,1 C2,2 C2,3 C3,1 C3,2
  x1     1    0    0    1    0    1    0
  x2     1    0    0    1    0    1    0
  x3     1    0    1    0    0    0    1
  x4     0    1    1    0    0    0    1
  x5     0    1    0    0    1    0    1

Similarity matrix S = (1/3) H H^T:

        x1   x2   x3   x4   x5
  x1     1    1  1/3    0    0
  x2     1    1  1/3    0    0
  x3   1/3  1/3    1  2/3  1/3
  x4     0    0  2/3    1  2/3
  x5     0    0  1/3  2/3    1
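The example can be reproduced in a few lines of NumPy (a sketch; the original method would then hand S to the METIS graph partitioner):

```python
import numpy as np

labels = [np.array([0, 0, 0, 1, 1]),   # C1
          np.array([1, 1, 0, 0, 2]),   # C2
          np.array([0, 0, 1, 1, 1])]   # C3

# Indicator (hypergraph) matrix H: one column per cluster of every partition
H = np.hstack([(l[:, None] == np.unique(l)[None, :]).astype(int)
               for l in labels])

S = H @ H.T / len(labels)   # S = (1/r) H H^T, matches the matrix above
```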

HGPA

- Partitions the hypergraph generated by the examples and their clusterings
- The indicator matrix is partitioned into k clusters of approximately the same size
- The HMETIS hypergraph partitioning algorithm is used
- Linear cost in the number of examples, $O(nkr)$

MCLA

Group and collapse hyperedges, and assign the objects to the hyperedge in which they participate the most.

Algorithm:
1. Build a meta-graph with the hyperedges as vertices (edges weighted by the similarity between hyperedges, Jaccard)
2. Partition the hyperedges into k metaclusters
3. Collapse the hyperedges of each metacluster
4. Assign each example to its most associated metacluster

Linear cost in the number of examples, $O(nk^2r^2)$

MCLA: Metagraph

       C1,1 C1,2 C2,1 C2,2 C2,3 C3,1 C3,2
  x1     1    0    0    1    0    1    0
  x2     1    0    0    1    0    1    0
  x3     1    0    1    0    0    0    1
  x4     0    1    1    0    0    0    1
  x5     0    1    0    0    1    0    1

[Figure: the meta-graph, with one vertex per hyperedge (C1,1, C1,2, C2,1, C2,2, C2,3, C3,1, C3,2)]
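A sketch of the meta-graph construction for step 1 of MCLA: vertices are the hyperedges (columns of H) and edge weights are their pairwise Jaccard similarities; the partitioning itself (METIS in the paper) is not shown.

```python
import numpy as np

def jaccard_metagraph(H):
    """Weighted adjacency matrix of the MCLA meta-graph built from the
    indicator matrix H (columns = hyperedges)."""
    H = H.astype(bool)
    m = H.shape[1]
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            inter = np.sum(H[:, i] & H[:, j])   # shared examples
            union = np.sum(H[:, i] | H[:, j])
            W[i, j] = W[j, i] = inter / union if union else 0.0
    return W
```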

Information Theory

- Information theory measures are used to assess the similarity among the clusters of the partitions, for instance: Normalized Mutual Information, Category Utility
- The labels can be transformed into a new set of features for each example (measuring example coincidence)
- The new features can be used to partition the examples using a clustering algorithm
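For instance, with scikit-learn: NMI between two partitions, and the labels recoded as one-hot features that an ordinary clustering algorithm can consume (an illustration, not a specific published method).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

p1 = np.array([0, 0, 0, 1, 1])
p2 = np.array([1, 1, 0, 0, 2])
print(normalized_mutual_info_score(p1, p2))   # similarity of the partitions

# Labels as new one-hot features, then a standard clustering on top of them
features = np.hstack([(p[:, None] == np.unique(p)[None, :]).astype(float)
                      for p in (p1, p2)])
consensus = KMeans(n_clusters=2, n_init=10).fit_predict(features)
```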

Finite Mixture Models

- The problem is transformed into the estimation of the probability of the assignments
- The mixture is composed of a product of multinomial distributions, one for each clustering
- Each example is described by the set of its assignments in each clustering
- An EM algorithm is used to find the probability distribution that maximizes the agreement
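A compact EM sketch of this idea (in the spirit of the mixture-model consensus described above; the initialization, iteration count, and smoothing constant are arbitrary choices): each example is a row of labels, and each mixture component is a product of categorical distributions, one per clustering.

```python
import numpy as np

def mixture_consensus(Y, k, n_iter=100, seed=0):
    """Y[i, j] = label of example i in clustering j; returns a consensus
    assignment of each example to one of k mixture components."""
    rng = np.random.default_rng(seed)
    n, r = Y.shape
    sizes = [Y[:, j].max() + 1 for j in range(r)]
    pi = np.full(k, 1.0 / k)                               # mixing weights
    theta = [rng.dirichlet(np.ones(s), size=k) for s in sizes]
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, m] (log-domain for stability)
        log_g = np.zeros((n, k)) + np.log(pi)
        for j in range(r):
            log_g += np.log(theta[j][:, Y[:, j]]).T
        g = np.exp(log_g - log_g.max(axis=1, keepdims=True))
        g /= g.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and categorical parameters
        pi = g.mean(axis=0)
        for j in range(r):
            for l in range(sizes[j]):
                theta[j][:, l] = g[Y[:, j] == l].sum(axis=0) + 1e-9
            theta[j] /= theta[j].sum(axis=1, keepdims=True)
    return g.argmax(axis=1)
```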

Median partition based methods

Median Partition Methods

Given a set of partitions $\mathbb{P}$ and a similarity function among partitions $\Gamma(P_i, P_j)$, the median partition $P_c$ is the one that maximizes the similarity to the set:

$P_c = \arg\max_{P \in \mathcal{P}_X} \sum_{P_i \in \mathbb{P}} \Gamma(P, P_i)$

This has been proven to be an NP-hard problem for some similarity functions $\Gamma$.

Similarity functions

- Based on the agreements and disagreements of pairs of examples between two partitions: Rand index, Jaccard coefficient, Mirkin distance (and their randomness-adjusted versions)
- Measures based on set matching can also be used: purity, F-measure
- Based on information theory measures (how much information two partitions share): NMI, Variation of Information, V-measure
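Several of these measures are directly available in scikit-learn, e.g.:

```python
from sklearn.metrics import (adjusted_rand_score,
                             normalized_mutual_info_score,
                             v_measure_score)

p1 = [0, 0, 0, 1, 1]
p2 = [1, 1, 0, 0, 2]
print(adjusted_rand_score(p1, p2))            # randomness-adjusted Rand index
print(normalized_mutual_info_score(p1, p2))   # shared information (NMI)
print(v_measure_score(p1, p2))                # V-measure
```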

Strategies

- Best of k: pick the partition of the set that minimizes the distance (maximizes the similarity) to the rest
- Optimization using local search (hill climbing, simulated annealing, genetic algorithms): move examples between two clusters of the current solution to improve the partition
- Non-negative matrix factorization: find the partition matrix closest to the averaged association matrix of the set of partitions
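The simplest strategy, best of k, fits in a few lines (a sketch using NMI as the similarity Γ):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi

def best_of_k(ensemble):
    """Return the ensemble member with the highest total NMI to the others."""
    scores = [sum(nmi(p, q) for q in ensemble) for p in ensemble]
    return ensemble[int(np.argmax(scores))]
```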

Python Notebooks

This Python notebook has examples of consensus clustering:

Consensus clustering Notebook (click here to go to the url)

If you have downloaded the code from the repository you will be able to play with the notebooks (run jupyter notebook to open them)