Interpretability and Informativeness of Clustering Methods for Exploratory Analysis of Clinical Data
Martin Azizyan, Aarti Singh
Machine Learning Department, Carnegie Mellon University

Wei Wu
Lane Center for Computational Biology, Carnegie Mellon University

1 Introduction

Clustering methods are among the most commonly used tools for exploratory data analysis. However, using clustering to perform data analysis can be challenging for modern datasets that contain a large number of dimensions, are complex in nature, and lack a ground-truth labeling. Traditional tools, such as summarization and plotting of clusters, are of limited benefit in a high-dimensional setting. On the other hand, while many clustering methods have been studied theoretically, such analysis often has limited instructive value for practical purposes due to unverifiable assumptions, oracle-dependent tuning parameters, or unquantified finite-sample effects. This study is motivated by a clinical dataset that is affected by each of the complications described above, which can reduce the reliability and informativeness of common approaches to cluster analysis. The dataset, which contains measurements of 11 demographic and medical features of 78 patients with various severity levels of asthma, along with healthy control subjects, from the Severe Asthma Research Program (SARP), was previously studied by [1], who used K-means clustering to discover subphenotypes of similarly presenting patients. Clinical knowledge suggests that i) the dataset contains multiple overlapping clusters of patients with no clear low-density separation between them, because symptoms are not clearly delineated across severity levels of asthma, and ii) the densities of data points (patients) differ across clusters, because patients with severe asthma are increasingly rare (more patients have mild asthma than severe asthma, with numbers decreasing as the severity level increases).
These characteristics plague other clinical datasets as well, and hence we expect our investigation to inform practice in conducting exploratory analyses of modern clinical datasets using clustering. To understand to what extent results based on K-means or other types of clustering methods can be interpreted as reliable, we compare their behaviors on several synthetic examples designed to capture possible complex characteristics of the data, and on the dataset itself. In addition to K-means, we examine the clusterpath [2, 3], a k-nearest-neighbor graph based density clustering algorithm [4], hierarchical clustering [5], and spectral clustering [6]. The clusterpath and the density clustering algorithm were chosen because they partition data by taking the density of the data into consideration, which seems suitable for the asthma dataset. Since these methods are motivated by hierarchical clustering and/or spectral clustering, the latter methods are also included for comparison. We believe that it is useful to better understand the behaviors of different types of clustering algorithms on real-life data and to equip practitioners with some prior intuition about the conceptual meaning of clusters discovered by these algorithms. In particular, understanding the significant differences between clustering methods on real data can i) inform the choice of a clustering method from the large set of available ones, and ii) enable meaningful conclusions to be drawn by applying several clustering methods to the same dataset and comparing the results.

2 Methods

The objective of K-means clustering is to find a partitioning of the dataset which minimizes the total sum of squared distances between points in the same cluster. While solving this optimization problem exactly is combinatorially difficult, common practice is to use a large number of random restarts of an approximation algorithm (e.g., we use the one implemented in R [7]).
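The K-means recipe just described (minimize within-cluster squared distances, keep the best of many random restarts of an approximate solver) can be sketched as follows. This is an illustrative NumPy implementation of Lloyd's algorithm, not the R routine used in the paper; the function name and defaults are ours.

```python
import numpy as np

def kmeans(X, k, n_restarts=20, n_iter=100, seed=0):
    """Lloyd's algorithm with random restarts, keeping the solution with
    the smallest within-cluster sum of squared distances (a sketch)."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_restarts):
        # initialize centers at k distinct data points
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # assign each point to its nearest center
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = d2.argmin(axis=1)
            # recompute centers (keep the old center if a cluster empties)
            new_centers = np.array(
                [X[labels == j].mean(axis=0) if np.any(labels == j)
                 else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        cost = ((X - centers[labels]) ** 2).sum()
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels, best_cost
```

With well-separated data, most restarts converge to the same optimal partition; the restarts matter precisely in the overlapping-cluster regime studied here, where single runs can land in poor local minima.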
Spectral clustering [6] first embeds the data points into a lower-dimensional space using eigenvectors of the Laplacian of a graph of similarities between the data points, and subsequently applies K-means to the resulting dataset. We use the symmetric normalized graph Laplacian [8] in all our experiments to compute the embedding. The clusterpath [2, 3] is a convex formulation of clustering: it minimizes the sum-of-norms objective (1/2) Σ_i ||u_i − x_i||² + λ Σ_{i<j} w_{ij} ||u_i − u_j||, where each point x_i is assigned a representative u_i and points whose representatives fuse are placed in the same cluster. The algorithm requires as input a (weighted) graph capturing the similarity between the data points (similar to spectral clustering), in addition to the original matrix representing the points in Euclidean space. Furthermore, by varying the tuning parameter λ over a range of values, the resulting clusterings often form a sequence of refinements which can be represented as a hierarchical clustering. Due to the space constraint, we refer to [2] for further details. While the clusterpath can be thought of as a convex relaxation of a clustering objective similar to K-means, it is clear that, at least in some cases, the two methods can produce very different results [2], in part due to the use of the graph in the clusterpath. Typically either a k-nearest-neighbor or a Gaussian kernel-based graph is used. We use a k-nearest-neighbor graph to limit the number of edges (with non-zero weight) in the graph, since the number of these edges can significantly affect the computational cost of the algorithm. Computing the clusterpath is non-trivial. Recently, several methods for accelerating the optimization of the clusterpath objective have been proposed [9, 10]. In this work, we implement an algorithm similar to the method described in [9] for our experiments, with some additional tuning for performance, including adaptive restarting of acceleration. Despite this, computing the clusterpath is a few orders of magnitude slower than any of the other methods we use.
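The spectral embedding step described above can be sketched as follows, assuming a precomputed symmetric weight matrix W. This follows the standard normalized-Laplacian recipe rather than the paper's exact code, and the function name is ours.

```python
import numpy as np

def spectral_embedding(W, n_components):
    """Embed graph nodes using the symmetric normalized Laplacian
    L_sym = I - D^{-1/2} W D^{-1/2}.  The eigenvectors for the smallest
    eigenvalues are row-normalized; K-means would then be run on the
    rows.  A sketch of the standard recipe."""
    d = W.sum(axis=1)                                  # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)                     # ascending eigenvalues
    U = vecs[:, :n_components]                         # smallest first
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)
```

For a graph with n_components disconnected components, the embedded rows of each component collapse to a single point, which is why spectral clustering is expected to respect low-density separations when they exist.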
The density clustering method described in [4] uses pruned k-nearest-neighbor graphs to estimate the points lying in connected components of the level sets of the density. Each split in the resulting cluster tree represents points lying in high-density regions that are separated by a low-density region. See, for example, [11] for further discussion of density clustering methods. To maximize the comparability of results, for each dataset we use the same k-nearest-neighbor graph as input to spectral clustering, density clustering, and the clusterpath. For hierarchical clustering, we use a bottom-up approach based on Ward's clustering criterion [12].

3 Results

We begin by comparing the methods described in the previous section using the projection of the asthma dataset on its top principal components [13]. The PCA projection is plotted in Figure 1a, with the 89 healthy control subjects shown in black and the remaining 89 asthma patients with varying degrees of symptom severity shown in red. Even though this low-dimensional projection cannot capture all the structure of the original dataset, it can serve as a guide for designing further experiments to elucidate aspects of the algorithms in question that may be relevant to a clinical dataset. Figure 1 shows the results from each of the methods described in the previous section on the PCA-projected dataset. We used a k-nearest-neighbor graph where applicable (the results were not very sensitive to the number of nearest neighbors used). The number of clusters computed using K-means was set to 6 to emulate the analysis in [1]. Our first observation is that each method except spectral clustering identifies the control subject group as a single cluster. The spectral clustering result with 6 clusters (Figure 1c) is similar to the K-means clustering, except for the splitting of the control group and the joining of the green and cyan clusters.
The latter two groups are split when the number of clusters is increased to 7 (Figure 1d); however, a portion of the control group is placed in a cluster predominantly composed of asthma patients. This effect is surprising in light of the fact that the density cluster tree separates the control group as a distinct cluster: in principle, spectral clustering should partition the data along low-density regions as well. On the other hand, it is evident from Figure 1f that density clustering alone provides very limited information about this dataset beyond separating the control group. Although the portion of the data consisting of asthma patients has some interesting structure, the only conclusion about these data points that can be reached from the density cluster tree is that they appear unimodal. Finally, we note that while both the clusterpath and the hierarchical clustering do approximately replicate the same partitioning as K-means, a visual inspection of the hierarchical clustering tree gives the impression of several reliably separated clusters, while the clusterpath only strongly separates the control group.
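To make the level-set idea behind the density cluster tree concrete, here is a toy sketch of a single level: keep the densest fraction of points (density estimated from k-NN distances), link kept points that are k-nearest neighbors, and count the resulting connected components. This is an illustration of the general idea only, not the pruned cluster-tree algorithm cited in the text; the function name and parameters are ours.

```python
import numpy as np

def knn_density_components(X, k, level_frac):
    """One level set of a k-NN density cluster tree (toy version):
    returns the roots of the connected components formed by the
    level_frac fraction of points with highest k-NN density."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    r_k = np.sort(D, axis=1)[:, k]             # distance to k-th neighbor
    density = 1.0 / (r_k + 1e-12)              # larger radius = lower density
    keep = np.argsort(-density)[: max(1, int(level_frac * n))]
    keep_set = set(int(i) for i in keep)
    knn = np.argsort(D, axis=1)[:, 1 : k + 1]  # k nearest neighbors (no self)
    parent = {i: i for i in keep_set}          # union-find over kept points
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i
    for i in keep_set:
        for j in knn[i]:
            if int(j) in keep_set:
                parent[find(i)] = find(int(j))
    return {find(i) for i in keep_set}
```

Sweeping level_frac from 1 down to 0 and recording how components appear and split yields the cluster tree; when no low-density valleys exist, as for the asthma patients, every level returns a single component and the tree is uninformative.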
[Figure 1: Results on the PCA projection of the asthma dataset. Panels: (a) PCA of asthma dataset (control in black, patient in red); (b) K-means, K = 6; (c) Spectral (6 clusters); (d) Spectral (7 clusters); (e) Clusterpath; (f) Density cluster tree; (g) Hierarchical clustering. Leaves in subfigures (e)-(g) are colored according to the K-means results in subfigure (b).]

To further explore the behaviors of these methods in the absence of well-separated clusters, we generate a two-dimensional synthetic dataset by drawing 100 samples from each of three overlapping non-spherical Gaussian components, giving the points shown in Figure 2a. We see from the density cluster tree (Figure 2d) that the mixture components are not separated by regions of detectably lower density. Despite this, K-means and spectral clustering both estimate reasonable approximations of the true mixture component labels, as do the clusterpath and hierarchical clustering. It is interesting to note that the black cluster is significantly better separated than the other two according to both the clusterpath and the hierarchical clustering. Finally, we compare the results of each clustering method on the full dataset analyzed by [1]. Figure 3 shows the K-means and spectral clustering results (computed using the full, not projected, dataset) plotted on the same PCA projection as Figure 1a. The leaves of the dendrograms are colored using the labels of the K-means clustering in Figure 3a (which are identical to the K = 6 clusters discovered by K-means in the analysis of [1]). Spectral clustering (Figure 3b) again fails to maintain the control group as a single cluster, as with the low-dimensional PCA projection data. K-means, the clusterpath, and hierarchical clustering all identify the normal control group as a separate cluster. On the other hand, density clustering now entirely fails to find any clusters whatsoever.
It is not immediately clear whether the control group is only distributed around a separate mode after the PCA projection, or whether the dimensionality of the data is simply too high (compared to the number of samples) for this particular algorithm to detect a density cluster. In either case, the density clustering results are entirely non-informative here. The clusterpath results are not much better: beyond separating the control group and a few outliers, the clusterpath tree has no structure. The same is not true of hierarchical clustering: although the tree is noisier than its low-dimensional counterpart in Figure 1, it does have some non-trivial structure, and it appears to find some clusters similar to those of K-means.

[Figure 2: Results on a simple synthetic dataset with overlapping clusters. Panels: (a) True mixture component labels; (b) Clusterpath; (c) K-means clustering; (d) Density cluster tree; (e) Spectral clustering; (f) Hierarchical clustering. Dendrograms are colored according to the true mixture component labels.]

[Figure 3: Results on the full-dimensional asthma dataset. Panels: (a) K-means, K = 6; (b) Spectral (6 clusters); (c) Clusterpath; (d) Density cluster tree; (e) Hierarchical clustering. Leaves in subfigures (c)-(e) are colored according to the K-means results in subfigure (a).]

4 Conclusion

To summarize, in this paper we explored the interpretability and informativeness of popular clustering methods for identifying groups of asthma patients with different severity levels based on their phenotypes. This dataset, like many other clinical datasets, is characterized by clusters of varying density, with the asthma patients corresponding to an almost unimodal distribution. Our results indicate that this characteristic renders methods such as density clustering (which has been the subject of many empirical and theoretical studies) non-informative, despite being density sensitive, as it relies on the existence of low-density separation between clusters. Among partitional methods, K-means seems to outperform spectral clustering, since the latter tends to break up the normal control group. Among hierarchical methods, the clusterpath fails to yield informative results (other than the normal control cluster) on the asthma dataset, and requires orders of magnitude more computational resources even when using a highly tuned algorithm, while agglomerative hierarchical clustering with the Ward criterion appears to generate results that are noisier than, but similar to, those of K-means.

Acknowledgments

We thank Dr. Sally Wenzel at the University of Pittsburgh School of Medicine for sharing the asthma data with us. This research is supported in part by NSF grant IIS-11168, NSF CAREER award IIS-11, and R01GM08769.
References

[1] Wei Wu, Eugene Bleecker, Wendy Moore, William W. Busse, Mario Castro, Kian Fan Chung, William J. Calhoun, Serpil Erzurum, Benjamin Gaston, Elliot Israel, et al. Unsupervised phenotyping of severe asthma research program participants using expanded lung data. Journal of Allergy and Clinical Immunology, 2014.
[2] Toby Dylan Hocking, Armand Joulin, Francis Bach, and Jean-Philippe Vert. Clusterpath: an algorithm for clustering using convex fusion penalties. In Proceedings of the 28th International Conference on Machine Learning, 2011.
[3] Fredrik Lindsten, Henrik Ohlsson, and Lennart Ljung. Clustering using sum-of-norms regularization: with application to particle filter output computation. In IEEE Statistical Signal Processing Workshop (SSP), 2011.
[4] Kamalika Chaudhuri and Sanjoy Dasgupta. Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems, 2010.
[5] Rui Xu and Donald Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645-678, 2005.
[6] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems, 2002.
[7] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[8] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395-416, 2007.
[9] E. C. Chi and K. Lange. Splitting methods for convex clustering. arXiv e-prints, April 2013.
[10] G. K. Chen, E. C. Chi, J. Ranola, and K. Lange. Convex clustering: an attractive alternative to hierarchical clustering. arXiv e-prints, September 2014.
[11] Pavel Berkhin. A survey of clustering data mining techniques. In Grouping Multidimensional Data, pages 25-71. Springer, 2006.
[12] F. Murtagh and P. Legendre. Ward's hierarchical clustering method: clustering criterion and agglomerative algorithm. arXiv e-prints, November 2011.
[13] I. T. Jolliffe. Principal Component Analysis. Springer Series in Statistics. Springer, Berlin, 1986.
More informationGraph projection techniques for Self-Organizing Maps
Graph projection techniques for Self-Organizing Maps Georg Pölzlbauer 1, Andreas Rauber 1, Michael Dittenbach 2 1- Vienna University of Technology - Department of Software Technology Favoritenstr. 9 11
More informationDimension reduction : PCA and Clustering
Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental
More informationBagging for One-Class Learning
Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one
More informationPart I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a
Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering
More informationOverview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010
INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,
More informationClustering Part 3. Hierarchical Clustering
Clustering Part Dr Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Hierarchical Clustering Two main types: Agglomerative Start with the points
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationSupervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationLearning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009
Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer
More informationClustering in Networks
Clustering in Networks (Spectral Clustering with the Graph Laplacian... a brief introduction) Tom Carter Computer Science CSU Stanislaus http://csustan.csustan.edu/ tom/clustering April 1, 2012 1 Our general
More informationUnsupervised learning in Vision
Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual
More informationClustering part II 1
Clustering part II 1 Clustering What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods 2 Partitioning Algorithms:
More informationJure Leskovec, Cornell/Stanford University. Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research
Jure Leskovec, Cornell/Stanford University Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research Network: an interaction graph: Nodes represent entities Edges represent interaction
More informationStatistical Physics of Community Detection
Statistical Physics of Community Detection Keegan Go (keegango), Kenji Hata (khata) December 8, 2015 1 Introduction Community detection is a key problem in network science. Identifying communities, defined
More informationClustering and Dimensionality Reduction
Clustering and Dimensionality Reduction Some material on these is slides borrowed from Andrew Moore's excellent machine learning tutorials located at: Data Mining Automatically extracting meaning from
More informationTELCOM2125: Network Science and Analysis
School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationThe clustering in general is the task of grouping a set of objects in such a way that objects
Spectral Clustering: A Graph Partitioning Point of View Yangzihao Wang Computer Science Department, University of California, Davis yzhwang@ucdavis.edu Abstract This course project provide the basic theory
More informationCS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample
CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:
More informationCHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM
96 CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM Clustering is the process of combining a set of relevant information in the same group. In this process KM algorithm plays
More informationHierarchical Clustering
Hierarchical Clustering Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram A tree-like diagram that records the sequences of merges
More informationSummer School in Statistics for Astronomers & Physicists June 15-17, Cluster Analysis
Summer School in Statistics for Astronomers & Physicists June 15-17, 2005 Session on Computational Algorithms for Astrostatistics Cluster Analysis Max Buot Department of Statistics Carnegie-Mellon University
More informationUnsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi
Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationA Novel Spectral Clustering Method Based on Pairwise Distance Matrix
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 26, 649-658 (2010) A Novel Spectral Clustering Method Based on Pairwise Distance Matrix CHI-FANG CHIN 1, ARTHUR CHUN-CHIEH SHIH 2 AND KUO-CHIN FAN 1,3 1 Institute
More informationHomework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)
Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Fall 2014, Prakash Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationClustering and Dissimilarity Measures. Clustering. Dissimilarity Measures. Cluster Analysis. Perceptually-Inspired Measures
Clustering and Dissimilarity Measures Clustering APR Course, Delft, The Netherlands Marco Loog May 19, 2008 1 What salient structures exist in the data? How many clusters? May 19, 2008 2 Cluster Analysis
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationCluster Analysis (b) Lijun Zhang
Cluster Analysis (b) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Grid-Based and Density-Based Algorithms Graph-Based Algorithms Non-negative Matrix Factorization Cluster Validation Summary
More informationBioinformatics - Lecture 07
Bioinformatics - Lecture 07 Bioinformatics Clusters and networks Martin Saturka http://www.bioplexity.org/lectures/ EBI version 0.4 Creative Commons Attribution-Share Alike 2.5 License Learning on profiles
More informationImage Processing. Image Features
Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching
More informationMultiobjective Data Clustering
To appear in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Multiobjective Data Clustering Martin H. C. Law Alexander P. Topchy Anil K. Jain Department of Computer Science
More information