Alternative Clusterings: Current Progress and Open Challenges
1 Alternative Clusterings: Current Progress and Open Challenges
James Bailey, Department of Computer Science and Software Engineering, The University of Melbourne, Australia
2 Introduction
- Cluster analysis: group similar objects into clusters
- No single solution => equally important, different views or hypotheses regarding the data
- Example: cluster by pose or by individual?
3 Motivations
- Multiple explanations of the data
  - The user doesn't initially know what they want and needs options
  - Different users have different viewpoints
  - The user may be aiming to verify that multiple explanations do not exist (hypothesis verification, or benchmarking clustering algorithms)
- Contrast with consensus clustering
- Every clustering should be accompanied by at least one alternative clustering!?
4 Alternative Clustering: Is it new?
- From one perspective, alternative clustering is not so new
- Generation of clusterings often goes like this:
  - Generate and assess a clustering with 2 clusters
  - Generate and assess a clustering with 3 clusters
  - ...
  - Generate and assess a clustering with k clusters
- We now have k-1 alternative clusterings, but some of them may be very similar
5 Alternative Clustering Algorithms
- Growing number of approaches: ADFT, CAMI, COALA, Condens, Convolutional EM, Decorrelated k-means, MAXIMUS, Meta clustering, Multiview orthogonal clustering, NACI, Non-redundant clustering, ...
- Papers have appeared at KDD10, ICML10, SDM10, KDD09, SDM09, ICDM08, ICDM07, ICDM06, KDD05, ICDM04, DMKD, KAIS, ...
6 How do these approaches differ?
- Task formulation
  - Number of alternatives to generate
  - Sequential or simultaneous generation
- Mathematical basis
  - Linear algebra
  - Information theory
  - Other objective functions
7 Sequential Alternative Clustering Generation
- Task: given input clusterings {C1, ..., Cn}, generate an alternative clustering C such that C is of high quality and C is different from {C1, ..., Cn}
- Important special case: n=1
- Diagram: existing clusterings C1, C2, ..., Cn -> generate -> alternative C
8 Simultaneous Alternative Clustering Generation
- Task: simultaneously generate n clusterings {C1, ..., Cn} such that each Ci is of high quality and each pair (Ci, Cj) is different from one another
- Important special case: n=2
- Diagram: generate -> alternatives C1, C2, ..., Cn
9 Sequential vs. Simultaneous
- Sequential (greedy)
  - Semi-supervised
  - For i=2 to n {generate the optimal alternative clustering with respect to the previous i-1 clusterings}
  - Locally optimal at each step
- Simultaneous (non-greedy)
  - Unsupervised
  - In parallel, generate the optimal set of n clusterings
  - Globally optimal clustering collection, but might miss some strong clusterings which would be generated by a sequential technique
  - More difficult optimisation problem
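The greedy sequential loop on slide 9 can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the alternatives are picked from a fixed pool of candidate labelings rather than optimised directly, quality is measured as the fraction of variance explained, similarity is the Rand index, and `lam` is a hypothetical trade-off weight.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (together/apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def quality(points, labels):
    """Fraction of total variance explained by the clustering (0 to 1)."""
    def sse(pts):
        dims = range(len(pts[0]))
        mean = [sum(p[d] for p in pts) / len(pts) for d in dims]
        return sum((p[d] - mean[d]) ** 2 for p in pts for d in dims)

    within = sum(
        sse([p for p, l in zip(points, labels) if l == c]) for c in set(labels)
    )
    return 1.0 - within / sse(points)


def sequential_alternatives(points, first, candidates, n, lam=1.0):
    """Greedy loop: for i = 2..n pick the candidate with the best trade-off
    between quality and similarity to the clusterings chosen so far."""
    chosen = [first]
    for _ in range(2, n + 1):
        def score(c):
            worst_overlap = max(rand_index(c, prev) for prev in chosen)
            return quality(points, c) - lam * worst_overlap

        chosen.append(max(candidates, key=score))
    return chosen


# Four corners of a square: splitting by x or by y is equally good.
points = [(0, 0), (0, 10), (10, 0), (10, 10)]
split_x = [0, 0, 1, 1]
split_y = [0, 1, 0, 1]
diagonal = [0, 1, 1, 0]  # low quality: pairs opposite corners

result = sequential_alternatives(
    points, split_x, [split_x, split_y, diagonal], n=2
)
print(result)
```

Starting from the split by x, the loop selects the split by y: it has the same quality as the original but a low Rand index against it, while the diagonal labeling is just as different but explains none of the variance.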
10 Style of Algorithm
- Projection based
  - Project the data into an orthogonal subspace and then re-cluster
  - Appealing linear algebra formulation
  - Relatively efficient
  - Orthogonality may be too strict
- More complex objective function
  - Generate the alternative clustering, trading off dissimilarity and quality in the objective function
  - More flexible
  - May require parameter choices
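A minimal sketch of the projection-based style, under simplifying assumptions that are not from the talk: the existing clustering has two clusters, the subspace to remove is the single direction joining their centroids, and re-clustering uses plain k-means with deterministic farthest-point seeding. Published projection methods define the subspace more carefully; this only shows the project-then-re-cluster idea.

```python
import math


def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def project_out(points, direction):
    """Remove the component of every point along `direction`."""
    norm = math.sqrt(sum(c * c for c in direction))
    u = [c / norm for c in direction]
    projected = []
    for p in points:
        dot = sum(pi * ui for pi, ui in zip(p, u))
        projected.append([pi - dot * ui for pi, ui in zip(p, u)])
    return projected


def kmeans(points, k, iters=20):
    """Lloyd's k-means; seeds are the first point plus farthest points."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = centroid(members)
    return labels


# Four tight groups at the corners of a square.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
corners = [(0, 0), (0, 10), (10, 0), (10, 10)]
data = [[cx + dx, cy + dy] for cx, cy in corners for dx, dy in offsets]

# Existing clustering: left half versus right half.
existing = [0 if x < 5 else 1 for x, _ in data]

# Direction that explains the existing clustering: centroid to centroid.
c0 = centroid([p for p, l in zip(data, existing) if l == 0])
c1 = centroid([p for p, l in zip(data, existing) if l == 1])
direction = [b - a for a, b in zip(c0, c1)]

# Re-cluster in the subspace orthogonal to that direction.
alternative = kmeans(project_out(data, direction), k=2)
print(existing)
print(alternative)
```

On this data the existing clustering separates left from right, and the re-clustering after projection separates bottom from top, which is the kind of canonical case where most techniques do well. Data with no such orthogonal structure would not be handled by this simple projection.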
11 Simple Example
- Most existing techniques seem to work well (a canonical example)
12 Circle of Gaussians
- Techniques which trade off dissimilarity and quality are more likely to produce the second clustering
- Orthogonal projection doesn't work so well here
13 Other issues
- Evaluation: measuring quality/dissimilarity of alternatives
- Clustering setting
  - Desired shape of clusters: spherical versus elongated, linear versus non-linear separation
  - Low versus high dimensionality data
  - Continuous versus discrete features
  - Soft versus hard clusters
  - EM versus k-means versus hierarchical versus constraint based
  - Number of clusters desired in each clustering
14 Alternative Clustering Evaluation
- Measuring dissimilarity
  - Mathematical measures: Rand index, Jaccard index, normalised mutual information
- Measuring quality
  - Internal validation measures: Dunn index, Davies-Bouldin index, silhouette width
  - External validation: synthetic examples
- Combine dissimilarity and quality into a single number, or present separately?
- Are these numbers useful?
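Two of the dissimilarity measures named on slide 14 can be computed by counting pairs of points. The sketch below uses the standard pair-counting definitions of the Rand and Jaccard indices; the function names and the convention of returning 1.0 for the Jaccard index when no pair is ever grouped together are choices made here for illustration.

```python
from itertools import combinations


def pair_counts(a, b):
    """Classify every pair of points by whether each labeling groups it."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(a, b):
    """Share of pairs on which the two labelings agree."""
    both, only_a, only_b, neither = pair_counts(a, b)
    return (both + neither) / (both + only_a + only_b + neither)


def jaccard_index(a, b):
    """Pairs grouped by both, over pairs grouped by at least one."""
    both, only_a, only_b, _ = pair_counts(a, b)
    denom = both + only_a + only_b
    # Assumed convention: two all-singleton labelings count as identical.
    return both / denom if denom else 1.0


split_x = [0, 0, 1, 1]
split_y = [0, 1, 0, 1]
relabelled = [1, 1, 0, 0]  # same partition as split_x, different label names

print(rand_index(split_x, split_y), jaccard_index(split_x, split_y))
print(rand_index(split_x, relabelled), jaccard_index(split_x, relabelled))
```

Both measures depend only on the partition, not on the label names, so `relabelled` scores 1.0 against `split_x`. For the two crossing splits the Rand index is one third while the Jaccard index is zero, which shows how the choice of measure changes the reported dissimilarity.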
15 Where are we?
- Good existing algorithms for generation of one or two alternatives
  - Sequential generation
  - Simultaneous generation
- Not yet deployed on very large datasets
- Validated using assorted benchmark datasets and internal metrics
16 Open Issues
- What's the killer application?
  - Deployment of alternative clusterings
  - Need convincing use cases where consensus clustering is limited
- Objective function and performance measures
- How many alternatives is enough?
- How many clusters should be in an alternative clustering? The same number as the original clustering?
17 Open Issues cont.
- How to find alternative subspace clusters (rather than clusterings)?
- Visualisation of alternative clusterings
- More focused alternatives: "Give me another clustering which is similar in these respects and different in these other respects to the previous clustering"
18 Moving Forward
- Central repository of code and canonical examples (synthetic and real)
- Make alternative clustering algorithms accessible
- Identify cases in the literature of missing alternative clusterings
19 Bibliography
- E. Bae, J. Bailey and G. Dong. A Clustering Comparison Measure Using Density Profiles and its Application to the Discovery of Alternate Clusterings. To appear in Data Mining and Knowledge Discovery.
- D. Niu, J. G. Dy and M. I. Jordan. Multiple non-redundant spectral clustering views. Proc. of ICML 10.
- X. H. Dang and J. Bailey. A Hierarchical Information Theoretic Technique for the Discovery of Non Linear Alternative Clusterings. Proc. of KDD.
- X. H. Dang and J. Bailey. Generation of alternative clusterings using the CAMI approach. Proc. of SDM.
- Z. Qi and I. Davidson. A principled and flexible framework for finding alternative clusterings. Proc. of KDD.
- P. Jain, R. Meka and I. S. Dhillon. Simultaneous unsupervised learning of disparate clusterings. Proc. of SDM.
- I. Davidson and Z. Qi. Finding alternative clusterings using constraints. Proc. of ICDM.
- Y. Cui, X. Z. Fern and J. G. Dy. Non-redundant multi-view clustering via orthogonalization. Proc. of ICDM.
- E. Bae and J. Bailey. COALA: A novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. Proc. of ICDM.
- R. Caruana, M. Elhawary, N. Nguyen and C. Smith. Meta clustering. Proc. of ICDM.
- D. Gondek and T. Hofmann. Non-redundant clustering with conditional ensembles. Proc. of KDD.
- D. Gondek and T. Hofmann. Non-redundant data clustering. Proc. of ICDM 2004.