Alternative Clusterings: Current Progress and Open Challenges


1 Alternative Clusterings: Current Progress and Open Challenges
James Bailey
Department of Computer Science and Software Engineering
The University of Melbourne, Australia

2 Introduction
- Cluster analysis: group similar objects into clusters
- No single solution => equally important, different views or hypotheses regarding the data
- Example: cluster by pose or by individual?

3 Motivations
- Multiple explanations of the data
  - The user doesn't initially know what they want and needs options
  - Different users have different viewpoints
  - The user may be aiming to verify that multiple explanations do not exist (hypothesis verification, or for benchmarking clustering algorithms)
- Contrast with consensus clustering
- Every clustering should be accompanied by at least one alternative clustering!?

4 Alternative Clustering: Is it new?
- From one perspective, alternative clustering is not so new
- Generation of clusterings often goes like this:
  - Generate and assess a clustering with 2 clusters
  - Generate and assess a clustering with 3 clusters
  - ...
  - Generate and assess a clustering with k clusters
- We now have k-1 alternative clusterings, but some of them may be very similar

5 Alternative Clustering Algorithms
- Growing number of approaches: ADFT, CAMI, COALA, Condens, Convolutional EM, Decorrelated k-means, MAXIMUS, Meta clustering, Multiview orthogonal clustering, NACI, Non-redundant clustering, ...
- Papers have appeared at KDD10, ICML10, SDM10, KDD09, SDM09, ICDM08, ICDM07, ICDM06, KDD05, ICDM04, ..., DMKD, KAIS, ...

6 How do these approaches differ?
- Task formulation
  - Number of alternatives to generate
  - Sequential or simultaneous generation
- Mathematical basis
  - Linear algebra
  - Information theory
  - Other objective functions

7 Sequential Alternative Clustering Generation
- Task: given input clusterings {C1, ..., Cn}, generate an alternative clustering C such that C is of high quality and C is different from {C1, ..., Cn}
- Important special case: n=1
- Diagram: existing clusterings C1, C2, ..., Cn -> generate -> alternative C

8 Simultaneous Alternative Clustering Generation
- Task: simultaneously generate n clusterings {C1, ..., Cn} such that each Ci is of high quality and each pair (Ci, Cj) is different from one another
- Important special case: n=2
- Diagram: generate -> alternatives C1, C2, ..., Cn
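One way to make the sequential and simultaneous task statements precise is to write them as optimisation problems. The formulation below is an illustrative assumption, not taken from the slides: Q stands for some clustering quality measure, D for some dissimilarity measure between clusterings, and \lambda \ge 0 for a trade-off weight. Individual algorithms use different objectives.

```latex
% Sequential: the existing clusterings C_1, ..., C_n are fixed inputs
C^{*} = \arg\max_{C} \; Q(C) + \lambda \min_{1 \le i \le n} D(C, C_i)

% Simultaneous: all n clusterings are free variables
(C_1^{*}, \dots, C_n^{*}) = \arg\max_{C_1, \dots, C_n} \;
    \sum_{i=1}^{n} Q(C_i) + \lambda \sum_{1 \le i < j \le n} D(C_i, C_j)
```

With \lambda = 0 both problems reduce to ordinary quality-driven clustering; larger \lambda pushes the solutions apart at the expense of quality.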

9 Sequential vs. Simultaneous
Sequential (greedy)
- Semi-supervised
- For i=2 to n {generate the optimal alternative clustering with respect to the previous i-1 clusterings}
- Locally optimal at each step
Simultaneous (non-greedy)
- Unsupervised
- In parallel, generate an optimal set of n clusterings
- Globally optimal clustering collection, but might miss some strong clusterings which would be generated by a sequential technique
- More difficult optimisation problem
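The greedy sequential loop can be sketched in a few lines. This is a minimal illustration under strong assumptions, not any of the published algorithms: instead of generating clusterings it selects them from a fixed candidate pool, the quality scores are supplied by the caller, dissimilarity is the fraction of object pairs on which two labelings disagree, and the names sequential_alternatives and trade_off are hypothetical.

```python
from itertools import combinations


def disagreement(a, b):
    """Fraction of object pairs grouped together in one labeling but not the other."""
    pairs = list(combinations(range(len(a)), 2))
    differ = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return differ / len(pairs)


def sequential_alternatives(candidates, quality, n, trade_off=1.0):
    """Greedily pick n clusterings from a candidate pool.

    The first pick is the highest-quality candidate. Each later pick
    maximises quality + trade_off * (smallest disagreement with any
    clustering already chosen), so it is only locally optimal.
    """
    remaining = list(candidates)
    chosen = [max(remaining, key=quality)]
    remaining.remove(chosen[0])
    while len(chosen) < n and remaining:
        best = max(
            remaining,
            key=lambda c: quality(c)
            + trade_off * min(disagreement(c, prev) for prev in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy pool over four objects; the quality scores are made up for illustration.
A = (0, 0, 1, 1)
B = (0, 1, 1, 1)  # fairly similar to A
C = (0, 1, 0, 1)  # cuts across A
scores = {A: 0.90, B: 0.85, C: 0.80}

picked = sequential_alternatives([A, B, C], scores.get, n=2)
```

With trade_off=1.0 the second pick is C, which has lower quality than B but disagrees with A on more pairs. Setting trade_off=0.0 ignores dissimilarity and returns B instead.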

10 Style of Algorithm
Projection based
- Project the data into an orthogonal subspace and then re-cluster
- Appealing linear algebra formulation
- Relatively efficient
- Orthogonality may be too strict
More complex objective function
- Generate the alternative clustering, trading off dissimilarity and quality in the objective function
- More flexible
- May require parameter choices
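The projection-based style can be illustrated on a toy dataset. The sketch below is a deliberately simplified variant and not the method of any cited paper: it assumes two clusters, removes only the single direction joining their centroids, and uses a small deterministic 2-means written for this example. The data and all function names are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def two_means(points, iters=20):
    """Lloyd's algorithm for k=2 with a deterministic start:
    the first point and the point farthest from it."""
    centers = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    return labels


def project_out(points, direction):
    """Project each point onto the subspace orthogonal to `direction`."""
    norm2 = sum(d * d for d in direction)
    projected = []
    for p in points:
        coeff = sum(pi * di for pi, di in zip(p, direction)) / norm2
        projected.append(tuple(pi - coeff * di for pi, di in zip(p, direction)))
    return projected


# Four small groups at the corners of a rectangle (wide in x, narrow in y).
data = [(0, 0), (0.5, 0.2), (0, 4), (0.5, 4.2),
        (10, 0), (10.5, 0.2), (10, 4), (10.5, 4.2)]

first = two_means(data)  # splits left versus right

# Direction separating the two centroids of the first clustering.
c0 = centroid([p for p, lab in zip(data, first) if lab == 0])
c1 = centroid([p for p, lab in zip(data, first) if lab == 1])
direction = tuple(b - a for a, b in zip(c0, c1))

# Remove that direction and re-cluster: the split is now bottom versus top.
alternative = two_means(project_out(data, direction))
```

Here the second clustering is exactly orthogonal to the first, which is the favourable case. When the alternative structure is not aligned with an orthogonal subspace, the same step can discard it, which is the "too strict" concern noted above.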

11 Simple Example
- Most existing techniques seem to work well (a canonical example)

12 Circle of Gaussians
- Techniques which trade off dissimilarity and quality are more likely to produce the second clustering
- Orthogonal projection doesn't work so well here

13 Other issues
- Evaluation: measuring quality/dissimilarity of alternatives
- Clustering setting:
  - Desired shape of clusters: spherical versus elongated, linear versus non-linear separation
  - Low versus high dimensionality data
  - Continuous versus discrete features
  - Soft versus hard clusters
  - EM versus k-means versus hierarchical versus constraint based
  - Number of clusters desired in each clustering

14 Alternative Clustering Evaluation
- Measuring dissimilarity: mathematical measures such as the Rand index, Jaccard index and normalised mutual information
- Measuring quality:
  - Internal validation measures: Dunn index, Davies-Bouldin index, silhouette width
  - External validation: synthetic examples
- Combine dissimilarity and quality into a single number, or present separately?
- Are these numbers useful?
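The three dissimilarity measures named above can be computed directly from two label vectors. The following is a minimal sketch using only the standard library. All three are similarity scores, so a low value indicates a dissimilar pair of clusterings. Normalised mutual information has several variants; this sketch assumes the geometric-mean normalisation, which may differ from the one used in a given paper.

```python
from collections import Counter
from math import comb, log, sqrt


def pair_counts(a, b):
    """Count object pairs by co-membership: together in both labelings,
    only in a, only in b, or in neither."""
    both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    only_a = same_a - both
    only_b = same_b - both
    neither = comb(len(a), 2) - both - only_a - only_b
    return both, only_a, only_b, neither


def rand_index(a, b):
    """Fraction of pairs on which the two labelings agree."""
    both, only_a, only_b, neither = pair_counts(a, b)
    return (both + neither) / (both + only_a + only_b + neither)


def jaccard_index(a, b):
    """Pairs together in both, over pairs together in at least one."""
    both, only_a, only_b, _ = pair_counts(a, b)
    denom = both + only_a + only_b
    return both / denom if denom else 1.0


def nmi(a, b):
    """Mutual information divided by sqrt(H(a) * H(b)) (assumed variant)."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha = -sum(c / n * log(c / n) for c in pa.values())
    hb = -sum(c / n * log(c / n) for c in pb.values())
    return mi / sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0


base = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same partition, different label names
crossing = [0, 1, 0, 1]     # cuts across every cluster of `base`
```

All three measures treat relabelled as identical to base, since they depend only on which objects are grouped together. For crossing, the Jaccard index and NMI both drop to zero while the Rand index stays at 1/3, because pairs separated in both clusterings still count as agreement.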

15 Where are we?
- Good existing algorithms for generation of one or two alternatives
  - Sequential generation
  - Simultaneous generation
- Not yet deployed on very large datasets
- Validated using assorted benchmark datasets and internal metrics

16 Open Issues
- What's the killer application?
  - Deployment of alternative clusterings
  - Need convincing use cases where consensus clustering is limited
- Objective function and performance measures
- How many alternatives is enough?
- How many clusters should be in an alternative clustering? The same number as the original clustering?

17 Open Issues cont.
- How to find alternative subspace clusters (rather than clusterings)?
- Visualisation of alternative clusterings
- More focused alternatives: "Give me another clustering which is similar in these respects and different in these other respects to the previous clustering"

18 Moving Forward
- Central repository of code and canonical examples (synthetic and real)
- Make alternative clustering algorithms accessible
- Identify cases in the literature of missing alternative clusterings

19 Bibliography
- E. Bae, J. Bailey and G. Dong. A Clustering Comparison Measure Using Density Profiles and its Application to the Discovery of Alternate Clusterings. To appear in Data Mining and Knowledge Discovery.
- D. Niu, J. G. Dy and M. I. Jordan. Multiple non-redundant spectral clustering views. Proc. of ICML 10.
- X. H. Dang and J. Bailey. A Hierarchical Information Theoretic Technique for the Discovery of Non Linear Alternative Clusterings. Proc. of KDD.
- X. H. Dang and J. Bailey. Generation of alternative clusterings using the CAMI approach. Proc. of SDM.
- Z. Qi and I. Davidson. A principled and flexible framework for finding alternative clusterings. Proc. of KDD.
- P. Jain, R. Meka and I. S. Dhillon. Simultaneous unsupervised learning of disparate clusterings. Proc. of SDM.
- I. Davidson and Z. Qi. Finding alternative clusterings using constraints. Proc. of ICDM.
- Y. Cui, X. Z. Fern and J. G. Dy. Non-redundant multi-view clustering via orthogonalization. Proc. of ICDM.
- E. Bae and J. Bailey. COALA: A novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. Proc. of ICDM.
- R. Caruana, M. Elhawary, N. Nguyen and C. Smith. Meta clustering. Proc. of ICDM.
- D. Gondek and T. Hofmann. Non-redundant clustering with conditional ensembles. Proc. of KDD.
- D. Gondek and T. Hofmann. Non-redundant data clustering. Proc. of ICDM 2004.


More information

Clustering Analysis Basics

Clustering Analysis Basics Clustering Analysis Basics Ke Chen Reading: [Ch. 7, EA], [5., KPM] Outline Introduction Data Types and Representations Distance Measures Major Clustering Methodologies Summary Introduction Cluster: A collection/group

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Clustering. Lecture 6, 1/24/03 ECS289A

Clustering. Lecture 6, 1/24/03 ECS289A Clustering Lecture 6, 1/24/03 What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

More information

Implementation of Fuzzy C-Means and Possibilistic C-Means Clustering Algorithms, Cluster Tendency Analysis and Cluster Validation

Implementation of Fuzzy C-Means and Possibilistic C-Means Clustering Algorithms, Cluster Tendency Analysis and Cluster Validation Implementation of Fuzzy C-Means and Possibilistic C-Means Clustering Algorithms, Cluster Tendency Analysis and Cluster Validation Md. Abu Bakr Siddiue *, Rezoana Bente Arif #, Mohammad Mahmudur Rahman

More information

Algorithm Engineering Applied To Graph Clustering

Algorithm Engineering Applied To Graph Clustering Algorithm Engineering Applied To Graph Clustering Insights and Open Questions in Designing Experimental Evaluations Marco 1 Workshop on Communities in Networks 14. March, 2008 Louvain-la-Neuve Outline

More information

Comparing Clusterings in Space

Comparing Clusterings in Space Michael H. Coen mhcoen@cs.wisc.edu M. Hidayath Ansari ansari@cs.wisc.edu Nathanael Fillmore nathanae@cs.wisc.edu University of Wisconsin-Madison, University Ave, Madison, WI 576 USA Abstract This paper

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

An Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs

An Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs An Introduction to Cluster Analysis Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs zhaoxia@ics.uci.edu 1 What can you say about the figure? signal C 0.0 0.5 1.0 1500 subjects Two

More information

Data mining with sparse grids

Data mining with sparse grids Data mining with sparse grids Jochen Garcke and Michael Griebel Institut für Angewandte Mathematik Universität Bonn Data mining with sparse grids p.1/40 Overview What is Data mining? Regularization networks

More information

Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler Overview What is clustering and its applications? Distance between two clusters. Hierarchical Agglomerative clustering.

More information

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING Daniela Joiţa Titu Maiorescu University, Bucharest, Romania danielajoita@utmro Abstract Discretization of real-valued data is often used as a pre-processing

More information

Constrained Co-clustering for Textual Documents

Constrained Co-clustering for Textual Documents Constrained Co-clustering for Textual Documents Yangqiu Song Shimei Pan Shixia Liu Furu Wei Michelle X. Zhou Weihong Qian {yqsong,liusx,weifuru,qianwh}@cn.ibm.com; shimei@us.ibm.com; mzhou@us.ibm.com IBM

More information

On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution

On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution ICML2011 Jun. 28-Jul. 2, 2011 On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution Masashi Sugiyama, Makoto Yamada, Manabu Kimura, and Hirotaka Hachiya Department of

More information

Expectation Maximization: Inferring model parameters and class labels

Expectation Maximization: Inferring model parameters and class labels Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/26/17 Jumble of unlabeled images HISTOGRAM blue

More information

DETECTION AND ROBUST ESTIMATION OF CYLINDER FEATURES IN POINT CLOUDS INTRODUCTION

DETECTION AND ROBUST ESTIMATION OF CYLINDER FEATURES IN POINT CLOUDS INTRODUCTION DETECTION AND ROBUST ESTIMATION OF CYLINDER FEATURES IN POINT CLOUDS Yun-Ting Su James Bethel Geomatics Engineering School of Civil Engineering Purdue University 550 Stadium Mall Drive, West Lafayette,

More information

Data Parallelism and the Support Vector Machine

Data Parallelism and the Support Vector Machine Data Parallelism and the Support Vector Machine Solomon Gibbs he support vector machine is a common algorithm for pattern classification. However, many of the most popular implementations are not suitable

More information