Cluster Validation. Ke Chen. Reading: [25.1.2, KPM], [Wang et al., 2009], [Yang & Chen, 2011] COMP24111 Machine Learning
|
|
- Lenard Butler
- 5 years ago
- Views:
Transcription
1 Cluster Validation Ke Chen Reading: [5.., KPM], [Wang et al., 9], [Yang & Chen, ] COMP4 Machine Learning
2 Outline Motivation and Background Internal index Motivation and general ideas Variance-based internal indexes Application: finding the proper cluster number External index Motivation and general ideas Rand Index Application: Weighted clustering ensemble Summary COMP4 Machine Learning
3 Motivation and Background Motivation Supervised classification Class labels known for ground truth Accuracy: based on labels given Clustering analysis No class labels Evaluation still demanded Validation needs to Oranges: Apples: Evaluation Accuracy = 8/= 8% P Compare clustering algorithms Solve the number of clusters Avoid finding patterns in noise Find the best clusters from data COMP4 Machine Learning 3
4 Motivation and Background Illustrative Example: which one is the best? Data Set (Random Points) K-means (K=3) y y x K-means (K=3) y y x Agglomerative (Complete Link) x COMP4 Machine Learning x 4
5 Motivation and Background Cluster validation refers to procedures that evaluate the results of clustering in a quantitative and objective fashion. How to be quantitative : To employ the measures. How to be objective : To validate the measures! INPUT: DataSet(X) Clustering Algorithm(s) Partitions P Validity Index m* Different settings/configurations COMP4 Machine Learning 5
6 Motivation and Background Internal Criteria (Indexes) Validate without external information With different number of clusters Solve the number of clusters?? External Criteria (Indexes) Validate against ground truth Compare two partitions: (how similar)?? COMP4 Machine Learning 6
7 Internal Index Ground truth is unavailable but unsupervised validation must be done with common sense or a priori knowledge. There are a variety of internal indexes: Variances-based methods Rate-distortion methods Davies-Bouldin index (DBI) Bayesian Information Criterion (BIC) Silhouette Coefficient Minimum description principle (MDL) Stochastic complexity (SC) Modified Huber s Г (MHГ) index COMP4 Machine Learning 7
8 Internal Index Variances-based methods Minimise within cluster variance (SSW) Maximise between cluster variance (SSB) Intra-cluster variance is minimized Inter-cluster variance is maximized c COMP4 Machine Learning 8
9 Internal Index Variances-based methods (cont.) Assume an algorithm leads to a partition of K clusters where cluster i has nn ii data points and ci is its centroid. d(.,.) is a distance used in this algorithm. Within cluster variance (SSW) SSW K n ( K) = d ( x, c i i= j= ij i ) Between cluster variance (SSB) SSB K = n d ( ) i ( ci, c) where c is the global mean (centroid) of the whole data set. K i= COMP4 Machine Learning 9
10 Internal Index Variance based F-ratio index Measures ratio of between-cluster variance against the withincluster variance (original F-test) F-ratio index (W-B index) for a partition of K clusters F( K) where x n K * SSW ( K) SSB( K) K i= j= = = K ij i is the K i= n i n d i d ( x ( c i ij, c, c) jth data point in cluster is the number of data points in cluster i ) c i c i COMP4 Machine Learning
11 Internal Index Application: finding the proper cluster number F-ratio (x^5) S minimum Algorithm PNN Algorithm IS Clusters COMP4 Machine Learning
12 .5.4 S4 Internal Index Application: finding the proper cluster number F-ratio.3.. minimum at 6 Algorithm PNN Algorithm IS Number of clusters COMP4 Machine Learning minimum at 5
13 External Index Ground truth is available but an clustering algorithm doesn t use such information during unsupervised learning. There are a variety of internal indexes: Rand Index Adjusted Rand Index Pair counting index Information theoretic index Set matching index DVI index Normalised mutual information (NMI) index COMP4 Machine Learning 3
14 External Index Main issues If the ground truth is known, the validity of a clustering can be verified by comparing the class or clustering labels. However, this is much more complicated than in supervised classification (where labels used in training) The cluster IDs in a partition resulting from clustering have been assigned arbitrarily due to unsupervised learning permutation. The number of clusters may be different from the number of classes (the ground truth ) inconsistence. The most important problem in external indexes would be how to find all possible correspondences between the ground truth and a partition (or two candidate partitions in the case of comparison). COMP4 Machine Learning 4
15 External Index Rand Index This is the first external index proposed by Rand (97) to address the correspondence problem. Basic idea: considering all pairs in the data set by looking into both agreement and disagreement against the ground truth The index defined as RI (X, Y)= (a+d)/(a+b+c+d) X/Y Y: Pairs in the same class Y: Pairs in different classes X: Pairs in the same cluster a b X: Pairs in different clusters c d COMP4 Machine Learning 5
16 External Index Rand Index (cont.) Example: for a 5-point data set, we have X i ii ii i i Y q p q p q To calculate a, b, c and d, we have to list all possible pairs (excluding any data point to itself) [, ], [, 3], [, 4], [, 5] [, 3], [, 4], [, 5] [3, 4], [3, 5] [4, 5] COMP4 Machine Learning 6
17 External Index Rand Index (cont.) Example: for a 5-point data set, we have X i ii ii i i Y q p q p q Initialisation: a, b, c, d For data point pair [, ], we have X: [, ] assigned to (i, ii) > in different clusters Y: [, ] labelled by (q, p) > in different classes Thus, d d + = COMP4 Machine Learning 7
18 External Index Rand Index (cont.) Example: for a 5-point data set, we have X i ii ii i i Y q p q p q Current Status: a=, b=, c=, d = For data point pair [, 3], we have X: [, 3] assigned to (i, ii) > in different clusters Y: [, 3] labelled by (q, q) > in the same class Thus, c c + = COMP4 Machine Learning 8
19 External Index Rand Index (cont.) Example: for a 5-point data set, we have X i ii ii i i Y q p q p q Current Status: a=, b=, c=, d = For data point pair [, 4], we have X: [, 4] assigned to (i, i) > in the same cluster Y: [, 4] labelled by (q, p) > in different classes Thus, b b + = COMP4 Machine Learning 9
20 External Index Rand Index (cont.) Example: for a 5-point data set, we have X i ii ii i i Y q p q p q Current Status: a=, b=, c=, d = For data point pair [, 5], we have X: [, 5] assigned to (i, i) > in the same cluster Y: [, 5] labelled by (q, q) > in the same class Thus, a a + = COMP4 Machine Learning
21 External Index Rand Index (cont.) Example: for a 5-point data set, we have X i ii ii i i Y q p q p q Current Status: a=, b=, c=, d = On-class Exercise: continuing until have the final value of a, b, c and d. COMP4 Machine Learning
22 External Index Rand Index: Contingency Table In general, a, b, c and d are calculated from a contingency table Assume there are k clusters in X and l classes in Y nn iiii : the number of points in both cluster ii and class jj COMP4 Machine Learning
23 External Index Rand Index: Contingency Table (cont.) Example: for a 5-point data set, we have X i ii ii i i Y q p q p q Contingency table p q i 3 ii 3 5 COMP4 Machine Learning 3
24 External Index Rand Index: measure the number of pairs in Same cluster/class in X and Y a = k l i= j= n ij ( n ij ) Same cluster in X /different classes in Y b = ( l n j n ij ). j= i= j= Different clusters in X / same class in Y c = ( k Different clusters/classes in X and Y d = k n i n ij ). i= i= j= ( N + k k l l l i= j= n ij ( k i= n i. + l j= n. j )) Ex. 4 COMP4 Machine Learning
25 Weighted Clustering Ensemble Motivation Evidence-accumulation based clustering ensemble (Fred & Jain, 5) is a simple clustering ensemble algorithm that use the evidence accumulated from multiple yet diversified partitions generated with different algorithms, initial conditions, different distance metrics, However, this clustering ensemble algorithm is subject to limitations: Don t distinguish between non-trivial and trivial partitions; i.e., all partitions used in a clustering ensemble are treated equally important. Sensitive to a cluster distance used in the hierarchical clustering for reaching a consensus Clustering validity indexes provide an effective way to measure non-trivialness or importance of a partition Internal index: directly measure the importance of a partition quantitatively and objectively External index: indirectly measure the importance of a partition via comparing with other partitions used in clustering ensemble for another round evidence accumulation COMP4 Machine Learning 5
26 Weighted Clustering Ensemble Use validity index values as weights Yun Yang and Ke Chen, Temporal data clustering via weighted clustering ensemble with different representations, IEEE Transactions on Knowledge and Data Engineering 3(), pp. 37-3,. Chen & Yang: Temporal Data Clustering with Different Representations COMP4 Machine Learning 6
27 Weighted Clustering Ensemble Example: convert clustering results into binary Distance matrix A B C D Cluster (C) Cluster (C) = D A D C B D C A B distance Matrix 7 Chen & Yang: Temporal Data Clustering with Different Representations
28 Weighted Clustering Ensemble Example: convert clustering results into binary Distance matrix A B C D Cluster (C) Cluster (C) = D A D C B D C A B distance Matrix 8 Chen & Yang: Temporal Data Clustering with Different Representations Cluster 3 (C3)
29 Weighted Clustering Ensemble Evidence accumulation: form the collective weighted distance matrix 9 Chen & Yang: Temporal Data Clustering with Different Representations = D = D ] ) ( ) [( 3 D w w w D w w w D N D M N D M E =
30 Weighted Clustering Ensemble Clustering analysis with our weighted clustering ensemble on CAVIAR database annotated video sequences of pedestrians, a set of high-quality moving trajectories clustering analysis of trajectories is useful for many applications Experimental setting Representations: PCF, DCF, PLS and PDWT Initial clustering analysis: K-mean algorithm (4<K<), 6 initial settings Ensemble: 3 partitions totally (8 partitions/representation) Chen & Yang: Temporal Data Clustering with Different Representations 3
31 Weighted Clustering Ensemble Clustering results on CAVIAR database Chen & Yang: Temporal Data Clustering with Different Representations 3
32 Weighted Clustering Ensemble Application: UCR time series benchmarks COMP4 Machine Learning 3
33 Weighted Clustering Ensemble Application: Rand index (%) values of clustering ensembles (Yang & Chen ) COMP4 Machine Learning 33
34 Summary Cluster validation is a process that evaluate clustering results with a pre-defined criterion. Two different types of cluster validation methods Internal indexes no ground truth available defined based on common sense or a priori knowledge Application: finding the proper number of clusters, External indexes ground truth known or reference given ( relative index ) Application: performance evaluation of clustering with reference information Apart from direct evaluation, both indexes may be applied in weighted clustering ensemble, leading to better yet robust results by ignoring trivial partitions. K. Wang et al, CVAP: Validation for cluster analysis, Data Science Journal, vol. 8, May 9. [Code online available: COMP4 Machine Learning 34
Hierarchical and Ensemble Clustering
Hierarchical and Ensemble Clustering Ke Chen Reading: [7.8-7., EA], [25.5, KPM], [Fred & Jain, 25] COMP24 Machine Learning Outline Introduction Cluster Distance Measures Agglomerative Algorithm Example
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu November 7, 2017 Learnt Clustering Methods Vector Data Set Data Sequence Data Text
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationConsensus Clustering. Javier Béjar URL - Spring 2019 CS - MAI
Consensus Clustering Javier Béjar URL - Spring 2019 CS - MAI Consensus Clustering The ensemble of classifiers is a well established strategy in supervised learning Unsupervised learning aims the same goal:
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Chapter 16 Flat Clustering Dell Zhang Birkbeck, University of London What Is Text Clustering? Text Clustering = Grouping a set of documents into classes of similar
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationWorkload Characterization Techniques
Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/
More informationCOMP 465: Data Mining Still More on Clustering
3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning
More informationECS 234: Data Analysis: Clustering ECS 234
: Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationUsing the Kolmogorov-Smirnov Test for Image Segmentation
Using the Kolmogorov-Smirnov Test for Image Segmentation Yong Jae Lee CS395T Computational Statistics Final Project Report May 6th, 2009 I. INTRODUCTION Image segmentation is a fundamental task in computer
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Wiltrud Kessler & Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 0-- / 83
More informationRoad map. Basic concepts
Clustering Basic concepts Road map K-means algorithm Representation of clusters Hierarchical clustering Distance functions Data standardization Handling mixed attributes Which clustering algorithm to use?
More informationFlat Clustering. Slides are mostly from Hinrich Schütze. March 27, 2017
Flat Clustering Slides are mostly from Hinrich Schütze March 7, 07 / 79 Overview Recap Clustering: Introduction 3 Clustering in IR 4 K-means 5 Evaluation 6 How many clusters? / 79 Outline Recap Clustering:
More informationTHE AREA UNDER THE ROC CURVE AS A CRITERION FOR CLUSTERING EVALUATION
THE AREA UNDER THE ROC CURVE AS A CRITERION FOR CLUSTERING EVALUATION Helena Aidos, Robert P.W. Duin and Ana Fred Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal Pattern Recognition
More informationChapter 6 Continued: Partitioning Methods
Chapter 6 Continued: Partitioning Methods Partitioning methods fix the number of clusters k and seek the best possible partition for that k. The goal is to choose the partition which gives the optimal
More informationData Clustering. Danushka Bollegala
Data Clustering Danushka Bollegala Outline Why cluster data? Clustering as unsupervised learning Clustering algorithms k-means, k-medoids agglomerative clustering Brown s clustering Spectral clustering
More informationAlternative Clusterings: Current Progress and Open Challenges
Alternative Clusterings: Current Progress and Open Challenges James Bailey Department of Computer Science and Software Engineering The University of Melbourne, Australia 1 Introduction Cluster analysis:
More informationPV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211
PV: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv IIR 6: Flat Clustering Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University, Brno Center
More informationCSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16)
CSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16) Michael Hahsler Southern Methodist University These slides are largely based on the slides by Hinrich Schütze Institute for
More informationTypes of general clustering methods. Clustering Algorithms for general similarity measures. Similarity between clusters
Types of general clustering methods Clustering Algorithms for general similarity measures agglomerative versus divisive algorithms agglomerative = bottom-up build up clusters from single objects divisive
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationClustering Algorithms for general similarity measures
Types of general clustering methods Clustering Algorithms for general similarity measures general similarity measure: specified by object X object similarity matrix 1 constructive algorithms agglomerative
More informationCS490W. Text Clustering. Luo Si. Department of Computer Science Purdue University
CS490W Text Clustering Luo Si Department of Computer Science Purdue University [Borrows slides from Chris Manning, Ray Mooney and Soumen Chakrabarti] Clustering Document clustering Motivations Document
More informationUnsupervised Learning
Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised
More informationCluster Analysis. Angela Montanari and Laura Anderlucci
Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a
More informationUnsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi
Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 4th, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig The Cluster
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised
More informationBinary Images Clustering with k-means
Binary Images Clustering with k-means João Ferreira Nunes Instituto Politécnico de Viana do Castelo joao.nunes@estg.ipvc.pt Faculdade de Engenharia da Universidade do Porto pro09001@fe.up.pt Presentation
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationChapter DM:II. II. Cluster Analysis
Chapter DM:II II. Cluster Analysis Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster Analysis DM:II-1
More informationClustering and Dissimilarity Measures. Clustering. Dissimilarity Measures. Cluster Analysis. Perceptually-Inspired Measures
Clustering and Dissimilarity Measures Clustering APR Course, Delft, The Netherlands Marco Loog May 19, 2008 1 What salient structures exist in the data? How many clusters? May 19, 2008 2 Cluster Analysis
More informationClustering Analysis Basics
Clustering Analysis Basics Ke Chen Reading: [Ch. 7, EA], [5., KPM] Outline Introduction Data Types and Representations Distance Measures Major Clustering Methodologies Summary Introduction Cluster: A collection/group
More informationTHE ENSEMBLE CONCEPTUAL CLUSTERING OF SYMBOLIC DATA FOR CUSTOMER LOYALTY ANALYSIS
THE ENSEMBLE CONCEPTUAL CLUSTERING OF SYMBOLIC DATA FOR CUSTOMER LOYALTY ANALYSIS Marcin Pełka 1 1 Wroclaw University of Economics, Faculty of Economics, Management and Tourism, Department of Econometrics
More informationClustering Lecture 3: Hierarchical Methods
Clustering Lecture 3: Hierarchical Methods Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More informationBig Data Analytics! Special Topics for Computer Science CSE CSE Feb 9
Big Data Analytics! Special Topics for Computer Science CSE 4095-001 CSE 5095-005! Feb 9 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Clustering I What
More informationDistances, Clustering! Rafael Irizarry!
Distances, Clustering! Rafael Irizarry! Heatmaps! Distance! Clustering organizes things that are close into groups! What does it mean for two genes to be close?! What does it mean for two samples to
More informationTri-modal Human Body Segmentation
Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4
More informationSYDE Winter 2011 Introduction to Pattern Recognition. Clustering
SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned
More informationMachine Learning. Unsupervised Learning. Manfred Huber
Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training
More informationCluster analysis. Agnieszka Nowak - Brzezinska
Cluster analysis Agnieszka Nowak - Brzezinska Outline of lecture What is cluster analysis? Clustering algorithms Measures of Cluster Validity What is Cluster Analysis? Finding groups of objects such that
More informationClustering. Lecture 6, 1/24/03 ECS289A
Clustering Lecture 6, 1/24/03 What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationLearning the Three Factors of a Non-overlapping Multi-camera Network Topology
Learning the Three Factors of a Non-overlapping Multi-camera Network Topology Xiaotang Chen, Kaiqi Huang, and Tieniu Tan National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationExtracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang
Extracting Layers and Recognizing Features for Automatic Map Understanding Yao-Yi Chiang 0 Outline Introduction/ Problem Motivation Map Processing Overview Map Decomposition Feature Recognition Discussion
More information5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction
Computational Methods for Data Analysis Massimo Poesio UNSUPERVISED LEARNING Clustering Unsupervised learning introduction 1 Supervised learning Training set: Unsupervised learning Training set: 2 Clustering
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More informationExploratory Analysis: Clustering
Exploratory Analysis: Clustering (some material taken or adapted from slides by Hinrich Schutze) Heejun Kim June 26, 2018 Clustering objective Grouping documents or instances into subsets or clusters Documents
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 5
Clustering Part 5 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville SNN Approach to Clustering Ordinary distance measures have problems Euclidean
More informationImproving the Efficiency of Fast Using Semantic Similarity Algorithm
International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year
More informationBioimage Informatics
Bioimage Informatics Lecture 13, Spring 2012 Bioimage Data Analysis (IV) Image Segmentation (part 2) Lecture 13 February 29, 2012 1 Outline Review: Steger s line/curve detection algorithm Intensity thresholding
More informationAccumulation. Instituto Superior Técnico / Instituto de Telecomunicações. Av. Rovisco Pais, Lisboa, Portugal.
Combining Multiple Clusterings Using Evidence Accumulation Ana L.N. Fred and Anil K. Jain + Instituto Superior Técnico / Instituto de Telecomunicações Av. Rovisco Pais, 149-1 Lisboa, Portugal email: afred@lx.it.pt
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 16: Flat Clustering Hinrich Schütze Institute for Natural Language Processing, Universität Stuttgart 2009.06.16 1/ 64 Overview
More informationPart I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a
Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering
More informationCluster Analysis (b) Lijun Zhang
Cluster Analysis (b) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Grid-Based and Density-Based Algorithms Graph-Based Algorithms Non-negative Matrix Factorization Cluster Validation Summary
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More information10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2
161 Machine Learning Hierarchical clustering Reading: Bishop: 9-9.2 Second half: Overview Clustering - Hierarchical, semi-supervised learning Graphical models - Bayesian networks, HMMs, Reasoning under
More informationClustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search
Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2
More informationAlgorithm Engineering Applied To Graph Clustering
Algorithm Engineering Applied To Graph Clustering Insights and Open Questions in Designing Experimental Evaluations Marco 1 Workshop on Communities in Networks 14. March, 2008 Louvain-la-Neuve Outline
More informationhttp://www.xkcd.com/233/ Text Clustering David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Administrative 2 nd status reports Paper review
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Hinrich Schütze Center for Information and Language Processing, University of Munich 04-06- /86 Overview Recap
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationDETECTING RESOLVERS AT.NZ. Jing Qiao, Sebastian Castro DNS-OARC 29 Amsterdam, October 2018
DETECTING RESOLVERS AT.NZ Jing Qiao, Sebastian Castro DNS-OARC 29 Amsterdam, October 2018 BACKGROUND DNS-OARC 29 2 DNS TRAFFIC IS NOISY Despite general belief, not all the sources at auth nameserver are
More informationIDENTIFYING BEHAVIORS IN CROWDED SCENES THROUGH STABILITY ANALYSIS FOR DYNAMICAL SYSTEMS
IDENTIFYING BEHAVIORS IN CROWDED SCENES THROUGH STABILITY ANALYSIS FOR DYNAMICAL SYSTEMS Berkan Solmaz, Dr. Brian Moore and Dr. Mubarak Shah 1/21 Outline Behaviors to be Identified Theoretical Concepts
More informationScalable Clustering of Signed Networks Using Balance Normalized Cut
Scalable Clustering of Signed Networks Using Balance Normalized Cut Kai-Yang Chiang,, Inderjit S. Dhillon The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Oct.
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationCluster Analysis. CSE634 Data Mining
Cluster Analysis CSE634 Data Mining Agenda Introduction Clustering Requirements Data Representation Partitioning Methods K-Means Clustering K-Medoids Clustering Constrained K-Means clustering Introduction
More informationCluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University
Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Kinds of Clustering Sequential Fast Cost Optimization Fixed number of clusters Hierarchical
More informationClustering CE-324: Modern Information Retrieval Sharif University of Technology
Clustering CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford) Ch. 16 What
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationUnsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection
More informationTexture Image Segmentation using FCM
Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore Texture Image Segmentation using FCM Kanchan S. Deshmukh + M.G.M
More informationAutomatic Domain Partitioning for Multi-Domain Learning
Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels
More informationImproving Cluster Method Quality by Validity Indices
Improving Cluster Method Quality by Validity Indices N. Hachani and H. Ounalli Faculty of Sciences of Bizerte, Tunisia narjes hachani@yahoo.fr Faculty of Sciences of Tunis, Tunisia habib.ounalli@fst.rnu.tn
More informationAN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION WILLIAM ROBSON SCHWARTZ University of Maryland, Department of Computer Science College Park, MD, USA, 20742-327, schwartz@cs.umd.edu RICARDO
More informationChapter 9. Classification and Clustering
Chapter 9 Classification and Clustering Classification and Clustering Classification and clustering are classical pattern recognition and machine learning problems Classification, also referred to as categorization
More informationClustering (Basic concepts and Algorithms) Entscheidungsunterstützungssysteme
Clustering (Basic concepts and Algorithms) Entscheidungsunterstützungssysteme Why do we need to find similarity? Similarity underlies many data science methods and solutions to business problems. Some
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationHigh throughput Data Analysis 2. Cluster Analysis
High throughput Data Analysis 2 Cluster Analysis Overview Why clustering? Hierarchical clustering K means clustering Issues with above two Other methods Quality of clustering results Introduction WHY DO
More information6. Dicretization methods 6.1 The purpose of discretization
6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many
More informationECM A Novel On-line, Evolving Clustering Method and Its Applications
ECM A Novel On-line, Evolving Clustering Method and Its Applications Qun Song 1 and Nikola Kasabov 2 1, 2 Department of Information Science, University of Otago P.O Box 56, Dunedin, New Zealand (E-mail:
More informationRobust Clustering of PET data
Robust Clustering of PET data Prasanna K Velamuru Dr. Hongbin Guo (Postdoctoral Researcher), Dr. Rosemary Renaut (Director, Computational Biosciences Program,) Dr. Kewei Chen (Banner Health PET Center,
More information10. Clustering. Introduction to Bioinformatics Jarkko Salojärvi. Based on lecture slides by Samuel Kaski
10. Clustering Introduction to Bioinformatics 30.9.2008 Jarkko Salojärvi Based on lecture slides by Samuel Kaski Definition of a cluster Typically either 1. A group of mutually similar samples, or 2. A
More informationFoundations of Machine Learning CentraleSupélec Fall Clustering Chloé-Agathe Azencot
Foundations of Machine Learning CentraleSupélec Fall 2017 12. Clustering Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning objectives
More informationCommunity Detection: Comparison of State of the Art Algorithms
Community Detection: Comparison of State of the Art Algorithms Josiane Mothe IRIT, UMR5505 CNRS & ESPE, Univ. de Toulouse Toulouse, France e-mail: josiane.mothe@irit.fr Karen Mkhitaryan Institute for Informatics
More informationUnsupervised Learning
Unsupervised Learning Fabio G. Cozman - fgcozman@usp.br November 16, 2018 What can we do? We just have a dataset with features (no labels, no response). We want to understand the data... no easy to define
More informationMachine Learning: Symbol-based
10d Machine Learning: Symbol-based More clustering examples 10.5 Knowledge and Learning 10.6 Unsupervised Learning 10.7 Reinforcement Learning 10.8 Epilogue and References 10.9 Exercises Additional references
More informationCluster Analysis: Agglomerate Hierarchical Clustering
Cluster Analysis: Agglomerate Hierarchical Clustering Yonghee Lee Department of Statistics, The University of Seoul Oct 29, 2015 Contents 1 Cluster Analysis Introduction Distance matrix Agglomerative Hierarchical
More informationUnsupervised Learning
Unsupervised Learning A review of clustering and other exploratory data analysis methods HST.951J: Medical Decision Support Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision
More informationRelative Clustering Validity Criteria: A Comparative Overview
Relative Clustering Validity Criteria: A Comparative Overview Lucas Vendramin, Ricardo J. G. B. Campello and Eduardo R. Hruschka Department of Computer Sciences of the University of São Paulo at São Carlos,
More information