Cluster Analysis, Multidimensional Scaling and Graph Theory

Dpto. de Estadística, E.E. y O.E.I., Universidad de Alcalá. luisf.rivera@uah.es

Outline
1. The problem of statistical classification
2. Cluster analysis
3. Multidimensional scaling and graph theory
4. The adjacency matrix
5. The Iris data
6. Conclusions and references

1. The problem of Statistical Classification. Introduction. The identification of groups of similar cases is a very important task in everyday research. Measurements of p variables may be taken on n individuals, giving the data matrix

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

What is the group structure of these cases?

1. The problem of Statistical Classification. Classification and statistical learning. Classification systems look for a rule to classify objects. They can be supervised or unsupervised, depending on whether prior knowledge of the classes to which the objects belong is available. Classical methods: Discriminant Analysis and Cluster Analysis. Modern methods: statistical learning, both supervised and unsupervised.

1. The problem of Statistical Classification. Supervised vs. unsupervised learning (I). To develop a supervised classification system, it is necessary to know the classes (C) into which the population is divided, and also the class to which each observed individual belongs. That is, for each case i we must know its class label from the set {1, 2, ..., C}:

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad y_i \in \{1, 2, \dots, C\}, \quad i = 1, \dots, n.$$

1. The problem of Statistical Classification. Supervised vs. unsupervised learning (II). A supervised classification system provides some kind of mathematical function Y = Y(X, w), where w is a vector of parameters adjusted from the data. The values of these parameters are determined by a learning algorithm, which usually tries to minimize a function of the classification error. Supervised classifiers: Discriminant Analysis, neural networks, SVM, trees, ...

1. The problem of Statistical Classification. Supervised vs. unsupervised learning (III). Unsupervised classification tries to find the group structure that exists in the data in a natural way. Normally the real classes (C) in the population are unknown, so there is no knowledge of the class each object belongs to. This kind of problem is sometimes referred to as pattern recognition, in the sense that the aim is to discover classes of objects in the data.

1. The problem of Statistical Classification. Supervised vs. unsupervised learning (IV). Unsupervised classification algorithms seek to divide the data set into groups or classes of elements. Normally, a group is described as a set of similar cases that are different from the cases classified in other groups. It is therefore necessary to find a way of measuring the closeness between cases; dissimilarity measures are used for this. Unsupervised classifiers: cluster analysis, neural networks, k-NN, ...


2. Cluster analysis. Introduction. The purpose of cluster analysis is to discover groups of elements in data, subject to the following properties: each element belongs to only one group; every element must be classified into some group; elements in a group must be homogeneous (similar) to each other and different from elements in other groups. Clustering methods can be partitioning (which work directly on the elements of the dataset) or hierarchical (which work on the distances between elements of the dataset).

2. Cluster analysis. Example. Let's consider this two-dimensional dataset of 20 cases:

X1: 2.25 2.50 2.25 3.00 3.25 2.75 3.50 3.25 3.75 4.00 2.25 2.50 2.75 2.50 2.75 4.00 4.25 4.25 4.50 4.50
X2: 3.50 4.00 3.00 3.50 3.00 3.25 2.25 2.00 2.50 2.25 1.00 1.75 1.25 1.50 1.50 0.00 1.00 0.25 0.50 0.75

How many groups are there?

2. Cluster analysis. Example: k-means (I). [Scatter plots of the k-means solutions for k = 2 and k = 3.]

2. Cluster analysis. Example: k-means (II). [Scatter plots of the k-means solutions for k = 4 and k = 5.] What is the structure of this dataset?
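The slides show only plots, so here is a minimal sketch of the same k-means experiment in Python, assuming scikit-learn is available (the presentation itself appears to have been produced with other software); the 20 points are transcribed from the table above:

```python
import numpy as np
from sklearn.cluster import KMeans

# The two-dimensional example dataset from the slide.
X = np.array([
    [2.25, 3.50], [2.50, 4.00], [2.25, 3.00], [3.00, 3.50], [3.25, 3.00],
    [2.75, 3.25], [3.50, 2.25], [3.25, 2.00], [3.75, 2.50], [4.00, 2.25],
    [2.25, 1.00], [2.50, 1.75], [2.75, 1.25], [2.50, 1.50], [2.75, 1.50],
    [4.00, 0.00], [4.25, 1.00], [4.25, 0.25], [4.50, 0.50], [4.50, 0.75],
])

# Run k-means for the four values of k shown on the two slides above.
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k = {k}: {labels}")
```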

2. Cluster analysis. Example: hierarchical methods. While k-means works on the data matrix itself, hierarchical methods work on the distances between the cases of the dataset (an n x n matrix). [The slide shows the full 20 x 20 matrix of Euclidean distances between the cases, labelled in Spanish "Matriz de distancias, distancia euclídea" (distance matrix, Euclidean distance) with the note "Esta es una matriz de disimilaridades" (this is a dissimilarity matrix).] Which proximity measure is better? Which clustering method should be used?
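The distance matrix itself is easy to recompute; a sketch with SciPy, reusing the array X defined above:

```python
from scipy.spatial.distance import pdist, squareform

# Condensed vector of the n(n-1)/2 pairwise Euclidean distances,
# expanded to the full symmetric n x n dissimilarity matrix.
D = squareform(pdist(X, metric="euclidean"))

print(D.shape)             # (20, 20)
print(round(D[0, 1], 3))   # 0.559, the slide's distance between cases 1 and 2
```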

2. Cluster analysis. Example: hierarchical methods, single linkage. Dendrogram using single linkage (rescaled distance cluster combine). [Character-art dendrogram omitted.]

2. Cluster analysis. Example: hierarchical methods, complete linkage. Dendrogram using complete linkage (rescaled distance cluster combine). [Character-art dendrogram omitted.]

2. Cluster analysis. Example: hierarchical methods, centroid method. Dendrogram using the centroid method (rescaled distance cluster combine). [Character-art dendrogram omitted.]
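The three dendrograms can be reproduced as a sketch with SciPy's hierarchical clustering (the original character-art output appears to come from SPSS):

```python
from scipy.cluster.hierarchy import linkage, fcluster

# Compare the three linkage strategies shown on the slides; X is the
# 20 x 2 data array defined earlier.
for method in ("single", "complete", "centroid"):
    Z = linkage(X, method=method)                    # build the cluster tree
    labels = fcluster(Z, t=4, criterion="maxclust")  # cut it into 4 groups
    print(method, labels)
```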

2. Cluster analysis. Shortcomings. In hierarchical methods, decisions such as the proximity measure and the clustering method have to be made, and the results depend on those decisions. k-means can be used only if the Euclidean distance is valid for the variables in the dataset. In both methodologies there is no specific criterion to determine the number of groups. When the dimensionality of the problem grows, no geometric interpretation is possible.


3. Multidimensional scaling and graph theory. Graphs. A (weighted) graph on V is a pair G = (V, E), where V is the set of nodes and E is the set of edges or lines that connect them. The edges connect nodes from V and define the shape of G. In graph theory, only the essentials of the drawing matter: the layout of the edges is irrelevant, only which nodes they connect. The position of the nodes is not important either, so they can be moved to obtain a simpler drawing of the graph. In the unsupervised classification problem, the cases are the nodes of the graph and the dissimilarity matrix determines the set of edges. Initially, the graph is complete (every pair of nodes is connected).

3. Multidimensional scaling and graph theory. Graphs. Example. In the graph representation of our example, each node (case) is connected with all the others; if there are n nodes, then there are n(n-1)/2 edges. For graph theory and cluster analysis to meet, V and E must be given an adequate structure.

3. Multidimensional scaling and graph theory. Multidimensional scaling. Multidimensional scaling (MDS) is a statistical method for representing a set of cases, for which a matrix of proximities is known, by a configuration of points in a low-dimensional Euclidean space, in such a way that the Euclidean distances between the points in the new space reproduce the original dissimilarities. This method is useful to place the cases of a classification problem in a Euclidean space, in which using k-means clustering or a hierarchical method based on Euclidean distance becomes equivalent.
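As a sketch, scikit-learn's metric MDS can map a precomputed dissimilarity matrix, such as the matrix D computed above, to a planar configuration whose Euclidean distances approximate the original dissimilarities:

```python
from sklearn.manifold import MDS

# dissimilarity="precomputed" tells MDS to treat D as distances
# rather than as raw coordinates.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)   # n x 2 Euclidean configuration
```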

3. Multidimensional scaling and graph theory. Multidimensional scaling: where does it come in? Data matrix X (dim. n x p) -> proximities -> multidimensional scaling -> a Euclidean configuration (dim. n x m, with m << p) supplying the nodes V, together with its matrix of Euclidean distances supplying the edges E.

3. Multidimensional scaling and graph theory. Cluster analysis related to graph theory. Applying multidimensional scaling to the dissimilarity matrix between the cases of a dataset harmonizes the techniques of cluster analysis with classification in graph theory. Thus the classification problem is reduced to analysing the distribution of the edges of a graph, taking into account the distances in the Euclidean space derived from multidimensional scaling.


4. The adjacency matrix. Introduction. In a graph, the adjacency matrix is the most important element, because it can be used to analyse the connectivity between nodes (i.e., between cases of a dataset). Pursuing the analogy with cluster analysis: for two nodes of the graph, the stronger their connection (the smaller their distance), the more similar they are. Not every edge has the same importance; if there are long edges, it may be useless to take them into account, as they connect very different cases. It is necessary to find a strategy to define the number of groups in a dataset in terms of the distribution of the edges.

4. The adjacency matrix. Distribution of edges: finding a threshold. The edges represent the Euclidean distances between the cases of the dataset in the Euclidean space derived by multidimensional scaling. The distribution of these distances can give us some clues about the existence of group structure in the data. Cases can be classified into groups if a suitable threshold is selected, for example: the mean value; half of the mean value; the median; ... (see the sketch below).
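A sketch of this thresholding idea: keep only the edges shorter than the cutoff and read the groups off as the connected components of the resulting graph. Here D is the Euclidean distance matrix from the sketches above:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def groups_from_threshold(D, threshold):
    """Groups = connected components of the thresholded graph."""
    A = (D < threshold).astype(int)   # adjacency: keep an edge iff its length < threshold
    np.fill_diagonal(A, 0)            # no self-loops
    return connected_components(A, directed=False)

edges = D[np.triu_indices_from(D, k=1)]   # the n(n-1)/2 edge lengths
for t in (edges.mean(), edges.mean() / 2, np.median(edges)):
    n_groups, _ = groups_from_threshold(D, t)
    print(f"threshold {t:.4f}: {n_groups} group(s)")
```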

4. The adjacency matrix. Distribution of edges. Example (I). [Histogram of the distribution of edge lengths for the two-dimensional example. Mean = 1.7858, mean/2 = 0.8929, median = 1.8028.]

4. The adjacency matrix. Distribution of edges. Example (II). Threshold = mean = 1.7858: one group. [Thresholded graph omitted.]

4. The adjacency matrix. Distribution of edges. Example (III). Threshold = mean/2 = 0.8929: two groups. [Thresholded graph omitted.]

4. The adjacency matrix. Distribution of edges. Example (IV). Threshold = mean/3 = 0.5953: four groups. [Thresholded graph omitted.]

4. The adjacency matrix. Distribution of edges. Example (V). Threshold = 0.6718, the smallest mode of the kernel density estimate (density 0.2747 at that point): four groups. [Kernel density plot omitted.]
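The slides do not spell out how the "smallest mode" is computed; one plausible reading (an assumption, not the author's procedure) is the leftmost local maximum of a Gaussian kernel density estimate of the edge lengths:

```python
import numpy as np
from scipy.stats import gaussian_kde

def smallest_mode(edges, grid_size=512):
    """Location of the leftmost local maximum of a Gaussian KDE."""
    kde = gaussian_kde(edges)
    xs = np.linspace(edges.min(), edges.max(), grid_size)
    ys = kde(xs)
    # Interior grid points that are higher than both of their neighbours.
    peaks = np.where((ys[1:-1] > ys[:-2]) & (ys[1:-1] > ys[2:]))[0] + 1
    return xs[peaks[0]]

# Reusing `edges` and `groups_from_threshold` from the previous sketch:
t = smallest_mode(edges)
n_groups, _ = groups_from_threshold(D, t)
print(f"threshold {t:.4f}: {n_groups} group(s)")
```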


5. The Iris data. Fisher, R.A.: "The use of multiple measurements in taxonomic problems". Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950). The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. Variables: 1. sepal length in cm; 2. sepal width in cm; 3. petal length in cm; 4. petal width in cm. Summary statistics:

Variable       Min  Max  Mean  SD    Class correlation
sepal length   4.3  7.9  5.84  0.83   0.7826
sepal width    2.0  4.4  3.05  0.43  -0.4194
petal length   1.0  6.9  3.76  1.76   0.9490 (high!)
petal width    0.1  2.5  1.20  0.76   0.9565 (high!)

5. The Iris data. Multidimensional scaling. [The two-dimensional configuration derived by multidimensional scaling; scatter plot omitted.]

5. The Iris data. k-means. [Scatter plots of the k-means solutions with 2 and 3 groups; with 3 groups, some objects are misclassified.]

5. The Iris data. Graph. [The complete graph on the MDS configuration and the histogram of its edge lengths. Mean = 0.8260, mean/2 = 0.4130, median = 0.7431, median/2 = 0.3716.]

5. The Iris data. Distribution of edges. Threshold = 0.2605, the smallest mode of the kernel density estimate (density 0.8633 at that point). [Thresholded graph and kernel density plot omitted.]
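Putting the pieces together, an end-to-end sketch of the Iris pipeline, assuming scikit-learn and SciPy; the exact figures will differ somewhat from the slide values, which came from a different MDS implementation:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import connected_components

X_iris = load_iris().data                         # 150 cases x 4 variables
D_iris = squareform(pdist(X_iris))                # proximities between cases
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D_iris)

D2 = squareform(pdist(coords))                    # edge lengths in the MDS plane
edges2 = D2[np.triu_indices_from(D2, k=1)]
print(f"mean = {edges2.mean():.4f}, median = {np.median(edges2):.4f}")

t = smallest_mode(edges2)         # helper defined in the sketch above
A = (D2 < t).astype(int)          # thresholded adjacency matrix
np.fill_diagonal(A, 0)
n_groups, _ = connected_components(A, directed=False)
print(f"threshold {t:.4f} gives {n_groups} group(s)")
```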

6. Conclusions and references. General conclusions.
1. Cluster analysis explores data, searching for groups.
2. Depending on the method employed, cluster analysis requires some prior decisions (number of clusters, proximity measure, hierarchical method, ...).
3. Multidimensional scaling makes it possible to represent the proximity relationships within a dataset in a Euclidean space.
4. The classification problem can be understood as the analysis of the edges of a graph; therefore graph theory can be applied to classify the objects of a dataset.

6. Conclusions and references. Particular conclusions.
1. Elements of graph theory have been used to explore cluster analysis problems.
2. To use graph theory, the study of the distribution of edges is proposed; the use of some parameters derived from that distribution is analysed.
3. Experimentally, the best threshold is located at the smallest mode of the edge distribution.

6. Conclusions and references. Further research.
1. There is a need to study more deeply the relationship between the distribution of distances and the selection of the optimal cut point (simulation and use of robust measures?).
2. It may be possible to use some of the elements presented here for the detection of multivariate outliers (objects which are very far from the rest).
3. The incidence matrix could be used to search for the best classification, if permutations of the cases are evaluated.
4. Why use all distances simultaneously? Triangulation in graphs.

6. Conclusions and references. References (I).
Anderberg, M.R. Cluster Analysis for Applications. Academic Press, 1973.
Cheong, M.-Y.; Lee, H. Determining the number of clusters in cluster analysis. Journal of the Korean Statistical Society (2008), to appear.
Eldershaw, C.; Hegland, M. Cluster analysis using triangulation. Computational Techniques and Applications: CTAC97, 201-208, 1997.
Gentle, J.E. Elements of Computational Statistics. Springer Verlag, 2002.

6. Conclusions and references. References (II).
Ghahramani, Z. Unsupervised Learning. In Bousquet, O.; Raetsch, G.; von Luxburg, U. (Eds.): Advanced Lectures on Machine Learning. Springer Verlag, 2004.
Gordon, A.D. Classification. Chapman and Hall, 1981.
Hansen, P.; Jaumard, B. Cluster analysis and mathematical programming. Mathematical Programming, 79, 191-215, 1997.
Van Ryzin, J. (Ed.) Classification and Clustering. Academic Press, 1977.

6. Conclusions and references. References (III).
Xu, R.; Wunsch, D. Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, 16(3), 645-678, 2005.
Yu, K.; Yu, S.; Tresp, V. Soft clustering on graphs. Advances in Neural Information Processing Systems 18 (NIPS 2005).
