Cluster Analysis, Multidimensional Scaling and Graph Theory
1 Cluster Analysis, Multidimensional Scaling and Graph Theory Dpto. de Estadística, E.E. y O.E.I. Universidad de Alcalá luisf.rivera@uah.es 1
2 Outline The problem of statistical classification Cluster analysis Multidimensional scaling and graph theory The adjacency matrix The Iris data Conclusions and references 2
3 1. The problem of Statistical Classification Introduction Identification of groups of similar cases is a very important task in everyday research. Information on p variables may be measured over n individuals. What is the group structure of these cases?
X = [ x11 x12 ... x1p ; x21 x22 ... x2p ; ... ; xn1 xn2 ... xnp ]  (an n×p data matrix) 3
4 1. The problem of Statistical Classification Classification and statistical learning Classification systems look for a rule to classify objects. They can be supervised or unsupervised, depending on the existence of prior knowledge of the classes to which the objects belong. Classical methods: Discriminant Analysis, Cluster Analysis. Modern methods: Statistical learning (supervised, unsupervised). 4
5 1. The problem of Statistical Classification Supervised vs. unsupervised learning (I) To develop a supervised classification system, it is necessary to know the classes (C) into which the population is divided, and also to which class each observed individual belongs. For each case i, we must know the class it belongs to, from the set {1, 2, ..., C}:
Y = (y1, y2, ..., yn)', with yi ∈ {1, 2, ..., C}, i = 1, ..., n. 5
6 1. The problem of Statistical Classification Supervised vs. unsupervised learning (II) A supervised classification system provides some kind of mathematical function Y = Y(X,w), where w is a vector of parameters adjusted from data. The values of these parameters are determined using a learning algorithm, which usually tries to minimize a function of classification error. Supervised classifiers Discriminant Analysis Neural Networks SVM Trees... 6
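As a toy illustration of the idea Y = Y(X, w) (not from the slides: the data values and the nearest-centroid rule are hypothetical choices), the parameter vector w can simply be the per-class centroids estimated from labelled data:

```python
import numpy as np

# Toy training data: two classes in 2-D (hypothetical values for illustration).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class 1
              [4.0, 4.0], [4.2, 3.8], [3.9, 4.1]])  # class 2
y = np.array([1, 1, 1, 2, 2, 2])

# "Learning" here just estimates the parameters w: one centroid per class.
classes = np.unique(y)
w = np.array([X[y == c].mean(axis=0) for c in classes])

def predict(x, w=w, classes=classes):
    """Y = Y(X, w): assign x to the class with the nearest centroid."""
    d = np.linalg.norm(w - x, axis=1)
    return classes[np.argmin(d)]

print(predict(np.array([1.1, 0.9])))  # near the class-1 centroid -> 1
print(predict(np.array([4.1, 4.0])))  # near the class-2 centroid -> 2
```

A real learning algorithm would instead adjust w iteratively to minimize a classification-error function, but the structure (data in, parameters fitted, rule out) is the same.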
7 1. The problem of Statistical Classification Supervised vs. unsupervised learning (III) Unsupervised classification tries to find the group structure existing in the data, in a natural way. Normally, the real classes (C) in the population are unknown, so there is no knowledge about the class each object belongs to. This kind of problem is sometimes referred to as pattern recognition, in the sense that the aim is to discover classes of objects in the data. 7
8 1. The problem of Statistical Classification Supervised vs. unsupervised learning (IV) Unsupervised classification algorithms seek to divide the data set into groups or classes of elements. Normally, a group is described as a set of similar cases that are different from the cases classified in other groups. It is necessary to find a way to measure the closeness between cases: dissimilarity measures are used. Unsupervised classifiers: Cluster analysis, Neural networks, K-NN... 8
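For instance, two common dissimilarity measures between a pair of cases can be computed as follows (a sketch with hypothetical values, assuming Python with NumPy):

```python
import numpy as np

# Two cases described by p = 3 variables (hypothetical values).
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 3.0])

# Euclidean distance: sqrt of the sum of squared differences.
euclidean = np.sqrt(np.sum((a - b) ** 2))  # sqrt(1 + 4 + 0) = sqrt(5)

# Manhattan (city-block) distance: sum of absolute differences.
manhattan = np.sum(np.abs(a - b))          # 1 + 2 + 0 = 3

print(euclidean, manhattan)
```

The choice of measure matters: different dissimilarities can induce different group structures on the same data.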
9 The problem of statistical classification Cluster analysis Multidimensional scaling and graph theory The adjacency matrix The Iris data Conclusions and references 9
10 2. Cluster analysis Introduction The purpose of cluster analysis is to discover groups of elements in data, according to the following properties: Each element belongs to only one group. Every element must be classified in one group. Elements in a group must be homogeneous (similar to each other) and different from elements in other groups. Clustering methods can be: Partitioning: based on the elements in the dataset. Hierarchical: based on the distances between elements in the dataset. 10
11 2. Cluster analysis Example Let's consider this dataset (2 dimensions):

  X1    X2
 2.25  3.50
 2.50  4.00
 2.25  3.00
 3.00  3.50
 3.25  3.00
 2.75  3.25
 3.50  2.25
 3.25  2.00
 3.75  2.50
 4.00  2.25
 2.25  1.00
 2.50  1.75
 2.75  1.25
 2.50  1.50
 2.75  1.50
 4.00  0.00
 4.25  1.00
 4.25  0.25
 4.50  0.50
 4.50  0.75

How many groups are there? 11
12 2. Cluster analysis Example. k-means (I) [Figures: k-means solutions for k=2 and k=3.] 12
13 2. Cluster analysis Example. k-means (II) [Figures: k-means solutions for k=4 and k=5.] What's the structure of this dataset? 13
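The k-means runs above can be sketched with a plain NumPy implementation of Lloyd's algorithm (an illustrative sketch, not the software used in the slides; initialisation and iteration counts are assumptions):

```python
import numpy as np

# The 20 cases of the two-dimensional example.
data = np.array([
    [2.25, 3.50], [2.50, 4.00], [2.25, 3.00], [3.00, 3.50], [3.25, 3.00],
    [2.75, 3.25], [3.50, 2.25], [3.25, 2.00], [3.75, 2.50], [4.00, 2.25],
    [2.25, 1.00], [2.50, 1.75], [2.75, 1.25], [2.50, 1.50], [2.75, 1.50],
    [4.00, 0.00], [4.25, 1.00], [4.25, 0.25], [4.50, 0.50], [4.50, 0.75],
])

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each case to its nearest centre (Euclidean distance).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its assigned cases.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

for k in (2, 3, 4, 5):
    labels, _ = kmeans(data, k)
    print(k, np.bincount(labels, minlength=k))
```

Note that the algorithm always returns k groups, whatever k is chosen, which is exactly why the question "what is the structure of this dataset?" cannot be answered by k-means alone.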
14 2. Cluster analysis Example. Hierarchical methods Whereas k-means works on the dataset itself, hierarchical methods are based on the distances between the cases in the dataset (an n×n matrix):
[Table: the 20×20 matrix of pairwise Euclidean distances between the cases. This is a dissimilarity matrix.]
Which proximity measure is better? Which clustering method should be used? 14
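A distance matrix of this kind can be reproduced with SciPy (a sketch using `scipy.spatial.distance` on the first six cases of the example; the printed row agrees with the first row of the slide's matrix):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# First six cases of the example dataset.
data = np.array([[2.25, 3.50], [2.50, 4.00], [2.25, 3.00],
                 [3.00, 3.50], [3.25, 3.00], [2.75, 3.25]])

d = pdist(data, metric='euclidean')   # condensed vector of n(n-1)/2 distances
D = squareform(d)                     # full symmetric n x n dissimilarity matrix
print(np.round(D[0], 3))  # [0.    0.559 0.5   0.75  1.118 0.559]
```

`pdist` returns only the upper triangle (the matrix is symmetric with a zero diagonal), and `squareform` expands it when the full matrix is needed.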
15 2. Cluster analysis Example. Hierarchical methods: single linkage [Figure: dendrogram using single linkage (rescaled distance cluster combine).] 15
16 2. Cluster analysis Example. Hierarchical methods: complete linkage [Figure: dendrogram using complete linkage (rescaled distance cluster combine).] 16
17 2. Cluster analysis Example. Hierarchical methods: centroid [Figure: dendrogram using the centroid method (rescaled distance cluster combine).] 17
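Under the same assumptions (Python with SciPy; the cut into four groups is an illustrative choice), the three hierarchies can be recomputed with `scipy.cluster.hierarchy`:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# The 20 cases of the two-dimensional example.
data = np.array([
    [2.25, 3.50], [2.50, 4.00], [2.25, 3.00], [3.00, 3.50], [3.25, 3.00],
    [2.75, 3.25], [3.50, 2.25], [3.25, 2.00], [3.75, 2.50], [4.00, 2.25],
    [2.25, 1.00], [2.50, 1.75], [2.75, 1.25], [2.50, 1.50], [2.75, 1.50],
    [4.00, 0.00], [4.25, 1.00], [4.25, 0.25], [4.50, 0.50], [4.50, 0.75],
])
d = pdist(data)  # Euclidean distances, as the centroid method requires

# The three linkage rules shown in the dendrograms; cut each tree into 4 groups.
for method in ('single', 'complete', 'centroid'):
    Z = linkage(d, method=method)
    labels = fcluster(Z, t=4, criterion='maxclust')
    print(method, sorted(np.bincount(labels)[1:], reverse=True))
```

Plotting the trees themselves only needs `scipy.cluster.hierarchy.dendrogram(Z)` with matplotlib; the point here is that each linkage rule can assign the same cases to different groups.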
18 2. Cluster analysis Shortcomings In hierarchical methods, decisions such as the proximity measure and the clustering method have to be made (results are decision-dependent). k-means can be used only if Euclidean distance is valid for the variables in the dataset. In both methodologies, there is no specific criterion to determine the number of groups. As the dimensionality of the problem grows, no geometric interpretation is possible. 18
19 The problem of statistical classification Cluster analysis Multidimensional scaling and graph theory The adjacency matrix The Iris data Conclusions and references 19
20 3. Multidimensional scaling and graph theory Graphs A (weighted) graph on V is a pair G = (V,E), where V is the set of nodes and E is the set of edges or lines which connect them. The edges connect nodes from V and define the shape of G. In graph theory, only the essentials of the drawing matter: the exact path of an edge is not relevant, just the nodes it connects. The position of the nodes is not important, so they can be moved to obtain a simpler graph. In the unsupervised classification problem, the cases are the nodes of the graph, and the dissimilarity matrix determines the set of edges. At first, the graph is complete (each pair of nodes is connected). 20
21 3. Multidimensional scaling and graph theory Graphs. Example [Figure: our example dataset represented as a complete graph.] Each node (case) is connected with the rest. If there are n nodes, then there are n(n−1)/2 edges. For graph theory and cluster analysis to meet, V and E must have an adequate structure. 21
22 3. Multidimensional scaling and graph theory Multidimensional scaling Multidimensional scaling is a statistical method for representing a set of cases, whose matrix of proximities is known, by a configuration of points in a low-dimensional Euclidean space, in such a way that the Euclidean distances between the points in the new space reproduce the original dissimilarities. This method is useful for placing the cases of a classification problem in a Euclidean space, in which the use of k-means clustering or of a hierarchical method based on Euclidean distance is equally valid. 22
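A minimal sketch of classical (Torgerson) metric scaling, one standard way to obtain such a configuration (the 4×4 dissimilarity matrix here is hypothetical, chosen only to make the example small):

```python
import numpy as np

# Dissimilarity matrix for 4 hypothetical cases (symmetric, zero diagonal).
D = np.array([[0.0, 1.0, 2.0, 2.2],
              [1.0, 0.0, 1.5, 2.0],
              [2.0, 1.5, 0.0, 1.0],
              [2.2, 2.0, 1.0, 0.0]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
B = -0.5 * J @ (D ** 2) @ J           # double-centred squared dissimilarities

# Eigendecomposition of B; keep the m largest (non-negative) eigenvalues.
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
m = 2
X = vecs[:, :m] * np.sqrt(np.maximum(vals[:m], 0.0))  # n x m configuration

# Euclidean distances in the new space approximate the original dissimilarities.
recon = np.linalg.norm(X[:, None] - X[None, :], axis=2)
print(np.round(recon, 2))
```

The rows of X are the coordinates of the cases in the derived m-dimensional Euclidean space, on which k-means or Euclidean hierarchical methods can then be applied.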
23 3. Multidimensional scaling and graph theory Multidimensional scaling. Where does it come in?
Data matrix X (dim. n×p) → Proximities → Multidimensional scaling → Euclidean distance matrix (giving E) and Euclidean configuration (dim. n×m, m << p, giving V) 23
24 3. Multidimensional scaling and graph theory Cluster analysis related to graph theory Applying multidimensional scaling to the dissimilarity matrix between the cases in a dataset allows cluster analysis techniques to be unified with classification in graph theory. Thus, the problem of classification is reduced to the analysis of the distribution of edges in a graph, taking into account the distances in the Euclidean space derived from multidimensional scaling. 24
25 The problem of statistical classification Cluster analysis Multidimensional scaling and graph theory The adjacency matrix The Iris data Conclusions and references 25
26 4. The adjacency matrix Introduction In a graph, the adjacency matrix is the most important element, because it can be used to analyse the connectivity between nodes (or cases in a dataset). By analogy with cluster analysis, for two nodes in the graph, the stronger the connection between them (the smaller the distance), the more similar they are. Not every edge has the same importance. If there are long edges, it may be useless to take them into account, as they connect very different cases. It is necessary to find a strategy to define the number of groups in a dataset in terms of the distribution of edges. 26
27 4. The adjacency matrix Distribution of edges. Finding a threshold Edges represent the Euclidean distances between cases in the Euclidean space derived by multidimensional scaling. The distribution of these distances can give us some clues about the existence of group structure in the data. Cases can be classified into groups if a correct threshold is selected; candidate thresholds include the mean value, half of the mean value, and the median. 27
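The thresholding step can be sketched as follows (an illustrative sketch: here the original cases stand in for the MDS configuration, and half of the mean edge length is taken as the threshold). Keeping only the short edges gives a binary adjacency matrix, and each group is then a connected component of the resulting graph:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# The 20 cases of the two-dimensional example.
data = np.array([
    [2.25, 3.50], [2.50, 4.00], [2.25, 3.00], [3.00, 3.50], [3.25, 3.00],
    [2.75, 3.25], [3.50, 2.25], [3.25, 2.00], [3.75, 2.50], [4.00, 2.25],
    [2.25, 1.00], [2.50, 1.75], [2.75, 1.25], [2.50, 1.50], [2.75, 1.50],
    [4.00, 0.00], [4.25, 1.00], [4.25, 0.25], [4.50, 0.50], [4.50, 0.75],
])
D = squareform(pdist(data))

# Candidate threshold: half of the mean edge length.
threshold = pdist(data).mean() / 2

# Adjacency matrix: keep only edges shorter than the threshold (no self-loops).
A = (D <= threshold) & ~np.eye(len(D), dtype=bool)

def components(adj):
    """Label connected components by depth-first search over the adjacency matrix."""
    n = len(adj)
    label = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if label[start] >= 0:
            continue
        stack = [start]
        label[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if label[j] < 0:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label

groups = components(A)
print(groups.max() + 1, "groups at threshold", round(float(threshold), 3))
```

Varying the threshold varies the number of components, which is the behaviour illustrated on the following slides.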
28 4. The adjacency matrix Distribution of edges. Example (I) [Figure: histogram of the distribution of edge lengths for the two-dimensional example, with the mean, half of the mean, and the median marked.] 28
29 4. The adjacency matrix Distribution of edges. Example (II) Threshold = mean: one group. 29
30 4. The adjacency matrix Distribution of edges. Example (III) Threshold = mean/2: two groups. 30
31 4. The adjacency matrix Distribution of edges. Example (IV) Threshold = mean/3: four groups. 31
32 4. The adjacency matrix Distribution of edges. Example (V) Threshold = smallest mode of the kernel density estimate: four groups. 32
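The kernel-density threshold can be sketched with SciPy. An assumption in this sketch: the slide's "smallest mode" is read here as the first antimode (local minimum) of the density, i.e. the valley separating short within-group edges from long between-group edges, and the edge lengths are simulated from a hypothetical bimodal mixture rather than taken from the example:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical edge lengths: a bimodal mixture mimicking short within-group
# edges and long between-group edges.
rng = np.random.default_rng(42)
edges = np.concatenate([rng.normal(0.5, 0.1, 200),   # within-group edges
                        rng.normal(2.5, 0.4, 400)])  # between-group edges

# Kernel density estimate of the edge-length distribution.
kde = gaussian_kde(edges)
grid = np.linspace(edges.min(), edges.max(), 512)
dens = kde(grid)

# Threshold: the first interior local minimum (valley) of the density.
interior = np.flatnonzero((dens[1:-1] < dens[:-2]) & (dens[1:-1] < dens[2:])) + 1
threshold = grid[interior[0]]
print(round(float(threshold), 2))
```

Edges longer than this threshold are dropped from the adjacency matrix, and the remaining connected components define the groups.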
33 The problem of statistical classification Cluster analysis Multidimensional scaling and graph theory The adjacency matrix The Iris data Conclusions and references 33
34 5. The Iris data The Iris data Fisher, R.A.: "The use of multiple measurements in taxonomic problems", Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950). The dataset contains 3 classes of 50 instances each (each class refers to a type of iris plant). One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. Variables: 1. sepal length in cm 2. sepal width in cm 3. petal length in cm 4. petal width in cm Summary statistics:

                Min  Max  Mean  SD    Class correlation
 sepal length:  4.3  7.9  5.84  0.83   0.7826
 sepal width:   2.0  4.4  3.05  0.43  -0.4194
 petal length:  1.0  6.9  3.76  1.76   0.9490 (high!)
 petal width:   0.1  2.5  1.20  0.76   0.9565 (high!)

34
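Assuming Python with scikit-learn available, the dataset can be loaded directly (the slides do not specify the software used; this is one convenient source of the same data):

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target   # 150 cases, 4 variables; 3 classes of 50

print(X.shape)                  # (150, 4)
print(np.bincount(y))           # [50 50 50]
print(iris.feature_names)
```

From here, the dissimilarity matrix, the MDS configuration, and the thresholded graph of the following slides can all be computed on X.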
35 5. The Iris data Multidimensional scaling [Figure: the derived 2-dimensional configuration of the 150 cases.] 35
36 5. The Iris data K-means [Figures: k-means solutions with 2 groups and with 3 groups; some objects are misclassified.] 36
37 5. The Iris data Graph [Figures: graphs obtained with the threshold set at the mean, mean/2, the median and median/2.] 37
38 5. The Iris data Distribution of edges [Figure: kernel density of the edge lengths; threshold = smallest mode of the kernel density.] 38
39 6. Conclusions and references General conclusions 1. Cluster analysis explores data, searching for groups. 2. Depending on the method employed, cluster analysis requires some previous decisions (number of clusters, proximity measure, hierarchical method,...). 3. Multidimensional scaling makes it possible to represent the proximity relationships in a dataset in a Euclidean space. 4. The classification problem can be understood as the analysis of the edges in a graph. Therefore, graph theory can be applied to classify the objects in a dataset. 39
40 6. Conclusions and references Particular conclusions 1. Elements of graph theory have been used to explore cluster analysis problems. 2. To use graph theory, the study of the distribution of edges is proposed. The use of several parameters derived from that distribution is analysed. 3. Experimentally, the best threshold is located at the smallest mode of the edge distribution. 40
41 6. Conclusions and references Further research 1. There is a need to study more deeply the relationship between the distribution of distances and the selection of the optimal point (simulation and use of robust measures?). 2. It may be possible to use some of the elements presented here for the detection of multivariate outliers (objects which are very far from the rest). 3. The incidence matrix could be used to search for the best classification, if permutations of cases are evaluated. 4. Why use all distances simultaneously? Triangulation in graphs. 41
42 6. Conclusions and references References (I) Anderberg, M.R. Cluster Analysis for Applications. Academic Press, 1973. Cheong, M.-Y.; Lee, H. Determining the number of clusters in cluster analysis. Journal of the Korean Statistical Society (2008), to appear. Eldershaw, C.; Hegland, M. Cluster analysis using triangulation. Computational Techniques and Applications: CTAC97. Gentle, J.E. Elements of Computational Statistics. Springer Verlag, 2002. 42
43 6. Conclusions and references References (II) Ghahramani, Z. Unsupervised Learning. In Bousquet, O.; Raetsch, G.; von Luxburg, U. (Eds.): Advanced Lectures on Machine Learning. Springer Verlag, 2004. Gordon, A.D. Classification. Chapman and Hall. Hansen, P.; Jaumard, B. Cluster analysis and mathematical programming. Mathematical Programming, 79, 191-215, 1997. Van Ryzin, J. (Ed.) Classification and Clustering. Academic Press, 1977. 43
44 6. Conclusions and references References (III) Xu, R.; Wunsch, D. Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, Vol. 16(3), 645-678, 2005. Yu, K.; Yu, S.; Tresp, V. Soft clustering on graphs. Advances in Neural Information Processing Systems, 18 (NIPS 2005). 44
Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationAn Enhanced K-Medoid Clustering Algorithm
An Enhanced Clustering Algorithm Archna Kumari Science &Engineering kumara.archana14@gmail.com Pramod S. Nair Science &Engineering, pramodsnair@yahoo.com Sheetal Kumrawat Science &Engineering, sheetal2692@gmail.com
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationRobust PDF Table Locator
Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records
More informationData mining with Support Vector Machine
Data mining with Support Vector Machine Ms. Arti Patle IES, IPS Academy Indore (M.P.) artipatle@gmail.com Mr. Deepak Singh Chouhan IES, IPS Academy Indore (M.P.) deepak.schouhan@yahoo.com Abstract: Machine
More informationPattern recognition. Classification/Clustering GW Chapter 12 (some concepts) Textures
Pattern recognition Classification/Clustering GW Chapter 12 (some concepts) Textures Patterns and pattern classes Pattern: arrangement of descriptors Descriptors: features Patten class: family of patterns
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationA Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis
A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis Meshal Shutaywi and Nezamoddin N. Kachouie Department of Mathematical Sciences, Florida Institute of Technology Abstract
More informationHierarchical Clustering
What is clustering Partitioning of a data set into subsets. A cluster is a group of relatively homogeneous cases or observations Hierarchical Clustering Mikhail Dozmorov Fall 2016 2/61 What is clustering
More informationSupervised Variable Clustering for Classification of NIR Spectra
Supervised Variable Clustering for Classification of NIR Spectra Catherine Krier *, Damien François 2, Fabrice Rossi 3, Michel Verleysen, Université catholique de Louvain, Machine Learning Group, place
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.
More informationApplication of Fuzzy Classification in Bankruptcy Prediction
Application of Fuzzy Classification in Bankruptcy Prediction Zijiang Yang 1 and Guojun Gan 2 1 York University zyang@mathstat.yorku.ca 2 York University gjgan@mathstat.yorku.ca Abstract. Classification
More informationHow do microarrays work
Lecture 3 (continued) Alvis Brazma European Bioinformatics Institute How do microarrays work condition mrna cdna hybridise to microarray condition Sample RNA extract labelled acid acid acid nucleic acid
More informationUnsupervised Learning and Clustering
Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationMachine Learning in Biology
Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant
More informationUNSUPERVISED LEARNING IN R. Introduction to hierarchical clustering
UNSUPERVISED LEARNING IN R Introduction to hierarchical clustering Hierarchical clustering Number of clusters is not known ahead of time Two kinds: bottom-up and top-down, this course bottom-up Hierarchical
More informationData mining techniques for actuaries: an overview
Data mining techniques for actuaries: an overview Emiliano A. Valdez joint work with Banghee So and Guojun Gan University of Connecticut Advances in Predictive Analytics (APA) Conference University of
More informationMachine learning Pattern recognition. Classification/Clustering GW Chapter 12 (some concepts) Textures
Machine learning Pattern recognition Classification/Clustering GW Chapter 12 (some concepts) Textures Patterns and pattern classes Pattern: arrangement of descriptors Descriptors: features Patten class:
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationCase-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster
More informationClustering algorithms
Clustering algorithms Machine Learning Hamid Beigy Sharif University of Technology Fall 1393 Hamid Beigy (Sharif University of Technology) Clustering algorithms Fall 1393 1 / 22 Table of contents 1 Supervised
More informationEE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR
EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR 1.Introductıon. 2.Multi Layer Perception.. 3.Fuzzy C-Means Clustering.. 4.Real
More informationCLUSTER ANALYSIS. V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi
CLUSTER ANALYSIS V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi-110 012 In multivariate situation, the primary interest of the experimenter is to examine and understand the relationship amongst the
More informationBRACE: A Paradigm For the Discretization of Continuously Valued Data
Proceedings of the Seventh Florida Artificial Intelligence Research Symposium, pp. 7-2, 994 BRACE: A Paradigm For the Discretization of Continuously Valued Data Dan Ventura Tony R. Martinez Computer Science
More informationChemometrics. Description of Pirouette Algorithms. Technical Note. Abstract
19-1214 Chemometrics Technical Note Description of Pirouette Algorithms Abstract This discussion introduces the three analysis realms available in Pirouette and briefly describes each of the algorithms
More informationSummer School in Statistics for Astronomers & Physicists June 15-17, Cluster Analysis
Summer School in Statistics for Astronomers & Physicists June 15-17, 2005 Session on Computational Algorithms for Astrostatistics Cluster Analysis Max Buot Department of Statistics Carnegie-Mellon University
More informationFlexible Lag Definition for Experimental Variogram Calculation
Flexible Lag Definition for Experimental Variogram Calculation Yupeng Li and Miguel Cuba The inference of the experimental variogram in geostatistics commonly relies on the method of moments approach.
More informationExploratory Analysis: Clustering
Exploratory Analysis: Clustering (some material taken or adapted from slides by Hinrich Schutze) Heejun Kim June 26, 2018 Clustering objective Grouping documents or instances into subsets or clusters Documents
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 4
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationPARALLEL CLASSIFICATION ALGORITHMS
PARALLEL CLASSIFICATION ALGORITHMS By: Faiz Quraishi Riti Sharma 9 th May, 2013 OVERVIEW Introduction Types of Classification Linear Classification Support Vector Machines Parallel SVM Approach Decision
More informationUnsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis
7 Supervised learning vs unsupervised learning Unsupervised Learning Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute These patterns are then
More informationData Clustering With Leaders and Subleaders Algorithm
IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719, Volume 2, Issue 11 (November2012), PP 01-07 Data Clustering With Leaders and Subleaders Algorithm Srinivasulu M 1,Kotilingswara
More informationAn adjustable p-exponential clustering algorithm
An adjustable p-exponential clustering algorithm Valmir Macario 12 and Francisco de A. T. de Carvalho 2 1- Universidade Federal Rural de Pernambuco - Deinfo Rua Dom Manoel de Medeiros, s/n - Campus Dois
More information