Clustering Using Elements of Information Theory


Daniel de Araújo¹,², Adrião Dória Neto², Jorge Melo², and Allan Martins²

¹ Federal Rural University of Semi-Árido, Campus Angicos, Angicos/RN, Brasil
daniel@ufersa.edu.br
² Federal University of Rio Grande do Norte, Department of Computer Engineering and Automation, Natal/RN, Brasil
{adriao,jdmelo,allan}@dca.ufrn.br

Abstract. This paper proposes an algorithm for clustering using an information-theoretic criterion. The cross entropy between elements in different clusters is used as a measure of the quality of the partition. The proposed algorithm uses classical clustering algorithms to initialize small regions (auxiliary clusters) that are then merged to construct the final clusters. The algorithm was tested on several databases with different spatial distributions.

Key words: Clustering, Cluster Analysis, Information Theoretic Learning, Complex Datasets, Entropy

1 Introduction

Clustering techniques can be applied in many fields, such as marketing, biology, pattern recognition, image segmentation and text processing. Clustering algorithms attempt to organize unlabeled data points into clusters in such a way that samples within a cluster are more similar to each other than to samples in different clusters [1]. To achieve this, several algorithms have been developed using different heuristics. Although most clustering tasks use no information about the underlying structure of the data during the clustering process, the majority of clustering algorithms require the number of classes as a parameter given a priori. Moreover, the spatial distribution of the data is another problematic issue in clustering tasks, since most algorithms are biased toward a specific cluster shape. For example, single-linkage hierarchical algorithms are sensitive to noise and outliers and tend to produce elongated clusters, while the k-means algorithm yields elliptical clusters.

The incorporation of spatial statistics of the data gives a good measure of the spatial distribution of the objects in a dataset. One way of doing that is to use information-theoretic elements to help the clustering process. More precisely, Information Theory involves the quantification of information in a dataset using statistical measures.

Recently, [2–4] achieved good results using elements of information theory to help clustering tasks. Based on that, this paper proposes an information-theoretic heuristic for clustering datasets. In fact, we propose an iterative two-step algorithm that tries to find the best label configuration by switching labels according to a cost function based on the cross entropy [3]. The use of statistically based measures enables the algorithm to cluster spatially complex datasets.

The paper is organized as follows: in Sect. 2 we make some considerations about the information theory elements used in the clustering algorithm; in Sect. 3 we describe the information-theoretic clustering criterion; in Sect. 4 we present the proposed clustering algorithm; Sect. 5 shows the results obtained; and in Sect. 6 conclusions and final considerations are made.

2 Information Theoretic Learning

Information Theory involves the quantification of information in a dataset using statistical measures. The most used information-theoretic measures are entropy and its variations. Entropy is a measure of uncertainty about a stochastic event or, alternatively, it measures the amount of missing information associated with an event [5]. From the idea of entropy arose other measures of information, such as mutual information [4], the Kullback-Leibler divergence [6], cross entropy [4] and joint entropy [4].

Let us consider a dataset X = {x_1, x_2, ..., x_n} ⊂ R^d with independent and identically distributed (iid) samples. The most traditional measure of information is Shannon's entropy H_S, given by [7]:

    H_S(x) = \sum_{k=1}^{n} p_k I(p_k)    (1)

where I(p_k) = -\log p_k is the information content of the k-th outcome, \sum_{k=1}^{n} p_k = 1 and p_k \geq 0. Later, in the 1960s, Alfred Renyi proposed another measure of entropy, known as Renyi's entropy [8]:

    H_R(x) = \frac{1}{1 - \alpha} \ln \int f^{\alpha}(x)\,dx, \quad \alpha > 0,\ \alpha \neq 1    (2)

The most used variation of the Renyi entropy is its quadratic form, where α = 2:

    H_R(x) = -\ln \int f_x(x)^2\,dx    (3)

In (3), f_x(x) is a probability density function (PDF). Therefore, it is necessary to estimate that PDF and, since we are working in a clustering task, we have no information about the underlying structure of the data. We thus used one of the most popular approaches to nonparametric estimation: the Parzen window estimator [9].
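Before turning to the continuous case, a minimal sketch may help fix the two definitions. The snippet below is an illustration we add here, not part of the original paper; it computes the discrete analogues of (1) and (3) for a probability mass function, where the discrete quadratic form reduces to -log Σ_k p_k².

```python
import numpy as np

# Discrete analogues of (1) and (3) -- an added illustration,
# not code from the paper.
p = np.array([0.5, 0.25, 0.25])         # a PMF: sums to 1, all p_k >= 0

h_shannon = -np.sum(p * np.log(p))      # H_S = sum_k p_k I(p_k), I(p_k) = -log p_k
h_renyi2 = -np.log(np.sum(p ** 2))      # H_R = -log sum_k p_k^2 (alpha = 2)

print(h_shannon)   # ~1.0397 nats
print(h_renyi2)    # ~0.9808 nats
```

Note that the quadratic Renyi entropy never exceeds the Shannon entropy, and its log-of-a-sum form is exactly what makes the kernel-based estimator below tractable.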

The Parzen window estimate can be written as:

    f(x) = \frac{1}{N} \sum_{i=1}^{N} G(x - x_i, \sigma^2)    (4)

where G(x, \sigma^2) is a multivariate Gaussian function defined as:

    G(x, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)\, \Sigma^{-1} (x - \mu)^T \right)    (5)

where \Sigma is the covariance matrix and d is the dimension of x. When we substitute (4) and (5) into (3), we have:

    H_R(x) = -\ln \int \left( \frac{1}{N} \sum_{i=1}^{N} G(x - x_i, \sigma^2) \right) \left( \frac{1}{N} \sum_{j=1}^{N} G(x - x_j, \sigma^2) \right) dx = -\log \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G(x_i - x_j, 2\sigma^2)    (6)

According to [4], the quantity inside the logarithm is known as the Information Potential, because of its similarity to the potential energy between physical particles. The Information Potential has been successfully used in several works as a distance measure or clustering criterion [2, 3, 10].

As we can notice, entropy measures the information of one random variable. When we are interested in quantifying the interaction between two different datasets, one choice is to compute the cross entropy between them [3]. Extending the concepts of Renyi's entropy, we can formally define the cross entropy between two random variables X = (x_i)_{i=1}^{N} and Y = (y_j)_{j=1}^{M} as:

    H(X; Y) = -\log \int p_X(t)\, p_Y(t)\, dt = -\log \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} G(x_i - y_j, 2\sigma^2)    (7)

The cross entropy criterion is very general and can be used in either a supervised or an unsupervised learning framework. We use an information-theoretic criterion based on the maximization of the cross entropy between clusters.
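To make (6) and (7) concrete, here is a minimal NumPy sketch (our addition, not the authors' code). The function names are hypothetical, the kernel size `sigma` is assumed given, and an isotropic covariance Σ = 2σ²I is used, as in the equations above.

```python
import numpy as np

def gaussian(diff, var):
    """Isotropic multivariate Gaussian G(diff, var*I), as in (5)."""
    d = diff.shape[-1]
    norm = (2.0 * np.pi * var) ** (d / 2.0)
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / var) / norm

def renyi_quadratic_entropy(X, sigma):
    """Eq. (6): H_R = -log (1/N^2) sum_ij G(x_i - x_j, 2*sigma^2)."""
    diff = X[:, None, :] - X[None, :, :]           # all pairwise differences
    ip = gaussian(diff, 2.0 * sigma ** 2).mean()   # information potential
    return -np.log(ip)

def cross_entropy(X, Y, sigma):
    """Eq. (7): H(X;Y) = -log (1/NM) sum_ij G(x_i - y_j, 2*sigma^2)."""
    diff = X[:, None, :] - Y[None, :, :]
    return -np.log(gaussian(diff, 2.0 * sigma ** 2).mean())
```

The pairwise-difference arrays make the cost quadratic in the number of points, which is the practical reason, noted in Sect. 4, for keeping the clusters that enter the cross entropy small.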

3 The Proposed Clustering Criterion

Two major issues in clustering are how to measure similarity (or dissimilarity) between objects or clusters in the dataset and which criterion function to optimize [1]. For the first issue, the most obvious solution is to use the distance between the samples. If distance is used, then one would expect the distance between samples in the same cluster to be significantly smaller than the distance between samples in different clusters [1]. The most common class of distance used in clustering tasks is the Euclidean distance [1, 11, 12]. Clusters formed using this kind of measure are invariant to translation and rotation in feature space. Some applications, like gene expression analysis, prefer correlation or association coefficients to measure similarity between objects [12, 13].

Regarding the criterion function, one of the most used criteria is the sum of squared error. Clusterings of this type produce a partition containing clusters with minimum variance. However, the sum of squared error is best suited when the natural clusters form compact and well-separated clouds [1, 14]. Another usual class of algorithms is that of agglomerative hierarchical algorithms. This class of algorithms represents the dataset as a hierarchical tree, where the root of the tree consists of one cluster containing all objects and the leaves are singleton clusters [1]. The spatial shape of the partitions produced by hierarchical clustering algorithms depends on the linkage criterion used. Single-linkage algorithms are sensitive to noise and outliers; average and complete linkage produce elliptical clusters.

This paper proposes the use of cross entropy as a cost function to define the clusters of a given dataset. The objective of the algorithm is to maximize the cross entropy between all clusters. As pointed out earlier, the cross entropy is based on an entropy measure that requires the estimation of the data density distribution, so the approach used in this work is the cross entropy with the Parzen window estimation method described in Sect. 2. Using elements of information theory as a clustering criterion takes advantage of the underlying statistical information that the data carries. Indeed, the algorithm makes no assumption about the statistical distribution of the data; instead, it tries to estimate that distribution and uses it as a measure of similarity between clusters. When the cross entropy is used in the clustering context, the relation between the groups is taken into account. This relation appears as the influence that one cluster can have on another.
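A minimal sketch of one plausible reading of this cost function follows (our addition; the paper states only that the cross entropy between all clusters is maximized, without spelling out the aggregation). It sums the pairwise cross entropies of (7) over all cluster pairs, reusing the hypothetical `cross_entropy` from the earlier sketch.

```python
import numpy as np

def total_cross_entropy(X, labels, sigma):
    """Sum of pairwise cross entropies (7) over all pairs of clusters.

    One plausible reading of "maximize cross entropy between all
    clusters"; the paper does not specify the exact aggregation.
    """
    ids = np.unique(labels)
    total = 0.0
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            total += cross_entropy(X[labels == ids[a]],
                                   X[labels == ids[b]], sigma)
    return total
```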

4 The Proposed Clustering Algorithm

The main goal of a clustering algorithm is to group objects in the dataset, putting into the same cluster samples that are similar according to a specific clustering criterion. Iterative distance-based algorithms form an effective class of techniques for clustering tasks. They work by minimizing the total squared distance of the samples to their cluster centers. However, they have some problematic issues, such as sensitivity to the initial guesses of the cluster centers and restrictions related to the spatial distribution of the natural groups [11].

One way of using iterative distance-based algorithms efficiently is to cluster the dataset using a high number of clusters, i.e., more clusters than the intended number in the final partition, and afterwards merge those clusters to form larger and more homogeneous ones. Many authors apply this approach in their work and have had good results on spatially complex datasets [2, 3, 10].

We use cross entropy as a cost function, and its computation uses all data points of each cluster when the Parzen window estimator is employed. So, the larger the cluster, the longer it takes to compute the cross entropy. The strategy of splitting the dataset into several small regions therefore addresses two issues: the small regions are usually more homogeneous and easier to cluster, even with algorithms that could not correctly cluster the entire dataset, such as k-means; and, with smaller clusters, the time needed to compute the cross entropy decreases.

Based on that, the proposed clustering algorithm works as an iterative two-step procedure. First, the dataset is divided into a large number of compact small regions, named auxiliary clusters. Then, each region is randomly labeled according to the specified number of clusters, not yet corresponding to the final partition labels; e.g., if we are dealing with a two-class dataset, two kinds of labels are distributed among the auxiliary clusters, and the two clusters are composed of all regions sharing the same label. The second step of the algorithm consists of switching the label of each small region and checking whether the change increases the cost function. Every time the cost function increases, the change that caused the increase is kept; otherwise it is reversed. This process is repeated until there are no more changes in the labels.

The small regions found in the initial phase work as auxiliary clusters that will be used to discover the final clusters in the dataset. The task of finding auxiliary clusters can be performed by a traditional clustering algorithm such as k-means, competitive neural networks or hierarchical algorithms. In our case, we used the k-means algorithm, a well-known clustering algorithm [1].

The label-switching process takes each auxiliary cluster and changes its label to a different one. Then the cost function is computed and, if an increase is observed, the change is recorded and the new label configuration is kept. After all auxiliary cluster labels have been tried, the process starts again and continues until there are no new changes in any auxiliary cluster label. This can be seen as a search for the optimal label configuration of the auxiliary clusters and, consequently, for the optimal configuration of the clusters in the final partition. Due to the initial random assignment process, the proposed algorithm is nondeterministic and can produce different results for different initializations. Also, the number of auxiliary clusters directly influences the overall performance.
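The sketch below is our reconstruction of the two-step procedure under stated assumptions, not the authors' code: it uses scikit-learn's KMeans for the auxiliary clusters and the hypothetical `total_cross_entropy` from the previous sketch; the sweep order, the balanced initial labeling, and the guard against emptying a final cluster are our own choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def it_clustering(X, n_clusters, n_aux, sigma, seed=0):
    """Two-step sketch: k-means auxiliary regions + greedy label switching."""
    rng = np.random.default_rng(seed)

    # Step 1: split the dataset into n_aux compact auxiliary clusters.
    regions = KMeans(n_clusters=n_aux, n_init=10,
                     random_state=seed).fit_predict(X)

    # Random initial label (0..n_clusters-1) per auxiliary cluster.
    # Balanced here so no final cluster starts empty (the paper only
    # says the labels are assigned randomly).
    region_labels = rng.permutation(np.arange(n_aux) % n_clusters)
    best = total_cross_entropy(X, region_labels[regions], sigma)

    # Step 2: switch labels while some switch still increases the cost.
    changed = True
    while changed:
        changed = False
        for r in range(n_aux):
            for c in range(n_clusters):
                old = region_labels[r]
                if c == old or np.sum(region_labels == old) == 1:
                    continue  # skip no-ops and switches that empty a cluster
                region_labels[r] = c
                score = total_cross_entropy(X, region_labels[regions], sigma)
                if score > best:            # keep the switch...
                    best = score
                    changed = True
                else:                       # ...or revert it
                    region_labels[r] = old
    return region_labels[regions]           # per-sample final labels
```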

5 Experimental Results

To test the performance of the proposed clustering algorithm, we used some traditional clustering datasets with simple and complex spatial distributions. Figure 1 illustrates all datasets. Notice that dataset A (Fig. 1a) is a classic dataset with two well-separated clouds and is the simplest of all datasets used in this paper. Dataset B (Fig. 1b) and dataset C (Fig. 1c) have more complex spatial distributions.

Fig. 1: Datasets. (a) Dataset A: two well-separated classes; (b) Dataset B: two half-moons; (c) Dataset C: two circles.

If we use the same traditional center-based technique (k-means) that we used to create the initial auxiliary clusters of our algorithm, it is able to correctly separate the clusters of only one dataset, the simplest one (dataset A). This happens because the clusters in that dataset have spherical shapes. For the rest of the tested datasets, k-means could not achieve good results. Figure 2 shows the performance of k-means on all datasets tested with our proposed algorithm.

Fig. 2: k-means clustering for all datasets.

As pointed out earlier, the number of auxiliary clusters and the number of final clusters are needed to start the process.
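For reproduction purposes, datasets resembling A, B and C can be generated with scikit-learn's synthetic generators. The snippet below is our addition, with sizes, noise levels, and the kernel size chosen arbitrarily; it also shows how the sketch above would be invoked.

```python
from sklearn.datasets import make_blobs, make_circles, make_moons

# Datasets resembling A, B and C (all parameters are our guesses).
A, _ = make_blobs(n_samples=300, centers=2, cluster_std=0.8, random_state=0)
B, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
C, _ = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=0)

labels_B = it_clustering(B, n_clusters=2, n_aux=11, sigma=0.1)
```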

Running some pre-experimental tests, always with the number of clusters equal to the real number of classes, we could find which number of auxiliary clusters is most suitable based on the cross entropy value. Also, since there is some randomness in the initial label-assignment process, we ran 10 simulations on each dataset and show here the one with the greatest cross entropy.

The results achieved using the proposed algorithm are shown in Figs. 3, 4 and 5. For each dataset, the entire clustering process is shown. In each figure, the first picture shows the dataset divided into the auxiliary clusters. The second picture illustrates the initial labels assigned randomly to the auxiliary clusters. The remaining pictures show the label-switching process, with the last picture representing the final partition. As we can see, the algorithm performed the correct separation of the classes in all cases. Dataset A could be correctly clustered using any number of auxiliary clusters. The other datasets, despite their spatial complexity, could be correctly clustered using the specified parameters.
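A sketch of this restart-and-select step (again our addition, reusing the hypothetical functions above) simply keeps the run whose final labeling has the greatest cross-entropy cost:

```python
# Run the nondeterministic sketch several times and keep the labeling
# with the greatest cross-entropy cost (our reconstruction).
runs = [it_clustering(B, n_clusters=2, n_aux=11, sigma=0.1, seed=s)
        for s in range(10)]
best_labels = max(runs, key=lambda lab: total_cross_entropy(B, lab, sigma=0.1))
```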

Fig. 3: Partition achieved using five auxiliary clusters.

Fig. 4: Partition achieved using 11 auxiliary clusters.

Fig. 5: Partition achieved using 10 auxiliary clusters.

These results are in agreement with other works using Information Theory to help the clustering process. For instance, [10] used Renyi's entropy as a clustering criterion, [15] proposed a co-clustering algorithm using mutual information, and [2] developed a clustering algorithm based on the Kullback-Leibler divergence.

6 Conclusions

In this paper we proposed a clustering algorithm that uses elements of information theory as a cost-function criterion. A two-step heuristic creates an iterative procedure to form the clusters. We tested the algorithm on datasets with simple and complex spatial distributions; when the correct number of auxiliary clusters is used, the algorithm performs perfectly. The use of statistically based measures enables the algorithm to cluster spatially complex datasets using the underlying structure of the data. It is reasonable to think that the algorithm inherits some issues from its base structure, namely the k-means clustering algorithm and the kernel function used to estimate the probability density. Nevertheless, considering the initial experimental tests, which achieved good results, there are many variables that can be adjusted to improve the algorithm's capacity on particular clustering tasks.

References

1. Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley (2001)
2. Martins, A.M., Neto, A.D.D., Costa, J.D., Costa, J.A.F.: Clustering using neural networks and Kullback-Leibler divergency. In: Proc. of IEEE International Joint Conference on Neural Networks. Volume 4. (2004)

3. Rao, S., de Medeiros Martins, A., Príncipe, J.C.: Mean shift: an information theoretic perspective. Pattern Recogn. Lett. 30(3) (2009)
4. Principe, J.C.: Information theoretic learning. John Wiley (2000)
5. Principe, J.C., Xu, D.: Information-theoretic learning using Renyi's quadratic entropy. In: Proceedings of the First International Workshop on Independent Component Analysis and Signal Separation, Aussois (1999)
6. Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22(1) (1951) 79-86
7. Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27 (1948) 379-423, 623-656
8. Cover, T.M., Thomas, J.A.: Elements of Information Theory. 2nd edn. John Wiley (1991)
9. Parzen, E.: On the estimation of a probability density function and the mode. Annals of Mathematical Statistics 33 (1962) 1065-1076
10. Gokcay, E., Principe, J.C.: Information theoretic clustering. IEEE Trans. Pattern Anal. Mach. Intell. 24(2) (2002)
11. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. 2nd edn. Morgan Kaufmann, San Francisco, CA (2005)
12. Hair, J., ed.: Multivariate Data Analysis. 6th edn. Pearson/Prentice Hall, Upper Saddle River, NJ (2006)
13. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439) (1999) 531-537
14. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Upper Saddle River, NJ, USA (1988)
15. Dhillon, I., Mallela, S., Modha, D.: Information-theoretic co-clustering. In: Proc. of ACM SIGKDD (2003)
