Data Mining. Clustering. Hamid Beigy. Sharif University of Technology. Fall 1394
Transcription
1 Data Mining Clustering Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall / 31
Table of contents
1. Introduction
2. Data matrix and dissimilarity matrix
3. Proximity Measures
4. Clustering methods
   - Partitioning methods
   - Hierarchical methods
   - Model-based clustering
   - Density based clustering
   - Grid-based clustering
5. Cluster validation and assessment
Introduction
Clustering is the process of grouping a set of data objects into multiple groups or clusters so that objects within a cluster have high similarity, but are very dissimilar to objects in other clusters.
Dissimilarities and similarities are assessed based on the attribute values describing the objects and often involve distance measures.
Clustering as a data mining tool has its roots in many application areas such as biology, security, business intelligence, and Web search.
Requirements for cluster analysis
Clustering is a challenging research field, and the following are its typical requirements:
- Scalability
- Ability to deal with different types of attributes
- Discovery of clusters with arbitrary shape
- Requirements for domain knowledge to determine input parameters
- Ability to deal with noisy data
- Incremental clustering and insensitivity to input order
- Capability of clustering high-dimensionality data
- Constraint-based clustering
- Interpretability and usability
Comparing clustering methods
Clustering methods can be compared using the following aspects:
- The partitioning criteria: In some methods, all the objects are partitioned so that no hierarchy exists among the clusters.
- Separation of clusters: In some methods, data are partitioned into mutually exclusive clusters, while in other methods the clusters may not be exclusive, that is, a data object may belong to more than one cluster.
- Similarity measure: Some methods determine the similarity between two objects by the distance between them, while in other methods the similarity may be defined by connectivity based on density or contiguity.
- Clustering space: Many clustering methods search for clusters within the entire data space. These methods are useful for low-dimensionality data sets. With high-dimensional data, however, there can be many irrelevant attributes, which can make similarity measurements unreliable. Consequently, clusters found in the full space are often meaningless. It is often better to search instead for clusters within different subspaces of the same data set.
Data matrix and dissimilarity matrix
Suppose that we have $n$ objects described by $p$ attributes. The objects are $x_1 = (x_{11}, x_{12}, \ldots, x_{1p})$, $x_2 = (x_{21}, x_{22}, \ldots, x_{2p})$, and so on, where $x_{ij}$ is the value for object $x_i$ of the $j$th attribute. For brevity, we hereafter refer to object $x_i$ as object $i$. The objects may be tuples in a relational database, and are also referred to as data samples or feature vectors.
Main memory-based clustering and nearest-neighbor algorithms typically operate on either of the following two data structures:
- Data matrix: This structure stores the $n$ objects in the form of a table or $n \times p$ matrix.
$$\begin{bmatrix} x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\ \vdots & & \vdots & & \vdots \\ x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nf} & \cdots & x_{np} \end{bmatrix}$$
- Dissimilarity matrix: This structure stores a collection of proximities that are available for all pairs of objects. It is often represented by an $n \times n$ matrix or table:
$$\begin{bmatrix} 0 & d(1,2) & d(1,3) & \cdots & d(1,n) \\ d(2,1) & 0 & d(2,3) & \cdots & d(2,n) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d(n,1) & d(n,2) & d(n,3) & \cdots & 0 \end{bmatrix}$$
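To make the relationship between the two structures concrete, here is a minimal sketch that derives a dissimilarity matrix from a data matrix. It assumes numeric attributes and Euclidean distance as the proximity measure; the function name `dissimilarity_matrix` and the sample data are illustrative and not part of the slides.

```python
import math

def dissimilarity_matrix(data):
    """Build the n x n dissimilarity matrix from an n x p data matrix.

    Assumption: all attributes are numeric and d(i, j) is Euclidean distance.
    """
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            # The matrix is symmetric with a zero diagonal.
            d[i][j] = d[j][i] = dist
    return d

# Three objects described by two attributes (n = 3, p = 2).
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = dissimilarity_matrix(X)
```

Because $d(i, j) = d(j, i)$ and $d(i, i) = 0$ for a distance, only the entries above the diagonal need to be computed.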
Proximity Measures
Proximity measures for nominal attributes: Let the number of states of a nominal attribute be $M$. The dissimilarity between two objects $i$ and $j$ can be computed based on the ratio of mismatches:
$$d(i, j) = \frac{p - m}{p}$$
where $m$ is the number of matches and $p$ is the total number of attributes describing the objects.
Proximity measures for binary attributes: Binary attributes are either symmetric or asymmetric. Treating binary attributes as if they are numeric can be misleading, so methods specific to binary data are needed for computing dissimilarity. If all binary attributes are thought of as having the same weight, we have the 2 × 2 contingency table of Table 2.3, where $q$ is the number of attributes that equal 1 for both objects $i$ and $j$, $r$ is the number of attributes that equal 1 for object $i$ but equal 0 for object $j$, $s$ is the number of attributes that equal 0 for object $i$ but equal 1 for object $j$, and $t$ is the number of attributes that equal 0 for both objects $i$ and $j$. The total number of attributes is $p$, where $p = q + r + s + t$.

Table 2.3 Contingency Table for Binary Attributes
                  Object j
                  1        0        sum
Object i    1     q        r        q + r
            0     s        t        s + t
            sum   q + s    r + t    p

Recall that for symmetric binary attributes, each state is equally valuable. For symmetric binary attributes, dissimilarity is calculated as
$$d(i, j) = \frac{r + s}{q + r + s + t}$$
For asymmetric binary attributes, when the number of negative matches, $t$, is unimportant and the number of positive matches, $q$, is important, dissimilarity is calculated as
$$d(i, j) = \frac{r + s}{q + r + s}$$
The coefficient $1 - d(i, j)$ is called the Jaccard coefficient.
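The contingency counts and the two binary dissimilarities can be sketched as follows. The function names and the example vectors are illustrative assumptions; returning 0.0 when $q + r + s = 0$ (two all-zero objects) is also an assumption, since the slides do not define that case.

```python
def contingency(i, j):
    """Return (q, r, s, t) for two equal-length 0/1 attribute vectors."""
    q = sum(1 for a, b in zip(i, j) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(i, j) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(i, j) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(i, j) if a == 0 and b == 0)
    return q, r, s, t

def symmetric_binary_dissimilarity(i, j):
    q, r, s, t = contingency(i, j)
    return (r + s) / (q + r + s + t)

def asymmetric_binary_dissimilarity(i, j):
    q, r, s, _ = contingency(i, j)  # negative matches t are ignored
    if q + r + s == 0:
        return 0.0  # assumption: two all-zero objects are treated as identical
    return (r + s) / (q + r + s)

def jaccard_coefficient(i, j):
    return 1.0 - asymmetric_binary_dissimilarity(i, j)

obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 0, 1, 0, 1, 0]
counts = contingency(obj_i, obj_j)
```

For these two vectors the counts are $q = 2$, $r = 0$, $s = 1$, $t = 3$, so the symmetric dissimilarity is $1/6$ while the asymmetric one is $1/3$: ignoring the negative matches makes the single mismatch weigh more.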
Proximity Measures (cont.)
Dissimilarity of numeric attributes: The most popular distance measure is Euclidean distance
$$d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2}$$
Another well-known measure is Manhattan distance
$$d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$$
Minkowski distance is a generalization of the Euclidean and Manhattan distances
$$d(i, j) = \sqrt[h]{|x_{i1} - x_{j1}|^h + |x_{i2} - x_{j2}|^h + \cdots + |x_{ip} - x_{jp}|^h}$$
Dissimilarity of ordinal attributes: We first replace each $x_{if}$ by its corresponding rank $r_{if} \in \{1, \ldots, M_f\}$ and then normalize it using
$$z_{if} = \frac{r_{if} - 1}{M_f - 1}$$
Then dissimilarity can be computed using distance measures for numeric attributes on $z_{if}$.
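A short sketch of these formulas: one Minkowski function covers the Manhattan ($h = 1$) and Euclidean ($h = 2$) cases, and a helper maps an ordinal rank onto $[0, 1]$. The names `minkowski` and `normalize_rank` are illustrative.

```python
def minkowski(x, y, h):
    """Minkowski distance of order h between two numeric vectors."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)

def normalize_rank(rank, num_states):
    """Map a rank in {1, ..., M_f} onto [0, 1] via z = (r - 1) / (M_f - 1)."""
    return (rank - 1) / (num_states - 1)

x_i = [1.0, 2.0]
x_j = [4.0, 6.0]
manhattan = minkowski(x_i, x_j, 1)   # |1 - 4| + |2 - 6|
euclidean = minkowski(x_i, x_j, 2)   # sqrt(3^2 + 4^2)

# An ordinal attribute with three states, e.g. ranks 1, 2, 3.
z = [normalize_rank(r, 3) for r in (1, 2, 3)]
```

After normalization the ordinal values can be passed to `minkowski` like any other numeric attribute.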
Proximity Measures (cont.)
Dissimilarity for attributes of mixed types: A preferable approach is to process all attribute types together, performing a single analysis.
$$d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$$
where the indicator $\delta_{ij}^{(f)} = 0$ if either (1) $x_{if}$ or $x_{jf}$ is missing, or (2) $x_{if} = x_{jf} = 0$ and attribute $f$ is asymmetric binary; otherwise $\delta_{ij}^{(f)} = 1$.
The distance $d_{ij}^{(f)}$ is computed based on the type of attribute $f$.
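The weighted average above can be sketched as follows. The slides do not spell out the per-attribute distances $d_{ij}^{(f)}$, so this example assumes a common choice: range-normalized absolute difference for numeric attributes and a 0/1 mismatch for nominal and binary attributes. The attribute kinds, the `ranges` argument, and the use of `None` for a missing value are all illustrative assumptions.

```python
def mixed_dissimilarity(x, y, kinds, ranges):
    """Mixed-type dissimilarity: sum(delta * d) / sum(delta) over attributes.

    kinds[f] is one of "numeric", "nominal", "symmetric", "asymmetric".
    ranges[f] is max - min of attribute f (used for numeric attributes only).
    Missing values are represented by None (an assumption of this sketch).
    """
    num, den = 0.0, 0.0
    for f, kind in enumerate(kinds):
        a, b = x[f], y[f]
        if a is None or b is None:
            continue  # delta = 0: a value is missing
        if kind == "asymmetric" and a == 0 and b == 0:
            continue  # delta = 0: negative match on an asymmetric binary attribute
        if kind == "numeric":
            d = abs(a - b) / ranges[f]   # assumed per-attribute distance
        else:
            d = 0.0 if a == b else 1.0   # assumed mismatch distance
        num += d
        den += 1.0
    return num / den if den else 0.0

kinds = ["numeric", "nominal", "asymmetric", "numeric"]
ranges = [10.0, None, None, 4.0]
obj_i = [2.0, "red", 0, None]
obj_j = [7.0, "blue", 0, 3.0]
d_ij = mixed_dissimilarity(obj_i, obj_j, kinds, ranges)
```

Here only the first two attributes contribute ($\delta = 1$): the third is a negative match on an asymmetric binary attribute and the fourth has a missing value, so $d(i, j) = (0.5 + 1) / 2$.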
Clustering methods
There are many clustering algorithms in the literature. It is difficult to provide a crisp categorization of clustering methods because these categories may overlap, so that a method may have features from several categories. In general, the major fundamental clustering methods can be classified into the following categories.

Method and general characteristics:
- Partitioning methods
  - Find mutually exclusive clusters of spherical shape
  - Distance-based
  - May use mean or medoid (etc.) to represent cluster center
  - Effective for small- to medium-size data sets
- Hierarchical methods
  - Clustering is a hierarchical decomposition (i.e., multiple levels)
  - Cannot correct erroneous merges or splits
  - May incorporate other techniques like microclustering or consider object linkages
- Density-based methods
  - Can find arbitrarily shaped clusters
  - Clusters are dense regions of objects in space that are separated by low-density regions
  - Cluster density: Each point must have a minimum number of points within its neighborhood
  - May filter out outliers
- Grid-based methods
  - Use a multiresolution grid data structure
  - Fast processing time (typically independent of the number of data objects, yet dependent on grid size)
Partitioning methods
The simplest and most fundamental version of cluster analysis is partitioning, which organizes the objects of a set into several exclusive groups or clusters.
Formally, given a data set, $D$, of $n$ objects, and $k$, the number of clusters to form, a partitioning algorithm organizes the objects into $k$ partitions ($k \le n$), where each partition represents a cluster.
The clusters are formed to optimize an objective partitioning criterion, such as a dissimilarity function based on distance, so that the objects within a cluster are similar to one another and dissimilar to objects in other clusters in terms of the data set attributes.
k-means clustering algorithm
Suppose a data set, $D$, contains $n$ objects in Euclidean space. Partitioning methods distribute the objects in $D$ into $k$ clusters, $C_1, \ldots, C_k$, that is, $C_i \subset D$ and $C_i \cap C_j = \emptyset$ for $1 \le i, j \le k$, $i \ne j$.
An objective function is used to assess the partitioning quality so that objects within a cluster are similar to one another but dissimilar to objects in other clusters. That is, the objective function aims for high intracluster similarity and low intercluster similarity.
A centroid-based partitioning technique uses the centroid of a cluster, $C_i$, to represent that cluster. The difference between an object $p \in C_i$ and $\mu_i$, the representative of the cluster, is measured by $\|p - \mu_i\|$.
The quality of cluster $C_i$ can be measured by the within-cluster variation, which is the sum of squared error between all objects in $C_i$ and the centroid $\mu_i$, defined as
$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \|p - \mu_i\|^2$$
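As a small illustration of the objective, the following sketch computes $E$ for a given partition. The helper names and the toy clusters are assumptions made for the example.

```python
def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def within_cluster_variation(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        for p in cluster:
            total += sum((a - m) ** 2 for a, m in zip(p, mu))
    return total

clusters = [
    [(1.0, 1.0), (3.0, 1.0)],  # centroid (2, 1), each point at squared distance 1
    [(10.0, 10.0)],            # a singleton contributes 0
]
E = within_cluster_variation(clusters)
```

A smaller $E$ means the clusters are more compact around their centroids.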
k-means clustering algorithm (cont.)
[Figure: k-means iterations showing the cluster means µ1 and µ2. Panels: (a) initial dataset; (b) iteration t = 1; (c) iteration t = 2; (d) iteration t = 3; (e) iteration t = 4; (f) iteration t = 5 (converged).]
k-means clustering algorithm (cont.)
The k-means method is not guaranteed to converge to the global optimum and often terminates at a local optimum.
The results may depend on the initial random selection of cluster centers.
To obtain good results in practice, it is common to run the k-means algorithm multiple times with different initial cluster centers.
The time complexity of the k-means algorithm is $O(nkt)$, where $n$ is the total number of objects, $k$ is the number of clusters, and $t$ is the number of iterations. Normally, $k \ll n$ and $t \ll n$. Therefore, the method is relatively scalable and efficient in processing large data sets.
There are several variants of the k-means method. These can differ in the selection of the initial $k$ means, the calculation of dissimilarity, and the strategies for calculating cluster means.
The k-modes method is a variant of k-means that extends the k-means paradigm to cluster nominal data by replacing the means of clusters with modes.
Partitioning around medoids (PAM) is a realization of the k-medoids method.
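The restart strategy can be sketched as follows: run the standard assign-then-update iteration from several random initializations and keep the run with the lowest within-cluster variation. This is a minimal sketch, not the slides' own code; the function names, the choice of initial centers by sampling data points, and the fixed seed are assumptions.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from randomly chosen initial centers."""
    centers = rng.sample(points, k)  # assumption: initial centers are data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return sse, labels, centers

def kmeans(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the run with the smallest SSE."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0])

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best_sse, best_labels, best_centers = kmeans(points, k=2)
```

Each run costs $O(nkt)$, so $R$ restarts cost $O(Rnkt)$; the restarts are independent and could be run in parallel.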
Hierarchical methods
A hierarchical clustering method works by grouping data objects into a hierarchy or tree of clusters.
Hierarchical clustering methods:
- Agglomerative hierarchical clustering
- Divisive hierarchical clustering
[Figure: Agglomerative (AGNES) and divisive (DIANA) hierarchical clustering on data objects {a, b, c, d, e}. AGNES proceeds from Step 0 to Step 4, merging a, b into ab, d, e into de, then cde, and finally abcde; DIANA proceeds in the reverse order.]
Distance measures in hierarchical methods
Whether using an agglomerative method or a divisive method, a core need is to measure the distance between two clusters, where each cluster is generally a set of objects.
Four widely used measures for distance between clusters are as follows, where $\|p - q\|$ is the distance between two objects or points, $p$ and $q$; $\mu_i$ is the mean for cluster $C_i$; and $n_i$ is the number of objects in $C_i$. They are also known as linkage measures.
- Minimum distance
$$d_{min}(C_i, C_j) = \min_{p \in C_i,\, q \in C_j} \|p - q\|$$
- Maximum distance
$$d_{max}(C_i, C_j) = \max_{p \in C_i,\, q \in C_j} \|p - q\|$$
- Mean distance
$$d_{mean}(C_i, C_j) = \|\mu_i - \mu_j\|$$
- Average distance
$$d_{avg}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{p \in C_i,\, q \in C_j} \|p - q\|$$
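The four linkage measures translate directly into code. The sketch below assumes Euclidean distance for $\|p - q\|$; the function names and the two toy clusters are illustrative.

```python
import math

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(cluster):
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def d_min(ci, cj):
    """Minimum distance (single linkage)."""
    return min(euclid(p, q) for p in ci for q in cj)

def d_max(ci, cj):
    """Maximum distance (complete linkage)."""
    return max(euclid(p, q) for p in ci for q in cj)

def d_mean(ci, cj):
    """Distance between the two cluster means."""
    return euclid(mean_point(ci), mean_point(cj))

def d_avg(ci, cj):
    """Average of all pairwise distances between the clusters."""
    return sum(euclid(p, q) for p in ci for q in cj) / (len(ci) * len(cj))

Ci = [(0.0, 0.0), (0.0, 1.0)]
Cj = [(3.0, 0.0), (4.0, 0.0)]
```

For any pair of clusters, $d_{min} \le d_{avg} \le d_{max}$, which is one way to sanity-check an implementation.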
Hierarchical methods
[Figure: Agglomerative and divisive hierarchical clustering on data objects {a, b, c, d, e}.]
[Figure: Dendrogram representation for hierarchical clustering of data objects {a, b, c, d, e}, with levels l = 0 to l = 4 and a similarity scale.]
... This is a single-linkage approach in that each cluster is represented by all the objects in the cluster, and the similarity between two clusters is measured ...
Model-based clustering
k-means is closely related to a probabilistic model known as the Gaussian mixture model.
$$p(x) = \sum_{k} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
$\pi_k$, $\mu_k$, $\Sigma_k$ are parameters. The $\pi_k$ are called mixing proportions, and each Gaussian is called a mixture component.
The model is simply a weighted sum of Gaussians, but it is much more powerful than a single Gaussian because it can model multi-modal distributions.
[Figure: Gaussian mixture model example, a mixture of three Gaussians. Source: Roland Memisevic, Machine Learning.]
Note that for $p(x)$ to be a probability distribution, we require that $\sum_k \pi_k = 1$ and that for all $k$ we have $\pi_k > 0$. Thus, we may interpret the $\pi_k$ as probabilities themselves.
Set of parameters $\theta = \{\{\pi_k\}, \{\mu_k\}, \{\Sigma_k\}\}$
Model-based clustering (cont.)
Let $X = \{x_1, \ldots, x_n\}$ be drawn i.i.d. from a mixture of Gaussians. The log-likelihood of the observations equals
$$\ln p(X \mid \theta) = \sum_{i=1}^{n} \ln \sum_{j=1}^{k} \pi_j \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$$
Taking the derivative of $\ln p(X \mid \theta)$ with respect to $\mu_j$ and setting it equal to zero, we obtain
$$0 = \sum_{i=1}^{n} \frac{\pi_j \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \pi_l \mathcal{N}(x_i \mid \mu_l, \Sigma_l)} \Sigma_j^{-1} (x_i - \mu_j)$$
Let
$$\gamma(z_{ij}) = \frac{\pi_j \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \pi_l \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$
Model-based clustering (cont.)
We had
$$0 = \sum_{i=1}^{n} \frac{\pi_j \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \pi_l \mathcal{N}(x_i \mid \mu_l, \Sigma_l)} \Sigma_j^{-1} (x_i - \mu_j)$$
Multiplying by $\Sigma_j$ (assumed nonsingular) and rearranging, we obtain
$$\mu_j = \frac{1}{n_j} \sum_{i=1}^{n} \gamma(z_{ij}) x_i, \qquad n_j = \sum_{i=1}^{n} \gamma(z_{ij})$$
Similar to the above step, we obtain
$$\Sigma_j = \frac{1}{n_j} \sum_{i=1}^{n} \gamma(z_{ij}) (x_i - \mu_j)(x_i - \mu_j)^T, \qquad \pi_j = \frac{n_j}{n}$$
Please read Section 9.2 of Bishop.
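The update equations suggest an iterative scheme: compute the responsibilities $\gamma(z_{ij})$ from the current parameters, then recompute $\mu_j$, $\Sigma_j$, and $\pi_j$ from them. The sketch below is a simplified illustration for the univariate case, where $\Sigma_j$ reduces to a scalar variance; the function names, the toy data, and the initial parameters are assumptions, and no safeguard against collapsing variances is included.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian N(x | mu, var)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(xs, pis, mus, variances):
    return sum(
        math.log(sum(pi * normal_pdf(x, mu, var) for pi, mu, var in zip(pis, mus, variances)))
        for x in xs
    )

def em_step(xs, pis, mus, variances):
    """One pass of the update equations (1-D special case)."""
    k, n = len(pis), len(xs)
    # Responsibilities gamma(z_ij) under the current parameters.
    gamma = []
    for x in xs:
        weighted = [pis[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
        total = sum(weighted)
        gamma.append([w / total for w in weighted])
    # Effective number of points per component, then the parameter updates.
    n_j = [sum(g[j] for g in gamma) for j in range(k)]
    new_mus = [sum(g[j] * x for g, x in zip(gamma, xs)) / n_j[j] for j in range(k)]
    new_vars = [
        sum(g[j] * (x - new_mus[j]) ** 2 for g, x in zip(gamma, xs)) / n_j[j]
        for j in range(k)
    ]
    new_pis = [n_j[j] / n for j in range(k)]
    return new_pis, new_mus, new_vars

xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
params = ([0.5, 0.5], [0.0, 6.0], [1.0, 1.0])  # assumed initial pi, mu, variance
ll_before = log_likelihood(xs, *params)
for _ in range(20):
    params = em_step(xs, *params)
ll_after = log_likelihood(xs, *params)
```

On this toy data the means move toward the centers of the two groups and the log-likelihood does not decrease from one pass to the next, which is the behavior expected of this kind of iteration.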
Density based clustering
[Figure: Clusters of arbitrary shape.]
The general idea of these methods is to continue growing a given cluster as long as the density in the neighborhood exceeds some threshold.
How can we find dense regions in density-based clustering? The density of an object x can be measured by the number of objects close to x.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core objects, that is, objects that have dense neighborhoods. It connects core objects and their neighborhoods to form dense regions as clusters.
Density based clustering (cont.)
How does DBSCAN quantify the neighborhood of an object? A user-specified parameter ɛ > 0 is used to specify the radius of a neighborhood we consider for every object.
Definition (ɛ-neighborhood): The ɛ-neighborhood of an object x is the space within a radius ɛ centered at x.
Due to the fixed neighborhood size parameterized by ɛ, the density of a neighborhood can be measured simply by the number of objects in the neighborhood.
Definition (Core object): An object is a core object if the ɛ-neighborhood of the object contains at least MinPts objects.
[Figure: Density-based dataset, with panels (a) and (b) illustrating the ɛ-neighborhood of a point x and points x, y, z.]
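The two definitions can be sketched as follows. This example assumes Euclidean distance and counts the object itself as a member of its own ɛ-neighborhood, a common convention that the slides do not state explicitly; the function names and data are illustrative.

```python
import math

def eps_neighborhood(points, idx, eps):
    """Indices of all objects within distance eps of points[idx] (itself included)."""
    center = points[idx]
    return [
        j for j, q in enumerate(points)
        if math.sqrt(sum((a - b) ** 2 for a, b in zip(center, q))) <= eps
    ]

def is_core(points, idx, eps, min_pts):
    """An object is a core object if its eps-neighborhood holds at least min_pts objects."""
    return len(eps_neighborhood(points, idx, eps)) >= min_pts

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (50.0, 50.0)]
core = [i for i in range(len(points)) if is_core(points, i, eps=1.5, min_pts=3)]
```

The four corner points of the unit square are all within 1.5 of one another, so each is a core object, while the isolated point has only itself in its neighborhood.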
Density based clustering (cont.)
Given a set, D, of objects, we can identify all core objects with respect to the given parameters, ɛ and MinPts.
The clustering task is thereby reduced to using core objects and their neighborhoods to form dense regions, where the dense regions are clusters.
Definition (Directly density-reachable): For a core object q and an object p, we say that p is directly density-reachable from q (with respect to ɛ and MinPts) if p is within the ɛ-neighborhood of q.
An object p is directly density-reachable from another object q if and only if q is a core object and p is in the ɛ-neighborhood of q.
[Figure: Density-reachability and density-connectivity in density-based clustering, with MinPts = 3.]
Density based clustering (cont.)
How can we assemble a large dense region using small dense regions centered by core objects?
Definition (Density-reachable): An object p is density-reachable from q (with respect to ɛ and MinPts in D) if there is a chain of objects $p_1, \ldots, p_n$, such that $p_1 = q$, $p_n = p$, and $p_{i+1}$ is directly density-reachable from $p_i$ with respect to ɛ and MinPts, for $1 \le i < n$, $p_i \in D$.
[Figure: Density-reachability and density-connectivity in density-based clustering, with MinPts = 3. Source: Based on Ester, Kriegel, Sander, and Xu [EKSX96].]
Density based clustering (cont.)
To connect core objects as well as their neighbors in a dense region, DBSCAN uses the notion of density-connectedness.
Definition (Density-connected): Two objects $p_1, p_2 \in D$ are density-connected with respect to ɛ and MinPts if there is an object $q \in D$ such that both $p_1$ and $p_2$ are density-reachable from q with respect to ɛ and MinPts.
[Figure: Density-reachability and density-connectivity in density-based clustering, with MinPts = 3. Source: Based on Ester, Kriegel, Sander, and Xu [EKSX96].]
Density based clustering (cont.)
How does DBSCAN find clusters?
1. Initially, all objects in data set D are marked as unvisited.
2. It randomly selects an unvisited object p, marks p as visited, and checks whether p is a core point or not.
3. If p is not a core point, then p is marked as a noise point. Otherwise, a new cluster C is created for p, and all the objects in the ɛ-neighborhood of p are added to a candidate set N.
4. DBSCAN iteratively adds to C those objects in N that do not belong to any cluster.
5. In this process, for an object p' in N that carries the label unvisited, DBSCAN marks it as visited and checks its ɛ-neighborhood.
6. If p' is a core point, then those objects in its ɛ-neighborhood are added to N.
7. DBSCAN continues adding objects to C until C can no longer be expanded, that is, N is empty. At this time, cluster C is completed, and thus is output.
8. To find the next cluster, DBSCAN randomly selects an unvisited object from the remaining ones.
9. The clustering process continues until all objects are visited.
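The steps above can be sketched as a short function. This is a minimal illustration under stated assumptions rather than a reference implementation: it uses Euclidean distance, counts an object as part of its own ɛ-neighborhood, scans all objects for each neighborhood query, and visits objects in index order instead of selecting them at random so that the output is reproducible. The names `dbscan`, `region_query`, and `NOISE` are illustrative.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of objects in the eps-neighborhood of points[i] (itself included)."""
    return [
        j for j, q in enumerate(points)
        if math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], q))) <= eps
    ]

def dbscan(points, eps, min_pts):
    n = len(points)
    visited = [False] * n
    labels = [None] * n          # None: not yet assigned to any cluster
    cluster = 0
    for p in range(n):           # assumption: index order instead of random selection
        if visited[p]:
            continue
        visited[p] = True
        neighbors = region_query(points, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE    # may later be absorbed into a cluster as a border point
            continue
        labels[p] = cluster      # p is a core point: start a new cluster
        candidates = list(neighbors)             # the candidate set N
        while candidates:
            q = candidates.pop()
            if not visited[q]:
                visited[q] = True
                q_neighbors = region_query(points, q, eps)
                if len(q_neighbors) >= min_pts:  # q is core: expand the cluster through it
                    candidates.extend(q_neighbors)
            if labels[q] is None or labels[q] == NOISE:
                labels[q] = cluster              # q did not belong to any cluster yet
        cluster += 1
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (50.0, 50.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these parameters the two unit squares form two clusters and the isolated point is labeled as noise. Each neighborhood query here scans all $n$ objects, so the sketch costs $O(n^2)$ distance computations.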
Density based clustering (example)
[Figure: Example of density-based clustering on a two-dimensional dataset.]
Grid-based clustering
The grid-based clustering approach uses a multiresolution grid data structure.
[Figure: Hierarchical structure for STING clustering, showing the first layer, the (i-1)st layer, and the ith layer.]
Cluster validation and assessment
Cluster evaluation assesses the feasibility of clustering analysis on a data set and the quality of the results generated by a clustering method. The major tasks of clustering evaluation include the following:
1. Assessing clustering tendency: In this task, for a given data set, we assess whether a nonrandom structure exists in the data. Clustering analysis on a data set is meaningful only when there is a nonrandom structure in the data.
2. Determining the number of clusters in a data set: Algorithms such as k-means require the number of clusters in a data set as a parameter. Moreover, the number of clusters can be regarded as an interesting and important summary statistic of a data set. Therefore, it is desirable to estimate this number even before a clustering algorithm is used to derive detailed clusters. A simple method is to set the number of clusters to about $\sqrt{n/2}$ for a data set of $n$ points.
3. Measuring clustering quality: After applying a clustering method on a data set, we want to assess how good the resulting clusters are. There are also measures that score clusterings and thus can compare two sets of clustering results on the same data set.
Assessing clustering tendency
Cluster validation and assessment
How good is the clustering generated by a method, and how can we compare the clusterings generated by different methods?
1. Internal criterion: Typical objective functions in clustering formalize the goal of attaining high intra-cluster similarity and low inter-cluster similarity. But good scores on an internal criterion do not necessarily translate into good effectiveness in an application. An alternative to internal criteria is direct evaluation in the application of interest.
2. External criterion: An external criterion evaluates how well the clustering matches the gold standard classes. The Rand index measures the percentage of decisions that are correct.
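As an illustration of the external criterion, the Rand index can be computed by treating every pair of objects as one decision: the decision is correct when the clustering and the gold standard agree on whether the pair belongs together. The function name and the example labelings below are illustrative.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which the two labelings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
        total += 1
    return agree / total

gold = [0, 0, 1, 1]
pred = [0, 0, 0, 1]
ri = rand_index(gold, pred)
```

Of the six pairs in this example, three are decided correctly, so the index is 0.5. The index depends only on which objects are grouped together, not on the label values themselves.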
Data Mining. Clustering. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Clustering Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 41 Table of contents 1 Introduction 2 Data matrix and
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationData Mining: Concepts and Techniques. Chapter March 8, 2007 Data Mining: Concepts and Techniques 1
Data Mining: Concepts and Techniques Chapter 7.1-4 March 8, 2007 Data Mining: Concepts and Techniques 1 1. What is Cluster Analysis? 2. Types of Data in Cluster Analysis Chapter 7 Cluster Analysis 3. A
More informationClustering in Data Mining
Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,
More informationCluster analysis. Agnieszka Nowak - Brzezinska
Cluster analysis Agnieszka Nowak - Brzezinska Outline of lecture What is cluster analysis? Clustering algorithms Measures of Cluster Validity What is Cluster Analysis? Finding groups of objects such that
More informationDATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm
DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationData Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI
More informationUnsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team
Unsupervised Learning Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Table of Contents 1)Clustering: Introduction and Basic Concepts 2)An Overview of Popular Clustering Methods 3)Other Unsupervised
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationUnsupervised Learning
Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised
More informationNotes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)
1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 2
Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised