CS570: Introduction to Data Mining

CS570: Introduction to Data Mining. Cluster Analysis. Reading: Chapters 10.4, 10.6, 11.1.3 of Han; Chapters 8.4, 8.5, 9.2.2, 9.3 of Tan. Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D. Based on: Han, Kamber & Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2011; Tan, Steinbach & Kumar, Introduction to Data Mining, Pearson Addison Wesley, 2006. October 15, 2013

Cluster Analysis Overview: partitioning methods, hierarchical methods, graph-based methods (CHAMELEON), self-organizing maps (SOM), density-based methods, the EM method, cluster evaluation, outlier analysis.

Spatial Data. A cluster is regarded as a dense region of objects, i.e., a region of high density. A cluster can have an arbitrary shape. Streaks and noise may exist.

Density-Based Clustering Methods. Clustering based on density. Major features: clusters of arbitrary shape; handles noise; needs density parameters as a termination condition. Several interesting studies: DBSCAN (Ester et al., KDD'96), OPTICS (Ankerst et al., SIGMOD'99), DENCLUE (Hinneburg & Keim, KDD'98), CLIQUE (Agrawal et al., SIGMOD'98; more grid-based).

DBSCAN: Basic Concepts. Density = number of points within a specified radius. Core point: has high density. Border point: has lower density, but lies in the neighborhood of a core point. Noise point: neither a core point nor a border point. (Figure: examples of a core point, a border point, and a noise point.)

DBScan: Definitions. Two parameters: Eps, the radius of the neighborhood, and MinPts, the minimum number of points in an Eps-neighborhood of that point. Eps-neighborhood of a point p: N_Eps(p) = {q in D | dist(p, q) <= Eps}. Core point: |N_Eps(p)| >= MinPts. (Figure: MinPts = 5, Eps = 1 cm.)
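These definitions can be sketched directly in Python (a toy illustration; the point set and the Eps/MinPts values below are invented for the example):

```python
import math

def eps_neighborhood(D, p, eps):
    """N_Eps(p): all points q in D with dist(p, q) <= eps (p itself included)."""
    return [q for q in D if math.dist(p, q) <= eps]

def is_core(D, p, eps, min_pts):
    """p is a core point if its Eps-neighborhood holds at least MinPts points."""
    return len(eps_neighborhood(D, p, eps)) >= min_pts

# Toy data: a tight group around the origin plus one far-away point.
D = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.2), (-0.1, 0.1), (0.0, 0.3), (5.0, 5.0)]
print(is_core(D, (0.0, 0.0), eps=0.5, min_pts=5))   # True: 5 points within 0.5
print(is_core(D, (5.0, 5.0), eps=0.5, min_pts=5))   # False: only itself nearby
```

Note that, following the set definition above, the point itself is counted as part of its own Eps-neighborhood.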

DBScan: Definitions. Directly density-reachable: p is directly density-reachable from q if p belongs to N_Eps(q) and q is a core point. Density-reachable: p is density-reachable from q if there is a chain of points p_1, ..., p_n with p_1 = q and p_n = p such that p_{i+1} is directly density-reachable from p_i. Density-connected: p and q are density-connected if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts. (Figure: MinPts = 5, Eps = 1 cm.)

DBSCAN: Cluster Definition. A cluster is defined as a maximal set of density-connected points. (Figure: outlier, border, and core points; Eps = 1 cm, MinPts = 5.)

DBSCAN: The Algorithm. Arbitrarily select an unvisited point p and mark it as visited. If p is a core point, retrieve all points density-reachable from p w.r.t. Eps and MinPts; these points form a cluster, to which p is added. Otherwise, mark the point as noise and visit the next unvisited point in the database. Continue the process until all of the points have been processed.
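The loop above can be sketched as follows (a minimal brute-force implementation with O(n^2) neighborhood queries and no spatial index; the sample points are made up):

```python
import math

def dbscan(D, eps, min_pts):
    """Minimal DBSCAN sketch. Returns one label per point:
    0, 1, ... for clusters, -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(D)

    def neighbors(i):
        # Brute-force Eps-neighborhood query (includes the point itself).
        return [j for j in range(len(D)) if math.dist(D[i], D[j]) <= eps]

    cluster_id = 0
    for i in range(len(D)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:          # not a core point: tentatively noise
            labels[i] = NOISE
            continue
        labels[i] = cluster_id            # start a new cluster at core point i
        queue = [j for j in seeds if j != i]
        while queue:                      # collect all density-reachable points
            j = queue.pop()
            if labels[j] == NOISE:        # border point: reclaim from noise
                labels[j] = cluster_id
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:   # j is also core: keep expanding
                queue.extend(j_seeds)
        cluster_id += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))   # [0, 0, 0, 1, 1, 1, -1]
```

Two tight groups become clusters 0 and 1, and the isolated point at (50, 50) is labeled noise.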

DBSCAN: Sensitive to Parameters.

DBSCAN: Determining Eps and MinPts. Basic idea: suppose the neighborhood size is k. For points within a cluster, the k-th nearest neighbors are at roughly the same distance; noise points have their k-th nearest neighbor at a farther distance. Plot the sorted distance of every point to its k-th nearest neighbor.
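The k-distance curve behind this heuristic can be computed as follows (a small sketch on an invented data set; in practice k is usually tied to the intended MinPts):

```python
import math

def k_distances(D, k):
    """Distance from each point to its k-th nearest neighbor (the point
    itself excluded), sorted in descending order: the k-distance curve."""
    curve = []
    for p in D:
        ds = sorted(math.dist(p, q) for q in D if q is not p)
        curve.append(ds[k - 1])
    return sorted(curve, reverse=True)

# Four clustered points share small, similar 2-NN distances; the lone
# noise point shows up as the jump at the head of the sorted curve,
# suggesting an Eps just above the flat part of the curve.
D = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
print(k_distances(D, k=2))
```

The first (largest) value belongs to the noise point; the remaining values sit on the flat part of the curve that a reasonable Eps should cover.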

DBSCAN: Core, Border and Noise Points. (Figure: the original points and their point types, core, border, and noise, for Eps = 10 and MinPts = 4.)

When DBSCAN Does NOT Work Well. (Figure: the original points and the clusters found with MinPts = 4 at Eps = 9.75 and Eps = 9.92.) Problem cases: varying densities and high-dimensional data.

DBSCAN: Features. Complexity: O(n^2), which can be reduced to O(n log n) using index structures. Advantages: does not require the number of clusters (vs. k-means); can find arbitrarily shaped clusters; can identify noise; mostly insensitive to the ordering of the points. Disadvantages: sensitive to parameters; does not respond well to data sets with varying densities.

OPTICS: A Cluster-Ordering Method (1999). OPTICS: Ordering Points To Identify the Clustering Structure; Ankerst, Breunig, Kriegel, and Sander (SIGMOD'99). Produces a special order of the database with respect to its density-based clustering structure. This cluster ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings. Good for both automatic and interactive cluster analysis, including finding the intrinsic clustering structure. Can be represented graphically or using visualization techniques.

OPTICS: Some Extensions from DBSCAN. Core distance of p: the smallest distance that makes p a core point. Reachability distance of p w.r.t. o: max{core-distance(o), d(o, p)}. (Figure: MinPts = 5, ε = 3 cm; e.g., r(p1, o) = 2.8 cm and r(p2, o) = 4 cm.)
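These two distances can be sketched in Python (assuming, as one common convention, that a point's neighborhood counts the point itself; the sample data and parameter values are invented):

```python
import math

def core_distance(D, o, eps, min_pts):
    """Smallest radius that makes o a core point, or None (undefined)
    when o is not a core point even at radius eps."""
    ds = sorted(math.dist(o, q) for q in D)   # includes d(o, o) = 0
    if len(ds) < min_pts or ds[min_pts - 1] > eps:
        return None
    return ds[min_pts - 1]

def reachability_distance(D, p, o, eps, min_pts):
    """max(core-distance(o), d(o, p)); undefined when o is not core."""
    cd = core_distance(D, o, eps, min_pts)
    return None if cd is None else max(cd, math.dist(o, p))

D = [(0, 0), (1, 0), (0, 1), (2, 0), (5, 5)]
print(core_distance(D, (0, 0), eps=2, min_pts=3))          # 1.0
print(reachability_distance(D, (2, 0), (0, 0), 2, 3))      # 2.0
print(core_distance(D, (5, 5), eps=2, min_pts=3))          # None: not core
```

The reachability distance of a nearby point is floored by the core distance, which is exactly what smooths the reachability plot that OPTICS outputs.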

OPTICS: The Algorithm. Arbitrarily select an unvisited point p and mark it as visited. If p is a core point, retrieve all points density-reachable from p w.r.t. Eps and MinPts, update the reachability distances of those points, and output the points in ascending order of reachability distance. Otherwise, visit the next unvisited point in the database. Continue the process until all of the points have been processed.

OPTICS: Example. (Figure: reachability-distance plot over the cluster order of the objects; the reachability distance is undefined for some points.)

Cluster Analysis Overview: partitioning methods, hierarchical methods, graph-based methods (CHAMELEON), self-organizing maps (SOM), density-based methods, other methods (the EM method, COBWEB), cluster evaluation, outlier analysis.

Probabilistic Model-Based Clustering. Assume the data are generated by a mathematical model; attempt to optimize the fit between the given data and that model. Typical methods: EM (expectation maximization, a statistical approach), COBWEB (a machine learning approach), and SOM (self-organizing feature map, a neural network approach).

Clustering by Mixture Model. Assume the data are generated by a mixture of probabilistic models; this is a generalization of k-means. Each cluster can be represented by a probabilistic model, such as a Gaussian (continuous) or a Poisson (discrete) distribution.

Expectation Maximization (EM). Start with an initial estimate of the parameters of the mixture model, then iteratively refine the parameters with the EM method. Expectation step: compute the expected likelihood that each data point x_i belongs to each cluster C_i. Maximization step: compute maximum likelihood estimates of the parameters. Repeat until the parameters no longer change, or the change falls below a threshold.
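As an illustration of the two steps, here is a deliberately stripped-down EM sketch for a two-component 1-D Gaussian mixture (equal priors and unit variances are held fixed and only the means are re-estimated; the data set is invented):

```python
import math

def em_gmm_1d(xs, iters=50):
    """EM sketch for a two-component 1-D Gaussian mixture.
    Only the component means are re-estimated."""
    mu = [min(xs), max(xs)]                      # crude initial estimate
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: maximum-likelihood (responsibility-weighted) mean update.
        for k in range(2):
            num = sum(r[k] * x for r, x in zip(resp, xs))
            den = sum(r[k] for r in resp)
            mu[k] = num / den
    return mu

xs = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
print(em_gmm_1d(xs))   # means converge near 0 and 5
```

With full EM, the priors and (co)variances would be re-estimated in the M-step as well; fixing them keeps the two steps easy to see.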

Conceptual Clustering. Conceptual clustering generates a concept description for each concept (class) and produces a hierarchical category or classification scheme; it is related to decision tree learning and mixture model learning. COBWEB (Fisher, 1987) is a popular and simple method of incremental conceptual learning. It creates a hierarchical clustering in the form of a classification tree; each node refers to a concept and contains a probabilistic description of that concept.

COBWEB Classification Tree. (Figure: an example classification tree.)

COBWEB: Learning the Classification Tree. Incrementally builds the classification tree. Given a new object: search for the best node at which to incorporate the object, or add a new node for the object; update the probabilistic description at each node; merge and split nodes as needed. A heuristic measure, category utility, guides construction of the tree.

COBWEB: Comments. Limitations: the assumption that the attributes are independent of each other is often too strong, because correlations may exist; it is not suitable for clustering large databases (skewed trees and expensive probability distributions).

Cluster Analysis Overview: partitioning methods, hierarchical methods, graph-based methods (CHAMELEON), self-organizing maps (SOM), density-based methods, other methods (the EM method, COBWEB), cluster evaluation, outlier analysis.

Cluster Evaluation. Determine the clustering tendency of the data, i.e., distinguish whether non-random structure exists. Determine the correct number of clusters. Evaluate how well the clustering fits the data without external information. Evaluate how well the clustering matches externally known results. Compare different clustering algorithms/results.

Clusters Found in Random Data. (Figure: a set of random points and the clusters found in them by DBSCAN, K-means, and complete link; each method finds clusters even though no real structure exists.)

Measures of Cluster Validity. Unsupervised (internal indices): measure the goodness of a clustering structure without respect to external information, e.g., the sum of squared error (SSE). Supervised (external indices): measure the extent to which cluster labels match externally supplied class labels, e.g., entropy. Relative: compare two different clustering results; often an external or internal index, such as SSE or entropy, is used for this purpose.

Internal Measures: Cohesion and Separation. Cluster cohesion: how closely related the objects in a cluster are. Cluster separation: how distinct or well separated a cluster is from other clusters. Example (squared error): cohesion is the within-cluster sum of squares, WSS = Σ_i Σ_{x in C_i} (x - m_i)^2; separation is the between-cluster sum of squares, BSS = Σ_i |C_i| (m - m_i)^2, where m is the overall mean and m_i is the mean of cluster C_i.

Internal Measures: Cohesion and Separation. In graph terms, cluster cohesion is the sum of the weights of all links within a cluster, and cluster separation is the sum of the weights of the links between nodes in the cluster and nodes outside the cluster.

Internal Measures: Cohesion and Separation. Example (SSE): BSS + WSS = constant. Data: the points 1, 2, 4, 5, with overall mean m = 3.
K = 1 cluster: WSS = (1-3)^2 + (2-3)^2 + (4-3)^2 + (5-3)^2 = 10; BSS = 4 × (3-3)^2 = 0; Total = 10 + 0 = 10.
K = 2 clusters, {1, 2} with mean 1.5 and {4, 5} with mean 4.5: WSS = (1-1.5)^2 + (2-1.5)^2 + (4-4.5)^2 + (5-4.5)^2 = 1; BSS = 2 × (3-1.5)^2 + 2 × (4.5-3)^2 = 9; Total = 1 + 9 = 10.
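The worked example above can be checked in code (a small sketch for 1-D data; the grouping of points into clusters is passed in explicitly):

```python
def wss_bss(points, clusters):
    """WSS and BSS for 1-D points grouped into clusters (lists of values)."""
    m = sum(points) / len(points)          # overall mean
    wss = bss = 0.0
    for c in clusters:
        mi = sum(c) / len(c)               # cluster mean
        wss += sum((x - mi) ** 2 for x in c)
        bss += len(c) * (mi - m) ** 2
    return wss, bss

pts = [1, 2, 4, 5]
print(wss_bss(pts, [[1, 2, 4, 5]]))    # K=1: WSS=10.0, BSS=0.0
print(wss_bss(pts, [[1, 2], [4, 5]]))  # K=2: WSS=1.0,  BSS=9.0
```

In both cases WSS + BSS = 10: splitting the data moves error from the within-cluster term to the between-cluster term, but the total is constant.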

Internal Measures: SSE. SSE is good for comparing two clusterings and can also be used to estimate the number of clusters. (Figure: a sample data set and the curve of SSE against the number of clusters K; the knee of the curve suggests the number of clusters.)

Internal Measures: SSE. Another example on a more complicated data set. (Figure: a data set with seven labeled clusters and the SSE of the clusters found using K-means.)

Statistical Framework for Cluster Validity. The more atypical a clustering result, the more likely it represents valid structure in the data. Use values obtained from random data as a baseline. Example (statistical framework for SSE): a clustering with SSE = 0.005, compared against a histogram of the SSE of three clusters found in 500 sets of random data points (SSE values roughly between 0.016 and 0.034).

External Measures. Compare cluster results with ground truth or with a manual clustering. Classification-oriented measures: entropy, purity, precision, recall, F-measure. Similarity-oriented measures: Jaccard score, etc.

External Measures: Classification-Oriented Measures. Entropy: the degree to which each cluster consists of objects of a single class. Precision: the fraction of a cluster that consists of objects of a specified class. Recall: the extent to which a cluster contains all objects of a specified class.
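These measures can be sketched on a single cluster (a toy example; the labels are the true classes of the objects that landed in the cluster):

```python
import math

def cluster_precision(cluster_labels, cls):
    """Precision: fraction of the cluster made up of objects of class cls."""
    return cluster_labels.count(cls) / len(cluster_labels)

def cluster_recall(cluster_labels, cls, all_labels):
    """Recall: fraction of all objects of class cls that the cluster contains."""
    return cluster_labels.count(cls) / all_labels.count(cls)

def cluster_entropy(cluster_labels):
    """Entropy (in bits) of the class distribution inside one cluster."""
    n = len(cluster_labels)
    probs = [cluster_labels.count(c) / n for c in set(cluster_labels)]
    return -sum(p * math.log2(p) for p in probs)

all_labels = ["a", "a", "a", "a", "b", "b"]   # true classes of all objects
cluster = ["a", "a", "a", "b"]                # true classes inside one cluster
print(cluster_precision(cluster, "a"))        # 0.75
print(cluster_recall(cluster, "a", all_labels))  # 0.75 (3 of the 4 'a' objects)
print(round(cluster_entropy(cluster), 3))     # 0.811
```

A pure cluster has entropy 0 and precision 1 for its class; the mixed cluster above scores between the extremes on all three measures.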

External Measures: Similarity-Oriented Measures. Given a reference clustering T and a clustering S: f00 = number of pairs of points in different clusters in both T and S; f01 = number of pairs in different clusters in T but the same cluster in S; f10 = number of pairs in the same cluster in T but different clusters in S; f11 = number of pairs in the same cluster in both T and S. Rand = (f00 + f11) / (f00 + f01 + f10 + f11); Jaccard = f11 / (f01 + f10 + f11).
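The pair counts and both indices can be computed directly (a small sketch; the two label vectors are invented and aligned by point index):

```python
from itertools import combinations

def pair_counts(T, S):
    """Count point pairs by whether they share a cluster in the reference
    clustering T and in the clustering S (labels aligned by index)."""
    f00 = f01 = f10 = f11 = 0
    for i, j in combinations(range(len(T)), 2):
        same_t, same_s = T[i] == T[j], S[i] == S[j]
        if same_t and same_s:
            f11 += 1
        elif same_t:
            f10 += 1
        elif same_s:
            f01 += 1
        else:
            f00 += 1
    return f00, f01, f10, f11

def rand_index(T, S):
    f00, f01, f10, f11 = pair_counts(T, S)
    return (f00 + f11) / (f00 + f01 + f10 + f11)

def jaccard(T, S):
    f00, f01, f10, f11 = pair_counts(T, S)
    return f11 / (f01 + f10 + f11)

T = [0, 0, 0, 1, 1]            # reference clustering
S = [0, 0, 1, 1, 1]            # clustering to evaluate
print(rand_index(T, S))        # 0.6
print(jaccard(T, S))           # 0.333...
```

Jaccard ignores the f00 pairs, which usually dominate, so it is stricter than Rand on the same clusterings (1/3 versus 0.6 here).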

Using the Similarity Matrix for Cluster Validation. Order the similarity matrix with respect to cluster labels and inspect it visually. (Figure: a well-clustered data set and its similarity matrix ordered by cluster label.)

Using the Similarity Matrix for Cluster Validation. Clusters in random data are not so crisp. (Figure: the similarity matrix for the DBSCAN clustering of random data.)

Using the Similarity Matrix for Cluster Validation. Clusters in random data are not so crisp. (Figure: the similarity matrix for the K-means clustering of random data.)

Using the Similarity Matrix for Cluster Validation. Clusters in random data are not so crisp. (Figure: the similarity matrix for the complete-link clustering of random data.)

Using the Similarity Matrix for Cluster Validation. (Figure: the similarity matrix for the DBSCAN clustering of the more complicated seven-cluster data set.)

Chapter 7. Cluster Analysis Overview: partitioning methods, hierarchical methods, density-based methods, other methods, cluster evaluation, outlier analysis.

References (1)
R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD'98.
M. R. Anderberg. Cluster Analysis for Applications. Academic Press, 1973.
M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. SIGMOD'99.
P. Arabie, L. J. Hubert, and G. De Soete. Clustering and Classification. World Scientific, 1996.
F. Beil, M. Ester, and X. Xu. Frequent term-based text clustering. KDD'02.
M. M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying density-based local outliers. SIGMOD'00.
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. KDD'96.
M. Ester, H.-P. Kriegel, and X. Xu. Knowledge discovery in large spatial databases: Focusing techniques for efficient class identification. SSD'95.
D. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139-172, 1987.
D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical data: An approach based on dynamic systems. VLDB'98.

References (2)
V. Ganti, J. Gehrke, and R. Ramakrishnan. CACTUS: Clustering categorical data using summaries. KDD'99.
S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. SIGMOD'98.
S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm for categorical attributes. ICDE'99, pp. 512-521, Sydney, Australia, March 1999.
A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. KDD'98.
A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
G. Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. COMPUTER, 32(8):68-75, 1999.
L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. VLDB'98.
G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. John Wiley & Sons, 1988.
P. Michaud. Clustering techniques. Future Generation Computer Systems, 13, 1997.
R. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. VLDB'94.

References (3)
L. Parsons, E. Haque, and H. Liu. Subspace clustering for high dimensional data: A review. SIGKDD Explorations, 6(1), June 2004.
E. Schikuta. Grid clustering: An efficient hierarchical clustering method for very large data sets. Proc. 1996 Int. Conf. on Pattern Recognition.
G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. VLDB'98.
A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-based clustering in large databases. ICDT'01.
A. K. H. Tung, J. Hou, and J. Han. Spatial clustering in the presence of obstacles. ICDE'01.
H. Wang, W. Wang, J. Yang, and P. S. Yu. Clustering by pattern similarity in large data sets. SIGMOD'02.
W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. VLDB'97.
T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. SIGMOD'96.

Clustering: Rich Applications and Multidisciplinary Efforts. Pattern recognition. Spatial data analysis: create thematic maps in GIS by clustering feature spaces; detect spatial clusters, or support other spatial mining tasks. Image processing. Economic science (especially market research). WWW: document clustering; clustering weblog data to discover groups of similar access patterns.