Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering
1 Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering
2 Clustering Algorithms. Contents: K-means; hierarchical algorithms; linkage functions; vector quantization; SOM.
3 Clustering: Formulation. Objects are described by attributes. Goal: find groups of similar points (observations) in multidimensional space. There is no target variable (unsupervised learning). [Figure labels: Objects, Attributes, Model]
4 Methods of Clustering: Overview. Variety of methods:
- Hierarchical clustering: creates a hierarchy of clusters (one cluster entirely contained within another).
- Non-hierarchical methods: create disjoint clusters.
- Overlapping clusters: objects can belong to more than one cluster simultaneously.
- Fuzzy clusters: defined by the probability (grade) of membership of each object in each cluster.
Useful data preprocessing prior to clustering:
- PCA (Principal Component Analysis), to reduce the dimensionality of the data.
- Data standardization: transform the data so that variables with larger variance do not dominate the clustering results.
5 Introductory Example. 97 countries described by 3 attributes: Birth, Death, and InfantDeath rates (given as number per 1000; data from 1995).
6 Example (continued). Analysis I: clustering of the raw data with the K-means algorithm. Result: 3 clusters (number of observations in each cluster: 13, 32, 52).
8 Example Profiles of Clusters
9 Example: Profiles of Clusters. Notice: the data were clustered based on the InfantDeath rate only!
10 Example: Standardization of Data. Analysis II: data standardized prior to clustering (each variable divided by its standard deviation). Result: 3 clusters (with 35, 46, and 16 observations). The data are now clustered based on both InfantDeath and Death. Comparing Analysis I with Analysis II, observe that the variables with the largest variance have the largest influence on the clustering results. [Figure panels: Analysis II, Analysis I]
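The observation on this slide can be checked numerically. The sketch below uses made-up values (an assumption for illustration, not the 97-country dataset) and shows how much each variable contributes to the squared Euclidean distance between two observations, before and after dividing every variable by its standard deviation. The function names are hypothetical.

```python
import statistics

# Hypothetical rows: (birth, death, infant_death); NOT the dataset from the slides.
rows = [
    (45.0, 18.0, 140.0),
    (14.0, 10.0, 9.0),
    (30.0, 8.0, 60.0),
    (12.0, 11.0, 7.0),
]

def column_stdevs(data):
    """Sample standard deviation of each column."""
    return [statistics.stdev(col) for col in zip(*data)]

def scale_by_stdev(data):
    """Divide every variable by its standard deviation (as in Analysis II)."""
    sds = column_stdevs(data)
    return [tuple(v / s for v, s in zip(row, sds)) for row in data]

def distance_shares(a, b):
    """Fraction of the squared Euclidean distance contributed by each variable."""
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    total = sum(squared)
    return [s / total for s in squared]

raw_shares = distance_shares(rows[0], rows[1])
scaled_rows = scale_by_stdev(rows)
scaled_shares = distance_shares(scaled_rows[0], scaled_rows[1])
```

With these values the third variable accounts for most of the raw distance, while after scaling no single variable dominates.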
11 Example: Profiles of Clusters. Analysis II: profiles of clusters.
12 Methods of Clustering.
Non-hierarchical methods:
- K-means clustering.
- Non-deterministic (the result depends on the randomly selected initial centroids).
- O(n), where n is the number of observations.
Hierarchical methods:
- Agglomerative (join small clusters).
- Divisive (split big clusters).
- Deterministic.
- O(n^2) to O(n^3), depending on the clustering method (i.e., the definition of intercluster distance).
13 Methods of Clustering: Remarks.
Clustering large datasets:
- Use K-means.
- If the results of hierarchical clustering are needed, first use K-means to obtain, e.g., 50 clusters, then run hierarchical clustering on the results of K-means.
Consensus clustering:
- To discover real clusters in the data, analyze the stability of the results with noise injected.
14 K-means Algorithm. K-means clustering:
1. Select k points (centroids of the initial clusters; select them randomly).
2. Assign each observation to the nearest centroid (nearest cluster).
3. For each cluster, find the new centroid.
4. Repeat steps 2 and 3 until no change occurs in cluster assignments.
15 K-means Algorithm. Result: k separate clusters. The algorithm requires that the correct number of clusters k be specified in advance (a difficult problem: how can we know the real number of clusters in the data?).
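The four steps of the K-means algorithm can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `k_means`, the `seed` and `max_iter` parameters, and the rule for empty clusters are assumptions added here.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, labels) for the given list of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # step 1: random initial centroids
    labels = None
    for _ in range(max_iter):
        # step 2: assign each observation to the nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:             # step 4: stop when assignments are stable
            break
        labels = new_labels
        # step 3: recompute each centroid as the mean of its cluster
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean_point(members)
    return centroids, labels
```

Because the initial centroids are drawn at random, different values of `seed` can give different results on less well-separated data, which is the non-determinism noted earlier in the deck.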
16 Hierarchical Clustering. Notation:
- $x_i$: observations, $i = 1..n$
- $C_k$: clusters
- $G$: current number of clusters
- $D_{KL}$: distance between clusters $C_K$ and $C_L$
The between-cluster distance $D_{KL}$ is the linkage function (various definitions are available; the results of clustering depend on $D_{KL}$). [Figure: clusters $C_K$ and $C_L$ separated by distance $D_{KL}$]
17 Hierarchical Clustering. Algorithm (agglomerative hierarchical clustering):
1. $C_k = \{x_k\}$, $k = 1..n$; $G = n$.
2. Find $K, L$ such that $D_{KL} = \min D_{IJ}$, $1 \le I, J \le G$, $I \ne J$.
3. Replace clusters $C_K$ and $C_L$ by the cluster $C_K \cup C_L$; $G = G - 1$.
4. Repeat steps 2 and 3 while $G > 1$.
Result: a hierarchy of clusters (dendrogram).
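The agglomerative loop can be sketched as follows. This is a naive illustration that recomputes all pairwise cluster distances at every step; the function names and the choice of single linkage as the default `linkage` are assumptions made for the example, and any other definition of $D_{KL}$ can be passed in.

```python
import math

def single_link(ck, cl, points):
    """D_KL as the smallest distance between members of the two clusters."""
    return min(math.dist(points[i], points[j]) for i in ck for j in cl)

def agglomerate(points, linkage=single_link):
    """Return the merge history as a list of (cluster_K, cluster_L, D_KL)."""
    clusters = [[i] for i in range(len(points))]   # step 1: C_k = {x_k}, G = n
    merges = []
    while len(clusters) > 1:                       # step 4: repeat while G > 1
        # step 2: find the pair (K, L) with the smallest D_KL
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # step 3: replace C_K and C_L by their union, G = G - 1
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges
```

The returned merge history (which clusters were joined, and at what distance) is the information a dendrogram displays.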
18 Hierarchy of Clusters - Dendrogram
19 Definitions of Distance Between Clusters. Different definitions of distance between clusters:
- Average linkage
- Single linkage
- Complete linkage
- Density linkage
- Ward's minimum variance method
(The SAS CLUSTER procedure accepts 11 different definitions of inter-cluster distance.)
20 Average Linkage. Notation:
- $x_i$: observations, $i = 1..n$
- $d(x, y)$: distance between observations (Euclidean distance assumed from now on)
- $C_K$: clusters
- $N_K$: number of observations in cluster $C_K$
- $D_{KL}$: distance between clusters $C_K$ and $C_L$
- $\mathrm{mean}_{C_K}$: mean observation in cluster $C_K$
- $W_K = \sum_{x_i \in C_K} \lVert x_i - \mathrm{mean}_{C_K} \rVert^2$: variance in cluster (sum of squared deviations from the cluster mean)
Average linkage:
- Tends to join clusters with small variance.
- Resulting clusters tend to have similar variance.
21 Complete Linkage. Notation as on slide 20. Complete linkage: resulting clusters tend to have similar diameter.
22 Single Linkage. Notation as on slide 20. Single linkage: tends to produce elongated clusters, irregular in shape.
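The formulas for $D_{KL}$ on the three preceding slides did not survive transcription. The sketch below uses the common textbook definitions (an assumption, not a recovery of the slide formulas): average linkage is the mean of all $N_K N_L$ pairwise distances, complete linkage the largest pairwise distance, and single linkage the smallest. Software packages may differ in detail, for example by working with squared distances.

```python
import math
from itertools import product

def pair_distances(ck, cl):
    """All Euclidean distances d(x, y) with x in C_K and y in C_L."""
    return [math.dist(x, y) for x, y in product(ck, cl)]

def average_linkage(ck, cl):
    """Mean over all N_K * N_L pairwise distances."""
    d = pair_distances(ck, cl)
    return sum(d) / len(d)

def complete_linkage(ck, cl):
    """Largest pairwise distance between the two clusters."""
    return max(pair_distances(ck, cl))

def single_linkage(ck, cl):
    """Smallest pairwise distance between the two clusters."""
    return min(pair_distances(ck, cl))
```

For any pair of clusters the three values are ordered: single <= average <= complete. Single linkage needs only one close pair to join two clusters, which is consistent with its tendency to chain points into elongated clusters.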
23 Ward's Minimum Variance Method. Notation as on slide 20, and additionally $B_{KL} = W_M - W_K - W_L$, where $C_M = C_K \cup C_L$. Ward's minimum variance method:
- Tends to join small clusters.
- Tends to produce clusters with a similar number of observations.
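The quantity $B_{KL}$ can be computed directly from its definition. The sketch below also evaluates a well-known equivalent closed form, $B_{KL} = \frac{N_K N_L}{N_K + N_L} \lVert \mathrm{mean}_{C_K} - \mathrm{mean}_{C_L} \rVert^2$, which is not stated on the slide but helps explain why the method tends to join small clusters: the factor $N_K N_L / (N_K + N_L)$ is small when either cluster is small. Function names are hypothetical.

```python
def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))

def within_ss(cluster):
    """W_K: sum of squared Euclidean distances to the cluster mean."""
    m = mean_point(cluster)
    return sum(sum((x - mx) ** 2 for x, mx in zip(p, m)) for p in cluster)

def ward_distance(ck, cl):
    """B_KL = W_M - W_K - W_L, where C_M is the union of C_K and C_L."""
    return within_ss(ck + cl) - within_ss(ck) - within_ss(cl)

def ward_closed_form(ck, cl):
    """Equivalent form: N_K * N_L / (N_K + N_L) * ||mean_K - mean_L||^2."""
    nk, nl = len(ck), len(cl)
    mk, ml = mean_point(ck), mean_point(cl)
    gap = sum((a - b) ** 2 for a, b in zip(mk, ml))
    return nk * nl / (nk + nl) * gap
```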
24 Density Linkage. Notation:
- $x_i$: observations, $i = 1..n$
- $d(x, y)$: distance between observations
- $r$: a fixed constant
- $f(x)$: proportion of observations within the sphere centered at $x$ with radius $r$, divided by the volume of the sphere (a measure of the density of points near observation $x$)
Density linkage:
- Single linkage is carried out using the dissimilarity measure $d^*$, which is derived from $f$.
- Capable of discovering clusters of irregular shape.
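The definition of $d^*$ is missing from the transcription. A minimal sketch follows, assuming the uniform-kernel form $d^*(x_i, x_j) = \frac{1}{2}\left(\frac{1}{f(x_i)} + \frac{1}{f(x_j)}\right)$ when $d(x_i, x_j) \le r$ and $\infty$ otherwise; treat this formula, and the function names, as assumptions rather than as the slide's content. The density $f$ follows the slide's description, with each point counted inside its own sphere.

```python
import math

def sphere_volume(dim, r):
    """Volume of a dim-dimensional ball of radius r."""
    return math.pi ** (dim / 2) * r ** dim / math.gamma(dim / 2 + 1)

def density(x, points, r):
    """f(x): proportion of observations within radius r of x, per unit volume."""
    inside = sum(1 for p in points if math.dist(x, p) <= r)
    return inside / len(points) / sphere_volume(len(x), r)

def d_star(x, y, points, r):
    """Assumed density-based dissimilarity; x and y must be members of points."""
    if math.dist(x, y) > r:
        return math.inf                 # points farther apart than r are never linked directly
    return 0.5 * (1 / density(x, points, r) + 1 / density(y, points, r))
```

Points in dense regions get small mutual dissimilarities, so single linkage on $d^*$ follows dense regions of any shape.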
25 Example: elongated clusters in data. Method: average linkage.
26 Example: elongated clusters in data. Method: K-means.
27 Example: elongated clusters in data. Method: density linkage.
28 Example: nonconvex clusters in data. Method: K-means.
29 Example: nonconvex clusters in data. Method: centroid linkage.
30 Example: nonconvex clusters in data. Method: density linkage.
31 Example: clusters of unequal size. True clusters.
32 Example: clusters of unequal size. Method: K-means.
33 Example: clusters of unequal size. Method: Ward's method.
34 Example. Method: average linkage.
35 Example: clusters of unequal size. Method: centroid linkage.
36 Example: clusters of unequal size. Method: single linkage.
37 Example: well-separated data. Any method will work.
38 Example: poorly separated data. True clusters.
39 Example: poorly separated data. Method: K-means.
40 Example: poorly separated data. Method: Ward's method.
41 Clustering Methods: Final Remarks. Standardization of variables prior to clustering:
- Often necessary; otherwise variables with large variance tend to have a large influence on the clustering.
- Often the standardized measurement $z_{ij}$ is computed as the z-score $z_{ij} = (x_{ij} - \mu_j) / s_j$, where $x_{ij}$ is the original measurement in observation $i$ and variable $j$, $\mu_j$ is the mean value of variable $j$, and $s_j$ is the mean absolute deviation of variable $j$ (or its standard deviation).
- Other ideas: divide each variable by its range, maximum value, or standard deviation.
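A small sketch of the z-score for a single variable, with the two choices of $s_j$ mentioned on the slide. The function name `z_scores` and the `scale` parameter values are hypothetical; the standard deviation used here is the sample standard deviation, which is an assumption.

```python
import statistics

def z_scores(column, scale="stdev"):
    """Standardize one variable: (x - mean) / s, with s chosen by `scale`."""
    mu = statistics.fmean(column)
    if scale == "stdev":
        s = statistics.stdev(column)                          # sample standard deviation
    elif scale == "mad":
        s = sum(abs(x - mu) for x in column) / len(column)    # mean absolute deviation
    else:
        raise ValueError("scale must be 'stdev' or 'mad'")
    return [(x - mu) / s for x in column]
```

The mean absolute deviation is less affected by outliers than the standard deviation, so extreme values remain more visible after scaling.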
42 Clustering Methods: Final Remarks. The number of clusters:
- There is no satisfactory theory for determining the right number of clusters in data.
- Various criteria can be examined to help determine the right number of clusters, e.g.:
  - Criteria based on the variance accounted for by the clusters: $R^2 = 1 - P_G / T$, or semipartial $R^2 = B_{KL} / T$, where $T$ is the total variance of the observations, $P_G = \sum W_K$ over the $G$ clusters, and $B_{KL} = W_M - W_K - W_L$ with $C_M = C_K \cup C_L$.
  - Cubic Clustering Criterion (CCC).
- Data visualization is often useful for determining the number of clusters:
  - Scatterplot for 2-3 dimensional data.
  - In high dimensions, apply the PCA transformation (or similar) and visualize the data in the 2-3 dimensional space of the first principal components.
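The two variance-based criteria can be computed directly from their definitions. In this sketch $T$ is taken as the sum of squared deviations of all observations from the overall mean, matching the way $W_K$ is defined within a cluster; that reading, and the function names, are assumptions.

```python
def within_ss(cluster):
    """Sum of squared Euclidean distances from the points to their mean."""
    n = len(cluster)
    mean = [sum(c) / n for c in zip(*cluster)]
    return sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in cluster)

def r_squared(clusters):
    """R^2 = 1 - P_G / T for a partition given as a list of clusters."""
    all_points = [p for c in clusters for p in c]
    total = within_ss(all_points)                    # T
    pooled = sum(within_ss(c) for c in clusters)     # P_G
    return 1 - pooled / total

def semipartial_r_squared(ck, cl, all_points):
    """B_KL / T for merging clusters C_K and C_L."""
    b_kl = within_ss(ck + cl) - within_ss(ck) - within_ss(cl)
    return b_kl / within_ss(all_points)
```

The semipartial $R^2$ of a merge equals the drop in $R^2$ caused by that merge, so a large value signals that two clearly distinct clusters are being joined.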
43 Example: $R^2$, semipartial $R^2$.
44 Example: Number of Clusters. Useful checks:
- PST2 (pseudo t-squared statistic): 3, 6, or 9 (one before a peak in value)
- PSF (pseudo F statistic): 9 (peak in value)
- CCC: 18 (CCC around 3)
45 Kohonen VQ (Vector Quantization). The algorithm is similar to K-means. Idea of the VQ algorithm:
1. Select k points (initial cluster centroids).
2. For observation $x_i$, find the nearest centroid (winning seed), denoted by $S_n$.
3. Modify $S_n$ according to the formula $S_n = S_n (1 - L) + x_i L$, where $L$ is the learning constant (decreasing during the learning process).
4. Repeat steps 2 and 3 over all training observations.
5. Repeat steps 2-4 for a given number of iterations.
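A minimal sketch of the five VQ steps. It assumes the winning seed is moved toward the observation by $S_n \leftarrow S_n (1 - L) + x_i L$ (the SOM update with the neighborhood factor set to 1) and uses a made-up decreasing schedule $L = L_0 / (1 + t)$ over iterations $t$; both the schedule and the function names are illustrative assumptions.

```python
import math

def nearest(seeds, x):
    """Index of the seed closest to observation x (the winning seed)."""
    return min(range(len(seeds)), key=lambda j: math.dist(seeds[j], x))

def kohonen_vq(data, initial_seeds, n_iter=20, l_start=0.5):
    """Return the seeds after n_iter passes over the data."""
    seeds = [list(s) for s in initial_seeds]        # step 1: initial centroids
    for t in range(n_iter):                         # step 5: fixed number of iterations
        rate = l_start / (1 + t)                    # assumed decreasing learning constant L
        for x in data:                              # step 4: all training observations
            n = nearest(seeds, x)                   # step 2: winning seed S_n
            # step 3: move the winning seed toward the observation
            seeds[n] = [s * (1 - rate) + xi * rate for s, xi in zip(seeds[n], x)]
    return seeds
```

Unlike K-means, which recomputes centroids once per pass, this procedure updates a seed immediately after each observation.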
46 VQ: MacQueen Method. For $L$ = const, the VQ algorithm does not converge. MacQueen method: the learning constant $L$ is the reciprocal of the number of observations $N_n$ in the cluster associated with the winning seed $S_n$, i.e. $L = 1 / N_n$. This algorithm converges.
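With $L = 1 / N_n$ the update turns each seed into a running mean. The sketch below makes one pass over the data and assumes that $N_n$ counts the observations assigned so far, including the current one, and that the initial seed itself is not counted; other conventions exist, so treat this as one possible reading.

```python
import math

def macqueen_pass(data, initial_seeds):
    """One pass of VQ with learning constant L = 1 / N_n; returns (seeds, counts)."""
    seeds = [list(s) for s in initial_seeds]
    counts = [0] * len(seeds)
    for x in data:
        n = min(range(len(seeds)), key=lambda j: math.dist(seeds[j], x))
        counts[n] += 1
        rate = 1 / counts[n]                        # L = 1 / N_n
        seeds[n] = [s * (1 - rate) + xi * rate for s, xi in zip(seeds[n], x)]
    return seeds, counts
```

Under these assumptions each seed equals the mean of the observations assigned to it so far, and the step size shrinks as the cluster grows.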
47 Kohonen SOM (Self-Organizing Maps).
1. Select k initial points (cluster centroids) and represent them on a 2D map.
2. For observation $x_i$, find the winning seed $S_n$.
3. Modify all centroids: $S_j = S_j (1 - K(j,n) L) + x_i K(j,n) L$, where $L$ is the learning constant (decreasing during training) and $K(j,n)$ is a function that decreases with increasing distance on the 2D map between the centroids $S_j$ and $S_n$ ($K(j,j) = 1$).
4. Repeat steps 2 and 3 over all training observations.
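One SOM update (steps 2 and 3) can be sketched as below. The slide only requires $K(j,n)$ to decrease with map distance and to equal 1 for $j = n$; the Gaussian kernel, the `radius` parameter, and the function names are assumptions chosen for illustration.

```python
import math

def neighborhood(pos_j, pos_n, radius):
    """Assumed Gaussian K(j, n) on 2D map coordinates; equals 1 when j == n."""
    d2 = (pos_j[0] - pos_n[0]) ** 2 + (pos_j[1] - pos_n[1]) ** 2
    return math.exp(-d2 / (2 * radius ** 2))

def som_step(seeds, grid, x, rate, radius):
    """Update all seeds in place for one observation x; return the winner index."""
    # step 2: the winning seed is the one nearest to x in data space
    n = min(range(len(seeds)), key=lambda j: math.dist(seeds[j], x))
    # step 3: S_j = S_j (1 - K L) + x K L, with K measured on the 2D map
    for j in range(len(seeds)):
        k = neighborhood(grid[j], grid[n], radius)
        seeds[j] = [s * (1 - k * rate) + xi * k * rate for s, xi in zip(seeds[j], x)]
    return n
```

Here `seeds[j]` is a centroid in data space and `grid[j]` is the position of the same unit on the map. Because map neighbors of the winner are also pulled toward the observation, nearby map units end up representing similar data.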
48-49 Example: SOM-based clustering of wine data (R language, dataset wines, package kohonen).
50 R implementation of the SOM algorithm: function som() (package kohonen). Result: the structure wine.som, with important members:
- wine.som$codes: codebook vectors
- wine.som$unit.classif: winning units for all data points
51 Codebook vectors represent the clusters created at each 2D grid element (the attributes of a codebook vector are the mean values of the respective attributes of the cluster elements).
54 Results: assignment of observations (individual wines) to the 2D grid. The seeds (codebook vectors) can then be grouped, e.g. with hierarchical clustering (the hclust function).
55-57 Example: SOM in R.
Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #14: Clustering Seoul National University 1 In This Lecture Learn the motivation, applications, and goal of clustering Understand the basic methods of clustering (bottom-up
More information3. Cluster analysis Overview
Université Laval Analyse multivariable - mars-avril 2008 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationStatistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.
Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or
More informationCSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection
CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 4
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationCHAPTER THREE THE DISTANCE FUNCTION APPROACH
50 CHAPTER THREE THE DISTANCE FUNCTION APPROACH 51 3.1 INTRODUCTION Poverty is a multi-dimensional phenomenon with several dimensions. Many dimensions are divided into several attributes. An example of
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No 08 Cluster Analysis Naeem Ahmed Email: naeemmahoto@gmailcom Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Outline
More informationCLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16
CLUSTERING CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 1. K-medoids: REFERENCES https://www.coursera.org/learn/cluster-analysis/lecture/nj0sb/3-4-the-k-medoids-clustering-method https://anuradhasrinivas.files.wordpress.com/2013/04/lesson8-clustering.pdf
More informationData Mining: Concepts and Techniques. Chapter March 8, 2007 Data Mining: Concepts and Techniques 1
Data Mining: Concepts and Techniques Chapter 7.1-4 March 8, 2007 Data Mining: Concepts and Techniques 1 1. What is Cluster Analysis? 2. Types of Data in Cluster Analysis Chapter 7 Cluster Analysis 3. A
More informationApplied Clustering Techniques. Jing Dong
Applied Clustering Techniques Jing Dong Nov 31, 2016 What is cluster analysis? What is Cluster Analysis? Cluster: o Similar to one another within the same cluster o Dissimilar to the objects in other clusters
More informationUnsupervised: no target value to predict
Clustering Unsupervised: no target value to predict Differences between models/algorithms: Exclusive vs. overlapping Deterministic vs. probabilistic Hierarchical vs. flat Incremental vs. batch learning
More informationSYDE Winter 2011 Introduction to Pattern Recognition. Clustering
SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned
More informationCluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008
Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationClustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search
Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2
More information