k-means A classical clustering algorithm
k-means: A classical clustering algorithm. Devert Alexandre, School of Software Engineering of USTC, 30 November 2012.
Table of Contents
1. Introduction
2. Visual demo: step by step; Voronoi diagrams
3. Algorithm: update step; assignment step; stopping criteria
4. Implementation
5. Clustering quality: internal criteria; external criteria
6. Initialization
7. Limitations
Purpose. k-means is a very generic clustering algorithm. It works with:
- vectors of $\mathbb{R}^n$
- sequences: strings, sampled signals, etc.
- high-level data: pictures, sounds, biometric records, etc.
- composite data: a mix of all of the above
Inputs. k-means takes as inputs:
- a set of points
- a chosen number of clusters
- a chosen distance function between points
Outputs. k-means outputs:
- a label for each point (one label = one cluster)
- a center for each cluster
Algorithm. k-means is an iterative algorithm, and it needs the whole dataset at once.
2. Visual demo
Input data. We want 3 clusters: red, green and blue.
Initialization. Points are tagged randomly.
Update. Compute the centers of the red, green and blue points.
Space partition. We can cut the space into 3 areas: a Voronoi diagram! Each area is the region whose points are closer to one center than to the others.
Assignment. Change the color of the points: each point gets the color of the closest cluster center.
Update. Update the centers of the red, green and blue points.
Space partition. The 3 areas changed. Each area is the region whose points are closer to one center than to the others.
Assignment. Change the color of the points: each point gets the color of the closest cluster center.
Update. Update the centers of the red, green and blue points.
Space partition. The 3 areas changed (a tiny little bit).
Assignment. All points are already colored properly: we are done!
Voronoi diagram. A little reminder/introduction to Voronoi diagrams.
Voronoi diagram. Let's put some points on this slide.
Voronoi diagram. Let's picture the distance to the closest point.
Voronoi diagram. Let's picture the places equidistant to several points.
Voronoi diagram. The Voronoi diagram is made of the borders between cells, where each cell gathers the places closer to one point than to any other point.
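A minimal NumPy sketch of the same idea (our illustration, not from the slides): each location gets the cell of its nearest point, and it lies on the Voronoi diagram when its two nearest points are equally distant. The function name `voronoi_cells_and_borders`, the tolerance `tol` and the sample coordinates are hypothetical, and the Euclidean distance is assumed.

```python
import numpy as np

def voronoi_cells_and_borders(sites, locations, tol=1e-9):
    """Return the Voronoi cell of each location and whether it lies on a border.

    A location belongs to the cell of its nearest site; it lies on the Voronoi
    diagram (a border between cells) when its two nearest sites are equally close.
    Assumes at least 2 sites and the Euclidean distance.
    """
    # dist[i, j] = Euclidean distance between locations[i] and sites[j]
    dist = np.linalg.norm(locations[:, None, :] - sites[None, :, :], axis=2)
    cells = np.argmin(dist, axis=1)             # ties go to the lowest site index
    two_nearest = np.sort(dist, axis=1)[:, :2]  # closest and second-closest distances
    on_border = (two_nearest[:, 1] - two_nearest[:, 0]) <= tol
    return cells, on_border

sites = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
locations = np.array([[1.0, 1.0], [2.0, 0.0], [3.0, 0.5]])
cells, on_border = voronoi_cells_and_borders(sites, locations)
print(cells, on_border)
```

The location (2, 0) is exactly as far from (0, 0) as from (4, 0), so it is flagged as a border point; `np.argmin` breaks that tie in favor of the lower site index.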
3. Algorithm
Algorithm. A 3-step loop:
1. Update: where are the cluster centers?
2. Assignment: which point belongs to which cluster?
3. Stopping criterion: are we done yet?
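The 3-step loop can be sketched in Python with NumPy, assuming the center and the distance are passed in as functions so that any of the choices discussed in this lecture can be plugged in. `kmeans`, `center_fn`, `distance_fn` and the `max_iter` safeguard are hypothetical names chosen for this sketch; initialization is a random partition as in the demo, and an empty cluster keeps its previous center (or is ignored if it never had one).

```python
import numpy as np

def kmeans(points, k, center_fn, distance_fn, rng, max_iter=100):
    """Generic k-means loop with a pluggable center and distance.

    center_fn(members) -> center of a cluster (mean, median, medoid, ...)
    distance_fn(p, c)  -> distance between point p and center c
    """
    labels = rng.integers(0, k, size=len(points))  # random partition
    centers = [None] * k
    for _ in range(max_iter):  # safeguard against endless loops
        # 1. Update: where are the cluster centers?
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # an empty cluster keeps its previous center
                centers[c] = center_fn(members)
        # 2. Assignment: which point belongs to which cluster?
        live = [c for c in range(k) if centers[c] is not None]
        new_labels = np.array(
            [min(live, key=lambda c: distance_fn(p, centers[c])) for p in points])
        # 3. Stopping criterion: are we done yet?
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centers

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(8.0, 1.0, (20, 2))])
labels, centers = kmeans(points, 2,
                         center_fn=lambda m: m.mean(axis=0),
                         distance_fn=lambda p, c: np.linalg.norm(p - c),
                         rng=rng)
```

Swapping `center_fn` for a coordinate-wise median, or `distance_fn` for another metric, changes the clustering without touching the loop itself.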
Update step. Compute the center of each cluster. The center of a cluster can be:
1. the geometric center (mean) of the cluster's points, for the L2 norm
2. the median point of the cluster's points
3. the medoid of the cluster's points (the member with the smallest total distance to the other members)
4. whatever makes sense for your data
The last option will get a lecture of its own!
Center of a cluster. Let's compute the center of those points.
Center of a cluster. We can use the mean on each dimension.
Center of a cluster. But the mean has trouble with outliers.
Center of a cluster. Using the median on each dimension is more robust.
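The three center definitions of the update step can be compared on a small made-up example: four points on a line plus one outlier. This is a NumPy sketch; the function names are hypothetical, and the medoid assumes the Euclidean distance.

```python
import numpy as np

def mean_center(members):
    """Mean on each dimension (the L2 geometric center)."""
    return members.mean(axis=0)

def median_center(members):
    """Median on each dimension, computed separately per coordinate."""
    return np.median(members, axis=0)

def medoid_center(members):
    """Member with the smallest total Euclidean distance to all other members."""
    dist = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=2)
    return members[np.argmin(dist.sum(axis=1))]

# Four points on a line plus one outlier at x = 20
members = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [20.0, 0.0]])
print(mean_center(members))    # x = 5.2: pulled toward the outlier
print(median_center(members))  # x = 2.0: unaffected by the outlier
print(medoid_center(members))  # the member (2, 0)
```

The median and the medoid both stay inside the dense group, while the mean lands between the group and the outlier, where no point actually is.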
Assignment. Assignment step: a point belongs to the closest cluster center. (Figure: before / after.)
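A vectorized sketch of the assignment step with NumPy broadcasting, assuming points and centers are stored as 2-D arrays. The `norm_ord` parameter (a hypothetical name, passed to `np.linalg.norm` as `ord`) switches the distance, which shows that the chosen distance function really matters: the same point goes to a different center under the Euclidean and the Manhattan distance.

```python
import numpy as np

def assign(points, centers, norm_ord=2):
    """Assignment step: index of the closest center for every point.

    norm_ord selects the vector norm used as distance:
    2 (Euclidean), 1 (Manhattan) or np.inf (Chebyshev).
    """
    # diff has shape (n_points, n_clusters, dim) thanks to broadcasting
    diff = points[:, None, :] - centers[None, :, :]
    return np.linalg.norm(diff, ord=norm_ord, axis=2).argmin(axis=1)

points = np.array([[2.0, 2.0]])
centers = np.array([[0.0, 0.0], [5.5, 2.0]])
print(assign(points, centers, norm_ord=2))  # Euclidean: 2.83 vs 3.5, center 0
print(assign(points, centers, norm_ord=1))  # Manhattan: 4.0 vs 3.5, center 1
```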
Stopping criteria. Stopping criterion: stop when no cluster assignment changes anymore. k-means always converges, but the final clusters might not be ideal. (Figure: a not-so-good clustering vs. a better clustering.)
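Why does k-means converge? With mean centers and Euclidean assignment, the within-cluster sum of squared distances can never increase: the update step moves each center to the mean, which minimizes that sum for a fixed assignment, and the assignment step moves each point to its closest center, which cannot increase it either. Since there are finitely many partitions, the loop stops (ties aside). The sketch below records that sum after every update step; `kmeans_with_history` is a hypothetical name, and the argument only holds for this mean and squared-Euclidean combination.

```python
import numpy as np

def kmeans_with_history(points, k, rng, max_iter=100):
    """Mean-center k-means with Euclidean assignment, recording the objective.

    history[t] is the within-cluster sum of squared distances measured right
    after the t-th update step.
    """
    labels = rng.integers(0, k, size=len(points))  # random partition
    # Starting centers are random data points; only empty clusters keep them
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    history = []
    for _ in range(max_iter):
        for c in range(k):                                  # update step
            members = points[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
        history.append(float(((points - centers[labels]) ** 2).sum()))
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)                      # assignment step
        if np.array_equal(new_labels, labels):              # stopping criterion
            break
        labels = new_labels
    return labels, centers, history

rng = np.random.default_rng(1)
points = rng.normal(size=(60, 2))
labels, centers, history = kmeans_with_history(points, 3, rng)
print(history)  # never increases from one iteration to the next
```

The final value depends on the random start, which is why a converged run can still be a poor clustering.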
4. Implementation
k-means in Python. 27 lines, comments and blank lines included:
```python
import numpy

# Parameters
points = numpy.loadtxt('myFile.dat')
nbClusters, nbPoints = 2, points.shape[0]

# Initialization
clustersId = numpy.random.randint(0, nbClusters, size=nbPoints)
clustersCenters = numpy.zeros((nbClusters, points.shape[1]))

# Iteration
converged = False
while not converged:
    # Update the clusters center (mean)
    for c in range(nbClusters):
        members = points[clustersId == c]
        if len(members) > 0:  # an empty cluster keeps its previous center
            clustersCenters[c] = numpy.mean(members, axis=0)

    # Attribute each point to its closest cluster
    oldClustersId = clustersId
    clustersId = numpy.array([
        numpy.argmin([numpy.linalg.norm(p - c) for c in clustersCenters]) for p in points
    ])

    # Check if convergence is reached
    converged = numpy.array_equal(oldClustersId, clustersId)
```
Initialization.
```python
import numpy

# Parameters
points = numpy.loadtxt('myFile.dat')
nbClusters, nbPoints = 2, points.shape[0]

# Initialization
clustersId = numpy.random.randint(0, nbClusters, size=nbPoints)
clustersCenters = numpy.zeros((nbClusters, points.shape[1]))
```
- line 1 (`import numpy`): let's use NumPy
- lines 4 and 5: load the data
- line 8: points are assigned randomly to a cluster
- line 9: matrix storing the center of each cluster
Iterations.
```python
# Iteration
converged = False
while not converged:
    # Update the clusters center (mean)
    ...
    # Attribute each point to its closest cluster
    ...
    # Check if convergence is reached
    ...
```
Update step.
```python
# Update the clusters center (mean)
for c in range(nbClusters):
    members = points[clustersId == c]
    if len(members) > 0:  # an empty cluster keeps its previous center
        clustersCenters[c] = numpy.mean(members, axis=0)
```
- line 2: for c from 0 to the number of clusters minus one...
- lines 3, 4 and 5: the center of cluster c is the mean of all the points that belong to cluster c
Assignment step.
```python
# Attribute each point to its closest cluster
clustersId = numpy.array([
    numpy.argmin(
        [numpy.linalg.norm(p - c) for c in clustersCenters]
    )
    for p in points])
```
- line 4: distance from each cluster center $c_0, c_1, \dots$ to point p
- lines 3 and 5: the cluster of point p is the cluster with the closest center
- lines 2 and 6: for each point p
Stopping criteria.
```python
# Attribute each point to its closest cluster
oldClustersId = clustersId
clustersId = numpy.array([
    numpy.argmin([numpy.linalg.norm(p - c) for c in clustersCenters]) for p in points
])

# Check if convergence is reached
converged = numpy.array_equal(oldClustersId, clustersId)
```
- line 2: keep the previous cluster assignment of each point
- lines 3, 4 and 5: assignment step
- line 8: if the assignment did not change, we are done
5. Clustering quality
How many clusters? k-means can find clusters, but not how many there are.
Quality. Wrong number of clusters = bad clustering. Not enough clusters: big gaps within a single cluster.
Quality. Wrong number of clusters = bad clustering. Too many clusters: clusters very close to each other.
Internal quality indexes. Internal quality indexes use the clustered data only.
Dunn index. Higher is better.
$$D = \frac{\min_{1 \le i < j \le n} d(c_i, c_j)}{\max_{1 \le i \le n} m_i}$$
where $n$ is the number of clusters, $m_i$ is the maximum distance between a member of the $i$-th cluster and its center, and $d(c_i, c_j)$ is the distance between the $i$-th and $j$-th cluster centers.
Davies-Bouldin index. Lower is better.
$$DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \ne i} \frac{m_i + m_j}{d(c_i, c_j)}$$
where $n$ is the number of clusters, $m_i$ is the average distance between the members of the $i$-th cluster and its center, and $d(c_i, c_j)$ is the distance between the $i$-th and $j$-th cluster centers.
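Both indexes can be computed directly from these definitions. A NumPy sketch, assuming at least 2 clusters, no empty cluster and distinct centers; the function names are hypothetical. It follows the centroid-based definitions above; Dunn's original index uses distances between cluster members rather than between centers, so values from other tools may differ.

```python
import numpy as np

def dunn_index(points, labels, centers):
    """Centroid-based Dunn index as defined on the slide (higher is better)."""
    k = len(centers)
    # m_i: maximum distance between a member of cluster i and its center
    m = [np.linalg.norm(points[labels == i] - centers[i], axis=1).max() for i in range(k)]
    between = min(np.linalg.norm(centers[i] - centers[j])
                  for i in range(k) for j in range(i + 1, k))
    return between / max(m)

def davies_bouldin_index(points, labels, centers):
    """Davies-Bouldin index (lower is better)."""
    k = len(centers)
    # m_i: average distance between the members of cluster i and its center
    m = [np.linalg.norm(points[labels == i] - centers[i], axis=1).mean() for i in range(k)]
    worst = [max((m[i] + m[j]) / np.linalg.norm(centers[i] - centers[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[1.0, 0.0], [11.0, 0.0]])
print(dunn_index(points, labels, centers))            # 10 / 1 = 10.0
print(davies_bouldin_index(points, labels, centers))  # (1 + 1) / 10 = 0.2
```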
Using internal quality indexes. When computing internal quality indexes:
- do it for different numbers of clusters
- do it for several runs of k-means
- take the result with a grain of salt!
Using internal quality indexes. Median Dunn and Davies-Bouldin indexes over 16 runs. (Figure: median Davies-Bouldin index and median Dunn index vs. number of clusters.) According to the indexes, 3 to 4 clusters seem ideal.
External quality indexes. External quality indexes evaluate the clustering result on data that was not used to compute the clustering:
- data used for clustering: training set A
- data used to test the clustering: test set B
Rand measure. Higher is better (ratio of correct predictions).
$$R = \frac{T^+ + T^-}{T^+ + T^- + F^+ + F^-}$$
where $T^+$ is the number of true positives, $T^-$ the number of true negatives, $F^+$ the number of false positives and $F^-$ the number of false negatives.
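A sketch of the Rand measure under the usual pair-counting reading, which is an assumption here since the slides do not say what a "positive" is: a pair of points counts as a positive when the two points share a cluster. The function compares two labelings of the same points, for instance the labels predicted for a test set and reference labels; `rand_index` and its parameter names are hypothetical, and at least 2 points are assumed.

```python
from itertools import combinations

def rand_index(predicted, reference):
    """Pair-counting Rand measure between two labelings of the same points."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_ref = reference[i] == reference[j]
        if same_pred and same_ref:
            tp += 1  # together in both: true positive
        elif not same_pred and not same_ref:
            tn += 1  # apart in both: true negative
        elif same_pred:
            fp += 1  # together only in the prediction: false positive
        else:
            fn += 1  # together only in the reference: false negative
    return (tp + tn) / (tp + tn + fp + fn)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 5 of the 6 pairs agree
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: label names do not matter
```

Counting pairs rather than individual points makes the measure independent of how the clusters are numbered.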
Jaccard index. Higher is better (similarity of the training and test sets).
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
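The Jaccard index of two sets is a one-liner. To apply it to clusterings, one common approach (an assumption, not stated on the slide) is to take, for each labeling, the set of point pairs that share a cluster; the index then equals $T^+ / (T^+ + F^+ + F^-)$ in the Rand measure's notation. The function names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)

def same_cluster_pairs(labels):
    """Set of index pairs (i, j), i < j, whose points share a cluster."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}

print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 shared / 4 in total = 0.5
print(jaccard(same_cluster_pairs([0, 0, 1, 1]),
              same_cluster_pairs([0, 0, 1, 2])))  # 1 shared pair / 2 pairs = 0.5
```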
6. Initialization
Random partition. With a random partition as initialization, the quality of the end result is not guaranteed.
k-means++ initialization. k-means++ chooses centers that are far away from each other.
k-means++ initialization. The center $C_1$ of the 1st cluster is picked randomly among the points $X_1, X_2, \dots, X_n$:
$$p_1(X_i) = \frac{1}{n}$$
where $p_1(X_i)$ is the probability for $X_i$ to be picked as $C_1$.
k-means++ initialization. The center $C_2$ of the 2nd cluster is picked with a probability proportional to its distance to the center $C_1$ of the 1st cluster:
$$p_2(X_i) = \frac{d(C_1, X_i)}{\sum_{j=1}^{n} d(C_1, X_j)}$$
where $p_2(X_i)$ is the probability for $X_i$ to be picked as $C_2$.
k-means++ initialization. The center $C_3$ of the 3rd cluster is picked with a probability proportional to its distance to the closest center:
$$p_3(X_i) = \frac{\min\left(d(C_1, X_i), d(C_2, X_i)\right)}{\sum_{j=1}^{n} \min\left(d(C_1, X_j), d(C_2, X_j)\right)}$$
where $p_3(X_i)$ is the probability for $X_i$ to be picked as $C_3$.
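A NumPy sketch of this seeding, generalizing the rule to any number of centers: each new center is drawn with a probability proportional to the distance to the closest center chosen so far. `kmeanspp_centers` and its `power` parameter are hypothetical; `power=1` matches the formulas on the slides, while the original k-means++ paper (Arthur and Vassilvitskii, 2007) weights by the squared distance, which `power=2` gives. It assumes there are at least k distinct points.

```python
import numpy as np

def kmeanspp_centers(points, k, rng, power=1):
    """k-means++ style seeding; returns k rows of points.

    power=1 weights each candidate by its distance to the closest chosen
    center (as on the slides); power=2 uses the squared distance instead.
    """
    n = len(points)
    chosen = [int(rng.integers(n))]  # C_1: uniform, p_1 = 1/n
    closest = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(1, k):
        weights = closest ** power
        probs = weights / weights.sum()  # p_j(X_i), sums to 1
        chosen.append(int(rng.choice(n, p=probs)))
        # distance of every point to its closest chosen center so far
        closest = np.minimum(closest, np.linalg.norm(points - points[chosen[-1]], axis=1))
    return points[chosen]

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.5, (10, 2)), rng.normal(10.0, 0.5, (10, 2))])
centers = kmeanspp_centers(points, 3, rng)
print(centers)
```

A point already chosen has distance 0 to the closest center, so its probability drops to 0 and it cannot be drawn twice.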
k-means++ initialization. Why use probabilities? Why not simply pick the farthest point?
k-means++ initialization. Because the farthest point is often an outlier, and we would select it as a center.
Smarter initialization. k-means++ initialization usually gives a better initial guess. (Figure: random partition vs. k-means++.)
Multiple tries. Always run k-means several times:
- it is very cheap anyway
- it is required to properly measure quality
- an unlucky initialization becomes less likely to spoil the result
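Running k-means several times and keeping the best run is easy to sketch: here "best" means the lowest within-cluster sum of squared distances (SSE), which is one reasonable criterion among others. `kmeans_once`, `best_of_runs`, `n_runs` and `seed` are hypothetical names; each run starts from a random partition as in the demo.

```python
import numpy as np

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random partition; returns labels, centers, SSE."""
    labels = rng.integers(0, k, size=len(points))
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # an empty cluster keeps its center
                centers[c] = members.mean(axis=0)
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # SSE of the final assignment with respect to the final centers
    sse = float(d2[np.arange(len(points)), labels].sum())
    return labels, centers, sse

def best_of_runs(points, k, n_runs=10, seed=0):
    """Run k-means n_runs times and keep the run with the lowest SSE."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[2]), [run[2] for run in runs]

points = np.random.default_rng(2).normal(size=(100, 2))
(labels, centers, sse), all_sse = best_of_runs(points, 4)
print(sse, min(all_sse), max(all_sse))
```

Keeping the SSE of every run (`all_sse`) also shows how much the result depends on the initialization.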
7. Limitations
Boundaries. k-means partitions the space in a rigid and sharp fashion.
Clusters geometry. k-means does not deal very well with non-globular clusters.
Clustering model. k-means partitions the space into cells.
Clustering model. Cells are not always a proper model of our data...
Cluster Analysis Summer School on Geocomputation 27 June 2011 2 July 2011 Vysoké Pole Lecture delivered by: doc. Mgr. Radoslav Harman, PhD. Faculty of Mathematics, Physics and Informatics Comenius University,
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationHierarchical clustering
Hierarchical clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Description Produces a set of nested clusters organized as a hierarchical tree. Can be visualized
More informationStats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms
Stats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms Padhraic Smyth Department of Computer Science Bren School of Information and Computer Sciences University of California,
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationExpectation Maximization!
Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University and http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Steps in Clustering Select Features
More informationMachine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves
Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1
Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group
More informationCluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010
Cluster Analysis Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX 7575 April 008 April 010 Cluster Analysis, sometimes called data segmentation or customer segmentation,
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationTRANSACTIONAL CLUSTERING. Anna Monreale University of Pisa
TRANSACTIONAL CLUSTERING Anna Monreale University of Pisa Clustering Clustering : Grouping of objects into different sets, or more precisely, the partitioning of a data set into subsets (clusters), so
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationCSE 494 Project C. Garrett Wolf
CSE 494 Project C Garrett Wolf Introduction The main purpose of this project task was for us to implement the simple k-means and buckshot clustering algorithms. Once implemented, we were asked to vary
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More information10.4 Linear interpolation method Newton s method
10.4 Linear interpolation method The next best thing one can do is the linear interpolation method, also known as the double false position method. This method works similarly to the bisection method by
More informationk-means Clustering David S. Rosenberg April 24, 2018 New York University
k-means Clustering David S. Rosenberg New York University April 24, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 24, 2018 1 / 19 Contents 1 k-means Clustering 2 k-means:
More informationLecture 11 Combinatorial Planning: In the Plane
CS 460/560 Introduction to Computational Robotics Fall 2017, Rutgers University Lecture 11 Combinatorial Planning: In the Plane Instructor: Jingjin Yu Outline Convex shapes, revisited Combinatorial planning
More information[7.3, EA], [9.1, CMB]
K-means Clustering Ke Chen Reading: [7.3, EA], [9.1, CMB] Outline Introduction K-means Algorithm Example How K-means partitions? K-means Demo Relevant Issues Application: Cell Neulei Detection Summary
More informationClustering Part 3. Hierarchical Clustering
Clustering Part Dr Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Hierarchical Clustering Two main types: Agglomerative Start with the points
More informationLecture 3: Linear Classification
Lecture 3: Linear Classification Roger Grosse 1 Introduction Last week, we saw an example of a learning task called regression. There, the goal was to predict a scalar-valued target from a set of features.
More informationGradient Descent. Wed Sept 20th, James McInenrey Adapted from slides by Francisco J. R. Ruiz
Gradient Descent Wed Sept 20th, 2017 James McInenrey Adapted from slides by Francisco J. R. Ruiz Housekeeping A few clarifications of and adjustments to the course schedule: No more breaks at the midpoint
More informationClustering Tips and Tricks in 45 minutes (maybe more :)
Clustering Tips and Tricks in 45 minutes (maybe more :) Olfa Nasraoui, University of Louisville Tutorial for the Data Science for Social Good Fellowship 2015 cohort @DSSG2015@University of Chicago https://www.researchgate.net/profile/olfa_nasraoui
More informationCBioVikings. Richard Röttger. Copenhagen February 2 nd, Clustering of Biomedical Data
CBioVikings Copenhagen February 2 nd, Richard Röttger 1 Who is talking? 2 Resources Go to http://imada.sdu.dk/~roettger/teaching/cbiovikings.php You will find The dataset These slides An overview paper
More informationFast Computation of Generalized Voronoi Diagrams Using Graphics Hardware
Fast Computation of Generalized Voronoi Diagrams Using Graphics Hardware paper by Kennet E. Hoff et al. (University of North Carolina at Chapel Hill) presented by Daniel Emmenegger GDV-Seminar ETH Zürich,
More informationOverview of Clustering
based on Loïc Cerfs slides (UFMG) April 2017 UCBL LIRIS DM2L Example of applicative problem Student profiles Given the marks received by students for different courses, how to group the students so that
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Unsupervised Learning: Kmeans, GMM, EM Readings: Barber 20.1-20.3 Stefan Lee Virginia Tech Tasks Supervised Learning x Classification y Discrete x Regression
More informationCS 2750: Machine Learning. Clustering. Prof. Adriana Kovashka University of Pittsburgh January 17, 2017
CS 2750: Machine Learning Clustering Prof. Adriana Kovashka University of Pittsburgh January 17, 2017 What is clustering? Grouping items that belong together (i.e. have similar features) Unsupervised:
More informationBig-data Clustering: K-means vs K-indicators
Big-data Clustering: K-means vs K-indicators Yin Zhang Dept. of Computational & Applied Math. Rice University, Houston, Texas, U.S.A. Joint work with Feiyu Chen & Taiping Zhang (CQU), Liwei Xu (UESTC)
More informationClustering. RNA-seq: What is it good for? Finding Similarly Expressed Genes. Data... And Lots of It!
RNA-seq: What is it good for? Clustering High-throughput RNA sequencing experiments (RNA-seq) offer the ability to measure simultaneously the expression level of thousands of genes in a single experiment!
More informationClust Clus e t ring 2 Nov
Clustering 2 Nov 3 2008 HAC Algorithm Start t with all objects in their own cluster. Until there is only one cluster: Among the current clusters, determine the two clusters, c i and c j, that are most
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised
More informationMixture models and clustering
1 Lecture topics: Miture models and clustering, k-means Distance and clustering Miture models and clustering We have so far used miture models as fleible ays of constructing probability models for prediction
More informationStructured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov
Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI
More informationIntroduction to Computer Science
DM534 Introduction to Computer Science Clustering and Feature Spaces Richard Roettger: About Me Computer Science (Technical University of Munich and thesis at the ICSI at the University of California at
More informationSTATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010
STATS306B Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Outline K-means, K-medoids, EM algorithm choosing number of clusters: Gap test hierarchical clustering spectral
More informationMachine Learning for Signal Processing Clustering. Bhiksha Raj Class Oct 2016
Machine Learning for Signal Processing Clustering Bhiksha Raj Class 11. 13 Oct 2016 1 Statistical Modelling and Latent Structure Much of statistical modelling attempts to identify latent structure in the
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 2
Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical
More information