k-means: A classical clustering algorithm


1 k-means: A classical clustering algorithm. Devert Alexandre, School of Software Engineering of USTC, 30 November 2012. Slide 1/65.

2 Table of Contents
1 Introduction
2 Visual demo: Step by step, Voronoi diagrams
3 Algorithm: Update step, Assignment step, Stopping criteria
4 Implementation
5 Clustering quality: Internal criteria, External criteria
6 Initialization
7 Limitations

3 Purpose. k-means is a very generic clustering algorithm. It works with vectors in R^n; sequences (strings, sampled signals, etc.); high-level data (pictures, sounds, biometric records, etc.); and composite data that mixes all of the above.

4 Inputs. k-means takes three inputs: a set of points, a chosen number of clusters, and a chosen distance function between points.

5 Outputs. k-means produces a label for each point (one label = one cluster) and a center for each cluster.

6 Algorithm. k-means is an iterative algorithm, and it needs the whole dataset at once.

8 Input data. We want 3 clusters: red, green and blue.

9 Initialization. Points are tagged randomly.

10 Update. Compute the center of the red, green and blue points.

11 Space partition. We can cut the space into 3 areas: a Voronoi diagram! Each area contains the points that are closer to one center than to the others.

12 Assignment. Change the color of the points: each point gets the color of the closest cluster center.

13 Update. Update the center of the red, green and blue points.

14 Space partition. The 3 areas changed. Each area contains the points that are closer to one center than to the others.

15 Assignment. Change the color of the points: each point gets the color of the closest cluster center.

16 Update. Update the center of the red, green and blue points.

17 Space partition. The 3 areas changed (just a tiny bit).

18 Assignment. All points are already coloured properly: we are done!

19 Voronoi diagram. A little reminder of (or introduction to) Voronoi diagrams.

20 Voronoi diagram. Let's put some points on this slide.

21 Voronoi diagram. Let's picture the distance to the closest point.

22 Voronoi diagram. Let's picture the places equidistant to several points.

23 Voronoi diagram. The Voronoi diagram is made of the borders between cells.

25 Algorithm. A 3-step loop:
1 Update: where are the cluster centres?
2 Assignment: who belongs to whom?
3 Stopping criteria: are we done yet?

26 Update step. Compute the center of each cluster. The center of a cluster can be:
1 the L2-norm geometric center of the cluster's points
2 the median point of the cluster's points
3 the medoid point of the cluster's points
4 whatever makes sense for your data
The last option will have a lecture just for it!
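Of the centre definitions listed above, the medoid may be the least familiar: it is the cluster member whose total distance to all other members is smallest, so it is always an actual data point. A minimal sketch, assuming Euclidean distance; the function name `medoid` and the sample values are illustrative and not from the slides.

```python
import numpy as np

def medoid(points):
    """Return the member of `points` with the smallest total distance to the others."""
    points = np.asarray(points, dtype=float)
    # dists[i, j] is the Euclidean distance between point i and point j
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[dists.sum(axis=1).argmin()]

# Four nearby points and one distant point (made-up values)
cluster = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
center_medoid = medoid(cluster)  # the member [1.0, 0.0]
```

Computing all pairwise distances costs quadratic time and memory in the cluster size, which is acceptable for a sketch but may need care on large clusters.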

27 Center of a cluster. Let's compute the center of those points.

28 Center of a cluster. We can use the mean on each dimension.

31 Center of a cluster. But the mean has trouble with outliers.

32 Center of a cluster. Using the median on each dimension is more robust.
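A small numeric illustration of the last two slides: one outlier drags the per-dimension mean far from the main group, while the per-dimension median stays put. The sample values are made up for the example.

```python
import numpy as np

# Four points close together plus one outlier (illustrative values)
cluster = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0], [50.0, 50.0]])

mean_center = cluster.mean(axis=0)          # pulled towards the outlier
median_center = np.median(cluster, axis=0)  # stays with the main group

print(mean_center)    # [11.2 11.2]
print(median_center)  # [2. 2.]
```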

33 Assignment step. A point belongs to the cluster with the closest center. [Figure: points before and after the assignment step.]
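The assignment step can be sketched in a few lines with NumPy broadcasting. This assumes Euclidean distance; the helper name `assign` and the sample data are illustrative, and any other distance function could be swapped in.

```python
import numpy as np

def assign(points, centers):
    """Label each point with the index of its closest center (Euclidean distance)."""
    # dists[i, k] is the distance from point i to center k
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

points = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.5, 4.8]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign(points, centers)  # first two points go to center 0, last two to center 1
```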

34 Stopping criteria. Stop when no cluster assignment changes any more. k-means always converges, but the final clusters might not be ideal. [Figure: a not-so-good clustering and a better clustering.]

36 k-means in Python. 27 lines, comments and line-skips included.

     1  import numpy
     2
     3  # Parameters
     4  points = numpy.loadtxt('myFile.dat')
     5  nbClusters, nbPoints = 2, points.shape[0]
     6
     7  # Initialization
     8  clustersId = numpy.random.randint(0, nbClusters, size=nbPoints)
     9  clustersCenters = numpy.zeros((nbClusters, points.shape[1]))
    10
    11  # Iteration
    12  converged = False
    13  while not converged:
    14      # Update the clusters center (mean)
    15      for c in range(nbClusters):
    16          numpy.mean(
    17              [points[p] for p in range(nbPoints) if clustersId[p] == c],
    18              axis=0, out=clustersCenters[c])
    19
    20      # Attribute each point to its closest cluster
    21      oldClustersId = clustersId
    22      clustersId = [
    23          numpy.argmin([numpy.linalg.norm(p - c) for c in clustersCenters]) for p in points
    24      ]
    25
    26      # Check if convergence is reached
    27      converged = numpy.array_equal(oldClustersId, clustersId)

37 Initialization.

    1  import numpy
    2
    3  # Parameters
    4  points = numpy.loadtxt('myFile.dat')
    5  nbClusters, nbPoints = 2, points.shape[0]
    6
    7  # Initialization
    8  clustersId = numpy.random.randint(0, nbClusters, size=nbPoints)
    9  clustersCenters = numpy.zeros((nbClusters, points.shape[1]))

line 1: let's use NumPy
lines 4 and 5: load the data
line 8: points are assigned randomly to a cluster
line 9: matrix storing the center of each cluster

38 Iterations.

    1  # Iteration
    2  converged = False
    3  while not converged:
    4      # Update the clusters center (mean)
    5
    6      # Attribute each point to its closest cluster
    7
    8      # Check if convergence is reached

39 Update step.

    1  # Update the clusters center (mean)
    2  for c in range(nbClusters):
    3      numpy.mean(
    4          [points[p] for p in range(nbPoints) if clustersId[p] == c],
    5          axis=0, out=clustersCenters[c])

line 2: for c from 0 to the number of clusters...
lines 3, 4 and 5: the center of cluster c is the mean of all the points which belong to cluster c

40 Assignment step.

    1  # Attribute each point to its closest cluster
    2  clustersId = [
    3      numpy.argmin(
    4          [numpy.linalg.norm(p - c) for c in clustersCenters]
    5      )
    6      for p in points]

line 4: distance from each cluster center c_0, c_1, ..., c_n to point p
lines 3 and 5: the cluster of point p is the cluster with the closest center
lines 2 and 6: for each point p

41 Stopping criteria.

    1  # Attribute each point to its closest cluster
    2  oldClustersId = clustersId
    3  clustersId = [
    4      numpy.argmin([numpy.linalg.norm(p - c) for c in clustersCenters]) for p in points
    5  ]
    6
    7  # Check if convergence is reached
    8  converged = numpy.array_equal(oldClustersId, clustersId)

line 2: keep the previous cluster assignment of each point
lines 3, 4 and 5: assignment step
line 8: if the assignment did not change, we are done
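For comparison, here is a sketch of the same three-step loop written as a reusable function with vectorized NumPy operations. It is not the author's code: the function name `kmeans`, the `rng` and `max_iter` parameters, and the choice to leave the center of an empty cluster unchanged are all assumptions made for this example.

```python
import numpy as np

def kmeans(points, nb_clusters, rng=None, max_iter=100):
    """k-means with random-partition initialization (sketch, Euclidean distance)."""
    rng = np.random.default_rng(rng)
    # Initialization: every point gets a random cluster label
    labels = rng.integers(0, nb_clusters, size=len(points))
    centers = np.zeros((nb_clusters, points.shape[1]))
    for _ in range(max_iter):
        # Update step: each center becomes the mean of its members
        for c in range(nb_clusters):
            members = points[labels == c]
            if len(members) > 0:  # assumption: an empty cluster keeps its center
                centers[c] = members.mean(axis=0)
        # Assignment step: each point joins the cluster of its closest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stopping criterion: no assignment changed
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centers

# Two small, well-separated groups (made-up values)
points = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                   [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]])
labels, centers = kmeans(points, nb_clusters=2, rng=0)
```

The `max_iter` cap is only a safety net; the loop normally exits through the stopping criterion.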

43 How many clusters? k-means can guess the clusters, but not how many there are.

44 Quality. A wrong number of clusters means a bad clustering. Not enough clusters: big gaps within a single cluster.

45 Quality. A wrong number of clusters means a bad clustering. Too many clusters: clusters very close to each other.

46 Internal quality indexes. Internal quality indexes use the clustered data only.

47 Dunn index. Higher is better.

$$\frac{\min_{1 \le i \le n} \left( \min_{i < j \le n} d(c_i, c_j) \right)}{\max_{1 \le i \le n} m_i}$$

$n$: number of clusters
$m_i$: maximum distance between the members of the ith cluster and its center
$d(c_i, c_j)$: distance between the ith and jth cluster centres
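The index as defined on this slide can be sketched as follows. The function name `dunn_index` and the sample data are illustrative; the sketch assumes Euclidean distance, at least two clusters, and no empty cluster. Other texts define the Dunn index with point-to-point distances and cluster diameters, so values may not be comparable across definitions.

```python
import numpy as np

def dunn_index(points, labels, centers):
    """Smallest distance between centers divided by the largest cluster radius m_i."""
    n = len(centers)
    # Numerator: smallest distance between two distinct cluster centers
    min_separation = min(
        np.linalg.norm(centers[i] - centers[j])
        for i in range(n) for j in range(i + 1, n)
    )
    # Denominator: largest m_i, the maximum member-to-center distance of cluster i
    max_radius = max(
        np.linalg.norm(points[labels == i] - centers[i], axis=1).max()
        for i in range(n)
    )
    return min_separation / max_radius

points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[1.0, 0.0], [11.0, 0.0]])
dunn = dunn_index(points, labels, centers)  # centers 10 apart, radii of 1
```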

48 Davies-Bouldin index. Lower is better.

$$\frac{1}{n} \sum_{i=1}^{n} \max_{j \ne i} \left( \frac{m_i + m_j}{d(c_i, c_j)} \right)$$

$n$: number of clusters
$m_i$: average distance between the members of the ith cluster and its center
$d(c_i, c_j)$: distance between the ith and jth cluster centres
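A direct transcription of this formula into NumPy might look like the sketch below. The function name `davies_bouldin_index` and the sample data are illustrative; it assumes Euclidean distance, at least two clusters with distinct centers, and no empty cluster.

```python
import numpy as np

def davies_bouldin_index(points, labels, centers):
    """Average, over clusters, of the worst (m_i + m_j) / d(c_i, c_j) ratio."""
    n = len(centers)
    # m[i]: average distance from the members of cluster i to its center
    m = np.array([
        np.linalg.norm(points[labels == i] - centers[i], axis=1).mean()
        for i in range(n)
    ])
    total = 0.0
    for i in range(n):
        # Worst ratio of cluster i against every other cluster j
        total += max(
            (m[i] + m[j]) / np.linalg.norm(centers[i] - centers[j])
            for j in range(n) if j != i
        )
    return total / n

points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[1.0, 0.0], [11.0, 0.0]])
db = davies_bouldin_index(points, labels, centers)  # (1 + 1) / 10 for both clusters
```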

49 Using internal quality indexes. When computing internal quality indexes: do it for different numbers of clusters; do it for several runs of k-means; take the result with a grain of salt!

50 Using internal quality indexes. [Figure: median Davies-Bouldin index and median Dunn index over 16 runs, each plotted against the number of clusters.] According to the indices, 3 to 4 clusters seem ideal.

51 External quality indexes. External quality indexes evaluate the clustering result on data that was not used to compute the clustering. The data used for clustering is the training set A; the data used to test the clustering is the test set B.

52 Rand measure. Higher is better (ratio of correct predictions).

$$R = \frac{T_+ + T_-}{T_+ + F_+ + F_- + T_-}$$

$T_+$: number of true positives
$T_-$: number of true negatives
$F_+$: number of false positives
$F_-$: number of false negatives
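The slide does not say how the four counts are obtained. A common convention, assumed here, is to count over all pairs of points: a pair is a true positive when both labelings put the two points together, and a true negative when both keep them apart. The function name `rand_measure` and the sample labels are illustrative.

```python
from itertools import combinations

def rand_measure(predicted, reference):
    """Rand measure over all pairs of points; both arguments are label sequences."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_ref = reference[i] == reference[j]
        if same_pred and same_ref:
            tp += 1  # together in both labelings
        elif not same_pred and not same_ref:
            tn += 1  # apart in both labelings
        elif same_pred:
            fp += 1  # together in the prediction only
        else:
            fn += 1  # together in the reference only
    return (tp + tn) / (tp + fp + fn + tn)

predicted = [0, 0, 1, 1, 1]
reference = [0, 0, 0, 1, 1]
score = rand_measure(predicted, reference)  # 6 agreeing pairs out of 10
```

Because only pair relationships are compared, renaming the cluster labels does not change the score.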

53 Jaccard index. Higher is better (similarity of the training and test sets).

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
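On plain Python sets the formula is a one-liner. The function name `jaccard_index`, the element names, and the convention of returning 1.0 for two empty sets are assumptions made for this sketch.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

set_a = {"p1", "p2", "p3", "p4"}
set_b = {"p3", "p4", "p5"}
similarity = jaccard_index(set_a, set_b)  # 2 shared elements out of 5 in total
```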

55 Random partition. With a random partition, the end result is not guaranteed.

56 k-means++ initialization. k-means++ chooses initial centres that tend to be far apart from each other.

57 k-means++ initialization. The center $C_1$ of the 1st cluster is picked uniformly at random amongst the points $X_1, X_2, \dots, X_n$:

$$p_1(X_i) = \frac{1}{n}$$

$p_1(X_i)$ is the probability for $X_i$ to be picked as $C_1$.

58 k-means++ initialization. The center $C_2$ of the 2nd cluster is picked with probability proportional to its distance to the center $C_1$ of the 1st cluster:

$$p_2(X_i) = \frac{d(C_1, X_i)}{\sum_{j=1}^{n} d(C_1, X_j)}$$

$p_2(X_i)$ is the probability for $X_i$ to be picked as $C_2$.

59 k-means++ initialization. The center $C_3$ of the 3rd cluster is picked with probability proportional to the distance to the closest center:

$$p_3(X_i) = \frac{\min\left(d(C_1, X_i), d(C_2, X_i)\right)}{\sum_{j=1}^{n} \min\left(d(C_1, X_j), d(C_2, X_j)\right)}$$

$p_3(X_i)$ is the probability for $X_i$ to be picked as $C_3$.
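The three formulas above generalize to any number of centers: each new center is drawn with probability proportional to its distance to the closest center already chosen. A sketch under that reading, assuming Euclidean distance and distinct points; the function name `kmeanspp_init` and the `rng` parameter are illustrative. The published k-means++ algorithm weights by squared distance, which would correspond to using `dists ** 2` below.

```python
import numpy as np

def kmeanspp_init(points, nb_clusters, rng=None):
    """Pick initial centers among `points`, weighting by distance as on the slides."""
    rng = np.random.default_rng(rng)
    n = len(points)
    # First center: uniform draw, p_1(X_i) = 1/n
    chosen = [rng.integers(n)]
    for _ in range(1, nb_clusters):
        # Distance from every point to its closest already-chosen center
        dists = np.linalg.norm(
            points[:, None, :] - points[chosen][None, :, :], axis=2
        ).min(axis=1)
        # Next center: probability proportional to that distance
        chosen.append(rng.choice(n, p=dists / dists.sum()))
    return points[chosen]

points = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1],
                   [5.2, 4.9], [9.0, 0.0], [9.3, 0.2]])
centers = kmeanspp_init(points, 3, rng=0)
```

A point that is already a center has distance zero, so it cannot be drawn twice as long as the points are distinct.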

60 k-means++ initialization. Why use probabilities? Why not simply pick the farthest point?

61 k-means++ initialization. Because we would select an outlier as a center.

62 Smarter initialization. k-means++ initialization usually gives a better initial guess. [Figure: random partition versus k-means++.]

63 Multiple tries. Always run k-means several times: it is very cheap anyway; it is required to properly measure quality; and it makes an unlucky initialization less likely to be a problem.

65 Boundaries. k-means partitions space in a rigid and sharp fashion.

66 Clusters geometry. k-means does not deal very well with non-globular clusters.

70 Clustering model. k-means partitions space into cells.

71 Clustering model. Cells are not always a proper model of our data...



Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2

More information
