Stat 321: Transposable Data Clustering

Size: px
Start display at page:

Download "Stat 321: Transposable Data Clustering"

Transcription

1 Stat 321: Transposable Data Clustering Art B. Owen Stanford Statistics Art B. Owen (Stanford Statistics) Clustering 1 / 27

2 Clustering Given n objects with d attributes, place them (the objects) into groups. Art B. Owen (Stanford Statistics) Clustering 2 / 27

3 Clustering Given n objects with d attributes, place them (the objects) into groups. A form of unsupervised learning. Unsupervised because there is no response. Art B. Owen (Stanford Statistics) Clustering 2 / 27

4 Clustering Given n objects with d attributes, place them (the objects) into groups. A form of unsupervised learning. Unsupervised because there is no response. Has a long history, at least as old as taxonomy. Art B. Owen (Stanford Statistics) Clustering 2 / 27

5 Clustering Given n objects with d attributes, place them (the objects) into groups. A form of unsupervised learning. Unsupervised because there is no response. Has a long history, at least as old as taxonomy. Raises all the vexing issues of an exploratory method. The curse of exploration. Art B. Owen (Stanford Statistics) Clustering 2 / 27

6 Clustering Given n objects with d attributes, place them (the objects) into groups. A form of unsupervised learning. Unsupervised because there is no response. Has a long history, at least as old as taxonomy. Raises all the vexing issues of an exploratory method. The curse of exploration. We ll look at it as a precursor to bi-clustering of objects and attributes. Art B. Owen (Stanford Statistics) Clustering 2 / 27

7 Clustering Given n points in R d Art B. Owen (Stanford Statistics) Clustering 3 / 27

8 Clustering Given n points in R d Do they clump together into k clusters? Art B. Owen (Stanford Statistics) Clustering 3 / 27

9 Clustering Given n points in R d Do they clump together into k clusters? If so, how to find the clusters, Art B. Owen (Stanford Statistics) Clustering 3 / 27

10 Clustering Given n points in R d Do they clump together into k clusters? If so, how to find the clusters, and the boundaries, Art B. Owen (Stanford Statistics) Clustering 3 / 27

11 Clustering Given n points in R d Do they clump together into k clusters? If so, how to find the clusters, and the boundaries, and cluster identities? Art B. Owen (Stanford Statistics) Clustering 3 / 27

12 Clustering Given n points in R d Do they clump together into k clusters? If so, how to find the clusters, and the boundaries, and cluster identities? Outcomes Art B. Owen (Stanford Statistics) Clustering 3 / 27

13 Clustering Given n points in R d Do they clump together into k clusters? If so, how to find the clusters, and the boundaries, and cluster identities? Outcomes In the best case, a clustering can reveal the presence of a new categorical variable, e.g. types of diabetes. Art B. Owen (Stanford Statistics) Clustering 3 / 27

14 Clustering Given n points in R d Do they clump together into k clusters? If so, how to find the clusters, and the boundaries, and cluster identities? Outcomes In the best case, a clustering can reveal the presence of a new categorical variable, e.g. types of diabetes. Other times there are no clusters, just a smear Art B. Owen (Stanford Statistics) Clustering 3 / 27

15 Clustering Given n points in R d Do they clump together into k clusters? If so, how to find the clusters, and the boundaries, and cluster identities? Outcomes In the best case, a clustering can reveal the presence of a new categorical variable, e.g. types of diabetes. Other times there are no clusters, just a smear Or we find clusters but not their meanings. Art B. Owen (Stanford Statistics) Clustering 3 / 27

16 Clustering Given n points in R d Do they clump together into k clusters? If so, how to find the clusters, and the boundaries, and cluster identities? Outcomes In the best case, a clustering can reveal the presence of a new categorical variable, e.g. types of diabetes. Key idea Other times there are no clusters, just a smear Or we find clusters but not their meanings. Items within a cluster are more similar (less distant) to each other than items from different clusters. Art B. Owen (Stanford Statistics) Clustering 3 / 27

17 k-means Art B. Owen (Stanford Statistics) Clustering 4 / 27

18 k-means Algorithm 1 Pick k points z 1,..., z k R d, then repeat For i = 1,..., n put g(i) = min 1 j k x i z j 2 For j = 1,..., k put z j = avg{x i g(i) = j} Art B. Owen (Stanford Statistics) Clustering 4 / 27

19 k-means Algorithm 1 Pick k points z 1,..., z k R d, then repeat For i = 1,..., n put g(i) = min 1 j k x i z j 2 For j = 1,..., k put z j = avg{x i g(i) = j} Issues Handle averaging over empty set Pick stopping rule (it must converge, or at worst cycle) Answer depends on starting points Hard to pick k (at least 30 methods proposed by 1985) Art B. Owen (Stanford Statistics) Clustering 4 / 27

20 k-means Algorithm 1 Pick k points z 1,..., z k R d, then repeat For i = 1,..., n put g(i) = min 1 j k x i z j 2 For j = 1,..., k put z j = avg{x i g(i) = j} Issues Handle averaging over empty set Pick stopping rule (it must converge, or at worst cycle) Answer depends on starting points Hard to pick k (at least 30 methods proposed by 1985) Properties Usually rapid convergence An iteration can be done in O(nkd) time. No n 2 or d 2 or k 2. (k log(k) to sort negligible) Art B. Owen (Stanford Statistics) Clustering 4 / 27

21 k-means ctd There s a criterion Each step minimizes SS = n x i z g(i) 2 i=1 over it s free variables. Sadly, it is not convex. Art B. Owen (Stanford Statistics) Clustering 5 / 27

22 k-means ctd There s a criterion Each step minimizes SS = n x i z g(i) 2 i=1 over it s free variables. Sadly, it is not convex. Hartigan & Wong (1979) get solutions s.t. no g(i) change reduces SS Art B. Owen (Stanford Statistics) Clustering 5 / 27

23 k-means ctd There s a criterion Each step minimizes SS = n x i z g(i) 2 i=1 over it s free variables. Sadly, it is not convex. Hartigan & Wong (1979) get solutions s.t. no g(i) change reduces SS Maybe penalizing i<i Z i Z i 2 will work (Y. She, F. Bach) Art B. Owen (Stanford Statistics) Clustering 5 / 27

24 k-means ctd There s a criterion Each step minimizes SS = n x i z g(i) 2 i=1 over it s free variables. Sadly, it is not convex. Hartigan & Wong (1979) get solutions s.t. no g(i) change reduces SS Maybe penalizing i<i Z i Z i 2 will work (Y. She, F. Bach) x-means Pelleg and Moore Efficient lookups to get k into tens of thousands Picks k via AIC or BIC along the way Art B. Owen (Stanford Statistics) Clustering 5 / 27

25 k-means ctd There s a criterion Each step minimizes SS = n x i z g(i) 2 i=1 over it s free variables. Sadly, it is not convex. Hartigan & Wong (1979) get solutions s.t. no g(i) change reduces SS Maybe penalizing i<i Z i Z i 2 will work (Y. She, F. Bach) x-means Pelleg and Moore Efficient lookups to get k into tens of thousands Picks k via AIC or BIC along the way Defining true k problematic (galaxies in clusters in super-clusters) Art B. Owen (Stanford Statistics) Clustering 5 / 27

26 Variations Change dist Change the distance L 2 to L 1 to L p Changes mean to median to arg min Art B. Owen (Stanford Statistics) Clustering 6 / 27

27 Variations Change dist Change the distance L 2 to L 1 to L p Changes mean to median to arg min Change z k medoids Art B. Owen (Stanford Statistics) Clustering 6 / 27

28 Variations Change dist Change the distance L 2 to L 1 to L p Changes mean to median to arg min Change z k medoids Require z k = x i(k) for some i(k) Art B. Owen (Stanford Statistics) Clustering 6 / 27

29 Variations Change dist Change the distance L 2 to L 1 to L p Changes mean to median to arg min Change z k medoids Require z k = x i(k) for some i(k) Avoids using z with 2.3 kids, 30% pregnant, 10% male Art B. Owen (Stanford Statistics) Clustering 6 / 27

30 Variations Change dist Change the distance L 2 to L 1 to L p Changes mean to median to arg min Change z k medoids Require z k = x i(k) for some i(k) Avoids using z with 2.3 kids, 30% pregnant, 10% male PAM partitioning around medoids Art B. Owen (Stanford Statistics) Clustering 6 / 27

31 Variations Change dist Change the distance L 2 to L 1 to L p Changes mean to median to arg min Change z k medoids Require z k = x i(k) for some i(k) Avoids using z with 2.3 kids, 30% pregnant, 10% male PAM partitioning around medoids Minimize i min j D(x i, z j ) Art B. Owen (Stanford Statistics) Clustering 6 / 27

32 Variations Change dist Change the distance L 2 to L 1 to L p Changes mean to median to arg min Change z k medoids Require z k = x i(k) for some i(k) Avoids using z with 2.3 kids, 30% pregnant, 10% male PAM partitioning around medoids Minimize i min j D(x i, z j ) Slow. Uses only D ij values. Art B. Owen (Stanford Statistics) Clustering 6 / 27

33 What is a cluster? Defining issues Scale of variables matters Subset of variables matters too Do whales go with penguins or with elephants? Why does my breakfast cereal cluster with pure sugar? # Density bumps # mixture components Art B. Owen (Stanford Statistics) Clustering 7 / 27

34 What is a cluster? Defining issues Scale of variables matters Subset of variables matters too Do whales go with penguins or with elephants? Why does my breakfast cereal cluster with pure sugar? # Density bumps # mixture components Scaling data z ij = x ij m j s j m, s are mean & stdev (so z j (0, 1)) or min & range (so 0 z ij 1) or median & MAD (for robustness) NB: transformation defined column-wise [vs row-wise or simultaneous] Art B. Owen (Stanford Statistics) Clustering 7 / 27

35 Weighting Weight variables via w j 0 d ii = with d d w j x ij x i j = x ij x i j j=1 x ij = x ij w j j=1 Weighting and scaling are equivalent (for L p distances) Art B. Owen (Stanford Statistics) Clustering 8 / 27

36 Weighting Weight variables via w j 0 d ii = with d d w j x ij x i j = x ij x i j j=1 x ij = x ij w j j=1 Weighting and scaling are equivalent (for L p distances) Automatic scaling not always sensible. Variables in the same units should sometimes get the same scaling even if they have different variances. Art B. Owen (Stanford Statistics) Clustering 8 / 27

37 Distances n(n 1)/2 interpoint distances d ii = dist(x i, x i ) Usually 1 d ii 0 2 d ii = 0 3 d ii = d i i A metric distance has d ij d ik + d jk (Triangle inequality) A Euclidean distance d ij = z i z j for some points z i = z(x i ) An ultra-metric distance has d ij max(d ik, d jk ) (More later) Art B. Owen (Stanford Statistics) Clustering 9 / 27

38 Distances n(n 1)/2 interpoint distances d ii = dist(x i, x i ) Usually 1 d ii 0 2 d ii = 0 3 d ii = d i i A metric distance has d ij d ik + d jk (Triangle inequality) A Euclidean distance d ij = z i z j for some points z i = z(x i ) An ultra-metric distance has d ij max(d ik, d jk ) (More later) Other distances Canberra distance d x ij x i j j=1 x ij + x i j (with 0/0 = 0) Angular or cosine distance (dogs cats, wolves tigers) Art B. Owen (Stanford Statistics) Clustering 9 / 27

39 Similarities Opposite of distance: d S Eg S ii = 1 d ii or 1/d ii or d ii = S S ii Art B. Owen (Stanford Statistics) Clustering 10 / 27

40 Similarities Opposite of distance: d S Eg S ii = 1 d ii or 1/d ii or d ii = S S ii Correlation type similarities S ii = = = d j=1 x ijx i j d d j=1 x2 ij j=1 x2 i j d j=1 x ijx i j d d j=1 x2 ij j=1 x2 i j, or,, d j=1 (x ij x j ) 2 d or, d j=1 (x ij x j )(x i j x j ) j=1 (x i j x j ) 2, or, Art B. Owen (Stanford Statistics) Clustering 10 / 27

41 Similarities Some equalities are more equal than others 1 i and i are both Nobel laureates (unusually strong similarity) Art B. Owen (Stanford Statistics) Clustering 11 / 27

42 Similarities Some equalities are more equal than others 1 i and i are both Nobel laureates (unusually strong similarity) 2 i and i are both over 21 years old (mild similarity) Art B. Owen (Stanford Statistics) Clustering 11 / 27

43 Similarities Some equalities are more equal than others 1 i and i are both Nobel laureates (unusually strong similarity) 2 i and i are both over 21 years old (mild similarity) 3 i and i are both not Nobel laureates (barely similar at all) Art B. Owen (Stanford Statistics) Clustering 11 / 27

44 Similarities Some equalities are more equal than others 1 i and i are both Nobel laureates (unusually strong similarity) 2 i and i are both over 21 years old (mild similarity) 3 i and i are both not Nobel laureates (barely similar at all) We can handle 1 vs 2 by weighting the variables. Art B. Owen (Stanford Statistics) Clustering 11 / 27

45 Similarities Some equalities are more equal than others 1 i and i are both Nobel laureates (unusually strong similarity) 2 i and i are both over 21 years old (mild similarity) 3 i and i are both not Nobel laureates (barely similar at all) We can handle 1 vs 2 by weighting the variables. But 1 vs 3 is trickier (same variable). Art B. Owen (Stanford Statistics) Clustering 11 / 27

46 Binary similarity measures p features 2 2 table a b 0 c d We want to count a more than d Generic measure: α > 0, δ 0 S ii = αa + δd αa + b + c + δd (α, δ) and (α, δ ) give the same ranking if αδ = α δ Janowitz recommends Jaccard or Russel-Rao d = 1 S Specific measures a + d S ii = Simple matchin a + b + c + d a S ii = Jaccard-Tanimoto a + b + c = 1 when a + b + c = 0 a S ii = Russel-Rao a + b + c + d 2(a + d) S ii = Sokal-Sneath 2(a + d) + b + c a S ii = Sokal-Sneath II a + 2(b + c) Art B. Owen (Stanford Statistics) Clustering 12 / 27

47 Agglomerative clustering General Start with n clusters of one element each Art B. Owen (Stanford Statistics) Clustering 13 / 27

48 Agglomerative clustering General Start with n clusters of one element each Repeat 1 Find closest two clusters 2 Merge them into a new cluster Art B. Owen (Stanford Statistics) Clustering 13 / 27

49 Agglomerative clustering General Start with n clusters of one element each Repeat 1 Find closest two clusters 2 Merge them into a new cluster Until only one cluster remains Art B. Owen (Stanford Statistics) Clustering 13 / 27

50 Agglomerative clustering General Start with n clusters of one element each Repeat 1 Find closest two clusters 2 Merge them into a new cluster Until only one cluster remains We need point to cluster and cluster to cluster distances Art B. Owen (Stanford Statistics) Clustering 13 / 27

51 Flavors of agglomerative clustering Single linkage Get chaining ; friends of friends d(c 1, C 2 ) = min i C 1 min j C 2 d ij Art B. Owen (Stanford Statistics) Clustering 14 / 27

52 Flavors of agglomerative clustering Single linkage Get chaining ; friends of friends Complete linkage Get dense nearly spherical clusters d(c 1, C 2 ) = min i C 1 min j C 2 d ij d(c 1, C 2 ) = max i C 1 max j C 2 d ij Art B. Owen (Stanford Statistics) Clustering 14 / 27

53 Flavors of agglomerative clustering Single linkage Get chaining ; friends of friends Complete linkage Get dense nearly spherical clusters d(c 1, C 2 ) = min i C 1 min j C 2 d ij d(c 1, C 2 ) = max i C 1 max j C 2 d ij Average linkage Compromise d(c 1, C 2 ) = 1 1 C 1 C 2 d ij i C 1 j C 2 Art B. Owen (Stanford Statistics) Clustering 14 / 27

54 Example Bird data > dim(voeg) [1] > voeg[1:5,1:5] Heron Mallard Sparrowhawk Buzzard Kestrel means that place i has bird j Place names not present Art B. Owen (Stanford Statistics) Clustering 15 / 27

55 Complete linkage dendrogram Art B. Owen (Stanford Statistics) Clustering 16 / 27

56 Single linkage dendrogram Art B. Owen (Stanford Statistics) Clustering 17 / 27

57 Average linkage dendrogram Art B. Owen (Stanford Statistics) Clustering 18 / 27

58 Complete vs Average linkage dendrograms Art B. Owen (Stanford Statistics) Clustering 19 / 27

59 Heatmap Art B. Owen (Stanford Statistics) Clustering 20 / 27

60 Heatmap Art B. Owen (Stanford Statistics) Clustering 21 / 27

61 Heatmap Art B. Owen (Stanford Statistics) Clustering 22 / 27

62 Dendrogram details Ordering n points = n 1 splits = 2 n 1 orderings R puts tightest cluster on the left heatmap lets you control ordering somewhat Art B. Owen (Stanford Statistics) Clustering 23 / 27

63 Dendrogram details Ordering n points = n 1 splits = 2 n 1 orderings R puts tightest cluster on the left heatmap lets you control ordering somewhat Ultrametric H ij height at which clusters containing i and j merge It s an ultrametric: for i, j, k top two of H ij, H ik, H jk are equal Need H ij max(h ik, H jk ) Non-ultrametric measures yield dendrograms with reversals Art B. Owen (Stanford Statistics) Clustering 23 / 27

64 More hierarchical Centroid merging d(c 1, C 2 ) = X C1 X C2 Requires original points, not just distances Art B. Owen (Stanford Statistics) Clustering 24 / 27

65 More hierarchical Centroid merging d(c 1, C 2 ) = X C1 X C2 Requires original points, not just distances Ward s d(c 1, C 2 ) = 2 C 1 C 2 C 1 + C 2 X C1 X C2 Via within cluster SS (after before) Goal is to min j i C j X i X Cj 2 Art B. Owen (Stanford Statistics) Clustering 24 / 27

66 Costs of hierarchical splits It takes O(n 2 d) to get all pairwise distances We have to take O(n) steps Naive implementation would be O(n 3 d). Art B. Owen (Stanford Statistics) Clustering 25 / 27

67 Costs of hierarchical splits It takes O(n 2 d) to get all pairwise distances We have to take O(n) steps Naive implementation would be O(n 3 d). Lance-Williams family of methods Dist of i j to k α i d ki + α j d kj + βd ij + γ d ki d kj eg: α = 1/2, β = 0, γ = 1/2 for single linkage. α i can depend on n i. Updates lead to O(n 2 d) cost. Really clever data structures (F. Murtagh) avoid even n 2 log(n). Art B. Owen (Stanford Statistics) Clustering 25 / 27

68 Divisive clustering Recipe Start with one cluster of n objects Repeat 1 Select one cluster 2 Split it into two Until there are n clusters of size 1 Art B. Owen (Stanford Statistics) Clustering 26 / 27

69 Divisive clustering Recipe Start with one cluster of n objects Repeat 1 Select one cluster 2 Split it into two Until there are n clusters of size 1 Choices Which cluster to split. E.g. largest diameter arg max j max X i X i i,i C j How to split it. E.g. remove far point, X i goes with nearer of far point, XCj far Art B. Owen (Stanford Statistics) Clustering 26 / 27

70 Divisive clustering Recipe Start with one cluster of n objects Repeat 1 Select one cluster 2 Split it into two Until there are n clusters of size 1 Choices Which cluster to split. E.g. largest diameter arg max j max X i X i i,i C j How to split it. E.g. remove far point, X i goes with nearer of far point, XCj far Apparently no good speedup Art B. Owen (Stanford Statistics) Clustering 26 / 27

71 Optimization based clustering E.g. Ward s Recipe Measure quality of a cluster Scale est, like diameter, RMS width, median width Combine into quality of clustering Typically the sum (max plausible) (Attempt to) optimize Art B. Owen (Stanford Statistics) Clustering 27 / 27

72 Optimization based clustering E.g. Ward s Recipe Measure quality of a cluster Scale est, like diameter, RMS width, median width Combine into quality of clustering Typically the sum (max plausible) (Attempt to) optimize Isolation measures Split min i C,j C d ij Cut i C,j C d ij Art B. Owen (Stanford Statistics) Clustering 27 / 27

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

Hierarchical Clustering

Hierarchical Clustering What is clustering Partitioning of a data set into subsets. A cluster is a group of relatively homogeneous cases or observations Hierarchical Clustering Mikhail Dozmorov Fall 2016 2/61 What is clustering

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised

More information

Finding Clusters 1 / 60

Finding Clusters 1 / 60 Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering Clustering by Partitioning, e.g. k-means Density Based Clustering, e.g. DBScan Grid Based Clustering 1 / 60

More information

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning A review of clustering and other exploratory data analysis methods HST.951J: Medical Decision Support Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision

More information

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering An unsupervised machine learning problem Grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other

More information

Unsupervised Learning

Unsupervised Learning Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support, Fall 2005 Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo 6.873/HST.951 Medical Decision

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:

More information

Today s lecture. Clustering and unsupervised learning. Hierarchical clustering. K-means, K-medoids, VQ

Today s lecture. Clustering and unsupervised learning. Hierarchical clustering. K-means, K-medoids, VQ Clustering CS498 Today s lecture Clustering and unsupervised learning Hierarchical clustering K-means, K-medoids, VQ Unsupervised learning Supervised learning Use labeled data to do something smart What

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

Clustering. Chapter 10 in Introduction to statistical learning

Clustering. Chapter 10 in Introduction to statistical learning Clustering Chapter 10 in Introduction to statistical learning 16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 1 Clustering ² Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 1990). ² What

More information

Hierarchical Clustering

Hierarchical Clustering Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram A tree like diagram that records the sequences of merges or splits 0 0 0 00

More information

Measure of Distance. We wish to define the distance between two objects Distance metric between points:

Measure of Distance. We wish to define the distance between two objects Distance metric between points: Measure of Distance We wish to define the distance between two objects Distance metric between points: Euclidean distance (EUC) Manhattan distance (MAN) Pearson sample correlation (COR) Angle distance

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

ECS 234: Data Analysis: Clustering ECS 234

ECS 234: Data Analysis: Clustering ECS 234 : Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

More information

3. Cluster analysis Overview

3. Cluster analysis Overview Université Laval Analyse multivariable - mars-avril 2008 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as

More information

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)

More information

Hierarchical clustering

Hierarchical clustering Hierarchical clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Description Produces a set of nested clusters organized as a hierarchical tree. Can be visualized

More information

CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample

CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups

More information

Hierarchical Clustering

Hierarchical Clustering Hierarchical Clustering Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram A tree-like diagram that records the sequences of merges

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

3. Cluster analysis Overview

3. Cluster analysis Overview Université Laval Multivariate analysis - February 2006 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

Hierarchical Clustering 4/5/17

Hierarchical Clustering 4/5/17 Hierarchical Clustering 4/5/17 Hypothesis Space Continuous inputs Output is a binary tree with data points as leaves. Useful for explaining the training data. Not useful for making new predictions. Direction

More information

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction Computational Methods for Data Analysis Massimo Poesio UNSUPERVISED LEARNING Clustering Unsupervised learning introduction 1 Supervised learning Training set: Unsupervised learning Training set: 2 Clustering

More information

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22 INF4820 Clustering Erik Velldal University of Oslo Nov. 17, 2009 Erik Velldal INF4820 1 / 22 Topics for Today More on unsupervised machine learning for data-driven categorization: clustering. The task

More information

Chapter VIII.3: Hierarchical Clustering

Chapter VIII.3: Hierarchical Clustering Chapter VIII.3: Hierarchical Clustering 1. Basic idea 1.1. Dendrograms 1.2. Agglomerative and divisive 2. Cluster distances 2.1. Single link 2.2. Complete link 2.3. Group average and Mean distance 2.4.

More information

Machine learning - HT Clustering

Machine learning - HT Clustering Machine learning - HT 2016 10. Clustering Varun Kanade University of Oxford March 4, 2016 Announcements Practical Next Week - No submission Final Exam: Pick up on Monday Material covered next week is not

More information

Cluster Analysis. Ying Shen, SSE, Tongji University

Cluster Analysis. Ying Shen, SSE, Tongji University Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group

More information

Clustering Part 3. Hierarchical Clustering

Clustering Part 3. Hierarchical Clustering Clustering Part Dr Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Hierarchical Clustering Two main types: Agglomerative Start with the points

More information

Clustering. Lecture 6, 1/24/03 ECS289A

Clustering. Lecture 6, 1/24/03 ECS289A Clustering Lecture 6, 1/24/03 What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/004 1

More information

Chapter 6 Continued: Partitioning Methods

Chapter 6 Continued: Partitioning Methods Chapter 6 Continued: Partitioning Methods Partitioning methods fix the number of clusters k and seek the best possible partition for that k. The goal is to choose the partition which gives the optimal

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Slides From Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Slides From Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Slides From Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining

More information

MSA220 - Statistical Learning for Big Data

MSA220 - Statistical Learning for Big Data MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups

More information

STATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

STATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010 STATS306B Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Outline K-means, K-medoids, EM algorithm choosing number of clusters: Gap test hierarchical clustering spectral

More information

Based on Raymond J. Mooney s slides

Based on Raymond J. Mooney s slides Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit

More information

Foundations of Machine Learning CentraleSupélec Fall Clustering Chloé-Agathe Azencot

Foundations of Machine Learning CentraleSupélec Fall Clustering Chloé-Agathe Azencot Foundations of Machine Learning CentraleSupélec Fall 2017 12. Clustering Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning objectives

More information

University of Florida CISE department Gator Engineering. Clustering Part 2

University of Florida CISE department Gator Engineering. Clustering Part 2 Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical

More information

Exploratory data analysis for microarrays

Exploratory data analysis for microarrays Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA

More information

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection

More information

Distances, Clustering! Rafael Irizarry!

Distances, Clustering! Rafael Irizarry! Distances, Clustering! Rafael Irizarry! Heatmaps! Distance! Clustering organizes things that are close into groups! What does it mean for two genes to be close?! What does it mean for two samples to

More information

Cluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole

Cluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole Cluster Analysis Summer School on Geocomputation 27 June 2011 2 July 2011 Vysoké Pole Lecture delivered by: doc. Mgr. Radoslav Harman, PhD. Faculty of Mathematics, Physics and Informatics Comenius University,

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1 Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group

More information

Cluster analysis. Agnieszka Nowak - Brzezinska

Cluster analysis. Agnieszka Nowak - Brzezinska Cluster analysis Agnieszka Nowak - Brzezinska Outline of lecture What is cluster analysis? Clustering algorithms Measures of Cluster Validity What is Cluster Analysis? Finding groups of objects such that

More information

Lecture Notes for Chapter 7. Introduction to Data Mining, 2 nd Edition. by Tan, Steinbach, Karpatne, Kumar

Lecture Notes for Chapter 7. Introduction to Data Mining, 2 nd Edition. by Tan, Steinbach, Karpatne, Kumar Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Hierarchical Clustering Produces a set

More information

Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC

Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC Clustering Idea Given a set of data can we find a natural grouping? Essential R commands: D =rnorm(12,0,1)

More information

Lesson 3. Prof. Enza Messina

Lesson 3. Prof. Enza Messina Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical

More information

Chapter DM:II. II. Cluster Analysis

Chapter DM:II. II. Cluster Analysis Chapter DM:II II. Cluster Analysis Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster Analysis DM:II-1

More information

Cluster Analysis: Agglomerate Hierarchical Clustering

Cluster Analysis: Agglomerate Hierarchical Clustering Cluster Analysis: Agglomerate Hierarchical Clustering Yonghee Lee Department of Statistics, The University of Seoul Oct 29, 2015 Contents 1 Cluster Analysis Introduction Distance matrix Agglomerative Hierarchical

More information

Unsupervised learning, Clustering CS434

Unsupervised learning, Clustering CS434 Unsupervised learning, Clustering CS434 Unsupervised learning and pattern discovery So far, our data has been in this form: We will be looking at unlabeled data: x 11,x 21, x 31,, x 1 m x 12,x 22, x 32,,

More information

Multivariate Analysis

Multivariate Analysis Multivariate Analysis Cluster Analysis Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Unsupervised Learning Cluster Analysis Natural grouping Patterns in the data

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data

More information

4. Ad-hoc I: Hierarchical clustering

4. Ad-hoc I: Hierarchical clustering 4. Ad-hoc I: Hierarchical clustering Hierarchical versus Flat Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time. Hierarchical

More information

Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

Lecture 5 Finding meaningful clusters in data. 5.1 Kleinberg s axiomatic framework for clustering

Lecture 5 Finding meaningful clusters in data. 5.1 Kleinberg s axiomatic framework for clustering CSE 291: Unsupervised learning Spring 2008 Lecture 5 Finding meaningful clusters in data So far we ve been in the vector quantization mindset, where we want to approximate a data set by a small number

More information

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan Working with Unlabeled Data Clustering Analysis Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan chanhl@mail.cgu.edu.tw Unsupervised learning Finding centers of similarity using

More information

Clustering: K-means and Kernel K-means

Clustering: K-means and Kernel K-means Clustering: K-means and Kernel K-means Piyush Rai Machine Learning (CS771A) Aug 31, 2016 Machine Learning (CS771A) Clustering: K-means and Kernel K-means 1 Clustering Usually an unsupervised learning problem

More information

Tree Models of Similarity and Association. Clustering and Classification Lecture 5

Tree Models of Similarity and Association. Clustering and Classification Lecture 5 Tree Models of Similarity and Association Clustering and Lecture 5 Today s Class Tree models. Hierarchical clustering methods. Fun with ultrametrics. 2 Preliminaries Today s lecture is based on the monograph

More information

Clustering. Partition unlabeled examples into disjoint subsets of clusters, such that:

Clustering. Partition unlabeled examples into disjoint subsets of clusters, such that: Text Clustering 1 Clustering Partition unlabeled examples into disjoint subsets of clusters, such that: Examples within a cluster are very similar Examples in different clusters are very different Discover

More information

Clustering part II 1

Clustering part II 1 Clustering part II 1 Clustering What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods 2 Partitioning Algorithms:

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Lecture 15 Clustering. Oct

Lecture 15 Clustering. Oct Lecture 15 Clustering Oct 31 2008 Unsupervised learning and pattern discovery So far, our data has been in this form: x 11,x 21, x 31,, x 1 m y1 x 12 22 2 2 2,x, x 3,, x m y We will be looking at unlabeled

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany

More information

CHAPTER 4: CLUSTER ANALYSIS

CHAPTER 4: CLUSTER ANALYSIS CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler Overview What is clustering and its applications? Distance between two clusters. Hierarchical Agglomerative clustering.

More information

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2 So far in the course Clustering Subhransu Maji : Machine Learning 2 April 2015 7 April 2015 Supervised learning: learning with a teacher You had training data which was (feature, label) pairs and the goal

More information

Multivariate Methods

Multivariate Methods Multivariate Methods Cluster Analysis http://www.isrec.isb-sib.ch/~darlene/embnet/ Classification Historically, objects are classified into groups periodic table of the elements (chemistry) taxonomy (zoology,

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki Wagner Meira Jr. Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA Department

More information

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Clustering Lecture 3: Hierarchical Methods

Clustering Lecture 3: Hierarchical Methods Clustering Lecture 3: Hierarchical Methods Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced

More information

Unsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis

Unsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis 7 Supervised learning vs unsupervised learning Unsupervised Learning Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute These patterns are then

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No 08 Cluster Analysis Naeem Ahmed Email: naeemmahoto@gmailcom Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Outline

More information

Unsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team

Unsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Unsupervised Learning Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Table of Contents 1)Clustering: Introduction and Basic Concepts 2)An Overview of Popular Clustering Methods 3)Other Unsupervised

More information

INF4820, Algorithms for AI and NLP: Hierarchical Clustering

INF4820, Algorithms for AI and NLP: Hierarchical Clustering INF4820, Algorithms for AI and NLP: Hierarchical Clustering Erik Velldal University of Oslo Sept. 25, 2012 Agenda Topics we covered last week Evaluating classifiers Accuracy, precision, recall and F-score

More information

Hierarchical Clustering / Dendrograms

Hierarchical Clustering / Dendrograms Chapter 445 Hierarchical Clustering / Dendrograms Introduction The agglomerative hierarchical clustering algorithms available in this program module build a cluster hierarchy that is commonly displayed

More information

Hierarchical and Ensemble Clustering

Hierarchical and Ensemble Clustering Hierarchical and Ensemble Clustering Ke Chen Reading: [7.8-7., EA], [25.5, KPM], [Fred & Jain, 25] COMP24 Machine Learning Outline Introduction Cluster Distance Measures Agglomerative Algorithm Example

More information

Clustering Algorithms for general similarity measures

Clustering Algorithms for general similarity measures Types of general clustering methods Clustering Algorithms for general similarity measures general similarity measure: specified by object X object similarity matrix 1 constructive algorithms agglomerative

More information

Cluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010

Cluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010 Cluster Analysis Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX 7575 April 008 April 010 Cluster Analysis, sometimes called data segmentation or customer segmentation,

More information

TELCOM2125: Network Science and Analysis

TELCOM2125: Network Science and Analysis School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning

More information

PAM algorithm. Types of Data in Cluster Analysis. A Categorization of Major Clustering Methods. Partitioning i Methods. Hierarchical Methods

PAM algorithm. Types of Data in Cluster Analysis. A Categorization of Major Clustering Methods. Partitioning i Methods. Hierarchical Methods Whatis Cluster Analysis? Clustering Types of Data in Cluster Analysis Clustering part II A Categorization of Major Clustering Methods Partitioning i Methods Hierarchical Methods Partitioning i i Algorithms:

More information

Segmentation Computer Vision Spring 2018, Lecture 27

Segmentation Computer Vision Spring 2018, Lecture 27 Segmentation http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 218, Lecture 27 Course announcements Homework 7 is due on Sunday 6 th. - Any questions about homework 7? - How many of you have

More information

Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April April 2015

Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April April 2015 Clustering Subhransu Maji CMPSCI 689: Machine Learning 2 April 2015 7 April 2015 So far in the course Supervised learning: learning with a teacher You had training data which was (feature, label) pairs

More information

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2

More information

Knowledge Discovery in Databases. Part III Clustering

Knowledge Discovery in Databases. Part III Clustering Knowledge Discovery in Databases Erich Schubert Winter Semester 7/8 Part III Clustering Clustering : / Overview Basic Concepts and Applications Distance & Similarity Clustering Approaches Overview Hierarchical

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering K-means and Hierarchical Clustering Xiaohui Xie University of California, Irvine K-means and Hierarchical Clustering p.1/18 Clustering Given n data points X = {x 1, x 2,, x n }. Clustering is the partitioning

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

Types of general clustering methods. Clustering Algorithms for general similarity measures. Similarity between clusters

Types of general clustering methods. Clustering Algorithms for general similarity measures. Similarity between clusters Types of general clustering methods Clustering Algorithms for general similarity measures agglomerative versus divisive algorithms agglomerative = bottom-up build up clusters from single objects divisive

More information

Data Mining Algorithms

Data Mining Algorithms for the original version: -JörgSander and Martin Ester - Jiawei Han and Micheline Kamber Data Management and Exploration Prof. Dr. Thomas Seidl Data Mining Algorithms Lecture Course with Tutorials Wintersemester

More information

Olmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.

Olmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM. Center of Atmospheric Sciences, UNAM November 16, 2016 Cluster Analisis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster)

More information