Clustering algorithms 6CCS3WSN-7CCSMWAL


Contents
Introduction: types of clustering
Hierarchical clustering
Spatial clustering (k-means etc.)
Community detection (next week)

What are we trying to cluster and why?
What is the data?
- Vector (terms in documents)
- Graph based (who follows whom on Twitter)
What do we want?
- Group together the similar items
- Separate the items which are clearly different

How are we trying to cluster and why?
We consider unsupervised techniques; various heuristics are possible.
Vector data: possible approaches? Agglomerative or divisive.
- Agglomerative. Hierarchical clustering: group all data into a tree based on distance between data points
- Divisive. Centroid: split the data into a fixed number of regions based on distance to the regional centers
Graph-based data: possible approaches?
- Separate the graph into subgraphs based on communities

Theory and Practice
The discussion proceeds by example. It is best to try the techniques out for yourself in R.

Example: major cities of the UK
Clustering data vectors: how do we present the data in a meaningful way, and how could we cluster these cities? If we choose geographic position (latitude, longitude) as our data, how would you think they divide up?
This is the content of the file UKCITYDATA.txt:
[Table of (North, West) coordinates for London, Bristol, Leeds, Sheffield, Bradford, Manchester, Liverpool, Birmingham, Glasgow, Edinburgh, Cardiff, Belfast, Newcastle.]

We look at two methods: hierarchical clustering and k-means clustering. Both are based on distance between data points, but they analyze and present the data in different ways.

Here is a picture of how things might be clustered.

k-means clustering
[Scatter plot of the 13 cities by (North, West) position, coloured by cluster.]
We made an arbitrary decision to choose 3 clusters.

Hierarchical clustering
[Two cluster dendrograms of the cities, produced by hclust with Ward's method ("ward.D"); the second is cut into 3 groups.]
In the second figure we made an arbitrary decision to choose 3 clusters. How does it compare with the k-means result?

R for this
require(graphics)
cdata = read.csv("ukcitydata.txt", header=TRUE, row.names=1)
cities <- as.matrix(cdata)
# run hierarchical clustering using Ward's method
d = dist(cities)
groups <- hclust(d, method="ward.D")
# plot dendrogram; use hang to ensure that labels fall below the tree
plot(groups, hang=-1)
# cut into 3 subtrees (draw rectangles on the plot)
rect.hclust(groups, 3)
# k-means clustering
colnames(cities) <- c("North", "West")
cl <- kmeans(cities, 3)                            # make 3 clusters
plot(cities, col = cl$cluster, xlim=c(50,58))      # plot clusters
points(cl$centers, col = 1:3, pch = 8, cex = 2)    # insert cluster centers
text(cities, row.names(cities), cex=0.6, pos=4, col="blue")  # label cities

Details (type cl in R to get the k-means details):
K-means clustering with 3 clusters of sizes 4, 3, 6
[Printout of the cluster means (North, West), the clustering vector for London, Bristol, Leeds, Sheffield, Bradford, Manchester, Liverpool, Birmingham, Glasgow, Edinburgh, Cardiff, Belfast, Newcastle, and the within-cluster sum of squares by cluster.]
(between_SS / total_SS = 72.9 %)

Agglomerative Hierarchical Clustering (HAC)
Need a measure of distance between data points. Merge the two nearest clusters until there is a single cluster. The results are presented as a dendrogram showing the hierarchy; prune the dendrogram to give the required number of clusters.
Distance: e.g. Euclidean distance
d(a, b) = √( Σ_{i=1}^{n} (a_i − b_i)² ) = ‖a − b‖    (1)
The notation ‖a − b‖ is standard for Euclidean distance; a, b are vectors a = (a_1, a_2, ..., a_n), b = (b_1, b_2, ..., b_n).
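As a quick check of formula (1), the distance can be computed directly from the definition or with R's built-in dist(); the two vectors here are a made-up example.

```r
# Two small vectors (hypothetical example)
a <- c(1, 2, 3)
b <- c(4, 6, 3)

# Euclidean distance from the definition: sqrt of the sum of squared differences
d_manual <- sqrt(sum((a - b)^2))

# dist() computes the Euclidean distance between the rows of a matrix
d_builtin <- as.numeric(dist(rbind(a, b)))
```

Here d(a, b) = √(9 + 16 + 0) = 5, and both computations agree.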

Agglomerative Hierarchical Clustering: Detail
(1) Assign each data point to its own (single-member) cluster
(2) Repeat steps 3 and 4 until you have a single cluster containing all data points
(3) Find the pair of clusters that are closest to each other; merge them to reduce the number of clusters by one
(4) Compute distances between the new cluster and each of the old clusters

Distance between two clusters
There are many methods. Three common ones are:
- Complete-linkage. For each pair of clusters A, B (or clusters and data points) calculate d(A, B) = max{d(x, y) : x ∈ A, y ∈ B}. Merge the two clusters for which d(A, B) is smallest.
- Single-linkage. For each pair of clusters A, B (or clusters and data points) calculate d(A, B) = min{d(x, y) : x ∈ A, y ∈ B}. Merge the two clusters for which d(A, B) is smallest.
- Ward's method (Ward's minimum variance method). Merge the two clusters which lead to the smallest increase in total within-cluster variance. Intuitively the method tries to put together the two clusters whose means are closest.
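For one-dimensional clusters the complete- and single-linkage distances are just the largest and smallest pairwise gaps. A small made-up example in R:

```r
# Two clusters of one-dimensional points (hypothetical data)
A <- c(1, 2)
B <- c(4, 8)

# all pairwise distances d(x, y) with x in A, y in B
pairwise <- abs(outer(A, B, "-"))

d_complete <- max(pairwise)  # complete-linkage: max over all pairs
d_single   <- min(pairwise)  # single-linkage: min over all pairs
```

The pairwise distances are 3, 7, 2, 6, so d_complete = 7 and d_single = 2.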

30 Merge clusters Figure from Page 351, Chapter 17 of Introduction to IR book

Both Ward's method and complete-linkage gave the same three groups in the dendrogram for the UK cities, but single-linkage gave a different answer. However, the dendrograms for complete-linkage and Ward's method look different; Ward's method is considered to give a nice flat clustering.
[Two cluster dendrograms of the cities: hclust (*, "complete") and hclust (*, "single").]

Cophenetic distance
This is the y-axis of the dendrogram (Height). The cophenetic distance between two observations that have been clustered is defined to be the intergroup dissimilarity at which the two observations are first combined into a single cluster.
[Cluster dendrogram of the cities: hclust (*, "complete").]

Example
We cluster the numbers 1, 2, 4, 8. If we ask for 3 clusters, hopefully they will be {1, 2}, {4}, {8}. We use complete-linkage to merge clusters.
Max distance matrix:
     C1  C2  C4  C8
C1    0
C2    1   0
C4    3   2   0
C8    7   6   4   0
The clusters with the smallest max distance are C1, C2. Merge these into C12.

Max distance matrix. Distance from C12 to C4: max(d(1, 4), d(2, 4)) = d(1, 4) = 3. Similarly d(C12, C8) = max(d(1, 8), d(2, 8)) = 7 and d(C4, C8) = 4.
      C12  C4  C8
C12     0
C4      3   0
C8      7   4   0
The clusters with the smallest max distance are C12, C4. Merge these into C124.
Max distance matrix:
       C124  C8
C124      0
C8        7   0

A plot of the dendrogram
The clusters were merged in the order
C1, C2 → C12;   C12, C4 → C124;   C124, C8 → C1248
at complete-linkage intercluster distances 1, 3, 7. This is recorded on the height axis.
[Cluster dendrogram: hclust (*, "complete").]
Exercise. Repeat for the data 1, 2, 5, 9, 11.
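The worked example can be reproduced with hclust(); the merge heights 1, 3, 7 appear in the fitted object.

```r
# complete-linkage clustering of the points 1, 2, 4, 8
d <- dist(c(1, 2, 4, 8))         # a numeric vector is treated as a one-column matrix
h <- hclust(d, method = "complete")
h$height                         # heights at which clusters were merged: 1 3 7
# plot(h, hang = -1) draws the dendrogram
```

The same code, with c(1, 2, 5, 9, 11) as the data, checks the exercise.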

k-means clustering
Colour quantization: reduce the number of colours used. Figures from Wikipedia.

k-means clustering
The number of clusters k is an input to the algorithm, which then generates k centers and assigns each data point to its nearest center.
The aim is to find some good clusters, but that is not always easy. How do we define what we mean by good?
We want to partition the data points into k sets (the clusters) in such a way that we minimize the squared distance from the points to the centers of their clusters. The center (or centroid µ) of a cluster is the average of the point positions.

k-means clustering
If a cluster contains the m points x_1, ..., x_m then its centroid is
µ = (1/m) Σ_{i=1}^{m} x_i.
Typically the x_i are vectors, in which case µ is calculated componentwise.

This is a wish list. In practice some starting centers are given; if not, we generate some random ones. In either case the answer may not be exactly what we want. Assuming we do not have any starting centers:
(1) Assign the data points (randomly) into k groups
(2) Compute the centroid of each group
(3) For each data point, compute the distance to each centroid and assign the data point to the nearest centroid
(4) If the clusters are unchanged then STOP, else go to step 2
If we use random starting centers, the final answer may vary.
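Steps (2)-(4) can be sketched in a few lines of R. This is a minimal sketch for one-dimensional data, assuming starting centers are supplied and every center keeps at least one point; simple_kmeans is a made-up name, not a library function.

```r
# Toy one-dimensional k-means following the loop above
simple_kmeans <- function(x, centers) {
  repeat {
    # step (3): distance from every point to every centroid; take the nearest
    grp <- apply(abs(outer(x, centers, "-")), 1, which.min)
    # step (2): recompute the centroid (mean) of each group
    # (assumes every group is non-empty)
    new_centers <- as.numeric(tapply(x, grp, mean))
    # step (4): stop when the centroids no longer change
    if (isTRUE(all.equal(new_centers, centers))) break
    centers <- new_centers
  }
  list(cluster = grp, centers = centers)
}

res <- simple_kmeans(c(1, 2, 4, 5, 8, 9), centers = c(3, 5, 9))
res$centers   # 1.5 4.5 8.5
```

Note that which.min breaks distance ties towards the lower-numbered centroid; R's own kmeans() uses more refined update rules (Hartigan-Wong by default).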

Example
Divide 1, 2, 4, 5, 8, 9 into 3 clusters with starting centers 2, 6, 10.
assign to nearest center: (1 2 4) (5) (8 9)    new centroids: 7/3, 5, 17/2
assign to nearest center: (1 2) (4 5) (8 9)    new centroids: 3/2, 9/2, 17/2
assign to nearest center: (1 2) (4 5) (8 9)    new centroids: 3/2, 9/2, 17/2
No change, so STOP.
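The same example can be checked with R's kmeans(). Here nstart runs the algorithm from several random starting centers and keeps the best (lowest-WCSS) result; set.seed is only for reproducibility.

```r
set.seed(1)
cl <- kmeans(c(1, 2, 4, 5, 8, 9), centers = 3, nstart = 20)

sort(cl$centers)   # cluster centers 1.5, 4.5, 8.5
cl$tot.withinss    # total within-cluster sum of squares: 0.5 + 0.5 + 0.5 = 1.5
```

The best 3-clustering is (1, 2), (4, 5), (8, 9), matching the trace above.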

Ex1. What would have happened if we had broken the distance ties for 4 and 8 the other way in the first round?
Ex2. Where do you think the cluster centers should be for the following set of points, for k = 2, 3?
(1, 1), (1.5, 1.5), (2, 2), (2, 3), (3, 2), (3, 3)
Check your answers by using them as the initial centers for the k-means algorithm.

Partitioning Around Medoids (PAM) Algorithm
This is like k-means, but the centers have to be part of the data set. The algorithm tries to find a k-partition of the n data points to minimize the dissimilarity
F = Σ_{i=1}^{n} Σ_{j=1}^{n} d(i, j) z_{i,j},
where z_{i,j} = 1 if i, j are in the same cluster and zero otherwise. The minimization is carried out subject to the constraint that all k clusters are non-empty. Obviously this is harder to do, but it arguably makes more sense than k-means.
Example: divide 1, 2, 4, 5, 8, 9 into 3 clusters around medoids. Ans: (1, 2), (4, 5), (8, 9); either point in each cluster can act as a medoid.
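The small example can be checked with pam() from the cluster package:

```r
require(cluster)                  # pam() is in the recommended 'cluster' package
x <- matrix(c(1, 2, 4, 5, 8, 9))  # data points as a one-column matrix
meds <- pam(x, 3)

meds$clustering                   # grouping: (1, 2), (4, 5), (8, 9)
as.numeric(meds$medoids)          # one representative data point per cluster
```

Unlike the kmeans() centers (1.5, 4.5, 8.5), each reported medoid is an actual data point from its cluster.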

Cities: Partitioning Around Medoids
require(cluster)
meds = pam(cities, 3)
clusplot(meds, labels=2)

Within-cluster sum of squares (WCSS)
The main limitation of the k-means method is that the solution found by the algorithm is often a local rather than a global minimum: the algorithm cannot improve it further, but the answer is not the best possible. It is therefore important to run the algorithm a number of times with different starting centers and choose the result with the minimum WCSS, continuing until there is no significant improvement in WCSS. This is the reason for using random starting centers.
For a given set of clusters S = (S_1, ..., S_k), with centers (µ_1, ..., µ_k), the within-cluster sum of squares is defined as
WCSS = Σ_{i=1}^{k} Σ_{x ∈ S_i} ‖x − µ_i‖².
Here ‖z‖² = Σ_i z_i² is the squared Euclidean length of z = (z_1, ..., z_n).
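WCSS can be computed directly from the definition. For the clusters (1, 2), (4, 5), (8, 9) of the earlier example:

```r
# WCSS from the definition: squared distances of points to their cluster mean
clusters <- list(c(1, 2), c(4, 5), c(8, 9))
wcss <- sum(sapply(clusters, function(s) sum((s - mean(s))^2)))
wcss  # 0.5 + 0.5 + 0.5 = 1.5
```

Any other 3-partition of these points gives a larger WCSS, which is why rerunning k-means and keeping the minimum-WCSS result is a sensible strategy.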

k-means: more detail
> clus = kmeans(c(1, 2, 6), 2)
> clus
K-means clustering with 2 clusters of sizes 2, 1
Cluster means: 1.5 and 6
Clustering vector (for data points 1, 2, 6 respectively): 1 and 2 in one cluster, 6 in the other
Within cluster sum of squares by cluster: 0.5 and 0
(between_SS / total_SS = 96.4 %)
> clus$totss
[1] 14
> clus$betweenss
[1] 13.5

Total sum of squares (TSS)
As before, if cluster S_i contains the m points x_1, ..., x_m, its centroid is µ_i = (1/m) Σ_{j=1}^{m} x_j, and for a given set of clusters S = (S_1, ..., S_k) the within-cluster sum of squares is
WCSS = Σ_{i=1}^{k} Σ_{x ∈ S_i} ‖x − µ_i‖²,
where ‖z‖² is the squared Euclidean length of z, as above.
The mean of all n data points is M = (1/n) Σ_{i=1}^{n} x_i, and the total sum of squares is
TSS = Σ_{i=1}^{n} ‖x_i − M‖².

Example: explanation
Divide 1, 2, 6 into 2 clusters. Ans: (1, 2) and (6).
Means: (1 + 2)/2 = 1.5 and 6
Overall mean: M = (1 + 2 + 6)/3 = 3
WCSS = (1 − 1.5)² + (2 − 1.5)² + (6 − 6)² = 0.5
TSS = (1 − 3)² + (2 − 3)² + (6 − 3)² = 14
BCSS = TSS − WCSS = 13.5
BCSS/TSS = 13.5/14 = 96.4%
This is a good fit: the BCSS (between-cluster sum of squares) explains 96.4% of the data variation, and the WCSS (within-cluster sum of squares) accounts for the remaining 3.6%. The data points are close to their cluster centers.
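The same numbers come out of kmeans() for the 1, 2, 6 example; passing the centers explicitly makes the run deterministic.

```r
# the 1, 2, 6 example, with the two final centers supplied as starting centers
km <- kmeans(c(1, 2, 6), centers = matrix(c(1.5, 6)))

km$totss          # TSS  = 14
km$tot.withinss   # WCSS = 0.5
km$betweenss      # BCSS = 13.5
```

Note that kmeans() reports betweenss as totss minus tot.withinss, exactly as in the BCSS = TSS − WCSS identity above.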


Facebook example: Social Network Clustering Analysis
This analysis uses a dataset representing a random sample of U.S. high school students who had profiles on a well-known social network, with graduation years from 2006 to 2009. From the top 500 words appearing across all pages, 36 words were chosen to represent five categories of interests, namely extracurricular activities, fashion, religion, romance, and antisocial behavior. The 36 words include terms such as football, sexy, kissed, bible, shopping, death, and drugs. The final dataset indicates, for each person, how many times each word appeared in the person's profile. The aim is to cluster the document corpus (FB pages) according to text content.

R program
require(cluster)
# raw.githubusercontent.com/brenden17/sklearnlab/master/facebook/snsdata.csv
teens <- read.csv("snsdata.csv")  # download from above and put in the working directory
apply(teens[5:40], 2, sum)
interests <- teens[5:40]  # drop columns 1-4 of the data: gradyear, gender, age, friends (on FBook)
interests_z <- as.data.frame(lapply(interests, scale))
teen_clusters <- kmeans(interests_z, 5)
teen_clusters$size
# The cluster characterization can be obtained with pie charts:
pie(colSums(interests[teen_clusters$cluster==1,]), cex=0.5)
pie(colSums(interests[teen_clusters$cluster==2,]), cex=0.5)
pie(colSums(interests[teen_clusters$cluster==3,]), cex=0.5)
pie(colSums(interests[teen_clusters$cluster==4,]), cex=0.5)
pie(colSums(interests[teen_clusters$cluster==5,]), cex=0.5)

The output
> apply(teens[5:40], 2, sum)
[Word-count totals for each of the 36 columns: basketball, football, soccer, softball, volleyball, swimming, cheerleading, baseball, tennis, sports, cute, sex, sexy, hot, kissed, dance, band, marching, music, rock, god, church, jesus, bible, hair, dress, blonde, mall, shopping, clothes, hollister, abercrombie, die, death, drunk, drugs; most totals were lost in transcription, e.g. drugs 1813.]
> teen_clusters$size
[The five cluster sizes; see the next slide.]

The five clusters are presented as pie charts; it is impossible to represent the 36 dimensions (basketball, ..., drugs) on a page otherwise. The final answers are not fully reproducible (random start clusters are used). The largest 5 segments, in order, within each group:
Group 5 (5523 points): music, shopping, dance, God, hair
Group 4 (22258 points): music, God, dance, hair, band
Group 3 (1039 points): hair, sex, music, kissed, die
Group 2 (594 points): baseball, football, basketball, music, rock
Group 1 (586 points): sexy, music, hair, dance, cute

[Pie charts of the 36 interest-word frequencies for clusters 5, 4, 3, 2 and 1 respectively.]


Tree Models of Similarity and Association. Clustering and Classification Lecture 5 Tree Models of Similarity and Association Clustering and Lecture 5 Today s Class Tree models. Hierarchical clustering methods. Fun with ultrametrics. 2 Preliminaries Today s lecture is based on the monograph

More information

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)

More information

Clustering (COSC 416) Nazli Goharian. Document Clustering.

Clustering (COSC 416) Nazli Goharian. Document Clustering. Clustering (COSC 416) Nazli Goharian nazli@cs.georgetown.edu 1 Document Clustering. Cluster Hypothesis : By clustering, documents relevant to the same topics tend to be grouped together. C. J. van Rijsbergen,

More information

Clustering. Unsupervised Learning

Clustering. Unsupervised Learning Clustering. Unsupervised Learning Maria-Florina Balcan 04/06/2015 Reading: Chapter 14.3: Hastie, Tibshirani, Friedman. Additional resources: Center Based Clustering: A Foundational Perspective. Awasthi,

More information

A Review on Cluster Based Approach in Data Mining

A Review on Cluster Based Approach in Data Mining A Review on Cluster Based Approach in Data Mining M. Vijaya Maheswari PhD Research Scholar, Department of Computer Science Karpagam University Coimbatore, Tamilnadu,India Dr T. Christopher Assistant professor,

More information

Hierarchical clustering

Hierarchical clustering Hierarchical clustering Rebecca C. Steorts, Duke University STA 325, Chapter 10 ISL 1 / 63 Agenda K-means versus Hierarchical clustering Agglomerative vs divisive clustering Dendogram (tree) Hierarchical

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework

More information

Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC

Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC Clustering Idea Given a set of data can we find a natural grouping? Essential R commands: D =rnorm(12,0,1)

More information

Clustering Algorithms for general similarity measures

Clustering Algorithms for general similarity measures Types of general clustering methods Clustering Algorithms for general similarity measures general similarity measure: specified by object X object similarity matrix 1 constructive algorithms agglomerative

More information

Clustering in Data Mining

Clustering in Data Mining Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,

More information

Clustering: K-means and Kernel K-means

Clustering: K-means and Kernel K-means Clustering: K-means and Kernel K-means Piyush Rai Machine Learning (CS771A) Aug 31, 2016 Machine Learning (CS771A) Clustering: K-means and Kernel K-means 1 Clustering Usually an unsupervised learning problem

More information

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 23 November 2018 Overview Unsupervised Machine Learning overview Association

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning

More information

Unsupervised Learning Hierarchical Methods

Unsupervised Learning Hierarchical Methods Unsupervised Learning Hierarchical Methods Road Map. Basic Concepts 2. BIRCH 3. ROCK The Principle Group data objects into a tree of clusters Hierarchical methods can be Agglomerative: bottom-up approach

More information

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

Computing with large data sets

Computing with large data sets Computing with large data sets Richard Bonneau, spring 2009 Lecture 8(week 5): clustering 1 clustering Clustering: a diverse methods for discovering groupings in unlabeled data Because these methods don

More information

Clustering. Unsupervised Learning

Clustering. Unsupervised Learning Clustering. Unsupervised Learning Maria-Florina Balcan 11/05/2018 Clustering, Informal Goals Goal: Automatically partition unlabeled data into groups of similar datapoints. Question: When and why would

More information

21 The Singular Value Decomposition; Clustering

21 The Singular Value Decomposition; Clustering The Singular Value Decomposition; Clustering 125 21 The Singular Value Decomposition; Clustering The Singular Value Decomposition (SVD) [and its Application to PCA] Problems: Computing X > X takes (nd

More information

Lecture 4 Hierarchical clustering

Lecture 4 Hierarchical clustering CSE : Unsupervised learning Spring 00 Lecture Hierarchical clustering. Multiple levels of granularity So far we ve talked about the k-center, k-means, and k-medoid problems, all of which involve pre-specifying

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #14: Clustering Seoul National University 1 In This Lecture Learn the motivation, applications, and goal of clustering Understand the basic methods of clustering (bottom-up

More information

2. Find the smallest element of the dissimilarity matrix. If this is D lm then fuse groups l and m.

2. Find the smallest element of the dissimilarity matrix. If this is D lm then fuse groups l and m. Cluster Analysis The main aim of cluster analysis is to find a group structure for all the cases in a sample of data such that all those which are in a particular group (cluster) are relatively similar

More information

Hierarchical and Ensemble Clustering

Hierarchical and Ensemble Clustering Hierarchical and Ensemble Clustering Ke Chen Reading: [7.8-7., EA], [25.5, KPM], [Fred & Jain, 25] COMP24 Machine Learning Outline Introduction Cluster Distance Measures Agglomerative Algorithm Example

More information

Cluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole

Cluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole Cluster Analysis Summer School on Geocomputation 27 June 2011 2 July 2011 Vysoké Pole Lecture delivered by: doc. Mgr. Radoslav Harman, PhD. Faculty of Mathematics, Physics and Informatics Comenius University,

More information

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI

More information

Cluster analysis. Agnieszka Nowak - Brzezinska

Cluster analysis. Agnieszka Nowak - Brzezinska Cluster analysis Agnieszka Nowak - Brzezinska Outline of lecture What is cluster analysis? Clustering algorithms Measures of Cluster Validity What is Cluster Analysis? Finding groups of objects such that

More information

Multivariate Analysis

Multivariate Analysis Multivariate Analysis Cluster Analysis Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Unsupervised Learning Cluster Analysis Natural grouping Patterns in the data

More information

UNSUPERVISED LEARNING IN R. Introduction to hierarchical clustering

UNSUPERVISED LEARNING IN R. Introduction to hierarchical clustering UNSUPERVISED LEARNING IN R Introduction to hierarchical clustering Hierarchical clustering Number of clusters is not known ahead of time Two kinds: bottom-up and top-down, this course bottom-up Hierarchical

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 16

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 16 CS434a/541a: Pattern Recognition Prof. Olga Veksler Lecture 16 Today Continue Clustering Last Time Flat Clustring Today Hierarchical Clustering Divisive Agglomerative Applications of Clustering Hierarchical

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Chapter VIII.3: Hierarchical Clustering

Chapter VIII.3: Hierarchical Clustering Chapter VIII.3: Hierarchical Clustering 1. Basic idea 1.1. Dendrograms 1.2. Agglomerative and divisive 2. Cluster distances 2.1. Single link 2.2. Complete link 2.3. Group average and Mean distance 2.4.

More information

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2

More information

Hierarchical clustering

Hierarchical clustering Aprendizagem Automática Hierarchical clustering Ludwig Krippahl Hierarchical clustering Summary Hierarchical Clustering Agglomerative Clustering Divisive Clustering Clustering Features 1 Aprendizagem Automática

More information

Hierarchical clustering

Hierarchical clustering Hierarchical clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Description Produces a set of nested clusters organized as a hierarchical tree. Can be visualized

More information

Clustering Algorithms. Margareta Ackerman

Clustering Algorithms. Margareta Ackerman Clustering Algorithms Margareta Ackerman A sea of algorithms As we discussed last class, there are MANY clustering algorithms, and new ones are proposed all the time. They are very different from each

More information

University of Florida CISE department Gator Engineering. Clustering Part 2

University of Florida CISE department Gator Engineering. Clustering Part 2 Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical

More information

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Clustering Algorithms Contents K-means Hierarchical algorithms Linkage functions Vector quantization SOM Clustering Formulation

More information

Types of general clustering methods. Clustering Algorithms for general similarity measures. Similarity between clusters

Types of general clustering methods. Clustering Algorithms for general similarity measures. Similarity between clusters Types of general clustering methods Clustering Algorithms for general similarity measures agglomerative versus divisive algorithms agglomerative = bottom-up build up clusters from single objects divisive

More information

Clustering Lecture 3: Hierarchical Methods

Clustering Lecture 3: Hierarchical Methods Clustering Lecture 3: Hierarchical Methods Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced

More information

http://www.xkcd.com/233/ Text Clustering David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Administrative 2 nd status reports Paper review

More information

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:

More information

Hierarchical Clustering 4/5/17

Hierarchical Clustering 4/5/17 Hierarchical Clustering 4/5/17 Hypothesis Space Continuous inputs Output is a binary tree with data points as leaves. Useful for explaining the training data. Not useful for making new predictions. Direction

More information

Administrative. Machine learning code. Supervised learning (e.g. classification) Machine learning: Unsupervised learning" BANANAS APPLES

Administrative. Machine learning code. Supervised learning (e.g. classification) Machine learning: Unsupervised learning BANANAS APPLES Administrative Machine learning: Unsupervised learning" Assignment 5 out soon David Kauchak cs311 Spring 2013 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Machine

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No 08 Cluster Analysis Naeem Ahmed Email: naeemmahoto@gmailcom Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Outline

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised

More information

STATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

STATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010 STATS306B Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Outline K-means, K-medoids, EM algorithm choosing number of clusters: Gap test hierarchical clustering spectral

More information

What is Clustering? Clustering. Characterizing Cluster Methods. Clusters. Cluster Validity. Basic Clustering Methodology

What is Clustering? Clustering. Characterizing Cluster Methods. Clusters. Cluster Validity. Basic Clustering Methodology Clustering Unsupervised learning Generating classes Distance/similarity measures Agglomerative methods Divisive methods Data Clustering 1 What is Clustering? Form o unsupervised learning - no inormation

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

Data Mining Algorithms

Data Mining Algorithms for the original version: -JörgSander and Martin Ester - Jiawei Han and Micheline Kamber Data Management and Exploration Prof. Dr. Thomas Seidl Data Mining Algorithms Lecture Course with Tutorials Wintersemester

More information

INF4820, Algorithms for AI and NLP: Hierarchical Clustering

INF4820, Algorithms for AI and NLP: Hierarchical Clustering INF4820, Algorithms for AI and NLP: Hierarchical Clustering Erik Velldal University of Oslo Sept. 25, 2012 Agenda Topics we covered last week Evaluating classifiers Accuracy, precision, recall and F-score

More information

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction CSE 255 Lecture 5 Data Mining and Predictive Analytics Dimensionality Reduction Course outline Week 4: I ll cover homework 1, and get started on Recommender Systems Week 5: I ll cover homework 2 (at the

More information

MATH5745 Multivariate Methods Lecture 13

MATH5745 Multivariate Methods Lecture 13 MATH5745 Multivariate Methods Lecture 13 April 24, 2018 MATH5745 Multivariate Methods Lecture 13 April 24, 2018 1 / 33 Cluster analysis. Example: Fisher iris data Fisher (1936) 1 iris data consists of

More information

Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler Overview What is clustering and its applications? Distance between two clusters. Hierarchical Agglomerative clustering.

More information

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

More information

Clust Clus e t ring 2 Nov

Clust Clus e t ring 2 Nov Clustering 2 Nov 3 2008 HAC Algorithm Start t with all objects in their own cluster. Until there is only one cluster: Among the current clusters, determine the two clusters, c i and c j, that are most

More information

Lecture Notes for Chapter 7. Introduction to Data Mining, 2 nd Edition. by Tan, Steinbach, Karpatne, Kumar

Lecture Notes for Chapter 7. Introduction to Data Mining, 2 nd Edition. by Tan, Steinbach, Karpatne, Kumar Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Hierarchical Clustering Produces a set

More information