
Hierarchical clustering Rebecca C. Steorts, Duke University STA 325, Chapter 10 ISL 1 / 63

Agenda: K-means versus hierarchical clustering; agglomerative vs. divisive clustering; dendrograms (trees); the hierarchical clustering algorithm; single, complete, and average linkage; application to genomic data (PCA versus hierarchical clustering). 2 / 63

From K-means to hierarchical clustering. Recall two properties of K-means clustering: (1) it fits exactly K clusters, as specified; (2) the final clustering assignment depends on the chosen initial cluster centers. Now assume only pairwise dissimilarities d_ij between data points. Hierarchical clustering produces a consistent result, without the need to choose initial starting positions or the number of clusters in advance. The catch: we must choose a way to measure the dissimilarity between groups, called the linkage. Given the linkage, hierarchical clustering produces a sequence of clustering assignments: at one end, every point is in its own cluster; at the other end, all points are in one cluster. 3 / 63

Agglomerative vs. divisive clustering. Agglomerative (i.e., bottom-up): start with all points in their own group; until there is only one cluster, repeatedly merge the two groups that have the smallest dissimilarity. Divisive (i.e., top-down): start with all points in one cluster; until all points are in their own cluster, repeatedly split a group into the two parts with the biggest dissimilarity. Agglomerative strategies are simpler, so we'll focus on them. Divisive methods are still important, but you can read about them on your own if you want to learn more (the sketch below gives one starting point in R). 4 / 63
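As a pointer for the curious, divisive clustering is available in R via, for example, diana() from the cluster package. A minimal sketch on made-up data (the toy data and the choice of diana() are illustrative assumptions, not part of these slides):

library(cluster)                    # provides diana() (DIvisive ANAlysis clustering)
set.seed(1)
toy <- matrix(rnorm(20), ncol = 2)  # ten made-up points in the plane
dv  <- diana(toy)                   # top-down: start with one cluster, split repeatedly
plot(dv, which.plots = 2)           # dendrogram of the divisive splits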

Simple example. Given these data points, an agglomerative algorithm might decide on a clustering sequence as follows:
[Scatterplot of seven labeled points, Dimension 1 versus Dimension 2.]
Step 1: {1}, {2}, {3}, {4}, {5}, {6}, {7};
Step 2: {1}, {2, 3}, {4}, {5}, {6}, {7};
Step 3: {1, 7}, {2, 3}, {4}, {5}, {6};
Step 4: {1, 7}, {2, 3}, {4, 5}, {6};
Step 5: {1, 7}, {2, 3, 6}, {4, 5};
Step 6: {1, 7}, {2, 3, 4, 5, 6};
Step 7: {1, 2, 3, 4, 5, 6, 7}. 5 / 63

Algorithm
1. Place each data point into its own singleton group.
2. Repeat: iteratively merge the two closest groups.
3. Until: all the data are merged into a single cluster. 6 / 63
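A minimal R sketch of this algorithm on made-up two-dimensional data (the data and parameter choices here are illustrative assumptions; the slides apply the same calls to the NCI60 data later):

set.seed(1)
x  <- matrix(rnorm(14), ncol = 2)      # seven made-up points in the plane
d  <- dist(x)                          # pairwise Euclidean dissimilarities d_ij
hc <- hclust(d, method = "complete")   # repeatedly merge the two closest groups
plot(hc)                               # dendrogram recording the merge sequence
cutree(hc, k = 3)                      # stop the merging at, say, three clusters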

Example; Iterations 2-24. [Slides 7-30 / 63 show one figure per iteration of the agglomerative merging on the example data; these slides contain no text other than the iteration number.]

Clustering. Suppose you are using the above algorithm to cluster the data points into groups. How do you know when to stop? How should we compare the data points? Let's investigate this further! 31 / 63

Agglomerative clustering. Each level of the resulting tree is a segmentation of the data, so the algorithm results in a sequence of groupings. It is up to the user to choose a natural clustering from this sequence. 32 / 63

Dendrogram. We can also represent the sequence of clustering assignments as a dendrogram:
[Left: scatterplot of the seven points, Dimension 1 versus Dimension 2. Right: the corresponding dendrogram, with leaves 1, 7, 4, 5, 6, 2, 3 and Height on the vertical axis.]
Note that cutting the dendrogram horizontally partitions the data points into clusters. 33 / 63

Dendrogram. Agglomerative clustering is monotonic: the similarity between merged clusters is monotone decreasing with the level of the merge. Dendrogram: plot each merge at the (negative) similarity between the two merged groups. This provides an interpretable visualization of the algorithm and data, and is a useful summarization tool; it is part of why hierarchical clustering is popular. 34 / 63

Group similarity. Given a distance or similarity measure between points (say, Euclidean distance), the user has many choices for how to define intergroup similarity.
1. Single linkage: the similarity of the closest pair, d_SL(G, H) = min_{i ∈ G, j ∈ H} d_ij.
2. Complete linkage: the similarity of the furthest pair, d_CL(G, H) = max_{i ∈ G, j ∈ H} d_ij.
3. Group average: the average similarity between groups, d_GA(G, H) = (1 / (N_G N_H)) Σ_{i ∈ G, j ∈ H} d_ij. 35 / 63
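In R's hclust(), these three choices correspond to the method argument. A minimal sketch, reusing the toy dissimilarities d from the earlier sketch (an illustrative assumption, not part of the slides):

hc_single   <- hclust(d, method = "single")    # d_SL: closest pair between groups
hc_complete <- hclust(d, method = "complete")  # d_CL: furthest pair between groups
hc_average  <- hclust(d, method = "average")   # d_GA: average over all cross-group pairs
par(mfrow = c(1, 3))
plot(hc_single); plot(hc_complete); plot(hc_average)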

Single linkage. In single linkage (i.e., nearest-neighbor linkage), the dissimilarity between G and H is the smallest dissimilarity between two points in opposite groups:
d_single(G, H) = min_{i ∈ G, j ∈ H} d_ij.
Example (dissimilarities d_ij are distances, groups are marked by colors): the single linkage score d_single(G, H) is the distance of the closest pair.
[Scatterplot of two colored groups with a segment joining the closest pair.] 36 / 63

Single linkage example. Here n = 60, X_i ∈ R^2, and d_ij = ||X_i − X_j||_2. Cutting the tree at h = 0.9 gives the clustering assignments marked by colors.
[Left: scatterplot of the 60 points colored by cluster. Right: single-linkage dendrogram, Height from 0.0 to 1.0, with the cut at h = 0.9.]
Cut interpretation: for each point X_i, there is another point X_j in its cluster with d_ij ≤ 0.9. 37 / 63
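A sketch of how such a cut might be computed in R, on made-up data standing in for the 60 points shown here (the data and the 0.9 threshold simply mirror the slide; this is not the original code):

set.seed(2)
X        <- matrix(rnorm(120), ncol = 2)         # 60 made-up points in R^2
hc_sl    <- hclust(dist(X), method = "single")   # single-linkage tree
clusters <- cutree(hc_sl, h = 0.9)               # cut the dendrogram at height 0.9
plot(X, col = clusters, pch = 19)                # points colored by resulting cluster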

Complete linkage. In complete linkage (i.e., furthest-neighbor linkage), the dissimilarity between G and H is the largest dissimilarity between two points in opposite groups:
d_complete(G, H) = max_{i ∈ G, j ∈ H} d_ij.
Example (dissimilarities d_ij are distances, groups are marked by colors): the complete linkage score d_complete(G, H) is the distance of the furthest pair.
[Scatterplot of two colored groups with a segment joining the furthest pair.] 38 / 63

Complete linkage example. Same data as before. Cutting the tree at h = 5 gives the clustering assignments marked by colors.
[Left: scatterplot of the points colored by cluster. Right: complete-linkage dendrogram, Height from 0 to 6, with the cut at h = 5.]
Cut interpretation: for each point X_i, every other point X_j in its cluster satisfies d_ij ≤ 5. 39 / 63

Average linkage. In average linkage, the dissimilarity between G and H is the average dissimilarity over all points in opposite groups:
d_average(G, H) = (1 / (n_G n_H)) Σ_{i ∈ G, j ∈ H} d_ij.
Example (dissimilarities d_ij are distances, groups are marked by colors): the average linkage score d_average(G, H) is the average distance across all pairs. (The plot here only shows distances between the blue points and one red point.)
[Scatterplot of two colored groups with segments from one red point to each blue point.] 40 / 63

Average linkage example. Same data as before. Cutting the tree at h = 2.5 gives the clustering assignments marked by colors.
[Left: scatterplot of the points colored by cluster. Right: average-linkage dendrogram, Height from 0.0 to 3.0, with the cut at h = 2.5.]
Cut interpretation: there really isn't a good one! 41 / 63

Properties of intergroup similarity. Single linkage can produce chaining, where a sequence of close observations in different groups causes early merges of those groups. Complete linkage has the opposite problem: it might not merge close groups because of outlier members that are far apart. Group average represents a natural compromise, but it depends on the scale of the similarities: applying a monotone transformation to the similarities can change the results (see the sketch below). 42 / 63
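A quick way to see the scale-sensitivity of average linkage in R, again using the made-up dissimilarities d from the earlier sketches (the comparison via the $merge component is an illustrative check, not from the slides):

d2 <- d^2   # a monotone (order-preserving) transformation of the dissimilarities
# Single and complete linkage depend only on the ordering of the d_ij,
# so their merge sequences are unchanged:
identical(hclust(d, method = "single")$merge,   hclust(d2, method = "single")$merge)    # TRUE (no ties)
identical(hclust(d, method = "complete")$merge, hclust(d2, method = "complete")$merge)  # TRUE (no ties)
# Average linkage averages the d_ij themselves, so its merge sequence
# is not guaranteed to survive the transformation:
identical(hclust(d, method = "average")$merge,  hclust(d2, method = "average")$merge)   # not necessarily TRUE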

Things to consider Hierarchical clustering should be treated with caution. Different decisions about group similarities can lead to vastly different dendrograms. The algorithm imposes a hierarchical structure on the data, even data for which such structure is not appropriate. 43 / 63

Application on genomic data Unsupervised methods are often used in the analysis of genomic data. PCA and hierarchical clustering are very common tools. We will explore both on a genomic data set. We illustrate these methods on the NCI60 cancer cell line microarray data, which consists of 6,830 gene expression measurements on 64 cancer cell lines. 44 / 63

Application on genomic data
library(ISLR)
nci.labs <- NCI60$labs
nci.data <- NCI60$data
Each cell line is labeled with a cancer type. We do not make use of the cancer types in performing PCA and clustering, as these are unsupervised techniques. After performing PCA and clustering, we will check the extent to which these cancer types agree with the results of these unsupervised techniques. 45 / 63

Exploring the data
dim(nci.data)
## [1]   64 6830
# cancer types for the cell lines
nci.labs[1:4]
## [1] "CNS"   "CNS"   "CNS"   "RENAL"
table(nci.labs)
## nci.labs
##      BREAST         CNS       COLON K562A-repro K562B-repro    LEUKEMIA
##           7           5           7           1           1           6
## MCF7A-repro MCF7D-repro    MELANOMA       NSCLC     OVARIAN    PROSTATE
##           1           1           8           9           6           2
##       RENAL     UNKNOWN
##           9           1
46 / 63

PCA
pr.out <- prcomp(nci.data, scale = TRUE)
47 / 63

PCA We now plot the first few principal component score vectors, in order to visualize the data. First, we create a simple function that assigns a distinct color to each element of a numeric vector. The function will be used to assign a color to each of the 64 cell lines, based on the cancer type to which it corresponds. 48 / 63

Simple color function
# Input: a vector
# Output: a vector of colors of the same length, with a distinct
# color assigned to each unique value of the input
Cols <- function(vec) {
  cols <- rainbow(length(unique(vec)))
  return(cols[as.numeric(as.factor(vec))])
}
49 / 63
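A sketch of how this helper might be used to draw the score-vector plots on the next slide, coloring the 64 cell lines by cancer type (assuming pr.out and nci.labs from the previous slides; the exact plotting arguments are an assumption):

par(mfrow = c(1, 2))
plot(pr.out$x[, 1:2],     col = Cols(nci.labs), pch = 19, xlab = "Z1", ylab = "Z2")
plot(pr.out$x[, c(1, 3)], col = Cols(nci.labs), pch = 19, xlab = "Z1", ylab = "Z3")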

PCA
[Scatterplots of the first few principal component score vectors: Z2 versus Z1 and Z3 versus Z1, with points colored by cancer type.]
On the whole, cell lines corresponding to a single cancer type do tend to have similar values on the first few PC score vectors. This indicates that cell lines from the same cancer type tend to have fairly similar gene expression levels. 50 / 63

Proportion of Variance Explained
summary(pr.out)
## Importance of components:
##                             PC1      PC2      PC3      PC4      PC5
## Standard deviation      27.8535 21.48136 19.82046 17.03256 15.97181
## Proportion of Variance   0.1136  0.06756  0.05752  0.04248  0.03735
## Cumulative Proportion    0.1136  0.18115  0.23867  0.28115  0.31850
##                             PC6      PC7      PC8      PC9     PC10
## Standard deviation     15.72108 14.47145 13.54427 13.14400 12.73860
## Proportion of Variance  0.03619  0.03066  0.02686  0.02529  0.02376
## Cumulative Proportion   0.35468  0.38534  0.41220  0.43750  0.46126
##                            PC11     PC12     PC13     PC14     PC15
## Standard deviation     12.68672 12.15769 11.83019 11.62554 11.43779
## Proportion of Variance  0.02357  0.02164  0.02049  0.01979  0.01915
## Cumulative Proportion   0.48482  0.50646  0.52695  0.54674  0.56590
##                            PC16     PC17     PC18     PC19    PC20
## Standard deviation     11.00051 10.65666 10.48880 10.43518 10.3219
## Proportion of Variance  0.01772  0.01663  0.01611  0.01594  0.0156
## Cumulative Proportion   0.58361  0.60024  0.61635  0.63229  0.6479
##                            PC21    PC22     PC23     PC24     PC25     PC26
## Standard deviation     10.14608 10.0544  9.90265  9.64766  9.50764  9.33253
## Proportion of Variance  0.01507  0.0148  0.01436  0.01363  0.01324  0.01275
## Cumulative Proportion   0.66296  0.6778  0.69212  0.70575  0.71899  0.73174
##                           PC27   PC28     PC29     PC30     PC31     PC32
## Standard deviation     9.27320 9.0900  8.98117  8.75003  8.59962  8.44738
## Proportion of Variance 0.01259 0.0121  0.01181  0.01121  0.01083  0.01045
## Cumulative Proportion  0.74433 0.7564  0.76824  0.77945  0.79027  0.80072
##                           PC33    PC34    PC35    PC36    PC37    PC38
## Standard deviation     8.37305 8.21579 8.15731 7.97465 7.90446 7.82127
## Proportion of Variance 0.01026 0.00988 0.00974 0.00931 0.00915 0.00896
## Cumulative Proportion  0.81099 0.82087 0.83061 0.83992 0.84907 0.85803
##                           PC39    PC40    PC41   PC42    PC43   PC44
## Standard deviation     7.72156 7.58603 7.45619 7.3444 7.10449 7.0131
## Proportion of Variance 0.00873 0.00843 0.00814 0.0079 0.00739 0.0072
[remaining columns of the output are cut off on the slide]
51 / 63

Proportion of Variance Explained
plot(pr.out)
[Barplot titled pr.out: variances of the leading principal components (vertical axis Variances, roughly 0 to 600).]
52 / 63

PCA
[Two plots: PVE and Cumulative PVE against Principal Component number.]
The proportion of variance explained (PVE) of the principal components of the NCI60 cancer cell line microarray data set. Left: the PVE of each principal component is shown. Right: the cumulative PVE of the principal components is shown. Together, all principal components explain 100% of the variance. 53 / 63
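A sketch of how these two panels might be produced in R (assuming pr.out from the earlier slide; the plotting details are an assumption, not the original code):

pve <- 100 * pr.out$sdev^2 / sum(pr.out$sdev^2)   # percent variance explained per PC
par(mfrow = c(1, 2))
plot(pve,         type = "o", xlab = "Principal Component", ylab = "PVE")
plot(cumsum(pve), type = "o", xlab = "Principal Component", ylab = "Cumulative PVE")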

Conclusions from the Scree Plot We see that together, the first seven principal components explain around 40% of the variance in the data. This is not a huge amount of the variance. Looking at the scree plot, we see that while each of the first seven principal components explains a substantial amount of variance, there is a marked decrease in the variance explained by further principal components. That is, there is an elbow in the plot after approximately the seventh principal component. This suggests that there may be little benefit to examining more than seven or so principal components (though even examining seven principal components may be difficult). 54 / 63

Hierarchical Clustering to NCI60 Data
We now proceed to hierarchically cluster the cell lines in the NCI60 data, with the goal of finding out whether or not the observations cluster into distinct types of cancer. To begin, we standardize the variables to have mean zero and standard deviation one. As mentioned earlier, this step is optional and should be performed only if we want each gene to be on the same scale.
sd.data <- scale(nci.data)
55 / 63

Hierarchical Clustering to NCI60 Data
Complete Linkage
[Dendrogram of the 64 cell lines under complete linkage (Height roughly 40 to 160), with leaves labeled by cancer type.]
56 / 63

Hierarchical Clustering to NCI60 Data
Average Linkage
[Dendrogram of the 64 cell lines under average linkage (Height roughly 40 to 120), with leaves labeled by cancer type.]
57 / 63

Hierarchical Clustering to NCI60 Data
Single Linkage
[Dendrogram of the 64 cell lines under single linkage (Height roughly 40 to 110), with leaves labeled by cancer type.]
We see that the choice of linkage certainly does affect the results obtained. 58 / 63
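A sketch of code that would produce these three dendrograms, in the style of the ISLR NCI60 lab (assuming sd.data and nci.labs from the earlier slides; the exact plotting arguments are an assumption):

data.dist <- dist(sd.data)   # Euclidean distances between the standardized cell lines
par(mfrow = c(1, 3))
plot(hclust(data.dist),                     labels = nci.labs, main = "Complete Linkage", xlab = "", sub = "")
plot(hclust(data.dist, method = "average"), labels = nci.labs, main = "Average Linkage",  xlab = "", sub = "")
plot(hclust(data.dist, method = "single"),  labels = nci.labs, main = "Single Linkage",   xlab = "", sub = "")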

Hierarchical Clustering to NCI60 Data Typically, single linkage will tend to yield trailing clusters: very large clusters onto which individual observations attach one-by-one. On the other hand, complete and average linkage tend to yield more balanced, attractive clusters. For this reason, complete and average linkage are generally preferred to single linkage. 59 / 63

Complete linkage
We will use complete linkage hierarchical clustering for the analysis that follows. We can cut the dendrogram at the height that will yield a particular number of clusters, say four.
hc.out <- hclust(dist(sd.data))
hc.clusters <- cutree(hc.out, 4)
table(hc.clusters, nci.labs)
##            nci.labs
## hc.clusters BREAST CNS COLON K562A-repro K562B-repro LEUKEMIA MCF7A-repro
##           1      2   3     2           0           0        0           0
##           2      3   2     0           0           0        0           0
##           3      0   0     0           1           1        6           0
##           4      2   0     5           0           0        0           1
##            nci.labs
## hc.clusters MCF7D-repro MELANOMA NSCLC OVARIAN PROSTATE RENAL UNKNOWN
##           1           0        8     8       6        2     8       1
##           2           0        0     1       0        0     1       0
##           3           0        0     0       0        0     0       0
##           4           1        0     0       0        0     0       0
60 / 63

Complete linkage
All the leukemia cell lines fall in cluster 3, while the breast cancer cell lines are spread out over three different clusters. We can plot, on the dendrogram, the cut that produces these four clusters; this is the height that results in four clusters. (It is easy to verify that the resulting clusters are the same as the ones we obtained using cutree(hc.out, 4).)
[Cluster dendrogram of the 64 cell lines (complete linkage, Height roughly 40 to 160), leaves labeled by cancer type, with a horizontal line at the cut height.]
61 / 63
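A sketch of how this cut might be drawn in R (assuming hc.out and nci.labs from the previous slide; the specific height of 139 is an assumption taken from the ISLR lab, chosen because it yields four clusters on this data):

plot(hc.out, labels = nci.labs)   # complete-linkage dendrogram of the cell lines
abline(h = 139, col = "red")      # horizontal cut producing four clusters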

K-means versus complete linkage?
How do these NCI60 hierarchical clustering results compare to what we get if we perform K-means clustering with K = 4?
set.seed(2)
km.out <- kmeans(sd.data, 4, nstart = 20)
km.clusters <- km.out$cluster
table(km.clusters, hc.clusters)
##            hc.clusters
## km.clusters  1  2  3  4
##           1 11  0  0  9
##           2  0  0  8  0
##           3  9  0  0  0
##           4 20  7  0  0
62 / 63

K-means versus complete linkage? We see that the four clusters obtained using hierarchical clustering and K-means clustering are somewhat different. Cluster 2 in K-means clustering is identical to cluster 3 in hierarchical clustering. However, the other clusters differ: for instance, cluster 4 in K-means clustering contains a portion of the observations assigned to cluster 1 by hierarchical clustering, as well as all of the observations assigned to cluster 2 by hierarchical clustering. 63 / 63