Hierarchical clustering


Based in part on slides from the textbook and slides of Susan Holmes. December 2, 2012.

Description. Hierarchical clustering produces a set of nested clusters organized as a hierarchical tree. It can be visualized as a dendrogram: a tree-like diagram that records the sequence of merges or splits.

[Figure: a clustering and its dendrogram.]

Strengths. We do not have to assume any particular number of clusters: each horizontal cut of the tree yields a clustering. The tree may correspond to a meaningful taxonomy (e.g., the animal kingdom, phylogeny reconstruction, ...). Only a similarity or distance matrix is needed for implementation.
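To see how one tree yields many clusterings, here is a minimal sketch using SciPy's hierarchical-clustering routines (the random data and cut heights are purely illustrative, not from the lecture):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))          # illustrative data

Z = linkage(X, method="single")       # build the full merge tree once

# Each horizontal cut height t gives a different flat clustering.
for t in (0.5, 1.0, 2.0):
    labels = fcluster(Z, t=t, criterion="distance")
    print(f"cut at {t}: {labels.max()} clusters")
```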

Agglomerative. Start with the points as individual clusters. At each step, merge the closest pair of clusters until only one cluster (or some fixed number k of clusters) remains.

Divisive. Start with one all-inclusive cluster. At each step, split a cluster until each cluster contains a single point (or there are k clusters).

Agglomerative clustering algorithm:
1. Compute the proximity matrix.
2. Let each data point be its own cluster.
3. While there is more than one cluster:
   (a) Merge the two closest clusters.
   (b) Update the proximity matrix.
The major difference between the various methods is how the proximity of two clusters is computed.
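A minimal sketch of this loop in plain Python, assuming single linkage and a precomputed distance matrix (the function name and structure are mine, not the lecture's):

```python
import numpy as np

def agglomerate(D, k=1):
    """Naive single-linkage agglomeration on a precomputed distance matrix D.

    Returns the clusters (as lists of point indices) once only k remain.
    """
    n = len(D)
    clusters = [[i] for i in range(n)]           # step 2: every point is a cluster
    while len(clusters) > k:                     # step 3
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # (a) merge the two closest clusters
        del clusters[b]                          # (b) the proximity "update" is implicit:
                                                 #     we recompute from D each pass
    return clusters
```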

[Figure: starting point for agglomerative clustering.]

Intermediate situation. After some merging steps we have some clusters, C1 through C5, together with their proximity matrix. [Figure: intermediate point, with 5 clusters and the corresponding proximity matrix.]

We will merge C2 and C5.

After merging. The question is: how do we update the proximity matrix? The row and column for the merged cluster C2 U C5 must be filled in, and the answer depends on how we define the proximity of two clusters. [Figure: proximity matrix after merging C2 and C5, with the new entries marked "?".]

How to define inter-cluster similarity? We need a notion of similarity between clusters. The common choices:
MIN: single linkage uses the minimum distance between points in the two clusters.
MAX: complete linkage uses the maximum distance between points in the two clusters.
Group average: uses the average distance between all pairs of points across the two groups.
Centroid: uses the distance between the centroids of the clusters (presumes one can compute centroids...).
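These four notions are easy to state as code. A sketch with a hypothetical helper (SciPy's cdist computes all pairwise distances between two point sets):

```python
import numpy as np
from scipy.spatial.distance import cdist

def linkage_distances(A, B):
    """Inter-cluster distances between point sets A and B (rows are points)."""
    D = cdist(A, B)                      # all pairwise distances d(x, y)
    return {
        "min (single)":   D.min(),
        "max (complete)": D.max(),
        "group average":  D.mean(),
        "centroid":       np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)),
    }
```

For two singleton clusters, e.g. A = [[0, 0]] and B = [[3, 4]], all four criteria agree and return 5.0.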

Hierarchical clustering: MIN. The similarity of two clusters is based on the two most similar (closest) points in the different clusters, so it is determined by one pair of points, i.e., by one link in the proximity graph. [Figure: proximity matrix and dendrogram of single linkage.]

Distance matrix for nested clusterings (pairwise distances between points 1-6; upper triangle shown):

       1     2     3     4     5     6
  1    0  0.24  0.22  0.37  0.34  0.23
  2          0  0.15  0.20  0.14  0.25
  3                0  0.15  0.28  0.11
  4                      0  0.29  0.22
  5                            0  0.39
  6                                  0
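As a quick check (a sketch using SciPy's condensed-distance convention), single linkage on this matrix merges points 3 and 6 first, since d(3, 6) = 0.11 is the smallest entry:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Upper-triangular entries of the matrix above, row by row
# (SciPy's "condensed" distance format).
d = np.array([0.24, 0.22, 0.37, 0.34, 0.23,   # d(1, 2..6)
              0.15, 0.20, 0.14, 0.25,          # d(2, 3..6)
              0.15, 0.28, 0.11,                # d(3, 4..6)
              0.29, 0.22,                      # d(4, 5..6)
              0.39])                           # d(5, 6)

Z = linkage(d, method="single")
print(Z[0])  # first merge: points 3 and 6 (0-indexed as 2 and 5) at height 0.11
```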

[Figure: nested cluster representation and dendrogram of single linkage (MIN).]

Single linkage can handle irregularly shaped regions fairly naturally, but it is sensitive to noise and outliers in the form of chaining: a few noisy points can bridge two otherwise well-separated clusters.

[Figures: single-linkage clustering of the Iris data.]

Hierarchical clustering: MAX. The similarity of two clusters is based on the two least similar (most distant) points in the different clusters, and it is determined by all pairs of points in the two clusters. [Figure: proximity matrix and dendrogram of complete linkage.]

[Figure: nested cluster representation and dendrogram of complete linkage (MAX).]

Complete linkage is less sensitive to noise and outliers than single linkage. Regions are generally compact but may violate closeness: points may be much closer to some points in a neighbouring cluster than to points in their own cluster. This manifests itself as the breaking of large clusters. Clusters are biased to be globular.

[Figures: complete-linkage clustering of the Iris data.]

[Figure: nested cluster representation and dendrogram of group average linkage.]

Average linkage. Given two elements of the partition, $C_r$ and $C_s$, we might consider
$$d_{GA}(C_r, C_s) = \frac{1}{|C_r|\,|C_s|} \sum_{x \in C_r,\; y \in C_s} d(x, y).$$
This is a compromise between single and complete linkage: it shares the bias toward globular clusters of complete linkage, while being less sensitive to noise than single linkage.
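A quick numerical check that $d_{GA}$ is simply the mean of all cross-cluster pairwise distances (the toy points are my own):

```python
import numpy as np
from scipy.spatial.distance import cdist

C_r = np.array([[0.0, 0.0], [1.0, 0.0]])
C_s = np.array([[0.0, 3.0], [4.0, 0.0]])

D = cdist(C_r, C_s)                       # all |C_r| * |C_s| distances d(x, y)
d_ga = D.sum() / (len(C_r) * len(C_s))    # the formula above
assert np.isclose(d_ga, D.mean())
```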

[Figures: average-linkage clustering of the Iris data.]

Ward's linkage. The similarity of two clusters is based on the increase in squared error when the two clusters are merged. It is similar to average linkage if the dissimilarity between points is squared distance; hence it shares many properties of average linkage. It is a hierarchical analogue of K-means and is sometimes used to initialize K-means.
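One way Ward's linkage can seed K-means, sketched with SciPy and scikit-learn (the choice of k = 3 on the Iris data is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
k = 3

# Cut the Ward tree into k clusters and take their means as initial centers.
labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
centers = np.vstack([X[labels == c].mean(axis=0) for c in range(1, k + 1)])

km = KMeans(n_clusters=k, init=centers, n_init=1).fit(X)
```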

[Figures: Ward's-linkage clustering of the Iris data.]

[Figures: clusterings of the NCI data under complete, single, average, and Ward's linkage.]

Computational issues. Hierarchical clustering requires $O(n^2)$ space, since it stores the proximity matrix, and $O(n^3)$ time in many cases, as there are $n$ steps and at each step a matrix of size $n^2$ must be updated and/or searched.
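To make the space bound concrete (a back-of-the-envelope figure of mine, assuming 8-byte floats): for $n = 10^5$ points the proximity matrix has $n^2 = 10^{10}$ entries, roughly 80 GB, which is why naive hierarchical clustering is rarely applied directly to very large data sets.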

Statistical issues. Once a decision is made to combine two clusters, it cannot be undone, and no objective function is directly minimized. Different schemes have problems with one or more of the following: sensitivity to noise and outliers; difficulty handling different sized clusters and convex shapes; breaking of large clusters.
