CS7267 MACHINE LEARNING

HIERARCHICAL CLUSTERING
Ref: Chengkai Li, Department of Computer Science and Engineering, University of Texas at Arlington (slides courtesy of Vipin Kumar)
Mingon Kang, Ph.D., Computer Science, Kennesaw State University

What is Cluster Analysis?
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.

What is not Cluster Analysis?
- Supervised classification: has class label information
- Simple segmentation: e.g., dividing students into registration groups alphabetically, by last name
- Results of a query: the groupings are the result of an external specification

Notion of a Cluster Can Be Ambiguous

Types of Clusterings
A clustering is a set of clusters. There is an important distinction between hierarchical and partitional sets of clusters:
- Partitional clustering: a division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
- Hierarchical clustering: a set of nested clusters organized as a hierarchical tree

Partitional Clustering

Hierarchical Clustering

Types of Clusters
- Center-based clusters
- Contiguous clusters

Types of Clusters: Center-Based
- Center-based: a cluster is a set of objects such that an object in a cluster is closer (more similar) to the center of its own cluster than to the center of any other cluster
- The center of a cluster is often a centroid (the average of all the points in the cluster) or a medoid (the most representative point of the cluster)
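The centroid and the medoid can differ a lot once a cluster contains an outlier. The following is a minimal Python sketch. It assumes Euclidean distance and takes "most representative point" to mean the member with the smallest total distance to all other members, which is a common choice but an assumption here. The four points are made-up illustration data.

```python
import math

def centroid(points):
    """Coordinate-wise mean of the points; it need not be one of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points):
    """Member point with the smallest total Euclidean distance to all members
    (assumed definition of 'most representative')."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Three nearby points plus one far-away point (hypothetical data)
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (6.0, 5.0)]
print(centroid(cluster))  # (1.75, 1.5)
print(medoid(cluster))    # (1.0, 0.0)
```

The outlier at (6, 5) drags the centroid away from the three close points, while the medoid stays among them, since it must be an actual member of the cluster.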

Types of Clusters: Contiguity-Based
- Contiguous cluster (nearest neighbor or transitive): a cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster

Similarity and Dissimilarity
- Similarity
  - Numerical measure of how alike two data objects are
  - Is higher when objects are more alike
  - Often falls in the range [0, 1]
- Dissimilarity
  - Numerical measure of how different two data objects are
  - Is lower when objects are more alike
  - Minimum dissimilarity is often 0; the upper limit varies
- Proximity refers to either a similarity or a dissimilarity

Similarity/Dissimilarity for Simple Attributes
p and q are the attribute values for two data objects.
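The table that defined these measures did not survive extraction. As a hedged sketch, the Python below uses common textbook definitions for nominal, ordinal, and interval/ratio attributes. These definitions are an assumption and were not recovered from the slide. Each function returns a (dissimilarity d, similarity s) pair, and the function names are hypothetical.

```python
def nominal(p, q):
    """Nominal: d = 0 if the values match, else 1; s = 1 - d."""
    d = 0.0 if p == q else 1.0
    return d, 1.0 - d

def ordinal(p, q, levels):
    """Ordinal: map values to ranks 0..n-1 using the ordered list `levels`,
    then d = |rank(p) - rank(q)| / (n - 1) and s = 1 - d."""
    n = len(levels)
    d = abs(levels.index(p) - levels.index(q)) / (n - 1)
    return d, 1.0 - d

def interval(p, q):
    """Interval/ratio: d = |p - q|; s = 1 / (1 + d) is one of several choices."""
    d = abs(p - q)
    return d, 1.0 / (1.0 + d)

print(nominal("red", "blue"))   # (1.0, 0.0)
print(ordinal("poor", "good", ["poor", "fair", "good", "excellent"]))
print(interval(3.0, 7.0))       # (4.0, 0.2)
```

For interval/ratio attributes the similarity transform is a design choice. Other common options are s = -d or a min-max rescaling of d to [0, 1].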

Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree
- Can be visualized as a dendrogram: a tree-like diagram that records the sequence of merges or splits
[Figure: nested clusters of six points and the corresponding dendrogram]

Strengths of Hierarchical Clustering
- We do not have to assume any particular number of clusters: any desired number of clusters can be obtained by cutting the dendrogram at the proper level
- The clusters may correspond to meaningful taxonomies, e.g., in the biological sciences (animal kingdom, phylogeny reconstruction, ...)
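Cutting the dendrogram to get k clusters amounts to replaying the recorded merges bottom-up and stopping when k clusters remain. The following is a minimal sketch of that idea. The function `cut` and the five-item merge record are hypothetical illustrations, not output from any particular library.

```python
def cut(items, merges, k):
    """Replay merges (pairs of frozensets, in merge order) until k clusters remain."""
    clusters = [frozenset([x]) for x in items]
    for a, b in merges:
        if len(clusters) <= k:
            break
        # Replace the two merged clusters by their union
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

f = frozenset
items = ["I1", "I2", "I3", "I4", "I5"]
merges = [  # hypothetical merge record, earliest merge first
    (f({"I1"}), f({"I2"})),
    (f({"I4"}), f({"I5"})),
    (f({"I1", "I2"}), f({"I3"})),
    (f({"I1", "I2", "I3"}), f({"I4", "I5"})),
]
for k in (3, 2):
    print(k, sorted(sorted(c) for c in cut(items, merges, k)))
# 3 [['I1', 'I2'], ['I3'], ['I4', 'I5']]
# 2 [['I1', 'I2', 'I3'], ['I4', 'I5']]
```

One clustering run therefore serves every k. Only the cut point changes, which is the sense in which no number of clusters has to be fixed in advance.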

Hierarchical Clustering
Two main types of hierarchical clustering:
- Agglomerative: start with the points as individual clusters; at each step, merge the closest pair of clusters until only one cluster (or k clusters) is left
- Divisive: start with one, all-inclusive cluster; at each step, split a cluster until each cluster contains a single point (or there are k clusters)

Agglomerative Clustering Algorithm
The more popular hierarchical clustering technique. The basic algorithm is straightforward:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4.     Merge the two closest clusters
5.     Update the proximity matrix
6. Until only a single cluster remains
The key operation is the computation of the proximity of two clusters. The different approaches to defining the distance between clusters are what distinguish the different algorithms.
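Below is a minimal, deliberately naive Python sketch of the loop above. It works on a symmetric distance matrix and offers a choice of MIN, MAX, or group-average linkage. One simplification is assumed: the sketch does not update a cluster-level proximity matrix in place (step 5). Instead it recomputes each cluster-to-cluster distance from the original point distances, which is simpler to read and still O(N^3) overall. The function name and the five 1-D points are hypothetical.

```python
def agglomerative(dist, linkage="single"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of point indices.
    """
    # How the point-to-point distances between two clusters become one number
    combine = {
        "single": min,                            # MIN
        "complete": max,                          # MAX
        "average": lambda ds: sum(ds) / len(ds),  # group average
    }[linkage]
    clusters = [(i,) for i in range(len(dist))]   # step 2: every point is a cluster
    history = []
    while len(clusters) > 1:                      # steps 3 and 6
        # Search every pair of clusters for the closest one
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine([dist[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))  # step 4: merge
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(history[-1][0] + history[-1][1])
    return history

# Five points on a line; the distance between two points is |x - y|
points = [0.0, 1.0, 4.0, 6.0, 15.0]
dist = [[abs(x - y) for y in points] for x in points]
for a, b, d in agglomerative(dist, "single"):
    print(a, "+", b, "at", d)
# (0,) + (1,) at 1.0
# (2,) + (3,) at 2.0
# (0, 1) + (2, 3) at 3.0
# (4,) + (0, 1, 2, 3) at 9.0
```

Switching `linkage` to "complete" keeps the same first two merges but changes the later merge heights to 6.0 and 15.0. This shows that the linkage choice is the only thing separating these algorithms.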

Starting Situation
Start with clusters of individual points and a proximity matrix.
[Figure: points p1, p2, ..., p12 and their proximity matrix, with rows and columns p1, p2, p3, p4, p5, ...]

Intermediate Situation
After some merging steps, we have some clusters.
[Figure: clusters C1 to C5 formed from points p1, ..., p12, and the proximity matrix over C1 to C5]

Intermediate Situation
We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
[Figure: clusters C1 to C5 and their proximity matrix]

After Merging
The question is: how do we update the proximity matrix? That is, how do we measure proximity (distance or similarity) between two clusters?
[Figure: proximity matrix over C1, C2 U C5, C3, C4, with the entries in the C2 U C5 row and column marked "?"]

How to Define Inter-Cluster Similarity
- MIN (single linkage)
- MAX (complete linkage)
- Group average
- Distance between centroids
- Other methods driven by an objective function: Ward's method uses squared error
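The first four definitions can be written directly over two sets of points. The following is a minimal Python sketch, assuming 2-D points and Euclidean distance, so every value is a distance (smaller means more similar). Ward's method is omitted, and the two small clusters are hypothetical.

```python
import math
from itertools import product

def single_link(A, B):        # MIN: the closest cross-cluster pair
    return min(math.dist(p, q) for p, q in product(A, B))

def complete_link(A, B):      # MAX: the farthest cross-cluster pair
    return max(math.dist(p, q) for p, q in product(A, B))

def group_average(A, B):      # mean over all |A| * |B| cross-cluster pairs
    return sum(math.dist(p, q) for p, q in product(A, B)) / (len(A) * len(B))

def centroid_distance(A, B):  # distance between the two cluster means
    ca = tuple(sum(c) / len(A) for c in zip(*A))
    cb = tuple(sum(c) / len(B) for c in zip(*B))
    return math.dist(ca, cb)

A = [(0.0, 0.0), (0.0, 2.0)]
B = [(3.0, 0.0), (3.0, 2.0)]
for fn in (single_link, complete_link, group_average, centroid_distance):
    print(fn.__name__, round(fn(A, B), 4))
# single_link 3.0
# complete_link 3.6056
# group_average 3.3028
# centroid_distance 3.0
```

The same two clusters get four different distances. Which pair is "closest" at a given step, and therefore the shape of the final hierarchy, depends on the definition chosen.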

Cluster Similarity: MIN or Single Link
The similarity of two clusters is based on the two most similar (closest) points in the different clusters.
- Determined by one pair of points, i.e., by one link in the proximity graph

      I1    I2    I3    I4    I5
I1  1.00  0.90  0.10  0.65  0.20
I2  0.90  1.00  0.70  0.60  0.50
I3  0.10  0.70  1.00  0.40  0.30
I4  0.65  0.60  0.40  1.00  0.80
I5  0.20  0.50  0.30  0.80  1.00
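As a small check on the similarity matrix above, this sketch finds the most similar pair of items, which single link merges first. It then computes the single-link similarity of the merged cluster {I1, I2} to each remaining item. With similarities rather than distances, MIN linkage takes the largest cross-cluster similarity. The helper name `single_link_sim` is hypothetical.

```python
names = ["I1", "I2", "I3", "I4", "I5"]
S = [
    [1.00, 0.90, 0.10, 0.65, 0.20],
    [0.90, 1.00, 0.70, 0.60, 0.50],
    [0.10, 0.70, 1.00, 0.40, 0.30],
    [0.65, 0.60, 0.40, 1.00, 0.80],
    [0.20, 0.50, 0.30, 0.80, 1.00],
]

def single_link_sim(A, B):
    """Single link on similarities: the single most similar cross pair decides."""
    return max(S[i][j] for i in A for j in B)

# Most similar pair of distinct items: merged first
best = max((S[i][j], i, j) for i in range(5) for j in range(i + 1, 5))
print(best)  # (0.9, 0, 1)

merged = (0, 1)  # the new cluster {I1, I2}
for k in (2, 3, 4):
    print(names[k], single_link_sim(merged, (k,)))
# I3 0.7
# I4 0.65
# I5 0.5
```

After this update, I4 and I5 (0.80) are the most similar pair, so single link merges them next.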

Hierarchical Clustering: MIN
[Figure: nested clusters of six points under MIN (single link), and the corresponding dendrogram]

Similarity vs. Distance

Similarity Matrix
      I1    I2    I3    I4    I5
I1  1.00  0.90  0.10  0.65  0.20
I2  0.90  1.00  0.70  0.60  0.50
I3  0.10  0.70  1.00  0.40  0.30
I4  0.65  0.60  0.40  1.00  0.80
I5  0.20  0.50  0.30  0.80  1.00

Distance Matrix
      I1    I2    I3    I4    I5
I1  0.00  0.20  1.80  0.70  1.60
I2  0.20  0.00  0.60  0.80  1.00
I3  1.80  0.60  0.00  1.20  1.40
I4  0.70  0.80  1.20  0.00  0.40
I5  1.60  1.00  1.40  0.40  0.00
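Comparing the two matrices entry by entry suggests that the distance matrix was obtained as d = 2(1 - s). This is our reading of the numbers; the slide does not state a formula. The following sketch applies that assumed mapping to the similarity matrix and reproduces the distance matrix, with rounding to two decimals to absorb floating-point noise.

```python
S = [
    [1.00, 0.90, 0.10, 0.65, 0.20],
    [0.90, 1.00, 0.70, 0.60, 0.50],
    [0.10, 0.70, 1.00, 0.40, 0.30],
    [0.65, 0.60, 0.40, 1.00, 0.80],
    [0.20, 0.50, 0.30, 0.80, 1.00],
]

# Assumed mapping: d = 2 * (1 - s), applied entry by entry
D = [[round(2 * (1 - s), 2) for s in row] for row in S]
for row in D:
    print(row)
# [0.0, 0.2, 1.8, 0.7, 1.6]
# [0.2, 0.0, 0.6, 0.8, 1.0]
# [1.8, 0.6, 0.0, 1.2, 1.4]
# [0.7, 0.8, 1.2, 0.0, 0.4]
# [1.6, 1.0, 1.4, 0.4, 0.0]
```

Because d = 2 - 2s is a decreasing linear map, the most similar pair is always the closest pair, and averages transform the same way. MIN, MAX, and group average therefore produce the same merge order whichever of the two matrices they are run on.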

Strength of MIN
[Figure: original points and the two clusters found]
- Can handle non-elliptical shapes

Limitations of MIN
[Figure: original points and the two clusters found]
- Sensitive to noise and outliers

Cluster Similarity: MAX or Complete Linkage
The similarity of two clusters is based on the two least similar (most distant) points in the different clusters.
- Determined by all pairs of points in the two clusters

      I1    I2    I3    I4    I5
I1  1.00  0.90  0.10  0.65  0.20
I2  0.90  1.00  0.70  0.60  0.50
I3  0.10  0.70  1.00  0.40  0.30
I4  0.65  0.60  0.40  1.00  0.80
I5  0.20  0.50  0.30  0.80  1.00
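The same first update can be done under complete linkage, where the least similar cross pair decides. The following sketch reuses the similarity matrix above. The helper name `complete_link_sim` is hypothetical.

```python
names = ["I1", "I2", "I3", "I4", "I5"]
S = [
    [1.00, 0.90, 0.10, 0.65, 0.20],
    [0.90, 1.00, 0.70, 0.60, 0.50],
    [0.10, 0.70, 1.00, 0.40, 0.30],
    [0.65, 0.60, 0.40, 1.00, 0.80],
    [0.20, 0.50, 0.30, 0.80, 1.00],
]

def complete_link_sim(A, B):
    """Complete link on similarities: the least similar cross pair decides."""
    return min(S[i][j] for i in A for j in B)

# {I1, I2} is still merged first (0.90), then compared with each remaining item
for k in (2, 3, 4):
    print(names[k], complete_link_sim((0, 1), (k,)))
# I3 0.1
# I4 0.6
# I5 0.2

# After {I4, I5} merges (0.80), the candidates for the next merge are:
print(complete_link_sim((0, 1), (2,)))     # 0.1
print(complete_link_sim((0, 1), (3, 4)))   # 0.2
print(complete_link_sim((2,), (3, 4)))     # 0.3
```

The highest of these is 0.30, so complete link attaches I3 to {I4, I5}. Single link instead attaches I3 to {I1, I2}.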

Hierarchical Clustering: MAX
[Figure: nested clusters of six points under MAX (complete link), and the corresponding dendrogram]

Strength of MAX
[Figure: original points and the two clusters found]
- Less susceptible to noise and outliers

Limitations of MAX
[Figure: original points and the two clusters found]
- Tends to break large clusters
- Biased towards globular clusters

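Cluster Similarity: Group Average
The proximity of two clusters is the average of the pairwise proximities between points in the two clusters:

$$\text{proximity}(\text{Cluster}_i, \text{Cluster}_j) = \frac{\sum_{p_i \in \text{Cluster}_i,\; p_j \in \text{Cluster}_j} \text{proximity}(p_i, p_j)}{|\text{Cluster}_i| \times |\text{Cluster}_j|}$$

The average connectivity must be used rather than the total proximity, for scalability, since total proximity favors large clusters.

      I1    I2    I3    I4    I5
I1  1.00  0.90  0.10  0.65  0.20
I2  0.90  1.00  0.70  0.60  0.50
I3  0.10  0.70  1.00  0.40  0.30
I4  0.65  0.60  0.40  1.00  0.80
I5  0.20  0.50  0.30  0.80  1.00

[Figure annotation: after I1 and I2 are merged, the group-average similarities of {I1, I2} to I3, I4, and I5 are 0.4, 0.625, and 0.35.]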
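The annotated values 0.4, 0.625, and 0.35 follow directly from the formula. The following sketch applies the formula to the similarity matrix above. Rounding to four decimals only hides floating-point noise, and the helper name `group_average_sim` is hypothetical.

```python
names = ["I1", "I2", "I3", "I4", "I5"]
S = [
    [1.00, 0.90, 0.10, 0.65, 0.20],
    [0.90, 1.00, 0.70, 0.60, 0.50],
    [0.10, 0.70, 1.00, 0.40, 0.30],
    [0.65, 0.60, 0.40, 1.00, 0.80],
    [0.20, 0.50, 0.30, 0.80, 1.00],
]

def group_average_sim(A, B):
    """Sum of cross-cluster similarities divided by |A| * |B|."""
    return sum(S[i][j] for i in A for j in B) / (len(A) * len(B))

for k in (2, 3, 4):
    print(names[k], round(group_average_sim((0, 1), (k,)), 4))
# I3 0.4
# I4 0.625
# I5 0.35

# After {I4, I5} merges (0.80): {I1, I2} vs {I4, I5} averages four pairs
print(round(group_average_sim((0, 1), (3, 4)), 4))  # 0.4875
```

Because 0.4875 beats 0.40 ({I1, I2} with I3) and 0.35 (I3 with {I4, I5}), group average merges {I1, I2} with {I4, I5} next. The raw total for that pair, 1.95, is large partly because it sums four pairs instead of two, which is why the average is used.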

Hierarchical Clustering: Group Average
[Figure: nested clusters of six points under group average, and the corresponding dendrogram]

Hierarchical Clustering: Group Average
A compromise between single and complete link.
- Strengths: less susceptible to noise and outliers
- Limitations: biased towards globular clusters

Hierarchical Clustering: Comparison
[Figure: nested clusters of the same six points under MIN, MAX, and Group Average]
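The figure's six points come without coordinates, so this sketch makes the same comparison on the five-item similarity matrix from the earlier slides. It runs the full agglomerative loop under each linkage and records the merged cluster and its linkage similarity at every step. The names `LINKS` and `merge_sequence` are hypothetical.

```python
names = ["I1", "I2", "I3", "I4", "I5"]
S = [
    [1.00, 0.90, 0.10, 0.65, 0.20],
    [0.90, 1.00, 0.70, 0.60, 0.50],
    [0.10, 0.70, 1.00, 0.40, 0.30],
    [0.65, 0.60, 0.40, 1.00, 0.80],
    [0.20, 0.50, 0.30, 0.80, 1.00],
]

# Each linkage turns the list of cross-cluster similarities into one score
LINKS = {
    "MIN": max,                                    # single link: most similar pair
    "MAX": min,                                    # complete link: least similar pair
    "Group Average": lambda sims: sum(sims) / len(sims),
}

def merge_sequence(link):
    """Agglomerate I1..I5, always merging the pair with the highest linkage similarity."""
    clusters = [(i,) for i in range(len(S))]
    steps = []
    while len(clusters) > 1:
        sim, a, b = max(
            (link([S[i][j] for i in clusters[x] for j in clusters[y]]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merged = clusters[a] + clusters[b]
        steps.append(("+".join(names[i] for i in sorted(merged)), round(sim, 4)))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return steps

for label, link in LINKS.items():
    print(label, merge_sequence(link))
```

All three linkages agree on the first two merges, {I1, I2} at 0.90 and {I4, I5} at 0.80, and then diverge. MIN attaches I3 to {I1, I2} at 0.70. MAX attaches I3 to {I4, I5} at 0.30. Group average joins {I1, I2} with {I4, I5} at 0.4875 and leaves I3 for last.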

Hierarchical Clustering: Time and Space Requirements
- O(N^2) space, since it uses the proximity matrix (N is the number of points)
- O(N^3) time in many cases: there are N steps, and at each step the N x N proximity matrix must be updated and searched
- The complexity can be reduced to O(N^2 log N) time for some approaches
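Here is a rough count behind the O(N^3) figure, assuming each step scans every entry of the current proximity matrix. When k clusters remain, the scan touches about k^2/2 entries, and k runs from N down to 2:

$$
\sum_{k=2}^{N} O(k^2) \;=\; O\!\left(\sum_{k=2}^{N} k^2\right) \;=\; O\!\left(\frac{N(N+1)(2N+1)}{6} - 1\right) \;=\; O(N^3)
$$

The O(N^2 log N) variants typically avoid this full scan by keeping candidate cluster proximities in priority queues (heaps). Each of the N merges then costs O(N log N) heap operations rather than an O(N^2) search.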