Hierarchical Clustering Lecture 9


Marina Santini

Acknowledgements
Slides borrowed and adapted from: Data Mining by I. H. Witten, E. Frank and M. A. Hall

Lecture 9: Required Reading
Witten et al. (2011: 273-284)

Outline
- Dendrogram
- Agglomerative clustering
- Distance measures
- Cut-off

Hierarchical clustering
- Recursively splitting clusters produces a hierarchy that can be represented as a dendrogram
- It could also be represented as a Venn diagram of sets and subsets (without intersections)
- The height of each node in the dendrogram can be made proportional to the dissimilarity between its children
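The two representations above can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `nested_sets` helper and the example tree are hypothetical names and data, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A dendrogram node: a leaf holds one instance, an internal node has two children."""
    height: float = 0.0              # dissimilarity between the two children (0 for a leaf)
    label: Optional[str] = None      # set only for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def members(self) -> frozenset:
        """All instances below this node."""
        if self.label is not None:
            return frozenset([self.label])
        return self.left.members() | self.right.members()


def nested_sets(node):
    """List the member set of every node: the Venn-diagram view of the same tree."""
    if node.label is not None:
        return [node.members()]
    return [node.members()] + nested_sets(node.left) + nested_sets(node.right)


# Hypothetical tree: a and b are joined at height 1.0, then c is joined at height 3.0.
tree = Node(3.0,
            left=Node(1.0, left=Node(label="a"), right=Node(label="b")),
            right=Node(label="c"))
sets = nested_sets(tree)
```

Any two sets in `sets` are either nested or disjoint, which is why the Venn diagram has no partial intersections.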

Agglomerative clustering
- Bottom-up approach
- Simple algorithm
- Requires a distance/similarity measure
- Start by considering each instance to be a cluster
- Find the two closest clusters and merge them
- Continue merging until only one cluster is left
- The record of mergings forms a hierarchical clustering structure: a binary dendrogram

Distance measures
- Single-linkage
  - Minimum distance between the two clusters
  - Distance between the clusters' two closest members
  - Can be sensitive to outliers
- Complete-linkage
  - Maximum distance between the two clusters
  - Two clusters are considered close only if all instances in their union are relatively similar
  - Also sensitive to outliers
  - Seeks compact clusters
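A minimal sketch of the agglomerative procedure with single- and complete-linkage, assuming numeric instances and Euclidean distance. The function names and the four one-dimensional points are made up for illustration, and the naive search over all cluster pairs is chosen for readability, not speed.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(a, b):
    """Minimum distance between a member of a and a member of b."""
    return min(dist(p, q) for p in a for q in b)


def complete_linkage(a, b):
    """Maximum distance between a member of a and a member of b."""
    return max(dist(p, q) for p in a for q in b)


def agglomerate(points, linkage):
    """Merge the two closest clusters until one is left.

    Returns the record of mergings as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(p,) for p in points]          # start: every instance is a cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merges.append((a, b, linkage(a, b)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a + b]
    return merges


# Hypothetical one-dimensional instances.
points = [(0.0,), (1.0,), (3.0,), (7.0,)]
single = agglomerate(points, single_linkage)
complete = agglomerate(points, complete_linkage)
```

With n instances the record always holds n - 1 merges, one per internal node of the binary dendrogram. On these points both linkages merge in the same order, but the merge distances differ, so the node heights in the two dendrograms differ as well.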

Distance measures cont.
- Compromise between the extremes of minimum and maximum distance
- Represent clusters by their centroid, and use the distance between centroids: centroid-linkage
  - Works well for instances in multidimensional Euclidean space
  - Not so good if all we have is pairwise similarity between instances
- Calculate the average distance between each pair of members of the two clusters: average-linkage
- Technical deficiency of both centroid-linkage and average-linkage: results depend on the numerical scale on which distances are measured

More distance measures
- Group-average clustering
  - Uses the average distance between all members of the merged cluster
  - Differs from average-linkage because it includes pairs from the same original cluster
- Ward's clustering method
  - Calculates the increase in the sum of squares of the distances of the instances from the centroid before and after fusing two clusters
  - Minimizes the increase in this squared distance at each clustering step
- All measures will produce the same result if the clusters are compact and well separated
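The difference between average-linkage, group-average and Ward's criterion is easier to see side by side. The sketch below assumes numeric instances and Euclidean distance; the function names and the two small clusters are illustrative only.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage(a, b):
    """Mean distance over pairs with one member in each cluster."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def group_average(a, b):
    """Mean distance over all pairs in the merged cluster, same-cluster pairs included."""
    pairs = list(combinations(list(a) + list(b), 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def centroid(cluster):
    """Coordinate-wise mean of the cluster's members."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))


def sum_of_squares(cluster):
    """Sum of squared distances of the members from the cluster centroid."""
    c = centroid(cluster)
    return sum(dist(p, c) ** 2 for p in cluster)


def ward_increase(a, b):
    """Increase in the sum of squares caused by fusing a and b."""
    return sum_of_squares(list(a) + list(b)) - sum_of_squares(a) - sum_of_squares(b)


# Hypothetical clusters in the plane.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(6.0, 0.0)]

avg = average_linkage(a, b)    # pairs across clusters: distances 6 and 4
grp = group_average(a, b)      # also counts the within-cluster pair at distance 2
ward = ward_increase(a, b)
```

Here average-linkage gives 5.0 and group-average gives 4.0, because the latter also averages in the pair inside `a`. At each step Ward's method would merge the pair of clusters with the smallest `ward_increase`.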

Example hierarchical clustering
- 50 examples of different creatures from the zoo data
- Complete-linkage
[Figures: dendrogram and polar plot]

Example hierarchical clustering 2
- Single-linkage
[Figure]

Incremental clustering
- Heuristic approach (COBWEB/CLASSIT)
- Form a hierarchy of clusters incrementally
- Start: tree consists of empty root node
- Then: add instances one by one, updating the tree appropriately at each stage
  - To update, find the right leaf for an instance
  - May involve restructuring the tree
- Base update decisions on category utility

Clustering weather data
[Figure: weather data table (columns ID, Outlook, Temp., Humidity, Windy; instances A to N) with the cluster tree at successive steps]

Clustering weather data
[Figure: weather data table (columns ID, Outlook, Temp., Humidity, Windy; instances A to N) with the cluster tree at later steps]
- Merge best host and runner-up
- Consider splitting the best host if merging doesn't help

Final hierarchy
[Figure]
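The update decisions in incremental clustering are based on category utility. The sketch below uses the usual definition for nominal attributes, CU = (1/k) * sum over clusters of P(C) * sum over attribute values of (P(value | C)^2 - P(value)^2); the slides do not spell this formula out, so treat it as an assumption. The two-attribute instances are invented and clusters are assumed to be non-empty.

```python
from collections import Counter


def squared_value_probs(instances):
    """Sum over attributes and values of P(attribute = value)^2 within `instances`."""
    n = len(instances)
    total = 0.0
    for column in zip(*instances):             # one column per attribute
        total += sum((count / n) ** 2 for count in Counter(column).values())
    return total


def category_utility(clusters):
    """Category utility of a partition of nominal instances into non-empty clusters."""
    everything = [x for cluster in clusters for x in cluster]
    n = len(everything)
    baseline = squared_value_probs(everything)
    gain = sum(len(c) / n * (squared_value_probs(c) - baseline) for c in clusters)
    return gain / len(clusters)                # divide by k, the number of clusters


# Hypothetical instances with attributes (outlook, windy).
sunny = [("sunny", "false"), ("sunny", "true")]
rainy = [("rainy", "false"), ("rainy", "true")]

by_outlook = category_utility([sunny, rainy])
mixed = category_utility([[sunny[0], rainy[1]], [sunny[1], rainy[0]]])
```

Grouping by outlook makes that attribute perfectly predictable inside each cluster, so `by_outlook` is positive. The mixed partition tells us nothing beyond the overall value frequencies, so `mixed` is zero. An incremental clusterer would compare such scores when choosing between adding to the best host, merging, and splitting.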

Example: the iris data (subset)
[Figure]

Clustering with cutoff
[Figure]

The end