CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 16

Similar documents
Clustering Part 3. Hierarchical Clustering

INF4820, Algorithms for AI and NLP: Hierarchical Clustering

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

Data Mining Concepts & Techniques

Hierarchical Clustering Lecture 9

Hierarchical Clustering

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22

Clustering Lecture 3: Hierarchical Methods

Lecture Notes for Chapter 7. Introduction to Data Mining, 2 nd Edition. by Tan, Steinbach, Karpatne, Kumar

Machine Learning: Symbol-based

CSE 5243 INTRO. TO DATA MINING

Clustering. Unsupervised Learning

Clustering CS 550: Machine Learning

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction

Hierarchical Clustering

Machine learning - HT Clustering

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

4. Ad-hoc I: Hierarchical clustering

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Chapter 6: Cluster Analysis

Unsupervised Learning and Clustering

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Cluster Analysis: Agglomerate Hierarchical Clustering

Unsupervised Learning

CSE 347/447: DATA MINING

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

CSE 5243 INTRO. TO DATA MINING

CS7267 MACHINE LEARNING

Clustering. Unsupervised Learning

Unsupervised Learning Hierarchical Methods

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan

Cluster Analysis. Ying Shen, SSE, Tongji University

Data Mining Algorithms

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat

Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Cluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A

Hierarchical clustering

Clustering Results. Result List Example. Clustering Results. Information Retrieval

Chapter 18 Clustering

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

CHAPTER 4: CLUSTER ANALYSIS

Hierarchical clustering

Clustering. Unsupervised Learning

Unsupervised Learning and Clustering

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Slides From Lecture Notes for Chapter 8. Introduction to Data Mining

Hierarchical and Ensemble Clustering

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Hierarchical clustering

Network Traffic Measurements and Analysis

Introduction to Data Mining

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2

CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample

Kapitel 4: Clustering

Olmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.

Cluster Analysis for Microarray Data

Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April April 2015

Clustering in Ratemaking: Applications in Territories Clustering

Summer School in Statistics for Astronomers & Physicists June 15-17, Cluster Analysis

10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2

Clustering part II 1

Cluster Analysis. Angela Montanari and Laura Anderlucci

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering

Lesson 3. Prof. Enza Messina

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

Today s lecture. Clustering and unsupervised learning. Hierarchical clustering. K-means, K-medoids, VQ

Clustering in Data Mining

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

Hierarchy. No or very little supervision Some heuristic quality guidances on the quality of the hierarchy. Jian Pei: CMPT 459/741 Clustering (2) 1

Hierarchical Clustering 4/5/17

Statistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.

Cluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008

Foundations of Machine Learning CentraleSupélec Fall Clustering Chloé-Agathe Azencot

Cluster analysis. Agnieszka Nowak - Brzezinska

Clustering. Lecture 6, 1/24/03 ECS289A

Homework # 4. Example: Age in years. Answer: Discrete, quantitative, ratio. a) Year that an event happened, e.g., 1917, 1950, 2000.

Machine Learning. Unsupervised Learning. Manfred Huber

Hierarchical Clustering

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)

Tribhuvan University Institute of Science and Technology MODEL QUESTION

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

CPSC 425: Computer Vision

Hierarchical Clustering

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

10701 Machine Learning. Clustering

Chuck Cartledge, PhD. 23 September 2017

Clustering. Bruno Martins. 1 st Semester 2012/2013

Clustering algorithms 6CCS3WSN-7CCSMWAL

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Image Analysis Lecture Segmentation. Idar Dyrdal

k-means demo Administrative Machine learning: Unsupervised learning" Assignment 5 out

Unsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis

Lecture 25: Review I

Machine Learning (BSMC-GA 4439) Wenke Liu

Gene Clustering & Classification

Objective of clustering

Transcription:

CS434a/541a: Pattern Recognition Prof. Olga Veksler Lecture 16

Today Continue Clustering Last Time Flat Clustring Today Hierarchical Clustering Divisive Agglomerative Applications of Clustering

Hierarchical Clustering: Dendogram preferred way to represent a hierarchical clustering is a dendrogram Binary tree Level k corresponds to partitioning with n-k+1 clusters if need k clusters, take clustering from level n-k+1 If samples are in the same cluster at level k, they stay in the same cluster at higher levels dendrogram typically shows the similarity of grouped clusters

Hierarchical Clustering: Venn Diagram Can also use Venn diagram to show hierarchical clustering, but similarity is not represented quantitatively

Hierarchical Clustering Algorithms for hierarchical clustering can be divided into two types: 1. Agglomerative (bottom up) procedures Start with n singleton clusters Form hierarchy by merging most similar clusters 3 4 2 5 6 2. Divisive (top bottom) procedures Start with all samples in one cluster Form hierarchy by splitting the worst clusters

Divisive Hierarchical Clustering Any flat algorithm which produces a fixed number of clusters can be used set c = 2

Agglomerative Hierarchical Clustering initialize with each example in singleton cluster while there is more than 1 cluster 1. find 2 nearest clusters 2. merge them Four common ways to measure cluster distance 1. minimum distance d ( D, D ) = min x y min i j x D, y i D j 2. maximum distance d (, ) max max Di D j = x y 3. average distance 4. mean distance x D, y i D j 1 ( Di, D j ) = d avg x y n n mean i j x D i y D j ( D D ) = µ d µ i, j i j

Single Linkage or Nearest Neighbor Agglomerative clustering with minimum distance d ( D, D ) = min x y 3 min i 5 j x D, y i D j 1 2 4 generates minimum spanning tree encourages growth of elongated clusters disadvantage: very sensitive to noise what we want at level with c=3 what we get at level with c=3 noisy sample

Complete Linkage or Farthest Neighbor Agglomerative clustering with maximum distance d max encourages compact clusters ( D, D ) = max x y i j x D, y i D j 1 2 3 4 Does not work well if elongated clusters present 5 ( D ) D 1 ( D ) D D 2 3 d max 1,D2 < dmax 2,D3 thus D 1 and D 2 are merged instead of D 2 and D 3

Average and Mean Agglomerative Clustering Agglomerative clustering is more robust under the average or the mean cluster distance 1 ( Di, D j ) = d avg x y n n mean i j x D i y D j ( D D ) = µ d µ i, j i j mean distance is cheaper to compute than the average distance unfortunately, there is not much to say about agglomerative clustering theoretically, but it does work reasonably well in practice

Agglomerative vs. Divisive Agglomerative is faster to compute, in general Divisive may be less blind to the global structure of the data Divisive when taking the first step (split), have access to all the data; can find the best possible split in 2 parts Agglomerative when taking the first step merging, do not consider the global structure of the data, only look at pairwise structure

First (?) Application of Clustering John Snow, a London physician plotted the location of cholera deaths on a map during an outbreak in the 1850s. The locations indicated that cases were clustered around certain intersections where there were polluted wells -- thus exposing both the problem and the solution. From: Nina Mishra HP Labs

Application of Clustering Astronomy SkyCat: Clustered 2x10 9 sky objects into stars, galaxies, quasars, etc based on radiation emitted in different spectrum bands. From: Nina Mishra HP Labs

Applications of Clustering Image segmentation Find interesting objects in images to focus attention at From: Image Segmentation by Nested Cuts, O. Veksler, CVPR2000

Applications of Clustering Image Database Organization for efficient search

Applications of Clustering Data Mining Technology watch Derwent Database, contains all patents filed in the last 10 years worldwide Searching by keywords leads to thousands of documents Find clusters in the database and find if there are any emerging technologies and what competition is up to Marketing Customer database Find clusters of customers and tailor marketing schemes to them

Applications of Clustering gene expression profile clustering similar expressions, expect similar function From:De Smet F., Mathys J., Marchal K., Thijs G., De Moor B. & Moreau Y. 2002. Adaptive Quality-based clustering of gene expression profiles, Bioinformatics, 18(6), 735-746.

Applications of Clustering Profiling Web Users Use web access logs to generate a feature vector for each user Cluster users based on their feature vectors Identify common goals for users Shopping Job Seekers Product Seekers Tutorials Seekers Can use clustering results to improving web content and design

Summary Clustering (nonparametric unsupervised learning) is useful for discovering inherent structure in data Clustering is immensely useful in different fields Clustering comes naturally to humans (in up to 3 dimensions), but not so to computers It is very easy to design a clustering algorithm, but it is very hard to say if it does anything good General purpose clustering does not exist, for best results, clustering should be tuned to application at hand