Unsupervised learning, Clustering CS434


Unsupervised learning and pattern discovery. So far, our data has come with labels: each example is a feature vector paired with a label,
$(x_{11}, x_{12}, \ldots, x_{1m}, y_1),\; (x_{21}, x_{22}, \ldots, x_{2m}, y_2),\; \ldots,\; (x_{n1}, x_{n2}, \ldots, x_{nm}, y_n).$
We will now be looking at unlabeled data, the same feature vectors without the labels:
$(x_{11}, x_{12}, \ldots, x_{1m}),\; (x_{21}, x_{22}, \ldots, x_{2m}),\; \ldots,\; (x_{n1}, x_{n2}, \ldots, x_{nm}).$
What do we expect to learn from such data? I have tons of data (web documents, gene data, etc.) and need to: organize it better, e.g., find subgroups; understand it better, e.g., understand interrelationships; find regular trends in it, e.g., "if A and B, then C".

Hierarchical clustering as a way to organize web pages

Finding association patterns in data

Dimension reduction

clustering

A hypothetical clustering example. (Figure: a scatter plot with monthly income on the horizontal axis and monthly spending on the vertical axis; each point represents a person in the database.)

A hypothetical clustering example, continued. (Figure: the same scatter plot of monthly spending vs. monthly income.) This suggests there may be three distinct spending patterns among the group of people in our database.

What is clustering? In general, clustering can be viewed as an exploratory procedure for finding interesting subgroups in given data. A large portion of the effort focuses on a special kind of clustering: group all given examples (i.e., exhaustive) into disjoint clusters (i.e., partitional), such that examples within a cluster are (very) similar and examples in different clusters are (very) different.

Some example applications. Information retrieval: cluster retrieved documents to present more organized and understandable results. Consumer market analysis: cluster consumers into different interest groups so that marketing plans can be designed specifically for each group. Image segmentation: decompose an image into regions with coherent color and texture. Vector quantization for data (e.g., image) compression: group vectors into similar groups, and use the group mean to represent the group members. Computational biology: group genes into co-expressed families based on their expression profiles across different tissue samples and experimental conditions.

Image compression: vector quantization. Group all pixels into self-similar groups; instead of storing all pixel values, store the mean of each group. (Figure: original image, 701,554 bytes, vs. quantized image, 127,292 bytes.)
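As a rough illustration of the idea (not from the slides), here is a minimal sketch of color vector quantization using k-means; the file names, the codebook size k, and the use of scikit-learn and Pillow are assumptions made for this example.

```python
# Illustrative sketch only: quantize an RGB image with a k-entry codebook.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.png").convert("RGB"), dtype=np.float64)
pixels = img.reshape(-1, 3)                      # one RGB vector per pixel

k = 16                                           # hypothetical codebook size
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

codebook = km.cluster_centers_                   # k group means (k x 3)
indices = km.labels_                             # one small index per pixel

# Store the codebook plus the indices instead of a full RGB triple per pixel;
# the compressed image is reconstructed from the group means.
quantized = codebook[indices].reshape(img.shape).astype(np.uint8)
Image.fromarray(quantized).save("photo_vq.png")
```

Storing a small codebook plus one index per pixel, rather than a full color triple per pixel, is where the size reduction comes from.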

Important components in clustering: a distance/similarity measure (how do we measure the similarity between two objects?), a clustering algorithm (how do we find the clusters based on the distance/similarity measure?), and evaluation of results (how do we know that our clustering result is good?).

Distance/similarity measures. This is one of the most important questions in unsupervised learning, often more important than the choice of clustering algorithm. What is similarity? "Similarity is hard to define, but we know it when we see it" (quoted from Eamonn Keogh, UCR). More pragmatically, there are many mathematical definitions of distance/similarity.

Common distance/similarity measures.
Euclidean distance (straight-line distance, always >= 0): $L_2(x, x') = \sqrt{\sum_{i=1}^{d} (x_i - x'_i)^2}$.
Manhattan distance (city-block distance, always >= 0): $L_1(x, x') = \sum_{i=1}^{d} |x_i - x'_i|$.
Cosine similarity: the cosine of the angle between two vectors, commonly used for measuring document similarity.
More flexible measures: $D(x, x') = \sqrt{\sum_{i=1}^{d} w_i (x_i - x'_i)^2}$ scales each feature differently using the weights $w_i$; one can learn the appropriate weights given user guidance.
Note: we can always transform between distance and similarity using a monotonically decreasing function, for example $s = 1/(1 + D)$.
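To make these measures concrete, here is a small NumPy sketch; the function names and the particular distance-to-similarity transform are illustrative choices, not part of the lecture.

```python
import numpy as np

def euclidean(x, xp):
    """Straight-line (L2) distance."""
    return np.sqrt(np.sum((x - xp) ** 2))

def manhattan(x, xp):
    """City-block (L1) distance."""
    return np.sum(np.abs(x - xp))

def cosine_similarity(x, xp):
    """Cosine of the angle between two vectors (common for documents)."""
    return np.dot(x, xp) / (np.linalg.norm(x) * np.linalg.norm(xp))

def weighted_euclidean(x, xp, w):
    """Euclidean distance with per-feature weights w_i."""
    return np.sqrt(np.sum(w * (x - xp) ** 2))

def distance_to_similarity(d):
    """One monotonically decreasing transform from distance to similarity."""
    return 1.0 / (1.0 + d)
```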

How to decide which to use? It is application dependent. Considering your application domain, you need to ask questions such as: what does it mean for two consumers to be similar to each other, or for two genes to be similar to each other? For example, in the text domain we typically use cosine similarity. This may or may not give you the answer you want; it depends on your existing knowledge of the domain.

A learning approach. Ideally, we'd like to learn a distance function from the user's inputs: ask users to provide statements like "object A is similar to object B, but dissimilar to object C." For example, if a user wants to group marbles based on their pattern, they might mark some pairs "they belong together!" and others "they don't belong together!" We then learn a distance function that correctly reflects these relationships, e.g., one that weighs pattern-related features more heavily than color features. This is a more advanced topic that we will not cover in this class, but it is nonetheless very important.

When we cannot afford to learn a distance measure and don't have a clue about which distance function is appropriate, what should we do? Clustering is an exploratory procedure, and it is important to explore, i.e., try different options.

Clustering. Now that we have seen different ways to measure distances or similarities among a set of unlabeled objects, how can we use them to group the objects into clusters? People have looked at many different approaches, and they can be categorized into two distinct types.

Hierarchical and non-hierarchical clustering. Hierarchical clustering builds a tree-based hierarchical taxonomy (dendrogram), e.g., animal at the root, vertebrate and invertebrate below it, and fish, reptile, mammal, worm, and insect as leaves. A non-hierarchical (flat) clustering produces a single partition of the unlabeled data. Choosing a particular level in the hierarchy produces a flat clustering, and recursive application of flat clustering can also produce a hierarchical clustering.

Hierarchical agglomerative clustering (HAC). HAC assumes a distance function for determining the similarity of two instances. One can also assume a similarity function and reverse some of the operations in the algorithm (e.g., minimum distance -> maximum similarity) to make it work for similarities. HAC starts with each object in a separate cluster and then repeatedly joins the two closest clusters until only one cluster is left.

HAC example. (Figure: eight points labeled A through H and the dendrogram built by repeatedly merging them.)

HAC algorithm. Start with all objects in their own cluster. Repeat until there is only one cluster: among the current clusters, determine the two clusters c_i and c_j that are closest, and replace c_i and c_j with a single cluster c_i ∪ c_j. Problem: we assume a distance/similarity function that computes the distance/similarity between examples, but here we also need to compute the distance between clusters. How?

Distance between clusters. Assume a distance function that determines the distance between two objects, D(x, x'). There are multiple ways to define a cluster distance function: single link, the distance between the two closest members of the clusters; complete link, the distance between the two furthest members of the clusters; average link, the average distance between members.

Single-link agglomerative clustering. Use the minimum distance over all pairs: $D(c_i, c_j) = \min_{x \in c_i,\, x' \in c_j} D(x, x')$. (Figure: the single-link clustering of the example points A through H.)

Single link's chaining effect. Single link is famous for its chaining effect: it can gradually add more and more examples to a chain, creating long, straggling clusters.

Complete link. Use the maximum distance over all pairs: $D(c_i, c_j) = \max_{x \in c_i,\, x' \in c_j} D(x, x')$. (Figure: the complete-link clustering of the example points A through H.) Complete link makes tight, spherical clusters.

Complete link is sensitive to outliers.

Updating the cluster distances after merging is a piece of cake. After merging c_i and c_j, the distance of the resulting cluster to any other cluster c_k can be computed by:
Single link: $D(c_i \cup c_j, c_k) = \min(D(c_i, c_k),\, D(c_j, c_k))$
Complete link: $D(c_i \cup c_j, c_k) = \max(D(c_i, c_k),\, D(c_j, c_k))$
This takes constant time given the previous distances.
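To show how these constant-time updates fit into the HAC loop, here is a minimal, unoptimized sketch; the function name, the dictionary bookkeeping, and the naive closest-pair scan are illustrative choices, not from the lecture.

```python
import numpy as np

def hac(D, linkage="single"):
    """Naive agglomerative clustering on a symmetric n x n distance matrix D.

    Returns the list of merges as (members_a, members_b, distance).
    linkage is "single" (min) or "complete" (max).
    """
    combine = min if linkage == "single" else max
    D = D.astype(float).copy()
    clusters = {i: [i] for i in range(len(D))}    # active clusters, keyed by a representative row
    merges = []

    while len(clusters) > 1:
        # Find the two closest active clusters (naive scan over all pairs).
        ids = list(clusters)
        dist, a, b = min((D[p, q], p, q)
                         for i, p in enumerate(ids) for q in ids[i + 1:])

        # Constant-time distance update to every other cluster (slide's rule).
        for k in clusters:
            if k not in (a, b):
                D[a, k] = D[k, a] = combine(D[a, k], D[b, k])

        merges.append((list(clusters[a]), list(clusters[b]), dist))
        clusters[a] = clusters[a] + clusters[b]   # merged cluster keeps key a
        del clusters[b]
    return merges
```

Given a symmetric pairwise-distance matrix D, hac(D, linkage="complete") returns the sequence of merges from which a dendrogram can be drawn.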

Average link. Basic idea: average the distance between members of the two clusters, across all ordered pairs in the merged cluster:
$D(c_i, c_j) = \frac{1}{|c_i \cup c_j|\,(|c_i \cup c_j| - 1)} \sum_{x \in c_i \cup c_j} \; \sum_{x' \in c_i \cup c_j,\; x' \neq x} D(x, x')$
Compared to single link and complete link, it is computationally more expensive: naively it can take O(n^2) to compute the new distance between a pair of clusters, although if we use cosine similarity the new distance can be computed in constant time. It achieves a compromise between single and complete link.
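A direct, naive implementation of this group-average definition might look like the following sketch; the function name and the assumption of a precomputed point-to-point distance matrix D are mine, not the lecture's.

```python
def average_link_distance(D, ci, cj):
    """Group-average cluster distance: the mean of D over all ordered pairs
    of distinct points in the merged cluster ci ∪ cj (naive O(|ci ∪ cj|^2))."""
    merged = list(ci) + list(cj)
    n = len(merged)
    total = sum(D[x, xp] for x in merged for xp in merged if xp != x)
    return total / (n * (n - 1))
```

For example, average_link_distance(D, [0, 1], [4, 5]) averages D over the 12 ordered pairs of distinct points drawn from {0, 1, 4, 5}.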

HAC creates a dendrogram. The dendrogram draws the merge tree such that the height of a tree branch equals the distance between the two clusters merged at that particular step. These merge distances are always monotonically increasing. (Figure: an example dendrogram, with the branch height labeled D(C1, C2).)

How many clusters? Cutting the dendrogram at a specific similarity level gives us a simple flat clustering, and different cutting places lead to different numbers of clusters. One option is to cut the dendrogram where the gap between two successive combination similarities is largest.
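In practice a library does the bookkeeping; as one illustration (the toy data, the choice of average linkage, and the largest-gap cut are arbitrary choices for this example), SciPy can build the dendrogram and cut it at a distance threshold.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                 # 20 toy points in 2-D

Z = linkage(X, method="average")             # HAC with average link
# Column 2 of Z holds the merge distances, in increasing order.
gaps = np.diff(Z[:, 2])
cut = Z[np.argmax(gaps), 2] + 1e-9           # cut just above the largest gap
labels = fcluster(Z, t=cut, criterion="distance")
print(labels)                                 # flat cluster label per point
```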

Comments on HAC. HAC is a convenient tool that often provides interesting views of a dataset; primarily, it can be viewed as an intuitively appealing clustering procedure for data analysis and exploration. We can create clusterings of different granularity by stopping at different levels of the dendrogram, and HAC is often used together with visualization of the dendrogram to decide how many clusters exist in the data. Different linkage methods (single, complete, and average) often lead to different solutions.