CS 2750 Machine Learning. Lecture 19: Clustering.

Lecture 19: Clustering
Milos Hauskrecht, milos@cs.pitt.edu, 539 Sennott Square

Clustering groups together similar instances in a data sample. The basic clustering problem: distribute the data into k different groups such that data points similar to each other end up in the same group. Similarity between data points is defined in terms of some distance metric, which can be chosen to fit the problem.

Clustering is useful for:
- Similarity/dissimilarity analysis: analyze which data points in the sample are close to each other.
- Dimensionality reduction: high-dimensional data are replaced with a group (cluster) label.

Clustering example: we see data points and want to partition them into groups. Which data points belong together?
[Figure: scatter plot of unlabeled 2-D data points.]

Clustering example: partitioning the data points into groups requires a distance measure that tells us which points are close to each other and therefore belong in the same group, e.g. the Euclidean distance.
[Figure: 2-D scatter plot with the Euclidean distance between two points marked.]

Clustering example: a set of patient cases that we want to partition into groups based on their similarities.

Patient #   Age   Sex   Heart rate   Blood pressure
Patient 1   55    M     85           5/8
Patient 2   6     M     87           3/85
Patient 3   67    F     8            6/86
Patient 4   65    F     9            3/9
Patient 5   7     M     84           35/85

How should we design the distance metric to quantify the similarities between these patients?

Distance measures. In general, one can choose an arbitrary distance measure. Properties of a distance measure, for data entries a, b:
- Positiveness: d(a,b) >= 0
- Symmetry: d(a,b) = d(b,a)
- Identity: d(a,a) = 0
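As a quick sanity check of these properties, here is a minimal Python sketch using the Euclidean distance as the metric d (the example vectors are made up):

```python
import numpy as np

def d(a, b):
    """Euclidean distance, used here as the example metric."""
    return float(np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

a, b = [34.5, 78.5, 89.0], [4.4, 66.3, 78.8]   # made-up 3-dimensional entries

assert d(a, b) >= 0          # positiveness
assert d(a, b) == d(b, a)    # symmetry
assert d(a, a) == 0          # identity
```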

Distance measures. Assume pure real-valued data points (a table of k-dimensional real-valued vectors). What distance metric should we use?
Euclidean distance: works for an arbitrary k-dimensional space:
d(a,b) = sqrt( sum_{i=1}^{k} (a_i - b_i)^2 )

Manhattan distance: also works for an arbitrary k-dimensional space:
d(a,b) = sum_{i=1}^{k} |a_i - b_i|
Etc.
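As a brief illustration, the two metrics computed on one pair of real-valued vectors (the values are illustrative only):

```python
import numpy as np

a = np.array([34.5, 78.5, 89.0, 9.2])   # illustrative k-dimensional points
b = np.array([4.4, 66.3, 78.8, 8.9])

euclidean = np.linalg.norm(a - b)           # sqrt( sum_i (a_i - b_i)^2 )
manhattan = np.linalg.norm(a - b, ord=1)    # sum_i |a_i - b_i|

print(f"Euclidean: {euclidean:.2f}, Manhattan: {manhattan:.2f}")
```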

Distance measures. Assume pure binary-valued data. What distance metric should we use?
Edit distance: the number of attributes that need to be changed to make the two entries the same. The same metric can be used for pure categorical values.
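A minimal sketch of this edit-style (Hamming) distance for binary or categorical records; the attribute values below are made up for the example:

```python
def edit_distance(a, b):
    """Number of attributes whose values differ between the two entries."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(edit_distance([1, 0, 1, 1], [1, 1, 1, 0]))                     # binary data -> 2
print(edit_distance(["M", "smoker", "A+"], ["F", "smoker", "O-"]))   # categorical data -> 2
```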

Distance measures. Combination of real-valued and categorical attributes, as in the patient table above. What distance metric should we use?
A weighted sum approach: e.g. a mix of the Euclidean and edit distances applied to the corresponding subsets of attributes (see the sketch below).

Clustering is useful for:
- Similarity/dissimilarity analysis: analyze which data points in the sample are close to each other.
- Dimensionality reduction: high-dimensional data are replaced with a group (cluster) label.
Problems:
- Picking the correct similarity measure (problem specific).
- Choosing the correct number of groups; many clustering algorithms require us to provide the number of groups ahead of time.
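Returning to the weighted-sum distance above, one possible sketch for records that mix real-valued and categorical attributes, in the spirit of the patient example; the field names, weights, and record values are assumptions made for illustration:

```python
import numpy as np

def mixed_distance(x, y, real_keys, cat_keys, w_real=1.0, w_cat=1.0):
    """Weighted sum of a Euclidean distance over the real-valued attributes
    and an edit distance over the categorical attributes."""
    a = np.array([x[k] for k in real_keys], dtype=float)
    b = np.array([y[k] for k in real_keys], dtype=float)
    d_real = np.linalg.norm(a - b)                 # Euclidean part
    d_cat = sum(x[k] != y[k] for k in cat_keys)    # edit-distance part
    return w_real * d_real + w_cat * d_cat

# hypothetical patient records (values invented for the example)
p1 = {"age": 55, "sex": "M", "heart_rate": 85}
p2 = {"age": 67, "sex": "F", "heart_rate": 80}

print(mixed_distance(p1, p2, real_keys=["age", "heart_rate"], cat_keys=["sex"]))
```

In practice the real-valued attributes would usually be rescaled first so that no single attribute dominates the weighted sum.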

Clustering algorithms.
Partitioning algorithms: the K-means algorithm is suitable only when data points have continuous values; groups are defined in terms of cluster centers (also called means). A refinement of the method to categorical values is K-medoids.
Probabilistic methods (with EM): latent variable models in which the class (cluster) is represented by a latent (hidden) variable value. Examples: mixture of Gaussians, Naive Bayes with a hidden class.
Hierarchical methods: agglomerative and divisive.

K-means algorithm:
- Initialize randomly k values of means (centers).
- Repeat two steps until the means no longer change:
  - Partition the data according to the current set of means (using the similarity measure).
  - Move each mean to the center of the data in its current partition.
Properties: K-means minimizes the sum of squared center-point distances over all clusters, and the algorithm always converges, though possibly to a local optimum.
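A compact sketch of the two-step K-means loop described above, written directly from this pseudocode with NumPy and the Euclidean distance (the toy data are made up):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic K-means: alternate assignment and mean updates until the means stop changing."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # 1) partition the data according to the current means
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2) move each mean to the center of the data in its partition
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
                              for j in range(k)])
        if np.allclose(new_means, means):   # stop when the means no longer change
            break
        means = new_means
    return labels, means

# toy data: two well-separated 2-D blobs
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, means = kmeans(X, k=2)
```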

K-means example.
[Figure: snapshots of a 2-D data set showing how the cluster assignments and means change over the K-means iterations.]

K-means algorithm properties: it converges to centers minimizing the sum of squared center-point distances (still a local optimum), and the result is sensitive to the initial values of the means.
Advantages: simplicity; generality (it can work with more than one distance measure).
Drawbacks: it can perform poorly with overlapping regions; it lacks robustness to outliers; it is suited only to attributes (features) with continuous values, since these allow us to compute cluster means.

Probabilistic (EM-based) algorithms. These are latent variable models; examples: Naive Bayes with a hidden class, mixture of Gaussians. Partitioning: a data point belongs to the class with the highest posterior probability (see the mixture-of-Gaussians sketch below).
Advantages: good performance on overlapping regions; robustness to outliers; data attributes can have different types of values.
Drawbacks: EM is computationally expensive and can take time to converge; the density model must be given in advance.

Hierarchical clustering. Uses an arbitrary similarity/dissimilarity measure. Typical similarity measures d(a,b):
- Pure real-valued data points: Euclidean, Manhattan, Minkowski distances.
- Pure binary-valued data: number of matching values.
- Pure categorical data: number of matching values.
- Combination of real-valued and categorical attributes: a weighted sum approach.
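Returning to the EM-based approach above, a minimal sketch of soft clustering with a mixture of Gaussians, using scikit-learn's GaussianMixture (assumed to be available); the hard partition assigns each point to its highest-posterior cluster:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# toy data: two overlapping 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.5, 1.0, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # parameters fit by EM
posteriors = gmm.predict_proba(X)   # soft assignments: P(cluster | data point)
labels = gmm.predict(X)             # hard partition: highest-posterior cluster
```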

Hierarchical clustering. Approach:
- Compute the dissimilarity matrix for all pairs of points, using standard or other distance measures.
- Construct clusters greedily:
  - Agglomerative approach: merge pairs of clusters in a bottom-up fashion, starting from singleton clusters.
  - Divisive approach: split clusters in a top-down fashion, starting from one complete cluster.
- Stop the greedy construction when some criterion is satisfied, e.g. a fixed number of clusters.

Cluster merging. In the greedy agglomerative approach, pairs of clusters are merged bottom-up, starting from singleton clusters, based on cluster (or linkage) distances, which are defined in terms of point distances. Examples:
- Min distance: d_min(C_i, C_j) = min_{p in C_i, q in C_j} ||p - q||
- Max distance: d_max(C_i, C_j) = max_{p in C_i, q in C_j} ||p - q||
- Mean distance: d_mean(C_i, C_j) = || (1/|C_i|) sum_{p in C_i} p - (1/|C_j|) sum_{q in C_j} q ||
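A short sketch of greedy agglomerative merging using SciPy's hierarchical-clustering routines (assumed to be available); the 'single', 'complete', and 'centroid' linkage options correspond to the min, max, and mean cluster distances above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.random.rand(20, 2)                        # toy 2-D data points

D = pdist(X)                                     # dissimilarities for all pairs of points
Z = linkage(D, method='single')                  # bottom-up merging with min (single) linkage
labels = fcluster(Z, t=3, criterion='maxclust')  # stop at a fixed number of clusters (3)
```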

Hierarchical clustering example.
[Figures: a 2-D scatter plot of example data points and the corresponding dendrogram produced by agglomerative merging.]
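A sketch of how such a dendrogram could be produced for a small data set, using SciPy and matplotlib (assumed to be available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.rand(12, 2)             # small toy data set
Z = linkage(X, method='average')      # agglomerative merging
dendrogram(Z)                         # merge heights reflect the linkage distances
plt.xlabel("data point index")
plt.ylabel("linkage distance")
plt.show()
```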

Hierarchical clustering.
Advantage: smaller computational cost; it avoids scanning all possible clusterings.
Disadvantage: the greedy choice fixes the order in which clusters are merged and cannot be repaired later.
Partial solution: combine hierarchical clustering with iterative algorithms like k-means, as in the sketch below.
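One way this combination could look: run agglomerative clustering first and use the resulting cluster centers to initialize k-means, which can then repair poor early merges. A sketch under these assumptions (the helper name is my own):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def kmeans_with_hierarchical_init(X, k, n_iter=50):
    """Initialize k-means with the centers of clusters found by agglomerative clustering."""
    Z = linkage(X, method='average')
    init_labels = fcluster(Z, t=k, criterion='maxclust')            # labels 1..k
    means = np.array([X[init_labels == j].mean(axis=0) for j in range(1, k + 1)])
    for _ in range(n_iter):                                         # standard k-means refinement
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
                              for j in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels, means

X = np.vstack([np.random.randn(40, 2), np.random.randn(40, 2) + 4])
labels, means = kmeans_with_hierarchical_init(X, k=2)
```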