Unsupervised Learning and Clustering

Supervised vs. Unsupervised Learning
Up to now we considered the supervised learning scenario, where we are given
1. samples $x_1, \dots, x_n$
2. class labels for all samples $x_1, \dots, x_n$
This is also called learning with a teacher, since the correct answer (the true class) is provided.
Today we consider the unsupervised learning scenario, where we are given only
1. samples $x_1, \dots, x_n$
This is also called learning without a teacher, since the correct answer is not provided. We do not split the data into training and test sets.

Clustering
Seek "natural" clusters in the data. What is a good clustering?
- internal (within-cluster) distances should be small
- external (between-cluster) distances should be large
Clustering is a way to discover new categories (classes).

What We Need for Clustering
1. A proximity measure, either a similarity measure $s(x_i, x_k)$, which is large if $x_i$ and $x_k$ are similar, or a dissimilarity (distance) measure $d(x_i, x_k)$, which is small if $x_i$ and $x_k$ are similar (large $d$ corresponds to small $s$, and vice versa)
2. A criterion function to evaluate a clustering
3. An algorithm to compute the clustering, for example by optimizing the criterion function

How Many Clusters?
Is it 3 clusters or 2 clusters? Possible approaches:
1. fix the number of clusters to $k$
2. find the best clustering according to the criterion function (the number of clusters may vary)

Proximity Measures
A good proximity measure is VERY application dependent. Clusters should be invariant under the transformations natural to the problem. For example, for object recognition we should have invariance to rotation. For character recognition, we want no invariance to rotation: rotating a '6' gives a '9'.

Distance (Dissimilarity) Measures
- Manhattan (city block) distance: $d(x_i, x_j) = \sum_{k=1}^{d} |x_i^{(k)} - x_j^{(k)}|$, an approximation to the Euclidean distance that is cheaper to compute
- Euclidean distance: $d(x_i, x_j) = \left[ \sum_{k=1}^{d} \left( x_i^{(k)} - x_j^{(k)} \right)^2 \right]^{1/2}$
- Chebyshev distance: $d(x_i, x_j) = \max_{1 \le k \le d} |x_i^{(k)} - x_j^{(k)}|$, an approximation to the Euclidean distance that is cheapest to compute
All three measures are translation invariant.
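
To make the three measures concrete, here is a minimal sketch in Python with NumPy; the function names are ours, not from the lecture:

```python
import numpy as np

def manhattan(xi, xj):
    # city block: sum of per-coordinate absolute differences
    return np.abs(xi - xj).sum()

def euclidean(xi, xj):
    # square root of the sum of squared per-coordinate differences
    return np.sqrt(((xi - xj) ** 2).sum())

def chebyshev(xi, xj):
    # largest per-coordinate absolute difference
    return np.abs(xi - xj).max()

xi, xj = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(manhattan(xi, xj), euclidean(xi, xj), chebyshev(xi, xj))  # 7.0 5.0 4.0
```

Note the ordering Chebyshev ≤ Euclidean ≤ Manhattan on any pair of points, which is why the two cheaper measures serve as bounds on the Euclidean distance.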

Similarity Measures
- Cosine similarity: $s(x_i, x_j) = \dfrac{x_i^T x_j}{\|x_i\| \, \|x_j\|}$. The smaller the angle between $x_i$ and $x_j$, the larger the similarity. It is a scale invariant measure, popular in text retrieval.
- Correlation coefficient: $s(x_i, x_j) = \dfrac{\sum_{k=1}^{d} \left( x_i^{(k)} - \bar{x}_i \right)\left( x_j^{(k)} - \bar{x}_j \right)}{\left[ \sum_{k=1}^{d} \left( x_i^{(k)} - \bar{x}_i \right)^2 \sum_{k=1}^{d} \left( x_j^{(k)} - \bar{x}_j \right)^2 \right]^{1/2}}$, popular in image processing.
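
A sketch of both measures in Python (the helper names are ours; NumPy's built-in corrcoef could replace the hand-written correlation):

```python
import numpy as np

def cosine_similarity(xi, xj):
    # inner product divided by the product of the vector norms
    return xi @ xj / (np.linalg.norm(xi) * np.linalg.norm(xj))

def correlation(xi, xj):
    # center each vector, then take the cosine of the centered vectors
    ci, cj = xi - xi.mean(), xj - xj.mean()
    return ci @ cj / np.sqrt((ci ** 2).sum() * (cj ** 2).sum())

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.1])
print(cosine_similarity(a, b))   # close to 1: nearly parallel vectors
print(correlation(a, b))         # close to 1: nearly linear relationship
print(np.corrcoef(a, b)[0, 1])   # NumPy's built-in agrees with correlation()
```

Scale invariance is easy to check: cosine_similarity(a, 10 * b) returns the same value as cosine_similarity(a, b).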

SSE Criterion Function
Let $n_i$ be the number of samples in cluster $D_i$, and define the mean of the samples in $D_i$ as $m_i = \frac{1}{n_i} \sum_{x \in D_i} x$. Then the sum-of-squared-errors criterion function (to minimize) is:
$$J_{SSE} = \sum_{i=1}^{c} \sum_{x \in D_i} \|x - m_i\|^2$$
Note that the number of clusters, $c$, is fixed.

SSE Criterion Function
The SSE criterion is appropriate when the data form compact clouds that are relatively well separated. It favors equally sized clusters, and may not be appropriate when the natural groupings have very different sizes.
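
A direct transcription of $J_{SSE}$ into Python (a sketch; the function name j_sse is ours, and labels is assumed to be an integer cluster assignment per sample):

```python
import numpy as np

def j_sse(X, labels):
    """Sum of squared distances from each sample to its cluster mean."""
    total = 0.0
    for i in np.unique(labels):
        Di = X[labels == i]          # samples in cluster D_i
        mi = Di.mean(axis=0)         # cluster mean m_i
        total += ((Di - mi) ** 2).sum()
    return total
```

Comparing j_sse for two candidate partitions of the same data is exactly the "criterion function" step from the earlier slide.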

Failure Example for $J_{SSE}$
Here the clustering with the smaller $J_{SSE}$ is not the natural one. The problem is that one of the natural clusters is not compact (it is an outer ring).

Other Minimum Variance Criterion Functions
Eliminating constant terms from $J_{SSE}$, we get an equivalent criterion function:
$$J_E = \sum_{i=1}^{c} n_i \bar{d}_i, \qquad \bar{d}_i = \frac{1}{n_i^2} \sum_{x \in D_i} \sum_{y \in D_i} \|x - y\|^2$$
where $\bar{d}_i$ is the average Euclidean distance between all pairs of samples in $D_i$. The equivalence (up to a constant factor) follows from the identity $\sum_{x \in D_i} \|x - m_i\|^2 = \frac{1}{2 n_i} \sum_{x \in D_i} \sum_{y \in D_i} \|x - y\|^2$. We can obtain other criterion functions by replacing $\|x - y\|^2$ with any other measure of distance between points in $D_i$. Alternatively, we can replace $\bar{d}_i$ by the median, maximum, etc. of the pairwise distances instead of the average.
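
A quick numeric check of that equivalence, reusing the j_sse function defined above (a sketch; by the identity, $J_E$ should equal exactly $2 \, J_{SSE}$):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
labels = rng.integers(0, 3, size=30)     # arbitrary 3-cluster assignment

def j_e(X, labels):
    total = 0.0
    for i in np.unique(labels):
        Di = X[labels == i]
        # average squared distance over all ordered pairs in D_i
        pair = ((Di[:, None, :] - Di[None, :, :]) ** 2).sum(axis=2)
        total += len(Di) * pair.mean()
    return total

print(j_e(X, labels), 2 * j_sse(X, labels))   # identical up to rounding
```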

Maximum Distance Criterion
Consider
$$J_{max} = \sum_{i=1}^{c} n_i \max_{x, y \in D_i} \|x - y\|^2$$
This solves the previous case (the outer ring). However, $J_{max}$ is not robust to outliers.

Iterative Optimization Algorithms
We now have both a proximity measure and a criterion function; we need an algorithm to find the optimal clustering. Exhaustive search is impossible, since there are approximately $c^n / c!$ possible partitions. Usually some iterative algorithm is used:
1. Find a reasonable initial partition.
2. Repeat: move samples from one group to another such that the objective function $J$ is improved (e.g., a move that lowers $J$ from 777,777 to 666,666).
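
To see why exhaustive search is hopeless, the $c^n / c!$ estimate can be evaluated exactly with Python integers (a sketch):

```python
from math import factorial

c = 3
for n in (10, 20, 100):
    # approximate number of ways to partition n samples into c clusters
    print(n, c**n // factorial(c))
# n = 100 already gives a number with 47 digits
```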

K-means Clustering
We now consider an example of an iterative optimization algorithm, for the special case of the $J_{SSE}$ objective function
$$J_{SSE} = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - m_i\|^2$$
For a different objective function we need a different optimization algorithm, of course. Fix the number of clusters to $k$ (that is, $c = k$). k-means is probably the most famous clustering algorithm; it has a smart way of moving from the current partitioning to the next one.

K-means Clustering (illustrated with k = 3)
1. Initialize: pick k cluster centers arbitrarily; assign each example to the closest center.
2. Compute the sample mean of each cluster.
3. Reassign all samples to the closest mean.
4. If clusters changed at step 3, go to step 2.
A minimal implementation is sketched below.
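
Here is a minimal NumPy sketch of the four steps; initializing the centers from k random samples is one common choice (the slide only says "arbitrarily"):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """A minimal k-means sketch. X: (n, d) array of samples."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k cluster centers arbitrarily (here: k distinct samples)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iters):
        # Steps 1/3: assign each sample to the closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when no sample changed cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: recompute the sample mean of each cluster
        for i in range(k):
            members = X[labels == i]
            if len(members) > 0:         # guard against an emptied cluster
                centers[i] = members.mean(axis=0)
    return labels, centers

labels, centers = kmeans(np.random.rand(200, 2), k=3)
```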

K-means Clustering
Each iteration of steps 2 and 3 either decreases $J_{SSE}$ or leaves the partition unchanged, and there are finitely many partitions; thus the algorithm converges after a finite number of iterations. However, the algorithm is not guaranteed to find a global minimum: with an unlucky initialization, 2-means can get stuck in a partition far from the global minimum of $J_{SSE}$.

Hierarchical Clustering
Up to now, we considered flat clustering. For some data, a hierarchical clustering is more appropriate than a flat clustering.

Hierarchical Clustering: Dendrogram
The preferred way to represent a hierarchical clustering is a dendrogram:
- a binary tree
- level $k$ corresponds to a partitioning with $n - k + 1$ clusters
- if we need $k$ clusters, we take the clustering from level $n - k + 1$
- if samples are in the same cluster at level $k$, they stay in the same cluster at all higher levels
- the dendrogram typically shows the similarity of the grouped clusters
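
In practice, dendrograms are rarely built by hand; this is a sketch using SciPy's hierarchical clustering utilities (the toy data is ours):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt

X = np.random.rand(20, 2)              # 20 toy samples in 2-D

# Build the binary merge tree; 'single' = minimum-distance linkage
Z = linkage(X, method='single')

dendrogram(Z)                          # merge heights show cluster similarity
plt.show()

# "If we need k clusters, take the clustering from the corresponding level"
labels = fcluster(Z, t=3, criterion='maxclust')   # cut into 3 clusters
```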

Example (dendrogram figure).

Hierarchical Clustering
Algorithms for hierarchical clustering can be divided into two types:
1. Agglomerative (bottom-up) procedures: start with $n$ singleton clusters and form the hierarchy by merging the most similar clusters.
2. Divisive (top-down) procedures: start with all samples in one cluster and form the hierarchy by splitting the worst clusters.

Divisive Hierarchical Clustering
Any flat algorithm that produces a fixed number of clusters can be used: set $c = 2$ and split recursively, as in the sketch below.
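
A sketch of this idea, reusing the kmeans() function from the earlier k-means sketch as the flat algorithm; max_depth and the stopping rules are our choices, not the lecture's:

```python
import numpy as np

def divisive(X, idx=None, depth=0, max_depth=3):
    """Top-down clustering: recursively split with a flat algorithm at c = 2."""
    if idx is None:
        idx = np.arange(len(X))
    if depth == max_depth or len(idx) < 2:
        return [idx]                        # leaf: one cluster of the hierarchy
    labels, _ = kmeans(X[idx], k=2)         # flat split into two clusters
    left, right = idx[labels == 0], idx[labels == 1]
    if len(left) == 0 or len(right) == 0:   # degenerate split: stop here
        return [idx]
    return (divisive(X, left, depth + 1, max_depth)
            + divisive(X, right, depth + 1, max_depth))
```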

Agglomerative Hierarchical Clustering
Initialize with each example in a singleton cluster. While there is more than one cluster:
1. find the 2 nearest clusters
2. merge them
Four common ways to measure the distance between clusters $D_i$ and $D_j$ (see the sketch after this list):
1. minimum distance: $d_{min}(D_i, D_j) = \min_{x \in D_i,\, y \in D_j} \|x - y\|$
2. maximum distance: $d_{max}(D_i, D_j) = \max_{x \in D_i,\, y \in D_j} \|x - y\|$
3. average distance: $d_{avg}(D_i, D_j) = \frac{1}{n_i n_j} \sum_{x \in D_i} \sum_{y \in D_j} \|x - y\|$
4. mean distance: $d_{mean}(D_i, D_j) = \|m_i - m_j\|$
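
A naive sketch of the merge loop supporting all four cluster distances (function names are ours; this recomputes pairwise distances at every step, so efficient implementations such as SciPy's linkage are preferable for real data):

```python
import numpy as np
from itertools import combinations

def cluster_distance(A, B, mode="min"):
    """Distance between clusters A and B (arrays of points)."""
    pair = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    if mode == "min":  return pair.min()    # minimum (single linkage)
    if mode == "max":  return pair.max()    # maximum (complete linkage)
    if mode == "avg":  return pair.mean()   # average pairwise distance
    if mode == "mean": return np.linalg.norm(A.mean(0) - B.mean(0))
    raise ValueError(mode)

def agglomerative(X, k, mode="min"):
    """Start with singletons; repeatedly merge the 2 nearest clusters."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(X[clusters[ij[0]]],
                                                   X[clusters[ij[1]]], mode))
        clusters[a] += clusters[b]          # merge the two nearest clusters
        del clusters[b]                     # b > a, so index a stays valid
    return clusters
```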

Single Linkage or Nearest Neighbor
Agglomerative clustering with the minimum distance $d_{min}(D_i, D_j) = \min_{x \in D_i,\, y \in D_j} \|x - y\|$
- generates a minimum spanning tree
- encourages growth of elongated clusters
- disadvantage: very sensitive to noise; a single noisy sample can bridge two natural clusters, so the clustering we get at the level with $c = 3$ may differ from the clustering we want

Complete Linkage or Farthest Neighbor
Agglomerative clustering with the maximum distance $d_{max}(D_i, D_j) = \max_{x \in D_i,\, y \in D_j} \|x - y\|$
- encourages compact clusters
- does not work well if elongated clusters are present: for elongated clusters $D_1$, $D_2$, $D_3$ we can have $d_{max}(D_1, D_2) < d_{max}(D_2, D_3)$, and thus $D_1$ and $D_2$ are merged instead of $D_2$ and $D_3$

Average and Mean Agglomerative Clustering
Agglomerative clustering is more robust under the average or the mean cluster distance:
$$d_{avg}(D_i, D_j) = \frac{1}{n_i n_j} \sum_{x \in D_i} \sum_{y \in D_j} \|x - y\|, \qquad d_{mean}(D_i, D_j) = \|m_i - m_j\|$$
The mean distance is cheaper to compute than the average distance. Unfortunately, there is not much to say about agglomerative clustering theoretically, but it does work reasonably well in practice.

Agglomerative vs. Divisive
- Agglomerative is faster to compute, in general.
- Divisive may be less blind to the global structure of the data: when taking the first step (a split), it has access to all the data and can find the best possible split into 2 parts.
- Agglomerative, when taking the first step (a merge), does not consider the global structure of the data; it only looks at pairwise structure.

First (?) Application of Clustering
John Snow, a London physician, plotted the locations of cholera deaths on a map during an outbreak in the 1850s. The locations indicated that cases were clustered around certain intersections where there were polluted wells, thus exposing both the problem and the solution. From: Nina Mishra, HP Labs

Application of Clustering: Astronomy
SkyCat: clustered $2 \times 10^9$ sky objects into stars, galaxies, quasars, etc., based on the radiation emitted in different spectral bands. From: Nina Mishra, HP Labs

Applications of Clustering: Image Segmentation
Find interesting objects in images to focus attention on. From: Image Segmentation by Nested Cuts, O. Veksler, CVPR 2000

Applications of Clustering: Image Database Organization
Organize an image database for efficient search.

Applications of Clustering: Data Mining
- Technology watch: the Derwent database contains all patents filed worldwide in the last 10 years. Searching by keywords leads to thousands of documents. Find clusters in the database to see whether any technologies are emerging and what the competition is up to.
- Marketing: in a customer database, find clusters of customers and tailor marketing schemes to them.

Applications of Clustering: Gene Expression Profiles
Cluster gene expression profiles: for similar expression patterns, expect similar function. Example data (accession number, gene name, eight expression measurements per gene):

U18675   4CL    -0.151 -0.207  0.126  0.359  0.208  0.091 -0.083 -0.209
M84697   a-tub   0.188  0.030  0.111  0.094 -0.009 -0.173 -0.119 -0.136
M95595   ACC2    0.000  0.041  0.000  0.000  0.000  0.000  0.000  0.000
X66719   ACO1    0.058  0.155  0.082  0.284  0.240  0.065 -0.159 -0.010
U41998   ACT     0.096 -0.019  0.070  0.137  0.089  0.038  0.096 -0.070
AF057044 ACX1    0.268  0.403  0.679  0.785  0.565  0.260  0.203  0.252
AF057043 ACX2    0.415  0.000 -0.053  0.114  0.296  0.242  0.090  0.230
U40856   AIG1    0.096 -0.106 -0.027 -0.026 -0.005 -0.052  0.054  0.006
U40857   AIG2    0.311  0.140  0.257  0.261  0.158  0.056 -0.049  0.058
AF123253 AIM1   -0.040  0.002 -0.202 -0.040  0.077  0.081  0.088  0.224
X92510   AOS     0.473  0.560  0.914  0.625  0.375  0.387  0.019  0.141

From: De Smet F., Mathys J., Marchal K., Thijs G., De Moor B. & Moreau Y. (2002). Adaptive quality-based clustering of gene expression profiles, Bioinformatics, 18(6), 735-746.

Applications of Clustering: Profiling Web Users
- Use web access logs to generate a feature vector for each user.
- Cluster users based on their feature vectors.
- Identify common goals for users: shopping, job seeking, product seeking, tutorial seeking.
- Clustering results can then be used to improve the content and design of the web site.

Summary
- Clustering (nonparametric unsupervised learning) is useful for discovering inherent structure in data.
- Clustering is immensely useful in different fields.
- Clustering comes naturally to humans (in up to 3 dimensions), but not so to computers.
- It is very easy to design a clustering algorithm, but very hard to say whether it does anything good.
- General-purpose clustering does not exist; for best results, clustering should be tuned to the application at hand.