APPLIED MACHINE LEARNING


Methods for Clustering: K-means, Soft K-means, DBSCAN 1

Objectives. Learn basic techniques for data clustering: K-means and soft K-means, GMM (next lecture), DBSCAN. Understand the issues and major challenges in clustering: the choice of metric and the choice of the number of clusters. 2

What is clustering? Clustering is a type of multivariate statistical analysis also known as cluster analysis, unsupervised classification analysis, or numerical taxonomy. Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters. Cluster: a collection of data objects that are similar to one another and can thus be treated collectively as one group. 3

Classification versus Clustering. Supervised Classification = Classification: we know the class labels and the number of classes. Unsupervised Classification = Clustering: we do not know the class labels and may not know the number of classes. 4

Classification versus Clustering. Unsupervised Classification = Clustering: a hard problem when no pair of objects has exactly the same features. We need to determine how similar two or more objects are to one another. 5

Which clusters can you create? Which two subgroups of pictures are similar and why? 6

Which clusters can you create? Which two subgroups of pictures are similar and why? 7

What is Good Clustering? A good clustering method produces high-quality clusters when: the intra-class (that is, intra-cluster) similarity is high; the inter-class similarity is low. The quality measure of a cluster depends on the similarity measure used! 8

Exercise: Person 1 with glasses, Person 1 without glasses, Person 2 without glasses, Person 2 with glasses. Intra-class similarity is the highest when: a) you choose to classify images with and without glasses, or b) you choose to classify images of person 1 against person 2. 9

Exercise: Person 1 with glasses, Person 1 without glasses, Person 2 without glasses, Person 2 with glasses. Projection onto the first two principal components after PCA. Intra-class similarity is the highest when: a) you choose to classify images with and without glasses, or b) you choose to classify images of person 1 against person 2. 10

Exercise: Person 1 with glasses, Person 1 without glasses, Person 2 without glasses, Person 2 with glasses. Projection onto e1 against e2. The eigenvector e1 is composed of a mix of the main characteristics of the two faces and is hence explanatory of both. However, since both faces have little in common, the two groups have different coordinates on e1, but quasi-identical coordinates for the glasses within each subgroup. Projecting onto e1 hence offers a means to compute a metric of similarity across the two persons. 11

Exercise: Person 1 with glasses, Person 1 without glasses, Person 2 without glasses, Person 2 with glasses. Projection onto e1 against e3. When projecting onto e1 and e3, we can separate the images of person 1 with and without glasses, as the eigenvector e3 embeds features distinctive primarily of person 1. 12

Exercise: Projection onto the first two principal components after PCA. Design a method to find the groups when you no longer have the class labels. 13

Sensitivity to Prior Knowledge. Outliers (noise) versus relevant data. Priors: the data cluster within a circle; there are 2 clusters. 14

Sensitivity to Prior Knowledge. Priors: the data follow a complex distribution; there are 3 clusters. 15

Cluster Types. K-means produces globular clusters; DBSCAN produces non-globular clusters. 16

What is Good Clustering? Requirements for good clustering: discovery of clusters with arbitrary shape; ability to deal with noise and outliers; insensitivity to the ordering of input records; scalability; ability to handle high dimensionality; interpretability and reusability. 17

How to cluster? What choice of model (circle, ellipse) for the cluster? How many models? 18

K-means Clustering. K-means clustering generates a number K of disjoint clusters so as to minimize
$J(\mu_1, \ldots, \mu_K) = \sum_{k=1}^{K} \sum_{i \in c_k} \| x^i - \mu_k \|^2$,
where $x^i$ is the $i$-th data point, $\mu_k$ the geometric centroid of cluster $c_k$, and $k$ the cluster label (or number). What choice of model (circle, ellipse) for the cluster? A circle. How many models? A fixed number: K = 2. Where to place them for optimal clustering? 19

K-means Clustering. Initialization: initialize at random the positions of the centers of the clusters. In MLDemos, centroids are initialized on one datapoint each, with no overlap across centroids. 20

K-means Clustering. Responsibility of cluster $k$ for point $x^i$:
$r_k^i = 1$ if $k = \arg\min_{k'} d(x^i, \mu_{k'})$, and $0$ otherwise
($x^i$: $i$-th data point; $\mu_k$: geometric centroid).
Assignment Step: calculate the distance from each data point to each centroid and assign the responsibility for each data point to its closest centroid. If a tie happens (i.e. two centroids are equidistant from a data point), one assigns the data point to the winning centroid with the smallest index. 21

K-means Clustering. Update Step (M-step): recompute the position of each centroid from the assignment of the points:
$\mu_k = \frac{\sum_i r_k^i x^i}{\sum_i r_k^i}$. 22

K-means Clustering. Assignment Step: calculate the distance from each data point to each centroid and assign the responsibility for each data point to its closest centroid, breaking ties in favor of the centroid with the smallest index. 23

K-means Clustering. Update Step (M-step): recompute the position of each centroid from the assignment of the points. Stopping Criterion: go back to step 2 and repeat the process until the clusters are stable. 24

K-means Clustering. The cluster boundaries pass through the intersection points equidistant from neighboring centroids: K-means creates a hard partitioning of the dataset. 25

Effect of the distance metric on K-means: L1-norm, L2-norm, L3-norm, L8-norm. 26
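
The effect of the norm can be tried directly. A minimal sketch (mine, not from the slides) comparing Lp (Minkowski) distances from an illustrative point to two illustrative centroids:

```python
import numpy as np

# Hypothetical point and centroids, chosen only to illustrate the metrics.
x = np.array([1.0, 2.0])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])

for p in (1, 2, 3, 8):
    # Lp (Minkowski) distance: (sum_d |x_d - mu_d|^p)^(1/p)
    d = np.sum(np.abs(centroids - x) ** p, axis=1) ** (1.0 / p)
    print(f"L{p}-norm distances to the two centroids: {d}")
```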

K-means Clustering: Algorithm.
1. Initialization: pick K arbitrary centroids and set their geometric means to random values (in MLDemos, centroids are initialized on one datapoint each, with no overlap across centroids).
2. Calculate the distance from each data point to each centroid.
3. Assignment Step (E-step): assign the responsibility for each data point to its closest centroid, $r_k^i = 1$ if $k = \arg\min_{k'} d(x^i, \mu_{k'})$ and $0$ otherwise. If a tie happens (i.e. two centroids are equidistant from a data point), one assigns the data point to the winning centroid with the smallest index.
4. Update Step (M-step): adjust the centroids to be the means of all data points assigned to them: $\mu_k = \frac{\sum_i r_k^i x^i}{\sum_i r_k^i}$.
5. Go back to step 2 and repeat the process until the clusters are stable. 27
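
A minimal NumPy sketch of steps 1-5 above (function and variable names are mine, not from the lecture; the initialization mimics the MLDemos convention of placing each centroid on a distinct datapoint):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Hard K-means on an (M, N) data matrix X; returns centroids and labels."""
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    # 1. Initialization: place each centroid on a distinct datapoint.
    mu = X[rng.choice(M, size=K, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # 2.-3. Assignment (E-step): distance of every point to every centroid;
        # argmin breaks ties in favor of the centroid with the smallest index.
        dist = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # 5. Stopping criterion: the clusters are stable.
        labels = new_labels
        # 4. Update (M-step): each centroid is the mean of its assigned points.
        for k in range(K):
            if np.any(labels == k):
                mu[k] = X[labels == k].mean(axis=0)
    return mu, labels
```

For example, `mu, labels = kmeans(X, K=2)` returns the two centroids and the hard assignment of each point.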

K-means Clustering. The K-means algorithm is a simple version of Expectation-Maximization applied to a model composed of isotropic Gaussian functions (see next lecture). 28

K-means Clustering: Properties. There are always K clusters. The clusters do not overlap (soft K-means relaxes this assumption, see the next slides). Each member of a cluster is closer to its own cluster than to any other cluster. The algorithm is guaranteed to converge in a finite number of iterations, but it converges to a local optimum! It is hence very sensitive to the initialization of the centroids. 29

Soft K-means Clustering. $r_k^i$: responsibility of cluster $k$ for point $x^i$:
$r_k^i = \frac{e^{-\beta d(\mu_k, x^i)}}{\sum_{k'} e^{-\beta d(\mu_{k'}, x^i)}}$, with $r_k^i \in [0, 1]$, normalized over the clusters: $\sum_k r_k^i = 1$.
Assignment Step (E-step): calculate the distance from each data point to each centroid; each data point $x^i$ is given a soft 'degree of assignment' to each of the means. 30

Soft K-means Clustering. Update Step (M-step): recompute the position of each centroid from the assignment of the points: $\mu_k = \frac{\sum_i r_k^i x^i}{\sum_i r_k^i}$. The model parameters, i.e. the means, are adjusted to match the weighted sample means of the data points they are responsible for. The update algorithm of soft K-means is identical to that of hard K-means, aside from the fact that the responsibilities to a particular cluster are now real numbers varying between 0 and 1. 31

Soft K-means Clustering. $\beta$ is the stiffness; $1/\beta$ measures the disparity across clusters: a small $\beta$ corresponds to a large $1/\beta$, and a large $\beta$ to a small $1/\beta$. 32
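
A sketch of one soft K-means iteration under these formulas, taking $d$ as the squared Euclidean distance (an assumption of mine; the slides leave the metric generic):

```python
import numpy as np

def soft_kmeans_step(X, mu, beta):
    """One E/M iteration; X is (M, N), mu is (K, N), beta is the stiffness."""
    # E-step: r_k^i proportional to exp(-beta * d(mu_k, x^i)).
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (M, K)
    d -= d.min(axis=1, keepdims=True)   # numerical stabilizer; cancels in the ratio
    r = np.exp(-beta * d)
    r /= r.sum(axis=1, keepdims=True)   # normalize so that sum_k r_k^i = 1
    # M-step: each mean becomes the weighted sample mean of all points.
    mu_new = (r[:, :, None] * X[:, None, :]).sum(axis=0) / r.sum(axis=0)[:, None]
    return mu_new, r
```

Iterating this step until `mu` stops moving recovers hard K-means in the limit of large `beta`.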

Soft K-means Clustering. The soft K-means algorithm with a small ($\beta = 1$, left), medium ($\beta = 5$, center) and large ($\beta = 10$, right) stiffness. 33

Soft K-means Clustering. Iterations of the soft K-means algorithm from the random initialization (left) to convergence (right), computed with $\beta = 10$. 34

(Soft) K-means Clustering: Properties. Advantages: computationally faster than other clustering techniques; produces tighter clusters, especially if the clusters are globular; guaranteed to converge. Drawbacks: does not work well with non-globular clusters; sensitive to the choice of initial partitions (different initial partitions can result in different final clusters); assumes a fixed number K of clusters. It is therefore good practice to run the algorithm several times using different K values, to determine the optimal number of clusters. 35


K-means Clustering: Weaknesses. Unbalanced clusters: K-means takes into account only the distance between the means and the data points; it has no representation of the variance of the data within each cluster. Elongated clusters: K-means imposes a fixed shape for each cluster (a sphere). 37

K-means Clustering: Weaknesses. Very sensitive to the choice of the number of clusters K and to the initialization. MLDemos example. 38

K-means: Limitations. Outliers (noise) versus relevant data: K-means would not be able to reject the outliers. 39

K-means: Limitations. K-means would not be able to reject outliers: K-means assigns all datapoints to a cluster, so outliers get assigned to the closest cluster. DBSCAN can detect outliers and can generate non-globular clusters. 40

Density-Based Spatial Clustering of Applications with Noise (DBSCAN). 1. Pick a datapoint at random. 2. Compute the number of datapoints within a distance ε. 3. If this number is smaller than mindata, mark the datapoint as an outlier. 4. Go back to 1. 41

Density-Based Spatial Clustering of Applications with Noise (DBSCAN). 1. Pick a datapoint at random. 2. Compute the number of datapoints within ε. 3. For each datapoint found, assign it to the same cluster (Cluster 1). 4. Go back to 1. 42

Density-Based Spatial Clustering of Applications with Noise (DBSCAN). 1. Pick a datapoint at random. 2. Compute the number of datapoints within ε. 3. For each datapoint found, assign it to the same cluster. 4. Merge two clusters if the distance between them is smaller than ε. 43

Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Hyperparameters: ε, the size of the neighborhood; mindata, the minimum number of datapoints. 44
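
A short sketch using scikit-learn's DBSCAN (one common implementation, not necessarily the one used in the lecture); `eps` plays the role of ε and `min_samples` that of mindata, and the data and values below are illustrative only:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.default_rng(0).normal(size=(100, 2))   # placeholder data
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_        # cluster index per point; -1 marks noise/outliers
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} clusters, {np.sum(labels == -1)} points flagged as noise")
```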

Comparison: K-means / DBSCAN.
Hyperparameters: K-means: K, the number of clusters. DBSCAN: ε, the size of the neighborhood, and mindata, the minimum number of datapoints.
Computational cost: K-means: O(K·M). DBSCAN: O(M·log(M)), with M the number of datapoints.
Type of cluster: K-means: globular. DBSCAN: non-globular (arbitrary shapes, nonlinear boundaries).
Robustness to noise: K-means: not robust. DBSCAN: robust to outliers within ε.
K-means is computationally cheap. However, it is not robust to noise and produces only globular clusters. DBSCAN is computationally more intensive, but it can detect noise automatically and produces clusters of arbitrary shape. Both K-means and DBSCAN depend on choosing the hyperparameters well; to determine the hyperparameters, use evaluation methods for clustering (next). 46

Evaluation of Clustering Methods. Clustering methods rely on hyperparameters: the number of clusters, the elements in each cluster, the distance metric. We need to determine the goodness of these choices. Clustering is unsupervised classification: we do not know the real number of clusters or the data labels, and it is difficult to evaluate these choices without ground truth. 47

Evaluation of Clustering Methods. Two types of measures: internal versus external. Internal measures rely on measures of similarity: (low) intra-cluster distance versus (high) inter-cluster distance. Internal measures are problematic, as the metric of similarity is often already optimized by the clustering algorithm. External measures rely on ground truth (class labels): given a (sub)set of known class labels, compute the similarity of the clusters to the class labels. In real-world data, it is hard or infeasible to gather ground truth. 48

Internal Measure: RSS. The Residual Sum of Squares (RSS) is an internal measure (available in MLDemos). It computes the distance (in norm 2) of each datapoint from its centroid, summed over all clusters:
$\mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in c_k} \| x - \mu_k \|^2$. 49
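
A direct transcription of this formula (the helper name is mine), given the data, the centroids, and the hard assignments:

```python
import numpy as np

def rss(X, mu, labels):
    # Sum over clusters of the squared L2 distances of points to their centroid.
    return sum(((X[labels == k] - mu[k]) ** 2).sum() for k in range(len(mu)))
```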

RSS for K-means. The goal of K-means is to find cluster centers $\mu_k$ which minimize the distortion, as measured by $\mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in c_k} \| x - \mu_k \|^2$. By increasing K we decrease the RSS, so what is the optimal K such that RSS $\to 0$? RSS = 0 when K = M: one has as many clusters as datapoints! (Example: M = 100 datapoints, N = 2 dimensions, K = M clusters, RSS = 0.) However, the RSS can still be used to determine an optimal K, by monitoring the slope of the decrease of the measure as K increases. 50

K-means Clustering: Examples. Procedure: run K-means while increasing the number of clusters monotonically; for each K, run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau. The optimal K is at the elbow of the curve. (Example: M = 100 datapoints, N = 2 dimensions, optimal K = 4 clusters.) 51
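
A sketch of this procedure, reusing the `kmeans()` and `rss()` helpers sketched earlier on a data matrix `X` (the ranges of K and of the restarts are illustrative):

```python
rss_curve = {}
for K in range(1, 11):                 # increase K monotonically
    best = float("inf")
    for seed in range(10):             # several initializations, keep the best run
        mu, labels = kmeans(X, K, seed=seed)
        best = min(best, rss(X, mu, labels))
    rss_curve[K] = best
print(rss_curve)                       # look for the elbow/plateau of this curve
```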

K-means with RSS: Examples. Cluster analysis of hedge funds (fonds spéculatifs) [N. Das, 9th Int. Conf. on Computing in Economics and Finance, 2011]. There is no legal definition of hedge funds; they consist of a wide category of investment funds with high risk and high returns, and a variety of strategies for guiding the investment. Research question: classify the type of hedge fund based on the information provided to the client. Data dimensions (features) such as: asset class, size of the hedge fund, incentive fee, risk level, and liquidity of the hedge fund. 52

K-means with RSS: Examples. Cluster analysis of hedge funds, continued [N. Das, 9th Int. Conf. on Computing in Economics and Finance, 2011]. Procedure: run K-means while increasing the number of clusters monotonically; run K-means with several initializations and take the best run; use the RSS measure (with a cutoff) against the number of clusters K to measure the improvement in clustering and determine a plateau. Optimal results are found with 7 clusters. 53

K-means Clustering: Examples. The elbow or plateau method for choosing the optimal K from the RSS curve can be unreliable for certain datasets: is the optimum at K = 2 or at K = 11? (M = 100 datapoints, N = 3 dimensions.) We don't know! We need an additional penalty or criterion! 54

Other Metrics to Evaluate Clustering Methods. AIC and BIC determine how well the model fits the dataset in a probabilistic sense (a maximum-likelihood measure). The measure is balanced by how many parameters are needed to get a good fit.
Akaike Information Criterion: $\mathrm{AIC} = -2 \ln L + 2B$.
Bayesian Information Criterion: $\mathrm{BIC} = -2 \ln L + B \ln M$.
L: maximum likelihood of the model; B: number of free parameters; M: number of datapoints.
As the number of datapoints (observations) increases, BIC assigns more weight to simpler models than AIC does. A low BIC implies either fewer explanatory variables, a better fit, or both; the $\ln M$ term penalizes the increase in computational cost due to the number of parameters and the number of datapoints. Choosing AIC versus BIC depends on the application: is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability than BIC, but BIC finds a computationally more efficient solution. 55

AIC for K-means. For the particular case of K-means, we do not have the maximum-likelihood estimate of the model required by $\mathrm{AIC} = -2 \ln(L) + 2B$ (L: likelihood of the model; B: number of free parameters). However, we can formulate a metric based on the RSS that penalizes model complexity (the number K of clusters), conceptually following AIC:
$\mathrm{AIC}_{\mathrm{RSS}} = \mathrm{RSS} + 2B$,
where 2 is the weighting factor and $B = K \cdot N$ is the number of free parameters (K: number of clusters; N: number of dimensions). 56

BIC for K-means. For the particular case of K-means, we do not have the maximum-likelihood estimate of the model required by $\mathrm{BIC} = -2 \ln(L) + \ln(M) B$. However, we can formulate a metric based on the RSS that penalizes model complexity (the number K of clusters and the number M of datapoints), conceptually following BIC:
$\mathrm{BIC}_{\mathrm{RSS}} = \mathrm{RSS} + \ln(M) B$,
where the weighting factor $\ln(M)$ penalizes with respect to the number of datapoints (i.e. computational complexity) and $B = K \cdot N$ is the number of free parameters (K: number of clusters; N: number of dimensions). 57
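
A sketch of both RSS-based scores, building on the `rss()` helper above; note that the factor 2 in the AIC variant follows the standard AIC convention and is my reading of the slide's weighting factor:

```python
import numpy as np

def aic_bic_rss(X, mu, labels):
    K, N = mu.shape
    M = X.shape[0]
    B = K * N                                   # number of free parameters
    r = rss(X, mu, labels)
    return r + 2 * B, r + np.log(M) * B         # (AIC_RSS, BIC_RSS)
```

Scanning K and taking the argmin of either score then replaces the visual elbow inspection.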

K-means Clustering: Examples. Procedure: run K-means while increasing the number of clusters monotonically; run K-means with several initializations and take the best run; use the AIC/BIC curves to find the optimal K, located at min(AIC) or min(BIC). Here both min(BIC) and min(AIC) give K = 2. (M = 100 datapoints, N = 3 dimensions, K = 2 clusters.) 58

BIC for K-means. $\mathrm{BIC}_{\mathrm{RSS}} = \mathrm{RSS} + \ln(M)(K \cdot N)$. Example: M = 100 datapoints, N = 2 dimensions, K = 14 clusters. 59

BIC for K-means. $\mathrm{BIC}_{\mathrm{RSS}} = \mathrm{RSS} + \ln(M)(K \cdot N)$. Example: M = 100 datapoints, N = 2 dimensions, K = 4 clusters. 60

AIC/BIC for DBSCAN. Compute the centroid of each cluster and apply the AIC/BIC of K-means. Example, for DBSCAN with large / medium / small ε:
RSS: 43 / 26 / 0.5
BIC: 42 / 34 / 78
AIC: 69 / 51 / 24. 61

AIC/BIC for DBSCAN. Compute the centroid of each cluster and apply the AIC/BIC of K-means. Example, for K-means / DBSCAN with large ε / medium ε / small ε:
RSS: 51 / 95 / 59 / 0.6
BIC: 65 / 118 / 88 / 331
AIC: 55 / 102 / 67 / 93. 62

Evaluation of Clustering Methods. Two types of measures: internal versus external. External measures assume that a subset of the datapoints has class labels (semi-supervised learning); they measure how well these datapoints are clustered. This requires an idea of the number of existing classes and some labeled datapoints. It is of interest mainly when labeling is highly time-consuming or the dataset is very large (e.g. in speech recognition). 63

Semi-Supervised Learning. Clustering F1-measure (careful: similar to, but not the same as, the F-measure we will see for classification!). It trades off clustering correctly all datapoints of the same class in the same cluster against making sure that each cluster contains points of only one class:
$F(C, K) = \sum_{c \in C} \frac{|c|}{M} \max_k F(c, k)$, with $F(c, k) = \frac{2 R(c, k) P(c, k)}{R(c, k) + P(c, k)}$, $R(c, k) = \frac{n_{c,k}}{|c|}$, $P(c, k) = \frac{n_{c,k}}{n_k}$.
M: number of labeled datapoints; C: the set of classes; K: number of clusters; $n_{c,k}$: number of members of class c in cluster k. 64

Semi-Supervised Learning. Worked example with labeled and unlabeled points from two classes (Class 1, Class 2). Recall is the proportion of datapoints correctly classified/clusterized, e.g. $R(c_1, k_1) = 2/4$; precision is the proportion of datapoints of the same class in the cluster, e.g. $P(c_1, k_1) = 2/6$. 65

Semi-Supervised Learning. Combining the per-class scores, $F(C, K)$ is the sum of $F(c_1, k_1)$ and $F(c_2, k_2)$, each class weighted by its fraction of labeled points (here 6 labeled points per class), giving $F(C, K) = 0.7$. For each class, the measure picks the cluster with the maximal F1-measure. 66

Summary of F1-Measure. Clustering F1-measure (careful: similar to, but not the same as, the F-measure we will see for classification!). It trades off clustering correctly all datapoints of the same class in the same cluster against making sure that each cluster contains points of only one class:
$F(C, K) = \sum_{c \in C} \frac{|c|}{M} \max_k F(c, k)$, with $F(c, k) = \frac{2 R(c, k) P(c, k)}{R(c, k) + P(c, k)}$.
$R(c, k) = \frac{n_{c,k}}{|c|}$ is the recall (proportion of datapoints of class c correctly clusterized) and $P(c, k) = \frac{n_{c,k}}{n_k}$ the precision (proportion of datapoints of the same class in the cluster). The weight $|c|/M$ accounts for the fraction of labeled points in each class, and for each class the measure picks the cluster with the maximal F1-measure. 67
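
A sketch of this measure (the names are mine): `y` holds the class labels of the labeled subset and `z` the cluster assignments of those same points:

```python
import numpy as np

def clustering_f1(y, z):
    y, z = np.asarray(y), np.asarray(z)
    M = len(y)
    total = 0.0
    for c in np.unique(y):
        in_c = (y == c)
        best = 0.0
        for k in np.unique(z):
            in_k = (z == k)
            n_ck = np.sum(in_c & in_k)
            if n_ck == 0:
                continue
            R = n_ck / in_c.sum()   # recall:    share of class c found in cluster k
            P = n_ck / in_k.sum()   # precision: share of cluster k drawn from class c
            best = max(best, 2 * R * P / (R + P))
        total += in_c.sum() / M * best   # weight each class by its labeled fraction
    return total
```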

Summary of Lecture. Introduced two clustering techniques, K-means and DBSCAN, and discussed their pros and cons in terms of computational time and power of representation (globular/non-globular clusters). Introduced metrics to evaluate clustering and help choose the hyperparameters: internal measures (RSS, AIC, BIC) and external measures (the F1-measure, also called the F-measure for clustering). Next week, practical on clustering: you will compare the performance of K-means and DBSCAN on your datasets and use the internal and external measures to assess these performances and choose the hyperparameters. 68

Robotic Application of a Clustering Method. Variety of hand postures when grasping objects: how can we generate the correct hand posture on robots? El-Khoury, S., Miao, L. and Billard, A. (2013) On the Generation of a Variety of Grasps. Robotics and Autonomous Systems Journal. 69

Robotic Application of a Clustering Method. A 4-DOF industrial hand (Barrett Technology) and a 9-DOF humanoid hand (iCub robot). Problem: choose the points of contact and generate a feasible posture for the fingers to touch the object at the correct points and with the desired force. Difficulty: high degrees of freedom (a large number of possible points of contact, a large number of DOFs to control). 70

Formulate the problem as constraint-based optimization: minimize the torques generated at the fingertips under constraints of force closure, kinematic feasibility, and collision avoidance. The nonconvex optimization yields several local / feasible solutions: from 1890 trials it converges to 791 feasible solutions in one case (taking ~12.14 s per solution) and to 612 feasible solutions in the other (taking ~2.65 s per solution). This takes far too long for a realistic application. 71

Apply K-means to all solutions and group them into clusters: 11 clusters / 20 clusters. 72

A. Shukla and A. Billard, NIPS 2012 73
