1 MACHINE LEARNING Kernels for Clustering: Kernel K-Means
Outline of Today's Lecture
1. Review the principle and steps of the K-Means algorithm.
2. Derive the kernel version of K-Means.
3. Exercise: discuss the geometrical partitioning of the space generated by RBF versus polynomial kernels.
3 K-means Clustering: Iterative Technique
[Figure: clusters $C_1$, $C_2$ in the $(x_1, x_2)$ plane]
1. Initialization: for clusters $C_k$, $k = 1, \ldots, K$, pick K arbitrary centroids $\mu_k$ and set them to random values.
4 K-means Clustering: E-step (expectation step)
$k^* = \arg\min_k d(x_i, \mu_k)$, where $x_i$ is the $i$-th data point and $\mu_k$ the geometric centroid of cluster $C_k$.
Assignment Step: compute the distance from each data point to each centroid and assign each data point to its closest centroid. If a tie occurs (i.e. two centroids are equidistant from a data point), assign the data point to the winning centroid with the smallest index.
5 K-means Clustering: M-step (maximization step)
Update Step: recompute the position of each centroid based on the assignment of the points:
$\mu_k = \frac{1}{m_k} \sum_{x_i \in C_k} x_i$, with $m_k = |C_k|$ the number of datapoints in cluster $C_k$.
6 K-means Clustering: E-step (expectation step)
$k^* = \arg\min_k d(x_i, \mu_k)$
Go back to the assignment step and repeat the update step.
7 K-means Clustering
Stopping Criterion: stop the process when the centers are stable.
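The full loop of the preceding slides (initialization, E-step, M-step, stopping criterion) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the lecture's reference implementation; the function name `kmeans` and the choice of initializing the centroids at K random datapoints are assumptions.

```python
import numpy as np

def kmeans(X, K, n_iter=100, rng=None):
    """Plain K-means on data X of shape (N, d) with K clusters."""
    rng = np.random.default_rng(rng)
    # Initialization: pick K arbitrary centroids (here: K random datapoints).
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # E-step: distance from every point to every centroid, then
        # assign each point to its closest centroid. np.argmin breaks
        # ties by the smallest index, i.e. the "smallest winning centroid".
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # M-step: recompute each centroid as the mean of its points.
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                           else mu[k] for k in range(K)])
        # Stopping criterion: stop when the centers are stable.
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return labels, mu
```

On two well-separated groups of points this converges to the expected hard partitioning in a handful of iterations.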
8 K-means Clustering
[Figure: intersection points of the cluster boundaries in the $(x_1, x_2)$ plane]
K-means creates a hard partitioning of the dataset.
9 K-means Clustering
[Figure: intersection points of the cluster boundaries in the $(x_1, x_2)$ plane]
K-Means clustering generates K disjoint clusters by minimizing the following quadratic cost function:
$J(\mu_1, \ldots, \mu_K) = \sum_{k=1}^{K} \sum_{x_i \in C_k} d(x_i, \mu_k)$, with $d(x_i, \mu_k) = \| x_i - \mu_k \|^2$
10 MACHINE LEARNING 01 K-means Clustering: Advantages
The algorithm is guaranteed to converge in a finite number of iterations (but it converges to a local optimum!).
It is computationally cheap and faster than most other clustering techniques: the update step is ~O(N).
11 K-means Clustering: Sensitivity
Very sensitive to the choice of the number of clusters K (hyperparameter) and the initialization.
12 K-means Clustering: Hyperparameters
K-means with the usual norm-2 distance performs linear separation. Changing the power p of the metric allows one to generate non-linear boundaries:
$d(x_i, \mu_k) = \| x_i - \mu_k \|_p$ ($L_p$ norm)
[Figure: partitionings of the space for p = 1, 2, 3, 4]
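As a concrete reading of the $L_p$ metric, the distance can be written directly (a small sketch; `lp_distance` is an illustrative name):

```python
import numpy as np

def lp_distance(x, mu, p=2):
    """Lp (Minkowski) distance between a point x and a centroid mu.
    p = 2 recovers the usual Euclidean K-means and linear boundaries;
    other values of p bend the cluster boundaries."""
    return np.sum(np.abs(x - mu) ** p) ** (1.0 / p)
```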
13 Kernel K-means
Idea: the objective function of K-means is composed of inner products across datapoints. One can replace the inner product with a kernel to perform the inner product in feature space.
Exploit the kernel principle to perform classical K-means clustering with the norm-2 in feature space:
This yields non-linear boundaries.
This retains the computational simplicity of linear K-means.
14 Kernel K-means
The K-Means algorithm minimizes the objective function:
$J(\mu_1, \ldots, \mu_K) = \sum_{k=1}^{K} \sum_{x_j \in C_k} \| x_j - \mu_k \|^2$, with $\mu_k = \frac{1}{m_k} \sum_{x_j \in C_k} x_j$ and $m_k$ the number of datapoints in cluster $C_k$.
Project into a feature space:
$J(\mu_1^\phi, \ldots, \mu_K^\phi) = \sum_{k=1}^{K} \sum_{x_j \in C_k} \| \phi(x_j) - \mu_k^\phi \|^2$
We cannot observe the mean in feature space. Construct the mean in feature space using the images of the points in the same cluster:
$\mu_k^\phi = \frac{1}{m_k} \sum_{x_j \in C_k} \phi(x_j)$
15 Kernel K-means
Objective function in feature space:
$J(\mu_1^\phi, \ldots, \mu_K^\phi) = \sum_{k=1}^{K} \sum_{x_j \in C_k} \left\| \phi(x_j) - \frac{1}{m_k} \sum_{x_l \in C_k} \phi(x_l) \right\|^2$
$= \sum_{k=1}^{K} \sum_{x_j \in C_k} \left[ k(x_j, x_j) - \frac{2}{m_k} \sum_{x_l \in C_k} k(x_j, x_l) + \frac{1}{m_k^2} \sum_{x_t, x_l \in C_k} k(x_t, x_l) \right]$
Every term involves only inner products $\phi(x)^T \phi(x') = k(x, x')$, so the objective can be computed from the kernel alone.
16 Kernel K-means
The Kernel K-means algorithm is also an iterative procedure:
1. Initialization: pick K clusters.
2. Assignment Step (E-step): assign each data point to its closest centroid in feature space:
$\arg\min_k d\!\left(\phi(x_i), \mu_k^\phi\right) = \arg\min_k \left[ k(x_i, x_i) - \frac{2}{m_k} \sum_{x_j \in C_k} k(x_i, x_j) + \frac{1}{m_k^2} \sum_{x_j, x_l \in C_k} k(x_j, x_l) \right]$
3. Update Step (M-step): update the list of points belonging to each centroid.
4. Go back to step 2 and repeat the process until the clusters are stable.
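Assuming the whole Gram matrix fits in memory, this iterative procedure can be sketched as follows (an illustrative sketch, not the lecture's reference code; the constant $k(x_i, x_i)$ term is dropped from the argmin since it is the same for every cluster):

```python
import numpy as np

def kernel_kmeans(K_mat, n_clusters, labels0=None, n_iter=100, rng=None):
    """Kernel K-means on a precomputed N x N Gram matrix K_mat.

    The squared feature-space distance from point i to the mean of
    cluster C_k is
        k(x_i, x_i) - (2/m_k) * sum_{j in C_k} K[i, j]
                    + (1/m_k^2) * sum_{j, l in C_k} K[j, l],
    where the k(x_i, x_i) term can be dropped from the argmin.
    """
    N = K_mat.shape[0]
    if labels0 is None:  # step 1: random initial assignment
        labels = np.random.default_rng(rng).integers(0, n_clusters, size=N)
    else:
        labels = np.asarray(labels0).copy()
    for _ in range(n_iter):
        dist = np.full((N, n_clusters), np.inf)
        for c in range(n_clusters):  # step 2: E-step in feature space
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster keeps infinite distance
            dist[:, c] = (-2.0 / m) * K_mat[:, mask].sum(axis=1) \
                         + K_mat[np.ix_(mask, mask)].sum() / m ** 2
        new_labels = dist.argmin(axis=1)  # step 3: update memberships
        if np.array_equal(new_labels, labels):  # stop: clusters stable
            break
        labels = new_labels  # step 4: repeat
    return labels
```

With a linear kernel $k(x, x') = x^T x'$ this reduces exactly to standard K-means.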
17 Kernel K-means: Exercise I
Metric used in Kernel K-means to determine the cluster assignment:
$\arg\min_k d\!\left(\phi(x_i), \mu_k^\phi\right) = \arg\min_k \left[ k(x_i, x_i) - \frac{2}{m_k} \sum_{x_j \in C_k} k(x_i, x_j) + \frac{1}{m_k^2} \sum_{x_j, x_l \in C_k} k(x_j, x_l) \right]$
a) Draw the partitioning of the space when using two datapoints in 2 dimensions with an RBF kernel and K = 2. Discuss the effect of the initialization.
b) Do a) with a homogeneous polynomial kernel with p = 2. Is the result affected by the placement of the datapoints?
23 Kernel K-means with an RBF kernel
$k(x_i, x_i)$ is a constant of value 1.
If $x_i$ is close to all points in cluster $C_k$, the term $\frac{1}{m_k} \sum_{x_j \in C_k} k(x_i, x_j)$ is close to 1.
If the points are well grouped in cluster $C_k$, the term $\frac{1}{m_k^2} \sum_{x_j, x_l \in C_k} k(x_j, x_l)$ is close to 1.
$\arg\min_k d\!\left(\phi(x_i), \mu_k^\phi\right) = \arg\min_k \left[ k(x_i, x_i) - \frac{2}{m_k} \sum_{x_j \in C_k} k(x_i, x_j) + \frac{1}{m_k^2} \sum_{x_j, x_l \in C_k} k(x_j, x_l) \right]$
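The three RBF terms can be made explicit in code (a small sketch with illustrative names `rbf` and `rbf_cluster_distance`): the $k(x, x)$ term is the constant 1, the cross term is the average similarity of $x$ to the cluster, and the last term measures how tightly the cluster is grouped.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian RBF kernel; note k(x, x) = 1 for every x."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def rbf_cluster_distance(x, cluster, sigma=1.0):
    """Squared feature-space distance from x to the mean of `cluster`."""
    m = len(cluster)
    # Close to 1 when x is close to all points of the cluster.
    cross = sum(rbf(x, c, sigma) for c in cluster) / m
    # Close to 1 when the points of the cluster are well grouped.
    within = sum(rbf(a, b, sigma) for a in cluster for b in cluster) / m ** 2
    return 1.0 - 2.0 * cross + within
```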
24 Kernel K-means: Exercise II
Metric used in Kernel K-means to determine the cluster assignment:
$\arg\min_k d\!\left(\phi(x_i), \mu_k^\phi\right) = \arg\min_k \left[ k(x_i, x_i) - \frac{2}{m_k} \sum_{x_j \in C_k} k(x_i, x_j) + \frac{1}{m_k^2} \sum_{x_j, x_l \in C_k} k(x_j, x_l) \right]$
i) Draw the partitioning of the space for the datapoints on the right figure when using an RBF kernel and K = 2.
ii) Discuss the effect of the kernel width.
iii) Discuss the effect of the initialization.
26 Kernel K-means: Exercise III
Metric used in Kernel K-means to determine the cluster assignment:
$\arg\min_k d\!\left(\phi(x_i), \mu_k^\phi\right) = \arg\min_k \left[ k(x_i, x_i) - \frac{2}{m_k} \sum_{x_j \in C_k} k(x_i, x_j) + \frac{1}{m_k^2} \sum_{x_j, x_l \in C_k} k(x_j, x_l) \right]$
a) Draw the partitioning of the space for these two clusters with an RBF kernel.
b) Discuss the effect of the number of datapoints in each cluster.
28 Kernel K-means with a polynomial kernel
$k(x_i, x_i)$ is a positive value.
A: Some of the terms change sign depending on the angle between the pair of datapoints. The relative effect of the terms depends on the position from the origin (norm).
B: If the points are aligned in the same quadrant, the sum is maximal.
$\arg\min_k d\!\left(\phi(x_i), \mu_k^\phi\right) = \arg\min_k \left[ k(x_i, x_i) - \frac{2}{m_k} \sum_{x_j \in C_k} k(x_i, x_j) + \frac{1}{m_k^2} \sum_{x_j, x_l \in C_k} k(x_j, x_l) \right]$
A datapoint will be assigned to the cluster in the closest partition.
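Points A and B can be checked numerically with a homogeneous polynomial kernel (a sketch; `poly_kernel` is an illustrative name). For odd p the kernel's sign follows the angle between the two points, while for even p the sign information is lost.

```python
import numpy as np

def poly_kernel(a, b, p=2):
    """Homogeneous polynomial kernel k(a, b) = (a . b)^p.
    The magnitude grows with the norms of a and b (position from the
    origin); for odd p the sign flips when the angle between a and b
    exceeds 90 degrees, for even p the sign is always non-negative."""
    return np.dot(a, b) ** p
```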
29 Kernel K-means: Exercise IV
Metric used in Kernel K-means to determine the cluster assignment:
$\arg\min_k d\!\left(\phi(x_i), \mu_k^\phi\right) = \arg\min_k \left[ k(x_i, x_i) - \frac{2}{m_k} \sum_{x_j \in C_k} k(x_i, x_j) + \frac{1}{m_k^2} \sum_{x_j, x_l \in C_k} k(x_j, x_l) \right]$
Draw the partitioning of the space when using 4 datapoints (K = 4) with a polynomial kernel. Is the result affected by the placement of the datapoints? What is the effect of the power p of the polynomial?
35 Kernel K-means: examples. RBF kernel, 2 clusters
36 Kernel K-means: examples. RBF kernel, 2 clusters
37 Kernel K-means: examples. RBF kernel, 2 clusters (kernel width 0.5 vs. kernel width 0.05)
38 Kernel K-means: Limitations
The choice of the number of clusters in Kernel K-means is important.
41 Limitations of Kernel K-means: Raw Data
42 Limitations of Kernel K-means: Kernel K-means with K = 2, RBF kernel
43 Summary
1. Kernel K-means follows the same principle as K-means. It is an iterative procedure, akin to Expectation-Maximization.
2. Like K-means, it depends on the initialization of the centers, which is random.
3. Like K-means, the solution depends on choosing the number of clusters K well.