ADVANCED MACHINE LEARNING. Kernel for Clustering: Kernel K-Means

1 MACHINE LEARNING. Kernel for Clustering: Kernel K-Means

2 Outline of Today's Lecture. 1. Review the principle and the steps of the K-means algorithm. 2. Derive the kernel version of K-means. 3. Exercise: discuss the geometrical division of the space generated by RBF versus polynomial kernels.

3 K-means Clustering: Iterative Technique. 1. Initialization: for clusters $C_k$, $k = 1, \dots, K$, pick K arbitrary centroids $\mu_k$ and set them to random values.

4 K-means Clustering: E-step (expectation step). Assignment step: calculate the distance from each data point to each centroid and assign each data point to its closest centroid, $k^* = \arg\min_k d(x^i, \mu_k)$, where $x^i$ is the $i$-th data point and $\mu_k$ the geometric centroid of cluster $C_k$. If a tie happens (i.e. two centroids are equidistant from a data point), assign the data point to the winning centroid with the smallest index.

5 K-means Clustering: M-step (maximization step). Update step: recompute the position of each centroid based on the assignment of the points: $\mu_k = \frac{1}{m_k} \sum_{x^i \in C_k} x^i$, where $m_k = |C_k|$ is the number of datapoints in cluster $C_k$.

6 K-means Clustering: E-step (expectation step). $k^* = \arg\min_k d(x^i, \mu_k)$, where $x^i$ is the $i$-th data point and $\mu_k$ the geometric centroid. Go back to the assignment step and repeat the update step.

7 K-means Clustering. Stopping criterion: stop the process when the centers are stable.
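The four steps above (initialization, assignment, update, stop when the centers are stable) can be sketched in a few lines of NumPy. This is a minimal sketch under my own simplifications (random data points as initial centroids, ties broken by argmin toward the smallest index), not the lecture's reference implementation:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means with the norm-2 distance (minimal sketch)."""
    rng = np.random.default_rng(seed)
    # Initialization: pick K arbitrary centroids (random data points here).
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        # E-step: assign each point to its closest centroid
        # (argmin breaks ties toward the smallest centroid index).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # M-step: recompute each centroid as the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        # Stopping criterion: the centers are stable.
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

On two well-separated blobs this converges in a handful of iterations; on harder data the result still depends on the random initialization, as the slides emphasize.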

8 K-means Clustering. [Figure: intersection points of the cluster boundaries in the $(x_1, x_2)$ plane.] K-means creates a hard partitioning of the dataset.

9 K-means Clustering. [Figure: intersection points of the cluster boundaries in the $(x_1, x_2)$ plane.] K-means clustering generates K disjoint clusters by minimizing the quadratic cost function $J(\mu_1, \dots, \mu_K) = \sum_{k=1}^{K} \sum_{x^i \in C_k} d(x^i, \mu_k)$, with $d(x^i, \mu_k) = \|x^i - \mu_k\|^2$.
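The quadratic cost J is straightforward to evaluate; a small helper (the function name is my own), assuming the same squared norm-2 distance:

```python
import numpy as np

def kmeans_cost(X, labels, centroids):
    """Quadratic K-means cost J: for each cluster k, sum the squared
    norm-2 distances of its assigned points to the centroid mu_k."""
    return sum(np.sum((X[labels == k] - c) ** 2)
               for k, c in enumerate(centroids))

# Two points at distance 1 on either side of a single centroid: J = 1 + 1 = 2.
X = np.array([[0.0, 0.0], [2.0, 0.0]])
print(kmeans_cost(X, np.array([0, 0]), np.array([[1.0, 0.0]])))  # 2.0
```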

10 MACHINE LEARNING 2012. K-means Clustering: Advantages. The algorithm is guaranteed to converge in a finite number of iterations (but it converges to a local optimum!). It is computationally cheap and faster than other clustering techniques; the update step is ~O(N).

11 K-means Clustering: Sensitivity. Very sensitive to the choice of the number of clusters K (a hyperparameter) and to the initialization.

12 K-means Clustering: Hyperparameters. K-means with the usual norm-2 distance performs linear separation. Changing the power p of the metric allows one to generate non-linear boundaries: $d(x^i, \mu_k; p) = \|x^i - \mu_k\|_p$ (the $L_p$ norm). [Figure: partitions obtained with p = 1, 2, 3, 4.]
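As a quick illustration of the $L_p$ metric (a sketch; the function name is my own), the same displacement is measured differently under different p, which is what bends the cluster boundaries:

```python
import numpy as np

def lp_distance(x, mu, p):
    """Lp-norm distance d(x, mu; p) = ||x - mu||_p used as the K-means metric."""
    return np.sum(np.abs(x - mu) ** p) ** (1.0 / p)

x = np.array([1.0, 1.0])
print(lp_distance(x, np.zeros(2), 1))  # L1 norm: 2.0
print(lp_distance(x, np.zeros(2), 2))  # L2 norm: sqrt(2) ~ 1.414
```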

13 Kernel K-means. Idea: the objective function of K-means is composed of inner products across datapoints. One can replace each inner product with a kernel, so that the inner product is performed in feature space. Exploit the kernel trick to perform classical K-means clustering with the norm-2 distance in feature space. This yields non-linear boundaries while retaining the computational simplicity of linear K-means.

14 Kernel K-means. The K-means algorithm minimizes the objective function $J(\mu_1, \dots, \mu_K) = \sum_{k=1}^{K} \sum_{x^j \in C_k} \|x^j - \mu_k\|^2$, with $\mu_k = \frac{1}{m_k} \sum_{x^j \in C_k} x^j$ and $m_k = |C_k|$ the number of datapoints in cluster $C_k$. Projecting into a feature space through a map $\phi$ gives $J(\mu_1^\phi, \dots, \mu_K^\phi) = \sum_{k=1}^{K} \sum_{x^j \in C_k} \|\phi(x^j) - \mu_k^\phi\|^2$. We cannot observe the mean in feature space. Construct the mean in feature space using the images of the points in the same cluster: $\mu_k^\phi = \frac{1}{m_k} \sum_{x^j \in C_k} \phi(x^j)$.

15 Kernel K-means. Expanding the square and replacing every inner product $\phi(x)^T \phi(x')$ by the kernel $k(x, x')$ yields $\|\phi(x^j) - \mu_k^\phi\|^2 = k(x^j, x^j) - \frac{2}{m_k} \sum_{x^l \in C_k} k(x^j, x^l) + \frac{1}{m_k^2} \sum_{x^l, x^t \in C_k} k(x^l, x^t)$, so the objective function in feature space is $J(\mu_1^\phi, \dots, \mu_K^\phi) = \sum_{k=1}^{K} \sum_{x^j \in C_k} \left[ k(x^j, x^j) - \frac{2}{m_k} \sum_{x^l \in C_k} k(x^j, x^l) + \frac{1}{m_k^2} \sum_{x^l, x^t \in C_k} k(x^l, x^t) \right]$.

16 Kernel K-means. The kernel K-means algorithm is also an iterative procedure: 1. Initialization: pick K clusters. 2. Assignment step (E-step): assign each data point to its closest cluster mean in feature space, $\arg\min_k d(x^i, C_k) = \arg\min_k \, k(x^i, x^i) - \frac{2}{m_k} \sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{m_k^2} \sum_{x^j, x^l \in C_k} k(x^j, x^l)$. 3. Update step (M-step): update the list of points belonging to each cluster. 4. Go back to step 2 and repeat the process until the clusters are stable.
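The iterative procedure above can be sketched in NumPy, working only on a precomputed Gram matrix so that no feature-space coordinates are ever needed. The anchor-point initialization is my own choice for the "pick K clusters" step; the lecture does not prescribe one:

```python
import numpy as np

def kernel_kmeans(Kmat, K, n_iter=100, seed=0):
    """Kernel K-means on a precomputed Gram matrix Kmat (minimal sketch).

    Distance to a cluster mean in feature space via the kernel trick:
      d(x_i, C_k) = k(x_i, x_i) - 2/m_k * sum_{j in C_k} k(x_i, x_j)
                    + 1/m_k^2 * sum_{j,l in C_k} k(x_j, x_l)
    """
    n = len(Kmat)
    rng = np.random.default_rng(seed)
    diag = np.diag(Kmat)
    # Initialization: pick K random points as one-point provisional clusters
    # and assign every point to the closest one (kernel distance).
    anchors = rng.choice(n, K, replace=False)
    labels = (diag[:, None] - 2 * Kmat[:, anchors]
              + diag[anchors][None, :]).argmin(axis=1)
    for _ in range(n_iter):
        D = np.empty((n, K))
        for k in range(K):
            idx = np.flatnonzero(labels == k)
            m = len(idx)
            if m == 0:              # empty cluster: never the argmin
                D[:, k] = np.inf
                continue
            D[:, k] = (diag
                       - 2.0 / m * Kmat[:, idx].sum(axis=1)
                       + Kmat[np.ix_(idx, idx)].sum() / m ** 2)
        new_labels = D.argmin(axis=1)   # E-step in feature space
        if np.array_equal(new_labels, labels):
            break                       # clusters are stable
        labels = new_labels             # M-step: update membership lists
    return labels
```

For example, feeding it the RBF Gram matrix of two tight, well-separated pairs of points with K=2 recovers the two pairs regardless of which anchors are drawn.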

17 Kernel K-means: Exercise I. Metric used in kernel K-means to determine the cluster assignment: $\arg\min_k d(x^i, C_k) = \arg\min_k \, k(x^i, x^i) - \frac{2}{m_k} \sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{m_k^2} \sum_{x^j, x^l \in C_k} k(x^j, x^l)$. a) Draw the partitioning of the space when using two datapoints in 2 dimensions with an RBF kernel and K=2. Discuss the effect of the initialization. b) Do a) with a homogeneous polynomial kernel with p=2. Is the result affected by the placement of the datapoints?

23 Kernel K-means. With an RBF kernel, the first term $k(x^i, x^i)$ is a constant of value 1. If $x^i$ is close to all points in cluster $C_k$, the average kernel value $\frac{1}{m_k} \sum_{x^j \in C_k} k(x^i, x^j)$ is close to 1. If the points are well grouped in the cluster, the double sum $\frac{1}{m_k^2} \sum_{x^j, x^l \in C_k} k(x^j, x^l)$ is also close to 1. Assignment: $\arg\min_k d(x^i, C_k) = \arg\min_k \, k(x^i, x^i) - \frac{2}{m_k} \sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{m_k^2} \sum_{x^j, x^l \in C_k} k(x^j, x^l)$.
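A quick numerical check of the two RBF claims above (the helper function and the toy points are my own illustration): $k(x, x)$ is always exactly 1, and the average kernel value between a point and a tight cluster it sits inside is close to 1, making the kernel distance to that cluster close to 0.

```python
import numpy as np

def rbf(a, b, width=1.0):
    """RBF kernel; k(x, x) = 1 for every x (the constant term on the slide)."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * width ** 2))

x = np.array([0.3, -0.7])
print(rbf(x, x))            # always exactly 1.0

# A tight cluster: the average kernel to its members is close to 1,
# so the kernel distance from xi to that cluster is close to 0.
cluster = np.array([[0.0, 0.0], [0.05, 0.0], [0.0, 0.05]])
xi = np.array([0.02, 0.02])
print(np.mean([rbf(xi, c) for c in cluster]))  # close to 1
```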

24 Kernel K-means: Exercise II. Metric used in kernel K-means to determine the cluster assignment: $\arg\min_k d(x^i, C_k) = \arg\min_k \, k(x^i, x^i) - \frac{2}{m_k} \sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{m_k^2} \sum_{x^j, x^l \in C_k} k(x^j, x^l)$. i) Draw the partitioning of the space for the datapoints in the figure on the right when using an RBF kernel and K=2. ii) Discuss the effect of the kernel width. iii) Discuss the effect of the initialization.

26 Kernel K-means: Exercise III. Metric used in kernel K-means to determine the cluster assignment: $\arg\min_k d(x^i, C_k) = \arg\min_k \, k(x^i, x^i) - \frac{2}{m_k} \sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{m_k^2} \sum_{x^j, x^l \in C_k} k(x^j, x^l)$. a) Draw the partitioning of the space for these two clusters with an RBF kernel. b) Discuss the effect of the number of datapoints in each cluster.

28 Kernel K-means. With a polynomial kernel, the first term $k(x^i, x^i)$ is a positive value. A: some of the terms change sign depending on the angle between the pair of datapoints; the relative effect of the terms depends on the position relative to the origin (the norm). B: if the points are aligned in the same quadrant, the sum is maximal. Assignment: $\arg\min_k d(x^i, C_k) = \arg\min_k \, k(x^i, x^i) - \frac{2}{m_k} \sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{m_k^2} \sum_{x^j, x^l \in C_k} k(x^j, x^l)$. A datapoint will be assigned to the cluster in the closest partition.

29 Kernel K-means: Exercise IV. Metric used in kernel K-means to determine the cluster assignment: $\arg\min_k d(x^i, C_k) = \arg\min_k \, k(x^i, x^i) - \frac{2}{m_k} \sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{m_k^2} \sum_{x^j, x^l \in C_k} k(x^j, x^l)$. Draw the partitioning of the space when using 4 datapoints (K=4) with a polynomial kernel. Is the result affected by the placement of the datapoints? What is the effect of the power p of the polynomial?

35 Kernel K-means: Examples. RBF kernel, 2 clusters.

36 Kernel K-means: Examples. RBF kernel, 2 clusters.

37 Kernel K-means: Examples. RBF kernel, 2 clusters. Kernel width 0.5 versus kernel width 0.05.

38 Kernel K-means: Limitations. The choice of the number of clusters in kernel K-means is important.

39 Kernel K-means: Limitations. The choice of the number of clusters in kernel K-means is important.

40 Kernel K-means: Limitations. The choice of the number of clusters in kernel K-means is important.

41 Limitations of kernel K-means. Raw data.

42 Limitations of kernel K-means. Kernel K-means with K=2, RBF kernel.

43 Summary. 1. Kernel K-means follows the same principle as K-means: it is an iterative procedure, akin to Expectation-Maximization. 2. As with K-means, it depends on the initialization of the centers, which is random. 3. As with K-means, the solution depends on choosing the number of clusters K well.