Clustering
Subhransu Maji
CMPSCI 689: Machine Learning
2 April 2015 / 7 April 2015


So far in the course
Supervised learning: learning with a teacher. The training data consisted of (feature, label) pairs, and the goal was to learn a mapping from features to labels.
Unsupervised learning: learning without a teacher. Only features, no labels.
Why is unsupervised learning useful?
- Discover hidden structures in the data: clustering (today's topic)
- Visualization: dimensionality reduction; lower-dimensional features might also help learning

Clustering
Basic idea: group together similar instances. Example: 2D points.
What could "similar" mean? One option: small squared Euclidean distance, $\mathrm{dist}(x, y) = \|x - y\|_2^2$.
Clustering results depend crucially on the measure of similarity (or distance) between the points to be clustered.
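As a minimal illustration of the dissimilarity measure above, the squared Euclidean distance between two feature vectors can be computed as follows (a NumPy sketch; the function name is just for illustration):

```python
import numpy as np

def squared_euclidean(x, y):
    """Squared Euclidean distance ||x - y||_2^2 between two feature vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.dot(d, d))

# Example with 2D points
print(squared_euclidean([0.0, 0.0], [3.0, 4.0]))  # 25.0
```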

Clustering algorithms
Simple clustering: organize elements into k groups
- K-means
- Mean shift
- Spectral clustering
Hierarchical clustering: organize elements into a hierarchy
- Bottom up: agglomerative
- Top down: divisive

Clustering examples
Image segmentation: break up the image into similar regions (image credit: Berkeley segmentation benchmark)
Clustering news articles
Clustering queries

Clustering examples
Clustering people by space and time (image credit: Pilho Kim)
Clustering species (phylogeny): phylogeny of canid species (dogs, wolves, foxes, jackals, etc.) [K. Lindblad-Toh, Nature 2005]

Clustering using k-means
Given $(x_1, x_2, \ldots, x_n)$, partition the n observations into $k$ ($\leq n$) sets $S = \{S_1, S_2, \ldots, S_k\}$ so as to minimize the within-cluster sum of squared distances. The objective is:
$$\arg\min_S \sum_{i=1}^{k} \sum_{x \in S_i} \|x - \mu_i\|^2$$
where $\mu_i$ is the cluster center of $S_i$.

Lloyd's algorithm for k-means
Initialize k centers by picking k points randomly among all the points.
Repeat till convergence (or a maximum number of iterations):
- Assign each point to the nearest center (assignment step)
- Estimate the mean of each group (update step)
Each step decreases the same objective above: the assignment step with the centers fixed, the update step with the assignments fixed.
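A minimal sketch of Lloyd's algorithm as described above, using NumPy (the function name, the random initialization, and the convergence test are illustrative choices, not from the slides):

```python
import numpy as np

def lloyd_kmeans(X, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k random points
    for _ in range(max_iters):
        # Assignment step: nearest center under squared Euclidean distance.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of each group (keep old center if a group is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged: partitions stopped changing
            break
        centers = new_centers
    return centers, labels
```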

k-means in action
(Animation: http://simplystatistics.org/2014/02/18/k-means-clustering-in-a-gif/)

Properties of Lloyd's algorithm
- Guaranteed to converge in a finite number of iterations: the objective decreases monotonically over time and reaches a local minimum once the partitions stop changing. Since there are only finitely many partitions, the k-means algorithm must converge.
- Running time per iteration: assignment step O(NKD); computing cluster means O(ND).
Issues with the algorithm:
- Worst-case running time is super-polynomial in the input size
- No guarantees about global optimality: finding the optimal clustering even for 2 clusters is NP-hard [Aloise et al., 09]

k-means++
A way to pick good initial centers. Intuition: spread out the k initial cluster centers. The algorithm proceeds as ordinary k-means once the centers are initialized.
k-means++ initialization [Arthur and Vassilvitskii 07] (a code sketch of this initialization appears below):
1. Choose one center uniformly at random among all the points.
2. For each point x, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)^2.
4. Repeat steps 2 and 3 until k centers have been chosen.
With this initialization, the expected clustering cost is within an O(log k) factor of optimal. (http://en.wikipedia.org/wiki/k-means%2b%2b)

k-means for image segmentation
Grouping pixels based on intensity similarity; feature space: intensity value (1D). Example segmentations with K=2 and K=3.
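A minimal sketch of the k-means++ initialization described above (NumPy; the D(x)^2 weighting follows the slide's description, everything else is an illustrative choice):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """Pick k initial centers: the first uniformly at random, each later one
    with probability proportional to D(x)^2, the squared distance to the
    nearest center chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]                      # step 1
    while len(centers) < k:                                  # steps 2-4
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2), axis=1)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])       # step 3
    return np.array(centers)

# These centers would then be passed to ordinary Lloyd iterations.
```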

Clustering using density estimation
One issue with k-means is that it is sometimes hard to pick k. The mean shift algorithm seeks modes (local maxima of density) in the feature space, and automatically determines the number of clusters.
Kernel density estimator:
$$K(x) = \frac{1}{Z} \sum_i \exp\left(-\frac{\|x - x_i\|^2}{h}\right)$$
A small bandwidth h implies more modes (a bumpier density estimate).

Mean shift algorithm
Mean shift procedure: for each point x, repeat till convergence:
- Compute the mean shift vector
$$m(x) = \frac{\sum_{i=1}^{n} x_i \, g\!\left(\frac{\|x - x_i\|^2}{h}\right)}{\sum_{i=1}^{n} g\!\left(\frac{\|x - x_i\|^2}{h}\right)} - x$$
- Translate the kernel (window) by m(x)
This simply follows the gradient of the density.
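A minimal sketch of the mean shift procedure above, using a Gaussian profile for g (NumPy; the bandwidth, stopping tolerance, and names are illustrative):

```python
import numpy as np

def mean_shift_point(x, X, h, tol=1e-5, max_iters=200):
    """Shift a single point x toward a mode of the kernel density estimate."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iters):
        w = np.exp(-((X - x) ** 2).sum(axis=1) / h)      # g(||x - x_i||^2 / h)
        m = (w[:, None] * X).sum(axis=0) / w.sum() - x   # mean shift vector m(x)
        x = x + m                                        # translate the window by m(x)
        if np.linalg.norm(m) < tol:
            break
    return x

# Running this from every data point and grouping points whose trajectories
# end near the same location yields the mean shift clustering.
```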

(Figure slides: successive mean shift iterations, showing the mean shift vector moving the window toward a mode.)

Mean shift clustering
Cluster: all data points in the attraction basin of a mode. The attraction basin is the region for which all trajectories lead to the same mode. Attraction basins correspond to clusters.

Mean shift for image segmentation
- Feature: L*u*v* color values
- Initialize windows at individual feature points
- Perform mean shift for each window until convergence
- Merge windows that end up near the same peak or mode
Mean shift clustering results: http://www.caip.rutgers.edu/~comanici/mspami/mspamiresults.html
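The segmentation procedure above could be sketched with off-the-shelf tools roughly as follows, assuming scikit-image and scikit-learn are available (the bandwidth value and the choice to cluster Luv pixel colors only, without spatial coordinates, are illustrative assumptions):

```python
import numpy as np
from skimage import color
from sklearn.cluster import MeanShift

def mean_shift_segment(rgb_image, bandwidth=8.0):
    """Segment an image by mean shift clustering of its L*u*v* pixel colors."""
    luv = color.rgb2luv(rgb_image)        # feature: L*u*v* color values
    features = luv.reshape(-1, 3)         # one feature vector per pixel
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    labels = ms.fit_predict(features)     # pixels attracted to the same mode share a label
    return labels.reshape(rgb_image.shape[:2])
```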

Mean shift discussion
Pros:
- Does not assume a shape on the clusters
- One parameter choice (window size)
- Generic technique
- Finds multiple modes
Cons:
- Selection of window size
- Rather expensive: O(DN^2) per iteration
- Does not work well for high-dimensional features
(slide credit: Kristen Grauman)

Spectral clustering
[Shi & Malik 00; Ng, Jordan, Weiss NIPS 01]
[Figures from Ng, Jordan, Weiss NIPS 01]
Group points based on the links in a graph. How do we create the graph? Put weights on the edges based on similarity between the points; a common choice is the Gaussian kernel:
$$W(i, j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
One could create a fully connected graph, or a k-nearest-neighbor graph (each node is connected only to its k nearest neighbors).
(slide credit: Alan Fern)
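A sketch of building such a similarity graph as a weight matrix, with both the fully connected Gaussian-kernel version and a k-nearest-neighbor sparsification (NumPy; sigma and k are free parameters here):

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Fully connected graph: W(i, j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)              # no self-edges
    return W

def knn_graph(W, k=3):
    """Keep only each node's k strongest edges, then symmetrize."""
    mask = np.zeros_like(W, dtype=bool)
    idx = np.argsort(-W, axis=1)[:, :k]   # indices of the k largest weights per row
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask | mask.T, W, 0.0)
```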

Graph cut
Consider a partition of the graph into two parts A and B. Cut(A, B) is the total weight of all edges that connect the two groups:
$$\mathrm{Cut}(A, B) = \sum_{i \in A,\, j \in B} W(i, j)$$
(= 0.3 in the example figure). An intuitive goal is to find a partition that minimizes the cut; min-cuts in graphs can be computed in polynomial time.

Problem with min-cut
The weight of a cut is proportional to the number of edges in the cut, so minimizing it tends to produce small, isolated components [Shi & Malik, 2000 PAMI]. We would like a balanced cut.

Graphs as matrices
Let W(i, j) denote the matrix of edge weights. The degree of node i in the graph is
$$d(i) = \sum_j W(i, j)$$
and the volume of a set A is defined as
$$\mathrm{Vol}(A) = \sum_{i \in A} d(i)$$

Normalized cut
Intuition: consider the connectivity between the groups relative to the volume of each group:
$$\mathrm{NCut}(A, B) = \frac{\mathrm{Cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{Cut}(A, B)}{\mathrm{Vol}(B)} = \mathrm{Cut}(A, B)\,\frac{\mathrm{Vol}(A) + \mathrm{Vol}(B)}{\mathrm{Vol}(A)\,\mathrm{Vol}(B)}$$
For a fixed cut weight this is minimized when Vol(A) = Vol(B), encouraging a balanced cut. Unfortunately, minimizing the normalized cut is NP-hard, even for planar graphs [Shi & Malik, 00].
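A small sketch of these quantities computed directly from a weight matrix W and index sets A and B (NumPy; the helper names and the toy graph are purely illustrative):

```python
import numpy as np

def cut(W, A, B):
    """Cut(A, B): total weight of edges crossing between the two groups."""
    return W[np.ix_(A, B)].sum()

def volume(W, A):
    """Vol(A): sum of the degrees d(i) = sum_j W(i, j) over nodes in A."""
    return W[A, :].sum()

def ncut(W, A, B):
    """Normalized cut: Cut/Vol(A) + Cut/Vol(B)."""
    c = cut(W, A, B)
    return c / volume(W, A) + c / volume(W, B)

# Example: a tiny 4-node graph split into A = {0, 1} and B = {2, 3}.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.2],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.2, 1.0, 0.0]])
print(ncut(W, [0, 1], [2, 3]))
```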

Solving normalized cuts
We will formulate an optimization problem. Let W be the similarity matrix; let D be a diagonal matrix with D(i, i) = d(i), the degree of node i; and let x be a vector in {1, -1}^N with x(i) = 1 iff i is in A. The matrix (D - W) is called the Laplacian of the graph. With some simplification, the problem of minimizing normalized cuts can be written as
$$\min_x \mathrm{NCut}(x) = \min_y \frac{y^T (D - W) y}{y^T D y} \quad \text{subject to } y^T D \mathbf{1} = 0,\; y(i) \in \{1, -b\}$$

Relax the integer constraint on y:
$$\min_y y^T (D - W) y \quad \text{subject to } y^T D y = 1,\; y^T D \mathbf{1} = 0$$
which is the same as the generalized eigenvalue problem
$$(D - W) y = \lambda D y$$
Note that $(D - W)\mathbf{1} = 0$, so the first eigenvector is $y_1 = \mathbf{1}$, with corresponding eigenvalue 0. The eigenvector corresponding to the second smallest eigenvalue is the solution to the relaxed problem.

Spectral clustering example
Data: Gaussian-weighted edges, each point connected to its 3 nearest neighbors.
The split is visible in the components of the eigenvector corresponding to the second smallest eigenvalue.
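A sketch of the relaxed solution: solve the generalized eigenproblem (D - W)y = λDy and split on the sign of the eigenvector for the second smallest eigenvalue (SciPy; thresholding at zero is just one of the threshold choices discussed on the next slide):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Two-way spectral partition from the second-smallest generalized eigenvector.
    Assumes every node has positive degree, so D is positive definite."""
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W                        # graph Laplacian
    # Generalized eigenvalue problem (D - W) y = lambda D y;
    # eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = eigh(L, D)
    second = eigvecs[:, 1]           # eigenvector for the 2nd smallest eigenvalue
    return second > 0                # boolean split: True = group A, False = group B
```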

Creating partitions from eigenvectors
The eigenvector is real valued, so to obtain a split you may threshold it. How to pick the threshold?
- Pick the median value
- Choose a threshold that minimizes the normalized cut objective
How to create multiple partitions?
- Recursively split each partition
- Compute multiple eigenvectors and cluster them using k-means
Example: multiple eigenvectors of an image and their gradients (http://ttic.uchicago.edu/~mmaire/papers/pdf/amfm_tpami2011.pdf)

Hierarchical clustering
Organize elements into a hierarchy. Two kinds of methods:
- Agglomerative: a bottom-up approach where elements start as individual clusters and clusters are merged as one moves up the hierarchy
- Divisive: a top-down approach where elements start as a single cluster and clusters are split as one moves down the hierarchy

Agglomerative clustering
First merge very similar instances, then incrementally build larger clusters out of smaller clusters.
Algorithm (a code sketch appears below):
- Maintain a set of clusters; initially, each instance is in its own cluster
- Repeat: pick the two closest clusters and merge them into a new cluster
- Stop when there is only one cluster left
Produces not one clustering, but a family of clusterings represented by a dendrogram.
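A minimal sketch of bottom-up agglomerative clustering using SciPy's hierarchical clustering routines (the example data and the chosen linkage method are illustrative; the `method` values correspond to the linkage options discussed on the next slide):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2D data: two loose groups of points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# Build the full merge tree (dendrogram); 'single' = closest pair,
# 'complete' = farthest pair, 'average' = average of all pairs.
Z = linkage(X, method='average')

# Cut the dendrogram to obtain a flat clustering with 2 clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```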

Agglomerative clustering
How should we define "closest" for clusters with multiple elements? Many options:
- Closest pair: single-link clustering
- Farthest pair: complete-link clustering
- Average of all pairs: average-link clustering
Different choices create different clustering behaviors.

Summary
Clustering is an example of unsupervised learning; it produces partitions or a hierarchy.
Several partitioning algorithms:
- k-means: simple, efficient, and often works in practice; k-means++ for better initialization
- mean shift: seeks modes of the density; slow, but suited to problems with an unknown number of clusters of varying shapes and sizes
- spectral clustering: clustering as graph partitioning; solve $(D - W)y = \lambda D y$ followed by k-means
Hierarchical clustering methods:
- Agglomerative or divisive
- Single-link, complete-link, and average-link

Slide credits
Slides adapted from David Sontag, Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein, James Hays, Alan Fern, and Tommi Jaakkola.
Many images are from the Berkeley segmentation benchmark: http://www.eecs.berkeley.edu/research/projects/cs/vision/bsds
Normalized cuts image segmentation: http://www.timotheecour.com/research.html