CS 2750: Machine Learning. Clustering. Prof. Adriana Kovashka University of Pittsburgh January 17, 2017

Similar documents
Segmentation and Grouping

Segmentation and Grouping April 19 th, 2018

CS 2770: Computer Vision. Edges and Segments. Prof. Adriana Kovashka University of Pittsburgh February 21, 2017

Outline. Segmentation & Grouping. Examples of grouping in vision. Grouping in vision. Grouping in vision 2/9/2011. CS 376 Lecture 7 Segmentation 1

CS 4495 Computer Vision. Segmentation. Aaron Bobick (slides by Tucker Hermans) School of Interactive Computing. Segmentation

Segmentation and Grouping April 21 st, 2015

Lecture: k-means & mean-shift clustering

Grouping and Segmentation

Lecture: k-means & mean-shift clustering

Segmentation. Bottom Up Segmentation

Segmentation & Grouping Kristen Grauman UT Austin. Announcements

Segmentation (continued)

The goals of segmentation

Applications. Foreground / background segmentation Finding skin-colored regions. Finding the moving objects. Intelligent scissors

Lecture 13: k-means and mean-shift clustering

CMPSCI 670: Computer Vision! Grouping

CSE 473/573 Computer Vision and Image Processing (CVIP) Ifeoma Nwogu. Lectures 21 & 22 Segmentation and clustering

Segmentation & Clustering

Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April April 2015

Lecture 7: Segmentation. Thursday, Sept 20

Segmentation Computer Vision Spring 2018, Lecture 27

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2

Grouping and Segmentation

Digital Image Processing

Segmentation and Grouping

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

Lecture 16 Segmentation and Scene understanding

Lecture 10: Semantic Segmentation and Clustering

Image Segmentation continued Graph Based Methods. Some slides: courtesy of O. Capms, Penn State, J.Ponce and D. Fortsyth, Computer Vision Book

Image Segmentation. Selim Aksoy. Bilkent University

Image Segmentation. Selim Aksoy. Bilkent University

Computer Vision Lecture 6

Based on Raymond J. Mooney s slides

Clustering. Discover groups such that samples within a group are more similar to each other than samples across groups.

Fitting: Voting and the Hough Transform April 23 rd, Yong Jae Lee UC Davis

Lecture 8: Fitting. Tuesday, Sept 25

Hierarchical Clustering 4/5/17

Announcements. Image Segmentation. From images to objects. Extracting objects. Status reports next Thursday ~5min presentations in class

Clust Clus e t ring 2 Nov

Clustering Color/Intensity. Group together pixels of similar color/intensity.

Computer Vision Lecture 6

Data Informatics. Seon Ho Kim, Ph.D.

CSE 5243 INTRO. TO DATA MINING

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22

Cluster Analysis. Ying Shen, SSE, Tongji University

What to come. There will be a few more topics we will cover on supervised learning

Image Analysis. Segmentation by Clustering

Human Body Recognition and Tracking: How the Kinect Works. Kinect RGB-D Camera. What the Kinect Does. How Kinect Works: Overview

Announcements. Segmentation & Grouping. Review questions. Outline. Grouping in vision. Examples of grouping in vision 9/21/2015

CSE 5243 INTRO. TO DATA MINING

2%34 #5 +,,% ! # %& ()% #% +,,%. & /%0%)( 1 ! # %& % %()# +(& ,.+/ +&0%//#/ &

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

Image Segmentation continued Graph Based Methods

Clustering algorithms

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

CS 188: Artificial Intelligence Fall 2008

Unsupervised learning in Vision

Clustering CS 550: Machine Learning

Unsupervised Learning Partitioning Methods

Cluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010

Image Segmentation. Marc Pollefeys. ETH Zurich. Slide credits: V. Ferrari, K. Grauman, B. Leibe, S. Lazebnik, S. Seitz,Y Boykov, W. Freeman, P.

ECS 289H: Visual Recognition Fall Yong Jae Lee Department of Computer Science

Dr. Ulas Bagci

Targil 12 : Image Segmentation. Image segmentation. Why do we need it? Image segmentation

Information Retrieval and Organisation

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Clustering. Partition unlabeled examples into disjoint subsets of clusters, such that:

Segmentation and low-level grouping.

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

University of Florida CISE department Gator Engineering. Clustering Part 2

Bias-Variance Trade-off (cont d) + Image Representations

Clustering CE-324: Modern Information Retrieval Sharif University of Technology

Clustering Results. Result List Example. Clustering Results. Information Retrieval

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

Idea. Found boundaries between regions (edges) Didn t return the actual region

Kernels and Clustering

Image Segmentation. Computer Vision Jia-Bin Huang, Virginia Tech. Many slides from D. Hoiem

Image Segmentation. Shengnan Wang

Image Segmentation. Luc Van Gool, ETH Zurich. Vittorio Ferrari, Un. of Edinburgh. With important contributions by

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan

10701 Machine Learning. Clustering

Unsupervised Learning and Clustering

Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

Data Clustering. Danushka Bollegala

PAM algorithm. Types of Data in Cluster Analysis. A Categorization of Major Clustering Methods. Partitioning i Methods. Hierarchical Methods

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction

Nonparametric Clustering of High Dimensional Data

Clustering part II 1

CS 343: Artificial Intelligence

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li


Lecture 11: E-M and MeanShift. CAP 5415 Fall 2007

Machine Learning (BSMC-GA 4439) Wenke Liu

Unsupervised Learning

From Pixels to Blobs

Introduction to Data Mining

Supervised Learning: Nearest Neighbors

Transcription:

CS 2750: Machine Learning Clustering Prof. Adriana Kovashka University of Pittsburgh January 17, 2017

What is clustering? Grouping items that belong together (i.e. have similar features) Unsupervised: we only use the features X, not the labels Y This is useful because we may not have any labels but we can still detect patterns

Summarizing data Why do we cluster? Look at large amounts of data Represent a large continuous vector with the cluster number Counting Computing feature histograms Prediction Data points in the same cluster may have the same labels Slide credit: J. Hays, D. Hoiem

Count Counting and classification via clustering Compute a histogram to summarize the data After clustering, ask a human to label each group (cluster) 1 cat 2 panda 3 giraffe Group ID

Image segmentation via clustering Separate image into coherent objects image human segmentation Source: L. Lazebnik

Unsupervised discovery??? sky sky sky driveway house? grass grass house truck fence grass house?? driveway driveway We don t know what the objects in red boxes are, but we know they tend to occur in similar context If features = the context, objects in red will cluster together Then ask human for a label on one example from the cluster, and keep learning new object categories iteratively Lee and Grauman, Object-Graphs for Context-Aware Category Discovery, CVPR 2010

Today and next class Clustering: motivation and applications Algorithms K-means (iterate between finding centers and assigning points) Mean-shift (find modes in the data) Hierarchical clustering (start with all points in separate clusters and merge) Normalized cuts (split nodes in a graph based on similarity)

pixel count Image segmentation: toy example 1 2 3 black pixels gray pixels white pixels input image intensity These intensities define the three groups. We could label every pixel in the image according to which of these primary intensities it is. i.e., segment the image based on the intensity feature. What if the image isn t quite so simple? Source: K. Grauman

pixel count pixel count Now how to determine the three main intensities that define our groups? We need to cluster. input image intensity input image intensity Source: K. Grauman

0 190 255 intensity 1 2 3 Goal: choose three centers as the representative intensities, and label every pixel according to which of these centers it is nearest to. Best cluster centers are those that minimize SSD between all points and their nearest cluster center c i : Source: K. Grauman

Clustering With this objective, it is a chicken and egg problem: If we knew the cluster centers, we could allocate points to groups by assigning each to its closest center. If we knew the group memberships, we could get the centers by computing the mean per group. Source: K. Grauman

K-means clustering Basic idea: randomly initialize the k cluster centers, and iterate between the two steps we just saw. 1. Randomly initialize the cluster centers, c 1,..., c K 2. Given cluster centers, determine points in each cluster For each point p, find the closest c i. Put p into cluster i 3. Given points in each cluster, solve for c i Set c i to be the mean of points in cluster i 4. If c i have changed, repeat Step 2 Properties Will always converge to some solution Can be a local minimum does not always find the global minimum of objective function: Source: Steve Seitz

Source: A. Moore

Source: A. Moore

Source: A. Moore

Source: A. Moore

Source: A. Moore

K-means converges to a local minimum Figure from Wikipedia

K-means clustering Java demo http://home.dei.polimi.it/matteucc/clustering/tutoria l_html/appletkm.html Matlab demo http://www.cs.pitt.edu/~kovashka/cs1699/kmeans_ demo.m

Time Complexity Let n = number of instances, d = dimensionality of the features, k = number of clusters Assume computing distance between two instances is O(d) Reassigning clusters: O(kn) distance computations, or O(knd) Computing centroids: Each instance vector gets added once to a centroid: O(nd) Assume these two steps are each done once for a fixed number of iterations I: O(Iknd) Linear in all relevant factors Adapted from Ray Mooney

Another way of writing objective K-means: Let r nk = 1 if instance n belongs to cluster k, 0 otherwise K-medoids (more general distances):

Distance Metrics Euclidian distance (L 2 norm): L L 1 norm: m 2 2 ( x, y) ( x i yi ) i 1 L1 ( x, y) m i 1 x i y i Cosine Similarity (transform to a distance by subtracting from 1): x y 1 x y Slide credit: Ray Mooney

Segmentation as clustering Depending on what we choose as the feature space, we can group pixels in different ways. Grouping pixels based on intensity similarity Feature space: intensity value (1-d) Source: K. Grauman

K=2 K=3 quantization of the feature space; segmentation label map Source: K. Grauman

Segmentation as clustering Depending on what we choose as the feature space, we can group pixels in different ways. Grouping pixels based on color similarity R=255 G=200 B=250 B G R=245 G=220 B=248 Feature space: color value (3-d) R R=15 G=189 B=2 R=3 G=12 B=2 Source: K. Grauman

K-means: pros and cons Pros Simple, fast to compute Converges to local minimum of within-cluster squared error Cons/issues Setting k? One way: silhouette coefficient Sensitive to initial centers Use heuristics or output of another method Sensitive to outliers Detects spherical clusters Adapted from K. Grauman

Today Clustering: motivation and applications Algorithms K-means (iterate between finding centers and assigning points) Mean-shift (find modes in the data) Hierarchical clustering (start with all points in separate clusters and merge) Normalized cuts (split nodes in a graph based on similarity)

Mean shift algorithm The mean shift algorithm seeks modes or local maxima of density in the feature space image Feature space (L*u*v* color values) Source: K. Grauman

Kernel density estimation Kernel Estimated density Adapted from D. Hoiem Data (1-D)

Mean shift Search window Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Search window Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Search window Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Search window Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Search window Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Search window Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Search window Center of mass Slide by Y. Ukrainitz & B. Sarel

Source: D. Hoiem Points in same cluster converge

Mean shift clustering Cluster: all data points in the attraction basin of a mode Attraction basin: the region for which all trajectories lead to the same mode Slide by Y. Ukrainitz & B. Sarel

Computing the Mean Shift Simple Mean Shift procedure: Compute mean shift vector Translate the Kernel window by m(x) Adapted from Y. Ukrainitz & B. Sarel

Mean shift clustering/segmentation Compute features for each point (intensity, word counts, etc) Initialize windows at individual feature points Perform mean shift for each window until convergence Merge windows that end up near the same peak or mode Source: D. Hoiem

Mean shift segmentation results http://www.caip.rutgers.edu/~comanici/mspami/mspamiresults.html

Mean shift segmentation results

Pros: Does not assume shape on clusters Robust to outliers Cons/issues: Mean shift Need to choose window size Expensive: O(I n 2 d) Search for neighbors could be sped up in lower dimensions * Does not scale well with dimension of feature space * http://lear.inrialpes.fr/people/triggs/events/iccv03/cdrom/iccv03/0456_georgescu.pdf

Mean-shift reading Nicely written mean-shift explanation (with math) http://saravananthirumuruganathan.wordpress.com/2010/04/01/introduction-to-mean-shiftalgorithm/ Includes.m code for mean-shift clustering Mean-shift paper by Comaniciu and Meer http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.8968 Adaptive mean shift in higher dimensions http://lear.inrialpes.fr/people/triggs/events/iccv03/cdrom/iccv03/0456_georgescu.pdf https://pdfs.semanticscholar.org/d8a4/d6c60d0b833a82cd92059891a6980ff54526.pdf Source: K. Grauman

Today Clustering: motivation and applications Algorithms K-means (iterate between finding centers and assigning points) Mean-shift (find modes in the data) Hierarchical clustering (start with all points in separate clusters and merge) Normalized cuts (split nodes in a graph based on similarity)

Hierarchical Agglomerative Clustering (HAC) Assumes a similarity function for determining the similarity of two instances. Starts with all instances in separate clusters and then repeatedly joins the two clusters that are most similar until there is only one cluster. The history of merging forms a binary tree or hierarchy. Slide credit: Ray Mooney

HAC Algorithm Start with all instances in their own cluster. Until there is only one cluster: Among the current clusters, determine the two clusters, c i and c j, that are most similar. Replace c i and c j with a single cluster c i c j Slide credit: Ray Mooney

Agglomerative clustering

Agglomerative clustering

Agglomerative clustering

Agglomerative clustering

Agglomerative clustering

distance Agglomerative clustering How many clusters? - Clustering creates a dendrogram (a tree) - To get final clusters, pick a threshold - max number of clusters or - max distance within clusters (y axis) Adapted from J. Hays

Cluster Similarity How to compute similarity of two clusters each possibly containing multiple instances? Single Link: Similarity of two most similar members. sim( c i, c j ) max x c, y i c j sim( x, y) Complete Link: Similarity of two least similar members. sim( c i, c j ) x c min, y i c j sim( x, y) Group Average: Average similarity between members. Adapted from Ray Mooney

Today Clustering: motivation and applications Algorithms K-means (iterate between finding centers and assigning points) Mean-shift (find modes in the data) Hierarchical clustering (start with all points in separate clusters and merge) Normalized cuts (split nodes in a graph based on similarity)

Images as graphs q p w pq w Fully-connected graph node (vertex) for every pixel link between every pair of pixels, p,q affinity weight w pq for each link (edge) w pq measures similarity» similarity is inversely proportional to difference (in color and position ) Source: Steve Seitz

Segmentation by Graph Cuts q p w pq w A B C Break Graph into Segments Want to delete links that cross between segments Easiest to break links that have low similarity (low weight) similar pixels should be in the same segments dissimilar pixels should be in different segments Source: Steve Seitz

Cuts in a graph: Min cut Link Cut A set of links whose removal makes a graph disconnected cost of a cut: cut( A, B) Find minimum cut gives you a segmentation fast algorithms exist for doing this B w p, q p A, q B Source: Steve Seitz

Minimum cut Problem with minimum cut: Weight of cut proportional to number of edges in the cut; tends to produce small, isolated components. [Shi & Malik, 2000 PAMI] Source: K. Grauman

Cuts in a graph: Normalized cut A B Normalize for size of segments: cut( A, B) w p, q p A, q B cut( A, B) assoc( A, V ) cut( A, B) assoc( B, V ) assoc(a,v) = sum of weights of all edges that touch A Ncut value small when we get two clusters with many edges with high weights within them, and few edges of low weight between them J. Shi and J. Malik, Normalized Cuts and Image Segmentation, CVPR, 1997 Adapted from Steve Seitz

How to evaluate performance? Might depend on application Purity where and is the set of clusters is the set of classes http://nlp.stanford.edu/ir-book/html/htmledition/evaluation-of-clustering-1.html

How to evaluate performance See Murphy Sec. 25.1 for another two metrics (Rand index and mutual information)

Clustering Strategies K-means Iteratively re-assign points to the nearest cluster center Mean-shift clustering Estimate modes Graph cuts Split the nodes in a graph based on assigned links with similarity weights Agglomerative clustering Start with each point as its own cluster and iteratively merge the closest clusters