An Introduction to PDF Estimation and Clustering


An Introduction to PDF Estimation and Clustering
David Corrigan, corrigad@tcd.ie
Electrical and Electronic Engineering Dept., University of Dublin, Trinity College.
See www.sigmedia.tv for more information.

PDF Estimation
Quantify the characteristics of a signal, x[n], by measuring its PDF, p(x_n = x).
Ubiquitous in signal processing applications: image segmentation, restoration, texture synthesis.
[Figure: estimated PDF of image intensity (probability vs. intensity).]

PDF Estimation
Estimators fall into two categories:
1. Parametric Estimation: a known model for the PDF is fitted to the data (e.g. a Gaussian distribution for a noise signal). The PDF is then represented by the parameters (mean, variance, etc.).
2. Non-Parametric Estimation: no model is assumed for the PDF; the PDF is estimated directly from measurements of the signal.
A correct parametric model gives a better estimate from less data than non-parametric techniques.
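To make the parametric case concrete, here is a minimal sketch (assuming NumPy and a 1D noise signal; the sample data and variable names are purely illustrative) of fitting a Gaussian model by estimating its mean and variance:

```python
import numpy as np

# Illustrative 1D "noise signal"; in practice x[n] would be measured data.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# Parametric estimation: assume a Gaussian model and estimate its parameters.
mu = x.mean()
var = x.var()

# The PDF is now fully represented by the two parameters (mu, var).
def gaussian_pdf(t, mu, var):
    return np.exp(-(t - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

print(mu, var, gaussian_pdf(2.0, mu, var))
```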

Non-Parametric Estimators
The best known estimator is the histogram, which finds the frequency (and hence probability) of a signal value lying in a range.
[Figures: histogram with bin width of 1; histogram with bin width of 5.]
Histograms are poor if they are not adequately populated. One can increase the bin width or smooth the histogram.
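As a quick illustration, a minimal sketch of a histogram-based PDF estimate (assuming NumPy; the intensity data, bin width and variable names are illustrative):

```python
import numpy as np

# Illustrative 8-bit intensity samples.
rng = np.random.default_rng(1)
x = rng.normal(128, 30, size=5000).clip(0, 255)

bin_width = 5                                    # a wider bin helps sparsely populated data
bins = np.arange(0, 256 + bin_width, bin_width)
counts, edges = np.histogram(x, bins=bins)

# Normalise the counts so the estimate behaves as a discrete probability.
p_hat = counts / counts.sum()
print(p_hat.max(), edges[p_hat.argmax()])
```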

Kernel Density Estimation
Another non-parametric estimator. Given a signal x[n], the PDF is

p(x = x) = \frac{1}{Nh} \sum_{i=1}^{N} K\left(\frac{x - x[i]}{h}\right)    (1)

where K(x) is the kernel and h is the bandwidth. Common kernels include the Gaussian kernel and the Epanechnikov kernel

K(x) = k(1 - \lVert x \rVert^2) if \lVert x \rVert^2 < 1, and 0 otherwise    (2)

[Figure: the Epanechnikov kernel.]
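A minimal 1D sketch of equation (1) using the Epanechnikov kernel (assuming NumPy; the normalising constant k = 3/4, chosen so the kernel integrates to 1, and the sample data are assumptions for illustration):

```python
import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel: k(1 - u^2) for |u| < 1, 0 otherwise (here k = 3/4).
    return np.where(np.abs(u) < 1, 0.75 * (1 - u ** 2), 0.0)

def kde(x_query, x_data, h, kernel=epanechnikov):
    # Kernel density estimate, equation (1): p(x) = (1/Nh) sum_i K((x - x[i]) / h).
    N = len(x_data)
    u = (x_query - x_data) / h
    return kernel(u).sum() / (N * h)

# Estimate the PDF at a single signal value from illustrative samples.
rng = np.random.default_rng(2)
samples = rng.normal(128, 30, size=2000)
print(kde(100.0, samples, h=5.0))
```

Note how every sample contributes to each evaluation, which is the O(N)-per-query cost mentioned on the next slide.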

Kernel Density Estimation
A kernel density estimate is visually comparable to a smoothed histogram (but quite different in concept).
The bandwidth controls the smoothness of the KDE.
The PDF can be estimated at any signal value (real or complex), so we don't need to worry about quantising the signal or choosing bin widths.
It is slow: O(N) to estimate the PDF at a single signal value.

Gaussian Mixture Models (GMMs)
The PDF is a weighted sum of Gaussian distributions (which can be multivariate for vector-valued signals):

p(x = x) = \sum_{k=1}^{K} \pi(k) N(x; \mu(k), R(k))    (3)

The model has K components. \pi(k) is the weight of each Gaussian, such that \sum_k \pi(k) = 1. \mu(k) and R(k) are the mean and variance (or covariance) of the k-th component.
To create the GMM certain questions need to be answered:
1. How many clusters do we choose?
2. What are the initial estimates for the weights, means and variances?
3. How do we make sure that we have the optimum model for our data?
To answer these questions we need to talk a bit about clustering.
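A minimal sketch of evaluating the mixture density in equation (3), restricted to the univariate case for readability (the weights, means and variances below are hypothetical):

```python
import numpy as np

def gaussian(x, mu, var):
    # Univariate normal density N(x; mu, var).
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def gmm_pdf(x, weights, means, variances):
    # Equation (3): p(x) = sum_k pi(k) N(x; mu(k), R(k)).
    return sum(w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances))

# A hypothetical 3-component mixture; the weights must sum to 1.
weights = [0.5, 0.3, 0.2]
means = [0.0, 4.0, 8.0]
variances = [1.0, 0.5, 2.0]
print(gmm_pdf(3.5, weights, means, variances))
```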

Clustering
Clustering involves partitioning the set of signal values into subsets of similar values.
Used in signal modelling, segmentation, vector quantisation, image compression, ...
Consider the following 2D vector-valued signal:
[Figures: vector-valued signal d(x) = (dx(x), dy(x)); scatter plot of d(x).]

k-means Clustering
An algorithm that divides the data into an arbitrary number of clusters, K. The algorithm attempts to minimise the total distance, V, between each data value x and its cluster centroid:

V = \sum_{k=1}^{K} \sum_{j \in C_k} \lVert x_j - c_k \rVert^2,    where C_k is the k-th cluster    (4)

k-means operates as follows:
1. The user selects the number of clusters and assigns a value to each cluster centroid.
2. Every data point is assigned to the cluster of the nearest centroid. This partitions the data into clusters.
3. The centroids of the new clusters are estimated (by computing the mean data value of each cluster).
4. Steps 2 and 3 are iterated until the centroid values have suitably converged.
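A minimal NumPy sketch of these four steps (it assumes randomly chosen data points as the initial centroids and that no cluster becomes empty; in practice a library routine such as Matlab's kmeans would normally be used):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    # X is an (N, d) array of data points.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]  # step 1: initial centroids
    for _ in range(n_iter):
        # Step 2: assign every point to the cluster of the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: re-estimate each centroid as the mean of its cluster
        # (assumes no cluster has become empty).
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # Step 4: stop once the centroids have suitably converged.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```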

k-means Clustering
Nice demo: http://home.dei.polimi.it/matteucc/clustering/tutorial_html/appletkm.html
Matlab has its own kmeans function.
[Figures: scatter plot of d(x); partitioned scatter plot.]
The related Fuzzy C-Means algorithm allows data points to belong to more than one cluster.

Back to GMMs
How many components should we pick? Usually an arbitrary number, but mean shift or watershed algorithms could be used (see later).
How do we estimate the initial GMM parameter values? Use a clustering algorithm like k-means to get a rough clustering of the data set. The mean and covariance of each component are estimated from the corresponding cluster, and the component weight is the fraction of the overall number of data points that lie in the cluster.
How do we get the model parameters to best fit the data? By using the Expectation Maximisation (EM) algorithm.
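A minimal sketch of this initialisation, reusing the labels produced by the k-means sketch above (the function name is illustrative):

```python
import numpy as np

def init_gmm_from_kmeans(X, labels, K):
    # X is (N, d); labels assign each row of X to a cluster 0..K-1.
    weights, means, covs = [], [], []
    for k in range(K):
        Xk = X[labels == k]
        weights.append(len(Xk) / len(X))        # fraction of data points in the cluster
        means.append(Xk.mean(axis=0))           # cluster mean
        covs.append(np.cov(Xk, rowvar=False))   # cluster covariance
    return np.array(weights), np.array(means), np.array(covs)
```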

Expectation Maximisation
EM finds the maximum likelihood estimates of the model parameters. The algorithm has two steps.
1. The E-Step: the current model parameters are used to cluster the data, assigning each value to the component with the maximum likelihood:

\hat{k}(x) = \arg\max_k \; \pi(k) N(x; \mu(k), R(k))    (5)

2. The M-Step: from the data set and the clustered data, the new parameter values are estimated. Given the data, find the model parameters that best fit the clustering obtained from the E-Step. For a GMM this simplifies to estimating the mean and covariance of the data points in each cluster; again, the weights are the fractions of points in each cluster.
The parameters are optimised by alternating between the two steps until they converge.
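A minimal sketch of this procedure as described above, using the hard cluster assignment of equation (5) in the E-step (the standard EM formulation for GMMs uses soft responsibilities instead, but the alternating structure is the same; the sketch assumes every cluster keeps enough points to fit a covariance):

```python
import numpy as np

def mvn_pdf(X, mu, cov):
    # Multivariate normal density N(x; mu, cov) evaluated at each row of X.
    d = len(mu)
    diff = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)) / norm

def em_gmm(X, weights, means, covs, n_iter=50):
    K = len(weights)
    weights, means, covs = list(weights), list(means), list(covs)
    for _ in range(n_iter):
        # E-step: hard maximum-likelihood assignment, equation (5).
        scores = np.stack([w * mvn_pdf(X, m, c)
                           for w, m, c in zip(weights, means, covs)], axis=1)
        labels = scores.argmax(axis=1)
        # M-step: refit weight, mean and covariance of each component from its cluster.
        for k in range(K):
            Xk = X[labels == k]
            weights[k] = len(Xk) / len(X)
            means[k] = Xk.mean(axis=0)
            covs[k] = np.cov(Xk, rowvar=False)
    return np.array(weights), np.array(means), np.array(covs)
```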

Expectation Maximisation
The algorithm is broadly similar to k-means clustering: both algorithms have a clustering stage followed by a parameter estimation stage. In k-means the Euclidean distance from the centroid is used; in EM, the distance from the centroid (i.e. the mean) is normalised by the component covariance and weight.
Nice demo applet here: http://lcn.epfl.ch/tutorial/english/gaussian/html/

Mean Shift
Mean shift clusters the data by finding the peaks of the kernel density estimate. The number of clusters is determined automatically.
Recall the equation for the KDE:

f(x) = \frac{1}{Nh} \sum_{i=1}^{N} K\left(\frac{x - x[i]}{h}\right)    (6)

At a peak the gradient of the estimate is 0. The gradient is

\nabla f(x) = \frac{1}{Nh} \sum_{i=1}^{N} \nabla K\left(\frac{x - x[i]}{h}\right)    (7)

Mean Shift
If we use the Epanechnikov kernel for K(x), its gradient is linear inside the bandwidth. Therefore

\nabla f(x) \propto \frac{1}{n_x} \sum_{x[i] \in S_h(x)} (x[i] - x)    (8)

where S_h(x) is a sphere of radius h centred on x in the signal space and n_x is the number of data points inside it.
The right-hand side of equation (8) is the difference between the current point x and the mean of all the data points in the sphere centred on x. It is known as the mean shift vector.

Mean Shift
There are some important things to consider:
The direction of the mean shift vector is the direction of the gradient, and the gradient vector points in the direction of maximum change.
The gradient vector at a peak is 0, so the mean shift vector is also 0.
The peak can therefore be found by following the mean shift vector towards regions of higher density until the mean shift vector is 0.

Mean Shift
This can be implemented as follows:
1. Pick a data point at random.
2. Find the mean of all points in the sphere centred on the data point.
3. Repeat, searching the sphere centred on the mean from step 2.
4. Stop when successive means are the same. That mean is the value of the peak (a code sketch of this procedure follows below).
[Figure: clustering using mean shift; the chosen bandwidth is 2.5.]
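A minimal sketch of steps 1-4 for a single starting point (assuming NumPy; the tolerance and iteration cap are illustrative choices):

```python
import numpy as np

def find_mode(x0, X, h, tol=1e-6, max_iter=500):
    # Follow the mean shift vector from x0 until it (approximately) vanishes.
    # X is an (N, d) array of data points, h is the bandwidth (sphere radius).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        in_sphere = np.linalg.norm(X - x, axis=1) < h   # points inside S_h(x)
        mean = X[in_sphere].mean(axis=0)                # their mean (step 2)
        if np.linalg.norm(mean - x) < tol:              # mean shift vector ~ 0 (step 4)
            return mean
        x = mean                                        # recentre the sphere (step 3)
    return x
```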

Mean Shift
To cluster the data, this procedure is applied to every point in the data set. Every data point converges to a characteristic peak, and all data points with the same peak are assigned to the same cluster.
[Figure: clustering using mean shift; the chosen bandwidth is 2.5.]
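Continuing the sketch above, the clustering step might look like this (find_mode is the function from the previous sketch; the tolerance used to merge coincident peaks is an assumption):

```python
import numpy as np

def mean_shift_cluster(X, h, peak_tol=1e-3):
    # Run the mode search from every data point (this is the O(N^2) cost noted
    # on the next slide) and group points whose peaks coincide.
    modes = np.array([find_mode(x, X, h) for x in X])
    labels = np.empty(len(X), dtype=int)
    peaks = []
    for i, m in enumerate(modes):
        for j, p in enumerate(peaks):
            if np.linalg.norm(m - p) < peak_tol:
                labels[i] = j
                break
        else:
            peaks.append(m)
            labels[i] = len(peaks) - 1
    return labels, np.array(peaks)
```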

Mean Shift
It's good:
It tells us something about the complexity of the signal; there is no need to guess the number of clusters.
The bandwidth parameter gives some degree of control.
It's bad:
The algorithm is very slow (O(N^2)): the distance between every pair of points in the data set must be known. Several publications have attempted to address this, including Akash's.
There is a tendency to get small clusters in regions of low density, so post-processing may be necessary.

Reading
For more information on KDEs and mean shift:
Comaniciu and Meer, Mean Shift Analysis and Applications, http://www.caip.rutgers.edu/~comanici/papers/msanalysis.pdf
Akash's paper from ICIP 2008 on the Path-Assigned Mean-Shift algorithm.
For k-means clustering:
Some lecture notes: http://home.dei.polimi.it/matteucc/clustering/tutorial_html/kmeans.html
For training GMMs using EM:
The Wikipedia entry for Expectation Maximisation gives more detail on EM and shows how it applies to GMMs.