Intrinsic Dimensionality Estimation for Data Sets

Intrinsic Dimensionality Estimation for Data Sets
Yoon-Mo Jung, Jason Lee, Anna V. Little, Mauro Maggioni (Department of Mathematics, Duke University)
Lorenzo Rosasco (Center for Biological and Computational Learning, MIT)
September 1, 2009

Problem: We consider a novel approach for estimating the intrinsic dimensionality of high-dimensional point clouds. Assuming that the points are sampled from a k-dimensional data set corrupted by D-dimensional noise, with $k \ll D$, we estimate dimensionality via a new multiscale algorithm that generalizes PCA. The algorithm exploits the low-dimensional structure of the data, so that its power depends on k rather than D. Dimensionality estimation is important in many applications in machine learning, including:
1. signal processing
2. discovering the number of variables in linear models
3. molecular dynamics
4. genetics
5. financial data

PCA Approach
Counting the number of significant singular values is a classical technique in dimensionality estimation. When the data is linear and noiseless, this method cannot fail.
Idea: Consider data points $x_1, x_2, \ldots, x_n$ in $\mathbb{R}^D$. Form the normalized data matrix
$$X = \frac{1}{\sqrt{n}} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$
Let $C = X^T X$ (the covariance matrix). Compute the singular values of $X$: $\sigma_i(X) = \sqrt{\lambda_i(C)}$, $i = 1, \ldots, D$.
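To make the counting rule concrete, here is a minimal sketch in Python (our own illustration, not the authors' code; the function name `pca_dimension` and the threshold `tol` are hypothetical choices):

```python
# Minimal sketch of the classical PCA count (illustrative; pca_dimension
# and the tolerance tol are our own hypothetical choices).
import numpy as np

def pca_dimension(X, tol=1e-8):
    """X: (n, D) array, one data point per row (assumed mean-zero).

    Counts singular values of X / sqrt(n) above tol; with C = X^T X,
    sigma_i(X) = sqrt(lambda_i(C)) as in the slide.
    """
    n = X.shape[0]
    s = np.linalg.svd(X / np.sqrt(n), compute_uv=False)
    return int(np.sum(s > tol))

# Noiseless linear data: 1000 points on a 10-dimensional subspace of R^100.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10)) @ rng.standard_normal((10, 100))
print(pca_dimension(X))  # -> 10, exactly the subspace dimension
```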

Issues with PCA Approach
1. The finite sample case is not completely understood: how many data points do we need for accurate results?
2. Noise confuses the dimensionality. Example: sample 1000 points from a 10-dimensional plane in $\mathbb{R}^{100}$ and corrupt with Gaussian noise of level $\sigma = 0.2$ (i.e., $0.2\,\mathcal{N}(0, I_{100})$ added to each point).
3. Non-linear data results in overestimation of the dimensionality.
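The second issue can be checked numerically. The following sketch (our reconstruction of the slide's example, not the authors' code) corrupts the 10-dimensional plane with noise of level $\sigma = 0.2$ and prints the resulting singular values:

```python
# Our reconstruction of the slide's noise example: 1000 points on a
# 10-dim plane in R^100, with 0.2 * N(0, I_100) added to each point.
import numpy as np

rng = np.random.default_rng(0)
n, k, D, sigma = 1000, 10, 100, 0.2

frame, _ = np.linalg.qr(rng.standard_normal((D, k)))  # orthonormal k-frame
X = rng.standard_normal((n, k)) @ frame.T             # points on the plane
X_noisy = X + sigma * rng.standard_normal((n, D))     # ambient Gaussian noise

s = np.linalg.svd(X_noisy / np.sqrt(n), compute_uv=False)
print(np.round(s[:14], 2))
# All 100 singular values are now nonzero: a noise floor of roughly sigma
# replaces the exact rank gap, so a hard threshold has to be tuned against
# the (unknown) noise level.
```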

Model: Manifold plus Noise
1. Let M be a manifold of dimension k embedded in $\mathbb{R}^D$ (bounded curvature).
2. Let $x_1, x_2, \ldots, x_n$ be n samples from M.
3. Suppose the data is corrupted by D-dimensional noise: $\tilde{x}_i = x_i + \sigma \eta_i$ (e.g. $\eta \sim \mathcal{N}(0, I_D)$).
4. Let $\tilde{X}_n = \begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_n \end{pmatrix}$ be the corresponding noisy data matrix.
5. Goal: Estimate the dimensionality k w.h.p. from $\tilde{X}_n$.
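To make the model concrete, here is a small data generator (our own sketch; `noisy_sphere` is a hypothetical helper, with the unit k-sphere standing in for a generic bounded-curvature M) that produces the noisy data matrix $\tilde{X}_n$:

```python
# Hypothetical generator for the manifold-plus-noise model, taking
# M = S^k (the unit k-sphere, a bounded-curvature manifold) in R^D.
import numpy as np

def noisy_sphere(n, k, D, sigma, rng):
    """Return the (n, D) noisy data matrix with rows x_i + sigma * eta_i."""
    Y = rng.standard_normal((n, k + 1))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # uniform samples on S^k
    X = np.hstack([Y, np.zeros((n, D - k - 1))])    # embed S^k in R^D
    return X + sigma * rng.standard_normal((n, D))  # eta_i ~ N(0, I_D)

rng = np.random.default_rng(0)
Xn_tilde = noisy_sphere(n=1000, k=5, D=100, sigma=0.05, rng=rng)
```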

Multiscale Algorithm to Estimate Pointwise Dimensionality
Fix z and specify a scale r:
Let $X(r) = M \cap B_z(r)$
Let $X_n(r) = X_n \cap B_z(r)$
Let $\tilde{X}_n(r) = \tilde{X}_n \cap B_z(r)$
Algorithm:
1. Let $\{\sigma_i^r\}_{i=1}^{D}$ be the singular values of $\tilde{X}_n(r)$.
2. Classify the $\sigma_i^r$ as follows:
linear growth in r: tangent plane singular value
quadratic growth in r: curvature singular value
no growth in r: noise singular value
3. Dimensionality at z = number of tangent plane $\sigma_i^r$'s
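A minimal sketch of the multiscale step, reusing the `noisy_sphere` generator above (the radius grid and the center choice are our own; the growth-rate classification is left as the inspection step the slides describe):

```python
# Sketch of the multiscale step: singular values of the local data
# matrix X~_n(r) = X~_n ∩ B_z(r) over a grid of radii (grid chosen by us).
import numpy as np

def multiscale_singular_values(Xn, z, radii):
    """For each r, singular values of the centered, normalized points in B_z(r)."""
    spectra = {}
    for r in radii:
        local = Xn[np.linalg.norm(Xn - z, axis=1) <= r]
        if len(local) < 2:
            continue  # too few points at this scale
        M = (local - local.mean(axis=0)) / np.sqrt(len(local))
        spectra[r] = np.linalg.svd(M, compute_uv=False)
    return spectra

rng = np.random.default_rng(0)
Xn = noisy_sphere(n=1000, k=5, D=100, sigma=0.05, rng=rng)  # S^5 in R^100
for r, s in multiscale_singular_values(Xn, Xn[0], np.linspace(0.7, 1.3, 4)).items():
    print(f"r = {r:.2f}: {np.round(s[:8], 3)}")
# Reading off the growth rates: values growing ~linearly in r are tangent
# directions, ~r^2 values come from curvature, and flat values near sigma
# come from noise; counting the linear ones estimates k.
```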

Example: Growth of Singular Values
Consider $S^5$ embedded in $\mathbb{R}^{100}$ and take 1000 noisy samples ($\sigma = 0.05$).

Outline of Analysis, I
1. Approximate the data set by a linear manifold $X_\parallel(r)$ and a normal correction $X_\perp(r)$. It turns out that $\mathrm{cov}(X(r)) = \mathrm{cov}(X_\parallel(r)) + O(\kappa^2 r^4)$, with $\mathrm{cov}(X_\parallel(r)) \sim O(r^2)$. (Bounded curvature keeps the normal component of any point of $X(r)$ of size $O(\kappa r^2)$, which is where the $O(\kappa^2 r^4)$ term comes from.) This gives an upper bound on r to avoid distortion due to curvature.
2. Apply sampling theorems for covariance matrices to bound the distance between $\mathrm{cov}(X_n(r))$ and $\mathrm{cov}(X(r))$; this needs $O(k \log k)$ points. This gives a lower bound on r so that $X_n(r)$ contains enough points, i.e. $O(k \log k)$ w.h.p.
3. Add ambient noise and bound w.h.p. its effect on the spectrum of $\tilde{X}_n(r)$, using results from random matrix theory and matrix perturbation. This gives a lower bound on r so that the tangent plane structure is distinguishable from the noise.

Outline of Analysis, II
1. Natural normalization: $\mathbb{E}[\|\eta\|_{\mathbb{R}^D}^2] = O(1)$ (e.g. $\sigma = \sigma_0 D^{-1/2}$). Under the niceness assumptions $\kappa = O(1)$ and $\sigma_0 = O(1)$, the algorithm succeeds w.h.p. with only $O(k \log k)$ samples, independently of D.
2. If $\mathbb{E}[\|\eta\|_{\mathbb{R}^D}^2]$ grows with D (e.g. linearly, as when $\eta \sim \mathcal{N}(0, I_D)$), then for D large enough the algorithm fails w.h.p.
3. Consistency ($n \to +\infty$) of the algorithm follows trivially from our analysis under niceness assumptions on the noise and curvature.
4. The random matrix scaling limit ($n \to +\infty$, $D \to +\infty$, $D/n \to \gamma$) is a particular case of our analysis.
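A quick numerical check of the two noise regimes in points 1 and 2 (our own illustration, with $\sigma_0 = 1$):

```python
# Our illustration of the two regimes: for eta ~ N(0, I_D) one has
# E[||eta||^2] = D, so sigma = sigma0 / sqrt(D) keeps E[||sigma*eta||^2]
# = sigma0^2 = O(1), while a fixed sigma lets it grow linearly in D.
import numpy as np

rng = np.random.default_rng(0)
for D in (10, 100, 1000):
    eta = rng.standard_normal((10_000, D))
    mean_sq = np.mean(np.sum(eta**2, axis=1))  # empirical E[||eta||^2], ~D
    print(f"D={D:5d}  E||eta||^2 ~ {mean_sq:8.1f}  with sigma=1/sqrt(D): {mean_sq / D:.3f}")
```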

Comparison with Other Algorithms
Our algorithm:
1. Requires $O(k \log k)$ points (under niceness assumptions on noise and curvature)
2. Finite sample guarantees
3. Only input: $\tilde{X}_n$
4. Discovers the correct scale using the multiscale approach
Other algorithms:
1. Volume based (they require $O(2^k)$ points)
2. Typically no finite sample guarantees (at most consistency)
3. Sensitive to noise
4. Some involve many parameters
5. Require the user to specify the correct scale (such as the number of nearest neighbors to consider)

$Q^5$ (D = 100, n = 500) and $Q^{10}$ (D = 100, n = 500)
[Plot: dimension estimates versus noise level (0, 0.01, 0.05, 0.1, 0.2) for MultiSVD, Debiasing, Smoothing, and RPMM.]
De-biasing algorithm of Carter, Hero, and Raich; Smoothing algorithm of Carter and Hero; Regularized Poisson Mixture Model algorithm of Haro, Randall, and Sapiro.

$S^4$ (D = 100, n = 500) and $S^9$ (D = 100, n = 500)
[Plot: dimension estimates versus noise level (0, 0.01, 0.05, 0.1, 0.2) for the same four algorithms.]

Future Research
Short-term:
1. Tuning the algorithm
2. Extending results to manifolds of different dimensionalities
3. Kernelization
Long-term (employing these techniques in various applications):
1. Molecular dynamics
2. Genetics
3. Financial data