Statistical properties of topological information inferred from data

Saclay, July 8, 2014. DataSense Research Day.
Statistical properties of topological information inferred from data
Frédéric Chazal, INRIA Saclay
Joint work with B.T. Fasy (CMU), M. Glisse (INRIA), C. Labruère (Univ. Bourgogne), F. Lecci (CMU), B. Michel (LSTA Paris 6), A. Rinaldo (CMU), L. Wasserman (CMU).

Introduction
[Figures: distribution of galaxies; non-rigid shape database]
Data often come as (samplings of) metric spaces, or as sets/spaces endowed with a similarity measure, with possibly complex topological/geometric structure.
Topological Data Analysis (TDA):
- infer relevant topological and geometric features of these spaces;
- take advantage of topological/geometric information for further processing of data (classification, recognition, learning, clustering, ...).

Introduction
[Diagram: $\widehat{X}_m$ → (build topological structure) → $\mathrm{Filt}(\widehat{X}_m)$ → (persistent homology) → $\mathrm{dgm}(\mathrm{Filt}(\widehat{X}_m))$]
- Build a geometric filtered simplicial complex on top of $\widehat{X}_m$: multiscale topological structure.
- Compute the persistent homology of the complex: multiscale topological signature.
- Compare the signatures of close data sets: robustness and stability results.
- Statistical properties of signatures.

Filtered simplicial complexes
- Simplicial complex: generalizes the notion of neighboring graph to higher dimensions by adding triangles, tetrahedra, ...
- Filtered simplicial complex: a nested family of simplicial complexes indexed by $\mathbb{R}$.
Example (used throughout the talk): let $(X, d_X)$ be a metric space. The Vietoris-Rips complex $\mathrm{Rips}(X)$ is defined by: for $a \in \mathbb{R}$,
$[x_0, x_1, \dots, x_k] \in \mathrm{Rips}(X, a) \iff d_X(x_i, x_j) \le a$ for all $i, j$.
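The definition of the Vietoris-Rips complex can be turned into a few lines of code. The following is a minimal brute-force sketch, assuming the metric space is given as a full matrix of pairwise distances; the function name `rips_complex`, the `max_dim` cutoff and the toy data are illustrative choices, not part of the talk. Increasing the parameter only adds simplices, which is the nesting that makes the family a filtered complex.

```python
from itertools import combinations

def rips_complex(dist, a, max_dim=2):
    """Simplices of Rips(X, a) up to dimension max_dim, as tuples of indices.

    dist is a symmetric matrix (list of lists) of pairwise distances d_X.
    [x_0, ..., x_k] is kept iff d_X(x_i, x_j) <= a for all i, j
    (including i == j, so the complex is empty for a < 0).
    """
    n = len(dist)
    simplices = []
    for k in range(max_dim + 1):
        for simplex in combinations(range(n), k + 1):
            if all(dist[i][j] <= a for i in simplex for j in simplex):
                simplices.append(simplex)
    return simplices

# Hypothetical toy data: the four corners of a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dist = [[((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 for q in points]
        for p in points]

small = rips_complex(dist, a=1.0)  # vertices and the four sides only
large = rips_complex(dist, a=1.5)  # the diagonals (length sqrt(2)) are now present
print(len(small), len(large))
```

The enumeration is exponential in `max_dim` and is only meant to mirror the definition; dedicated libraries use much more efficient constructions.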

Persistent homology of filtered complexes
[Figure; horizontal axis: Rips parameter]
- An efficient way to encode the evolution of the topology (homology) of families of nested spaces (filtered complexes, sublevel sets, ...).
- Multiscale topological information.
- Barcodes/persistence diagrams can be efficiently computed.
- Stability properties.

[Figure: persistence barcode (horizontal axis: Rips parameter) and the corresponding persistence diagram (axes: birth, death). The diagonal is added to the diagram; one point is labeled "Multiplicity: 14".]
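To make the barcode concrete in the simplest case, here is a sketch that computes only the 0-dimensional persistence of the Rips filtration. In degree 0 every point is born at parameter 0 and a connected component dies when an edge merges it into another one, so a union-find pass over the sorted edges is enough. The function name `h0_persistence` and the toy data are assumptions made for illustration; higher-dimensional homology needs a boundary-matrix reduction and is not covered here.

```python
def h0_persistence(dist):
    """(birth, death) pairs of the 0-dimensional Rips persistence.

    Every point is born at 0. A component dies at the length of the edge
    that merges it into another component (Kruskal's algorithm with
    union-find). The component that never dies is reported as (0.0, inf).
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, float("inf")))
    return pairs

# Hypothetical toy data: two clusters on the real line.
xs = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(x - y) for y in xs] for x in xs]
barcode = h0_persistence(dist)
print(barcode)
```

On this data the barcode has three short bars (merges inside the clusters), one long bar (the two clusters merging) and one infinite bar, which is the multiscale reading the slide describes.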

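Stability properties
Stability theorem: close spaces/data sets have close persistence diagrams!
If $X$ and $Y$ are pre-compact metric spaces, then
$d_b(\mathrm{dgm}(\mathrm{Rips}(X)), \mathrm{dgm}(\mathrm{Rips}(Y))) \le 2\, d_{GH}(X, Y)$,
where $d_b$ is the bottleneck distance (a matching distance between diagrams) and $d_{GH}$ is the Gromov-Hausdorff distance.
Remark: this is a particular case of a more general theorem [C.-de Silva-Oudot 2013]; the factor 2 corresponds to the convention $d_X(x_i, x_j) \le a$ for the Rips parameter.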
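The bottleneck distance matches the points of two diagrams, allowing any point to be matched to the diagonal, and takes the cost of the worst matched pair under the best possible matching. The sketch below is a brute-force illustration for very small diagrams with finite coordinates, assuming the sup norm in the (birth, death) plane; the function name `bottleneck` is illustrative, and the search over all permutations is factorial in the number of points, so it is only suitable for toy inputs.

```python
from itertools import permutations

def bottleneck(d1, d2):
    """Bottleneck distance between two small finite diagrams (brute force).

    Diagrams are lists of (birth, death) pairs. Each diagram is augmented
    with the diagonal projections of the points of the other one, so that
    any point can be matched to the diagonal.
    """
    def proj(p):
        m = (p[0] + p[1]) / 2.0
        return (m, m)

    # Each entry is (point, is_diagonal_copy).
    a = [(p, False) for p in d1] + [(proj(q), True) for q in d2]
    b = [(q, False) for q in d2] + [(proj(p), True) for p in d1]

    def cost(u, v):
        (p, p_diag), (q, q_diag) = u, v
        if p_diag and q_diag:
            return 0.0  # diagonal matched to diagonal is free
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best = float("inf")
    for perm in permutations(range(len(b))):
        worst = max((cost(a[i], b[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best

# A small perturbation of one bar moves the diagram by the same amount.
print(bottleneck([(0.0, 1.0)], [(0.0, 1.2)]))
# A short extra bar is cheap: it is matched to the diagonal.
print(bottleneck([(0.0, 10.0)], [(0.0, 10.0), (4.0, 5.0)]))
```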

Stability properties
Example: application to non-rigid shape classification [C., Cohen-Steiner, Guibas, Mémoli, Oudot 09].
[Figure: shape classes camel, cat, elephant, face, head, horse]
- Non-rigid shapes in the same class are almost isometric, but computing the Gromov-Hausdorff distance between shapes is extremely expensive.
- Compare diagrams of sampled shapes instead of the shapes themselves, using the metric on the space of persistence diagrams.
- Other applications in image classification, object recognition, clustering, ...

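Convergence rates of persistence diagrams
$(M, \rho)$ a metric space; $\mu$ a probability measure with compact support $X_\mu$. Sample $m$ points according to $\mu$ to get $\widehat{X}_m$.
[Diagram: $\widehat{X}_m$ → (build topological structure) → $\mathrm{Filt}(\widehat{X}_m)$ → (persistent homology) → $\mathrm{dgm}(\mathrm{Filt}(\widehat{X}_m))$]
Assume that $\mu$ is $(a, b)$-standard: $\forall x \in X_\mu$, $\forall r > 0$, $\mu(B(x, r)) \ge \min(a r^b, 1)$. Then, for some constant $C$,
$\mathbb{E}\left[ d_b\big(\mathrm{dgm}(\mathrm{Rips}(X_\mu)), \mathrm{dgm}(\mathrm{Rips}(\widehat{X}_m))\big) \right] \le C \left( \frac{\ln m}{m} \right)^{1/b}$.
The convergence rate is optimal among $(a, b)$-standard measures, whatever the choice of the estimator of $\mathrm{dgm}(\mathrm{Rips}(X_\mu))$ (minimax convergence rate).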
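To get a feeling for the rate $(\ln m / m)^{1/b}$, the short sketch below tabulates it for a few sample sizes and a few values of $b$. The constant $C$ is left out because the slide does not give it, so only the relative decay is meaningful; the function name `rate` and the chosen values of $m$ and $b$ are illustrative assumptions.

```python
import math

def rate(m, b):
    """Order of the bound (ln m / m)^(1/b), without the unknown constant C."""
    return (math.log(m) / m) ** (1.0 / b)

# The larger b is (roughly, the higher the dimension of the support),
# the slower the bound decreases with the sample size m.
for b in (1, 2, 3):
    print(b, [round(rate(m, b), 4) for m in (100, 1000, 10000)])
```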

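To do more statistics: persistence landscapes
[Figure; axis labels: $b$, $\frac{d+b}{2}$, $d$ and $\frac{d-b}{2}$]
Write the diagram as $D = \left\{ \left( \frac{d_i + b_i}{2}, \frac{d_i - b_i}{2} \right) \right\}_{i \in I}$.
Persistence landscape [Bubenik 2012]: $\lambda_D(k, t) = \operatorname{kmax}_{p \in D} \Lambda_p(t)$, $t \in \mathbb{R}$, $k \in \mathbb{N}$, where kmax is the $k$th largest value in the set.
For $p = \left( \frac{b+d}{2}, \frac{d-b}{2} \right) \in D$,
$\Lambda_p(t) = \begin{cases} t - b & t \in [b, \frac{b+d}{2}] \\ d - t & t \in (\frac{b+d}{2}, d] \\ 0 & \text{otherwise.} \end{cases}$
Stability: for any $t \in \mathbb{R}$ and any $k \in \mathbb{N}$, $|\lambda_D(k, t) - \lambda_{D'}(k, t)| \le d_B(D, D')$.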
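The landscape definition translates directly into code. The sketch below assumes diagram points are stored in (birth, death) coordinates rather than the rotated coordinates used on the slide, which gives the same tent functions $\Lambda_p$; the function names `tent` and `landscape` and the example diagram are illustrative.

```python
def tent(p, t):
    """Lambda_p(t) for a diagram point p = (b, d) in birth-death coordinates.

    Equals t - b on [b, (b+d)/2], d - t on ((b+d)/2, d], and 0 otherwise.
    """
    b, d = p
    return max(0.0, min(t - b, d - t))

def landscape(diagram, k, t):
    """lambda_D(k, t): the k-th largest value of Lambda_p(t) over p in D (k >= 1)."""
    values = sorted((tent(p, t) for p in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0

# Hypothetical diagram with two nested bars.
D = [(0.0, 4.0), (1.0, 3.0)]
print(landscape(D, 1, 2.0), landscape(D, 2, 2.0), landscape(D, 3, 2.0))
```

Each $\lambda_D(k, \cdot)$ is an ordinary real-valued function, so landscapes can be added and averaged pointwise, unlike diagrams.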

To do more statistics: persistence landscapes
- Persistence is encoded as an element of a functional space (a vector space!).
- The expectation of a distribution of landscapes is well defined and can be approximated by the average of sampled landscapes.
- Process point of view: convergence results and convergence rates.
- Confidence intervals can be computed using the bootstrap.
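Because landscapes are functions, averaging them and bootstrapping the average takes only a few lines. The sketch below evaluates first landscapes on a grid, averages them, and estimates the half-width of a uniform (sup-norm) confidence band with a plain resampling bootstrap. This is an assumption made for illustration: the slide does not say which bootstrap variant is used, and all names (`landscape_on_grid`, `bootstrap_band`), the synthetic diagrams and the parameter values are hypothetical.

```python
import random

def tent(p, t):
    b, d = p
    return max(0.0, min(t - b, d - t))

def landscape_on_grid(diagram, k, grid):
    """k-th landscape of a diagram of (birth, death) pairs, sampled on a grid."""
    out = []
    for t in grid:
        values = sorted((tent(p, t) for p in diagram), reverse=True)
        out.append(values[k - 1] if k <= len(values) else 0.0)
    return out

def mean_curve(curves):
    n = len(curves)
    return [sum(c[i] for c in curves) / n for i in range(len(curves[0]))]

def bootstrap_band(curves, alpha=0.05, n_boot=500, seed=0):
    """Mean landscape and half-width of a (1 - alpha) sup-norm bootstrap band."""
    rng = random.Random(seed)
    center = mean_curve(curves)
    deviations = []
    for _ in range(n_boot):
        resample = [rng.choice(curves) for _ in curves]  # resample with replacement
        boot_mean = mean_curve(resample)
        deviations.append(max(abs(x - y) for x, y in zip(boot_mean, center)))
    deviations.sort()
    index = min(n_boot - 1, int((1 - alpha) * n_boot))
    return center, deviations[index]

# Synthetic diagrams: one bar (0, d) with a noisy death value.
rng = random.Random(1)
diagrams = [[(0.0, 1.0 + rng.uniform(-0.2, 0.2))] for _ in range(30)]
grid = [i / 20 for i in range(25)]
curves = [landscape_on_grid(D, 1, grid) for D in diagrams]
center, half_width = bootstrap_band(curves)
print(round(half_width, 4))
```

The band is `center[i] ± half_width` at every grid point; a finer grid or more bootstrap replicates only changes the cost, not the structure of the computation.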

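(Sub)sampling and stability of expected landscapes
$(M, \rho, \mu)$ with $X_\mu$ compact; $X_1, X_2, \dots, X_m$ i.i.d. sampled according to $\mu$, forming $\widehat{X}_m$.
$\widehat{X}_m \mapsto \mathrm{Rips}(\widehat{X}_m) \mapsto \lambda_{\mathrm{Rips}(\widehat{X}_m)}$. Writing $\Phi$ for this map, $P = \Phi_*(\mu^{\otimes m})$ is the induced distribution of landscapes, and the expected landscape is $\Lambda_{\mu,m}(t) = \mathbb{E}_P[\lambda(t)]$.
Theorem: let $(M, \rho)$ be a metric space and let $\mu, \nu$ be probability measures on $M$ with compact supports. We have
$\| \Lambda_{\mu,m} - \Lambda_{\nu,m} \|_\infty \le m^{1/p}\, W_p(\mu, \nu)$,
where $W_p$ denotes the Wasserstein distance with cost function $\rho(x, y)^p$.
Consequences:
- Subsampling: an efficient and easy to parallelize algorithm to infer topological information from huge data sets.
- Robustness to outliers.
- R package (released soon) + Gudhi library: https://project.inria.fr/gudhi/software/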
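The subsampling scheme can be sketched end to end in the 0-dimensional case: draw many small subsamples from the data, compute the landscape of each, and average. The code below is a hedged illustration only. It restricts to 0-dimensional persistence so that a union-find pass suffices, it draws subsamples with replacement from the empirical measure of the data, and every name and parameter (`h0_deaths`, `average_landscape`, `m=20`, 200 subsamples, the circle with one outlier) is a hypothetical choice rather than something taken from the talk.

```python
import math
import random

def h0_deaths(points):
    """Death values of the finite H0 classes of the Rips filtration (union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths

def landscape(deaths, k, grid):
    """k-th landscape of the H0 diagram {(0, d)}, sampled on a grid."""
    out = []
    for t in grid:
        values = sorted((max(0.0, min(t, d - t)) for d in deaths), reverse=True)
        out.append(values[k - 1] if k <= len(values) else 0.0)
    return out

def average_landscape(data, m, n_subsamples, grid, k=1, seed=0):
    """Average landscape over n_subsamples subsamples of size m."""
    rng = random.Random(seed)
    total = [0.0] * len(grid)
    for _ in range(n_subsamples):
        sub = [rng.choice(data) for _ in range(m)]  # m i.i.d. draws from the data
        curve = landscape(h0_deaths(sub), k, grid)
        total = [s + c for s, c in zip(total, curve)]
    return [s / n_subsamples for s in total]

# Points on the unit circle, with and without a single far-away outlier.
rng = random.Random(42)
circle = [(math.cos(a), math.sin(a))
          for a in (rng.uniform(0, 2 * math.pi) for _ in range(2000))]
with_outlier = circle + [(10.0, 10.0)]
grid = [i / 50 for i in range(60)]
clean = average_landscape(circle, m=20, n_subsamples=200, grid=grid)
noisy = average_landscape(with_outlier, m=20, n_subsamples=200, grid=grid)
gap = max(abs(x - y) for x, y in zip(clean, noisy))
print(round(gap, 4))
```

Each subsample is processed independently, so the loop in `average_landscape` can be distributed across workers without any change to the result. A single outlier rarely lands in a small subsample, which is why the two averages stay close.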

(Sub)sampling and stability of expected landscapes
(Toy) example: accelerometer data from smartphones.
- Spatial time series (accelerometer data from the smartphones of users).
- No registration/calibration preprocessing step is needed to compare them!

Conclusion
- Persistence diagrams of geometric complexes built on top of data provide a very general, flexible way to infer relevant multiscale topological information.
- Although they live in a non-friendly metric space, persistence diagrams have good statistical properties.
- The convergence results are in fact more general:
  - they extend to other families of filtered simplicial complexes (Čech complexes, witness complexes, ...);
  - they extend to non-metric spaces endowed with a similarity measure.
- Subsampling and averaging:
  - robust topological inference under perturbations of the measure $\mu$;
  - very fast and easy to parallelize computation of topological features for huge data.

Thank you for your attention!

References:
F. Chazal, M. Glisse, C. Labruère, B. Michel, Convergence rates for persistence diagram estimation in Topological Data Analysis, in Int. Conf. on Machine Learning 2014 (ICML 2014).
C. Li, M. Ovsjanikov, F. Chazal, Persistence-based Structural Recognition, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014).
F. Chazal, B. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, Stochastic Convergence of Persistence Landscapes and Silhouettes, in ACM Symposium on Computational Geometry 2014.
F. Chazal, B. Fasy, F. Lecci, B. Michel, A. Rinaldo, L. Wasserman, Subsampling Methods for Persistent Homology, arXiv:1406.1901, June 2014.
F. Chazal, V. de Silva, S. Oudot, Persistence Stability for Geometric Complexes, Geometriae Dedicata 2014 (online first Dec. 2013).
F. Chazal, V. de Silva, M. Glisse, S. Oudot, The Structure and Stability of Persistence Modules, arXiv:1207.3674, July 2012.