Robust cartogram visualization of outliers in manifold learning

Alessandra Tosi and Alfredo Vellido
Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Edifici Omega, Campus Nord, 08034 Barcelona, Spain

This research was partially funded by Spanish research project TIN2012-31377.

Abstract. Most real data sets contain atypical observations, often referred to as outliers. Their presence may have a negative impact on data modeling using machine learning. This is particularly the case in data density estimation approaches. Manifold learning techniques provide low-dimensional data representations, often oriented towards visualization. The visualization provided by density estimation manifold learning methods can be compromised by the presence of outliers. Recently, a cartogram-based representation of model-generated distortion was presented for nonlinear dimensionality reduction. Here, we investigate the impact of outliers on this visualization when using manifold learning techniques that behave robustly in their presence.

1 Introduction

Most multivariate data sets generated by real application problems contain atypical observations, often referred to as outliers. What constitutes an atypical case is, indeed, a problem-dependent decision but, in any case, the presence of outlier cases is likely to have a negative impact on data modeling using machine learning and computational intelligence methods [1]. This should be particularly the case in data density estimation approaches. Manifold learning techniques for nonlinear dimensionality reduction provide low-dimensional data representations, often with the aim of providing exploratory visualization for high-dimensional data. By forcing the model to account for their presence, outliers can compromise the results of density estimation manifold learning methods. This is likely to have an effect on the visualization they can provide.

Nonlinear manifold learning methods generate a varying local distortion that can turn exploratory visualization into a difficult undertaking. Recently, a cartogram-based representation of such model-generated distortion, inspired by geographic representation, was defined for nonlinear manifold learning [2]. The cartogram-based method was illustrated using Generative Topographic Mapping (GTM, [3]), a technique for which the mapping distortion can be quantitatively estimated in the form of magnification factors (MF, [4]). Unfortunately, the standard GTM, as a constrained mixture of Gaussians, is prone to provide poor visualization in the presence of outliers. Alternatively, a variant of GTM defined as a mixture of Student t-distributions, namely the t-GTM, has been shown to minimize the negative effect of the presence of outliers

in the modelling process [5, 6]. In this brief paper, we first define and calculate the MF for the t-GTM, and we then investigate the impact of outliers on the cartogram-based visualization, comparing the results yielded by the standard GTM with those of the t-GTM.

2 Methods

2.1 Robust topographic mapping and its magnification factors

The GTM [3] is a nonlinear manifold learning model for the exploratory visualization of multivariate data. It can be seen as a low-dimensional, manifold-constrained mixture of distributions in which the centres of the distributions are defined as data centroids or prototypes $y_k$: the images of a sample of $K$ regularly spaced latent points $u_k$, $k = 1, \ldots, K$, mapped from the latent to the observed data space according to the mapping function $y_k = \Phi(u_k)W$. Here, in the standard model, $\Phi$ is a set of $M$ Gaussian basis functions $\phi_m$, while $W$ is a matrix of adaptive weights, estimated as part of the model training process. The prototypes $y_k$ defined in this way describe a smooth manifold that envelops the observed $D$-dimensional data $X = \{x_n\}_{n=1}^{N}$.

In the presence of outliers, some of the prototypes of this constrained mixture of Gaussians will attempt to model these atypical data, thus biasing the structure of the resulting manifold. Recently, it was proposed to redefine the GTM as a mixture of Student t-distributions, the t-GTM, so as to avoid this undesired effect [5, 6]. The conditional distribution of the observed data variables given the latent variables, $p(x|u)$, takes the following form for the t-GTM:

$$p(x \mid u, W, \beta, \nu) = \frac{\Gamma\left(\frac{\nu+D}{2}\right)\beta^{D/2}}{\Gamma\left(\frac{\nu}{2}\right)(\nu\pi)^{D/2}}\left(1 + \frac{\beta}{\nu}\,\|x - y(u)\|^2\right)^{-\frac{\nu+D}{2}}, \qquad (1)$$

where $\Gamma(\cdot)$ is the Gamma function, the parameter $\nu$ (that takes value equal to 2 in our experiments) can be viewed as a tuner that adapts the divergence from normality, and $\beta$ is the inverse variance of the t-GTM noise model. The latent variables can be integrated out of the conditional distribution to obtain the likelihood of the model, so that its parameters can be optimized through maximum likelihood. For details of this procedure, see [5].

From the maximum likelihood optimization, a closed expression is obtained for the responsibility $r_{kn}$ of each latent point $k$ for the generation of observation $n$, i.e. $p(u_k \mid x_n)$. It can be used to visualize observations in the form of a soft-mapping posterior mean projection $u_n^{\mathrm{mean}} = \sum_{k=1}^{K} r_{kn} u_k$, in such a way that a data point is mapped in the latent space according to a responsibility-weighted combination of all latent point locations, instead of the hard mapping of winner-takes-all algorithms.
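As a minimal illustration of these two quantities, the following NumPy sketch evaluates the Student-t conditional density of Eq. (1) and the resulting posterior-mean projection. Function and variable names are our own assumptions for exposition, not code from any t-GTM reference implementation; a uniform prior over the latent points is assumed, as in the standard GTM.

import numpy as np
from scipy.special import gammaln

def t_conditional_density(X, y_k, beta, nu):
    """Student-t conditional density p(x | u_k) of Eq. (1), in log-safe form."""
    D = X.shape[1]
    log_norm = (gammaln((nu + D) / 2.0) - gammaln(nu / 2.0)
                + (D / 2.0) * (np.log(beta) - np.log(nu * np.pi)))
    sq_dist = np.sum((X - y_k) ** 2, axis=1)          # ||x_n - y_k||^2
    return np.exp(log_norm - ((nu + D) / 2.0) * np.log1p(beta * sq_dist / nu))

def posterior_mean_projection(X, Y, U, beta, nu):
    """Soft mapping u_n^mean = sum_k r_kn u_k over the K latent points.

    X: (N, D) observations; Y: (K, D) prototypes y_k; U: (K, 2) latent grid.
    """
    dens = np.stack([t_conditional_density(X, y_k, beta, nu) for y_k in Y])
    R = dens / dens.sum(axis=0, keepdims=True)        # R[k, n] = r_kn
    return R.T @ U                                    # one 2-D position per x_n

Note that, as $\nu \to \infty$, the density in Eq. (1) recovers the Gaussian noise model of the standard GTM, which is why $\nu$ acts as the robustness tuner.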

The t-GTM generates a varying local distortion that can make exploratory data visualization difficult. This distortion can be quantified over the latent space continuum with the MF. The relationship between a differential area $dA$ (for a 2-D visualization) in latent space and the corresponding area element in the GTM-generated manifold, $dA'$, can be expressed in terms of the derivatives of the basis functions $\phi_m$ as $dA'/dA = \det^{1/2}(\Psi W W^T \Psi^T)$, where $J = \Psi W$ is the Jacobian of the mapping transformation, $u_i$ is the $i$-th coordinate ($i = 1, 2$) of a latent point, and $\Psi$ is an $M \times 2$ matrix with elements $\psi_{mi} = \partial\phi_m / \partial u_i$, defined as:

$$\frac{\partial\phi_m}{\partial u_i} = -\frac{\Gamma\left(\frac{\nu+D}{2}\right)(\nu+D)\,\beta^{\frac{D+2}{2}}}{\Gamma\left(\frac{\nu}{2}\right)\pi^{D/2}\,\nu^{\frac{D+2}{2}}}\,(u_i - \mu_{mi})\left(1 + \frac{\beta}{\nu}\,\|u - \mu_m\|^2\right)^{-\frac{\nu+D+2}{2}}, \qquad (2)$$

where $\mu_m$, $m = 1, \ldots, M$, are the centres of the Student t-distributions.

2.2 Cartogram representation for the t-GTM

A cartogram is a depiction of an internally partitioned cartographic map in which the true surfaces of the internal partitions are distorted to reflect locally-varying quantities such as population density. This distortion is a continuous transformation from an original surface to the cartogram, so that a vector $x = (x_1, x_2)$ in the former is mapped onto the latter according to $x \mapsto T(x)$, in such a way that the Jacobian of the transformation is proportional to an underlying distorting variable $d$. A method for the creation of cartograms based on linear diffusion processes was defined in [7]. This method was recently adapted in [2] to the visual representation provided by nonlinear dimensionality reduction methods. In GTM, it entails replacing geographic maps by the latent visualization map, which is transformed into a cartogram using the square regular grid formed by the lattice of latent points $u_k$ as the map's internal boundaries. It also entails replacing distorting variables such as population density by explicit distortion measures such as the MF. This method was extended to the batch-SOM model in [8]. The reader is referred to [2] for further details.

3 Experiments

The following preliminary experiments compare the effect of outliers on the cartogram representations of the MF for the standard GTM and for the t-GTM. An artificial dataset of 3-D points was used to make possible the direct visualization of the reference vectors $y_k$ in the observed data space. A total of 1,500 3-D points were randomly drawn from 3 spherical Gaussian distributions (500 points each), all with unit variance and with centres set at the vertices of an equilateral triangle. Two different subsets of outliers were added to this dataset: A-type) three outliers located on the normal to the imaginary plane defined by the cluster triangle that passes through its barycenter; B-type) three outliers located on the normal to one vertex of the imaginary triangle. A 10×10 regular grid and the same initialization were employed for both GTM and t-GTM. The MF was calculated for both methods and cartograms were generated using these values. In all cartograms, it was assumed that the level of distortion in the space beyond the grid is uniform and equal to the mean distortion over the complete map, $\frac{1}{K}\sum_{k=1}^{K} J(u_k)$, where $J$ is the Jacobian of the transformation. Likewise, we assumed that the level of distortion within each of the squares associated to $u_k$ is itself uniform.
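For concreteness, the sketch below generates a dataset of this kind. The triangle side length, the outlier offsets along the normal, and the random seed are arbitrary assumptions, since the paper only fixes the number of points, the unit variance, and the direction of the outliers.

import numpy as np

rng = np.random.default_rng(0)
side = 6.0                                    # assumed triangle side length
vertices = np.array([[0.0, 0.0, 0.0],
                     [side, 0.0, 0.0],
                     [side / 2.0, side * np.sqrt(3) / 2.0, 0.0]])

# 500 points per cluster from unit-variance spherical Gaussians at the vertices
X = np.vstack([rng.normal(v, 1.0, size=(500, 3)) for v in vertices])

barycenter = vertices.mean(axis=0)
normal = np.array([0.0, 0.0, 1.0])            # normal to the cluster plane
offsets = np.array([[5.0], [7.5], [10.0]])    # assumed outlier distances

outliers_A = barycenter + normal * offsets    # A-type: through the barycenter
outliers_B = vertices[0] + normal * offsets   # B-type: through one vertex

X_A = np.vstack([X, outliers_A])              # dataset for the first experiment
X_B = np.vstack([X, outliers_B])              # dataset for the second experiment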

The first experiment, displayed in Fig. 1, corresponds to the inclusion of A-type outliers, while the second, displayed in Fig. 2, corresponds to the inclusion of B-type outliers.

Despite the fact that most GTM prototypes concentrate in the three clusters, it is clear from the image in Fig. 1 (top row, left) that, in the case of the standard GTM, the A-type outliers unduly force the manifold towards them. This causes a distortion that is controlled more by the outliers than by the empty space between clusters. Even so, the cartogram visualizations generated by GTM and t-GTM are rather similar. The reason for this is the artificial symmetry of the outliers' location.

The maps in Fig. 2, corresponding to the second experiment with added B-type outliers, tell a very different story. Now the symmetry is lost, and the MF of the standard GTM reflects the fact that the model stretches one of the sides of the manifold in its attempt to cover the outliers (top row, left). As a result, an artifactual high distortion appears in the top-right corner of the MF representation map (where the outliers are seen to be mapped) and biases the cartogram representation. The t-GTM, instead, ignores the outliers and respects the symmetry of the representation while restricting the manifold to the imaginary triangle defined by the three clusters. This is clearly reflected in the corresponding cartogram.

Notice that, in both experiments, the extra MF distortion introduced by the outliers makes the representation of the data of all clusters far more compact for GTM than for t-GTM. In any case, these simple preliminary experiments illustrate how modelling methods that behave robustly in the presence of outliers are more likely to produce faithful representations of the nonlinear mapping distortion and, as a result, more faithful data visualizations.

References

[1] A. Vellido, E. Romero, F.F. González-Navarro, Ll. Belanche-Muñoz, M. Julià-Sapé, C. Arús, Outlier exploration and diagnostic classification of a multi-centre 1H-MRS brain tumour database. Neurocomputing, 72(13-15):3085-3097, Elsevier, 2009.

[2] A. Vellido, D. García, À. Nebot, Cartogram visualization for nonlinear manifold learning models. Data Mining and Knowledge Discovery, doi: 10.1007/s10618-012-0294-6.

[3] C.M. Bishop, M. Svensén, C.K.I. Williams, GTM: The Generative Topographic Mapping. Neural Computation, 10(1):215-234, MIT Press, 1998.

[4] C.M. Bishop, M. Svensén, C.K.I. Williams, Magnification factors for the SOM and GTM algorithms. In Proceedings of the Workshop on Self-Organizing Maps (WSOM'97), pages 333-338, June 4-6, Helsinki (Finland), 1997.

[5] A. Vellido, P.J.G. Lisboa, Handling outliers in brain tumour MRS data analysis through robust topographic mapping. Computers in Biology and Medicine, 36(10):1049-1063, Elsevier, 2006.

[6] A. Vellido, Missing data imputation through GTM as a mixture of t-distributions. Neural Networks, 19(10):1624-1635, Elsevier, 2006.

[7] M.T. Gastner, M.E.J. Newman, Diffusion-based method for producing density-equalizing maps. Proceedings of the National Academy of Sciences of the United States of America, 101(20):7499-7504, National Academy of Sciences, 2004.

[8] A. Tosi, A. Vellido, Cartogram representation of the batch-SOM magnification factor. In M. Verleysen, editor, Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2012), d-side pub., pages 203-208, Bruges (Belgium), 2012.

[Figure 1: six panels in two columns, standard GTM (left) and t-GTM (right).]

Fig. 1: Top row) Representation of the original data clusters (1,500 points, 500 in each cluster, plus A-type outliers, all represented with crosses) with standard GTM (left) and t-GTM (right); the generated manifold is superimposed, represented as a grid whose knots are the model prototypes $y_k$. Central row) Maps of MF values, with a colorbar for interpretation on the right-hand side of each map. Bottom row) Corresponding cartograms, based on the MF, on which the mean projections of the data are superimposed. The mapping locations of outliers are highlighted with circles.

[Figure 2: six panels in two columns, standard GTM (left) and t-GTM (right).]

Fig. 2: Representation of data, manifold grid, MF maps and cartograms for the second experiment with B-type outliers, as in Fig. 1. Notice the difference in the mapping locations of the outliers (again highlighted with circles) as compared to Fig. 1. In this case, GTM maps the outliers to a high-distortion area that is generated by the outliers themselves and not by the cluster data points (note that this high distortion appears because of just three outlier points), whereas the t-GTM maps them correctly to the closest cluster, without any artifactual distortion.