Evgeny Maksakov: Advantages and disadvantages



Problems with Visualizing High-Dimensional Data
Evgeny Maksakov
CS533C, Department of Computer Science, UBC

Today
- Problem overview
- Direct visualization approaches
- Dimensional anchors
- Scagnostic SPLOMs
- Nonlinear dimensionality reduction: classical methods, Locally Linear Embedding and Isomap, charting a manifold

Problem Overview
- High dimensionality
- Visual cluttering
- Clarity of representation
- Visualization is time consuming

Multiple Line Graphs
- Hard to distinguish dimensions if multiple line graphs are overlaid
- Each dimension may have a different scale that should be shown
- More than 3 dimensions can become confusing

Scatter Plot Matrices
+ Useful for looking at all possible two-way interactions between dimensions
- Become inadequate for medium to high dimensionality

Bar Charts, Histograms
+ Allow seeing correlations between any two variables when the data is sorted according to one particular dimension
+ Good for small comparisons
- Can be confusing
- Contain little data

Circular Techniques
+ Many connected dimensions are seen in limited space
+ Combine properties of glyphs and parallel coordinates, making pattern recognition easier
+ Can show trends in data
+ Compact
- Become inadequate for very high dimensionality
- Cluttering, especially near the center
- Relations between each pair of dimensions are harder to interpret than in parallel coordinates
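The "inadequate for medium to high dimensionality" point about scatterplot matrices has a simple arithmetic core: a SPLOM needs one panel per unordered pair of dimensions, so the panel count grows quadratically. A quick sketch (not from the slides):

```python
def splom_panels(d: int) -> int:
    """Number of distinct two-way scatterplots among d dimensions: C(d, 2)."""
    return d * (d - 1) // 2

for d in (3, 10, 50):
    print(d, splom_panels(d))  # 3 panels, 45 panels, 1225 panels
```

Fifty dimensions already require over a thousand panels, which is why SPLOMs stop being readable well before "high" dimensionality.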

Andrews Curves
+ Allow drawing virtually unlimited dimensions
- Hard to interpret
- High computational complexity

Radviz
+ Good for data manipulation
+ Low cluttering
- Cannot show quantitative data
Radviz employs a spring model.

What Is a Dimensional Anchor?
Dimensional anchors (DAs) attempt to generalize visualization methods for high-dimensional data. Despite the name, there is nothing exotic about it: a DA is just an axis line, and anchor points are coordinates.
(Pictures from members.fortunecity.com/agreeve/seacol.htm and http://kresby.grafika.cz/data/media/46/dimension.jpg_middle.jpg)

DA Visualization Vector P = (p1, p2, p3, p4, p5, p6, p7, p8, p9)
Scatterplot features:
1. Size of the scatterplot points
2. Length of the perpendicular lines extending from individual anchor points in a scatterplot
3. Length of the lines connecting scatterplot points that are associated with the same data point
Survey plot feature:
4. Width of the rectangle in a survey plot
Parallel coordinates features:
5. Length of the parallel coordinate lines
6. Blocking factor for the parallel coordinate lines
Radviz features:
7. Size of the radviz plot point
8. Length of spring lines extending from individual anchor points of the radviz plot
9. Zoom factor for the spring constant K

A DA describes a visualization for any combination of: scatterplots, scatterplots with other layouts, scatterplot matrices, survey plots (histograms), parallel coordinates, radviz, and circle segments.

Examples:
- 2 DAs, P = (0.8, 0.2, 0, 0, 0, 0, 0, 0, 0)
- 2 DAs, P = (0.1, 1.0, 0, 0, 0, 0, 0, 0, 0)
- 3 DAs, P = (0.6, 0, 0, 0, 0, 0, 0, 0, 0)
- 5 DAs, P = (0.5, 0, 0, 0, 0, 0, 0, 0, 0)
- P = (0, 0, 0, 0.4, 0, 0, 0, 0, 0)
- P = (0, 0, 0, 1.0, 0, 0, 0, 0, 0)
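The radviz spring model can be sketched in a few lines: each dimension gets an anchor on the unit circle, and a data point with values in [0, 1] rests at the weighted mean of the anchors, which is the equilibrium of equal-stiffness springs. A minimal illustration, not Hoffman et al.'s implementation:

```python
from math import cos, sin, tau

def radviz(point):
    """Place a point with per-dimension values in [0, 1] inside the unit circle.

    Anchor i sits at angle tau*i/d on the circle; the point goes to the
    weighted mean of the anchors, with its coordinate values as weights.
    """
    d = len(point)
    anchors = [(cos(tau * i / d), sin(tau * i / d)) for i in range(d)]
    total = sum(point)
    if total == 0:
        return (0.0, 0.0)  # no spring pulls at all: leave it at the center
    x = sum(w * ax for w, (ax, _) in zip(point, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(point, anchors)) / total
    return (x, y)

print(radviz([1.0, 1.0, 1.0, 1.0]))  # equal pull from all anchors: near the center
print(radviz([1.0, 0.0, 0.0, 0.0]))  # pulled entirely to anchor 0
```

This also makes the "cannot show quantitative data" weakness visible: very different points (e.g. all-zeros and all-ones) land on the same position, because only the ratios of the values matter.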

Playing with Parameters
- Circular segments
- Radviz-like visualization
- Crisscross layout with P = (0, 0, 0, 0, 0, 0, 0.4, 0, 0.5)
- Parallel coordinates with P = (0, 0, 0, 0, 0, 0, 0.4, 0, 0.5)
- P = (0, 0, 0, 1.0, 0, 0, 0, 0, 0)
- P = (0, 0, 0, 0, 1.0, 1.0, 0, 0, 0)
- P = (0, 0, 0, 0, 0, 0, 0.5, 1.0, 0.5)
Pictures from Patrick Hoffman et al. (1999)

More? Scatterplot Diagnostics
Tukey's idea of scagnostics (scagnostic SPLOMs):
- Take measures from the scatterplot matrix
- Construct a scatterplot matrix (SPLOM) of these measures
- Look for data trends in this SPLOM
It is like visualizing a set of pointers; a set of pointers to pointers can also be constructed.
Goal: to be able to locate unusual clusters of measures that characterize unusual clusters of raw scatterplots.

Problems with constructing a scagnostic SPLOM:
1) Some of Tukey's measures presume an underlying continuous empirical or theoretical probability function, which can be a problem for other types of data.
2) The computational complexity of some of the Tukey measures is O(n^3).

Solution (Leland Wilkinson et al., 2005):
1. Use measures from graph theory: they do not presume a connected plane of support and can be metric over discrete spaces.
2. Base the measures on subsets of the Delaunay triangulation, which gives O(n log n) in the number of points.
3. Use adaptive hexagon binning before computing, to further reduce the dependence on n.
4. Remove outlying points from the spanning tree.

Properties of the geometric graph used for the measures:
- Undirected (edges consist of unordered pairs)
- Simple (no edge pairs a vertex with itself)
- Planar (has an embedding in R2 with no crossed edges)
- Straight (embedded edges are straight line segments)
- Finite (V and E are finite sets)
Graphs that fit these demands: convex hull, alpha hull, minimal spanning tree.

Measures:
- Length of an edge
- Length of a graph
- A closed path (boundary of a polygon)
- Perimeter of a polygon
- Area of a polygon
- Diameter of a graph

Five interesting aspects of scattered points, used for classifying scatterplots and looking for anomalies:
- Outliers: outlying
- Shape: convex, skinny, stringy, straight
- Trend: monotonic
- Density: skewed, clumpy
- Coherence: striated

Nonlinear Dimensionality Reduction (NLDR)
A manifold is a topological space that is locally Euclidean.
Assumptions:
- The data of interest lies on an embedded nonlinear manifold within a higher-dimensional space
- The manifold is low dimensional, so the data can be visualized in a low-dimensional space
Methods: Locally Linear Embedding (LLE) and Isomap.
(Pictures from http://en.wikipedia.org/wiki/image:kleinbottle01.png and http://en.wikipedia.org/wiki/image:triangle_on_globe.jpg)

Isomap Algorithm
1. Construct the neighborhood graph
2. Compute the shortest paths
3. Construct the d-dimensional embedding (as in MDS)
(Pictures taken from http://www.cs.wustl.edu/~pless/isomapimages.html and Joshua B. Tenenbaum et al., 2000)

Application of LLE: the original sample and its mapping by LLE (picture from Lawrence K. Saul et al., 2002).

Limitations of LLE
- The algorithm can only recover embeddings whose dimensionality, d, is strictly less than the number of neighbors, K; a margin between d and K is recommended.
- The algorithm is based on the assumption that a data point and its nearest neighbors can be modeled as locally linear; for curved manifolds, too large a K will violate this assumption.
- If the data is originally low dimensional, the algorithm degenerates.

Proposed improvements (Lawrence K. Saul et al., 2002):
- Analyze pairwise distances between data points instead of assuming that the data is a multidimensional vector
- Estimate the intrinsic dimensionality
- Enforce the intrinsic dimensionality if it is known a priori or highly suspected

Strengths and weaknesses:
- Isomap handles holes well
- Isomap can fail if the data hull is nonconvex; vice versa for LLE
- Both offer embeddings without mappings
(Picture from Lawrence K. Saul et al., 2002)

Charting a Manifold
Where Isomap and LLE fail, charting prevails. Algorithm idea:
1) Find a set of data covering locally linear neighborhoods ("charts") such that adjoining neighborhoods span maximally similar subspaces
2) Compute a minimal-distortion merger ("connection") of all charts
(Video test.)
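The first two Isomap steps above can be sketched end to end in plain Python: sample a one-dimensional manifold (a half-circle) embedded in 2-D, connect each point to its K nearest neighbors, and run all-pairs shortest paths (Floyd-Warshall here for brevity) to approximate geodesic distances. This stops before the final MDS step and is only an illustration under those simplifications:

```python
from math import cos, sin, dist, pi

# 21 points on a half-circle: a 1-D manifold embedded in 2-D
pts = [(cos(i * pi / 20), sin(i * pi / 20)) for i in range(21)]

INF = float("inf")
n = len(pts)
K = 2  # neighbors per point; enough for a 1-D curve

# Step 1: neighborhood graph (symmetrized K-nearest-neighbor edges)
d = [[INF] * n for _ in range(n)]
for i in range(n):
    d[i][i] = 0.0
    nbrs = sorted(range(n), key=lambda j: dist(pts[i], pts[j]))[1:K + 1]
    for j in nbrs:
        w = dist(pts[i], pts[j])
        d[i][j] = min(d[i][j], w)
        d[j][i] = min(d[j][i], w)

# Step 2: shortest paths through the graph approximate geodesic distances
for k in range(n):
    for i in range(n):
        for j in range(n):
            if d[i][k] + d[k][j] < d[i][j]:
                d[i][j] = d[i][k] + d[k][j]

# The arc's endpoints are Euclidean distance 2 apart (the diameter),
# but their graph distance approximates the arc length pi.
print(round(dist(pts[0], pts[-1]), 2), round(d[0][n - 1], 2))  # 2.0 3.14
```

Feeding the matrix `d` into classical MDS (Isomap's step 3) would then unroll the arc into a straight line. The half-circle also hints at the convexity issue above: the geodesic machinery works here precisely because the neighborhood graph stays connected along the curve.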

Questions?

Literature
Covered papers:
1. Graph-Theoretic Scagnostics. L. Wilkinson, A. Anand, R. Grossman. Proc. InfoVis 2005.
2. Dimensional Anchors: A Graphic Primitive for Multidimensional Multivariate Information Visualizations. Patrick Hoffman et al. Proc. Workshop on New Paradigms in Information Visualization and Manipulation, Nov. 1999, pp. 9-16.
3. Charting a Manifold. Matthew Brand. NIPS 2003.
4. Think Globally, Fit Locally: Unsupervised Learning of Nonlinear Manifolds. Lawrence K. Saul & Sam T. Roweis. University of Pennsylvania Technical Report MS-CIS-02-18, 2002.

Other papers:
- A Global Geometric Framework for Nonlinear Dimensionality Reduction. Joshua B. Tenenbaum, Vin de Silva, John C. Langford. Science 290, 2319-2323 (2000).