High Dimensional Data Visualization

Similar documents
Methods for Visualizing High Dimensional Data

Visual Clustering in Parallel Coordinates

Exploratory Data Analysis EDA

Mathematics High School Geometry An understanding of the attributes and relationships of geometric objects can be applied in diverse contexts

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo

Mathematics High School Geometry

Evgeny Maksakov Advantages and disadvantages: Advantages and disadvantages: Advantages and disadvantages: Advantages and disadvantages:

Visual Encoding Design

Quality Metrics for Visual Analytics of High-Dimensional Data

Multiple variables data sets visualization in ROOT

Parallel-Coordinates Art

Spectral-Based Contractible Parallel Coordinates

Middle School Math Course 3

Week 7 Picturing Network. Vahe and Bethany

CSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo

Montclair Public Schools Math Curriculum Unit Planning Template Unit # SLO # MC 2 MC 3

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Foundations for Functions Knowledge and Skills: Foundations for Functions Knowledge and Skills:

Some Open Problems in Graph Theory and Computational Geometry

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS

Chemometrics. Description of Pirouette Algorithms. Technical Note. Abstract

Introduction to FEM calculations

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

pine cone Ratio = 13:8 or 8:5

On the number of distinct directions of planes determined by n points in R 3

EULER S FORMULA AND THE FIVE COLOR THEOREM

Assignment 1 Introduction to Graph Theory CO342

1a. Define, classify, and order rational and irrational numbers and their subsets. (DOK 1)

INDEPENDENT SCHOOL DISTRICT 196 Rosemount, Minnesota Educating our students to reach their full potential

This blog addresses the question: how do we determine the intersection of two circles in the Cartesian plane?

Copyright. Anna Marie Bouboulis

A GENTLE INTRODUCTION TO THE BASIC CONCEPTS OF SHAPE SPACE AND SHAPE STATISTICS

Vocabulary Unit 2-3: Linear Functions & Healthy Lifestyles. Scale model a three dimensional model that is similar to a three dimensional object.

Prentice Hall Mathematics: Course Correlated to: Massachusetts State Learning Standards Curriculum Frameworks (Grades 7-8)

Chapter 5. Projections and Rendering

Stereo Vision. MAN-522 Computer Vision

Modelling and Visualization of High Dimensional Data. Sample Examination Paper

The Topology of Bendless Orthogonal Three-Dimensional Graph Drawing. David Eppstein Computer Science Dept. Univ. of California, Irvine

CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS

Graph/Network Visualization

INTRODUCTION TO ECONOGRAPHICATION

Structure from Motion. Prof. Marco Marcon

Courtesy of Prof. Shixia University

Instructor: Paul Zeitz, University of San Francisco

cs6964 February TABULAR DATA Miriah Meyer University of Utah

Non-linear dimension reduction

Geometry I Can Statements I can describe the undefined terms: point, line, and distance along a line in a plane I can describe the undefined terms:

Mathematics 6 12 Section 26

Geometry. Cluster: Experiment with transformations in the plane. G.CO.1 G.CO.2. Common Core Institute

Prentice Hall Pre-Algebra 2004 Correlated to: Hawaii Mathematics Content and Performance Standards (HCPS) II (Grades 9-12)

Chapter 4. Relations & Graphs. 4.1 Relations. Exercises For each of the relations specified below:

Algorithms: Graphs. Amotz Bar-Noy. Spring 2012 CUNY. Amotz Bar-Noy (CUNY) Graphs Spring / 95

MATHEMATICS SYLLABUS SECONDARY 5th YEAR

Dimension Reduction CS534

Rational Numbers: Graphing: The Coordinate Plane

PO 2. Identify irrational numbers. SE/TE: 4-8: Exploring Square Roots and Irrational Numbers, TECH: itext; PH Presentation Pro CD-ROM;

Mathematics Curriculum

Preliminaries: Size Measures and Shape Coordinates

8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks

3. Multidimensional Information Visualization II Concepts for visualizing univariate to hypervariate data

Network Traffic Measurements and Analysis

Discrete mathematics , Fall Instructor: prof. János Pach

Data mining with Support Vector Machine

Chapter 2 Basic Structure of High-Dimensional Spaces

Honors Precalculus: Solving equations and inequalities graphically and algebraically. Page 1

Multidimensional Data and Modelling - Operations

Lecture Topic Projects

Cambridge IGCSE mapping

Conic Sections and Locii

Chapter 12 and 11.1 Planar graphs, regular polyhedra, and graph colorings

Matija Gubec International School Zagreb MYP 0. Mathematics

New Jersey Core Curriculum Content Standards for Mathematics Grade 7 Alignment to Acellus

Clustering and Visualisation of Data

Clustering CS 550: Machine Learning

Multiple View Geometry in Computer Vision

Prentice Hall Mathematics: Course Correlated to: Colorado Model Content Standards and Grade Level Expectations (Grade 8)

On Order-Constrained Transitive Distance

CME Project, Algebra Correlated to: The Pennsylvania Math Assessment Anchors and Eligible Content (Grade 11)

The Rectangular Coordinate System and Equations of Lines. College Algebra

Computational Geometry

Grade 7 Math Curriculum Map Erin Murphy

GEOMETRIC TOOLS FOR COMPUTER GRAPHICS

Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Introduction

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Bezier Curves. An Introduction. Detlef Reimers

An Experiment in Visual Clustering Using Star Glyph Displays

Geometry. Every Simplicial Polytope with at Most d + 4 Vertices Is a Quotient of a Neighborly Polytope. U. H. Kortenkamp. 1.

arxiv: v1 [cs.cc] 30 Jun 2017

(Refer Slide Time: 00:04:20)

Charting new territory: Formulating the Dalivian coordinate system

Multidimensional Data and Modelling

VW 1LQH :HHNV 7KH VWXGHQW LV H[SHFWHG WR

EVOLVE : A Visualization Tool for Multi-Objective Optimization Featuring Linked View of Explanatory Variables and Objective Functions

Performance Level Descriptors. Mathematics

Section 3.4 Basic Results of Graph Theory

Visual Analytics. Visualizing multivariate data:

Assignments are handed in on Tuesdays in even weeks. Deadlines are:

CS 231A Computer Vision (Winter 2015) Problem Set 2

CS195H Homework 5. Due:March 12th, 2015

EM225 Projective Geometry Part 2

Transcription:

High Dimensional Data Visualization

Some examples Text data. Finance. Time Series Data. Climate Data (http://www.erh.noaa.gov/lwx/f6.htm ). Spatial Data. Spatio temporal Data. Biological Data Many others US Industry Ratios. Each industry is characterized by eight numerical values (market capitalization, price/earning ratio, etc.) Source: visumap.net

Curse of Dimensionality Estimating the Gaussian density function around zero:

Phenomena in high dimensions High dimensional samples More noise! Samples in high dimensional space tend to be equidistant! Concentration of measure phenomenon The mathematical and statistical properties of high dimensional data spaces are often poorly understood or inadequately considered. Data are rarely randomly distributed in high dimensions and are highly correlated, often with spurious correlations. Literature: M. Verleysen: Learning high dimensional data. Limitations and Future Trends in Neural Computation, S. Ablameyko et al. (Eds.), IOS Press, 2003, pp. 141 162 D. L. Donoho, High Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Lecture on August 8, 2000, to the American Mathematical Society "Math Challenges of the 21st Century". Available from http://wwwstat.stanford.edu/~donoho/. R Clarke, et al. The properties of high dimensional data spaces: implications for exploring gene and protein expression data. https://www.ncbi.nlm.nih.gov/pmc/articles/pmc2238676/ 4

Representative Methods Parallel coordinates Scatter plots Mosaicplot Grand tour Trellis displays Linked views Multivariate visualization Others

Parallel Coordinates Orthogonal coordinates fail after three dimensions Basic idea: Give up orthogonality Draw the coordinate system as a series of parallel axes in a plane Plot a point in parallel coordinates as a polyline This is a unique representation of a point. The number of axes is effectively unlimited.

Parallel Coordinates Parallel coordinate plot for 10 variables on almost 400 cars The polyline represents the point,,,,

Parallel Coordinates Points in Cartesian space map into lines in parallel coordinate space and vice versa There is a fundamental projective geometry duality in this representation

Parallel Coordinates Points in Cartesian space map into lines in parallel coordinate space and vice versa There is a fundamental projective geometry duality in this representation

Parallel Coordinates Points in Cartesian space map into lines in parallel coordinate space and vice versa There is a fundamental projective geometry duality in this representation

Parallel Coordinates Points in Cartesian space map into lines in parallel coordinate space and vice versa There is a fundamental projective geometry duality in this representation More dualities Circles and ellipses in Cartesian space map into hyperbolas in parallel coordinate space Rotations in Cartesian space map into translations in parallel coordinate space and vice versa Points of inflection in Cartesian space map into cusps in parallel coordinate space. These dualities allow easy interpretation in parallel coordinate space

Utility of Parallel Coordinates Overview No other statistical graphic can plot so much information (cases and variables) at a time. Thus parallel coordinate plots are an ideal tool to get a first overview of a data set. All axes have been scaled to min max. Several features, like a few very expensive cars, three very fuel efficient cars, and the negative correlation between car size and gas mileage, are immediately apparent.

Utility of Parallel Coordinates Profiles Parallel coordinate plots can be used to visualize the profile of a single case via highlighting. Profiles are not only restricted to single cases but can be plotted for a whole group, to compare the profile of that group with the rest of the data. The most fuel efficient car (Honda Insight 2dr.) highlighted

Utility of Parallel Coordinates Monitor When working on subsets of a data set parallel coordinate plots can help to relate features of a specific subset to the rest of the data set. When looking at the result of a multidimensional scaling procedure, parallel coordinate plots can help to find the major axes, which influence the configuration of the multidimensional scaling (MDS). MDS is a class of techniques where a set of given distances is approximated by distances in low dimensional Euclidean space The four cars in the lower right of the MDS are highlighted and happen to be the most expensive and most powerful cars in the data set

Parallel Coordinate Limitation A common question about parallel coordinates involves the adjacency issue. Axes that are adjacent allow for easier comparison than axes that are not adjacent. How many parallel coordinate displays do you need so that all pairwise adjacencies are present?

Parallel Coordinate Limitation If the parallel coordinate axes are ordered from 1 through d, then there is an easy pairwise comparison of 1 with 2, 2 with 3 and so on. However, the pairwise comparison of 1 with 3, 2 with 5 and so on was not easily done because these axes were not adjacent. One simple mathematical question then is what is the minimal number of permutations of the axes in order to guarantee all possible pairwise adjacencies. Although there are d! (fractorial of d) permutations, many of these are duplicate adjacencies. Actually far fewer permutations are required.

Parallel Coordinate Theory Graph representation for the axes A graph is drawn with vertices representing coordinate axes, labeled clockwise 1 to d. Edges represent adjacencies, so that vertex one connected to vertex two by an edge means axis one is placed adjacent to axis two. Based on the graph, To construct a minimal set of permutations that completes the graph is equivalent to findings a minimal set of orderings of the axes so that every possible adjacency is present

Parallel Coordinate Limitation

Hamiltonian Paths It is about how to represent all the spanning trees of the complete graph that the union of the edges in the obtained spanning trees is the original set of the edges and the intersection of the edges in the obtained spanning trees is empty or close to empty. A technique called Hamiltonian paths and Hamiltonian cycles of the graph can be used.

Hamiltonian paths and Hamiltonian cycle A Hamiltonian path or traceable path is a path that visits each vertex exactly once. A graph that contains a Hamiltonian path is called a traceable graph. A graph is Hamiltonian connected if for every pair of vertices there is a Hamiltonian path between the two vertices. A Hamiltonian cycle, Hamiltonian circuit, vertex tour or graph cycle is a cycle that visits each vertex exactly once (except for the vertex that is both the start and end, which is visited twice). A graph that contains a Hamiltonian cycle is called a Hamiltonian graph. Examples a complete graph with more than two vertices is Hamiltonian. every cycle graph is Hamiltonian. every tournament has an odd number of Hamiltonian paths (Rédei 1934). every platonic solid, considered as a graph, is Hamiltonian. Every maximal planar graph with no separating triangles is Hamiltonian.

A Simple Way to Find Hamiltonian Paths A basic zig zag pattern can be used in the construction. This creates an ordering which in the example is 1 2 7 3 6 4 5. For d even this general sequence can be written as 1, 2,, 3, 1, 4, 2, 2 /2 and for d odd as 1, 2,, 3, 1, 4, 2,. 3 /2. An even simpler formulation is: 1 mod, 1,2,, 1 With 1. Here 0 mod mod.

Parallel Coordinates Limitation II Hard to scale to large scale data sets! 400 data points 3848 data points

Parallel Coordinates Limitation II Certain clustering technique is needed + enhanced visual representation for the visualization of the obtained clusters! Original Density based 3848 data points

Parallel Coordinates Limitation II Other improvements arc edge [Zhou et al. EuroVis08] splatting [Zhou et al. EuroVis09] Continuous representation [Lehmann and Theisel TVCG11]

Parallel Coordinates Limitation II Other improvements Force direct [Walker et al. 2013] PCP + Angular histogram [Geng et al. Vis2011] Parallel Coordinates Art [Heinrich and Weiskopf Vis2013]

Additional Readings Alfred Inselberg. "Parallel Coordinates: Interactive visualization for high dimensions", Trends in Interactive Visualization, Advanced information and knowledge processing 2009, pp 49 78. Julian Heinrich and Daniel Weiskopf. "State of the art of Parallel Coordinates", Eurographics 2013, State of the Art Reports, pp. 95 116. Zhao Geng, Zhenmin Peng, Robert S.Laramee, Rick Walker, and Jonathan C. Roberts, Angular Histograms: Frequency Based Visualizations for Large, High Dimensional Data, IEEE TVCG, Vis 2011. Hong Zhou, Weiwei Cui, Huamin Qu, Yingcai Wu, Xiaoru Yuan, Wei Zhou, Splatting the lines in parallel coordinates, EuroVis09. Hong Zhou, Xiaoru Yuan, Huamin Qu, Weiwei Cui, Baoquan Chen. Visual clustering in parallel coordinates. EuroVis08. Julian Heinrich and Daniel Weiskopf. "Parallel Coordinates Art". IEEE VIS 2013 Arts Program.

Software for PCP While there is a large amount of papers about parallel coordinates, there are only few notable software publicly available to convert databases into parallel coordinates graphics. Notable software are ELKI, GGobi, Macrofocus High D, Mondrian, and ROOT. Libraries include Protovis.js, D3.js provide basic examples, while more complex examples are also available. D3.Parcoords.js (a D3 based library) and Macrofocus High D API (a Java library) specifically dedicated to coords graphic creation have also been published. The Python data structure and analysis library Pandas implements parallel coordinates plotting, using the plotting library matplotlib.the R package GGally, among others, implements parallel coordinates plotting. https://en.wikipedia.org/wiki/parallel_coordinates http://www.xdat.org/ http://davis.wpi.edu/xmdv/ http://www.oicweave.org/ http://www.kdnuggets.com/software/visualization.html

Dimensionality reduction Assume that the data of interest lie on an embedded non linear manifold within the higher dimensional space. If the manifold is of low enough dimension, the data can be visualised in the lowdimensional space Linear Decomposition Methods Principle component analysis (PCA) Singular value decomposition (SVD) Non linear dimensionality reduction Multidimensional scaling manifold learning many more!! https://en.wikipedia.org/wiki/nonlinear_dimensionality_reduction

Visual Analytics

Numerous data Data deluge! Useful knowledge! transformational science and engineering

Numerous data Data deluge! Useful knowledge! transformational science and engineering

Numerous data Data deluge! Useful knowledge! transformational science and engineering Visual analytics the science of combining interactive visual interfaces with automatic algorithms to support analytical reasoning and build synergies between humans and computers

Visual Analytics Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces People use visual analytics tools and techniques to Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data Detect the expected and discover the unexpected Provide timely, defensible, and understandable assessments Communicate assessment effectively for action. Not really an area per se More of an umbrella notion Combines multiple areas or disciplines Ultimately about using data to improve our knowledge and help make decisions Main Components:

Combine strengths of both human and electronic data processing Gives a semi automated analytical process Use strengths from each Below from Keim, 2008 Keim et al, chapter in Information Visualization: Human Centered Issues and Perspectives, 2008

Domain Roots Dept. of Homeland Security supported founding VA research Area has thus been connected with security, intelligence, law enforcement Should be domain independent, however, as other areas need VA too Business, science, biology, legal, etc. Source: Google images

Acknowledgment Thank the following people for the materials Prof. Min Chen Prof. Niklas Elmqvist

Structure of your final project report (4 8 pages using IEEE VGTC style) https://www.computer.org/web/tvcg/author Abstract Introduction Related work (adapt your literature review for this) Implementation Results Future directions References