Quality Metrics for Visual Analytics of High-Dimensional Data

Similar documents
DUE to the technological progress over the last decades,

Interactive Visual Exploration

Multidimensional Visualization and Clustering

Machine Learning Methods in Visualisation for Big Data 2018

Knowledge Discovery and Data Mining I

Model Space Visualization for Multivariate Linear Trend Discovery

Visualization with Data Clustering DIVA Seminar Winter 2006 University of Fribourg

Meta Parallel Coordinates For Visualizing Features in Large, High-Dimensional, Time-Varying Data

Large Scale Information Visualization. Jing Yang Fall Tree and Graph Visualization (2)

3. Multidimensional Information Visualization II Concepts for visualizing univariate to hypervariate data

Background. Parallel Coordinates. Basics. Good Example

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s]

Visual Analytics Techniques for Large Multi-Attribute Time Series Data

Clustering. Chapter 10 in Introduction to statistical learning

Visualization of Large Web Access Data Sets

Multiple Dimensional Visualization

Edge linking. Two types of approaches. This process needs to be able to bridge gaps in detected edges due to the reason mentioned above

Edge detection. Gradient-based edge operators

Courtesy of Prof. Shixia University

Feature Selection. Ardy Goshtasby Wright State University and Image Fusion Systems Research

Reducing Complex Visualizations for Analysis

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Pargnostics: Screen-Space Metrics for Parallel Coordinates

Visual Encoding Design

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

Parallel Coordinates ++

Exploratory Data Analysis EDA

Visualization and text mining of patent and non-patent data

Information Visualization and Visual Analytics roles, challenges, and examples Giuseppe Santucci

Using R-trees for Interactive Visualization of Large Multidimensional Datasets

TNM093 Tillämpad visualisering och virtuell verklighet. Jimmy Johansson C-Research, Linköping University

(Refer Slide Time: 0:32)

Understanding Clustering Supervising the unsupervised

Network Traffic Measurements and Analysis

Clustering Part 4 DBSCAN

Flexible Visual Summaries of High-Dimensional Data using Hierarchical Aggregated Data-Subsets

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

Interactive Interface Design for Scalable Large Multivariate Volume Visualization

Information Visualization & Visual Analytics

An Improved Apriori Algorithm for Association Rules

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Chapter 3 Image Registration. Chapter 3 Image Registration

Fitting: The Hough transform

asses for Real Valued High mensional Data Sets Kamalakar Karlapalem Indian Institute of Technology, Gandhinagar Hyderabad

Value and Relation Display for Interactive Exploration of High Dimensional Datasets

Context Change and Versatile Models in Machine Learning

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

Clustering to Reduce Spatial Data Set Size

Pixels. Orientation π. θ π/2 φ. x (i) A (i, j) height. (x, y) y(j)

Analytical Techniques for Anomaly Detection Through Features, Signal-Noise Separation and Partial-Value Association

Automated Procedures for Creating and Updating Digital City Models Using Very High Resolution (VHR) Satellite Imagery

Visualization? Information Visualization. Information Visualization? Ceci n est pas une visualization! So why two disciplines? So why two disciplines?

Project 11 Graphs (Using MS Excel Version )

SpringView: Cooperation of Radviz and Parallel Coordinates for View Optimization and Clutter Reduction

Chapter 4. Clustering Core Atoms by Location

Perception IV: Place Recognition, Line Extraction

An automated approach for the optimization of pixel-based visualizations

Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension. reordering.

University of Florida CISE department Gator Engineering. Clustering Part 4

Object and Action Detection from a Single Example

Integrating Logistic Regression with Knowledge Discovery Systems

CS6220: DATA MINING TECHNIQUES

ECLT 5810 Clustering

Enhancing K-means Clustering Algorithm with Improved Initial Center

Table Of Contents: xix Foreword to Second Edition

ECLT 5810 Clustering

SAS Visual Analytics 8.2: Working with Report Content

ECLT 5810 Data Preprocessing. Prof. Wai Lam

E0005E - Industrial Image Analysis

Visualizing Frequent Patterns in Large Multivariate Time Series

Improving 2D scatterplots effectiveness through sampling, displacement, and user perception

HOUGH TRANSFORM. Plan for today. Introduction to HT. An image with linear structures. INF 4300 Digital Image Analysis

MIS2502: Data Analytics Clustering and Segmentation. Jing Gong

Chapter 1, Introduction

Math 121 Project 4: Graphs

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

Glyphs. Presentation Overview. What is a Glyph!? Cont. What is a Glyph!? Glyph Fundamentals. Goal of Paper. Presented by Bertrand Low

Geometric Techniques. Part 1. Example: Scatter Plot. Basic Idea: Scatterplots. Basic Idea. House data: Price and Number of bedrooms

Multi-Similarity Matrices of Eye Movement Data

What is Interaction?

Scalable Pixel-based Visual Interfaces: Challenges and Solutions

Robotics Programming Laboratory

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

Introduction to Geospatial Analysis

Visual object classification by sparse convolutional neural networks

Dimension reduction : PCA and Clustering

Image Mining: frameworks and techniques

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975.

EE368 Project: Visual Code Marker Detection

CS Information Visualization Sep. 19, 2016 John Stasko

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Visual Analytics. Visualizing multivariate data:

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

Clustering CS 550: Machine Learning

netzen - a software tool for the analysis and visualization of network data about

Frequency Distributions

Many-to-Many Relational Parallel Coordinates Displays

Transcription:

Quality Metrics for Visual Analytics of High-Dimensional Data Daniel A. Keim Data Analysis and Information Visualization Group University of Konstanz, Germany Workshop on Visual Analytics and Information Fusion at KDD 2011 August 21, 2011 Definition of Visual Analytics Definition: Visual Analytics is the science of analytical reasoning facilitated by visual interfaces. Visual analytics techniques are needed to integrate data from massive, dynamic, ambiguous, and often conflicting sources analyze the data to derive new insights make critical decisions in real-time. 1

Definition of Visual Analytics Visual Analytics: Tight Integration of Visual and Automatic Data Analysis Methods for Information Exploration and Scalable Decision Support Visual Data Exploration Visualisation Data Knowledge Models Automated Data Analysis Feedback loop Roadmap from the VisMaster EU Project www.visual-analytics.eu 2

Technical Challenges When should we use automated techniques versus interactive ti techniques? When do we need both? How to best combine interactive visualizations with automated t analysis techniques? The KDD Pipeline Fayyad, U. M. et al. 1996. From data mining to knowledge discovery: an overview. In Fayyad, U. M.et al (Eds.), Advances in knowledge discovery and data mining. AAAI Press / The MIT Press. 3

Quality-Metrics-Driven Automation Enrico Bertini, Member, Andrada Tatu, Daniel Keim: Quality Metrics in High-Dimensional Data Visualization: An Overview and Systematization, IEEE Conf. on Information Visualization, 2011. Outline Visual Analytics Definition Challenges Quality Metrics Quality Metrics for Visual Mappings Data Transformations View Transformations Perspectives 4

Quality Metrics for Visual Mapping Automated Mapping Support for Pixel Bar Charts for Scatter Plots for Parallel Coordinates Pixel Bar Charts Equal-Width Pixel Bar Chart D. A. Keim, M. C. Hao, U. Dayal and M. Hsu. Pixel Bar Charts: A Visualization Technique for Very Large Multi-Attribute Data Sets. Visualization, San Diego 2001, extended version in: Information Visualization Journal, Palgrave, 1 (2), 2002. 5

Pixel Bar Charts Product Type Product Type Equal-Height Pixel Bar Chart D. A. Keim, M. C. Hao, U. Dayal and M. Hsu. Pixel Bar Charts: A Visualization Technique for Very Large Multi-Attribute Data Sets. Visualization, San Diego 2001, extended version in: Information Visualization Journal, Palgrave, 1 (2), 2002. Pixel Bar Charts high low 1 23 4 5 6 7 10 11 12 1 23 4 5 6 7 1011 11 12 1 23 4 5 6 7 1011 11 12 Product Type Product Type Product Type Color=dollar amount Color=number of visits Color=quantity Multi Pixel Bar Charts D. A. Keim, M. C. Hao, U. Dayal and M. Hsu. Pixel Bar Charts: A Visualization Technique for Very Large Multi-Attribute Data Sets. Visualization, San Diego 2001, extended version in: Information Visualization Journal, Palgrave, 1 (2), 2002. 6

Pixel Bar Charts A pixel bar chart is defined by a five tuple <D x, D y, O x, O y, C> where D x, D y, O x, O y, C {A l,, A k, } and D x / D y are the dividing attributes on the x-/y-axes O x / O y are the ordering attributes on the x-/y-axes C defines the coloring attribute(s) D. A. Keim, M. C. Hao, U. Dayal and M. Hsu. Pixel Bar Charts: A Visualization Technique for Very Large Multi-Attribute Data Sets. Visualization, San Diego 2001, extended version in: Information Visualization Journal, Palgrave, 1 (2), 2002. Pixel Bar Charts Dividing Attribute on x-axis (e.g. Product Type) D x D. A. Keim, M. C. Hao, U. Dayal and M. Hsu. Pixel Bar Charts: A Visualization Technique for Very Large Multi-Attribute Data Sets. Visualization, San Diego 2001, extended version in: Information Visualization Journal, Palgrave, 1 (2), 2002. 7

Pixel Bar Charts D y Dividing Attribute on y-axis (e.g. Region) D x D. A. Keim, M. C. Hao, U. Dayal and M. Hsu. Pixel Bar Charts: A Visualization Technique for Very Large Multi-Attribute Data Sets. Visualization, San Diego 2001, extended version in: Information Visualization Journal, Palgrave, 1 (2), 2002. Pixel Bar Charts C O y O x Ordering Attributes on x- and y-axes D. A. Keim, M. C. Hao, U. Dayal and M. Hsu. Pixel Bar Charts: A Visualization Technique for Very Large Multi-Attribute Data Sets. Visualization, San Diego 2001, extended version in: Information Visualization Journal, Palgrave, 1 (2), 2002. 8

Automated Mapping Support for Pixel Bar Charts Data Parameter Setting Visualization Color = Blocknumber D x PixelBarCharts Rework Fail Pass Color = Wert OT O y Color = Wert BKSL O x min max Parameter Space Very Large Interactive Exploration impossible Automated Mapping Support for Pixel Bar Charts Candidate Set P 1,,P n Relevance feedback J. Schneidewind, M. Sips and D. A. Keim. Pixnostics: Towards Measuring the Value of Visualization. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST '06), 2006. 9

Automated Mapping Support for Pixel Bar Charts 900 Pixel Bar Charts only few provide insight J. Schneidewind, M. Sips and D. A. Keim. Pixnostics: Towards Measuring the Value of Visualization. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST '06), 2006. Automated Mapping Support for Pixel Bar Charts Rework Fail Pass ID Temperature SSL Detailed analysis of attribute combinations: The 2nd bar reveals an obvious pattern 25 most relevant Pixel Bar Charts constructed from highly ranked attribute combinations (Correlation Analysis / Classification) J. Schneidewind, M. Sips and D. A. Keim. Pixnostics: Towards Measuring the Value of Visualization. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST '06), 2006. 10

Automated Mapping Support for Pixel Bar Charts after sorting according to our measure J. Schneidewind, M. Sips and D. A. Keim. Pixnostics: Towards Measuring the Value of Visualization. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST '06), 2006. Automated Mapping Support?? 11

Automated Mapping Support for Scatter Plots Scatterplots with class information Each color represents one class Evaluate scatterplots according their separation properties Algorithm: Separate classes in distinct images Compute a density image for each one Estimate the mutual overlap Automated Mapping Support for Scatter Plots Best ranked views Olives dataset (Class Density Measure) Worst ranked views 12

Rotating Variance Measure Find linear and non-linear correlations between pairwise dimensions Continuous density field Local density for each pixel p: ρ = 1/r r = enclosing sphere of the k-nearest neighbors of p Rotating Variance Measure Take samples along different lines centered at the corresponding pixel. Compute the weighted distribution for each pixel x i Compute the weighted distribution for each pixel x 13

Rotating Variance Measure For each column y in the image we compute the minimum v i value and sum up the result, where v(x,y) = v i Automated Mapping Support for Scatter Plots Parkinson s disease dataset (Rotating Variance Measure) Best ranked views Worst ranked views 14

Hough Space Measure Interesting patterns are usually clustered lines with similar positions and directions Hough Transform y = ax + b b= - ax + y Problem: the slope of vertical lines is infinite Normal representation: ρ = xcosθ + ysinθ y ρ x θ Hough Space Measure For each non-background pixel in the visualization, we have a distinct sinusoidal curve in the ρθ-plane Hough/Accumulator Space Hough/Accumulator Space y ρ y ρ x θ x θ 15

Hough Space Measure Good visualizations must contain fewer well defined clusters, i.e. accumulator cells with high values Compute the median value m as an adaptive threshold s i, j 1 # # highvaluecells #Cells Hough Space Measure Overall quality measure Exhaustively computing all n-dimensions combinations Long computation time Unfeasible for a large n Solve a Traveling Salesman Problem A*-Search algorithm 16

Automated Mapping Support for Parallel Coordinates Parkinson s disease dataset (Hough Space Measure) Best ranked views Worst ranked views Automated Mapping Support for Parallel Coordinates Cars dataset (Hough Space Measure for all Classes) Best ranked views Worst ranked views 17

Outline Visual Analytics Definition Challenges Quality Metrics Quality Metrics for Visual Mappings Data Transformations View Transformations Perspectives Quality Metrics for Data Transformation Quality Metrics for Filtering Jing Yang,Wei Peng, Matthew O. Ward and Elke A. Rundensteiner: Interactive Hierarchical Dimension Ordering, Spacing and Filtering for Exploration of High Dimensional Datasets, IEEE Symposium on Information Visualization 2003 (InfoVis 2003), pp 105-112, 2003. 18

Quality Metrics for Data Transformation Quality Metrics for Sorting Dimensions S. Johansson and J. Johansson: Interactive Dimensionality Reduction Through User-defined Combinations of Quality Metrics, IEEE Trans. On Visualization and Computer Graphics, 2009. Outline Visual Analytics Definition Challenges Quality Metrics Quality Metrics for Visual Mappings Data Transformations View Transformations Perspectives 19

Quality Metrics for View Transformation Metrics for View Distortion Jing Yang,Wei Peng, Matthew O. Ward and Elke A. Rundensteiner: Interactive Hierarchical Dimension Ordering, Spacing and Filtering for Exploration of High Dimensional Datasets, IEEE Symposium on Information Visualization 2003 (InfoVis 2003), pp 105-112, 2003. Outline Visual Analytics Definition Challenges Quality Metrics Quality Metrics for Visual Mappings Data Transformations View Transformations Perspectives 20

Enrico Bertini, Member, Andrada Tatu, Daniel Keim: Quality Metrics in High-Dimensional Data Visualization: An Overview and Systematization, IEEE Conf. on Information Visualization, 2011. Outline Visual Analytics Definition Challenges Quality Metrics Quality Metrics for Visual Mappings Data Transformations View Transformations Perspectives 21

Pipelines Visual Analytics Data Data Data Visualization of the data DM-Algorithm Result Visualization of the DM-Algorithm result Result Visualization of the result DM-Algorithm step 1 DM-Algorithm step n Result Visualization + Interaction Knowledge Knowledge Knowledge Preceding Visualization Subsequent Visualization Visual Analytics 22

Conclusion All truths are easy to understand once they are discovered; the point is to discover them. Galileo Galile (1564-1642) Questions? infovis.uni-konstanz.de www.idvbook.com Advanced Visual Analytics Interfaces - AVI 2010 23