TableLens: A Clear Window for Viewing Multivariate Data Ramana Rao July 11, 2006

Can a few simple operators on a familiar and minimal representation provide much of the power of exploratory data analysis? This became the question driving the design of TableLens shortly after a somewhat accidental discovery.

I had joined the Information Visualization team at Xerox PARC (Palo Alto Research Center), and we were on a march to invent techniques for a variety of canonical information structures. I was working on a concept called SpreadCube in 1993, which was an effort to perceptualize tabular data models in 3-D spaces. Meanwhile, as life had it, I was training for a marathon, and I was keeping a running log as a tabular text file. On an 8-mile route, I visualized what my running log would look like if I colored the time run on a given day by the route. This thought led to a quick hack to display a table showing both numbers and colors in cells. And then tweaking away, I realized quickly that bars could be placed into the cell, miniaturized, losing the numbers, and voilà, TableLens was born.

TableLens, unlike most of the other visualizations created by our PARC team, is by itself a general application. Quite quickly it became apparent that a very small set of features provides tremendous exploratory power. Though generally it is important to demonstrate with data that people know or care about, in this article, I'll use original examples. Even after all these years, these "ancient" examples seem sufficient to illustrate TableLens' power and simplicity.

Figure 1: TableLens puts graphics first in a familiar table structure so patterns and outliers can readily be seen.

TableLens enables exploring multivariate data sets by arranging data into tabular rows and columns. Figure 1 shows baseball hitting statistics from 1987 for 323 baseball players (the rows). It has 25 values (the columns) for each player. Some of the cells are said to be in "focus," meaning sufficient space is allocated to display the values in those cells, while others are in "context," meaning they only contain graphical representations. The columns show quantitative statistics including season and career at bats, hits, home runs, and RBIs as well as categorical properties including team and position on the field. Quantitative variables are represented by graphical bars proportional in length to the represented values, and category values by a corresponding color and position within the field.

Though this example shows statistics for baseball players, the approach clearly applies to many data sets of a very common organization, variously called cases-by-variables arrays, multivariate data, and relational tables. Such data sets are widespread in science, business, economics, government, education, and modern life. Consider how widely relational databases are applied. Other simple examples include:

Clinical trials patients with health and trial effect data
Companies with various financial and investment data
Parts or products with descriptive, pricing, and availability information
Countries with their geographical, economic, and demographic properties
Kinds of cars with their physical or performance characteristics
Stocks or mutual funds with various ratios and ratings

TableLens has a few features and operators that provide its basic power:

Parallel Cases. That's fancy for table. Tables are familiar to many people. Contrast this with parallel coordinates (a complicated visualization of multivariate data that uses a variation of a line graph), which are quite powerful but still quite unusual. By keeping cases (that is, objects) aligned across the rows, it's easy to process them en masse by scanning up and down and across.

Put Graphics First. No wizards or summaries or anything to get graphics, just all the data arrayed visually within a table. A thousand bars can be scanned much more rapidly than a thousand numbers shown as text. This allows easy spotting of trends, correlations, outliers, and so on within and across samples.

Sorting. Sorting the table by different variables leads to a simple and intuitive way of understanding the shape and spread of values of a given variable, as well as a means to spot correlations between variables.

Focus Plus Context. TableLens allows "opening" up regions that also show textual values along with the graphical representation. This ability to see specific values in context means that there isn't a back and forth shuffling between different views.

Rearrangement. Not to be underestimated is the ability to reorder columns, not only to use proximity as an external cognitive aid, but also because manipulation itself supports a fluid and dynamic thinking process.
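The "graphics first" and "focus plus context" ideas above can be illustrated with a rough text-only sketch. This is not TableLens code; the function names (`bar`, `render`), the `#` character bars, and the tiny player table are all hypothetical, chosen only to show bars scaled to values with a few "focus" rows that also display the number.

```python
def bar(value, max_value, width=20):
    """Return a text bar whose length is proportional to value / max_value."""
    if max_value <= 0:
        return ""
    return "#" * round(width * value / max_value)


def render(rows, column, focus=()):
    """Render one column as bars.

    Rows whose index is in `focus` also show the name and numeric value
    (the "focus" cells); all other rows show only the bar ("context").
    """
    top = max(r[column] for r in rows)
    lines = []
    for i, r in enumerate(rows):
        b = bar(r[column], top)
        if i in focus:
            lines.append(f"{r['name']:<8}{b} {r[column]}")
        else:
            lines.append(b)
    return lines


# Hypothetical data, not taken from the article's baseball set.
players = [
    {"name": "A", "at_bats": 300},
    {"name": "B", "at_bats": 600},
    {"name": "C", "at_bats": 150},
]

# Sort by the column (descending), then open the first row as a focus row.
players.sort(key=lambda r: r["at_bats"], reverse=True)
for line in render(players, "at_bats", focus={0}):
    print(line)
```

A real implementation would draw pixel bars rather than characters, but the proportional scaling and the per-row choice between bar-only and bar-plus-value are the same idea.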

Let's look in a little more detail. Sorting a column provides extremely powerful analytical capabilities. Many properties of the batch of values in a sorted column are apparent by examining the graphical marks (for example, bars) and the shape of the curve in the column. Essentially, a sorted column serves to explore the sample of values (as does a boxplot); but just as importantly, after one variable has been sorted, if another variable is correlated, then its values will also appear to be sorted. Thus, looking for correlated variables is a matter of scanning across the columns to identify other columns that exhibit a similarly shaped descending curve or one that approximates a mirror image of the curve.

In Figure 2, the user has clicked on the At Bats column header, thus sorting the rows by that column's values. A number of other quantitative variables nearby now appear roughly sorted, thus revealing correlation. Salary also is very roughly correlated, though as would be expected, there must be other factors. A next question might be: Who is the guy that makes so much money?

Figure 2: The user has sorted rows by At Bats.

Figure 3 shows a different data set comprised of 500+ cars and nine variables. The table was sorted by MPG (miles per gallon) within a sort by a categorical variable, the origin of the car (European, Japanese, or American). Besides various quantitative correlations, a number of correlations with categories also show up. For example, at the time of this data, America made all the eight-cylinder, gas-guzzling cars.
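The sort-then-scan idea described above (sort by one column, then look for other columns that also appear sorted or mirror-sorted) can be approximated numerically. The sketch below is an assumption-laden stand-in for the visual judgment, not something TableLens is stated to compute: it uses a rank correlation (Pearson correlation of rank positions, with ties broken by row order as a simplification), and the function names and sample table are hypothetical.

```python
def ranks(values):
    """Rank positions (0 = smallest). Ties are broken by original order,
    which is a simplification compared with average ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def sortedness(table, sort_col):
    """Sort rows by `sort_col` (descending) and report, for every other
    column, how closely its ordering follows the sorted column:
    near +1 looks sorted the same way, near -1 looks like a mirror image."""
    rows = sorted(table, key=lambda r: r[sort_col], reverse=True)
    key_ranks = ranks([r[sort_col] for r in rows])
    return {
        col: pearson(key_ranks, ranks([r[col] for r in rows]))
        for col in rows[0]
        if col != sort_col
    }


# Hypothetical numbers, not from the article's data sets.
table = [
    {"at_bats": 450, "hits": 120, "errors": 9},
    {"at_bats": 600, "hits": 180, "errors": 3},
    {"at_bats": 150, "hits": 30, "errors": 20},
    {"at_bats": 300, "hits": 80, "errors": 12},
]
scores = sortedness(table, "at_bats")
print(scores)
```

In this toy table `hits` rises with `at_bats` and `errors` falls, so the two scores land at the positive and negative extremes; real data would give intermediate values, matching the "roughly sorted" columns the article describes.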

Figure 3: 500 cars sorted by Miles per Gallon and Origin (American, European, or Japanese)

TableLens can scale to more rows than the number of vertical pixels available by mapping multiple values into a single pixel line. By aggregating the multiple values using a choice of min, max, median, or random, various tasks can be handled. In any case, even with 500 values, statistically valid inferences can be made with random samples, since the absolute size of a sample is more important than the proportion of the population it covers.

The use of graphical representations clearly provides a scale advantage because the bars can be scaled to one pixel wide without disturbing comparisons. Scanning across the baseball data in a spreadsheet would require 18 vertical and 3 horizontal screen scrolls of a similarly sized window. More importantly, because the cost of processing information is now greatly reduced, new methods of exploration and analysis are made available to broader sets of users. At a glance, patterns and outliers are apparent; and then, opportunistically, the user can focus on portions of the table of interest for more detailed observation. Thus is enabled the data-driven style of exploration espoused in EDA (exploratory data analysis, as first described by the late Princeton statistician John Tukey).

The combination of features here is unique and emerged rather rapidly in the right place and right time for our PARC visualization team in 1993. Certainly, various ideas from others relate as background influences to TableLens. The most important of these include spreadsheets in general, Lotus Improv in particular, exploratory data analysis, and graphical manipulations as inspired by Jacques Bertin.
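The row aggregation mentioned earlier (mapping multiple values into a single pixel line using min, max, median, or random) might look roughly like the following. The article does not say how TableLens assigns rows to pixel lines, so the equal-sized contiguous buckets here are an assumption, as are the names `AGGREGATORS` and `to_pixel_lines`.

```python
import random
import statistics

# The four aggregation choices named in the article, mapped to Python callables.
AGGREGATORS = {
    "min": min,
    "max": max,
    "median": statistics.median,
    "random": random.choice,
}


def to_pixel_lines(values, n_lines, how="max"):
    """Collapse `values` (already in display order) into at most `n_lines`
    numbers, one per available pixel line.

    Assumption: rows are split into contiguous buckets of near-equal size.
    """
    if n_lines <= 0:
        raise ValueError("n_lines must be positive")
    if len(values) <= n_lines:
        return list(values)  # every row gets its own pixel line
    aggregate = AGGREGATORS[how]
    lines = []
    for k in range(n_lines):
        start = k * len(values) // n_lines
        end = (k + 1) * len(values) // n_lines
        lines.append(aggregate(values[start:end]))
    return lines


# Hypothetical column of 10 values squeezed into 5 pixel lines.
column = list(range(10))
print(to_pixel_lines(column, 5, "max"))
print(to_pixel_lines(column, 5, "median"))
```

Which aggregator suits a task is a judgment call: `max` or `min` keeps extreme values visible for outlier hunting, while `median` or `random` gives a more representative picture of each bucket.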

The field of exploratory data analysis offers some inspiration for building systems focused on exploration and discovery. In Tukey's words, EDA is "about looking at data to see what it seems to say." As opposed to utilizing data to confirm a carried-in hypothesis or belief, it is about examining the data and following what is seen. Much true knowledge work is about finding the right questions to ask as much as it is about answering known questions. In a sense, it is seeing before proceeding.

TableLens supports exploratory analysis in a form suitable to a much broader range of people than conventional data analysis tools. Rather than supporting the entire repertoire of EDA methods, we sought a tool with a few good operations, one that might be learned in five minutes or so. The basic operations of data analysis (looking for correlation, spotting outliers or peculiar values, forming and comparing groups of items) can support a wide range of domains and activities. Not surprisingly, exploratory data analysis, just like statistics or mathematics, is fundamentally domain-independent and widely applicable.

Author's Note: TableLens is patent-protected technology available from Inxight Software in a number of different product forms.

About the Author

Ramana Rao is a Founder and former CTO of Inxight Software. At Inxight and previously at Xerox PARC, Ramana performed pioneering work in intelligent information access, digital libraries, information visualization, and user interfaces. He is the inventor, along with Stuart Card, of TableLens.

This article originally appeared on the Business Intelligence Network (www.b-eyenetwork.com) as one of Stephen Few's guest articles. A library of Stephen Few's articles, as well as other guest articles, is available at www.perceptualedge.com.