Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS


Statistical graphics in analysis
Multivariable data in PCP & scatter plot matrix
Paula Ahonen-Rainio
Maa-123.3530 Visual Analysis in GIS
11.11.2015

Topics today
YOUR REPORTS OF A-2
- Thematic maps with charts for analysis
- Some reminders
LECTURE / YOUR EXAMPLES IN A-2
- Guidelines for statistical graphics, to avoid misleading presentation of data
  - line, column and bar charts; do you really need pie charts?
  - histogram and box plot
LECTURE (continues in A-3)
- Basic methods for multivariable data
  - Scatter plot matrix
  - PCP (parallel coordinates plot)

Visual analysis of large amounts of data
- The aim: pattern discovery
  - trends
  - correlation and relationships
  - detection of irregularities (incl. outliers)
- For exploration or for confirmation of hypotheses
- Capacity of maps and charts may be limiting
- Preprocessing of data is often necessary, e.g. transformations, classification, clustering

Limitations of maps in analysis
- Limited capacity for multiple variables
- Intuitiveness of the interpretation is both a strength and a weakness
  - cf. preattentive vs. attentive perception
  - dominance of large sizes vs. importance of areas
  - our visual system compares rather than measures
- Influence of projection in large areas
- Tools for interaction are necessary; interaction adds capacity remarkably
- Other?

Notice
In addition to thematic mapping, maps should provide a sufficient and suitable geographic reference
- for locating thematic phenomena
- for revealing the reasons behind them, such as differences and discontinuities in geography!
Examples:
- Spatial variation of rainfall is partly dependent on the variations of elevation.
- Variation of voting behaviour (pre-election voting) is partly dependent on the habitation density, i.e. the distances to voting places.
- Variation of voting results is partly dependent on the variations in economic activities of land use (agricultural vs. industrial vs. urban regions).

Statistical graphics
- Role of statistical graphics in visual analysis
  - presentation
  - preparation for analysis
  - also exploration?
- How to make a proper chart
  - Books of Edward Tufte
  - Wainer, H. (1984) How to Display Data Badly. The American Statistician, vol 38, no 2. http://users.stat.umn.edu/~sandy/courses/8801/handouts/04.tabular/wainer1984.pdf
  - Presentation of statistical data, paper print by V. Kuusela, Statistics Finland (available during the sessions)

Avoid the typical mistakes in statistical graphics!
Study the basics and check the following:
- Whether you present magnitudes or trends
- Continuity is interpreted from the horizontal axis
- The principal difference between a line chart and a column chart
- The principal difference between a column chart and a bar chart*
- Problems with the interpretation of pie charts
- A title is as essential an element in a chart as it is in a map
* column chart: vertical bars; bar chart: horizontal bars

Tips for proper graphics: study these!
Watch the video and/or browse the slides:
- D. Taylor, Introduction to Data Visualization, on YouTube: https://www.youtube.com/watch?v=xigjtudgxyy
- Snapshots of the video: http://www.slideshare.net/prooffreader/introduction-to-data-visualization-41067274
Look at these examples of D. Taylor (slide #) [time in the video]:
- Do not cut columns (18-19) [6:10-]
- Golden ratio slope and aspect ratio (20-23) [7:70-10:05]
- Stacked bar charts (34) [12:55-14:25]
- Keep in mind your audience (42) [17:45-19:00]
- Histogram (48-50) [21:15-]
- Data to ink ratio by E. Tufte (51) [22:50-]
- Pie chart (53-55) [23:40-]
- Jobs of a data visualization (56) [25:10-]
- Problem with logarithmic scale (57) [26:20-]
- First impression [28:00-28:30]

What we can learn from your examples
This is a learning session, so please don't get upset if we criticise your graphics...

Mapping of data to coordinates
For multivariable data
- Variables on coordinates
  - all data objects in the same display
  - requires metric values: need for transformation from nominal and ordinal to metric; see the example in the prereading about PCP
  - scaling of axes is critical
    - may require preprocessing
    - some tools: interactively during the analysis
- Basic methods
  - Scatter plot: 3D scatter plot, scatter plot matrix
  - Parallel coordinates plot (PCP): 3D parallel coordinates, radial coordinates
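The two preprocessing steps named above (transforming ordinal values to metric ones and scaling the axes) can be sketched as follows. This is a minimal illustration, not the method of the prereading: the function names and the example data are hypothetical, and ranking ordinal levels 0, 1, 2, ... assumes equal spacing between levels, which is a modelling choice.

```python
def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; place it mid-axis.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ordinal_to_metric(values, ordered_levels):
    """Replace each ordinal label by its rank (0, 1, 2, ...) in ordered_levels.

    Assumption: the levels are treated as equally spaced.
    """
    rank = {level: i for i, level in enumerate(ordered_levels)}
    return [rank[v] for v in values]


# Hypothetical example data (not from the lecture).
rainfall_mm = [400, 550, 700, 1000]
density = ["low", "high", "medium", "low"]

scaled_rainfall = min_max_scale(rainfall_mm)
scaled_density = min_max_scale(
    ordinal_to_metric(density, ["low", "medium", "high"])
)
```

After this step every variable occupies the same 0 to 1 range, so no axis dominates the display only because of its units.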

Scatter plot (fi: parvikuvio, hajontakuva)
- Dependency between two variables
  - Mathematically: correlation coefficient (from +1 through 0 to -1), regression
  - Visually: in a scatter plot
- Scatter plot for bivariate analysis
  - independent and dependent variable
  - intuitive interpretation: y depends on x
- Patterns: proportional relationship, inverse relationship, no correlation; patterns may be less clear
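The mathematical side of the dependency can be sketched with the Pearson correlation coefficient and a least-squares regression line. The data below are hypothetical and chosen so that the three patterns (proportional, inverse, no correlation) give coefficients of +1, -1 and 0; the function names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes both variables have non-zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares regression line y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y_proportional = [2, 4, 6, 8, 10]
y_inverse = [10, 8, 6, 4, 2]
y_uncorrelated = [2, 1, 3, 1, 2]

r_proportional = pearson_r(x, y_proportional)
r_inverse = pearson_r(x, y_inverse)
r_uncorrelated = pearson_r(x, y_uncorrelated)
intercept, slope = fit_line(x, y_proportional)
```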

Scatter plot
- If we have multiple variables, how are they correlated?
- What if there is local correlation instead of a global one?
- How to detect the dependent subsets?
- Scatter plot for bivariate analysis; scatter plot matrix for multiple variables
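The difference between global and local correlation can be shown numerically. In this sketch the data and the group labels are hypothetical: within each subset the two variables rise together, while over the whole data set the coefficient is negative. A single global coefficient would hide exactly the pattern that colouring the subsets in a scatter plot reveals.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data objects: (subset label, x, y).
points = [
    ("A", 1, 5), ("A", 2, 6), ("A", 3, 7),
    ("B", 5, 1), ("B", 6, 2), ("B", 7, 3),
]

# Global correlation over all data objects.
global_r = pearson_r([x for _, x, _ in points], [y for _, _, y in points])

# Local correlation within each subset.
local_r = {}
for group in sorted({g for g, _, _ in points}):
    xs = [x for g, x, _ in points if g == group]
    ys = [y for g, _, y in points if g == group]
    local_r[group] = pearson_r(xs, ys)
```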

Example: Scatter plot matrix
- Three species of iris distinguished by three colours
- linear relationship
- clustering
http://support.sas.com/

Example: Scatterplot matrix with added elements (multiform matrix)
In this multiform matrix only the elements above the diagonal are scatterplots, so only one setting of independent/dependent variables is displayed. Histograms occupy the diagonal. The below-diagonal space displays bivariable spacefill visualisations.
In a spacefill each grid square or pixel represents one data object. The colour of a square represents one of the two attributes, and the order of the squares (e.g. a scanline) follows the second attribute. This technique solves the problem of overprinting in a scatterplot.
A spacefill can be used to visually estimate the strength of the relationship between the two displayed attributes. If the attributes are strongly correlated, there is a relatively regular and smooth transition from the lightest to the darkest colour from bottom to top (or from top to bottom). A weaker correlation produces a scattered pattern. A random pattern means that there is no correlation between the respective attributes.
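The spacefill idea can be sketched in a few lines: sort the data objects by one attribute, lay them out along a simple row-by-row scanline, and colour each cell by the other attribute. The function names, the text "shades" used in place of colours, and the data are all hypothetical.

```python
def spacefill(colour_attr, order_attr, width):
    """Return grid rows of colour_attr values, with the data objects
    placed row by row in increasing order of order_attr."""
    order = sorted(range(len(order_attr)), key=lambda i: order_attr[i])
    cells = [colour_attr[i] for i in order]
    return [cells[r:r + width] for r in range(0, len(cells), width)]


def render(grid, shades=" .:*#"):
    """Draw the grid as text, lightest shade for the lowest value."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = (hi - lo) or 1
    lines = []
    for row in grid:
        lines.append("".join(
            shades[int((v - lo) / span * (len(shades) - 1))] for v in row
        ))
    return "\n".join(lines)


# Hypothetical, perfectly correlated attributes.
attr_a = [3, 1, 4, 2, 6, 5, 8, 7, 9]
attr_b = [30, 10, 40, 20, 60, 50, 80, 70, 90]

grid = spacefill(attr_a, attr_b, width=3)
picture = render(grid)
```

With these strongly correlated attributes the cell values grow steadily along the scanline, which corresponds to the smooth light-to-dark transition described above; shuffling one attribute would scatter the shades.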

Parallel coordinates plot (PCP)
- An axis for each variable; all axes are displayed in parallel
- Several axes; ordering of the axes interactively (in the example only two variables, to demonstrate the idea and to compare the depiction with a scatterplot)
- Each data object is presented by a polyline that intersects each axis according to the value of the respective variable
(Figure: two data objects, A and B, shown in a scatterplot and in a PCP.)
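How a data object becomes a polyline can be sketched as follows: each variable gets an axis index, and the object's value is scaled to a position between 0 and 1 on that axis. The two objects A and B and their values are hypothetical, and per-axis min-max scaling is only one possible choice of axis scaling.

```python
def polylines(records, variables):
    """Map each record (a dict) to its polyline: a list of
    (axis_index, position) pairs with position scaled to [0, 1]."""
    ranges = {
        v: (min(r[v] for r in records), max(r[v] for r in records))
        for v in variables
    }
    lines = []
    for r in records:
        points = []
        for axis, v in enumerate(variables):
            lo, hi = ranges[v]
            position = 0.5 if hi == lo else (r[v] - lo) / (hi - lo)
            points.append((axis, position))
        lines.append(points)
    return lines


# Hypothetical data objects A and B with two variables.
object_a = {"x": 1, "y": 4}
object_b = {"x": 3, "y": 2}

line_a, line_b = polylines([object_a, object_b], ["x", "y"])
```

Here A is lowest on the first axis and highest on the second while B is the opposite, so the two polylines cross between the axes.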

Example: PCP

PCP
What we perceive:
- A polyline
- Crossing of a polyline with an axis
- Pattern of lines between two adjacent axes
Therefore, interactivity is necessary for the analysis
- Reordering of the axes changes the patterns
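Why reordering matters can be illustrated by counting how many pairs of polylines cross between each pair of adjacent axes: few crossings suggest a proportional relationship, many crossings an inverse one. This crossing count is an illustrative measure chosen for the sketch, not something prescribed by the lecture; the data and names are hypothetical.

```python
from itertools import combinations


def crossings(records, left, right):
    """Count pairs of polylines that cross between two adjacent axes,
    i.e. pairs whose order on the left axis is reversed on the right."""
    count = 0
    for a, b in combinations(records, 2):
        if (a[left] - b[left]) * (a[right] - b[right]) < 0:
            count += 1
    return count


def crossings_per_gap(records, axis_order):
    """Crossing counts for each pair of adjacent axes in axis_order."""
    return [
        crossings(records, left, right)
        for left, right in zip(axis_order, axis_order[1:])
    ]


# Hypothetical data: p and q rise together, r falls as they rise.
records = [
    {"p": 1, "q": 10, "r": 30},
    {"p": 2, "q": 20, "r": 20},
    {"p": 3, "q": 30, "r": 10},
]

order_pqr = crossings_per_gap(records, ["p", "q", "r"])
order_prq = crossings_per_gap(records, ["p", "r", "q"])
```

The same data show parallel lines between p and q in the first ordering, but that pattern is not visible at all in the second ordering, because p and q are no longer adjacent.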

Brushing for focus
The counties with the highest percentages of college graduates have been highlighted.
S. Few: Multivariate Analysis Using Parallel Coordinates (see the prereading)
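Brushing amounts to selecting the data objects whose value on one axis falls inside a chosen range and drawing only those in a highlight colour. A minimal sketch, with hypothetical county names, values and threshold (not the data of the prereading):

```python
def brush(records, variable, low, high):
    """Return the indices of the records whose value on the given
    variable lies within the brushed range [low, high]."""
    return [
        i for i, r in enumerate(records)
        if low <= r[variable] <= high
    ]


# Hypothetical counties (values are made up for illustration).
counties = [
    {"name": "County 1", "college_pct": 12.0, "income": 31000},
    {"name": "County 2", "college_pct": 41.5, "income": 58000},
    {"name": "County 3", "college_pct": 25.0, "income": 40000},
    {"name": "County 4", "college_pct": 38.0, "income": 52000},
]

# Brush the top of the college_pct axis.
highlighted = brush(counties, "college_pct", 35.0, 100.0)
highlight_flags = [i in highlighted for i in range(len(counties))]
```

A plotting tool would then draw the flagged polylines in a strong colour and the rest in a muted one, so the behaviour of the brushed subset on all other axes becomes visible.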

Coping with too many data objects: Colour coding by classification
- Per one variable only
- The three views to the data are linked by colour: in the upper view (PCP), the data objects are classified by their value of the rightmost variable and colour-coded accordingly.
- 3D scatter plot
MacEachren et al. GeoVISTA www.psu.edu/geovista
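Colour coding by classification can be sketched as: classify the objects by one variable, then give every object the colour of its class in all linked views. The sketch assumes an equal-interval classification and a three-colour palette; the source does not say which classification method the GeoVISTA example uses, and the values are hypothetical.

```python
def classify_equal_interval(values, n_classes):
    """Assign each value a class index 0 .. n_classes - 1 using
    equally wide intervals between the minimum and the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    classes = []
    for v in values:
        if width == 0:
            classes.append(0)
        else:
            # The maximum value falls on the upper edge; keep it in the top class.
            classes.append(min(int((v - lo) / width), n_classes - 1))
    return classes


# Hypothetical palette and values of the classifying variable.
PALETTE = ["lightyellow", "orange", "darkred"]
rightmost_variable = [0, 10, 20, 30, 45, 60]

class_ids = classify_equal_interval(rightmost_variable, len(PALETTE))
colours = [PALETTE[c] for c in class_ids]
```

Because the colour depends on one variable only, the same list of colours can be reused for the polylines in the PCP and for the points in a scatter plot, which is what links the views.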

Coping with too many data objects: Clustering & mean values
Chen & MacEachren (2008) Resolution Control in Multivariate Spatial Analysis. CaJ 45:4, pp. 261-273.
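Once the data objects have been assigned to clusters, each cluster can be drawn as a single polyline through its mean values. The sketch below covers only this averaging step and assumes the cluster labels are already given; it does not reproduce the clustering method of Chen & MacEachren, and the data and names are hypothetical.

```python
from collections import defaultdict


def cluster_means(records, labels, variables):
    """Return {cluster label: {variable: mean value}} so that each
    cluster can be shown as one polyline instead of many."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    return {
        label: {
            v: sum(r[v] for r in members) / len(members)
            for v in variables
        }
        for label, members in groups.items()
    }


# Hypothetical data objects and precomputed cluster labels.
records = [
    {"a": 1, "b": 10},
    {"a": 3, "b": 30},
    {"a": 10, "b": 2},
    {"a": 12, "b": 4},
]
labels = [0, 0, 1, 1]

means = cluster_means(records, labels, ["a", "b"])
```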

Example: Categorical values in PCP
A parabox (another name for parallel coordinates) graph from Advizor Solutions
Source: S. Few, "Multivariate Analysis Using Parallel Coordinates"