Visual Analytics Tools for the Global Change Assessment Model. Ross Maciejewski Arizona State University

Similar documents
Visual Analytics Tools for the Global Change Assessment Model. Michael Steptoe, Ross Maciejewski, & Robert Link Arizona State University

Statistical Graphs & Charts

Network Traffic Measurements and Analysis

Few s Design Guidance

DI TRANSFORM. The regressive analyses. identify relationships

VIDAEXPERT: DATA ANALYSIS Here is the Statistics button.

Spectral Classification

Spreadsheet Warm Up for SSAC Geology of National Parks Modules, 2: Elementary Spreadsheet Manipulations and Graphing Tasks

Cluster Analysis for Microarray Data

Data Analyst Nanodegree Syllabus

Multiple Regression White paper

Linear Topics Notes and Homework DUE ON EXAM DAY. Name: Class period:

Data Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47

LINEAR TOPICS Notes and Homework: DUE ON EXAM

Data Analyst Nanodegree Syllabus

Visualizing Crime in San Francisco During the 2014 World Series

Data Foundations. Topic Objectives. and list subcategories of each. its properties. before producing a visualization. subsetting

Clustering. Chapter 10 in Introduction to statistical learning

An independent component analysis based tool for exploring functional connections in the brain

Unsupervised learning: Clustering & Dimensionality reduction. Theo Knijnenburg Jorma de Ronde

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Clustering and Dimensionality Reduction

Region-based Segmentation

EZY Intellect Pte. Ltd., #1 Changi North Street 1, Singapore

No Programming Required Create web apps rapidly with Web AppBuilder for ArcGIS

PROMO 2017a - Tutorial

Stager. A Web Based Application for Presenting Network Statistics. Arne Øslebø

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Glyphs. Presentation Overview. What is a Glyph!? Cont. What is a Glyph!? Glyph Fundamentals. Goal of Paper. Presented by Bertrand Low

Cloud Monitoring as a Service. Built On Machine Learning

Visualization of HashStash with Qt

PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

Background. Parallel Coordinates. Basics. Good Example

Exploratory data analysis for microarrays

ALGEBRA II A CURRICULUM OUTLINE

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering

CPSC 425: Computer Vision

Integrating Word, Excel, Access, and PowerPoint

Chapter 2: Descriptive Statistics

Unit 6 Quadratic Functions

Data Management Project Using Software to Carry Out Data Analysis Tasks

Chapter 2 - Graphical Summaries of Data

Dta Mining and Data Warehousing

Learning Objectives for Data Concept and Visualization

Software Requirements Specification. for WAVED. Version 3.0. Prepared By:

Viscous Fingers: A topological Visual Analytic Approach

Section 1.1 The Distance and Midpoint Formulas

Python Certification Training

Using Statistical Techniques to Improve the QC Process of Swell Noise Filtering

Step-by-Step Guide to Advanced Genetic Analysis

Basically, a graph is a representation of the relationship between two or more variables.

Chapter 6 Objectives

Aimetis Crowd Detection. 1.x User Guide

Exploring Data. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics.

IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram

Developing Similar Applications for Dissimilar Audiences

Ex.1 constructing tables. a) find the joint relative frequency of males who have a bachelors degree.

Introduction to WHO s DHIS2 Data Quality Tool

Putting it all together: Creating a Big Data Analytic Workflow with Spotfire

DataView Features. Input Data Formats. Current Release

Slides Prepared by JOHN S. LOUCKS St. Edward s s University Thomson/South-Western. Slide

A Web Application to Visualize Trends in Diabetes across the United States

Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries

Data Visualization Techniques

Analytics and Visualization

Index. grouped (see Grouped bar chart) horizontal, jqplot library, 128 stacked bar charts (see Stacked bar charts) Bubble chart,

CSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction

Applied Neuroscience. Columbia Science Honors Program Fall Machine Learning and Neural Networks

Chapter 7 Multiple Constraints and Conflicting Objectives. Materials Selection in Mechanical Design, 4th Edition, 2010 Michael Ashby

10701 Machine Learning. Clustering

Warmups! Write down & Plot the points on the graph

AP Physics 1 and 2 Summer Assignment

Step-by-Step Guide to Relatedness and Association Mapping Contents

Chemistry 1A Graphing Tutorial CSUS Department of Chemistry

HYBRID FORCE-DIRECTED AND SPACE-FILLING ALGORITHM FOR EULER DIAGRAM DRAWING. Maki Higashihara Takayuki Itoh Ochanomizu University

3. Data Analysis and Statistics

TrajAnalytics: A software system for visual analysis of urban trajectory data

Create Web Charts. With jqplot. Apress. Fabio Nelli

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Esri and MarkLogic: Location Analytics, Multi-Model Data

Supplementary text S6 Comparison studies on simulated data

Introduction to Nesstar

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction

Detect Cyber Threats with Securonix Proxy Traffic Analyzer

With A Little Help From Yelp

Hierarchical clustering

Data Visualization (CIS 468)

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Enterprise Miner Software: Changes and Enhancements, Release 4.1

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Computer Studies programs and courses have Changed!!!

Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC

Machine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham

Chemistry Excel. Microsoft 2007

ESPON Online Mapping Tool. RIMAP User's Manual

Functions. Copyright Cengage Learning. All rights reserved.

Team AquaFu (Aperio) Data Science on Water Data

General Instructions. Questions

Transcription:

Visual Analytics Tools for the Global Change Assessment Model Ross Maciejewski Arizona State University

GCAM Simulation After running thousands or even hundreds of simulations through GCAM this process generates an ensemble of datasets that are impossible to go through individually. Each dataset has been generated using varying parameters which may have minimal impact on the outputs while others may have a more drastic impact.

Data Preprocessing In order to tackle the issue of processing this data it was necessary to define a file format that was flexible and would enable the extraction of key features from the datasets. JSON was the chosen file format with Scenario specific: Input identification Output query selection Global feature extraction Period of simulation

System Overview Our system is able to process these JSON files that contain: Variable Input parameters The description of inputs that drove the simulation Output parameters The queries selected to be run in the simulation at the regional level Global Parameters The queries selected to be run in the simulation at the global level Providing analysis across a range of varying scenario

System Implementation Desktop Application (Windows, Mac, & Linux) Electron Framework User Interface Web Technologies» HTML» CSS» JavaScript Data Processing NodeJS Python

Visual Analytics Techniques Clustering Hierarchical K-Means Principal Component Analysis Correlation Identification Anomaly Detection

Visual Analytics Views Dendrogram View Parallel Coordinates Plot Scenario View Difference View Scenario Comparison View Region Comparison View Min/Max View Cluster View

Dendrogram View The dendrogram view shows the hierarchy of clusters formed when running when running all of the provided scenarios through hierarchical clustering. The view is draggable and zoomable allowing the user to easily explore the hierarchy. Nodes are clickable providing filter mechanisms for the Parallel Coordinates View and the Line Chart View.

Dendrogram View Cont. In the dendrogram, leaf nodes represent the scenarios and parent nodes represent the grouping procedure of clustering process. Distance between the nodes or clusters is labeled as dist. To help users gain a better understanding of the clustering results, we label the parent nodes with the input parameter that has the largest difference at that step in the grouping process.

Node Labeling Example Scenario 2 inputs: Energy: 0, CCS: 0, SoEcon: 1, AGLU: 0, Ind/Tran/Build: 0, Fossil: 1 Scenario 5 inputs: Energy: 0, CCS: 0, SoEcon: 1, AGLU: 0, Ind/Tran/Build: 1, Fossil: 1 By applying the difference equation below we find the greatest difference between Scenarios 2 and 5 to be Ind/Tran/Build so the parent is colored Blue.

Parallel Coordinates View Displays the output values for each region or scenario based on the selected queries. The plot can display the actual values and the slope of the values. The user can select to examine the data across all years as a summation or average, or they can select the value from a specific year in the simulation.

Filtering options: Brushing Each y-axis represents a different feature from the output query. The user can brush along the y- axis to filter the data in the table below the plot. Each new brush combines previously filtered values with the new data under brushing.

Filtering op,ons: Hovering The user can hover over a line to examine a specific region or scenario in the data table below.

Difference View (Scenario A vs Scenario B) The Difference View shows the normalized difference of Scenario A vs Scenario B across all features.

Difference View Cont. The normalized difference is found by rescaling each feature using feature scaling. Feature scaling is used to bring all values into the range [0,1].The equation for feature scaling is defined as follows: After rescaling each feature the difference of Scenario B is taken from Scenario A giving us a range of [-1,1].

Scenario Comparison View The Scenario Comparison View shows the summation of all countries for each feature in the scenario files.

Map View The Map View is used as a filter for the Parallel Coordinates View. When a region is selected the Parallel Coordinates View will be filtered to the selection.

Region Comparison View The Region Comparison View shows the value for the selected country across all scenario files.

Min/Max View Users can insert a min/max line chart for any of the output features from the provided queries. In the example on the left we have inserted line charts using CO2 emissions by aggregate sector for: Building Liquid Systems Regional Sugar for Ethanol Biomass Systems

Min/Max View Cont. In each line chart, the x-axis represents the year of simulation and y-axis represents the output value. The black line represents the mean output value of all regions. The blue area chart represents the range mean ± 2 * std. Each dot represents a set of regions ( or a single region) having the maximum or minimum value. The computation of output values for each country and each year is explained in the next slide.

Min/Max View Cont. In the example on the left The max value is shared by 35 regions and changes in 2015 to china. China continues to be the dominant region until 2090. During this period China soars above the 2 std area. Later in 2090 the max value is shared by 24 regions for the remainder of the simulation. The min value is held by Columbia and this continues until 2025. From 2025 onto the end of the simulation 4 regions share the minimum value.

Min/Max View Cont. The analysis goal is to 1) find the abnormal countries in each type of output and each year from all scenarios, and 2) find similar outputs with similar changing trend. Step 1: for each output at each country and each year, we compute its average value over all scenarios. Step 2: So, for each output and each year, suppose we have N countries (i=1 N), we can compute their mean and standard deviation over all countries: Step 4: we can draw a line chart for each output of which the x-axis is the time and y-axis is. We can also add some dots onto the line chart. The dots represent the abnormal countries having the maximal and minimal values.

Cluster View The Cluster View shows the PCA plot of all scenarios and their clusters from K-Means. This view provides a spatial overview of each scenario and shows their similarity based on locality and clustering.

Cluster View Cont. Each dot represents a different scenario and the coloring is used to show which cluster they belong to. The User can control the number of clusters generated by K-Means.

Demo

Thank You Contributions Visual analytics tool for processing multiple scenarios Tools for analyzing and comparing scenarios Future Work More visualization features UI improvements Server distribution Parallelization