TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended.


Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended. This preview shows selected pages that are representative of the entire course book; pages are not consecutive. The page numbers shown at the bottom of each page indicate their actual position in the course book. All table-of-contents pages are included to illustrate all of the topics covered by the course. TDWI. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. DO NOT COPY.

COURSE OBJECTIVES

You Will Learn:
- Visualization as a communication medium
- Preparing data for visualization
- Components of visualization
- Choosing and using charts and graphs
- Visual exploration and analysis
- Visual design techniques
- Extending visualization with infographics
- Visual storytelling
- Data visualization tools

TDWI takes pride in the educational soundness and technical accuracy of all of our courses. Please send us your comments; we'd like to hear from you. Address your feedback to: info@tdwi.org

Publication Date: December 2017
Copyright 2017 by TDWI. All rights reserved. No part of this document may be reproduced in any form, or by any means, without written permission from TDWI.

TABLE OF CONTENTS

Module 1 Data Visualization Concepts
Module 2 Fundamentals of Visualization
Module 3 Visualization Techniques
Module 4 Visualization and BI
Module 5 Tools and Resources
Appendix A Bibliography and References

Module 1 Data Visualization Concepts

Topics:
- Data Visualization Today
- Data and Visualization
- Data Visualization Components
- Visual Cues
- Coordinate Systems
- Measurement Scales
- Visual Context

Data and Visualization: Finding the Right Data

MATCHING DATA TO PURPOSE
Before data can be visualized, it must be obtained. Data is selected for visualization based on a business problem or question of interest.

DATA REPOSITORIES
The most obvious sources of data are enterprise repositories, for example a data warehouse, a data mart, or a data lake. These repositories are the primary source of enterprise data and will be a frequent focus for data visualizations. Data may be queried or extracted from these sources using a variety of techniques and filtered based on business-driven criteria such as time or subject.

OTHER SOURCES
Data of interest may also be housed outside an organization; this is often referred to as external data. External data repositories may include:
- University data
- General data applications
- Topical data (geographical, sports, census)
- Websites (data scraping)

DATA HANDLING
Acquisition of data may require complex processing. Examples include the integration of data from multiple sources, the transformation of data based on business rules and analytics objectives, and the cleansing of data based on quality standards. A variety of interfaces, formats, and processing activities may be required. These data management activities are beyond the scope of this course, but it is important to recognize that they are often required.
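As a rough illustration of filtering extracted data by business-driven criteria such as time and subject, the following Python sketch works on a small in-memory extract. The field names (`order_date`, `subject`, `amount`), the sample rows, and the helper `filter_rows` are all hypothetical and are not taken from the course book.

```python
from datetime import date

# Hypothetical rows, standing in for an extract from a warehouse or data mart.
rows = [
    {"order_date": date(2017, 1, 15), "subject": "sales", "amount": 120.0},
    {"order_date": date(2017, 6, 3), "subject": "returns", "amount": 35.5},
    {"order_date": date(2016, 11, 20), "subject": "sales", "amount": 80.0},
    {"order_date": date(2017, 9, 9), "subject": "sales", "amount": 210.0},
]


def filter_rows(rows, start, end, subject):
    """Keep rows for one subject whose date falls within [start, end]."""
    return [
        r for r in rows
        if r["subject"] == subject and start <= r["order_date"] <= end
    ]


# Select 2017 sales only: a time criterion plus a subject criterion.
selected = filter_rows(rows, date(2017, 1, 1), date(2017, 12, 31), "sales")
total = sum(r["amount"] for r in selected)
print(len(selected), total)
```

In practice this kind of filter would more likely be expressed in the query sent to the repository; the list comprehension here only makes the two criteria explicit.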

Data Visualization Components: The Parts of Data Visuals

VISUAL LANGUAGE COMPONENTS
Recall the earlier statement that data visualization is a language with components and rules. The components of visual communication include visual cues, coordinate systems, and scales. These are the building blocks with which we create charts and graphs.

VISUAL CUES
Visual cues are used to draw the eye of the viewer to specific parts of, or relationships in, a graph. Visual cues include:
- Placement: Where we locate things on a graph, with attention to both position and proximity
- Lines: Length, angle, and direction of lines in a graph all directly influence visual perception
- Shapes: Area and volume of shapes in a graph (circles, squares, etc.) communicate the relative size of the things that they represent
- Color: Both hue (red, green, blue...) and saturation (intensity) influence the perceived importance of objects in a graph

COORDINATE SYSTEMS
Coordinate systems are the means by which quantitative values are plotted to specific locations in a graph. Common systems include:
- Cartesian systems place data points on a two-dimensional grid that is defined by an x-axis and a y-axis. Scatter plots and line graphs are examples of graphs using Cartesian coordinates.
- Polar systems place data points based on a circular system. The center of the circle is the base point or zero point. The value and characteristics of a data point determine its placement: a distance from the center along the radius of the circle, and an angle from zero to 360 degrees around the circumference. Radar graphs are a common example of polar coordinates.
- Geographic systems place data points on a map. Data point values may be placed according to a coordinate system based on latitude and longitude. Placement may also include altitude for three-dimensional plots. Heat maps sometimes use a geographic system.
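To make the relationship between polar and Cartesian placement concrete, here is a minimal Python sketch. It assumes the standard conversion x = r cos(theta), y = r sin(theta), with the angle measured counterclockwise from the positive x-axis. The function names and the even spacing of axes in `radar_points` are illustrative assumptions, not something the course book specifies.

```python
import math


def polar_to_cartesian(r, angle_degrees):
    """Convert a distance from the center and an angle in degrees to (x, y)."""
    theta = math.radians(angle_degrees)
    return (r * math.cos(theta), r * math.sin(theta))


def radar_points(values):
    """Place one value per axis, with axes spread evenly around the circle.

    Assumption: axis i sits at an angle of i * (360 / number of values).
    """
    step = 360.0 / len(values)
    return [polar_to_cartesian(v, i * step) for i, v in enumerate(values)]


# Four measures plotted on a radar graph with axes at 0, 90, 180, and 270 degrees.
for x, y in radar_points([3.0, 1.5, 2.0, 4.0]):
    print(round(x, 3), round(y, 3))
```

A charting tool normally does this conversion internally; the sketch only shows how a value and an axis position become a location on the page.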

Module 2 Fundamentals of Visualization

Topics:
- Data Visualization Methods
- Data Visualization Standards
- Visualization with Purpose
- Data Visualization Development

Data Visualization Standards: Good Design

COMMUNICATING WITH VISUALS
The goal of data visualization is good communication. The opportunity to miscommunicate is always present. To avoid miscommunication, pay careful attention to design to be sure that your visuals are readable, clear, and unambiguous.

READABLE
The two examples at the top of the facing page illustrate graphs that are difficult to read. The climate model is cluttered and confusing, with too many lines to follow and a complex legend that consumes more than one-fourth of the visualization space. The heat index plot is confusing, with lines, data points, and text colliding. The meaning of the lines is uncertain, and the visual uses city abbreviations that are not standard and are subject to interpretation. Both of these images have high potential to confuse and miscommunicate.

CLEAR
The center of the facing page shows two examples of communication that are unclear. In the on-time arrivals graph, both U.S. Airways Shuttle and Delta Shuttle show 91 percent on-time arrivals, but the bars are of different lengths. It takes some study to understand that the bar length represents the total number of arrivals. Or is it the total number of on-time arrivals? Next, it is uncertain whether the number of arrivals corresponds with the end of the bar or the end of the airplane and train images. Does Delta Shuttle have approximately 2,000 arrivals or 2,500? The choropleth map communicates badly because the pattern and shading choices are poor. A quick look suggests that Texas has more of whatever is being measured than California. Only by studying the legend do you see that the Texas shading means two and the California pattern means seven.

UNAMBIGUOUS
The two images at the bottom of the page are ambiguous for similar reasons. Both use 3-D charts to show two-dimensional data, a technique that inhibits rather than advances communication. In the column graph it is difficult to match the top of a column to the y-axis scale, so precision is lost. In the pie chart the relative size of the slices is even more difficult to compare than with a simple two-dimensional wedge. On the left side of the chart it is difficult to match the state abbreviations with the slices they are intended to label.

Visualization with Purpose: What Do You Want to Show?

WHAT DO YOU WANT TO SHOW?
Good visual design begins by knowing what you want to communicate, most commonly comparisons, proportions, relationships, or patterns:
- Comparisons illustrate what differences exist between things.
- Proportions show what parts make up a whole.
- Relationships show what dependencies exist among subjects or variables.
- Distribution patterns show at what frequencies data values or conditions occur.
- Location patterns show what places are associated with specific data values or conditions.
- Probability patterns show what the chances are that a specific value or condition will occur.
- Behavior-over-time patterns show what trends are observed in the data.

Knowing what you want to show is a fundamental first step to choosing the right graphing and visualization techniques to communicate well.

Module 3 Data Visualization Techniques

Topics:
- Visualization Techniques
- Visualizing Comparisons
- Visualizing Proportions
- Visualizing Relationships
- Visualizing Patterns

Visualizing Proportions: Proportion Data

PARTS TO A WHOLE
Visualizing proportion data focuses on using size or area to show differences or similarities between values, or parts to a whole. For proportion analysis we often look for the maximum, the minimum, or the overall distribution. Looking at maximum and minimum is simple: sort your data by a particular variable and pick the ends for your maximum and minimum. Distribution looks at the frequency of values and also the shape of the data.

DATA CHARACTERISTICS
Proportion data is typically aggregated, so some transformation and cleansing may be required to structure the data for analyzing value differences or parts to a whole. To analyze parts to a whole, the data needs to be aggregated by a category that is a subcategory of the whole. For instance, summarizing sales by store for a region would enable analysis of store sales within a region. Store sales are the parts, and the whole is the region's total sales. Analyzing differences or similarities between values may require aggregation of single variables for comparison. A common way to analyze differences or similarities between values is a bubble chart. The facing page shows the top 10 states by population, where the size of each bubble reflects that state's population, expressed in percent.
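The store-within-region example above can be sketched in a few lines of Python. The sample records, the region and store names, and the helper `store_shares` are hypothetical; the sketch only shows the aggregate-then-divide step that turns detail rows into parts of a whole.

```python
from collections import defaultdict

# Hypothetical detail records: (region, store, sales amount).
sales = [
    ("North", "Store A", 100.0),
    ("North", "Store B", 300.0),
    ("North", "Store A", 100.0),
    ("South", "Store C", 250.0),
]


def store_shares(records, region):
    """Aggregate sales by store within one region and return percent of the regional total."""
    totals = defaultdict(float)
    for rec_region, store, amount in records:
        if rec_region == region:
            totals[store] += amount
    whole = sum(totals.values())
    if whole == 0:
        return {}  # no sales recorded for this region
    return {store: 100.0 * part / whole for store, part in totals.items()}


shares = store_shares(sales, "North")

# Sorting by share puts the minimum and maximum at the two ends.
ranked = sorted(shares.items(), key=lambda item: item[1])
print(ranked[0], ranked[-1])
```

The resulting percentages are what a pie chart, stacked bar, or bubble chart would then encode as size or area.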

Visualizing Relationships: Distribution

DEFINITION
Distribution shows the relationship between the frequency of occurrence of each value of a variable and the total set of values, as well as the mean, median, minimum, and maximum values. Many visualization methods display frequency, that is, how data is spread out over an interval or grouped. We introduced one earlier, the box plot, to demonstrate outliers. There are three main visualization methods that show distribution.

HISTOGRAM
A histogram visualizes the distribution of data over a continuous interval or certain time period. Each bar in a histogram represents the tabulated frequency for that interval, or bin. With bins of equal width, the total area of the bars is proportional to the number of data points (and equal to it when the bin width is one unit). Histograms help us estimate where values are concentrated, what the extremes are, and whether there are any gaps or unusual values. They are also useful for giving a rough view of the probability distribution.

STEM AND LEAF
Stem and leaf plots organize data by place value to show the distribution of data. Place values are shown ascending downwards on a stem column, typically but not always in tens. The values within each place value are listed sideways from the stem as leaves.

DENSITY PLOT
A density plot depicts the distribution of data over a continuous period. A density plot is a variation of a histogram that uses kernel smoothing to plot values, resulting in a smoother distribution. An advantage that density plots have over histograms is that they show the distribution shape more clearly, because they are not affected by the number of bars used in a histogram.
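A small Python sketch may help show what a histogram and a stem and leaf plot actually tabulate. It assumes equal-width bins and non-negative integer data with stems in tens; the sample scores and the function names are made up for illustration.

```python
from collections import defaultdict


def histogram_counts(values, low, width, bins):
    """Count values per equal-width bin [low + i*width, low + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < bins:  # values outside the covered range are ignored
            counts[i] += 1
    return counts


def stem_and_leaf(values):
    """Group non-negative integers by tens (the stem), listing ones digits as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


scores = [52, 57, 61, 63, 63, 68, 70, 74, 75, 79, 81, 88, 95]

# Histogram: five bins of width 10, starting at 50.
counts = histogram_counts(scores, low=50, width=10, bins=5)
print(counts)

# Stem and leaf: stems ascend downwards, leaves extend sideways.
for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Both displays group the same values by tens here, which is why the bar heights match the number of leaves on each stem; the stem and leaf plot additionally keeps the individual values visible.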

Module 4 Visualization and Business Intelligence

Topics:
- Visualization and BI
- Analytics
- Visual Reporting
- Infographics
- Data Storytelling

Analytics: From Exploration to Models

FROM EXPLORATION TO ANALYSIS
Visualization is a valuable part of analytics, and data scientists often use it as part of the analytical process. The Cross Industry Standard Process for Data Mining (CRISP-DM) is a widely used process model among data scientists, and reviewing it illustrates how data visualization assists the process:
- Business Understanding: The first phase focuses on understanding the project objectives and requirements and converting this knowledge into an analytics problem definition. Visualization helps explore data used in problem definition.
- Data Understanding: The data understanding phase starts with the initial data collection and proceeds with activities to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses for hidden information. Data profiling and visualizing data via frequency charts, histograms, and box plots assist with data understanding.
- Data Preparation: The data preparation phase covers all activities to construct the final data set from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Visualizing data via frequency charts, histograms, and box plots assists with data validation.
- Modeling: In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Visualizing data patterns assists in comparing modeling techniques and results.
- Evaluation: At this stage in the project you have built a model (or models) that appears to have high quality from a data analysis perspective. Visualizing the final model results supports explanation to multiple stakeholders.
- Deployment: Models are deployed into production, monitored, and revised as needed. Using visualization to track model performance via trends and time series is typical after a model is deployed.
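The box plots mentioned for data understanding and data validation rest on a simple numeric summary, sketched below in Python. The 1.5 x IQR rule for flagging outliers is a common convention and is an assumption here, since the course book does not state which rule it uses; the sample values and the function name are also hypothetical. Quartiles are computed with `statistics.quantiles` from the standard library (Python 3.8 or later).

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box plot draws, plus values flagged as outliers.

    Assumption: outliers are values more than 1.5 * IQR beyond the quartiles.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# One suspicious value among otherwise evenly spread readings.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(summary)
```

Different tools compute quartiles with slightly different interpolation methods, so the box edges can differ a little from one product to another even on the same data.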

Visual Reporting: Dashboards, Scorecards, and Reports

PUBLISHING VISUALIZATIONS
Many data visualizations are published on a periodic basis as content in reports, scorecards, and dashboards. For each of these uses the designer must consider placement, proximity, and other visual cues for a visual space that includes multiple charts, graphs, tables, etc.

Any type of graph, chart, or table may be used in reports, scorecards, and dashboards. Common practices include:
- Reports using the most familiar types of graphs (line, bar, column, and pie charts) that are easily understood by a diverse group of people.
- Scorecards using tables with embedded graphics, including specialty graphs that we'll discuss in a moment.
- Dashboards using gauges, bar graphs, column graphs, and specialty graphs. Although gauges are common, they are not necessarily good visualization practice. Most gauges are in the form of radial bar graphs. Rectangular bars convey the same information in less space and in a way that is easier for visual comparison.

Some good practices when organizing multiple graphs for reporting include:
- Reports should be organized with careful attention to sequence. Putting summary graphs ahead of details gives context for more detailed information. Grouping graphs (for example, all of the revenue graphs together, then all of the expense graphs) helps with continuity and comparisons.
- Scorecards should have the most important performance indicators at the top. It is also helpful to group metrics with lagging indicators immediately followed by contributing leading indicators.
- Dashboards should have a limited number of graphs and other visuals. Seven or fewer graphical objects is a good rule of thumb.